Summing sections of a dataframe in R
For a sample dataframe:
df1 <- structure(list(id = 1:10,
  group.id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L), .Label = c("a", "b", "c"), class = "factor"),
  x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 2.23),
  y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 2.36, 1.23),
  z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, NA, 3.26, 2.23),
  group.x = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  group.y = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  group.z = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)),
  .Names = c("id", "group.id", "x", "y", "z", "group.x", "group.y", "group.z"),
  class = "data.frame", row.names = c(NA, -10L))
I wish to populate group.x/y/z with the mean of the values in columns x, y and z, by group id.
So, the values in ids 1, 2, 3 and 10 (group a) would be averaged and the means populated in the corresponding columns "group.x", "group.y" and "group.z". This would subsequently be done for groups b and c, filling in all the rows.
Ideally I would also like an additional table detailing the groups, the number of values and the means, so that I can assess how representative the values are. I only have a basic knowledge of R. I could subset the dataframe and calculate the mean and counts for each section, but there must be a better way... Any ideas?
We can use data.table to create new columns holding the mean value of 'x', 'y' and 'z', grouped by the 'group.id' column. Convert the 'data.frame' to a 'data.table' with setDT(df1). Alternatively, you can use as.data.table, as suggested by @Ricardo Saporta; one advantage is that the initial dataset remains unmodified. I prefer to use setDT, but that is just subjective. You don't need to create the NA columns in the initial dataset.
library(data.table)
setDT(df1)[, paste('group', c('x', 'y', 'z'), sep=".") := lapply(.SD, mean, na.rm=TRUE), group.id, .SDcols=c('x','y','z')]
Assuming that you already have the NA columns, make sure that their class is the same as that of the means, i.e. 'numeric':
setDT(df1)[, 6:8 := lapply(.SD, as.numeric), .SDcols=6:8][, paste('group', c('x', 'y', 'z'), sep=".") := lapply(.SD, mean, na.rm=TRUE), group.id, .SDcols=c('x','y','z')]
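The question also asks for a separate table with the number of values and the mean per group. One possible sketch with data.table is below. It is not part of the original answer: the names cols, dt and summary_dt and the n./mean. column prefixes are made up for illustration, and the count is assumed to mean the number of non-NA values behind each mean.

```r
library(data.table)

# Sample data from the question (the empty group.* columns are left out).
df1 <- data.frame(
  id = 1:10,
  group.id = factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a")),
  x = c(2.12, 1.23, 2.36, 4.21, 2.36, NA, 2.36, 4.36, 1.23, 2.23),
  y = c(6.56, 2.36, NA, 4.36, 1.23, 8.56, 4.23, 5.36, 2.36, 1.23),
  z = c(4.36, NA, 5.23, 5.36, 1.23, 4.23, 1.23, NA, 3.26, 2.23)
)

# Hypothetical helper name: the value columns to summarise.
cols <- c("x", "y", "z")

# as.data.table() makes a copy, so df1 itself is left unchanged.
dt <- as.data.table(df1)

# One row per group: the non-NA count and the mean for each of x, y and z.
summary_dt <- dt[, c(
  setNames(lapply(.SD, function(v) sum(!is.na(v))), paste0("n.", cols)),
  setNames(lapply(.SD, mean, na.rm = TRUE), paste0("mean.", cols))
), by = group.id, .SDcols = cols]

print(summary_dt)
```

Counting only the non-NA values shows how many observations each mean is actually based on, which is what matters when judging how representative it is.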