r - removing and aggregating duplicates -

July 15, 2012

i've posted sample of data i'm working here.

"parcel.." main indexing variable , there amount of duplicates. duplicates not consistent in of other columns. goal aggregate data set there 1 observation of each parcel. i've used following code attempt summing numerical vectors:

aggregate(ap.sample$x.11~ap.sample$parcel..,fun=sum)

the problem removes except parcel , other vector reference.

my goal use same rule numerical vectors (sum) (x.11,x.13,x.15, num_units) of observations of parcelid, different rule (average) other numerical vectors (acres,ttl_sq_ft,mtr.size), , still different rule (just pick 1 name) character variables (pretend there's column "customer.name" different values same unique parcel id, i.e. "steven condominiums" , "stephen apartments"), , delete observations other variables.

i've tried use numcolwise function doesn't need. instinct specify columns want sum , columns want take average so:

dt<-as.data.table(ap.sample) sum_cols<-ap.05[,c(10,12,14)] mean_cols<-ap.05[,c(17:19)]

and use lapply function go through each observation , need.

df05<-dt[,lapply(.sd,sum), by=dt$parcel..,.sdcols=sum_cols] df05<-dt[,lapply(.sd,mean),by=dt$parcel..,.sdcols=mean_cols]

but spits out errors on first go. know there's simpler work around trying muscle through it.

you do:

library(dplyr) df %>%    # create hypothetical "customer.name" column    mutate(customer.name = sample(letters[1:10], size = n(), replace = true)) %>%   # group data "parcel.."   group_by(parcel..) %>%    # apply sum() selected columns   mutate_each(funs(sum(.)), one_of("x.11", "x.13", "x.15", "num_units")) %>%   # likewise mean()   mutate_each(funs(mean(.)), one_of("acres", "ttl_sq_ft", "mtr.size")) %>%   # select desired columns    select(x.11, x.13, x.15, num_units, acres, ttl_sq_ft, mtr.size, customer.name) %>%   # de-duplicate while keeping arbitrary value (the first 1 in row order)   distinct(parcel..)

Search This Blog

Lix

r - removing and aggregating duplicates -

Comments

Post a Comment

Popular posts from this blog

c++ - Difference between pre and post decrement in recursive function argument -

javascript - IE11 incompatibility with jQuery's 'readonly'? -

php - How can I echo out this array? -