Stata - Retain the cluster number for each member of a cluster within an id variable
I want to label how many unique clusters of data there are in a longitudinal dataset, and have each member of a cluster carry the cluster count. A distinct cluster is a set of observations sharing the same date within an id. The order of each distinct cluster relative to previous (earlier) clusters creates the desired result. This coding is necessary to address the problem of event ordering required for time-dependent covariate analysis.
input id str9 date
1 28jan2015
1 28jan2015
2 26nov2015
3 19oct2015
4 26dec2015
5 23dec2015
6 22may2015
6 23sep2015
6 23sep2015
7 14jan2015
7 27feb2015
7 30may2015
8 16apr2015
8 16apr2015
8 16apr2015
8 16apr2015
8 16apr2015
9 17jul2015
9 03oct2015
9 03oct2015
10 27jul2015
end
I have attempted:
bys id (date): gen count_obs = _n
bys id date: gen count_interval_obs = _n
egen n_interval = group(id date)
This results in accurate counts of the total number of observations per id, and an enumeration of the number of observations within each date. However, the egen function group() identifies each unique combination of id and date and numbers the groups without regard to id, giving:
id  date       wrong_cluster  correct_cluster
1   28jan2015  1              1
1   28jan2015  1              1
2   26nov2015  2              1
3   19oct2015  3              1
4   26dec2015  4              1
5   23dec2015  5              1
6   22may2015  6              1
6   23sep2015  7              2
6   23sep2015  7              2
etc.
egen's group() cannot be used with the by: prefix.
Any assistance appreciated.
Todd
Edit: added an explanation of why cluster identification is necessary. Clarified the rule that defines a cluster.
@Roberto Ferrer has given a direct approach. What follows uses the same logic, but takes a route through egen's group() function:
* date2 is assumed here to be a numeric daily date version of date
egen group = group(id date2)
bysort id (group): gen clust2 = sum(group != group[_n-1])
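To see why this works, here is a minimal self-contained sketch on a subset of the example data. It assumes the date is read in as a string and converted to a numeric daily date called date2 (the conversion step and the names clust3 and the subset of ids are illustrative assumptions, not part of the original posts). It also includes a by:-based variant that counts date changes directly within id, which may or may not match the other answer's exact code, and checks that both routes agree.

```stata
clear
input id str9 date
1 28jan2015
1 28jan2015
2 26nov2015
6 22may2015
6 23sep2015
6 23sep2015
9 17jul2015
9 03oct2015
9 03oct2015
end

* Assumption: convert the string date to a numeric daily date,
* so that sorting on it is chronological rather than alphabetical
gen date2 = date(date, "DMY")
format date2 %td

* Route 1: number every (id, date2) pair across the whole dataset,
* then take a running count of changes in that number within each id
egen group = group(id date2)
bysort id (group): gen clust2 = sum(group != group[_n-1])

* Route 2 (illustrative variant): count changes of date directly within id
bysort id (date2): gen clust3 = sum(date2 != date2[_n-1])

* Both routes should give the same within-id cluster number
assert clust2 == clust3
list id date clust2, sepby(id)
```

The key detail is that under by: the subscript [_n-1] refers to the previous observation within the same id, and for the first observation of each id it returns missing. The comparison is therefore true on the first observation, so the running sum() starts at 1 for every id and increases by 1 each time the date changes.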