Stata - Retain the cluster number for each member of a cluster within an id variable


I want to label how many unique clusters of data there are in a longitudinal dataset, and have each member of a cluster carry the cluster count. A distinct cluster is a set of observations sharing the same date within an id. The order of a distinct cluster relative to the previous (earlier) clusters within that id creates the desired result. This coding is necessary to address the problem of event ordering required for time-dependent covariate analysis.

    input id str9 date
    1   28jan2015
    1   28jan2015
    2   26nov2015
    3   19oct2015
    4   26dec2015
    5   23dec2015
    6   22may2015
    6   23sep2015
    6   23sep2015
    7   14jan2015
    7   27feb2015
    7   30may2015
    8   16apr2015
    8   16apr2015
    8   16apr2015
    8   16apr2015
    8   16apr2015
    9   17jul2015
    9   03oct2015
    9   03oct2015
    10  27jul2015
    end

I have attempted:

    bys id (date): gen count_obs = _n
    bys id date: gen count_interval_obs = _n
    egen n_interval = group(id date)

This results in accurate counts of the total number of observations per id and an enumeration of the observations within each date. However, the egen function group() identifies each unique combination of id and date but numbers the groups across the whole dataset, without restarting within each id, giving:

    id  date       wrong_cluster  correct_cluster
    1   28jan2015  1              1
    1   28jan2015  1              1
    2   26nov2015  2              1
    3   19oct2015  3              1
    4   26dec2015  4              1
    5   23dec2015  5              1
    6   22may2015  6              1
    6   23sep2015  7              2
    6   23sep2015  7              2

etc.

egen's group() function cannot be used with the by: prefix.

Any assistance is appreciated.

Todd

Edit: Added an explanation of why cluster identification is necessary. Clarified the rule that defines a cluster.

@Roberto Ferrer has given a direct approach. Following the logic the question uses, there is also a route using egen's group() function:

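    * date2: presumably a numeric daily date built from date (an assumption; its definition is not shown here)
    egen group = group(id date2)
    * within each id, sorted by group, add 1 whenever the group number changes
    bysort id (group): gen clust2 = sum(group != group[_n-1])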
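The direct approach referred to above is not reproduced on this page. A minimal sketch of one common way to get the same result without egen follows. It assumes date is a string variable holding dates such as 22may2015 and that no date is missing. The variable names date3 and clust_direct are hypothetical, introduced only for this example, and the data are a small subset of the question's listing.

```stata
* Sketch only: a small subset of the question's data (ids 6 and 7)
clear
input id str9 date
6 22may2015
6 23sep2015
6 23sep2015
7 14jan2015
7 27feb2015
7 30may2015
end

* Convert the string date to a numeric daily date (date3 is a hypothetical name)
gen date3 = date(date, "DMY")
format date3 %td

* Within each id, sorted by date, add 1 whenever the date changes.
* For the first observation of each id, date3[_n-1] is missing, so the
* comparison is true and the running sum starts at 1.
* Assumes date3 is never missing; a missing first date would start the count at 0.
bysort id (date3): gen clust_direct = sum(date3 != date3[_n-1])

list id date clust_direct, sepby(id)
```

On this subset, id 6 should get clusters 1, 2, 2, matching the correct_cluster column in the question, and id 7 should get 1, 2, 3. The running sum restarts for each id because sum() is evaluated separately within each by-group.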
