R - Label rows as the same if the first part of the string is the same
Some example data:

    id_trial
    001_a.txt
    001_a_t2.txt
    949482_b.txt
    949482_b_t2.txt
    95_c.txt
    95_c_t2.txt
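Note: the strings are of varying length, but within each pair the lengths are equal apart from the "_t2".

How can I make it so that, if the part of the string before _t2 is the same, both rows get the same label in a new column? That is, I want something like: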
    id_trial         subject
    001_a.txt        person_a
    001_a_t2.txt     person_a
    949482_b.txt     person_b
    949482_b_t2.txt  person_b
    95_c.txt         person_c
    95_c_t2.txt      person_c
Even this would work:

    id_trial         subject
    001_a.txt        a
    001_a_t2.txt     a
    949482_b.txt     b
    949482_b_t2.txt  b
    95_c.txt         c
    95_c_t2.txt      c
Any help is appreciated.
You can try sub to extract the prefix part:
    df1$subject <- sub('([^_]+_.).*', '\\1', sub('([^_]+)\\1+', '\\1', df1$id_trial))
    df1
    #         id_trial  subject
    #1   personn_a.txt person_a
    #2 person_a_t2.txt person_a
    #3    person_b.txt person_b
    #4 person_b_t2.txt person_b
    #5  personnn_c.txt person_c
    #6 person_c_t2.txt person_c
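To see what each of the two nested sub calls contributes, the steps can be run separately. This is only an illustrative sketch: the vector x and the names step1 and step2 are made up here and are not part of the answer.

```r
# Hypothetical sample values in the style of df1$id_trial
x <- c("personn_a.txt", "person_a_t2.txt", "personnn_c.txt")

# Inner call: collapse the first run of an immediately repeated piece
# (here the doubled or tripled "n") down to a single copy
step1 <- sub("([^_]+)\\1+", "\\1", x)
step1

# Outer call: keep everything up to one character past the first
# underscore and drop the rest (the "_t2" suffix and the extension)
step2 <- sub("([^_]+_.).*", "\\1", step1)
step2
```

Splitting the call this way makes it easier to check that the inner pattern only touches the values that actually contain a repeated character.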
If you need a numeric subject:

    as.numeric(factor(df1$subject))
    #[1] 1 1 2 2 3 3
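One thing to keep in mind is that factor() sorts its levels by default, so the numeric codes follow the sorted order of the labels rather than the order in which they appear. If codes by first appearance are preferred, match() against unique() is one possible alternative. The vector below is a made-up example, not the question's data.

```r
# Hypothetical labels that are deliberately not in sorted order
subject <- c("person_b", "person_a", "person_b", "person_c")

# Codes follow the sorted factor levels (person_a, person_b, person_c)
by_level <- as.numeric(factor(subject))

# Codes follow the order of first appearance (person_b, person_a, person_c)
by_first <- match(subject, unique(subject))

by_level
by_first
```

For data that is already sorted by label, as in df1, both approaches give the same codes.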
Update

For the second dataset:

    df2$subject <- sub('\\d+_([a-z]+).*', '\\1', df2$id_trial)
    df2
    #         id_trial subject
    #1       001_a.txt       a
    #2    001_a_t2.txt       a
    #3    949482_b.txt       b
    #4 949482_b_t2.txt       b
    #5        95_c.txt       c
    #6     95_c_t2.txt       c
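If labels of the form person_a are wanted for this dataset, as in the question's first desired output, the captured letter can be combined with a fixed prefix. The following is a minimal sketch, assuming every ID has the form digits, underscore, lowercase letters; the data frame is rebuilt here so the snippet runs on its own, and the helper name letter is made up.

```r
# Stand-in for the question's data, built with data.frame()
df2 <- data.frame(
  id_trial = c("001_a.txt", "001_a_t2.txt", "949482_b.txt",
               "949482_b_t2.txt", "95_c.txt", "95_c_t2.txt"),
  stringsAsFactors = FALSE
)

# Capture the lowercase letters that follow the leading digits and underscore
letter <- sub("^\\d+_([a-z]+).*", "\\1", df2$id_trial)

# Prepend a fixed prefix to build the label
df2$subject <- paste0("person_", letter)
df2
```

IDs that do not match the assumed pattern are returned unchanged by sub, so it may be worth checking for unexpected values before relying on the labels.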
Data

    df1 <- structure(list(id_trial = c("personn_a.txt", "person_a_t2.txt",
        "person_b.txt", "person_b_t2.txt", "personnn_c.txt", "person_c_t2.txt"
        )), .Names = "id_trial", class = "data.frame", row.names = c(NA, -6L))

    df2 <- structure(list(id_trial = c("001_a.txt", "001_a_t2.txt",
        "949482_b.txt", "949482_b_t2.txt", "95_c.txt", "95_c_t2.txt")),
        .Names = "id_trial", class = "data.frame", row.names = c(NA, -6L))