plot - R summarize dataframe with unique features


I have a large table in the following format:

    data <- data.frame(
      "chrom" = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
      "site"  = c(100, 200, 400, 140, 300, 400),
      "heart" = c(20, 100, 0, 35, 92, 100),
      "brain" = c(30, 40, 55, 100, 0, 100),
      "liver" = c(100, 55, 20, 90, 0, 0),
      "lungs" = c(100, 0, 80, 40, 30, 0)
    )

Giving:

    > data
      chrom site heart brain liver lungs
    1  chr1  100    20    30   100   100
    2  chr1  200   100    40    55     0
    3  chr1  400     0    55    20    80
    4  chr4  140    35   100    90    40
    5  chr4  300    92     0     0    30
    6  chr6  400   100   100     0     0

I want to make a figure similar to this published one: http://www.nature.com/ncomms/2015/150218/ncomms7363/fig_tab/ncomms7363_f1.html

Basically, for each row (identified by chrom and site), I want to see how many intermediate values there are. I define intermediate here as values between 15 and 85. For each organ, I want to know how many rows are intermediate in all organs, in that organ only, or shared by two or three organs.

Showing the power of data.table:

Setup

    library(data.table)

    data <- data.frame(
      "chrom" = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
      "site"  = c(100, 200, 400, 140, 300, 400),
      "heart" = c(20, 100, 0, 35, 92, 100),
      "brain" = c(30, 40, 55, 100, 0, 100),
      "liver" = c(100, 55, 20, 90, 0, 0),
      "lungs" = c(100, 0, 80, 40, 30, 0)
    )

    dt <- data.table(data)

    # TRUE where a value lies in the intermediate range [15, 85]
    isintermediate <- function(x){
      return(x >= 15 & x <= 85)
    }

    di <- dt[ , list(chrom, site,
                     heart = isintermediate(heart),
                     brain = isintermediate(brain),
                     liver = isintermediate(liver),
                     lungs = isintermediate(lungs))]

This creates a table di that looks like:

    > di
       chrom site heart brain liver lungs
    1:  chr1  100  TRUE  TRUE FALSE FALSE
    2:  chr1  200 FALSE  TRUE  TRUE FALSE
    3:  chr1  400 FALSE  TRUE  TRUE  TRUE
    4:  chr4  140  TRUE FALSE FALSE  TRUE
    5:  chr4  300 FALSE FALSE FALSE  TRUE
    6:  chr6  400 FALSE FALSE FALSE FALSE

Each cell is TRUE or FALSE depending on whether the value is intermediate or not. (There might be a quicker way than creating a function, but I find this way easy to follow.)
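As one possible shortcut, here is a minimal sketch that flags all organ columns in one go instead of naming each one. It assumes the organ columns are exactly the four listed in `organs` (a name introduced here for illustration) and uses data.table's `%between%`, which includes both bounds by default, so it should match the `>= 15 & <= 85` test above.

```r
library(data.table)

dt <- data.table(
  chrom = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
  site  = c(100, 200, 400, 140, 300, 400),
  heart = c(20, 100, 0, 35, 92, 100),
  brain = c(30, 40, 55, 100, 0, 100),
  liver = c(100, 55, 20, 90, 0, 0),
  lungs = c(100, 0, 80, 40, 30, 0)
)

# Assumption: these are the only value columns to test
organs <- c("heart", "brain", "liver", "lungs")

# Work on a copy so the numeric values in dt are left untouched
di2 <- copy(dt)

# Replace each organ column by its logical "is intermediate" flag
di2[, (organs) := lapply(.SD, function(x) x %between% c(15, 85)),
    .SDcols = organs]
```

This scales to more organs by extending `organs`, at the cost of being a little less explicit than the hand-written version.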

Count intermediate

Now, counting intermediate values by chrom + site is simple:

    # noi is the number of intermediate values
    > di[, list(noi = heart + brain + liver + lungs), by = c("chrom","site")]
       chrom site noi
    1:  chr1  100   2
    2:  chr1  200   2
    3:  chr1  400   3
    4:  chr4  140   2
    5:  chr4  300   1
    6:  chr6  400   0
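For comparison, the same per-row count can be sketched in base R with `rowSums` over a logical matrix. This is only an illustration of the arithmetic, not part of the data.table workflow; `organs`, `is_mid` and `noi` are names made up for the example.

```r
data <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
  site  = c(100, 200, 400, 140, 300, 400),
  heart = c(20, 100, 0, 35, 92, 100),
  brain = c(30, 40, 55, 100, 0, 100),
  liver = c(100, 55, 20, 90, 0, 0),
  lungs = c(100, 0, 80, 40, 30, 0)
)

organs <- c("heart", "brain", "liver", "lungs")

# Logical matrix: TRUE where the value is in [15, 85]
is_mid <- data[organs] >= 15 & data[organs] <= 85

# TRUE counts as 1, so the row sum is the number of intermediate organs
noi <- rowSums(is_mid)

cbind(data[c("chrom", "site")], noi)
```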

Intermediate count by organ

For the number of intermediate values shared across organs, it gets more complicated. First, melt the data using reshape2:

    library(reshape2)

    da <- melt(di, id.vars = c("chrom","site"))[value == TRUE]

This gives:

    > da
        chrom site variable value
     1:  chr1  100    heart  TRUE
     2:  chr4  140    heart  TRUE
     3:  chr1  100    brain  TRUE
     4:  chr1  200    brain  TRUE
     5:  chr1  400    brain  TRUE
     6:  chr1  200    liver  TRUE
     7:  chr1  400    liver  TRUE
     8:  chr1  400    lungs  TRUE
     9:  chr4  140    lungs  TRUE
    10:  chr4  300    lungs  TRUE

We are only interested in the TRUE values, hence the [value == TRUE] filter.

Now we need the count of intermediate values for each site, appended to each organ. We can use .N and by= for this, and then merge with our initial table:

    da <- merge(da,
                da[, list(iacc = .N), by = c("chrom","site")],
                by = c("chrom","site"))

Giving:

    > da
        chrom site variable value iacc
     1:  chr1  100    heart  TRUE    2
     2:  chr1  100    brain  TRUE    2
     3:  chr1  200    brain  TRUE    2
     4:  chr1  200    liver  TRUE    2
     5:  chr1  400    brain  TRUE    3
     6:  chr1  400    liver  TRUE    3
     7:  chr1  400    lungs  TRUE    3
     8:  chr4  140    heart  TRUE    2
     9:  chr4  140    lungs  TRUE    2
    10:  chr4  300    lungs  TRUE    1
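As an alternative to the merge, the per-site count can probably be added in place with `:=` and `by`, which avoids building and joining a second table. The sketch below rebuilds the long table of TRUE rows by hand so it runs on its own; row order follows the melted table rather than the merged one.

```r
library(data.table)

# Long table of intermediate (TRUE) cells, typed in from the melt output
da <- data.table(
  chrom    = c("chr1", "chr4", "chr1", "chr1", "chr1",
               "chr1", "chr1", "chr1", "chr4", "chr4"),
  site     = c(100, 140, 100, 200, 400, 200, 400, 400, 140, 300),
  variable = c("heart", "heart", "brain", "brain", "brain",
               "liver", "liver", "lungs", "lungs", "lungs")
)

# .N is the group size, i.e. how many organs are intermediate at this site
da[, iacc := .N, by = c("chrom", "site")]
```

The merge version and this one should give the same iacc per row; the difference is that `:=` modifies da by reference and keeps the original row order.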

Now all that is left is to count the occurrences of each iacc value for each organ, which we can do with the table function:

    output <- data.table(table(da[, list(variable, iacc)]))
    > output
        variable iacc N
     1:    heart    1 0
     2:    brain    1 0
     3:    liver    1 0
     4:    lungs    1 1
     5:    heart    2 2
     6:    brain    2 2
     7:    liver    2 1
     8:    lungs    2 1
     9:    heart    3 0
    10:    brain    3 1
    11:    liver    3 1
    12:    lungs    3 1

where iacc is the number of organs (including itself) that have an intermediate value at the same chrom and site, and N is the number of times this was seen.
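If the zero-count rows are not needed, a grouped `.N` is a possible alternative to `table`. A small sketch, again with the long table typed in by hand; note that combinations that never occur (for example heart with iacc 1) are simply absent from the result, which should not matter for a stacked bar chart.

```r
library(data.table)

# Organ and per-site intermediate count for each TRUE cell
da <- data.table(
  variable = c("heart", "heart", "brain", "brain", "brain",
               "liver", "liver", "lungs", "lungs", "lungs"),
  iacc     = c(2L, 2L, 2L, 2L, 3L, 2L, 3L, 3L, 2L, 1L)
)

# One row per observed (organ, iacc) pair, with N = number of sites
counts <- da[, .N, by = .(variable, iacc)]
```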

Finally, plot (forgive the default colours):

    library(ggplot2)

    ggplot(output, aes(x = variable, y = N, fill = iacc)) +
      geom_bar(stat = "identity")


