plot - R summarize dataframe with unique features
I have a large table in the following format:
    data <- data.frame("chrom" = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
                       "site"  = c(100, 200, 400, 140, 300, 400),
                       "heart" = c(20, 100, 0, 35, 92, 100),
                       "brain" = c(30, 40, 55, 100, 0, 100),
                       "liver" = c(100, 55, 20, 90, 0, 0),
                       "lungs" = c(100, 0, 80, 40, 30, 0))
giving:
    > data
      chrom site heart brain liver lungs
    1  chr1  100    20    30   100   100
    2  chr1  200   100    40    55     0
    3  chr1  400     0    55    20    80
    4  chr4  140    35   100    90    40
    5  chr4  300    92     0     0    30
    6  chr6  400   100   100     0     0
I want to make a figure similar to this published figure (http://www.nature.com/ncomms/2015/150218/ncomms7363/fig_tab/ncomms7363_f1.html):
Basically, for each row (identified by a common chrom and site), I want to see how many intermediate values there are. I define intermediate here as values between 15 and 85. For each organ, I want to know how many rows are intermediate in all organs, in that organ only, or shared by two or three organs.
Showing the power of data.table:
setup
    library(data.table)

    data <- data.frame("chrom" = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
                       "site"  = c(100, 200, 400, 140, 300, 400),
                       "heart" = c(20, 100, 0, 35, 92, 100),
                       "brain" = c(30, 40, 55, 100, 0, 100),
                       "liver" = c(100, 55, 20, 90, 0, 0),
                       "lungs" = c(100, 0, 80, 40, 30, 0))
    dt <- data.table(data)

    # TRUE where a value lies in the intermediate range [15, 85]
    isintermediate <- function(x){
      return(x >= 15 & x <= 85)
    }

    # replace each organ column by its intermediate flag
    di <- dt[, list(chrom, site,
                    heart = isintermediate(heart),
                    brain = isintermediate(brain),
                    liver = isintermediate(liver),
                    lungs = isintermediate(lungs))]
This creates a table di that looks like:
    > di
       chrom site heart brain liver lungs
    1:  chr1  100  TRUE  TRUE FALSE FALSE
    2:  chr1  200 FALSE  TRUE  TRUE FALSE
    3:  chr1  400 FALSE  TRUE  TRUE  TRUE
    4:  chr4  140  TRUE FALSE FALSE  TRUE
    5:  chr4  300 FALSE FALSE FALSE  TRUE
    6:  chr6  400 FALSE FALSE FALSE FALSE
with TRUE or FALSE indicating whether the value is intermediate or not. (There might be a quicker way than creating a function, but I find this way easy to follow.)
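As one possible shorter route to the same flags, the sketch below applies the range test to all organ columns at once with lapply and .SDcols instead of naming each column. The organs vector is a helper name introduced here for illustration, and the sketch assumes every column other than chrom and site is an organ.

```r
library(data.table)

dt <- data.table(chrom = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
                 site  = c(100, 200, 400, 140, 300, 400),
                 heart = c(20, 100, 0, 35, 92, 100),
                 brain = c(30, 40, 55, 100, 0, 100),
                 liver = c(100, 55, 20, 90, 0, 0),
                 lungs = c(100, 0, 80, 40, 30, 0))

# assumption: everything except the two id columns is an organ column
organs <- setdiff(names(dt), c("chrom", "site"))

# work on a copy so the numeric values in dt stay untouched
di <- copy(dt)
di[, (organs) := lapply(.SD, function(x) x >= 15 & x <= 85), .SDcols = organs]
```

This avoids editing the call when organs are added or removed, at the cost of being a little less explicit than the named-column version.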
count intermediate
Now, counting the intermediate values per chrom + site is simple:
    # noi is the number of intermediate values
    > di[, list(noi = heart + brain + liver + lungs), by = c("chrom","site")]
       chrom site noi
    1:  chr1  100   2
    2:  chr1  200   2
    3:  chr1  400   3
    4:  chr4  140   2
    5:  chr4  300   1
    6:  chr6  400   0
intermediate count per organ
For the number of intermediate values shared across organs, it gets more complicated. First, melt the data using reshape2:
    library(reshape2)
    # long format: one row per chrom/site/organ, keeping only the intermediate ones
    da <- melt(di, id.vars = c("chrom","site"))[value == TRUE]
This gives:
    > da
        chrom site variable value
     1:  chr1  100    heart  TRUE
     2:  chr4  140    heart  TRUE
     3:  chr1  100    brain  TRUE
     4:  chr1  200    brain  TRUE
     5:  chr1  400    brain  TRUE
     6:  chr1  200    liver  TRUE
     7:  chr1  400    liver  TRUE
     8:  chr1  400    lungs  TRUE
     9:  chr4  140    lungs  TRUE
    10:  chr4  300    lungs  TRUE
We are only interested in the TRUE values, hence the [value == TRUE] filter on that line.
Now we need the count of intermediate values at each site, appended to each organ. We can use .N and by= for this, and then merge the result with our initial table:
    da <- merge(da,
                da[, list(iacc = .N), by = c("chrom","site")],
                by = c("chrom","site"))
Giving:
    > da
        chrom site variable value iacc
     1:  chr1  100    heart  TRUE    2
     2:  chr1  100    brain  TRUE    2
     3:  chr1  200    brain  TRUE    2
     4:  chr1  200    liver  TRUE    2
     5:  chr1  400    brain  TRUE    3
     6:  chr1  400    liver  TRUE    3
     7:  chr1  400    lungs  TRUE    3
     8:  chr4  140    heart  TRUE    2
     9:  chr4  140    lungs  TRUE    2
    10:  chr4  300    lungs  TRUE    1
Now all that is left is to count the unique iaccs for each organ, which can be done with the table function:
    output <- data.table(table(da[, list(variable, iacc)]))
    > output
        variable iacc N
     1:    heart    1 0
     2:    brain    1 0
     3:    liver    1 0
     4:    lungs    1 1
     5:    heart    2 2
     6:    brain    2 2
     7:    liver    2 1
     8:    lungs    2 1
     9:    heart    3 0
    10:    brain    3 1
    11:    liver    3 1
    12:    lungs    3 1
where iacc is the number of organs (including itself) that have an intermediate value at the same chrom and site, and N is the number of times this is seen.
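The melt, count and tabulate steps can also be written more compactly. The following is only a sketch of one alternative, assuming data.table's own melt method and adding iacc by reference with := rather than through a merge; the names long, organ and counts are introduced here for illustration. Unlike the table() approach, it returns only the organ and iacc combinations that actually occur, so the zero rows are absent.

```r
library(data.table)

dt <- data.table(chrom = c("chr1", "chr1", "chr1", "chr4", "chr4", "chr6"),
                 site  = c(100, 200, 400, 140, 300, 400),
                 heart = c(20, 100, 0, 35, 92, 100),
                 brain = c(30, 40, 55, 100, 0, 100),
                 liver = c(100, 55, 20, 90, 0, 0),
                 lungs = c(100, 0, 80, 40, 30, 0))

# long format: one row per chrom/site/organ
long <- melt(dt, id.vars = c("chrom", "site"),
             variable.name = "organ", value.name = "value")

# keep only the intermediate values (15 to 85, inclusive)
long <- long[value >= 15 & value <= 85]

# iacc: number of organs that are intermediate at the same chrom/site
long[, iacc := .N, by = .(chrom, site)]

# N: how often each organ appears with each iacc
counts <- long[, .N, by = .(organ, iacc)]
```

If the zero rows matter for the plot, the table() version keeps them, which is one reason to prefer it there.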
Finally, plot (forgive the default colours):
    library(ggplot2)
    # stacked bars: one bar per organ, split by the number of organs sharing the site
    ggplot(output, aes(x = variable, y = N, fill = iacc)) +
      geom_bar(stat = "identity")