Download this dataset
This data is a subset of the gene-level FPKM (fragments per kilobase per million sequenced reads) values produced by Cufflinks on an ENCODE dataset. Specifically, we compare FPKM values from two replicate RNA-Seq runs on the HUVEC cell line (GSM828373) with two replicate RNA-Seq runs on the K562 cell line (GSM828368).
There are several other methods for differential analysis of RNA-Seq data, including the count-based methods DESeq and edgeR as well as cuffdiff (part of Cufflinks). All of these model variance with a negative binomial model, which has been shown to be appropriate for count-based data. However, FPKM values are not discrete, and none of these methods offers an easy-to-use way to perform differential analysis given a table of FPKM values. Cyber-T's normal-distribution error model is a reasonable approximation for a first-pass analysis. We plan to implement a negative binomial model derived from the same Bayesian model underlying Cyber-T in the near future.
The data file is laid out as follows: there is one header row; the first column holds the labels, followed by two control (HUVEC) columns and two experimental (K562) columns. The data has been preprocessed to remove all rows that are zero in every sample.
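As a rough illustration of this layout, the following Python sketch reads such a table and splits each row into its label, control values, and experimental values. The tab delimiter, the function name load_fpkm_table, and the sample gene names and values are all assumptions made for the example; they are not taken from the actual dataset. The all-zero filter simply mirrors the preprocessing described above.

```python
import csv
import io


def load_fpkm_table(handle, delimiter="\t"):
    """Read a table with one header row, one label column,
    two control columns, and two experimental columns.

    The delimiter is an assumption; adjust it to match the real file.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)  # the single header row
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        label = fields[0]
        control = [float(x) for x in fields[1:3]]       # HUVEC replicates
        experimental = [float(x) for x in fields[3:5]]  # K562 replicates
        # Mirror the preprocessing step: drop rows that are zero everywhere.
        if all(v == 0.0 for v in control + experimental):
            continue
        rows.append((label, control, experimental))
    return header, rows


# Made-up values, for illustration only.
sample = io.StringIO(
    "gene\tHUVEC_1\tHUVEC_2\tK562_1\tK562_2\n"
    "geneA\t1.5\t2.0\t10.0\t12.5\n"
    "geneB\t0\t0\t0\t0\n"
)
header, rows = load_fpkm_table(sample)
```

Here rows keeps only geneA, because geneB is zero in all four samples.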
This performs a two-sample t-test using the Bayes-regularized variance estimates. No normalization is performed. PPDE (posterior probability of differential expression) analysis and multiple hypothesis testing correction are performed on the p-values.
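To give a feel for what a regularized variance does, here is a minimal Python sketch for a single gene. It assumes the regularized estimate takes the form (v0 * background_var + (n - 1) * s^2) / (v0 + n - 2), where s^2 is the empirical sample variance, background_var is a prior variance, and v0 is a confidence weight on that prior. The function names, the argument names, the choice to pass the background variance in directly, and the sign convention (experimental minus control) are all assumptions of this sketch and may differ from what the tool actually does.

```python
from math import sqrt
from statistics import mean, variance


def regularized_variance(values, background_var, v0):
    """Blend the empirical variance with a background (prior) variance.

    Assumed form: (v0 * background_var + (n - 1) * s^2) / (v0 + n - 2).
    Requires v0 + n > 2.
    """
    n = len(values)
    s2 = variance(values)  # empirical sample variance (n - 1 denominator)
    return (v0 * background_var + (n - 1) * s2) / (v0 + n - 2)


def regularized_t(control, experimental, bg_control, bg_experimental, v0):
    """Two-sample t statistic using regularized variances (sketch only)."""
    var_c = regularized_variance(control, bg_control, v0)
    var_e = regularized_variance(experimental, bg_experimental, v0)
    se = sqrt(var_c / len(control) + var_e / len(experimental))
    return (mean(experimental) - mean(control)) / se


# Made-up replicate values and prior settings, for illustration only.
t_stat = regularized_t([10.0, 12.0], [20.0, 24.0],
                       bg_control=4.0, bg_experimental=4.0, v0=10)
```

With only two replicates per condition the empirical variance is very noisy, so in this sketch the background term dominates the estimate whenever v0 is large relative to n.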
This is equivalent to a standard t-test using empirical variances. No normalization is performed. PPDE analysis and multiple hypothesis testing correction are performed on the p-values.
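For comparison, a two-sample t statistic based purely on empirical variances can be sketched as below. The text does not say whether the equal-variance (Student) or unequal-variance (Welch) form is used; this sketch assumes the Welch form, and the function name welch_t and the sign convention (experimental minus control) are illustrative choices.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, experimental):
    """Welch two-sample t statistic and approximate degrees of freedom.

    Uses only the empirical sample variances. Fails with a division by
    zero if both groups have zero variance.
    """
    n_c, n_e = len(control), len(experimental)
    v_c, v_e = variance(control), variance(experimental)
    se2 = v_c / n_c + v_e / n_e  # squared standard error of the difference
    t = (mean(experimental) - mean(control)) / sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((v_c / n_c) ** 2 / (n_c - 1) + (v_e / n_e) ** 2 / (n_e - 1))
    return t, df


# Made-up replicate values, for illustration only.
t_stat, df = welch_t([10.0, 12.0], [20.0, 24.0])
```

With two replicates per condition the degrees of freedom are very small, so this test has little power.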