Crucial for high-dimensional analysis:
* also interested in chromatin accessibility or conformation
Expression of \(G\) genes, 5 replicates in 2 groups:
\[[X_{g1}, \dots, X_{g5}] \quad \textrm{vs} \quad [Y_{g1}, \dots, Y_{g5}]\]
\[ E(X_{gi}) = \mu_{gX} \]
\[ E(Y_{gi}) = \mu_{gY} \]
\[ \beta_g \equiv \log_2 \left( \frac{\mu_{gY}}{\mu_{gX}} \right) \]
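To make the definition of \(\beta_g\) concrete, here is a minimal sketch that plugs sample means into the ratio above. The Poisson counts, the `naive_lfc` helper, and the `pseudocount` argument are assumptions for illustration and do not come from the slides.

```r
# Naive per-gene log2 fold change from two groups of replicate counts.
# `pseudocount` is an illustrative assumption to avoid log(0).
naive_lfc <- function(x, y, pseudocount = 0) {
  mu_x <- rowMeans(x) + pseudocount   # plug-in estimate of mu_gX
  mu_y <- rowMeans(y) + pseudocount   # plug-in estimate of mu_gY
  log2(mu_y / mu_x)                   # plug-in estimate of beta_g
}

set.seed(1)
G  <- 1000
mu <- rep(c(2, 1000), each = G / 2)        # low- and high-count genes, true beta_g = 0
x  <- matrix(rpois(G * 5, mu), nrow = G)   # group X, 5 replicates (Poisson is an assumption)
y  <- matrix(rpois(G * 5, mu), nrow = G)   # group Y, 5 replicates
beta_hat <- naive_lfc(x, y, pseudocount = 0.5)

# Spread of the estimates around the true value 0, by expression level
sd_low  <- sd(beta_hat[mu == 2])
sd_high <- sd(beta_hat[mu == 1000])
```

Comparing `sd_low` with `sd_high` shows why the plug-in estimate is much noisier for low-count genes, which is the motivation for shrinkage.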
References
[1] Issue of DGE (Trapnell et al., 2013)
[2] Bias models (Patro, Duggal, Love, Irizarry, & Kingsford, 2017)
[3] Isoform offset (Soneson, Love, & Robinson, 2015)
[4] TMM (Robinson & Oshlack, 2010)
[5] Median ratio (Anders & Huber, 2010)
[6] edgeR GLM (McCarthy, Chen, & Smyth, 2012)
[7] DSS (Wu, Wang, & Wu, 2012)
[8] DESeq2 (Love, Huber, & Anders, 2014)
[9] ZI weights (Van den Berge et al., 2018)
With prior:
(1700 black, 5100 red)
(Love et al., 2014)
(Gelman, Jakulin, Pittau, & Su, 2008)
(Gelman et al., 2008)
Approximate Posterior Estimation for GLM
optim() to obtain the Hessian for the Laplace approximation
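The idea of using `optim` for a Laplace approximation can be sketched in base R as follows. This is a simplified illustration and not the package implementation: it assumes a Poisson likelihood (rather than negative binomial), a single gene, toy counts, and a hypothetical fixed `prior_scale` for the Cauchy prior on the condition coefficient.

```r
# Negative log posterior for one gene: Poisson likelihood with a log link,
# plus a Cauchy prior on the condition coefficient (flat prior on the intercept).
# `prior_scale` is a hypothetical fixed value for this sketch.
neg_log_post <- function(beta, y, X, prior_scale = 1) {
  eta <- drop(X %*% beta)
  -sum(dpois(y, lambda = exp(eta), log = TRUE)) -
    dcauchy(beta[2], location = 0, scale = prior_scale, log = TRUE)
}

y <- c(3, 5, 4, 6, 2, 12, 9, 15, 11, 10)            # toy counts, 5 replicates per group
X <- cbind(intercept = 1, condition = rep(0:1, each = 5))

# Find the posterior mode and ask optim for the Hessian at that point
fit <- optim(par = c(log(mean(y)), 0), fn = neg_log_post, y = y, X = X,
             method = "BFGS", hessian = TRUE)

post_mode <- fit$par               # MAP estimate (natural-log scale)
post_cov  <- solve(fit$hessian)    # Laplace approximation: inverse Hessian at the mode
post_sd   <- sqrt(diag(post_cov))  # approximate posterior standard deviations
```

The mode is the shrunken estimate, and the inverse Hessian of the negative log posterior gives an approximate posterior covariance around it.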
Solve for \(A\) in an equation of the form (Efron & Morris, 1975):
\(A = f(A, \hat{\beta}_g^2, e_g^2)\)
Motivated by the following hierarchical model:
\[ \hat{\beta}_g \sim N(\beta_g, e_g^2) \]
\[ \beta_g \sim N(0, A) \]
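The slides do not give the exact form of \(f\). One choice consistent with the hierarchical model above comes from the marginal distribution \(\hat{\beta}_g \sim N(0, A + e_g^2)\): setting the derivative of the marginal log-likelihood to zero gives \(A = \sum_g w_g (\hat{\beta}_g^2 - e_g^2) / \sum_g w_g\) with \(w_g = 1/(A + e_g^2)^2\). The sketch below iterates that equation on simulated data. The function name `estimate_prior_var` and all simulation settings are assumptions for illustration.

```r
# Fixed-point iteration for the prior variance A under
#   beta_hat_g ~ N(beta_g, e_g^2),  beta_g ~ N(0, A).
estimate_prior_var <- function(beta_hat, e, tol = 1e-8, max_iter = 100) {
  A <- max(mean(beta_hat^2 - e^2), 0)          # moment-based starting value
  A_new <- A
  for (i in seq_len(max_iter)) {
    w <- 1 / (A + e^2)^2                       # weights depend on the current A
    A_new <- max(sum(w * (beta_hat^2 - e^2)) / sum(w), 0)  # truncate at zero
    if (abs(A_new - A) < tol) break
    A <- A_new
  }
  A_new
}

set.seed(42)
G <- 5000
A_true <- 1
e <- runif(G, 0.2, 1)                  # per-gene standard errors
beta <- rnorm(G, 0, sqrt(A_true))      # true effects
beta_hat <- rnorm(G, beta, e)          # noisy estimates
A_hat <- estimate_prior_var(beta_hat, e)

# Normal-normal posterior mean shrinks each estimate toward zero
beta_shrunk <- beta_hat * A_hat / (A_hat + e^2)
```

Genes with large \(e_g\) are shrunk the most, since the shrinkage factor is \(A / (A + e_g^2)\).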
Two approximations to note:
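# load packages
library(tximeta)
library(DESeq2)
# import the data (samples: table with files and names columns)
se <- tximeta(samples)
gse <- summarizeToGene(se)
# build dataset
dds <- DESeqDataSet(gse, ~batch + condition)
# variance stabilization
vsd <- vst(dds)
# size factors, dispersion estimation
dds <- DESeq(dds)
# shrinkage estimation (check resultsNames(dds) for the coefficient index)
res <- lfcShrink(dds, coef=3)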
Paper and software
lfcShrink(dds, coef=2, type="apeglm")
lfcShrink(dds, coef=2, type="ashr")
Acknowledgments
INLA
\[ \log(\delta_g) \sim f(\mu_g) + \varepsilon, \quad \varepsilon \sim t_\nu(0,\sigma^2) \]
“here we opt for a Student-t distribution as it leads to inference that is more robust to the presence of outlier genes”
The Cauchy prior can be modeled as a scale mixture of Normals (the \(\nu = 1\) case of the mixture below):
\[ \beta \sim N(0, \sigma^2) \]
\[ \sigma^2 \sim \textrm{Scale-inv-} \chi^2(\nu, S^2) \]
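The mixture representation can be checked by simulation. The sketch below draws \(\sigma^2\) from the scaled inverse chi-squared distribution, then \(\beta\) given \(\sigma^2\), and compares quantiles with the Cauchy distribution. The values \(\nu = 1\), \(S = 1\), and the sample size are assumptions chosen for illustration.

```r
set.seed(7)
n  <- 1e5
nu <- 1      # nu = 1 gives the Cauchy case; other values give Student-t
S  <- 1      # prior scale

# sigma^2 ~ Scale-inv-chi^2(nu, S^2) is equivalent to sigma^2 = nu * S^2 / chi^2_nu
sigma2 <- nu * S^2 / rchisq(n, df = nu)
beta   <- rnorm(n, mean = 0, sd = sqrt(sigma2))

# Compare simulated quantiles with the exact Cauchy quantiles
probs    <- c(0.05, 0.25, 0.5, 0.75, 0.95)
sim_q    <- quantile(beta, probs)
cauchy_q <- qcauchy(probs, location = 0, scale = S)
```

Quantiles are used for the comparison because the Cauchy distribution has no finite mean or variance.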
(Gelman et al., 2008)
Anders, S., & Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11, R106. https://doi.org/10.1186/gb-2010-11-10-r106
Efron, B., & Morris, C. (1975). Data Analysis Using Stein’s Estimator and Its Generalization. Journal of the American Statistical Association, 70(350), 311–319.
Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 1360–1383. http://www.stat.columbia.edu/~gelman/research/published/priors11.pdf
Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8
McCarthy, D. J., Chen, Y., & Smyth, G. K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research, 40, 4288–4297. https://doi.org/10.1093/nar/gks042
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods. http://dx.doi.org/10.1038/nmeth.4197
Robinson, M. D., & Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11, R25. https://doi.org/10.1186/gb-2010-11-3-r25
Schurch, N. J., Schofield, P., Gierliński, M., Cole, C., Sherstnev, A., Singh, V., … Barton, G. J. (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA, 22(6), 839–851. https://doi.org/10.1261/rna.053959.115
Soneson, C., Love, M. I., & Robinson, M. D. (2015). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research, 4, 1521. https://doi.org/10.12688/f1000research.7563.1
Stephens, M. (2016). False discovery rates: A new deal. Biostatistics, 18(2). https://doi.org/10.1093/biostatistics/kxw041
Trapnell, C., Hendrickson, D. G., Sauvageau, M., Goff, L., Rinn, J. L., & Pachter, L. (2013). Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. http://dx.doi.org/10.1038/nbt.2450
Van den Berge, K., Perraudeau, F., Soneson, C., Love, M. I., Risso, D., Vert, J.-P., … Clement, L. (2018). Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications. Genome Biology, 19, 24. https://doi.org/10.1186/s13059-018-1406-4
Wiel, M. A. van de, Leday, G. G., Pardo, L., Rue, H., Vaart, A. W. van der, & Wieringen, W. N. van. (2012). Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics, 14(1), 113–128. https://doi.org/10.1093/biostatistics/kxs031
Wiel, M. A. van de, Neerincx, M., Buffart, T. E., Sie, D., & Verheul, H. M. (2014). ShrinkBayes: a versatile R-package for analysis of count-based sequencing data in complex study designs. BMC Bioinformatics, 15, 116. https://doi.org/10.1186/1471-2105-15-116
Wu, H., Wang, C., & Wu, Z. (2012). A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics, 14(2). https://doi.org/10.1093/biostatistics/kxs033
Zhu, A., Ibrahim, J. G., & Love, M. I. (2018). Heavy-tailed prior distributions for sequence count data: Removing the noise and preserving large differences. bioRxiv. https://doi.org/10.1101/303255