Statistical Modeling of High Dimensional Counts
About this online book
This online book is a chapter of a physical book about RNA Bioinformatics.
Here I describe how count data, which often arise in RNA sequencing (RNA-seq) experiments, can be modeled using count distributions, and how nonparametric methods can be used to analyze such data. The book covers basic routines for data input, scaling/normalization, visualization, and statistical testing to identify sets of features whose counts reflect differences in expression across samples. The final section covers limitations of the methods presented and possible extensions.
The code in this book includes the basic routines that can be found in the software vignettes of various Bioconductor packages, including tximeta, DESeq2, and fishpond. Please see those package vignettes for further details. Any specific questions about Bioconductor software should be posted to the Bioconductor support site.
There are also two published workflows that are related to the analysis steps and packages described here, but which explore different directions. These workflows are hosted on the Bioconductor workflow site and are checked regularly to ensure that they build without error:
- rnaseqGene - gene-level exploratory analysis and differential expression (1)
- rnaseqDTU - differential transcript usage (2)
Another related reference is (3), which is a review of RNA-seq expression analysis, written by a collection of researchers who develop statistical models and software for RNA-seq data.
1. Love M, Anders S, Kim V, Huber W. 2015. RNA-seq workflow: Gene-level exploratory analysis and differential expression. F1000Research. 4(1070)
2. Love M, Soneson C, Patro R. 2018. Swimming downstream: Statistical analysis of differential transcript usage following salmon quantification. F1000Research. 7(952)
3. Van den Berge K, Hembach KM, Soneson C, Tiberi S, Clement L, et al. 2019. RNA sequencing data: Hitchhiker’s guide to expression analysis. Annual Review of Biomedical Data Science. 2(1):139–73