Michael Love
Research Fellow
Irizarry group, DFCI/HSPH


thought experiment: measure potholes in Boston vs Cambridge
...sounds simple, but we still see datasets with 100% confounding of condition with experimental batch
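One way to see why complete confounding is fatal: when every sample of one condition sits in one batch, the batch and condition columns of the design matrix are identical, so their effects cannot be separated. The sketch below is illustrative only; the sample layouts and the helper name `design_matrix` are assumptions, not taken from any dataset in these slides.

```python
import numpy as np

def design_matrix(batch, condition):
    """Columns: intercept, batch indicator, condition indicator (hypothetical helper)."""
    return np.column_stack([np.ones(len(batch)), batch, condition])

# 100% confounding: all controls in batch 0, all treated samples in batch 1.
confounded = design_matrix(batch=[0, 0, 0, 1, 1, 1], condition=[0, 0, 0, 1, 1, 1])

# Balanced layout: each batch contains both conditions.
balanced = design_matrix(batch=[0, 1, 0, 1, 0, 1], condition=[0, 0, 0, 1, 1, 1])

# A rank below the number of columns means the effects are not identifiable.
rank_confounded = np.linalg.matrix_rank(confounded)
rank_balanced = np.linalg.matrix_rank(balanced)
print(rank_confounded, rank_balanced)
```

With the confounded layout the matrix has three columns but lower rank, so no model can attribute the difference to condition rather than batch.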

relative to a reference genome






the local number of reads: "read depth"
changes in read depth relative to a reference:
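A minimal sketch of the idea, assuming reads are summarized by their start positions and counted in fixed-width bins. The function names, the bin size, and the pseudocount are illustrative choices, not part of any specific tool.

```python
import math
from collections import Counter

def read_depth(read_starts, bin_size, n_bins):
    """Count reads per fixed-width bin from their start positions."""
    counts = Counter(pos // bin_size for pos in read_starts)
    return [counts.get(b, 0) for b in range(n_bins)]

def log2_ratio(sample, reference, pseudocount=0.5):
    """Per-bin log2 ratio of sample to reference depth, scaled by total reads.

    The pseudocount (an assumption here) avoids taking the log of zero.
    """
    s_total, r_total = sum(sample), sum(reference)
    return [
        math.log2(((s + pseudocount) / s_total) / ((r + pseudocount) / r_total))
        for s, r in zip(sample, reference)
    ]

# Toy positions: the sample has extra reads in the second bin.
reference_depth = read_depth([1, 4, 8, 12, 15, 19, 22, 25, 29], bin_size=10, n_bins=3)
sample_depth = read_depth([2, 5, 9, 11, 12, 14, 16, 17, 18, 21, 24, 28], bin_size=10, n_bins=3)
ratios = log2_ratio(sample_depth, reference_depth)
```

A positive ratio in a bin suggests more reads than expected from the reference, a negative ratio fewer.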


deviation of coverage from that expected
from proportions of molecules in the "pool"
involves polymerase copying DNA many times over



deviation of coverage from expected
given the proportion of molecules in the pool
(other steps are certainly also important)



useful plot for identifying non-uniform coverage

linear model of the Poisson rate including sequence bias

model for estimating isoform abundances including fragmentation, size selection, sequence bias
Probability of a vector of read counts \(\vec{n}\), indexed by read type j:
\[ f_\theta(\vec{n}) = \prod_j f_{Pois}(n_j, \vec{\theta} \cdot \vec{a}_j ) \]
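The product above is easier to work with on the log scale, where it becomes a sum of Poisson log-densities with rates \(\lambda_j = \vec{\theta} \cdot \vec{a}_j\). The sketch below evaluates that sum directly; the toy values of \(\vec{\theta}\), \(\vec{a}_j\), and \(\vec{n}\) are made up for illustration, and the function name is hypothetical.

```python
import math

def poisson_loglik(counts, theta, a):
    """Log of prod_j Pois(n_j; theta . a_j), with a[j] the vector for read type j."""
    total = 0.0
    for n_j, a_j in zip(counts, a):
        rate = sum(t * x for t, x in zip(theta, a_j))  # theta . a_j
        # log Pois(n; rate) = n*log(rate) - rate - log(n!)
        total += n_j * math.log(rate) - rate - math.lgamma(n_j + 1)
    return total

# Toy setup (assumed): two isoforms, three read types.
# Read types 0 and 1 are isoform-specific; read type 2 is shared by both.
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
counts = [10, 20, 30]

ll_matching = poisson_loglik(counts, theta=[10.0, 20.0], a=a)
ll_swapped = poisson_loglik(counts, theta=[20.0, 10.0], a=a)
print(ll_matching, ll_swapped)
```

Here the abundances whose implied rates match the observed counts give the higher log-likelihood, which is the quantity a fitting procedure would maximize over \(\vec{\theta}\).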

likelihood of isoform abundances given fragment length distribution and sequence bias: used in Cufflinks



SVA: Leek and Storey 2007, svaseq: Leek 2014
Per gene, model the mean for sample j, \(\mu_j\), as:
\[ \log(\mu_j) = \beta_0 + \beta_{b} 1_{j \in B} + \beta_{t} 1_{j \in T} \]
where B is the set of samples in the second batch and T is the set of treated samples.
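To make the per-gene model concrete, here is a minimal sketch that fits the batch and treatment coefficients for one gene. It assumes a plain Poisson likelihood and uses iteratively reweighted least squares; real count data are typically overdispersed and are usually fit with a negative binomial model, so this is an illustration of the log-linear structure only. The sample layout, coefficient values, and function name are invented for the example.

```python
import numpy as np

def fit_poisson_loglinear(X, y, n_iter=50):
    """Fit log(mu) = X @ beta by IRLS under a Poisson likelihood (sketch)."""
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                      # starting values, kept away from zero
    eta = np.log(mu)
    for _ in range(n_iter):
        z = eta + (y - mu) / mu       # working response for the log link
        XtW = X.T * mu                # Poisson working weights are mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta
        mu = np.exp(eta)
    return beta

# Assumed layout: 8 samples, batch and treatment balanced (not confounded).
batch = np.array([0, 0, 1, 1, 0, 0, 1, 1])
treated = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.column_stack([np.ones(8), batch, treated])

beta_true = np.array([np.log(100.0), 0.5, 1.0])   # beta_0, beta_b, beta_t
mu_true = np.exp(X @ beta_true)

rng = np.random.default_rng(0)
counts = rng.poisson(mu_true)                     # simulated counts for one gene
beta_hat = fit_poisson_loglinear(X, counts)
print(beta_hat)
```

Because batch and treatment are balanced in this layout, \(\beta_b\) and \(\beta_t\) are estimated separately; with a fully confounded layout the linear solve would fail or be numerically unstable.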
\( \text{count} \sim \mathcal{L}(\text{bias} \cdot \text{biology}) \)
