RNA-seq / Differential expression analysis using DESeq
Description
This tool will perform an analysis for differentially expressed sequences using the R implementation of the DESeq algorithm.
Parameters
- Column describing groups [group]
- Apply normalization (yes, no) [yes]
- Disregard replicates (yes, no) [no]
- Dispersion method (fit all, fit low) [fit all]
- Dispersion estimate (parametric, local) [local]
- Multiple testing correction (none, Bonferroni, Holm, Hochberg, BH, BY) [BH]
- P-value cutoff (0-1) [0.05]
- Plot width (200-3200) [600]
- Plot height (200-3200) [600]
Details
Given an input table of counts data for at least two samples, the DESeq package performs normalization, dispersion model fitting
and statistical analysis to identify differentially expressed genomic features between two experimental conditions.
Although it is possible to run the analysis without independent biological replicates, either by using replicates
for only one experimental condition or by estimating variability between samples of different experimental conditions,
it is highly recommended to always include at least one additional biological replicate for each experimental condition.
Note that in its current implementation, the tool only supports single-factor experimental designs. The experimental
conditions to be compared should be defined in the phenodata.tsv file, and the appropriate column should be selected using
the 'Column describing groups' parameter.
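As a rough illustration of the single-factor requirement, the Python sketch below reads a small phenodata-style table and checks that the grouping column defines exactly two conditions, each with replicates. The table contents, the 'sample' column and the function name condition_counts are hypothetical; only the 'group' column name is taken from the parameter default above.

```python
import csv
import io
from collections import Counter

# Hypothetical phenodata.tsv contents, for illustration only.
PHENODATA = "sample\tgroup\nctrl_a\t1\nctrl_b\t1\ntreat_a\t2\ntreat_b\t2\n"


def condition_counts(tsv_text, column="group"):
    """Count how many samples fall into each value of the grouping column."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in rows)


counts_per_condition = condition_counts(PHENODATA)

# A single-factor, two-condition comparison needs exactly two group labels.
has_two_conditions = len(counts_per_condition) == 2
# Replicates are recommended: at least two samples in every condition.
has_replicates = all(n >= 2 for n in counts_per_condition.values())
```

A check like this can be run before the analysis to catch a mislabeled or empty grouping column.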
If normalization is enabled, the data are scaled to account for variations in library size, either by dividing by the average of the total counts for each sample or,
if the user has filled in the 'library_size' column of the phenodata.tsv file, by dividing by the average of those values.
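For illustration only, the Python sketch below shows one common form of library-size scaling: each sample's counts are divided by that sample's library size relative to the mean library size. This is an assumed reading of the description above, not the tool's own R code, which may compute the scaling differently. The function name scale_by_library_size and the sample names are hypothetical.

```python
def scale_by_library_size(counts, library_sizes=None):
    """Scale per-sample counts so that all samples share a common library size.

    counts: dict mapping sample name -> list of per-feature counts.
    library_sizes: optional dict of user-supplied sizes (assumed to play the
    role of the 'library_size' column); defaults to the per-sample totals.
    """
    if library_sizes is None:
        library_sizes = {s: sum(c) for s, c in counts.items()}
    mean_size = sum(library_sizes.values()) / len(library_sizes)
    # Scaling factor per sample: its library size relative to the mean.
    factors = {s: library_sizes[s] / mean_size for s in counts}
    return {s: [x / factors[s] for x in c] for s, c in counts.items()}


counts = {"control_1": [10, 20, 70], "treated_1": [30, 60, 110]}
scaled = scale_by_library_size(counts)
```

After scaling, both hypothetical samples have the same total count, so differences between features are no longer driven by sequencing depth alone.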
A dispersion value is estimated for each genomic feature through a model fit procedure, which can be performed in a "local" or "parametric" mode.
The former is selected by default, since it is more robust, but users are encouraged to experiment with the setting to optimize results.
The data is fitted in either of two ways, where the "fit all" option is the default choice and more conservative, whereas the "fit low" setting
may yield increased sensitivity but potentially at the cost of increased false positives.
Statistical testing is performed using a generalized linear model, and the p-values are adjusted for multiple testing using the method selected with the 'Multiple testing correction' parameter (BH, Benjamini-Hochberg, by default).
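To show what the default BH setting does, the following Python sketch implements the standard Benjamini-Hochberg adjustment and applies the default cutoff of 0.05. It is a minimal illustration and not the tool's own code. The function name benjamini_hochberg and the example p-values are made up for this purpose.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(raw)
# Features whose adjusted p-value passes the cutoff are reported as significant.
significant = [p <= 0.05 for p in adj]
```

In this made-up example the third and fourth features have raw p-values below 0.05 but no longer pass the cutoff after adjustment, which is the intended effect of the correction.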
Output
The analysis output consists of the following:
- de-list-deseq.tsv: Table containing the results of the statistical testing, including fold change estimates and p-values.
- de-list-deseq.bed: The BED version of the results table contains genomic coordinates and p-value estimates and is ideal for quick navigation in the Genome Browser.
- ma-plot-significant-deseq.pdf: A scatter plot where the significantly differentially expressed features are highlighted.
- dispersion-plot.pdf: A plot that displays the dispersion estimates as a function of the counts values, with the fitted model overlaid.
- p-value-plot-edger.pdf: Plot of the raw and adjusted p-value distributions of the statistical test.
References
This tool uses the DESeq package for statistical analysis. Please read the following article for more detailed information:
S Anders and W Huber. Differential expression analysis for sequence count data. Genome Biology 2010, 11:R106.