miRNA-seq / Differential expression analysis using edgeR
Description
This tool identifies differentially expressed sequences using the edgeR package for R.
Parameters
- Column describing groups [group]
- Apply normalization (yes, no) [yes]
- Dispersion estimate (common, tagwise) [tagwise]
- Multiple testing correction (none, Bonferroni, Holm, Hochberg, BH, BY) [BH]
- P-value cutoff (0-1) [0.05]
- Plot width (200-3200) [600]
- Plot height (200-3200) [600]
Details
Given an input table of count data for at least two samples, the edgeR package performs scaling, normalization
and statistical analysis to identify differentially expressed genomic features between two experimental conditions.
Note that the statistical analysis assumes that there are at least two independent biological replicate samples
for each experimental condition.
In its current implementation, the tool only supports single-factor experiment designs. The experiment
conditions to be compared should be defined in the phenodata.tsv file and the appropriate column be selected using
the 'Column describing groups' parameter.
Scaling, to account for variations in library size, is done either by dividing by the average of the total counts for each sample or,
if the user has filled in the 'library_size' column of the phenodata.tsv file, by dividing by the average of those values.
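The scaling step can be illustrated with a minimal Python sketch. It is not the tool's own code (the tool runs edgeR in R), and it assumes one reading of the description above: each sample is divided by its library size relative to the average library size. The function name scale_by_library_size and the sample names are hypothetical.

```python
def scale_by_library_size(counts, library_sizes=None):
    """Scale raw counts so that all samples share a common library size.

    counts: dict mapping sample name -> list of raw counts per feature.
    library_sizes: optional dict mapping sample name -> library size,
        standing in for a filled-in 'library_size' column (assumption).
    """
    # Fall back to the per-sample count totals when no sizes are supplied.
    if library_sizes is None:
        library_sizes = {s: sum(v) for s, v in counts.items()}
    mean_size = sum(library_sizes.values()) / len(library_sizes)

    scaled = {}
    for sample, values in counts.items():
        # Factor > 1 for deeply sequenced samples, < 1 for shallow ones.
        factor = library_sizes[sample] / mean_size
        scaled[sample] = [v / factor for v in values]
    return scaled


# Hypothetical two-sample example with library sizes 100 and 200.
counts = {"ctrl_1": [10, 90, 0], "treat_1": [40, 150, 10]}
scaled = scale_by_library_size(counts)
scaled_fixed = scale_by_library_size(counts, {"ctrl_1": 1000, "treat_1": 1000})
```

After scaling, both samples in this example sum to the average library size, so differences between them no longer reflect sequencing depth alone.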
To reduce the impact of so-called RNA composition bias, which can arise for example when only a small number of genes
are very highly expressed in one experiment condition but not in the other, an offset value can be estimated and built into
the generalized linear model. The user can choose to turn normalization off using the 'Apply normalization' parameter.
There are two methods for estimating the dispersion. The 'common' method estimates a single dispersion value shared by all features and assumes there are a small number of samples but
many reads, whereas the 'tagwise' method estimates a separate dispersion for each feature and might be better suited to small library sizes.
Statistical testing is performed using a generalized linear model and the p-values are adjusted for multiple testing using the method selected with the 'Multiple testing correction' parameter.
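To show what the default BH (Benjamini-Hochberg) correction does to a list of p-values, here is a minimal Python sketch. It is an independent illustration of the standard procedure, not the tool's code. The function name bh_adjust and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Visit the p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        # Scale by n / rank and enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four features.
pvals = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(pvals)

# Apply a cutoff in the spirit of the 'P-value cutoff' parameter.
cutoff = 0.05
significant = [i for i, p in enumerate(adj) if p <= cutoff]
```

The running minimum ensures that a feature with a smaller raw p-value never ends up with a larger adjusted p-value than a feature ranked after it.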
Output
The analysis output consists of the following:
- de-list-edger.tsv: Table containing the results of the statistical testing, including fold change estimates and p-values.
- de-list-edger.bed: The BED version of the results table contains genomic coordinates and p-value estimates and is ideal for quick navigation in the Genome Browser.
- ma-plot-raw-edger.pdf: A scatter plot of the raw count value averages between experiment conditions.
- ma-plot-normalized-edger.pdf: A scatter plot of the normalized count value averages between experiment conditions.
- ma-plot-significant-edger.pdf: A scatter plot where the significantly differentially expressed features are highlighted.
- mds-plot-edger.pdf: A plot showing the results of multidimensional scaling of the data to visualize sample similarities.
- edger-log.txt: Log file.
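For readers unfamiliar with MA plots, the following Python sketch computes the two quantities such plots conventionally show for one feature: M, the log2 fold change between conditions, and A, the average log2 expression. The exact values plotted by the tool are not documented here, so the pseudocount and the function name ma_values are assumptions made only for illustration.

```python
import math


def ma_values(mean_a, mean_b, pseudocount=0.5):
    """Return (M, A) for one feature from its mean counts in two conditions.

    The pseudocount avoids log2(0) for unexpressed features; its value
    is an assumption, not something taken from the tool.
    """
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    m = log_b - log_a        # log2 fold change (condition B over A)
    a = (log_a + log_b) / 2  # average log2 expression
    return m, a


# Hypothetical mean counts for one feature in two conditions.
m, a = ma_values(7.5, 31.5)
```

Features with M close to zero are similarly expressed in both conditions, while features far from zero along the M axis are candidates for differential expression.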
References
This tool uses the edgeR package for statistical analysis. Please read the following article for more detailed information:
MD Robinson, DJ McCarthy, and GK Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139–40, Jan 2010.