Reads file statistics

Description

This tool calculates general statistics of the reads in the given FASTQ or FASTA file.

Details

This tool uses the PRINSEQ package. The statistics are calculated with the PRINSEQ option -stats_all.
The input data can be in FASTQ or FASTA format.
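
As a rough illustration of the kind of call involved, the sketch below builds a PRINSEQ command line with -stats_all. It assumes the prinseq-lite.pl script is on the PATH and that the input format can be guessed from the file suffix. The function name, the suffix list, and the file name reads.fastq are hypothetical, and the exact command this tool runs may differ.

```python
from pathlib import Path

# Assumption: these suffixes mark FASTA input; anything else is treated as FASTQ.
FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def build_stats_command(reads_path, executable="prinseq-lite.pl"):
    """Return the argument list for a PRINSEQ run with -stats_all."""
    suffix = Path(reads_path).suffix.lower()
    input_flag = "-fasta" if suffix in FASTA_SUFFIXES else "-fastq"
    return [executable, input_flag, str(reads_path), "-stats_all"]


# Only prints the command; pass the list to subprocess.run() to execute it.
print(" ".join(build_stats_command("reads.fastq")))
```

The command is built as a list rather than a single string so that file names containing spaces are passed through unchanged if the list is later handed to subprocess.run().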

Output

The command output is a table that contains the following values:

stats_dinuc aatt             Dinucleotide odds ratio for AA/TT.
stats_dinuc acgt             Dinucleotide odds ratio for AC/GT.
stats_dinuc agct             Dinucleotide odds ratio for AG/CT.
stats_dinuc at               Dinucleotide odds ratio for AT.
stats_dinuc catg             Dinucleotide odds ratio for CA/TG.
stats_dinuc ccgg             Dinucleotide odds ratio for CC/GG.
stats_dinuc cg               Dinucleotide odds ratio for CG.
stats_dinuc gatc             Dinucleotide odds ratio for GA/TC.
stats_dinuc gc               Dinucleotide odds ratio for GC.
stats_dinuc ta               Dinucleotide odds ratio for TA.
stats_dupl 3                 The number of 3' duplicates.
stats_dupl 3maxd
stats_dupl 5                 The number of 5' duplicates.
stats_dupl 5maxd
stats_dupl exact             The number of exact duplicates.
stats_dupl exactmaxd
stats_dupl exactrevcomp      Number of exact duplicates with reverse complements.
stats_dupl exactrevcompmaxd
stats_dupl revcomp           Number of 5'/3' duplicates with reverse complements.
stats_dupl revcompmaxd
stats_dupl total             Total number of duplicates.
stats_info bases             Total number of bases in the input file.
stats_info reads             Number of reads in the input file.
stats_len max                Length of the longest read.
stats_len mean               Mean length of the reads.
stats_len median             Median of the read lengths.
stats_len min                Length of the shortest read.
stats_len mode               Mode of the read lengths.
stats_len modeval            Number of mode length sequences.
stats_len range              Range of the sequence lengths.
stats_len stddev             Standard deviation of the read lengths.
stats_ns maxn                Maximum number of Ns in one read.
stats_ns maxp                The maximum percentage of Ns per read.
stats_ns seqswithn           Number of reads with ambiguous base N.
stats_tag midnum             The number of predefined MIDs.
stats_tag prob3              The probability of a tag sequence at the 3'-end (in percentage).
stats_tag prob5              The probability of a tag sequence at the 5'-end (in percentage).
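
For downstream use, the output table can be read into a nested lookup. The sketch below assumes each row holds three whitespace-separated fields (category, name, value), which matches the row names listed above but is not confirmed here. The function name parse_stats and the example values are made up for illustration.

```python
from collections import defaultdict


def parse_stats(text):
    """Group 'category name value' rows into {category: {name: value}}."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and rows without exactly three fields
        category, name, value = fields
        try:
            stats[category][name] = float(value)
        except ValueError:
            stats[category][name] = value  # keep non-numeric values as text
    return dict(stats)


# Made-up rows, for illustration only.
example = "stats_info\tbases\t2020\nstats_info\treads\t20\nstats_len\tmax\t101\n"
result = parse_stats(example)
print(result["stats_len"]["max"])
```

Rows that do not have exactly three fields are skipped rather than raising an error, so a header line or trailing blank line does not break the parse.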

Reference

This tool is based on the PRINSEQ package.