Using the results
MultiQC quality report
All pipelines (except the download_fastq pipeline) output a MultiQC report. The report is located at {qc_dir}/multiqc_{assembly}.html, and we highly recommend checking it after every pipeline run. What the report contains differs per pipeline and input. In this section we go over the generic output of a seq2science QC report and how to make use of it.
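As a small illustration of the path pattern above, the sketch below builds the expected report location. The function name and the example values for qc_dir and assembly are hypothetical; use the values from your own config and samples file.

```python
from pathlib import Path


def multiqc_report_path(qc_dir: str, assembly: str) -> Path:
    """Build the report path following the {qc_dir}/multiqc_{assembly}.html pattern."""
    return Path(qc_dir) / f"multiqc_{assembly}.html"


# Hypothetical values, for illustration only.
report = multiqc_report_path("results/qc", "GRCh38")
print(report.as_posix())
if not report.exists():
    print("Report not found; check qc_dir and the assembly name.")
```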
General statistics
The general statistics table shows a quick summary of your data. The table is interactive: you can sort it, and add or remove columns (there are many hidden columns!) by clicking on the configure columns button. A very useful function of the general statistics table is that it allows for the plotting of two columns against each other!
Metrics & Tools
The sections in the QC report follow a chronological order. The report starts with e.g. read trimming, then covers alignment metrics, and later differential analysis. The sections in detail are:
Assembly stats
The assembly stats are generated by seq2science. They function as a quick check of whether the assembly you are using looks good (e.g. whether the annotation contains enough genes, or whether the contigs (chromosomes) are large enough).
FastQC (raw) (only present when using trim galore as trimmer):
Here the results of FastQC are displayed before trimming.
Cutadapt (only present when using trim galore as trimmer):
The pipeline makes use of Trim Galore! for automatic adapter detection and adapter & quality trimming, which under the hood makes use of cutadapt for adapter trimming. This section reports the percentage of reads that have been trimmed.
FastQC (trimmed) (only present when using trim galore as trimmer):
Here the results of FastQC are displayed after trimming.
Fastp (only present when using fastp as trimmer):
Here the results of fastp are displayed. Fastp is a trimmer that also reports a selection of read quality metrics.
Picard:
Picard is a suite of tools that can do many useful things. We use it to mark (optical and PCR) duplicates, and to get the sizes of (paired-end) inserts (useful to check for e.g. over-digestion in ATAC-seq).
Samtools pre-sieve:
Samtools stats is part of the SAMtools suite and reports various metrics about the aligned reads. It is run twice: once before removing e.g. poorly aligned reads and duplicates (we call this sieving), and once after. This section shows the metrics before sieving.
Samtools post-sieve:
Samtools stats is part of the SAMtools suite and reports various metrics about the aligned reads. It is run twice: once before removing e.g. poorly aligned reads and duplicates (we call this sieving), and once after. This section shows the metrics after sieving.
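Comparing the pre-sieve and post-sieve sections tells you how many reads survived sieving. The sketch below shows that comparison under the assumption that you have copied the total read counts from both sections by hand; the function name and the numbers are hypothetical and not part of seq2science.

```python
def retained_fraction(reads_pre_sieve: int, reads_post_sieve: int) -> float:
    """Fraction of aligned reads that remain after sieving."""
    if reads_pre_sieve <= 0:
        raise ValueError("reads_pre_sieve must be positive")
    if not 0 <= reads_post_sieve <= reads_pre_sieve:
        raise ValueError("reads_post_sieve must lie between 0 and reads_pre_sieve")
    return reads_post_sieve / reads_pre_sieve


# Made-up read counts, for illustration only.
print(f"{retained_fraction(2_000_000, 1_500_000):.1%}")  # prints 75.0%
```

A low retained fraction can point to many duplicates or poorly aligned reads; the Picard and samtools sections help to tell which.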
deeptools:
deepTools is a suite of tools to process and analyze deep sequencing data. This section contains a PCA plot, a fingerprint plot, and a profile plot.
Strandedness:
RNA-seq sample strandedness is inferred by RSeQC's infer_experiment.py to improve gene counting accuracy. Results can be reviewed in the MultiQC graph (and can be used to update the samples.tsv if you disagree with the inference results).
macs2 / genrich _frips (only present with peak calling):
When calling peaks on your data, the fraction of reads in peaks (FRiP) score can give insight into how well your experiment was performed. It is calculated with featureCounts from the Subread package.
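The score itself is a simple ratio: the number of reads that fall inside called peaks divided by the total number of reads considered. The sketch below only illustrates that definition. It is not how the pipeline computes the score (that is done with featureCounts), and the function name and counts are hypothetical.

```python
def frip(reads_in_peaks: int, total_reads: int) -> float:
    """Fraction of reads in peaks: reads overlapping peaks / all reads counted."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_in_peaks <= total_reads:
        raise ValueError("reads_in_peaks must lie between 0 and total_reads")
    return reads_in_peaks / total_reads


# Made-up counts, for illustration only.
print(frip(250_000, 1_000_000))  # prints 0.25
```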
deeptools correlation heatmaps:
deepTools is a suite of tools to process and analyze deep sequencing data. These plots show the binned correlation along the genome.
Peak distribution (only present with peak calling):
Shows the distribution of reads along each peak. Figure made with deepTools.
Feature/peak distribution (only present with peak calling):
Shows where peaks are called relative to regulatory sequences in the genome. Figures are made with ChIPseeker.
Trackhub
It is often good to ‘eyeball’ the data, and check if e.g. peak calling went alright. One of the features of the pipeline is that it can generate a trackhub. You can host the trackhub yourself on a web accessible location, and visualize on the UCSC genome browser. Alternatively, you can visualize the files locally in IGV.
Generation of the trackhub files is optional for all workflows that support it, and is turned off by default.
Set create_trackhub: True in the config to start generating your hub.
UCSC genome browser
If you move the trackhub folder to a web-accessible location, you can submit the URL of its hub.txt file to the UCSC Genome Browser to gain access to your personalized hub!
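Instead of pasting the URL into the browser's track hub form, you can also build a direct link. The sketch below assumes that the UCSC hgTracks page accepts a hubUrl query parameter, which to our knowledge it does; treat the endpoint and parameter as assumptions and verify them against the UCSC documentation. The example.org address is a placeholder for your own web-accessible location.

```python
from urllib.parse import urlencode

# Assumed UCSC endpoint; check the UCSC track hub documentation before relying on it.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_hub_link(hub_txt_url: str) -> str:
    """Build a browser link that asks UCSC to load the hub at hub_txt_url."""
    return f"{UCSC_HGTRACKS}?{urlencode({'hubUrl': hub_txt_url})}"


# Placeholder URL, for illustration only.
link = ucsc_hub_link("https://example.org/trackhub/hub.txt")
print(link)
```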
If you don’t have access to a web-accessible location, the bigwig files can be manually uploaded on a UCSC trackhub, as long as the assembly used is recognized by UCSC.
Integrative Genomics Viewer
IGV is a locally run genome browser with baseline functionalities for read and sequence inspection. It is an excellent alternative for quick jobs or if you do not have access to a (large enough) web-accessible location.
BigWigs
Bigwigs visualize the sequencing depth per base and form the core of the trackhub. Bigwigs are stored in workflow-dependent locations, and linked in the trackhub folder.
Bigwig files generated by the Alignment and RNA-seq workflows are collected in the bigwigs folder. Each aligned sample (or merged sample in case of technical replicates) is converted to a bigwig file.
Bigwig files generated by the ATAC-seq and ChIP-seq workflows are collected in the peak-caller directory (macs, genrich, hmmratac). Each biological replicate is converted to a bigwig file.
See the replicate handling page for more information on sample replicates and conditions.
Genome
If your genome assembly is not recognized on UCSC, a number of files must be generated to map your bigwigs to. Seq2science does this for you! It creates the required genome.2bit, as well as a cytobands file. With just these files you can search your assembly by coordinates.
Gene annotations
If gene annotations are available, these are added as annotations.bigBed. This is visible as a separate track containing genes. Additionally, the genome.2bit is indexed to allow you to search your assembly by gene name (if the annotation file was formatted properly).
Supporting tracks
Tracks depicting the GC percentage and the softmasked regions of the genome are generated, similar to MakeHub.