These samples were run by seq2science v0.9.6, a tool for easy preprocessing of NGS data.
Take a look at our docs for info about how to use this report to the fullest.
- Workflow: chip-seq
- Date: November 28, 2022
- Project: chip
- Contact E-mail: yourmail@here.com
Report generated on 2022-11-28, 11:06 based on data in:
/scratch/sande/seq2science_manu/chip/results/qc/assembly_BDGP6.32_stats_mqc.html
/scratch/sande/seq2science_manu/chip/results/qc/trimming/GSM1689693.fastp.json
/scratch/sande/seq2science_manu/chip/results/qc/upset/BDGP6.32-macs2_upset_mqc.jpg
/scratch/sande/seq2science_manu/chip/results/macs2/BDGP6.32-GSM1689692_peaks.xls
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/final_bam/BDGP6.32-GSM1689693.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/qc/trimming/GSM1689671.fastp.json
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/bwa-mem2/BDGP6.32-GSM1689693.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/bwa-mem2/BDGP6.32-GSM1689692.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json
/scratch/sande/seq2science_manu/chip/results/macs2/BDGP6.32-GSM1689671_peaks.xls
/scratch/sande/seq2science_manu/chip/results/qc/plotHeatmap_peaks/N20000-BDGP6.32-deepTools_macs2_heatmap_mqc.png
/scratch/sande/seq2science_manu/chip/results/bwa-mem2/BDGP6.32-GSM1689671.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json
/scratch/sande/seq2science_manu/chip/results/bwa-mem2/BDGP6.32-GSM1689693.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json
/scratch/sande/seq2science_manu/chip/results/qc/plotProfile_gene/BDGP6.32-macs2.tsv
/scratch/sande/seq2science_manu/chip/results/qc/gimme/BDGP6.32-gimme.vertebrate.v5.0-macs2_mqc.html
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/bwa-mem2/BDGP6.32-GSM1689672.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/qc/plotCorrelation/BDGP6.32-DESeq2_sample_distance_clustering_mqc.png
/scratch/sande/seq2science_manu/chip/results/qc/markdup/BDGP6.32-GSM1689693.samtools-coordinate.metrics.txt
/scratch/sande/seq2science_manu/chip/results/log/workflow_explanation_mqc.html
/scratch/sande/seq2science_manu/chip/results/qc/chipseeker/BDGP6.32-macs2_img2_mqc.png
/scratch/sande/seq2science_manu/chip/results/qc/samplesconfig_mqc.html
/scratch/sande/seq2science_manu/chip/results/qc/macs2/BDGP6.32-GSM1689672_featureCounts.txt.summary
/scratch/sande/seq2science_manu/chip/results/qc/trimming/GSM1689692.fastp.json
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/bwa-mem2/BDGP6.32-GSM1689692.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/bwa-mem2/BDGP6.32-GSM1689671.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/qc/plotCorrelation/BDGP6.32-deepTools_pearson_correlation_clustering_mqc.png
/scratch/sande/seq2science_manu/chip/results/qc/plotCorrelation/BDGP6.32-deepTools_spearman_correlation_clustering_mqc.png
/scratch/sande/seq2science_manu/chip/results/qc/trimming/GSM1689672.fastp.json
/scratch/sande/seq2science_manu/chip/results/qc/chipseeker/BDGP6.32-macs2_img1_mqc.png
/scratch/sande/seq2science_manu/chip/results/qc/plotPCA/BDGP6.32.tsv
/scratch/sande/seq2science_manu/chip/results/bwa-mem2/BDGP6.32-GSM1689672.samtools-coordinate-unsieved.bam.mtnucratiomtnuc.json
/scratch/sande/seq2science_manu/chip/results/macs2/BDGP6.32-GSM1689693_peaks.xls
/scratch/sande/seq2science_manu/chip/results/qc/plotCorrelation/BDGP6.32-DESeq2_pearson_correlation_clustering_mqc.png
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/final_bam/BDGP6.32-GSM1689692.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/qc/macs2/BDGP6.32-GSM1689693_featureCounts.txt.summary
/scratch/sande/seq2science_manu/chip/results/qc/macs2/BDGP6.32-GSM1689671_featureCounts.txt.summary
/scratch/sande/seq2science_manu/chip/results/qc/macs2/BDGP6.32-GSM1689692_featureCounts.txt.summary
/scratch/sande/seq2science_manu/chip/results/qc/plotFingerprint/BDGP6.32.tsv
/scratch/sande/seq2science_manu/chip/results/qc/markdup/BDGP6.32-GSM1689692.samtools-coordinate.metrics.txt
/scratch/sande/seq2science_manu/chip/results/qc/plotCorrelation/BDGP6.32-DESeq2_spearman_correlation_clustering_mqc.png
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/final_bam/BDGP6.32-GSM1689671.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/qc/markdup/BDGP6.32-GSM1689671.samtools-coordinate.metrics.txt
/scratch/sande/seq2science_manu/chip/results/qc/markdup/BDGP6.32-GSM1689672.samtools-coordinate.metrics.txt
/scratch/sande/seq2science_manu/chip/results/qc/samtools_stats/final_bam/BDGP6.32-GSM1689672.samtools-coordinate.samtools_stats.txt
/scratch/sande/seq2science_manu/chip/results/macs2/BDGP6.32-GSM1689672_peaks.xls
General Statistics
Showing 4/4 rows and 15/31 columns.
Sample Name | % Duplication | GC content | % PF | % Adapter | % Dups | % Mapped | M Total seqs | % Proper Pairs | M Total seqs | % Assigned | Genome coverage | M Genome reads | M MT genome reads | Number of Peaks | Treatment Redundancy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GSM1689671 | 56.0% | 43.0% | 98.0% | 2.3% | 57.1% | 95.2% | 38.5 | 0.0% | 11.8 | 21.0% | 11.5 X | 36.6 | 0.0 | 14391 | 0.00 |
GSM1689672 | 19.6% | 44.6% | 99.8% | | 19.4% | 98.2% | 24.5 | 0.0% | 15.9 | 15.3% | 8.5 X | 24.1 | 0.0 | 14154 | 0.00 |
GSM1689692 | 15.3% | 44.4% | 96.2% | 3.5% | 10.9% | 58.9% | 5.7 | 0.0% | 2.5 | 22.3% | 0.8 X | 3.4 | 0.0 | 10728 | 0.00 |
GSM1689693 | 51.9% | 42.2% | 99.9% | | 48.0% | 98.3% | 22.0 | 0.0% | 8.2 | 12.4% | 7.7 X | 21.7 | 0.0 | 8996 | 0.00 |
Workflow explanation
Assembly stats
fastp
fastp is an ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...).
Filtered Reads
Filtering statistics of sampled reads.
Duplication Rates
Duplication rates of sampled reads.
Sequence Quality
Average sequencing quality over each base of all reads.
GC Content
Average GC content over each base of all reads.
N content
Average N content over each base of all reads.
Picard
Picard is a set of Java command line tools for manipulating high-throughput sequencing data.
Mark Duplicates
Number of reads, categorised by duplication state. Pair counts are doubled - see help text for details.
The table in the Picard metrics file contains some columns referring to read pairs and some referring to single reads.
To make the numbers in this plot sum correctly, values referring to pairs are doubled according to the scheme below:
READS_IN_DUPLICATE_PAIRS = 2 * READ_PAIR_DUPLICATES
READS_IN_UNIQUE_PAIRS = 2 * (READ_PAIRS_EXAMINED - READ_PAIR_DUPLICATES)
READS_IN_UNIQUE_UNPAIRED = UNPAIRED_READS_EXAMINED - UNPAIRED_READ_DUPLICATES
READS_IN_DUPLICATE_PAIRS_OPTICAL = 2 * READ_PAIR_OPTICAL_DUPLICATES
READS_IN_DUPLICATE_PAIRS_NONOPTICAL = READS_IN_DUPLICATE_PAIRS - READS_IN_DUPLICATE_PAIRS_OPTICAL
READS_IN_DUPLICATE_UNPAIRED = UNPAIRED_READ_DUPLICATES
READS_UNMAPPED = UNMAPPED_READS
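The doubling scheme above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code MultiQC runs; the function name `reads_by_duplication_state` and the example counts are hypothetical, while the dictionary keys are the column names used in the formulas above.

```python
def reads_by_duplication_state(metrics):
    """Convert Picard MarkDuplicates columns into per-read counts.

    Pair-based columns are doubled so that all categories count single reads.
    """
    dup_pairs = 2 * metrics["READ_PAIR_DUPLICATES"]
    optical = 2 * metrics["READ_PAIR_OPTICAL_DUPLICATES"]
    return {
        "READS_IN_UNIQUE_PAIRS": 2 * (metrics["READ_PAIRS_EXAMINED"] - metrics["READ_PAIR_DUPLICATES"]),
        "READS_IN_UNIQUE_UNPAIRED": metrics["UNPAIRED_READS_EXAMINED"] - metrics["UNPAIRED_READ_DUPLICATES"],
        "READS_IN_DUPLICATE_PAIRS_OPTICAL": optical,
        "READS_IN_DUPLICATE_PAIRS_NONOPTICAL": dup_pairs - optical,
        "READS_IN_DUPLICATE_UNPAIRED": metrics["UNPAIRED_READ_DUPLICATES"],
        "READS_UNMAPPED": metrics["UNMAPPED_READS"],
    }


# Hypothetical counts, chosen only to show that the categories add up.
example_metrics = {
    "READ_PAIRS_EXAMINED": 1000,
    "READ_PAIR_DUPLICATES": 200,
    "READ_PAIR_OPTICAL_DUPLICATES": 50,
    "UNPAIRED_READS_EXAMINED": 100,
    "UNPAIRED_READ_DUPLICATES": 10,
    "UNMAPPED_READS": 30,
}
per_read = reads_by_duplication_state(example_metrics)
```

With these counts the categories sum to 2 * 1000 + 100 + 30 reads, which is the property the doubling is meant to guarantee.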
SamTools pre-sieve
Samtools is a suite of programs for interacting with high-throughput sequencing data.
The pre-sieve statistics are quality metrics measured before applying (optional) minimum mapping quality, blacklist removal, mitochondrial read removal, read length filtering, and Tn5 shift.
Percent Mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads.
For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below recommended levels depending on the application.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
Alignment metrics
This module parses the output from samtools stats. All numbers in millions.
SamTools post-sieve
Samtools is a suite of programs for interacting with high-throughput sequencing data.
The post-sieve statistics are quality metrics measured after applying (optional) minimum mapping quality, blacklist removal, mitochondrial read removal, and Tn5 shift.
Percent Mapped
Alignment metrics from samtools stats; mapped vs. unmapped reads.
For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below recommended levels depending on the application.
Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).
Alignment metrics
This module parses the output from samtools stats. All numbers in millions.
deepTools
deepTools is a suite of tools to process and analyze deep sequencing data.
PCA plot
PCA plot with the top two principal components calculated based on genome-wide distribution of sequence reads
Fingerprint plot
Signal fingerprint according to plotFingerprint
Read Distribution Profile after Annotation
Accumulated view of the distribution of sequence reads related to the closest annotated gene. All annotated genes have been normalized to the same size.
- Green: -2.0Kb upstream of gene to TSS
- Yellow: TSS to TES
- Pink: TES to 0.5Kb downstream of gene
macs2_frips
Subread featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
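The section name suggests that these counts serve as an estimate of the fraction of reads in peaks (FRiP), with the called peaks acting as the features. Assuming that reading is right, the fraction can be derived from a featureCounts `.summary` file roughly as sketched below. The function name `fraction_assigned` and the example file contents are hypothetical; only the two-column, tab-separated layout with a `Status` header row is taken from the featureCounts summary format.

```python
def fraction_assigned(summary_text):
    """Return Assigned / total reads from the text of a featureCounts .summary file."""
    counts = {}
    # The first line is the header ("Status<TAB>bam path"); every other line is a category.
    for line in summary_text.strip().splitlines()[1:]:
        status, value = line.split("\t")[:2]
        counts[status] = int(value)
    total = sum(counts.values())
    return counts.get("Assigned", 0) / total if total else 0.0


# Hypothetical summary: 210 of 1000 reads fall inside a peak.
example_summary = (
    "Status\texample.bam\n"
    "Assigned\t210\n"
    "Unassigned_NoFeatures\t790\n"
)
frip = fraction_assigned(example_summary)
```

In this made-up example `frip` is 0.21, i.e. 21% of the reads were assigned to a peak.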
deepTools - Spearman correlation heatmap of reads in bins across the genome
Spearman correlation plot generated by deeptools. Spearman correlation is a non-parametric (distribution-free) method, and assesses the monotonicity of the relationship.
deepTools - Pearson correlation heatmap of reads in bins across the genome
Pearson correlation plot generated by deeptools. Pearson correlation is a parametric (lots of assumptions, e.g. normality and homoscedasticity) method, and assesses the linearity of the relationship.
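The difference between the two correlation measures can be shown with a small, self-contained sketch. It is not how deepTools computes its heatmaps; the helper names and the toy data are made up for illustration. The data are perfectly monotonic but not linear, so Spearman reaches 1 while Pearson stays below it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Because Spearman only looks at ranks, it is less sensitive to a few bins with extreme read counts than Pearson is.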
Peak distributions (macs2)
The distribution of read pileup around 20000 random peaks for each sample. This visualization is a quick and dirty way to check if your peaks look like what you would expect, and what the underlying distribution of different types of peaks is.
Peaks per sample distribution (macs2)
The distribution of peaks between samples. An UpSet plot is like a Venn diagram, but is easier to read with many samples. This figure shows the overlap of peaks between conditions/samples.
Peak feature distribution (macs2)
Figure generated by chipseeker
Distribution of peak locations relative to TSS (macs2)
Figure generated by chipseeker
DESeq2 - Sample distance cluster heatmap of counts
Euclidean distance between samples, based on variance stabilizing transformed counts (RNA: expressed genes, ChIP: bound regions, ATAC: accessible regions). Gives us an overview of similarities and dissimilarities between samples.
DESeq2 - Spearman correlation cluster heatmap of counts
Correlation cluster heatmap based on variance stabilizing transformed counts. Spearman correlation is a non-parametric (distribution-free) method, and assesses the monotonicity of the relationship.
DESeq2 - Pearson correlation cluster heatmap of counts
Correlation cluster heatmap based on variance stabilizing transformed counts. Pearson correlation is a parametric (lots of assumptions, e.g. normality and homoscedasticity) method, and assesses the linearity of the relationship.
gimme maelstrom macs2 results
Gimme maelstrom is a method to infer differential motifs between samples. It solves a system of linear equations, Ax = b, for x, where A contains the motif scores and b is the count table. It combines the results of different methods that solve this problem into the table below, which can be used to find **differential** motifs between samples.
motif | factors (direct or predicted) | z-score gd7_ectoderm | z-score tl10b_mesoderm | % with motif | corr gd7_ectoderm | corr tl10b_mesoderm | motif information |
---|---|---|---|---|---|---|---|
GM.5.0.bZIP.0068 | CREBB | 3.55 | -3.53 | <1 | 0.03 | -0.03 | |
GM.5.0.Homeodomain.0138 | EMS,E5 | 4.60 | -3.86 | <1 | 0.03 | -0.03 | |
GM.5.0.C2H2_ZF.0256 | NO ORTHOLOGS FOUND | -3.03 | 0.45 | <1 | -0.01 | 0.01 | |
GM.5.0.Unknown.0003 | NO ORTHOLOGS FOUND | -0.19 | -3.10 | <1 | 0.02 | -0.02 | |
GM.5.0.Ets.0019 | ETS98B | -2.29 | 3.55 | <1 | -0.03 | 0.03 | |
GM.5.0.Nuclear_receptor.0109 | ERR,SRL | 3.77 | -3.31 | <1 | 0.03 | -0.03 | |
GM.5.0.Nuclear_receptor.0147 | ERR,SRL | -3.81 | 3.91 | <1 | -0.03 | 0.03 | |
GM.5.0.Homeodomain.0153 | SCRO | 3.09 | -3.45 | 1 | 0.03 | -0.03 | |
GM.5.0.C2H2_ZF.0273 | HR51,NO ORTHOLOGS FOUND | -4.43 | 5.12 | 1 | -0.04 | 0.04 | |
GM.5.0.Homeodomain.0099 | TUP,NO ORTHOLOGS FOUND | 3.43 | -1.25 | <1 | 0.02 | -0.02 | |
GM.5.0.C2H2_ZF.0266 | NO ORTHOLOGS FOUND | 2.97 | -3.60 | 1 | 0.03 | -0.03 | |
GM.5.0.AT_hook.0006 | NO ORTHOLOGS FOUND,BDP1,FOXP,SR,CG5098, (...) | -3.20 | 2.86 | <1 | -0.03 | 0.03 | |
GM.5.0.bHLH.0078 | FER1 | -3.03 | 3.08 | <1 | -0.03 | 0.03 | |
GM.5.0.bHLH.0112 | KN | -3.08 | 3.29 | <1 | -0.03 | 0.03 | |
GM.5.0.Forkhead.0015 | FD96CA,FD59A,FOXK,FD19B,BIN, (...) | -3.44 | 3.65 | 1 | -0.04 | 0.04 | |
GM.5.0.bZIP.0070 | MAF-S | -3.18 | 0.99 | 1 | -0.01 | 0.01 | |
GM.5.0.TEA.0002 | SD | -3.42 | 3.14 | 1 | -0.03 | 0.03 | |
GM.5.0.Unknown.0026 | NO ORTHOLOGS FOUND,ELYS | -1.99 | 3.34 | <1 | -0.03 | 0.03 | |
GM.5.0.Ets.0028 | PNT | 3.27 | -3.27 | <1 | 0.03 | -0.03 | |
GM.5.0.Nuclear_receptor.0122 | NO ORTHOLOGS FOUND | -3.90 | 3.29 | 1 | -0.03 | 0.03 | |
GM.5.0.C2H2_ZF.0229 | NO ORTHOLOGS FOUND | 3.46 | -2.90 | 1 | 0.03 | -0.03 | |
GM.5.0.Zinc_cluster.0001 | CG12659 | -3.18 | 3.22 | 1 | -0.03 | 0.03 | |
GM.5.0.E2F.0020 | E2F1,E2F2,DP,NO ORTHOLOGS FOUND | -3.38 | 2.93 | 2 | -0.02 | 0.02 | |
GM.5.0.C2H2_ZF.0161 | NO ORTHOLOGS FOUND | 3.44 | -3.31 | 1 | 0.04 | -0.04 | |
GM.5.0.Mixed.0049 | MAX | -4.31 | 4.30 | 1 | -0.03 | 0.03 | |
GM.5.0.C2H2_ZF.0183 | CHD1,PNT,NO ORTHOLOGS FOUND | -3.77 | 3.62 | 1 | -0.03 | 0.03 | |
GM.5.0.Homeodomain.0107 | DFD,SCR,PB,LAB,VSX2, (...) | 5.65 | -4.47 | 2 | 0.05 | -0.05 | |
GM.5.0.Myb_SANT.0013 | ISWI,NO ORTHOLOGS FOUND | -3.54 | 2.89 | 1 | -0.03 | 0.03 | |
GM.5.0.Homeodomain.0037 | NO ORTHOLOGS FOUND | -3.20 | 3.60 | 2 | -0.03 | 0.03 | |
GM.5.0.C2H2_ZF_Homeodomain.0004 | DA,L_1_SC,HLH4C,AC,SC, (...) | 1.77 | -3.21 | 2 | 0.02 | -0.02 | |
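To make the Ax = b formulation concrete, here is a minimal sketch that recovers motif activities x from motif scores A and a signal vector b using ordinary least squares. This is an assumption-laden toy: gimme maelstrom combines several regression methods, and plain least squares via the normal equations is swapped in here only to illustrate the idea. The function name `solve_two_motif_activity` and all numbers are hypothetical, and the solver is restricted to two motifs to keep it short.

```python
def solve_two_motif_activity(A, b):
    """Least-squares solution of A x = b when A has exactly two columns (motifs).

    A: one row per region, holding the scores of motif 0 and motif 1.
    b: one signal value per region (for a single sample).
    Solves the normal equations (A^T A) x = A^T b with a closed-form 2x2 inverse.
    """
    s00 = sum(row[0] * row[0] for row in A)
    s01 = sum(row[0] * row[1] for row in A)
    s11 = sum(row[1] * row[1] for row in A)
    t0 = sum(row[0] * value for row, value in zip(A, b))
    t1 = sum(row[1] * value for row, value in zip(A, b))
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("motif score columns are linearly dependent")
    return ((s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det)


# Three hypothetical regions scored for two motifs.
motif_scores = [[1, 0], [0, 1], [1, 1]]
# Signal generated with activities (2, -1), so the solver should recover them.
signal = [2, -1, 1]
activity = solve_two_motif_activity(motif_scores, signal)
```

A positive activity means regions carrying the motif tend to have higher signal in that sample, which is the intuition behind the signed z-scores in the table.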
Samples & Config
sample | assembly | biological_replicates | descriptive_name | control |
---|---|---|---|---|
GSM1689671 | BDGP6.32 | gd7_ectoderm | gd7_ectodermal_rep1 | GSM1689679 |
GSM1689672 | BDGP6.32 | gd7_ectoderm | gd7_ectodermal_rep2 | GSM1689680 |
GSM1689692 | BDGP6.32 | tl10b_mesoderm | tl10b_mesodermal_rep1 | GSM1689700 |
GSM1689693 | BDGP6.32 | tl10b_mesoderm | tl10b_mesodermal_rep2 | GSM1689701 |
# tab-separated file of the samples
samples: samples.tsv
# pipeline file locations
result_dir: ./results # where to store results
genome_dir: ./genomes # where to look for or download the genomes
# fastq_dir: ./results/fastq # where to look for or download the fastqs
# contact info for multiqc report and trackhub
email: yourmail@here.com
# produce a UCSC trackhub?
create_trackhub: true
# how to handle replicates
biological_replicates: fisher # change to "keep" to not combine them
technical_replicates: merge # change to "keep" to not combine them
# which trimmer to use
trimmer: fastp
# which aligner to use
aligner: bwa-mem2
# filtering after alignment
remove_blacklist: true
min_mapping_quality: 30
only_primary_align: true
remove_dups: true
# peak caller
peak_caller:
  macs2:
      --buffer-size 10000
#  genrich:
#      -y -q 0.15
# how much peak summits will be extended by (on each side) for the final count table
# (e.g. 100 means a 200 bp wide peak)
slop: 100
# whether or not to run gimme maelstrom to infer differential motifs
run_gimme_maelstrom: true
# differential peak analysis
# for explanation, see: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html
#contrasts:
# - 'descriptive_name_all_HEL'
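
The `slop` setting in the config above extends each peak summit by the same number of base pairs on both sides. A minimal sketch of that arithmetic follows; the function name `extend_summit` and the example coordinate are hypothetical, and clamping at the chromosome end is left out for brevity.

```python
def extend_summit(summit, slop=100):
    """Return (start, end) of the window obtained by extending a summit on each side."""
    start = max(0, summit - slop)  # never run past the start of the chromosome
    end = summit + slop
    return start, end


# A summit at position 5000 with slop 100 gives a 200 bp wide peak.
start, end = extend_summit(5000, slop=100)
width = end - start
```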