        Note that additional data was saved in multiqc_BDGP6.32_data when this report was generated.



        If you use plots from MultiQC in a publication or presentation, please cite:

        MultiQC: Summarize analysis results for multiple tools and samples in a single report
        Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
        Bioinformatics (2016)
        doi: 10.1093/bioinformatics/btw354
        PMID: 27312411

        About MultiQC

        This report was generated using MultiQC, version 1.11

        You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

        For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

        You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/ewels/MultiQC

        These samples were run by seq2science v0.9.6, a tool for easy preprocessing of NGS data.

        Take a look at our docs for info about how to use this report to the fullest.

        Workflow: chip-seq
        Date: November 28, 2022
        Project: chip
        Contact E-mail: yourmail@here.com

        Report generated on 2022-11-28, 11:06 based on data in:

        General Statistics

        4 of 4 samples and 15 of 31 metrics are shown. Samples are listed as columns; "-" marks an empty cell.

        Metric               | GSM1689671 | GSM1689672 | GSM1689692 | GSM1689693
        % Duplication        | 56.0%      | 19.6%      | 15.3%      | 51.9%
        GC content           | 43.0%      | 44.6%      | 44.4%      | 42.2%
        % PF                 | 98.0%      | 99.8%      | 96.2%      | 99.9%
        % Adapter            | 2.3%       | -          | 3.5%       | -
        % Dups               | 57.1%      | 19.4%      | 10.9%      | 48.0%
        % Mapped             | 95.2%      | 98.2%      | 58.9%      | 98.3%
        M Total seqs         | 38.5       | 24.5       | 5.7        | 22.0
        % Proper Pairs       | 0.0%       | 0.0%       | 0.0%       | 0.0%
        M Total seqs         | 11.8       | 15.9       | 2.5        | 8.2
        % Assigned           | 21.0%      | 15.3%      | 22.3%      | 12.4%
        Genome coverage      | 11.5 X     | 8.5 X      | 0.8 X      | 7.7 X
        M Genome reads       | 36.6       | 24.1       | 3.4        | 21.7
        M MT genome reads    | 0.0        | 0.0        | 0.0        | 0.0
        Number of Peaks      | 14391      | 14154      | 10728      | 8996
        Treatment Redundancy | 0.00       | 0.00       | 0.00       | 0.00

        Workflow explanation

        Preprocessing of reads was done automatically by seq2science v0.9.6 using the chip-seq workflow. Genome assembly oryLat2 was downloaded with genomepy 0.13.0. Public samples were downloaded from the Sequence Read Archive with the help of the NCBI E-utilities and pysradb. Single-end reads were trimmed with fastp v0.20.1 with default options. Reads were aligned with bwa-mem2 v2.2.1 with options '-M'. Afterwards, duplicate reads were marked with Picard MarkDuplicates v2.23.8. General alignment statistics were collected by samtools stats v1.14. The UCSC genome browser was used to visualize and inspect the alignments. The effective genome size was estimated per sample by khmer v2.0 as the number of unique k-mers in the assembly, with k set to the average read length of that sample. Peaks were called with macs2 v2.2.7 with options '--buffer-size 10000' in BAM mode. deepTools v3.5.1 was used for the fingerprint, profile, correlation and dendrogram/heatmap plots, where the heatmap was made with options '--distanceBetweenBins 9000 --binSize 1000'. Narrowpeak files of biological replicates belonging to the same condition were merged with Fisher's method in macs2. The fraction of reads in peaks score (frips) was calculated by featureCounts v1.6.4. A peak feature distribution plot and a peak localization plot relative to the TSS were made with ChIPseeker. A consensus set of summits was made with gimmemotifs.combine_peaks v0.17.1. All summits were extended by 100 bp on each side to get a consensus peakset. Finally, a count table from the consensus peakset was made with gimmemotifs.coverage_table. Differential peak analysis on the consensus peakset was performed with gimme maelstrom. Quality control metrics were aggregated by MultiQC v1.11.
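The effective genome size step described above counts unique k-mers, with k equal to the average read length. The sketch below illustrates that idea on a toy sequence only. It is an exact set-based count, not khmer's implementation, and the function name, the choice to skip k-mers containing N, and the choice to ignore reverse complements are all assumptions made for illustration.

```python
def count_unique_kmers(sequence, k):
    """Count distinct k-mers in a sequence, skipping k-mers that contain N.

    Hypothetical helper for illustration; reverse complements are treated
    as different k-mers, which a real tool may handle differently.
    """
    sequence = sequence.upper()
    seen = set()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if "N" not in kmer:
            seen.add(kmer)
    return len(seen)


# Toy "assembly" with a short run of N; k would normally be the average read length.
toy_assembly = "ACGTACGTNNACGTTT"
print(count_unique_kmers(toy_assembly, k=4))
```

Repeated and N-containing stretches contribute nothing extra to the count, which is why the effective genome size is smaller than the raw assembly length.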

        Assembly stats

        Genome assembly BDGP6.32 consists of 1870 contigs with a GC content of 42.01%; 0.80% of the assembly is the letter N. The N50-L50 stats are 25286936-3 and the N75-L75 stats are 23542271-4. The genome annotation contains 17869 genes.
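For readers unfamiliar with the N50-L50 notation used above: N50 is the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches 50% of the assembly, and L50 is how many contigs that takes. A minimal sketch with made-up contig lengths (the function name and the toy data are hypothetical, not taken from this report):

```python
def nx_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for the given fraction, e.g. 0.5 for N50/L50."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until the target fraction is covered.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if running >= total * fraction:
            return length, count
    raise ValueError("contig_lengths must not be empty")


toy_contigs = [10, 8, 6, 4, 2]  # made-up lengths, total 30
print(nx_lx(toy_contigs, 0.5))   # N50 and L50
print(nx_lx(toy_contigs, 0.75))  # N75 and L75
```

Read this way, "25286936-3" means that the three longest contigs already cover half of the assembly, and the shortest of those three is 25286936 bp long.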

        fastp

        fastp: an ultra-fast all-in-one FASTQ preprocessor (QC, adapters, trimming, filtering, splitting...)

        Filtered Reads

        Filtering statistics of sampled reads.

        [Plot: Fastp: Filtered Reads. Number of reads per sample (GSM1689671, GSM1689672, GSM1689692, GSM1689693), split into Passed Filter, Low Quality, Too Many N and Too short.]

        Values for GSM1689692:
        • Passed Filter: 5701275 (96.2%)
        • Low Quality: 29844 (0.5%)
        • Too Many N: 0 (0.0%)
        • Too short: 197470 (3.3%)
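The percentages reported for GSM1689692 follow directly from the four read counts above. As a quick worked check (the variable names are illustrative only):

```python
# Read counts for GSM1689692, copied from the fastp filtering statistics above.
counts = {
    "Passed Filter": 5701275,
    "Low Quality": 29844,
    "Too Many N": 0,
    "Too short": 197470,
}

total = sum(counts.values())
# Each category as a percentage of all reads, rounded to one decimal.
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} ({pct}%)")
```

The "Passed Filter" share matches the % PF column for this sample in the General Statistics table.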

        Duplication Rates

        Duplication rates of sampled reads.

        [Plot: Fastp: Duplication Rate. x-axis: duplication level; y-axis: read percent.]

        Sequence Quality

        Average sequencing quality over each base of all reads.

        [Plot: Fastp: Sequence Quality. x-axis: read position; y-axis: R1 before filtering, sequence quality.]

        GC Content

        Average GC content over each base of all reads.

        [Plot: Fastp: Read GC Content. x-axis: read position; y-axis: R1 before filtering, base content percent.]

        N content

        Average N content over each base of all reads.

        [Plot: Fastp: Read N Content. x-axis: read position; y-axis: R1 before filtering, base content percent.]

        Picard

        Picard is a set of Java command line tools for manipulating high-throughput sequencing data.

        Mark Duplicates

        Number of reads, categorised by duplication state. Pair counts are doubled - see help text for details.

        The table in the Picard metrics file contains some columns referring to read pairs and some referring to single reads.

        To make the numbers in this plot sum correctly, values referring to pairs are doubled according to the scheme below:

        • READS_IN_DUPLICATE_PAIRS = 2 * READ_PAIR_DUPLICATES
        • READS_IN_UNIQUE_PAIRS = 2 * (READ_PAIRS_EXAMINED - READ_PAIR_DUPLICATES)
        • READS_IN_UNIQUE_UNPAIRED = UNPAIRED_READS_EXAMINED - UNPAIRED_READ_DUPLICATES
        • READS_IN_DUPLICATE_PAIRS_OPTICAL = 2 * READ_PAIR_OPTICAL_DUPLICATES
        • READS_IN_DUPLICATE_PAIRS_NONOPTICAL = READS_IN_DUPLICATE_PAIRS - READS_IN_DUPLICATE_PAIRS_OPTICAL
        • READS_IN_DUPLICATE_UNPAIRED = UNPAIRED_READ_DUPLICATES
        • READS_UNMAPPED = UNMAPPED_READS
        [Plot: Picard: Deduplication Stats. # Reads per sample (GSM1689671, GSM1689672, GSM1689692, GSM1689693), split into Unique Unpaired, Duplicate Unpaired and Unmapped.]
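The doubling scheme listed above can be written out as a small function. This is a sketch, not MultiQC's own code: the function name is hypothetical, the input is assumed to be a dict keyed by the Picard column names used in the list, and the example numbers are made up.

```python
def picard_read_categories(metrics):
    """Convert Picard MarkDuplicates pair counts into per-read counts."""
    dup_pairs = 2 * metrics["READ_PAIR_DUPLICATES"]
    optical = 2 * metrics["READ_PAIR_OPTICAL_DUPLICATES"]
    return {
        "READS_IN_DUPLICATE_PAIRS": dup_pairs,
        "READS_IN_UNIQUE_PAIRS": 2 * (metrics["READ_PAIRS_EXAMINED"] - metrics["READ_PAIR_DUPLICATES"]),
        "READS_IN_UNIQUE_UNPAIRED": metrics["UNPAIRED_READS_EXAMINED"] - metrics["UNPAIRED_READ_DUPLICATES"],
        "READS_IN_DUPLICATE_PAIRS_OPTICAL": optical,
        "READS_IN_DUPLICATE_PAIRS_NONOPTICAL": dup_pairs - optical,
        "READS_IN_DUPLICATE_UNPAIRED": metrics["UNPAIRED_READ_DUPLICATES"],
        "READS_UNMAPPED": metrics["UNMAPPED_READS"],
    }


# Made-up metrics for illustration only.
example = {
    "READ_PAIRS_EXAMINED": 100,
    "READ_PAIR_DUPLICATES": 20,
    "READ_PAIR_OPTICAL_DUPLICATES": 5,
    "UNPAIRED_READS_EXAMINED": 10,
    "UNPAIRED_READ_DUPLICATES": 3,
    "UNMAPPED_READS": 7,
}
categories = picard_read_categories(example)
print(categories)
```

For single-end data such as the reads in this report, the pair columns would be zero, which is consistent with the plot showing only the unpaired and unmapped categories.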

        SamTools pre-sieve

        Samtools is a suite of programs for interacting with high-throughput sequencing data.

        The pre-sieve statistics are quality metrics measured before applying (optional) minimum mapping quality, blacklist removal, mitochondrial read removal, read length filtering, and Tn5 shift.

        Percent Mapped

        Alignment metrics from samtools stats; mapped vs. unmapped reads.

        For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.

        Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).

        [Plot: Samtools stats: Alignment Scores. # Reads per sample (GSM1689671, GSM1689672, GSM1689692, GSM1689693), split into Mapped and Unmapped.]

        Alignment metrics

        This module parses the output from samtools stats. All numbers in millions.

        [Plots: one panel per metric: Total sequences, Mapped & paired, Properly paired, Duplicated, QC Failed, Reads MQ0, Mapped bases (CIGAR), Bases Trimmed, Duplicated bases, Diff chromosomes, Other orientation, Inward pairs, Outward pairs.]

        SamTools post-sieve

        Samtools is a suite of programs for interacting with high-throughput sequencing data.

        The post-sieve statistics are quality metrics measured after applying (optional) minimum mapping quality, blacklist removal, mitochondrial read removal, and Tn5 shift.

        Percent Mapped

        Alignment metrics from samtools stats; mapped vs. unmapped reads.

        For a set of samples that have come from the same multiplexed library, similar numbers of reads for each sample are expected. Large differences in numbers might indicate issues during the library preparation process. Whilst large differences in read numbers may be controlled for in downstream processing (e.g. read count normalisation), you may wish to consider whether the read depths achieved have fallen below the levels recommended for your application.

        Low alignment rates could indicate contamination of samples (e.g. adapter sequences), low sequencing quality or other artefacts. These can be further investigated in the sequence level QC (e.g. from FastQC).

        [Plot: Samtools stats: Alignment Scores. # Reads per sample (GSM1689671, GSM1689672, GSM1689692, GSM1689693): Mapped.]

        Alignment metrics

        This module parses the output from samtools stats. All numbers in millions.

        [Plots: one panel per metric: Total sequences, Mapped & paired, Properly paired, Duplicated, QC Failed, Reads MQ0, Mapped bases (CIGAR), Bases Trimmed, Duplicated bases, Diff chromosomes, Other orientation, Inward pairs, Outward pairs.]

        deepTools

        deepTools is a suite of tools to process and analyze deep sequencing data.

        PCA plot

        PCA plot with the top two principal components calculated based on genome-wide distribution of sequence reads

        [Plot: deeptools: PCA Plot. x-axis: PC1; y-axis: PC2.]

        Fingerprint plot

        Signal fingerprint according to plotFingerprint

        [Plot: deepTools: Fingerprint plot. x-axis: rank; y-axis: fraction w.r.t. bin with highest coverage.]

        Read Distribution Profile after Annotation

        Accumulated view of the distribution of sequence reads related to the closest annotated gene. All annotated genes have been normalized to the same size.

        • Green: -2.0Kb upstream of gene to TSS
        • Yellow: TSS to TES
        • Pink: TES to 0.5Kb downstream of gene
        [Plot: deeptools: Read Distribution Profile after Annotation. y-axis: occurrence.]

        macs2_frips

        Subread featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations.

        [Plot: featureCounts: Assignments. # Reads per sample (GSM1689671, GSM1689672, GSM1689692, GSM1689693), split into Assigned, Unassigned: No Features and Unassigned: Ambiguity.]

        deepTools - Spearman correlation heatmap of reads in bins across the genome

        Spearman correlation plot generated by deeptools. Spearman correlation is a non-parametric (distribution-free) method, and assesses the monotonicity of the relationship.


        deepTools - Pearson correlation heatmap of reads in bins across the genome

        Pearson correlation plot generated by deeptools. Pearson correlation is a parametric (lots of assumptions, e.g. normality and homoscedasticity) method, and assesses the linearity of the relationship.
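To make the difference between the two correlation measures described above concrete, here is a small pure-Python sketch. It is not how deepTools computes its heatmaps; the helper names are hypothetical and the ranking helper assumes there are no tied values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """Rank values from 1..n (simplification: assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [value ** 3 for value in x]  # monotonic, but not linear
print(pearson(x, y), spearman(x, y))
```

On this toy data the relationship is perfectly monotonic, so Spearman is essentially 1, while Pearson is lower (about 0.94) because the relationship is not a straight line.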


        Peak distributions (macs2)

        The distribution of read pileup around 20000 random peaks for each sample. This visualization is a quick and dirty way to check if your peaks look like what you would expect, and what the underlying distribution of different types of peaks is.


        Peaks per sample distribution (macs2)

        The distribution of peaks between samples. An upset plot is like a venn diagram, but is easier to read with many samples. This figure shows the overlap of peaks between conditions/samples.


        Peak feature distribution (macs2)

        Figure generated by chipseeker


        Distribution of peak locations relative to TSS (macs2)

        Figure generated by chipseeker


        DESeq2 - Sample distance cluster heatmap of counts

        Euclidean distance between samples, based on variance stabilizing transformed counts (RNA: expressed genes, ChIP: bound regions, ATAC: accessible regions). Gives us an overview of similarities and dissimilarities between samples.


        DESeq2 - Spearman correlation cluster heatmap of counts

        Correlation cluster heatmap based on variance stabilizing transformed counts. Spearman correlation is a non-parametric (distribution-free) method, and assesses the monotonicity of the relationship.


        DESeq2 - Pearson correlation cluster heatmap of counts

        Correlation cluster heatmap based on variance stabilizing transformed counts. Pearson correlation is a parametric (lots of assumptions, e.g. normality and homoscedasticity) method, and assesses the linearity of the relationship.


        gimme maelstrom macs2 results

        Gimme maelstrom is a method to infer differential motifs between samples. It solves the system of linear equations Ax = b for x, where A contains the motif scores and b the count table. It combines the results of several methods that solve this problem; the combined result is the table below.
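As a rough illustration of what solving Ax = b for x means here, the sketch below fits two motif activities by ordinary least squares on made-up numbers. This is only one simple way to solve such a system and is not gimme maelstrom's implementation, which combines several methods; the function name, the restriction to two motifs, and all values are assumptions.

```python
def least_squares_two_motifs(A, b):
    """Solve min ||Ax - b||^2 for two unknowns via the normal equations.

    A: one row per region, holding the scores of two motifs.
    b: one signal value per region.
    """
    s11 = sum(row[0] * row[0] for row in A)
    s12 = sum(row[0] * row[1] for row in A)
    s22 = sum(row[1] * row[1] for row in A)
    t1 = sum(row[0] * value for row, value in zip(A, b))
    t2 = sum(row[1] * value for row, value in zip(A, b))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("motif score columns are collinear")
    # Cramer's rule on the 2x2 system (A^T A) x = A^T b.
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Made-up motif scores for four regions and a signal built from activities (2, -1).
A = [[1, 0], [0, 1], [1, 1], [2, 1]]
b = [2, -1, 1, 3]
activities = least_squares_two_motifs(A, b)
print(activities)
```

A positive entry in x means regions containing that motif tend to have higher signal; a negative entry means the opposite.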

        motif | factors (direct or predicted) | z-score gd7_ectoderm | z-score tl10b_mesoderm | % with motif | corr gd7_ectoderm | corr tl10b_mesoderm
        GM.5.0.bZIP.0068 | CREBB | 3.55 | -3.53 | <1 | 0.03 | -0.03
        GM.5.0.Homeodomain.0138 | EMS,E5 | 4.60 | -3.86 | <1 | 0.03 | -0.03
        GM.5.0.C2H2_ZF.0256 | NO ORTHOLOGS FOUND | -3.03 | 0.45 | <1 | -0.01 | 0.01
        GM.5.0.Unknown.0003 | NO ORTHOLOGS FOUND | -0.19 | -3.10 | <1 | 0.02 | -0.02
        GM.5.0.Ets.0019 | ETS98B | -2.29 | 3.55 | <1 | -0.03 | 0.03
        GM.5.0.Nuclear_receptor.0109 | ERR,SRL | 3.77 | -3.31 | <1 | 0.03 | -0.03
        GM.5.0.Nuclear_receptor.0147 | ERR,SRL | -3.81 | 3.91 | <1 | -0.03 | 0.03
        GM.5.0.Homeodomain.0153 | SCRO | 3.09 | -3.45 | 1 | 0.03 | -0.03
        GM.5.0.C2H2_ZF.0273 | HR51,NO ORTHOLOGS FOUND | -4.43 | 5.12 | 1 | -0.04 | 0.04
        GM.5.0.Homeodomain.0099 | TUP,NO ORTHOLOGS FOUND | 3.43 | -1.25 | <1 | 0.02 | -0.02
        GM.5.0.C2H2_ZF.0266 | NO ORTHOLOGS FOUND | 2.97 | -3.60 | 1 | 0.03 | -0.03
        GM.5.0.AT_hook.0006 | NO ORTHOLOGS FOUND,BDP1,FOXP,SR,CG5098, (...) | -3.20 | 2.86 | <1 | -0.03 | 0.03
        GM.5.0.bHLH.0078 | FER1 | -3.03 | 3.08 | <1 | -0.03 | 0.03
        GM.5.0.bHLH.0112 | KN | -3.08 | 3.29 | <1 | -0.03 | 0.03
        GM.5.0.Forkhead.0015 | FD96CA,FD59A,FOXK,FD19B,BIN, (...) | -3.44 | 3.65 | 1 | -0.04 | 0.04
        GM.5.0.bZIP.0070 | MAF-S | -3.18 | 0.99 | 1 | -0.01 | 0.01
        GM.5.0.TEA.0002 | SD | -3.42 | 3.14 | 1 | -0.03 | 0.03
        GM.5.0.Unknown.0026 | NO ORTHOLOGS FOUND,ELYS | -1.99 | 3.34 | <1 | -0.03 | 0.03
        GM.5.0.Ets.0028 | PNT | 3.27 | -3.27 | <1 | 0.03 | -0.03
        GM.5.0.Nuclear_receptor.0122 | NO ORTHOLOGS FOUND | -3.90 | 3.29 | 1 | -0.03 | 0.03
        GM.5.0.C2H2_ZF.0229 | NO ORTHOLOGS FOUND | 3.46 | -2.90 | 1 | 0.03 | -0.03
        GM.5.0.Zinc_cluster.0001 | CG12659 | -3.18 | 3.22 | 1 | -0.03 | 0.03
        GM.5.0.E2F.0020 | E2F1,E2F2,DP,NO ORTHOLOGS FOUND | -3.38 | 2.93 | 2 | -0.02 | 0.02
        GM.5.0.C2H2_ZF.0161 | NO ORTHOLOGS FOUND | 3.44 | -3.31 | 1 | 0.04 | -0.04
        GM.5.0.Mixed.0049 | MAX | -4.31 | 4.30 | 1 | -0.03 | 0.03
        GM.5.0.C2H2_ZF.0183 | CHD1,PNT,NO ORTHOLOGS FOUND | -3.77 | 3.62 | 1 | -0.03 | 0.03
        GM.5.0.Homeodomain.0107 | DFD,SCR,PB,LAB,VSX2, (...) | 5.65 | -4.47 | 2 | 0.05 | -0.05
        GM.5.0.Myb_SANT.0013 | ISWI,NO ORTHOLOGS FOUND | -3.54 | 2.89 | 1 | -0.03 | 0.03
        GM.5.0.Homeodomain.0037 | NO ORTHOLOGS FOUND | -3.20 | 3.60 | 2 | -0.03 | 0.03
        GM.5.0.C2H2_ZF_Homeodomain.0004 | DA,L_1_SC,HLH4C,AC,SC, (...) | 1.77 | -3.21 | 2 | 0.02 | -0.02

        Samples & Config

        The samples file used for this run:

        sample      assembly  biological_replicates  descriptive_name       control
        GSM1689671  BDGP6.32  gd7_ectoderm           gd7_ectodermal_rep1    GSM1689679
        GSM1689672  BDGP6.32  gd7_ectoderm           gd7_ectodermal_rep2    GSM1689680
        GSM1689692  BDGP6.32  tl10b_mesoderm         tl10b_mesodermal_rep1  GSM1689700
        GSM1689693  BDGP6.32  tl10b_mesoderm         tl10b_mesodermal_rep2  GSM1689701
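The biological_replicates column determines which samples belong to the same condition. A minimal sketch of reading such a samples file and grouping samples by condition, assuming the file is tab-separated as the config comment states (the function name is hypothetical and the table is embedded as a string only to keep the example self-contained):

```python
import csv
import io
from collections import defaultdict

# The samples table from this report, written out tab-separated.
SAMPLES_TSV = (
    "sample\tassembly\tbiological_replicates\tdescriptive_name\tcontrol\n"
    "GSM1689671\tBDGP6.32\tgd7_ectoderm\tgd7_ectodermal_rep1\tGSM1689679\n"
    "GSM1689672\tBDGP6.32\tgd7_ectoderm\tgd7_ectodermal_rep2\tGSM1689680\n"
    "GSM1689692\tBDGP6.32\ttl10b_mesoderm\ttl10b_mesodermal_rep1\tGSM1689700\n"
    "GSM1689693\tBDGP6.32\ttl10b_mesoderm\ttl10b_mesodermal_rep2\tGSM1689701\n"
)


def group_by_condition(tsv_text):
    """Map each biological_replicates value to the samples that share it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[row["biological_replicates"]].append(row["sample"])
    return dict(groups)


groups = group_by_condition(SAMPLES_TSV)
for condition, samples in groups.items():
    print(condition, samples)
```

The two resulting groups correspond to the gd7_ectoderm and tl10b_mesoderm columns of the gimme maelstrom table.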

        The config file used for this run:
        # tab-separated file of the samples
        samples: samples.tsv
        
        # pipeline file locations
        result_dir: ./results  # where to store results
        genome_dir: ./genomes  # where to look for or download the genomes
        # fastq_dir: ./results/fastq  # where to look for or download the fastqs
        
        
        # contact info for multiqc report and trackhub
        email: yourmail@here.com
        
        # produce a UCSC trackhub?
        create_trackhub: true
        
        # how to handle replicates
        biological_replicates: fisher  # change to "keep" to not combine them
        technical_replicates: merge    # change to "keep" to not combine them
        
        # which trimmer to use
        trimmer: fastp
        
        # which aligner to use
        aligner: bwa-mem2
        
        # filtering after alignment
        remove_blacklist: true
        min_mapping_quality: 30
        only_primary_align: true
        remove_dups: true
        
        # peak caller
        peak_caller:
          macs2:
              --buffer-size 10000
        #  genrich:
        #      -y -q 0.15
        
        # how much peak summits will be extended by (on each side) for the final count table
        # (e.g. 100 means a 200 bp wide peak)
        slop: 100
        
        # whether or not to run gimme maelstrom to infer differential motifs
        run_gimme_maelstrom: true
        
        # differential peak analysis
        # for explanation, see: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html
        #contrasts:
        #  - 'descriptive_name_all_HEL'
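The slop setting in the config above extends each peak summit on both sides, so slop: 100 yields 200 bp wide peaks. A small sketch of that arithmetic follows; the function name, the 0-based half-open coordinate convention and the clamping at position 0 are assumptions for illustration, not seq2science's documented behaviour.

```python
def extend_summit(summit, slop=100):
    """Return (start, end) of a peak centred on a summit position.

    Assumes 0-based, half-open coordinates and clamps the start at 0.
    """
    start = max(0, summit - slop)
    end = summit + slop
    return start, end


start, end = extend_summit(1000, slop=100)
print(start, end, end - start)
```

Summits closer than slop to the start of a contig would produce a narrower peak under this clamping assumption.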