Changelog

All notable changes to seq2science will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

All changes fall under one of these types: Added, Changed, Deprecated, Removed, Fixed or Security.

Unreleased 

1.2.4 - 2025-05-14

Fixed

updated genomepy version (works with all providers again)
fixed chipseeker warnings
fixed upsetplot warnings

1.2.3 - 2025-05-09

Fixed

replaced older conda version with newer conda version, seems to install packages again
new chipseeker env

1.2.2 - 2024-02-03

Fixed

chipseeker env got corrupted, it should work again.
replaced deprecated --split-e flag with --split-3 flag for fastq downloading
removed support for GSA as their “API” changed

1.2.1 - 2023-11-15

Fixed

MultiQC error (module not found ‘imp’)

1.2.0 - 2023-09-18

Changed

DESeq2 now uses more samples to estimate count dispersions
- all samples with a label in the condition column are used
- (this feature was previously dependent on a batch effect correction in the contrast design)

Fixed

(major) regression where peak calling input controls were being ignored.

1.1.0 - 2023-09-13

Added

download samples directly from ENCODE by assay (ENCSR) and file (ENCFF) accession ids.

Changed

the init, run, and explain commands display the supported workflows in their --help

Fixed

issue when specifying colors in the samples table, which caused the QC report to no longer render the table correctly.

1.0.4 - 2023-09-05

Changed

use the snakemake greedy scheduler as default as the ilp scheduler struggles with “many” samples
use braLan3 for motif2factors instead of braLan2

1.0.3 - 2023-07-26

Fixed

issue with printing a nice traceback when the SRA is unresponsive
more informative error with troubles inferring the strandedness of samples

1.0.2 - 2023-07-14

Fixed

crash with combination of technical reps and biological reps when combining them.
idr bug with numpy dependency > 1.20
replacing all spaces with underscores in the samples.tsv
- should only affect columns where this was not enforced already (custom columns)
- required for rule multiqc_samplesconfig
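
The space-to-underscore replacement described in this release can be pictured with a minimal Python sketch. This is not the seq2science implementation; the function name `sanitize_row` and the example column names are hypothetical and only illustrate the idea on a single row of a samples table.

```python
def sanitize_row(row: dict[str, str]) -> dict[str, str]:
    """Return a copy of one samples.tsv row with spaces replaced by underscores."""
    return {column: value.replace(" ", "_") for column, value in row.items()}


# Hypothetical row with a custom column whose value contains a space.
row = {"sample": "GSM123", "assembly": "hg38", "my_custom_column": "wild type"}
clean = sanitize_row(row)
print(clean["my_custom_column"])  # wild_type
```

Values without spaces pass through unchanged, which matches the note that only columns where this was not already enforced should be affected.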

1.0.0 - 2023-05-31

Added

CRAM support for ATAC+ChIP+RNA-seq workflows (in addition to the existing alignment workflow cram support)

Changed

sctk yaml simplified

Fixed

DESeq2 should no longer crash without DE genes
bug with single-ended reads and subread
gimme maelstrom dependency missing
gimme maelstrom bug when XDG_CACHE_DIR is not set

0.9.9 - 2023-04-21

Changed

moved downloading fastqs to localrules
bam indexes are kept (not automatically removed)
Salmon updated to the latest version v1.10.1 (fixes a bug)
upsetplot updated to the latest version (fixes a bug)
genomepy updated to the latest version (no reason)
tabulate updated to the latest version (longer python support)
everything else updated to the latest version
--snakemakeOption debug_dag=True can now be used with 1 core (required)
creating conda environments now faster
- updated conda & mamba
- dropped indexing of Conda’s defaults channel
one less global variable! (sanitized_samples)
dropped correlation scores from DESeq2 clusterplots
- pheatmap is too finicky to get the fontsize right
pheatmap uses the sample order (from the samples.tsv) as best as possible

Fixed

edge case when a GSM sample is a reanalysis of another GSM sample.
error message referring to --config while it should be --configfile
cyclic dependency on rule samtools_sort (caused by tildes in config paths)
bug in DESeq2 related rules when using custom assemblies
clear error message when downloading single-end data annotated as paired-end.
“Max retries exceeded with url” for CRX samples
upsetplot & assembly_stats segfault due to interactive matplotlib backend
DESeq2 error: “EOF within quoted string”
conda environment channel priorities
trackhub index generation now removes rogue spaces from the annotation (instead of crashing)

0.9.8 - 2023-02-01

Added

(experimental) support for the Chinese Genome Sequence Archive (GSA). Samples can start with their CRX identifiers.
a force_assembly_hub flag to make a UCSC assembly hub even though a trackhub already exists

Changed

MultiQC version updated (1.14)

Fixed

edge case with the downloading-fastq workflow when samples file has an assembly column
workflow explanation not being properly added to qc report
download-fastq finishing successfully with an error message

0.9.7 - 2023-01-03

Added

a message at the end of a successful seq2science run stating where to find the report and the trackhub
nicer error when genomepy has trouble querying the providers
flag infer_motif2factors for whether or not motif2factors should be run

Changed

Snakemake backend updated to most recent version
for atac-seq workflow macs2_keep_mates is enabled by default.
Workflow DAGs in the documentation are now simplified
MultiQC version updated
Fastqs downloaded by seq2science are now removed when not used anymore, can be turned on/off with keep_downloaded_fastq
updated gimmemotifs

Fixed

fixed macos errors thanks to reporting of @Jerolen and @WouterVGKULEUVEN
clear error when specifying unavailable trimmer (#888)
fixed bug with rule combine_biological_reps when no biological reps/descriptive names are present
issue computeMatrix_gene without configurable distances, visualized in multiQC plotProfile (#905; default now 3000 bp up&down stream of gene)

0.9.6 - 2022-10-31

Changed

all conda environments now work with strict channel priorities
singlecellTK environment updated (no longer needs pip)
increased expected RAM usage of scRNA-seq rule sctk_qc

Fixed

outdated dependency in scRNA-seq rule export_sce_obj
error in singlecellTK script with negative count values in reportCellQC()
softmask_track_1 should no longer hang indefinitely
On UCSC assembly hubs, the softmask track should align better (fixed off-by-1)
upsetplot environment being broken (matplotlib version pinned)
deeptools environment being broken (matplotlib version pinned)

0.9.5 - 2022-09-01

Changed

no longer writes multiqc filenames to an intermediate file
Updated kb-python to 0.27.3

Fixed

downloading fastq from ena directly fixed
softmask_track_1 should no longer hang indefinitely
On UCSC assembly hubs, the softmask track should align better (fixed off-by-1 error in #896)

0.9.4 - 2022-07-07

Fixed

(hotfix) pinned the snakemake backend so that rerunning works

0.9.3 - 2022-06-17

Added

Seq2science makes a biological replicates count table, which is the mean of the biological replicates.
Seq2science now supports differential motif analysis by gimme maelstrom!!!
configurable setting niceness which sets a niceness prefix to all shell commands.
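
As an illustration of the biological replicates count table, the sketch below averages per-sample counts within each biological replicate group. It is a simplified, hypothetical example in plain Python: the function name, the dictionary layout and the sample names are assumptions, and seq2science's own table handling may differ.

```python
from collections import defaultdict
from statistics import mean


def mean_per_biological_replicate(counts, replicate_of):
    """Average the counts of each feature over the samples of one biological replicate.

    counts:       {sample: {feature: count}}
    replicate_of: {sample: biological replicate name}
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for sample, features in counts.items():
        for feature, count in features.items():
            grouped[replicate_of[sample]][feature].append(count)
    return {
        group: {feature: mean(values) for feature, values in features.items()}
        for group, features in grouped.items()
    }


# Hypothetical counts for three samples, two of which share a biological replicate.
counts = {
    "s1": {"geneA": 10, "geneB": 0},
    "s2": {"geneA": 20, "geneB": 4},
    "s3": {"geneA": 7, "geneB": 1},
}
replicate_of = {"s1": "ctrl", "s2": "ctrl", "s3": "treated"}
table = mean_per_biological_replicate(counts, replicate_of)
print(table["ctrl"]["geneA"])  # 15
```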

Fixed

issue with thread parsing when threads < 12
seq2science should fully work with slurm
samples moved to the cloud on SRA can be downloaded again with newest pysradb.
issue with generating a trackhub index with tiny transcripts (we just remove them :) )
bug with sralite files having the wrong file extension so they’re not recognized as downloaded

0.9.2 - 2022-05-30

Added

seq2science specific lockexception and cleanup metadata errors
deseq2science now accepts the optional argument --assembly, which can be used if the samples.tsv contains >1 assembly to specify which one is used.
- By default, the first assembly is used (same as before)

Changed

rules that download something get re-tried once, in case internet is unstable
bam files are no longer copied when sieving is not required
moved blacklist rules to blacklist.smk
rule inputs now use rules.rulename.output where possible
renamed .smk files to match the naming schemes of the other .smks.
added additional comments to clarify what happens to bam files
cleanup cache+tarballs of conda environments, saving lots of precious disk space

Fixed

fixed custom assembly extensions (e.g. ERCC spike-ins) for scATAC-seq and scRNA-seq
profiles work again
deseq2science now has a clear separation between positional and optional arguments
issue with blacklist bed containing more than 3 columns

0.9.1 - 2022-05-10

Changed

updated snakemake
- effective genome size is now estimated per kmer length instead of per sample since checkpoints should work again.

0.9.0 - 2022-05-10

Changed

renamed most globals in uppercase (main exceptions are config and samples, treps and breps)
moved most configuration steps into functions (reducing the number of stray globals)
replaced static functions with dictionaries
moved replicate stuff to the configuration
Updated Salmon
Added the option for Salmon to use the full genome as decoy sequence
Salmon now uses the full genome as decoy sequence by default.
- Config option quantifier_decoys controls which level of decoy aware quantification you want (options are ‘none’, ‘partial’ and ‘full’)
- Option ‘partial’ is insanely memory intensive, and the Salmon docs suggest no benefit…
improved parsing of the samples.tsv. More errors early on, to prevent headache later!

Fixed

get_fastq_pair_reads() was using one sample, not any sample
error message not working when trimming in scRNA-seq
trackhubs when using a mix of stranded and unstranded datasets
fix samples.tsv checks for forbidden symbols

0.8.0 - 2022-04-29

Added

idr call is configurable (idr_options)
single-cell DESeq2 (currently only via deseq2science with user-specified groups per cell)
scRNA quality control workflow with singleCellTK
- cell calling/filtering with DropletUtils
- mitochondrial gene set detection/filtering
- doublet identification/filtering with scDblFinder
- processing of alternative experiments, such as spike-in expression
- qc report generation for cell/droplet based experiments
added Seurat and FlatFile format export to scRNA qc workflow
added parameter to select velocity matrix for qc and export

Changed

rna-seq creates a TPM table for each quantification method
raw/processed scRNA count tables are now stored and exported to SingleCellExperiment S4 objects instead of Seurat S4 objects
moved scRNA post processing to separate module
export unspliced velocity counts to separate sce object
seq2science should be less susceptible to poor programming environment management by using the conda-ecosystem-user-package-isolation package
seq2science will now demand all requirements exactly the way it likes it
- this will make the workflows more stable.
local fastq files are no longer renamed (and should just work)
scRNA-seq trimming code simplified

Removed

removed scRNA merging rule due to memory issues with large and sparse samples
removed deprecated scRNA post-processing workflow (superseded by singleCellTK qc workflow)

Fixed

fixed bug causing incorrect genome string in read_kb_counts.R
bams generated with(out) filtering on size and tn5 shifting weren’t removed when not necessary anymore

0.7.2 - 2022-03-04

Added

TPM to gene counts conversion with pytxi
- by default attempts to use the GTF file to convert transcript_ids to gene_names
- otherwise will use MyGene.info
config option tpm2counts to choose which TPM to counts converter to use

Changed

pytxi is now the default TPM to gene counts converter (over tximeta)
peak/gene counts tables now use descriptive names (if given)
MultiQC DESeq2 correlation plots now display correlation metrics in the figure
using awful practices to eliminate checkpoint strandedness
deeptools_flags renamed to deeptools_bamcoverage
rna-seq trackhub per base tracks by default instead of bins per 50

Fixed

edge-cases where seq2science was too strict with rerunning
assembly stats log scale on the y-axis
s2s explain won't tell you about subsampling to -1 (all) reads
tn5 shift cigar string parsing edge-case (reference deletions/insertions)

0.7.1 - 2022-02-10

Fixed

issue with broad peaks and upsetplots

0.7.0 - 2022-02-02

The biggest change is that we revert to snakemake 5.18, since higher versions of snakemake cause instability.

Added

upset plot as QC for peak calling. Should give a first feeling about the distribution of peaks between samples/conditions.

Changed

downgraded the snakemake backend as snakemake 6+ is unstable for us.

Fixed

corrupt environment creation with libreadline for edgeR normalization.
subsampling causing a crash caused by bad syntax.

0.6.1 - 2021-12-17

Fixed

corrupt environment creation with libcrypto in combination with strandedness rule

0.6.0 - 2021-12-12

Release 0.6.0 is a mix of bug fixes, small changes, and bigger stuff. Most importantly:

added a deseq2science command to do differential expression analysis on user-supplied tables with seq2science settings
for single-cell RNA-seq ADT-quantification is possible
snakemake library updated, giving seq2science a new-ish look :)

The full changes are listed below:

Added

added generic stats to the MultiQC report about the assembly, which might help pin point problems with the assembly used.
added the slop parameter to the config.yaml of atac-seq and chip-seq workflows, just so they are more visible.
added support for seurat object export and merging for kb workflow.
added support for CITE-seq-count for ADT quantification
added the option to downsample to a specific number of reads.
new deseq2science command

Changed

Seq2science now makes a separate blacklist file per blacklist option (encode & mitochondria), so that e.g. RNA-seq and ATAC-seq workflows can be run in parallel and don’t conflict on the blacklist.
error messages don’t show the full traceback anymore, making it (hopefully) more clear what is going wrong.
The effective genome size is no longer calculated per sample, but per read length. When dealing with multiple samples of similar length, this reduces the computational burden considerably.
samtools environment updated to version 1.14
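
The benefit of computing the effective genome size per read length rather than per sample can be sketched with simple memoization. The example below is only an illustration under stated assumptions: `effective_genome_size` is a hypothetical placeholder that returns a dummy number, and the sample names and read lengths are made up.

```python
from functools import lru_cache

calls = []  # records which read lengths were actually computed


@lru_cache(maxsize=None)
def effective_genome_size(read_length: int) -> int:
    """Placeholder for an expensive computation that depends on read length only."""
    calls.append(read_length)
    # Dummy stand-in value (an assumption); a real estimate depends on the genome.
    return 1_000_000 - read_length


# Hypothetical samples: two of the three share a read length.
read_length_of = {"sample1": 50, "sample2": 50, "sample3": 75}
sizes = {sample: effective_genome_size(n) for sample, n in read_length_of.items()}
print(len(calls))  # 2
```

Three samples trigger only two computations, because samples with the same read length reuse the cached result.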

Fixed

config option slop is now passed along to each rule using it
edge-case where local samples are in the cache, but not present in the fastq_dir
bug with differential peak/gene expression across multiple assemblies
bug with kb ref not creating index for non-velocity analysis
bug with count import in read_kb_counts.R for technical replicates and meta-data handling
deseq2 ordering in multiqc report
issue with slop not being used for the final count table
bug with onehot peaks not reporting the sample names as columns

0.5.6 - 2021-10-19

Added

MA plot, volcano plot, and PCA plots added to QC report for deseq2 related workflows

Changed

updated salmon & tximeta versions
colors for DESeq2 distance plots “fixed”
updated bwa-mem2 version and reduced the expected memory usage of bwa-mem2 to 40GB
seq2science now uses snakemake-minimal

Fixed

stranded bigwigs are no longer inverted (forward containing reverse reads and vice-versa).
fix in rename_sample preventing the inversion of R1 and R2 FASTQs.
bug with parsing cli for explanations
show/hide buttons for treps are actually made for multiqc report
fixes in deseq2/utils.R
- the samples.tsv will now work with only 2 columns
- the samples.tsv column names will be stripped of excess whitespace, similar to the config.
ATAC-seq pipeline removing the final bams, keeping the unsorted one

0.5.5 - 2021-09-01

Changed

duplicate read marking happens before sieving and no reads get removed. Removal of duplicate reads now controlled with flag remove_dups in the config.
changed option heatmap_deeptools_options to deeptools_heatmap_options
Updated sra tools and parallel fastq-dump versions
Updated genomepy version
Gene annotations are no longer gzipped and ungzipped. This should reduce rerunning.

Fixed

rerunning being triggered too easily by input order
issue with qc plots and broad peaks
magic with prefetch not having the same output location on all machines
issue with explain having duplicate lines

0.5.4 - 2021-07-07

Added

added support for kb-python kite workflow

Changed

kb count output validation
optional barcodefile argument for scRNA-seq workflow
MultiQC updated to newest version
updated kb-python version

0.5.3 - 2021-06-03

Added

DESeq2 blind sample distance & correlation cluster heatmaps for RNA-, ATAC- and ChIP-seq counts
- find them annotated in the MultiQC when running >1 sample

Changed

“biological_replicate” and “technical_replicate” renamed to “…_replicates” (matches between samples.tsv & config.yaml)
fixed bug with seq2science making a {output.allsizes} file
Changed explain to use ‘passive style’
Genrich peak calling defaults
- Doesn’t remove PCR duplicates anymore (best to do with markduplicates)
- Changed extsize to 200 to be similar to macs settings
- Turned off tn5 shift, since that is done by seq2science

Fixed

depend less on local genomes (only when data is unavailable online)
trackhub explanation was missing, added
bug with broad peaks and qc that could not be made

0.5.2 - 2021-05-10

Added

added rule for scRNA post-processing R Markdown for plate/droplet based scRNA protocols (experimental)
added explanation for kb_seurat_pp rule
heatmap of N random peaks to the multiqc report in the end

Fixed

removed a warning of genome.fa.sizes already existing due to it already being downloaded beforehand (it’s removed in between)
genomepy’s provider status checking not being used.

0.5.1 - 2021-04-01

Added

added CLI functionality to the deseq2.R script (try it with Rscript /path/to/deseq2.R --help!)
--force flag to seq2science init to automatically overwrite existing samples.tsv and config.yaml
local fastqs with Illumina’s ‘_100’ are now recognized
added the workflow explanation to the multiqc report

Changed

config checks: all keys converted to lower case & duplicate keys throw an exception
MultiQC updated to v1.10
Link to seq2science log instead of snakemake log in final message

Fixed

Issue when filtering a combination of single-end and paired-end reads on template length
explain functionality testing
scATAC can properly use SE fastqs
scRNA can use fqexts other than R1/R2
fastq renaming works again
added missing schemas to extended docs
Bug with edgeR.upperquartile normalization. Now makes everything NaN, so pipeline finishes successfully.

0.5.0 - 2021-03-03

Version 0.5.0 brings many quality of life improvements, such as seq2science automatically inferring what needs to be re-run when changing the samples.tsv and/or the config.yaml, differential peak analysis for chip/atac workflows and tab-completion!

To (hopefully) clear things up, we changed what technical and biological replicates are called: they are now technical and biological replicate, where they used to be replicate and condition.

It is important to note that the RNA-seq workflow DOES NOT remove duplicate reads anymore by default, and that the sc/bulk ATAC-seq workflows now filter reads on the nucleosome-free region by default.

Changed

Keep all duplicate reads in RNA-seq by default
Slimmed down the config printed at the start of a run
Changed some rules into localrules when executed on a cluster
moved onehot peaks to counts_dir
DESeq2 contrasts now accept any column names
- groups still cannot contain underscores
- no longer accepts one group name
- more examples added to the docs!

Added

dupRadar module to analyse read duplication types in RNA-seq
Differential peak analysis for ATAC- and ChIP-seq!
Options to filter bams by minimum and maximum insert sizes (added to config of bulk/sc atac)
Support experiment ids for EBI ENA and DDBJ for downloading public samples
More robust expression handling for BUS format detection from kb-python arguments
Short-hand BUS syntax for indrop v1/v2
Seq2science now supports tab-completion
Seq2science now outputs a logfile in the directory it is run

Fixed

renamed more old “replicate” variables to the new “technical_replicate”
minor logging tweak
Chipseeker now works without defining descriptive name column
fix bug in resources parsing of profiles
small bug when naming a column condition in non peak-calling workflows

0.4.3 - 2021-01-26

Changed

updated tximeta to 1.6.3 and related packages to fit (now uses R 4)
RNA-seq: sample distance matrix font scales with number of samples (should improve readability)

Fixed

RNA-seq: added sample distance matrix back to MultiQC
RNA-seq: sample distance matrix legend fixed
combine peaks with biological_replicates: keep now uses the correct peaks

0.4.2 - 2021-01-19

Changed

Updated kb-python to 0.25.1
RNA-seq with Salmon will still use bam-related QC files if bams are generated (create_trackhub = True)

Fixed

gimmemotifs not working with newest pandas, now a fixed pandas version

0.4.1 - 2020-12-18

Added

more explanations for rules

Fixed

custom genome annotations for single-cell RNA-seq workflow
trackhubs no longer looking for reversed strands if none are present

0.4.0 - 2020-12-11

Added

new workflow: (BETA) single cell RNA!

Changed

replicate renamed to technical_replicate and condition renamed to biological_replicate
bwa-mem2 default aligner for genomic workflows, instead of bwa-mem
interactive deeptools correlation heatmaps with static dendrograms in multiqc report
trackhub file permissions are set to 755 so to host the files online you don’t have to change those anymore

Fixed

bug in chip/atac trackhub generation where peaks and bigwigs used the same name, resulting in collisions and a trackhub that does not want to load
(literal) genome edge-case where taking the slop of peaks results in identical peaks. One of the duplicates is removed.
IDR should work again
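
One way identical peaks can arise from taking the slop is sketched below. It assumes that extended peaks are clamped to the chromosome bounds, which the changelog does not state; the function name `slop_peaks`, the coordinates and the chromosome sizes are hypothetical.

```python
def slop_peaks(summits, slop, chrom_sizes):
    """Extend summits by `slop` bp on both sides, clamp to the chromosome, drop duplicates."""
    seen = set()
    peaks = []
    for chrom, summit in summits:
        start = max(0, summit - slop)
        end = min(chrom_sizes[chrom], summit + slop)
        peak = (chrom, start, end)
        if peak not in seen:  # keep only the first of any identical peaks
            seen.add(peak)
            peaks.append(peak)
    return peaks


# Two summits on a tiny scaffold collapse into the same interval after clamping.
chrom_sizes = {"scaffold1": 150, "chr1": 10_000}
summits = [("scaffold1", 60), ("scaffold1", 80), ("chr1", 500)]
print(slop_peaks(summits, slop=100, chrom_sizes=chrom_sizes))
# [('scaffold1', 0, 150), ('chr1', 400, 600)]
```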

0.3.2 - 2020-11-26

Added

a check to see if the downloaded fastq from ENA is not empty. Related to a recent internal error (guess) at the side of ENA sending empty fastq files
a custom message when a rule fails, that redirects to the docs

Changed

sample layout lookup is split up in 100’s, to avoid a jsondecodeerror which results from very long lists of samples
the multiqc samples & config tables are generated in a script with its own environment to make base env smaller
keep_mates for macs2 turned into a script with its own environment to make the base env smaller
seq2science cache now respects the xdg cache
moved genome downloading rules into scripts instead of run directives, should result in user-friendlier errors
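
Splitting the sample layout lookup into batches of 100 can be sketched as follows. The helper names `chunked` and `lookup_layouts`, and the `fake_query` stand-in for the remote lookup, are hypothetical; only the batch size of 100 comes from the entry above.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_layouts(samples, query):
    """Call `query` once per batch of samples and merge the per-batch results."""
    layouts = {}
    for batch in chunked(samples):
        layouts.update(query(batch))
    return layouts


batch_sizes = []


def fake_query(batch):
    """Stand-in for a remote lookup; records the batch size it was called with."""
    batch_sizes.append(len(batch))
    return {sample: "PAIRED" for sample in batch}


samples = [f"SRR{i}" for i in range(250)]
layouts = lookup_layouts(samples, fake_query)
print(batch_sizes)  # [100, 100, 50]
```

Each request stays small, so a single very long sample list never reaches the service in one call.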

0.3.1 - 2020-11-16

Added

Added support for multiple scrna-seq platforms (Kallistobus)
Fastp detects the correct mate for trimming based on BUS settings.
Support for Kallistobus short-hand syntax.

Fixed

make a trackhub index when the gene_name is not present in gtf file
make a trackhub index when the gene_name is not present in all gene entries
update Salmon & salmon rules

0.3.1 - 2020-11-05

Added

trackhub: automatic color selection
trackhub: specify colors with the “colors” column in the samples.tsv. Accepts RGB and matplotlib colors.
trackhub: grouped samples in a composite track with sample filters and composite control

Changed

updated genomepy to 0.9.1: genomes will have alternative regions removed (if designated with “alt” in the name)
trackhub: better defaults for each track
layouts are stored per version, as to not have collisions in the way these are stored between versions.
scATAC no longer supports trackhub
bigwigs are now (BPM) normalized by default

Fixed

markduplicates now uses $TMP_DIR, if it is defined
RNA-seq cluster figures weren't displaying text on some platforms
not using the local annotation files
not recognizing a mix of gzipped and unzipped annotation files
bigwigs are now correctly labelled forward/reverse (when protocol was stranded)
trackhub: RNA-seq trackhub now displays both strands of the bigwig (when protocol was stranded)
trackhub: track order is now identical to the samples.tsv (was alphabetical for ChIP-/ATAC-seq)
trackhub: assembly hub index now returns gene_name instead of transcript_id.
bug where edgeR (upperquartile) normalization failed. Not sure why it fails, but when it does, it now returns a dataframe of nan instead of failing the rule, and thus the whole pipeline.
use gimmemotifs 0.15.0, so gimme.combine_peaks works with numeric chromosome names
s2s is slightly more lenient with an edge-case when running seq2science in parallel
clearer error message when trying samples that can not be found
edge case with trying to dump sra from empty directory
now give a nice error message when a technical replicate consists of a mix of paired-end and single-end samples
issue with large number of inputs for multiqc exceeding the os command max length
bug with downloading only SRR/DRR samples (but no GSM)
issue with async generation of genome support files
checking for sequencing runs when sample is already downloaded

0.3.0 - 2020-09-22

Added

fastp as trimmer (default), which makes trimgalore the optional alternative trimmer
you can now specify an url for your samples file
RNA-seq: gene_id to gene_name conversion table will be output for downstream analysis
- (may be empty if gtf didn’t contain both fields or wrong formatting)
RNA-seq: quantifying with salmon will now also output a gene length table
- (gene lengths, tpms and gene counts can still be found together in the SingleCellExperiment object)

Changed

make use of pysradb for querying layout and SRR ids instead of API and web-scraping
markduplicates now removes duplicates as default
testing: clear genomepy caches between runs
add parallel-fastq-dump fallback to fasterq-dump
configuration rules split into more sections
DESeq2 options renamed (from diffexp to deseq2 and contrasts)
DESeq2 will now generate batch corrected counts (and TPMs for Salmon) for all samples, based on the set condition column.
- (batch corrected output is still meant for downstream analysis that cannot model batch effects independently, e.g. plotting)

Fixed

issue with control and technical replicates
now also SRR numbers can be directly downloaded from ENA
python3.8 syntaxwarnings
chipseeker missing gtf input
bugs with explain
bwa-mem2 not working with less than 12 cores
batch corrected TPMs no longer break when samples/rows are subset.

0.2.3 - 2020-09-01

Changed

retry mechanic for genomepy functions
moved RNA-seq sample clustering to the MultiQC
updated genomepy

Fixed

suffix being overwritten by layouts
issue with combining conditions and ruleorder for macs2
Assembly hub correctly showing annotations
.fa.sizes staying empty

0.2.2 - 2020-08-24

Added

option to add custom files to each assembly (such as ERCC spike ins for scRNA-seq)

Changed

assemblies are now checked in the configuration, similar to samples
get_genome was split in 3 rules, allowing for less reruns
Profiles are now parsed by the s2s wrapper
Checking for validity of samples.tsv now happens with pandasschema
Explicit priority arguments to all group jobs (aligner + samtools_presort)
Snakemake version (5.22.1)
Reduced threads on salmon indexing (matching aligners)
Make use of fasterq-dump instead of parallel-fastq-dump

Fixed

Test no longer use old cache files
Profiles no longer overwrite command line arguments
Fixed edge-case with condition column in samples but no peak-calling
Downloading sra with prefetch tries multiple times to correct for lost connection
Ambiguity exception with rule narrowpeak_summit
combine_peaks makes use of biological replicate’s peaks, not technical replicate’s peaks
Bug with direct peak-calling on conditions

0.2.1 - 2020-08-10

Added

Chipseeker images in MultiQC report
Samples that are on ENA are now directly downloaded from ENA as fastq. This means we skip the CPU intensive dumping step!

Fixed

Fixed issue with some samples not being findable/downloadable with s2s
Fixed has_annotation always looking for annotation even if local files present
Fixed bug where scatac-seq workflow was making fastqc reports per sample

Changed

will try to get UCSC gene annotations in Ensembl format (which uses gene IDs for the gene_id field, contrary to the UCSC format that uses transcript IDs. Wild huh?)

0.2.0 - 2020-08-04

Fixed

Allow for same condition name across different assemblies & different controls

Added

HISAT2 as aligner for RNA-seq
splice-aware HISAT2 indexing for RNA-seq
quantifier HTSeq for RNA-seq
quantifier featurecounts for RNA-seq
Salmon will output a gene-level TPM matrix as well
added/expanded seq2science explain info (now covers RNA- and scATAC-seq too)
sequencing strandedness may now be inferred automatically (unless specified in the config/samples.tsv)
strandedness results are displayed in the multiQC under “Strandedness”
a DEXSeq counts matrix can now be generated with dexseq: True
seq2science CLI now has the same reason flag as snakemake (-r/--reason flag)
(re)added fnwi + rimls logos to the qc reports that went missing in seq2science migration

Changed

rules and script names in RNA-seq. ex: txi.R is now quant_to_counts.R to better reflect its function
quant_to_counts.R now converts salmon transcript abundances to gene counts identically to DESeq2
STAR no longer outputs counts, and is no longer found under quantifiers
gene counts are generated from (filtered) bams when using either STAR or HISAT2 as aligner and HTSeq or featureCounts as quantifier
batch corrected gene counts are generated if a DESeq2 design contrast includes a batch
batch corrected TPM are generated if a DESeq2 design contrast includes a batch, and quantification was performed using Salmon
- for use in ANANSE, for instance
seq2science explain now retrieves messages from explain.smk.
seq2science explain now uses profiles and snakemakeOptions.

Fixed

the alignment workflow no longer uses strandedness
seq2science CLI can now be run without cores with a dryrun or profile with cores
Jenkins code style (now uses mamba to install flake8)

0.1.0 - 2020-07-15

Added

bwa-mem2 as aligner
new command-line option explain, which explains what has been done, and writes your material & methods section for you!

Changed

change the workflow names, replaced _ by -. (download_fastq to download-fastq, chip_seq to chip-seq, atac_seq to atac-seq, scatac_seq to scatac-seq, and rna_seq to rna-seq)
changed the way seq2science is called. Moved all the logic from bin/seq2science to seq2science/cli.py

Fixed

Bug when merging replicates and having controls

0.0.3 - 2020-07-01

Fixed

bug when specifying 2 cores, which rounded down to zero cores for samtools sorting and crashed
edger environment was incompatible
seq2science cache on sensible location + seq2science clean fixed
only lookup sample layout when not local, opens up for slightly better tests in bioconda recipe

0.0.2 - 2020-06-29

Fixed

samtools using the correct nr of threads after update to v1.10

Changed

The count table for ATAC/ChIP-seq peaks is now made from finding all peaks within a range of 200 bp, and taking the most significant one (gimmemotifs’ combine_peaks) and extending the remaining peaks 200 bp. On this count table quantile normalisation, TMM, RLE and upperquartile normalisation with CPM is done. Downstream steps log transform these and mean center them. This however means that for broadpeaks no count_table is generated.
Snakefmt -l 121 applied
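
The CPM scaling and the downstream log transform and mean centering mentioned in this release can be illustrated with a small sketch. It covers only CPM, not quantile normalisation, TMM, RLE or upperquartile. The function names, the log base of 2 and the pseudocount of 1 are assumptions made for the example, not details taken from seq2science.

```python
import math


def cpm(counts):
    """Scale one sample's peak counts to counts per million."""
    total = sum(counts)
    return [c / total * 1_000_000 for c in counts]


def log_mean_center(rows, pseudocount=1.0):
    """Log2-transform each peak's values across samples and subtract that peak's mean."""
    centered = []
    for row in rows:
        logged = [math.log2(value + pseudocount) for value in row]
        mean = sum(logged) / len(logged)
        centered.append([value - mean for value in logged])
    return centered


# One sample with two peaks: CPM depends only on the sample's total count.
print(cpm([1, 3]))  # [250000.0, 750000.0]

# One peak measured in two samples: values end up centered around zero.
print(log_mean_center([[1.0, 3.0]]))  # [[-0.5, 0.5]]
```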

0.0.1 - 2020-06-17

Many minor bug- and quality of life fixes.

0.0.0 - 2020-06-11

First release of seq2science!
