Changelog
All notable changes to seq2science will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
All changes fall under one of these types: Added, Changed, Deprecated, Removed, Fixed or Security.
Unreleased
1.2.2 - 2024-02-03
Fixed
the chipseeker environment was corrupted; it should work again.
replaced the deprecated --split-e flag with the --split-3 flag for fastq downloading
removed support for GSA as their “API” changed
1.2.1 - 2023-11-15
Fixed
MultiQC error (module not found ‘imp’)
1.2.0 - 2023-09-18
Changed
DESeq2 now uses more samples to estimate count dispersions
all samples with a label in the condition column are used
(this feature was previously dependent on a batch effect correction in the contrast design)
Fixed
(major) regression where peak calling input controls were being ignored.
1.1.0 - 2023-09-13
Added
download samples directly from ENCODE by assay (ENCSR) and file (ENCFF) accession ids.
Changed
the init, run, and explain commands display the supported workflows in their --help
Fixed
issue when specifying colors in the samples table, which caused the QC report to no longer render the table correctly.
1.0.4 - 2023-09-05
Changed
use the snakemake greedy scheduler by default, as the ilp scheduler struggles with “many” samples
use braLan3 for motif2factors instead of braLan2
1.0.3 - 2023-07-26
Fixed
issue with printing a nice traceback when the SRA is unresponsive
more informative error when there are problems inferring the strandedness of samples
1.0.2 - 2023-07-14
Fixed
crash when combining both technical and biological replicates.
idr bug with numpy dependency > 1.20
replacing all spaces with underscores in the samples.tsv
should only affect columns where this was not enforced already (custom columns)
required for rule multiqc_samplesconfig
1.0.0 - 2023-05-31
Added
CRAM support for ATAC+ChIP+RNA-seq workflows (in addition to the existing alignment workflow cram support)
Changed
sctk yaml simplified
Fixed
DESeq2 should no longer crash without DE genes
bug with single-ended reads and subread
gimme maelstrom dependency missing
gimme maelstrom bug when XDG_CACHE_DIR is not set
0.9.9 - 2023-04-21
Changed
moved downloading fastqs to localrules
bam indexes are kept (not automatically removed)
Salmon updated to the latest version v1.10.1 (fixes a bug)
upsetplot updated to the latest version (fixes a bug)
genomepy updated to the latest version (no reason)
tabulate updated to the latest version (longer python support)
everything else updated to the latest version
--snakemakeOption debug_dag=True can now be used with 1 core (required)
creating conda environments is now faster
updated conda & mamba
dropped indexing of Conda’s defaults channel
one less global variable! (sanitized_samples)
dropped correlation scores from DESeq2 clusterplots
pheatmap is too finicky to get the fontsize right
pheatmap uses the sample order (from the samples.tsv) as best as possible
Fixed
edge case when a GSM sample is a reanalysis of another GSM sample.
error message referring to --config while it should be --configfile
cyclic dependency on rule samtools_sort (caused by tildes in config paths)
bug in DESeq2 related rules when using custom assemblies
clear error message when downloading single-end data annotated as paired-end.
“Max retries exceeded with url” for CRX samples
upsetplot & assembly_stats segfault due to interactive matplotlib backend
DESeq2 error: “EOF within quoted string”
conda environment channel priorities
trackhub index generation now removes rogue spaces from the annotation (instead of crashing)
0.9.8 - 2023-02-01
Added
(experimental) support for the Chinese Genome Sequence Archive (GSA). Samples can start with their CRX identifiers.
a force_assembly_hub flag to make a UCSC assembly hub even though a trackhub already exists
Changed
MultiQC version updated (1.14)
Fixed
edge case with the downloading-fastq workflow when samples file has an assembly column
workflow explanation not being properly added to qc report
download-fastq finishing successfully with an error message.
0.9.7 - 2023-01-03
Added
a message at the end of a successful seq2science run stating where to find the report and the trackhub
nicer error when genomepy has trouble querying the providers
flag infer_motif2factors to control whether or not motif2factors should be run
Changed
Snakemake backend updated to most recent version
for the atac-seq workflow, macs2_keep_mates is enabled by default
Workflow DAGs in the documentation are now simplified
MultiQC version updated
Fastqs downloaded by seq2science are now removed when they are no longer used; this can be turned on/off with keep_downloaded_fastq
updated gimmemotifs
Fixed
fixed macOS errors, thanks to reports from @Jerolen and @WouterVGKULEUVEN
clear error when specifying unavailable trimmer (#888)
fixed bug with rule combine_biological_reps when no biological reps/descriptive names are present
issue with computeMatrix_gene lacking configurable distances, visualized in the MultiQC plotProfile (#905; the default is now 3000 bp upstream and downstream of the gene)
0.9.6 - 2022-10-31
Changed
all conda environments now work with strict channel priorities
singlecellTK environment updated (no longer needs pip)
increased expected RAM usage of scRNA-seq rule sctk_qc
Fixed
outdated dependency in scRNA-seq rule export_sce_obj
error in singlecellTK script with negative count values in reportCellQC()
softmask_track_1 should no longer hang indefinitely
On UCSC assembly hubs, the softmask track should align better (fixed off-by-1)
upsetplot environment being broken (matplotlib version pinned)
deeptools environment being broken (matplotlib version pinned)
0.9.5 - 2022-09-01
Changed
no longer writes multiqc filenames to an intermediate file
Updated kb-python to 0.27.3
Fixed
downloading fastqs directly from ENA
softmask_track_1 should no longer hang indefinitely
On UCSC assembly hubs, the softmask track should align better (fixed off-by-1 error in #896)
0.9.4 - 2022-07-07
Fixed
(hotfix) pinned the snakemake backend so that rerunning works
0.9.3 - 2022-06-17
Added
Seq2science makes a biological replicates count table, which is the mean of the biological replicates.
Seq2science now supports differential motif analysis by gimme maelstrom!!!
configurable setting niceness, which sets a niceness prefix for all shell commands.
Fixed
issue with thread parsing when threads < 12
seq2science should fully work with slurm
samples moved to the cloud on SRA can be downloaded again with newest pysradb.
issue with generating a trackhub index with tiny transcripts (we just remove them :) )
bug with sralite files having the wrong file extension so they’re not recognized as downloaded
0.9.2 - 2022-05-30
Added
seq2science specific lockexception and cleanup metadata errors
deseq2science now accepts the optional argument --assembly, which specifies the assembly to use if the samples.tsv contains >1 assembly. By default, the first assembly is used (same as before)
Changed
rules that download something get re-tried once, in case internet is unstable
bam files are no longer copied when sieving is not required
moved blacklist rules to blacklist.smk
rule inputs now use rules.rulename.output where possible
renamed .smk files to match the naming scheme of the other .smk files
added additional comments to clarify what happens to bam files
cleanup cache+tarballs of conda environments, saving lots of precious disk space
Fixed
fixed custom assembly extensions (e.g. ERCC spike-ins) for scATAC-seq and scRNA-seq
profiles work again
deseq2science now has a clear separation between positional and optional arguments
issue with blacklist bed files containing more than 3 columns
0.9.1 - 2022-05-10
Changed
updated snakemake
effective genome size is now estimated per kmer length instead of per sample since checkpoints should work again.
0.9.0 - 2022-05-10
Changed
renamed most globals to uppercase (main exceptions are config and samples, treps and breps)
moved most configuration steps into functions (reducing the number of stray globals)
replaced static functions with dictionaries
moved replicate stuff to the configuration
Updated Salmon
Added the option for Salmon to use the full genome as decoy sequence
Salmon now uses the full genome as decoy sequence by default.
Config option quantifier_decoys controls which level of decoy-aware quantification you want (options are ‘none’, ‘partial’ and ‘full’)
Option ‘partial’ is insanely memory intensive, and the Salmon docs suggest no benefit…
improved parsing of the samples.tsv. More errors early on, to prevent headache later!
Fixed
get_fastq_pair_reads() was using one sample, not any sample
error message not working when trimming in scRNA-seq
trackhubs when using a mix of stranded and unstranded datasets
fix samples.tsv checks for forbidden symbols
0.8.0 - 2022-04-29
Added
idr call is configurable (idr_options)
single-cell DESeq2 (currently only via deseq2science with user-specified groups per cell)
scRNA quality control workflow with singleCellTK
cell calling/filtering with DropletUtils
mitochondrial gene set detection/filtering
doublet identification/filtering with scDblFinder
processing of alternative experiments, such as spike-in expression
qc report generation for cell/droplet based experiments
added Seurat and FlatFile format export to scRNA qc workflow
added parameter to select velocity matrix for qc and export
Changed
rna-seq creates a TPM table for each quantification method
raw/processed scRNA count tables are now stored and exported to SingleCellExperiment S4 objects instead of Seurat S4 objects
moved scRNA post processing to separate module
export unspliced velocity counts to separate sce object
seq2science should be less susceptible to poor programming environment management by using the conda-ecosystem-user-package-isolation package
seq2science will now demand all requirements exactly the way it likes it
this will make the workflows more stable.
local fastq files are no longer renamed (and should just work)
scRNA-seq trimming code simplified
Removed
removed scRNA merging rule due to memory issues with large and sparse samples
removed deprecated scRNA post-processing workflow (superseded by singleCellTK qc workflow)
Fixed
fixed bug causing incorrect genome string in read_kb_counts.R
bams generated with(out) filtering on size and tn5 shifting weren’t removed when not necessary anymore
0.7.2 - 2022-03-04
Added
TPM to gene counts conversion with pytxi
by default attempts to use the GTF file to convert transcript_ids to gene_names
otherwise will use MyGene.info
config option tpm2counts to choose which TPM to counts converter to use
Changed
pytxi is now the default TPM to gene counts converter (over tximeta)
peak/gene counts tables now use descriptive names (if given)
MultiQC DESeq2 correlation plots now display correlation metrics in the figure
using awful practices to eliminate checkpoint strandedness
deeptools_flags renamed to deeptools_bamcoverage
rna-seq trackhub per base tracks by default instead of bins per 50
Fixed
edge-cases where seq2science was too strict with rerunning
assembly stats log scale on the y-axis
s2s explain won't tell you about subsampling to -1 (all) reads
tn5 shift cigar string parsing edge-case (reference deletions/insertions)
0.7.1 - 2022-02-10
Fixed
issue with broad peaks and upsetplots
0.7.0 - 2022-02-02
The biggest change is that we revert to snakemake 5.18, since newer snakemake versions cause instability.
Added
upset plot as QC for peak calling. Should give a first feeling about the distribution of peaks between samples/conditions.
Changed
downgraded the snakemake backend as snakemake 6+ is unstable for us.
Fixed
corrupt environment creation with libreadline for edgeR normalization.
a crash during subsampling caused by bad syntax.
0.6.1 - 2021-12-17
Fixed
corrupt environment creation with libcrypto in combination with strandedness rule
0.6.0 - 2021-12-12
Release 0.6.0 is a mix of bug fixes, small changes, and bigger stuff. Most importantly:
added a deseq2science command to do differential expression analysis on user-supplied tables with seq2science settings
for single-cell RNA-seq ADT-quantification is possible
snakemake library updated, giving seq2science a new-ish look :)
The full changes are listed below:
Added
added generic stats to the MultiQC report about the assembly, which might help pinpoint problems with the assembly used.
added the slop parameter to the config.yaml of atac-seq and chip-seq workflows, just so they are more visible.
added support for seurat object export and merging for kb workflow.
added support for CITE-seq-count for ADT quantification
added the option to downsample to a specific number of reads.
new deseq2science command
Changed
Seq2science now makes a separate blacklist file per blacklist option (encode & mitochondria), so that e.g. RNA-seq and ATAC-seq workflows can be run in parallel and don’t conflict on the blacklist.
error messages don’t show the full traceback anymore, making it (hopefully) more clear what is going wrong.
The effective genome size is now not calculated per sample, but per read length. When dealing with multiple samples of similar read length, this reduces the computational burden considerably.
samtools environment updated to version 1.14
Fixed
config option slop is now passed along to each rule using it
edge-case where local samples are in the cache, but not present in the fastq_dir
bug with differential peak/gene expression across multiple assemblies
bug with kb ref not creating index for non-velocity analysis
bug with count import in read_kb_counts.R for technical replicates and meta-data handling
deseq2 ordering in multiqc report
issue with slop not being used for the final count table
bug with onehot peaks not reporting the sample names as columns
0.5.6 - 2021-10-19
Added
MA plot, volcano plot, and PCA plots added to QC report for deseq2 related workflows
Changed
updated salmon & tximeta versions
colors for DESeq2 distance plots “fixed”
updated bwa-mem2 version and reduced the expected memory usage of bwa-mem2 to 40GB
seq2science now uses snakemake-minimal
Fixed
stranded bigwigs are no longer inverted (forward containing reverse reads and vice-versa).
fix in rename_sample preventing the inversion of R1 and R2 FASTQs
bug with parsing the CLI for explanations
show/hide buttons for treps are actually made for multiqc report
fixes in deseq2/utils.R
the samples.tsv will now work with only 2 columns
the samples.tsv column names will be stripped of excess whitespace, similar to the config.
ATAC-seq pipeline removing the final bams, keeping the unsorted one
0.5.5 - 2021-09-01
Changed
duplicate read marking happens before sieving and no reads get removed. Removal of duplicate reads is now controlled with the flag remove_dups in the config.
changed option heatmap_deeptools_options to deeptools_heatmap_options
Updated sra tools and parallel fastq-dump versions
Updated genomepy version
Gene annotations are no longer gzipped and ungzipped. This should reduce rerunning.
Fixed
rerunning being triggered too easily by input order
issue with qc plots and broad peaks
magic with prefetch not having the same output location on all machines
issue with explain having duplicate lines
0.5.4 - 2021-07-07
Added
added support for kb-python kite workflow
Changed
kb count output validation
optional barcodefile argument for scRNA-seq workflow
MultiQC updated to newest version
updated kb-python version
0.5.3 - 2021-06-03
Added
DESeq2 blind sample distance & correlation cluster heatmaps for RNA-, ATAC- and ChIP-seq counts
find them annotated in the MultiQC when running >1 sample
Changed
“biological_replicate” and “technical_replicate” renamed to “…_replicates” (matches between samples.tsv & config.yaml)
fixed bug with seq2science making a {output.allsizes} file
Changed explain to use ‘passive style’
Genrich peak calling defaults
Doesn’t remove PCR duplicates anymore (best to do with markduplicates)
Changed extsize to 200 to be similar to macs settings
Turned off tn5 shift, since that is done by seq2science
Fixed
depend less on local genomes (only when data is unavailable online)
trackhub explanation was missing, added
bug with broad peaks and qc that could not be made
0.5.2 - 2021-05-10
Added
added rule for scRNA post-processing R Markdown for plate/droplet based scRNA protocols (experimental)
added explanation for kb_seurat_pp rule
heatmap of N random peaks to the multiqc report in the end
Fixed
removed a warning about genome.fa.sizes already existing because it had already been downloaded beforehand (it’s removed in between)
genomepy’s provider status checking not being used.
0.5.1 - 2021-04-01
Added
added CLI functionality to the deseq2.R script (try it with Rscript /path/to/deseq2.R --help!)
--force flag to seq2science init to automatically overwrite existing samples.tsv and config.yaml
local fastqs with Illumina’s ‘_100’ are now recognized
added the workflow explanation to the multiqc report
Changed
config checks: all keys converted to lower case & duplicate keys throw an exception
MultiQC updated to v1.10
Link to seq2science log instead of snakemake log in final message
Fixed
Issue when filtering a combination of single-end and paired-end reads on template length
explain functionality testing
scATAC can properly use SE fastqs
scRNA can use fqexts other than R1/R2
fastq renaming works again
added missing schemas to extended docs
Bug with edgeR.upperquartile normalization. It now makes everything NaN, so the pipeline finishes successfully.
0.5.0 - 2021-03-03
Version 0.5.0 brings many quality of life improvements, such as seq2science automatically inferring what needs to be re-run when changing the samples.tsv and/or the config.yaml, differential peak analysis for chip/atac workflows and tab-completion!
To (hopefully) clear things up, we changed how replicates are named: they are now called technical and biological replicates, where they were previously called replicate and condition.
It is important to note that the RNA-seq workflow DOES NOT remove duplicate reads by default anymore, and that the sc/bulk ATAC-seq workflows now filter reads on the nucleosome-free region by default.
Changed
Keep all duplicate reads in RNA-seq by default
Slimmed down the config printed at the start of a run
Changed some rules into localrules when executed on a cluster
moved onehot peaks to counts_dir
DESeq2 contrasts now accept any column names
groups still cannot contain underscores
no longer accepts one group name
more examples added to the docs!
Added
dupRadar module to analyse read duplication types in RNA-seq
Differential peak analysis for ATAC- and ChIP-seq!
Options to filter bams by minimum and maximum insert sizes (added to config of bulk/sc atac)
Support experiment ids for EBI ENA and DDBJ for downloading public samples
More robust expression handling for BUS format detection from kb-python arguments
Short-hand BUS syntax for indrop v1/v2
Seq2science now supports tab-completion
Seq2science now outputs a logfile in the directory it is run
Fixed
renamed more old “replicate” variables to the new “technical_replicate”
minor logging tweak
Chipseeker now works without defining descriptive name column
fix bug in resources parsing of profiles
small bug when naming a column condition in non peak-calling workflows
0.4.3 - 2021-01-26
Changed
updated tximeta to 1.6.3 and related packages to fit (now uses R 4)
RNA-seq: sample distance matrix font scales with number of samples (should improve readability)
Fixed
RNA-seq: added sample distance matrix back to MultiQC
RNA-seq: sample distance matrix legend fixed
combine peaks with biological_replicates: keep now uses the correct peaks
0.4.2 - 2021-01-19
Changed
Updated kb-python to 0.25.1
RNA-seq with Salmon will still use bam-related QC files if bams are generated (create_trackhub = True)
Fixed
gimmemotifs not working with newest pandas, now a fixed pandas version
0.4.1 - 2020-12-18
Added
more explanations for rules
Fixed
custom genome annotations for single-cell RNA-seq workflow
trackhubs no longer looking for reversed strands if none are present
0.4.0 - 2020-12-11
Added
new workflow: (BETA) single cell RNA!
Changed
replicate renamed to technical_replicate and condition renamed to biological_replicate
bwa-mem2 default aligner for genomic workflows, instead of bwa-mem
interactive deeptools correlation heatmaps with static dendrograms in multiqc report
trackhub file permissions are set to 755 so to host the files online you don’t have to change those anymore
Fixed
bug in chip/atac trackhub generation where peaks and bigwigs used the same name, resulting in collisions and a trackhub that does not want to load
(literal) genome edge-case where taking the slop of peaks results in identical peaks. One of the duplicates is removed.
IDR should work again
0.3.2 - 2020-11-26
Added
a check to see if the fastq downloaded from ENA is not empty. This is related to a recent internal error (our guess) on ENA's side, which sent empty fastq files
a custom message when a rule fails, which redirects to the docs
Changed
sample layout lookup is split up in 100’s, to avoid a jsondecodeerror which results from very long lists of samples
the multiqc samples & config tables are generated in a script with its own environment to make base env smaller
keep_mates for macs2 turned into a script with its own environment to make the base env smaller
seq2science cache now respects the xdg cache
moved genome downloading rules into scripts instead of run directives, should result in user-friendlier errors
0.3.1 - 2020-11-16
Added
Added support for multiple scrna-seq platforms (Kallistobus)
Fastp detects the correct mate for trimming based on BUS settings.
Support for Kallistobus short-hand syntax.
Fixed
make a trackhub index when the gene_name is not present in gtf file
make a trackhub index when the gene_name is not present in all gene entries
update Salmon & salmon rules
0.3.1 - 2020-11-05
Added
trackhub: automatic color selection
trackhub: specify colors with the “colors” column in the samples.tsv. Accepts RGB and matplotlib colors.
trackhub: grouped samples in a composite track with sample filters and composite control
Changed
updated genomepy to 0.9.1: genomes will have alternative regions removed (if designated with “alt” in the name)
trackhub: better defaults for each track
layouts are stored per version, as to not have collisions in the way these are stored between versions.
scATAC no longer supports trackhub
bigwigs are now (BPM) normalized by default
Fixed
markduplicates now uses $TMP_DIR, if it is defined
RNA-seq cluster figures weren't displaying text on some platforms
not using the local annotation files
not recognizing a mix of gzipped and unzipped annotation files
bigwigs are now correctly labelled forward/reverse (when protocol was stranded)
trackhub: RNA-seq trackhub now displays both strands of the bigwig (when protocol was stranded)
trackhub: track order is now identical to the samples.tsv (was alphabetical for ChIP-/ATAC-seq)
trackhub: assembly hub index now returns gene_name instead of transcript_id.
edgeR (upperquartile) normalization failing. Not sure why it fails, but when it does, it now returns a dataframe of NaN instead of failing the rule, and thus the whole pipeline.
use gimmemotifs 0.15.0, so gimme.combine_peaks works with numeric chromosome names
s2s is slightly more lenient with an edge-case when running seq2science in parallel
clearer error message when trying samples that can not be found
edge case with trying to dump sra from empty directory
now give a nice error message when a technical replicate consists of a mix of paired-end and single-end samples
issue with large number of inputs for multiqc exceeding the os command max length
bug with downloading only SRR/DRR samples (but no GSM)
issue with async generation of genome support files
checking for sequencing runs when sample is already downloaded
0.3.0 - 2020-09-22
Added
fastp as trimmer (default), making trimgalore the optional alternative trimmer
you can now specify an url for your samples file
RNA-seq: gene_id to gene_name conversion table will be output for downstream analysis
(may be empty if gtf didn’t contain both fields or wrong formatting)
RNA-seq: quantifying with salmon will now also output a gene length table
(gene lengths, tpms and gene counts can still be found together in the SingleCellExperiment object)
Changed
make use of pysradb for querying layout and SRR ids instead of API and web-scraping
markduplicates now removes duplicates as default
testing: clear genomepy caches between runs
add parallel-fastq-dump fallback to fasterq-dump
configuration rules split into more sections
DESeq2 options renamed (from diffexp to deseq2 and contrasts)
DESeq2 will now generate batch corrected counts (and TPMs for Salmon) for all samples, based on the set condition column.
(batch corrected output is still meant for downstream analysis that cannot model batch effects independently, e.g. plotting)
Fixed
issue with control and technical replicates
now also SRR numbers can be directly downloaded from ENA
python3.8 syntaxwarnings
chipseeker missing gtf input
bugs with explain
bwa-mem2 not working with less than 12 cores
batch corrected TPMs no longer break when samples/rows are subset.
0.2.3 - 2020-09-01
Changed
retry mechanic for genomepy functions
moved RNA-seq sample clustering to the MultiQC
updated genomepy
Fixed
suffix being overwritten by layouts
issue with combining conditions and ruleorder for macs2
Assembly hub correctly showing annotations
.fa.sizes staying empty
0.2.2 - 2020-08-24
Added
option to add custom files to each assembly (such as ERCC spike ins for scRNA-seq)
Changed
assemblies are now checked in the configuration, similar to samples
get_genome was split in 3 rules, allowing for less reruns
Profiles are now parsed by the s2s wrapper
Checking for validity of samples.tsv now happens with pandasschema
Explicit priority arguments to all group jobs (aligner + samtools_presort)
Snakemake version (5.22.1)
Reduced threads on salmon indexing (matching aligners)
Make use of fasterq-dump instead of parallel-fastq-dump
Fixed
Tests no longer use old cache files
Profiles no longer overwrite command line arguments
Fixed edge-case with condition column in samples but no peak-calling
Downloading sra with prefetch tries multiple times to correct for lost connection
Ambiguity exception with rule narrowpeak_summit
combine_peaks makes use of biological replicate’s peaks, not technical replicate’s peaks
Bug with direct peak-calling on conditions
0.2.1 - 2020-08-10
Added
Chipseeker images in MultiQC report
Samples that are on ENA are now directly downloaded from ENA as fastq. This means we skip the CPU intensive dumping step!
Fixed
Fixed issue with some samples not being findable/downloadable with s2s
Fixed has_annotation always looking for annotation even if local files present
Fixed bug where scatac-seq workflow was making fastqc reports per sample
Changed
will try to obtain UCSC gene annotations in Ensembl format (which uses gene IDs for the gene_id field, contrary to the UCSC format that uses transcript IDs. Wild huh?)
0.2.0 - 2020-08-04
Fixed
Allow for same condition name across different assemblies & different controls
Added
HISAT2 as aligner for RNA-seq
splice-aware HISAT2 indexing for RNA-seq
quantifier HTSeq for RNA-seq
quantifier featurecounts for RNA-seq
Salmon will output a gene-level TPM matrix as well
added/expanded seq2science explain info (now covers RNA- and scATAC-seq too)
sequencing strandedness may now be inferred automatically (unless specified in the config/samples.tsv)
strandedness results are displayed in the multiQC under “Strandedness”
a DEXSeq counts matrix can now be generated with dexseq: True
seq2science CLI now has the same reason flag as snakemake (-r/--reason flag)
(re)added fnwi + rimls logos to the qc reports that went missing in seq2science migration
Changed
rules and script names in RNA-seq, e.g. txi.R is now quant_to_counts.R, to better reflect their function
quant_to_counts.R now converts Salmon transcript abundances to gene counts identically to DESeq2
STAR no longer outputs counts, and is no longer found under quantifiers
gene counts are generated from (filtered) bams when using either STAR or HISAT2 as aligner and HTSeq or featureCounts as quantifier
batch corrected gene counts are generated if a DESeq2 design contrast includes a batch
batch corrected TPMs are generated if a DESeq2 design contrast includes a batch, and quantification was performed using Salmon
for use in ANANSE, for instance
seq2science explain now retrieves messages from explain.smk.
seq2science explain now uses profiles and snakemakeOptions.
Fixed
the alignment workflow no longer uses strandedness
seq2science CLI can now be run without cores with a dryrun or profile with cores
Jenkins code style (now uses mamba to install flake8)
0.1.0 - 2020-07-15
Added
bwa-mem2 as aligner
new command-line option
explain
, which explains what has been done, and writes your material & methods section for you!
Changed
change the workflow names, replaced _ by -. (download_fastq to download-fastq, chip_seq to chip-seq, atac_seq to atac-seq, scatac_seq to scatac-seq, and rna_seq to rna-seq)
changed the way seq2science is called. Moved all the logic from bin/seq2science to seq2science/cli.py
Fixed
Bug when merging replicates and having controls
0.0.3 - 2020-07-01
Fixed
bug when specifying 2 cores, which rounded down to zero cores for samtools sorting and crashed
edger environment was incompatible
seq2science cache on sensible location + seq2science clean fixed
only lookup sample layout when not local, opens up for slightly better tests in bioconda recipe
0.0.2 - 2020-06-29
Fixed
samtools using the correct nr of threads after update to v1.10
Changed
The count table for ATAC/ChIP-seq peaks is now made by finding all peaks within a range of 200 bp, taking the most significant one (gimmemotifs’ combine_peaks) and extending the remaining peaks by 200 bp. On this count table, quantile normalisation, TMM, RLE and upperquartile normalisation with CPM are performed. Downstream steps log transform these values and mean center them. This however means that no count table is generated for broad peaks.
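The peak-merging and scaling steps described above can be illustrated with a minimal Python sketch. It is hypothetical and is not the gimmemotifs combine_peaks implementation: it assumes each peak is a (chromosome, summit, score) tuple, that "extending by 200 bp" means 200 bp on each side of the summit, and that mean centering is done per region across samples with a log2(x + 1) transform. All function names are made up for illustration.

```python
import math
from typing import List, Tuple

# Assumed peak representation: (chromosome, summit position, significance score).
Peak = Tuple[str, int, float]


def combine_peaks_sketch(peaks: List[Peak], window: int = 200) -> List[Peak]:
    """Hypothetical: keep the most significant summit among summits within `window` bp."""
    kept: List[Peak] = []
    # Visit summits from most to least significant (greedy selection).
    for chrom, summit, score in sorted(peaks, key=lambda p: p[2], reverse=True):
        # Drop this summit if a stronger, already kept summit on the same
        # chromosome lies within the window.
        if any(c == chrom and abs(s - summit) <= window for c, s, _ in kept):
            continue
        kept.append((chrom, summit, score))
    # Return the kept summits in genomic order.
    return sorted(kept)


def extend_summits(peaks: List[Peak], slop: int = 200) -> List[Tuple[str, int, int]]:
    """Hypothetical: turn each summit into a region of `slop` bp on each side (assumption)."""
    return [(chrom, max(0, summit - slop), summit + slop) for chrom, summit, _ in peaks]


def log_mean_center(row: List[float]) -> List[float]:
    """log2(x + 1)-transform one region's normalised counts and subtract their mean."""
    logged = [math.log2(x + 1) for x in row]
    mean = sum(logged) / len(logged)
    return [v - mean for v in logged]


if __name__ == "__main__":
    peaks = [("chr1", 1000, 5.0), ("chr1", 1150, 9.0), ("chr1", 2000, 3.0), ("chr2", 1100, 4.0)]
    kept = combine_peaks_sketch(peaks)
    print(kept)                          # the chr1 summit at 1000 is dropped
    print(extend_summits(kept))
    print(log_mean_center([1, 3, 7]))
```

In this greedy version a weaker summit is only discarded when a stronger summit that was already kept lies within the window, so chains of overlapping peaks can still leave more than one summit in a region.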
Snakefmt -l 121 applied
0.0.1 - 2020-06-17
Many minor bug- and quality of life fixes.
0.0.0 - 2020-06-11
First release of seq2science!