API

This is a collection of public functions and classes relevant for users of PeakSQL.

DataBase

class peaksql.database.DataBase(db: str = 'PeakSQL.sqlite', in_memory: bool = False)

The DataBase class provides a simple interface for storing and retrieving NGS data in a PeakSQL database.

add_assembly(assembly_path: str, assembly: str = None, species: str = None)

Add an assembly (genome) to the database. Sequences from the assembly are retrieved with PyFaidx, so they aren’t stored in the database; only the path to the FASTA file is stored. This assumes that the assembly file does not change location during the database’s lifetime.

Parameters
  • assembly_path – The path to the assembly file.

  • assembly – The name of the assembly (optional: default is the name of the file).

  • species – The name of the species the assembly belongs to (optional: default is the assembly name).
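The description above says that only the path to the FASTA file is stored and that `assembly` and `species` fall back to defaults. The sketch below illustrates that bookkeeping with the standard-library `sqlite3` module. The `assembly` table, its columns, and the `register_assembly` helper are hypothetical and are not PeakSQL's actual schema. The sketch also assumes that the default assembly name is the file name with its extension removed.

```python
import os
import sqlite3
from typing import Optional, Tuple


def register_assembly(conn: sqlite3.Connection, assembly_path: str,
                      assembly: Optional[str] = None,
                      species: Optional[str] = None) -> Tuple[str, str, str]:
    """Hypothetical sketch: record only the FASTA path, never the sequences."""
    # Assumption: the default assembly name is the file name minus its extension.
    if assembly is None:
        assembly = os.path.splitext(os.path.basename(assembly_path))[0]
    # The species defaults to the assembly name, as documented above.
    if species is None:
        species = assembly
    conn.execute(
        "CREATE TABLE IF NOT EXISTS assembly "
        "(name TEXT PRIMARY KEY, species TEXT, abs_path TEXT)"
    )
    # Store an absolute path so lookups do not depend on the current working directory.
    abs_path = os.path.abspath(assembly_path)
    conn.execute("INSERT INTO assembly VALUES (?, ?, ?)", (assembly, species, abs_path))
    conn.commit()
    return assembly, species, abs_path


conn = sqlite3.connect(":memory:")
print(register_assembly(conn, "genomes/hg38.fa"))
```

Storing an absolute path only guards against changes of working directory. If the FASTA file itself is moved, the stored path becomes stale, which is exactly the limitation noted above.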

add_data(data_path: str, assembly: str, condition: str = None)

Add data (bed, narrowPeak, or bedgraph) to the database.

Parameters
  • data_path – The path to the data file.

  • assembly – The name of the assembly. The assembly must already have been added to the database.

  • condition – Experimental condition (optional). This allows filtering by condition, e.g. when streaming data with a DataSet.
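The `condition` argument is what makes per-condition filtering possible later on. As a rough illustration, the sketch below stores a few intervals tagged with a condition in an in-memory SQLite table and then selects a single condition. The `peak` table layout, the `peaks_for_condition` helper, and the example conditions are all hypothetical. PeakSQL's real schema may differ.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical table: one row per interval, tagged with its experimental condition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE peak (chrom TEXT, chromstart INTEGER, chromend INTEGER, condition TEXT)"
)
conn.executemany(
    "INSERT INTO peak VALUES (?, ?, ?, ?)",
    [
        ("chr1", 100, 300, "control"),
        ("chr1", 500, 700, "treated"),
        ("chr2", 50, 250, "treated"),
    ],
)


def peaks_for_condition(conn: sqlite3.Connection, condition: str) -> List[Tuple[str, int, int]]:
    """Return all intervals recorded under the given condition."""
    # A parameterized query avoids building SQL by string concatenation.
    return conn.execute(
        "SELECT chrom, chromstart, chromend FROM peak "
        "WHERE condition = ? ORDER BY chrom, chromstart",
        (condition,),
    ).fetchall()


print(peaks_for_condition(conn, "treated"))
# → [('chr1', 500, 700), ('chr2', 50, 250)]
```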

property assemblies

All assemblies registered in the database.

DataSet loaders

peaksql.datasets.base

class peaksql.datasets.base._DataSet(database: str, where: str = '', seq_length: int = 200, **kwargs)

Base class for all DataSets.

__getitem__(index: int) → Tuple[numpy.ndarray, numpy.ndarray]

Return the one-hot encoded sequence and the label at the given index.
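This page does not document how an integer index is turned into a genomic region. One common approach, shown below purely as an assumption, is to tile each chromosome with non-overlapping windows of `seq_length` bases and number them consecutively. `__getitem__` would then fetch the one-hot sequence and the label for that window. The `index_to_region` helper and the tiling scheme are hypothetical and do not necessarily reflect how `_DataSet` works.

```python
import bisect
from itertools import accumulate
from typing import Dict, Tuple


def index_to_region(chrom_sizes: Dict[str, int], seq_length: int,
                    index: int) -> Tuple[str, int, int]:
    """Map a dataset index to (chrom, chromstart, chromend) under a hypothetical tiling."""
    chroms = list(chrom_sizes)
    # Number of complete windows per chromosome; a trailing partial window is dropped.
    counts = [chrom_sizes[c] // seq_length for c in chroms]
    # Cumulative window counts, e.g. [4, 7] means chrom 0 holds indices 0-3, chrom 1 holds 4-6.
    cumulative = list(accumulate(counts))
    total = cumulative[-1] if cumulative else 0
    if not 0 <= index < total:
        raise IndexError(f"index {index} out of range for {total} windows")
    # First chromosome whose cumulative count exceeds the index.
    chrom_idx = bisect.bisect_right(cumulative, index)
    offset = index - (cumulative[chrom_idx - 1] if chrom_idx > 0 else 0)
    chromstart = offset * seq_length
    return chroms[chrom_idx], chromstart, chromstart + seq_length


sizes = {"chr1": 850, "chr2": 600}
print(index_to_region(sizes, 200, 5))
# → ('chr2', 200, 400)
```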

get_label(assembly: str, chrom: str, chromstart: int, chromend: int) → numpy.ndarray

Get the label that corresponds to chromstart:chromend.
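The form of the label depends on the dataset subclass and is not specified here. As one hypothetical example, a per-base binary label can mark which positions of `chromstart:chromend` fall inside any stored interval. The `overlap_label` function and its in-memory list of intervals are illustrative assumptions. PeakSQL may encode labels differently, for example as one label per region rather than one per base.

```python
import numpy as np
from typing import List, Tuple


def overlap_label(intervals: List[Tuple[str, int, int]], chrom: str,
                  chromstart: int, chromend: int) -> np.ndarray:
    """Hypothetical per-base label: 1 where a position is covered by an interval, else 0."""
    label = np.zeros(chromend - chromstart, dtype=np.uint8)
    for ichrom, istart, iend in intervals:
        if ichrom != chrom:
            continue
        # Clip the interval to the query window (half-open coordinates, as in BED).
        lo = max(istart, chromstart)
        hi = min(iend, chromend)
        if lo < hi:
            label[lo - chromstart:hi - chromstart] = 1
    return label


peaks = [("chr1", 105, 108), ("chr1", 118, 130), ("chr2", 100, 120)]
print(overlap_label(peaks, "chr1", 100, 120))
```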

get_onehot_sequence(assembly: str, chrom: str, chromstart: int, chromend: int) → numpy.ndarray

Get the one-hot encoded sequence based on the assembly, chromosome, chromstart and chromend.

peaksql.datasets.bedregion

peaksql.datasets.narrowpeak

class peaksql.datasets.narrowpeak.NarrowPeakDataSet(database: str, where: str = '', seq_length: int = 200, **kwargs)

Bases: peaksql.datasets.base._DataSet

The NarrowPeakDataSet expects that narrowPeak files have been added to the DataBase.
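For reference, narrowPeak is a BED6+4 format. Each tab-separated line holds `chrom`, `chromStart`, `chromEnd`, `name`, `score`, `strand`, `signalValue`, `pValue`, `qValue`, and `peak`, where `peak` is the summit offset relative to `chromStart` or -1 if no summit was called. The sketch below parses one such line. The `NarrowPeak` record type and the `parse_narrowpeak_line` and `summit_position` helpers are hypothetical and not part of PeakSQL. Falling back to the interval midpoint when no summit is given is a choice made only in this sketch.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    chromstart: int
    chromend: int
    name: str
    score: int
    strand: str
    signal_value: float
    p_value: float
    q_value: float
    peak: int  # summit offset from chromstart; -1 if no summit was called


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Parse one tab-separated narrowPeak (BED6+4) line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return NarrowPeak(
        fields[0], int(fields[1]), int(fields[2]), fields[3], int(fields[4]),
        fields[5], float(fields[6]), float(fields[7]), float(fields[8]), int(fields[9]),
    )


def summit_position(record: NarrowPeak) -> int:
    """Absolute summit coordinate; assumption: use the midpoint when peak == -1."""
    if record.peak == -1:
        return (record.chromstart + record.chromend) // 2
    return record.chromstart + record.peak


rec = parse_narrowpeak_line("chr1\t9980\t10480\tpeak_1\t850\t.\t12.5\t30.2\t27.9\t240")
print(rec.chrom, summit_position(rec))
# → chr1 10220
```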

Util

peaksql.util.sequence_to_onehot(sequence) → numpy.ndarray

Convert a sequence of length n to a one-hot encoded array of shape (n, 4).

The nucleotides A, C, G, T respectively correspond to indices 0, 1, 2, 3.
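The following minimal NumPy sketch shows the encoding described above, so the index mapping can be seen concretely. It is not PeakSQL's implementation. Two behaviours are assumptions of this sketch: input is treated case-insensitively, and any other character (such as `N`) is encoded as an all-zero row. `onehot_sketch` is a hypothetical name.

```python
import numpy as np

# A, C, G, T map to columns 0, 1, 2, 3, as documented for sequence_to_onehot.
_LOOKUP = np.full(256, -1, dtype=np.int64)
for _i, _base in enumerate("ACGT"):
    _LOOKUP[ord(_base)] = _i
    _LOOKUP[ord(_base.lower())] = _i  # assumption: lower-case (soft-masked) bases count too


def onehot_sketch(sequence: str) -> np.ndarray:
    """Return an (n, 4) array with one 1 per row for A/C/G/T; other characters give zero rows."""
    codes = np.array(list(sequence.encode("ascii")), dtype=np.uint8)
    idx = _LOOKUP[codes]
    onehot = np.zeros((len(sequence), 4), dtype=np.uint8)
    known = idx >= 0
    # Set exactly one column per recognised base; unknown characters stay all-zero.
    onehot[np.flatnonzero(known), idx[known]] = 1
    return onehot


print(onehot_sketch("ACgTN"))
```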