API¶

This is a collection of public functions and classes relevant for users of PeakSQL.

DataBase¶

class peaksql.database.DataBase(db: str = 'PeakSQL.sqlite', in_memory: bool = False)¶

The DataBase class provides a simple interface for storing NGS data in a PeakSQL database and retrieving it again.

add_assembly(assembly_path: str, assembly: str = None, species: str = None)¶

Add an assembly (genome) to the database. Sequences are retrieved from the fasta file with PyFaidx, so the database stores only the path to the fasta file, not the sequences themselves. This assumes that the assembly file does not change location during the database’s lifetime.

Parameters

assembly_path – The path to the assembly file.
assembly – The name of the assembly (optional: default is the name of the file).
species – The name of the species the assembly belongs to (optional: default is the assembly name).

add_data(data_path: str, assembly: str, condition: str = None)¶

Add data (bed, narrowPeak, or bedgraph) to the database.

Parameters

data_path – The path to the data file (bed, narrowPeak, or bedgraph).
assembly – The name of the assembly. The assembly must already have been added to the database.
condition – Experimental condition (optional). This allows filtering on conditions, e.g. when streaming data with a DataSet.

property assemblies¶: All assemblies registered in the database.
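
The methods above might be combined as in the following minimal sketch. It uses only the signatures documented in this section; the file paths are supplied by the caller, and the assembly name "hg38", the species "human", and the condition "liver" are hypothetical values chosen for illustration.

```python
def build_example_database(fasta_path: str, peaks_path: str):
    """Register one assembly and one data file, then return the assemblies."""
    # Imported lazily so this sketch can be loaded without PeakSQL installed.
    from peaksql.database import DataBase

    # "PeakSQL.sqlite" is the documented default database file name.
    db = DataBase("PeakSQL.sqlite")

    # Only the path to the fasta file is stored, so the file must stay in place.
    # "hg38" and "human" are hypothetical example values.
    db.add_assembly(fasta_path, assembly="hg38", species="human")

    # The assembly must be registered before data can be attached to it.
    # "liver" is a hypothetical condition label.
    db.add_data(peaks_path, assembly="hg38", condition="liver")

    return db.assemblies
```

Calling `build_example_database("/path/to/genome.fa", "/path/to/peaks.narrowPeak")` requires PeakSQL to be installed and both files to exist.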

DataSet loaders¶

peaksql.datasets.base¶

class peaksql.datasets.base._DataSet(database: str, where: str = '', seq_length: int = 200, **kwargs)¶

DataSet base class.

__getitem__(index: int) → Tuple[numpy.ndarray, numpy.ndarray]¶: Return the one-hot encoded sequence and the label at the given index.

get_label(assembly: str, chrom: str, chromstart: int, chromend: int) → numpy.ndarray¶: Get the label that corresponds to chromstart:chromend.

get_onehot_sequence(assembly: str, chrom: str, chromstart: int, chromend: int) → numpy.ndarray¶: Get the one-hot encoded sequence based on the assembly, chromosome, chromstart and chromend.

peaksql.datasets.bedregion¶

peaksql.datasets.narrowpeak¶

class peaksql.datasets.narrowpeak.NarrowPeakDataSet(database: str, where: str = '', seq_length: int = 200, **kwargs)¶

Bases: peaksql.datasets.base._DataSet

The NarrowPeakDataSet expects that narrowPeak files have been added to the DataBase.
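
A minimal usage sketch, based on the constructor signature above and the `__getitem__` contract of the base class. It assumes that `database` is the path to an existing PeakSQL database file to which narrowPeak data has already been added, and that the dataset contains at least one item; neither assumption is confirmed by this page.

```python
def first_example(database_path: str = "PeakSQL.sqlite", seq_length: int = 200):
    """Return the array shapes of the first (sequence, label) pair."""
    # Imported lazily so this sketch can be loaded without PeakSQL installed.
    from peaksql.datasets.narrowpeak import NarrowPeakDataSet

    # Assumption: `database` is the path to the database file.
    dataset = NarrowPeakDataSet(database_path, seq_length=seq_length)

    # Indexing returns a tuple of two numpy arrays: the one-hot encoded
    # sequence and its label.
    sequence, label = dataset[0]
    return sequence.shape, label.shape
```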

Util¶

peaksql.util.sequence_to_onehot(sequence) → numpy.ndarray¶

Convert a sequence of length n to a one-hot encoded array of shape (n x 4).

The nucleotides A, C, G, T respectively correspond to indices 0, 1, 2, 3.
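
The encoding described above can be sketched in plain NumPy as follows. This is an illustration of the mapping, not PeakSQL's implementation: the function name `onehot_sketch`, the float32 dtype, the case-insensitive lookup, and the all-zero row for characters other than A, C, G, T (such as N) are assumptions.

```python
import numpy as np

# Column order taken from the text: A, C, G, T -> 0, 1, 2, 3.
NUCLEOTIDE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def onehot_sketch(sequence: str) -> np.ndarray:
    """Encode a sequence of length n as an array of shape (n, 4)."""
    onehot = np.zeros((len(sequence), 4), dtype=np.float32)
    for position, nucleotide in enumerate(sequence.upper()):
        column = NUCLEOTIDE_INDEX.get(nucleotide)
        # Assumption: unknown characters (e.g. N) stay an all-zero row.
        if column is not None:
            onehot[position, column] = 1.0
    return onehot


encoded = onehot_sketch("ACGTN")
print(encoded.shape)  # (5, 4)
```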