API
This is a collection of public functions and classes relevant for users of PeakSQL.
DataBase
class peaksql.database.DataBase(db: str = 'PeakSQL.sqlite', in_memory: bool = False)
The DataBase class serves as an easy interface to store and retrieve NGS data in a PeakSQL database.
add_assembly(assembly_path: str, assembly: str = None, species: str = None)
Add an assembly (genome) to the database. Sequences from the assembly are retrieved with PyFaidx, so they are not stored in the database; only the path to the FASTA file is stored. This assumes that the assembly file does not change location during the database’s lifetime.
- Parameters
assembly_path – The path to the assembly file.
assembly – The name of the assembly (optional: default is the name of the file).
species – The name of the species the assembly belongs to (optional: default is the assembly name).
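The fallback chain described by these parameters can be sketched as follows. `resolve_names` is a hypothetical helper, not part of PeakSQL, and the sketch assumes that "the name of the file" means the file name without its extension, which the text above does not state.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_names(assembly_path: str,
                  assembly: Optional[str] = None,
                  species: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper mirroring the documented defaults."""
    # Assumption: the default assembly name is the file name without extension.
    if assembly is None:
        assembly = Path(assembly_path).stem
    # Documented behaviour: the species falls back to the assembly name.
    if species is None:
        species = assembly
    return assembly, species


# The path below is a made-up example.
print(resolve_names("/data/genomes/hg38.fa"))
print(resolve_names("/data/genomes/hg38.fa", species="human"))
```

Passing `assembly` and `species` explicitly avoids depending on how the default is derived.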
add_data(data_path: str, assembly: str, condition: str = None)
Add data (bed, narrowPeak, or bedgraph) to the database.
- Parameters
data_path – The path to the data file.
assembly – The name of the assembly. The assembly must already have been added to the database.
condition – Experimental condition (optional). This allows filtering on conditions, e.g. when streaming data with a DataSet.
property assemblies
All assemblies registered in the database.
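Because add_data requires its assembly to be registered first, the call order matters. The sketch below wraps the two documented methods in a hypothetical helper, `register_inputs`, and runs it against a stand-in class (`RecordingDB`, also hypothetical) so that it executes without PeakSQL installed. All file paths and condition names are made up.

```python
from typing import Iterable, Optional, Tuple


def register_inputs(db, assembly_path: str, assembly: str,
                    data_files: Iterable[Tuple[str, Optional[str]]]) -> None:
    """Register one assembly, then every (data_path, condition) pair for it."""
    # add_data needs the assembly to exist already, so add it first.
    db.add_assembly(assembly_path, assembly=assembly)
    for data_path, condition in data_files:
        db.add_data(data_path, assembly=assembly, condition=condition)


class RecordingDB:
    """Stand-in that only records calls; it mimics the documented signatures."""

    def __init__(self):
        self.calls = []

    def add_assembly(self, assembly_path, assembly=None, species=None):
        self.calls.append(("add_assembly", assembly_path, assembly))

    def add_data(self, data_path, assembly, condition=None):
        self.calls.append(("add_data", data_path, assembly, condition))


db = RecordingDB()
register_inputs(db, "genomes/hg38.fa", "hg38",
                [("peaks/liver.narrowPeak", "liver"),
                 ("peaks/brain.narrowPeak", "brain")])
for call in db.calls:
    print(call)
```

With PeakSQL installed, the same helper should accept an instance of peaksql.database.DataBase in place of `RecordingDB`, after which the assemblies property can be inspected to confirm the registration.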
DataSet loaders
peaksql.datasets.base
class peaksql.datasets.base._DataSet(database: str, where: str = '', seq_length: int = 200, **kwargs)
DataSet base class.
__getitem__(index: int) → Tuple[numpy.ndarray, numpy.ndarray]
Return the one-hot encoded sequence and the label for the given index.
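Since indexing returns a (sequence, label) pair, samples can be gathered with ordinary indexing. The sketch below uses a hypothetical stand-in, `ToyDataSet`, which returns plain lists rather than numpy arrays so that it runs without PeakSQL. `collect_batch` is likewise an illustrative helper, not a library function.

```python
from typing import Iterable, List, Tuple


def collect_batch(dataset, indices: Iterable[int]) -> Tuple[List, List]:
    """Gather sequences and labels for the given indices via dataset[index]."""
    sequences, labels = [], []
    for index in indices:
        sequence, label = dataset[index]  # the documented 2-tuple
        sequences.append(sequence)
        labels.append(label)
    return sequences, labels


class ToyDataSet:
    """Hypothetical stand-in for a DataSet; values are made up."""

    def __getitem__(self, index: int):
        sequence = [[1, 0, 0, 0]] * 3  # pretend one-hot rows for "AAA"
        label = [index % 2]
        return sequence, label


toy_sequences, toy_labels = collect_batch(ToyDataSet(), [0, 1, 2])
print(len(toy_sequences), toy_labels)
```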
get_label(assembly: str, chrom: str, chromstart: int, chromend: int) → numpy.ndarray
Get the label that corresponds to chromstart:chromend.
get_onehot_sequence(assembly: str, chrom: str, chromstart: int, chromend: int) → numpy.ndarray
Get the one-hot encoded sequence based on the assembly, chromosome, chromstart and chromend.
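For readers unfamiliar with one-hot encoding, here is a minimal pure-Python sketch of the idea. The column order (A, C, G, T), the orientation (one row per base), and the all-zero row for unknown bases such as N are assumptions made for illustration. PeakSQL's actual layout is not specified here, and the library returns a numpy.ndarray rather than nested lists.

```python
from typing import List

NUCLEOTIDES = "ACGT"  # assumed column order, for illustration only


def one_hot(sequence: str) -> List[List[int]]:
    """Encode a DNA string as one row of four 0/1 values per base."""
    rows = []
    for base in sequence.upper():
        # Bases outside ACGT (e.g. N) match no column and give an all-zero row.
        rows.append([1 if base == nuc else 0 for nuc in NUCLEOTIDES])
    return rows


for row in one_hot("ACGTN"):
    print(row)
```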
peaksql.datasets.bedregion
peaksql.datasets.narrowpeak module
class peaksql.datasets.narrowpeak.NarrowPeakDataSet(database: str, where: str = '', seq_length: int = 200, **kwargs)
Bases: peaksql.datasets.base._DataSet
The NarrowPeakDataSet expects that narrowPeak files have been added to the DataBase.