genomepy.genome.Genome

class genomepy.genome.Genome(name, genomes_dir=None, *args, **kwargs)

Bases: Fasta

pyfaidx Fasta object of a genome with additional attributes & methods.

Generates a genome index file, sizes file and gaps file of the genome.

Parameters
  • name (str) – Genome name

  • genomes_dir (str, optional) – Genome installation directory

Returns

An object that provides a pygr compatible interface.

Return type

pyfaidx.Fasta

__init__(name, genomes_dir=None, *args, **kwargs)

An object that provides a pygr compatible interface.

  • filename – name of fasta file or fsspec.core.OpenFile instance

  • indexname – name of index file or fsspec.core.OpenFile instance

Methods

__init__(name[, genomes_dir])

An object that provides a pygr compatible interface.

close()

get_random_sequences([n, length, chroms, ...])

Return random genomic sequences.

get_seq(name, start, end[, rc])

Return a sequence by record name and interval [start, end).

get_spliced_seq(name, intervals[, rc])

Return a sequence by record name and list of intervals.

items()

keys()

track2fasta(track[, fastafile, stranded, ...])

Return the FASTA sequences for the region(s) in the track(s) as a list of Sequence objects.

values()

Attributes

gaps

contigs and the number of Ns contained

plugin

dict of all active plugins and their properties

sizes

contigs and their lengths

name

genome name

genomes_dir

path to the genomepy genomes directory

genome_file

path to the genome fasta

genome_dir

path to the genome directory

index_file

path to the genome index

sizes_file

path to the chromosome sizes file

gaps_file

path to the chromosome gaps file

annotation_gtf_file

path to the gene annotation GTF file

annotation_bed_file

path to the gene annotation BED file

readme_file

path to the README file

tax_id

genome taxonomy identifier

assembly_accession

genome assembly accession

annotation_bed_file

path to the gene annotation BED file

annotation_gtf_file

path to the gene annotation GTF file

assembly_accession

genome assembly accession

gaps: dict = None

contigs and the number of Ns contained

Type

contents of the gaps file
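As a rough illustration of what this mapping holds, the sketch below counts N bases per contig in a small in-memory set of sequences. The `count_gaps` helper and the toy sequences are hypothetical and not part of genomepy, which derives these numbers from the gaps file; counting lowercase `n` as a gap base is also an assumption made here.

```python
def count_gaps(sequences):
    """Map each contig name to the number of N bases it contains."""
    # Assumption: soft-masked (lowercase) 'n' is counted as well.
    return {name: seq.upper().count("N") for name, seq in sequences.items()}


# Toy sequences for illustration only.
toy_genome = {"chr1": "ACGTNNNNACGT", "chr2": "ACGTACGT", "chr3": "nnACGTNN"}
gaps = count_gaps(toy_genome)
print(gaps)
```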

gaps_file

path to the chromosome gaps file

genome_dir

path to the genome directory

genome_file

path to the genome fasta

genomes_dir

path to the genomepy genomes directory

get_random_sequences(n=10, length=200, chroms=None, max_n=0.1, outtype='list')

Return random genomic sequences.

Parameters
  • n (int, optional) – Number of sequences to return.

  • length (int, optional) – Length of sequences to return.

  • chroms (list, optional) – Return sequences only from these chromosomes.

  • max_n (float, optional) – Maximum fraction of Ns.

  • outtype (str, optional) – Return the output as a list or a string. Options: “list” or “string” (default: “list”).

Returns

Genomic coordinates, either as lists of the form [chrom, start, end] or as strings of the form “chrom:start-end” (the string form can be used as input for track2fasta).

Return type

list
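The parameters above can be pictured with a small rejection-sampling sketch: draw a window, discard it if its fraction of Ns exceeds max_n, and report it in either output form. This is not genomepy's implementation. The `random_regions` and `n_fraction` helpers, the `seed` and `max_tries` arguments, the in-memory toy genome, and the use of plain Python slice coordinates are all assumptions made for illustration.

```python
import random


def n_fraction(seq):
    """Fraction of bases in seq that are N (case-insensitive)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


def random_regions(genome, n=10, length=200, chroms=None, max_n=0.1,
                   outtype="list", seed=None, max_tries=10_000):
    """Sample up to n windows of the given length with at most max_n Ns."""
    rng = random.Random(seed)
    # Only contigs that can hold a full window are eligible.
    names = [c for c in (chroms or genome) if len(genome[c]) >= length]
    if not names:
        raise ValueError("no contig is long enough for the requested length")
    regions = []
    for _ in range(max_tries):
        if len(regions) == n:
            break
        chrom = rng.choice(names)
        start = rng.randint(0, len(genome[chrom]) - length)
        end = start + length
        if n_fraction(genome[chrom][start:end]) > max_n:
            continue  # too many Ns, draw again
        if outtype == "string":
            regions.append(f"{chrom}:{start}-{end}")
        else:
            regions.append([chrom, start, end])
    return regions


# Toy genome: chr2 starts with a long run of Ns.
toy = {"chr1": "ACGT" * 50, "chr2": "N" * 100 + "ACGT" * 25}
as_lists = random_regions(toy, n=3, length=20, seed=1)
as_strings = random_regions(toy, n=3, length=20, outtype="string", seed=1)
print(as_lists)
print(as_strings)
```

Using the same seed for both calls shows that the two output types describe the same regions in different notation.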

get_seq(name, start, end, rc=False)

Return a sequence by record name and interval [start, end).

Coordinates are 1-based, end-exclusive. If rc is set, reverse complement will be returned.
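For readers unfamiliar with the rc option, the following minimal sketch shows what reverse-complementing a sequence means. The `reverse_complement` helper is hypothetical and written in plain Python; it is not the pyfaidx implementation and handles only A, C, G, T and N.

```python
# Translation table for complementing bases; case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


result = reverse_complement("ATGCN")
print(result)  # NGCAT
```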

get_spliced_seq(name, intervals, rc=False)

Return a sequence by record name and list of intervals.

Interval list is an iterable of [start, end]. Coordinates are 1-based, end-exclusive. If rc is set, reverse complement will be returned.
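Splicing can be sketched as concatenating the sequences of the individual intervals and, if requested, reverse-complementing the joined result. The example below works on already-extracted chunk strings so that it makes no claim about coordinate handling. The `splice` and `reverse_complement` helpers and the toy exons are hypothetical, not pyfaidx code.

```python
# Translation table for complementing bases; case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice(chunks, rc=False):
    """Join interval sequences in order; reverse-complement if rc is set."""
    seq = "".join(chunks)
    return reverse_complement(seq) if rc else seq


# Three toy exons in genomic order.
exons = ["ATG", "GCC", "TAA"]
forward = splice(exons)
reverse = splice(exons, rc=True)
print(forward)  # ATGGCCTAA
print(reverse)  # TTAGGCCAT
```

Reverse-complementing the joined sequence gives the same result as reverse-complementing each chunk and joining them in reverse order.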

index_file

path to the genome index

name

genome name

property plugin

dict of all active plugins and their properties

readme_file

path to the README file

sizes: dict = None

contigs and their lengths

Type

contents of the sizes file
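To make the shape of this mapping concrete, the sketch below parses sizes-file text into a dict of contig lengths. It assumes the common two-column, tab-separated layout (contig name, then length); the `parse_sizes` helper and the toy values are hypothetical and not genomepy code.

```python
def parse_sizes(text):
    """Parse 'contig<TAB>length' lines into a {contig: length} dict."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        contig, length = line.split("\t")[:2]
        sizes[contig] = int(length)
    return sizes


# Toy sizes-file content for illustration only.
example = "chr1\t1000\nchr2\t500\n"
sizes = parse_sizes(example)
print(sizes)
```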

sizes_file

path to the chromosome sizes file

tax_id

genome taxonomy identifier

track2fasta(track, fastafile=None, stranded=False, extend_up=0, extend_down=0)

Return the FASTA sequences for the region(s) in the track(s) as a list of Sequence objects.

Parameters
  • track (list/region file/bed file) – region(s) you wish to translate to fasta. Example input files can be found in genomepy/tests/data/regions.*

  • fastafile (str, optional) – path of a FASTA file to save the sequences to; if omitted, the sequences are returned as a list (default: None)

  • stranded (bool, optional) – return sequences for both strands? Requires BED6 (or higher) as input (default: False)

  • extend_up (int, optional) – extend the sequences upstream by this many bases (strand sensitive, default: 0)

  • extend_down (int, optional) – extend the sequences downstream by this many bases (strand sensitive, default: 0)
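One way to read "strand sensitive" for extend_up and extend_down is sketched below: on the plus strand, upstream means lower coordinates, while on the minus strand it means higher coordinates. This is an interpretation, not genomepy's code. The `parse_bed6` and `extend_interval` helpers, the clamping at zero, and the toy BED line are assumptions made for illustration.

```python
def parse_bed6(line):
    """Return (chrom, start, end, strand) from a tab-separated BED line."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Assumption: lines without a strand column are treated as plus strand.
    strand = fields[5] if len(fields) >= 6 else "+"
    return chrom, start, end, strand


def extend_interval(start, end, strand="+", extend_up=0, extend_down=0):
    """Extend an interval upstream and downstream relative to its strand."""
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = start - extend_down, end + extend_up
    else:
        start, end = start - extend_up, end + extend_down
    return max(start, 0), end  # assumption: never go below position 0


chrom, start, end, strand = parse_bed6("chr1\t100\t200\tpeak1\t0\t-")
plus = extend_interval(100, 200, "+", extend_up=10, extend_down=5)
minus = extend_interval(start, end, strand, extend_up=10, extend_down=5)
print(plus)   # (90, 205)
print(minus)  # (95, 210)
```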