genomepy.genome.Genome
- class genomepy.genome.Genome(name, genomes_dir=None, *args, **kwargs)
Bases: Fasta
pyfaidx Fasta object of a genome with additional attributes & methods.
Generates a genome index file, sizes file and gaps file of the genome.
- Parameters
name (str) – Genome name
genomes_dir (str, optional) – Genome installation directory
- Returns
An object that provides a pygr compatible interface.
- Return type
pyfaidx.Fasta
- __init__(name, genomes_dir=None, *args, **kwargs)
An object that provides a pygr compatible interface.
filename: name of fasta file or fsspec.core.OpenFile instance
indexname: name of index file or fsspec.core.OpenFile instance
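A minimal usage sketch of the constructor, assuming genomepy is installed and that a genome with the given name has already been installed in the genomes directory. The helper name `summarize_genome` is hypothetical; only the `name` and `sizes` attributes documented on this page are used.

```python
def summarize_genome(name, genomes_dir=None):
    """Open an installed genome and report a few basic properties.

    Hypothetical helper for illustration; not part of genomepy.
    """
    # Imported lazily so this sketch can be defined without genomepy installed.
    import genomepy

    # Assumption: the genome was installed beforehand under this name.
    genome = genomepy.Genome(name, genomes_dir=genomes_dir)

    # `sizes` maps contig names to their lengths (see the attribute below).
    lengths = [int(length) for length in genome.sizes.values()]
    return {
        "name": genome.name,
        "n_contigs": len(lengths),
        "total_length": sum(lengths),
    }
```

Calling `summarize_genome("my_genome")` with the name of an installed genome would return a small dict; the name used here is a placeholder.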
Methods
__init__(name[, genomes_dir]) – An object that provides a pygr compatible interface.
close()
get_random_sequences([n, length, chroms, ...]) – Return random genomic sequences.
get_seq(name, start, end[, rc]) – Return a sequence by record name and interval [start, end).
get_spliced_seq(name, intervals[, rc]) – Return a sequence by record name and list of intervals.
items()
keys()
track2fasta(track[, fastafile, stranded, ...]) – Return a list of fasta sequences as Sequence objects as directed from the track(s).
values()
Attributes
gaps – contigs and the number of Ns contained
plugin – dict of all active plugins and their properties
sizes – contigs and their lengths
name – genome name
genomes_dir – path to the genomepy genomes directory
genome_file – path to the genome fasta
genome_dir – path to the genome directory
index_file – path to the genome index
sizes_file – path to the chromosome sizes file
gaps_file – path to the chromosome gaps file
annotation_gtf_file – path to the gene annotation GTF file
annotation_bed_file – path to the gene annotation BED file
readme_file – path to the README file
tax_id – genome taxonomy identifier
assembly_accession – genome assembly accession
- annotation_bed_file
path to the gene annotation BED file
- annotation_gtf_file
path to the gene annotation GTF file
- assembly_accession
genome assembly accession
- gaps: dict = None
contents of the gaps file: contigs and the number of Ns contained
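To illustrate what a "contig to number of Ns" mapping looks like, here is a small pure-Python sketch on toy sequences. It is not how genomepy builds `gaps`; the function name `count_ns` is hypothetical, and counting lowercase `n` as well is an assumption made for the example.

```python
def count_ns(sequences):
    """Map each contig name to the number of N bases in its sequence.

    Assumption for this sketch: soft-masked (lowercase) n counts as N.
    """
    return {name: seq.upper().count("N") for name, seq in sequences.items()}


# Toy input: contig name -> sequence string.
toy = {"chr1": "ACGTNNNNACGT", "chr2": "acgtnacgt", "chr3": "ACGT"}
counts = count_ns(toy)
```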
- gaps_file
path to the chromosome gaps file
- genome_dir
path to the genome directory
- genome_file
path to the genome fasta
- genomes_dir
path to the genomepy genomes directory
- get_random_sequences(n=10, length=200, chroms=None, max_n=0.1, outtype='list')
Return random genomic sequences.
- Parameters
n (int, optional) – Number of sequences to return.
length (int, optional) – Length of sequences to return.
chroms (list, optional) – Return sequences only from these chromosomes.
max_n (float, optional) – Maximum fraction of Ns.
outtype (string, optional) – Return the output as list or string. Options: “list” or “string”, default: “list”.
- Returns
Genomic coordinates, either as lists of [chrom, start, end] or as “chrom:start-end” strings (the string form can be used as input for track2fasta).
- Return type
list
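The two output shapes described above can be converted into each other. The following is a minimal sketch in plain Python; the helper names `region_to_string` and `string_to_region` are hypothetical and not part of genomepy.

```python
def region_to_string(region):
    """Turn a [chrom, start, end] list into a "chrom:start-end" string."""
    chrom, start, end = region
    return f"{chrom}:{start}-{end}"


def string_to_region(text):
    """Turn a "chrom:start-end" string into a [chrom, start, end] list."""
    # Split on the last colon so contig names containing ":" still work.
    chrom, _, span = text.rpartition(":")
    start, _, end = span.partition("-")
    return [chrom, int(start), int(end)]
```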
- get_seq(name, start, end, rc=False)
Return a sequence by record name and interval [start, end).
Coordinates are 1-based, end-exclusive. If rc is set, reverse complement will be returned.
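As an illustration of what the `rc` flag asks for, here is a pure-Python sketch of a reverse complement. It only shows the operation itself and is not the library's implementation; the name `reverse_complement` is hypothetical, and only the bases A, C, G, T and N are handled.

```python
# Translation table for the complement of each base (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]
```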
- get_spliced_seq(name, intervals, rc=False)
Return a sequence by record name and list of intervals.
Interval list is an iterable of [start, end]. Coordinates are 1-based, end-exclusive. If rc is set, reverse complement will be returned.
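The idea of a spliced sequence, for example joining the exons of a transcript, can be sketched on a plain string. This is an illustration only: the helper `splice` is hypothetical, and it uses 0-based, half-open Python slices for simplicity, which is an assumption of the sketch and not the coordinate convention of the method documented above.

```python
def splice(sequence, intervals, rc=False):
    """Concatenate the pieces of `sequence` selected by `intervals`.

    NOTE: intervals here are 0-based, half-open (Python slice) pairs,
    chosen for this illustration only.
    """
    spliced = "".join(sequence[start:end] for start, end in intervals)
    if rc:
        # Reverse complement of the joined sequence (uppercase bases only).
        spliced = spliced.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]
    return spliced


toy = "AAACCCGGGTTT"
joined = splice(toy, [(0, 3), (6, 9)])
joined_rc = splice(toy, [(0, 3), (6, 9)], rc=True)
```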
- index_file
path to the genome index
- name
genome name
- property plugin
dict of all active plugins and their properties
- readme_file
path to the README file
- sizes: dict = None
contents of the sizes file: contigs and their lengths
- sizes_file
path to the chromosome sizes file
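A sketch of how a sizes file could be read into a "contig to length" dict like the one described for `sizes`. It assumes the common two-column, tab-separated chromosome sizes layout (name, then length), which this page does not spell out; the function name `read_sizes` is hypothetical.

```python
import io


def read_sizes(handle):
    """Parse tab-separated "contig<TAB>length" lines into a dict."""
    sizes = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, length = line.split("\t")[:2]
        sizes[contig] = int(length)
    return sizes


# Toy file contents standing in for a real sizes file.
example = io.StringIO("chr1\t1000\nchr2\t500\n")
sizes = read_sizes(example)
```

With a real genome, the same function could be applied to an open handle on the path stored in `sizes_file`.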
- tax_id
genome taxonomy identifier
- track2fasta(track, fastafile=None, stranded=False, extend_up=0, extend_down=0)
Return a list of fasta sequences as Sequence objects as directed from the track(s).
- Parameters
track (list/region file/bed file) – region(s) you wish to translate to fasta. Example input files can be found in genomepy/tests/data/regions.*
fastafile (bool, optional) – return the Sequences as a list, or save them to a file? (default: list)
stranded (bool, optional) – return sequences for both strands? Requires BED6 (or higher) as input (default: False)
extend_up (int, optional) – extend the sequences upstream? (strand sensitive, default: 0)
extend_down (int, optional) – extend the sequences downstream? (strand sensitive, default: 0)
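One way to read "strand sensitive" is that upstream and downstream swap sides on the minus strand. The sketch below shows that interpretation on bare coordinates. It is an assumption about the intended behaviour and not code taken from genomepy; the helper `extend_region` is hypothetical.

```python
def extend_region(start, end, strand="+", extend_up=0, extend_down=0):
    """Extend a region upstream and downstream relative to its strand.

    Assumption: on the "-" strand, upstream lies at higher coordinates,
    so the two extensions swap sides.
    """
    if strand == "-":
        new_start, new_end = start - extend_down, end + extend_up
    else:
        new_start, new_end = start - extend_up, end + extend_down
    # Do not run off the beginning of the contig.
    return max(new_start, 0), new_end
```

The sketch does not clip at the contig end, since that would need the contig length (for example from `sizes`).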