genomepy.providers.base.BaseProvider

class genomepy.providers.base.BaseProvider

Bases: object

Provider base class.

__init__()

Methods

__init__()

annotation_links(name, **kwargs)

Return available gene annotation links (http/ftp) for a genome

assembly_accession(name)

Return the assembly accession number (GCA* or GCF*) for a genome.

download_annotation(name[, genomes_dir, ...])

Download annotation file to a specific directory

download_genome(name[, genomes_dir, ...])

Download a (gzipped) genome file to a specific directory

genome_taxid(name)

Return the genome taxonomy ID for a genome.

get_annotation_download_link(name, **kwargs)

Return a functional annotation download link.

get_annotation_download_links(name, **kwargs)

Retrieve functioning gene annotation download link(s).

get_genome_download_link(name[, mask])

head_annotation(name[, genomes_dir, n])

Download the first n lines of the annotation.

list_available_genomes([size])

List all available genomes.

ping()

Can the provider be reached?

search(term[, exact, size])

Search for term in genome names and descriptions, assembly accession IDs, or taxonomy IDs.

Attributes

accession_fields

Metadata fields that (can) contain the assembly's accession ID.

description_fields

Metadata fields with assembly related info.

genomes

Dictionary with assembly names as key and assembly metadata dictionary as value.

name

Name of this provider.

taxid_fields

Metadata fields that (can) contain the assembly's taxonomy ID.

accession_fields = []

Metadata fields that (can) contain the assembly’s accession ID.

annotation_links(name, **kwargs)

Return available gene annotation links (http/ftp) for a genome

Parameters

name (str) – genome name

Returns

Gene annotation links

Return type

list

assembly_accession(name: str) → str

Return the assembly accession number (GCA* or GCF*) for a genome.

Parameters

name (str) – genome name

Returns

Assembly accession number

Return type

str
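To illustrate how `accession_fields` and `genomes` might relate to this method, here is a minimal sketch using a hypothetical stand-in class (`ToyProvider`, which is not part of genomepy). The metadata field names and the rule "take the first value starting with GCA or GCF" are assumptions for illustration; the actual implementation may differ.

```python
class ToyProvider:
    """Hypothetical stand-in; not genomepy's BaseProvider."""

    # Metadata fields that may hold an accession (assumed field names).
    accession_fields = ["assembly_accession", "gbAcc"]

    def __init__(self, genomes):
        # Assembly name -> metadata dictionary, as described for `genomes`.
        self.genomes = genomes

    def assembly_accession(self, name):
        """Return the first GCA*/GCF* value found, or None if there is none."""
        metadata = self.genomes[name]
        for field in self.accession_fields:
            value = str(metadata.get(field, ""))
            if value.startswith(("GCA", "GCF")):
                return value
        return None


# Made-up assembly name and accession, for illustration only.
provider = ToyProvider({"toyAsm1": {"gbAcc": "GCA_000000000.1"}})
accession = provider.assembly_accession("toyAsm1")
```

The same pattern would apply to `taxid_fields` and `genome_taxid`, with an integer conversion instead of the prefix check.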

description_fields = []

Metadata fields with assembly related info.

download_annotation(name, genomes_dir=None, localname=None, **kwargs)

Download annotation file to a specific directory

Parameters
  • name (str) – Genome / species name

  • genomes_dir (str, optional) – Directory to install annotation

  • localname (str, optional) – Custom name for your genome

download_genome(name: str, genomes_dir: str = None, localname: str = None, mask: str = 'soft', **kwargs)

Download a (gzipped) genome file to a specific directory

Parameters
  • name (str) – Genome / species name

  • genomes_dir (str, optional) – Directory to install genome

  • localname (str, optional) – Custom name for your genome

  • mask (str, optional) – Masking level: soft, hard, or none (any other string is treated as none)
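The `mask` rule above (soft, hard, and everything else meaning no masking) can be sketched as a small normalization step. `normalize_mask` is a hypothetical helper, not a genomepy function, and the lowercasing is an assumption.

```python
def normalize_mask(mask: str = "soft") -> str:
    """Map any string to 'soft', 'hard', or 'none' (hypothetical helper)."""
    # Assumption: matching is case-insensitive.
    mask = mask.lower()
    # Any string other than 'soft' or 'hard' is treated as no masking.
    return mask if mask in ("soft", "hard") else "none"


default_mask = normalize_mask()
other_mask = normalize_mask("unmasked")
```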

genome_taxid(name: str) → int

Return the genome taxonomy ID for a genome.

Parameters

name (str) – genome name

Returns

Genome Taxonomy identifier

Return type

int

genomes = {}

Dictionary with assembly names as key and assembly metadata dictionary as value.

get_annotation_download_link(name, **kwargs)

Return a functional annotation download link.

Parameters

name (str) – genome name

Returns

http/ftp link

Return type

str

Raises

GenomeDownloadError – if no functional link was found
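The "return a functional link or raise" contract can be sketched as follows. This is only an illustration: `first_functional_link` and the `is_functional` callback are hypothetical, and `GenomeDownloadError` is redefined locally as a stand-in rather than imported from genomepy.

```python
class GenomeDownloadError(Exception):
    """Local stand-in for the exception named above (not imported from genomepy)."""


def first_functional_link(links, is_functional):
    """Return the first link accepted by `is_functional`, else raise."""
    for link in links:
        if is_functional(link):
            return link
    raise GenomeDownloadError("no functional annotation link found")


# Toy check: treat only gzipped GTF links as functional (an assumption).
candidates = ["https://example.org/toy.bed", "https://example.org/toy.gtf.gz"]
link = first_functional_link(candidates, lambda url: url.endswith(".gtf.gz"))
```

In practice the check would probably involve a network request; a callback keeps the sketch runnable offline.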

get_annotation_download_links(name, **kwargs)

Retrieve functioning gene annotation download link(s).

Parameters
  • name (str) – genome name

  • **kwargs (dict, optional) – provider specific options.

Returns

http/ftp link(s)

Return type

list

head_annotation(name: str, genomes_dir=None, n: int = 5, **kwargs)

Download the first n lines of the annotation.

The first line of the GTF is printed for review (of the gene_name field, for instance).

Parameters
  • name (str) – genome name

  • genomes_dir (str, optional) – genomes directory to install the annotation in.

  • n (int, optional) – download the annotation for n genes.

list_available_genomes(size=False)

List all available genomes.

Parameters

size (bool, optional) – Show absolute genome size.

Yields

genomes (list of tuples) – tuples with assembly name, accession, scientific_name, taxonomy id and description
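As a rough sketch of the yielded shape, the generator below walks a `genomes`-style dictionary and emits one tuple per assembly. The function name, the metadata keys, the "na" placeholder, and the choice to append the size as a sixth element are all assumptions for illustration, not genomepy's implementation.

```python
def list_genomes(genomes, size=False):
    """Yield (name, accession, scientific_name, taxonomy_id, description) tuples."""
    for name, meta in genomes.items():
        row = (
            name,
            meta.get("accession", "na"),
            meta.get("scientific_name", "na"),
            meta.get("taxonomy_id", "na"),
            meta.get("description", "na"),
        )
        if size:
            # Assumption: the genome size is appended as an extra element.
            row += (meta.get("genome_size", "na"),)
        yield row


# Made-up metadata for a single toy assembly.
toy_genomes = {
    "toyAsm1": {
        "accession": "GCA_000000000.1",
        "scientific_name": "Examplus toyus",
        "taxonomy_id": 12345,
        "description": "toy assembly",
        "genome_size": 1000,
    }
}
rows = list(list_genomes(toy_genomes))
rows_with_size = list(list_genomes(toy_genomes, size=True))
```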

name = None

Name of this provider.

static ping() → bool

Can the provider be reached?

search(term: str, exact=False, size=False)

Search for term in genome names and descriptions (if term contains text; case-insensitive), assembly accession IDs (if term starts with GCA_ or GCF_), or taxonomy IDs (if term is a number).

Note: exact accession ID search on UCSC may return different patch levels.

Parameters
  • term (str, int) – Search term, case-insensitive. Can be an assembly name (e.g. hg38), scientific name (Danio rerio), assembly accession ID (GCA_000146045), or taxonomy ID (7227).

  • exact (bool, optional) – term must be an exact match

  • size (bool, optional) – Show absolute genome size.

Yields

tuples with name and metadata
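The dispatch described above (text, accession ID, or taxonomy ID) can be sketched as a small classifier. `classify_term` is a hypothetical helper, not a genomepy function; it only shows how a term might be routed, not how the search itself is performed.

```python
def classify_term(term) -> str:
    """Decide which kind of search a term would trigger (hypothetical helper)."""
    term = str(term)
    # Accession IDs start with GCA_ or GCF_ (matched case-insensitively here).
    if term.upper().startswith(("GCA_", "GCF_")):
        return "accession"
    # A purely numeric term is treated as a taxonomy ID.
    if term.isdigit():
        return "taxonomy"
    # Anything else is searched in genome names and descriptions.
    return "text"


# The example terms from the parameter description above.
kinds = [classify_term(t) for t in ("hg38", "Danio rerio", "GCA_000146045", 7227)]
```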

taxid_fields = []

Metadata fields that (can) contain the assembly’s taxonomy ID.