genomepy.providers.local.LocalProvider

class genomepy.providers.local.LocalProvider

Bases: BaseProvider

Local genome provider.

Give a local genome the genomepy treatment.

__init__()

Methods

__init__()

annotation_links(name, **kwargs)

Return available gene annotation links (http/ftp) for a genome

assembly_accession(name)

Return the assembly accession number (GCA* or GCF*) for a genome.

download_annotation(name[, genomes_dir, ...])

Download annotation file to a specific directory

download_genome(name[, genomes_dir, ...])

Download a (gzipped) genome file to a specific directory

genome_taxid(name)

Return the genome taxonomy ID for a genome.

get_annotation_download_link(name, **kwargs)

Return a filepath to a matching annotation.

get_annotation_download_links(name, **kwargs)

Returns all files containing both name and an annotation extension

get_genome_download_link(path[, mask])

head_annotation(name[, genomes_dir, n])

Download the first n lines of the annotation.

list_available_genomes([size])

List all available genomes.

ping()

Can the provider be reached?

search(term[, exact, size])

Return an empty generator, the same result as when no genomes are found at the other providers.

Attributes

accession_fields

Metadata fields that (can) contain the assembly's accession ID.

description_fields

Metadata fields with assembly related info.

genomes

Dictionary with assembly names as key and assembly metadata dictionary as value.

name

Name of this provider.

taxid_fields

Metadata fields that (can) contain the assembly's taxonomy ID.

accession_fields = []

Metadata fields that (can) contain the assembly’s accession ID.

annotation_links(name, **kwargs)

Return available gene annotation links (http/ftp) for a genome.

Parameters

name (str) – genome name

Returns

Gene annotation links

Return type

list

assembly_accession(name)

Return the assembly accession number (GCA* or GCF*) for a genome.

Parameters

name (str) – genome name

Returns

Assembly accession number

Return type

str

description_fields = []

Metadata fields with assembly related info.

download_annotation(name, genomes_dir=None, localname=None, **kwargs)

Download annotation file to a specific directory

Parameters
  • name (str) – Genome / species name

  • genomes_dir (str, optional) – Directory to install annotation

  • localname (str, optional) – Custom name for your genome

download_genome(name: str, genomes_dir: str = None, localname: str = None, mask: str = 'soft', **kwargs)

Download a (gzipped) genome file to a specific directory

Parameters
  • name (str) – Genome / species name

  • genomes_dir (str, optional) – Directory to install genome

  • localname (str, optional) – Custom name for your genome

  • mask (str, optional) – Masking: soft, hard or none (all other strings)
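The three mask options can be illustrated with a small standalone sketch. It assumes the input sequence is soft-masked (repeats in lowercase, a common FASTA convention); `apply_mask` is a hypothetical helper written for this page, not part of genomepy, and the library's own masking code may differ.

```python
import re


def apply_mask(sequence: str, mask: str = "soft") -> str:
    """Illustrative only: apply a masking mode to a soft-masked sequence."""
    mask = mask.lower()
    if mask == "soft":
        # Soft masking: repeats stay lowercase, so the sequence is unchanged.
        return sequence
    if mask == "hard":
        # Hard masking: replace every lowercase (repeat) base with N.
        return re.sub(r"[a-z]", "N", sequence)
    # Any other string is treated as "none": remove masking by uppercasing.
    return sequence.upper()
```

Treating every unrecognised string as "none" mirrors the parameter description above, where all strings other than soft and hard mean no masking.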

genome_taxid(name)

Return the genome taxonomy ID for a genome.

Parameters

name (str) – genome name

Returns

Genome Taxonomy identifier

Return type

int

genomes = {}

Dictionary with assembly names as key and assembly metadata dictionary as value.

get_annotation_download_link(name, **kwargs)

Return a filepath to a matching annotation.

Parameters
  • name (str) – genome name

  • **kwargs (dict, optional) – path_to_annotation: direct path to the gene annotation

Returns

path

Return type

str

Raises

GenomeDownloadError – if no functional path was found

get_annotation_download_links(name, **kwargs)

Return all files containing both name and an annotation extension.
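A minimal sketch of what matching on "both name and an annotation extension" could look like, using only the standard library. The extension list, the handling of a `.gz` suffix, and the helper name `find_annotation_files` are assumptions made for illustration; genomepy's actual search logic and directory layout may differ.

```python
from pathlib import Path

# Assumed set of annotation extensions; the real list may differ.
ANNOTATION_EXTENSIONS = (".gtf", ".gff", ".gff3", ".bed")


def find_annotation_files(directory, name):
    """Return sorted paths of files whose name contains `name`
    and ends in an annotation extension (optionally gzipped)."""
    matches = []
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        filename = path.name
        # Ignore a trailing ".gz" when checking the extension.
        stem = filename[:-3] if filename.endswith(".gz") else filename
        if name in filename and stem.lower().endswith(ANNOTATION_EXTENSIONS):
            matches.append(str(path))
    return sorted(matches)
```

Calling `find_annotation_files("/path/to/genome_dir", "my_genome")` would then list candidate annotation files next to a local FASTA, with the directory and genome name here being placeholders.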

head_annotation(name: str, genomes_dir=None, n: int = 5, **kwargs)

Download the first n lines of the annotation.

The first line of the GTF is printed for review (of the gene_name field, for instance).

Parameters
  • name (str) – genome name

  • genomes_dir (str, optional) – genomes directory to install the annotation in.

  • n (int, optional) – download the annotation for n genes.
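For a file that already exists locally, previewing an annotation plausibly amounts to reading its first n lines. The sketch below shows that idea for plain or gzipped text files; `head_lines` is a hypothetical helper and not genomepy's implementation, which may also parse or reformat the annotation before printing.

```python
import gzip
from itertools import islice


def head_lines(path, n=5):
    """Return the first `n` lines of a plain or gzipped text file."""
    # Pick the opener from the file extension; both are opened in text mode.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        # islice stops after n lines, so large files are not read in full.
        return [line.rstrip("\n") for line in islice(handle, n)]
```

Reading lazily with `islice` keeps the preview cheap even for large annotation files.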

list_available_genomes(size=False)

List all available genomes.

Parameters

size (bool, optional) – Show absolute genome size.

Yields

genomes (list of tuples) – tuples with assembly name, accession, scientific_name, taxonomy id and description

name = 'Local'

Name of this provider.

static ping()

Can the provider be reached?

search(term, exact=False, size=False)

Return an empty generator, the same result as when no genomes are found at the other providers.
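The behaviour described above can be mimicked with a generator function that never yields. This is a standalone sketch with the same parameter names as the documented signature, not the library's source code.

```python
def search(term, exact=False, size=False):
    """Yield nothing, whatever the arguments (illustrative sketch)."""
    # "yield from" over an empty tuple makes this a generator with no items.
    yield from ()
```

Because the result is still a generator, calling code that loops over search results from several providers can treat it like any other provider that found no match.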

taxid_fields = []

Metadata fields that (can) contain the assembly’s taxonomy ID.