genomepy.providers.url.UrlProvider

class genomepy.providers.url.UrlProvider

Bases: BaseProvider

URL genome provider.

Simply download a genome directly through a url.
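To make "download directly through a url" concrete, here is a minimal sketch of fetching a file into a target directory with the Python standard library. The helper name `download_to_dir` is hypothetical. This is not genomepy's own download code. It assumes the local file name is simply the last path component of the URL.

```python
import os
import shutil
import urllib.parse
import urllib.request


def download_to_dir(url: str, genomes_dir: str) -> str:
    """Hypothetical helper: save the file at `url` into `genomes_dir`.

    The local file name is the last path component of the URL
    (an assumption made for this illustration).
    """
    filename = os.path.basename(urllib.parse.urlparse(url).path)
    if not filename:
        raise ValueError(f"cannot derive a file name from {url!r}")
    os.makedirs(genomes_dir, exist_ok=True)
    target = os.path.join(genomes_dir, filename)
    # Stream the response to disk instead of reading it all into memory
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

`urlopen` also accepts `file://` URLs, so the sketch can be exercised locally without network access.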

__init__()

Methods

__init__()

annotation_links(name, **kwargs)

Return available gene annotation links (http/ftp) for a genome

assembly_accession(name)

Return the assembly accession number (GCA* or GCF*) for a genome.

download_annotation(name[, genomes_dir, ...])

Download an annotation file to a specific directory

download_genome(name[, genomes_dir, ...])

Download a (gzipped) genome file to a specific directory

genome_taxid(name)

Return the genome taxonomy ID for a genome.

get_annotation_download_link(name, **kwargs)

Return a functional annotation download link.

get_annotation_download_links(name, **kwargs)

Retrieve functioning gene annotation download link(s).

get_genome_download_link(url[, mask])

head_annotation(name[, genomes_dir, n])

Download the first n lines of the annotation.

list_available_genomes([size])

List all available genomes.

ping()

Can the provider be reached?

search(term[, exact, size])

Return an empty generator, the same result as when no genomes are found at the other providers.

Attributes

accession_fields

Metadata fields that (can) contain the assembly's accession ID.

description_fields

Metadata fields with assembly related info.

genomes

Dictionary with assembly names as key and assembly metadata dictionary as value.

name

Name of this provider.

taxid_fields

Metadata fields that (can) contain the assembly's taxonomy ID.

accession_fields = []

Metadata fields that (can) contain the assembly’s accession ID.

annotation_links(name, **kwargs)

Return available gene annotation links (http/ftp) for a genome.

Parameters

name (str) – genome name

Returns

Gene annotation links

Return type

list

assembly_accession(name)

Return the assembly accession number (GCA* or GCF*) for a genome.

Parameters

name (str) – genome name

Returns

Assembly accession number

Return type

str

description_fields = []

Metadata fields with assembly related info.

download_annotation(name, genomes_dir=None, localname=None, **kwargs)

Download an annotation file to a specific directory

Parameters
  • name (str) – Genome / species name

  • genomes_dir (str, optional) – Directory to install annotation

  • localname (str, optional) – Custom name for your genome

download_genome(name: str, genomes_dir: str = None, localname: str = None, mask: str = 'soft', **kwargs)

Download a (gzipped) genome file to a specific directory

Parameters
  • name (str) – Genome / species name

  • genomes_dir (str, optional) – Directory to install genome

  • localname (str, optional) – Custom name for your genome

  • mask (str, optional) – Masking: 'soft', 'hard', or none (any other string)
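The three masking modes can be illustrated on a single sequence string. This is a sketch under a stated assumption, not genomepy's implementation. It assumes the input is soft-masked, meaning repeat bases are lowercase, which is the usual convention. Hard masking replaces those bases with N, and "none" removes masking by uppercasing. The function name `apply_mask` is hypothetical.

```python
import re


def apply_mask(sequence: str, mask: str = "soft") -> str:
    """Hypothetical illustration of soft / hard / no masking.

    Assumes `sequence` uses lowercase letters for repeat-masked bases.
    """
    if mask == "soft":
        return sequence  # keep lowercase repeats unchanged
    if mask == "hard":
        return re.sub("[a-z]", "N", sequence)  # replace masked bases with N
    return sequence.upper()  # any other string: drop the masking
```

For example, `apply_mask("ACgtN", "hard")` returns `"ACNNN"`.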

genome_taxid(name)

Return the genome taxonomy ID for a genome.

Parameters

name (str) – genome name

Returns

Genome Taxonomy identifier

Return type

int

genomes = {}

Dictionary with assembly names as key and assembly metadata dictionary as value.
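The shape of `genomes` together with the `*_fields` attributes suggests a simple lookup pattern: take the metadata dict for an assembly and return the first listed field that holds a value. The sketch below is hypothetical. The function name and the example keys are not genomepy's. For this provider, `genomes` and all field lists are empty, so such a lookup finds nothing.

```python
from typing import Dict, List, Optional


def lookup_field(
    genomes: Dict[str, dict], name: str, fields: List[str]
) -> Optional[str]:
    """Hypothetical lookup: first non-empty metadata value for `name`.

    `fields` plays the role of accession_fields / taxid_fields: an ordered
    list of metadata keys that (can) hold the value. Returns None if no
    field matches.
    """
    metadata = genomes.get(name, {})
    for field in fields:
        value = metadata.get(field)
        if value not in (None, ""):
            return str(value)
    return None
```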

get_annotation_download_link(name, **kwargs)

Return a functional annotation download link.

Parameters
  • name (str) – genome name

  • **kwargs (dict, optional) – to_annotation: direct URL to the gene annotation

Returns

http/ftp link

Return type

str

Raises

GenomeDownloadError – if no functional link was found
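The contract above (return a functional link, otherwise raise GenomeDownloadError) can be sketched as "return the first candidate that passes a check". The `GenomeDownloadError` class below is a local stand-in, not an import from genomepy. The `is_functional` predicate is hypothetical and would be something like an HTTP HEAD request in practice. It is passed in so the sketch runs offline.

```python
from typing import Callable, Iterable


class GenomeDownloadError(Exception):
    """Local stand-in for genomepy's exception of the same name."""


def first_working_link(
    candidates: Iterable[str], is_functional: Callable[[str], bool]
) -> str:
    """Return the first candidate link for which `is_functional` is True."""
    for link in candidates:
        if is_functional(link):
            return link
    raise GenomeDownloadError("no functional annotation link was found")
```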

get_annotation_download_links(name, **kwargs)

Retrieve functioning gene annotation download link(s).

If an annotation URL is provided, check that it links to a supported file type (GTF/GFF3/BED). Otherwise, try to find an annotation in the same location as the genome URL.
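The two branches of this lookup can be sketched as follows. The first branch checks the file extension of a given URL. The second builds candidate URLs next to the genome file. The extension list comes from the description above. Treating a trailing `.gz` as allowed and naming candidates after the genome file's stem are assumptions of this hypothetical sketch, not genomepy's documented behaviour.

```python
import posixpath
from urllib.parse import urlparse

SUPPORTED = (".gtf", ".gff3", ".bed")  # file types named in the description


def is_supported_annotation(url: str) -> bool:
    """Check whether a URL's file name ends in a supported annotation type."""
    path = urlparse(url).path.lower()
    if path.endswith(".gz"):  # assumption: gzipped annotations are accepted
        path = path[: -len(".gz")]
    return path.endswith(SUPPORTED)


def sibling_candidates(genome_url: str) -> list:
    """Hypothetical guesses for an annotation stored next to the genome file."""
    parsed = urlparse(genome_url)
    directory, filename = posixpath.split(parsed.path)
    stem = filename.split(".")[0]  # e.g. "GRCz11.fa.gz" -> "GRCz11"
    return [
        parsed._replace(path=posixpath.join(directory, stem + ext)).geturl()
        for ext in SUPPORTED
    ]
```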

Parameters

name (str) – genome name

Returns

http/ftp link(s)

Return type

list

head_annotation(name: str, genomes_dir=None, n: int = 5, **kwargs)

Download the first n lines of the annotation.

The first line of the GTF is printed for review (for instance, to check the gene_name field).

Parameters
  • name (str) – genome name

  • genomes_dir (str, optional) – genomes directory to install the annotation in.

  • n (int, optional) – download the annotation for n genes.
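A preview of the first lines of an annotation file can be sketched with `gzip` and `itertools.islice`. This local helper is hypothetical and works on a file already on disk. It is not genomepy's head_annotation, which downloads the annotation itself.

```python
import gzip
from itertools import islice


def head_lines(path: str, n: int = 5) -> list:
    """Return the first `n` lines of a plain or gzipped text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # islice stops reading after n lines, so large files are not loaded
        return [line.rstrip("\n") for line in islice(handle, n)]
```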

list_available_genomes(size=False)

List all available genomes.

Parameters

size (bool, optional) – Show absolute genome size.

Yields

genomes (list of tuples) – tuples with assembly name, accession, scientific_name, taxonomy id and description
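The yielded tuples can be sketched as one tuple per entry in a provider's `genomes` dictionary, in the order listed above. The metadata key names used here are hypothetical, and real providers store different keys. Because this provider's `genomes` is `{}`, a generator like this yields nothing for it.

```python
from typing import Dict, Iterator


def iter_genomes(genomes: Dict[str, dict]) -> Iterator[tuple]:
    """Sketch: yield (name, accession, scientific_name, taxid, description).

    The metadata keys below are hypothetical placeholders.
    """
    for name, meta in genomes.items():
        # Missing fields are reported as None rather than raising KeyError
        yield (
            name,
            meta.get("accession"),
            meta.get("scientific_name"),
            meta.get("taxid"),
            meta.get("description"),
        )
```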

name = 'URL'

Name of this provider.

static ping()

Can the provider be reached?

search(term, exact=False, size=False)

Return an empty generator, the same result as when no genomes are found at the other providers.
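One way to write a search that always returns an empty generator is shown below. This is a sketch, not necessarily how genomepy implements it. The `yield from ()` makes the function a generator function, so callers can iterate over the result exactly as they would for a provider that has matches.

```python
from typing import Iterator


def search(term: str, exact: bool = False, size: bool = False) -> Iterator[tuple]:
    """Sketch of a search that never yields any genomes."""
    yield from ()  # no catalog to search: produce no results
```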

taxid_fields = []

Metadata fields that (can) contain the assembly’s taxonomy ID.