Background: A hierarchical taxonomy of organisms is a prerequisite for semantic integration of biodiversity data. Ideally, there would be a single, expansive, authoritative taxonomy that includes extinct and extant taxa, information on synonyms and common names, and monophyletic supraspecific taxa that reflect our current understanding of phylogenetic relationships.
Description: As a step towards development of such a resource, and to enable large-scale integration of phenotypic data across the vertebrates, we created the Vertebrate Taxonomy Ontology (VTO), a semantically defined taxonomic resource derived from the integration of existing taxonomic compilations, and freely distributed under a Creative Commons Zero (CC0) public domain waiver. The VTO includes both extant and extinct vertebrates and currently contains 106,927 taxonomic terms, 23 taxonomic ranks, 104,506 synonyms, and 162,132 taxonomic cross-references. Key challenges in constructing the VTO included (1) extracting and merging names, synonyms, and identifiers from heterogeneous sources; (2) replacing subgroups with more authoritative local taxonomies; and (3) automating this process as much as possible to accommodate updates in source taxonomies.
Conclusions: The VTO is the primary source of taxonomic information used by the Phenoscape Knowledgebase (http://phenoscape.org/), which integrates genetic and evolutionary phenotype data across both model and non-model vertebrates. The VTO supports coarse inference of phenotypic changes on the vertebrate tree of life, which in turn enables queries for candidate genes associated with different episodes in vertebrate evolution.