Draft sequencing and assembly of the genome of the world’s largest fish, the whale shark: Rhincodon typus Smith 1828
A peer-reviewed article of this Preprint also exists.
Abstract
The whale shark (Rhincodon typus) has by far the largest body size of any elasmobranch (shark or ray) species and is therefore also the largest extant species of the paraphyletic assemblage commonly referred to as “fishes”. Because the whale shark is both a phenotypic extreme and a member of the group basal to the remaining gnathostomes (which include all tetrapods, and therefore humans), its genome is of substantial comparative interest. Whale sharks are also listed as “vulnerable” on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species, and they are increasingly popular both as a target of ecotourism and as a charismatic conservation ambassador for the pelagic ecosystem. A genome map for this species would aid in defining effective conservation units and in understanding global population structure. We characterised the nuclear genome of the whale shark using next-generation sequencing (454, Illumina) and de novo assembly and annotation methods, based on material collected at the Georgia Aquarium. The data set consisted of 878,654,233 reads, which assembled into 11,347,816 contigs and 3,606,038 scaffolds. The estimated genome size was 3.44 Gb. As expected, the whale shark proteome was most closely related to that of the only other cartilaginous fish with a complete genome, the holocephalan elephant shark. The whale shark genome contained a novel Toll-like receptor (TLR) protein with sequence conservation to both the TLR4 and TLR13 proteins of mammals. The data are publicly available on a Galaxy bioinformatics server (http://whaleshark.georgiaaquarium.org). This represents the first shotgun-sequenced elasmobranch genome and will aid studies of molecular systematics, biogeography, genetic differentiation and conservation genetics in this and other shark species, as well as providing comparative data for studies of evolutionary biology and immunology across the jawed vertebrate lineages.
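The abstract reports assembly size as counts of contigs and scaffolds. Draft assemblies are also commonly summarised by total assembled length and N50, the length L such that sequences of length L or longer cover at least half of the assembly. These two figures are not reported here. The Python sketch below shows how such a summary could be computed. It assumes a plain FASTA file of contigs or scaffolds, and the function names (`read_fasta_lengths`, `n50`, `summarize`) are hypothetical. It is not the pipeline the authors used.

```python
"""Minimal assembly summary statistics (count, total length, N50).

Illustrative sketch only; not the pipeline used for the whale shark assembly.
Assumes plain FASTA input (header lines start with '>').
"""
from collections.abc import Iterable


def read_fasta_lengths(lines: Iterable[str]) -> list[int]:
    """Return the sequence length of each FASTA record, in file order."""
    lengths: list[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence may be wrapped over several lines; sum them all.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Smallest length L such that sequences of length >= L cover >= 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: list[int]) -> dict[str, int]:
    """Record count, total assembled bases and N50 for a list of sequence lengths."""
    return {"count": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


if __name__ == "__main__":
    toy = [">c1", "ACGTACGTAC", ">c2", "ACGTA", ">c3", "ACG", "TAC"]
    print(summarize(read_fasta_lengths(toy)))
```

For the toy input the record lengths are 10, 5 and 6 (the third record is wrapped over two lines), so the script prints `{'count': 3, 'total_bp': 21, 'n50': 6}`. On a real assembly, the same function would be run on each file, opening it with `open(path)` and passing the resulting file handle.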
Cite this as
2015. Draft sequencing and assembly of the genome of the world’s largest fish, the whale shark: Rhincodon typus Smith 1828. PeerJ Preprints 6:e837v1 https://doi.org/10.7287/peerj.preprints.837v1
Author comment
This is a preliminary report on this project; enough data have now been generated to reveal some important features of the genome. We provide some statistics and links to publicly accessible data. We urge caution to avoid over-interpretation of the sequence, but we hope the data we provide will stimulate the shark genetics community.
Additional Information
Competing Interests
The authors declare no competing interests other than the following: DHW and ADMD are employees of the Georgia Aquarium, which was the principal funder of this work; TDR holds an honorary (non-compensated) position at the Georgia Aquarium; and Timothy D Read is an Academic Editor for PeerJ.
Author Contributions
Timothy D Read conceived and designed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Robert A Petit III analyzed the data, reviewed drafts of the paper.
Sandeep J Joseph analyzed the data, prepared figures and/or tables, reviewed drafts of the paper.
Md T Alam analyzed the data.
Ryan Weil conceived and designed the experiments.
Maida Ahmad analyzed the data, reviewed drafts of the paper.
Ravila Bhimani analyzed the data, reviewed drafts of the paper.
Jocelyn S Vuong analyzed the data, reviewed drafts of the paper.
Chad P Haase performed the experiments, reviewed drafts of the paper.
Harry Webb contributed reagents/materials/analysis tools.
Alistair D. M. Dove conceived and designed the experiments, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
DNA Deposition
The following information was supplied regarding the deposition of DNA sequences:
National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), accession number SRP044374
Data Deposition
The following information was supplied regarding data availability:
Funding
The major funding for this project came from the Georgia Aquarium, with additional resources provided to TDR from Division of Infectious Diseases development funds. Coca-Cola contributed towards establishing the Galaxy web server. Funding for equipment used at the Emory Genome Center was provided by the Georgia Research Alliance, the Emory School of Medicine, the Department of Human Genetics and the Atlanta Clinical and Translational Sciences Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.