Genes of the Pig, Sus scrofa, reconstructed with EvidentialGene

Indiana University, Bloomington, IN, United States
DOI
10.7287/peerj.preprints.27191v1
Subject Areas
Agricultural Science, Bioinformatics, Genomics, Data Science
Keywords
precision genomics, transcriptome assembly, model organism, biomedical genomics, agricultural genomics, genome informatics pipeline
Copyright
© 2018 Gilbert
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Gilbert DG. 2018. Genes of the Pig, Sus scrofa, reconstructed with EvidentialGene. PeerJ Preprints 6:e27191v1

Abstract

The pig is a well-studied model animal of biomedical and agricultural importance. Genes of this species, Sus scrofa, are known from experiments and predictions, and are collected in the NCBI Reference Sequence (RefSeq) database. Gene reconstruction from the transcribed gene evidence of RNA-seq can now accurately and completely reproduce the biological gene sets of animals and plants. Such a gene set for the pig is reported here, including human orthologs missing from RefSeq and other improvements to the current NCBI pig gene set. It was built with the automated SRA2Genes pipeline of the EvidentialGene project, a methodology for accurate and complete gene set reconstruction from RNA.

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

pig18evg_datadesc.pages Conserved vertebrate genes recovered in Pig Evigene vs NCBI gene sets, as computed with vertebrate conserved genes of OrthoDB

Columns include the gene IDs BUSCO_ID, Evigene_ID, and NCBI RefSeq ID. Other columns: Cmp, the qualitative comparison (evgain, same, evloss) of the alignment difference; Diff, the numeric difference in alignment score to the conserved protein; dEvg-Ncb, the two alignment scores; BC, the BUSCO complete/fragment/missing quality score; and Product_Name, the vertebrate protein product.

DOI: 10.7287/peerj.preprints.27191v1/supp-1
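
The relation between the Diff and Cmp columns described above can be illustrated with a minimal Python sketch. The source does not state how Cmp is derived, so the sign convention (Evigene score minus NCBI score), the `tolerance` parameter, and the function names `compare_scores` and `tally` are assumptions made for illustration only, and the example scores are made up rather than taken from the table.

```python
from collections import Counter


def compare_scores(evg_score, ncbi_score, tolerance=0.0):
    """Return (diff, label) for one conserved gene.

    Assumption: diff = Evigene score - NCBI score, and the label follows
    the sign of diff. The actual rule behind the Cmp column is not given
    in the source.
    """
    diff = evg_score - ncbi_score
    if diff > tolerance:
        label = "evgain"   # Evigene aligns better to the conserved protein
    elif diff < -tolerance:
        label = "evloss"   # NCBI aligns better
    else:
        label = "same"     # no difference within the tolerance
    return diff, label


def tally(score_pairs, tolerance=0.0):
    """Count evgain/same/evloss labels over (evg_score, ncbi_score) pairs."""
    return Counter(compare_scores(e, n, tolerance)[1] for e, n in score_pairs)


# Hypothetical alignment scores, not values from the supplemental table.
example = [(512.0, 498.0), (300.0, 300.0), (120.0, 240.0)]
print(tally(example))
```

A nonzero `tolerance` would treat small score differences as "same"; whether the supplemental table applies any such threshold is not stated.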

pig18evg_datadesc.pages Human genes recovered in Pig Evigene vs NCBI gene sets, as computed with human and pig RefSeq and Evigene proteins and NCBI BLASTP

Columns include the gene IDs Human RefSeq ID, Evigene_pig_ID, and NCBI_pig_ID; AAsize, the human protein size; EvAlign and NcAlign, the alignment scores to Evigene and NCBI proteins; DiffA, the difference in alignments; and Human_Gene_Name.

DOI: 10.7287/peerj.preprints.27191v1/supp-2