De novo gene evolution: How do we transition from non-coding to coding?
- Subject Areas: Bioinformatics, Computational Biology, Evolutionary Studies, Genetics, Genomics
- Keywords: de novo gene, ribosome profiling, translation, peptide, micropeptide, coding score, codon usage bias, natural selection, polymorphism, long non-coding RNA
- Copyright: © 2017 Ruiz-Orera et al.
- Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article: 2017. De novo gene evolution: How do we transition from non-coding to coding? PeerJ Preprints 5:e3031v2 https://doi.org/10.7287/peerj.preprints.3031v2
Abstract
Recent years have witnessed the discovery of protein-coding genes that appear to have evolved de novo from previously non-coding sequences. This has changed the long-standing view that coding sequences can only evolve from other coding sequences. However, many questions remain open regarding how new protein-coding sequences can arise from non-genic DNA.
Two prerequisites for the birth of a new functional protein-coding gene are that the corresponding DNA fragment is transcribed and that the resulting transcript is translated. Transcription is known to be pervasive in the genome, producing a large number of transcripts that do not correspond to conserved protein-coding genes and that are usually annotated as long non-coding RNAs (lncRNAs). Recently, sequencing of ribosome-protected fragments (Ribo-Seq) has provided evidence that many of these transcripts are translated into small proteins. We have used mouse non-synonymous and synonymous variation data to estimate the strength of purifying selection acting on the translated open reading frames (ORFs). Whereas a subset of the lncRNAs are likely to be true protein-coding genes that were previously misclassified, the bulk of lncRNAs encode proteins whose variation patterns are consistent with neutral evolution. We also show that ORFs with a more favorable, coding-like sequence composition are more likely to be translated than other ORFs in lncRNAs. This study provides strong evidence that there is a large and ever-changing reservoir of low-abundance proteins; some of these peptides may become useful and act as seeds for de novo gene evolution.
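To make the selection test mentioned in the abstract more concrete, the following is a minimal Python sketch of how single-nucleotide variants in a translated ORF might be classified as non-synonymous or synonymous and then counted. It is an illustration only and not the authors' pipeline: the function names (`classify_snp`, `count_pn_ps`), the toy ORF, and the variant list are all hypothetical, and the standard genetic code is assumed. A real analysis would also compare the observed counts against a neutral expectation.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(orf, pos, alt):
    """Return 'syn' or 'nonsyn' for a single-base change in an in-frame ORF.

    orf: coding sequence starting at the first base of the start codon
    pos: 0-based position of the variant within the ORF
    alt: alternative base at that position
    """
    start = pos - pos % 3                 # first base of the affected codon
    ref_codon = orf[start:start + 3]
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "syn" if same else "nonsyn"


def count_pn_ps(orf, snps):
    """Count non-synonymous (PN) and synonymous (PS) variants in an ORF."""
    pn = ps = 0
    for pos, alt in snps:
        if classify_snp(orf, pos, alt) == "syn":
            ps += 1
        else:
            pn += 1
    return pn, ps


# Hypothetical toy ORF (ATG GCT AAA TGA) and three hypothetical variants.
toy_orf = "ATGGCTAAATGA"
toy_snps = [(5, "C"), (6, "G"), (3, "A")]
print(count_pn_ps(toy_orf, toy_snps))  # → (2, 1)
```

Under purifying selection, non-synonymous variants are expected to be depleted relative to synonymous ones, whereas under neutral evolution the ratio should be close to what the mutation process alone would produce.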
Author Comment
Oral presentation for the Molecular Innovation symposium of SMBE 2017
The title has been changed to match the one that appears in the SMBE 2017 program.