PATACSDB - The database of polyA translational attenuators in coding sequences

Department of Bioinformatics, Institute of Biochemistry and Biophysics Polish Academy of Sciences, Warsaw, Poland
Department of Cell Biology and Physiology, Washington University School of Medicine, Saint Louis, Missouri, United States of America
Faculty of Biology, Institute of Experimental Plant Biology and Biotechnology, University of Warsaw, Warsaw, Poland
DOI
10.7287/peerj.preprints.1557v1
Subject Areas
Bioinformatics, Databases
Keywords
ribosome stalling, gene regulation, eukaryotic genomes, mRNA stability, translation
Copyright
© 2015 Habich et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Habich M, Djuranovic S, Szczesny P. 2015. PATACSDB - The database of polyA translational attenuators in coding sequences. PeerJ PrePrints 3:e1557v1

Abstract

A recent addition to the repertoire of gene expression regulatory mechanisms is polyadenylate (polyA) tracks encoding poly-lysine runs in protein sequences. Such tracks stall the translation apparatus and induce frameshifting independently of the effects of the charged nascent poly-lysine sequence on the ribosome exit channel. As such, they substantially influence the stability of mRNA and the amount of protein produced from a given transcript. Single base changes in these regions are enough to exert a measurable response on both protein and mRNA abundance, which makes each of these sequences a potentially interesting case study for the effects of synonymous mutation, gene dosage balance and natural frameshifting. Here we present PATACSDB, a resource that contains a comprehensive list of polyA tracks from over 250 eukaryotic genomes. Our data are based on the Ensembl genomic database of coding sequences and filtered with the 12A-1 algorithm, which selects polyA tracks with a minimal length of 12 A's, allowing for one mismatched base. The PATACSDB database is accessible at: http://sysbio.ibb.waw.pl/patacsdb. Source code is available for download from the GitHub repository at http://github.com/habich/PATACSDB, including the scripts to recreate the database from scratch on the user's own computer.
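
The 12A-1 filter can be illustrated with a minimal sketch. It is not taken from the PATACSDB source code; it assumes one plausible reading of the rule, namely that any window of 12 consecutive nucleotides containing at most one non-A base qualifies, and that overlapping qualifying windows are merged into a single span. The function name `find_polya_tracks` and its parameters are hypothetical.

```python
def find_polya_tracks(seq, window=12, max_mismatch=1):
    """Return half-open (start, end) spans covered by qualifying windows.

    Assumption (not from the PATACSDB code): a window qualifies when it
    holds at most `max_mismatch` non-A bases; overlapping qualifying
    windows are merged, so a merged span may include flanking bases.
    """
    seq = seq.upper()
    spans = []
    if len(seq) < window:
        return spans

    # Count non-A bases in the first window, then update while sliding.
    mismatches = sum(1 for base in seq[:window] if base != "A")
    for start in range(len(seq) - window + 1):
        if start > 0:
            mismatches -= (seq[start - 1] != "A")            # base leaving the window
            mismatches += (seq[start + window - 1] != "A")   # base entering the window
        if mismatches <= max_mismatch:
            end = start + window
            if spans and start < spans[-1][1]:
                # Overlaps the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Hypothetical coding-sequence fragment with one interrupted polyA run.
    example = "ATGGCC" + "AAAAAAGAAAAA" + "GCCTAA"
    for start, end in find_polya_tracks(example):
        print(start, end, example[start:end])
```

Because the sliding window updates the mismatch count in constant time per position, the scan is linear in sequence length, which matters when it is applied to every coding sequence of many genomes.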

Author Comment

This is a submission to PeerJ Computer Science for review.