Genome-wide discovery of local RNA structural elements in Zika virus
- Published
- Accepted
- Subject Areas
- Bioinformatics, Molecular Biology, Virology
- Keywords
- RNA, RNA structure, zika virus, motif discovery, bioinformatics, sequence analysis, ncRNA
- Copyright
- © 2018 Andrews et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2018. Genome-wide discovery of local RNA structural elements in Zika virus. PeerJ Preprints 6:e27101v1 https://doi.org/10.7287/peerj.preprints.27101v1
Abstract
In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary structures that play roles in gene regulation and, in the case of RNA viruses, genome replication. Methods for the identification of functional RNA structures in genomes typically rely on scanning analysis windows, where multiple partially-overlapping windows are used to predict RNA structures and folding metrics to deduce regions likely to form functional structure. Separate structural models are produced for each window, where the step size can greatly affect the returned model. This makes deducing unique local structures challenging, as the same nucleotides in each window can be alternatively base paired. In the presented approach, all base pairs from all analysis windows are considered and weighted by favorable folding metrics throughout all windows. This results in unique base pairing throughout the genome and the generation of local regions/structures that can be ranked by their propensity to form unusually thermodynamically stable folds.
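The scanning-window setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy sequence, and the window and step sizes are assumptions chosen for readability. It shows why the same nucleotide is folded in several partially overlapping windows, and therefore why it can receive a different pairing partner in each one.

```python
def scan_windows(sequence, window_size, step_size):
    """Yield (i, j, fragment) for each window, using 1-based inclusive coordinates."""
    last_start = len(sequence) - window_size
    for start in range(0, last_start + 1, step_size):
        yield start + 1, start + window_size, sequence[start:start + window_size]


def windows_covering(position, seq_length, window_size, step_size):
    """Count the windows that contain a given 1-based nucleotide position."""
    count = 0
    for start in range(0, seq_length - window_size + 1, step_size):
        if start + 1 <= position <= start + window_size:
            count += 1
    return count


# Hypothetical 20-nt toy sequence; window and step sizes are illustrative only.
toy = "AUGGCUAGCUAGGCUAACGU"
windows = list(scan_windows(toy, window_size=8, step_size=4))
for i, j, fragment in windows:
    print(i, j, fragment)
```

With a step smaller than the window, interior nucleotides fall inside more than one window, so each of them accumulates several candidate pairings that must later be reconciled into a single model.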
This approach was applied to the Zika virus (ZIKV) genome. ZIKV is linked to a variety of neurological ailments, including microcephaly and Guillain-Barré syndrome, and its (+)-sense RNA genome encodes two previously described, functionally essential structured RNA regions. Our approach successfully identifies and models the structures of these regions, while also finding additional regions likely to form functional RNA structures throughout the viral polyprotein coding region. All data for the ZIKV genome have been archived at the RNAStructuromeDB, a repository of RNA folding data for humans and their pathogens.
Author Comment
This is a submission to PeerJ for review.
Supplemental Information
Figure S1. Comparative arc diagram depicting the previously described 5' RNA structural motifs vs. the ScanFold predicted bps
(a) Arc diagram of 5' end region as predicted via ScanFold; base pairs are colored based on their z-score cutoff where blue lines depict bps which were predicted in the z-score < -2 results (Table S7) and green lines refer to bps which were predicted in the z-score < -1 results (Table S6). (b) Arc diagram of the accepted secondary structure model for the 5' end of ZIKV as shown in (Ye et al. 2016) and mapped to the KJ776791.2 sequence.
Figure S2. Comparative arc diagram depicting the known RNA structural motifs vs. the ScanFold predicted bps
(a) Arc diagram of 3' end region as predicted via ScanFold; base pairs are colored based on their z-score cutoff where blue lines depict bps which were predicted in the z-score < -2 results (Table S7), green lines refer to bps which were predicted in the z-score < -1 results (Table S6), and yellow lines were predicted in the no filter results (Table S5). (b) Arc diagram of the accepted secondary structure model for the 3' end of ZIKV as shown in (Goertz et al. 2017) mapped to the KJ776791.2 sequence. The start codon nucleotide locations have been highlighted with a light blue bar.
Figure S3. Secondary structure model depicting the ScanFold proposed structures within and directly adjacent to known 5' and 3' structured regions
Base pairs are colored based on their z-score cutoff: blue lines depict bps which were predicted in the z-score < -2 results (Table S7), green lines refer to bps which were predicted in the z-score < -1 results (Table S6), and yellow lines were predicted in the no filter results (Table S5). The start and stop codon nucleotides have been circled and labeled in blue and green respectively. Nucleotides which established ScanFold bp preserving mutations within the alignment are highlighted with filled green circles.
Table S1. Results of the scanning window analysis of the ZIKV genome (NCBI accession KJ776791.2) as output from the ScanFold-Scan program
Each row contains the data calculated for one window. Columns A and B are the starting (i) and ending (j) coordinates of the window fragment. Column C is the temperature used for all RNAfold calculations. Columns D-H report the ∆Gnative, thermodynamic z-score, stability ratio p-value, ensemble diversity, and frequency-of-MFE (fMFE) values, respectively (detailed descriptions of all metrics can be found at the RNAStructuromeDB, https://structurome.bb.iastate.edu, or in the corresponding manuscript (Andrews et al. 2017)). Column I contains the sequence of the window; the ∆Gnative and centroid structure of this sequence are shown in Columns J and K. Columns L-O report nucleotide counts for the window sequence.
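As a point of reference for the thermodynamic z-score column, the sketch below uses the commonly used definition: the native minimum free energy is compared with the mean and standard deviation of the free energies of shuffled versions of the same window. The function name, the use of the sample standard deviation, and the example energies are assumptions for illustration; the exact procedure used to produce Table S1 is the one described in the resources cited above.

```python
from statistics import mean, stdev


def thermodynamic_z_score(native_mfe, shuffled_mfes):
    """Return (native - mean(shuffled)) / sd(shuffled).

    Negative values indicate that the native sequence folds into a more
    stable structure than shuffled sequences of the same composition.
    """
    sigma = stdev(shuffled_mfes)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("shuffled energies have zero variance")
    return (native_mfe - mean(shuffled_mfes)) / sigma


# Hypothetical energies in kcal/mol, not taken from Table S1.
z = thermodynamic_z_score(-10.0, [-6.0, -4.0, -5.0, -5.0])
print(round(z, 2))
```

In this toy case the native energy sits well below the shuffled mean, so the window would pass both the z-score < -1 and z-score < -2 cutoffs used for Tables S6 and S7.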
Table S2. ScanFold log file produced during the ScanFold-Fold portion of the program
The log file is separated into two portions. The first half (rows 1 to 87,448) contains a table for each nucleotide in the sequence. These tables contain the cumulative base pairing information for that nucleotide as predicted throughout the scan. Column A refers to the i-nucleotide of the sequence. Column B refers to the coordinate of the j base pair. The total number of windows in which the i-j pair appears, as well as the total number of windows in which the i-nucleotide appears, are reported in column D. The average window minimum free energy, z-score, and ensemble diversity of each i-j pair are reported in columns E-G, respectively. Column H reports the sum of z-scores for each i-j pair, which is used to calculate the coverage-normalized z-score (the sum of z-scores divided by the total number of windows in which the i-nucleotide appeared) reported in Column I. Column J reports a summary of the bps predicted for each i-nucleotide. The second half of the log file, starting at row 87,449, is a list of the most favorable i-j pairs (columns B and C) associated with the i-nucleotide listed in column A. In places where this nucleotide competed with other i-nucleotides for the same j-nucleotide, the “winning” i-j pair is reported and denoted with an asterisk (in some cases the winning i-j pair does not contain the original i-nucleotide, or the nucleotide may be unpaired). Columns D, E, and F contain the average window minimum free energy, z-score, and ensemble diversity for the corresponding i-j pair.
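The two steps described for the log file, coverage normalization and competition for a shared j-nucleotide, can be sketched as below. This is a simplified illustration under stated assumptions, not ScanFold-Fold itself: the function names, the input layout, and the toy windows are hypothetical, ties are not handled, and the case where a losing nucleotide is reassigned or left unpaired is omitted.

```python
from collections import defaultdict


def coverage_normalized_z(window_results):
    """window_results: list of (start, end, z_score, pairs), 1-based inclusive.

    Returns {(i, j): sum of window z-scores for the pair divided by the
    number of windows in which nucleotide i appeared}.
    """
    z_sum = defaultdict(float)   # (i, j) -> summed z-score
    coverage = defaultdict(int)  # i -> number of windows containing i
    for start, end, z, pairs in window_results:
        for pos in range(start, end + 1):
            coverage[pos] += 1
        for i, j in pairs:
            z_sum[(i, j)] += z
    return {(i, j): total / coverage[i] for (i, j), total in z_sum.items()}


def best_partner(norm_z):
    """For each i, keep the partner j with the lowest normalized z-score."""
    best = {}
    for (i, j), z in norm_z.items():
        if i not in best or z < best[i][1]:
            best[i] = (j, z)
    return best


def resolve_competition(best):
    """When several i-nucleotides claim the same j, keep the lowest z-score."""
    winners = {}
    for i, (j, z) in best.items():
        if j not in winners or z < winners[j][1]:
            winners[j] = (i, z)
    return sorted((i, j, z) for j, (i, z) in winners.items())


# Hypothetical windows: (start, end, z-score, predicted base pairs).
toy_windows = [
    (1, 10, -2.0, [(2, 9), (3, 8)]),
    (3, 12, -1.0, [(3, 8), (4, 9)]),
]
norm = coverage_normalized_z(toy_windows)
final_pairs = resolve_competition(best_partner(norm))
print(final_pairs)
```

Here nucleotides 2 and 4 both claim nucleotide 9; the 2-9 pair has the lower normalized z-score and is kept, which mirrors the "winning" pair marked with an asterisk in the log.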
Table S3. Results of 37 ZIKV genomes curated in the ZikaVR database (Gupta et al. 2016) aligned to KJ776791.2
Genomes were aligned using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. Headings for each result contain the NCBI accession number and name of the aligned sequence.
Table S4. Base pair counts tabulating the number and type of base pairs which appear in the ScanFold z-score < -1 predicted structure when compared to 37 aligned ZIKV genomes
A total of 37 ZIKV genomes were aligned to KJ776791.2 using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. Aligned sequences were compared to ScanFold-Fold predicted bps (with z-score < -1) to tabulate the types of base pairs which are found throughout the alignment (Table S3). Column S reports the percent of sequences in the alignment in which that base pair remains canonical, and column T reports the number of different canonical base pair types observed. Results for the previously reported 5' and 3' UTR structural regions appear as separate worksheets.
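The tabulation described above can be sketched for a single predicted base pair as follows. This is an illustrative example only: the function name, the 0-based column convention, the inclusion of G-U wobble pairs among canonical pairs, and the toy alignment are assumptions, and the percentage is computed over all aligned sequences.

```python
from collections import Counter

# Watson-Crick pairs plus G-U wobble pairs (treated as canonical here).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def tabulate_pair(aligned_seqs, col_i, col_j):
    """Count the nucleotide pair found at two alignment columns (0-based).

    Returns (counts, percent of sequences with a canonical pair,
    number of distinct canonical pair types observed).
    """
    counts = Counter()
    for seq in aligned_seqs:
        pair = (seq[col_i] + seq[col_j]).upper().replace("T", "U")
        counts[pair] += 1
    total = sum(counts.values())
    canonical = sum(n for pair, n in counts.items() if pair in CANONICAL)
    canonical_types = sum(1 for pair in counts if pair in CANONICAL)
    return counts, 100.0 * canonical / total, canonical_types


# Hypothetical five-sequence alignment; columns 0 and 4 form the predicted pair.
toy_alignment = ["GAAAC", "GAAAU", "AAAAU", "GAAAC", "CAAAC"]
counts, percent_canonical, n_types = tabulate_pair(toy_alignment, 0, 4)
print(dict(counts), percent_canonical, n_types)
```

A pair that stays canonical while changing type across the alignment (for example G-C to A-U) is the kind of structure-preserving variation the table is designed to surface.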
Table S5. CT file of the default no-filter results output from ScanFold-Fold
Table S6. CT file of the default z-score < -1 results output from ScanFold-Fold
Table S7. CT file of the default z-score < -2 results output from ScanFold-Fold
Table S8. RNAstructure webserver scorer results of the ScanFold predicted structures compared to accepted structures
The structures and sequences were uploaded to the server as shown, and scorer was run with default settings.