Genome-wide discovery of local RNA structural elements in Zika virus

Ryan J Andrews; Julien Roche; Walter N Moss

doi:10.7287/peerj.preprints.27101v1

Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States

Published: 2018-08-09
Accepted: 2018-08-09

Subject Areas: Bioinformatics, Molecular Biology, Virology
Keywords: RNA, RNA structure, zika virus, motif discovery, bioinformatics, sequence analysis, ncRNA

Copyright: © 2018 Andrews et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Andrews RJ, Roche J, Moss WN. 2018. Genome-wide discovery of local RNA structural elements in Zika virus. PeerJ Preprints 6:e27101v1 https://doi.org/10.7287/peerj.preprints.27101v1

Abstract

In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary structures that play roles in gene regulation and, in the case of RNA viruses, genome replication. Methods for the identification of functional RNA structures in genomes typically rely on scanning analysis windows, where multiple partially-overlapping windows are used to predict RNA structures and folding metrics to deduce regions likely to form functional structure. Separate structural models are produced for each window, where the step size can greatly affect the returned model. This makes deducing unique local structures challenging, as the same nucleotides in each window can be alternatively base paired. In the presented approach, all base pairs from all analysis windows are considered and weighted by favorable folding metrics throughout all windows. This results in unique base pairing throughout the genome and the generation of local regions/structures that can be ranked by their propensity to form unusually thermodynamically stable folds.

This approach was applied to the Zika virus (ZIKV) genome. ZIKV is linked to a variety of neurological ailments, including microcephaly and Guillain-Barré syndrome, and its (+)-sense RNA genome encodes two previously described, functionally essential structured RNA regions. Our approach successfully identifies and models the structures of these regions, while also finding additional regions likely to form functional RNA structures throughout the viral polyprotein coding region. All data for the ZIKV genome have been archived at the RNAStructuromeDB, a repository of RNA folding data for humans and their pathogens.

Author Comment

This is a submission to PeerJ for review.

Supplemental Information

Figure S1. Comparative arc diagram depicting the previously described 5' RNA structural motifs vs. the ScanFold predicted bps

(a) Arc diagram of 5' end region as predicted via ScanFold; base pairs are colored based on their z-score cutoff where blue lines depict bps which were predicted in the z-score < -2 results (Table S7) and green lines refer to bps which were predicted in the z-score < -1 results (Table S6). (b) Arc diagram of the accepted secondary structure model for the 5' end of ZIKV as shown in (Ye et al. 2016) and mapped to the KJ776791.2 sequence.

DOI: 10.7287/peerj.preprints.27101v1/supp-1

Figure S2. Comparative arc diagram depicting the known RNA structural motifs vs. the ScanFold predicted bps

(a) Arc diagram of 3' end region as predicted via ScanFold; base pairs are colored based on their z-score cutoff where blue lines depict bps which were predicted in the z-score < -2 results (Table S7), green lines refer to bps which were predicted in the z-score < -1 results (Table S6), and yellow lines were predicted in the no filter results (Table S5). (b) Arc diagram of the accepted secondary structure model for the 3' end of ZIKV as shown in (Goertz et al. 2017) mapped to the KJ776791.2 sequence. The start codon nucleotide locations have been highlighted with a light blue bar.

DOI: 10.7287/peerj.preprints.27101v1/supp-2

Figure S3. Secondary structure model depicting the ScanFold proposed structures within and directly adjacent to known 5' and 3' structured regions

Base pairs are colored based on their z-score cutoff: blue lines depict bps which were predicted in the z-score < -2 results (Table S7), green lines refer to bps which were predicted in the z-score < -1 results (Table S6), and yellow lines were predicted in the no filter results (Table S5). The start and stop codon nucleotides have been circled and labeled in blue and green, respectively. Nucleotides with mutations in the alignment that preserve ScanFold-predicted bps are highlighted with filled green circles.

DOI: 10.7287/peerj.preprints.27101v1/supp-3

Table S1. Results of the scanning window analysis of the ZIKV genome (NCBI accession KJ776791.2) as output from the ScanFold-Scan program

Each row contains the data calculated for one window. Columns A and B are the starting (i) and ending (j) coordinates of the window fragment. Column C is the temperature used for all RNAFold calculations. Columns D-H report the ∆G_native, thermodynamic z-score, stability ratio p-value, ensemble diversity, and frequency-of-MFE (fMFE) values, respectively (detailed descriptions of all metrics can be found at the RNAStructuromeDB, https://structurome.bb.iastate.edu, or in the corresponding manuscript (Andrews et al. 2017)). Column I contains the sequence of the window; the ∆G_native and centroid structure of this sequence are shown in Columns J and K. Columns L-O report nucleotide counts for the window sequence.

DOI: 10.7287/peerj.preprints.27101v1/supp-4

Table S2. ScanFold log file produced during the ScanFold-Fold portion of the program

The log file is separated into two portions. The first half (rows 1 to 87,448) contains a table for each nucleotide in the sequence. These tables contain the cumulative base pairing information for that nucleotide as predicted throughout the scan. Column A refers to the i-nucleotide of the sequence. Column B refers to the coordinate of the j base pair. The total number of windows in which the i-j pair appears, as well as the total number of windows in which the i-nucleotide appears, are reported in column D. The average window minimum free energy, z-score, and ensemble diversity of each i-j pair are reported in columns E-G, respectively. Column H reports the sum of z-scores for each i-j pair, which is used to calculate the coverage-normalized z-score (the sum of z-scores divided by the total number of windows in which the i-nucleotide appeared) reported in Column I. Column J reports a summary of the bps predicted for each i-nucleotide. The second half of the log file, starting at row 87,449, is a list of the most favorable i-j pairs (columns B and C) associated with the i-nucleotide listed in column A. In places where this nucleotide competed with other i-nucleotides for the same j-nucleotide, the “winning” i-j pair is reported and denoted with an asterisk (in some cases the winning i-j pair does not contain the original i-nucleotide, or the nucleotide may be unpaired). Columns D, E, and F contain the average window minimum free energy, z-score, and ensemble diversity for the corresponding i-j pair.
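
The coverage-normalized z-score described above can be illustrated with a minimal Python sketch. This is not the ScanFold-Fold source code: the function names, the `(start, end, zscore, pairs)` window tuple, and the choice to record each pair once with i < j are assumptions made for illustration, and the competition step between i-nucleotides that share a j-nucleotide is deliberately left out.

```python
from collections import defaultdict


def coverage_normalized_zscores(windows):
    """Return {(i, j): sum of window z-scores / windows covering nucleotide i}.

    windows: iterable of (start, end, zscore, pairs) tuples (hypothetical layout),
    with 1-based inclusive coordinates and pairs given as (i, j) tuples, i < j.
    """
    coverage = defaultdict(int)   # nucleotide -> number of windows containing it
    z_sum = defaultdict(float)    # (i, j) -> sum of z-scores of windows predicting it
    for start, end, zscore, pairs in windows:
        for nt in range(start, end + 1):
            coverage[nt] += 1
        for i, j in pairs:
            z_sum[(i, j)] += zscore
    return {pair: total / coverage[pair[0]] for pair, total in z_sum.items()}


def best_partner(normalized):
    """For each i-nucleotide, keep the partner with the lowest normalized z-score."""
    best = {}
    for (i, j), z in normalized.items():
        if i not in best or z < best[i][1]:
            best[i] = (j, z)
    return best


# Toy example: three overlapping windows with made-up z-scores.
windows = [
    (1, 10, -2.0, [(1, 10), (2, 9)]),
    (2, 11, -1.0, [(2, 9), (3, 11)]),
    (3, 12, -3.0, [(3, 12)]),
]
normalized = coverage_normalized_zscores(windows)
partners = best_partner(normalized)
```

In this toy input, nucleotide 3 is covered by three windows and pairs with 11 in one window and with 12 in another; dividing each pair's z-score sum by the coverage of nucleotide 3 favors the 3-12 pair, which came from the window with the more negative z-score.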

DOI: 10.7287/peerj.preprints.27101v1/supp-5

Table S3. Results of 37 ZIKV genomes curated in the ZikaVR database (Gupta et al. 2016) aligned to KJ776791.2

Genomes were aligned using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. The heading for each result contains the NCBI accession number and name of the aligned sequence.

DOI: 10.7287/peerj.preprints.27101v1/supp-6

Table S4. Base pair counts tabulating the number and type of base pairs that appear in the ScanFold z-score < -1 predicted structure when compared to 37 aligned ZIKV genomes

A total of 37 ZIKV genomes were aligned to KJ776791.2 using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. Aligned sequences (Table S3) were compared to ScanFold-Fold predicted bps (with z-score < -1) to tabulate the types of base pairs found throughout the alignment. For each base pair, column S reports the percent of aligned sequences in which a canonical bp is maintained, and column T reports the number of different canonical base pair types observed. Results for the previously reported 5' and 3' UTR structural regions appear as separate worksheets.
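
The tabulation described above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' script: the function name `tabulate_pair` is hypothetical, the coordinates are assumed to be alignment columns (a real analysis would first map reference positions to columns when the alignment contains gaps), and counting G-U wobble pairs as canonical is an assumption.

```python
from collections import Counter

# Assumption: Watson-Crick pairs plus G-U wobble pairs are treated as canonical.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def tabulate_pair(aligned_seqs, i, j):
    """Count the base-pair types found at alignment columns i and j (1-based).

    Returns (counts, percent_canonical, number_of_canonical_types).
    """
    if not aligned_seqs:
        raise ValueError("at least one aligned sequence is required")
    counts = Counter()
    for seq in aligned_seqs:
        # Treat DNA-style T as U so GenBank-style sequences are handled.
        pair = (seq[i - 1] + seq[j - 1]).upper().replace("T", "U")
        counts[pair] += 1
    canonical = {p: n for p, n in counts.items() if p in CANONICAL}
    percent = 100.0 * sum(canonical.values()) / len(aligned_seqs)
    return counts, percent, len(canonical)


# Toy alignment of four made-up sequences; the pair under test is columns 1 and 5.
toy_alignment = ["GAAAC", "AAAAU", "GAAAU", "CAAAC"]
counts, percent, n_types = tabulate_pair(toy_alignment, 1, 5)
```

For the toy alignment, three of the four sequences keep a canonical pair (G-C, A-U, and G-U), so the percentage is 75 and three canonical types are observed; the fourth sequence has a C-C mismatch.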

DOI: 10.7287/peerj.preprints.27101v1/supp-7

Table S8. RNAstructure webserver scorer results of the ScanFold predicted structures compared to accepted structures

The structures and sequences were uploaded to the server as shown, and scorer was run with default settings.

DOI: 10.7287/peerj.preprints.27101v1/supp-11

Table S5. CT file of the default no-filter results output from ScanFold-Fold

Table S6. CT file of the default z-score < -1 results output from ScanFold-Fold

Table S7. CT file of the default z-score < -2 results output from ScanFold-Fold
