Figure S1. Comparative arc diagram depicting the previously described 5' RNA structural motifs vs. the ScanFold predicted bps
(a) Arc diagram of 5' end region as predicted via ScanFold; base pairs are colored based on their z-score cutoff where blue lines depict bps which were predicted in the z-score < -2 results (Table S7) and green lines refer to bps which were predicted in the z-score < -1 results (Table S6). (b) Arc diagram of the accepted secondary structure model for the 5' end of ZIKV as shown in (Ye et al. 2016) and mapped to the KJ776791.2 sequence.
Figure S2. Comparative arc diagram depicting the known RNA structural motifs vs. the ScanFold predicted bps
(a) Arc diagram of 3' end region as predicted via ScanFold; base pairs are colored based on their z-score cutoff where blue lines depict bps which were predicted in the z-score < -2 results (Table S7), green lines refer to bps which were predicted in the z-score < -1 results (Table S6), and yellow lines were predicted in the no filter results (Table S5). (b) Arc diagram of the accepted secondary structure model for the 3' end of ZIKV as shown in (Goertz et al. 2017) and mapped to the KJ776791.2 sequence. The start codon nucleotide locations have been highlighted with a light blue bar.
Figure S3. Secondary structure model depicting the ScanFold proposed structures within and directly adjacent to known 5' and 3' structured regions
Base pairs are colored based on their z-score cutoff: blue lines depict bps which were predicted in the z-score < -2 results (Table S7), green lines refer to bps which were predicted in the z-score < -1 results (Table S6), and yellow lines were predicted in the no filter results (Table S5). The start and stop codon nucleotides have been circled and labeled in blue and green respectively. Nucleotides which established ScanFold bp preserving mutations within the alignment are highlighted with filled green circles.
Table S1. Results of the scanning window analysis of the ZIKV genome (NCBI accession KJ776791.2) as output from the ScanFold-Scan program
Each row contains the data calculated for one window. Columns A and B are the starting (i) and ending (j) coordinates of the window fragment. Column C is the temperature used for all RNAfold calculations. Columns D-H report the ∆Gnative, thermodynamic z-score, stability ratio p-value, ensemble diversity, and frequency-of-MFE (fMFE) values, respectively (detailed descriptions of all metrics can be found at the RNAStructuromeDB, https://structurome.bb.iastate.edu, or in the corresponding manuscript (Andrews et al. 2017)). Column I contains the sequence of the window; the ∆Gnative and centroid structure of this sequence are shown in Columns J and K. Columns L-O report nucleotide counts for the window sequence.
Table S2. ScanFold log file produced during the ScanFold-Fold portion of the program
The log file is separated into two portions. The first half (rows 1 to 87,448) contains a table for each nucleotide in the sequence. These tables contain the cumulative base pairing information for that nucleotide as predicted throughout the scan. Column A refers to the i-nucleotide of the sequence. Column B refers to the coordinate of the j base pair. The total number of windows in which the i-j pair appears, as well as the total number of windows in which the i-nucleotide appears, are reported in column D. The average window minimum free energy, z-score, and ensemble diversity of each i-j pair are reported in columns E-G, respectively. Column H reports the sum of z-scores for each i-j pair, which is used to calculate the coverage-normalized z-score (the sum of z-scores divided by the total number of windows in which the i-nucleotide appeared) reported in Column I. Column J reports a summary of the bps predicted for each i-nucleotide. The second half of the log file, starting at row 87,449, is a list of the most favorable i-j pairs (columns B and C) associated with the i-nucleotide listed in column A. In places where this nucleotide competed with other i-nucleotides for the same j-nucleotide, the “winning” i-j pair is reported and denoted with an asterisk (in some cases the winning i-j pair does not contain the original i-nucleotide, or the nucleotide may be unpaired). Columns D, E, and F contain the average window minimum free energy, z-score, and ensemble diversity for the corresponding i-j pair.
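The coverage-normalized z-score described above can be illustrated with a minimal sketch. This is not the ScanFold-Fold source code; the function name, the input layout (one tuple per window holding its start, end, z-score, and predicted pairs), and the toy values are all assumptions made for illustration.

```python
from collections import defaultdict


def coverage_normalized_z(windows):
    """Hypothetical helper: sum each i-j pair's window z-scores and divide
    by the number of windows that cover nucleotide i.

    `windows` is an assumed layout: (start, end, z_score, pairs), with
    1-based inclusive coordinates and pairs given as (i, j) tuples.
    """
    coverage = defaultdict(int)   # windows covering each nucleotide
    z_sum = defaultdict(float)    # summed z-score per i-j pair

    for start, end, z_score, pairs in windows:
        for nt in range(start, end + 1):
            coverage[nt] += 1
        for i, j in pairs:
            z_sum[(i, j)] += z_score

    # Normalize by the coverage of the i-nucleotide, not by the number of
    # windows in which the pair itself was predicted.
    return {(i, j): total / coverage[i] for (i, j), total in z_sum.items()}


# Toy data (not taken from Table S2).
toy_windows = [
    (1, 6, -2.0, [(2, 5)]),
    (2, 7, -1.0, [(2, 5)]),
    (3, 8, -3.0, [(4, 7)]),
]
result = coverage_normalized_z(toy_windows)
```

In this toy example, the pair 2-5 is predicted in both windows that cover nucleotide 2, so its normalized score equals its average z-score. The pair 4-7 is predicted in only one of the three windows covering nucleotide 4, so its score is scaled down accordingly.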
Table S3. Results of 37 ZIKV genomes curated in the ZikaVR database (Gupta et al. 2016) aligned to KJ776791.2
Genomes were aligned using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. Headings for each result contain the NCBI accession number and name of the aligned sequence.
Table S4. Base pair counts tabulating the number and types of base pairs which appear in the ScanFold z-score < -1 predicted structure when compared to 37 aligned ZIKV genomes
A total of 37 ZIKV genomes were aligned to KJ776791.2 using the MAFFT web server (Katoh et al. 2017; Kuraku et al. 2013) with default settings. Aligned sequences (Table S3) were compared to ScanFold-Fold predicted bps (with z-score < -1) to tabulate the types of base pairs which are found throughout the alignment. Column S reports the percent of canonical bps which were found to be allowed throughout the alignment for that base pair and column T reports the number of different canonical base pair types. Results for the previously reported 5' and 3' UTR structural regions appear as separate worksheets.
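The tabulation described above can be sketched as follows for a single predicted base pair. This is an illustrative sketch only: the function name, the 0-based alignment column indices, the treatment of G-U wobble pairs as canonical, and the toy sequences are assumptions and are not taken from the analysis behind Table S4.

```python
from collections import Counter

# Assumption: Watson-Crick pairs plus G-U wobble pairs count as canonical.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def tabulate_pair(aligned_seqs, col_i, col_j):
    """Hypothetical helper: count the nucleotide pairs found at two
    alignment columns (0-based) across all aligned sequences."""
    counts = Counter()
    for seq in aligned_seqs:
        # Read both columns and convert DNA-style T to RNA-style U.
        pair = (seq[col_i] + seq[col_j]).upper().replace("T", "U")
        counts[pair] += 1

    canonical_total = sum(n for p, n in counts.items() if p in CANONICAL)
    percent_canonical = 100.0 * canonical_total / len(aligned_seqs)
    canonical_types = sum(1 for p in counts if p in CANONICAL)
    return counts, percent_canonical, canonical_types


# Toy alignment (not real ZIKV sequence); the pair spans columns 0 and 4.
toy_alignment = ["GAAAC", "GAAAT", "AAAAT", "CAAAC"]
counts, percent_canonical, canonical_types = tabulate_pair(toy_alignment, 0, 4)
```

Here three of the four toy sequences keep a canonical pair (GC, GU, and AU), which corresponds to the kind of percentage and type count reported in columns S and T.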