Variable absorption of mutational trends by prion-forming domains during Saccharomycetes evolution

Prions are self-propagating alternative states of protein domains. They are linked to both diseases and functional protein roles in eukaryotes. Prion-forming domains in Saccharomyces cerevisiae are typically domains with high intrinsic protein disorder (i.e., that remain unfolded in the cell during at least some part of their functioning), that are converted to self-replicating amyloid forms. S. cerevisiae is a member of the fungal class Saccharomycetes, during the evolution of which a large population of prion-like domains has appeared. It is still unclear what principles might govern the molecular evolution of prion-forming domains, and intrinsically disordered domains generally. Here, it is discovered that in a set of such prion-forming domains some evolve in the fungal class Saccharomycetes in such a way as to absorb general mutation biases across millions of years, whereas others do not, indicating a spectrum of selection pressures on composition and sequence. Thus, if the bias-absorbing prion formers are conserving a prion-forming capability, then this capability is not interfered with by the absorption of bias changes over the duration of evolutionary epochs. Evidence is discovered for selective constraint against the occurrence of lysine residues (which likely disrupt prion formation) in S. cerevisiae prion-forming domains as they evolve across Saccharomycetes. These results provide a case study of the absorption of mutational trends by compositionally biased domains, and suggest methodology for assessing selection pressures on the composition of intrinsically disordered regions.


INTRODUCTION
Prion formation and propagation have been discovered and investigated chiefly in the budding yeast Saccharomyces cerevisiae, which is a member of the fungal class Saccharomycetes. S. cerevisiae has >200 prion-like proteins with N/Q-rich domains of the sort observed in ≥8 known prion-formers (An, Fitzpatrick & Harrison, 2016;Harbi & Harrison, 2014a;Harbi et al., 2012). Such yeast prions have been causally associated with diverse phenomena including evolutionary capacitance, large-scale genetic control, and yeast disease-like conditions. Examples of these proteins in S. cerevisiae are reviewed in the introduction to a previous paper (Su & Harrison, 2019). Prions have also been observed in the fission yeast Schizosaccharomyces pombe and the fungus Podospora anserina (Saupe, 2011;Sideri et al., 2017). Recently, prion-like proteins have been associated with the formation of membraneless biomolecular condensates, such as stress granules (Franzmann et al., 2018;Jain et al., 2016).
The original mammalian Prion Protein domain does not have N/Q bias, and has been deeply conserved since a Prion Protein ancestral gene appeared in early chordates (Ehsani et al., 2011;Harrison, Khachane & Kumar, 2010;Westaway et al., 2011). However, Sup35p (which underlies the [PSI+] prion) has an N/Q bias that is prevalent across the Ascomycota and Basidiomycota phyla, which had a last common ancestor >1 billion years ago (Harrison et al., 2007). A surge in the emergence of N/Q-rich yeast-prion-like proteins early in Saccharomycetes evolution resulted from mutational trends to form more polyasparagine tracts, providing the molecular basis from which several known prion-forming domains seem to have spawned (An, Fitzpatrick & Harrison, 2016). Prion-forming domains from S. cerevisiae tend to evolve more quickly as sequences than other prion-like domains but maintain their prion-like composition (Su & Harrison, 2019). In humans, several yeast-prion-like proteins are implicated in neurodegenerative processes (Kim et al., 2013;Pokrishevsky, Grad & Cashman, 2016;Sun et al., 2011). In Aplysia and Drosophila, such proteins have been associated with formation and preservation of long-term memory (Khan et al., 2015;Si et al., 2010). Other eukaryotes, such as Drosophila melanogaster, Plasmodium falciparum and the leech Helobdella robusta, are home to substantial sets of prion-like proteins (An & Harrison, 2016;Pallares et al., 2018). The slime mold Dictyostelium has more than a fifth of its proteome displaying prion-like composition (An & Harrison, 2016;Malinovska & Alberti, 2015), and it maintains a cellular system for avoiding prion-like aggregation and propagation (Malinovska & Alberti, 2015;Malinovska et al., 2015). Prion-like proteins have been observed in all domains of life (Espinosa Angarica, Ventura & Sancho, 2013;Tetz & Tetz, 2017;Tetz & Tetz, 2018), with many thousands annotated in bacteria (Harrison, 2019;Iglesias, De Groot & Ventura, 2015).
Bacterial prion-forming proteins have been observed experimentally (Molina-Garcia et al., 2018;Shahnawaz et al., 2017;Yuan et al., 2014;Yuan & Hochschild, 2017). Hundreds of bacterial prion-like proteins occur across multiple bacterial phyla in a sparse conservation pattern (Harrison, 2019).
Here, the evolution of the sequences of prion-forming domains in Saccharomycetes is revisited, but from the point of view of mutation biases. Protein regions are discovered to variably absorb mutation biases that are observable in the proteome as a whole. This is evidenced in the numbers of prion-like proteins, the percentage of guanine and cytosine (GC%) in the DNA, and the proportions of poly-asparagine and poly-glutamine.

Prion-like composition
Prion-like composition in orthologs was calculated in two ways, firstly using the PLAAC prion-like domain annotation program (Lancaster et al., 2014), and secondly using the fLPS program for annotation of compositional biases (Harrison, 2017). These were both run using default parameters, except that for fLPS the expected frequency for glutamine and asparagine residues was set equal to 0.05. For PLAAC, both the PRD score and the LLR score were analysed; the former is an indicator of the overall amount of prion-like composition in an annotated bounded prion-like region, while the latter indicates the prion-like sequence composition of the best sequence window (Lancaster et al., 2014). PLAAC scores < 0.0 or labelled 'N/A' in the output are set equal to 0.0 here.
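The score clean-up rule described above can be illustrated with a minimal Python sketch. This is not the code used in the analysis; the function name `clean_plaac_score` and the assumption that scores arrive as text fields from the PLAAC output are hypothetical.

```python
def clean_plaac_score(raw):
    """Return a PLAAC score as a float, mapping 'N/A' and negative values to 0.0.

    Assumption: `raw` is a text field (or number) taken from a PLAAC output table.
    """
    text = str(raw).strip()
    # Missing or non-numeric annotations are treated as "no prion-like signal".
    if text.upper() in {"N/A", "NA", "NAN", ""}:
        return 0.0
    score = float(text)
    # Scores below 0.0 are set equal to 0.0, as stated in the text.
    return score if score > 0.0 else 0.0
```

Applying the same rule to both the PRD and LLR scores keeps orthologs without an annotated prion-like region in the correlation analysis as zero-valued points rather than dropping them.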

Measures of proteome bias
Several measures of compositional bias across proteomes/genomes were examined: (i) %N (asparagine) in the proteome; (ii) %Q (glutamine) in the proteome; (iii) % poly-N in the proteome (with a minimum tract length of 3); (iv) % poly-Q in the proteome (with a minimum tract length of 3); (v) % poly-Q + poly-N in the proteome (with a minimum tract length of 3); (vi) %GC in the DNA; (vii) the fraction of N/Q-rich proteins in the proteome according to a specific fLPS bias P-value threshold (either 1e−08, 1e−10 or 1e−12); (viii) the fraction of proteins in the proteome with prion-like composition according to the program PLAAC (with PRD score >0.0, ≥15.0 or ≥30.0, or similarly for LLR score). Measures (i) to (v) were chosen since they are indicative of general mutational trends that are relevant to the predominant compositional biases of prion-forming domains in S. cerevisiae, namely bias towards asparagine and glutamine and tracts of these residue types (An, Fitzpatrick & Harrison, 2016). Measure (vi) (%GC) is the most basic compositional trend that can be analyzed for genomic DNA, which might underlie trends at the amino-acid level. Measures (vii) and (viii) indicate the degree to which individual proteins throughout the proteome have prion-like compositional biases to a certain level, and so would indicate how every protein is on average affected by mutational trends.
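Measures (i) to (vi) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and it assumes that "% poly-N" means the percentage of all proteome residues that lie within homopolymeric tracts of at least the minimum length, which the text does not state explicitly.

```python
import re

def residue_percent(proteome, residue):
    """Percentage of all residues in the proteome that are `residue` (e.g. 'N')."""
    total = sum(len(seq) for seq in proteome)
    count = sum(seq.count(residue) for seq in proteome)
    return 100.0 * count / total if total else 0.0

def poly_tract_percent(proteome, residues, min_len=3):
    """Percentage of residues lying in homopolymeric tracts of length >= min_len.

    `residues` is a string of residue types, e.g. 'N', 'Q' or 'NQ'.
    Assumption: tracts are runs of a single residue type, not mixed N/Q runs.
    """
    total = sum(len(seq) for seq in proteome)
    in_tracts = 0
    for res in residues:
        pattern = re.compile(re.escape(res) + "{%d,}" % min_len)
        for seq in proteome:
            in_tracts += sum(len(m.group()) for m in pattern.finditer(seq))
    return 100.0 * in_tracts / total if total else 0.0

def gc_percent(dna):
    """%GC over unambiguous bases (A, C, G, T) of a DNA string."""
    dna = dna.upper()
    acgt = sum(dna.count(base) for base in "ACGT")
    gc = dna.count("G") + dna.count("C")
    return 100.0 * gc / acgt if acgt else 0.0
```

For a toy proteome such as `["MNNNQQK", "AQQQQNA"]`, `poly_tract_percent(proteome, "NQ")` counts the `NNN` and `QQQQ` runs but ignores the two-residue `QQ` run, because it falls below the minimum tract length of 3.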

Correlations
Both weighted and unweighted Pearson correlation coefficients were calculated to assess the correlations of individual prion-like composition with the general trends in the proteome. Weightings for plot points were calculated according to their closest similarity with another protein, calculated as (1-%I/100), where %I is the percentage sequence identity in the most significant BLASTP sequence alignment (Altschul et al., 1997). These weightings were summed appropriately, as described in previous analyses (Harrison, 2019;Su & Harrison, 2019). Results indicate that the overall outcomes for specific proteins are not affected by non-usage of such weightings (see below).
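A weighted Pearson correlation of this kind can be sketched as follows. The exact weighting and summation scheme is described in the cited previous analyses and is not reproduced here; the sketch assumes the standard weighted Pearson formula, and the function names are hypothetical.

```python
import math

def identity_weight(percent_identity):
    """Weight for a plot point: 1 - %I/100, where %I is the percentage identity
    to the closest other protein (from the most significant BLASTP alignment)."""
    return 1.0 - percent_identity / 100.0

def weighted_pearson(x, y, w=None):
    """Weighted Pearson correlation coefficient; unweighted if w is None.

    Assumption: standard weighted means, variances and covariance are used.
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)
```

Under this scheme an ortholog that is nearly identical to another (high %I) receives a weight close to zero, so that closely related sequences do not dominate the correlation.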

Initial example: the Ure2 prion-forming domain demonstrates strong absorption of mutational trends
As an initial example, the evolutionary behaviour of compositional biases in the prion-forming domain of Ure2p, which underlies the [URE3] prion, was examined (Figs. 1-2). The current data indicate that an ancestor of the Ure2p prion-forming domain with a strong N/Q-rich prion-like composition originated early in Saccharomycetes evolution (at least in the last common ancestor of the diverse families Debaryomycetaceae and Saccharomycetaceae), in agreement with results in previous publications (An, Fitzpatrick & Harrison, 2016;Harrison et al., 2007) (Fig. 3; the organismal branching pattern from recent fungal phylogenies was used (Kurtzman & Robnett, 2013;Shen et al., 2016)). In general, there is a strong correlation between the degree of bias in the N/Q-rich region of Ure2p and the degree of compositional bias in the whole proteome/genome by several indicators (%polyasparagine or %[polyasparagine + polyglutamine] or DNA GC% or fraction of N/Q-rich prion-like proteins with fLPS P-value < 1e−10) (Fig. 1). The correlations with PLAAC prion-like composition score are lower, but both measures have strong correlations with %GC in DNA (Fig. 2). Thus, during the surge in formation of prion-like regions during Saccharomycetes evolution (An, Fitzpatrick & Harrison, 2016), the degree of N-bias in the individual prion-former Ure2p also increased in correlation with the general trend as it played out across various sub-clades.

Other prion-forming proteins show a variable spectrum of absorption of mutational trends across Saccharomycetes
Of the known amyloid-based prions, Swi1p, Cyc8p and Sup35p (as well as Ure2p) have domains of prion-like composition or N/Q bias that are widespread across Saccharomycetes (in 84% of orthologs for Cyc8p, 98% for Swi1p, and 90% for Sup35p; Table S1), with such domains of these latter three also arising in other Ascomycota clades (An, Fitzpatrick & Harrison, 2016;Harrison et al., 2007).
In general, there are strong correlations for Ure2p, Swi1p and Cyc8p with %N, %poly-N, %GC in DNA and with the numbers of proteins with prion-like composition (Tables 1-2). Within these general trends, these four demonstrate a spectrum of responses to the overall proteome-wide mutational trends, with Ure2p being the strongest correlator. Sup35p stands out as an exception; on the whole it shows weaker correlations with %N and %poly-N, and stronger correlations with %poly-Q, than the other three. This may be because there is selection pressure to maintain a specific proportion of Qs in specific local patterns or ratios (MacLea et al., 2015). Furthermore, Pin3 protein also has a widespread prion-like domain across Saccharomycetes, with 52/55 (95%) Saccharomycetes Pin3 orthologs having PLAAC LLR scores > 15.0. However, the degree of conservation of N/Q-rich bias per se is lower for this protein, with 38/55 (75%) having an fLPS compositional bias P-value ≤1e-10. The metastable prion domain of Pin3 is the only known amyloid-based prion in S. cerevisiae to demonstrate very little correlation for its prion-like compositional biases, indicating some selection pressure for composition of a different sort that nonetheless may preserve prion-forming ability (Tables 1-2).
The other three cases (Mot3p, Rnq1p and Nup100p) have either more recent ancestry as novel prion-like domains within Saccharomycetes (in the case of Mot3p and Rnq1p), or they arise sporadically in fungal species (Nup100p) (An, Fitzpatrick & Harrison, 2016;Su & Harrison, 2019). These three are thus not expected to demonstrate many significant correlations with measures of compositional bias, but nonetheless we see a mild negative correlation for the fLPS compositional bias of Rnq1p and Mot3p versus %Q in the proteome (Table 1), which is not typical of the other prion-forming proteins, suggesting selection pressures against Q bias in these evolutionarily recently emergent proteins. There is one species that is often a far outlier when these trends are examined, Ascoidea rubescens (see, for example, Ure2p in Figs. 1-2), an uncharacterized species that is the sole member of the family Ascoideaceae, which is geographically widely distributed and typically grows in beetle galleries in dead wood. It has a very high proportion of poly-N-rich proteins (Tables 1-2). Removal of this outlier species from the correlation analysis causes a substantial increase in correlations with %N and %poly-N, but not with %GC in DNA.
Figure labels (family, with count in parentheses): Saccharomycetaceae (29); Saccharomycodaceae (3); Phaffomycetaceae (5); Ascoideaceae (1); Debaryomycetaceae / Metschnikowiaceae (17); Pichiaceae (5); Dipodascaceae / Trichomonascaceae (2); Trigonopsidaceae (1); Lipomycetaceae (1). Markers: fLPS biased (P≤1e-10); PLAAC score ≥15.0.
Thus, the three S. cerevisiae prion-forming proteins Ure2p, Cyc8p and Swi1p appear to absorb the general mutational trends linked to the surge in formation of prion-like domains that was observed previously (An, Fitzpatrick & Harrison, 2016). This trend is linked to a general decrease in %GC in the DNA (Tables 1-2).
Two other separately studied prion-forming domains are from New1p and Pub1p (Li et al., 2014;Osherovich & Weissman, 2001). These are both strongly correlated proteome-bias absorbers, with Pub1p (which is a hub for protein interaction with other prion-like proteins (Harbi & Harrison, 2014b)) uniquely amongst all of the prion-forming domains displaying a strong correlation for both poly-N and poly-Q (Tables 1-2). Pub1p is strongly correlated despite having a low number of orthologous prion domains that have high bias for N and Q residues (53% with fLPS P-value ≤1e-10; Table S1), indicating that there is still correlated behavior for the weaker N/Q biases for this protein. Other prion-forming domains observed in the analysis of Alberti et al. (2009) also display a similar spectrum of bias absorption across Saccharomycetes evolution (Table S1). Highly correlated bias absorbers from these data whose prion-like domains are widespread in Saccharomycetes include Lsm4p and Gln3p, whereas other widespread prion-like domains show little or no correlation, such as Ngr1p (Tables S1, S2).
Compared to the results for N/Q-compositional bias calculated using fLPS (Table 1), the trends for prion-like composition calculated using the PRDscore from PLAAC are similar, except that New1p loses many significant correlations, and an increased correlation is captured for Sup35p versus the general mutational trends linked to the large-scale surge in formation of prion-like domains (An, Fitzpatrick & Harrison, 2016). Similar trends for PLAAC are observed if Spearman correlation coefficients are applied (by reason of some proteins having several 0.0-value PLAAC PRDscores in orthologs) (Table S3).
The above analysis uses the PLAAC PRDscore to define the amount of prion-like composition in a bounded region, and so it captures more of the absorption of biases, in a way analogous to the working of the fLPS algorithm (Harrison, 2017;Lancaster et al., 2014). The PLAAC log-likelihood ratio (LLR) score has been used in the literature to pick out the most likely prion-forming sequence window within proteins (Alberti et al., 2009;An, Fitzpatrick & Harrison, 2016;Sideri et al., 2017;Tetz & Tetz, 2018). Despite the restriction of a window of fixed size (41 amino-acid residues), these LLR scores also demonstrate a similar spectrum of bias absorption, with both strong and weak absorbers evident, albeit generally with less significance (Table S4).

Table 1 Correlations (weighted and unweighted), for a set of known prion-forming domains, between the compositional bias (−log[fLPS P-value]) and a variety of parameters.
Weighted correlations are the upper value in each cell, unweighted the lower value. Where removal of the common far outlier species Ascoidea rubescens causes increased significance for any correlation, the third and fourth rows in a cell display the correlation coefficients (in italics). For proteins which do not have an ortholog from Ascoidea rubescens, the name is labelled with '††'. If its removal causes no improvement in correlations, it is labelled with '†'. Correlations significant at P ≤ 0.0005 are labelled *** and in bold, those significant at P > 0.0005 and ≤ 0.0016 are labelled ** and underlined, and those significant at P > 0.0016 and ≤ 0.05 are labelled *. The threshold 0.0016 comes from a Bonferroni correction to allow for the fact that 31 sequences are being tested for a correlation against any specific proteome-wide property. In column one, the name is styled according to the most significant correlation.

Table 2 Correlations (weighted and unweighted), for a set of known prion-forming domains, between the prion-like composition (PLAAC PRDscore) and a variety of parameters.
Weighted correlations are the upper value in each cell, unweighted the lower value. Where removal of the common far outlier species Ascoidea rubescens causes increased significance for any correlation, the third and fourth rows in a cell display the correlation coefficients (in italics). For proteins which do not have an ortholog from Ascoidea rubescens, the name is labelled with '††'. If its removal causes no improvement in correlations, it is labelled with '†'. Correlations significant at P ≤ 0.0005 are labelled *** and in bold, those significant at P > 0.0005 and ≤ 0.0016 are labelled ** and underlined, and those significant at P > 0.0016 and ≤ 0.05 are labelled *. The threshold 0.0016 comes from a Bonferroni correction to allow for the fact that 31 sequences are being tested for a correlation against any specific proteome-wide property. In column one, the name is styled according to the most significant correlation.
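The Bonferroni threshold quoted in the table captions follows from dividing 0.05 by the 31 sequences tested (0.05/31 ≈ 0.0016). A minimal Python sketch of the labelling scheme is given below; it assumes that the three thresholds are upper bounds on the correlation P-value, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha=0.05, n_tests=31):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def significance_label(p_value, alpha=0.05, n_tests=31, strict=0.0005):
    """Return '***', '**', '*' or '' for a correlation P-value.

    Assumption: '***' for P <= 0.0005, '**' for P up to the Bonferroni
    threshold, '*' for P up to the uncorrected alpha.
    Note: the exact threshold 0.05/31 is used here, not the rounded 0.0016.
    """
    if p_value <= strict:
        return "***"
    if p_value <= bonferroni_threshold(alpha, n_tests):
        return "**"
    if p_value <= alpha:
        return "*"
    return ""
```

Only correlations labelled ** or *** survive the multiple-testing correction; those labelled * are nominally significant for a single test only.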

Prion-like N/Q-rich regions generally maintain lower lysine content than the rest of the proteome in Saccharomycetes
It was checked whether the N/Q-rich regions are also rich in lysine, which, like asparagine (N), is encoded by AT-rich codons. Lysine has low prion formation propensity, and charged residues in general are disruptive to prion formation (Lancaster et al., 2014;Osherovich & Weissman, 2001). Lysine is a disorder-promoting residue (Oldfield & Dunker, 2014) and some intrinsically disordered regions have high positive charge (Hatos et al., 2020;Necci et al., 2018). However, the N/Q-rich regions generally have lower lysine content than the remainder of the Saccharomycetes proteomes (Fig. 4). That is, the vast majority of Saccharomycetes species (∼98%) are below the x=y line on the scatter plot (Fig. 4A). This is also evident in the distributions of K fraction (Fig. 4B; values for prion-formers are lower, t-test P = 1e−140). Thus, these regions are not simply absorbing higher levels of AT% in their DNA through the embedding within them of amino-acid residues encoded by codons with high AT%.
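The comparison against the x=y line can be sketched as follows. This is an illustration under stated assumptions rather than the original analysis code: the function names are hypothetical, and it is assumed that the x-axis carries the lysine fraction of the rest of the proteome and the y-axis that of the N/Q-rich regions, so that "below the line" means lower lysine content in the N/Q-rich regions.

```python
def lysine_fraction(sequences):
    """Fraction of residues that are lysine (K) across a list of sequences."""
    total = sum(len(seq) for seq in sequences)
    lysines = sum(seq.count("K") for seq in sequences)
    return lysines / total if total else 0.0

def fraction_below_diagonal(points):
    """Fraction of (x, y) points that fall strictly below the x=y line.

    Assumption: x = K fraction in the rest of the proteome,
                y = K fraction in the N/Q-rich regions, one point per species.
    """
    if not points:
        return 0.0
    below = sum(1 for x, y in points if y < x)
    return below / len(points)
```

With one point per species, a value of about 0.98 from `fraction_below_diagonal` would correspond to the ∼98% of species reported as lying below the line.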

DISCUSSION
These results indicate that compositional aspects of many individual prion-formers behaved in a correlated way in relation to general trends as they played out over millions of years across various sub-clades. Also, this surge in prion-like region formation is directly linked to a general trend for GC% decrease across the Saccharomycetes clade. However, some prion-forming domains resist the absorption of such mutational trends, such as the meta-stable prion-former Lsb2/Pin3 (Chernova et al., 2017b), despite it being as widely conserved as a protein as those that more easily absorb biases, such as Cyc8p and Swi1p. This suggests some greater selection pressure on amino-acid composition. The Sup35p prion-forming domain also shows some special behavior: it demonstrates a stronger correlation between overall proteome poly-Q levels and its own N or Q compositional bias as determined by the program fLPS. The Sup35 prion-forming domain has a subdomain with specific local patterns involving Q residues that is required for chaperone-dependent prion maintenance, and that is separate from the N-terminal N/Q-rich region that is necessary for prion nucleation and fibre growth (MacLea et al., 2015). Also, the Sup35 prion-like domain has a more ancient origin before the last common ancestor of Saccharomycetes, and outside this clade it tends to have a predominant Q-bias that has been maintained within Saccharomycetes, resisting the trend for greater N-bias (An, Fitzpatrick & Harrison, 2016). However, this is also the behaviour of Cyc8p and Swi1p outside of Saccharomycetes (An, Fitzpatrick & Harrison, 2016), so ancient Q-bias alone does not explain the result, which demonstrates an evolutionary behavior peculiar to Sup35p. The Pub1p prion-forming domain shows strong correlations for both Q and N bias indicators.
It is possible that proteins such as Pub1p that interact extensively with other prion-like proteins (Harbi & Harrison, 2014b) 'need' to absorb more general compositional trends so that they can promiscuously bind with a large list of partners. Prion-like aggregation has been shown for both Pub1p in yeast and for its co-ortholog Tia1 in humans (Gilks et al., 2004;Li et al., 2014). Its prion-like composition has also largely been maintained since the last common ancestor of eukaryotes (Su & Harrison, 2020). Thus, its strong absorption of mutational trends for Q and N residues has not been a barrier to such a conservation of prion-like composition. The methodology applied here might also be useful in the analysis of human proteins with N/Q biases, such as those linked to amyotrophic lateral sclerosis or huntingtin from Huntington's disease (An & Harrison, 2016;Monahan et al., 2018), or to other non-N/Q-biased prion-forming domains, such as in alpha-synuclein (Watts, 2019). In particular, prion-forming domains from any such proteins that display little or no correlation with general compositional trends in the proteome may be under selection pressure against aggregation, or for a functional role for which compositional sequence parameters are precisely modulated. Recent research suggests that sequence mutations leading to subtle amino-acid side-chain differences in a short disordered segment of the Sup35p prion-forming domain alter its conformational preferences and markedly modify its cross-reactivity with infectious prion seeds (Shida et al., 2020). Such subtle effects are interesting in light of the fact that prion formation is largely governed by compositional preferences (Cascarina et al., 2018;Ross, Baxa & Wickner, 2004;Toombs, McCarty & Ross, 2010). Given such considerations, our results imply that some specific segments of prion-forming domains may be under selective constraint, while other segments are more free to absorb large-scale mutational trends, such as the surge in asparagine-rich prion-like tracts during Saccharomycetes speciation (An & Harrison, 2016).
Figure 4 (caption fragment): [...] Fig. 4, and orange points for the total list of prion-forming domains including those listed in Table S2. The x=y line is indicated. (B) Histograms for the data in (A). The same colour scheme is kept, except that the data for the rest of the proteome is in grey. Each bin is labelled with its higher bound. DOI: 10.7717/peerj.9669/fig-4
One form of selective constraint was examined in detail, the avoidance of lysine residues. Both asparagine and lysine are encoded by an AT-rich codon repertoire that differs only at the third codon position. Naively one would expect them to co-occur, since lysine has disorder-promoting character and prion-forming domains are intrinsically disordered (Harbi & Harrison, 2014a;Harbi et al., 2012). However, lysine has low prion formation propensity, and charged residues in general are disruptive to prion formation (Lancaster et al., 2014;Osherovich & Weissman, 2001). Here, we observed that lysine residues are avoided as S. cerevisiae prion-forming domains evolve across Saccharomycetes. Further development of such co-occurrence analysis for amino-acid residue types might yield further clues about the conservation of prion-forming status or other selective constraints on amino-acid composition in protein regions of unknown character (Harrison, 2018).
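The codon relationship between asparagine and lysine can be made concrete with a short Python sketch based on the standard genetic code (asparagine: AAT, AAC; lysine: AAA, AAG). The helper names are hypothetical and are introduced for illustration only.

```python
# Standard genetic code entries for asparagine (N) and lysine (K), as DNA codons.
CODONS = {"N": ["AAT", "AAC"], "K": ["AAA", "AAG"]}

def at_fraction(codon):
    """Fraction of positions in a codon that are A or T."""
    return sum(base in "AT" for base in codon) / len(codon)

def differing_positions(codons_a, codons_b):
    """1-based codon positions at which any pair of codons from the two sets differs."""
    positions = set()
    for a in codons_a:
        for b in codons_b:
            positions.update(i + 1 for i in range(3) if a[i] != b[i])
    return sorted(positions)
```

All four codons share the AA prefix, so `differing_positions(CODONS["N"], CODONS["K"])` contains only position 3. A mutational drift towards AT-richness would therefore be expected to favour both residue types, which is why the observed depletion of lysine points to selection rather than mutation bias.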
The results here provide a case study of mutational trend absorption by disordered regions generally. The results suggest some methodology for analyzing selection pressures on individual intrinsically disordered regions within the context of the behaviour of other sequences from the same proteome.

CONCLUSIONS
Many prion-forming domains, and intrinsically disordered regions generally, are continually absorbing overall mutational trends in their proteomes, but this is modulated by specific selection pressures. A spectrum of bias absorption is observed, from Lsb2/Pin3, which appears refractory to the mutational trends and shows little or no correlation, to Pub1, which shows very strong correlation with both asparagine- and glutamine-based biases.
The present analysis can be seen as a case study of the absorption of mutational trends in compositionally biased domains. The S. cerevisiae prion-forming list of proteins is particularly well-suited for this. Firstly, there is a substantial set of them that has accumulated via experimental analysis over the past two decades. Secondly, within the