Robust phylogenetic profile clustering for Saccharomyces cerevisiae proteins
Abstract
Background: Genes are continually formed and lost as a genome evolves. However, new genes may tend to appear during some evolutionary epochs more than others, or to disappear together in a more recent organismal clade. Such epochal populations of new genes might tend to have specific sequence traits or functional associations.

Methods: To investigate the epochal behaviour of gene origination, the concept of phylogenetic profiles was applied. A phylogenetic profile is an array indicating the presence or absence of a gene in each of a list of species. These profiles were compared and clustered to discern patterns in gene occurrence across >800 fungal species, centring the analysis on the budding yeast Saccharomyces cerevisiae.

Results: Clear epochs of gene origination were observed, linked to the last common ancestors of Saccharomycetaceae and Saccharomycetes, and also to Fungi and earlier ancestors. These trends are independent of the proteome and genome-assembly quality of the underlying data. Clusters of phylogenetic profiles demonstrated some significant functional associations, such as with cellular spore formation and chromosome segregation for genes originating in Saccharomycetaceae. The phylogenetic profile clustering analysis enabled detection of parameter-independent trends in intrinsic disorder, prion-like composition and gene uniqueness as a function of epochal gene age. For example, prion-like genes tend to be enriched among genes emerging later in fungal evolution centred on S. cerevisiae.

Conclusions: The profile cluster data generated here are useful for investigating experimental hypotheses, since they provide evidence for functional linkages that have yet to be discerned.
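To make the idea of a phylogenetic profile concrete, the following minimal sketch encodes a few profiles as presence/absence arrays and groups similar ones. It is illustrative only: the gene names, species labels and profile values are hypothetical toy data, and the Jaccard distance, the single-linkage grouping and the 0.25 threshold are assumptions chosen for simplicity, not the comparison or clustering procedure used in this study.

```python
from itertools import combinations

# Hypothetical toy data: 1 = gene detected in that species, 0 = not detected.
SPECIES = ["sp_A", "sp_B", "sp_C", "sp_D", "sp_E"]
PROFILES = {
    "gene1": (1, 1, 1, 1, 1),  # present everywhere (an "old" gene)
    "gene2": (1, 1, 1, 1, 0),
    "gene3": (1, 1, 0, 0, 0),  # restricted to two species (a "younger" gene)
    "gene4": (1, 1, 0, 0, 0),
    "gene5": (1, 0, 0, 0, 0),  # found in a single species
}


def jaccard_distance(p, q):
    """Return 1 - |shared presences| / |presences in either profile|."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster_profiles(profiles, max_distance):
    """Single-linkage grouping: link any two genes whose profiles lie
    within max_distance, then report the connected groups."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        # Follow parent links to the group representative (with path halving).
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for g, h in combinations(profiles, 2):
        if jaccard_distance(profiles[g], profiles[h]) <= max_distance:
            parent[find(g)] = find(h)

    groups = {}
    for gene in profiles:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


# Assumed threshold; a real analysis would need to explore this parameter.
clusters = cluster_profiles(PROFILES, max_distance=0.25)
for members in clusters:
    print(members)
```

With this toy input, genes sharing a similar taxonomic distribution fall into the same group, which is the sense in which a profile cluster can stand in for an epoch of gene origination. Any real analysis would need to test how sensitive the groups are to the distance measure and threshold.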