A survey of HK, HPt, and RR domains and their organization in two-component systems and phosphorelays proteins of organisms with fully sequenced genomes
- Subject Areas
- Bioinformatics, Computational Biology, Mathematical Biology, Synthetic Biology
- Keywords
- Biological Organization Principles, Mathematical Modeling, Signal Transduction, Domain organization
- Copyright
- © 2015 Salvado et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
- Cite this article
- 2015. A survey of HK, HPt, and RR domains and their organization in two-component systems and phosphorelays proteins of organisms with fully sequenced genomes. PeerJ PrePrints 3:e1267v1 https://doi.org/10.7287/peerj.preprints.1267v1
Abstract
Two-Component Systems and Phosphorelays (TCS/PR) are environmental signal transduction cascades found in prokaryotes and, less frequently, in eukaryotes. The internal domain organization of the proteins and the topology of TCS/PR cascades play an important role in shaping the responses of these circuits. It is thus important to maintain updated censuses of TCS/PR proteins in order to identify the various topologies used by nature and to enable a systematic study of the dynamics associated with those topologies. To create such a census, we analyzed the proteomes of 7609 organisms from all domains of life with fully sequenced and annotated genomes. First, we surveyed each proteome for proteins containing domains associated with internal signal transmission within TCS/PR, namely Histidine Kinase (HK), Response Regulator (RR) and Histidine Phosphotransfer (HPt) domains, and analyzed how these domains are arranged in individual proteins. Then, we identified all types of operon organization and calculated how much more likely proteins containing TCS/PR domains are to be encoded by neighboring genes than expected from the genome background of each organism. Finally, we analyzed whether the fusion of domains into single TCS/PR proteins is observed more frequently than expected from the background of each proteome. We found 50 alternative ways in which the HK, HPt, and RR domains are organized into single proteins. In prokaryotes, TCS/PR coding genes tend to be clustered in operons. Of all proteins identified in this study, 90% contain just one of the three domains, while 8% of the remaining proteins combine one copy of an HK, an RR, and/or an HPt domain. In eukaryotes, 25% of all TCS/PR proteins have more than one domain. These results might have implications for how signals are internally transmitted within TCS/PR cascades, and those implications could explain the selection of the various designs in alternative circumstances.
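The comparison between observed and randomly expected gene neighborhood can be illustrated with a minimal sketch. It is not the procedure used in the study. It assumes a single linear chromosome represented as an ordered list of gene labels, ignores strand and operon boundaries, and takes as its null model a uniformly random placement of the same labels along the genome. The function name `adjacent_pair_ratio` and the toy genome are hypothetical.

```python
from typing import Sequence


def adjacent_pair_ratio(labels: Sequence[str], a: str = "HK", b: str = "RR") -> float:
    """Ratio of observed to expected adjacent (a, b) gene pairs, in either order.

    Assumptions (illustrative only): `labels` lists the genes of one linear
    chromosome in genomic order, `a` and `b` are different labels, and the
    null model shuffles the labels uniformly at random.
    """
    n = len(labels)
    n_a = sum(1 for x in labels if x == a)
    n_b = sum(1 for x in labels if x == b)
    if a == b or n < 2 or n_a == 0 or n_b == 0:
        raise ValueError("need two distinct labels, each present in the genome")

    # Observed: consecutive positions holding one `a` gene and one `b` gene.
    observed = sum(
        1 for left, right in zip(labels, labels[1:]) if {left, right} == {a, b}
    )

    # Expected under random placement: each of the n - 1 adjacent position
    # pairs is an (a, b) pair with probability 2 * n_a * n_b / (n * (n - 1)),
    # so the expected count is 2 * n_a * n_b / n.
    expected = 2 * n_a * n_b / n
    return observed / expected


# Toy genome of 20 genes in which both HK genes sit next to an RR gene.
toy_genome = ["other"] * 4 + ["HK", "RR"] + ["other"] * 4 + ["RR", "HK"] + ["other"] * 8
ratio = adjacent_pair_ratio(toy_genome)
```

In this toy genome the two observed HK-RR pairs compare with 0.4 expected pairs, so the ratio is about 5. Values above 1 indicate that the two gene types are neighbors more often than random placement would predict.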
Author Comment
This paper aims to catalogue all proteins that potentially participate in TCS and phosphorelay cascades in fully sequenced genomes and to find all unique internal signal transduction domain organizations in those cascades. This sets the stage for future systematic studies of how that organization might affect the dynamics of signal transduction in the cascades.
Supplemental Information
Ratio between the observed and the randomly expected frequency of HK genes located next to RR genes in the genome
Phylum abbreviations are explained in Table 1. The colored box represents the range of percentage values between the 25% and 75% quantiles, and the ends of the vertical bar denote the highest and lowest percentage values for each phylum.
List of the 7609 surveyed species, classified per phylum
Percentage of each TCS/PR protein type per phylum
Phylum abbreviations are given in Table 1.
Percentage of HK genes and RR genes that are neighbors in the genome to other TCS/PR genes
Phylum abbreviations are given in Table 1.
Odds ratios (ratio between the observed and the randomly expected frequency) of HK genes located next to RR and HK2 genes in the genome
Only species with HK and RR genes are taken into account in the percentages. Phylum abbreviations are given in Table 1.
Genomic distribution of all IST domain coding genes per phylum
Phylum abbreviations are given in Table 1. Here we list all combinations of HK, RR and HPt coding genes found in consecutive positions in the genome, sorted by the number of orphan genes (with no other IST domain coding genes).
Odds ratios (ratio between the observed and the randomly expected frequency) of HK genes located in the genome next to RR, HK2 and RR2 genes
Only species with HK and RR genes are taken into account in the percentages. Phylum abbreviations are given in Table 1.
Odds ratios (ratio between the observed and the randomly expected frequency) of HKRR genes located in the genome next to HK2 and RR2 genes
Only species with HKRR genes are taken into account in the percentages. Phyla without this type of protein (Aquificae, Tenericutes, Chlamydiae, Crenarchaeota, Korarchaeota, Nanoarchaeota, Nanohaloarchaeota, Alveolates, Euglenozoa, Microsporidians and Monocots) do not appear in the table. Phylum abbreviations are given in Table 1.
Odds ratios (ratio between the observed and the randomly expected frequency) of HKRRHPt genes located in the genome next to RR2 genes
Only species with HKRRHPt proteins are taken into account in the percentages. Prokaryotic phyla without this type of protein (Aquificae, Tenericutes, Deinococcus-Thermus, Fibrobacteres, Elusimicrobia, Armatimonadetes, Nitrospinae, Crenarchaeota, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Nanohaloarchaeota) do not appear in the table. Eukaryotes are not included in the table since we have not found any HKRRHPt protein in this domain. Phylum abbreviations are given in Table 1.
Odds ratios (ratio between the observed and the randomly expected frequency) of HKRRHK2 genes located in the genome next to RR2 genes
Only species with HKRRHK2 proteins are taken into account in the percentages. Phyla without this type of protein (Aquificae, Tenericutes, Hyperthermophilic bacteria, Chloroflexi, Gemmatimonadetes, Fibrobacteres, Chlamydiae, Lentisphaerae, Planctomycetes, Chlorobi, Fusobacteria, Chrysiogenetes, Elusimicrobia, Armatimonadetes, Epsilonproteobacteria, Zetaproteobacteria, Other Proteobacteria, Nitrospinae, Synergistetes, Crenarchaeota, Korarchaeota, Thaumarchaeota, Nanoarchaeota, Nanohaloarchaeota, Alveolates, Amoeboflagellate, Euglenozoa, Microsporidians and Monocots) do not appear in the table. Eukaryotic phyla are not included in these statistics because, although some eukaryotic species have HKRRHK2 proteins, none of their HKRRHK2 genes has been found neighboring an RR gene. Phylum abbreviations are given in Table 1.
Supplementary Appendix
File including all figures and tables redone to include hypothetical proteins. Results are similar to those obtained for the dataset where these proteins are excluded.