A survey of HK, HPt, and RR domains and their organization in two-component systems and phosphorelays proteins of organisms with fully sequenced genomes

Departament de Ciències Mèdiques Bàsiques, Universitat de Lleida, Lleida, Catalonia, Spain
Departament de Ciències Mèdiques Bàsiques, Universitat de Lleida - IRBLleida, Lleida, Catalonia, Spain
Departament de Ciències Mèdiques Bàsiques, Universitat de Lleida, Lleida, Catalonia, Spain
DOI
10.7287/peerj.preprints.1267v1
Subject Areas
Bioinformatics, Computational Biology, Mathematical Biology, Synthetic Biology
Keywords
Biological Organization Principles, Mathematical Modeling, Signal Transduction, Domain organization
Copyright
© 2015 Salvado et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited.
Cite this article
Salvado B, Vilaprinyo E, Sorribas A, Alves R. 2015. A survey of HK, HPt, and RR domains and their organization in two-component systems and phosphorelays proteins of organisms with fully sequenced genomes. PeerJ PrePrints 3:e1267v1

Abstract

Two-Component Systems and Phosphorelays (TCS/PR) are environmental signal transduction cascades found in prokaryotes and, less frequently, in eukaryotes. The internal domain organization of the proteins and the topology of TCS/PR cascades play an important role in shaping the responses of these circuits. It is therefore important to maintain up-to-date censuses of TCS/PR proteins, both to identify the various topologies used by nature and to enable a systematic study of the dynamics associated with those topologies. To create such a census, we analyzed the proteomes of 7609 organisms from all domains of life with fully sequenced and annotated genomes. First, we surveyed each proteome for proteins containing domains associated with internal signal transmission within TCS/PR, namely Histidine Kinase (HK), Response Regulator (RR) and Histidine Phosphotransfer (HPt) domains, and analyzed how these domains are arranged in the individual proteins. Then, we identified all types of operon organization and calculated how much more likely proteins that contain TCS/PR domains are to be coded by neighboring genes than would be expected from the genome background of each organism. Finally, we analyzed whether the fusion of domains into single TCS/PR proteins is observed more frequently than expected from the background of each proteome. We find 50 alternative ways in which the HK, HPt, and RR domains are organized into single proteins. In prokaryotes, TCS/PR coding genes tend to be clustered in operons. 90% of all proteins identified in this study contain just one of the three domains, while 8% of the remaining proteins combine one copy of an HK, an RR, and/or an HPt domain. In eukaryotes, 25% of all TCS/PR proteins have more than one domain. These results may have implications for how signals are transmitted internally within TCS/PR cascades, which could help explain why the various designs are selected under different circumstances.
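The comparison with the genome background can be illustrated with a simple null model. In the minimal sketch below, each gene in one linear replicon carries a single class label (the labels "HK", "RR" and "other" are assumptions), and gene order is treated as a uniformly random permutation. Under that model, a given adjacent position holds an HK-RR pair with probability 2 * n_HK * n_RR / (N * (N - 1)), so the expected number of such pairs over the N - 1 adjacent positions is 2 * n_HK * n_RR / N. The ratio of observed to expected pairs then measures how strongly HK and RR genes co-localize. This is one plausible reading of the observed/expected ratios listed in the Supplemental Information, not necessarily the authors' exact procedure.

```python
def adjacent_pair_count(genes, a, b):
    """Count adjacent positions whose two gene labels are a and b, in either order."""
    return sum(1 for x, y in zip(genes, genes[1:]) if {x, y} == {a, b})


def observed_expected_ratio(genes, a="HK", b="RR"):
    """Observed / expected number of adjacent a-b gene pairs in one replicon.

    Null model (an assumption, not necessarily the one used in the survey):
    the N genes are placed in a uniformly random order, so each of the N - 1
    adjacent positions holds an a-b pair with probability
    2 * n_a * n_b / (N * (N - 1)), giving 2 * n_a * n_b / N expected pairs.
    """
    if a == b:
        raise ValueError("a and b must be different gene classes")
    n = len(genes)
    n_a, n_b = genes.count(a), genes.count(b)
    expected = 2 * n_a * n_b / n if n else 0.0
    if expected == 0:
        return float("nan")  # ratio undefined when either class is absent
    return adjacent_pair_count(genes, a, b) / expected


# Toy replicon: each gene carries a single hypothetical class label.
genome = ["other", "HK", "RR", "other", "other",
          "HK", "RR", "other", "other", "other"]
print(observed_expected_ratio(genome))  # 2 observed / 0.8 expected
```

A ratio well above 1 indicates that HK and RR genes are neighbors more often than random gene order would predict. Circular chromosomes, multiple replicons, strand, and hybrid genes (for example HKRR) would all need explicit handling; a permutation test that shuffles the gene labels could replace the closed-form expectation when such details matter.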

Author Comment

This paper aims to catalogue all proteins that potentially participate in TCS and phosphorelay cascades in fully sequenced genomes and to identify all unique internal signal transduction domain organizations in those cascades. This sets the stage for future systematic studies of how that organization might affect the dynamics of signal transduction in the cascades.
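Cataloguing domain organizations amounts to reducing each protein to the ordered list of its HK, RR and HPt domains and counting the distinct lists. The sketch below assumes that domain hits are already available for every protein as (start, name) pairs from a domain-annotation step that is not shown. The domain labels, protein IDs and coordinates are hypothetical, and the resulting strings only loosely mirror names such as HKRR and HKRRHPt used in the Supplemental Information.

```python
from collections import Counter

# Assumed labels for the three IST domain families; a real pipeline would map
# domain-database identifiers onto these names.
IST_DOMAINS = {"HK", "RR", "HPt"}


def architecture(hits):
    """N- to C-terminal string of the IST domains found in one protein.

    `hits` is a list of (start_position, domain_name) tuples; hits outside
    IST_DOMAINS are ignored, and "" means the protein has no IST domain.
    """
    return "".join(name for _, name in sorted(hits) if name in IST_DOMAINS)


def count_architectures(proteome):
    """Count each IST domain architecture in a {protein_id: hits} mapping."""
    counts = Counter()
    for hits in proteome.values():
        arch = architecture(hits)
        if arch:  # skip proteins without any IST domain
            counts[arch] += 1
    return counts


def single_domain_fraction(counts):
    """Fraction of IST proteins that carry exactly one IST domain."""
    total = sum(counts.values())
    single = sum(n for arch, n in counts.items() if arch in IST_DOMAINS)
    return single / total if total else 0.0


# Toy proteome with hypothetical protein IDs and domain coordinates.
example_proteome = {
    "p1": [(10, "HK")],
    "p2": [(5, "RR")],
    "p3": [(300, "RR"), (20, "HK")],                  # hybrid HK-RR protein
    "p4": [(40, "HK"), (600, "HPt"), (350, "RR")],
    "p5": [(1, "PAS")],                               # no IST domain
}
counts = count_architectures(example_proteome)
print(counts)                          # HK, RR, HKRR and HKRRHPt, once each
print(single_domain_fraction(counts))  # 2 of 4 IST proteins
```

Keeping the string order-sensitive means that, for example, an HK-RR hybrid and an RR-HK arrangement are counted as different organizations; whether the survey distinguishes them in exactly this way is an assumption of the sketch.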

Supplemental Information

Ratio between the observed and the randomly expected frequency of HK genes located next to RR genes in the genome

Phylum abbreviations are explained in Table 1. The colored box represents the range of percentage values between the 25% and 75% quantiles, and the ends of the vertical bar denote the highest and lowest percentage values for each phylum.

DOI: 10.7287/peerj.preprints.1267v1/supp-1

List of the 7609 surveyed species, classified per phylum

DOI: 10.7287/peerj.preprints.1267v1/supp-2

Percentage of each TCS/PR protein type per phylum

Phylum abbreviations are given in Table 1.

DOI: 10.7287/peerj.preprints.1267v1/supp-3

Percentage of HK genes and RR genes that are neighbors in the genome to other TCS/PR genes

Phylum abbreviations are given in Table 1.

DOI: 10.7287/peerj.preprints.1267v1/supp-4

Odds ratios (ratio between the observed and the randomly expected frequency) of HK genes located next to RR and HK2 genes in the genome

Only species with HK and RR genes are taken into account in the percentages. Phylum abbreviations are given in Table 1.

DOI: 10.7287/peerj.preprints.1267v1/supp-5

Genomic distribution of all IST domain coding genes per phylum

Phylum abbreviations are given in Table 1. Here we list all combinations of HK, RR and HPt coding genes found in consecutive positions in the genome, sorted by the number of orphan genes (genes with no other IST domain-coding gene adjacent to them).

DOI: 10.7287/peerj.preprints.1267v1/supp-6

Odds ratios (ratio between the observed and the randomly expected frequency) of HK genes located in the genome next to RR, HK2 and RR2 genes

Only species with HK and RR genes are taken into account in the percentages. Phylum abbreviations are given in Table 1.

DOI: 10.7287/peerj.preprints.1267v1/supp-7

Odds ratios (ratio between the observed and the randomly expected frequency) of HKRR genes located in the genome next to HK2 and RR2 genes

Only species with HKRR genes are taken into account in the percentages. Phyla without this type of protein (Aquificae, Tenericutes, Chlamydiae, Crenarchaeota, Korarchaeota, Nanoarchaeota, Nanohaloarchaeota, Alveolates, Euglenozoa, Microsporidians and Monocots) do not appear in the table. Phylum abbreviations are given in Table 1.

DOI: 10.7287/peerj.preprints.1267v1/supp-8

Odds ratios (ratio between the observed and the randomly expected frequency) of HKRRHPt genes located in the genome next to RR2 genes

Only species with HKRRHPt proteins are taken into account in the percentages. Prokaryotic phyla without this type of protein (Aquificae, Tenericutes, Deinococcus-Thermus, Fibrobacteres, Elusimicrobia, Armatimonadetes, Nitrospinae, Crenarchaeota, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Nanohaloarchaeota) do not appear in the table. Eukaryotes are not included in the table because we found no HKRRHPt protein in any eukaryote. Phylum abbreviations are given in Table 1.

DOI: 10.7287/peerj.preprints.1267v1/supp-9

Odds ratios (ratio between the observed and the randomly expected frequency) of HKRRHK2 genes located in the genome next to RR2 genes

Only species with HKRRHK2 proteins are taken into account in the percentages. Phyla without this type of protein (Aquificae, Tenericutes, Hyperthermophilic bacteria, Chloroflexi, Gemmatimonadetes, Fibrobacteres, Chlamydiae, Lentisphaerae, Planctomycetes, Chlorobi, Fusobacteria, Chrysiogenetes, Elusimicrobia, Armatimonadetes, Epsilonproteobacteria, Zetaproteobacteria, Other Proteobacteria, Nitrospinae, Synergistetes, Crenarchaeota, Korarchaeota, Thaumarchaeota, Nanoarchaeota, Nanohaloarchaeota, Alveolates, Amoeboflagellate, Euglenozoa, Microsporidians and Monocots) do not appear in the table. Eukaryotic phyla are not included in these statistics because, although some eukaryotic species have HKRRHK2 proteins, none of those HKRRHK2 genes was found next to an RR gene. Phylum abbreviations are given in Table 1.

DOI: 10.7287/peerj.preprints.1267v1/supp-10

Supplementary Appendix

File containing all figures and tables recomputed to include hypothetical proteins. The results are similar to those obtained with the dataset that excludes these proteins.

DOI: 10.7287/peerj.preprints.1267v1/supp-11