Chemical composition and the potential for proteomic transformation in cancer, hypoxia, and hyperosmotic stress

The relationship between cells and tissue microenvironments is a topic of vital importance for cancer biology. Because of rapid cellular proliferation and irregular vascularization, tumors often develop regions of hypoxia (Höckel & Vaupel, 2001). Tumor microenvironments also exhibit abnormal ranges of other physical-chemical variables, including hydration state (McIntyre, 2006; Abramczyk et al., 2014).

Some aspects of the complex metazoan response to hypoxia are mediated by hypoxia-inducible factor 1 (HIF-1). HIF-1 is a transcription factor that is tagged for degradation in normoxic conditions. Under hypoxia, the degradation of HIF-1 is suppressed; HIF-1 can then enter the nucleus and activate the transcription of downstream targets (Semenza, 2003). Indeed, transcriptional targets of HIF-1 are found to be differentially expressed in proteomic datasets for laboratory hypoxia (Cifani et al., 2011; McMahon et al., 2012). However, proteomic studies of cells in hypoxic conditions provide many examples of proteins that are not directly regulated by HIF-1 (McMahon et al., 2012; Fuhrmann et al., 2013), and cancer proteomic datasets also include many proteins that are not known to be regulated by HIF-1.

The complexity of the underlying regulatory mechanisms (McMahon et al., 2012) and the large differences between levels of gene expression and protein abundance (van den Beucken et al., 2011; Cifani et al., 2011; Ho et al., 2016) present many difficulties for a bottom-up understanding of global proteomic trends. As a counterpart to molecular explanations, a systems perspective can incorporate higher-level constraints (Drack & Wolkenhauer, 2011). A commonly used metaphor in systems biology is attractor landscapes. The basins of attraction are defined by dynamical systems behavior, but in many cases are analogous to minimum-energy states in thermodynamics (Emmeche, Koppe & Stjernfelt, 2000; Enver et al., 2009). Nevertheless, little attention has been given to the thermodynamic potential that is inherent to the compositional difference between the up-expressed and down-expressed proteins in proteomic experiments. Such a high-level perspective may require concepts and language that differ from those applicable to molecular interactions (Ellis, 2015).

To better understand the microenvironmental context for compositional changes, this study uses proteomic data as input into a descriptive thermodynamic model. First, a compositional analysis of differentially (up- and down-) expressed proteins identifies consistent trends in the oxidation and hydration states of proteomes of colorectal cancer (CRC), pancreatic cancer, and cells exposed to hypoxia or hyperosmotic stress. These results lay the groundwork for using a thermodynamic model to quantify environmental constraints on the potential for proteomic transformation. Finally, the Discussion section explores some implications of the hypothesis that elevated synthesis of lipids provides an electron sink for the oxidation of proteomes. In this situation, some cancer systems may develop an abnormally large redox disproportionation between pools of cellular biomacromolecules.


Data sources

Tables 1–4 present the sources of data. Protein IDs and expression (up/down or abundance ratios) were found in the literature, often being reported in the supporting information (SI) or supplementary (suppl.) tables. In some cases, source tables were further processed, using fold-change and significance cutoffs that, where possible, are based on statements made in the primary publication. The data are stored as *.csv files in the R package canprot, which was developed during this study and is provided as Dataset S1.
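
The cutoff-based processing described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study. The function name, the tuple layout, and the default cutoffs are assumptions; the defaults only echo values quoted for some datasets in the table notes (abundance ratio >2 or <0.5, p-value < 0.05).

```python
def split_by_cutoffs(rows, up_ratio=2.0, down_ratio=0.5, max_p=0.05):
    """Split (protein_id, abundance_ratio, p_value) rows into down and up lists.

    Hypothetical helper: the cutoffs are illustrative defaults, not the
    settings applied to any particular dataset in the study.
    """
    down, up = [], []
    for protein_id, ratio, p_value in rows:
        if p_value >= max_p:
            continue  # not significant at the chosen level
        if ratio > up_ratio:
            up.append(protein_id)
        elif ratio < down_ratio:
            down.append(protein_id)
    return down, up


# Made-up rows for illustration; these are not values from any cited dataset.
rows = [
    ("ID_A", 2.5, 0.01),
    ("ID_B", 0.4, 0.03),
    ("ID_C", 3.0, 0.20),
    ("ID_D", 1.2, 0.001),
]
down, up = split_by_cutoffs(rows)
print(len(down), len(up))  # n1 and n2 in the notation of the tables
```

Proteins that pass neither ratio cutoff, or that fail the significance cutoff, are left out of both groups.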

Table 1:
Selected proteomic datasets for colorectal cancer.*
Here and in Tables 2–4, n1 and n2 stand for the numbers of down- and up-expressed proteins, respectively, in each dataset.
Set | n1 | n2 | Description
ΩaAⒶ | 57 | 70 | T/N
ΩbAⒶ | 101 | 28 | CRC C/A^a
ΩcAⒶ | 87 | 81 | CIN C/A^a
ΩdAⒶ | 157 | 76 | MIN C/A^a
ΩeAⒶ | 43 | 56 | biomarkers up/down
ΩfAⒶ | 48 | 166 | stage I/normal^b
ΩgAⒶ | 77 | 321 | stage II/normal^b
ΩhAⒶ | 61 | 57 | microdissected T/N^b
ΩiAⒶ | 71 | 92 | adenoma/normal^a
ΩjAⒶ | 109 | 72 | stage I/normal^a
ΩkAⒶ | 164 | 140 | stage II/normal^a
ΩlAⒶ | 63 | 131 | stage III/normal^a
ΩmAⒶ | 42 | 26 | stage IV/normal^a
ΩnAⒶ | 72 | 45 | T/N
ΩoAⒶ | 335 | 288 | A/N
ΩpAⒶ | 373 | 257 | C/A
ΩqAⒶ | 351 | 232 | C/N
ΩrAⒶ | 75 | 61 | poor/good prognosis^b
ΩsAⒶ | 73 | 175 | MSS-type T/N^a
ΩtAⒶ | 79 | 677 | T/N
ΩuAⒶ | 55 | 68 | CM T/N^b
ΩvAⒶ | 33 | 37 | stromal T/N^a
ΩwAⒶ | 51 | 55 | chromatin-binding C/A
ΩxAⒶ | 58 | 65 | epithelial A/N
ΩyAⒶ | 44 | 210 | tissue secretome T/N^a
ΩzAⒶ | 113 | 66 | membrane enriched T/N
ΩAAⒶ | 1061 | 1254 | A/N
ΩBAⒶ | 772 | 1007 | C/A
ΩCAⒶ | 879 | 1281 | C/N
ΩDAⒶ | 123 | 75 | stromal AD/NC^a
ΩEAⒶ | 125 | 60 | stromal CIS/NC^a
ΩFAⒶ | 99 | 75 | stromal ICC/NC^a
ΩGAⒶ | 191 | 178 | biopsy T/N^b
ΩHAⒶ | 113 | 86 | AD/NC^a
ΩIAⒶ | 169 | 138 | CIS/NC^a
ΩJAⒶ | 129 | 100 | ICC/NC^a
DOI: 10.7717/peerj.3421/table-1

Abbreviations: T: tumor; N: normal; C: carcinoma or adenocarcinoma; A: adenoma; CM: conditioned media; AD: adenomatous colon polyps; CIS: carcinoma in situ; ICC: invasive colonic carcinoma; NC: non-neoplastic colonic mucosa.

ΩaAⒶ Source: Table 1 and Suppl. Data 1 of Watanabe et al. (2008). ΩbAⒶΩcAⒶΩdAⒶ Nuclear matrix proteome; chromosomal instability (CIN), microsatellite instability (MIN), or both types (CRC). Source: Suppl. Tables 5–7 of Albrethsen et al. (2010). ΩeAⒶ Candidate serum biomarkers. Source: Table 4 of Jimenez et al. (2010). ΩfAⒶ ΩgAⒶ Source: Suppl. Table 4 of Xie et al. (2010). ΩhAⒶ Source: Suppl. Table 4 of Zhang et al. (2010). ΩiAⒶΩjAⒶΩkAⒶΩlAⒶΩmAⒶ Source: Suppl. Table 9 of Besson et al. (2011). ΩnAⒶ Source: Suppl. Table 2 of Jankova et al. (2011). ΩoAⒶ ΩpAⒶ ΩqAⒶ Source: Table S8 of Mikula et al. (2011). ΩrAⒶ Source: extracted from Suppl. Table 5 of Kim et al. (2012), including proteins with abundance ratio >2 or <0.5. ΩsAⒶ Microsatellite stable (MSS) type CRC tissue. Source: Suppl. Table 4 of Kang et al. (2012). ΩtAⒶ Source: Suppl. Table 4 of Wiśniewski et al. (2012). ΩuAⒶ Source: Suppl. Table 2 of Yao et al. (2012). ΩvAⒶ Source: Table 1 of Mu et al. (2013). ΩwAⒶ Source: Table 2 of Knol et al. (2014). ΩxAⒶ Source: Table III of Uzozie et al. (2014). ΩyAⒶ Source: Suppl. Table 1 of de Wit et al. (2014). ΩzAⒶ Source: Supporting Table 2 of Sethi et al. (2015). ΩAAⒶΩBAⒶΩCAⒶ Source: SI Table 3 of Wiśniewski et al. (2015). ΩDAⒶ ΩEAⒶ ΩFAⒶ Source: Suppl. Table S3 of Li et al. (2016). ΩGAⒶ Source: extracted from SI Table S3 of Liu et al. (2016), including proteins with p-value < 0.05. ΩHAⒶΩIAⒶΩJAⒶ Source: Suppl. Table 4 of Peng et al. (2016).
^a Gene names or GI numbers were converted to UniProt IDs using the UniProt mapping tool.
^b IPI numbers were converted to UniProt IDs using the DAVID conversion tool.
Table 2:
Selected proteomic datasets for pancreatic cancer.*
Set | n1 | n2 | Description
ΩaAⒶ | 41 | 69 | T/N
ΩbAⒶ | 60 | 88 | T/N^a
ΩcAⒶ | 48 | 54 | T/N^a
ΩdAⒶ | 19 | 95 | CP/N^a
ΩeAⒶ | 28 | 29 | T/N
ΩfAⒶ | 38 | 45 | T/N^b
ΩgAⒶ | 207 | 152 | FFPE T/N^a
ΩhAⒶ | 108 | 86 | accessible T/N^c
ΩiAⒶ | 38 | 47 | FFPE T/N^c
ΩjAⒶ | 78 | 57 | T/N^a
ΩkAⒶ | 257 | 456 | T/N^a
ΩlAⒶ | 29 | 73 | FFPE PC/AIP^c
ΩmAⒶ | 53 | 73 | FFPE PC/CP^c
ΩnAⒶ | 83 | 32 | low-grade T/N^a
ΩoAⒶ | 224 | 176 | high-grade T/N^a
ΩpAⒶ | 208 | 219 | T/N (no DM)^a
ΩqAⒶ | 56 | 167 | T/N (DM)^a
ΩrAⒶ | 227 | 148 | LCM PDAC/ANT^c
ΩsAⒶ | 65 | 34 | T/N
ΩtAⒶ | 35 | 51 | mouse 2.5 w T/N^a
ΩuAⒶ | 40 | 73 | mouse 3.5 w T/N^a
ΩvAⒶ | 49 | 84 | mouse 5 w T/N^a
ΩwAⒶ | 37 | 108 | mouse 10 w T/N^a
DOI: 10.7717/peerj.3421/table-2

Abbreviations: CP: chronic pancreatitis; AIP: autoimmune pancreatitis; PC: pancreatic cancer; DM: diabetes mellitus; PDAC: pancreatic ductal adenocarcinoma; ANT: adjacent normal tissue; FFPE: formalin-fixed paraffin-embedded; LCM: laser-capture microdissection; NP: normal pancreas.

ΩaAⒶ Pooled tissue samples of PC and matched normal tissue from 12 patients. Source: Tables 2 and 3 of Lu et al. (2004). ΩbAⒶ Two PC and two NP samples. Source: Tables 1 and 2 of Chen et al. (2005). ΩcAⒶ Large-scale immunoblotting (PowerBlot) of 8 tissue specimens of pancreatic intraepithelial neoplasia compared to NP and CP. Source: Table 2 of Crnogorac-Jurcevic et al. (2005). ΩdAⒶ Tissue specimens from patients with CP and 10 control specimens from patients with NP. Source: Table 1 of Chen et al. (2007). ΩeAⒶ 12 carcinoma samples (PDAC), 12 benign pancreatic cystadenomas and 10 normal tissues adjacent to the PDAC primary mass. Source: Table 1 of Cui et al. (2009). ΩfAⒶ Source: extracted from Table S2 of McKinney et al. (2011). ΩgAⒶ PDAC compared to NP. Source: Suppl. Table 3 of Pan et al. (2011). ΩhAⒶ Potentially accessible proteins in fresh samples of PC tumors (three patients) vs normal tissue (two patients with NP and one with CP). Source: extracted from the SI Table of Turtoi et al. (2011). ΩiAⒶ 11 tissue specimens containing >50% cancer and 8 unmatched, uninvolved tissues adjacent to pancreatitis. Source: Suppl. Tables 2 and 3 of Kojima et al. (2012). ΩjAⒶ Fresh-frozen PDAC tissue specimens from seven patients vs a pooled mixture of three normal main pancreatic duct tissue samples. Source: extracted from SI Table S3 of Kawahara et al. (2013), including proteins with an expression ratio >2 [or <0.5] in at least five of the seven experiments and ratio >1 [or <1] in all experiments. ΩkAⒶ Frozen samples of PDAC tumors vs adjacent benign tissue from four patients. Source: Suppl. Table 2 of Kosanam et al. (2013). ΩlAⒶΩmAⒶ Tissue samples from three patients with PC vs 3 patients with AIP or three patients with CP. Source: extracted from Tables 2, 3, and 4 of Paulo et al. (2013). ΩnAⒶ ΩoAⒶ 12 samples each (pooled) of low-grade tumor or high-grade tumor vs non-tumor. Source: extracted from Suppl. Tables S4 and S5 of Wang et al. 
(2013b), including proteins with ratios ≥3/2 or ≤2/3 for at least two of the four groups, and with expression differences for all four groups in the same direction. ΩpAⒶΩqAⒶ Source: extracted from Suppl. Tables S3 and S4 of Wang et al. (2013a), including proteins with >3/2 or <2/3 fold change in at least 3 of 4 iTRAQ experiments for different pooled samples. ΩrAⒶ LCM of CD24+ cells from PDAC vs CD24− cells from adjacent normal tissue (ANT). Source: SI Table S5 of Zhu et al. (2013). ΩsAⒶ Matched PDAC and normal tissue from nine patients. Source: extracted from SI Table S5 of Iuga et al. (2014), excluding “not passed” proteins (those with inconsistent regulation). ΩtAⒶΩuAⒶΩvAⒶΩwAⒶ PDAC tumors in transgenic mice vs pancreas in normal mice, at time points of 2.5, 3.5, 5 and 10 weeks. Source: Suppl. Table of Kuo et al. (2016).
^a Gene names, IPI numbers or UniProt names were converted to UniProt IDs using the UniProt mapping tool.
^b IPI numbers were converted to UniProt IDs using the DAVID conversion tool.
^c Includes differentially expressed proteins shared between groups and proteins identified in only one group.
Table 3:
Selected proteomic datasets for hypoxia and reoxygenation experiments or growth in 3D culture.*
Set | n1 | n2 | Description
ΩaAⒶ | 37 | 24 | U937^a
ΩbAⒶ | 41 | 22 | placental secretome
ΩcAⒶ | 71 | 19 | B104
ΩdAⒶ | 87 | 28 | DU145^a
ΩeAⒶ | 29 | 21 | SK-N-BE(2)c; IMR-32
ΩfAⒶ | 53 | 65 | H9C2^b
ΩgAⒶ | 409 | 337 | MCF-7 SPH P5
ΩhAⒶ | 248 | 214 | MCF-7 SPH P2
ΩiAⒶ | 48 | 52 | SPH perinecrotic^a
ΩjAⒶ | 101 | 186 | SPH necrotic^a
ΩkAⒶ | 56 | 40 | THP-1
ΩlAⒶ | 178 | 77 | A431 Hx48
ΩmAⒶ | 69 | 54 | A431 Hx72
ΩnAⒶ | 48 | 36 | A431 ReOx
ΩoAⒶ | 141 | 64 | SH-SY5Y
ΩpAⒶ | 65 | 34 | A431 Hx48-S
ΩqAⒶ | 137 | 61 | A431 Hx72-S
ΩrAⒶ | 56 | 49 | A431 ReOx-S
ΩsAⒶ | 74 | 44 | A431 Hx48-P
ΩtAⒶ | 67 | 53 | A431 Hx72-P
ΩuAⒶ | 41 | 31 | A431 ReOx-P
ΩvAⒶ | 113 | 154 | CRC-derived SPH
ΩwAⒶ | 127 | 292 | HepG2/C3A SPH
ΩxAⒶ | 53 | 72 | HeLa
ΩyAⒶ | 137 | 64 | U87MG and 786-O
ΩzAⒶ | 129 | 141 | HCT116 transcription^a
ΩAAⒶ | 469 | 1024 | HCT116 translation^a
ΩBAⒶ | 66 | 50 | adipose-derived SC^a
ΩCAⒶ | 65 | 27 | cardiomyocytes CoCl2^a
ΩDAⒶ | 35 | 69 | cardiomyocytes SAL^a
ΩEAⒶ | 116 | 225 | HT29 SPH
DOI: 10.7717/peerj.3421/table-3

Abbreviations: U937: acute promonocytic leukemic cells; B104: rat neuroblastoma cells; DU145: prostate carcinoma cells; SK-N-BE(2)c, IMR-32, SH-SY5Y: neuroblastoma cells; H9C2: rat heart myoblast; MCF-7: breast cancer cells; A431: epithelial carcinoma cells; Hx48: hypoxia 48 h; Hx72: hypoxia 72 h; ReOx: hypoxia 48 h followed by reoxygenation for 24 h; -S: supernatant fraction; -P: pellet fraction; HepG2/C3A: hepatocellular carcinoma cells; 786-O: renal clear cell carcinoma cells; HCT116, HT29: colon cancer cells; SC: stem cells.

ΩaAⒶ 2% O2 vs normoxic conditions. Source: Table 1 of Han et al. (2006). ΩbAⒶ 1% vs 6% O2. Source: Tables 2 and 3 of Blankley et al. (2010). ΩcAⒶ Expression ratios HYP/LSC (oxygen deprivation/low serum control) >1.2 or <0.83. Source: calculated using data from Suppl. Table 2 of Datta et al. (2010), including proteins with p-value < 0.05 and EF < 1.4. ΩdAⒶ Translationally regulated genes. Source: Suppl. Tables 1–4 of van den Beucken et al. (2011). ΩeAⒶ 1% O2 for 72 h vs standard conditions. Source: Suppl. Table 1(a) of Cifani et al. (2011). ΩfAⒶ Hypoxic vs control conditions for 16 h. Source: Suppl. Table S5 of Li et al. (2012). ΩgAⒶ ΩhAⒶ Tumorspheres (50 to 200 μm diameter) at passage 5 (P5) or 2 (P2) compared to adherent cells. Source: Sheets 2 and 3 in Table S1 of Morrison et al. (2012). ΩiAⒶ ΩjAⒶ Perinecrotic and necrotic regions compared to surface of multicell spheroids (∼600 μm diameter) (expression ratios <0.77 or >1.3). Source: Suppl. Table 1C of McMahon et al. (2012). ΩkAⒶ Incubation for several days under hypoxia (1% O2). Source: Suppl. Table 2A of Fuhrmann et al. (2013) (control virus cells). ΩlAⒶΩmAⒶΩnAⒶ Source: extracted from Suppl. Table 1 of Ren et al. (2013), including proteins with iTRAQ ratios <0.83 or >1.2 and p-value < 0.05. ΩoAⒶ 5% O2 vs atmospheric levels of O2 (normalized expression ratio >1.2 or <0.83). Source: SI table of Villeneuve et al. (2013). ΩpAⒶΩqAⒶΩrAⒶΩsAⒶΩtAⒶΩuAⒶ The comparisons here include proteins with p < 0.05. Source: Suppl. Table S1 of Dutta et al. (2014). ΩvAⒶ Organotypic spheroids (∼250 μm diameter) vs lysed CRC tissue. Source: extracted from Table S2 of Rajcevic et al. (2014), filtered as follows: at least two of three experiments have differences in spectral counts, absolute overall fold change is at least 1.5, and p-value is less than 0.05. ΩwAⒶ SPH vs classical cell culture (2D growth) (log2 fold change at least ±1). Source: P1_Data sheet in the SI of Wrzesinski et al. (2014). ΩxAⒶ 1% vs 19% O2. 
Source: Table S1 of Bousquet et al. (2015). ΩyAⒶ 1% O2 for 24 h (fold change <0.5 or >1 for proteins detected in only hypoxic or only normoxic conditions). Source: Table S1 of Ho et al. (2016). ΩzAⒶΩAAⒶ Microarray analysis of differential gene expression in the transcriptome (total rRNA) and translatome (polysomal/total RNA ratio) of cells grown in normal and hypoxic (1% O2) conditions. Source: data file supplied by Ming-Chih Lai (Lai, Chang & Sun, 2016). ΩBAⒶ ASC from three donors cultured for 24 h in hypoxic (1% O2) vs normoxic (20% O2) conditions. Source: Tables 1 and 2 of Riis et al. (2016). ΩCAⒶ ΩDAⒶ Rat cardiomyocytes treated with CoCl2 (hypoxia mimetic) vs control or with SAL (anti-hypoxic) vs CoCl2. Source: SI Tables 1S and 2S of Xu et al. (2016). ΩEAⒶ 800 μm spheroids vs 2D monolayers. Source: Tables S1a–b of Yue et al. (2016).
^a Gene names, GI numbers, or other IDs were converted to UniProt IDs using the UniProt mapping tool.
^b IPI numbers were converted to UniProt IDs using the DAVID conversion tool.
Table 4:
Selected proteomic datasets for hyperosmotic stress experiments.*
Set | n1 | n2 | Description
ΩaAⒶ | 38 | 44 | S. cerevisiae VHG 2 h^a
ΩbAⒶ | 33 | 62 | S. cerevisiae VHG 10 h^a
ΩcAⒶ | 18 | 65 | S. cerevisiae VHG 12 h^a
ΩdAⒶ | 63 | 94 | mouse pancreatic islets
ΩeAⒶ | 148 | 44 | adipose-derived stem cells
ΩfAⒶ | 17 | 11 | ARPE-19 25 mM
ΩgAⒶ | 21 | 24 | ARPE-19 100 mM
ΩhAⒶ | 114 | 61 | ECO57 25 °C, aw 0.985^a
ΩiAⒶ | 238 | 61 | ECO57 14 °C, aw 0.985^a
ΩjAⒶ | 263 | 56 | ECO57 25 °C, aw 0.967^a
ΩkAⒶ | 372 | 73 | ECO57 14 °C, aw 0.967^a
ΩlAⒶ | 32 | 39 | Chang liver cells 25 mM
ΩmAⒶ | 19 | 50 | Chang liver cells 100 mM
ΩnAⒶ | 49 | 28 | eel gill^a
ΩoAⒶ | 78 | 77 | S. cerevisiae t30a^b
ΩpAⒶ | 67 | 67 | S. cerevisiae t30b^b
ΩqAⒶ | 87 | 87 | S. cerevisiae t30c^b
ΩrAⒶ | 25 | 38 | IOBA-NHC
ΩsAⒶ | 105 | 96 | CAUCR succinate tr.^a
ΩtAⒶ | 209 | 142 | CAUCR NaCl tr.^a
ΩuAⒶ | 33 | 33 | CAUCR succinate pr.^a
ΩvAⒶ | 33 | 27 | CAUCR NaCl pr.^a
ΩwAⒶ | 294 | 205 | CHO all^a
ΩxAⒶ | 66 | 75 | CHO high^a
ΩyAⒶ | 14 | 28 | Yarrowia lipolytica^b
ΩzAⒶ | 160 | 141 | Paracoccidioides lutzii^a
DOI: 10.7717/peerj.3421/table-4

Abbreviations: VHG: very high glucose; ARPE-19: human retinal pigmented epithelium cells; ECO57: Escherichia coli O157:H7 Sakai; IOBA-NHC: human conjunctival epithelial cells; CAUCR: Caulobacter crescentus; CHO: Chinese hamster ovary cells.

ΩaAⒶΩbAⒶΩcAⒶ VHG (300 g/L) vs control (20 g/L). The comparisons here use proteins with expression ratios <0.9 or >1.1 and with p-values < 0.05. Source: SI Table of Pham & Wright (2008). ΩdAⒶ 24 h at 16.7 mM vs 5.6 mM glucose. Source: extracted from Suppl. Table ST4 of Waanders et al. (2009); including the red- and blue-highlighted rows in the source table (those with ANOVA p-value < 0.01), and applying the authors’ criterion that proteins be identified by 2 or more unique peptides in at least 4 of the 8 most intense LC-MS/MS runs. ΩeAⒶ 300 mOsm (control) or 400 mOsm (NaCl treatment). Source: Suppl. Table 1 of Oswald et al. (2011). ΩfAⒶ ΩgAⒶ Mannitol-balanced 5.5 (control), 25 or 100 mM d-glucose media. Source: Table 1 of Chen et al. (2012). ΩhAⒶ ΩiAⒶ ΩjAⒶ ΩkAⒶ Temperature and NaCl treatment (control: 35 °C, aw 0.993). Source: Suppl. Tables S13–S16 of Kocharunchitt et al. (2012). ΩlAⒶ ΩmAⒶ 5.5 (control), 25 or 100 mM d-glucose. Source: Table 1 of Chen et al. (2013). ΩnAⒶ Gill proteome of Japanese eel (Anguilla japonica) adapted to seawater or freshwater. Source: protein IDs from Suppl. Table 3 and gene names of human orthologs from Suppl. File 4 of Tse et al. (2013). ΩoAⒶ ΩpAⒶΩqAⒶ Multiple experiments for 30 min after transfer from YPKG (0.5% glucose) to YNB (2% glucose) media. Source: extracted from Suppl. Files 3 and 5 of Giardina, Stanley & Chiang (2014), using the authors’ criterion of p-value < 0.05. ΩrAⒶ 280 (control), 380, or 480 mOsm (NaCl treatment) for 24 h. Source: Table 2 of Chen et al. (2015). ΩsAⒶΩtAⒶΩuAⒶΩvAⒶ Overnight treatment with a final concentration of 40/50 mM NaCl or 200 mM sucrose vs M2 minimal salts medium plus glucose (control). Source: Table S2 of Kohler et al. (2015). ΩwAⒶ ΩxAⒶ 15 g/L vs 5 g/L (control) glucose at days 0, 3, 6, and 9. 
The comparisons here use all proteins reported to have expression patterns in Cluster 1 (up) or Cluster 5 (down), or only the proteins with high expression differences (ratio ≤ −0.2 or ≥ 0.2) at all time points. Source: SI Table S4 of Liu et al. (2015). ΩyAⒶ 4.21 osmol/kg vs 3.17 osmol/kg osmotic pressure (NaCl treatment). Source: Table 1 of Yang et al. (2015). ΩzAⒶ 0.1 M KCl (treatment) vs medium with no added KCl (control). Source: Suppl. Tables 2 and 3 of da Silva Rodrigues et al. (2016).
^a Gene names, GI numbers, or NCBI RefSeq accessions were converted to UniProt IDs using the UniProt mapping tool.
^b Amino acid sequences were obtained for the listed GI numbers using Batch Entrez.

Sequence IDs were converted to UniProt IDs using the UniProt mapping tool or the gene ID conversion tool of DAVID 6.7. For proteins where the automatic conversions produced no matches, manual searches in UniProt were performed using the gene names or protein descriptions. If specified (i.e., as UniProt IDs with suffixes), particular isoforms of the proteins were used. Obsolete or secondary IDs reported for some proteins were updated to reflect current, primary IDs (uniprot_updates.csv in Dataset S1). Any duplicated IDs listed as having opposite expression ratios were excluded from the comparisons here.
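
The exclusion of IDs with conflicting expression directions can be illustrated with a short Python sketch. It is not the canprot code. The function name and the (ID, direction) record layout are assumptions, and collapsing repeats that agree in direction is a choice made here that the text does not specify.

```python
def drop_conflicting_ids(records):
    """Exclude IDs listed as both up- and down-expressed.

    records: list of (protein_id, direction) pairs, direction "up" or "down".
    Repeats with the same direction are collapsed to one entry (an
    assumption of this sketch).
    """
    directions = {}
    for protein_id, direction in records:
        directions.setdefault(protein_id, set()).add(direction)

    kept, seen = [], set()
    for protein_id, direction in records:
        if len(directions[protein_id]) > 1:
            continue  # opposite expression reported: exclude entirely
        if protein_id in seen:
            continue  # exact repeat: keep a single copy
        seen.add(protein_id)
        kept.append((protein_id, direction))
    return kept


# Placeholder IDs, not real accessions.
example = [
    ("ID_A", "up"),
    ("ID_B", "down"),
    ("ID_A", "down"),
    ("ID_C", "up"),
    ("ID_C", "up"),
]
cleaned = drop_conflicting_ids(example)
print(cleaned)
```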

Amino acid sequences of human proteins were taken from the UniProt human reference proteome. Sequences of proteins in other organisms and of human proteins not contained in the reference proteome were downloaded from UniProt or the NCBI website (for one study reporting GI numbers; see Table 4). Amino acid compositions were computed using functions in the CHNOSZ package (Dick, 2008) or the ProtParam tool on the UniProt website. The amino acid compositions are stored in *.Rdata files in Dataset S1.
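
For readers who want to see what the composition step involves, the sketch below counts amino acids in a sequence and sums the elemental formula of the non-ionized protein. It is written in Python for illustration and is not the CHNOSZ implementation; the function names are hypothetical. The formulas are those of the neutral free amino acids, and one H2O is subtracted for each peptide bond.

```python
from collections import Counter

# (C, H, N, O, S) counts of the neutral free amino acids.
AA_FORMULAS = {
    "A": (3, 7, 1, 2, 0),   "R": (6, 14, 4, 2, 0), "N": (4, 8, 2, 3, 0),
    "D": (4, 7, 1, 4, 0),   "C": (3, 7, 1, 2, 1),  "Q": (5, 10, 2, 3, 0),
    "E": (5, 9, 1, 4, 0),   "G": (2, 5, 1, 2, 0),  "H": (6, 9, 3, 2, 0),
    "I": (6, 13, 1, 2, 0),  "L": (6, 13, 1, 2, 0), "K": (6, 14, 2, 2, 0),
    "M": (5, 11, 1, 2, 1),  "F": (9, 11, 1, 2, 0), "P": (5, 9, 1, 2, 0),
    "S": (3, 7, 1, 3, 0),   "T": (4, 9, 1, 3, 0),  "W": (11, 12, 2, 2, 0),
    "Y": (9, 11, 1, 3, 0),  "V": (5, 11, 1, 2, 0),
}


def amino_acid_composition(sequence):
    """Count each one-letter amino acid code in the sequence."""
    return Counter(sequence)


def protein_formula(sequence):
    """Elemental formula of the non-ionized protein as a dict of counts."""
    totals = [0, 0, 0, 0, 0]
    for aa, count in amino_acid_composition(sequence).items():
        for i, atoms in enumerate(AA_FORMULAS[aa]):
            totals[i] += count * atoms
    peptide_bonds = len(sequence) - 1
    totals[1] -= 2 * peptide_bonds  # H lost as water
    totals[3] -= peptide_bonds      # O lost as water
    return dict(zip("CHNOS", totals))


# Diglycine as a tiny stand-in for a protein sequence.
print(protein_formula("GG"))
```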

R (R Core Team, 2016) and R packages canprot (this study) and CHNOSZ (Dick, 2008) were used to process the data and generate the figures with code specifically written for this study, which is provided in Dataset S2.

Measures of compositional oxidation and hydration state

Two compositional metrics that afford a quantitative description of proteomic data, the average oxidation state of carbon ($Z_\mathrm{C}$) and the water demand per residue ($\bar{n}_{\mathrm{H_2O}}$), are briefly described here.

The oxidation state of atoms in molecules quantifies the degree of electron redistribution due to bonding; a higher oxidation state signifies a lower degree of reduction. Although calculations of oxidation state from molecular formulas necessarily make simplifying assumptions regarding the internal electronic structure of molecules, such calculations may be used to quantify the flow of electrons in chemical reactions, and the oxidation state concept is useful for studying the transformations of complex mixtures of organic molecules. For example, calculations of the average oxidation state of carbon provide insight on the processes affecting the decomposition of carbohydrate, protein and lipid fractions of natural organic matter (Baldock et al., 2004). Moreover, oxidation state can be regarded as an ensemble property of organic systems (Kroll et al., 2015). See Dick (2016) for additional references where organic and biochemical reactions have been characterized using the average oxidation state of carbon.

Despite the large size of proteins, their relatively simple primary structure means that $Z_\mathrm{C}$ can be computed using the elemental abundances in any particular amino acid sequence (Dick, 2014):

$$Z_\mathrm{C} = \frac{-h + 3n + 2o + 2s + z}{c}. \tag{1}$$

In this equation, $c$, $h$, $n$, $o$, and $s$ are the elemental abundances in the chemical formula $\mathrm{C}_c\mathrm{H}_h\mathrm{N}_n\mathrm{O}_o\mathrm{S}_s^{z}$ for a specific protein with total charge $z$. Note, however, that ionization by gain or loss of protons alters charge and the number of H equally, so has no effect on the value of $Z_\mathrm{C}$; for ease of computation, $Z_\mathrm{C}$ is calculated here for proteins in their completely non-ionized forms.
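
Equation (1) translates directly into code. The following Python sketch is an illustration only (the study itself used R); the function name is hypothetical, and small molecules stand in for proteins so that the arithmetic is easy to check by hand. The last example shows that adding a proton, which raises both $h$ and $z$ by one, leaves $Z_\mathrm{C}$ unchanged.

```python
from fractions import Fraction


def average_oxidation_state_of_carbon(c, h, n, o, s, z=0):
    """Z_C = (-h + 3n + 2o + 2s + z) / c for a formula CcHhNnOoSs with charge z."""
    return Fraction(-h + 3 * n + 2 * o + 2 * s + z, c)


# Alanine, C3H7NO2: (-7 + 3 + 4) / 3
zc_alanine = average_oxidation_state_of_carbon(3, 7, 1, 2, 0)

# Glycine, C2H5NO2: (-5 + 3 + 4) / 2
zc_glycine = average_oxidation_state_of_carbon(2, 5, 1, 2, 0)

# Protonated glycine, C2H6NO2 with z = +1: (-6 + 3 + 4 + 1) / 2
zc_glycine_cation = average_oxidation_state_of_carbon(2, 6, 1, 2, 0, z=1)

print(zc_alanine, zc_glycine, zc_glycine_cation)
```

Exact fractions are used so that the invariance under ionization can be checked without floating-point rounding.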

In contrast to the elemental stoichiometry in Eq. (1), a calculation of the hydration state must account for the gain or loss of H2O. In the biochemical literature, “protein hydration” or water of hydration refers to the effective (time-averaged) number of water molecules that interact with a protein (Timasheff, 2002). These dynamically interacting molecules form a hydration shell that has important implications for crystallography and enzymatic function, but hydration numbers have been measured for few proteins and are difficult to compute, especially for the many proteins with unknown tertiary structure. Thus, the structural hydration of proteins identified in proteomic datasets generally remains unquantified.

A different concept of hydration state arises by considering the chemical components that make up proteins. A componential analysis is a method of projecting the composition of a molecule using specified chemical formula units as the components, or basis species. The notion of components is central to chemical thermodynamics (Gibbs, 1875); the choice of components determines the thermodynamic variables (chemical potentials), and a careful choice leads to more convenient representations of the compositional and energetic constraints on reactions (e.g. Zhu & Anderson, 2002).

The components, or basis species, consist of a minimum number of species whose compositions can be linearly combined to represent the composition of any protein. The 20 proteinogenic amino acids are together composed of five elements (C, H, N, O, S), so five basis species are needed to represent the primary sequences of proteins. As noted previously (see references in Dick, 2016), all possible combinations of basis species lead to thermodynamically consistent models, but are differently suited to making interpretations. Dick (2016) proposed using C5H10N2O3, C5H9NO4, C3H7NO2S, O2, and H2O as a basis for assessing compositional differences in proteomes. The first three formulas correspond to glutamine (Q), glutamic acid (E), and cysteine (C).

To account for protein ionization, a proton can be included in the basis, which is now referred to as “QEC+”. Using the QEC+ basis, the stoichiometric projection of a protein with formula $\mathrm{C}_c\mathrm{H}_{h+z}\mathrm{N}_n\mathrm{O}_o\mathrm{S}_s^{z}$, where $z$ is the charge of the protein and $h$ is the number of H in the fully nonionized protein, is represented by

$$n_\mathrm{Cys}\,\mathrm{C_3H_7NO_2S} + n_\mathrm{Glu}\,\mathrm{C_5H_9NO_4} + n_\mathrm{Gln}\,\mathrm{C_5H_{10}N_2O_3} + n_\mathrm{H_2O}\,\mathrm{H_2O} + n_\mathrm{O_2}\,\mathrm{O_2} + z\,\mathrm{H^+} \rightarrow \mathrm{C}_c\mathrm{H}_{h+z}\mathrm{N}_n\mathrm{O}_o\mathrm{S}_s^{z}. \tag{R1}$$

To compare the compositions of different-sized proteins, the stoichiometric coefficients in Reaction (R1) can be divided by the sequence length (number of amino acids) of the protein. The length-normalized coefficients, written with an overbar, include the per-residue water demand for formation of a protein ($\bar{n}_{\mathrm{H_2O}}$). This componential “hydration state” is used in this study, and should not be confused with the structural biochemical “protein hydration” mentioned above.
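
The coefficients in Reaction (R1) follow from balancing the five elements. The Python sketch below solves that balance by hand for the non-ionized formula (the proton term only carries the charge, so it does not affect the other coefficients). The function name and argument layout are assumptions made for illustration; this is not the CHNOSZ or canprot implementation.

```python
from fractions import Fraction


def qec_projection(c, h, n, o, s, length=1):
    """Per-residue coefficients of Cys, Glu, Gln, H2O and O2 for CcHhNnOoSs.

    h is the number of H in the non-ionized protein; length is the number
    of amino acid residues used for normalization.
    """
    c, h, n, o, s = (Fraction(x) for x in (c, h, n, o, s))
    n_cys = s                                   # S balance: only Cys has S
    glu_plus_gln = (c - 3 * n_cys) / 5          # C balance: Glu, Gln have 5 C
    n_gln = (n - n_cys) - glu_plus_gln          # N balance: Gln has 2 N
    n_glu = glu_plus_gln - n_gln
    n_h2o = (h - 7 * n_cys - 9 * n_glu - 10 * n_gln) / 2         # H balance
    n_o2 = (o - 2 * n_cys - 4 * n_glu - 3 * n_gln - n_h2o) / 2   # O balance
    coefficients = {"Cys": n_cys, "Glu": n_glu, "Gln": n_gln,
                    "H2O": n_h2o, "O2": n_o2}
    return {name: value / length for name, value in coefficients.items()}


# Diglycine, C4H8N2O3, two residues: a toy stand-in for a protein.
per_residue = qec_projection(4, 8, 2, 3, 0, length=2)
print(per_residue["H2O"], per_residue["O2"])
```

Negative coefficients are expected; they simply mean that the basis species appears on the other side of the balanced reaction.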

The primary reason for choosing the QEC+ basis instead of others lies in the relation of the compositional variables representing oxidation and hydration state ($\bar{n}_{\mathrm{O_2}}$ and $\bar{n}_{\mathrm{H_2O}}$) with each other and with $Z_\mathrm{C}$. It is important to note that $Z_\mathrm{C}$ is a measure of oxidation state that is independent of the choice of basis species. Smoothed scatter plots of $\bar{n}_{\mathrm{H_2O}}$ vs $Z_\mathrm{C}$ and $\bar{n}_{\mathrm{O_2}}$ vs $Z_\mathrm{C}$ are shown in Fig. S1 for the 21,006 human proteins in the UniProt reference proteome. The plots in the top row of this figure are made using the QEC basis (which is equivalent to the QEC+ basis for the plotted variables) while those in the bottom row are made using the basis species CO2, NH3, H2S, H2O, and O2; these inorganic species are often used to balance reactions in geochemical models. It is apparent from Fig. S1 that, using the QEC basis, $\bar{n}_{\mathrm{O_2}}$ is highly positively correlated with $Z_\mathrm{C}$, and $\bar{n}_{\mathrm{H_2O}}$ shows a slight negative correlation with $Z_\mathrm{C}$. Accordingly, in the QEC basis, $\bar{n}_{\mathrm{O_2}}$ is a strong indicator of oxidation state, while $\bar{n}_{\mathrm{H_2O}}$ represents a distinct compositional variable. In contrast, the plots in the bottom row of Fig. S1 show a moderate positive correlation between $\bar{n}_{\mathrm{O_2}}$ and $Z_\mathrm{C}$ and a stronger negative correlation between $\bar{n}_{\mathrm{H_2O}}$ and $Z_\mathrm{C}$. Using that basis would therefore weaken the interpretation of $\bar{n}_{\mathrm{O_2}}$ as an indicator of oxidation state and of $\bar{n}_{\mathrm{H_2O}}$ as a distinct compositional variable. The relations among $\bar{n}_{\mathrm{H_2O}}$, $\bar{n}_{\mathrm{O_2}}$, and $Z_\mathrm{C}$ also vary between basis species consisting of different combinations of amino acids; those differences together with biological considerations support the choice of QEC instead of other amino acids (Dick, 2016).

In summary, Reaction (R1) is not a mechanism for protein synthesis, but is a projection of any protein’s elemental composition into chemical components, i.e., the basis. Compared to a basis composed of simpler inorganic species, the QEC+ basis reduces the projected codependence of oxidation and hydration state in proteins, unfolding a compositional dimension that can enrich a thermodynamic model.


Colorectal cancer

The progression of colorectal cancer (CRC) begins with the formation of numerous non-cancerous lesions (adenoma), which may remain undetectable. Over time, a small fraction of adenomas develop into malignant tumors (carcinoma) (Jimenez et al., 2010; Wiśniewski et al., 2015). Publicly available datasets reporting a minimum of ca. 30 up- and 30 down-expressed proteins for tissue samples of CRC, and one meta-analysis of serum biomarkers, were compiled recently (Dick, 2016). These same datasets are listed in Table 1, with one newer addition (dataset ΩGAⒶ; Liu et al., 2016).

Many aspects of the experimental methods, statistical tests, and bioinformatics analyses used to identify significantly up-expressed and down-expressed proteins vary considerably among studies. The comparisons here are made without any control of this variability. Although particular comparisons may reflect study-specific conditions and methods, visualization of the chemical compositions of proteins for many datasets can reveal general features of the cancer phenotype.

For each dataset, Table 1 lists the numbers of down-expressed (n1) and up-expressed (n2) proteins in cancer relative to normal tissue. For datasets comparing different stages of cancer progression, groups n1 and n2 correspond to the down- and up-expressed proteins in the more advanced stage (e.g., carcinoma) compared to the less advanced stage (e.g., adenoma). Mean values of average oxidation state of carbon ($Z_\mathrm{C}$; Eq. (1)) and water demand per residue ($\bar{n}_{\mathrm{H_2O}}$; Reaction (R1)) were calculated for the up- and down-expressed groups of proteins, together with the corresponding mean differences ($\Delta Z_\mathrm{C}$ and $\Delta\bar{n}_{\mathrm{H_2O}}$ for the means of up- minus down-expressed groups), p-values, and effect sizes. These values are listed in Table S1. Figure S2 shows the mean values of $Z_\mathrm{C}$ and $\bar{n}_{\mathrm{H_2O}}$ for the up- and down-expressed proteins together in a single plot (lettered point symbols for down-expressed and arrowheads for up-expressed proteins). Because of the high variability of mean values among datasets, compositional trends between up- and down-expressed proteins are difficult to interpret using Fig. S2. Therefore, the differences in mean values between up- and down-expressed proteins ($\Delta Z_\mathrm{C}$ and $\Delta\bar{n}_{\mathrm{H_2O}}$) are plotted in this paper.

Figure 1A shows $\Delta\bar{n}_{\mathrm{H_2O}}$ vs $\Delta Z_\mathrm{C}$ for the CRC datasets. The gray boxes cover the range from −0.01 to 0.01 for each of the variables. To draw attention to the largest and most significant changes, filled points and dashed lines indicate mean differences with a p-value (Wilcoxon test) less than 0.05; solid lines indicate mean differences with a common language effect size (CLES) ≥60% or ≤40%. The common language statistic “is the probability that a score sampled at random from one distribution will be greater than a score sampled from some other distribution” (McGraw & Wong, 1992). Here, CLES is calculated as the percentage of pairings of individual proteins with a positive difference in $Z_\mathrm{C}$ or $\bar{n}_{\mathrm{H_2O}}$ between the up- and down-expressed groups from all possible pairings between the groups. Point symbols are squares if the p-values for both $Z_\mathrm{C}$ and $\bar{n}_{\mathrm{H_2O}}$ are less than 0.05, or circles otherwise.
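
The mean difference and the CLES as defined above can be computed with a few lines of Python. This is a sketch of the stated definition, not the canprot code. The function names are hypothetical, the input values are made up, and tied pairs are counted as not positive, which is one reading of “positive difference”.

```python
from statistics import mean


def mean_difference(up_values, down_values):
    """Mean of the up-expressed group minus mean of the down-expressed group."""
    return mean(up_values) - mean(down_values)


def cles_percent(up_values, down_values):
    """Percentage of all up/down pairings in which the up value is larger."""
    positive = sum(1 for u in up_values for d in down_values if u > d)
    return 100.0 * positive / (len(up_values) * len(down_values))


# Made-up per-protein values, for illustration only.
up = [-0.10, -0.05, 0.00]
down = [-0.20, -0.05]

delta = mean_difference(up, down)
cles = cles_percent(up, down)
print(round(delta, 3), round(cles, 1))
```

With three up-expressed and two down-expressed values there are six pairings; four have a positive difference, one is negative, and one is tied.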

Figure 1: Compositional analysis of differential protein expression in (A) colorectal cancer and (B) pancreatic cancer.

The plots show differences (Δ) between the mean for up-expressed and the mean for down-expressed proteins of average oxidation state of carbon ($Z_\mathrm{C}$) and water demand per residue ($\bar{n}_{\mathrm{H_2O}}$) for each dataset from Tables 1 and 2. Red colors highlight (A) adenoma/normal comparisons or (B) chronic pancreatitis/normal or low-grade tumor/normal comparisons. Here and in Fig. 2, filled points and dashed lines indicate p < 0.05; solid lines are drawn instead if the common language effect size is ≥60% or ≤40%.

The plot illustrates that proteins up-expressed in carcinoma relative to normal tissue most often have significantly higher $Z_\mathrm{C}$ [ΩgAⒶ ΩkAⒶ ΩlAⒶ ΩnAⒶ ΩpAⒶ ΩrAⒶ ΩsAⒶ ΩuAⒶ ΩvAⒶ ΩlAⒶ], $\bar{n}_{\mathrm{H_2O}}$ [ΩeAⒶ ΩoAⒶ ΩtAⒶ ΩxAⒶ ΩyAⒶ ΩDAⒶ ΩGAⒶ ΩHAⒶ], or both [ΩqAⒶ ΩAAⒶ ΩCAⒶ] (see also Dick, 2016). The red points in the plot highlight the datasets for adenoma/normal comparisons [ΩiAⒶ ΩoAⒶ ΩxAⒶ ΩAAⒶ ΩDAⒶ ΩHAⒶ]. Most of these exhibit a significant positive $\Delta\bar{n}_{\mathrm{H_2O}}$ but not the large increase in $Z_\mathrm{C}$ found for many of the carcinoma/normal comparisons.

Pancreatic cancer

Many proteomic studies have been performed to investigate the differences between normal pancreas (NP) and pancreatic ductal adenocarcinoma (PDAC). Proteomic studies also address the inflammatory conditions of autoimmune pancreatitis, which is sometimes misidentified as carcinoma (Paulo et al., 2013), and chronic pancreatitis, which is associated with increased cancer risk (Chen et al., 2007). Searches for proteomic data were aided by the reviews of Pan et al. (2013) and Ansari et al. (2014). Table 2 lists selected datasets reporting at least ca. 25 up-expressed and 25 down-expressed proteins.

The compositional comparisons in Fig. 1B show that up-expressed proteins in pancreatic cancer often have significantly higher ZC [ΩbAⒶ ΩeAⒶ ΩgAⒶ ΩiAⒶ ΩoAⒶ ΩpAⒶ ΩqAⒶ ΩrAⒶ]. A dataset obtained for pancreatic cancer associated with diabetes mellitus (Wang et al., 2013a) [ΩqAⒶ] has both significantly higher ZC and n̄H2O. Only one dataset, from a study that targeted accessible proteins (Turtoi et al., 2011) [ΩhAⒶ], is characterized by a large negative ΔZC. Some other datasets that do not have significantly different ZC exhibit higher n̄H2O in cancer compared to non-cancerous (normal or pancreatitis) tissue [ΩaAⒶ ΩjAⒶ ΩkAⒶ ΩmAⒶ ΩuAⒶ]. Two of the four datasets with negative Δn̄H2O [ΩdAⒶ ΩhAⒶ ΩnAⒶ ΩsAⒶ] were obtained from studies of chronic pancreatitis (Chen et al., 2007) or low-grade tumors (Wang et al., 2013b) (red points in Fig. 1B); another used a procedure to isolate accessible proteins (Turtoi et al., 2011) [ΩhAⒶ], while the remaining low-Δn̄H2O dataset [ΩsAⒶ] may be an outlier in terms of mean chemical composition (Fig. S2). Therefore, the datasets with positive Δn̄H2O and/or ΔZC likely reflect a general characteristic of pancreatic cancer.

Hypoxia and 3D culture

Hypoxia refers to oxygen concentrations that are lower than normal physiological levels. Hypoxia is a factor in many pathological conditions, including altitude sickness, stroke, and cardiac ischemia (e.g., Datta et al., 2010; Li et al., 2012; Fuhrmann et al., 2013). In tumors, irregular vascularization and abnormal perfusion contribute to the formation of hypoxic regions (Höckel & Vaupel, 2001). A related situation is the growth in the laboratory of 3D cell cultures (e.g., tumor spheroids), instead of two-dimensional growth on a surface. In 2D monolayers, all cells are exposed to the gas phase, but interior regions of 3D cultures are often diffusion-limited, leading to oxygen deprivation and necrosis (McMahon et al., 2012). There are some overlaps, but also many differences, between gene expression in 3D culture and hypoxic conditions (DelNero et al., 2015). These studies emphasize that growth in 3D culture is associated with heterogeneous oxygen concentrations and have found an interdependence between the effects of hypoxia and 3D growth on gene expression. The proteomic changes likely reflect not only oxygen limitation but also other processes connected with 3D growth (e.g., nutrient deprivation, extracellular architecture, and even light penetration). Although the comparisons made here do not address these individual factors, they do provide information on whether hypoxia and 3D culture lead to similar changes in the overall chemical composition of proteomes.

Table 3 lists selected proteomic datasets with a minimum of ca. 20 up- and 20 down-expressed proteins in hypoxia or 3D growth. The differences in chemical composition of the differentially expressed proteins are plotted in Fig. 2A. In many experiments, hypoxia or 3D growth induces a proteomic transformation with a significant and/or large decrease of ZC [ΩaAⒶ ΩbAⒶ ΩcAⒶ ΩgAⒶ ΩhAⒶ ΩjAⒶ ΩmAⒶ ΩoAⒶ ΩwAⒶ ΩAAⒶ ΩEAⒶ]. These datasets cluster around a narrow range of ΔZC (−0.032 to −0.021), except for dataset ΩEAⒶ (3D growth of colon cancer cells) with much lower ΔZC. As extracellular proteins have relatively high ZC (Dick, 2014), the observation in some experiments that hypoxia decreases the abundance of proteins associated with the extracellular matrix (ECM) (Blankley et al., 2010) is compatible with the overall expression of more reduced (low- ZC) proteins. Conversely, reoxygenation leads to the formation of more oxidized proteins in the supernatant (-S) and pellet (-P) fractions of isolated chromatin [ΩrAⒶ ΩuAⒶ].

Figure 2: Compositional analysis of differential protein expression in (A) hypoxia or 3D culture and (B) hyperosmotic stress.

The plots show differences (Δ) between the mean for up-expressed and the mean for down-expressed proteins of average oxidation state of carbon (ZC) and water demand per residue (n̄H2O) for each dataset from Tables 3 and 4. Red, blue, and orange symbols are used to highlight datasets for tumorspheres, reoxygenation or anti-hypoxic treatment, and adipose-derived stem cells, respectively.

While most studies controlled gas composition to generate hypoxia, two datasets [ΩCAⒶ ΩDAⒶ] are from a study that used cobalt chloride (CoCl2) to induce hypoxia in rat cardiomyocytes; treatment with salidroside (SAL) had anti-hypoxic effects (Xu et al., 2016). The CoCl2 and SAL treatments result in the expression of somewhat more reduced and more oxidized proteins, respectively, in agreement with the general trends for hypoxia and reoxygenation experiments.

Two datasets oppose the general trends, showing large and significantly higher ZC under hypoxia. These datasets were obtained using particular analytical methods or cell types. One of the nonconforming datasets is for the supernatant in a chromatin isolation procedure [ΩpAⒶ], and the other is for adipose-derived stem cells [ΩBAⒶ] (see below).

Hyperosmotic stress

Hyperosmotic stress refers to a condition in which the extracellular tonicity, or osmolality, is increased. The addition of osmolytes (or “cosolvents”) lowers the water activity in the medium (Timasheff, 2002). Equilibration with hypertonic solutions drives water out of cells, causing cell shrinkage. The selected datasets listed in Table 4 include at least ca. 20 up-expressed and 20 down-expressed proteins in response to high concentrations of NaCl (five studies), glucose (six studies), succinate (one study), KCl (one study), or adaptation to seawater (one study). The proteomic analyses used bacterial, yeast, or mammalian cells, or fish (eel) gills (Tse et al., 2013). One study varied temperature along with NaCl concentration (Kocharunchitt et al., 2012), and one study reported both transcriptomic and proteomic ratios (Kohler et al., 2015).

In the study of Giardina, Stanley & Chiang (2014) [ΩoAⒶ ΩpAⒶ ΩqAⒶ], the reported expression ratios for extracellular proteins after transfer from low glucose to high glucose media are nearly all less than 1. Therefore, the “up-expressed” proteins in the comparisons here are taken to be those that have a higher expression ratio than the median in a given experiment. To achieve a sufficient sample size using data from Chen et al. (2015) [ΩrAⒶ], the comparisons here use a combined set of proteins, i.e., those identified to have the same direction of change in the two treatment conditions (380 and 480 mOsm NaCl) and a significant change in at least one of the conditions.
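The median-based assignment described above can be sketched as follows. This is only an illustration: the function name `split_by_median` and the protein identifiers and ratios are hypothetical, and the treatment of proteins whose ratio equals the median (here, assigned to neither group) is an assumption not stated in the text.

```python
from statistics import median

def split_by_median(ratios):
    """Split proteins into 'up' and 'down' groups relative to the median expression ratio.

    ratios: dict mapping protein ID -> reported expression ratio.
    Proteins exactly at the median are left out of both groups (assumption).
    """
    m = median(ratios.values())
    up = sorted(p for p, r in ratios.items() if r > m)
    down = sorted(p for p, r in ratios.items() if r < m)
    return up, down

# Hypothetical ratios, all less than 1 as in the situation described above
ratios = {"P1": 0.2, "P2": 0.5, "P3": 0.7, "P4": 0.9}
up, down = split_by_median(ratios)
print(up, down)
```

Splitting at the median keeps the two groups balanced in size even when nearly all reported ratios lie on one side of 1.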

Figure 2B shows that hyperosmotic stress strongly (CLES ≤40%) and/or significantly (p-value < 0.05) induces the formation of proteins with relatively low water demand per residue in 11 datasets [ΩaAⒶΩbAⒶ ΩdAⒶΩfAⒶΩiAⒶ ΩmAⒶΩsAⒶΩtAⒶΩuAⒶΩvAⒶΩzAⒶ]. Five of these datasets, including four for bacteria [ΩsAⒶΩtAⒶΩuAⒶΩvAⒶ] and one for human cells [ΩmAⒶ], also show an increase in ZC. These trends are found in both the transcriptomic [ΩsAⒶΩtAⒶ] and proteomic [ΩuAⒶ ΩvAⒶ] data from the study of Kocharunchitt et al. (2012).

Four datasets obtained for mammalian cells have low ΔZC with no significant [ΩrAⒶΩwAⒶΩxAⒶ] or a significantly negative mean difference of n̄H2O [ΩfAⒶ]. Six datasets [ΩhAⒶΩkAⒶΩnAⒶΩoAⒶΩpAⒶΩqAⒶ] from one study each of yeast and E. coli, and of Japanese eels adapted to seawater, have very small mean differences in ZC and a negative Δn̄H2O that follows the trends of most of the other datasets, but with lower significance (p-value > 0.05).

The comparisons here show that hyperosmotic stress consistently induces the formation of proteins with lower water demand per residue. In some, but not all, cases, this coincides with an increase in average oxidation state of carbon. Less often, and perhaps specific to mammalian cells, the proteomic composition is shifted toward lower oxidation state of carbon. There are only a couple of datasets, using NaCl treatment [ΩeAⒶΩjAⒶ], that show an increase in water demand per residue.

Notably, two datasets for adipose-derived stem cells oppose the general trends for hypoxic and hyperosmotic conditions (see Fig. 2A [ΩBAⒶ] and Fig. 2B [ΩeAⒶ]). This intriguing result shows that these stem cells respond to external stresses with proteomic transformations that are chemically similar to those in cancer (Fig. 1).

Potential diagrams

The correlations of compositional differences (negative ΔZC and Δn̄H2O) with hypoxia and hyperosmotic stress can be proposed as resulting from attraction of the proteomes to a context-specific low-energy state. Thermodynamic models can help to illuminate the possible microenvironmental constraints on the observed proteomic transformations. Here, the chemical affinities of stoichiometric formation reactions of proteins were calculated, grouped, and compared in order to estimate the thermodynamic potential for the overall process of proteomic transformation.

The chemical affinity quantifies the potential, or propensity, for a reaction to proceed. It is the infinitesimal change with respect to reaction progress of the negative of the Gibbs energy of the system. The chemical affinity is numerically equal to the “non-standard” or actual (Warn & Peters, 1996), “real” (Zhu & Anderson, 2002), or “overall” (Shock, 2009) negative Gibbs energy of reaction. These energies are not constant, but vary with the chemical potentials, or chemical activities, of species in the reaction. Chemical activity (a) and potential (μ) are related through μ = μ° + RT ln a, where the standard chemical potentials of particular species (μ° = G°, i.e., standard Gibbs energies) depend only on temperature and pressure.

The equilibrium constant (K) for a reaction is given by ΔG° = −2.303RT log K, where ΔG° is the standard Gibbs energy of the reaction, 2.303 stands for the natural logarithm of 10, R is the gas constant, T is temperature in Kelvin, and log denotes the decadic logarithm. The equation used for affinity (A) is A = 2.303RT log(K/Q), where Q is the activity quotient of the reaction (e.g., Helgeson, 1979, Eq. 11.27; Warn & Peters, 1996, Eq. 7.14; Shock, 2009). Accordingly, the per-residue affinity of Reaction (R1) can be written as

$$A = 2.303RT \left( \log K + \bar{n}_{\mathrm{Cys}} \log a_{\mathrm{Cys}} + \bar{n}_{\mathrm{Glu}} \log a_{\mathrm{Glu}} + \bar{n}_{\mathrm{Gln}} \log a_{\mathrm{Gln}} + \bar{n}_{\mathrm{H_2O}} \log a_{\mathrm{H_2O}} + \bar{n}_{\mathrm{O_2}} \log f_{\mathrm{O_2}} - \bar{z}_{\mathrm{H^+}}\,\mathrm{pH} - \log a_{\mathrm{residue}} \right) \tag{2}$$

where the abbreviations of the amino acids have been substituted for their formulas. Here, a and f stand for chemical activity and fugacity (e.g., aH2O is water activity, and fO2 is oxygen fugacity). The fugacity, rather than activity, of O2 is used because gaseous oxygen is the reference state most commonly used in previous thermodynamic models. If aO2 were used instead, its values would differ from fO2 according to the solubility of oxygen in water at the given temperature, but otherwise the two models would be thermodynamically equivalent. The overbar notation (n̄ and z̄) signifies that the coefficients in Reaction (R1) are each divided by the length (number of amino acids) of the protein sequence. Likewise, the elemental composition and standard Gibbs energy per residue are those of the ionized protein (with formula $\mathrm{C}_c\mathrm{H}_{h+z}\mathrm{N}_n\mathrm{O}_o\mathrm{S}_s^{z}$) divided by the length of the protein.
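As a rough numerical illustration of the per-residue affinity expression, the sketch below evaluates A = 2.303RT log(K/Q) for a set of basis-species coefficients. It assumes that H+ is written as a reactant with per-residue coefficient z̄, so that it contributes −z̄·pH. The function name, the value of log K, and all coefficients are hypothetical placeholders; the actual calculations in this study were made with CHNOSZ.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def affinity_per_residue(log_k, coeffs, log_acts, z_h, ph,
                         log_a_residue=0.0, temp_k=310.15):
    """Affinity (J per mol of residues): 2.303 RT (log K + sum(n_i log a_i) - z pH - log a_residue).

    coeffs:   per-residue reaction coefficients of the basis species (reactants positive)
    log_acts: log activity (or log fugacity for O2) of each basis species
    z_h:      per-residue coefficient of H+ (assumed to be a reactant)
    """
    basis_sum = sum(coeffs[s] * log_acts[s] for s in coeffs)
    log_k_over_q = log_k + basis_sum - z_h * ph - log_a_residue
    return math.log(10) * R * temp_k * log_k_over_q

# Hypothetical coefficients and log K; the amino acid activities follow the
# plasma-based values quoted in the text, and log fO2 = -67 is an arbitrary choice.
coeffs = {"Cys": 0.02, "Glu": 0.60, "Gln": 0.35, "H2O": -0.80, "O2": -0.50}
log_acts = {"Cys": -3.6, "Glu": -4.5, "Gln": -3.2, "H2O": 0.0, "O2": -67.0}
a_joules = affinity_per_residue(log_k=-30.0, coeffs=coeffs, log_acts=log_acts,
                                z_h=0.01, ph=7.0)
print(a_joules)
```

Because each basis species enters linearly, changing logaH2O by one unit shifts the affinity by n̄H2O × 2.303RT, which is how differences in water demand translate into different responses to water activity.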

The standard Gibbs energies of species at 37 °C and 1 bar were calculated with CHNOSZ (Dick, 2008) using equations and data taken from Wagman et al. (1982) and Kelley (1960) (O2(g)), Johnson, Oelkers & Helgeson (1992) and references therein (H2O), and using the Helgeson–Kirkham–Flowers equations of state (Helgeson, Kirkham & Flowers, 1981) with data taken from Amend & Helgeson (1997) and Dick, LaRowe & Helgeson (2006) (amino acids), and from Dick, LaRowe & Helgeson (2006) and LaRowe & Dick (2012) (amino acid group additivity for proteins).

In previous calculations, activities of the amino acid basis species and protein residues were set to 10^−4 and 10^0, respectively (Dick, 2016). As long as constant total activity of residues is assumed, the specific value does not greatly affect the outcome of the calculations; here it is kept at 10^0. Revised activities of the amino acid basis species, corresponding to mean concentrations in human plasma (Tcherkas & Denisenko, 2001), are used here: 10^−3.6 (cysteine), 10^−4.5 (glutamic acid) and 10^−3.2 (glutamine). Adopting these activities of basis species, instead of 10^−4, lowers the calculated equipotential lines for proteomic transformations by about 0.5 to 1 logaH2O units (see below). Accounting for protein ionization, with pH set to 7, also lowers the equipotential lines, by about 1 logaH2O unit compared to calculations for nonionized proteins.

It follows from Eq. (2) that varying the fugacity of O2 and activity of H2O alters the chemical affinity for formation of proteins by a specific amount depending on their chemical composition. For example, Figure 5A of Dick (2016) shows that decreasing logfO2 is relatively more favorable for the formation of up-expressed than down-expressed proteins in a particular cancer dataset (Knol et al., 2014; ΩwAⒶ in Table 1). This tendency is consistent with the lower ZC of these up-expressed proteins, which is unlike most other datasets for CRC (Fig. 1A).

How can the affinities of groups, rather than individual proteins, be compared? One method is based on differences in the ranks of chemical affinities of proteins between groups (Dick, 2016). Using this method, the affinities of all of the proteins in a dataset are ranked; the ranks are then summed for proteins in the up- and down-expressed groups (giving the rank sums r_up and r_down). Before taking the difference, the rank sums are multiplied by a weighting factor to account for the different numbers of proteins in the groups (n = n_up + n_down). This weighted rank difference (WRD) of affinity summarizes the estimates of the differential potential for formation:

$$\mathrm{WRD} = 2 \left( \frac{n_{\mathrm{down}}}{n}\, r_{\mathrm{up}} - \frac{n_{\mathrm{up}}}{n}\, r_{\mathrm{down}} \right) \tag{3}$$
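The ranking and weighting steps can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and tied affinities are given the average of their ranks, which is an assumption.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def weighted_rank_difference(aff_up, aff_down):
    """WRD = 2 * (n_down/n * r_up - n_up/n * r_down), with r_up and r_down the rank sums."""
    n_up, n_down = len(aff_up), len(aff_down)
    n = n_up + n_down
    ranks = average_ranks(list(aff_up) + list(aff_down))
    r_up = sum(ranks[:n_up])
    r_down = sum(ranks[n_up:])
    return 2 * (n_down / n * r_up - n_up / n * r_down)

# Hypothetical affinities: the up-expressed proteins all rank higher
print(weighted_rank_difference([3.0, 4.0], [1.0, 2.0]))
```

The weighting makes the WRD zero when the two groups have the same average rank, regardless of how many proteins each group contains, so the sign of the WRD indicates which group has the higher relative potential for formation.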

On a contour diagram of the WRD of affinity (referred to here as a “potential diagram”), the line of zero WRD represents a rank-wise equal affinity (or “equipotential line”) for formation of proteins in the two groups.

To characterize the general trends, diagrams were made for groups of proteomic datasets with similar compositional features. For pancreatic cancer, there are 11 datasets with ΔZC > 0.01 (i.e., to the right of the gray box in Fig. 1B) and for which the mean difference of n̄H2O is neither significant (low p-value) nor large (high CLES). Conversely, there are 8 datasets for pancreatic cancer with Δn̄H2O > 0.01 and for which the mean difference of ZC is neither large nor significant. Similarly, weighted rank-difference diagrams were constructed for 13 (ΔZC > 0.01) and 10 (Δn̄H2O > 0.01) datasets for CRC, 8 datasets for hypoxia (ΔZC < −0.01), and 12 datasets for hyperosmotic stress (Δn̄H2O < −0.01). The individual diagrams for each of these groups are presented in Fig. S3.

In order to observe the central tendencies among the various datasets, the potential diagrams for each group in Fig. S3 were combined by taking the arithmetic mean of the WRD at all grid points in logfO2–logaH2O space. The resulting diagrams (Fig. 3) have equipotential lines, shown in white, and zones of positive and negative WRD of affinity, i.e., greater relative potential for formation of up- and down-expressed groups of proteins, colored red and blue, respectively.
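The merging step can be sketched with plain nested lists. The grids, axis values, and function names below are hypothetical; the sketch only shows the pointwise arithmetic mean and a linear interpolation of where the merged WRD crosses zero along one row of the grid.

```python
def merge_grids(grids):
    """Pointwise arithmetic mean of several WRD grids that share the same shape."""
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    if any(len(g) != n_rows or any(len(row) != n_cols for row in g) for g in grids):
        raise ValueError("all grids must have the same shape")
    return [[sum(g[i][j] for g in grids) / len(grids) for j in range(n_cols)]
            for i in range(n_rows)]

def zero_crossing(xs, row):
    """First x at which a row of WRD values reaches zero (linear interpolation), or None."""
    for k in range(len(xs) - 1):
        a, b = row[k], row[k + 1]
        if a == 0:
            return xs[k]
        if a * b < 0:
            return xs[k] + (xs[k + 1] - xs[k]) * (-a) / (b - a)
    return xs[-1] if row[-1] == 0 else None

# Two hypothetical single-row WRD grids along a logfO2 axis
log_fo2 = [-70.0, -68.0, -66.0, -64.0]
grid_1 = [[-2.0, -1.0, 1.0, 2.0]]
grid_2 = [[-4.0, -1.0, 1.0, 4.0]]
merged = merge_grids([grid_1, grid_2])
print(merged, zero_crossing(log_fo2, merged[0]))
```

Averaging the WRD values before locating the zero line is not the same as averaging the individual zero lines, which is one reason the merged diagrams also report the median and quartiles of the individual equipotential lines.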

Figure 3: Merged potential diagrams for proteomic transformations.

Plots are shown for (A) 13 datasets for colorectal cancer and (B) 11 datasets for pancreatic cancer with ΔZC > 0.01, (C) eight datasets for hypoxia or 3D culture with ΔZC < −0.01, (D) 10 datasets for colorectal cancer and (E) eight datasets for pancreatic cancer with Δn̄H2O > 0.01, and (F) 12 datasets for hyperosmotic stress with Δn̄H2O < −0.01. Red and blue colors denote higher relative potential for formation of up- and down-expressed proteins, respectively. White lines are equipotential lines, where the mean weighted rank difference of affinity (WRD; Eq. (3)) of the included datasets is 0; black lines show the median and interquartile range of the WRD = 0 lines for individual datasets (Fig. S3). See text for details.

The solid black lines in Fig. 3 show the median position along the x- or y-axis for the equipotential lines in each group (Fig. S3), and the dashed black lines are positioned at the 1st and 3rd quartiles. The interquartile ranges for the cancer groups are smaller than those for hypoxia and, to a lesser degree, than those for hyperosmotic stress. The smaller range would be expected if the cancer datasets reflected a somewhat narrower set of conditions than the datasets for experiments with hypoxia; the latter represent a wide variety of organisms, cell types, and laboratory conditions (Table 3).


Calculations of the average oxidation state of carbon and water demand per residue, derived from elemental stoichiometry, provide information on the microenvironmental factors affecting differential protein expression in cancer and laboratory experiments. Hypoxia or hyperosmotic stress generally induces the expression of proteins with lower overall oxidation state of carbon or lower water demand per residue, respectively, compared to down-expressed proteins. In contrast, proteomes of CRC and pancreatic cancer are often characterized by greater water demand per residue or oxidation state of carbon. The formation of more highly oxidized proteins despite the hypoxic conditions of many tumors hints at a complex set of microenvironmental–cellular interactions in cancer.

Plots of data from experiments with hypoxia and hyperosmotic stress illuminate two dimensions of possible compositional attraction to a low-energy state (Fig. 2). A thermodynamic model quantifies the altered potential for proteomic transformation in response to changing oxygen fugacity and water activity. The equipotential lines for cancer proteomes with high differential water demand lie between logaH2O = −1 and −3, while the potential threshold for transformation of proteomes in hyperosmotic stress is closer to unit activity of water (logaH2O = 0 to −2) (Figs. 3D–3F). Although there is considerable variability among the individual datasets (Fig. S3), the merged diagrams demonstrate a physiologically realistic range for the activity of water. Water activity in cells is close to one, but restricted diffusion of H2O in “osmotically inactive” regions of cells (Model, 2014) could result in locally lower water activities. The present findings provide evidence that the molecular processes regulating proteomic transformations operate within the chemical constraints of subcellular regions of depleted water activity.

The finding of a frequently positive water demand for the transformation between normal and cancer proteomes offers a new perspective on the biochemistry of hydration in cancer. The thermodynamic calculations predict that, in contrast to hyperosmotic stress, proteomes of cancer tissues are stabilized by increasing water activity. A higher than normal water activity would be consistent with the greater hydration of tissue that is apparent in spectroscopic analysis of breast cancer tissue (e.g., Abramczyk et al., 2014). Speculatively, the relatively high water content needed for embryonic development (Moulton, 1923) could be recreated in cancer cells if they revert to an embryonic mode of growth (McIntyre, 2006).

The equipotentials for transformation of proteomes in cancer cluster near an oxygen fugacity of ca. 10^−68 to 10^−66. The oxygen fugacity should be interpreted not as an actual oxygen concentration, but rather as an internal scale of oxidation potential. Oxygen fugacity and water activity can be converted to the Eh scale for redox potential, giving values that are comparable to other biochemical measurements (Dick, 2016).
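One way to make this conversion concrete is the Nernst equation for the half-reaction O2(g) + 4 H+ + 4 e− = 2 H2O. The sketch below is an illustration under stated assumptions and is not the procedure used in the study: the function name is hypothetical, and the default standard potential of 1.229 V is the 25 °C value, so a temperature-appropriate value would have to be supplied for 37 °C.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def eh_volts(log_fo2, ph, log_ah2o=0.0, temp_k=298.15, e0=1.229):
    """Eh (V) for O2(g) + 4 H+ + 4 e- = 2 H2O from logfO2, pH, and logaH2O.

    e0 is the standard potential of the half-reaction; the default is the
    25 degC value and is an assumption if another temperature is used.
    """
    slope = math.log(10) * R * temp_k / (4 * F)
    return e0 + slope * (log_fo2 - 4 * ph - 2 * log_ah2o)

# logfO2 = -67 lies within the range quoted above; pH 7 and unit water activity
print(round(eh_volts(-67.0, 7.0), 3))
```

Under these assumptions, an oxygen fugacity near 10^−67 at neutral pH corresponds to a moderately negative Eh, on the order of a few tenths of a volt below zero.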

Although cancer proteomes are obtained from tissues that are likely derived from hypoxic tumor environments, their differential expression is most often in favor of oxidized proteins (Figs. 1A and 1B). What are some explanations for this finding? Perhaps the relatively high logfO2 threshold for chemical transformation of hypoxia-responsive proteins could support a buffering action that potentiates the formation of relatively oxidized proteins in cancer (compare the median and quartiles in Fig. 3C with those in Figs. 3A and 3B). This speculative hypothesis requires a division of the cellular proteome into localized, chemically interacting subsystems. Alternatively, the development of a high oxidation potential in cancer cells may be associated with a higher concentration of mitochondrially produced reactive oxygen species (ROS). Neither of these possibilities addresses the magnitude of the chemical differences in the proteomes, and the question remains: where do the electrons go?

A plausible hypothesis comes from considering the different oxidation states of biomolecules. Fatty acids are reduced compared to amino acids, nucleotides, and saccharides (Amend et al., 2013). In parallel with the formation of more reduced proteins, hypoxia induces the accumulation of lipids in cell culture (Gordon, Barcza & Bush, 1977). Cancer cells are also known for increased lipid synthesis. Lipid droplets, which are derived from the endoplasmic reticulum (ER), form in great quantities in cancer cells (Koizume & Miyagi, 2016). Assuming that lipids are synthesized from relatively oxidized metabolic precursors, their formation requires a source of electrons. These considerations lead to the hypothesis that increased lipid synthesis is coupled to the oxidation of the proteome.
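The difference in oxidation state between classes of biomolecules can be checked from their chemical formulas. The sketch below uses the usual stoichiometric expression for the average oxidation state of carbon, ZC = (z − h + 3n + 2o + 2s)/c for a species with formula CcHhNnOoSs and charge z; the function name and the choice of example molecules are illustrative only.

```python
def zc(c, h, n=0, o=0, s=0, z=0):
    """Average oxidation state of carbon for a species C_c H_h N_n O_o S_s with charge z."""
    return (z - h + 3 * n + 2 * o + 2 * s) / c

examples = {
    "palmitic acid (C16H32O2)": zc(16, 32, o=2),
    "glucose (C6H12O6)": zc(6, 12, o=6),
    "glutamic acid (C5H9NO4)": zc(5, 9, n=1, o=4),
}
for name, value in examples.items():
    print(name, round(value, 3))
```

The fatty acid has a much lower ZC than the sugar or the amino acid, which is the sense in which lipid synthesis from more oxidized precursors consumes electrons.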

Calculations that combine proteomic and cellular data can be used to quantify a hypothetical redox balance between cellular lipids and proteins. The major assumptions in the calculations here are that the overall cellular oxidation state of carbon is the same in cancer and hypoxia, and that changes in this cellular oxidation state are brought about by altering only the numbers of lipid and protein molecules. The overall chemical composition of the lipids is assumed to be constant, but the proteins are assigned different values of ZC. These simplifying assumptions are meant to pose quantifiable “what if” questions, to serve as points of reference about the range of molecular composition of cells (Milo & Phillips, 2015).

The worked-out calculation is shown in Fig. 4. The lipid:protein ratio in hypoxia is taken from Gordon, Barcza & Bush (1977), and ballpark values for the differences in ZC of proteins in hypoxia and cancer are from the present study. Notably, the lipid:protein weight ratio in hypoxia (0.19) is higher than in normal cells (i.e., 0.15 using data from Gordon, Barcza & Bush, 1977 or 0.16 using data compiled by Milo & Phillips, 2015 for E. coli). The calculation indicates that an increase of the lipid:protein weight ratio in cancer cells by ca. 20% over that in hypoxic normal cells could provide an electron sink that is large enough to take up the electrons released by oxidation of the proteome in hypoxic normal cells to generate that in hypoxic cancer cells. That proteomic transformation is quantified here by an increase of ΔZC from ca. −0.03 to 0.03, both relative to non-hypoxic normal cells (Fig. 4).
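The logic of this balance can be sketched independently of Fig. 4. The sketch assumes that the cellular ZC is the carbon-weighted mean of the lipid and protein pools, and it uses placeholder values that are not taken from the figure: a baseline protein ZC of −0.12 and carbon mass fractions of 0.77 (lipid) and 0.53 (protein). Only the lipid formula (C55H102O6 for 1-palmitoyl-2,3-dioleoyl-glycerol), the hypoxic lipid:protein weight ratio of 0.19, and the shifts of ∓0.03 in protein ZC follow the text.

```python
def cell_zc(x, zc_lipid, zc_protein):
    """Carbon-weighted mean ZC of lipid + protein; x = (lipid carbon) / (protein carbon)."""
    return (x * zc_lipid + zc_protein) / (1 + x)

def lp_ratio_for_same_zc(lp_hypoxia, zc_lipid, zc_prot_hypoxia, zc_prot_cancer,
                         fc_lipid=0.77, fc_protein=0.53):
    """Lipid:protein weight ratio in cancer that keeps the cellular ZC equal to that in hypoxia.

    fc_lipid and fc_protein are assumed carbon mass fractions (placeholders).
    """
    k = fc_lipid / fc_protein               # converts weight ratio to carbon ratio
    target = cell_zc(lp_hypoxia * k, zc_lipid, zc_prot_hypoxia)
    # Solve (x * zc_lipid + zc_prot_cancer) / (1 + x) = target for x
    x_cancer = (zc_prot_cancer - target) / (target - zc_lipid)
    return x_cancer / k

zc_lipid = (-102 + 2 * 6) / 55              # ZC of C55H102O6
zc_normal = -0.12                           # hypothetical baseline protein ZC
lp_cancer = lp_ratio_for_same_zc(0.19, zc_lipid, zc_normal - 0.03, zc_normal + 0.03)
percent_increase = 100 * (lp_cancer / 0.19 - 1)
print(round(lp_cancer, 3), round(percent_increase, 1))
```

With these placeholder inputs the required increase in the lipid:protein ratio is of the same order as the ca. 20% quoted in the text; the carbon mass fractions cancel out of the percentage, which depends mainly on the assumed ZC values.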

Figure 4: A computer-aided “back of the envelope” calculation to estimate the lipid to protein ratio (L:P) in cancer cells and the percent difference from normal cells in hypoxic conditions.

Bold text indicates function definitions (R code) or numerical results (comments/results (rounded)). Numerical values are taken from [1] the chemical formula of 1-palmitoyl-2,3-dioleoyl-glycerol, given as an example of a triacylglycerol (triglyceride) in the chapter on lipid metabolism in Voet, Voet & Pratt (2013), [2] the average chemical formula of proteins in the UniProt human proteome, for which amino acid compositions are stored in human_base.Rdata in the canprot package, [3] this study, and [4] Table 2 of Gordon, Barcza & Bush (1977) (mouse cells grown in hypoxic conditions).

As found by Raman spectroscopy, levels of both lipids and proteins are elevated in colorectal cancer (Stone et al., 2004). Lipid droplets are formed extensively in CRC stem cells (Tirinato et al., 2015), suggestive of a higher lipid:protein ratio than in either cancer or normal epithelial cells. In contrast to CRC, lipids are decreased in breast cancer compared to normal breast tissue (Frank, McCreery & Redd, 1995; Stone et al., 2004). Given a lower lipid content, and therefore smaller electron sink, one might expect that proteomes in breast cancer are oxidized to a lesser extent than those in CRC and pancreatic cancer. Other factors that affect the systemic redox balance, such as a more reduced gut microbiome in CRC (Dick, 2016) and metabolic coupling between epithelial and stromal cells, may be important for an accurate account of the compositional relationships among biomacromolecules.

These compositional and thermodynamic analyses support the notion that changes in bulk chemical composition of cells and the microenvironment have a significant role in shaping the differential expression of proteins. The analysis done here is primarily concerned with top-down causal factors (physical constraints on protein synthesis and degradation), but does not preclude a major role for bottom-up factors (e.g., regulation of gene expression). Speculatively, further applications of these methods could be used to predict the ability of chemotherapy or other treatments to reduce or reverse the potential for formation of the proteins required by cancer cells. Based on the current findings, a decreased proteomic oxidation and/or hydration state may emerge as one aspect of beneficial treatments.

This approach to the data differs from conventional interpretations of proteomic data that are based on the functions of proteins. Nevertheless, the scope of explanations dealing with functions and molecular interactions offers limited insight on the high-level organization of proteomes in a cellular and microenvironmental context. Although a variety of bioinformatics tools are available for functional interpretations (Laukens, Naulaerts & Berghe, 2015), none so far addresses the overall chemical requirements of proteomic transformations. The compositional and thermodynamic descriptions presented here encourage a fresh look at the question, “What is cancer made of?”


Although many hypoxia experiments induce the formation of proteins with lower oxidation state of carbon (ZC), the up-expressed proteins in colorectal and pancreatic cancer are often relatively oxidized compared to the down-expressed ones. Hyperosmotic stress in the laboratory leads to the formation of proteins with relatively low water demand per residue (n̄H2O), but cancer proteomes often show the opposite trend, with up-expressed proteins having higher average n̄H2O than down-expressed ones.

The global proteomic differences can be described as compositional changes in terms of chemical basis species and quantified in a thermodynamic framework. A positive thermodynamic potential for each proteomic transformation is predicted in a specific range of oxidation and hydration potential. However, the distribution of biomolecules other than proteins should also be considered to account for changes in cellular redox balance. An electron sink associated with a ca. 20% greater lipid to protein ratio in cancer compared to normal hypoxic cells would be sufficient to balance the electrons released by the formation of more oxidized proteins in CRC and pancreatic cancer. It thus appears possible that a redox disproportionation develops in some cancers, leading to pools of both more reduced and more oxidized macromolecules compared to normal conditions.

Supplemental Information

R source package including protein expression and amino acid composition data (canprot_0.0.5.tar.gz)

DOI: 10.7717/peerj.3421/supp-1

Project code file, to be used with R, the canprot package (this study), and CHNOSZ version 1.1.0

DOI: 10.7717/peerj.3421/supp-2

Compositional summaries: mean values of ZC and nH2O and corresponding mean differences, p-values, and common-language effect sizes (CLES)

DOI: 10.7717/peerj.3421/supp-3

Comparison of basis species: scatterplots of nO2 vs ZC and nH2O vs ZC for proteins in the UniProt human proteome

DOI: 10.7717/peerj.3421/supp-4

Average compositions of down- and up-expressed proteins in each dataset, plotted as point symbols and arrowheads, on nH2O–ZC diagrams

DOI: 10.7717/peerj.3421/supp-5

Potential diagrams for each dataset. These diagrams were merged to make the diagrams in Fig. 3

DOI: 10.7717/peerj.3421/supp-6