Hepatocellular carcinoma (HCC), which is the second most dominant cause of cancer deaths throughout the word, is the most ordinary form of primary carcinoma of the liver (Llovet et al., 2016). It is estimated that approximately 841,000 new cases are expected to occur worldwide and more than 780,000 patients would die of HCC in 2018 (Bray et al., 2018). Kinase and immune checkpoint inhibitors have been shown to be effective options for the treatment of advanced-stage HCC, but they have limited effectiveness (Yang et al., 2019b). Despite the new progress in drug development, the clinical outcomes in patients with advanced HCC remains poor. Based on the Surveillance, Epidemiology, and End Results database, the 5-year survival rate was 30.5% for patients with local HCC, and less than 5% for those with distant metastasis (Oweira et al., 2017). Due to the poor outcomes, it is necessary to investigate novel effective markers for the prognosis of HCC.
Recent developments in high-throughput sequencing, technologies and bioinformatics have drastically changed research on genomic in disease, and many marker changes related to prognosis and survival have been revealed through mining databases (Liu et al., 2018). Several biomarkers have been shown to predict the prognosis of the patients with HCC. For example, serum lncRNA urothelial carcinoma-associated 1 (UCA1) was an independent harmful prognostic indicator for HCC (Zheng et al., 2018). Collagen triple helix repeat containing 1 (CTHRC1) may serve as a prognostic biomarker for HCC (Zhou et al., 2019). Serum acetylcarnitine is a meaningful biomarker reflecting HCC diagnosis and progression (Lu et al., 2016). Notably, a six glycolysis-related gene signature was found to predict survival in patients with HCC (Jiang et al., 2019). However, in view of the intrinsic biological heterogeneity of tumors and batch effects caused by different sequencing platforms, previous prognostic gene signature had to standardize gene expression profiles, which was very difficult for data processing. Moreover, compared with a single gene marker, multigene prognostic signatures are better alternatives for predicting prognosis and survival (Chen et al., 2018). Thus, a novel method which omits data standardization and scaling based on the relative ranking of gene expression levels has been used in this study. Many reliable results have been obtained in various studies by using this method (Heinaniemi et al., 2013; Li et al., 2017; Popovici et al., 2012).
Glycolysis, one of the most ancient metabolic processes, is a low-energy-providing pathway. The metabolic properties of cancer cells differed from those of normal cells (Annibaldi & Widmann, 2010). Cancer cells had rapid metabolic features which increased uptake of glucose and glycolysis (Akram, 2013). This allowed cancer cells to preferentially metabolize glucose through aerobic glycolysis, offering them with a progression advantage (Hua et al., 2018). Some studies indicated that aerobic glycolysis phenotype was associated with poor prognosis of HCC (Cui et al., 2018; Guo et al., 2015; Hua et al., 2018; Lin et al., 2018; Xu et al., 2017). However, there were no systematic studies to establish glycolysis-related gene pairs (GRGPs) signatures to predict the survival of patients with HCC. Therefore, it was necessary to establish multigene prognostic signatures for HCC using glycolysis-related genes pairs.
The purpose of this study was to construct GRGPs signatures to predict the prognosis in patients with HCC.
Materials and Methods
The expression profiles and clinicopathological data of HCC and normal tissues were obtained from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) and International Cancer Genome Consortium Japan (ICGC, https://dcc.icgc.org/releases/current/Projects/LIRI-JP). The clinical information inclusion criteria were set as follows: (1) patients had completely detailed clinical information; (2) The follow-up time of samples exceeded 30 days. GSE10140, GSE10141, GSE10143, GSE15654, GSE14520, GSE76427, GSE45114 expression profile was derived from the Gene Expression Omnibus (GEO) database, including 1,469 samples. The TCGA was used as training dataset. Other databases were used as validation datasets.
Gene set enrichment analysis
To explore whether the specific gene sets were significant different between tumor group and normal group, we performed gene sets enrichment analysis (GSEA) (http://www.broadinstitute.org/gsea/index.jsp). The mRNAs expression levels between tumor and non-tumor groups were analyzed to confirm which function could be available for further study. Normalized P-value ≤ 0.05 were considered to be statistically significant.
Construction and evaluation of glycolysis‑related gene pairs signature
First of all, the glycolysis‑related mRNAs level in the same sample was pairwise compared to generate a score for each glycolysis‑related gene pair (GRGP). If the expression level of gene 1 was greater than gene 2, the GRGP score was 1, otherwise it was 0 (Li et al., 2017). GRGPs with high variation in TCGA dataset (median absolute deviation >0.05) were included in subsequent model construction. The prognostic value of GRGPs was identified by univariate Cox regression. Then, GRGPs with P ≤ 0.05 in univariate analysis were incorporated into Lasso regression model in order to establish a GRGPs signature. A risk score was established according to the following formula: risk score = . To classify patients into low-risk group and high-risk group, the optimal cut off of the GRGPs signature was defined using time-dependent receiver operating characteristic (ROC) curve analysis at 1 year in the TCGA dataset for overall survival (OS). The OS between two groups was compared utilizing Kaplan–Meier and Log-rank test. Risk score and other clinicopathological characteristics were included in the model so as to confirm whether risk score was an independent factor to predict the progress of the patients. Further, the clinical value of the GRGPs signature was verified by comparing the risk scores of patients with different ages, gender, grade and stage.
The OS of the patients with HCC at 1 year, 3 years and 5 years were predicted using a nomogram. Index of concordance (C-index) and Calibration curves were applied to explore the accuracy of the model established from TCGA dataset.
Validation of the GRGPs signature
Seven GEO databases (GSE10140, GSE10141, GSE10143, GSE15654, GSE14520, GSE76427, and GSE45114) and ICGC database were enrolled for subsequent verification. According to human race, all the databases were divided into European and American dataset and Asian dataset. European and American dataset (765 samples) included GSE10140, GSE10141, GSE10143, and GSE15654 database; Asian dataset (947 samples) included GSE14520, GSE76427, GSE45114 and ICGC database. Due to the lack of detailed clinical information (such as age, gender, stage) in European and American dataset, thus it was only used for survival analysis. Asian dataset was used for subsequent survival analysis and independent prognostic analysis. The GRGPs signature was calculated, and then Kaplan–Meier, Log-rank test and Cox regression were used to verify whether the GRGPs signature was significantly associated with OS. The ROC curve, C-index and Calibration curves were constructed to examine whether the model established by TCGA dataset could effectively predict survival in patients with HCC.
Cox regression was utilized to evaluate the associations between the expression levels of the glycolytic-related mRNAs and OS. Moreover, univariate and multivariate Cox regression were applied to determine the prognostic values of the GRGPs signature and various clinicopathological characteristics. The prediction accuracy of the risk score for 1-year, 3-years and 5-years survival was evaluated using ROC curve analysis. Statistical tests were two-sided, and P values below 0.05 were considered to be statistically significant.
Preliminary selection of genes using GSEA
According to inclusion criteria, 349 patients with HCC were obtained. Expression signatures of marker gene sets were obtained by condensing multiple gene sets from the Molecular Signature Database (MSigDB). Each expression signature involved 50 specific gene sets that stand for clearly defined biological statuses and processes (Zhang, Zhang & Yu, 2019). GSEA was applied to investigate whether the identified gene sets revealed significant differences between the tumor and normal groups. Forty-four gene sets were upregulated in HCC, and 20 gene sets were highly enriched, with normalized P < 0.05 among the 50 gene sets (Fig. 1; Table S1). As can be seen in the Table S1, the biological process of glycolysis (P < 0.05) contained 199 genes, which was the second largest in size.
Identification of prognostic GRGPs
Based on 199 GRGs, 19,701 GRGPs were established. After removing GRGPs with comparatively low variation (MAD > 0.05), 1,102 GRGPs were reserved and selected as initial candidate GRGPs. Cox regression was utilized to select prognostic GRGPs. According to the results of univariate Cox, 35 GRGPs had prognostic values for the patients with HCC (P < 0.05, Table S2). Subsequently, lasso regression model revealed that 17 GRGPs as prognostic factors (Table 1; Figs. 2A–2B). As shown in Table 1, five GRGPs (HMMR_PFKFB1, CHST1_GYS2, MERTK_GYS2, GPC1_GYS2 and LDHA_GOT2) were found to be harmful prognostic factors and twelve GRGPs (IDUA_GNPDA1, IDUA_ME2, IDUA_G6PD, IDUA_GPC1, MPI_GPC1, SDC2_LDHA, PRPS1_PLOD2, GALK1_IER3, MET_PLOD2, GUSB_IGFBP3, IL13RA1_IGFBP3 and CYB5A_IGFBP3) were verified to be beneficial prognostic factors.
|Gene1||Encoding protein||Function||Gene2||Encoding protein||Function||Coefficient|
|IDUA||hydrolyzes the terminal alpha-L-iduronic acid residues of two glycosaminoglycans, dermatan sulfate and heparan sulfate||Chondroitin sulfate/dermatan sulfate metabolism and Glycosaminoglycan metabolism||GNPDA1||An allosteric enzyme||The reversible conversion of D-glucosamine-6-phosphate into D-fructose-6-phosphate and ammonium||−0.294|
|IDUA||hydrolyzes the terminal alpha-L-iduronic acid residues of two glycosaminoglycans, dermatan sulfate and heparan sulfate||Chondroitin sulfate/dermatan sulfate metabolism and Glycosaminoglycan metabolism||ME2||A mitochondrial NAD-dependent malic enzyme||Catalyzes the oxidative decarboxylation of malate to pyruvate||−0.145|
|IDUA||hydrolyzes the terminal alpha-L-iduronic acid residues of two glycosaminoglycans, dermatan sulfate and heparan sulfate||Chondroitin sulfate/dermatan sulfate metabolism and Glycosaminoglycan metabolism||G6PD||A cytosolic enzyme encoded by a housekeeping X-linked gene||Produce NADPH||−0.160|
|IDUA||Chondroitin sulfate/dermatan sulfate metabolism and Glycosaminoglycan metabolism||GPC1||Disease related genes belongs to the glypican family||Play a role in the control of cell division and growth regulation||−0.345|
|HMMR||Involved in cell motility||Regulation of PLK1 activity at G2/M transition and metabolism||PFKFB1||A member of the family of bifunctional 6-phosphofructo-2-kinase||An activator of the glycolysis pathway and an inhibitor of the gluconeogenesis pathway/participate in hepatocellular carcinoma tumorigenesis||0.049|
|MPI||Phosphomannose isomerase catalyzes the interconversion of fructose-6-phosphate and mannose-6-phosphate||Metabolism of proteins and amino sugar and nucleotide sugar metabolism||GPC1||Disease related genes belongs to the glypican family||Play a role in the control of cell division and growth regulation||−0.343|
|SDC2||A transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family||Microglia activation during neuroinflammation: overview and cell surface interactions at the vascular wall||LDHA||Cancer-related protein belongs to the LDH/MDH superfamily||Catalyzes the conversion of L-lactate and NAD to pyruvate and NADH in the final step of anaerobic glycolysis||−0.291|
|PRPS1||Catalyzes the phosphoribosylation of ribose 5-phosphate to 5-phosphoribosyl-1-pyrophosphate||Thiopurine pathway, pharmacokinetics/pharmacodynamics and carbon metabolism||PLOD2||A membrane-bound homodimeric enzyme||Participate in collagen chain trimerization and degradation of the extracellular matrix||−0.085|
|GALK1||Galactokinase is a major enzyme for the metabolism of galactose||Galactokinase is a major enzyme for the metabolism of galactose||IER3||A predicted intracellular protein belongs to the IER3 family||Protect cells from Fas- or tumor necrosis factor type alpha-induced apoptosis||−0.061|
|CHST1||A member of the keratin sulfotransferase family of proteins. The encoded enzyme catalyzes the sulfation of the proteoglycan keratin||Among its related pathways are Keratan sulfate/keratin metabolism and metabolism||GYS2||Liver glycogen synthase||Participate in galactose metabolism and glycogen metabolism||0.019|
|MET||A member of the receptor tyrosine kinase family of proteins and the product of the proto-oncogene MET||Hepatocyte growth factor, induces dimerization and activation of the receptor||PLOD2||A membrane-bound homodimeric enzyme||Participate in collagen chain trimerization and degradation of the extracellular matrix||−0.265|
|MERTK||A member of the MER/AXL/TYRO3 receptor kinase family||Regulate cell survival, migration, differentiation, and phagocytosis of apoptotic cells||GYS2||Liver glycogen synthase||Participate in galactose metabolism and glycogen metabolism||0.334|
|GPC1||Disease related genes belongs to the glypican family||Play a role in the control of cell division and growth regulation||GYS2||Liver glycogen synthase||Participate in galactose metabolism and glycogen metabolism||0.198|
|IL13RA1||A subunit of the interleukin 13 receptor||Bind tyrosine kinase TYK2 and mediate the signaling processes||IGFBP3||Encodes a protein with an IGFBP domain and a thyroglobulin type-I domain||Prolonging the half-life of insulin-like growth factor (IGF) and altering their interaction with cell surface receptors||−0.308|
|LDHA||Cancer-related protein belongs to the LDH/MDH superfamily||Catalyzes the conversion of L-lactate and NAD to pyruvate and NADH in the final step of anaerobic glycolysis||GOT2||A pyridoxal phosphate-dependent enzyme||Play a role in amino acid metabolism and the urea and tricarboxylic acid cycles.||0.171|
|CYB5A||A membrane-bound cytochrome||Reduces ferric hemoglobin (methemoglobin) to ferrous hemoglobin||IGFBP3||Encodes a protein with an IGFBP domain and a thyroglobulin type-I domain||Prolonging the half-life of insulin-like growth factor (IGF) and altering their interaction with cell surface receptors||−0.116|
PLK1, Polo-like kinase 1; MET, Mesenchymal Epithelial Transition; MER/AXL/TYRO3 receptor, TAM receptors; TYK2, Tyrosine Kinase 2; LDH/MDH, lactate and Malate dehydrogenases; NAD, Nicotinamide adenine dinucleotide; NADH, Nicotinamide adenine dinucleotide; IGFBP, insulin-like growth factor-binding protein; IGF, insulin-like growth factor; IER3, Immediate Early Response 3; NADPH, nicotinamide adenine dinucleotide phosphate.
Construction of GRGPs signature
These seventeen GRGPs were utilized to establish a GRGPs signature. Risk score of GRGPs signature for each patient was calculated utilizing the following formula (formula 1): risk score = (−0.30794 * IDUA_GNPDA1) − (0.15299 * IDUA_ME2) − (0.16389 * IDUA_G6PD) − (0.35599 * IDUA_GPC1) + (0.04846 * HMMR_PFKFB1) − (0.35632 * MPI_GPC1) − (0.29752 * SDC2_LDHA) − (0.09077 * PRPS1_PLOD2) − (0.06137 * GALK1_IER3) + (0.02511 * CHST1_GYS2) − (0.26287 * MET_PLOD2) − (0.00305 * GUSB_IGFBP3) + (0.34302 * MERTK_GYS2) + (0.20608 * GPC1_GYS2) − (0.31484 * IL13RA1_IGFBP3) + (0.17629 * LDHA_GOT2) − (0.10962 * CYB5A_IGFBP3). The cutoff point of risk score was set at −0.698 utilizing ROC curve analysis, which classified the patients into high-risk group or low-risk group (Fig. 2C). Risk score was significantly associated with OS of the patients with HCC, in which OS in low-risk group was longer than that in high-risk group (P = 2.898e−14, Fig. 3A). The survival time of patients with HCC decreased with risk score increasing (Fig. 4).
Association between risk score and clinicopathologic factors
Risk score increased with age, stage, and survival status, demonstrating that the GRGPs signature might be relevant to the progression of HCC. Risk score of patients with advanced-stage and advanced-age were significantly higher than those with early-stage and early-age (Figs. S1A and S1B). Risk scores of dead patients were higher than those of living patients (Fig. S1C). The heat map illustrated that the high expression of these seventeen GRGPs were significantly related to female, lower survival status of patients, higher stage and higher grade (Fig. S1D).
The GRGPs signature was an independent prognostic factor
In TCGA dataset, univariate Cox regression revealed that GRGPs signature was associated with OS and its HR was 3.508 (95% CI [2.608–4.720], P < 0.001, Table S3; Fig. 5A). After controlling clinical features such as gender, age, tumor stage and grade, GRGPs signature remained to be an independent prognostic indicator (HR = 3.204, 95% CI [2.293–4.476], P < 0.001, Table 2; Fig. 5B). GRGPs signature and TNM stage were independent prognostic factors based on the TCGA. Thus, these factors were included in nomogram. GRGPs signature was the largest contributor to 1-year, 3-year and 5-year OS (Fig. 6A). The C-index of the nomogram was 0.764 (95% CI [0.715–0.813]). The areas under the ROC curve (AUC) corresponding to the survival at 1 year, 3 years and 5 years were 0.803, 0.777 and 0.774, respectively (P < 0.05). The C-index, ROC curve and Calibration explained that the GRGPs signature had better accuracy (Figs. 6B–6D).
Validation of the GRGPs signature
In validation datasets, the risk score of GRGPs signature was calculated according to formula 1. The risk score was also significantly correlated with OS of patients with HCC (European and American dataset: P = 1.143e−02, Fig. 3B; Asian dataset: P = 6.342e−08, Fig. 3C). Univariate independent prognostic analysis indicated that GRGPs signature were independent prognostic factors in Asian dataset (HR of risk score = 2.661, 95% CI [1.862–3.803], P < 0.001, Table S4; Fig. 5C). After controlling stage and gender, GRGPs signature remained an independent prognostic indicator in multivariate analysis (HR = 2.567, 95% CI [1.714–3.844], P < 0.001, Table S5; Fig. 5D). According to results of independent prognostic analysis, GRGPs signature, stage and gender were included in nomogram based on Asian dataset. GRGPs signature and age were the largest contributor to 1-year, 3-year and 5-year OS in Asian dataset nomogram (Fig. 6E). The C-index of nomogram based on Asian dataset was 0.705 (95% CI [0.660–0.750]). The areas under the ROC curve at 1 year, 3 years and 5 years in Asian dataset were 0.694, 0.664, 0.536, respectively (Fig. 6H). The 3-year and 5-year calibration curves also proved that the GRGPs signature had great accuracy and robustness (Figs. 6F and 6G).
HCC is one of the most universal malignant liver tumors worldwide. Long-term prognosis for HCC remains mighty poor, with metastasis being the major cause of mortality (Uchino et al., 2011). Most tumor cells support synthetic growth and evade apoptosis through aerobic glycolysis. (Iansante et al., 2015; Warburg, 1956). Glycolytic transcriptional factors and glycolysis-related proteins in cancers are significantly correlated with poor prognosis, indicating that glycolytic status may be potentially valuable prognostic biomarkers for various cancers (Yu et al., 2019). Thus, it was valuable and urgent to establish a GRGPs signature in term of predicting the prognosis of patients with HCC.
In consideration of the intrinsic biological heterogeneity of tumors and batch effects caused by different sequencing platforms, previous gene signature needed to standardize or scale gene expression profiles, which resulted in the inability to process large amounts of data quickly. The construction method of prognostic model using gene pairs in this study can overcome these shortcomings greatly, which has been shown to have high stability and accuracy in some studies (Heinaniemi et al., 2013; Li et al., 2017; Popovici et al., 2012).
In this study, a GRGPs signature was constructed to predict the prognosis of patients with HCC, whose accuracy was better than gene signature previously constructed (Jiang et al., 2019). The HCC patients from TCGA dataset could be classified into low-risk group and high-risk group using the optimal cutoff point determined by ROC curve (P = 2.898e−14). The high-risk patients with HCC had shorter OS than the low-risk patients. Consistent with TCGA dataset, The OS in low-risk patients was longer than that in high-risk patients in two validation datasets. The AUC in TCGA and validation datasets were integrally greater than 0.6 which revealed that the GRGPs signature had certain accuracy in predicting survival. Both univariate and multivariate analysis indicated that the GRGPs signature could be set as an independent prognostic factor to predict the prognosis of patients with HCC in TCGA dataset and validation dataset. Furthermore, nomograms were established based on the GRGPs signature and other clinical characteristics, which might serve as potential predictive tools for patients with HCC.
Seventeen GRGPs (HMMR_PFKFB1, CHST1_GYS2, MERTK_GYS2, GPC1_GYS2, LDHA_GOT2, IDUA_GNPDA1, IDUA_ME2, IDUA_G6PD, IDUA_GPC1, MPI_GPC1, SDC2_LDHA, PRPS1_PLOD2, GALK1_IER3, MET_PLOD2, GUSB_IGFBP3, IL13RA1_IGFBP3 and CYB5A_IGFBP3) were associated to OS of patients with HCC, which might be molecular markers of prognosis and potential therapeutic targets.
The prognostic signature consists of 17 GRGPs including 23 unique GRGs. Most of the GRGs involved in this signature are metabolism and tumor related genes that play an important role in patient prognosis and tumor metabolism. GYS is the rate-limiting enzyme for glycogen synthesis, which consists of two isoforms: GYS1 and GYS2 (Roach et al., 2012; Szymanska et al., 2015). Previous studies revealed that loss of GYS2 leaded to glycogen storage disease type 0 (GSD-0) with the symptom of weakened glucose tolerance (Orho et al., 1998; Szymanska et al., 2015). A recent study indicated that decreased expression levels of GYS2 reduced glycogen and significantly correlated with metastasis and poor prognosis of the patients with HCC, GYS2 restricted HBV-Related HCC growth through negative feedback loop with p53 (Chen et al., 2019). Savci-Heijink et al. (2019) demonstrated that the expression level of IDUA was down-regulated in patients with breast cancer and IDUA could be used as one of the potential targets for distinguishing whether breast cancer patients will undergo visceral metastasis (Savci-Heijink et al., 2019). A common growth factor co-receptor, Glypican-1 (GPC1), is abnormally rich in pancreatic cancer and that GPC1 deficiency inhibits tumor growth, angiogenesis and metastasis (Aikawa et al., 2008). Enhanced expression level of GPC1 is associated with BMP and activin receptors in pancreatic cancer, and the low expression of GPC1 could suppress pancreatic cancer cell growth (Kayed et al., 2006). It was found that increased expression level of GPC1 was significantly relevant with poor prognosis of the patients with pancreatic ductal adenocarcinoma (Lu et al., 2017; Zhou et al., 2018). The overexpression of GPC1 was correlated with poor prognosis of the patients with esophageal squamous cell carcinoma, and GPC1 is a key molecule that alters the threshold of chemoresistance to chemo-sensitivity against cis-Diammineplatinum (II) dichloride (CDDP) (Hara et al., 2016). However, there were no studies to report GPC1 prognostic role in HCC. Based on the results of this study, it was revealed to be potential molecular biomarkers and therapeutic targets for the HCC patients. Thus, more researches were necessary to figure out how GPC1 affects the prognosis of HCC exactly. In this study, HMMR was considered as potential molecular target for the treatment of HCC. HMMR has not been directly reported as therapeutic target for HCC. The Y-linked proto-oncogene could promote the expression of HMMR, which was correlated with poor prognosis in the patients with HCC (Kido & Lau, 2019). Over-expression of HMMR was verified as indicators of poor prognosis and metastasis in lung cancer (Liu et al., 2019; Zhang, Zhang & Yu, 2019). HMMR was confirmed to be a potential independent indicator of predicting survival in patients with papillary muscle-invasive bladder cancer (Wang et al., 2019; Yang et al., 2019a).
Several limitations exist in the current study. First, the study was a retrospective study, although we tried to incorporate as many datasets as possible to verify this signature more rigorously. More prospective studies was demanded to prove the prognostic functions of glycolysis-related signals. Second, Gene expression signatures are susceptible to sampling deviation caused by intratumor heterogeneity. Although we removed low variation GRGPs so as to reduce cross-study batch effects, some may still reserve genetic heterogeneity. Third, the functional experiments should be conducted to further indicate the potential molecular mechanisms for predicting effects of GRGPs.
Our study systematically demonstrated the expression of glycolysis-related mRNAs in HCC, verifying the prognostic value of these mRNAs. The GRGPs signature could predict survival in patients with HCC. Therefore, the seventeen GRGPs and their signature may be molecular biomarkers and therapeutic targets for the patients with HCC, which conduces to explore new treatments for HCC.
Clinicopathological significance of the prognostic factors of HCC in TCGA and Asian datasets.
Statistical P values were significant at (A) age, (B) survival status, (C) TNM stage, (D) The heat map revealed the distribution of clinicopathological features and seventeen GRGPs expression contrasted between the low-risk and high-risk groups in TCGA dataset.