Prediction and analysis of novel key genes ITGAX, LAPTM5, SERPINE1 in clear cell renal cell carcinoma through bioinformatics analysis

Background Clear cell renal cell carcinoma (CCRCC) is the most aggressive subtype of renal cell carcinoma (RCC), with high metastasis and recurrence rates. This study aimed to identify new potential key genes in CCRCC. Methods Four gene expression profiles (GSE12606, GSE53000, GSE68417, and GSE66272) were downloaded from the Gene Expression Omnibus (GEO) database, and TCGA KIRC data were downloaded from The Cancer Genome Atlas (TCGA). Differentially expressed genes (DEGs) between CCRCC tissues and normal samples were identified using GEO2R. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed with the DAVID database. A protein-protein interaction (PPI) network was constructed, and hub genes were predicted using STRING and Cytoscape. The GEPIA and Kaplan-Meier plotter databases were used to further screen for key genes. Expression verification and survival analysis of the key genes were performed using the TCGA database, the GEPIA database, and Kaplan-Meier plotter. Receiver operating characteristic (ROC) curves, plotted in R from TCGA data, were used to evaluate the diagnostic value of the key genes in CCRCC. The UALCAN database was used to analyze the relationship between the key genes and clinicopathological features of CCRCC, as well as the promoter methylation levels of the key genes. Results A total of 289 up-regulated and 449 down-regulated genes were identified from the GSE12606, GSE53000, GSE68417, and GSE66272 profiles. The up-regulated DEGs were mainly enriched in protein binding and the PI3K-Akt signaling pathway, whereas the down-regulated genes were enriched in integral component of membrane and metabolic pathways. The top 35 genes were then selected from the PPI network by degree, and three new key genes, ITGAX, LAPTM5, and SERPINE1, were identified through survival and prognosis analysis.
Further results showed that ITGAX, LAPTM5, and SERPINE1 levels in CCRCC tumor tissues were significantly higher than those in normal tissues and were associated with poor prognosis. ROC curves showed that ITGAX, LAPTM5, and SERPINE1 had good diagnostic value, with good sensitivity and specificity. The promoter methylation levels of ITGAX, LAPTM5, and SERPINE1 in CCRCC tumor tissues were significantly lower than those in normal tissues. The key genes were also associated with clinicopathological features of CCRCC. Conclusion ITGAX, LAPTM5, and SERPINE1 were identified as novel candidate key genes that could serve as prognostic biomarkers and potential therapeutic targets for CCRCC.


INTRODUCTION
Kidney cancer is a complex disease comprising a variety of cancers that differ in histology, clinical course, genetic alterations, and response to treatment. Renal cell carcinoma is the most common kidney tumor, and its morbidity and mortality are rising worldwide. Renal cell carcinoma is divided into several subtypes, including clear cell renal cell carcinoma (CCRCC), chromophobe renal cell carcinoma (chRCC), and papillary renal cell carcinoma (pRCC). CCRCC is a metabolic disease (Wettersten et al., 2017) that accounts for more than 80% of all renal cell carcinomas (Makhov et al., 2018). It is the most aggressive subtype of renal cell carcinoma, with high rates of metastasis and recurrence (Jiang et al., 2020; Yuan et al., 2018). Although some progress has been made in the treatment of CCRCC, current treatment still relies mainly on surgery and traditional chemotherapy (Loo et al., 2019; Bex et al., 2019). At the same time, effective methods for early diagnosis are lacking in the clinic, and some patients relapse or develop resistance to targeted drugs, resulting in a poor prognosis with radiotherapy and chemotherapy. Therefore, finding new biomarkers relevant to the diagnosis and treatment of CCRCC remains of paramount importance.
In this study, bioinformatics methods were used to obtain CCRCC gene expression data from the GEO database, and the samples were grouped into normal and CCRCC groups. A total of 738 DEGs were identified, comprising 289 up-regulated and 449 down-regulated genes. GO enrichment and KEGG pathway analyses were then performed with DAVID. The up-regulated DEGs were mainly enriched in protein binding, plasma membrane, inflammation, signal transduction, and the PI3K-Akt signaling pathway, while the down-regulated genes were mainly enriched in extracellular exosome, oxidation-reduction process, integral component of membrane, protein homodimerization activity, and metabolic pathways. Finally, the top 35 hub genes were screened from the PPI network. ITGAX, LAPTM5, and SERPINE1 were selected as key genes because they have not previously been reported in CCRCC, all have high degrees, and their expression is significantly associated with survival. Further analysis showed that ITGAX, LAPTM5, and SERPINE1 are highly expressed in CCRCC, that their expression is significantly associated with survival in CCRCC, and that their promoter methylation levels are reduced. Moreover, ITGAX, LAPTM5, and SERPINE1 are associated with clinicopathological features of CCRCC and have good diagnostic value for CCRCC. In summary, we provide a systematic and comprehensive analysis of CCRCC and are the first to suggest that ITGAX, LAPTM5, and SERPINE1 might be used as biomarkers for the clinical diagnosis and treatment of CCRCC.

Study design and data processing
To clarify the study design, we constructed a flow chart showing data collection, processing, analysis, and verification (Fig. 1). GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) is an online tool, based on the R programming language, for analyzing GEO series; it was used to screen DEGs between normal kidney and CCRCC samples in each GSE dataset. The data were standardized and filtered to select significant DEGs, with P-value < 0.05 and |log FC| ≥ 1 as the cutoff criteria. Because log FC is the base-2 logarithm of the fold change, |log FC| ≥ 1 corresponds to a change of at least 2-fold in either direction, which is conventionally regarded as a meaningful difference. Genes with P-value < 0.05 and log FC ≥ 1 were classified as up-regulated, and genes with P-value < 0.05 and log FC ≤ −1 as down-regulated. Finally, the up-regulated and down-regulated gene lists from the four datasets were imported into FunRich 3.1.1 software, and the intersection of each list across the four datasets was taken. TCGA KIRC data (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) were normalized and log2-transformed with TCGA RNA-seq simple converter and were used for expression verification and ROC curve analysis of the key genes.

GO and KEGG (Kanehisa & Goto, 2000) pathway enrichment analyses of the DEGs were performed with DAVID, which facilitates the interpretation and visualization of gene and protein function (Dennis Jr et al., 2003). Count denotes the number of genes enriched in a given term or pathway, and P < 0.05 was used as the cutoff.
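The DEG thresholding and four-dataset intersection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes each dataset's GEO2R output has already been reduced to (gene, logFC, P-value) tuples with logFC on the log2 scale, and it substitutes plain set intersection for FunRich's Venn-diagram step. The dataset keys and gene names in the toy input are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

P_CUTOFF = 0.05
LOGFC_CUTOFF = 1.0  # |log2 FC| >= 1, i.e. at least a 2-fold change

Row = Tuple[str, float, float]  # (gene, log2 fold change, P-value)


def classify_degs(rows: Iterable[Row]) -> Tuple[Set[str], Set[str]]:
    """Split rows into up-regulated (logFC >= 1) and down-regulated (logFC <= -1) sets."""
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, logfc, p in rows:
        if p >= P_CUTOFF:
            continue  # not significant
        if logfc >= LOGFC_CUTOFF:
            up.add(gene)
        elif logfc <= -LOGFC_CUTOFF:
            down.add(gene)
    return up, down


def common_degs(per_dataset: Dict[str, Iterable[Row]]) -> Tuple[Set[str], Set[str]]:
    """Keep only genes called up (or down) in every dataset."""
    ups, downs = zip(*(classify_degs(rows) for rows in per_dataset.values()))
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical toy input with two datasets instead of four.
toy = {
    "GSE_A": [("GENE1", 2.3, 0.001), ("GENE2", -1.5, 0.01), ("GENE3", 0.4, 0.001)],
    "GSE_B": [("GENE1", 1.1, 0.03), ("GENE2", -2.0, 0.002), ("GENE3", 1.2, 0.2)],
}
up, down = common_degs(toy)
print(sorted(up), sorted(down))  # ['GENE1'] ['GENE2']
```

A gene enters the final list only if it passes both cutoffs, in the same direction, in every dataset. This is why the intersection (289 up, 449 down) is much smaller than any single dataset's DEG list.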

Protein-Protein Interaction (PPI) Network construction and hub gene screening
The online tool STRING (Von Mering et al., 2003) (http://string-db.org) was used to analyze protein-protein interactions. All DEGs (both up-regulated and down-regulated genes) were imported into the STRING database, and interactions with a confidence score ≥ 0.4 were considered significant. The PPI network was constructed with STRING and Cytoscape (version 3.6.1), a software platform for visualizing molecular interaction networks. The degree of each node was calculated with the degree algorithm of the cytoHubba plug-in of Cytoscape, and hub genes were screened from the PPI network on this basis. A gene's degree, its number of interaction partners, was taken as a measure of its importance in the network. Using a cutoff of degree ≥ 37, the top 35 genes were selected as hub genes. Three high-degree key genes, ITGAX, LAPTM5, and SERPINE1, were then selected on the basis of survival and prognosis analysis and the novelty of the genes.
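Degree-based hub selection comes down to counting each node's interaction partners and keeping the nodes above a cutoff. The sketch below approximates cytoHubba's Degree method without calling Cytoscape. It assumes an edge list of (protein A, protein B, combined score) triples with scores on a 0-1 scale. STRING's downloadable protein.links files use a 0-1000 scale instead, where the same cutoff is 400. The toy edges are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hub_genes(edges: Iterable[Tuple[str, str, float]],
              min_score: float = 0.4,
              min_degree: int = 37) -> List[Tuple[str, int]]:
    """Return (gene, degree) pairs with degree >= min_degree, highest degree first."""
    degree: Counter = Counter()
    seen = set()
    for a, b, score in edges:
        if score < min_score or a == b:
            continue  # drop low-confidence interactions and self-loops
        key = frozenset((a, b))  # count each undirected interaction once
        if key in seen:
            continue
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    hubs = [(g, d) for g, d in degree.items() if d >= min_degree]
    return sorted(hubs, key=lambda gd: (-gd[1], gd[0]))


# Hypothetical toy network; min_degree lowered from 37 so the example is tiny.
toy_edges = [("HUB", "A", 0.9), ("HUB", "B", 0.7), ("C", "HUB", 0.5),
             ("A", "B", 0.3),    # below 0.4, dropped
             ("B", "HUB", 0.8)]  # duplicate of HUB-B, counted once
print(hub_genes(toy_edges, min_degree=3))  # [('HUB', 3)]
```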

Key genes analysis and verification
The three key genes were analyzed and verified comprehensively using the TCGA, GEPIA, Oncomine, Kaplan-Meier plotter, and UALCAN databases. TCGA data were also used for ROC curve analysis. ROC curves were plotted in R to evaluate the diagnostic value of the key genes in CCRCC. GEPIA (http://gepia.cancer-pku.cn/) was used to analyze the expression of the key genes in CCRCC. Oncomine (Rhodes et al., 2007) (http://www.oncomine.org), a database of microarray data from various tumors, was used to verify the expression of the key genes in CCRCC. Kaplan-Meier Plotter (https://kmplot.com/analysis/) was used to analyze the association of the key genes with survival in CCRCC. The UALCAN (Chandrashekar et al., 2017) online database (http://ualcan.path.uab.edu/index.html) was used to analyze the relationship between the key genes and clinicopathological features of CCRCC, as well as the promoter methylation levels of the key genes in CCRCC.
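The R code used for the ROC analysis is not given. The following Python sketch is for illustration only and shows what such a curve computes for a single gene. It sweeps a threshold over the observed expression values, calls samples at or above the threshold "tumor", and records (false positive rate, true positive rate) pairs. The AUC is computed with the Mann-Whitney formulation, which equals the trapezoidal area under this empirical curve when ties count one half. The expression values in the example are invented.

```python
from typing import List, Sequence, Tuple


def roc_points(tumor: Sequence[float], normal: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping a threshold over all observed values.

    A sample is predicted 'tumor' when its expression is >= the threshold.
    """
    thresholds = sorted(set(tumor) | set(normal), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(x >= t for x in tumor) / len(tumor)    # sensitivity
        fpr = sum(x >= t for x in normal) / len(normal)  # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """Probability that a random tumor sample outranks a random normal one (ties = 1/2)."""
    wins = 0.0
    for x in tumor:
        for y in normal:
            wins += 1.0 if x > y else 0.5 if x == y else 0.0
    return wins / (len(tumor) * len(normal))


# Invented log2 expression values for one gene.
tumor = [5.1, 6.3, 7.0, 4.8]
normal = [3.9, 4.8, 2.5]
print(round(auc(tumor, normal), 3))  # 0.958
```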

Statistical analysis
Statistically significant differences between the normal tissue group and the tumor tissue group were determined using Student's t tests. Kaplan-Meier analysis was used to assess overall survival (OS). The ROC curve (Cao & López-de Ullibarri, 2019) is a graphical plot of sensitivity against 1 − specificity for a continuous variable across all classification thresholds. P < 0.05 was considered statistically significant for all tests. All statistical analyses were performed with GraphPad Prism 6.0 or R software.
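Kaplan-Meier plotter computes its survival curves internally. The following sketch shows the product-limit estimator those curves are based on. It is a minimal illustration that assumes per-patient follow-up times and event indicators (1 = death observed, 0 = censored) for one expression group. Splitting patients into high- and low-expression groups and the log-rank comparison between groups are not shown. The example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, S(t)) at each distinct time where at least one death occurs.

    events[i] is 1 if death was observed at times[i], 0 if the patient was censored.
    Patients censored at a death time are still counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk  # product-limit step
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve


# Hypothetical follow-up (months) and event indicators for six patients.
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.833), (3, 0.667), (5, 0.444), (9, 0.0)]
```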

Volcano plots of the differentially expressed genes in four datasets
The CCRCC datasets GSE12606, GSE53000, GSE68417, and GSE66272 were downloaded from the GEO database (Table 1) and analyzed separately with GEO2R. A total of 9,560 DEGs were screened from GSE12606, of which 1,568 were up-regulated and 5,436 were down-regulated. A total of 1,286 DEGs were screened from GSE53000, of which 583 were up-regulated and 708 were down-regulated. A total of 1,890 DEGs were screened from GSE68417, comprising 726 up-regulated and 1,164 down-regulated genes. Of the 8,797 DEGs screened from GSE66272, 3,627 were up-regulated and 5,170 were down-regulated. The screening criteria were P-value < 0.05 and |log FC| ≥ 1. All DEGs from the four datasets are shown in volcano plots (Figs. 2A-2D). Red represents up-regulated genes, green represents down-regulated genes, and black represents genes whose expression changes are not significant in that dataset. The intersection of the four independent datasets was then obtained with FunRich 3.1.1, yielding 289 up-regulated and 449 down-regulated DEGs.
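The coloring rule used in the volcano plots can be written down directly. The x coordinate is log FC, the y coordinate is −log10(P), and the color follows the same cutoffs as the DEG screen. This sketch assumes (gene, log2 FC, P-value) tuples. It only computes coordinates and labels and leaves the drawing to any plotting library. Gene names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def volcano_points(rows: Iterable[Tuple[str, float, float]],
                   p_cutoff: float = 0.05,
                   lfc_cutoff: float = 1.0) -> List[Tuple[str, float, float, str]]:
    """Return (gene, x = log2 FC, y = -log10 P, color) for each row."""
    points = []
    for gene, lfc, p in rows:
        significant = p < p_cutoff
        if significant and lfc >= lfc_cutoff:
            color = "red"    # up-regulated in tumor
        elif significant and lfc <= -lfc_cutoff:
            color = "green"  # down-regulated in tumor
        else:
            color = "black"  # not significant at these cutoffs
        points.append((gene, lfc, -math.log10(p), color))
    return points


for gene, x, y, color in volcano_points([("G1", 2.0, 0.001), ("G2", -1.2, 0.04), ("G3", 0.3, 0.5)]):
    print(gene, x, round(y, 2), color)
```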

GO and KEGG enrichment analysis of DEGs
To further explore the functions of the DEGs, enrichment analysis was performed separately for the up-regulated and down-regulated genes. DAVID 6.8 was used to perform GO and KEGG analyses of the DEGs in CCRCC (Table 2). In terms of biological process, up-regulated DEGs were mostly involved in cell adhesion, signal transduction, and immune response, especially the regulation of the inflammatory response (Fig. 3A). Down-regulated DEGs were mostly involved in oxidation-reduction process, proteolysis, ion transmembrane transport, response to drug, and ion transport (Fig. 3E). In terms of cellular component, up-regulated DEGs were mainly located in the plasma membrane, integral component of membrane, extracellular exosome, and membrane (Fig. 3B). Down-regulated DEGs were mainly located in integral component of membrane, plasma membrane, extracellular exosome, and integral component of plasma membrane (Fig. 3F). In terms of molecular function, up-regulated DEGs were mainly associated with protein binding, identical protein binding, ATP binding, and protein homodimerization activity (Fig. 3C). Down-regulated DEGs were mainly associated with protein homodimerization activity, calcium ion binding, sequence-specific DNA binding, and oxidoreductase activity (Fig. 3G). In the KEGG analysis, up-regulated DEGs were mostly involved in the PI3K-Akt signaling pathway, pathways in cancer, focal adhesion, and the HIF-1 signaling pathway (Fig. 3D). Down-regulated DEGs were mainly involved in metabolic pathways, biosynthesis of antibiotics, carbon metabolism, and aldosterone-regulated sodium reabsorption (Fig. 3H). CCRCC is a metabolic disease, and its metabolic reprogramming involves processes including aerobic glycolysis, fatty acid metabolism, and the utilization of tryptophan, glutamine, and arginine (Lucarelli et al., 2019). The KEGG pathway enrichment results in this work are consistent with this.
The KEGG results show that the down-regulated DEGs are enriched in metabolic pathways, Glycolysis/Gluconeogenesis, and Glycine, serine and threonine metabolism, among others. In addition, GO and KEGG enrichment analyses were performed for all DEGs (up-regulated and down-regulated genes together) (Fig. S2). Eleven enrichment results were common to the three analyses of up-regulated DEGs, down-regulated DEGs, and all DEGs. These include response to drug, plasma membrane, integral component of membrane, extracellular space, extracellular exosome, cell surface, protein homodimerization activity, and identical protein binding. These enrichment results are consistent with previous studies (Wang, Yu & Chai, 2019; Tian et al., 2019) and suggest that these 11 shared terms may be important in CCRCC.
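The enrichment P-values above come from DAVID, which reports a modified Fisher's exact test (the EASE score). The following sketch shows only the unmodified one-sided hypergeometric test behind it, to make the role of Count concrete. It will not reproduce DAVID's numbers exactly, and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (population)
    K: background genes annotated to the GO term or KEGG pathway
    n: DEGs submitted for analysis
    k: submitted DEGs annotated to the term (the 'Count' value)
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Hypothetical term: 5 of 20 background genes annotated, 4 DEGs submitted, 3 of them hit.
p = hypergeom_enrichment_p(k=3, n=4, K=5, N=20)
print(f"P = {p:.4f}")  # P = 0.0320
```

With the cutoff used in this study, such a term would count as enriched because P < 0.05.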

Construction of PPI network and screening of key genes
To identify the key genes, all DEGs (including up-regulated and down-regulated genes) were analyzed with the STRING online database and Cytoscape software to construct a PPI network (Fig. 4A). Because proteins carry out most biological functions, their interactions determine the molecular and cellular mechanisms that control the health and disease states of an organism (Safari-Alighiarloo et al., 2014). The degree algorithm of the cytoHubba plug-in of Cytoscape was then used to screen hub genes in the PPI network. Genes with degree ≥ 37 were defined as hub genes, so the top 35 genes in the PPI network (Table 3) were chosen as hub genes (Fig. 4D). Next, GO enrichment analysis was conducted for all DEGs and for the top 35 DEGs in the PPI network. All DEGs were mainly enriched in signal transduction, oxidation-reduction process, cell adhesion, inflammatory response, plasma membrane, integral component of membrane, extracellular exosome, and integral component of plasma membrane (Figs. 4B-4C). The top 35 DEGs were enriched in extracellular space, extracellular exosome, inflammatory response, integral component of plasma membrane, and cell surface (Fig. 4E). Comparison showed that the enrichment results of all DEGs in the PPI network contained those of the top 35 genes. Finally, three new high-degree key genes, ITGAX, LAPTM5, and SERPINE1, were selected using the GEPIA and Kaplan-Meier plotter databases.

Fig. 3 (panels C-H). (C) GO molecular function analysis showed that up-regulated DEGs were mainly related to protein binding, identical protein binding, ATP binding, and protein homodimerization activity. (D) The KEGG pathways associated with up-regulated DEGs mainly include the PI3K-Akt signaling pathway, focal adhesion, pathways in cancer, and the HIF-1 signaling pathway. (E) GO biological process analysis showed that down-regulated DEGs were mainly related to oxidation-reduction process, proteolysis, ion transmembrane transport, response to drug, and ion transport. (F) GO cellular component analysis showed that down-regulated DEGs were mainly related to integral component of membrane, plasma membrane, extracellular exosome, and integral component of plasma membrane. (G) GO molecular function analysis showed that down-regulated DEGs were mainly related to protein homodimerization activity, calcium ion binding, oxidoreductase activity, and sequence-specific DNA binding. (H) The KEGG pathways associated with down-regulated DEGs mainly include metabolic pathways, biosynthesis of antibiotics, carbon metabolism, and aldosterone-regulated sodium reabsorption.

The expression of key genes
Among the 35 hub genes, we focused on ITGAX, LAPTM5, and SERPINE1, which have not been reported to be related to the occurrence and development of CCRCC. First, according to the GEPIA database, the expression levels of ITGAX, LAPTM5, and SERPINE1 in CCRCC tumor tissues were significantly higher than those in adjacent normal tissues (Figs. 5A-5C). Furthermore, TCGA KIRC data showed that the mRNA expression of ITGAX, LAPTM5, and SERPINE1 was significantly increased in 72 paired CCRCC tissues compared with adjacent normal tissues (Figs. 5D-5F). In the Oncomine Gumz renal dataset, the mRNA levels of ITGAX, LAPTM5, and SERPINE1 were also up-regulated in CCRCC tissues compared with adjacent normal kidney tissues (Figs. 5G-5I). According to the UALCAN database, the promoter methylation levels of ITGAX, LAPTM5, and SERPINE1 were decreased in CCRCC (Figs. 5J-5L). In summary, the GEPIA database, the Oncomine Gumz renal dataset, and the TCGA database consistently show that these three genes are expressed at significantly higher levels in CCRCC tumor tissues than in adjacent normal tissues. We further speculate that the high expression of ITGAX, LAPTM5, and SERPINE1 in CCRCC tumor tissue might be related to decreased promoter methylation.
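The text does not state whether the 72 tumor/normal pairs were compared with a paired or an unpaired Student's t test. If a paired test is assumed, the statistic reduces to the mean of the per-patient differences divided by its standard error. The following standard-library sketch shows this with invented log2 expression values. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with a P-value.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(tumor: Sequence[float], normal: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched tumor/normal samples."""
    if len(tumor) != len(normal) or len(tumor) < 2:
        raise ValueError("need at least two matched tumor/normal pairs")
    diffs = [t - n for t, n in zip(tumor, normal)]  # per-patient differences
    se = stdev(diffs) / math.sqrt(len(diffs))       # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1


# Invented values for three matched pairs (the study used 72).
t_stat, df = paired_t([5.0, 6.0, 7.0], [4.0, 4.0, 4.0])
print(round(t_stat, 3), df)  # 3.464 2
```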

The association of key genes expression with clinical pathology in CCRCC
We next examined the relationship between the mRNA expression of ITGAX, LAPTM5, and SERPINE1 and pathological grade. Their mRNA expression was significantly related to tumor grade (Figs. 6A-6C). The mRNA expression of ITGAX, LAPTM5, and SERPINE1 in CCRCC samples was also significantly correlated with advanced clinical stage (Figs. 6D-6F). Expression levels of all three genes were highest in stage 4 and grade 4 tumors.
In conclusion, ITGAX, LAPTM5, and SERPINE1 expression is significantly associated with clinicopathological features of CCRCC.

Survival and diagnostic value of ITGAX, LAPTM5, and SERPINE1 in CCRCC
The association between the expression of ITGAX, LAPTM5, and SERPINE1 and overall survival was tested using the Kaplan-Meier plotter database (Figs. 7A-7C). High expression of each of the three key genes in CCRCC was associated with poorer overall survival. ROC curves were then used to evaluate how well the expression of each gene distinguishes CCRCC from normal tissues in the TCGA KIRC data.

DISCUSSION
Clear cell renal cell carcinoma (CCRCC) is a metabolic disease whose morbidity is rising worldwide. A characteristic feature of kidney cancer is that mutations frequently affect genes in metabolic pathways (Lucarelli et al., 2019). In recent years, bioinformatics has greatly improved the understanding of the molecular characteristics of CCRCC and has promoted the development of targeted therapy. These advances have significantly improved the median survival of patients with advanced disease. However, about 30% of patients with localized CCRCC still relapse or develop metastases after surgical removal of the tumor (Li et al., 2019). Around one third of patients with metastatic disease have a poor prognosis and a high rate of drug resistance. Given the variety of available treatments, biomarkers that predict drug response are urgently needed (Deleuze et al., 2020). Identifying key genes in CCRCC and assessing prognosis therefore remain crucial. In our research, some genes already reported to be related to CCRCC, such as VEGFA (Zeng et al., 2016) and EGFR (Cossu-Rocca et al., 2016), were also screened out. Vascular endothelial growth factor A (VEGFA) is a member of the PDGF/VEGF growth factor family and has a potential role in the diagnosis and treatment of CCRCC. VEGFA can inhibit the proliferation of CCRCC 786-O cells, promote apoptosis, and inhibit cell migration and invasion (Zeng et al., 2016). Epidermal growth factor receptor (EGFR) is closely related to the progression of many epithelial malignancies and is an important therapeutic target (Cossu-Rocca et al., 2016). EGFR is a cell surface protein belonging to the ERBB family. Binding of epidermal growth factor to EGFR induces receptor dimerization and tyrosine autophosphorylation, which eventually leads to cell proliferation (Mitsudomi & Yatabe, 2010). EGFR can activate a variety of signaling pathways, mainly the MAPK/ERK and PI3K/AKT pathways (Yarden & Sliwkowski, 2001).
Compared with the four previous studies that generated these datasets, this study identified three new key genes, ITGAX, LAPTM5, and SERPINE1. It also characterized them with several methods, including methylation analysis, survival analysis, and ROC curve analysis. In previous studies, GSE12606 mainly focused on the functional analysis of HLA ligands in CCRCC, GSE53000 on the structure and evolution of the CCRCC genome, GSE66272 on the role of miRNAs in CCRCC, and GSE68417 on the gene expression profile of CCRCC. In contrast, this study focused on screening biomarkers for the early diagnosis of CCRCC by analyzing the intersection of DEGs in the four datasets. Through this approach, three new potential marker genes, ITGAX, LAPTM5, and SERPINE1, were identified and examined. Integrin alpha X (ITGAX) is a member of the integrin family, whose members commonly function as receptors for extracellular matrix. ITGAX has been reported to be involved in the angiogenesis of dendritic cells and in tumor angiogenesis (Wang et al., 2019). In addition, ITGAX has been identified as a new susceptibility gene for aggressive prostate cancer (Williams et al., 2014). Lysosomal protein transmembrane 5 (LAPTM5), also known as E3 protein, may play a role in hematopoiesis and in preventing excessive activation of lymphocytes (Cai et al., 2015). LAPTM5 has been reported to regulate the proliferation and viability of bladder cancer cells, leading to cell cycle arrest in the G0/G1 phase (Chen et al., 2017). LAPTM5 is also associated with the spontaneous regression of neuroblastoma (Inoue et al., 2009). Studies have shown that Serpin family E member 1 (SERPINE1) is a regulator of glioblastoma (GBM) cell proliferation and is associated with poor patient prognosis and mesenchymal GBM.
Down-regulation of SERPINE1 in primary GBM cells inhibited the growth and invasiveness of tumors in the brain, and SERPINE1 plays a key role in the spread of GBM (Seker et al., 2019).

CONCLUSIONS
To sum up, the expression levels of ITGAX, LAPTM5, and SERPINE1 in CCRCC tumor tissues are significantly higher than those in adjacent normal tissues and are related to tumor stage and grade. ITGAX, LAPTM5, and SERPINE1 discriminate well between tumor and normal tissues, and their expression is associated with poor prognosis in CCRCC. The reduced promoter methylation of ITGAX, LAPTM5, and SERPINE1 in CCRCC tumor tissues suggests that the high expression of these key genes in CCRCC might be related to their low methylation level. A limitation of this study is that the molecular mechanisms through which the key genes act in CCRCC remain unclear, and further studies are needed to explore them. In conclusion, we identified ITGAX, LAPTM5, and SERPINE1 as potential marker genes of CCRCC using bioinformatics methods, providing insights for future therapeutic design. We also conducted a relatively systematic and comprehensive analysis of CCRCC data. This analysis provides a theoretical basis for identifying therapeutic targets, promoting early detection, and monitoring tumor progression in CCRCC.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
This work was supported by grants from the National Natural Science Foundation of China [81702743] and China Postdoctoral Science Foundation [2018M640612, 2019T120568]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.