Identiﬁcation of potential biomarkers and pivotal biological pathways for prostate cancer using bioinformatics analysis methods

Background: Prostate cancer (PCa) is a common urinary malignancy, whose molecular mechanism has not been fully elucidated. We aimed to screen for key genes and biological pathways related to PCa using bioinformatics method. Methods: Diﬀerentially expressed genes (DEGs) were ﬁltered out from the GSE103512 dataset and subjected to the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. The protein-protein interactions (PPI) network was constructed, following by the identiﬁcation of hub genes. The results of former studies were compared with ours. The relative expression levels of hub genes were examined in The Cancer Genome Atlas (TCGA) and Oncomine public databases. The UCSC Xena online tools were used to study whether the expression of hub genes was correlated with the survival of PCa patients from TCGA cohorts. Results: Totally, 252 (186 upregulated and 66 downregulated) DEGs were identiﬁed. GO analysis enriched mainly in ‘oxidation-reduction process’ and ‘positive regulation of transcription from RNA polymerase II promoter’; KEGG pathway analysis enriched mostly in ‘metabolic pathways’ and ‘protein digestion and absorption’. KLK3, CDH1, KLK2, FOXA1, and EPCAM were identiﬁed as hub genes from the PPI network. CDH1, FOXA1, and EPCAM were validated by other relevant GEO datasets. All hub genes were validated by both TCGA and Oncomine except KLK2. Two additional top DEGs (ABCC4 and SLPI) were found to be associated with the prognosis of PCa patients. Conclusions: This study excavated the key genes and pathways in PCa, which might be biomarkers for diagnosis, prognosis, and potential therapeutic targets. Abstract 16 Background: Prostate cancer (PCa) is a common urinary malignancy, whose molecular 17 mechanism has not been fully elucidated. We aimed to screen for key genes and biological 18 pathways related to PCa using bioinformatics method. 19 Methods: Differentially expressed genes (DEGs) were filtered out from the GSE103512 dataset 20 and subjected to the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes 21 (KEGG) pathway analyses. The protein-protein interactions (PPI) network was constructed, 22 following by the identification of hub genes. The results of former studies were compared with 23 ours. The relative expression levels of hub genes were examined in The Cancer Genome Atlas 24 (TCGA) and Oncomine public databases. The UCSC Xena online tools were used to study whether 25 the expression of hub genes was correlated with the survival of PCa patients from TCGA cohorts. 26 Results: Totally, 252 (186 upregulated and 66 downregulated) DEGs were identified. GO analysis 27 enriched mainly in ‘oxidation-reduction process’ and ‘positive regulation of transcription from RNA polymerase II promoter’; KEGG pathway analysis enriched mostly in ‘metabolic pathways’ 29 and ‘protein digestion and absorption’. KLK3, CDH1, KLK2, FOXA1, and EPCAM were identified as hub genes from the PPI network. CDH1, FOXA1, and EPCAM were validated by other relevant GEO datasets. All hub genes were validated by both TCGA and Oncomine except KLK2. Two additional top DEGs (ABCC4 and SLPI) were found to be associated with the prognosis of PCa patients. Conclusions: This study excavated the key genes and pathways in PCa, which might be biomarkers for diagnosis, prognosis, and potential therapeutic targets.


Introduction
According to the Cancer Statistics of 2019, prostate cancer (PCa) has emerged as the second most frequently diagnosed malignancy and is estimated to be the second leading cause of cancer-related mortalities among American males (Siegel & Miller 2019).At present, prostate specific antigen (PSA) has been the most routinely used biomarker for the screening and monitoring of PCa (Gandaglia et al. 2019).However, elevated PSA does not necessarily indicate PCa and often leads to false positive results as well as overdiagnosis since it can also be seen in other benign lesions such as prostatitis or benign prostatic hyperplasia (BPH) (Inahara et al. 2006;Liu et al. 2019b).
Therefore, excavating reliable and effective biomarkers and learning hub genes involved in the biological process of PCa is urgently needed.
With the development of high-throughput sequencing technology and bioinformatic analysis methods, the Gene Expression Omnibus (GEO) online public database has been widely utilized to screen out differentially expressed genes (DEGs), to study molecular signals and their relations, and to aid in constructing gene regulatory networks (Clough & Barrett 2016).Up to now, by either analysis of a single dataset or integrated analysis of multiple datasets in GEO, several studies have dug out genes that exert important influence on the occurrence and progression of PCa, such as CDH1 (Fang et al. 2017), CDCA8 (Zhao et al. 2017), RPS21 (Fan et al. 2018), PIK3R1 (He et al. 2018), EPCAM (Lu 2019), LMNB1 (Song et al. 2019b), IGF2 (Tan et al. 2019), and IKZF1 (Tong et al. 2019).However, the key genes detected by the above studies are largely different from each other and had little in common, and such discrepancy could be attributed to the fact that PCa is Manuscript to be reviewed considered as a heterogenous disease as a whole (Liu et al. 2019a;Thomas & Pachynski 2018).
As a consequence, further studies in this regard is still required for the exploration and validation of key genes.
In 2018, Brouwer-Visser et al. studied the immune microenvironment in four solid cancer types including PCa and uploaded the corresponding gene expression profile dataset GSE103512 (Brouwer-Visser et al. 2018).For the first time, the present study employed the GSE103512 dataset to screen out the DEGs between 60 PCa and 7 matched normal prostate (NP) tissue, and a total of 252 DEGs were detected.The Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs were performed, following by the construction of the protein-protein interaction (PPI) network which could uncover the underlying molecular mechanisms involved in the development of PCa.The cytoHubba and MCODE plugins of the Cytoscape software was applied to identify the hub genes and functional modules in the constructed PPI network, respectively.The Cancer Genome Atlas (TCGA) database, the Oncomine database, and the results reported by former studies were utilized for validation of our outcomes.

Microarray data
The GSE103512 gene expression profile CEL files, which consisted of 60 PCa samples of different TNM staging and Gleason grading and 7 matched NP samples (Brouwer-Visser et al. 2018) Manuscript to be reviewed the CEL files were converted to their corresponding gene symbols based on the annotation information of the platform GPL13158 (Affymetrix HT HG-U133+ PM Array Plate), which contains 54,715 probes.

Data processing and screening of DEGs
The CEL files of GSE103512 were read using the affy package of the R programming language (Ver.3.6.0).Data preconditioning (including background adjustment, normalization, and summarization) were performed using the Robust Multi-array Average (RMA) method.The limma package of R was adopted to identify the DEGs between PCa and NP samples, with a |log 2 fold change| ≥ 1 and a P < 0.05 was deemed to be of statistically significance (Ritchie et al. 2015).The overall upregulated and downregulated DEGs information were saved for further analyses.

Functional and enrichment analyses of DEGs
The present study adopted the Database for Annotation Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/)to obtain GO annotation and KEGG pathway enrichment information of the DEGs identified previously (Dennis et al. 2003).Results with P < 0.05 with count number ≥ 2 were regarded as statistically significant.

PPI network construction and modules analyses
PPI analysis serves as an entry point for better interpretation of relationships between different proteins on a genome-wide scale, and might be helpful to provide novel insights into protein function explanation (Stelzl et al. 2005) Manuscript to be reviewed were subsequently submitted to the Cytoscape software for network visualization and hub gene identification, which was realized by the cytoHubba plugin.In addition, the Molecular Complex Detection (MCODE) plugin of Cytoscape was applied to identify potential functional modules in the PPI network with default parameters (degree cutoff ≥ 2, node score cutoff ≥ 2, K-core ≥ 2, and max depth = 100) (Bandettini et al. 2012).

Validation of hub genes
First, we performed a literature search regarding gene expression profiles of PCa in Pubmed and then extracted hub genes reported by the eligible studies, which was used to compare with those identified in the present study.Next, we examined the relative mRNA expression levels of the identified hub genes using the Gene Expression Profiling Interactive Analysis (GEPIA) and Oncomine online tools for further validation.The GEPIA (http://gepia.cancer-pku.cn/)provides comprehensive online services based on TCGA database, and we used the differential expression analysis and patient survival analysis functions in this case (Tang et al. 2017).The Oncomine online database (https://www.oncomine.org/)was applied to examine the mRNA expression of hub genes in both multiple cancers and their corresponding normal tissues (Rhodes et al. 2004).

Survival analysis of hub genes
Provided by the University of California Santa Cruz (UCSC), the UCSC Xena online tools (https://xenabrowser.net) enables researchers to study functional genomic datasets for correlations between genomic and/or phenotypic variables.In this case we used this tool to study whether the expression of hub genes was correlated with the survival of PCa patients from TCGA cohorts.Manuscript to be reviewed the median expression value of genes, and their overall survivals were analyzed using the Kaplan-Meier method with a log-tank test.A P < 0.05 was deemed as statistically significant.

Data processing and screening of DEGs
The PCa microarray expression dataset GSE103512 contained information of mRNA expression of 60 PC samples and 7 matched NP samples.A total of 19,918 official gene symbols were discerned and the gene expression matrix was constructed.The DEGs were filtered using the limma R package (criteria: adjusted P < 0.05 and |log 2 fold change| ≥ 1).A total of 252 DEGs were identified between PCa and NP samples, including 186 upregulated genes and 66 downregulated genes.The top 10 upregulated and downregulated DEGs based on fold changes are listed in Table 1.The volcano plot of all the genes detected and the cluster heatmap of the 252 DEGs were collected in Fig. 1.

Functional and pathway enrichment analyses of DEGs
GO function annotation of the identified DEGs was obtained using the DAVID database and its online analysis tool, which included the following 3 portions: biological process (BP), cell component (CC), and molecular function (MF).The results were deemed as statistically significant if P < 0.05, and the top 15 GO terms of the upregulated and downregulated DEGs are compiled in Table 2.As shown in Fig. 2A, the upregulated genes were mainly enriched in oxidation-reduction process (ontology: BP), extracellular exosome (ontology: CC), and calcium ion binding (ontology: MF).As shown in Fig. 2B, the downregulated genes were mainly enriched in positive regulation PeerJ reviewing PDF | (2019:06:38928:2:0:NEW 4 Sep 2019) Manuscript to be reviewed of transcription from RNA polymerase II promoter (ontology: BP), extracellular exosome (ontology: CC), and protein binding (ontology: MF).
The ensuing KEGG pathway analyses were also performed using the online analysis tool of DAVID database and the results were collected in Table 3.As shown in Fig. 2C, the upregulated DEGs enriched in 5 pathways: 1) metabolic pathways, 2) fatty acid metabolism, 3) arginine and proline metabolism, 4) PPAR signaling pathway, and 5) fatty acid biosynthesis.As shown in Fig. 2D, the downregulated DEGs enriched in 5 pathways: 1) protein digestion and absorption, 2) aldosterone-regulated sodium reabsorption, 3) carbohydrate digestion and absorption, 4) mineral absorption, and 5) endocrine and other factor-regulated calcium reabsorption.The KEGG pathways bubble plot of all DEGs was presented in Fig. 3.

PPI network construction and modules analyses
The STRING online database was applied to analyze the 252 identified DEGs and to construct a PPI network, which consisted of 181 nodes interacting with each other via 403 edges.The results were downloaded for further analysis by Cytoscape software.According to the descending order of degree value, the top 5 hub genes were subsequently screened, namely Kallikrein-related peptidase 3 (KLK3), cadherin 1 (CDH1), Kallikrein-related peptidase 2 (KLK2), forkhead box A1 (FOXA1), and epithelial cell adhesion molecule (EPCAM), as presented in Table 4. Furthermore, seven functional cluster modules were filtered from the PPI network using the MCODE plugin of the Cytoscape software, and the top 2 modules (Fig. 4) were selected for further KEGG pathway enrichment analyses using the DAVID database.As shown in Manuscript to be reviewed pathway; while the genes in Module 2 are mainly enriched in fatty acid metabolism, fatty acid biosynthesis, PPAR signaling pathway, and fatty acid degradation.

Validation of hub genes expression in multiple databases
As shown in Table 6, the literature search in Pubmed/GEO regarding gene expression profiles of PCa versus NP tissues yielded 10 relevant studies, 5 (Chen et al. 2012;Endo et al. 2009;Fan et al. 2018;Fang et al. 2017;He et al. 2018) of which were based on a single GEO dataset whereas the other 5 studies (Lu 2019;Song et al. 2019b;Tan et al. 2019;Tong et al. 2019;Zhao et al. 2017) were integrated bioinformatic analyses based on multiple GEO datasets.The hub genes reported by the 10 eligible studies were extracted and compared with those identified in the present study.
As marked in bold fonts in Table 6, there were some overlapped findings between our studies and other studies: our identified hub genes FOXA1, CDH1, and EPCAM were seen in 2 (Endo et al. 2009;Tan et al. 2019), 1 (Fang et al. 2017), and 1 (Lu 2019) other studies, respectively.
Next, we further validated our findings using data from TCGA.GEPIA was applied to determine the expression differences of hub genes between PCa and NP tissues.As shown in Fig. 5A, mRNA levels of KLK3, CDH1, FOXA1, and EPCAM were significantly upregulated in PCa samples compared with NP samples; but there was no significant difference in the mRNA level of KLK2 between PCa and NP samples.Besides, we investigate the prognostic values of hub genes using TCGA data and conducted survival analyses with UCSC Xena online tools.As suggested in Fig. 5B, all 5 hub genes had no statistically significant impact on PCa patients' overall survivals.Furthermore, we put the top 10 significantly upregulated and downregulated DEGs into survival analysis and found 2 genes had significant impact: the relative higher expression of ABCC4 Manuscript to be reviewed (upregulated) as well as relative lower expression of SLPI (downregulated) were associated with poorer overall survival.
Furthermore, an overview of hub genes expression in multiple types of cancers revealed that all 5 hub genes were remarkably overexpressed in PCa tissues compared with NP tissues according to the Oncomine database (Fig. 6).

Discussion
As one of the leading healthcare concerns worldwide, prostate cancer has become one of the most prevalent malignancies of adult males with an incidence of around 0.01% (Jemal et al. 2010).
Although numerous progress has been made to uncover the molecular mechanisms of PCa development and progression, the outcomes remain obscure and the inconsistencies among different published studies are obvious (Barbieri et al. 2017).The main reason for such phenomenon is generally thought to be the complexity and heterogeneity nature of PCa as a whole (Liu et al. 2019a;Thomas & Pachynski 2018).Thus, it's of vital significance to perform further research in this regard to validate and update the information acquired.
In current study, we first screened a total of 252 ( 186 Manuscript to be reviewed the high proliferation rate and nutritional requirement so as to constantly support the increased cell division (Chandel 2014;Sancho et al. 2016).For PCa, one of the most noticeable metabolic changes is, as our results indicated, the fatty acid (FA) or lipid metabolism (Deep & Schlaepfer 2016;Yang et al. 2016).The lipid biosynthesis is essential for membrane formation and cell signaling; for instance, the metabolic intermediates of de novo lipogenesis can serve as second messengers and regulate PCa migration and invasion (Ferro et al. 2017).Moreover, the lipid metabolism of PCa is closely related to androgen by androgen receptor (AR) signaling.For examples, AR signaling can elevate the uptake of exogenous lipids by PCa cells (Liu et al. 2010) as well as stimulate adipose tissues to release FA (O'Reilly et al. 2014).Furthermore, the peroxisome-activated receptors (PPARs) family participate participates greatly in metabolic regulation.For example, PPARγ is a nuclear FA receptor which can interact with androgen receptor (AR) and regulate growth of PCa (Olokpa et al. 2017).The Fatty acid-binding protein 5 (FABP5)-PPARγ signaling pathway promotes the malignant progression of castration-resistant PCa cells (Jing et al. 2000;Kawaguchi et al. 2016).Recently, the FABP5 inhibitor dmrFABP5 were illustrated to inhibit the FABP5-PPARγ pathway to suppress the malignant progression of Manuscript to be reviewed skin, mammary gland, prostate, colon, pancreas and brain, KLKs are a sort of serine proteases existed in the body fluids excreted by the above organs such as sweat, milk, and seminal fluid, involving in various physiological functions like electrolyte balance, extracellular matrix remodeling, and prohormone processing (Borgono et al. 2004;Shaw & Diamandis 2007).Aberrant expression of KLKs can be found in several kinds of malignancies including ovarian cancer (Loessner et al. 2018), breast cancer (Figueroa et al. 2018), gastrointestinal cancer (Kontos et al. 2013), and prostate cancer (Mavridis et al. 2014;McDonald & Parsons 2016).Further on, it was suggested that KLKs could lead to proliferation of the epithelium via protease-activated receptors (PARs), which might contribute to PCa development (Ramsay et al. 2008).Several studies have highlighted the association between KLK3 gene polymorphism and susceptibility to PCa (Chen & Xin 2017;Ding et al. 2018;Motamedi et al. 2019), and Zambon et al. successfully combined the KLK3 genetics analysis and free-to-total PSA ratio for PCa diagnosis (Zambon et al. 2012).
Similarly, KLK2 was reported to be potential marker for diagnosis because its polymorphism can increase vulnerability to PCa (David et al. 2002;Nam et al. 2006).And more notably, the loss of miR-378 (a regulator of KLK2) in PCa was found to be correlated with aggressive phenotype and short-term relapse events (Avgeris et al. 2014).Both KLK2 and KLK3 were reported to elevate the level of insulin-like growth factor (IGF) by cleaving the IGF-binding proteins, which consequently affects cell survival, mitogenesis, and differentiation.Analogously, KLK3 can cleave the TGFβ-binding proteins and activate TGFβ, resulting in cell proliferation (Borgono et al. 2004).Manuscript to be reviewed and the maintenance of cell differentiation as well as normal structure of epithelium (Mayer et al. 1993;Takeichi 1995).Pathologically, inactivation or absence of CDH1 expression can decrease intercellular junctions and thus promote cancer invasion and metastasis, as seen in diffuse gastric cancer (Melo et al. 2017) and lobular breast cancer (McVeigh et al. 2014).Contrary to expectation, our results indicated that CDH1 expression turned out to be significantly upregulated in PCa compared to NP tissues.Consistent with our results, another study using the GSE26910 dataset also found overexpressed CDH1 in PCa samples (Fang et al. 2017).It was hypothesized that overexpression of CDH1 might have impact on oncogenesis since previous research have indicated the intercellular adhesion between cancer and normal cells were enhanced in the late stage of tumor formation (Albelda 1993).Plus, polymorphism of CDH1 was also found to elevate the risk of PCa (Imtiaz et al. 2019;Qiu et al. 2009).Future work is merited to explore the expression changes of CDH1 in different stages of PCa.

The encoding product of CDH1 gene, namely E-cadherin, participates in cell-cell adhesion
Forkhead Box A1 (FOXA1), also known as hepatocyte nuclear factor 3α (HNF3α), is a transcription factor that plays important roles in development as well as cancer formation (Bernardo & Keri 2012).FOXA1 has multiple impact on PCa, including 1) controls the morphogenesis and cell differentiation of prostate (Gao et al. 2005); 2) facilitates AR transactivation which is essential for PCa proliferation and survival (Zhao et al. 2014); 3) inhibits PCa progress towards neuroendocrine prostate cancer (NEPC), whose prognosis is worse (Kim et al. 2017); 4) downregulates TGF-β signaling and thus suppresses castration-resistant prostate cancer (CRPC) progression (Song et al. 2019a).Our findings were validated by another expression profiling study (Endo et al. 2009), and the complexity of functions of FOXA1 still calls for further Manuscript to be reviewed research to fully interpret its role in PCa development and progression.
EPCAM encodes epithelial cell adhesion molecule, also known as CD326, is usually high expressed in cancer tissues and participates in intercellular adhesion limitation, cell signaling, migration, proliferation and differentiation (Maetzel et al. 2009).Consistent with our results, EPCAM is shown to be overexpressed in localized and metastatic PCa (Massoner et al. 2014).
Other integrated bioinformatic analysis using 4 GEO datasets also validated our findings (Lu 2019).
When it comes to validation of the identified hub genes, 3 of them (CDH1, FOXA1, and EPCAM) were validated by other studies using relevant GEO datasets; 4 genes (except KLK2) were validated by TCGA data; and all 5 genes were validated by the Oncomine data.This implied their potentials as effective and reliable biomarkers for diagnosis and as possible therapeutic targets.However, in survival analysis using TCGA cohorts, all hub genes turned out to have insignificant impact on PCa patients' overall survival (OS).To add readability, we put the top 10 upregulated and downregulated DEGs into the survival analysis, outputting 2 significant genes, namely ABCC4 and SLPI.ABCC4 (ATP-cassette binding protein 4), which was significantly upregulated as our results revealed, its relatively higher expression was associated with poorer OS of PCa patients.This negative effect of ABCC4 expression towards PCa might result from that ABCC4 could decrease the efficacy of docetaxel in treating PCa cells (Oprea-Lager et al. 2013).
Conversely, the relatively higher level of downregulated DEG SLPI (Stomatin-like protein 1) was related to better OS of PCa patients.Not much studies focused on SLPI.Intriguingly, it is revealed that overexpression of SLPI stabilizes F-box protein Fbw7-γ by inhibiting its degradation, which Manuscript to be reviewed is implicated in the degradation of oncogene c-Myc (Zhang et al. 2012).So SLPI might impact negatively on cell proliferation of PCa through the above mechanism.Therefore, these 2 genes might hopefully serve as prognostic indicators for PCa patients.
We also conducted a module analysis on the existing PPI network and selected the top 2 significant modules for the ensuing KEGG pathway enrichment analysis of the genes contained.
The results demonstrated that the genes in Module 1 are mainly enriched in adherens junction and Hippo signaling pathway; while the genes in Module 2 are mainly enriched in fatty acid metabolism, fatty acid biosynthesis, PPAR signaling pathway, and fatty acid degradation.
Apparently, the modules KEGG analyses were quite conformed to the overall KEGG results.The FA metabolism related pathways, PPAR signaling pathway, and cell adhesion-related hub genes and pathways were already discussed above.It was demonstrated that he activation of Hippo signaling pathway regulates cell growth and proliferation by inhibiting YAP and TAZ transcription co-activators, and its dysregulation impacts significantly on development of cancers including colon, liver, breast, lung, ovary, and prostate (Salem & Hansen 2019;Yu et al. 2015).

Conclusions
By bioinformatic analyses including GO/KEGG enrichment, PPI network, hub gene identification, and module analysis, the current study validated KLK3, CDHI, FOXA1, and EPCAM might potentially serve as effective and reliable molecular biomarkers for diagnosis of PCa; and ABCC4 and SLPI might be utilized as prognostic indicators of PCa.However, further basic and clinical researches are necessary for the verification of the clinical value of our findings.Manuscript to be reviewed Manuscript to be reviewed An overview of mRNA levels of hub genes in various types of cancer based on Oncomine data.
The numbers in colored grids shows the counts of datasets with statistically significant mRNA overexpression (red) or low expression (blue) of genes.Grid color was determined by the best gene rank percentile for analysis within the grids.The threshold was set as gene rank percentile = All, p = 0.05, and FC = 2. Manuscript to be reviewed Manuscript to be reviewed Manuscript to be reviewed Manuscript to be reviewed Manuscript to be reviewed Top 5 hub genes in the protein-protein interaction (PPI) network.

Figure 5
Figure 5 . PPI relationships of DEGs was analyzed and the corresponding PPI network was constructed by the Search Tool for the Retrieval of Interacting

Table 2 (on next page)
Top 15 enriched gene ontology terms of the upregulated and downregulated DEGs.

Table 3 (on next page)
Top 5 enriched pathways of the upregulated and downregulated DEGs.