Bioinformatics analysis for the identification of differentially expressed genes and related signaling pathways in H. pylori-CagA transfected gastric cancer cells

Aim Helicobacter pylori cytotoxin-associated gene A (CagA) protein is an important virulence factor known to promote gastric cancer development. However, the molecular events underlying CagA-induced carcinogenesis remain unclear. Here, we applied integrated bioinformatics to identify the key genes involved in CagA-induced gastric epithelial cell inflammation and canceration and to understand the potential molecular mechanisms involved. Materials and Methods AGS cells were transfected with pcDNA3.1 or pcDNA3.1::CagA for 24 h. The transfected cells were subjected to transcriptome sequencing to obtain the expressed genes. Differentially expressed genes (DEG) with adjusted P value < 0.05 and |logFC| > 2 were screened, and R packages were applied for gene ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. A protein–protein interaction (PPI) network of the DEG was constructed using the STRING database and visualized in Cytoscape to identify key functional networks and key genes. Next, the Kaplan–Meier plotter survival analysis tool was used to assess the prognostic value of the key genes derived from the PPI network. Expression of the key genes in gastric cancer and normal tissues was further analyzed using The Cancer Genome Atlas (TCGA) database and verified by RT-qPCR. Results After transfection, AGS cells adopted the elongated 'hummingbird' morphology and the level of CagA phosphorylation increased. Transcriptomics identified 6882 DEG, of which 4052 were upregulated and 2830 were downregulated. After screening with q-value < 0.05 and a fold-change cutoff of 2, 1062 DEG were retained, of which 594 were upregulated and 468 were downregulated. The DEG were enriched in a total of 151 biological process, 56 cellular component, and 40 molecular function terms. KEGG pathway analysis revealed that the DEG were involved in 21 pathways.
The PPI network analysis revealed three highly interconnected clusters. In addition, the 30 DEG with the highest degree were analyzed in the TCGA database. As a result, 12 DEG were found to be highly expressed in gastric cancer, while seven DEG were related to poor prognosis in gastric cancer. RT-qPCR verification showed that Helicobacter pylori CagA caused upregulation of BPTF, CASP3 (caspase-3), CDH1, CTNNB1, and POLR2A expression. Conclusion The current comprehensive analysis provides new insights into the effect of CagA in human gastric cancer, which could help us understand the molecular mechanism underlying the occurrence and development of gastric cancer caused by Helicobacter pylori.

and EPIYA-D are preferentially phosphorylated in combination in Western CagA and East Asian CagA, respectively. Therefore, there may be a stepwise event in which EPIYA-C or EPIYA-D is phosphorylated by Src family kinases (SFKs) at the start of an infection, followed by phosphorylation of EPIYA-A or EPIYA-B by c-Abl at a later time. Tyrosine-phosphorylated CagA binds to and deregulates SHP2, the pro-oncogenic protein tyrosine phosphatase (PTPase) involved in regulating cell growth, motility, and morphology. East Asian CagA exhibits a stronger ability to bind and deregulate SHP2, and a greater capability to induce SHP2-dependent morphological changes in gastric epithelial cells, than Western CagA. Collectively, these findings reveal that the East Asian CagA-specific EPIpYA-D motif differs qualitatively from the Western CagA-specific EPIpYA-C motif in the biological activity required to deregulate the SHP2 oncoprotein, which may account for the higher incidence of gastric cancer in East Asian than in Western countries (Takahashi-Kanemitsu, Knight & Hatakeyama, 2020). CagA affects cell proliferation and apoptosis through multiple regulatory and signaling pathways, ultimately promoting gastric mucosal carcinogenesis (Takahashi-Kanemitsu, Knight & Hatakeyama, 2020).
Past studies have demonstrated that non-physiological scaffolding by CagA in cells promotes the malignant transformation of normal cells by conferring multiple cancer-associated phenotypes on them. CagA's in vivo carcinogenic activity is further enhanced in chronic inflammation. Because H. pylori infection triggers a pro-inflammatory response in host cells, the resulting feed-forward loop enhances the carcinogenic effects of CagA and causes inflammation in the gastric mucosa into which CagA is injected. Because these aspects still need clarification, we set out to explore the molecular mechanisms by which CagA acts on gastric epithelial cells and to seek effective molecular targets that could provide a basis for the early clinical diagnosis, prevention, and treatment of gastric cancer (Cover, 2016). To this end, we applied integrated bioinformatics to identify the key genes involved in CagA-induced gastric epithelial cell inflammation and canceration and to understand the potential molecular mechanisms involved.

pcDNA3.1::CagA plasmid vector transfection of AGS cells
The CagA plasmid pcDNA3.1(+)/cagA and the empty vector pcDNA3.1(+)/EGFP were purchased from Nobel Biotech (Shanghai, China). AGS cells were obtained from the ATCC and cultured in RPMI-1640 medium (Gibco, Grand Island, NY, USA) supplemented with 10% heat-inactivated fetal bovine serum (Gibco), 100 U/ml penicillin, and 100 µg/ml streptomycin at 37 °C in a humidified incubator (NSE, Brunswick, NJ, USA) containing 5% CO2. AGS cells were seeded in 6-well plates at a density of 5 × 10⁶ cells/well and grown to 60-70% confluence. The cells were then transfected with 3 µg plasmid and 5 µl Lipofectamine 2000 (Invitrogen, USA) in 125 µl Opti-MEM (Gibco, USA), followed by the addition of 1,875 µl Opti-MEM, according to the manufacturer's instructions. After 24 h, transfection efficiency was evaluated under a fluorescence microscope and the relevant cell samples were collected. CagA expression was verified by western blotting.
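The per-well transfection recipe above (3 µg plasmid, 5 µl Lipofectamine 2000, 125 µl Opti-MEM, then a 1,875 µl Opti-MEM top-up) scales linearly with the number of wells. The Python sketch below is an illustrative helper, not part of the original protocol. It multiplies the stated per-well amounts for a plate. The `overage` fraction for pipetting loss is a hypothetical parameter that the protocol does not specify.

```python
# Per-well amounts taken from the transfection protocol text.
PER_WELL = {
    "plasmid_ug": 3.0,
    "lipofectamine_2000_ul": 5.0,
    "opti_mem_complex_ul": 125.0,
    "opti_mem_topup_ul": 1875.0,
}


def transfection_master_mix(n_wells: int, overage: float = 0.0) -> dict:
    """Scale the per-well recipe to n_wells.

    `overage` is a hypothetical extra fraction (e.g. 0.1 for 10%) to cover
    pipetting loss; the original protocol does not specify one.
    """
    if n_wells < 1:
        raise ValueError("n_wells must be at least 1")
    factor = n_wells * (1.0 + overage)
    return {name: amount * factor for name, amount in PER_WELL.items()}


if __name__ == "__main__":
    # One 6-well plate without overage.
    for reagent, amount in transfection_master_mix(6).items():
        print(f"{reagent}: {amount:g}")
```

Keeping the DNA-to-reagent ratio fixed (3 µg : 5 µl) is the design point. Only the number of wells changes the totals.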

Differential gene collection and screening
The vectors pcDNA3.1::CagA and pcDNA3.1 were separately transfected into AGS cells. After 24 h, cell samples were collected and sent to Novogene (Beijing, China) for transcriptome sequencing to identify the genes differentially expressed between the two groups. Genes with an adjusted P < 0.05 and |logFC| > 2 were considered significantly differentially expressed.
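The screening step can be sketched as follows. The sequencing provider's pipeline is not described, so this is a minimal Python illustration under stated assumptions. Each gene is assumed to come with a raw P value and a log2 fold change (logFC is assumed to be base 2, as in common RNA-seq tools). P values are adjusted with the Benjamini-Hochberg procedure, which is a common choice but is not explicitly named in the text. Gene names in the example are placeholders.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def screen_deg(
    genes: Sequence[Tuple[str, float, float]],
    padj_cutoff: float = 0.05,
    lfc_cutoff: float = 2.0,
) -> Tuple[List[str], List[str]]:
    """Split genes given as (name, raw_p, log2fc) into up- and downregulated DEG.

    A gene passes if adjusted P < padj_cutoff and |log2fc| > lfc_cutoff.
    """
    padj = benjamini_hochberg([p for _, p, _ in genes])
    up, down = [], []
    for (name, _, lfc), q in zip(genes, padj):
        if q < padj_cutoff and abs(lfc) > lfc_cutoff:
            (up if lfc > 0 else down).append(name)
    return up, down


if __name__ == "__main__":
    toy = [  # placeholder genes, not data from this study
        ("GENE_A", 0.001, 3.0),
        ("GENE_B", 0.01, -2.5),
        ("GENE_C", 0.2, 2.1),
        ("GENE_D", 0.5, 4.0),
        ("GENE_E", 0.001, 1.0),
    ]
    up, down = screen_deg(toy)
    print("up:", up, "down:", down)
```

In the toy data, GENE_D has a large fold change but fails the adjusted-P filter. GENE_E is significant but fails the fold-change filter. This illustrates why both criteria are applied together.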

Analysis of gene ontology (GO) enrichment and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway of differentially expressed genes (DEG)
GO enrichment analysis is a commonly used method for large-scale functional analysis, in which gene functions are classified into biological processes (BP), molecular functions (MF), and cellular components (CC). KEGG is a widely used database that stores information on a large number of genomes, biological pathways, diseases, chemicals, and drugs. We used R packages, together with tools for data visualization and integrated discovery, to perform GO enrichment and KEGG pathway analyses on the DEG in this study, with P < 0.05 considered statistically significant. The R package ggplot2 was used to generate histograms and lists of the results (Tang et al., 2020).
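GO and KEGG over-representation results of this kind are typically based on a one-sided hypergeometric (Fisher-type) test. The specific R package is not named in the text, so the Python sketch below only illustrates the standard test. Given N annotated background genes, K of which belong to a term, and n DEG of which k fall in that term, it returns P(X ≥ k). Real tools also correct for multiple testing across terms, which is omitted here.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P value, P(X >= k).

    N: background genes, K: background genes annotated to the term,
    n: DEG tested, k: DEG annotated to the term.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)
    # Sum the upper tail of the hypergeometric distribution with exact integers.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers for illustration: 1062 DEG (as in this study)
    # against an assumed background of 20,000 genes and a 200-gene term.
    print(enrichment_p_value(k=30, n=1062, K=200, N=20000))
```

Exact integer arithmetic is used so that very large binomial coefficients do not overflow before the final division.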

PPI network construction and key cluster identification
The PPI networks of the DEG were constructed using the STRING database (Gu et al., 2019) (http://string-db.org), a resource of known and predicted protein-protein interactions that is commonly used to assess potential PPI relationships. Briefly, the DEG were mapped to the STRING database, and the resulting PPI networks were visualized with the Cytoscape software (Gu et al., 2019) (https://cytoscape.org/). In the network, each node represents a gene (its encoded protein) and each edge an interaction. Network visualization helps identify the interactions and pathway relationships among the proteins encoded by the DEG in gastric cancer cells. Proteins at central, highly connected nodes may be core proteins or key candidate genes with important physiological regulatory functions. Within the Cytoscape network, the Molecular Complex Detection (MCODE) plug-in was used to identify densely interconnected clusters with the following selection criteria: degree ≥ 2, node score ≥ 0.2, K-core ≥ 2, and max depth = 100 (Li, 2019).
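MCODE itself combines vertex weighting, complex prediction, and post-processing, and its implementation is not reproduced here. As a small, hedged illustration of one of the listed criteria (K-core ≥ 2), the Python sketch below computes the k-core of an undirected network. The k-core is the largest subgraph in which every node has at least k neighbours, and it is found by repeatedly pruning low-degree nodes. The edge list is a placeholder.

```python
from typing import Dict, Iterable, Set, Tuple


def k_core(edges: Iterable[Tuple[str, str]], k: int) -> Set[str]:
    """Return the nodes of the k-core of an undirected graph."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            # Remove the node and detach it from its remaining neighbours.
            for nbr in adj.pop(node):
                adj[nbr].discard(node)
            changed = True
    return set(adj)


if __name__ == "__main__":
    # Placeholder network: a triangle A-B-C plus a pendant node D on A.
    toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
    print(sorted(k_core(toy_edges, 2)))
```

Pruning has to repeat until nothing changes, because removing one node can push its neighbours below the threshold.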

Selection of key genes and analysis of their expression by Hp infection status
The top 30 central genes with the most connections (highest degree) in the PPI network were defined as key genes. Expression of these genes in H. pylori-infected and uninfected tissues was compared using data from the TCGA database (P < 0.05 was considered statistically significant).
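Ranking genes by degree, as described above, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' script. It assumes the network is available as an undirected edge list (for example, exported from STRING or Cytoscape) and counts each unique interaction once. Ties are broken alphabetically, which is a rule the text does not specify. Gene names are placeholders.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_degree_genes(edges: Iterable[Tuple[str, str]], n: int = 30) -> List[Tuple[str, int]]:
    """Return the n genes with the highest degree as (gene, degree) pairs."""
    # Treat (a, b) and (b, a) as the same undirected edge; drop self-loops.
    unique_edges = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree: Counter = Counter()
    for a, b in unique_edges:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; alphabetical order breaks ties.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    toy_edges = [("G1", "G2"), ("G2", "G1"), ("G1", "G3"), ("G1", "G4"), ("G3", "G4")]
    print(top_degree_genes(toy_edges, n=2))
```

Duplicate interactions are collapsed before counting. Otherwise, an interaction listed in both directions would inflate a gene's degree.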

Survival analysis of the key genes
The Kaplan-Meier plotter database (Ma, Zhou & Zheng, 2020) (http://www.kmplot.com) is an online tool that can assess the effect of 54,675 genes on survival across 10,461 cancer samples. We used this database to perform survival analysis (P < 0.05 was considered statistically significant). Functional enrichment analysis was also performed on the genes that were highly expressed in gastric cancer.
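Kaplan-Meier plotter performs the survival computation online. The Python sketch below only illustrates the underlying Kaplan-Meier product-limit estimator, S(t) = ∏ (1 - d_i/n_i) over event times t_i ≤ t, where d_i is the number of deaths at t_i and n_i the number of patients still at risk. It also shows a median split of patients into high- and low-expression groups, which is one common cutoff option. The cutoff actually used in this study is not stated. The log-rank test behind the reported P values is not included, and the patient data are placeholders.

```python
from statistics import median
from typing import Dict, List, Sequence, Set, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate at each distinct event time.

    events[i] is 1 if patient i died at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients (deaths and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def split_by_median(expression: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """Split patients into high (> median) and low (<= median) expression groups."""
    cutoff = median(expression.values())
    high = {pid for pid, value in expression.items() if value > cutoff}
    return high, set(expression) - high


if __name__ == "__main__":
    # Placeholder follow-up times (months) and death indicators.
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients contribute to the at-risk count up to their last follow-up but never reduce the survival estimate themselves.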

RT-qPCR verification
The cagA gene knockout mutant strain Hp/cagA::Cm was constructed by Sangon Biotech (Shanghai, China). AGS cells were seeded in 6-well plates at a density of 5 × 10⁶ cells/well, grown to 60-70% confluence, and then infected with Hp/cagA::Cm or wild-type Hp/cagA+ at a multiplicity of infection (MOI) of 30. After 24 h, the cells were harvested, and BPTF, CASP3, CDH1, CTNNB1, and POLR2A mRNA levels were measured by RT-qPCR.
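The text does not state how relative mRNA levels were calculated. A common approach is the 2^-ΔΔCt method, which assumes roughly 100% amplification efficiency and normalizes each target gene to a reference gene. The Python sketch below implements that method under those assumptions. The reference gene, the test-versus-control assignment, and the Ct values in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(
    target_test: Sequence[float],
    ref_test: Sequence[float],
    target_control: Sequence[float],
    ref_control: Sequence[float],
) -> float:
    """Relative expression (test vs control) by the 2^-ddCt method.

    Each argument holds replicate Ct values; replicates are averaged first.
    """
    delta_test = mean(target_test) - mean(ref_test)  # normalize to reference gene
    delta_control = mean(target_control) - mean(ref_control)
    delta_delta = delta_test - delta_control
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene in wild-type-infected (test)
    # versus cagA-mutant-infected (control) cells.
    fc = fold_change_ddct([22.1, 22.3], [18.0, 18.1], [24.0, 24.2], [18.1, 18.0])
    print(f"fold change: {fc:.2f}")
```

A lower ΔCt in the test group means the target crossed the threshold earlier relative to the reference gene, so a negative ΔΔCt gives a fold change above 1.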

Western blot
Total protein was extracted with a lysis kit according to the manufacturer's instructions and quantified with a BCA protein quantification kit. Proteins were separated by SDS-PAGE electrophoresis for 2 h and transferred to a membrane for 2 h. The membrane was blocked with skimmed milk powder in 1× TBST (0.05% Tween-20) at room temperature for 2 h and incubated overnight at 4 °C with primary antibodies against CagA (1:1,000), p-CagA (1:1,000), and GAPDH (1:5,000). After three 10-min washes with 1× TBST (0.05% Tween-20), the membrane was incubated with secondary antibody (1:10,000) at room temperature for 2 h and washed with 1× TBST as above. A chemiluminescence reagent was then added for color development, and the membrane was exposed and imaged with a chemiluminescence imager.
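The antibody dilutions above translate directly into stock volumes. If 1:D is read as one part stock per D parts of final volume V, the volume of antibody stock is V/D. The Python sketch below applies this to the stated ratios. The 10 ml working volume is a hypothetical example, since the protocol does not give incubation volumes.

```python
# Dilution factors stated in the western blot protocol (1:D).
DILUTIONS = {
    "CagA primary": 1000,
    "p-CagA primary": 1000,
    "GAPDH primary": 5000,
    "secondary": 10000,
}


def stock_volume_ul(final_volume_ml: float, dilution_factor: int) -> float:
    """Microlitres of antibody stock needed for a 1:dilution_factor solution."""
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be positive")
    return final_volume_ml * 1000.0 / dilution_factor


if __name__ == "__main__":
    working_volume_ml = 10.0  # hypothetical volume per membrane
    for antibody, factor in DILUTIONS.items():
        volume = stock_volume_ul(working_volume_ml, factor)
        print(f"{antibody}: {volume:g} µl in {working_volume_ml:g} ml")
```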

Transfection of pcDNA3.1::CagA plasmid into AGS cells and verification by western blotting
Twenty-four hours after transfection with pcDNA3.1::CagA, the transfection efficiency, assessed by fluorescence, was estimated to be >70% (Fig. 1A). Cell morphology was then examined under the microscope. Compared with the control group (Control) and the empty vector group (pcDNA3.1), cells in the cagA transfection group (cagA) showed marked morphological changes. Their shape changed from blunt to elongated, fusiform, spindle-shaped, and irregular, and cell polarity was lost, producing the 'hummingbird' phenotype (Fig. 1B). Western blotting confirmed that CagA and p-CagA proteins were successfully expressed in the pcDNA3.1::cagA-transfected group (Fig. 1C).

GO term enrichment analysis of DEG
GO enrichment analysis was performed on the 1062 DEG using R packages together with tools for data visualization and integrated discovery. The DEG were enriched in 151 BP terms, 56 CC terms, and 40 MF terms. With respect to BP, the DEG were significantly enriched in the mRNA catabolic process, covalent chromatin modification, and histone modification. With respect to CC, they were mainly enriched in focal adhesion, cell-substrate adherens junction, and cell-substrate junction. With respect to MF, they were mainly enriched in cadherin binding, cell adhesion molecule binding, and ubiquitin-protein transferase activity (Fig. 3, Table 2).

KEGG pathway analysis of DEG
KEGG enrichment analysis was performed on the 1062 DEG using R packages together with tools for data visualization and integrated discovery. A total of 21 pathways were enriched, mainly ribosome, ubiquitin-mediated proteolysis, and pathways in cancer (Fig. 4, Table 3).

Construction of PPI network and identification of key genes
STRING and Cytoscape analyses identified a total of 845 DEG in the PPI network (471 upregulated and 374 downregulated), connected by 5,571 edges (Fig. 5A). The three most densely interconnected clusters of the PPI network were identified with the MCODE plug-in. Cluster 1 consisted of 67 nodes and 1,098 edges, and its genes were mainly enriched in the terms "extracellular exosome" and "poly(A) RNA binding". Cluster 2 was composed of 20 nodes and 13 edges, and its genes were mainly enriched in the terms "nuclear-transcribed mRNA catabolic process" and "acetylation". Cluster 3 was composed of 15 nodes and 92 edges. The enrichment results indicated that the genes included in Cluster 3 were mainly enriched in the terms

Key gene expression analysis by Hp infection status
The DEG identified in the PPI network (≥53) were analyzed in the TCGA database to assess their correlation with Helicobacter pylori infection. A total of 14 DEG were highly expressed in Helicobacter pylori-positive cases (P < 0.

Survival analysis of key genes
The Kaplan-Meier plotter platform was used to investigate the prognostic value of the 14 candidate hub genes, using overall survival data from 875 gastric cancer patients. High expression of 7 genes was associated with poor prognosis in gastric cancer (P < 0.05): ATM, BPTF, CDH1, POLR2A, RNP1, RPL30, and RPS27 (Figs. 7A-7G).

DISCUSSION
The development of gastric cancer is an extremely complicated biological process that involves abnormal expression of various tumor-related genes, activation of various tumor-related pathways, and inactivation or silencing of tumor suppressor genes. Indeed, evidence shows that tumors are induced by genetic and epigenetic changes (Belinsky, 2004; Herman & Baylin, 2003; Jones & Baylin, 2002). Helicobacter pylori is closely associated with gastric cancer, and Helicobacter pylori CagA is involved in multiple cellular processes related to carcinogenesis (Hatakeyama, 2017). High-throughput detection technologies, combined with public biological databases such as GO and KEGG, enable systematic genome-wide exploration of DEG (Ma, Zhou & Zheng, 2020) and of the related BP. Bioinformatics therefore provides a good means of understanding the mechanisms underlying the occurrence and development of gastric cancer at the molecular level.
In this study, bioinformatic comparison of the pcDNA3.1::CagA and pcDNA3.1 groups identified 1062 genes with significant differences, of which 594 were upregulated and 468 were downregulated. Functional enrichment revealed that these genes participate in multiple signaling pathways, including the Notch and Wnt signaling pathways. The Notch signaling pathway is a signal transduction system that regulates cell proliferation and apoptosis. It is closely related to cell differentiation, proliferation, apoptosis, adhesion, and epithelial-mesenchymal transition, and it is essential for the normal development of most tissues (Leong & Karsan, 2006; Luo, Renault & Rando, 2005; Maillard, Fang & Pear, 2005; Zanotti & Canalis, 2016). Past studies have demonstrated that this pathway also plays an important role in regulating the cell cycle (Bhattacharya et al., 2017; Herranz & Milán, 2008; Seidel & Kimble, 2015). The Notch pathway is genetically altered in a large number of hematopoietic and solid tumors. Whether it is activated or inhibited depends on the cellular context and the activation status of other potential oncogenic pathways. Several different patterns of aberrant regulation of the pathway and its targets occur in cancer (Ranganathan, Weaver & Capobianco, 2011; Vasquez-Del Carpio et al., 2011; Weaver et al., 2014). These patterns include activating and inactivating mutations, receptor/ligand overexpression, epigenetic regulation, and post-translational modifications (Wang et al., 2007). Wnt proteins are secreted glycoproteins that regulate diverse biological functions (MacDonald, Tamai & He, 2009). Wnt signaling is one of the main regulators of embryonic development, tissue renewal, and regeneration in multicellular organisms (Sidrat et al., 2020; Tepekoy, Akkoyunlu & Demir, 2015).
This signaling pathway controls several aspects of development, including cell proliferation, apoptosis, cell migration, and cell polarity, during the development and maintenance of adult stem cells. Dysregulated cell proliferation and apoptosis are often associated with tumor formation and development (Bordonaro, 2020; Foulquier et al., 2018; Yang et al., 2016). Inappropriate activation of the Wnt pathway is also a major factor in human carcinogenesis (Martin-Orozco et al., 2019); in this study, 13 enriched genes were involved in this pathway. PPI network analysis yielded an interaction network of 845 genes, and the three most highly interconnected clusters were analyzed with the MCODE plug-in. Cluster 1 genes were mainly associated with extracellular exosomes, cluster 2 genes mainly participated in nuclear-transcribed mRNA catabolic processes, and cluster 3 genes were mainly involved in transcription. Previous studies have demonstrated that extracellular exosomes are involved in tumor development. The GO enrichment results for these clusters indicate a partial relationship to tumors, suggesting that signaling molecules regulated by East Asian strain CagA may participate in the molecular mechanisms of tumor development.
The 30 key genes with the highest degree in the PPI network were analyzed further. TCGA database analysis showed that 14 of them were highly expressed in Helicobacter pylori-positive gastric cancer patients: ATM, BPTF, CDH1, CTNNB1, HSPA8, HDAC1, POLR2A, ISG15, RPL8, RNP1, RPL30, RPS27, RUVBL1, and CASP3. Finally, the Kaplan-Meier plotter tool was used to assess their relationship with patient prognosis. High expression of 7 of these genes, ATM, BPTF, CDH1, POLR2A, RNP1, RPL30, and RPS27, was associated with markedly lower survival and thus with poor prognosis in gastric cancer. Enrichment analysis showed that these 7 genes are related to p53 binding, transcription factor binding, and transcriptional regulation. RT-qPCR verification showed that Helicobacter pylori CagA caused upregulation of only 5 genes: BPTF, CASP3, CDH1, CTNNB1, and POLR2A. Of these, BPTF, CDH1, and POLR2A were also among the genes whose high expression was associated with low survival. Past studies have reported that CDH1 gene mutations are associated with diffuse gastric cancer. CDH1 encodes E-cadherin, a transmembrane cell adhesion molecule involved in the formation of cell junctions and the maintenance of epithelial integrity (Cho et al., 2017; Figueiredo et al., 2019; Li, 2019; Van der Post et al., 2015). CDH1 is involved in mediating cell adhesion, migration, epithelial cell proliferation, and the cell cycle (Han et al., 2019; Pal et al., 2020). Germline mutations in CDH1, which encodes the tumor suppressor protein E-cadherin, are the genetic cause of hereditary diffuse gastric cancer (Van der Post et al., 2015). Among the other prognosis-related genes, BPTF is the core subunit of the nucleosome remodeling factor (NURF) complex and plays an important role in chromatin remodeling.
BPTF can directly activate oncogenic signals or act in concert with other key protein factors, thereby affecting tumor progression (Zhao et al., 2019). Human POLR2A encodes the highly conserved RPB1 protein, the largest of the 12 subunits of the essential enzyme RNA polymerase II (pol II), which is responsible for transcribing all protein-coding genes. Further studies have shown that the sustained release of promoter-bound pol II, truncated RPB1, and a shortened C-terminal domain will affect [...]. Further study is needed to clarify the underlying mechanism. The rise of bioinformatics has accelerated the development of biology, and bioinformatics tools provide opportunities to handle big data that cannot be managed manually (Wroblewski & Peek Jr, 2016).

CONCLUSION
DEG between the H. pylori CagA plasmid group and the empty vector (negative control) group were obtained via high-throughput sequencing and analyzed with R software, Cytoscape, and related databases. First, 1062 statistically significant DEG were identified, of which 594 were upregulated and 468 were downregulated. GO enrichment and KEGG pathway analyses revealed that the DEG were mainly enriched in the Wnt pathway, the Notch pathway, adherens junctions, and other pathways in cancer. To provide a theoretical basis for studying the biological processes of gastric cancer, we constructed a PPI network of the DEG, screened out 30 key genes with a relatively high degree, and further studied the network to understand the interactions among the DEG. Comprehensive analysis of TCGA, RT-qPCR, and Kaplan-Meier plotter data showed that Helicobacter pylori CagA can upregulate BPTF, CDH1, and POLR2A, and that high expression of these genes is associated with poor clinical outcomes. These data suggest that these genes may be induced and regulated by Helicobacter pylori CagA.
These findings help clarify the downstream target genes and signaling pathways regulated by Helicobacter pylori CagA and provide a theoretical basis for studying its mechanism of action. The target genes and signaling pathways identified in this study are related to the occurrence and development of tumors. They provide a starting point for further exploring the basic molecular mechanisms by which Helicobacter pylori CagA regulates target genes and signaling pathways during tumorigenesis.