Integrated bulk and single-cell RNA sequencing identifies an aneuploidy-based gene signature to predict sensitivity of lung adenocarcinoma to traditional chemotherapy drugs and patients’ prognosis
- Published
- Accepted
- Received
- Academic Editor
- Fanglin Guan
- Subject Areas
- Bioinformatics, Molecular Biology, Immunology, Oncology
- Keywords
- Aneuploidy, Bulk RNA sequencing, Single-cell RNA-sequencing, Prognostic signature, Immunotherapy, Lung adenocarcinoma
- Copyright
- © 2024 Wang et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits using, remixing, and building upon the work non-commercially, as long as it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
- Cite this article
- 2024. Integrated bulk and single-cell RNA sequencing identifies an aneuploidy-based gene signature to predict sensitivity of lung adenocarcinoma to traditional chemotherapy drugs and patients’ prognosis. PeerJ 12:e17545 https://doi.org/10.7717/peerj.17545
Abstract
Background
Patients with lung adenocarcinoma (LUAD) often develop a poor prognosis. Currently, researches on prognostic and immunotherapeutic capacity of aneuploidy-related genes in LUAD are limited.
Methods
Genes related to aneuploidy were screened based on bulk RNA sequencing data from public databases using Spearman method. Next, univariate Cox and Lasso regression analyses were performed to establish an aneuploidy-related riskscore (ARS) model. Results derived from bioinformatics analysis were further validated using cellular experiments. In addition, typical LUAD cells were identified by subtype clustering, followed by SCENIC and intercellular communication analyses. Finally, ESTIMATE, ssGSEA and CIBERSORT algorithms were employed to analyze the potential relationship between ARS and tumor immune environment.
Results
A five-gene ARS signature was developed. These genes were abnormally high-expressed in LUAD cell lines, and in particular the high expression of CKS1B promoted the proliferative, migratory and invasive phenotypes of LUAD cell lines. Low ARS group had longer overall survival time, higher degrees of inflammatory infiltration, and could benefit more from receiving immunotherapy. Patients in low ASR group responded more actively to traditional chemotherapy drugs (Erlotinib and Roscovitine). The scRNA-seq analysis annotated 17 cell subpopulations into seven cell clusters. Core transcription factors (TFs) such as CREB3L1 and CEBPD were enriched in high ARS cell group, while TFs such as BCLAF1 and UQCRB were enriched in low ARS cell group. CellChat analysis revealed that high ARS cell groups communicated with immune cells via SPP1 (ITGA4-ITGB1) and MK (MDK-NCl) signaling pathways.
Conclusion
In this research, integrative analysis based on the ARS model provided a potential direction for improving the diagnosis and treatment of LUAD.
Introduction
Lung adenocarcinoma (LUAD), as the main type of lung cancer (Sung et al., 2021; Schabath & Cote, 2019), has become a major cause of cancer-related mortality globally (Siegel et al., 2023). The strong proliferation and invasion ability of LUAD cells is closely related to the poor survival rate of the cancer (Siegel, Miller & Jemal, 2019; Liu et al., 2021). Currently, chemotherapy remains still one of the primary therapy for LUAD but its efficacy is greatly restricted by patients’ resistance (Zhang et al., 2020). For this reason, from 2015, the World Health Organization has encouraged to identify molecular markers to help diagnose LUAD (Nicholson et al., 2022; Thai et al., 2021).
Aneuploidy is a form of abnormality in the number of chromosomes. As massive genetic mutation, aneuploidy enables cells to adjust to changes in internal and external environment, and facilitates tumor occurrence by promoting genetic instability, diversity, and evolution (Ben-David & Amon, 2020). Around half of blood cancers and 90% of solid tumors exhibit different forms of aneuploidy, which has been found to be associated with tumor progression and unfavorable prognosis (Xian et al., 2021; Stopsack et al., 2019). Aneuploidy triggers stress response in tumor cells, initiating or promoting the imbalance of local immune cells to increase the risk of immune escape (Giam & Rancati, 2015). Based on aneuploidy-related genes, Liu et al. (2023b) identified three molecular subtypes of head and neck squamous cell carcinomas and developed a risk model predictive of the prognosis of the cancer. Jia et al. (2021) discovered that a higher aneuploidy score was related to more limited clinical benefits from radiotherapy in non-small cell lung cancer (Yang et al., 2021). Taylor et al. (2018) disclosed loss of chromosome 3p and gains of chromosome arm1q and 3q using lung cancer dataset. However, using aneuploidy-related gene markers to facilitate the diagnosis or treatment of LUAD have been rarely studied.
High-throughput transcriptomics enabled by bulk RNA determines average gene expression profiles of all the cells in a cluster, and signal-cell RNA sequencing (scRNA-seq) clarifies the heterogeneity of cell types (Kodous, Balaiah & Ramanathan, 2023). Recently, scRNA-seq research has been widely applied in many research fields including in the studies of tumor ecosystems (Yan et al., 2021; Hornburg et al., 2021), immune-related diseases (Lin et al., 2022; Wang et al., 2021) and neuroscience (Galiakberova et al., 2023). Analysis on the variations and regulatory mechanism of cell subpopulation specificity at single-cell level can help better understand the pathological and physiological processes underlying LUAD development.
The use of scRNA-seq technology contributes to the development of cancer diagnosis, personalized therapeutic strategies as well as how to combat treatment resistance (Kinker et al., 2020). Thus, in this work, we systematically explored the intra-cancer cell heterogeneity of LUAD samples by integrating bulk and scRNA-seq data. An independent prognostic model was developed based on the foundation of aneuploidy-related prognostic genes from bulk transcriptome data of LUAD. The prognostic significance of the signature was confirmed using separate queues. The regulation of malignant phenotype of LUAD by the model genes was confirmed with cellular experiments. Immunotherapy for different LUAD patients and sensitive traditional chemotherapy drugs were analyzed. A total of seven cell clusters were annotated using scRNA-seq data, and five of them were immunocytes. Based on the aneuploidy-related riskscore (ARS) model, tumor subpopulations were stratified into high and low ARS cell groups. Regulons based on TF-target and communication signaling pathways between cells differed between the two cell groups. In short, the current discoveries provided a novel direction for the diagnosis and treatment of LUAD.
Materials and Methods
Study source and data preprocessing
A pan-cancer genomic aneuploidy score was downloaded from a previous study (Taylor et al., 2018). After excluding the aneuploidy score of other cancer samples, The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) samples with aneuploidy score were retained for further study (Table S1). Subsequently, the RNA-Seq data of TCGA-LUAD were acquired from the The Cancer Genoma Atlas GDC application programming interface (API) website. For validation, GSE31210 (Okayama et al., 2012) and GSE30219 (Rousseaux et al., 2013) data were downloaded from the GEO database. For TCGA-LUAD samples, samples with clinical information and survival time >0 days were retained but those without survival status were removed. Finally, a total of 483 LUAD samples were compiled for further analysis (Table S2). For GEO data, samples without survival status were deleted, and mean expression value of the genes was adopted when multiple probes matched to one gene. All the data used in this article were derived from publicly available databases without access restriction and therefore did not require additional ethical approval. Data collection and analysis were performed strictly following the relevant rules and regulations.
Selection of differentially expressed genes (DEGs) related to aneuploidy and functional annotation
Differential gene analysis between tumor and adjacent tissue samples was conducted using the TCGA-LUAD database. False discovery rate (FDR) <0.05 and |log2FC| > 1 were the screening criteria. At the same time, the correlation between DEGs and aneuploidy was analyzed using Spearman method (R > 0.3 and p < 0.05). Finally, the clusterprofiler package (Yu et al., 2012) was employed to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis on these aneuploidy-related DEGs.
Developing a riskscore model based on aneuploidy-related genes
The TCGA dataset and the GEO dataset served as a training and a validation set, respectively. Prognostic genes for LUAD were screened from all the aneuploidy-related DEGs using univariate Cox analysis, followed by performing Lasso regression to shrink the number of genes in the risk model. The risk score for each patient was calculated using the following Eq. (1):
(1) where Expi represented the expression of prognosis-related gene, β represented the Cox regression coefficient of the corresponding gene.
Samples with ARS score > 0 after zscore transformation were assigned into high-ARS group, while those with ARS score < 0 were assigned into low-ARS group. Survival curves were plotted using Kaplan-Meier for prognosis analysis. Additionally, the log-rank test was used to assess the significance of the differences.
Immune landscape analysis
To comprehensively assess the immune cell composition and activity in LUAD, we used two complementary computational approaches (ssGSEA and CIBERSORT). Based on the reference dataset from Charoentong et al.’s (2017) study ssGSEA was used to calculate the score of 28 immune cells in each sample. In addition, CIBERSORT is a more precise quantification method that estimates the proportion of 22 immune cells using a predefined set of immune cell-specific marker genes (Chen et al., 2018).
Immunotherapy and drug sensitivity evaluation
An immunotherapy cohort (IMvigor210) was included to evaluate the influence of ARS on predicting immunotherapy sensitivity. The IMvigor210 cohort of patients demonstrated different degrees of responses to PD-L1 receptor blockers, including progressive disease (PD), stable disease (SD), partial response (PR), and complete response (CR). In addition, pRRophic package (Geeleher, Cox & Huang, 2014) was adopted to estimate potential sensitive chemotherapy drugs between two ARS groups in the TCGA queue. A lower half maximal inhibitory concentration (IC50) value indicated a higher sensitivity to drugs.
Single-cell RNA sequencing analysis of LUAD samples
The GSE203360 dataset (He et al., 2022) including six LUAD samples was used for scRNA sequencing analysis. The R package Seurat (Stuart et al., 2019) (v4.0.4) was adopted for following analysis. Cells were primarily filtered based on the criteria that each gene expressed at least in three cells and each cell contained more than 200 expressed genes. Cells with mitochondrial genes >25% were excluded to delete aging cells with limited biological significance. Subsequently, dimensionality reduction clustering analysis was performed employing principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), FindNeighbor and FindCluster function. Based on cell markers from the CellMarker2.0 website, initial subpopulations were reclassified according to high-expressed marker genes. The DEGs in annotated subgroups were screened by FindAllMarkers function, followed by KEGG enrichment analysis.
Screening of differential regulons and cell communications in tumor cells divided by ARS
Cancer cells were extracted from scRNA-seq data. The ARS score of each cell was calculated using the Eq. (2). Thereafter, these cancer cells were stratified into high and low ARS cell groups under the same grouping criteria used in TCGA-LUAD queue. The human RcisTarget database was downloaded from website (https://resources.aertslab.org/cistarget/). R package “SCENIC” (Van de Sande et al., 2020) was employed for the development of TF regulatory networks. A potential TF-target regulatory relationship was established first and then motif analysis was used to eliminate genes indirectly regulated by TF. A regulon is a set of TFs and their directly regulated target genes. The connections among the regulons were analyzed using connection specific indicators (CSI). CellChat software (Jin et al., 2021), a free online R package (https://GitHub.com/sqjin/CellChat), was applied to infer intercellular communications and select predominant receptor–ligand pairs.
Cell culture and transient transfection
Human LUAD cell lines including H1299, A549 and human normal bronchial epithelial cell BEAS-2B were commercially obtained from BNCC (Beijing, China). DMEM F12 with 10% FBS (Gibco, Thermo Fisher, Massachusetts, USA) was used for culturing A549 and BEAS-2B cells in a humidified environment at 37 °C with 5% CO2. The H1299, A549 cell lines were transfected with CKS1B siRNA (Dharmacon, Lafayette, CO, USA) utilizing Lipofectamine RNAi Max (Invitrogen, Waltham, MA, USA). The target sequence for CKS1B siRNA was CGCACAAACAAATTTACTATTCG (5′–3′).
QRT-PCR
Total RNA was separated using TRIzol (Thermo Fisher, Waltham, MA, USA) reagent. QRT-PCR was conducted on the extracted RNA from each sample (2 μg) using FastStart Universal SYBR Green Master on a LightCycler 480 PCR System (Roche, Switzerland, USA). The cDNA consisted of a total reaction volume of 20 μl (an appropriate water volume, 0.5 μl of forward and reverse primers, 2 μl of cDNA template, and 10 μl of PCR mixture) and served as a template. PCR reaction was reacted following the cycling conditions that began with initial DNA denaturation phase at 95 °C for 30 s, followed by 45 cycles at 94 °C for 15 s, at 56 °C for 30 s, and at 72 °C for 20 s. Each sample was run in triplicate. Threshold cycle (CT) data were obtained and standardized to the level of GAPDH applying the 2−ΔΔCT method. The mRNA expression in both tumor and normal control tissues was compared. The sequences of primer pairs for the genes were listed in Table 1.
Gene | Forward primer sequence (5′-3′) | Reverse primer sequence (5′-3′) |
---|---|---|
PLK1 | GCACAGTGTCAATGCCTCCAAG | GCCGTACTTGTCCGAATAGTCC |
ANLN | CAGACAGTTCCATCCAAGGGAG | CTTGACAACGCTCTCCAAAGCG |
HMMR | GGCTGGGAAAAATGCAGAGGATG | CCTTTAGTGCTGACTTGGTCTGC |
CKS1B | GGAATCTTGGCGTTCAGCAGAG | GAGGCTGAAAAGTAGCTTGCCAG |
UBE2S | CGATGGCATCAAGGTCTTTCCC | CAGCAGGAGTTTCATGCGGAAC |
GAPDH | GTCTCCTCTGACTTCAACAGCG | ACCACCCTGTTGCTGTAGCCAA |
Transwell assay
Migration and invasion of H1299 and A549 cell lines were detected by Transwell assays. Briefly, chambers coated (for invasion) or uncoated with Matrigel (BD Biosciences, Franklin Lakes, NJ, USA) (for migration) were first inoculated with cells (5 × 104). Serum-free medium and complete DMEM medium were added to the upper and lower layer, respectively. After incubation for 24 h, 4% paraformaldehyde was used for fixing migrating or invading cells, followed by cell dyeing using crystalline violet (0.1%).
Cell viability
Cell viability was detected by performing cell counting kit-8 assay (Beyotime, Beijing, China). Specifically, the cells were different treatments were cultured in 96-well plates at a density of 1 × 103 cells/well and added with CCK-8 solution at indicated time points. After 2-h incubation at 37 °C, a microplate reader (Thermo Fisher, Waltham, MA, USA) was utilized to measure the O.D 450 values of each well.
Statistical analysis
All analysis were conducted utilizing R package software (version 4). Survival data were summarized using Kaplan–Meier approach, and statistical significance was detected by the log-rank test. The time receiver operating characteristic (ROC) curves were adopted to evaluate the prediction ability of the model. Student t-tests were used to determine statistical significance with a one-way ANOVA used for comparative testing. A p < 0.05 was thought to be statistically significant. SangerBox (http://sangerbox.com/) provided analytical assistance (Shen et al., 2022).
Results
DEGs related to aneuploidy and functional annotation
Firstly, RNA-seq data of the TCGA-LUAD dataset were used to screen differential genes (FDR < 0.05 and |log2FC| > 1) between tumor and para-cancerous tissue samples and a total of 3,316 DEGs were obtained for subsequent analysis (Fig. 1A). Correlation analysis detected a significant correlation between 182 DEGs and AS, and these 182 DEGs were high-expressed in tumor tissue (Fig. 1B). Further gene functional enrichment analysis showed that these 182 DEGs were enriched mainly in DNA replication and cell cycle-related pathways (Figs. 1C–1F).
Establishment and assessment of the riskscore model based on the aneuploidy-related genes
Univariate Cox analysis identified 125 genes with significant prognostic impact (p < 0.01) (Table S3). Lasso cox regression in R software package glmnet (Simon et al., 2011) was used to reduce the gene number. Ten-fold cross validation was applied to develop the model. The model was the optimal when penalty parameter lambda was 0.038 (Figs. 2A and 2B), therefore, five genes at lambda = 0.038 were chosen as key prognostic genes. The coefficient of the corresponding gene was shown in Fig. 2C. Each patient’s riskscore was calculated following Eq. (2).
(2)
Based on Eq. (2), samples in TCGA-LUAD queue were separated into high and low ARS groups. It could be observed that the OS time in low ARS group was longer than that in high-ARS group (p < 0.0001, Fig. 2D). We observed that in the TCGA cohort, the area under the ROC curve (AUC) was 0.68, 0.65, and 0.67 for 1-, 3-, and 5-year OS, respectively (Fig. 2D). The five key genes were all overexpressed in tumor tissues (Fig. 2D). The model was confirmed using GSE31210 and GSE30219 datasets (Figs. 2E and 2F). Specifically, patients in the GSE31210 cohort had AUC values of 0.63, 0.68, and 0.74 for 1-, 3-, and 5-year OS, respectively, while those in the GSE30219 cohort had AUC values of 0.77, 0.74, and 0.75 for 1-, 3-, and 5-year OS, respectively.
The ARS model was an independent indicator for LUAD prognosis
To verify the applicability of the five-gene ARS signature in clinical setting, COX regression was performed. In the TCGA cohort, staging and risk scores and patients’ OS were significantly correlated (p < 0.001) (Fig. S1A). Multivariate analysis showed that staging and risk score were also independent predictors (Fig. S1B). Subsequently, similar results were found in the GSE30219 cohort, and age, staging and risk score were also the independent predictors of LUAD prognosis (p < 0.001) (Figs. S1D, S1E). Importantly, the nomogram all showed the same predictive function (Figs. S1C, S1F). These results suggested that the risk model developed based on these two datasets was a favorable tool for predicting the prognostic outcomes of LUAD patients, and that both staging and risk scores were independent prognostic factors.
CKS1B was a characteristic gene in the ARS to promote the malignant phenotypes of LUAD
To validate the reliability of the ARS model in predicting the prognosis of LUAD and the correlation between the model genes and LUAD progression, we measured the expression of PLK1, ANLN, HMMR, CKS1B and UBE2S in LUAD cell lines H1299 and A549 and bronchial normal epithelial cells BEAS-2B using PCR. Consistent with the bioinformatics results, the level of PLK1, ANLN, HMMR, CKS1B and UBE2S was elevated in the H1299 and A549 cell lines, in particular CKS1B was significantly upregulated in tumor cell lines (Figs. 3A–3E). Next, changes in migration and invasive of H1299 and A549 after inhibiting the expression of CKS1B in the two cell lines were examined. We found that CKS1B expression promoted the development of LUAD, and that the migration and invasion abilities of the two cell lines were significantly suppressed after inhibition of CKS1B (Figs. 3F and 3G). In addition, silencing CKS1B expression affected the viability of H1299 and A549 cell lines relative to controls (Figs. 3H and 3I). The above results demonstrated the reliability of using ARS model to predict the prognostic survival of LUAD, and in particular, the model genes significantly contributed to the malignant phenotype of LUAD.
Analysis of immune cell infiltration in the two cohorts
As shown in Fig. 4A, most of immune cells including T follicular helper cell, central memory CD4 T cell, immature B cell, activated B cell, immature, activated and plasmacytoid dendritic cell, eosinophil cell, macrophage, type 17 T helper cell, mast cell, monocyte and natural killer cell had higher score in low ARS group (p < 0.05), while few immunocytes such as activated, effector memory CD4 T cell, memory B cell, Type 2 T helper cell, gamma delta T cell, natural killer T cell and neutrophil were enriched in high ARS group. CIBERSORT was used to perform a more detailed assessment on different states of the immune cells. It could be found that low ARS group had obviously a higher proportion of T cells CD4 memory resting, Tregs, dendritic cells resting, and mast cells resting (p < 0.01), whereas that of T cells CD4 memory activated and macrophages at M0 and M1 stage were noticeably reduced (p < 0.0001, Fig. 4B).
The responsiveness of ARS model to immunotherapy and drug sensibility
Immunotherapy is a treatment that fights against cancer with synergistic survival benefits (Curran et al., 2010). Herein, we tested the prognostic significance of ARS model in immunotherapy dataset IMvigor210 cohort. Kaplan-Meier curves displayed that patient with a higher ARS had shorter OS (p < 0.05, Fig. 5A). Among them, SD/PD patients exhibited a higher ARS than CR/PR patients (p < 0.05, Fig. 5B). In high ARS group, the proportion of SD/PD was higher than that in the low ARS group (Fig. 5C). Applying pRRophic package, we further predicted potential sensitive drugs for different patients and found that patients in high ARS group were more sensitive to cosplatin, BI-2536, pyrimethamine and GW843682X, whereas those in low ARS group were more responsive to erlotinib and roscovitine (Figs. 5D and 5E).
Single-cell landscape of LUAD samples
He et al. (2022) generated scRNA-seq count matrix. PCA and t-SNE dimension reduction were performed based on six specimens derived from invasive adenocarcinoma patients. A total of 17 subpopulations were obtained applying FindNeighbor and FindCluster (Dim = 30, Resolution = 0.5) and reclassified according to the expression of cell markers in the subpopulations (Fig. S2). Finally, seven cell types were determined and denoted as alveolar epithelial cell, B cells, cancer cells, mast cells, monocytes/macrophages, myeloid dendritic cells and T cells (Figs. 6A and 6B). By using the FindAllMarkers function under the screening criteria of logfc = 0.5 (multiple of differences) and Minpct = 0.25 (smallest expression ratio of differential genes), the DEGs in the seven cell types were retained and the top five genes were shown in a heatmap (Fig. 6C). Subsequently, to explore the function of these cell subpopulations, we selected the top five genes for enrichment analysis. Figure 6D showed the enrichment of marker genes in different pathways in different cell types. Among them, monocytes/macrophages were mainly involved in pathways such as phagosome and lysosome. B and T cells were also predominantly enriched in the pathway of protein processing in endoplasmic reticulum. These results may provide insight into the complex biology within LUAD and offer potential targets for further improve personalized treatment for LUAD patients.
TF-target connection and function analysis
Applying SCENIC algorithm and CSI, key regulons were identified and clustered. As seen in Fig. 7A, regulons with similar CSI were clustered together. Regulons with analogous high CSI may jointly regulate downstream genes and were involved in cell functions. Next, we calculated the regulon specificity score (RSS) for the two ARS cell groups. The horizontal and vertical axes represented the ranking of the regulons and the RSS, respectively. The top five TFs in the ranking were represented by red dots. A higher RSS indicated a closer relationship between the regulons and cell type specificity. As shown in Figs. 7B and 7C, regulons such as CREB3L1, JUN_extended, CEBPD, FOS_extended and JUNB_extended were enriched in high ARS cell group. Nevertheless, the regulons including BCLAF1_extended, TAF7_extended, RAD21_extended, BCLAF1 and UQCRB ranked the top 5 in low ARS cell group.
CellChat identified communication patterns between immune cells and LUAD cells
Cell communication analysis helps to analyze the interaction between cells, explore the tumor immune microenvironment and dig potential therapeutic targets for diseases (Jin et al., 2021). As mentioned above, high and low ARS groups had different infiltration status of immune cells. Herein, we investigated the communication between immune cells and LUAD cells using scRNA-seq analysis. The results demonstrated that the cancer cells in high ARS group could receive signals from immune cells through the SPP1 signaling pathway via ligand receptor pair ITGA4-ITGB1 (Fig. 8A). As shown in Fig. 8B, tumor cells communicated with other immune cells through the MK signaling pathway via ligand receptor pair MDK-NCl. Moreover, high ARS group had thicker lines, which indicated stronger interaction signals. Communications through GDF (GDF15-TGFBR2) and ECF (AREG-EGRF) signaling pathway between high and low ARS cell groups were also found (Figs. 8C and 8D). These results demonstrated that ARS helped to detect an enhanced tumor-immune system interplay in LUAD.
Discussion
LUAD is represented by highly inflammatory infiltration (Luo et al., 2020; Zheng et al., 2023). The early asymptomatic characteristics of LUAD often leads to a late diagnosis of most LUAD patients as well as a poor cancer prognosis. With the development of surgical and drug-assisted therapies, novel and effective diagnostic methods for heterogeneous LUAD patients should be created to improve the survival outcomes of LUAD.
In this study, we focused on developing, establishing, and verifying the aneuploidy-related gene signature by integrating bulk and scRNA-seq data to thoroughly analyze the role of aneuploidy in the diagnosis and treatment of LUAD. The predictive model utilizing a range of functional genes has emerged as a focal point in directing the prognosis of tumors. Wu et al. (2022) constructed a prognostic model based on necroptosis-related genes and the area of the ROC curve of this model in the training cohort was 0.73, 0.7, and 0.67, respectively. In addition, (Guo & Liu, 2022) obtained the Transient receptor potential channel (TRP) gene set based on the MSigDB database and constructed a related risk model based on TRP. To assess the validity of the model, the results showed that the areas of the ROC curves in the training cohort at 1, 3 and 5 years were 0.747, 0.663 and 0.654, respectively (Guo & Liu, 2022). In this study, we constructed an ARS-related prognostic model in which the AUC values for the training set were 0.68, 0.65, and 0.67, respectively. In this study, we constructed an ARS-related prognostic model in which the AUC values were 0.68, 0.65, and 0.67 for the TCGA cohort and 0.77, 0.74, and 0.75 for the GSE30219 validation cohort. This result suggests that the predictive power of our constructed model is also reliable.
Through differential gene analysis, COX regression, Pearson analysis and Lasso method, five genes (PLK1, ANLN, HMMR, CKS1B and UBE2S) were screened to construct an ARS model. We found that patients in high ARS group displayed shorter OS time, which was in agreement with a previous research (Zhang & Zhao, 2023). At the same time, the risk model was validated and proven to be an independent indicator for the prognosis of LUAD. These five genes were all overexpressed risk factors in tumor tissues in comparison with para-cancerous tissues. Previous researches on these genes mainly have their focuses on the potential to serve as diagnostic biomarkers and the mechanisms of action. Polo-like kinase 1 (PLK1), an evolutionarily conserved Ser/Thr kinase, has the function to regulate cell cycle, and the imbalance of PLK1 could lead to chromosomal instability and aneuploidy (Gheghiani et al., 2021). PLK1 is high-expressed in various cancers and is related to unfavorable prognosis (Galusic et al., 2020). Increasing evidence suggested that PLK1 regulates several critical TFs in cancers and facilitates cell proliferation, metastasis, epithelial mesenchymal transition (Moore, Gheghiani & Fu, 2023; Iliaki, Beyaert & Afonina, 2021). These findings indicated that PLK1 may be a promising target for cancer treatment. Currently, many PLK1 inhibitors have been developed for disease treatments. An intriguing observation in our research was that patients in high ARS group were more sensitive to PLK1 inhibitors (BI-2536 and GW843682X). This finding further supported the potential of PLK1 to serve as a diagnostic and therapeutic marker for LUAD. ANLN is an actin-binding protein that is often overexpressed to cause a poor prognosis (Long et al., 2018) and metastasis (Xu et al., 2019) in LUAD, but it could be suppressed by miR-30a-5p (Deng et al., 2021). Hence, ANLN has been frequently selected as a hub gene to build riskscore signature to evaluate the prognosis in LUAD (Luo et al., 2020; Zhang & Zhao, 2023). This suggested that these two genes could promote metastasis and development of LUAD cancer cells.
Hyaluronan-mediated motility receptor (HMMR) plays an important role in cancer progression. For example, overexpressed HMMR in mouse mammary gland epithelium increases the occurrence of BRCA1-mutant tumors, which is directly related to hereditary breast cancer (Mateo et al., 2022), while inhibition on HMMR expression could abrogate peritoneal metastasis of gastric cancer cells (Yang et al., 2022). A newly published pan-cancer analysis supported the latent capacity of HMMR as a diagnostic and prognostic indicator of tumor, because HMMR is widely high-expressed in massive tumor cells (Shang et al., 2022). A holistic pan-cancer analysis proved that CKS1B is a cancer-promoting factor (Jia et al., 2021). In addition, Wang et al. (2021) first characterized CKS1B as a potential prognostic marker in LUAD, and this gene was also discovered in our present work. Another pan-cancer analysis on UBE2S (Bao et al., 2022) revealed that UBE2S expression is negatively correlated with the prognosis of different human cancers and is noticeably positive related to MDSC and T helper 2 cell (Th2) subsets in most tumors, indicating that UBE2S is an immune-carcinogenic marker. This work highlighted the potential of using UBE2S in immunotherapy for LUAD patients.
To better understand the transcriptional regulatory networks across intracellular cancer cells, we predicted the TF-target regulating networks between high and low ARS cell groups. Different regulons in the two risk cell groups were discovered. In high ARS cell group, regulons CREB3L1 and CEBPD were enriched. Abnormal CREB3L1 expression results in growth, metastasis or migration of malignant tumors (Wang et al., 2021; Mellor et al., 2022). CREB3L1 could facilitate the above biological processes by reshaping the tumor immune microenvironment (Pan et al., 2022; Song et al., 2022). Interestingly, the potential of CREB3L1 to predict immunotherapeutic efficacy is also supported by a previous pan-cancer analysis (Lin et al., 2022). CEBPD is one of the hub genes for triple-negative breast cancer prognosis (Kim et al., 2019). We found that TFs BCLAF1 and UQCRB with over 900 downstream target genes were enriched in low ARS cell group. Studies also confirmed that during tumor development, BCLAF1 acts either as a tumor promoter (Mou et al., 2020) or a suppressor (Li et al., 2018) dependent on the cellular environment and category of tumor (Yu et al., 2022). Some studies uncovered that UQCRB is involved in tumorigenesis by recognizing its genetic mutations or relevant microRNAs in pancreatic ductal adenocarcinoma (Harada et al., 2009) or colorectal cancer (Hong et al., 2020). To the best of our knowledge, these four TFs were first described in LUAD cancer cells, but their mechanism underlying LUAD development required more in-depth investigation.
Aneuploidy is related to lower immune infiltration in many types of tumors (Davoli et al., 2017). Consistent with previous findings, high ARS group showed a low degree of immune infiltration, as shown by the results from ssGSEA and CIBERSORT analysis. Subsequent scRNA-seq data concerning CellChat analysis uncovered that high ARS cell group received and sent communication signals via SPP1(ITGA4-ITGB1) and MK(MDK-NCl) signaling pathways. Intriguingly, these two pathways help shape a tumor immune barrier (Liu et al., 2023a) to develop an immunosuppressive environment in tumor (Yu et al., 2023). This finding may explain the apparently poor prognosis and less active response of LUAD patients to immunotherapy at molecular level.
Despite these interesting findings, there are also some limitations to this study. Firstly, the expression of the aneuploidy-related genes should be detected at protein level. In addition, the communication pathways between cancer and immune cells were obtained via bioinformatics algorithms and further verification is needed. Moreover, the effect of the prognostic genes in LUAD is required to be confirmed by in vivo validation. Finally, as all the data were obtained based on public databases, future work is encouraged to use a larger sample size to confirm the accuracy of the present findings.
Conclusions
To conclude, we analyzed the bulk and scRNA-seq data related to aneuploidy in LUAD. An independent ARS signature for predicting LUAD patient’s prognosis was established. The viability of the ARS model genes to predict the prognosis and progression of LUAD was tested by performing a series of cellular experiments, and the key ARS-related genes contributing to the malignant phenotypes of LUAD were mined. Our study provided a new direction for the biomarker research in LUAD, and confirmed that the ARS model can help improve the current personalized treatment for LUAD patients, especially in the selection of chemotherapeutic agents and immunotherapy strategies.
Supplemental Information
Validation the independence of RAS signature in clinic prognosis.
(A-B) Univariate and Multivariate Analysis of Clinical Characteristics and Riskscore in the TCGA Cohort. (C) TCGA queue-validated model of the nomogram. (D-E) Univariate and Multivariate Analysis of Clinical Characteristics and Riskscore in the GSE30219 Cohort. (F) GSE30219 queue-validated model of the nomogram.