Identification of biomarkers associated with clinical severity of chronic obstructive pulmonary disease

We sought to identify the biomarkers related to the clinical severity of stage I to stage IV chronic obstructive pulmonary disease (COPD). Gene expression profiles from the blood samples of COPD patients at each of the four stages were acquired from the Gene Expression Omnibus Database (GEO, accession number: GSE54837). Genes showing expression changes among the different stages were sorted by soft clustering. We performed functional enrichment, protein–protein interaction (PPI), and miRNA regulatory network analyses for the differentially expressed genes. The biomarkers associated with the clinical classification of COPD were selected from logistic regression models and the relationships between TLR2 and inflammatory factors were verified in clinical blood samples by qPCR and ELISA. Gene clusters demonstrating continuously rising or falling changes in expression (clusters 1, 2, and 7 and clusters 5, 6, and 8, respectively) from stage I to IV were defined as upregulated and downregulated genes, respectively, and further analyzed. The upregulated genes were enriched in functions associated with defense, inflammatory, or immune responses. The downregulated genes were associated with lymphocyte activation and cell activation. TLR2, HMOX1, and CD79A were hub proteins in the integrated network of PPI and miRNA regulatory networks. TLR2 and CD79A were significantly correlated with clinical classifications. TLR2 was closely associated with inflammatory responses during COPD progression. Functions associated with inflammatory and immune responses as well as lymphocyte activation may play important roles in the progression of COPD from stage I to IV. TLR2 and CD79A may serve as potential biomarkers for the clinical severity of COPD. TLR2 and CD79A may also serve as independent biomarkers in the clinical classification in COPD. TLR2 may play an important role in the inflammatory responses of COPD.


INTRODUCTION
Chronic obstructive pulmonary disease (COPD) is a heterogeneous condition characterized by the progressive, irreversible limitation of a patient's airflow (Agusti, 2014;Vestbo et al., 2013). It is a leading cause of chronic morbidity and it has been predicted to be the third leading cause of mortality worldwide by 2030(Celli et al., 2015Rodriguez-Roisin et al., 2017). COPD is a progressive disease that typically worsens over time (Rabe et al., 2013). However, the molecular mechanisms underlying its progression are poorly understood (Singh et al., 2018). It has been reported that an early diagnosis and individualized treatments can reduce mortality and the socioeconomic burden caused by the disease. Damage to the airways and lungs may be delayed but not effectively reversed (Pauwels et al., 2004).
Spirometry is used to diagnose the severity of COPD by determining the degree of airflow limitation. Spirometry is based on the forced expiratory volume in one second (FEV1) expressed as a percentage of the predicted normal value for a person's gender, age, weight, and height (Vestbo et al., 2013). COPD is divided into four stages based on FEV1 according to the GOLD guidelines (Vestbo et al., 2013), and COPD exacerbations are classified as episodes of worsening symptoms from stage I to IV (Wedzicha & Seemungal, 2007). The treatment recommendations for European and American practitioners are based on FEV1 (Qaseem et al., 2011) but FEV1 does not directly reflect the systemic manifestation of symptoms in COPD patients (Alotaibi & Ansari, 2016). Several studies have demonstrated that there is a genetic component to COPD susceptibility (Hersh et al., 2011;McCloskey et al., 2001). Thus, the identification of these genetic markers may help to develop diagnostic and therapeutic targets for the treatment of COPD. Previous genome-wide association studies have identified several genetic loci related to COPD susceptibility (Cho et al., 2011;Cho et al., 2014). A recent study investigated the gene expression profiles among patients with frequent COPD exacerbations and identified three genes (ARHGEF10, LAF4, and B3GNT ) as predictors of exacerbations (Singh et al., 2014). However, that study did not investigate whether those genes could be used as biomarkers for the clinical classification of COPD.
We downloaded the forementioned gene expression profile (GSE54837) (Singh et al., 2014) and investigated the genes showing changes in expression among different stages of COPD. We identified key genes using bioinformatics analyses and then further selected the biomarkers associated with the clinical classification of COPD using logistical regression models.

Microarray dataset
We downloaded the gene expression profiles with accession number GSE54837 (Singh et al., 2014) from the Gene Expression Omnibus (GEO) database. These profiles included different clinical stage data, and were obtained based on the Affymetrix Human Genome U133 Plus 2.0 Array. We obtained 226 blood samples from 90 stage I patients, 58 stage II patients, 55 stage III patients, and 13 stage IV patients. The age and sex of each patient is shown in Table S1.

Data preprocessing
The original CEL data were preprocessed (background adjustment, quantile normalization, final summarization, and probe ID to gene symbol) using the RMA method (Irizarry et al., 2003) in the Affy package (Gautier et al., 2004). The expression values of the probes were averaged as the final gene expression value when the different probes corresponded to the same gene symbol. After the expression matrix of all genes in each sample was obtained, the gene expression values of each sample group with different clinical grades were averaged according to the clinical classification and the final gene expression differential matrix was obtained.

Gene clustering analysis
The differential matrix of samples was subjected to noise-robust soft clustering using the fuzzy c-means algorithm in the Mfuzz package (Futschik & Carlisle, 2005;Kumar & Futschik, 2007). The Fuzzy C-Means (Futschik & Carlisle, 2005) clustering method was adopted for clustering analysis on the genes of samples using varying times and changes in expression levels. Multiple clustering results were obtained, and the results were divided according to the trends into two parts: rising and falling. The parameters used were minSTD = 0.1, acore = 0.5, and cOptimal = 10.

Functional enrichment analyses
The gene sets with significant rising or falling trends were examined using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (Huang, Sherman & Lempicki, 2008). Enrichment results were identified in biological processes (BP), cellular components (CC), molecular functions (MF), and pathways.

Protein-Protein Interaction (PPI) analysis and miRNA-target gene regulatory relation analysis
PPI pairs were predicted based on the genes with rising and falling trends using the Search Tool for the Retrieval of Interacting Genes (STRING) (Mering et al., 2003). The required confidence of (combined score) > 0.4 was set as the threshold value for PPI relation prediction.
The regulatory miRNAs of genes with PPI relations were predicted using Webgestalt (Zhang, Kirov & Snoddy, 2005) based on the hypergeometric test with a threshold of enriched gene count ≥ 3. The p value was adjusted using the Benjamini-Hochberg (BH) method (Benjamini & Hochberg, 1995), and the adjusted p value < 0.05 was set as the threshold.

Integration of PPI and miRNA-target gene regulatory relations
The PPI pairs and miRNA-target gene regulatory relation pairs were integrated to construct a network using Cytoscape (Shannon et al., 2003). The degree of connectivity in the network was analyzed and the hub protein (the node with a high degree of connectivity) was selected.

Influencing factors of the clinical grade of COPD
We performed logistical regression analysis after obtaining the hub proteins in the network based on the age (≤ 60 = 0, >60 = 1), gender (female = 0, male = 1), and the gene corresponding to the hub protein as potential factors influencing the clinical grade to further determine whether the expression of the genes could be used as biomarkers of clinical grade. Univariate logistic regression analysis was conducted to preliminarily screen the factors with p value < 0.05. Multivariate logistic regression analysis was then conducted to further select the factors with p value < 0.05.

Clinical data collections and detections
Investigators obtained written informed consent before enrolling participants in the clinical trial. We selected patients who underwent pulmonary function testing at the respiratory function laboratory of the Affiliated Huai'an Hospital of Xuzhou Medical University between January 1, 2019 and May 31, 2019. Patients with complications including asthma, pulmonary interstitial fibrosis, bronchial dilatation, other pulmonary diseases, or malignant tumors were excluded. COPD was staged according to the GOLD guidelines: GOLD I, FEV1 ≥80% predicted; GOLD II, FEV1 < 80% and ≥50% predicted; GOLD III, FEV1 < 50% and ≥30% predicted and GOLD IV, FEV1 < 30% predicted. Patients selected for the study were divided into the following four groups: no related diseases non-smoker, no related diseases smoker, COPD stage I-II, and COPD III-IV. Peripheral venous blood samples were collected from all subjects and placed in anticoagulation centrifuge tubes. Mononuclear cells were separated by density gradient centrifugation. CD14 + immunomagnetic beads (Miltenyi, Germany) were sorted by the automatic magnetic bead sorter. TLR2 gene expression in the monocytes was detected using real-time quantitative PCR according to the instructions of the Real-time PCR kit (Takara, Tokyo, Japan). The amplification specificity was confirmed by melting curves and fluorescence was determined at 60 • C. PCR conditions were as follows: predegeneration at 95 • C for 30 s, followed by 40 cycles for 5 s at 95 • C and 31 s at 60 • C. qRT-PCR reactions were performed in a total volume of 20 µL. The primer sequences for TLR2 were as follows: F: 5 -TCGGAGTTCTCCCAG TTCTCT-3 , R: 5 -TCCAGTGCTTCAACCCACAA -3 . The internal reference was β-actin F: 5 -CCTGGCACCCAGCACAAT-3 , R:5 -GGGCCGGACTCGTCATAC3 . Additional blood samples were taken and centrifuged at 3,000 r/min for 10 min, after which the serum was separated and stored in a refrigerator at −80 • C for testing. All experiments were performed in triplicate. Relative quantitative expression levels were calculated using the 2 − Ct method. The levels of IL-6, IL-8, TNF-α, and IFN-γ in the serum samples were detected and compared by an ELISA Kit (Sangon Biotech, Shanghai, China). All experiments were approved by the Ethics Committee of the Affiliated Huai'an Hospital of Xuzhou Medical University. The clinical characteristics of all the subjects are shown in Table S2.

Statistical analysis
We used SPSS 19.0 and Graphpad Prism 7.0 for statistical analysis. The expression levels of TLR2 and inflammatory factors were analyzed using a one-way ANOVA, followed by the least significant difference multiple comparison post-hoc test, when appropriate. We analyzed the correlation between TLR2 expression levels and inflammatory factor levels in all detected samples using Pearson Correlation Analysis. All statistical analyses were conducted with a significance level of α = 0.05 (P < 0.05).

Gene clustering analysis
Opposing trends were observed among clusters 1, 2, and 7 and clusters 5, 6, and 8 as the clinical stage progressed from stage I to stage IV (Fig. 1). Clusters 5, 6, and 8 showed significantly increased expression, while clusters 1, 2, and 7 displayed significantly decreased expressions from stage I to IV. The changes in expression in clusters 3, 4, 9, and 10 were disordered. We selected clusters 1, 2, 5, 6, 7, and 8 for further analysis. A total of 78, 42, and 77 genes were included in clusters 5, 6, and 8, respectively, and were defined as upregulated genes. 59, 44, and 56 genes were included in clusters 1, 2, and 7, respectively, and were defined as downregulated genes.

Functional enrichment analyses
The upregulated genes were significantly related to GO functions including defense, inflammatory, wounding, and immune responses, the positive regulation of biosynthetic processes, system lupus erythematosus pathways, the toll-like receptor signaling pathway, and cytokine-cytokine receptor interactions (Table 1). The downregulated genes were involved in GO functions including lymphocyte, cell, and leukocyte activation, the external side of the plasma membrane, type I diabetes mellitus pathway signaling, Jak-STAT signaling, and T cell receptor signaling (Table 2).

PPI and miRNA-target gene regulatory relations prediction
100 pairs were predicted from upregulated genes and 40 PPI pairs were predicted from downregulated genes using STRING. 64 and 92 miRNA-target gene regulatory relation pairs were predicted from upregulated and downregulated genes, respectively (Table 3).

Network integration analysis
Two integrated networks were constructed based on the obtained PPI and miRNA-target gene regulatory relation pairs (Fig. 2). A constructed network of upregulated genes displayed the following five nodes with degrees ≥ 6: toll like receptor 2 (TLR2), matrix metallopeptidase 9 (MMP9), heme oxygenase 1 (HMOX1), and C-C motif chemokine receptor 1 (CCR1) (Fig. 2A). In the network of downregulated genes, the CD79a molecule (CD79A) and phospholipase C gamma 1 (PLCG1) had degrees ≥ 6 and were considered to be hub nodes (Fig. 2B).

Influencing factors of the clinical grade of COPD
Univariate logistic regression analysis results revealed that the factors of age, TLR2, MMP9, CCR1, CD79A, and PLCG1 were significantly associated with clinical classification (Table 4).   These factors were further analyzed by multivariate logistic regression analysis and TLR2 was found to be significantly positively correlated and CD79A was significantly negatively correlated with clinical classification, suggesting that they could be used as independent biomarkers of clinical classification (Table 5).

The relationships between TLR2 and inflammatory factors in COPD progression
TLR2 can trigger signal transduction and lead to the release of inflammatory mediators. COPD is defined as a chronic inflammatory lung disease. We chose IL-6, IL-8, TNF-α, and IFN-γ to assess the severity of COPD as they are known inflammatory factors in COPD progression. We found that the levels of IL-6, IL-8, TNF-α, and IFN-γ in the smoker and COPD III-IV groups were significant higher than those in the non-smoker and COPD I-II groups, and TLR2 in mononuclear cells of the peripheral blood was significantly correlated with the clinical classification of the patients (Figs. 3A-3E). We also found that TLR2 and the expressions of inflammatory markers were more significant in the GOLDIII/VI group than those in the healthy smoker group. We used the ROC curves to identify the sensitivity and specificity of the expression levels. The results showed that the five markers, IL-6, IL-8, TNF-α, IFN-γ and TLR2, have diagnostic value for COPD. The areas under the curve (AUC)

DISCUSSION
COPD is a growing global health problem. As the population ages, life expectancy among the elderly is expected to rise in relation to the number of smokers and the prevalence of COPD. Researchers are working to identify and diagnose COPD using easily available biomarkers to facilitate the early detection of fluid build-up in COPD and improve diagnostic accuracy. Previous studies focused on finding new biomarkers in the blood for the early diagnosis of COPD and many of these potential diagnostic markers need to be verified in additional studies. We studied gene clusters with continuously rising (clusters 5, 6, and 8) or falling (clusters 1, 2, and 7) trends in expression changes from stage I to stage IV. The upregulated genes, such as TLR2, and HMOX1, were significantly enriched in functions associated with defense, inflammatory, or immune responses, while the downregulated genes, such as CD79A, were associated with the activation of lymphocytes, cells, and leukocytes. TLR2, HMOX1, and CD79A were hub proteins in the integrated network. Logistic regression analysis showed that TLR2 was positively correlated and CD79A was negatively correlated with clinical classification. Our results showed that the upregulated genes, such as TLR2, and HMOX1, were significantly associated with the inflammatory and immune responses. Exposure to inhaled pollutants can result in chronic airway inflammation in COPD by activating structural and inflammatory cells within the lungs (Rovina, Koutsoukou & Koulouris, 2013). COPD exacerbation is defined as an acute worsening of the symptoms that are implicated in increased systemic and airway inflammation (Wedzicha & Seemungal, 2007). The inflammatory and immune responses are known to play a critical role in COPD. Dave Singh, et al. (2014b) found that systemic immune function is associated with the exacerbation of COPD. Our study further suggested the important role of inflammatory and immune responses in the progression of COPD from stage I to stage IV. IL-6, IL-8, TNF-α and IFN-γ are important inflammatory mediators during the progression of COPD and IL-6 participates in the inflammatory response. It can further aggravate the oxidative stress and inflammatory responses by promoting the proliferation and maturation of T lymphocytes and B lymphocytes. IL-8 is an inflammatory mediator that promotes the activation of neutrophils and the release of various enzymes that damage the bronchi. Excessive levels of IL-8 can aggravate the inflammatory response and destroy the lung tissue (Farahi et al., 2017;Nakamoto et al., 2019). By acting on neutrophils, TNF-α enhances the expression of leukocyte adhesion molecules and activates other factors involved in the inflammatory response. Higher levels of TNF-α and IFN-γ in COPD also indicate increased inflammation (Xu, Li & Sun, 2019;Zhang et al., 2016). Our findings about these inflammatory factors in COPD were consistent with previous studies.
TLR2 was among the top two hub proteins in the network. TLR2 is a member of the toll-like receptor family, which plays a fundamental role in the activation of innate immunity (Da Silva et al., 2008). Haw et al. (2018) found that TLR2 mRNA expression was increased in the epithelium and parenchyma of the mouse airway when chronically exposed to cigarette smoke. These results were replicated in human COPD patients. Our results showed that increased TLR2 was an independent clinical classification biomarker in COPD. The results of some studies do not support this research. A previous study reported that the alveolar macrophages in smokers and COPD patients presented an equally decreased surface expression of TLR2 compared to non-smokers (Droemann et al., 2005). The role of TLR2 in the pathogenesis of COPD is also controversial in previous literature (Freeman et al., 2013;Haw et al., 2018;Simpson et al., 2013;Von Scheele et al., 2011). These conflicting results may be due to differences in experimental methods (e.g., peripheral blood monocytes vs. macrophages) or cohorts of patients with varying medical backgrounds.
We found that the downregulated genes were involved in functions associated with lymphocyte activation, which may suggest that lymphocyte activation plays a role in COPD development. COPD has been thought to possess an autoimmune component (Agusti et al., 2003;Lee et al., 2007), but the antigenic stimulation responsible for lymphocyte activation is unclear. Previous studies demonstrated that the numbers of B lymphocytes and CD8+ T cells are increased in the airways of COPD patients compared with healthy controls (Gosman et al., 2006;Saetta et al., 1999). The number of lymphocytes has been shown to increase with the severity of the disease (Hogg et al., 2004). We found that the hub protein, CD79A, was enriched with lymphocyte activation. CD79A has been demonstrated to play diverse roles in the development and function of B lymphocytes (Pelanda et al., 2002) and may be downregulated in severe COPD when compared with controls (Cockayne et al., 2011). Our results showed that CD79A was negatively correlated with the clinical classification of COPD. Thus, lymphocyte activation and CD79A may be associated with the clinical severity of COPD.
Our study is limited because we analyzed the differentially expressed genes in COPD patients using a single set of GEO data. The sample capacity was small and more GEO data should be used in further studies. Secondly, by extracting the total RNA content in blood monocytes, we found that the Ct value was relatively large, indicating that the expression of TLR2 in blood was lower, and the micro expression levels may have caused experimental errors. We performed each test at least three times to improve the repeatability of the results. Finally, we only studied the expression of TLR2 in the clinical population but its specific mechanism and inconsistency in previous studies was well explained and requires additional experimentation. Our results show that the inflammatory and immune responses, as well as lymphocyte activation, may play important roles in COPD development from stage I to stage IV. TLR2, and CD79A may serve as potential biomarkers in the exacerbation of COPD, and TLR2 and CD79A may also serve as independent biomarkers for the clinical classification of COPD.