PeerJ: Computational Biology
https://peerj.com/articles/index.atom?journal=peerj&subject=900
Computational Biology articles published in PeerJ

Validation of CDC45 as a novel biomarker for diagnosis and prognosis of gastric cancer
https://peerj.com/articles/17130
2024-03-18
Lihua Wu, Gan Gao, Hui Mi, Zhou Luo, Zheng Wang, Yongdong Liu, Liangyan Wu, Haihua Long, Yongqi Shen
Background
Cell division cycle protein 45 (CDC45) has been demonstrated to play vital roles in the progression of various malignancies. However, the clinical significance of CDC45 in gastric cancer (GC) remains unreported.
Method
In this study, we used The Cancer Genome Atlas (TCGA) database and the TCGA & GTEx dataset to compare CDC45 mRNA expression between gastric cancer tissues and adjacent or normal tissues (p < 0.05 was considered statistically significant). The findings were further validated in multiple datasets (GSE13911, GSE29272, GSE118916, and GSE66229) and by RT-qPCR. We used the Human Protein Atlas (HPA) to evaluate CDC45 protein expression, which was subsequently verified by immunohistochemistry (IHC). To assess the diagnostic utility of CDC45, receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) were calculated in the TCGA database and validated in the TCGA & GTEx and GSE66229 datasets. The Kaplan–Meier method was used to assess the prognostic importance of CDC45 in the TCGA database, with validation in the GSE66229, GSE84433, and GSE84437 datasets. Through cBioPortal, we identified genes co-expressed with CDC45 and performed enrichment analysis. Additionally, we used gene set enrichment analysis (GSEA) to annotate the biological functions of CDC45.
Results
Differential expression analysis revealed that CDC45 was significantly upregulated at both the mRNA and protein levels in GC (all p < 0.05). Remarkably, CDC45 emerged as a promising prognostic indicator and a novel diagnostic biomarker for GC. In the drug susceptibility analysis, patients with high CDC45 expression showed high sensitivity to various chemotherapeutic agents, most evidently 5-fluorouracil, docetaxel, cisplatin, and elesclomol. Furthermore, our findings suggested a plausible association between CDC45 and immune cell infiltration. Enrichment analysis revealed that CDC45 and its associated genes may play crucial roles in muscle biofunction, whereas GSEA demonstrated significant enrichment of gene sets pertaining to G protein-coupled receptor ligand binding and G alpha (i) signaling events.
Conclusion
Our study shows that upregulation of CDC45 is associated with immune cell infiltration and that CDC45 holds promise as a favorable prognostic marker and a novel diagnostic biomarker for GC.
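The ROC/AUC step described in the Method can be illustrated with a minimal sketch. It computes the AUC as the probability that a randomly chosen tumour sample has a higher marker value than a randomly chosen normal sample, with ties counted as one half. The function name `roc_auc` and the toy expression values are hypothetical and are not taken from the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Return the area under the ROC curve for one continuous marker.

    Uses the rank (Mann-Whitney) formulation: the fraction of
    positive/negative pairs in which the positive sample scores higher,
    with ties counted as one half.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical expression values, for illustration only.
tumour = [3.0, 4.0, 5.0]
normal = [1.0, 2.0, 3.0]
auc = roc_auc(tumour, normal)  # 8.5 of 9 pairs favour the tumour group
```

An AUC of 0.5 corresponds to a marker that does not separate the groups, and 1.0 to perfect separation.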

The impact of FASTQ and alignment read order on structural variant calling from long-read sequencing data
https://peerj.com/articles/17101
2024-03-15
Kyle J. Lesack, James D. Wasmuth
Background
Structural variant (SV) calling from DNA sequencing data has been challenging due to several factors, including the ambiguity of short-read alignments, multiple complex SVs in the same genomic region, and the lack of “truth” datasets for benchmarking. Additionally, caller choice, parameter settings, and alignment method are known to affect SV calling. However, the impact of FASTQ read order on SV calling has not been explored for long-read data.
Results
Here, we used PacBio DNA sequencing data from 15 Caenorhabditis elegans strains and four Arabidopsis thaliana ecotypes to evaluate the sensitivity of different SV callers to FASTQ read order. Comparisons of variant call format files generated from the original and permuted FASTQ files demonstrated that the order of input data affected the SVs predicted by each caller. In particular, pbsv was highly sensitive to the order of the input data, especially at the highest depths, where over 70% of the SV calls generated from pairs of differently ordered FASTQ files were in disagreement. These results demonstrate that read order sensitivity is a complex, multifactorial process, as the differences observed both within and between species varied considerably according to the specific combination of aligner, SV caller, and sequencing depth. In addition to the SV callers being sensitive to the input data order, the SAMtools alignment sorting algorithm was identified as a source of variability following read order randomization.
Conclusion
The results of this study highlight the sensitivity of SV calling to the order of reads encoded in FASTQ files, which has not been recognized in long-read approaches. These findings have implications for the replication of SV studies and the development of consistent SV calling protocols. Our study suggests that researchers should pay attention to the input order sensitivity of read alignment sorting methods when analyzing long-read sequencing data for SV calling, as mitigating a source of variability could facilitate future replication work. These results also raise important questions surrounding the relationship between SV caller read order sensitivity and tool performance. Therefore, tool developers should also consider input order sensitivity as a potential source of variability during the development and benchmarking of new and improved methods for SV calling.
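The read-order experiment can be sketched in a few lines: permute the FASTQ records with a fixed seed, run the same pipeline on both orderings, and measure how much the two call sets differ. The helper names and the disagreement metric (symmetric difference over union) are assumptions made for illustration. The abstract does not state how disagreement was computed, and real SV comparisons usually allow breakpoint tolerance rather than exact matching.

```python
import random


def read_fastq_records(lines):
    """Group FASTQ lines into 4-line records (header, sequence, '+', qualities)."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    return [tuple(cleaned[i:i + 4]) for i in range(0, len(cleaned), 4)]


def shuffle_records(records, seed):
    """Return a reproducibly permuted copy of the records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def disagreement(calls_a, calls_b):
    """Fraction of calls found in only one of the two call sets."""
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a ^ set_b) / len(union)


# Hypothetical SV calls keyed by (chromosome, position, type).
original_calls = {("I", 1200, "DEL"), ("II", 560, "INS")}
permuted_calls = {("I", 1200, "DEL")}
fraction_different = disagreement(original_calls, permuted_calls)
```

Fixing the seed makes each permutation repeatable, so any difference between runs can be attributed to read order rather than to the shuffling itself.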

EPI-SF: essential protein identification in protein interaction networks using sequence features
https://peerj.com/articles/17010
2024-03-13
Sovan Saha, Piyali Chatterjee, Subhadip Basu, Mita Nasipuri
Proteins are considered indispensable for an organism’s viability, reproductive capabilities, and other fundamental physiological functions. Identifying essential proteins with conventional biological assays is time-consuming, labor-intensive, and expensive. Computational methods are therefore widely regarded as the fastest and most effective way to identify essential proteins. Although deep learning (DL) is a popular choice in machine learning (ML) applications, it is not recommended for this work, which is based on sequence features, because high-quality training sets of positive and negative samples are scarce. Some recent DL studies do address limited data, and exploring them is left for future work. Conventional ML techniques are therefore used here because of their superior performance compared to DL methodologies. Accordingly, a technique called EPI-SF is proposed, which employs ML to identify essential proteins within the protein-protein interaction network (PPIN). The protein sequence is the primary determinant of protein structure and function, so relevant sequence features are first extracted from the proteins within the PPIN. These features are then used as input for various machine learning models, including an XGBoost classifier, an AdaBoost classifier, logistic regression (LR), a support vector machine (SVM), a decision tree (DT), a random forest (RF), and naïve Bayes (NB), with the objective of detecting the essential proteins within the PPIN. The primary investigation examined the performance of these ML models on the yeast PPIN. 
Among these models, the RF model was the most effective, with precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. It also outperforms other state-of-the-art approaches, both those based on traditional centrality measures such as betweenness centrality (BC) and closeness centrality (CC) and deep learning methods such as DeepEP, as detailed in the Results section. Given this favorable performance, EPI-SF is then used to predict novel essential proteins in the human PPIN. Because viruses tend to selectively target essential proteins involved in disease transmission within the human PPIN, the probable involvement of these proteins in COVID-19 and other related severe diseases is also investigated.
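As a quick consistency check on the reported random forest metrics, the F1-score can be recomputed from precision and recall as their harmonic mean. The function name `f1_score` below is only illustrative, and the sketch does not reproduce the authors' evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the RF model on the yeast PPIN.
rf_f1 = f1_score(0.703, 0.720)
print(round(rf_f1, 3))  # 0.711
```

The recomputed value agrees with the reported F1-score of 0.711.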

Tracking mosquito-borne diseases via social media: a machine learning approach to topic modelling and sentiment analysis
https://peerj.com/articles/17045
2024-03-01
Song-Quan Ong, Hamdan Ahmad
Mosquito-borne diseases (MBDs) are a major threat worldwide, and public consultation on these diseases is critical to disease control decision-making. However, traditional public surveys are time-consuming and labor-intensive and do not allow for timely decision-making. Recent studies have explored text analytic approaches to elicit public comments from social media for public health. Therefore, this study aims to demonstrate a text analytics pipeline to identify the MBD topics that were discussed on Twitter and significantly influenced public opinion. A total of 25,000 tweets were retrieved from Twitter, topics were modelled using LDA and sentiment polarities were calculated using the VADER model. After data cleaning, we obtained a total of 6,243 tweets, which we were able to process with the feature selection algorithms. Boruta was used as a feature selection algorithm to determine the importance of topics to public opinion. The result was validated using multinomial logistic regression (MLR) performance and expert judgement. Important issues such as breeding sites, mosquito control, impact/funding, time of year, other diseases with similar symptoms, mosquito-human interaction and biomarkers for diagnosis were identified by both LDA and experts. The MLR result shows that the topics selected by LASSO perform significantly better than the other algorithms, and the experts further justify the topics in the discussion.
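The sentiment step can be illustrated by showing how a VADER compound score is typically turned into a polarity label. The thresholds of plus and minus 0.05 follow the convention documented for VADER. Whether the study used exactly these cut-offs is an assumption, and `polarity_label` is a hypothetical helper rather than part of any library.

```python
def polarity_label(compound, threshold=0.05):
    """Map a VADER-style compound score in [-1, 1] to a polarity label."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError("compound score must lie in [-1, 1]")
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores for three tweets.
labels = [polarity_label(score) for score in (0.62, -0.31, 0.01)]
```

The three-way label produced here is the kind of categorical outcome that a multinomial logistic regression could then model against topic features.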

A Cellular Potts Model of the interplay of synchronization and aggregation
https://peerj.com/articles/16974
2024-02-29
Rose Una, Tilmann Glimm
We investigate the behavior of systems of cells with intracellular molecular oscillators (“clocks”) where cell-cell adhesion is mediated by differences in clock phase between neighbors. This is motivated by phenomena in developmental biology and in aggregative multicellularity of unicellular organisms. In such systems, aggregation co-occurs with clock synchronization. To account for the effects of spatially extended cells, we use the Cellular Potts Model (CPM), a lattice agent-based model. We find four distinct possible phases: global synchronization, local synchronization, incoherence, and anti-synchronization (checkerboard patterns). We characterize these phases via order parameters. In the case of global synchrony, the speed of synchronization depends on the adhesive effects of the clocks. Synchronization happens fastest when cells in opposite phases adhere the strongest (“opposites attract”). When cells of the same clock phase adhere the strongest (“like attracts like”), synchronization is slower. Surprisingly, the slowest synchronization happens in the diffusive mixing case, where cell-cell adhesion is independent of clock phase. We briefly discuss potential applications of the model, such as pattern formation in the auditory sensory epithelium.
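One common way to quantify global synchronization of phase oscillators is the Kuramoto-style order parameter, the magnitude of the mean unit phasor. The sketch below assumes this definition. The abstract does not specify which order parameters the authors use, so it should be read as an illustration only.

```python
import cmath
import math


def order_parameter(phases):
    """Return r = |mean(exp(i * theta))| for a list of clock phases in radians.

    r is close to 1 when all clocks share a phase and close to 0 when the
    phases cancel out.
    """
    if not phases:
        raise ValueError("at least one phase is required")
    mean_vector = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(mean_vector)


synchronized = order_parameter([0.3] * 10)
anti_phase = order_parameter([0.0, math.pi] * 5)
```

In this sketch both an anti-phase (checkerboard-like) population and an incoherent one give values near zero, which suggests why a single global measure cannot separate the four phases and why additional, local order parameters would be needed.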

Mathematical modelling of antibiotic interaction on evolution of antibiotic resistance: an analytical approach
https://peerj.com/articles/16917
2024-02-26
Ramin Nashebi, Murat Sari, Seyfullah Enes Kotil
Background
The emergence and spread of antibiotic-resistant pathogens have led to the exploration of antibiotic combinations to enhance clinical effectiveness and counter resistance development. Synergistic and antagonistic interactions between antibiotics can intensify or diminish the combined therapy’s impact. Moreover, these interactions can evolve as bacteria transition from wildtype to mutant (resistant) strains. Experimental studies have shown that antibiotics that interact antagonistically against wildtype bacteria slow down the evolution of resistance. Interestingly, other studies have shown that antibiotics that interact antagonistically against mutants accelerate resistance. However, it is unclear whether the beneficial effect of antagonism in the wildtype bacteria is more critical than the detrimental effect of antagonism in the mutants. This study aims to clarify the importance of antibiotic interactions against wildtype bacteria and mutants for the deceleration of antimicrobial resistance.
Methods
To address this, we developed and analyzed a mathematical model that explores the population dynamics of wildtype and mutant bacteria under the influence of interacting antibiotics. The model investigates the relationship between synergistic and antagonistic antibiotic interactions with respect to the growth rate of mutant bacteria acquiring resistance. Stability analysis was conducted for equilibrium points representing bacteria-free conditions, all-mutant scenarios, and coexistence of both types. Numerical simulations corroborated the analytical findings, illustrating the temporal dynamics of wildtype and mutant bacteria under different combination therapies.
Results
Our analysis provides analytical clarification and numerical validation that antibiotic interactions against wildtype bacteria exert a more significant effect on reducing the rate of resistance development than interactions against mutants. Specifically, our findings highlight the crucial role of antagonistic antibiotic interactions against wildtype bacteria in slowing the growth rate of resistant mutants. In contrast, antagonistic interactions against mutants only marginally affect resistance evolution and may even accelerate it.
Conclusion
Our results emphasize the importance of considering the nature of antibiotic interactions against wildtype bacteria rather than mutants when aiming to slow down the acquisition of antibiotic resistance.
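To make the kind of population-dynamics model described in the Methods concrete, here is a toy two-population sketch integrated with forward Euler. It is not the authors' model: the logistic growth terms, the mutation term, and the lumped kill rates `kill_w` and `kill_m` (standing in for the net effect of a drug combination on wildtype and mutant cells) are all assumptions chosen for illustration.

```python
def simulate(kill_w, kill_m, r_w=1.0, r_m=0.8, capacity=1.0,
             mutation=1e-3, w0=0.5, m0=0.0, dt=0.01, t_end=50.0):
    """Integrate a toy wildtype/mutant model and return the final (w, m).

    dw/dt = r_w * w * (1 - (w + m) / capacity) - mutation * w - kill_w * w
    dm/dt = r_m * m * (1 - (w + m) / capacity) + mutation * w - kill_m * m
    """
    w, m = w0, m0
    for _ in range(int(round(t_end / dt))):
        crowding = 1.0 - (w + m) / capacity
        dw = r_w * w * crowding - mutation * w - kill_w * w
        dm = r_m * m * crowding + mutation * w - kill_m * m
        # Clamp at zero so numerical error cannot produce negative populations.
        w = max(w + dt * dw, 0.0)
        m = max(m + dt * dm, 0.0)
    return w, m


# Hypothetical regime: the combination clears wildtype cells but only
# weakly inhibits the resistant mutant.
final_w, final_m = simulate(kill_w=1.5, kill_m=0.2)
```

In this toy setting the wildtype population dies out and the mutant settles near `capacity * (1 - kill_m / r_m)`, which is 0.75 for the values above. Varying `kill_w` and `kill_m` separately gives a rough feel for how effects on each strain shape the outcome.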

moSCminer: a cell subtype classification framework based on the attention neural network integrating the single-cell multi-omics dataset on the cloud
https://peerj.com/articles/17006
2024-02-26
Joung Min Choi, Chaelin Park, Heejoon Chae
Single-cell omics sequencing has rapidly advanced, enabling the quantification of diverse omics profiles at a single-cell resolution. To facilitate comprehensive biological insights, such as cellular differentiation trajectories, precise annotation of cell subtypes is essential. Conventional methods involve clustering cells and manually assigning subtypes based on canonical markers, a labor-intensive and expert-dependent process. Hence, an automated computational prediction framework is crucial. While several classification frameworks for predicting cell subtypes from single-cell RNA sequencing datasets exist, these methods solely rely on single-omics data, offering insights at a single molecular level. They often miss inter-omic correlations and a holistic understanding of cellular processes. To address this, the integration of multi-omics datasets from individual cells is essential for accurate subtype annotation. This article introduces moSCminer, a novel framework for classifying cell subtypes that harnesses the power of single-cell multi-omics sequencing datasets through an attention-based neural network operating at the omics level. By integrating three distinct omics datasets—gene expression, DNA methylation, and DNA accessibility—while accounting for their biological relationships, moSCminer excels at learning the relative significance of each omics feature. It then transforms this knowledge into a novel representation for cell subtype classification. Comparative evaluations against standard machine learning-based classifiers demonstrate moSCminer’s superior performance, consistently achieving the highest average performance on real datasets. The efficacy of multi-omics integration is further corroborated through an in-depth analysis of the omics-level attention module, which identifies potential markers for cell subtype annotation. 
To enhance accessibility and scalability, moSCminer is provided as a user-friendly web-based platform connected to a cloud system, publicly available at http://203.252.206.118:5568. Notably, this study is the first to integrate three single-cell multi-omics datasets for cell subtype identification.
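The idea of attention operating at the omics level can be sketched as a softmax over one relevance score per omics type, followed by a weighted sum of the per-omics embeddings. This is a generic illustration under assumed names and shapes. The actual moSCminer architecture and the way its scores are learned are not described in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(embeddings, scores):
    """Fuse per-omics embeddings using attention weights from the scores.

    embeddings: one equal-length vector per omics type.
    scores: one relevance score per omics type (learned in a real model).
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [sum(w * emb[i] for w, emb in zip(weights, embeddings))
             for i in range(dim)]
    return weights, fused


# Hypothetical 2-dimensional embeddings for expression, methylation,
# and accessibility from a single cell.
cell = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, fused = attend(cell, scores=[2.0, 0.5, 0.5])
```

Inspecting `weights` is the sketch-level analogue of the attention analysis mentioned above: a larger weight indicates that the corresponding omics type contributes more to the fused representation.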

In silico prediction of candidate gene targets for the management of African cassava whitefly (Bemisia tabaci, SSA1-SG1), a key vector of viruses causing cassava brown streak disease
https://peerj.com/articles/16949
2024-02-23
Tadeo Kaweesi, John Colvin, Lahcen Campbell, Paul Visendi, Gareth Maslen, Titus Alicai, Susan Seal
Whiteflies (Bemisia tabaci sensu lato) have a wide host range and are globally important agricultural pests. In Sub-Saharan Africa, they vector viruses that cause two ongoing disease epidemics: cassava brown streak disease and cassava mosaic virus disease. These two diseases threaten food security for more than 800 million people in Sub-Saharan Africa. Efforts are ongoing to identify target genes for the development of novel management options against the whitefly populations that vector these devastating viral diseases affecting cassava production in Sub-Saharan Africa. This study aimed to identify genes that mediate osmoregulation and symbiosis functions within cassava whitefly gut and bacteriocytes and evaluate their potential as key gene targets for novel whitefly control strategies. The gene expression profiles of dissected guts, bacteriocytes and whole bodies were compared by RNAseq analysis to identify genes with significantly enriched expression in the gut and bacteriocytes. Phylogenetic analyses identified three candidate osmoregulation gene targets: two α-glucosidases, SUC 1 and SUC 2 with predicted function in sugar transformations that reduce osmotic pressure in the gut; and a water-specific aquaporin (AQP1) mediating water cycling from the distal to the proximal end of the gut. Expression of the genes in the gut was enriched 23.67-, 26.54- and 22.30-fold, respectively. Genome-wide metabolic reconstruction coupled with constraint-based modeling revealed four genes (argH, lysA, BCAT & dapB) within the bacteriocytes as potential targets for the management of cassava whiteflies. These genes were selected based on their role and essentiality within the different essential amino acid biosynthesis pathways. 
A demonstration of candidate osmoregulation and symbiosis gene targets in other species of the Bemisia tabaci species complex that are orthologs of the empirically validated osmoregulation genes highlights the latter as promising gene targets for the control of cassava whitefly pests by in planta RNA interference.
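The fold enrichments quoted above are on a linear scale, whereas RNAseq results are often compared on a log2 scale. The following minimal sketch converts the three reported values; the variable and function names are illustrative and are not taken from the study.

```python
import math

# Gut fold enrichments as reported in the abstract (linear scale).
fold_enrichment = {"SUC 1": 23.67, "SUC 2": 26.54, "AQP1": 22.30}


def log2_fold_change(fold):
    """Convert a positive linear fold change to the log2 scale."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold)


# Hypothetical helper output: one log2 value per candidate gene.
log2_fc = {gene: log2_fold_change(f) for gene, f in fold_enrichment.items()}

for gene, value in log2_fc.items():
    print(f"{gene}: {fold_enrichment[gene]:.2f}-fold, log2 fold change {value:.2f}")
```

On this scale all three genes fall between 4 and 5, that is, between a 16-fold and a 32-fold enrichment.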

Data-driven detection of age-related arbitrary monotonic changes in single-cell gene expression distributions
https://peerj.com/articles/16851
2024-02-08
Jian Hao Cheng, Daigo Okada
Identification of genes whose expression increases or decreases with age is central to understanding the mechanisms behind aging. Recent scRNA-seq studies have shown that changes in single-cell expression profiles with aging are complex and diverse. In this study, we introduce a novel workflow to detect arbitrary monotonic age-related changes in the distributions of single-cell expression profiles. Since single-cell gene expression profiles can be analyzed as probability distributions, our approach uses information theory to quantify the differences between distributions and employs distance matrices for association analysis. We tested this technique on simulated data and confirmed that potential parameter changes could be detected in a set of probability distributions. Application of the technique to a public scRNA-seq dataset demonstrated its potential utility as a straightforward screening method for identifying aging-related cellular features.
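The general idea of comparing per-donor expression distributions and relating the resulting distance matrix to age can be sketched as follows. The abstract does not name the divergence or the association test, so this example assumes the Jensen-Shannon distance and a Mantel-style permutation test; the simulated data and all function names are hypothetical and should not be read as the authors' implementation.

```python
import math
import random


def histogram(values, lo, hi, n_bins):
    """Normalised histogram on shared bins; a tiny pseudocount avoids empty bins."""
    counts = [1e-9] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        i = math.floor((v - lo) / width)
        if 0 <= i < n_bins:  # values outside [lo, hi) are ignored
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence, log base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def upper(mat):
    """Off-diagonal upper triangle of a square matrix as a flat list."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, n_perm=999, seed=0):
    """One-sided Mantel-style test: permute the rows and columns of d2 together."""
    rng = random.Random(seed)
    n = len(d1)
    observed = pearson(upper(d1), upper(d2))
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [[d2[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(upper(d1), upper(shuffled)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Simulated donors: the mean expression of one gene drifts upward with age.
rng = random.Random(0)
ages = [20, 30, 40, 50, 60, 70, 80, 90]
cells = [[rng.gauss(age / 20, 1.0) for _ in range(300)] for age in ages]
profiles = [histogram(c, lo=-4.0, hi=10.0, n_bins=28) for c in cells]

expr_dist = [[js_distance(p, q) for q in profiles] for p in profiles]
age_dist = [[abs(a - b) for b in ages] for a in ages]

r, p_value = mantel(expr_dist, age_dist)
print(f"Mantel r = {r:.3f}, permutation p = {p_value:.3f}")
```

Working on distance matrices rather than on a single summary statistic such as the mean is what lets this kind of screen pick up changes in variance or shape as well as shifts in location.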

In silico and in vitro evaluation of the anti-virulence potential of patuletin, a natural methoxy flavone, against Pseudomonas aeruginosa
https://peerj.com/articles/16826
2024-02-01
Ahmed Metwaly, Moustafa M. Saleh, Aisha Alsfouk, Ibrahim M. Ibrahim, Muhamad Abd-Elraouf, Eslam Elkaeed, Hazem Elkady, Ibrahim Eissa
This study aimed to investigate the potential of patuletin, a rare natural flavonoid, as a virulence and LasR inhibitor against Pseudomonas aeruginosa. Various computational studies were used to explore the binding of patuletin to LasR at a molecular level. Molecular docking revealed that patuletin strongly interacted with the active pocket of LasR, with a high binding affinity value of −20.96 kcal/mol. Further molecular dynamics simulations, molecular mechanics generalized Born surface area (MM/GBSA), protein-ligand interaction profile (PLIP), and essential dynamics analyses confirmed the stability of the patuletin-LasR complex, and no significant structural changes were observed in the LasR protein upon binding. Key amino acids involved in binding were identified, along with a free energy value of −26.9 kcal/mol. In vitro assays were performed to assess patuletin’s effects on P. aeruginosa. At a sub-inhibitory concentration (1/4 MIC), patuletin significantly reduced biofilm formation by 48% and 42%, decreased pyocyanin production by 24% and 14%, and decreased proteolytic activity by 42% and 20% in P. aeruginosa isolate ATCC 27853 (PA27853) and a P. aeruginosa clinical isolate (PA1), respectively. In summary, this study demonstrated that patuletin effectively inhibited LasR activity in silico and attenuated virulence factors in vitro, including biofilm formation, pyocyanin production, and proteolytic activity. These findings suggest that patuletin holds promise as a potential therapeutic agent in combination with antibiotics to combat antibiotic-tolerant P. aeruginosa infections.