Integrative bioinformatics informed by network toxicology and machine learning elucidates the carcinogenic mechanisms of benzo[a]pyrene-induced breast cancer
Abstract
Background
Benzo[a]pyrene (BaP) is a recognized mutagen and carcinogen, yet epidemiological links to breast cancer (BC) remain inconclusive.
Methods
We integrated network toxicology, machine learning, and bioinformatics. BaP targets (ChEMBL, PharmMapper, SEA, GeneCards, OMIM) were intersected with BC genes, followed by GO/KEGG enrichment analysis. Core regulatory genes were then screened using multiple machine-learning algorithms, and their expression levels, diagnostic performance, and associations with the tumor immune microenvironment were validated. Gene Set Variation Analysis (GSVA) was applied to assess pathway activity, and Cytoscape was used to construct a lncRNA–miRNA–mRNA multilevel regulatory network to explore post-transcriptional control mechanisms. Finally, molecular docking and molecular dynamics simulations were performed to evaluate interactions between BaP and the core targets.
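The target-intersection step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the source labels, and the short gene lists are hypothetical placeholders, and the sketch assumes that targets from each database have already been exported as gene symbols.

```python
# Minimal sketch of intersecting compound targets with disease genes.
# All names and gene lists here are illustrative assumptions, not study data.

def normalize(symbols):
    """Return a set of upper-cased, whitespace-trimmed gene symbols."""
    return {s.strip().upper() for s in symbols if s and s.strip()}

def overlap_targets(compound_sources, disease_genes):
    """Pool targets from every source, then intersect with the disease genes."""
    compound = set()
    for genes in compound_sources.values():
        compound |= normalize(genes)
    return sorted(compound & normalize(disease_genes))

# Toy input: two hypothetical target sources and a hypothetical disease gene list.
bap_sources = {
    "source_a": ["AURKA", "nek2 ", "CYP1A1"],
    "source_b": ["KIF11", "AURKA"],
}
bc_genes = ["AURKA", "KIF11", "NEK2", "BRCA1"]

overlap = overlap_targets(bap_sources, bc_genes)
print(overlap)
```

Normalizing symbols before pooling avoids counting the same gene twice when databases differ in case or spacing; in practice, alias resolution against an official nomenclature would also be needed.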
Results
A total of 216 overlapping BaP–BC targets were identified and were significantly enriched in pathways such as the MAPK signaling pathway. Machine-learning–based screening identified seven core genes; among them, KIF11, INHBA, NEK2, and AURKA showed significantly higher expression in BC tissues and were associated with worse patient prognosis and altered immune-cell infiltration. Pathway analyses suggested that these genes promote tumor progression by regulating the cell cycle, DNA replication, and cell-adhesion pathways. Molecular docking and molecular dynamics simulations indicated that BaP binds stably to the KIF11, AURKA, INHBA, and NEK2 proteins.
Conclusion
This research provides new theoretical insights into the etiology of environmental pollutant-induced BC and offers potential molecular biomarkers for risk assessment and the formulation of targeted prevention strategies.