Disease diagnosis and prediction using deep learning: a review

Shyamala Krishnan; Navamani Thandava Meganathan

doi:10.7717/peerj-cs.3484

Vellore Institute of Technology University, Vellore, India

Published: 2026-02-25
Accepted: 2025-11-26
Received: 2025-05-06

Academic Editor: Nicole Nogoy

Subject Areas: Artificial Intelligence, Data Mining and Machine Learning, Data Science
Keywords: Machine learning, Deep learning, Disease diagnosis, Prediction, Healthcare

Copyright: © 2026 Krishnan and Thandava Meganathan
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.

Cite this article: Krishnan S, Thandava Meganathan N. 2026. Disease diagnosis and prediction using deep learning: a review. PeerJ Computer Science 12:e3484 https://doi.org/10.7717/peerj-cs.3484

The authors have chosen to make the review history of this article public.

Abstract

Deep learning (DL) is a machine learning technique that processes data in a manner inspired by the functioning of the human brain. It is an effective tool for deciphering complex data and can be applied to many tasks, such as decision-making, image recognition, and natural language processing. The need to process large amounts of data rapidly and precisely drives the demand for deep learning technologies in the healthcare industry. Deep learning can find patterns in medical data, including genomic data, patient records, and medical imaging. It can also be used to build prediction models that help clinicians select a course of treatment for patients. This article reviews how deep learning models have been used to examine medical data for better diagnoses. DL models can improve accuracy, handle complex medical data, and detect subtle trends. A comparative analysis of deep learning architectures indicates that DL helps boost diagnostic accuracy and recognize subtle disease patterns. However, challenges remain, including the need for vast training data, overfitting, limited model interpretability, and high computational requirements. We also present applications in diagnosing heart disease, cancer, Alzheimer’s disease, and other specific diseases, demonstrating the potential of deep learning in predictive modeling for clinical decision support. This article comprehensively reviews deep learning architectures and comparative research for disease identification and prediction, and explores emerging solutions such as federated learning and explainable artificial intelligence (AI). The study also addresses research obstacles and potential advantages by presenting the current status and probable future directions of deep learning in disease diagnosis and prognosis.

Introduction

Machine learning (ML) is a subfield of artificial intelligence (AI) that allows machines to learn from complex data to perform tasks without being explicitly programmed. Several industries, including healthcare, are adopting machine learning, with applications in administration, treatment, and diagnosis. Numerous researchers and practitioners have shown that Machine Learning-Based Disease Diagnostics (MLBDD) have the potential to be quick and affordable (Ahsan & Siddique, 2022). Traditional diagnostic approaches are often expensive, time-consuming, and dependent on human participation, and they are also limited by the patient’s capabilities. ML-based algorithms, by contrast, can scale to large volumes of data and do not suffer from human fatigue. As a result, ML may make it feasible to identify diseases across very large patient populations in a medical context. MLBDD systems are built from X-ray and magnetic resonance imaging (MRI) images, as well as tabular information about patients’ diseases, age, and gender (Ahsan et al., 2020). ML can be used to diagnose various diseases, including pneumonia, breast cancer, heart failure, and Alzheimer’s disease. The emergence of ML algorithms and their use in disease diagnosis demonstrate their value to the medical industry.

Deep learning (DL) has substantially transformed disease prediction and medical diagnosis. By harnessing DL algorithms, medical personnel may increase the precision and speed of diagnosis, lower the cost of medical tests, and ultimately raise the standard of patient care. The application of deep learning algorithms to disease diagnosis and prediction has increased significantly in recent years (Yu et al., 2023). The rationale for investigating deep learning in disease detection and prediction is to provide researchers with a powerful instrument for enhancing healthcare, and this study intends to serve as a useful reference point for anyone working in this sector and its associated domains. Compared with conventional approaches, deep learning offers notable benefits, such as potentially improved diagnostic accuracy from analyzing intricate medical data. Deep learning algorithms can also examine data to predict disease risk, enabling early detection, which is crucial for effective treatment and allows proactive measures to be taken. Moreover, deep learning can enhance healthcare accessibility in resource-limited or remote locations, offering significant advantages to underserved groups.

To attain high precision, a high area under the Receiver Operating Characteristic (ROC) curve, and a low false positive rate, a diagnostic approach should be able to handle more data, use better algorithms, and offer improved interpretability. Such approaches allow health systems to manage conventional clinical diagnostics more easily and to develop deep learning strategies that improve patient outcomes and lower healthcare costs. When machine learning methods for predicting medical outcomes in disease diagnosis are analyzed, however, coverage of standards is lacking in many areas. Ahmed & Husien (2024) demonstrated how ensemble learning can enhance the accuracy and robustness of heart disease prediction; however, the lack of interpretability, data biases, and limited real-world validation make clinical adoption difficult. Tejaswi, Srinivasu & Gottumukkala (2025) thoroughly analyzed preprocessing techniques, dataset utilization, and machine learning and deep learning approaches for predicting lung cancer, but practical clinical issues, including interpretability, data privacy, and healthcare integration, remain unresolved. Singh & Gulati (2025) investigated a machine learning-based method for diagnosing and predicting chronic diseases, using Convolutional Neural Networks (CNNs) for feature extraction and K-Nearest Neighbors (KNN) for classification. Combining data compression and noise reduction approaches, the model surpasses traditional methods and reaches over 93% accuracy. However, issues such as model interpretability, real-world validation, and possible biases in dataset representation need to be resolved for broader clinical use. Sia et al. (2025) integrated Support Vector Machine (SVM), random forest, KNN, and Artificial Neural Network (ANN) classifiers for symptom-based diagnosis to develop a machine learning-based disease prediction chatbot. SVM’s efficacy in disease categorization was shown by its accuracy of 92.24%. The chatbot improves user engagement by using Long Short-Term Memory (LSTM) and the Natural Language Toolkit (NLTK). Practical implementation is still hampered by difficulties in handling ambiguous symptom inputs, limited generalizability across populations, and a lack of real-world validation.
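The evaluation targets named above (high precision, a high area under the ROC curve, and a low false positive rate) can be made concrete with a small worked example. The sketch below is illustrative only: the labels, scores, and the 0.5 decision threshold are hypothetical, and AUC is computed with the pairwise-ranking definition rather than by integrating the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = disease, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """AUC = probability that a random positive case scores higher than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and model scores for six patients.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)            # 2 / 3
false_positive_rate = fp / (fp + tn)  # 1 / 3
auc = roc_auc(y_true, scores)         # 8 / 9
print(f"precision={precision:.3f} FPR={false_positive_rate:.3f} AUC={auc:.3f}")
```

Precision and the false positive rate depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.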

Savitha, Kannan & Logeswaran (2025) combined Harris Hawks Optimization (HHO) with Deep Belief Networks (DBNs) to improve the prediction of cardiovascular disease through better feature selection and hyperparameter tuning. The HHO process is optimized using the Correlation-based Weighted Compound Feature Generation (CWCFG) approach, which outperforms other metaheuristic algorithms, such as RBFO and the Grey Wolf Optimizer (GWO), as well as conventional machine learning models. Despite the model’s increased accuracy and better interpretability, clinical validation, practical application, and the computational complexity of widespread use remain open issues. Loganathan et al. (2025) introduced a MATLAB-based automated system that uses CNNs to detect and classify brain tumors, skin cancer, and lung cancer. By integrating image processing, feature extraction, and segmentation, the system reduces human labor while improving diagnostic accuracy. However, it requires high-quality medical imaging, may be subject to model bias, and is difficult to validate in clinical practice. For broader application, further enhancements could focus on dataset variety, real-time adaptation, and integration with medical imaging systems. The main obstacles facing AI-driven disease prediction models include data biases, clinical integration problems, insufficient real-world validation, and a lack of interpretability. Even high-accuracy models have limited practical use because of shortcomings in explainability, fairness, and scalability. Explainable AI (XAI), bias reduction, privacy-preserving strategies, and extensive clinical validation are needed to close these gaps. Increasing healthcare integration and optimizing computing efficiency are also essential for practical implementation, as shown in Table 1. Closing these gaps is crucial to ensuring the practical implementation, transparency, and generalizability of AI in healthcare.
Unlike previous reviews that concentrate on individual diseases or general deep learning methods, this study offers a cross-disease, modality-aware comparative examination of deep learning techniques for chronic illness diagnosis. It classifies deep learning applications by disease type and data modality (e.g., imaging, signal, clinical text) and maps them to suitable deep learning architectures, providing a decision-support matrix for researchers and clinicians. The review highlights significant deficiencies in the literature, including inadequate multimodal fusion strategies, poor generalizability across diverse populations, and limited explainability. It proposes a comprehensive framework that integrates transfer learning, federated learning, and explainable AI for future deep learning systems in healthcare. Taken together, this viewpoint positions the review as a strategic framework for developing next-generation AI-driven clinical diagnostic tools.

Table 1:

Comparative analysis of recent research studies in disease diagnosis.

Related work	Dataset	Methodology	Contributions	Drawbacks
Shatnawi, Abuein & Al-Quraan (2025)	Kaggle CT-scan dataset (four cancer types)	Enhanced CNN (ConvNeXt, VGG16, ResNet50, EfficientNetB0) with preprocessing	• Achieved 100% accuracy using Enhanced CNN, which outperformed existing models • Emphasized the importance of integrating advanced image processing techniques into clinical practice.	• The model lacks interpretability and fails to include any explainable AI or Transformer-based methodologies.
Khalfallah et al. (2025)	Large multi-center EEG datasets (University of Sheffield EEG, CHB-MIT EEG, IBIB PAN EEG, AHEPA General Hospital of Thessaloniki EEG)	CNN, ResNet, ChronoNet + multi-head attention	• Achieved 100% accuracy in multiclass EEG disorder classification with an attention mechanism. • Used multi-head self-attention to enhance temporal EEG feature extraction. • Contributes significantly to brain-computer interfacing and improves the prospects for categorizing and diagnosing neurological diseases.	• No explainability mechanism was included, constraining transparency in decision-making. • Lacks a multimodal approach and larger neurological-disorder datasets that would enable better clinical decision-making.
Alzahrani (2025)	Small, imbalanced clinical dataset (309 samples)	Conditional Tabular Generative Adversarial Network (CTGAN) + Synthetic Minority Oversampling Technique (SMOTE) + Random Forest	• Improved accuracy via synthetic data generation and class balancing. • Attained an accuracy of 98.93% using lightweight models such as Random Forest.	• Excludes deep learning and Transformer methodologies and lacks interpretability. • The limited dataset size constrains generalizability. • Lacks practical implementation.
Saryazdi & Mostafaeipour (2025)	Hospital dataset (real-time, medium-sized)	Decision Tree + Fuzzy Clustering	• Developed a comprehensible fuzzy-based decision system that integrates rule-based categorization with optimal runtime efficiency. • Achieved 97% accuracy while preserving model simplicity and clarity.	• Dataset size and diversity need to be expanded. • More advanced fuzzy classification methods remain to be explored.
Abbas et al. (2025)	2,310-image dataset (augmented; four skin conditions)	VGG16 + LRP for explainability	• Used transfer learning (VGG16) and Layer-wise Relevance Propagation (LRP) to improve the interpretability of skin disease prediction. • Achieved high accuracy on an augmented image dataset.	• Not validated on larger and more varied datasets, limiting generalizability across domains. • Lacks privacy-preserving techniques such as federated learning or blockchain, raising possible confidentiality issues. • Lack of interpretability of the model.
Lopez Alcaraz et al. (2025)	MIMIC-IV ECG + ECG-View II (large ECG datasets)	XGBoost with SHAP; tree-based classifiers	• Used tree-based models and SHAP values to provide robust external validation and model transparency for ECG-based neuropsychiatric predictions. • Emphasizes non-invasive, economical diagnostic applications.	• Demographic differences may mask disorder-specific ECG signals. • Dependence on derived features instead of raw ECG waveforms may limit diagnostic precision. • Lack of external validation affects the generalizability of the results.
Stabellini et al. (2025)	Cancer-specific hospital dataset	SVM, KNN, Decision Tree, Random Forest	• Evaluated various machine learning classifiers for heart disease prediction, demonstrating high accuracy and low computational cost. • Compared ML models for classification accuracy.	• Scores still need external validation in more varied patient cohorts with different demographic, socioeconomic, and geographical contexts to ensure robustness and generalizability.

DOI: 10.7717/peerj-cs.3484/table-1

Although deep learning has revolutionized disease prediction and diagnosis, each model class has intrinsic constraints that have spurred the creation of succeeding architectures. Early deep learning models, including basic ANNs, were limited in processing high-dimensional medical data because of their shallow architectures and the vanishing gradient problem. CNNs tackled these issues through local connectivity and weight sharing, enabling efficient learning of hierarchical features directly from medical images. Nonetheless, CNNs are hampered by fixed inductive biases and narrow receptive fields, which limit their ability to capture long-range relationships across different parts of an image. Recurrent Neural Networks (RNNs) and LSTM networks were developed to model temporal structure, making them effective for sequential healthcare data such as electrocardiogram (ECG) signals or clinical text; however, they struggle with long-range dependencies and incur high training costs. Vision Transformers (ViTs) address the limitations of CNNs through self-attention, which captures global context from the earliest layers and enables the modeling of distant relationships and intricate spatial dependencies in medical images. Despite these benefits, ViTs require substantial data and computing resources, prompting the development of hybrid CNN-Transformer architectures that combine the efficiency of CNNs in local feature extraction with the global representation capabilities of ViTs. More recently, ensemble learning techniques and optimization-driven hybrids, such as Deep Belief Networks (DBNs) augmented with metaheuristics, have been proposed to improve robustness and accuracy; however, these methods raise issues of interpretability and computational complexity. The progression of deep learning models in healthcare thus reflects an ongoing cycle of overcoming earlier restrictions while introducing new challenges, underscoring the need for explainable, efficient, and clinically validated AI systems.
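To illustrate why self-attention gives ViTs global context from the first layer, the following minimal sketch computes single-head scaled dot-product attention over a handful of patch embeddings. It is a simplification under stated assumptions: the 2-D embeddings are hypothetical, the query, key, and value projections are identity matrices instead of learned weights, and positional encodings and multiple heads are omitted.

```python
import math


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of patch embeddings x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    outputs, weights = [], []
    for q_i in q:
        # Every patch is scored against every other patch, so context is global from the first layer.
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted mix of all value vectors.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return outputs, weights


# Four hypothetical 2-D patch embeddings; identity projections keep the example small.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(patches, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

Every attention weight is positive, so each output mixes information from all patches, including distant ones; a 3 × 3 convolution, by contrast, only sees immediate neighbors in a single layer.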

The main contributions of this study are as follows:

1. Reviews the most recent machine learning and deep learning architectures and methodologies that can be applied to predicting and diagnosing diseases.
2. Highlights the algorithmic steps involved in designing a DL model, from data collection to model deployment and continuous learning.
3. Identifies deep learning’s potential for predicting medical outcomes and diagnosing disease.
4. Presents a comprehensive comparative analysis of deep learning approaches.
5. Discusses opportunities, research challenges, and recent advancements related to the use of DL for disease diagnosis and prediction.

The structure of this study is as follows. ‘Survey Methodology’ and ‘Disease Diagnosis Using Machine Learning’ describe the survey methodology and recent research on disease diagnosis and prediction using ML models. ‘Deep Learning Models’ discusses DL models for disease diagnosis and prediction. ‘Deep Learning Techniques for Disease Prediction and Diagnosis’ presents deep learning techniques for disease prediction and diagnosis. ‘Comprehensive Comparative Analysis of Deep Learning Approaches’ provides a comparative analysis of the various DL approaches for disease diagnosis and prediction. ‘Research Challenges and Open Issues in Disease Diagnosis and Prediction’ discusses challenges and open issues. ‘Recent Advancements in Deep Learning for Disease Diagnosis and Prediction’ covers recent advancements, and ‘Conclusion and Future Work’ provides the conclusion and future work.

Survey methodology

This review employs a systematic survey methodology to investigate the role of deep learning in disease diagnosis and prediction. Literature published from 2018 to 2025 was obtained from IEEE Xplore, PubMed, and Scopus using keywords associated with deep learning in healthcare. Studies were selected based on their use of deep learning models, clinical datasets, and documented performance measures. The search terms included “deep learning,” “disease prediction,” “medical imaging,” “clinical diagnosis,” “healthcare AI,” “CNN,” “Recurrent Neural Network,” “transformer,” and “health informatics.” The selected works were classified by disease category, data modality, and deep learning architecture. A comparative examination then elucidates their strengths, limitations, and new developments, including explainable AI, federated learning, and transformer-based models. This approach is intended to provide a thorough and analytical evaluation of deep learning’s influence on clinical decision-making and predictive healthcare.

To enhance the rigor of this survey, a systematic technique was implemented: the databases consulted (PubMed, IEEE Xplore, Scopus), the keywords employed, and the time window are stated precisely, and explicit inclusion and exclusion criteria are applied to ensure impartial selection.

The study selection process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. The detailed flow of records through each phase is described as follows:

Identification

• A total of 105 records were identified through database searches:
  - IEEE Xplore
  - PubMed
  - Scopus
• No additional records were identified through manual searches or other sources.

Screening

After the removal of duplicates, 100 unique records remained.
Titles and abstracts of all 100 records were screened.
Five studies were excluded at this stage for not meeting the core objectives (e.g., absence of deep learning, irrelevant disease domain, or insufficient methodological description).

Eligibility

Ninety-five full-text articles were assessed for eligibility based on the predefined inclusion and exclusion criteria.
No additional full-text articles were excluded at this stage because the initial screening had already filtered non-relevant articles.

Included

Finally, 95 studies met all inclusion criteria and were incorporated into the qualitative synthesis (survey analysis).

Disease diagnosis using machine learning

Disease diagnosis is the process of determining which disease or condition best explains a person’s symptoms and signs. This intricate procedure includes a physical examination, collecting information from the patient, and ordering the necessary tests. Several machine learning methods exist for disease diagnosis; among other things, machine learning algorithms can be used to personalize therapy, identify trends in medical data, and predict disease risk.

Overview of machine learning

ML, a branch of AI, allows computers to “self-learn” from training data and improve over time without explicit programming. By detecting patterns in data and learning from them, machine learning algorithms refine their predictions. In healthcare, medical experts can apply machine learning to build better diagnostic tools; for instance, a machine learning algorithm can perform pattern recognition on medical images such as X-rays or MRI scans. Machine learning can support the diagnosis and prevention of many diseases, and it is used across the healthcare industry in diagnostics, treatment, research, drug discovery, and healthcare administration.

Figure 1 classifies ML models into supervised learning, unsupervised learning, reinforcement learning, transfer learning, and federated learning. This overview sets out the methodological foundations on which the framework proposed in this study builds and explains the reasoning behind the selection of the various methodologies. In supervised learning, algorithms are trained on labeled datasets to recognize patterns and predict outcomes (Jiang, Gradus & Rosellini, 2020). In unsupervised learning, models are trained on unlabeled data and instead extract insights and hidden patterns from the data themselves (DataRobot, 2021).
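Supervised learning as defined above can be illustrated with k-nearest neighbors, a classifier used in several of the studies cited earlier. The sketch below is a toy example, not a clinical model: the two scaled features, the labels, and k = 3 are all hypothetical choices. It predicts a label for a new patient by majority vote among the closest labelled training records.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Supervised prediction: majority label among the k labelled training points nearest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled records: two scaled clinical features per patient.
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.80, 0.90), (0.90, 0.80), (0.85, 0.95)]
train_y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]

print(knn_predict(train_x, train_y, (0.82, 0.88)))  # disease
print(knn_predict(train_x, train_y, (0.12, 0.18)))  # healthy
```

The labels are what make this supervised; an unsupervised method such as clustering would group the same points without ever seeing `train_y`.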

Figure 1: Classification of machine learning models.


DOI: 10.7717/peerj-cs.3484/fig-1

Machine learning techniques for disease diagnosis

ML is being used to develop new methods of disease diagnosis in medicine. Figure 1 shows the range of techniques that can be used to diagnose disease accurately; the most suitable algorithm depends on the disease. A few related works follow. Heart disease is a significant health problem that affects millions of people worldwide, and early, accurate diagnosis is essential for adequate treatment and prevention. Parkinson’s disease is a neurodegenerative disease caused by the death of dopamine-producing brain cells, and machine learning is a promising strategy that researchers are developing for its early diagnosis. Senturk (2020) developed a machine learning-based method that classified Parkinson’s patients with an accuracy of 93.84%, showing that machine learning can be a valuable technique for the early diagnosis of Parkinson’s disease. Huang, Gao & Ye (2021) created an intelligent data-driven model based on ML methods, notably Support Vector Machines (SVM) and Random Forests (RF), that enhances the performance and accuracy of Cough Variant Asthma (CVA) diagnosis; the methodology was also shown to be a simple way to increase the effectiveness of disease diagnosis. ML is effective in predicting several diseases, including diabetes, cancer, and heart disease, and in some studies ML algorithms have matched or exceeded the accuracy of human doctors. Because early diagnosis is critical for the successful treatment of many diseases, ML’s potential to detect diseases earlier, when they are more curable, could lead to improved patient outcomes and reduced healthcare costs.

Bhattacharjee et al. (2022) created machine learning models that use spirometry data to categorize lung disorders as obstructive or non-obstructive. The models were trained with 5-fold Cross-Validation (CV) on spirometry data from 1,163 patients, and an additional blind dataset of 151 patients was used for external validation. The Multi-Layer Perceptron (MLP) model performed best, with an accuracy of 83.7%. In summary, although ML has advanced considerably, its use in medicine raises new challenges, such as unbalanced data, the interpretation of ML models, and ML ethics (Ahsan et al., 2020). Deep learning, which builds on ANNs as a fundamental concept of machine learning, is a promising approach for disease detection and prediction. DL models can recognize intricate links and patterns in data, and advanced DL architectures such as CNNs and RNNs can process large volumes of data efficiently, in some settings in real time.
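The 5-fold cross-validation protocol described above can be sketched as an index splitter. This is an assumption-laden illustration rather than the authors’ code: folds are contiguous and unshuffled, whereas in practice records would typically be shuffled (and possibly stratified by class) first, and a hypothetical model would be trained on each `train_idx` and scored on the matching `val_idx`.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) for k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more record
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# 1,163 patients as in the study above; the 151-patient blind set is kept outside the folds.
folds = list(k_fold_indices(1163, k=5))
print([len(val) for _, val in folds])  # [233, 233, 233, 232, 232]
```

Every record is used for validation exactly once, and the blind set never enters any fold, which is what makes it a genuine external check.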

The comparative analysis in Table 1 indicates that, although each study adds significantly to its field, methodological integration is clearly fragmented. It underscores the need to incorporate interpretability, multimodal data, and empirical validation to enhance the clinical relevance of deep learning models. Few studies have examined essential AI components such as explainability, privacy, generalizability, and real-world scalability together, suggesting that contemporary research often prioritizes performance metrics over alignment with the broader requirements of clinical implementation. Despite increasingly sophisticated models, no comprehensive study integrates large-scale data utilization, explainable AI, Transformer models, class balancing, and federated learning, the elements essential for ethically responsible and transparent AI in healthcare. This indicates a considerable gap in cohesive, comprehensive, intelligent diagnostic systems. Consequently, there is an urgent need for multidisciplinary and multimodal AI pipelines that achieve high performance while being interpretable, secure, and relevant to diverse patient demographics.

Deep learning models

Deep learning has recently been adopted in various healthcare applications, including disease diagnosis and prognosis. Deep learning algorithms can extract features useful for diagnosis and prediction from vast volumes of data. CNNs, Recurrent Neural Networks (RNNs), and Deep Neural Networks (DNNs) are among the deep learning techniques that have been employed for disease diagnosis and prediction. Figure 2 represents deep learning models as a synthesis of model categories closely correlated with specific neurological diseases, connecting architectural design with clinical significance. Rather than providing a generic list of models, Figure 2 highlights functional alignment; for instance, it illustrates how generative models such as Generative Adversarial Networks (GANs) are appropriate for data augmentation in rare disease contexts, while convolutional networks excel at spatial biomarker extraction. It thus presents a disease-specific framework for model selection that is seldom discussed in existing literature.

Numerous deep learning architectures and methods are frequently employed to diagnose and predict diseases, and CNNs are among the most commonly used. CNNs are especially helpful for image-based data, such as medical images, because they can extract information at different levels of abstraction for disease diagnosis and prediction (Mohades Deilami, Sadr & Tarkhan, 2022). RNNs are another frequently used model for disease diagnosis. They are especially helpful for time-series data, such as electrocardiogram (ECG) data, because they can model the temporal dependencies in the data for disease diagnosis and prediction. In addition to CNNs and RNNs, other deep learning architectures and methods frequently employed for disease detection and prediction include DBNs, GANs, Generative Artificial Intelligence (GAI), and autoencoders, as shown in Fig. 2.

Convolutional neural networks

Convolutional Neural Networks are deep learning models used in image recognition, video analysis, Natural Language Processing (NLP), medical imaging, remote sensing, generative tasks, and speech recognition, and they typically involve a large number of trainable parameters. CNNs perform automatic feature extraction and operate well on large datasets, which makes them well suited to medical image analysis for disease diagnosis and prediction. Because each convolutional filter is shared across all image positions, CNNs require less memory and training time than Multi-Layer Perceptrons (MLPs) on image data. A CNN architecture contains convolutional, pooling, fully connected, and output layers.
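The memory advantage of CNNs over MLPs comes largely from weight sharing. The short calculation below compares trainable parameters for one 3 × 3 convolutional layer producing 16 feature maps and one fully connected layer producing only 16 hidden units, both applied to a hypothetical 224 × 224 grayscale image. The sizes are illustrative assumptions, not values from any cited study.

```python
def conv2d_params(kernel_size, in_channels, out_channels, bias=True):
    """One kernel_size x kernel_size x in_channels filter per output channel, reused at every pixel."""
    return (kernel_size * kernel_size * in_channels + int(bias)) * out_channels


def dense_params(in_features, out_features, bias=True):
    """Every input is connected to every output unit by its own weight."""
    return (in_features + int(bias)) * out_features


# Hypothetical input: a 224 x 224 single-channel (grayscale) medical image.
height, width, channels = 224, 224, 1

conv = conv2d_params(kernel_size=3, in_channels=channels, out_channels=16)
dense = dense_params(in_features=height * width * channels, out_features=16)
print(conv)   # 160, independent of image size
print(dense)  # 802832, grows with the number of pixels
```

The convolutional count stays the same if the image grows, whereas the dense count scales with the number of pixels; actual training time also depends on network depth and hardware.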

Figure 3 depicts the CNN architecture, which has proven essential for deep learning-based disease detection. It shows how CNNs systematically extract and abstract spatial features from medical images, emulating clinical pattern recognition for disease diagnosis. Each layer applies transformations that capture clinically meaningful features, facilitating precise categorization. This layered architecture supports a gradual, autonomous learning process from unprocessed medical images to precise diagnostic predictions. Wang (2024) used a CNN to diagnose breast cancer from X-ray images and achieved an accuracy of 92.1%. This result highlights the capability of deep learning to aid radiologists by offering a second opinion or automating preliminary screening, thereby reducing workload and improving early detection rates.
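The convolution, activation, and pooling stages of a CNN can be traced by hand on a tiny input. The sketch below is a toy forward pass under explicit assumptions: a hand-set vertical-edge kernel stands in for learned filters, the input is a synthetic 6 × 6 image with a single intensity edge, the stride is 1 with no padding, and no fully connected head is included.

```python
def conv2d_valid(image, kernel):
    """Stride-1, no-padding 2-D cross-correlation (what DL libraries usually call 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Non-overlapping 2 x 2 max pooling (stride 2); a trailing odd row/column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


# Toy 6 x 6 "image" with a vertical intensity edge, and a hand-set vertical-edge kernel.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = relu(conv2d_valid(image, kernel))  # 4 x 4, values 0, 3, 3, 0 in each row
pooled = max_pool_2x2(feature_map)               # 2 x 2
print(pooled)  # [[3, 3], [3, 3]]
```

The feature map responds only where the edge lies, and pooling halves the spatial size while keeping that response, which is the gradual abstraction the layered architecture relies on.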

Figure 3: Convolutional neural networks (ul Haq et al., 2022).


DOI: 10.7717/peerj-cs.3484/fig-3

CNNs have produced encouraging results in the study of medical images for disease diagnosis and prognosis. They use convolutional layers to extract information from images and learn a hierarchical representation of each image. CNNs have been used to diagnose several disorders, including pulmonary and cardiovascular diseases and cancer. Sudhish, Nair & Shailesh (2024) incorporated CNNs into a Content-Based Medical Image Retrieval (CBMIR) framework, providing a new method for disease identification that facilitates the automated, efficient, and precise retrieval of pertinent medical images. The proposed pipeline uses hierarchical feature extraction, multi-level gain-based selection, and sophisticated indexing algorithms to manage sizeable medical image libraries efficiently, effectively tackling significant.
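The retrieval step of a CBMIR system can be illustrated by ranking stored images by the similarity of their feature vectors to a query. This sketch does not reproduce the pipeline of Sudhish, Nair & Shailesh (2024): the image identifiers and three-dimensional feature vectors are hypothetical stand-ins for CNN embeddings, cosine similarity is an assumed metric, and brute-force sorting replaces the indexing algorithms mentioned above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=2):
    """Brute-force retrieval: rank every indexed image by cosine similarity to the query features."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]


# Hypothetical CNN feature vectors (e.g., pooled embeddings) for a tiny image library.
index = {
    "ct_001": [0.9, 0.1, 0.0],
    "ct_002": [0.8, 0.2, 0.1],
    "xray_003": [0.0, 0.2, 0.9],
    "mri_004": [0.1, 0.9, 0.1],
}
query = [1.0, 0.15, 0.05]
print(retrieve(query, index))  # ['ct_001', 'ct_002']
```

Brute-force ranking costs one similarity computation per stored image, which is why large libraries need the kind of indexing the cited pipeline adds.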

Recurrent neural networks

RNNs are suited to tasks involving sequential data, such as time series or electronic health records. They often outperform other DL models when healthcare data have time-series properties. As shown in Fig. 4, each unit processes the input sequentially while retaining the prior context. This makes RNNs effective for analyzing time-dependent clinical data such as electrocardiogram (ECG) and electroencephalogram (EEG) signals or patient histories. The mathematical formulation is given in Eqs. (1) and (2).

Hidden state update:

(1) $h_{t} = f (W_{h h} h_{t - 1} + W_{x h} x_{t} + b_{h}) .$

Output:

(2) $y_{t} = W_{h y} h_{t} + b_{y} .$
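The following NumPy sketch unrolls Eqs. (1) and (2) over a sequence. It makes three assumptions: f = tanh (the equations leave f unspecified), a zero initial hidden state, and small random weights in place of trained ones. The dimensions are hypothetical and mimic five time steps of three vital signs.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Unroll Eqs. (1)-(2) over x_seq of shape (T, input_dim)."""
    h = np.zeros(W_hh.shape[0])                      # h_0 = 0 (assumed)
    outputs = []
    for x_t in x_seq:
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)     # Eq. (1) with f = tanh
        outputs.append(W_hy @ h + b_y)               # Eq. (2)
    return np.array(outputs), h

rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2    # hypothetical sizes
x_seq = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.5, size=(output_dim, hidden_dim))
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)

y_seq, h_T = rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y)
print(y_seq.shape, h_T.shape)   # (5, 2) (4,)
```

The same matrices W_hh, W_xh, and W_hy are reused at every time step. This is what lets an RNN process sequences of arbitrary length with a fixed number of parameters.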

Figure 4: Recurrent neural networks (ul Haq et al., 2022).

DOI: 10.7717/peerj-cs.3484/fig-4

RNNs are widely used to predict changes in health conditions when the variables have a time-series aspect. They have been used to predict the course of disease and the effectiveness of treatment. CNNs, in contrast, have been used to extract the spatial distribution of skin lesions to help identify skin cancer.

RNNs have been used to detect cardiac disease by examining temporal trends in heart rate variability. CNNs have been applied to the spatial distribution of blood glucose levels to identify diabetes. RNNs have also been used to identify Alzheimer’s disease by examining the temporal patterns of brain activity. Kumar & Ghosh (2024) demonstrated the efficacy of bidirectional LSTM (BiLSTM) models for the early identification of Parkinson’s disease using online handwriting analysis. Their approach integrates advanced kinematic feature extraction with sequential modeling and outperforms current machine learning methods. The method improves diagnostic precision and offers a non-invasive, accessible tool for early diagnosis, improving patient outcomes.

Long short-term memory networks

LSTM networks are a type of RNN designed to manage long-term dependencies in data. They combine short- and long-term memory through a gating mechanism, shown in Fig. 5. The coordinated gates allow an LSTM to selectively retain and update diagnostic patterns over time, which is essential for modeling progression trends in chronic disease. The mathematical formulation is given in Eqs. (3)–(6).

Forget gate:

(3) $f_{t} = \sigma(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}).$

Input gate and candidate cell state:

(4) $i_{t} = \sigma(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}), \quad \tilde{C}_{t} = \tanh(W_{C} \cdot [h_{t-1}, x_{t}] + b_{C}).$

Cell state update:

(5) $C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot \tilde{C}_{t}.$

Output gate:

(6) $o_{t} = \sigma(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}), \quad h_{t} = o_{t} \odot \tanh(C_{t}).$
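A single LSTM time step following Eqs. (3)–(6) can be sketched in NumPy as below. Here [h_{t-1}, x_t] is implemented as vector concatenation, and the gate products in Eqs. (5) and (6) are elementwise. The random weights, zero biases, and dimensions are placeholders rather than values from any cited model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (3)-(6)."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate, Eq. (3)
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate, Eq. (4)
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])   # candidate cell state, Eq. (4)
    c_t = f_t * c_prev + i_t * c_tilde                     # elementwise cell update, Eq. (5)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate, Eq. (6)
    h_t = o_t * np.tanh(c_t)                               # new hidden state, Eq. (6)
    return h_t, c_t

rng = np.random.default_rng(1)
input_dim, hidden_dim = 3, 4                               # hypothetical sizes
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.3, size=(hidden_dim, hidden_dim + input_dim))
    params[f"b_{gate}"] = np.zeros(hidden_dim)

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x_t in rng.normal(size=(6, input_dim)):                # six steps of a hypothetical signal
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

Suppose the forget-gate bias is set high and the input-gate bias low. Then f_t ≈ 1 and i_t ≈ 0, so C_t ≈ C_{t-1}. This is the mechanism that lets the cell carry information across long gaps in a sequence.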

Figure 5: Long short-term memory (ul Haq et al., 2022).

DOI: 10.7717/peerj-cs.3484/fig-5

Oktay & Kocer (2020) proposed a convolutional LSTM approach to distinguish between Parkinsonian tremor (PT) and essential tremor (ET). They recorded tremors using a Leap Motion controller with a 4D camera, extracted key features using a CNN, and classified ET and PT using an LSTM network. However, the normalization method and the strategy used to prevent overfitting were not disclosed. LSTM-based RNNs are particularly effective for tasks involving sequential data. For instance, LSTMs have been used to detect cardiac disease by investigating temporal patterns in heart rate variability.

An LSTM model was trained on heart rate variability measurements from heart disease patients and healthy controls. It achieved an accuracy of 85% in recognizing patients with heart disease. Goyal, Rani & Singh (2024) presented a multilayered deep learning framework for Alzheimer’s disease diagnosis. The framework attains improved accuracy through transfer learning, LSTM-based temporal modeling, and GAN-based data augmentation. It addresses critical issues in early detection, data scarcity, and overfitting, and it outperforms current methodologies. The authors present it as paving the way for further progress in personalized medicine and AI-driven diagnostics. Its prospective applications in early detection, multimodal integration, and clinical implementation underscore its value as a tool against Alzheimer’s disease.

Generative adversarial networks

GANs are deep learning models that generate artificial data. In healthcare, they are frequently used to produce synthetic medical images and augment datasets. Figure 6 presents a GAN that generates synthetic patient data to supplement limited clinical datasets, thereby improving the effectiveness of deep learning in diagnosing rare diseases. A GAN trains two competing neural networks simultaneously: the generator and the discriminator. The mathematical formulation is given in Eqs. (7)–(9).

Generator:

(7) $G(z; \theta_{g}) \to x'.$

Discriminator:

(8) $D(x; \theta_{d}) \in [0, 1].$

Minimax objective:

(9) $\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)}[\log(1 - D(G(z)))].$

Figure 6: Framework of generative adversarial network (ul Haq et al., 2022).

DOI: 10.7717/peerj-cs.3484/fig-6

The generator creates fake data samples intended to fool the discriminator, that is, to maximize the probability that the discriminator makes a mistake. The discriminator, in turn, tries to distinguish real samples from generated ones.
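The value function in Eq. (9) can be estimated from discriminator outputs on a batch of real samples and a batch of generated samples. The following sketch performs only that estimation. No networks are trained, and the discriminator outputs are made up. During training the discriminator is updated to increase this value and the generator to decrease it. When the generator matches the data distribution, the optimal discriminator outputs 1/2 everywhere and the value equals -log 4. The small `eps` term is an assumption added only to avoid log(0).

```python
import numpy as np

def gan_value(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Monte Carlo estimate of V(D, G) from Eq. (9).

    d_real: discriminator outputs D(x) on real samples x ~ p_data
    d_fake: discriminator outputs D(G(z)) on generated samples, z ~ p_z
    """
    eps = 1e-12  # numerical guard against log(0)
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

# Confident, correct discriminator: V is close to its maximum of 0
print(gan_value(np.array([0.99, 0.98]), np.array([0.01, 0.02])))
# Discriminator that cannot separate real from fake (outputs 0.5): V = -log 4, about -1.386
print(gan_value(np.full(4, 0.5), np.full(4, 0.5)))
```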

Tufail et al. (2024) reported substantial improvements in classification accuracy, sensitivity, and overall performance for the early identification of Alzheimer’s disease (AD), underscoring the effectiveness of their method. Using GANs to supplement minority classes improves the model’s generalizability and establishes a benchmark for data augmentation in medical imaging. The study highlights the efficacy of integrating transfer learning with GAN-based augmentation to enhance early AD identification, facilitating improved disease management and patient care.

Deep neural networks

A DNN is an artificial neural network (ANN) with many layers of neurons. The connections between the neurons in each layer and those in the subsequent layer are weighted. These weights are adjusted during training so that the DNN learns to perform a particular task. The mathematical formulation is given in Eqs. (10) and (11).

Layer-wise transformation:

(10) $h^{(l)} = f(W^{(l)} h^{(l-1)} + b^{(l)}).$

Final output:

(11) $y = f(W^{(L)} h^{(L-1)} + b^{(L)}).$
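Equations (10) and (11) amount to a loop over layers. The NumPy sketch below assumes ReLU for the hidden-layer activation f and a sigmoid at the output layer, so the network produces a binary risk score. The layer sizes, the random weights, and the eight clinical input features are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_forward(x, weights, biases):
    """Apply Eq. (10) for each hidden layer and Eq. (11) for the output layer."""
    h = x                                            # h^(0) is the input feature vector
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)                          # Eq. (10) with f = ReLU
    return sigmoid(weights[-1] @ h + biases[-1])     # Eq. (11) with f = sigmoid

rng = np.random.default_rng(42)
layer_sizes = [8, 16, 8, 1]   # hypothetical: 8 clinical features -> one risk score
weights = [rng.normal(scale=0.3, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = rng.normal(size=8)
risk = dnn_forward(x, weights, biases)
print(risk.shape)   # (1,)
```

Training adjusts the entries of `weights` and `biases`, typically by gradient descent on a loss such as binary cross-entropy. This is the weight-update process described in the text.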

The research group Google DeepMind Health is developing deep neural networks for use in healthcare. Its algorithms may be used to identify diseases, predict patient outcomes, and create customized treatment regimens (Singh, 2024). SinhaRoy & Sen (2024) improved the early diagnosis of Alzheimer’s disease (AD) by generating synthetic magnetic resonance imaging (MRI) scans with deep convolutional generative adversarial networks (DCGANs). The authors highlighted the efficacy of GAN-based methodologies in mitigating data constraints and improving diagnostic accuracy. Their work provides a robust instrument for early AD prediction and extends the use of deep learning in medical imaging.

Generative artificial intelligence

Generative AI produces new data, such as text, images, or music, based on its training data. It offers several promising uses in diagnosing and predicting diseases. By analyzing vast amounts of patient data, including genetic information, scans, and medical records, it can uncover hidden links and patterns that humans might overlook. This capability can lead to earlier diagnoses, more precise prognoses, and even the identification of new diseases. The mathematical formulation is given in Eqs. (12) and (13).

General generative model likelihood:

(12) $p_{\theta}(x) = \int p_{\theta}(x \mid z)\, p(z)\, dz.$

Variational autoencoder (VAE) objective (commonly used in GAI):

(13) $\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_{\phi}(z \mid x)}[\log p_{\theta}(x \mid z)] - D_{\mathrm{KL}}(q_{\phi}(z \mid x) \,\|\, p(z)).$
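Consider the common case of a Gaussian encoder q_φ(z | x) = N(μ, diag σ²), a standard normal prior p(z), and a Bernoulli decoder for binary data. Both terms of Eq. (13) then have simple forms. The KL term has a closed form, and the expectation is estimated from one reparameterized sample z = μ + σ ⊙ ε. The sketch below assumes these choices. The encoder and decoder outputs are placeholder arrays standing in for trained networks.

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def bernoulli_log_likelihood(x, x_prob, eps=1e-12):
    """log p_theta(x | z) for binary x when the decoder outputs probabilities x_prob."""
    return np.sum(x * np.log(x_prob + eps) + (1 - x) * np.log(1 - x_prob + eps))

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * epsilon keeps sampling differentiable with respect to mu and log_var."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def elbo(x, x_prob, mu, log_var):
    """Single-sample estimate of L(theta, phi; x) in Eq. (13)."""
    return bernoulli_log_likelihood(x, x_prob) - kl_diag_gaussian_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 1.0])                          # hypothetical binarized input
mu, log_var = np.array([0.2, -0.1]), np.array([-0.5, 0.3])  # placeholder encoder outputs
z = reparameterize(mu, log_var, rng)                         # would be passed to a decoder network
x_prob = np.array([0.9, 0.2, 0.8, 0.7])                      # placeholder decoder output for z
print(elbo(x, x_prob, mu, log_var))
```

Because the KL term is never negative, the ELBO is at most the reconstruction log-likelihood. Maximizing it therefore trades reconstruction quality against keeping q_φ(z | x) close to the prior.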

Generative AI can also simulate disease progression by building models that mimic the course of a disease in a specific patient. Such models help medical professionals determine the optimal course of action and predict a patient’s likely response to various treatments. In addition, generative AI can produce synthetic data that resembles actual patient data, which is invaluable for developing new diagnostic tools and training other AI models (LeewayHertz & Takyar, 2024). Innovative approaches that go beyond conventional methods are crucial in healthcare, and the use of GAI has been rising steadily across several domains. Balas & Micieli (2023) used generative AI to produce images depicting the visual perception of a patient with visual snow syndrome, translating textual descriptions into images. Several models, including DALL·E 2, Midjourney, and Stable Diffusion, produced clear images.

Deep learning techniques for disease prediction and diagnosis

Deep learning techniques have the power to transform the medical industry. By giving clinicians earlier and more precise diagnoses, deep learning models can help improve patient outcomes. Deep learning techniques are also being used to develop new therapies and cures. Applications of deep learning in healthcare will probably increase as the technology advances. For chronic disease diagnosis using DL approaches, we reviewed studies on heart disease, cancer, diabetes, skin diseases, contagious diseases, Alzheimer’s disease, and Parkinson’s disease.

Overview of the dataset

The datasets used in existing works cover various chronic diseases and data modalities and come from both public and commercial sources. As Table 2 shows, medical research datasets vary significantly in nature, availability, and accessibility, reflecting the different data requirements of each disease area.

Table 2:

Summary of related works applying deep learning to various medical image datasets across different disease categories.

Related work	Disease (category)	Dataset (type)	Image type	Number of Images
Asif et al. (2025)	Brain tumor (e.g., glioma, meningioma)	Public (Kaggle brain MRI dataset)	MRI (T1-weighted scans)	2,870 (original images; ~10,000 with augmentation)
Guo et al. (2023)	Pneumonia (respiratory)	Public (Kermany Kaggle & RAIG X-ray)	Chest X-ray	5,856 (training) + 3,900 (external test)
Vuran et al. (2025)	Monkeypox (skin infection)	Public (Mpox Skin Lesion v2.0)	Skin lesion images (clinical)	755
Malik et al. (2024)	Skin cancer (melanoma, BCC)	Public (ISIC dermoscopy archive)	Dermoscopic images	3,762 (training) + 1,060 (testing)
El-Ghany, Mahmood & Abd El-Aziz (2024)	Gastrointestinal (ulcers, polyps)	Public (Kvasir-Capsule & Kvasir v2)	Wireless capsule endoscopy	~35,468 (Kvasir-Capsule) + 8,000 (Kvasir v2)
Raza et al. (2024)	Alzheimer’s disease (neurological)	Public (ADNI)	MRI (structural sMRI)	1,075
Li et al. (2025)	Cardiac diseases (e.g., cardiomyopathy)	Public (ACDC challenge data)	Cardiac MRI	150 patients (~28–40 frames each ≈5,000 images)
Thatha et al. (2025)	Breast cancer (histopathology)	Public (BreakHis histology dataset)	Histopathology (H&E stained)	7,909

DOI: 10.7717/peerj-cs.3484/table-2

Figure 7 illustrates a deep learning-based disease diagnosis pipeline, beginning with data collection from medical imaging and clinical records. Preprocessing improves the data through normalization, augmentation, and handling of missing values. Deep learning algorithms then perform feature extraction.

Figure 7: Disease diagnosis workflow pipeline based on deep learning.

DOI: 10.7717/peerj-cs.3484/fig-7

The model is trained on the data and then fine-tuned to improve its accuracy. Its disease classification predictions are assessed using performance metrics. Explainability approaches, such as Gradient-weighted Class Activation Mapping (Grad-CAM), highlight the features that most influence a prediction, improving the transparency of the model. In the final stage, the model is integrated into clinical workflows, enabling immediate diagnosis and supporting healthcare decision-making. The pipeline in Fig. 7 thus shows the progressive integration of clinical data management, model development, and practical implementation. It explicitly incorporates explainability and feedback mechanisms to strengthen trust and decision-making in clinical settings.
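Grad-CAM combines the feature maps A^k of a convolutional layer with the gradients of the class score y^c with respect to those maps. Each map is weighted by its spatially averaged gradient α_k, and the weighted maps are summed. A ReLU then keeps only the regions with positive influence. The sketch below assumes the activations and gradients have already been obtained from a framework’s automatic differentiation. The small arrays are hypothetical.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one class.

    activations: feature maps A^k of a conv layer, shape (K, H, W)
    gradients:   d y^c / d A^k for the same layer, shape (K, H, W)
    """
    alphas = gradients.mean(axis=(1, 2))              # alpha_k: spatially averaged gradients
    cam = np.tensordot(alphas, activations, axes=1)   # sum_k alpha_k * A^k, shape (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positively contributing regions
    if cam.max() > 0:
        cam = cam / cam.max()                         # rescale to [0, 1] for display
    return cam

# Two hypothetical 2x2 feature maps: map 0 fires top-left, map 1 fires bottom-right
A = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, 1.0]]])
# Hypothetical gradients: map 0 raises the class score, map 1 lowers it
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(A, G)
print(heatmap)   # only the top-left cell is highlighted
```

In practice the heatmap is upsampled to the input image size and overlaid on the scan, showing clinicians which regions drove the prediction.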

Disease diagnosis and prediction

This study reviews numerous works that use DL methods to diagnose and forecast chronic diseases. These include heart disease, cancer, diabetes, skin disorders, contagious diseases, and neurodegenerative diseases such as Alzheimer’s and Parkinson’s. The review focuses on advances in applying DL architectures to improve diagnostic accuracy, early detection, and tailored treatment planning. DL techniques analyze varied data such as medical images, patient records, and genetic data. They have been shown to considerably improve prediction results, allowing for quicker treatment and possibly lowering healthcare expenditures.

Heart disease

Cardiovascular diseases (CVDs), including heart disease, stroke, arrhythmias, and hypertension, are identified using several methods. These include electrocardiograms (ECGs), echocardiograms, blood pressure measurements, and imaging methods such as computed tomography (CT) angiography or MRI. Deep learning models have been used to predict cardiac disease from numerical data such as blood pressure, heart rate, and cholesterol levels, and to diagnose arrhythmias from ECG signals. Predicting heart disease often involves using DNNs to integrate clinical imaging (such as echocardiograms and CT scans) with patient history data (such as age, gender, and smoking status). In stroke diagnosis and prediction, deep learning is used to examine CT and MRI images and pinpoint brain regions affected by ischemia or bleeding. Deep learning models have also been used to detect subtle changes in the heart’s structure, such as identifying coronary artery disease on CT scans (Rani et al., 2024). Baviskar et al. (2023) highlighted the need for an effective prediction approach to cope with the complications of major heart-related diseases. Uma Maheswari & Valarmathi (2023) used an optimized deep belief network to develop a prediction and recommendation model for heart disease. Hybrid methods can extract the best characteristics from diverse data sources.

Cancer disease

Cancer is a group of conditions characterized by abnormal cell proliferation and the capacity to invade or spread to other parts of the body. This distinguishes it from benign tumors, which do not spread. Deep learning can screen large databases of chemical compounds to find those that can destroy cancer cells. When cancer is diagnosed earlier, it can often be treated more successfully (Joshi & Aziz, 2024). Asthma, pneumonia, chronic obstructive pulmonary disease (COPD), lung cancer, and other respiratory conditions are excellent candidates for deep learning applications. To detect lung cancer on CT images, deep CNNs are trained to recognize nodules and classify them as benign or malignant. CNNs can also be applied to positron emission tomography (PET) scans to find metabolically active cancers. Asthma and COPD are diagnosed using a combination of imaging and pulmonary function tests. Deep learning models are increasingly used to examine chest CT images and X-rays for signs of emphysema and airway obstruction. Deep learning is also used to predict respiratory failure or exacerbations in chronic respiratory disorders from spirometer and pulse oximeter time-series data (Khandakar et al., 2024). Gastrointestinal disorders, such as Crohn’s disease, colorectal cancer, and irritable bowel syndrome (IBS), are often identified through imaging, biopsy, and endoscopy. Deep learning models analyze colonoscopy images for polyps or other abnormalities that might indicate colorectal cancer. Deep neural networks also perform automated image segmentation to identify benign or malignant tumors. Colorectal cancer is also staged using CT scans, and deep learning models use changes seen in radiological images to predict how the tumor will respond to treatment. For Crohn’s disease, deep learning is used to assess gut wall thickness on MRI images and to predict flare-ups from blood and stool biomarkers.
Deep learning has also aided microbiome research: algorithms now examine 16S rRNA sequencing data to find bacterial signatures associated with Crohn’s disease or IBS (Sokouti & Sokouti, 2024). Pradhan, Chawla & Rawat (2023) created a novel lung cancer diagnostic model. It combines an optimized deep learning technique with attribute correlation-based optimized weighted feature extraction. Compared with other optimization and machine learning techniques, the suggested model performed better. However, the suggested SA-SLnO-RNN model cannot resolve combinatorial optimization problems.

Neurological diseases: Alzheimer’s disease

Neurological disorders affect the brain, spinal cord, and nervous system. Their study relies on intricate datasets, including neuroimaging (MRI, CT, and PET scans), electroencephalography (EEG), and genetic data, and deep learning algorithms greatly enhance the analysis of these data. For instance, deep learning-based image processing is often used to identify early alterations in brain structure on MRI images when diagnosing Parkinson’s disease (PD). Deep learning models have also seen a recent uptick in use for diagnosing Alzheimer’s disease. These models examine genetic information, cerebrospinal fluid (CSF) biomarkers, and patterns in neuroimaging data, such as atrophy in specific brain areas. In multiple sclerosis, deep learning networks can be trained to segment brain lesions on MRI. This includes tiny lesions that may not be apparent to human observers. These technologies can also be combined with physiological data, such as gait analysis and voice analysis, to enhance the monitoring and early diagnosis of neurological diseases (Hussain & Nazir, 2024).

Alzheimer’s disease is a degenerative brain disease that impairs cognition and causes memory loss. LSTMs have been used to identify it. An LSTM model was trained on EEG recordings from patients with Alzheimer’s disease and healthy controls, and it correctly identified people with the disease 90% of the time. Nguyen et al. (2022) suggested an ensemble learning framework for AD detection that combines deep learning and machine learning in a multi-model, uni-data approach. The deep learning component was built on a 3D-ResNet to exploit the 3D structural properties of neuroimaging data. Transfer learning can be employed to further reduce overfitting caused by sparse training data. Ahmed, Elsharkawy & Elkorany (2023) introduced a deep convolutional neural network (DCNN) architecture based on brain MRI images for AD diagnosis. Their multiclass DCNN classifier distinguishes normal controls (NC), mild cognitive impairment (MCI), and AD. For early AD detection, Bi & Wang (2019) proposed a discriminative deep probabilistic model with multi-task learning that classifies EEG spectral images into three groups. Their approach combines a multi-task learning technique with a discriminative deep convolutional generative model. Compared with other generative models, the model performs well because it couples feature extraction with classification.

Parkinson’s disease

Parkinson’s disease (PD) is a progressive disorder that causes uncontrolled shaking, stiffness, and problems with balance and coordination. Tanveer et al. (2022) analyzed multiple modalities, datasets, architectural configurations, and experimental setups. Hybrid CNN-RNN architectures applied to time-series data have also produced precise results in diagnosing PD. Alissa et al. (2022) diagnosed PD by identifying movement abnormalities in patients’ drawing tasks. They used a convolutional neural network to distinguish between healthy controls and PD patients. Their compact model could be developed into an automated, single-task diagnostic tool that runs offline in real time and can be conveniently deployed in a clinical environment. Khaskhoussy & Ayed (2022) proposed voice-based systems: one using autoencoder (AE) deep features and one using Mel frequency cepstral coefficients with Gaussian mixture models (MFCC-GMM). Both achieved empirical results approaching 100%. Because they rely on voice, these systems can detect PD without a medical test.

Diabetes

Diabetes mellitus is a metabolic condition characterized by persistently elevated blood sugar (glucose) levels. Glucose serves as the primary energy source for the brain, and diabetes of any type can result in excess blood glucose. Blood glucose levels, body mass index (BMI), and hormone tests are often used to identify diabetes, obesity, and thyroid conditions. Deep learning may enhance diagnosis and prediction in these domains by evaluating large volumes of numerical data together with medical imaging. For instance, by examining genetic, clinical, and demographic data, deep learning can predict the likelihood of type 2 diabetes. Thyroid disorders, such as hypothyroidism or hyperthyroidism, are identified using ultrasound imaging and blood tests (such as TSH, T3, and T4 levels). Deep learning models trained on ultrasound images can detect thyroid nodules and other anomalies. For obesity, deep learning models can analyze body scans (such as CT or MRI scans) to detect fat distribution in the body and predict consequences such as type 2 diabetes or cardiovascular disease (El-Bashbishy & El-Bakry, 2024). Önal, Güraksin & Duman (2023) presented a hybrid deep learning and image processing technique based on iris images for a more objective inspection and diagnosis of diabetes. Their method identified the iris border and automatically extracted the pancreatic region from the iridology chart. Kurt et al. (2023) proposed a decision support model using RNN-LSTM and Bayesian optimization to diagnose patients in the gestational diabetes (GD) risk group. The model achieved 95% sensitivity and 99% specificity on the generated dataset. With an AUC of 98% (95% CI [0.95–1.00], p < 0.001), the model effectively diagnosed GD.
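The sketch below shows how sensitivity, specificity, and AUC, the metrics reported by Kurt et al. (2023), can be computed. The labels and scores are invented for illustration, and the 0.5 decision threshold is an assumption. The AUC uses its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties count 1/2."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]              # every positive-negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# Invented labels (1 = condition present) and model scores
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.2, 0.6, 0.1]
y_pred = [int(s >= 0.5) for s in scores]            # assumed 0.5 threshold
sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)   # sensitivity 2/3, specificity 0.8, AUC 14/15
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is why studies often report all three.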

Skin diseases

Dermatological conditions, including melanoma, psoriasis, and eczema, are identified primarily by clinical examination and biopsy. Deep learning algorithms are increasingly applied to the associated medical images. Melanoma, a skin cancer, is often diagnosed with dermoscopy. Deep learning algorithms can identify early signs of cancer in dermoscopic images by classifying lesions based on color, shape, and texture patterns. Image-classification networks may also improve diagnostic accuracy by distinguishing benign moles from malignant melanomas. Psoriasis, a persistent skin disorder, is diagnosed by visual examination and skin biopsy. Deep learning has been used to classify and track its severity in dermatological images by examining characteristics such as skin lesions, inflammation, and scaling. Deep learning models may also help automate the diagnosis of eczema by finding distinctive patterns linked to atopic dermatitis and other types of eczema in dermatological images (Groh et al., 2024).

Contagious diseases

Infectious diseases such as tuberculosis (TB), hepatitis, HIV/AIDS, and COVID-19 are detected primarily through blood tests, imaging, and PCR-based diagnostics. COVID-19, for instance, is diagnosed with RT-PCR testing and chest X-rays or CT scans. Deep learning algorithms detect the disease on radiological images by recognizing bilateral lung infiltrates and characteristic ground-glass opacities. Chest X-rays and sputum smear microscopy are often used to diagnose TB. Deep learning models identify radiological abnormalities in the lungs and classify them as suggestive of TB infection. In hepatitis diagnosis, serological testing is still used to detect biomarkers, while genomic sequencing is increasingly used to find viral mutations. Deep learning models are also used to predict drug resistance in hepatitis B and C by examining genetic sequences and patient data (Ajagbe & Adigun, 2024).

In summary, deep learning technologies are revolutionizing the detection and categorization of many diseases in several medical areas. Deep learning models provide practical tools for early disease identification, individualized therapy, and prognosis prediction by evaluating complex datasets, including clinical and demographic data, medical imaging, genetic data, and electrophysiological signals. Deep learning’s potential to enhance healthcare outcomes expands as more diseases are researched and more varied datasets become accessible. In addition to improving diagnostic precision, this strategy helps uncover subtle patterns in data that conventional approaches can miss.

Algorithmic steps to implement deep learning models

Disease diagnosis and prediction using deep learning encompasses several stages, including data collection, preprocessing, model selection, training, and deployment. Following these stages systematically improves diagnostic precision and predictive ability in medical settings. The algorithmic steps below outline the fundamental stages for implementing deep learning models for disease diagnosis and prediction. They are intended as a practical guide for researchers and clinicians designing DL models for healthcare applications (Algorithm 1).
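The time-series preprocessing stage of Algorithm 1 covers imputation, normalization, sliding windows, and splitting. It can be sketched in NumPy as follows. Forward fill, z-score standardization, a 70/15/15 chronological split, and a six-step window are illustrative choices, and the heart rate and SpO2 readings are randomly generated. The sketch splits by time before fitting the scaling statistics, and it fits them on the training portion only. This is a deliberate design choice that keeps information from the validation and test data out of training.

```python
import numpy as np

def forward_fill(x: np.ndarray) -> np.ndarray:
    """Replace NaNs with the last observed value along time (axis 0); leading NaNs stay NaN."""
    out = x.copy()
    for t in range(1, len(out)):
        missing = np.isnan(out[t])
        out[t, missing] = out[t - 1, missing]
    return out

def chronological_split(x: np.ndarray, train: float = 0.7, val: float = 0.15):
    """Split rows in time order so validation and test data come after training data."""
    n = len(x)
    n_train, n_val = int(n * train), int(n * val)
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]

def sliding_windows(x: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Stack overlapping windows into shape (num_windows, window, n_features)."""
    starts = range(0, len(x) - window + 1, stride)
    return np.stack([x[s:s + window] for s in starts])

rng = np.random.default_rng(0)
# 48 hypothetical hourly readings: column 0 = heart rate, column 1 = SpO2
readings = np.column_stack([rng.normal(75, 5, 48), rng.normal(97, 1, 48)])
readings[rng.random(readings.shape) < 0.1] = np.nan   # simulate ~10% missing values
readings[0] = [75.0, 97.0]                            # observed first row, so forward fill leaves no NaNs

filled = forward_fill(readings)                               # imputation
train_x, val_x, test_x = chronological_split(filled)          # 33 / 7 / 8 rows
mean, std = train_x.mean(axis=0), train_x.std(axis=0)         # statistics from training rows only
train_w, val_w, test_w = (sliding_windows((part - mean) / std, window=6)
                          for part in (train_x, val_x, test_x))
print(train_w.shape, val_w.shape, test_w.shape)   # (28, 6, 2) (2, 6, 2) (3, 6, 2)
```

The resulting (samples, time steps, features) arrays are the input layout that LSTM layers in common deep learning frameworks expect.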

Algorithm 1

Algorithm for disease diagnosis and prediction using DL models.

1.

Data Collection (Dataset Description, Sources, and Size)
The efficacy of any DL model depends significantly on the quality and volume of its training data. Data collection entails acquiring precise, varied, and representative samples that reflect the trends relevant to the analysis. To ensure transparency and reproducibility, the datasets used for model training and evaluation should be described thoroughly.
- (i) Gather relevant medical data from electronic health records (EHRs), imaging studies, wearable sensors, genomic and biomarker data, or other sources such as publicly available medical datasets. Depending on the data type, data can be acquired in the following ways:
  - Vital Signs: Collected from electronic health records to predict disease outcomes.
  - Medical Imaging Data: Includes images acquired by X-ray, MRI, CT, and PET scanners.
  - Wearable Sensors: Electronic devices worn on the body monitor vital signs in real time.
  - Time-Series Data: Provides critical indicators such as heart rate, step count, and gait measurements.
  - Public Medical Datasets: Affordable, ready-to-use datasets for quicker experimentation and benchmarking.
  - Personalized Data Collection via Research Projects: Collaboration with medical facilities or academic institutes to gather patient-specific information.
- (ii) Size of the Dataset
  - Dataset sizes range from hundreds to millions of samples or records, depending on the source and type of data collected; particular subsets are selected for training, validation, and testing.
- (iii) Ensure data anonymization and compliance with patient privacy regulations.
- (iv) Information Collection: Incorporate further patient information (age, gender, medical history) to enhance the dataset for feature engineering.
2.

Preprocessing
Raw data often contain unstructured, inconsistent, or missing values, so preprocessing is crucial to convert them into a clean, analyzable format. Depending on the data type, preprocessing, class balancing, and dataset splitting can be applied. The main preprocessing techniques are described below:
- (i) Numerical Data
  - Normalization/Standardization: Scale numerical features to a standard range to improve model performance. Several procedures are available (e.g., min-max normalization and z-score standardization). Understanding how each one affects the data helps in choosing the procedure that best suits the dataset and improves accuracy.
  - Check for null values, detect outliers, and examine feature correlations.
- (ii) Image Data
  - Resize images to a consistent size so that all inputs to the convolutional neural network (CNN) have the same dimensions.
  - Apply data augmentation techniques (e.g., rotation, flipping, cropping, or histogram equalization) to artificially increase the dataset size and improve model generalization.
- (iii) Time-Series Data:
  - Normalize values to ensure consistency in measurements.
  - Impute missing values using appropriate methods (e.g., mean imputation, forward fill).
  - Transform the data into overlapping sliding windows to capture temporal relationships.
  - Reshape the windows into the input format expected by the chosen sequential model.
- (iv) Class Balancing:
  - Check whether the class distribution is balanced. If it is not, use sampling techniques such as random oversampling or SMOTE for numerical data, and apply data augmentation for image data.
- (v) Split Data: Divide the preprocessed data into training, validation, and test sets.
  - Training set: Used to train the model.
  - Validation set: Used to monitor model performance during training and prevent overfitting.
  - Test set: Used to evaluate the trained model’s generalizability on unseen data.
3.

Model Selection and Architecture Design
Select the appropriate model architecture based on the data type and analysis goals. The choice of deep learning (DL) architecture depends on the data type (e.g., images, time series, tabular data), the task complexity (e.g., classification, segmentation, prediction), and the available computational resources. Commonly used DL models are listed below:
- (i) Convolutional Neural Networks (CNNs) are suitable for image-based data (e.g., medical scans). CNNs use convolutional and pooling layers to extract spatial features, followed by dense layers for classification; for multi-class problems, the final layer uses softmax activation. CNNs capture spatial hierarchies and patterns in medical images, enabling automated feature extraction without manual intervention. Tunable parameters include kernel size, number of filters, learning rate, batch size, and optimizer (Kumar et al., 2024).
- (ii) Pretrained Models (AlexNet, VGG16, ResNet, GoogLeNet, and more): Pretrained models are neural networks trained on large datasets such as ImageNet that can be fine-tuned for particular tasks. They are useful in medical research because many healthcare datasets are small: pretrained models reduce training time and can increase accuracy when medical data are limited (Kumar et al., 2024). Tunable factors include the number of trainable layers, learning rate, batch size, optimizer, and fine-tuning approach (full or partial layer freezing).
- (iii) Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks are used for time-series data (e.g., vital signs). LSTM layers capture the temporal dependencies that are critical for sequential patterns such as disease development and are followed by dense layers for prediction; the final layer uses sigmoid activation for binary classification or softmax for multi-class problems. By capturing the temporal interdependence of patient data, these models improve disease progression prediction. Tunable parameters include sequence length, number of hidden units, and dropout rate (Jiang et al., 2024).
- (iv) Generative Adversarial Networks (GANs) are used for synthetic data generation. A generator is trained to create realistic synthetic data resembling real patient data, and a discriminator is trained to distinguish real from synthetic data. By imitating real-world patient data, GANs reduce the risk of overfitting caused by insufficient data. GANs are chosen for their capacity to produce high-quality, realistic data while helping to preserve privacy. Tunable parameters include the number of training epochs, learning rate, batch size, loss function, and the generator and discriminator architectures (Kundu et al., 2024).
- (v) Transformers and Attention-based architectures: Transformers process textual or image input using self-attention. Attention mechanisms are gaining popularity because they improve interpretability and capture long-range dependencies. Transformers replace recurrence with self-attention in sequential tasks such as sequence labeling and disease progression modeling. Variants include Vision Transformers (ViTs), which are used for medical imaging applications such as segmentation and classification; ViTs capture global dependencies and have outperformed CNNs in several medical imaging applications. BERT is well suited to unstructured clinical text, such as pathology reports and doctors’ notes. Tunable parameters include the number of attention heads, embedding dimension, and learning rate warmup (Djenouri et al., 2024).
- (vi) Graph Neural Networks (GNNs): GNNs are very effective for modeling interactions in non-Euclidean data, including biological networks, patient similarity graphs, and chemical structures. Applications include analyzing protein-protein interaction networks for disease biomarker discovery and predicting disease comorbidities from patient data graphs. Tunable parameters include the number of graph layers, hidden units, activation function, learning rate, and number of message-passing iterations (Paul et al., 2024).
- (vii) Reinforcement Learning (RL): RL is used for decision-making tasks that require optimizing a sequence of actions, such as treatment planning or drug development. Applications include dynamic therapy recommendations based on patient responses and the optimization of radiation treatment doses. RL is chosen for its capacity to learn adaptive strategies through trial-and-error interaction. Tunable parameters include learning rate, policy architecture, exploration-exploitation method, reward function, and discount factor (Yuan, Sun & Chen, 2023).
- (viii) Custom Architectures for Multimodal Data: Integrating several kinds of data (for example, imaging, clinical, and genetic) can address many healthcare challenges. Multimodal deep learning models combine CNNs for images, RNNs for sequences, and MLPs for tabular data, for example to predict patient outcomes from imaging and clinical data together. These models are chosen for their adaptability in combining diverse data and improving predictive accuracy. Tunable parameters include learning rate, optimizer, feature extraction technique, modality fusion approach, and network architecture design (Ahmad et al., 2023).
- (ix) Deep Belief Networks (DBNs) are deep generative networks that are pre-trained in an unsupervised manner. Training involves two main stages: pre-training and fine-tuning. During pre-training, the hidden layers are trained greedily, one layer at a time; during fine-tuning, the network is treated as a feed-forward neural network and trained on labeled data. Units are connected between adjacent layers but not within a layer. DBNs are chosen for their ability to model complex patterns in high-dimensional data. Tunable parameters include the number of layers, hidden units, learning rate, activation function, and training epochs (Zeng, Li & Peng, 2023).
- (x) Generative AI: Train models to generate synthetic data that reflects disease progression. Utilize generated data to uncover hidden patterns and simulate disease trajectories. Tunable parameters are latent space dimension, learning rate, loss function, and training epochs (Atchison et al., 2024).
The choice of model depends on the analysis goal:
•

CNNs are ideal for spatial data, such as medical images.
•

RNNs/LSTMs are ideal for processing sequential data, such as time series.
•

GANs are effective for augmenting datasets and for domain adaptation.
•

Transformers are suitable for both textual and multimodal data.
•

Customized hybrid models can overcome the limitations of single architectures.
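To make the CNN description in item (i) concrete, the sketch below implements one forward pass (convolution, ReLU, 2 × 2 max pooling, a dense layer, and softmax) in NumPy. The weights are random, so the output probabilities carry no diagnostic meaning; the point is the data flow and the tensor shapes. In practice such models are built and trained with a framework such as PyTorch or TensorFlow/Keras, and the function names here are illustrative only.

```python
import numpy as np


def conv2d(image, kernels):
    """Valid (no padding) 2-D cross-correlation, as computed by DL 'convolution' layers.
    image: (H, W); kernels: (n_filters, k, k) -> maps: (n_filters, H-k+1, W-k+1)."""
    n_filters, k, _ = kernels.shape
    h, w = image.shape
    out = np.zeros((n_filters, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def max_pool2x2(maps):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    n, h, w = maps.shape
    maps = maps[:, :h - h % 2, :w - w % 2]
    return maps.reshape(n, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def softmax(z):
    e = np.exp(z - z.max())            # shift by the max for numerical stability
    return e / e.sum()


def cnn_forward(image, kernels, w_dense, b_dense):
    """Conv -> ReLU -> max-pool -> flatten -> dense -> softmax class probabilities."""
    features = np.maximum(conv2d(image, kernels), 0.0)
    pooled = max_pool2x2(features)
    return softmax(w_dense @ pooled.ravel() + b_dense)


rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # stand-in for a tiny grayscale image patch
kernels = rng.normal(size=(4, 3, 3))         # 4 filters -> maps of shape (4, 6, 6)
w_dense = rng.normal(size=(3, 4 * 3 * 3))    # pooled maps (4, 3, 3) -> 3 classes
probs = cnn_forward(image, kernels, w_dense, np.zeros(3))
```

Training would adjust `kernels` and `w_dense` by backpropagation; the shape bookkeeping (8 × 8 input, 6 × 6 feature maps, 3 × 3 after pooling, 36 dense inputs) is what a framework automates.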
4.

Model Training
Train the chosen model on preprocessed data to discern patterns and connections within the dataset.
- (i)
  Compile the Model
  - •
    
    Choose an optimizer (e.g., Adam), which updates the model weights during training based on the gradients of the loss function.
  - •
    
    Select a loss function (e.g., categorical cross-entropy) to measure the difference between model predictions and the true labels.
  - •
    
    Define performance metrics (e.g., accuracy) to monitor training progress.
- (ii)
  Train the Model
  - •
    
    Train the model with the training data.
  - •
    
    Feed the training data through the model in mini-batches (the batch size is the number of samples per weight update).
  - •
    
    Use backpropagation to compute the gradient of the loss function with respect to the weights, and update the weights accordingly.
  - •
    
    Monitor performance on the validation set to detect overfitting.
  - •
    
    Specify the number of epochs (complete passes over the training data) for optimal performance.
- (iii)
  Validation Techniques
  - •
    
    Cross-validation: k-fold (e.g., 5-fold or 10-fold) or leave-one-out cross-validation is applied to obtain a less biased performance estimate.
  - •
    
    Stratified Sampling: Used to maintain class balance in dataset partitions.
  - •
    
    Report how performance was measured (e.g., AUC-ROC, precision-recall curves).
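The compile-and-train steps above can be illustrated with the smallest differentiable model, logistic regression, trained by mini-batch Adam on binary cross-entropy while validation loss is tracked every epoch. This is a simplified stand-in for a deep network, assuming a binary diagnosis label and synthetic data; a deep model follows the same loop, with backpropagation running through more layers. The function and parameter names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(p, y, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities p and 0/1 labels y."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def train_logistic_adam(x_tr, y_tr, x_val, y_val, epochs=50, batch_size=16,
                        lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Mini-batch training of a logistic-regression 'network' with the Adam optimizer.
    Returns the weights, bias, and per-epoch (train_loss, val_loss) history."""
    rng = np.random.default_rng(seed)
    params = np.zeros(x_tr.shape[1] + 1)        # weights followed by the bias term
    m = np.zeros_like(params)                   # Adam first-moment estimate
    v = np.zeros_like(params)                   # Adam second-moment estimate
    t = 0
    history = []

    def predict(x):
        return sigmoid(x @ params[:-1] + params[-1])

    for _ in range(epochs):
        order = rng.permutation(len(x_tr))      # reshuffle the training set every epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            err = predict(x_tr[idx]) - y_tr[idx]             # dLoss/dlogit for sigmoid + BCE
            grad = np.append(x_tr[idx].T @ err, err.sum()) / len(idx)
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)                     # bias-corrected moments
            v_hat = v / (1 - beta2 ** t)
            params -= lr * m_hat / (np.sqrt(v_hat) + eps)
        # Track both losses each epoch; a rising validation loss signals overfitting.
        history.append((bce_loss(predict(x_tr), y_tr), bce_loss(predict(x_val), y_val)))
    return params[:-1], params[-1], history


rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = (x @ np.array([1.5, -2.0, 0.5]) + 0.3 * rng.normal(size=200) > 0).astype(float)
w, b, history = train_logistic_adam(x[:150], y[:150], x[150:], y[150:])
val_acc = np.mean((sigmoid(x[150:] @ w + b) > 0.5) == y[150:])
```

Here `epochs` and `batch_size` play exactly the roles described in the training steps, and the `history` list is what a practitioner would plot to decide when to stop.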
5.
Model Evaluation
- •
  
  Evaluate the trained model’s performance on the unseen test data.
- •
  
  Calculate metrics like loss and accuracy to assess model generalizability.
Evaluation criteria differ between classification and regression tasks, depending on the nature of the medical problem, and a given disease may require one or both types of task. The following task types can be used to evaluate a DL model.
- (i)
  
  Classification: Classification metrics are used to evaluate models that diagnose diseases such as cancer (malignant vs. benign), diabetes (presence vs. absence), and AD and PD. Metrics such as accuracy, recall, precision, F1-score, and ROC-AUC are applied during model evaluation. In medical applications, recall is crucial because a missed case (false negative) can delay treatment; precision limits unnecessary therapies for healthy individuals; and ROC-AUC evaluates the model’s ability to discriminate between diseased and disease-free individuals. Together, these measures support early detection, reduce misdiagnosis, and help ensure the accuracy of deep learning models used in clinical decision-making.
- (ii)
  
  Regression: Regression is used to predict disease progression for continuous medical variables (e.g., tumor growth size, severity progression of Parkinson’s tremors, and blood glucose level forecasts). Regression metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), the coefficient of determination (R²), and residual analysis are applied during evaluation. MAE and RMSE measure the deviation of predictions from actual values, which is essential for accurate disease assessment, while a high R² value indicates strong predictive power; these measures are therefore crucial for individualized treatment planning.
- (iii)
  
  Hybrid Metrics: When a model performs both classification and regression tasks, use classification criteria (accuracy, recall, F1-score) together with regression criteria (MAE, RMSE, R²), and consolidate them into a comprehensive assessment. For instance, in models of Parkinson’s or Alzheimer’s disease, classification metrics assess whether the model correctly determines if a patient has the disease, whereas regression metrics assess its forecasts of disease severity over time. This combined evaluation strategy gives a more comprehensive assessment of medical AI models, improving both patient outcome prediction and diagnostic precision.
- (iv)
  
  Tailored Loss Function: For models that perform both tasks, combine the classification and regression loss functions (e.g., as a weighted sum) into a single training objective.
Finally, customizing the evaluation procedure according to the specific model and disease prediction allows for a more precise assessment of models, confirming their efficacy in medical applications.
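The classification and regression metrics discussed above can be computed directly from model outputs. The sketch below assumes binary labels (1 = disease present) and uses the rank-based formulation of ROC-AUC; in practice, library routines such as those in scikit-learn's `sklearn.metrics` module are the usual choice. The toy labels, scores, and severity values are invented for illustration.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = disease present, 0 = absent)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random diseased case scores higher than a
    random healthy case (ties count 1/2), which equals the area under the ROC curve."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    diff = scores[y_true == 1][:, None] - scores[y_true == 0][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, and R² for continuous targets such as severity scores."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"mae": float(np.mean(np.abs(err))),
            "rmse": float(np.sqrt(np.mean(err ** 2))), "r2": float(r2)}


cls = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])
auc = roc_auc([1, 1, 1, 0, 0, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.6, 0.35])
reg = regression_metrics([3, 5, 7], [2, 5, 9])
```

In the toy example, two of the three diseased cases are detected (recall 2/3) and one of the five healthy cases is flagged (specificity 0.8), which illustrates why recall and specificity are reported separately rather than folded into accuracy alone.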
6.
Model Fine-Tuning and Optimization
- (i)
  
  Enhance model efficacy by optimizing its architecture, hyperparameters, or training methodology.
- (ii)
  Refine the model’s performance through techniques like:
  - •
    
    Hyperparameter Tuning: Optimize hyperparameters (e.g., learning rate, number of layers) using a grid or random search.
  - •
    
    Modify parameters such as learning rate, batch size, and layer count.
    
    (a)
    
    Grid Search: Comprehensive exploration of specified parameter values.
    
    (b)
    
    Random Search: Stochastically choose parameters from a specified domain.
    
    (c)
    
    Bayesian Optimization: Employ probabilistic models to determine hyperparameters.
    
    (d)
    
    Regularization: Apply techniques like dropout, early stopping, and L2 regularization to prevent overfitting and improve model generalization.
    
    (e)
    
    L2 Regularization: Mitigates overfitting by adding to the loss function a penalty proportional to the sum of the squared weights, which discourages large weights. This helps smooth decision boundaries and prevent overfitting, notably in CNNs for medical imaging (X-rays, MRIs) and in deep learning models for sequential medical data (ECG, EEG).
    
    (f)
    
    Dropout: Randomly deactivates a subset of neurons during training, forcing the network to learn redundant, generalized representations instead of relying on individual neurons. It is often employed in deep learning models for disease classification to improve robustness.
    
    (g)
    
    Early Stopping: Monitors validation performance and stops training when it ceases to improve, avoiding extra training that might lead to overfitting.
    
    (h)
    
    Optimization Algorithms: Optimizers such as Adam and RMSprop (which adapt per-parameter learning rates) or SGD with momentum can improve model generalization by providing steady and effective weight updates, particularly on complex medical datasets.
- (iii)
  
  Explainability: Use explainable AI methods such as SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), or Gradient-weighted Class Activation Mapping (Grad-CAM) to interpret model predictions and ensure clinical transparency.
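Several of the techniques in this step reduce to a few lines of code. The sketch below shows an L2 penalty term, inverted dropout, a patience-based early-stopping rule, and random search with a log-uniform learning-rate range. The search space, patience value, and loss sequence are hypothetical examples, not settings taken from the reviewed studies.

```python
import numpy as np


def l2_penalty(weights, lam):
    """lam * (sum of squared weights), added to the data loss during training."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and rescale survivors by
    1 / (1 - rate) so expected activations match inference, where nothing is dropped."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)


class EarlyStopping:
    """Stop when validation loss fails to improve by `min_delta` for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.best_epoch, self.wait = np.inf, -1, 0

    def step(self, val_loss, epoch):
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience          # True means: stop training now


def random_search(space, n_trials, rng):
    """Draw configurations: log-uniform for ('log_uniform', lo, hi), uniform pick for ('choice', options)."""
    configs = []
    for _ in range(n_trials):
        cfg = {}
        for name, spec in space.items():
            if spec[0] == "log_uniform":
                cfg[name] = float(10 ** rng.uniform(np.log10(spec[1]), np.log10(spec[2])))
            else:
                cfg[name] = spec[1][int(rng.integers(len(spec[1])))]
        configs.append(cfg)
    return configs


rng = np.random.default_rng(0)
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate([0.70, 0.55, 0.48, 0.47, 0.49, 0.50, 0.51, 0.46]):
    if stopper.step(loss, epoch):
        break
space = {"learning_rate": ("log_uniform", 1e-4, 1e-2),
         "batch_size": ("choice", [16, 32, 64, 128]),
         "dropout_rate": ("choice", [0.2, 0.3, 0.5])}
configs = random_search(space, n_trials=5, rng=rng)
```

With `patience=3`, the loop stops at epoch 6; a full training loop would typically restore the checkpoint saved at the best epoch (epoch 3, validation loss 0.47). Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which a uniform draw over [1e-4, 1e-2] would not.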
7.
Model Deployment
- •
  
  Incorporate the learned model into practical applications to facilitate prediction or support decision-making.
- •
  
  Deploy the model on cloud platforms (AWS, Google Cloud, Azure) or edge devices (e.g., smartphones, IoT sensors).
- •
  
  Mobile/IoT Integration: Integrate models into wearable devices for real-time predictions.
- •
  
  Incorporate the model into end-user applications so that it is usable in practice.
- •
  
  Help medical practitioners integrate the trained model into a clinical decision support system.
8.
Monitoring and Improving Model Performance
- •
  
  Keep the model current and accurate as real-world conditions change.
- •
  
  Consistently monitor metrics (accuracy, recall, etc.) on real-time data.
- •
  
  Employ XAI (eXplainable AI) methodologies to make model predictions comprehensible to users.
- •
  
  Establish contingency protocols for ambiguous predictions (e.g., identifying high-risk scenarios for manual evaluation).
- •
  
  Continuously monitor model performance in real-world use.
- •
  
  Retrain the model periodically with new data to maintain accuracy and adapt to evolving medical practices.
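The contingency protocol and continuous monitoring described above could be realized as a confidence-based triage rule plus a rolling-window performance monitor. The probability band (0.35 to 0.65), window size, and recall threshold below are hypothetical placeholders that would need clinical calibration; in deployment, the monitor can only be updated once true outcomes have been confirmed.

```python
from collections import deque


def triage(prob_disease, low=0.35, high=0.65):
    """Automate confident calls; send the ambiguous band (low, high) to a clinician."""
    if prob_disease >= high:
        return "positive"
    if prob_disease <= low:
        return "negative"
    return "manual_review"


class RollingRecallMonitor:
    """Recall over the last `window` cases with confirmed outcomes; alert below `threshold`."""

    def __init__(self, window=100, threshold=0.85):
        self.cases = deque(maxlen=window)   # oldest (label, prediction) pairs drop out automatically
        self.threshold = threshold

    def update(self, true_label, predicted_label):
        self.cases.append((true_label, predicted_label))

    def recall(self):
        preds_on_positives = [p for t, p in self.cases if t == 1]
        if not preds_on_positives:
            return None                      # no confirmed disease cases in the window yet
        return sum(p == 1 for p in preds_on_positives) / len(preds_on_positives)

    def alert(self):
        r = self.recall()
        return r is not None and r < self.threshold


monitor = RollingRecallMonitor(window=5, threshold=0.8)
for label, pred in [(1, 1), (1, 1), (0, 0), (1, 0), (1, 0)]:
    monitor.update(label, pred)
```

After these five cases, recall is 2/4 = 0.5, so `monitor.alert()` is true; such an alert could trigger a review of recent inputs for data drift and, if needed, retraining on new data.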

Model optimization and performance metrics

Hyperparameters strongly influence the performance, generalization, and efficiency of deep learning models in disease diagnosis. The learning rate (LR) is a crucial hyperparameter that regulates the magnitude of weight updates. A small learning rate (e.g., 0.001 for Adam or 0.01 for SGD) facilitates stable convergence and mitigates overshooting, whereas a larger rate accelerates learning but risks divergence. To tune this parameter, researchers commonly apply procedures such as grid search, cosine annealing, or cyclical learning rate scheduling. Another essential hyperparameter is the batch size, which often ranges from 16 to 128 in medical imaging studies. Smaller batch sizes can improve generalization because of the greater stochasticity in gradient updates, whereas larger batch sizes improve computational efficiency and make better use of GPU parallelism. Researchers frequently choose the batch size based on dataset size and hardware limits, occasionally employing gradient accumulation when memory constraints prevent larger batches.
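Gradient accumulation, mentioned above as a workaround for memory limits, relies on the fact that averaging gradients over several equal-sized micro-batches reproduces the gradient of one larger batch when the loss is a mean over samples. The sketch checks this for a linear model with mean squared error; the equivalence assumes equal micro-batch sizes and no batch-dependent layers such as batch normalization. Function names are illustrative.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of the mean squared error (1/n) * ||x @ w - y||^2 with respect to w."""
    return 2.0 * x.T @ (x @ w - y) / len(y)


def accumulated_grad(w, x, y, micro_batch):
    """Sum gradients over equal-sized micro-batches, then divide by their number,
    so only `micro_batch` samples need to be processed at once."""
    if len(y) % micro_batch:
        raise ValueError("this sketch assumes the batch divides evenly into micro-batches")
    n_steps = len(y) // micro_batch
    total = np.zeros_like(w)
    for s in range(n_steps):
        part = slice(s * micro_batch, (s + 1) * micro_batch)
        total += mse_grad(w, x[part], y[part])
    return total / n_steps


rng = np.random.default_rng(0)
x, y, w = rng.normal(size=(64, 5)), rng.normal(size=64), rng.normal(size=5)
full_batch = mse_grad(w, x, y)                            # one batch of 64 samples
accumulated = accumulated_grad(w, x, y, micro_batch=16)   # four micro-batches of 16
```

Both values agree up to floating-point rounding, so a single optimizer step taken after the four micro-batches behaves like a step with batch size 64 while only 16 samples occupy memory at a time.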

The number of epochs specifies how many times a model traverses the full training dataset. The number of epochs, typically 50 to 200 in medical deep learning research, depends on the size and complexity of the dataset. Too few epochs may result in underfitting, while too many risk overfitting. To mitigate this, strategies such as early stopping and monitoring of validation loss are commonly used. The optimizer directly influences convergence. Adam, SGD with momentum, and RMSProp are commonly employed in disease diagnostic applications. Adam is known for its adaptive, per-parameter learning rates, which work well across diverse medical datasets, whereas SGD, especially with momentum, often converges more steadily. Comparative testing is frequently used to identify the most appropriate optimizer.
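The difference between the optimizers discussed above is easiest to see in their update rules. With gradient $g_t = \nabla_\theta \mathcal{L}(\theta_{t-1})$, learning rate $\eta$, and momentum coefficient $\mu$, one common formulation of SGD with momentum and the standard Adam update are:

```latex
\begin{aligned}
\text{SGD with momentum:}\quad v_t &= \mu\, v_{t-1} + g_t, \qquad \theta_t = \theta_{t-1} - \eta\, v_t \\
\text{Adam:}\quad m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad s_t = \beta_2 s_{t-1} + (1-\beta_2)\, g_t^{2} \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat{s}_t = \frac{s_t}{1-\beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{s}_t} + \epsilon}
\end{aligned}
```

Adam divides each parameter's step by $\sqrt{\hat{s}_t}$, which gives every weight its own effective learning rate; SGD with momentum applies one global rate $\eta$ to a smoothed gradient. Commonly used Adam defaults are $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$, with $\eta = 0.001$ matching the learning rate cited above.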

Regularization methods, including dropout (rates of 0.2 to 0.5) and L1/L2 penalties (λ values from 0.0001 to 0.01), are employed to alleviate overfitting, a prevalent challenge in medical domains with limited data. Dropout randomly disables neurons during training, encouraging the network to learn more robust representations, whereas weight decay discourages reliance on large weights. Cross-validation is frequently employed to tune these parameters. The weight initialization scheme is also essential. Xavier (Glorot) and He initialization are commonly used to stabilize gradient propagation in CNNs, GANs, and DNNs, thereby mitigating the risk of vanishing or exploding gradients. Although the initialization scheme is often dictated by the architecture, it substantially influences early training dynamics.
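The two initialization schemes named above differ only in the weight variance they target. The sketch below draws weights with Xavier (uniform) and He (normal) initialization and then passes Gaussian inputs through ten ReLU layers to show why the scale matters: with He initialization the signal's mean square stays of order one, whereas a naive small standard deviation (0.01, chosen here purely for illustration) makes it vanish. Helper names are hypothetical.

```python
import numpy as np


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform: U(-a, a), a = sqrt(6 / (fan_in + fan_out)),
    so Var(W) = a^2 / 3 = 2 / (fan_in + fan_out)."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_in, fan_out))


def he_normal(fan_in, fan_out, rng):
    """He (Kaiming) normal: N(0, 2 / fan_in), compensating for ReLU zeroing about half its inputs."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))


def mean_square_after_relu_stack(init, depth, width, rng):
    """Push Gaussian inputs through `depth` ReLU layers and return the final mean squared activation."""
    h = rng.normal(size=(1000, width))
    for _ in range(depth):
        h = np.maximum(h @ init(width, width, rng), 0.0)
    return float(np.mean(h ** 2))


rng = np.random.default_rng(0)
w_xavier = xavier_uniform(256, 128, rng)
w_he = he_normal(256, 128, rng)
he_signal = mean_square_after_relu_stack(he_normal, depth=10, width=256, rng=rng)
tiny_signal = mean_square_after_relu_stack(
    lambda fi, fo, r: r.normal(0.0, 0.01, size=(fi, fo)), depth=10, width=256, rng=rng)
```

With a standard deviation of 0.01 and 256 inputs, each layer shrinks the mean square by roughly a factor of 0.0128, so after ten layers the signal, and with it the gradient, is effectively zero.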

For non-linear representation learning, activation functions such as the Rectified Linear Unit (ReLU) and Leaky ReLU are frequently employed in CNNs and RNNs, whereas the Gaussian Error Linear Unit (GELU) and Swish have gained prominence in Transformer-based architectures such as ViTs because of their smoother activation characteristics. The output layer typically uses softmax for multi-class classification or sigmoid for binary tasks. In vision-based disease diagnosis, the input image size is a significant hyperparameter. Standard dimensions such as 224 × 224 (ResNet, ViT) and 299 × 299 (Inception) balance spatial detail against computational efficiency. Larger medical images, such as 512 × 512 MRI or CT scans, are frequently resized or cropped according to model capacity. In Vision Transformers, patch size (often 16 × 16 or 32 × 32) is a critical hyperparameter: smaller patches retain fine-grained lesion features, whereas larger patches reduce computational demands.
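The activation functions and the ViT patch-size trade-off described above can be written out explicitly. The sketch uses the exact erf-based GELU (a tanh approximation is also common) and counts the tokens a ViT produces for a given image and patch size. Because self-attention cost grows with the square of the token count, moving from 16 × 16 to 32 × 32 patches on a 224 × 224 image cuts the number of attention pairs by a factor of 16. Function names are illustrative.

```python
import math
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)          # small slope keeps gradients alive for x < 0


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    x = np.asarray(x, dtype=float)
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))


def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))          # equals x * sigmoid(beta * x)


def vit_num_patches(image_size, patch_size):
    """Tokens produced by non-overlapping square patches (excluding any class token)."""
    if image_size % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (image_size // patch_size) ** 2


x = np.array([-2.0, 0.0, 2.0])
tokens_16 = vit_num_patches(224, 16)   # 14 x 14 = 196 tokens
tokens_32 = vit_num_patches(224, 32)   # 7 x 7 = 49 tokens
attention_cost_ratio = tokens_16 ** 2 / tokens_32 ** 2   # self-attention scales with tokens^2
```

The smooth, non-zero response of GELU and Swish for small negative inputs contrasts with the hard cutoff of ReLU at zero, which is the "smoother activation characteristic" referred to above.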

Learning rate schedulers, including step decay, cosine annealing, and ReduceLROnPlateau, are frequently used to adjust the learning rate dynamically during training. These strategies promote convergence, mitigate stagnation, and improve generalization, especially with multimodal and imbalanced datasets. Together, these hyperparameters affect predictive accuracy, training efficiency, robustness, and clinical applicability. Careful selection, informed by empirical tuning methodologies such as cross-validation, Bayesian optimization, or adaptive scheduling, is crucial to enhancing the efficacy of deep learning models in disease detection.
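The three schedulers named above can be sketched as plain functions of the epoch or of the validation loss. The default rates, decay factors, and patience values are illustrative assumptions; frameworks provide their own implementations (e.g., PyTorch's `torch.optim.lr_scheduler` module), whose exact semantics may differ from this simplified version.

```python
import math


def step_decay(epoch, lr0=1e-3, drop=0.5, every=10):
    """Multiply the initial rate by `drop` once every `every` epochs."""
    return lr0 * drop ** (epoch // every)


def cosine_annealing(epoch, total_epochs, lr_max=1e-3, lr_min=1e-6):
    """Decay from lr_max to lr_min along half a cosine period over `total_epochs`."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))


class ReduceOnPlateau:
    """Multiply the rate by `factor` after `patience` epochs without validation-loss improvement."""

    def __init__(self, lr, factor=0.1, patience=3, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best, self.bad_epochs = math.inf, 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0          # restart the count after each reduction
        return self.lr


plateau = ReduceOnPlateau(1e-3, factor=0.1, patience=2)
lrs = [plateau.step(loss) for loss in [0.50, 0.40, 0.41, 0.42, 0.43]]
```

Step decay and cosine annealing depend only on the epoch counter, whereas the plateau rule reacts to observed validation loss, which makes it a natural fit when the number of useful epochs is not known in advance.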

Gupta et al. (2025) employed fivefold cross-validation, repeated five times to stabilize the results, on a coronary CT dataset and reported average accuracy and AUC across the folds. Kong et al. (2025) employed 5-fold cross-validation to train deep MRI classifiers for selecting appropriate image inputs. Formal hypothesis testing is also used. The glioma/MRI study employed t-tests and Mann–Whitney U-tests to compare patient groups, whereas other studies used paired t-tests or bootstrap tests on cross-validated scores to compare model variants (Huang et al., 2023). Performance indicators (accuracy, AUC, F1, etc.) are typically accompanied by uncertainty estimates, usually reported as 95% confidence intervals. The CT model attained an AUC of around 0.95 (95% CI [0.92–0.97]), while a transformer-based EHR model demonstrated an AUROC of 0.844 (95% CI [0.838–0.851]) on the internal test set and 0.849 (95% CI [0.846–0.851]) on an independent cohort.
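Confidence intervals like those quoted above are often obtained by bootstrapping. The sketch below computes a percentile bootstrap 95% CI for AUC on synthetic scores; it is a generic illustration, not the procedure of the cited studies, and analytic alternatives such as DeLong's method are also widely used. The data and function names are hypothetical.

```python
import numpy as np


def auc(y, s):
    """Rank-based AUC: share of (diseased, healthy) pairs ranked correctly, ties counting 1/2."""
    d = s[y == 1][:, None] - s[y == 0][None, :]
    return float((np.sum(d > 0) + 0.5 * np.sum(d == 0)) / d.size)


def bootstrap_ci(y, s, metric=auc, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample patients with replacement, recompute the metric,
    and report the alpha/2 and 1 - alpha/2 quantiles around the point estimate."""
    rng = np.random.default_rng(seed)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, len(y), size=len(y))
        if np.unique(y[idx]).size < 2:        # AUC is undefined without both classes
            continue
        stats.append(metric(y[idx], s[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric(y, s), float(lo), float(hi)


rng = np.random.default_rng(42)
y = np.array([0] * 150 + [1] * 50)
scores = rng.normal(size=200) + 1.5 * y     # synthetic model outputs, higher for diseased cases
point, lower, upper = bootstrap_ci(y, scores)
```

Resampling whole patients (rather than individual predictions from the same patient) is what makes the interval reflect uncertainty about performance on new patients.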

Importantly, most studies incorporate external validation on independent datasets: models are evaluated on held-out cohorts or on datasets from other hospitals to demonstrate generalization and robustness. To identify the location of kidney stones, Ma et al. (2020) proposed a Heterogeneous Modified Artificial Neural Network (HMANN) approach that reduces noise and aids kidney image segmentation; three classifiers (a multilayer perceptron, an artificial neural network, and a support vector machine) were tested for Chronic Kidney Disease (CKD) prediction and evaluated with AUC. Gupta et al. (2025) attained consistent AUCs with narrow confidence intervals on an unseen cardiac dataset. Authors frequently observe that external testing “exhibits robustness and generalizability.” Some studies further enhance reliability by conducting repeated experiments (e.g., multiple cross-validation runs) or by using bootstrapping. Contemporary deep-learning diagnostic studies thus combine statistical significance tests, external test-set performance, 95% confidence intervals for AUC/accuracy, and k-fold cross-validation results to demonstrate that their findings are reliable and reproducible.
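Repeated cross-validation runs of the kind described above can be generated as follows: each repeat reshuffles the data and assigns samples to folds class by class, so every test fold keeps the overall class ratio. This is a minimal sketch with a hypothetical function name; scikit-learn's `RepeatedStratifiedKFold` offers an equivalent, well-tested splitter.

```python
import numpy as np


def repeated_stratified_kfold(y, k=5, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx). Each repeat reshuffles every class and
    deals its samples round-robin into k folds, so each fold keeps the class ratio."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    for r in range(repeats):
        fold_of = np.empty(len(y), dtype=int)
        for c in np.unique(y):
            members = rng.permutation(np.flatnonzero(y == c))
            fold_of[members] = np.arange(len(members)) % k
        for f in range(k):
            yield r, f, np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)


y = np.array([0] * 80 + [1] * 20)
splits = list(repeated_stratified_kfold(y, k=5, repeats=5))
# In a real study, a model would be trained on each train_idx and scored on test_idx;
# results are then summarized as the mean and standard deviation over all 25 folds.
```

With 80 healthy and 20 diseased cases, every one of the 25 test folds holds 16 healthy and 4 diseased cases, so fold-to-fold differences reflect the model rather than shifting class balance.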

Table 3 presents a comprehensive summary of the performance evaluation measures commonly used in deep learning-based disease detection. The table covers classification and regression metrics as well as segmentation and calibration measures, providing an overview of model assessment in medical contexts. In classification tasks, accuracy, precision, recall (also known as sensitivity), specificity, F1-score, AUC-ROC, and AUC-PR are commonly used to assess correctness, sensitivity in disease detection, and the threshold-independent separability of predictions. Regression measures such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the R² score are employed in scenarios such as disease progression tracking or severity grading. Image-based applications, such as tumor and lesion detection, depend on spatial overlap metrics such as the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU), which quantify the agreement between predicted and actual anatomical regions.

Table 3:

Performance evaluation metrics for deep learning-based disease diagnosis.

Metric	Formula	Clinical relevance	Common use case
Accuracy	(TP + TN)/(TP + TN + FP + FN)	Overall correctness; limited for imbalanced data	General classification
Precision (PPV)	TP/(TP + FP)	Reliability of positive diagnosis	Cancer detection, anomaly detection
Recall (Sensitivity/TPR)	TP/(TP + FN)	Ability to detect actual disease cases	Screening tasks (e.g., COVID-19, Alzheimer’s)
Specificity (TNR)	TN/(TN + FP)	Ability to correctly identify healthy cases	Avoiding false alarms in clinical diagnosis
F1-score	2 × (Precision × Recall)/(Precision + Recall)	Balanced trade-off between precision and recall	Imbalanced disease datasets
AUC-ROC	Area under ROC curve	Threshold-independent measure of separability	Binary/multi-class diagnosis
AUC-PR	Area under Precision–Recall curve	Focused on performance of minority (disease) class	Rare disease detection
Confusion matrix	Counts of TP, TN, FP, FN	Provides detailed error distribution	Model bias analysis
MAE (Mean Absolute Error)	$\frac{1}{n} \sum_{i=1}^{n} \lvert y_{i} - \hat{y}_{i} \rvert$	Average deviation in predictions	Disease progression (e.g., Parkinson’s severity score)
RMSE (Root Mean Square Error)	$\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}$	Penalizes larger errors	Clinical risk score prediction
R² score	$1 - \frac{\sum_{i} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}}$	Measures variance explained by model	Disease severity regression
Dice Similarity Coefficient (DSC)	$\frac{2 \lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}$	Overlap of predicted vs. true regions	Tumor/lesion segmentation
IoU (Intersection over Union)	$\frac{\lvert X \cap Y \rvert}{\lvert X \cup Y \rvert}$	Segmentation boundary accuracy	MRI/CT-based lesion detection
Brier score	$\frac{1}{n} \sum_{i=1}^{n} (f_{i} - o_{i})^{2}$	Probability calibration	Risk prediction models
Calibration curve	Plot predicted vs. observed probabilities	Evaluates the probability reliability	Clinical decision support

DOI: 10.7717/peerj-cs.3484/table-3

Furthermore, probabilistic metrics such as the Brier score and calibration curves assess the reliability of predicted probabilities, thereby supporting confidence in clinical decision-making. By summarizing the formulae, clinical significance, and common use cases, the table clarifies metric selection and underscores the need to employ several complementary measures for a complete and clinically useful evaluation of deep learning models.
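The segmentation and calibration measures in Table 3 can be computed as follows. The sketch assumes boolean masks for Dice and IoU and 0/1 outcomes for the Brier score; the calibration helper returns the points of a reliability curve using equal-width bins, which is one of several possible binning choices. The masks and risk values are toy inputs, and the function names are illustrative.

```python
import numpy as np


def dice(pred_mask, true_mask):
    """Dice = 2|X ∩ Y| / (|X| + |Y|) for binary masks; 1.0 if both masks are empty."""
    p, t = np.asarray(pred_mask, bool), np.asarray(true_mask, bool)
    denom = p.sum() + t.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(p, t).sum() / denom)


def iou(pred_mask, true_mask):
    """IoU = |X ∩ Y| / |X ∪ Y| for binary masks; 1.0 if both masks are empty."""
    p, t = np.asarray(pred_mask, bool), np.asarray(true_mask, bool)
    union = np.logical_or(p, t).sum()
    return 1.0 if union == 0 else float(np.logical_and(p, t).sum() / union)


def brier_score(prob, outcome):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    prob, outcome = np.asarray(prob, float), np.asarray(outcome, float)
    return float(np.mean((prob - outcome) ** 2))


def calibration_points(prob, outcome, n_bins=5):
    """(mean predicted probability, observed event rate, count) per equal-width bin;
    plotting the first two against each other gives a calibration curve."""
    prob, outcome = np.asarray(prob, float), np.asarray(outcome, float)
    bins = np.minimum((prob * n_bins).astype(int), n_bins - 1)
    return [(float(prob[bins == b].mean()), float(outcome[bins == b].mean()), int((bins == b).sum()))
            for b in range(n_bins) if np.any(bins == b)]


pred = np.zeros((4, 4), bool); pred[:2, :2] = True     # 4 predicted lesion pixels
truth = np.zeros((4, 4), bool); truth[:2, :3] = True   # 6 annotated lesion pixels
risk = [0.9, 0.2, 0.6, 0.1]
events = [1, 0, 1, 0]
```

For the toy masks, Dice = 0.8 and IoU ≈ 0.67, consistent with the identity Dice = 2·IoU/(1 + IoU); the two metrics rank segmentations identically but Dice reports higher values for partial overlap.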

Comprehensive comparative analysis of deep learning approaches

Deep learning models can uncover intricate relationships and patterns in data that would be challenging or impossible for conventional machine learning models to discover. Table 4 compares deep learning algorithms for various diseases in terms of approaches, datasets, models, performance indicators, contributions, and limitations. This comparative analysis highlights numerous methodological advances in recent disease diagnostic research, such as improved model architectures, enhanced feature extraction strategies, the integration of transfer learning, and the adoption of hybrid and multimodal learning techniques. It elucidates domain-specific patterns, model intricacies, and the trade-offs among accuracy, interpretability, and scalability. It also highlights persistent constraints, including insufficient explainability, restricted dataset variety, and the need for multimodal or privacy-preserving methodologies. This study demonstrates that deep learning models such as DBN, LSTM, CNN, and GANs are highly effective in diagnosing and predicting various medical conditions, as they can handle complex medical data and deliver strong results. The highest reported accuracies were achieved by DL approaches for prostate cancer (99.07%), lung cancer (99.8%), and breast cancer (98.96%). Notably, a machine learning method (Black Widow Optimization with Fuzzy C-Means clustering) achieved high accuracy for Alzheimer’s disease (98.68%). Overall, models that use advanced optimization techniques, transfer learning, and neural network designs such as CNNs, RNNs, and hybrid models are better at handling medical datasets and achieve high performance across various disorders.

Table 4:

Comparative analysis of deep learning approaches for disease diagnosis.

Disease type/Related work	Dataset and type of data used	DL models	Performance metrics (%)	Contributions	Drawbacks
Heart Disease (Uma Maheswari & Valarmathi, 2023; Almazroi et al., 2023; DeGroat et al., 2024; Alshraideh et al., 2024)	Cleveland Heart Disease Dataset (Numeric Format)	DBN with SVM	Accuracy: 97.91%	• DBN-SVM integrates DBNs for feature extraction and SVMs for classification.	• SVM scalability issues
				• Excellent at processing structured numerical data and obtaining high accuracy.	• High computational complexity
	UCI Repository (Numeric Format)	LSTM	Accuracy: 82.49%,	• LSTM captures temporal patterns but struggles with identifying non-disease instances.	• Poor performance with imbalanced datasets, leading to false positives for non-disease cases.
			Precision: 77.08%,	• Effective at capturing temporal connections in health data.	• Class imbalance
			Recall: 87.24%, F1-score: 84.41%
	UCI Repository (Numeric Format)	CNN with Bi-LSTM	Accuracy: 94.5%, Precision: 94%, Recall: 94%,	• The hybrid model combines sequential and spatial data for balanced performance.	• Higher computational cost due to the hybrid architecture.
			F1-score: 94%	• Combines the advantages of CNN and LSTM for spatial and sequential data.	• Complexity in model design and tuning
	Real-time dataset (Numerical Format)	SVM with PSO	Accuracy: 94.3%	• Used various ML models to predict heart disease	• Lack of Retrospective Data and additional evaluation metrics
				• Findings significantly impact early disease detection, diagnosis, and customized therapy, which might help healthcare providers make wise choices and enhance patient outcomes.	• Data dimensionality and model interpretability

Breast Cancer (Saber et al., 2021; Alloqmani, Abushark & Khan, 2023; Maria et al., 2023; Fatima et al., 2024)	DDSM, MIAS, and Private Dataset (Images)	Transfer Learning	Accuracy: 98.96%	• Pre-trained models enhance accuracy for small mammography datasets.	• Relies on high-quality pre-trained models.
				• Using pre-trained models significantly decreases training time.	• Performance depends on data preprocessing.
	INbreast and MIAS (Images)	CNN	MIAS AUC: 97.36%, INbreast AUC: 94.25%	• CNNs excel at identifying malignant areas with consistent performance.	• Requires extensive training data to avoid overfitting.
				• Effective for spotting fine-grained patterns in image datasets.	• Lack of interpretability
	BI-RADS Dataset and SRM Medical Hospital (Images)	DMD-CNN	Accuracy: 98.2%	• Customized CNNs like DMD-CNN demonstrate high diagnostic accuracy in clinical applications.	• Limited generalizability to other datasets without retraining.
				• Tailored architectures improve task-specific performance.	• Increased computational demands
	Public dataset (images)	Neural Network	Accuracy: 93%	Breast Cancer Prediction Analysis • Provided comprehensive analysis. • Identified key predictors for accurate diagnosis. • This study compared traditional ML with the DL model	• A small dataset was used, and various ML models should have been explored
			Precision: 98% Recall: 87%,		• Overfitting risks in DL models
			F1-score: 92%
Lung Cancer (Pradhan, Chawla & Rawat, 2023; Demiroğlu et al., 2023; Yamuna Devi et al., 2024)	Kaggle Lung Cancer Dataset (Numeric Format)	SA-SLnO	Accuracy: 96.66%	• Optimization methods like SA-SLnO improve DL models by enhancing parameter tuning and feature selection.	• Optimization approaches can be computationally expensive.
				• Enhances feature selection and parameter optimization.	• Complex integration with DL models
	UCI Repository (CT Images)	DarkNet-53 and DenseNet-201 with Neighborhood Component Analysis	Accuracy: 98.86%	• DenseNet-201 performs well with complex image data and integrated neighborhood analysis.	• High computational resources are needed for training.
				• Combines deep layers with robust analysis to increase accuracy.	• Difficult to train with limited data
	Bronchoscopy and Hamlyn lung datasets (Images)	Proposed CNN	Accuracy: 99.8%	• The proposed method is used for the early identification of lung cancer.	• Concentrated only on segmentation and classification
			Precision: 99.8%	• Adaptive median filter for segmentation	• Lack of model interpretability
			Recall: 100%	• Investigated the malignant tumor detection method over a few currently used structures
			F1-score: 99.9%	• Focused on the edge-segmentation process
				• Classification is done using the cluster technique
Colorectal Cancer (Li et al., 2022; Mulenga et al., 2021; Sharkas & Attallah, 2024; Millward et al., 2025)	MCO Dataset (Images)	CRCNet	Confidence Interval (CI): 95%	• The small confidence intervals provide high dependability.	• Dependency on image quality and data preprocessing.
				• CRCNet is reliable and well-suited for accurate diagnostic tasks.
	UCI Repository (CT Images)	t-SNE	AUC: 94%	• Effective for showing multidimensional data patterns.	• t-SNE’s outputs are non-deterministic, making it unsuitable for large-scale models.
				• t-SNE aids in visualizing and classifying intricate patterns in microbiome data.
	CRC-based Microbiome Datasets (Numeric Format)	DNN	Accuracy: 95%	• Flexible with both organized and unstructured data types.	• Requires significant hyperparameter tuning to achieve optimal results.
				• DNN models efficiently process diagnostic tasks with structured datasets.
	Austin-CRC (n = 353), RNSH-CRC (n = 1,070), MCO-CRC (n=885). (Images)	SegFormer-B0 for broad tumor segmentation.	Broad tumor segmentation F1-score: 0.95.	• Created a completely automatic and comprehensible iTIL rating methodology.	• Erroneous identification of tangentially sectioned tumor nuclei as Tumor-Infiltrating Lymphocytes (TILs).
		- SegFormer-B1 for TIL detection.	- TSN segmentation F1-score: Tumor 0.88, Stroma 0.91, Necrosis 0.64.	• Generalizes across many independent cohorts without the need for retraining.	• Inability to differentiate mucin from stroma.
			- TIL detection F1-score: 0.59, Average Precision: 0.52.	• Pioneered pixel-level segmentation for both tumor and tumor-infiltrating lymphocytes in whole slide images of colorectal cancer.	• Marginally decreased stratification efficacy in MCO-CRC Stage III instances.
			- Prognostic stratification (5-Year OS HR: 1.67, Multivariate HR: 1.37).
Prostate Cancer (Salman et al., 2022; Yildirim et al., 2022; Iqbal et al., 2021)	Sakarya University Research Hospital Dataset (Images)	CNN	Accuracy: 97%	• Extracted essential image features with great precision.	• Overfitting risk when applied to small datasets without augmentation.
	Sakarya University Research Hospital Dataset (Images)	CNN	Accuracy: 97%	• CNN’s feature extraction abilities are highly effective for prostate cancer detection.	• Large data requirements
	mp-MRI Data (T2W, DWI, and ADC) (Images)	Pre-trained CNN	Accuracy: 96.09%	• Reduces training time while ensuring precision.	• Dependency on the selection of suitable pre-trained models.
	mp-MRI Data (T2W, DWI, and ADC) (Images)	Pre-trained CNN	Accuracy: 96.09%	• Pre-trained CNNs utilize MRI-specific features efficiently, reducing retraining effort.	• Retraining is still required for optimal precision
	Public Dataset (Images)	LSTM and ResNet-101	Accuracy: 99.07%	• High precision is achieved via the complementary qualities of ResNet and LSTM.	• Computational complexity increases significantly with model size
	Public Dataset (Images)	LSTM and ResNet-101	Accuracy: 99.07%	• Combines ResNet-101’s deep-layer capabilities with LSTM for sequential processing, ensuring exceptional precision.	• Lack of model transparency
			RMSD: 2.5 mm
Alzheimer’s Disease (Alghamdi et al., 2025; Bi & Wang, 2019; Raghavaiah & Varadarajan, 2022; Matlani, 2024)	ADNI Dataset (Images)	Black Widow Optimization (BWO) with Fuzzy C-Means Clustering (FCM)	Accuracy: 98.68%, Sensitivity: 97.72%, Specificity: 97.19%	• Combines optimization and clustering to enhance diagnostic sensitivity and specificity.	• There is a high dependency on parameter tuning for the optimization algorithm.
	ADNI Dataset (Images)		Accuracy: 98.68%, Sensitivity: 97.72%, Specificity: 97.19%	• Balances sensitivity and specificity for greater diagnostic accuracy.	• Dependence on the quality of data
	ADNI Dataset (Images)	Deep Convolutional Neural Network (DCNN)	Accuracy: 93.86%	• DCNN outperforms competitors, though hybrid techniques may better balance accuracy and interpretability.	• Lack of interpretability compared to simpler or hybrid models.
	ADNI Dataset (Images)	Deep Convolutional Neural Network (DCNN)	Accuracy: 93.86%	• It extracts rich hierarchical characteristics from neuroimaging data.	• Small dataset
	Public Dataset (Images)	Contractive Slab and Spike Convolutional Deep Boltzmann Machine (CssCDBM)	Accuracy: 95.04%	• Effectively handles noise in neuroimaging datasets.	• Computationally intensive and prone to overfitting in small datasets.
	Public Dataset (Images)		Accuracy: 95.04%	• CssCDBM shows resilience against noisy neuroimaging data, delivering solid performance.	• Interpretability issues
	MRI images (ADNI and OASIS datasets)	Hybrid Bi-directional Long Short-Term Memory with Artificial Neural Network	Accuracy: 99.22% (ADNI)	• Used the Improved Wild Horse Optimization (IWHO) algorithm
		(BiLSTM-ANN)	Precision: 98.26 (ADNI)	• Automatic Alzheimer’s disease diagnosis using hybrid deep learning	• Comparable with other machine learning models.
			Recall: 98.06 (ADNI); Accuracy: 98.96% (OASIS)	• Utilizes Improved Adaptive Weaver Filtering (IAWF) for image pre-processing.	• Future use of ensemble-based feature extraction techniques.
			Recall/Sensitivity: 98.32 (OASIS)	• Principal Component Analysis extracts significant features from images using a Normalized Global Image Descriptor (PCA-NGIST).
			Specificity: 99.21 (OASIS)
Parkinson’s Disease (Alissa et al., 2022; Khaskhoussy & Ayed, 2022; Kumar & Ghosh, 2024; Hadadi & Arabani, 2024)	Leeds Teaching Hospitals NHS Trust (Images)	CNN	Accuracy: 93.5%	• Strong at extracting features from handwriting images.	• Struggles with temporal patterns without complementary sequential models.
	Leeds Teaching Hospitals NHS Trust (Images)	CNN	Accuracy: 93.5%	• Performs well in analyzing image-based handwriting data for Parkinson’s diagnosis.	• Not evaluated in clinical settings
	PaHaW Dataset (Images)	Bi-LSTM	Accuracy: 100%	• Exceptional at identifying temporal relationships in sequential data.	• It may overfit small datasets without adequate validation.
	PaHaW Dataset (Images)	Bi-LSTM	Accuracy: 100%	• Bi-LSTM achieves perfect accuracy in sequential handwriting data.	• The severity of PD classification should have been explored
	Public Dataset & MFCC-GMM (Numeric Format)	Deep features-based Autoencoder with MFCC-GMM	Accuracy: 99%	• Effective feature extraction from complex voice data.	• Requires preprocessing expertise and may underperform in noisy datasets.
	Public Dataset & MFCC-GMM (Numeric Format)	Deep features-based Autoencoder with MFCC-GMM	Accuracy: 99%	• It combines autoencoders and GMM to process speech features effectively for Parkinson’s prediction.	• Class imbalance
	NewHandPD dataset (Images)	Harris Hawks Optimization (HHO) with pre-trained models	Accuracy: 94.12%	• Implemented the Harris Hawks Optimization algorithm to find optimal hyperparameter values.	• Continuous analysis of handwriting changes can provide insights into disease control and drug effects.
			Precision: 94.1%	• Outperforms other methods in 10 iterations.	• Small size of the data used
			Recall: 94.24%	• Combines a deep neural network and the Harris Hawks Optimization algorithm.
			F1-score: 94.11%, AUC: 0.98
Multiple Sclerosis (Schwab & Karlen, 2020; Kaur et al., 2023; Balgetir et al., 2021; Yaghoubi et al., 2024)	Floodlight Open Study (Real-time Numeric Dataset)	Multilayer Perceptron with Neural Soft Attention	Accuracy: 88%	• Attention processes emphasize critical elements for better decision-making.	• Accuracy is moderate compared to other advanced DL methods.
	Floodlight Open Study (Real-time Numeric Dataset)	Multilayer Perceptron with Neural Soft Attention	Accuracy: 88%	• Neural attention enhances focus on relevant features, improving diagnostic accuracy.	• The data sample is limited in size.
	Public Dataset (Images)	DeepMS2G	Accuracy: 83%	• Excels at analyzing dynamic gait data for multiple sclerosis.	• Limited accuracy indicates scope for better architecture or training techniques.
	Public Dataset (Images)	DeepMS2G	Accuracy: 83%	• Captures gait dynamics effectively but leaves room for performance improvement.
	Public Dataset (Numeric Format)	VGG19-SVM	Accuracy: 89.23%	• Hybrid approaches increase classification efficiency.	• Performance heavily depends on dataset quality and feature extraction.
	Public Dataset (Numeric Format)	VGG19-SVM	Accuracy: 89.23%	• Combines deep feature extraction with SVM for enhanced classification.	• Model Explainability
	Kashani Comprehensive MS Center in Isfahan (Images), real-time data collected		Accuracy: 97.44%	• MobileNetV2 model for MS diagnosis	• Experimentation with diverse algorithms for enhanced performance.
		Transfer learning	Precision: 100%	• Introduced a transfer learning network for MS diagnosis in SLO images.	• Potential integration with other deep-learning methods.
			Recall: 91.67%	• MS Diagnosis Using SLO Images and Computer Technology
			F1-score: 95.65%	• Converts color images to grayscale using the Tyler Coye and DWT algorithms.
				• Segmenting images and extracting retinal vessels.
				• Used two classification techniques: classic Machine Learning and transfer learning models.
Diabetes (Naz & Ahuja, 2020; Önal, Güraksin & Duman, 2023; Gao et al., 2018; Kurt et al., 2023)	UCI Diabetes Dataset (Numeric Format)	ANN	Accuracy: 97.14%	• High flexibility to structured datasets and good prediction capability.	• Struggles with noisy or incomplete datasets and simple neural networks are used.
	UCI Diabetes Dataset (Numeric Format)	ANN	Accuracy: 97.14%	• Structured datasets are well-suited for ANN, showcasing high diabetes prediction accuracy.
	International Diabetes Federation (Images)	Deep CNN	Accuracy: 80%	• When correctly designed, CNNs are capable of learning complicated features.	• Low accuracy indicates inadequate handling of specific diabetes dataset challenges.
	International Diabetes Federation (Images)	Deep CNN	Accuracy: 80%	• Deep CNN needs additional feature engineering to improve performance in diabetes datasets.
	Clinic Dataset (Numeric Format)	RNN-LSTM with Bayesian Optimization	Accuracy: 98%	• Bayesian optimization improves hyperparameter tuning, resulting in better model performance.	• Computationally expensive and sensitive to hyperparameter initialization.
	Clinic Dataset (Numeric Format)	RNN-LSTM with Bayesian Optimization	Accuracy: 98%	• Fine-tuned RNN-LSTM achieves top-tier clinical prediction accuracy for diabetes.
	Gestational Diabetes dataset (Numeric Format), real-time data collected	RNN-LSTM with Bayesian optimization	Sensitivity: 95%	• Employed deep learning and Bayesian optimization.	• Limited sample size
			Specificity: 99%	• Compared SVM, RF, and RNN-LSTM methods on original and resampled datasets.	• Lack of model transparency
				• RNN-LSTM with Bayesian optimization developed the most effective prediction model.

DOI: 10.7717/peerj-cs.3484/table-4

These findings demonstrate that deep learning algorithms can accurately identify and predict diseases. However, they come from individual studies, each using specific datasets and models, and a model's performance may change with a different dataset or architecture. Deep learning is therefore a promising technology for disease prediction and diagnosis, but more research is needed to improve the functionality of these models and increase their accessibility. Table 4 also shows that deep learning models are potent tools for disease detection, offering high accuracy, versatility, and adaptability across diverse datasets. The results suggest that sophisticated deep learning approaches, combined with tailored data preparation and feature engineering, may improve both model performance and clinical relevance. Hybrid architectures, optimization approaches, and transfer learning improve performance and address data-specific issues. However, to enable practical clinical use, future work must prioritize improving generalizability and explainability; reducing computing requirements, misclassification, and overfitting; and handling imbalanced datasets. The healthcare industry may use these findings to develop AI-driven diagnostic tools that improve patient outcomes.
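Because Table 4 mixes accuracy, precision, recall (sensitivity), specificity, and F1-score, it helps to recall how these metrics relate. The sketch below computes them from binary confusion-matrix counts. The counts in the example are hypothetical and are not taken from any study in the table. The example is chosen to show why accuracy alone can mislead on imbalanced medical data.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common diagnostic metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # recall is the same as sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                # harmonic mean of precision and recall
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical imbalanced test set: 10 diseased and 90 healthy patients.
m = diagnostic_metrics(tp=5, fp=2, tn=88, fn=5)
print({name: round(value, 3) for name, value in m.items()})  # accuracy 0.93, but recall only 0.5
```

With these counts, accuracy is 0.93 even though only half of the diseased cases are detected. This is why recall and F1-score matter when comparing the studies above.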

Research challenges and open issues in disease diagnosis and prediction

Despite their power and adaptability, deep learning models face several substantial hurdles. Addressing them requires a multidisciplinary strategy. This strategy encompasses data collection and preprocessing techniques, algorithmic improvements, fairness-conscious model training, interpretability methods, secure learning, and models resilient to adversarial attacks. It also requires collaboration with domain experts and affected communities so that deep learning can reach its full potential. These problems are summarized below (Talaei Khoei, Ould Slimane & Kaabouch, 2023).

Large dataset

Deep learning models need substantial computational resources, and even a basic deep learning architecture contains a large number of network parameters. Their efficacy depends heavily on the size of the training dataset. Because they extract features automatically, they must be exposed to many variations of instances within each class to generalize effectively. This explains the success of deep learning in domains where large quantities of data can be readily gathered, such as natural language processing and computer vision, and the resulting popularity of deep learning applications in these fields. In medicine, however, producing vast quantities of data poses a considerable challenge, and deep learning models cannot be adequately trained on limited datasets to achieve the intended objectives. Moreover, medical data analysis is more complex than general-purpose image or voice recognition, and these complexities further impair the model’s ability to generalize. A substantial amount of training data is therefore necessary to mitigate this problem, and innovative data generation methodologies are needed to increase the volume of data collected and used to build medical datasets (Lopez Alcaraz et al., 2025).

Deep learning methods need extensive and varied datasets to generalize successfully, because exposure to many examples helps them learn robust representations of the underlying patterns. In medical fields, datasets are frequently small and imbalanced because annotated patient information is difficult to acquire. Insufficient data increases the likelihood of overfitting. Overfit models memorize training samples instead of learning genuine disease-related characteristics, which lowers test performance and skews predictions for underrepresented populations. This lack of generalizability is especially alarming in healthcare, since it may result in erroneous diagnoses or misclassification of uncommon illnesses. Research indicates that models trained on limited datasets “fail to generalize patterns,” resulting in unstable decision boundaries and reduced reliability on unfamiliar patient data.

To address these issues, researchers apply methodologies such as transfer learning, regularization, ensemble models, and data augmentation. Data augmentation is highly effective at improving model performance on limited datasets. By artificially expanding the dataset, it exposes models to greater variability while preserving diagnostic labels. Common methods include geometric and photometric transformations (e.g., rotation, flipping, brightness adjustment) for medical imaging. For tabular clinical datasets, common methods include the Synthetic Minority Over-sampling Technique (SMOTE) and the Conditional Tabular Generative Adversarial Network (CTGAN). More advanced techniques employ generative models such as GANs to produce realistic MRI or X-ray images, which have been shown to improve the accuracy of Alzheimer’s and cancer identification. Data augmentation mitigates overfitting, improves generalization, and balances class distributions, which makes it a critical tool for developing dependable deep learning systems in data-limited healthcare contexts.
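The following is a minimal sketch of two of the augmentation families named above, assuming NumPy. The first function is a from-scratch version of SMOTE's interpolation step for tabular minority-class samples. The second is a simple rotation, flip, and brightness transform for 2-D grayscale images scaled to [0, 1]. The parameter values (such as `k` and the brightness range) and the example data are illustrative assumptions. Production work would typically use a maintained implementation such as the `SMOTE` class in imbalanced-learn.

```python
import numpy as np

def smote(minority: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Create n_new synthetic minority samples by interpolating towards nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class only.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbours
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                         # random minority sample
        b = neighbours[a, rng.integers(k)]          # one of its k nearest neighbours
        gap = rng.random()                          # random position on the segment a -> b
        synthetic[i] = minority[a] + gap * (minority[b] - minority[a])
    return synthetic

def augment_image(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random 90-degree rotation, optional horizontal flip, and brightness scaling of a [0, 1] image."""
    out = np.rot90(img, k=int(rng.integers(4)))
    if rng.random() < 0.5:
        out = np.fliplr(out)
    out = out * rng.uniform(0.9, 1.1)               # mild photometric change
    return np.clip(out, 0.0, 1.0)                   # keep intensities in the valid range

X_min = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])  # hypothetical minority rows
X_new = smote(X_min, n_new=6, k=2)
img_aug = augment_image(np.random.default_rng(0).random((8, 8)), np.random.default_rng(1))
```

Two cautions apply. First, oversampling should be applied only to the training split, after splitting; otherwise synthetic neighbours of test samples leak into training. Second, flips are not label-preserving for every modality, for example when laterality matters in brain or chest imaging.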

Availability of public and real-time disease diagnosis datasets and ethical considerations

Publicly accessible databases exist for various diseases; however, collecting or obtaining real-time data remains challenging owing to ethical concerns. Patient consent is required before data collection to guarantee privacy and compliance with ethical guidelines. Real-time medical data can be released publicly only after the DL models have been deployed and patient anonymity has been protected. Access to public and real-time disease diagnosis databases is essential for advancing healthcare research, especially for creating AI-driven diagnostic systems. Many datasets exist across disease areas, each providing distinct insights into particular diseases. Access to them often requires compliance with ethical principles and data-sharing agreements that safeguard patient privacy and data security. These resources provide a basis for pioneering research in disease diagnosis. They enable more precise, real-time diagnostic instruments that may eventually improve patient outcomes (Borda et al., 2022).
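One common step before sharing records is pseudonymization. The sketch below uses only the Python standard library. It drops hypothetical direct-identifier fields and replaces the patient ID with a keyed HMAC-SHA-256 digest, so records from the same patient can still be linked without exposing the ID. The field names and the record are illustrative assumptions. Pseudonymization alone is not full anonymization: quasi-identifiers such as age or a rare diagnosis can still enable re-identification. Formal release procedures depend on the applicable regulations.

```python
import hashlib
import hmac
import secrets

def pseudonymize(patient_id: str, key: bytes) -> str:
    """Map a patient identifier to a keyed SHA-256 digest; the same ID and key always give the same pseudonym."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def deidentify_record(record: dict, key: bytes,
                      direct_identifiers: tuple = ("name", "address", "phone")) -> dict:
    """Drop (hypothetical) direct-identifier fields and replace the patient ID before sharing."""
    shared = {k: v for k, v in record.items() if k not in direct_identifiers}
    shared["patient_id"] = pseudonymize(record["patient_id"], key)
    return shared

key = secrets.token_bytes(32)   # held by the data custodian, never released with the data
record = {"patient_id": "MRN-001234", "name": "Jane Doe", "phone": "555-0100",
          "age": 67, "diagnosis": "AD"}   # hypothetical record
shared = deidentify_record(record, key)
```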

Quality and availability of data

Deep learning models require substantial quantities of labeled training data to learn efficiently. Acquiring enough high-quality labeled data may be costly, time-intensive, or complex. This is especially true in specialized fields or when handling sensitive information, as in healthcare and cybersecurity. Although methods such as data augmentation can enlarge a dataset, generating enough training data to meet the demands of deep learning models can still be arduous. Moreover, a limited dataset may cause overfitting, in which a model performs very well on training data but fails to generalize to new data. Balancing model complexity against regularization to prevent overfitting while still generalizing effectively is difficult. Methods that improve data efficiency, such as few-shot, active, or semi-supervised learning, therefore remain a prominent research area (Talaei Khoei, Ould Slimane & Kaabouch, 2023).
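As an illustration of active learning, the sketch below implements least-confidence sampling, which is one simple query strategy. The model's predicted class probabilities on unlabeled data are used to pick the cases it is least sure about, and only those cases are sent to clinicians for annotation. The probabilities are hypothetical. In practice they would come from the current model, which is retrained after each labeling round.

```python
import numpy as np

def least_confidence_query(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` unlabeled samples whose top predicted probability is lowest."""
    confidence = probs.max(axis=1)           # probability assigned to the predicted class
    return np.argsort(confidence)[:budget]   # least confident samples first

# Hypothetical class probabilities (benign, malignant) for five unlabeled scans.
probs = np.array([[0.98, 0.02], [0.55, 0.45], [0.10, 0.90], [0.51, 0.49], [0.70, 0.30]])
to_label = least_confidence_query(probs, budget=2)
print(to_label)  # [3 1]
```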

Data enrichment methods and the time complexity of the model

Conventional data augmentation techniques introduce significant variation into training samples. Beyond a certain degree of augmentation, however, the quality of the augmented instances deteriorates, so novel data enrichment procedures are needed for disease diagnostic tools. Generative Adversarial Networks (GANs) are used in several fields, including medical image synthesis, to produce high-quality data samples, but additional strategies are still needed to synthesize high-quality data. Deep learning models for medical diagnosis must also process large amounts of data, which increases network complexity. This demands additional processing resources and raises the model’s computational complexity. To cope with this complexity, deep learning models require high-end computers with sufficient CPU and GPU capacity (Shyamala & Navamani, 2024).
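The growth in computational cost can be made concrete by counting the parameters and multiply-accumulate operations (MACs) of a single convolutional layer. The sketch below uses the standard formulas for a k × k (or k × k × k) convolution. The layer sizes in the example are illustrative assumptions and are not taken from a specific model in this review.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Trainable parameters of a k x k 2-D convolution: one k*k*c_in kernel (+ bias) per output channel."""
    return c_out * (c_in * k * k + (1 if bias else 0))

def conv3d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """The same count for a k x k x k 3-D convolution, as used on volumetric CT or MRI."""
    return c_out * (c_in * k ** 3 + (1 if bias else 0))

def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates for one forward pass of a 2-D convolution (bias additions ignored)."""
    return c_out * h_out * w_out * c_in * k * k

print(conv2d_params(64, 128, 3))            # 73856
print(conv3d_params(64, 128, 3))            # 221312
print(conv2d_macs(64, 128, 3, 112, 112))    # 924844032
```

A single 3 × 3 layer mapping 64 to 128 channels on a 112 × 112 output has fewer than 74,000 parameters but needs roughly 0.92 billion MACs per image. The 3-D version of the same layer has roughly three times as many parameters, and its MAC count also grows with the number of output slices.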

Lack of cohesive frameworks and suboptimal utilization of transformer architectures

Existing research has concentrated on attaining high accuracy within specific domains (e.g., lung cancer or heart disease) using single-modality data and a single model type. These works lack a cohesive framework that combines essential components such as interpretability (XAI), privacy-preserving techniques, Transformer-based architectures, and cross-modal data fusion. This fragmentation hinders the development of robust, practically deployable systems. Despite their demonstrated efficacy on sequential biological data, Transformers remain underexplored. Some studies partly integrate attention mechanisms, but none has comprehensively applied Transformer models to EEG, ECG, or text-based features. This is a missed opportunity to model the intricate temporal and contextual relationships in clinical data (Shatnawi, Abuein & Al-Quraan, 2025).
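The sketch below shows the core operation that Transformers bring to such signals. It implements single-head scaled dot-product self-attention in NumPy over a sequence of T windows of a physiological signal such as ECG or EEG. Each output window is a weighted mix of all windows, so long-range temporal dependencies are captured directly. Random matrices stand in for learned weights, and the sizes are hypothetical. A full Transformer would add multiple heads, positional encodings, residual connections, and feed-forward layers.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Single-head scaled dot-product self-attention over x with shape (T, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])     # (T, T) pairwise relevance between windows
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d_model, d_k = 8, 16, 4                      # hypothetical: 8 windows, 16 features per window
x = rng.standard_normal((T, d_model))           # stand-in for per-window signal features
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)                    # (8, 4) (8, 8)
```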

Inadequate hyperparameter tuning and uneven datasets

Numerous studies did not document their tuning methodologies in detail. A few demonstrated performance gains through systematic tuning, but others relied on default configurations, which compromises reproducibility and leaves possible gains unrealized. Data imbalance was a considerable challenge, particularly in the lung cancer and dermatological datasets. Although CTGAN and SMOTE were used in some instances to mitigate this issue, most studies did not implement thorough class-balancing procedures or make effective use of data augmentation. This raises concerns about the models’ generalizability to diverse real-world populations (Alzahrani, 2025).
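Reproducible tuning mostly requires fixing the random seed, recording the search space, and logging every trial. The sketch below shows a seeded random search under those assumptions. The search space and `toy_validation_score` are hypothetical stand-ins for training a model and scoring it on a validation split (never on the test split).

```python
import json
import random

def random_search(evaluate, space: dict, n_trials: int, seed: int = 42):
    """Seeded random search that logs every trial so the tuning can be reproduced and reported."""
    rng = random.Random(seed)                       # isolated, seeded generator
    log = []
    for trial in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        log.append({"trial": trial, "config": config, "score": score})
    best = max(log, key=lambda r: r["score"])
    return best, log

space = {"learning_rate": [1e-4, 3e-4, 1e-3], "batch_size": [16, 32, 64], "dropout": [0.0, 0.25, 0.5]}

def toy_validation_score(cfg: dict) -> float:
    """Hypothetical stand-in for training and validation; peaks at lr=3e-4, dropout=0.25."""
    return -abs(cfg["learning_rate"] - 3e-4) * 1000 - abs(cfg["dropout"] - 0.25)

best, log = random_search(toy_validation_score, space, n_trials=20)
print(json.dumps(best))                             # report the seed, space, and full log as well
```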

Absence of privacy-enhancing techniques and limited external validation and generalizability

Only a few studies have investigated federated learning or blockchain as methods for collaborative learning that safeguards patient data privacy. Given the sensitivity of medical data, this is a significant gap that further research must address. External validation was also absent or insufficiently emphasized in most studies. Some studies validated their models across multiple datasets, but most relied mainly on a training-testing split of a single dataset. This limits confidence in the models’ efficacy on unseen populations and diverse clinical environments (Abbas et al., 2025).
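When several collection sites are available, a leave-one-site-out split approximates external validation. Each site is held out in turn, and the model is trained only on the others. The sketch below generates such splits and assumes a hypothetical `site` field on each record. It is only a proxy and does not replace validation on independently collected data.

```python
from collections import defaultdict

def leave_one_site_out(records):
    """Yield (held_out_site, train, test); every test set is a whole site unseen during training."""
    by_site = defaultdict(list)
    for r in records:
        by_site[r["site"]].append(r)
    for site in sorted(by_site):
        test = by_site[site]
        train = [r for other, rows in by_site.items() if other != site for r in rows]
        yield site, train, test

# Hypothetical records from three hospitals A, B, and C.
records = [{"site": s, "id": i} for i, s in enumerate("AABBBCC")]
for site, train, test in leave_one_site_out(records):
    print(site, len(train), len(test))
```

For these records, the splits hold out A (5 training, 2 test), B (4 training, 3 test), and C (5 training, 2 test).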

Transparency

The limited interpretability and explainability of deep learning are significant obstacles to understanding complex models. As deep neural networks grow to many layers and parameters, their decision-making can resemble a “black box,” making it hard to understand the reasoning behind individual predictions. In high-stakes areas such as healthcare and finance, this lack of transparency makes the models harder to trust and adopt. Researchers, regulators, and end-users need an acceptable balance between model performance and interpretability. With that balance, they can understand the model’s logic, make informed decisions, and hold models accountable (Hadadi & Arabani, 2024).
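Many post hoc explanation methods build on Shapley values, which divide a single prediction among the input features. The sketch below computes these values exactly for a hypothetical three-feature risk score. Features outside a coalition are replaced with a baseline value, which is one of several possible choices. Exact enumeration is exponential in the number of features, which is why libraries such as SHAP rely on approximations for real deep models.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one prediction, using baseline replacement for absent features."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))   # marginal contribution of feature i
    return phi

def risk(z):
    """Hypothetical risk score over (age, systolic BP, cholesterol) with an age-BP interaction."""
    age, bp, chol = z
    return 0.02 * age + 0.01 * bp + 0.005 * chol + 0.0001 * age * bp

x = [70, 160, 240]
baseline = [50, 120, 200]
phi = shapley_values(risk, x, baseline)
print([round(p, 3) for p in phi])
```

Here the attributions are 0.68, 0.64, and 0.2. They sum to 1.52, the gap between the prediction and the baseline prediction. The age and blood-pressure interaction is shared between the two features involved.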

The present investigation shows encouraging outcomes but is constrained by several limitations. First, the dataset was limited in size and variety owing to the availability of annotated data, potentially reducing the model’s generalizability to wider populations. Second, although performance was assessed using standard measures, external validation on unseen clinical datasets was not conducted because of limited data availability. Third, limited computing resources prevented the deployment of more complex Transformer-based architectures and extensive hyperparameter tuning. Additionally, privacy-enhancing approaches such as federated learning and differential privacy were not implemented in this model version. They are planned for future work to ensure compliance with data confidentiality regulations. Despite these difficulties, deep learning has the potential to enhance disease diagnosis and prognosis. It can analyze large quantities of patient data to find patterns that may not be apparent to human clinicians. This can help identify people at risk of specific diseases and lead to more precise and timely diagnoses.

Recent advancements in deep learning for disease diagnosis and prediction

The rapid growth of deep learning in medical diagnostics offers promising opportunities, although obstacles remain. This section examines several significant advancements that address these difficulties and drive the field forward. We explored transfer learning methods that reuse existing knowledge to overcome the limitations of medical datasets. In addition, we examined techniques for handling imbalanced datasets in which specific diseases are rare. Furthermore, we investigated how XAI methodologies promote confidence and clarity in deep learning models used in healthcare applications. Transfer learning has substantially influenced medical imaging and diagnostics by allowing researchers to start from models pre-trained on other tasks, such as generic X-ray categorization or ImageNet, when training deep learning models. This improves performance on smaller, disease-specific datasets, particularly when domain-relevant pre-trained models are used. Transfer learning and foundation models such as BioBERT and MedGPT, which draw on information from massive medical datasets, can improve diagnostic accuracy even when little task-specific data is available. Combining Vision Transformers (ViTs) with Graph Neural Networks (GNNs) is also highly promising. ViTs may improve feature extraction in high-resolution imaging such as CT scans and fundus photography. At the same time, GNNs excel at modeling intricate relational structures in patient histories, disease ontologies, and genetic connections (Sadr & Nazari Soleimandarabi, 2022). Federated learning enables medical institutions to train models collectively without sharing raw data, safeguarding patient privacy while supporting the development of accurate models. Its ability to build strong models from varied datasets supports wider applicability across populations and improves diagnostic accuracy.
Better early disease identification, individualized treatment strategies, and real-time monitoring are all possible results of integrating federated learning into healthcare, eventually enhancing patient outcomes (Bebortta et al., 2023).
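The core of many federated learning systems is federated averaging (FedAvg). Each institution trains on its own data, and only model weights are sent to a server, which averages them in proportion to local sample counts. The sketch below assumes a logistic-regression model and three synthetic "hospitals" generated with NumPy. The learning rate, round count, and data are illustrative assumptions. Real deployments add secure aggregation, differential privacy, and handling of non-identically distributed data.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few epochs of full-batch gradient descent for logistic regression on one site's data."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted probabilities
        w = w - lr * X.T @ (p - y) / len(y)         # gradient step on the log-loss
    return w

def fedavg(global_w, sites, rounds=20):
    """Federated averaging: only weights leave each site; the server takes a size-weighted mean."""
    for _ in range(rounds):
        local_ws, sizes = [], []
        for X, y in sites:                          # raw X and y never leave the site
            local_ws.append(local_update(global_w.copy(), X, y))
            sizes.append(len(y))
        global_w = np.average(local_ws, axis=0, weights=sizes)
    return global_w

rng = np.random.default_rng(1)
true_w = np.array([1.5, -2.0, 0.5])
sites = []
for n in (200, 80, 40):                             # three hypothetical hospitals of different sizes
    X = rng.standard_normal((n, 3))
    y = (X @ true_w + 0.3 * rng.standard_normal(n) > 0).astype(float)
    sites.append((X, y))
w = fedavg(np.zeros(3), sites)
```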

Imbalanced medical datasets can lead algorithms to favor the majority class and overlook clinically important patterns. Oversampling the underrepresented class or undersampling the overrepresented class is used to balance the classes. Class-weighted loss functions give the minority class more emphasis during training. Addressing class imbalance with strategies such as adaptive loss functions, oversampling, and synthetic data generation can improve model generalization (Alzahrani, 2025; Hariprasad et al., 2023). The intricate nature of deep learning models, often called the “black box” problem, hinders their use in clinical environments. Researchers are therefore developing XAI methodologies that offer a deeper understanding of the models’ internal mechanisms. These methods will help build confidence and ease the incorporation of AI technologies into healthcare operations. Interpretability approaches such as Grad-CAM, SHAP, and attention-based methods are being used to improve confidence in deep learning models by providing insight into their decision-making (Lopez Alcaraz et al., 2025; Saravanan et al., 2023). These are a few of the notable advancements helping to overcome the difficulties of disease prediction and diagnosis. Further developments can be expected as research continues, leading to more precise, understandable, and ultimately life-saving uses of AI in healthcare (Rahal et al., 2024).
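A class-weighted loss can be sketched in a few lines. The example below assumes NumPy. It derives inverse-frequency class weights, using the same formula that scikit-learn applies for `class_weight="balanced"`, and applies them in a cross-entropy normalized by the total weight. The 90/10 label split and the predicted probabilities are hypothetical.

```python
import numpy as np

def inverse_frequency_weights(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """w_c = N / (C * n_c): rarer classes receive proportionally larger weights."""
    counts = np.bincount(labels, minlength=n_classes)
    return len(labels) / (n_classes * np.maximum(counts, 1))   # guard against empty classes

def weighted_cross_entropy(probs: np.ndarray, labels: np.ndarray, class_weights: np.ndarray) -> float:
    """Cross-entropy in which each sample is scaled by the weight of its true class."""
    eps = 1e-12                                                # avoid log(0)
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + eps)
    w = class_weights[labels]
    return float((w * per_sample).sum() / w.sum())

labels = np.array([0] * 90 + [1] * 10)           # 90 healthy, 10 diseased (hypothetical)
weights = inverse_frequency_weights(labels, n_classes=2)
probs = np.tile([0.9, 0.1], (len(labels), 1))    # a model that always leans "healthy"
unweighted = weighted_cross_entropy(probs, labels, np.ones(2))
weighted = weighted_cross_entropy(probs, labels, weights)
```

Consider a model that predicts "healthy" with probability 0.9 for every case. Its unweighted loss is about 0.33, while its weighted loss is about 1.20, so ignoring the minority class is penalized much more heavily.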

Conclusion and future work

This study demonstrates that DL has the potential to revolutionize the diagnosis and prediction of diseases. It can provide significant improvements in diagnostic accuracy, early identification, individualized therapy, and overall patient outcomes, while also helping to reduce healthcare costs. Recent literature syntheses indicate that a variety of architectures have been effectively applied across modalities. These architectures include CNNs, RNNs, LSTMs, GANs, DNNs, and hybrid models, and the modalities include medical imaging, time-series physiological data, clinical records, and genetic datasets. These models have demonstrated strong performance in diagnosing cardiovascular illnesses, malignancies, neurological disorders (including Alzheimer’s and Parkinson’s disease), diabetes, dermatological conditions, and infectious diseases. Despite these developments, the review emphasizes the persistent obstacles that impede clinical adoption. These obstacles include the inadequate interpretability of model predictions, dependence on extensive, high-quality annotated datasets, suboptimal integration of multimodal data sources, concerns about generalizability across diverse patient populations, and constraints on computational resources. Additionally, the absence of cohesive frameworks that integrate interpretability, scalability, and security, incomplete external validation, and limited use of privacy-preserving methodologies continue to hinder widespread adoption in healthcare systems.

Future research must focus on overcoming these limitations by developing multimodal, context-aware learning frameworks that incorporate genetic, textual, signal, and imaging data to offer richer, context-specific diagnostic insights. Dataset scarcity and imbalance must also be addressed by improving data efficiency through generative AI, transfer learning, and self-supervised learning. To improve interpretability, explainable AI methods should be built directly into model pipelines. These methods include Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), Grad-CAM, ProtoPNet, and SENN, and they ensure transparency, cultivate clinician confidence, and support compliance with regulatory norms. Improving privacy-preserving methodologies, such as federated learning, differential privacy, blockchain, and secure multi-party computation, is equally important for enabling collaborative training while protecting sensitive patient data. Lightweight, resource-efficient models for deployment on edge devices and in low-resource environments will substantially improve the practical applicability and accessibility of adaptive, continuously learning systems. Innovative paradigms will shape the future of AI-driven diagnostic instruments, including Vision Transformers for medical imaging, Graph Neural Networks for relational biomedical data, and foundation models like MedGPT and BioBERT for clinical text analysis. Ultimately, forthcoming diagnostic systems must be precise, transparent, and equitable, and must integrate seamlessly into healthcare workflows. Achieving this requires ongoing collaboration among AI researchers, healthcare professionals, policymakers, and patients to bridge the gap between research innovation and clinical practice.

Supplemental Information

List of commonly used abbreviations in deep learning based disease diagnosis.

DOI: 10.7717/peerj-cs.3484/supp-1


[1] Abbas S, Ahmed F, Khan WA, Ahmad M, Khan MA, Ghazal TM. 2025. Intelligent skin disease prediction system using transfer learning and explainable artificial intelligence. Scientific Reports 15(1):1746

[2] Ahmad A, Tariq A, Hussain HK, Gill AY. 2023. Revolutionizing healthcare: how deep learning is poised to change the landscape of medical diagnosis and treatment. Journal of Computer Networks, Architecture and High Performance Computing 5(2):458-471

[3] Ahmed HM, Elsharkawy ZF, Elkorany AS. 2023. Alzheimer disease diagnosis for magnetic resonance brain images using deep learning neural networks. Multimedia Tools and Applications 82(12):17963-17977

[4] Ahmed M, Husien I. 2024. Heart disease prediction using hybrid machine learning: a brief review. Journal of Robotics and Control (JRC) 5(3):884-892

[5] Ahsan MM, Alam TE, Trafalis T, Huebner P. 2020. Deep MLP-CNN model using mixed data to distinguish between COVID-19 and Non-COVID-19 patients. Symmetry 12(9):1526

[6] Ahsan MM, Siddique Z. 2022. Machine learning-based heart disease diagnosis: a systematic literature review. Artificial Intelligence in Medicine 128(3):102289

[7] Ajagbe SA, Adigun MO. 2024. Deep learning techniques for detection and prediction of pandemic diseases: a systematic literature review. Multimedia Tools and Applications 83(2):5893-5927

[8] Alghamdi AM, Ashraf MU, Bahaddad AA, Almarhabi KA, Al Shehri WA, Daraz A. 2025. A novel approach hybrid of ensemble learning and 3-D CNN mechanism: early-stage diagnosis of Alzheimer’s disease using EEG signals. Scientific Reports 15(1):35893

[9] Alissa M, Lones MA, Cosgrove J, Alty JE, Jamieson S, Smith SL, Vallejo M. 2022. Parkinson’s disease diagnosis using convolutional neural networks and figure-copying tasks. Neural Computing and Applications 34(2):1433-1453

[10] Alloqmani A, Abushark YB, Khan AI. 2023. Anomaly detection of breast cancer using deep learning. Arabian Journal for Science and Engineering 48(8):10977-11002

[11] Almazroi AA, Aldhahri EA, Bashir S, Ashfaq S. 2023. A clinical decision support system for heart disease prediction using deep learning. IEEE Access 11:61646-61659

[12] Alshraideh M, Alshraideh N, Alshraideh A, Alkayed Y, Al Trabsheh Y, Alshraideh B. 2024. Enhancing heart attack prediction with machine learning: a study at Jordan University Hospital. Applied Computational Intelligence and Soft Computing 2024(1):5080332

[13] Alzahrani A. 2025. Early detection of lung cancer using predictive modeling incorporating CTGAN features and tree-based learning. IEEE Access 13(1):34321-34333

[14] Asif RN, Naseem MT, Ahmad M, Mazhar T, Khan MA, Khan MA, Al-Rasheed A, Hamam H. 2025. Brain tumor detection empowered with ensemble deep learning approaches from MRI scan images. Scientific Reports 15(1):15002

[15] Atchison K, Wu P, Samii L, Walsh M, Ismail Z, Iaboni A, Goodarzi Z. 2024. Detection of anxiety symptoms and disorders in older adults: a diagnostic accuracy systematic review. Age and Ageing 53(7):afae122

[16] Balas M, Micieli JA. 2023. Visual snow syndrome: use of text-to-image artificial intelligence models to improve the patient perspective. Canadian Journal of Neurological Sciences/Journal Canadien des Sciences Neurologiques 50(6):946-947

[17] Balgetir F, Bilek F, Kakakus S, Arslan-Tuncer S, Demir CF. 2021. Detection of ataxia in low disability MS patients by hybrid convolutional neural networks based on images of plantar pressure distribution. Multiple Sclerosis and Related Disorders 56(8):103261

[18] Baviskar V, Verma M, Chatterjee P, Singal G. 2023. Efficient heart disease prediction using hybrid deep learning classification models. IRBM 44(5):100786

[19] Bebortta S, Tripathy SS, Basheer S, Chowdhary CL. 2023. FedEHR: a federated learning approach towards the prediction of heart diseases in IoT-based electronic health records. Diagnostics 13(20):3166

[20] Bhattacharjee S, Saha B, Bhattacharyya P, Saha S. 2022. Classification of obstructive and non-obstructive pulmonary diseases based on spirometry using machine learning techniques. Journal of Computational Science 63:101768

[21] Bi X, Wang H. 2019. Early Alzheimer’s disease diagnosis based on EEG spectral images using deep learning. Neural Networks 114(18):119-135

[22] Borda A, Molnar A, Neesham C, Kostkova P. 2022. Ethical issues in AI-enabled disease surveillance: perspectives from global health. Applied Sciences 12(8):3890

[23] DataRobot. 2021. Unsupervised machine learning, DataRobot AI cloud. (cited 17 December 2021; accessed 29 June 2024)

[24] DeGroat W, Abdelhalim H, Patel K, Mendhe D, Zeeshan S, Ahmed Z. 2024. Discovering biomarkers associated and predicting cardiovascular disease with high accuracy using a novel nexus of machine learning techniques for precision medicine. Scientific Reports 14(1):1

[25] Demiroğlu U, Şenol B, Yildirim M, Eroğlu Y. 2023. Classification of computerized tomography images to diagnose non-small cell lung cancer using a hybrid model. Multimedia Tools and Applications 82(21):33379-33400

[26] Djenouri Y, Belhadi A, Yazidi A, Srivastava G, Lin JCW. 2024. Artificial intelligence of medical things for disease detection using ensemble deep learning and attention mechanism. Expert Systems 41(6):e13093

[27] El-Bashbishy AES, El-Bakry HM. 2024. Pediatric diabetes prediction using deep learning. Scientific Reports 14(1):4206

[28] El-Ghany SA, Mahmood MA, Abd El-Aziz AA. 2024. An accurate deep learning-based computer-aided diagnosis system for Gastrointestinal disease detection using wireless capsule endoscopy image analysis. Applied Sciences 14(22):10243

[29] Fatima A, Shabbir A, Janjua JI, Ramay SA, Bhatty RA, Irfan M, Abbas T. 2024. Analyzing breast cancer detection using machine learning & deep learning techniques. Journal of Computing & Biomedical Informatics 7(02):11

[30] Gao Z, Li J, Guo J, Chen Y, Yi Z, Zhong J. 2018. Diagnosis of diabetic retinopathy using deep neural networks. IEEE Access 7:3360-3370

[31] Goyal P, Rani R, Singh K. 2024. A multilayered framework for diagnosing and classifying Alzheimer’s disease using transfer learned Alexnet and LSTM. Neural Computing and Applications 36(7):3777-3801

[32] Groh M, Badri O, Daneshjou R, Koochek A, Harris C, Soenksen LR, Doraiswamy PM, Picard R. 2024. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nature Medicine 30(2):573-583

[33] Guo K, Cheng J, Li K, Wang L, Lv Y, Cao D. 2023. Diagnosis and detection of pneumonia using weak-label based on X-ray images: a multi-center study. BMC Medical Imaging 23(1):209

[34] Gupta V, Petursson P, Rawshani A, Boren J, Ramunddal T, Bhatt DL, Omerovic E, Angerås O, Smith G, Sattar N, Andersson E, Redfors B, Hilgendorf L, Bergström G, Pirazzi C, Skoglund K, Rawshani A. 2025. End-to-end deep-learning model for the detection of coronary artery stenosis on coronary CT images. Open Heart 12(1):e002998

[35] Hadadi S, Arabani SP. 2024. A novel approach for Parkinson’s disease diagnosis using deep learning and Harris Hawks optimization algorithm with handwritten samples. Multimedia Tools and Applications 83:81491-81510

[36] Hariprasad R, Navamani TM, Rote TR, Chauhan I. 2023. Design and development of an efficient risk prediction model for cervical cancer. IEEE Access 11:74290-74300

[37] Huang D, Cogill S, Hsia RY, Yang S, Kim D. 2023. Development and external validation of a pretrained deep learning model for the prediction of non-accidental trauma. NPJ Digital Medicine 6(1):131

[38] Huang H, Gao W, Ye C. 2021. An intelligent data-driven model for disease diagnosis based on machine learning theory. Journal of Combinatorial Optimization 42(4):884-895

[39] Hussain I, Nazir MB. 2024. Mind matters: exploring AI, machine learning, and deep learning in neurological health. International Journal of Advanced Engineering Technologies and Innovations 1(4):209-230

[40] Iqbal S, Siddiqui GF, Rehman A, Hussain L, Saba T, Tariq U, Abbasi AA. 2021. Prostate cancer detection using deep learning and traditional techniques. IEEE Access 9:27085-27100

[41] Jiang T, Gradus JL, Rosellini AJ. 2020. Supervised machine learning: a brief primer. Behavior Therapy 51(5):675-687

[42] Jiang L, Yang X, Yu C, Wu Z, Wang Y. 2024. Advanced AI framework for enhanced detection and assessment of abdominal trauma: integrating 3D segmentation with 2D CNN and RNN models.

[43] Joshi AA, Aziz RM. 2024. A two-phase cuckoo search based approach for gene selection and deep learning classification of cancer disease using gene expression data with a novel fitness function. Multimedia Tools and Applications 83:71721-71752

[44] Kaur R, Levy J, Motl RW, Sowers R, Hernandez ME. 2023. Deep learning for multiple sclerosis differentiation using multi-stride dynamics in gait. IEEE Transactions on Biomedical Engineering 70(7):2181-2192

[45] Khalfallah S, Puech W, Tlija M, Bouallegue K. 2025. Exploring the effectiveness of machine learning and deep learning techniques for EEG signal classification in neurological disorders. IEEE Access 13:17002-17015

[46] Khandakar S, Al Mamun MA, Islam MM, Hossain K, Melon MMH, Javed MS. 2024. Unveiling early detection and prevention of cancer: machine learning and deep learning approaches. Educational Administration: Theory and Practice 30(5):14614-14628

[47] Khaskhoussy R, Ayed YB. 2022. Speech processing for early Parkinson’s disease diagnosis: machine learning and deep learning-based approach. Social Network Analysis and Mining 12(1):73

[48] Kong C, Yan D, Liu K, Yin Y, Ma C. 2025. Multiple deep learning models based on MRI images in discriminating glioblastoma from solitary brain metastases: a multicentre study. BMC Medical Imaging 25(1):171

[49] Kumar K, Ghosh R. 2024. Parkinson’s disease diagnosis using recurrent neural network based deep learning model by analyzing online handwriting. Multimedia Tools and Applications 83(4):11687-11715

[50] Kumar R, Kumbharkar P, Vanam S, Sharma S. 2024. Medical images classification using deep learning: a survey. Multimedia Tools and Applications 83(7):19683-19728

[51] Kundu D, Rahman MM, Rahman A, Das D, Siddiqi UR, Alam MGR, Dey SK, Muhammad G, Ali Z. 2024. Federated deep learning for monkeypox disease detection on GAN-augmented dataset. IEEE Access 12:32819-32829

[52] Kurt B, Gürlek B, Keskin S, Özdemir S, Karadeniz Ö, Buçan Kırkbir İ, Kurt T, Ünsal S, Kart C, Baki N, Turhan K. 2023. Prediction of gestational diabetes using deep learning and Bayesian optimization and traditional machine learning techniques. Medical & Biological Engineering & Computing 61(7):1649-1660

[53] LeewayHertz, Takyar A. 2024. From diagnosis to treatment: exploring the applications of generative AI in healthcare.

[54] Li X, Jonnagaddala J, Yang S, Zhang H, Xu XS. 2022. A retrospective analysis using deep-learning models for prediction of survival outcome and benefit of adjuvant chemotherapy in stage II/III colorectal cancer. Journal of Cancer Research and Clinical Oncology 148(8):1955-1963

[55] Li H, Yuan Q, Wang Y, Qu P, Jiang C, Kuang H. 2025. An algorithm for cardiac disease detection based on the magnetic resonance imaging. Scientific Reports 15(1):4053

[56] Loganathan E, Naveenkumar P, Santhosh C, Shankareshwaran P. 2025. Cancer disease identification and recommendation using hybrid deep learning algorithms. International Research Journal on Advanced Science Hub 7(01):40-50

[57] Lopez Alcaraz JM, Oloyede E, Taylor D, Haverkamp W, Strodthoff N. 2025. Explainable and externally validated machine learning for neuropsychiatric diagnosis via electrocardiograms. arXiv

[58] Ma F, Sun T, Liu L, Jing H. 2020. Detection and diagnosis of chronic kidney disease using deep learning-based heterogeneous modified artificial neural network. Future Generation Computer Systems 111:17-26

[59] Malik SG, Jamil SS, Aziz A, Ullah S, Ullah I, Abohashrh M. 2024. High-precision skin disease diagnosis through deep learning on dermoscopic images. Bioengineering 11(9):867

[60] Maria HH, Kayalvizhi R, Malarvizhi S, Venkatraman R, Patil S, Kumar AS. 2023. Real-time deployment of BI-RADS breast cancer classifier using deep-learning and FPGA techniques. Journal of Real-Time Image Processing 20(4):80

[61] Matlani P. 2024. BiLSTM-ANN: early diagnosis of Alzheimer’s disease using hybrid deep learning algorithms. Multimedia Tools and Applications 83(21):60761-60788

[62] Millward J, He Z, Nibali A, Mouradov D, Mielke LA, Tran K, Chou A, Hawkins NJ, Ward RL, Gill AJ, Sieber OM, Williams DS. 2025. Automated deep learning-based assessment of tumour-infiltrating lymphocyte density determines prognosis in colorectal cancer. Journal of Translational Medicine 23(1):298

[63] Mohades Deilami F, Sadr H, Tarkhan M. 2022. Contextualized multidimensional personality recognition using combination of deep neural network and ensemble learning. Neural Processing Letters 54(5):3811-3828

[64] Mulenga M, Kareem SA, Sabri AQM, Seera M. 2021. Stacking and chaining of normalization methods in deep learning-based classification of colorectal cancer using gut microbiome data. IEEE Access 9:97296-97319

[65] Naz H, Ahuja S. 2020. Deep learning approach for diabetes prediction using PIMA Indian dataset. Journal of Diabetes & Metabolic Disorders 19(1):391-403

[66] Nguyen D, Nguyen H, Ong H, Le H, Ha H, Duc NT, Ngo HT. 2022. Ensemble learning using traditional machine learning and deep neural network for diagnosis of Alzheimer’s disease. IBRO Neuroscience Reports 13(5):255-263

[67] Oktay AB, Kocer A. 2020. Differential diagnosis of Parkinson and essential tremor with convolutional LSTM networks. Biomedical Signal Processing and Control 56(1):101683

[68] Önal MN, Güraksin GE, Duman R. 2023. Convolutional neural network-based diabetes diagnostic system via iridology technique. Multimedia Tools and Applications 82(1):173-194

[69] Paul SG, Saha A, Hasan MZ, Noori SRH, Moustafa A. 2024. A systematic review of graph neural network in healthcare-based applications: recent advances, trends, and future directions. IEEE Access 12:15145-15170

[70] Pradhan K, Chawla P, Rawat S. 2023. A deep learning-based approach for detection of lung cancer using self-adaptive sea lion optimization algorithm (SA-SLnO). Journal of Ambient Intelligence and Humanized Computing 14(9):12933-12947

[71] Raghavaiah P, Varadarajan S. 2022. A CAD system design for Alzheimer’s disease diagnosis using temporally consistent clustering and hybrid deep learning models. Biomedical Signal Processing and Control 75(2):103571

[72] Rahal HR, Slatnia S, Kazar O, Barka E, Harous S. 2024. Blockchain-based multi-diagnosis deep learning application for various diseases classification. International Journal of Information Security 23(1):15-30

[73] Rani P, Kumar R, Jain A, Lamba R, Sachdeva RK, Kumar K, Kumar M. 2024. An extensive review of machine learning and deep learning techniques on heart disease classification and prediction. Archives of Computational Methods in Engineering 31:3331-3349

[74] Raza HA, Ansari SU, Javed K, Hanif M, Qaisar SM, Haider U, Pławiak P, Maab I. 2024. A proficient approach for the classification of Alzheimer’s disease using a hybridization of machine learning and deep learning. Scientific Reports 14(1):30925

[75] Saber A, Sakr M, Abo-Seida OM, Keshk A, Chen H. 2021. A novel deep-learning model for automatic detection and classification of breast cancer using the transfer-learning technique. IEEE Access 9:71194-71209

[76] Sadr H, Nazari Soleimandarabi M. 2022. ACNN-TL: attention-based convolutional neural network coupling with transfer learning and contextualized word representation for enhancing the performance of sentiment classification. The Journal of Supercomputing 78(7):10149-10175

[77] Salman ME, Çakar GÇ, Azimjonov J, Kösem M, Cedimoğlu İH. 2022. Automated prostate cancer grading and diagnosis system using deep learning-based Yolo object detection algorithm. Expert Systems with Applications 201(1):117148

[78] Saravanan S, Ramkumar K, Narasimhan K, Subramaniyaswamy V, Kotecha K, Abraham A. 2023. Explainable artificial intelligence (EXAI) models for early prediction of Parkinson’s disease based on spiral and Wave drawings. IEEE Access 11:68366-68378

[79] Saryazdi MD, Mostafaeipour A. 2025. Identification and validation of key predictive factors for heart attack diagnosis using machine learning and fuzzy clustering. Engineering Applications of Artificial Intelligence 142(34):109968

[80] Savitha S, Kannan AR, Logeswaran K. 2025. Augmenting cardiovascular disease prediction through CWCF integration leveraging harris hawks search in deep belief networks. Cognitive Computation 17(1):52

[81] Schwab P, Karlen W. 2020. A deep learning approach to diagnosing multiple sclerosis from smartphone data. IEEE Journal of Biomedical and Health Informatics 25(4):1284-1291

[82] Senturk ZK. 2020. Early diagnosis of Parkinson’s disease using machine learning algorithms. Medical Hypotheses 138(4):109603

[83] Sharkas M, Attallah O. 2024. Color-CADx: a deep learning approach for colorectal cancer classification through triple convolutional neural networks and discrete cosine transform. Scientific Reports 14(1):6914

[84] Shatnawi MQ, Abuein Q, Al-Quraan R. 2025. Deep learning-based approach to diagnose lung cancer using CT-scan images. Intelligence-Based Medicine 11(11):100188

[85] Shyamala K, Navamani TM. 2024. Design of an efficient prediction model for Early Parkinson’s disease diagnosis. IEEE Access 12:137295-137309

[86] Sia M, Ng KW, Haw SC, Jayaram J. 2025. Chronic disease prediction chatbot using deep learning and machine learning algorithms. Bulletin of Electrical Engineering and Informatics 14(1):742-751

[87] Singh K. 2024. AI and healthcare: opportunities and challenges (ebook). 1-195

[88] Singh Y, Gulati N. 2025. Machine learning techniques for accurate prediction and detection of chronic diseases. In: Machine Learning in Multimedia. Boca Raton, Florida: CRC Press. 1-21

[89] SinhaRoy R, Sen A. 2024. A hybrid deep learning framework to predict Alzheimer’s disease progression using generative adversarial networks and deep convolutional neural networks. Arabian Journal for Science and Engineering 49(3):3267-3284

[90] Sokouti M, Sokouti B. 2024. Cancer genetics and deep learning applications for diagnosis, prognosis, and categorization. Journal of Biological Methods 11(3):e99010017

[91] Stabellini N, Makram OM, Kunhiraman HH, Daoud H, Shanahan J, Montero AJ, Blumenthal RS, Aggarwal C, Swami U, Virani SS, Noronha V, Agarwal N, Dent S, Guha A. 2025. A novel machine learning-based cancer-specific CVD risk score among patients with breast, colorectal, or lung cancer. JNCI Cancer Spectrum 9(1):pkaf016

[92] Sudhish DK, Nair LR, Shailesh S. 2024. Content-based image retrieval for medical diagnosis using fuzzy clustering and deep learning. Biomedical Signal Processing and Control 88(6):105620

[93] Talaei Khoei T, Ould Slimane H, Kaabouch N. 2023. Deep learning: systematic review, models, challenges, and research directions. Neural Computing and Applications 35(31):23103-23124

[94] Tanveer M, Rashid AH, Kumar R, Balasubramanian R. 2022. Parkinson’s disease diagnosis using neural networks: survey and comprehensive evaluation. Information Processing & Management 59(3):102909

[95] Tejaswi GT, Srinivasu N, Gottumukkala PSV. 2025. A survey of machine learning and deep learning techniques for lung cancer prediction in IoT and cloud platform. International Journal of Image and Graphics 2750014

[96] Thatha VN, Karthik MG, Gaddam VG, Krishna DP, Venkataramana S, Lella KK, Pamula U. 2025. Histopathological image based breast cancer diagnosis using deep learning and bio inspired optimization. Scientific Reports 15(1):19034

[97] Tufail H, Ahad A, Puspitasari I, Shayea I, Coelho PJ, Pires IM. 2024. Deep learning in smart healthcare: a GAN-based approach for imbalanced Alzheimer’s disease classification. Procedia Computer Science 241(3):146-153

[98] Uma Maheswari K, Valarmathi A. 2023. A novel mechanism to recognize heart disease by optimised deep belief network with SVM classification. Journal of Intelligent & Fuzzy Systems 44(1):167-184

[99] ul Haq A, Li JP, Agbley BLY, Mawuli CB, Ali Z, Nazir S, Din SU. 2022. A survey of deep learning techniques based Parkinson’s disease recognition methods employing clinical data. Expert Systems with Applications 208(3):118045

[100] Vuran S, Ucan M, Akin M, Kaya M. 2025. Multi-classification of skin lesion images including Mpox disease using transformer-based deep learning architectures. Diagnostics 15(3):374

[101] Wang L. 2024. Mammography with deep learning for breast cancer detection. Frontiers in Oncology 14:1281922

[102] Yaghoubi N, Masumi H, Fatehi MH, Ashtari F, Kafieh R. 2024. Deep learning and classic machine learning models in the automatic diagnosis of multiple sclerosis using retinal vessels. Multimedia Tools and Applications 83(13):37483-37504

[103] Yamuna Devi MM, Jeyabharathi J, Kirubakaran S, Narayanan S, Srikanth T, Chakrabarti P. 2024. Efficient segmentation and classification of the lung carcinoma via deep learning. Multimedia Tools and Applications 83(14):41981-41995

[104] Yildirim K, Yildirim M, Eryesil H, Talo M, Yildirim O, Karabatak M, Ogras MS, Artas H, Acharya UR. 2022. Deep learning-based PI-RADS score estimation to detect prostate cancer using multiparametric magnetic resonance imaging. Computers and Electrical Engineering 102(8):108275

[105] Yu Z, Wang K, Wan Z, Xie S, Lv Z. 2023. Popular deep learning algorithms for disease prediction: a review. Cluster Computing 26(2):1231-1251

[106] Yuan X, Sun C, Chen S. 2023. Cooperative DNN partitioning for accelerating DNN-empowered disease diagnosis via swarm reinforcement learning. Applied Soft Computing 148:110844

[107] Zeng N, Li H, Peng Y. 2023. A new deep belief network-based multi-task learning for diagnosis of Alzheimer’s disease. Neural Computing and Applications 35(16):11599-11610

Introduction

Survey methodology

Disease diagnosis using machine learning

Overview of machine learning

Figure 1: Classification of machine learning models.

Machine learning techniques for disease diagnosis

Deep learning models

Figure 2: Representation of deep learning models.

Convolutional neural networks

Figure 3: Convolutional neural networks (ul Haq et al., 2022).

Recurrent neural networks

Figure 4: Recurrent neural networks (ul Haq et al., 2022).

Long short-term memory networks

Figure 5: Long short-term memory (ul Haq et al., 2022).

Generative adversarial networks

Figure 6: Framework of generative adversarial network (ul Haq et al., 2022).

Deep neural networks

Generative artificial intelligence

Deep learning techniques for disease prediction and diagnosis

Overview of the dataset

Figure 7: Disease diagnosis workflow pipeline based on deep learning.

Disease diagnosis and prediction

Heart disease

Cancer disease

Neurological diseases: Alzheimer’s disease

Parkinson’s disease

Diabetes

Skin diseases

Contagious diseases

Algorithmic steps to implement deep learning models

Model optimization and performance metrics

Comprehensive comparative analysis of deep learning approaches

Research challenges and open issues in disease diagnosis and prediction

Large dataset

Availability of public and real-time disease diagnosis datasets and ethical considerations

Quality and availability of data

Data enrichment methods and the time complexity of the model

Lack of cohesive frameworks and suboptimal utilization of transformer architectures

Inadequate hyperparameter tuning and uneven datasets

Absence of privacy-enhancing techniques and limited external validation and generalizability

Transparency

Recent advancements in deep learning for disease diagnosis and prediction

Conclusion and future work

Supplemental Information

List of commonly used abbreviations in deep learning based disease diagnosis.
