Hybrid CNN–Transformer–Conformer deep learning model for accurate and generalizable skin cancer detection from dermoscopic images
Abstract
Skin cancer is a leading cause of cancer-related morbidity worldwide, and its diagnosis calls for automated systems that achieve high accuracy and generalize well across diverse clinical scenarios. This paper proposes a CNN-Transformer-Conformer (CTC) model that combines convolutional neural networks (CNNs), Transformers, and Conformers to classify dermoscopic images with high precision. Trained on the HAM10000 dataset, the CTC model employs a multi-stage preprocessing pipeline to remove hair artifacts, correct illumination variations, and enhance contrast. The architecture comprises a CNN backbone for feature extraction, a Transformer encoder for capturing global contextual features, and Conformer blocks that balance local and global representations. To address the class imbalance in HAM10000, a weighted cross-entropy loss is used during training. The model achieves a validation accuracy of 96.8% and a melanoma recall of 97.3%.
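As a concrete illustration of the class-imbalance handling mentioned above, the sketch below shows one common way to construct a frequency-weighted cross-entropy loss in PyTorch. The framework choice, the helper name `make_weighted_ce`, and the inverse-frequency weighting scheme are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of a weighted cross-entropy loss
# for an imbalanced 7-class dataset such as HAM10000, assuming PyTorch.
import torch
import torch.nn as nn

def make_weighted_ce(train_labels: torch.Tensor, num_classes: int = 7) -> nn.CrossEntropyLoss:
    """Build a cross-entropy loss whose per-class weights are inversely
    proportional to class frequency, so rare classes (e.g., melanoma)
    contribute more strongly to the gradient."""
    counts = torch.bincount(train_labels, minlength=num_classes).float()
    weights = counts.sum() / (num_classes * counts)  # inverse-frequency weighting
    return nn.CrossEntropyLoss(weight=weights)

# Hypothetical usage inside a training step:
# criterion = make_weighted_ce(train_labels)
# loss = criterion(model(images), labels)
```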