Innovative fusion of Mamba and CNN for accurate skin lesion segmentation
Abstract
Computer-aided diagnosis (CAD) plays a critical role in the early identification and assessment of skin malignancies. Accurate segmentation of skin lesions is pivotal to the efficacy of CAD systems and significantly influences overall system accuracy. Despite recent advancements, precise segmentation remains challenging because of the inherent variability of lesions, including variations in color, texture, shape, and boundary definition, as well as differences in lighting conditions. While convolutional neural networks (CNNs) are broadly acknowledged for their effectiveness in capturing local image features, they face inherent limitations in capturing long-range dependencies. The recently developed Mamba architectures address this issue by employing efficient state space models to capture these dependencies more effectively. We introduce VMISeg, a novel hybrid deep learning model that integrates CNN (InceptionV3) and Mamba (VMamba) encoders, unifying the strengths of the two architectures. VMISeg employs a multilevel feature fusion scheme that uses a Feature Fusion Module (FFM) and innovative connections to fuse global contextual information with local features efficiently. Furthermore, the architecture incorporates a CNN-based decoder comprising upsampling layers and Inception blocks that mirror the structure of the InceptionV3 encoder. The model was evaluated on the International Skin Imaging Collaboration (ISIC) datasets, where VMISeg achieved Dice scores of 93.4% on ISIC-2016, 89.3% on ISIC-2017, and 92.0% on ISIC-2018. Our findings demonstrate that the proposed hybrid model performs competitively with state-of-the-art methods and achieves a notable enhancement in accuracy.
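For readers unfamiliar with the reported metric, the Dice score between a predicted mask P and a ground-truth mask G is commonly defined as 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that standard definition for binary masks, not the evaluation code used for VMISeg; the function name `dice_score`, the convention of returning 1.0 when both masks are empty, and the assumption that predictions are already thresholded to binary values are illustrative choices that the abstract does not specify.

```python
import numpy as np


def dice_score(pred_mask, true_mask):
    """Dice coefficient between two binary masks of the same shape.

    Assumption: both inputs are already binarized (nonzero means lesion).
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)


# Toy 2x2 example: one overlapping pixel, three foreground pixels in total.
pred = np.array([[1, 1], [0, 0]])
true = np.array([[1, 0], [0, 0]])
score = dice_score(pred, true)  # 2 * 1 / (2 + 1)
```

A dataset-level figure such as those reported above is typically obtained by aggregating this quantity over all test images; whether the aggregation is a per-image mean or a pooled count is a protocol detail not stated in the abstract.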