Abstract
Accurate brain tumor segmentation is clinically important for personalized radiotherapy dose planning and prognostic evaluation, yet existing methods face a fundamental trade-off between global context modeling and computational efficiency. U-Net variants partially compensate for limited receptive fields through skip connections, but the local inductive bias of the conventional convolutions in their encoders limits robustness when segmenting heterogeneous tumor boundaries, particularly in glioma infiltration zones. Transformer architectures capture long-range dependencies effectively, but their quadratic computational complexity and large parameter counts hinder clinical deployment. This study proposes DCLA-UNet, a lightweight architecture that balances segmentation accuracy and computational efficiency through three key components: 1) the Dynamic Cross-Layer Compressive Attention (DCLA) module, which enables cross-layer interaction of multi-scale spatial features via hierarchical channel compression and adaptive dynamic downsampling; 2) the Slim Large Kernel Module (SLKM) in the encoder, which combines large-kernel depthwise convolution with axial depthwise convolution to expand the receptive field while reducing computational cost; and 3) the Multi-Scale Fusion Module (MSFM) in the decoder, which uses parallel multi-branch feature reconstruction pathways to integrate multi-scale features. The proposed architecture contains only 0.527M trainable parameters (a 94.96% reduction vs. 3D U-Net) and requires 39.232 GFLOPs (an 89.7% reduction). On the BraTS2021 benchmark, our method achieves a mean Dice Similarity Coefficient of 0.88 (+2.09% vs. 3D U-Net) and an average HD95 of 6.359 (an 8.9% improvement), outperforming lightweight segmentation networks including MogaNet and SegFormer3D. The source code is available at https://github.com/Helium-327/DCLA-UNet-3D
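The cost argument behind the SLKM (large-kernel depthwise plus axial depthwise convolution) can be illustrated by counting convolution weights. The following is a minimal sketch under stated assumptions, not the authors' SLKM: the helper names, the channel count C = 32 and the kernel size k = 7 are hypothetical, only bias-free convolution weights are counted, and FLOPs and the remaining layers of the module are ignored.

```python
from math import prod


def conv3d_weights(c_in, c_out, kernel, groups=1):
    """Number of weights in a bias-free 3D convolution.

    kernel is a (kd, kh, kw) tuple; groups=c_in gives a depthwise conv.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # each output channel is connected to c_in // groups input channels
    return c_out * (c_in // groups) * prod(kernel)


def dense_cube(c, k):
    # standard k x k x k convolution mapping c channels to c channels
    return conv3d_weights(c, c, (k, k, k))


def depthwise_cube(c, k):
    # depthwise k x k x k convolution: one 3D filter per channel
    return conv3d_weights(c, c, (k, k, k), groups=c)


def axial_depthwise(c, k):
    # three depthwise 1-D convolutions, one along each spatial axis
    axes = [(k, 1, 1), (1, k, 1), (1, 1, k)]
    return sum(conv3d_weights(c, c, shape, groups=c) for shape in axes)


if __name__ == "__main__":
    C, K = 32, 7  # hypothetical channel count and kernel size
    for name, fn in [("dense cube", dense_cube),
                     ("depthwise cube", depthwise_cube),
                     ("axial depthwise", axial_depthwise)]:
        print(f"{name:16s} {fn(C, K):>8,d} weights")
```

With C = 32 and k = 7 this prints 351,232, 10,976 and 672 weights respectively. Making the convolution depthwise divides the weight count by C, and factorizing a k x k x k kernel into three axial 1-D kernels replaces the k^3 term with 3k, so the axial branch grows only linearly as the kernel (and hence the receptive field) is enlarged.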