IEEE Access, 2026 (SCI-Expanded, Scopus)
Accurate 3D medical image segmentation must reconcile local boundary fidelity with global anatomical context while remaining robust across scanners, imaging modalities, and lesion scales. In this study, we propose MAGNUS, a unified hybrid architecture that combines a convolutional neural network encoder, a Vision Transformer branch for global context, and a decoder with squeeze-and-excitation gating and deep supervision. Deep features from the convolutional and transformer paths are aligned by bidirectional cross-attention and enriched by a parallel multi-kernel, scale-adaptive 3D convolution module at the deepest level. We evaluate MAGNUS on three public benchmarks: ATLAS v2.0 brain magnetic resonance imaging with T1-weighted scans, ISLES’22 ischemic stroke diffusion-weighted and apparent diffusion coefficient imaging, and PANTHER pancreatic magnetic resonance imaging. All experiments share a unified preprocessing pipeline with patient-wise five-fold cross-validation and ensemble inference. MAGNUS achieves mean Dice scores of 0.549 on ATLAS v2.0, 0.695 on ISLES’22, and 0.833 on PANTHER, and it consistently reduces boundary error, measured by the 95th-percentile Hausdorff distance and the average symmetric surface distance, compared with state-of-the-art 3D segmentation methods. These results indicate that deep-scale cross-attention, explicit multi-scale convolution, and evidence-guided decoding together provide a practical recipe for robust 3D segmentation across different anatomies and imaging protocols.
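The abstract does not include implementation details, but the bidirectional cross-attention fusion it describes can be illustrated with a minimal, hypothetical sketch: single-head scaled dot-product cross-attention in NumPy, where each branch's deepest features query the other branch. All names, dimensions, and the random projections here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, d, seed=0):
    # Single-head scaled dot-product cross-attention (sketch):
    # queries come from one branch, keys/values from the other.
    # Random projection weights stand in for learned parameters.
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((q_feats.shape[-1], d)) / np.sqrt(q_feats.shape[-1])
    Wk = rng.standard_normal((kv_feats.shape[-1], d)) / np.sqrt(kv_feats.shape[-1])
    Wv = rng.standard_normal((kv_feats.shape[-1], d)) / np.sqrt(kv_feats.shape[-1])
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)  # (n_q, n_kv)
    return attn @ V

# Toy deepest-level features: 64 flattened CNN tokens and
# 64 transformer tokens, each with 32 channels (illustrative sizes).
cnn = np.random.default_rng(1).standard_normal((64, 32))
vit = np.random.default_rng(2).standard_normal((64, 32))

cnn_enriched = cross_attention(cnn, vit, d=32)  # CNN queries global context
vit_enriched = cross_attention(vit, cnn, d=32)  # transformer queries local detail
fused = np.concatenate([cnn_enriched, vit_enriched], axis=-1)
print(fused.shape)  # (64, 64)
```

In a real 3D network the tokens would be flattened voxel features at the bottleneck, the projections would be learned, and the fused tensor would feed the multi-kernel convolution module and the decoder.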