Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation

Mar 23, 2023

1 Motivation

Automatic segmentation of medical images is in high demand for clinical diagnosis and treatment planning. CNNs and Transformers are both widely used for this task. However, CNNs struggle to capture long-range dependencies, while Transformers are computationally expensive on volumetric data. In this work, we propose PHNet, a novel permutable hybrid network for volumetric medical image segmentation that capitalizes on the strengths of both CNNs and MLPs and achieves a good balance between efficiency and effectiveness, as shown in Figure 1(a).

2 Method

PHNet adopts an encoder-decoder paradigm, as illustrated in Figure 1(b). The encoder consists of a 2.5D convolution module and an MLPP module. The 2.5D convolution module extracts local features, and its output feature maps are forwarded to the MLPP module, which captures global features. The decoder then processes the hierarchical features to produce the prediction.
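
The local-feature stage can be illustrated with a minimal NumPy sketch. It assumes that "2.5D convolution" means in-plane (1 x 3 x 3) convolutions followed by volumetric (3 x 3 x 3) convolutions, a reading suggested by the conclusion's mention of 2D and 3D CNNs. The exact arrangement in PHNet is not stated here, and the function names, single-channel input, kernel sizes, and lack of padding or nonlinearity are all illustrative simplifications rather than the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(volume, kernel):
    """'Valid' cross-correlation of a (D, H, W) volume with a (kd, kh, kw) kernel."""
    # windows has shape (D-kd+1, H-kh+1, W-kw+1, kd, kh, kw)
    windows = sliding_window_view(volume, kernel.shape)
    return np.einsum("dhwijk,ijk->dhw", windows, kernel)


def conv25d_sketch(volume, inplane_kernel, volumetric_kernel):
    """Hypothetical 2.5D stage: in-plane convolution, then volumetric convolution."""
    # In-plane step: a 1 x kh x kw kernel only sees one slice at a time.
    x = conv3d_valid(volume, inplane_kernel[None, :, :])
    # Volumetric step: a kd x kh x kw kernel also mixes neighbouring slices.
    return conv3d_valid(x, volumetric_kernel)


volume = np.ones((8, 16, 16))                 # toy (depth, height, width) volume
inplane_kernel = np.full((3, 3), 1.0 / 9.0)   # 2D averaging kernel
volumetric_kernel = np.full((3, 3, 3), 1.0 / 27.0)  # 3D averaging kernel
features = conv25d_sketch(volume, inplane_kernel, volumetric_kernel)
```

In this sketch the in-plane step leaves the depth axis untouched and only the volumetric step shrinks it, which is the distinction between the two kinds of convolution.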

Although CNNs can model long-range dependencies through deep stacks of convolution layers, MLPs are better at learning global context. Motivated by this, we design MLPP, depicted in Figure 2, to acquire global information in the deep layers of the encoder. MLPP decomposes the learning of in-plane features (IP-MLP) and through-plane features (TP-MLP) into sequential steps. To facilitate communication between cross-axis tokens, we further propose an auxiliary attention branch (AA-MLP) in IP-MLP.
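
The sequential in-plane and through-plane decomposition can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' MLPP block. It assumes that each step mixes tokens along individual spatial axes with one linear layer, that the two in-plane branches are summed with a residual connection, and it omits the AA-MLP branch entirely. The names `axis_mlp`, `mlpp_sketch`, and `init_params` are hypothetical.

```python
import numpy as np


def axis_mlp(x, weight, bias, axis):
    """Mix tokens along one axis of x with a single linear layer."""
    moved = np.moveaxis(x, axis, -1)       # put the target axis last
    mixed = moved @ weight + bias          # (..., L) @ (L, L) -> (..., L)
    return np.moveaxis(mixed, -1, axis)    # restore the original layout


def mlpp_sketch(x, params):
    """x has shape (D, H, W, C); params holds hypothetical per-axis weights."""
    # In-plane step (assumed): mix along height (axis 1) and width (axis 2).
    x = (x
         + axis_mlp(x, *params["height"], axis=1)
         + axis_mlp(x, *params["width"], axis=2))
    # Through-plane step (assumed): mix along depth (axis 0) afterwards.
    return x + axis_mlp(x, *params["depth"], axis=0)


def init_params(d, h, w, seed=0):
    """Small random weights and zero biases for each spatial axis."""
    rng = np.random.default_rng(seed)

    def linear(n):
        return rng.normal(scale=0.02, size=(n, n)), np.zeros(n)

    return {"depth": linear(d), "height": linear(h), "width": linear(w)}


tokens = np.ones((4, 6, 5, 3))             # toy (D, H, W, C) feature map
out = mlpp_sketch(tokens, init_params(4, 6, 5))
```

Mixing along one axis at a time uses weight matrices of size D x D, H x H, and W x W, rather than one matrix over all D·H·W tokens. This is one way an axis-wise design can keep the cost of global mixing low.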

3 Experiments

As shown in Figure 3(a), PHNet achieves state-of-the-art performance on the public Synapse benchmark, demonstrating the effectiveness of its local-to-global modeling. Figure 3(b) further highlights that PHNet balances efficiency and effectiveness better than CNN- and Transformer-based methods.

4 Conclusion

This work introduced PHNet, a permutable hybrid network designed specifically for volumetric medical image segmentation. By integrating 2D CNN, 3D CNN, and MLP components, PHNet effectively captures both local and global features. We also proposed a permutable MLP block that addresses spatial information loss and reduces the computational burden. Experimental results on four public datasets demonstrate the superiority of PHNet over state-of-the-art approaches. Future research will extend the framework to other medical image analysis tasks, such as disease diagnosis and localization, and further examine the interactions and effectiveness of CNNs, Transformers, and MLPs.