ViT-UperNet: a hybrid vision transformer with unified-perceptual-parsing network for medical image segmentation

Abstract: Existing image semantic segmentation models have low accuracy when detecting tiny targets or multiple targets in overlapping regions. This work proposes ViT-UperNet, a hybrid vision transformer with a unified-perceptual-parsing network, for medical image segmentation. A self-attention mechanism is embedded in a vision transformer to extract multi-level features. Image features are extracted hierarchically, from low to high dimensions, using 4 groups of Transformer blocks, each containing a different number of blocks. A unified-perceptual-parsing network based on a feature pyramid network (FPN) and a pyramid pooling module (PPM) then fuses multi-scale contextual features and performs semantic segmentation. FPN naturally exploits hierarchical features and generates strong semantic information at all scales. PPM makes better use of global prior knowledge to understand complex scenes and extracts features with global context information to improve segmentation results. For training, a scalable self-supervised learner, the masked autoencoder, is used for pre-training, which strengthens visual representation ability and improves the efficiency of feature learning. Experiments are conducted on cardiac magnetic resonance image segmentation, with the left and right atria and ventricles selected as segmentation targets. The pixel accuracy is 93.85%, the Dice coefficient is 92.61%, and the Hausdorff distance is 11.16, all improvements over the other methods compared. The results show the superiority of ViT-UperNet in medical image segmentation, especially for low-recognition and seriously occluded targets.
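
The three evaluation metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code: the function names are hypothetical, the Hausdorff distance shown is the plain symmetric variant measured in pixels (the abstract does not say which variant or unit was used), and the brute-force distance matrix is only practical for small masks.

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the reference label."""
    return float((pred == target).mean())

def dice_coefficient(pred, target, label):
    """Dice = 2|A n B| / (|A| + |B|) for a single class label."""
    a = pred == label
    b = target == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)

def hausdorff_distance(pred, target, label):
    """Symmetric Hausdorff distance (pixels) between two class masks."""
    p = np.argwhere(pred == label)
    t = np.argwhere(target == label)
    if len(p) == 0 or len(t) == 0:
        return float("inf")  # assumed convention: undefined if a mask is empty
    # Pairwise Euclidean distances between all foreground pixels (brute force).
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

# Toy 4x4 example: the prediction spills one column past the reference mask.
target = np.zeros((4, 4), dtype=int)
target[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1

print(pixel_accuracy(pred, target))         # 0.875
print(dice_coefficient(pred, target, 1))    # 0.8
print(hausdorff_distance(pred, target, 1))  # 1.0
```

In the toy example, 14 of 16 pixels match, the masks overlap in 4 pixels out of 4 + 6 foreground pixels, and the farthest predicted pixel lies one pixel from the reference mask. For a multi-class task such as the four cardiac structures, one plausible choice is to compute Dice and Hausdorff distance per label and average them, although the abstract does not state how the reported figures were aggregated.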

Springer Science and Business Media LLC

Yang Ruiping Liu Kun Xu Shaohua Yin Jian Zhang Zhen

Complex & Intelligent Systems

2024
