
ViT-UperNet: a hybrid vision transformer with unified-perceptual-parsing network for medical image segmentation

Existing image semantic segmentation models have low accuracy in detecting tiny targets or multiple targets in overlapping regions. This work proposes a hybrid vision transformer with a unified-perceptual-parsing network (ViT-UperNet) for medical image segmentation. A self-attention mechanism is embedded in a vision transformer to extract multi-level features: the image features are extracted hierarchically, from low to high dimensions, by four groups of Transformer blocks, each group containing a different number of blocks. A unified-perceptual-parsing network built on a feature pyramid network (FPN) and a pyramid pooling module (PPM) then fuses the multi-scale contextual features and performs the semantic segmentation. The FPN naturally exploits the hierarchical features and produces strong semantic information at all scales. The PPM makes better use of global prior knowledge to understand complex scenes, extracting features with global context information that improve the segmentation results. During training, a scalable self-supervised learner, the masked autoencoder (MAE), is used for pre-training; this strengthens the visual representation ability and makes feature learning more efficient. Experiments are conducted on cardiac magnetic resonance image segmentation, with the left and right atria and ventricles as the segmentation targets. ViT-UperNet achieves a pixel accuracy of 93.85%, a Dice coefficient of 92.61%, and a Hausdorff distance of 11.16, all improvements over the compared methods. The results show the superiority of ViT-UperNet in medical image segmentation, especially for targets that are difficult to recognize or severely occluded.
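The self-attention step mentioned in the abstract can be illustrated with a minimal single-head scaled dot-product attention sketch in pure Python. The function names and the use of one head with explicit weight matrices `wq`, `wk`, `wv` are illustrative assumptions, not the paper's implementation, which the abstract does not describe.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Softmax with the max subtracted first for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention over token vectors.

    x has one row per token (e.g. per image patch); every output row is a
    weighted average of the value vectors of all tokens.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wk[0]))               # sqrt(d_k)
    k_t = [list(col) for col in zip(*k)]        # transpose of K
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)
```

Because every token attends to every other token, each output row mixes information from the whole image, which is what lets a transformer capture long-range context that small convolution kernels miss.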
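The abstract states only that four groups of Transformer blocks, each with a different number of blocks, extract features from low to high dimensions. The sketch below tracks the feature-map shape at each stage under an assumed Swin-style design in which every new stage halves the spatial resolution and doubles the channel count. The depths (2, 2, 6, 2), patch size 4, and base width 96 are placeholder values borrowed from Swin-T, not the configuration used by ViT-UperNet.

```python
from typing import List, NamedTuple, Sequence


class StageShape(NamedTuple):
    stage: int
    num_blocks: int
    height: int
    width: int
    channels: int


def hierarchical_shapes(
    img_h: int,
    img_w: int,
    depths: Sequence[int] = (2, 2, 6, 2),  # assumed, Swin-T style
    patch: int = 4,                         # assumed patch size
    embed_dim: int = 96,                    # assumed base channel width
) -> List[StageShape]:
    """Feature-map shape produced by each group of Transformer blocks."""
    h, w, c = img_h // patch, img_w // patch, embed_dim
    shapes = []
    for i, depth in enumerate(depths, start=1):
        if i > 1:
            # Patch merging between stages: halve resolution, double channels.
            h, w, c = h // 2, w // 2, c * 2
        shapes.append(StageShape(i, depth, h, w, c))
    return shapes


for s in hierarchical_shapes(224, 224):
    print(s)
```

For a 224 x 224 input this yields maps at strides 4, 8, 16, and 32 with 96, 192, 384, and 768 channels, the kind of multi-scale set a UperNet-style head consumes.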
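A feature pyramid network combines the hierarchical feature maps with a top-down pathway: the coarsest map is upsampled and added to the next finer one, and so on, so that high-resolution levels inherit strong semantics. The sketch below uses single-channel maps and nearest-neighbour 2x upsampling, and it omits the 1x1 lateral and 3x3 smoothing convolutions of a full FPN; these simplifications are assumptions for illustration. In the original UperNet design, the PPM output serves as the coarsest level of this top-down pass.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: every value becomes a 2x2 block."""
    return [[v for v in row for _ in range(2)] for row in x for _ in range(2)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(features: List[Grid]) -> List[Grid]:
    """Simplified top-down FPN merge.

    `features` are backbone outputs ordered from finest to coarsest, each
    level half the height and width of the previous one. Returns the merged
    maps in the same finest-to-coarsest order.
    """
    merged = [features[-1]]  # the coarsest level starts the top-down pass
    for lateral in reversed(features[:-1]):
        # Upsample the coarser merged map and add the same-scale lateral map.
        merged.append(add(lateral, upsample2x(merged[-1])))
    return merged[::-1]
```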
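The pyramid pooling module gathers global context by average-pooling the feature map into a few coarse grids, resizing each grid back to the input size, and stacking the results with the original features; the 1x1 bin is plain global average pooling, the coarsest form of the global prior. The sketch below works on a single-channel map. The bin sizes (1, 2, 3, 6) follow PSPNet, while the nearest-neighbour resizing and the omission of the per-branch 1x1 convolutions are simplifications assumed here (PSPNet itself uses bilinear upsampling).

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def adaptive_avg_pool(x: Grid, out: int) -> Grid:
    """Average-pool an H x W map into an out x out grid of (possibly overlapping) bins."""
    h, w = len(x), len(x[0])
    pooled = []
    for i in range(out):
        r0, r1 = (i * h) // out, math.ceil((i + 1) * h / out)
        row = []
        for j in range(out):
            c0, c1 = (j * w) // out, math.ceil((j + 1) * w / out)
            cells = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def upsample_nearest(x: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour resize of a small grid back to h x w."""
    sh, sw = len(x), len(x[0])
    return [[x[r * sh // h][c * sw // w] for c in range(w)] for r in range(h)]


def pyramid_pooling(x: Grid, bins: Sequence[int] = (1, 2, 3, 6)) -> List[Grid]:
    """Return the input map plus one resized context map per bin size.

    A full PPM would apply a 1x1 convolution to each pooled branch and
    concatenate all branches along the channel axis.
    """
    h, w = len(x), len(x[0])
    branches = [x]
    for b in bins:
        branches.append(upsample_nearest(adaptive_avg_pool(x, b), h, w))
    return branches
```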
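Masked-autoencoder pre-training hides a large random subset of image patches, encodes only the visible ones, and trains the model to reconstruct the hidden patches, with the reconstruction loss computed on the masked patches only. The sketch below shows the masking step and that masked loss. The 75% mask ratio is the default from the original MAE work and is an assumption here, since the abstract does not state the ratio used for ViT-UperNet; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_masking(
    patches: Sequence[T], mask_ratio: float = 0.75, seed: int = 0
) -> Tuple[List[T], List[int], List[int]]:
    """Randomly split patches into visible and masked sets.

    Returns (visible patches, visible indices, masked indices); only the
    visible patches would be passed to the encoder.
    """
    n = len(patches)
    n_keep = int(n * (1 - mask_ratio))
    order = list(range(n))
    random.Random(seed).shuffle(order)  # seeded for reproducibility
    keep = sorted(order[:n_keep])
    masked = sorted(order[n_keep:])
    return [patches[i] for i in keep], keep, masked


def masked_mse(
    pred: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    masked: Sequence[int],
) -> float:
    """Mean squared reconstruction error over the masked patches only."""
    per_patch = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i in masked
    ]
    return sum(per_patch) / len(per_patch)
```

Encoding only a quarter of the patches is what makes this pre-training cheap relative to running the encoder on the full image.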
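The three reported metrics can be computed from a predicted label mask and a ground-truth label mask as sketched below. The abstract does not say how scores are averaged over the four cardiac structures or whether the Hausdorff distance is in pixels or millimetres; this sketch scores one label at a time and measures distance in pixel units. Returning 1.0 for Dice when both masks lack the label, and infinity for the Hausdorff distance when either mask lacks it, are conventions assumed here.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]


def pixel_accuracy(pred: Mask, gt: Mask) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = correct = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            total += 1
            correct += p == g
    return correct / total


def dice(pred: Mask, gt: Mask, label: int) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for one class label."""
    inter = size_p = size_g = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            size_p += p == label
            size_g += g == label
            inter += p == label and g == label
    if size_p + size_g == 0:
        return 1.0  # assumed convention: both masks empty counts as agreement
    return 2 * inter / (size_p + size_g)


def _points(mask: Mask, label: int) -> List[Tuple[int, int]]:
    """(row, col) coordinates of every pixel carrying `label`."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v == label]


def hausdorff(pred: Mask, gt: Mask, label: int) -> float:
    """Symmetric Hausdorff distance, in pixels, between two label regions."""
    a, b = _points(pred, label), _points(gt, label)
    if not a or not b:
        return math.inf  # assumed convention for a missing structure

    def directed(src, dst):
        # Worst case over src of the distance to the nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))
```

The brute-force search is quadratic in the number of region pixels, which is fine for illustration; for full-size masks a distance transform or boundary-only points would be the usual choice.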

Related Results

Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
Automatic Load Sharing of Transformer
Transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. The transformer is excessive full, its windings are overheated which lea...
Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences
Most natural-language-processing (NLP) tasks suffer performance degradation when encountering long complex sentences, such as semantic parsing, syntactic parsing, machine translati...
AI‐enabled precise brain tumor segmentation by integrating Refinenet and contour‐constrained features in MRI images
Background: Medical image segmentation is a fundamental task in medical image analysis and has been widely applied in multiple medical fields. The latest transformer‐based de...
High frequency modeling of power transformers under transients
This thesis presents the results related to high frequency modeling of power transformers. First, a 25kVA distribution transformer under lightning surges is tested in the laborator...
Colour image segmentation using perceptual colour difference saliency algorithm
The topic of colour image segmentation has been and still is a hot issue in areas such as computer vision and image processing because of its wide range of practical applications. ...
Semantic Graphical Dependence Parsing Model in Improving English Teaching Abilities
It is a very difficult problem to achieve high-order functionality for graphical dependency parsing without growing decoding difficulties. To solve this problem, this article offer...
Review on 2D and 3D MRI Image Segmentation Techniques
Background: Magnetic Resonance Imaging is most widely used for early diagnosis of abnormalities in human organs. Due to the technical advancement in digital image processing, auto...
