Javascript must be enabled to continue!
A Deep Ensemble Learning Approach Based on a Vision Transformer and Neural Network for Multi-Label Image Classification
View through CrossRef
Convolutional Neural Networks (CNNs) have proven to be very effective in image classification due to their status as a powerful feature learning algorithm. Traditional approaches have considered the problem of multiclass classification, where the goal is to classify a set of objects at once. However, co-occurrence can make the discriminative features of the target less salient and may lead to overfitting of the model, resulting in lower performance. To address this, we propose a multi-label classification ensemble model including a Vision Transformer (ViT) and CNN for directly detecting one or multiple objects in an image. First, we improve the MobileNetV2 and DenseNet201 models using extra convolutional layers to strengthen image classification. In detail, three convolution layers are applied in parallel at the end of both models. ViT can learn dependencies among distant positions and local detail, making it an effective tool for multi-label classification. Finally, an ensemble learning algorithm is used to combine the classification predictions of the ViT, the modified MobileNetV2, and DenseNet201 bands for increased image classification accuracy using a voting system. The performance of the proposed model is examined on four benchmark datasets, achieving accuracies of 98.24%, 98.89%, 99.91%, and 96.69% on ASCAL VOC 2007, PASCAL VOC 2012, MS-COCO, and NUS-WIDE 318, respectively, showing that our framework can enhance current state-of-the-art methods.
Title: A Deep Ensemble Learning Approach Based on a Vision Transformer and Neural Network for Multi-Label Image Classification
Description:
Convolutional Neural Networks (CNNs) have proven to be very effective in image classification due to their status as a powerful feature learning algorithm.
Traditional approaches have considered the problem of multiclass classification, where the goal is to classify a set of objects at once.
However, co-occurrence can make the discriminative features of the target less salient and may lead to overfitting of the model, resulting in lower performance.
To address this, we propose a multi-label classification ensemble model including a Vision Transformer (ViT) and CNN for directly detecting one or multiple objects in an image.
First, we improve the MobileNetV2 and DenseNet201 models using extra convolutional layers to strengthen image classification.
In detail, three convolution layers are applied in parallel at the end of both models.
ViT can learn dependencies among distant positions and local detail, making it an effective tool for multi-label classification.
Finally, an ensemble learning algorithm is used to combine the classification predictions of the ViT, the modified MobileNetV2, and DenseNet201 bands for increased image classification accuracy using a voting system.
The performance of the proposed model is examined on four benchmark datasets, achieving accuracies of 98.
24%, 98.
89%, 99.
91%, and 96.
69% on ASCAL VOC 2007, PASCAL VOC 2012, MS-COCO, and NUS-WIDE 318, respectively, showing that our framework can enhance current state-of-the-art methods.
Related Results
Depth-aware salient object segmentation
Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
Automatic Load Sharing of Transformer
Automatic Load Sharing of Transformer
Transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. The transformer is excessive full, its windings are overheated which lea...
High frequency modeling of power transformers under transients
High frequency modeling of power transformers under transients
This thesis presents the results related to high frequency modeling of power transformers. First, a 25kVA distribution transformer under lightning surges is tested in the laborator...
Deep convolutional neural network and IoT technology for healthcare
Deep convolutional neural network and IoT technology for healthcare
Background Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find complex patterns in ...
Neural Networks for Quality Sorting of Agricultural Produce
Neural Networks for Quality Sorting of Agricultural Produce
The objectives of this project were to develop procedures and models, based on neural networks, for quality sorting of agricultural produce. Two research teams, one in Purdue Unive...
Deep Neural Ensemble Classification for COVID-19 Dataset
Deep Neural Ensemble Classification for COVID-19 Dataset
The COVID-19 pandemic has necessitated the development of accurate and efficient classification models for diagnosis and prognosis. While deep learning has shown promising results ...
Afaan Oromo Multi-Label News Text Classification Using Deep Learning Approach
Afaan Oromo Multi-Label News Text Classification Using Deep Learning Approach
Abstract
Classification is a technique for categorizing textual data into a form of predefined categories. Due to its major consequences in regard to critical tasks such as...
BMFS: Bidirectional weighted approach for multi-label feature selection algorithm
BMFS: Bidirectional weighted approach for multi-label feature selection algorithm
Abstract
Shortcomings of the existing multi-label feature selection algorithms, such as non-considering the correlation of label space, ignoring the possible difference of ...

