Javascript must be enabled to continue!

A Deep Ensemble Learning Approach Based on a Vision Transformer and Neural Network for Multi-Label Image Classification

Convolutional Neural Networks (CNNs) have proven to be very effective in image classification due to their status as a powerful feature learning algorithm. Traditional approaches have considered the problem of multiclass classification, where the goal is to classify a set of objects at once. However, co-occurrence can make the discriminative features of the target less salient and may lead to overfitting of the model, resulting in lower performance. To address this, we propose a multi-label classification ensemble model including a Vision Transformer (ViT) and CNN for directly detecting one or multiple objects in an image. First, we improve the MobileNetV2 and DenseNet201 models using extra convolutional layers to strengthen image classification. In detail, three convolution layers are applied in parallel at the end of both models. ViT can learn dependencies among distant positions and local detail, making it an effective tool for multi-label classification. Finally, an ensemble learning algorithm is used to combine the classification predictions of the ViT, the modified MobileNetV2, and DenseNet201 bands for increased image classification accuracy using a voting system. The performance of the proposed model is examined on four benchmark datasets, achieving accuracies of 98.24%, 98.89%, 99.91%, and 96.69% on ASCAL VOC 2007, PASCAL VOC 2012, MS-COCO, and NUS-WIDE 318, respectively, showing that our framework can enhance current state-of-the-art methods.

MDPI AG

Anas W. Abulfaraj Faisal Binzagr

Big Data and Cognitive Computing

2025

Title: A Deep Ensemble Learning Approach Based on a Vision Transformer and Neural Network for Multi-Label Image Classification

Description:

Convolutional Neural Networks (CNNs) have proven to be very effective in image classification due to their status as a powerful feature learning algorithm.

Traditional approaches have considered the problem of multiclass classification, where the goal is to classify a set of objects at once.

However, co-occurrence can make the discriminative features of the target less salient and may lead to overfitting of the model, resulting in lower performance.

To address this, we propose a multi-label classification ensemble model including a Vision Transformer (ViT) and CNN for directly detecting one or multiple objects in an image.

First, we improve the MobileNetV2 and DenseNet201 models using extra convolutional layers to strengthen image classification.

In detail, three convolution layers are applied in parallel at the end of both models.

ViT can learn dependencies among distant positions and local detail, making it an effective tool for multi-label classification.

Finally, an ensemble learning algorithm is used to combine the classification predictions of the ViT, the modified MobileNetV2, and DenseNet201 bands for increased image classification accuracy using a voting system.

The performance of the proposed model is examined on four benchmark datasets, achieving accuracies of 98.

24%, 98.

89%, 99.

91%, and 96.

69% on ASCAL VOC 2007, PASCAL VOC 2012, MS-COCO, and NUS-WIDE 318, respectively, showing that our framework can enhance current state-of-the-art methods.

Back

Related Results

Depth-aware salient object segmentation

Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...

Automatic Load Sharing of Transformer

Transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. The transformer is excessive full, its windings are overheated which lea...

Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)

BACKGROUND As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...

High frequency modeling of power transformers under transients

This thesis presents the results related to high frequency modeling of power transformers. First, a 25kVA distribution transformer under lightning surges is tested in the laborator...

Deep convolutional neural network and IoT technology for healthcare

Background Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find complex patterns in ...

Neural Networks for Quality Sorting of Agricultural Produce

The objectives of this project were to develop procedures and models, based on neural networks, for quality sorting of agricultural produce. Two research teams, one in Purdue Unive...

Deep Neural Ensemble Classification for COVID-19 Dataset

The COVID-19 pandemic has necessitated the development of accurate and efficient classification models for diagnosis and prognosis. While deep learning has shown promising results ...

BMFS: Bidirectional weighted approach for multi-label feature selection algorithm

Abstract Shortcomings of the existing multi-label feature selection algorithms, such as non-considering the correlation of label space, ignoring the possible difference of ...

Email:
Password:

Email: