WS-AM: Weakly Supervised Attention Map for Scene Recognition
Convolutional neural networks (CNNs) have recently achieved great success in scene recognition. Compared with traditional hand-crafted features, CNN features are more robust and generalize better. However, existing CNN-based scene recognition methods do not sufficiently account for the relationship between image regions and categories when choosing local regions, which yields many redundant local regions and degrades recognition accuracy. In this paper, we propose an effective method for finding the discriminative regions of a scene image. Our method uses the gradient-weighted class activation mapping (Grad-CAM) technique and weakly supervised information to generate an attention map (AM) of the scene image, which we call the weakly supervised attention map (WS-AM). Regions of the AM in which both the local mean and the local center value are large correspond to the discriminative regions that help scene recognition. We sample discriminative regions at multiple scales and extract the features of large-scale and small-scale regions with two different pre-trained CNNs, respectively. The features from the two scales are aggregated by improved vector of locally aggregated descriptors (VLAD) coding and max pooling, respectively. Finally, a pre-trained CNN extracts the global feature of the image from the fully connected (fc) layer, and the local features are combined with the global feature to obtain the image representation. We validated our method on three benchmark datasets, MIT Indoor 67, Scene 15, and UIUC Sports, and obtained accuracies of 85.67%, 94.80%, and 95.12%, respectively. Compared with some state-of-the-art methods, WS-AM requires fewer local regions and therefore has better real-time performance.
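The region-selection criterion described in the abstract (keep regions whose local mean and local center value in the AM are both large) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `select_regions`, the fixed square window, the two thresholds, and the ranking by local mean are all assumptions made for illustration, and the attention map is taken as an already-computed 2-D array rather than derived from Grad-CAM.

```python
import numpy as np


def select_regions(attention_map, patch=3, top_k=5,
                   mean_thresh=0.5, center_thresh=0.5):
    """Pick candidate discriminative regions from a 2-D attention map.

    Hypothetical sketch: a window is kept only if both its mean value and
    its center value reach the given thresholds (assumed parameters).
    Returns the top-left (row, col) corners of up to `top_k` windows,
    ranked by local mean in descending order.
    """
    height, width = attention_map.shape
    half = patch // 2
    candidates = []
    for r in range(height - patch + 1):
        for c in range(width - patch + 1):
            window = attention_map[r:r + patch, c:c + patch]
            local_mean = float(window.mean())
            center_value = float(window[half, half])
            # Both conditions must hold, mirroring "local mean and local
            # center value are both large" in the abstract.
            if local_mean >= mean_thresh and center_value >= center_thresh:
                candidates.append((local_mean, r, c))
    # Stable sort: windows with equal means keep raster-scan order.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [(r, c) for _, r, c in candidates[:top_k]]


if __name__ == "__main__":
    # Toy attention map with one bright 3x3 blob.
    am = np.zeros((8, 8))
    am[2:5, 3:6] = 1.0
    print(select_regions(am, patch=3, top_k=1))
```

In a full pipeline, the selected windows would be cropped from the image at several scales and passed to the feature extractors; those steps are omitted here because the abstract does not specify their details.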
Related Results
Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
An Empirical Study on Factors of Influence for Single-Frame Supervised Temporal Action Detection
Owing to the substantial time and labor demands associated with video annotation for fully-supervised temporal action detection (TAD), extensive research has been ...
Mobile Phone Indoor Scene Recognition Location Method Based on Semantic Constraint of Building Map
At present, indoor localization is one of the core technologies of location-based services (LBS), and there exist numerous scenario-oriented application solutions. Visual features,...
The Predictive Value of MAP and ETCO2 Changes After Emergency Endotracheal Intubation for Severe Cardiovascular Collapse
Objective: To analyze the changes in mean arterial pressure (MAP) and end-tidal CO2 (ETCO2) in patients after emergency endotracheal intubation (ETI). To explore t...
Strong Representation Learning for Weakly Supervised Object Detection
To solve the problem that the feature maps generated by feature extraction network of traditional weakly supervised learning object detection algorithm is not strong in feature, an...
On Weakly S-Primary Ideals of Commutative Rings
Let R be a commutative ring with identity and S be a multiplicatively closed subset of R. The purpose of this paper is to introduce the concept of weakly S-primary ideals as a new ...
Weakly 2‐Absorbing Ideals in Almost Distributive Lattices
The concepts of weakly 2‐absorbing ideal and weakly 1‐absorbing prime ideal in an almost distributive lattice (ADL) are introduced, and the necessary conditions for a weakly 1‐abso...
Underwater Acoustic Target Recognition Based on Supervised Feature-Separation Algorithm
For the purpose of improving the accuracy of underwater acoustic target recognition with only a small number of labeled data, we proposed a novel recognition method, including 4 st...

