Transformer-CNN hybrid network for crowd counting
Efficient feature representation is key to improving crowd counting performance. CNNs and Transformers are the two feature extraction frameworks most commonly used in crowd counting. A CNN excels at hierarchically extracting local features to obtain a multi-scale representation of the image, but it struggles to capture global features. A Transformer, on the other hand, captures global feature representations by using cascaded self-attention to model long-range dependencies, but it often overlooks local detail. Relying solely on either a CNN or a Transformer for crowd counting is therefore limiting. In this paper, we propose TCHNet, a crowd counting model that combines the CNN and Transformer frameworks. The model employs the CMT (CNNs Meet Vision Transformers) backbone network as its Feature Extraction Module (FEM), which hierarchically extracts local and global crowd features using a combination of convolution and self-attention. To obtain more comprehensive local spatial information, an improved Progressive Multi-scale Learning Process (PMLP) is introduced into the FEM, guiding the network to learn at three different granularity levels. The features from these three granularity levels are then fed into the Multi-scale Feature Aggregation Module (MFAM) for fusion. Finally, a Multi-Scale Regression Module (MSRM) processes the fused multi-scale features, yielding crowd features rich in both high-level semantics and low-level detail. Experimental results on five benchmark datasets demonstrate that TCHNet achieves highly competitive performance compared with several popular crowd counting methods.
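The contrast between local convolution and global self-attention described above can be illustrated with a small, self-contained sketch. It is not the TCHNet or CMT implementation: it assumes a toy 1D signal of scalar tokens, a fixed 3-tap smoothing kernel, and single-head attention with identity projections, and all function names (`conv1d_same`, `self_attention`, `changed_positions`) are hypothetical. The sketch perturbs only the last token and reports which output positions react.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation; output has the same length as x."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def self_attention(x):
    """Single-head self-attention over scalar tokens.

    Assumption for illustration: query = key = value = x (identity projections).
    """
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def changed_positions(before, after, tol=1e-12):
    """Indices where two equally long sequences differ by more than tol."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if abs(a - b) > tol]


signal = [0.5, 1.0, 0.2, 0.8, 0.3, 0.9, 0.4, 0.7]
perturbed = list(signal)
perturbed[-1] += 1.0  # change only the last token

kernel = [0.25, 0.5, 0.25]  # hypothetical smoothing kernel
conv_changed = changed_positions(conv1d_same(signal, kernel),
                                 conv1d_same(perturbed, kernel))
attn_changed = changed_positions(self_attention(signal),
                                 self_attention(perturbed))

print("conv outputs affected:", conv_changed)       # only the kernel's neighborhood
print("attention outputs affected:", attn_changed)  # every position
```

With a 3-tap kernel, the perturbation reaches only the two output positions whose receptive field covers the last token, whereas every attention output changes because each token attends to all others. A hybrid backbone of the kind the abstract describes aims to combine both behaviors.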
SAGE Publications