An Empirical Study on Factors of Influence for Single-Frame Supervised Temporal Action Detection
Abstract
Owing to the substantial time and labor demands of video annotation for fully-supervised temporal action detection (TAD), extensive research has been devoted to weakly-supervised TAD. However, existing weakly-supervised TAD approaches still suffer from severe localization errors due to the absence of fine-grained frame-level annotations. To tackle this issue, single-frame supervised TAD has recently been proposed as a potential solution. This paper does not introduce a new approach. Instead, it presents an empirical study of the factors that influence single-frame supervised TAD, which have not yet been studied and therefore remain unclear. We go back to basics and investigate the effects of several fundamental components on the performance of single-frame supervised TAD: 1) feature extraction, 2) feature modeling, 3) temporal embedding, 4) classification head, and 5) video-level classification loss. In this investigation, we explore the potential of traditional technical solutions for single-frame supervised TAD and reveal benefits of these solutions that have not yet been reported to the research community. Based on the findings, we build a baseline detector that achieves state-of-the-art performance. To compensate for the limitations of mAP (mean average precision), the performance evaluation reports both mAP and VCCR (video-level classification correctness rate); VCCR is a supplementary metric that supports mAP. We hope that our work can facilitate future research in this field.
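The abstract names VCCR without defining it. As a rough illustration only, the sketch below assumes that VCCR is the fraction of videos whose predicted set of action classes exactly matches the ground-truth set. This definition, the function name `vccr`, and the dictionary layout are assumptions made for the example and are not taken from the paper.

```python
from typing import Dict, Set


def vccr(predicted: Dict[str, Set[str]], ground_truth: Dict[str, Set[str]]) -> float:
    """Video-level classification correctness rate (assumed definition).

    A video counts as correct when its predicted class set equals its
    ground-truth class set. Videos missing from `predicted` are treated
    as having an empty prediction.
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one video")
    correct = sum(
        1
        for video_id, labels in ground_truth.items()
        if predicted.get(video_id, set()) == labels
    )
    return correct / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical video IDs and class names, for illustration only.
    gt = {"v1": {"HighJump"}, "v2": {"Diving"}, "v3": {"Shotput"}}
    pred = {"v1": {"HighJump"}, "v2": {"Diving", "CliffDiving"}, "v3": {"Shotput"}}
    print(f"VCCR = {vccr(pred, gt):.3f}")
```

Under this reading, a detector can localize well on the videos it classifies correctly and still score poorly on VCCR, which is one way such a metric could complement mAP.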

