Search engine for discovering works of Art, research articles, and books related to Art and Culture

Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking

View through CrossRef
Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown their robustness and accuracy on many challenging benchmarks. However, a large number of videos with precise, dense annotations is required for fully supervised training of their models. Since annotating videos frame by frame is a labor- and time-consuming workload, reducing the reliance on manual annotations during tracking-model training is an important problem to resolve. To trade off annotation cost against tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Since our method enables the model to explore valuable visual information from unlabeled frames and to calculate co-salient attention maps based on multiple frames, our weakly supervised method obtains competitive performance compared to fully supervised baseline trackers while using only 3.33% of the manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate its effectiveness. Furthermore, we also demonstrate the advantages of our method on the egocentric tracking task: our weakly supervised method obtains 0.538 success on TREK-150, surpassing the prior state-of-the-art fully supervised tracker by 7.7%.
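The abstract does not give implementation details, but the general idea of a co-salient attention map can be sketched roughly: regions whose features recur across multiple frames score high, while frame-specific background scores low. The snippet below is a minimal illustration of that idea only, not the paper's actual method; the function name `cosaliency_attention`, the use of best-match cosine similarity, and the min–max normalization are all assumptions made for this sketch.

```python
import numpy as np

def cosaliency_attention(feats):
    """Toy co-salient attention: feats is a list of (C, H, W) feature maps
    from multiple frames. For each location in frame i, score it by its
    best cosine match against every other frame, then average and
    min-max normalize, yielding one (H, W) attention map per frame."""
    maps = []
    n = len(feats)
    for i, fi in enumerate(feats):
        C, H, W = fi.shape
        qi = fi.reshape(C, -1)  # (C, H*W) column per spatial location
        qi = qi / (np.linalg.norm(qi, axis=0, keepdims=True) + 1e-8)
        score = np.zeros(qi.shape[1])
        for j, fj in enumerate(feats):
            if j == i:
                continue
            kj = fj.reshape(fj.shape[0], -1)
            kj = kj / (np.linalg.norm(kj, axis=0, keepdims=True) + 1e-8)
            sim = qi.T @ kj          # pairwise cosine similarities
            score += sim.max(axis=1) # best match in the other frame
        att = score / (n - 1)
        att = (att - att.min()) / (att.max() - att.min() + 1e-8)
        maps.append(att.reshape(H, W))
    return maps
```

A feature that appears identically in several frames (e.g. the tracked target) gets cosine similarity near 1 in every other frame and therefore dominates the map, which is the intuition the abstract appeals to when mining unlabeled frames.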

Related Results

Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
Neural Correlates of High-Level Visual Saliency Models
Abstract: Visual saliency highlights regions in a scene that are most relevant to an observer. The process by which a saliency map is formed has been a crucial subject of investigati...
A Dynamic Bottom-Up Saliency Detection Method for Still Images
Abstract: Introduction: Existing saliency detection algorithms in the literature have ignored the importance of time. They create a static saliency map for the whole recording time. Ho...
Review of Visual Saliency Prediction: Development Process from Neurobiological Basis to Deep Models
The human attention mechanism can be understood and simulated by closely associating the saliency prediction task to neuroscience and psychology. Furthermore, saliency prediction i...
The Relationship between Nutrition-Label Knowledge and Nutrition-Label Reading Habits among Students of SMA Al-Islam
Background: Few consumers are able to understand and use nutrition labels according to their function. This is because public awareness is still low regarding the import...
Saliency detection using adaptive background template
Since most existing saliency detection models are not suitable for the condition in which the salient objects are near the image border, the authors propose a saliency detection app...
Fuze Well Mechanical Interface
This interface standard applies to fuzes used in airborne weapons that use a 3-Inch Fuze Well. It defin...
Abstract PO-038: Improving lung cancer survival analysis from CT images by saliency sampling
Abstract: Background: Survival analysis of the patient has an important role in the cancer treatment process. Traditional models based on clinical information, signs,...
