
A SAM2-Driven RGB-T Annotation Pipeline with Thermal-Guided Refinement for Semantic Segmentation in Search-and-Rescue Scenes

High-quality RGB–thermal infrared (RGB-T) semantic segmentation datasets are crucial for search-and-rescue (SAR) applications, yet their development is hindered by the scarcity of annotated ground truth and by the challenges of thermal-camera calibration, which typically depends on heated targets with limited geometric definition. Recent approaches, such as MATT, focus on transferring SAM-based RGB masks to multi-spectral data, but they do not fully address the need for robust cross-modal alignment, quality control, or human-in-the-loop reliability assessment in RGB-T segmentation. To fill this gap, we propose a general annotation methodology that performs geometric alignment of RGB-T pairs, combines model-based proposals with interactive refinement, and incorporates annotation-cost measurement and systematic quality checks based on inter-annotator agreement. In this methodology, multimodal alignment is ensured through feature-based matching and homography estimation. Annotation integrates automatic proposals and guided refinement, and final masks undergo quantitative cost and quality control before being used in downstream model training. The proposed methodology was evaluated on a SAR-oriented RGB-T dataset comprising 306 image pairs. Consistent cross-modal alignment was achieved via SuperGlue-based matching and homography estimation, enabling the implementation of a SAM2-based semi-automatic annotation pipeline in Label Studio. Results across two annotators show that the proposed approach reduces annotation time by 21% while achieving high annotation quality (mean IoU = 74.9%) and high inter-annotator agreement (mean pixel accuracy = 88.4%, Cohen's kappa = 83%). The curated labels were then used to benchmark two representative RGB-T segmentation models.
These findings demonstrate the practical value of the proposed methodology and establish a reproducible framework for generating reliable RGB-T semantic segmentation datasets, complementing and extending recent multispectral auto-labeling approaches.
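
The abstract reports mean IoU, mean pixel accuracy, and Cohen's kappa between annotators but does not give their formulas. The sketch below shows one common way to compute all three from a pair of label masks. It assumes that each mask is an integer class map of identical shape with labels in `0..num_classes-1`, and that mean IoU is averaged over classes present in at least one mask. The function names `confusion_matrix` and `agreement_metrics` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def confusion_matrix(mask_a, mask_b, num_classes):
    """cm[i, j] = number of pixels labelled i in mask_a and j in mask_b."""
    a = np.asarray(mask_a).ravel().astype(np.int64)
    b = np.asarray(mask_b).ravel().astype(np.int64)
    if a.shape != b.shape:
        raise ValueError("masks must have the same number of pixels")
    # Encode each (a, b) label pair as a single index, then count them.
    pair_index = a * num_classes + b
    counts = np.bincount(pair_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def agreement_metrics(mask_a, mask_b, num_classes):
    """Return (pixel_accuracy, mean_iou, cohen_kappa) for two label masks."""
    cm = confusion_matrix(mask_a, mask_b, num_classes).astype(float)
    total = cm.sum()
    agree = np.diag(cm)          # pixels on which both masks assign class k
    per_a = cm.sum(axis=1)       # pixels of class k in mask_a
    per_b = cm.sum(axis=0)       # pixels of class k in mask_b

    # Pixel accuracy: fraction of pixels with identical labels.
    pixel_accuracy = agree.sum() / total

    # Per-class IoU = intersection / union, averaged over classes that occur.
    union = per_a + per_b - agree
    present = union > 0
    mean_iou = (agree[present] / union[present]).mean()

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = (per_a * per_b).sum() / total ** 2
    kappa = 1.0 if expected == 1.0 else (pixel_accuracy - expected) / (1.0 - expected)

    return pixel_accuracy, mean_iou, kappa


# Toy example: two 2x4 masks with two classes that differ in one pixel.
annotator_1 = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1]])
annotator_2 = np.array([[0, 0, 0, 1],
                        [0, 0, 1, 1]])
pa, miou, kappa = agreement_metrics(annotator_1, annotator_2, num_classes=2)
print(f"pixel accuracy={pa:.3f}, mean IoU={miou:.3f}, kappa={kappa:.3f}")
```

Under this formulation kappa is a fraction in [-1, 1], so a value of 0.83 corresponds to the 83% quoted in the abstract. Dataset-level figures would typically be obtained by averaging these per-image values or by summing the confusion matrices over all image pairs first. Which of the two the authors used is not stated here.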

