A SAM2-Driven RGB-T Annotation Pipeline with Thermal-Guided Refinement for Semantic Segmentation in Search-and-Rescue Scenes

High-quality RGB–thermal infrared (RGB-T) semantic segmentation datasets are crucial for search-and-rescue (SAR) applications, yet their development is hindered by the scarcity of annotated ground truth and by the challenges of thermal-camera calibration, which typically depends on heated targets with limited geometric definition. Recent approaches, such as MATT, focus on transferring SAM-based RGB masks to multispectral data, but they do not fully address the need for robust cross-modal alignment, quality control, or human-in-the-loop reliability assessment in RGB-T segmentation. To fill this gap, we propose a general annotation methodology that performs geometric alignment of RGB-T pairs, combines model-based proposals with interactive refinement, and incorporates annotation-cost measurement and systematic quality checks based on inter-annotator agreement. In this methodology, multimodal alignment is ensured through feature-based matching and homography estimation. Annotation integrates automatic proposals and guided refinement, and final masks undergo quantitative cost and quality control before being used in downstream model training. The proposed methodology was evaluated on a SAR-oriented RGB-T dataset comprising 306 image pairs. Consistent cross-modal alignment was achieved via SuperGlue-based matching and homography estimation, enabling the implementation of a SAM2-based semi-automatic annotation pipeline in Label Studio. Results across two annotators show that the proposed approach reduces annotation time by 21% while achieving high annotation quality (mean IoU = 74.9%) and high inter-annotator agreement (mean pixel accuracy = 88.4%, Cohen's kappa = 83%). The curated labels were then used to benchmark two representative RGB-T segmentation models. These findings demonstrate the practical value of the proposed methodology and establish a reproducible framework for generating reliable RGB-T semantic segmentation datasets, complementing and extending recent multispectral auto-labeling approaches.
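The abstract reports three agreement figures (mean IoU, mean pixel accuracy, and Cohen's kappa) without stating how they were computed. The sketch below shows one common way to derive all three from a confusion matrix between two label masks. It assumes the standard textbook definitions and integer class-index masks; the function names and the toy masks are hypothetical and are not taken from the paper.

```python
import numpy as np


def confusion_matrix(mask_a, mask_b, num_classes):
    """Count pixel pairs: rows index labels in mask_a, columns labels in mask_b."""
    a = np.asarray(mask_a, dtype=np.int64).ravel()
    b = np.asarray(mask_b, dtype=np.int64).ravel()
    flat = a * num_classes + b
    counts = np.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of pixels on which the two masks agree."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """Average per-class intersection over union, skipping absent classes."""
    inter = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    present = union > 0
    return float((inter[present] / union[present]).mean())


def cohens_kappa(cm):
    """Agreement corrected for the agreement expected by chance."""
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / (n * n)
    if p_expected == 1.0:  # both masks use one identical class everywhere
        return 1.0
    return float((p_observed - p_expected) / (1.0 - p_expected))


# Toy 2x3 masks from two hypothetical annotators, three classes (0, 1, 2).
annotator_1 = np.array([[0, 0, 1], [1, 2, 2]])
annotator_2 = np.array([[0, 1, 1], [1, 2, 2]])

cm = confusion_matrix(annotator_1, annotator_2, num_classes=3)
print(f"pixel accuracy: {pixel_accuracy(cm):.3f}")
print(f"mean IoU:       {mean_iou(cm):.3f}")
print(f"Cohen's kappa:  {cohens_kappa(cm):.3f}")
```

In a dataset-level evaluation, one option is to sum the per-image confusion matrices before computing the metrics; another is to average per-image scores. The two can give different numbers, and the abstract does not say which was used.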

MDPI AG

Andrés Salas-Espinales, Ricardo Vázquez-Martín, Anthony Mandow

2026
