CMCAF: Conditional Multi-scale Cross-Modal Adaptive Fusion Network for RGB-T salient object detection
RGB-T salient object detection (SOD) integrates visible and thermal cues for robust localization under complex conditions. However, existing methods often employ indiscriminate fusion strategies that assume equal reliability across modalities. This approach fails to dynamically mitigate interference when one modality degrades (e.g., thermal crossover), leading to the propagation of noise and the corruption of reliable features. To address this, we propose the Conditional Multi-scale Cross-Modal Adaptive Fusion (CMCAF) network. The core philosophy of CMCAF is to treat the thermal modality not merely as a feature source, but as a dynamic condition that modulates RGB processing. Specifically, a Shared Swin Backbone (SSB) is employed to extract aligned cross-modal representations. At the bottleneck, a Thermal-Conditioned Modulation (TCM) block generates channel-wise affine parameters. It functions as a gate: amplifying reliable cues when thermal data is salient, while suppressing noise propagation to protect RGB semantics when thermal data is unreliable. To accommodate object scale variations, a Scale-Aware Fusion (SAF) module acts as a scale arbitrator, adaptively balancing semantic context and fine details. Furthermore, a Thermal-Guided Gating Decoder (TGGD) screens skip connections via dual gating, filtering out background noise backflow during reconstruction. Extensive experiments on RGB-T and RGB-D benchmarks demonstrate that CMCAF consistently outperforms state-of-the-art methods, exhibiting superior accuracy and strong robustness against modal noise. The code and results can be accessed at https://github.com/AmazingJ-123/RGBT-SODCMCAF.
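The channel-wise affine conditioning described for the TCM block resembles FiLM-style feature modulation: a summary of the thermal features predicts a per-channel scale and shift that is applied to the RGB features. The sketch below illustrates that general idea in NumPy; the function name, the pooling choice, and the linear heads (`W_gamma`, `W_beta`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def thermal_conditioned_modulation(rgb_feat, thermal_feat, W_gamma, W_beta):
    """FiLM-style channel-wise affine modulation of RGB features,
    conditioned on thermal features (illustrative sketch only).

    rgb_feat, thermal_feat: arrays of shape (C, H, W)
    W_gamma, W_beta: (C, C) linear heads predicting scale and shift
    """
    # Global-average-pool the thermal features over spatial dims -> (C,)
    t = thermal_feat.mean(axis=(1, 2))
    # Predict a per-channel scale near 1 and a per-channel shift.
    # When the thermal summary is uninformative (near zero), gamma ~ 1
    # and beta ~ 0, so the RGB semantics pass through unchanged.
    gamma = 1.0 + np.tanh(W_gamma @ t)
    beta = W_beta @ t
    # Apply the channel-wise affine transform (broadcast over H, W)
    return gamma[:, None, None] * rgb_feat + beta[:, None, None]
```

Note the gating behavior: a zero thermal summary leaves the RGB features untouched, which mirrors the abstract's goal of protecting RGB semantics when the thermal modality is unreliable.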
Related Results
Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
The Nuclear Fusion Award
The Nuclear Fusion Award ceremony for 2009 and 2010 award winners was held during the 23rd IAEA Fusion Energy Conference in Daejeon. This time, both 2009 and 2010 award winners w...
Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning
Object tracking in RGB-thermal (RGB-T) videos is increasingly used in many fields due to the all-weather and all-day working capability of the dual-modality imaging system, as well...
Nonproliferation and fusion power plants
The world now appears to be on the brink of realizing commercial fusion. As fusion energy progresses towards near-term commercial deployment, the question arises a...
DASYOLO: Dual-Attention-Synergistic YOLO for Cross-Modality Object Detection
The fusion of infrared and visible images effectively overcomes the limitations of single modalities in object detection, demonstrating significant advantages in a...
Co-relation of Prakriti of an Infant with Skin Color RGB Values of Facial Photograph and Standardization of Reference Standards of Prakriti Color Representor
Aim and Objective of the Study: In spite of individualized variation, skin color, one of the Prakriti-determining characteristics, plays an important role in confirming the diagnos...
A UAV-Based Multi-Scenario RGB-Thermal Dataset and Fusion Model for Enhanced Forest Fire Detection
UAVs are essential for forest fire detection due to vast forest areas and inaccessibility of high-risk zones, enabling rapid long-range inspection and detailed close-range surveill...
Three-Dimensional Object Detection Network Based on Multi-Layer and Multi-Modal Fusion
Cameras and LiDAR are important sensors in autonomous driving systems that can provide complementary information to each other. However, most LiDAR-only methods outperform the fusi...

