Integrating Pose Features and Cross-Relationship Learning for Human–Object Interaction Detection

Background: The main challenge in human–object interaction (HOI) detection is reasoning accurately about interactions that are ambiguous, complex, or difficult to recognize. Existing methods tend to rely on a single model structure, and occlusion in the input image can prevent accurate recognition. Methods: In this paper, we design a Pose-Aware Interaction Network (PAIN), based on the transformer architecture and human pose, that addresses these issues through two innovations. First, a new feature fusion method fuses human pose features with image features early, before the encoder, to improve feature expressiveness; individual motion-related features are additionally strengthened by adding them to the human branch. Second, the Cross-Attention Relationship fusion Module (CARM) better fuses the three-branch output and captures the detailed relationship information of HOI. Results: The proposed method achieves 64.51% AP_role #1 and 66.42% AP_role #2 on the public V-COCO dataset and 30.83% AP on HICO-DET, recognizing HOI instances more accurately.
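The two ideas named in the abstract, early pose and image fusion and cross-attention over branch outputs, can be illustrated with a minimal sketch. It is not the PAIN or CARM implementation: the abstract does not state the fusion rule or the branch layout, so element-wise addition, the human/object/interaction branch names, the residual connection, and every function name below are assumptions made for illustration. Learned projection weights are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_fuse(image_tokens, pose_vector):
    """Assumed fusion rule: add one pose vector to every image token
    before the tokens would enter an encoder."""
    return [[x + p for x, p in zip(token, pose_vector)]
            for token in image_tokens]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention without learned projections."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(dim)])
    return outputs


def fuse_branches(human, obj, interaction):
    """Hypothetical stand-in for a cross-attention fusion step:
    interaction queries attend to the human and object outputs,
    and the result is added back as a residual."""
    context = human + obj  # list concatenation of the two branches
    attended = cross_attention(interaction, context, context)
    return [[i + a for i, a in zip(i_row, a_row)]
            for i_row, a_row in zip(interaction, attended)]


# Toy inputs: two 2-dimensional image tokens and one pose vector.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
pose_vector = [0.5, -0.5]
fused_tokens = early_fuse(image_tokens, pose_vector)

# Toy branch outputs: one token per branch.
human = [[1.0, 0.0]]
obj = [[0.0, 1.0]]
interaction = [[0.0, 0.0]]
fused = fuse_branches(human, obj, interaction)
```

With these toy inputs the zero interaction query attends equally to the human and object tokens, so the fused output is their average. A real model would apply learned query, key, and value projections and multiple attention heads.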

MDPI AG

Lang Wu, Jie Li, Shuqin Li, Yu Ding, Meng Zhou, Yuntao Shi

2025


