Search engine for discovering works of Art, research articles, and books related to Art and Culture

RNIC-A Retrospect Network for image captioning

View through CrossRef
Abstract: As cross-domain research combining computer vision and natural language processing, current image captioning research mainly considers how to improve visual features; less attention has been paid to exploiting the inherent properties of language to boost captioning performance. Facing this challenge, we propose a textual attention mechanism that obtains the semantic relevance between words by scanning all previously generated words. The Retrospect Network for image captioning (RNIC) proposed in this paper aims to improve the input and prediction processes by using textual attention. Concretely, the textual attention mechanism is applied to the model simultaneously with the visual attention mechanism, providing the model's input with the maximum information required for generating captions. In this way, our model learns to attend collaboratively to both visual and textual features. Moreover, the semantic relevance between words obtained by retrospect is used as the basis for prediction, so that the decoder can emulate the human language system and make better predictions based on the already generated content. We evaluate the effectiveness of our model on the COCO image captioning dataset and achieve superior performance over previous methods. An extraction function extracts the hidden-unit information of multiple time steps for prediction, addressing the problem of insufficient prediction information in the LSTM. Experiments show that both models significantly improve the evaluation metrics on the AI CHALLENGER test set.
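The abstract describes a decoder that attends to image regions and, at the same time, to the words it has already generated. Below is a minimal PyTorch sketch of that idea, not the paper's code: the class names, dimensions, the additive-attention form, and the use of word embeddings as the textual memory are all illustrative assumptions.

```python
# Illustrative sketch of a captioning decoder step that combines visual
# attention (over image region features) with textual "retrospect" attention
# (over embeddings of previously generated words). Names and shapes are
# assumptions, not taken from the paper's released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Standard additive (Bahdanau-style) attention over a set of features."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, n, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats)
                                  + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)          # attention weights over the n items
        return (alpha * feats).sum(dim=1)    # attended context: (batch, feat_dim)

class RetrospectDecoderStep(nn.Module):
    """One decoding step: attend over image regions and over all words
    generated so far, then feed both contexts into the LSTM cell."""
    def __init__(self, vocab_size, embed_dim, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.visual_attn = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.textual_attn = AdditiveAttention(embed_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim + embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, past_words, img_feats, state):
        # prev_word: (batch,); past_words: (batch, t); img_feats: (batch, n, feat_dim)
        # (at t = 0 one would seed past_words with a BOS token so history is non-empty)
        h, c = state
        w = self.embed(prev_word)                 # embedding of the last word
        past = self.embed(past_words)             # embeddings of all generated words
        v_ctx = self.visual_attn(img_feats, h)    # visual attention context
        t_ctx = self.textual_attn(past, h)        # textual "retrospect" context
        h, c = self.lstm(torch.cat([w, v_ctx, t_ctx], dim=-1), (h, c))
        return self.out(h), (h, c)                # logits over the vocabulary
```

At each step the decoder would be called with the previous token, the full history of generated tokens, and the image region features; feeding the two attended contexts into the LSTM together is one plausible reading of the collaborative visual-and-textual input the abstract describes.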

Related Results

Image Captioning with External Knowledge
This dissertation is dedicated to image captioning, the task of automatically generating a natural language description of a given image. Most modern automatic caption generators a...
Towards Connection-Scalable RNIC Architecture
Abstract: RDMA is a widely adopted optimization strategy in datacenter networking that surpasses traditional kernel-based TCP/IP networking through mechanisms such as kernel...
The Road Map From Artificial Intelligence, Machine Learning, Deep Learning Techniques Towards Image Captioning System.
Abstract: Image captioning is the process of generating textual descriptions of an image. These descriptions need to be syntactically and semantically correct. Image Caption...
Better Understanding: Stylized Image Captioning with Style Attention and Adversarial Training
Compared with traditional image captioning technology, stylized image captioning has broader application scenarios, such as a better understanding of images. However, stylized imag...
Caption
When Malcolm Fraser opened The Australian Captioning Centre in 1982, he emphasised the importance of changing technology in improving the provision of captions: there is always goin...
Double Exposure
I. Happy Endings Chaplin’s Modern Times features one of the most subtly strange endings in Hollywood history. It concludes with the Tramp (Chaplin) and the Gamin (Paulette Godda...
Enhanced Captioning: Speaker Identification Using Graphical and Text-Based Identifiers
This thesis proposes a new technique for speaker identification in captioning using three identifiers: image, name and colour. This technique was implemented as a proof-of-...
