Search engine for discovering works of Art, research articles, and books related to Art and Culture

RNIC-A Retrospect Network for image captioning

View through CrossRef
Abstract: As cross-domain research combining computer vision and natural language processing, current image captioning research mainly considers how to improve visual features; less attention has been paid to exploiting the inherent properties of language to boost captioning performance. Facing this challenge, we propose a textual attention mechanism that obtains the semantic relevance between words by scanning all previously generated words. The Retrospect Network for image captioning (RNIC) proposed in this paper aims to improve the input and prediction processes by using textual attention. Concretely, the textual attention mechanism is applied to the model simultaneously with the visual attention mechanism, providing the model's input with the maximum information required for generating captions. In this way, our model learns to attend collaboratively to both visual and textual features. Moreover, the semantic relevance between words obtained by retrospect is used as the basis for prediction, so that the decoder can emulate the human language system and make better predictions based on the already generated content. We evaluate the effectiveness of our model on the COCO image captioning dataset and achieve superior performance over previous methods. An extraction function extracts the hidden-unit information of multiple time steps for prediction, addressing the problem of insufficient prediction information in the LSTM. Experiments show that both models significantly improve the evaluation metrics on the AI CHALLENGER test set.
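The abstract describes a decoder that attends to image regions and, at the same time, to the words it has already generated. Below is a minimal PyTorch sketch of that idea, not the paper's code: the class names, dimensions, the additive-attention form, and the use of word embeddings as the textual memory are all illustrative assumptions.

```python
# Illustrative sketch of a captioning decoder step that combines visual
# attention (over image region features) with textual "retrospect" attention
# (over embeddings of previously generated words). Names and shapes are
# assumptions, not taken from the paper's released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Standard additive (Bahdanau-style) attention over a set of features."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, n, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats)
                                  + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)          # attention weights over the n items
        return (alpha * feats).sum(dim=1)    # attended context: (batch, feat_dim)

class RetrospectDecoderStep(nn.Module):
    """One decoding step: attend over image regions and over all words
    generated so far, then feed both contexts into the LSTM cell."""
    def __init__(self, vocab_size, embed_dim, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.visual_attn = AdditiveAttention(feat_dim, hidden_dim, attn_dim)
        self.textual_attn = AdditiveAttention(embed_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim + embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, past_words, img_feats, state):
        # prev_word: (batch,); past_words: (batch, t); img_feats: (batch, n, feat_dim)
        # (at t = 0 one would seed past_words with a BOS token so history is non-empty)
        h, c = state
        w = self.embed(prev_word)                 # embedding of the last word
        past = self.embed(past_words)             # embeddings of all generated words
        v_ctx = self.visual_attn(img_feats, h)    # visual attention context
        t_ctx = self.textual_attn(past, h)        # textual "retrospect" context
        h, c = self.lstm(torch.cat([w, v_ctx, t_ctx], dim=-1), (h, c))
        return self.out(h), (h, c)                # logits over the vocabulary
```

At each step the decoder would be called with the previous token, the full history of generated tokens, and the image region features; feeding the two attended contexts into the LSTM together is one plausible reading of the collaborative visual-and-textual input the abstract describes.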

Related Results

Image Captioning with External Knowledge
This dissertation is dedicated to image captioning, the task of automatically generating a natural language description of a given image. Most modern automatic caption generators a...
Towards Connection-Scalable RNIC Architecture
Abstract: RDMA is a widely adopted optimization strategy in datacenter networking that surpasses traditional kernel-based TCP/IP networking through mechanisms such as kernel...
The Road Map From Artificial Intelligence, Machine Learning, Deep Learning Techniques Towards Image Captioning System.
Abstract: Image captioning is the process of generating textual descriptions of an image. These descriptions need to be syntactically and semantically correct. Image Caption...
Better Understanding: Stylized Image Captioning with Style Attention and Adversarial Training
Compared with traditional image captioning technology, stylized image captioning has broader application scenarios, such as a better understanding of images. However, stylized imag...
Caption
When Malcolm Fraser opened The Australian Captioning Centre in 1982, he emphasised the importance of changing technology in improving the provision of captions: there is always goin...
Double Exposure
I. Happy Endings Chaplin’s Modern Times features one of the most subtly strange endings in Hollywood history. It concludes with the Tramp (Chaplin) and the Gamin (Paulette Godda...
Enhanced Captioning: Speaker Identification Using Graphical and Text-Based Identifiers
This thesis proposes a new technique for speaker identification in captioning using three identifiers: image, name and colour. This technique was implemented as a proof-of-...
