A Comprehensive Survey on Image Captioning for Indian Languages: Techniques, Datasets, and Challenges
Abstract
Image captioning is the task of generating natural-language descriptions of an image. It requires identifying the key entities, features, and associations in an image and producing captions that are syntactically and semantically correct, drawing on both computer vision and natural language processing. Over the past few decades, substantial effort has been devoted to generating captions for images. In this article, we present an extensive survey of image captioning for Indian languages. To summarize recent research, we first briefly review traditional approaches to image captioning based on templates and retrieval. We then cover deep-learning approaches, classified into encoder-decoder architectures, attention-based approaches, and transformer architectures. Our main focus is on image captioning techniques for Indian languages such as Hindi, Bengali, and Assamese. We then analyze state-of-the-art approaches on the most widely used dataset, MS COCO, along with their strengths, limitations, and performance on standard metrics (BLEU, ROUGE, METEOR, CIDEr, and SPICE). Finally, we discuss open challenges and future directions in the field of image captioning.
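As a minimal, hedged illustration of the encoder-decoder family named above (a generic sketch, not the architecture of any specific system covered by this survey), the PyTorch snippet below pairs a frozen ResNet-18 image encoder with an LSTM caption decoder. The class name CaptionDecoder and all hyperparameters are hypothetical choices made for illustration only.

```python
# Minimal encoder-decoder captioner sketch (illustrative only).
# Assumes PyTorch and torchvision >= 0.13 (for the weights API).
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionDecoder(nn.Module):
    """Frozen ResNet-18 encoder feeding an LSTM caption decoder."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the classification head; keep the 512-d pooled feature.
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])
        for p in self.encoder.parameters():
            p.requires_grad = False            # freeze the CNN
        self.img_proj = nn.Linear(512, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) token ids
        feats = self.encoder(images).flatten(1)        # (B, 512)
        img_token = self.img_proj(feats).unsqueeze(1)  # (B, 1, E)
        words = self.embed(captions)                   # (B, T, E)
        # Prepend the image feature as the first "token" of the sequence.
        seq = torch.cat([img_token, words], dim=1)     # (B, T+1, E)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                        # (B, T+1, V)
```

Attention-based and transformer captioners replace the single pooled feature with a grid or set of region features that the decoder attends over at each step.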
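Similarly, as a small example of how one of the listed metrics is computed in practice, the snippet below scores a candidate caption against reference captions with sentence-level BLEU via NLTK. The captions here are invented; this is one common implementation choice, not the tooling of any particular surveyed paper.

```python
# Hedged example: sentence-level BLEU with NLTK on made-up captions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a brown dog runs across the grass".split(),
    "a dog is running on a grassy field".split(),
]
candidate = "a dog runs on the grass".split()

# Smoothing avoids zero scores when a higher-order n-gram is absent,
# which is common for short captions.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, candidate, smoothing_function=smooth)
print(f"BLEU-4: {score:.3f}")
```

Published results typically report corpus-level BLEU over the whole test set, and rely on the COCO caption evaluation toolkit for CIDEr and SPICE.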