
An Analysis on Recent Approaches for Image Captioning

Image captioning is an interdisciplinary area that uses techniques from computer vision and natural language processing to provide a textual description of a picture. The image captioning task is the process of understanding the scene in an image by identifying the objects and associated actions present, in order to create a meaningful, human-like caption. Such captions can be used in a wide range of applications, including image retrieval, video indexing, assistive technology for the visually impaired, content-based image search, biomedicine, and autonomous cars. Formerly, machine learning was used for this purpose, making extensive use of hand-crafted features such as the Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), the Histogram of Oriented Gradients (HOG), and combinations of these features. Extracting hand-crafted features from huge datasets is neither straightforward nor viable. Many deep learning-based techniques were later proposed. Retrieval-based and template-based deep learning approaches were presented; however, both had drawbacks such as losing crucial objects. Recent breakthroughs in deep learning and natural language processing have considerably improved the performance of image captioning systems through attention mechanisms, transformer-based architectures, multimodal connections, object detection-based encoder-decoder models, and many others. This survey explores some of the most recent techniques for image captioning, along with the datasets and evaluation measures that have been employed in deep learning-based automatic image captioning. The ultimate intention of this study is to act as a guide for researchers by emphasizing future directions for research work.
Index Terms: image captioning, computer vision, deep learning, textual description, natural language processing.

Related Results

Image Captioning with External Knowledge
This dissertation is dedicated to image captioning, the task of automatically generating a natural language description of a given image. Most modern automatic caption generators a...
A Comprehensive Survey on Image Captioning for Indian Languages: Techniques, Datasets, and Challenges
Abstract: In image captioning, we generate visual descriptions from an image. Image captioning requires identifying the key entity, feature, and association in an image. Th...
The Road Map From Artificial Intelligence, Machine Learning, Deep Learning Techniques Towards Image Captioning System.
Abstract: Image captioning is the process of generating textual descriptions of an image. These descriptions need to be syntactically and semantically correct. Image Caption...
Better Understanding: Stylized Image Captioning with Style Attention and Adversarial Training
Compared with traditional image captioning technology, stylized image captioning has broader application scenarios, such as a better understanding of images. However, stylized imag...
Caption
When Malcolm Fraser opened The Australian Captioning Centre in 1982, he emphasised the importance of changing technology in improving the provision of captions: there is always goin...
RefCap: Image Captioning with Referent Objects Attributes
Abstract: In recent years, significant progress has been made in visual-linguistic multi-modality research, leading to advancements in visual comprehension and its applicati...
Image Captioning using Neural Networks
Authors have given a thorough review of all previous work on deep image captioning models. Authors describe multiple kinds of attention mechanisms used in deep learning models for ...
