
Modified multimodal similarity based generative adversarial network in text to image generation

Text-to-image synthesis (T2I) is a challenging task: the model must generate high-quality images that are both semantically realistic and consistent with the input text. Current approaches typically produce an initial blurred image and then refine it to improve quality, but many of them struggle to ensure that the refined image accurately corresponds to the given text description. To address this limitation, this paper proposes a novel Multimodal Similarity-based Generative Adversarial Network for Text-to-Image Generation (MSG-TIG) framework. MSG-TIG takes input text and a segmented mask image as inputs. Both are preprocessed: the text is reduced to a compact set of words with the TS2 approach, which yields dimension-reduced text for better performance, and image noise is removed with median filtering. From the preprocessed text, Bag of Words (BoW) and Class Frequency-assisted Term Frequency-Inverse Document Frequency (CF-TF-IDF) features are extracted; from the preprocessed mask image, color features and Compute Neighbour Pixel value in Hierarchy of Skeleton (CNP-HoS) features are extracted. The combined feature set is then passed to the Modified Similarity Score-assisted Multimodal Similarity-based Generative Adversarial Network (MSS-MS-GAN), which generates multiple images. The MSS-MS-GAN adopts the Modified Similarity Score-assisted Multimodal Similarity Model (MSS-MSM) in its generator phase to obtain better generative output while reducing the risk of mode collapse. MSS-MS-GAN achieved an Inception Score of 4.913, an SSIM of 0.861, and a PSNR of 35.245, together with low error values of MAE = 0.228 and MSE = 0.094.
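The abstract names median filtering for the image-side preprocessing. A minimal sketch of that step, assuming a NumPy/SciPy pipeline; the function name and the 3x3 window size are illustrative assumptions, not values from the paper:

import numpy as np
from scipy.ndimage import median_filter

def denoise_mask(mask: np.ndarray, size: int = 3) -> np.ndarray:
    # Suppress salt-and-pepper noise in the segmented mask image with a
    # median filter; the 3x3 window is an illustrative choice, not a
    # value taken from the paper.
    return median_filter(mask, size=size)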
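The exact CF-TF-IDF formulation is not given in the abstract. One plausible reading is a standard TF-IDF score reweighted by how frequently a term occurs within the document's class; the sketch below follows that assumption, and all names in it are hypothetical:

import math
from collections import Counter

def cf_tf_idf(docs, labels):
    # docs: list of token lists; labels: class label per document.
    # One plausible reading of CF-TF-IDF: standard TF-IDF scaled by the
    # term's relative frequency within the document's class (the paper's
    # actual weighting may differ).
    n_docs = len(docs)
    df = Counter()                       # document frequency per term
    class_tf = {}                        # term counts per class
    for doc, label in zip(docs, labels):
        df.update(set(doc))
        class_tf.setdefault(label, Counter()).update(doc)
    class_totals = {c: sum(tf.values()) for c, tf in class_tf.items()}

    features = []
    for doc, label in zip(docs, labels):
        tf = Counter(doc)
        total = sum(tf.values())
        vec = {}
        for term, count in tf.items():
            idf = math.log(n_docs / (1 + df[term]))
            cf = class_tf[label][term] / class_totals[label]
            vec[term] = (count / total) * idf * cf
        features.append(vec)
    return features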
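On the image side, the CNP-HoS descriptor is not defined in the abstract, so only the generic color-feature part of the branch can be illustrated. A simple per-channel histogram, assuming 8-bit RGB input; the bin count is arbitrary:

import numpy as np

def color_features(img: np.ndarray, bins: int = 16) -> np.ndarray:
    # Per-channel color histogram over an 8-bit RGB image; covers only
    # the generic color-feature part of the image branch, not the
    # abstract's CNP-HoS descriptor.
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 255),
                          density=True)[0]
             for c in range(img.shape[-1])]
    return np.concatenate(feats)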
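The Modified Similarity Score used by MSS-MSM in the generator is likewise unspecified. A common way to enforce text-image consistency in a GAN generator, and one plausible shape for such an objective, is to add a cosine-similarity term between caption and generated-image embeddings to the adversarial loss. A PyTorch sketch under that assumption; the encoders producing the embeddings and the weight lam are not from the paper:

import torch
import torch.nn.functional as F

def generator_loss(d_fake_logits, text_emb, img_emb, lam=1.0):
    # Non-saturating adversarial loss plus a multimodal similarity term.
    # The cosine term is an assumption standing in for the paper's
    # Modified Similarity Score; text_emb and img_emb come from
    # unspecified caption/image encoders, and lam is an arbitrary weight.
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    sim = F.cosine_similarity(text_emb, img_emb, dim=-1).mean()
    return adv + lam * (1.0 - sim)  # pull caption and image embeddings together

Penalizing 1 - sim drives the generated image's embedding toward the caption's embedding, which is one standard way to reduce the text-image mismatch the abstract describes.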

Related Results

E-Press and Oppress
From elephants to ABBA fans, silicon to hormone, the following discussion uses a new research method to look at printed text, motion pictures and a te...
Research on Style Migration Techniques Based on Generative Adversarial Networks in Chinese Painting Creation
The continuous progress and development of science and technology have brought rich and diverse artistic experiences to the current society. The image style...
AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model
Multimodal sentiment analysis is an essential task in natural language processing which refers to the fact that machines can analyze and recognize emotions through logical reasonin...
Social Event Classification Based on Multimodal Masked Transformer Network
The key to multimodal social event classification is to fully and accurately utilize the features of both image and text modalities. However, most existing methods have the followi...
Double Exposure
I. Happy Endings Chaplin’s Modern Times features one of the most subtly strange endings in Hollywood history. It concludes with the Tramp (Chaplin) and the Gamin (Paulette Godda...
Seg2pix: Few Shot Training Line Art Colorization with Segmented Image Data
There are various challenging issues in automating line art colorization. In this paper, we propose a GAN approach incorporating semantic segmentation image data. Our GAN-based met...
DESIGNING A MULTIMODAL TRANSPORT NETWORK
Objective: To create a methodology for designing a multimodal transport network under various scenarios of socioeconomic development of the Russian Federation and its regions which...
