EVJVQA CHALLENGE: MULTILINGUAL VISUAL QUESTION ANSWERING

Visual Question Answering (VQA) is a challenging task at the intersection of natural language processing (NLP) and computer vision (CV), and it has attracted significant attention from researchers. English is a resource-rich language with a wide range of VQA datasets and models, whereas resources and models for VQA in other languages still need to be developed. In addition, there is no multilingual dataset targeting the visual content of a particular country with its own objects and cultural characteristics. To address this gap, we provide the research community with a benchmark dataset named EVJVQA, which includes more than 33,000 question-answer pairs in three languages (Vietnamese, English, and Japanese) on approximately 5,000 images taken in Vietnam, for evaluating multilingual VQA systems or models. EVJVQA was used as the benchmark dataset for the multilingual visual question answering challenge at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). The task attracted 62 participating teams from various universities and organizations. In this article, we present details of the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLEU on the private test set. The multilingual QA systems proposed by the top two teams use ViT as the pre-trained vision model and mT5, a powerful pre-trained language model based on the Transformer architecture, as the pre-trained language model. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore multilingual models and systems for visual question answering.
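The abstract reports an F1-score for generated answers but does not spell out how it is computed. As a rough illustration only, the sketch below shows one common way to score a free-form answer against a reference: token-level F1 over lowercased, whitespace-split tokens. The function names `token_f1` and `mean_f1` are hypothetical, and the tokenization is an assumption, not the challenge's official evaluation script. Whitespace splitting in particular would not be adequate for Japanese, which would need a word segmenter or character-level tokens.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, references):
    """Average token-level F1 over paired predictions and references."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `token_f1("a red motorbike", "the red motorbike")` shares two of three tokens on each side, so precision and recall are both 2/3 and the F1 is 2/3. Averaging such per-question scores over a test set gives a single figure comparable in form to the one quoted above.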

Publishing House for Science and Technology, Vietnam Academy of Science and Technology (Publications)

Ngan Luu-Thuy Nguyen, Nghia Hieu Nguyen, Duong T.D. Vo, Khanh Quoc Tran, Kiet Van Nguyen

Journal of Computer Science and Cybernetics

2023
