
It's All Connected: A Survey for Multimodal Arabic AI

Abstract: Multimodal AI integrates text, vision, and speech within unified reasoning frameworks, yet Arabic remains significantly underrepresented due to diglossia, morphological complexity, and scarce multimodal resources. This survey delivers the first comprehensive technical roadmap for Arabic multimodal AI, covering the progression from unimodal Arabic NLP, OCR, and ASR to recent Arabic-capable Multimodal Large Language Models (MLLMs). We review available multimodal datasets, modality encoders, tokenization approaches, connector designs, and fusion strategies used in state-of-the-art systems. We also provide the first consolidated evaluation of Arabic-capable MLLMs on the multimodal benchmarks ARB and PEARL, analyzing performance, robustness, and domain generalization across OCR-grounded and open-domain VQA settings. Despite recent progress, challenges persist in cultural grounding, dialect inclusivity, dataset scale, and open-access ecosystem maturity. We outline actionable directions for scalable and culturally aligned Arabic multimodal intelligence, including parameter-efficient adaptation, broader corpus development, and unified evaluation protocols. By consolidating technical advances and empirical insights, this survey establishes a foundation to guide the next generation of Arabic-centric multimodal research.
Springer Science and Business Media LLC

Related Results

The Poem "The Arabic Language Laments Its Fortune Among Its People" by Hafez Ibrahim: An Analytical Study
Many languages are spoken in the world. The diversity of human languages and colors is a sign of Allah for those of knowledge (Al-Quran, 30:22). Although the Arabic language origin...
Arabic Natural Language Processing
The Arabic language presents researchers and developers of natural language processing (NLP) applications for Arabic text and speech with serious challenges. The purpose of this ar...
Arabic Learning for Academic Purposes
This study aimed to determine the goal of teaching Arabic for academic purposes. Teaching Arabic to non-Arabic speakers is generally divided into two types: Arabic language for li...
Using Diacritics in the Arabic Script of Malay to Scaffold Arab Postgraduate Students in Reading Malay Words
Purpose – This study aims to investigate the use of diacritics in the Arabic script of Malay to facilitate Arab postgraduate students of UKM to read the Malay words accurately. It ...
Effective Arabic Language Teaching Strategies in the Language Laboratory for Students of Darussalam Gontor Islamic Institution
Language is an important tool for the life of civilized man. Through language, people can communicate with each other, and convey their intentions and feelings to others. The moder...
AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model
Multimodal sentiment analysis is an essential task in natural language processing which refers to the fact that machines can analyze and recognize emotions through logical reasonin...
