It's All Connected: A Survey for Multimodal Arabic AI
Abstract
Multimodal AI integrates text, vision, and speech within unified reasoning frameworks, yet Arabic remains significantly underrepresented due to diglossia, morphological complexity, and scarce multimodal resources. This survey delivers the first comprehensive technical roadmap for Arabic multimodal AI, covering the progression from unimodal Arabic NLP, OCR, and ASR to recent Arabic-capable Multimodal Large Language Models (MLLMs). We review available multimodal datasets, modality encoders, tokenization approaches, connector designs, and fusion strategies used in state-of-the-art systems. We also provide the first consolidated evaluation of Arabic-capable MLLMs on the multimodal benchmarks ARB and PEARL, analyzing performance, robustness, and domain generalization across OCR-grounded and open-domain VQA settings. Despite recent progress, challenges persist in cultural grounding, dialect inclusivity, dataset scale, and open-access ecosystem maturity. We outline actionable directions for scalable and culturally aligned Arabic multimodal intelligence, including parameter-efficient adaptation, broader corpus development, and unified evaluation protocols. By consolidating technical advances and empirical insights, this survey establishes a foundation to guide the next generation of Arabic-centric multimodal research.
Related Results
Imagined worldviews in John Lennon’s “Imagine”: a multimodal re-performance / Visões de mundo imaginadas no “Imagine” de John Lennon: uma re-performance multimodal
Abstract: This paper addresses the issue of multimodal re-performance, a concept developed by us, in view of the fact that the famous song “Imagine”, by John Lennon, was published ...
Arabic Media and Its Contribution to Promoting the Arabic Language in Malaysian Society: A Descriptive Analytical Study
This study, entitled “Contribution of Arabic Media in Disseminating Arabic Language in Malaysian Society” aims at discovering the efforts of the Arabic-Malaysian media and its role...
The Poem “The Arabic Language Laments Its Fate among Its People” by Hafez Ibrahim: An Analytical Study
Many languages are spoken in the world. The diversity of human languages and colors is a sign of Allah for those of knowledge (Al-Quran, 30:22). Although the Arabic language origin...
Arabic Natural Language Processing
The Arabic language presents researchers and developers of natural language processing (NLP) applications for Arabic text and speech with serious challenges. The purpose of this ar...
Arabic Learning for Academic Purposes
This study aimed to determine the goal of teaching Arabic for Academic purposes. Teaching Arabic for non-Arabic speakers is generally divided into two types: Arabic language for li...
Using Diacritics in the Arabic Script of Malay to Scaffold Arab Postgraduate Students in Reading Malay Words
Purpose – This study aims to investigate the use of diacritics in the Arabic script of Malay to facilitate Arab postgraduate students of UKM to read the Malay words accurately. It ...
Effective Arabic Language Teaching Strategies in the Language Laboratory for Students of Darussalam Gontor Islamic Institution
Language is an important tool for the life of civilized man. Through language, people can communicate with each other, and convey their intentions and feelings to others. The moder...
AFR-BERT: Attention-based mechanism feature relevance fusion multimodal sentiment analysis model
Multimodal sentiment analysis is an essential task in natural language processing which refers to the fact that machines can analyze and recognize emotions through logical reasonin...

