
A Multimodal Fusion Framework for Early Detection of Cognitive Impairment in Chinese Speakers Using Pinyin Sequences and Acoustic Features

Abstract

Background: Alzheimer's disease (AD) is a leading cause of dementia, and traditional diagnostic methods such as cerebrospinal fluid testing and PET imaging are invasive and costly, which limits early detection. Language biomarker analysis offers a non‐invasive, efficient alternative for detecting cognitive impairment through speech. In Chinese, however, homophones often cause transcription errors, which may reduce model accuracy. Converting text to Pinyin sequences can reduce this ambiguity and thereby improve detection. This study proposes a novel speech‐based method that detects cognitive impairment more accurately and more efficiently.

Method: This study used a systematic approach to distinguish individuals with cognitive impairment from healthy controls (HC). With approval from the hospital ethics committee, data from 300 participants in the China Preclinical Alzheimer's Disease Study (C‐PAS) cohort were extracted. Audio recordings were transcribed with iFLYTEK's speech recognition tool, and the transcripts were converted into Pinyin sequences. Acoustic features, such as pause frequency and silent time, were extracted with OpenSMILE, and MFCC features were also incorporated. Together with demographic variables, these features formed comprehensive digital signatures for model training. To compensate for the small sample size, data augmentation was applied: noise was added to the numerical features, and word omissions, repetitions, and replacements were simulated in the Pinyin sequences. A Bi‐directional LSTM model, chosen for its ability to capture context and semantic relevance, was used to fuse the Pinyin sequences with the numerical features and optimize classification performance.

Result: The proposed method achieved an accuracy of 93.80% and an Area Under the Curve (AUC) of 0.93, outperforming models trained solely on acoustic features or on cognitive test scores. Ablation experiments showed that combining Pinyin sequences with acoustic features significantly improved model performance, underscoring the importance of integrating linguistic and acoustic data for detecting Alzheimer's disease in Chinese.

Conclusion: This study demonstrates the feasibility and effectiveness of integrating Pinyin sequences and acoustic features for non‐invasive Alzheimer's detection in Chinese. These findings provide a practical tool for early screening and pave the way for larger‐scale studies and potential clinical application.

Wiley

Liangwei Tao, Zhixing Zhou, Huanhuan Xia, Quan Chen, Yiming Li

Alzheimer's & Dementia

2025

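The Method converts iFLYTEK transcripts into Pinyin so that homophone transcription errors matter less. The sketch below illustrates the idea under stated assumptions: it uses tone-numbered syllables (the abstract does not say whether tones were kept) and a tiny hypothetical lexicon, `TOY_PINYIN`. A real pipeline would need a full character-to-Pinyin dictionary, for example the third-party `pypinyin` package. `to_pinyin` is an illustrative name, not the authors' code.

```python
# Hypothetical toy lexicon for illustration only; a real pipeline needs a full
# character-to-Pinyin dictionary (for example the third-party pypinyin package).
TOY_PINYIN = {
    "他": "ta1", "她": "ta1", "它": "ta1",  # he / she / it: all pronounced tā
    "在": "zai4", "再": "zai4",            # "at" / "again": both pronounced zài
    "看": "kan4", "书": "shu1",
}


def to_pinyin(text: str, lexicon: dict[str, str] = TOY_PINYIN) -> list[str]:
    """Map each character to a tone-numbered Pinyin syllable.

    Characters missing from the lexicon are passed through unchanged so that
    nothing is silently dropped.
    """
    return [lexicon.get(ch, ch) for ch in text]


reference = "她在看书"   # what the speaker said: "she is reading"
asr_output = "他再看书"  # a transcript with two homophone errors
print(to_pinyin(reference))                           # ['ta1', 'zai4', 'kan4', 'shu1']
print(to_pinyin(asr_output) == to_pinyin(reference))  # True
```

Because 他/她/它 (tā) and 在/再 (zài) share a pronunciation, an ASR substitution among them leaves the Pinyin sequence unchanged. The flip side is that Pinyin also merges words that differ in meaning, so the representation trades some semantic detail for robustness to transcription errors.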

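The abstract names pause frequency and silent time as acoustic features extracted with OpenSMILE, but does not describe the OpenSMILE configuration. As a minimal stand-in (not OpenSMILE's algorithm), the sketch below detects pauses with a simple energy threshold over frame-level energies. The frame length, energy threshold, and minimum pause duration are hypothetical values, as are the names `pause_stats` and `PauseStats`. MFCCs, which the study also uses, would normally come from a feature extractor (OpenSMILE itself, or a library function such as `librosa.feature.mfcc`) rather than be computed by hand.

```python
from dataclasses import dataclass


@dataclass
class PauseStats:
    pause_count: int
    pause_rate_per_min: float
    silent_time_s: float  # total duration of the detected pauses


def pause_stats(frame_energy: list[float], frame_s: float = 0.01,
                threshold: float = 0.02, min_pause_s: float = 0.25) -> PauseStats:
    """Count pauses in a recording from frame-level energies.

    A frame is silent when its energy is below `threshold`; a run of silent
    frames counts as a pause only if it lasts at least `min_pause_s` seconds.
    All default values are hypothetical.
    """
    min_frames = max(1, round(min_pause_s / frame_s))
    pauses, pause_frames, run = 0, 0, 0
    # A trailing +inf sentinel forces the final silent run to be closed
    for energy in frame_energy + [float("inf")]:
        if energy < threshold:
            run += 1
            continue
        if run >= min_frames:
            pauses += 1
            pause_frames += run
        run = 0
    minutes = len(frame_energy) * frame_s / 60
    rate = pauses / minutes if minutes > 0 else 0.0
    return PauseStats(pauses, rate, pause_frames * frame_s)


# 1 s speech, 0.5 s pause, 1 s speech, 0.1 s gap (too short), 1.4 s speech
energies = [0.5] * 100 + [0.0] * 50 + [0.5] * 100 + [0.0] * 10 + [0.5] * 140
print(pause_stats(energies))
```

The minimum-duration rule keeps brief inter-word gaps from being counted as pauses; where that cut-off sits is a tuning choice that directly affects both features.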

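The abstract says the acoustic features, MFCCs, and demographic variables were combined into a per-participant "digital signature". One plausible way to assemble such a vector is sketched below. The choice of demographic fields, summarizing MFCCs by their per-coefficient mean, and z-score standardization are all assumptions, and `build_signature`, `zscore_fit`, and `zscore_apply` are hypothetical helpers.

```python
import statistics


def build_signature(acoustic: dict[str, float], mfcc_frames: list[list[float]],
                    demographics: dict[str, float]) -> list[float]:
    """Concatenate one participant's features into a single numerical vector.

    MFCC frames (time x coefficients) are summarized by their per-coefficient
    mean; dictionary keys are sorted so every participant gets the same order.
    """
    n_coeffs = len(mfcc_frames[0])
    mfcc_means = [statistics.fmean(frame[i] for frame in mfcc_frames)
                  for i in range(n_coeffs)]
    return ([acoustic[k] for k in sorted(acoustic)] + mfcc_means
            + [demographics[k] for k in sorted(demographics)])


def zscore_fit(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-column mean and population standard deviation (1.0 if constant)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def zscore_apply(row: list[float], means: list[float], stds: list[float]) -> list[float]:
    """Standardize one vector with previously fitted statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


signature = build_signature(
    acoustic={"pause_count": 3.0, "silent_time_s": 2.5},
    mfcc_frames=[[1.0, 2.0], [3.0, 4.0]],                  # 2 frames x 2 coefficients
    demographics={"age": 70.0, "education_years": 12.0},  # hypothetical fields
)
print(signature)  # [3.0, 2.5, 2.0, 3.0, 70.0, 12.0]
```

Fitting the means and standard deviations on the training split only, then applying them to held-out data, avoids leaking test-set statistics into the model.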

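The abstract lists two augmentation families: noise on numerical features, and simulated word omissions, repetitions, and replacements in Pinyin sequences. A minimal sketch follows, assuming Gaussian noise on standardized features and per-token probabilities chosen purely for illustration. The abstract gives neither the noise model nor the rates, and does not say what a replacement is drawn from; here it is a uniform draw from a supplied vocabulary. `jitter` and `perturb_pinyin` are hypothetical names.

```python
import random


def jitter(features: list[float], sigma: float, rng: random.Random) -> list[float]:
    """Add independent zero-mean Gaussian noise to each numerical feature."""
    return [x + rng.gauss(0.0, sigma) for x in features]


def perturb_pinyin(tokens: list[str], vocab: list[str], rng: random.Random,
                   p_omit: float = 0.05, p_repeat: float = 0.05,
                   p_replace: float = 0.05) -> list[str]:
    """Simulate omissions, repetitions and replacements, one decision per token."""
    out: list[str] = []
    for tok in tokens:
        r = rng.random()
        if r < p_omit:
            continue                        # omission: drop the syllable
        elif r < p_omit + p_repeat:
            out.extend([tok, tok])          # repetition: say it twice
        elif r < p_omit + p_repeat + p_replace:
            out.append(rng.choice(vocab))   # replacement: random syllable
        else:
            out.append(tok)                 # unchanged
    return out


# Usage: three augmented copies of one (hypothetical) training sample
rng = random.Random(42)
features = [0.3, -1.2, 0.8]
tokens = ["ta1", "zai4", "kan4", "shu1"]
copies = [(jitter(features, 0.05, rng), perturb_pinyin(tokens, tokens, rng))
          for _ in range(3)]
```

Augmented copies should be generated from training participants only; augmenting before the train/test split would let near-duplicates of test samples leak into training.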

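The Method fuses Pinyin sequences with numerical features in a Bi‐directional LSTM, but the abstract does not specify the architecture. The sketch assumes one common layout: embed Pinyin token IDs, run one LSTM left to right and another right to left, concatenate the two final hidden states with the numerical feature vector (late fusion), and apply a logistic output. It is a forward pass only, written in NumPy with random, untrained weights to show the data flow; `LSTMDirection` and `BiLSTMFusion` are hypothetical names. In practice such a model would be trained with a framework, for example PyTorch's `torch.nn.LSTM(..., bidirectional=True)`.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


class LSTMDirection:
    """A single-direction LSTM with random, untrained weights."""

    def __init__(self, n_in: int, n_hidden: int, rng: np.random.Generator):
        self.n_hidden = n_hidden
        # Stacked weights for the input, forget, candidate and output gates,
        # acting on the concatenation [x_t, h_{t-1}]
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def final_state(self, xs: np.ndarray) -> np.ndarray:
        """Run over a (T, n_in) sequence and return the last hidden state."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        for x in xs:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        return h


class BiLSTMFusion:
    """Pinyin token IDs -> Bi-LSTM summary, concatenated with numerical features."""

    def __init__(self, vocab_size: int, n_embed: int, n_hidden: int,
                 n_numeric: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(0.0, 0.1, (vocab_size, n_embed))
        self.fwd = LSTMDirection(n_embed, n_hidden, rng)
        self.bwd = LSTMDirection(n_embed, n_hidden, rng)
        self.w_out = rng.normal(0.0, 0.1, 2 * n_hidden + n_numeric)
        self.b_out = 0.0

    def predict_proba(self, token_ids: list[int], numeric: list[float]) -> float:
        xs = self.embed[np.asarray(token_ids, dtype=int)]  # (T, n_embed)
        h_fwd = self.fwd.final_state(xs)          # left-to-right context
        h_bwd = self.bwd.final_state(xs[::-1])    # right-to-left context
        # Late fusion: concatenate both directions with the numerical features
        fused = np.concatenate([h_fwd, h_bwd, np.asarray(numeric, dtype=float)])
        return float(sigmoid(self.w_out @ fused + self.b_out))


model = BiLSTMFusion(vocab_size=500, n_embed=16, n_hidden=32, n_numeric=6)
p = model.predict_proba([12, 87, 87, 301], [0.2, -1.1, 0.4, 0.0, 1.3, -0.5])
print(f"P(impaired) = {p:.3f}")  # untrained weights, so the value is meaningless
```

Late fusion keeps the sequence encoder and the numerical branch independent until the classifier; fusing earlier (for example, feeding the numerical features into every LSTM step) is an equally plausible reading of the abstract.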

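The reported results are an accuracy of 93.80% and an AUC of 0.93. For reference, both metrics can be computed as below: accuracy from thresholded scores, and AUC as the probability that a randomly chosen impaired participant receives a higher score than a randomly chosen healthy control (the Mann-Whitney formulation, with ties counted as one half). The 0.5 threshold and the label coding (1 = impaired) are assumptions; scikit-learn provides equivalent `accuracy_score` and `roc_auc_score` functions.

```python
def accuracy(y_true: list[int], y_score: list[float], threshold: float = 0.5) -> float:
    """Fraction of participants whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true: list[int], y_score: list[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]             # 1 = cognitive impairment (assumed coding)
y_score = [0.1, 0.4, 0.35, 0.8]
print(accuracy(y_true, y_score))  # 0.75
print(roc_auc(y_true, y_score))   # 0.75
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures need not coincide.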
