A Multimodal Fusion Framework for Early Detection of Cognitive Impairment in Chinese Speakers Using Pinyin Sequences and Acoustic Features
Abstract
Background
Alzheimer's disease (AD) is a leading cause of dementia. Traditional diagnostic methods such as cerebrospinal fluid testing and PET imaging are invasive and costly, which limits their use for early detection. Language biomarker analysis offers a non‐invasive, efficient alternative that detects cognitive impairment through speech. In Chinese, however, the prevalence of homophones often leads to transcription errors, which may reduce model accuracy. Converting transcripts to Pinyin sequences can reduce this ambiguity: homophonous characters share the same Pinyin, so a recognition error that substitutes one homophone for another leaves the Pinyin sequence unchanged. This study proposes a novel speech‐based method that improves both the accuracy and the efficiency of cognitive impairment detection.
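To make the homophone argument concrete, the following Python sketch maps characters to toned Pinyin syllables using a small hand-written lexicon. The lexicon, the function name `to_pinyin`, and the example sentences are hypothetical and for illustration only; they are not taken from the study, and the abstract does not state whether tones were retained in the Pinyin sequences.

```python
# Hypothetical mini-lexicon: character -> toned Pinyin syllable (tone as a trailing digit).
# A real pipeline would use a full lexicon or a conversion library; these entries are illustrative.
PINYIN = {
    "他": "ta1", "她": "ta1", "它": "ta1",
    "在": "zai4", "再": "zai4",
    "公": "gong1", "园": "yuan2", "元": "yuan2",
    "走": "zou3",
}


def to_pinyin(text: str) -> list[str]:
    """Map each character to its Pinyin syllable; characters not in the lexicon pass through unchanged."""
    return [PINYIN.get(ch, ch) for ch in text]


reference = "她在公园走"   # what the speaker actually said
asr_output = "他再公元走"  # hypothetical transcript with three homophone substitutions

print(reference == asr_output)                        # False: the character strings differ
print(to_pinyin(reference) == to_pinyin(asr_output))  # True: the Pinyin sequences agree
```

The character strings differ, yet their Pinyin sequences are identical, which is the ambiguity reduction described above. A substitution between characters that are not homophones would still change the Pinyin sequence.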
Method
This study used a systematic pipeline to distinguish individuals with cognitive impairment from healthy controls (HC). With approval from the hospital ethics committee, data from 300 participants in the China Preclinical Alzheimer's Disease Study (C‐PAS) cohort were extracted. Audio recordings were transcribed with iFLYTEK's speech recognition tool and converted into Pinyin sequences. Acoustic features such as pause frequency and silent time were extracted with OpenSMILE, and MFCC features were also incorporated. Together with demographic variables, these features formed comprehensive digital signatures for model training. To compensate for the small sample size, the data were augmented by adding noise to the numerical features and by simulating word omissions, repetitions, and replacements in the Pinyin sequences. A bidirectional LSTM (Bi‐LSTM), which reads each sequence in both the forward and backward directions to capture context and semantic relationships, was used to fuse the Pinyin sequences with the numerical features for classification.
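The augmentation step can be sketched as follows. This is a minimal Python illustration under assumed settings: the abstract does not report the noise type or scale, the perturbation probabilities, or how replacement tokens were chosen, so the relative Gaussian noise, the independent per-token probabilities, and uniform sampling of replacements are all assumptions. The names `add_feature_noise` and `perturb_pinyin` are hypothetical.

```python
import random


def add_feature_noise(features, rel_sigma=0.05, rng=None):
    """Return a copy of `features` with zero-mean Gaussian noise added to each value.

    Assumption: the noise standard deviation is proportional to each value's magnitude.
    """
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, rel_sigma * abs(x)) for x in features]


def perturb_pinyin(tokens, p_omit=0.1, p_repeat=0.1, p_replace=0.1, vocab=None, rng=None):
    """Simulate word omissions, repetitions, and replacements in a Pinyin token sequence.

    Assumption: each token undergoes at most one perturbation, and replacements are drawn
    uniformly from `vocab` (by default, the distinct tokens of the sequence itself).
    """
    rng = rng or random.Random()
    vocab = vocab or sorted(set(tokens))
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_omit:
            continue                       # omission: drop the token
        if r < p_omit + p_repeat:
            out.extend([tok, tok])         # repetition: emit the token twice
        elif r < p_omit + p_repeat + p_replace:
            out.append(rng.choice(vocab))  # replacement: draw a token from the vocabulary
        else:
            out.append(tok)                # no change
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the augmented copies are reproducible
    tokens = ["ta1", "zai4", "gong1", "yuan2", "zou3"]
    for _ in range(3):
        print(perturb_pinyin(tokens, rng=rng))
    # Hypothetical feature vector: pause rate, silent time (s), one MFCC coefficient.
    print(add_feature_noise([0.42, 12.0, 3.5], rng=rng))
```

Each call yields a differently perturbed copy, so a single recording can be expanded into several training examples; a fixed seed keeps the augmented set reproducible across runs.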
Result
The proposed method achieved an accuracy of 93.80% and an Area Under the Curve (AUC) of 0.93, outperforming models trained solely on acoustic features or on cognitive test scores. Ablation experiments showed that combining Pinyin sequences with acoustic features significantly improved model performance, underscoring the value of integrating linguistic and acoustic data for detecting Alzheimer's disease in Chinese speakers.
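The two reported metrics answer different questions. Accuracy depends on a decision threshold, whereas AUC is threshold-free: it equals the probability that a randomly chosen impaired participant receives a higher model score than a randomly chosen healthy control, with ties counting one half. The Python sketch below computes both on hypothetical toy scores, not the study's data; the pairwise loop is quadratic and meant only to show the definition.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) outscores a random negative (label 0); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = cognitive impairment, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

print(round(roc_auc(labels, scores), 3))   # → 0.889 (8 of 9 positive/negative pairs ranked correctly)
print(round(accuracy(labels, scores), 3))  # → 0.667 (4 of 6 cases correct at threshold 0.5)
```

On these toy scores the two figures diverge noticeably, which is why a classifier is usually reported with both.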
Conclusion
This study demonstrates the feasibility and effectiveness of integrating Pinyin sequences and acoustic features for non‐invasive Alzheimer's detection in Chinese. The method offers a practical tool for early screening, and the findings pave the way for larger‐scale studies and potential clinical application.