A Multimodal Fusion Framework for Early Detection of Cognitive Impairment in Chinese Speakers Using Pinyin Sequences and Acoustic Features

Abstract

Background: Alzheimer's disease (AD) is a leading cause of dementia. Traditional diagnostic methods such as cerebrospinal fluid testing and PET imaging are invasive and costly, which limits early detection. Language biomarker analysis offers a non-invasive, efficient alternative that detects cognitive impairment through speech. In Chinese, however, homophones often lead to transcription errors, which may reduce model accuracy. Converting text to Pinyin sequences can reduce this ambiguity and improve detection. This study proposes a novel speech-based method to detect cognitive impairment more accurately and efficiently.

Method: This study used a systematic approach to differentiate individuals with cognitive impairment from healthy controls (HC). With approval from the hospital ethics committee, data from 300 participants in the China Preclinical Alzheimer's Disease Study (C-PAS) cohort were extracted. Audio data were transcribed using iFLYTEK's speech recognition tool and converted into Pinyin sequences. Acoustic features, such as pause frequency and silent time, were extracted using OpenSMILE, and MFCC features were also incorporated. These features, along with demographic variables, formed comprehensive digital signatures for model training. To address the small sample size, data augmentation was applied: noise was added to numerical features, and word omissions, repetitions, and replacements were simulated in the Pinyin sequences. A bi-directional LSTM model, known for capturing context and semantic relevance, was employed to fuse Pinyin sequences with numerical features and optimize classification performance.

Result: The proposed method achieved an accuracy of 93.80% and an Area Under the Curve (AUC) of 0.93, outperforming models trained solely on acoustic features or cognitive test scores. Ablation experiments revealed that combining Pinyin sequences with acoustic features significantly enhanced model performance, emphasizing the importance of integrating both linguistic and acoustic data for detecting Alzheimer's disease in Chinese.

Conclusion: This study demonstrates the feasibility and effectiveness of integrating Pinyin sequences and acoustic features for non-invasive Alzheimer's detection in Chinese. These findings provide a practical tool for early screening and pave the way for larger-scale studies and potential clinical application.
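
The data augmentation step named in the Method (noise on numerical features; simulated omissions, repetitions, and replacements in Pinyin sequences) could look roughly like the sketch below. This is an illustration only, not the authors' implementation: the function names `augment_pinyin` and `add_noise`, the per-token probabilities, the relative noise scale, and the toneless, space-separated Pinyin tokens are all assumptions, since the abstract does not specify them.

```python
import random


def augment_pinyin(tokens, rng, p_omit=0.05, p_repeat=0.05, p_replace=0.05, vocab=None):
    """Return a noisy copy of a Pinyin token list (hypothetical helper).

    Each token is independently omitted, repeated, replaced by a random
    vocabulary token, or kept. The probabilities are illustrative
    assumptions, not values reported in the study.
    """
    # Default replacement pool: the distinct tokens of the sequence itself.
    vocab = vocab or sorted(set(tokens))
    out = []
    for tok in tokens:
        r = rng.random()  # uniform in [0, 1)
        if r < p_omit:
            continue  # simulate a word omission
        elif r < p_omit + p_repeat:
            out.extend([tok, tok])  # simulate a word repetition
        elif r < p_omit + p_repeat + p_replace:
            out.append(rng.choice(vocab))  # simulate a word replacement
        else:
            out.append(tok)  # keep the token unchanged
    return out


def add_noise(features, rng, rel_sigma=0.05):
    """Add zero-mean Gaussian noise scaled to each feature's magnitude.

    rel_sigma is an assumed relative noise level, not taken from the study.
    """
    return [x + rng.gauss(0.0, rel_sigma * abs(x)) for x in features]


# Example usage with a seeded generator so the run is reproducible.
rng = random.Random(0)
tokens = "wo kan dao yi ge nan hai".split()  # made-up sample sequence
print(augment_pinyin(tokens, rng))
print(add_noise([0.42, 3.1, 12.0], rng))  # e.g. pause frequency, silent time, ...
```

Passing in a seeded `random.Random` instance keeps the augmented copies reproducible across runs, which makes it easier to compare models trained with and without augmentation.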
