Javascript must be enabled to continue!
Sound signal analysis in Japanese speech recognition based on deep learning algorithm
View through CrossRef
Abstract
As an important carrier of information, since sound can be collected quickly and is not limited by angle and light, it is often used to assist in understanding the environment and creating information. Voice signal recognition technology is a typical speech recognition application. This article focuses on the voice signal recognition technology around various deep learning models. By using deep learning neural networks with different structures and different types, information and representations related to the recognition of sound signal samples can be obtained, so as to further improve the detection accuracy of the sound signal recognition detection system. Based on this, this paper proposes an enhanced deep learning model of multi-scale neural convolutional network and uses it to recognize sound signals. The CCCP layer is used to reduce the dimensionality of the underlying feature map, so that the units captured in the network will eventually have internal features in each layer, thereby retaining the feature information to the maximum extent, which will form a convolutional multi-scale model in network deep learning Neurons. Finally, the article discusses the related issues of Japanese speech recognition on this basis. This article first uses the font (gra-phonem), that is, all these Japanese kana and common Chinese characters, using a total of 2795 units for modeling. There is a big gap between the experiment and the (BiLSTM-HMM) system. In addition, when Japanese speech is known, it is incorporated into the end-to-end recognition system to improve the performance of the Japanese speech recognition system. Based on the above-mentioned deep learning and sound signal analysis experiments and principles, the final effect obtained is better than the main effect of the Japanese speech recognition system of the latent Markov model and the long-short memory network, thus promoting its development.
Title: Sound signal analysis in Japanese speech recognition based on deep learning algorithm
Description:
Abstract
As an important carrier of information, since sound can be collected quickly and is not limited by angle and light, it is often used to assist in understanding the environment and creating information.
Voice signal recognition technology is a typical speech recognition application.
This article focuses on the voice signal recognition technology around various deep learning models.
By using deep learning neural networks with different structures and different types, information and representations related to the recognition of sound signal samples can be obtained, so as to further improve the detection accuracy of the sound signal recognition detection system.
Based on this, this paper proposes an enhanced deep learning model of multi-scale neural convolutional network and uses it to recognize sound signals.
The CCCP layer is used to reduce the dimensionality of the underlying feature map, so that the units captured in the network will eventually have internal features in each layer, thereby retaining the feature information to the maximum extent, which will form a convolutional multi-scale model in network deep learning Neurons.
Finally, the article discusses the related issues of Japanese speech recognition on this basis.
This article first uses the font (gra-phonem), that is, all these Japanese kana and common Chinese characters, using a total of 2795 units for modeling.
There is a big gap between the experiment and the (BiLSTM-HMM) system.
In addition, when Japanese speech is known, it is incorporated into the end-to-end recognition system to improve the performance of the Japanese speech recognition system.
Based on the above-mentioned deep learning and sound signal analysis experiments and principles, the final effect obtained is better than the main effect of the Japanese speech recognition system of the latent Markov model and the long-short memory network, thus promoting its development.
Related Results
Zero to hero
Zero to hero
Western images of Japan tell a seemingly incongruous story of love, sex and marriage – one full of contradictions and conflicting moral codes. We sometimes hear intriguing stories ...
Multimodal Emotion Recognition and Human Computer Interaction for AI-Driven Mental Health Support (Preprint)
Multimodal Emotion Recognition and Human Computer Interaction for AI-Driven Mental Health Support (Preprint)
BACKGROUND
Mental health has become one of the most urgent global health issues of the twenty-first century. The World Health Organization (WHO) reports tha...
Speech, communication, and neuroimaging in Parkinson's disease : characterisation and intervention outcomes
Speech, communication, and neuroimaging in Parkinson's disease : characterisation and intervention outcomes
<p dir="ltr">Most individuals with Parkinson's disease (PD) experience changes in speech, voice or communication. Speech changes often manifest as hypokinetic dysarthria, a m...
Speech, communication, and neuroimaging in Parkinson's disease : characterisation and intervention outcomes
Speech, communication, and neuroimaging in Parkinson's disease : characterisation and intervention outcomes
<p dir="ltr">Most individuals with Parkinson's disease (PD) experience changes in speech, voice or communication. Speech changes often manifest as hypokinetic dysarthria, a m...
Speech, communication, and neuroimaging in Parkinson's disease : Characterisation and intervention outcomes
Speech, communication, and neuroimaging in Parkinson's disease : Characterisation and intervention outcomes
<p dir="ltr">Most individuals with Parkinson's disease (PD) experience changes in speech, voice or communication. Speech changes often manifest as hypokinetic dysarthria, a m...
Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
BACKGROUND
As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...
A recognition method research based on the heart sound texture map
A recognition method research based on the heart sound texture map
In order to improve the Heart Sound recognition rate and reduce the recognition time, in this paper, we introduces a new method for Heart Sound pattern recognition by using Heart S...
Classification of Bisyllabic Lexical Stress Patterns Using Deep Neural Networks
Classification of Bisyllabic Lexical Stress Patterns Using Deep Neural Networks
Background and Objectives: As English is a stress-timed language, lexical stress plays an important role in the perception and processing of speech by native speakers. Incorrect st...

