Javascript must be enabled to continue!

An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System

Kertas kerja ini membentangkan pemprosesan semula ciri pertuturan pemalar Pengekodan Ramalan Linear (LPC) bagi menyediakan template rujukan yang boleh diharapkan untuk set perkataan yang hendak dicam menggunakan rangkaian neural buatan. Kertas kerja ini juga mencadangkan penggunaan cirian kenyaringan yang ditakrifkan dari data pertuturan sebagai satu lagi ciri input. Algoritma Warping Masa Dinamik (DTW) menjadi asas kepada algoritma baru yang dibangunkan, ia dipanggil sebagai DTW padanan bingkai (DTW–FF). Algoritma ini direka bentuk untuk melakukan padanan bingkai bagi pemprosesan semula input LPC. Ia bertujuan untuk menyamakan bilangan bingkai input dalam set ujian dengan set rujukan. Pernormalan bingkaian ini adalah diperlukan oleh rangkaian neural yang direka untuk membanding data yang harus mempunyai kepanjangan yang sama, sedangkan perkataan yang sama dituturkan dengan kepanjangan yang berbeza–beza. Dengan melakukan padanan bingkai, bingkai input dan rujukan boleh diubahsuai supaya bilangan bingkaian sama seperti bingkaian rujukan. Satu lagi misi kertas kerja ini ialah mentakrif dan menggunakan cirian kenyaringan menggunakan algoritma penapis harmonik. Selepas kenyaringan ditakrif dan pemalar LPC dinormalkan kepada bilangan bingkaian dikehendaki, pengecaman pertuturan menggunakan rangkaian neural dilakukan. Keputusan yang baik diperoleh sehingga mencapai ketepatan setinggi 98% menggunakan kombinasi cirian DTW–FF dan cirian kenyaringan. Di akhir kertas kerja ini, perbandingan kadar convergence antara Conjugate gradient descent (CGD), Quasi–Newton, dan Steepest Gradient Descent (SGD) dilakukan untuk mendapatkan arah carian titik global yang optimal. Keputusan menunjukkan CGD memberikan nilai titik global yang paling optimal dibandingkan dengan Quasi–Newton dan SGD. Kata kunci: Warping masa dinamik, pernormalan masa, rangkaian neural, pengecaman pertuturan, conjugate gradient descent A pre–processing of linear predictive coefficient (LPC) features for preparation of reliable reference templates for the set of words to be recognized using the artificial neural network is presented in this paper. The paper also proposes the use of pitch feature derived from the recorded speech data as another input feature. The Dynamic Time Warping algorithm (DTW) is the back–bone of the newly developed algorithm called DTW fixing frame algorithm (DTW–FF) which is designed to perform template matching for the input preprocessing. The purpose of the new algorithm is to align the input frames in the test set to the template frames in the reference set. This frame normalization is required since NN is designed to compare data of the same length, however same speech varies in their length most of the time. By doing frame fixing, the input frames and the reference frames are adjusted to the same number of frames according to the reference frames. Another task of the study is to extract pitch features using the Harmonic Filter algorithm. After pitch extraction and linear predictive coefficient (LPC) features fixed to a desired number of frames, speech recognition using neural network can be performed and results showed a very promising solution. Result showed that as high as 98% recognition can be achieved using combination of two features mentioned above. At the end of the paper, a convergence comparison between conjugate gradient descent (CGD), Quasi–Newton, and steepest gradient descent (SGD) search direction is performed and results show that the CGD outperformed the Newton and SGD. Key words: Dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent

Penerbit UTM Press

Rubita Sudirman Sh. Hussain Salleh Shaharuddin Salleh

Jurnal Teknologi

2013

Title: An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System

Description:

Kertas kerja ini juga mencadangkan penggunaan cirian kenyaringan yang ditakrifkan dari data pertuturan sebagai satu lagi ciri input.

Algoritma Warping Masa Dinamik (DTW) menjadi asas kepada algoritma baru yang dibangunkan, ia dipanggil sebagai DTW padanan bingkai (DTW–FF).

Algoritma ini direka bentuk untuk melakukan padanan bingkai bagi pemprosesan semula input LPC.

Ia bertujuan untuk menyamakan bilangan bingkai input dalam set ujian dengan set rujukan.

Pernormalan bingkaian ini adalah diperlukan oleh rangkaian neural yang direka untuk membanding data yang harus mempunyai kepanjangan yang sama, sedangkan perkataan yang sama dituturkan dengan kepanjangan yang berbeza–beza.

Dengan melakukan padanan bingkai, bingkai input dan rujukan boleh diubahsuai supaya bilangan bingkaian sama seperti bingkaian rujukan.

Satu lagi misi kertas kerja ini ialah mentakrif dan menggunakan cirian kenyaringan menggunakan algoritma penapis harmonik.

Selepas kenyaringan ditakrif dan pemalar LPC dinormalkan kepada bilangan bingkaian dikehendaki, pengecaman pertuturan menggunakan rangkaian neural dilakukan.

Keputusan yang baik diperoleh sehingga mencapai ketepatan setinggi 98% menggunakan kombinasi cirian DTW–FF dan cirian kenyaringan.

Di akhir kertas kerja ini, perbandingan kadar convergence antara Conjugate gradient descent (CGD), Quasi–Newton, dan Steepest Gradient Descent (SGD) dilakukan untuk mendapatkan arah carian titik global yang optimal.

Keputusan menunjukkan CGD memberikan nilai titik global yang paling optimal dibandingkan dengan Quasi–Newton dan SGD.

Kata kunci: Warping masa dinamik, pernormalan masa, rangkaian neural, pengecaman pertuturan, conjugate gradient descent A pre–processing of linear predictive coefficient (LPC) features for preparation of reliable reference templates for the set of words to be recognized using the artificial neural network is presented in this paper.

The paper also proposes the use of pitch feature derived from the recorded speech data as another input feature.

The Dynamic Time Warping algorithm (DTW) is the back–bone of the newly developed algorithm called DTW fixing frame algorithm (DTW–FF) which is designed to perform template matching for the input preprocessing.

The purpose of the new algorithm is to align the input frames in the test set to the template frames in the reference set.

This frame normalization is required since NN is designed to compare data of the same length, however same speech varies in their length most of the time.

By doing frame fixing, the input frames and the reference frames are adjusted to the same number of frames according to the reference frames.

Another task of the study is to extract pitch features using the Harmonic Filter algorithm.

After pitch extraction and linear predictive coefficient (LPC) features fixed to a desired number of frames, speech recognition using neural network can be performed and results showed a very promising solution.

Result showed that as high as 98% recognition can be achieved using combination of two features mentioned above.

At the end of the paper, a convergence comparison between conjugate gradient descent (CGD), Quasi–Newton, and steepest gradient descent (SGD) search direction is performed and results show that the CGD outperformed the Newton and SGD.

Key words: Dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent.

Back

Standardized driving cycles often fail to represent real-world driving conditions, limiting their effectiveness for hybrid electric vehicles energy management strategy (EMS) optimi...

Hydatid Disease of The Brain Parenchyma: A Systematic Review

Abstarct Introduction Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...

Advantages of DTW windowing function in automated correlation of stratigraphic time series

The Dynamical Time Warping (DTW) technique has been originally developed for speech recognition in the late 1960s and early 1970s and has more recently been applied in geoscientifi...

Multimodal Emotion Recognition and Human Computer Interaction for AI-Driven Mental Health Support (Preprint)

BACKGROUND Mental health has become one of the most urgent global health issues of the twenty-first century. The World Health Organization (WHO) reports tha...

DESAIN ULANG LOGO DAN CITRA: UPAYA REBRANDING DAYA TARIK WISATA JATILUWIH, KABUPATEN TABANAN

Pengabdian masyarakat desain ulang logo dan citra di Daya Tarik Wisata (DTW) Jatiluwih, Desa Jatiluwih, Kabupaten Tabanan menjadi sebuah upaya Manajemen Operasional (MO) DTW Jatilu...

The use of geographic information systems and remote sensing to evaluate climate change effect on groundwater: application to Mostaganem Plateau, Northwest Algeria

Effects of climate change in semi-arid areas occur in drought events, which affect aquifers whose recharge depends essentially on precipitation. The objective of this study is to e...

EPD Electronic Pathogen Detection v1

Electronic pathogen detection (EPD) is a non - invasive, rapid, affordable, point- of- care test, for Covid 19 resulting from infection with SARS-CoV-2 virus. EPD scanning techno...

Sound signal analysis in Japanese speech recognition based on deep learning algorithm

Abstract As an important carrier of information, since sound can be collected quickly and is not limited by angle and light, it is often used to assist in understanding the...

Email:
Password:

Email:

An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System

Related Results