Lip2Text: Sentence-Level Lipreading on English Speakers Using the Deep Learning Approach
Abstract
Lipreading is the skill of visually analyzing lip movements and facial cues to understand spoken language. It can assist individuals with hearing impairments and recover speech content in noisy environments where the audio is unreliable. Deep learning has substantially advanced lipreading through sequence models such as Temporal Convolutional Networks (TCNs), Gated Recurrent Units (GRUs), Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), which perform well on sequential data such as lip-frame sequences and can recognize subtle lip movements to predict what is being spoken. Most current approaches perform very well on speakers seen during training, because they learn those speakers' visual cues thoroughly, but they struggle with unseen speakers. To address this, we propose a deep learning framework based on LSTMs and Spatiotemporal Convolutional Neural Networks (STCNNs) that maps lip sequences to their corresponding text, trained on a multi-speaker dataset with an unconstrained, open vocabulary. The pipeline first uses a Wav2Lip-GFPGAN (Generative Facial Prior Generative Adversarial Network) model to convert the input video, through lip synchronisation, into a video of the dataset's speaker uttering the same words; it then applies the LipNet model for lipreading, and finally corrects the predicted text with a spell checker from the Python TextBlob library. This approach, novel in the lipreading domain, makes the model speaker-independent and broadens its applications across multiple fields, including assisting hearing-impaired individuals and improving communication accessibility. Another potential application is lipreading suspects in CCTV footage, which may offer benefits in certain security and investigative contexts.
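LipNet, in its original formulation, outputs a probability distribution over characters for every video frame and is trained with Connectionist Temporal Classification (CTC) loss; the abstract does not state how this work decodes those outputs. A minimal sketch of greedy (best-path) CTC decoding follows, assuming a hypothetical label set with the blank at index 0, a space at index 1 and the lowercase letters after it. The names `ALPHABET`, `BLANK`, `one_hot` and `greedy_ctc_decode` are illustrative and are not taken from LipNet's code.

```python
from typing import List, Optional, Sequence

# Hypothetical label set: index 0 is the CTC blank, index 1 is a space and
# indices 2..27 are the lowercase letters a..z.
BLANK = 0
ALPHABET = ["<blank>", " "] + [chr(c) for c in range(ord("a"), ord("z") + 1)]


def one_hot(label: int) -> List[float]:
    """Fake per-frame distribution that puts all probability on one label."""
    return [1.0 if i == label else 0.0 for i in range(len(ALPHABET))]


def greedy_ctc_decode(frame_probs: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax label per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=lambda i: p[i]) for p in frame_probs]
    chars: List[str] = []
    previous: Optional[int] = None
    for label in best_path:
        # Repeats collapse into one character unless a blank separates them.
        if label != previous and label != BLANK:
            chars.append(ALPHABET[label])
        previous = label
    return "".join(chars)


if __name__ == "__main__":
    idx = ALPHABET.index
    # Hypothetical best path for "bin blue", with repeats and a blank as a
    # CTC-trained model might emit them across video frames.
    path = [idx("b"), idx("b"), idx("i"), BLANK, idx("n"), idx(" "), idx(" "),
            idx("b"), idx("l"), idx("u"), idx("u"), idx("e")]
    print(greedy_ctc_decode([one_hot(label) for label in path]))  # bin blue
```

Merging repeats before removing blanks is what lets CTC spell double letters: a blank between two `e` frames yields "ee", while adjacent `e` frames yield a single "e". Beam search over the same per-frame distributions is a common, more accurate alternative to this greedy pass.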
Lipreading nonetheless remains a challenging task whose accuracy depends on factors such as video quality and lighting conditions; this work also addresses these factors.
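The last stage of the pipeline described in the abstract corrects LipNet's raw transcription with the TextBlob library, whose `TextBlob(text).correct()` method is based on Peter Norvig's spelling-correction algorithm. To stay self-contained, the sketch below reimplements that algorithm rather than calling TextBlob: it generates candidate words within one or two edits of each token and picks the most frequent known one. The vocabulary (`CORPUS`), the misspelled input and all function names are hypothetical; this is not the authors' code.

```python
import re
from collections import Counter
from typing import Dict, Set

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Tiny hypothetical vocabulary; a real corrector counts words in a large corpus.
CORPUS = """
bin blue at f two now
lay red by g nine soon
place white with r four please
set green in t one again
"""


def build_counts(text: str) -> Dict[str, int]:
    """Word frequencies used to rank candidate corrections."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word: str, counts: Dict[str, int]) -> str:
    """Keep known words; otherwise pick the most frequent known word at edit
    distance 1, then 2; fall back to the input if nothing is found."""
    if word in counts:
        return word
    candidates = {w for w in edits1(word) if w in counts}
    if not candidates:
        candidates = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in counts}
    if not candidates:
        return word
    # Sorting first makes ties resolve deterministically (alphabetically first).
    return max(sorted(candidates), key=lambda w: counts[w])


def correct_sentence(sentence: str, counts: Dict[str, int]) -> str:
    """Correct each whitespace-separated word of a lowercased sentence."""
    return " ".join(correct_word(w, counts) for w in sentence.lower().split())


if __name__ == "__main__":
    counts = build_counts(CORPUS)
    # Hypothetical raw lipreading output with character-level errors.
    print(correct_sentence("bin blu at f twoo nw", counts))  # bin blue at f two now
```

Ranking candidates with counts from a domain-specific vocabulary, as here, keeps corrections within the dataset's words; TextBlob instead ships its own general English word-frequency list, so its corrections of the same input can differ.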
Springer Science and Business Media LLC
Related Results
Aviation English - A global perspective: analysis, teaching, assessment
This e-book brings together 13 chapters written by aviation English researchers and practitioners settled in six different countries, representing institutions and universities fro...
Outcomes of Transplant-Eligible Patients with Myelodysplastic Syndrome-Refractory Anemia with Excess Blasts Registered in a Prospective Observational Study: The JALSG-CS11-MDS-SCT
Abstract
Introduction: Allogeneic hematopoietic stem cell transplantation (allo-SCT) is the sole curative therapy for myelodysplastic syndromes (MDS). Several studie...
Functions of Communicatively Relevant Silence in German (Funkcije komunikacijski relevantne šutnje u njemačkome)
Additionally, this chapter presents research of silence with review of main aspects of papers in the field of conversational analysis, ethnography of communication and metaphor of ...
CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The Covid-19 pandemic currently requires teachers to be able to use technology in the teaching and learning process. But in reality there are still many teachers who have not been able ...
Sentence Function Patterns in the Novel “Pulang” by Tere Liye and Their Suitability as Enrichment Material for Grade XII Senior High School Students (Pola Fungsi Kalimat pada Novel “Pulang” Karya Tere Liye dan Kelayakannya sebagai Materi Pengayaan Siswa Kelas XII SMA)
Understanding sentence function patterns plays a major role in reading a novel, especially in class XII. By studying the understanding of sentence function patterns, class XII stud...
How commitment affects trust in communication: coordination, confidence and evidence
Given the risks of defection and misinformation, humans have evolved mechanisms of strategic vigilance to evaluate speakers’ disposition to be good partners (Heintz et al., 2016) a...
Sensory integration of speech by a profoundly deaf subject using tactile aids
Previous research on tactual speech perception has focused on the relative contributions of lipreading and taction with normally hearing subjects. The integration of information fr...
Typical lipreading and audiovisual speech perception without motor simulation
ABSTRACT
All it takes is a face to face conversation in a noisy environment to realize that viewing a speaker’s lip movements contributes to speech comprehension. F...

