Bidirectional Translation of ASL and English Using Machine Vision and CNN and Transformer Networks

This study aims to develop a system for translating American Sign Language (ASL) to and from English, enhancing accessibility for ASL users. We leveraged a publicly available dataset to train a model that accurately predicts ASL signs and their English translations. The system employs AI-based transformers for bidirectional translation: converting text and speech into ASL using computer vision and translating ASL signs into text. For user accessibility, we built a web-based interface that integrates a computer vision framework (MediaPipe) to detect key body landmarks, including hands, shoulders, and facial features. This enables the system to process text, speech input, and video recordings, which are stored using msgpack and analyzed to generate ASL imagery. Additionally, we are developing a transformer model that is trained jointly on pairs of gloss sequences and sentences using connectionist temporal classification (CTC) and cross-entropy loss. Along with that, we are utilizing an EfficientNet-B0 pretrained on the ImageNet dataset with 1D convolution blocks to extract features from video frames, helping facilitate the conversion of ASL signs into structured English text.
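The joint training objective mentioned above can be illustrated with a minimal, dependency-free sketch. It assumes that the CTC term scores the gloss sequence against per-frame outputs and that the cross-entropy term scores the English tokens against decoder outputs; the abstract does not spell out this split. The function names and the `ctc_weight` mixing parameter are hypothetical and are not taken from the paper.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss via the forward algorithm.

    log_probs: T x V list of per-frame log-probabilities.
    target: list of label ids (e.g. gloss ids), without blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    size = len(ext)

    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            terms = [alpha[s]]                 # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])     # advance by one position
            # Skip over a blank only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or the trailing blank
    total = logsumexp(alpha[-1], alpha[-2]) if size > 1 else alpha[-1]
    return -total


def cross_entropy(log_probs, target):
    """Mean negative log-probability of the target tokens."""
    return -sum(step[tok] for step, tok in zip(log_probs, target)) / len(target)


def joint_loss(frame_log_probs, gloss_ids, dec_log_probs, word_ids, ctc_weight=0.5):
    """Weighted sum of the two terms; ctc_weight is a hypothetical hyperparameter."""
    ctc = ctc_neg_log_likelihood(frame_log_probs, gloss_ids)
    ce = cross_entropy(dec_log_probs, word_ids)
    return ctc_weight * ctc + (1.0 - ctc_weight) * ce
```

In practice a deep learning framework would supply both loss terms; the sketch only makes explicit how the two objectives might combine into a single scalar.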

MDPI AG

Stefanie Amiruzzaman, Md. Amiruzzaman, James Dracup, Alexander Pham, Benjamin Crocker, Linh Ngo, and M. Ali Akber Dewan

2025

In American Sign Language (ASL), a receiver watches the signer and receives language visually. In contrast, when using tactile ASL, a variety of ASL, the deaf-blind receiver receiv...

Aviation English - A global perspective: analysis, teaching, assessment

This e-book brings together 13 chapters written by aviation English researchers and practitioners settled in six different countries, representing institutions and universities fro...

Automatic Load Sharing of Transformer

Transformer plays a major role in the power system. It works 24 hours a day and provides power to the load. The transformer is excessive full, its windings are overheated which lea...

Bidirectional Translation of ASL and English Using Machine Vision and CNN and Transformer Networks

This study presents a real-time, bidirectional system for translating American Sign Language (ASL) to and from English using computer vision and transformer-based models to enhance...

The First Insight into the Hereditary Fusion Gene Landscape of Amyotrophic Lateral Sclerosis

Abstract Amyotrophic lateral sclerosis (ALS) is a progressive nervous system disease that causes loss of muscle control. Over 30 mutated genes ar...

High frequency modeling of power transformers under transients

This thesis presents the results related to high frequency modeling of power transformers. First, a 25kVA distribution transformer under lightning surges is tested in the laborator...

Aspects of Rhythm in ASL

The fluent production of American Sign Language (ASL), like speech involves highly skilled, complex motor activity. Thus, like all skilled motor acts, it is rhythmically structured...

570-P: Designing the Deaf Diabetes Can Together Intervention

Deaf individuals who communicate using American Sign Language (ASL) are 3 times more likely to have diabetes than hearing people, yet experience challenges obtaining diabetes educa...

Related Results