Symmetric Combined Convolution with Convolutional Long Short-Term Memory for Monaural Speech Enhancement

Deep neural network-based approaches have made remarkable progress in monaural speech enhancement. Nevertheless, current cutting-edge approaches remain vulnerable to complex acoustic scenarios. We propose a Symmetric Combined Convolution Network with ConvLSTM (SCCN) for monaural speech enhancement. Specifically, the Combined Convolution Block uses parallel convolution branches, comprising a standard convolution and two different depthwise separable convolutions, to reinforce feature extraction along both the depthwise and channelwise dimensions. Similarly, Combined Deconvolution Blocks are stacked to construct the convolutional decoder. Moreover, we introduce exponentially increasing dilation between convolutional kernel elements in the encoder and decoder, which expands the receptive fields. Meanwhile, grouped ConvLSTM layers are used to capture the interdependency of spatial and temporal information. The experimental results show that the proposed SCCN method achieves an average STOI of 86.00% and an average PESQ of 2.43, outperforming the state-of-the-art baseline methods and confirming its effectiveness in enhancing speech quality.
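
As a rough illustration of why exponentially increasing dilation expands the receptive field, the sketch below compares a stack of stride-1 convolutions with dilations 1, 2, 4, ... against the same stack with no dilation. The kernel size (3), the number of layers (5) and the dilation base (2) are assumptions chosen for illustration; the abstract does not state the values used in SCCN, and the function name is hypothetical.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field along one axis of stacked stride-1 dilated convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Assumed values for illustration only (not taken from the paper).
kernel_size = 3
num_layers = 5

exp_dilations = [2 ** i for i in range(num_layers)]  # 1, 2, 4, 8, 16
unit_dilations = [1] * num_layers                    # no dilation

rf_exp = receptive_field(kernel_size, exp_dilations)
rf_unit = receptive_field(kernel_size, unit_dilations)

print(f"exponential dilation: {rf_exp} positions")
print(f"no dilation:          {rf_unit} positions")
```

Under these assumptions the dilated stack covers 63 positions against 11 for the undilated one, with the same number of layers and the same number of weights per kernel.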

MDPI AG

Yang Xian, Yujin Fu, Peixu Xing, Hongwei Tao, Yang Sun

Symmetry

2025
