
Reference-Based Speech Enhancement via Feature Alignment and Fusion Network

Speech enhancement aims to recover clean speech from a noisy input. Existing approaches can be classified into single speech enhancement and personalized speech enhancement. Personalized speech enhancement usually uses the speaker identity, extracted from the noisy speech itself or from a clean reference speech, as a global embedding to guide the enhancement process. In contrast, we observe that utterances from the same speaker are correlated at the frame level of their short-time Fourier transform (STFT) spectrograms. We therefore propose reference-based speech enhancement via a feature alignment and fusion network (FAF-Net). Given a noisy speech signal and a clean reference speech spoken by the same speaker, we first propose a feature-level alignment strategy that warps the clean reference to align with the noisy speech at the frame level. We then fuse the reference features with the noisy features via a similarity-based fusion strategy. Finally, the fused features are passed to the decoder through skip connections, and the decoder generates the enhanced result. Experimental results demonstrate that the performance of FAF-Net is close to that of state-of-the-art speech enhancement methods on both the DNS and Voice Bank+DEMAND datasets. Our code is available at https://github.com/HieDean/FAF-Net.
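The two core steps named in the abstract are frame-level alignment of the reference to the noisy input and similarity-based fusion. The NumPy sketch below illustrates one simple way to realize that idea. It is not FAF-Net's implementation, which works on learned features inside a trained network (see the authors' repository). Assumptions made here for illustration: features are plain `(frames, channels)` arrays, alignment is a hard nearest-neighbor match by cosine similarity, fusion is concatenation with a similarity weight, and the helper names `align_reference` and `fuse` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row (one frame) to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def align_reference(noisy: np.ndarray, ref: np.ndarray):
    """Hypothetical hard alignment: warp reference frames onto the noisy time axis.

    noisy: (T_noisy, C) frame-level features of the noisy utterance.
    ref:   (T_ref, C) frame-level features of the clean reference.
    Returns the aligned reference (T_noisy, C) and, for each noisy frame,
    the cosine similarity of its best-matching reference frame.
    """
    sim = l2_normalize(noisy) @ l2_normalize(ref).T   # (T_noisy, T_ref) cosine similarities
    best = sim.argmax(axis=1)                         # best reference frame per noisy frame
    aligned = ref[best]                               # reference reordered to the noisy timeline
    score = sim[np.arange(noisy.shape[0]), best]      # similarity of each chosen match
    return aligned, score


def fuse(noisy: np.ndarray, aligned: np.ndarray, score: np.ndarray) -> np.ndarray:
    """Hypothetical similarity-weighted fusion: rely on the reference where it matches well."""
    weight = np.clip(score, 0.0, 1.0)[:, None]        # negative similarity -> reference ignored
    return np.concatenate([noisy, weight * aligned], axis=1)  # (T_noisy, 2C)


# Usage with synthetic stand-in features (not real STFT or network features).
rng = np.random.default_rng(0)
ref = rng.standard_normal((50, 16))                        # "clean reference" frames
idx = rng.integers(0, 50, size=40)                         # hidden frame correspondence
noisy = ref[idx] + 0.1 * rng.standard_normal((40, 16))     # "noisy" frames
aligned, score = align_reference(noisy, ref)
fused = fuse(noisy, aligned, score)
print(fused.shape)  # (40, 32)
```

Hard argmax alignment keeps the sketch short; a trainable network would more likely use a soft, differentiable weighting over reference frames so that the alignment can be learned end to end.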

Association for the Advancement of Artificial Intelligence (AAAI)

Huanjing Yue, Wenxin Duo, Xiulian Peng, Jingyu Yang

Proceedings of the AAAI Conference on Artificial Intelligence

2022

