
Open Set Audio Classification Using Autoencoders Trained on Few Data

Open-set recognition (OSR) is a challenging machine learning problem that arises when a classifier faces test instances from classes not seen during training. The task is to correctly identify instances from known classes (seen during training) while rejecting unknown or unwanted samples (those belonging to unseen classes). A second problem that arises in practice is few-shot learning (FSL), where only a small number of positive samples is available for training a recognition system. With both limitations in mind, a new audio dataset for OSR and FSL was recently released to promote research on solutions that address them together. This paper proposes an audio OSR/FSL system divided into three steps: a high-level audio representation; feature embedding using two different autoencoder architectures; and a multi-layer perceptron (MLP) trained on latent-space representations to detect known classes and reject unwanted ones. An extensive set of experiments covers multiple combinations of openness factor (the OSR condition) and number of shots (the FSL condition). The results show the validity of the proposed approach and confirm superior performance over a baseline system based on transfer learning.
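
The abstract does not say how the MLP rejects unwanted samples. As an illustration only, the sketch below shows one common open-set decision rule: accept the most probable known class when its softmax probability reaches a threshold, and otherwise label the sample as unknown. The names `predict_open_set` and `UNKNOWN`, the threshold value, and the use of raw logits as input are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

UNKNOWN = -1  # hypothetical label for rejected (unknown or unwanted) samples


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def predict_open_set(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return the predicted known-class index per row, or UNKNOWN.

    `logits` has shape (n_samples, n_known_classes), e.g. the output of a
    classifier applied to latent-space embeddings (assumed, not from the paper).
    A sample is rejected when its highest class probability is below `threshold`.
    """
    probs = softmax(logits)
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return np.where(confident, best, UNKNOWN)


if __name__ == "__main__":
    # Two confident rows and one ambiguous row that should be rejected.
    logits = np.array([[4.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 5.0]])
    print(predict_open_set(logits, threshold=0.7))
```

Raising the threshold rejects more samples, which trades accuracy on known classes for fewer false acceptances of unknown ones. In practice the value would be tuned on held-out data.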

