Javascript must be enabled to continue!

Text Data Augmentation for Deep Learning

Abstract Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.

Springer Science and Business Media LLC

Connor Shorten Taghi M. Khoshgoftaar Borko Furht

2021

Title: Text Data Augmentation for Deep Learning

Description:

Abstract Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning.

In this survey, we consider how the Data Augmentation training strategy can aid in its development.

We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form.

We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data.

Deep Learning generally struggles with the measurement of generalization and characterization of overfitting.

We highlight studies that cover how augmentations can construct test sets for generalization.

NLP is at an early stage in applying Data Augmentation compared to Computer Vision.

We highlight the key differences and promising ideas that have yet to be tested in NLP.

For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few.

Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms).

We hope this paper inspires further research interest in Text Data Augmentation.

Back

<span style="color: #000000; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; ...

Sleep Habits and Occurrence of Lowback Pain among Craftsmen

<span style="color: #000000; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; ...

CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021

The pandemic Covid-19 currently demands teachers to be able to use technology in teaching and learning process. But in reality there are still many teachers who have not been able ...

Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches

Aim/Purpose: The purpose of this paper is to address the gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates using d...

Bounds on the sum of broadcast domination number and strong metric dimension of graphs

Let [Formula: see text] be a connected graph of order at least two with vertex set [Formula: see text]. For [Formula: see text], let [Formula: see text] denote the length of an [Fo...

ANALYSIS OF READING MATERIALS IN TEXTBOOK FOR GRADE XI SENIOR HIGH SCHOOL

This study aims to find out the GI and LD level, the text which has the highest GI and LD and what make the text has the highest GI and LD of Advanced Learning English 2 textbook. ...

Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)

BACKGROUND As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...

E-Press and Oppress

From elephants to ABBA fans, silicon to hormone, the following discussion uses a new research method to look at printed text, motion pictures and a te...

Email:
Password:

Email:

Text Data Augmentation for Deep Learning

Related Results