Quality aspects of annotated data

Abstract

The quality of Machine Learning (ML) applications is commonly assessed by quantifying how well an algorithm fits its respective training data. Yet, a perfect model that learns from and reproduces erroneous data will always be flawed in its real-world application. Hence, a comprehensive assessment of ML quality must include an additional data perspective, especially for models trained on human-annotated data. For the collection of human-annotated training data, best practices often do not exist, leaving researchers to make arbitrary decisions when collecting annotations. Decisions about the selection of annotators or label options may affect training data quality and model performance.

In this paper, I outline and summarize previous research and approaches to the collection of annotated training data. I look at data annotation and its quality confounders from two perspectives: the set of annotators and the strategy of data collection. The paper highlights the various implementations of text and image annotation collection and stresses the importance of careful task construction. I conclude by illustrating the consequences for future research and applications of data annotation. The paper is intended to give readers a starting point on annotated data quality research and to stress to researchers and practitioners the necessity of thoughtful consideration of the annotation collection process.

Springer Science and Business Media LLC

Jacob Beck

AStA Wirtschafts- und Sozialstatistisches Archiv

2023
