
Quality aspects of annotated data

Abstract
The quality of Machine Learning (ML) applications is commonly assessed by quantifying how well an algorithm fits its respective training data. Yet, a perfect model that learns from and reproduces erroneous data will always be flawed in its real-world application. Hence, a comprehensive assessment of ML quality must include an additional data perspective, especially for models trained on human-annotated data. For the collection of human-annotated training data, best practices often do not exist, leaving researchers to make arbitrary decisions when collecting annotations. Decisions about the selection of annotators or label options may affect training data quality and model performance.

In this paper, I outline and summarize previous research and approaches to the collection of annotated training data. I look at data annotation and its quality confounders from two perspectives: the set of annotators and the strategy of data collection. The paper highlights the various implementations of text and image annotation collection and stresses the importance of careful task construction. I conclude by illustrating the consequences for future research and applications of data annotation. The paper is intended to give readers a starting point on annotated data quality research and to stress, to researchers and practitioners, the necessity of thoughtful consideration of the annotation collection process.
Springer Science and Business Media LLC

