
Analysis of Annotated Data Models for Improving Data Quality

The public Medical Data Models (MDM) portal, with more than 9,000 annotated forms from clinical trials and other sources, provides many research opportunities for the medical informatics community. It is mainly used to address the problem of heterogeneity by searching, mediating, reusing, and assessing data models, e.g., the semi-interactive curation of core data records in a specific domain. Furthermore, it can be used as a benchmark for evaluating algorithms that create, transform, annotate, and analyse structured patient data. All data models in the MDM portal are represented syntactically in CDISC ODM, and UMLS CUIs are added semi-automatically at several ODM levels such as ItemGroupDef, ItemDef, or CodeList item. This can improve the interpretability and processability of the received information, but only if the coded information is correct and reliable. This raises the question of how to ensure that semantically similar datasets are also processed and classified similarly. In this work, a (semi-)automatic approach to analyse and assess items, questions, and data elements in clinical studies is described. The approach uses a hybrid evaluation process to rate and propose semantic annotations for under-specified trial items. The evaluation algorithm combines the commonly used NLM MetaMap, which provides UMLS support, with corpus-based proposal algorithms that link datasets from the provided CDISC ODM item pool.
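The consistency question raised in the abstract can be illustrated with a small sketch. This is not the authors' evaluation algorithm and does not use MetaMap; it only groups ODM ItemDef elements by normalised question text and flags groups whose UMLS annotations differ. It assumes that CUIs are stored in `Alias` elements whose `Context` starts with `UMLS CUI`, and the sample document, OIDs, and CUI values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# CDISC ODM 1.3 namespace; the Alias convention below is an assumption.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical sample; OIDs and CUI values are placeholders, not real codes.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S.1">
    <MetaDataVersion OID="MDV.1" Name="demo">
      <ItemDef OID="I.1" Name="age" DataType="integer">
        <Question><TranslatedText xml:lang="en">Age</TranslatedText></Question>
        <Alias Context="UMLS CUI [1,1]" Name="C0000001"/>
      </ItemDef>
      <ItemDef OID="I.2" Name="age2" DataType="integer">
        <Question><TranslatedText xml:lang="en"> age </TranslatedText></Question>
        <Alias Context="UMLS CUI [1,1]" Name="C0000002"/>
      </ItemDef>
      <ItemDef OID="I.3" Name="sex" DataType="text">
        <Question><TranslatedText xml:lang="en">Sex</TranslatedText></Question>
        <Alias Context="UMLS CUI [1,1]" Name="C0000003"/>
      </ItemDef>
      <ItemDef OID="I.4" Name="sex2" DataType="text">
        <Question><TranslatedText xml:lang="en">Sex</TranslatedText></Question>
        <Alias Context="UMLS CUI [1,1]" Name="C0000003"/>
      </ItemDef>
    </MetaDataVersion>
  </Study>
</ODM>"""


def collect_annotations(odm_xml):
    """Map normalised question text -> {ItemDef OID: frozenset of CUIs}."""
    root = ET.fromstring(odm_xml)
    by_text = defaultdict(dict)
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        text_el = item.find("odm:Question/odm:TranslatedText", ODM_NS)
        if text_el is None or not (text_el.text or "").strip():
            continue  # skip items without question text
        # Lower-case and collapse whitespace so trivially equal texts match.
        key = " ".join(text_el.text.lower().split())
        cuis = frozenset(
            alias.get("Name")
            for alias in item.iterfind("odm:Alias", ODM_NS)
            if alias.get("Context", "").startswith("UMLS CUI")
        )
        by_text[key][item.get("OID")] = cuis
    return by_text


def find_inconsistent(by_text):
    """Return question texts whose items carry differing CUI sets."""
    return {
        text: items
        for text, items in by_text.items()
        if len(set(items.values())) > 1
    }


if __name__ == "__main__":
    for text, items in find_inconsistent(collect_annotations(SAMPLE_ODM)).items():
        print(text, {oid: sorted(cuis) for oid, cuis in items.items()})
```

Exact string matching after normalisation is the simplest possible notion of similarity. An unannotated item yields an empty CUI set, so it is also flagged when a same-text sibling is annotated, which is one way under-specified items could be surfaced for review.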

Related Results

On the Effects of Low-Quality Training Data on Information Extraction from Clinical Reports
In the last five years there has been a flurry of work on information extraction from clinical documents, that is, on algorithms capable of extracting, from the informal and unstru...
Exploring the topical structure of short text through probability models : from tasks to fundamentals
Recent technological advances have radically changed the way we communicate. Today’s communication has become ubiquitous and it has fostered the need for information that is easie...
Quality aspects of annotated data
The quality of Machine Learning (ML) applications is commonly assessed by quantifying how well an algorithm fits its respective training data. Yet, a perfect model that lea...
Optimising tool wear and workpiece condition monitoring via cyber-physical systems for smart manufacturing
Smart manufacturing has been developed since the introduction of Industry 4.0. It consists of resource sharing and networking, predictive engineering, and material and data analyti...
Research on the Evaluation and Influencing Factors of China’s Provincial Employment Quality Based on Principal Tensor Analysis
The research on the quality of employment in China holds immense significance for attaining high-quality employment development. Firstly, enhancing the quality of employment facili...
Development and Evaluation of Gold Standard Dataset for Sentiment Analysis of Tweets
Pre-labeled data is typically required for supervised machine learning. A limited number of object classes in the majority of open access and pre-annotated datasets make them unsui...
Generación de modelos de procesos y decisiones a partir de documentos de texto
(English) This thesis addresses the importance of formal models for the efficient management of business processes (BPM) and business decision management (BDM) in a constantly evol...
Quality
Quality in the chemical industry has come to encompass three areas: (1) quality control, (2) qua...
