
Analysis of Annotated Data Models for Improving Data Quality

The public Medical Data Models (MDM) portal, with more than 9,000 annotated forms from clinical trials and other sources, provides many research opportunities for the medical informatics community. It is mainly used to address the problem of heterogeneity by searching, mediating, reusing, and assessing data models, e.g., the semi-interactive curation of core data records in a special domain. Furthermore, it can be used as a benchmark for evaluating algorithms that create, transform, annotate, and analyse structured patient data. All data models in the MDM portal are represented syntactically in CDISC ODM, and UMLS CUIs are added semi-automatically at several ODM levels, such as ItemGroupDef, ItemDef, or CodeList item. This can improve the interpretability and processability of the received information, but only if the coded information is correct and reliable. This raises the question of how to ensure that semantically similar datasets are also processed and classified similarly. In this work, a (semi-)automatic approach to analyse and assess items, questions, and data elements in clinical studies is described. The approach uses a hybrid evaluation process to rate and propose semantic annotations for under-specified trial items. The evaluation algorithm combines the commonly used NLM MetaMap, which provides UMLS support, with corpus-based proposal algorithms that link datasets from the provided CDISC ODM item pool.
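The corpus-based proposal idea can be illustrated with a minimal sketch. It is not the authors' algorithm and it does not call MetaMap. It only shows one plausible way to read UMLS CUIs from ODM ItemDef elements and to suggest, for an unannotated item, the annotation most often used by identically named items in the pool. The sample XML, the Alias Context string "UMLS CUI [1,1]", the CUI values, and the function names `read_items` and `propose_annotations` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Namespace of CDISC ODM 1.3 documents.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical miniature item pool; OIDs, names, and CUIs are placeholders.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S.1">
    <MetaDataVersion OID="MDV.1" Name="demo">
      <ItemDef OID="I.1" Name="Age" DataType="integer">
        <Alias Context="UMLS CUI [1,1]" Name="C0001779"/>
      </ItemDef>
      <ItemDef OID="I.2" Name="age" DataType="integer">
        <Alias Context="UMLS CUI [1,1]" Name="C0001779"/>
      </ItemDef>
      <ItemDef OID="I.3" Name="Age " DataType="integer"/>
      <ItemDef OID="I.4" Name="Body weight" DataType="float">
        <Alias Context="UMLS CUI [1,1]" Name="C0005910"/>
      </ItemDef>
    </MetaDataVersion>
  </Study>
</ODM>"""


def read_items(odm_xml):
    """Return (OID, normalized name, tuple of CUIs) for every ItemDef."""
    root = ET.fromstring(odm_xml)
    items = []
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        # Assumption: CUIs are stored in Alias elements whose Context
        # starts with "UMLS CUI".
        cuis = tuple(
            alias.get("Name")
            for alias in item.iterfind("odm:Alias", ODM_NS)
            if alias.get("Context", "").startswith("UMLS CUI")
        )
        name = item.get("Name", "").strip().lower()
        items.append((item.get("OID"), name, cuis))
    return items


def propose_annotations(items):
    """For each unannotated item, propose the CUI tuple used most often
    by annotated items with the same normalized name."""
    seen = defaultdict(Counter)
    for _, name, cuis in items:
        if cuis:
            seen[name][cuis] += 1
    proposals = {}
    for oid, name, cuis in items:
        if not cuis and seen[name]:
            proposals[oid] = seen[name].most_common(1)[0][0]
    return proposals


if __name__ == "__main__":
    print(propose_annotations(read_items(SAMPLE_ODM)))
```

Matching on the exact normalized name is the simplest possible similarity measure. A realistic pipeline would likely replace it with a fuzzier text comparison and would rate the proposals against concept candidates returned by a tool such as MetaMap.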

IOS Press

Hannes Ulrich, Ann-Kristin Kock-Schoppenhauer, Björn Andersen, Josef Ingenerf

Studies in Health Technology and Informatics

2025


