Structured Codes and Free-Text Notes: Measuring Information Complementarity in Electronic Health Records
ABSTRACT
Background
Electronic health records (EHRs) consist of both structured data (e.g., diagnostic codes) and unstructured data (e.g., clinical notes). Unstructured clinical narratives are commonly believed to provide more comprehensive information. However, this assumption has rarely been validated at scale, and methods for validating it directly are lacking.
Objective
This study aims to quantitatively compare the information in structured and unstructured EHR data and directly validate whether unstructured data offers more extensive information across a patient population.
Methods
We analyzed both structured and unstructured data from patient records and visits in a large Dutch primary care EHR database between January 2021 and January 2024. Clinical concepts were identified from free-text notes using an extraction framework tailored for Dutch and compared with concepts from structured data. Concept embeddings were generated, and the semantic similarity between structured and extracted concepts was measured as the cosine similarity of their embeddings. A similarity threshold was determined systematically from annotated matches by minimizing the weighted Gini impurity. We then quantified the concept overlap between structured and unstructured data across various concept domains and patient populations.
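The threshold selection step described in the Methods can be illustrated with a minimal sketch. It assumes each annotated concept pair has a cosine similarity score and a binary label (1 for a true match, 0 otherwise), and it searches the observed scores for the cut point that minimizes the size-weighted Gini impurity of the two resulting groups. The function names and the toy data are hypothetical and are not taken from the study; the study's actual candidate grid and annotation scheme are not stated in the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def gini(labels):
    """Gini impurity of a list of binary labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def weighted_gini(scores, labels, threshold):
    """Impurity of the split 'score < threshold' vs 'score >= threshold',
    with each side weighted by its share of the annotated pairs."""
    below = [l for s, l in zip(scores, labels) if s < threshold]
    above = [l for s, l in zip(scores, labels) if s >= threshold]
    n = len(labels)
    return len(below) / n * gini(below) + len(above) / n * gini(above)


def best_threshold(scores, labels):
    """Return the observed score that minimizes the weighted Gini impurity."""
    candidates = sorted(set(scores))
    return min(candidates, key=lambda t: weighted_gini(scores, labels, t))


# Hypothetical annotated pairs: similarity score and whether annotators called it a match.
toy_scores = [0.2, 0.4, 0.55, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 1, 1]
print(best_threshold(toy_scores, toy_labels))
```

Using the observed scores as candidate thresholds keeps the search exhaustive for a small annotated set; a finer or coarser grid would be an equally reasonable choice.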
Results
In a population of 1.8 million patients, 42% of structured concepts in patient records and 25% in individual visits had similar matches in unstructured data. Conversely, only 13% of extracted concepts from records and 7% from visits had similar structured counterparts. Condition concepts had the highest overlap, followed by measurements and drug concepts. Subpopulation visits, such as those with chronic conditions or psychological disorders, showed different proportions of data overlap, indicating varied reliance on structured versus unstructured data across clinical contexts.
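The two directions of overlap reported above are not symmetric, because each is normalized by a different set of concepts. The following sketch shows one way such directional proportions could be computed, assuming a concept counts as matched when at least one concept on the other side reaches the similarity threshold. The helper names, the two-dimensional toy embeddings, and the threshold of 0.9 are illustrative assumptions, not values from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def matched_fraction(source, target, threshold):
    """Fraction of source concepts with at least one target concept
    whose cosine similarity is at or above the threshold."""
    if not source:
        return 0.0
    matched = sum(
        1 for s in source if any(cosine(s, t) >= threshold for t in target)
    )
    return matched / len(source)


# Hypothetical embeddings: two structured concepts, four extracted concepts.
structured = [[1.0, 0.0], [0.0, 1.0]]
extracted = [[1.0, 0.05], [-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]]

structured_covered = matched_fraction(structured, extracted, threshold=0.9)
extracted_covered = matched_fraction(extracted, structured, threshold=0.9)
print(structured_covered, extracted_covered)
```

In this toy case the free text contains more concepts than the structured data, so the share of extracted concepts with a structured counterpart is lower than the share of structured concepts with a free-text counterpart, which is the same direction of asymmetry as in the reported results.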
Conclusions
Our study demonstrates the feasibility of quantifying the information difference between structured and unstructured data, showing that the unstructured data provides important additional information in the studied database and populations. Despite some limitations, our proposed methodology proves versatile, and its application can lead to more robust and insightful observational clinical research.

