Privacy Protection for Chinese Electronic Medical Records Using Large Language Models: Effectiveness Evaluation and Application of LLM Models in Medical Data Tasks
Abstract
Background
Protecting patient privacy remains a critical concern in healthcare information management in the digital era. Conventional approaches have relied predominantly on rule-based protocols and data encryption, which typically require substantial involvement from IT professionals to implement. Recent advances in large language models (LLMs) have opened new approaches to privacy protection for electronic medical records (EMRs), while also enabling clinicians to use these tools directly for specific data tasks.
Objectives
This study aims to use LLMs within a no-code framework to perform structured processing of patient privacy data in Chinese EMRs and to formulate privacy policies, and to evaluate the practical efficacy of LLMs for these tasks.
Methods
This study uses a disease-specific data subset from Peking Union Medical College Hospital (PUMCH) covering approximately 160,000 patients, and applies prompt engineering so that LLMs can annotate sensitive information in lengthy EMR narratives. The workflow also automatically classifies the privacy level of each identified sensitive item and applies targeted protection strategies according to risk tier, reducing unnecessary exposure of patient privacy during data sharing. The study uses the Qwen model, and the entire workflow is driven exclusively by medical natural-language prompts and self-evolving knowledge bases, with no additional programming or code development required. These strategies were validated on the hospital's test text dataset. The primary evaluation metrics were precision (for both information extraction and privacy-level classification) and recall for critical categories of sensitive data.
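The study performs annotation purely through prompts, without code. To make the idea of structured annotation output concrete, here is a minimal Python sketch of how an annotation reply could be mapped back to character spans in a record. It assumes (the abstract does not say so) that the prompt asks the model to answer with a JSON list of objects carrying `category` and `text` fields; the English category labels, the `Entity` class and the `parse_annotations` function are hypothetical names chosen for this illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical English labels for the seven categories annotated in the study.
CATEGORIES = {
    "name", "address", "contact", "national_id",
    "hospital", "std", "pregnancy",
}


@dataclass
class Entity:
    category: str
    text: str
    start: int  # offset of the first character in the record
    end: int    # offset one past the last character


def parse_annotations(record: str, llm_output: str) -> list[Entity]:
    """Convert an assumed JSON reply from the LLM into located entity spans.

    Assumes the prompt asked the model for a JSON list such as
    [{"category": "name", "text": "张三"}, ...]. Items with an unknown
    category or empty text are discarded, and duplicate items are merged.
    """
    entities: list[Entity] = []
    seen: set[tuple[str, str]] = set()
    for item in json.loads(llm_output):
        category, text = item.get("category"), item.get("text", "")
        if category not in CATEGORIES or not text or (category, text) in seen:
            continue
        seen.add((category, text))
        # Record every occurrence so repeated mentions are all covered;
        # text the model reports but that is absent from the record yields nothing.
        start = record.find(text)
        while start != -1:
            entities.append(Entity(category, text, start, start + len(text)))
            start = record.find(text, start + len(text))
    return entities
```

For example, for the record `患者张三，住址北京市东城区，电话13800000000。` and a reply labelling `张三` as `name` and the phone number as `contact`, the function returns spans at offsets (2, 4) and (16, 27); an item with an unlisted category is silently dropped.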
Results
Using 4 million text entries from PUMCH, we inspected sampled data and used LLM prompts to annotate privacy-sensitive information in seven categories: names, addresses, contact details, national ID numbers, hospital names, sexually transmitted disease (STD) information, and pregnancy-related patient data. After iterative prompt refinement guided by error analysis, the best-performing prompts achieved an average precision of 97% and an average recall of 95% across the seven entity types on the test set. In addition, sensitivity-tier classification was implemented for three high-risk categories (addresses, STD information, and pregnancy-related data), achieving an average precision of 95% and recall of 90% in sensitivity-level determination.
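The abstract reports precision and recall averaged across entity types but does not state how the averaging was done. A minimal sketch, assuming an unweighted (macro) average over categories, is below. A micro average, which pools TP, FP and FN counts across all categories before dividing, would instead weight frequent categories more heavily. The function names are hypothetical.

```python
from __future__ import annotations


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Per-category precision = TP / (TP + FP) and recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_average(counts: dict[str, tuple[int, int, int]]) -> tuple[float, float]:
    """Unweighted mean of per-category precision and recall.

    `counts` maps each entity category to its (TP, FP, FN) counts on the test set.
    Every category contributes equally, regardless of how many entities it has.
    """
    scores = [precision_recall(*c) for c in counts.values()]
    n = len(scores)
    mean_precision = sum(p for p, _ in scores) / n
    mean_recall = sum(r for _, r in scores) / n
    return mean_precision, mean_recall
```

With toy counts that are not from the study, `macro_average({"name": (9, 1, 1), "address": (3, 1, 0)})` returns approximately `(0.825, 0.95)`.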
Discussion
We propose a novel no-code privacy protection framework based on LLMs that enables intelligent anonymization of medical data through natural-language interaction. The framework uses a three-tier hierarchical protection mechanism that adapts privacy strategies dynamically to the requirements of each clinical scenario, safeguarding data security while preserving as much data utility as possible.
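The abstract does not specify how the three tiers map to concrete protection actions. One hedged way to picture a scenario-dependent, three-tier policy is the Python sketch below. The scenario names, the keep/generalize/mask actions, and the rule that generalization keeps the first three characters are all assumptions made for illustration, not the authors' design, which is configured through natural-language prompts rather than code.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy table: for each usage scenario, the action applied to
# entities of sensitivity tier 1 (low), 2 (medium) and 3 (high).
POLICY = {
    "internal_care": {1: "keep", 2: "keep", 3: "generalize"},
    "research": {1: "keep", 2: "generalize", 3: "mask"},
    "external_sharing": {1: "generalize", 2: "mask", 3: "mask"},
}


@dataclass
class Span:
    start: int
    end: int
    category: str
    tier: int  # sensitivity tier, e.g. as assigned by an LLM classifier


def generalize(value: str, keep: int = 3) -> str:
    """Keep the first `keep` characters (e.g. a city prefix) and star out the rest."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def protect(text: str, spans: list[Span], scenario: str) -> str:
    """Apply the scenario's tier-specific action to each non-overlapping span."""
    actions = POLICY[scenario]
    # Edit from right to left so earlier offsets remain valid after replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        action = actions[span.tier]
        if action == "keep":
            continue
        original = text[span.start:span.end]
        replacement = f"[{span.category}]" if action == "mask" else generalize(original)
        text = text[:span.start] + replacement + text[span.end:]
    return text
```

For instance, with `Span(2, 8, "address", 2)` and `Span(9, 13, "pregnancy", 3)` over `住址北京市东城区，孕12周。`, the `research` scenario yields `住址北京市***，[pregnancy]。`, while `external_sharing` masks both spans. Keeping the policy in a table, rather than in branching logic, mirrors the idea that the same tiers can be reused across scenarios with only the actions changing.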
Cold Spring Harbor Laboratory