Classifiers of Medical Eponymy in Scientific Texts
Many concepts in the medical literature are named after persons. Frequent ambiguities and spelling variants, however, complicate the automatic recognition of such eponyms with natural language processing (NLP) tools. Recently developed methods include word vectors and transformer models that incorporate context information into the downstream layers of a neural network architecture. To evaluate these models for classifying medical eponymy, we labeled eponyms and counterexamples mentioned in a convenience sample of 1,079 PubMed abstracts, and fit logistic regression models to the vectors from the first (vocabulary) and last (contextualized) layers of a SciBERT language model. According to the area under sensitivity-specificity curves, models based on contextualized vectors achieved a median performance of 98.0% on held-out phrases. This outperformed models based on vocabulary vectors (95.7%) by a median of 2.3 percentage points. When processing unlabeled inputs, such classifiers appeared to generalize to eponyms that did not appear among any annotations. These findings attest to the effectiveness of developing domain-specific NLP functions based on pre-trained language models, and underline the utility of context information for classifying potential eponyms.
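The evaluation metric named in the abstract can be illustrated with a small sketch. It assumes that the "area under sensitivity-specificity curves" corresponds to the usual ROC AUC, computed here as the probability that a positive example outscores a negative one. The function names, labels, and scores below are hypothetical and made up for illustration; they are not the authors' code or data.

```python
from statistics import median


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def median_gain_in_points(auc_a, auc_b):
    """Median paired difference between two AUC lists, in percentage points."""
    return median([100.0 * (a - b) for a, b in zip(auc_a, auc_b)])


# Made-up labels (1 = eponym, 0 = counterexample) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
contextual_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # hypothetical
vocabulary_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # hypothetical

auc_contextual = roc_auc(labels, contextual_scores)
auc_vocabulary = roc_auc(labels, vocabulary_scores)
print(f"contextualized: {auc_contextual:.3f}, vocabulary: {auc_vocabulary:.3f}")
```

Under the same assumption, `median_gain_in_points` takes one AUC per held-out split for each model and reports the median paired difference, which is one plausible reading of the "median of 2.3 percentage points" comparison.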
Related Results
Žanrovska analiza pomorskopravnih tekstova i ostvarenje prijevodnih univerzalija u njihovim prijevodima s engleskoga jezika [Genre analysis of maritime law texts and the realization of translation universals in their translations from English]
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...
Biblical Texts and Interpretations in the Dead Sea Scrolls: Biblical Texts
The introduction to this entry places the Dead Sea Scrolls in their historical and chronological context and discusses the popularity and provenance of the texts found in the Judea...
Machine Learning and Semantic Orientation Ensemble Methods for Egyptian Telecom Tweets Sentiment Analysis
The vast amount of data currently available online attracted many parties to analyze sentiments expressed in these data extracting valuable knowledge. Many approaches have been pro...
Medical tourism and healthcare trends in Thailand
Medical tourism can be defined as the travel of patients from one country to another with the intention of receiving medical treatment. This is an increasing and important feature ...
Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
The pneumonia severity index: assessment and comparison to popular machine learning classifiers
Pneumonia is the top communicable cause of death worldwide. Accurate prognostication of patient severity with Community Acquired Pneumonia (CAP) allows better patient care ...
Learning Prototype Classifiers for Long-Tailed Recognition
The problem of long-tailed recognition (LTR) has received attention in recent years due to the fundamental power-law distribution of objects in the real-world. Most recent works in...
Tibetan Fond of the Center of Oriental Manuscripts and Xylography of the Institute for Mongolian, Buddhist and Tibetan Studies of the Siberian Branch of the Russian Academy of Sciences: Characteristics, Classification of the Medical Collection
This article offers a description and subject classification of the medical texts collection from the Tibetan fond of the Center for Oriental Manuscripts and Xylographs of the Inst...

