
Over-Sampling Effect in Pre-Training for Bidirectional Encoder Representations from Transformers (BERT) to Localize Medical BERT and Enhance Biomedical BERT (Preprint)

Description:
BACKGROUND Pre-training large-scale neural language models on raw texts has made a significant contribution to improving transfer learning in natural language processing.
With the introduction of transformer-based language models, such as bidirectional encoder representations from transformers (BERT), the performance of information extraction from free text has improved significantly for both the general and medical domains; however, it is difficult to train domain-specific BERT models that perform well for domains in which few high-quality, large-scale databases are publicly available.
OBJECTIVE We hypothesize that this problem can be addressed by over-sampling a domain-specific corpus and using it for pre-training with a larger corpus in a balanced manner.
In this study, we verify our hypothesis by developing pre-training models using our method and evaluating their performance.
METHODS Our proposed method was based on simultaneous pre-training after over-sampling.
We conducted three experiments in which we generated (1) an English biomedical BERT from a small biomedical corpus, (2) a Japanese medical BERT from a small medical corpus, and (3) an enhanced biomedical BERT pre-trained from complete PubMed abstracts in a balanced manner, and we compared their performance with that of conventional models.
RESULTS We first confirmed that our English BERT, pre-trained using both a general corpus and a small medical-domain corpus, performed sufficiently well for practical use on the biomedical language understanding evaluation (BLUE) benchmark.
Moreover, with the general-domain corpus size held constant, our proposed method was more effective than conventional methods across the different biomedical corpus sizes.
Next, our Japanese medical BERT outperformed the other BERT models built using a conventional method on a medical document classification task.
This demonstrated the same trend as in the first, English-language experiment.
Lastly, our enhanced biomedical BERT model, in which clinical notes were not used during pre-training, achieved both clinical and biomedical scores on the BLUE benchmark that were 0.3 points above those of the model trained without our proposed method.
CONCLUSIONS Well-balanced pre-training by over-sampling instances derived from a corpus appropriate for the target task allowed us to construct a high-performance BERT model.
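The core idea described in the METHODS section is to duplicate (over-sample) the small domain-specific corpus so that it is represented in roughly the same proportion as the much larger general-domain corpus before simultaneous pre-training. The following is a minimal Python sketch of that balancing step only; the file names, the 1:1 target ratio, and the helper oversample_domain_corpus are illustrative assumptions and do not reproduce the authors' actual pipeline.

import random

def oversample_domain_corpus(general_lines, domain_lines, target_ratio=1.0, seed=0):
    """Duplicate lines from a small domain corpus until it contributes roughly
    target_ratio times as many lines as the general corpus, then shuffle both
    together for simultaneous pre-training. Illustrative sketch only; the file
    handling, ratio, and shuffling strategy are assumptions."""
    rng = random.Random(seed)
    target_size = int(len(general_lines) * target_ratio)

    # Repeat the whole domain corpus as many times as it fits, then top up
    # with a random sample so the mixed corpus reaches the target size exactly.
    repeats, remainder = divmod(target_size, len(domain_lines))
    oversampled = domain_lines * repeats + rng.sample(domain_lines, remainder)

    mixed = general_lines + oversampled
    rng.shuffle(mixed)
    return mixed

# Hypothetical usage with line-delimited text files:
# with open("general_corpus.txt") as f:
#     general = f.read().splitlines()
# with open("medical_corpus.txt") as f:
#     medical = f.read().splitlines()
# pretraining_corpus = oversample_domain_corpus(general, medical)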

Related Results

A Pre-Training Technique to Localize Medical BERT and to Enhance Biomedical BERT
Abstract Background: Pre-training large-scale neural language models on raw texts has been shown to make a significant contribution to a strategy for transfer learning in n...
MD2PR: A Multi-level Distillation based Dense Passage Retrieval Model
Abstract Reranker and retriever are two important components in information retrieval. The retriever typically adopts a dual-encoder model, where queries and docume...
On the Remote Calibration of Instrumentation Transformers: Influence of Temperature
The remote calibration of instrumentation transformers is theoretically possible using synchronous measurements across a transmission line with a known impedance and a local set of...
A Comparative Evaluation of Transformers and Deep Learning Models for Arabic Meter Classification
Arabic poetry follows intricate rhythmic patterns called ‘arūḍ’ (prosody), so its automated categorization is difficult. Although earlier studies mostly depend on conventional mach...
Development of a combined magnetic encoder
Purpose As a type of angular displacement sensor, the Hall-effect magnetic encoder offers many advantages. Compared with the photoelectric encoder, the magnetic encoder...
