Bridging the Knowledge Gap: Improving BERT models for answering MCQs by using Ontology-generated synthetic MCQA Dataset

BERT-based models possess impressive language understanding capabilities but often lack domain-specific knowledge, which limits their performance on specialised tasks such as medical multiple-choice question answering (MCQA). In this paper, we study how biomedical ontologies, which are rich repositories of medical knowledge, can be harnessed to enhance BERT-based models for the medical MCQA task. Our contributions include OntoMCQA-Gen, a system that leverages several biomedical ontologies to construct BioOntoMCQA, a large synthetic MCQA dataset. OntoMCQA-Gen exploits subclass-class relationships, concept definitions, and synonym relationships from the ontologies to create the MCQs automatically. We then use this synthetic dataset to fine-tune various BERT-based models to answer medical MCQs. We evaluate the fine-tuned models on the challenging MedMCQA and MedQA datasets, which contain questions from admission examinations for medical degrees in India and the USA, respectively. The evaluation shows that fine-tuning BERT-based models on BioOntoMCQA significantly improves accuracy. BioBERT and PubMedBERT, which are already pretrained on large medical corpora, also show significant improvements when fine-tuned on the ontology-generated synthetic data. This finding highlights the effectiveness of incorporating biomedical ontologies to enhance BERT-based models in the medical domain. Moreover, our results underscore the importance of combining ontology-generated data with model adaptation for specialised domains, which we regard as a novel contribution to natural language processing.
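The abstract names three ontology features that OntoMCQA-Gen draws on (subclass-class relationships, definitions, and synonyms) but does not describe the generation procedure itself. The sketch below is only an illustration of the general idea, not the authors' system: the toy ontology, the `Concept` class, the question templates, and the distractor-selection rule are all assumptions made for this example.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One class in a toy, hand-written ontology (hypothetical data)."""
    name: str
    parent: Optional[str] = None
    definition: str = ""
    synonyms: List[str] = field(default_factory=list)


# Assumed toy ontology; a real system would load OWL/OBO files instead.
ONTOLOGY: Dict[str, Concept] = {c.name: c for c in [
    Concept("disease"),
    Concept("infectious disease", parent="disease"),
    Concept("metabolic disease", parent="disease"),
    Concept("tuberculosis", "infectious disease",
            "An infectious disease caused by Mycobacterium tuberculosis.", ["TB"]),
    Concept("malaria", "infectious disease",
            "A mosquito-borne infectious disease caused by Plasmodium parasites.",
            ["paludism"]),
    Concept("diabetes mellitus", "metabolic disease",
            "A metabolic disease characterised by chronically elevated blood glucose.",
            ["DM"]),
    Concept("phenylketonuria", "metabolic disease",
            "A metabolic disease in which phenylalanine accumulates in the body.",
            ["PKU"]),
]}


def ancestors(name: str) -> set:
    """Collect every superclass reachable by following parent links."""
    found, current = set(), ONTOLOGY[name].parent
    while current is not None:
        found.add(current)
        current = ONTOLOGY[current].parent
    return found


def make_mcq(stem, answer, pool, rng, n_options=4):
    """Build one MCQ; returns None if too few distinct distractors exist."""
    distractors = list(dict.fromkeys(d for d in pool if d != answer))
    if len(distractors) < n_options - 1:
        return None
    options = rng.sample(distractors, n_options - 1) + [answer]
    rng.shuffle(options)
    return {"question": stem, "options": options,
            "answer": answer, "answer_index": options.index(answer)}


def generate_mcqs(seed: int = 0) -> list:
    """Generate subclass, definition, and synonym questions (assumed templates)."""
    rng = random.Random(seed)
    mcqs = []
    for c in ONTOLOGY.values():
        others = [o for o in ONTOLOGY.values() if o.name != c.name]
        if c.parent:  # subclass-class relationship
            pool = [o.name for o in others if o.name not in ancestors(c.name)]
            mcqs.append(make_mcq(
                f"'{c.name}' is a subclass of which of the following?",
                c.parent, pool, rng))
        if c.definition:  # concept definition
            mcqs.append(make_mcq(
                f"Which concept matches this definition? {c.definition}",
                c.name, [o.name for o in others], rng))
        for syn in c.synonyms:  # synonym relationship
            pool = [s for o in others for s in o.synonyms]
            mcqs.append(make_mcq(
                f"Which of the following is a synonym of '{c.name}'?",
                syn, pool, rng))
    return [q for q in mcqs if q is not None]


if __name__ == "__main__":
    for q in generate_mcqs()[:3]:
        print(q["question"], q["options"], "->", q["answer"])
```

In this sketch, distractors for a subclass question are simply other concepts that are not ancestors of the target; a production system would likely need a more careful choice (for example, sibling classes) to produce questions of realistic difficulty.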

University of Florida George A Smathers Libraries

Sahil P Sreenivasa Kumar

The International FLAIRS Conference Proceedings

2024

Related Results

The use of multiple-choice questions (MCQs) in law schools has not gained widespread acceptance, unlike in medical schools where they enjoy global usage. Law Schools traditionally ...

COGNITIVE LEVEL OF MCQS IN PHARMACOLOGY

Introduction: Medical universities have started MCQs as assessment tools in various disciplines for the last few years. However, maintaining the standards and quality of these MCQs ...

Medical knowledge of ChatGPT in public health, infectious diseases, COVID-19 pandemic, and vaccines: multiple choice questions examination based performance

Background: At the beginning of the year 2023, the Chatbot Generative Pre-Trained Transformer (ChatGPT) gained remarkable attention from the public. There is a great discussion about...

AI versus human-generated multiple-choice questions for medical education: a cohort study in a high-stakes examination

Background: The creation of high-quality multiple-choice questions (MCQs) is essential for medical education assessments but is resource-...

Large Language Model Clinical Vignettes and Multiple-Choice Questions for Postgraduate Medical Education

Problem: Clinical vignette–based multiple-choice questions (MCQs) have been used to assess postgraduate medical t...

ChatGPT Knowledge Evaluation in Basic and Clinical Medical Sciences: Multiple Choice Question Examination-Based Performance

The Chatbot Generative Pre-Trained Transformer (ChatGPT) has garnered great attention from the public, academicians and science communities. It responds with appropriate and articu...

Over-Sampling Effect in Pre-Training for Bidirectional Encoder Representations from Transformers (BERT) to Localize Medical BERT and Enhance Biomedical BERT (Preprint)

Background: Pre-training large-scale neural language models on raw texts has made a significant contribution to improving transfer learning in natural langua...

Item Analysis of Multiple Choice Questions of Anatomy at Aziz Fatimah Medical and Dental College, Faisalabad

Objective: The aim of our study was to evaluate MCQs in send up exam of 2nd year MBBS. To discard or change poor items with low discriminatory index, very easy and very difficult i...
