
An Improved Best‐Worst Method Integrated With Combined Compromise Solution for Evaluating Large Language Models

The emergence of large language models (LLMs) has substantially changed the field of artificial intelligence, enabling their wide use across many domains. As various LLM alternatives have been developed, the current study proposes a novel decision‐support framework for evaluating and benchmarking LLMs based on multicriteria decision‐making (MCDM) techniques. In the proposed framework, an improved version of the best‐worst method (BWM) is introduced to reduce the computational complexity of assigning weights to the evaluation criteria of LLMs. The improved BWM is then integrated with the combined compromise solution (CoCoSo) method to rank the LLM alternatives. Findings show that the improved BWM computes the criteria weights with lower computational complexity than the original BWM. According to the improved BWM, the ‘factual errors’ criterion received the highest weight (0.2681), while the ‘logical inconsistencies’ criterion obtained the lowest (0.0827); the weights of the remaining criteria fall between these two values. CoCoSo then ranked the LLM alternatives in two separate runs based on the extracted weights. Finally, a sensitivity analysis was conducted to evaluate the effect of the assessment criteria on the evaluation of the LLMs.
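For reference, the study's improved version builds on the original BWM. In the original method, the decision-maker picks the best criterion B and the worst criterion W. They then give two comparison vectors, usually on a 1 to 9 scale:

- a_{Bj}, the preference of B over each criterion j;
- a_{jW}, the preference of each criterion j over W.

This needs only 2n − 3 comparisons for n criteria, versus n(n − 1)/2 for a full pairwise comparison matrix. The weights w_j then come from the optimization model below. Its optimal value ξ* indicates how consistent the comparisons are: smaller is more consistent. The abstract does not say what the improved version changes, so this shows only the standard (nonlinear) model, not the authors' method.

```latex
\begin{aligned}
\min_{w,\,\xi}\quad & \xi \\
\text{s.t.}\quad & \left| \frac{w_B}{w_j} - a_{Bj} \right| \le \xi, \qquad j = 1,\dots,n, \\
& \left| \frac{w_j}{w_W} - a_{jW} \right| \le \xi, \qquad j = 1,\dots,n, \\
& \sum_{j=1}^{n} w_j = 1, \qquad w_j \ge 0, \quad j = 1,\dots,n.
\end{aligned}
```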
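The abstract names CoCoSo as the ranking step but does not describe how weights become a ranking. The Python below is a minimal sketch that assumes the standard CoCoSo procedure:

1. min-max normalization;
2. a weighted-sum sequence S_i and a weighted-power sequence P_i;
3. three appraisal strategies k_a, k_b, k_c, combined into one score.

It also assumes a balance parameter λ = 0.5. The LLM names, error counts, and weights are hypothetical placeholders, not the study's data, and the authors' implementation may differ.

```python
"""Minimal sketch of the standard CoCoSo ranking steps.

Hypothetical: the alternatives, decision matrix, criterion types, and
weights in the usage example are placeholders, not the study's data.
Assumed: lambda = 0.5 for the balanced compromise strategy.
"""
from typing import List, Sequence


def cocoso(matrix: Sequence[Sequence[float]],
           weights: Sequence[float],
           is_benefit: Sequence[bool],
           lam: float = 0.5) -> List[float]:
    """Return one CoCoSo appraisal score per alternative (higher is better)."""
    m, n = len(matrix), len(weights)

    # Step 1: min-max normalization; cost criteria are reversed so 1 is always best.
    norm = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = hi - lo
        for i in range(m):
            if span == 0:
                norm[i][j] = 1.0  # every alternative ties on this criterion
            elif is_benefit[j]:
                norm[i][j] = (matrix[i][j] - lo) / span
            else:
                norm[i][j] = (hi - matrix[i][j]) / span

    # Step 2: weighted-sum (S_i) and weighted-power (P_i) comparability sequences.
    s = [sum(w * r for w, r in zip(weights, row)) for row in norm]
    p = [sum(r ** w for w, r in zip(weights, row)) for row in norm]
    if min(s) == 0 or min(p) == 0:
        raise ValueError("k_b is undefined: an alternative is worst on every criterion")

    # Step 3: three appraisal strategies (k_a, k_b, k_c) for each alternative.
    total = sum(s) + sum(p)
    denom_c = lam * max(s) + (1 - lam) * max(p)
    scores = []
    for si, pi in zip(s, p):
        k_a = (si + pi) / total
        k_b = si / min(s) + pi / min(p)
        k_c = (lam * si + (1 - lam) * pi) / denom_c
        # Step 4: final score = geometric mean + arithmetic mean of the strategies.
        scores.append((k_a * k_b * k_c) ** (1 / 3) + (k_a + k_b + k_c) / 3)
    return scores


# Hypothetical usage: three LLMs scored on three error-count (cost) criteria.
names = ["LLM-A", "LLM-B", "LLM-C"]
matrix = [
    [2, 5, 1],
    [4, 3, 2],
    [6, 4, 3],
]
weights = [0.5, 0.3, 0.2]  # hypothetical; would come from a BWM-style weighting step
is_benefit = [False, False, False]  # fewer errors is better

scores = cocoso(matrix, weights, is_benefit)
ranking = [name for _, name in sorted(zip(scores, names), reverse=True)]
print(ranking)  # ['LLM-B', 'LLM-A', 'LLM-C']
```

In this toy example, LLM-A has the larger weighted sum S_i, yet LLM-B ranks first. LLM-A is worst on one criterion, and the multiplicative P_i term penalizes that. Combining several aggregation strategies in this way is part of what distinguishes CoCoSo from a plain weighted sum.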

Related Results

Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas (The Relationship Between Eating Behavior and the Incidence of Childhood Obesity)
Učinak poučavanja razrednomu jeziku u izobrazbi nastavnika njemačkoga (The Effect of Teaching Classroom Language in the Training of German Teachers)
The actual use of classroom language is principally limited to the classroom environment. As far as foreign language learning is concerned, the classroom often turns out to be the ...
Hydatid Disease of The Brain Parenchyma: A Systematic Review
Abstract Introduction Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...
Comparative Methods for Building Chatbots: Open Source, Hybrid, and Fully Integrated Large Language Models
In the complex and dynamic realm of biodiversity informatics, the accessibility and comprehension of standards and vocabularies are pivotal for, but not limited to, effective data ...
A Wideband mm-Wave Printed Dipole Antenna for 5G Applications
In this paper, a wideband millimeter-wave (mm-Wave) printed dipole antenna is proposed to be used for fifth generation (5G) communications. The single elem...
Exploring Language Features of Male and Female Speakers in Pakistani TEDx Talks: A Corpus-based Comparative Analysis
The study explores the linguistic patterns in Pakistani TEDx Talks. It is based on gender-based language use. It consists of ten talks selected from YouTube and applies both quanti...
Microwave Ablation with or Without Chemotherapy in Management of Non-Small Cell Lung Cancer: A Systematic Review
Abstract Introduction Microwave ablation (MWA) has emerged as a minimally invasive treatment for patients with inoperable non-small cell lung cancer (NSCLC). However, whether it i...
