
Benchmarking Knowledge and Capability of Large Language Models in Building Science Domain

<p>Large language models (LLMs) are increasingly adopted across scientific and engineering fields. However, applying general-purpose LLMs to specialized engineering domains imposes stringent requirements for structured knowledge, rigorous reasoning, and technical precision, so their suitability for practical engineering applications remains questionable. To gauge how well LLMs master building science, a broad yet specialized engineering domain, we perform a comprehensive benchmark analysis, using a dataset of 1,487 questions, to evaluate the abilities of 15 state-of-the-art (SOTA) LLMs across 12 core subject topics. To enable scalable and robust evaluation, we propose and validate an AI-Judger that assesses responses across five dimensions of ability, including knowledge and concept, logic and consistency, clarity of expression, and reflection and exploratory thinking. Overall, SOTA general-purpose LLMs achieve only ~50% accuracy on average across question types. Their capabilities decrease progressively from linguistic expression and factual knowledge, to logical reasoning, and then to reflection and exploratory thinking. By task type, LLMs show notably low accuracy on calculation (~13%), short-answer (~23%), and cloze tasks (~30%), in contrast to stronger performance on single-choice (74%) and multiple-choice questions (63%). Finally, performance varies markedly across topics, with relatively low accuracy on physics fundamentals and HVAC&R-related questions (median accuracy of 20%-40%) compared with ~80% for building standards and codes. These gaps highlight the limitations of general-purpose LLMs in engineering contexts and point to the need for domain-specific LLMs tailored to engineering applications.</p>

