GA: A Comprehensive Survey on LLM-based GUI Agent
The Graphical User Interface (GUI) is a visual means by which users interact with computers and mobile devices. Users now rely on GUIs to complete everyday tasks such as browsing the web or using mobile applications, and they often have simple needs such as setting an alarm for 8:00 AM or checking tomorrow's weather. Some commercial agents have been integrated into users' personal phones to help them accomplish a series of basic tasks. Unfortunately, these commercial agents often rely on fixed templates or program scripts to ensure reliability, which limits their functionality to a few basic system applications. Recently, large language models (LLMs) have made significant breakthroughs in natural language processing (NLP). LLMs have demonstrated not only a strong ability to understand and generate text but also planning and reasoning capabilities. Some researchers have therefore considered using LLMs as the agent’s brain, equipping agents with these capabilities. LLM-based agents are also being applied to help users automate tasks on their personal phones and computers. These agents can often understand the GUI environment on personal phones and computers, allowing them to make decisions to complete tasks. This is the origin of the term “GUI Agent”. Our review surveys recent research on LLM-based GUI Agents. We summarize the capabilities of existing GUI Agents and discuss the GUI Agent task automation pipeline. A comprehensive list of the studies covered in this paper will be available in a GitHub repository.
Institute of Electrical and Electronics Engineers (IEEE)
Related Results
Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Abstract
Introduction
The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Human-AI Collaboration in Clinical Reasoning: A UK Replication and Interaction Analysis
Abstract
Objective
A paper from Goh et al. found that a large language model (LLM) working alone outperformed American clinicians assisted...
Automating Information Retrieval from Biodiversity Literature Using Large Language Models: A Case Study
Recently, Large Language Models (LLMs) have transformed information retrieval, becoming widely adopted across various domains due to their ability to process extensive textual data...
Unraveling the landscape of large language models: a systematic review and future perspectives
Purpose: The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a compreh...
Financial Advisory LLM Model for Modernizing Financial Services and Innovative Solutions for Financial Literacy in India
Abstract
Dynamically evolving financial conditions in India place sophisticated models of financial advisory services relative to its own peculiar conditions more in demand...
A Survey on Benchmarks of LLM-based GUI Agents
LLM-based GUI agents have made rapid progress in understanding visual interfaces, interpreting user intentions, and executing multi-step operations across web, mobile, and desktop ...
Generalized Agent Theory from First Principles
To address the fragmentation in the definition of Agent and the profound challenges concerning the nature of intelligence, consciousness, and the observer-based unification of phys...
Leveraging simulation to provide a practical framework for assessing the novel scope of risk of LLMs in healthcare
Structured Abstract
Background
Large language models (LLMs) are rapidly entering clinical care, yet their definitionally probab...

