Trends in web data extraction using machine learning
Web data extraction has developed significantly over the last decade, having originated in the early nineties. It has evolved from simple manual extraction of data from web pages and documents, to automated extraction, to intelligent extraction using machine learning algorithms, tools and techniques. Data extraction is one of the key components of the end-to-end life cycle of the web data extraction process, which includes navigation, extraction, data enrichment and visualization. This paper traces the evolution of web data extraction over the years, highlighting the tools, techniques, frameworks and algorithms used to build intelligent web data extraction systems. It also sheds light on the challenges, opportunities for future research and emerging trends in web data extraction, with a specific focus on machine learning techniques. Both traditional and machine learning approaches to manual and automated web data extraction are evaluated experimentally, and results are reported for a few use cases that demonstrate the challenges that arise when a website's layout changes. The paper introduces novel ideas, such as self-healing capability in web data extraction and proactive error detection when a website's layout changes, as areas for future research. This perspective will help readers gain deeper insight into the present and future of web data extraction.
Related Results
Learning 2.0: The future of learning in the Petroleum Industry.
Abstract
Learning 2.0 is a new phase of learning based on emerging trends in eLearning and the second generation of web-based services, known as Web 2.0. The name...
Utilizing Large Language Models for Geoscience Literature Information Extraction
Extracting information from unstructured and semi-structured geoscience literature is a crucial step in conducting geological research. The traditional machine learning extraction ...
An Approach to Machine Learning
The process of automatically recognising significant patterns within large amounts of data is called "machine learning." Throughout the last couple of decades, it has evolved into ...
THE IMPACT OF TECHNOLOGY ON THE TEACHING AND LEARNING PROCESS (DAMPAK TEKNOLOGI TERHADAP PROSES BELAJAR MENGAJAR)
REFERENCES: Aditama, M. H. R., & Selfiardy, S. (2022). Kehidupan Mahasiswa Kuliah Sambil Bekerja di Masa Pandemi Covid-19. Kidspedia: Jurnal Pendidikan Anak Usia Dini, 3(...
Optimization of ultrasonic extraction of Lycium barbarum polysaccharides using response surface methodology
Abstract
Ultrasonic extraction was a new development method to achieve high-efficiency extraction of Lycium barbarum ...
Designing web-based learning opportunities for children related to health care (Preprint)
BACKGROUND
Hospitalisation is a significant and stressful experience for children and parents which may cause both short-term and long-term negative consequ...
Initial Experience with Pediatrics Online Learning for Nonclinical Medical Students During the COVID-19 Pandemic
Abstract
Background: To minimize the risk of infection during the COVID-19 pandemic, the learning mode of universities in China has been adjusted, and the online learning o...
WEB PROGRAMMING
"Web Programming" is a comprehensive book that provides a detailed overview of various aspects of web programming. The book is co-authored by Dr. Chitra Ravi and Dr. Mohan Kumar S,...

