Uncovering Companies Missing from the SABI Database: A Web Scraping Approach

This study evaluates the completeness and representativeness of the SABI database, a widely used commercial source for firm-level data in Spain and Portugal, by comparing it to BORME, the official Spanish business register. Using web scraping techniques, we collected and processed approximately 100,000 BORME publications in PDF format, covering the period from 2010 to 2023. These were transformed into a structured dataset comprising over 1.2 million companies, which we then matched against SABI records from the same period. Our analysis reveals that SABI covers only 38.3% of newly established companies, with significant underrepresentation of younger firms, small enterprises, specific sectors, and certain regions. Furthermore, we find clear evidence of survivorship bias: the longer a company has been dissolved, the less likely it is to appear in SABI. Sectoral and geographic disparities are also substantial, and the coverage is skewed toward firms with higher initial capital and specific legal forms. These findings suggest that SABI represents a non-random subset of the Spanish business population, and caution should be exercised when using it for empirical research. Adjustments for sample bias are recommended to improve the reliability of analyses based on this database.
Ediciones Profesionales de la Informacion SL
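
The coverage figure reported in the abstract is, in essence, the share of register companies that can be matched to a SABI record. The abstract does not say how the matching was done, so the following is only a minimal sketch, assuming matching on normalized company names. The function names, the legal-form list, and the company names in the usage note are all hypothetical and are not taken from the study.

```python
import re
import unicodedata

# Hypothetical list of legal-form suffixes to drop before matching (an assumption,
# not the study's actual rule set).
LEGAL_FORMS = {"SL", "SA", "SLU", "SLL", "SAU"}


def normalize_name(name: str) -> str:
    """Uppercase, strip accents and punctuation, and drop a trailing legal form."""
    text = unicodedata.normalize("NFKD", name.upper())
    # Remove combining marks left over from the decomposition (accents, tildes).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep only letters, digits, and spaces.
    text = re.sub(r"[^A-Z0-9 ]", "", text)
    tokens = text.split()
    if tokens and tokens[-1] in LEGAL_FORMS:
        tokens = tokens[:-1]
    return " ".join(tokens)


def coverage_rate(register_names, sabi_names) -> float:
    """Share of distinct register companies that also appear in the SABI list."""
    register = {normalize_name(n) for n in register_names}
    sabi = {normalize_name(n) for n in sabi_names}
    if not register:
        return 0.0
    return len(register & sabi) / len(register)
```

With a made-up register list of four companies, of which only one has a counterpart in a made-up SABI list, `coverage_rate` returns 0.25. Computing the same ratio within groups (for example by year of incorporation, sector, region, or years since dissolution) would give the kind of breakdown the abstract describes. Exact name matching is only one possible choice; matching on a tax identifier, where available, would likely be more robust.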

