Exploring Historical Labor Markets: Computational Approaches to Job Title Extraction
Historical job advertisements provide invaluable insights into the evolution of labor markets and societal dynamics. However, extracting structured information, such as job titles, from these OCRed and unstructured texts presents significant challenges. This study evaluates four distinct computational approaches for job title extraction: a dictionary-based method, a rule-based approach leveraging linguistic patterns, a Named Entity Recognition (NER) model fine-tuned on historical data, and a text generation model designed to rewrite advertisements into structured lists.
Our analysis spans multiple versions of the ANNO dataset, including raw OCR, automatically post-corrected, and human-corrected text, as well as an external dataset of German historical job advertisements. Results demonstrate that the NER approach consistently outperforms other methods, showcasing robustness to OCR errors and variability in text quality. The text generation approach performs well on high-quality data but exhibits greater sensitivity to OCR-induced noise. While the rule-based method is less effective overall, it performs relatively well for ambiguous entities. The dictionary-based approach, though limited in precision, remains stable across datasets.
This study highlights the impact of text quality on extraction performance and underscores the need for adaptable, generalizable methods. Future work should focus on integrating hybrid approaches, expanding annotated datasets, and improving OCR correction techniques to enhance the extraction of structured information from historical texts. These advancements will enable deeper exploration of labor market trends and contribute to the broader field of digital humanities.
Centre pour la Communication Scientifique Directe (CCSD)
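To make the two simplest approaches concrete, here is a toy sketch of a dictionary lookup and a pattern rule, scored with set-based precision, recall and F1. Everything in it is an assumption for illustration: the five-word lexicon `JOB_TITLES`, the German trigger phrases in `TRIGGER_RULE` (gesucht, wird aufgenommen, sucht Stelle), the helper names, and the invented advertisement are not taken from the study, and the NER and text-generation approaches are not shown.

```python
import re
from typing import Iterable

# Hypothetical mini-lexicon (lowercased); a real job-title dictionary would be far larger.
JOB_TITLES = {"köchin", "kellner", "stubenmädchen", "buchhalter", "lehrling"}

# Hypothetical rule: a capitalized word directly before a phrase that often marks
# a vacancy ("gesucht", "wird aufgenommen") or a job seeker ("sucht Stelle").
TRIGGER_RULE = re.compile(
    r"\b([A-ZÄÖÜ][a-zäöüß]+)\s+(?:wird\s+)?(?:gesucht|aufgenommen|sucht\s+Stelle)"
)


def dictionary_extract(text: str, lexicon: set[str]) -> list[str]:
    """Dictionary baseline: keep every token that exactly matches a lexicon entry."""
    tokens = re.findall(r"\w+", text.lower())  # \w matches Unicode letters such as ö
    return [tok for tok in tokens if tok in lexicon]


def rule_extract(text: str) -> list[str]:
    """Rule baseline: keep the capitalized word that a trigger phrase points to."""
    return [m.group(1).lower() for m in TRIGGER_RULE.finditer(text)]


def prf(predicted: Iterable[str], gold: Iterable[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 for a single advertisement."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Invented example advertisement and its hand-made gold annotation.
AD = "Tüchtige Köchin gesucht. Auch ein Lehrling wird aufgenommen. Näheres beim Buchhalter."
GOLD = {"köchin", "lehrling"}

if __name__ == "__main__":
    for name, pred in [
        ("dictionary", dictionary_extract(AD, JOB_TITLES)),
        ("rule-based", rule_extract(AD)),
    ]:
        p, r, f = prf(pred, GOLD)
        print(f"{name}: {pred} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

On the invented ad, the dictionary also picks up Buchhalter, which names a contact person rather than an advertised position, so its precision drops to 0.67 while recall stays at 1.00. The trigger rule avoids that false positive but only recognizes the phrasings it explicitly lists.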
Related Results
Antecedents of Employee Performance at PT. Bank Mandiri (Persero) Tbk, Jakarta Cikini Area
Abstract: The problem of this research stems from a phenomenon observed among employees of PT. Bank Mandiri (Persero) Tbk Area Jakarta Cikini. The objectives of the research are to...
Job Analysis for Industrial Training
Job analysis is the common basis for designing a training course or programme, preparing performance tests, writing position (job) descriptions, identifying performance appraisal c...
The relationship between job stress and job burnout of preschool teachers during the COVID-19: The moderation of perceived organizational support
BACKGROUND: COVID-19 poses great challenges for preschool teachers in China, which will increase the level of job stress and job burnout, and have an impact on the relationship bet...
Pregnant Prisoners in Shackles
ABSTRACT
Shackling prisoners has been implemented as standard procedure when transporting prisoners in labor and during childbirth. This procedure ensu...
Skill signaling, job mobility and wage dynamics: evidence from Ethiopia’s industrial parks
Purpose
This study examines how labor market information interventions influence wage formation and job mobility in emerging industrial economies. Using Ethiopia’...
An empirical study on the lead-lag relationship between individual share futures and spot markets: focused on NHN and GS Construction futures
This study tests the lead-lag relationship between the spot and futures markets of NHN and GS Construction. We introduced the daily nearby futures price and spot price of the ...
Day labor, informality and vulnerability in South Africa and the United States
Purpose: The purpose of this paper is to compare conditions in informal day-labor markets in South Africa and the USA to better understand the nature of worker vulnerabilities in t...
The Effect of Job Insecurity, Job Stress, and Work-Family Conflict on Turnover Intention among BPO Employees in Yogyakarta, Mediated by Job Satisfaction
This study aims to examine and analyze the influence of job insecurity, job stress and work-family conflict on turnover intention which is mediated by job satisfaction. This resear...

