
The many faces of a text : applications and enhancements of multi-label text classification algorithms

Multi-Label Text Classification (MLTC) is a challenging yet vital component of analyzing large text collections. The aim of MLTC is to assign one or more labels to a text; these labels can represent, for example, topics, emotions, or medical codes. The task raises several challenges, including imbalanced label sets, domain-specific terminology, label space complexity, and the increasing complexity of models. This dissertation therefore explores state-of-the-art techniques to counter these issues and critically evaluates existing methods.

Part I of the dissertation tackles the issue of tracking vaccine hesitancy arguments using MLTC models. First, we describe the development of a vaccine hesitancy monitor (Vaccinpraat) and the task of detecting vaccine hesitancy arguments in X posts and Facebook comments. Additionally, we introduce CoNTACT, a Dutch Large Language Model (LLM) adapted to the language use in COVID-19 X posts. Compared to base models, CoNTACT yields improvements for both vaccine hesitancy detection and multi-label vaccine hesitancy argument classification. Finally, we augment the Vaccinpraat dataset with LLM-generated vaccine-hesitant X posts annotated with multi-label vaccine hesitancy arguments. We find that adding this data, despite its prototypical nature, further improves the performance of multiple models on argument classification.

Part II of the dissertation expands the scope of the research by investigating data scarcity, label space complexity, and computational efficiency across multiple domains. We compare the performance of generative LLMs with that of fine-tuned LLMs for topic classification of news articles related to Corporate Social Responsibility (CSR). To further enhance the performance of fine-tuned LLMs, we train them with additional training objectives and augment the training data with LLM-generated paraphrases. We observe that fine-tuned LLMs outperform generative LLMs.
To address the issue of label space complexity, we model label hierarchies by fine-tuning LLMs with hierarchy-aware loss functions. We explore two geometric spaces in which to calculate the similarity measures for these loss functions: Euclidean space and hyperbolic space. We find that both spaces yield equal results for both loss functions. Finally, we investigate a computationally efficient classification method that leverages the semantic similarity between texts and labels. We efficiently optimize label-specific thresholds, an approach that consistently outperforms existing thresholding methods on multiple datasets. In sum, this dissertation offers insight into the complexities of multi-label text classification by tackling several core issues, evaluating existing approaches to these issues, and proposing novel potential solutions.
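
As a rough illustration of similarity-based classification with label-specific thresholds, the sketch below scores each text against each label by cosine similarity and then picks, per label, the threshold that maximizes F1 on held-out data. The abstract does not specify how the dissertation optimizes its thresholds, so the grid search, the F1 objective, and all function names here (`cosine_similarity`, `fit_label_thresholds`, `predict`) are assumptions made for illustration only, not the author's method.

```python
import numpy as np


def cosine_similarity(text_emb, label_emb):
    """Return a (n_texts, n_labels) matrix of cosine similarities."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    l = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    return t @ l.T


def f1(y_true, y_pred):
    """Binary F1 for two boolean vectors; 0.0 when there are no positives at all."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def fit_label_thresholds(scores, y_true, grid):
    """Pick one threshold per label by grid search on validation F1.

    Assumption: a simple exhaustive search over `grid`; ties go to the
    lowest threshold because np.argmax returns the first maximum.
    """
    y_true = np.asarray(y_true, dtype=bool)
    thresholds = np.empty(scores.shape[1])
    for j in range(scores.shape[1]):
        f1_per_t = [f1(y_true[:, j], scores[:, j] >= t) for t in grid]
        thresholds[j] = grid[int(np.argmax(f1_per_t))]
    return thresholds


def predict(scores, thresholds):
    """Assign every label whose score reaches that label's own threshold."""
    return scores >= thresholds


if __name__ == "__main__":
    # Hypothetical validation scores for 4 texts and 2 labels.
    val_scores = np.array([[0.92, 0.25],
                           [0.81, 0.61],
                           [0.33, 0.74],
                           [0.12, 0.08]])
    val_labels = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=bool)
    grid = np.linspace(0.0, 1.0, 11)
    thresholds = fit_label_thresholds(val_scores, val_labels, grid)
    print(thresholds)
    print(predict(val_scores, thresholds))
```

Because each label gets its own cut-off, a frequent label and a rare label need not share one global threshold, which is one plausible way to cope with the imbalanced label sets mentioned above.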
University of Antwerp
