Incorporate Out-of-Vocabulary Words for Psycholinguistic Analysis using Social Media Texts - An OOV-aware Data Curation Process and a Hybrid Approach
Massive volumes of user-generated social media texts (SMT) present new opportunities as well as challenges for psycholinguistic analysis aimed at understanding individual differences such as personality. SMT are often written informally and thus contain lexical variants such as nonstandard spellings, capitalizations, and abbreviations. These lexical variants are referred to as out-of-vocabulary (OOV) words because they are not captured in the standard dictionaries used by standard Natural Language Processing (NLP) tools. The literature indicates that these OOV words may include hidden linguistic patterns that reflect individual characteristics. These OOV-related linguistic patterns are not captured by the existing closed-vocabulary and open-vocabulary approaches. To address these issues, this dissertation develops two artifacts, following a design science research process model. The first artifact is an OOV-aware data curation process that focuses on capturing and categorizing OOV words. Its evaluation demonstrates that it can capture more OOV words and is useful in analyzing SMT. The second artifact is an OOV-aware hybrid approach that integrates the closed-vocabulary and open-vocabulary approaches with expanded OOV categories and OOV words. The hybrid approach shows improved performance over existing approaches. This dissertation makes theoretical contributions by adding OOV knowledge and a new method for psycholinguistic analysis of SMT. It also makes practical contributions by enabling psycholinguistic researchers and practitioners to exploit more psycholinguistic cues for tasks such as personality prediction.
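The idea of capturing and categorizing OOV words can be illustrated with a minimal sketch. This is not the dissertation's curation process, which the abstract does not detail. The tiny word list, the abbreviation list, the function names, and the `other_oov` fallback category are all assumptions made for illustration; only the three variant types (nonstandard spellings, capitalizations, abbreviations) come from the abstract.

```python
import re

# Hypothetical stand-in for a standard dictionary; a real pipeline
# would load a full word list instead of this toy set.
STANDARD_VOCAB = {"i", "am", "so", "happy", "to", "see", "you", "today"}

# Hypothetical abbreviation list, for illustration only.
KNOWN_ABBREVIATIONS = {"lol", "omg", "brb", "idk"}

TOKEN_RE = re.compile(r"[A-Za-z]+")


def categorize_token(token):
    """Return an assumed OOV category for a token, or None if it looks standard."""
    lower = token.lower()
    if lower in KNOWN_ABBREVIATIONS:
        return "abbreviation"
    # Collapse runs of three or more identical letters: "sooooo" -> "so".
    collapsed = re.sub(r"(.)\1{2,}", r"\1", lower)
    if collapsed != lower and collapsed in STANDARD_VOCAB:
        return "nonstandard_spelling"
    if lower in STANDARD_VOCAB:
        # A fully upper-cased multi-letter dictionary word counts as a variant.
        if len(token) > 1 and token.isupper():
            return "nonstandard_capitalization"
        return None
    # Anything else is left in an assumed catch-all bucket.
    return "other_oov"


def curate(text):
    """Group the OOV tokens found in a text by their assumed category."""
    found = {}
    for token in TOKEN_RE.findall(text):
        category = categorize_token(token)
        if category is not None:
            found.setdefault(category, []).append(token)
    return found


if __name__ == "__main__":
    print(curate("OMG I am sooooo HAPPY to see you today lol"))
```

Per-category counts from such a step could then be used as extra features alongside closed-vocabulary and open-vocabulary features, which is the general direction the hybrid approach describes.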
Related Results
Handling Compound Hindi OOV words in web queries
Handling of Out of Vocabulary (OOV) words is still a problem in NLP. If there is a word for which morphological analyser is not able to find a morpheme, that word ...
Responsibilised Resilience? Reworking Neoliberal Social Policy Texts
Introduction: This essay begins with the premise that resilience, broadly defined as positive adaptation despite adversity (Garmezy and Rutter), and resilience building are importa...
Digital Curation and Doctoral Research
This article considers digital curation in doctoral study and the role of the doctoral supervisor and institution in facilitating students’ acquisition of digital curation skills...
The Hybrid Breeding of Nanomedia
Introduction: If human beings have become a geophysical force, capable of impacting the very crust and atmosphere of the planet, and if geophysical forces become objects of study, pr...
Žanrovska analiza pomorskopravnih tekstova i ostvarenje prijevodnih univerzalija u njihovim prijevodima s engleskoga jezika
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...
Psycholinguistic Meanings of Playfulness
The aim of the article is to describe psycholinguistic meanings of the word-stimulus “playfulness” in the linguistic world-image of the Russian-speaking population of Ukraine. The ...
Big data curation framework: Curation actions and challenges
Big data curation represents an emerging topic of inquiry but still in an early phase along its adoption curve. The term big data itself is a nebulous concept, and the differences ...
Računalno potpomognuto usmjeravanje kod dvojezičnih govornika
This thesis investigates whether modern computer models can confirm how people encounter words and then use these findings in didactics. In recent years, computers have been used i...

