
Incorporate Out-of-Vocabulary Words for Psycholinguistic Analysis using Social Media Texts - An OOV-aware Data Curation Process and a Hybrid Approach

The massive volume of user-generated social media texts (SMT) poses new opportunities as well as challenges for psycholinguistic analysis aimed at understanding individual differences such as personality. SMT are often written informally and thus contain lexical variants such as nonstandard spellings, capitalizations, and abbreviations. These lexical variants are referred to as out-of-vocabulary (OOV) words because they are not captured in the standard dictionaries used by standard Natural Language Processing (NLP) tools. The literature indicates that OOV words may carry hidden linguistic patterns that reflect individual characteristics, yet these OOV-related patterns are not captured by the existing closed-vocabulary and open-vocabulary approaches. To address these issues, this dissertation develops two artifacts, following a design science research process model. The first artifact is an OOV-aware data curation process that focuses on capturing and categorizing OOV words. Its evaluation demonstrates that it can capture more OOV words and is useful in analyzing SMT. The second artifact is an OOV-aware hybrid approach that integrates the closed-vocabulary and open-vocabulary approaches with expanded OOV categories and OOV words. The hybrid approach shows improved performance over existing approaches. This dissertation makes theoretical contributions by adding new OOV knowledge and a new method for psycholinguistic analysis of SMT. It also makes practical contributions by enabling psycholinguistic researchers and practitioners to exploit more psycholinguistic cues for tasks such as personality prediction.
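As a rough illustration of what "capturing and categorizing OOV words" can mean in practice, the sketch below sorts the tokens of a short text into the three variant types the abstract names (nonstandard spelling, capitalization, abbreviation). It is not the dissertation's curation process: the tiny vocabulary, the abbreviation list, the category names, and the functions `categorize_token` and `categorize_text` are all hypothetical and chosen only for this example.

```python
import re

# Toy stand-in for a standard dictionary (assumption); a real pipeline
# would load a full lexicon instead.
STANDARD_VOCAB = {"i", "am", "so", "happy", "see", "you", "tomorrow"}

# Hypothetical abbreviation list, for illustration only.
KNOWN_ABBREVIATIONS = {"lol", "omg", "u", "tmrw"}


def categorize_token(token):
    """Assign one illustrative category label to a single token."""
    lower = token.lower()
    if lower in STANDARD_VOCAB:
        # The word itself is standard, but all-caps usage is a variant.
        if len(token) > 1 and token.isupper():
            return "nonstandard_capitalization"
        return "in_vocabulary"
    if lower in KNOWN_ABBREVIATIONS:
        return "abbreviation"
    # Collapse runs of three or more identical characters
    # ("soooo" -> "so") and check the dictionary again.
    collapsed = re.sub(r"(.)\1{2,}", r"\1", lower)
    if collapsed in STANDARD_VOCAB:
        return "nonstandard_spelling"
    return "other_oov"


def categorize_text(text):
    """Map each alphabetic token in the text to its category label."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return {tok: categorize_token(tok) for tok in tokens}


if __name__ == "__main__":
    for tok, label in categorize_text("I am soooo HAPPY lol see u tmrw").items():
        print(tok, label)
```

A closed-vocabulary tool would simply drop or miscount tokens such as "soooo" and "HAPPY"; keeping them with a category label is one simple way such cues could be preserved for later analysis.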
Claremont Colleges Library

Related Results

Handling Compound Hindi OOV words in web queries
Abstract: Handling of Out of Vocabulary (OOV) words is still a problem in NLP. If there is a word for which morphological analyser is not able to find a morpheme, that word ...
Žanrovska analiza pomorskopravnih tekstova i ostvarenje prijevodnih univerzalija u njihovim prijevodima s engleskoga jezika
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...
HSK 以外的汉语新词汇分类
As the number of new Chinese vocabulary increases year by year, mastering new vocabulary beyond the HSK (Global Chinese Proficiency Test) syllabus is crucial to...
Trajectory of Learning Academic Vocabulary: IT Undergraduates’ Vocabulary Learning Strategies and Performance at the Exam
Learning vocabulary is an integral part in language acquisition and acquisition of academic vocabulary is crucial for the success in an academic context. Therefore, many studies ha...
Penggunaan Kosakata untuk Pencitraan Diri di Media Sosial Facebook
Abstract: This study examines the use of vocabulary for self-image by users of social media facebook. This study aims to: (1) describe the form of exponential vocabulary for self-i...
Morphophonological Changes Borrowed Core Vocabulary and Frequently-Used Words between Dholuo and Ekegusii Undergo
This study focuses on the borrowing of core vocabulary items and frequently used words between Dholuo and Ekegusii. It specifically seeks to investigate the morphophonological chan...
THE EFFECTIVENESS OF VOCABULARY GAMES ON VOCABULARY ACQUISITION: A LITERATURE REVIEW
Acquiring language skills, namely listening, reading, speaking and writing, is fundamentally dependent on the mastery of vocabulary. Thus, teachers' application of attractive, ente...
Quantitative Evaluation of Vocabulary Emotional Color in Language Teaching
Objective. In real communication, the context is complex and changeable and the color and meaning of some words will wander in the context. The development and changes of words are...
