Incorporate Out-of-Vocabulary Words for Psycholinguistic Analysis using Social Media Texts - An OOV-aware Data Curation Process and a Hybrid Approach
The massive volume of user-generated social media texts (SMT) presents new opportunities as well as challenges for psycholinguistic analysis aimed at understanding individual differences such as personality. SMT are often written informally and thus contain lexical variants such as nonstandard spellings, capitalizations, and abbreviations. These lexical variants are referred to as out-of-vocabulary (OOV) words because they are not captured in the standard dictionaries used by common Natural Language Processing (NLP) tools. The literature indicates that OOV words may carry hidden linguistic patterns that reflect individual characteristics, yet these OOV-related patterns are not captured by existing closed-vocabulary and open-vocabulary approaches. To address this gap, this dissertation develops two artifacts following a design science research process model. The first artifact is an OOV-aware data curation process that focuses on capturing and categorizing OOV words; its evaluation demonstrates that it captures more OOV words and is useful for analyzing SMT. The second artifact is an OOV-aware hybrid approach that integrates the closed-vocabulary and open-vocabulary approaches with expanded OOV categories and OOV words; it shows improved performance over existing approaches. This dissertation makes theoretical contributions by adding new knowledge about OOV words and a new method for psycholinguistic analysis of SMT. It also makes practical contributions by enabling psycholinguistic researchers and practitioners to exploit more psycholinguistic cues for tasks such as personality prediction.
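The abstract describes both artifacts only at a high level. As a rough illustration, and not the dissertation's actual method, the following Python sketch shows one way OOV tokens might be flagged against a standard vocabulary, assigned to a few lexical-variant categories, and combined with closed-vocabulary features (lexicon category counts) and open-vocabulary features (unigram counts) into a single hybrid feature set. The vocabulary, lexicon, abbreviation list, category names, and helper functions (`is_oov`, `categorize_oov`, `extract_features`) are all hypothetical placeholders chosen for the example.

```python
import re
from collections import Counter

# Hypothetical toy resources for illustration only. A real pipeline would use a
# full standard dictionary and a validated closed-vocabulary lexicon.
STANDARD_VOCAB = {"i", "am", "so", "happy", "today", "see", "you", "later", "this", "is", "great"}
CLOSED_VOCAB_LEXICON = {
    "posemo": {"happy", "great"},  # assumed "positive emotion" category
    "pronoun_i": {"i"},            # assumed first-person singular category
}
ABBREVIATIONS = {"lol", "omg", "brb", "u", "l8r"}  # assumed abbreviation list

TOKEN_RE = re.compile(r"[A-Za-z0-9']+")
ELONGATION_RE = re.compile(r"(\w)\1{2,}")  # same character repeated 3+ times


def is_oov(token: str) -> bool:
    """Flag a token as OOV if it is not a standard word or uses nonstandard casing."""
    # All-caps words longer than one letter are treated as capitalization variants
    # even when their lowercase form is in the standard vocabulary.
    if token.isupper() and len(token) > 1:
        return True
    return token.lower() not in STANDARD_VOCAB


def categorize_oov(token: str) -> str:
    """Assign an OOV token to one illustrative lexical-variant category."""
    lower = token.lower()
    if lower in ABBREVIATIONS:
        return "abbreviation"
    if ELONGATION_RE.search(lower):
        return "elongation"
    if token.isupper() and len(token) > 1:
        return "capitalization"
    return "other"


def extract_features(text: str) -> Counter:
    """Combine closed-vocabulary, open-vocabulary, and OOV-category counts."""
    features = Counter()
    for tok in TOKEN_RE.findall(text):
        lower = tok.lower()
        # Closed vocabulary: count hits in each predefined lexicon category.
        for category, words in CLOSED_VOCAB_LEXICON.items():
            if lower in words:
                features["closed:" + category] += 1
        # Open vocabulary: keep every (lowercased) unigram as its own feature.
        features["open:" + lower] += 1
        # OOV-aware: additionally count the variant category of each OOV token.
        if is_oov(tok):
            features["oov:" + categorize_oov(tok)] += 1
    return features
```

For the input `"I am soooo HAPPY today lol"`, this sketch records one `oov:elongation`, one `oov:capitalization`, and one `oov:abbreviation` feature alongside the closed-vocabulary (`closed:posemo`, `closed:pronoun_i`) and open-vocabulary unigram counts. Note that the elongated "soooo" does not match the standard word "so", which is the kind of signal a purely dictionary-based count would miss. The resulting per-document dictionaries could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a classifier for a task such as personality prediction.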

