
Predicting Educational Background using Text Mining

We examine to what extent educational background can be inferred from written text, assuming that educational level is associated with writing style and language use. Using a large public dataset of almost 60,000 dating profiles, each containing written text, we look for a methodology to measure author style. We focus on the education and essay fields of each profile, from which we try to identify features of the written text that reveal the author's level of education. Using different types of extracted features, we predict the level of education with three approaches: (i) classifying the level of education as elementary or higher education using lexical features; (ii) using Linguistic Inquiry and Word Count (LIWC) features; (iii) combining LIWC and lexical features. For classification, we rely on regularized logistic regression. The joint model, which uses both lexical and LIWC features, predicts the education level better than the other text representation models, but the contribution of LIWC is marginal. Our results may be useful not only in the context of the platform economy and online markets, but also more generally to researchers who need to rely on written text as an indicator of educational background.
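The third approach (lexical features combined with category features, fed to a regularized logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the LIWC dictionary is proprietary, so `TOY_CATEGORIES` is a hypothetical two-category stand-in, the four example texts and their labels are invented, and the L2 penalty, learning rate, and epoch count are arbitrary assumptions.

```python
import math
from collections import Counter

# Hypothetical stand-in for LIWC categories (the real dictionary is proprietary).
TOY_CATEGORIES = {
    "article": {"a", "an", "the"},
    "pronoun": {"i", "me", "my", "you", "u"},
}

def featurize(text):
    """Map a text to lexical (word frequency) plus category features."""
    tokens = text.lower().split()
    n = len(tokens) or 1
    # Lexical features: relative frequency of each word.
    feats = {f"w={w}": c / n for w, c in Counter(tokens).items()}
    # Category features: share of tokens that fall in each toy category.
    for name, words in TOY_CATEGORIES.items():
        feats[f"cat={name}"] = sum(t in words for t in tokens) / n
    feats["bias"] = 1.0
    return feats

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, feats):
    """Probability of the positive class (here: higher education)."""
    return sigmoid(sum(weights.get(k, 0.0) * v for k, v in feats.items()))

def train(texts, labels, l2=0.01, lr=0.5, epochs=500):
    """Batch gradient descent on L2-regularized logistic loss."""
    data = [featurize(t) for t in texts]
    n = len(data)
    weights = {}
    for _ in range(epochs):
        grad = {}
        for feats, y in zip(data, labels):
            err = predict_proba(weights, feats) - y
            for k, v in feats.items():
                grad[k] = grad.get(k, 0.0) + err * v / n
        for k, g in grad.items():
            w = weights.get(k, 0.0)
            penalty = 0.0 if k == "bias" else l2 * w  # bias is not penalized
            weights[k] = w - lr * (g + penalty)
    return weights

# Invented toy data: 1 = higher education, 0 = elementary level.
texts = [
    "i enjoy reading the literature of the nineteenth century",
    "the research that i do concerns the philosophy of science",
    "u know i luv music and stuff lol",
    "lol i just wanna hang out and stuff",
]
labels = [1, 1, 0, 0]

weights = train(texts, labels)
for text in texts:
    print(round(predict_proba(weights, featurize(text)), 2), text)
```

Dropping the `cat=` features gives the lexical-only model, and dropping the `w=` features gives the category-only model, so the three approaches can be compared by changing only `featurize`.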
