
Topic modeling in software engineering research

Abstract: Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and the modeling parameters). Our study aims to describe how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modeled most, (3) data pre-processing and modeling parameters vary considerably and are often vaguely reported, and (4) manual topic naming (such as deducing names from the frequent words in a topic) is common.
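As a rough illustration of aspects (3) and (4), the sketch below pre-processes two made-up bug-report sentences and derives a candidate topic label from the most frequent words. It is not taken from the paper or from any study it surveys: the stop-word list, the function names `preprocess` and `label_from_top_words`, and the sample texts are all hypothetical, and no topic model is fitted. Raw word counts stand in for the word weights that a model such as LDA would produce for one topic.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "in", "to", "and", "of", "when", "on", "it", "i"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words and tokens of 1-2 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def label_from_top_words(topic_word_weights, n=3):
    """Join the n highest-weighted words of one topic into a candidate label.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(topic_word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return " / ".join(word for word, _ in ranked[:n])


# Made-up example documents (not data from the paper).
docs = [
    "The app crashes when I click the login button",
    "Login fails and the app crashes on startup",
]

tokens = [preprocess(d) for d in docs]

# Stand-in for a topic's word distribution: raw counts over the toy corpus.
weights = Counter(t for doc in tokens for t in doc)
label = label_from_top_words(weights)
print(label)
```

In practice a human would still turn such a word list into a readable name (for example "login crashes"), and the choices made in `preprocess` are exactly the kind of detail that changes the resulting topics.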

Springer Science and Business Media LLC

Camila Costa Silva, Matthias Galster, Fabian Gilson

Empirical Software Engineering

2021

