Wikipedia citations: Reproducible citation extraction from multilingual Wikipedia
Abstract
Wikipedia is an essential component of the open science ecosystem, yet it is poorly integrated with academic open science initiatives. Wikipedia Citations is a project that focuses on extracting and releasing comprehensive data sets of citations from Wikipedia. A total of 29.3 million citations were extracted from the English Wikipedia in May 2020. Following this one-off research project, we designed a reproducible pipeline that can process any Wikipedia dump in a cloud-based setting. To demonstrate its usability, we extracted 40.6 million citations in February 2023 and 44.7 million citations in February 2024. Furthermore, we equipped the pipeline with an adapted Wikipedia citation template translation module to process multilingual Wikipedia articles in 15 languages so that they are parsed and mapped into a generic structured citation template. This paper presents our open-source software pipeline for retrieving, classifying, and disambiguating citations on demand from a given Wikipedia dump.
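The template translation step described in the abstract can be illustrated with a minimal sketch. It assumes that translation amounts to renaming a language-specific citation template and its parameters to generic English equivalents. The mapping tables, the function names `parse_template` and `to_generic`, and the Italian examples are hypothetical illustrations and are not taken from the paper's pipeline.

```python
import re

# Hypothetical, illustrative mapping tables (not the pipeline's actual data).
TEMPLATE_NAMES = {"cita web": "cite web", "cita libro": "cite book"}
PARAM_NAMES = {"titolo": "title", "autore": "author", "anno": "year"}

# Matches only flat templates: {{name |key=value |...}} with no nested braces.
TEMPLATE_RE = re.compile(r"\{\{([^{}]+)\}\}")


def parse_template(body):
    """Split a template body into a lowercased name and a dict of named parameters."""
    parts = [part.strip() for part in body.split("|")]
    name = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:  # positional parameters are ignored in this sketch
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return name, params


def to_generic(wikitext):
    """Return known citation templates with names and parameters mapped to generic ones."""
    citations = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name, params = parse_template(match.group(1))
        if name not in TEMPLATE_NAMES:
            continue  # not a citation template known to the mapping table
        generic = {PARAM_NAMES.get(key, key): value for key, value in params.items()}
        citations.append({"template": TEMPLATE_NAMES[name], "params": generic})
    return citations


text = (
    "Testo.<ref>{{Cita web |titolo=Esempio |url=https://example.org "
    "|autore=Rossi}}</ref> {{Infobox |nome=x}}"
)
result = to_generic(text)
print(result)
```

This sketch does not handle nested templates or pipes inside wiki links; a real wikitext parser would be needed for those cases.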
Related Results
Aberration of the citation
Multiple inherent biases related to different citation practices (e.g., self-citations, negative citations, wrong citations, multi-authorship-biased citations, honorary citatio...
Wayback machine: reincarnation to vanished online citations
Purpose: The purpose of this paper is to determine the rate of loss of online citations used as references in scholarly journals. It is also intended to recover the vanish...
Citation analysis of computer systems papers
Citation analysis is used extensively in the bibliometrics literature to assess the impact of individual works, researchers, institutions, and even entire fields of study. In this ...
Interdependencies in Citation Metrics Using Dimensions (Case Study of Two NAUKMA Journals)
Quantitative data are increasingly influencing the evaluation of the effectiveness of research and researchers. Citations may be the main metric to assess the quality and value of ...
Self-citations, a trend prevalent across subject disciplines at the global level: an overview
Purpose: The present study aims to determine the prevailing trend of self-citations across 27 major subject disciplines at the global level. The study also examines the aspects like per...
Wikipedia: a tool to monitor seasonal diseases trends?
Objective: To explore the interest of Wikipedia as a data source to monitor seasonal disease trends in metropolitan France. Introduction: Today, the Internet, especially Wikipedia, is an im...
Exploiting Wikipedia Semantics for Computing Word Associations
Semantic association computation is the process of automatically quantifying the strength of a semantic connection between two textual units based on various lexi...
COVID-19 research in Wikipedia
Wikipedia is one of the main sources of free knowledge on the Web. During the first few months of the pandemic, over 5,200 new Wikipedia pages on COVID-19 were created, accumulatin...

