Towards Potential Content-Based Features Evaluation to Tackle Meaningful Citations
The scientific community has proposed various citation classification models to challenge purely quantitative citation analysis, in which all citations are treated equally. However, only a small number of benchmark datasets exist, which makes data-driven modeling of asymmetric citation data quite complex. These models classify citations for varying reasons, mostly harnessing metadata and content-based features derived from research papers. Presently, researchers are more inclined toward binary citation classification, in the belief that making the best possible use of incomplete datasets is adequate to address the issue. We argue that contemporary machine learning (ML) citation classification models overlook essential aspects when selecting features, which hinders elutriating the asymmetric citation data. This study presents a novel binary citation classification model that exploits a list of potential natural language processing (NLP) based features. Machine learning classifiers, including SVM, KLR, and RF, are used to classify citations into important and non-important classes. The evaluation is performed on two benchmark datasets comprising a corpus of around 953 paper-citation pairs annotated by the citing authors and domain experts. The results show that the proposed model outperforms contemporary approaches, attaining a precision of 0.88.
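For readers unfamiliar with the reported metric, the following is a minimal sketch of how precision is computed for a binary important/non-important labeling. It assumes precision is measured on the "important" class, which the abstract does not state. The function name `precision` and the toy labels are hypothetical illustrations and are not taken from the study's datasets or code.

```python
def precision(y_true, y_pred, positive="important"):
    """Precision = TP / (TP + FP) for the chosen positive class."""
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False positives: predicted positive but actually not positive.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        # No positive predictions at all; return 0.0 by convention here.
        return 0.0
    return tp / (tp + fp)


# Hypothetical toy annotations (gold labels) and classifier outputs.
y_true = ["important", "non-important", "important", "non-important", "important"]
y_pred = ["important", "important", "important", "non-important", "non-important"]

score = precision(y_true, y_pred)
print(f"precision on the 'important' class: {score:.2f}")
```

In this toy case two of the three citations predicted as important are truly important, so the score is 2/3. A reported precision of 0.88 would correspond to 88 of every 100 citations flagged as important being correctly flagged, under the same assumption about the positive class.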

