UERR: A Unified Effective Retrieval Model for Open-Source Repositories
Open-source repositories are widely used to improve the productivity and quality of modern software development. However, it is often difficult for developers to find suitable repositories on a large-scale platform such as GitHub. Although GitHub provides a search engine for exploring repositories, its capability is limited: some important criteria (e.g., the ratio of closed issues) are not supported, and users cannot specify fine-grained preferences across criteria. To address these limitations, we propose UERR, a unified effective repository retrieval model. We first define a unified schema for repository retrieval by analyzing the metadata of GitHub repositories; the schema includes 30 attributes grouped into six dimensions, such as functionality, maintenance, and popularity. Based on this schema, we design a query grammar that lets users express fine-grained requirements with personalized preferences, together with an integrated method for measuring the relevance between a repository and a query. In particular, we expand acronyms and split conjoined words using large language model (LLM) APIs, and we design a weighted N-gram-based method to measure the functional relevance between repository descriptions and queries. Experiments with 20 diverse queries show that, compared with the search engines of Elasticsearch and GitHub, UERR significantly improves the top-1 to top-20 retrieved repositories by 27.12% to 165.87% in terms of Pre@k, MRR@k, and NDCG@k. The results also show that our weighted N-gram-based functional relevance measurement outperforms two representative methods: IDF-weighted keyword matching and word embedding matching.
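The weighted N-gram idea can be illustrated with a minimal sketch. The abstract does not specify UERR's tokenization, n-gram range, or weights, so every choice here is an illustrative assumption: lowercased alphanumeric tokens, n from 1 to 3, each matched query n-gram weighted by its length n, and a score normalized to [0, 1]. The LLM-based acronym expansion and word splitting are assumed to have already been applied to both texts; the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of contiguous n-grams of length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def functional_relevance(query: str, description: str, max_n: int = 3) -> float:
    """Hypothetical weighted N-gram overlap score in [0, 1].

    Each query n-gram of length n carries weight n, so matching a contiguous
    phrase counts more than matching its words separately. This weighting is
    an illustrative assumption, not the scheme defined by UERR.
    """
    q_tokens, d_tokens = tokenize(query), tokenize(description)
    matched = total = 0.0
    for n in range(1, max_n + 1):
        q_grams = ngrams(q_tokens, n)
        d_grams = ngrams(d_tokens, n)
        total += n * len(q_grams)              # weight of every query n-gram
        matched += n * len(q_grams & d_grams)  # weight of those found in the description
    return matched / total if total else 0.0
```

With this weighting, the query "web framework" scores 1.0 against "A lightweight web framework for Python" but only 0.5 against "A framework for the web", because only the first description contains the bigram as a contiguous phrase. A bag-of-words scheme such as IDF-weighted keyword matching would not distinguish these two descriptions by word order.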
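The evaluation metrics named in the abstract (Pre@k, MRR@k, NDCG@k) have standard textbook definitions, sketched below over a ranked list of relevance judgments. This is a sketch under assumptions, not the paper's evaluation code: relevance is any non-negative grade (0 meaning not relevant), NDCG uses linear gain with the ideal ordering computed from the same judged list (other variants use a 2^rel - 1 gain or a pool of all judged items), and the reported percentages are assumed to be relative improvements over the baseline. All function names are hypothetical.

```python
import math


def precision_at_k(rels: list[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (grade > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k


def reciprocal_rank_at_k(rels: list[int], k: int) -> float:
    """1 / rank of the first relevant result within the top k, else 0."""
    for rank, r in enumerate(rels[:k], start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: list[list[int]], k: int) -> float:
    """Mean reciprocal rank over several queries (one relevance list per query)."""
    return sum(reciprocal_rank_at_k(r, k) for r in runs) / len(runs)


def ndcg_at_k(rels: list[int], k: int) -> float:
    """DCG of the top k divided by the DCG of the ideal (descending) ordering."""
    def dcg(scores: list[int]) -> float:
        # Linear gain, log2 position discount starting at log2(2) = 1.
        return sum(s / math.log2(i + 1) for i, s in enumerate(scores, start=1))

    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0


def relative_improvement(new: float, base: float) -> float:
    """Percentage improvement of `new` over `base` (assumed meaning of the reported figures)."""
    if base == 0:
        raise ValueError("baseline score must be non-zero")
    return (new - base) / base * 100
```

For example, for the ranked judgments `[0, 1, 1, 0, 1]`, Pre@3 is 2/3 and the reciprocal rank at 3 is 0.5, while NDCG@3 falls below 1 because the first result is not relevant.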
Related Results
Caderno de Resumos das Jornadas de Enfermagem da UERR (Book of Abstracts of the UERR Nursing Conferences)
The UERR NURSING CONFERENCE (JORNADA DE ENFERMAGEM DA UERR) was created in 2017 for the purpose of SCIENTIFIC PROMOTION AND DISSEMINATION of knowledge in health and nursing, seeking to encourage and...
Unconventional Method of Subsea Umbilical Retrieval Using Anchor Handling Vessel
A deepwater field in West Africa was decommissioned and subsea facilities retrieval operation was carried out as part of the Abandonment and Decommissioning...
Towards Transparent Presentation of FAIR-enabling Data Repository Functions & Characteristics
Identifying, finding and gaining a sufficient overview of the functions and characteristics of data repositories and their catalogues is essential for users of data repositories an...
Crosswalk among Prominent Open Research Data Repositories
Open Access is a synergised global movement using Internet to provide equal access to knowledge that once hid behind the subscription paywalls. Many new models for scholarly commun...
IRUS-UK: Improving understanding of the value and impact of institutional repositories
Many educational institutions have repositories for research outputs. The number of items available through institutional repositories ...
The influence of timing of oocytes retrieval and embryo transfer on the IVF-ET outcomes in patients having bilateral salpingectomy due to bilateral hydrosalpinx
Objective: The objective of the study was to investigate whether the sequence of oocyte retrieval and salpingectomy for hydrosalpinx affects pregnancy outcomes of in vitro fertilizat...
Rural and Remote Intubations in An Australian Aeromedical Retrieval Service: A Retrospective Cohort Study.
Objective: Critically unwell patients in rural and remote areas of Queensland, Australia, often require airway management with rapid sequence intubation (RSI) prior...
Open Access Medical Repositories: Status and Development Trends
The issue of reflecting the scientific achievements of individual scientists and the results of research activities of research teams in the information environment is of great imp...

