
Bootstrapping a Biodiversity Knowledge Graph

The "biodiversity knowledge graph" is a nice metaphor for connecting biodiversity data sources, but can we actually build it? Do we have sufficient linked data available? Given that a knowledge graph is an aggregation of data from multiple sources, how do we give those sources credit for that data, and how do we handle changes to that data? Given that the classic interface to a knowledge graph is an intimidatingly empty SPARQL query box, how do we make the knowledge within a graph more accessible? This talk discusses an attempt to build a knowledge graph with an eye on how to maintain the graph in the future. It adopts a model similar to that of the Global Biodiversity Information Facility (GBIF) and ChecklistBank, in which individual data providers make datasets available as independently citable units with Digital Object Identifiers (DOIs). Each dataset comprises linked data in the form of N-Triples. To create a knowledge graph we simply download one or more such datasets and add them to a triple store. Each data source is assigned to its own named graph, so that we have provenance for each dataset and can update any dataset independently. Furthermore, anyone can build their own knowledge graph by mixing and matching the sets of data (people, publications, taxa, etc.) most appropriate to their interests. To bootstrap this approach, exemplar datasets are created based on data harvested from ORCID, Zenodo, and taxonomic name databases. Each demonstration dataset could be replaced in the future by data published directly by those providers. In some cases there are sufficient shared identifiers (such as DOIs and ORCIDs) to form a graph, but taxonomic data typically forms isolated islands. To help the knowledge graph coalesce we need "glue" in the form of datasets that link pairs of different identifiers, such as Life Science Identifiers (LSIDs) for names to DOIs for publications.
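The dataset-per-named-graph model described above can be illustrated with a minimal sketch. It is not the project's actual code: it stands in for a real triple store with an in-memory dictionary, parses only a small subset of N-Triples, and every identifier in it (the `10.9999` DOIs, the LSID, the `publishedIn` predicate, the `GraphStore` class) is a hypothetical placeholder chosen for illustration.

```python
import re
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# One simple N-Triples statement: <subject> <predicate> (<iri> | "plain literal") .
# Blank nodes, language tags and datatypes are deliberately not handled here.
_STATEMENT = re.compile(
    r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"((?:[^"\\]|\\.)*)")\s*\.$'
)


def parse_ntriples(text: str) -> Set[Triple]:
    """Parse a small subset of N-Triples into (subject, predicate, object) tuples."""
    triples: Set[Triple] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _STATEMENT.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        s, p, iri, literal = m.groups()
        triples.add((s, p, iri if iri is not None else literal))
    return triples


class GraphStore:
    """Toy stand-in for a triple store: one named graph per dataset."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = {}

    def load(self, graph_name: str, ntriples_text: str) -> None:
        # Replacing the whole named graph updates one dataset without touching the others.
        self.graphs[graph_name] = parse_ntriples(ntriples_text)

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Tuple[str, Triple]]:
        """Yield (graph_name, triple) for matching triples; None acts as a wildcard."""
        for name, triples in self.graphs.items():
            for t in sorted(triples):
                if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2]):
                    yield name, t


# Hypothetical identifiers: each dataset's DOI doubles as the name of its graph.
NAMES_DOI = "https://doi.org/10.9999/dataset.names"
PUBS_DOI = "https://doi.org/10.9999/dataset.publications"
GLUE_DOI = "https://doi.org/10.9999/dataset.glue"
LSID = "urn:lsid:example.org:names:1"
WORK = "https://doi.org/10.9999/example.1"
NAME = "http://schema.org/name"
PUBLISHED_IN = "http://example.org/vocab/publishedIn"  # made-up glue predicate

store = GraphStore()
store.load(NAMES_DOI, f'<{LSID}> <{NAME}> "Aus bus" .')
store.load(PUBS_DOI, f'<{WORK}> <{NAME}> "A new species of Aus" .')
# The "glue" dataset contains nothing but cross links between identifier types.
store.load(GLUE_DOI, f"<{LSID}> <{PUBLISHED_IN}> <{WORK}> .")

# Each match reports the graph it came from, which is the provenance of that link.
links = list(store.match(s=LSID, p=PUBLISHED_IN))
```

Because each match carries its graph name, a consumer of this sketch can tell which citable dataset supplied a given link, and calling `load` again with the same graph name replaces that dataset alone.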
With the addition of those cross links we can start to generate bibliographies for taxa, discover communities of taxonomic expertise, and more. This model of building a knowledge graph also opens opportunities for smaller, focussed datasets to be added to the graph using the same approach (as a set of N-Triples archived in an online repository). In order to be useful, a knowledge graph needs to be easy to query and visualise. Simply providing a SPARQL endpoint is unlikely to be enough. As part of this project, I developed a GraphQL interface to the knowledge graph to provide a set of standard queries that can support a simple web interface to the graph. This provides a way to explore the graph as it is being developed, which in turn can highlight gaps in connectivity and coverage that need to be addressed.
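As a rough illustration of what one such "standard query" might look like, the sketch below builds a bibliography for a taxon name by following name, glue link, publication and author in turn. It is an assumption-laden toy, not the GraphQL interface described in the talk: the data is an in-memory list of quads, the function `bibliography_for_taxon`, the `publishedIn` predicate, the graph names and all identifiers are hypothetical, and the nested result only imitates the general shape of a GraphQL response.

```python
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (named graph, subject, predicate, object)

# Hypothetical identifiers and predicates, chosen only for illustration.
NAME = "http://schema.org/name"
CREATOR = "http://schema.org/creator"
PUBLISHED_IN = "http://example.org/vocab/publishedIn"  # made-up glue predicate
LSID = "urn:lsid:example.org:names:1"
WORK = "https://doi.org/10.9999/example.1"
PERSON = "https://orcid.org/0000-0000-0000-0000"

QUADS: List[Quad] = [
    ("names", LSID, NAME, "Aus bus"),
    ("glue", LSID, PUBLISHED_IN, WORK),
    ("publications", WORK, NAME, "A new species of Aus"),
    ("publications", WORK, CREATOR, PERSON),
    ("people", PERSON, NAME, "A. Author"),
]


def objects(quads: List[Quad], subject: str, predicate: str) -> List[Tuple[str, str]]:
    """Return (object, graph) pairs for a subject and predicate, in input order."""
    return [(o, g) for g, s, p, o in quads if s == subject and p == predicate]


def first_name(quads: List[Quad], subject: str):
    """Return the first name recorded for a subject, or None if there is none."""
    names = [o for o, _ in objects(quads, subject, NAME)]
    return names[0] if names else None


def bibliography_for_taxon(quads: List[Quad], name_id: str) -> Dict:
    """Follow name -> glue link -> publication -> authors and return nested data."""
    works = []
    for work_id, glue_graph in objects(quads, name_id, PUBLISHED_IN):
        authors = [
            {"id": person, "name": first_name(quads, person)}
            for person, _ in objects(quads, work_id, CREATOR)
        ]
        works.append(
            {
                "doi": work_id,
                "title": first_name(quads, work_id),
                "authors": authors,
                "linkedBy": glue_graph,  # which dataset supplied the cross link
            }
        )
    return {"taxonName": {"id": name_id, "name": first_name(quads, name_id), "works": works}}


result = bibliography_for_taxon(QUADS, LSID)
```

A name with no glue link returns an empty `works` list, which is one simple way a fixed query of this kind can expose the gaps in connectivity mentioned above.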
