Bootstrapping a Biodiversity Knowledge Graph
The "biodiversity knowledge graph" is a nice metaphor for connecting biodiversity data sources, but can we actually build it? Do we have sufficient linked data available? Given that a knowledge graph is an aggregation of data from multiple sources, how do we give those sources credit for that data, and how do we handle changes to that data? Given that the classic interface to a knowledge graph is an intimidatingly empty SPARQL query box, how do we make the knowledge within a graph more accessible?
This talk discusses an attempt to build a knowledge graph with an eye on how to maintain the graph in the future. It adopts a model similar to that of the Global Biodiversity Information Facility (GBIF) and ChecklistBank, in which individual data providers make datasets available as independently citable units with Digital Object Identifiers (DOIs). Each dataset comprises linked data in the form of N-Triples. To create a knowledge graph we simply download one or more such datasets and add them to a triple store. Each data source is assigned to its own named graph, so that we have provenance for each dataset and can update any dataset independently. Furthermore, anyone can build their own knowledge graph by mixing and matching the datasets (people, publications, taxa, etc.) most appropriate to their interests.
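As a rough illustration of the one-dataset-per-named-graph idea, the sketch below uses a toy in-memory store rather than a real triple store. The dataset DOIs, the ORCID, the `QuadStore` class, and the simplified N-Triples parser (IRIs and plain literals only) are all assumptions made for this example, not part of the project described in the talk.

```python
import re
from collections import defaultdict

# Matches one simple N-Triples statement whose object is an IRI or a plain
# literal. Real N-Triples also allows blank nodes, escapes, language tags, etc.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.\s*$')


def parse_ntriples(text):
    """Parse a simplified subset of N-Triples into (s, p, o) tuples.

    Objects keep their <...> or "..." delimiters so IRIs and literals differ.
    """
    triples = set()
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        triples.add(match.groups())
    return triples


class QuadStore:
    """Toy store (hypothetical): one set of triples per named graph."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def load(self, graph_name, ntriples_text):
        # Replacing the whole named graph lets one dataset be updated
        # without touching any other dataset.
        self.graphs[graph_name] = parse_ntriples(ntriples_text)

    def sources_of(self, triple):
        # Provenance: which named graphs (datasets) assert this triple?
        return sorted(name for name, triples in self.graphs.items()
                      if triple in triples)


# Made-up dataset DOIs and contents, for illustration only.
PEOPLE_DOI = "https://doi.org/10.9999/example.people"
WORKS_DOI = "https://doi.org/10.9999/example.works"
PERSON = "https://orcid.org/0000-0000-0000-0001"

store = QuadStore()
store.load(PEOPLE_DOI, f'<{PERSON}> <http://schema.org/name> "A. Author" .')
store.load(WORKS_DOI, f'<https://doi.org/10.9999/example.paper> '
                      f'<http://schema.org/creator> <{PERSON}> .')

# A new version of the people dataset replaces only that named graph.
store.load(PEOPLE_DOI, f'<{PERSON}> <http://schema.org/name> "Alice Author" .')
```

Using the dataset DOI as the graph name means every statement can be traced back to a citable source, and reloading one DOI leaves the other graphs untouched.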
To bootstrap this approach, exemplar datasets are created based on data harvested from ORCID, Zenodo, and taxonomic name databases. Each demonstration dataset could be replaced in the future by data published directly by those providers. In some cases there are sufficient shared identifiers (such as DOIs and ORCIDs) to form a graph, but taxonomic data typically forms isolated islands. To help the knowledge graph coalesce we need "glue" in the form of datasets that link pairs of different identifiers, such as Life Science Identifiers (LSIDs) for names to DOIs for publications. With the addition of those cross-links we can start to generate bibliographies for taxa, discover communities of taxonomic expertise, and more. This model of building a knowledge graph also opens opportunities for smaller, focussed datasets to be added to the graph using the same approach (as a set of N-Triples archived in an online repository).
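The role of a "glue" dataset can be sketched with plain sets of triples. Everything below is hypothetical: the LSIDs, DOIs, ORCIDs, taxon names, and the short predicate names (`name`, `title`, `creator`, `publishedIn`) are placeholders chosen for readability, not the vocabulary used in the project.

```python
# All identifiers and predicate names below are hypothetical placeholders.
NAMES = {  # taxonomic name dataset: an isolated island keyed by LSID
    ("urn:lsid:example.org:names:1", "name", "Aus bus"),
    ("urn:lsid:example.org:names:2", "name", "Aus cus"),
}
WORKS = {  # publication dataset: works and authors share DOIs and ORCIDs
    ("https://doi.org/10.9999/example.1", "title", "A new species of Aus"),
    ("https://doi.org/10.9999/example.1", "creator",
     "https://orcid.org/0000-0000-0000-0001"),
    ("https://doi.org/10.9999/example.2", "title", "Revision of the genus Aus"),
    ("https://doi.org/10.9999/example.2", "creator",
     "https://orcid.org/0000-0000-0000-0002"),
}
GLUE = {  # glue dataset: pairs each LSID with the DOI of a publication
    ("urn:lsid:example.org:names:1", "publishedIn",
     "https://doi.org/10.9999/example.1"),
    ("urn:lsid:example.org:names:2", "publishedIn",
     "https://doi.org/10.9999/example.2"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def bibliography(graph, lsid):
    """List (DOI, title) pairs for works linked to a taxonomic name."""
    return [(doi, title)
            for doi in objects(graph, lsid, "publishedIn")
            for title in objects(graph, doi, "title")]


def experts(graph, lsids):
    """ORCIDs of authors of works linked to any of the given names."""
    return sorted({orcid
                   for lsid in lsids
                   for doi in objects(graph, lsid, "publishedIn")
                   for orcid in objects(graph, doi, "creator")})


without_glue = NAMES | WORKS          # two islands with no shared identifiers
with_glue = NAMES | WORKS | GLUE      # the glue triples join the islands
```

In this toy data, `bibliography(without_glue, ...)` finds nothing for any name because no triple connects an LSID to a DOI, whereas the same call on `with_glue` can follow name to publication to author.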
In order to be useful, a knowledge graph needs to be easy to query and visualise. Simply providing a SPARQL endpoint is unlikely to be enough. As part of this project, I developed a GraphQL interface to the knowledge graph to provide a set of standard queries that can support a simple web interface to the graph. This provides a way to explore the graph as it is being developed, which in turn can highlight gaps in connectivity and coverage that need to be addressed.
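The abstract does not say how the GraphQL layer is implemented. One plausible reading of "a set of standard queries" is that each named field resolves to a canned, parameterised SPARQL query, so users never see the empty query box. The sketch below shows only that template step, using the standard library and no GraphQL library; the field names, the schema.org terms, and the query shapes are assumptions for illustration.

```python
import re
from string import Template

# Hypothetical "standard queries": field names, vocabulary, and query shapes
# are assumptions, not the project's actual schema.
STANDARD_QUERIES = {
    # Works attributed to a person, with the dataset (named graph) saying so.
    "worksByPerson": Template(
        "SELECT ?work ?dataset WHERE {\n"
        "  GRAPH ?dataset { ?work <http://schema.org/creator> <$iri> }\n"
        "}"
    ),
    # People credited as creators of a work.
    "creatorsOfWork": Template(
        "SELECT ?person ?dataset WHERE {\n"
        "  GRAPH ?dataset { <$iri> <http://schema.org/creator> ?person }\n"
        "}"
    ),
}

# Characters that the SPARQL grammar does not allow inside <...> IRI references.
FORBIDDEN_IN_IRI = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def build_query(field, iri):
    """Return the SPARQL text for a named standard query and one IRI argument."""
    if field not in STANDARD_QUERIES:
        raise KeyError(f"unknown standard query: {field}")
    if FORBIDDEN_IN_IRI.search(iri):
        # Rejecting these characters keeps user input from escaping the <...>.
        raise ValueError(f"not a valid IRI: {iri!r}")
    return STANDARD_QUERIES[field].substitute(iri=iri)


# Placeholder ORCID, for illustration only.
query = build_query("worksByPerson", "https://orcid.org/0000-0000-0000-0001")
```

Keeping the queries in a fixed registry means a web interface only ever passes identifiers, and running each query during development is one way to notice where the graph returns nothing because links are missing.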

