
Building and Using Geospatial Ontology in the BioCaster Surveillance System

Abstract

This abstract presents an approach to building a geospatial ontology from Wikipedia and using it in BioCaster, a system for detecting and tracking infectious disease outbreaks from online news. Motivated by the need to interpret the geospatial dynamics of events, we built a database containing the names of countries and major cities from Wikipedia.
We started by automatically extracting the names of countries and dependent territories, following ISO 3166-1, and the names of sub-countries (subdivisions and dependent areas), following ISO 3166-2. We then re-created the part-whole relation between countries and sub-countries by manually verifying the links from each country to its sub-countries. The building process is therefore semi-automatic: locations are extracted automatically and verified by hand.
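Every ISO 3166-2 subdivision code begins with the ISO 3166-1 alpha-2 code of its parent country (for example, `BR-SP` for Sao Paulo state in Brazil). This suggests one way to picture the automatic half of the step. Candidate part-whole links can be proposed from the code prefix, and anything that fails the check can be queued for a human. The sketch below is a minimal illustration of that idea and is not the authors' implementation. It assumes names have already been extracted as code/name pairs. The helper `propose_hierarchy` and the sample data are hypothetical, and the manual verification itself is represented only by the review queue it produces.

```python
from collections import defaultdict


def propose_hierarchy(countries, subdivisions):
    """Propose country -> sub-country links from ISO 3166-2 code prefixes.

    countries:    dict of ISO 3166-1 alpha-2 code -> country name
    subdivisions: dict of ISO 3166-2 code (e.g. "BR-SP") -> sub-country name
    Returns (hierarchy, review_queue): links whose prefix matches a known
    country, and codes that a human should check by hand.
    """
    hierarchy = defaultdict(list)
    review_queue = []
    for code, name in sorted(subdivisions.items()):
        prefix, sep, _ = code.partition("-")
        if sep and prefix in countries:
            hierarchy[prefix].append((code, name))
        else:
            # Malformed code or unknown parent country: leave for manual review.
            review_queue.append(code)
    return dict(hierarchy), review_queue


# Hypothetical sample, not the authors' extracted data.
countries = {"BR": "Brazil", "ID": "Indonesia"}
subdivisions = {
    "BR-SP": "Sao Paulo",
    "BR-GO": "Goias",
    "ID-JK": "Jakarta",
    "ID-BT": "Banten",
    "XX-01": "Unknown region",
}
hierarchy, review_queue = propose_hierarchy(countries, subdivisions)
print(hierarchy["BR"])  # [('BR-GO', 'Goias'), ('BR-SP', 'Sao Paulo')]
print(review_queue)     # ['XX-01']
```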
In addition, we extracted the longitude and latitude of each location for use in Google Maps and Google Earth applications.
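Wikipedia may give coordinates in degrees, minutes and seconds, while map applications expect signed decimal degrees. The sketch below shows one way to convert them and wrap a point in a minimal KML Placemark for Google Earth; note that KML writes longitude before latitude. It uses hypothetical helpers `dms_to_decimal` and `kml_placemark`. It assumes the coordinates have already been parsed out of the article and is not the authors' code.

```python
from xml.sax.saxutils import escape


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


def kml_placemark(name, lat, lon):
    """Return a minimal KML Placemark string (KML orders coordinates lon,lat)."""
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        f"<Point><coordinates>{lon:.4f},{lat:.4f}</coordinates></Point>"
        "</Placemark>"
    )


# Hypothetical coordinates, for illustration only.
lat = dms_to_decimal(10, 30, 0, "S")
lon = dms_to_decimal(45, 15, 0, "W")
print(kml_placemark("Example town", lat, lon))
```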
Finally, we combined the geospatial hierarchy from Wikipedia with the BioCaster ontology (BCO). The preliminary result is a geospatial ontology with two administrative levels: 243 countries and 4,025 sub-countries. The geospatial ontology was integrated into the extant BCO, a multilingual public health ontology focusing on infectious diseases, and was made available at http://biocaster.nii.ac.jp.
The geospatial ontology was used to develop an algorithm for detecting the locations of outbreaks reported in news stories. First, locations in news stories are automatically tagged by a named entity recognizer based on a support vector machine (SVM) trained on 1,000 manually annotated texts.
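The abstract does not say how the recognizer's output is encoded. A common convention for token classifiers, including SVM-based ones, is to give each token a BIO tag: B- begins an entity, I- continues it, and O is outside any entity. Assuming that convention and a hypothetical `LOCATION` tag, the location strings can be read off the per-token predictions as sketched below. The SVM itself is not shown, and `bio_to_spans` is an illustrative name.

```python
def bio_to_spans(tokens, labels, entity_type="LOCATION"):
    """Group per-token BIO labels into entity strings of one type.

    labels[i] is the tag predicted for tokens[i], e.g. "B-LOCATION",
    "I-LOCATION" or "O". A stray I- tag with no open span starts a new span.
    """
    begin, inside = f"B-{entity_type}", f"I-{entity_type}"
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == begin or (label == inside and not current):
            # Close any open span, then start a new one at this token.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif label == inside:
            current.append(token)
        else:
            # Any other tag ends the open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Toy sentence with hand-written labels standing in for classifier output.
tokens = ["Cholera", "cases", "in", "Ha", "Tay", "and", "Ha", "Noi", "."]
labels = ["O", "O", "O", "B-LOCATION", "I-LOCATION", "O", "B-LOCATION", "I-LOCATION", "O"]
print(bio_to_spans(tokens, labels))  # ['Ha Tay', 'Ha Noi']
```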
Second, we mapped location names from the text to identifiers in the geospatial ontology at the country and sub-country levels.
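The abstract does not describe how names are matched to ontology identifiers. The sketch below assumes a dictionary lookup on a normalized form of each name: case-folded, with accents removed and whitespace collapsed. Under that normalization, "Goias" in a news story matches "Goiás". The identifiers are ISO 3166 codes standing in for BCO identifiers, and the helpers `normalize`, `build_index` and `ground` are hypothetical. Returning candidates as a sorted list leaves room for names that refer to more than one place.

```python
import unicodedata


def normalize(name):
    """Case-fold, strip diacritics and collapse whitespace so spelling variants match."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def build_index(entries):
    """Map each normalized name to the set of identifiers it may refer to."""
    index = {}
    for identifier, names in entries:
        for name in names:
            index.setdefault(normalize(name), set()).add(identifier)
    return index


def ground(mention, index):
    """Return candidate identifiers for a location mention (empty list if unknown)."""
    return sorted(index.get(normalize(mention), set()))


# Hypothetical gazetteer keyed by ISO 3166 codes, for illustration only.
index = build_index([
    ("BR", ["Brazil", "Brasil"]),
    ("BR-SP", ["São Paulo"]),
    ("BR-GO", ["Goiás"]),
])
print(ground("Sao Paulo", index))  # ['BR-SP']
print(ground("GOIAS", index))      # ['BR-GO']
```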
Grounding proceeded as follows. First, we ranked disease-location pairs by frequency within sets of collected articles that shared similar date stamps.
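A minimal sketch of this ranking step follows. It assumes that "similar date stamps" means publication dates within a fixed window around a target date. It also assumes each article has already been reduced to a list of (disease, location) pairs. The window size, the once-per-article counting rule and the helper `rank_pairs` are illustrative assumptions, not details given in the abstract.

```python
from collections import Counter
from datetime import date


def rank_pairs(articles, target, window_days=3):
    """Rank (disease, location) pairs by the number of articles near `target` mentioning them.

    articles: iterable of (publication_date, [(disease, location), ...]).
    An article is counted when its date lies within `window_days` of `target`,
    and each pair is counted at most once per article.
    """
    counts = Counter()
    for published, pairs in articles:
        if abs((published - target).days) <= window_days:
            counts.update(set(pairs))
    # Highest frequency first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy articles, invented for illustration (not the BioCaster data).
articles = [
    (date(2008, 1, 2), [("cholera", "Ha Noi"), ("cholera", "Ha Tay")]),
    (date(2008, 1, 3), [("cholera", "Ha Noi")]),
    (date(2008, 1, 4), [("cholera", "Ha Noi"), ("cholera", "Ha Noi")]),
    (date(2008, 2, 1), [("ebola", "Kampala")]),  # outside the window
]
print(rank_pairs(articles, target=date(2008, 1, 3)))
# [(('cholera', 'Ha Noi'), 3), (('cholera', 'Ha Tay'), 1)]
```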
We then chose the top-ranked disease-location pairs and re-mapped them into each news story using regular expression matching.
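The regular expressions themselves are not given in the abstract. The sketch below shows one plausible form, under these assumptions: a top-ranked pair is re-mapped into a story when both its disease term and its location term occur there as whole words, case-insensitively. Spaces inside a term match any run of whitespace. The helpers `term_pattern` and `remap_pairs` and the toy story are hypothetical.

```python
import re


def term_pattern(term):
    """Case-insensitive pattern for a whole term; spaces in the term match any whitespace."""
    body = r"\s+".join(re.escape(part) for part in term.split())
    # Lookarounds instead of \b so terms ending in punctuation still work.
    return re.compile(rf"(?<!\w){body}(?!\w)", re.IGNORECASE)


def remap_pairs(story, top_pairs):
    """Return the top-ranked (disease, location) pairs whose terms both occur in the story."""
    matched = []
    for disease, location in top_pairs:
        if term_pattern(disease).search(story) and term_pattern(location).search(story):
            matched.append((disease, location))
    return matched


# Toy story and pairs, invented for illustration.
story = "Officials confirmed new Cholera cases in Ha\nNoi on Monday."
top_pairs = [("cholera", "Ha Noi"), ("cholera", "Ha Tay"), ("avian influenza", "Jakarta")]
print(remap_pairs(story, top_pairs))  # [('cholera', 'Ha Noi')]
```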
To infer the country when the text did not name it, we manually constructed a list of sub-country and country pairs ranked by population size.
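A minimal sketch of how such a ranked list might be consulted is shown below. It assumes the list maps each sub-country name to its candidate countries, most populous first. One rule is an added assumption not stated in the abstract: if the text does name one of the candidate countries, that country wins. The entry and the helper `infer_country` are hypothetical placeholders, not the authors' data.

```python
def infer_country(sub_country, ranked_pairs, countries_in_text=()):
    """Choose a parent country for a sub-country mention.

    ranked_pairs maps a sub-country name to candidate countries ordered by
    population, largest first. If the text already names one of the candidates,
    that country is used; otherwise the top-ranked candidate is returned.
    Returns None for names not in the list.
    """
    candidates = ranked_pairs.get(sub_country, [])
    for country in candidates:
        if country in countries_in_text:
            return country
    return candidates[0] if candidates else None


# Hypothetical entry with placeholder countries; not taken from the authors' list.
ranked_pairs = {"Northern Province": ["Country A", "Country B"]}
print(infer_country("Northern Province", ranked_pairs))                 # Country A
print(infer_country("Northern Province", ranked_pairs, {"Country B"}))  # Country B
```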
Data collected over a 10-week period (Dec 20, 2007 to Feb 20, 2008) showed that the system detected 7,412 English articles covering 110 countries and 360 sub-countries, with the following regional breakdown: 58.00% Africa, 18.23% Asia, 11.37% South America, 5.30% North America, 3.40% Middle East, 2.86% Europe and 0.34% Oceania.
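The regional breakdown is simple arithmetic over per-region tallies. The hypothetical helper `share_percentages` below turns region counts into two-decimal percentages; it is a sketch, not the authors' reporting code. Summing the shares as reported gives 99.50%, so the listed regions leave 0.50% unattributed in the text. `Decimal` is used so the sum is exact rather than subject to floating-point rounding.

```python
from decimal import Decimal

# Regional shares as reported in the text above (percent).
REPORTED_SHARES = {
    "Africa": Decimal("58.00"),
    "Asia": Decimal("18.23"),
    "South America": Decimal("11.37"),
    "North America": Decimal("5.30"),
    "Middle East": Decimal("3.40"),
    "Europe": Decimal("2.86"),
    "Oceania": Decimal("0.34"),
}


def share_percentages(counts):
    """Convert per-region counts into percentages rounded to two decimal places."""
    total = sum(counts.values())
    return {
        region: (Decimal(n) * 100 / total).quantize(Decimal("0.01"))
        for region, n in counts.items()
    }


total_reported = sum(REPORTED_SHARES.values())
print(total_reported)  # 99.50
print(share_percentages({"Region A": 1, "Region B": 3}))
# {'Region A': Decimal('25.00'), 'Region B': Decimal('75.00')}
```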
Relevant articles came predominantly from a few sources, such as Google News, the European Media Monitor and ProMED-mail. Among the disease/country outbreaks successfully detected during this period were Ebola in Uganda (Bundibugyo, Kampala, Mbarara), yellow fever in Brazil (Goias, Sao Paulo), avian influenza in Indonesia (Jakarta, Banten) and cholera in Vietnam (Ha Noi, Ha Tay).
The results were plotted on a publicly available Google Map and indicate that our geospatial ontology met our requirements. In the future, we plan to extend the ontology to deeper levels such as districts and sub-districts (wards, towns and villages). We will also consider evaluating our geospatial ontology and comparing it with other available resources such as GAZ and DBpedia.

Related Results

Geospatial Intelligence: Mapping the Future
Abstract: Geospatial intelligence (GEOINT) is a multidisciplinary field that combines geographic information systems (GIS), remote sensing, and data analysis to provide critical i...
A conceptual model for geospatial analytics in disease surveillance and epidemiological forecasting
The integration of geospatial analytics into disease surveillance and epidemiological forecasting has emerged as a crucial approach in understanding and mitigating the spread of in...
Distributed Geospatial Information Systems Challenges and Opportunities
The chapter titled “Distributed Geospatial Information Systems Challenges and Opportunities” delves into the comprehensive landscape of distributed geospatial technologies and thei...
Cyber Security Implementation for Application of Geospatial Data
Geospatial information is often seen as just being connected with guides, compasses, and areas. In any case, the application areas of geospatial information are far more extensive ...
Evaluation Activities from the National Syndromic Surveillance Program
ObjectiveThe objective of this session is to discuss syndromic surveillance evaluation activities. Panel participants will describe contexts and importance of selected evaluation a...
Aesthetic Disruptions: Critical Surveillance Art and the Unsettling of Surveillance
In the field of surveillance studies, scholars have focused on the use of art to offer an aesthetic intervention into the operation of surveillance systems. Scholars have used the ...
