A Provenance Model for the Planetary Data System
Provenance is critical to the scientific integrity of archived data. It establishes the authenticity of science data by providing a detailed account of its origin, ownership, processing history, and any transformations that occurred over its lifecycle. In addition to validating authenticity, provenance facilitates the scientific evaluation of data and can be decisive in legal contexts regarding ownership and access.

The Planetary Data System Information Model (PDS4 IM) provides some support for provenance, such as limited processing history and source product identifiers. Nevertheless, the need for a more comprehensive and formalized approach to provenance tracking has been expressed across the PDS for derived data products, superseded unique identifiers, and the curation of data collections over time. To address these needs, the W3C PROV Data Model (PROV-DM), shown in Figure 1, has been adopted as the foundation for a robust, interoperable provenance model in PDS4.

The PROV-DM Standard

PROV-DM is a W3C Recommendation designed to support the interchange of provenance information on the Web. It introduces a generic yet expressive vocabulary for modeling entities, activities, and agents, as well as their interrelations, such as wasDerivedFrom, wasGeneratedBy, wasAssociatedWith, and used. The adoption of PROV-DM in PDS4 provides a well-supported foundation for provenance interoperability.

Figure 1 - PROV-DM: The W3C PROV Data Model

Phase one, carried out with the help of GPT-4o, consisted of the development of a general model aligned with PROV-DM and implemented as a PDS4 Local Data Dictionary (LDD), an extension to the PDS4 Information Model. As a general model, this LDD can in turn be specialized to support a diverse set of use cases across the PDS archive.

Four use cases are currently under consideration.
Each is conceptualized below by defining its PROV-DM entities and relationships.
Voyager Corrected Image - This historical use case illustrates derivation of a corrected image from raw data and calibration files.
Entities: Corrected Product, Raw Product, Calibration File
Activities: Radiometric/Geometric Correction and Artifact Removal, Voyager Data Acquisition, Processing pipeline “vlev1”
Agents: The software module components of “vlev1”, for example, findrx, remrx, and voycal.
Relations: wasDerivedFrom, used, wasGeneratedBy, wasAttributedTo, wasAssociatedWith
Superseded Logical Identifier (LID) - This use case tracks the relationship between a superseded identifier and a new identifier.
Entities: Superseded LID, New LID
Activities: Superseding Activity
Agents: PDS Discipline Node, data engineer
Relations: wasDerivedFrom, wasAssociatedWith
PDS Collection Stewardship - This use case describes the lifecycle of a collection across multiple versions and curatorial responsibility transfers.
Entities: PDS Collection Version 1, PDS Collection Version 2
Activities: Archive, Management
Agents: Discipline Node A, Discipline Node B
Relations: wasGeneratedBy, wasAssociatedWith, used
Planetary Science Observation - This use case captures the entities and relationships involved in a science observation.
Investigation Goal: Volatile Mapping on Ryugu
Spacecraft: Hayabusa2 (operated by JAXA)
Instruments: Near-Infrared Spectrometer (NIRS3), Thermal Infrared Imager (TIR)
Activity: Observation Campaign 1
Entities: Digital objects (e.g., spectra, images, thermographs)
Targets: Crater A, Boulder Cluster B, Equatorial Ridge, Polar Region, Landing Site Z
Relations: wasDerivedFrom, wasGeneratedBy, used, wasAssociatedWith, wasInformedBy
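
The use cases above share one shape: a set of identifiers connected by PROV-DM relations. The following Python sketch is one possible way to hold such statements for the Voyager Corrected Image use case, walk the derivation chain, and write the statements out as XML. It is only an illustration under stated assumptions: the Relation class, the Provenance, subject, and object element names, and the identifiers entity_calibration_file and agent_voycal are hypothetical and are not taken from the PDS4 LDD.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Relation:
    """One PROV-DM style statement: subject --kind--> obj (hypothetical class)."""
    kind: str     # relation name, e.g. "wasDerivedFrom", "used", "wasGeneratedBy"
    subject: str  # identifier of the entity or activity the statement is about
    obj: str      # identifier of the entity, activity, or agent it points to


# Statements for the Voyager Corrected Image use case.
# entity_calibration_file and agent_voycal are made-up identifiers.
RELATIONS = [
    Relation("wasDerivedFrom", "entity_voyager_corrected_image", "entity_voyager_edr_image"),
    Relation("wasDerivedFrom", "entity_voyager_corrected_image", "entity_calibration_file"),
    Relation("wasGeneratedBy", "entity_voyager_corrected_image", "activity_data_correction"),
    Relation("used", "activity_data_correction", "entity_voyager_edr_image"),
    Relation("used", "activity_data_correction", "entity_calibration_file"),
    Relation("wasAssociatedWith", "activity_data_correction", "agent_voycal"),
]


def trace_sources(entity_id, relations):
    """Return every entity reachable from entity_id through wasDerivedFrom."""
    found = []
    stack = [entity_id]
    while stack:
        current = stack.pop()
        for rel in relations:
            if (rel.kind == "wasDerivedFrom"
                    and rel.subject == current
                    and rel.obj not in found):
                found.append(rel.obj)
                stack.append(rel.obj)
    return found


def to_xml(relations):
    """Serialize the statements as XML using hypothetical element names."""
    root = ET.Element("Provenance")
    for rel in relations:
        node = ET.SubElement(root, rel.kind)
        ET.SubElement(node, "subject").text = rel.subject
        ET.SubElement(node, "object").text = rel.obj
    return ET.tostring(root, encoding="unicode")
```

Calling trace_sources("entity_voyager_corrected_image", RELATIONS) lists the raw image and the calibration file, and to_xml(RELATIONS) returns a single XML string. Keeping every relation as a flat statement is a design choice made here for brevity; a real LDD would likely define typed classes per relation.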
A pseudo-XML representation of some of the entities, activities, agents, and relationships for the first use case is provided below.

Voyager Corrected Image Derived From EDR Image
The VoyagerISSCorrectedImage was derived from the EDRImage.
entity_voyager_corrected_image
entity_voyager_edr_image
activity_data_correction

Some challenges remain in finalizing the implementation strategy:
Identify and define the possible reasons for identifier supersession: Superseded, Replaced, Error, etc.
Determine the suitability and granularity of W3C PROV relationships for the PDS.
Determine appropriate use of bidirectional vs. unidirectional references between data products.
Develop policies and procedures for governance to ensure consistent provenance recording and utilization.
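
Two of the open questions above, the reasons for supersession and the direction of references, can be made concrete with a small Python sketch. It assumes that supersession is recorded one way only (old LID to new LID, with a reason) and shows that both the current identifier and the reverse lookup can be derived from those one-way records. The SupersessionReason enum uses only the three reasons named in the text; the LIDs, the SUPERSEDED_BY table, and the function names are hypothetical.

```python
from enum import Enum


class SupersessionReason(Enum):
    """Candidate reasons named in the text; the final list is still open."""
    SUPERSEDED = "Superseded"
    REPLACED = "Replaced"
    ERROR = "Error"


# One-way records: old LID -> (new LID, reason). All LIDs are made up.
OLD = "urn:nasa:pds:example_bundle:data:old_image"
INTERIM = "urn:nasa:pds:example_bundle:data:interim_image"
NEW = "urn:nasa:pds:example_bundle:data:new_image"

SUPERSEDED_BY = {
    OLD: (INTERIM, SupersessionReason.ERROR),
    INTERIM: (NEW, SupersessionReason.REPLACED),
}


def resolve_current(lid, superseded_by):
    """Follow supersession records until reaching a LID with no successor."""
    seen = {lid}
    while lid in superseded_by:
        lid, _reason = superseded_by[lid]
        if lid in seen:
            # A loop in the records would otherwise never terminate.
            raise ValueError(f"supersession cycle at {lid}")
        seen.add(lid)
    return lid


def invert(superseded_by):
    """Derive the reverse lookup (new LID -> old LIDs) from one-way records."""
    reverse = {}
    for old, (new, _reason) in superseded_by.items():
        reverse.setdefault(new, []).append(old)
    return reverse
```

Here resolve_current(OLD, SUPERSEDED_BY) walks two records to reach NEW, and invert(SUPERSEDED_BY) answers the question "which identifiers did this one supersede" without storing a second, backward reference in each product. Whether the PDS should store references in one direction and derive the other, as sketched here, is exactly the policy question the list raises.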
Conclusion

The application of the W3C PROV-DM standard to the PDS4 information model represents a significant step toward improving the traceability, interoperability, and scientific trustworthiness of planetary data. The use cases presented demonstrate a broad applicability across contexts, from spacecraft imaging pipelines to multi-node stewardship models. Continued refinement of relationship types and implementation strategies will ensure alignment with both scientific practice and digital preservation principles.

References

W3C. (2013). PROV-DM: The PROV Data Model. https://www.w3.org/TR/prov-dm/

Acknowledgements

The research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

During the preparation of this work the author(s) used GPT-4o in order to transform natural language to First Order Logic (FOL) and then to several Data Definition Languages (DDL). After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

© 2025 California Institute of Technology. Government sponsorship acknowledged.

