YODA: YODA's Organigram on Data Analysis

Although "incremental progress" is often referred to as the lesser kind of progress, it is arguably the true foundation of the scientific process. At the same time, building atop one's own or others' past achievements is frequently a hard problem. Data formats, computational tools and environments, best scientific practices, and means of exchanging data and results are continuously changing. This makes it challenging to effectively re-use previous scientific artefacts, beyond plain theoretical advances in a field. A core factor behind these issues is that established tools and workflows for managing and conducting data analyses cannot adequately capture and verify the complete set of assumptions and preconditions that form the conceptual basis of an analysis. As a consequence, previous results frequently fail to recompute because "it doesn't run anymore", previous analysis implementations do not work with new data because "something is wrong with these data", or, worse, results are unexpected or plain wrong because "these parts are copied from over here, I think it needs them".
Our approach relies on well-established free and open source tools: Git, git-annex, and DataLad. This combination of tools enables reliable, unambiguous tracking of the different components of a study (e.g., inputs, code, output), while facilitating their independent re-use (e.g., the same data used across multiple studies; versions of a software library used repeatedly), and scales to the dataset sizes found in cutting-edge high-resolution neuroimaging research. We propose a simple and intuitive organization of the digital components of a scientific study.
This comprises: 1) a structured (raw) dataset following community standards (BIDS); 2) custom analysis and re-usable code components; 3) complete execution environments (Docker/Singularity); 4) documents for human consumption (notes, article manuscripts); 5) essential metadata of a study (authorship, change log, funding, etc.); 6) programmatic tests to verify the completeness and integrity of a study (e.g., acquisition protocol compliance checks); 7) software adaptors for automated computing, such as continuous integration systems or high-performance computing environments. DataLad provides a high-level interface that enables users to create and manage these components. Additionally, it supports provenance capture of performed analysis steps ("datalad run"), and easy verification that an update of a component (e.g., an input dataset, or an entire computational environment image) still yields reproducible results.
We demonstrate how the proposed organization can be used for common research tasks, ranging from capturing acquired data to publishing a journal article. This includes a study template (https://github.com/myyoda/template), and concrete demos based on published articles as well as unpublished analyses (e.g., https://github.com/ReproNim/simple_workflow, https://github.com/kyleam/mlb-rundiff) that illustrate how studies can be comprehensively expressed in a portable and reproducible manner.
We outline a simple approach to structuring and conducting data analyses that aims to tightly connect all their essential ingredients (data, code, and computational environments) in a transparent, modular, accountable, and practical way. It relies exclusively on a file naming convention and tools that are already available to anyone, and is built on workflows that have proven capable of effective large-scale collaboration.
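As a rough illustration of the proposed organization, the components listed above could map onto a directory tree along the following lines. This is only a sketch: the directory names are hypothetical, chosen for this example and not taken from the study template, and it uses plain Python rather than DataLad itself.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical directory names, one per component named in the abstract.
# They are illustrative assumptions, not the layout of the myyoda template.
COMPONENTS = {
    "inputs/rawdata": "structured raw dataset following a community standard (e.g. BIDS)",
    "code": "custom analysis and re-usable code components",
    "envs": "execution environments (Docker/Singularity images)",
    "docs": "documents for human consumption (notes, manuscripts)",
    "metadata": "study metadata (authorship, change log, funding)",
    "tests": "programmatic completeness and integrity checks",
    "ci": "adaptors for continuous integration or HPC systems",
}


def scaffold_study(root: Path) -> list[Path]:
    """Create one directory per component, each with a README stating its purpose."""
    created = []
    for relative_path, purpose in COMPONENTS.items():
        directory = root / relative_path
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "README").write_text(purpose + "\n")
        created.append(directory)
    return created


if __name__ == "__main__":
    # Build the layout in a throwaway location and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for directory in scaffold_study(Path(tmp) / "mystudy"):
            print(directory.relative_to(tmp))
```

Keeping each component in its own directory is what would allow it to be versioned and re-used independently, for example by tracking the raw dataset as a separate dataset shared across several studies.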
These tools not only improve the accountability, collaboration, and reproducibility of research, but also facilitate novel workflows for experimentation and efficient use of available free online platforms and services.
Acknowledgements: This work was supported by the CRCNS (BMBF 01GQ1411; NSF 1429999), NIH (#1P41EB019936-01A1), and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences.