
Domain Specific Language and variables for systematic approach to genetic variant curation and interpretation

Four years ago, at BOSC 2019, the Forome Platform team first presented AnFiSA, an open-source variant curation and interpretation tool for analyzing genome sequencing data. In the last year, the work was published in the Journal of Biomedical Informatics [1]. One focus of the paper was building a community of clinicians, researchers, and professional software developers to reach our goals. We received contributions from numerous organizations and dozens of individual contributors. It became evident that for a community to operate efficiently, there must be a shared language that comes naturally to all members.

Back in 2019, we argued that representing variant curation guidelines as manually curated decision trees would align the guideline development workflow with that of software development. We also suggested that developers of these guidelines would benefit from an Integrated Development Environment (IDE) that offers features such as syntax highlighting, error checking, and visual debugging. Over the past four years, we have made significant progress towards creating such an IDE: we replaced the Vue.js client with one based on React.js and introduced IDE-like features. Because guidelines are often articulated as free-text English prose, we contend that a more structured domain-specific language (DSL) that both clinicians and engineers/researchers can grasp would be immensely advantageous for the community and would help mirror the software development process. Developing the DSL and its corresponding visual representations for editing and debugging allowed us to condense decision trees into ordered sequences of inclusion and exclusion criteria.
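To make the idea of an ordered sequence of inclusion and exclusion criteria concrete, here is a minimal Python sketch. It is not AnFiSA's DSL syntax or implementation; the annotation names (`call_quality`, `population_af`, `clinical_significance`, `consequence`), the thresholds, and the first-match evaluation rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A variant is modeled as a plain mapping from annotation name to value.
# The annotation names used below are hypothetical, not AnFiSA's schema.
Variant = Dict[str, object]


@dataclass(frozen=True)
class Criterion:
    label: str
    predicate: Callable[[Variant], bool]
    include: bool  # True: matching variants are included; False: excluded


def evaluate(variant: Variant, criteria: List[Criterion]) -> Tuple[bool, Optional[str]]:
    """Walk the criteria in order; the first matching criterion decides.

    Returns (included, label of the deciding criterion). Variants that match
    no criterion are excluded by default (an assumption of this sketch).
    """
    for criterion in criteria:
        if criterion.predicate(variant):
            return criterion.include, criterion.label
    return False, None


# Illustrative rule list: exclusion steps first, then inclusion steps.
criteria = [
    Criterion("low call quality", lambda v: v["call_quality"] < 20, include=False),
    Criterion("common in population", lambda v: v["population_af"] > 0.01, include=False),
    Criterion("known pathogenic", lambda v: v["clinical_significance"] == "pathogenic", include=True),
    Criterion("rare loss of function",
              lambda v: v["consequence"] in {"stop_gained", "frameshift"}, include=True),
]

example = {
    "call_quality": 50,
    "population_af": 0.0001,
    "clinical_significance": "uncertain",
    "consequence": "frameshift",
}
print(evaluate(example, criteria))
```

Returning the label of the deciding criterion is a deliberate choice in this sketch: it records which step accepted or rejected each variant, which is the kind of information a visual debugger for such rules would need to display.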
In this talk, we present our systematic approach for converting free text into complex logical statements and extensive decision trees, which can be further translated into inclusion and exclusion criteria depicted by graphical widgets. We will also demonstrate the use of these widgets and dashboards in a debugging process for a particular corpus of data. We view the visualization of ordered sequences of inclusion and exclusion criteria as a valuable tool that extends beyond variant curation to areas such as patient recruitment and population health research.

Our DSL uses genetic and sequencing annotations as variables, which encompass both technical and biological information. Technical annotations include provenance and confidence data for specific calls, while biological annotations summarize evidence about the potential impact of mutations on molecular function and phenotype. Together, these annotations give clinicians and researchers information on provenance, confidence, and evidence to aid decision making. Our work on the DSL revealed the need to structure annotation types systematically to help answer three critical questions:

1) How confident am I that the variant is present in my patient?
2) How confident am I that it affects the phenotype?
3) What evidence supports the association between the observed genotype change and the phenotype?

To accomplish this, in the latest version of AnFiSA we have implemented a classification system for annotations based on their scale and resolution, knowledge domain, and method. Genetic and biological annotations gather relevant evidence from public and proprietary sources about potential mutation effects on molecular function and phenotype, combining multiple inputs to provide a summary.
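A classification along these three axes can be sketched as a small data model. The Python below is only an illustration under stated assumptions: the enum members, the annotation names in the catalog, and the `select` helper are hypothetical and do not reproduce the classification shipped with AnFiSA.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Resolution(Enum):
    VARIANT = "variant"
    TRANSCRIPT = "transcript"
    FUNCTIONAL_UNIT = "functional unit"


class KnowledgeDomain(Enum):
    HUMAN_GENETICS = "human genetics"
    ANIMAL_GENETICS = "animal genetics"
    MOLECULAR_FUNCTION = "molecular function"


class Method(Enum):
    STATISTICAL_GENETICS = "statistical genetics inference"
    BIOINFORMATICS_PREDICTION = "bioinformatics prediction"
    IN_VIVO = "in vivo experiment"
    IN_VITRO = "in vitro experiment"


@dataclass(frozen=True)
class AnnotationType:
    name: str  # hypothetical annotation name used as a DSL variable
    resolution: Resolution
    domain: KnowledgeDomain
    method: Method


def select(catalog: List[AnnotationType],
           resolution: Optional[Resolution] = None,
           domain: Optional[KnowledgeDomain] = None,
           method: Optional[Method] = None) -> List[str]:
    """Return the names of annotations matching every axis that was given."""
    return [
        a.name for a in catalog
        if (resolution is None or a.resolution is resolution)
        and (domain is None or a.domain is domain)
        and (method is None or a.method is method)
    ]


# Hypothetical catalog entries, for illustration only.
catalog = [
    AnnotationType("population_af", Resolution.VARIANT,
                   KnowledgeDomain.HUMAN_GENETICS, Method.STATISTICAL_GENETICS),
    AnnotationType("splice_effect_score", Resolution.TRANSCRIPT,
                   KnowledgeDomain.MOLECULAR_FUNCTION, Method.BIOINFORMATICS_PREDICTION),
    AnnotationType("knockout_phenotype", Resolution.FUNCTIONAL_UNIT,
                   KnowledgeDomain.ANIMAL_GENETICS, Method.IN_VIVO),
]

print(select(catalog, domain=KnowledgeDomain.HUMAN_GENETICS))
```

Tagging each variable this way would let a rule author ask, for example, for all variant-level annotations derived from human genetics before writing a criterion that depends on them.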
The strength of evidence depends heavily on the resolution of a specific annotation, as well as on its knowledge domain and method [2]. The resolution ranges from granular, transcript- and variant-specific annotations to broader functional units. The knowledge domain refers to areas such as human or animal genetics, or molecular function described by experimental or computational techniques. The methods used to generate annotations include statistical genetics inference, bioinformatics predictions, and in vivo and in vitro experiments. The initial implementation of this classification is accessible on GitHub and can be tested in the most recent versions of AnFiSA.

We propose to initiate a conversation about the semantics and syntax of the DSL for variant curation and interpretation, as well as the most effective and intuitive way to structure annotation types. We believe that developing a standard, or at least a guideline, for representing variant curation rules will help establish a unified and systematic approach to reporting clinically relevant and actionable variants and will ultimately bring us closer to evidence-based medicine.

1. Bouzinier, M. A. et al. AnFiSA: An open-source computational platform for the analysis of sequencing data for rare genetic disease. J. Biomed. Inform. 133, 104174 (2022).
2. Zhou, H. et al. FAVOR: Functional Annotation of Variants Online Resource and Annotator for Variation across the Human Genome. Nucleic Acids Res. gkac966 (2022). PMID: 36350676. DOI: 10.1093/nar/gkac966

F1000 Research Ltd

Marina Pozhidaeva, Dmitry Etin, Gennadii Zakharov, Michael Bouzinier

2025
