
Formal validation of variant classification rules using domain-specific language and meta-predicates

The classification and curation of genetic variants is a critical step in both clinical genomics and biomedical research. Variant interpretation algorithms, whether rule-based or machine learning-driven, are developed by bioinformaticians but ultimately used by clinicians, evaluated by payers, and scrutinized by regulators. This creates a disconnect: practitioners and regulators define high-level classification principles, yet they lack tools to verify that actual algorithms adhere to them. To bridge this gap, we introduce a DSL-based validation framework that makes variant classification rules explainable, traceable, and verifiable.

We first introduced our DSL-based approach with the AnFiSA variant curation and interpretation platform at BOSC 2019 [1] and published a detailed description in the Journal of Biomedical Informatics in 2022 [2]. In 2023, we advanced this work by formalizing the typing of genetic annotation variables [3] and extending the DSL's application beyond genetics into domains such as population and environmental health [4].

Our current contribution introduces a novel validation mechanism based on meta-predicates, which we embed into DSL scripts to specify and verify the properties of annotations used within classification rules. This methodology is inspired by Invariant-based Programming [5], in which specifications are written alongside code to ensure correctness. Our objective differs from that of traditional software verification: rather than using meta-predicates to formally prove correctness, we aim to enhance the explainability and accountability of variant curation workflows. Meta-predicates make it possible to verify that the logic behind a classification rule aligns with user-defined criteria regarding evidence, provenance, knowledge domain, scale, and purpose, following the ontology we proposed in our 2023 BOSC presentation [3].
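To make the idea of a meta-predicate concrete, the sketch below checks two properties of a rule from the annotations it references: that it consults at least one variant-level annotation, and that its evidence comes from at least two knowledge domains. This is plain Python, not AnFiSA DSL syntax. The class, the function names, and the labels assigned to each annotation are illustrative assumptions, not the project's actual classification or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical metadata attached to one annotation variable."""
    name: str
    purpose: str  # "evidence" or "provenance"
    domain: str   # knowledge domain, e.g. "functional" or "population"
    scale: str    # "variant" or "gene"


# Illustrative catalogue. The labels are assumptions made for this sketch only.
CATALOGUE = {
    "stop_gained": Annotation("stop_gained", "evidence", "functional", "variant"),
    "pLI": Annotation("pLI", "evidence", "population", "gene"),
    "QD": Annotation("QD", "provenance", "technical", "variant"),
}


def uses_scale(rule_vars, scale):
    """Meta-predicate: does the rule reference any annotation at this scale?"""
    return any(CATALOGUE[v].scale == scale for v in rule_vars)


def evidence_domains(rule_vars):
    """Set of knowledge domains that contribute evidence to the rule."""
    return {
        CATALOGUE[v].domain
        for v in rule_vars
        if CATALOGUE[v].purpose == "evidence"
    }


def check_rule(rule_vars):
    """Return the list of criteria that the rule violates (empty if none)."""
    problems = []
    if not uses_scale(rule_vars, "variant"):
        problems.append("no variant-level annotation")
    if len(evidence_domains(rule_vars)) < 2:
        problems.append("evidence from fewer than two domains")
    return problems


# A rule relying only on a gene-level metric fails both checks.
print(check_rule(["pLI"]))
# A rule combining a variant-level consequence with a gene-level metric passes.
print(check_rule(["stop_gained", "pLI"]))
```

Note that the checks inspect only which annotations a rule uses and how those annotations are classified. They say nothing about whether the rule's thresholds are clinically appropriate, which matches the stated goal of accountability rather than proof of correctness.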
Specification: The formal language specification for the DSL, including meta-predicates with examples, is available on a dedicated page of our GitHub repository: https://github.com/ForomePlatform/anfisa/blob/avarstar.0/doc/specs/dsl.md
We classify genetic annotations according to their purpose (provenance or evidence), the knowledge domain to which they belong, the method by which they were obtained, and the scale of the annotation (e.g., specific variant or gene). A full classification of the annotations currently used in AnFiSA is also on our GitHub: https://github.com/ForomePlatform/anfisa/blob/avarstar.0/app/config/dictionary/annotations.yml

Validating Rules and Classifications: In the talk, we present real-world examples where our approach might add value for users and regulators.

Rare Disease Diagnostics: Our system ensures that rules assessing loss-of-function effects check variant-level annotations (e.g., stop_gained) rather than relying solely on gene-level tolerance metrics (e.g., pLI).

Clinical Reporting: We verify that technical metrics are integrated correctly into filtering decisions, supporting confidence in high-stakes clinical interventions.

Multi-Domain Evidence: Our meta-predicate framework checks that rules combine evidence from distinct domains (e.g., population and functional genetics) as mandated by ACMG/AMP guidelines.

Compliance: We can ensure that "clinically actionable" variants are supported by clinical evidence. We can verify that quality control filters (e.g., QD) are consistently applied. We can confirm that rules used in regulatory submissions (e.g., to the FDA) appropriately combine functional and clinical evidence.

Variant-Level Traceability: In addition to validating general rules, our system supports per-variant traceability. Users can follow the logical flow of any specific variant through inclusion and exclusion criteria, enabling comprehensive review and debugging.
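One way to picture such a per-variant trace is as a small directed graph. The sketch below runs a variant through an ordered list of filters, records which ones it passed, and writes the path as Graphviz DOT text. It is a minimal illustration in plain Python: the function names, the record fields, and the thresholds are assumptions made for the example and do not reflect AnFiSA's actual output format. Filter names are assumed not to contain double quotes.

```python
def trace_variant(variant, filters):
    """Apply filters in order; record (name, passed) up to the first failure."""
    path = []
    for name, predicate in filters:
        passed = bool(predicate(variant))
        path.append((name, passed))
        if not passed:
            break
    return path


def path_to_dot(variant_id, path):
    """Render a recorded path as DOT text for a directed graph."""
    lines = ["digraph trace {"]
    prev, prev_label = variant_id, None
    for name, passed in path:
        attr = f' [label="{prev_label}"]' if prev_label else ""
        lines.append(f'  "{prev}" -> "{name}"{attr};')
        prev, prev_label = name, "pass" if passed else "fail"
    # The variant is included only if every applied filter passed.
    outcome = "included" if all(p for _, p in path) else "excluded"
    attr = f' [label="{prev_label}"]' if prev_label else ""
    lines.append(f'  "{prev}" -> "{outcome}"{attr};')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical record and filters; thresholds are illustrative only.
variant = {"id": "var1", "QD": 12.0, "consequence": "stop_gained", "AF": 0.02}
filters = [
    ("QD >= 4", lambda v: v["QD"] >= 4),
    ("consequence is stop_gained", lambda v: v["consequence"] == "stop_gained"),
    ("AF < 0.01", lambda v: v["AF"] < 0.01),
]

print(path_to_dot(variant["id"], trace_variant(variant, filters)))
```

The resulting text can be passed to any Graphviz renderer. Stopping at the first failed filter keeps the graph to the path the variant actually took, which is the part a reviewer needs when asking why a variant was excluded.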
Our framework enables visualization of the full selection path for any variant, showing the filters it passed and the logic behind its classification. These paths are output in the DOT language [6] and rendered as directed graphs, enhancing traceability and auditability.

Community Engagement and Standardization: We invite the community to evaluate and collaborate on this approach. If there is broad interest, we propose advocating for its adoption as a formal standard, similar to how the Common Workflow Language has been standardized by IEEE and recognized by the FDA for describing bioinformatics pipelines.

DSL and AI Synergy: The AnFiSA DSL is a lightweight, Python-inspired language designed to express variant filtering logic in a structured and auditable way. Scripts operate on streams of records (JSON-like representations of annotated genetic variants) and apply rules composed of logical statements enriched with metadata. Importantly, our approach aligns with the emerging paradigm of Deep Learning-Guided Program Synthesis, which has been shown to outperform test-time training and fine-tuning (TTT/TTFT) [7] for complex reasoning tasks. Generative AI can potentially assist in writing DSL scripts, but without safeguards it may produce unverified or hallucinated logic. Our use of meta-predicates ensures that even AI-generated logic remains transparent, explainable, and verifiable, safeguarding against hallucination effects. By proposing integration of AI assistance with formal validation, we offer a path toward robust, interpretable, and compliant variant classification systems that meet the needs of both developers and end users in modern genomic medicine.

References:

[1] Michael Bouzinier, Serhey Trifonov, Joel Krier, Dmitry Etin, Dimitri Olchanyi, Alexey Kargalov, Arezou Ghazani, Shamil Sunyaev. Forome Anfisa – an open source variant interpretation tool. Presented at BOSC 2019. https://doi.org/10.7490/f1000research.1117292.1

[2] Bouzinier, M. A. et al. AnFiSA: An open-source computational platform for the analysis of sequencing data for rare genetic disease. J. Biomed. Inform. 133, 104174 (2022).

[3] Marina Pozhidaeva, Dmitry Etin, Gennadii Zakharov, Michael Bouzinier. Domain Specific Language and variables for systematic approach to genetic variant curation and interpretation. Presented at BOSC 2023. https://doi.org/10.7490/f1000research.1119632.1

[4] Michelle Audirac, Michael Bouzinier, Danielle Braun, Mahmood M. Shad, Scott Yockel. Systematic approach to preparing of medical claims data for biomedical research. Presented at BOSC 2023. https://doi.org/10.7490/f1000research.1119612.1

[5] Eriksson, J., Parsa, M., Back, RJ. (2014). Proofs and Refutations in Invariant-Based Programming. In: Albert, E., Sekerinski, E. (eds) Integrated Formal Methods. IFM 2014. Lecture Notes in Computer Science, vol 8739. Springer, Cham. https://doi.org/10.1007/978-3-319-10181-1_12

[6] DOT Language. https://graphviz.org/doc/info/lang.html

[7] François Chollet, Mike Knoop, Gregory Kamradt, Bryan Landers. ARC Prize 2024: Technical Report. arXiv:2412.04604 [cs.AI]. https://doi.org/10.48550/arXiv.2412.04604

F1000 Research Ltd

Dmitry Etin, Michael Bouzinier, Sergey Trifonov, Eugenia Lvova, Giorgi Shavtvalishvili, Michael Chmuack

2025

