Supervised non-negative matrix factorization on cell-free DNA fragmentomic features enhances early cancer detection
Abstract
Background
Circulating cell-free DNA (cfDNA) fragments exhibit non-random patterns in their length (FLEN), end motif (EM), and distance to nucleosome position (ND). Although these features have shown promise as inputs to machine learning and deep learning models for early cancer detection, most studies use them as raw inputs and overlook the potential benefit of pre-processing them to extract cancer-specific signals. This study aims to improve cancer detection accuracy by developing a new approach to feature extraction from cfDNA fragmentomics.
Methods
We implemented a supervised non-negative matrix factorization (SNMF) algorithm to generate embedding vectors that capture cancer-specific signals within cfDNA fragmentomic features. These embeddings served as input to a machine learning model that distinguishes cancer patients from healthy individuals.
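The abstract does not specify the SNMF objective, so the following is only a minimal sketch of one common way to make NMF supervised: append one-hot labels to the feature matrix and factorize the augmented matrix with multiplicative updates. The function names (`snmf_fit`, `snmf_transform`), the weight `lam`, the rank, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def snmf_fit(X, y, rank=4, lam=1.0, n_iter=300, seed=0, eps=1e-9):
    """Label-augmented NMF (an assumed variant, not the paper's algorithm).

    Factorizes A = [X | sqrt(lam) * Y] ~= W @ [H | sqrt(lam) * B] with all
    factors non-negative, so the sample embeddings W must also explain the labels.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    Y = np.eye(int(y.max()) + 1)[y]              # one-hot labels, shape (n, classes)
    A = np.hstack([X, np.sqrt(lam) * Y])         # augmented non-negative matrix
    W = rng.random((A.shape[0], rank)) + 0.1     # sample embeddings
    G = rng.random((rank, A.shape[1])) + 0.1     # stacked basis [H | sqrt(lam) * B]
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius reconstruction error.
        W *= (A @ G.T) / (W @ G @ G.T + eps)
        G *= (W.T @ A) / (W.T @ W @ G + eps)
        losses.append(float(np.linalg.norm(A - W @ G) ** 2))
    H = G[:, : X.shape[1]]                       # keep only the feature basis
    return W, H, losses

def snmf_transform(X_new, H, n_iter=300, seed=0, eps=1e-9):
    """Embed unlabeled samples by solving for W with the feature basis H held fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((X_new.shape[0], H.shape[0])) + 0.1
    for _ in range(n_iter):
        W *= (X_new @ H.T) / (W @ H @ H.T + eps)
    return W

# Toy stand-in for fragmentomic profiles: two groups with different mean profiles.
rng = np.random.default_rng(1)
n, d = 60, 30
y = np.repeat([0, 1], n // 2)
base = rng.random((2, d))
X = base[y] + 0.05 * rng.random((n, d))          # non-negative by construction

W, H, losses = snmf_fit(X, y)
emb = snmf_transform(X, H)                       # embeddings for a downstream classifier
```

In this sketch the labels are used only during fitting; new samples are embedded from their features alone, and the resulting vectors can be passed to any standard classifier.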
Results
We validated our framework using two datasets: an in-house cohort of 431 cancer patients and 442 healthy individuals (dataset 1), and a published cohort comprising 90 hepatocellular carcinoma (HCC) patients and 103 individuals with cirrhosis or hepatitis B (dataset 2). In dataset 1, we achieved an AUC of 94% in pan-cancer detection. In dataset 2, our framework achieved an AUC of 100% for HCC vs healthy classification, 99% for HCC vs non-HCC patient classification, and 96% for identifying HCC patients among a mixed group of non-HCC patients and healthy donors.
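For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen cancer sample receives a higher classifier score than a randomly chosen non-cancer sample. The sketch below computes it by direct pairwise comparison; the function name `auc` and the scores are hypothetical toy values, not data from the study.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores: three positives (label 1) and three negatives (label 0).
example_auc = auc([0.9, 0.8, 0.35, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0])
print(round(example_auc, 3))  # 8 of 9 pairs are ranked correctly -> 0.889
```

An AUC of 100% therefore means every positive sample was scored above every negative sample in the evaluated set.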
Conclusion
This study demonstrates the effectiveness of SNMF-transformed features in improving both pan-cancer detection and HCC-specific detection. Our approach advances the use of cfDNA fragmentomics for early cancer detection and could enhance diagnostic accuracy in clinical settings.
openRxiv
Trung Hieu Tran
Ngoc Tan Pham
Van Thien Chi Nguyen
Dac Ho Vo
Thi Hue Hanh Nguyen
Thi Trang Tran
Thanh Truong Tran
Truong Dang Huy Vo
Thi Huyen Dao
Huu Tam Phuc Nguyen
Thi Van Phan
Thi Minh Thi Ha
Thi Dieu Huong Ngo
Nhat Huy Tran
Nhat-Thang Tran
Thanh Quang Hoang
Viet Binh Nguyen
Van Cuong Le
Xuan Chung Nguyen
Thi Minh Phuong Nguyen
Van Hung Nguyen
Nu Thien Nhat Tran
Thi Ngoc Quynh Dang
Manh Hoang Tran
Phuc Nguyen Nguyen
Thi Anh Tuyet Pham
Duy Long Vo
Thuy Nguyen Doan
Viet Hai Nguyen
Quang Dat Tran
Quang Thong Dang
Le Minh Quoc Ho
Vu Tuan Anh Nguyen
Sao Trung Nguyen
Hoai-Nghia Nguyen
Le Son Tran
Hoa Giang
Minh-Duy Phan
Trong Hieu Nguyen

