
Explainable cohort discoveries driven by exploratory data mining and efficient risk pattern detection

[EMBARGOED UNTIL 6/1/2023] Finding small homogeneous subgroup cohorts in a large heterogeneous population is a critical process for hypothesis development within a broad range of applications, such as fraud detection, ad targeting, and geospatial traffic intervention. Most recently, cohort discovery has begun to play an important role in medical research, as it has contributed to the targeting of high-need patients from smaller homogeneous subgroups for precision health with better outcomes. Specifically, there has been a rising demand to identify cohorts and their corresponding risk factors in precision medicine and preventive healthcare, to better understand the etiology of diseases and to tailor treatments to the targeted patients. There is a clear need to discover novel cohorts and risk factors in the abovementioned application areas. Unfortunately, current computational approaches still lack robust answers to the question: "which subgroups are likely to be novel and may benefit from interventions that are likely to be effective for the selected population?" Additionally, the majority of prevention research has focused on single or simple factor identification. Only a few studies have considered complex risk factors, and they are still at a preliminary stage. Machine learning and data mining algorithms have advanced many areas. However, most high-performing approaches do not provide the interpretability required for eXplainable artificial intelligence (XAI). These black-box approaches typically offer a predictive capability: determining which class a sample belongs to. This supervised classification task requires pre-set labels in the data rather than exploring sub-clusters. There is a need to develop innovative, data-driven, explainable cohort discovery approaches.
To bridge the knowledge gap, we developed a novel subgroup discovery method which employs a deep exploratory mining process to slice and dice thousands of potential subpopulations and prioritize potential cohorts based on their explainable contrast patterns. Computational experiments were conducted on both synthesized data and a clinical autism dataset to assess performance quantitatively for coverage of pre-defined cohorts and qualitatively for novel knowledge discovery, respectively. Furthermore, scaling analysis was conducted using a distributed computing environment to estimate the computational resources needed as the number of subpopulations increases. To address the limitations of current risk factor identification approaches, we further created a novel dynamic tree structure, Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which are both capable of efficiently analyzing a large volume of data. We also introduced two specialized search methods, the extended target search (RHPSearch-TS) and the parallel search approach (RHPSearch-SD), to further speed up the retrieval of certain items of interest. Experiments on both benchmark datasets and real-world data demonstrate that our method is not only faster but also more effective in identifying comprehensive long risk patterns than existing methods. To further address real-world applications of computational work in biomedicine, we developed a multi-layer, unbiased cohort discovery architecture to provide the broad biomedical research community with a computational tool that offers capabilities beyond what traditional unsupervised cohort discovery methods, such as latent class analysis, can achieve. Experiments were conducted on both synthetic datasets and a clinical type 1 diabetes (T1D) dataset to assess the efficiency and discovery capability of the method.
The high coverage, fast speed, and novel findings on the datasets demonstrate that our method is robust and feasible for cohort discovery research. The computational contributions in this dissertation work lay a foundation for eXplainable and actionable artificial intelligence (X2AI) with multiple successful applications in cancer drug repositioning, type 1 diabetes studies, environmental impacts on liver cancer, and the impact of the COVID-19 pandemic.
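As a rough illustration of what "slicing" a population into candidate subgroups and ranking them by a contrast measure can look like, the sketch below enumerates attribute-value conjunctions in a toy dataset and ranks them by relative risk. It is not the dissertation's method and does not implement RHPTree or RHPSearch; the function name `rank_subgroups`, the toy records, the `case` outcome field, and the choice of relative risk as the contrast score are all assumptions made for this example.

```python
from itertools import combinations

def rank_subgroups(records, outcome, max_len=2, min_size=2):
    """Brute-force sketch: enumerate attribute-value conjunctions and rank
    them by relative risk of `outcome` inside vs. outside the subgroup."""
    attrs = sorted({k for r in records for k in r if k != outcome})
    # Every (attribute, value) pair observed in the data.
    items = sorted({(a, r[a]) for r in records for a in attrs if a in r})
    results = []
    for n in range(1, max_len + 1):
        for pattern in combinations(items, n):
            # Skip patterns that constrain the same attribute twice.
            if len({a for a, _ in pattern}) < n:
                continue
            inside, outside = [], []
            for r in records:
                matches = all(r.get(a) == v for a, v in pattern)
                (inside if matches else outside).append(r)
            if len(inside) < min_size or not outside:
                continue
            risk_in = sum(r[outcome] for r in inside) / len(inside)
            risk_out = sum(r[outcome] for r in outside) / len(outside)
            # Relative risk; treated as infinite if the rest has no cases.
            rr = risk_in / risk_out if risk_out > 0 else float("inf")
            results.append((pattern, len(inside), rr))
    # Highest relative risk first; larger subgroups break ties.
    results.sort(key=lambda t: (-t[2], -t[1]))
    return results

# Hypothetical toy data: `case` is 1 for an affected individual, else 0.
records = [
    {"smoker": "yes", "age": "old", "case": 1},
    {"smoker": "yes", "age": "old", "case": 1},
    {"smoker": "yes", "age": "young", "case": 0},
    {"smoker": "no", "age": "old", "case": 0},
    {"smoker": "no", "age": "young", "case": 1},
    {"smoker": "no", "age": "young", "case": 0},
    {"smoker": "no", "age": "young", "case": 0},
    {"smoker": "no", "age": "old", "case": 0},
]

ranked = rank_subgroups(records, "case")
for pattern, size, rr in ranked[:3]:
    print(pattern, size, round(rr, 2))
```

Exhaustive enumeration like this grows combinatorially with the number of attributes and the pattern length, which is the kind of cost that motivates specialized structures and search strategies for long risk patterns.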
University of Missouri Libraries

