Biclustering Models Under Collinearity in Simulated Biological Experiments
Biclustering models simultaneously detect groups of observations and the variables to which they are related in a data matrix. Such methods have been applied to biological data for classification. Collinearity is common in biological data because genes and proteins interact within their respective pathways, and such relationships could seriously reduce the efficiency of biclustering models. In this study, synthetic data are generated to investigate the effect of collinearity on the performance of biclustering models. Specifically, data are generated with varying degrees of collinearity induced by Cholesky decomposition and are implanted with biclusters to produce different sets of synthetic data. The effectiveness of three models, namely Biclustering by Cheng and Church (BCCC), Spectral Bicluster (BCSpectral), and the Plaid Model, in correctly detecting three types of biclusters in the generated data matrix was compared. The results show that all the models investigated are sensitive to changes in the level of collinearity. At low collinearity, all biclustering models detected the implanted biclusters correctly. As the level of collinearity rises, the proportion of biclusters captured by the models decreases. In particular, BCCC outperformed the other two models at moderate to high collinearity, with Jaccard coefficients of 0.499 to 0.875 and 0.746 to 0.936 for one and two implanted biclusters, respectively.
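The simulation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The equicorrelation target matrix, the constant-valued bicluster, the cell-level Jaccard definition, and all function names (`cholesky`, `correlated_data`, `implant_constant_bicluster`, `jaccard`) are assumptions made for illustration. The sketch uses only the Python standard library; in practice a routine such as `numpy.linalg.cholesky` would typically replace the hand-written factorization.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def correlated_data(n_rows, n_cols, rho, rng):
    """Draw rows whose columns share pairwise correlation rho (assumed target)."""
    target = [[1.0 if i == j else rho for j in range(n_cols)]
              for i in range(n_cols)]
    low = cholesky(target)
    data = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        # x = L z, so the covariance of x is L L^T, i.e. the target matrix.
        data.append([sum(low[j][k] * z[k] for k in range(j + 1))
                     for j in range(n_cols)])
    return data


def implant_constant_bicluster(data, rows, cols, value):
    """Overwrite the (rows x cols) block with a constant; returns a copy."""
    out = [row[:] for row in data]
    for r in rows:
        for c in cols:
            out[r][c] = value
    return out


def jaccard(true_rows, true_cols, found_rows, found_cols):
    """Cell-level Jaccard coefficient between a true and a detected bicluster."""
    true_cells = {(r, c) for r in true_rows for c in true_cols}
    found_cells = {(r, c) for r in found_rows for c in found_cols}
    union = true_cells | found_cells
    return len(true_cells & found_cells) / len(union) if union else 1.0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = correlated_data(n_rows=200, n_cols=10, rho=0.8, rng=rng)
data = implant_constant_bicluster(data, rows=range(20), cols=range(4), value=5.0)

# Hypothetical detection that recovers the right columns but is shifted by 10 rows.
score = jaccard(range(20), range(4), range(10, 30), range(4))
```

Here `rho` stands in for the level of collinearity. A biclustering model would be run on `data`, and its detected rows and columns would be passed to `jaccard` in place of the hypothetical shifted detection used above.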
Related Results
Clustering of Infectious Diseases on Java Island in 2023 Using the BCBimax Algorithm (Pengelompokan Penyakit Menular di Pulau Jawa Tahun 2023 Menggunakan Algoritma BCBimax)
Abstract. This study aims to identify simultaneous clustering patterns of infectious diseases and their distribution across three provinces in Java Island in 2023, using the biclus...
Hox Temporal Collinearity: Misleading Fallacy or Essential Developmental Mechanism?
Kondo and collaborators recently reported the absence of Hox temporal collinearity in Xenopus tropicalis. They found none in the initiation of accumulation of Hox transcr...
A Combinatoric biclustering algorithm
The unsupervised analysis of gene expression data plays a very important role in Genetics experiments. That is why a lot of clustering and biclustering techniques have been propose...
Bayesian regression modeling and inference of energy efficiency data: the effect of collinearity and sensitivity analysis
The majority of research predicted heating demand using linear regression models, but they did not give current building features enough context. Model problems such as Multicollin...
Assessing the Impact of Simulated Color Vision Deficiency on Ophthalmologists’ Ability to Differentiate between Choroidal Melanoma and Choroidal Nevus
Background: Color vision deficiency (CVD) is an often-overlooked issue within the medical community, and its consequences remain insufficiently explored. We aim to evaluate how CVD...
Adaptive Somatic Mutations Calls with Deep Learning and Semi-Simulated Data
A number of approaches have been developed to call somatic variation in high-throughput sequencing data. Here, we present an adaptive approach to calling somatic variations...
Exploring the topical structure of short text through probability models : from tasks to fundamentals
Recent technological advances have radically changed the way we communicate. Today’s communication has become ubiquitous and it has fostered the need for information that is easie...
Variable selection procedures under collinearity (multicollinearity)
Variable selection is an important area of statistical modeling, which is still an active area of research. In this study, we investigated the performance of four variable selectio...

