Large-scale modelling of sparse kinase activity data

Protein kinases are a protein family that plays an important role in several complex diseases, such as cancer and cardiovascular and immunological diseases. Kinases have conserved binding sites, so an inhibitor that targets them can show similar activities against different kinases. This can be exploited to create multi-target drugs. On the other hand, selectivity (a lack of similar activities) is desirable in order to avoid toxicity issues. There is a vast amount of kinase activity data in the public domain, which can be used in many different ways. Multi-task machine learning models are expected to excel on these kinds of datasets because they can learn from implicit correlations between tasks (in this case, activities against a variety of kinases). However, multi-task modelling of sparse data poses two major challenges: (i) creating a balanced train-test split without data leakage and (ii) handling missing data. In this work, we construct a kinase benchmark set composed of two balanced splits without data leakage, using random and dissimilarity-driven cluster-based mechanisms, respectively. This dataset can be used for benchmarking and developing kinase activity prediction models. Overall, performance on the dissimilarity-driven cluster-based split is lower than on the random split for all models, indicating poor generalizability of the models. Nevertheless, we show that multi-task deep learning models, on this very sparse dataset, outperform single-task deep learning and tree-based models. Finally, we demonstrate that data imputation does not improve the performance of (multi-task) models on this benchmark set.
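To make the idea of a cluster-based split without data leakage concrete, here is a minimal sketch. It is not the authors' procedure, which the abstract does not specify. It assumes that each compound is described by a set of fingerprint on-bits, that similarity is measured with the Tanimoto coefficient, and that a simple greedy leader clustering is good enough for illustration. The function names, the threshold and the toy fingerprints are all hypothetical, and the sketch does not attempt the per-kinase balancing that the benchmark set provides.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def leader_cluster(fingerprints, threshold=0.6):
    """Greedy leader clustering (an assumed, simple choice of algorithm).

    Each compound joins the first cluster whose leader is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    leaders, clusters = [], []
    for idx, fp in enumerate(fingerprints):
        for c, leader in enumerate(leaders):
            if tanimoto(fp, fingerprints[leader]) >= threshold:
                clusters[c].append(idx)
                break
        else:
            leaders.append(idx)
            clusters.append([idx])
    return clusters


def cluster_split(fingerprints, test_fraction=0.25, threshold=0.6):
    """Assign whole clusters to train or test so that no cluster spans both."""
    clusters = leader_cluster(fingerprints, threshold)
    target = test_fraction * len(fingerprints)
    train, test = [], []
    # Arbitrary illustrative policy: fill the test set with the smallest
    # clusters first, and send everything that no longer fits to train.
    for members in sorted(clusters, key=len):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


# Toy fingerprints (hypothetical on-bit sets, not real compounds).
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},  # one chemical series
    {10, 11, 12}, {10, 11, 13},                # a second series
    {20, 21, 22},                              # three singletons
    {30, 31, 32},
    {40, 41, 42},
]
train_idx, test_idx = cluster_split(fps, test_fraction=0.25, threshold=0.5)
print("train:", train_idx)
print("test:", test_idx)
```

Because whole clusters are moved together, close analogues of a test compound cannot end up in the training set. That is the kind of leakage a purely random split allows, and it is one plausible reason why scores on a cluster-based split tend to be lower.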
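The second challenge, handling missing data, is often addressed without imputation by masking unmeasured compound-kinase pairs out of the loss. The abstract does not say how the authors' multi-task models treat missing labels, so the following is only a sketch of that common approach. The function name `masked_mse`, the use of `None` for missing activities and the numbers are all illustrative assumptions.

```python
def masked_mse(y_true, y_pred):
    """Mean squared error over observed (compound, kinase) pairs only.

    Missing activities are encoded as None (an assumption of this sketch)
    and contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t is None:
                continue  # unmeasured pair: skip instead of imputing a value
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


# Rows are compounds, columns are kinases (tasks).
# The values are made up for illustration and are not real activity data.
y_true = [
    [6.0, None, None],
    [None, 8.0, None],
    [5.0, None, 7.0],
]
y_pred = [
    [5.0, 4.0, 4.0],
    [6.0, 9.0, 5.0],
    [5.0, 3.0, 9.0],
]
loss = masked_mse(y_true, y_pred)
print(loss)
```

Only four of the nine entries are observed here, so the loss is averaged over those four. Predictions for the unmeasured pairs are ignored, however far they are from any value an imputation scheme might have filled in.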

