Large-scale modelling of sparse kinase activity data

Protein kinases are a protein family that plays an important role in several complex diseases, such as cancer and cardiovascular and immunological diseases. Kinases have conserved binding sites, so an inhibitor that targets them can show similar activity against different kinases. This can be exploited to create multi-target drugs. On the other hand, selectivity (a lack of similar activity across kinases) is desirable to avoid toxicity issues. There is a vast amount of kinase activity data in the public domain, which can be used in many different ways. Multi-task machine learning models are expected to excel on these kinds of datasets because they can learn from implicit correlations between tasks (in this case, activities against a variety of kinases). However, multi-task modelling of sparse data poses two major challenges: (i) creating a balanced train-test split without data leakage and (ii) handling missing data. In this work, we construct a kinase benchmark set composed of two balanced splits without data leakage, using random and dissimilarity-driven cluster-based mechanisms, respectively. This dataset can be used for benchmarking and developing kinase activity prediction models. Overall, performance on the dissimilarity-driven cluster-based split is lower than on the random split for all models, indicating poor generalizability of the models. Nevertheless, we show that multi-task deep learning models outperform single-task deep learning and tree-based models on this very sparse dataset. Finally, we demonstrate that data imputation does not improve the performance of (multi-task) models on this benchmark set.
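The two challenges named in the abstract (a leakage-free split and missing data) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `masked_mse` and `cluster_split`, the NaN encoding of unmeasured compound-kinase pairs, and the toy values are all assumptions made for illustration. The split shown here only keeps whole clusters on one side and does not attempt the per-task balancing described in the abstract.

```python
import numpy as np

def masked_mse(y_true, y_pred):
    """Mean squared error over measured entries only (NaN marks 'not measured')."""
    mask = ~np.isnan(y_true)
    if not mask.any():
        return float("nan")
    # Zero out the contribution of unmeasured compound-kinase pairs.
    diff = np.where(mask, y_pred - np.nan_to_num(y_true), 0.0)
    return float((diff ** 2).sum() / mask.sum())

def cluster_split(cluster_ids, test_fraction=0.2, seed=0):
    """Assign whole clusters to the test set until it holds roughly
    `test_fraction` of the compounds, so no cluster spans both sets."""
    rng = np.random.default_rng(seed)
    clusters = np.unique(cluster_ids)
    rng.shuffle(clusters)
    n_target = test_fraction * len(cluster_ids)
    test_clusters, n_test = [], 0
    for c in clusters:
        if n_test >= n_target:
            break
        test_clusters.append(c)
        n_test += int((cluster_ids == c).sum())
    test_mask = np.isin(cluster_ids, test_clusters)
    return ~test_mask, test_mask

# Toy activity matrix: rows are compounds, columns are kinases (hypothetical values).
y_true = np.array([
    [6.5, np.nan, 7.0],
    [np.nan, 5.0, np.nan],
    [8.0, np.nan, np.nan],
    [np.nan, np.nan, 6.0],
])
y_pred = np.full_like(y_true, 6.0)   # a constant baseline prediction
cluster_ids = np.array([0, 0, 1, 2])  # hypothetical cluster label per compound

loss = masked_mse(y_true, y_pred)
train_mask, test_mask = cluster_split(cluster_ids, test_fraction=0.25)
print(loss, train_mask, test_mask)
```

Masking the loss lets a multi-task model train on a sparse matrix without imputing the missing values, and splitting by cluster rather than by compound keeps structurally similar compounds from appearing on both sides of the split.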

American Chemical Society (ACS)

Sohvi Luukkonen, Erik Meijer, Giovanni Tricarico, Johan Hofmans, Pieter Stouten, Gerard van Westen, Eelke Lenselink

2023
