Unbiased learning of protein conformational representation via unsupervised random forest
Abstract
Accurate data representation is paramount in biophysics for capturing the functionally relevant motions of biomolecules. Traditional feature selection methods, while effective, often rely on labeled data derived from prior knowledge and user supervision, which limits their applicability to novel systems. Here, we present unsupervised random forest (URF), a self-supervised adaptation of traditional random forests that identifies functionally critical features of biomolecules without requiring prior labels. Using a memory-efficient implementation, we first demonstrate URF’s capability to learn important sets of inter-residue features of a protein and subsequently to resolve its complex conformational landscape, performing on par with or surpassing its traditional supervised counterpart and 15 other leading baseline methods. Crucially, URF is supplemented by an internal metric, the learning coefficient, which automates hyper-parameter optimization, making the method robust and user-friendly. URF’s remarkable ability to learn important protein features in an unbiased fashion was validated against 10 independent protein systems, including both folded and intrinsically disordered states. In particular, benchmarking investigations showed that the protein representations identified by URF are functionally meaningful in comparison to current state-of-the-art deep learning methods. As an application, we show that URF can be seamlessly integrated with downstream analysis pipelines such as Markov state models to attain better-resolved outputs. The investigation presented here establishes URF as a leading tool for unsupervised representation learning in protein biophysics.

