
Unbiased learning of protein conformational representation via unsupervised random forest

Abstract: Accurate data representation is paramount in biophysics to capture the functionally relevant motions of biomolecules. Traditional feature selection methods, while effective, often rely on labeled data derived from prior knowledge and user supervision, which limits their applicability to novel systems. Here, we present unsupervised random forest (URF), a self-supervised adaptation of traditional random forests that identifies functionally critical features of biomolecules without requiring prior labels. Using a memory-efficient implementation, we first demonstrate URF’s capability to learn important sets of inter-residue features of a protein and subsequently to resolve its complex conformational landscape, performing on par with or better than its traditional supervised counterpart and 15 other leading baseline methods. Crucially, URF is supplemented by an internal metric, the learning coefficient, which automates hyperparameter optimization, making the method robust and user-friendly. URF’s ability to learn important protein features in an unbiased fashion was validated against 10 independent protein systems, including both folded and intrinsically disordered states. In particular, benchmarking showed that the protein representations identified by URF are functionally meaningful in comparison to those from current state-of-the-art deep learning methods. As an application, we show that URF can be integrated with downstream analysis pipelines such as Markov state models to attain better-resolved outputs. The investigation presented here establishes URF as a leading tool for unsupervised representation learning in protein biophysics.
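The abstract does not describe the algorithm itself. For orientation only, one long-standing way to train a random forest without labels is the real-versus-synthetic construction usually attributed to Breiman and Cutler: the observed data are labeled as one class, a copy with every feature column shuffled independently is labeled as the other, and the forest's feature importances then highlight features whose mutual dependence sets the real data apart. The sketch below assumes NumPy and scikit-learn are available; the function names and the toy data are hypothetical, and this is not presented as the URF procedure from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def unsupervised_rf_importances(X, n_trees=200, seed=0):
    """Rank features without labels via the real-vs-synthetic trick.

    Illustrative only: this is the classic Breiman-Cutler construction,
    NOT the URF algorithm described in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Synthetic class: shuffle each column independently. This keeps every
    # feature's marginal distribution but destroys correlations between them.
    X_synth = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )
    data = np.vstack([X, X_synth])
    labels = np.concatenate([np.ones(len(X)), np.zeros(len(X_synth))])
    forest = RandomForestClassifier(
        n_estimators=n_trees, min_samples_leaf=5, random_state=seed
    )
    forest.fit(data, labels)
    # Impurity-based importances, normalized by scikit-learn to sum to 1.
    return forest.feature_importances_


def make_toy_features(n_frames=500, seed=1):
    """Hypothetical stand-in for inter-residue features along a trajectory.

    Columns 0 and 1 are strongly coupled; columns 2-5 are independent noise.
    """
    rng = np.random.default_rng(seed)
    coupled = rng.normal(size=n_frames)
    X = rng.normal(size=(n_frames, 6))
    X[:, 0] = coupled
    X[:, 1] = coupled + 0.1 * rng.normal(size=n_frames)
    return X


importances = unsupervised_rf_importances(make_toy_features())
ranking = np.argsort(importances)[::-1]  # most important feature first
print("feature ranking:", ranking.tolist())
```

Because shuffling preserves each marginal distribution, a feature can only help separate real from shuffled frames through its dependence on other features, which is why the two coupled columns are expected to rank highest in this toy case.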

