The effect of statistical normalisation on network propagation scores
Abstract
Motivation
Network diffusion and label propagation are fundamental tools in computational biology, with applications such as gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, owing to concerns that network topology can bias diffusion scores. This raises the question of what the statistical properties of such diffusion processes are, and whether they are biased, in each of their applications. In this work, we characterised some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all datasets were based on binary labels, but we also present theoretical results for quantitative labels.
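The abstract does not spell out which diffusion kernel or null model is used, so the following is only a minimal sketch under stated assumptions: the regularised Laplacian kernel K = (I + L)^-1 with L = D - A, raw scores f = K y, and a null model that shuffles the input labels uniformly across all nodes while keeping the graph fixed. The function names and the toy graph are hypothetical and are not taken from diffuBench. It shows what a non-parametric (permutation-based) normalisation of diffusion scores can look like.

```python
import numpy as np


def regularised_laplacian_kernel(adj: np.ndarray) -> np.ndarray:
    """Return K = (I + L)^-1, where L = D - A is the combinatorial Laplacian.

    Assumption: this is one common diffusion kernel, not necessarily the
    one used in the benchmark.
    """
    laplacian = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.inv(np.eye(adj.shape[0]) + laplacian)


def permutation_pvalues(kernel, labels, n_perm=1000, seed=0):
    """Raw diffusion scores f = K y and one-sided empirical p-values.

    Assumed null model: input labels are shuffled uniformly at random
    across all nodes, with the graph held fixed.
    """
    rng = np.random.default_rng(seed)
    raw = kernel @ labels
    # Each column of `perms` is one shuffled copy of the label vector.
    perms = np.column_stack([rng.permutation(labels) for _ in range(n_perm)])
    null = kernel @ perms  # null scores, shape (n_nodes, n_perm)
    # Count permutations scoring at least as high as the observed score,
    # with the usual +1 correction so that no p-value is exactly zero.
    exceed = (null >= raw[:, None]).sum(axis=1)
    return raw, (exceed + 1) / (n_perm + 1)


# Hypothetical toy graph: hub 0 linked to 1, 2 and 3; nodes 3, 4, 5 form a triangle.
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (3, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

labels = np.array([0, 1, 0, 0, 1, 0], dtype=float)  # binary input labels
K = regularised_laplacian_kernel(adj)
raw, pvals = permutation_pvalues(K, labels)
print(np.round(raw, 3))
print(np.round(pvals, 3))
```

With only 15 distinct ways to place two positives among six nodes, the empirical null here is very coarse, so the p-values on this toy graph are illustrative only; on a real interactome the same loop simply runs over many more nodes and permutations.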
Results
Diffusion scores starting from binary labels were affected by the label codification and exhibited a problem-dependent topological bias that could be removed by statistical normalisation. Parametric and non-parametric normalisation addressed both points: they are codification-independent and they equalise the bias. We identified and quantified two sources of bias, the mean value and the variance, that yielded performance differences when normalising the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalisation was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities.
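To make the ideas of closed-form null moments, codification independence and the spectral link concrete, here is a hedged sketch that is not the authors' derivation. It assumes the same null as a plain label shuffle over all nodes. For a uniformly random permutation of fixed values with mean m and sample variance s2 (denominator n - 1), E[y] = m 1 and Cov(y) = s2 (I - 11^T/n). Hence E[f] = m K1 and Cov(f) = s2 K (I - 11^T/n) K^T, and a parametric z-score divides the centred score by the null standard deviation. For the regularised Laplacian kernel on a connected graph, the centring removes the constant eigenvector, so Var(f_i) = s2 sum over k >= 2 of u_ik^2 / (1 + lambda_k)^2, where (lambda_k, u_k) are Laplacian eigenpairs. For this particular kernel K1 = 1, so the null mean is constant and only the variance varies across nodes. Other kernels or null models (for example permuting only labelled nodes) need not behave this way. All function names are hypothetical.

```python
import numpy as np


def permutation_null_moments(kernel, labels):
    """Closed-form mean vector and covariance of f = K y under label shuffling."""
    n = len(labels)
    m = labels.mean()
    s2 = labels.var(ddof=1)  # sample variance, denominator n - 1
    centring = np.eye(n) - np.ones((n, n)) / n
    mean = m * kernel.sum(axis=1)  # m * K 1
    cov = s2 * kernel @ centring @ kernel.T
    return mean, cov


def z_scores(kernel, labels):
    """Parametric normalisation: z_i = (f_i - E[f_i]) / sd(f_i)."""
    mean, cov = permutation_null_moments(kernel, labels)
    return (kernel @ labels - mean) / np.sqrt(np.diag(cov))


def spectral_null_variance(adj, labels):
    """Var(f_i) = s2 * sum_{k>=2} u_ik^2 / (1 + lambda_k)^2 for K = (I + L)^-1.

    Assumes a connected graph, so the only zero Laplacian eigenvalue belongs
    to the constant eigenvector, which the centring matrix projects out.
    """
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigval, eigvec = np.linalg.eigh(laplacian)  # ascending eigenvalues
    weights = 1.0 / (1.0 + eigval[1:]) ** 2  # drop lambda_1 = 0
    return labels.var(ddof=1) * (eigvec[:, 1:] ** 2) @ weights


# Hypothetical toy graph (connected): hub 0 plus a triangle on nodes 3, 4, 5.
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5), (3, 5)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
K = np.linalg.inv(np.eye(6) + np.diag(adj.sum(axis=1)) - adj)

y01 = np.array([0, 1, 0, 0, 1, 0], dtype=float)  # 0/1 codification
ypm = 2 * y01 - 1  # -1/+1 codification of the same labels
z01, zpm = z_scores(K, y01), z_scores(K, ypm)
mean, cov = permutation_null_moments(K, y01)
print(np.allclose(z01, zpm))  # affine recoding leaves z-scores unchanged
print(np.allclose(np.diag(cov), spectral_null_variance(adj, y01)))
```

The invariance follows because any recoding y' = a y + b with a > 0 shifts the observed score and the null mean by the same amount and scales the null standard deviation by a, so the z-score cancels both.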
Availability
The code is publicly available at
https://github.com/b2slab/diffuBench
Contact
sergi.picart@upc.edu