Javascript must be enabled to continue!
Dual stochastic natural gradient descent
View through CrossRef
Abstract
The multinomial logistic regression (MLR) model is widely used in statistics and machine learning. On the one hand, stochastic gradient descent (SGD) is the most common approach for determining the parameters of a such model in big data scenarios, due to its simplicity and low computational complexity property. Furthermore, SGD has proven convergence under reasonable conditions. However, SGD has slow sub-linear rates of convergence and it often reduces convergence speed due to the plateau phenomenon. On the other hand, stochastic natural gradient descent (SNGD), proposed by Amari, is a manifold optimization method shown to be Fisher efficient when it converges, but its convergence properties remain unproven and it is often computationally prohibitive for models with a large number of parameters. Here, we propose dual stochastic natural gradient descent (DSNGD), a stochastic optimization method for MLR based on manifold optimization concepts. In the discrete scenario, DSNGD (i) has linear per-iteration computational complexity in the number of parameters, and (ii) is proven to converge. To achieve (i) we leverage the dual flatness of the family of joint distributions for MLR to simplify computations. To ensure (ii) DSNGD builds on the foundational ideas of convergent stochastic natural gradient descent (CSNGD), a variant of SNGD with guaranteed convergence, using an independent sequence to construct a bounded approximation of the natural gradient. By generalizing a result from Sunehag et al., we prove that DSNGD converges in the discrete case and maintains linear computational complexity per iteration. Beyond its convergence property and linear computational complexity, DSNGD empirically demonstrates fast convergence comparable to SNGD, improves upon SGD performance, and exhibits stability where SNGD does not.
Springer Science and Business Media LLC
Title: Dual stochastic natural gradient descent
Description:
Abstract
The multinomial logistic regression (MLR) model is widely used in statistics and machine learning.
On the one hand, stochastic gradient descent (SGD) is the most common approach for determining the parameters of a such model in big data scenarios, due to its simplicity and low computational complexity property.
Furthermore, SGD has proven convergence under reasonable conditions.
However, SGD has slow sub-linear rates of convergence and it often reduces convergence speed due to the plateau phenomenon.
On the other hand, stochastic natural gradient descent (SNGD), proposed by Amari, is a manifold optimization method shown to be Fisher efficient when it converges, but its convergence properties remain unproven and it is often computationally prohibitive for models with a large number of parameters.
Here, we propose dual stochastic natural gradient descent (DSNGD), a stochastic optimization method for MLR based on manifold optimization concepts.
In the discrete scenario, DSNGD (i) has linear per-iteration computational complexity in the number of parameters, and (ii) is proven to converge.
To achieve (i) we leverage the dual flatness of the family of joint distributions for MLR to simplify computations.
To ensure (ii) DSNGD builds on the foundational ideas of convergent stochastic natural gradient descent (CSNGD), a variant of SNGD with guaranteed convergence, using an independent sequence to construct a bounded approximation of the natural gradient.
By generalizing a result from Sunehag et al.
, we prove that DSNGD converges in the discrete case and maintains linear computational complexity per iteration.
Beyond its convergence property and linear computational complexity, DSNGD empirically demonstrates fast convergence comparable to SNGD, improves upon SGD performance, and exhibits stability where SNGD does not.
Related Results
When Does a Dual Matrix Have a Dual Generalized Inverse?
When Does a Dual Matrix Have a Dual Generalized Inverse?
This paper deals with the existence of various types of dual generalized inverses of dual matrices. New and foundational results on the necessary and sufficient conditions for vari...
Collaborative Promotion:A New Path for the Development of Dual-Innovation Education in Colleges and Universities in Ethnic Minority Area
Collaborative Promotion:A New Path for the Development of Dual-Innovation Education in Colleges and Universities in Ethnic Minority Area
In the context of the new era, talent is the first resource and innovation is the first driving force, and it is more and more important to emphasize the dual-creation education in...
Hip Stability Parameters with Dual Mobility, Modular Dual Mobility and Fixed Bearing in Total Hip Arthroplasty: an Analytical Evaluation.
Hip Stability Parameters with Dual Mobility, Modular Dual Mobility and Fixed Bearing in Total Hip Arthroplasty: an Analytical Evaluation.
Abstract
Background. Use of dual mobility in total hip arthroplasty has gained popularity due to the ability to reduce dislocation through increased jumping distance and im...
A Natural Gradient Descent Algorithm for the Solution of Lyapunov Equations Based on the Geodesic Distance
A Natural Gradient Descent Algorithm for the Solution of Lyapunov Equations Based on the Geodesic Distance
A new framework based on the curved Riemannian manifold is proposed to calculate the numerical solution of the Lyapunov matrix equation by using a natural gradient descent algorith...
On a non-standard two-species stochastic competing system and a related degenerate parabolic equation
On a non-standard two-species stochastic competing system and a related degenerate parabolic equation
We propose and analyse a new stochastic competing two-species population dynamics model. Competing algae population dynamics in river environments, an important engineering problem...
Stochastic Modeling Of Space Dependent Reservoir-Rock Properties
Stochastic Modeling Of Space Dependent Reservoir-Rock Properties
Abstract
Numerical modeling of space dependent and variant reservoir-rock properties such as porosity, permeability, etc., are routinely used in the oil industry....
An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System
An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System
Kertas kerja ini membentangkan pemprosesan semula ciri pertuturan pemalar Pengekodan Ramalan Linear (LPC) bagi menyediakan template rujukan yang boleh diharapkan untuk set perkataa...
Fourier representation of random media fields in stochastic finite element modelling
Fourier representation of random media fields in stochastic finite element modelling
PurposeTo provide an explicit representation for wide‐sense stationary stochastic fields which can be used in stochastic finite element modelling to describe random material proper...

