Javascript must be enabled to continue!
Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization
View through CrossRef
Averaging scheme has attracted extensive attention in deep learning as well as traditional machine learning. It achieves theoretically optimal convergence and also improves the empirical model performance. However, there is still a lack of sufficient convergence analysis for strongly convex optimization. Typically, the convergence about the last iterate of gradient descent methods, which is referred to as individual convergence, fails to attain its optimality due to the existence of logarithmic factor. In order to remove this factor, we first develop gradient descent averaging (GDA), which is a general projection-based dual averaging algorithm in the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), where primal and dual averaging schemes are simultaneously utilized. We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence. Several experiments on SVMs and deep learning models validate the correctness of theoretical analysis and effectiveness of algorithms.
Association for the Advancement of Artificial Intelligence (AAAI)
Title: Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization
Description:
Averaging scheme has attracted extensive attention in deep learning as well as traditional machine learning.
It achieves theoretically optimal convergence and also improves the empirical model performance.
However, there is still a lack of sufficient convergence analysis for strongly convex optimization.
Typically, the convergence about the last iterate of gradient descent methods, which is referred to as individual convergence, fails to attain its optimality due to the existence of logarithmic factor.
In order to remove this factor, we first develop gradient descent averaging (GDA), which is a general projection-based dual averaging algorithm in the strongly convex setting.
We further present primal-dual averaging for strongly convex cases (SC-PDA), where primal and dual averaging schemes are simultaneously utilized.
We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA derives the optimal individual convergence.
Several experiments on SVMs and deep learning models validate the correctness of theoretical analysis and effectiveness of algorithms.
Related Results
Ostrowski-Type Fractional Integral Inequalities: A Survey
Ostrowski-Type Fractional Integral Inequalities: A Survey
This paper presents an extensive review of some recent results on fractional Ostrowski-type inequalities associated with a variety of convexities and different kinds of fractional ...
Primal-dual gap estimators for a posteriori error analysis of nonsmooth minimization problems
Primal-dual gap estimators for a posteriori error analysis of nonsmooth minimization problems
The primal-dual gap is a natural upper bound for the energy error and, for uniformly convex minimization problems, also for the error in the energy norm. This feature can be used t...
Solving Triangular Intuitionistic Fuzzy Matrix Game by Applying the Accuracy Function Method
Solving Triangular Intuitionistic Fuzzy Matrix Game by Applying the Accuracy Function Method
In this paper, the matrix game based on triangular intuitionistic fuzzy payoff is put forward. Then, we get a conclusion that the equilibrium solution of this game model is equival...
Primal and sub primal lamb carcass cuts from three different genetic groups finished in feedlot
Primal and sub primal lamb carcass cuts from three different genetic groups finished in feedlot
ABSTRACT The objective of this study was to evaluate the yield, morphometric traits, and the primal and sub primal cuts of Santa Inês lamb carcasses and their crossbreds with Dorpe...
When Does a Dual Matrix Have a Dual Generalized Inverse?
When Does a Dual Matrix Have a Dual Generalized Inverse?
This paper deals with the existence of various types of dual generalized inverses of dual matrices. New and foundational results on the necessary and sufficient conditions for vari...
Integrating space syntax with spatial interaction
Integrating space syntax with spatial interaction
AbstractIn this paper, we attempt to compare space syntax with spatial interaction. At one level, these two approaches to urban spatial structure are non-comparable. Space syntax i...
Convex hull peeling
Convex hull peeling
Enveloppes convexes pelées
Cette thèse porte sur la construction du convex hull peeling (qu’on pourrait traduire littéralement par enveloppe convexe pelée). Le conv...
Waterflooding Optimization Using Gradient Based Methods
Waterflooding Optimization Using Gradient Based Methods
Abstract
Finding the best strategy for production optimization is currently an important research task for closed-loop reservoir management. The closed-loop reservoi...

