Gradient Descent Averaging and Primal-dual Averaging for Strongly Convex Optimization

Averaging schemes have attracted extensive attention in deep learning as well as in traditional machine learning. They achieve theoretically optimal convergence and also improve empirical model performance. However, convergence analysis for strongly convex optimization remains insufficient. Typically, the convergence of the last iterate of gradient descent methods, referred to as individual convergence, fails to attain the optimal rate because of a logarithmic factor. To remove this factor, we first develop gradient descent averaging (GDA), a general projection-based dual averaging algorithm for the strongly convex setting. We further present primal-dual averaging for strongly convex cases (SC-PDA), in which primal and dual averaging schemes are used simultaneously. We prove that GDA yields the optimal convergence rate in terms of output averaging, while SC-PDA attains the optimal individual convergence rate. Several experiments on SVMs and deep learning models validate the correctness of the theoretical analysis and the effectiveness of the algorithms.
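The abstract does not state the GDA or SC-PDA update rules, so the sketch below is not the paper's method. As an illustration only, it runs plain subgradient descent on an L2-regularized hinge loss (a strongly convex SVM objective) and returns both the final iterate and a weighted average of the iterates, which are the two kinds of output the abstract contrasts. The step size 2/(λ(t+1)), the averaging weights proportional to t, the synthetic data, and every function name here are assumptions chosen for the example.

```python
import random


def make_data(n=100, seed=0):
    """Synthetic 2-D data (an assumption for illustration), separable on x[0]."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        x = (y * (0.5 + 0.5 * rng.random()), rng.uniform(-1.0, 1.0))
        data.append((x, y))
    return data


def objective(w, data, lam):
    """(lam/2)*||w||^2 + mean hinge loss; strongly convex with modulus lam."""
    reg = 0.5 * lam * sum(wi * wi for wi in w)
    hinge = sum(
        max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))
        for x, y in data
    )
    return reg + hinge / len(data)


def subgradient(w, data, lam):
    """One subgradient of the objective at w."""
    g = [lam * wi for wi in w]
    for x, y in data:
        if y * sum(wi * xi for wi, xi in zip(w, x)) < 1.0:
            for j, xj in enumerate(x):
                g[j] -= y * xj / len(data)
    return g


def run(data, lam=0.1, steps=2000):
    """Return (final iterate, weighted average of iterates w_1..w_steps)."""
    dim = len(data[0][0])
    w = [0.0] * dim
    w_avg = [0.0] * dim
    weight_sum = 0.0
    for t in range(1, steps + 1):
        # Fold the current iterate into a running mean with weight t.
        weight_sum += t
        w_avg = [a + (t / weight_sum) * (wi - a) for a, wi in zip(w_avg, w)]
        # Subgradient step with the assumed step size 2 / (lam * (t + 1)).
        eta = 2.0 / (lam * (t + 1))
        g = subgradient(w, data, lam)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w, w_avg


if __name__ == "__main__":
    data = make_data()
    w_last, w_avg = run(data)
    print("objective at final iterate:  ", objective(w_last, data, 0.1))
    print("objective at weighted average:", objective(w_avg, data, 0.1))
```

Printing both objective values for increasing `steps` gives a rough feel for how the final iterate and the averaged output behave. It says nothing about the rates proved in the paper.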

Association for the Advancement of Artificial Intelligence (AAAI)

Wei Tao, Wei Li, Zhisong Pan, Qing Tao

Proceedings of the AAAI Conference on Artificial Intelligence

2022


