[NOTE in progress] Distributed Optimization and Statistical Learning via ADMM - Boyd

License: CC 4.0 BY-SA

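These are reading notes on "Distributed Optimization and Statistical Learning via ADMM" by Boyd et al. ADMM originated in the 1970s and is closely related to methods such as Douglas-Rachford splitting. With the arrival of the big data era, ADMM has attracted attention because it is well suited to solving large-scale optimization problems in a distributed way. ADMM combines dual decomposition with the augmented Lagrangian method and is particularly suited to parallel decomposition across features or examples. Even in purely serial mode, ADMM converges reasonably fast, typically reaching satisfactory accuracy within a few tens of iterations.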

Reading notes of the paper "Distributed Optimization and Statistical Learning via ADMM" by Boyd, Parikh, Chu, Peleato and Eckstein.

Introduction

ADMM: developed in the 70s, with roots in the 50s. It has been shown to be closely related to other methods such as Douglas-Rachford splitting, Spingarn's method of partial inverses, proximal methods, etc.
Why ADMM today: with the arrival of the big data era and the growing need for ML algorithms, ADMM has proved well suited to solving large-scale optimization problems in a distributed manner.
What big data brings us: with big data, simple methods can prove very effective at solving complex problems.
ADMM can be seen as a blend of dual decomposition (DD) and the augmented Lagrangian method. The latter is more robust and has better convergence properties, but it cannot be decomposed directly as in DD.
ADMM can decompose by examples or by features. [To be explored in later chapters]
Note that even when used in serial mode, ADMM is still comparable to other methods and often converges in a few tens of iterations.
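
For reference, the "blend" mentioned above can be written out as follows. This is the standard form given in the paper, for the problem of minimizing f(x) + g(z) subject to Ax + Bz = c; the superscript k is the iteration counter and rho > 0 is the penalty parameter.

```latex
% Augmented Lagrangian of: minimize f(x) + g(z)  subject to  Ax + Bz = c
\[
L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + \frac{\rho}{2} \| Ax + Bz - c \|_2^2
\]

% ADMM: alternate the minimization over x and z, then take a dual step
\begin{align*}
x^{k+1} &= \arg\min_x \; L_\rho(x, z^k, y^k) \\
z^{k+1} &= \arg\min_z \; L_\rho(x^{k+1}, z, y^k) \\
y^{k+1} &= y^k + \rho \, (A x^{k+1} + B z^{k+1} - c)
\end{align*}
```

The dual update is the one used by the augmented Lagrangian method, while splitting the minimization into an x-step and a z-step is what recovers the decomposability of DD.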

Precursors

What exactly is a conjugate function?
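
As a partial answer to the question above, here is the standard definition, a small worked case, and the reason it appears in this chapter (the dual function of an equality-constrained problem can be expressed through it). The quadratic example is chosen here only for illustration.

```latex
% Conjugate of f
\[
f^*(y) = \sup_x \left( y^T x - f(x) \right)
\]

% Worked case: f(x) = x^2 / 2 (scalar). The supremum is attained at x = y, so
\[
f^*(y) = y \cdot y - \tfrac{1}{2} y^2 = \tfrac{1}{2} y^2
\]

% Link to duality for: minimize f(x) subject to Ax = b
\[
g(y) = \inf_x \left( f(x) + y^T (Ax - b) \right) = -f^*(-A^T y) - b^T y
\]
```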
Dual ascent and dual subgradient methods: they converge if the step size is chosen appropriately and some other assumptions hold.
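
A minimal sketch of dual ascent on a toy problem may help here. The problem (minimize 0.5*(x1^2 + x2^2) subject to x1 + x2 = 2), the function name `dual_ascent`, and the step size `alpha` are all assumptions chosen for illustration; the objective is strictly convex, so the x-update has a closed form.

```python
def dual_ascent(alpha=0.25, iters=50):
    """Dual ascent for: minimize 0.5*(x1^2 + x2^2) subject to x1 + x2 = 2.

    Toy problem chosen for illustration only.
    """
    y = 0.0  # dual variable for the single equality constraint
    x1 = x2 = 0.0
    for _ in range(iters):
        # x-update: minimize L(x, y) = 0.5*(x1^2 + x2^2) + y*(x1 + x2 - 2).
        # Setting the gradient to zero gives x_i = -y.
        x1, x2 = -y, -y
        # y-update: gradient ascent on the dual function; its gradient is
        # the constraint residual evaluated at the new x.
        residual = x1 + x2 - 2.0
        y += alpha * residual
    return (x1, x2), y


x_da, y_da = dual_ascent()
print(x_da, y_da)  # close to (1, 1) and -1
```

For this particular problem the dual iterate follows y <- (1 - 2*alpha)*y - 2*alpha, so it converges only for 0 < alpha < 1, which is one concrete instance of "the step size is chosen appropriately".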
Why augmented Lagrangian:
- More robust, fewer assumptions (no need for strict convexity or finiteness of f): in practice some convergence assumptions of dual ascent are not met. For example, the objective may be affine (e.g. min x s.t. x ≥ 10), in which case the Lagrangian is unbounded below in x for most values of the dual variable and the x-update fails.
- For equality constraints, the augmented version converges faster. This can be understood from the penalty method's point of view.
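
The robustness point can be illustrated with a small sketch of the augmented Lagrangian method (method of multipliers). As an assumption for simplicity, it uses an equality-constrained variant of the example above: minimize x subject to x = 10. The plain Lagrangian x + y*(x - 10) = (1 + y)*x - 10*y is unbounded below in x unless y = -1, so the dual ascent x-update is undefined; the quadratic penalty term makes the x-update well defined for any y. The function name and the value of `rho` are hypothetical.

```python
def method_of_multipliers(rho=1.0, iters=5):
    """Augmented Lagrangian method for: minimize x subject to x = 10.

    L_rho(x, y) = x + y*(x - 10) + (rho/2)*(x - 10)**2
    Toy problem chosen for illustration only.
    """
    y = 0.0
    history = []
    for _ in range(iters):
        # x-update: d/dx L_rho = 1 + y + rho*(x - 10) = 0
        x = 10.0 - (1.0 + y) / rho
        # y-update: dual step with step size rho
        y = y + rho * (x - 10.0)
        history.append((x, y))
    return history


mm_history = method_of_multipliers()
print(mm_history[-1])
```

On this toy problem the multiplier reaches -1 after the first iteration and x reaches 10 on the second, whatever positive `rho` is used.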
Dual decomposition: relax the coupling constraints so that the problem can be decomposed. This naturally involves parallel computation.
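
A minimal sketch of dual decomposition, under assumed toy data: minimize the sum of 0.5*(x_i - t_i)^2 subject to the coupling constraint sum(x_i) = b. Once the constraint is relaxed with a multiplier y, the Lagrangian separates into one term per x_i, so each x_i-update depends only on its own data and on y. The function name, the `targets` values, and `alpha` are hypothetical.

```python
def dual_decomposition(targets, b, alpha=0.2, iters=100):
    """Dual decomposition for: minimize sum_i 0.5*(x_i - t_i)^2
    subject to sum_i x_i = b. Toy problem chosen for illustration only.
    """
    y = 0.0  # multiplier of the relaxed coupling constraint
    xs = [0.0] * len(targets)
    for _ in range(iters):
        # Local x-updates: each minimizes 0.5*(x_i - t_i)^2 + y*x_i, giving
        # x_i = t_i - y. They are independent of one another, so each could
        # run on a separate worker; a list comprehension stands in for that.
        xs = [t - y for t in targets]
        # Gather step: the coupling residual drives the dual update.
        y += alpha * (sum(xs) - b)
    return xs, y


xs_dd, y_dd = dual_decomposition([1.0, 2.0, 3.0], b=3.0)
print(xs_dd, y_dd)  # close to [0, 1, 2] and 1
```

Each iteration is a broadcast of y, a set of independent local solves, and a gather of the residual. For this toy problem with N terms the dual iterate converges when 0 < alpha < 2/N.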
The rho in the augmented Lagrangian is actually the ste