Lagrange Multiplier && KKT Condition

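This article introduces the basic principle of the method of Lagrange multipliers and its application to constrained optimization problems, and then discusses how the method generalizes to the broader Karush-Kuhn-Tucker (KKT) conditions. The method of Lagrange multipliers handles equality constraints by introducing multipliers, while the KKT conditions apply to problems with inequality constraints.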

Wiki Ref: http://en.wikipedia.org/wiki/Lagrange_multipliers

General formulation: the weak Lagrangian principle

Denote the objective function by $f(\mathbf x)$ and let the constraints be given by $g_k(\mathbf x)=0$, perhaps by moving constants to the left, as in $h_k(\mathbf x)-c_k=g_k(\mathbf x)$. The domain of $f$ should be an open set containing all points satisfying the constraints. Furthermore, $f$ and the $g_k$ must have continuous first partial derivatives, and the gradients of the $g_k$ must not be zero on the domain.^[1] Now, define the Lagrangian, $\Lambda$, as

$\Lambda(\mathbf x, \boldsymbol\lambda) = f + \sum_k \lambda_k g_k.$

Here $k$ is an index for the variables and functions associated with a particular constraint $k$, and $\boldsymbol\lambda$ without a subscript denotes the vector with elements $\lambda_k$, which are taken to be independent variables.

Observe that both the optimization criteria and the constraints $g_k(\mathbf x)$ are compactly encoded as stationary points of the Lagrangian:

$\nabla_{\mathbf x} \Lambda = \mathbf{0}$ if and only if $\nabla_{\mathbf x} f = - \sum_k \lambda_k \nabla_{\mathbf x} g_k,$

where $\nabla_{\mathbf x}$ means taking the gradient only with respect to the elements of the vector $\mathbf x$, instead of all variables, and

$\nabla_{\boldsymbol\lambda} \Lambda = \mathbf{0}$ implies $g_k = 0.$

Collectively, the stationary points of the Lagrangian,

$\nabla \Lambda = \mathbf{0}$,

give a number of unique equations totaling the length of $\mathbf x$ plus the length of $\boldsymbol\lambda$. This often makes it possible to solve for every $x$ and $\lambda_k$ without inverting the $g_k$.^[1] For this reason, the Lagrange multiplier method can be useful in situations where it is easier to find derivatives of the constraint functions than to invert them.
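As a minimal sketch of this equation count, consider a hypothetical problem that is not part of the original text: optimize $f(x, y) = x^2 + y^2$ subject to $g(x, y) = x + y - 1 = 0$. The Lagrangian is $\Lambda = x^2 + y^2 + \lambda (x + y - 1)$, and setting $\nabla \Lambda = \mathbf{0}$ gives three equations for the three unknowns $(x, y, \lambda)$. Because this particular system happens to be linear, it can be solved exactly; the helper `solve_linear` below is an illustrative name, not a library function.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A z = b by Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(A)
    # Build the augmented matrix [A | b] with Fractions to avoid rounding.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot equals 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Unknowns are ordered (x, y, lam). Assumed example problem:
#   dΛ/dx   = 2x + lam   = 0
#   dΛ/dy   = 2y + lam   = 0
#   dΛ/dlam = x + y - 1  = 0   (this is just the constraint g = 0)
A = [[2, 0, 1],
     [0, 2, 1],
     [1, 1, 0]]
b = [0, 0, 1]

x, y, lam = solve_linear(A, b)
print(x, y, lam)  # prints: 1/2 1/2 -1
```

The stationary point is $x = y = 1/2$ with $\lambda = -1$, and the constraint was never solved for $y$ in terms of $x$. For a nonlinear $f$ or $g_k$ the same system would generally need a nonlinear solver instead.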

Often the Lagrange multipliers have an interpretation as some salient quantity of interest. To see why this might be the case, observe that:

$\frac{\partial \Lambda}{\partial g_k} = \lambda_k.$

So, $\lambda_k$ is the rate of change of the quantity being optimized as a function of the constraint variable. For example, in Lagrangian mechanics the equations of motion are derived by finding stationary points of the action, the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential, $F = -\nabla V$, can be interpreted as a Lagrange multiplier determining the change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In economics, the optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the value of relaxing a given constraint (e.g. through bribery or other means).
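The sensitivity reading can be checked numerically on a small assumed example that is not from the original text: optimize $f(x, y) = x^2 + y^2$ subject to $x + y = c$, written as $g = x + y - c = 0$. With the sign convention $\Lambda = f + \lambda g$ used here, $\partial \Lambda / \partial g = \lambda$, and since $g$ decreases as $c$ increases, the optimal value should change at rate $-\lambda$ with respect to the constant $c$. The function names below are illustrative only.

```python
from fractions import Fraction

def solve_stationary(c):
    """Stationary point of Λ = x² + y² + lam·(x + y − c), found by substitution."""
    c = Fraction(c)
    # dΛ/dx = 2x + lam = 0 and dΛ/dy = 2y + lam = 0 force x == y.
    # dΛ/dlam = x + y - c = 0 then gives x = y = c / 2.
    x = y = c / 2
    # Back-substitute into dΛ/dx = 0 to recover the multiplier.
    lam = -2 * x
    return x, y, lam

def optimal_value(c):
    """Value of f at the constrained stationary point for constant c."""
    x, y, _ = solve_stationary(c)
    return x * x + y * y

c = Fraction(1)
step = Fraction(1, 1000)

_, _, lam = solve_stationary(c)
# Central finite difference of the optimal value with respect to c.
slope = (optimal_value(c + step) - optimal_value(c - step)) / (2 * step)

print(lam, slope)  # prints: -1 1
```

Here the slope of the optimal value in $c$ equals $-\lambda$, so relaxing the constraint by a small amount $\delta$ changes the optimum by about $-\lambda\,\delta$. Under the other common convention, $\Lambda = f - \lambda g$, the sign would flip.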

The method of Lagrange multipliers is generalized by the Karush-Kuhn-Tucker conditions.

Wiki Ref: http://en.wikipedia.org/wiki/Karush-Kuhn-Tucker_conditions

Karush-Kuhn-Tucker conditions

In mathematics, the Karush-Kuhn-Tucker conditions (also known as the Kuhn-Tucker or the KKT conditions) are first-order necessary conditions for a solution of a nonlinear programming problem to be optimal, provided that some regularity conditions are satisfied. They generalize the method of Lagrange multipliers.

Let us consider the following nonlinear optimization problem:

$\min\limits_{x}\;\; f(x)$

$\mbox{subject to: }$

$g_i(x) \ge 0 , h_j(x) = 0$

where $f(x)$ is the function to be minimized, $g_i(x)\ (i = 1, \ldots, m)$ are the inequality constraints, $h_j(x)\ (j = 1, \ldots, l)$ are the equality constraints, and $m$ and $l$ are the numbers of inequality and equality constraints, respectively.
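To make the problem form concrete, here is a small assumed instance that is not from the original text: minimize $f(x) = (x - 2)^2$ subject to a single inequality constraint $g(x) = 1 - x \ge 0$ and no equality constraints ($m = 1$, $l = 0$). The sketch below uses a brute-force grid search purely for illustration; it does not apply the KKT conditions themselves.

```python
def f(x):
    """Objective to be minimized (assumed example)."""
    return (x - 2.0) ** 2

def g(x):
    """Inequality constraint in the form g(x) >= 0, equivalent to x <= 1."""
    return 1.0 - x

# Hypothetical search grid: points i/1000 for x in [-3, 3].
grid = [i / 1000 for i in range(-3000, 3001)]

# Keep only the points that satisfy the constraint, then pick the best one.
feasible = [x for x in grid if g(x) >= 0.0]
x_best = min(feasible, key=f)

print(x_best, g(x_best))
```

The unconstrained minimizer $x = 2$ violates $g(x) \ge 0$, so the constrained minimum sits on the boundary $x = 1$, where $g(x) = 0$. Inequality constraints that hold with equality at the solution are called active, and this is the situation that plain Lagrange multipliers for equality constraints do not cover on their own.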

The necessary conditions for this inequality-constrained problem were first published in the master's thesis of William Karush,^[1] although they became renowned after a seminal conference paper by Harold W. Kuhn and Albert W. Tucker.^[2]

Necessary conditions

Suppose that the objective function, i.e., the function to be minimized, is $f : \mathbb{R}^n \rightarrow \mathbb{R}$ and the constraint functions are $g_i : \mathbb{R}^n \rightarrow \mathbb{R}$ and $h_j : \mathbb{R}^n \rightarrow \mathbb{R}$. Further, suppose they are continuously differentiable at a point $x^*$. If $x^*$ is a local minimum, then there exist constants $\mu_i\ (i = 1, \ldots, m)$ and $\nu_j\ (j = 1, \ldots, l)$ such that^[3]