Worksheet 10-2: KKT Conditions (with Solutions)
Download: CMSE382-WS10_2.pdf, CMSE382-WS10_2-Soln.pdf
Warning
This is an AI-generated transcript of the worksheet and may contain errors or inaccuracies. Please refer to the original course materials for authoritative content.
Worksheet 10-2: Q1
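Consider the following problem:
\[\begin{split} \begin{align*} \min \quad & (x-0.1)^2 + y^2 \\ \text{s.t.} \quad & x + y \leq 1 \\ & x - 2y \leq 0 \end{align*}\end{split}\]
We will find the KKT conditions for this problem.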
(a) Write the Lagrangian function of this problem.
Solution
\[ L(x,y,\lambda_1,\lambda_2) = (x-0.1)^2 + y^2 + \lambda_1(x+y-1) + \lambda_2(x-2y)\]
(b) Write down the constraints from the KKT condition (stationarity condition) for the problem.
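Solution
Setting the partial derivatives of the Lagrangian with respect to \(x\) and \(y\) to zero, the resulting equations are
\[\begin{split} \begin{gather*} 2(x-0.1) + \lambda_1 + \lambda_2 = 0 \\ 2y + \lambda_1 - 2\lambda_2 = 0 \end{gather*}\end{split}\]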
(c) Write down the complementary slackness conditions for this problem.
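Solution
\[\begin{split} \begin{gather*} \lambda_1(x+y-1) = 0 \\ \lambda_2(x-2y) = 0 \end{gather*}\end{split}\]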
(d) Write down the feasibility constraints (including the non-negativity constraints on the Lagrange multipliers) for this problem.
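Solution
\[\begin{split} \begin{gather*} x+y-1 \leq 0 \\ x-2y \leq 0 \\ \lambda_1 \geq 0 \\ \lambda_2 \geq 0 \end{gather*}\end{split}\]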
(e) Count the equations/inequalities found in the previous three parts. You should have 8 of them.
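The conditions from parts (b) through (d) can be collected into a small checker. This is a minimal sketch in Python, assuming the problem is to minimize \((x-0.1)^2 + y^2\) subject to \(x+y \leq 1\) and \(x-2y \leq 0\) (the form implied by the expressions used in part (g)). The names `kkt_residuals` and `is_kkt_point` and the tolerance `tol` are illustrative choices, not part of the worksheet.

```python
def kkt_residuals(x, y, lam1, lam2):
    """Evaluate the 8 KKT quantities for the (assumed) Q1 problem."""
    g1 = x + y - 1      # first inequality constraint, must be <= 0
    g2 = x - 2 * y      # second inequality constraint, must be <= 0
    return {
        # 2 stationarity equations (must equal 0)
        "stationarity": (2 * (x - 0.1) + lam1 + lam2, 2 * y + lam1 - 2 * lam2),
        # 2 complementary slackness equations (must equal 0)
        "slackness": (lam1 * g1, lam2 * g2),
        # 2 primal feasibility inequalities (must be <= 0)
        "primal": (g1, g2),
        # 2 dual feasibility inequalities (must be >= 0)
        "dual": (lam1, lam2),
    }


def is_kkt_point(x, y, lam1, lam2, tol=1e-9):
    """Return True if all 8 conditions hold up to the tolerance."""
    r = kkt_residuals(x, y, lam1, lam2)
    stationary = all(abs(v) <= tol for v in r["stationarity"])
    slack = all(abs(v) <= tol for v in r["slackness"])
    primal = all(v <= tol for v in r["primal"])
    dual = all(v >= -tol for v in r["dual"])
    return stationary and slack and primal and dual
```

For example, `is_kkt_point(0.08, 0.04, 0.0, 0.04)` returns `True`, while `is_kkt_point(0.1, 0.0, 0.0, 0.0)` returns `False` because \(x - 2y = 0.1 > 0\) violates the second constraint.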
(f) Do you expect the local optima (if any) to be global optima? Why?
Solution
Yes. The objective function is convex and the constraints are linear (hence convex), so the problem is convex. Any point satisfying the KKT conditions is therefore a global optimum.
(g) In theory, you could now solve for the unknowns \(x,y,\lambda_1,\lambda_2\) using the equations/inequalities you have. This turns into checking cases:
Case 1: \(\lambda_1 = \lambda_2 = 0\).
Case 2: \(\lambda_1 = 0\), \(\lambda_2 > 0\).
Case 3: \(\lambda_1 > 0\), \(\lambda_2 = 0\).
Case 4: \(\lambda_1 > 0\), \(\lambda_2 > 0\).
In this example, Cases 3 and 4 do not lead to KKT points. Find the KKT point(s) for Cases 1 and 2 if they exist.
Solution
Case 1
We are assuming \(\lambda_1 = \lambda_2 = 0\), so the stationarity conditions become
\[\begin{split} \begin{gather*} 2(x-0.1) = 0 \\ 2y = 0 \end{gather*}\end{split}\]
Solving these equations gives us \(x = 0.1\), \(y = 0\).
Now we need to check if this solution is feasible. We do this by checking the feasibility constraints:
\[\begin{split} \begin{gather*} x+y - 1 = 0.1 + 0 - 1 = -0.9 \leq 0 \quad \checkmark\\ x-2y = 0.1 - 2 \cdot 0 = 0.1 \leq 0\quad \times\\ \lambda_1 = 0 \geq 0\quad \checkmark \\ \lambda_2 = 0 \geq 0 \quad \checkmark \end{gather*}\end{split}\]
Since the second constraint is not satisfied, this solution is not feasible, so it cannot be a KKT point.
Case 2
We are assuming \(\lambda_1 = 0\), \(\lambda_2 > 0\). So the stationarity conditions become
\[\begin{split} \begin{gather*} 2(x-0.1) + \lambda_2 = 0 \\ 2y - 2\lambda_2 = 0 \end{gather*}\end{split}\]
The second equation gives \(\lambda_2 = y\), so we can eliminate \(\lambda_2\) from these equations to obtain
\[ 2(x-0.1) + y = 0\]
Since \(\lambda_2 > 0\) (specifically \(\lambda_2 \neq 0\)), the complementary slackness condition \(\lambda_2(x-2y) = 0\) implies that \(x-2y = 0\). So \(y = \frac{x}{2}\).
Solving this equation together with \(2(x-0.1) + y = 0\) gives us \(x = 0.08\), \(y = 0.04\).
Using \(2y-2\lambda_2 = 0\), we have that \(\lambda_2 = 0.04\).
Now we need to check if this solution is feasible. We do this by checking the feasibility constraints:
\[\begin{split} \begin{gather*} x+y - 1 = 0.08 + 0.04 - 1 = -0.88 \leq 0 \quad \checkmark\\ x-2y = 0.08 - 2 \cdot 0.04 = 0 \leq 0\quad \checkmark\\ \lambda_1 = 0 \geq 0\quad \checkmark \\ \lambda_2 = 0.04 > 0 \geq 0 \quad \checkmark \end{gather*}\end{split}\]
All conditions hold, so \((x,y) = (0.08, 0.04)\) with \(\lambda_1 = 0\), \(\lambda_2 = 0.04\) is a KKT point.
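For completeness, here is a worked check of the claim in part (g) that Cases 3 and 4 produce no KKT point. It assumes the full stationarity equations are \(2(x-0.1)+\lambda_1+\lambda_2=0\) and \(2y+\lambda_1-2\lambda_2=0\), which is the form consistent with the reduced equations used in Cases 1 and 2.

Case 3 (\(\lambda_1 > 0\), \(\lambda_2 = 0\)): complementary slackness forces \(x+y=1\), and stationarity reduces to
\[\begin{split} \begin{gather*} 2(x-0.1) + \lambda_1 = 0 \\ 2y + \lambda_1 = 0 \end{gather*}\end{split}\]
Subtracting the two equations gives \(x - y = 0.1\). Together with \(x+y=1\) this yields \(x = 0.55\), \(y = 0.45\), and then \(\lambda_1 = -2y = -0.9 < 0\), which contradicts \(\lambda_1 > 0\).

Case 4 (\(\lambda_1 > 0\), \(\lambda_2 > 0\)): both constraints are active, so \(x+y=1\) and \(x=2y\), giving \(x = \tfrac{2}{3}\), \(y = \tfrac{1}{3}\). Stationarity then reads
\[\begin{split} \begin{gather*} \lambda_1 + \lambda_2 = -2\left(\tfrac{2}{3} - \tfrac{1}{10}\right) = -\tfrac{17}{15} \\ \lambda_1 - 2\lambda_2 = -\tfrac{2}{3} \end{gather*}\end{split}\]
Subtracting the second equation from the first gives \(3\lambda_2 = -\tfrac{7}{15}\), so \(\lambda_2 = -\tfrac{7}{45}\) and \(\lambda_1 = -\tfrac{44}{45}\). Both multipliers are negative, so this case also fails.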
(h) What can you conclude about the optimal solution of this problem?
Solution
We have a unique KKT point, and the problem is convex. This means that the minimum is achieved at \((0.08, 0.04)\).
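As an independent sanity check of this conclusion, the problem can be brute-forced on a grid. The sketch below assumes the objective \(f(x,y) = (x-0.1)^2 + y^2\) with constraints \(x+y \leq 1\) and \(x-2y \leq 0\). The function `grid_minimizer` and its parameters are illustrative names. A grid search is only a rough check, not a proof of optimality.

```python
def f(x, y):
    # Objective assumed from the stationarity equations in part (g).
    return (x - 0.1) ** 2 + y ** 2


def grid_minimizer(steps_per_unit=100, half_width=1):
    """Brute-force search over a uniform grid on [-half_width, half_width]^2."""
    n = steps_per_unit * half_width
    best_point, best_value = None, float("inf")
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            # Integer tests avoid round-off on the constraint boundaries:
            #   x + y <= 1   <=>  i + j <= steps_per_unit
            #   x - 2y <= 0  <=>  i - 2j <= 0
            if i + j > steps_per_unit or i - 2 * j > 0:
                continue
            x, y = i / steps_per_unit, j / steps_per_unit
            value = f(x, y)
            if value < best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


point, value = grid_minimizer()
print(point, value)  # the point is (0.08, 0.04); the value is approximately 0.002
```

With a grid spacing of 0.01 the KKT point lies exactly on the grid, so the search returns it directly.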
(i) Which of the constraints are active at the optimal solution?
Solution
The second constraint, \(x-2y \leq 0\), is active, since \(x-2y = 0.08 - 2\cdot 0.04 = 0\) means it holds with equality at the optimal solution.
The first constraint, \(x+y\leq 1\), is not active, since \(x+y = 0.08 + 0.04 = 0.12 < 1\), so it does not hold with equality at the optimal solution.
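The same test can be written as a short helper that reports which constraints hold with equality at a given point. This is a sketch only: `active_constraints` and `tol` are hypothetical names, and the two constraints are hard-coded from this problem.

```python
def active_constraints(x, y, tol=1e-9):
    """Names of the inequality constraints that hold with equality at (x, y)."""
    constraints = {
        "x + y <= 1": x + y - 1,     # g1(x, y) <= 0
        "x - 2y <= 0": x - 2 * y,    # g2(x, y) <= 0
    }
    # A constraint is active when its value g(x, y) is (numerically) zero.
    return [name for name, g in constraints.items() if abs(g) <= tol]


print(active_constraints(0.08, 0.04))  # ['x - 2y <= 0']
```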
(j) Take a look at this desmos plot. Visually confirm that the solution you found above is indeed the optimal solution. Then turn on the plot of \(g(x,y)\), which is a copy of \(f(x,y)\). Change the constraints for \(g\) (the portions inside \(\{\cdots\}\)). Which constraint can be changed without changing the optimal solution? Which constraint can be changed to change the optimal solution? What does this have to do with the previous question?
Solution
You can change the first constraint \(x+y \leq 1\) without changing the optimal solution, as long as the modified constraint remains inactive at the optimal solution (that is, \(x+y = 0.12\) stays strictly below the new bound). For example, if we change the first constraint to \(x+y \leq 0.5\), the optimal solution remains unchanged.
If we change the second constraint \(x-2y \leq 0\) to \(x-2y \leq -0.01\), then the optimal solution changes. The original constraint is active at \((0.08, 0.04)\), so tightening it makes that point infeasible, and the minimizer moves to the new boundary.
This can be seen in this version of the desmos plot. The function \(g(x,y)\) has the \(x-2y \leq 0\) constraint changed and the optimal solution changes. The function \(h(x,y)\) has the \(x+y \leq 1\) constraint changed and the optimal solution does not change.
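The effect of changing each right-hand side can also be checked numerically. The sketch below solves the Case 2 system (only the second constraint active) in closed form for adjustable right-hand sides, assuming the stationarity equations \(2(x-0.1)+\lambda_2 = 0\) and \(2y - 2\lambda_2 = 0\). The function name `case2_solution` and the parameters `rhs1`, `rhs2` are hypothetical, and the formula is only valid while the Case 2 assumptions hold, which the function checks.

```python
def case2_solution(rhs1=1.0, rhs2=0.0):
    """KKT point when only x - 2y <= rhs2 is active (lambda1 = 0, lambda2 > 0).

    Solves 2(x - 0.1) + lambda2 = 0, 2y - 2*lambda2 = 0, x - 2y = rhs2,
    then verifies the assumptions of this case.
    """
    # lambda2 = y and x = 2y + rhs2, so 2(2y + rhs2 - 0.1) + y = 0.
    y = (0.2 - 2 * rhs2) / 5
    x = 2 * y + rhs2
    lam2 = y
    if lam2 <= 0 or x + y > rhs1:
        raise ValueError("Case 2 assumptions fail for these right-hand sides")
    return x, y, lam2


print(case2_solution())               # original problem
print(case2_solution(rhs1=0.5))       # inactive constraint changed
print(case2_solution(rhs2=-0.01))     # active constraint changed
```

Under these assumptions, the first two calls give the same point, approximately \((0.08, 0.04)\), while the third moves to approximately \((0.078, 0.044)\).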
Worksheet 10-2: Q2
Consider the optimization problem \(\min\{\mathbf{x}^T Q \mathbf{x} + 2 \mathbf{c}^T \mathbf{x}: A \mathbf{x}=\mathbf{b}\}\), where \(Q \in \mathbb{R}^{n\times n}\) is a positive definite matrix, \(\mathbf{c} \in \mathbb{R}^n\), \(\mathbf{b} \in \mathbb{R}^m\), and \(A\) is an \(m \times n\) matrix with linearly independent rows. This is a convex optimization problem. Answer the following:
(a) Find the Lagrangian as a single matrix multiplication equation.
Solution
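The general form of the Lagrangian, for inequality constraints \(g_i(\mathbf{x}) \leq 0\) and equality constraints \(h_j(\mathbf{x}) = 0\), is
\[ L(\mathbf{x},\boldsymbol{\lambda},\boldsymbol{\mu}) = f(\mathbf{x}) + \sum_{i} \lambda_i g_i(\mathbf{x}) + \sum_{j} \mu_j h_j(\mathbf{x})\]
For a linearly constrained system, with inequality constraints \(A\mathbf{x} \leq \mathbf{b}\) and equality constraints \(C\mathbf{x} = \mathbf{d}\) (here \(A\) and \(\mathbf{b}\) denote the inequality data of the general form), this equation simplifies to
\[ L(\mathbf{x},\boldsymbol{\lambda},\boldsymbol{\mu}) = f(\mathbf{x}) + \boldsymbol{\lambda}^{\top}(A\mathbf{x} - \mathbf{b}) + \boldsymbol{\mu}^{\top}(C\mathbf{x} - \mathbf{d})\]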
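For this problem we only have a set of \(m\) equality constraints, \(A\mathbf{x} = \mathbf{b}\), so the Lagrangian for this problem reads
\[ L(\mathbf{x},\boldsymbol{\mu}) = \mathbf{x}^{\top} Q \mathbf{x} + 2\mathbf{c}^{\top}\mathbf{x} + \boldsymbol{\mu}^{\top}(A\mathbf{x} - \mathbf{b})\]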
Note how we adapted the general form to our problem by noting that the matrix \(C\) and \(\mathbf{d}\) in the general form are replaced by the matrix \(A\) and vector \(\mathbf{b}\) from our problem. We have no inequality constraints, so the terms containing \(\boldsymbol{\lambda}\) are omitted.
(b) Noting that we treat vectors as column vectors, write down the dimensions of all the matrices and vectors in the Lagrangian above.
Solution
\(\mathbf{x}\) is \(n \times 1\).
\(Q\) is \(n \times n\).
\(\mathbf{c}\) is \(n \times 1\).
\(A\) is \(m \times n\).
\(\mathbf{b}\) is \(m \times 1\).
\(\boldsymbol{\mu}\) is \(m \times 1\).
(c) For a column vector \(\mathbf{b}\), \(\nabla_{\mathbf{x}}(\mathbf{b}^\top \mathbf{x}) = \mathbf{b}\). Use this to determine \(\nabla_{\mathbf{x}} (\boldsymbol{\mu}^{\top}A \mathbf{x})\).
Solution
\(\boldsymbol{\mu}^{\top} A\) is a row vector: \(\boldsymbol{\mu}\) is \(m \times 1\) and \(A\) is \(m \times n\), so \(\boldsymbol{\mu}^{\top} A\) is \(1 \times n\).
So we can write \(\boldsymbol{\mu}^{\top}A \mathbf{x}\) as \(\mathbf{b}^\top \mathbf{x}\) where \(\mathbf{b}^\top = \boldsymbol{\mu}^{\top}A\).
Therefore,
\[ \nabla_{\mathbf{x}}(\boldsymbol{\mu}^{\top}A\mathbf{x}) = (\boldsymbol{\mu}^{\top} A)^{\top} = A^{\top}\boldsymbol{\mu}.\]
(d) Using the same as above, what is \(\nabla_{\mathbf{x}} (2\mathbf{c}^{\top}\mathbf{x})\)?
Solution
\(2\mathbf{c}^{\top}\mathbf{x}\) is a scalar, since \(\mathbf{c}^T\) is \(1 \times n\) and \(\mathbf{x}\) is \(n \times 1\).
We can write \(2\mathbf{c}^{\top}\mathbf{x}\) as \(\mathbf{b}^\top \mathbf{x}\) where \(\mathbf{b}^\top = 2\mathbf{c}^{\top}\).
Therefore, \(\nabla_{\mathbf{x}} (2\mathbf{c}^{\top}\mathbf{x}) = (2\mathbf{c}^{\top})^\top = 2 \mathbf{c}\).
(e) Since \(Q\) is symmetric, \(\nabla_{\mathbf{x}} (\mathbf{x}^{\top}Q\mathbf{x}) = 2Q\mathbf{x}\). What is \(\nabla_{\mathbf{x}} L(\mathbf{x},\boldsymbol{\mu})\)?
Solution
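Using the parts above (any term that does not involve \(\mathbf{x}\) has zero gradient), we have
\[ \nabla_{\mathbf{x}} L(\mathbf{x},\boldsymbol{\mu}) = 2Q\mathbf{x} + 2\mathbf{c} + A^{\top}\boldsymbol{\mu}\]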
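The gradient formula \(\nabla_{\mathbf{x}} L = 2Q\mathbf{x} + 2\mathbf{c} + A^{\top}\boldsymbol{\mu}\) can be sanity-checked against central finite differences. The following is a minimal NumPy sketch on random data, assuming the Lagrangian \(L(\mathbf{x},\boldsymbol{\mu}) = \mathbf{x}^{\top}Q\mathbf{x} + 2\mathbf{c}^{\top}\mathbf{x} + \boldsymbol{\mu}^{\top}(A\mathbf{x}-\mathbf{b})\). The sizes, the seed, and the helper `central_difference` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)        # symmetric positive definite by construction
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
mu = rng.standard_normal(m)


def lagrangian(x):
    return x @ Q @ x + 2 * c @ x + mu @ (A @ x - b)


def grad_lagrangian(x):
    # Analytic gradient from parts (c), (d), and (e).
    return 2 * Q @ x + 2 * c + A.T @ mu


def central_difference(func, x, h=1e-6):
    """Approximate the gradient of a scalar function one coordinate at a time."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad


x0 = rng.standard_normal(n)
max_error = np.max(np.abs(central_difference(lagrangian, x0) - grad_lagrangian(x0)))
print(max_error)  # should be tiny; only round-off remains for a quadratic
```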
(f) Write down the KKT conditions (stationarity and feasibility) and, to simplify later steps, substitute \(\boldsymbol{\mu} = 2\boldsymbol{\gamma}\). What are the unknowns in these equations?
Solution
The KKT conditions are:
The original stationarity condition:
\[\nabla_{\mathbf{x}} L(\mathbf{x},\boldsymbol{\mu}) = 2Q\mathbf{x} + 2 \mathbf{c} + A^{\top} \boldsymbol{\mu} = \mathbf{0}\]
The modified stationarity condition after scaling the Lagrange multipliers:
\[\nabla_{\mathbf{x}} L(\mathbf{x},\boldsymbol{\mu}) = 2(Q\mathbf{x} + \mathbf{c} + A^{\top} \boldsymbol{\gamma}) = \mathbf{0}\]
where \(\boldsymbol{\mu} = 2 \boldsymbol{\gamma}\).
Feasibility: \(A \mathbf{x}=\mathbf{b}\).
The unknowns in these equations are \(\mathbf{x}\), which contains \(n\) elements, and the vector \(\boldsymbol{\gamma}\), which contains \(m\) elements (the same as the number of rows of the matrix \(A\)). So the total number of unknowns is \(m+n\).
(g) Solve the stationarity condition for \(\mathbf{x}\).
Solution
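Solving the stationarity condition \(Q\mathbf{x} + \mathbf{c} + A^{\top}\boldsymbol{\gamma} = \mathbf{0}\) for \(\mathbf{x}\) gives (\(Q\) is positive definite, hence invertible)
\[ \mathbf{x} = -Q^{-1}(\mathbf{c} + A^{\top}\boldsymbol{\gamma})\]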
(h) Substitute the expression for \(\mathbf{x}\) into the feasibility constraint, and solve for any other unknown.
Solution
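Substituting the expression \(\mathbf{x} = -Q^{-1}(\mathbf{c} + A^{\top}\boldsymbol{\gamma})\) into the feasibility constraint \(A\mathbf{x} = \mathbf{b}\), we obtain
\[ -AQ^{-1}(\mathbf{c} + A^{\top}\boldsymbol{\gamma}) = \mathbf{b}\]
which we can solve for \(\boldsymbol{\gamma}\), provided \(AQ^{-1}A^{\top}\) is invertible (this holds when the rows of \(A\) are linearly independent):
\[ \boldsymbol{\gamma} = -(AQ^{-1}A^{\top})^{-1}(AQ^{-1}\mathbf{c} + \mathbf{b})\]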
(i) What is the stationary point in terms of \(\mathbf{c}\), \(Q\), \(A\), and \(\mathbf{b}\)?
Solution
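The stationary point is the one we found in (g), with the \(\boldsymbol{\gamma}\) value found in (h):
\[ \mathbf{x}^* = -Q^{-1}(\mathbf{c} + A^{\top}\boldsymbol{\gamma}), \quad \boldsymbol{\gamma} = -(AQ^{-1}A^{\top})^{-1}(AQ^{-1}\mathbf{c} + \mathbf{b})\]
that is,
\[ \mathbf{x}^* = -Q^{-1}\mathbf{c} + Q^{-1}A^{\top}(AQ^{-1}A^{\top})^{-1}(AQ^{-1}\mathbf{c} + \mathbf{b})\]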
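The closed form can be cross-checked numerically. The sketch below computes \(\boldsymbol{\gamma} = -(AQ^{-1}A^{\top})^{-1}(AQ^{-1}\mathbf{c} + \mathbf{b})\) and \(\mathbf{x} = -Q^{-1}(\mathbf{c} + A^{\top}\boldsymbol{\gamma})\), then compares them with a direct solve of the stacked KKT system \(Q\mathbf{x} + A^{\top}\boldsymbol{\gamma} = -\mathbf{c}\), \(A\mathbf{x} = \mathbf{b}\). The function names and the small test data are illustrative assumptions, and the code assumes \(Q\) is positive definite and \(A\) has linearly independent rows.

```python
import numpy as np


def solve_equality_qp(Q, c, A, b):
    """Closed-form stationary point of min x^T Q x + 2 c^T x  s.t.  A x = b."""
    Qinv_c = np.linalg.solve(Q, c)        # Q^{-1} c, without forming Q^{-1}
    Qinv_At = np.linalg.solve(Q, A.T)     # Q^{-1} A^T
    S = A @ Qinv_At                       # A Q^{-1} A^T  (m x m)
    gamma = -np.linalg.solve(S, A @ Qinv_c + b)
    x = -Qinv_c - Qinv_At @ gamma         # x = -Q^{-1}(c + A^T gamma)
    return x, gamma


def solve_kkt_system(Q, c, A, b):
    """Solve the stacked linear system [[Q, A^T], [A, 0]] [x; gamma] = [-c; b]."""
    m, n = A.shape
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]


# Small illustrative instance (n = 2, m = 1).
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x, gamma = solve_equality_qp(Q, c, A, b)
x_kkt, gamma_kkt = solve_kkt_system(Q, c, A, b)
print(x, gamma)
print(np.allclose(x, x_kkt), np.allclose(A @ x, b))
```

Using `np.linalg.solve` instead of explicitly inverting \(Q\) is a common numerical choice. The formulas are otherwise the ones derived in parts (g) and (h).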
(j) Is the stationary point optimal? Explain.
Solution
Yes. The matrix \(Q\) in the quadratic form is positive definite, so the objective function is strictly convex. The constraint is affine, so convex. Therefore, this is a convex optimization problem, and KKT points are global optima. In this case, we have a unique KKT point so it is the unique global optimum.
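A numerical illustration of this argument: every feasible point can be written as \(\mathbf{x}^* + N\mathbf{z}\), where the columns of \(N\) span the null space of \(A\), so optimality means the objective never decreases along such directions. The sketch below tests this on random data. The sizes, the seed, and the use of the SVD to build `N` are illustrative assumptions, and random sampling is evidence, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)            # symmetric positive definite by construction
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))    # random rows, assumed linearly independent
b = rng.standard_normal(m)


def objective(x):
    return x @ Q @ x + 2 * c @ x


# KKT point from the stacked system Q x + A^T gamma = -c, A x = b.
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
x_star = np.linalg.solve(K, np.concatenate([-c, b]))[:n]

# Null-space basis of A: the last n - m right singular vectors (full row rank assumed).
_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T

# Compare the objective at random feasible points with the value at x_star.
gaps = []
for _ in range(1000):
    z = rng.standard_normal(n - m)
    x_feasible = x_star + N @ z    # still satisfies A x = b
    gaps.append(objective(x_feasible) - objective(x_star))
min_gap = min(gaps)
print(min_gap > 0)
```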