Worksheet Chapter 10, Part 3#

KKT Conditions for Linearly Constrained Problems#

Notebook Info#

Topics#

This notebook contains three coding examples designed to build intuition for KKT conditions in linearly constrained optimization:

  1. Projection onto an affine set

  2. Equality-constrained quadratic

  3. Inequality-constrained QP active-set exploration

Learning goals#

  • Compute and interpret stationarity, primal feasibility, dual feasibility, and complementarity.

  • See KKT multipliers as measurable quantities (not just symbols).

  • Connect active constraints to which multipliers are nonzero.

import itertools
import numpy as np
import matplotlib.pyplot as plt
import cvxpy as cp

%matplotlib inline
np.set_printoptions(precision=4, suppress=True)

1. Orthogonal Projection Onto an Affine Space#

Previously, we studied orthogonal projection (see lectures 8-2 and 9-1), where the goal is to find the closest point \(\mathbf{y} \in C\) to a given \(\mathbf{x}\), denoted

\[ P_C(\mathbf{x}) = \arg \min_{\mathbf{y} \in C} \|\mathbf{y}- \mathbf{x}\|^2. \]

In the previous lectures, we found formulae for the projection operator in the case of

\[\mathbb{R}_+^n = \{ \mathbf{y} \in \mathbb{R}^n \mid y_i \geq 0 \ \forall i\}\]

and

\[B[0,r] = \{\mathbf{y} \mid \|\mathbf{y}\| \leq r\}.\]

Now, using the KKT conditions, we can find formulae for projections onto more complex sets \(C\); specifically, when \(C\) is an affine space.
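For reference, both of the projections recalled above have simple closed forms. Here is a minimal numpy sketch (the function names are ours, not from the lectures):

```python
import numpy as np

def project_nonneg(x):
    """Project onto the nonnegative orthant: clip negative entries to zero."""
    return np.maximum(x, 0.0)

def project_ball(x, r):
    """Project onto the Euclidean ball of radius r centered at the origin."""
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

print(project_nonneg(np.array([-1.0, 2.0])))      # [0. 2.]
print(project_ball(np.array([3.0, 4.0]), r=1.0))  # [0.6 0.8]
```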

Given \(A\in\mathbb{R}^{m\times n}\) with full row rank, we will project \(y\in\mathbb{R}^n\) onto

\[ C=\{x\in\mathbb{R}^n:Ax=b\} \]

by solving

\[ \min_x \; \|x-y\|_2^2 \quad \text{s.t.}\; Ax=b. \]

In this case, the Lagrangian function is

\[L(\mathbf{x},\boldsymbol{\mu}) = \|\mathbf{x} - \mathbf{y}\|^{2} + \boldsymbol{\mu}^{\top} (A \mathbf{x} - \mathbf{b}),\]

so the stationarity condition is

\[ \nabla_{\mathbf{x}} L(\mathbf{x},\boldsymbol{\mu}) = 2\mathbf{x} -2 \mathbf{y} + A^{\top} \boldsymbol{\mu}=\mathbf{0} \]

Combined with the feasibility condition \(A \mathbf{x} = \mathbf{b}\), we find that

\[ P_C(\mathbf{y}) = \mathbf{y}-A^{\top}(AA^{\top})^{-1}(A\mathbf{y}-\mathbf{b}). \]

See the video lecture for the full derivation.
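As a quick numerical sanity check (using a randomly generated \(A\), \(\mathbf{b}\), and \(\mathbf{y}\), not the worksheet's data below), the formula should satisfy the two defining properties of the projection: the result is feasible, \(A\mathbf{x} = \mathbf{b}\), and \(\mathbf{x} - \mathbf{y}\) is orthogonal to every direction \(\mathbf{d}\) with \(A\mathbf{d} = \mathbf{0}\):

```python
import numpy as np

rng = np.random.default_rng(1)
A_demo = rng.standard_normal((2, 5))  # full row rank with probability 1
b_demo = rng.standard_normal(2)
y_demo = rng.standard_normal(5)

# The closed-form projection onto {x : A x = b}
x_demo = y_demo - A_demo.T @ np.linalg.solve(A_demo @ A_demo.T,
                                             A_demo @ y_demo - b_demo)

# Check 1: the projection lands on the affine set
feasible = np.allclose(A_demo @ x_demo, b_demo)

# Check 2: x - y is orthogonal to a null-space direction d (A d = 0)
z = rng.standard_normal(5)
d = z - A_demo.T @ np.linalg.solve(A_demo @ A_demo.T, A_demo @ z)
orthogonal = np.isclose((x_demo - y_demo) @ d, 0.0)

print(feasible, orthogonal)  # True True
```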

Example#

We are going to work with the system \(A \mathbf{x} = \mathbf{b}\) for

\[\begin{split} A = \begin{bmatrix} 3 & 1 & -1 \\ -2 & 2 & 1 \end{bmatrix} \qquad b = \begin{bmatrix} 0\\0 \end{bmatrix} \end{split}\]

❓❓❓ Question ❓❓❓: Take a look at this Desmos plot. What is the set \(A \mathbf{x} = \mathbf{b}\)?

Your answer here

❓❓❓ Question ❓❓❓: Use the derivation above to compute the projected point \(\mathbf{x} = P_C(\mathbf{y})\) for the point \(\mathbf{y} = (-1,0,2)\). Plot the point you find on the Desmos plot to see if your answer looks reasonable.

A = np.array([[3, 1, -1], [-2, 2, 1]])
b = np.array([[0,0]]).T
y = np.array([[-1,0,2]]).T

# Your code here 

Solving without doing the algebra by hand.#

We can actually use linear algebra to do all of our work for us.

Writing \(\boldsymbol{\gamma}\) for the multiplier \(\boldsymbol{\mu}\), for a given \(\mathbf{y} \in \mathbb{R}^n\) we have KKT conditions for this problem given by:

\[\begin{split} \begin{aligned} 2\mathbf{x} -2 \mathbf{y} + A^{\top} \boldsymbol{\gamma}&=\mathbf{0}\\ A \mathbf{x} & = \mathbf{b} \end{aligned} \end{split}\]

Right now we have variables \(\mathbf{x}\) and \(\boldsymbol{\gamma}\) to solve for. We can put all our variables together into a column vector, and then solve the following single giant system:

\[\begin{split} \begin{bmatrix} 2I & A^{\top} \\ A & 0 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ \boldsymbol{\gamma} \end{bmatrix} = \begin{bmatrix} 2\mathbf{y} \\ \mathbf{b} \end{bmatrix}. \end{split}\]
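To see that this block system really reproduces the projection, here is a sketch on a small randomly generated instance (not the worksheet's example, and using throwaway variable names): we assemble the matrix with `np.block`, solve, and compare the \(\mathbf{x}\) block against the closed-form formula.

```python
import numpy as np

rng = np.random.default_rng(0)
A_test = rng.standard_normal((2, 4))  # full row rank with probability 1
b_test = rng.standard_normal(2)
y_test = rng.standard_normal(4)

m, n = A_test.shape
# Assemble [[2I, A^T], [A, 0]] [x; gamma] = [2y; b] and solve
K = np.block([[2 * np.eye(n), A_test.T],
              [A_test, np.zeros((m, m))]])
rhs = np.concatenate([2 * y_test, b_test])
sol = np.linalg.solve(K, rhs)
x_test, gamma_test = sol[:n], sol[n:]

# The x block matches the closed-form projection formula
x_closed = y_test - A_test.T @ np.linalg.solve(A_test @ A_test.T,
                                               A_test @ y_test - b_test)
print(np.allclose(x_test, x_closed), np.allclose(A_test @ x_test, b_test))  # True True
```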

❓❓❓ Question ❓❓❓: For our example system above, write out the matrices in that single giant system. Check that all the equations implied by the KKT conditions show up in the matrix multiplication.

Your notes here

❓❓❓ Question ❓❓❓: Now code the matrices you wrote above and use the np.linalg.solve function to determine \(\mathbf{x}\) and \(\boldsymbol{\gamma}\) for this problem. For a challenge, try to build the matrices directly from your saved A, b, and y from above. Did you get the same answer as above for \(\mathbf{x}\)?

# Your code here 

2. Equality-Constrained Quadratic#

We can pull the same trick to solve a quadratic optimization problem,

\[ \min_x \; x^TQx + 2c^Tx \quad \text{s.t.}\; Ax=b \]

From the 8-2 worksheet, setting \(\boldsymbol{\gamma} = \boldsymbol{\mu}/2\), we have that

\[ \nabla_{\mathbf{x}} L(\mathbf{x},\boldsymbol{\mu}) = 2(Q\mathbf{x} + \mathbf{c} + A^{\top} \boldsymbol{\gamma}), \]

so the KKT conditions become

\[\begin{split} \begin{aligned} Q\mathbf{x} + \mathbf{c} + A^{\top} \boldsymbol{\gamma} &= \mathbf{0} \\ A \mathbf{x}&=\mathbf{b} \end{aligned} \end{split}\]

❓❓❓ Question ❓❓❓: Write down the equivalent version of the “single giant system” from above that we can solve to get the solution.

Your notes here.

Example in \(\mathbb{R}^2\)#

Now we will work with the system where

\[\begin{split} Q = \begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} \qquad \mathbf{c} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \end{split}\]

We will do the simplified case where \(A\) is a single row vector (i.e., there is only one linear constraint), so we want to find the optimum point over \(\mathbf{a}^\top \mathbf{x} = b\) for

\[\begin{split} \mathbf{a} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad b = -2 \end{split}\]

We match the above notation by setting \(A = \mathbf{a}^\top\).

❓❓❓ Question ❓❓❓: For this Desmos plot and the setup above, what are the red surface, the blue line, and the green line?

Your notes here

❓❓❓ Question ❓❓❓: Set up the “single giant system” matrix problem you found above for this specific problem and find the solution \(\mathbf{x}^*\). As before, try to build the system directly from the saved matrices rather than hard-coding it. Then plot your solution on the Desmos plot to see if it’s right.

Q = np.array([[3, -1], [-1, 2]])
c = np.array([[1, 1]]).T
A = np.array([[1, 1]])
b = np.array([[-2]])

# Your code here

Example in \(\mathbb{R}^3\)#

While I can’t draw you pictures, you should be able to do exactly the same thing for larger systems where we are solving for \(\mathbf{x} \in \mathbb{R}^n\) with \(n>2\). In particular, if you already set up your code above to build the “single giant system” directly from the saved matrices, this next bit should be easy.

The problem to solve is

\[ \min_x \; x^TQx + 2c^Tx \quad \text{s.t.}\; Ax=b \]

for

\[\begin{split} Q = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 0 \\ -3 & 0 & 10 \end{bmatrix} \qquad \mathbf{c} = \begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix}. \end{split}\]

We want to find the optimum point over \(A \mathbf{x} = \mathbf{b}\) for

\[\begin{split} A = \begin{bmatrix} 1 & -1 & 3 \\ 2 & 0 & -1 \end{bmatrix} \qquad \mathbf{b} = \begin{bmatrix} 2 \\ -1 \end{bmatrix} \end{split}\]

❓❓❓ Question ❓❓❓:

  • Is the problem convex?

  • What is the solution to the problem?

# Your code here 

3. Inequality-Constrained Optimization and Active Sets#

For the examples above, we had only equality constraints, \(A \mathbf{x} = \mathbf{b}\). It’s a bit more complicated when we want to include inequalities as well.

Consider the optimization problem

\[ \min_{\mathbf{x} \in \mathbb{R}^2} \; \mathbf{x}^TQ\mathbf{x} + 2c^T\mathbf{x} \quad \text{s.t.}\; G\mathbf{x}\leq \mathbf{h} \]

In this setup, we can write down the Lagrangian as

\[ L(\mathbf{x}, \boldsymbol{\lambda}) = \mathbf{x}^TQ\mathbf{x} + 2c^T\mathbf{x} + \boldsymbol{\lambda}^\top\left(G\mathbf{x} - \mathbf{h} \right). \]

❓❓❓ Question ❓❓❓: Given the following list of matrix derivative rules, determine \(\nabla_\mathbf{x}L(\mathbf{x}, \boldsymbol{\lambda})\) and thus the stationarity condition.

| Scalar value | Description | Gradient notation | Gradient value |
|---|---|---|---|
| \(x^\top Q x\) | Quadratic form (\(Q\) is symmetric) | \(\dfrac{\partial (x^\top Q x)}{\partial x}\) | \(2Qx\) |
| \(c^\top A x\) | Vector-matrix-vector product | \(\dfrac{\partial (c^\top A x)}{\partial x}\) | \(A^\top c\) |
| \(b^\top x\) | Inner product | \(\dfrac{\partial (b^\top x)}{\partial x}\) | \(b\) |

Your notes here.
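As an aside, derivative rules like these can always be spot-checked with finite differences. A throwaway sketch for the quadratic-form rule, on random data with made-up names (`Q_fd`, `x0`), not part of the worksheet exercises:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
Q_fd = M + M.T                    # random symmetric matrix, as the rule assumes
x0 = rng.standard_normal(n)
eps = 1e-6

f = lambda v: v @ Q_fd @ v        # the quadratic form x^T Q x
grad_analytic = 2 * Q_fd @ x0     # the rule from the table
# Central finite differences along each coordinate direction
grad_fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```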

❓❓❓ Question ❓❓❓: Write down the slackness conditions for the problem.

Your notes here

Checking active constraints using an example#

So far, we have the stationarity condition and the slackness conditions you calculated above. We also need primal feasibility, \(G\mathbf{x} \leq \mathbf{h}\), and dual feasibility, \(\boldsymbol{\lambda} \geq \mathbf{0}\). However, it’s hard to handle inequality constraints with the matrix-solve trick above, since a linear solve can only enforce equalities.

So instead we will do a check: assume some subset of the constraints in \(G\mathbf{x}\leq \mathbf{h}\) is active (that is, holds with equality rather than inequality), solve the resulting equality-constrained problem, and repeat for all possible subsets.
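With \(m\) inequality constraints there are \(2^m\) possible active sets. The enumeration itself (separate from the KKT solve, which the worksheet code below handles) can be written with `itertools.combinations`:

```python
import itertools

m = 3  # number of inequality constraints, as in the example below
subsets = [active
           for r in range(m + 1)
           for active in itertools.combinations(range(m), r)]
print(len(subsets))   # 8
print(subsets[:4])    # [(), (0,), (1,), (2,)]
```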

We will do this for the quadratic function given by matrices

\[\begin{split} Q = \begin{bmatrix} 2 & 0.2 \\ 0.2 & 1 \end{bmatrix} \qquad \mathbf{c} = \begin{bmatrix} -1 \\ -0.6 \end{bmatrix} \end{split}\]

and the constraint set given by matrices

\[\begin{split} G = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} \qquad h = \begin{bmatrix} 1.2 \\ 1 \\ 1.4 \end{bmatrix} \end{split}\]
Q = np.array([[2.0, 0.2], [0.2, 1.0]])
c = np.array([-1.0, -0.6])
G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h = np.array([1.2, 1.0, 1.4])

The code below returns a point \(\mathbf{x}\) and multipliers \(\boldsymbol{\lambda}\) such that

  • the stationarity condition is satisfied, so \(\nabla_\mathbf{x}L(\mathbf{x}, \boldsymbol{\lambda}) = \mathbf{0}\), and

  • the rows of \(G\mathbf{x} \leq \mathbf{h}\) that are active are the ones in the list active_idx. Said another way, for those rows, \(G_i\mathbf{x} = h_i\).

def solve_active_set(Q, c, G, h, active_idx):
    """Solve the KKT system treating the constraints in active_idx as
    equalities; return (x, lam), or (None, None) if the system is singular."""
    n = Q.shape[0]
    Aeq = G[list(active_idx), :] if len(active_idx) > 0 else np.zeros((0, n))
    beq = h[list(active_idx)] if len(active_idx) > 0 else np.zeros(0)

    mA = Aeq.shape[0]
    left_side = np.block([[Q, Aeq.T], [Aeq, np.zeros((mA, mA))]])
    right_side = np.concatenate([-c, beq])
    try:
        sol = np.linalg.solve(left_side, right_side)
    except np.linalg.LinAlgError:
        return None, None
    x = sol[:n]
    nu = sol[n:]  # multipliers for active constraints

    # map to full lambda
    lam = np.zeros(G.shape[0])
    for k, idx in enumerate(active_idx):
        lam[idx] = nu[k]
    return x, lam
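A quick sanity check of the block-system logic on made-up data (the names `Q_demo` and `c_demo` are throwaways, distinct from the worksheet's `Q` and `c`): with an empty active set, the system that solve_active_set builds reduces to the unconstrained stationarity equation \(Q\mathbf{x} = -\mathbf{c}\):

```python
import numpy as np

# Made-up positive definite example, not the worksheet's data
Q_demo = np.array([[3.0, 0.0], [0.0, 1.0]])
c_demo = np.array([-3.0, -2.0])

# Empty active set: no multipliers, so the system is just Q x = -c
x_free = np.linalg.solve(Q_demo, -c_demo)
print(x_free)  # [1. 2.]
```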

❓❓❓ Question ❓❓❓: Using the function above, what are the \(\mathbf{x}\) and \(\boldsymbol{\lambda}\) so that the 0th and 2nd rows of the inequality constraints are active, that is, so that \(x_1 + 0x_2 = 1.2\) and \(x_1+x_2 = 1.4\)? Save the outputs as x and lam.

# Your code here
# Warning: `lambda` is a reserved keyword in Python, so make sure to save the lambda output as `lam` instead to avoid syntax errors.

Now we can check if x and lam satisfy the rest of our requirements.

❓❓❓ Question ❓❓❓: Use the np.all function to make the following function return True if all the entries of \(\boldsymbol{\lambda}\) are \(\geq 0\), and False otherwise.

def is_lambda_pos(lam):
    pass #<-- Your code here

# Test the output
is_lambda_pos(lam)

❓❓❓ Question ❓❓❓: Use the np.all function to make the following function return True if \(G\mathbf{x} \leq \mathbf{h}\) holds componentwise, and False otherwise.

def is_Gx_leq_h(G, x, h):
    pass #<-- Your code here

is_Gx_leq_h(G, x, h)

❓❓❓ Question ❓❓❓: Use the np.all function to make the following function return True if \(\lambda_i (G_i\mathbf{x} - h_i) = 0\) for all \(i\), and False otherwise.

def is_slackness_satisfied(lam, G, x, h):
    pass #<-- Your code here

is_slackness_satisfied(lam, G, x, h)

❓❓❓ Question ❓❓❓: Finish the code below: check each output \(x\) and \(\lambda\) using the functions above, and add the point to the list if it satisfies all of the KKT conditions.

feasible_points = [] 

all_idx = list(range(G.shape[0]))
for r in range(len(all_idx) + 1):
    for active in itertools.combinations(all_idx, r):
        x, lam = solve_active_set(Q, c, G, h, active)

        if x is None:
            continue
        
        # Your code here
        # Add checks here and add feasible points to the list 
            
print("Feasible points that satisfy KKT conditions:")
for pt in feasible_points:
    print(pt)

❓❓❓ Question ❓❓❓: The problem above was convex, so there was exactly one KKT point which made our job easy. Now, use your code above to find the optimum of the non-convex problem where

\[\begin{split} Q = \begin{bmatrix} 1.3040 & 0.1217 \\ 0.1217 & -1.2654 \end{bmatrix} \qquad \mathbf{c} = \begin{bmatrix} -0.6233 \\ 0.0413 \end{bmatrix} \end{split}\]

and the constraint set given by matrices

\[\begin{split} G = \begin{bmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{bmatrix} \qquad h = \begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix} \end{split}\]

You should get multiple KKT points. How do you know which one is optimal?

# Your code here 

© Copyright 2026, The Department of Computational Mathematics, Science and Engineering at Michigan State University.