Low Rank Representation

1. Matrix Norm and Singular Value

(1). nuclear norm: $||X||_* = ||\text{singular values}||_1$

(2). Frobenius norm: $||X||_F = \sqrt{\sum_{i,j}X_{i,j}^2} = ||\text{singular values}||_2$

Writing the SVD of $X$ as $X = U \Lambda V^T$,

$$||X||_F^2 = \sum_{i,j}X_{i,j}^2 = tr(X^TX) = tr(V \Lambda U^T U \Lambda V^T) = tr(V \Lambda^2 V^T) = tr(\Lambda^2 V^T V) = tr(\Lambda^2)$$

(3). spectral norm: $||X||_2 = \sup_v \frac{||Xv||_2}{||v||_2} = ||\text{singular values}||_\infty$

(The 1-norm and the $\infty$-norm are a pair of dual norms, so the nuclear norm and the spectral norm are dual to each other.)
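
As a quick numerical check, here is a minimal numpy sketch (variable names are my own) verifying the three identities on a random matrix:

```python
import numpy as np

X = np.random.randn(6, 4)
s = np.linalg.svd(X, compute_uv=False)   # singular values of X

nuclear = s.sum()                        # ||X||_* = 1-norm of singular values
fro = np.sqrt((X ** 2).sum())            # ||X||_F = 2-norm of singular values
spectral = np.linalg.norm(X, 2)          # ||X||_2 = inf-norm of singular values

assert np.isclose(nuclear, np.linalg.norm(s, 1))
assert np.isclose(fro, np.linalg.norm(s, 2))
assert np.isclose(spectral, s.max())
```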

2. Low-Rank Problem

$$\min_{Z,E} ~~\text{rank}(Z) + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~~X = AZ + E$$

($||E||_{2,1}$ is the sum of the 2-norms of the column vectors)

$X \in \mathbb{R}^{d\times n}$ contains $n$ data points coming from several subspaces, where $d$ is the data dimension. For example, $X \in \mathbb{R}^{1000\times 50}$ contains 50 images of 5 people, each image having 1000 pixels.

In the face example, since there are only 5 people in the 50 images, the "rank" of the data matrix is roughly 5, so we want to transform the data matrix into a low-rank representation.

Suppose $A$ is a 'dictionary' that linearly spans the data space, $Z$ is a coefficient matrix, and $X$ is obtained by the noisy observation $AZ+E$, where $E$ is the error term.

Since minimizing the rank of $Z$ is NP-hard, we can relax the problem by replacing $\text{rank}(Z)$ with $||Z||_*$. Since $||Z||_* = ||\text{singular values}||_1$ and 1-norm minimization leads to sparsity, sparsity in the singular values yields low rank. We can approximately solve the original optimization problem by solving

$$\min_{Z,E} ~~||Z||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~~X = AZ + E$$
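
To make the model concrete, here is a small synthetic sketch of $X = AZ + E$ (all sizes and names are illustrative, not from the original): a low-rank coefficient matrix $Z$ plus a column-sparse error $E$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 100, 50, 5                     # ambient dimension, samples, true rank

A = rng.standard_normal((d, n))          # dictionary (later we will take A = X)
Z = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # rank-k coefficients
E = np.zeros((d, n))
bad = rng.choice(n, 5, replace=False)    # corrupt a few samples (columns)
E[:, bad] = rng.standard_normal((d, 5))

X = A @ Z + E                            # the observed data matrix
print(np.linalg.matrix_rank(Z))          # -> 5: Z is low-rank, E is column-sparse
```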

3. Singular Value Shrinkage

Consider the regularized Frobenius norm minimization problem

$$\min_X ~~ f(X) = \frac{1}{2}||X-Y||_F^2 + \tau ||X||_*$$
$$\frac{\partial f}{\partial X} = X - Y + \tau \partial||X||_*$$

$f(X)$ is non-differentiable; it achieves its minimum when $\vec{0} \in \frac{\partial f}{\partial X}$, where $\frac{\partial f}{\partial X}$ denotes the subdifferential of $f$.

We apply singular value decomposition to $Y$:

$$Y = U \Sigma V^T = U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T$$

where $\text{diag}(\Sigma_0) > \tau$ and $\text{diag}(\Sigma_1) \le \tau$.

Define the singular value shrinkage operator as

$$D_{\tau}(\Sigma) = \text{diag}(\{\sigma_i - \tau\}_+) = \text{diag}(\max\{\sigma_i - \tau, 0\})$$
$$D_{\tau}(X) = U D_{\tau}(\Sigma)V^T \quad \text{(where $X = U \Sigma V^T$ is the SVD of $X$)}$$

Perform singular value shrinkage on $Y$:

$$D_{\tau}(Y) = D_{\tau}(U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T) = U_0 (\Sigma_0 - \tau I)V_0^T$$
$$Y - D_{\tau}(Y) = \tau\left(U_0V_0^T + \frac{1}{\tau}U_1 \Sigma_1 V_1^T\right)$$

The subdifferential of the nuclear norm at $X = U \Sigma V^T$ is given by

$$\partial ||X||_* = \{UV^T + W \mid U^TW=0,\ WV=0,\ ||W||_2 \le 1\}$$

Let $U = U_0$, $V = V_0$, $W = \frac{1}{\tau}U_1 \Sigma_1 V_1^T$ (note $||W||_2 \le 1$ because $\text{diag}(\Sigma_1) \le \tau$); then $Y - D_{\tau}(Y) \in \tau \partial ||X||_*$ at $X = D_{\tau}(Y)$.

Let $X = D_{\tau}(Y)$; then $X - Y + Y - D_{\tau}(Y) = 0 \in \frac{\partial f}{\partial X}$. So $X = D_{\tau}(Y)$ minimizes $\frac{1}{2}||X-Y||_F^2 + \tau ||X||_*$.
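
A minimal numpy sketch of the shrinkage operator (the helper names `svt` and `prox_objective` are my own), with a crude numerical check that $D_{\tau}(Y)$ attains a lower objective value than nearby perturbations:

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding: D_tau(Y) = U diag({sigma_i - tau}_+) V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_objective(X, Y, tau):
    """0.5 * ||X - Y||_F^2 + tau * ||X||_*."""
    return 0.5 * np.linalg.norm(X - Y, 'fro') ** 2 \
        + tau * np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(0)
Y, tau = rng.standard_normal((8, 6)), 0.5
X_star = svt(Y, tau)

# the shrunk matrix should beat random perturbations of itself
for _ in range(100):
    X_other = X_star + 0.1 * rng.standard_normal(Y.shape)
    assert prox_objective(X_star, Y, tau) <= prox_objective(X_other, Y, tau)
```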

4. Augmented Lagrangian Multiplier

$$\min ~~f(x) \quad \text{s.t.} ~~h_i(x)=0$$
$$L(x,v) = f(x) + \sum_i v_i h_i(x)$$

Define the augmented Lagrangian as

$$A(x,v,c) = L(x,v) + \frac{c}{2} \sum_i h_i(x)^2$$

The optimality conditions for the original problem are

1. $h_i(x^*) = 0$

2. $\nabla_x L(x^*, v^*) = \nabla f(x^*) + \sum_i v_i^* \nabla h_i(x^*) = 0$

If $x^*, v^*$ is optimal for the original problem, then

$$\nabla_x A(x^*,v^*,c) = \nabla_x L(x^*,v^*) + c \sum_i h_i(x^*) \nabla h_i(x^*) = 0$$

since the penalty term vanishes at $h_i(x^*) = 0$. So $x^*, v^*$ also minimizes the augmented Lagrangian.

When we solve the original problem with the augmented Lagrangian, we want to minimize the augmented Lagrangian while keeping $\nabla_x A(x,v,c) = \nabla_x L(x,v)$:

$$x' = \text{argmin}_x A(x,v,c)$$
$$\nabla_x A(x',v,c) = \nabla f(x') + \sum_i v_i \nabla h_i(x') + c \sum_i h_i(x') \nabla h_i(x')$$
$$\nabla_x L(x',v') = \nabla f(x') + \sum_i v_i' \nabla h_i(x')$$

To keep $\nabla_x A(x',v,c) = \nabla_x L(x',v')$, set $v_i' = v_i + c \, h_i(x')$.

The augmented Lagrangian algorithm:

```
repeat until convergence {
    x' = argmin_x A(x, v, c)
    v_i' = v_i + c * h_i(x')
    x = x'
    v = v'
}
```
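
A toy sketch of this loop in Python, on a problem whose answer we know (the problem and its closed-form inner step are my own illustration): minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, with optimum at $(0.5, 0.5)$:

```python
import numpy as np

# Toy problem: min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0 (optimum: [0.5, 0.5])
c, v = 10.0, 0.0
x = np.zeros(2)

for _ in range(50):
    # inner step: x' = argmin_x A(x, v, c); closed form for this quadratic problem
    t = (c - v) / (2.0 + 2.0 * c)
    x = np.array([t, t])
    # multiplier step: v' = v + c * h(x')
    v = v + c * (x.sum() - 1.0)

print(x)  # -> [0.5 0.5]
```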

5. Solving the Low-Rank Problem

The low-rank representation problem

$$\min_{Z,E} ~~||Z||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~X = AZ + E$$

Transform it into the equivalent problem

$$\min_{Z,E,J} ~~||J||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~X = AZ + E$$
$$Z = J$$

Form the Lagrangian and the augmented Lagrangian:

$$L(Z,E,J,Y_1,Y_2) = ||J||_* + \lambda ||E||_{2,1} + tr(Y_1^T(X-AZ-E)) + tr(Y_2^T(Z-J))$$
$$A(Z,E,J,Y_1,Y_2,c) = L(Z,E,J,Y_1,Y_2) + \frac{c}{2} ||X-AZ-E||_F^2 + \frac{c}{2} ||Z-J||_F^2$$
Minimizing over $J$ with the other variables fixed:

$$\nabla_J A(Z,E,J,Y_1,Y_2,c) = c(J - Z) - Y_2 + \partial ||J||_* = c\left[J-\left(Z+\frac{1}{c}Y_2\right)\right] + \partial ||J||_* = 0$$
$$\left[J-\left(Z+\frac{1}{c}Y_2\right)\right] + \frac{1}{c}\partial ||J||_* = 0$$

Apply SVD on $Z + \frac{1}{c}Y_2$:

$$Z + \frac{1}{c}Y_2 = U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T$$

where $\text{diag}(\Sigma_0) > 1/c$ and $\text{diag}(\Sigma_1) \le 1/c$.

Perform singular value shrinkage with threshold $1/c$ on $Z + \frac{1}{c}Y_2$:

$$D_{1/c}\left(Z + \frac{1}{c}Y_2\right) = U_0\left(\Sigma_0 - \frac{1}{c}I\right) V_0^T$$
$$\left(Z + \frac{1}{c}Y_2\right) - D_{1/c}\left(Z + \frac{1}{c}Y_2\right) = U_1 \Sigma_1 V_1^T + \frac{1}{c}U_0V_0^T = \frac{1}{c} \left(U_0V_0^T + c\, U_1 \Sigma_1 V_1^T\right)$$

Since $\partial ||X||_* = \{UV^T + W \mid U^TW=0,\ WV=0,\ ||W||_2 \le 1\}$, let $U=U_0$, $V=V_0$, $W=c\, U_1 \Sigma_1 V_1^T$ (note $||W||_2 \le 1$ because $\text{diag}(\Sigma_1) \le 1/c$).

$$\frac{1}{c} \left(U_0V_0^T + c\, U_1 \Sigma_1 V_1^T\right) = \left(Z + \frac{1}{c}Y_2\right) - D_{1/c}\left(Z + \frac{1}{c}Y_2\right) \in \frac{1}{c} \partial||J||_*$$

$$J^* = \left(Z + \frac{1}{c}Y_2\right) - \frac{1}{c} \partial||J||_* = D_{1/c}\left(Z + \frac{1}{c}Y_2\right)$$
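
In code, this update is just the shrinkage operator from section 3 (reusing the hypothetical `svt` helper defined there) with threshold $1/c$:

```python
# J update: singular value shrinkage of (Z + Y2/c) with threshold 1/c
J = svt(Z + Y2 / c, 1.0 / c)
```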

Minimizing over $Z$:

$$\nabla_Z A(Z,E,J,Y_1,Y_2,c) = -A^T Y_1 + Y_2 - c\,A^T(X-AZ-E) + c\,(Z-J) = 0$$
$$(A^TA+I)Z^* = \frac{1}{c}(A^TY_1-Y_2) + A^TX - A^TE + J$$
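
The $Z$ update is then a single linear solve; a sketch (assuming `I = np.eye(A.shape[1])`):

```python
# Z update: solve (A^T A + I) Z = (A^T Y1 - Y2)/c + A^T (X - E) + J
Z = np.linalg.solve(A.T @ A + I, (A.T @ Y1 - Y2) / c + A.T @ (X - E) + J)
```
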
Minimizing over $E$:

$$E^* = \text{argmin}_E ~\lambda||E||_{2,1} - tr(Y_1^TE) + \frac{c}{2}||X-AZ-E||_F^2$$

This separates over columns. With $W = X - AZ$ and $w_i$, $y_i$, $e_i$ the $i$-th columns of $W$, $Y_1$, $E$:

$$e_i^* = \text{argmin}_{e_i} ~\lambda ||e_i||_2 - y_i^Te_i + \frac{c}{2} ||w_i - e_i||_2^2$$
$$= \text{argmin}_{e_i} ~\lambda ||e_i||_2 - y_i^Te_i + \frac{c}{2}(e_i^Te_i - 2 w_i^Te_i + w_i^Tw_i)$$
$$= \text{argmin}_{e_i} ~\frac{\lambda}{c} ||e_i||_2 + \frac{1}{2} \left\|e_i - \left(w_i + \frac{y_i}{c}\right)\right\|_2^2$$
$$e_i^* = \begin{cases} \dfrac{||w_i + \frac{y_i}{c}||_2 - \frac{\lambda}{c}}{||w_i + \frac{y_i}{c}||_2} \left(w_i + \dfrac{y_i}{c}\right) & \text{if } ||w_i + \frac{y_i}{c}||_2 > \lambda/c \\ 0 & \text{otherwise} \end{cases}$$

(This is the standard closed-form solution of the $\ell_{2,1}$-norm proximal problem.)
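
A hedged numpy sketch of this column-wise shrinkage (the helper name `l21_shrink` is my own):

```python
import numpy as np

def l21_shrink(Q, eps):
    """Column-wise solution of min_E eps * ||E||_{2,1} + 0.5 * ||E - Q||_F^2."""
    norms = np.linalg.norm(Q, axis=0)                    # 2-norm of each column
    scale = np.maximum(norms - eps, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale                                     # shrink each column toward zero

# E update: shrink W + Y1/c (with W = X - A @ Z) column-wise with threshold lambda/c
# E = l21_shrink(X - A @ Z + Y1 / c, lam / c)
```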

Finally, update the multipliers:

$$Y_1' = Y_1 + c(X-AZ-E)$$
$$Y_2' = Y_2 + c(Z-J)$$
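
Putting the five updates together gives a minimal sketch of the full solver, under the assumptions above (fixed penalty $c$ for simplicity; practical implementations typically grow $c$ over iterations and stop when the residuals $X-AZ-E$ and $Z-J$ are small):

```python
import numpy as np

def lrr(X, A, lam=0.1, c=1.0, iters=200):
    """Sketch of LRR via the augmented Lagrangian; reuses svt and l21_shrink above."""
    d, n = X.shape
    m = A.shape[1]
    Z, J = np.zeros((m, n)), np.zeros((m, n))
    E = np.zeros((d, n))
    Y1, Y2 = np.zeros((d, n)), np.zeros((m, n))
    I = np.eye(m)
    for _ in range(iters):
        J = svt(Z + Y2 / c, 1.0 / c)                     # nuclear-norm step
        Z = np.linalg.solve(A.T @ A + I,                 # (A^T A + I) Z = RHS
                            (A.T @ Y1 - Y2) / c + A.T @ (X - E) + J)
        E = l21_shrink(X - A @ Z + Y1 / c, lam / c)      # 2,1-norm step
        Y1 = Y1 + c * (X - A @ Z - E)                    # multiplier updates
        Y2 = Y2 + c * (Z - J)
    return Z, E
```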

6. Low-Rank Representation for Face Image Recovery

Suppose we have $m$ images of $n$ people ($n < m$), some of which are heavily corrupted. We want to recover these images by low-rank representation.

In this example, we use the dataset itself as the dictionary (i.e., $A=X$), so the optimization problem reduces to

$$\min_{Z,E} ~||Z||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~X = XZ + E$$
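
A hedged usage sketch with synthetic stand-in data (a real experiment would load actual face images; `lrr` is the sketch from section 5):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 50))  # rank-5 stand-in "faces"
X = clean.copy()
X[:, :10] += 5.0 * rng.standard_normal((1000, 10))  # heavily corrupt 10 of the 50 images

Z, E = lrr(X, A=X, lam=0.1)      # the dataset itself is the dictionary
recovered = X @ Z                # the low-rank part XZ approximates the clean images
print(np.linalg.norm(recovered - clean) / np.linalg.norm(clean))
```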

Here are some results: