Low Rank Representation

1. Matrix Norm and Singular Value

(1). nuclear norm: $||X||_* = ||\text{singular values}||_1$

(2). Frobenius norm: $||X||_F = \sqrt{\sum_{i,j}X_{i,j}^2} = ||\text{singular values}||_2$

Writing the SVD of $X$ as $X = U \Lambda V^T$,

$$||X||_F^2 = \sum_{i,j}X_{i,j}^2 = tr(X^TX) = tr(V \Lambda U^T U \Lambda V^T) = tr(V \Lambda^2 V^T) = tr(\Lambda^2 V^T V) = tr(\Lambda^2)$$

(3). spectral norm: $||X||_2 = \sup_v \frac{||Xv||_2}{||v||_2} = ||\text{singular values}||_\infty$

(The 1-norm and the $\infty$-norm are a pair of dual norms, so the nuclear norm and the spectral norm are dual to each other.)
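
As a quick numerical check, here is a minimal numpy sketch (variable names are my own) verifying the three identities on a random matrix:

```python
import numpy as np

X = np.random.randn(6, 4)
s = np.linalg.svd(X, compute_uv=False)   # singular values of X

nuclear = s.sum()                        # ||X||_* = 1-norm of singular values
fro = np.sqrt((X ** 2).sum())            # ||X||_F = 2-norm of singular values
spectral = np.linalg.norm(X, 2)          # ||X||_2 = inf-norm of singular values

assert np.isclose(nuclear, np.linalg.norm(s, 1))
assert np.isclose(fro, np.linalg.norm(s, 2))
assert np.isclose(spectral, s.max())
```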

2. Low-Rank Problem

$$\min_{Z,E} ~~\text{rank}(Z) + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~~X = AZ + E$$

($||E||_{2,1}$ is the sum of the 2-norms of the column vectors)

$X \in \mathbb{R}^{d\times n}$ contains $n$ data points coming from several subspaces, where $d$ is the data dimension. For example, $X \in \mathbb{R}^{1000\times 50}$ contains 50 images of 5 people, each image having 1000 pixels.

In the face example, since there are only 5 people in the 50 images, the "rank" of the data matrix is roughly 5, so we want to transform the data matrix into a low-rank representation.

Suppose $A$ is a 'dictionary' that linearly spans the data space, $Z$ is a coefficient matrix, and $X$ is obtained by the noisy observation $AZ+E$, where $E$ is the error term.

Since minimizing the rank of $Z$ is NP-hard, we can relax the problem by replacing $\text{rank}(Z)$ with $||Z||_*$. Since $||Z||_* = ||\text{singular values}||_1$ and 1-norm minimization leads to sparsity, sparsity in the singular values yields low rank. We can approximately solve the original optimization problem by solving

$$\min_{Z,E} ~~||Z||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~~X = AZ + E$$
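
To make the model concrete, here is a small synthetic sketch of $X = AZ + E$ (all sizes and names are illustrative, not from the original): a low-rank coefficient matrix $Z$ plus a column-sparse error $E$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 100, 50, 5                     # ambient dimension, samples, true rank

A = rng.standard_normal((d, n))          # dictionary (later we will take A = X)
Z = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # rank-k coefficients
E = np.zeros((d, n))
bad = rng.choice(n, 5, replace=False)    # corrupt a few samples (columns)
E[:, bad] = rng.standard_normal((d, 5))

X = A @ Z + E                            # the observed data matrix
print(np.linalg.matrix_rank(Z))          # -> 5: Z is low-rank, E is column-sparse
```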

3. Singular Value Shrinkage

Consider the regularized Frobenius norm minimization problem

$$\min_X ~~ f(X) = \frac{1}{2}||X-Y||_F^2 + \tau ||X||_*$$
$$\frac{\partial f}{\partial X} = X - Y + \tau \partial||X||_*$$

$f(X)$ is non-differentiable; it achieves its minimum when $\vec{0} \in \frac{\partial f}{\partial X}$, where $\frac{\partial f}{\partial X}$ denotes the subdifferential of $f$.

We apply singular value decomposition to $Y$:

$$Y = U \Sigma V^T = U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T$$

where $\text{diag}(\Sigma_0) > \tau$ and $\text{diag}(\Sigma_1) \le \tau$.

Define the singular value shrinkage operator as

$$D_{\tau}(\Sigma) = \text{diag}(\{\sigma_i - \tau\}_+) = \text{diag}(\max\{\sigma_i - \tau, 0\})$$
$$D_{\tau}(X) = U D_{\tau}(\Sigma)V^T \quad \text{(where $X = U \Sigma V^T$ is the SVD of $X$)}$$

Perform singular value shrinkage on $Y$:

$$D_{\tau}(Y) = D_{\tau}(U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T) = U_0 (\Sigma_0 - \tau I)V_0^T$$
$$Y - D_{\tau}(Y) = \tau\left(U_0V_0^T + \frac{1}{\tau}U_1 \Sigma_1 V_1^T\right)$$

The subdifferential of the nuclear norm at $X = U \Sigma V^T$ is given by

$$\partial ||X||_* = \{UV^T + W \mid U^TW=0,\ WV=0,\ ||W||_2 \le 1\}$$

Let $U = U_0$, $V = V_0$, $W = \frac{1}{\tau}U_1 \Sigma_1 V_1^T$ (note $||W||_2 \le 1$ because $\text{diag}(\Sigma_1) \le \tau$); then $Y - D_{\tau}(Y) \in \tau \partial ||X||_*$ at $X = D_{\tau}(Y)$.

Let $X = D_{\tau}(Y)$; then $X - Y + Y - D_{\tau}(Y) = 0 \in \frac{\partial f}{\partial X}$. So $X = D_{\tau}(Y)$ minimizes $\frac{1}{2}||X-Y||_F^2 + \tau ||X||_*$.
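
A minimal numpy sketch of the shrinkage operator (the helper names `svt` and `prox_objective` are my own), with a crude numerical check that $D_{\tau}(Y)$ attains a lower objective value than nearby perturbations:

```python
import numpy as np

def svt(Y, tau):
    """Singular value thresholding: D_tau(Y) = U diag({sigma_i - tau}_+) V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_objective(X, Y, tau):
    """0.5 * ||X - Y||_F^2 + tau * ||X||_*."""
    return 0.5 * np.linalg.norm(X - Y, 'fro') ** 2 \
        + tau * np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(0)
Y, tau = rng.standard_normal((8, 6)), 0.5
X_star = svt(Y, tau)

# the shrunk matrix should beat random perturbations of itself
for _ in range(100):
    X_other = X_star + 0.1 * rng.standard_normal(Y.shape)
    assert prox_objective(X_star, Y, tau) <= prox_objective(X_other, Y, tau)
```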

4. Augmented Lagrangian Multiplier

$$\min ~~f(x) \quad \text{s.t.} ~~h_i(x)=0$$
$$L(x,v) = f(x) + \sum_i v_i h_i(x)$$

Define the augmented Lagrangian as

$$A(x,v,c) = L(x,v) + \frac{c}{2} \sum_i h_i(x)^2$$

The optimality conditions for the original problem are

1. $h_i(x^*) = 0$

2. $\nabla_x L(x^*, v^*) = \nabla f(x^*) + \sum_i v_i^* \nabla h_i(x^*) = 0$

If $x^*, v^*$ is optimal for the original problem, then

$$\nabla_x A(x^*,v^*,c) = \nabla_x L(x^*,v^*) + c \sum_i h_i(x^*) \nabla h_i(x^*) = 0$$

since the penalty term vanishes at $h_i(x^*) = 0$. So $x^*, v^*$ also minimizes the augmented Lagrangian.

When we solve the original problem with the augmented Lagrangian, we want to minimize the augmented Lagrangian while keeping $\nabla_x A(x,v,c) = \nabla_x L(x,v)$:

$$x' = \text{argmin}_x A(x,v,c)$$
$$\nabla_x A(x',v,c) = \nabla f(x') + \sum_i v_i \nabla h_i(x') + c \sum_i h_i(x') \nabla h_i(x')$$
$$\nabla_x L(x',v') = \nabla f(x') + \sum_i v_i' \nabla h_i(x')$$

To keep $\nabla_x A(x',v,c) = \nabla_x L(x',v')$, set $v_i' = v_i + c \, h_i(x')$.

The augmented Lagrangian algorithm:

```
repeat until convergence {
    x' = argmin_x A(x, v, c)
    v_i' = v_i + c * h_i(x')
    x = x'
    v = v'
}
```
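
A toy sketch of this loop in Python, on a problem whose answer we know (the problem and its closed-form inner step are my own illustration): minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, with optimum at $(0.5, 0.5)$:

```python
import numpy as np

# Toy problem: min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0 (optimum: [0.5, 0.5])
c, v = 10.0, 0.0
x = np.zeros(2)

for _ in range(50):
    # inner step: x' = argmin_x A(x, v, c); closed form for this quadratic problem
    t = (c - v) / (2.0 + 2.0 * c)
    x = np.array([t, t])
    # multiplier step: v' = v + c * h(x')
    v = v + c * (x.sum() - 1.0)

print(x)  # -> [0.5 0.5]
```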

5. Solving the Low-Rank Problem

The low-rank representation problem

$$\min_{Z,E} ~~||Z||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~X = AZ + E$$

Transform it into the equivalent problem

$$\min_{Z,E,J} ~~||J||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~X = AZ + E$$
$$Z = J$$

Form the Lagrangian and the augmented Lagrangian:

$$L(Z,E,J,Y_1,Y_2) = ||J||_* + \lambda ||E||_{2,1} + tr(Y_1^T(X-AZ-E)) + tr(Y_2^T(Z-J))$$
$$A(Z,E,J,Y_1,Y_2,c) = L(Z,E,J,Y_1,Y_2) + \frac{c}{2} ||X-AZ-E||_F^2 + \frac{c}{2} ||Z-J||_F^2$$
Minimizing over $J$ with the other variables fixed:

$$\nabla_J A(Z,E,J,Y_1,Y_2,c) = c(J - Z) - Y_2 + \partial ||J||_* = c\left[J-\left(Z+\frac{1}{c}Y_2\right)\right] + \partial ||J||_* = 0$$
$$\left[J-\left(Z+\frac{1}{c}Y_2\right)\right] + \frac{1}{c}\partial ||J||_* = 0$$

Apply SVD on $Z + \frac{1}{c}Y_2$:

$$Z + \frac{1}{c}Y_2 = U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T$$

where $\text{diag}(\Sigma_0) > 1/c$ and $\text{diag}(\Sigma_1) \le 1/c$.

Perform singular value shrinkage with threshold $1/c$ on $Z + \frac{1}{c}Y_2$:

$$D_{1/c}\left(Z + \frac{1}{c}Y_2\right) = U_0\left(\Sigma_0 - \frac{1}{c}I\right) V_0^T$$
$$\left(Z + \frac{1}{c}Y_2\right) - D_{1/c}\left(Z + \frac{1}{c}Y_2\right) = U_1 \Sigma_1 V_1^T + \frac{1}{c}U_0V_0^T = \frac{1}{c} \left(U_0V_0^T + c\, U_1 \Sigma_1 V_1^T\right)$$

Since $\partial ||X||_* = \{UV^T + W \mid U^TW=0,\ WV=0,\ ||W||_2 \le 1\}$, let $U=U_0$, $V=V_0$, $W=c\, U_1 \Sigma_1 V_1^T$ (note $||W||_2 \le 1$ because $\text{diag}(\Sigma_1) \le 1/c$).

$$\frac{1}{c} \left(U_0V_0^T + c\, U_1 \Sigma_1 V_1^T\right) = \left(Z + \frac{1}{c}Y_2\right) - D_{1/c}\left(Z + \frac{1}{c}Y_2\right) \in \frac{1}{c} \partial||J||_*$$

$$J^* = \left(Z + \frac{1}{c}Y_2\right) - \frac{1}{c} \partial||J||_* = D_{1/c}\left(Z + \frac{1}{c}Y_2\right)$$
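
In code, this update is just the shrinkage operator from section 3 (reusing the hypothetical `svt` helper defined there) with threshold $1/c$:

```python
# J update: singular value shrinkage of (Z + Y2/c) with threshold 1/c
J = svt(Z + Y2 / c, 1.0 / c)
```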

Minimizing over $Z$:

$$\nabla_Z A(Z,E,J,Y_1,Y_2,c) = -A^T Y_1 + Y_2 - c\,A^T(X-AZ-E) + c\,(Z-J) = 0$$
$$(A^TA+I)Z^* = \frac{1}{c}(A^TY_1-Y_2) + A^TX - A^TE + J$$
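
The $Z$ update is then a single linear solve; a sketch (assuming `I = np.eye(A.shape[1])`):

```python
# Z update: solve (A^T A + I) Z = (A^T Y1 - Y2)/c + A^T (X - E) + J
Z = np.linalg.solve(A.T @ A + I, (A.T @ Y1 - Y2) / c + A.T @ (X - E) + J)
```
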
Minimizing over $E$:

$$E^* = \text{argmin}_E ~\lambda||E||_{2,1} - tr(Y_1^TE) + \frac{c}{2}||X-AZ-E||_F^2$$

This separates over columns. With $W = X - AZ$ and $w_i$, $y_i$, $e_i$ the $i$-th columns of $W$, $Y_1$, $E$:

$$e_i^* = \text{argmin}_{e_i} ~\lambda ||e_i||_2 - y_i^Te_i + \frac{c}{2} ||w_i - e_i||_2^2$$
$$= \text{argmin}_{e_i} ~\lambda ||e_i||_2 - y_i^Te_i + \frac{c}{2}(e_i^Te_i - 2 w_i^Te_i + w_i^Tw_i)$$
$$= \text{argmin}_{e_i} ~\frac{\lambda}{c} ||e_i||_2 + \frac{1}{2} \left\|e_i - \left(w_i + \frac{y_i}{c}\right)\right\|_2^2$$
$$e_i^* = \begin{cases} \dfrac{||w_i + \frac{y_i}{c}||_2 - \frac{\lambda}{c}}{||w_i + \frac{y_i}{c}||_2} \left(w_i + \dfrac{y_i}{c}\right) & \text{if } ||w_i + \frac{y_i}{c}||_2 > \lambda/c \\ 0 & \text{otherwise} \end{cases}$$

(This is the standard closed-form solution of the $\ell_{2,1}$-norm proximal problem.)
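
A hedged numpy sketch of this column-wise shrinkage (the helper name `l21_shrink` is my own):

```python
import numpy as np

def l21_shrink(Q, eps):
    """Column-wise solution of min_E eps * ||E||_{2,1} + 0.5 * ||E - Q||_F^2."""
    norms = np.linalg.norm(Q, axis=0)                    # 2-norm of each column
    scale = np.maximum(norms - eps, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale                                     # shrink each column toward zero

# E update: shrink W + Y1/c (with W = X - A @ Z) column-wise with threshold lambda/c
# E = l21_shrink(X - A @ Z + Y1 / c, lam / c)
```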

Finally, update the multipliers:

$$Y_1' = Y_1 + c(X-AZ-E)$$
$$Y_2' = Y_2 + c(Z-J)$$
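
Putting the five updates together gives a minimal sketch of the full solver, under the assumptions above (fixed penalty $c$ for simplicity; practical implementations typically grow $c$ over iterations and stop when the residuals $X-AZ-E$ and $Z-J$ are small):

```python
import numpy as np

def lrr(X, A, lam=0.1, c=1.0, iters=200):
    """Sketch of LRR via the augmented Lagrangian; reuses svt and l21_shrink above."""
    d, n = X.shape
    m = A.shape[1]
    Z, J = np.zeros((m, n)), np.zeros((m, n))
    E = np.zeros((d, n))
    Y1, Y2 = np.zeros((d, n)), np.zeros((m, n))
    I = np.eye(m)
    for _ in range(iters):
        J = svt(Z + Y2 / c, 1.0 / c)                     # nuclear-norm step
        Z = np.linalg.solve(A.T @ A + I,                 # (A^T A + I) Z = RHS
                            (A.T @ Y1 - Y2) / c + A.T @ (X - E) + J)
        E = l21_shrink(X - A @ Z + Y1 / c, lam / c)      # 2,1-norm step
        Y1 = Y1 + c * (X - A @ Z - E)                    # multiplier updates
        Y2 = Y2 + c * (Z - J)
    return Z, E
```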

6. Low-Rank Representation for Face Image Recovery

Suppose we have $m$ images of $n$ people ($n < m$), some of which are heavily corrupted. We want to recover these images by low-rank representation.

In this example, we use the dataset itself as the dictionary (i.e., $A=X$), so the optimization problem reduces to

$$\min_{Z,E} ~||Z||_* + \lambda ||E||_{2,1}$$
$$\text{s.t.} ~~~X = XZ + E$$
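
A hedged usage sketch with synthetic stand-in data (a real experiment would load actual face images; `lrr` is the sketch from section 5):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 50))  # rank-5 stand-in "faces"
X = clean.copy()
X[:, :10] += 5.0 * rng.standard_normal((1000, 10))  # heavily corrupt 10 of the 50 images

Z, E = lrr(X, A=X, lam=0.1)      # the dataset itself is the dictionary
recovered = X @ Z                # the low-rank part XZ approximates the clean images
print(np.linalg.norm(recovered - clean) / np.linalg.norm(clean))
```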

Here are some results: