Low Rank Representation
1. Matrix Norm and Singular Value
(1). nuclear norm:
∣ ∣ X ∣ ∣ ∗ = ∣ ∣ singular values ∣ ∣ 1 ||X||_* = ||\text{singular values}||_1 ∣ ∣ X ∣ ∣ ∗ = ∣ ∣ singular values ∣ ∣ 1
(2). frobenius norm:
∣ ∣ X ∣ ∣ F = ∑ i , j X i , j 2 = ∣ ∣ singular values ∣ ∣ 2 ||X||_F =\sqrt{\sum_{i,j}X_{i,j}^2} = ||\text{singular values}||_2 ∣ ∣ X ∣ ∣ F = ∑ i , j X i , j 2 = ∣ ∣ singular values ∣ ∣ 2
∣ ∣ X ∣ ∣ F 2 = ∑ i , j X i , j 2 = t r ( X T X ) = t r ( V Λ U T U Λ V T ) = t r ( V Λ 2 V T ) = t r ( Λ 2 V T V ) = t r ( Λ 2 ) ||X||_F^2 = \sum_{i,j}X_{i,j}^2 \\
~~~~~~~~~~~~~~~= tr(X^TX) \\
~~~~~~~~~~~~~~~~~~~~~~~~~~~~= tr(V \Lambda U^T U \Lambda V^T) \\
~~~~~~~~~~~~~~~~~~= tr(V \Lambda^2V^T) \\
~~~~~~~~~~~~~~~~~~= tr(\Lambda^2 V^TV) \\
~~~~~~~~ = tr(\Lambda^2) ∣ ∣ X ∣ ∣ F 2 = i , j ∑ X i , j 2 = t r ( X T X ) = t r ( V Λ U T U Λ V T ) = t r ( V Λ 2 V T ) = t r ( Λ 2 V T V ) = t r ( Λ 2 ) (3). spectral norm:
∣ ∣ X ∣ ∣ 2 = s u p v ∣ ∣ X v ∣ ∣ 2 ∣ ∣ v ∣ ∣ 2 = ∣ ∣ singular values ∣ ∣ i n f ||X||_2 = sup_v{ \frac{||Xv||_2}{||v||_2}}=|| \text{singular values}||_{inf} ∣ ∣ X ∣ ∣ 2 = s u p v ∣ ∣ v ∣ ∣ 2 ∣ ∣ X v ∣ ∣ 2 = ∣ ∣ singular values ∣ ∣ i n f
(1-norm and inf-norm are a pair of dual norms.)
2. Low-Rank problem
min . Z , E rank ( Z ) + λ ∣ ∣ E ∣ ∣ 2 , 1 \text{min}._{Z,E} ~~\text{rank}(Z) + \lambda ||E||_{2,1} min . Z , E rank ( Z ) + λ ∣ ∣ E ∣ ∣ 2 , 1 s.t. X = A Z + E \text{s.t.} ~~~~X = AZ + E s.t. X = A Z + E (∣ ∣ E ∣ ∣ 2 , 1 ||E||_{2,1} ∣ ∣ E ∣ ∣ 2 , 1
is the sum of all column vectors' 2-norm)
Z ∈ R d ∗ n Z \in \mathbb{R}^{d*n} Z ∈ R d ∗ n
contains n data points coming
from several subspaces, d is the data dimension. For
example, X ∈ R 1000 ∗ 50 X \in \mathbb{R}^{1000*50} X ∈ R 1 0 0 0 ∗ 5 0
contains 50
images of 5 people, each image has 1000 pixels.
In the face example, since there are only 5 people in
50 images, the "rank" of the data matrix is 5, so we
want to transform the data matrix to low-rank
representation.
Suppose A A A
is a 'dictionary' that linearly spans
the data space, Z Z Z
is a coefficient matrix,
and X X X
is obtained by noisy observation A Z + E AZ+E A Z + E
where E E E
is the error term.
Since minimizing the rank of Z Z Z
is np-hard, we can
relax the problem by replacing r a n k ( Z ) rank(Z) r a n k ( Z )
with
∣ ∣ X ∣ ∣ ∗ ||X||_* ∣ ∣ X ∣ ∣ ∗
. Since ∣ ∣ X ∣ ∣ ∗ = ∣ ∣ singular values ∣ ∣ 1 ||X||_* = ||\text{singular values}||_1 ∣ ∣ X ∣ ∣ ∗ = ∣ ∣ singular values ∣ ∣ 1
and 1-norm minimization leads to sparsity, sparsity in singular values yields low rank. We can
approximately solve the original optimization problem
by solving
min . Z , E ∣ ∣ Z ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 \text{min}._{Z,E} ~~||Z||_* + \lambda ||E||_{2,1} min . Z , E ∣ ∣ Z ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 s.t. X = A Z + E \text{s.t.} ~~~~X = AZ + E s.t. X = A Z + E 3. Singular Value Shrinkage
Consider the regularized frobenius norm minimization
problem
min x . f ( x ) = 1 2 ∣ ∣ X − Y ∣ ∣ F 2 + τ ∣ ∣ X ∣ ∣ ∗ \text{min}_x.~~ f(x) = \frac{1}{2}||X-Y||_F^2 + \tau ||X||_* min x . f ( x ) = 2 1 ∣ ∣ X − Y ∣ ∣ F 2 + τ ∣ ∣ X ∣ ∣ ∗ ∂ f ∂ x = X − Y + τ ∂ ∣ ∣ X ∣ ∣ ∗ \frac{\partial f}{\partial x} = X - Y + \tau \partial||X||_* ∂ x ∂ f = X − Y + τ ∂ ∣ ∣ X ∣ ∣ ∗ f ( x ) f(x) f ( x )
is a non-differentiable function, when
0 ⃗ ∈ ∂ f ∂ x \vec{0} \in \frac{\partial f}{\partial x} 0 ∈ ∂ x ∂ f
, f f f
achieves its minimum where
∂ f ∂ x \frac{\partial f}{\partial x} ∂ x ∂ f
is the subdifferential
of f f f
.
We apply singular value decomposition to Y Y Y
Y = U Σ V T = U 0 Σ 0 V 0 T + U 1 Σ 1 V 1 T Y = U \Sigma V^T \\
~~~~~~~~~~~~~~~~~~~~~~= U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T Y = U Σ V T = U 0 Σ 0 V 0 T + U 1 Σ 1 V 1 T where d i a g ( Σ 0 ) > τ diag(\Sigma_0)>\tau d i a g ( Σ 0 ) > τ
,
d i a g ( Σ 0 ) < = τ diag(\Sigma_0)<=\tau d i a g ( Σ 0 ) < = τ
.
Define singular value shrink operator as
D τ ( Σ ) = diag ( { σ i − τ } + ) = diag ( max { σ i − τ , 0 } ) D_{\tau}(\Sigma) = \text{diag}(\{\sigma_i - \tau\}_+)
= \text{diag}( \text{max}\{\sigma_i - \tau, 0\}) D τ ( Σ ) = diag ( { σ i − τ } + ) = diag ( max { σ i − τ , 0 } ) D τ ( X ) = U D τ ( Σ ) V T ( X = U Σ V T is the SVD of X) D_{\tau}(X) = U D_{\tau}(\Sigma)V^T
~~~~~~\text{($X = U \Sigma V^T$ is the SVD of X)} D τ ( X ) = U D τ ( Σ ) V T ( X = U Σ V T is the SVD of X) Perform singular value shrinkage on Y Y Y
D τ ( Y ) = D τ ( U 0 Σ 0 V 0 T + U 1 Σ 1 V 1 T ) = U 0 ( Σ 0 − τ I ) V 0 T D_{\tau}(Y) ~~~= D_{\tau}(U_0 \Sigma_0 V_0^T + U_1 \Sigma_1 V_1^T) \\
= U_0 (\Sigma_0 - \tau I)V_0^T D τ ( Y ) = D τ ( U 0 Σ 0 V 0 T + U 1 Σ 1 V 1 T ) = U 0 ( Σ 0 − τ I ) V 0 T Y − D τ ( Y ) = τ ( U 0 V 0 T + 1 τ U 1 Σ 1 V 1 T ) Y - D_{\tau}(Y) = \tau(U_0V_0^T + \frac{1}{\tau}U_1 \Sigma_1 V_1^T) Y − D τ ( Y ) = τ ( U 0 V 0 T + τ 1 U 1 Σ 1 V 1 T ) The subgradient of nuclear norm is defined by
∂ ∣ ∣ X ∣ ∣ ∗ = { U V T + W ∣ U T W = 0 , W V = 0 , ∣ ∣ W ∣ ∣ 2 < = 1 } \partial ||X||_* = \{UV^T + W| U^TW=0, WV=0, ||W||_2<=1\} ∂ ∣ ∣ X ∣ ∣ ∗ = { U V T + W ∣ U T W = 0 , W V = 0 , ∣ ∣ W ∣ ∣ 2 < = 1 } Let U = U 0 , V = V 0 , W = 1 τ U 1 Σ 1 V 1 T U = U_0, V = V_0, W = \frac{1}{\tau}U_1 \Sigma_1 V_1^T U = U 0 , V = V 0 , W = τ 1 U 1 Σ 1 V 1 T
, Y − D τ ( Y ) ∈ τ ∂ ∣ ∣ X ∣ ∣ ∗ Y-D_{\tau}(Y) \in \tau \partial ||X||_* Y − D τ ( Y ) ∈ τ ∂ ∣ ∣ X ∣ ∣ ∗
.
Let X = D τ ( Y ) X = D_{\tau}(Y) X = D τ ( Y )
, X − Y + Y − D τ ( Y ) = 0 ∈ ∂ f ∂ x X - Y + Y - D_{\tau}(Y) =0 \in \frac{\partial f}{\partial x} X − Y + Y − D τ ( Y ) = 0 ∈ ∂ x ∂ f
. So X = D τ ( Y ) X = D_{\tau}(Y) X = D τ ( Y )
minimizes 1 2 ∣ ∣ X − Y ∣ ∣ F 2 + τ ∣ ∣ X ∣ ∣ ∗ \frac{1}{2}||X-Y||_F^2 + \tau ||X||_* 2 1 ∣ ∣ X − Y ∣ ∣ F 2 + τ ∣ ∣ X ∣ ∣ ∗
4. Augmented Lagrangian Multiplier
min . f ( x ) s.t. h i ( x ) = 0 \text{min}. ~~~~~~f(x) \\
\text{s.t.} ~~h_i(x)=0 min . f ( x ) s.t. h i ( x ) = 0 L ( x , v ) = f ( x ) + ∑ i v i h i ( x ) L(x,v) = f(x) + \sum_i v_i h_i(x) L ( x , v ) = f ( x ) + i ∑ v i h i ( x ) Define the augmented lagrangian as
A ( x , v , c ) = L ( x , v ) + c 2 ∣ ∣ h i ( x ) ∣ ∣ 2 2 A(x,v,c) = L(x,v) + \frac{c}{2} ||h_i(x)||_2^2 A ( x , v , c ) = L ( x , v ) + 2 c ∣ ∣ h i ( x ) ∣ ∣ 2 2 The optimality condition for the original problem is
1.h i ( x ∗ ) = 0 h_i(x^*) = 0 h i ( x ∗ ) = 0
2.∇ x L ( x ∗ , v ∗ ) = ∇ f ( x ∗ ) + ∑ i v i ∇ h i ( x ∗ ) = 0 \nabla_xL(x^*, v^*) = \nabla f(x^*) + \sum_i v_i \nabla h_i(x^*) = 0 ∇ x L ( x ∗ , v ∗ ) = ∇ f ( x ∗ ) + ∑ i v i ∇ h i ( x ∗ ) = 0
if x ∗ , v ∗ x^*, v^* x ∗ , v ∗
is the optimal of the original problem, then
∇ x A ( x ∗ , v ∗ , c ) = ∇ L x ( x ∗ , v ∗ ) + c ∑ i h i ( x ∗ ) ∇ h i ( x ∗ ) = 0 \nabla_x A(x^*,v^*,c) = \nabla L_x(x^*,v^*) + c \sum_i h_i(x^*) \nabla h_i(x^*) = 0 ∇ x A ( x ∗ , v ∗ , c ) = ∇ L x ( x ∗ , v ∗ ) + c i ∑ h i ( x ∗ ) ∇ h i ( x ∗ ) = 0 x ∗ , v ∗ x^*, v^* x ∗ , v ∗
also minimizes the augmented lagrangian.
When we solve the original problem with augmented
lagangian, we want to minimize the augmented lagrangian
and keep ∇ x A ( x , v , c ) = ∇ x L ( x , v ) \nabla_xA(x,v,c) = \nabla_xL(x,v) ∇ x A ( x , v , c ) = ∇ x L ( x , v )
x ′ = a r g m i n x A ( x , v , c ) x' = argmin_x A(x,v,c) x ′ = a r g m i n x A ( x , v , c ) ∇ x A ( x ′ , v , c ) = ∇ f ( x ′ ) + ∑ i v i ∇ h i ( x ) + c ∑ i h i ( x ∗ ) ∇ h i ( x ∗ ) \nabla_x A(x',v,c) = \nabla f(x') + \sum_i v_i \nabla h_i(x) + c \sum_i h_i(x^*) \nabla h_i(x^*) ∇ x A ( x ′ , v , c ) = ∇ f ( x ′ ) + i ∑ v i ∇ h i ( x ) + c i ∑ h i ( x ∗ ) ∇ h i ( x ∗ ) ∇ L ( x ′ , v ′ ) = ∇ f ( x ′ ) + ∑ i v i ′ ∇ h i ( x ′ ) \nabla L(x',v') = \nabla f(x') + \sum_i v_i' \nabla h_i(x') ∇ L ( x ′ , v ′ ) = ∇ f ( x ′ ) + i ∑ v i ′ ∇ h i ( x ′ ) To keep ∇ x A ( x ′ , v , c ) = ∇ L ( x ′ , v ′ ) \nabla_x A(x',v,c) = \nabla L(x',v') ∇ x A ( x ′ , v , c ) = ∇ L ( x ′ , v ′ )
,
set v i ′ = v i + c ∗ h i ( x ′ ) v_i' = v_i + c*h_i(x') v i ′ = v i + c ∗ h i ( x ′ )
The augmented lagrangian algorithm
repeat until convergence{
x' = argmin_x L(x,v)
v_i' = v_i + c * h_i(x')
x = x'
v = v'
}
5. Solve Low-Rank Problem
The low-rank representation problem
min Z , E ∣ ∣ Z ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 \text {min}_{Z,E} ~~||Z||_* + \lambda ||E||_{2,1} min Z , E ∣ ∣ Z ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 s.t. X = A Z + E \text {s.t.} ~~~X = AZ + E s.t. X = A Z + E Transform is to the equivalent problem
min Z , E ∣ ∣ J ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 \text {min}_{Z,E} ~~||J||_* + \lambda ||E||_{2,1} min Z , E ∣ ∣ J ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 s.t. X = A Z + E \text {s.t.} ~~~X = AZ + E s.t. X = A Z + E Form the lagrangian and the augmented lagrangian
L ( Z , E , J , Y 1 , Y 2 ) = ∣ ∣ J ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 + t r ( Y 1 T ( X − A Z − E ) ) + t r ( Y 2 T ( Z − J ) ) L(Z,E,J,Y_1,Y_2) = ||J||_* + \lambda ||E||_{2,1} +
tr(Y_1^T(X-AZ-E)) + tr(Y_2^T(Z-J)) L ( Z , E , J , Y 1 , Y 2 ) = ∣ ∣ J ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 + t r ( Y 1 T ( X − A Z − E ) ) + t r ( Y 2 T ( Z − J ) ) A ( Z , E , J , Y 1 , Y 2 , c ) = ∣ ∣ J ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 + t r ( Y 1 T ( X − A Z − E ) ) + t r ( Y 2 T ( Z − J ) ) + c 2 ∣ ∣ X − A Z − E ∣ ∣ F 2 + c 2 ∣ ∣ Z − J ∣ ∣ F 2 A(Z,E,J,Y_1,Y_2, c) = ||J||_* + \lambda ||E||_{2,1} +
tr(Y_1^T(X-AZ-E)) + tr(Y_2^T(Z-J)) +
\frac{c}{2} ||X-AZ-E||_F^2 + \frac{c}{2} ||Z-J||_F^2 A ( Z , E , J , Y 1 , Y 2 , c ) = ∣ ∣ J ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 + t r ( Y 1 T ( X − A Z − E ) ) + t r ( Y 2 T ( Z − J ) ) + 2 c ∣ ∣ X − A Z − E ∣ ∣ F 2 + 2 c ∣ ∣ Z − J ∣ ∣ F 2
Update J J J
(Step 1 in augmented lagrangian )
∇ J A ( Z , E , J , Y 1 , Y 2 , c ) = c ∗ [ J − Z ] − Y 2 + ∂ ∣ ∣ J ∣ ∣ ∗ \nabla_JA(Z,E,J,Y_1,Y_2,c) = c * [J - Z] - Y_2 + \partial ||J||_* ∇ J A ( Z , E , J , Y 1 , Y 2 , c ) = c ∗ [ J − Z ] − Y 2 + ∂ ∣ ∣ J ∣ ∣ ∗ = c [ J − ( Z + 1 c Y 2 ) ] + ∂ ∣ ∣ J ∣ ∣ ∗ = 0 = c[J-(Z+\frac{1}{c}Y_2)] + \partial ||J||_*
=0 = c [ J − ( Z + c 1 Y 2 ) ] + ∂ ∣ ∣ J ∣ ∣ ∗ = 0 [ J − ( Z + 1 c Y 2 ) ] + 1 c ∂ ∣ ∣ J ∣ ∣ ∗ = 0 [J-(Z+\frac{1}{c}Y_2)] + \frac{1}{c}\partial ||J||_*=0 [ J − ( Z + c 1 Y 2 ) ] + c 1 ∂ ∣ ∣ J ∣ ∣ ∗ = 0 Apply SVD on Z + 1 c Y 2 Z + \frac{1}{c}Y_2 Z + c 1 Y 2
Z + 1 c Y 2 = U 0 Σ 0 V 0 T + U 1 Σ 1 V 1 T Z + \frac{1}{c}Y_2 = U_0 \Sigma_0V_0^T +
U_1 \Sigma_1V_1^T Z + c 1 Y 2 = U 0 Σ 0 V 0 T + U 1 Σ 1 V 1 T where d i a g ( Σ 0 ) > 1 / c diag(\Sigma_0) > 1/c d i a g ( Σ 0 ) > 1 / c
, d i a g ( Σ 1 ) < = 1 / c diag(\Sigma_1) <= 1/c d i a g ( Σ 1 ) < = 1 / c
.
Perform singular value shrinkage with threshold
1 / c 1/c 1 / c
on Z + 1 c Y 2 Z + \frac{1}{c}Y_2 Z + c 1 Y 2
D 1 / c ( Z + 1 c Y 2 ) = U 0 ( Σ 0 − 1 c I ) V 0 T D_{1/c}(Z + \frac{1}{c}Y_2) = U_0(\Sigma_0- \frac{1}{c}I) V_0^T D 1 / c ( Z + c 1 Y 2 ) = U 0 ( Σ 0 − c 1 I ) V 0 T ( Z + 1 c Y 2 ) − D 1 / c ( Z + 1 c Y 2 ) = U 1 Σ 1 V 1 T + 1 c U 0 V 0 T (Z + \frac{1}{c}Y_2) - D_{1/c}(Z + \frac{1}{c}Y_2) =
U_1 \Sigma_1V_1^T + \frac{1}{c}U_0V_0^T ( Z + c 1 Y 2 ) − D 1 / c ( Z + c 1 Y 2 ) = U 1 Σ 1 V 1 T + c 1 U 0 V 0 T = 1 c ( U 0 V 0 T + c ∗ U 1 Σ 1 V 1 T ) = \frac{1}{c} (U_0V_0^T + c * U_1 \Sigma_1V_1^T) = c 1 ( U 0 V 0 T + c ∗ U 1 Σ 1 V 1 T ) Since ∂ ∣ ∣ X ∣ ∣ ∗ = { U V T + W ∣ U T W = 0 , W V = 0 , ∣ ∣ W ∣ ∣ 2 < = 1 } \partial ||X||_* = \{UV^T + W | U^TW=0, WV=0, ||W||_2<=1\} ∂ ∣ ∣ X ∣ ∣ ∗ = { U V T + W ∣ U T W = 0 , W V = 0 , ∣ ∣ W ∣ ∣ 2 < = 1 }
, let U = U 0 , V = V 0 , W = c ∗ U 1 Σ 1 V 1 T U=U_0, V=V_0, W=c * U_1 \Sigma_1V_1^T U = U 0 , V = V 0 , W = c ∗ U 1 Σ 1 V 1 T
.
1 c ( U 0 V 0 T + c ∗ U 1 Σ 1 V 1 T ) = ( Z + 1 c Y 2 ) − D 1 / c ( Z + 1 c Y 2 ) ∈ 1 c ∂ ∣ ∣ J ∣ ∣ ∗ \frac{1}{c} (U_0V_0^T + c * U_1 \Sigma_1V_1^T) =
(Z + \frac{1}{c}Y_2) - D_{1/c}(Z + \frac{1}{c}Y_2)
\in \frac{1}{c} \partial||J||_* c 1 ( U 0 V 0 T + c ∗ U 1 Σ 1 V 1 T ) = ( Z + c 1 Y 2 ) − D 1 / c ( Z + c 1 Y 2 ) ∈ c 1 ∂ ∣ ∣ J ∣ ∣ ∗ J ∗ = ( Z + 1 c Y 2 ) − 1 c ∂ ∣ ∣ J ∣ ∣ ∗ = D 1 / c ( Z + 1 c Y 2 ) J^* = (Z + \frac{1}{c}Y_2) - \frac{1}{c} \partial||J||_* = D_{1/c}(Z + \frac{1}{c}Y_2) J ∗ = ( Z + c 1 Y 2 ) − c 1 ∂ ∣ ∣ J ∣ ∣ ∗ = D 1 / c ( Z + c 1 Y 2 )
Update Z Z Z
(Step 1 in augmented lagrangian )
∇ Z A ( Z , E , J , Y 1 , Y 2 , c ) = − A T Y 1 + Y 2 − c ∗ A T ( X − A Z − E ) + c ∗ ( Z − J ) = 0 \nabla_Z A(Z,E,J,Y_1,Y_2,c) = -A^T Y_1 + Y_2
- c*A^T(X-AZ-E) + c*(Z-J) = 0 ∇ Z A ( Z , E , J , Y 1 , Y 2 , c ) = − A T Y 1 + Y 2 − c ∗ A T ( X − A Z − E ) + c ∗ ( Z − J ) = 0 ( A T A + I ) Z ∗ = 1 c ( A T Y 1 − Y 2 ) + A T X − A T E + J (A^TA+I)Z^*=\frac{1}{c}(A^TY_1-Y_2) + A^TX - A^TE + J ( A T A + I ) Z ∗ = c 1 ( A T Y 1 − Y 2 ) + A T X − A T E + J
Update E (Step 1 in augmented lagrangian )
E ∗ = argmin E λ ∣ ∣ E ∣ ∣ 2 , 1 − t r ( Y 1 T E ) + c 2 ∣ ∣ X − A Z − E ∣ ∣ F 2 E^* = \text{argmin}_E \lambda||E||_{2,1} - tr(Y_1^TE) + \frac{c}{2}||X-AZ-E||_F^2 E ∗ = argmin E λ ∣ ∣ E ∣ ∣ 2 , 1 − t r ( Y 1 T E ) + 2 c ∣ ∣ X − A Z − E ∣ ∣ F 2 e i = argmin e i λ ∣ ∣ e i ∣ ∣ − y i T e i + c 2 ∣ ∣ w i − e i ∣ ∣ 2 2 ( W = X − A Z ) e_i = \text{argmin}_{e_i} \lambda ||e_i|| - y_i^Te_i + \frac{c}{2} ||w_i - e_i||_2^2 ~~~~~\text{(} W = X-AZ\text{)} e i = argmin e i λ ∣ ∣ e i ∣ ∣ − y i T e i + 2 c ∣ ∣ w i − e i ∣ ∣ 2 2 ( W = X − A Z ) = argmin e i λ ∣ ∣ e i ∣ ∣ − y i T e i + c 2 ( e i T e i − 2 w i T e i + w i T w i ) = \text{argmin}_{e_i} \lambda ||e_i|| - y_i^Te_i + \frac{c}{2}(e_i^Te_i - 2 w_i^Te_i + w_i^Tw_i) = argmin e i λ ∣ ∣ e i ∣ ∣ − y i T e i + 2 c ( e i T e i − 2 w i T e i + w i T w i ) = argmin e i λ c ∣ ∣ e i ∣ ∣ 2 + 1 2 ∣ ∣ e i − ( w i + y i c ) ∣ ∣ 2 2 = \text{argmin}_{e_i} \frac{\lambda}{c} ||e_i||_2 +
\frac{1}{2} ||e_i - (w_i + \frac{y_i}{c})||_2^2 = argmin e i c λ ∣ ∣ e i ∣ ∣ 2 + 2 1 ∣ ∣ e i − ( w i + c y i ) ∣ ∣ 2 2 e i ∗ = ∣ ∣ w i + y i c ∣ ∣ 2 − λ c ∣ ∣ w i + y i c ∣ ∣ 2 ∗ ( w i + y i c ) if ∣ ∣ w i + y i c ∣ ∣ 2 > λ / c e_i^* =
\frac
{||w_i + \frac{y_i}{c}||_2 - \frac{\lambda}{c}}
{||w_i + \frac{y_i}{c}||_2} * (w_i + \frac{y_i}{c}) ~~~~\text{if} ||w_i + \frac{y_i}{c}||_2 > \lambda/c e i ∗ = ∣ ∣ w i + c y i ∣ ∣ 2 ∣ ∣ w i + c y i ∣ ∣ 2 − c λ ∗ ( w i + c y i ) if ∣ ∣ w i + c y i ∣ ∣ 2 > λ / c e i ∗ = 0 otherwise e_i^* = 0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~\text{otherwise} e i ∗ = 0 otherwise This solution can be found here .
Update dual variable Y 1 Y_1 Y 1
and Y 2 Y_2 Y 2
Y 1 ′ = Y 1 + c ∗ ( X − A Z − E ) Y_1' = Y_1 + c*(X-AZ-E) Y 1 ′ = Y 1 + c ∗ ( X − A Z − E ) Y 2 ′ = Y 2 + c ∗ ( Z − J ) Y_2' = Y_2 + c*(Z-J) Y 2 ′ = Y 2 + c ∗ ( Z − J ) 6. Low-Rank Representation on Face Image Recovery
Suppose we have m images of n people (n< m), some of
which are heavily corrupted. Want to recover these
images by low-rank representation.
In this example, we use the dataset itself as the
dictionary (i.e. A = X A=X A = X
), thus the optimization
problem reduce to
min . ∣ ∣ Z ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 \text{min}. ||Z||_* + \lambda ||E||_{2,1} min . ∣ ∣ Z ∣ ∣ ∗ + λ ∣ ∣ E ∣ ∣ 2 , 1 s.t. X = X Z + E \text{s.t.} ~~~X = XZ + E s.t. X = X Z + E Here are some results