Vector spaces, transformations, subspaces
Restriction of a transformation, quotient space, etc. Definitions omitted.
Null space, span, rank-nullity theorem
For any subset $\mathcal S \subset \mathcal V$, define the vector subspace
$\mathrm{Span}(\mathcal S) := \{\alpha_1 s_1 + \dots + \alpha_k s_k \mid k \in \mathbb N,\ \alpha_i \text{ scalars},\ s_i \in \mathcal S\}$
to be the set of all finite linear combinations from $\mathcal S$. And define
$\mathrm{Rank}(\mathcal V) := \min\{|\mathcal S| \mid \mathrm{Span}(\mathcal S) = \mathcal V\}$
to be the size of the smallest subset which spans $\mathcal V$. For a linear transformation $T: \mathcal V \to \mathcal W$, define the following subspaces of $\mathcal V$ and $\mathcal W$:
$\mathrm{Null}(T) := \{x \in \mathcal V \mid Tx = 0\}, \qquad \mathrm{Span}(T) := \{Tx \mid x \in \mathcal V\},$
and write $\mathrm{Rank}(T) := \mathrm{Rank}(\mathrm{Span}(T))$.
Theorem. For any linear transformation $T: \mathcal V \to \mathcal W$, we have
$\mathrm{Rank}(\mathrm{Span}(T)) + \mathrm{Rank}(\mathrm{Null}(T)) = \mathrm{Rank}(\mathcal V).$
Proof.
$\mathrm{Rank}(\mathrm{Span}(T)) + \mathrm{Rank}(\mathrm{Null}(T)) = \mathrm{Rank}(\mathcal V/\mathrm{Null}(T)) + \mathrm{Rank}(\mathrm{Null}(T)) = \mathrm{Rank}\big((\mathcal V/\mathrm{Null}(T)) \times \mathrm{Null}(T)\big) = \mathrm{Rank}(\mathcal V),$
where the first and last steps are by group-theoretic isomorphism theorems. Omitted are the simple proofs that rank respects isomorphism and turns vector space products into sums. $\square$
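As a quick numerical sanity check (an illustrative NumPy sketch; the particular matrix and tolerance are chosen only for demonstration), we can build a matrix with a known null space and verify that rank plus nullity equals the dimension of the domain:

```python
import numpy as np

rng = np.random.default_rng(0)
# A 5x4 matrix with a 2-dimensional null space: the last two columns are
# linear combinations of the first two.
B = rng.normal(size=(5, 2))
A = np.hstack([B, B @ rng.normal(size=(2, 2))])

# Rank = dim Span(A); a basis of Null(A) is read off from the right
# singular vectors whose singular values (numerically) vanish.
_, s, Vt = np.linalg.svd(A)
rank = int((s > 1e-10).sum())
null_basis = Vt[rank:].T                      # columns span Null(A)
assert np.allclose(A @ null_basis, 0)
assert rank + null_basis.shape[1] == A.shape[1]   # rank-nullity
```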
Projection
We call a transform $T: \mathcal V \to \mathcal V$ a projection if $T^2 = T$, or equivalently $(T \mid \mathrm{Span}(T)) = 1$.
Proposition. If $P$ is a projection then so is $(1 - P)$.
Proof. $(1 - P)^2 = 1 - 2P + P^2 = 1 - P$. $\square$
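A minimal numerical illustration (the specific matrix is just an example, and a deliberately oblique one, to emphasize that the definition does not require $P = P^\top$):

```python
import numpy as np

# An oblique projection P (P @ P = P but P != P.T), and its complement 1 - P.
P = np.array([[1.0, 1.0],
              [0.0, 0.0]])
I = np.eye(2)
assert np.allclose(P @ P, P)                     # P is a projection
assert np.allclose((I - P) @ (I - P), I - P)     # so is 1 - P
```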
Inner product and adjoint
Definition of inner product omitted.
For a transformation $T: \mathcal V \to \mathcal W$, we define the adjoint $T^*: \mathcal W \to \mathcal V$ to be the transformation which satisfies
$(\forall x \in \mathcal V, y \in \mathcal W):\langle Tx, y \rangle = \langle x, T^*y \rangle$
Proposition. $\mathrm{Span}(T)^\perp = \mathrm{Null}(T^*)$ and $\mathrm{Null}(T)^\perp = \mathrm{Span}(T^*)$.
Proof. For $y \in \mathcal W$,
$y \in \mathrm{Null}(T^*) \iff (\forall x \in \mathcal V)\ \langle x, T^*y \rangle = 0 \iff (\forall x \in \mathcal V)\ \langle Tx, y \rangle = 0 \iff y \in \mathrm{Span}(T)^\perp,$
which gives the first identity. Applying it to $T^*$ (using $T^{**} = T$) gives $\mathrm{Span}(T^*)^\perp = \mathrm{Null}(T)$, and taking orthogonal complements yields the second.
$\square$
Remark: For a matrix, this just says that the null space of its transpose is the orthogonal complement of the span of its columns.
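A small NumPy sketch of this special case (the random matrix and the SVD-based null-space computation are illustrative choices, not part of the development):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))
# A basis of Null(A^T), read off from the right singular vectors of A^T
# with zero singular value.
_, s, Vt = np.linalg.svd(A.T)
null_At = Vt[int((s > 1e-10).sum()):].T       # columns span Null(A^T)
assert np.allclose(A.T @ null_At, 0)
assert np.allclose(null_At.T @ A, 0)          # Null(A^T) is orthogonal to Span(A)
```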
Matrices
A matrix $M \in {\mathbb R}^{n \times m}$ corresponds to a linear transformation ${\mathbb R}^m \to {\mathbb R}^n$ via
$(Mx)_i = \langle M_i, x \rangle = \sum_{j=1}^m M_{ij}x_j,$
where $M_i$ denotes the $i$-th row of $M$.
The span of a matrix as a transformation is equal to the span of its columns.
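For instance (a trivial NumPy check with an arbitrary example matrix):

```python
import numpy as np

# Mx is the vector of inner products with the rows of M, equivalently a
# linear combination of the columns of M with coefficients x_j.
M = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
x = np.array([2.0, -1.0, 1.0])
assert np.allclose(M @ x, [M[i] @ x for i in range(2)])
assert np.allclose(M @ x, sum(x[j] * M[:, j] for j in range(3)))
```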
Theorem. $\mathrm{Rank}(M) = 1$ if and only if $M = uv^\top$ for some nonzero vectors $u, v$.
Proof. The $\Leftarrow$ direction is obvious. For $\Rightarrow$, let $u \in \mathrm{Span}(M)$ be a unit vector, and let $v = M^\top u$, so that $uv^\top = uu^\top M$.
Suppose that there exists $x$ such that $(uv^\top - M)x \neq 0$. Then $(uu^\top - I)Mx \neq 0$, which is impossible, since $Mx$ must lie in $\mathrm{Span}(u)$, and $\mathrm{Null}(uu^\top - I) = \mathrm{Span}(u)$. Hence $M = uv^\top$. $\square$
A rank-one orthogonal projection (i.e., $P^2 = P$ and $P = P^\top$) is of the form $uu^\top$ for $\|u\| = 1$.
A rank-$n$ orthogonal projection is of the form $u_1u_1^\top + \dots + u_nu_n^\top$ for orthonormal $u_1, \dots, u_n$.
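A sketch of the rank-$n$ case, building the orthonormal $u_i$ from a QR factorization (the use of QR and the particular dimensions are incidental):

```python
import numpy as np

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(6, 3)))       # 3 orthonormal columns u_1, u_2, u_3
P = sum(np.outer(Q[:, i], Q[:, i]) for i in range(3))

assert np.allclose(P, Q @ Q.T)                     # same projection, written compactly
assert np.allclose(P @ P, P) and np.allclose(P, P.T)
assert np.linalg.matrix_rank(P) == 3
```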
Rank-one approximation and singular value decomposition
Let $A \in {\mathbb R}^{n \times d}$ be an arbitrary matrix.
Lemma. Let $v \in {\mathbb R}^d$ be any unit vector. Then the best approximation of $A$ of the form $uv^\top$ (in Frobenius norm) is $Avv^\top$. And if $v \perp \mathrm{Null}(A)$ (which in particular implies $Av \neq 0$), then $\mathrm{rank}(A - Avv^\top) = \mathrm{rank}(A) - 1$.
Proof. For each row $A_i$, the best approximation in the direction of $v$ is $(A_iv)v$, proving that $u = Av$ is the overall argmin. Additionally, if $v \perp \mathrm{Null}(A)$, then we have
$\mathrm{Null}(A - Avv^\top) = \mathrm{Null}\big(A(I - vv^\top)\big) = \mathrm{Null}(A) \oplus \mathrm{Span}(v),$
so that the rank-nullity theorem yields the result. $\square$
Remark: Given a matrix, we can reduce its Frobenius norm and rank by successively removing rank-one approximations $Avv^\top$ for any unit $v$ orthogonal to the null space.
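A sketch of one such deflation step (choosing $v$ as a normalized row of $A$, which lies in $\mathrm{Span}(A^\top) = \mathrm{Null}(A)^\perp$, is just one convenient way to meet the orthogonality condition):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))   # a 5x4 matrix of rank 3
v = A[0] / np.linalg.norm(A[0])                          # unit vector in Span(A^T)
A_next = A - np.outer(A @ v, v)                          # remove the rank-one piece A v v^T

assert np.linalg.matrix_rank(A_next) == np.linalg.matrix_rank(A) - 1
assert np.linalg.norm(A_next, "fro") < np.linalg.norm(A, "fro")
```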
Proposition. For any orthonormal sequence of $d$ vectors $v_1, \dots, v_d$ in ${\mathbb R}^d$, we have
$A = AVV^\top = \sum_{i=1}^d Av_iv_i^\top,$
where $V$ has columns $v_1, \dots, v_d$; this is immediate since such a $V$ is orthogonal, so $VV^\top = I$.
Lemma. The best rank-one approximation to $A$ is $uv^\top$ where $v$ is a unit eigenvector of $A^\top A$ with maximal eigenvalue and $u = Av$ is an eigenvector of $AA^\top$ with maximal eigenvalue.
Proof.
$\min_{u,\ \|v\|=1} \|A - uv^\top\|_F^2 = \min_{\|v\|=1} \|A - Avv^\top\|_F^2 = \min_{\|v\|=1}\left(\|A\|_F^2 - \|Avv^\top\|_F^2\right) = \|A\|_F^2 - \max_{\|v\|=1}|Av|^2,$
where the first step is by the previous lemma, the second is the Pythagorean theorem applied row by row, and the last step is because $\|uv^\top\|_F^2 = \|u\|_2^2\|v\|_2^2$ for any pair of vectors $u, v$. Therefore, we must select a $v$ which maximizes $f(v) := |Av|^2 = v^\top A^\top Av$; a maximizer exists by the compactness of the unit sphere. Lagrange multipliers yield $A^\top Av = \lambda v$ for some $\lambda \in {\mathbb R}$, i.e., $v$ must be an eigenvector of $A^\top A$; since $f(v) = \lambda$ for such a $v$, it must be an eigenvector with maximal eigenvalue. By symmetry the same proof yields that $u$ must be an eigenvector of $AA^\top$ with maximal eigenvalue. $\square$
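Numerically, the eigenvector recipe agrees with the leading term of a library SVD (an illustrative check; the random matrix and the use of numpy's `eigh`/`svd` are incidental):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 4))

# Top eigenvector of A^T A, and the resulting rank-one approximation (Av) v^T.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
v = eigvecs[:, -1]                           # unit eigenvector with maximal eigenvalue
best = np.outer(A @ v, v)

# Compare with the leading term of the SVD (the sign ambiguity in v cancels).
U, s, Vt = np.linalg.svd(A)
assert np.allclose(best, s[0] * np.outer(U[:, 0], Vt[0]))
```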
Theorem. Suppose $A \in {\mathbb R}^{n \times d}$ is rank-$r$ and $\sigma_1u_1v_1^\top, \dots, \sigma_ru_rv_r^\top$ is any sequence such that $\sigma_{k+1}u_{k+1}v_{k+1}^\top$ is an optimal rank-one approximation of $A - \sum_1^k \sigma_iu_iv_i^\top$, where $\sigma_i \in {\mathbb R}$ and $|u_i| = |v_i| = 1$. Then both $u_1, \dots, u_r$ and $v_1, \dots, v_r$ are orthonormal and $A = \sum_{i=1}^r \sigma_i u_iv_i^\top$.
Proof. By induction. We proved that $u_{k+1}$ is an eigenvector of
$\left(A - \sum_1^k \sigma_iu_iv_i^\top\right)\left(A - \sum_1^k \sigma_iu_iv_i^\top\right)^\top = \left(I - \sum_1^k u_iu_i^\top\right) AA^\top \left(I - \sum_1^k u_iu_i^\top\right)^\top,$
so in particular, it must be in the span of $(I - \sum_1^k u_iu_i^\top)$, which is the projection matrix onto the subspace orthogonal to all $u_1, \dots, u_k$; hence $u_{k+1}$ is orthogonal to $u_1, \dots, u_k$. Likewise, we proved that $v_{k+1}$ is an eigenvector of
$\left(A - \sum_1^k \sigma_iu_iv_i^\top\right)^\top\left(A - \sum_1^k \sigma_iu_iv_i^\top\right) = \left(I - \sum_1^k v_iv_i^\top\right)^\top A^\top A \left(I - \sum_1^k v_iv_i^\top\right),$
hence it must be in the span of $(I - \sum_1^k v_iv_i^\top)^\top = (I - \sum_1^k v_iv_i^\top)$, which similarly makes it orthogonal to every $v_1, \dots, v_k$. Finally, writing $A_k := A - \sum_1^k \sigma_iu_iv_i^\top$, each optimal step removes $A_kv_{k+1}v_{k+1}^\top$ where $v_{k+1}$ is an eigenvector of $A_k^\top A_k$ with maximal (hence, while $A_k \neq 0$, nonzero) eigenvalue, so $v_{k+1} \in \mathrm{Span}(A_k^\top) = \mathrm{Null}(A_k)^\perp$; by the first lemma the rank therefore drops by one at each step, and after $r$ steps the residual is zero, i.e., $A = \sum_{i=1}^r \sigma_iu_iv_i^\top$. $\square$
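A sketch of the greedy construction (here each optimal rank-one approximation of the residual is read off from the residual's SVD, which is just a convenient oracle for the optimum):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))    # rank 3
R = A.copy()
us, vs, sigmas = [], [], []
for _ in range(3):
    U, s, Vt = np.linalg.svd(R)
    us.append(U[:, 0])
    vs.append(Vt[0])
    sigmas.append(s[0])
    R = R - s[0] * np.outer(U[:, 0], Vt[0])               # deflate the residual

# The collected u_i and v_i are orthonormal, and they reconstruct A.
Ur = np.column_stack(us)
Vr = np.column_stack(vs)
assert np.allclose(Ur.T @ Ur, np.eye(3))
assert np.allclose(Vr.T @ Vr, np.eye(3))
assert np.allclose(A, sum(sig * np.outer(u, v) for sig, u, v in zip(sigmas, us, vs)))
```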
Characterizing everything else in terms of SVD
For a rank-$r$ matrix $A \in {\mathbb R}^{n \times d}$, let $\sigma_1u_1v_1^\top, \dots, \sigma_ru_rv_r^\top$ be any sequence of optimal rank-one approximations reducing the rank of $A$, as above. Let $U \in {\mathbb R}^{n \times n}$ have columns $u_1, \dots, u_r, u_{r+1}, \dots, u_n$ and $V \in {\mathbb R}^{d \times d}$ have columns $v_1, \dots, v_r, v_{r+1}, \dots, v_d$, where $u_{r+j}$ and $v_{r+j}$ are arbitrary vectors making $U$ and $V$ orthogonal. And define $\sigma_{r+1}, \dots, \sigma_{\min(n,d)}$ to be zero. Then the above theorem yields
$A = U\Sigma V^\top,$
where $\Sigma \in {\mathbb R}^{n \times d}$ has $\Sigma_{ii} = \sigma_i$ and is otherwise zero. Writing $s := (\sigma_1, \dots, \sigma_{\min(n,d)})$ for the vector of singular values, we have the following facts (checked numerically in the sketch after the list).
- $AA^\top = U\Sigma\Sigma^\top U^\top$
- $A^\top A = V\Sigma^\top\Sigma V^\top$
- $A$ is an isometry ($\|Ax\|_2 = \|x\|_2$ for all $x$) if and only if $\Sigma^\top\Sigma = I$; it is orthogonal if and only if additionally $n = d$, so that $\Sigma = I$
- $A$ is positive semidefinite if and only if $n = d$ and one can take $U = V$ with every $\sigma_i \geq 0$
- $\|A\|_F^2 = \sum_{i=1}^{\min(n,d)} \sigma_i^2 = \|s\|^2 = \|\Sigma\|_F^2$
- $\|A\|_*^2 = \max \{\sigma_i^2 \mid i \in \{1, \dots, \min(n,d)\}\} = \|s\|_\infty^2 = \|\Sigma\|_*^2$, where $\|\cdot\|_*$ denotes the operator norm $\max_{\|x\|=1}\|Ax\|_2$
- $A^+ := V\Sigma^+ U^\top$ where $\Sigma^+ \in {\mathbb R}^{d \times n}$ has $\Sigma^+_{ii} := \sigma_i^+$, with $\sigma^+ := 1/\sigma$ if $\sigma \neq 0$ else $\sigma^+ := 0$.
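A numerical sketch of several of these facts (numpy's `svd` returns exactly the factors above, up to the usual sign ambiguities, so this is only a consistency check):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 5, 3
A = rng.normal(size=(n, d))
U, s, Vt = np.linalg.svd(A)                # A = U @ Sigma @ Vt
Sigma = np.zeros((n, d))
Sigma[:d, :d] = np.diag(s)

assert np.allclose(A, U @ Sigma @ Vt)
assert np.allclose(A @ A.T, U @ Sigma @ Sigma.T @ U.T)
assert np.allclose(A.T @ A, Vt.T @ Sigma.T @ Sigma @ Vt)
assert np.isclose(np.linalg.norm(A, "fro") ** 2, np.sum(s ** 2))
assert np.isclose(np.linalg.norm(A, 2), np.max(s))          # operator norm
Sigma_plus = np.zeros((d, n))
Sigma_plus[:d, :d] = np.diag(1.0 / s)                        # all sigma_i > 0 here
assert np.allclose(np.linalg.pinv(A), Vt.T @ Sigma_plus @ U.T)
```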
Proposition. For every $x \in \mathrm{Null}(A)^\perp$, we have $A^+Ax = x$, and for every $y \in \mathrm{Span}(A)$, we have $y^\top AA^+ = y^\top$.
Equivalently, $A^+AA^\top = A^\top$ and $AA^+A = A$.
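An illustrative check with a deliberately rank-deficient $A$ (the construction of $x$ and $y$ from $\mathrm{Span}(A^\top)$ and $\mathrm{Span}(A)$ is the point; the specific random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))   # rank 2, so Null(A) is nontrivial
A_plus = np.linalg.pinv(A)

x = A.T @ rng.normal(size=5)            # x in Span(A^T) = Null(A)^perp
y = A @ rng.normal(size=4)              # y in Span(A)
assert np.allclose(A_plus @ A @ x, x)
assert np.allclose(A @ A_plus @ y, y)
assert np.allclose(A_plus @ A @ A.T, A.T)
assert np.allclose(A @ A_plus @ A, A)
```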