Vector spaces, transformations, subspaces

Restriction of a transformation, quotient space, etc. Definitions omitted.

Null space, span, rank-nullity theorem

For any subset $\mathcal S \subset \mathcal V$, define the vector subspace

$$\mathrm{Span}(\mathcal S) := \{c_1v_1 + \dots + c_nv_n \mid v_i \in \mathcal S, c_i \in {\mathbb R}\}$$

to be the set of all finite linear combinations of elements of $\mathcal S$. And define

$$\mathrm{Rank}(\mathcal V) := \min\{|\mathcal S| : \mathcal S \subset \mathcal V, \mathrm{Span}(\mathcal S) = \mathcal V\}$$

to be the size of the smallest subset which spans $\mathcal V$. For a linear transformation $T: \mathcal V \to \mathcal W$, define the following subspaces of $\mathcal V$ and $\mathcal W$:

$$\mathrm{Null}(T) := \{v \in \mathcal V \mid Tv = 0\} = T^{-1}(0)$$
$$\mathrm{Span}(T) := \{Tv \mid v \in \mathcal V\} = T(\mathcal V)$$

Theorem. For any linear transformation $T: \mathcal V \to \mathcal W$, we have

$$\mathrm{Rank}(\mathrm{Null(T)}) + \mathrm{Rank}(\mathrm{Span}(T)) = \mathrm{Rank}(\mathcal V)$$

Proof.

\begin{align*} \mathrm{Rank}(\mathrm{Null(T)}) + \mathrm{Rank}(\mathrm{Span}(T)) &= \mathrm{Rank}(\mathrm{Null}(T)) + \mathrm{Rank}(\mathcal V/\mathrm{Null}(T))\\ &= \mathrm{Rank}(\mathrm{Null}(T) \times \mathcal V/\mathrm{Null}(T))\\ &= \mathrm{Rank}(\mathcal V) \\ \end{align*}

Where the first and last steps follow from the (group-theoretic) isomorphism theorems: $\mathrm{Span}(T) \cong \mathcal V/\mathrm{Null}(T)$ and $\mathcal V \cong \mathrm{Null}(T) \times \mathcal V/\mathrm{Null}(T)$. Omitted are the simple proofs that rank respects isomorphism and turns vector space products into sums. $\square$
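
A quick numerical sanity check (a numpy sketch; the $4 \times 6$ matrix below is a hypothetical example, not anything from the text):

```python
import numpy as np

# A hypothetical 4x6 matrix of rank 2, viewed as a linear map R^6 -> R^4.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))

# Rank(Span(T)): dimension of the column space.
rank_image = np.linalg.matrix_rank(A)

# Rank(Null(T)): the right-singular vectors with (numerically) zero singular
# value form an orthonormal basis of the null space; count them.
_, s, Vt = np.linalg.svd(A)
s_full = np.zeros(A.shape[1]); s_full[:s.size] = s
null_basis = Vt[s_full < 1e-10]

assert np.allclose(A @ null_basis.T, 0)                   # really in Null(A)
assert null_basis.shape[0] + rank_image == A.shape[1]     # rank-nullity: 4 + 2 == 6
```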


Projection

We call a transformation $T: \mathcal V \to \mathcal V$ a projection if $T^2 = T$, or equivalently if the restriction $(T \mid \mathrm{Span}(T))$ is the identity.


Proposition. If $P$ is a projection then so is $(1 - P)$.

Proof. $(1 - P)^2 = 1 - 2P + P^2 = 1 - P$. $\square$
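
A tiny numpy illustration (hypothetical example): an orthogonal projection $P = QQ^\top$ built from orthonormal columns is idempotent, and so is $I - P$.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # orthonormal basis of a random plane in R^5
P = Q @ Q.T                                        # orthogonal projection onto span(Q)
I = np.eye(5)

assert np.allclose(P @ P, P)                       # P^2 = P
assert np.allclose((I - P) @ (I - P), I - P)       # (I - P)^2 = I - P
assert np.allclose(P @ (I - P), 0)                 # the two projections annihilate each other
```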


Inner product and adjoint

Definition of inner product omitted.

For a transformation $T: \mathcal V \to \mathcal W$ between inner product spaces, we define the adjoint $T^*: \mathcal W \to \mathcal V$ to be the unique transformation which satisfies

$(\forall x \in \mathcal V, y \in \mathcal W):\langle Tx, y \rangle = \langle x, T^*y \rangle$


Proposition. $\mathrm{Span}(T)^\perp = \mathrm{Null}(T^*)$ and $\mathrm{Null}(T)^\perp = \mathrm{Span}(T^*)$.

Proof.

\begin{align*} \mathrm{Span}(T)^\perp &= \{y \in \mathcal W \mid (\forall z \in \mathrm{Span}(T)): \langle z, y \rangle = 0\} \\ &= \{y \in \mathcal W \mid (\forall x \in \mathcal V): \langle Tx, y \rangle = 0\} \\ &= \{y \in \mathcal W \mid (\forall x \in \mathcal V): \langle x, T^*y \rangle = 0\} \\ &= \{y \in \mathcal W \mid T^*y = 0\} \\ &= \mathrm{Null(T^*)} \end{align*}

The second identity follows by applying the first to $T^*$ (using $T^{**} = T$) and taking orthogonal complements. $\square$

Remark: For a matrix $A$ and its transpose, this just says that $\mathrm{Null}(A^\top)$ is the orthogonal complement of the span of the columns of $A$.
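
Numerically (a sketch on a hypothetical rank-deficient matrix): an orthonormal basis of $\mathrm{Null}(A^\top)$ is orthogonal to every column of $A$, and the dimensions of $\mathrm{Span}(A)$ and $\mathrm{Null}(A^\top)$ add up to $n$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # 5x3 of rank 2 (hypothetical)

# Basis of Null(A^T): right-singular vectors of A^T with zero singular value.
_, s, Vt = np.linalg.svd(A.T)
s_full = np.zeros(A.shape[0]); s_full[:s.size] = s
N = Vt[s_full < 1e-10]                       # rows form an orthonormal basis of Null(A^T)

assert np.allclose(N @ A, 0)                                   # orthogonal to every column of A
assert np.linalg.matrix_rank(A) + N.shape[0] == A.shape[0]     # dim Span(A) + dim Null(A^T) = 5
```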


Matrices

A matrix $M \in {\mathbb R}^{n \times m}$ corresponds to a linear transformation ${\mathbb R}^m \to {\mathbb R}^n$ via

$(Mx)_i = \langle M_i, x \rangle = \sum_{j=1}^m M_{ij}x_j$

where $M_i$ denotes the $i$-th row of $M$.

The span of a matrix as a transformation is equal to the span of its columns.


Theorem. $\mathrm{Rank}(M) = 1$ if and only if $M = uv^\top$ for some nonzero vectors $u, v$.

Proof. The $\Leftarrow$ direction is obvious. For $\Rightarrow$, note that $\mathrm{Span}(M)$ is one-dimensional; let $u \in \mathrm{Span}(M)$ be a unit vector, and let $v = M^\top u$.

Suppose that there exists $x$ such that $(uv^\top - M)x \neq 0$. Since $v^\top x = u^\top Mx$, this says $(uu^\top - I)Mx \neq 0$, which is impossible, since $Mx$ must be in $\mathrm{Span}(M) = \mathrm{Span}(u)$, and $\mathrm{Null}(uu^\top - I) = \mathrm{Span}(u)$. Hence $M = uv^\top$. $\square$
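
The proof is constructive; here is a numpy sketch of it on a hypothetical rank-one matrix (normalizing a nonzero column to get the unit vector $u$).

```python
import numpy as np

rng = np.random.default_rng(3)
M = np.outer(rng.standard_normal(4), rng.standard_normal(3))    # a rank-1 matrix (hypothetical)

col = M[:, np.argmax(np.linalg.norm(M, axis=0))]   # any nonzero column spans Span(M)
u = col / np.linalg.norm(col)                      # unit vector in Span(M)
v = M.T @ u                                        # as in the proof

assert np.allclose(np.outer(u, v), M)
assert np.linalg.matrix_rank(M) == 1
```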


A rank-1 orthogonal (i.e. self-adjoint) projection is of the form $uu^\top$ for $\|u\| = 1$.

A rank-$n$ orthogonal projection is of the form $u_1u_1^\top + \dots + u_nu_n^\top$ for orthonormal $u_i$.
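
A sketch of the rank-$n$ case in numpy, with orthonormal $u_i$ taken from a QR factorization of a hypothetical random matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n, dim = 3, 6
U, _ = np.linalg.qr(rng.standard_normal((dim, n)))       # orthonormal u_1, ..., u_n as columns

P = sum(np.outer(U[:, i], U[:, i]) for i in range(n))    # u_1 u_1^T + ... + u_n u_n^T
assert np.allclose(P, U @ U.T)
assert np.allclose(P @ P, P)                             # idempotent
assert np.allclose(P, P.T)                               # self-adjoint
assert np.linalg.matrix_rank(P) == n
```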


Rank-one approximation and singular value decomposition

Let $A \in {\mathbb R}^{n \times d}$ be an arbitrary matrix.


Lemma. Let $v \in {\mathbb R}^d$ be any unit vector. Then the best rank-one approximation of $A$ (in Frobenius norm) of the form $uv^\top$ is $Avv^\top$. And if $v \perp \mathrm{Null}(A)$ (which in particular implies $Av \neq 0$), then $\mathrm{Rank}(A - Avv^\top) = \mathrm{Rank}(A) - 1$.

Proof. For each row $A_i$, the best approximation by a multiple of $v$ is the orthogonal projection $(A_iv)v$, proving that $u = Av$ is the overall argmin. Additionally, if $v \perp \mathrm{Null}(A)$, then we have

\begin{align*} \mathrm{Rank}(\mathrm{Null}(A - Avv^\top)) &= \mathrm{Rank}(\mathrm{Null}(A(I - vv^\top))) \\ &= \mathrm{Rank}(\mathrm{Null}(A) \oplus \mathrm{Span}(v)) \\ &= \mathrm{Rank}(\mathrm{Null}(A)) + 1 \end{align*}

where the middle step holds because $(I - vv^\top)x$, the component of $x$ orthogonal to $v$, lies in $\mathrm{Null}(A)$ exactly when $x \in \mathrm{Null}(A) \oplus \mathrm{Span}(v)$ (using $\mathrm{Null}(A) \perp v$). The rank-nullity theorem then yields the result. $\square$

Remark: Given a matrix, we can successively reduce its Frobenius norm and rank by removing rank-1 approximations $Avv^\top$ for any unit $v$ orthogonal to the null space (equivalently, any unit $v \in \mathrm{Span}(A^\top)$).
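
A numpy sketch of one such reduction step (hypothetical matrix), with $v$ chosen in $\mathrm{Span}(A^\top)$ so that the hypothesis of the lemma holds:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # 6x5 of rank 3 (hypothetical)

v = A.T @ rng.standard_normal(6)       # a vector in Span(A^T) = Null(A)^perp ...
v /= np.linalg.norm(v)                 # ... normalized to a unit vector

B = A - np.outer(A @ v, v)             # remove the rank-one piece A v v^T
assert np.linalg.matrix_rank(B) == np.linalg.matrix_rank(A) - 1       # rank drops by one
assert np.linalg.norm(B, 'fro') < np.linalg.norm(A, 'fro')            # Frobenius norm decreases
```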


Proposition. For any orthonormal sequence of $d$ vectors $v_1, \dots, v_d \in {\mathbb R}^d$, we have

$$A = Av_1v_1^\top + \dots + Av_dv_d^\top = (AV)V^\top$$

where $V$ has columns $v_1, \dots, v_d$; this holds because $\sum_{i=1}^d v_iv_i^\top = VV^\top = I$.
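
In numpy (with a hypothetical random orthogonal $V$):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 5))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # columns v_1, ..., v_5 are orthonormal

recon = sum(np.outer(A @ V[:, i], V[:, i]) for i in range(5))   # sum of A v_i v_i^T
assert np.allclose(recon, A)
assert np.allclose((A @ V) @ V.T, A)
```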


Lemma. The best rank-1 approximation to $A$ is $uv^\top$ where $v$ is an eigenvector of $A^\top A$ with maximal eigenvalue and $u = Av$ is an eigenvector of $AA^\top$ with maximal eigenvalue.

Proof.

\begin{align*} \mathop{\mathrm{arg\,min}\,}_{|v| = 1} \|A - Avv^\top\|_F^2 &= \mathop{\mathrm{arg\,min}\,}_{|v|=1} (\|A\|_F^2 - \|Avv^\top\|_F^2) \\ &= \mathop{\mathrm{arg\,max}\,}_{|v|=1} \|Avv^\top\|_F^2 \\ &= \mathop{\mathrm{arg\,max}\,}_{|v|=1} |Av|^2 \end{align*}

where the first step is Pythagoras applied to each row (since $A_i - (A_iv)v \perp (A_iv)v$), and the last step is because $\|uv^\top\|_F^2 = \|u\|_2^2\|v\|_2^2$ for any pair of vectors $u, v$. Therefore, we must select a $v$ which maximizes $f(v) := |Av|^2 = v^\top A^\top Av$; a maximizer exists by the compactness of the unit sphere. Lagrange multipliers yield $A^\top Av = \lambda v$ for some $\lambda \in {\mathbb R}$, i.e., $v$ must be an eigenvector of $A^\top A$; since $f(v) = \lambda$ for a unit eigenvector with eigenvalue $\lambda$, it must be an eigenvector with maximal eigenvalue. Finally, $AA^\top u = AA^\top Av = A(\lambda v) = \lambda u$, so $u = Av$ is an eigenvector of $AA^\top$ with the same eigenvalue, which is also maximal for $AA^\top$ (the nonzero eigenvalues of $A^\top A$ and $AA^\top$ coincide). $\square$
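
A numpy sketch (hypothetical matrix) comparing the lemma's construction with the rank-one truncation of numpy's SVD:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))

# v: unit eigenvector of A^T A with maximal eigenvalue (eigh returns ascending eigenvalues).
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
v = eigvecs[:, -1]
best = np.outer(A @ v, v)                       # the lemma's best rank-one approximation

U, s, Vt = np.linalg.svd(A)
svd_rank1 = s[0] * np.outer(U[:, 0], Vt[0])     # rank-one truncation of the SVD

assert np.isclose(np.linalg.norm(A - best, 'fro'),
                  np.linalg.norm(A - svd_rank1, 'fro'))   # same approximation error
```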


Theorem. Suppose $A \in {\mathbb R}^{n \times d}$ is rank-$r$ and $\sigma_1u_1v_1^\top, \dots, \sigma_ru_rv_r^\top$ is any sequence such that $\sigma_{k+1}u_{k+1}v_{k+1}^\top$ is an optimal rank-one approximation of $A - \sum_1^k \sigma_iu_iv_i^\top$, where $\sigma_i \in {\mathbb R}$ and $|u_i| = |v_i| = 1$. Then both $u_1, \dots, u_r$ and $v_1, \dots, v_r$ are orthonormal and $A = \sum_{i=1}^r \sigma_i u_iv_i^\top$.

Proof. By induction: suppose $u_1, \dots, u_k$ and $v_1, \dots, v_k$ are orthonormal and satisfy $Av_i = \sigma_iu_i$ and $A^\top u_i = \sigma_iv_i$ (so that $A - \sum_1^k \sigma_iu_iv_i^\top = (I - \sum_1^k u_iu_i^\top)A = A(I - \sum_1^k v_iv_i^\top)$). We proved that $u_{k+1}$ is an eigenvector, with maximal (hence nonzero) eigenvalue, of

$(A - \sum_1^k \sigma_iu_iv_i^\top)(A - \sum_1^k \sigma_iu_iv_i^\top)^\top = (I - \sum_1^k u_iu_i^\top) AA^\top (I - \sum_1^k u_iu_i^\top)^\top$

so in particular, it must be in the span of $(I - \sum_1^k u_iu_i^\top)$, which is the projection matrix onto the subspace orthogonal to all of $u_1, \dots, u_k$; hence $u_{k+1} \perp u_1, \dots, u_k$. Likewise, we proved that $v_{k+1}$ is an eigenvector of

$(A - \sum_1^k \sigma_iu_iv_i^\top)^\top(A - \sum_1^k \sigma_iu_iv_i^\top) = (I - \sum_1^k v_iv_i^\top)^\top A^\top A (I - \sum_1^k v_iv_i^\top)$

hence it must be in the span of $(I - \sum_1^k v_iv_i^\top)^\top = (I - \sum_1^k v_iv_i^\top)$, which similarly makes it orthogonal to every $v_1, \dots, v_k$. Finally, each $v_{k+1}$ is orthogonal to $\mathrm{Null}(A - \sum_1^k \sigma_iu_iv_i^\top)$ (being an eigenvector of the Gram matrix with nonzero eigenvalue), so by the earlier lemma each step reduces the rank by one; after $r$ steps the remainder is zero and $A = \sum_{i=1}^r \sigma_iu_iv_i^\top$. $\square$
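
A numpy sketch of the greedy procedure on a hypothetical matrix, computing each optimal rank-one approximation via the SVD of the current remainder (as justified by the previous lemma), and checking orthonormality and exact reconstruction:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # 5x4 of rank 3 (hypothetical)
r = np.linalg.matrix_rank(A)

R = A.copy()
us, vs, sigmas = [], [], []
for _ in range(r):
    _, s, Vt = np.linalg.svd(R)        # optimal rank-one approximation of the remainder
    v = Vt[0]
    u = R @ v / s[0]
    us.append(u); vs.append(v); sigmas.append(s[0])
    R = R - s[0] * np.outer(u, v)      # peel it off

U, V = np.column_stack(us), np.column_stack(vs)
assert np.allclose(U.T @ U, np.eye(r))     # u_1, ..., u_r orthonormal
assert np.allclose(V.T @ V, np.eye(r))     # v_1, ..., v_r orthonormal
assert np.allclose(sum(s * np.outer(u, v) for s, u, v in zip(sigmas, us, vs)), A)
```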


Characterizing everything else in terms of SVD

For a rank-$r$ matrix $A \in {\mathbb R}^{n \times d}$, let $\sigma_1u_1v_1^\top, \dots, \sigma_ru_rv_r^\top$ be any sequence of optimal rank-one approximations reducing the rank of $A$ as above. Let $U$ have columns $u_1, \dots, u_r, u_{r+1}, \dots, u_n$ and $V$ have columns $v_1, \dots, v_r, v_{r+1}, \dots, v_d$, where $u_{r+j}$ and $v_{r+j}$ are arbitrary vectors chosen so that the columns of $U$ and of $V$ are orthonormal. And define $\sigma_{r+1}, \dots, \sigma_{\min(n,d)}$ to be zero. Then the above theorem yields

$$A = \sum_{i=1}^r \sigma_iu_iv_i^\top = \sum_{i=1}^{\min(n,d)} \sigma_iu_iv_i^\top = U\Sigma V^\top$$

where $\Sigma \in {\mathbb R}^{n \times d}$ has $\Sigma_{ii} = \sigma_i$ and is otherwise zero. We have the following facts.

  • $AA^\top = U\Sigma\Sigma^\top U^\top$
  • $A^\top A = V\Sigma^\top\Sigma V^\top$
  • $A$ is an isometry ($|Ax| = |x|$ for all $x$) if and only if $|\sigma_i| = 1$ for every $i \leq d$ (so in particular $n \geq d$); if moreover $n = d$, so that $\Sigma = I$, then $A$ is orthogonal
  • If $A$ is symmetric, we may take $u_i = v_i$ (so the $\sigma_i$ are the eigenvalues of $A$), and then $A$ is positive semidefinite if and only if every $\sigma_i \geq 0$
  • $\|A\|_F^2 = \sum_{i=1}^{\min(n,d)} \sigma_i^2 = \|s\|_2^2 = \|\Sigma\|_F^2$, where $s := (\sigma_1, \dots, \sigma_{\min(n,d)})$
  • $\|A\|_{op}^2 = \max \{\sigma_i^2 \mid i \in \{1, \dots, \min(n,d)\}\} = \|s\|_\infty^2 = \|\Sigma\|_{op}^2$, where $\|\cdot\|_{op}$ denotes the operator norm
  • $A^+ := V\Sigma^+ U^\top$, where $\Sigma^+ \in {\mathbb R}^{d \times n}$ has $\Sigma^+_{ii} := \sigma_i^+$, with $\sigma^+ := 1/\sigma$ if $\sigma \neq 0$ else $\sigma^+ := 0$.
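
Several of these facts, checked numerically on a hypothetical matrix (using numpy's full SVD; `np.linalg.norm(A, 2)` computes the operator norm):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)                       # full SVD: U is 5x5, Vt is 3x3
Sigma = np.zeros((5, 3)); np.fill_diagonal(Sigma, s)

assert np.allclose(A, U @ Sigma @ Vt)
assert np.allclose(A @ A.T, U @ Sigma @ Sigma.T @ U.T)
assert np.allclose(A.T @ A, Vt.T @ Sigma.T @ Sigma @ Vt)
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2))   # Frobenius norm
assert np.isclose(np.linalg.norm(A, 2), s.max())                   # operator norm
```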

Proposition. For every $x \in \mathrm{Null}(A)^\perp$, we have $A^+Ax = x$, and for every $y \in \mathrm{Span}(A)$, we have $AA^+y = y$.

Equivalently, $A^+AA^\top = A^\top$ and $AA^+A = A$. Both follow from the fact that $A^+A$ and $AA^+$ are the orthogonal projections onto $\mathrm{Null}(A)^\perp = \mathrm{Span}(A^\top)$ and $\mathrm{Span}(A)$, respectively.
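
A numpy sketch checking the pseudoinverse facts on a hypothetical rank-deficient matrix (and comparing against np.linalg.pinv):

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))    # 5x4 of rank 2 (hypothetical)

U, s, Vt = np.linalg.svd(A)
s_plus = np.array([1 / x if x > 1e-10 else 0.0 for x in s])      # sigma^+ as defined above
Sigma_plus = np.zeros((4, 5)); np.fill_diagonal(Sigma_plus, s_plus)
A_plus = Vt.T @ Sigma_plus @ U.T                                 # A^+ = V Sigma^+ U^T

assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus @ A @ A.T, A.T)          # A^+ A A^T = A^T
assert np.allclose(A @ A_plus @ A, A)              # A A^+ A = A

x = A.T @ rng.standard_normal(5)                   # an arbitrary vector in Null(A)^perp
assert np.allclose(A_plus @ A @ x, x)              # A^+ A acts as the identity there
```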