Numerical Algebra Practice Problems
Numerical Algebra
Autumn, 2023
Consider the explicit shifted QR method for computing eigenvalues of a matrix \(A\in\mathbb{C}^{n\times n}\).
1: find an upper Hessenberg matrix \(H_{0}\) and a unitary matrix \(U_{0}\) such that \(H_{0}=U_{0}^{H}AU_{0}\).
2: for \(k=0,1,2,\dots\) do
3: determine a scalar \(\mu_{k}.\)
4: compute QR factorization \(Q_{k}R_{k}=H_{k}-\mu_{k}I\)
5: \(H_{k+1}=R_{k}Q_{k}+\mu_{k}I\).
6: end for
(1). Prove that \(H_{i}\), \(i=1,2,\dots\) are all upper Hessenberg matrices.
(2). Explain the purpose of using \(H_{0}\) rather than \(A\) in the iteration.
(3). Suppose that \(A\) has \(n\) distinct eigenvalues and none of the shifts \(\mu_{i}\), \(i=1,2,\dots\) is an eigenvalue of \(A\). Prove that \(H_{i}\), \(i=0,1,2,\dots\) are unreduced upper Hessenberg matrices. (An upper Hessenberg matrix \(H\) is called unreduced, if \(H_{i+1,i}\ne0\) for \(i=1,\dots,n-1.\))
(4). Write \(H_{k}=\begin{bmatrix}G_{k}&u_{k}\\ \epsilon_{k}e^{T}&\alpha_{k}\end{bmatrix}\) where \(\alpha_{k}, \epsilon_{k}\in\mathbb{C}\), \(u_{k}, e\in\mathbb{C}^{n-1}\) and \(e=\begin{bmatrix}0\\ \vdots\\ 0\\ 1\end{bmatrix}.\) Suppose \(A\) has \(n\) distinct eigenvalues. Prove that
where \(\rho_{k}=||(G_{k}-\mu_{k}I)^{-1}||_{2}\), provided \(\mu_{k}\) is not an eigenvalue of \(G_{k}\).
Proof
\((1)\): By induction. \(H_0\) is upper Hessenberg by construction. Suppose \(H_k\) is upper Hessenberg; then so is \(H_k-\mu_k I\), and since \(R_k\) is upper triangular, \(Q_k=(H_k-\mu_kI)R_k^{-1}\) is the product of an upper Hessenberg matrix and an upper triangular matrix, hence upper Hessenberg (when \(R_k\) is invertible; the singular case follows by continuity). Therefore \(H_{k+1}=R_kQ_k+\mu_kI\), an upper triangular matrix times an upper Hessenberg matrix plus a diagonal shift, is upper Hessenberg.
\((2)\): Iterating directly on the dense matrix \(A\) requires a QR factorization of a dense \(n\times n\) matrix at every step, costing \(\mathcal{O}(n^3)\) flops per iteration. Reducing \(A\) to the upper Hessenberg form \(H_0\) first incurs a single up-front \(\mathcal{O}(n^3)\) cost; by (1) the Hessenberg structure is preserved throughout, so each subsequent QR step (e.g. via \(n-1\) Givens rotations) costs only \(\mathcal{O}(n^2)\). The total cost for \(N\) iterations drops to \(\mathcal{O}(n^3+Nn^2)\).
\((3)\): Assume \(H_0\) is an unreduced upper Hessenberg matrix. We show that if \(H_k\) is unreduced, then so is \(H_{k+1}\).
Since \(\mu_k\) is not an eigenvalue of \(A\) (and the \(H_k\) are unitarily similar to \(A\)), \(H_k-\mu_kI\) is nonsingular, so the upper triangular factor \(R_k\) of its QR factorization is also nonsingular; in particular its diagonal entries are all nonzero: \((R_k)_{j,j}\ne0\) for all \(j\). Write \(Q_k=(H_k-\mu_kI)R_k^{-1}\).
Consider the subdiagonal entries of \(Q_k\). Since \(R_k^{-1}\) is upper triangular, only one term survives:
\[(Q_k)_{j+1,j}=(H_k-\mu_kI)_{j+1,j}\,(R_k^{-1})_{j,j}.\]
Since \(H_k\) is unreduced and \((R_k^{-1})_{j,j}=1/(R_k)_{j,j}\ne0\),
\[(Q_k)_{j+1,j}=\frac{(H_k)_{j+1,j}}{(R_k)_{j,j}}\ne0.\]
Next consider the subdiagonal entries of \(H_{k+1}\). From \(H_{k+1}-\mu_kI=R_kQ_k\) and the triangularity of \(R_k\),
\[(H_{k+1})_{j+1,j}=(R_k)_{j+1,j+1}\,(Q_k)_{j+1,j}.\]
Since \(R_k\) has nonzero diagonal and, as just shown, \(Q_k\) has nonzero subdiagonal, the product is nonzero.
This proves that if the subdiagonal of \(H_k\) contains no zero, then neither does that of \(H_{k+1}\); by induction all \(H_i\) are unreduced. \(\square\)
\((4)\): Partition \(Q_k=\begin{bmatrix}Q_{11}&q_{12}\\ q_{21}^T&q_{22}\end{bmatrix}\), \(R_k=\begin{bmatrix}R_{11}&r_{12}\\ 0&r_{22}\end{bmatrix}\) conformally with \(H_k\). Comparing the \((2,1)\) blocks of \(Q_kR_k=H_k-\mu_kI\) gives
\[q_{21}^TR_{11}=\epsilon_k e^T,\qquad\text{i.e.}\qquad q_{21}^T=\epsilon_k\,e^TR_{11}^{-1},\]
so \(\|q_{21}\|_2\le|\epsilon_k|\,\|R_{11}^{-1}\|_2\). Comparing the first block column of \(R_k=Q_k^H(H_k-\mu_kI)\) gives
\[\begin{bmatrix}R_{11}\\ 0\end{bmatrix}=Q_k^H\begin{bmatrix}G_k-\mu_kI\\ \epsilon_ke^T\end{bmatrix},\qquad\text{hence}\qquad R_{11}^HR_{11}=(G_k-\mu_kI)^H(G_k-\mu_kI)+|\epsilon_k|^2ee^T,\]
so \(\sigma_{\min}(R_{11})\ge\sigma_{\min}(G_k-\mu_kI)\) and therefore \(\|R_{11}^{-1}\|_2\le\|(G_k-\mu_kI)^{-1}\|_2=\rho_k\), giving \(\|q_{21}\|_2\le\rho_k|\epsilon_k|\). Since the last row \([q_{21}^T,\ q_{22}]\) and the last column \([q_{12};\ q_{22}]\) of the unitary matrix \(Q_k\) both have unit norm, \(\|q_{12}\|_2=\|q_{21}\|_2\le\rho_k|\epsilon_k|\). The \((2,2)\) entry of \(R_k=Q_k^H(H_k-\mu_kI)\) reads
\[r_{22}=q_{12}^Hu_k+\bar q_{22}(\alpha_k-\mu_k),\qquad\text{so}\qquad |r_{22}|\le\rho_k|\epsilon_k|\,\|u_k\|_2+|\alpha_k-\mu_k|.\]
Finally, the \((2,1)\) block of \(H_{k+1}-\mu_kI=R_kQ_k\) gives
\[\epsilon_{k+1}e^T=r_{22}\,q_{21}^T,\qquad\text{so}\qquad |\epsilon_{k+1}|\le|r_{22}|\,\|q_{21}\|_2\le\rho_k^2\|u_k\|_2\,|\epsilon_k|^2+\rho_k|\alpha_k-\mu_k|\,|\epsilon_k|.\]
In particular, with the Rayleigh shift \(\mu_k=\alpha_k\) this yields the quadratic bound \(|\epsilon_{k+1}|\le\rho_k^2\|u_k\|_2\,|\epsilon_k|^2\).
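As a sanity check, the iteration can be run numerically. Below is a minimal pure-Python sketch (not a production QR algorithm): it uses modified Gram-Schmidt for the QR factorization and the Rayleigh-quotient shift \(\mu_k=(H_k)_{nn}\) on a small illustrative tridiagonal matrix, and one observes the subdiagonal entry \(\epsilon_k=(H_k)_{n,n-1}\) collapsing to zero while the Hessenberg structure and the trace are preserved.

```python
import math

def qr_mgs(M):
    # QR factorization of a small nonsingular matrix via modified Gram-Schmidt.
    # Returns Q as a list of orthonormal columns and R as a full matrix.
    n = len(M)
    cols = [[M[i][j] for i in range(n)] for j in range(n)]
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            R[i][j] = sum(Q[i][t] * v[t] for t in range(n))
            v = [v[t] - R[i][j] * Q[i][t] for t in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))
        Q.append([t / R[j][j] for t in v])
    return Q, R

def shifted_qr_step(H, mu):
    # One explicit shifted QR step: H - mu*I = Q R, return R Q + mu*I.
    n = len(H)
    Q, R = qr_mgs([[H[i][j] - mu * (i == j) for j in range(n)] for i in range(n)])
    return [[sum(R[i][t] * Q[j][t] for t in range(n)) + mu * (i == j)
             for j in range(n)] for i in range(n)]

H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]            # upper Hessenberg (here tridiagonal) start
for _ in range(50):
    if abs(H[2][1]) < 1e-13:
        break
    H = shifted_qr_step(H, H[2][2])   # Rayleigh-quotient shift mu_k = (H_k)_nn
```

After the loop, \(H\) is still upper Hessenberg, its trace still equals that of the starting matrix, and \((H)_{3,2}\) has been driven to roundoff level, so \((H)_{3,3}\) approximates an eigenvalue.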
Spring, 2024
Suppose \(k \le n \le m\) and \(A \in \mathbb{C}^{m \times n}\).
(i) Find a rank-\(k\) matrix \(X\) satisfying \(||A-X||_{2} \le ||A-B||_{2}\) for any rank-\(k\) matrix \(B\). Here \(|| \cdot ||_{2}\) is the spectral norm.
(ii) Find a rank-\(k\) matrix \(X\) satisfying \(||A-X||_{F} \le ||A-B||_{F}\) for any rank-\(k\) matrix \(B\). Here \(|| \cdot ||_{F}\) is the Frobenius norm.
Proof
Consider the SVD of \(A\):
\[A=U\Sigma V^*,\]
where \(U = [u_1, u_2, \dots, u_m] \in \mathbb{C}^{m \times m}\) and \(V = [v_1, v_2, \dots, v_n] \in \mathbb{C}^{n \times n}\) are unitary, and \(\Sigma \in \mathbb{R}^{m \times n}\) is diagonal with singular values in decreasing order \(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0\). The answer to both parts is the truncated SVD
\[X=A_k:=\sum_{i=1}^{k}\sigma_iu_iv_i^*.\]
(i): For any rank-\(k\) matrix \(B\), the null space \(\mathcal{N}(B)\) has dimension at least \(n-k\). Consider the subspace \(\mathcal{W} = \operatorname{span}\{v_1, v_2, \dots, v_{k+1}\}\) spanned by the first \(k+1\) columns of \(V\), of dimension \(k+1\). Since \((n-k)+(k+1)>n\), the intersection \(\mathcal{W}\cap\mathcal{N}(B)\) is nontrivial. Take a unit vector \(z \in \mathcal{W} \cap \mathcal{N}(B)\); then \(Bz=0\) and \(z=\sum_{i=1}^{k+1} c_i v_i\) with \(\sum_{i=1}^{k+1} |c_i|^2 = 1\). Therefore
\[\|A-B\|_2^2\ge\|(A-B)z\|_2^2=\|Az\|_2^2=\sum_{i=1}^{k+1}|c_i|^2\sigma_i^2\ge\sigma_{k+1}^2=\|A-A_k\|_2^2.\]
(ii): The Frobenius norm is unitarily invariant, so
\[\|A-B\|_F=\|U^*(A-B)V\|_F=\|\Sigma-U^*BV\|_F.\]
Let \(C = U^* B V\); since \(B\) has rank \(k\), so does \(C\). We must minimize \(\|\Sigma - C\|_F^2 = \sum_{i,j} |\Sigma_{i,j} - C_{i,j}|^2\) over rank-\(k\) matrices \(C\). Since \(\Sigma\) is diagonal, the minimum is attained by matching the \(k\) largest diagonal entries \(\sigma_1,\dots,\sigma_k\) and setting every other entry to zero; that no non-diagonal rank-\(k\) matrix can do better follows from the singular value inequality \(\sum_i(\sigma_i(\Sigma)-\sigma_i(C))^2\le\|\Sigma-C\|_F^2\) together with \(\sigma_i(C)=0\) for \(i>k\). This optimal \(C\) is exactly the truncated matrix \(\Sigma_k\), i.e. \(B=U\Sigma_kV^*=A_k\).
Spring, 2024
Consider the Gauss-Seidel iteration for solving the linear equation \(Ax=b\) for \(A=[a_{ij}] \in \mathbb{R}^{n \times n}\), \(b=[b_{i}] \in \mathbb{R}^{n}\) with \(a_{ii} \ne 0\) for \(i=1,...,n\):
1: determine \(x^{(0)} \in \mathbb{R}^{n}\)
2: for \(k=0,1,2,...\) do
3: \(x_{i}^{(k+1)} = \frac{1}{a_{ii}}\left(b_{i} - \sum_{j=1}^{i-1}a_{ij}x_{j}^{(k+1)} - \sum_{j=i+1}^{n}a_{ij}x_{j}^{(k)}\right)\) for \(i=1,2,...,n\)
4: end for
(i) Write \(x^{(k)}=[x_{i}^{(k)}]\) and \(A=D-L-U\) where \(D, -L, -U\) are the diagonal, strictly lower triangular, strictly upper triangular parts of \(A\) respectively. Show that \(x^{(k+1)}=(D-L)^{-1}Ux^{(k)}+(D-L)^{-1}b\).
(ii) Prove that if \(A\) is diagonally dominant, then \(A\) is invertible, and the Gauss-Seidel iteration converges, namely \(\lim_{k \rightarrow \infty}x^{(k)}=A^{-1}b\) for any \(x^{(0)}\). (A matrix \(A\) is called diagonally dominant, if \(|a_{ii}| > \sum_{j \ne i}|a_{ij}|\) for \(i=1,...,n\).)
(iii) Prove that if \(A\) is symmetric and positive definite, then \(A\) is invertible, and the Gauss-Seidel iteration converges.
Proof
\((i)\): The update in step 3 can be rewritten as
\[a_{ii}x_i^{(k+1)}+\sum_{j=1}^{i-1}a_{ij}x_j^{(k+1)}=b_i-\sum_{j=i+1}^{n}a_{ij}x_j^{(k)},\qquad i=1,\dots,n,\]
which in matrix form reads
\[(D-L)x^{(k+1)}=Ux^{(k)}+b.\]
Since \(D-L\) is lower triangular with nonzero diagonal, it is invertible, and this gives
\[x^{(k+1)}=(D-L)^{-1}Ux^{(k)}+(D-L)^{-1}b.\]
\((ii)\): Suppose \(Ax=0\) for some nonzero vector \(x\). Pick \(i\) with \(|x_i|=\max_j|x_j|>0\); the \(i\)-th equation gives
\[a_{ii}x_i=-\sum_{j\ne i}a_{ij}x_j,\qquad\text{so}\qquad |a_{ii}|\,|x_i|\le\sum_{j\ne i}|a_{ij}|\,|x_j|\le\Big(\sum_{j\ne i}|a_{ij}|\Big)|x_i|,\]
hence \(|a_{ii}|\le\sum_{j\ne i}|a_{ij}|\), contradicting the diagonal dominance of \(A\). Therefore \(A\) is invertible.
The exact solution satisfies \(x^*=(D-L)^{-1}Ux^*+(D-L)^{-1}b\). Define the error \(e^{(k)}=x^{(k)}-x^*\); then
\[e^{(k+1)}=(D-L)^{-1}Ue^{(k)},\]
or, in components,
\[e_i^{(k+1)}=\frac{1}{a_{ii}}\Big(-\sum_{j=1}^{i-1}a_{ij}e_j^{(k+1)}-\sum_{j=i+1}^{n}a_{ij}e_j^{(k)}\Big).\]
By diagonal dominance, \(\rho_i=\frac{1}{|a_{ii}|}\sum_{j\ne i}|a_{ij}|<1\). Let \(\rho=\max_i\rho_i<1\).
We prove by induction on \(i\) that \(|e_i^{(k+1)}|\le\rho\|e^{(k)}\|_\infty\) for all \(i\):
For \(i=1\): \(|e_1^{(k+1)}|\le\frac{1}{|a_{11}|}\sum_{j=2}^n|a_{1j}||e_j^{(k)}|\le\rho_1\|e^{(k)}\|_\infty\le\rho\|e^{(k)}\|_\infty\).
Assume \(|e_j^{(k+1)}|\le\rho\|e^{(k)}\|_\infty\) for all \(j<i\). For the \(i\)-th component,
\[|e_i^{(k+1)}|\le\frac{1}{|a_{ii}|}\Big(\sum_{j<i}|a_{ij}|\,\rho\,\|e^{(k)}\|_\infty+\sum_{j>i}|a_{ij}|\,\|e^{(k)}\|_\infty\Big).\]
Since \(\rho<1\), the factor \(\rho\) in the first sum may be bounded by \(1\):
\[|e_i^{(k+1)}|\le\frac{1}{|a_{ii}|}\sum_{j\ne i}|a_{ij}|\,\|e^{(k)}\|_\infty\le\rho\,\|e^{(k)}\|_\infty.\]
Hence \(\|e^{(k+1)}\|_\infty\le\rho\|e^{(k)}\|_\infty\), and since \(\rho<1\) the error converges to \(0\): the iteration converges for any \(x^{(0)}\).
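A quick numerical illustration of (ii), as a minimal pure-Python sketch (the matrix, right-hand side, and sweep count are illustrative): for a strictly diagonally dominant matrix, the in-place sweep below converges to \(A^{-1}b\) from any start.

```python
def gauss_seidel(A, b, x, sweeps):
    # In-place Gauss-Seidel sweeps: component i already uses the updated
    # components j < i of the current sweep, exactly as in step 3.
    n = len(A)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]            # strictly diagonally dominant
b = [9.0, 17.0, 23.0]            # chosen so that the exact solution is (1, 2, 3)
x = gauss_seidel(A, b, [0.0, 0.0, 0.0], 60)
```

Here \(\rho=\max_i\rho_i=0.6\), so 60 sweeps reduce the initial error by roughly \(0.6^{60}\approx5\cdot10^{-14}\).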
\((iii)\): Since \(A\) is symmetric positive definite, its eigenvalues are positive, so \(A\) is invertible; moreover \(d=v^*Dv>0\) for \(v\ne0\) because the diagonal entries \(a_{ii}=e_i^TAe_i\) are positive.
It remains to show that the iteration matrix \(G=(D-L)^{-1}U=(D-L)^{-1}L^T\) (here \(U=L^T\) by symmetry) has spectral radius \(\rho(G)<1\). Let \(v\) be an eigenvector of \(G\) with eigenvalue \(\lambda\); then
\[L^Tv=\lambda\,(D-L)v.\]
Multiplying both sides on the left by \(v^*\),
\[v^*L^Tv=\lambda\,(v^*Dv-v^*Lv).\]
Set \(a=v^*Av>0\), \(d=v^*Dv>0\), \(c=v^*Lv\). From \(A=D-L-L^T\), multiplying by \(v^*\) and \(v\) gives
\[a=d-c-\bar c,\]
and \(v^*L^Tv=\overline{v^*Lv}=\bar c\). Substituting into the identity above,
\[\bar c=\lambda\,(d-c),\qquad\text{so}\qquad |\lambda|^2=\frac{|c|^2}{|d-c|^2}.\]
Substituting back \(a=d-c-\bar c\),
\[|d-c|^2-|c|^2=(d-c)(d-\bar c)-|c|^2=d^2-d(c+\bar c)=d\,(d-c-\bar c)=d\,a>0.\]
This gives \(|\lambda|<1\), hence \(\rho(G)<1\) and the Gauss-Seidel iteration converges. \(\square\)
Spring, 2024
Given the Hilbert matrix \(H_{n} = \left[\frac{1}{i+j-1}\right]_{i,j=1,\dots,n}\), prove:
(i) \(H_{n}\) is positive definite.
(ii) The spectral radius \(\rho(H_{n})\) of \(H_{n}\) is strictly monotonically increasing with respect to \(n\).
(iii) \(\rho(H_{n}) \rightarrow \pi\) as \(n \rightarrow \infty\).
Proof
\((i)\): For any nonzero vector \(x \in \mathbb{R}^n\),
\[x^TH_nx=\sum_{i,j=1}^n\frac{x_ix_j}{i+j-1}=\sum_{i,j=1}^nx_ix_j\int_0^1t^{i+j-2}\,dt=\int_0^1\Big(\sum_{i=1}^nx_it^{i-1}\Big)^2dt>0,\]
since the polynomial \(\sum_ix_it^{i-1}\) is not identically zero. Hence \(H_n\) is positive definite.
\((ii)\): Since \(H_n\) is symmetric (and positive definite), its spectral radius equals its largest eigenvalue:
\[\rho(H_n)=\lambda_{\max}(H_n)=\max_{\|x\|_2=1}x^TH_nx.\]
Let \(x^* \in \mathbb{R}^n\) be a unit eigenvector for \(\rho(H_n)\), \((x^*)^T x^* = 1\), and set \(\tilde{x} = \begin{pmatrix} x^* \\ 0 \end{pmatrix} \in \mathbb{R}^{n+1}\). Then
\[\rho(H_{n+1})\ge\tilde x^TH_{n+1}\tilde x=(x^*)^TH_nx^*=\rho(H_n).\]
Equality cannot hold: if it did, \(\tilde x\) would be a maximizer of the Rayleigh quotient of the symmetric matrix \(H_{n+1}\), hence an eigenvector for its largest eigenvalue; but \(H_{n+1}\) has strictly positive entries, so by the Perron-Frobenius theorem that eigenvector has all components nonzero, contradicting the zero last component of \(\tilde x\). Hence \(\rho(H_{n+1})>\rho(H_n)\).
\((iii)\): We first prove the upper bound \(x^TH_nx\le\pi\,\|x\|_2^2\) for all \(x\) (Hilbert's inequality), so that \(\rho(H_n)\le\pi\). We may assume \(x_i\ge0\), since replacing \(x_i\) by \(|x_i|\) can only increase the form. Write \(u=i-\tfrac12\), \(v=j-\tfrac12\), so \(i+j-1=u+v\). By the Cauchy-Schwarz inequality,
\[x^TH_nx=\sum_{i,j}\frac{x_ix_j}{u+v}\le\left(\sum_{i,j}\frac{x_i^2}{u+v}\sqrt{\frac uv}\right)^{\!1/2}\left(\sum_{i,j}\frac{x_j^2}{u+v}\sqrt{\frac vu}\right)^{\!1/2}=\sum_ix_i^2\sum_{j=1}^n\frac{\sqrt u}{\sqrt v\,(u+v)}.\]
For fixed \(u>0\) the function \(v\mapsto\sqrt u\,v^{-1/2}(u+v)^{-1}\) is decreasing and convex on \((0,\infty)\), so the midpoint rule underestimates the integral on each interval \([j-1,j]\) and
\[\sum_{j=1}^n\frac{\sqrt u}{\sqrt v\,(u+v)}\le\int_0^\infty\frac{\sqrt u}{\sqrt v\,(u+v)}\,dv=\pi\]
(substitute \(v=ut^2\); the integral becomes \(2\int_0^\infty\frac{dt}{1+t^2}=\pi\)). Hence \(\rho(H_n)\le\pi\) for every \(n\).
For the lower bound, take \(x_i=1/\sqrt i\) (truncated to length \(n\) and normalized); a computation shows the Rayleigh quotient \(x^TH_nx/x^Tx\to\pi\) as \(n\to\infty\). Combined with the strict monotonicity from (ii), \(\rho(H_n)\uparrow\pi\).
Autumn, 2024
Let \(A\) be a symmetric positive definite matrix. Assume that the conjugate gradient method is applied to solve the linear system \(Ax=b\) where \(x^{*}\) is the exact solution.
(i) Prove the following error estimate:
\[\|x_k-x^*\|_A\le 2\left(\frac{\sqrt{\kappa_2(A)}-1}{\sqrt{\kappa_2(A)}+1}\right)^{\!k}\|x_0-x^*\|_A,\]
where \(x_{k}\) is the solution obtained by the \(k\)-th iteration, \(||\cdot||_{A}\) denotes the \(A\)-norm defined by \(||x||_{A}=\sqrt{x^{T}Ax}\), and \(\kappa_{2}(A)=\frac{\max \lambda(A)}{\min \lambda(A)}\) denotes the condition number of \(A\) under the \(l_{2}\) norm. (If you apply any theorem from approximation theory, please state the theorem clearly. No need to prove that theorem.)
(ii) Describe one Krylov subspace method for non-Hermitian matrices (\(A\ne A^{*}\)). (You can use pseudo-code or words/formulas as long as the steps are clear.)
Proof
\((i)\): Let \(e_k = x_k - x^*\) and \(r_0=b-Ax_0\). The conjugate gradient method finds \(x_k\in x_0+\mathcal{K}_k(A, r_0)\), \(\mathcal{K}_k(A, r_0) = \operatorname{span}\{r_0, Ar_0, \dots, A^{k-1}r_0\}\), minimizing the \(A\)-norm of the error. Equivalently, the error is a matrix polynomial applied to the initial error:
\[e_k=P_k(A)\,e_0,\qquad P_k\in\Pi_k^1,\]
where \(\Pi_k^1\) is the set of polynomials of degree at most \(k\) with \(P_k(0) = 1\). Since CG minimizes the \(A\)-norm of the error,
\[\|e_k\|_A=\min_{P\in\Pi_k^1}\|P(A)e_0\|_A.\]
Since \(A\) is symmetric positive definite, it has an orthonormal eigenbasis \(\{v_i\}\) with positive eigenvalues \(\lambda_{\min} = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n = \lambda_{\max}\). Expanding \(e_0 = \sum c_i v_i\), we get \(\|P(A) e_0\|_A^2 = \sum c_i^2 \lambda_i P(\lambda_i)^2\), which yields the upper bound (relaxing the discrete spectrum to the whole interval \([\lambda_{\min}, \lambda_{\max}]\)):
\[\frac{\|e_k\|_A}{\|e_0\|_A}\le\min_{P\in\Pi_k^1}\ \max_{\lambda\in[\lambda_{\min},\lambda_{\max}]}|P(\lambda)|.\]
Mapping \([\lambda_{\min}, \lambda_{\max}]\) linearly onto \([-1, 1]\) subject to \(P(0)=1\), the optimal polynomial is the scaled Chebyshev polynomial (Chebyshev min-max theorem from approximation theory):
\[P^*(\lambda)=\frac{T_k\!\left(\dfrac{\lambda_{\max}+\lambda_{\min}-2\lambda}{\lambda_{\max}-\lambda_{\min}}\right)}{T_k\!\left(\dfrac{\lambda_{\max}+\lambda_{\min}}{\lambda_{\max}-\lambda_{\min}}\right)}.\]
Hence, with \(\kappa=\kappa_2(A)\),
\[\max_{[\lambda_{\min},\lambda_{\max}]}|P^*(\lambda)|=\frac{1}{T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)}.\]
Moreover, writing \(\frac{\kappa+1}{\kappa-1}=\frac12\big(t+t^{-1}\big)\) with \(t=\frac{\sqrt\kappa+1}{\sqrt\kappa-1}\),
\[T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)=\frac12\big(t^k+t^{-k}\big)\ge\frac12\left(\frac{\sqrt\kappa+1}{\sqrt\kappa-1}\right)^{\!k}.\]
Substituting gives the claimed bound.
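For reference, a minimal pure-Python conjugate gradient sketch (small dense example; a practical implementation would use a stopping tolerance and sparse matrix-vector products): on an \(n\times n\) SPD system, exact arithmetic terminates in at most \(n\) steps, and in floating point \(n=3\) steps already reach near machine accuracy here.

```python
def cg(A, b, x, iters):
    # Textbook conjugate gradient for SPD A (lists of lists / lists).
    n = len(b)
    mv = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    r = [bi - Avi for bi, Avi in zip(b, mv(x))]
    p = r[:]
    for _ in range(iters):
        if dot(r, r) < 1e-30:          # already converged
            break
        Ap = mv(p)
        alpha = dot(r, r) / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return x

A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 2.0, 6.0]]   # SPD
b = [9.0, 17.0, 23.0]                                     # exact solution (1, 2, 3)
x = cg(A, b, [0.0, 0.0, 0.0], 3)
```
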
Spring, 2025
The Sherman-Morrison formula (or the generalized version named Sherman-Morrison-Woodbury formula) is a useful tool in numerical linear algebra. It states: if \(A\in\mathbb{R}^{n\times n}\) is an invertible matrix, \(u,v\in\mathbb{R}^{n}\) are two vectors satisfying \(1+v^{T}A^{-1}u\ne0\), then the matrix \(A+uv^{T}\) is invertible and its inverse can be computed by
\[(A+uv^{T})^{-1}=A^{-1}-\frac{A^{-1}uv^{T}A^{-1}}{1+v^{T}A^{-1}u}.\]
(i) Prove the above statement.
(ii) If \(\{s_{j}\}_{j=1}^{n}\) and \(\{t_{j}\}_{j=1}^{n}\) are two sets of points in \([0, 1]\), \(A\in\mathbb{R}^{n\times n}\) is a matrix defined by:
where \(\delta_{j,k}\) is the Kronecker delta (\(\delta_{j,k}=1\) when \(j=k\), \(\delta_{j,k}=0\) otherwise). \(b\in\mathbb{R}^{n}\) is an arbitrary vector. Please give methods for the evaluation of \(Ab\), \(A^{-1}b\), and \(\det(A)\), all with the computational complexity of \(O(n)\).
Proof
\((i)\): Direct verification. Multiplying, and using that \(v^TA^{-1}u\) is a scalar,
\[(A+uv^T)\left(A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\right)=I+uv^TA^{-1}-\frac{u\,(1+v^TA^{-1}u)\,v^TA^{-1}}{1+v^TA^{-1}u}=I.\]
\((ii)\): Using the identity \(\cos(t_j - s_k) = \cos t_j \cos s_k + \sin t_j \sin s_k\), the matrix \(A\) can be written as
\[A=D+\mathbf{c}_t\mathbf{c}_s^T+\mathbf{s}_t\mathbf{s}_s^T,\]
where \(D = 4n I\) is diagonal and \(\mathbf{c}_t, \mathbf{s}_t, \mathbf{c}_s, \mathbf{s}_s \in \mathbb{R}^n\) are the column vectors with entries \(\cos t_j, \sin t_j, \cos s_k, \sin s_k\) respectively. With \(U = [\mathbf{c}_t, \mathbf{s}_t]\in\mathbb{R}^{n\times2}\) and \(V = [\mathbf{c}_s, \mathbf{s}_s]\in\mathbb{R}^{n\times2}\),
\[A=D+UV^T.\]
For \(Ab\): \(Ab=Db + U(V^Tb)\); computing \(Db\) costs \(O(n)\), \(V^Tb\) costs \(O(n)\), and multiplying \(U\) by a 2-vector costs \(O(n)\), so \(Ab\) costs \(O(n)\).
For \(A^{-1}b\): by the Sherman-Morrison-Woodbury formula,
\[A^{-1}=\frac{1}{4n}\left(I-\frac{1}{4n}\,UM^{-1}V^T\right),\qquad M=I_2+\frac{1}{4n}V^TU\in\mathbb{R}^{2\times2}.\]
Forming \(M\) costs \(O(n)\), inverting the \(2\times2\) matrix \(M\) costs \(O(1)\), \(V^Tb\) costs \(O(n)\), \(M^{-1}(V^Tb)\) costs \(O(1)\), and \(U(M^{-1}(V^Tb))\) costs \(O(n)\); hence \(A^{-1}b\) costs \(O(n)\).
For \(\det(A)\): by the matrix determinant lemma,
\[\det(A)=\det(D)\,\det\!\big(I_2+V^TD^{-1}U\big)=(4n)^n\det(M).\]
Forming \(M\) costs \(O(n)\) and its \(2\times2\) determinant costs \(O(1)\), so \(\det(A)\) costs \(O(n)\).
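A minimal pure-Python sketch of the rank-one case (a diagonal \(A\) so that \(A^{-1}\) is trivially available; the data and function name are illustrative): solving \((D+uv^T)x=b\) in \(O(n)\) via the Sherman-Morrison formula.

```python
def solve_diag_plus_rank1(d, u, v, b):
    # x = D^{-1} b - D^{-1} u * (v^T D^{-1} b) / (1 + v^T D^{-1} u),
    # valid when 1 + v^T D^{-1} u != 0; every step is O(n).
    Dinv_b = [bi / di for bi, di in zip(b, d)]
    Dinv_u = [ui / di for ui, di in zip(u, d)]
    s = sum(vi * t for vi, t in zip(v, Dinv_b)) / \
        (1.0 + sum(vi * t for vi, t in zip(v, Dinv_u)))
    return [t1 - s * t2 for t1, t2 in zip(Dinv_b, Dinv_u)]

d = [2.0, 3.0, 4.0]              # diagonal of D
u = [1.0, 1.0, 1.0]
v = [1.0, 2.0, 3.0]
b = [8.0, 9.0, 10.0]             # equals (D + u v^T) applied to (1, 1, 1)
x = solve_diag_plus_rank1(d, u, v, b)
```
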
Autumn, 2025
In many applications a generalized eigenvalue problem needs to be solved: find \(x \in \mathbb{C}^n\) and \(\mu, \lambda \in \mathbb{C}\) such that
\[\mu A x=\lambda B x\]
for given \(A, B \in \mathbb{C}^{n \times n}\). Here \(x\) and \((\lambda, \mu)\) are called an eigenvector and an eigenvalue of the pair \((A, B)\) respectively. If \(B\) is nonsingular, this problem is equivalent to the standard eigenvalue problem \(A B^{-1} y = \mu^{-1} \lambda y\) with \(y=Bx\). For two unitary matrices \(Q, Z\), the transformation \(Q^H A Z = S\), \(Q^H B Z = T\) is called a UET performed on \((A, B)\), written \(Q^H(A, B)Z = (S, T)\) for short.
(a) Give an example that \((A, B)\) has more than \(n\) eigenvalues.
(b) Prove that there exists a UET such that \(Q^H(A, B)Z = (S, T)\), where \(S, T\) are upper triangular.
(c) Can you obtain the eigenvalues of \((A, B)\) from \((S, T)\)? How?
(d) Propose a method to construct a UET such that \(Q_0^H(A, B)Z_0 = (A_0, B_0)\) where \(A_0\) is upper Hessenberg, and \(B_0\) is upper triangular. For ease, you may assume \(n=4\).
Proof
\((a)\): It suffices to make \(\det(\mu A - \lambda B)=0\) have more than \(n\) solutions. If \(A\) and \(B\) have a common nontrivial null space, then \((A, B)\) has infinitely many eigenvalues: e.g. for \(A=B=\begin{pmatrix}1&0\\0&0\end{pmatrix}\), every pair \((\lambda,\mu)\) is an eigenvalue with eigenvector \(e_2\).
\((b)\): We induct on \(n\). First, there exists a nontrivial pair \((\lambda,\mu)\ne(0,0)\) with \(\det(\mu A-\lambda B)=0\), hence a unit vector \(z\) with \(\mu Az=\lambda Bz\). Consequently \(Az\) and \(Bz\) span a subspace of dimension at most one, so we may write \(Az=\alpha q\), \(Bz=\beta q\) for some unit vector \(q\). Extend to unitary matrices \(Z_1=[z,\cdots]\), \(Q_1=[q,\cdots]\); then the first columns of \(Q_1^HAZ_1\) and \(Q_1^HBZ_1\) are \(\alpha e_1\) and \(\beta e_1\) respectively, so
\[Q_1^HAZ_1=\begin{pmatrix}\alpha&*\\0&A_1\end{pmatrix},\qquad Q_1^HBZ_1=\begin{pmatrix}\beta&*\\0&B_1\end{pmatrix}.\]
By the induction hypothesis applied to \((A_1, B_1)\), there exist unitary \(Q_2, Z_2\) such that \(Q_2^HA_1Z_2\) and \(Q_2^HB_1Z_2\) are both upper triangular. Then
\[Q=Q_1\begin{pmatrix}1&0\\0&Q_2\end{pmatrix},\qquad Z=Z_1\begin{pmatrix}1&0\\0&Z_2\end{pmatrix}\]
satisfy the requirement.
\((c)\): Yes. By \((b)\), \(Q^H(A, B)Z = (S, T)\), so \(\det(\mu A-\lambda B)=\det(Q)\det(\mu S-\lambda T)\det(Z^H)\) vanishes if and only if \(\det(\mu S-\lambda T)=0\). Since \(S\) and \(T\) are upper triangular,
\[\det(\mu S-\lambda T)=\prod_{i=1}^n(\mu S_{ii}-\lambda T_{ii}),\]
so the eigenvalues are \((\lambda, \mu) = (S_{ii}, T_{ii})\), \(i=1,\dots,n\) (note \(\mu S_{ii}-\lambda T_{ii}=0\) for this pair).
\((d)\): Not attempted. (A standard construction, used as the first stage of the QZ algorithm, is the following: compute a QR factorization \(B=QR\) and replace \((A,B)\) by \((Q^HA,Q^HB)\), so that \(B\) becomes upper triangular; then eliminate the entries of \(A\) below its first subdiagonal one at a time with Givens rotations applied on the left, applying after each one a Givens rotation on the right to restore the triangularity of \(B\).)
Autumn, 2025
Besides LU-type factorizations, a matrix inverse can also be obtained by iteration. Given \(A \in \mathbb{R}^{n \times n}\), consider the sequence \(\{X_k\}\) generated by the (Newton-Schulz) iteration
\[X_{k+1}=X_k\,(2I-AX_k),\qquad k=0,1,2,\dots\]
Prove:
(a) if \(\|I - A X_0\| < 1\), then \(A\) is nonsingular and \(X_k \to A^{-1}\) quadratically.
(b) there exists \(X_0\) such that \(\|I - A X_0\| < 1\), if and only if \(A\) is nonsingular.
Proof
\((a)\): If \(\|I - A X_0\| < 1\), then \(AX_0=I-(I-AX_0)\) is invertible (Neumann series), hence \(A\) is invertible. Let \(E_k=I-AX_k\). Then
\[E_{k+1}=I-AX_k(2I-AX_k)=(I-AX_k)^2=E_k^2,\]
so \(\|E_k\|\le\|E_0\|^{2^k}\to0\) and \(X_k=A^{-1}(I-E_k)\to A^{-1}\). Moreover, since \(E_k=A(A^{-1}-X_k)\),
\[\|X_{k+1}-A^{-1}\|=\|A^{-1}E_k^2\|\le\|A^{-1}\|\,\|A\|^2\,\|X_k-A^{-1}\|^2,\]
so the convergence is quadratic.
\((b)\): If \(A\) is invertible, then \(X_0 = A^{-1}\) gives \(\|I - A X_0\| = 0 < 1\). Conversely, if \(\|I - A X_0\| < 1\) for some \(X_0\), then \(AX_0\) is invertible, hence so is \(A\).
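A numerical illustration, assuming the iteration in question is the standard Newton-Schulz update \(X_{k+1}=X_k(2I-AX_k)\) (consistent with part (a), since then \(I-AX_{k+1}=(I-AX_k)^2\)); the matrix and starting guess are illustrative.

```python
def mul2(X, Y):
    # 2x2 matrix product.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 3.0]]
X = [[0.2, 0.0], [0.0, 0.2]]     # ||I - A X0||_2 ~ 0.72 < 1 for this choice
for _ in range(10):
    AX = mul2(A, X)
    two_minus = [[2.0 - AX[0][0], -AX[0][1]],
                 [-AX[1][0], 2.0 - AX[1][1]]]   # 2I - A X_k
    X = mul2(X, two_minus)                      # X_{k+1} = X_k (2I - A X_k)
```

Ten doublings reduce \(\|I-AX_k\|\) to roughly \(0.72^{2^{10}}\), i.e. to roundoff, and \(X\) matches \(A^{-1}=\frac15\begin{pmatrix}3&-1\\-1&2\end{pmatrix}\).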
.
Let \(A \in \mathbb{R}^{n \times n}\) be such that \(A = (1+\omega)P - (N+\omega P)\), with \(P^{-1}N\) nonsingular and with real eigenvalues \(1 > \lambda_1 \geqslant \lambda_2 \geqslant \ldots \geqslant \lambda_n\). Find the values of \(\omega \in \mathbb{R}\) for which the iterative method
\[(1+\omega)P x_{k+1} = (N+\omega P)x_k + \mathbf{b}, \quad k \geqslant 0,\]
converges for any \(x_0\) to the solution of the linear system \(Ax = b\). Determine also the value of \(\omega\) for which the convergence rate is maximal.
Solution
Let \(e_k=x_k-x\); then \((1+\omega)P e_{k+1} = (N+\omega P)e_k\), and the iteration matrix is \(B_\omega=(1+\omega)^{-1}(P^{-1}N+\omega I)\), whose eigenvalues are \(\frac{\lambda_j + \omega}{1+\omega}\). The method converges for every \(x_0\) iff \(\left|\frac{\lambda_j + \omega}{1+\omega}\right| < 1\) for all \(j\). Squaring, this is \((\lambda_j+\omega)^2-(1+\omega)^2<0\), i.e. \((\lambda_j-1)(\lambda_j+1+2\omega)<0\); since \(\lambda_j<1\), this holds iff \(\omega>-\frac{1+\lambda_j}{2}\). The binding constraint is \(j=n\), so the method converges iff
\[\omega>-\frac{1+\lambda_n}{2}.\]
The spectral radius \(\rho(B_\omega)=\max\left\{\left|\frac{\lambda_1+\omega}{1+\omega}\right|,\left|\frac{\lambda_n+\omega}{1+\omega}\right|\right\}\) is minimized when the two extreme values are balanced, \(\frac{\lambda_1+\omega}{1+\omega}=-\frac{\lambda_n+\omega}{1+\omega}\), which gives
\[\omega_{\mathrm{opt}}=-\frac{\lambda_1+\lambda_n}{2},\qquad \rho(B_{\omega_{\mathrm{opt}}})=\frac{\lambda_1-\lambda_n}{2-\lambda_1-\lambda_n}.\]
.
Problem 1. Find the generalized eigenvalue \(\lambda\) and the eigenvector \((x_1, x_2, \cdots, x_n)^T\), such that
Solution
The system reduces to the three-term recurrence \(x_{k-1}+x_{k+1}=\frac{2-4\lambda}{1+\lambda} x_k\) with boundary conditions \(x_0=x_{n+1}=0\). The ansatz \(x_k=\sin(k\theta_j)\) satisfies \(x_{k-1}+x_{k+1}=2\cos\theta_j\,x_k\), so \(\frac{2-4\lambda}{1+\lambda}=2\cos\theta_j\), giving
\[\lambda_j=\frac{1-\cos \theta_j}{2+\cos \theta_j},\qquad \theta_j=\frac{j\pi}{n+1},\quad j=1,\dots,n.\]
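The recurrence and the eigenvalue formula can be verified numerically; a minimal pure-Python check (\(n=5\) is illustrative):

```python
import math

n = 5
max_res = 0.0
for j in range(1, n + 1):
    theta = j * math.pi / (n + 1)
    lam = (1.0 - math.cos(theta)) / (2.0 + math.cos(theta))
    x = [math.sin(k * theta) for k in range(n + 2)]   # x[0] = x[n+1] = 0
    for k in range(1, n + 1):
        # residual of x_{k-1} + x_{k+1} = (2 - 4*lam)/(1 + lam) * x_k
        res = x[k - 1] + x[k + 1] - (2.0 - 4.0 * lam) / (1.0 + lam) * x[k]
        max_res = max(max_res, abs(res))
```
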
.
Find the eigenvalues and eigenvectors of the following \(N \times N\) tridiagonal matrix:
where \(a, b, c\) are constants, and \(ac > 0\).
.
Suppose an \(n \times n\) matrix \(A\) is given by
For the linear system \(Ax = b\), prove that
where the constant \(C\) is independent of the dimension \(n\).
Solution
For \(A\) to be invertible we need \(r\ne1\). Since \(A\) is normal (it is a circulant), \(\|A^{-1}\|_2=1/\min_j|\lambda_j|\), where the \(\lambda_j\) are the eigenvalues of \(A\). These are easily found to be \(\lambda_j=1+re^{i\theta_j}\) with \(\theta_j=\frac{2\pi j}{n}\), so for \(|r|<1\) we get \(\min_j|\lambda_j|\ge1-|r|\), and finally
\[\|A^{-1}\|_2\le\frac{1}{1-|r|}=:C,\]
a constant independent of the dimension \(n\).
.
(Circulant Matrices) Every circulant matrix \(C\) has eigenvectors
and corresponding eigenvalues
It can be expressed in the form \(C = U \Psi U^*\), where \(U\) has the eigenvectors as columns in order and \(\Psi = \text{diag}(\psi_k)\). In particular all circulant matrices share the same eigenvectors, the same matrix \(U\) works for all circulant matrices, and any matrix of the form \(C = U \Psi U^*\) is circulant.
Prove the above theorem.
Solution
Pure computation: apply \(C\) to each Fourier vector and shift the summation index in the cyclic convolution; details omitted.
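The computation can be checked numerically. The sketch below uses the convention \(C_{ij}=c_{(i-j)\bmod n}\) for the circulant built from a first column \(c\) (so that \(\psi_k=\sum_m c_m e^{-2\pi ikm/n}\)); the theorem's exact indexing and normalization of \(U\) may differ.

```python
import cmath

c = [5.0, 1.0, 2.0, 3.0]                      # first column of the circulant
n = len(c)
C = [[c[(i - j) % n] for j in range(n)] for i in range(n)]
w = cmath.exp(2j * cmath.pi / n)
err = 0.0
for k in range(n):
    v = [w ** (k * m) / n ** 0.5 for m in range(n)]      # k-th Fourier vector
    psi = sum(c[m] * w ** (-k * m) for m in range(n))    # k-th eigenvalue
    Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
    err = max(err, max(abs(Cv[i] - psi * v[i]) for i in range(n)))
```
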
.
(Companion Matrix) Let
be a companion matrix with distinct eigenvalues \(\lambda_1, \ldots, \lambda_n\). Show that
where
Solution
Pure computation (multiply out and use that each \(\lambda_i\) is a root of the characteristic polynomial); omitted.
2013 Term
Let \((F_n)_n\) be the Fibonacci sequence, with \(F_0 = 0\), \(F_1 = 1\), and \(F_{n+2} = F_{n+1} + F_n\) for \(n \geq 0\).
- (a) Establish a relation between \(\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n\) and \(F_n\).
- (b) Use it to design an efficient algorithm that for a given \(n\) computes the \(n\)-th Fibonacci number \(F_n\). In particular, it must be more efficient than computing \(F_n\) in \(n\) consecutive steps.
- (c) Give an estimate on the number of steps of your algorithm.
Hint: Note that if \(m\) is even then
\[\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^m=\left(\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^{m/2}\right)^{\!2},\]
and if \(m\) is odd then
\[\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^m=\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^{m-1},\]
and \(m-1\) is even.
Proof
\((a)\) By induction,
\[\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n=\begin{pmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{pmatrix},\qquad n\ge1.\]
\((b)\) Compute the matrix power by repeated squaring, following the hint, and read off \(F_n\) from the \((1,2)\) entry.
\((c)\) Each step halves \(m\), so computing the \(n\)-th Fibonacci number takes \(O(\log n)\) multiplications of fixed-size \(2\times2\) matrices, i.e. \(O(\log n)\) steps.
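The doubling identities implicit in squaring the matrix give the following \(O(\log n)\) algorithm (a minimal sketch in Python); `fib_pair(n)` returns \((F_n, F_{n+1})\):

```python
def fib_pair(n):
    # Fast doubling, from F(2m) = F(m)*(2F(m+1) - F(m)) and
    # F(2m+1) = F(m)^2 + F(m+1)^2; O(log n) recursive steps.
    if n == 0:
        return (0, 1)                  # (F(0), F(1))
    a, b = fib_pair(n // 2)            # (F(m), F(m+1)) with m = n // 2
    c = a * (2 * b - a)                # F(2m)
    d = a * a + b * b                  # F(2m+1)
    return (d, c + d) if n % 2 else (c, d)
```

For example, `fib_pair(10)` evaluates to `(55, 89)`.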
2018 Term
Take \(\sigma_i(A)\) to be the \(i\)-th singular value of the square matrix \(A \in \mathbb{R}^{n \times n}\). Define the nuclear norm of \(A\) to be
- (a) Show that \(\|A\|_* = \operatorname{tr}(\sqrt{A^T A})\).
- (b) Show that \(\|A\|_* = \max_{X^T X = I} \operatorname{tr}(AX)\).
- (c) Show that \(\|A + B\|_* \leq \|A\|_* + \|B\|_*\).
- (d) Explain informally why minimizing \(\|A - A_0\|_*^2 + \|A\|_*\) over \(A\) for a fixed \(A_0 \in \mathbb{R}^{n \times n}\) might yield a low-rank approximation of \(A_0\).
Proof
\((a)\) Let \(A = U \Sigma V^T\) be an SVD of \(A\). Then
\[A^TA=V\Sigma^2V^T,\qquad \sqrt{A^TA}=V\Sigma V^T,\]
so \(\operatorname{tr}(\sqrt{A^T A}) = \operatorname{tr}(V \Sigma V^T) = \operatorname{tr}(\Sigma) = \|A\|_*\).
\((b)\) Let \(A = U \Sigma V^T\) be an SVD of \(A\). For any orthogonal \(X\),
\[\operatorname{tr}(AX)=\operatorname{tr}(U\Sigma V^TX)=\operatorname{tr}(\Sigma V^TXU)\le\operatorname{tr}(\Sigma),\]
where the last inequality holds because \(W=V^T X U\) is orthogonal, so its diagonal entries satisfy \(|W_{ii}|\le1\) and \(\operatorname{tr}(\Sigma W)=\sum_i\sigma_iW_{ii}\le\sum_i\sigma_i\). Equality is attained at \(X=VU^T\), hence \(\|A\|_*=\max_{X^TX=I}\operatorname{tr}(AX)\).
\((c)\) By \((b)\),
\[\|A+B\|_*=\max_{X^TX=I}\operatorname{tr}((A+B)X)\le\max_{X^TX=I}\operatorname{tr}(AX)+\max_{X^TX=I}\operatorname{tr}(BX)=\|A\|_*+\|B\|_*.\]
\((d)\) \(\|A\|_*\) is the sum of the singular values of \(A\) and acts as a convex surrogate for the rank, much as the \(\ell_1\) norm is a surrogate for sparsity: low-rank matrices have few nonzero singular values and hence small nuclear norm. Minimizing \(\|A - A_0\|_*^2 + \|A\|_*\) therefore trades fidelity to \(A_0\) against a penalty that pushes small singular values to zero, so the minimizer tends to be a low-rank approximation of \(A_0\).
2018 Term
Suppose \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\) and \(r = \text{rank}(A) < n\), and assume \(A\) has the following SVD:
where \(\Sigma_1\) is \(r \times r\) nonsingular and \(U_1, V_1\) have \(r\) columns. Let \(\sigma = \sigma_{\min}(\Sigma_1)\), the smallest nonzero singular value of \(A\). Consider the least square problem, for some \(b \in \mathbb{R}^m\),
-
(a) Show that all solutions \(x\) can be written as
\[ x = V_1 \Sigma_1^{-1} U_1^T b + V_2 z_2, \]with \(z_2\) an arbitrary vector.
-
(b) Show that the solution \(x\) has minimal norm \(\|x\|_2\) precisely when \(z_2 = 0\), and in which case,
\[ \|x\|_2 \leq \frac{\|b\|_2}{\sigma}. \]
Proof
\((a)\): Write \(x=Vy\) with \(y=\begin{bmatrix}y_1\\y_2\end{bmatrix}\), \(y_1\in\mathbb{R}^r\). Since the 2-norm is unitarily invariant,
\[\|Ax-b\|_2^2=\|U^TAVy-U^Tb\|_2^2=\left\|\begin{bmatrix}\Sigma_1y_1\\0\end{bmatrix}-\begin{bmatrix}U_1^Tb\\U_2^Tb\end{bmatrix}\right\|_2^2=\|\Sigma_1y_1-U_1^Tb\|_2^2+\|U_2^Tb\|_2^2\ge\|U_2^Tb\|_2^2.\]
Equality holds iff \(y_1 = \Sigma_1^{-1} U_1^T b\), while \(y_2\) is arbitrary; hence every minimizer has the form \(x = Vy = V_1 \Sigma_1^{-1} U_1^T b + V_2 z_2\).
\((b)\): Since \(\|x\|_2^2=\|Vy\|_2^2=\|y\|_2^2=\|y_1\|_2^2+\|y_2\|_2^2\), the norm is minimal precisely when \(y_2 = z_2 = 0\), i.e. \(x = V_1 \Sigma_1^{-1} U_1^T b\), and then
\[\|x\|_2=\|\Sigma_1^{-1}U_1^Tb\|_2\le\|\Sigma_1^{-1}\|_2\,\|U_1^Tb\|_2\le\frac{\|b\|_2}{\sigma}.\]
2018 Term
Let \(C\) and \(D\) in \(\mathbb{C}^{n \times n}\) be Hermitian matrices. Denote their eigenvalues, in decreasing order, by
\[\lambda_1(C)\ge\cdots\ge\lambda_n(C)\quad\text{and}\quad\lambda_1(D)\ge\cdots\ge\lambda_n(D),\]
respectively. Then it is known that
\[\sum_{i=1}^n\big(\lambda_i(C)-\lambda_i(D)\big)^2\le\|C-D\|_F^2.\]
-
(a) Let \(A\) and \(B\) be in \(\mathbb{C}^{n \times n}\). Denote their singular values by
\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \quad \text{and} \quad \tau_1 \geq \tau_2 \geq \cdots \geq \tau_n, \]respectively. Prove that the following inequality holds:
\[ \sum_{i=1}^n (\sigma_i - \tau_i)^2 \leq \|A - B\|_F^2. \] -
(b) Suppose \(A = U \Sigma V^T\), where \(U = (u_1, u_2, \ldots, u_n)\), \(V = (v_1, v_2, \ldots, v_n)\) are orthogonal matrices, \(\Sigma = \text{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)\) with \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0\). Suppose \(\text{rank}(A) > k\) and denote by
\[ U_k = (u_1, u_2, \ldots, u_k), \quad V_k = (v_1, v_2, \ldots, v_k), \quad \Sigma_k = \text{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k), \]and
\[ A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^k \sigma_i u_i v_i^T. \]Prove that
\[ \min_{\text{rank}(B) = k} \|A - B\|_F^2 = \|A - A_k\|_F^2 = \sum_{i=k+1}^n \sigma_i^2. \] -
(c) Let the vectors \(x_i \in \mathbb{R}^n\), \(i = 1, 2, \ldots, n\), be in the space \(\mathcal{W}\) with dimension \(d\), where \(d \ll n\). Let the orthonormal basis of \(\mathcal{W}\) be \(W \in \mathbb{R}^{n \times d}\). Then we can represent \(x_i\) by
\[ x_i = c + W r_i + e_i, \quad i = 1, 2, \ldots, n, \]where \(c \in \mathbb{R}^n\) is a constant vector, \(r_i \in \mathbb{R}^d\) are coordinates, and \(e_i\) is the error. Denote \(R = (r_1, r_2, \ldots, r_n)\) and \(E = (e_1, e_2, \ldots, e_n)\). Find \(W, R\) and \(c\) such that the error \(\|E\|_F\) is minimized.
Hint: Write \(X = [x_1, x_2, \ldots, x_n] = c \mathbf{1}^T + W R + E\), where \(\mathbf{1}\) is the vector of all ones.
Proof
\((a)\): Let \(\tilde{A}=\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}\) and \(\tilde{B}=\begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}\). The eigenvalues of \(\tilde A\) are \(\pm\sigma_1,\dots,\pm\sigma_n\) and those of \(\tilde B\) are \(\pm\tau_1,\dots,\pm\tau_n\); sorted decreasingly, this pairs \(\sigma_i\) with \(\tau_i\) and \(-\sigma_i\) with \(-\tau_i\). Applying the given inequality to the Hermitian matrices \(\tilde A,\tilde B\),
\[2\sum_{i=1}^n(\sigma_i-\tau_i)^2=\sum_i\big(\lambda_i(\tilde A)-\lambda_i(\tilde B)\big)^2\le\|\tilde A-\tilde B\|_F^2=2\|A-B\|_F^2,\]
which gives \(\sum_{i=1}^n(\sigma_i-\tau_i)^2\le\|A-B\|_F^2\).
\((b)\): If \(\operatorname{rank}(B)=k\), then the singular values of \(B\) satisfy \(\tau_i = 0\) for \(i > k\), so by \((a)\)
\[\|A-B\|_F^2\ge\sum_{i=1}^n(\sigma_i-\tau_i)^2\ge\sum_{i=k+1}^n\sigma_i^2.\]
The bound is attained at \(B=A_k\): the singular values of \(A-A_k=\sum_{i=k+1}^n\sigma_iu_iv_i^T\) are \(\sigma_{k+1},\dots,\sigma_n\) (and zeros), so \(\|A - A_k\|_F^2 = \sum_{i=k+1}^n \sigma_i^2\).
\((c)\): We minimize \(\|E\|_F^2 = \|X - c \mathbf{1}^T - W R\|_F^2\). Fixing \(W\) and \(R\), the function \(J(c)=\sum_{i=1}^n \|x_i - c - W r_i\|_2^2\) is quadratic in \(c\) and minimized at \(c = \frac{1}{n} \sum_{i=1}^n (x_i - W r_i)\). We may normalize \(\sum_i r_i=0\) (absorbing any shift into \(c\)), so \(c = \bar{x}=\frac{1}{n} \sum_{i=1}^n x_i\). Let \(\bar{X}=X-\bar{x}\mathbf{1}^T\); the problem becomes minimizing \(\|\bar{X} - W R\|_F^2\). For fixed \(W\) with \(W^TW=I\), setting the derivative in \(R\) to zero gives \(R=W^T \bar{X}\), and then
\[\|E\|_F^2=\|\bar X-WW^T\bar X\|_F^2=\|\bar X\|_F^2-\|W^T\bar X\|_F^2.\]
So we must maximize \(\|W^T \bar{X}\|_F^2\) over orthonormal \(W\); the maximum is attained when the columns of \(W\) are the top \(d\) eigenvectors of \(\bar{X} \bar{X}^T\) (equivalently, the top \(d\) left singular vectors of \(\bar X\)). Thus \(c=\bar x\), \(W=U_d\), \(R=U_d^T\bar X\), which is exactly PCA.
2018 Term
Let \(A \in \mathbb{R}^{n \times m}\) with rank \(r < \min(m, n)\). Let \(A = U \Sigma V^T\) be the SVD of \(A\), with singular values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\).
-
(a) Show that, for every \(\epsilon > 0\), there is a full rank matrix \(A_\epsilon \in \mathbb{R}^{n \times m}\) such that
\[ \|A - A_\epsilon\|_2 = \epsilon. \] -
(b) Let \(A_k = U \Sigma_k V^T\) where \(\Sigma_k = \text{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)\) and \(1 \leq k \leq r-1\). Show that \(\text{rank}(A_k) = k\) and
\[ \sigma_{k+1} = \|A - A_k\|_2 = \min \{ \|A - B\|_2 \mid \text{rank}(B) \leq k \}. \] -
(c) Assume that \(r = \min(m, n)\). Let \(B \in \mathbb{R}^{n \times m}\) and assume that \(\|A - B\|_2 < \sigma_r\). Show that \(\text{rank}(B) = r\).
Proof
\((a)\): Let \(A_\epsilon = U \Sigma_\epsilon V^T\) with \(\Sigma_\epsilon = \text{diag}(\sigma_1, \ldots, \sigma_r, \epsilon, \ldots, \epsilon)\). Then \(A_\epsilon\) has full rank \(\min(m,n)\) and \(\|A - A_\epsilon\|_2 = \|\Sigma - \Sigma_\epsilon\|_2 = \epsilon\).
\((b)\): \(\text{rank}(A_k) = k\) is clear, and \(A-A_k\) has singular values \(\sigma_{k+1},\dots,\sigma_r\) and zeros, so \(\|A-A_k\|_2=\sigma_{k+1}\). Now let \(B\) be any matrix with \(\text{rank}(B) \leq k\). Its null space satisfies \(\mathrm{dim}\, \mathcal{N}(B) \geq m-k\), and the span \(V_{k+1}\) of the first \(k+1\) right singular vectors of \(A\) has dimension \(k+1\), so \(V_{k+1} \cap \mathcal{N}(B) \neq \{0\}\). Take a nonzero unit vector \(x \in V_{k+1} \cap \mathcal{N}(B)\); then
\[\|A-B\|_2\ge\|(A-B)x\|_2=\|Ax\|_2.\]
Writing \(x = \sum_{i=1}^{k+1} \alpha_i v_i\) with \(\sum_{i=1}^{k+1} \alpha_i^2 = 1\),
\[\|Ax\|_2^2=\sum_{i=1}^{k+1}\alpha_i^2\sigma_i^2\ge\sigma_{k+1}^2,\]
so \(\|A-B\|_2\ge\sigma_{k+1}=\|A-A_k\|_2\).
\((c)\): By Weyl's inequality for singular values, \(|\sigma_i(A) - \sigma_i(B)| \leq \|A - B\|_2\), so \(\sigma_r(B) \geq \sigma_r(A) - \|A - B\|_2 > 0\), hence \(\text{rank}(B) = r\).
.
Let \(b \in \mathbb{R}^n\). Suppose \(A \in M_{n \times n}(\mathbb{R})\) and \(B \in M_{n \times n}(\mathbb{R})\) are two \(n \times n\) matrices. Let \(A\) be non-singular.
- (a) Consider the iterative scheme: \(A x^{k+1} = b - B x^k\). State and prove the necessary and sufficient condition for the iterative scheme to converge.
- (b) Suppose the spectral radius of \(A^{-1}B\) satisfies \(\rho(A^{-1}B) = 0\). Prove that the iterative scheme converges in \(n\) iterations.
-
(c) Consider the following iterative scheme:
\[ x^{(k+1)} = \omega_1 x^{(k)} + \omega_2 (c_1 - M x^{(k)}) + \omega_3 (c_2 - M x^{(k)}) + \ldots + \omega_k (c_{k-1} - M x^{(k)}), \]where \(M\) is symmetric and positive definite, \(\omega_1 > 1\), \(\omega_2, \ldots, \omega_k > 0\) and \(c_1, \ldots, c_{k-1} \in \mathbb{R}^n\). Deduce from (a) that the iterative scheme converges if and only if all eigenvalues of \(M\) (denote it as \(\lambda(M)\)) satisfy:
\[ \frac{\omega_1 - 1}{\sum_{i=2}^k \omega_i} < \lambda(M) < \frac{\omega_1 + 1}{\sum_{i=2}^k \omega_i}. \] -
(d) Let \(A\) be non-singular. Now, consider the following system of iterative scheme (*):
\[ \begin{aligned} A x_1^{(k+1)} &= b_1 - B x_2^{(k)}, \\ A x_2^{(k+1)} &= b_2 - B x_1^{(k)}. \end{aligned} \]Find and prove the necessary and sufficient condition for the iterative scheme (*) to converge.
For the iterative scheme (**):
\[ \begin{aligned} A x_1^{(k+1)} &= b_1 - B x_2^{(k)}, \\ A x_2^{(k+1)} &= b_2 - B x_1^{(k+1)}. \end{aligned} \]Find and prove the necessary and sufficient condition for the iterative scheme (**) to converge. Compare the rate of convergence of the iterative schemes (*) and (**).
Proof
\((c)\): The scheme is a fixed-point iteration \(x^{(k+1)}=Gx^{(k)}+\text{const}\) with iteration matrix
\[G=\omega_1I-\Big(\sum_{i=2}^k\omega_i\Big)M.\]
By (a), it converges iff \(\rho(G) < 1\). Since \(M\) is symmetric positive definite, the eigenvalues of \(G\) are \(\omega_1-\big(\sum_{i=2}^k\omega_i\big)\lambda\) over \(\lambda\in\lambda(M)\), so \(\rho(G)<1\) iff
\[-1<\omega_1-\Big(\sum_{i=2}^k\omega_i\Big)\lambda<1 \iff \frac{\omega_1-1}{\sum_{i=2}^k\omega_i}<\lambda<\frac{\omega_1+1}{\sum_{i=2}^k\omega_i}\]
for every eigenvalue \(\lambda\) of \(M\).
\((d)\): The iteration matrix of (*) is
\[G_*=\begin{bmatrix}0&-A^{-1}B\\-A^{-1}B&0\end{bmatrix},\]
and that of (**) is
\[G_{**}=\begin{bmatrix}0&-A^{-1}B\\0&(A^{-1}B)^2\end{bmatrix}\]
(substitute the first equation of (**) into the second). The eigenvalues of \(G_*\) are \(\pm\lambda\) over \(\lambda\in\lambda(A^{-1}B)\), so \(\rho(G_*)=\rho(A^{-1}B)\); the eigenvalues of the block-triangular \(G_{**}\) are \(0\) together with those of \((A^{-1}B)^2\), so \(\rho(G_{**})=\rho(A^{-1}B)^2=\rho(G_*)^2\). By (a), each scheme converges iff its spectral radius is \(<1\); since \(\rho(G_{**})=\rho(G_*)^2\), both schemes converge under the same necessary and sufficient condition \(\rho(A^{-1}B)<1\). When they converge, (**) is asymptotically twice as fast: its spectral radius is the square of that of (*), so one step of (**) reduces the error as much as two steps of (*).
.
(Richardson Iteration) Let \(A\) be an \(n \times n\) matrix with real and positive eigenvalues and \(b\) be a given vector. Consider the solution of \(Ax = b\) by the following Richardson iteration
\[x_{k+1}=x_k+\omega\,(b-Ax_k),\]
where \(\omega\) is a damping coefficient. Let \(\lambda_1\) and \(\lambda_n\) be the smallest and the largest eigenvalues of \(A\). Let \(G_\omega = I - \omega A\).
-
(a) Prove that the Richardson's iteration converges if and only if
\[ 0 < \omega < \frac{2}{\lambda_n}. \] -
(b) Prove that the optimal choice of \(\omega\) is given by
\[ \omega_{\text{opt}} = \frac{2}{\lambda_1 + \lambda_n}. \]Prove also that
\[ \rho(G_\omega) = \begin{cases} 1 - \omega \lambda_1, & \omega \leq \omega_{\text{opt}}, \\ \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}, & \omega = \omega_{\text{opt}}, \\ \omega \lambda_n - 1, & \omega \geq \omega_{\text{opt}}. \end{cases} \]where \(\rho(G_\omega)\) is the spectral radius of \(G_\omega\).
-
(c) Prove that, if \(A\) is symmetric and positive definite, then
\[ \rho(G_{\omega_{\text{opt}}}) = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}, \]where \(\kappa_2(A) = \lambda_n / \lambda_1\).
.
(Power Method) Suppose \(A\) is an \(m \times m\) matrix with a complete set of orthonormal eigenvectors \(q_1, \ldots, q_m\) and corresponding eigenvalues \(\lambda_1, \ldots, \lambda_m\). Assume that \(|\lambda_1| > |\lambda_2| > |\lambda_3|\) and \(\lambda_j \geq \lambda_{j+1}\) for \(j = 3, \ldots, m\). Consider the power method \(\psi^{(k)} = A \psi^{(k-1)} / \lambda_1\), with \(\psi^{(0)} = \alpha_1 q_1 + \cdots + \alpha_m q_m\) where \(\alpha_1\) and \(\alpha_2\) are both nonzero. Show that the sequence \(\{\psi^{(k)}\}_{k=0}^{\infty}\) converges linearly to \(\alpha_1 q_1\) with asymptotic constant \(C = |\lambda_2 / \lambda_1|\).
Proof
By definition,
\[\psi^{(k)}=\frac{A^k\psi^{(0)}}{\lambda_1^k}=\sum_{i=1}^m\alpha_i\Big(\frac{\lambda_i}{\lambda_1}\Big)^{\!k}q_i=\alpha_1q_1+\alpha_2\Big(\frac{\lambda_2}{\lambda_1}\Big)^{\!k}q_2+\sum_{i=3}^m\alpha_i\Big(\frac{\lambda_i}{\lambda_1}\Big)^{\!k}q_i.\]
Hence, by orthonormality of the \(q_i\),
\[\|\psi^{(k)}-\alpha_1q_1\|_2^2=\alpha_2^2\Big|\frac{\lambda_2}{\lambda_1}\Big|^{2k}+\sum_{i=3}^m\alpha_i^2\Big|\frac{\lambda_i}{\lambda_1}\Big|^{2k},\]
and since \(|\lambda_i|\le|\lambda_3|<|\lambda_2|\) for \(i\ge3\) while \(\alpha_2\ne0\), the first term dominates, so
\[\lim_{k\to\infty}\frac{\|\psi^{(k+1)}-\alpha_1q_1\|_2}{\|\psi^{(k)}-\alpha_1q_1\|_2}=\Big|\frac{\lambda_2}{\lambda_1}\Big|.\]
Therefore \(\{\psi^{(k)}\}\) converges linearly to \(\alpha_1 q_1\) with asymptotic constant \(C = |\lambda_2 / \lambda_1|\).
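A minimal numerical illustration (matrix and data chosen for hand-checkable eigenpairs): for \(A=\begin{pmatrix}3&1\\1&3\end{pmatrix}\) we have \(\lambda_1=4\), \(q_1=(1,1)/\sqrt2\), \(\lambda_2=2\), \(q_2=(1,-1)/\sqrt2\), and \(\psi^{(0)}=(1,0)\) gives \(\alpha_1=\alpha_2=1/\sqrt2\), so \(\alpha_1q_1=(0.5,0.5)\) and the error should shrink by the factor \(|\lambda_2/\lambda_1|=0.5\) per step.

```python
lam1 = 4.0
psi = [1.0, 0.0]
errs = []
for _ in range(25):
    psi = [(3.0 * psi[0] + 1.0 * psi[1]) / lam1,
           (1.0 * psi[0] + 3.0 * psi[1]) / lam1]      # psi <- A psi / lambda_1
    # distance to the limit alpha_1 q_1 = (0.5, 0.5)
    errs.append(max(abs(psi[0] - 0.5), abs(psi[1] - 0.5)))
```
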
.
(Lanczos Iteration) Let \(A \in \mathbb{R}^{n \times n}\) be symmetric and let \(\|q_1\|_2 = 1\). Consider the following Lanczos iteration (with \(\beta_0=0\), \(q_0=0\)): for \(k=1,2,\dots\),
\[v=Aq_k-\beta_{k-1}q_{k-1},\quad \alpha_k=q_k^Tv,\quad v=v-\alpha_kq_k,\quad \beta_k=\|v\|_2,\quad q_{k+1}=v/\beta_k,\]
with residual \(r_k=\beta_kq_{k+1}\); the iteration stops if \(\beta_k=0\).
Let \(\mathcal{K}_n = \text{span}\{q_1, A q_1, \cdots, A^{n-1} q_1\}\).
-
(a) Show that
\[ A Q_k = Q_k T_k + r_k e_k^T, \]where \(e_k\) is the \(k\)-th unit vector, \(Q_k = [q_1 \cdots q_k]\) and
\[ T_k = \begin{bmatrix} \alpha_1 & \beta_1 & & & \\ \beta_1 & \alpha_2 & \beta_2 & & \\ & \beta_2 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_{k-1} \\ & & & \beta_{k-1} & \alpha_k \end{bmatrix}. \] -
(b) Assume that the iteration does not terminate. Show that \(Q_k\) has orthonormal columns, and that they span \(\mathcal{K}_k\).
- (c) Show that the Lanczos iteration will stop when \(k = m\), where \(m = \text{rank}(\mathcal{K}_n)\).
- (d) What is the purpose of this algorithm? Briefly justify your answer.
Proof
\((a)\): Immediate from the definitions: the \(j\)-th column of \(AQ_k=Q_kT_k+r_ke_k^T\) is exactly the recurrence \(Aq_j=\beta_{j-1}q_{j-1}+\alpha_jq_j+\beta_jq_{j+1}\), with \(r_k=\beta_kq_{k+1}\).
\((b)\): Orthonormality of the columns of \(Q_k\) follows by induction: each new vector is explicitly orthogonalized against \(q_j\) and \(q_{j-1}\), and orthogonality against the earlier \(q_i\) follows from the symmetry of \(A\). That they span \(\mathcal{K}_k\) also follows by induction, since \(q_{j+1}\in\operatorname{span}\{Aq_j,q_j,q_{j-1}\}\subseteq\mathcal{K}_{j+1}\) while \(q_{j+1}\notin\mathcal{K}_j\).
\((c)\): When \(k = m\) we have \(\mathcal{K}_m = \mathcal{K}_{m+1}\), hence \(r_m = 0\) and the iteration stops. Indeed, \(r_m=0\) means
\[Aq_m=\beta_{m-1}q_{m-1}+\alpha_mq_m,\qquad\text{i.e.}\qquad AQ_m=Q_mT_m,\]
so \(\mathcal{K}_m\) is an invariant subspace of \(A\), and therefore \(\mathcal{K}_m = \mathcal{K}_{m+1} = \cdots\). Conversely, if \(k < m\), then \(\mathcal{K}_k \subsetneq \mathcal{K}_{k+1}\), so \(r_k \neq 0\) and the iteration does not stop.
\((d)\): The Lanczos iteration is used to solve eigenvalue problems (and linear systems) for large sparse symmetric matrices. Since \(T_k=Q_k^TAQ_k\) is the representation of \(A\) restricted to \(\mathcal{K}_k\), the eigenvalues of \(T_k\) are the Ritz values of \(A\); as \(k\) grows, the extreme Ritz values converge to the extreme eigenvalues of \(A\), while only matrix-vector products with \(A\) and a small tridiagonal eigenproblem are required.
.
(Arnoldi Process) Given a vector \(b \in \mathbb{R}^m\) and \(A \in \mathbb{R}^{m \times m}\), the Arnoldi process is a systematic way of constructing an orthonormal basis for the successive Krylov subspaces
\[\mathcal{K}_n=\operatorname{span}\{b,Ab,\dots,A^{n-1}b\}.\]
It gives
\[AQ_n=Q_{n+1}\tilde H_n,\]
where \(Q_n \in \mathbb{R}^{m \times n}\), \(Q_{n+1} \in \mathbb{R}^{m \times (n+1)}\) have orthonormal columns and \(\tilde{H}_n \in \mathbb{R}^{(n+1) \times n}\) is upper-Hessenberg. Let \(H_n \in \mathbb{R}^{n \times n}\) be obtained by deleting the last row of \(\tilde{H}_n\).
- (a) Write out the Arnoldi algorithm.
- (b) Assume that at step \(n\), the \((n+1,n)\)-th entry of \(\tilde{H}_n\) is zero.
- i. Show that \(\mathcal{K}_n\) is an invariant subspace of \(A\) and that \(\mathcal{K}_n = \mathcal{K}_{n+1} = \mathcal{K}_{n+2} = \ldots\)
- ii. Show that each eigenvalue of \(H_n\) is an eigenvalue of \(A\) for \(n > 1\).
-
(c) Let \(\mathcal{P}_n\) be the set of monic polynomials of degree \(n\). Show that the minimizer of
\[ \min_{p_n \in \mathcal{P}_n} \|p_n(A) b\|_2 \]is given by the characteristic polynomial of \(H_n\).
Proof
\((a)\): The Arnoldi algorithm: set \(q_1=b/\|b\|_2\); for \(n=1,2,\dots\): \(v=Aq_n\); for \(i=1,\dots,n\): \(h_{in}=q_i^Tv\), \(v=v-h_{in}q_i\); then \(h_{n+1,n}=\|v\|_2\); if \(h_{n+1,n}=0\), stop; otherwise \(q_{n+1}=v/h_{n+1,n}\).
\((b)\) i.: The \(n\)-th column of \(AQ_n=Q_{n+1}\tilde H_n\) reads
\[Aq_n=h_{1n}q_1+\cdots+h_{nn}q_n+h_{n+1,n}q_{n+1}.\]
If \(h_{n+1,n}=0\), then \(Aq_n\) already lies in \(\text{span}\{q_1, \ldots, q_n\}\), and the matrix relation reduces to \(AQ_n=Q_n H_n\), where \(H_n\) consists of the first \(n\) rows of \(\tilde{H}_n\). For any \(x\in \mathcal{K}_n\), there exists \(y\in \mathbb{R}^n\) with \(x=Q_n y\), so \(Ax=A Q_n y=Q_n H_n y \in \mathcal{K}_n\); hence \(\mathcal{K}_n\) is an invariant subspace of \(A\), and \(\mathcal{K}_n = \mathcal{K}_{n+1} = \mathcal{K}_{n+2} = \cdots\).
ii.: Let \(\lambda\) be an eigenvalue of \(H_n\) with eigenvector \(y\), so \(A Q_n y = Q_n H_n y = \lambda Q_n y\). Since the columns of \(Q_n\) are orthonormal and \(y\neq 0\), we have \(Q_n y \neq 0\); hence \(\lambda\) is an eigenvalue of \(A\).
\((c)\): First, for any polynomial \(q\) of degree at most \(n\),
\[Q_n^T\,q(A)\,b=\|b\|_2\,q(H_n)\,e_1.\]
Indeed, induction using \(AQ_n=Q_nH_n+h_{n+1,n}q_{n+1}e_n^T\) shows \(A^jb=\|b\|_2\,Q_nH_n^je_1\) for \(j\le n-1\) (the vector \(H_n^je_1\) is supported on its first \(j+1\) entries, so the \(e_n^T\)-term vanishes), and for \(j=n\), \(Q_n^TA^nb=\|b\|_2\,H_n^ne_1\) since \(Q_n^Tq_{n+1}=0\). Now, since \(p_n\) is monic, \(p_n(A)b=A^nb-w\) with \(w\in\mathcal{K}_n\) determined by the lower-order coefficients, so \(\min_{p_n\in\mathcal{P}_n}\|p_n(A)b\|_2\) is a linear least-squares problem over \(\mathcal{K}_n\), whose minimizer is characterized by the orthogonality condition \(p_n(A)b\perp\mathcal{K}_n\), i.e. \(Q_n^Tp_n(A)b=0\). Taking \(p_n\) to be the characteristic polynomial of \(H_n\), the identity above together with the Cayley-Hamilton theorem gives \(Q_n^Tp_n(A)b=\|b\|_2\,p_n(H_n)e_1=0\); hence the characteristic polynomial of \(H_n\) is the minimizer.
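A minimal pure-Python Arnoldi sketch complementing part (a) (dense matrix-vector products on a small illustrative matrix; a practical code would exploit sparsity):

```python
import math

def arnoldi(A, b, steps):
    # Builds orthonormal q_1..q_{steps+1} (entries of the list Q) and the
    # (steps+1) x steps upper-Hessenberg H with A Q_n = Q_{n+1} H.
    m = len(b)
    mv = lambda v: [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
    nrm = math.sqrt(sum(t * t for t in b))
    Q = [[t / nrm for t in b]]
    H = [[0.0] * steps for _ in range(steps + 1)]
    for k in range(steps):
        v = mv(Q[k])
        for i in range(k + 1):               # orthogonalize against q_1..q_{k+1}
            H[i][k] = sum(Q[i][t] * v[t] for t in range(m))
            v = [v[t] - H[i][k] * Q[i][t] for t in range(m)]
        H[k + 1][k] = math.sqrt(sum(t * t for t in v))
        if H[k + 1][k] < 1e-14:              # breakdown: K_n is invariant
            break
        Q.append([t / H[k + 1][k] for t in v])
    return Q, H

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], 3)
```

One can then check numerically that the \(q_i\) are orthonormal and that \(Aq_k=\sum_ih_{ik}q_i\) column by column.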
.
(Steepest Descent) Let \(A\) be a positive definite matrix. Consider the quadratic function \(g(x) = \frac{1}{2} x^T A x - b^T x\). Suppose the eigenvalues of \(A\) are given by \(0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n\), and \(\kappa = \frac{\lambda_n}{\lambda_1}\). Let the sequence \(\{x_k\}\) be generated by the steepest descent method
\[x_{k+1}=x_k+\alpha_k r_k,\qquad r_k=b-Ax_k,\]
where \(\alpha_k\) is selected such that
\[g(x_k+\alpha_k r_k)=\min_{\alpha\in\mathbb{R}}g(x_k+\alpha r_k).\]
Prove that
\[\|x_{k+1}-x^*\|_A\le\frac{\kappa-1}{\kappa+1}\,\|x_k-x^*\|_A,\qquad\text{where }Ax^*=b\text{ and }\|x\|_A=\sqrt{x^TAx}.\]
Proof
Let \(x^*\) be the unique minimizer of \(g\), satisfying \(Ax^*=b\), and set \(e_k = x_k - x^*\), so \(r_k=b-Ax_k=-Ae_k\). Then
\[g(x_k)-g(x^*)=\tfrac12e_k^TAe_k=\tfrac12\|e_k\|_A^2.\]
Differentiating \(g(x_k+\alpha r_k)\) in \(\alpha\) and setting it to zero gives \(\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}\). Hence \(e_{k+1}=e_k+\alpha_kr_k\) and, using \(r_k^TAe_k=-r_k^Tr_k\),
\[\|e_{k+1}\|_A^2=\|e_k\|_A^2-\frac{(r_k^Tr_k)^2}{r_k^TAr_k}=\|e_k\|_A^2\left(1-\frac{(r_k^Tr_k)^2}{(r_k^TAr_k)(r_k^TA^{-1}r_k)}\right),\]
where the last step uses \(\|e_k\|_A^2=e_k^TAe_k=r_k^TA^{-1}r_k\). By the Kantorovich inequality, for a positive definite matrix \(A\) with eigenvalues \(0 < \lambda_1 \leq \cdots \leq \lambda_n\) and any nonzero vector \(x\),
\[\frac{(x^Tx)^2}{(x^TAx)(x^TA^{-1}x)}\ge\frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}.\]
Therefore
\[\|e_{k+1}\|_A^2\le\|e_k\|_A^2\left(1-\frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}\right)=\|e_k\|_A^2\left(\frac{\lambda_n-\lambda_1}{\lambda_n+\lambda_1}\right)^{\!2},\]
i.e. \(\|x_{k+1}-x^*\|_A\le\frac{\kappa-1}{\kappa+1}\|x_k-x^*\|_A\). \(\square\)
.
(Conjugate Gradient Method) Let \(A\) be an \(N \times N\) symmetric positive definite matrix. The conjugate gradient method can be described as follows:
\[\begin{aligned}&\mathbf{r}_0=\mathbf{b}-A\mathbf{x}_0,\qquad \mathbf{p}_0=\mathbf{r}_0;\\&\text{for }n=0,1,2,\dots:\quad \alpha_n=\frac{\mathbf{r}_n^T\mathbf{r}_n}{\mathbf{p}_n^TA\mathbf{p}_n},\quad \mathbf{x}_{n+1}=\mathbf{x}_n+\alpha_n\mathbf{p}_n,\quad \mathbf{r}_{n+1}=\mathbf{r}_n-\alpha_nA\mathbf{p}_n,\\&\phantom{\text{for }n=0,1,2,\dots:\quad}\beta_n=\frac{\mathbf{r}_{n+1}^T\mathbf{r}_{n+1}}{\mathbf{r}_n^T\mathbf{r}_n},\quad \mathbf{p}_{n+1}=\mathbf{r}_{n+1}+\beta_n\mathbf{p}_n.\end{aligned}\]
Show that:
- (a) \(\alpha_n\) minimizes \(f(\mathbf{x}_n + \alpha \mathbf{p}_n)\) for all \(\alpha \in \mathbb{R}\), where \(f(\mathbf{x}) \equiv \frac{1}{2} \mathbf{x}^T A \mathbf{x} - \mathbf{b}^T \mathbf{x}\).
- (b) \(\mathbf{p}_i^T \mathbf{r}_n = 0\) for \(i < n\) and \(\mathbf{p}_i^T A \mathbf{p}_j = 0\) if \(i \neq j\).
- (c) \(\text{Span}\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_{n-1}\} = \text{Span}\{\mathbf{r}_0, \mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\} \equiv \mathcal{K}_n\).
- (d) \(\mathbf{r}_n\) is orthogonal to \(\mathcal{K}_n\).
Proof
\((a)\): This is the same one-dimensional minimization as in the steepest descent problem above: \(\frac{d}{d\alpha}f(\mathbf{x}_n+\alpha\mathbf{p}_n)=\mathbf{p}_n^T\big(A(\mathbf{x}_n+\alpha\mathbf{p}_n)-\mathbf{b}\big)=0\) gives \(\alpha=\frac{\mathbf{p}_n^T\mathbf{r}_n}{\mathbf{p}_n^TA\mathbf{p}_n}\), which equals the \(\alpha_n\) of the algorithm since \(\mathbf{p}_n^T\mathbf{r}_n=\mathbf{r}_n^T\mathbf{r}_n\) by (b); the second derivative \(\mathbf{p}_n^TA\mathbf{p}_n>0\) confirms it is a minimum.
\((b)\): By induction on \(n\), using \(\mathbf{r}_{n+1}=\mathbf{r}_n-\alpha_nA\mathbf{p}_n\) and the definitions of \(\alpha_n\) and \(\beta_n\).
\((c)\): \(\mathbf{r}_k=\mathbf{r}_{k-1}-\alpha_{k-1} A \mathbf{p}_{k-1}\in \mathcal{K}_{k+1}\), and by (b) \(\mathbf{r}_k\) is orthogonal to the preceding \(\mathbf{p}_i\); hence \(\text{Span}\{\mathbf{r}_0,\cdots,\mathbf{r}_k\}=\mathcal{K}_{k+1}\) as long as the iteration has not converged. On the other hand \(\mathbf{p}_k=\mathbf{r}_k+\beta_{k-1} \mathbf{p}_{k-1}\in \mathcal{K}_{k+1}\), so \(\text{Span}\{\mathbf{p}_0,\cdots,\mathbf{p}_k\} = \text{Span}\{\mathbf{r}_0,\cdots,\mathbf{r}_k\} = \mathcal{K}_{k+1}\), the equality of dimensions following by induction.
\((d)\): By (b), \(\mathbf{r}_n\) is orthogonal to \(\mathbf{p}_0, \ldots, \mathbf{p}_{n-1}\), and by (c) these vectors span \(\mathcal{K}_n\); hence \(\mathbf{r}_n\) is orthogonal to \(\mathcal{K}_n\).
.
(GMRES Method) Consider the linear system \(Ax = b\). The GMRES method is a projection method which obtains a solution in the \(m\)-th Krylov subspace \(\mathcal{K}_m\) so that the residual is orthogonal to \(A \mathcal{K}_m\). Let \(r_0\) be the initial residual and let \(v_0 = r_0 / \|r_0\|_2\). The Arnoldi process is applied to build an orthonormal system \(v_1, v_2, \ldots, v_{m-1}\) with \(v_1 = A v_0 / \|A v_0\|_2\). The approximate solution is obtained from the space \(\mathcal{K}_m = \text{span}\{v_0, v_1, \ldots, v_{m-1}\}\).
- (i) Show that the approximate solution is obtained as the solution of a least-square problem, and that this problem is triangular.
- (ii) Prove that the residual \(r_k\) is orthogonal to \(\{v_1, v_2, \ldots, v_{k-1}\}\).
- (iii) Find a formula for the residual norm.
- (iv) Derive the complete algorithm.
.
(Structured Linear Systems)
-
(1) Suppose
\[ S = \begin{bmatrix} \sigma & u^T \\ 0 & S_c \end{bmatrix}, \quad T = \begin{bmatrix} \tau & v^T \\ 0 & T_c \end{bmatrix}, \quad b = \begin{bmatrix} \beta \\ b_c \end{bmatrix}, \]where \(\sigma, \tau, \beta\) are scalars, \(S_c\) and \(T_c\) are \(n \times n\) matrices, and \(b_c\) is an \(n\)-vector. Show that if there exists a vector \(\mathbf{x}_c\) such that
\[ (S_c T_c - \lambda I) \mathbf{x}_c = b_c, \]and \(\mathbf{w}_c = T_c \mathbf{x}_c\) is available, then
\[ \mathbf{x} = \begin{bmatrix} \gamma \\ \mathbf{x}_c \end{bmatrix}, \quad \gamma = \frac{\beta - \sigma v^T \mathbf{x}_c - u^T \mathbf{w}_c}{\sigma \tau - \lambda}, \]solves \((ST - \lambda I) \mathbf{x} = b\).
-
(2) Hence or otherwise, derive an \(O(n^2)\) algorithm for solving the linear system \((U_1 U_2 - \lambda I) \mathbf{x} = b\) where \(U_1\) and \(U_2\) are \(n \times n\) upper triangular matrices, and \((U_1 U_2 - \lambda I)\) is nonsingular. Please write down your algorithm and prove that it is indeed of \(O(n^2)\) complexity.
-
(3) Hence or otherwise, derive an \(O(p n^2)\) algorithm for solving the linear system \((U_1 U_2 \cdots U_p - \lambda I) \mathbf{x} = b\) where \(\{U_i\}_{i=1}^p\) are all \(n \times n\) upper triangular matrices, and \((U_1 U_2 \cdots U_p - \lambda I)\) is nonsingular. Please write down your algorithm and prove that it is indeed of \(O(p n^2)\) complexity.
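Parts (1)-(2) can be sketched in pure Python as follows (names are illustrative): the loop runs the recursion of part (1) bottom-up, maintaining \(w = U_2 x\) on the trailing block so that each of the \(n\) levels costs \(O(n)\), for \(O(n^2)\) total; replacing the pair \((U_1,U_2)\) by longer products as in part (3) repeats the same idea factor by factor.

```python
def tri_prod_solve(U1, U2, lam, b):
    # Solve (U1 U2 - lam*I) x = b with U1, U2 upper triangular, in O(n^2).
    # Bottom-up: at level i, x[i+1:] is known and w[i+1:] holds
    # U2[i+1:, i+1:] @ x[i+1:], i.e. the vector w_c of part (1).
    n = len(b)
    x = [0.0] * n
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        sigma, tau = U1[i][i], U2[i][i]
        vtx = sum(U2[i][j] * x[j] for j in range(i + 1, n))   # v^T x_c
        utw = sum(U1[i][j] * w[j] for j in range(i + 1, n))   # u^T w_c
        x[i] = (b[i] - sigma * vtx - utw) / (sigma * tau - lam)
        w[i] = tau * x[i] + vtx            # extend w one level upward
    return x

U1 = [[1.0, 2.0, 1.0], [0.0, 3.0, 1.0], [0.0, 0.0, 2.0]]
U2 = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [0.0, 0.0, 3.0]]
lam = 0.5
xs = [1.0, -1.0, 2.0]
# Build b = (U1 U2 - lam*I) xs directly, then recover xs with the solver.
M = [[sum(U1[i][k] * U2[k][j] for k in range(3)) - (lam if i == j else 0.0)
      for j in range(3)] for i in range(3)]
b = [sum(M[i][j] * xs[j] for j in range(3)) for i in range(3)]
x = tri_prod_solve(U1, U2, lam, b)
```
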