Numerical Algebra Practice Problems
Numerical Algebra
Autumn, 2023
Consider the explicit shifted QR method for computing eigenvalues of a matrix \(A\in\mathbb{C}^{n\times n}\).
1: find an upper Hessenberg matrix \(H_{0}\) and a unitary matrix \(U_{0}\) such that \(H_{0}=U_{0}^{H}AU_{0}\).
2: for \(k=0,1,2,\dots\) do
3: determine a scalar \(\mu_{k}.\)
4: compute QR factorization \(Q_{k}R_{k}=H_{k}-\mu_{k}I\)
5: \(H_{k+1}=R_{k}Q_{k}+\mu_{k}I\).
6: end for
(1). Prove that \(H_{i}\), \(i=1,2,\dots\) are all upper Hessenberg matrices.
(2). Explain the purpose of using \(H_{0}\) rather than \(A\) in the iteration.
(3). Suppose that \(A\) has \(n\) distinct eigenvalues and none of the shifts \(\mu_{i}\), \(i=1,2,\dots\) is an eigenvalue of \(A\). Prove that \(H_{i}\), \(i=0,1,2,\dots\) are unreduced upper Hessenberg matrices. (An upper Hessenberg matrix \(H\) is called unreduced, if \(H_{i+1,i}\ne0\) for \(i=1,\dots,n-1.\))
(4). Write \(H_{k}=\begin{bmatrix}G_{k}&u_{k}\\ \epsilon_{k}e^{T}&\alpha_{k}\end{bmatrix}\) where \(\alpha_{k}, \epsilon_{k}\in\mathbb{C}\), \(u_{k}, e\in\mathbb{C}^{n-1}\) and \(e=\begin{bmatrix}0\\ \vdots\\ 0\\ 1\end{bmatrix}.\) Suppose \(A\) has \(n\) distinct eigenvalues. Prove that
where \(\rho_{k}=||(G_{k}-\mu_{k}I)^{-1}||_{2}\), provided \(\mu_{k}\) is not an eigenvalue of \(G_{k}\).
Proof
\((1)\): By induction. \(H_0\) is upper Hessenberg by construction. Suppose \(H_k\) is upper Hessenberg; then so is \(H_k-\mu_k I\), and since \(R_k\) is upper triangular, \(Q_k=(H_k-\mu_kI)R_k^{-1}\) is the product of an upper Hessenberg matrix and an upper triangular matrix, hence upper Hessenberg (when \(R_k\) is invertible; the singular case follows by continuity). Therefore \(H_{k+1}=R_kQ_k+\mu_kI\), an upper triangular matrix times an upper Hessenberg matrix plus a diagonal shift, is upper Hessenberg.
\((2)\): Iterating directly on the dense matrix \(A\) requires a QR factorization of a dense \(n\times n\) matrix at every step, costing \(\mathcal{O}(n^3)\) flops per iteration. Reducing \(A\) to the upper Hessenberg form \(H_0\) first incurs a single up-front \(\mathcal{O}(n^3)\) cost; by (1) the Hessenberg structure is preserved throughout, so each subsequent QR step (e.g. via \(n-1\) Givens rotations) costs only \(\mathcal{O}(n^2)\). The total cost for \(N\) iterations drops to \(\mathcal{O}(n^3+Nn^2)\).
\((3)\): Assume \(H_0\) is an unreduced upper Hessenberg matrix. We show that if \(H_k\) is unreduced, then so is \(H_{k+1}\).
Since \(\mu_k\) is not an eigenvalue of \(A\) (and the \(H_k\) are unitarily similar to \(A\)), \(H_k-\mu_kI\) is nonsingular, so the upper triangular factor \(R_k\) of its QR factorization is also nonsingular; in particular its diagonal entries are all nonzero: \((R_k)_{j,j}\ne0\) for all \(j\). Write \(Q_k=(H_k-\mu_kI)R_k^{-1}\).
Consider the subdiagonal entries of \(Q_k\). Since \(R_k^{-1}\) is upper triangular, only one term survives:
\[(Q_k)_{j+1,j}=(H_k-\mu_kI)_{j+1,j}\,(R_k^{-1})_{j,j}.\]
Since \(H_k\) is unreduced and \((R_k^{-1})_{j,j}=1/(R_k)_{j,j}\ne0\),
\[(Q_k)_{j+1,j}=\frac{(H_k)_{j+1,j}}{(R_k)_{j,j}}\ne0.\]
Next consider the subdiagonal entries of \(H_{k+1}\). From \(H_{k+1}-\mu_kI=R_kQ_k\) and the triangularity of \(R_k\),
\[(H_{k+1})_{j+1,j}=(R_k)_{j+1,j+1}\,(Q_k)_{j+1,j}.\]
Since \(R_k\) has nonzero diagonal and, as just shown, \(Q_k\) has nonzero subdiagonal, the product is nonzero.
This proves that if the subdiagonal of \(H_k\) contains no zero, then neither does that of \(H_{k+1}\); by induction all \(H_i\) are unreduced. \(\square\)
\((4)\): Partition \(Q_k=\begin{bmatrix}Q_{11}&q_{12}\\ q_{21}^T&q_{22}\end{bmatrix}\), \(R_k=\begin{bmatrix}R_{11}&r_{12}\\ 0&r_{22}\end{bmatrix}\) conformally with \(H_k\). Comparing the \((2,1)\) blocks of \(Q_kR_k=H_k-\mu_kI\) gives
\[q_{21}^TR_{11}=\epsilon_k e^T,\qquad\text{i.e.}\qquad q_{21}^T=\epsilon_k\,e^TR_{11}^{-1},\]
so \(\|q_{21}\|_2\le|\epsilon_k|\,\|R_{11}^{-1}\|_2\). Comparing the first block column of \(R_k=Q_k^H(H_k-\mu_kI)\) gives
\[\begin{bmatrix}R_{11}\\ 0\end{bmatrix}=Q_k^H\begin{bmatrix}G_k-\mu_kI\\ \epsilon_ke^T\end{bmatrix},\qquad\text{hence}\qquad R_{11}^HR_{11}=(G_k-\mu_kI)^H(G_k-\mu_kI)+|\epsilon_k|^2ee^T,\]
so \(\sigma_{\min}(R_{11})\ge\sigma_{\min}(G_k-\mu_kI)\) and therefore \(\|R_{11}^{-1}\|_2\le\|(G_k-\mu_kI)^{-1}\|_2=\rho_k\), giving \(\|q_{21}\|_2\le\rho_k|\epsilon_k|\). Since the last row \([q_{21}^T,\ q_{22}]\) and the last column \([q_{12};\ q_{22}]\) of the unitary matrix \(Q_k\) both have unit norm, \(\|q_{12}\|_2=\|q_{21}\|_2\le\rho_k|\epsilon_k|\). The \((2,2)\) entry of \(R_k=Q_k^H(H_k-\mu_kI)\) reads
\[r_{22}=q_{12}^Hu_k+\bar q_{22}(\alpha_k-\mu_k),\qquad\text{so}\qquad |r_{22}|\le\rho_k|\epsilon_k|\,\|u_k\|_2+|\alpha_k-\mu_k|.\]
Finally, the \((2,1)\) block of \(H_{k+1}-\mu_kI=R_kQ_k\) gives
\[\epsilon_{k+1}e^T=r_{22}\,q_{21}^T,\qquad\text{so}\qquad |\epsilon_{k+1}|\le|r_{22}|\,\|q_{21}\|_2\le\rho_k^2\|u_k\|_2\,|\epsilon_k|^2+\rho_k|\alpha_k-\mu_k|\,|\epsilon_k|.\]
In particular, with the Rayleigh shift \(\mu_k=\alpha_k\) this yields the quadratic bound \(|\epsilon_{k+1}|\le\rho_k^2\|u_k\|_2\,|\epsilon_k|^2\).
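As a sanity check, the iteration can be run numerically. Below is a minimal pure-Python sketch (not a production QR algorithm): it uses modified Gram-Schmidt for the QR factorization and the Rayleigh-quotient shift \(\mu_k=(H_k)_{nn}\) on a small illustrative tridiagonal matrix, and one observes the subdiagonal entry \(\epsilon_k=(H_k)_{n,n-1}\) collapsing to zero while the Hessenberg structure and the trace are preserved.

```python
import math

def qr_mgs(M):
    # QR factorization of a small nonsingular matrix via modified Gram-Schmidt.
    # Returns Q as a list of orthonormal columns and R as a full matrix.
    n = len(M)
    cols = [[M[i][j] for i in range(n)] for j in range(n)]
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            R[i][j] = sum(Q[i][t] * v[t] for t in range(n))
            v = [v[t] - R[i][j] * Q[i][t] for t in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))
        Q.append([t / R[j][j] for t in v])
    return Q, R

def shifted_qr_step(H, mu):
    # One explicit shifted QR step: H - mu*I = Q R, return R Q + mu*I.
    n = len(H)
    Q, R = qr_mgs([[H[i][j] - mu * (i == j) for j in range(n)] for i in range(n)])
    return [[sum(R[i][t] * Q[j][t] for t in range(n)) + mu * (i == j)
             for j in range(n)] for i in range(n)]

H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]            # upper Hessenberg (here tridiagonal) start
for _ in range(50):
    if abs(H[2][1]) < 1e-13:
        break
    H = shifted_qr_step(H, H[2][2])   # Rayleigh-quotient shift mu_k = (H_k)_nn
```

After the loop, \(H\) is still upper Hessenberg, its trace still equals that of the starting matrix, and \((H)_{3,2}\) has been driven to roundoff level, so \((H)_{3,3}\) approximates an eigenvalue.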
Spring, 2024
Suppose \(k \le n \le m\) and \(A \in \mathbb{C}^{m \times n}\).
(i) Find a rank-\(k\) matrix \(X\) satisfying \(||A-X||_{2} \le ||A-B||_{2}\) for any rank-\(k\) matrix \(B\). Here \(|| \cdot ||_{2}\) is the spectral norm.
(ii) Find a rank-\(k\) matrix \(X\) satisfying \(||A-X||_{F} \le ||A-B||_{F}\) for any rank-\(k\) matrix \(B\). Here \(|| \cdot ||_{F}\) is the Frobenius norm.
Proof
Consider the SVD of \(A\):
\[A=U\Sigma V^*,\]
where \(U = [u_1, u_2, \dots, u_m] \in \mathbb{C}^{m \times m}\) and \(V = [v_1, v_2, \dots, v_n] \in \mathbb{C}^{n \times n}\) are unitary, and \(\Sigma \in \mathbb{R}^{m \times n}\) is diagonal with singular values in decreasing order \(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0\). The answer to both parts is the truncated SVD
\[X=A_k:=\sum_{i=1}^{k}\sigma_iu_iv_i^*.\]
(i): For any rank-\(k\) matrix \(B\), the null space \(\mathcal{N}(B)\) has dimension at least \(n-k\). Consider the subspace \(\mathcal{W} = \operatorname{span}\{v_1, v_2, \dots, v_{k+1}\}\) spanned by the first \(k+1\) columns of \(V\), of dimension \(k+1\). Since \((n-k)+(k+1)>n\), the intersection \(\mathcal{W}\cap\mathcal{N}(B)\) is nontrivial. Take a unit vector \(z \in \mathcal{W} \cap \mathcal{N}(B)\); then \(Bz=0\) and \(z=\sum_{i=1}^{k+1} c_i v_i\) with \(\sum_{i=1}^{k+1} |c_i|^2 = 1\). Therefore
\[\|A-B\|_2^2\ge\|(A-B)z\|_2^2=\|Az\|_2^2=\sum_{i=1}^{k+1}|c_i|^2\sigma_i^2\ge\sigma_{k+1}^2=\|A-A_k\|_2^2.\]
(ii): The Frobenius norm is unitarily invariant, so
\[\|A-B\|_F=\|U^*(A-B)V\|_F=\|\Sigma-U^*BV\|_F.\]
Let \(C = U^* B V\); since \(B\) has rank \(k\), so does \(C\). We must minimize \(\|\Sigma - C\|_F^2 = \sum_{i,j} |\Sigma_{i,j} - C_{i,j}|^2\) over rank-\(k\) matrices \(C\). Since \(\Sigma\) is diagonal, the minimum is attained by matching the \(k\) largest diagonal entries \(\sigma_1,\dots,\sigma_k\) and setting every other entry to zero; that no non-diagonal rank-\(k\) matrix can do better follows from the singular value inequality \(\sum_i(\sigma_i(\Sigma)-\sigma_i(C))^2\le\|\Sigma-C\|_F^2\) together with \(\sigma_i(C)=0\) for \(i>k\). This optimal \(C\) is exactly the truncated matrix \(\Sigma_k\), i.e. \(B=U\Sigma_kV^*=A_k\).
Spring, 2024
Consider the Gauss-Seidel iteration for solving the linear equation \(Ax=b\) for \(A=[a_{ij}] \in \mathbb{R}^{n \times n}\), \(b=[b_{i}] \in \mathbb{R}^{n}\) with \(a_{ii} \ne 0\) for \(i=1,...,n\):
1: determine \(x^{(0)} \in \mathbb{R}^{n}\)
2: for \(k=0,1,2,...\) do
3: \(x_{i}^{(k+1)} = \frac{1}{a_{ii}}\left(b_{i} - \sum_{j=1}^{i-1}a_{ij}x_{j}^{(k+1)} - \sum_{j=i+1}^{n}a_{ij}x_{j}^{(k)}\right)\) for \(i=1,2,...,n\)
4: end for
(i) Write \(x^{(k)}=[x_{i}^{(k)}]\) and \(A=D-L-U\) where \(D, -L, -U\) are the diagonal, strictly lower triangular, strictly upper triangular parts of \(A\) respectively. Show that \(x^{(k+1)}=(D-L)^{-1}Ux^{(k)}+(D-L)^{-1}b\).
(ii) Prove that if \(A\) is diagonally dominant, then \(A\) is invertible, and the Gauss-Seidel iteration converges, namely \(\lim_{k \rightarrow \infty}x^{(k)}=A^{-1}b\) for any \(x^{(0)}\). (A matrix \(A\) is called diagonally dominant, if \(|a_{ii}| > \sum_{j \ne i}|a_{ij}|\) for \(i=1,...,n\).)
(iii) Prove that if \(A\) is symmetric and positive definite, then \(A\) is invertible, and the Gauss-Seidel iteration converges.
Proof
\((i)\): The update in step 3 can be rewritten as
\[a_{ii}x_i^{(k+1)}+\sum_{j=1}^{i-1}a_{ij}x_j^{(k+1)}=b_i-\sum_{j=i+1}^{n}a_{ij}x_j^{(k)},\qquad i=1,\dots,n,\]
which in matrix form reads
\[(D-L)x^{(k+1)}=Ux^{(k)}+b.\]
Since \(D-L\) is lower triangular with nonzero diagonal, it is invertible, and this gives
\[x^{(k+1)}=(D-L)^{-1}Ux^{(k)}+(D-L)^{-1}b.\]
\((ii)\): Suppose \(Ax=0\) for some nonzero vector \(x\). Pick \(i\) with \(|x_i|=\max_j|x_j|>0\); the \(i\)-th equation gives
\[a_{ii}x_i=-\sum_{j\ne i}a_{ij}x_j,\qquad\text{so}\qquad |a_{ii}|\,|x_i|\le\sum_{j\ne i}|a_{ij}|\,|x_j|\le\Big(\sum_{j\ne i}|a_{ij}|\Big)|x_i|,\]
hence \(|a_{ii}|\le\sum_{j\ne i}|a_{ij}|\), contradicting the diagonal dominance of \(A\). Therefore \(A\) is invertible.
The exact solution satisfies \(x^*=(D-L)^{-1}Ux^*+(D-L)^{-1}b\). Define the error \(e^{(k)}=x^{(k)}-x^*\); then
\[e^{(k+1)}=(D-L)^{-1}Ue^{(k)},\]
or, in components,
\[e_i^{(k+1)}=\frac{1}{a_{ii}}\Big(-\sum_{j=1}^{i-1}a_{ij}e_j^{(k+1)}-\sum_{j=i+1}^{n}a_{ij}e_j^{(k)}\Big).\]
By diagonal dominance, \(\rho_i=\frac{1}{|a_{ii}|}\sum_{j\ne i}|a_{ij}|<1\). Let \(\rho=\max_i\rho_i<1\).
We prove by induction on \(i\) that \(|e_i^{(k+1)}|\le\rho\|e^{(k)}\|_\infty\) for all \(i\):
For \(i=1\): \(|e_1^{(k+1)}|\le\frac{1}{|a_{11}|}\sum_{j=2}^n|a_{1j}||e_j^{(k)}|\le\rho_1\|e^{(k)}\|_\infty\le\rho\|e^{(k)}\|_\infty\).
Assume \(|e_j^{(k+1)}|\le\rho\|e^{(k)}\|_\infty\) for all \(j<i\). For the \(i\)-th component,
\[|e_i^{(k+1)}|\le\frac{1}{|a_{ii}|}\Big(\sum_{j<i}|a_{ij}|\,\rho\,\|e^{(k)}\|_\infty+\sum_{j>i}|a_{ij}|\,\|e^{(k)}\|_\infty\Big).\]
Since \(\rho<1\), the factor \(\rho\) in the first sum may be bounded by \(1\):
\[|e_i^{(k+1)}|\le\frac{1}{|a_{ii}|}\sum_{j\ne i}|a_{ij}|\,\|e^{(k)}\|_\infty\le\rho\,\|e^{(k)}\|_\infty.\]
Hence \(\|e^{(k+1)}\|_\infty\le\rho\|e^{(k)}\|_\infty\), and since \(\rho<1\) the error converges to \(0\): the iteration converges for any \(x^{(0)}\).
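A quick numerical illustration of (ii), as a minimal pure-Python sketch (the matrix, right-hand side, and sweep count are illustrative): for a strictly diagonally dominant matrix, the in-place sweep below converges to \(A^{-1}b\) from any start.

```python
def gauss_seidel(A, b, x, sweeps):
    # In-place Gauss-Seidel sweeps: component i already uses the updated
    # components j < i of the current sweep, exactly as in step 3.
    n = len(A)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]            # strictly diagonally dominant
b = [9.0, 17.0, 23.0]            # chosen so that the exact solution is (1, 2, 3)
x = gauss_seidel(A, b, [0.0, 0.0, 0.0], 60)
```

Here \(\rho=\max_i\rho_i=0.6\), so 60 sweeps reduce the initial error by roughly \(0.6^{60}\approx5\cdot10^{-14}\).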
\((iii)\): Since \(A\) is symmetric positive definite, its eigenvalues are positive, so \(A\) is invertible; moreover \(d=v^*Dv>0\) for \(v\ne0\) because the diagonal entries \(a_{ii}=e_i^TAe_i\) are positive.
It remains to show that the iteration matrix \(G=(D-L)^{-1}U=(D-L)^{-1}L^T\) (here \(U=L^T\) by symmetry) has spectral radius \(\rho(G)<1\). Let \(v\) be an eigenvector of \(G\) with eigenvalue \(\lambda\); then
\[L^Tv=\lambda\,(D-L)v.\]
Multiplying both sides on the left by \(v^*\),
\[v^*L^Tv=\lambda\,(v^*Dv-v^*Lv).\]
Set \(a=v^*Av>0\), \(d=v^*Dv>0\), \(c=v^*Lv\). From \(A=D-L-L^T\), multiplying by \(v^*\) and \(v\) gives
\[a=d-c-\bar c,\]
and \(v^*L^Tv=\overline{v^*Lv}=\bar c\). Substituting into the identity above,
\[\bar c=\lambda\,(d-c),\qquad\text{so}\qquad |\lambda|^2=\frac{|c|^2}{|d-c|^2}.\]
Substituting back \(a=d-c-\bar c\),
\[|d-c|^2-|c|^2=(d-c)(d-\bar c)-|c|^2=d^2-d(c+\bar c)=d\,(d-c-\bar c)=d\,a>0.\]
This gives \(|\lambda|<1\), hence \(\rho(G)<1\) and the Gauss-Seidel iteration converges. \(\square\)
Spring, 2024
Given the Hilbert matrix \(H_{n} = \left[\frac{1}{i+j-1}\right]_{i,j=1,\dots,n}\), prove:
(i) \(H_{n}\) is positive definite.
(ii) The spectral radius \(\rho(H_{n})\) of \(H_{n}\) is strictly monotonically increasing with respect to \(n\).
(iii) \(\rho(H_{n}) \rightarrow \pi\) as \(n \rightarrow \infty\).
Proof
\((i)\): For any nonzero vector \(x \in \mathbb{R}^n\),
\[x^TH_nx=\sum_{i,j=1}^n\frac{x_ix_j}{i+j-1}=\sum_{i,j=1}^nx_ix_j\int_0^1t^{i+j-2}\,dt=\int_0^1\Big(\sum_{i=1}^nx_it^{i-1}\Big)^2dt>0,\]
since the polynomial \(\sum_ix_it^{i-1}\) is not identically zero. Hence \(H_n\) is positive definite.
\((ii)\): Since \(H_n\) is symmetric (and positive definite), its spectral radius equals its largest eigenvalue:
\[\rho(H_n)=\lambda_{\max}(H_n)=\max_{\|x\|_2=1}x^TH_nx.\]
Let \(x^* \in \mathbb{R}^n\) be a unit eigenvector for \(\rho(H_n)\), \((x^*)^T x^* = 1\), and set \(\tilde{x} = \begin{pmatrix} x^* \\ 0 \end{pmatrix} \in \mathbb{R}^{n+1}\). Then
\[\rho(H_{n+1})\ge\tilde x^TH_{n+1}\tilde x=(x^*)^TH_nx^*=\rho(H_n).\]
Equality cannot hold: if it did, \(\tilde x\) would be a maximizer of the Rayleigh quotient of the symmetric matrix \(H_{n+1}\), hence an eigenvector for its largest eigenvalue; but \(H_{n+1}\) has strictly positive entries, so by the Perron-Frobenius theorem that eigenvector has all components nonzero, contradicting the zero last component of \(\tilde x\). Hence \(\rho(H_{n+1})>\rho(H_n)\).
\((iii)\): We first prove the upper bound \(x^TH_nx\le\pi\,\|x\|_2^2\) for all \(x\) (Hilbert's inequality), so that \(\rho(H_n)\le\pi\). We may assume \(x_i\ge0\), since replacing \(x_i\) by \(|x_i|\) can only increase the form. Write \(u=i-\tfrac12\), \(v=j-\tfrac12\), so \(i+j-1=u+v\). By the Cauchy-Schwarz inequality,
\[x^TH_nx=\sum_{i,j}\frac{x_ix_j}{u+v}\le\left(\sum_{i,j}\frac{x_i^2}{u+v}\sqrt{\frac uv}\right)^{\!1/2}\left(\sum_{i,j}\frac{x_j^2}{u+v}\sqrt{\frac vu}\right)^{\!1/2}=\sum_ix_i^2\sum_{j=1}^n\frac{\sqrt u}{\sqrt v\,(u+v)}.\]
For fixed \(u>0\) the function \(v\mapsto\sqrt u\,v^{-1/2}(u+v)^{-1}\) is decreasing and convex on \((0,\infty)\), so the midpoint rule underestimates the integral on each interval \([j-1,j]\) and
\[\sum_{j=1}^n\frac{\sqrt u}{\sqrt v\,(u+v)}\le\int_0^\infty\frac{\sqrt u}{\sqrt v\,(u+v)}\,dv=\pi\]
(substitute \(v=ut^2\); the integral becomes \(2\int_0^\infty\frac{dt}{1+t^2}=\pi\)). Hence \(\rho(H_n)\le\pi\) for every \(n\).
For the lower bound, take \(x_i=1/\sqrt i\) (truncated to length \(n\) and normalized); a computation shows the Rayleigh quotient \(x^TH_nx/x^Tx\to\pi\) as \(n\to\infty\). Combined with the strict monotonicity from (ii), \(\rho(H_n)\uparrow\pi\).
Autumn, 2024
Let \(A\) be a symmetric positive definite matrix. Assume that the conjugate gradient method is applied to solve the linear system \(Ax=b\) where \(x^{*}\) is the exact solution.
(i) Prove the following error estimate:
\[\|x_k-x^*\|_A\le 2\left(\frac{\sqrt{\kappa_2(A)}-1}{\sqrt{\kappa_2(A)}+1}\right)^{\!k}\|x_0-x^*\|_A,\]
where \(x_{k}\) is the solution obtained by the \(k\)-th iteration, \(||\cdot||_{A}\) denotes the \(A\)-norm defined by \(||x||_{A}=\sqrt{x^{T}Ax}\), and \(\kappa_{2}(A)=\frac{\max \lambda(A)}{\min \lambda(A)}\) denotes the condition number of \(A\) under the \(l_{2}\) norm. (If you apply any theorem from approximation theory, please state the theorem clearly. No need to prove that theorem.)
(ii) Describe one Krylov subspace method for non-Hermitian matrices (\(A\ne A^{*}\)). (You can use pseudo-code or words/formulas as long as the steps are clear.)
Proof
\((i)\): Let \(e_k = x_k - x^*\) and \(r_0=b-Ax_0\). The conjugate gradient method finds \(x_k\in x_0+\mathcal{K}_k(A, r_0)\), \(\mathcal{K}_k(A, r_0) = \operatorname{span}\{r_0, Ar_0, \dots, A^{k-1}r_0\}\), minimizing the \(A\)-norm of the error. Equivalently, the error is a matrix polynomial applied to the initial error:
\[e_k=P_k(A)\,e_0,\qquad P_k\in\Pi_k^1,\]
where \(\Pi_k^1\) is the set of polynomials of degree at most \(k\) with \(P_k(0) = 1\). Since CG minimizes the \(A\)-norm of the error,
\[\|e_k\|_A=\min_{P\in\Pi_k^1}\|P(A)e_0\|_A.\]
Since \(A\) is symmetric positive definite, it has an orthonormal eigenbasis \(\{v_i\}\) with positive eigenvalues \(\lambda_{\min} = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n = \lambda_{\max}\). Expanding \(e_0 = \sum c_i v_i\), we get \(\|P(A) e_0\|_A^2 = \sum c_i^2 \lambda_i P(\lambda_i)^2\), which yields the upper bound (relaxing the discrete spectrum to the whole interval \([\lambda_{\min}, \lambda_{\max}]\)):
\[\frac{\|e_k\|_A}{\|e_0\|_A}\le\min_{P\in\Pi_k^1}\ \max_{\lambda\in[\lambda_{\min},\lambda_{\max}]}|P(\lambda)|.\]
Mapping \([\lambda_{\min}, \lambda_{\max}]\) linearly onto \([-1, 1]\) subject to \(P(0)=1\), the optimal polynomial is the scaled Chebyshev polynomial (Chebyshev min-max theorem from approximation theory):
\[P^*(\lambda)=\frac{T_k\!\left(\dfrac{\lambda_{\max}+\lambda_{\min}-2\lambda}{\lambda_{\max}-\lambda_{\min}}\right)}{T_k\!\left(\dfrac{\lambda_{\max}+\lambda_{\min}}{\lambda_{\max}-\lambda_{\min}}\right)}.\]
Hence, with \(\kappa=\kappa_2(A)\),
\[\max_{[\lambda_{\min},\lambda_{\max}]}|P^*(\lambda)|=\frac{1}{T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)}.\]
Moreover, writing \(\frac{\kappa+1}{\kappa-1}=\frac12\big(t+t^{-1}\big)\) with \(t=\frac{\sqrt\kappa+1}{\sqrt\kappa-1}\),
\[T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)=\frac12\big(t^k+t^{-k}\big)\ge\frac12\left(\frac{\sqrt\kappa+1}{\sqrt\kappa-1}\right)^{\!k}.\]
Substituting gives the claimed bound.
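For reference, a minimal pure-Python conjugate gradient sketch (small dense example; a practical implementation would use a stopping tolerance and sparse matrix-vector products): on an \(n\times n\) SPD system, exact arithmetic terminates in at most \(n\) steps, and in floating point \(n=3\) steps already reach near machine accuracy here.

```python
def cg(A, b, x, iters):
    # Textbook conjugate gradient for SPD A (lists of lists / lists).
    n = len(b)
    mv = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    r = [bi - Avi for bi, Avi in zip(b, mv(x))]
    p = r[:]
    for _ in range(iters):
        if dot(r, r) < 1e-30:          # already converged
            break
        Ap = mv(p)
        alpha = dot(r, r) / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return x

A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 2.0, 6.0]]   # SPD
b = [9.0, 17.0, 23.0]                                     # exact solution (1, 2, 3)
x = cg(A, b, [0.0, 0.0, 0.0], 3)
```
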
Spring, 2025
The Sherman-Morrison formula (or the generalized version named Sherman-Morrison-Woodbury formula) is a useful tool in numerical linear algebra. It states: if \(A\in\mathbb{R}^{n\times n}\) is an invertible matrix, \(u,v\in\mathbb{R}^{n}\) are two vectors satisfying \(1+v^{T}A^{-1}u\ne0\), then the matrix \(A+uv^{T}\) is invertible and its inverse can be computed by
\[(A+uv^{T})^{-1}=A^{-1}-\frac{A^{-1}uv^{T}A^{-1}}{1+v^{T}A^{-1}u}.\]
(i) Prove the above statement.
(ii) If \(\{s_{j}\}_{j=1}^{n}\) and \(\{t_{j}\}_{j=1}^{n}\) are two sets of points in \([0, 1]\), \(A\in\mathbb{R}^{n\times n}\) is a matrix defined by:
where \(\delta_{j,k}\) is the Kronecker delta (\(\delta_{j,k}=1\) when \(j=k\), \(\delta_{j,k}=0\) otherwise). \(b\in\mathbb{R}^{n}\) is an arbitrary vector. Please give methods for the evaluation of \(Ab\), \(A^{-1}b\), and \(\det(A)\), all with the computational complexity of \(O(n)\).
Proof
\((i)\): Direct verification. Multiplying, and using that \(v^TA^{-1}u\) is a scalar,
\[(A+uv^T)\left(A^{-1}-\frac{A^{-1}uv^TA^{-1}}{1+v^TA^{-1}u}\right)=I+uv^TA^{-1}-\frac{u\,(1+v^TA^{-1}u)\,v^TA^{-1}}{1+v^TA^{-1}u}=I.\]
\((ii)\): Using the identity \(\cos(t_j - s_k) = \cos t_j \cos s_k + \sin t_j \sin s_k\), the matrix \(A\) can be written as
\[A=D+\mathbf{c}_t\mathbf{c}_s^T+\mathbf{s}_t\mathbf{s}_s^T,\]
where \(D = 4n I\) is diagonal and \(\mathbf{c}_t, \mathbf{s}_t, \mathbf{c}_s, \mathbf{s}_s \in \mathbb{R}^n\) are the column vectors with entries \(\cos t_j, \sin t_j, \cos s_k, \sin s_k\) respectively. With \(U = [\mathbf{c}_t, \mathbf{s}_t]\in\mathbb{R}^{n\times2}\) and \(V = [\mathbf{c}_s, \mathbf{s}_s]\in\mathbb{R}^{n\times2}\),
\[A=D+UV^T.\]
For \(Ab\): \(Ab=Db + U(V^Tb)\); computing \(Db\) costs \(O(n)\), \(V^Tb\) costs \(O(n)\), and multiplying \(U\) by a 2-vector costs \(O(n)\), so \(Ab\) costs \(O(n)\).
For \(A^{-1}b\): by the Sherman-Morrison-Woodbury formula,
\[A^{-1}=\frac{1}{4n}\left(I-\frac{1}{4n}\,UM^{-1}V^T\right),\qquad M=I_2+\frac{1}{4n}V^TU\in\mathbb{R}^{2\times2}.\]
Forming \(M\) costs \(O(n)\), inverting the \(2\times2\) matrix \(M\) costs \(O(1)\), \(V^Tb\) costs \(O(n)\), \(M^{-1}(V^Tb)\) costs \(O(1)\), and \(U(M^{-1}(V^Tb))\) costs \(O(n)\); hence \(A^{-1}b\) costs \(O(n)\).
For \(\det(A)\): by the matrix determinant lemma,
\[\det(A)=\det(D)\,\det\!\big(I_2+V^TD^{-1}U\big)=(4n)^n\det(M).\]
Forming \(M\) costs \(O(n)\) and its \(2\times2\) determinant costs \(O(1)\), so \(\det(A)\) costs \(O(n)\).
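A minimal pure-Python sketch of the rank-one case (a diagonal \(A\) so that \(A^{-1}\) is trivially available; the data and function name are illustrative): solving \((D+uv^T)x=b\) in \(O(n)\) via the Sherman-Morrison formula.

```python
def solve_diag_plus_rank1(d, u, v, b):
    # x = D^{-1} b - D^{-1} u * (v^T D^{-1} b) / (1 + v^T D^{-1} u),
    # valid when 1 + v^T D^{-1} u != 0; every step is O(n).
    Dinv_b = [bi / di for bi, di in zip(b, d)]
    Dinv_u = [ui / di for ui, di in zip(u, d)]
    s = sum(vi * t for vi, t in zip(v, Dinv_b)) / \
        (1.0 + sum(vi * t for vi, t in zip(v, Dinv_u)))
    return [t1 - s * t2 for t1, t2 in zip(Dinv_b, Dinv_u)]

d = [2.0, 3.0, 4.0]              # diagonal of D
u = [1.0, 1.0, 1.0]
v = [1.0, 2.0, 3.0]
b = [8.0, 9.0, 10.0]             # equals (D + u v^T) applied to (1, 1, 1)
x = solve_diag_plus_rank1(d, u, v, b)
```
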
Autumn, 2025
In many applications a generalized eigenvalue problem needs to be solved: find \(x \in \mathbb{C}^n\) and \(\mu, \lambda \in \mathbb{C}\) such that
\[\mu A x=\lambda B x\]
for given \(A, B \in \mathbb{C}^{n \times n}\). Here \(x\) and \((\lambda, \mu)\) are called an eigenvector and an eigenvalue of the pair \((A, B)\) respectively. If \(B\) is nonsingular, this problem is equivalent to the standard eigenvalue problem \(A B^{-1} y = \mu^{-1} \lambda y\) with \(y=Bx\). For two unitary matrices \(Q, Z\), the transformation \(Q^H A Z = S\), \(Q^H B Z = T\) is called a UET performed on \((A, B)\), written \(Q^H(A, B)Z = (S, T)\) for short.
(a) Give an example that \((A, B)\) has more than \(n\) eigenvalues.
(b) Prove that there exists a UET such that \(Q^H(A, B)Z = (S, T)\), where \(S, T\) are upper triangular.
(c) Can you obtain the eigenvalues of \((A, B)\) from \((S, T)\)? How?
(d) Propose a method to construct a UET such that \(Q_0^H(A, B)Z_0 = (A_0, B_0)\) where \(A_0\) is upper Hessenberg, and \(B_0\) is upper triangular. For ease, you may assume \(n=4\).
Proof
\((a)\): It suffices to make \(\det(\mu A - \lambda B)=0\) have more than \(n\) solutions. If \(A\) and \(B\) have a common nontrivial null space, then \((A, B)\) has infinitely many eigenvalues: e.g. for \(A=B=\begin{pmatrix}1&0\\0&0\end{pmatrix}\), every pair \((\lambda,\mu)\) is an eigenvalue with eigenvector \(e_2\).
\((b)\): We induct on \(n\). First, there exists a nontrivial pair \((\lambda,\mu)\ne(0,0)\) with \(\det(\mu A-\lambda B)=0\), hence a unit vector \(z\) with \(\mu Az=\lambda Bz\). Consequently \(Az\) and \(Bz\) span a subspace of dimension at most one, so we may write \(Az=\alpha q\), \(Bz=\beta q\) for some unit vector \(q\). Extend to unitary matrices \(Z_1=[z,\cdots]\), \(Q_1=[q,\cdots]\); then the first columns of \(Q_1^HAZ_1\) and \(Q_1^HBZ_1\) are \(\alpha e_1\) and \(\beta e_1\) respectively, so
\[Q_1^HAZ_1=\begin{pmatrix}\alpha&*\\0&A_1\end{pmatrix},\qquad Q_1^HBZ_1=\begin{pmatrix}\beta&*\\0&B_1\end{pmatrix}.\]
By the induction hypothesis applied to \((A_1, B_1)\), there exist unitary \(Q_2, Z_2\) such that \(Q_2^HA_1Z_2\) and \(Q_2^HB_1Z_2\) are both upper triangular. Then
\[Q=Q_1\begin{pmatrix}1&0\\0&Q_2\end{pmatrix},\qquad Z=Z_1\begin{pmatrix}1&0\\0&Z_2\end{pmatrix}\]
satisfy the requirement.
\((c)\): Yes. By \((b)\), \(Q^H(A, B)Z = (S, T)\), so \(\det(\mu A-\lambda B)=\det(Q)\det(\mu S-\lambda T)\det(Z^H)\) vanishes if and only if \(\det(\mu S-\lambda T)=0\). Since \(S\) and \(T\) are upper triangular,
\[\det(\mu S-\lambda T)=\prod_{i=1}^n(\mu S_{ii}-\lambda T_{ii}),\]
so the eigenvalues are \((\lambda, \mu) = (S_{ii}, T_{ii})\), \(i=1,\dots,n\) (note \(\mu S_{ii}-\lambda T_{ii}=0\) for this pair).
\((d)\): Not attempted. (A standard construction, used as the first stage of the QZ algorithm, is the following: compute a QR factorization \(B=QR\) and replace \((A,B)\) by \((Q^HA,Q^HB)\), so that \(B\) becomes upper triangular; then eliminate the entries of \(A\) below its first subdiagonal one at a time with Givens rotations applied on the left, applying after each one a Givens rotation on the right to restore the triangularity of \(B\).)
Autumn, 2025
Besides LU-type factorizations, a matrix inverse can also be obtained by iteration. Given \(A \in \mathbb{R}^{n \times n}\), consider the sequence \(\{X_k\}\) generated by the (Newton-Schulz) iteration
\[X_{k+1}=X_k\,(2I-AX_k),\qquad k=0,1,2,\dots\]
Prove:
(a) if \(\|I - A X_0\| < 1\), then \(A\) is nonsingular and \(X_k \to A^{-1}\) quadratically.
(b) there exists \(X_0\) such that \(\|I - A X_0\| < 1\), if and only if \(A\) is nonsingular.
Proof
\((a)\): If \(\|I - A X_0\| < 1\), then \(AX_0=I-(I-AX_0)\) is invertible (Neumann series), hence \(A\) is invertible. Let \(E_k=I-AX_k\). Then
\[E_{k+1}=I-AX_k(2I-AX_k)=(I-AX_k)^2=E_k^2,\]
so \(\|E_k\|\le\|E_0\|^{2^k}\to0\) and \(X_k=A^{-1}(I-E_k)\to A^{-1}\). Moreover, since \(E_k=A(A^{-1}-X_k)\),
\[\|X_{k+1}-A^{-1}\|=\|A^{-1}E_k^2\|\le\|A^{-1}\|\,\|A\|^2\,\|X_k-A^{-1}\|^2,\]
so the convergence is quadratic.
\((b)\): If \(A\) is invertible, then \(X_0 = A^{-1}\) gives \(\|I - A X_0\| = 0 < 1\). Conversely, if \(\|I - A X_0\| < 1\) for some \(X_0\), then \(AX_0\) is invertible, hence so is \(A\).
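A numerical illustration, assuming the iteration in question is the standard Newton-Schulz update \(X_{k+1}=X_k(2I-AX_k)\) (consistent with part (a), since then \(I-AX_{k+1}=(I-AX_k)^2\)); the matrix and starting guess are illustrative.

```python
def mul2(X, Y):
    # 2x2 matrix product.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 3.0]]
X = [[0.2, 0.0], [0.0, 0.2]]     # ||I - A X0||_2 ~ 0.72 < 1 for this choice
for _ in range(10):
    AX = mul2(A, X)
    two_minus = [[2.0 - AX[0][0], -AX[0][1]],
                 [-AX[1][0], 2.0 - AX[1][1]]]   # 2I - A X_k
    X = mul2(X, two_minus)                      # X_{k+1} = X_k (2I - A X_k)
```

Ten doublings reduce \(\|I-AX_k\|\) to roughly \(0.72^{2^{10}}\), i.e. to roundoff, and \(X\) matches \(A^{-1}=\frac15\begin{pmatrix}3&-1\\-1&2\end{pmatrix}\).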
.
Let \(A \in \mathbb{R}^{n \times n}\) be such that \(A = (1+\omega)P - (N+\omega P)\), with \(P^{-1}N\) nonsingular and with real eigenvalues \(1 > \lambda_1 \geqslant \lambda_2 \geqslant \ldots \geqslant \lambda_n\). Find the values of \(\omega \in \mathbb{R}\) for which the iterative method
\[(1+\omega)P x_{k+1} = (N+\omega P)x_k + \mathbf{b}, \quad k \geqslant 0,\]
converges for any \(x_0\) to the solution of the linear system \(Ax = b\). Determine also the value of \(\omega\) for which the convergence rate is maximal.
Solution
Let \(e_k=x_k-x\); then \((1+\omega)P e_{k+1} = (N+\omega P)e_k\), and the iteration matrix is \(B_\omega=(1+\omega)^{-1}(P^{-1}N+\omega I)\), whose eigenvalues are \(\frac{\lambda_j + \omega}{1+\omega}\). The method converges for every \(x_0\) iff \(\left|\frac{\lambda_j + \omega}{1+\omega}\right| < 1\) for all \(j\). Squaring, this is \((\lambda_j+\omega)^2-(1+\omega)^2<0\), i.e. \((\lambda_j-1)(\lambda_j+1+2\omega)<0\); since \(\lambda_j<1\), this holds iff \(\omega>-\frac{1+\lambda_j}{2}\). The binding constraint is \(j=n\), so the method converges iff
\[\omega>-\frac{1+\lambda_n}{2}.\]
The spectral radius \(\rho(B_\omega)=\max\left\{\left|\frac{\lambda_1+\omega}{1+\omega}\right|,\left|\frac{\lambda_n+\omega}{1+\omega}\right|\right\}\) is minimized when the two extreme values are balanced, \(\frac{\lambda_1+\omega}{1+\omega}=-\frac{\lambda_n+\omega}{1+\omega}\), which gives
\[\omega_{\mathrm{opt}}=-\frac{\lambda_1+\lambda_n}{2},\qquad \rho(B_{\omega_{\mathrm{opt}}})=\frac{\lambda_1-\lambda_n}{2-\lambda_1-\lambda_n}.\]
.
Problem 1. Find the generalized eigenvalue \(\lambda\) and the eigenvector \((x_1, x_2, \cdots, x_n)^T\), such that
Solution
The system reduces to the three-term recurrence \(x_{k-1}+x_{k+1}=\frac{2-4\lambda}{1+\lambda} x_k\) with boundary conditions \(x_0=x_{n+1}=0\). The ansatz \(x_k=\sin(k\theta_j)\) satisfies \(x_{k-1}+x_{k+1}=2\cos\theta_j\,x_k\), so \(\frac{2-4\lambda}{1+\lambda}=2\cos\theta_j\), giving
\[\lambda_j=\frac{1-\cos \theta_j}{2+\cos \theta_j},\qquad \theta_j=\frac{j\pi}{n+1},\quad j=1,\dots,n.\]
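The recurrence and the eigenvalue formula can be verified numerically; a minimal pure-Python check (\(n=5\) is illustrative):

```python
import math

n = 5
max_res = 0.0
for j in range(1, n + 1):
    theta = j * math.pi / (n + 1)
    lam = (1.0 - math.cos(theta)) / (2.0 + math.cos(theta))
    x = [math.sin(k * theta) for k in range(n + 2)]   # x[0] = x[n+1] = 0
    for k in range(1, n + 1):
        # residual of x_{k-1} + x_{k+1} = (2 - 4*lam)/(1 + lam) * x_k
        res = x[k - 1] + x[k + 1] - (2.0 - 4.0 * lam) / (1.0 + lam) * x[k]
        max_res = max(max_res, abs(res))
```
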
.
Find the eigenvalues and eigenvectors of the following \(N \times N\) tridiagonal matrix:
where \(a, b, c\) are constants, and \(ac > 0\).
.
Suppose an \(n \times n\) matrix \(A\) is given by
For the linear system \(Ax = b\), prove that
where the constant \(C\) is independent of the dimension \(n\).
Solution
For \(A\) to be invertible we need \(r\ne1\). Since \(A\) is normal (it is a circulant), \(\|A^{-1}\|_2=1/\min_j|\lambda_j|\), where the \(\lambda_j\) are the eigenvalues of \(A\). These are easily found to be \(\lambda_j=1+re^{i\theta_j}\) with \(\theta_j=\frac{2\pi j}{n}\), so for \(|r|<1\) we get \(\min_j|\lambda_j|\ge1-|r|\), and finally
\[\|A^{-1}\|_2\le\frac{1}{1-|r|}=:C,\]
a constant independent of the dimension \(n\).
.
(Circulant Matrices) Every circulant matrix \(C\) has eigenvectors
and corresponding eigenvalues
It can be expressed in the form \(C = U \Psi U^*\), where \(U\) has the eigenvectors as columns in order and \(\Psi = \text{diag}(\psi_k)\). In particular all circulant matrices share the same eigenvectors, the same matrix \(U\) works for all circulant matrices, and any matrix of the form \(C = U \Psi U^*\) is circulant.
Prove the above theorem.
Solution
Pure computation: apply \(C\) to each Fourier vector and shift the summation index in the cyclic convolution; details omitted.
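The computation can be checked numerically. The sketch below uses the convention \(C_{ij}=c_{(i-j)\bmod n}\) for the circulant built from a first column \(c\) (so that \(\psi_k=\sum_m c_m e^{-2\pi ikm/n}\)); the theorem's exact indexing and normalization of \(U\) may differ.

```python
import cmath

c = [5.0, 1.0, 2.0, 3.0]                      # first column of the circulant
n = len(c)
C = [[c[(i - j) % n] for j in range(n)] for i in range(n)]
w = cmath.exp(2j * cmath.pi / n)
err = 0.0
for k in range(n):
    v = [w ** (k * m) / n ** 0.5 for m in range(n)]      # k-th Fourier vector
    psi = sum(c[m] * w ** (-k * m) for m in range(n))    # k-th eigenvalue
    Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
    err = max(err, max(abs(Cv[i] - psi * v[i]) for i in range(n)))
```
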
.
(Companion Matrix) Let
be a companion matrix with distinct eigenvalues \(\lambda_1, \ldots, \lambda_n\). Show that
where
Solution
Pure computation (multiply out and use that each \(\lambda_i\) is a root of the characteristic polynomial); omitted.
2013 Term
Let \((F_n)_n\) be the Fibonacci sequence, with \(F_0 = 0\), \(F_1 = 1\), and \(F_{n+2} = F_{n+1} + F_n\) for \(n \geq 0\).
- (a) Establish a relation between \(\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n\) and \(F_n\).
- (b) Use it to design an efficient algorithm that for a given \(n\) computes the \(n\)-th Fibonacci number \(F_n\). In particular, it must be more efficient than computing \(F_n\) in \(n\) consecutive steps.
- (c) Give an estimate on the number of steps of your algorithm.
Hint: Note that if \(m\) is even then
\[\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^m=\left(\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^{m/2}\right)^{\!2},\]
and if \(m\) is odd then
\[\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^m=\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^{m-1},\]
and \(m-1\) is even.
Proof
\((a)\) By induction,
\[\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}^n=\begin{pmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{pmatrix},\qquad n\ge1.\]
\((b)\) Compute the matrix power by repeated squaring, following the hint, and read off \(F_n\) from the \((1,2)\) entry.
\((c)\) Each step halves \(m\), so computing the \(n\)-th Fibonacci number takes \(O(\log n)\) multiplications of fixed-size \(2\times2\) matrices, i.e. \(O(\log n)\) steps.
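The doubling identities implicit in squaring the matrix give the following \(O(\log n)\) algorithm (a minimal sketch in Python); `fib_pair(n)` returns \((F_n, F_{n+1})\):

```python
def fib_pair(n):
    # Fast doubling, from F(2m) = F(m)*(2F(m+1) - F(m)) and
    # F(2m+1) = F(m)^2 + F(m+1)^2; O(log n) recursive steps.
    if n == 0:
        return (0, 1)                  # (F(0), F(1))
    a, b = fib_pair(n // 2)            # (F(m), F(m+1)) with m = n // 2
    c = a * (2 * b - a)                # F(2m)
    d = a * a + b * b                  # F(2m+1)
    return (d, c + d) if n % 2 else (c, d)
```

For example, `fib_pair(10)` evaluates to `(55, 89)`.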
2018 Term
Take \(\sigma_i(A)\) to be the \(i\)-th singular value of the square matrix \(A \in \mathbb{R}^{n \times n}\). Define the nuclear norm of \(A\) to be
- (a) Show that \(\|A\|_* = \operatorname{tr}(\sqrt{A^T A})\).
- (b) Show that \(\|A\|_* = \max_{X^T X = I} \operatorname{tr}(AX)\).
- (c) Show that \(\|A + B\|_* \leq \|A\|_* + \|B\|_*\).
- (d) Explain informally why minimizing \(\|A - A_0\|_*^2 + \|A\|_*\) over \(A\) for a fixed \(A_0 \in \mathbb{R}^{n \times n}\) might yield a low-rank approximation of \(A_0\).
Proof
\((a)\) Let \(A = U \Sigma V^T\) be an SVD of \(A\). Then
\[A^TA=V\Sigma^2V^T,\qquad \sqrt{A^TA}=V\Sigma V^T,\]
so \(\operatorname{tr}(\sqrt{A^T A}) = \operatorname{tr}(V \Sigma V^T) = \operatorname{tr}(\Sigma) = \|A\|_*\).
\((b)\) Let \(A = U \Sigma V^T\) be an SVD of \(A\). For any orthogonal \(X\),
\[\operatorname{tr}(AX)=\operatorname{tr}(U\Sigma V^TX)=\operatorname{tr}(\Sigma V^TXU)\le\operatorname{tr}(\Sigma),\]
where the last inequality holds because \(W=V^T X U\) is orthogonal, so its diagonal entries satisfy \(|W_{ii}|\le1\) and \(\operatorname{tr}(\Sigma W)=\sum_i\sigma_iW_{ii}\le\sum_i\sigma_i\). Equality is attained at \(X=VU^T\), hence \(\|A\|_*=\max_{X^TX=I}\operatorname{tr}(AX)\).
\((c)\) By \((b)\),
\[\|A+B\|_*=\max_{X^TX=I}\operatorname{tr}((A+B)X)\le\max_{X^TX=I}\operatorname{tr}(AX)+\max_{X^TX=I}\operatorname{tr}(BX)=\|A\|_*+\|B\|_*.\]
\((d)\) \(\|A\|_*\) is the sum of the singular values of \(A\) and acts as a convex surrogate for the rank, much as the \(\ell_1\) norm is a surrogate for sparsity: low-rank matrices have few nonzero singular values and hence small nuclear norm. Minimizing \(\|A - A_0\|_*^2 + \|A\|_*\) therefore trades fidelity to \(A_0\) against a penalty that pushes small singular values to zero, so the minimizer tends to be a low-rank approximation of \(A_0\).
2018 Term
Suppose \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\) and \(r = \text{rank}(A) < n\), and assume \(A\) has the following SVD:
where \(\Sigma_1\) is \(r \times r\) nonsingular and \(U_1, V_1\) have \(r\) columns. Let \(\sigma = \sigma_{\min}(\Sigma_1)\), the smallest nonzero singular value of \(A\). Consider the least square problem, for some \(b \in \mathbb{R}^m\),
-
(a) Show that all solutions \(x\) can be written as
\[ x = V_1 \Sigma_1^{-1} U_1^T b + V_2 z_2, \]with \(z_2\) an arbitrary vector.
-
(b) Show that the solution \(x\) has minimal norm \(\|x\|_2\) precisely when \(z_2 = 0\), and in which case,
\[ \|x\|_2 \leq \frac{\|b\|_2}{\sigma}. \]
Proof
\((a)\): Write \(x=Vy\) with \(y=\begin{bmatrix}y_1\\y_2\end{bmatrix}\), \(y_1\in\mathbb{R}^r\). Since the 2-norm is unitarily invariant,
\[\|Ax-b\|_2^2=\|U^TAVy-U^Tb\|_2^2=\left\|\begin{bmatrix}\Sigma_1y_1\\0\end{bmatrix}-\begin{bmatrix}U_1^Tb\\U_2^Tb\end{bmatrix}\right\|_2^2=\|\Sigma_1y_1-U_1^Tb\|_2^2+\|U_2^Tb\|_2^2\ge\|U_2^Tb\|_2^2.\]
Equality holds iff \(y_1 = \Sigma_1^{-1} U_1^T b\), while \(y_2\) is arbitrary; hence every minimizer has the form \(x = Vy = V_1 \Sigma_1^{-1} U_1^T b + V_2 z_2\).
\((b)\): Since \(\|x\|_2^2=\|Vy\|_2^2=\|y\|_2^2=\|y_1\|_2^2+\|y_2\|_2^2\), the norm is minimal precisely when \(y_2 = z_2 = 0\), i.e. \(x = V_1 \Sigma_1^{-1} U_1^T b\), and then
\[\|x\|_2=\|\Sigma_1^{-1}U_1^Tb\|_2\le\|\Sigma_1^{-1}\|_2\,\|U_1^Tb\|_2\le\frac{\|b\|_2}{\sigma}.\]
2018 Term
Let \(C\) and \(D\) in \(\mathbb{C}^{n \times n}\) be Hermitian matrices. Denote their eigenvalues, in decreasing order, by
\[\lambda_1(C)\ge\cdots\ge\lambda_n(C)\quad\text{and}\quad\lambda_1(D)\ge\cdots\ge\lambda_n(D),\]
respectively. Then it is known that
\[\sum_{i=1}^n\big(\lambda_i(C)-\lambda_i(D)\big)^2\le\|C-D\|_F^2.\]
-
(a) Let \(A\) and \(B\) be in \(\mathbb{C}^{n \times n}\). Denote their singular values by
\[ \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \quad \text{and} \quad \tau_1 \geq \tau_2 \geq \cdots \geq \tau_n, \]respectively. Prove that the following inequality holds:
\[ \sum_{i=1}^n (\sigma_i - \tau_i)^2 \leq \|A - B\|_F^2. \] -
(b) Suppose \(A = U \Sigma V^T\), where \(U = (u_1, u_2, \ldots, u_n)\), \(V = (v_1, v_2, \ldots, v_n)\) are orthogonal matrices, \(\Sigma = \text{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)\) with \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0\). Suppose \(\text{rank}(A) > k\) and denote by
\[ U_k = (u_1, u_2, \ldots, u_k), \quad V_k = (v_1, v_2, \ldots, v_k), \quad \Sigma_k = \text{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k), \]and
\[ A_k = U_k \Sigma_k V_k^T = \sum_{i=1}^k \sigma_i u_i v_i^T. \]Prove that
\[ \min_{\text{rank}(B) = k} \|A - B\|_F^2 = \|A - A_k\|_F^2 = \sum_{i=k+1}^n \sigma_i^2. \] -
(c) Let the vectors \(x_i \in \mathbb{R}^n\), \(i = 1, 2, \ldots, n\), be in the space \(\mathcal{W}\) with dimension \(d\), where \(d \ll n\). Let the orthonormal basis of \(\mathcal{W}\) be \(W \in \mathbb{R}^{n \times d}\). Then we can represent \(x_i\) by
\[ x_i = c + W r_i + e_i, \quad i = 1, 2, \ldots, n, \]where \(c \in \mathbb{R}^n\) is a constant vector, \(r_i \in \mathbb{R}^d\) are coordinates, and \(e_i\) is the error. Denote \(R = (r_1, r_2, \ldots, r_n)\) and \(E = (e_1, e_2, \ldots, e_n)\). Find \(W, R\) and \(c\) such that the error \(\|E\|_F\) is minimized.
Hint: Write \(X = [x_1, x_2, \ldots, x_n] = c \mathbf{1}^T + W R + E\), where \(\mathbf{1}\) is the vector of all ones.
Proof
\((a)\): Let \(\tilde{A}=\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}\) and \(\tilde{B}=\begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}\). The eigenvalues of \(\tilde A\) are \(\pm\sigma_1,\dots,\pm\sigma_n\) and those of \(\tilde B\) are \(\pm\tau_1,\dots,\pm\tau_n\); sorted decreasingly, this pairs \(\sigma_i\) with \(\tau_i\) and \(-\sigma_i\) with \(-\tau_i\). Applying the given inequality to the Hermitian matrices \(\tilde A,\tilde B\),
\[2\sum_{i=1}^n(\sigma_i-\tau_i)^2=\sum_i\big(\lambda_i(\tilde A)-\lambda_i(\tilde B)\big)^2\le\|\tilde A-\tilde B\|_F^2=2\|A-B\|_F^2,\]
which gives \(\sum_{i=1}^n(\sigma_i-\tau_i)^2\le\|A-B\|_F^2\).
\((b)\): If \(\operatorname{rank}(B)=k\), then the singular values of \(B\) satisfy \(\tau_i = 0\) for \(i > k\), so by \((a)\)
\[\|A-B\|_F^2\ge\sum_{i=1}^n(\sigma_i-\tau_i)^2\ge\sum_{i=k+1}^n\sigma_i^2.\]
The bound is attained at \(B=A_k\): the singular values of \(A-A_k=\sum_{i=k+1}^n\sigma_iu_iv_i^T\) are \(\sigma_{k+1},\dots,\sigma_n\) (and zeros), so \(\|A - A_k\|_F^2 = \sum_{i=k+1}^n \sigma_i^2\).
\((c)\): We minimize \(\|E\|_F^2 = \|X - c \mathbf{1}^T - W R\|_F^2\). Fixing \(W\) and \(R\), the function \(J(c)=\sum_{i=1}^n \|x_i - c - W r_i\|_2^2\) is quadratic in \(c\) and minimized at \(c = \frac{1}{n} \sum_{i=1}^n (x_i - W r_i)\). We may normalize \(\sum_i r_i=0\) (absorbing any shift into \(c\)), so \(c = \bar{x}=\frac{1}{n} \sum_{i=1}^n x_i\). Let \(\bar{X}=X-\bar{x}\mathbf{1}^T\); the problem becomes minimizing \(\|\bar{X} - W R\|_F^2\). For fixed \(W\) with \(W^TW=I\), setting the derivative in \(R\) to zero gives \(R=W^T \bar{X}\), and then
\[\|E\|_F^2=\|\bar X-WW^T\bar X\|_F^2=\|\bar X\|_F^2-\|W^T\bar X\|_F^2.\]
So we must maximize \(\|W^T \bar{X}\|_F^2\) over orthonormal \(W\); the maximum is attained when the columns of \(W\) are the top \(d\) eigenvectors of \(\bar{X} \bar{X}^T\) (equivalently, the top \(d\) left singular vectors of \(\bar X\)). Thus \(c=\bar x\), \(W=U_d\), \(R=U_d^T\bar X\), which is exactly PCA.
2018 Term
Let \(A \in \mathbb{R}^{n \times m}\) with rank \(r < \min(m, n)\). Let \(A = U \Sigma V^T\) be the SVD of \(A\), with singular values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\).
-
(a) Show that, for every \(\epsilon > 0\), there is a full rank matrix \(A_\epsilon \in \mathbb{R}^{n \times m}\) such that
\[ \|A - A_\epsilon\|_2 = \epsilon. \] -
(b) Let \(A_k = U \Sigma_k V^T\) where \(\Sigma_k = \text{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)\) and \(1 \leq k \leq r-1\). Show that \(\text{rank}(A_k) = k\) and
\[ \sigma_{k+1} = \|A - A_k\|_2 = \min \{ \|A - B\|_2 \mid \text{rank}(B) \leq k \}. \] -
(c) Assume that \(r = \min(m, n)\). Let \(B \in \mathbb{R}^{n \times m}\) and assume that \(\|A - B\|_2 < \sigma_r\). Show that \(\text{rank}(B) = r\).
Proof
\((a)\): Let \(A_\epsilon = U \Sigma_\epsilon V^T\) with \(\Sigma_\epsilon = \text{diag}(\sigma_1, \ldots, \sigma_r, \epsilon, \ldots, \epsilon)\). Then \(A_\epsilon\) has full rank \(\min(m,n)\) and \(\|A - A_\epsilon\|_2 = \|\Sigma - \Sigma_\epsilon\|_2 = \epsilon\).
\((b)\): \(\text{rank}(A_k) = k\) is clear, and \(A-A_k\) has singular values \(\sigma_{k+1},\dots,\sigma_r\) and zeros, so \(\|A-A_k\|_2=\sigma_{k+1}\). Now let \(B\) be any matrix with \(\text{rank}(B) \leq k\). Its null space satisfies \(\mathrm{dim}\, \mathcal{N}(B) \geq m-k\), and the span \(V_{k+1}\) of the first \(k+1\) right singular vectors of \(A\) has dimension \(k+1\), so \(V_{k+1} \cap \mathcal{N}(B) \neq \{0\}\). Take a nonzero unit vector \(x \in V_{k+1} \cap \mathcal{N}(B)\); then
\[\|A-B\|_2\ge\|(A-B)x\|_2=\|Ax\|_2.\]
Writing \(x = \sum_{i=1}^{k+1} \alpha_i v_i\) with \(\sum_{i=1}^{k+1} \alpha_i^2 = 1\),
\[\|Ax\|_2^2=\sum_{i=1}^{k+1}\alpha_i^2\sigma_i^2\ge\sigma_{k+1}^2,\]
so \(\|A-B\|_2\ge\sigma_{k+1}=\|A-A_k\|_2\).
\((c)\): By Weyl's inequality for singular values, \(|\sigma_i(A) - \sigma_i(B)| \leq \|A - B\|_2\), so \(\sigma_r(B) \geq \sigma_r(A) - \|A - B\|_2 > 0\), hence \(\text{rank}(B) = r\).
.
Let \(b \in \mathbb{R}^n\). Suppose \(A \in M_{n \times n}(\mathbb{R})\) and \(B \in M_{n \times n}(\mathbb{R})\) are two \(n \times n\) matrices. Let \(A\) be non-singular.
- (a) Consider the iterative scheme: \(A x^{k+1} = b - B x^k\). State and prove the necessary and sufficient condition for the iterative scheme to converge.
- (b) Suppose the spectral radius of \(A^{-1}B\) satisfies \(\rho(A^{-1}B) = 0\). Prove that the iterative scheme converges in \(n\) iterations.
-
(c) Consider the following iterative scheme:
\[ x^{(k+1)} = \omega_1 x^{(k)} + \omega_2 (c_1 - M x^{(k)}) + \omega_3 (c_2 - M x^{(k)}) + \ldots + \omega_k (c_{k-1} - M x^{(k)}), \]where \(M\) is symmetric and positive definite, \(\omega_1 > 1\), \(\omega_2, \ldots, \omega_k > 0\) and \(c_1, \ldots, c_{k-1} \in \mathbb{R}^n\). Deduce from (a) that the iterative scheme converges if and only if all eigenvalues of \(M\) (denote it as \(\lambda(M)\)) satisfy:
\[ \frac{\omega_1 - 1}{\sum_{i=2}^k \omega_i} < \lambda(M) < \frac{\omega_1 + 1}{\sum_{i=2}^k \omega_i}. \] -
(d) Let \(A\) be non-singular. Now, consider the following system of iterative scheme (*):
\[ \begin{aligned} A x_1^{(k+1)} &= b_1 - B x_2^{(k)}, \\ A x_2^{(k+1)} &= b_2 - B x_1^{(k)}. \end{aligned} \]Find and prove the necessary and sufficient condition for the iterative scheme (*) to converge.
For the iterative scheme (**):
\[ \begin{aligned} A x_1^{(k+1)} &= b_1 - B x_2^{(k)}, \\ A x_2^{(k+1)} &= b_2 - B x_1^{(k+1)}. \end{aligned} \]Find and prove the necessary and sufficient condition for the iterative scheme (**) to converge. Compare the rate of convergence of the iterative schemes (*) and (**).
Proof
\((c)\): The scheme is a fixed-point iteration \(x^{(k+1)}=Gx^{(k)}+\text{const}\) with iteration matrix
\[G=\omega_1I-\Big(\sum_{i=2}^k\omega_i\Big)M.\]
By (a), it converges iff \(\rho(G) < 1\). Since \(M\) is symmetric positive definite, the eigenvalues of \(G\) are \(\omega_1-\big(\sum_{i=2}^k\omega_i\big)\lambda\) over \(\lambda\in\lambda(M)\), so \(\rho(G)<1\) iff
\[-1<\omega_1-\Big(\sum_{i=2}^k\omega_i\Big)\lambda<1 \iff \frac{\omega_1-1}{\sum_{i=2}^k\omega_i}<\lambda<\frac{\omega_1+1}{\sum_{i=2}^k\omega_i}\]
for every eigenvalue \(\lambda\) of \(M\).
\((d)\): The iteration matrix of (*) is
\[G_*=\begin{bmatrix}0&-A^{-1}B\\-A^{-1}B&0\end{bmatrix},\]
and that of (**) is
\[G_{**}=\begin{bmatrix}0&-A^{-1}B\\0&(A^{-1}B)^2\end{bmatrix}\]
(substitute the first equation of (**) into the second). The eigenvalues of \(G_*\) are \(\pm\lambda\) over \(\lambda\in\lambda(A^{-1}B)\), so \(\rho(G_*)=\rho(A^{-1}B)\); the eigenvalues of the block-triangular \(G_{**}\) are \(0\) together with those of \((A^{-1}B)^2\), so \(\rho(G_{**})=\rho(A^{-1}B)^2=\rho(G_*)^2\). By (a), each scheme converges iff its spectral radius is \(<1\); since \(\rho(G_{**})=\rho(G_*)^2\), both schemes converge under the same necessary and sufficient condition \(\rho(A^{-1}B)<1\). When they converge, (**) is asymptotically twice as fast: its spectral radius is the square of that of (*), so one step of (**) reduces the error as much as two steps of (*).
.
(Richardson Iteration) Let \(A\) be an \(n \times n\) matrix with real and positive eigenvalues and \(b\) be a given vector. Consider the solution of \(Ax = b\) by the following Richardson iteration
\[x_{k+1}=x_k+\omega\,(b-Ax_k),\]
where \(\omega\) is a damping coefficient. Let \(\lambda_1\) and \(\lambda_n\) be the smallest and the largest eigenvalues of \(A\). Let \(G_\omega = I - \omega A\).
-
(a) Prove that the Richardson's iteration converges if and only if
\[ 0 < \omega < \frac{2}{\lambda_n}. \] -
(b) Prove that the optimal choice of \(\omega\) is given by
\[ \omega_{\text{opt}} = \frac{2}{\lambda_1 + \lambda_n}. \]Prove also that
\[ \rho(G_\omega) = \begin{cases} 1 - \omega \lambda_1, & \omega \leq \omega_{\text{opt}}, \\ \frac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}, & \omega = \omega_{\text{opt}}, \\ \omega \lambda_n - 1, & \omega \geq \omega_{\text{opt}}. \end{cases} \]where \(\rho(G_\omega)\) is the spectral radius of \(G_\omega\).
-
(c) Prove that, if \(A\) is symmetric and positive definite, then
\[ \rho(G_{\omega_{\text{opt}}}) = \frac{\kappa_2(A) - 1}{\kappa_2(A) + 1}, \]where \(\kappa_2(A) = \lambda_n / \lambda_1\).
.
(Power Method) Suppose \(A\) is an \(m \times m\) matrix with a complete set of orthonormal eigenvectors \(q_1, \ldots, q_m\) and corresponding eigenvalues \(\lambda_1, \ldots, \lambda_m\). Assume that \(|\lambda_1| > |\lambda_2| > |\lambda_3|\) and \(\lambda_j \geq \lambda_{j+1}\) for \(j = 3, \ldots, m\). Consider the power method \(\psi^{(k)} = A \psi^{(k-1)} / \lambda_1\), with \(\psi^{(0)} = \alpha_1 q_1 + \cdots + \alpha_m q_m\) where \(\alpha_1\) and \(\alpha_2\) are both nonzero. Show that the sequence \(\{\psi^{(k)}\}_{k=0}^{\infty}\) converges linearly to \(\alpha_1 q_1\) with asymptotic constant \(C = |\lambda_2 / \lambda_1|\).
Proof
By definition,
\[\psi^{(k)}=\frac{A^k\psi^{(0)}}{\lambda_1^k}=\sum_{i=1}^m\alpha_i\Big(\frac{\lambda_i}{\lambda_1}\Big)^{\!k}q_i=\alpha_1q_1+\alpha_2\Big(\frac{\lambda_2}{\lambda_1}\Big)^{\!k}q_2+\sum_{i=3}^m\alpha_i\Big(\frac{\lambda_i}{\lambda_1}\Big)^{\!k}q_i.\]
Hence, by orthonormality of the \(q_i\),
\[\|\psi^{(k)}-\alpha_1q_1\|_2^2=\alpha_2^2\Big|\frac{\lambda_2}{\lambda_1}\Big|^{2k}+\sum_{i=3}^m\alpha_i^2\Big|\frac{\lambda_i}{\lambda_1}\Big|^{2k},\]
and since \(|\lambda_i|\le|\lambda_3|<|\lambda_2|\) for \(i\ge3\) while \(\alpha_2\ne0\), the first term dominates, so
\[\lim_{k\to\infty}\frac{\|\psi^{(k+1)}-\alpha_1q_1\|_2}{\|\psi^{(k)}-\alpha_1q_1\|_2}=\Big|\frac{\lambda_2}{\lambda_1}\Big|.\]
Therefore \(\{\psi^{(k)}\}\) converges linearly to \(\alpha_1 q_1\) with asymptotic constant \(C = |\lambda_2 / \lambda_1|\).
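A minimal numerical illustration (matrix and data chosen for hand-checkable eigenpairs): for \(A=\begin{pmatrix}3&1\\1&3\end{pmatrix}\) we have \(\lambda_1=4\), \(q_1=(1,1)/\sqrt2\), \(\lambda_2=2\), \(q_2=(1,-1)/\sqrt2\), and \(\psi^{(0)}=(1,0)\) gives \(\alpha_1=\alpha_2=1/\sqrt2\), so \(\alpha_1q_1=(0.5,0.5)\) and the error should shrink by the factor \(|\lambda_2/\lambda_1|=0.5\) per step.

```python
lam1 = 4.0
psi = [1.0, 0.0]
errs = []
for _ in range(25):
    psi = [(3.0 * psi[0] + 1.0 * psi[1]) / lam1,
           (1.0 * psi[0] + 3.0 * psi[1]) / lam1]      # psi <- A psi / lambda_1
    # distance to the limit alpha_1 q_1 = (0.5, 0.5)
    errs.append(max(abs(psi[0] - 0.5), abs(psi[1] - 0.5)))
```
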
.
(Lanczos Iteration) Let \(A \in \mathbb{R}^{n \times n}\) be symmetric and let \(\|q_1\|_2 = 1\). Consider the following Lanczos iteration (with \(\beta_0=0\), \(q_0=0\)): for \(k=1,2,\dots\),
\[v=Aq_k-\beta_{k-1}q_{k-1},\quad \alpha_k=q_k^Tv,\quad v=v-\alpha_kq_k,\quad \beta_k=\|v\|_2,\quad q_{k+1}=v/\beta_k,\]
with residual \(r_k=\beta_kq_{k+1}\); the iteration stops if \(\beta_k=0\).
Let \(\mathcal{K}_n = \text{span}\{q_1, A q_1, \cdots, A^{n-1} q_1\}\).
-
(a) Show that
\[ A Q_k = Q_k T_k + r_k e_k^T, \]where \(e_k\) is the \(k\)-th unit vector, \(Q_k = [q_1 \cdots q_k]\) and
\[ T_k = \begin{bmatrix} \alpha_1 & \beta_1 & & & \\ \beta_1 & \alpha_2 & \beta_2 & & \\ & \beta_2 & \alpha_3 & \ddots & \\ & & \ddots & \ddots & \beta_{k-1} \\ & & & \beta_{k-1} & \alpha_k \end{bmatrix}. \] -
(b) Assume that the iteration does not terminate. Show that \(Q_k\) has orthonormal columns, and that they span \(\mathcal{K}_k\).
- (c) Show that the Lanczos iteration will stop when \(k = m\), where \(m = \text{rank}(\mathcal{K}_n)\).
- (d) What is the purpose of this algorithm? Briefly justify your answer.
Proof
\((a)\): Immediate from the definitions: the \(j\)-th column of \(AQ_k=Q_kT_k+r_ke_k^T\) is exactly the recurrence \(Aq_j=\beta_{j-1}q_{j-1}+\alpha_jq_j+\beta_jq_{j+1}\), with \(r_k=\beta_kq_{k+1}\).
\((b)\): Orthonormality of the columns of \(Q_k\) follows by induction: each new vector is explicitly orthogonalized against \(q_j\) and \(q_{j-1}\), and orthogonality against the earlier \(q_i\) follows from the symmetry of \(A\). That they span \(\mathcal{K}_k\) also follows by induction, since \(q_{j+1}\in\operatorname{span}\{Aq_j,q_j,q_{j-1}\}\subseteq\mathcal{K}_{j+1}\) while \(q_{j+1}\notin\mathcal{K}_j\).
\((c)\): When \(k = m\) we have \(\mathcal{K}_m = \mathcal{K}_{m+1}\), hence \(r_m = 0\) and the iteration stops. Indeed, \(r_m=0\) means
\[Aq_m=\beta_{m-1}q_{m-1}+\alpha_mq_m,\qquad\text{i.e.}\qquad AQ_m=Q_mT_m,\]
so \(\mathcal{K}_m\) is an invariant subspace of \(A\), and therefore \(\mathcal{K}_m = \mathcal{K}_{m+1} = \cdots\). Conversely, if \(k < m\), then \(\mathcal{K}_k \subsetneq \mathcal{K}_{k+1}\), so \(r_k \neq 0\) and the iteration does not stop.
\((d)\): The Lanczos iteration is used to solve eigenvalue problems (and linear systems) for large sparse symmetric matrices. Since \(T_k=Q_k^TAQ_k\) is the representation of \(A\) restricted to \(\mathcal{K}_k\), the eigenvalues of \(T_k\) are the Ritz values of \(A\); as \(k\) grows, the extreme Ritz values converge to the extreme eigenvalues of \(A\), while only matrix-vector products with \(A\) and a small tridiagonal eigenproblem are required.
.
(Arnoldi Process) Given a vector \(b \in \mathbb{R}^m\) and \(A \in \mathbb{R}^{m \times m}\), the Arnoldi process is a systematic way of constructing an orthonormal basis for the successive Krylov subspaces
\[\mathcal{K}_n=\operatorname{span}\{b,Ab,\dots,A^{n-1}b\}.\]
It gives
\[AQ_n=Q_{n+1}\tilde H_n,\]
where \(Q_n \in \mathbb{R}^{m \times n}\), \(Q_{n+1} \in \mathbb{R}^{m \times (n+1)}\) have orthonormal columns and \(\tilde{H}_n \in \mathbb{R}^{(n+1) \times n}\) is upper-Hessenberg. Let \(H_n \in \mathbb{R}^{n \times n}\) be obtained by deleting the last row of \(\tilde{H}_n\).
- (a) Write out the Arnoldi algorithm.
- (b) Assume that at step \(n\), the \((n+1,n)\)-th entry of \(\tilde{H}_n\) is zero.
- i. Show that \(\mathcal{K}_n\) is an invariant subspace of \(A\) and that \(\mathcal{K}_n = \mathcal{K}_{n+1} = \mathcal{K}_{n+2} = \ldots\)
- ii. Show that each eigenvalue of \(H_n\) is an eigenvalue of \(A\) for \(n > 1\).
-
(c) Let \(\mathcal{P}_n\) be the set of monic polynomials of degree \(n\). Show that the minimizer of
\[ \min_{p_n \in \mathcal{P}_n} \|p_n(A) b\|_2 \]is given by the characteristic polynomial of \(H_n\).
Proof
\((a)\): The Arnoldi algorithm: set \(q_1=b/\|b\|_2\); for \(n=1,2,\dots\): \(v=Aq_n\); for \(i=1,\dots,n\): \(h_{in}=q_i^Tv\), \(v=v-h_{in}q_i\); then \(h_{n+1,n}=\|v\|_2\); if \(h_{n+1,n}=0\), stop; otherwise \(q_{n+1}=v/h_{n+1,n}\).
\((b)\) i.: The \(n\)-th column of \(AQ_n=Q_{n+1}\tilde H_n\) reads
\[Aq_n=h_{1n}q_1+\cdots+h_{nn}q_n+h_{n+1,n}q_{n+1}.\]
If \(h_{n+1,n}=0\), then \(Aq_n\) already lies in \(\text{span}\{q_1, \ldots, q_n\}\), and the matrix relation reduces to \(AQ_n=Q_n H_n\), where \(H_n\) consists of the first \(n\) rows of \(\tilde{H}_n\). For any \(x\in \mathcal{K}_n\), there exists \(y\in \mathbb{R}^n\) with \(x=Q_n y\), so \(Ax=A Q_n y=Q_n H_n y \in \mathcal{K}_n\); hence \(\mathcal{K}_n\) is an invariant subspace of \(A\), and \(\mathcal{K}_n = \mathcal{K}_{n+1} = \mathcal{K}_{n+2} = \cdots\).
ii.: Let \(\lambda\) be an eigenvalue of \(H_n\) with eigenvector \(y\), so \(A Q_n y = Q_n H_n y = \lambda Q_n y\). Since the columns of \(Q_n\) are orthonormal and \(y\neq 0\), we have \(Q_n y \neq 0\); hence \(\lambda\) is an eigenvalue of \(A\).
\((c)\): First, for any polynomial \(q\) of degree at most \(n\),
\[Q_n^T\,q(A)\,b=\|b\|_2\,q(H_n)\,e_1.\]
Indeed, induction using \(AQ_n=Q_nH_n+h_{n+1,n}q_{n+1}e_n^T\) shows \(A^jb=\|b\|_2\,Q_nH_n^je_1\) for \(j\le n-1\) (the vector \(H_n^je_1\) is supported on its first \(j+1\) entries, so the \(e_n^T\)-term vanishes), and for \(j=n\), \(Q_n^TA^nb=\|b\|_2\,H_n^ne_1\) since \(Q_n^Tq_{n+1}=0\). Now, since \(p_n\) is monic, \(p_n(A)b=A^nb-w\) with \(w\in\mathcal{K}_n\) determined by the lower-order coefficients, so \(\min_{p_n\in\mathcal{P}_n}\|p_n(A)b\|_2\) is a linear least-squares problem over \(\mathcal{K}_n\), whose minimizer is characterized by the orthogonality condition \(p_n(A)b\perp\mathcal{K}_n\), i.e. \(Q_n^Tp_n(A)b=0\). Taking \(p_n\) to be the characteristic polynomial of \(H_n\), the identity above together with the Cayley-Hamilton theorem gives \(Q_n^Tp_n(A)b=\|b\|_2\,p_n(H_n)e_1=0\); hence the characteristic polynomial of \(H_n\) is the minimizer.
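A minimal pure-Python Arnoldi sketch complementing part (a) (dense matrix-vector products on a small illustrative matrix; a practical code would exploit sparsity):

```python
import math

def arnoldi(A, b, steps):
    # Builds orthonormal q_1..q_{steps+1} (entries of the list Q) and the
    # (steps+1) x steps upper-Hessenberg H with A Q_n = Q_{n+1} H.
    m = len(b)
    mv = lambda v: [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
    nrm = math.sqrt(sum(t * t for t in b))
    Q = [[t / nrm for t in b]]
    H = [[0.0] * steps for _ in range(steps + 1)]
    for k in range(steps):
        v = mv(Q[k])
        for i in range(k + 1):               # orthogonalize against q_1..q_{k+1}
            H[i][k] = sum(Q[i][t] * v[t] for t in range(m))
            v = [v[t] - H[i][k] * Q[i][t] for t in range(m)]
        H[k + 1][k] = math.sqrt(sum(t * t for t in v))
        if H[k + 1][k] < 1e-14:              # breakdown: K_n is invariant
            break
        Q.append([t / H[k + 1][k] for t in v])
    return Q, H

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], 3)
```

One can then check numerically that the \(q_i\) are orthonormal and that \(Aq_k=\sum_ih_{ik}q_i\) column by column.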
.
(Steepest Descent) Let \(A\) be a positive definite matrix. Consider the quadratic function \(g(x) = \frac{1}{2} x^T A x - b^T x\). Suppose the eigenvalues of \(A\) are given by \(0 < \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n\), and \(\kappa = \frac{\lambda_n}{\lambda_1}\). Let the sequence \(\{x_k\}\) be generated by the steepest descent method
\[x_{k+1}=x_k+\alpha_k r_k,\qquad r_k=b-Ax_k,\]
where \(\alpha_k\) is selected such that
\[g(x_k+\alpha_k r_k)=\min_{\alpha\in\mathbb{R}}g(x_k+\alpha r_k).\]
Prove that
\[\|x_{k+1}-x^*\|_A\le\frac{\kappa-1}{\kappa+1}\,\|x_k-x^*\|_A,\qquad\text{where }Ax^*=b\text{ and }\|x\|_A=\sqrt{x^TAx}.\]
Proof
Let \(x^*\) be the unique minimizer of \(g\), satisfying \(Ax^*=b\), and set \(e_k = x_k - x^*\), so \(r_k=b-Ax_k=-Ae_k\). Then
\[g(x_k)-g(x^*)=\tfrac12e_k^TAe_k=\tfrac12\|e_k\|_A^2.\]
Differentiating \(g(x_k+\alpha r_k)\) in \(\alpha\) and setting it to zero gives \(\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}\). Hence \(e_{k+1}=e_k+\alpha_kr_k\) and, using \(r_k^TAe_k=-r_k^Tr_k\),
\[\|e_{k+1}\|_A^2=\|e_k\|_A^2-\frac{(r_k^Tr_k)^2}{r_k^TAr_k}=\|e_k\|_A^2\left(1-\frac{(r_k^Tr_k)^2}{(r_k^TAr_k)(r_k^TA^{-1}r_k)}\right),\]
where the last step uses \(\|e_k\|_A^2=e_k^TAe_k=r_k^TA^{-1}r_k\). By the Kantorovich inequality, for a positive definite matrix \(A\) with eigenvalues \(0 < \lambda_1 \leq \cdots \leq \lambda_n\) and any nonzero vector \(x\),
\[\frac{(x^Tx)^2}{(x^TAx)(x^TA^{-1}x)}\ge\frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}.\]
Therefore
\[\|e_{k+1}\|_A^2\le\|e_k\|_A^2\left(1-\frac{4\lambda_1\lambda_n}{(\lambda_1+\lambda_n)^2}\right)=\|e_k\|_A^2\left(\frac{\lambda_n-\lambda_1}{\lambda_n+\lambda_1}\right)^{\!2},\]
i.e. \(\|x_{k+1}-x^*\|_A\le\frac{\kappa-1}{\kappa+1}\|x_k-x^*\|_A\). \(\square\)
.
(Conjugate Gradient Method) Let \(A\) be an \(N \times N\) symmetric positive definite matrix. The conjugate gradient method can be described as follows:
\[\begin{aligned}&\mathbf{r}_0=\mathbf{b}-A\mathbf{x}_0,\qquad \mathbf{p}_0=\mathbf{r}_0;\\&\text{for }n=0,1,2,\dots:\quad \alpha_n=\frac{\mathbf{r}_n^T\mathbf{r}_n}{\mathbf{p}_n^TA\mathbf{p}_n},\quad \mathbf{x}_{n+1}=\mathbf{x}_n+\alpha_n\mathbf{p}_n,\quad \mathbf{r}_{n+1}=\mathbf{r}_n-\alpha_nA\mathbf{p}_n,\\&\phantom{\text{for }n=0,1,2,\dots:\quad}\beta_n=\frac{\mathbf{r}_{n+1}^T\mathbf{r}_{n+1}}{\mathbf{r}_n^T\mathbf{r}_n},\quad \mathbf{p}_{n+1}=\mathbf{r}_{n+1}+\beta_n\mathbf{p}_n.\end{aligned}\]
Show that:
- (a) \(\alpha_n\) minimizes \(f(\mathbf{x}_n + \alpha \mathbf{p}_n)\) for all \(\alpha \in \mathbb{R}\), where \(f(\mathbf{x}) \equiv \frac{1}{2} \mathbf{x}^T A \mathbf{x} - \mathbf{b}^T \mathbf{x}\).
- (b) \(\mathbf{p}_i^T \mathbf{r}_n = 0\) for \(i < n\) and \(\mathbf{p}_i^T A \mathbf{p}_j = 0\) if \(i \neq j\).
- (c) \(\text{Span}\{\mathbf{p}_0, \mathbf{p}_1, \ldots, \mathbf{p}_{n-1}\} = \text{Span}\{\mathbf{r}_0, \mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\} \equiv \mathcal{K}_n\).
- (d) \(\mathbf{r}_n\) is orthogonal to \(\mathcal{K}_n\).
Proof
\((a)\): This is the same one-dimensional minimization as in the steepest descent problem above: \(\frac{d}{d\alpha}f(\mathbf{x}_n+\alpha\mathbf{p}_n)=\mathbf{p}_n^T\big(A(\mathbf{x}_n+\alpha\mathbf{p}_n)-\mathbf{b}\big)=0\) gives \(\alpha=\frac{\mathbf{p}_n^T\mathbf{r}_n}{\mathbf{p}_n^TA\mathbf{p}_n}\), which equals the \(\alpha_n\) of the algorithm since \(\mathbf{p}_n^T\mathbf{r}_n=\mathbf{r}_n^T\mathbf{r}_n\) by (b); the second derivative \(\mathbf{p}_n^TA\mathbf{p}_n>0\) confirms it is a minimum.
\((b)\): By induction on \(n\), using \(\mathbf{r}_{n+1}=\mathbf{r}_n-\alpha_nA\mathbf{p}_n\) and the definitions of \(\alpha_n\) and \(\beta_n\).
\((c)\): \(\mathbf{r}_k=\mathbf{r}_{k-1}-\alpha_{k-1} A \mathbf{p}_{k-1}\in \mathcal{K}_{k+1}\), and by (b) \(\mathbf{r}_k\) is orthogonal to the preceding \(\mathbf{p}_i\); hence \(\text{Span}\{\mathbf{r}_0,\cdots,\mathbf{r}_k\}=\mathcal{K}_{k+1}\) as long as the iteration has not converged. On the other hand \(\mathbf{p}_k=\mathbf{r}_k+\beta_{k-1} \mathbf{p}_{k-1}\in \mathcal{K}_{k+1}\), so \(\text{Span}\{\mathbf{p}_0,\cdots,\mathbf{p}_k\} = \text{Span}\{\mathbf{r}_0,\cdots,\mathbf{r}_k\} = \mathcal{K}_{k+1}\), the equality of dimensions following by induction.
\((d)\): By (b), \(\mathbf{r}_n\) is orthogonal to \(\mathbf{p}_0, \ldots, \mathbf{p}_{n-1}\), and by (c) these vectors span \(\mathcal{K}_n\); hence \(\mathbf{r}_n\) is orthogonal to \(\mathcal{K}_n\).
.
(GMRES Method) Consider the linear system \(Ax = b\). The GMRES method is a projection method which obtains a solution in the \(m\)-th Krylov subspace \(\mathcal{K}_m\) so that the residual is orthogonal to \(A \mathcal{K}_m\). Let \(r_0\) be the initial residual and let \(v_0 = r_0 / \|r_0\|_2\). The Arnoldi process is applied to build an orthonormal system \(v_1, v_2, \ldots, v_{m-1}\) with \(v_1 = A v_0 / \|A v_0\|_2\). The approximate solution is obtained from the space \(\mathcal{K}_m = \text{span}\{v_0, v_1, \ldots, v_{m-1}\}\).
- (i) Show that the approximate solution is obtained as the solution of a least-square problem, and that this problem is triangular.
- (ii) Prove that the residual \(r_k\) is orthogonal to \(\{v_1, v_2, \ldots, v_{k-1}\}\).
- (iii) Find a formula for the residual norm.
- (iv) Derive the complete algorithm.
.
(Structured Linear Systems)
-
(1) Suppose
\[ S = \begin{bmatrix} \sigma & u^T \\ 0 & S_c \end{bmatrix}, \quad T = \begin{bmatrix} \tau & v^T \\ 0 & T_c \end{bmatrix}, \quad b = \begin{bmatrix} \beta \\ b_c \end{bmatrix}, \]where \(\sigma, \tau, \beta\) are scalars, \(S_c\) and \(T_c\) are \(n \times n\) matrices, and \(b_c\) is an \(n\)-vector. Show that if there exists a vector \(\mathbf{x}_c\) such that
\[ (S_c T_c - \lambda I) \mathbf{x}_c = b_c, \]and \(\mathbf{w}_c = T_c \mathbf{x}_c\) is available, then
\[ \mathbf{x} = \begin{bmatrix} \gamma \\ \mathbf{x}_c \end{bmatrix}, \quad \gamma = \frac{\beta - \sigma v^T \mathbf{x}_c - u^T \mathbf{w}_c}{\sigma \tau - \lambda}, \]solves \((ST - \lambda I) \mathbf{x} = b\).
-
(2) Hence or otherwise, derive an \(O(n^2)\) algorithm for solving the linear system \((U_1 U_2 - \lambda I) \mathbf{x} = b\) where \(U_1\) and \(U_2\) are \(n \times n\) upper triangular matrices, and \((U_1 U_2 - \lambda I)\) is nonsingular. Please write down your algorithm and prove that it is indeed of \(O(n^2)\) complexity.
-
(3) Hence or otherwise, derive an \(O(p n^2)\) algorithm for solving the linear system \((U_1 U_2 \cdots U_p - \lambda I) \mathbf{x} = b\) where \(\{U_i\}_{i=1}^p\) are all \(n \times n\) upper triangular matrices, and \((U_1 U_2 \cdots U_p - \lambda I)\) is nonsingular. Please write down your algorithm and prove that it is indeed of \(O(p n^2)\) complexity.
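Parts (1)-(2) can be sketched in pure Python as follows (names are illustrative): the loop runs the recursion of part (1) bottom-up, maintaining \(w = U_2 x\) on the trailing block so that each of the \(n\) levels costs \(O(n)\), for \(O(n^2)\) total; replacing the pair \((U_1,U_2)\) by longer products as in part (3) repeats the same idea factor by factor.

```python
def tri_prod_solve(U1, U2, lam, b):
    # Solve (U1 U2 - lam*I) x = b with U1, U2 upper triangular, in O(n^2).
    # Bottom-up: at level i, x[i+1:] is known and w[i+1:] holds
    # U2[i+1:, i+1:] @ x[i+1:], i.e. the vector w_c of part (1).
    n = len(b)
    x = [0.0] * n
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        sigma, tau = U1[i][i], U2[i][i]
        vtx = sum(U2[i][j] * x[j] for j in range(i + 1, n))   # v^T x_c
        utw = sum(U1[i][j] * w[j] for j in range(i + 1, n))   # u^T w_c
        x[i] = (b[i] - sigma * vtx - utw) / (sigma * tau - lam)
        w[i] = tau * x[i] + vtx            # extend w one level upward
    return x

U1 = [[1.0, 2.0, 1.0], [0.0, 3.0, 1.0], [0.0, 0.0, 2.0]]
U2 = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [0.0, 0.0, 3.0]]
lam = 0.5
xs = [1.0, -1.0, 2.0]
# Build b = (U1 U2 - lam*I) xs directly, then recover xs with the solver.
M = [[sum(U1[i][k] * U2[k][j] for k in range(3)) - (lam if i == j else 0.0)
      for j in range(3)] for i in range(3)]
b = [sum(M[i][j] * xs[j] for j in range(3)) for i in range(3)]
x = tri_prod_solve(U1, U2, lam, b)
```
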