Krylov Subspace
The Krylov subspace is a foundational concept in numerical linear algebra, arising when a matrix or linear operator is applied repeatedly to a starting vector. It underpins a broad class of efficient iterative methods for solving large sparse linear systems and for computing a few eigenvalues and eigenvectors. The idea is to build successively richer subspaces from the action of the matrix, then extract approximations to the desired solution or spectral information from within those subspaces. This approach avoids forming the matrix inverse or solving a dense eigenproblem explicitly, which is crucial for large-scale computations in engineering and science.
In practice, Krylov subspaces are used to tame problems that would be intractable with direct methods. They are particularly valuable when the matrix is large, sparse, or available only through matrix-vector products. The methods built on Krylov subspaces – such as GMRES and its variants, the Conjugate Gradient method for symmetric positive-definite systems, and related constructions – deliver robust behavior on real-world problems ranging from structural analysis to fluid dynamics. The geometric idea is simple but powerful: by projecting the original problem onto a sequence of growing subspaces, one can obtain increasingly accurate approximate solutions with a computational footprint that scales favorably with problem size.
Overview
Definition and basic construction
For a given square matrix A and a starting vector b, the m-th Krylov subspace is defined as K_m(A,b) = span{b, Ab, A^2 b, ..., A^{m-1} b}. This sequence of subspaces grows with m, and each K_m(A,b) is contained in K_{m+1}(A,b). The vectors that form an orthonormal basis of these subspaces are typically generated by an iterative procedure such as the Arnoldi iteration or, when A is real and symmetric, the Lanczos method. These bases provide a compact representation of the subspace and enable efficient projection of the original problem onto a much smaller one.
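A minimal NumPy sketch of this definition builds the Krylov matrix [b, Ab, ..., A^{m-1} b] column by column and orthonormalizes it with a QR factorization. The test matrix, the dimension m, and the function name krylov_basis are illustrative; practical codes generate the basis incrementally (Arnoldi or Lanczos) because the raw power sequence quickly becomes ill-conditioned.

```python
import numpy as np

def krylov_basis(A, b, m):
    """Return an orthonormal basis of K_m(A, b) as the columns of Q (illustrative sketch)."""
    n = b.shape[0]
    K = np.empty((n, m))
    v = b.copy()
    for j in range(m):
        K[:, j] = v             # current power of A applied to b
        v = A @ v               # next Krylov vector
    Q, _ = np.linalg.qr(K)      # orthonormalize the columns
    return Q

# Small random example (for illustration only).
rng = np.random.default_rng(0)
A = rng.random((50, 50))
b = rng.random(50)
Q = krylov_basis(A, b, 5)
print(Q.shape)                  # (50, 5): five orthonormal basis vectors
```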
Right vs left perspectives and key properties
K_m(A,b) is sometimes called a right Krylov subspace because the starting vector appears to the right of A in the products A^k b. A closely related construction is the left Krylov subspace, built from the row vectors b^T A^k, which is equivalent to the right Krylov subspace K_m(A^T, b). Properties of Krylov subspaces include their nested structure, the relationship between the subspace dimension and convergence behavior, and the fact that A maps K_m(A,b) into K_{m+1}(A,b). When A is symmetric, the Arnoldi process reduces to the Lanczos process, yielding short three-term recurrences.
Role in solving linear systems and eigenvalue problems
Krylov subspace methods solve Ax = b by seeking an approximate solution in the affine space x0 + K_m(A,r0), where r0 = b − A x0 is the initial residual. The methods differ in how they choose the approximation x_m within this subspace and which optimality criterion they impose (e.g., minimizing the residual norm in GMRES or minimizing the A-norm of the error in CG). For eigenvalue problems, projecting A onto K_m(A,b) produces a small dense matrix whose eigenvalues (the Ritz values) approximate a subset of the eigenvalues of A, with accuracy improving as m grows.
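Stated explicitly, with x_* denoting the exact solution and r_0 = b − A x_0, the two most common optimality criteria can be written as follows (a compact restatement of the prose above):

```latex
x_m^{\mathrm{GMRES}} = \arg\min_{x \in x_0 + \mathcal{K}_m(A, r_0)} \|b - Ax\|_2,
\qquad
x_m^{\mathrm{CG}} = \arg\min_{x \in x_0 + \mathcal{K}_m(A, r_0)} \|x_\ast - x\|_A,
\quad \|e\|_A = \sqrt{e^{\mathsf{T}} A e}.
```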
Algorithms and methods
Generalized Minimal Residual (GMRES)
GMRES chooses x_m in x0 + K_m(A,r0) to minimize the residual norm ||b − A x_m||. The method relies on an orthonormal basis of K_m(A,r0), generated via the Arnoldi process, and the solution of a small least-squares problem with an upper Hessenberg matrix that determines x_m. GMRES is widely used for non-symmetric matrices because of its robust convergence properties, but memory use and orthogonalization cost grow with m, motivating restarted variants such as GMRES(k) that periodically reset the subspace.
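A minimal sketch of restarted GMRES using SciPy's gmres routine; the tridiagonal test matrix, the restart length of 30, and the iteration cap are arbitrary illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

n = 1000
# Non-symmetric, diagonally dominant tridiagonal test matrix (illustrative).
A = sp.diags([-1.2, 2.5, -0.8], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# restart=30 gives GMRES(30): the Krylov basis is discarded and rebuilt every 30 steps.
x, info = gmres(A, b, restart=30, maxiter=500)
print("info:", info, "residual norm:", np.linalg.norm(b - A @ x))  # info == 0 means converged
```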
Conjugate Gradient and related SPD methods
The Conjugate Gradient (CG) method solves Ax = b efficiently when A is symmetric and positive definite. CG operates within the same Krylov subspace framework but exploits the SPD structure to obtain short recurrences and, in exact arithmetic, convergence in at most N steps for an N-by-N system. MINRES extends similar ideas to symmetric indefinite systems while preserving short recurrences.
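A minimal NumPy sketch of the CG iteration, written out to show the short recurrences; the SPD test matrix and tolerance are illustrative, and this is not meant as a production solver.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-8, maxiter=1000):
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x             # residual
    p = r.copy()              # search direction
    rs_old = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p    # new direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x

# SPD test matrix (illustrative): M M^T plus a diagonal shift.
rng = np.random.default_rng(0)
n = 200
M = rng.random((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.random(n)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))
```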
Arnoldi and Lanczos iterations
The Arnoldi iteration produces an orthonormal basis for K_m(A,b) together with a small upper Hessenberg matrix H_m that represents the action of A restricted to that subspace. The eigenvalues of H_m (the Ritz values) provide approximations to eigenvalues of A, improving as m grows. When A is real and symmetric, the Lanczos method yields a tridiagonal matrix instead, with three-term recurrences that reduce both work and storage for eigenvalue and linear-system tasks.
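A minimal NumPy sketch of the Arnoldi iteration (modified Gram–Schmidt variant) and of the Ritz values obtained from H_m; the test matrix, subspace size, and breakdown tolerance are illustrative.

```python
import numpy as np

def arnoldi(A, b, m):
    """Return Q (n x (m+1)) and H ((m+1) x m) satisfying A Q[:, :m] = Q H (illustrative sketch)."""
    n = b.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):            # orthogonalize against all previous basis vectors
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:           # exact breakdown: an invariant subspace was found
            return Q[:, :j + 1], H[:j + 1, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

# Matrix with eigenvalues near 1, 2, ..., 100 (illustrative).
rng = np.random.default_rng(0)
A = np.diag(np.arange(1.0, 101.0)) + 0.01 * rng.random((100, 100))
b = rng.random(100)
Q, H = arnoldi(A, b, 20)
ritz = np.linalg.eigvals(H[:20, :20])     # Ritz values from the projected matrix
print(np.sort(ritz.real)[-3:])            # the largest Ritz values approximate the largest eigenvalues
```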
BiConjugate Gradient and related methods
BiCG and BiCGSTAB are Krylov-based methods for non-symmetric problems. BiCG builds a pair of bi-orthogonal bases, one for K_m(A, r0) and one for a Krylov subspace generated by A^T, and advances the solution with short recurrences; BiCGSTAB was designed to smooth BiCG's often erratic convergence while keeping memory requirements modest.
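A minimal sketch of a BiCGSTAB solve via SciPy's bicgstab routine; the non-symmetric test matrix and iteration cap are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

n = 1000
# Non-symmetric, diagonally dominant tridiagonal test matrix (illustrative).
A = sp.diags([-1.0, 3.0, -0.5], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = bicgstab(A, b, maxiter=500)
print("info:", info, "residual norm:", np.linalg.norm(b - A @ x))  # info == 0 means converged
```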
Preconditioning
Preconditioners transform the original system into one with more favorable spectral properties for Krylov methods. Common strategies include incomplete factorizations (e.g., ILU), diagonal (Jacobi) scaling, SSOR, and multigrid-based approaches. Effective preconditioning accelerates convergence, often making the difference between practical and impractical solve times for large problems.
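A minimal sketch of one such strategy: an incomplete LU factorization from SciPy's spilu, wrapped as a LinearOperator and passed to GMRES as the preconditioner M; the test matrix, drop tolerance, and fill factor are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres, spilu, LinearOperator

n = 1000
A = sp.diags([-1.0, 4.0, -1.5], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spilu(A, drop_tol=1e-4, fill_factor=10)    # incomplete LU factors of A
M = LinearOperator((n, n), matvec=ilu.solve)     # apply the factors as an approximate inverse

x, info = gmres(A, b, M=M, restart=30, maxiter=200)
print("info:", info, "residual norm:", np.linalg.norm(b - A @ x))
```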
Restarting and recycling
Given memory constraints, restarted Krylov methods (e.g., GMRES(k)) periodically truncate and restart the subspace. Recycling techniques reuse information from prior solves to accelerate later ones, particularly in sequences of related linear systems. These strategies are important for time-stepping simulations and parameter studies where systems share structure across steps.
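A minimal sketch of the simplest kind of reuse across a sequence of related solves: warm-starting each restarted GMRES solve from the previous step's solution. This reuses only the solution vector; true subspace recycling (e.g., GCRO-DR) additionally retains selected basis vectors and is not shown here. The slowly varying right-hand sides are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres

n = 1000
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

x_prev = np.zeros(n)
for step in range(5):
    b = np.ones(n) * (1.0 + 0.01 * step)    # systems that change slowly from step to step
    x_prev, info = gmres(A, b, x0=x_prev, restart=30, maxiter=200)
    print(f"step {step}: info = {info}")
```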
Computational and practical considerations
- Memory and cost: each increment of m adds a new basis vector that must be stored and, in long-recurrence methods such as GMRES, orthogonalized against all previous ones, so both memory and per-iteration work grow with m. This makes restarting and recycling essential in large-scale work.
- Orthogonality and numerical stability: in finite-precision arithmetic, maintaining orthogonality among Krylov basis vectors is challenging. Reorthogonalization or more stable implementations are often employed to preserve accuracy.
- Matrix-free and sparse matrix considerations: Krylov methods are well-suited to matrix-free contexts where only matrix–vector products are available (see the sketch after this list). Exploiting sparsity in A is critical to achieving scalable performance.
- Parallelism and hardware: these methods map well to parallel architectures, distributing vectors and sparse matrix–vector products across processing units to tackle very large problems.
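A minimal sketch of the matrix-free setting: a shifted 1-D Laplacian stencil is applied on the fly, wrapped in SciPy's LinearOperator, and handed to CG without ever forming the matrix; the stencil, problem size, and solver choice are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 10000

def matvec(v):
    # Apply the symmetric tridiagonal stencil [-1, 4, -1] without forming the matrix.
    w = 4.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

A = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
b = np.ones(n)

x, info = cg(A, b, maxiter=200)
print("info:", info, "residual norm:", np.linalg.norm(b - matvec(x)))
```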