Diagonal Preconditioning
Diagonal preconditioning is a straightforward technique in numerical linear algebra for accelerating the convergence of iterative methods for solving linear systems of the form Ax = b. The core idea is to replace the original system with a nearby system that is easier to solve iteratively while preserving the solution. The simplest and most common choice for the preconditioner is a diagonal matrix M that captures the diagonal structure of A, leading to the so-called Jacobi preconditioner. By applying this diagonal proxy, one hopes to reduce the condition number of the problem, or at least to improve the spectral properties that drive the performance of Krylov subspace methods.
In practice, the preconditioner is applied without forming any explicit inverse. One applies M^{-1} to a vector, which, when M is diagonal, reduces to componentwise division by the diagonal entries of A. The preconditioned system can be viewed in left form, M^{-1} A x = M^{-1} b, or in right form, A M^{-1} y = b with x = M^{-1} y, depending on the solver and the desired numerical behavior. This approach is especially attractive for large sparse systems, where the cost of a diagonal solve is negligible compared to more elaborate factorization-based preconditioners.
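As a minimal illustration in NumPy (a small dense A for clarity; the technique targets large sparse systems), applying M^{-1} is just componentwise division, and the left-preconditioned system is formed by dividing each row of A and each entry of b by the corresponding diagonal entry:

```python
import numpy as np

# Small illustrative system; real applications are large and sparse.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

d = np.diag(A)              # diagonal entries a_ii, i.e. M = diag(A)

# Applying M^{-1} to a vector is componentwise division by d.
r = b - A @ np.zeros(3)     # residual at the zero initial guess
z = r / d                   # z = M^{-1} r

# Left-preconditioned form M^{-1} A x = M^{-1} b:
A_left = A / d[:, None]     # divide row i of A by a_ii
b_left = b / d
```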
Framework and definitions
- Let A be a square matrix representing the coefficients of a linear system. A diagonal preconditioner M is formed by taking M = diag(A), the matrix that contains the diagonal entries a_ii of A and zeros elsewhere.
- The preconditioned iteration then operates on either M^{-1} A or A M^{-1}, depending on whether left or right preconditioning is used. In the left-preconditioned form, the solver works on the system M^{-1} A x = M^{-1} b.
- If A is symmetric positive definite (SPD), its diagonal entries are positive, so M = diag(A) is itself SPD, and the preconditioned Conjugate Gradient method applies directly. For more general matrices, Krylov methods such as GMRES or BiCGSTAB are commonly employed with diagonal preconditioning, as in the sketch following this list.
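A minimal sketch with SciPy, assuming a simple tridiagonal test matrix: scipy.sparse.linalg.gmres accepts a preconditioner M that should act as an approximation to A^{-1}, so the Jacobi preconditioner is supplied as a LinearOperator that divides by the diagonal.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Tridiagonal, diagonally dominant test matrix chosen for illustration.
n = 100
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# SciPy's gmres expects M to approximate A^{-1}; for M = diag(A)
# that action is componentwise division by the diagonal.
d = A.diagonal()
M = spla.LinearOperator((n, n), matvec=lambda v: np.ravel(v) / d)

x, info = spla.gmres(A, b, M=M)
print(info)  # 0 indicates convergence
```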
Construction and variants
- Jacobi preconditioner: M = diag(A). This is the canonical diagonal preconditioner and the one most people mean when they speak of diagonal preconditioning.
- Diagonal scaling (row and/or column scaling): Before forming the diagonal, one may scale rows and/or columns so that the resulting diagonal entries have more uniform magnitude, which can improve the effectiveness of the preconditioner. See discussions of matrix scaling in numerical linear algebra.
- Handling zero diagonals: If A has zero on its diagonal, M is singular and cannot serve as a preconditioner in the straightforward sense. Practical remedies include perturbing the diagonal with a small shift (diagonal augmentation), performing a permutation to move nonzero elements into diagonal positions, or combining diagonal preconditioning with a more robust technique.
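A guarded construction might look as follows (the function name and the shift threshold are illustrative choices, not standard values):

```python
import numpy as np

def jacobi_inverse(diag_entries, shift=1e-12):
    """Entries of M^{-1} for M = diag(A), with a guard for zero diagonals.

    Diagonal entries smaller in magnitude than `shift` are replaced by
    +/- shift (diagonal augmentation) so that M stays nonsingular.
    """
    d = np.asarray(diag_entries, dtype=float).copy()
    tiny = np.abs(d) < shift
    d[tiny] = np.where(d[tiny] >= 0.0, shift, -shift)
    return 1.0 / d
```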
Computational considerations
- Cost: The application of M^{-1} is O(n) for an n-by-n matrix when M is diagonal, making it extremely cheap relative to other preconditioning schemes.
- Storage: M is diagonal, so storage is minimal.
- Robustness: The effectiveness of diagonal preconditioning hinges on how well the diagonal entries reflect the dominant scaling of the rows of A. It is particularly natural when A is diagonally dominant or when the diagonal terms provide a good proxy for the row norms.
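One cheap diagnostic for this (a hypothetical helper, not a standard routine) is the ratio of each diagonal entry's magnitude to its row's 1-norm; a ratio of at least 1/2 corresponds to weak diagonal dominance in that row:

```python
import numpy as np

def diagonal_dominance_ratios(A):
    """|a_ii| / sum_j |a_ij| for each row i.

    A ratio >= 0.5 means |a_ii| >= sum of off-diagonal magnitudes,
    i.e. the row is (weakly) diagonally dominant.
    """
    A = np.asarray(A, dtype=float)
    return np.abs(np.diag(A)) / np.abs(A).sum(axis=1)
```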
Effect on solver performance
- Spectral interpretation: Preconditioning aims to cluster the eigenvalues of the preconditioned matrix (or to reduce its condition number) to improve convergence rates of Krylov methods. Under strong diagonal dominance, M^{-1} A tends to have eigenvalues clustered near 1, which aids convergence.
- SPD case: If A is SPD and M is chosen to be SPD (which is automatic for a real diagonal matrix with positive diagonal entries), the preconditioned problem remains amenable to CG-like methods. One can also use M^{-1/2} A M^{-1/2} as the operator to obtain an explicitly symmetric, better-conditioned form; for a diagonal M this symmetric scaling is particularly cheap, since M^{-1/2} is just the diagonal of reciprocal square roots (see the sketch following this list).
- Non-symmetric and indefinite cases: For non-symmetric A, diagonal preconditioning remains a simple baseline that can help, but it does not guarantee rapid convergence. Krylov solvers such as GMRES typically benefit from any preconditioning that improves eigenvalue clustering, but the gains are problem-dependent.
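The symmetric scaling mentioned above can be demonstrated on a small, badly scaled SPD matrix (values chosen purely for illustration):

```python
import numpy as np

# SPD matrix whose diagonal entries span four orders of magnitude.
A = np.array([[100.0, 1.0,  0.0 ],
              [  1.0, 2.0,  0.05],
              [  0.0, 0.05, 0.01]])

d_inv_sqrt = 1.0 / np.sqrt(np.diag(A))    # entries of M^{-1/2}

# Symmetric Jacobi scaling: M^{-1/2} A M^{-1/2} has unit diagonal.
A_scaled = A * np.outer(d_inv_sqrt, d_inv_sqrt)

print(np.linalg.cond(A))         # large condition number before scaling
print(np.linalg.cond(A_scaled))  # typically far smaller afterwards
```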
Applications and limitations
- Suitable contexts: Problems arising from discretized partial differential equations (PDEs), especially when the discretization yields matrices with diagonally dominant structure, often respond well to diagonal preconditioning. Sparse matrix computations routinely use it as a cheap baseline preconditioner.
- Limitations: If the diagonal does not reflect the dominant action of A—for example, in matrices with strong off-diagonal coupling or poor diagonal dominance—the Jacobi preconditioner may offer little improvement or even degrade convergence relative to no preconditioning. In such cases, more elaborate preconditioners (e.g., block diagonal, incomplete factorization) are preferable.
- Interaction with scaling: Pairing diagonal preconditioning with row or column scaling can yield a more balanced spectrum. This combination can be viewed as a two-step approach: first scale to normalize rows/columns, then apply the diagonal preconditioner to the scaled matrix.
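A sketch of this two-step approach under simple assumptions (row equilibration to unit infinity-norm, then Jacobi on the scaled matrix; the function name is illustrative and the scaled diagonal is assumed nonzero):

```python
import numpy as np

def scale_then_jacobi(A, b):
    """Two-step preprocessing: equilibrate rows, then Jacobi-precondition.

    Returns the scaled system and a function applying M^{-1} for
    M = diag of the scaled matrix.
    """
    A = np.asarray(A, dtype=float)
    # Step 1: scale each row to unit infinity-norm; b must be scaled too.
    r = np.abs(A).max(axis=1)
    A_s = A / r[:, None]
    b_s = b / r
    # Step 2: Jacobi preconditioner of the scaled matrix.
    d = np.diag(A_s)
    return A_s, b_s, lambda v: v / d
```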
Extensions and related techniques
- Block diagonal preconditioning: When A has a natural block structure, replacing the scalar diagonal with a block diagonal M can capture more of A’s coupling while retaining cheap solves for M^{-1}; a block-Jacobi sketch follows this list.
- Diagonal plus low-rank or approximate inverses: Some practical strategies augment a diagonal preconditioner with a low-rank correction to capture more of the problematic components without sacrificing efficiency.
- Relation to scaling methods: Diagonal preconditioning is often discussed alongside row/column scaling techniques; together they form a family of simple, fast preprocessing steps that can substantially influence solver performance.
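For instance, a block-Jacobi preconditioner can be sketched as follows (illustrative code assuming the dimension is a multiple of the block size): each diagonal block is LU-factored once, and M^{-1} is applied blockwise.

```python
import numpy as np
import scipy.linalg as sla
import scipy.sparse.linalg as spla

def block_jacobi_operator(A, block_size):
    """Block-Jacobi sketch: LU-factor each diagonal block of A once,
    then apply M^{-1} blockwise (assumes n is a multiple of block_size)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    starts = range(0, n, block_size)
    # Pre-factor each diagonal block for repeated cheap solves.
    factors = [sla.lu_factor(A[i:i + block_size, i:i + block_size])
               for i in starts]

    def matvec(v):
        v = np.ravel(v)
        out = np.empty(n)
        for f, i in zip(factors, starts):
            out[i:i + block_size] = sla.lu_solve(f, v[i:i + block_size])
        return out

    return spla.LinearOperator((n, n), matvec=matvec)
```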