Subgradient
Subgradient is a fundamental concept in convex analysis and nonsmooth optimization that extends the familiar idea of a gradient to functions that are not differentiable everywhere. It provides a way to capture all possible linear under-approximations of a function at a given point, which in turn underpins many algorithms for finding minima in settings where the usual derivative does not exist.
In the simplest terms, if a function f is defined on a real vector space and is convex, a subgradient at a point x is a vector g that satisfies a global linear lower bound: f(y) ≥ f(x) + ⟨g, y − x⟩ for all y in the domain. The collection of all such vectors g at x is called the subdifferential and is denoted ∂f(x). When f is differentiable at x, the subdifferential reduces to the single gradient ∇f(x). For nondifferentiable points, ∂f(x) typically contains more than one element, encoding the range of possible linearizations.
Theory
Definition and interpretation
- For a convex function f, a vector g is a subgradient at x if f(y) ≥ f(x) + ⟨g, y − x⟩ for every y in dom(f). The set of all subgradients at x is the subdifferential ∂f(x). When f is differentiable at x, ∂f(x) = {∇f(x)}. (A numerical check of this inequality appears after this list.)
- Subgradients generalize the gradient while preserving many useful properties of smooth optimization; for instance, a step against a subgradient with a sufficiently small step size decreases the distance to any minimizer, even though it need not decrease the function value itself.
- The concept can be extended to other spaces and contexts, including nonsmooth and nonconvex settings via generalized subdifferentials such as the Clarke subdifferential.
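As a concrete check of the defining inequality, the following minimal sketch (using NumPy; the function, test grid, and helper name are illustrative choices, not any standard API) verifies numerically that values of g in [−1, 1] satisfy the inequality for f(x) = |x| at x = 0, while a value outside that interval does not. Sampling a grid of test points only checks the inequality at those points; it is an illustration, not a proof.

```python
import numpy as np

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points (illustrative helper)."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = abs                        # f(x) = |x|, convex but not differentiable at 0
ys = np.linspace(-5, 5, 1001)  # sample points standing in for "all y"

print(is_subgradient(f, 0.3, 0.0, ys))   # True:  0.3 lies in [-1, 1] = the subdifferential at 0
print(is_subgradient(f, -1.0, 0.0, ys))  # True:  the endpoints of [-1, 1] also qualify
print(is_subgradient(f, 1.5, 0.0, ys))   # False: the inequality fails for y > 0
```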
Basic properties
- For convex f, ∂f(x) is a closed, convex set. If f is finite and differentiable at x, ∂f(x) contains exactly one element, ∇f(x).
- If x lies outside the relative interior of the domain of f, the subdifferential can be empty; many results therefore assume interior points or additional regularity (e.g., properness and lower semicontinuity) to guarantee a nonempty subdifferential.
- Subgradients give rise to supporting hyperplanes of the epigraph of f, which is a geometric way to view the linear under-approximation.
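In symbols, writing epi f = {(y, t) : t ≥ f(y)}, this supporting-hyperplane view is a direct restatement of the defining inequality:

```latex
g \in \partial f(x)
\quad\Longleftrightarrow\quad
\big\langle (g,\,-1),\; (y,\,t) - (x,\,f(x)) \big\rangle \le 0
\quad \text{for all } (y,t) \in \operatorname{epi} f.
```

Setting t = f(y) recovers f(y) ≥ f(x) + ⟨g, y − x⟩, so the hyperplane through (x, f(x)) with normal (g, −1) leaves the epigraph entirely on one side.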
Relationship to the gradient and duality
- In smooth regions, subgradients coincide with gradients. Thus subgradient methods extend gradient methods to nonsmooth problems.
- Subgradients connect to duality theory: subgradients can be interpreted as elements of the dual space that witness lower bounds for f.
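One standard way to make this precise, assuming f is proper, closed, and convex with convex conjugate f*, is through equality in the Fenchel–Young inequality:

```latex
g \in \partial f(x)
\quad\Longleftrightarrow\quad
f(x) + f^{*}(g) = \langle g, x \rangle,
\qquad
f^{*}(g) := \sup_{y} \big( \langle g, y \rangle - f(y) \big).
```

Equivalently, g is a subgradient at x exactly when the supremum defining f*(g) is attained at y = x.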
Examples
- f(x) = |x|: ∂f(0) = [−1, 1], while ∂f(x) = {sign(x)} for x ≠ 0.
- f(x) = x^2: ∂f(x) = {2x} for all x (since f is differentiable everywhere).
- f(x) = max{0, x} (the ReLU function): ∂f(0) = [0, 1], and ∂f(x) = {1} for x > 0, ∂f(x) = {0} for x < 0.
- These examples illustrate how subgradients capture a range of possible linearizations at nondifferentiable points.
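For these one-dimensional examples the subdifferential is always a closed interval, so it can be represented by its endpoints. The sketch below (hypothetical helper functions, not a library API) simply encodes the case analysis from the list above, showing how the set collapses to a singleton at differentiable points.

```python
def subdiff_abs(x):
    """Subdifferential of f(x) = |x| as an interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)          # kink at 0: every slope in [-1, 1] works

def subdiff_relu(x):
    """Subdifferential of f(x) = max(0, x)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (0.0, 0.0)
    return (0.0, 1.0)           # kink at 0

def subdiff_square(x):
    """Subdifferential of f(x) = x**2: a singleton, since f is differentiable."""
    return (2.0 * x, 2.0 * x)

print(subdiff_abs(0.0), subdiff_relu(0.0), subdiff_square(3.0))
# (-1.0, 1.0) (0.0, 1.0) (6.0, 6.0)
```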
Extensions and generalizations
- Clarke subdifferential: a construction for locally Lipschitz, possibly nonconvex functions that generalizes the notion of a gradient to nonsmooth contexts.
- Other generalized subdifferentials (e.g., Mordukhovich, Fréchet) provide alternative ways to describe sensitivities of nonsmooth functions in variational analysis.
- Proximal subgradients and Moreau envelopes connect subgradients to regularization techniques that produce smoother optimization problems.
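As a small illustration of that connection, the Moreau envelope of f(x) = |x| with parameter λ is the Huber-like smoothing x²/(2λ) for |x| ≤ λ and |x| − λ/2 otherwise, and its proximal operator is soft thresholding. The sketch below evaluates both directly; these closed forms are standard for the absolute value, while the function and parameter names are illustrative.

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal operator of lam * |.|: soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_abs(x, lam):
    """Moreau envelope of |.|: min_y |y| + (y - x)**2 / (2*lam), i.e. the Huber function."""
    y = prox_abs(x, lam)
    return np.abs(y) + (y - x) ** 2 / (2.0 * lam)

xs = np.linspace(-2, 2, 5)
print(prox_abs(xs, 0.5))     # [-1.5 -0.5  0.   0.5  1.5]
print(moreau_abs(xs, 0.5))   # quadratic near 0, equal to |x| - 0.25 farther out
```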
Subgradient methods
Algorithmic framework
- Subgradient methods iteratively update an estimate x_k using a subgradient g_k ∈ ∂f(x_k): x_{k+1} = x_k − α_k g_k, with step sizes α_k chosen to ensure convergence; a minimal sketch appears after this list.
- These methods are robust to nondifferentiability and can be applied to a broad class of convex optimization problems, including those with nonsmooth objective functions.
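A minimal sketch of the update, shown here for the nonsmooth convex objective ‖Ax − b‖₁ with a diminishing step size (the problem data, step-size constant, and iteration count are illustrative choices, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)           # consistent system, so the optimal value is 0

def f(x):
    return np.abs(A @ x - b).sum()       # nonsmooth objective ||Ax - b||_1

def subgradient(x):
    """One subgradient of ||Ax - b||_1: A^T s, where s is a sign vector of the residual."""
    return A.T @ np.sign(A @ x - b)

x = np.zeros(5)
f_best = f(x)
for k in range(1, 5001):
    g = subgradient(x)
    x = x - (0.1 / np.sqrt(k)) * g       # diminishing step size alpha_k = 0.1 / sqrt(k)
    f_best = min(f_best, f(x))           # f(x_k) need not decrease monotonically, so track the best
print(f_best)                            # decreases toward the optimal value 0 as k grows
```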
Convergence and trade-offs
- Convergence guarantees typically require convexity of f and appropriate diminishing step sizes. In general, subgradient methods may converge more slowly than gradient methods on smooth problems, but they remain valuable when nonsmoothness or large-scale structure makes smooth methods impractical.
- Practical considerations include the choice of step-size rules, averaging strategies, and problem conditioning. Variants and accelerated schemes exist in the broader literature on nonsmooth optimization.
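For reference, a classical bound on the best value found after k steps (assuming f is convex with minimizer x*, ‖g_i‖ ≤ G for every subgradient used, and ‖x_1 − x*‖ ≤ R) is

```latex
\min_{1 \le i \le k} f(x_i) - f(x^{*})
\;\le\;
\frac{R^{2} + G^{2} \sum_{i=1}^{k} \alpha_i^{2}}{2 \sum_{i=1}^{k} \alpha_i}.
```

The right-hand side tends to zero for diminishing, non-summable step sizes (e.g., α_i = c/√i), and optimizing a constant step size for a fixed budget of k iterations gives the familiar O(RG/√k) rate.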
Applications
- Large-scale machine learning problems with regularized objectives, such as L1-regularized regression, benefit from subgradient approaches when smoothness cannot be assumed; a sketch of the relevant subgradient appears after this list.
- Economic equilibrium computations and other areas of applied optimization employ subgradient methods to handle nonsmooth payoffs or constraints.
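For example, for the L1-regularized least-squares (lasso) objective f(w) = ½‖Xw − y‖² + λ‖w‖₁, one valid subgradient is Xᵀ(Xw − y) + λ·s, with s_i = sign(w_i) for w_i ≠ 0 and s_i any value in [−1, 1] (here 0) otherwise. A minimal sketch with illustrative data, not a library API:

```python
import numpy as np

rng = np.random.default_rng(1)
X, y, lam = rng.standard_normal((50, 10)), rng.standard_normal(50), 0.1
w = rng.standard_normal(10)

# A subgradient of f(w) = 0.5*||Xw - y||^2 + lam*||w||_1.
# The smooth part contributes its gradient; for each |w_i| term, sign(w_i) is used
# when w_i != 0, and 0 (a valid choice from [-1, 1]) when w_i == 0.
g = X.T @ (X @ w - y) + lam * np.sign(w)
print(g.shape)   # (10,)
```

This subgradient can be plugged directly into the iteration x_{k+1} = x_k − α_k g_k sketched earlier.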
Extensions and relationships
Nonconvex and generalized notions
- For nonconvex functions, Clarke subdifferentials and related constructs provide useful generalized gradients, allowing sensitivity analysis and algorithmic design in broader settings.
- These generalized notions help in establishing first-order optimality conditions and in guiding descent methods in nonsmooth, nonconvex landscapes.
Connections to related topics
- Convex analysis provides the core language and results for subgradients, including subdifferentials, epigraphs, and supporting hyperplanes.
- Optimization theory uses subgradients to derive optimality conditions and to design algorithms for convex and nonsmooth problems.
- Related concepts include gradient methods, proximal operators, and duality, all of which interact with subgradients in various ways.