Subgradient
Subgradient is a fundamental concept in convex analysis and nonsmooth optimization that extends the familiar idea of a gradient to functions that are not differentiable everywhere. It provides a way to capture all possible linear under-approximations of a function at a given point, which in turn underpins many algorithms for finding minima in settings where the usual derivative does not exist.
In the simplest terms, if a function f is defined on a real vector space and is convex, a subgradient at a point x is a vector g that satisfies a global linear lower bound: f(y) ≥ f(x) + ⟨g, y − x⟩ for all y in the domain. The collection of all such vectors g at x is called the subdifferential and is denoted ∂f(x). When f is differentiable at x, the subdifferential reduces to the single gradient ∇f(x). For nondifferentiable points, ∂f(x) typically contains more than one element, encoding the range of possible linearizations.
Theory
Definition and interpretation
- For a convex function f, a vector g is a subgradient at x if f(y) ≥ f(x) + ⟨g, y − x⟩ for every y in dom(f). The set of all subgradients at x is the subdifferential ∂f(x). When f is differentiable at x, ∂f(x) = {∇f(x)}. (A numerical check of this inequality appears after this list.)
- Subgradients generalize the gradient while preserving many useful properties of smooth optimization; for instance, a step against a subgradient with a sufficiently small step size decreases the distance to any minimizer, even though it need not decrease the function value itself.
- The concept can be extended to other spaces and contexts, including nonsmooth and nonconvex settings via generalized subdifferentials such as the Clarke subdifferential.
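As a concrete check of the defining inequality, the following minimal sketch (using NumPy; the function, test grid, and helper name are illustrative choices, not any standard API) verifies numerically that values of g in [−1, 1] satisfy the inequality for f(x) = |x| at x = 0, while a value outside that interval does not. Sampling a grid of test points only checks the inequality at those points; it is an illustration, not a proof.

```python
import numpy as np

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Check f(y) >= f(x) + g*(y - x) on a grid of test points (illustrative helper)."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

f = abs                        # f(x) = |x|, convex but not differentiable at 0
ys = np.linspace(-5, 5, 1001)  # sample points standing in for "all y"

print(is_subgradient(f, 0.3, 0.0, ys))   # True:  0.3 lies in [-1, 1] = the subdifferential at 0
print(is_subgradient(f, -1.0, 0.0, ys))  # True:  the endpoints of [-1, 1] also qualify
print(is_subgradient(f, 1.5, 0.0, ys))   # False: the inequality fails for y > 0
```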
Basic properties
- For convex f, ∂f(x) is a closed, convex set. If f is finite and differentiable at x, ∂f(x) contains exactly one element, ∇f(x).
- If x lies outside the relative interior of the domain of f, the subdifferential can be empty; many results therefore assume interior points or additional regularity (e.g., properness and lower semicontinuity) to guarantee a nonempty subdifferential.
- Subgradients give rise to supporting hyperplanes of the epigraph of f, which is a geometric way to view the linear under-approximation.
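In symbols, writing epi f = {(y, t) : t ≥ f(y)}, this supporting-hyperplane view is a direct restatement of the defining inequality:

```latex
g \in \partial f(x)
\quad\Longleftrightarrow\quad
\big\langle (g,\,-1),\; (y,\,t) - (x,\,f(x)) \big\rangle \le 0
\quad \text{for all } (y,t) \in \operatorname{epi} f.
```

Setting t = f(y) recovers f(y) ≥ f(x) + ⟨g, y − x⟩, so the hyperplane through (x, f(x)) with normal (g, −1) leaves the epigraph entirely on one side.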
Relationship to the gradient and duality
- In smooth regions, subgradients coincide with gradients. Thus subgradient methods extend gradient methods to nonsmooth problems.
- Subgradients connect to duality theory: subgradients can be interpreted as elements of the dual space that witness lower bounds for f.
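One standard way to make this precise, assuming f is proper, closed, and convex with convex conjugate f*, is through equality in the Fenchel–Young inequality:

```latex
g \in \partial f(x)
\quad\Longleftrightarrow\quad
f(x) + f^{*}(g) = \langle g, x \rangle,
\qquad
f^{*}(g) := \sup_{y} \big( \langle g, y \rangle - f(y) \big).
```

Equivalently, g is a subgradient at x exactly when the supremum defining f*(g) is attained at y = x.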
Examples
- f(x) = |x|: ∂f(0) = [−1, 1], while ∂f(x) = {sign(x)} for x ≠ 0.
- f(x) = x^2: ∂f(x) = {2x} for all x (since f is differentiable everywhere).
- f(x) = max{0, x} (the ReLU function): ∂f(0) = [0, 1], and ∂f(x) = {1} for x > 0, ∂f(x) = {0} for x < 0.
- These examples illustrate how subgradients capture a range of possible linearizations at nondifferentiable points.
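For these one-dimensional examples the subdifferential is always a closed interval, so it can be represented by its endpoints. The sketch below (hypothetical helper functions, not a library API) simply encodes the case analysis from the list above, showing how the set collapses to a singleton at differentiable points.

```python
def subdiff_abs(x):
    """Subdifferential of f(x) = |x| as an interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)          # kink at 0: every slope in [-1, 1] works

def subdiff_relu(x):
    """Subdifferential of f(x) = max(0, x)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (0.0, 0.0)
    return (0.0, 1.0)           # kink at 0

def subdiff_square(x):
    """Subdifferential of f(x) = x**2: a singleton, since f is differentiable."""
    return (2.0 * x, 2.0 * x)

print(subdiff_abs(0.0), subdiff_relu(0.0), subdiff_square(3.0))
# (-1.0, 1.0) (0.0, 1.0) (6.0, 6.0)
```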
Extensions and generalizations
- Clarke subdifferential: a construction for locally Lipschitz, possibly nonconvex functions that generalizes the notion of a gradient to nonsmooth contexts.
- Other generalized subdifferentials (e.g., Mordukhovich, Fréchet) provide alternative ways to describe sensitivities of nonsmooth functions in variational analysis.
- Proximal subgradients and Moreau envelopes connect subgradients to regularization techniques that produce smoother optimization problems.
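As a small illustration of that connection, the Moreau envelope of f(x) = |x| with parameter λ is the Huber-like smoothing x²/(2λ) for |x| ≤ λ and |x| − λ/2 otherwise, and its proximal operator is soft thresholding. The sketch below evaluates both directly; these closed forms are standard for the absolute value, while the function and parameter names are illustrative.

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal operator of lam * |.|: soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_abs(x, lam):
    """Moreau envelope of |.|: min_y |y| + (y - x)**2 / (2*lam), i.e. the Huber function."""
    y = prox_abs(x, lam)
    return np.abs(y) + (y - x) ** 2 / (2.0 * lam)

xs = np.linspace(-2, 2, 5)
print(prox_abs(xs, 0.5))     # [-1.5 -0.5  0.   0.5  1.5]
print(moreau_abs(xs, 0.5))   # quadratic near 0, equal to |x| - 0.25 farther out
```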
Subgradient methods
Algorithmic framework
- Subgradient methods iteratively update an estimate x_k using a subgradient g_k ∈ ∂f(x_k): x_{k+1} = x_k − α_k g_k, with step sizes α_k chosen to ensure convergence; a minimal sketch appears after this list.
- These methods are robust to nondifferentiability and can be applied to a broad class of convex optimization problems, including those with nonsmooth objective functions.
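A minimal sketch of the update, shown here for the nonsmooth convex objective ‖Ax − b‖₁ with a diminishing step size (the problem data, step-size constant, and iteration count are illustrative choices, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ rng.standard_normal(5)           # consistent system, so the optimal value is 0

def f(x):
    return np.abs(A @ x - b).sum()       # nonsmooth objective ||Ax - b||_1

def subgradient(x):
    """One subgradient of ||Ax - b||_1: A^T s, where s is a sign vector of the residual."""
    return A.T @ np.sign(A @ x - b)

x = np.zeros(5)
f_best = f(x)
for k in range(1, 5001):
    g = subgradient(x)
    x = x - (0.1 / np.sqrt(k)) * g       # diminishing step size alpha_k = 0.1 / sqrt(k)
    f_best = min(f_best, f(x))           # f(x_k) need not decrease monotonically, so track the best
print(f_best)                            # decreases toward the optimal value 0 as k grows
```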
Convergence and trade-offs
- Convergence guarantees typically require convexity of f and appropriate diminishing step sizes. In general, subgradient methods may converge more slowly than gradient methods on smooth problems, but they remain valuable when nonsmoothness or large-scale structure makes smooth methods impractical.
- Practical considerations include the choice of step-size rules, averaging strategies, and problem conditioning. Variants and accelerated schemes exist in the broader literature on nonsmooth optimization.
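For reference, a classical bound on the best value found after k steps (assuming f is convex with minimizer x*, ‖g_i‖ ≤ G for every subgradient used, and ‖x_1 − x*‖ ≤ R) is

```latex
\min_{1 \le i \le k} f(x_i) - f(x^{*})
\;\le\;
\frac{R^{2} + G^{2} \sum_{i=1}^{k} \alpha_i^{2}}{2 \sum_{i=1}^{k} \alpha_i}.
```

The right-hand side tends to zero for diminishing, non-summable step sizes (e.g., α_i = c/√i), and optimizing a constant step size for a fixed budget of k iterations gives the familiar O(RG/√k) rate.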
Applications
- Large-scale machine learning problems with regularized objectives, such as L1-regularized regression, benefit from subgradient approaches when smoothness cannot be assumed; a sketch of the relevant subgradient appears after this list.
- Economic equilibrium computations and other areas of applied optimization employ subgradient methods to handle nonsmooth payoffs or constraints.
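For example, for the L1-regularized least-squares (lasso) objective f(w) = ½‖Xw − y‖² + λ‖w‖₁, one valid subgradient is Xᵀ(Xw − y) + λ·s, with s_i = sign(w_i) for w_i ≠ 0 and s_i any value in [−1, 1] (here 0) otherwise. A minimal sketch with illustrative data, not a library API:

```python
import numpy as np

rng = np.random.default_rng(1)
X, y, lam = rng.standard_normal((50, 10)), rng.standard_normal(50), 0.1
w = rng.standard_normal(10)

# A subgradient of f(w) = 0.5*||Xw - y||^2 + lam*||w||_1.
# The smooth part contributes its gradient; for each |w_i| term, sign(w_i) is used
# when w_i != 0, and 0 (a valid choice from [-1, 1]) when w_i == 0.
g = X.T @ (X @ w - y) + lam * np.sign(w)
print(g.shape)   # (10,)
```

This subgradient can be plugged directly into the iteration x_{k+1} = x_k − α_k g_k sketched earlier.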
Extensions and relationships
Nonconvex and generalized notions
- For nonconvex functions, Clarke subdifferentials and related constructs provide useful generalized gradients, allowing sensitivity analysis and algorithmic design in broader settings.
- These generalized notions help in establishing first-order optimality conditions and in guiding descent methods in nonsmooth, nonconvex landscapes.
Connections to related topics
- Convex analysis provides the core language and results for subgradients, including subdifferentials, epigraphs, and supporting hyperplanes.
- Optimization theory uses subgradients to derive optimality conditions and to design algorithms for convex and nonsmooth problems.
- Related concepts include gradient methods, proximal operators, and duality, all of which interact with subgradients in various ways.