Subgradient

Subgradient is a fundamental concept in convex analysis and nonsmooth optimization that extends the familiar idea of a gradient to functions that are not differentiable everywhere. It provides a way to capture all possible linear under-approximations of a function at a given point, which in turn underpins many algorithms for finding minima in settings where the usual derivative does not exist.

In the simplest terms, if a function f is defined on a real vector space and is convex, a subgradient at a point x is a vector g that satisfies a global linear lower bound: f(y) ≥ f(x) + ⟨g, y − x⟩ for all y in the domain. The collection of all such vectors g at x is called the subdifferential and is denoted ∂f(x). When f is differentiable at x, the subdifferential reduces to the single gradient ∇f(x). For nondifferentiable points, ∂f(x) typically contains more than one element, encoding the range of possible linearizations.
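
As a minimal illustration of this definition, the following sketch (the helper is_subgradient and the grid of test points are illustrative, not taken from any particular library) checks the inequality f(y) ≥ f(x) + ⟨g, y − x⟩ numerically for f(x) = |x|, whose subdifferential at 0 is the interval [−1, 1].

```python
# Numerical check of the subgradient inequality f(y) >= f(x) + g*(y - x)
# for the one-dimensional convex function f(x) = |x|.
import numpy as np

def is_subgradient(f, g, x, test_points, tol=1e-12):
    """Return True if g satisfies f(y) >= f(x) + g*(y - x) at every test point."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in test_points)

ys = np.linspace(-5.0, 5.0, 1001)

print(is_subgradient(abs, 0.7, 0.0, ys))   # True:  0.7 lies in [-1, 1], the subdifferential at 0
print(is_subgradient(abs, 1.3, 0.0, ys))   # False: 1.3 lies outside [-1, 1]
print(is_subgradient(abs, 1.0, 2.0, ys))   # True:  at x = 2 the only subgradient is 1
```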

Theory

  • Definition and interpretation

    • For a convex function f, a vector g is a subgradient at x if f(y) ≥ f(x) + ⟨g, y − x⟩ for every y in dom(f). The set of all subgradients at x is the subdifferential ∂f(x). When f is differentiable at x, ∂f(x) = {∇f(x)}.
    • Subgradients generalize the gradient in a way that preserves key tools of smooth optimization: the condition 0 ∈ ∂f(x) characterizes the minimizers of a convex f, and a sufficiently small step against a subgradient moves the iterate closer to the set of minimizers, even though it need not decrease the function value.
    • The concept can be extended to other spaces and contexts, including nonsmooth and nonconvex settings via generalized subdifferentials such as the Clarke subdifferential.
  • Basic properties

    • For convex f, ∂f(x) is a closed, convex set. If f is finite and differentiable at x, ∂f(x) contains exactly one element, ∇f(x).
    • At points on the boundary of the domain of f (or where f is not finite), the subdifferential can be empty; many results therefore assume that x lies in the (relative) interior of dom(f), where a proper convex function always has a nonempty subdifferential, or impose additional regularity such as lower semicontinuity.
    • Subgradients give rise to supporting hyperplanes of the epigraph of f, which is a geometric way to view the linear under-approximation.
  • Relationship to the gradient and duality

    • In smooth regions, subgradients coincide with gradients. Thus subgradient methods extend gradient methods to nonsmooth terrain.
    • Subgradients connect to duality theory: g ∈ ∂f(x) exactly when the Fenchel–Young inequality f(x) + f*(g) ≥ ⟨g, x⟩ holds with equality, where f* is the convex conjugate, so subgradients can be viewed as elements of the dual space that certify linear lower bounds on f.
  • Examples

    • f(x) = |x|: ∂f(0) = [−1, 1], while ∂f(x) = {sign(x)} for x ≠ 0.
    • f(x) = x^2: ∂f(x) = {2x} for all x (since f is differentiable everywhere).
    • f(x) = max{0, x} (the ReLU function): ∂f(0) = [0, 1], and ∂f(x) = {1} for x > 0, ∂f(x) = {0} for x < 0.
    • These examples illustrate how subgradients capture a range of possible linearizations at nondifferentiable points; a short numerical illustration of these intervals follows this list.
  • Extensions and generalizations

    • Clarke subdifferential: a construction for locally Lipschitz, possibly nonconvex functions that generalizes the notion of a gradient to nonsmooth contexts.
    • Other generalized subdifferentials (e.g., Mordukhovich, Fréchet) provide alternative ways to describe sensitivities of nonsmooth functions in variational analysis.
    • Proximal subgradients and Moreau envelopes connect subgradients to regularization techniques that produce smoother optimization problems.
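
The interval structure in the one-dimensional examples above can be computed directly: for a convex function on the real line, the subdifferential at x is the closed interval between the left and right derivatives. The sketch below (approx_subdifferential is a hypothetical helper, and the finite-difference step h is an illustrative choice) approximates that interval with one-sided difference quotients for the functions listed above.

```python
# For a convex function on the real line, the subdifferential at x is the
# interval [f'_-(x), f'_+(x)] of one-sided derivatives; approximate it with
# one-sided difference quotients.
def approx_subdifferential(f, x, h=1e-7):
    left = (f(x) - f(x - h)) / h    # approximates the left derivative f'_-(x)
    right = (f(x + h) - f(x)) / h   # approximates the right derivative f'_+(x)
    return (left, right)

print(approx_subdifferential(abs, 0.0))                     # ~(-1.0, 1.0): the interval [-1, 1]
print(approx_subdifferential(lambda x: x * x, 3.0))         # ~(6.0, 6.0):  the single gradient 2x = 6
print(approx_subdifferential(lambda x: max(0.0, x), 0.0))   # ~(0.0, 1.0):  the interval [0, 1]
```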

Subgradient methods

  • Algorithmic framework

    • Subgradient methods iteratively update an estimate x_k using a subgradient g_k ∈ ∂f(x_k): x_{k+1} = x_k − α_k g_k, with step sizes α_k chosen to ensure convergence; a minimal sketch of this iteration appears after this list.
    • These methods are robust to nondifferentiability and can be applied to a broad class of convex optimization problems, including those with nonsmooth objective functions.
  • Convergence and trade-offs

    • Convergence guarantees typically require convexity of f, bounded subgradients, and appropriately diminishing step sizes; a standard result bounds the gap between the best objective value after k iterations and the optimum by a quantity on the order of 1/√k. Subgradient methods therefore tend to converge more slowly than gradient methods on smooth problems, but they remain valuable when nonsmoothness or large-scale structure makes smooth methods impractical.
    • Practical considerations include the choice of step-size rules, averaging strategies, and problem conditioning. Variants and accelerated schemes exist in the broader literature on nonsmooth optimization.
  • Applications

    • Large-scale machine learning problems with regularized objectives, such as L1-regularized regression, benefit from subgradient approaches when smoothness cannot be assumed.
    • Economic equilibrium computations and other areas of applied optimization employ subgradient methods to handle nonsmooth payoffs or constraints.
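
As a concrete instance of the iteration described above, the following sketch applies x_{k+1} = x_k − α_k g_k with a diminishing step size α_k = α₀/√(k + 1) to an L1-regularized least-squares objective. The random problem data, the regularization weight lam, the initial step alpha0, and the iteration count are all illustrative choices rather than prescribed values; because individual steps need not decrease the objective, the best value seen so far is tracked.

```python
# Subgradient method sketch for f(x) = 0.5*||A x - b||^2 + lam*||x||_1,
# using the subgradient A^T (A x - b) + lam * sign(x) (taking 0 at zero
# coordinates) and the diminishing step size alpha_k = alpha0 / sqrt(k + 1).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))   # illustrative random problem data
b = rng.standard_normal(50)
lam = 0.1

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def a_subgradient(x):
    # One valid element of the subdifferential of the objective at x.
    return A.T @ (A @ x - b) + lam * np.sign(x)

x = np.zeros(20)
x_best, f_best = x.copy(), objective(x)
alpha0 = 0.01
for k in range(5000):
    x = x - alpha0 / np.sqrt(k + 1) * a_subgradient(x)
    if objective(x) < f_best:                     # not a descent method,
        x_best, f_best = x.copy(), objective(x)   # so keep the best iterate seen

print("best objective value found:", f_best)
```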

Extensions and relationships

  • Nonconvex and generalized notions

    • For nonconvex functions, Clarke subdifferentials and related constructs provide useful generalized gradients, allowing sensitivity analysis and algorithmic design in broader settings.
    • These generalized notions help in establishing first-order optimality conditions and in guiding descent methods in nonsmooth, nonconvex landscapes.
  • Connections to related topics

    • Convex analysis provides the core language and results for subgradients, including subdifferentials, epigraphs, and supporting hyperplanes.
    • Optimization theory uses subgradients to derive optimality conditions and to design algorithms for convex and nonsmooth problems.
    • Related concepts include gradient methods, proximal operators, and duality, all of which interact with subgradients in various ways; one concrete connection is sketched after this list.
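
One concrete point of contact among these topics is that the proximal operator of the absolute value is the soft-thresholding map, and the point it returns is characterized by a subgradient optimality condition. The sketch below (soft_threshold is a hypothetical helper and the test values are illustrative) verifies that (v − p)/λ lies in the subdifferential of |·| at p.

```python
# The proximal operator of lam*|x|, p = argmin_x lam*|x| + 0.5*(x - v)^2, is
# soft-thresholding; its optimality condition 0 ∈ lam*∂|p| + (p - v) says
# (v - p)/lam ∈ ∂|p|, where ∂|p| = {sign(p)} for p != 0 and [-1, 1] for p == 0.
def soft_threshold(v, lam):
    """Proximal operator of lam * |x|."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

for v in (2.0, 0.3, -1.5):
    lam = 1.0
    p = soft_threshold(v, lam)
    r = (v - p) / lam
    ok = (-1.0 <= r <= 1.0) if p == 0.0 else (r == (1.0 if p > 0 else -1.0))
    print(f"v={v:5.2f}  prox={p:5.2f}  (v - p)/lam={r:5.2f}  in subdifferential: {ok}")
```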

See also