Kriging

Kriging is a family of statistical interpolation methods designed for predicting the value of a spatially distributed variable at unsampled locations using observations from sampled sites. It originated in mining applications in South Africa and was developed into a formal framework by the French researcher Georges Matheron, building on the practical work of Danie G. Krige. At its core, kriging treats the unknown field as a realization of a stochastic process with spatial structure that can be captured by a variogram or covariance function. By combining data with a model of spatial dependence, kriging provides both a predicted value and a quantitative measure of the associated uncertainty.

Kriging sits within the broader field of geostatistics, a discipline that blends statistics with spatial reasoning to analyze and map phenomena that vary across space. It is widely used in mining and mineral exploration, environmental science, hydrology, agriculture, meteorology, and other domains where measuring every location is impractical or impossible. See Geostatistics and Spatial statistics for related concepts and methods. Kriging is also closely related to Gaussian process modeling in the broader theory of stochastic processes.

Overview and core ideas

Kriging treats the value Z(x) at a location x as a random variable with spatial dependence on values at nearby locations. The central goal is to form a predictor Z*(x0) at an unsampled site x0 that is a weighted sum of observed values:

  • Z*(x0) = sum_i λ_i Z(x_i)

where the λ_i are weights determined to minimize the prediction error variance under an unbiasedness constraint. The weights depend on the spatial arrangement of the data and on a model of how the variable correlates with distance, typically expressed through a variogram or covariance function.

Two features distinguish kriging from simpler interpolation methods:

  • The predictor uses an explicit model of spatial correlation (variogram or covariance), not just distance.
  • The method provides an accompanying estimate of its own uncertainty, the kriging variance, for each prediction location.

Simpler interpolation techniques such as inverse distance weighting also form weighted averages of nearby observations, but their weights depend only on distance to the prediction location, so they are often used as benchmarks rather than as models of spatial structure. In many situations kriging offers better descriptions of spatial structure and uncertainty, provided the underlying assumptions are reasonably satisfied. See Inverse distance weighting for a comparison of approaches.

The variogram and covariance modeling

A key ingredient in kriging is a function that captures how similarity between values decays with separation distance. The variogram γ(h) or the covariance function C(h) encodes this spatial dependence, where h is the separation vector between two locations. In practice, the experimental variogram is estimated from data and then fitted with a theoretical model such as spherical, exponential, Gaussian, or Matérn forms. The choice of model influences the kriging weights and the resulting predictions.
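
As a rough illustration of the theoretical models named above, the following sketch writes three of them as Python functions. The parameter names (nugget, psill for the partial sill, rng for the range) and the factor of 3 that gives the exponential and Gaussian forms a "practical range" are assumptions of this sketch; other texts and software use different parameterizations.

```python
import numpy as np

def spherical(h, nugget, psill, rng):
    """Spherical model: reaches the sill (nugget + psill) exactly at h = rng."""
    h = np.asarray(h, dtype=float)
    r = np.clip(h / rng, 0.0, 1.0)          # beyond the range the model stays flat
    g = nugget + psill * (1.5 * r - 0.5 * r**3)
    return np.where(h > 0, g, 0.0)          # gamma(0) = 0 by definition

def exponential(h, nugget, psill, rng):
    """Exponential model: approaches the sill asymptotically,
    reaching about 95% of psill at h = rng (practical-range convention)."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-3.0 * h / rng))
    return np.where(h > 0, g, 0.0)

def gaussian(h, nugget, psill, rng):
    """Gaussian model: parabolic near the origin, very smooth fields."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-3.0 * (h / rng) ** 2))
    return np.where(h > 0, g, 0.0)
```

Each function takes non-negative separation distances and returns semivariogram values, so any of them can be compared against an experimental variogram at the same lags. The Matérn form is omitted here because it needs a Bessel function that NumPy alone does not provide.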

Several related concepts frequently appear in kriging discussions:

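  • Semivariogram versus variogram: the semivariogram γ(h) is half the expected squared difference between values separated by h, and the variogram proper is 2γ(h), although the two terms are often used interchangeably. The empirical semivariogram is the practical object estimated from data. See Semivariogram and Variogram for details.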
  • Stationarity assumptions: ordinary and universal variants rely on different levels of stationarity (e.g., a constant mean over a local neighborhood in ordinary kriging; a trend component in universal kriging). See Stationarity (statistics) for foundational ideas.
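  • Covariance versus variogram: under second-order stationarity the two are linked by γ(h) = C(0) − C(h), so practitioners may work with either depending on convenience and domain knowledge. The variogram remains defined under the weaker intrinsic hypothesis, where a covariance function may not exist. See Covariance function for related material.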
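
The "half the average squared difference" definition can be turned into a small estimator. The sketch below is an assumption-laden illustration, not a reference implementation: it is omnidirectional (it ignores direction and so cannot detect anisotropy), it bins pairs by distance using caller-supplied bin_edges, and the function name experimental_variogram is hypothetical.

```python
import numpy as np

def experimental_variogram(coords, values, bin_edges):
    """Omnidirectional experimental semivariogram.

    Returns the mean pair distance, the semivariance, and the number of
    pairs for every distance bin that contains at least one pair.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)        # each pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (dist > lo) & (dist <= hi)           # half-open bins (lo, hi]
        if mask.any():
            lags.append(dist[mask].mean())
            gammas.append(0.5 * sqdiff[mask].mean())  # half the mean squared difference
            counts.append(int(mask.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)
```

For four points on a line at positions 0, 1, 2, 3 with values equal to their positions, the pairs at distance 1, 2, and 3 give semivariances 0.5, 2.0, and 4.5. Bins with few pairs are unreliable, which is why the pair counts are returned alongside the estimates.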

Variants of kriging

Kriging is a family rather than a single method. Each variant reflects different assumptions about the mean structure, the availability of auxiliary information, or the distribution of the variable:

  • Ordinary kriging: assumes an unknown, locally constant mean; widely used as a default because it requires only modest assumptions about the mean.
  • Simple kriging: assumes a known constant mean; typically used when background information justifies a fixed mean value.
  • Universal (trend) kriging: models a deterministic trend (a function of location, such as a polynomial) plus a residual that is kriged; useful when large-scale trends are present.
  • Regression kriging: first regresses the data on covariates and then kriges the regression residuals, adding the two parts to form the prediction; this leverages auxiliary information such as soil properties or remotely sensed predictors.
  • Indicator kriging: applies kriging to binary indicators of the data (for example, whether a value exceeds a threshold or belongs to a category) to estimate probabilities of exceedance or class membership; helpful when the variable is not Gaussian or when explicit probability statements are needed.
  • Co-kriging (cokriging): uses multiple related variables to improve predictions by exploiting cross-correlation among the variables; an extension valuable in environmental and mining settings where several related measurements exist.
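  • Kriging with external drift (KED): closely related to regression kriging, but formulated as a form of universal kriging in which the drift is a function of covariates and the drift coefficients are estimated within the kriging system itself rather than in a separate regression step.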
  • Bayesian kriging: frames the problem in a Bayesian context, incorporating prior information and producing full posterior distributions for predictions and uncertainties.

Each variant has its own computational implications and is chosen based on data characteristics, available ancillary information, and the goals of the analysis. See Kriging variants, Regression kriging, and Co-kriging for related discussions.
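
To make the two-step idea behind regression kriging concrete, here is a minimal sketch: an ordinary least-squares fit on the covariates, followed by ordinary kriging of the residuals. Everything named here is an assumption of the sketch, including the function names, the exponential covariance exp_cov, and its default sill and range. In practice the residual covariance would be fitted to the residuals rather than fixed in advance.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=10.0):
    """Assumed exponential covariance for the residuals (illustrative only)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / rng)

def ordinary_kriging(coords, values, x0, cov):
    """Ordinary kriging prediction at x0 from the covariance-form system."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))       # last row/column: unbiasedness constraint
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(np.linalg.norm(coords - x0, axis=1)), 1.0)
    lam = np.linalg.solve(A, b)[:n]   # drop the Lagrange multiplier
    return lam @ values

def regression_kriging(coords, covariates, values, x0, covariates0, cov=exp_cov):
    """Trend from OLS on the covariates plus kriged residuals at x0."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    # Design matrix: intercept column followed by the covariate column(s).
    X = np.column_stack([np.ones(len(values)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    residuals = values - X @ beta
    # Trend at the prediction location, using the covariate values observed there.
    f0 = np.concatenate([[1.0], np.atleast_1d(np.asarray(covariates0, dtype=float))])
    return f0 @ beta + ordinary_kriging(coords, residuals, x0, cov)
```

When the data are an exact linear function of the covariate, the residuals vanish and the prediction reduces to the regression trend. The kriging step matters only when the residuals carry spatial structure.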

Mathematical formulation (high level)

In its linear form, the kriging predictor at a location x0 is a weighted sum of observed values:

  • Z*(x0) = sum_i λ_i Z(x_i)

The weights λ_i are found by solving a system that enforces unbiasedness and minimizes the prediction variance. A compact way to describe the system is:

  • Construct a matrix Γ of covariances between observed locations, add a row and column for the unbiasedness constraint, and solve for the weights λ and a Lagrange multiplier μ that enforces the constraint sum_i λ_i = 1. The right-hand side involves covariances between each observed location and the prediction location x0.

  • The resulting kriging variance, σ_k^2(x0), provides an estimate of the uncertainty associated with Z*(x0).
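
For readers who prefer symbols, one common way to write the ordinary kriging system described above is shown below. The notation is an assumption of this sketch: Γ has entries C(x_i − x_j), the vector c_0 has entries C(x_i − x_0), and 1 is a vector of ones. The sign attached to μ varies between texts, so the variance formula should be read together with the convention used for the system.

```latex
\begin{aligned}
\begin{pmatrix} \Gamma & \mathbf{1} \\ \mathbf{1}^{\top} & 0 \end{pmatrix}
\begin{pmatrix} \boldsymbol{\lambda} \\ \mu \end{pmatrix}
&=
\begin{pmatrix} \mathbf{c}_0 \\ 1 \end{pmatrix},
\qquad
Z^{*}(x_0) = \sum_i \lambda_i Z(x_i), \\
\sigma_k^2(x_0) &= C(0) - \boldsymbol{\lambda}^{\top}\mathbf{c}_0 - \mu .
\end{aligned}
```

The first block row is the stationarity condition of the constrained minimization, and the last row is the constraint sum_i λ_i = 1.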

In practice, practitioners often express these relationships in terms of the variogram γ(h) or its covariance equivalent, fit a model, and then compute the weights numerically. See Kriging system and Kriging variance for deeper mathematical coverage.
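
The numerical step can be sketched in a few lines of NumPy. This is a minimal illustration in covariance form under stated assumptions: the covariance model exp_cov and its parameters are hypothetical stand-ins for a fitted model, there is no nugget, and all observations are used at once.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=10.0):
    """Assumed exponential covariance C(h) = sill * exp(-3h / rng)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / rng)

def ordinary_kriging(coords, values, x0, cov=exp_cov):
    """Return (prediction, kriging variance, weights) at location x0."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n = len(values)
    # Pairwise distances among data points, and from each data point to x0.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - x0, axis=1)
    # Augmented system: covariance block plus a row and column of ones
    # for the unbiasedness constraint sum(lambda) = 1.
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.append(cov(d0), 1.0)
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    pred = lam @ values
    var = cov(0.0) - lam @ b[:n] - mu   # kriging variance for this sign convention
    return pred, var, lam
```

Two sanity checks follow from the theory: the weights always sum to one, and with no nugget the predictor reproduces an observed value exactly (with zero kriging variance) when x0 coincides with a data location. For four observations at the corners of a unit square and x0 at the center, symmetry forces all four weights to 0.25.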

Estimation, cross-validation, and practical considerations

  • Variogram estimation: The experimental variogram is constructed from pairs of data separated by distances and directions. Model fitting involves selecting a theoretical variogram form and estimating its parameters (range, sill, nugget) to reflect the spatial structure.

  • Non-stationarity and anisotropy: Real-world fields often exhibit non-stationarity (changing mean or variance over space) or directional dependence (anisotropy). In such cases, standard ordinary kriging can be inadequate, and methods such as universal kriging, local kriging with moving neighborhoods, or non-stationary kriging variants are employed. See Anisotropy and Non-stationarity in spatial statistics discussions.

  • Computational complexity: Kriging requires solving a linear system whose size grows with the number of observed points used in the neighborhood. For large datasets, strategies such as local kriging with neighborhoods, sparse covariance representations, or approximate kriging are common. See Spatial statistics discussions on computational methods.

  • Validation: Cross-validation, mean squared prediction error, and kriging efficiency are used to assess predictive performance and variogram fit. See Cross-validation (statistics) for related concepts.
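
The neighborhood and validation ideas in the list above can be combined in one sketch: leave-one-out cross-validation in which each held-out point is predicted by ordinary kriging from its k nearest remaining neighbors. The function names, the exponential covariance exp_cov, and the default k = 8 are assumptions made for illustration. A brute-force sort is used to find neighbors, where a spatial index would be the usual choice for large datasets.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=10.0):
    """Assumed exponential covariance (stand-in for a fitted model)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / rng)

def local_ok(coords, values, x0, k, cov):
    """Ordinary kriging at x0 using only the k nearest observations."""
    d0_all = np.linalg.norm(coords - x0, axis=1)
    k = min(k, len(values))
    idx = np.argsort(d0_all)[:k]            # indices of the k nearest points
    pts = coords[idx]
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    A = np.ones((k + 1, k + 1))             # (k+1) x (k+1) system, not (n+1) x (n+1)
    A[:k, :k] = cov(d)
    A[k, k] = 0.0
    b = np.append(cov(d0_all[idx]), 1.0)
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:k], sol[k]
    return lam @ values[idx], cov(0.0) - lam @ b[:k] - mu

def loo_cross_validation(coords, values, k=8, cov=exp_cov):
    """Leave-one-out errors, mean squared prediction error (MSPE), and
    mean squared standardized error (MSSE)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    errors = np.empty(n)
    variances = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i            # drop observation i
        pred, var = local_ok(coords[keep], values[keep], coords[i], k, cov)
        errors[i] = pred - values[i]
        variances[i] = var
    mspe = float(np.mean(errors ** 2))
    msse = float(np.mean(errors ** 2 / variances))
    return mspe, msse, errors
```

The MSPE measures predictive accuracy, while an MSSE near one suggests that the kriging variances are of the right magnitude for the chosen covariance model; values far from one point to a poorly calibrated sill or range.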

Applications and impact

Kriging has become a standard tool in fields that rely on spatial prediction and uncertainty quantification:

  • Mining and ore reserve estimation: original motivation and primary historical domain; used to estimate ore grades where sampling is costly or destructive. See Mining and Geostatistics case histories.
  • Environmental monitoring and hydrology: mapping contaminant plumes, groundwater levels, and soil properties with quantified uncertainty.
  • Agriculture and ecology: predicting soil nutrients, moisture, and other soil properties critical for crop management and ecological models.
  • Meteorology and climate science: interpolation of meteorological fields, precipitation, and other spatially distributed climate variables where uncertainty matters for forecasting and decision-making.

See related topics such as Geostatistics and Spatial statistics for broader methodological contexts and alternative approaches to spatial prediction.

Limitations and debates

  • Model dependence: The quality of kriging predictions relies heavily on the chosen variogram model and stationarity assumptions. If the spatial structure is misrepresented, predictions and uncertainty estimates can be biased.
  • Extrapolation versus interpolation: Kriging is designed for interpolation within the convex hull of observed data; extrapolation beyond observed support can be risky and requires careful modeling or alternative approaches.
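  • Non-Gaussian fields: Kriging is the best linear unbiased predictor under second-order assumptions alone, but it is the best predictor overall, with the kriging variance fully characterizing the prediction uncertainty, only when the field is Gaussian. Many real-world fields are non-Gaussian. Techniques such as indicator kriging or transformations of the data help, but practitioners should be mindful of distributional assumptions.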
  • Emergence of alternatives: In some applications, machine learning and Bayesian hierarchical models offer flexible alternatives to classical kriging, especially when large datasets and complex non-stationarities are involved. Each approach has trade-offs in interpretability, uncertainty quantification, and data requirements.

See also