Normal distribution
The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is ubiquitous in statistics and the natural sciences. It is fully specified by two parameters: a location parameter μ (mu) and a scale parameter σ (sigma) > 0. Its defining feature is the bell-shaped density curve, which is unimodal and symmetric about μ. The distribution arises in many contexts because of the central limit theorem, which roughly states that the sum of a large number of independent, identically distributed random effects tends to be normally distributed, even when the individual effects are not.
Definition and basic properties
- A random variable X follows a normal distribution with parameters μ and σ^2, denoted X ~ N(μ, σ^2), if X has the probability density function below. The distribution is a two-parameter family, and all finite moments exist.
- Key properties:
- Symmetry: The density is symmetric about μ.
- Location-scale: If X ~ N(μ, σ^2), then aX + b ~ N(aμ + b, a^2 σ^2) for any real a and b with a ≠ 0.
- Closure under addition: If X1 ~ N(μ1, σ1^2) and X2 ~ N(μ2, σ2^2) are independent, then X1 + X2 ~ N(μ1 + μ2, σ1^2 + σ2^2).
- Moments: E[X] = μ and Var(X) = σ^2. All odd central moments are zero, and the excess kurtosis is zero.
- The standard normal distribution is the special case with μ = 0 and σ = 1, denoted Z ~ N(0, 1). Transformations to standard form are used to simplify analysis.
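The closure properties above can be checked empirically. The following sketch (hypothetical parameter values, standard library only) simulates the sum of two independent normals X1 ~ N(1, 2²) and X2 ~ N(2, 3²), whose sum should be N(3, 13):

```python
import random
import statistics

# Sketch: empirically check closure under addition for independent normals.
# X1 ~ N(1, 2^2) and X2 ~ N(2, 3^2); their sum should be ~ N(3, 13).
random.seed(0)
n = 200_000
sums = [random.gauss(1, 2) + random.gauss(2, 3) for _ in range(n)]

mean_hat = statistics.fmean(sums)        # should be close to 1 + 2 = 3
var_hat = statistics.pvariance(sums)     # should be close to 4 + 9 = 13
print(f"sample mean ≈ {mean_hat:.3f} (theory: 3)")
print(f"sample variance ≈ {var_hat:.3f} (theory: 13)")
```

Note that the variances add, not the standard deviations: the standard deviation of the sum is √13 ≈ 3.61, not 2 + 3 = 5.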
Probability density function and cumulative distribution function
- Probability density function (PDF): f(x; μ, σ) = 1/(σ√(2π)) exp[−(x − μ)^2 / (2σ^2)]
- The corresponding cumulative distribution function (CDF) is: F(x; μ, σ) = Φ((x − μ)/σ) where Φ denotes the standard normal CDF.
- The standard normal distribution is often written as Z ~ N(0, 1), with PDF φ(z) = 1/√(2π) exp(−z^2/2) and CDF Φ(z). The relationship between X ~ N(μ, σ^2) and Z ~ N(0, 1) is Z = (X − μ)/σ.
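The PDF has a direct closed form, and the CDF can be evaluated through the error function via the identity Φ(z) = (1 + erf(z/√2))/2. A minimal standard-library sketch:

```python
import math

# Minimal sketch of the normal PDF and CDF using only the standard library.
# The CDF uses the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2.
def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(normal_pdf(0.0))   # peak of the standard normal, 1/sqrt(2*pi) ≈ 0.3989
print(normal_cdf(0.0))   # by symmetry, exactly 0.5
print(normal_cdf(1.96))  # ≈ 0.975, the familiar two-sided 95% critical value
```

Statistical libraries expose the same quantities directly (e.g. `scipy.stats.norm.pdf` and `scipy.stats.norm.cdf`), but the identities above are all that is needed.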
Standard normal distribution and z-scores
- Z-scores measure how many standard deviations an observation is from the mean. For X ~ N(μ, σ^2), the standardized value z = (x − μ)/σ follows the standard normal distribution.
- Z-scores facilitate numeric calculation of probabilities and critical values via standard normal tables or computational routines. Many statistical procedures rely on standard normal properties to derive confidence intervals and hypothesis tests.
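The z-score computation, together with the CDF, yields the familiar 68-95-99.7 rule. A short sketch (the score scale μ = 100, σ = 15 is a hypothetical example):

```python
import math

# Sketch: z-scores and the 68-95-99.7 rule, using
# Phi(z) = (1 + erf(z / sqrt(2))) / 2 for the standard normal CDF.
def phi(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 15.0   # hypothetical measurement scale
x = 130.0
z = (x - mu) / sigma      # z-score: 2 standard deviations above the mean
print(f"z = {z}")

for k in (1, 2, 3):
    p = phi(k) - phi(-k)  # probability of falling within k sigma of the mean
    print(f"P(|Z| <= {k}) ≈ {p:.4f}")
```

The loop prints approximately 0.6827, 0.9545, and 0.9973, i.e. the proportions of a normal population within one, two, and three standard deviations of the mean.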
Moments, skewness, and kurtosis
- Normal distributions are symmetric and have zero skewness.
- The kurtosis is 3 (excess kurtosis 0). Zero skewness together with zero excess kurtosis is characteristic of the normal shape, though not strictly unique to it, and these properties underpin many modeling assumptions.
- Higher central moments are determined by σ alone; the distribution is fully described by the two parameters μ and σ.
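Concretely, the central moments of X ~ N(μ, σ²) admit a closed form in terms of the double factorial:

```latex
\mathbb{E}\!\left[(X-\mu)^{2k+1}\right] = 0, \qquad
\mathbb{E}\!\left[(X-\mu)^{2k}\right] = \sigma^{2k}\,(2k-1)!! = \sigma^{2k}\,\frac{(2k)!}{2^{k}\,k!}
```

For k = 1 this recovers Var(X) = σ², and for k = 2 the fourth central moment 3σ⁴, consistent with kurtosis 3.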
Transformations, convolution, and related distributions
- Linear transformations preserve normality: aX + b ~ N(aμ + b, a^2 σ^2).
- The sum of independent normal variables is normal, which underpins the central limit phenomenon.
- When the variance is unknown and sample sizes are small, related distributions (such as the t-distribution) arise in inference, but underlying modeling often assumes normality for residuals or errors.
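The central limit phenomenon mentioned above can be illustrated by summing variables that are individually far from normal. This sketch sums twelve Uniform(0, 1) draws per sample (a classic choice, since such a sum has mean 6 and variance exactly 1):

```python
import random
import statistics

# Sketch of the central limit theorem: sums of i.i.d. Uniform(0, 1)
# variables are approximately normal even though each term is not.
random.seed(1)
n_terms, n_samples = 12, 100_000
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]

# Each sum has mean n_terms/2 = 6 and variance n_terms/12 = 1.
mean_hat = statistics.fmean(sums)
var_hat = statistics.pvariance(sums)
# For a normal, about 68.3% of the mass lies within one sd of the mean.
within_1sd = sum(abs(s - 6.0) <= 1.0 for s in sums) / n_samples
print(f"mean ≈ {mean_hat:.3f}, variance ≈ {var_hat:.3f}, "
      f"P(|S - 6| <= 1) ≈ {within_1sd:.3f}")
```

The one-standard-deviation coverage comes out close to the normal value 0.683, even though each uniform term is flat rather than bell-shaped.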
Estimation and inference
- Parameter estimation:
- The maximum likelihood estimates for μ and σ^2 are the sample mean and the sample variance (with population denominator n for the MLE of σ^2).
- If sample variance is computed with n − 1 in the denominator, the result is an unbiased estimator for the population variance.
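The two variance estimators differ only in their denominator, as this sketch on simulated data from a hypothetical N(5, 2²) population shows:

```python
import random
import statistics

# Sketch: maximum-likelihood vs. unbiased variance estimates on a
# simulated sample from N(mu = 5, sigma^2 = 4).
random.seed(2)
sample = [random.gauss(5.0, 2.0) for _ in range(10_000)]

mu_hat = statistics.fmean(sample)            # MLE of mu: the sample mean
var_mle = statistics.pvariance(sample)       # denominator n     (MLE of sigma^2)
var_unbiased = statistics.variance(sample)   # denominator n - 1 (unbiased)

print(f"mu_hat ≈ {mu_hat:.3f}")
print(f"MLE variance ≈ {var_mle:.4f}, unbiased variance ≈ {var_unbiased:.4f}")
```

The unbiased estimate is always larger by the factor n/(n − 1), which is negligible for large n but matters for small samples.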
- Inference under normal errors:
- Confidence intervals for μ often use z-values (large samples) or t-values (small samples with unknown variance) derived from the normal model.
- Hypothesis tests about μ (e.g., two-sided tests) rely on properties of the sampling distribution of the sample mean under normality.
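A large-sample confidence interval for μ follows directly from the normal sampling distribution of the sample mean. A sketch on hypothetical simulated data (true μ = 10, σ = 3):

```python
import math
import random
import statistics

# Sketch: a large-sample 95% confidence interval for mu using the
# normal critical value z_{0.975} ≈ 1.96 (sigma estimated from the data).
random.seed(3)
sample = [random.gauss(10.0, 3.0) for _ in range(2_500)]

n = len(sample)
mean = statistics.fmean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
z = 1.959964                                   # 97.5th percentile of N(0, 1)
lo, hi = mean - z * se, mean + z * se
print(f"95% CI for mu: ({lo:.3f}, {hi:.3f})")
```

For small n with unknown variance, the z critical value would be replaced by the corresponding quantile of the t-distribution with n − 1 degrees of freedom.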
- Diagnostics:
- In applied work, residuals are frequently assessed for approximate normality as part of model validation. When normality is questionable, robust methods or alternative distributions may be considered.
- Relation to other approaches:
- The normal model is a common assumption in linear regression, time series, and many measurement-error models, in large part because of analytical tractability and historical grounding in measurement theory.
Applications
- Measurement and experimental science: normal errors model the distribution of small, independent sources of error, enabling straightforward error propagation and uncertainty quantification.
- Statistics and econometrics: many estimation and testing procedures assume normality of errors or residuals, enabling closed-form solutions and well-understood distributions.
- Engineering and natural phenomena: many physical processes exhibit approximately normal fluctuations around a mean value.
- Finance and economics: asset returns are sometimes modeled as normally distributed in simple models; however, empirical data often exhibit heavier tails and skew, leading to alternative models (e.g., heavy-tailed distributions or models with stochastic volatility) when precision matters for risk assessment.
- Bayesian analysis: normal priors and likelihoods lead to analytically convenient posterior forms in conjugate settings; normal distributions underlie many probabilistic models and approximations.
- Signal processing and measurement theory: Gaussian noise models are common due to the central limit theorem and tractability in linear systems.
Historical context
- The normal distribution was first derived by Abraham de Moivre in 1733 as an approximation to the binomial distribution. Carl Friedrich Gauss, after whom it is named, later developed its probabilistic significance and its role in the least-squares method. Its prominence grew with the central limit theorem and its widespread applicability across disciplines.
- The term Gaussian reflects Gauss’s contributions to probability and statistics, while the distribution is popularly identified with the bell-shaped curve that appears in many real-world contexts.
See also
- Gaussian distribution
- Standard normal distribution
- Central limit theorem
- Probability distribution
- Cumulative distribution function
- Probability density function
- Moment generating function
- Maximum likelihood estimation
- Confidence interval
- Hypothesis testing
- Ordinary least squares
- Measurement uncertainty
- Regression analysis