Bernoulli distribution

The Bernoulli distribution is the simplest nontrivial discrete probability model for a binary outcome. A single trial produces one of two possible results, commonly encoded as 1 for success and 0 for failure, with a fixed probability p of success. The distribution is a building block for many other models in statistics and data analysis, most notably the Binomial distribution, which arises when successes are counted across several independent Bernoulli trials. The distribution is named after the Swiss mathematician Jacob Bernoulli and has deep ties to ideas about chance, repeatable experiments, and the law of large numbers, as explored in his work Ars Conjectandi.

In practice, the Bernoulli model is used whenever a binary decision, outcome, or event is observed with a known or estimated probability of occurrence. It appears in fields as diverse as quality control, survey sampling, digital communication, and modern data science workflows that rely on binary indicators such as conversions, clicks, or yes/no classifications. The core mathematics is compact and transparent, which is why the Bernoulli distribution continues to serve as a reference point for modeling and inference in both traditional statistics and applied analytics, including methods like Logistic regression when the goal is to relate a probability of success to explanatory variables.

History

The concept of a Bernoulli trial and the associated distribution grew out of early work on probability by the Bernoulli family in the late 17th and early 18th centuries. In particular, Jacob Bernoulli formalized the analysis of repeated independent trials and proved an early form of the law of large numbers in Ars Conjectandi, published posthumously in 1713. This foundation later evolved into the modern understanding of the Bernoulli distribution and its role in the broader family of discrete distributions, and the term itself was popularized through subsequent work on discrete probability and the connection to the more general Binomial distribution.

Definition and basic properties

Let X be a random variable taking values in {0, 1}, where P(X = 1) = p and P(X = 0) = 1 − p with p in the interval [0, 1]. Then X follows the Bernoulli distribution with parameter p, denoted X ~ Bernoulli(p).

  • Probability mass function: P(X = x) = p^x (1 − p)^{1 − x} for x ∈ {0, 1}.
  • Expected value (mean): E[X] = p.
  • Variance: Var(X) = p(1 − p).
  • Moment generating function: M_X(t) = (1 − p) + p e^t.
  • The support is exactly {0, 1}. The most probable value (mode) is 0 if p < 1/2 and 1 if p > 1/2; when p = 1/2 both values are equally likely and the distribution is symmetric. A numerical check of these properties follows this list.
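
The properties above are easy to verify numerically. The following sketch, assuming the Python libraries NumPy and SciPy are available and using p = 0.3 purely for illustration, checks the probability mass function, mean, variance, and moment generating function:

    import numpy as np
    from scipy.stats import bernoulli

    p = 0.3
    X = bernoulli(p)                     # frozen Bernoulli(0.3) distribution

    print(X.pmf([0, 1]))                 # [0.7 0.3], i.e. p^x (1 - p)^(1 - x)
    print(X.mean(), X.var())             # 0.3 and 0.21 = p(1 - p)

    # The moment generating function M_X(t) = (1 - p) + p e^t agrees with
    # the direct expectation E[e^{tX}] summed over the support {0, 1}.
    t = 0.5
    mgf_closed = (1 - p) + p * np.exp(t)
    mgf_direct = X.pmf(0) * np.exp(0) + X.pmf(1) * np.exp(t)
    print(np.isclose(mgf_closed, mgf_direct))   # True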

A sequence of independent Bernoulli(p) trials forms a Bernoulli process, and the sum of n such trials has a Binomial distribution with parameters n and p. This connection is central for modeling counts of successes in a fixed number of trials and for connecting binary outcomes to broader probabilistic reasoning.
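
A short simulation makes this connection concrete. The sketch below, assuming NumPy and with all sample sizes chosen for illustration, sums rows of independent Bernoulli trials and compares the result with the Binomial mean np and variance np(1 − p):

    import numpy as np

    rng = np.random.default_rng(42)
    n, p, reps = 10, 0.3, 100_000        # illustrative values

    # Each row is one Bernoulli process of length n; summing a row counts successes.
    trials = rng.binomial(1, p, size=(reps, n))
    counts = trials.sum(axis=1)

    # Empirical mean/variance of the counts vs. the Binomial(n, p) values.
    print(counts.mean(), n * p)            # ~3.0 vs 3.0
    print(counts.var(), n * p * (1 - p))   # ~2.1 vs 2.1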

Parameter estimation and inference

  • Estimation of p: When observing n independent Bernoulli trials X_1, X_2, ..., X_n, the natural estimator for p is the sample proportion p̂ = (1/n) ∑ X_i. This estimator is unbiased and, under standard conditions, its distribution concentrates around p as n grows, consistent with the law of large numbers. A worked sketch of this estimator, the intervals, and the Bayesian update follows this list.

  • Confidence intervals: For large samples, p̂ is often treated as approximately normal with standard error sqrt[p̂(1 − p̂)/n], enabling conventional confidence intervals. In small samples, alternatives such as Wilson or exact (Clopper–Pearson) intervals are commonly recommended to maintain nominal coverage.

  • Hypothesis testing: Tests about p (for example, testing p = p0) can be carried out using z-tests (large-sample) or exact methods for small samples, depending on the desired properties and the context of the data.

  • Bayesian perspective: With a Beta prior Beta(α, β), the posterior after observing x successes in n trials is Beta(α + x, β + n − x). The Bayesian framework blends prior beliefs with data and yields posterior summaries for p that are convenient to report and interpret.
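
The following sketch illustrates the point estimate, two common intervals, and the Beta posterior update described above. It assumes NumPy and SciPy; the data are simulated, and the true p, sample size, and uniform Beta(1, 1) prior are illustrative choices, not recommendations:

    import numpy as np
    from scipy.stats import norm, beta

    rng = np.random.default_rng(0)
    p_true, n = 0.3, 200
    x = rng.binomial(1, p_true, size=n)      # n simulated Bernoulli trials

    # Point estimate: the sample proportion, an unbiased estimator of p.
    p_hat = x.mean()
    successes = int(x.sum())

    # Large-sample (Wald) 95% interval: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n).
    z = norm.ppf(0.975)
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - z * se, p_hat + z * se)

    # Wilson 95% interval, usually preferred in smaller samples.
    center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * np.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    wilson = (center - half, center + half)

    # Bayesian update: Beta(a, b) prior -> Beta(a + successes, b + n - successes).
    a, b = 1.0, 1.0                          # uniform prior, illustrative only
    posterior = beta(a + successes, b + n - successes)
    credible = posterior.ppf([0.025, 0.975]) # equal-tailed 95% credible interval

    print(p_hat, wald, wilson, credible)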

Related models and extensions

  • Relation to the Binomial distribution: The sum of n independent Bernoulli(p) variables has distribution Binomial(n, p), which underpins many classical statistical procedures for counts of successes.

  • Beta-Bernoulli and overdispersion: If p itself is uncertain and modeled as a random variable, one obtains hierarchical models such as the Beta-Bernoulli; for counts of successes this yields the Beta-Binomial distribution, which can address overdispersion relative to the simple Binomial model.

  • Extensions to binary data with covariates: When p depends on explanatory variables, models such as Logistic regression or the Probit model are used to describe how covariates influence the probability of a binary outcome; a minimal logistic-regression sketch follows this list.

  • Alternative approaches to binary outcomes: In some settings, the Bernoulli distribution is replaced by more flexible families or by time-series models (e.g., Markov Bernoulli processes) when outcomes exhibit dependence over time or context.
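
As a minimal sketch of the covariate-dependent case, assuming NumPy and scikit-learn are available, the following simulates Bernoulli outcomes whose success probability follows a logistic curve in one covariate and then recovers the coefficients; the intercept and slope values are illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 1))                 # one explanatory variable
    logit = -0.5 + 2.0 * X[:, 0]                   # illustrative linear predictor
    p = 1.0 / (1.0 + np.exp(-logit))               # logistic link maps to (0, 1)
    y = rng.binomial(1, p)                         # Bernoulli outcomes

    model = LogisticRegression().fit(X, y)
    print(model.intercept_, model.coef_)           # close to -0.5 and 2.0
    print(model.predict_proba([[0.0]]))            # estimated P(y=0), P(y=1) at x = 0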

Applications and examples

  • Quality control and reliability: Bernoulli trials model whether a unit passes a quality check or whether a component works on demand, with p representing the pass rate or reliability.

  • Survey sampling and experiments: Yes/no responses, such as whether a respondent agrees with a statement or whether a treatment yields a positive effect, are naturally modeled with Bernoulli(p).

  • A/B testing and conversion measurement: When outcomes are binary (conversion vs. no conversion), the Bernoulli model provides a straightforward basis for estimating conversion probabilities and for comparing groups, as sketched after this list.

  • Digital communication and information theory: In a binary channel, each transmitted 0/1 symbol is received correctly with a fixed probability, so transmission errors form Bernoulli trials; this makes Bernoulli models a fundamental component of more complex coding and transmission analyses.
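
A two-group comparison of conversion probabilities can be sketched from first principles with a large-sample two-proportion z-test, assuming NumPy and SciPy; the counts below are hypothetical:

    import numpy as np
    from scipy.stats import norm

    # Hypothetical results: (conversions, visitors) for variants A and B.
    x_a, n_a = 120, 1000
    x_b, n_b = 150, 1000

    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)    # pooled proportion under H0: p_a = p_b

    # Standard error of the difference under H0, then the z statistic.
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))         # two-sided large-sample p-value

    print(p_a, p_b, z, p_value)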

Limitations and considerations

  • Independence assumption: The Bernoulli model assumes that trials are independent and identically distributed. In many real-world settings this assumption is violated, requiring alternative models that capture dependence or heterogeneity in p across trials.

  • Overdispersion and heterogeneity: When data show more variability than the Bernoulli model allows, extensions such as the Beta-Binomial or models that link p to covariates via link functions are used to restore interpretability and fit; the sketch after this list shows how a varying p inflates the variance of success counts.

  • Non-binary or multi-category outcomes: When outcomes are not truly binary or when multiple categories exist, the Bernoulli distribution is not appropriate, and one should consider the broader family of discrete distributions (e.g., the Binomial distribution or the Multinomial distribution).
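
The following sketch, assuming NumPy and using illustrative parameters, demonstrates the overdispersion noted above: when p varies from unit to unit according to a Beta distribution, success counts are noticeably more variable than a Binomial model with the same mean would allow:

    import numpy as np

    rng = np.random.default_rng(7)
    n, reps = 20, 100_000
    a, b = 2.0, 5.0                        # illustrative Beta parameters; mean a/(a+b)

    p_i = rng.beta(a, b, size=reps)        # a different p for each unit
    counts = rng.binomial(n, p_i)          # Beta-Binomial counts of successes

    p_mean = a / (a + b)
    binom_var = n * p_mean * (1 - p_mean)  # variance if p were fixed at its mean
    print(counts.var(), binom_var)         # empirical variance clearly exceeds it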

See also