Holm–Bonferroni method

The Holm–Bonferroni method is a widely used statistical procedure for controlling the familywise error rate when multiple hypotheses are tested. Rooted in the classical Bonferroni correction, it adds a sequential, step-down procedure that increases statistical power while preserving a principled standard against spurious findings. The method is named after the Italian mathematician Carlo Emilio Bonferroni, whose probability inequalities underlie the original correction for multiple comparisons, and the Swedish statistician Sture Holm, who formalized the sequentially rejective approach in 1979. The result is a practical compromise between rigid error control and the desire to detect true effects across many tests.

History and etymology

The Bonferroni correction traces its lineage to the work of Carlo Emilio Bonferroni in the early 20th century, whose inequalities laid the groundwork for adjusting significance thresholds in the face of multiple comparisons. The modern Holm–Bonferroni procedure builds on that lineage: Sture Holm introduced the sequentially rejective framework in his 1979 paper "A simple sequentially rejective multiple test procedure" (Scandinavian Journal of Statistics). The approach keeps the overall probability of any false positive (the familywise error rate) under control while offering more power than the original Bonferroni rule in typical research settings. The method is now a standard tool in fields where researchers face large numbers of simultaneous tests, from clinical trials to genomics.

How it works

The Holm–Bonferroni procedure is a step-down procedure for testing m hypotheses with observed p-values p(1) ≤ p(2) ≤ ... ≤ p(m). The basic idea is to compare the ordered p-values to increasingly lenient significance thresholds, starting with the strongest evidence against any null hypothesis.

  • Order the m p-values from smallest to largest: p(1), p(2), ..., p(m).
  • For i = 1 to m, compare p(i) to alpha/(m - i + 1), where alpha is the chosen familywise error rate (commonly 0.05).
  • If p(i) ≤ alpha/(m - i + 1), reject the corresponding null hypothesis H(i) and continue to the next i.
  • If p(i) > alpha/(m - i + 1), stop and do not reject H(i) or any subsequent hypotheses H(i+1), ..., H(m).

All hypotheses reaching a rejection step are rejected; those at or after the first failure are retained. This procedure controls the familywise error rate under arbitrary dependence among the tests, and it is uniformly more powerful than the plain Bonferroni correction because the threshold is relaxed as hypotheses are rejected and fewer remain.

Example

Suppose m = 5 hypotheses, alpha = 0.05, and the ordered p-values are 0.003, 0.02, 0.04, 0.06, 0.10.

  • p(1) = 0.003 ≤ 0.05/5 = 0.01 → reject H(1)
  • p(2) = 0.02 > 0.05/4 = 0.0125 → stop; do not reject H(2), H(3), H(4), H(5)

Only the first hypothesis is rejected in this example, illustrating the balance Holm–Bonferroni strikes between error control and discovery.
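The step-down procedure and the worked example above can be sketched in a few lines of Python (the function name `holm_bonferroni` is illustrative, not a library API):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm-Bonferroni: return a list of booleans, True where
    the corresponding null hypothesis is rejected."""
    m = len(p_values)
    # Walk the p-values from smallest to largest via their sorted indices.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # rank is 0-based, so the threshold alpha/(m - i + 1) becomes alpha/(m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # first failure: stop, retain this and all remaining hypotheses
    return reject

# The worked example: only the first hypothesis is rejected.
print(holm_bonferroni([0.003, 0.02, 0.04, 0.06, 0.10]))
# → [True, False, False, False, False]
```

Note that the `break` is what distinguishes the step-down procedure from applying each threshold independently: once one ordered p-value fails, no later hypothesis can be rejected.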

Mathematical formulation and properties

  • Objective: control the probability of making at least one Type I error across all m tests (the familywise error rate).
  • Assumptions: The method is valid under arbitrary dependence among tests, which makes it a robust choice in many practical circumstances.
  • Comparison to alternatives: Holm is uniformly more powerful than the single-step Bonferroni correction; every hypothesis Bonferroni rejects, Holm rejects as well, and possibly more. Hochberg's step-up procedure can be more powerful still but requires additional assumptions (non-negative dependence among the test statistics), while the Benjamini–Hochberg procedure controls the false discovery rate rather than the familywise error rate.
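The validity under arbitrary dependence follows from a short union-bound argument; a sketch of the standard proof, in LaTeX:

```latex
% Sketch: why Holm controls the FWER at level \alpha under arbitrary dependence.
% Let m_0 be the number of true null hypotheses. A false rejection can occur
% only if the procedure reaches the smallest p-value among the true nulls,
% say at ordered position i. At most m - m_0 false nulls can precede it,
% so i \le m - m_0 + 1 and its threshold satisfies
\[
\frac{\alpha}{m - i + 1} \;\le\; \frac{\alpha}{m_0}.
\]
% Hence any false rejection requires the smallest true-null p-value to fall
% below \alpha/m_0, and by the union bound (no independence needed):
\[
P\!\left(\min_{j \in \mathrm{true\ nulls}} p_j \le \frac{\alpha}{m_0}\right)
\;\le\; \sum_{j \in \mathrm{true\ nulls}} P\!\left(p_j \le \frac{\alpha}{m_0}\right)
\;\le\; m_0 \cdot \frac{\alpha}{m_0} \;=\; \alpha.
\]
```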

Applications and practice

Holm–Bonferroni is used across disciplines whenever researchers face multiple testing concerns and wish to maintain credible claims. It is common in:

  • clinical trial reporting, to avoid declaring spurious treatment effects significant;
  • genomics and high-throughput screening, where thousands of tests are performed simultaneously;
  • psychology and other social sciences, where replicability and robustness are valued;
  • biostatistics and epidemiology, where precise error control matters for policy implications.

Practical implementation is straightforward: the method is available in standard statistical software alongside other p-value adjustments, for example R's p.adjust function (method = "holm") and the multipletests function in Python's statsmodels library.
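Statistical packages typically report Holm-adjusted p-values rather than per-step thresholds: each sorted p-value is scaled by its step-down factor (m − i + 1), a running maximum enforces monotonicity, and values are capped at 1; a hypothesis is then rejected at level alpha exactly when its adjusted p-value is at most alpha. A rough pure-Python sketch of this computation (the helper name `holm_adjusted` is illustrative):

```python
def holm_adjusted(p_values):
    """Holm-adjusted p-values: reject H(i) at level alpha iff adjusted p <= alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Scale by the step-down factor (m - rank), then enforce monotonicity
        # with a running maximum, capping at 1.
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

# The worked example from above: only the first value stays below 0.05.
print([round(x, 4) for x in holm_adjusted([0.003, 0.02, 0.04, 0.06, 0.10])])
# → [0.015, 0.08, 0.12, 0.12, 0.12]
```

The running maximum is what makes the adjusted values consistent with the stopping rule: once one scaled p-value exceeds a given level, no later hypothesis can fall back below it.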

Controversies and debates

Like any statistical tool, Holm–Bonferroni sits at the center of debates about how best to balance caution with discovery. Proponents emphasize accountability and reproducibility: preventing even a single false positive in a large set of tests is crucial when decisions involve public health, policy, or substantial resource allocation. From this vantage point, Holm–Bonferroni helps ensure that reported findings are genuinely informative and not artifacts of random variation.

Critics argue that the method can be overly conservative, especially when the number of tests is large or when the tests are not equally informative. In exploratory research, or in fields with a very large testing burden, some advocate for methods that control the false discovery rate (FDR) rather than the FWER, arguing that some acceptable level of false positives can be traded for greater ability to detect true effects. See also Benjamini–Hochberg procedure and multiple testing.

From a traditional, efficiency-minded perspective on public science and research funding, advocates contend that robust error control reduces downstream costs: time, money, and attention spent pursuing false leads. Critics of overly cautious approaches may dismiss an excessive focus on p-values as a distraction from theory, replication, and practical impact. When debates become heated, the core question is whether the costs of false positives outweigh the costs of missing real effects, and in many applications Holm–Bonferroni is defended as a sensible compromise. The technical point remains that it provides a clear, mathematically principled way to sustain credibility while preserving as much discovery as the data allow.

See also