Cohen's d

Cohen’s d is a standard measure of effect size used to describe the magnitude of difference between two groups in a way that is independent of the original measurement scale. Named after the American psychologist Jacob Cohen, it is a foundational tool in the behavioral and social sciences, medicine, education, and beyond for expressing how large an observed difference is in practical terms, not just whether it is statistically detectable. By translating differences into units of standard deviation, researchers can compare results across studies that may have used different instruments or scales. See Cohen's d and effect size for related concepts.

Because Cohen’s d is dimensionless, it lets findings from different instruments or scales be compared on a common footing. A d around 0.2 is generally treated as a small effect, around 0.5 as a medium effect, and around 0.8 as a large effect, though these thresholds are rough guidelines and depend on field-specific norms and the practical stakes of the outcome. In many fields, small effects can still have meaningful implications when the outcome matters a great deal, while in others even moderate effects may be of limited practical importance. This is why researchers often report confidence intervals and discuss context alongside the point estimate of d. See confidence interval and statistical power for related considerations.

Definition and interpretation

Cohen’s d measures the difference between two group means relative to the pooled variability of the groups. For two independent samples, with n1 observations in group 1, n2 in group 2, means m1 and m2, and standard deviations s1 and s2, the pooled standard deviation sp is

  • sp = sqrt(((n1 − 1)s1^2 + (n2 − 1)s2^2) / (n1 + n2 − 2))

and the effect size is

  • d = (m1 − m2) / sp.

If the groups are reversed, the sign of d flips, but the magnitude remains the focus for practical interpretation. In words, d tells you how many standard deviations separate the two means.
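The pooled-SD formula above can be sketched as a small Python function (a minimal illustration; the function name and signature are ours, not a standard library API):

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for two independent samples, using the pooled SD."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

# Reversing the order of the groups flips the sign of d
# but leaves the magnitude unchanged.
```

With the numbers from the worked example below (group A: n = 30, mean 110, SD 10; group B: n = 28, mean 100, SD 9), this returns about 1.05.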

For designs that are not two independent groups, variants exist. In paired designs, where the same subjects are measured in both conditions, d is computed as the mean difference divided by the standard deviation of the differences. See paired sample t-test and Cohen's d for guidance on these variations.
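A minimal sketch of the paired-design variant, assuming before/after scores from the same subjects (illustrative code; the function name is ours):

```python
from statistics import mean, stdev

def paired_d(before, after):
    """Paired-samples d: mean of the differences divided by
    the standard deviation of the differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / stdev(diffs)
```

Note that the denominator is the SD of the difference scores, not a pooled SD, so paired and independent-samples d values are not directly comparable.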

The relationship between the test statistic from a two-sample t-test and Cohen’s d helps connect hypothesis testing with practical significance. If t is the test statistic for the comparison, then d = t × sqrt(1/n1 + 1/n2); for independent samples of equal size n per group, this reduces to d ≈ t × sqrt(2/n). See t-test and Cohen's d for details.
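The conversion from a reported t statistic to d can be sketched directly (illustrative helper, not a library function):

```python
import math

def d_from_t(t, n1, n2):
    """Recover Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# With equal group sizes n1 == n2 == n, this equals t * sqrt(2 / n).
```

This is useful when a paper reports only t and the group sizes, as is common in older literature.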

Calculation and variants

  • Independent samples (two groups): d = (m1 − m2) / sp, with sp as above. Use this when groups are distinct and not paired.

  • Paired samples: d = (mean of differences) / sd of differences. This reflects the within-subject design.

  • When sample sizes are small, a bias-corrected estimator is preferred in practice: Hedges’ g. This is Cohen’s d multiplied by a small-sample correction factor J(n1, n2) that removes most of the positive bias. See Hedges' g for details.

  • If the two groups have very different variances, an alternative is Glass’s delta, which uses only the standard deviation of the control group (or one of the groups) in the denominator. See Glass's delta.

  • For analysis of variance (ANOVA) contexts with more than two groups, the related measure Cohen’s f is used. It connects to η² (eta squared) and can be translated from d under certain assumptions; see Cohen's f and effect size.

  • Nonparametric or ordinal data contexts may warrant alternatives such as Cliff’s delta, which assesses the probability that a randomly selected observation from one group exceeds a randomly selected observation from the other group. See Cliff's delta.
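Illustrative sketches of three of the variants above (function names are ours; the Hedges correction uses the common approximation J ≈ 1 − 3/(4·df − 1)):

```python
import math

def hedges_g(d, n1, n2):
    """Hedges' g: small-sample bias correction applied to Cohen's d."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))

def glass_delta(m1, m2, s_control):
    """Glass's delta: standardize by the control group's SD only."""
    return (m1 - m2) / s_control

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))
```

The correction factor shrinks d only slightly except at very small sample sizes, which is why d and g are often numerically close in practice.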

Example: Suppose group A has n1 = 30, mean m1 = 110, s1 = 10, and group B has n2 = 28, mean m2 = 100, s2 = 9. The pooled standard deviation is sp = sqrt(((29)(100) + (27)(81)) / 56) ≈ 9.53, and d ≈ (110 − 100) / 9.53 ≈ 1.05, a large effect by conventional benchmarks. See mean and standard deviation for the underlying quantities.

Uses in research

Cohen’s d is widely used to:

  • Report the magnitude of effects in empirical studies, enabling comparisons across different measures and samples. See effect size.

  • Facilitate meta-analysis by providing a standardized metric that can be aggregated across studies with different instruments. In meta-analysis, the standardized mean difference is often computed in the form of d or its bias-corrected cousin, Hedges' g.

  • Aid in power analyses and study design, where researchers estimate the sample size needed to detect a given effect size with a chosen level of power. See Power analysis and statistical power.

  • Interpret results in fields where full context requires more than p-values, aligning statistical conclusions with practical significance. See p-value and confidence interval for related concepts.
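As a sketch of the power-analysis use listed above, the normal-approximation formula n ≈ 2(z_α/2 + z_β)² / d² gives a per-group sample size for a two-sample comparison (an illustration under that approximation; dedicated power software gives exact t-based answers):

```python
import math

def n_per_group(d, z_alpha=1.96, z_beta=0.84):
    """Approximate per-group n for a two-sample test at two-sided
    alpha = 0.05 and 80% power, via the normal approximation."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
```

For d = 0.5 this gives about 63 per group, close to the commonly cited exact value of 64; the quadratic dependence on 1/d means halving the target effect size quadruples the required n.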

Controversies and debates

  • Thresholds are context-dependent. While the conventional descriptors (small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8) provide a quick heuristic, many researchers argue that these cutoffs are arbitrary and should be calibrated to domain-specific norms and consequences. See Cohen's d and Cohen's f for discussions of interpretation.

  • Sensitivity to measurement and design. The value of d depends on the reliability of the measurements and on the study design. Measurement error attenuates observed differences, potentially underestimating the true effect size. See standard deviation and reliability in measurement for related considerations.

  • Assumptions and robustness. The standard computation of d assumes roughly normal distributions and comparable variances. Violations can distort the estimate, especially with small samples. In such cases, researchers may report robust or nonparametric alternatives like Cliff's delta or use transformation approaches. See Normal distribution.

  • Small-sample bias and corrections. In small samples, Cohen’s d is positively biased: it tends to overestimate the population effect size. The bias-corrected estimator Hedges' g is commonly recommended. See also discussions in meta-analysis about how bias corrections affect aggregated results.

  • Practical significance versus statistical significance. A statistically significant difference with a tiny d may have little real-world impact, and conversely a large d in a study with poor design or measurement quality may be misleading. Researchers increasingly pair effect sizes with confidence intervals and domain-specific judgment. See confidence interval and statistical power for related considerations.

Practical considerations and limitations

  • Always report uncertainty. Alongside d, provide a confidence interval for the population effect size and consider presenting a related metric like standard error. See confidence interval.

  • Consider the measurement scale and reliability. If the measurement instrument is unreliable, measurement error inflates the observed standard deviations, so d will underestimate the true standardized difference. See reliability and standard deviation.

  • Use appropriate variants when assumptions are violated. If variances are unequal or the data are nonnormal, consider alternatives such as Glass’s delta or nonparametric measures like Cliff’s delta. See Glass's delta and Cliff's delta.

  • In meta-analytic work, be mindful of publication bias and heterogeneity. Standardized mean differences are subject to the same biases as other effect-size estimates, and methods exist to assess and adjust for them. See meta-analysis and publication bias.
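The uncertainty point above is often handled with the large-sample standard error SE(d) = sqrt((n1 + n2)/(n1·n2) + d²/(2(n1 + n2))), a common basis for an approximate confidence interval. A sketch under that assumption (the function name is ours):

```python
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d via the large-sample SE:
    SE = sqrt((n1 + n2)/(n1*n2) + d**2 / (2*(n1 + n2)))."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se
```

For the worked example (d ≈ 1.05, n1 = 30, n2 = 28), this yields roughly (0.50, 1.60), illustrating how wide the interval remains even for a nominally large effect at these sample sizes.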

See also