Regression Discontinuity Design

Regression Discontinuity Design (RDD) is a quasi-experimental tool for causal inference. It exploits a known cutoff in an assignment rule to estimate the effects of a policy or program. When units just below and just above the threshold are comparable in both observed and unobserved characteristics, a sudden jump in the outcome at the cutoff can be attributed to the treatment rather than to other factors. RDD is widely used for policy evaluation in economics, political science, and public policy because it can yield credible causal estimates without a randomized trial. The estimated effect is a local average treatment effect, meaningful for observations near the threshold. Its credibility rests on the assumption that, near the cutoff, which side of the threshold a unit falls on is as good as random. RDD comes in two main variants: sharp RD, where treatment is assigned deterministically at the cutoff, and fuzzy RD, where the probability of treatment jumps at the cutoff but not from 0 to 1.

RDD sits at the intersection of policy evaluation and causal inference. It provides a practical approach when randomized experiments are infeasible or politically impractical. It is especially attractive to policymakers who favor targeted, results-driven programs and need credible evidence to justify the allocation of scarce resources. RDD focuses on the margin where policy changes occur, so it shows how a policy works for those just on the edge of eligibility. This is a useful perspective for designing efficient programs and for calibrating thresholds that balance fairness, cost, and impact. See causal inference and policy evaluation for broader context, as well as the discussion of the local average treatment effect that emerges in this framework. The method is closely related to other quasi-experimental approaches, including instrumental variables strategies. Its strength lies in exploiting a known rule that produces as-if random assignment near the threshold.

Concept and Assumptions

  • Running variable and cutoff: The assignment variable, sometimes called the running variable, determines whether a unit receives treatment when it crosses a specified threshold or cutoff. See running variable and cutoff for formal definitions and common examples.

  • Continuity and comparability: The core assumption is that, absent the treatment, potential outcomes would change smoothly with the running variable. In the vicinity of the cutoff, units on either side are comparable in expectation, making the observed discontinuity attributable to the treatment. See causal inference for a broader treatment of identification assumptions.

  • No precise manipulation around the cutoff: A key concern is whether units can or will manipulate the running variable to reach or avoid the threshold. Researchers check for this with the McCrary density test and related diagnostics. These look for sorting near the cutoff that would invalidate the design. See McCrary density test for details.

  • Sharp vs fuzzy RD: In sharp RD, treatment assignment flips deterministically at the cutoff. In fuzzy RD, the probability of treatment jumps at the cutoff but does not go from 0 to 1. See sharp regression discontinuity design and fuzzy regression discontinuity design for formal formulations and examples.

  • Local nature of the estimate: The causal effect identified by RD applies to observations near the cutoff. Strictly, sharp RD identifies the average treatment effect at the cutoff. Fuzzy RD identifies a local average treatment effect for units whose treatment status is changed by crossing the cutoff. This characteristic is a feature, not a flaw, because it provides policy-relevant insight at the margin where decisions are made. See local average treatment effect for a broader discussion.
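
The two designs can be written as limits of conditional expectations. The notation is introduced here for illustration and does not come from the text above: X is the running variable, c the cutoff, Y the outcome, and D a treatment indicator. Treatment is assumed to be assigned, or encouraged, at or above c.

```latex
% Sharp RD: the jump in the conditional mean outcome at the cutoff
\tau_{\text{sharp}} = \lim_{x \downarrow c} \mathbb{E}[Y \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x]

% Fuzzy RD: the outcome jump scaled by the jump in treatment take-up
\tau_{\text{fuzzy}} = \frac{\lim_{x \downarrow c} \mathbb{E}[Y \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x]}
                           {\lim_{x \downarrow c} \mathbb{E}[D \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[D \mid X = x]}
```

In a sharp design D jumps from 0 to 1, so the denominator equals 1 and the two expressions coincide. Under the continuity assumption, the sharp expression identifies the average treatment effect at the cutoff. The fuzzy ratio additionally requires monotonicity, meaning no units that would be treated below the cutoff but untreated above it. Under that extra assumption it identifies the effect for compliers at the cutoff.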

Types and Estimation

  • Estimation approach: The standard practice uses local polynomial (usually local linear) regression to estimate the conditional mean of the outcome at the cutoff from each side separately. The discontinuity is the difference between the two boundary estimates. The bandwidth, i.e., how close to the cutoff observations must be to enter the fit, governs a bias-variance trade-off. A narrower window reduces bias from misspecifying the functional form but uses fewer observations. Modern implementations combine data-driven (for example, MSE-optimal) bandwidth selection with bias correction. See discussions of bandwidth selection and local regression techniques.

  • Inference challenges: Because the estimate relies on a small neighborhood around the cutoff, standard errors require care. Conventional confidence intervals computed at a bandwidth chosen to minimize mean squared error can undercover because of smoothing bias. This motivates robust bias-corrected inference. See the work of Calonico, Cattaneo, and Titiunik for practical guidelines.

  • Complementary analyses: Researchers often report RD plots, examine covariate balance across the threshold, and conduct placebo tests at non-cutoff values to bolster credibility. See placebo test and covariate balance for related concepts.
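
The local linear estimator described above can be sketched in pure Python as follows. This is a minimal sketch that assumes a triangular kernel and a fixed, hand-picked bandwidth `h` rather than a data-driven one. It returns point estimates only, without the bias correction or robust standard errors that dedicated packages such as rdrobust provide. The function names (`boundary_fit`, `rd_jump`, `fuzzy_rd`) and the simulated data are hypothetical, chosen so that the true effect (2.0) is known.

```python
import random


def triangular_weight(u: float) -> float:
    """Triangular kernel K(u) = max(0, 1 - |u|); zero outside [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def boundary_fit(xs, ys, h):
    """Kernel-weighted least-squares fit of y = a + b * x; returns a.

    The xs are centred at the cutoff, so the intercept a is the fitted
    conditional mean at the cutoff, approached from this side only.
    """
    sw = sx = sxx = sy = sxy = 0.0
    for x, y in zip(xs, ys):
        w = triangular_weight(x / h)  # zero weight outside the bandwidth
        sw += w
        sx += w * x
        sxx += w * x * x
        sy += w * y
        sxy += w * x * y
    denom = sw * sxx - sx * sx
    if denom <= 0.0:
        raise ValueError("too few distinct observations inside the bandwidth")
    slope = (sw * sxy - sx * sy) / denom
    return (sy - slope * sx) / sw


def rd_jump(running, values, cutoff, h):
    """Right-limit minus left-limit estimate of E[values | running] at the cutoff."""
    right = [(r - cutoff, v) for r, v in zip(running, values) if r >= cutoff]
    left = [(r - cutoff, v) for r, v in zip(running, values) if r < cutoff]
    right_xs, right_ys = zip(*right)
    left_xs, left_ys = zip(*left)
    return boundary_fit(right_xs, right_ys, h) - boundary_fit(left_xs, left_ys, h)


def fuzzy_rd(running, treatment, outcome, cutoff, h):
    """Fuzzy RD (Wald-type) estimate: outcome jump divided by take-up jump."""
    return rd_jump(running, outcome, cutoff, h) / rd_jump(running, treatment, cutoff, h)


# Hypothetical simulated data: running variable uniform on [-1, 1), cutoff at 0.
rng = random.Random(0)
n, cutoff, h = 20_000, 0.0, 0.5
running = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Sharp design: everyone at or above the cutoff is treated; true jump is 2.0.
sharp_y = [1.0 + 0.5 * r + 2.0 * (r >= cutoff) + rng.gauss(0.0, 0.5) for r in running]
sharp_estimate = rd_jump(running, sharp_y, cutoff, h)

# Fuzzy design: take-up probability rises from 0.2 to 0.8 at the cutoff,
# and the true effect of actually being treated is 2.0.
treat = [1.0 if rng.random() < (0.8 if r >= cutoff else 0.2) else 0.0 for r in running]
fuzzy_y = [1.0 + 0.5 * r + 2.0 * d + rng.gauss(0.0, 0.5) for r, d in zip(running, treat)]
first_stage_jump = rd_jump(running, treat, cutoff, h)
fuzzy_estimate = fuzzy_rd(running, treat, fuzzy_y, cutoff, h)

print(f"sharp: {sharp_estimate:.3f}  first stage: {first_stage_jump:.3f}  fuzzy: {fuzzy_estimate:.3f}")
```

Fitting each side separately, with the running variable centred at the cutoff, makes each intercept a boundary estimate. This avoids extrapolating a single global curve across the threshold. In the fuzzy case the ratio depends on the first-stage jump in take-up. When that jump is small, the denominator is close to zero and the estimate becomes unstable.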

Validation and Robustness

  • Manipulation checks: The McCrary density test is commonly used to detect unusual discontinuities in the density of the running variable at the cutoff, which could signal manipulation. See McCrary density test.

  • Covariate continuity: A desirable check is whether observed pre-treatment covariates are continuous at the cutoff. Abrupt jumps in covariates near the threshold can indicate violations of the identification assumptions.

  • Sensitivity to bandwidth and polynomial order: Analysts explore a range of bandwidths and polynomial specifications to ensure that results are not driven by a particular modeling choice. This aligns with broader best practices in statistical inference.

  • Placebo and falsification tests: Conducting discontinuity checks at values away from the cutoff, or in subgroups where no treatment is expected, helps establish that the observed effect is tied to the policy change rather than to spurious trends. See placebo test and falsification in causal inference.
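
The validation checks above can be strung together on hypothetical simulated data, as in the sketch below. Several parts are illustrative assumptions: the bin width, the bandwidths, the placebo locations, and the covariate. The density check, `density_log_jump`, is a simplified binned diagnostic loosely modelled on McCrary's approach. It is not the published test, which also supplies a standard error. The snippet repeats a compact local linear estimator so that it runs on its own.

```python
import math
import random


def boundary_fit(xs, ys, h):
    """Triangular-kernel weighted linear fit; returns the fitted value at x = 0."""
    sw = sx = sxx = sy = sxy = 0.0
    for x, y in zip(xs, ys):
        w = max(0.0, 1.0 - abs(x) / h)
        sw += w
        sx += w * x
        sxx += w * x * x
        sy += w * y
        sxy += w * x * y
    denom = sw * sxx - sx * sx
    if denom <= 0.0:
        raise ValueError("too few distinct observations inside the bandwidth")
    slope = (sw * sxy - sx * sy) / denom
    return (sy - slope * sx) / sw


def rd_jump(running, values, cutoff, h):
    """Right-limit minus left-limit estimate of E[values | running] at `cutoff`."""
    right = [(r - cutoff, v) for r, v in zip(running, values) if r >= cutoff]
    left = [(r - cutoff, v) for r, v in zip(running, values) if r < cutoff]
    return boundary_fit(*zip(*right), h) - boundary_fit(*zip(*left), h)


def density_log_jump(running, cutoff, bin_width, h):
    """Simplified McCrary-style diagnostic: log ratio of fitted densities at the cutoff.

    Bin edges are aligned with the cutoff so no bin straddles it, and the bin
    densities are smoothed with the same local linear fit on each side.
    No standard error is computed, so this is a diagnostic, not a formal test.
    """
    counts = {}
    for r in running:
        k = math.floor((r - cutoff) / bin_width)
        counts[k] = counts.get(k, 0) + 1
    ks = range(min(counts), max(counts) + 1)  # include empty bins as zeros
    mids = [(k + 0.5) * bin_width for k in ks]  # bin midpoints, centred at the cutoff
    dens = [counts.get(k, 0) / (len(running) * bin_width) for k in ks]
    right = [(m, d) for m, d in zip(mids, dens) if m > 0]
    left = [(m, d) for m, d in zip(mids, dens) if m < 0]
    return math.log(boundary_fit(*zip(*right), h)) - math.log(boundary_fit(*zip(*left), h))


# Hypothetical data: true outcome jump of 2.0 at cutoff 0, plus a pre-treatment
# covariate that is smooth through the cutoff.
rng = random.Random(1)
cutoff = 0.0
running = [rng.uniform(-1.0, 1.0) for _ in range(20_000)]
outcome = [1.0 + 0.5 * r + 2.0 * (r >= cutoff) + rng.gauss(0.0, 0.5) for r in running]
covariate = [30.0 + 5.0 * r + rng.gauss(0.0, 2.0) for r in running]

# 1. Manipulation check: the running variable's density should not jump.
density_jump = density_log_jump(running, cutoff, bin_width=0.02, h=0.5)

# 2. Covariate continuity: a pre-treatment covariate should not jump.
covariate_jump = rd_jump(running, covariate, cutoff, h=0.5)

# 3. Sensitivity of the main estimate to the bandwidth.
by_bandwidth = {h: rd_jump(running, outcome, cutoff, h) for h in (0.25, 0.5, 0.75)}

# 4. Placebo cutoffs, each using only observations on its own side of the real cutoff.
placebo = {}
for fake in (-0.5, 0.5):
    same_side = [(r, y) for r, y in zip(running, outcome) if (r >= cutoff) == (fake >= cutoff)]
    placebo[fake] = rd_jump(*zip(*same_side), fake, h=0.25)

print(density_jump, covariate_jump, by_bandwidth, placebo)
```

Restricting each placebo cutoff to one side of the real threshold keeps the genuine discontinuity out of the placebo fit. In this simulated data the density and covariate jumps should be near zero. The main estimate should also stay close to 2.0 across bandwidths. If units just below the cutoff were shifted above it, the density diagnostic would turn clearly positive.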

Applications and Examples

  • Education and testing thresholds: Policies that grant access to resources or opportunities when test scores or grades cross a threshold are common sites for RD analyses. For example, scholarship eligibility, school funding awards, or probationary status tied to performance benchmarks often yield informative RD estimates about the policy’s impact on outcomes such as achievement, retention, or enrollment. See education policy for related literature and applications.

  • Welfare and labor programs: Benefit eligibility rules that hinge on a cutoff in income or other indicators provide fertile ground for RD studies, informing debates about the effectiveness and efficiency of transfer programs. See policy evaluation discussions of social programs for additional context.

  • Public health and safety rules: Some regulations apply only to individuals above or below a risk threshold, offering opportunities to study effects on behavior, utilization, or health outcomes near the margin.

Controversies and Debates

  • External validity and generalizability: A frequent debate centers on how far RD findings travel beyond the immediate neighborhood of the cutoff. Critics argue that local effects may not generalize to the broader population or to different thresholds. Proponents counter that RD provides credible evidence where randomized trials are unavailable and that multiple RD studies across thresholds and programs can illuminate broader patterns. See external validity and generalizability discussions within causal inference.

  • Threshold design and fairness: Critics worry that thresholds can create winners and losers near the cutoff and that the choice of threshold can embed inequities. Proponents argue that thresholds are a natural byproduct of administrative feasibility, political compromise, and transparent rule-making, and that RD isolates the effect of the policy in a well-defined margin where decisions are being made.

  • Manipulation and integrity of the running variable: If actors can influence the running variable to cross the cutoff, the RD identification can be compromised. The McCrary test helps detect this, and researchers may restrict samples, adjust methods, or combine RD with other designs to mitigate risk. This debate connects to broader concerns about measurement, enforcement, and policy design.

  • The role of RD in a broader evidence portfolio: Some critics claim RD is narrow or insufficient for broad policy conclusions. Supporters emphasize that RD is one of several credible quasi-experimental tools that, together with randomized trials, natural experiments, and panel methods, contribute to a robust evidentiary base for policy. See quasi-experimental design and randomized controlled trial for context on complementary approaches.

  • Criticisms of scope and threshold design, and why defenders regard them as overstated: Critics sometimes describe RD as limited because it identifies effects only at the margin. Others claim that thresholds reflect arbitrary or biased policy design. From a policy-analytic perspective, the defense is that thresholds are ubiquitous in governance, such as eligibility rules, funding caps, and performance benchmarks. On this view, RD directly tests the policy-relevant question at the margin with credible causal logic. Modern RD practice incorporates rigorous robustness checks, multiple specifications, and falsification tests, which increase reliability and policy relevance. Claims that RD is inherently biased or insufficient are typically answered with these technical safeguards and by acknowledging the local scope of the inference, rather than by dismissing the method outright.

See also