Extended Haplotype Homozygosity

Extended Haplotype Homozygosity (EHH) is a population-genetic statistic used to detect recent positive selection in genomes. The core idea is simple: when a new advantageous variant rises rapidly in frequency, it “drags along” neighboring genetic material before recombination has a chance to break it apart. As a result, a region around the favored allele shows unusually long stretches of identical genetic background, i.e., extended haplotype homozygosity. The signal is strongest for variants that rose to high frequency only in the recent past, leaving a characteristic footprint that current methods can identify in large-scale sequence data.

EHH was developed to quantify how far identity by descent extends away from a focal locus, or core haplotype. By comparing chromosomes that carry the core haplotype with those carrying other core haplotypes at the same locus, researchers assess whether a selective sweep may explain why a long haplotype has been preserved despite recombination and mutation. This approach complements other methods for inferring selection and population history, and it has become a foundational tool in human and model-organism genetics. For the origins of the method, see the early work by Pardis Sabeti and colleagues (2002).

History and concepts

Extended Haplotype Homozygosity emerged from efforts to identify recent adaptive events in the human genome and in other species. The statistic builds on concepts from haplotype structure, recombination dynamics, and the distribution of genetic variation across populations. The central notion is that selection leaves a distinctive pattern in linkage disequilibrium: when a beneficial allele increases in frequency quickly, recombination has less time to shuffle neighboring alleles, producing a detectable tract of shared ancestry among carriers. See also core haplotype, and note that a single long-haplotype signal can reflect demographic history as well as selection.

In practice, researchers compute EHH starting from a specified core haplotype and track how quickly haplotype homozygosity decays as one moves away from the core in either direction along the chromosome. A slow decay suggests a possible recent selective event, whereas in neutral regions homozygosity typically erodes faster because recombination has had more time to break up haplotypes.
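The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes phased, biallelic haplotypes coded as equal-length '0'/'1' strings, markers ordered by position, and no missing data, and the function names (`carriers`, `ehh`, `ehh_decay`) are hypothetical. EHH at a marker is computed as the fraction of pairs of core-allele carriers that are identical at every marker between the core and that marker, and the curve is reported per marker index rather than per physical or genetic distance.

```python
from collections import Counter
from math import comb


def carriers(haplotypes, core, allele):
    """Keep only chromosomes that carry `allele` at marker index `core`."""
    return [h for h in haplotypes if h[core] == allele]


def ehh(haplotypes, core, end):
    """EHH over the interval between marker indices `core` and `end` (inclusive).

    `haplotypes`: phased '0'/'1' strings already restricted to core carriers.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("EHH needs at least two carrier chromosomes")
    lo, hi = min(core, end), max(core, end)
    # Group carriers by their full allele sequence across the interval.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of carrier pairs that are identical over the whole interval.
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


def ehh_decay(haplotypes, core):
    """EHH at every marker, extending the interval left and right of the core."""
    return [ehh(haplotypes, core, j) for j in range(len(haplotypes[0]))]


# Hypothetical sample of five phased chromosomes, five markers each.
sample = ["00100", "00100", "10100", "00111", "01011"]
core_carriers = carriers(sample, core=2, allele="1")
print(ehh_decay(core_carriers, core=2))  # [0.5, 1.0, 1.0, 0.5, 0.5]
```

EHH is 1.0 at the core and its immediate neighbor, then drops to 0.5 on each side once one carrier's haplotype diverges. A real analysis would run the same computation for the other allele (the non-carriers) and compare the two curves.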

Methodology and concepts

Core haplotype and haplotype structure

A core haplotype is defined by a short window of neighboring polymorphisms around a variant of interest. The choice of core—its genomic span and the allele it centers on—helps determine the sensitivity of the analysis. Researchers often compare haplotype segments carried by individuals who share the same core to those who do not, tracking the extent of shared ancestry as distance from the core increases. See core haplotype and haplotype for related concepts.
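Defining core haplotypes amounts to grouping chromosomes by their alleles in the chosen window, which also splits the sample into carriers and non-carriers of each core. The sketch below assumes the same '0'/'1' string encoding as a toy example; the window bounds and function names are hypothetical choices for illustration.

```python
from collections import defaultdict


def core_haplotypes(haplotypes, start, end):
    """Group chromosomes by their alleles in the core window [start, end].

    Returns {core haplotype string: [indices of chromosomes carrying it]}.
    """
    groups = defaultdict(list)
    for i, h in enumerate(haplotypes):
        groups[h[start:end + 1]].append(i)
    return dict(groups)


def core_frequencies(groups):
    """Sample frequency of each core haplotype, most common first."""
    n = sum(len(idx) for idx in groups.values())
    freqs = {core: len(idx) / n for core, idx in groups.items()}
    return dict(sorted(freqs.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sample; the core window spans marker indices 1 to 3.
sample = ["00100", "00100", "10100", "00111", "01011"]
groups = core_haplotypes(sample, start=1, end=3)
print(groups)                    # {'010': [0, 1, 2], '011': [3], '101': [4]}
print(core_frequencies(groups))  # {'010': 0.6, '011': 0.2, '101': 0.2}
```

Widening or narrowing the window changes how many distinct core haplotypes appear and how many chromosomes each contains, which is one concrete way the choice of core affects sensitivity.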

Calculation and interpretation

EHH is a probabilistic measure: it is the probability that two randomly chosen chromosomes carrying the core haplotype are identical at every marker from the core out to a given distance, which is taken as a proxy for being identical by descent (i.e., sharing the same ancestral segment) over that interval. EHH equals 1 at the core and, because extending the interval can only split carriers into more distinct haplotypes, it can only stay level or decline with distance as recombination and mutation erode similarity. A high EHH value at considerable distance from the core indicates that the region around the core has remained unusually intact, which is consistent with recent selection.
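The probability described above has a simple counting form when pairs of carrier chromosomes are drawn without replacement. The notation ($c$, $n_c$, $K$, $n_k$) is introduced here for illustration; the second line shows why EHH cannot increase with distance.

```latex
% c: the core allele or core haplotype; n_c: number of sampled chromosomes carrying it.
% Over the interval from the core to marker x, the carriers fall into K distinct
% extended haplotypes with counts n_1, ..., n_K (so n_1 + ... + n_K = n_c).
\mathrm{EHH}_c(x) \;=\; \frac{\sum_{k=1}^{K} \binom{n_k}{2}}{\binom{n_c}{2}}

% At the core every carrier is identical, so K = 1, n_1 = n_c and EHH_c = 1.
% Moving further out can only split a class of size a + b into classes a and b, and
\binom{a+b}{2} \;=\; \binom{a}{2} + \binom{b}{2} + ab \;\ge\; \binom{a}{2} + \binom{b}{2},
% so the numerator, and hence EHH_c(x), never increases with distance from the core.
```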

To broaden the toolkit, population geneticists use related statistics that build on EHH:

iHS (integrated Haplotype Score) compares the area under the EHH decay curves (integrated EHH) for the ancestral and derived alleles at a given SNP within a single population, helping detect incomplete sweeps.
XP-EHH (cross-population EHH) contrasts EHH decay at the same site between two populations to identify alleles that have reached or approached fixation in one population but remain polymorphic in the other.
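Both statistics in the list above rest on integrating EHH curves. The sketch below is a simplified illustration under stated assumptions: integration uses the trapezoid rule and stops before the first marker whose EHH falls below 0.05 (a commonly used cutoff, treated here as an assumption); the unstandardized iHS is ln(iHH_ancestral / iHH_derived); and the unstandardized XP-EHH is ln(I_A / I_B) for integrated EHH at the same site in populations A and B. In practice iHS is usually standardized within bins of derived-allele frequency and XP-EHH genome-wide; the `standardize` helper only shows the z-score step. Published tools differ in details such as gap handling and truncation, and all EHH values below are hypothetical.

```python
import math
import statistics


def integrate_ehh(distances, ehh_values, cutoff=0.05):
    """Trapezoidal area under one side of an EHH curve.

    `distances` start at the core (0.0) and increase away from it (e.g. in cM).
    Integration stops before the first marker whose EHH falls below `cutoff`.
    """
    area = 0.0
    for i in range(1, len(distances)):
        if ehh_values[i] < cutoff:
            break
        width = distances[i] - distances[i - 1]
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2
    return area


def ihh(left, right, cutoff=0.05):
    """Integrated EHH for one allele: sum of the areas on both sides of the core."""
    return integrate_ehh(*left, cutoff=cutoff) + integrate_ehh(*right, cutoff=cutoff)


def unstandardized_ihs(ihh_ancestral, ihh_derived):
    # Negative when the derived allele sits on longer haplotypes.
    return math.log(ihh_ancestral / ihh_derived)


def unstandardized_xpehh(ihh_pop_a, ihh_pop_b):
    # Positive when population A shows longer haplotypes at this site.
    return math.log(ihh_pop_a / ihh_pop_b)


def standardize(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical EHH curves (distances in cM from the core SNP).
derived = ihh(left=([0.0, 0.1, 0.2], [1.0, 0.9, 0.7]),
              right=([0.0, 0.1, 0.2, 0.3], [1.0, 0.8, 0.6, 0.4]))
ancestral = ihh(left=([0.0, 0.1, 0.2], [1.0, 0.3, 0.04]),
                right=([0.0, 0.1, 0.2, 0.3], [1.0, 0.4, 0.1, 0.02]))
print(round(derived, 3), round(ancestral, 3))  # 0.385 0.16
print(unstandardized_ihs(ancestral, derived))  # negative: derived haplotypes are longer
```

Taking the log ratio makes the score symmetric around zero, so long haplotypes on either allele (or in either population) show up as large absolute values.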

These methods are often discussed together because they share the same underlying idea—that the shape of haplotype blocks encodes information about selection and history. See iHS and XP-EHH for more.

Limitations and caveats

While powerful, EHH-based methods have important caveats:

Demography matters. Population bottlenecks, founder effects, migration and admixture, and complex population structure can produce long haplotypes without any selection. Proper interpretation requires considering demographic history and using simulations or complementary statistics.
Time window sensitivity. EHH is most informative for relatively recent events. Older sweeps leave weaker, harder-to-interpret signatures because recombination has had more time to break down the swept haplotype.
Recombination rate variation and sampling. Heterogeneous recombination landscapes and small sample sizes can affect the detection power and the apparent length of haplotypes.
Soft sweeps and standing variation. When selection acts on pre-existing variation or on several haplotypes simultaneously, the classic pattern of a single long haplotype may be attenuated, reducing EHH’s sensitivity.
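One common way to reduce the effect of recombination-rate variation noted in the list above is to measure EHH decay against genetic distance (cM) rather than physical distance (bp). The sketch below linearly interpolates genetic positions from a recombination map; the map values and the function name are hypothetical, and clamping positions outside the map to its ends is an assumption (real pipelines handle map edges in different ways).

```python
from bisect import bisect_right


def physical_to_genetic(pos_bp, map_bp, map_cm):
    """Interpolate a genetic position (cM) from a physical position (bp).

    `map_bp` must be strictly increasing and paired with cumulative `map_cm`.
    Positions outside the map are clamped to its ends (an assumption here).
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, pos_bp)  # map_bp[i - 1] <= pos_bp < map_bp[i]
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (pos_bp - x0) / (x1 - x0)


# Hypothetical map: 0.1 cM over the first 100 kb, then a hotspot adding 1.0 cM.
map_bp = [0, 100_000, 200_000]
map_cm = [0.0, 0.1, 1.1]
cold = physical_to_genetic(70_000, map_bp, map_cm) - physical_to_genetic(20_000, map_bp, map_cm)
hot = physical_to_genetic(170_000, map_bp, map_cm) - physical_to_genetic(120_000, map_bp, map_cm)
print(round(cold, 3), round(hot, 3))  # 0.05 0.5
```

The same 50 kb physical span covers ten times as much genetic distance inside the hotspot, so EHH plotted against base pairs would appear to decay much faster there even without any difference in selection history.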

Applications and notable examples

EHH and its derivatives have been applied across human populations and other species to pinpoint regions of recent adaptation and to understand the forces shaping genetic diversity.

Classic human example: selection at the LCT/MCM6 region. The lactase persistence allele, which enables adults to digest lactose, rose rapidly in several dairying populations, creating a pronounced haplotype signal around the lactase gene region. See lactase persistence and LCT for related discussion.
Immune and other regions: EHH-based analyses have identified signals in regions associated with immune function and environmental interactions, where recent adaptation may reflect pathogen pressures and dietary or ecological changes. See discussions of beta-globin in malaria-endemic regions and other immune-related loci.
Cross-population contrasts: Researchers have used XP-EHH to compare populations with distinct histories (for example, those with different agricultural practices or disease pressures) to reveal region-specific selective events. See XP-EHH for a methodological overview.
Model organisms and non-human species: Extended haplotype approaches are not limited to humans; they are used to study adaptation in populations of various species, providing a general framework for interpreting selection from haplotype structure. See population genetics for cross-species perspectives.

Controversies and debates

As with any method that infers historical processes from present-day genetic data, EHH-based approaches are subject to ongoing discussion. Key points in the debates include:

Distinguishing selection from demography: Critics emphasize that demographic history can mimic selection signals. Proponents respond that, when combined with simulations, multiple statistics (e.g., EHH, iHS, XP-EHH) and functional evidence, the method remains a valuable part of a convergent approach to inference.
Interpretation of signals across populations: Signals observed in one population may not generalize to others, raising questions about universal claims of adaptation. A cautious approach emphasizes replication, context, and functional validation rather than sweeping claims about universal human history.
Functional relevance vs. statistical footprint: There is ongoing discussion about how best to connect statistical signals to biological function. While a region may show extended haplotype structure, establishing causal variants and mechanisms requires experimental follow-up.
Policy and communication: In public discourse, there is concern about misinterpretation of population-genetic results as claims about groups or individuals. Responsible interpretation stresses that genetic variation reflects a complex history of migration, drift, selection, and recombination, and that findings should not be overextended to social categories or policy conclusions.
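The first point above, combining EHH-based scores with simulations, often comes down to asking how extreme an observed score is relative to scores simulated under a neutral demographic model. A minimal sketch, assuming the neutral scores have already been produced by such simulations (the values below are invented for illustration, not results) and that extremeness is judged by absolute value, as for |iHS|:

```python
def empirical_p_value(observed, null_scores):
    """Fraction of neutral scores at least as extreme (in absolute value) as
    `observed`, using the (r + 1) / (n + 1) convention so p is never zero."""
    r = sum(1 for s in null_scores if abs(s) >= abs(observed))
    return (r + 1) / (len(null_scores) + 1)


# Hypothetical standardized scores from neutral simulations.
null_scores = [0.1, -0.5, 1.2, -2.0, 0.3, 0.8, -1.1, 2.5, -0.2]
print(empirical_p_value(2.2, null_scores))  # 0.2
print(empirical_p_value(3.0, null_scores))  # 0.1, the smallest value possible here
```

Because the smallest attainable p-value is 1 / (n + 1), resolving the tail of the distribution requires many neutral replicates, and the result is only as trustworthy as the demographic model used to simulate them.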