Segmental Duplications
Segmental duplications (SDs) are large, highly similar copies of DNA segments that reside within the genome. They are typically defined as duplications at least 1 kilobase long with sequence identity of roughly 90% or higher. These elements cluster into blocks and are dispersed across chromosomes. They account for a substantial portion of genomic variation among individuals and species, acting as substrates for genetic innovation as well as a source of structural variation that can predispose to disease. In humans, segmental duplications constitute a meaningful fraction of the genome (commonly estimated at roughly 5% or more) and are enriched in particular regions, such as subtelomeric and pericentromeric zones, as well as areas rich in genes and regulatory sequences. Many SDs harbor full genes, gene fragments, or regulatory elements, influencing expression patterns and phenotypes in ways that are still being clarified.
From a genomic architecture standpoint, segmental duplications are not randomly scattered but structured components of genome organization. They form duplication blocks and can be organized around core duplicons, discrete sequence segments that seed further duplication events. This architecture fosters rapid expansion or contraction of particular gene families and can produce genomic regions subject to recurrent rearrangements. The net result is a genome that is both dynamic and, in places, especially prone to errors during meiosis or DNA repair.
Definition and genomic structure
Segmental duplications are large, near-identical DNA segments that have been copied and relocated within the genome. They are grouped into blocks that can span hundreds of kilobases and frequently appear in clusters. The high sequence identity among paralogous copies makes these regions challenging to assemble with short-read sequencing technologies, but long-read approaches have improved accuracy in recent years. The distribution of SDs tends to be nonuniform, with notable concentrations in regions that also harbor gene families or regulatory elements.
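The working definition used in this article (at least 1 kilobase long, roughly 90% identity or higher) can be written as a simple filter over pairwise alignments of a genome against itself. The following is a minimal sketch, not a description of any particular annotation pipeline: the `PairwiseAlignment` record, its field names, and the example coordinates are all hypothetical, and real analyses involve considerably more than a two-threshold check.

```python
from dataclasses import dataclass

# Thresholds taken from the working definition in the text; treat them as
# adjustable parameters rather than fixed constants.
MIN_LENGTH_BP = 1_000
MIN_IDENTITY = 0.90


@dataclass(frozen=True)
class PairwiseAlignment:
    """Hypothetical record for one alignment between two loci of the same genome."""
    chrom_a: str
    start_a: int
    chrom_b: str
    start_b: int
    aligned_length: int  # number of aligned bases
    matches: int         # number of identical aligned bases

    @property
    def identity(self) -> float:
        # Fraction of aligned bases that are identical between the two copies.
        if self.aligned_length == 0:
            return 0.0
        return self.matches / self.aligned_length


def is_candidate_sd(aln: PairwiseAlignment,
                    min_length: int = MIN_LENGTH_BP,
                    min_identity: float = MIN_IDENTITY) -> bool:
    """Return True if the alignment meets both the length and identity cutoffs."""
    return aln.aligned_length >= min_length and aln.identity >= min_identity


# Made-up example alignments (illustrative values only, not real annotations).
alignments = [
    PairwiseAlignment("chr1", 100_000, "chr1", 450_000, 12_000, 11_640),  # 97% identity
    PairwiseAlignment("chr2", 5_000, "chr7", 80_000, 800, 792),           # shorter than 1 kb
    PairwiseAlignment("chr3", 1_000, "chr9", 9_000, 5_000, 4_200),        # 84% identity
]

candidates = [aln for aln in alignments if is_candidate_sd(aln)]
```

Only the first alignment passes both cutoffs. The second fails on length despite its high identity, and the third fails on identity despite its length.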
Within SD blocks, a subset of sequences known as core duplicons acts as seeds for further duplication, creating a hierarchical landscape of repeats. This architecture supports both the maintenance of essential genes and the generation of novel gene copies, but it also predisposes these regions to misalignment and misrepair during cell division. The result can be copy-number variation among individuals, as well as recurrent rearrangements that underlie specific clinical syndromes.
Mechanisms and consequences
The principal mechanism by which segmental duplications influence genome structure is non-allelic homologous recombination (NAHR). When misalignment occurs between similar SD copies during meiosis or DNA repair, unequal crossing over can produce deletions, duplications, or more complex rearrangements. Such events have been linked to several well-characterized human disorders and to broader patterns of structural variation across populations. In some cases, these rearrangements are recurrent, producing characteristic and copy-number–dependent phenotypes.
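The way unequal crossing over yields a reciprocal deletion and duplication can be illustrated with a toy model. The sketch below assumes two identical homologs, each represented as an ordered list of segment labels, with two directly oriented copies of the same repeat flanking a gene. The function name and the segment labels are hypothetical, and the model ignores inversions and other complex outcomes.

```python
def unequal_crossover(chromosome, proximal, distal):
    """Toy model of NAHR between two direct repeats on misaligned homologs.

    `chromosome` is an ordered list of segment labels shared by both homologs.
    `proximal` and `distal` are the indices of the two paralogous repeat copies.
    The proximal copy on one homolog pairs with the distal copy on the other,
    and a crossover inside the paired repeat yields two reciprocal products.
    """
    if not 0 <= proximal < distal < len(chromosome):
        raise ValueError("expected 0 <= proximal < distal < len(chromosome)")
    if chromosome[proximal] != chromosome[distal]:
        raise ValueError("NAHR requires two copies of the same repeat")

    # Product 1: homolog A up to its proximal repeat, then homolog B after its
    # distal repeat. Everything between the two repeats is lost.
    deletion = chromosome[:proximal + 1] + chromosome[distal + 1:]

    # Product 2: homolog B up to its distal repeat, then homolog A after its
    # proximal repeat. Everything between the two repeats is present twice.
    duplication = chromosome[:distal + 1] + chromosome[proximal + 1:]

    return deletion, duplication


# Hypothetical locus: a gene flanked by two copies of the same SD.
locus = ["left_flank", "SD", "gene", "SD", "right_flank"]
deletion, duplication = unequal_crossover(locus, proximal=1, distal=3)

print(deletion)     # gene absent from this product
print(duplication)  # gene present twice in this product
```

In this model the deletion product carries no copy of the intervening gene and the duplication product carries two, which is the per-chromosome dosage change behind copy-number-dependent phenotypes.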
SDs also facilitate gene birth and death through mechanisms such as gene duplication, exon shuffling, and pseudogenization. The duplication of full genes or regulatory regions can create novel gene functions or expression patterns that may be adaptive in certain contexts but maladaptive in others. Notably, some SDs contain genes that contribute to sensory perception, immunity, or neurodevelopment, and their copy-number dynamics are a subject of intense study.
In evolutionary terms, segmental duplications have left a lasting imprint on lineage-specific variation. They have contributed to the expansion of gene families in primates and other mammals and are implicated in adaptive changes to physiology and development. Evidence from comparative genomics shows both conserved and rapidly evolving SD blocks across species, reflecting a balance between functional constraint and plasticity.
Disease associations and clinical relevance
Because SDs create structural variability in the genome, they are repeatedly implicated in clinically relevant rearrangements. Deletions and duplications mediated by SDs can disrupt dosage-sensitive genes or perturb regulatory networks, giving rise to a spectrum of disorders. Classic examples arise at loci flanked by SDs and include Williams-Beuren syndrome, associated with deletion at 7q11.23, and Charcot-Marie-Tooth disease type 1A, associated with duplication of the PMP22 gene. The 22q11.2 region is another hotspot where SD-rich architecture underlies microdeletions and microduplications with notable clinical consequences.
Beyond overt disease, SD-driven copy-number variation contributes to phenotypic diversity in quantitative traits and may influence susceptibility to complex diseases in ways that are still being deciphered. Population-level studies of SD variation help illuminate how these genomic features contribute to health disparities and personalized medicine, while also informing the limits of diagnostic interpretation in regions of the genome that are structurally complex.
Evolution, population genetics, and human diversity
Segmental duplications have played a pivotal role in the evolution of the human genome. They underpin notable human-specific changes, including expansions of certain gene families and regulatory networks that may influence brain development, sensory perception, and metabolism. For example, duplications involving neural development genes and related regulatory elements have been linked to lineage-specific traits, though the causal relationships are complex and often debated. Comparative genomics across primates reveals both conserved SD blocks and lineage-specific rearrangements, highlighting the dynamic nature of the genome over evolutionary time.
The study of segmental duplications also intersects with broader themes in population genetics and genomics. SD diversity among individuals and populations reflects historical recombination, selection, and drift, as well as recent demographic events. Advances in sequencing technologies and the move toward pangenome projects are helping researchers map SD variation more comprehensively, reducing biases that arise when relying on a single reference genome.
Detection, sequencing, and interpretation
Historically, segmental duplications were difficult to resolve with short-read data and standard genome assemblies. Modern approaches combine long-read sequencing, optical mapping, and targeted assembly strategies to improve resolution of SD blocks. Integrated analyses with short-read data, array-based platforms, and specialized software enable researchers to delineate copy-number changes and the precise breakpoints of rearrangements. Annotation of SDs continues to be refined as new assemblies and population-scale data become available.
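One common idea behind delineating copy-number changes from sequencing data is read-depth scaling: coverage over a region is compared with coverage over regions assumed to be present in two copies. The sketch below illustrates that idea only and is not the method of any specific tool. The function name, the parameter names, and the depth values are hypothetical, and the sketch ignores GC bias, mappability, and the ambiguous read placement that makes SD regions difficult in the first place.

```python
from statistics import median


def estimate_copy_number(region_depth, control_depths, control_copies=2):
    """Estimate copy number of a region from read depth.

    `region_depth` is the mean depth over the region of interest.
    `control_depths` are mean depths over regions assumed to be present in
    `control_copies` copies (2 for autosomal diploid sequence).
    The median of the controls is used as a baseline that tolerates outliers.
    """
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    return control_copies * region_depth / baseline


# Made-up depths for illustration only.
controls = [30.0, 29.0, 31.0, 30.0, 32.0]  # median is 30.0

gain = estimate_copy_number(45.0, controls)  # 2 * 45 / 30 = 3.0
loss = estimate_copy_number(15.0, controls)  # 2 * 15 / 30 = 1.0

print(round(gain), round(loss))
```

Estimates like these are best treated as provisional in SD-rich regions, where reads from paralogous copies can be assigned to the wrong locus. This is one reason why confirmation with orthogonal methods matters.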
In clinical genetics, interpreting SD-associated rearrangements requires careful consideration of breakpoint structure, gene content, and the broader genomic context. Not all SD-mediated variants are pathogenic; many contribute to normal variation. Clinicians and researchers therefore emphasize confirming findings with orthogonal methods and considering the full repertoire of structural variation when constructing a diagnostic or research interpretation.