Phylogenetic Networks
Phylogenetic networks are graphical models that extend the traditional tree representation of evolutionary history to accommodate reticulate events: situations where lineages mix, exchange genes, or otherwise fail to split cleanly. Unlike bifurcating trees, networks can contain reticulation nodes at which genetic material from distinct lineages comes together, producing histories that reflect processes such as hybridization, horizontal gene transfer, and recombination. This makes phylogenetic networks a valuable tool for studying organisms and genomic regions whose ancestral relationships cannot be captured by a single tree. In practice, phylogenetic networks are used across the tree of life: in bacteria and other microbes, where gene exchange is rampant, and in plants and some animals, where hybridization has shaped diversification. Hybridization and horizontal gene transfer are the central mechanisms motivating the use of networks, while recombination within genomes also creates non-tree-like histories.
The rise of phylogenetic networks mirrors a long-standing recognition that gene histories can conflict with species histories. Early phylogenetics emphasized trees under a simple, bifurcating model, but increasing amounts of data revealed widespread discordance among gene trees. Networks provide a way to visualize and quantify that discordance, offering a more realistic account of how evolution unfolds in the presence of reticulation. They range from distance-based representations that emphasize splits in the data to explicit graphical models that attempt to reconstruct particular reticulation events. In this sense, phylogenetic networks can be viewed as a bridge between traditional tree thinking and the more nuanced view of evolution as a reticulate process in many lineages.
This article surveys the core ideas, methods, and debates surrounding phylogenetic networks without assuming that a network always outperforms a tree. While networks can reveal reticulate histories that trees miss, they also introduce methodological challenges, including identifiability, sensitivity to data quality, and the risk of overfitting when reticulations are inferred from limited information. The practical choice between a network and a tree (or a tree with polytomies and reconciliation analyses) depends on the data, the evolutionary questions at hand, and the strength of the signal for reticulation.
Core concepts
A phylogenetic network is a graph that represents ancestry with the possibility of reticulation, where two or more lineages contribute to a common descendant. This structure can be rooted (time-directed) or unrooted (undirected). Phylogenetic networks may include explicit reticulation nodes or be derived from distance data that imply splits without naming a specific reticulation event.
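As a minimal sketch, assuming a toy encoding in which a rooted network is stored as a list of (parent, child) edges directed forward in time, the node types described above can be read off from in- and out-degrees: a reticulation node is one with two or more parents. The edge list, node names, and helper functions are hypothetical and are not a standard file format such as extended Newick.

```python
from collections import defaultdict

# Hypothetical toy network: edges point from ancestor to descendant.
# Node h has two parents (u and v), so it is a reticulation node.
EDGES = [
    ("root", "u"), ("root", "v"),
    ("u", "A"), ("u", "h"),
    ("v", "h"), ("v", "C"),
    ("h", "B"),
]

def classify_nodes(edges):
    """Label each node as root, leaf, reticulation, or tree by its degrees."""
    indeg, outdeg, nodes = defaultdict(int), defaultdict(int), set()
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1
        nodes.update((parent, child))
    labels = {}
    for n in nodes:
        if indeg[n] == 0:
            labels[n] = "root"
        elif outdeg[n] == 0:
            labels[n] = "leaf"
        elif indeg[n] >= 2:
            labels[n] = "reticulation"
        else:
            labels[n] = "tree"
    return labels

def is_acyclic(edges):
    """Kahn's topological sort: True if the directed graph has no cycle."""
    children, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for p, c in edges:
        children[p].append(c)
        indeg[c] += 1
        nodes.update((p, c))
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(nodes)

labels = classify_nodes(EDGES)
print(sorted(n for n, kind in labels.items() if kind == "reticulation"))  # ['h']
print(is_acyclic(EDGES))  # True
```

A rooted network must also be acyclic, since no lineage can be its own ancestor; `is_acyclic` checks this by attempting a topological ordering of the nodes.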
Reticulation events include hybridization (which can lead to hybrid speciation), horizontal gene transfer, and recombination. Each of these processes can leave different parts of the genome with different histories, which networks aim to capture.
Types of networks range from split networks, which summarize data as a collection of splits (bipartitions of the taxa) that need not be mutually compatible, to explicit networks that model particular evolutionary scenarios with reticulation nodes. Notable formulations include the ancestral recombination graph for within-genome histories and various species-network models that accommodate gene flow between species.
Rooted vs unrooted: many analyses impose a direction of time to reflect lineage divergence, while some representations emphasize the relationships among taxa without a specified root. Rooting often relies on outgroups, molecular clock dating, or other temporal information.
Types of phylogenetic networks
Split networks
- Constructed from distance matrices or collections of splits, these networks visualize conflicting signals in data without committing to a single reticulation scenario. They are useful for exploring the overall structure of discordance and can be produced by methods such as NeighborNet, implemented in software such as SplitsTree.
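Splits are compatible when they can appear together on a single tree; a collection of pairwise compatible splits corresponds to a tree, and it is the incompatible pairs that a split network draws as parallelograms. The sketch below, with made-up taxa and splits, applies the standard test: two splits A|X-A and B|X-B are compatible exactly when at least one of the four intersections of their sides is empty. The function names are illustrative.

```python
from itertools import combinations

def compatible(split1, split2, taxa):
    """Splits A|X-A and B|X-B are compatible iff at least one of
    A&B, A&(X-B), (X-A)&B, (X-A)&(X-B) is empty."""
    a, b, x = frozenset(split1), frozenset(split2), frozenset(taxa)
    a_bar, b_bar = x - a, x - b
    return any(not side for side in (a & b, a & b_bar, a_bar & b, a_bar & b_bar))

def incompatible_pairs(splits, taxa):
    """Every pair of splits that cannot be displayed on one tree."""
    return [(s, t) for s, t in combinations(splits, 2)
            if not compatible(s, t, taxa)]

# Hypothetical signal from different genes: {A,B}|{C,D} versus {A,C}|{B,D}.
taxa = {"A", "B", "C", "D"}
splits = [{"A", "B"}, {"A", "C"}]
print(incompatible_pairs(splits, taxa))
```

Each split is given by one of its sides; the other side is implied by the full taxon set.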
Hybridization networks
- These models explicitly encode hybridization events, representing species histories in which lineages merge when reproductive barriers break down or through other forms of gene flow. They are particularly relevant to groups with documented hybrid speciation.
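One way to see what a hybridization network encodes is to list its displayed trees: each reticulation node keeps one of its incoming edges and drops the rest. The sketch below assumes the same hypothetical (parent, child) edge-list encoding and reports each displayed tree as its set of nontrivial clusters (leaf sets below internal nodes); it does not suppress the degree-two nodes left behind, which a full implementation would.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy network: h is a hybrid of lineages u and v.
EDGES = [
    ("root", "u"), ("root", "v"),
    ("u", "A"), ("u", "h"),
    ("v", "h"), ("v", "C"),
    ("h", "B"),
]

def displayed_trees(edges):
    """Yield (choice, kept_edges), keeping exactly one incoming edge per
    reticulation node. r two-parent reticulations give 2**r edge choices
    (some choices may yield the same tree)."""
    parents = defaultdict(list)
    for p, c in edges:
        parents[c].append(p)
    retics = sorted(c for c, ps in parents.items() if len(ps) > 1)
    for picks in product(*(parents[r] for r in retics)):
        choice = dict(zip(retics, picks))
        kept = [(p, c) for p, c in edges if c not in choice or choice[c] == p]
        yield choice, kept

def clusters(edges, leaves):
    """Leaf sets below each internal node, excluding trivial clusters."""
    children = defaultdict(list)
    for p, c in edges:
        children[p].append(c)

    def below(n):
        if n in leaves:
            return frozenset([n])
        return frozenset().union(*(below(c) for c in children[n]))

    internal = {p for p, _ in edges}
    return {below(n) for n in internal if 1 < len(below(n)) < len(leaves)}

leaves = {"A", "B", "C"}
for choice, kept in displayed_trees(EDGES):
    print(choice, clusters(kept, leaves))
```

For this toy network the two displayed trees group B with A or with C, the two histories that the hybrid taxon B combines.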
Ancestral recombination graphs (ARGs)
- ARGs model the genealogical history of alleles along a genome in the presence of recombination. They provide a comprehensive framework for describing how different genomic regions trace back to different ancestral lineages.
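An ARG implies a sequence of local (marginal) trees along the genome, with a new local tree wherever a recombination breakpoint changes the genealogy. A minimal sketch, assuming the ARG has already been summarized as hypothetical breakpoint positions plus one Newick string per interval, looks up the genealogy at a given position. This summary discards the shared ancestral nodes that a full ARG records across intervals.

```python
from bisect import bisect_right

# Hypothetical ARG summary: interval start positions (bp, 0-based) and the
# local tree for each half-open interval [start, next_start).
BREAKPOINTS = [0, 1200, 3400]
LOCAL_TREES = ["((A,B),C);", "((A,C),B);", "((A,B),C);"]
SEQUENCE_LENGTH = 5000

def local_tree_at(position):
    """Return the local genealogy covering a 0-based genome position."""
    if not 0 <= position < SEQUENCE_LENGTH:
        raise ValueError("position outside the sequence")
    index = bisect_right(BREAKPOINTS, position) - 1
    return LOCAL_TREES[index]

def interval_lengths():
    """Number of base pairs covered by each local tree."""
    ends = BREAKPOINTS[1:] + [SEQUENCE_LENGTH]
    return [end - start for start, end in zip(BREAKPOINTS, ends)]

print(local_tree_at(2000))   # ((A,C),B);
print(interval_lengths())    # [1200, 2200, 1600]
```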
Species networks
- Broader models that capture gene flow among species, often within a multispecies coalescent framework. These networks aim to reconcile differences among gene trees by allowing a history with migration or introgression among species.
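In a common formulation of the multispecies coalescent on networks, the edges entering a reticulation node carry inheritance probabilities, here gamma and 1 - gamma, and every gene lineage reaching the reticulation independently traces back through the first parent with probability gamma. Under that assumption the number of lineages following the first parent is binomially distributed, as sketched below; the function name is illustrative.

```python
from math import comb

def allocation_probs(k, gamma):
    """P(j of k lineages at a reticulation follow the first parent), for
    j = 0..k, assuming each lineage chooses independently with probability
    gamma."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [comb(k, j) * gamma**j * (1 - gamma) ** (k - j) for j in range(k + 1)]

# Two sampled lineages from a hybrid population, gamma = 0.7.
probs = allocation_probs(2, 0.7)
print([round(p, 2) for p in probs])   # [0.09, 0.42, 0.49]
```

With these values there is a 0.42 chance that the two lineages follow different parents, in which case they cannot coalesce until the parental lineages meet further back in time.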
Inference methods
Distance- and split-based methods
- These approaches transform sequence data into measures of pairwise distances or splits, then summarize those signals in a network structure. They are computationally scalable and can handle large data sets, though they summarize conflicting signal rather than reconstructing specific reticulation events, so some historical detail is lost. Tools and algorithms associated with this approach include NeighborNet and related implementations in SplitsTree.
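As a sketch of the first steps of a distance-based workflow, the code below computes uncorrected p-distances from a made-up four-taxon alignment and applies the four-point condition to the quartet: for distances that fit a tree exactly, the two largest of the three pairwise sums are equal. It does not implement NeighborNet itself (which builds a circular ordering of taxa and fits split weights); it only shows how conflicting signal appears in distance data. Function names and sequences are hypothetical.

```python
def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name1, name2)."""
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x in alignment for y in alignment}

def four_point_sums(d, w, x, y, z):
    """The three pairwise sums for quartet w, x, y, z, largest first."""
    sums = [d[w, x] + d[y, z], d[w, y] + d[x, z], d[w, z] + d[x, y]]
    return sorted(sums, reverse=True)

def is_tree_like(sums, tol=1e-9):
    """True if the two largest sums agree within tol (tree-like quartet)."""
    return abs(sums[0] - sums[1]) <= tol

# Made-up alignment: site 4 groups A with B, site 9 groups A with C.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGTAC",
    "D": "TCGAACGTTC",
}
d = distance_matrix(alignment)
sums = four_point_sums(d, "A", "B", "C", "D")
print(is_tree_like(sums))  # False
```

Here the two smaller sums tie while the largest stands apart, reflecting equal support for AB|CD and AC|BD; no single tree fits these distances exactly. Real data rarely fit exactly, hence the tolerance parameter.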
Explicit inference of reticulations
- Methods in this category attempt to reconstruct specific reticulation events and their placement in the history. They often rely on extensions of the multispecies coalescent that allow introgression or hybridization, and may employ maximum likelihood or Bayesian inference. Notable software includes PhyloNet and BEAST extended to multispecies networks.
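Full network inference searches over topologies, branch lengths, and inheritance probabilities together. As a deliberately reduced sketch of the maximum-likelihood idea, the code below fixes everything except the inheritance probability gamma of a single hybrid taxon B and estimates gamma from counts of the three rooted gene-tree topologies for taxa A, B, and C. It assumes one allele per taxon and that, given the parent chosen by B's lineage, gene trees follow the standard three-taxon coalescent probabilities of the corresponding displayed tree: ((A,B),C) with internal branch t1, or ((B,C),A) with internal branch t2, both in coalescent units. Function names, branch lengths, and counts are hypothetical; the counts were chosen to be roughly what gamma = 0.3, t1 = 1.0, t2 = 0.5 would produce over 10,000 loci.

```python
from math import exp, log

def triplet_probs(t):
    """(matching, discordant, discordant) gene-tree probabilities for a rooted
    three-taxon species tree with an internal branch of t coalescent units."""
    discordant = exp(-t) / 3.0
    return 1.0 - 2.0 * discordant, discordant, discordant

def mixture_probs(gamma, t1, t2):
    """Gene-tree probabilities keyed by the cherry ('AB' means ((A,B),C))."""
    m1, d1, _ = triplet_probs(t1)   # displayed tree ((A,B),C), weight gamma
    m2, d2, _ = triplet_probs(t2)   # displayed tree ((B,C),A), weight 1 - gamma
    return {
        "AB": gamma * m1 + (1 - gamma) * d2,
        "BC": gamma * d1 + (1 - gamma) * m2,
        "AC": gamma * d1 + (1 - gamma) * d2,
    }

def log_likelihood(gamma, counts, t1, t2):
    """Multinomial log-likelihood of the topology counts (up to a constant)."""
    probs = mixture_probs(gamma, t1, t2)
    return sum(n * log(probs[topo]) for topo, n in counts.items())

def estimate_gamma(counts, t1, t2, steps=1000):
    """Maximum-likelihood gamma by grid search over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda g: log_likelihood(g, counts, t1, t2))

counts = {"AB": 3680, "BC": 4537, "AC": 1783}
print(estimate_gamma(counts, t1=1.0, t2=0.5))
```

The estimate should land close to 0.3. Because each probability is linear in gamma, the log-likelihood is concave in gamma and a simple grid search suffices here; real tools need far more elaborate searches.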
Ancestral recombination and locus-specific histories
- Some analyses focus on jointly inferring local genealogies across the genome, acknowledging that different regions have different histories due to recombination. This line of work interfaces with the general theory of coalescent processes in structured populations.
Data and interpretation
Data types
- Multilocus data, gene trees, genome-scale alignments, and SNP matrices are common inputs. The choice of data type influences the detectability of reticulation signals and the confidence in inferred networks.
Gene tree discordance and signal separation
- Gene trees can disagree because of incomplete lineage sorting, recombination, hybridization or introgression, and horizontal transfer. Distinguishing these sources of discordance is a central challenge in network inference and interpretation.
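The amount of discordance expected from incomplete lineage sorting alone can be worked out for three taxa. Assuming one allele per taxon, a rooted species tree ((A,B),C), and an internal branch of length t in coalescent units, the standard multispecies-coalescent gene-tree probabilities are:

```latex
P\big(((A,B),C)\big) = 1 - \tfrac{2}{3}e^{-t}, \qquad
P\big(((A,C),B)\big) = P\big(((B,C),A)\big) = \tfrac{1}{3}e^{-t}
```

For example, t = 0.5 gives e^{-0.5} ≈ 0.607, so each discordant topology has probability ≈ 0.202 and the matching topology ≈ 0.596: about 40% of gene trees disagree with the species tree without any reticulation. Under incomplete lineage sorting alone the two discordant topologies are equally frequent, so a consistent excess of one over the other is one signal used to separate gene flow from incomplete lineage sorting.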
Practical considerations
- Networks can become complex quickly as more reticulations are included. Researchers must balance model complexity, data quality, and interpretability, often using cross-validation, simulation-based checks, or information criteria to avoid overfitting.
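Information criteria trade goodness of fit against the number of free parameters. The sketch below applies the usual formulas, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, to choose among models with different numbers of reticulations. The model labels, log-likelihoods, parameter counts, and sample size n (taken here as the number of loci) are all hypothetical, and how to count parameters for a given network model is an assumption the analyst must make.

```python
from math import log

def aic(log_lik, k):
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion for n observations; its per-parameter
    penalty (ln n) exceeds AIC's (2) once n > 7."""
    return k * log(n) - 2 * log_lik

def best_model(fits, n, criterion="bic"):
    """fits maps a model label to (maximized log-likelihood, free parameters)."""
    def score(item):
        _, (ll, k) = item
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n)
    return min(fits.items(), key=score)[0]

# Hypothetical fits: each extra reticulation is assumed to add parameters
# (for example an inheritance probability and branch lengths).
fits = {
    "tree": (-1520.0, 5),
    "1 reticulation": (-1500.0, 7),
    "2 reticulations": (-1498.5, 9),
}
print(best_model(fits, n=200))                   # 1 reticulation
print(best_model(fits, n=200, criterion="aic"))  # 1 reticulation
```

With these made-up values both criteria favour one reticulation: the second reticulation improves the log-likelihood by only 1.5 units, which does not offset its two extra parameters.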
Applications
Bacteria and archaea
- In prokaryotes, horizontal gene transfer is pervasive, and network representations help describe the mosaic nature of genomes and the flow of genes across lineages.
Plants and hybrid zones
- Plant lineages frequently exhibit hybridization and reticulate speciation, where networks illuminate patterns of introgression and the origins of polyploid lineages.
Animals and complex speciation
- In some animals, hybridization and introgression have shaped clades, contributing to reticulate histories that networks can help to summarize alongside more traditional trees.
Controversies and debates
When to use networks vs trees
- Critics argue that networks can overfit data or imply reticulations where none exist, especially when signals are weak or data are sparse. Proponents contend that networks provide a more faithful representation when reticulation processes are common, and that failing to account for them risks misinterpreting evolutionary history. The choice often hinges on data quality, the evolutionary context, and the strength of the statistical signal for reticulation.
Identifiability and model complexity
- A recurring concern is identifiability: given noisy data, different network topologies can fit similarly well, making it hard to distinguish true reticulations from artifacts. This has led to a push for model-based approaches, rigorous hypothesis testing, and transparent reporting of uncertainty.
Interpretability of reticulations
- Even when reticulations are detected, interpreting their biological meaning can be challenging. Researchers must distinguish between ancient gene flow, recent introgression, and methodological artifacts, which can lead to divergent conclusions about the history of a lineage.
Data requirements and computational demands
- Network inference can be data- and computation-intensive, particularly for large genomes or many taxa. Critics point to diminishing returns in certain cases, while supporters emphasize the incremental gains in accuracy for groups where reticulation is well-supported.