Tamura-Nei model
The Tamura-Nei model, often abbreviated TN93, is a nucleotide substitution model used in molecular phylogenetics to describe how DNA sequences evolve over time. It relaxes the simplest assumptions by allowing unequal stationary base frequencies and by assigning separate rates to transitions between purines (A↔G), transitions between pyrimidines (C↔T), and transversions (all other substitutions). This makes TN93 more flexible than earlier models such as the Jukes-Cantor and Kimura two-parameter models while remaining computationally tractable for many analyses in phylogenetics and molecular evolution.
Introduced in 1993 by Tamura and Nei, the model has become a workhorse for distance calculations and likelihood-based tree reconstruction. It sits between the most basic models (which assume equal base frequencies and a single substitution rate) and the most general frameworks (which may require many parameters or complex context dependence). TN93 is particularly popular when the data suggest biased base composition and different rates for purine and pyrimidine transitions, providing a pragmatic balance between realism and efficiency. It is available in software suites such as MEGA and RAxML.
Model definition
Basic structure
The Tamura-Nei model is a continuous-time Markov process on DNA sequences. It assumes a stationary distribution of bases with frequencies pi_A, pi_G, pi_C, pi_T that sum to one, and it distinguishes three kinds of substitutions between distinct bases i ≠ j:
- transitions between purines (A↔G) with rate parameter k1,
- transitions between pyrimidines (C↔T) with rate parameter k2,
- all other substitutions (the transversions) with rate parameter k3.
Under the model, the off-diagonal elements of the rate matrix Q are given by:
- Q_{AG} = k1 * pi_G, Q_{GA} = k1 * pi_A
- Q_{CT} = k2 * pi_T, Q_{TC} = k2 * pi_C
- Q_{AC} = k3 * pi_C, Q_{AT} = k3 * pi_T
- Q_{GC} = k3 * pi_C, Q_{GT} = k3 * pi_T
- Q_{CA} = k3 * pi_A, Q_{CG} = k3 * pi_G
- Q_{TA} = k3 * pi_A, Q_{TG} = k3 * pi_G
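The diagonal entries are set so that each row sums to zero: Q_{ii} = -∑_{j ≠ i} Q_{ij}. The matrix is typically scaled so that the expected rate of substitution per unit time at stationarity, -∑_i pi_i Q_{ii}, equals one. Branch lengths can then be read as expected substitutions per site.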
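The construction above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code of any particular package: the function name `tn93_rate_matrix`, the base order A, G, C, T, and the `normalize` flag are assumptions made for the example.

```python
import numpy as np

BASES = "AGCT"  # assumed ordering, chosen to match pi_A, pi_G, pi_C, pi_T


def tn93_rate_matrix(pi, k1, k2, k3, normalize=True):
    """Return the 4x4 TN93 rate matrix Q for bases ordered A, G, C, T."""
    pi = np.asarray(pi, dtype=float)
    if pi.shape != (4,) or not np.isclose(pi.sum(), 1.0):
        raise ValueError("pi must hold four frequencies that sum to one")

    # Rate parameter for each pair: k3 everywhere (transversions),
    # then k1 for A<->G and k2 for C<->T.
    k = np.full((4, 4), float(k3))
    k[0, 1] = k[1, 0] = k1
    k[2, 3] = k[3, 2] = k2

    # Q_ij = k_ij * pi_j; broadcasting multiplies column j by pi_j.
    q = k * pi
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))  # each row now sums to zero

    if normalize:
        # Expected substitutions per unit time at stationarity.
        rate = -np.dot(pi, np.diag(q))
        q = q / rate
    return q
```

Because Q_ij is a symmetric rate parameter times the target frequency pi_j, the product pi_i * Q_ij is symmetric in i and j, which is the detailed-balance condition that makes the model time-reversible.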
Parameters and normalization
- Base frequencies: pi_A, pi_G, pi_C, pi_T reflect the stationary composition of the sequences.
- Rate parameters: k1 (A↔G transitions), k2 (C↔T transitions), k3 (transversions).
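- Normalization: the matrix is scaled to fix the overall substitution rate, so that branch lengths are comparable across analyses. After scaling, only the ratios among k1, k2, and k3 affect the model, which leaves two free rate parameters.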
Special cases and relationships
- If k1 = k2, the two transition classes share one rate and TN93 reduces to the HKY85 model. If in addition the base frequencies are equal (pi_A = pi_G = pi_C = pi_T = 0.25), it reduces to the Kimura two-parameter model.
- If all three rate parameters are equal and base frequencies are equal, TN93 reduces to the Jukes-Cantor model, which treats all substitutions as having the same rate. With equal rate parameters but unequal base frequencies, it reduces to the Felsenstein 1981 model.
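These reductions can be checked numerically by counting how many distinct off-diagonal rates the matrix contains. The sketch below is only an illustration: the helper names `tn93_q` and `distinct_offdiag_rates` are hypothetical, and the bases are assumed to be ordered A, G, C, T.

```python
import numpy as np


def tn93_q(pi, k1, k2, k3):
    """Unnormalized TN93 rate matrix, bases ordered A, G, C, T."""
    pi = np.asarray(pi, dtype=float)
    k = np.full((4, 4), float(k3))
    k[0, 1] = k[1, 0] = k1  # A<->G
    k[2, 3] = k[3, 2] = k2  # C<->T
    q = k * pi              # Q_ij = k_ij * pi_j
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


def distinct_offdiag_rates(q):
    """Sorted distinct off-diagonal rates, rounded to absorb float noise."""
    off = q[~np.eye(4, dtype=bool)]
    return sorted(set(np.round(off, 12).tolist()))


equal = [0.25] * 4
jc_like = distinct_offdiag_rates(tn93_q(equal, 1.0, 1.0, 1.0))
k2p_like = distinct_offdiag_rates(tn93_q(equal, 4.0, 4.0, 1.0))
full = distinct_offdiag_rates(tn93_q(equal, 4.0, 2.0, 1.0))
print(len(jc_like), len(k2p_like), len(full))  # prints "1 2 3"
```

With equal frequencies, one rate corresponds to the Jukes-Cantor pattern, two rates (transitions versus transversions) to a Kimura-style two-parameter pattern, and three rates to the full TN93 pattern.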
Applications and implementation
TN93 is widely used for estimating evolutionary distances and for likelihood-based phylogenetic inference. It is implemented in many software packages for sequence analysis, including MEGA, PhyML, RAxML, and BEAST (often as a selectable nucleotide substitution model). Researchers choose TN93 when the data show unequal base composition and different rates for purine and pyrimidine transitions. In such cases it offers a better fit than simpler models without the parameter burden of more general frameworks such as the general time reversible (GTR) model, which has six exchangeability parameters where TN93 has three.
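For pairwise distances, the model admits a closed-form estimator based on the observed proportions of A↔G differences (P1), C↔T differences (P2), and transversional differences (Q). The sketch below illustrates that formula under stated assumptions: the function name `tn93_distance` is hypothetical, base frequencies are estimated by pooling the two sequences, and columns containing gaps or ambiguity codes are skipped. Real packages may treat these details differently.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93_distance(seq1, seq2):
    """Estimate the TN93 distance (substitutions per site) for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    # Assumption: keep only columns where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # Assumption: base frequencies are pooled over both sequences.
    counts = Counter(a for a, _ in pairs) + Counter(b for _, b in pairs)
    pi = {base: counts[base] / (2 * n) for base in "ACGT"}
    pi_r = pi["A"] + pi["G"]
    pi_y = pi["C"] + pi["T"]
    ag = pi["A"] * pi["G"]
    ct = pi["C"] * pi["T"]

    # Observed proportions of the three kinds of difference.
    p1 = sum({a, b} == PURINES for a, b in pairs) / n      # A<->G
    p2 = sum({a, b} == PYRIMIDINES for a, b in pairs) / n  # C<->T
    q = sum((a in PURINES) != (b in PURINES) for a, b in pairs) / n  # transversions

    try:
        w1 = 1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r)
        w2 = 1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y)
        w3 = 1 - q / (2 * pi_r * pi_y)
        return (-2 * ag / pi_r * math.log(w1)
                - 2 * ct / pi_y * math.log(w2)
                - 2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y) * math.log(w3))
    except (ZeroDivisionError, ValueError):
        # A base is absent or a log argument is non-positive: the formula does not apply.
        return float("nan")
```

When all four frequencies are 0.25 and P1 = P2, the expression collapses to the Kimura two-parameter distance, which offers a convenient sanity check. Returning NaN for inapplicable cases (for example, heavily saturated pairs) is a design choice of this sketch, not a convention taken from the source.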
Practical considerations and limitations
When TN93 is appropriate
- Datasets exhibit biased base frequencies (not all bases occur equally often).
- There is evidence that A↔G transitions occur at a different rate from C↔T transitions, while all transversions can reasonably be described by a single shared rate parameter.
- A balance between model realism and computational efficiency is desirable, especially for large alignments or exploratory analyses.
Limitations and common criticisms
- Time-reversibility and stationarity assumptions may be violated in datasets where base composition changes over time or across lineages.
- Like many substitution models, TN93 assumes rate homogeneity across sites unless additional components (e.g., gamma-distributed rate variation, +I for invariable sites) are explicitly included.
- It does not incorporate codon structure, selection on amino acids, or context-dependent effects such as CpG hypermutability, which can be important in vertebrate genomes.
- TN93 can underfit when the data support patterns it cannot represent, such as unequal transversion rates, and it can overfit small alignments for which a simpler model suffices. In such cases, model selection criteria like AIC or BIC may favor alternatives such as the general time reversible model (GTR), codon-based models, or simpler nested models.
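As an illustration of how such criteria weigh fit against parameter count, the sketch below compares three nested models. The log-likelihoods and alignment length are hypothetical values invented for the example, not results from any dataset. The parameter counts cover only the substitution model (three free base frequencies plus free rate ratios) and assume branch lengths are shared across the models being compared. Using the number of alignment sites as the BIC sample size is a common convention, taken here as an assumption.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_sites):
    """Bayesian information criterion with sample size n_sites: lower is better."""
    return n_params * math.log(n_sites) - 2 * log_lik


# Hypothetical values for illustration only.
n_sites = 1200
fits = {
    "HKY85": {"log_lik": -5234.8, "k": 4},  # 3 frequencies + 1 rate ratio
    "TN93": {"log_lik": -5229.1, "k": 5},   # 3 frequencies + 2 rate ratios
    "GTR": {"log_lik": -5227.9, "k": 8},    # 3 frequencies + 5 rate ratios
}

scores = {
    name: (aic(f["log_lik"], f["k"]), bic(f["log_lik"], f["k"], n_sites))
    for name, f in fits.items()
}
best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])
print(best_aic, best_bic)
```

With these made-up numbers, the extra likelihood gained by GTR does not offset its three additional parameters, so both criteria select TN93. Different data can reverse that ordering.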
Related approaches
- Other nucleotide substitution models include the Jukes-Cantor model (JC) and the Kimura models (K2P and K3P), which offer different levels of complexity and assumptions about base frequencies and transition/transversion rates.
- For datasets where rate variation among sites is important, TN93 is commonly augmented with gamma-distributed rates across sites (+G) or a proportion of invariable sites (+I), or combined with approaches that account for codon structure or context dependence.
- In many analyses, TN93 serves as a stepping stone toward more comprehensive frameworks such as the general time reversible model (GTR) and its many extensions.