Tree Data Structure
Tree data structures are fundamental in computer science, providing a way to model hierarchical relationships and support efficient algorithms for searching, inserting, and deleting elements. A tree is a connected acyclic graph with a distinguished node called the root. Each node has zero or more children and, except for the root, exactly one parent. The structure naturally represents nested data, such as parse trees in compilers, expression trees in mathematics and programming, and directory hierarchies in file systems. The performance of common operations is closely tied to the tree’s shape, with shallow, balanced trees offering faster worst-case guarantees than highly skewed, chain-like ones. See Tree (data structure) and Graph (data structure) for broader context.
Different variants adapt the basic idea to different domains. Binary trees, for example, limit each node to at most two children and form the backbone of many ordered search structures. Other families, such as the AVL tree and Red-Black Tree, are designed to keep height small to ensure fast operations. Disk-oriented storage systems frequently use B-tree and its relatives to keep data pages close to each other, reducing disk I/O. In programming languages and tooling, trees appear as Parse tree and as Abstract syntax tree representations that capture the syntactic structure of source code. The tree model also underpins data structures used in priority queues, heaps, and various indexing schemes employed in databases and search systems.
This article surveys the core concepts, common variants, fundamental operations, and typical applications of tree data structures. It emphasizes the underlying ideas that recur across many specific implementations, and it points to related topics such as traversal methods, balancing techniques, and alternative representations.
Basic concepts
- Node: a basic element of a tree, often containing a value and links to its children. See Node (data structure).
- Root: the unique top node with no parent. See Root (data structure).
- Edge: a connection between a node and one of its children. See Edge (graph theory).
- Parent and child: a node’s parent is its immediate ancestor; a node’s children are its immediate descendants. See Parent (data structure) and Child (graph theory).
- Leaf: a node with no children. See Leaf (data structure).
- Subtree: a node together with all its descendants. See Subtree.
- Path: a sequence of edges linking two nodes. See Path (graph theory).
- Height and depth: the height of a tree is the length of the longest path from the root to a leaf (and the height of a node is the longest downward path from that node to a leaf), while the depth of a node is the length of the path from the root to that node. See Height (graph theory) and Depth (graph theory).
- Degree: in the rooted-tree context, usually the number of children of a node; in graph-theoretic terms, the number of neighbors, which also counts the parent. See Degree (graph theory).
- Ancestor and descendant: an ancestor of a node is any node on the path from the root to it (other than the node itself); a descendant is any node in its subtree (other than itself). See Ancestor (graph theory) and Descendant. A short sketch after this list illustrates several of these terms in code.
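To make these terms concrete, here is a minimal sketch of a pointer-based node together with depth and height computations; the names TreeNode, depth, and height are illustrative choices for this article, not a standard API.

```python
class TreeNode:
    """A minimal pointer-based tree node: a value plus links to parent and children."""
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent      # None for the root
        self.children = []        # zero or more child nodes

    def add_child(self, value):
        child = TreeNode(value, parent=self)
        self.children.append(child)
        return child

    def is_leaf(self):
        return not self.children

def depth(node):
    """Number of edges on the path from the root down to this node."""
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d

def height(node):
    """Length of the longest downward path from this node to a leaf."""
    if node.is_leaf():
        return 0
    return 1 + max(height(child) for child in node.children)

# Example: a root with two children, one of which has its own child.
root = TreeNode("root")
a = root.add_child("a")
b = root.add_child("b")
c = a.add_child("c")
print(height(root))  # 2: the longest root-to-leaf path is root -> a -> c
print(depth(c))      # 2: c lies two edges below the root
```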
Notation and formal definitions
- A tree is a connected acyclic graph with a designated root node. Each non-root node has exactly one parent, and the set of descendants of a node forms a subtree; the sketch after this list checks these defining properties on a small example. See Tree (data structure) for the formal notion and common terminology.
- A rooted tree is one with a specified root; an unrooted (or free) tree is simply a connected acyclic graph with no distinguished node, and a hierarchy arises only once a root is chosen. See Rooted tree and Unrooted tree for that distinction in the literature.
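One convenient formalization stores the parent of every node. The sketch below, an illustration under that assumption rather than a canonical definition, checks that a parent mapping describes a valid rooted tree: exactly one root, and every node reaches the root along parent links without revisiting any node (so the structure is connected and acyclic).

```python
def is_rooted_tree(parent):
    """Check whether a {node: parent} mapping encodes a valid rooted tree.

    `parent` maps every node to its parent, with the root mapping to None.
    A valid rooted tree has exactly one root, and every node reaches the
    root by following parent links without ever revisiting a node.
    """
    roots = [n for n, p in parent.items() if p is None]
    if len(roots) != 1:
        return False
    for node in parent:
        seen = set()
        while node is not None:
            if node in seen:          # revisited a node: there is a cycle
                return False
            seen.add(node)
            node = parent[node]       # walk toward the root
    return True

# A valid tree rooted at 'r', and an invalid mapping containing a cycle.
print(is_rooted_tree({"r": None, "a": "r", "b": "r", "c": "a"}))  # True
print(is_rooted_tree({"r": None, "a": "b", "b": "a"}))            # False
```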
Structure and representation
- Rooted trees vs. unrooted trees: In many algorithms, the root provides a natural starting point for traversals and balancing decisions. See Rooted tree.
- Representations: Trees can be stored with explicit pointers between nodes or in array-based layouts, especially when the shape is regular. Common representations include:
- Left-child right-sibling representation, which encodes a general tree with a binary structure: each node stores only its first child and its next sibling. A minimal sketch appears after this list. See Left-child Right-sibling representation.
- First-child/next-sibling representation, another common name for the same encoding, emphasizing its use for compact storage of general trees. See First-child/next-sibling representation.
- Binary tree layouts, where each node has at most two children, often stored in arrays for compact indexing. See Binary tree.
- Shape and balance: The arrangement of nodes affects operation costs. A perfectly balanced tree minimizes height (roughly log n for n nodes), while skewed or degenerate trees can reach height n − 1 and degrade height-dependent operations to linear time. See Balance (tree data structure) and discussions of Tree balancing in specific variants.
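The following is a minimal sketch of the left-child right-sibling encoding, converting a general tree written as nested (value, children) tuples into binary-shaped nodes that store only a first child and a next sibling; LCRSNode and to_lcrs are illustrative names, not standard identifiers.

```python
class LCRSNode:
    """Binary-shaped node for the left-child right-sibling encoding."""
    def __init__(self, value):
        self.value = value
        self.first_child = None   # plays the role of the "left" pointer
        self.next_sibling = None  # plays the role of the "right" pointer

def to_lcrs(value, children):
    """Convert a general tree given as (value, [children...]) into LCRS form."""
    node = LCRSNode(value)
    prev = None
    for child_value, grandchildren in children:
        child = to_lcrs(child_value, grandchildren)
        if prev is None:
            node.first_child = child      # the first child hangs off the parent
        else:
            prev.next_sibling = child     # later children chain off their sibling
        prev = child
    return node

# A root with three children; the third child has one child of its own.
tree = ("root", [("a", []), ("b", []), ("c", [("d", [])])])
lcrs = to_lcrs(*tree)
print(lcrs.first_child.value)                            # 'a'
print(lcrs.first_child.next_sibling.value)               # 'b'
print(lcrs.first_child.next_sibling.next_sibling.value)  # 'c'
```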
Variants and examples
- Binary trees: A node has at most two children (commonly referred to as left and right). Binary tree forms the basis for many search and heap variants.
- Self-balancing binary search trees: Maintain a height-balanced shape to guarantee logarithmic operation times.
- AVL tree: Rotations keep the heights of every node's two subtrees within one of each other, which bounds the overall height to O(log n).
- Red-Black Tree: A looser balancing scheme using color properties to ensure logarithmic height.
- Splay tree: Recently accessed elements are moved toward the root to improve locality of reference.
- Treap: A randomized structure combining a binary search tree with a heap property.
- B-tree and variants: Designed for block-oriented storage systems; their high fan-out keeps the tree shallow and matches node size to disk pages, reducing disk I/O.
- Tries and prefix trees: Specialized trees for associative arrays with string keys, useful in text processing and dictionaries. See Trie.
- Heaps: A tree satisfying the heap property (in a min-heap, no parent is larger than its children), used to implement priority queues; a minimal array-based sketch follows this list. See Heap (data structure).
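Below is a minimal sketch of an array-based binary min-heap, which also illustrates the implicit array layout mentioned under representations: the children of the node at index i sit at indices 2i + 1 and 2i + 2. The function names are illustrative; in practice Python's standard heapq module provides the same operations.

```python
def heap_push(heap, item):
    """Insert item, then sift it up until its parent is no larger."""
    heap.append(item)
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent

def heap_pop(heap):
    """Remove and return the smallest item, restoring the heap by sifting down."""
    smallest = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        i, n = 0, len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            best = i
            if left < n and heap[left] < heap[best]:
                best = left
            if right < n and heap[right] < heap[best]:
                best = right
            if best == i:
                break
            heap[i], heap[best] = heap[best], heap[i]
            i = best
    return smallest

heap = []
for x in [5, 1, 4, 2, 3]:
    heap_push(heap, x)
print([heap_pop(heap) for _ in range(len(heap))])  # [1, 2, 3, 4, 5]
```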
Operations and algorithms
- Insertion, deletion, and search: Core operations for most tree types, with performance depending on height. Balanced trees ensure O(log n) worst-case time for these operations, while unbalanced trees can degrade to O(n) in the worst case; a binary search tree sketch at the end of this section illustrates insertion and search.
- Traversal methods: Systematic ways to visit all nodes. See Tree traversal for general techniques and algorithms; the first sketch at the end of this section implements the four orders below.
- Preorder traversal
- Inorder traversal
- Postorder traversal
- Level-order traversal (breadth-first)
- Balancing and rotations: Many trees employ local restructuring to maintain height bounds.
- Tree rotations are the primary operation in balancing procedures for several self-balancing BSTs; a short rotation sketch also appears at the end of this section. See Tree rotation.
- Path properties and locality: Tree layouts can improve data locality and cache performance during traversals and queries.
- Specialized operations: Merging trees, splitting trees, and maintaining aggregate data (such as subtree sums or minimums) during updates. See Augmented tree for examples.
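As a concrete example of the operations above, here is a sketch of a plain (unbalanced) binary search tree with insertion, search, and the four traversal orders; all names are illustrative. Inserting keys in sorted order would degenerate this tree into a chain, which is precisely the case the self-balancing variants avoid.

```python
from collections import deque

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into an (unbalanced) binary search tree; returns the root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicates are ignored in this sketch

def search(root, key):
    """Return True if key is present; cost is proportional to the tree's height."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

def preorder(node):
    if node:
        yield node.key
        yield from preorder(node.left)
        yield from preorder(node.right)

def inorder(node):
    if node:
        yield from inorder(node.left)
        yield node.key          # visits keys in sorted order for a BST
        yield from inorder(node.right)

def postorder(node):
    if node:
        yield from postorder(node.left)
        yield from postorder(node.right)
        yield node.key

def level_order(node):
    queue = deque([node] if node else [])
    while queue:
        current = queue.popleft()
        yield current.key
        if current.left:
            queue.append(current.left)
        if current.right:
            queue.append(current.right)

root = None
for key in [5, 3, 8, 1, 4]:
    root = insert(root, key)
print(list(inorder(root)))      # [1, 3, 4, 5, 8]
print(list(preorder(root)))     # [5, 3, 1, 4, 8]
print(list(level_order(root)))  # [5, 3, 8, 1, 4]
print(search(root, 4), search(root, 7))  # True False
```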
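And here is a hedged sketch of the rotations themselves, on a minimal node type defined locally for the example; real self-balancing trees (AVL, red-black) combine such rotations with bookkeeping, such as height or color updates, that is not shown here.

```python
class Node:
    """Minimal BST-style node for the rotation example."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(node):
    """Left rotation: the right child becomes the new root of this subtree.

    The binary-search-tree ordering is preserved: keys smaller than node.key
    stay to its left, and the right child's left subtree (keys between the
    two pivot keys) becomes node's new right subtree.
    """
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot  # new subtree root

def rotate_right(node):
    """Mirror image of rotate_left."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot

# A right-leaning chain 1 -> 2 -> 3 becomes balanced after one left rotation.
chain = Node(1, right=Node(2, right=Node(3)))
balanced = rotate_left(chain)
print(balanced.key, balanced.left.key, balanced.right.key)  # 2 1 3
```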
Applications
- Data organization and indexing: Trees enable hierarchical data models and efficient range queries in databases and file systems. See B-tree and B+ tree for database indexing contexts.
- Compilers and interpreters: Abstract syntax trees capture the syntactic structure of source code, while parse trees arise from grammatical analysis. See Abstract syntax tree and Parse tree.
- Expression evaluation and symbolic computation: Expression trees encode algebraic formulas, with operators at internal nodes and operands at the leaves, and support transformations and simplifications; a minimal evaluation sketch follows this list.
- Text processing and dictionaries: Tries provide fast prefix lookups and autocomplete support.
- Routing and file systems: Directory trees and hierarchical namespaces model real-world organization and access patterns.
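As a small illustration of expression trees, the sketch below evaluates an arithmetic expression encoded as nested tuples, visiting children before their operator (a postorder-style recursion); the tuple encoding and the evaluate name are illustrative choices.

```python
import operator

# Internal nodes are (operator, left, right); leaves are plain numbers.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(node):
    """Evaluate an expression tree bottom-up (a postorder traversal)."""
    if not isinstance(node, tuple):
        return node                   # a leaf: just a number
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

# The tree for (2 + 3) * 4:
expr = ("*", ("+", 2, 3), 4)
print(evaluate(expr))  # 20
```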
Implementations and considerations
- Memory layout and locality: The choice between pointer-based representations and array-based layouts affects memory usage and cache efficiency. See Binary tree and Left-child Right-sibling representation for common options.
- Balancing costs: Self-balancing trees incur overhead for rotations and maintenance but offer predictable performance. The trade-offs depend on workload characteristics, such as insert-heavy versus search-heavy patterns.
- Practical considerations: When data are large and stored on disk, structures like B-tree-family indexes reduce disk I/O and improve throughput. In-memory trees may favor simpler implementations like plain binary trees or augmented trees when balancing is less critical.