Follow SetEdit

In formal language theory, the follow set is a foundational concept used to analyze grammars for programming languages and to build efficient parsers. It complements the First set and is essential for creating predictive parsing tables that guide which production to apply when the parser looks ahead at the next input symbol. The concept is defined for grammars expressed as context-free grammars and helps determine which terminal symbols may legally appear immediately to the right of a given nonterminal during a derivation.

The follow set is especially important when constructing LL(1) parsers, where a single lookahead symbol should suffice to decide among competing productions. By combining follow sets with FIRST sets, developers can build parse tables that avoid backtracking, leading to faster and more reliable compilers and interpreters. The end-of-input marker, commonly written as $, is often included in follow sets to indicate that a nonterminal may appear at the end of the input stream.

Definition

Formal definition

Let G = (N, Σ, P, S) be a context-free grammar, where: - N is the set of nonterminals, - Σ is the set of terminals, - P is the set of productions, and - S ∈ N is the start symbol.

For A ∈ N, the follow set Follow(A) is the set of terminals a ∈ Σ ∪ { $ } such that there exists some derivation S =>* α A a β, where α ∈ (N ∪ Σ)* and β ∈ (N ∪ Σ)*, and a is the first symbol of the string that can appear immediately to the right of A in some sentential form. Intuitively, it captures which symbols can legally come after A in a derivation from the start symbol, including the end-of-input marker when A can be at the end of the input.

Computation (fixed-point algorithm)

Follow sets are computed by iterating over the productions until no set changes: - Initialize Follow(A) = ∅ for all A ∈ N. - Include $ in Follow(S), the start symbol. - For each production X → α A β, add FIRST(β) \ { ε } to Follow(A). - If β ⇒* ε (or β is empty), add Follow(X) to Follow(A). - Repeat until all Follow sets converge (no changes).

This procedure is typically explained together with FIRST sets, since FIRST(β) is needed in the accumulation step. The relationship between FIRST and FOLLOW sets underpins how predictive parsers choose the right production based on a single lookahead symbol.

Example

Consider a simple grammar with nonterminals {S, A, B}, terminals {a, b} and the start symbol S: - S → A B - A → a | ε - B → b | ε

From this grammar: - FIRST(A) = { a, ε } - FIRST(B) = { b, ε }

To compute Follow: - Follow(S) contains { $ } by convention (end of input). - From S → A B, A is followed by B. FIRST(B) \ { ε } = { b } goes into Follow(A). Since ε ∈ FIRST(B), B can be ε, so Follow(S) is added to Follow(A): Follow(A) includes { $, b }. - Since B is at the end of the production S → A B, Follow(S) is added to Follow(B): Follow(B) includes { $ }.

Thus: - Follow(S) = { $ } - Follow(A) = { b, $ } - Follow(B) = { $ }

These follow sets would then be used, for example, to populate a predictive parse table for an LL(1) parser, in combination with FIRST sets of the productions for A and B.

Computation methods and usage

In predictive parsing

Follow sets are used to decide which production to use when the parser is looking at the next input symbol. In an LL(1) parse table, a production X → α is placed in the cell [X, a] for each a in FIRST(α) (excluding ε). If ε ∈ FIRST(α), the production is also placed in cells [X, a] for each a in FOLLOW(X). This ensures that, given the current nonterminal on the parsing stack and the next input symbol, the parser can select a single, correct production without backtracking.

Relation to FIRST sets and SELECT sets

Follow sets interact with FIRST sets to define the SELECT set for each production in a grammar. The SELECT set for a production X → α is FIRST(α) without ε, union FOLLOW(X) if α can derive ε. This combination is what drives the construction of nonconflicting LL(1) parse tables.

Limitations and transformations

Not all grammars are LL(1); left-recursive grammars or grammars requiring more than one symbol of lookahead pose challenges. In practice, grammars are often transformed (e.g., via left factoring or eliminating left recursion) to satisfy LL(1) constraints. When performing such transformations, Follow sets must be recomputed for the new grammar, and the resulting parse tables must be validated to ensure they remain deterministic and correct.

Applications and related concepts

Predictive parsing: a class of parsing strategies that rely on single-symbol lookahead and parse tables derived from FIRST and FOLLOW sets.
LL(1) parser: a top-down parser that uses one symbol of lookahead and LL(1) parse tables created from FIRST and FOLLOW sets.
parsing table: the data structure that stores which production to apply for each combination of nonterminal and lookahead symbol.
context-free grammar: the formal framework within which follow sets are defined.
nonterminal and terminal: the objects involved in the grammar and its productions.
First set: the companion concept describing which terminals can begin strings derived from a given nonterminal.