First SetsEdit

First Sets are a foundational concept in formal language theory and compiler design. They describe, for a given grammar symbol or a sequence of symbols, which terminals can appear as the first symbol in some string derived from that symbol. This information is essential when building parsers, especially when constructing predictive parse tables for top-down parsers and when deciding which productions to apply in a rule-driven parse.

At a high level, First Sets help answer questions like: What terminals can appear at the start of any string derived from a nonterminal A? If A can derive the empty string, ε, this fact also influences decisions later in parsing. The notion extends to strings of symbols, not just single nonterminals, so you can talk about First(α) for any α in (N ∪ Σ)*, where N is the set of nonterminals and Σ is the set of terminals. In addition to guiding parsing, First Sets are used in grammar analysis and in the design of efficient parsing algorithms, such as predictive parsers Context-free grammar and LL(1) grammar.

A quick primer on the concept connects to related ideas like Follow Sets and ε-derivations. First Sets interact with Follow Sets to populate predictive parse tables, and they are sensitive to productions that can yield the empty string Follow set. Understanding First Sets also helps diagnose when a grammar is not suitable for simple LL(1) parsing due to overlapping First sets among alternatives or because ε appears in FIRST sets in ways that complicate lookahead decisions Predictive parsing.

Computation and Definitions

First of a string

For any string α in (N ∪ Σ)*, First(α) is the set of terminals that can appear as the first symbol of some string derived from α, together with ε if α can derive ε. The basic rules are: - If α starts with a terminal t, then First(α) = {t}. - If α starts with a nonterminal A, then First(α) includes First(A) excluding ε. If ε is in First(A), then you look at the next symbol in α and add its First set, and so on. If every symbol in α can derive ε, then ε is included in First(α).

For a nonterminal A, First(A) is defined as First(α) for any production A → α. The computation typically starts with terminals and gradually incorporates nonterminals through the grammar’s productions, repeating until no more symbols can be added—a fixed-point process Fixed-point iteration.

Computation with a simple grammar

Consider a small grammar with productions: - S → A a | b - A → c | ε

Terminals: {a, b, c} Nonterminals: {S, A} First(A) is {c, ε} because A can derive c or ε. First(S) is {b, c, a} because S → b gives b, and S → A a yields c when A → c, or a when A → ε (since A may be ε, the next symbol a can begin the string).

This demonstration shows how ε-derivations affect the first symbol of derivations and why ε in a FIRST set requires attention when building parse tables or deciding which production to apply.

Algorithm to compute First sets

A standard way to compute First sets is a fixed-point iteration over the grammar’s productions: - Initialize First(t) = {t} for every terminal t, and First(A) = ∅ for every nonterminal A. - For each production A → α, add First(α) to First(A) according to the rules above, paying special attention to ε. - Repeat until no First set changes.

A more formal description follows the idea: - For A → X1 X2 … Xn, add First(X1) minus ε to First(A). - If ε ∈ First(X1), also add First(X2) minus ε to First(A); continue this way until a symbol Xk is found with ε ∉ First(Xk), or all symbols derive ε, in which case ε is added to First(A). - Terminals contribute directly to the First set of the surrounding nonterminals in the appropriate productions.

This iterative approach is efficient for grammars used in real-world compilers and is foundational for building predictive parsing tables Algorithm.

Example and practical use

A concrete example

Take the grammar: - S → A a | b - A → c | ε

From the earlier discussion, First(A) = {c, ε} and First(S) = {b, c, a}. This information helps determine which production to use when the parser sees certain lookahead terminals, and it informs the construction of the LL(1) parse table by mapping productions to appropriate lookahead terminals.

Applications in parsing

Predictive parsing: First Sets underpin the decisions a top-down parser makes when choosing among alternatives for a nonterminal. In an LL(1) table, an entry for a nonterminal A and a terminal a is filled with a production A → α if a ∈ FIRST(α); if ε ∈ FIRST(α), then the entry may also depend on FOLLOW(A) Predictive parsing.
Grammar analysis: First Sets help detect ambiguities or conflicts in a grammar, such as overlaps among FIRST sets of different productions for the same nonterminal, which can indicate non-LL(1) behavior. They also play a role in simplifying grammars by eliminating or restructuring ambiguous productions.
Extensions to larger parsing frameworks: While First Sets are central to LL parsing, they also interact with other parsing strategies such as LR parsing, where lookahead items and epsilon closures in automata rely on first-symbol information in a broader sense LR parsing.

Limitations and extensions

Some grammars inherently resist LL(1) parsing due to overlapping FIRST sets among different alternatives or due to ε-productions that complicate lookahead decisions. In such cases, more powerful parsing techniques (for example, generalized LR parsers) or grammar transformations are used. Extensions like First_k Sets generalize the idea to the first k symbols of derived strings, which can support more expressive predictive parsing strategies in certain contexts First_k.