9-first and follow set

First and Follow Sets

The Function of LL (1) Grammars

o We mentioned that back-tracking causes the effective problems of the analyser

o It will be avoided if we can predict the correct selection of each production in each step

o Predicted analysis processing

o From the start symbol, according to the left-most non-terminal symbols A and the input symbol ‘a’, select the correct production

o The selected production must be unique to ensure the correctness of the analysis

o How to select the correct production？

LL(1) 文法的功能详细解读

1. 回溯导致的问题

• 在解析过程中，回溯（Back-tracking） 是指当解析器选择了错误的推导路径时，需要返回到之前的步骤并重新选择产生式。

• 回溯的问题：

• 低效：解析器会多次扫描输入流，导致时间和资源的浪费。

• 复杂：撤销已执行的操作，例如删除符号表条目、撤销中间代码等。

• 为了避免这些问题，LL(1) 文法 通过预测分析的方法进行解析，无需回溯。

2. 避免回溯：预测分析

• 预测分析（Predictive Parsing）：通过预测每一步选择的产生式，构建一个解析树，解析过程是自顶向下的（Top-Down）。

• 核心思想：根据当前的输入符号和最左侧的非终结符，唯一地选择一个正确的产生式。

3. 预测分析的步骤

起始符号：

从文法的起始符号 $S$ 开始推导。

左推导：

在解析过程中，始终对最左侧的非终结符进行替换，称为最左推导（Left-most Derivation）。

输入符号：

解析器查看下一个输入符号 $a$（也称为前瞻符号或 Lookahead 符号）。

选择正确的产生式：

根据当前的非终结符 $A$ 和输入符号 $a$，选择一个唯一的产生式 $A \rightarrow \alpha$。

4. 选择正确产生式的唯一性

• 为了确保解析的唯一性，满足以下条件：

• 当前非终结符 $A$ 和 输入符号 $a$ 唯一确定一个产生式 $A \rightarrow \alpha$。

• 解析器使用预测表（Parsing Table），实现这个映射关系。

5. 问题：如何选择正确的产生式？

选择正确的产生式需要遵循以下原则：

• 预测表（Parsing Table） 的构建：

• 每个非终结符 $A$ 和输入符号 $a$ 对应一个唯一的产生式 $A \rightarrow \alpha$。

• LL(1) 条件：

• 一个文法是LL(1) 的，当且仅当在每个输入符号和非终结符的组合下，产生式的选择是唯一的。

总结

• LL(1) 文法通过预测分析消除了回溯：

• 从起始符号出发。

• 对当前的非终结符和输入符号进行匹配，唯一地选择产生式。

• 确保文法满足 LL(1) 条件，即每个输入符号和非终结符的组合都有唯一的产生式。

• 核心问题：如何选择正确的产生式？

• 使用预测表。

• 遵循 LL(1) 文法的唯一性原则。

S-grammars (Korenjak&Hopcroft,1966)

o Make the predict processing works well, it must follow:

o Right side of every production must begin with terminals

o For the same non-terminal, the first terminal of each candidate production must be different

o So, We can select one production according to the input symbols (no confliction generates)

o ε-production is not included

o The application of the grammar is restricted

S-grammars (Korenjak & Hopcroft, 1966) 详细解读

1. 预测分析的基本要求

在LL(1) 预测分析中，为了确保能够高效、无冲突地选择合适的产生式，文法必须满足以下条件：

每个产生式右侧必须以终结符开始：

• 在文法 $G$ 中，每个产生式的右侧（也就是产生的符号串）必须以终结符开头。

• 这样，解析器可以直接根据输入的终结符进行匹配，而不需要进行复杂的回溯或多次选择。

相同非终结符的候选产生式的第一个终结符必须不同：

• 如果非终结符 $A$ 有多个候选产生式：

$A \rightarrow \alpha_1 , | , \alpha_2 , | , \alpha_3 \dots$

• 每个 $\alpha_i$ 的第一个终结符（称为 First 集）必须是互不相同的。

• 这样，解析器可以通过当前输入符号唯一确定一个合适的产生式，而不会产生冲突。

2. 无法包含 $\varepsilon$-产生式

• $\varepsilon$-产生式指的是能够推导出空字符串的产生式：

$A \rightarrow \varepsilon$

• 在 S-grammars 中，$\varepsilon$-产生式不被允许，因为 $\varepsilon$ 可能导致预测分析无法唯一确定输入符号的匹配：

• 如果某个非终结符 $A$ 存在 $\varepsilon$-产生式，可能会造成First 集冲突，导致解析器无法明确选择合适的产生式。

• 为了避免这种情况，S-grammars 禁止 $\varepsilon$-产生式。

3. 解析过程中的无冲突性

• 关键原则：解析器在解析输入字符串时，可以通过当前输入符号直接预测和选择正确的产生式。

• 确保无冲突的条件：

• 每个产生式右侧必须以终结符开始。

• 所有候选产生式的第一个终结符必须不同。

• 解析器的工作流程：

• 读取输入符号。

• 根据输入符号与当前非终结符的First 集匹配，唯一确定产生式。

4. 文法的限制

由于以上严格要求，S-grammars 对文法的适用范围有所限制：

• 不允许 $\varepsilon$-产生式：不能表示空字符串的推导。

• 终结符开头：所有产生式的右侧必须以终结符开始，无法表示更复杂的文法结构。

• 解析能力有限：虽然 S-grammars 在 LL(1) 预测分析中表现良好，但对于更复杂的上下文无关文法（Context-Free Grammars）不适用。

总结

S-grammars 的核心要求：

• 每个产生式右侧必须以终结符开头。

• 相同非终结符的候选产生式的第一个终结符必须不同。

• 不允许包含 $\varepsilon$-产生式。

优点：

• 使得预测分析过程高效、无冲突。

• 解析器可以通过输入符号直接选择合适的产生式。

缺点：

• 限制了文法的表达能力，无法处理包含 $\varepsilon$-产生式或更复杂结构的文法。

S-grammars 是 LL(1) 文法的一个特殊子集，主要用于预测分析，确保解析过程简单高效且无冲突。

The role of FIRST and FOLLOW sets

o FIRST and FOLLOW sets have a role to play in both the parsing strategies covered in this module, as you will see

o Given a rule, what are the set of terminals that could appear as the first terminal of strings derived by that rule？This is the FIRST set