Go to the first, previous, next, last section, table of contents.
It is useful to define some terms that are used throughout the STT
related to the theory of lexical and syntactic analysis. Note that the
concepts defined here are reworked to my own interpretation and specific
to the STT, they are not intended to be generally applicable.
-
A regular expression is a language that defines what set of input
character sequences "reduce" to a token.
-
A token is an input character sequence that is a particular
instance of a regular expression.
-
A regular definition is a regular expression that has a
symbol (a name).
-
A regular grammar is a set of regular definitions.
-
A lexical context is a regular grammar that has a symbol (a
name).
-
A context transition function is a mapping from the tuple (context
symbol, regular definition symbol) to context action.
-
A context action is a tuple (instruction, value) that drives
the management of the lexer context stack.
-
A lexical grammar is a set of lexical contexts, a context
transition function, and a start context (a distinguished context
in the lexical grammar).
-
A deterministic finite automaton (DFA) is an regular expression
recognition machine that has a transition function and an
output function. The output function maps state numbers to
regular definition names. A DFA is equivalent to a single regular
grammar.
-
A lexer is machine that transforms an input sequence of characters to an
output sequence of tokens. It uses a set of DFAs (one for each lexical
context), the context transition function, and a context stack.
-
A context-free grammar is a set of terminals, a set of
nonterminals, and an accepting nonterminal (a distinguished
nonterminal).
-
A grammar symbol is the union of the set of terminal symbols and
the set of nonterminal symbols.
-
A terminal is a symbol. The set of symbols that constitute the
terminals is typically drawn from the union of set of regular definition
symbols over all the lexical contexts of the lexical grammar. Note that
not all regular definition symbols need be terminal symbols, and not all
terminal symbols need be defined as a regular definition (see fig 1).
-
a context-free expression is a language that defines which
grammar symbol sequences "reduce" to a nonterminal.
-
A production is a single partition of a context-free expression,
it is one alternative of a nonterminal.
-
A nonterminal is context-free expression that has a symbol.
-
A deterministic pushdown automaton (DPA) is an context-free
expression recognition engine that has a terminal transition function
and a nonterminal transition function. A DPA is equivalent to a single
context-free grammar.
-
A parser is a machine that transforms an input sequence of
terminals to a tree of grammar symbols (the syntax tree). It uses a
DPA, a state stack, and a symbol stack.
+---------------+
| Terminals |
| |
+----|----------+ |
| | | |
| +---------------+
| |
| Regular Dfn's |
+---------------+
Fig. 1: The set of terminal symbols and the set of regular definition
symbols are not necessarily equal.
-
Translation refers to the combined process of lexical and
syntactic analysis.
-
A translator grammar is a lexical grammar and a context-free
grammar.
-
A translator is a machine that transforms an input character
sequence to an output object. The output object may be the syntax tree
itself or some result that is generated through interpretation of
the (abstract) syntax tree.
Go to the first, previous, next, last section, table of contents.