Go to the first, previous, next, last section, table of contents.


Lexical Context

Background

The concept of lexer states was introduced by the flex scanner generator as a means to subdivide the lexer into multiple layers such that only a subset of the token definitions can be matched at a particular time. This is useful when a syntax has widely variable "topography"; certain sections of a file are syntatically very different from one another. Attempts to cram all the required regular expressions into a a single lexer state (a single DFA) result in collisions within the grammar. Under these circumstances it becomes advantageous to partition the token definitions into subsets and switch between them at the appropriate time.

Note: I would imagine that the concept of lexer states was almost certainly known well before flex came along, but my knowledge of the subject is pretty limited so don't quote me on historical accuracy.

Transition Function

In the STT, this notion of lexer states is recast into the term lexical context, or more commonly just context, for short. Note however its meaning here is unrelated to the term "context-free parsing"; that is, "lexical context" does not imply any sort of "context-ful parsing". This is because the notion of lexical context does not affect the theory or algorithms used in the parsing phase (syntactic analysis).

It is useful to remember the formula that one lexical context equals one DFA. With this in mind then we can restate what it means to be a lexical analyzer in the STT: A lexical analyzer is a machine that transforms a stream of characters into a stream of tokens; to accomplish its job it requires a set of contexts and a context transition function. The transition function is a mapping from (context, token) --> context.

Therefore, when a DFA says "hey, we found the beginning of a comment!", the context transition function is consulted to check if the context needs to be changed.

Context Stack

STT lexers have an additional feature in that they maintain a context stack. I fibbed when I said that the transition function is a mapping from (context, token) --> context; it is actually a mapping (context, token) --> context action where the context action is a context stack instruction. A context action is a tuple (instruction, register) and hence the full definition for the transition function is a tuple to tuple mapping (context, token) --> (instruction, register).

The instruction is a constant that is one of:

How to Use Multiple Contexts

To take advantage of multiple lexical contexts you need to declare the names of the contexts to be used in your grammar. Then, define which terminals will be included in what context such that the finite automata can be correctly assembled for each context. The context definition is also where PUSH or POP instructions are made. See section Grammar Syntax for more information.


Go to the first, previous, next, last section, table of contents.