

Grammar Syntax

A grammar for a translator generated with the STT is an STT Grammar. Either the native .stt format or XML can be used to express the required structure.

XML Format

Though XML can be used to write a grammar, it is far more verbose than the native .stt format. Because the abstract structure of an XML grammar instance and that of an .stt grammar instance are interchangeable, no formal description of the XML format is given; consult the DTD and the examples in the distribution. XML support was originally a bootstrap mechanism. It is still occasionally required when development has broken some part of the translation machinery and disabled the native pathway.

XML instances must conform to the grammar.dtd document type.

STT Format

A grammar file consists of a set of sections, some of which are optional. Each section consists of one or more statements terminated by a semicolon.

Comments and whitespace are discarded. Comments are Unix-style: they start with a pound sign (#) and end with a newline.
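The statement and comment rules above can be sketched in a few lines of Python. This is an illustration only, not code from the STT distribution: the function name `split_statements` is hypothetical, and the sketch assumes that `#` and `;` inside a double-quoted string are literal text and that strings contain no escaped quotes.

```python
def split_statements(source):
    """Drop '#' comments and split the remaining text on ';'.

    Assumptions (not confirmed by the manual): '#' and ';' inside a
    double-quoted string are literal, and strings have no escaped quotes.
    """
    statements = []
    current = []
    in_string = False
    in_comment = False
    for ch in source:
        if in_comment:
            # A comment runs to the end of the line.
            if ch == "\n":
                in_comment = False
                current.append(" ")
            continue
        if in_string:
            current.append(ch)
            if ch == '"':
                in_string = False
            continue
        if ch == "#":
            in_comment = True
        elif ch == '"':
            in_string = True
            current.append(ch)
        elif ch == ";":
            # A semicolon terminates the statement collected so far.
            text = "".join(current).strip()
            if text:
                statements.append(text)
            current = []
        else:
            current.append(ch)
    return statements
```

Applied to the grammar declaration example below, the comment line is dropped and a single statement remains.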

The sections are:

Grammar Declaration

The grammar declaration defines the name of the grammar and the version. It looks like this:

# format: this is <NAME> version <VERSION>;
this is syntacs version 0.1.0;

Property Declarations

Properties are key:value pairs that are stored in a hashtable and used throughout grammar processing. See section Properties for a listing of these properties. Property values are enclosed in double quotes.

# format: property <NAME> = "<VALUE>";
property namespace = "com.inxar.syntacs.translator.regexp";

Terminal Declarations

Terminals need to be declared before they can be defined. A declaration establishes that name as a terminal. There may be multiple terminal statements, each of which may declare multiple names.

Terminals and nonterminals share the same namespace, meaning that a terminal and a nonterminal cannot have the same name. By convention, terminal identifiers are all caps and nonterminal identifiers are capitalized, but the choice is left to the grammar author.

# format: terminal <NAME>;
# format: terminal <NAME>, <NAME>, <NAME>;
terminal IDENT;
terminal T1, T2;

Terminal Definitions

Terminal definitions are regular definitions; they associate a name with an expression. A regular expression is enclosed in double quotes; whitespace within the string is insignificant. See section Regular Expression Syntax for how regular expressions are written in STT.

# format: <TERMINAL> matches "regexp";
IDENT matches " [_a-zA-Z0-9] [-_a-zA-Z0-9]* ";
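To see what "whitespace within the string is insignificant" means in practice, the IDENT expression above can be tried in Python. This is a rough analogy, not the STT regular expression engine: it assumes that this particular expression is also valid for Python's `re` module, and it uses `re.VERBOSE`, which ignores whitespace outside character classes. The helper name `is_ident` is hypothetical.

```python
import re

# The expression string from the IDENT definition above, spaces included.
STT_EXPRESSION = " [_a-zA-Z0-9] [-_a-zA-Z0-9]* "

# re.VERBOSE tells Python to ignore whitespace outside character classes,
# which approximates the STT rule that whitespace is insignificant.
IDENT = re.compile(STT_EXPRESSION, re.VERBOSE)


def is_ident(text):
    """Return True if the whole of `text` matches the IDENT expression."""
    return IDENT.fullmatch(text) is not None
```

Under these assumptions `is_ident("foo-bar_1")` is true, while `is_ident("-foo")` is false because a hyphen is not allowed as the first character.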

Nonterminal Declarations

Nonterminal declarations are identical to terminal declarations with the exception of the keyword. Nonterminal identifiers are by convention capitalized.

# format: nonterminal <NAME>;
# format: nonterminal <NAME>, <NAME>, <NAME>;
nonterminal Goal;
nonterminal IdentList, Name, Statement;

Nonterminal Definitions

Nonterminal definitions are productions: each production relates a nonterminal to a sequence of grammar symbols. When that sequence of grammar symbols (terminals or nonterminals) appears at the top of the parse stack, the parser reduces it to the nonterminal named in the production (i.e. the nonterminal definition).

# format: reduce <NONTERMINAL> when <SYMBOL> <SYMBOL> <SYMBOL>;
reduce Term when Term PLUS Factor;
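The effect of a reduction on the parse stack can be sketched in Python as follows. This is a simplified model: a generated parser also tracks parser states and lookahead to decide when a reduction applies, which is omitted here. The function name `try_reduce` is hypothetical.

```python
def try_reduce(stack, nonterminal, symbols):
    """Replace `symbols` at the top of `stack` with `nonterminal`.

    Returns True if the reduction was applied, False if the top of the
    stack does not match. Parser states and lookahead are ignored.
    """
    n = len(symbols)
    if n and stack[-n:] == list(symbols):
        del stack[-n:]             # pop the right-hand side
        stack.append(nonterminal)  # push the left-hand side
        return True
    return False
```

For the production above, a stack ending in `Term PLUS Factor` ends in a single `Term` after the call.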

Accept Definition

This section consists of a single statement that names the "goal symbol" that must be reduced for the grammar to signal acceptance of the input. The goal symbol must be a declared nonterminal. By convention it is named "Goal".

# format: accept when <NONTERMINAL>;
accept when Goal;
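Two of the naming rules stated so far (terminals and nonterminals share one namespace, and the goal symbol must be a declared nonterminal) lend themselves to a simple check. The Python sketch below is illustrative only; the function name `validate_symbols` and the error messages are hypothetical and do not reflect what the STT actually reports.

```python
def validate_symbols(terminals, nonterminals, goal):
    """Return a list of error strings for the declared symbol names."""
    errors = []
    # Terminals and nonterminals share one namespace.
    for name in sorted(set(terminals) & set(nonterminals)):
        errors.append(name + ": declared as both terminal and nonterminal")
    # The goal symbol must be a declared nonterminal.
    if goal not in nonterminals:
        errors.append(goal + ": goal symbol is not a declared nonterminal")
    return errors
```

An empty list means both rules hold for the given declarations.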

Context Declarations

The context declarations and definitions are optional. See section Lexical Context for an explanation of what a "context" is.

The context declarations section is similar to the terminal declarations section and nonterminal declarations section.

# format: context <NAME>;
# format: context <NAME>, <NAME>, <NAME>;
context comment;
context special1, special2;

Identifiers used for contexts have their own namespace; each one must be unique only within the set of context declarations. The context names "default" and "all" have special meaning.

Context Definitions

A context definition determines what subset of terminals in the full set of terminals is included in the context. If a terminal is included within a particular context, its corresponding DFA will recognize the appropriate character sequence (given the opportunity).

Each context definition statement consists of a context name and a list of one or more context stack instructions. A stack instruction can say one of three things: "when terminal X is matched, do nothing", "when terminal X is matched, switch into context Y", or "when terminal X is matched, return to the previous context".

The following example demonstrates the use of context switching through context stack instructions:

# format: <NAME> includes <INSTRUCTION>;
terminal WHITESPACE, START_COMMENT, COMMENT_DATA, END_COMMENT;

context default, comment;

default includes START_COMMENT shifts comment, WHITESPACE;
comment includes COMMENT_DATA, END_COMMENT unshifts;
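The stack behavior of the example above can be traced with a small Python model. It is a sketch under simplifying assumptions: the real lexer matches characters with a DFA per context, whereas here the already-matched terminal names are supplied directly. The table layout and the function name `trace_contexts` are hypothetical.

```python
# Each context maps an included terminal to its stack instruction:
# None (do nothing), ("shift", NAME), or "unshift".
CONTEXTS = {
    "default": {"START_COMMENT": ("shift", "comment"), "WHITESPACE": None},
    "comment": {"COMMENT_DATA": None, "END_COMMENT": "unshift"},
}


def trace_contexts(matched_terminals, start="default"):
    """Return the active context after each matched terminal."""
    stack = [start]
    trace = []
    for terminal in matched_terminals:
        current = stack[-1]
        if terminal not in CONTEXTS[current]:
            raise ValueError(terminal + " is not included in context " + current)
        instruction = CONTEXTS[current][terminal]
        if instruction == "unshift":
            if len(stack) == 1:
                raise ValueError("unshift with no previous context")
            stack.pop()                   # return to the previous context
        elif instruction is not None:
            stack.append(instruction[1])  # switch into the named context
        trace.append(stack[-1])
    return trace
```

Feeding it WHITESPACE, START_COMMENT, COMMENT_DATA, END_COMMENT shows the active context moving from default to comment and back to default.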

Start Context Definition

This section defines what the starting context will be. When omitted, the default context is "default".

# format: start with context <NAME>;
start with context special;

Context Post-Processing

After the grammar is parsed, some processing is done to initialize each context with the terminals that will be included in it.

Case 1: No explicit contexts

The simplest case is when no context information has been explicitly provided -- the grammar consists of terminal and nonterminal declarations/definitions only.

In this circumstance, the processor implicitly adds in a single context "default", and all terminals are added to that context. The lexer acts on the corresponding DFA and no context switching is done.

terminal WHITESPACE, DATA; 

nonterminal data;
reduce data when DATA;
accept when data;

Case 2: One or more explicit contexts, no "all" context

In this circumstance the user declares one or more contexts. The "default" context is always implicitly declared, but can be declared explicitly with no error.

terminal WHITESPACE, DATA, 
         START_QUOTE, QUOTE_DATA, END_QUOTE;

nonterminal data, quote;

context quoted_context;

default        includes WHITESPACE, DATA, START_QUOTE shifts quoted_context;
quoted_context includes WHITESPACE, QUOTE_DATA, END_QUOTE unshifts;

Case 3: Use of the "all" context

The "all" context is special in that it does not actually refer to a real context (a DFA), but rather is a syntactic convenience. Terminals included in the "all" context are placed into every other context after the grammar is parsed. The "all" context does not have to be declared.

terminal WHITESPACE, DATA, 
         START_QUOTE, QUOTE_DATA, END_QUOTE;

nonterminal data, quote;

context quoted_context;

all            includes WHITESPACE;
default        includes DATA, START_QUOTE shifts quoted_context;
quoted_context includes QUOTE_DATA, END_QUOTE unshifts;
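The three cases can be summarized with a Python sketch of the post-processing step. It models only which terminals end up in which context and ignores stack instructions. The function name `resolve_contexts` is hypothetical, and the handling of a grammar that defines contexts but never mentions "default" (an empty default context plus any "all" terminals) is an assumption, not documented behavior.

```python
def resolve_contexts(terminals, includes):
    """Compute the set of terminals in each real context.

    `includes` maps a context name to the terminals listed in its
    definition; stack instructions are left out of this model.
    """
    # Case 1: no context information, so every terminal goes into "default".
    if not includes:
        return {"default": set(terminals)}
    # "default" is always implicitly declared (Case 2).
    resolved = {"default": set()}
    for name, members in includes.items():
        if name != "all":
            resolved[name] = set(members)
    # Case 3: "all" is not a real context; its terminals are copied
    # into every other context.
    shared = set(includes.get("all", ()))
    for members in resolved.values():
        members |= shared
    return resolved
```

With the declarations from Case 2 and Case 3 above, this model produces the same two terminal sets, which is the sense in which "all" is only a syntactic convenience.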

