Go to the first, previous, next, last section, table of contents.


Error Handling

An important part of any parser generator is error recovery and/or error repair. The design that the STT uses is that when an error is discovered by the parser, it asks to the ParserInterpreter how to recover. This, in theory, gives the user complete control over error recovery. In practice, however, only a few maneuvers (called Corrections) are defined by the parser, but these are sufficient for general use.

Auditor

The auditor is the central listener for errors and warnings, collectively called complaints. The auditor instance is visible to all translation components. When a TranslationException is thrown, the auditor instance is carried out on the back of the exception to the caller such that it can be printed out or otherwise inspected.

Lexical Errors

If the lexer cannot find an appropriate match given the current input, it will notify the LexerInterpreter through the error(int offset, int length) method with the offset and length of the unrecognizable sequence. No context switching occurs during a lexical error.

It is standard behavior for the lexer interpreter to report this to the Auditor such that the line is printed, highlighting the syntax error.

Currently, no lexical error handling is done in the lexer itself. The overall design, however, allows custom lexical error handling to be implemented by the author within the LexerInterpreter.

Syntactic Errors

If for a given parser state and input terminal no valid action is known, the parser will react by calling interpreter.recover(int type, Sentence left_context). The return value from the ParserInterpreter is a Recovery object (a list of Correction actions that the parser will execute in sequence).

This design is relatively flexible because it allows for future implementation of a number of methods for error recovery. However, only a few Correction types have been implemented thus far! This is partically due to the fact that (1) error recovery is for some reason one of the last things implemented in these types of projects, (2) the SYNC and TUMBLE recovery paradigm works nicely and is extremely simple.

Correction Types

ABORT

When the parser executes an ABORT correction, it bails out by immediately throwing a TranslationException. This is useful if a maximum number of errors have been discovered (such as 100). In this case the user would be responsible for checking with the auditor how many errors the translation has encountered.

SYNC

When the parser executes a SYNC correction, it will discard future input terminals from the LexerInterpreter until it sees one that matches some given type (a semicolon, for example). The terminal named by the SYNC instruction is called a synchronizing symbol.

Note that the synchronizing symbol is also discarded. When the SYNC instruction is satisfied, the parser advances to the next correction in the recovery list.

TUMBLE

When the parser executes a TUMBLE currection, it iteratively challenges the current input terminal symbol and the current parse state against the DPA.action(state, symbol) method.

If the Action returned by the DPA is not an ERROR, the TUMBLE condition is satisfied, the parser executes the given Action, and the parser advances to the next correction in the recovery plan. If the end of the recovery plan has been reached, normal parsing resumes.

Conversely, if the Action is an ERROR, the parser pops both the state and symbol stacks, discards their values, and tries another challenge. In this manner the parser is "tumbling" down the stack trying to find a combination that works.

For grammars that have obvious synchronizing tokens, the SYNC and TUMBLE recovery combo works well to limit the number of errors to a reasonable number. I hope to implement more error correction types in a future release as it is an interesting problem. I would also encourage others to get involved. Your help is needed!


Go to the first, previous, next, last section, table of contents.