An important part of any parser generator is error recovery and/or error
repair. The design that the STT uses is that when an error is
discovered by the parser, it asks to the ParserInterpreter
how to
recover. This, in theory, gives the user complete control over error
recovery. In practice, however, only a few maneuvers (called
Corrections) are defined by the parser, but these are sufficient
for general use.
The auditor is the central listener for errors and warnings,
collectively called complaints. The auditor instance is visible
to all translation components. When a TranslationException
is
thrown, the auditor instance is carried out on the back of the exception
to the caller such that it can be printed out or otherwise inspected.
If the lexer cannot find an appropriate match given the current input,
it will notify the LexerInterpreter
through the error(int
offset, int length)
method with the offset and length of the
unrecognizable sequence. No context switching occurs during a lexical
error.
It is standard behavior for the lexer interpreter to report this to the
Auditor
such that the line is printed, highlighting the syntax
error.
Currently, no lexical error handling is done in the lexer itself. The
overall design, however, allows custom lexical error handling to be
implemented by the author within the LexerInterpreter
.
If for a given parser state and input terminal no valid action is known,
the parser will react by calling interpreter.recover(int type,
Sentence left_context)
. The return value from the
ParserInterpreter
is a Recovery
object (a list of
Correction
actions that the parser will execute in sequence).
This design is relatively flexible because it allows for future
implementation of a number of methods for error recovery. However, only
a few Correction
types have been implemented thus far! This is
partically due to the fact that (1) error recovery is for some reason
one of the last things implemented in these types of projects, (2) the
SYNC and TUMBLE recovery paradigm works nicely and is extremely
simple.
When the parser executes an ABORT correction, it bails out by
immediately throwing a TranslationException
. This is useful if a
maximum number of errors have been discovered (such as 100). In this
case the user would be responsible for checking with the auditor how
many errors the translation has encountered.
When the parser executes a SYNC correction, it will discard future input
terminals from the LexerInterpreter
until it sees one that matches some
given type (a semicolon, for example). The terminal named by the SYNC
instruction is called a synchronizing symbol.
Note that the synchronizing symbol is also discarded. When the SYNC instruction is satisfied, the parser advances to the next correction in the recovery list.
When the parser executes a TUMBLE currection, it iteratively challenges
the current input terminal symbol and the current parse state against
the DPA.action(state, symbol)
method.
If the Action
returned by the DPA is not an ERROR, the TUMBLE
condition is satisfied, the parser executes the given Action
, and
the parser advances to the next correction in the recovery plan. If the
end of the recovery plan has been reached, normal parsing resumes.
Conversely, if the Action
is an ERROR, the parser pops both the
state and symbol stacks, discards their values, and tries another
challenge. In this manner the parser is "tumbling" down the stack
trying to find a combination that works.
For grammars that have obvious synchronizing tokens, the SYNC and TUMBLE recovery combo works well to limit the number of errors to a reasonable number. I hope to implement more error correction types in a future release as it is an interesting problem. I would also encourage others to get involved. Your help is needed!
Go to the first, previous, next, last section, table of contents.