org.inxar.syntacs.grammar.regular
Interface RegularGrammar

All Known Implementing Classes:
REGrammar

public interface RegularGrammar

The RegularGrammar interface represents a factory for generating regular expressions, typically for the purpose of constructing RegularTokens. Each newXXX method allocates and returns a new RegularExpression object that implements the XXX interface. These regex objects are then combined into more complex RegularExpressions and eventually resubmitted to the RegularGrammar object under a name using the newToken() method. In this fashion one builds up a set of named regular expressions, perhaps to be transformed into a DFA that recognizes the tokens implied by the regexes. When the token construction phase is complete, calling compile() returns a 'compiled' version of the language. The compilation process typically involves giving the appropriate objects unique integer IDs so that later set manipulation can be done numerically rather than with full-scale objects.

In this way, one can think of the RegularGrammar interface as the 'thing' humans assemble and the RegularSet as the 'thing' machines use to do more interesting things like build DFAs.


Method Summary
 RegularSet compile()
          When token construction is complete, compile() compiles and returns a RegularSet object which can be used for generation of DFAs, for example.
 Epsilon getEpsilon()
          Returns the Epsilon symbol in the (rare) case one needs it.
 CharClass newCharClass()
          Allocates and returns a new CharClass expression ([^-a-z]).
 CharString newCharString(String s)
          Allocates and returns a new CharString expression for the given String.
 Closure newClosure(RegularExpression re)
          Allocates and returns a new Closure expression ('*') wrapping the given RegularExpression.
 Concatenation newConcatenation(RegularExpression left, RegularExpression right)
          Allocates and returns a new Concatenation expression from the given left and right RegularExpressions.
 Interval newInterval(char c)
          Allocates and returns a new Interval expression over the given char.
 Interval newInterval(int lo, int hi)
          Allocates and returns a new Interval expression over the given character range from lo to hi, inclusive.
 Option newOption(RegularExpression re)
          Allocates and returns a new Option expression ('?') wrapping the given RegularExpression.
 PositiveClosure newPositiveClosure(RegularExpression re)
          Allocates and returns a new PositiveClosure expression ('+') wrapping the given RegularExpression.
 RegularToken newToken(int tokenID, String name, RegularExpression regex)
          Allocates and returns a new RegularToken mapping the given name to the given RegularExpression.
 RegularToken newToken(int tokenID, String name, String regex)
          Allocates a new RegularToken in this grammar having the given tokenID number, name, and regex.
 Union newUnion()
          Allocates and returns a new Union expression.
 

Method Detail

newConcatenation

public Concatenation newConcatenation(RegularExpression left,
                                      RegularExpression right)
Allocates and returns a new Concatenation expression from the given left and right RegularExpressions.

newClosure

public Closure newClosure(RegularExpression re)
Allocates and returns a new Closure expression ('*') wrapping the given RegularExpression.

newPositiveClosure

public PositiveClosure newPositiveClosure(RegularExpression re)
Allocates and returns a new PositiveClosure expression ('+') wrapping the given RegularExpression. Note that the PositiveClosure is a shortcut for concatenation-closure. Therefore, a+ expands to aa*.
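
The expansion of a+ to aa* can be sketched with a tiny, self-contained expression tree. The types below (Node, Sym, Star, Cat) are hypothetical stand-ins invented for illustration; they are not the org.inxar.syntacs classes, and the library may well represent the expansion differently.

```java
// Hypothetical stand-in types; these are NOT the org.inxar.syntacs classes.
final class PositiveClosureSketch {

    /** Minimal regex node: anything that can print itself. */
    interface Node {
        String show();
    }

    /** A single literal character. */
    static final class Sym implements Node {
        final char c;
        Sym(char c) { this.c = c; }
        public String show() { return String.valueOf(c); }
    }

    /** Kleene closure: zero or more repetitions of the inner node. */
    static final class Star implements Node {
        final Node inner;
        Star(Node inner) { this.inner = inner; }
        public String show() {
            // Parenthesize anything that is not a single symbol.
            String s = inner.show();
            return (inner instanceof Sym ? s : "(" + s + ")") + "*";
        }
    }

    /** Concatenation of two nodes, left then right. */
    static final class Cat implements Node {
        final Node left;
        final Node right;
        Cat(Node left, Node right) { this.left = left; this.right = right; }
        public String show() { return left.show() + right.show(); }
    }

    /** Rewrites r+ as r followed by r*. */
    static Node positiveClosure(Node r) {
        return new Cat(r, new Star(r));
    }

    public static void main(String[] args) {
        System.out.println(positiveClosure(new Sym('a')).show()); // prints "aa*"
    }
}
```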

newInterval

public Interval newInterval(int lo,
                            int hi)
Allocates and returns a new Interval expression over the given character range from lo to hi, inclusive.

newInterval

public Interval newInterval(char c)
Allocates and returns a new Interval expression over the given char. Note that newInterval(97, 97) has the same meaning as newInterval('a') under the ASCII or Unicode charset.
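
The equivalence rests on the fact that a Java char widens to its numeric code point, so 'a' is 97. A minimal sketch, assuming a hypothetical Range class as a stand-in for the library's Interval (whose actual fields and methods are not shown on this page):

```java
// Hypothetical stand-in for an inclusive character range; not the library's Interval.
final class IntervalSketch {

    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            if (lo > hi) {
                throw new IllegalArgumentException("lo must not exceed hi");
            }
            this.lo = lo;
            this.hi = hi;
        }

        /** True when ch lies in [lo, hi], both ends inclusive. */
        boolean contains(int ch) {
            return lo <= ch && ch <= hi;
        }
    }

    /** Range over a single char: both bounds are the char's numeric value. */
    static Range of(char c) {
        return new Range(c, c);
    }

    /** Range from lo to hi, inclusive. */
    static Range of(int lo, int hi) {
        return new Range(lo, hi);
    }

    public static void main(String[] args) {
        Range single = of('a');
        System.out.println(single.lo == 97 && single.hi == 97); // prints "true"
    }
}
```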

newOption

public Option newOption(RegularExpression re)
Allocates and returns a new Option expression ('?') wrapping the given RegularExpression. Note that Option is not an atomic RegularExpression. Thus, 'a?' expands to the Union (a|Epsilon).
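
One way to see why 'a?' equals (a|Epsilon) is to model each expression by the set of strings it accepts. The sketch below does this for finite languages only; OptionSketch and its methods are hypothetical names for illustration and do not reflect how the library stores expressions.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not part of org.inxar.syntacs.
final class OptionSketch {

    /** The language of Epsilon: exactly one string, the empty string. */
    static final Set<String> EPSILON = Collections.singleton("");

    /** Language of (x|y): every string accepted by either operand. */
    static Set<String> union(Set<String> x, Set<String> y) {
        Set<String> out = new TreeSet<>(x);
        out.addAll(y);
        return out;
    }

    /** Language of r?: defined here as the union (r|Epsilon). */
    static Set<String> option(Set<String> r) {
        return union(r, EPSILON);
    }

    public static void main(String[] args) {
        // The language of "a?" contains the empty string and "a".
        System.out.println(option(Collections.singleton("a")));
    }
}
```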

newCharString

public CharString newCharString(String s)
Allocates and returns a new CharString expression for the given String. Note that CharString is not a fundamental expression. Thus, 'abc' expands to the concatenation sequence a-b-c.
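
The expansion of 'abc' into a concatenation sequence can be sketched as a fold over the characters of the string. The Node, Sym, and Cat types are hypothetical stand-ins, and the left-nested grouping is an assumption; the page does not say which way the library nests the concatenations.

```java
// Hypothetical stand-in types; these are NOT the org.inxar.syntacs classes.
final class CharStringSketch {

    interface Node {
        String show();
    }

    /** A single literal character. */
    static final class Sym implements Node {
        final char c;
        Sym(char c) { this.c = c; }
        public String show() { return String.valueOf(c); }
    }

    /** Concatenation of two nodes, printed with '-' between them. */
    static final class Cat implements Node {
        final Node left;
        final Node right;
        Cat(Node left, Node right) { this.left = left; this.right = right; }
        public String show() { return left.show() + "-" + right.show(); }
    }

    /** Folds the characters of s into a left-nested chain of concatenations. */
    static Node charString(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty string has no characters to concatenate");
        }
        Node result = new Sym(s.charAt(0));
        for (int i = 1; i < s.length(); i++) {
            result = new Cat(result, new Sym(s.charAt(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(charString("abc").show()); // prints "a-b-c"
    }
}
```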

newUnion

public Union newUnion()
Allocates and returns a new Union expression. Subsequent modification of the Union is required (i.e. an empty union is invalid).

newCharClass

public CharClass newCharClass()
Allocates and returns a new CharClass expression ([^-a-z]). Subsequent modification of the character class is required (i.e. an empty character class is invalid).

newToken

public RegularToken newToken(int tokenID,
                             String name,
                             RegularExpression regex)
Allocates and returns a new RegularToken mapping the given name to the given RegularExpression. This is the 'special' newXXX() method in that it does not return a RegularExpression, but a Token. The RegularToken is returned to facilitate its incorporation into other languages, for example through the ContextFreeLanguage.newTerminal(Token) method. Calling newToken() therefore not only creates a RegularToken for the regex but also associates it with this grammar.

newToken

public RegularToken newToken(int tokenID,
                             String name,
                             String regex)
Allocates a new RegularToken in this grammar having the given tokenID number, name, and regex. The RegularGrammar is then responsible for parsing the regex string and generating a RegularExpression.

getEpsilon

public Epsilon getEpsilon()
Returns the Epsilon symbol in the (rare) case one needs it.

compile

public RegularSet compile()
When token construction is complete, compile() compiles and returns a RegularSet object which can be used for generation of DFAs, for example. The compilation process essentially ensures that each Interval gets a unique ID and concatenates ExpressionTerminators where appropriate.
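
The idea of giving each Interval a unique integer ID can be sketched as a small interning table. This is only an illustration under stated assumptions: IntervalIdSketch is a hypothetical class, and the choice to give equal intervals the same ID (rather than numbering every Interval object separately) is an assumption, not something this page specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of ID assignment; not the library's compile() logic.
final class IntervalIdSketch {

    // Maps an interval key such as "97-122" to its integer ID, in insertion order.
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    /** Returns the ID for [lo, hi], assigning the next free integer on first sight. */
    int idOf(int lo, int hi) {
        String key = lo + "-" + hi;
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size();
            ids.put(key, id);
        }
        return id;
    }

    /** Number of distinct intervals seen so far. */
    int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        IntervalIdSketch table = new IntervalIdSketch();
        System.out.println(table.idOf('a', 'z')); // prints "0"
        System.out.println(table.idOf('0', '9')); // prints "1"
        System.out.println(table.idOf('a', 'z')); // prints "0" again: same interval, same ID
    }
}
```

Once every interval has an integer ID, sets of intervals can be held in compact numeric structures, which is the kind of numeric set manipulation the interface description refers to.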