

Regular Expression Syntax

The regular expression syntax in STT is largely standard, with one important difference: whitespace is never significant. A literal space character must therefore be written with the space escape `\s'. Likewise, a literal double quote `"' must be escaped as `\"', since regexps are always enclosed in double quotes.
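Because whitespace carries no meaning, an STT pattern can be normalized by discarding unescaped whitespace before it is matched. The Python sketch below rewrites an STT-style pattern into one that Python's `re' module accepts. The function `stt_to_python' is hypothetical and not part of STT. It handles only the rules just described (insignificant whitespace, `\s', `\"') and passes every other escape through unchanged.

```python
import re

# Hypothetical helper (not part of STT): rewrite an STT-style regexp so that
# Python's re module can run it. It covers only these rules:
#   - unescaped whitespace is dropped, even inside brackets;
#   - the escape \s becomes a single literal space;
#   - the escape \" becomes a plain double quote;
#   - every other escape is passed through for re to interpret.
def stt_to_python(pattern: str) -> str:
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == "s":
                out.append(" ")        # a plain space is literal in re
            elif nxt == '"':
                out.append('"')
            else:
                out.append(ch + nxt)   # keep the escape as-is
            i += 2
        elif ch.isspace():
            i += 1                     # insignificant whitespace
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(stt_to_python(r" hello \s world "))                       # hello world
print(bool(re.fullmatch(stt_to_python(r" a b \s c "), "ab c")))  # True
```

The scan needs no special case for brackets. That only works because STT treats whitespace the same way inside and outside a character class; Python's own `re.VERBOSE' mode, by contrast, keeps whitespace inside brackets.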

List Operators

Op      Definition
|       Union: a list of alternative choices, any one of which can be matched, like `a|b|c'.
none    Concatenation: a list of atoms that must be matched in sequence, like `a b c' or `abc'.
[]      Character classes: a syntactic convenience for alternation of character intervals, like `[\r\n]' or `[a-z]'. Negating a character class inverts the sense of inclusion: `[^a-z]'. If the dash character `-' is a member of the class, it must be the first member: `[-=+]'. Whitespace within the brackets is not significant, and most characters that would otherwise have to be escaped may appear unescaped. The exceptions are backslash `\\', close-bracket `\]', and semicolon `\;', which must still be escaped, and whitespace characters, which must be written with the escapes `\s', `\r', `\n', `\t', and `\v'. Octal and Unicode escapes can be used as well.
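Python's `re' module happens to use the same notation for union, concatenation, and character classes, so the operators in the table can be tried out directly. In this sketch the STT examples are written with their insignificant whitespace removed, and `re.fullmatch' is used on the assumption that a pattern is meant to account for the whole input.

```python
import re

# fullmatch requires the pattern to consume the entire input string.
print(bool(re.fullmatch(r"a|b|c", "b")))    # True: one alternative matches
print(bool(re.fullmatch(r"a|b|c", "ab")))   # False: only one is chosen
print(bool(re.fullmatch(r"abc", "abc")))    # True: STT "a b c" means "abc"
print(bool(re.fullmatch(r"[a-z]", "q")))    # True
print(bool(re.fullmatch(r"[^a-z]", "Q")))   # True: negated class
print(bool(re.fullmatch(r"[-=+]", "-")))    # True: a leading dash is literal
```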

Quantification Operators

Op    Definition
*     Closure: matches zero or more occurrences of the preceding atom or group
+     Positive closure: matches one or more occurrences
?     Optional: matches zero or one occurrence
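The three quantifiers behave the same way in Python's `re' module, which makes their difference easy to see side by side. Each one applies only to the atom immediately before it, here the `b'.

```python
import re

print(bool(re.fullmatch(r"ab*", "a")))     # True: * allows zero b's
print(bool(re.fullmatch(r"ab+", "a")))     # False: + needs at least one b
print(bool(re.fullmatch(r"ab+", "abbb")))  # True
print(bool(re.fullmatch(r"ab?", "ab")))    # True: ? allows one b
print(bool(re.fullmatch(r"ab?", "abb")))   # False: but no more than one
```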

Literal Escapes

Op    Definition
\\    literal backslash
\s    literal space
\n    literal newline
\r    literal carriage return
\t    literal horizontal tab
\v    literal vertical tab
\+    literal plus sign
\*    literal asterisk
\?    literal question-mark
\(    literal open-parenthesis
\)    literal close-parenthesis
\[    literal open-bracket
\]    literal close-bracket
\|    literal pipe
\"    literal double-quote (necessary since regexps are enclosed in double quotes)
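A scanner for this syntax has to turn each two-character escape into the single character it stands for. The Python sketch below encodes the table as a lookup; the names `LITERAL_ESCAPES' and `decode_literal' are hypothetical and not part of STT.

```python
# Hypothetical lookup table (not part of STT): the character after the
# backslash, mapped to the literal character the escape denotes.
LITERAL_ESCAPES = {
    "\\": "\\", "s": " ", "n": "\n", "r": "\r", "t": "\t", "v": "\v",
    "+": "+", "*": "*", "?": "?", "(": "(", ")": ")",
    "[": "[", "]": "]", "|": "|", '"': '"',
}

def decode_literal(escape: str) -> str:
    """Return the character denoted by a two-character escape such as \\t."""
    if len(escape) != 2 or escape[0] != "\\" or escape[1] not in LITERAL_ESCAPES:
        raise ValueError(f"not a literal escape: {escape!r}")
    return LITERAL_ESCAPES[escape[1]]

print(decode_literal(r"\s") == " ")    # True
print(decode_literal(r"\v") == "\v")   # True: vertical tab, U+000B
```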

Octal and Unicode Escapes

Octal and Unicode escapes match the following regular expressions, respectively:

OCTAL_ESCAPE   matches " \\ [0-3] [0-7] [0-7] ";
UNICODE_ESCAPE matches " \\ u [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] "; 
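Recognizing a numeric escape is only half the job; a scanner also has to decode it. The Python sketch below recognizes both forms with patterns equivalent to the two definitions above. It assumes, since this section does not say, that an escape denotes the character whose code point is the given octal or hexadecimal value. `decode_numeric_escape' is a hypothetical name.

```python
import re

# Python equivalents of OCTAL_ESCAPE and UNICODE_ESCAPE, with the digits
# captured so they can be decoded.
OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7][0-7])")
UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_numeric_escape(text: str) -> str:
    # Assumption: the digits give the character's code point.
    m = OCTAL_ESCAPE.fullmatch(text)
    if m:
        return chr(int(m.group(1), 8))
    m = UNICODE_ESCAPE.fullmatch(text)
    if m:
        return chr(int(m.group(1), 16))
    raise ValueError(f"not an octal or Unicode escape: {text!r}")

print(decode_numeric_escape(r"\101"))    # A  (octal 101 is 65)
print(decode_numeric_escape(r"\u00e9"))  # é  (hex E9 is 233)
```

The leading `[0-3]' caps octal escapes at `\377', so they cover codes 0 through 255 only; anything larger needs the Unicode form.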

Precedence

From lowest to highest precedence: union, concatenation, quantification, atom (char | escape | char-class), grouping.
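Python's `re' module orders union, concatenation, quantification, and grouping the same way, so it can serve as an analogy for how these rules group a pattern. The cases below are chosen so that a different precedence would give a different answer.

```python
import re

# Union binds loosest: "ab|cd" is (ab)|(cd), not a(b|c)d.
print(bool(re.fullmatch(r"ab|cd", "cd")))    # True
print(bool(re.fullmatch(r"ab|cd", "abd")))   # False
# Quantifiers bind tighter than concatenation: "ab*" is a(b*).
print(bool(re.fullmatch(r"ab*", "abab")))    # False
# Grouping overrides both: "(ab)*" repeats the whole pair.
print(bool(re.fullmatch(r"(ab)*", "abab")))  # True
```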

Examples

IDENTIFIER matches " [_a-z] [-_a-zA-Z0-9]* ";
WHITESPACE matches " [\n \r \t \v \s]+ ";
BEVERAGE   matches " coffee | tea | cola ";
CAFFEINE   matches " caff(ei|ie)ne ";
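The last three examples can be exercised with Python's `re' module once their insignificant whitespace is removed and `\s' is rewritten as a plain space. That rewrite is needed because Python's own `\s' matches any whitespace character, not just a space. This is a hand translation for illustration, not output produced by STT.

```python
import re

# Hand-translated from the STT examples above.
WHITESPACE = re.compile(r"[\n\r\t\v ]+")
BEVERAGE = re.compile(r"coffee|tea|cola")
CAFFEINE = re.compile(r"caff(ei|ie)ne")

print(bool(WHITESPACE.fullmatch(" \t\r\n")))  # True
print(bool(BEVERAGE.fullmatch("tea")))        # True
print(bool(BEVERAGE.fullmatch("coffeetea")))  # False: one alternative only
print(bool(CAFFEINE.fullmatch("caffeine")))   # True
print(bool(CAFFEINE.fullmatch("caffiene")))   # True: common misspelling
```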

