com.inxar.syntacs.automaton.finite
Class MesoArrayDFA
java.lang.Object
|
+--com.inxar.syntacs.automaton.finite.MesoArrayDFA
- All Implemented Interfaces:
- DFA, Serializable
- public class MesoArrayDFA
- extends Object
- implements DFA, Serializable
Concrete DFA
implementation which uses truncated array
parse tables. The MesoArrayDFA
has favorable memory
and speed characteristics and thus is used often within the STT.
'Truncated' means that the ends of the array are contracted into a
single array element. Another name to describe this is
"atelomeric". Since each state table entry commonly has long
stretches of contiguously identical transitions at the beginning
and end of the unicode range, the next-state value for the
beginning and end stretch of the table can be placed in a single
element. 'Meso' means 'middle'.
The way this works is as follows. Each state is represented by a
single row in the transition table. This table entry must
communicate the next_state for every possible input character over
the unicode input range. One way to do this is make the array
65535 elements long and explicitly say what every next state is.
Invariably however, most of these next_states are the DEAD_STATE
(or at least repetitive). For example, consider an arbitrary state
'5' which has a single transition to state '8' over the character
'A'. The table entry would look like:
INDEX: [0, 1, 2, ... , 64, 65, 66, ..., 65532, 65533, 65534]
VALUE: [X, X, X, ... , X, 8, X, ..., X, X, X]
where 'X' represents the dead state. Another way to communicate
the same information is as follows:
INDEX: [ 0, 1, 2, 3, 4]
VALUE: [64, 66, X, X, 8]
The above says "if the input character has a value equal or less
than 64, forget about looking it up in the array, I can tell you
right off the bat that the next_state value is stored in element 2.
Likewise, if the input character has a value greater or equal to
66, don't bother looking, the next_state is at element 3. But if
the value is between 64 and 66, look in the rest of array for the
next state, but make sure you take into account that (1) the first
64 characters have been chopped off the front of the array and (2)
we're using the first four elements in the array for accounting
purposes."
This strategy only works effectively when states have long
stretches of contiguous repeats at the ends of the state tables,
but fortunately this is the norm not the exception. Incidentally,
the ends of eukaryotic chromosomes have long stretches of
oligonucleotide repeats called "telomeres" that are thought to play
a key role in the regulation and metering of cellular aging.
- See Also:
- Serialized Form
Field Summary |
int[] |
accepts
The accepting token table, where the first
dimension is the state number and the value at that address is
the Token type number. |
int[][] |
table
The transition table, where the first dimension index is the
state number, the second dimension index is the offset unicode
input character, and the value at that array address is the
next state. |
Constructor Summary |
MesoArrayDFA(int[][] table,
int[] accepts)
Constructs the MesoArrayDFA on the given truncated transition
table and accepting token array. |
Method Summary |
int |
go(int state_num,
int input)
Returns the next state for the given state and given input symbol. |
int |
output(int state)
Returns the output at the given state. |
String |
toString()
|
accepts
public int[] accepts
- The accepting
token
table, where the first
dimension is the state number and the value at that address is
the Token
type number.
table
public int[][] table
- The transition table, where the first dimension index is the
state number, the second dimension index is the offset unicode
input character, and the value at that array address is the
next state. See the description or the source code of the
class for specifics.
MesoArrayDFA
public MesoArrayDFA(int[][] table,
int[] accepts)
- Constructs the MesoArrayDFA on the given truncated transition
table and accepting token array.
go
public int go(int state_num,
int input)
- Description copied from interface:
DFA
- Returns the next state for the given state and given input symbol.
- Specified by:
go
in interface DFA
output
public int output(int state)
- Description copied from interface:
DFA
- Returns the output at the given state. Output values generally
correspond to to Token ID numbers. If no output has been
defined for the given state,
Token.UNDEF
is
returned.
- Specified by:
output
in interface DFA
toString
public String toString()
- Overrides:
toString
in class Object