com.inxar.syntacs.automaton.finite
Class MesoArrayDFA

java.lang.Object
  |
  +--com.inxar.syntacs.automaton.finite.MesoArrayDFA
All Implemented Interfaces:
DFA, Serializable

public class MesoArrayDFA
extends Object
implements DFA, Serializable

Concrete DFA implementation which uses truncated array parse tables. The MesoArrayDFA has favorable memory and speed characteristics and thus is used often within the STT.

'Truncated' means that the ends of the array are contracted into a single array element. Another name to describe this is "atelomeric". Since each state table entry commonly has long stretches of contiguously identical transitions at the beginning and end of the unicode range, the next-state value for the beginning and end stretch of the table can be placed in a single element. 'Meso' means 'middle'.

The way this works is as follows. Each state is represented by a single row in the transition table. This table entry must communicate the next_state for every possible input character over the unicode input range. One way to do this is make the array 65535 elements long and explicitly say what every next state is. Invariably however, most of these next_states are the DEAD_STATE (or at least repetitive). For example, consider an arbitrary state '5' which has a single transition to state '8' over the character 'A'. The table entry would look like:

     INDEX: [0, 1, 2, ... , 64, 65, 66, ..., 65532, 65533, 65534]
     VALUE: [X, X, X, ... ,  X,  8,  X, ...,     X,     X,     X]
 
where 'X' represents the dead state. Another way to communicate the same information is as follows:
     INDEX: [ 0,  1, 2, 3, 4]
     VALUE: [64, 66, X, X, 8]
 
The above says "if the input character has a value equal or less than 64, forget about looking it up in the array, I can tell you right off the bat that the next_state value is stored in element 2. Likewise, if the input character has a value greater or equal to 66, don't bother looking, the next_state is at element 3. But if the value is between 64 and 66, look in the rest of array for the next state, but make sure you take into account that (1) the first 64 characters have been chopped off the front of the array and (2) we're using the first four elements in the array for accounting purposes."

This strategy only works effectively when states have long stretches of contiguous repeats at the ends of the state tables, but fortunately this is the norm not the exception. Incidentally, the ends of eukaryotic chromosomes have long stretches of oligonucleotide repeats called "telomeres" that are thought to play a key role in the regulation and metering of cellular aging.

See Also:
Serialized Form

Field Summary
 int[] accepts
          The accepting token table, where the first dimension is the state number and the value at that address is the Token type number.
 int[][] table
          The transition table, where the first dimension index is the state number, the second dimension index is the offset unicode input character, and the value at that array address is the next state.
 
Fields inherited from interface org.inxar.syntacs.automaton.finite.DFA
DEAD_STATE, START_STATE
 
Constructor Summary
MesoArrayDFA(int[][] table, int[] accepts)
          Constructs the MesoArrayDFA on the given truncated transition table and accepting token array.
 
Method Summary
 int go(int state_num, int input)
          Returns the next state for the given state and given input symbol.
 int output(int state)
          Returns the output at the given state.
 String toString()
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

accepts

public int[] accepts
The accepting token table, where the first dimension is the state number and the value at that address is the Token type number.

table

public int[][] table
The transition table, where the first dimension index is the state number, the second dimension index is the offset unicode input character, and the value at that array address is the next state. See the description or the source code of the class for specifics.
Constructor Detail

MesoArrayDFA

public MesoArrayDFA(int[][] table,
                    int[] accepts)
Constructs the MesoArrayDFA on the given truncated transition table and accepting token array.
Method Detail

go

public int go(int state_num,
              int input)
Description copied from interface: DFA
Returns the next state for the given state and given input symbol.
Specified by:
go in interface DFA

output

public int output(int state)
Description copied from interface: DFA
Returns the output at the given state. Output values generally correspond to to Token ID numbers. If no output has been defined for the given state, Token.UNDEF is returned.
Specified by:
output in interface DFA

toString

public String toString()
Overrides:
toString in class Object