jb -- Parser+Lexer Generating Java
Last Updated: 16 December 1996
Latest Version: jb-4.2
The jb system takes
parsers generated by the GNU Bison parser generator
and translates them to execute in Java (tm).
Jb takes the C file output by Bison and scans it to extract
the parse tables and constants. Jb then scans
various template files specified by the user and inserts the
extracted information at specified points in the templates.
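The extract-and-insert step can be sketched as follows. This is not jb's code. The regular expression assumes each table appears in the Bison C output as a single `static const <type> name[] = { ... };` declaration with a one-word element type, and the `@@name@@` marker is a hypothetical stand-in for whatever insertion points jb's templates actually use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableExtractor {
    // Assumed table shape in Bison's C output (one-word element type), e.g.
    //   static const short yyr1[] = {     0,     3,     4 };
    private static final Pattern TABLE = Pattern.compile(
        "static\\s+const\\s+\\w+\\s+(\\w+)\\s*\\[\\s*\\]\\s*=\\s*\\{([^}]*)\\}\\s*;");

    // Hypothetical insertion-point syntax for templates: @@tablename@@
    private static final Pattern MARKER = Pattern.compile("@@(\\w+)@@");

    /** Maps each table name found in the C source to its values, joined on one line. */
    public static Map<String, String> extractTables(String cSource) {
        Map<String, String> tables = new LinkedHashMap<>();
        Matcher m = TABLE.matcher(cSource);
        while (m.find()) {
            // Collapse the C layout (newlines, padding) into single spaces.
            String values = m.group(2).trim().replaceAll("\\s+", " ");
            tables.put(m.group(1), values);
        }
        return tables;
    }

    /** Replaces each @@name@@ marker with the named table; unknown markers are kept as-is. */
    public static String fillTemplate(String template, Map<String, String> tables) {
        Matcher m = MARKER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = tables.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String c = "static const short yyr1[] = {     0,\n     3,     4 };";
        String template = "static final short yyr1[] = { @@yyr1@@ };";
        System.out.println(fillTemplate(template, extractTables(c)));
        // prints: static final short yyr1[] = { 0, 3, 4 };
    }
}
```

Separating extraction from insertion means one scan of the C file can feed any number of templates.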
In addition to generating parsers,
jb provides two methods of generating corresponding lexers.
- Flex -- the jb system (starting with version 3.0) can also
take lexers generated using the Gnu flex generator and
translate them to execute in Java; this is accomplished with a
program called jf.
- yylex.generic -- A generic ad-hoc lexer that can be modified
to produce lexers for typical programming languages.
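For readers unfamiliar with ad-hoc lexers, the sketch below shows the general shape of one: skip whitespace, then dispatch on the first character of the next lexeme. `SimpleLexer`, its token codes, and its `next()`/`text()` methods are hypothetical and are not the interface of yylex.generic; a real lexer would return the token numbers Bison assigns to the grammar's tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** A minimal hand-written lexer (hypothetical; not jb's yylex.generic). */
public class SimpleLexer {
    // Hypothetical token codes; single-character tokens use their char value.
    public static final int EOF = 0, IDENT = 258, NUMBER = 259;

    private final String input;
    private int pos = 0;
    private String text = "";   // text of the most recently scanned lexeme

    public SimpleLexer(String input) { this.input = input; }

    public String text() { return text; }

    /** Scans the next lexeme and returns its token code. */
    public int next() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) { text = ""; return EOF; }
        int start = pos;
        char c = input.charAt(pos);
        if (Character.isLetter(c) || c == '_') {          // identifier
            while (pos < input.length()
                   && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == '_')) pos++;
            text = input.substring(start, pos);
            return IDENT;
        }
        if (Character.isDigit(c)) {                       // integer literal
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            text = input.substring(start, pos);
            return NUMBER;
        }
        pos++;                                            // any other char is a one-char token
        text = String.valueOf(c);
        return c;
    }

    public static void main(String[] args) {
        SimpleLexer lex = new SimpleLexer("x1 = 42 + y;");
        List<String> out = new ArrayList<>();
        for (int t = lex.next(); t != EOF; t = lex.next()) out.add(t + ":" + lex.text());
        System.out.println(out);   // prints [258:x1, 61:=, 259:42, 43:+, 258:y, 59:;]
    }
}
```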
FTP
Various versions of jb are available via anonymous ftp.
The following references are symbolic links to the latest versions.
- Source:
- ftp://ftp.cs.colorado.edu/pub/cs/distribs/arcadia/jb.tar
- Information file:
- ftp://ftp.cs.colorado.edu/pub/cs/distribs/arcadia/jb.txt
Dependencies:
- Bison --
Jb has been tested with Bison versions 1.24 and 1.25.
Other versions will probably work as long as the format of
Bison's output parser has not changed drastically, but jb will issue
a warning when used with other versions.
- Flex -- Jb has been tested with flex versions 2.5.2 and 2.5.3.
- Java -- Jb produces code targeted at Java version 1.0.2;
later versions should also work.
- Starwave Regular Expression Package -- I use this package (version 1.10)
to provide regular expression support.
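A version warning like the one described for Bison can be sketched as below. This is an illustration under an assumption, not jb's check: it assumes the header comment of the Bison output contains the text `Bison version X.Y`. `BisonVersionCheck` and its methods are hypothetical; the tested versions come from the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BisonVersionCheck {
    // Bison versions listed as tested in the dependency list.
    private static final List<String> TESTED = Arrays.asList("1.24", "1.25");

    // Assumption: the output's header comment contains "Bison version X.Y".
    private static final Pattern VERSION = Pattern.compile("Bison version\\s+(\\d+\\.\\d+)");

    /** Returns the first version number found, or null if there is none. */
    public static String findVersion(String cSource) {
        Matcher m = VERSION.matcher(cSource);
        return m.find() ? m.group(1) : null;
    }

    /** Returns a warning for an unknown or untested version, or null if it was tested. */
    public static String warning(String cSource) {
        String v = findVersion(cSource);
        if (v == null) return "warning: could not determine bison version";
        if (!TESTED.contains(v)) return "warning: jb has not been tested with bison " + v;
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical header text, used only to exercise the check.
        System.out.println(warning("/* made from calc.y by GNU Bison version 1.25 */"));
        System.out.println(warning("/* made from calc.y by GNU Bison version 1.28 */"));
    }
}
```

Warning rather than refusing matches the stated policy: untested versions probably work, so the user is told but not blocked.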
Changes Incorporated into Version 4
Minor version levels are indicated in parentheses.
- (0) Rebuilt jb and jf as Java programs. Unfortunately,
Java is not a good string-processing language, so the new programs
are noticeably slower than the older Tcl versions.
- (0) Replaced the use of sed with a Java program called subst.
- (0) Rebuilt the mechanism by which tokens are passed between
the parser and the lexer. The change attempts to avoid
creating a new token for every lexeme in the input
and to reduce the number of strings that are re-allocated.
This destroys compatibility with earlier versions of jb.
- (0) Added support for the bison %union construct
to allow typing of tokens and rules. This can simplify semantic
actions by automating some casting of values.
- (1) The static initializer in class YYtokentypes is incorrect.
In the file jbf/yytokentypes.template, line 1 should be substituted
for line 2.
- for(int i=0;i<tokenmax;i++) {Tokentype[i] = new Integer(i);}
- for(int i=0;i<=tokenmax;i++) {Tokentype[i] = new Integer(i);}
- (2) Incorporated a suggestion from Steven G. Parker
(sparker@taz.cs.utah.edu) to buffer file streams. He notes the following:
``By default, Java doesn't buffer File*Streams. So each read/write to
that stream translates to a read/write system call. If that isn't bad
enough, the implementation of threads in Java makes each read/write system
call take about 4 calls - to getpid, fcntl, and some other junk. Adding
the buffer makes it read in 4K chunks. For my one test case, it knocked
it down from around 60 seconds to around 20 seconds.
...You have to flush them manually -
I had a program that would chop off the tail end because
the buffer wasn't getting flushed when the program exited.''
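A minimal sketch of the buffering change, written in modern Java (try-with-resources did not exist in Java 1.0.2). The class name `BufferedCopy` is hypothetical; the 4096-byte buffer matches the 4K chunks mentioned in the quote. Closing a `BufferedOutputStream` also flushes it. The explicit `flush()` mirrors the advice quoted above: a buffered stream that is neither flushed nor closed before the program exits loses its tail.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;

public class BufferedCopy {
    /** Copies in to out one byte at a time, but through 4K buffers on both sides. */
    public static void copy(File in, File out) throws IOException {
        try (InputStream is = new BufferedInputStream(new FileInputStream(in), 4096);
             BufferedOutputStream os = new BufferedOutputStream(new FileOutputStream(out), 4096)) {
            int c;
            while ((c = is.read()) != -1) {   // most reads are served from the buffer, not the OS
                os.write(c);
            }
            os.flush();                       // push the buffered tail to the file
        }
    }

    public static void main(String[] args) throws IOException {
        File in = File.createTempFile("jb-in", ".txt");
        File out = File.createTempFile("jb-out", ".txt");
        try (Writer w = new FileWriter(in)) { w.write("token stream\n"); }
        copy(in, out);
        System.out.println(in.length() == out.length());   // prints true
        in.delete();
        out.delete();
    }
}
```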
Changes Incorporated into Version 3
Minor version levels are indicated in parentheses.
- (0) Major renaming of various classes to avoid name clashes.
This destroys compatibility with earlier versions of jb.
- (0) Moved the calc and idl examples to a general examples
directory and added a grammar for Java.
- (0) Renamed the bison package to "jbf"
(Java+Bison+Flex).
- (0) Added the ability to store line number and character positions;
this also required modifications to jbf/yylex.generic.
- (0) Introduced a general nonterminal mechanism that
can be used to build an intermediate representation tree
(à la the Arcadia IRIS mechanism) for any parse.
The new non-terminal class (YYnonterm) and the old
token class (now called YYtoken) are both subtypes
of a new class called YYnode.
The non-terminal names and an index of them are inserted into
the generated class
YYtokentypes (formerly Tokentypes).
- (0) Dropped JavaLex in favor of the GNU Flex system for generating
lexers. Flex produces much smaller lexers than JavaLex, comparable
in size to the generic lexer.
- (1) Fixed a bug in YYlexbuffer that miscounted the number
of lines after a purge finished.
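Because token and non-terminal classes share a common node supertype, semantic actions can build one uniform tree. A minimal sketch of that idea follows. `Node`, `Token`, and `NonTerm` are hypothetical stand-ins for YYnode, YYtoken, and YYnonterm and do not reproduce their actual fields or methods; the token code 259 is likewise made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical common supertype (plays the role of YYnode).
abstract class Node {
    abstract String toTree();
}

// Leaf node for a lexeme (plays the role of YYtoken).
class Token extends Node {
    final int type;
    final String text;
    Token(int type, String text) { this.type = type; this.text = text; }
    String toTree() { return text; }
}

// Interior node for a reduced rule (plays the role of YYnonterm).
class NonTerm extends Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    NonTerm(String name, Node... kids) {
        this.name = name;
        for (Node k : kids) children.add(k);
    }
    String toTree() {   // prints the subtree as an S-expression
        StringBuilder sb = new StringBuilder("(").append(name);
        for (Node c : children) sb.append(' ').append(c.toTree());
        return sb.append(')').toString();
    }
}

public class TreeDemo {
    public static void main(String[] args) {
        // What an action for "expr : expr '+' expr" might build.
        Node tree = new NonTerm("expr",
            new NonTerm("expr", new Token(259, "1")),
            new Token('+', "+"),
            new NonTerm("expr", new Token(259, "2")));
        System.out.println(tree.toTree());   // prints (expr (expr 1) + (expr 2))
    }
}
```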
Mailing List
If you are interested in receiving occasional mailings
about this system, please send your preferred email address
to the contact address below, mentioning the name of this system.
Acknowledgments
This work is sponsored by the Air Force Materiel Command, Rome
Laboratory, and the Advanced Research Projects Agency under Contract
Number F30602-94-C-0253.
Dennis Heimbigner
<dennis@cs.colorado.edu>
SERL
<serl@cs.colorado.edu>