VavrCC
VavrCC is a parser generator for use with Java applications. It is a fork of JavaCC 7.0.6.
A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar.
In addition to the parser generator itself, VavrCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with VavrCC), actions and debugging.
The generated parser has no dependencies except the JRE.
This README is a brief overview of the core features and how to set things up to get started with VavrCC. Note: you may be able to use your existing JavaCC grammars with VavrCC.
Introduction
Differences from JavaCC
VavrCC is based on JavaCC, and it makes a number of changes:
- VavrCC uses Gradle for the build: code is much easier to load in an IDE
- VavrCC has better test coverage (tests are executed in Travis and GitHub Actions CI)
- In progress: automatic fuzzer generator. VavrCC will be able to generate valid-looking inputs for testing grammars.
Features
- VavrCC generates top-down (recursive descent) parsers as opposed to bottom-up parsers generated by YACC-like tools. This allows the use of more general grammars, although left-recursion is disallowed. Top-down parsers have a number of other advantages (besides more general grammars) such as being easier to debug, having the ability to parse to any non-terminal in the grammar, and also having the ability to pass values (attributes) both up and down the parse tree during parsing.
- By default, VavrCC generates an LL(1) parser. However, there may be portions of grammar that are not LL(1). VavrCC offers the capabilities of syntactic and semantic lookahead to resolve shift-shift ambiguities locally at these points. For example, the parser is LL(k) only at such points, but remains LL(1) everywhere else for better performance. Shift-reduce and reduce-reduce conflicts are not an issue for top-down parsers.
- VavrCC generates parsers that are 100% pure Java, so there is no runtime dependency on VavrCC and no special porting effort required to run on different machine platforms.
- VavrCC allows extended BNF specifications - such as (A)*, (A)+ etc - within the lexical and the grammar specifications. Extended BNF relieves the need for left-recursion to some extent. In fact, extended BNF is often easier to read as in A ::= y(x)* versus A ::= Ax|y.
- The lexical specifications (such as regular expressions, strings) and the grammar specifications (the BNF) are both written together in the same file. It makes grammars easier to read since it is possible to use regular expressions inline in the grammar specification, and also easier to maintain.
- The lexical analyzer of VavrCC can handle full Unicode input, and lexical specifications may also include any Unicode character. This facilitates descriptions of language elements such as Java identifiers that allow certain Unicode characters (that are not ASCII), but not others.
- VavrCC offers Lex-like lexical state and lexical action capabilities. Specific aspects in VavrCC that are superior to other tools are the first class status it offers concepts such as TOKEN, MORE, SKIP and state changes. This allows cleaner specifications as well as better error and warning messages from VavrCC.
- Tokens that are defined as special tokens in the lexical specification are ignored during parsing, but these tokens are available for processing by the tools. A useful application of this is in the processing of comments.
- Lexical specifications can define tokens not to be case-sensitive either at the global level for the entire lexical specification, or on an individual lexical specification basis.
- VavrCC comes with JJTree, an extremely powerful tree building pre-processor.
- VavrCC also includes JJDoc, a tool that converts grammar files to documentation files, optionally in HTML.
- VavrCC offers many options to customize its behavior and the behavior of the generated parsers. Examples of such options are the kinds of Unicode processing to perform on the input stream, the number of tokens of ambiguity checking to perform etc.
- VavrCC error reporting is among the best in parser generators. VavrCC generated parsers are able to clearly point out the location of parse errors with complete diagnostic information.
- Using options DEBUG_PARSER, DEBUG_LOOKAHEAD, and DEBUG_TOKEN_MANAGER, users can get in-depth analysis of the parsing and the token processing steps.
- The VavrCC release includes a wide range of examples including Java and HTML grammars. The examples, along with their documentation, are a great way to get acquainted with VavrCC.
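The points above about LL(1) parsing, local lookahead, and extended BNF can be made concrete with a small hand-written sketch. It is not code generated by VavrCC; the class name LookaheadSketch, its methods, and the toy grammar (S ::= Pair | Run, Pair ::= "y" "z", Run ::= "y" ("x")*) are all assumptions made up for illustration. Both alternatives of S start with "y", so that one choice needs two tokens of lookahead, while the (x)* repetition is a plain loop decided with a single token and needs no left recursion.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative recursive-descent recognizer; not VavrCC output. */
public class LookaheadSketch {
    private final List<String> tokens;
    private int pos = 0;

    public LookaheadSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns the token k positions ahead (k = 1 is the next token) without consuming it. */
    private String peek(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    /** Consumes the next token if it equals the expected one, otherwise fails. */
    private void expect(String kind) {
        if (!peek(1).equals(kind)) {
            throw new IllegalArgumentException(
                "Expected " + kind + " but found " + peek(1) + " at token " + pos);
        }
        pos++;
    }

    /** S ::= Pair | Run, followed by end of input. */
    public String parseS() {
        String result;
        // Both alternatives start with "y", so one token cannot decide.
        // Look two tokens ahead at this point only; every other decision stays LL(1).
        if (peek(1).equals("y") && peek(2).equals("z")) {
            result = parsePair();
        } else {
            result = parseRun();
        }
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected " + peek(1) + " at token " + pos);
        }
        return result;
    }

    /** Pair ::= "y" "z" */
    private String parsePair() {
        expect("y");
        expect("z");
        return "pair";
    }

    /** Run ::= "y" ( "x" )* ; the repetition is a loop, so no left recursion is needed. */
    private String parseRun() {
        expect("y");
        int count = 0;
        while (peek(1).equals("x")) { // single-token (LL(1)) decision
            pos++;
            count++;
        }
        return "run:" + count;
    }

    public static void main(String[] args) {
        System.out.println(new LookaheadSketch(Arrays.asList("y", "z")).parseS());
        System.out.println(new LookaheadSketch(Arrays.asList("y", "x", "x")).parseS());
    }
}
```

Running main prints pair and then run:2. In a grammar file, the same idea would be expressed declaratively with a lookahead specification at the choice point rather than hand-written peeking.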
Example
This example recognizes matching braces followed by zero or more line terminators and then an end of file.
Examples of legal strings in this grammar are:
{}, {{{{{}}}}}, etc.
Examples of illegal strings are:
{}{}, }{}}, { }, {x}, etc.
Grammar
PARSER_BEGIN(Example)

/** Simple brace matcher. */
public class Example {

  /** Main entry point. */
  public static void main(String args[]) throws ParseException {
    Example parser = new Example(System.in);
    parser.Input();
  }

}

PARSER_END(Example)

/** Root production. */
void Input() :
{}
{
  MatchedBraces() ("\n"|"\r")* <EOF>
}

/** Brace matching production. */
void MatchedBraces() :
{}
{
  "{" [ MatchedBraces() ] "}"
}
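For readers new to recursive descent, the two productions above correspond roughly to the following hand-written recognizer, with one method per production. This is only a sketch: the class BraceRecognizer and its method names are hypothetical, it works on characters directly instead of tokens, and it does not distinguish lexical errors from parse errors the way a generated parser does.

```java
/** Illustrative hand-written equivalent of the brace grammar; not VavrCC output. */
public class BraceRecognizer {
    private final String text;
    private int pos = 0;

    public BraceRecognizer(String text) {
        this.text = text;
    }

    /** Returns true if the whole string matches the Input production. */
    public static boolean matches(String text) {
        try {
            new BraceRecognizer(text).input();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Input ::= MatchedBraces ( "\n" | "\r" )* end-of-input */
    void input() {
        matchedBraces();
        while (pos < text.length()
                && (text.charAt(pos) == '\n' || text.charAt(pos) == '\r')) {
            pos++;
        }
        if (pos != text.length()) {
            throw new IllegalArgumentException("Unexpected character at offset " + pos);
        }
    }

    /** MatchedBraces ::= "{" [ MatchedBraces ] "}" */
    void matchedBraces() {
        expect('{');
        // The optional part is taken only when the next character is "{".
        if (pos < text.length() && text.charAt(pos) == '{') {
            matchedBraces();
        }
        expect('}');
    }

    /** Consumes the expected character or fails with the current offset. */
    private void expect(char c) {
        if (pos >= text.length() || text.charAt(pos) != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{}", "{{{{{}}}}}", "{}{}", "}{}}", "{ }", "{x}"}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

The first two inputs in main are accepted and the remaining four are rejected, matching the legal and illegal strings listed earlier.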
Output
$ java Example
{{}}<return>

$ java Example
{x<return>
Lexical error at line 1, column 2. Encountered: "x"
TokenMgrError: Lexical error at line 1, column 2. Encountered: "x" (120), after : ""
    at ExampleTokenManager.getNextToken(ExampleTokenManager.java:146)
    at Example.getToken(Example.java:140)
    at Example.MatchedBraces(Example.java:51)
    at Example.Input(Example.java:10)
    at Example.main(Example.java:6)

$ java Example
{}}<return>
ParseException: Encountered "}" at line 1, column 3.
Was expecting one of:
    <EOF>
    "\n" ...
    "\r" ...
    at Example.generateParseException(Example.java:184)
    at Example.jj_consume_token(Example.java:126)
    at Example.Input(Example.java:32)
    at Example.main(Example.java:6)
Getting Started
Follow the steps here to get started with VavrCC.
This guide will walk you through building the project locally, running an existing example, and setting up to develop and test your own VavrCC application.
Download & Installation
VavrCC has not been released yet.
Installation
Once a release is available, install VavrCC by navigating to the download directory and typing:
$ unzip vavr-7.0.5.zip
or
$ tar xvf vavr-7.0.5.tar.gz
Once you have completed installation, add the bin/ directory in the VavrCC installation to your PATH. The VavrCC, JJTree, and JJDoc invocation scripts/executables reside in this directory.
Binary Distribution
The binary distributions contain the VavrCC, JJTree and JJDoc sources, launcher scripts, example grammars and documentation. They also contain a bootstrap version of VavrCC needed to build VavrCC.
On Unix-based systems, you need to make sure the files in the bin/ directory of the distribution are in your path.
Building VavrCC from Source
Prerequisites for building VavrCC:
- Git
- Java 8 (or Java 11)
Note: the build system is Gradle; however, several integration tests use Ant. You don't need to install Ant separately.
$ git clone https://github.com/vavrcc/vavrcc.git
$ cd vavrcc
$ ./gradlew build
This will build the vavrcc.jar file in the build/ directory.
Developing VavrCC
IntelliJ IDEA 2019.2 or later is recommended.
IntelliJ IDEA
IntelliJ IDEA supports Gradle out of the box, and there is a plugin for JavaCC grammar development.
- IntelliJ download: https://www.jetbrains.com/idea/
- IntelliJ JavaCC Plugin: https://plugins.jetbrains.com/plugin/11431-javacc/
Eclipse IDE
- Eclipse download: https://www.eclipse.org/ide/
- Eclipse JavaCC Plugin: https://marketplace.eclipse.org/content/javacc-eclipse-plug
Support
Don’t hesitate to ask!
Open an issue if you find a bug in VavrCC.
Resources
JavaCC books, tutorials, and articles are relevant for VavrCC as well: https://github.com/javacc/javacc/#resources
Parsing theory
- Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd Edition, Addison-Wesley, 2006, ISBN 0-3211314-3-6 (book, pdf).
- Charles N. Fischer and Richard J. Leblanc, Jr., Crafting a Compiler with C., Pearson, 1991. ISBN 0-8053216-6-7 (book).
License
VavrCC is an open source project released under the BSD-3-Clause license.