ANTLR Tips

ANTLR can be a beast sometimes. Grammars are hard because no piece stands alone; each one is intertwined with all the others. Change one rule and you can introduce ambiguities in many other rules that call it. Here are some good practices I've learned about working with ANTLR.

Look at the generated code! This is by far the best way to understand ANTLR. The code is meant to be human readable and debuggable. If something is going wrong, look at the generated code. Try to figure out what ANTLR is doing. Don't understand predicates? Put one in and look at the generated code. You'll get a feel for what lookahead is doing, how predicates work, how closures turn into loops, and anything else you want to know.
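
To make the "closures turn into loops" point concrete, here is a small hand-written sketch of the shape such code tends to take. It is not actual ANTLR output. The rule idList, the token constants, and the class ClosureSketch are all made up for illustration, and real generated code will differ in its details.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration, NOT actual ANTLR output.
// Hypothetical rule being mimicked:  idList : ID ( COMMA ID )* ;
class ClosureSketch {
    // Hypothetical token types.
    static final int EOF = 0, ID = 1, COMMA = 2;

    private final int[] tokens;   // token types, assumed already lexed
    private int pos = 0;          // index of the next unconsumed token
    final List<String> log = new ArrayList<>();

    ClosureSketch(int... tokens) { this.tokens = tokens; }

    // LA(1): the type of the next token, or EOF once the input runs out.
    int la1() { return pos < tokens.length ? tokens[pos] : EOF; }

    // Consume one token of the expected type or fail loudly.
    void match(int type) {
        if (la1() != type) {
            throw new IllegalStateException(
                "expected token " + type + " but found " + la1() + " at index " + pos);
        }
        log.add("match " + type);
        pos++;
    }

    // The ( COMMA ID )* closure becomes a loop guarded by one token of
    // lookahead: keep going while the next token can start the subrule.
    void idList() {
        match(ID);
        while (la1() == COMMA) {
            match(COMMA);
            match(ID);
        }
    }

    int consumed() { return pos; }
}
```

Calling idList() on the token sequence ID COMMA ID consumes all three tokens. A trailing COMMA with nothing after it fails inside the loop body, which is exactly the kind of behavior that becomes obvious once you read the loop.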

Use revision control. I use RCS and check in my .g files with every compile, with a little log message about what I'm doing and why. It's easy to make a few changes at a time and really hose things up. RCS provides a safety net so I can radically change my grammars and know that I can always get back to a known state.

Write lots of tests. My GCC grammar contains over 80 tests of various C and GCC constructs. They really help during development because after every change I can run the tests to be sure I didn't break anything. Every bug I find becomes a new test, and I fix the grammar until the test passes. Each time I added a GCC extension to the grammar I wrote a test so that I knew it was really working.
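
A regression runner for this does not need to be fancy. The sketch below is one possible shape, not the harness used for the GCC grammar. The ParseFn interface is a hypothetical hook where you would wrap your own generated lexer and parser, and the test cases are passed in as named strings (in practice each would probably be read from its own file).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal regression-runner sketch; all names here are hypothetical.
class GrammarRegression {

    // Hypothetical hook: wrap your generated lexer/parser in here and
    // let it throw when the input does not parse.
    interface ParseFn {
        void parse(String source) throws Exception;
    }

    // Runs every named case through the parser and returns one
    // "name: message" entry per failing case (empty list = all passed).
    static List<String> run(Map<String, String> cases, ParseFn parser) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> c : cases.entrySet()) {
            try {
                parser.parse(c.getValue());
            } catch (Exception e) {
                failures.add(c.getKey() + ": " + e.getMessage());
            }
        }
        return failures;
    }
}
```

Run it after every grammar change and treat a non-empty result as a broken build. When a new bug turns up, add the offending input as another case before touching the grammar.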

Learn to trace. The -traceParser, -traceLexer, and -traceTreeParser options are very valuable: they make the generated code print a message as each rule is entered and exited. My GCC grammar overrides the trace printout functions to indent with each nested rule call and to print readable token names instead of raw token numbers.
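
The indenting idea can be sketched as follows. This is a stand-alone illustration, not the code from the GCC grammar. The method names traceIn and traceOut are assumed to mirror the hooks the generated parser calls, so check your own generated code for the exact names and signatures before overriding anything. The IndentingTracer class and its token name table are hypothetical.

```java
// Stand-alone sketch of indenting trace output; all names are hypothetical.
// In a real grammar the same logic would live in overrides of the parser's
// trace hooks (assumed here to be called traceIn and traceOut).
class IndentingTracer {
    private int depth = 0;                          // current rule nesting depth
    private final StringBuilder out = new StringBuilder();
    private final String[] tokenNames;              // index = token type

    IndentingTracer(String... tokenNames) { this.tokenNames = tokenNames; }

    // Map a token type to a readable name, falling back to <number>.
    String nameOf(int type) {
        return (type >= 0 && type < tokenNames.length)
            ? tokenNames[type]
            : "<" + type + ">";
    }

    // Called on rule entry: print at the current depth, then go one deeper.
    void traceIn(String rule, int lookaheadType) {
        indent();
        out.append("> ").append(rule)
           .append("  LA(1)=").append(nameOf(lookaheadType)).append('\n');
        depth++;
    }

    // Called on rule exit: come back up one level, then print.
    void traceOut(String rule, int lookaheadType) {
        depth--;
        indent();
        out.append("< ").append(rule)
           .append("  LA(1)=").append(nameOf(lookaheadType)).append('\n');
    }

    private void indent() {
        for (int i = 0; i < depth; i++) out.append("  ");
    }

    String output() { return out.toString(); }
}
```

With nested rules, each level is pushed two spaces to the right, so the call structure of the parse is visible at a glance and mismatched entry and exit lines stand out.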