
I was researching the GCC compiler suite on Wikipedia here when I came across this:

GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written recursive-descent parsers; for C++ in 2004, and for C and Objective-C in 2006. Currently all front ends use hand-written recursive-descent parsers.

So, going by that last sentence (and as far as I trust Wikipedia), I can say that "C (gcc), C++ (g++), Objective-C, Objective-C++, Fortran (gfortran), Java (gcj), Ada (GNAT), Go (gccgo), Pascal (gpc),... Mercury, Modula-2, Modula-3, PL/I, D (gdc), and VHDL (ghdl)" are all front ends that no longer use a parser generator; that is, they all use hand-written parsers.

My question, then, is: is this practice ubiquitous? Specifically, I'm looking for definitive answers to "does the standard/official implementation of x have a hand-written parser?" for x in [Python, Swift, Ruby, Java, Scala, ML, Haskell]. (Information on any other languages is also welcome.) I'm sure I could find this on my own after a lot of digging, but I'm also sure the community can answer it easily. Thanks!

eatonphil

2 Answers


AFAIK, GCC uses hand-written parsers mainly to improve syntax error diagnostics, that is, to give error messages that are meaningful to humans.

Parsing theory (and the parser generators descending from it) is mostly about recognizing and parsing a correct input phrase. But for incorrect input we expect a compiler to do two things: give a meaningful error message, and keep parsing the rest of the input meaningfully after the syntax error.
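To make the error-recovery point concrete, here is a minimal sketch of a hand-written recursive-descent parser (not GCC's code). It parses a hypothetical toy statement form, `let NAME = NUMBER ;`, and every name in it (`Parser`, `expect`, `synchronize`, and so on) is illustrative. Because each grammar rule is an ordinary function, the parser knows exactly what it expected at the point of failure and can report it with a line and column. After an error it uses simple panic-mode recovery: it skips to the next `;` and carries on.

```python
import re
from dataclasses import dataclass

# Toy token pattern: skip whitespace, then a number, a name, or one punctuation char.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

@dataclass
class Token:
    kind: str  # "num", "ident", "punct" or "eof"
    text: str
    pos: int   # character offset into the source

def tokenize(src):
    tokens = []
    for m in TOKEN_RE.finditer(src):
        num, ident, punct = m.groups()
        if num is not None:
            tokens.append(Token("num", num, m.start(1)))
        elif ident is not None:
            tokens.append(Token("ident", ident, m.start(2)))
        else:
            tokens.append(Token("punct", punct, m.start(3)))
    tokens.append(Token("eof", "", len(src)))
    return tokens

class ParseError(Exception):
    pass

class Parser:
    """Hand-written recursive-descent parser for the hypothetical form
    'let NAME = NUMBER ;', with panic-mode error recovery."""

    def __init__(self, src):
        self.src = src
        self.tokens = tokenize(src)
        self.i = 0
        self.errors = []

    def where(self, pos):
        # Translate a character offset into a human-friendly "line, col".
        line = self.src.count("\n", 0, pos) + 1
        col = pos - (self.src.rfind("\n", 0, pos) + 1) + 1
        return f"line {line}, col {col}"

    def peek(self):
        return self.tokens[self.i]

    def advance(self):
        tok = self.tokens[self.i]
        if tok.kind != "eof":
            self.i += 1
        return tok

    def expect(self, kind, text=None):
        tok = self.peek()
        if tok.kind != kind or (text is not None and tok.text != text):
            wanted = repr(text) if text is not None else {"ident": "a name", "num": "a number"}[kind]
            found = "end of input" if tok.kind == "eof" else repr(tok.text)
            raise ParseError(f"{self.where(tok.pos)}: expected {wanted}, found {found}")
        return self.advance()

    def parse_program(self):
        stmts = []
        while self.peek().kind != "eof":
            try:
                stmts.append(self.parse_let())
            except ParseError as e:
                self.errors.append(str(e))  # report the error, then keep going
                self.synchronize()
        return stmts

    def parse_let(self):
        self.expect("ident", "let")
        name = self.expect("ident").text
        self.expect("punct", "=")
        value = int(self.expect("num").text)
        self.expect("punct", ";")
        return (name, value)

    def synchronize(self):
        # Panic mode: discard tokens up to and including the next ';'.
        while self.peek().kind != "eof":
            if self.advance().text == ";":
                return
```

Consider the three lines `let x = 1;`, `let = 2;` and `let y = 3;`. The parser returns the two well-formed statements plus one error pointing at line 2, column 5. Real front ends use much richer recovery strategies; this only shows why hand-written code makes such messages easy to produce.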

Also, old legacy languages like C11 or C++11 (which are conceptually old, even if their latest revision is only three years old) are not at all context-free. Dealing with that context sensitivity in grammars for parser generators (e.g. Bison or even Menhir) is tediously difficult.
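A classic instance of this context sensitivity is C's typedef-name ambiguity. `A * b;` declares `b` as a pointer if `A` is a typedef name, but is a multiplication expression otherwise. Bison-based C parsers usually handle this with the so-called "lexer hack", where the lexer consults the symbol table and returns a distinct token kind for type names. The toy sketch below consults that context directly instead. Its function name is hypothetical, it accepts whitespace-separated tokens only, and it recognizes just the two statement shapes shown.

```python
def classify_statements(statements):
    """Hypothetical toy: classify 'A * b;' using the typedef names seen so far."""
    typedef_names = set()  # context the parser accumulates as it goes
    results = []
    for stmt in statements:
        tokens = stmt.rstrip(";").split()
        if tokens[0] == "typedef":
            # 'typedef int A;' -> remember A as a type name
            typedef_names.add(tokens[-1])
            results.append("typedef")
        elif len(tokens) == 3 and tokens[1] == "*":
            # Same token sequence, different parse depending on context
            results.append("declaration" if tokens[0] in typedef_names else "expression")
        else:
            results.append("other")
    return results
```

Given `["A * b;", "typedef int A;", "A * b;"]`, the identical text `A * b;` is classified first as an expression and then as a declaration. No context-free grammar rule can make that distinction by itself.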


Parser generators and parser engines are quite general. The advantage of this generality is that building an accurate parser and getting it working quickly is, relatively speaking, easy.

The parser engine itself suffers on the performance front because of its generality. Hand-written code is typically significantly faster than a table-driven parser engine.
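To illustrate the generality trade-off, here is a hypothetical toy grammar (`E -> num E'`, `E' -> + num E' | empty`) recognized two ways. `table_parse` is a generic table-driven engine: the grammar lives entirely in the `TABLE` data, and the same loop would work for any LL(1) table. `hand_parse` hard-codes the same grammar as control flow. Bison actually generates LALR tables rather than LL(1) ones; LL(1) is used here only because its engine fits in a few lines. The sketch does not benchmark anything. It only shows where a generic engine's per-symbol table lookups and stack operations come from.

```python
# Hypothetical toy grammar:  E  -> num E'
#                            E' -> + num E' | (empty)
TABLE = {
    ("E", "num"): ["num", "E'"],
    ("E'", "+"): ["+", "num", "E'"],
    ("E'", "$"): [],
}
NONTERMINALS = {"E", "E'"}

def table_parse(tokens, start="E"):
    """Generic LL(1) engine: a stack plus table lookups, grammar-agnostic."""
    tokens = tokens + ["$"]  # end-of-input marker
    stack = ["$", start]
    i = 0
    while stack:
        top = stack.pop()
        tok = tokens[i]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                return False          # no table entry: syntax error
            stack.extend(reversed(rhs))
        elif top == tok:
            i += 1                    # terminal matched
        else:
            return False
    return i == len(tokens)

def hand_parse(tokens):
    """The same grammar written directly as control flow."""
    i = 0
    if i >= len(tokens) or tokens[i] != "num":
        return False
    i += 1
    while i < len(tokens) and tokens[i] == "+":
        i += 1
        if i >= len(tokens) or tokens[i] != "num":
            return False
        i += 1
    return i == len(tokens)
```

Both functions accept `["num", "+", "num"]` and reject `["num", "+"]`. Only the table-driven one can be reused for a different grammar by swapping in a new table.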

The second area where parser generators and engines have difficulty is that all real programming languages are context-sensitive, often in quite subtle ways. LR grammars are context-free, so many subtleties of position and environment cannot be expressed properly in the grammar itself. Attribute grammars attempt to address basic language rules like "declare before use". Wiring this context sensitivity into hand-written code is straightforward.
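Here is a minimal sketch of a "declare before use" rule checked directly inside a hand-written recursive-descent parser. The block-structured toy syntax is hypothetical: `var NAME;` declares a name, `NAME;` uses it, and `{ ... }` opens a new scope. The parser simply carries a stack of scopes alongside its position. It assumes well-formed input and does no syntax-error handling.

```python
import re

def check_declare_before_use(src):
    """Hypothetical toy syntax:  item := 'var' NAME ';' | NAME ';' | '{' item* '}'
    Assumes well-formed input; returns a list of scope errors."""
    tokens = re.findall(r"[A-Za-z_]\w*|[{};]", src)
    pos = 0
    scopes = [set()]  # stack of scopes; the innermost is last
    errors = []

    def parse_block():
        nonlocal pos
        pos += 1                # consume '{'
        scopes.append(set())    # entering a block opens a new scope
        while tokens[pos] != "}":
            parse_item()
        pos += 1                # consume '}'
        scopes.pop()            # names declared inside go out of scope

    def parse_item():
        nonlocal pos
        tok = tokens[pos]
        if tok == "{":
            parse_block()
        elif tok == "var":
            scopes[-1].add(tokens[pos + 1])
            pos += 3            # 'var' NAME ';'
        else:
            if not any(tok in scope for scope in scopes):
                errors.append(f"use of undeclared name '{tok}'")
            pos += 2            # NAME ';'

    while pos < len(tokens):
        parse_item()
    return errors
```

For `var x; x; { var y; y; x; } y;`, the only error reported is the final use of `y`, which is out of scope once the block closes. Expressing the same rule in a context-free grammar alone is not possible.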

BobDalgleish