What's the simplest example out there to explain the difference between Parse Trees and Abstract Syntax Trees?

Question

As I understand it, a parser creates a parse tree and then discards it. However, it can also produce an abstract syntax tree, which the compiler supposedly makes use of.

I'm under the impression that both the parse tree and the abstract syntax tree are created during the parsing stage. If so, could someone explain how they differ?

score 21 · Accepted Answer · edited Feb 20 '14 at 12:00

A parse tree is also known as a concrete syntax tree.

Basically, the abstract tree has less information than the concrete tree. The concrete tree contains each element in the language, whereas the abstract tree has thrown away the uninteresting pieces.

For example, take the expression: (2 + 5) * 8

The concrete tree looks like this:

  ( 2  + 5 )  * 8
  |  \ | / |  | |
  |   \|/  |  | |
   \___|__/   | |
       \______|/

Whereas the abstract tree looks like this:

      *
     / \
    +   8
   / \
  2   5

In the concrete case, the parentheses and all other pieces of the language are present in the tree. In the abstract case, the parentheses are gone because the information they carried (the grouping) is now expressed by the tree structure itself.
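To make the distinction concrete in code, here is a minimal Haskell sketch of both trees for this one expression. The type names (Cst, Ast), their constructors, and the toAst function are made up for illustration and are not taken from any real compiler; the sketch only covers numbers, + and *.

```haskell
-- Concrete syntax tree: mirrors the source text, parentheses included.
data Cst
  = CNum Int
  | CParen Cst          -- "(" e ")"
  | CBin Cst Char Cst   -- e op e, operator kept as its source character
  deriving (Eq, Show)

-- Abstract syntax tree: only what later stages need.
data Ast
  = Num Int
  | Add Ast Ast
  | Mul Ast Ast
  deriving (Eq, Show)

-- Drop the parentheses; the nesting of the tree already records grouping.
toAst :: Cst -> Maybe Ast
toAst (CNum n)      = Just (Num n)
toAst (CParen e)    = toAst e
toAst (CBin l op r) = do
  l' <- toAst l
  r' <- toAst r
  case op of
    '+' -> Just (Add l' r')
    '*' -> Just (Mul l' r')
    _   -> Nothing        -- operator not covered by this sketch

-- (2 + 5) * 8 as a concrete tree
example :: Cst
example = CBin (CParen (CBin (CNum 2) '+' (CNum 5))) '*' (CNum 8)
```

Evaluating toAst example yields Just (Mul (Add (Num 2) (Num 5)) (Num 8)): the CParen node has disappeared, and the fact that the addition happens first is carried purely by Add being a child of Mul.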

score 0 · Answer 2 · answered Feb 20 '14 at 13:04

The first thing you need to understand is that nobody forces you to write a parser or compiler in a certain way. Specifically, it is not necessarily the case that the result of a parser must be a tree. It can be any data structure that is suitable to represent the input.

For example, the following language:

prog:
      definition 
    | definition ';' prog
    ;

definition: .....

can be represented as a list of definitions. (Nitpickers will point out that a list is a degenerate tree, but anyway.)
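As a rough sketch of that idea, the parser below returns a plain Haskell list for the prog rule. Since the grammar for definition is not given, Definition is a hypothetical wrapper around opaque text, and parseDefinition is a stand-in that accepts anything that is not blank; it also assumes a definition never contains a ';' itself.

```haskell
-- Hypothetical: a definition is kept as opaque text, because the
-- grammar for it is not specified.
newtype Definition = Definition String
  deriving (Eq, Show)

-- prog: definition | definition ';' prog
-- The right-recursive rule maps directly onto the cons cell of a list.
parseProg :: String -> Maybe [Definition]
parseProg src =
  case break (== ';') src of
    (d, [])       -> (: []) <$> parseDefinition d
    (d, _ : rest) -> (:) <$> parseDefinition d <*> parseProg rest

-- Stand-in for a real definition parser: rejects empty or all-space
-- text and accepts everything else unchanged.
parseDefinition :: String -> Maybe Definition
parseDefinition s
  | all (== ' ') s = Nothing
  | otherwise      = Just (Definition s)
```

With these definitions, parseProg "a;b" gives Just [Definition "a", Definition "b"], while parseProg "a;" gives Nothing, because the grammar requires another definition after each ';'. No tree type is needed anywhere.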

Second, there is no need to hold onto the parse tree (or whatever data structure the parser returned). On the contrary, compilers are usually constructed as a sequence of passes, each of which transforms the result of the previous pass. Hence the overall layout of a compiler could look like this:

parser :: String             -> Maybe [Definitions]      -- parser
pass1  :: [Definitions]      -> Maybe DesugaredProg      -- desugarer
pass2  :: DesugaredProg      -> Maybe TypedProg          -- type checker
pass3  :: TypedProg          -> Maybe AbstractTargetLang -- code generation
pass4  :: AbstractTargetLang -> Maybe String             -- pretty printer

compiler :: String           -> Maybe String    -- transform source code to target code
compiler source = do
   defs  <- parser source
   desug <- pass1 defs
   typed <- pass2 desug
   targt <- pass3 typed
   pass4 targt

Bottom line: if you hear people talk about parse trees, abstract syntax trees, concrete syntax trees, etc., always substitute "data structure suitable for the given purpose", and you're fine.
