49

I am studying compilers and interpreters intensively. I want to check whether my basic understanding is right, so let's assume the following:

I have a language called "Foobish" whose only statement has the form

<OUTPUT> 'TEXT', <Number_of_Repeats>;

So if I want to print to the console 10 times, I would write

OUTPUT 'Hello World', 10;

I save this in a file named Hello World.foobish-file.

Now I write an interpreter in the language of my choice - C# in this case:

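using System;
using System.IO;
using System.Text.RegularExpressions;

namespace FoobishInterpreter
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            // Read the whole script file.
            string source = File.ReadAllText("Hello World.foobish-file");

            // Analyse and tokenize a single statement: OUTPUT 'TEXT', <Number_of_Repeats>;
            Match token = Regex.Match(source, @"^\s*OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;\s*$");
            if (!token.Success)
            {
                Console.Error.WriteLine("Syntax error: expected OUTPUT 'TEXT', <Number_of_Repeats>;");
                return;
            }

            string outputString = token.Groups[1].Value;
            int repeats = int.Parse(token.Groups[2].Value);

            // Execute the statement directly; no target program is produced.
            for (var i = 0; i < repeats; i++)
            {
                Console.WriteLine(outputString);
            }
        }
    }
}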

At a very basic level, the interpreter analyzes the script file and then executes the Foobish program in whatever way the interpreter's implementation dictates.

Would a compiler create machine language which runs on the physical hardware directly?

So an interpreter doesn't produce machine language, but does a compiler do it for its input?

Have I misunderstood anything about the basic way compilers and interpreters work?

GrayFox
  • 629

6 Answers

81

The terms "interpreter" and "compiler" are much fuzzier than they used to be. Many years ago it was more common for compilers to produce machine code to be executed later, while interpreters more or less "executed" the source code directly. So those two terms were well understood back then.

But today there are many variations on the use of "compiler" and "interpreter." For example, VB6 can "compile" to byte code (a form of intermediate language), which is then "interpreted" by the VB runtime. A similar process takes place in C#: the compiler produces CIL, which a Just-In-Time (JIT) compiler translates to machine code at run time. In the old days, that run-time stage would have been thought of as an interpreter. You can "freeze-dry" the output of the JIT into an actual binary executable by using NGen.exe, the product of which would have been the result of a compiler in the old days.

So the answer to your question is not nearly as straightforward as it once was.

Further Reading
Compilers vs. Interpreters on Wikipedia

Robert Harvey
  • 200,592
38

The summary I give below is based on "Compilers: Principles, Techniques, and Tools" by Aho, Lam, Sethi, and Ullman (Pearson International Edition, 2007), pages 1 and 2, with the addition of some ideas of my own.

The two basic mechanisms for processing a program are compilation and interpretation.

Compilation takes as input a source program in a given language and outputs a target program in a target language.

source program --> | compiler | --> target program

If the target language is machine code, it can be executed directly on some processor:

input --> | target program | --> output

Compilation involves scanning and translating the entire input program (or module) and does not involve executing it.
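
As a rough illustration of this definition, here is a minimal sketch of a compiler for the asker's Foobish language. It assumes the one-statement grammar from the question, and it picks Python source text as the target language purely for convenience; the names `STATEMENT` and `compile_foobish` are made up for this example.

```python
import re

# Assumed Foobish grammar, taken from the question: OUTPUT 'TEXT', <repeats>;
STATEMENT = re.compile(r"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;")

def compile_foobish(source: str) -> str:
    """Translate a whole Foobish program to Python source without running it."""
    lines = []
    for text, repeats in STATEMENT.findall(source):
        lines.append(f"for _ in range({int(repeats)}):")
        lines.append(f"    print({text!r})")
    return "\n".join(lines) + "\n"

target_program = compile_foobish("OUTPUT 'Hello World', 2;")
# The compiler's output is another program, not the text "Hello World".
print(target_program)
```

Nothing is printed by the Foobish program at this stage; the target program still has to be run by something else (here, a Python implementation).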

Interpretation takes as input the source program and its input, and produces the source program's output.

source program, input --> | interpreter | --> output

Interpretation usually involves processing (analyzing and executing) the program one statement at a time.
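
For contrast, a minimal sketch of an interpreter for the same Foobish language might look like the following. Again, the grammar is the one assumed in the question, and `interpret_foobish` and its `write` parameter are hypothetical names chosen for the example.

```python
import re

# Assumed Foobish grammar, taken from the question: OUTPUT 'TEXT', <repeats>;
STATEMENT = re.compile(r"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;")

def interpret_foobish(source: str, write=print) -> None:
    """Analyze and execute one statement at a time; no target program is produced."""
    for match in STATEMENT.finditer(source):
        text, repeats = match.group(1), int(match.group(2))
        for _ in range(repeats):
            write(text)

interpret_foobish("OUTPUT 'Hello World', 2; OUTPUT 'Bye', 1;")
```

The function returns nothing: the only observable result is the program's own output, which is exactly the difference from the compiler case.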

In practice, many language processors use a mix of the two approaches. E.g., Java programs are first translated (compiled) into an intermediate program (byte code):

source program --> | translator | --> intermediate program

The output of this step is then executed (interpreted) by a virtual machine:

intermediate program + input --> | virtual machine | --> output
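
The two-stage scheme can be sketched in miniature as follows. This is not how the JVM works internally; it is a toy with an invented two-opcode instruction set (`PUSH`, `REPEAT_PRINT`) for the Foobish language from the question, meant only to show the translator and the virtual machine as separate steps.

```python
import re

# Assumed Foobish grammar, taken from the question: OUTPUT 'TEXT', <repeats>;
STATEMENT = re.compile(r"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;")

def translate(source: str) -> list:
    """Translator: Foobish source -> intermediate program (a list of instructions)."""
    program = []
    for text, repeats in STATEMENT.findall(source):
        program.append(("PUSH", text))
        program.append(("REPEAT_PRINT", int(repeats)))
    return program

def run_vm(program: list) -> list:
    """Virtual machine: interprets the intermediate program, returns the output lines."""
    stack, output = [], []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "REPEAT_PRINT":
            text = stack.pop()
            output.extend([text] * operand)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output

bytecode = translate("OUTPUT 'Hello World', 2;")
print(bytecode)
print(run_vm(bytecode))
```

The intermediate program could be saved and run later, or on another machine, by anything that implements `run_vm`.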

To complicate things even further, the JVM can perform just-in-time compilation at runtime to convert byte code into another format, which is then executed.

Also, even when you compile to machine language, something still interprets your binary: the underlying processor, which is in effect an interpreter implemented in hardware. Therefore, even in this case you are using a hybrid of compilation and interpretation.

So, real systems use a mix of the two, and it is difficult to say whether a given language processor is a compiler or an interpreter, because it will probably use both mechanisms at different stages of its processing. In this case it would probably be more appropriate to use another, more neutral term.

Nevertheless, compilation and interpretation are two distinct kinds of processing, as described in the diagrams above.

To answer the initial questions:

A compiler would create machine language which runs on the physical hardware directly?

Not necessarily. A compiler translates a program written for a machine M1 to an equivalent program written for a machine M2. The target machine can be implemented in hardware or be a virtual machine. Conceptually there is no difference. The important point is that a compiler looks at a piece of code and translates it to another language without executing it.

So an interpreter doesn't produce machine language but a compiler does it for its input?

If by producing you are referring to the output, then a compiler produces a target program, which may be in machine language, whereas an interpreter produces no target program at all.

Giorgio
  • 19,764
25

A compiler would create machine language

No. A compiler is simply a program which takes as its input a program written in language A and produces as its output a semantically equivalent program in language B. Language B can be anything; it doesn't have to be machine language.

A compiler can compile:

  • from a high-level language to another high-level language (e.g. GWT, which compiles Java to ECMAScript)
  • from a high-level language to a low-level language (e.g. Gambit, which compiles Scheme to C)
  • from a high-level language to machine code (e.g. GCJ, which compiles Java to native code)
  • from a low-level language to a high-level language (e.g. Clue, which compiles C to Java, Lua, Perl, ECMAScript and Common Lisp)
  • from a low-level language to another low-level language (e.g. the Android SDK, which compiles JVML bytecode to Dalvik bytecode)
  • from a low-level language to machine code (e.g. the C1X compiler which is part of HotSpot, which compiles JVML bytecode to machine code)
  • from machine code to a high-level language (any so-called "decompiler", also Emscripten, which compiles LLVM machine code to ECMAScript)
  • from machine code to a low-level language (e.g. the JIT compiler in JPC, which compiles x86 native code to JVML bytecode)
  • from native code to native code (e.g. the JIT compiler in PearPC, which compiles PowerPC native code to x86 native code)

Note also that "machine code" is a really fuzzy term for several reasons. For example, there are CPUs which natively execute JVM byte code, and there are software interpreters for x86 machine code. So, what makes one "native machine code" but not the other? Also, every language is code for an abstract machine for that language.

There are many specialized names for compilers that perform special functions. Despite the fact that these are specialized names, all of these are still compilers, just special kinds of compilers:

  • if language A is perceived to be at roughly the same level of abstraction as language B, the compiler might be called a transpiler (e.g. a Ruby-to-ECMAScript-transpiler or an ECMAScript2015-to-ECMAScript5-transpiler)
  • if language A is perceived to be at a lower level of abstraction than language B, the compiler might be called a decompiler (e.g. an x86-machine-code-to-C-decompiler)
  • if language A == language B, the compiler might be called an optimizer, obfuscator, or minifier (depending on the particular function of the compiler)
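
To make the "language A == language B" case concrete, here is a small sketch of an optimizer for the asker's Foobish language: it reads Foobish and writes Foobish. The grammar is the one assumed in the question, and `optimize_foobish` with its single merging rule is invented for illustration.

```python
import re

# Assumed Foobish grammar, taken from the question: OUTPUT 'TEXT', <repeats>;
STATEMENT = re.compile(r"OUTPUT\s+'([^']*)'\s*,\s*(\d+)\s*;")

def optimize_foobish(source: str) -> str:
    """Foobish in, Foobish out: merge adjacent statements that print the same text."""
    merged = []  # list of [text, repeats] pairs
    for text, repeats in STATEMENT.findall(source):
        if merged and merged[-1][0] == text:
            merged[-1][1] += int(repeats)
        else:
            merged.append([text, int(repeats)])
    return " ".join(f"OUTPUT '{text}', {repeats};" for text, repeats in merged)

print(optimize_foobish("OUTPUT 'Hi', 2; OUTPUT 'Hi', 3; OUTPUT 'Bye', 1;"))
```

It is still a compiler in the sense used here: it takes a program, emits a semantically equivalent program, and never runs either one.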

which runs on the physical hardware directly?

Not necessarily. It could be run in an interpreter or in a VM. It could be further compiled to a different language.

So an interpreter doesn't produce machine language but a compiler does it for its input?

An interpreter doesn't produce anything. It just runs the program.

A compiler produces something, but it doesn't necessarily have to be machine language, it can be any language. It can even be the same language as the input language! For example, Supercompilers, LLC has a compiler that takes Java as its input and produces optimized Java as its output. There are many ECMAScript compilers which take ECMAScript as their inputs and produce optimized, minified, and obfuscated ECMAScript as their output.


Jörg W Mittag
  • 104,619
16

I think you should drop the notion of "compiler versus interpreter" entirely, because it's a false dichotomy.

  • A compiler is a transformer: It transforms a computer program written in a source language and outputs an equivalent in a target language. Usually, the source language is higher-level than the target language - and if it's the other way around, we often call that kind of transformer a decompiler.
  • An interpreter is an execution engine. It executes a computer program written in one language, according to the specification of that language. We mostly use the term for software (but in a way, a classical CPU can be viewed as a hardware-based "interpreter" for its machine code).

The collective word for making an abstract programming language useful in the real world is implementation.

In the past, a programming language implementation often consisted of just a compiler (and the CPU it generated code for) or just an interpreter - so it may have looked like these two kinds of tools are mutually exclusive. Today, you can clearly see that this isn't the case (and it never was to begin with). Taking a sophisticated programming language implementation and attempting to pin the name "compiler" or "interpreter" on it will often lead to inconclusive or inconsistent results.

A single programming language implementation can involve any number of compilers and interpreters, often in multiple forms (standalone, on-the-fly), any number of other tools, like static analyzers and optimizers, and any number of steps. It can even include entire implementations of any number of intermediate languages (that may be unrelated to the one being implemented).

Examples of implementation schemes include:

  • A C compiler that transforms C to x86 machine code, and an x86 CPU that executes that code.
  • A C compiler that transforms C to LLVM IR, an LLVM backend compiler that transforms LLVM IR to x86 machine code, and an x86 CPU that executes that code.
  • A C compiler that transforms C to LLVM IR, and an LLVM interpreter that executes LLVM IR.
  • A Java compiler that transforms Java to JVM bytecode, and a JRE with an interpreter that executes that code.
  • A Java compiler that transforms Java to JVM bytecode, and a JRE with both an interpreter that executes some parts of that code and a compiler that transforms other parts of that code to x86 machine code, and an x86 CPU that executes that code.
  • A Java compiler that transforms Java to JVM bytecode, and an ARM CPU that executes that code.
  • A C# compiler that transforms C# to CIL, a CLR with a compiler that transforms CIL to x86 machine code, and an x86 CPU that executes that code.
  • A Ruby interpreter that executes Ruby.
  • A Ruby environment with both an interpreter that executes Ruby and a compiler that transforms Ruby to x86 machine code, and an x86 CPU that executes that code.

...and so on.

7

While the lines between compilers and interpreters have gotten fuzzy over time, one can still draw a line between them by looking at the semantics of what the program should do and what the compiler/interpreter does.

A compiler will generate another program (typically in a lower-level language like machine code) which, if that program is run, will do what your program should do.

An interpreter will do what your program should do.

With these definitions, the places where it gets fuzzy are the cases where your compiler/interpreter can be thought of as doing different things depending on how you look at it. For example, Python takes your Python code and compiles it into Python bytecode. If this Python bytecode is run through a Python bytecode interpreter, it does what your program was supposed to do. In most situations, however, Python developers think of both of those steps as one big step, so they choose to think of the CPython interpreter as interpreting their source code, and the fact that it got compiled along the way is considered an implementation detail. In this way, it's all a matter of perspective.
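
The two steps can be made visible from inside Python itself. The sketch below uses the built-in `compile` and `exec` functions and the standard `dis` module; the one-line source string is just an example, and the exact bytecode that `dis` prints differs between CPython versions.

```python
import dis

source = "result = 6 * 7\n"

# Step 1: compile source text to a code object holding CPython bytecode.
code_object = compile(source, "<example>", "exec")

# Step 2: the bytecode interpreter executes the code object.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])

# Optional: show the bytecode that step 1 produced (output varies by version).
dis.dis(code_object)
```

Normally both steps happen behind a single `python script.py`, which is why most people never think of the compile step at all.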

Cort Ammon
  • 11,917
5

Here's a simple conceptual disambiguation between compilers and interpreters.

Consider 3 languages: programming language, P (what the program is written in); domain language, D (for what goes on with the running program); and target language, T (some third language).

Conceptually,

  • a compiler translates P to T so that you can evaluate T(D); whereas

  • an interpreter evaluates P(D) directly.
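
One way to picture the P, D, T distinction is the sketch below. P is a made-up mini-language of `("add", n)` and `("mul", n)` steps, D is an integer, and T is Python source turned into a function; all of these choices and names are assumptions made only for this illustration.

```python
from typing import Callable, List, Tuple

Program = List[Tuple[str, int]]  # P: hypothetical mini-language of ("add"|"mul", n) steps

def interpret(program: Program, data: int) -> int:
    """Interpreter: evaluates P(D) directly, walking the program on every call."""
    for op, n in program:
        if op == "add":
            data += n
        elif op == "mul":
            data *= n
        else:
            raise ValueError(f"unknown op: {op}")
    return data

def compile_program(program: Program) -> Callable[[int], int]:
    """Compiler: translates P to T (here, Python source turned into a function)."""
    expr = "d"
    for op, n in program:
        if op not in ("add", "mul"):
            raise ValueError(f"unknown op: {op}")
        expr = f"({expr} {'+' if op == 'add' else '*'} {n})"
    return eval(f"lambda d: {expr}")

p = [("add", 3), ("mul", 2)]
t = compile_program(p)          # P -> T, no data has been processed yet
print(interpret(p, 4), t(4))    # P(D) and T(D) give the same answer
```

The compiler never sees D, and the interpreter never produces T; both end up computing the same result for the same data.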

Lawrence
  • 657