16

As per Wikipedia: A compiled language is a programming language whose implementations are typically compilers (translators that generate machine code from source code). And an interpreted language is a type of programming language for which most of its implementations execute instructions directly and freely, without previously compiling a program into machine-language instructions.

Hence the following is clear.

  • C, C++ and a few other similar languages - Compiled Language
  • Shell script, Perl, Ruby and some more - Interpreted Language

However, there is a third kind of language as well: languages like C# and Java, which use both a compiler and a JIT at run time. So my question is: is there a separate name for such languages, or can they be categorized as one of the above types? An explanatory answer would be most helpful.

EDIT:

From Wikipedia and this SO post (Interpreted vs Compiled), it is evident that there are two well-defined types of language implementation. But my question is whether two categories are sufficient: can all languages be fit into these two, or is there a third one?

Sisir
  • 888

5 Answers

52

The answer to your question:

Can every language be categorized as either compiled or interpreted?

Is "No", but not for the reason you think it is. The reason is not that there is a third missing category, the reason is that the categorization itself is nonsensical.

There is no such thing as a "compiled language" or an "interpreted language". Those terms are not even wrong, they are nonsensical.

Programming languages are sets of abstract mathematical rules, definitions, and restrictions. Programming languages aren't compiled or interpreted. Programming languages just are. [Credit goes to Shriram Krishnamurthi who said this in an interview on Channel 9 years ago (at about 51:37-52:20).]

In fact, a programming language can exist perfectly well without having any interpreter or compiler! For example, Konrad Zuse's Plankalkül, which he designed in the 1940s, went without an implementation for decades. You could still write programs in it, you could analyze those programs, reason about them, prove properties about them … you just couldn't execute them. (Well, actually, even that is wrong: you can of course run them in your head or with pen and paper.)

Compilation and interpretation are traits of the compiler or interpreter (duh!), not the programming language. Compilation and interpretation live on a different level of abstraction than programming languages: a programming language is an abstract concept, a specification, a piece of paper. A compiler or interpreter is a concrete piece of software (or hardware) that implements that specification. If English were a typed language, the terms "compiled language" and "interpreted language" would be type errors. [Again, credit to Shriram Krishnamurthi.]

Every programming language can be implemented by a compiler. Every programming language can be implemented by an interpreter. Many modern mainstream programming languages have both interpreted and compiled implementations. Many modern mainstream high-performance programming language implementations have both compilers and interpreters.

There are interpreters for C and for C++. On the other hand, every single current major mainstream implementation of ECMAScript, PHP, Python, Ruby, and Lua has a compiler. The original version of Google's V8 ECMAScript engine was a pure native machine code compiler. (They went through several different designs, and the current version does have an interpreter, but for many years, it didn't have one.) XRuby and Ruby.NET were purely compiled Ruby implementations. IronRuby started out as a purely compiled Ruby implementation, then added an interpreter later in order to improve performance. Opal is a purely compiled Ruby implementation.

Some people might say that the terms "interpreted language" or "compiled language" make sense to apply to programming languages that can only be implemented by an interpreter or by a compiler. But, no such programming language exists. Every programming language can be implemented by an interpreter and by a compiler.

For example, you can automatically and mechanically derive a compiler from an interpreter using the Second Futamura Projection. It was first described by Prof. Yoshihiko Futamura in his 1971 paper Partial Evaluation of Computation Process – An approach to a Compiler-Compiler (Japanese), an English version of which was republished 28 years later. It uses Partial Evaluation, by partially evaluating the partial evaluator itself with respect to the interpreter, thus yielding a compiler.
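The shape of the first two projections can be sketched in Python. This is only a toy under stated assumptions: the names `interp` and `mix` are made up for illustration, and `functools.partial` stands in for the partial evaluator. It merely fixes an argument and optimizes nothing, whereas a real partial evaluator would emit specialized residual code.

```python
from functools import partial

def interp(program, x):
    """Toy interpreter: a program is a list of (op, constant) steps applied to x."""
    for op, k in program:
        if op == "add":
            x = x + k
        elif op == "mul":
            x = x * k
        else:
            raise ValueError(f"unknown op: {op}")
    return x

def mix(f, static_arg):
    """Stand-in for a partial evaluator: fix f's first argument.
    Assumption: a real one would also optimize the residual program."""
    return partial(f, static_arg)

program = [("add", 1), ("mul", 10)]

# First projection: specialize the interpreter to one program.
# The result behaves like a compiled version of that program.
target = mix(interp, program)

# Second projection: specialize the specializer to the interpreter.
# The result behaves like a compiler: it maps programs to targets.
compiler = mix(mix, interp)
target2 = compiler(program)
```

Both `target(4)` and `target2(4)` give the same answer as `interp(program, 4)`; the only difference is when the program was supplied.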

But even without such complex highly-academic transformations, you can create something that is functionally indistinguishable from compilation in a much simpler way: just bundle together the interpreter with the program to be interpreted into a single executable.

Another possibility is the idea of a "meta-JIT". (This is related in spirit to the Futamura Projections.) This is e.g. used in the RPython framework for implementing programming languages. In RPython, you write an interpreter for your language, and then the RPython framework will JIT-compile your interpreter while it is interpreting the program, thus producing a specialized compiled version of the interpreter which can only interpret that one single program – which is again indistinguishable from compiling that program. So, in some sense, RPython dynamically generates JIT compilers from interpreters.

The other way around, you can wrap a compiler into a wrapper that first compiles the program and then directly executes it, making this wrapped compiler indistinguishable from an interpreter. This is, in fact, how the Scala REPL, the C♯ REPL (both in Mono and .NET), the Clojure REPL, the interactive GHC REPL, and many other REPLs are implemented. They simply take one line / one statement / one expression, compile it, immediately run it, and print the result. This mode of interacting with the compiler is so indistinguishable from an interpreter, that some people actually use the existence of a REPL for the programming language as the defining characteristic of what it means to be an "interpreted programming language".
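A minimal sketch of such a wrapper, using Python's built-in `compile`, `eval` and `exec`. The function name `run_line` is hypothetical. Note the caveat that in CPython `compile` produces bytecode which is then interpreted, so this only illustrates the "compile one line, run it immediately" shape, not a native-code REPL.

```python
def run_line(source, env):
    """Compile one line, run it immediately, return its value (None for statements)."""
    try:
        # Try to compile the line as an expression first.
        code = compile(source, "<repl>", "eval")
    except SyntaxError:
        # Not an expression: compile and run it as a statement instead.
        code = compile(source, "<repl>", "exec")
        exec(code, env)
        return None
    return eval(code, env)

env = {}
run_line("x = 6 * 7", env)       # statement: compiled, then executed
result = run_line("x + 1", env)  # expression: compiled, then evaluated
```

From the user's side this feels like an interpreter, even though every line passes through a compiler before it runs.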

Note, however, that you can't run a program without an interpreter. A compiler simply translates a program from one language to another. But that's it. Now you have the same program, just in a different language. The only way to actually get a result of the program is to interpret it. Sometimes, the language is an extremely simple binary machine language, and the interpreter is actually hard-coded in silicon (and we call it a "CPU"), but that's still interpretation.
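To make the distinction concrete, here is a small sketch with an invented expression format (nested tuples) and an invented stack-instruction format. `compile_expr` only translates; nothing is computed until `run` interprets the translated program.

```python
def compile_expr(expr):
    """Translate a nested-tuple expression into a flat list of stack instructions."""
    if isinstance(expr, (int, float)):
        return [("push", expr)]
    op, left, right = expr
    # Post-order: operands first, then the operator.
    return compile_expr(left) + compile_expr(right) + [(op, None)]

def run(code):
    """Interpret the stack instructions; only this step produces a result."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
            continue
        b = stack.pop()
        a = stack.pop()
        if op == "add":
            stack.append(a + b)
        elif op == "mul":
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack.pop()

expr = ("add", 2, ("mul", 3, 4))
code = compile_expr(expr)  # same program, different language
```

Here `code` is just data describing the same computation; calling `run(code)` is what finally yields a value.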

Some people say that you can call a programming language "interpreted" if the majority of its implementations are interpreters. Well, let's just look at a very popular programming language: ECMAScript. There are a number of production-ready, widely-used, high-performance mainstream implementations of ECMAScript, and every single one of them includes at least one compiler, some even multiple compilers. So, according to this definition, ECMAScript is clearly a compiled language.

You might also be interested in this answer of mine, which explains the differences and the different means of combining interpreters, JIT compilers and AOT compilers and this answer dealing with the differences between an AOT compiler and a JIT compiler.

It is possible to categorize language implementations to some degree. In general, we have the distinction between

  • compilers and
  • interpreters (if the interpreter interprets a language that is not meant for humans, it is also often called a virtual machine)

Within the group of compilers, we have the temporal distinction when the compiler is run:

  • Just-In-Time compilers run while the program is executing
  • Ahead-Of-Time compilers run before the program starts

And then we have implementations which combine interpreters and compilers, or combine multiple compilers, or (much more rare) multiple interpreters. Some typical combinations are

  • mixed-mode execution engines which combine an interpreter and a JIT compiler that both process the same program at the same time (examples: Oracle HotSpot JVM, IBM J9 JVM)
  • multi-phase [I invented that term; I don't know of a widely-used one] execution engines, where the first phase is a compiler that compiles the program to a language more suitable for the next phase, and then a second phase which processes that language. (There could be more phases, but two is typical.) As you can probably guess, the second phase can again use different implementation strategies:
    • an interpreter: this is a typical implementation strategy. Often, the language that is interpreted is some form of bytecode that is optimized for "interpretability". Examples: CPython, YARV (pre-2.6), Zend Engine
    • a compiler, which makes this a combination of two compilers. Typically, the first compiler translates the language into some form of bytecode that is optimized for "compilability" and the second compiler is an optimizing compiler that is specific to the target platform
    • a mixed-mode VM. Examples: YARV post-2.6, Rubinius, SpiderMonkey, SquirrelFish Extreme, Chakra
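The mixed-mode idea can be sketched as follows. Everything here is hypothetical (the tuple expression format, `HOT_THRESHOLD`, the class name), and the "JIT" only generates a Python function, which CPython itself still runs as bytecode. Real engines such as HotSpot emit machine code and use far more elaborate profiling; this only shows the control flow of "interpret first, compile once hot".

```python
HOT_THRESHOLD = 3  # hypothetical tuning knob: calls before we compile

def interpret(expr, x):
    """Slow path: walk the expression tree on every call."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    a, b = interpret(left, x), interpret(right, x)
    return a + b if op == "add" else a * b

def jit_compile(expr):
    """Fast path: translate the tree to Python source once and compile it."""
    def to_source(e):
        if e == "x" or isinstance(e, (int, float)):
            return str(e)
        op, left, right = e
        sign = "+" if op == "add" else "*"
        return f"({to_source(left)} {sign} {to_source(right)})"
    return eval(compile(f"lambda x: {to_source(expr)}", "<jit>", "eval"))

class MixedModeFunction:
    def __init__(self, expr):
        self.expr, self.calls, self.compiled = expr, 0, None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)          # already hot: use compiled code
        self.calls += 1
        if self.calls >= HOT_THRESHOLD:
            self.compiled = jit_compile(self.expr)  # compile for later calls
        return interpret(self.expr, x)       # still answer via the interpreter

f = MixedModeFunction(("add", ("mul", "x", "x"), 1))  # x*x + 1
results = [f(n) for n in range(5)]
```

The caller cannot tell from the results which calls were interpreted and which ran compiled code, which is the point of the design.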

But, there are still others. Some implementations use two compilers instead of a compiler and an interpreter to get the same benefits as a mixed-mode engine (e.g. for the first few years, V8 worked this way).

RPython combines a bytecode interpreter and a JIT, but the JIT does not compile the user program, it compiles the bytecode interpreter while it interprets the user program! The reason for this is that RPython is a framework for implementing languages, and in this way, a language implementor only has to write the bytecode interpreter and gets a JIT for free. (The most well-known user of RPython is of course PyPy.)

The Truffle framework interprets a language-agnostic AST, but at the same time it specializes itself to the specific AST, which is kind-of like compilation but also kind-of not. The end result is that Truffle can execute the code extremely fast, without knowing too much about the language-specifics. (Truffle is also a generic framework for language implementations.) Because the AST is language-agnostic, you can mix and match multiple languages in the same program, and Truffle is able to perform optimizations across languages, such as inlining a Ruby method into an ECMAScript function etc.

Macros and eval are sometimes cited as features that cannot possibly be compiled. But that is wrong. There are two simple ways of compiling macros and eval. (Note that for the purpose of compilation, macros and eval are somewhat dual to each other, and can be handled using similar means.)

  1. Using an interpreter: for macros, you embed an interpreter into the compiler. For eval, you embed an interpreter into the compiled program or into the runtime support libraries.
  2. Using a compiler: for macros, you compile the macro first, then embed the compiled macro into your compiler and compile the program using this "extended" compiler. For eval, you embed a compiler into the compiled program or into the runtime support libraries.
CJ Dennis
  • 669
Jörg W Mittag
  • 104,619
14

If we are being pedantic, there is no such thing as a compiled or interpreted language, since any language could in principle be implemented either by a compiler or an interpreter. However, most languages follow a relatively consistent implementation strategy. C++ is almost always compiled to native code. Python is almost always run via a bytecode interpreter. Java is almost always run via a JIT compiler. So, if we don't insist on obtuse pedantry, it does make sense to talk about compiled or interpreted languages.

However, language implementation strategy does not neatly fit into the compiled/interpreted dichotomy. Essentially no languages are strictly interpreted, executed directly from the source. This would be very slow. Instead, virtually all "interpreted" language implementations compile the source into something (often bytecode) which can be more efficiently executed. On top of this, some implementations JIT compile that bytecode into native code at run time. Even languages that we think of as being compiled often contain some amount of interpretation. For example, printf in C is effectively an interpreter of the format string.
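To see why printf counts as a small interpreter, here is a sketch of a hypothetical `mini_printf` in Python that handles only `%d`, `%s` and `%%`. It is not how any real C library implements printf; it just shows that the format string is a tiny program that gets walked and executed at run time.

```python
def mini_printf(fmt, *args):
    """Interpret a tiny subset of printf directives: %d, %s and %%."""
    out = []
    remaining = iter(args)
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "%":
            out.append(ch)  # ordinary character: copy it through
            i += 1
            continue
        if i + 1 >= len(fmt):
            raise ValueError("dangling % at end of format string")
        directive = fmt[i + 1]
        if directive == "d":
            out.append(str(int(next(remaining))))  # consume one integer argument
        elif directive == "s":
            out.append(str(next(remaining)))       # consume one string argument
        elif directive == "%":
            out.append("%")                        # literal percent sign
        else:
            raise ValueError(f"unsupported directive: %{directive}")
        i += 2
    return "".join(out)

text = mini_printf("%s scored %d%%", "Ada", 97)
```

The loop is a classic fetch-decode-execute cycle over the format string, which is exactly what an interpreter does.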

So, I would argue that it doesn't make sense to try to categorize languages as compiled or interpreted. Pretty much any language is to some degree a hybrid of the two approaches. (And yes, if we are pedantic, it is language implementations, not languages, which are compiled/interpreted.)

Winston Ewert
  • 25,052
2

No, the classification compiled vs. interpreted is not relevant for languages.

Compilers and interpreters are only means to deliver a language. And the technology behind these means evolves. Practical examples:

  • In the late 80s, Instant C from Rational Systems was a C interpreter (yes!). It was quite impressive, although too slow for performance-sensitive development.
  • For a long time, BASIC was a typically interpreted language. But all of a sudden, a market of BASIC compilers emerged (e.g. Visual Basic).
  • UCSD Pascal, the first language delivered with a virtual machine, was considered by many to be compiled, because the source code was compiled into a bytecode, the p-code. But some put it in the interpreter category, since the p-code was in reality not compiled into machine code but itself interpreted, with slower performance as a consequence.
  • Transpilers were and still are used to convert one language into another, possibly into a compiled language, using some additional libraries to cope with the more dynamic language constructs.
  • Virtual machines combined with JIT-compiler technology (e.g. JVM, CLI/CLR) nowadays make it possible to compile a lot of initially interpreted languages (examples: IronPython for Python, or Clojure as a Lisp dialect).

Finally, an interesting article on how the same language can evolve from interpreted to compiled.

Christophe
  • 81,699
0

Some versions of BASIC tokenize the statements in a source code file, which is something between interpreted and compiled.

APL (A Programming Language) has an execute operator that takes a variable string as a parameter, and then interprets and executes it. There's also a create-function operator that takes a variable string matrix and creates a function from it. This would require some type of run-time compilation in order to consider the language as compiled.

rcgldr
  • 191
0

Under the OP's definitions of compiled language and interpreted language, other categories would be:

  • None. A language for which there exist neither interpreters nor compilers. This is the case for many languages in academic computer science.
  • Mixed. A language for which most of its implementations partially compile it and partially execute it. As the OP mentions, this is the case for Java. And no, I don't know a popular name for this category. "JIT-compiled languages" is what I most often hear them called.

A language may change categories as new tools are developed or use trends change.