26

We're all too familiar with waiting for compilation, especially on large projects.

Why isn't it a thing to interpret a codebase for quick iterative development, instead of generating code for a binary each time?

Is that because, when compiling at -O0, most of the compilation time comes from parsing, so interpreting wouldn't help much? Or is it because the effort to develop an interpreter is too high (e.g. for feature-rich languages like C++)? For a language with a small standard like C, this seems like a reasonable approach, rather than waiting for compilation after every change.

Laiv
  • 14,990
gust
  • 377

10 Answers

35

I dispute the premise. There are interpreters / REPLs for compiled, static languages; they're just not as central to the common workflow as they are with dynamic languages. Though that also depends on the application. For example, scientists at CERN work a lot in C++ in the ROOT framework, and they also use the Cling interpreter a lot, an approach that combines many of the advantages of a fast compiled language with the interactivity of an interpreted one like Python, especially for scientific purposes.

With some other languages it's even more drastic. Haskell is a static, compiled language (in some ways even more static than OO languages), but it is very common to develop Haskell interactively using GHCi, either as a REPL (see the online version) or just as a quick typechecking pass to highlight what needs to be worked on. Once something is fully implemented, it becomes part of a library that is always compiled, resulting in fast code that can then be called from either a fully compiled program or another interactive session.

Of course it can also go the other way around: typical interpreted languages like Python, JavaScript and Common Lisp can all be compiled, at least in some sense of the word (either JIT-compiled, or a subset of the language can be statically compiled). Though in my opinion this approach is far more limited than starting with a strongly statically typed language and then using it more interactively, it can still be a good option for optimising the bottleneck parts of an interpreted program, and is indeed commonly done.

18

Why isn't it a thing to interpret a codebase for quick iterative development, instead of generating code for a binary each time?

Many languages, including C and C++, don't lend themselves to REPL-style interpreters. Making a one-line or even one-character change can have widespread impact on the behavior of a program (consider changes to a #define, for example). Somewhat ironically, this same avalanche effect, where small code changes lead to large program changes, also makes incremental compilation very difficult. So languages that take a very long time to compile tend to be the ones that are also troublesome to interpret.

Telastyn
  • 110,259
8

Compare, say, Python and Swift. In Python, even simple checks are not made until runtime: I don't actually know whether my program runs until every code path has been executed.
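A minimal sketch of that hazard (the function and the typo here are made up for illustration): a broken name on an untaken branch sails through silently until that branch actually executes.

```python
# A bug hiding on an unexecuted path: Python only notices when the line runs.

def describe(n):
    if n >= 0:
        return f"{n} is non-negative"
    else:
        # Typo: ints have no 'sqrrt' method, but Python won't complain
        # until this branch is actually taken.
        return f"magnitude is {abs(n).sqrrt()}"

print(describe(3))       # works fine; the broken branch was never executed

try:
    describe(-1)         # now the typo surfaces, at runtime
except AttributeError as e:
    print("runtime failure:", e)
```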

Swift, on the other hand, started out with notoriously bad compile times, but nowadays it recompiles only changed files, and within other affected files it recompiles just the affected methods rather than complete source files. Recompilation is so fast that it happens while you type your program.

Now Bjarne Stroustrup (I think) has said that modern C++ would have been impossible in 1985, when C++ was invented, because no machine of the day could have compiled it in any reasonable time. We do have powerful computers, and they are used.

But the real answer to your question is: languages like C++ are not interpreted because nobody is willing to invest the time and money to build a fast interpreter, and nobody is willing to pay enough for the ability to use one. And there is the question of how fast an interpreter would run for a project with a million lines of code.

gnasher729
  • 49,096
6

We're all too familiar with waiting for compilation, especially on large projects.

It sounds like the mix of languages you've been using didn't include Go.

Why isn't it a thing to interpret Go? Because the compilation is already fast enough for human edit->debug loops. On purpose.

Some languages sensibly focus on "time to execute this code in production", while Go included "time that developer waited for compilation" as an explicit design goal from the very beginning. And it shows. Just pull out your stopwatch to verify.


Suppose you changed a single character of source code and clicked "Save".

A well-crafted Makefile should be able to recompile a single source file, then invoke the linker and start running your tests. If your environment has a rather creaky setup that does a lot more work than that, it's time to carefully examine those make dependencies.


Consider using the numba @jit decorator on your interpreted Python functions. Comment it out if you don't want to wait on the compiler overhead.
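A sketch of that pattern, assuming numba is installed; the try/except fallback (an addition of mine, not part of numba) lets the same code run purely interpreted when it isn't, so only the speed differs:

```python
# Opt-in JIT compilation for a hot function; fall back to plain
# interpretation when numba is unavailable.
try:
    from numba import jit            # assumes the numba package is installed
except ImportError:
    def jit(**kwargs):               # no-op stand-in: stay interpreted
        return lambda fn: fn

@jit(nopython=True)
def total(n):
    # Tight numeric loop: the kind of code numba speeds up the most.
    s = 0
    for i in range(n):
        s += i
    return s

print(total(10))   # -> 45, compiled or interpreted
```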


Implementing both an interpreter and a compiler for the same language is a whole can of worms. The principal danger is that the semantics will differ, that is, the same source produces different results in different execution environments. In fairness, this is also a concern across -O0, -O1, -O2, -O3 settings, or the more specific switches, but there are usually some hand-wavy excuses that can be invoked when it turns out that one version behaves differently from another. Quite often the ANSI C notion of "undefined behavior" will rear its head.

If you're authoring "portable" Scheme or Common Lisp, there are quite a few interpreters and compilers to choose from, with diverse compiler options to set. In my experience, the time to compile a function I just edited has never posed an interesting delay; there is good support for incremental development. OTOH, doing an ASDF compile of a large package might be a bit time-consuming when there's lots of source text to analyze.

J_H
  • 7,645
5

There are REPLs and Interpreters for C++ and most compiled languages

https://replit.com/languages/cpp

http://www.hanno.jp/gotom/Cint.html

But they aren't used often.

Why? Because compiling adds checks and enforces extra rules like type safety, and those languages were built around those checks. Compiling is a feature!

For example:

Many interpreted languages use "duck typing": if it looks like a duck and quacks like a duck, it's treated as a duck. I know plenty of Ruby devs who swear by it, and it's a great thing, right up until you use an object that doesn't have the right methods and the program blows up in production!
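A toy sketch of that failure mode in Python (the class and function names are invented for illustration): nothing complains until the wrong object reaches the call.

```python
class Duck:
    def quack(self):
        return "quack"

class Dog:
    def bark(self):
        return "woof"

def make_it_quack(animal):
    # Duck typing: no declared type, we just call quack() and hope.
    return animal.quack()

print(make_it_quack(Duck()))   # fine: Duck has quack()

try:
    make_it_quack(Dog())       # blows up only when this line runs
except AttributeError as e:
    print("production surprise:", e)
```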

This would never happen in a compiled language, because we took minutes (and occasionally hours) to meticulously check for it beforehand.

Projects that use compiled languages have decided it's worth more time up-front to ensure that rules such as type safety are enforced.

Projects that use interpreted languages have decided it's worth more testing and QA after-the-fact to get to play a bit fast-and-loose with the rules.

EDIT

Why don't developers use interpreters and then compile once every so often? Best of both worlds!

Because you can quickly get into a nightmare of compile errors, since an interpreter won't be able to check everything in anywhere near real time. Any interpreter has to play a little fast-and-loose with the rules. You'll probably end up spending any time savings fixing compile issues that are only found when doing it "the long way."

In practice, things like incremental compiles and breaking code into libraries lessen the pain of compiling. And the lines blur in both directions: some C/C++ code uses void* pointers, which negate type safety, while TypeScript adds encouraged (if optional) type safety on top of the otherwise dynamic JavaScript world, by way of a compile step.

If you want a real-world example, look at some of the "hot-deploy" development environments in Java, which tried to do what you're talking about. I almost always turned them off because they were hit-and-miss.

2

Sunk cost.

For languages which are commonly cross-compiled, the debug tooling for compiled binaries receives a lot of time and attention by necessity. Debugging a locally executed, simulated version of a cross-compiled project is only useful for a subset of the problems you'll need to debug, and interpreters can only produce those simulated versions. Compilation can produce both the simulated version (and often does, for unit tests) and the real version for on-chip debugging.

For example, C and C++ are the go-to languages for bare-metal microprocessor programming, so there is an entire industry pumping resources into making the debug tooling better. With tools like GDB (and the family of tools built to support it, like OpenOCD) being extremely mature, the prospect of writing a C/C++ interpreter for iterative development is less attractive: you'd have a lot of ground to cover just to get debugging feature parity with GDB.

Add to this that most difficult programming problems involve quite a bit of thinking. Faster iteration stops being useful once the compile time is shorter than the time the programmer spends thinking about the problem between builds. Personally, I find that the build time of a large embedded C++ project I work on (~30s for cross-compile, ~60s for locally executed unit tests) is more than fast enough. I rarely find myself staring at the screen waiting for it to complete; I return from thought to find that it's done.

Willa
  • 235
1

Frame challenge: on a modern machine, interpreting the language is not needed; compilation is very fast.

We're all too familiar with waiting for compilation, especially on large projects.

This is often because the developer runs the steps "clean" then "compile" without thinking, or because they are scripted together without much consideration. When you realise that you used a wrong variable and want to change a single name in the code, does it make sense to rebuild the entire project? Letting the build system see what changed and rebuild only those files can save a lot of time. Reconsider when it is actually appropriate to do a clean before you build.

FluidCode
  • 821
  • 4
  • 10
0

If a source code construct will have the same meaning every time it is executed, converting it into non-optimized machine code would generally not be much more expensive than interpreting it. The primary advantages interpreters have over compilers are:

  1. It's easier to "sandbox" an interpreter than a compiler. If an interpreter includes no forms of I/O except a keyboard and display, and no means of accessing memory without bounds checks, even a maliciously-designed program would be unable to do anything beyond read keystrokes that are made available to it and render graphics in response.

  2. They can easily support dynamic languages where the meaning of a piece of code may vary based upon outside factors. For example, in JavaScript, function foo(x,y) { return x+y; } may perform arithmetic addition or string concatenation depending upon the types of the operands, which a compiler would have no way of knowing at the time it processes that function.

  3. In environments that would require keeping the source code in memory, an interpreter may be more space-efficient than a compiler that would require keeping both the source code and machine-code equivalent in memory simultaneously.

  4. When using a language where the behavior of a piece of code can only depend upon other code that has already been executed, an interpreter need not examine parts of the source text that are never actually executed.
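Point 2 can be made concrete with a Python analogue of that JavaScript function (a minimal sketch; the function name is just illustrative): the meaning of + is only fixed at the moment of the call.

```python
def foo(x, y):
    # The meaning of '+' is decided at runtime from the operand types;
    # a compiler looking at this function alone could not pin it down.
    return x + y

print(foo(2, 3))          # arithmetic addition    -> 5
print(foo("2", "3"))      # string concatenation   -> "23"
print(foo([1], [2]))      # list concatenation     -> [1, 2]
```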

While it may be possible, and even practical, to design a C or C++ interpreter for situations where sandboxing is required, in all other regards a non-optimizing compiler would be faster and more efficient.

supercat
  • 8,629
-1

Languages that are normally interpreted or just-in-time compiled have access to metadata that is often lost once compilation is finished. For instance, it may still be possible to access the fields of a structure by string name, iterate over them, or get information about their actual types. It may be possible to find and call a function by string name, or read the annotations placed on it. Interpreted and JIT-compiled languages quite often have rich reflection features that are part of the standard and, even if not very common in user code, are heavily used in libraries. I cannot really imagine how something like Hibernate could be implemented for a language like C++. A completely different world.
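In Python, for instance, the reflection features this answer describes are one-liners (the class here is invented for illustration):

```python
# Reflection in an interpreted language: fields by string name,
# iteration over fields, and method lookup by string name.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p = Person("Ada", 36)

# Access a field by its string name.
print(getattr(p, "name"))                 # -> Ada

# Iterate over all fields and inspect their actual types.
for field, value in vars(p).items():
    print(field, "=", value, type(value).__name__)

# Find and call a method by its string name.
method = getattr(str, "upper")
print(method("ok"))                       # -> OK
```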

h22
  • 965
-1

That depends on the language design and features. Lisp had an interpreter first, around 1960; in 1962 the first incremental machine-code compiler was implemented for it. Incremental means that the compiler can compile any function, small or large, and load the machine code into the running Lisp system, where it can be freely mixed with source-level interpreted code. That is to this date the dominant way to use Lisp for application development: the code gets mostly compiled, either interactively or from files, with a source-level interpreter often available in addition.

So there are a bunch of strategies to get fast development times with Lisp:

  • use a source-level interpreter -> this requires no compilation, and compiled code can still be called transparently. The drawback is usually slow runtime performance.

  • use an incremental compiler -> this compiles small units of code (expressions, functions, ...). The compiled code is then immediately usable in the running program, even though it is often machine code. The incremental compiler can also be used to immediately compile code entered at a Read-Eval-Print Loop (REPL); the incrementally compiled code then gets evaluated, and can be freely mixed with interpreted code. With an incremental compiler, one gets both fast code and fast interactive development.

  • use a file or block compiler -> this compiles single files or blocks of files. The compiled code is written to disk, but can be loaded into a running program. Typically one might either compile the code for debugging or for optimizations to improve runtime speed.

  • use an image-based system -> whole memory dumps of running programs can be saved and restarted. The images contain all code (compiled and interpreted), development information and runtime data. Typically one interacts with such a running system via a REPL. Compiling code and using the compiled code is immediate. Restarting the development environment is fast, since all information is already saved with the image.

For delivery of applications:

  • use a whole-program compiler -> this compiles a whole program to a binary. This is usually only done for delivery of applications, not during development.

  • use a treeshaker (and similar delivery tools), which create optimized binaries without development information.

Rainer Joswig
  • 2,240
  • 13
  • 17