What makes a scripting language "embeddable"?

Question

According to my experience, Wikipedia and prior answers, a scripting language is a vague category of languages that are high-level (no manual memory management) and interpreted. Popular examples are Python, Ruby, Perl and Tcl.

Some scripting languages can be "embedded". For example:

Lua is frequently embedded in video game applications.
Tcl is embedded in the Fossil version control system.

It is sometimes said that Lua is more easily embedded than Python, or that JavaScript is difficult to embed because of the size of the interpreter. Similarly, Wren is "intended for embedding in applications".

What factors make a language embeddable? Is it solely the size and speed of the base interpreter or do other factors come into play?

score 46 · Accepted Answer · edited Jan 17 '20 at 19:56

Embedding a language (I'll avoid characterizing it as "scripting") means that the following has been done:

The interpreter and runtime are running in the same process as the host application
Enough of the standard types and the standard library are also available from within that runtime
Most of the time, the host application also makes its own library available to scripts running in that runtime

The first bullet is literally the definition of embedding. The main reason to embed a language into an application is to provide an easy means of extending the functionality of the application. Reasons include:

Creating macros to perform complex steps repeatably and as fast as possible (e.g. Photoshop, GIMP)
Programming game elements by less technical people (many games have some level of embedded language to create mods, characters, etc.)
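The macro use case can be sketched roughly as follows, using Python as both the host and the script language purely for brevity. The `Document` class, its methods, and the `run_macro` helper are hypothetical names invented for this illustration, not part of any real application.

```python
class Document:
    """Hypothetical host-application object that macros are allowed to drive."""

    def __init__(self, text):
        self.text = text

    def replace(self, old, new):
        self.text = self.text.replace(old, new)

    def upper(self):
        self.text = self.text.upper()


def run_macro(source, document):
    """Run user-written macro source in the host's own process."""
    # The host decides what the script can see: here, only the document.
    namespace = {"doc": document}
    exec(source, namespace)


# A macro a user might record once and replay on many documents.
MACRO = """
doc.replace("colour", "color")
doc.upper()
"""

document = Document("favourite colour")
run_macro(MACRO, document)
print(document.text)  # FAVOURITE COLOR
```

The point is only that the interpreter, the script, and the host object share one process, so the script manipulates host state directly.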

So the big question is then, what factors simplify embedding?

Complexity of the interpreter and/or runtime environment (simpler is easier)
Size of the standard library (smaller is easier)
Layers of indirection (fewer are better; TypeScript compiles down to JavaScript, much as C++ used to compile down to C, so there is no native TypeScript environment)
Compatibility of underlying architecture (several languages are implemented on the Java runtime or the .NET runtime, which makes them easier to embed due to the similarity of the underlying environment)

The bottom line is that it is possible to embed a wide range of languages into another application. In some cases, the hard work has already been done for you and you simply need to include the language in your app. For example, IronPython is built on .NET and Jython is built on Java, allowing you to easily embed Python into applications built on those platforms.

As far as how robust or complete the implementation is, you will get mixed results. Some projects are more mature than others. Some languages are just easier to implement (there is a reason why LISP was one of the first embedded languages).

score 10 · Answer 2 · answered Jan 16 '20 at 20:30

The main factor is typically the API that host applications use to access the language libraries. Languages like Lua are designed to be easily 'connected to' from host applications: the language may be available in library form, with an API that is easily callable from other languages (generally a plain C API). The API usually provides functions to run a script, set up callbacks that respond to certain situations (like undefined variables), and access the host application's resources or GUI. APIs that let you do that fairly easily are more "embeddable" than those that don't.
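As a rough illustration of such an API surface (a "run a script" entry point plus a callback for undefined variables), here is a minimal sketch in Python rather than C. `ScriptScope`, `run_script`, and `supply_default` are hypothetical names; the sketch relies only on `exec` accepting any mapping as its local namespace and on `dict.__missing__`.

```python
class ScriptScope(dict):
    """Script variables; reads of unknown names are routed to a host callback."""

    def __init__(self, on_undefined, **initial):
        super().__init__(**initial)
        self._on_undefined = on_undefined

    def __missing__(self, name):
        # Invoked when the script reads a name that was never defined.
        return self._on_undefined(name)


def run_script(source, scope):
    """Entry point a host would call; top-level names resolve through `scope`."""
    exec(source, {}, scope)
    return scope


requested = []


def supply_default(name):
    """Hypothetical host callback for undefined variables."""
    if name != "quantity":
        # Raising KeyError lets normal lookup (globals, builtins) continue.
        raise KeyError(name)
    requested.append(name)
    return 1


result = run_script("total = price * quantity", ScriptScope(supply_default, price=5))
print(result["total"])  # 5
```

A real embedding API would expose the same ideas as C functions; the shape (run, callbacks, host resources) is what matters here.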

score 10 · Answer 3 · answered Jan 16 '20 at 20:42

In theory, any language can be embedded. If there are no constraints on the solution, that is actually the case. It's a natural consequence of Turing completeness, i.e. you can always build an emulator.

What I think you are asking is "what makes a language practical for this purpose?" I think one of the main things that makes a language a good choice is being defined in terms of behaviors as opposed to implementations. If the language in question has very specific rules around how int values need to be represented in memory, for example, this creates challenges when running on top of another application whose ideas about integers are not exactly the same.

A good example of this is Python, which is defined in terms of how it behaves and has little to say about how it is implemented. This means you can write a fully functional Python interpreter that acts as a facade to Java (or C#, etc.) types. Not only can you then run Python scripts, you can also use them to interact with parts of the application written in Java.
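The "defined by behavior" idea can be illustrated in plain Python. This is not how Jython or IronPython are implemented; it is only a sketch showing that script code cares about the protocol an object follows, not about what backs it. `HostInventory` is a hypothetical stand-in for a host-side type.

```python
class HostInventory:
    """Hypothetical host-side container; imagine it wraps a native array."""

    def __init__(self, items):
        self._items = list(items)

    # Following the sequence protocol is enough for scripts to treat this
    # object like a built-in sequence.
    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


inventory = HostInventory(["sword", "shield", "potion"])

# The script uses ordinary language constructs, with no wrapper-specific calls.
SCRIPT = """
count = len(inventory)
last = inventory[-1]
names = [item.upper() for item in inventory]
"""

script_scope = {"inventory": inventory}
exec(SCRIPT, script_scope)
print(script_scope["count"], script_scope["last"])  # 3 potion
```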

Another factor is simplicity in the semantics of the language. The more complex the language, the more difficult it tends to be to build an interpreter for that language for obvious reasons.

Lie Ryan · Answer 4 · 2020-01-18T05:52:17.787

There are several factors:

Whether the language supports an embedding API. Some scripting languages, like Python and Lua, have officially supported APIs specifically designed for embedding those languages into a host application. This includes specifying how the language interacts with foreign function interfaces, foreign object handles, foreign classes, etc., and specifying an API for those foreign languages to call into and work with objects in the language. Languages that are designed for embedding can make these foreign objects look and behave just like regular classes and objects without complicated wrapper classes. Protocol-based languages like Python tend to be very good at this.
Language implementations that are designed for embedding are often designed to share a thread with the main application. This is because UI elements can usually only be updated from the UI thread, so interpreters for embedded languages need to be able to run, yield to the main thread, call UI updates, and resume execution without taking over the main thread completely. Language implementations that aren't designed for embedding might require the interpreter to run in a separate thread or process and communicate with the application's UI thread only through an RPC/message queue mechanism, which comes at a significant performance cost.
Memory safety. Memory-safe languages and languages without direct memory access are easier to embed because code written in the scripting language cannot crash the main application through direct memory access.
How big the runtime support for the language is. Languages with big standard libraries tend to be at a disadvantage for embedding, because they bloat the application size. On the other hand, there are many applications where the huge standard library is the reason a specific scripting language is chosen, so that scripters can access functionality that the main application itself is unwilling to provide directly.
Additional challenges: TypeScript, for example, can only be embedded by embedding a JavaScript interpreter. So there is the added challenge of embedding a JavaScript interpreter even if you only care about TypeScript.
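The thread-sharing point in the list above (run, yield to the main thread, resume) can be sketched with a Python generator standing in for the script. `patrol_script` and `run_frames` are hypothetical names, and the "UI update" is reduced to appending to a list.

```python
def patrol_script(guard):
    """Hypothetical script: does a small step, then yields to the host."""
    for waypoint in ["gate", "tower", "gate"]:
        guard["position"] = waypoint
        yield  # hand control back to the host's main loop


def run_frames(script, frames, on_frame):
    """Host main loop: resume the script once per frame, then do host work."""
    for frame in range(frames):
        try:
            next(script)  # run the script until its next yield
        except StopIteration:
            break  # the script finished; the host keeps control
        on_frame(frame)  # e.g. redraw the UI, on the same thread


guard = {"position": None}
trace = []
run_frames(
    patrol_script(guard),
    frames=10,
    on_frame=lambda frame: trace.append((frame, guard["position"])),
)
print(trace)  # [(0, 'gate'), (1, 'tower'), (2, 'gate')]
```

Because the script never blocks the loop, no second thread or message queue is needed in this arrangement.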

score 3 · Answer 5 · answered Jan 18 '20 at 23:57

Languages that are designed to be embeddable try to provide features to ease access for the host application. There are two layers to this: the actual language syntax and semantics, and the runtime implementation of the language you try to embed.

Take for example both Python and Tcl, which both are labeled as embeddable. From my experience, Python is much harder to embed than Tcl (did both, in multiple contexts).

Why is that so?

Python is opinionated: the world is assumed to look like a POSIX setup. The filesystem APIs, console APIs and network APIs do not abstract much; they are mostly direct wrappers around POSIX C APIs. Tcl isn't that close to the hardware; it tries to abstract most APIs and does not provide a lot of low-level APIs to the script layer. So if you try to embed Python, you must provide a POSIX-like file abstraction. For Tcl you do not need to do anything if you do not care about files. Less work, easier to embed.

(C)Python is basically single-threaded with a global lock. Tcl has no global lock. So if your host application is multi-threaded and you embed Python for critical stuff, you have just added a global lock to your application. Embedding Python in multi-threaded programs is therefore much more painful.

Python's module system maps to a filesystem by default. Your module names and filesystem names are linked. So your language is limited by the filesystem you provide and breaks suddenly when you port it to a case-insensitive filesystem. Tcl did not link its module system to any filesystem layout, so the filesystem doesn't change the semantics of your language.
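To give a sense of the extra work this implies for an embedder, here is a minimal sketch of serving Python modules from memory instead of a filesystem, using the standard `importlib` import hooks. The `InMemoryFinder` class and the `host_macros` module name are hypothetical; a real host would likely need more (packages, resources, caching).

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules from a dict of source strings held by the host."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the other finders handle everything else

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, f"<host:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Hypothetical module the host ships inside its own data, not on disk.
sys.meta_path.insert(0, InMemoryFinder({
    "host_macros": "def double(x):\n    return 2 * x\n",
}))

import host_macros

print(host_macros.double(21))  # 42
```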

Python assumes the world is blocking and synchronous (like POSIX) and is only slowly adopting async APIs. Tcl tries harder to provide nonblocking APIs and callbacks. As it is much easier to simulate blocking APIs on top of async ones than the other way round, you usually have less work to do.
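The "blocking on top of async is the easy direction" claim can be seen in a few lines. In this sketch, `fetch_score_async` is a hypothetical nonblocking host API and `fetch_score` is the blocking facade a script would call; it assumes no event loop is already running on the calling thread.

```python
import asyncio


async def fetch_score_async(player):
    """Hypothetical nonblocking host API (the sleep simulates waiting on I/O)."""
    await asyncio.sleep(0.01)
    return {"alice": 42}.get(player, 0)


def fetch_score(player):
    """Blocking facade: drive the async call to completion and return its result."""
    return asyncio.run(fetch_score_async(player))


print(fetch_score("alice"))  # 42
```

Going the other way, making an inherently blocking call behave asynchronously, typically requires threads or OS-level support rather than a small wrapper like this.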

Tcl is rather minimal. You can easily strip out everything you do not need, such as filesystem APIs, process control, or regular expressions. Python has a ton of stuff in its builtin namespace and is nearly impossible to lock down and secure against untrusted code due to write access to the bytecode. It is not even considered a bug that pure Python code can crash the process (e.g. Python bytecode sometimes just uses a raw pointer and writes to it). So it is harder to embed if you try to run untrusted user code.

Python's standard library often assumes it is in control of the world. Tcl's does not. For example, Python often blows up when it encounters out-of-memory issues and kills your process. Some Python API calls might even dump critical errors to stderr (which might not exist in an embedding situation, e.g. Windows service contexts) and kill your application, while Tcl usually tries very hard to give control back to the application without crashing or exiting. So being a good guest is important.

So what makes a language easy to embed is behaving like a good guest:

Do not assume too much about your hosting environment. You have no filesystem. You have no stdio. You have no environment variables. Check and minimize your assumptions about the world.
Clean up after yourself. Be able to reinitialize yourself multiple times.
Do not get in the way of threads in your host.
Allow the host to customize the available feature set to the problem domain.
Be reasonably safe even when running untrusted input.
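A few of the "good guest" points above (no assumed stdio, a clean start on every run, a host-chosen feature set, errors handed back instead of crashing) can be sketched from the host side. `run_guest` is a hypothetical helper. As noted earlier in this answer, restricting names like this is not a security boundary for Python; it only illustrates the shape of the contract.

```python
import contextlib
import io


def run_guest(source, features):
    """Run a script in a fresh scope; return (output, error) instead of raising."""
    scope = dict(features)  # only what the host chose to expose
    captured = io.StringIO()  # the host owns output; no real stdout is assumed
    try:
        with contextlib.redirect_stdout(captured):
            exec(source, scope)
    except Exception as exc:  # hand failures back to the host
        return captured.getvalue(), f"{type(exc).__name__}: {exc}"
    return captured.getvalue(), None


features = {"greet": lambda name: f"hello {name}"}

# First run defines `counter`; the second run starts clean and cannot see it.
first_output, first_error = run_guest("counter = 1\nprint(greet('host'))", features)
second_output, second_error = run_guest("print(counter)", features)

print(repr(first_output), first_error)
print(repr(second_output), second_error)
```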
