
I've always used JSON files for configuration of my applications. I started using them when I coded a lot of Java, and now that I'm working mainly on server-side and data science Python development, I'm not sure if JSON is the right way to go any more.

I've seen Celery use actual Python files for configuration. Initially I was skeptical about it. But the idea of using simple Python data structures for configuration is starting to grow on me. Some pros:

  • The data structures will be the same as the ones I normally code with, so I don't need to change my frame of mind.
  • My IDE (PyCharm) understands the connection between configuration and code. Ctrl + B makes it possible to jump between configuration and code easily.
  • I don't need to work with JSON's (IMO unnecessarily) strict syntax. I'm looking at you, mandatory double quotes, no trailing commas and no comments.
  • I can write testing configurations in the application I'm working on, then easily port them to a configuration file without having to do any conversion and JSON parsing.
  • It is possible to do very simple scripting in the configuration file if really necessary. (Although this should be very, very limited.)
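A minimal sketch of the idea in the question, assuming a hypothetical settings file named app_config.py. It is loaded with the standard library's runpy.run_path, which executes the file and returns its global namespace as a dict; importing it as a module would work just as well. Note the comments and trailing comma, which strict JSON would reject.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical config file contents: plain Python literals, with the
# comments and trailing commas that strict JSON does not allow.
CONFIG_SOURCE = """
# Database settings
DATABASE = {
    "host": "localhost",
    "port": 5432,  # default PostgreSQL port
}
WORKERS = 4
"""


def load_python_config(path):
    """Execute a Python config file and return its public top-level names."""
    namespace = runpy.run_path(str(path))
    # run_path returns the module globals, including dunder entries such as
    # __name__ and __builtins__; keep only the user-facing settings.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "app_config.py"
    config_path.write_text(CONFIG_SOURCE)
    config = load_python_config(config_path)
    print(config["DATABASE"]["port"])  # 5432
```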

So, my question is: If I switch, how am I shooting myself in the foot?

No unskilled end user will be using the configuration files. Any changes to the configuration files are currently committed to Git and are rolled out to our servers as part of continuous deployment. There are no manual configuration changes, unless there is an emergency or it is in development.

(I've considered YAML, but something about it irks me. So, for now it is off the table.)

7 Answers

105

Using a scripting language in place of a config file looks great at first glance: you have the full power of that language available and can simply eval() or import it. In practice, there are a few gotchas:

  • it is a programming language, which needs to be learnt. To edit the config, you need to know this language sufficiently well. Configuration files typically have a simpler format that is more difficult to get wrong.

  • it is a programming language, which means that the config can get difficult to debug. With a normal config file you look at it and see what values are provided for each property. With a script, you potentially need to execute it first to see the values.

  • it is a programming language, which makes it difficult to maintain a clear separation between the configuration and the actual program. Sometimes you do want this kind of extensibility, but at that point you are probably looking for a real plugin system instead.

  • it is a programming language, which means that the config can do anything that the programming language can do. So either you are using a sandbox solution which negates much of the flexibility of the language, or you are placing high trust in the config author.

So using a script for configuration is likely OK if the audience of your tool is developers, e.g. Sphinx config or the setup.py in Python projects. Other programs with executable configuration are shells like Bash, and editors like Vim.

Using a programming language for configuration is necessary if the config contains many conditional sections, or if it provides callbacks/plugins. Using a script directly instead of eval()-ing some config field tends to be more debuggable (think of the stack traces and line numbers!).
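To make the debuggability point concrete, here is a small sketch (the config contents and the failing_location helper are hypothetical). When the config file itself is executed, here via runpy.run_path, the traceback points at the file and the offending line; when an expression pulled out of some config field is passed to eval(), the traceback only says "<string>", line 1.

```python
import runpy
import tempfile
import traceback
from pathlib import Path

# Hypothetical config with a bug on line 3 (a typo in a variable name).
BROKEN_CONFIG = "TIMEOUT = 30\nRETRIES = 3\nTOTAL_WAIT = TIMEOUT * RETRIESS\n"


def failing_location(run):
    """Run a loader and return (filename, line number) of the innermost frame."""
    try:
        run()
    except NameError as exc:
        frame = traceback.extract_tb(exc.__traceback__)[-1]
        return frame.filename, frame.lineno
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.py"
    path.write_text(BROKEN_CONFIG)

    # Running the file as a script: the traceback names settings.py, line 3.
    print(failing_location(lambda: runpy.run_path(str(path))))

    # eval()-ing the same expression taken from a config field: the
    # traceback only knows about "<string>", line 1.
    print(failing_location(lambda: eval("TIMEOUT * RETRIESS", {"TIMEOUT": 30})))
```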

Directly using a programming language may also be a good idea if your config is so repetitive that you are writing scripts to autogenerate the config. But perhaps a better data model for the config could remove the need for such explicit configuration? For example, it may be helpful if the config file can contain placeholders that you later expand. Another feature sometimes seen is multiple config files with different precedence that can override each other, though that introduces some problems of its own.
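A rough sketch of those two data-model features, placeholders and layered files with precedence, without making the config executable. The layer contents are hypothetical, and collections.ChainMap plus string.Template are just one convenient way to get this behaviour from the standard library, not something the answer prescribes.

```python
from collections import ChainMap
from string import Template

# Hypothetical layers, highest precedence first: a local override file,
# then the shared defaults. ChainMap looks keys up in that order.
defaults = {"data_dir": "/srv/app", "log_dir": "${data_dir}/logs", "workers": "2"}
local_overrides = {"data_dir": "/tmp/app-dev"}

layered = ChainMap(local_overrides, defaults)


def expand(config):
    """Expand ${name} placeholders using the other values of the same config.

    Only one pass is done, so a placeholder that expands to another
    placeholder is not resolved (a simplifying assumption).
    """
    flat = dict(config)
    return {key: Template(value).safe_substitute(flat) for key, value in flat.items()}


resolved = expand(layered)
print(resolved["log_dir"])  # /tmp/app-dev/logs
```

Note how the override of data_dir silently changes log_dir too; that kind of action at a distance is one of the problems layering brings with it.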

In the majority of cases, INI files, Java property files, or YAML documents are much better suited for configuration. For complex data models, XML may also be applicable. As you've noted, JSON has some aspects that make it unsuitable as a human-editable configuration file, although it is a fine data exchange format.

amon
55

+1 to everything in amon's answer. I'd like to add this:

You'll regret using Python code as your configuration language the first time you want to import the same configuration from within code written in a different language. For example, if code that's part of your project is written in C++ or Ruby or something else and needs to pull in the configuration, you'll need to either link in the Python interpreter as a library or parse the configuration in a Python coprocess, both of which are awkward, difficult, or high-overhead.
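For illustration, this is roughly what the "Python coprocess" route looks like. The exporter script and function names are hypothetical; a C++ or Ruby caller would spawn the interpreter with popen or similar, which is simulated here from Python with subprocess.run. It also only works while every setting is JSON-serializable: a config that does import os, or holds custom objects, makes json.dumps fail.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# A tiny exporter that a non-Python component could run as a coprocess:
# it executes the Python config and prints its public settings as JSON.
EXPORTER = """
import json, runpy, sys
ns = runpy.run_path(sys.argv[1])
print(json.dumps({k: v for k, v in ns.items() if not k.startswith("_")}))
"""


def read_config_via_coprocess(config_path):
    """Simulate what a C++ or Ruby caller would have to do: spawn Python."""
    result = subprocess.run(
        [sys.executable, "-c", EXPORTER, str(config_path)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.py"
    cfg.write_text("PORT = 8080\nHOSTS = ['a', 'b']\n")
    print(read_config_via_coprocess(cfg))  # {'PORT': 8080, 'HOSTS': ['a', 'b']}
```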

All of the code that imports this configuration today may be written in Python, and you may think this will be true tomorrow as well, but do you know for sure?

You said you would use logic (anything other than static data structures) in your configuration sparingly if at all, which is good, but if there's any bit of that at all, you'll find it difficult in the future to undo it so you can move back to a declarative configuration file.

EDIT for the record: several people have commented on this answer about how likely or unlikely it is that a project would ever be successfully completely rewritten in another language. It's fair to say that a complete backward-compatible rewrite is probably rarely seen. What I actually had in mind was bits and pieces of the same project (and needing access to the same configuration) being written in different languages. For example, serving stack in C++ for speed, batch database cleanup in Python, some shell scripts as glue. So give that case some thought too :)

Celada
25

The other answers are already very good; I'll just add my experience of real-world usage in a few projects.

Pros

They are mostly already spelled out:

  • if you are in a Python program, parsing is a breeze (eval); it works automatically even for more complex data types (in our program, we have geometric points and transformations, which are dumped/loaded just fine through repr/eval);
  • creating a "fake config" with just a few lines of code is trivial;
  • you have better structures and, IMO, way better syntax than JSON (jeez, even just having comments and not having to put double quotes around dictionary keys is a big readability win).
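A sketch of the repr/eval round trip mentioned in the first point, with a hypothetical Point type standing in for the geometric types. A dataclass is used because its generated repr() happens to be valid Python; eval() is given only the names the config may reference. Stripping the builtins reduces accidents but is not a security boundary, which is the first con below.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical geometric type; the dataclass repr is valid Python."""
    x: float
    y: float


def dump_config(config):
    # repr() of dicts, lists, numbers, strings and dataclasses like Point
    # produces Python source that eval() can read back.
    return repr(config)


def load_config(text):
    # Expose only the names a config may use and remove the builtins.
    # This limits accidents; it is NOT a sandbox.
    return eval(text, {"__builtins__": {}, "Point": Point})


config = {"origin": Point(0.0, 0.0), "waypoints": [Point(1.5, 2.0)], "scale": 2}
text = dump_config(config)
print(text)  # {'origin': Point(x=0.0, y=0.0), 'waypoints': [Point(x=1.5, y=2.0)], 'scale': 2}
assert load_config(text) == config
```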

Cons

  • malicious users can do anything that your main program can do; I don't consider this much of a problem, since generally if a user can modify a configuration file he/she can already do whatever the application can do;
  • if you are no longer in a Python program, now you have a problem. While several of our configuration files remained private to their original application, one in particular came to store information that is used by several different programs, most of which are currently in C++, which now have a hacked-together parser for an ill-defined small subset of Python repr. This is obviously a bad thing.
  • Even if your program remains in Python, you may change Python version. Let's say your application started in Python 2, and after lots of testing you managed to migrate it to Python 3. Unfortunately, you didn't really test all of your code: you have all the configuration files lying around on your customers' machines, written for Python 2, over which you don't really have control. You cannot even provide a "compatibility mode" to read old configuration files (which is often done for file formats), unless you are willing to bundle/call the Python 2 interpreter!
  • Even if you are in Python, modifying the configuration file from code is a real problem, because... well, modifying code is not trivial at all, especially code that has a rich syntax and is not in LISP or similar. One program of ours has a configuration file in Python, originally written by hand, which it later turned out would be useful to manipulate via software (a particular setting is a list of things that is way simpler to reorder using a GUI). This is a big problem, because:

    • even just performing a parse→AST→rewrite roundtrip is not trivial (you'll notice that half of the proposed solutions are later marked as "obsolete, do not use, does not work in all cases");
    • even if they worked, the AST is way too low-level; you are generally interested in manipulating the result of the computations performed in the file, not the steps that led to it;
    • which brings us to the simple fact that you cannot just edit the values you are interested in, because they may be generated by some complex computation that you cannot understand/manipulate through your code.

    Compare this with JSON, INI or (God forbid!) XML, where the in-memory representation can always be edited and written back either without loss of data (XML, where most DOM parsers can keep whitespace in text nodes and comment nodes) or at least losing just some formatting (JSON, where the format itself doesn't allow much more than the raw data you are reading).
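A small demonstration of why the parse→AST→rewrite roundtrip hurts, using the standard ast module (ast.unparse needs Python 3.9 or later). The config contents are hypothetical. Reordering a list in the AST works, but every comment is lost on the way back to source, and a derived value like ADMIN_PORT stays an expression with no literal for a GUI to edit.

```python
import ast

# A hypothetical hand-written config: comments plus a computed value.
SOURCE = """\
# Order matters: earlier plugins run first.
PLUGINS = ["auth", "cache", "metrics"]
BASE_PORT = 8000
ADMIN_PORT = BASE_PORT + 1  # derived, not a literal
"""

tree = ast.parse(SOURCE)

# Reorder the PLUGINS list in the AST, as a GUI might want to.
for node in tree.body:
    if (isinstance(node, ast.Assign)
            and isinstance(node.targets[0], ast.Name)
            and node.targets[0].id == "PLUGINS"):
        node.value.elts.reverse()

rewritten = ast.unparse(tree)
print(rewritten)
# Both comments are gone, and ADMIN_PORT is still "BASE_PORT + 1".
```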


So, as usual, there's no clear-cut solution; my current policy on the issue is:

  • if the configuration file is:

    • surely for a Python application and private to it - as in, nobody else will ever try to read from it;
    • hand-written;
    • coming from a trusted source;
    • using the target application's data types is a real advantage;

    a Python file may be a valid idea;

  • if instead:

    • there may be the possibility of having some other application read from it;
    • there is the possibility that this file may be edited by an application, possibly even my application itself;
    • it is provided by an untrusted source;

    a "data only" format may be a better idea.

Notice that it's not required to make a single choice: I recently wrote an application that uses both approaches. I have an almost-never-modified file with first-setup, handwritten settings, where the nice Python bonuses are an advantage, and a JSON file for configuration edited from the UI.
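One way such a two-file setup could be wired together, as a sketch: the file names are hypothetical, and letting JSON values override the Python ones is my assumption, not something the answer specifies. The key property is that the program only ever writes the JSON file, so it never has to rewrite Python code.

```python
import json
import runpy
import tempfile
from pathlib import Path


def load_settings(python_settings, ui_settings):
    """Merge hand-written Python settings with UI-editable JSON settings.

    Assumption: keys present in the JSON file override the Python ones.
    """
    base = {k: v for k, v in runpy.run_path(str(python_settings)).items()
            if not k.startswith("_")}
    ui_path = Path(ui_settings)
    ui = json.loads(ui_path.read_text()) if ui_path.exists() else {}
    return {**base, **ui}


def save_ui_settings(ui_settings, values):
    # Only the JSON file is written by the program; the Python file stays
    # hand-written, so no code rewriting is ever needed.
    Path(ui_settings).write_text(json.dumps(values, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    py_file = Path(tmp) / "first_setup.py"
    py_file.write_text("# hand-written\nDATA_DIR = '/srv/data'\nTHEME = 'light'\n")
    ui_file = Path(tmp) / "ui.json"
    save_ui_settings(ui_file, {"THEME": "dark"})
    print(load_settings(py_file, ui_file))  # {'DATA_DIR': '/srv/data', 'THEME': 'dark'}
```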

9

The main question is: do you want your configuration file to be in some Turing-complete language (like Python)? If you do, you might also consider embedding some other (Turing-complete) scripting language like Guile or Lua (because they could be perceived as "simpler" to use, or to embed, than Python; read the chapter on Extending & Embedding Python). I won't discuss that further (other answers, e.g. by amon, discuss it in depth), but notice that embedding a scripting language in your application is a major architectural choice that you should consider very early; I really don't recommend making that choice later!

A well-known example of a program configurable through "scripts" is the GNU Emacs editor (or probably AutoCAD in the proprietary realm); so be aware that if you accept scripting, some users will eventually use (and perhaps, from your point of view, abuse) that facility extensively and write multi-thousand-line scripts; hence the choice of a good enough scripting language is important.

However (at least on POSIX systems), you might find it convenient to allow the configuration "file" to be dynamically computed at initialization time (of course, leaving the burden of a sane configuration to your system admin or user; what you read is really configuration text, which comes either from some file or from some command). For that, you could simply adopt (and document) the convention that a configuration file path starting with e.g. a ! or a | is actually a shell command whose output you read through a pipe. This leaves your users free to use whatever "preprocessor" or "scripting language" they are most familiar with.

(you need to trust your user about security issues if you accept a dynamically computed configuration)

So in your initialization code, your main would (for example) accept some --config argument confarg and get a FILE *configf from it. If that argument starts with ! (i.e. if (confarg[0] == '!') ...), you would use configf = popen(confarg + 1, "r"); and close that pipe with pclose(configf);. Otherwise you would use configf = fopen(confarg, "r"); and close that file with fclose(configf); (don't forget the error checking). See pipe(7), popen(3), fopen(3). For an application coded in Python, read about os.popen, etc.

(Also document that a user wanting to pass a configuration file literally named !foo.config should pass ./!foo.config to bypass the popen trick above.)
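A Python sketch of the same !/| convention. It uses subprocess.run with shell=True rather than os.popen (a substitution on my part; os.popen is built on subprocess anyway), and read_config_text is a hypothetical name. As with popen(3), the command string is trusted completely.

```python
import subprocess
from pathlib import Path


def read_config_text(confarg):
    """Return configuration text from a file, or from a shell command when
    the argument starts with "!" or "|".

    shell=True means the command is fully trusted, exactly like popen(3).
    """
    if confarg[:1] in ("!", "|"):
        result = subprocess.run(confarg[1:], shell=True, capture_output=True,
                                text=True, check=True)
        return result.stdout
    # A file literally named "!foo.config" must be passed as "./!foo.config".
    return Path(confarg).read_text()


print(read_config_text("!echo workers = 4"))  # workers = 4
```

check=True turns a failing generator command into a CalledProcessError instead of silently yielding an empty configuration.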

BTW, such a trick is only a convenience (to avoid requiring the advanced user to e.g. code some shell script to generate a configuration file). If a user wants to report a bug, he should send you the generated configuration file.

Notice that you could also design your application with the ability to use and load plugins at initialization time, e.g. with dlopen(3) (and you need to trust your user about that plugin). Again, this is a very important architectural decision (and you need to define and provide some rather stable API and convention about these plugins and your application).

For an application coded in a scripting language like Python you could also accept some program argument for eval or exec or similar primitives. Again, the security issues are then the concern of the (advanced) user.

Regarding the textual format of your configuration file (whether generated or not), I believe that what matters most is documenting it well (the choice of a particular format is not that important; however, I recommend letting your users put comments, which are skipped, inside it). You could use JSON (preferably with a JSON parser that accepts and skips comments, the usual // until end of line or /*...*/), or YAML, or XML, or INI, or your own thing. Parsing a configuration file is reasonably easy (and you'll find many libraries for that task).
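Python's standard json module does not accept comments, so one option is to strip them before parsing. Below is a sketch of a hypothetical strip_json_comments helper, not a full JSONC implementation. It tracks whether it is inside a string literal, so a value like "http://example.com" is not mistaken for a comment.

```python
import json


def strip_json_comments(text):
    """Remove // line comments and /* block */ comments outside strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped char, e.g. \"
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


CONFIG = """
{
    // where to fetch data from
    "url": "http://example.com/api",  /* trailing block comment */
    "retries": 3
}
"""
print(json.loads(strip_json_comments(CONFIG)))
# {'url': 'http://example.com/api', 'retries': 3}
```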

3

Adding to amon's answer, have you considered alternatives? JSON is maybe more than you need, but Python files will probably give you problems in the future for the reasons mentioned above.

However, Python already has a parser for a very simple config language that might fulfill all your needs: the configparser module (named ConfigParser in Python 2) implements an INI-style format.
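A short example of what that looks like with the Python 3 configparser module. The section and option names are made up. Values are stored as strings, and the typed getters do the conversion.

```python
import configparser

# A hypothetical INI-style config: sections, comments, no quoting needed.
INI_TEXT = """
[database]
# comments are allowed
host = localhost
port = 5432

[worker]
threads = 4
debug = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

print(parser["database"]["host"])            # localhost
print(parser.getint("database", "port"))     # 5432
print(parser.getboolean("worker", "debug"))  # True
```

ConfigParser.write() can save a modified config back to a file, though comments are not preserved when doing so.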

1

I have worked for a long time with some well-known software which has its configuration files written in TCL, so the idea is not new. This worked quite well, since users who didn't know the language could still write/edit simple configuration files using a single set name value statement, while more advanced users and developers could pull sophisticated tricks with this.

I don't think that "the config files can get difficult to debug" is a valid concern. As long as your application doesn't force users to write scripts, your users can always use simple assignments in their configuration files, which is hardly any more difficult to get right compared to JSON or XML.

Rewriting the config is a problem, though it's not as bad as it seems. Updating arbitrary code is impossible, but loading the config from a file, altering it and saving it back is perfectly possible. Basically, if you do some scripting in a config file which is not read-only, you'll just end up with an equivalent list of set name value statements once it is saved. A good hint that this will happen is a "do not edit" comment at the beginning of the file.
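The answer describes Tcl, but the same load/alter/save cycle can be sketched in Python. save_flattened and the file contents below are hypothetical, and the sketch assumes every setting's repr() is valid Python (numbers, strings, lists and similar). After saving, the script is replaced by plain name = value assignments.

```python
import runpy
import tempfile
from pathlib import Path

HEADER = "# Generated file, do not edit: values below were computed.\n"


def save_flattened(config_path, changes):
    """Run a scripted config, apply changes, and write it back as plain
    `name = value` assignments (the Python analogue of `set name value`)."""
    values = {k: v for k, v in runpy.run_path(str(config_path)).items()
              if not k.startswith("_")}
    values.update(changes)
    lines = [f"{name} = {value!r}\n" for name, value in values.items()]
    Path(config_path).write_text(HEADER + "".join(lines))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "site_config.py"
    path.write_text("BASE = 8000\nPORTS = [BASE + i for i in range(3)]\n")
    save_flattened(path, {"BASE": 9000})
    print(path.read_text())
```

The saved file contains BASE = 9000 but PORTS = [8000, 8001, 8002]: once flattened, the relationship the script expressed is gone, which is exactly the trade-off described above.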

One thing to consider is that your config files won't be reliably readable by simple regex-based tools, such as sed, but as far as I understand this is already not the case with your current JSON files, so there's not much to lose.

Just make sure you use appropriate sandboxing techniques when executing your config files.
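In Python specifically, exec() with a stripped-down namespace is not a reliable sandbox. A common substitute, which swaps sandboxing for simply not executing anything, is to accept only NAME = literal statements and evaluate them with ast.literal_eval. The helper name load_literal_config is hypothetical. This keeps Python's syntax (comments, trailing commas) while giving up the scripting.

```python
import ast


def load_literal_config(source):
    """Accept only `NAME = <literal>` statements; reject everything else.

    Nothing is executed: values are built by ast.literal_eval.
    """
    config = {}
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            raise ValueError(f"line {node.lineno}: only simple assignments allowed")
        # literal_eval raises ValueError for calls, names, attribute access...
        config[node.targets[0].id] = ast.literal_eval(node.value)
    return config


print(load_literal_config("HOST = 'db'  # comment\nPORTS = [5432, 5433,]\n"))
# {'HOST': 'db', 'PORTS': [5432, 5433]}
```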

1

Besides all the valid points of other good answers here (wow, they even mentioned the Turing-complete concept), there are actually a few solid practical reasons NOT to use a Python file as your configuration, even when you are working on a Python-only project.

  1. The settings inside a Python source file are technically part of the executable source code, rather than a read-only data file. If you go this route, you would typically do import config, because that kind of "convenience" was presumably one of the major reasons people started using a Python file as config in the first place. Now you tend to commit that config.py into your repo, otherwise your end users would encounter a confusing ImportError when they try to run your program for the first time.

  2. Assuming you actually commit that config.py into your repo, your team members will probably have different settings in different environments. Imagine that someday some member accidentally commits his/her local configuration file into the repo.

  3. Last but not least, your project could have passwords in its configuration file. (This is a debatable practice in its own right, but it happens anyway.) And if your configuration file lives in the repo, you risk committing your credentials into a public repo.

Now, using a data-only configuration file, such as the universal JSON format, can avoid all three problems above, because you can reasonably ask users to come up with their own config.json and feed it into your program.
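A minimal sketch of "feed it into your program", assuming a hypothetical --config flag and a hypothetical committed template named config.example.json. Only the template would live in the repo; each user's real config.json stays local, so a missing file produces a clear message instead of an ImportError.

```python
import argparse
import json
import sys


def load_user_config(argv=None):
    """Read a user-supplied JSON config instead of importing a committed
    config.py. The --config flag and file names are assumptions."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="config.json")
    args = parser.parse_args(argv)
    try:
        with open(args.config, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        sys.exit(f"{args.config} not found; copy config.example.json and edit it")
```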

PS: It is true that JSON has many restrictions. Two of the limitations mentioned by the OP can be worked around with some creativity:

  • How to put comments in a JSON file (properly)
  • And I usually have a placeholder to bypass the trailing comma rule. Like this:

    {
        "foo": 123,
        "bar": 456,
        "_placeholder_": "all other lines in this file can now contain trailing comma"
    }
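On the loading side, the placeholder can be discarded right after parsing. The rule "drop every key starting with _" is an assumed convention, not part of the answer, and load_without_placeholders is a hypothetical name. It keeps the placeholder, and any "_comment"-style keys, out of the settings the program actually uses.

```python
import json

CONFIG_TEXT = """
{
    "foo": 123,
    "bar": 456,
    "_placeholder_": "all other lines in this file can now contain trailing comma"
}
"""


def load_without_placeholders(text):
    # Assumed convention: keys starting with "_" are editing aids only
    # (placeholders, "_comment" entries) and are dropped after parsing.
    data = json.loads(text)
    return {k: v for k, v in data.items() if not k.startswith("_")}


print(load_without_placeholders(CONFIG_TEXT))  # {'foo': 123, 'bar': 456}
```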
    
RayLuo