15

Generally, modules should not have side effects. However, in a lot of cases, the side-effects are hard to avoid or may be desirable. There are also popular packages with on-import side-effects.

Which side effects, and when, are allowable when developing a package? For instance, these seem on the border of acceptable and not-acceptable:

# Creates a logfile
logging.basicConfig(filename="module.log")

Write output to a logfile

logging.info("Module initialized successfully.")

Add to PYTHONPATH to find a required module

sys.path.append(PLUGIN_DIRECTORY) import my_plugin

Replace a function with a patched version

(from future import ... does similar)

@functools.wraps(math.cos) def _polyfilled_cosine(theta): return math.sin(theta + math.pi / 2) math.cos = _polyfilled_cosine

Register a class or method with some dispatcher

(Often this is done with a decorator or metaclass,

i.e. Flask's @app.route(): https://flask.palletsprojects.com/en/2.3.x/patterns/packages/)

class PngHandler(filetype_registry.GenericHandler): pass filetype_registry.register(PngHandler, '.png')

  • For logfiles, it's more effort to defer their creation until the first write than it is to simply create one, but if you create a logfile it may be created when code is imported, and then never run.
  • PYTHONPATH manipulation is often something to be avoided, but I've encountered many modules that add a subdirectory or a plugin path to PATH.
  • Monkey patching has its issues, but when the situation calls for it I'm not sure import monkeypatch; monkeypatch.apply_patch() is any better.
  • Registering classes in some way (like Flask does) seems very common, but it means that importing a package can have dramatic side effects.

In practice, it seems like many libraries have an object (potentially a singleton) whose constructor handles the initialization, so instead of my_library.do_thing() the semantics are ml = my_library.MyLib(); ml.do_thing(). Is this a better design pattern or does it just kick the issue down the road?

Kaia
  • 422

4 Answers4

22

logging

# Creates a logfile
logging.basicConfig(filename="module.log")

No, don't do it! Now git status is dirty, it shows an untracked file.

Protect this with a __main__ guard, or let __init__ do the setup. Follow the documentation's advice.

And sending an .info() message the console or a file is similarly a no go. Unit tests, and imports, should succeed silently. For example, here is a broken unit test:

$ python -m unittest *_test.py
.....DEBUG: x=34, y=10
..
----------------------------------------------------------------
Ran 7 tests in 1.286s
OK

And here is a proper test which succeeds silently:

$ python -m unittest *_test.py
.......
----------------------------------------------------------------
Ran 7 tests in 1.245s
OK

sys.path

# Add to PYTHONPATH to find a required module
sys.path.append(PLUGIN_DIRECTORY)

No. Fix up PYTHONPATH before you even invoke the interpreter. After all, it will control startup behavior before your module even gets control.

You have several options. Adjust the install script or the install instructions so your users will have a suitable env var, perhaps set in .bashrc. Use a companion Bourne script or Makefile recipe to adjust the path. You can even start with a shebang like this:

#! /usr/bin/env -S PYTHONPATH=/opt/mypackage/plugins:. python

patching

If package foo depends on baz, it seems reasonable that it might import and monkeypatch baz. I have never been burned by that. Had I imported baz, then someone silently patching it behind my back would be troubling.

To patch math.cos seems rather adventurous. Better to access the modified behavior via a new name such as _polyfilled_cosine() or cos1().

registry

Creating a module level app object and registering @app.route()s is perfectly fine; we are using them as intended. They produce no annoying diagnostics or other externally visible side effects. They do not, for example, interfere with a "second caller", a unit test, silently importing the module and exercising helper functions.

OTOH, app.run() should certainly be protected by a __main__ guard.

deferred execution

the semantics are ml = my_library.MyLib(); ml.do_thing(). Is this a better design pattern ?

Yes, this is significantly better than doing the thing at import time. Wait until the __main__ guard or some function or method is actually requesting that the thing be done.

If this module was imported by another module which merely wanted to verify a compatible version number, then triggering and diagnosing any of do_thing()'s numerous potential failure modes is more complexity than warranted.

circular imports

While we're on the topic, here's another thing to avoid. Do not let module a depend on b and b depend on a. Even if you can get import a to "work", you have probably left behind a landmine which some unit test that attempts import b will trigger. And no, don't ask the unit test author to "just import a" instead.

Future maintenance engineers should be able to freely use both forms, import x and from x import foo, for any of the modules. Which implies "no circular references!".

J_H
  • 7,645
13

However, in a lot of cases, the side-effects are hard to avoid or may be desirable.

They are never hard to avoid, and they are never desirable. Or maybe they somewhat are hard to avoid, because it requires mindset change. Which as we know is hard for many people.

For logfiles...

What if the file lives on a network drive? Now you not only create it at import, but also do networking at import. While being unsure that you ever use it.

In fact I had this case in one of my jobs. It would took a webserver around 5s to start, because one import would do a networking. So 5s before we could even check that server is up and running. Bad.

And secondly, where is that file's path coming from? Hardcoded? Really bad. Configured in other module? Order of imports matter, bad. Taken from env? Implicit, hidden dependency, bad.

PYTHONPATH...

That's even worse. I don't know a single case when manipulating env variables (rather then just reading them) is beneficial. Except for situations where a library requires it because it is badly written.

The set of environment variables is a global singleton. What can go wrong?

And doing this at import means that now order of imports matter.

Monkey patching...

Cancer. Do not modify behaviour of existing code at runtime. This leads to more confusion and is so error prone. I never once used monkey patching in production code.

I know that patching is popular in testing, but even there, well designed services, abstractions and passing of dependencies is simply superior.

And again, if you do this at import, then the order of imports matters. Counterintuitive.

Registering classes...

Or more generally decorators. Yes, these are somewhat acceptable, as long as these don't do too much.

Registering classes through decorators is quite popular but I would argue that it is unnecessary. In case of any major side effect (i/o) you again block an import. Why?

Secondly, you tie a certain behaviour with a class. Why is that necessary? It limits its reusability.

Thirdly what is registered or not depends on whether it is imported or not. The entire registration is distributed over the source code, and you have to ensure that all classes are imported. And potentially in correct order. Counterintuitive and error prone.

Wouldn't it be better to have a single explicit function that does all of that?

In practice, it seems like many libraries have an object (potentially a singleton) whose constructor handles the initialization

Yes, it is better. You don't pay for what you don't use. And you don't have hidden behaviours hard to track.

When you code, the first question you should ask is: will other people have trouble understanding my code? Any implicit, hidden behaviour, especially at import simply goes against this principle.

For example, I remember once I worked on a web server, which ate 200mb ram at startup after a change. Kubernetess then would kill it because of low memory limits. We couldn't figure out what is going on, no code seems to indicate such behaviour. Until we found out that some lib was building a cache at import. A huge one. And we didn't even use that piece of the lib.

Amazing what kind of libs people create and are proud of...

freakish
  • 2,803
3

One big issue with your example logging code is that both those lines of code could fail for a multitude of reasons (read-only filesystem, invalid path, lack of filesystem permissions, etc). These errors will be extremely difficult to work with. I can technically wrap a import statement in a try block and catch any exceptions, but how am I supposed to handle those exceptions intelligently this early in my program? The import was the first line in the file, I don't even have any of my support code in scope yet. There's no clean way for me to try to fix the problem and retry the operation. There's also no clean way for me to provide your library with enough information to avoid the error in the first place (like a path to a usable log directory). If this functionality was instead inside a regular my_module.init_log() function, then handling these sorts of problems becomes a straightforward task.

Don't do anything during import that involves non-trivial error handling. You're just making it significantly harder to handle errors. You definitely don't want to take an otherwise-recoverable error and promote it to a fatal error simply because you wanted to run the code during import for some reason.

bta
  • 1,199
-3

TL;DR

Side effects are unacceptable until they are unavoidable.

Convention

Python libraries are "quick and dirty" - they rely heavily on global shared state and there is little to be done about that. Eventually your module would have to interact with these and contribute to chaos.

Python programs vary wildly in scale - from a single page to megabytes of code. In the first case anything goes, in the second - any surprise is fatal.

Therefore there can't be hard and fast rule about side effects. Just avoid them unless it is impossible in your context.

Non-solutions

Init method

Introducing an init() method that would perform side effects for your module is mostly meaningless - it does not change dependency topology and does not eliminate the problem of shared global state. There are rare exceptions, where a two-step initialization helps, but they smell.

Lazy initialization

If you call an init() method from other methods of your module you introduce additional uninitialized state is your module, exacerbating the problem of global state, not fixing it.

Examples

In your listed examples only logging.basicConfig is a critical bug as it has effect only on first call and there is no way to ensure call order from an imported module.

I can find scenarios where other examples are unavoidable.

Basilevs
  • 3,896