53

Let’s say fn(x) is a pure function that does something expensive, like returning a list of the prime factors of x.

And let’s say we make a memoized version of the same function called memoizedFn(x). It always returns the same result for a given input, but it maintains a private cache of previous results to improve performance.

Formally speaking, is memoizedFn(x) considered pure?

Or is there some other name or qualifying term used to refer to such a function in FP discussions? (i.e. a function with side effects that may affect the computational complexity of subsequent calls but that may not affect the return values.)

Robert Harvey
  • 200,592
callum
  • 10,467

6 Answers6

44

Yes. The memoized version of a pure function is also a pure function.

All that function purity cares about is the effect that input parameters on the return value of the function (passing the same input should always produce the same output) and any observable side effects (e.g. text to the terminal or UI or network). Computation time and extra memory usages is irrelevant to function purity.

Caches of a pure function is pretty much invisible to the program; a functional programming language is allowed to automatically optimise a pure function to a memoized version of the function if it can determine that it'll be beneficial to do so. In practice, automatically determining when memoization is beneficial is actually quite a difficult problem, but such optimisation would be valid.

Pete
  • 9,016
Lie Ryan
  • 12,496
19

Wikipedia defines a "Pure Function" as a function that has the following properties:

  • Its return value is the same for the same arguments (no variation with local static variables, non-local variables, mutable reference arguments or input streams from I/O devices).

  • Its evaluation has no side effects (no mutation of local static variables, non-local variables, mutable reference arguments or I/O streams).

In effect, a pure function returns the same output given the same input, and doesn't affect anything else outside of the function. For purposes of purity, it doesn't matter how the function computes its return value, so long as it returns the same output given the same input.

Functionally pure languages like Haskell routinely use memoization to speed up a function by caching its previously computed results.

Robert Harvey
  • 200,592
7

Yes, memoized pure functions are commonly referred to as pure. This is especially common in languages like Haskell, in which memoized, lazily-evaluated, immutable results are a built-in feature.

There is one important caveat: the memoizing function must be thread-safe, or else you might get a race condition when two threads both try to call it.

One example of a computer scientist using the term “purely functional” this way is this blog post by Conal Elliott about automatic memoization:

Perhaps surprisingly, memoization can be implemented simply and purely functionally in a lazy functional language.

There are many examples in the peer-reviewed literature, and have been for decades. For example, this paper from 1995, “Using Automatic Memoization as a Software Engineering Tool in Real-World AI Systems,” uses very similar language in section 5.2 to describe what we would today call a pure function:

Memoization only works for true functions, not procedures. That is, if a function’s result is not completely and deterministically specified by its input parameters, using memoization will give incorrect results. The number of functions that can be memoized successfully will be increased by encouraging the use of a functional programming style throughout the system.

Some imperative languages have a similar idiom. For example, a static const variable in C++ is initialized only once, before its value is used, and never mutates.

Davislor
  • 1,563
3

It depends on how you do it.

Usually people want to memoize by mutating some sort of cache dictionary. This has all the problems associated with impure mutation, such as having to worry about concurrency, worrying about the cache growing too large, etc.

However, you can memoize without impure memory mutation. One example is in this answer, where I track the memoized values externally by means of a lengths argument.

In the link Robert Harvey provided, lazy evaluation is used to avoid side effects.

Another technique sometimes seen is to explicitly mark the memoization as an impure side effect in the context of an IO type, such as with cats-effect's memoize function.

This last one brings up the point that sometimes the goal is just encapsulating mutation rather than eliminating it. Most functional programmers consider it "pure enough" to make impurity explicit and encapsulated.

If you want a term to differentiate it from a truly pure function, I think it's sufficient to just say "memoized with a mutable dictionary." That lets people know how to use it safely.

Karl Bielefeldt
  • 148,830
3

Usually, a function that returns a list is not pure at all because it requires allocation of storage and can thereby fail (e.g. by throwing an exception, which is non-pure). A language that has value types and can represent a list as a bounded-size value type may not have this issue. For this reason, your example is probably not pure.

In general, if the memoization can be done in a manner that's failure-case free (e.g. by having statically allocated storage for memoized results, and internal synchronization to control access to them if the language admits threads), it's reasonable to consider such a function pure.

0

You can implement memoization without side-effects using the state monad.

[State monad] is basically a function S => (S, A), where S is the type that represents your state and A is the result the function produces - Cats State.

In your case the state would be the memoized value or nothing (i.e. Haskell Maybe or Scala Option[A]). If the memoized value is present it is returned as A, otherwise A is calculated and returned as both the transitioned state and the result.

Samuel
  • 9,237