Extending the definition of purity in Rust
12 October 2012
In this post I propose an extension of Rust’s purity rules. The short
version is that pure functions would be allowed to mutate data owned
by their &mut
parameters. This extends the current Rust purity
rules which allow pure functions to invoke impure closures so long as
they are an argument to the function. The principle is the same: pure
functions are functions whose side-effects can be completely
determined by examining their parameters (for the more formally minded
among you, this is effectively an effect-parametric system with very
lightweight notation). The rest of the post is an elaboration and
justification of this idea.
Background on purity and how it works in Rust today
Rust allows functions to be declared pure
. The general idea of a
pure
function is that they do not mutate state reachable by the
caller. That means that you can call a pure function twice in a row,
with the same arguments, and be guaranteed of getting the same result.
In practice, though, Rust provides a somewhat more complex, but
infinitely more useful, definition of purity. Informally, rather than
saying that a pure function will not modify anything the caller can
reach, we say that executing a pure function will not modify anything
that the caller doesn’t know they are going to modify. To see what I
mean, consider the function each()
which iterates over a vector and
invokes a callback for each element:
pure fn each<T>(v: &[T], op: fn(&T)) {
let i = 0, n = v.len();
while i < n {
op(&v[i]);
i += 1;
}
}
We can’t say that the function each()
modifies nothing, because it
calls the closure op
that the caller provided, and op
could do
anything. We could of course declare that op
as a pure closure, but
that would make each()
rather useless, since you would not be able
to do any mutation whenever you iterated over a vector. We could also
say that each()
is not pure, but that loses some useful
information—after all, each()
does not itself mutate anything
that the user has access to. It’s the callback which does! So
each()
is kind of “conditionally pure”—it is pure if its callback
is pure, but not otherwise.
This idea of “conditional purity” is precisely what Rust currently adopts. So in short a pure function today is permitted exactly one kind of (potentially) impure action: invoking closures that it was provided as argument. This idea is really useful and opens up a wide variety of functions to be declared as pure that could not have been declared pure before.
What is purity used for in Rust
The primary purpose of purity in Rust is memory safety. The Rust compiler will permit you to create borrowed pointers that might be invalidated by mutation so long as you only perform pure actions for the lifetime of those pointers. This is very useful in practice.
Another purpose for purity that I foresee is bringing a
Rivertrail- or PJs-like model for data parallelism to Rust. The
notion of purity I am talking about aligns very well here. In that
model, in particular, being able to give the worker functions
controlled access to mutable data can be very useful, as I described
in this HotPar paper that I wrote some time back (note
however that in the HotPar paper I made use of a transitive read-only
type qualifier to gain the same guarantees that pure
functions give
us; using pure
seems simpler).
My proposed extension
I would like to extend the idea of conditional purity a little
further. I want to say that, in addition to invoking impure closures
given as argument, a pure function can take one additional action: it
can write to any &mut
pointer given as argument. The principle is
the same as before: the side effects of executing a particular call to
a pure function can be completely determined by examining its
arguments.
The problem I am trying to solve is succintly described by Tim in Issue #3722: right now, a pure function is permitted to mutate its local variables, but there is no way to factor out bits of code from a pure function into a helper.
Tim’s example is as follows:
pure fn make_str() -> ~str {
let mut s = ~"";
str::push_char(&mut s, 'c');
return s;
}
Here, the call to push_char()
is not considered pure because it
modifies the string s
which was supplied as argument. Therefore it
cannot be called from make_str()
, as make_str()
is pure.
Nonetheless, we know that this would be safe, as push_char()
only
mutates the string it is given as argument, and make_str()
is
permitted to mutate s
without violating purity.
Enforcing and Formalizing the proposed rule
Under the rule I propose, a pure function could have two form of externally visible side-effects:
- a pure function can write to any
&mut
pointer which it is given as a parameter; - a pure function may invoke any closure which is passed to it as a parameter.
From the point of view of the pure function, the actions it may legally take are something like the following:
- write to data owned by the current stack frame;
- write to data owned by an
&mut
parameter; - read any data it likes;
- invoke closures given as parameters;
- invoke other pure functions or methods, however:
- for each parameter of the pure function declared with
&mut
type, the value provided must be one which is writable by the current function; - for each parameter of the pure function declared with closure type, the value provided must be one which is callable by the current function.
- for each parameter of the pure function declared with
These rules aren’t really complete, however, because they don’t take into account what the limits ought to be when borrow check decides that purity is needed for a particular block.
On the plane flying back from the Emerging Languages workshop, I wrote out a formalization of this idea based on an effect system with heap regions. What I would like to do is move the current purity checking, which is written using ad-hoc rules similar to the ones above, and implement it based on this formalization, which I think is much easier to think about, and helps to ensure it’s giving the right answer in various corner cases.
I wanted to write up this formalization here in this post, but I’ve got too much on my plate today to spend the time on it, so it’ll have to wait. I guess it’s no crime if I post a blog post that’s less than 22 pages when printed.