Rvalue lifetimes in Rust
9 January 2014
I’ve been working on Issue #3511, which is an effort to rationalize the lifetimes of temporary values in Rust. This issue has been a thorn in the side of Rust users for a while, because the current lifetimes are rather haphazard and frequently too short. Some time ago, I did some thinking on this issue and then let it lie while other things took priority.
Part of the reason that this issue has lasted so long is that the current trans cleanup scheme is very inflexible. I have a branch now that rewrites the cleanup system so that it can handle any rules we would like. The problem I am encountering now, of course, is that it’s unclear what the rules should be. I want to lay out the options I see.
The problem
There are numerous situations in which Rust users borrow temporary values; the tricky part is deciding what the lifetime of these temporary values ought to be. Put another way, when should we run the destructor for these temporaries? To see what I mean, let me show a few examples: I’ll focus on three use cases that I think are fairly representative, and which I see appearing in the code base a lot. It’s possible though that I’m missing some good examples.
Example 1: borrowing an rvalue. Here is the first example:
let map = &mut HashMap::new(); // (1)
The effect of this expression is to create a temporary stack variable and create a pointer to it. It is roughly equivalent to something like this:
let _temp = HashMap::new();
let map = &mut _temp;
The question that I want to consider in this post is what the lifetime
of this temporary ought to be. In the explicit expansion, the lifetime
of the temporary is clear: it will live as long as the explicit
variable _temp
. But the correct semantics for the first example
are less clear (as we’ll see).
Applying the &
operator to an rvalue might seem a bit silly at
first. Why not write it using an explicit temporary, after all? There
are a couple of reasons though that it’s worth supporting, but the
most crucial one is in macros. In macros, it’s useful to be able to
apply the borrow operator to any expression in order to avoid moving
or copying values when you don’t want to. For example, the expansion
of assert_eq!($a, $b)
is something like:
{
let _a = & $a;
let _b = & $b;
if _a != _b {
fail!(...)
}
}
If we didn’t use the &
operator, then the first two lines might
cause inapprorpriate moves. For example, if I wrote assert_eq!(x.a, y.b)
, and the type of x.a
was affine, then assert_eq!
would move
from both x.a
and y.b
. Not so good.
Example 2: ref bindings and rvalues. The second example is in some way just a different syntax for the same thing (or, I should say, somethign which probably ought to be the same, though some of the rules I describe do not treat it the same way):
let ref mut map = HashMap::new();
Example 3: Autoref in method calls. Method calls in Rust typically take borrowed pointers to their receivers, but one rarely writes this explicitly. Instead, the receiver is implicitly borrowed via a mechanism called “autoref” (this is actually the same as in C++, except that in C++ all method calls are autoref’d, whereas in Rust you can also have method calls that take the receiver by value and not by reference).
One example that is becoming rather common in the rustc
code base is
the RefCell
type. RefCell
is a standard library type that allows
some of the Rust compiler’s static checks to be converted into dynamic
checks; it replaces @mut
, which bundled together dynamic checks with
managed data, and just isolates out the dynamic check portion so that
it can be reused with other smart pointer types.
The way that RefCell
works is that you invoke the borrow
or
borrow_mut
methods:
let map: RefCell<HashMap<K,V>> = ...;
let mut r = map.borrow_mut();
These methods check some bits to ensure that the value is not borrowed
in an incompatible way already (basically: a mutable borrow must not
overlap with any other borrows). These methods then toggle some bits
and return a special Ref
type (r
in the example above). This Ref
type has a destructor which resets the bit, effectively ending the
borrow. In the meantime, the Ref
type can be used to get access to
the data itself:
let data = r.get();
Note that there is an implicit borrow of the variable r
occurring here.
In a sense, that method call could be expanded to:
let data = (&mut r).get();
The key point here is that the lifetime of the variable r
exactly
corresponds to the lifetime of the dynamic borrow of map
. Therefore,
having a good understanding of when the destructor for r
will run is
crucial to know how long your map is borrowed for. For example, if you
write code like the following, you will get a fatal error:
let map: RefCell<HashMap<K,V>> = ...;
let mut r1 = map.borrow_mut();
let mut r2 = map.borrow(); // Error: already borrowed!
The problem is that the second borrow (r2
) occurs before the first
borrow has completed; more operationally, the second borrow occurs
before the destructor for r
executes.
The lifetime of a borrow is relatively clear so long as explicit temporaries are used, as I showed so far. But it’s kind of verbose. For example, to insert an item into a map, I have to write something like:
let mut r = map.borrow_mut();
r.get().insert(k, v);
It’d be nicer if we could remove the temporary:
map.borrow_mut().get().insert(k, v);
// temporary: ^~~~^
But this gets right back to the issue were talking about, because the
call to get()
in fact takes the address of the receiver, and in this
case the receiver is an rvalue map.borrow_mut()
.
Some solutions
OK, so we have three examples where rvalue temporaries are borrowed:
let map = &mut HashMap::new(); // (1)
let ref mut map = HashMap::new(); // (2)
map.borrow_mut().get().insert(k, v); // (3)
I want to explore various rules we could use to decide when the destructors run, and see what the effect would be on each example.
Solution 0: Innermost enclosing statement.
My first attempt (what is currently written on the branch) was to make all temporaries tied to the innermost enclosing statement (roughly, see Appendix B for full details). I think this does the right thing for example 3, in that it releases the borrow at the of the statement:
map.borrow_mut().get().insert(k, v); // (3)
However, examples 1 and 2 both fail to compile:
let map = &mut HashMap::new(); // (1)
let ref mut map = HashMap::new(); // (2)
The reason for this is that, if the hashmap only lives as long as the
statement, the value in map
gets destructored as soon as it is assigned,
and thus cannot safely be used by the following statements. That is,
the following code would access freed memory:
let map = &mut HashMap::new();
map.insert(...);
I think this solution is not workable because one cannot write the
assert_eq
macro above.
Solution 1: Innermost enclosing block.
To address the problem with solution 0, we might try to use the innermost enclosing block. This makes examples 1 and 2 work find, but example 3 doesn’t work so well:
map.borrow_mut().get().insert(k, v); // (3)
The problem is that here the borrow isn’t released until the end of the enclosing block, rather than the enclosing statement. This probably way too late. For example, code like the following would fail dynamically:
{
ref_map.borrow_mut().get().insert(key1, value);
let v = *ref_map.borrow().get().find(key2);
...
}
I think this solution is not workable because it is too painful
to work with RefCell
.
Solution 2: Variations on the C++ rule (roughly).
Interestingly, C++ has a similar problem concerning temporaries, and they have a rather custom rule that attempts to address exactly the issue I encountered with solution 0. The C++ rule, as I understand it, is that temporaries live as long as the innermost enclosing statement, unless the temporary is assigned to an (reference) variable, in which case it lives as long as that variable.
So, for example, if I had a call to a function that took a map rvalue reference, as follows:
V& find(const map<K,V>& m) { ... }
use(find(map(...)));
then the map will be freed after use()
returns. This is true even
though the temporary was created as an argument to find()
. Basically
the temporary will live until the next semicolon, roughly speaking.
Now there is one exception to this rule. If I asssign the temporary to a variable, then it lives as long as the variable:
const map<K,V>& m = map(...);
In this case, the destructor for map
will run once m
goes out of
scope.
It is a bit challenging to make a direct equivalent to this rule in
Rust. For one thing, we have explicit borrows (the &
operator) and
also ref
bindings. For another, assignments can be more complicated,
e.g.:
let Foo { a: ref a, b: b } = create_foo();
In this case, one of the fields is bound by reference, but the other is moved.
Variation A. Still, we could make a rule that says something like this:
let
bindings where the initializer is an rvalue first store the initializer
into a temporary with the lifetime of the innermost block, and then assign
from that temporary into the pattern. So effectively let pat = rvalue
becomes:
let _temp = rvalue;
let pat = temp;
This rule treats examples 1 and 2 rather differently:
let m = &mut HashMap::new(); // (1) Error
let ref mut m2 = HashMap::new(); // (2) OK
Example 1 is unaffected by the rule and hence still an error for the
same reasons as Solution 0: the temporary created by the explicit
borrow goes out of scope at the end of the let
statement, rather
than the block. Example 2 would work, though, because the temporary in
that case would have an extended lifetime. In the case of Example 3
(the RefCell
), the borrow would terminate at the end of the
statement, as desired.
Variation B. Another option would be to say that the borrow
operator &
uses the lifetime of the innermost enclosing block, but
all other temporaries use the innermost enclosing statement. This rule
is easier to explain than variation A, and it has the opposite effect
on examples 1 and 2:
let m = &mut HashMap::new(); // (1) OK
let ref mut m2 = HashMap::new(); // (2) Error
In Example 2, the temporary used for let
initializers only lives
until the end of the statement, so we get a compilation error.
Variation C. We could extend the lifetime of both explicit &
uses
as well as let
initializers. In that case, both examples 1 and 2 work
the same way:
let m = &mut HashMap::new(); // (1) OK
let ref mut m2 = HashMap::new(); // (2) OK
Summary. I think variations A, B, or C would all be potentially workable.
Solution 3: Inference.
Finally, we can rely on inference. Essentially the compiler would
decide the smallest lifetime that makes the program legal. This makes
all of the examples I’ve given work, but at a cost in predictability
– it’s hard to know when your destructors run. For things like
RefCell
, this is of course a potential problem. Overall, while I
think inference is workable, it is almost universally unpopular,
simply because people do not like the idea of an ill-defined lifetime
inference algorithm dictating when their destructor will execute.
Conclusions
All in all I guess I leans towards some variation of Solution 2. I
like Variation B (make the lifetime of the explicit &
operator be
the innermost enclosing block; otherwise, innermost enclosing
statement) because it’s easy to express and implement, but I also like
Variation C because it treats examples 1 and 2 the same way. Whatever
we do, having an explicit option seems like a good idea (see Appendix
A).
Appendix A. Explicit annotation
Regardless of what rule we pick, it is possible to permit users to explicitly annotation temporary lifetimes. One of the motivations for the current lifetime syntax was to permit users to annotate blocks (and perhaps statements/expressions) with lifetime names and then refer to those later. For example, one might create a temporary and explicitly state that it should be destructed in an outer block:
'a: {
{
let m = &'a mut HashMap::new();
...
}
}
There would be some limits to these explicit temporaries. For example,
you could not create a temporary in an outer block if you are within
an if
or loop
statement (this is needed to ensure fixed size
stacks and to ensure we know what values to run destructors on
statically).
Appendix B. Tail expressions in block.
In many of the rules above, I’ve referenced the innermost enclosing block or statement. But what is the innermost enclosing block or statement in a situation like:
let v = {
&mut HashMap::new()
};
It might be nice to make the tail expression in a block belong, effectively, to its parent.
In my existing code (which implements Solution 0), the actual rule is not ““innermost enclosing statement” but rather “innermost enclosing statement, loop body, or function”. I do not consider the tail expression of a block to be in a statement. This means that temporaries in the tail expression effectively have the lifetime of the statement (or loop body, or function body) in which the block appears.
Appendix C. Match expressions.
Ref bindings can also appear in match expressions, of course. Regardless, I think the semantics of match on an rvalue probably ought to be that the temporary value lives as long as the enclosing statement, regardless of what bindings it contains; that seems to be what most people expect.