Non-lexical lifetimes: introduction
27 April 2016
Over the last few weeks, I’ve been devoting my free time to fleshing out the theory behind non-lexical lifetimes (NLL). I think I’ve arrived at a pretty good point and I plan to write various posts talking about it. Before getting into the details, though, I wanted to start out with a post that lays out roughly how today’s lexical lifetimes work and gives several examples of problem cases that we would like to solve.
The basic idea of the borrow checker is that values may not be mutated or moved while they are borrowed. But how do we know whether a value is borrowed? The idea is quite simple: whenever you create a borrow, the compiler assigns the resulting reference a lifetime. This lifetime corresponds to the span of the code where the reference may be used. The compiler will infer this lifetime to be the smallest lifetime that it can that still encompasses all the uses of the reference.
Note that Rust uses the term lifetime in a very particular way. In everyday speech, the word lifetime can be used in two distinct – but similar – ways:
- The lifetime of a reference, corresponding to the span of time in which that reference is used.
- The lifetime of a value, corresponding to the span of time before that value gets freed (or, put another way, before the destructor for the value runs).
This second span of time, which describes how long a value is valid, is of course very important. We refer to that span of time as the value’s scope. Naturally, lifetimes and scopes are linked to one another. Specifically, if you make a reference to a value, the lifetime of that reference cannot outlive the scope of that value, Otherwise your reference would be pointing into free memory.
To better see the distinction between lifetime and scope, let’s
consider a simple example. In this example, the vector data
is
borrowed (mutably) and the resulting reference is passed to a function
capitalize
. Since capitalize
does not return the reference back,
the lifetime of this borrow will be confined to just that call. The
scope of data, in contrast, is much larger, and corresponds to a
suffix of the fn body, stretching from the let
until the end of the
enclosing scope.
fn foo() {
let mut data = vec!['a', 'b', 'c']; // --+ 'scope
capitalize(&mut data[..]); // |
// ^~~~~~~~~~~~~~~~~~~~~~~~~ 'lifetime // |
data.push('d'); // |
data.push('e'); // |
data.push('f'); // |
} // <---------------------------------------+
fn capitalize(data: &mut [char]) {
// do something
}
This example also demonstrates something else. Lifetimes in Rust today are quite a bit more flexible than scopes (if not as flexible as we might like, hence this RFC):
- A scope generally corresponds to some block (or, more specifically,
a suffix of a block that stretches from the
let
until the end of the enclosing block) [1]. - A lifetime, in contrast, can also span an individual expression, as
this example demonstrates. The lifetime of the borrow in the example
is confined to just the call to
capitalize
, and doesn’t extend into the rest of the block. This is why the calls todata.push
that come below are legal.
So long as a reference is only used within one statement, today’s lifetimes are typically adequate. Problems arise however when you have a reference that spans multiple statements. In that case, the compiler requires the lifetime to be the innermost expression (which is often a block) that encloses both statements, and that is typically much bigger than is really necessary or desired. Let’s look at some example problem cases. Later on, we’ll see how non-lexical lifetimes fixes these cases.
Problem case #1: references assigned into a variable
One common problem case is when a reference is assigned into a
variable. Consider this trivial variation of the previous example,
where the &mut data[..]
slice is not passed directly to
capitalize
, but is instead stored into a local variable:
fn bar() {
let mut data = vec!['a', 'b', 'c'];
let slice = &mut data[..]; // <-+ 'lifetime
capitalize(slice); // |
data.push('d'); // ERROR! // |
data.push('e'); // ERROR! // |
data.push('f'); // ERROR! // |
} // <------------------------------+
The way that the compiler currently works, assigning a reference into
a variable means that its lifetime must be as large as the entire
scope of that variable. In this case, that means the lifetime is now
extended all the way until the end of the block. This in turn means
that the calls to data.push
are now in error, because they occur
during the lifetime of slice
. It’s logical, but it’s annoying.
In this particular case, you could resolve the problem by putting
slice
into its own block:
fn bar() {
let mut data = vec!['a', 'b', 'c'];
{
let slice = &mut data[..]; // <-+ 'lifetime
capitalize(slice); // |
} // <------------------------------+
data.push('d'); // OK
data.push('e'); // OK
data.push('f'); // OK
}
Since we introduced a new block, the scope of slice
is now smaller,
and hence the resulting lifetime is smaller. Of course, introducing a
block like this is kind of artificial and also not an entirely obvious
solution.
Problem case #2: conditional control flow
Another common problem case is when references are used in only match
arm. This most commonly arises around maps. Consider this function,
which, given some key
, processes the value found in map[key]
if it
exists, or else inserts a default value:
fn process_or_default<K,V:Default>(map: &mut HashMap<K,V>,
key: K) {
match map.get_mut(&key) { // -------------+ 'lifetime
Some(value) => process(value), // |
None => { // |
map.insert(key, V::default()); // |
// ^~~~~~ ERROR. // |
} // |
} // <------------------------------------+
}
This code will not compile today. The reason is that the map
is
borrowed as part of the call to get_mut
, and that borrow must
encompass not only the call to get_mut
, but also the Some
branch
of the match. The innermost expression that encloses both of these
expressions is the match itself (as depicted above), and hence the
borrow is considered to extend until the end of the
match. Unfortunately, the match encloses not only the Some
branch,
but also the None
branch, and hence when we go to insert into the
map in the None
branch, we get an error that the map
is still
borrowed.
This particular example is relatively easy to workaround. One can
(frequently) move the code for None
out from the match
like so:
fn process_or_default1<K,V:Default>(map: &mut HashMap<K,V>,
key: K) {
match map.get_mut(&key) { // -------------+ 'lifetime
Some(value) => { // |
process(value); // |
return; // |
} // |
None => { // |
} // |
} // <------------------------------------+
map.insert(key, V::default());
}
When the code is adjusted this way, the call to map.insert
is not
part of the match, and hence it is not part of the borrow. While this
works, it is of course unfortunate to require these sorts of
manipulations, just as it was when we introduced an artificial block
in the previous example.
Problem case #3: conditional control flow across functions
While we were able to work around problem case #2 in a relatively
simple, if irritating, fashion. there are other variations of
conditional control flow that cannot be so easily resolved. This is
particularly true when you are returning a reference out of a
function. Consider the following function, which returns the value for
a key if it exists, and inserts a new value otherwise (for the
purposes of this section, assume that the entry
API for maps does
not exist):
fn get_default<'m,K,V:Default>(map: &'m mut HashMap<K,V>,
key: K)
-> &'m mut V {
match map.get_mut(&key) { // -------------+ 'm
Some(value) => value, // |
None => { // |
map.insert(key, V::default()); // |
// ^~~~~~ ERROR // |
map.get_mut(&key).unwrap() // |
} // |
} // |
} // v
At first glance, this code appears quite similar the code we saw
before. And indeed, just as before, it will not compile. But in fact
the lifetimes at play are quite different. The reason is that, in the
Some
branch, the value is being returned out to the caller.
Since value
is a reference into the map, this implies that the map
will remain borrowed until some point in the caller (the point
'm
, to be exact). To get a better intuition for what this lifetime
parameter 'm
represents, consider some hypothetical caller of
get_default
: the lifetime 'm
then represents the span of code in
which that caller will use the resulting reference:
fn caller() {
let mut map = HashMap::new();
...
{
let v = get_default(&mut map, key); // -+ 'm
// +-- get_default() -----------+ // |
// | match map.get_mut(&key) { | // |
// | Some(value) => value, | // |
// | None => { | // |
// | .. | // |
// | } | // |
// +----------------------------+ // |
process(v); // |
} // <--------------------------------------+
...
}
If we attempt the same workaround for this case that we tried in the previous example, we will find that it does not work:
fn get_default1<'m,K,V:Default>(map: &'m mut HashMap<K,V>,
key: K)
-> &'m mut V {
match map.get_mut(&key) { // -------------+ 'm
Some(value) => return value, // |
None => { } // |
} // |
map.insert(key, V::default()); // |
// ^~~~~~ ERROR (still) |
map.get_mut(&key).unwrap() // |
} // v
Whereas before the lifetime of value
was confined to the match, this
new lifetime extends out into the caller, and therefore the borrow
does not end just because we exited the match. Hence it is still in
scope when we attempt to call insert
after the match.
The workaround for this problem is a bit more involved. It relies on the fact that the borrow checker uses the precise control-flow of the function to determine what borrows are in scope.
fn get_default2<'m,K,V:Default>(map: &'m mut HashMap<K,V>,
key: K)
-> &'m mut V {
if map.contains(&key) {
// ^~~~~~~~~~~~~~~~~~ 'n
return match map.get_mut(&key) { // + 'm
Some(value) => value, // |
None => unreachable!() // |
}; // v
}
// At this point, `map.get_mut` was never
// called! (As opposed to having been called,
// but its result no longer being in use.)
map.insert(key, V::default()); // OK now.
map.get_mut(&key).unwrap()
}
What has changed here is that we moved the call to map.get_mut
inside of an if
, and we have set things up so that the if body
unconditionally returns. What this means is that a borrow begins at
the point of get_mut
, and that borrow lasts until the point 'm
in
the caller, but the borrow checker can see that this borrow will not
have even started outside of the if
. So it does not consider the
borrow in scope at the point where we call map.insert
.
This workaround is more troublesome than the others, because the resulting code is actually less efficient at runtime, since it must do multiple lookups.
It’s worth noting that Rust’s hashmaps include an entry
API that
one could use to implement this function today. The resulting code is
both nicer to read and more efficient even than the original version,
since it avoids extra lookups on the “not present” path as well:
fn get_default3<'m,K,V:Default>(map: &'m mut HashMap<K,V>,
key: K)
-> &'m mut V {
map.entry(key)
.or_insert_with(|| V::default())
}
Regardless, the problem exists for other data structures besides
HashMap
, so it would be nice if the original code passed the borrow
checker, even if in practice using the entry
API would be
preferable. (Interestingly, the limitation of the borrow checker here
was one of the motivations for developing the entry
API in the first
place!)
Conclusion
This post looked at various examples of Rust code that do not compile today, and showed how they can be fixed using today’s system. While it’s good that workarounds exist, it’d be better if the code just compiled as is. In an upcoming post, I will outline my plan for how to modify the compiler to achieve just that.
Endnotes
1. Scopes always correspond to blocks with one exception: the scope of a temporary value is sometimes the enclosing statement.