Non-lexical lifetimes based on liveness

4 May 2016

In my previous post I outlined several cases that we would like to improve with Rust’s current borrow checker. This post discusses one possible scheme for addressing those cases. The heart of the post is two key ideas:

  1. Define a lifetime as a set of points in the control-flow graph, where a point here refers to some particular statement in the control-flow graph (i.e., not a basic block, but some statement within a basic block).
  2. Use liveness as the basis for deciding where a variable’s type must be valid.

The rest of this post expounds on these two ideas and shows how they affect the various examples from the previous post.
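
To make the first idea a bit more concrete, here is a rough sketch of how a lifetime-as-a-set-of-points might be represented. The names Point and Region and their fields are hypothetical, invented purely for illustration; they are not the compiler’s actual data structures:

use std::collections::BTreeSet;

/// A single statement in the control-flow graph: the `statement`-th
/// statement of the `block`-th basic block. (Hypothetical names.)
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Point {
    block: usize,
    statement: usize,
}

/// Under this proposal, a lifetime is just a set of such points; there is
/// no requirement that they form a contiguous lexical scope.
struct Region {
    points: BTreeSet<Point>,
}

impl Region {
    fn new() -> Region {
        Region { points: BTreeSet::new() }
    }

    /// Grow the lifetime to include a point (returns true if it was new).
    fn add_point(&mut self, point: Point) -> bool {
        self.points.insert(point)
    }

    /// Does the lifetime include this point?
    fn contains(&self, point: Point) -> bool {
        self.points.contains(&point)
    }
}

With a representation like this, there is nothing special about a lifetime that ends in the middle of a block, which is exactly what problem case #1 below needs.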

Problem case #1: references assigned into a variable

To see better what these two ideas mean – and why we need both of them – let’s look at the initial example from my previous post. Here we are storing the reference created by &mut data[..] into the variable slice:

fn bar() {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..]; // <-+ lifetime today
    capitalize(slice);         //   |
    data.push('d'); // ERROR!  //   |
    data.push('e'); // ERROR!  //   |
    data.push('f'); // ERROR!  //   |
} // <------------------------------+

As shown, the lifetime of this reference today winds up being the subset of the block that starts at the let and stretches until the ending }. This results in compilation errors when we attempt to push to data. The reason is that a borrow like &mut data[..] effectively “locks” data for the lifetime of the borrow, meaning that data becomes off limits and can’t be used (this “locking” is just a metaphor for the type system rules; there is of course nothing happening at runtime).

What we would like is to observe that slice is dead – which is compiler-speak for “it won’t ever be used again” – after the call to capitalize. Therefore, if we had a more flexible lifetime system, we might compute the lifetime of the slice reference as something that ends right after the call to capitalize, like so:

fn bar() {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..]; // <-+ lifetime under this proposal
    capitalize(slice);         //   |
    // <----------------------------+
    data.push('d'); // OK
    data.push('e'); // OK
    data.push('f'); // OK
}

If we had this shorter lifetime, then the calls to data.push would be legal, since the “lock” is effectively released early.

At first it might seem like all we have to do to achieve this result is to adjust the definition of what a lifetime can be to make it more flexible. In particular, today, once a lifetime must extend beyond the boundaries of a single statement (e.g., beyond the let statement here), it must extend all the way till the end of the enclosing block. So, by adopting a definition of lifetimes that is just “a set of points in the control-flow graph”, we lift this constraint, and we can now express the idea of a lifetime that starts at the &mut data[..] borrow and ends after the call to capitalize, which we couldn’t even express before.

But it turns out that is not quite enough. There is another rule in the type system today that causes us a problem. This rule states that the type of a variable must outlive the variable’s scope. In other words, if a variable contains a reference, that reference must be valid for the entire scope of the variable. So, in our example above, the reference created by the &mut data[..] borrow winds up being stored in the variable slice. This means that the lifetime of that reference must include the scope of slice – which stretches from the let until the closing }. In other words, even if we adopt more flexible lifetimes, if we change nothing else, we wind up with the same lifetime as before.

You might think we could just remove the rule altogether, and say that the lifetime of a reference must include all the points where the reference is used, with no special treatment for references stored into variables. In the particular example we’ve been looking at, that would do the right thing: the lifetime of slice would only have to include the call to capitalize. But things start to go wrong if the control flow gets more complicated:

fn baz() {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..]; // <-+ lifetime if we ignored
    loop {                     //   | variables altogether
        capitalize(slice);     //   |
        // <------------------------+
        data.push('d'); // Should be error, but would not be.
    }
    data.push('e'); // OK
    data.push('f'); // OK
}

Here again the reference slice would still only be required to live until just after the call to capitalize, since that is the only place it is used. However, in this variation, that is not the correct behavior: the reference slice is in fact still live after the call to capitalize, since it will be used again in the next iteration of the loop. The problem here is that we are exiting the lifetime (just after the call to capitalize) and then re-entering it (on the loop backedge) without reinitializing slice.

One way to address this problem would be to modify the definition of a lifetime. The definition I gave earlier was very flexible and allowed any set of points in the control-flow graph to be included. Perhaps we want some special rules to ensure that control flow is continuous? This is the approach that RFC 396 took, for example. I initially explored this approach but found that it caused problems with more advanced cases, such as a variation on problem case 3 that we will examine in a later post.

(EDITED: The paragraph above incorrectly suggested that RFC 396 had special rules around backedges. Edited to clarify.)

Instead, I have opted to weaken – but not entirely remove – the original rule. The original rule was something like this (expressed as an inference rule):

scope(x) = 's
T: 's
------------------
let x: T OK

In other words, it’s ok to declare a variable x with type T, as long as T outlives the scope 's of that variable. My new version is more like this:

live-range(x) = 's
T: 's
------------------
let x: T OK

Here I have substituted live-range for scope. By live-range I mean, effectively, “the set of points in the CFG where x may later be used”. If we apply this to our two variations, we will see that, in the first example, the variable slice is dead after the call to capitalize: it will never be used again. But in the second variation, the one with a loop, slice is live, because it may be used in the next iteration. This accounts for the different behavior:

// Variation #1: `slice` is dead after call to capitalize,
// so the lifetime ends
fn bar() {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..]; // <-+ lifetime under this proposal
    capitalize(slice);         //   |
    // <----------------------------+
    data.push('d'); // OK
    data.push('e'); // OK
    data.push('f'); // OK
}

// Variation #2: `slice` is live after call to capitalize,
// so the lifetime encloses the entire loop.
fn baz() {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..]; // <---------------------------+
    loop {                                               //   |
        capitalize(slice);                               //   |
        data.push('d'); // ERROR!                        //   |
    }                                                    //   |
    // <------------------------------------------------------+
    
    // But note that `slice` is dead here, so the lifetime ends:
    data.push('e'); // OK
    data.push('f'); // OK
}
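
For concreteness, the live-range here is just the result of the textbook backward liveness analysis: a use of a variable “gens” it, an assignment to it “kills” it, and liveness propagates backwards through the CFG, including around loop backedges (which is precisely what keeps slice live inside the loop in variation #2). The sketch below is my own toy illustration of that computation; Stmt, Block, and live_on_entry are invented names, not the compiler’s code:

use std::collections::HashSet;

type Var = &'static str;

/// One statement: the variables it uses ("gen") and the variable it
/// overwrites ("kill"), if any. (Toy representation.)
struct Stmt {
    uses: Vec<Var>,
    defines: Option<Var>,
}

/// A basic block: its statements plus the indices of its successor blocks.
struct Block {
    stmts: Vec<Stmt>,
    successors: Vec<usize>,
}

/// Compute, for each block, the set of variables live on entry, by the
/// classic backward fixpoint iteration.
fn live_on_entry(blocks: &[Block]) -> Vec<HashSet<Var>> {
    let mut live_in: Vec<HashSet<Var>> = vec![HashSet::new(); blocks.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for (i, block) in blocks.iter().enumerate() {
            // live-out = union of live-in of the successors; this is where
            // the loop backedge feeds liveness back into the loop body.
            let mut live: HashSet<Var> = HashSet::new();
            for &s in &block.successors {
                live.extend(live_in[s].iter().cloned());
            }
            // Walk the statements in reverse, applying kill then gen.
            for stmt in block.stmts.iter().rev() {
                if let Some(d) = stmt.defines {
                    live.remove(d);
                }
                for &u in &stmt.uses {
                    live.insert(u);
                }
            }
            if live != live_in[i] {
                live_in[i] = live;
                changed = true;
            }
        }
    }
    live_in
}

Encoding variation #2 with the loop body as a block whose successors include itself makes slice live on entry to that block, which is exactly why its lifetime must enclose the entire loop, as shown above.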

Refining the proposal using fragments

One problem with the analysis as I presented it thus far is that it is based on liveness of individual variables. This implies that we lose precision when references are moved into structs or tuples. So, for example, while this bit of code will type-check:

let mut data1 = vec![];
let mut data2 = vec![];
let x = &mut data1[..]; // <--+ data1 is "locked" here
let y = &mut data2[..]; // <----+ data2 is "locked" here
use(x);                 //    | |
// <--------------------------+ |
data1.push(1);          //      |
use(y);                 //      |
// <----------------------------+
data2.push(1);

It would cause errors if we moved those two references into a tuple:

let mut data1 = vec![];
let mut data2 = vec![];
let tuple = (&mut data1[..], &mut data2[..]); // <--+ data1 and data2
use(tuple.0);                                 //    | are locked here
data1.push(1);                                //    |
use(tuple.1);                                 //    |
// <------------------------------------------------+
data2.push(1);

This is because the variable tuple is live until after the last field access. However, the dynamic drop analysis already computes a set of fragments, which are basically the minimal paths it needs to track in order to retain full resolution about which subparts of a struct or tuple have been moved. We could probably use similar logic to determine that we ought to compute the liveness of tuple.0 and tuple.1 independently, which would make this example type-check. (If we did so, then any use of tuple would be considered a “gen” of both tuple.0 and tuple.1, and any write to tuple would be considered a “kill” of both.) This would probably subsume and be compatible with the fragment logic used for dynamic drop, so it could be a net simplification.
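
To illustrate what tracking liveness per fragment rather than per variable might look like, here is a small hypothetical sketch (Fragment, is_prefix, and touched are invented names; this is not the actual dynamic drop fragment code):

/// A "fragment": either a whole local variable or a projection out of one
/// (a field of a struct or tuple).
#[derive(Clone, PartialEq, Eq)]
enum Fragment {
    Local(String),               // e.g. `tuple`
    Field(Box<Fragment>, usize), // e.g. `tuple.0`
}

/// Is `a` a prefix of `b`? (`tuple` is a prefix of `tuple.0`.)
fn is_prefix(a: &Fragment, b: &Fragment) -> bool {
    if a == b {
        return true;
    }
    match *b {
        Fragment::Local(_) => false,
        Fragment::Field(ref base, _) => is_prefix(a, base),
    }
}

/// The tracked fragments that a use (or write) of `place` touches: a use
/// gens these fragments, and a write kills them.
fn touched<'a>(place: &Fragment, tracked: &'a [Fragment]) -> Vec<&'a Fragment> {
    tracked
        .iter()
        .filter(|&f| is_prefix(place, f) || is_prefix(f, place))
        .collect()
}

With tracked fragments tuple.0 and tuple.1, the use of tuple.0 above touches only tuple.0, so the “lock” on data1 can end before data1.push(1), while a use or overwrite of tuple as a whole would still touch (gen or kill) both fragments.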

Destructors

One further wrinkle that I did not discuss is that any struct with a destructor is subject to special rules. This is because the destructor may access the references in the struct. These rules were specified in RFC 1238 and are colloquially known as “dropck”. They basically state that when we create some variable x whose type T has a destructor, then T must outlive the parent scope of x. That is, the references in x don’t just have to be valid for the scope of x; they have to be valid for longer than the scope of x.

In some sense, the dropck rules remain unchanged by all I’ve discussed here. But in another sense, dropck may stop being a special case. The reason is that, in MIR, all drops are made explicit in the control-flow graph, and hence if a variable x has a destructor, that drop should show up as “just another use” of x, and thus cause the lifetime of any references within to be naturally extended to cover that destructor. I admit I haven’t had time to dig into a lot of examples here: destructors have historically been a very subtle case.
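
As a hypothetical illustration of that last point (my own example, not one drawn from RFC 1238): a struct whose destructor can reach the borrowed data forces the borrow to last until the implicit drop, because that drop appears in MIR as a use:

struct Wrapper<'a> {
    slice: &'a mut [char],
}

impl<'a> Drop for Wrapper<'a> {
    fn drop(&mut self) {
        // The destructor is allowed to touch the borrowed data...
        self.slice.reverse();
    }
}

fn quux() {
    let mut data = vec!['a', 'b', 'c'];
    let w = Wrapper { slice: &mut data[..] };
    // `w` is never mentioned again in the source, but MIR contains an
    // implicit drop(w) at the end of the block. Under this proposal that
    // drop is "just another use" of `w`, so the borrow of `data` stays
    // live until then:
    // data.push('d'); // would still be an error
} // <- implicit drop(w) here; the borrow of `data` ends here too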

Implementation ramifications

Those of you familiar with the compiler will realize that there is a bit of a chicken-and-egg problem with what I have presented here. Today, the compiler computes the lifetimes of all references in the typeck pass, which is basically the main type-checking pass that computes the types of all expressions. We then use the output of this pass to construct MIR. But in this proposal I am defining lifetimes as a set of points in the MIR control-flow-graph. What gives?

To make this work, we have to change how the compiler works internally. The rough idea is that the typeck pass will no longer concern itself with regions: it will erase all regions, just as trans does. This has a number of ancillary benefits, though it also carries a few complications we have to resolve (maybe a good topic for another blog post!). We’ll then build MIR from this, and hence the initially constructed MIR will also have no lifetime information (just erased lifetimes).

Then, looking at each function in the program in turn, we’ll do a safety analysis. We’ll start by computing lifetimes – at this point, we have the MIR CFG in hand, so we can easily base them on the CFG. We’ll then run the borrowck. When we are done, we can just forget about the lifetimes entirely, since all later passes are just doing optimization and code generation, and they don’t care about lifetimes.
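
In rough pseudocode, the per-function portion of that pipeline might look something like the sketch below. None of these types or functions exist; they are only meant to show where lifetimes would be computed and where they would be thrown away:

// Purely illustrative stubs.
struct Mir;          // the lowered CFG for one function (lifetimes erased)
struct LifetimeMap;  // a set-of-points lifetime for each region variable

fn infer_lifetimes(_mir: &Mir) -> LifetimeMap {
    // ...solve lifetime constraints against the MIR CFG...
    LifetimeMap
}

fn borrow_check(_mir: &Mir, _lifetimes: &LifetimeMap) {
    // ...report borrow errors using the computed lifetimes...
}

fn analyze_fn(mir: &Mir) {
    let lifetimes = infer_lifetimes(mir); // computed here, from the CFG
    borrow_check(mir, &lifetimes);        // consumed here
    // `lifetimes` is discarded at the end of this function; optimization
    // and code generation never look at lifetime information.
}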

Another interesting question is how to represent lifetimes in the compiler. The most obvious representation is just to use a bit-set, but since these lifetimes would require one bit for every statement within a function, they could grow quite big. There are a number of ways we could optimize the representation: for example, we could track the mutual dominator, even promoting it “upwards” to the innermost enclosing loop, and only store bits for that subportion of the graph. This would require fewer bits but it’d be a lot more accounting. I’m sure there are other far more clever options as well. The first step I think would be to gather some statistics about the size of functions, the number of inference variables per fn, and so forth.
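
For reference, the naive bit-set representation mentioned above might look roughly like this (BitRegion is an invented name; as noted, the real representation is an open question):

/// One bit per statement in the current function.
struct BitRegion {
    words: Vec<u64>,
}

impl BitRegion {
    /// `num_statements` is the total number of statements in the fn's MIR.
    fn new(num_statements: usize) -> BitRegion {
        BitRegion { words: vec![0; (num_statements + 63) / 64] }
    }

    /// Add the statement with the given index to the lifetime.
    fn insert(&mut self, index: usize) {
        self.words[index / 64] |= 1u64 << (index % 64);
    }

    /// Is the statement with the given index part of the lifetime?
    fn contains(&self, index: usize) -> bool {
        (self.words[index / 64] & (1u64 << (index % 64))) != 0
    }
}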

In any case, a key observation is that, since we only need to store lifetimes for one function at a time, and only until the end of borrowck, the precise size is not nearly as important as it would be today.

Conclusion

Here I presented the key ideas of my current thoughts around non-lexical lifetimes: using flexible lifetimes coupled with liveness. I motivated this by examining problem case #1 from my introduction. I also covered some of the implementation complications. In future posts, I plan to examine problem cases #2 and #3 – and in particular to describe how to extend the system to cover named lifetime parameters, which I’ve completely ignored here. (Spoiler alert: problem cases #2 and #3 are also no longer problems under this system.)

I also do want to emphasize that this plan is a “work-in-progress”. Part of my hope in posting it is that people will point out flaws or opportunities for improvement. So I wouldn’t be surprised if the final system winds up looking quite different.

(As is my wont lately, I am disabling comments on this post. If you’d like to discuss the ideas in here, please do so in this internals thread instead.)