Baby Steps

A blog about programming, and tiny ways to improve it.

An Experimental New Type Inference Scheme for Rust

While on vacation, I’ve been working on an alternate type inference scheme for rustc. (Actually, I got it 99% working on the plane, and have been slowly poking at it ever since.) This scheme simplifies the code of the type inferencer dramatically and (I think) helps to meet our intutions (as I will explain). It is however somewhat less flexible than the existing inference scheme, though all of rustc and all the libraries compile without any changes. The scheme will (I believe) make it much simpler to implement to proper one-way matching for traits (explained later).

Note: Changing the type inference scheme doesn’t really mean much to end users. Roughly the same set of Rust code still compiles. So this post is really mostly of interest to rustc implementors.

The new scheme in a nutshell

The new scheme is fairly simple. It is based on the observation that most subtyping in Rust arises from lifetimes (though the scheme is extensible to other possible kinds of subtyping, e.g. virtual structs). It abandons unification and the H-M infrastructure and takes a different approach: when a type variable V is first related to some type T, we don’t set the value of V to T directly. Instead, we say that V is equal to some type U where U is derived by replacing all lifetimes in T with lifetime variables. We then relate T and U appropriately.

Let me give an example. Here are two variables whose type must be inferred:

'a: { // 'a --> name of block's lifetime
    let x = 3;
    let y = &x;
    ...
}

Let’s say that the type of x is $X and the type of y is $Y, where $X and $Y are both inference variables. In that case, the first assignment generates the constraint that int <: $X and the second generates the constraint that &'a $X <: $Y. To resolve the first constraint, we would set $X directly to int. This is because there are no lifetimes in the type int. To resolve the second constraint, we would set $Y to &'0 int – here '0 represents a fresh lifetime variable. We would then say that &'a int <: &'0 int, which in turn implies that '0 <= 'a. After lifetime inference is complete, the types of x and y would be int and &'a int as expected.

Without unification, you might wonder what happens when two type variables are related that have not yet been associated with any concrete type. This is actually somewhat challenging to engineer, but it certainly does happen. For example, there might be some code like:

let mut x;        // type: $X
let mut y = None; // type: Option<$0>

loop {
    if y.is_some() {
        x = y.unwrap();
        ...
    }
    ...
}

Here, at the point where we process x = y.unwrap(), we do not yet know the values of either $X or $0. We can say that the type of y.unwrap() will be $0 but we must now process the constrint that $0 <: $X. We do this by simply keeping a list of outstanding constraints. So neither $0 nor $X would (yet) be assigned a specific type, but we’d remember that they were related. Then, later, when either $0 or $X is set to some specific type T, we can go ahead and instantiate the other with U, where U is again derived from T by replacing all lifetimes with lifetime variables. Then we can relate T and U appropriately.

If we wanted to extend the scheme to handle more kinds of inference beyond lifetimes, it can be done by adding new kinds of inference variables. For example, if we wanted to support subtyping between structs, we might add struct variables.

Implied Bounds

I am on vacation for a few weeks. I wanted to take some time to jot down an idea that’s been bouncing around in my head. I plan to submit an RFC at some point on this topic, but not yet, so I thought I’d start out by writing a blog post. Also, my poor blog has been neglected for some time. Consider this a draft RFC. Some important details about references are omitted and will come in a follow-up blog post.

The high-level summary of the idea is that we will take advantage of bounds declared in type declarations to avoid repetition in fn and impl declarations.