baby steps

The `Overwrite` trait and `Pin`

2024-10-14T00:00:00+00:00

In July, boats presented a compelling vision in their post pinned places. With the Overwrite trait that I introduced in my previous post, however, I think we can get somewhere even more compelling, albeit at the cost of a tricky transition. As I will argue in this post, the Overwrite trait effectively becomes a better version of the existing Unpin trait, one that affects not only pinned references but also regular &mut references. Through this it’s able to make Pin fit much more seamlessly with the rest of Rust.

Just show me the dang code

Before I dive into the details, let’s start by reviewing a few examples to show you what we are aiming at (you can also skip to the TL;DR, in the FAQ).

I’m assuming a few changes here:

Adding an Overwrite trait and changing most types to be !Overwrite by default.
- The Option (and maybe others) would opt-in to Overwrite, permitting x.take().
Integrating pin into the borrow checker, extending auto-ref to also “auto-pin” and produce a Pin<&mut T>. The borrow checker only permits you to pin values that you own. Once a place has been pinned, you are not permitted to move out from it anymore (unless the value is overwritten).

The first change is “mildly” backwards incompatible. I’m not going to worry about that in this post, but I’ll cover the ways I think we can make the transition in a follow up post.

Example 1: Converting a generator into an iterator

We would really like to add a generator syntax that lets you write an iterator more conveniently.¹ For example, we should be able to define a generator that computes some input strings and then yields the hash of each one, like so:

fn do_computation() -> usize {
    let hashes = gen {
        let strings: Vec<String> = compute_input_strings();
        for string in &strings {
            yield compute_hash(&string);
        }
    };
    
    // ...
}

But there is a catch here! To permit the borrow of strings, which is owned by the generator, the generator will have to be pinned.² That means that generators cannot directly implement Iterator, because generators need a Pin<&mut Self> signature for their next methods. It is possible, however, to implement Iterator for Pin<&mut G> where G is a generator.³

In today’s Rust, that means that using a generator as an iterator would require explicit pinning:

fn do_computation() -> usize {
    let hashes = gen {....};
    let hashes = pin!(hashes); // <-- explicit pin
    if let Some(h) = hashes.next() {
        // process first hash
    };
    // ...
}
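The same shape can be reproduced on stable Rust with a hand-written stand-in. This is only a sketch: the Generator trait, the Lengths type, and the GenIter adapter are hypothetical names invented for illustration (gen blocks are not stable), and GenIter exists only because code outside the standard library cannot write impl Iterator for Pin<&mut G> directly. A real generator would be !Unpin; this toy one is Unpin, which is why get_mut is allowed.

```rust
use std::pin::{pin, Pin};

/// Hypothetical stand-in for a generator: `resume` needs a pinned receiver.
trait Generator {
    type Item;
    fn resume(self: Pin<&mut Self>) -> Option<Self::Item>;
}

/// Toy generator that yields the lengths of the strings it owns.
struct Lengths {
    strings: Vec<String>,
    next: usize,
}

impl Generator for Lengths {
    type Item = usize;

    fn resume(self: Pin<&mut Self>) -> Option<usize> {
        // `Lengths` is `Unpin`, so `get_mut` is a safe call here.
        let this = self.get_mut();
        let len = this.strings.get(this.next)?.len();
        this.next += 1;
        Some(len)
    }
}

/// Local adapter playing the role of `impl Iterator for Pin<&mut G>`.
struct GenIter<'a, G>(Pin<&'a mut G>);

impl<G: Generator> Iterator for GenIter<'_, G> {
    type Item = G::Item;

    fn next(&mut self) -> Option<G::Item> {
        // Reborrow the pinned reference for each call.
        self.0.as_mut().resume()
    }
}

fn collect_lengths(strings: Vec<String>) -> Vec<usize> {
    let generator = Lengths { strings, next: 0 };
    let pinned = pin!(generator); // <-- the explicit pin
    GenIter(pinned).collect()
}
```

Calling collect_lengths with a vector of strings returns their lengths in order; the point is that the caller cannot iterate at all without the pin! line.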

With pinned places, this feels more builtin, but it still requires users to actively think about pinning for even the most basic use case:

fn do_computation() -> usize {
    let hashes = gen {....};
    let pinned mut hashes = hashes;
    if let Some(h) = hashes.next() {
        // process first hash
    };
    // ...
}

Under this proposal, users would simply be able to ignore pinning altogether:

fn do_computation() -> usize {
    let mut hashes = gen {....};
    if let Some(h) = hashes.next() {
        // process first hash
    };
    // ...
}

Pinning is still happening: once a user has called next, they would not be able to move hashes after that point. If they tried to do so, the borrow checker (which now understands pinning natively) would give an error like:

error[E0596]: cannot borrow `hashes` as mutable, as it is not declared as mutable
 --> src/lib.rs:4:22
  |
4 |     if let Some(h) = hashes.next() {
  |                      ------ value in `hashes` was pinned here
  |     ...
7 |     move_somewhere_else(hashes);
  |                         ^^^^^^ cannot move a pinned value
help: if you want to move `hashes`, consider using `Box::pin` to allocate a pinned box
  |
3 |     let mut hashes = Box::pin(gen { .... });
  |                      +++++++++            +

As noted, it is possible to move hashes after pinning, but only if you pin it into a heap-allocated box. So we can advise users how to do that.
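For reference, here is what that advice looks like in today’s Rust, using a future as a stand-in for the generator (an assumption on my part, since gen blocks are not stable). The NoopWake type and the function names are made up for this sketch. The box can be moved freely because moving it only moves a pointer; the pinned value stays where it is on the heap.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Takes ownership of an already-pinned, heap-allocated future and polls it.
fn poll_elsewhere(mut fut: Pin<Box<dyn Future<Output = u32>>>) -> Poll<u32> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}

fn box_pin_then_move() -> Poll<u32> {
    // The value is pinned the moment it is boxed.
    let fut: Pin<Box<dyn Future<Output = u32>>> = Box::pin(async { 42_u32 });
    // Moving `fut` moves only the box, not the future inside it.
    poll_elsewhere(fut)
}
```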

Example 2: Implementing the `MaybeDone` future

The pinned places post included an example future called MaybeDone. I’m going to implement that same future in the system I describe here. There are some comments in the example comparing it to the version from the pinned places post.

enum MaybeDone<F: Future> {
    //         ---------
    //         I'm assuming we are in Rust.Next, and so the default
    //         bounds for `F` do not include `Overwrite`.
    //         In other words, `F: ?Overwrite` is the default
    //         (just as it is with every other trait besides `Sized`).
    
    Polling(F),
    //      -
    //      We don't need to declare `pinned F`.
    
    Done(Option<F::Output>),
}

impl<F: Future> MaybeDone<F> {
    fn maybe_poll(self: Pin<&mut Self>, cx: &mut Context<'_>) {
        //        --------------------
        //        I'm not bothering with the `&pinned mut self`
        //        sugar here, though certainly we could still
        //        add it.
        if let MaybeDone::Polling(fut) = self {
            //                    ---
            //       Just as in the original example,
            //       we are able to project from `Pin<&mut Self>`
            //       to a `Pin<&mut F>`.
            //
            //       The key is that we can safely project
            //       from an owner of type `Pin<&mut Self>`
            //       to its field of type `Pin<&mut F>`
            //       so long as the owner type `Self: !Overwrite`
            //       (which is the default for structs in Rust.Next).
            if let Poll::Ready(res) = fut.poll(cx) {
                *self = MaybeDone::Done(Some(res));
            }
        }
    }

    fn is_done(&self) -> bool {
        matches!(self, &MaybeDone::Done(_))
    }

    fn take_output(&mut self) -> Option<F::Output> {
        //         ---------
        //   In pinned places, this method had to be
        //   `&pinned mut self`, but under this design,
        //   it can be a regular `&mut self`.
        //   
        //   That's because `Pin<&mut Self>` becomes
        //   a subtype of `&mut Self`.
        if let MaybeDone::Done(res) = self {
            res.take()
        } else {
            None
        }
    }
}

Example 3: Implementing the `Join` combinator

Let’s complete the journey by implementing a Join future:

struct Join<F1: Future, F2: Future> {
    // These fields do not have to be declared `pinned`:
    fut1: MaybeDone<F1>,
    fut2: MaybeDone<F2>,
}

impl<F1, F2> Future for Join<F1, F2>
where
    F1: Future,
    F2: Future,
{
    type Output = (F1::Output, F2::Output);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        //  --------------------
        // Again, I've dropped the sugar here.
        
        // This looks just the same as in the
        // "Pinned Places" example. This again
        // leans on the ability to project
        // from a `Pin<&mut Self>` owner so long as
        // `Self: !Overwrite` (the default for structs
        // in Rust.Next).
        self.fut1.maybe_poll(cx);
        self.fut2.maybe_poll(cx);
        
        if self.fut1.is_done() && self.fut2.is_done() {
            // This code looks the same as it did with pinned places,
            // but there is an important difference. `take_output`
            // is now an `&mut self` method, not a `Pin<&mut Self>`
            // method. This demonstrates that we can also get
            // a regular `&mut` reference to our fields.
            let res1 = self.fut1.take_output().unwrap();
            let res2 = self.fut2.take_output().unwrap();
            Poll::Ready((res1, res2))
        } else {
            Poll::Pending
        }
    }
}

How I think about pin

OK, now that I’ve lured you in with code examples, let me drive you away by diving into the details of Pin. I’m going to cover the way that I think about Pin. It is similar to but different from how Pin is presented in the pinned places post; in particular, I prefer to think about places that pin their values and not pinned places. In any case, Pin is surprisingly subtle, and I recommend that if you want to go deeper, you read boats’ history of Pin post and/or the stdlib documentation for Pin.

The `Pin<P>` type is a modifier on the pointer `P`

The Pin<P> type is unusual in Rust. It looks similar to a “smart pointer” type, like Arc, but it functions differently. Pin<P> is not a pointer, it is a modifier on another pointer, so

a Pin<&T> represents a pinned reference,
a Pin<&mut T> represents a pinned mutable reference,
a Pin<Box<T>> represents a pinned box,

and so forth.

You can think of a Pin<P> type as being a pointer of type P that refers to a place (Rust jargon for a location in memory that stores a value) whose value v has been pinned. A pinned value v can never be moved to another place in memory. Moreover, v must be dropped before its place can be reassigned to another value.

Pinning is part of the “lifecycle” of a place

The way I think about it, every place in memory has a lifecycle:

flowchart TD
Uninitialized 
Initialized
Pinned

Uninitialized --
    p = v where v: T
--> Initialized

Initialized -- 
    move out, drop, or forget
--> Uninitialized

Initialized --
    pin value v in p
    (only possible when T is !Unpin)
--> Pinned

Pinned --
    drop value
--> Uninitialized

Pinned --
    move out or forget
--> UB

Uninitialized --
    free the place
--> Freed

UB[💥 Undefined behavior 💥]

When first allocated, a place p is uninitialized – that is, p has no value at all.

An uninitialized place can be freed. This corresponds to e.g. popping a stack frame or invoking free.

p may at some point become initialized by an assignment like p = v. At that point, there are three ways to transition back to uninitialized:

The value v could be moved somewhere else, e.g. with let p2 = p. At that point, p goes back to being uninitialized.
The value v can be forgotten, with std::mem::forget(p). At this point, no destructor runs, but p goes back to being considered uninitialized.
The value v can be dropped, which occurs when the place p goes out of scope. At this point, the destructor runs, and p goes back to being considered uninitialized.

Alternatively, the value v can be pinned in place:

At this point, v cannot be moved again, and the only way for p to be reused is for v to be dropped.

Once a value is pinned, moving or forgetting the value is not allowed. These actions are “undefined behavior”, and safe Rust must not permit them to occur.
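The lifecycle can also be written down as a small transition function. This is purely a model of the diagram, not anything the compiler contains: the Place, Op, and Outcome names are invented here, and the “only possible when T is !Unpin” side condition on pinning is not modelled.

```rust
/// States of a place, following the lifecycle diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Place {
    Uninitialized,
    Initialized,
    Pinned,
    Freed,
}

/// Operations that label the edges of the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Assign,
    MoveOut,
    Forget,
    Drop,
    Pin,
    Free,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// The place moves to a new state.
    Next(Place),
    /// The edge leads to the UB node.
    UndefinedBehavior,
    /// The diagram has no such edge.
    NotInDiagram,
}

fn step(place: Place, op: Op) -> Outcome {
    use Outcome::*;
    match (place, op) {
        (Place::Uninitialized, Op::Assign) => Next(Place::Initialized),
        (Place::Uninitialized, Op::Free) => Next(Place::Freed),
        (Place::Initialized, Op::MoveOut | Op::Forget | Op::Drop) => {
            Next(Place::Uninitialized)
        }
        (Place::Initialized, Op::Pin) => Next(Place::Pinned),
        // Dropping is the only legal way out of the pinned state.
        (Place::Pinned, Op::Drop) => Next(Place::Uninitialized),
        (Place::Pinned, Op::MoveOut | Op::Forget) => UndefinedBehavior,
        _ => NotInDiagram,
    }
}
```

Reading the match arms top to bottom gives the same edges as the flowchart, with the two Pinned arms capturing the asymmetry that the rest of this post relies on.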

A digression on forgetting vs other ways to leak

As most folks know, Rust does not guarantee that destructors run. If you have a value v whose destructor never runs, we say that value is leaked. There are however two ways to leak a value, and they are quite different in their impact:

Option A: Forgetting. Using std::mem::forget, you can forget the value v. The place p that was storing that value will go from initialized to uninitialized, at which point the place p can be freed.
- Forgetting a value is undefined behavior if that value has been pinned, however!
Option B: Leak the place. When you leak a place, it just stays in the initialized or pinned state forever, so its value is never dropped. This can happen, for example, with a ref-count cycle.
- This is safe even if the value is pinned!
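Both kinds of leak are easy to observe in today’s Rust. In this sketch (the Noisy and Node types are invented for illustration, and nothing here is pinned), a shared counter records how many destructors ran; it stays at zero in both cases, but for different reasons.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Increments a shared counter when its destructor runs.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// A node that can point at itself, forming a ref-count cycle.
struct Node {
    _payload: Noisy,
    next: RefCell<Option<Rc<Node>>>,
}

/// Returns the destructor count after (a) a forget and (b) a cycle leak.
fn leak_both_ways() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));

    // Option A: forget the value. The local's place is uninitialized
    // afterwards and can be freed when the stack frame is popped.
    let a = Noisy(drops.clone());
    std::mem::forget(a);
    let after_forget = drops.get();

    // Option B: leak the place. The heap allocation stays initialized
    // forever because the node keeps itself alive.
    let node = Rc::new(Node {
        _payload: Noisy(drops.clone()),
        next: RefCell::new(None),
    });
    *node.next.borrow_mut() = Some(node.clone());
    drop(node); // strong count goes from 2 to 1, never to 0
    let after_cycle = drops.get();

    (after_forget, after_cycle)
}
```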

In retrospect, I wish that Option A did not exist – I wish that we had not added std::mem::forget. We did so as part of working through the impact of ref-count cycles. It seemed equivalent at the time (“the dtor doesn’t run anyway, why not make it easy to do”) but I think this diagram shows why adding forget made things permanently more complicated for relatively little gain.⁴ Oh well! Can’t win ’em all.

Values of types implementing `Unpin` cannot be pinned

There is one subtle aspect here: not all values can be pinned. If a type T implements Unpin, then values of type T cannot be pinned. When you have a pinned reference to them, they can still squirm out from under you via swap or other techniques. Another way to say the same thing is to say that values can only be pinned if their type is !Unpin (“does not implement Unpin”).

Types that are !Unpin can be called address sensitive, meaning that once they are pinned, there can be pointers to the internals of that value that will be invalidated if the address changes. Types that implement Unpin would therefore be address insensitive. Traditionally, all Rust types have been address insensitive, and therefore Unpin is an auto trait, implemented by most types by default.
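To make the “squirm out” point concrete, here is a small sketch in today’s Rust (swap_pinned and swap_demo are names made up for this example). Because String is Unpin, a Pin<&mut String> hands out an ordinary &mut String through DerefMut, and the two values can be swapped even though both references are nominally pinned.

```rust
use std::pin::{pin, Pin};

/// Swaps the referents of two pinned references. This only compiles for
/// `T: Unpin`, because `DerefMut` on `Pin<&mut T>` requires `T: Unpin`.
fn swap_pinned<T: Unpin>(mut a: Pin<&mut T>, mut b: Pin<&mut T>) {
    std::mem::swap(&mut *a, &mut *b);
}

fn swap_demo() -> (String, String) {
    let mut x = pin!(String::from("left"));
    let mut y = pin!(String::from("right"));
    // Both values move to a new address despite the `Pin` wrappers.
    swap_pinned(x.as_mut(), y.as_mut());
    ((*x).clone(), (*y).clone())
}
```

For a type containing std::marker::PhantomPinned, the T: Unpin bound would not hold and the call to swap_pinned would be rejected at compile time.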

`Pin<&mut T>` is really a “maybe pinned” reference

Looking at the state machine as I describe it here, we can see that a Pin<&mut T> isn’t really a pinned mutable reference, in the sense that it doesn’t always refer to a place that is pinning its value. If T: Unpin, then it’s just a regular reference. But if T: !Unpin, then a pinned reference guarantees that the value it refers to is pinned in place.

This fits with the name Unpin, which I believe was meant to convey the idea that, even if you have a pinned reference to a value of type T: Unpin, that value can become unpinned. I’ve heard the metaphor of “if T: Unpin, you can lift out the pin, swap in a different value, and put the pin back”.

Pin picked a peck of pickled pain

Everyone agrees that Pin is confusing and a pain to use. But what makes it such a pain?

If you are attempting to author a Pin-based API, there are two primary problems:

Pin<&mut Self> methods can’t make use of regular &mut self methods.
Pin<&mut Self> methods can’t access fields by default. Crates like pin-project-lite make this easier but still require learning obscure concepts like structural pinning.

If you are attempting to consume a Pin-based API, the primary annoyance is that getting a pinned reference is hard. You can’t just call Pin<&mut Self> methods normally, you have to remember to use Box::pin or pin! first. (We saw this in Example 1 from this post.)

My proposal in a nutshell

This post is focused on a proposal with two parts:

Making Pin-based APIs easier to author by replacing the Unpin trait with Overwrite.
Making Pin-based APIs easier to call by integrating pinning into the borrow checker.

I’m going to walk through those in turn.

Making `Pin`-based APIs easier to author

`Overwrite` as the better `Unpin`

The first part of my proposal is a change I call s/Unpin/Overwrite/. The idea is to introduce Overwrite and then change the “place lifecycle” to reference Overwrite instead of Unpin:

flowchart TD
Uninitialized 
Initialized
Pinned

Uninitialized --
    p = v where v: T
--> Initialized

Initialized -- 
    move out, drop, or forget
--> Uninitialized

Initialized --
    pin value v in p
    (only possible when
T is 👉!Overwrite👈)
--> Pinned

Pinned --
    drop value
--> Uninitialized

Pinned --
    move out or forget
--> UB

Uninitialized --
    free the place
--> Freed

UB[💥 Undefined behavior 💥]

For s/Unpin/Overwrite/ to work well, we have to make all !Unpin types also be !Overwrite. This is not, strictly speaking, backwards compatible, since today !Unpin types (like all types) can be overwritten and swapped. I think eventually we want every type to be !Overwrite by default, but I don’t think we can change that default in a general way without an edition. But for !Unpin types in particular I suspect we can get away with it, because !Unpin types are pretty rare, and the simplification we get from doing so is pretty large. (And, as I argued in the previous post, there is no loss of expressiveness; code today that overwrites or swaps !Unpin values can be locally rewritten.)

Why swaps are bad without `s/Unpin/Overwrite/`

Today, Pin<&mut T> cannot be converted into an &mut T reference unless T: Unpin.⁵ This is because it would allow safe Rust code to create Undefined Behavior by swapping the referent of the &mut T reference and hence moving the pinned value. By requiring that T: Unpin, the DerefMut impl is effectively limiting itself to references that are not, in fact, in the “pinned” state, but just in the “initialized” state.

As a result, `Pin<&mut T>` and `&mut T` methods don’t interoperate today

This leads directly to our first two pain points. To start, from a Pin<&mut Self> method, you can only invoke &self methods (via the Deref impl) or other Pin<&mut Self> methods. This schism separates out the “regular” methods of a type from its pinned methods; it also means that methods doing field assignments don’t compile:

fn increment_field(self: Pin<&mut Self>) {
    self.field = self.field + 1;
}

This errors because compiling a field assignment requires a DerefMut impl and Pin<&mut Self> doesn’t have one.
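For comparison, this is roughly what you have to write today to get that method to compile for a !Unpin type. The Counter type is a made-up example; the PhantomPinned marker is what opts it out of Unpin, and the unsafe call is the price of reaching the field.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct Counter {
    field: u32,
    _pin: PhantomPinned, // makes `Counter: !Unpin`
}

impl Counter {
    fn increment_field(self: Pin<&mut Self>) {
        // `self.field += 1` is rejected here: `Pin<&mut Counter>` has no
        // `DerefMut` impl because `Counter` is `!Unpin`.
        //
        // SAFETY: we only update a plain integer field in place and never
        // move the `Counter` out of `*self`.
        let this = unsafe { self.get_unchecked_mut() };
        this.field += 1;
    }
}

fn increment_twice() -> u32 {
    let mut counter = pin!(Counter { field: 0, _pin: PhantomPinned });
    counter.as_mut().increment_field();
    counter.as_mut().increment_field();
    // Reading goes through `Deref`, which is always available.
    counter.field
}
```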

With `s/Unpin/Overwrite/`, `Pin<&mut Self>` is a subtype of `&mut self`

s/Unpin/Overwrite/ allows us to implement DerefMut for all pinned types. This is because, unlike Unpin, Overwrite affects how &mut works, and hence &mut T would preserve the pinned state for the place it references. Consider the two possibilities for the value of type T referred to by the &mut T:

If T: Overwrite, then the value is not pinnable, and so the place cannot be in the pinned state.
If T: !Overwrite, the value could be pinned, but we also cannot overwrite or swap it, and so pinning is preserved.

This implies that Pin<&mut T> is in fact a generalized version of &mut T. Every &'a mut T keeps the value pinned for the duration of its lifetime 'a, but a Pin<&mut T> ensures the value stays pinned for the lifetime of the underlying storage.

If we have a DerefMut impl, then Pin<&mut Self> methods can freely call &mut self methods. Big win!

Today you must categorize fields as “structurally pinned” or not

The other pain point today with Pin is that we have no native support for “pin projection”⁶. That is, you cannot safely go from a Pin<&mut Self> reference to a Pin<&mut F> reference referring to some field self.f without relying on unsafe code.

The most common practice today is to use a custom crate like pin-project-lite. Even then, you also have to make a choice for each field between whether you want to be able to get a Pin<&mut F> reference or a normal &mut F reference. Fields for which you can get a pinned reference are called structurally pinned and the criteria for which one you should use are rather subtle. Ultimately this choice is required because Pin<&mut F> and &mut F don’t play nicely together.
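Here is a sketch of what that per-field choice looks like when written by hand in today’s Rust, without a helper crate. The CountPolls wrapper and NoopWake are invented for this example: inner is treated as structurally pinned (you get a Pin<&mut F>), while polls is not (you get a plain &mut u32). Each accessor needs its own unsafe block and its own justification.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Wraps a future and counts how often it has been polled.
struct CountPolls<F> {
    inner: F,   // treated as structurally pinned
    polls: u32, // treated as not structurally pinned
}

impl<F> CountPolls<F> {
    fn inner(self: Pin<&mut Self>) -> Pin<&mut F> {
        // SAFETY: `inner` is never moved out of a pinned `CountPolls`,
        // and this type has no `Drop` impl that could move it.
        unsafe { self.map_unchecked_mut(|this| &mut this.inner) }
    }

    fn polls(self: Pin<&mut Self>) -> &mut u32 {
        // SAFETY: `polls` is never exposed as pinned, so handing out a
        // plain `&mut` to it cannot break anyone's pinning guarantee.
        unsafe { &mut self.get_unchecked_mut().polls }
    }
}

impl<F: Future> Future for CountPolls<F> {
    type Output = (F::Output, u32);

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        *self.as_mut().polls() += 1;
        let polls = self.polls;
        self.inner().poll(cx).map(|out| (out, polls))
    }
}

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once() -> Poll<(u32, u32)> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(CountPolls { inner: async { 7_u32 }, polls: 0 });
    fut.as_mut().poll(&mut cx)
}
```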

Pin projection is safe from any `!Overwrite` type

With s/Unpin/Overwrite/, we can scrap the idea of structural pinning. Instead, if we have a field owner self: Pin<&mut Self>, pinned projection is allowed so long as Self: !Overwrite. That is, if Self: !Overwrite, then I can always get a Pin<&mut F> reference to some field self.f of type F. How is that possible?

Actually, the full explanation relies on borrow checker extensions I haven’t introduced yet. But let’s see how far we get without them, so that we can see the gap that the borrow checker has to close.

Assume we are creating a Pin<&'a mut F> reference r to some field self.f, where self: Pin<&mut Self>:

We are creating a Pin<&'a mut F> reference to the value in self.f:
- If F: Overwrite, then the value is not pinnable, so this is equivalent to an ordinary &mut F and we have nothing to prove.
- Else, if F: !Overwrite, then we have to show that the value in self.f will not move for the remainder of its lifetime.
 - Pin projection from *self is only valid if Self: !Overwrite and self: Pin<&'b mut Self>, so we know that the value in *self is pinned for the remainder of its lifetime by induction.
 - We have to show then that the value v_f in self.f will never be moved until the end of its lifetime.

There are three ways to move a value out of self.f:

You can assign a new value to self.f, like self.f = ....
- This will run the destructor, ending the lifetime of the value v_f.
You can create a mutable reference r = &mut self.f and then…
- assign a new value to *r: but that will be an error because F: !Overwrite.
- swap the value in *r with another: but that will be an error because F: !Overwrite.

QED. =)

Making `Pin`-based APIs easier to call

Today, getting a Pin<&mut> requires using the pin! macro, going through Box::pin, or some similar explicit action. This adds “syntactic salt” to calling a Pin<&mut Self> method. There is no built-in way to safely create a pinned reference; you have to go through pin! or some other abstraction rooted in unsafe (e.g., Box::pin). This is fine but introduces ergonomic hurdles.

We want to make calling a Pin<&mut Self> method as easy as calling an &mut self method. To do this, we need to extend the compiler’s notion of “auto-ref” to include the option of “auto-pin-ref”:

// Instead of this:
let future: Pin<&mut impl Future> = pin!(async { ... });
future.poll(cx);

// We would do this:
let mut future: impl Future = async { ... };
future.poll(cx); // <-- Wowee!

Just as a typical method call like vec.len() expands to Vec::len(&vec), the compiler would be expanding future.poll(cx) to something like so:

Future::poll(&pinned mut future, cx)
//           ^^^^^^^^^^^ but wait, what's this?

This expansion though includes a new piece of syntax that doesn’t exist today, the &pinned mut operation. (I’m lifting this syntax from boats’ pinned places proposal.)

Whereas &mut var results in an &mut T reference (assuming var: T), a &pinned mut var borrow would result in a Pin<&mut T>. It would also make the borrow checker consider the value in var to be pinned. That means that it is illegal to move out from var. The pinned state continues indefinitely until var goes out of scope or is overwritten by an assignment like var = ... (which drops the heretofore pinned value). This is a fairly straightforward extension to the borrow checker’s existing logic.

New syntax not strictly required

It’s worth noting that we don’t actually need the &pinned mut syntax (which means we don’t need the pinned keyword). We could make it so that the only way to get the compiler to do a pinned borrow is via auto-ref. We could even add a silly trait to make it explicit, like so:

trait Pinned {
    fn pinned(self: Pin<&mut Self>) -> Pin<&mut Self>;
}

impl<T: ?Sized> Pinned for T {
    fn pinned(self: Pin<&mut T>) -> Pin<&mut T> {
        self
    }
}

Now you can write var.pinned(), which the compiler would desugar to Pinned::pinned(&rustc#pinned mut var). Here I am using rustc#pinned to denote an “internal keyword” that users can’t type.⁷

Frequently asked questions

So…there’s a lot here. What are the key takeaways?

The shortest version of this post I can manage is⁸

Pinning fits smoothly into Rust if we make two changes:
- Limit the ability to swap types by default, making Pin<&mut T> a subtype of &mut T and enabling uniform pin projection.
- Integrate pinning in the auto-ref rules and the borrow checker.

Why do you only mention swaps? Doesn’t `Overwrite` affect other things?

Indeed the Overwrite trait as I defined it is overkill for pinning. To be more precise, we might imagine two special traits that affect how and when we can drop or move values:

trait DropWhileBorrowed: Sized { }
trait Swap: DropWhileBorrowed { }

Given a reference r: &mut T, overwriting its referent *r with a new value would require T: DropWhileBorrowed;
Swapping two values of type T requires that T: Swap.
- This is true regardless of whether they are borrowed or not.

Today, every type is Swap. What I argued in the previous post is that we should make the default be that user-defined types implement neither of these two traits (over an edition, etc etc). Instead, you could opt-in to both of them at once by implementing Overwrite.

But we could get all the pin benefits by making a weaker change. Instead of having types opt out from both traits by default, they could only opt out of Swap, but continue to implement DropWhileBorrowed. This is enough to make pinning work smoothly. To see why, recall the pinning state diagram: dropping the value in *r (permitted by DropWhileBorrowed) will exit the “pinned” state and return to the “uninitialized” state. This is valid. Swapping, in contrast, is UB.
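One way to get a feel for the split is to model it with ordinary opt-in traits in today’s Rust. To be clear, this is only an approximation: the real traits would be understood by the compiler and would gate *r = value and std::mem::swap themselves, whereas here the hypothetical overwrite and swap helpers stand in for those operations, and Session and Point are made-up types.

```rust
/// Hypothetical marker traits from the text, modelled as plain traits.
trait DropWhileBorrowed: Sized {}
trait Swap: DropWhileBorrowed {}

/// Stands in for `*r = value`, which would require `T: DropWhileBorrowed`.
fn overwrite<T: DropWhileBorrowed>(r: &mut T, value: T) {
    *r = value; // drops the old value in place
}

/// Stands in for `std::mem::swap`, which would require `T: Swap`.
fn swap<T: Swap>(a: &mut T, b: &mut T) {
    std::mem::swap(a, b);
}

/// Under the "weaker change", a user type keeps `DropWhileBorrowed`
/// but does not implement `Swap`.
struct Session {
    id: u32,
}
impl DropWhileBorrowed for Session {}

/// A type that opts in to both, like one implementing `Overwrite`.
struct Point {
    x: i32,
}
impl DropWhileBorrowed for Point {}
impl Swap for Point {}

fn opt_in_demo() -> (u32, i32, i32) {
    let mut s = Session { id: 1 };
    overwrite(&mut s, Session { id: 2 });
    // `swap(&mut s, &mut other)` would not compile: `Session: !Swap`.

    let (mut p, mut q) = (Point { x: 1 }, Point { x: 2 });
    swap(&mut p, &mut q);
    (s.id, p.x, q.x)
}
```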

Two subtle observations here worth calling out:

Both DropWhileBorrowed and Swap have Sized as a supertrait. Today in Rust you can’t drop a &mut dyn SomeTrait value and replace it with another, for example. I think it’s a bit unclear whether unsafe could do this if it knows the dynamic type of value behind the dyn. But under this model, it would only be valid for unsafe code to do that drop if (a) it knew the dynamic type and (b) the dynamic type implemented DropWhileBorrowed. Same applies to Swap.
The Swap trait applies longer than just the duration of a borrow. This is because, once you pin a value to create a Pin<&mut T> reference, the state of being pinned persists even after that reference has ended. I say a bit more about this in another FAQ below.

EDIT: An earlier draft of this post named the trait Swap. This was wrong, as described in the FAQ on subtle reasoning.

Why then did you propose opting out from both overwrites and swaps?

Opting out of overwrites (i.e., making the default be neither DropWhileBorrowed nor Swap) gives us the additional benefit of truly immutable fields. This will make cross-function borrows less of an issue, as I described in my previous post, and make some other things (e.g., variance) less relevant. Moreover, I don’t think overwriting an entire reference like *r is that common, versus accessing individual fields. And in the cases where people do do it, it is easy to make a dummy struct with a single field, and then overwrite r.value instead of *r. To me, therefore, distinguishing between DropWhileBorrowed and Swap doesn’t obviously carry its weight.

Can you come up with a more semantic name for `Overwrite`?

All the trait names I’ve given so far (Overwrite, DropWhileBorrowed, Swap) answer the question of “what operation does this trait allow”. That’s pretty common for traits (e.g., Clone or, for that matter, Unpin) but it is sometimes useful to think instead about “what kinds of types should implement this trait” (or not implement it, as the case may be).

My current favorite “semantic style name” is Mobile, which corresponds to implementing Swap. A mobile type is one that, while borrowed, can move to a new place. This name doesn’t convey that it’s also ok to drop the value, but that follows, since if you can swap the value to a new place, you can presumably drop that new place.

I don’t have a “semantic” name for DropWhileBorrowed. As I said, I’m hard pressed to characterize the type that would want to implement DropWhileBorrowed but not Swap.

What do `DropWhileBorrowed` and `Swap` have in common?

These traits pertain to whether an owner who lends out a local variable (i.e., executes r = &mut lv) can rely on that local variable lv to store the same value after the borrow completes. Under this model, the answer depends on the type T of the local variable:

If T: DropWhileBorrowed (or T: Swap, which implies DropWhileBorrowed), the answer is “no”, the local variable may point at some other value, because it is possible to do *r = /* new value */.
But if T: !DropWhileBorrowed, then the owner can be sure that lv still stores the same value (though lv’s fields may have changed).

Let’s use an analogy. Suppose I own a house and I lease it out to someone else to use. I expect that they will make changes on the inside, such as hanging up a new picture. But I don’t expect them to tear down the house and build a new one on the same lot. I also don’t expect them to drive up a flatbed truck, load my house onto it, and move it somewhere else (while providing me with a new one in return). In Rust today, a reference r: &mut T allows all of these things:

Mutating a field like r.count += 1 corresponds to hanging up a picture. The values inside r change, but r still refers to the same conceptual value.
Overwriting *r = t with a new value t is like tearing down the house and building a new one. The original value that was in r no longer exists.
Swapping *r with some other reference *r2 is like moving my house somewhere else and putting a new house in its place.
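All three operations are legal through a plain &mut in today’s Rust, as this small sketch shows (the House type and the function names are made up to match the analogy):

```rust
#[derive(Debug, PartialEq)]
struct House {
    built: u32,    // which "construction" this value came from
    pictures: u32, // interior state the tenant is expected to change
}

/// Hanging a picture: a field changes, the value is conceptually the same.
fn hang_picture(r: &mut House) {
    r.pictures += 1;
}

/// Tearing down and rebuilding: the old value is dropped and replaced.
fn rebuild(r: &mut House) {
    *r = House { built: r.built + 1, pictures: 0 };
}

/// The flatbed truck: the old value lives on, but somewhere else.
fn relocate(r: &mut House, other: &mut House) {
    std::mem::swap(r, other);
}

fn lease_out() -> (House, House) {
    let mut mine = House { built: 1, pictures: 0 };
    let mut theirs = House { built: 10, pictures: 5 };
    hang_picture(&mut mine);
    rebuild(&mut mine);
    relocate(&mut mine, &mut theirs);
    (mine, theirs)
}
```

After the lease, the owner’s variable holds a house they never built, and nothing in the signature of any of these functions hinted that this could happen.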

EDIT: Wording refined based on feedback.

What does it mean to be the “same value”?

One question I received was what it meant for two structs to have the “same value”? Imagine a struct with all public fields – can we make any sense of it having an identity? The way I think of it, every struct has a “ghost” private field $identity (one that doesn’t exist at runtime) that contains its identity. Every StructName { } expression has an implicit $identity: new_value() that assigns the identity a distinct value from every other struct that has been created thus far. If two struct values have the same $identity, then they are the same value.

Admittedly, if a struct has all public fields, then it doesn’t really matter whether its identity is the same, except perhaps to philosophers. But most structs don’t.

An example that can help clarify this is what I call the “scope pattern”. Imagine I have a Scope type that has some private fields and which can be “installed” in some way and later “deinstalled” (perhaps it modifies thread-local values):

pub struct Scope {...}

impl Scope {
    fn new() -> Self { /* install scope */ }
}

impl Drop for Scope {
    fn drop(&mut self) {
        /* deinstall scope */
    }
}

And the only way for users to get their hands on a “scope” is to use with_scope, which ensures it is installed and deinstalled properly:

pub fn with_scope(op: impl FnOnce(&mut Scope)) {
    let mut scope = Scope::new();
    op(&mut scope);
}

It may appear that this code enforces a “stack discipline”, where nested scopes will be installed and deinstalled in a stack-like fashion. But in fact, thanks to std::mem::swap, this is not guaranteed:

with_scope(|s1| {
    with_scope(|s2| {
        std::mem::swap(s1, s2);
    })
})

This could easily cause logic bugs or, if unsafe is involved, something worse. This is why lending out scopes requires some extra step to be safe, such as using a &-reference or adding a “fresh” lifetime parameter of some kind to ensure that each scope has a unique type. In principle you could also use a type like &mut dyn ScopeTrait, because the compiler disallows overwriting or swapping dyn Trait values: but I think it’s ambiguous today whether unsafe code could validly do such a swap.
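The broken stack discipline is easy to observe. In this sketch I’ve filled in the elided parts with assumptions of my own: each Scope carries an id, “installing” and “deinstalling” just append to a thread-local log, and with_scope takes the id as an extra parameter.

```rust
use std::cell::RefCell;

thread_local! {
    // Records install/deinstall events in the order they happen.
    static LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

fn log(event: String) {
    LOG.with(|l| l.borrow_mut().push(event));
}

pub struct Scope {
    id: u32,
}

impl Scope {
    fn new(id: u32) -> Self {
        log(format!("install {id}"));
        Scope { id }
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        log(format!("deinstall {}", self.id));
    }
}

pub fn with_scope(id: u32, op: impl FnOnce(&mut Scope)) {
    let mut scope = Scope::new(id);
    op(&mut scope);
    // `scope` is dropped here, whatever value it now holds.
}

fn swapped_scopes() -> Vec<String> {
    with_scope(1, |s1| {
        with_scope(2, |s2| {
            std::mem::swap(s1, s2);
        })
    });
    LOG.with(|l| l.borrow().clone())
}
```

The log comes out as install 1, install 2, deinstall 1, deinstall 2: the outer scope is torn down first, while the inner one is still nominally active.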

EDIT: Question added based on feedback.

There’s a lot of subtle reasoning in this post. Are you sure this is correct?

I am pretty sure! But not 100%. I’m definitely scared that people will point out some obvious flaw in my reasoning. But of course, if there’s a flaw I want to know. To help people analyze, let me recap the two subtle arguments that I made in this post and recap the reasoning.

Lemma. Given some local variable lv: T where T: !Overwrite mutably borrowed by a reference r: &'a mut T, the value in lv cannot be dropped, moved, or forgotten for the lifetime 'a.

During 'a, the variable lv cannot be accessed directly (per the borrow checker’s usual rules). Therefore, any drops/moves/forgets must take place to *r:

Because T: !Overwrite, it is not possible to overwrite or swap *r with a new value; it is only legal to mutate individual fields. Therefore the value cannot be dropped or moved.
Forgetting a value (via std::mem::forget) requires ownership and is not accessible while lv is borrowed.

Theorem A. If we replace T: Unpin with T: Overwrite, then Pin<&mut T> is a safe subtype of &mut T.

The argument proceeds by cases:

If T: Overwrite, then Pin<&mut T> does not refer to a pinned value, and hence it is semantically equivalent to &mut T.
If T: !Overwrite, then Pin<&mut T> does refer to a pinned value, so we must show that the pinning guarantee cannot be disturbed by the &mut T. By our lemma, the &mut T cannot move or forget the pinned value, which is the only way to disturb the pinning guarantee.

Theorem B. Given some field owner o: O where O: !Overwrite with a field f: F, it is safe to pin-project from Pin<&mut O> to a Pin<&mut F> reference referring to o.f.

The argument proceeds by cases:

If F: Overwrite, then Pin<&mut F> is equivalent to &mut F. We showed in Theorem A that Pin<&mut O> could be upcast to &mut O and it is possible to create an &mut F from &mut O, so this must be safe.
If F: !Overwrite, then Pin<&mut F> refers to a pinned value found in o.f. The lemma tells us that the value in o.f will not be disturbed for the duration of the borrow.

EDIT: It was pointed out to me that this last theorem isn’t quite proving what it needs to prove. It shows that o.f will not be disturbed for the duration of the borrow, but to meet the pin rules, we need to ensure that the value is not swapped even after the borrow ends. We can do this by committing to never permit swaps of values unless T: Overwrite, regardless of whether they are borrowed. I meant to clarify this in the post but forgot about it, and then I made a mistake and talked about Swap – but Swap is the right name.

What part of this post are you most proud of?

Geez, I’m so glad you asked! Such a thoughtful question. To be honest, the part of this post that I am happiest with is the state diagram for places, which I’ve found very useful in helping me to understand Pin:

flowchart TD
Uninitialized 
Initialized
Pinned

Uninitialized --
    `p = v` where `v: T`
--> Initialized

Initialized -- 
    move out, drop, or forget
--> Uninitialized

Initialized --
    pin value `v` in `p`
    (only possible when `T` is `!Unpin`)
--> Pinned

Pinned --
    drop value
--> Uninitialized

Pinned --
    move out or forget
--> UB

Uninitialized --
    free the place
--> Freed

UB[💥 Undefined behavior 💥]

Obviously this question was just an excuse to reproduce it again. Some of the key insights that it helped me to crystallize:

A value that is Unpin cannot be pinned:
- And hence Pin<&mut Self> really means “reference to a maybe-pinned value” (a value that is pinned if it can be).
Forgetting a value is very different from leaking the place that value is stored:
- In both cases, the value’s Drop never runs, but only one of them can lead to a “freed place”.

In thinking through the stuff I wrote in this post, I’ve found it very useful to go back to this diagram and trace through it with my finger.
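If tracing with a finger isn't your thing, the same diagram can be written down as a small transition function. This is only a sketch of the diagram's edges, not a model of the compiler: the names `Place`, `Op`, and `step` are made up for illustration, and any transition the diagram does not draw is reported as `None`.

```rust
/// The states a place can be in, mirroring the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Place {
    Uninitialized,
    Initialized,
    Pinned,
    Freed,
    Undefined, // the "undefined behavior" node
}

/// The operations that label the diagram's edges.
#[derive(Debug, Clone, Copy)]
enum Op {
    Assign,
    MoveOut,
    Drop,
    Forget,
    Pin,
    Free,
}

/// Returns the next state, or `None` if the diagram has no such edge.
fn step(state: Place, op: Op) -> Option<Place> {
    use Place::*;
    match (state, op) {
        (Uninitialized, Op::Assign) => Some(Initialized),
        (Initialized, Op::MoveOut | Op::Drop | Op::Forget) => Some(Uninitialized),
        (Initialized, Op::Pin) => Some(Pinned),
        (Pinned, Op::Drop) => Some(Uninitialized),
        (Pinned, Op::MoveOut | Op::Forget) => Some(Undefined),
        (Uninitialized, Op::Free) => Some(Freed),
        _ => None,
    }
}

fn main() {
    // A well-behaved lifecycle: assign, pin, drop, then free the place.
    let mut state = Place::Uninitialized;
    for op in [Op::Assign, Op::Pin, Op::Drop, Op::Free] {
        state = step(state, op).expect("edge exists in the diagram");
        println!("{op:?} -> {state:?}");
    }
}
```

Note that there is no edge from `Pinned` straight to `Freed`: the only way out of the pinned state that avoids undefined behavior is to drop the value first.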

Is this backwards compatible?

Maybe? The question does not have a simple answer. I will address it in a future blog post in this series. Let me make a few points here though:

First, the s/Unpin/Overwrite/ proposal is not backwards compatible as I described. It would mean for example that all futures returned by async fn are no longer Overwrite. It is quite possible we simply can’t get away with it.

That’s not fatal, but it makes things more annoying. It would mean there exist types that are !Unpin but which can be overwritten. This in turn means that Pin<&mut Self> is not a subtype of &mut Self for all types. Pinned mutable references would be a subtype for almost all types, but not those that are !Unpin && Overwrite.

Second, a naive, conservative transition would definitely be rough. My current thinking is that, in older editions, we add T: Overwrite bounds by default on type parameters T and, when you have a T: SomeTrait bound, we would expand that to include an Overwrite bound on associated types in SomeTrait, like T: SomeTrait. When you move to a newer edition I think we would just not add those bounds. This is kind of a mess, though, because if you call code from an older edition, you are still going to need those bounds to be present.

That all sounds painful enough that I think we might have to do something smarter, where we don’t always add Overwrite bounds, but instead use some kind of inference in older editions to avoid it most of the time.

Conclusion

My takeaway from authoring this post is that something like Overwrite has the potential to turn Pin from wizard level Rust into mere “advanced Rust”, somewhat akin to knowing the borrow checker really well. If we had no backwards compatibility constraints to work with, it seems clear that this would be a better design than Unpin as it is today.

Of course, we do have backwards compatibility constraints, so the real question is how we can make the transition. I don’t know the answer yet! I’m planning on thinking more deeply about it (and talking to folks) once this post is out. My hope was first to make the case for the value of Overwrite (and to be sure my reasoning is sound) before I invest too much into thinking how we can make the transition.

Assuming we can make the transition, I’m wondering two things. First, is Overwrite the right name? Second, should we take the time to re-evaluate the default bounds on generic types in a more complete way? For example, to truly have a nice async story, and for myriad other reasons, I think we need “must move” types. How does that fit in?

The precise design of generators is of course an ongoing topic of some controversy. I am not trying to flesh out a true design here or take a position. Mostly I want to show that we can create ergonomic bridges between “must pin” types like generators and “non pin” interfaces like Iterator without explicitly mentioning pinning. ↩︎
Boats has argued that, since no existing iterator can support borrows over a yield point, generators might not need to do so either. I don’t agree. I think supporting borrows over yield points is necessary for ergonomics just as it was in futures. ↩︎
Actually for Pin>. ↩︎
I will say, I use std::mem::forget quite regularly, but mostly to make up for a shortcoming in Drop. I would like it if Drop had a separate method, fn drop_on_unwind(&mut self), and we invoked that method when unwinding. Most of the time, it would be the same as regular drop, but in some cases it’s useful to have cleanup logic that only runs in the case of unwinding. ↩︎
In contrast, a Pin<&mut T> reference can be safely converted into an &T reference, as evidenced by Pin’s Deref impl. This is because, even if T: !Unpin, a &T reference cannot do anything that is invalid for a pinned value. You can’t swap the underlying value or move out of it. ↩︎
Projection is the wonky PL term for “accessing a field”. It’s never made much sense to me, but I don’t have a better term to use, so I’m sticking with it. ↩︎
We have a syntax k#foo for explicitly referring to a keyword foo. It is meant to be used only for keywords that will be added in future Rust editions. However, I sometimes think it’d be neat to have internal-ish keywords (like k#pinned) that are used in desugaring but rarely need to be typed explicitly; you would still be able to write k#pinned if for whatever reason you wanted to. And of course we could later opt to stabilize it as pinned (no prefix required) in a future edition. ↩︎
I tried asking ChatGPT to summarize the post but, when I pasted in my post, it replied, “The message you submitted was too long, please reload the conversation and submit something shorter.” Dang ChatGPT, that’s rude! Gemini at least gave it the old college try. Score one for Google. Plus, it called my post “thought-provoking!” Aww, I’m blushing! ↩︎

Making overwrite opt-in #crazyideas

2024-09-26T00:00:00+00:00

What would you say if I told you that it was possible to (a) eliminate a lot of “inter-method borrow conflicts” without introducing something like view types and (b) make pinning easier even than boats’s pinned places proposal, all without needing pinned fields or even a pinned keyword? You’d probably say “Sounds great… what’s the catch?” The catch is that it requires us to change Rust’s fundamental assumption that, given x: &mut T, you can always overwrite *x by doing *x = /* new value */, for any type T: Sized. This kind of change is tricky, but not impossible, to do over an edition.

TL;DR

We can reduce inter-procedural borrow check errors, increase clarity, and make pin vastly simpler to work with if we limit when it is possible to overwrite the referent of an &mut reference. The idea is that if you have a mutable reference x: &mut T, it should only be possible to overwrite *x via *x = /* new value */ or to swap its value via std::mem::swap if T: Overwrite. To start with, most structs and enums would implement Overwrite, and it would be a default bound, like Sized; but we would transition in a future edition to have structs/enums be !Overwrite by default and to have T: Overwrite bounds written explicitly.

Structure of this series

This blog post is part of a series:

This first post will introduce the idea of immutable fields and show why they could make Rust more ergonomic and more consistent. It will then show how overwrites and swaps are the key blocker and introduce the idea of the Overwrite trait, which could overcome that.
In the next post, I’ll dive deeper into Pin and how the Overwrite trait can help there.
After that, who knows? Depends on what people say in response.¹

If you could change one thing about Rust, what would it be?

People often ask me to name something I would change about Rust if I could. One of the items on my list is the fact that, given a mutable reference x: &mut SomeStruct to some struct, I can overwrite the entire value of x by doing *x = /* new value */, versus only modifying individual fields like x.field = /* new value */.

Having the ability to overwrite *x always seemed very natural to me, having come from C, and it’s definitely useful sometimes (particularly with Copy types like integers or newtyped integers). But it turns out to make borrowing and pinning much more painful than they would otherwise have to be, as I’ll explain shortly.

In the past, when I’ve thought about how to fix this, I always assumed we would need a new form of reference type, like &move T or something. That seemed like a non-starter to me. But at RustConf last week, while talking about the ergonomics of Pin, a few of us stumbled on the idea of using a trait instead. Under this design, you can always make an x: &mut T, but you can’t always assign to *x as a result. This turns out to be a much smoother integration. And, as I’ll show, it doesn’t really give up any expressiveness.

Motivating example #1: Immutable fields

In this post, I’m going to motivate the changes by talking about immutable fields. Today in Rust, when you declare a local variable let x = …, that variable is immutable by default². Fields, in contrast, inherit their mutability from the outside: when a struct appears in a mut location, all of its fields are mutable.

Not all fields are mutable, but I can’t declare that in my Rust code

It turns out that declaring local variables as mut is not needed for the borrow checker — and yet we do it nonetheless, in part because it helps readability. It’s useful to see when a variable might change. But if that argument holds for local variables, it holds double for fields! For local variables, we can find all potential mutation just by searching one function. To know if a field may be mutated, we have to search across many functions. And for fields, precisely because they can be mutated across functions, declaring them as immutable can actually help the borrow checker to see that your code is safe.

Idea: Declare fields as mutable

So what if we extended the mutable declaration to fields? The idea would be that, in your struct, if you want to mutate fields, you have to declare them as mut. This would allow them to be mutated, but only if the struct itself appears in a mutable location.

For example, maybe I have an Analyzer struct that is created with some vector of datums and which has to compute the number of “important” ones:

#[derive(Default)]
struct Analyzer {
    /// Data being analyzed: will never be modified.
    data: Vec<Datum>,

    /// Number of important datums uncovered so far.
    mut important: usize,
}

As you can see from the struct declaration, the field data is declared as immutable. This is because we are only going to be reading the Datum values. The important field is declared as mut, indicating that it will be updated.

When can you mutate fields?

In this world, mutating a field is only possible when (1) the struct appears in a mutable location and (2) the field you are referencing is declared as mut. So this code compiles fine, because the field important is mut:

let mut analyzer = Analyzer::default();
analyzer.important += 1; // OK: mut field in a mut location

But this code does not compile, because the local variable x is not:

let x = Analyzer::default();
x.important += 1; // ERROR: `x` not declared as mutable

And this code does not compile, because the field data is not declared as mut:

let mut x = Analyzer::default();
x.data.clear(); // ERROR: field `data` is not declared as mutable

Leveraging immutable fields in the borrow checker

So why is it useful to declare fields as mut? Well, imagine you have a method like increment_if_important, which checks if datum.is_important() is true and increments the important counter if so:

impl Analyzer {
    fn increment_if_important(&mut self, datum: &Datum) {
        if datum.is_important() {
            self.important += 1;
        }
    }
}

Now imagine you have a function that loops over self.data and calls increment_if_important on each item:

impl Analyzer {
    fn count_important(&mut self) {
        for datum in &self.data {
            self.increment_if_important(datum);
        }
    }
}

I can hear the experienced Rustaceans crying out in pain now. This function, natural as it appears, will not compile in Rust today. Why is that? Well, we have a shared borrow on self.data but we are trying to call an &mut self function, so we have no way to be sure that self.data will not be modified.
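For comparison, here is roughly what people do in today's Rust to get this past the borrow checker: stop passing `&mut self` and pass only the field that actually changes. This is a sketch under some assumptions, since the post never defines `Datum`; the `score` field and the threshold in `is_important` are invented purely so the example runs.

```rust
/// Hypothetical stand-in for the post's `Datum` type.
struct Datum {
    score: u32,
}

impl Datum {
    /// Invented rule, just so there is something to count.
    fn is_important(&self) -> bool {
        self.score > 10
    }
}

#[derive(Default)]
struct Analyzer {
    data: Vec<Datum>,
    important: usize,
}

impl Analyzer {
    /// Takes only the counter rather than `&mut self`, so the caller can
    /// keep a shared borrow of `self.data` alive at the same time.
    fn increment_if_important(important: &mut usize, datum: &Datum) {
        if datum.is_important() {
            *important += 1;
        }
    }

    fn count_important(&mut self) {
        // `self.data` and `self.important` are disjoint fields, so the
        // borrow checker accepts borrowing both within one function body.
        for datum in &self.data {
            Self::increment_if_important(&mut self.important, datum);
        }
    }
}

fn main() {
    let mut analyzer = Analyzer {
        data: vec![Datum { score: 3 }, Datum { score: 50 }, Datum { score: 11 }],
        important: 0,
    };
    analyzer.count_important();
    println!("important = {}", analyzer.important);
}
```

It works, but the signature of `increment_if_important` now leaks which fields the method touches, and that is precisely the ergonomic cost this post is trying to remove.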

But what about immutable fields? Doesn’t that solve this?

Annoyingly, immutable fields on their own don’t change anything! Why? Well, just because you can’t write to a field directly doesn’t mean you can’t mutate the memory it’s stored in. For example, maybe I write a malicious version of increment_if_important:

impl Analyzer {
    fn malicious_increment_if_important(&mut self, datum: &Datum) {
        *self = Analyzer::default();
    }
}

This version never directly accesses the field data, but it just writes to *self, and hence it has the same impact. Annoying!

Generics: why we can’t trivially disallow overwrites

Maybe you’re thinking “well, can’t we just disallow overwriting *self if there are fields declared mut?” The answer is yes, we can, and that’s what this blog post is about. But it’s not so simple as it sounds, because we are changing the “basic contract” that all Rust types currently satisfy. In particular, Rust today assumes that if you have a reference x: &mut T and a value v: T, you can always do *x = v and overwrite the referent of x. That means I can write a generic function like set_to_default:

fn set_to_default<T: Default>(r: &mut T) {
    *r = T::default();
}

Now, since Analyzer implements Default, I can make increment_if_important call set_to_default. This will still free self.data, but it does it in a sneaky way, where we can’t obviously tell that the value being overwritten is an instance of a struct with mut fields:

impl Analyzer {
    fn malicious_increment_if_important(&mut self, datum: &Datum) {
        // Overwrites `self.data`, but not in an obvious way
        set_to_default(self);
    }
}
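To see that this really does clobber the data in today's Rust, here is a small runnable version. It is a simplified sketch: I've swapped the `Datum` elements for plain `u32` values (an assumption made only to keep the example short) and dropped the `mut` field syntax, which does not exist yet.

```rust
#[derive(Default)]
struct Analyzer {
    data: Vec<u32>, // stand-in for `Vec<Datum>`
    important: usize,
}

/// Generic code that knows nothing about `Analyzer`, only `T: Default`.
fn set_to_default<T: Default>(r: &mut T) {
    *r = T::default();
}

fn main() {
    let mut analyzer = Analyzer {
        data: vec![1, 2, 3],
        important: 2,
    };
    // The old `Vec` is dropped and replaced by an empty one.
    set_to_default(&mut analyzer);
    println!(
        "data.len() = {}, important = {}",
        analyzer.data.len(),
        analyzer.important
    );
}
```

Nothing in `set_to_default` mentions `Analyzer` or its fields, which is why a per-struct check cannot catch it.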

Recap

So let’s step back and recap what we’ve seen so far:

If we could distinguish which fields were mutable and which were definitely not, we could eliminate many inter-function borrow check errors³.
However, just adding mut declarations is not enough, because fields can also be mutated indirectly. Specifically, when you have a &mut SomeStruct, you can overwrite with a fresh instance of SomeStruct or swap with another &mut SomeStruct, thus changing all fields at once.
Whatever fix we use has to consider generic code like std::mem::swap, which mutates an &mut T without knowing precisely what T is. Therefore we can’t do something simple like looking to see if T is a struct with mut fields⁴.

The trait system to the rescue

My proposal is to introduce a new, built-in marker trait called Overwrite:

/// Marker trait that permits overwriting
/// the referent of an `&mut Self` reference.
#[marker] // <-- means the trait cannot have methods
trait Overwrite: Sized {}

The effect of `Overwrite`

As a marker trait, Overwrite does not have methods, but rather indicates a property of the type. Specifically, assigning to a borrowed place of type T requires that T: Overwrite is implemented. For example, the following code writes to *x, which has type T; this is only legal if T: Overwrite:

fn overwrite<T>(x: &mut T, t: T) {
    *x = t; // <— requires `T: Overwrite`
}

Given that this code compiles today, this implies that a generic type parameter declaration like <T> would require a default Overwrite bound in the current edition. We would want to phase these defaults out in some future edition, as I’ll describe in detail later on.

Similarly, the standard library’s swap function would require a T: Overwrite bound, since it (via unsafe code) assigns to *x and *y:

fn swap<T>(x: &mut T, y: &mut T) {
    unsafe {
        let tmp: T = std::ptr::read(x);
        std::ptr::write(x, std::ptr::read(y)); // overwrites `*x`, `T: Overwrite` required
        std::ptr::write(y, tmp); // overwrites `*y`, `T: Overwrite` required
    }
}

`Overwrite` requires `Sized`

The Overwrite trait requires Sized because, for *x = /* new value */ to be safe, the compiler needs to ensure that the place *x has enough space to store “new value”, and that is only possible when the size of the new value is known at compilation time (i.e., the type implements Sized).

`Overwrite` only applies to borrowed values

The overwrite trait is only needed when assigning to a borrowed place of type T. If that place is owned, the owner is allowed to reassign it, just as they are allowed to drop it. So e.g. the following code compiles whether or not SomeType: Overwrite holds:

let mut x: SomeType = /* something */;
x = /* something else */; // <— does not require that `SomeType: Overwrite` holds

Subtle: `Overwrite` is not infectious

Somewhat surprisingly, it is ok to have a struct that implements Overwrite which has fields that do not. Consider the types Foo and Bar, where Foo: Overwrite holds but Bar: Overwrite does not:

struct Foo(Bar);
struct Bar;
impl Overwrite for Foo { }
impl !Overwrite for Bar { }

The following code would type check:

let foo = &mut Foo(Bar);
// OK: Overwriting a borrowed place of type `Foo`
// and `Foo: Overwrite` holds.
*foo = Foo(Bar);

However, the following code would not:

let foo = &mut Foo(Bar);
// ERROR: Overwriting a borrowed place of type `Bar`
// but `Bar: Overwrite` does not hold.
foo.0 = Bar;

Types that do not implement Overwrite can therefore still be overwritten in memory, but only as part of overwriting the value in which they are embedded. In the FAQ I show how this non-infectious property preserves expressiveness.⁵
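You can get a feel for this behavior in today's Rust by emulating `Overwrite` with an ordinary user-defined trait. To be clear, this is only an approximation: a library trait cannot stop a plain `*x = …` assignment the way the built-in trait would, so the sketch routes every overwrite through a helper function that carries the bound. The `id` field on `Bar` and the helper name `overwrite` are additions of mine so there is something to observe.

```rust
/// Library-level stand-in for the proposed built-in marker trait.
trait Overwrite: Sized {}

struct Bar {
    id: u32,
}
struct Foo(Bar);

// `Foo` opts in; `Bar` deliberately does not.
impl Overwrite for Foo {}

/// Overwrites a borrowed place; only callable when `T: Overwrite`.
fn overwrite<T: Overwrite>(place: &mut T, value: T) {
    *place = value;
}

fn main() {
    let mut foo = Foo(Bar { id: 1 });

    // OK: the borrowed place has type `Foo`, and `Foo: Overwrite` holds.
    // The embedded `Bar` is replaced as part of replacing the `Foo`.
    overwrite(&mut foo, Foo(Bar { id: 2 }));
    println!("foo.0.id = {}", foo.0.id);

    // Uncommenting the next line fails to compile, because the borrowed
    // place has type `Bar` and `Bar: Overwrite` does not hold:
    // overwrite(&mut foo.0, Bar { id: 3 });
}
```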

Who implements `Overwrite`?

This section walks through which types should implement Overwrite.

`Copy` implies `Overwrite`

Any type that implements Copy would automatically implement Overwrite:

impl<T: Copy> Overwrite for T { }

(If you, like me, get nervous when you see blanket impls due to coherence concerns, it’s worth noting that RFC #1268 allows for overlapping impls of marker traits, though that RFC is not yet fully implemented nor stable. It’s not terribly relevant at the moment anyway.)

“Pointer” types are `Overwrite`

Types that represent pointers all implement Overwrite for all T:

&T
&mut T
Box<T>
Rc<T>
Arc<T>
*const T
*mut T

`dyn`,`[]`, and other “unsized” types do not implement `Overwrite`

Types that do not have a static size, like dyn and [], do not implement Overwrite. Safe Rust already disallows writing code like *x = … in such cases.

There are ways to do overwrites with unsized types in unsafe code, but they’d have to prove various bounds. For example, overwriting a [u32] value could be ok, but you have to know the length of data. Similarly swapping two dyn Value referents can be safe, but you have to know that (a) both dyn values have the same underlying type and (b) that type implements Overwrite.

Structs and enums

The question of whether structs and enums should implement Overwrite is complicated because of backwards compatibility. I’m going to distinguish two cases: Rust 2021, and Rust Next, which is Rust in some hypothetical future edition (surely not 2024, but maybe the one after that).

Rust 2021. Struct and enum types in Rust 2021 implement Overwrite by default. Structs could opt-out from Overwrite with an explicit negative impl (impl !Overwrite for S).

Integrating mut fields. Structs that have opted out from Overwrite require mutable fields to be declared as mut. Fields not declared as mut are immutable. This gives them the nicer borrow check behavior.⁶

Rust Next. In some future edition, we can swap the default, with structs and enums being !Overwrite by default and having to opt-in to enable overwrites. This would make the nice borrow check behavior the default.

Futures and closures

Futures and closures can implement Overwrite iff their captured values implement Overwrite, though in future editions it would be best if they simply do not implement Overwrite.

Default bounds and backwards compatibility

The other big backwards compatibility issue has to do with default bounds. In Rust 2021, every type parameter declared as T implicitly gets a T: Sized bound. We would have to extend that default to be T: Sized + Overwrite. This also applies to associated types in trait definitions and impl X types.⁷

Interestingly, type parameters declared as T: ?Sized also opt-out from Overwrite. Why is that? Well, remember that Overwrite: Sized, so if T is not known to be Sized, it cannot be known to be Overwrite either. This is actually a big win. It means that types like &T and Box<T> can work with “non-overwrite” types out of the box.

Associated type bounds are annoying, but perhaps not fatal

Still, the fact that default bounds apply to associated types and impl Trait is a pain in the neck. For example, it implies that Iterator::Item would require its items to be Overwrite, which would prevent you from authoring iterators that iterate over structs with immutable fields. This can to some extent be overcome by associated type aliases⁸ (we could declare Item to be a “virtual associated type”, mapping to Item2021 in older editions, which require Overwrite, and ItemNext in newer ones, which do not).

Frequently asked questions

OMG endless words. What did I just read?

Let me recap!

It would be more declarative and create fewer borrow check conflicts if we had users declare their fields as mut when they may be mutated and we were able to assume that non-mut fields will never be mutated.
- If we were to add this, in the current Rust edition it would obviously be opt-in.
- But in a future Rust edition it would become mandatory to declare fields as mut if you want to mutate them.
But to do that, we need to prevent overwrites and swaps. We can do that by introducing a trait, Overwrite, that is required to overwrite a given location.
- In the current Rust edition, this trait would be added by default to all type parameters, associated types, and impl Trait bounds; it would be implemented by all structs, enums, and unions.
- In a future Rust edition, the trait would no longer be the default, and structs, enums, and unions would have to explicitly implement it if they want to be overwritable.

This change doesn’t seem worth it just to get immutable fields. Is there more?

But wait, there’s more! Oh, you just said that. Yes, there’s more. I’m going to write a follow-up post showing how opting out from Overwrite eliminates most of the ergonomic pain of using Pin.

In “Rust Next”, who would ever implement `Overwrite` manually?

I said that, in Rust Next, types should be !Overwrite by default and require people to implement Overwrite manually if they want to. But who would ever do that? It’s a good question, because I don’t think there’s very much reason to.

Because Overwrite is not infectious, you can actually make a wrapper type…

#[repr(transparent)]
struct ForceOverwrite<T> { t: T }
impl<T> Overwrite for ForceOverwrite<T> { }

…and now you can put values of any type X into a ForceOverwrite<X>, which can be reassigned.

This pattern allows you to make “local” use of overwrite, for example to implement a sorting algorithm (which has to do a lot of swapping). You could have a sort function that takes an &mut [T] for any T: Ord (Overwrite not required):

fn sort<T: Ord>(data: &mut [T])

Internally, it can safely transmute the &mut [T] to a &mut [ForceOverwrite<T>] and sort that. Note that at no point during that sorting are we moving or overwriting an element while it is borrowed (the slice that owns it is borrowed, but not the elements themselves).
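Here is a rough sketch of how that might look, written in today's Rust with a user-defined `Overwrite` trait standing in for the built-in one. Several things here are assumptions for illustration: the `swap` helper with its `T: Overwrite` bound, the choice of insertion sort, and the use of a pointer cast in place of a literal `transmute` (which I find easier to get right for slices).

```rust
/// Library-level stand-in for the proposed built-in marker trait.
trait Overwrite: Sized {}

#[repr(transparent)]
struct ForceOverwrite<T> {
    t: T,
}
impl<T> Overwrite for ForceOverwrite<T> {}

/// A swap that, like the proposed `std::mem::swap`, demands `T: Overwrite`.
fn swap<T: Overwrite>(x: &mut T, y: &mut T) {
    std::mem::swap(x, y);
}

/// Sorts any `T: Ord`; note that `T: Overwrite` is not required.
fn sort<T: Ord>(data: &mut [T]) {
    // SAFETY: `ForceOverwrite<T>` is `repr(transparent)` over `T`, so both
    // slice types have the same layout, and `data` is not used again while
    // `wrapped` is live.
    let wrapped: &mut [ForceOverwrite<T>] = unsafe {
        std::slice::from_raw_parts_mut(
            data.as_mut_ptr().cast::<ForceOverwrite<T>>(),
            data.len(),
        )
    };

    // Plain insertion sort, expressed entirely through `swap`.
    for i in 1..wrapped.len() {
        let mut j = i;
        while j > 0 && wrapped[j - 1].t > wrapped[j].t {
            let (left, right) = wrapped.split_at_mut(j);
            swap(&mut left[j - 1], &mut right[0]);
            j -= 1;
        }
    }
}

fn main() {
    let mut names = vec![String::from("pin"), String::from("claim"), String::from("mut")];
    sort(&mut names);
    println!("{names:?}");
}
```

The caller never sees `ForceOverwrite`; it stays a purely local rewrite inside `sort`, which is the sense in which no expressiveness is lost.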

What is the relationship of `Overwrite` and `Unpin`?

I’m still puzzling that over myself. I think that Overwrite is “morally the same” as Unpin, but it is much more powerful (and ergonomic) because it is integrated into the behavior of &mut (of course, this comes at the cost of a complex backwards compatibility story).

Let me describe it this way. Types that do not implement Overwrite cannot be overwritten while borrowed, and hence are “pinned for the duration of the borrow”. This has always been true for &T, but it has traditionally not been true for &mut T. We’ll see in the next post that Pin<&mut T> basically just extends that guarantee to apply indefinitely.

Compare that to types that do not implement Unpin and hence are “address sensitive”. Such types are pinned for the duration of a Pin<&mut T>. Unlike T: !Overwrite types, they are not pinned by &mut T references, but that’s a bug, not a feature: this is why Pin has to bend over backwards to prevent you from getting your hands on an &mut T.

I’ll explain this more in my next post, of course.

Should `Overwrite` be an auto trait?

I think not. If we did so, it would lock people into semver hazards in the “Rust Next” edition where mut is mandatory for mutation. Consider a struct Foo { value: u32 } type. This type has not opted into becoming Copy, but it only contains types that are Copy and therefore Overwrite. By auto trait rules it would by default be Overwrite. But that would prevent you from adding a mut field in the future or benefiting from immutable fields. This is why I said the default would just be !Overwrite, no matter the field types.

Conclusion

After this grandiose intro, hopefully I won’t be printing a retraction of the idea due to some glaring flaw… eep! ↩︎
Whenever I say immutable here, I mean immutable-modulo-Cell, of course. We should probably find another word for that; this is the kind of terminology debt that Rust has bought its way into and I’m not sure of the best way for us to get out! ↩︎
Immutable fields don’t resolve all inter-function borrow conflicts. To do that, you need something like view types. But in my experience they would eliminate many. ↩︎
The simple solution (if a struct has mut fields, disallow overwriting it) is basically what C++ does with their const fields. Classes or structs with const fields are more limited in how you can use them. This works in C++ because they wait until post-substitution to check templates for validity. ↩︎
I love the Felleisen definition of “expressiveness”: two language features are equally expressive if one can be converted into the other with only local rewrites, which I generally interpret as “rewrites that don’t affect the function signature (or other abstraction boundary)”. ↩︎
We can also make the !Overwrite impl implied by declaring fields mut, of course. This is fine for backwards compatibility, but isn’t the design I would want long-term, since it introduces an odd “step change” where declaring one field as mut implicitly declares all other fields as immutable (and, conversely, deleting the mut keyword from that field has the effect of declaring all fields, including that one, as mutable). ↩︎
The Self type in traits is exempt from the Sized default, and it could be exempt from the Overwrite default as well, unless the trait is declared as Sized. ↩︎
Hat tip to TC, who pointed this out to me. ↩︎

More thoughts on claiming

2024-06-26T00:00:00+00:00

This is the first of what I think will be several follow-up posts to “Claiming, auto and otherwise”. This post is focused on clarifying and tweaking the design I laid out previously in response to some of the feedback I’ve gotten. In future posts I want to lay out some of the alternative designs I’ve heard.

TL;DR: People like it

If there’s any one thing I can take away from what I’ve heard, it is that people really like the idea of making working with reference counted or cheaply cloneable data more ergonomic than it is today. A lot of people have expressed a lot of excitement.

If you read only one additional thing from the post—well, don’t do that, but if you must—read the Conclusion. It attempts to restate what I was proposing to help make it clear.

Clarifying the relationship of the traits

I got a few questions about the relationship of the Copy/Clone/Claim traits to one another. I think the best way to show it is with a venn diagram:

The Clone trait is the most general, representing any way of duplicating the value. There are two important subtraits:
- Copy represents values that can be cloned via memcpy and which lack destructors (“plain old data”).
- Claim represents values whose clones are cheap, infallible, and transparent; on the basis of these properties, claims are inserted automatically by the compiler.

Copy and Claim overlap but do not have a strict hierarchical relationship. Some Claim types (like Rc and Arc) are not “plain old data”. And while all Copy operations are infallible, some of them fail to meet Claim’s other conditions:

Copying a large type like [u8; 1024] is not cheap.
Copying a type with interior mutability like Cell is not transparent.
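To make the overlap concrete, here is a small sketch of how the three traits might relate in code. The `Claim` definition below (a subtrait of `Clone` with a defaulted `claim` method) is my shorthand for this example and should be read as an assumption, as should the particular set of impls.

```rust
use std::rc::Rc;

/// Hypothetical sketch of `Claim`: a `Clone` that is cheap, infallible,
/// and transparent. The default body just forwards to `clone`.
trait Claim: Clone {
    fn claim(&self) -> Self {
        self.clone()
    }
}

// `Claim` but not `Copy`: bumping a reference count is cheap,
// yet `Rc` has a destructor and so is not "plain old data".
impl<T> Claim for Rc<T> {}

// Both `Copy` and `Claim`: small plain-old-data values.
impl Claim for u32 {}

// `Copy` but deliberately not `Claim`: `[u8; 1024]` gets no impl here,
// because copying a kilobyte is not "cheap".

fn main() {
    let shared = Rc::new(String::from("hello"));
    let also_shared = shared.claim();
    println!(
        "{} has {} owners",
        also_shared,
        Rc::strong_count(&shared)
    );

    let n = 7_u32;
    println!("claimed {}", n.claim());
}
```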

On heuristics

One challenge with the Claim trait is that the choice to implement it involves some heuristics:

What exactly is cheap? I tried to be specific by saying “O(1) and doesn’t copy more than a few cache lines”, but clearly it will be hard to draw a strict line.
What exactly is infallible? It was pointed out to me that Arc will abort if the ref count overflows (which is one reason why the Rust-for-Linux project rolled their own alternative). And besides, any Rust code can abort on stack overflow. So clearly we need to have some reasonable compromise.
What exactly is transparent? Again, I tried to specify it, but iterator types are an example of types that are technically transparent to copy but where it is nonetheless very confusing to claim them.

An aversion to heuristics is the reason we have the current copy/clone split. We couldn’t figure out where to draw the line (“how much data is too much?”) so we decided to simply make it “memcpy or custom code”. This was a reasonable starting point, but we’ve seen that it is imperfect, leading to uncomfortable compromises.

The thing about “cheap, infallible, and transparent” is that I think it represents exactly the criteria that we really want to represent when something can be automatically claimed. And it seems inherent that those criteria are a bit squishy.

One implication of this is that Claim should rarely if ever appear as a bound on a function. Writing fn foo<T: Claim>(t: T) doesn’t really feel like it adds a lot of value to me, since, given the heuristical nature of claim, it’s going to rule out some uses that may make sense. eternaleye proposed an interesting twist on the original proposal, suggesting we introduce stricter versions of Claim for, say, O(1) Clone, although I don’t yet see what code would want to use that as a bound either.

“Infallible” ought to be “does not unwind” (and we ought to abort if it does)

I originally laid out the conditions for claim as “cheap, infallible, and transparent”, where “infallible” means “cannot panic or abort”. But it was pointed out to me that Arc and Rc in the standard library will indeed abort if the ref-count exceeds std::usize::MAX! This obviously can’t work, since reference counted values are the prime candidate to implement Claim.

Therefore, I think infallible ought to say that “Claim operations should never panic”. This almost doesn’t need to be said, since panics are already meant to represent impossible or extraordinarily unlikely conditions, but it seems worth reiterating since it is particularly important in this case.

In fact, I think we should go further and have the compiler insert an abort if an automatic claim operation does unwind.¹ My reasoning here is the same as I gave in my post on unwinding²:

Reasoning about unwinding is already very hard, it becomes nigh impossible if the sources of unwinding are hidden.
It would make for more efficient codegen if the compiler doesn’t have to account for unwinding, which would make code using claim() (automatically or explicitly) mildly more efficient than code using clone().

I was originally thinking of the Rust For Linux project when I wrote the wording on infallible, but their requirements around aborting are really orthogonal and much broader than Claim itself. They already don’t use the Rust standard library, or most dependencies, because they want to limit themselves to code that treats abort as an absolute last resort. Rather than abort on overflow, their version of reference counting opts simply to leak, for example, and their memory allocators return a Result to account for OOM conditions. I think the Claim trait will work just fine for them whatever we say on this point, as they’ll already have to screen for code that meets their more stringent criteria.

Clarifying `claim` codegen

In my post, I noted almost in passing that I would expect the compiler to still use memcpy at monomorphization time when it knew that the type being claimed implements Copy. One interesting bit of feedback I got was anecdotal evidence that this will indeed be critical for performance.

To model the semantics I want for claim we would need specialization³. I’m going to use a variant of specialization that lcnr first proposed to me; the idea is to have an if impl expression that, at monomorphization time, either takes the true path (if the type implements Foo via always applicable impls) or the false path (otherwise). This is a cleaner formulation for specialization when the main thing you want to do is provide more optimized or alternative implementations.

Using that, we could write a function use_claim_value that defines the code the compiler should insert:

fn use_claim_value<T: Claim>(t: &T) -> T {
    std::panic::catch_unwind(|| {
        if impl T: Copy {
            // Copy T if we can
            *t
        } else {
            // Otherwise clone
            t.clone()
        }
    }).unwrap_or_else(|_| {
        // Do not allow unwinding
        std::process::abort()
    })
}

This has three important properties:

No unwinding, for easier reasoning and better codegen.
Copies if it can.
Always calls clone otherwise.
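The `if impl` expression is hypothetical, so the function above cannot be compiled today. As a rough approximation in current Rust, the sketch below drops the Copy fast path (which needs specialization) and keeps only the "clone, and abort rather than unwind" behavior. The `Claim` trait here is the hypothetical one from the proposal, and the impl for `Rc` is an assumption made for illustration.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::rc::Rc;

// Hypothetical trait from the proposal; it is not part of std.
trait Claim: Clone {}

// Assumption for illustration: reference-counted handles implement Claim.
impl<T> Claim for Rc<T> {}

// Approximates `use_claim_value` without the `if impl T: Copy` fast path.
fn use_claim_value<T: Claim>(t: &T) -> T {
    // AssertUnwindSafe is acceptable here because a panic never escapes:
    // we abort instead of letting anyone observe the interrupted state.
    catch_unwind(AssertUnwindSafe(|| t.clone()))
        .unwrap_or_else(|_| std::process::abort())
}
```

Calling `use_claim_value(&a)` on an `Rc` hands back a second handle and bumps the strong count by one, exactly as `a.clone()` would.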

Conclusion

What I really proposed

Effectively I proposed to change what it means to “use something by value” in Rust. This has always been a kind of awkward concept in Rust without a proper name, but I’m talking about what happens to the value x in any of these scenarios:

let x: SomeType;

// Scenario A: passing as an argument
fn consume(x: SomeType) {}
consume(x);

// Scenario B: assigning to a new place
let y = x;

// Scenario C: captured by a "move" closure
let c = move || x.operation();

// Scenario D: used in a non-move closure
// in a way that requires ownership
let d = || consume(x);

No matter which way you do it, the rules today are the same:

If SomeType: Copy, then x is copied, and you can go on using it later.
Else, x is moved, and you cannot.

I am proposing that, modulo the staging required for backwards compatibility, we change those rules to the following:

If SomeType: Claim, then x is claimed, and you can go on using it later.
Else, x is moved, and you cannot.

To a first approximation, “claiming” something means calling x.claim() (which is the same as x.clone()). But in reality we can be more efficient, and the definition I would use is as follows:

If the compiler sees x is “live” (may be used again later), it transforms the use of x to use_claim_value(&x) (as defined earlier).
If x is dead, then it is just moved.
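To make the live/dead distinction concrete, here is a sketch of what the two cases would look like if written by hand in today's Rust. It uses `Rc` and an explicit `clone()` in place of the proposed automatic claim; the function name `claim_then_move` is made up for this example.

```rust
use std::rc::Rc;

// Returns the strong count after a "live" use and after the last use.
fn claim_then_move() -> (usize, usize) {
    let x = Rc::new(String::from("hello"));

    // `x` is used again below, so under the proposal `let y = x;`
    // would behave like this explicit clone (a claim).
    let y = x.clone();
    let count_after_claim = Rc::strong_count(&x);

    // This is the last use of `x`, so it is simply moved:
    // no reference count increment happens here.
    let z = x;
    let count_after_move = Rc::strong_count(&z);

    drop(y);
    (count_after_claim, count_after_move)
}
```

Both counts come out as 2: the claim adds one handle, and the final move adds none.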

Why I proposed it

There’s a reason I proposed this change in the way that I did. I really value the way Rust handles “by value consumption” in a consistent way across all those contexts. It fits with Rust’s ethos of orthogonal, consistent rules that fit together to make a harmonious, usable whole.

My goal is to retain Rust’s consistency while also closing the gaps in the current rule, which neither highlights the things I want to pay attention to (large copies), hides the things I (almost always) don’t (reference count increments), nor covers all the patterns I sometimes want (e.g., being able to get and set a Cell<Range<T>>, which doesn’t work today because making Range: Copy would introduce footguns). My hope is that we can do this in a way that it benefits most every Rust program, whether it be low-level or high-level in nature.

In fact, I wonder if we could extend RFC #3288 to apply this retroactively to all operations invoked automatically by the compiler, like Deref, DerefMut, and Drop. Obviously this is technically backwards incompatible, but the benefits here could well be worth it in my view, and the code impacted seems very small (who intentionally panics in Deref?). ↩︎
Another blog post for which I ought to post a follow-up! ↩︎
Specialization has definitely acquired that “vaporware” reputation and for good reason—but I still think we can add it! That said, my thinking on the topic has evolved quite a bit. It’d be worth another post sometime. /me adds it to the queue. ↩︎

Claiming, auto and otherwise

2024-06-21T00:00:00+00:00

This blog post proposes adding a third trait, Claim, that would live alongside Copy and Clone. The goal of this trait is to improve Rust’s existing split, where types are categorized as either Copy (for “plain old data”¹ that is safe to memcpy) or Clone (for types that require executing custom code or which have destructors). This split has served Rust fairly well but also has some shortcomings that we’ve seen over time, including maintenance hazards, performance footguns, and (at times quite significant) ergonomic pain and user confusion.

TL;DR

The proposal in this blog post has three phases:

Adding a new Claim trait that refines Clone to identify “cheap, infallible, and transparent” clones (see below for the definition, but it explicitly excludes allocation). Explicit calls to x.claim() are therefore known to be cheap and easily distinguished from calls to x.clone(), which may not be. This makes code easier to understand and addresses existing maintenance hazards (obviously we can bikeshed the name).
Modifying the borrow checker to insert calls to claim() when using a value from a place that will be used later. So given e.g. a variable y: Rc<T>, an assignment like x = y would be transformed to x = y.claim() if y is used again later. This addresses the ergonomic pain and user confusion of reference-counted values in Rust today, especially in connection with closures and async blocks.
Finally, disconnect Copy from “moves” altogether, first with warnings (in the current edition) and then errors (in Rust 2027). In short, x = y would move y unless y: Claim. Most Copy types would also be Claim, so this is largely backwards compatible, but it would let us rule out cases like y: [u8; 1024] and also extend Copy to types like Cell or iterators without the risk of introducing subtle bugs.

For some code, automatically calling Claim may be undesirable. For example, some data structure definitions track reference count increments closely. I propose to address this case by creating an “allow-by-default” automatic-claim lint that crates or modules can opt into so that all “claims” can be made explicit. This is more-or-less the profile pattern, although I think it’s notable here that the set of crates which would want “auto-claim” do not necessarily fall into neat categories, as I will discuss.

Step 1: Introducing an explicit `Claim` trait

Quick, reading this code, can you tell me anything about its performance characteristics?

tokio::spawn({
    // Clone `map` and store it into another variable
    // named `map`. This new variable shadows the original.
    // We can now write code that uses `map` and then go on
    // using the original afterwards.
    let map = map.clone();
    async move { /* code using map */ }
});

/* more code using map */

Short answer: no, you can’t, not without knowing the type of map. The call to map.clone() may be cloning a large map or just incrementing a reference count; you can’t tell.

One-clone-fits-all creates a maintenance hazard

When you’re in the midst of writing code, you tend to have a good idea whether a given value is “cheap to clone” or “expensive”. But this property can change over the lifetime of the code. Maybe map starts out as an Rc<HashMap<K, V>> but is later refactored to HashMap<K, V>. A call to map.clone() will still compile but with very different performance characteristics.

In fact, clone can have an effect on the program’s semantics as well. Imagine you have a variable c: Rc<Cell<T>> and a call c.clone(). Currently this creates another handle to the same underlying cell. But if you refactor c to Cell<T>, that call to c.clone() is now creating an independent cell. Argh. (We’ll see this theme, of the importance of distinguishing interior mutability, come up again later.)
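The semantic shift is easy to reproduce in today's Rust. In this small sketch (the function names are made up for the example), the same `clone()` call either shares the cell or duplicates it, depending only on the type of `c`:

```rust
use std::cell::Cell;
use std::rc::Rc;

// `c` is ref-counted: `clone` yields another handle to the same cell.
fn shared_handle() -> u32 {
    let c = Rc::new(Cell::new(1));
    let d = c.clone();
    d.set(2);
    c.get() // the write through `d` is visible through `c`
}

// `c` is a plain Cell: `clone` yields an independent cell.
fn independent_cell() -> u32 {
    let c = Cell::new(1);
    let d = c.clone();
    d.set(2);
    c.get() // `c` never sees the write to `d`
}
```

`shared_handle()` returns 2 while `independent_cell()` returns 1, even though the bodies differ only in how `c` is constructed.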

Proposal: an explicit `Claim` trait distinguishing “cheap, infallible, transparent” clones

Now imagine we introduced a new trait Claim. This would be a subtrait of Clone that indicates that cloning is:

Cheap: Claiming should complete in O(1) time and avoid copying more than a few cache lines (64-256 bytes on current architectures).
Infallible: Claim should not encounter failures, even panics or aborts, under any circumstances. Memory allocation is not allowed, as it can abort if memory is exhausted.
Transparent: The old and new value should behave the same with respect to their public API.

The trait itself could be defined like so:²

trait Claim: Clone {
    fn claim(&self) -> Self {
        self.clone()
    }
}

Now when I see code calling map.claim(), even without knowing what the type of map is, I can be reasonably confident that this is a “cheap clone”. Moreover, if my code is refactored so that map is no longer ref-counted, I will start to get compilation errors, letting me decide whether I want to clone here (potentially expensive) or find some other solution.
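As a sketch of how this might look at a call site, the snippet below defines the hypothetical `Claim` trait locally and implements it for `Rc`. That impl and the helper `share` are assumptions for illustration; nothing here exists in the standard library.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical trait from the proposal; not part of std.
trait Claim: Clone {
    fn claim(&self) -> Self {
        self.clone()
    }
}

// Assumption for illustration: ref-counted handles are claimable.
impl<T> Claim for Rc<T> {}

// Hands out a second handle to the same map; known to be cheap.
fn share(map: &Rc<HashMap<String, u32>>) -> Rc<HashMap<String, u32>> {
    map.claim()
}

// If the parameter were refactored to `&HashMap<String, u32>`,
// `map.claim()` would stop compiling, because HashMap has no Claim impl
// in this sketch. `map.clone()` would keep compiling and silently
// become a deep copy.
```

The handle returned by `share` points at the same allocation as the original, which `Rc::ptr_eq` confirms.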

Step 2: Claiming values in assignments

In Rust today, values are moved when accessed unless their type implements the Copy trait. This means (among other things) that given a ref-counted map: Rc<HashMap<K, V>>, using the value map will mean that I can’t use map anymore. So e.g. if I do some_operation(map), then that gives my handle to some_operation, preventing me from using it again.

Not all memcopies should be ‘quiet’

The intention of this rule is that something as simple as x = y should correspond to a simple operation at runtime (a memcpy, specifically) rather than something extensible. That, I think, is laudable. And yet the current rule in practice has some issues:

First, x = y can still result in surprising things happening at runtime. If y: [u8; 1024], for example, then a few simple calls like process1(y); process2(y); can easily copy large amounts of data (you probably meant to pass that by reference).
Second, seeing x = y.clone() (or even x = y.claim()) is visual clutter, distracting the reader from what’s really going on. In most applications, incrementing ref counts is simply not that interesting that it needs to be called out so explicitly.
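The first point can be seen in a few lines of today's Rust. The function names below are made up for the example. Both versions compute the same result, but the by-value version semantically copies the whole 1 KiB array at every call (an optimizer may or may not elide that), and nothing at the call site hints at it.

```rust
// Takes the array by value: each call passes a fresh copy of 1024 bytes.
fn checksum_by_value(buf: [u8; 1024]) -> u32 {
    buf.iter().map(|&b| u32::from(b)).sum()
}

// Takes the array by reference: only a pointer is passed.
fn checksum_by_ref(buf: &[u8; 1024]) -> u32 {
    buf.iter().map(|&b| u32::from(b)).sum()
}

// Because `[u8; 1024]` is Copy, `y` stays usable after being passed
// by value, so repeated calls compile without any warning.
fn checksum_twice(y: [u8; 1024]) -> (u32, u32) {
    (checksum_by_value(y), checksum_by_value(y))
}
```

Under the proposal's third step, the by-value calls in `checksum_twice` are the kind of code that would start to warn, since a 1 KiB array would presumably not implement Claim.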

Some things that should implement `Copy` do not

There’s a more subtle problem: the current rule means adding Copy impls can create correctness hazards. For example, many iterator types like std::ops::Range and std::vec::Iter could well be Copy, in the sense that they are safe to memcpy. And that would be cool, because you could put them in a Cell and then use get/set to manipulate them. But we don’t implement Copy for those types because it would introduce a subtle footgun:

let mut iter0 = vec.iter();
let mut iter1 = iter0;
iter1.next(); // does not affect `iter0`

Whether this is surprising or not depends on how well you know Rust – but definitely it would be clearer if you had to call clone explicitly:

let mut iter0 = vec.iter();
let mut iter1 = iter0.clone();
iter1.next();

Similar considerations are the reason we have not made Cell implement Copy.
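Standard library iterators avoid this by not being Copy, but the hazard is easy to reproduce with a hand-written iterator that does opt in. `Countdown` below is a made-up type for illustration only.

```rust
// A made-up iterator that (unwisely) implements Copy.
#[derive(Clone, Copy)]
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }
}

fn copy_iterator_footgun() -> (Option<u32>, Option<u32>) {
    let mut iter0 = Countdown(3);
    // Silent copy, not a move: `iter0` remains usable afterwards.
    let mut iter1 = iter0;
    let from_copy = iter1.next();
    // `iter0` was never advanced, so it yields the same first item.
    let from_original = iter0.next();
    (from_copy, from_original)
}
```

Both calls return `Some(2)`, which is the surprise: advancing the copy leaves the original untouched.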

The clone/copy rules interact very poorly with closures

The biggest source of confusion when it comes to clone/copy, however, is not about assignments like x = y but rather closures and async blocks. Combining ref-counted values with closures is a big stumbling block for new users. This has been true as long as I can remember. Here for example is a 2014 talk at Strangeloop in which the speaker devotes considerable time to the “accidental complexity” (their words, but I agree) they encountered navigating cloning and closures (and, I will note, how the term clone is misleading because it doesn’t mean a deep clone). I’m sorry to say that the situation they describe hasn’t really improved much since then. And, bear in mind, this speaker is a skilled programmer. Now imagine a novice trying to navigate this. Oh boy.

But it’s not just beginners who struggle! In fact, there isn’t really a convenient way to manage the problem of having to clone a copy of a ref-counted item for a closure’s use. At the RustNL unconf, Jonathan Kelley, who heads up Dioxus Labs, described how, in the Cloudflare codebase, they spent significant time trying to find the most ergonomic way to thread context (and these are not Rust novices).

In that setting, they had a master context object cx that had a number of subsystems, each of which was ref-counted. Before launching a new task, they would hand out handles to the subsystems that task required (they didn’t want every task to hold on to the entire context). They ultimately landed on a setup like this, which is still pretty painful:

let _io = cx.io.clone();
let _disk = cx.disk.clone();
let _health_check = cx.health_check.clone();
tokio::spawn(async move {
    do_something(_io, _disk, _health_check)
})

You can make this (in my opinion) mildly better by leveraging variable shadowing, but even then, it’s pretty verbose:

tokio::spawn({
    let io = cx.io.clone();
    let disk = cx.disk.clone();
    let health_check = cx.health_check.clone();
    async move {
        do_something(io, disk, health_check)
    }
})

What you really want is to just write something like this, like you would in Swift or Go or most any other modern language:³

tokio::spawn(async move {
    do_something(cx.io, cx.disk, cx.health_check)
})

“Autoclaim” to the rescue

What I propose is to modify the borrow checker to automatically invoke claim as needed. So e.g. an expression like x = y would be automatically converted to x = y.claim() if y will be used again later. And closures that capture variables in their environment would respect auto-claim as well, so move || process(y) would become { let y = y.claim(); move || process(y) } if y were used again later.

Autoclaim would not apply to the last use of a variable. So x = y only introduces a call to claim if it is needed to prevent an error. This avoids unnecessary reference counting.
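Written out by hand in today's Rust, the closure desugaring would look roughly like the sketch below. It uses `Arc` and `std::thread` as stand-ins (an assumption, to keep the example free of external crates), and `clone()` in place of the proposed `claim()`.

```rust
use std::sync::Arc;
use std::thread;

fn process(y: Arc<Vec<u32>>) -> u32 {
    y.iter().sum()
}

fn autoclaim_closure() -> (u32, u32) {
    let y = Arc::new(vec![1, 2, 3]);

    // What `move || process(y)` would expand to under autoclaim,
    // because `y` is used again after the closure is created.
    let task = {
        let y = y.clone();
        move || process(y)
    };
    let from_thread = thread::spawn(task).join().unwrap();

    // Last use of `y`: it is moved, with no extra ref-count increment.
    let from_here = process(y);

    (from_thread, from_here)
}
```

Both the spawned closure and the final call see the same data and return 6.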

Naturally, if the type of y doesn’t implement Claim, we would give a suitable error explaining that this is a move and the user should insert a call to clone if they want to make a cloned value.

Support opt-out with an allow-by-default lint

There is definitely some code that benefits from having the distinction between moving an existing handle and claiming a new one made explicit. For these cases, what I think we should do is add an “allow-by-default” automatic-claim lint that triggers whenever the compiler inserts a call to claim on a type that is not Copy. This is a signal that user-supplied code is running.

To aid in discovery, I would consider an automatic-operations lint group for these kinds of “almost always useful, but sometimes not” conveniences; effectively adopting the profile pattern I floated at one point, but just by making it a lint group. Crates could then add automatic-operations = "deny" (bikeshed needed) in the [lints] section of their Cargo.toml.

Step 3. Stop using `Copy` to control moves

Adding “autoclaim” addresses the ergonomic issues around having to call clone, but it still means that anything which is Copy can be, well, copied. As noted before that implies performance footguns ([u8;1024] is probably not something to be copied lightly) and correctness hazards (neither is an iterator).

The real goal should be to disconnect “can be memcopied” and “can be automatically copied”⁴. Once we have “autoclaim”, we can do that, thanks to the magic of lints and editions:

In Rust 2024 and before, we warn when x = y copies a value that is Copy but not Claim.
In the next Rust edition (Rust 2027, presumably), we make it a hard error so that the rule is just tied to the Claim trait.

At codegen time, I would still expect us to guarantee that x = y will memcpy and will not invoke y.claim(), since technically the Clone impl may not be the same behavior; it’d be nice if we could extend this guarantee to any call to clone, but I don’t know how to do that, and it’s a separate problem. Furthermore, the automatic_claims lint would only apply to types that don’t implement Copy.⁵
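To see why the memcpy guarantee matters, consider a contrived type whose Clone impl is deliberately not a plain bitwise copy. This is discouraged but legal today; the type `Tagged` is made up for the example.

```rust
// A contrived Copy type whose Clone impl does more than memcpy.
struct Tagged {
    clones: u32,
}

impl Clone for Tagged {
    fn clone(&self) -> Self {
        // Custom code: records that a clone happened.
        Tagged { clones: self.clones + 1 }
    }
}

impl Copy for Tagged {}

fn copy_vs_clone() -> (u32, u32) {
    let y = Tagged { clones: 0 };
    let copied = y; // bitwise copy: the Clone impl does not run
    let cloned = y.clone(); // runs the custom Clone impl
    (copied.clones, cloned.clones)
}
```

The copy keeps `clones` at 0 while the explicit clone reports 1, so replacing `x = y` with `x = y.claim()` for such a type would change behavior.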

Frequently asked questions

All right, I’ve laid out the proposal, let me dive into some of the questions that usually come up.

Are you ??!@$!$! nuts???

I mean, maybe? The Copy/Clone split has been a part of Rust for a long time⁶. But from what I can see in real codebases and daily life, the impact of this change would be a net-positive all around:

Most code gets less clutter and less confusing error messages but the same great Rust taste (i.e., no impact on reliability or performance).
Where desired, projects can enable the lint (declaring that they care about performance as a side benefit). Furthermore, they can distinguish calls to claim (cheap, infallible, transparent) from calls to clone (anything goes).

What’s not to like?

What kind of code would `#[deny(automatic_claims)]`?

That’s actually an interesting question! At first I thought this would correspond to the “high-level, business-logic-oriented code” vs “low-level systems software” distinction, but I am no longer convinced.

For example, I spoke with someone from Rust For Linux who felt that autoclaim would be useful, and it doesn’t get more low-level than that! Their basic constraint is that they want to track carefully where memory allocation and other fallible operations occur, and incrementing a reference count is fine.

I think the real answer is “I’m not entirely sure”; we have to wait and see! I suspect it will be a fairly small, specialized set of projects. This is part of why I think this is a good idea.

Well my code definitely wants to track when ref-counts are incremented!

I totally get that! And in fact I think this proposal actually helps your code:

By setting #![deny(automatic_claims)], you declare up front the fact that reference counts are something you track carefully. OK, I admit not everyone will consider this a pro. Regardless, it’s a 1-time setup cost.
By distinguishing claim from clone, your project avoids surprising performance footguns (this seems inarguably good).
In the next edition, when we no longer make Copy implicitly copy, you further avoid the footguns associated with that (also inarguably good).

Is this revisiting RFC 936?

Ooh, deep cut! RFC 936 was a proposal to split Pod (memcopyable values) from Copy (implicitly memcopyable values). At the time, we decided not to do this.⁷ I am even the one who summarized the reasons. The short version is that we felt it better to have a single trait and lints.

I am definitely offering another alternative aiming at the same problem identified by the RFC. I don’t think this means we made the wrong decision at the time. The problem was real, but the proposed solutions were not worth it. This proposal solves the same problems and more, and it has the benefit of ~10 years of experience.⁸ (Also, it’s worth pointing out that this RFC came two months before 1.0, and I definitely feel it was right to avoid derailing 1.0 with last minute changes – stability without stagnation!)

Doesn’t having these “profile lints” split Rust?

A good question. Certainly on a technical level, there is nothing new here. We’ve had lints since forever, and we’ve seen that many projects use them in different ways (e.g., customized clippy levels or even – like the Linux kernel – a dedicated custom linter). An important invariant is that lints define “subsets” of Rust, they don’t change it. Any given piece of code that compiles always means the same thing.

That said, the profile pattern does lower the cost to adding syntactic sugar, and I see a “slippery slope” here. I don’t want Rust to fundamentally change its character. We should still be aiming at our core constituency of programs that prioritize performance, reliability, and long-term maintenance.

How will we judge when an ergonomic change is “worth it”?

I think we should write up some design axioms. But it turns out we already have a first draft! Some years back Aaron Turon wrote an astute analysis in the “ergonomics initiative” blog post. He identified three axes to consider:

Applicability. Where are you allowed to elide implied information? Is there any heads-up that this might be happening?

Power. What influence does the elided information have? Can it radically change program behavior or its types?

Context-dependence. How much do you have to know about the rest of the code to know what is being implied, i.e. how elided details will be filled in? Is there always a clear place to look?

Aaron concluded that "implicit features should balance these three dimensions. If a feature is large in one of the dimensions, it’s best to strongly limit it in the other two." In the case of autoclaim, the applicability is high (could happen a lot with no heads up) and the context dependence is medium-to-large (you have to know the types of things and traits they implement). We should therefore limit power, and this is why we put clear guidelines on who should implement Claim. And of course for the cases where that doesn’t suffice, the lint can limit the applicability to zero.

I like this analysis. I also want us to consider “who will want to opt-out and why” and see if there are simple steps (e.g., ruling out allocation) we can take which will minimize that while retaining the feature’s overall usefulness.

What about explicit closure autoclaim syntax?

In a recent lang team meeting Josh raised the idea of annotating closures (and presumably async blocks) with some form of syntax that means “they will auto-capture things they capture”. I find the concept appealing because I like having an explicit version of automatic syntax; also, projects that deny automatic_claim should have a lightweight alternative for cases where they want to be more explicit. However, I’ve not seen any actual specific proposal and I can’t think of one myself that seems to carry its weight. So I guess I’d say “sure, I like it, but I would want it in addition to what is in this blog post, not instead of”.

What about explicit closure capture clauses?

Ah, good question! It’s almost like you read my mind! I was going to add to the previous question that I do like the idea of having some syntax for “explicit capture clauses” on closures.

Today, we just have || $body (which implicitly captures paths in $body in some mode) and move || $body (which implicitly captures paths in $body by value).

Some years ago I wrote a draft RFC in a hackmd that I still mostly like (I’d want to revisit the details). The idea was to expand move to let it be more explicit about what is captured. So move(a, b) || $body would capture only a and b by value (and error if $body references other variables). But move(&a, b) || $body would capture a = &a. And move(a.claim(), b) || $body would capture a = a.claim().
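Since `move(...)` capture clauses do not exist, the closest equivalent today is a block that rebinds each capture before a plain `move` closure. The sketch below shows that pattern for the last two forms, with `clone()` standing in for `claim()`; the variable and function names are made up for the example.

```rust
use std::rc::Rc;

fn capture_clauses() -> (usize, usize, usize) {
    let a = Rc::new(String::from("shared"));
    let b = String::from("owned");

    // Roughly what `move(a.claim(), b) || ...` would mean:
    // capture a fresh handle to `a`, and `b` by value.
    let by_claim = {
        let a = a.clone();
        move || a.len() + b.len()
    };
    let count_while_captured = Rc::strong_count(&a);

    // Roughly what `move(&a) || ...` would mean:
    // capture a reference to `a` rather than `a` itself.
    let by_ref = {
        let a = &a;
        move || a.len()
    };

    (count_while_captured, by_claim(), by_ref())
}
```

The function returns `(2, 11, 6)`: two handles exist while the first closure is alive, and each closure sees the captures it was given.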

This is really attacking a different problem, the fact that closure captures have no explicit form, but it also gives a canonical, lighterweight pattern for “claiming” values from the surrounding context.

How did you come up with the name `Claim`?

I thought Jonathan Kelley suggested it to me, but reviewing my notes I see he suggested Capture. Well, that’s a good name too. Maybe even a better one! I’ve already written this whole damn blog post using the name Claim, so I’m not going to go change it now. But I’d expect a proper bikeshed before taking any real action.

I love Wikipedia (of course), but using the name passive data structure (which I have never heard before) instead of plain old data feels very… well, very Wikipedia. ↩︎
In point of fact, I would prefer if we could define the claim method as “final”, meaning that it cannot be overridden by implementations, so that we would have a guarantee that x.claim() and x.clone() are identical. You can do this somewhat awkwardly by defining claim in an extension trait, like so, but it’d be a bit embarrassing to have that in the standard library. ↩︎
Interestingly, when I read that snippet, I had a moment where I thought “maybe it should be async move { do_something(cx.io.claim(), ...) }?”. But of course that won’t work, that would be doing the claim in the future, whereas we want to do it before. But it really looks like it should work, and it’s good evidence for how non-obvious this can be. ↩︎
In effect I am proposing to revisit the decision we made in RFC 936, way back when. Actually, I have more thoughts on this, I’ll leave them to a FAQ! ↩︎
Oooh, that gives me an idea. It would be nice if in addition to writing x.claim() one could write x.copy() (similar to iter.copied()) to explicitly indicate that you are doing a memcpy. Then the compiler rule is basically that it will insert either x.claim() or x.copy() as appropriate for types that implement Claim. ↩︎
I’ve noticed I’m often more willing to revisit long-standing design decisions than others I talk to. I think it comes from having been present when the decisions were made. I know most of them were close calls and often began with “let’s try this for a while and see how it feels…”. Well, I think it comes from that and a certain predilection for recklessness. 🤘 ↩︎
This RFC is so old it predates rfcbot! Look how informal that comment was. Astounding. ↩︎
This seems to reflect the best and worst of Rust decision making. The best because autoclaim represents (to my mind) a nice “third way” in between two extreme alternatives. The worst because the rough design for autoclaim has been clear for years but it sometimes takes a long time for us to actually act on things. Perhaps that’s just the nature of the beast, though. ↩︎

The borrow checker within

2024-06-02T00:00:00+00:00

This post lays out a 4-part roadmap for the borrow checker that I call “the borrow checker within”. These changes are meant to help Rust become a better version of itself, enabling patterns of code which feel like they fit within Rust’s spirit, but run afoul of the letter of its law. I feel fairly comfortable with the design for each of these items, though work remains to scope out the details. My belief is that a-mir-formality will make a perfect place to do that work.

When I refer to the spirit of the borrow checker, I mean the rules of mutation xor sharing that I see as Rust’s core design ethos. This basic rule—that when you are mutating a value using the variable x, you should not also be reading that data through a variable y—is what enables Rust’s memory safety guarantees and also, I think, contributes to its overall sense of “if it compiles, it works”.

Mutation xor sharing is, in some sense, neither necessary nor sufficient. It’s not necessary because there are many programs (like every program written in Java) that share data like crazy and yet still work fine¹. It’s also not sufficient in that there are many problems that demand some amount of sharing – which is why Rust has “backdoors” like Arc<Mutex<T>>, AtomicU32, and (the ultimate backdoor of them all) unsafe.

But to me the biggest surprise from working on Rust is how often this mutation xor sharing pattern is “just right”, once you learn how to work with it². The other surprise has been seeing the benefits over time: programs written in this style are fundamentally “less surprising” which, in turn, means they are more maintainable over time.

In Rust today though there are a number of patterns that are rejected by the borrow checker despite fitting the mutation xor sharing pattern. Chipping away at this gap, helping to make the borrow checker’s rules a more perfect reflection of mutation xor sharing, is what I mean by the borrow checker within.

I saw the angel in the marble and carved until I set him free. — Michelangelo

OK, enough inspirational rhetoric, let’s get to the code.

Ahem, right. Let’s do that.

Step 1: Conditionally return references easily with “Polonius”

Rust 2018 introduced “non-lexical lifetimes” — this rather cryptic name refers to an extension of the borrow checker so that it understood the control flow within functions much more deeply. This change made using Rust a much more “fluid” experience, since the borrow checker was able to accept a lot more code.

But NLL does not handle one important case³: conditionally returning references. Here is the canonical example, taken from Remy’s Polonius update blog post:

fn get_default<'r, K: Hash + Eq + Copy, V: Default>(
    map: &'r mut HashMap<K, V>,
    key: K,
) -> &'r mut V {
    match map.get_mut(&key) {
        Some(value) => value,
        None => {
            map.insert(key, V::default());
            //  ------ 💥 Gets an error today,
            //            but not with polonius
            map.get_mut(&key).unwrap()
        }
    }
}

Remy’s post gives more details about why this occurs and how we plan to fix it. It’s mostly accurate except that the timeline has stretched on more than I’d like (of course). But we are making steady progress these days.
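Until Polonius lands, the usual workarounds restructure the function so that no borrow is conditionally returned. Two variants that compile today are sketched below; the second function name is made up for the example.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Variant 1: the entry API performs a single lookup and returns
// the reference unconditionally.
fn get_default<K: Hash + Eq + Copy, V: Default>(
    map: &mut HashMap<K, V>,
    key: K,
) -> &mut V {
    map.entry(key).or_default()
}

// Variant 2: check first with a short-lived borrow, then look up again.
// This costs an extra lookup but keeps the original shape.
fn get_default_two_lookups<K: Hash + Eq + Copy, V: Default>(
    map: &mut HashMap<K, V>,
    key: K,
) -> &mut V {
    if !map.contains_key(&key) {
        map.insert(key, V::default());
    }
    map.get_mut(&key).unwrap()
}
```

Both behave like the rejected version: an existing value is returned as-is, and a missing key is first filled with `V::default()`.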

Step 2: A syntax for lifetimes based on places

The next step is to add an explicit syntax for lifetimes based on “place expressions” (e.g., x or x.y). I wrote about this in my post Borrow checking without lifetimes. This is basically taking the formulation that underlies Polonius and adding a syntax.

The idea would be that, in addition to the abstract lifetime parameters we have today, you could reference program variables and even fields as the “lifetime” of a reference. So you could write 'x to indicate a value that is “borrowed from the variable x”. You could also write 'x.y to indicate that it was borrowed from the field y of x, and even '(x.y, z) to mean borrowed from either x.y or z. For example:

struct WidgetFactory {
    manufacturer: String,
    model: String,
}

impl WidgetFactory {
    fn new_widget(&self, name: String) -> Widget {
        let name_suffix: &'name str = &name[3..];
                       // ----- borrowed from "name"
        let model_prefix: &'self.model str = &self.model[..2];
                        // ----------- borrowed from "self.model"
    }
}

This would make many of the lifetime parameters we write today unnecessary. For example, the classic Polonius example where the function takes a parameter map: &mut HashMap and returns a reference into the map can be written as follows:

fn get_default<K: Hash + Eq + Copy, V: Default>(
    map: &mut HashMap<K, V>,
    key: K,
) -> &'map mut V {
    //---- "borrowed from the parameter map"
    ...
}
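
For comparison, today's Rust can only say "borrowed from this parameter" indirectly: you declare a lifetime parameter and attach it to both the argument and the result. A small sketch that compiles today (the function first_word and its parameters are invented for illustration):

```rust
// Today: the lifetime `'t` has to be declared and then attached to `text`
// to say "the result is borrowed from `text`, not from `sep`".
fn first_word<'t>(text: &'t str, sep: &str) -> &'t str {
    // `split` always yields at least one piece, so the fallback is only
    // there to avoid an `unwrap`.
    text.split(sep).next().unwrap_or(text)
}
```

With the place-based syntax sketched above, the same signature would presumably read fn first_word(text: &str, sep: &str) -> &'text str, with nothing to declare up front.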

This syntax is more convenient — but I think its bigger impact will be to make Rust more teachable and learnable. Right now, lifetimes are in a tricky place, because

they represent a concept (spans of code) that isn’t normal for users to think explicitly about and
they don’t have any kind of syntax.

Syntax is useful when learning because it allows you to make everything explicit, which is a critical intermediate step to really internalizing a concept — what boats memorably called the dialectical ratchet. Anecdotally I’ve been using a “place-based” syntax when teaching people Rust and I’ve found it is much quicker for them to grasp it.

Step 3: View types and interprocedural borrows

The next piece of the plan is view types, which are a way to have functions declare which fields they access. Consider a struct like WidgetFactory…

struct WidgetFactory {
    counter: usize,
    widgets: Vec<Widget>,
}

…which has a helper function increment_counter…

impl WidgetFactory {
    fn increment_counter(&mut self) {
        self.counter += 1;
    }
}

Today, if we want to iterate over the widgets and occasionally increment the counter with increment_counter, we will encounter an error:

impl WidgetFactory {
    fn increment_counter(&mut self) {...}
    
    pub fn count_widgets(&mut self) {
        for widget in &self.widgets {
            if widget.should_be_counted() {
                self.increment_counter();
                // ^ 💥 Can't borrow self as mutable
                //      while iterating over `self.widgets`
            }
        }    
    }
}

The problem is that the borrow checker operates one function at a time. It doesn’t know precisely which fields increment_counter is going to mutate. So it conservatively assumes that self.widgets may be changed, and that’s not allowed. There are a number of workarounds today, such as writing a “free function” that doesn’t take &mut self but rather takes references to the individual fields (e.g., counter: &mut usize) or even collecting those references into a “view struct” (e.g., struct WidgetFactoryView<'a> { widgets: &'a [Widget], counter: &'a mut usize }) but these are non-obvious, annoying, and non-local (they require changing significant parts of your code).
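
To make the “free function” workaround concrete, here is a minimal sketch that compiles today. The Widget type and its counted field are placeholders invented for the example; only WidgetFactory and the method names come from the text above.

```rust
// Placeholder widget type, invented for this sketch.
struct Widget {
    counted: bool,
}

impl Widget {
    fn should_be_counted(&self) -> bool {
        self.counted
    }
}

struct WidgetFactory {
    counter: usize,
    widgets: Vec<Widget>,
}

// Free function: it takes only the field it needs, so the caller's
// borrow of `self.widgets` is visibly untouched.
fn increment_counter(counter: &mut usize) {
    *counter += 1;
}

impl WidgetFactory {
    pub fn count_widgets(&mut self) {
        for widget in &self.widgets {
            if widget.should_be_counted() {
                // Borrows `self.counter` only, which is disjoint
                // from the `self.widgets` borrow held by the loop.
                increment_counter(&mut self.counter);
            }
        }
    }
}
```

The cost is exactly the non-locality described above: the helper is no longer a method, and every caller has to know which fields to pass.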

View types extend struct types so that instead of just having a type like WidgetFactory, you can have a “view” on that type that includes only a subset of the fields, like {counter} WidgetFactory. We can use this to modify increment_counter so that it declares that it will only access the field counter:

impl WidgetFactory {
    fn increment_counter(&mut {counter} self) {
        //               -------------------
        // Equivalent to `self: &mut {counter} WidgetFactory`
        self.counter += 1;
    }
}

This allows the compiler to compile count_widgets just fine, since it can see that iterating over self.widgets while modifying self.counter is not a problem.⁴

View types also address phased initialization

There is another place where the borrow checker’s rules fall short: phased initialization. Rust today follows the functional programming language style of requiring values for all the fields of a struct when it is created. Mostly this is fine, but sometimes you have structs where you want to initialize some of the fields and then invoke helper functions, much like increment_counter, to create the remainder. In this scenario you are stuck, because those helper functions cannot take a reference to the struct since you haven’t created the struct yet. The workarounds (free functions, intermediate struct types) are very similar.
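
As an illustration of the free-function workaround for phased initialization, here is a small sketch that compiles today. The Report struct and widest_line helper are hypothetical names chosen for the example:

```rust
struct Report {
    lines: Vec<String>,
    widest: usize,
}

// Workaround: a free function over the already-computed field, because
// a `&Report` cannot exist before every field has a value.
fn widest_line(lines: &[String]) -> usize {
    lines.iter().map(|line| line.len()).max().unwrap_or(0)
}

impl Report {
    fn new(text: &str) -> Report {
        // Phase 1: compute the first field as a local variable.
        let lines: Vec<String> = text.lines().map(String::from).collect();
        // Phase 2: derive the remaining field from it.
        let widest = widest_line(&lines);
        // Only now can the struct itself be created.
        Report { lines, widest }
    }
}
```

With view types, widest_line could instead be a method taking something like a {lines} view of self, called while widest is still uninitialized.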

Start with private functions, consider scaling to public functions

View types as described here have limitations. Because the types involve the names of fields, they are not really suitable for public interfaces. They could also be annoying to use in practice, because one will have sets of fields that go together and that have to be manually copied and pasted. All of this is true, but I think it is something that can be addressed later (e.g., with named groups of fields).

What I’ve found is that the majority of times that I want to use view types, it is in private functions. Private methods often do little bits of logic and make use of the struct’s internal structure. Public methods in contrast tend to do larger operations and to hide that internal structure from users. This isn’t a universal law – sometimes I have public functions that should be callable concurrently – but it happens less.

There is also an advantage to the current behavior for public functions in particular: it preserves forward compatibility. Taking &mut self (versus some subset of fields) means that the function can change the set of fields that it uses without affecting its clients. This is not a concern for private functions.

Step 4: Internal references

Rust today cannot support structs whose fields refer to data owned by another. This gap is partially closed through crates like rental (no longer maintained), though more often by modeling internal references with indices. We also have Pin, which covers the related (but even harder) problem of immobile data.

I’ve been chipping away at a solution to this problem for some time. I won’t be able to lay it out in full in this post, but I can sketch what I have in mind, and lay out more details in future posts (I have done some formalization of this, enough to convince myself it works).

As an example, imagine that we have some kind of Message struct consisting of a big string along with several references into that string. You could model that like so:

struct Message {
    text: String,
    headers: Vec<(&'self.text str, &'self.text str)>,
    body: &'self.text str,
}

This message would be constructed in the usual way:

let text: String = parse_text();
let (headers, body) = parse_message(&text);
let message = Message { text, headers, body };

where parse_message is some function like

fn parse_message(text: &str) -> (
    Vec<(&'text str, &'text str)>,
    &'text str
) {
    let mut headers = vec![];
    // ...
    (headers, body)
}

Note that Message doesn’t have any lifetime parameters – it doesn’t need any, because it doesn’t borrow from anything outside of itself. In fact, Message: 'static is true, which means that I could send this Message to another thread:

// A channel of `Message` values:
let (tx, rx) = std::sync::mpsc::channel();

// A thread to consume those values:
std::thread::spawn(move || {
    for message in rx {
        // `message` here has type `Message`
        process(message.body);
    }
});

// Produce them:
loop {
    let message: Message = next_message();
    tx.send(message);
}
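
For contrast, this is roughly what the index-based workaround mentioned at the start of this section looks like today: the struct stores byte ranges into text rather than references. It is only a sketch; the “Name: value” wire format, the parse method, and the helper body_len_on_other_thread are assumptions made up for the example.

```rust
use std::ops::Range;
use std::sync::mpsc;
use std::thread;

// Today's workaround: store byte ranges into `text` instead of `&str`s.
struct Message {
    text: String,
    headers: Vec<(Range<usize>, Range<usize>)>,
    body: Range<usize>,
}

impl Message {
    // Hypothetical toy format: "Name: value" lines, a blank line, the body.
    fn parse(text: String) -> Message {
        let mut headers = Vec::new();
        let mut pos = 0;
        // If no blank separator line is found, the body is empty.
        let mut body = text.len()..text.len();
        for line in text.split('\n') {
            let start = pos;
            pos += line.len() + 1; // step over the '\n'
            if line.is_empty() {
                body = pos.min(text.len())..text.len();
                break;
            }
            if let Some(colon) = line.find(": ") {
                let name = start..start + colon;
                let value = start + colon + 2..start + line.len();
                headers.push((name, value));
            }
        }
        Message { text, headers, body }
    }

    fn header(&self, i: usize) -> (&str, &str) {
        let (name, value) = &self.headers[i];
        (&self.text[name.clone()], &self.text[value.clone()])
    }

    fn body(&self) -> &str {
        &self.text[self.body.clone()]
    }
}

// `Message` owns all of its data, so it can cross a thread boundary.
fn body_len_on_other_thread(message: Message) -> usize {
    let (tx, rx) = mpsc::channel();
    let consumer = thread::spawn(move || {
        let message: Message = rx.recv().unwrap();
        message.body().len()
    });
    tx.send(message).unwrap();
    consumer.join().unwrap()
}
```

The ranges are re-checked against text on every access and nothing ties them to that particular string in the type system, which is the gap internal references are meant to close.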

How far along are each of these ideas?

Roughly speaking…

Polonius – ‘just’ engineering
Syntax – ‘just’ bikeshedding
View types – needs modeling, one or two open questions in my mind⁵
Internal references – modeled in some detail for a simplified variant of Rust, have to port to Rust and explain the assumptions I made along the way⁶

…in other words, I’ve done enough work to convince myself that these designs are practical, but plenty of work remains. :)

How do we prioritize this work?

Whenever I think about investing in borrow checker ergonomics and usability, I feel a bit guilty. Surely something so fun to think about must be a bad use of my time.

Conversations at RustNL shifted my perspective. When I asked people about pain points, I kept hearing the same few themes arise, especially from people trying to build applications or GUIs.

I now think I had fallen victim to the dreaded “curse of knowledge”, forgetting how frustrating it can be to run into a limitation of the borrow checker and not know how to resolve it.

Conclusion

This post proposes four changes attacking some very long-standing problems:

Conditionally returned references, solved by Polonius
No or awkward syntax for lifetimes, solved by an explicit lifetime syntax
Helper methods whose body must be inlined, solved by view types
Can’t “package up” a value and references into that value, solved by interior references

You may have noticed that these changes build on one another. Polonius remodels borrowing in terms of “place expressions” (variables, fields). This enables an explicit lifetime syntax, which in turn is a key building block for interior references. View types in turn let us expose helper methods that can operate on ‘partially borrowed’ (or even partially initialized!) values.

Why these changes won’t make Rust “more complex” (or, if they do, it’s worth it)

You might wonder about the impact of these changes on Rust’s complexity. Certainly they grow the set of things the type system can express. But in my mind they, like NLL before them, fall into that category of changes that will actually make using Rust feel simpler overall.

To see why, put yourself in the shoes of a user today who has written any one of the “obviously correct” programs we’ve seen in this post – for example, the WidgetFactory code we saw in view types. Compiling this code today gives an error:

error[E0502]: cannot borrow `*self` as mutable
              because it is also borrowed as immutable
  --> src/lib.rs:14:17
   |
12 | for widget in &self.widgets {
   |               -------------
   |               |
   |               immutable borrow occurs here
   |               immutable borrow later used here
13 |     if widget.should_be_counted() {
14 |         self.increment_counter();
   |         ^^^^^^^^^^^^^^^^^^^^^^^^
   |         |
   |         mutable borrow occurs here

Despite all our efforts to render it well, this error is inherently confusing. It is not possible to explain why WidgetFactory doesn’t work from an “intuitive” point-of-view because conceptually it ought to work, it just runs up against a limit of our type system.

The only way to understand why WidgetFactory doesn’t compile is to dive deeper into the engineering details of how the Rust type system functions, and that is precisely the kind of thing people don’t want to learn. Moreover, once you’ve done that deep dive, what is your reward? At best you can devise an awkward workaround. Yay 🥳.⁷

Now imagine what happens with view types. You still get an error, but now that error can come with a suggestion:

help: consider declaring the fields
      accessed by `increment_counter` so that
      other functions can rely on that
 7 | fn increment_counter(&mut self) {
   |                      ---------
   |                      |
   |      help: annotate with accessed fields: `&mut {counter} self`

You now have two choices. First, you can apply the suggestion and move on – your code works! Next, at your leisure, you can dig in a bit deeper and understand what’s going on. You can learn about the semver hazards that motivate an explicit declaration here.

Yes, you’ve learned a new detail of the type system, but you did so on your schedule and, where extra annotations were required, they were well-motivated. Yay 🥳!⁸

Reifying the borrow checker into types

There is another theme running through here: moving the borrow checker analysis out from the compiler’s mind and into types that can be expressed. Right now, all types always represent fully initialized, unborrowed values. There is no way to express a type that captures the state of being in the midst of iterating over something or having moved one or two fields but not all of them. These changes address that gap.⁹

This conclusion is too long

I know, I’m like Peter Jackson trying to end “The Return of the King”, I just can’t do it! I keep coming up with more things to say. Well, I’ll stop now. Have a nice weekend y’all.

Well, every program written in Java does share data like crazy, but they do not all work fine. But you get what I mean. ↩︎
And I think learning how to work with mutation xor sharing is a big part of what it means to learn Rust. ↩︎
NLL as implemented, anyway. The original design was meant to cover conditionally returning references, but the proposed type system was not feasible to implement. Moreover, and I say this as the one who designed it, the formulation in the NLL RFC was not good. It was mind-bending and hard to comprehend. Polonius is much better. ↩︎
In fact, view types will also allow us to implement the “disjoint closure capture” rules from RFC 2229 in a more efficient way. Currently a closure using self.widgets and self.counter will store 2 references, kind of an implicit “view struct”. Although we found this doesn’t really affect much code in practice, it still bothers me. With view types they could store 1. ↩︎
To me, the biggest open question for view types is how to accommodate “strong updates” to types. I’d like to be able to do let mut wf: {} WidgetFactory = WidgetFactory {} to create a WidgetFactory value that is completely uninitialized and then permit writing (for example) wf.counter = 0. This should update the type of wf to {counter} WidgetFactory. Basically I want to link the information found in types with the borrow checker’s notion of what is initialized, but I haven’t worked that out in detail. ↩︎
As an example, to make this work I’m assuming some kind of “true deref” trait that indicates that Deref yields a reference that remains valid even as the value being deref’d moves from place to place. We need a trait much like this for other reasons too. ↩︎
That’s a sarcastic “Yay 🥳”, in case you couldn’t tell. ↩︎
This “Yay 🥳” is genuine. ↩︎
I remember years ago presenting Rust at some academic conference and a friendly professor telling me, “In my experience, you always want to get that state into the type system”. I think that professor was right, though I don’t regret not prioritizing it (always a million things to do, better to ask what is the right next step now than to worry about what step might’ve been better in the past). Anyway, I wish I could remember who that was! ↩︎

Unwind considered harmful?

2024-05-02T00:00:00+00:00

I’ve been thinking a wild thought lately: we should deprecate panic=unwind. Most production users I know either already run with panic=abort or use unwinding in a very limited fashion, basically just to run cleanup, not to truly recover. Removing unwinding from most cases, meanwhile, has a number of benefits, allowing us to extend the type system in interesting and potentially very impactful ways. It also removes a common source of subtle bugs. Note that I am not saying we should remove unwinding entirely: that’s not an option, both because of stability and because of Rust’s mission to “deeply integrate” with all kinds of languages and systems.

Unwinding means all code must be able to stop at every point

Unwinding puts a “non-local burden” on the language. The fundamental premise of unwinding is that it should be possible for all code to just stop execution at any point (or at least at any function call) and then be restarted. But this is not always possible. Sometimes code disturbs invariants which must be restored before execution can continue in a reasonable way.

The impact of unwinding was supposed to be contained

In Graydon’s initial sketches for Rust’s design, he was very suspicious of unwinding.¹ Unwinding introduces implicit control flow that is difficult to reason about. Worse, this control flow doesn’t surface during “normal execution”, it only shows up when things go wrong — this can tend to pile up, making a bad situation worse.

The initial idea was that unwinding would be allowed, but it would always unwind the entire active thread. Moreover, since in very early Rust threads couldn’t share state at all (it was more like Erlang), that limited the damage that a thread could do. It was reasonable to assume that programs could recover.

But it escaped its bounds

Over time, both of the invariants that limited unwinding’s scope proved untenable. Most importantly, we added shared-mutability with types like Mutex. This was necessary to cover the full range of use cases Rust aims to cover, but it meant that it was now possible for threads to leave data in a disturbed state. We added “lock poisoning” to account for that, but it’s an ergonomic annoyance and an imperfect solution, and so libraries like parking_lot have simply removed it.
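
A small sketch of what lock poisoning looks like in practice with std::sync::Mutex (the function name poison_demo is made up; the panic message printed to stderr is expected):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Returns whether the mutex ended up poisoned, plus the data it holds.
fn poison_demo() -> (bool, Vec<i32>) {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let data2 = Arc::clone(&data);

    // This thread panics while holding the lock, i.e. possibly in the
    // middle of an update that other threads can observe.
    let result = thread::spawn(move || {
        let mut guard = data2.lock().unwrap();
        guard.push(4);
        panic!("invariant disturbed mid-update");
    })
    .join();
    assert!(result.is_err());

    let poisoned = data.is_poisoned();
    // `lock` now returns `Err`; callers must opt in to seeing the data.
    let contents = match data.lock() {
        Ok(guard) => guard.clone(),
        Err(poison) => poison.into_inner().clone(),
    };
    (poisoned, contents)
}
```

Every later lock() call has to decide what to do with the Err case, which is the ergonomic cost mentioned above.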

We also added catch_unwind, allowing recovery within a thread. This was meant to be used in libraries like rayon that were simulating many logical threads with one OS thread, but it of course opened the door to “catching” exceptions in other scenarios. We added the idea of UnwindSafe to try and discourage abuse, but (in a familiar theme) it’s an ergonomic annoyance and an imperfect solution, and so many folks would prefer to just remove it.
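
To illustrate both catch_unwind and the UnwindSafe speed bump, here is a minimal sketch. The function run_step and its log parameter are hypothetical; the std::panic items are the real standard library API.

```rust
use std::panic::{self, AssertUnwindSafe};

// Runs a two-part "operation" and reports whether it completed.
fn run_step(log: &mut Vec<&'static str>, fail: bool) -> bool {
    // The closure captures `&mut Vec`, which is not `UnwindSafe`, so the
    // compiler rejects it unless we assert that we know what we are doing.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        log.push("started");
        if fail {
            panic!("boom");
        }
        log.push("finished");
    }));
    result.is_ok()
}
```

After a caught panic, log contains "started" but not "finished": the caller keeps running with state that was abandoned halfway, which is precisely what UnwindSafe tries to warn about.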

Unwinding increases binary size and reduces optimization potential

Unwinding is supposed to be a “zero-cost abstraction”, but it’s not really. To start, it requires inserting “landing pads” — basically, the code that will execute when unwinding occurs — which can take up quite a large amount of space in your binary. Folks like Fuchsia have measured binary size improvements of up to 10% by removing unwinding. Second, the need to account for unwinding limits optimizations, because the compiler has to account for more control-flow paths. I don’t have a number for how high of an impact this is, but it’s clearly not zero.

Unwinding puts limits on the borrow checker

Accounting for unwinding also requires the borrow checker to be more conservative. Consider for example the function std::mem::swap. It’d be nice if one could write this in safe code:

fn swap<T>(
    a: &mut T,
    b: &mut T,
) {
    let tmp = *a;
    *a = *b;
    *b = tmp;
}

This code won’t compile today, because let tmp = *a requires moving out of *a, and a is an &mut reference. That would leave the reference in an “incomplete” state, so we don’t allow it. But is that constraint truly needed? After all, the reference is going to be restored a few lines below…?

The reason the borrow checker does not accept code like the above is due to unwinding. In general, if you move out of an &mut, you leave a hole behind that MUST be filled before the function returns. In the function above, it is in fact guaranteed that the hole will be filled before swap returns. But in general there is a very narrow range of code that can safely execute, since any function call (and many other operations besides) can initiate a panic!. And if unwinding occurred, then the code that restores the &mut value would never execute. For this reason, we deemed it not worth the complexity to support moving out of &mut references.
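
What safe code can do today is move a value out while putting another one in at the same moment, so that no hole ever exists. A sketch using std::mem::take and std::mem::replace (the name swap_via_take is made up, and the T: Default bound is a limitation of this approach, not of std::mem::swap):

```rust
use std::mem;

fn swap_via_take<T: Default>(a: &mut T, b: &mut T) {
    // Move the value out of `*a`, leaving `T::default()` behind so the
    // reference is never left pointing at a hole.
    let tmp = mem::take(a);
    // Put the old `*a` into `*b` and get the old `*b` back in one step.
    *a = mem::replace(b, tmp);
}
```

If anything panicked between the two steps, both references would still point at valid values, which is why this version needs no unsafe code.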

Unwinding prevents code from running to completion

If the only cost of unwinding was moving out of &mut and inflated binary sizes, I would think that it’s probably worth it to keep it. But over time it’s become clear to me that this is just one special case of a more general challenge with unwinding, which is that functions simply cannot rely on running to completion. This creates challenges in a number of areas.

Unwinding makes unsafe code really hard to write

If you are writing unsafe code, you have to be very careful to account for possible unwinding. And it can occur in a lot of places! Some of them are obvious, such as when the user gives you a closure and you call it. Others are less obvious, such as when you call a trait method like x.clone() where x has some unknown type T: Clone. Others are downright obscure, such as when you execute vec[i] = new_value and vec is a Vec<T> for some unknown type T: that last one will run the destructor on vec[i], which can panic, and hence can unwind (at least until RFC #3288 is accepted). When developing Rayon, I found I could not feasibly track all the places that unwinding could occur, and thus gave up and just added code to abort if unwinding occurs when I don’t expect it.
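
The “abort if unwinding occurs when I don’t expect it” approach can be sketched with a drop guard. This is an illustration of the general pattern, not Rayon’s actual code; the names AbortOnUnwind and no_unwind are invented here.

```rust
use std::mem;
use std::process;

// Hypothetical guard: if it is ever dropped, abort the whole process.
struct AbortOnUnwind;

impl Drop for AbortOnUnwind {
    fn drop(&mut self) {
        eprintln!("unexpected unwinding, aborting");
        process::abort();
    }
}

// Runs `f` in a region where unwinding is not tolerated.
fn no_unwind<R>(f: impl FnOnce() -> R) -> R {
    let guard = AbortOnUnwind;
    let result = f();
    // Normal exit: disarm the guard so its destructor never runs.
    mem::forget(guard);
    result
}
```

If f panics, unwinding drops guard and the process aborts before any half-updated state can be observed. It is a blunt tool, but it removes the need to reason about every possible panic site inside the region.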

Unwinding makes Must Move types untenable

In a previous blog post I wrote about the idea of must move types. I am not sure if this idea is worth it on balance (although I think it might be, it addresses an awful lot of scenarios) but I think it will not be workable with unwinding. And the reason is the same as everything else: the point of a “must move” type is that it must be moved before the fn ends. This effectively means there is some kind of action you must take. But unwinding assumes you can stop the function at any point, so you can never guarantee that this action gets taken (at least, not in a practical sense, in principle you could set up destructors to take the action, but it would be unworkable I think).

Unwinding is of course useful

I’ve been dunking on unwinding, but it is of course useful (although I suspect less broadly than is commonly believed). The most obvious use case is recovering in an “event-driven” sort of process, like a webserver or perhaps a GUI. We’ve all been to websites that dump a stack trace on our screen. Unwinding is one way that you could implement this sort of recovery in Rust. It’s not, however, the only way. We could look into constructs that leverage process-based recovery, for example. And of course unwinding-based recovery is a bit risky, if there is shared state. Plus, in practice, a good many things that become exceptions in Java are Result-return values in Rust.

For me, the key thing here is that virtually every network service I know of ships either with panic=abort or without really leveraging unwinding to recover, just to take cleanup actions and then exit. This could be done with panic=abort and exit handlers.

One other place that uses unwinding is the salsa framework, which uses it to abort cancelled operations in IDEs. It’s useful there because all the code is side-effect free, so we really can unwind without any impact. But we could always find another solution to the problem.

Unwinding is in fact required…but only in narrow places

I don’t really think Rust should remove support for unwinding, of course. For one thing, there is backwards compatibility to consider. But for another, I think that Rust ought to have the goal that it ultimately supports any low-level thing you might want to do. There are C++ systems that use exceptions, and Rust ought to interoperate with them. But I don’t think that means the default across all of Rust should be unwinding: it’s more like “something you need in a narrow part of your codebase so you can convert to Result”.
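
As a rough sketch of what “convert to Result” at a narrow boundary can look like with today’s APIs (the function call_boundary is hypothetical; a real interop layer with C++ exceptions would involve more than this):

```rust
use std::panic::{self, UnwindSafe};

// Runs `f` and turns a panic into an `Err` carrying the panic message.
fn call_boundary<T>(f: impl FnOnce() -> T + UnwindSafe) -> Result<T, String> {
    panic::catch_unwind(f).map_err(|payload| {
        // Panic payloads are usually a `&'static str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

Code on the caller’s side of call_boundary only ever sees a Result, so it never needs to reason about unwinding passing through it.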

Conclusion

I think the argument for deprecating unwinding boils down to this: unwinding purports to make cheap recovery tenable, but it’s not really reliable in the face of shared state. Meanwhile, it puts limits on what we can do in the language, ultimately decreasing reliability (because we can’t guarantee cleanup is done) and ease of use (borrow checker is stricter, APIs that would require cleanup can’t be written).

How could we deprecate it, though? It would basically become part of the ABI, much like C vs C-unwind. It’d be possible to opt-in on a finer-grained basis. In functions that are guaranteed not to have unwinding, the borrow checker could be more permissive, and must-move types could be supported.

I’m definitely tempted to sketch out what deprecating unwinding might look like in more detail. I’d be curious to hear from folks that rely on unwinding to better understand where it is useful, and if we can find alternatives that meet the need in a more narrowly tailored way!

For a time, we were exploring an alternative approach to panics called signals that didn’t use unwinding at all – the idea was that, for each error condition, you would expose a hook point (a “signal”) that users could customize to control what to do in the case of error. This proved a bit too unfamiliar and kind of a pain in practice, and we wound up backing away from it. Today’s panic hook is sort of a simpler version of that (it doesn’t support in-place recovery, but it does enable in-place cleanup). ↩︎

Sized, DynSized, and Unsized

2024-04-23T00:00:00+00:00

Extern types have been blocked for an unreasonably long time on a fairly narrow, specialized question: Rust today divides all types into two categories — sized, whose size can be statically computed, and unsized, whose size can only be computed at runtime. But for external types what we really want is a third category, types whose size can never be known, even at runtime (in C, you can model this by defining structs with an unknown set of fields). The problem is that Rust’s ?Sized notation does not naturally scale to this third case. I think it’s time we fixed this. At some point I read a proposal — I no longer remember where — that seems like the obvious way forward and which I think is a win on several levels. So I thought I would take a bit of time to float the idea again, explain the tradeoffs I see with it, and explain why I think the idea is a good change.

TL;DR: write `T: Unsized` in place of `T: ?Sized` (and sometimes `T: DynSized`)

The basic idea is to deprecate the ?Sized notation and instead have a family of Sized supertraits. As today, the default is that every type parameter T gets a T: Sized bound unless the user explicitly chooses one of the other supertraits:

/// Types whose size is known at compilation time (statically).
/// Implemented by (e.g.) `u32`. References to `Sized` types
/// are "thin pointers" -- just a pointer.
trait Sized: DynSized { }

/// Types whose size can be computed at runtime (dynamically).
/// Implemented by (e.g.) `[u32]` or `dyn Trait`.
/// References to these types are "wide pointers",
/// with the extra metadata making it possible to compute the size
/// at runtime.
trait DynSized: Unsized { }

/// Types that may not have a knowable size at all (either statically or dynamically).
/// All types implement this, but extern types **only** implement this.
trait Unsized { }

Under this proposal, T: ?Sized notation could be converted to T: DynSized or T: Unsized. T: DynSized matches the current semantics precisely, but T: Unsized is probably what most uses actually want. This is because most users of T: ?Sized never compute the size of T but rather just refer to existing values of T by pointer.
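
The thin versus wide pointer distinction in the doc comments above can be observed directly in today’s Rust. A small sketch (the helper names are made up; the two-word size of slice references reflects current compilers rather than a formal guarantee):

```rust
use std::mem::{size_of, size_of_val};

// References to `Sized` types are thin: a single pointer.
fn is_thin<T>() -> bool {
    size_of::<&T>() == size_of::<usize>()
}

// References to slices carry a length next to the pointer.
fn slice_ref_is_wide<T>() -> bool {
    size_of::<&[T]>() == 2 * size_of::<usize>()
}

// `size_of_val` needs only `T: ?Sized` today (`T: DynSized` under the
// proposal): for a slice it uses the length stored in the wide pointer.
fn bytes_in<T: ?Sized>(value: &T) -> usize {
    size_of_val(value)
}
```

An extern type would have neither a static size nor any metadata to compute one from, which is why it fits in neither of today’s two categories.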

Credit where credit is due?

For the record, this design is not my idea, but I’m not sure where I saw it. I would appreciate a link so I can properly give credit.

Why do we have a default `T: Sized` bound in the first place?

It’s natural to wonder why we have this T: Sized default in the first place. The short version is that Rust would be very annoying to use without it. If the compiler doesn’t know the size of a value at compilation time, it cannot (at least, cannot easily) generate code to do a number of common things, such as store a value of type T on the stack or have structs with fields of type T. This means that a very large fraction of generic type parameters would wind up with T: Sized.

So why the `?Sized` notation?

The ?Sized notation was the result of a lot of discussion. It satisfied a number of criteria.

`?` signals that the bound operates in reverse

The ? is meant to signal that a bound like ?Sized actually works in reverse from a normal bound. When you have T: Clone, you are saying “type T must implement Clone”. So you are narrowing the set of types that T could be: before, it could have been both types that implement Clone and those that do not. After, it can only be types that implement Clone. T: ?Sized does the reverse: before, it can only be types that implement Sized (like u32), but after, it can also be types that do not (like [u32] or dyn Debug). Hence the ?, which can be read as “maybe” — i.e., T is “maybe” Sized.
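
A minimal example of the widening effect, using only today’s syntax (describe is a made-up function name):

```rust
use std::fmt::Debug;

// `T: ?Sized` widens the set of accepted types: `T` may now be `str`,
// `[u32]`, or `dyn Debug`, none of which implement `Sized`.
// `T: Debug` narrows it again to types that can be debug-printed.
fn describe<T: ?Sized + Debug>(value: &T) -> String {
    format!("{:?}", value)
}
```

Without ?Sized, calls like describe("hi") would be rejected, because T would be inferred as str and str is not Sized.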

`?` can be extended to other default bounds

The ? notation also scales to other default traits. Although we’ve been reluctant to exercise this ability, we wanted to leave room to add a new default bound. This power will be needed if we ever adopt “must move” types¹ or add a bound like ?Leak to signal a value that cannot be leaked.

But `?` doesn’t scale well to “differences in degree”

When we debated the ? notation, we thought a lot about extensibility to other orthogonal defaults (like ?Leak), but we didn’t consider extending a single dimension (like Sized) to multiple levels. There is no theoretical challenge. In principle we could say…

T means T: Sized + DynSized
T: ?Sized drops the Sized default, leaving T: DynSized
T: ?DynSized drops both, leaving any type T

…but I personally find that very confusing. To me, saying something “might be statically sized” does not signify that it is dynamically sized.

And `?` looks “more magical” than it needs to

Despite knowing that T: ?Sized operates in reverse, I find that in practice it still feels very much like other bounds. Just like T: Debug gives the function the extra capability of generating debug info, T: ?Sized feels to me like it gives the function an extra capability: the ability to be used on unsized types. This logic is specious, these are different kinds of capabilities, but, as I said, it’s how I find myself thinking about it.

Moreover, even though I know that T: ?Sized “most properly” means “a type that may or may not be Sized”, I find I wind up thinking about it as “a type that is unsized”, just as I think about T: Debug as a “type that is Debug”. Why is that? Well, because ?Sized types may be unsized, I have to treat them as if they are unsized – i.e., refer to them only by pointer. So the fact that they might also be sized isn’t very relevant.

How would we use these new traits?

So if we adopted the “family of sized traits” proposal, how would we use it? Well, for starters, the size_of methods would no longer be defined as T and T: ?Sized…

fn size_of<T>() -> usize {}
fn size_of_val<T: ?Sized>(t: &T) -> usize {}

… but instead as T and T: DynSized …

fn size_of<T>() -> usize {}
fn size_of_val<T: DynSized>(t: &T) -> usize {}

That said, most uses of ?Sized today do not need to compute the size of the value, and would be better translated to Unsized…

impl<T: Unsized> Debug for &T {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { .. }
}

Option: Defaults could also be disabled by supertraits?

As an interesting extension to today’s system, we could say that every type parameter T gets an implicit Sized bound unless either…

There is an explicit weaker alternative (like T: DynSized or T: Unsized);
Or some other bound T: Trait has an explicit supertrait DynSized or Unsized.

This would clarify that trait aliases can be used to disable the Sized default. For example, today, one might create a Value trait that is equivalent to Debug + Hash + Ord, roughly like this:

trait Value: Debug + Hash + Ord {
    // Note that `Self` is the *only* type parameter that does NOT get `Sized` by default
}

impl<T: ?Sized + Debug + Hash + Ord> Value for T {}

But what if, in your particular data structure, all values are boxed and hence can be unsized? Today, you have to repeat ?Sized everywhere:

struct Tree<V: ?Sized + Value> {
    value: Box<V>,
    children: Vec<Tree<V>>,
}

impl<V: ?Sized + Value> Tree<V> { … }
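
Filled in, the “today” version looks roughly like the following and works with an unsized V such as str. The max method is a hypothetical addition, included only to give the impl block some content:

```rust
use std::fmt::Debug;
use std::hash::Hash;

trait Value: Debug + Hash + Ord {}

impl<T: ?Sized + Debug + Hash + Ord> Value for T {}

// `?Sized` has to be repeated on the struct...
struct Tree<V: ?Sized + Value> {
    value: Box<V>,
    children: Vec<Tree<V>>,
}

// ...and again on every impl block.
impl<V: ?Sized + Value> Tree<V> {
    // Largest value in the tree; values are only handled by reference.
    fn max(&self) -> &V {
        let mut best: &V = &self.value;
        for child in &self.children {
            let candidate = child.max();
            if candidate > best {
                best = candidate;
            }
        }
        best
    }
}
```

Here Tree<str> stores each value as a Box<str>. Forgetting ?Sized on any one of these items makes that instantiation fail to compile.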

With this proposal, the explicit Unsized bound could be signaled on the trait:

trait Value: Debug + Hash + Ord + Unsized {
    // Note that `Self` is the *only* type parameter that does NOT get `Sized` by default
}

impl<T: Unsized + Debug + Hash + Ord> Value for T {}

which would mean that

struct Tree<V: Value> { … }

would imply V: Unsized.

Alternatives

Different names

The name of the Unsized trait in particular is a bit odd. It means “you can treat this type as unsized”, which is true of all types, but it sounds like the type is definitely unsized. I’m open to alternative names, but I haven’t come up with one I like yet. Here are some alternatives and the problems with them I see:

Unsizeable — doesn’t meet our typical name conventions, has overlap with the Unsize trait
NoSize, UnknownSize — same general problem as Unsize
ByPointer — in some ways, I kind of like this, because it says “you can work with this type by pointer”, which is clearly true of all types. But it doesn’t align well with the existing Sized trait — what would we call that, ByValue? And it seems too tied to today’s limitations: there are, after all, ways that we can make DynSized types work by value, at least in some places.
MaybeSized — just seems awkward, and should it be MaybeDynSized?

All told, I think Unsized is the best name. It’s a bit wrong, but I think you can understand it, and to me it fits the intuition I have, which is that I mark type parameters as Unsized and then I tend to just think of them as being unsized (since I have to).

Some sigil

Under this proposal, the DynSized and Unsized traits are “magic” in that explicitly declaring them as a bound has the impact of disabling a default T: Sized bound. We could signify that in their names by having their name be prefixed with some sort of sigil. I’m not really sure what that sigil would be — T: %Unsized? T: ?Unsized? It all seems unnecessary.

Drop the implicit bound altogether

The purist in me is tempted to question whether we need the default bound. Maybe in Rust 2027 we should try to drop it altogether. Then people could write

fn size_of<T: Sized>() -> usize {}
fn size_of_val<T: DynSized>(t: &T) -> usize {}

and

impl<T> Debug for &T {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { .. }
}

Of course, it would also mean a lot of Sized bounds cropping up in surprising places. Beyond random functions, consider that every associated type today has a default Sized bound, so you would need

trait Iterator {
    type Item: Sized;
}

Overall, I doubt this idea is worth it. Not surprising: it was deemed too annoying before, and now it has the added problem of being hugely disruptive.

Conclusion

I’ve covered a design to move away from ?Sized bounds and towards specialized traits. There are various “pros and cons” to this proposal, but one aspect in particular feels common to this question and many others: when do you make two “similar but different” concepts feel very different (e.g., via special syntax like T: ?Sized) and when do you make them feel very similar (e.g., via the idea of “special traits” where a bound like T: Unsized has the extra meaning of disabling defaults)?

There is a definite trade-off here. Distinct syntax helps avoid potential confusion, but it forces people to recognize that something special is going on even when that may not be relevant or important to them. This can deter folks early on, when they are most “deter-able”. I think it can also contribute to a general sense of “big-ness” that makes it feel like understanding the entire language is harder.

Over time, I’ve started to believe that it’s generally better to make things feel similar, letting people push off the time at which they have to learn a new concept. In this case, this lessens my fears around the idea that Unsized and DynSized traits would be confusing because they behave differently than other traits. In this particular case, I also feel that ?Sized doesn’t “scale well” to default bounds where you want to pick from one of many options, so it’s kind of the worst of both worlds – distinct syntax that shouts at you but which also fails to add clarity.

Ultimately, though, I’m not wedded to this idea, but I am interested in kicking off a discussion of how we can unblock extern types. I think by now we’ve no doubt covered the space pretty well and we should pick a direction and go for it (or else just give up on extern types).

I still think “must move” types are a good idea — but that’s a topic for another post. ↩︎

Ownership in Rust

2024-04-05T00:00:00+00:00

Ownership is an important concept in Rust — but I’m not talking about the type system. I’m talking about in our open source project. One of the big failure modes I’ve seen in the Rust community, especially lately, is the feeling that it’s unclear who is entitled to make decisions. Over the last six months or so, I’ve been developing a project goals proposal, which is an attempt to reinvigorate Rust’s roadmap process — and a key part of this is the idea of giving each goal an owner. I wanted to write a post just exploring this idea of being an owner: what it means and what it doesn’t.

Every goal needs an owner

Under my proposal, the project will identify its top priority goals, and every goal will have a designated owner. This is ideally a single, concrete person, though it can be a small group. Owners are the ones who, well, own the design being proposed. Just like in Rust, when they own something, they have the power to change it.¹

Just because owners own the design does not mean they work alone. Like any good Rustacean, they should treasure dissent, making sure that when a concern is raised, the owner fully understands it and does what they can to mitigate or address it. But there always comes a point where the tradeoffs have been laid on the table, the space has been mapped, and somebody just has to make a call about what to do. This is where the owner comes in. Under project goals, the owner is the one we’ve chosen to do that job, and they should feel free to make decisions in order to keep things moving.

Teams make the final decision

Owners own the proposal, but they don’t decide whether the proposal gets accepted. That is the job of the team. So, if e.g. the goal in question requires making a change to the language, the language design team is the one that ultimately decides whether to accept the proposal.

Teams can ultimately overrule an owner: they can ask the owner to come back with a modified proposal that weighs the tradeoffs differently. This is right and appropriate, because teams are the ones we recognize as having the best broad understanding of the domain they maintain.² But teams should use their power judiciously, because the owner is typically the one who understands the tradeoffs for this particular goal most deeply.

Ownership is empowerment

Rust’s primary goal is empowerment — and that is as true for the open-source org as it is for the language itself. Our goal should be to empower people to improve Rust. That does not mean giving them unfettered ability to make changes — that would result in chaos, not an improved version of Rust — but when their vision is aligned with Rust’s values, we should ensure they have the capability and support they need to realize it.

Ownership requires trust

There is an interesting tension around ownership. Giving someone ownership of a goal is an act of faith: it means that we consider them to be an individual of high judgment who understands Rust and its values and will act accordingly. This implies to me that we are unlikely to take a goal if the owner is not known to the project. They don’t necessarily have to have worked on Rust, but they have to have enough of a reputation that we can evaluate whether they’re going to do a good job.

The design of project goal proposals includes steps designed to increase trust. Each goal includes a set of design axioms identifying the key tradeoffs that are expected and how they will be weighed against one another. The goal also identifies milestones, which show that the author has thought about how to break up and approach the work incrementally.

It’s also worth highlighting that while the project has to trust the owner, the reverse is also true: the project hasn’t always done a good job of making good on its commitments. Sometimes we’ve asked for a proposal on a given feature and then not responded when it arrives.³ Or we set up unbounded queues that wind up getting overfull, resulting in long delays.

The project goal system has steps to build that kind of trust too: the owner identifies exactly the kind of support they expect to require from the team, and the team commits to provide it. Moreover, the general expectation is that any project goal represents an important priority, and so teams should prioritize nominated issues and the like that are related.

Trust requires accountability

Trust is something that has to be maintained over time. The primary mechanism for that in the project goal system is regular reporting. The idea is that, once we’ve identified a goal, we will create a tracking issue. Bots will prompt owners to give regular status updates on the issue. Then, periodically, we will post a blog post that aggregates these status updates. This gives us a chance to identify goals that haven’t been moving (or at least where no status update has been provided) and take a look to see why.

In my view, it’s expected and normal that we will not make all our goals. Things happen. Sometimes owners get busy with other things. Other times, priorities change and what was once a goal no longer seems relevant. That’s fine, but we do want to be explicit about noticing it has happened. The problem is when we let things live in the dark, so that if you want to really know what’s going on, you have to conduct an exhaustive archaeological expedition through github comments, zulip threads, emails, and sometimes random chats and minutes.

Conclusion

Rust has strong values of being an open, participatory language. This is a good thing and a key part of how Rust has gotten as good as it is. Rust’s design does not belong to any one person. A key part of how we enforce that is by making decisions by consensus.

But people sometimes get confused and think consensus means that everyone has to agree. This is wrong on two levels:

The team must be in consensus, not the RFC thread: in Rust’s system, it’s the teams that ultimately make the decision. There have been plenty of RFCs that the team decided to accept despite strong opposition from the RFC thread (e.g., the ? operator comes to mind). This is right and good. The team has the most context, but the team also gets input from many other sources beyond the people that come to participate in the RFC thread.
Consensus doesn’t mean unanimity: Being in consensus means that a majority agrees with the proposal and nobody thinks that it is definitely wrong. Plenty of proposals are decided where team members have significant, even grave, doubts. But ultimately tradeoffs must be made, and the team members trust one another’s judgment, so sometimes proposals go forward that aren’t made the way you would do it.

The reality is that every good thing that ever got done in Rust had an owner – somebody driving the work to completion. But we’ve never named those owners explicitly or given them a formal place in our structure. I think it’s time we fixed that!

Hat tip to Jack Huey for this turn of phrase. Clever guy. ↩︎
There is a common misunderstanding that being on a Rust team for a project X means you are the one authoring code for X. That’s not the role of a team member. Team members hold the overall design of X in their heads. They review changes and mentor contributors who are looking to make a change. Of course, team members do sometimes write code, too, but in that case they are playing the role of a (particularly knowledgeable) contributor. ↩︎
I still feel bad about delegation. ↩︎

Borrow checking without lifetimes

2024-03-04T00:00:00+00:00

This blog post explores an alternative formulation of Rust’s type system that eschews lifetimes in favor of places. The TL;DR is that instead of having 'a represent a lifetime in the code, it can represent a set of loans, like shared(a.b.c) or mut(x). If this sounds familiar, it should, it’s the basis for polonius, but reformulated as a type system instead of a static analysis. This blog post is just going to give the high-level ideas. In follow-up posts I’ll dig into how we can use this to support interior references and other advanced borrowing patterns. In terms of implementation, I’ve mocked this up a bit, but I intend to start extending a-mir-formality to include this analysis.

Why would you want to replace lifetimes?

Lifetimes are the best and worst part of Rust. The best in that they let you express very cool patterns, like returning a pointer into some data in the middle of your data structure. But they’ve got some serious issues. For one, the notion of a lifetime is rather abstract and hard for people to grasp (“what does 'a actually represent?”). But also Rust is not able to express some important patterns, most notably interior references, where one field of a struct refers to data owned by another field.

So what is a lifetime exactly?

Here is the definition of a lifetime from the RFC on non-lexical lifetimes:

Whenever you create a borrow, the compiler assigns the resulting reference a lifetime. This lifetime corresponds to the span of the code where the reference may be used. The compiler will infer this lifetime to be the smallest lifetime that it can have that still encompasses all the uses of the reference.

Read the RFC for more details.

Replacing a lifetime with an origin

Under this formulation, 'a no longer represents a lifetime but rather an origin – i.e., it explains where the reference may have come from. We define an origin as a set of loans. Each loan captures some place expression (e.g. a or a.b.c) that has been borrowed, along with the mode in which it was borrowed (shared or mut).

Origin = { Loan }

Loan = shared(Place)
     | mut(Place)

Place = variable(.field)*  // e.g., a.b.c
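To make the grammar concrete, here is one way it could be transcribed into Rust data types. This is only a sketch: the type and function names (Place::parse, example_origin) are my own illustrative choices, and representing an origin as a BTreeSet is an assumption, not something the formulation prescribes.

```rust
use std::collections::BTreeSet;

/// A place expression: a variable followed by zero or more field names.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Place {
    variable: String,
    fields: Vec<String>,
}

impl Place {
    /// Parses dotted notation such as `"a.b.c"` (hypothetical helper).
    fn parse(text: &str) -> Place {
        let mut parts = text.split('.').map(str::to_string);
        let variable = parts.next().unwrap_or_default();
        Place { variable, fields: parts.collect() }
    }
}

/// A loan records which place was borrowed and in which mode.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Loan {
    Shared(Place),
    Mut(Place),
}

/// An origin is a set of loans.
type Origin = BTreeSet<Loan>;

/// The origin `{shared(a.b.c), mut(x)}`.
fn example_origin() -> Origin {
    [
        Loan::Shared(Place::parse("a.b.c")),
        Loan::Mut(Place::parse("x")),
    ]
    .into_iter()
    .collect()
}
```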

Defining types

Using origins, we can define Rust types roughly like this (obviously I’m ignoring a bunch of complexity here…):

Type = TypeName < Generic* >
     | & Origin Type
     | & Origin mut Type
     
TypeName = u32 (for now I'll ignore the rest of the scalars)
         | ()  (unit type, don't worry about tuples)
         | StructName
         | EnumName
         | UnionName

Generic = Type | Origin

Here is the first interesting thing to note: there is no 'a notation here! This is because I’ve not introduced generics yet. Unlike Rust proper, this formulation of the type system has a concrete syntax (Origin) for what 'a represents.

Explicit types for a simple program

Having a fully explicit type system also means we can easily write out example programs where all types are fully specified. This used to be rather challenging because we had no notation for lifetimes. Let’s look at a simple example, a program that ought to get an error:

let mut counter: u32 = 22_u32;
let p: & /*{shared(counter)}*/ u32 = &counter;
//       ---------------------
//       no syntax for this today!
counter += 1; // Error: cannot mutate `counter` while `p` is live
println!("{p}");

Apart from the type of p, this is valid Rust. Of course, it won’t compile, because we can’t modify counter while there is a live shared reference p (playground). As we continue, you will see how the new type system formulation arrives at the same conclusion.

Basic typing judgments

Typing judgments are the standard way to describe a type system. We’re going to phase in the typing judgments for our system iteratively. We’ll start with a simple, fairly standard formulation that doesn’t include borrow checking, and then show how we introduce borrow checking. For this first version, the typing judgment we are defining has the form

Env |- Expr : Type

This says, “in the environment Env, the expression Expr is legal and has the type Type”. The environment Env here defines the local variables in scope. The Rust expressions we are looking at for our sample program are pretty simple:

Expr = integer literal (e.g., 22_u32)
     | & Place
     | Expr + Expr
     | Place (read the value of a place)
     | Place = Expr (overwrite the value of a place)
     | ...

Since we only support one scalar type (u32), the typing judgment for Expr + Expr is as simple as:

Env |- Expr1 : u32
Env |- Expr2 : u32
----------------------------------------- addition
Env |- Expr1 + Expr2 : u32

The rule for Place = Expr assignments is based on subtyping:

Env |- Expr : Type1
Env |- Place : Type2
Env |- Type1 <: Type2
----------------------------------------- assignment
Env |- Place = Expr : ()

The rule for &Place is somewhat more interesting:

Env |- Place : Type
----------------------------------------- shared references
Env |- & Place : & {shared(Place)} Type

The rule just says that we figure out the type of the place Place being borrowed (here, the place is counter and its type will be u32) and then we have a resulting reference to that type. The origin of that reference will be {shared(Place)}, indicating that the reference came from Place:

&{shared(Place)} Type
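The rules so far can be read as a small recursive function. The sketch below is an assumption-laden toy, not the real formulation: places are plain variable names, the environment is a HashMap, assignment and subtyping are left out, and all names (type_of, Expr::Borrow, and so on) are invented for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

#[allow(dead_code)] // `Mut` is part of the grammar but unused in this sketch
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Loan {
    Shared(String),
    Mut(String),
}

#[derive(Clone, Debug, PartialEq, Eq)]
enum Type {
    U32,
    /// `& Origin Type`
    Ref(BTreeSet<Loan>, Box<Type>),
}

enum Expr {
    Literal(u32),
    Borrow(String), // `& Place`
    Read(String),   // `Place`
    Add(Box<Expr>, Box<Expr>),
}

/// The environment maps each local variable to its type.
type Env = HashMap<String, Type>;

/// Implements `Env |- Expr : Type` for the rules shown so far.
fn type_of(env: &Env, expr: &Expr) -> Result<Type, String> {
    match expr {
        Expr::Literal(_) => Ok(Type::U32),
        Expr::Read(place) => env
            .get(place)
            .cloned()
            .ok_or_else(|| format!("unknown place `{place}`")),
        // Shared references: the origin is exactly `{shared(Place)}`.
        Expr::Borrow(place) => {
            let ty = type_of(env, &Expr::Read(place.clone()))?;
            let origin = BTreeSet::from([Loan::Shared(place.clone())]);
            Ok(Type::Ref(origin, Box::new(ty)))
        }
        // Addition: both operands must have type `u32`.
        Expr::Add(lhs, rhs) => match (type_of(env, lhs)?, type_of(env, rhs)?) {
            (Type::U32, Type::U32) => Ok(Type::U32),
            (l, r) => Err(format!("cannot add {l:?} and {r:?}")),
        },
    }
}
```

With counter: u32 in the environment, typing &counter yields a reference whose origin is {shared(counter)}, matching the type written out for p in the example program.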

Computing liveness

To introduce borrow checking, we need to phase in the idea of liveness.¹ If you’re not familiar with the concept, the NLL RFC has a nice introduction:

The term “liveness” derives from compiler analysis, but it’s fairly intuitive. We say that a variable is live if the current value that it holds may be used later.

Unlike with NLL, where we just computed live variables, we’re going to compute live places:

LivePlaces = { Place }

To compute the set of live places, we’ll introduce a helper function LiveBefore(Env, LivePlaces, Expr): LivePlaces. LiveBefore() returns the set of places that are live before Expr is evaluated, given the environment Env and the set of places live after the expression. I won’t define this function in detail, but it looks roughly like this:

// `&Place` reads `Place`, so add it to `LivePlaces`
LiveBefore(Env, LivePlaces, &Place) =
    LivePlaces ∪ {Place}

// `Place = Expr` overwrites `Place`, so remove it from `LivePlaces`
LiveBefore(Env, LivePlaces, Place = Expr) =
    LiveBefore(Env, (LivePlaces - {Place}), Expr)

// `Expr1` is evaluated first, then `Expr2`, so the set of places
// live after expr1 is the set that are live *before* expr2
LiveBefore(Env, LivePlaces, Expr1 + Expr2) =
    LiveBefore(Env, LiveBefore(Env, LivePlaces, Expr2), Expr1)
    
... etc ...
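As a rough illustration, the three cases above translate almost directly into Rust. This sketch makes several simplifying assumptions: places are plain strings, the Env argument is dropped because these cases never consult it, and the Literal and Read cases are my own guesses at what the elided rules would say.

```rust
use std::collections::BTreeSet;

type Place = String;
type LivePlaces = BTreeSet<Place>;

enum Expr {
    Literal(u32),
    Borrow(Place),             // `& Place`
    Read(Place),               // `Place`
    Add(Box<Expr>, Box<Expr>), // `Expr + Expr`
    Assign(Place, Box<Expr>),  // `Place = Expr`
}

/// Given the places live *after* `expr`, returns the places live *before* it.
fn live_before(live_after: &LivePlaces, expr: &Expr) -> LivePlaces {
    match expr {
        // Assumed: a literal touches no places.
        Expr::Literal(_) => live_after.clone(),
        // `&Place` reads `Place`, so add it (assumed: a plain read does too).
        Expr::Borrow(place) | Expr::Read(place) => {
            let mut live = live_after.clone();
            live.insert(place.clone());
            live
        }
        // `Place = Expr` overwrites `Place`, so remove it, then account
        // for whatever `Expr` itself reads.
        Expr::Assign(place, value) => {
            let mut live = live_after.clone();
            live.remove(place);
            live_before(&live, value)
        }
        // The right operand is evaluated last, so process it first.
        Expr::Add(lhs, rhs) => {
            let live_after_lhs = live_before(live_after, rhs);
            live_before(&live_after_lhs, lhs)
        }
    }
}
```

For counter = counter + 1 with p live afterwards, the function reports that both counter and p are live beforehand: the assignment kills counter, but the read on the right-hand side revives it.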

Integrating liveness into our typing judgments

To detect borrow check errors, we need to adjust our typing judgment to include liveness. The result will be as follows:

(Env, LivePlaces) |- Expr : Type

This judgment says, “in the environment Env, and given that the function will access LivePlaces in the future, Expr is valid and has type Type”. Integrating liveness in this way gives us some idea of what accesses will happen in the future.

For compound expressions, like Expr1 + Expr2, we have to adjust the set of live places to reflect control flow:

LiveAfter1 = LiveBefore(Env, LiveAfter2, Expr2)
(Env, LiveAfter1) |- Expr1 : u32
(Env, LiveAfter2) |- Expr2 : u32
----------------------------------------- addition
(Env, LiveAfter2) |- Expr1 + Expr2 : u32

We start out with LiveAfter2, i.e., the places that are live after the entire expression. These are also the same as the places live after expression 2 is evaluated, since this expression doesn’t itself reference or overwrite any places. We then compute LiveAfter1 – i.e., the places live after Expr1 is evaluated – by looking at the places that are live before Expr2. This is a bit mind-bending and took me a bit of time to see. The tricky bit here is that liveness is computed backwards, but most of our typing rules (and intuition) tend to flow forwards. If it helps, think of the “fully desugared” version of +:

let tmp0 = Expr1;
    // <-- the set LiveAfter1 is live here (ignoring tmp0, tmp1)
let tmp1 = Expr2;
    // <-- the set LiveAfter2 is live here (ignoring tmp0, tmp1)
tmp0 + tmp1
    // <-- the set LiveAfter2 is live here

Borrow checking with liveness

Now that we know liveness information, we can use it to do borrow checking. We’ll introduce a “permits” judgment:

(Env, LiveAfter) permits Loan

that indicates that “taking the loan Loan would be allowed given the environment and the live places”. Here is the rule for assignments, modified to include liveness and the new “permits” judgment:

(Env, LiveAfter - {Place}) |- Expr : Type1
(Env, LiveAfter) |- Place : Type2
(Env, LiveAfter) |- Type1 <: Type2
(Env, LiveAfter) permits mut(Place)
----------------------------------------- assignment
(Env, LiveAfter) |- Place = Expr : ()

Before I dive into how we define “permits”, let’s go back to our example and get an intuition for what is going on here. We want to declare an error on this assignment:

let mut counter: u32 = 22_u32;
let p: &{shared(counter)} u32 = &counter;
counter += 1; // <-- Error
println!("{p}"); // <-- p is live

Note that, because of the println! on the next line, p will be in our LiveAfter set. Looking at the type of p, we see that it includes the loan shared(counter). The idea then is that mutating counter is illegal because there is a live loan shared(counter), which implies that counter must be immutable.

Restating that intuition:

A set Live of live places permits a loan Loan1 if, for every live place Place in Live, the loans in the type of Place are compatible with Loan1.

Written more formally:

∀ Place ∈ Live {
    (Env, Live) |- Place : Type
    ∀ Loan2 ∈ Loans(Type) { Compatible(Loan1, Loan2) }
}
-----------------------------------------
(Env, Live) permits Loan1

This definition makes use of two helper functions:

Loans(Type) – the set of loans that appear in the type
Compatible(Loan1, Loan2) – defines if two loans are compatible. Two shared loans are always compatible. A mutable loan is only compatible with another loan if the places are disjoint.
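Here is a minimal sketch of the “permits” judgment and its two helpers in Rust. Several things are assumptions on my part rather than part of the formulation: “disjoint” is taken to mean that neither place is a prefix of the other, the environment maps whole places directly to types, a live place missing from the environment makes the judgment fail, and the demo function is a hypothetical driver for the counter example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A place as a path: `["a", "b", "c"]` stands for `a.b.c`.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Place(Vec<String>);

impl Place {
    fn new(path: &str) -> Place {
        Place(path.split('.').map(str::to_string).collect())
    }

    /// Assumption: places are disjoint when neither is a prefix of the other.
    fn is_disjoint_from(&self, other: &Place) -> bool {
        !self.0.starts_with(&other.0) && !other.0.starts_with(&self.0)
    }
}

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Loan {
    Shared(Place),
    Mut(Place),
}

impl Loan {
    fn place(&self) -> &Place {
        match self {
            Loan::Shared(place) | Loan::Mut(place) => place,
        }
    }
}

enum Type {
    U32,
    Ref(BTreeSet<Loan>, Box<Type>),
}

type Env = BTreeMap<Place, Type>;

/// `Loans(Type)`: every loan that appears anywhere in the type.
fn loans(ty: &Type) -> BTreeSet<Loan> {
    match ty {
        Type::U32 => BTreeSet::new(),
        Type::Ref(origin, inner) => {
            let mut all = origin.clone();
            all.extend(loans(inner));
            all
        }
    }
}

/// `Compatible(Loan1, Loan2)`: shared loans always coexist; anything
/// involving a mutable loan requires disjoint places.
fn compatible(a: &Loan, b: &Loan) -> bool {
    match (a, b) {
        (Loan::Shared(_), Loan::Shared(_)) => true,
        _ => a.place().is_disjoint_from(b.place()),
    }
}

/// `(Env, Live) permits Loan1`
fn permits(env: &Env, live: &BTreeSet<Place>, loan: &Loan) -> bool {
    live.iter().all(|place| match env.get(place) {
        Some(ty) => loans(ty).iter().all(|held| compatible(loan, held)),
        None => false,
    })
}

/// Is `mut(counter)` permitted (a) while `p` is live, (b) once `p` is dead?
fn demo() -> (bool, bool) {
    let counter = Place::new("counter");
    let p = Place::new("p");
    let mut env = Env::new();
    env.insert(counter.clone(), Type::U32);
    let origin = BTreeSet::from([Loan::Shared(counter.clone())]);
    env.insert(p.clone(), Type::Ref(origin, Box::new(Type::U32)));

    let write = Loan::Mut(counter);
    let p_live = BTreeSet::from([p]);
    let nothing_live = BTreeSet::new();
    (permits(&env, &p_live, &write), permits(&env, &nothing_live, &write))
}
```

While p is live, its type carries the loan shared(counter), which conflicts with mut(counter), so the write is rejected. Once p is no longer live, nothing stands in the way.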

Conclusion

The goal of this post was to give a high-level intuition. I wrote it from memory, so I’ve probably overlooked a thing or two. In follow-up posts though I want to go deeper into how the system I’ve been playing with works and what new things it can support. Some high-level examples:

How to define subtyping, and in particular the role of liveness in subtyping
Important borrow patterns that we use today and how they work in the new system
Interior references that point at data owned by other struct fields and how it can be supported

If this is not obvious to you, don’t worry, it wasn’t obvious to me either. It turns out that using liveness in the rules is the key to making them simple. I’ll try to write a follow-up about the alternatives I explored and why they don’t work later on. ↩︎

What I'd like to see for Async Rust in 2024 🎄

2024-01-03T00:00:00+00:00

Well, it’s that time of year, when thoughts turn to…well, Rust of course. I guess that’s every time of year. This year was a pretty big year for Rust, though I think a lot of what happened was more in the vein of “setting things up for success in 2024”. So let’s talk about 2024! I’m going to publish a series of blog posts about different aspects of Rust I’m excited about, and what I think we should be doing. To help make things concrete, I’m going to frame 2024 using proposed project goals – basically a specific piece of work I think we can get done this year. In this first post, I’ll focus on async Rust.

What we did in 2023

On Dec 28, with the release of Rust 1.75.0, we stabilized async fn and impl trait in traits. This is a really big deal. Async fn in traits has been “considered hard” since 2019 and they’re at the foundation of basically everything that we need to do to make async better.

Async Rust to me showcases the best and worst of Rust. It delivers on that Rust promise of “high-level code, low-level performance”. Building on the highly tuned Tokio runtime, network services in Rust consistently have tighter tail latency and lower memory usage, which means you can service a lot more clients with a lot less resources. Alternatively, because Rust doesn’t hardcode the runtime, you can write async Rust code that targets embedded environments that don’t even have an underlying operating system, or anywhere in between.

And yet it continues to be true that, in the words of an Amazon engineer I talked to, “Async Rust is Rust on hard mode”. Truly closing this gap requires work in the language, standard library, and the ecosystem. We won’t get all the way there in 2024, but I think we can make some big strides.

Proposed goal: Solve the send bound problem in Q2

We made a lot of progress on async functions in traits last year, but we still can’t cover the use case of generic traits that can be used either with a work-stealing executor or without one. One very specific example of this is the Service trait from tower. To handle this use case, we need a solution to the send bound problem. We have a bunch of ideas for what this might be, and we’ve even got a prototype implementation for (a subset of) return type notation, so we are well positioned for success. I think we should aim to finish this by the end of Q2 (summer, basically). This in turn would unblock a 1.0 release of the tower crate, letting us have a stable trait for middleware.

Proposed goal: Stabilize an MVP for async closures in Q3

The holy grail for async is that you should be able to easily make any synchronous function into an asynchronous one. The 2019 MVP supported only top-level functions and inherent methods. We’ve now extended that to include trait methods. In 2024, we should take the next step and support async closures. This will allow people to define combinator methods like iterator map and so forth and avoid the convoluted workarounds currently required.

For this first goal, I think we should be working to establish an MVP. Recently, Errs and I outlined an MVP we thought seemed quite doable. It began with creating AsyncFn traits that mirror the Fn trait hierarchy…

trait AsyncFnOnce<A> {
    type Output;
    
    async fn call_once(self, args: A) -> Self::Output;
}

trait AsyncFnMut<A>: AsyncFnOnce<A> {
    async fn call_mut(&mut self, args: A) -> Self::Output;
}

trait AsyncFn<A>: AsyncFnMut<A> {
    async fn call(&self, args: A) -> Self::Output;
}

…and the ability to write async closures like async ||, as well as a bridge such that any function that returns a future also implements the appropriate AsyncFn traits. Async closures would unblock us from creating combinator traits, like a truly nice version of async iterators.
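To see what the “bridge” is standing in for, here is a sketch of how a combinator that accepts an async callback is commonly written on stable Rust: two type parameters, one for the function and one for the future it returns. The names map_async, double, and example are illustrative, and block_on is a deliberately naive busy-polling executor included only so the example is self-contained; it is not how a real runtime works.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Applies `f` to each item in order, awaiting every returned future.
/// Without `AsyncFn` traits, the callback needs a second type parameter
/// (`Fut`) to name the future it returns.
async fn map_async<F, Fut>(items: Vec<u32>, f: F) -> Vec<u32>
where
    F: Fn(u32) -> Fut,
    Fut: Future<Output = u32>,
{
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        out.push(f(item).await);
    }
    out
}

/// An ordinary `async fn` already fits the `Fn(u32) -> Fut` shape.
async fn double(x: u32) -> u32 {
    x * 2
}

/// A waker that does nothing; enough for futures that never really wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls the future in a loop until it completes.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn example() -> Vec<u32> {
    block_on(map_async(vec![1, 2, 3], double))
}
```

A closure returning an async block, such as |x| async move { x + 1 }, works with map_async as well. The pattern gets awkward once the returned future needs to borrow from the closure’s captured state, which is one of the gaps dedicated AsyncFn traits are meant to close.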

This MVP is not intended as the final state, but it is intended to be compatible with whatever final state we wind up with. There remains a really interesting question about how to integrate the AsyncFn traits with the regular Fn traits. Nonetheless, I think we can stabilize the above MVP in parallel with exploring that question.

Proposed goal: Author an RFC for “maybe async” in Q4 (or decide not to!)

One of the big questions around async is whether we should be supporting some way to write “maybe async” code. This idea has gone through a lot of names. Yosh and Oli originally kicked off something they called keyword generics and later rebranded as effect generics. I prefer the framing of trait transformers, and I wrote a blog post about how trait transformers can make async closures fit nicely.

There is significant skepticism about whether this is a good direction. There are other ways to think about async closures (though Errs pointed out an issue with this that I hope to write about in a future post). Boats has written a number of blog posts with concerns, and members of the types team have expressed fear about what will be required to write code that is generic over effects. These concerns make a lot of sense to me!

Overall, I still believe that something like trait transformers could make Rust feel simpler and help us scale to future needs. But I think we have to prove our case! My goal for 2024 then is to do exactly that. The idea would be to author an RFC laying out a “maybe async” scheme and to get that RFC accepted. To address the concerns of the types team, I think that will require modeling “maybe async” formally as part of a-mir-formality, so that everybody can understand how it will work.

Another possible outcome here is that we opt to abandon the idea. Maybe the complexity really is infeasible. Or maybe the lang design doesn’t feel right. I’m good with that too, but either way, I think we need to settle on a plan this year.

Stretch goal: stabilize generator syntax

As a stretch goal, it would be really cool to land support for generator expressions – basically a way to write async iterators. Errs recently opened a PR adding nightly support for async and RFC #3513 proposed reserving the gen keyword for Rust 2024. Really stabilizing generators, however, requires us to answer some interesting questions about the best design for the async iteration trait. Thanks to the stabilization of async fn in trait, we can now have this conversation – and we have certainly been having it! Over the last month or so there has also been a lot of interesting back and forth about the best setup. I’m still digesting all the posts, but I hope to put up some thoughts this month (no promises). Regardless, I think it’s plausible that we could see async generators land in 2024, which would be great, as it would eliminate the major reason that people have to interact directly with Pin.

Conclusion: looking past 2024

If we accomplish the goals I outlined above, async Rust by the end of 2024 will be much improved. But there will still be a few big items before we can really say that we’ve laid out the pieces we need. Sadly, we can’t do it all, so these items would have to wait until after 2024, though I think we will continue to experiment and discuss their design:

Async drop: Once we have async closures, there remains one place where you cannot write an async function – the Drop trait. Async drop has a bunch of interesting complications (Sabrina wrote a great blog post on this!), but it is also a major pain point for users. We’ll get to it!
Dyn async trait: Besides send bounds, the other major limitation for async fn in trait is that traits using them do not yet support dynamic dispatch. We should absolutely lift this, but to me it’s lower in priority because there is an existing workaround of using a proc-macro to create a DynAsyncTrait type. It’s not ideal, but it’s not as fundamental a limitation as send bounds or the lack of async closures and async drop. (That said, the design work for this is largely done, so it is entirely possible that we land it this year as a drive-by piece of work.)
Traits for being generic over runtimes: Async Rust’s ability to support runtimes as varied as Tokio and Embassy is one of its superpowers. But the fact that switching runtimes or writing code that is generic over what runtime it uses is very hard to impossible is a key pain point, made even worse by the fact that runtimes often don’t play nice together. We need to build out traits for interop, starting with async read + write but eventually covering task spawning and timers.
Better APIs: Many of the nastiest async Rust bugs come about when users are trying to manage nested tasks. Existing APIs like FuturesUnordered and select have a lot of rough edges and can easily lead to deadlock – Tyler had a good post on this. I would like to see us take a fresh look at the async APIs we offer Rust programmers and build up a powerful, easy to use library that helps steer people away from potential sources of deadlock. Ideally this API would not be specific to the underlying runtime, but instead let users switch between different runtimes, and hopefully cleanly support embedded systems (perhaps with limited functionality). I don’t think we know how to do this yet, and I think that doing it will require us to have a lot more tools (things like send bounds, async closure, and quite possibly trait transformers or async drop).

Being Rusty: Discovering Rust's design axioms

2023-12-07T00:00:00+00:00

To your average Joe, being “rusty” is not seen as a good thing.¹ But readers of this blog know that being Rusty – with a capital R! – is, of course, something completely different! So what is it that makes Rust Rust? Our slogans articulate key parts of it, like fearless concurrency, stability without stagnation, or the epic Hack without fear. And there is of course Lindsey Kuper’s epic haiku: “A systems language / pursuing the trifecta: / fast, concurrent, safe”. But I feel like we’re still missing a unified set of axioms that we can refer back to over time and use to guide us as we make decisions. Some of you will remember the Rustacean Principles, which was my first attempt at this. I’ve been dissatisfied with them for a couple of reasons, so I decided to try again. The structure is really different, so I’m calling it Rust’s design axioms. This post documents the current state – I’m quite a bit happier with it! But it’s not quite there yet. So I’ve also got a link to a repository where I’m hoping people can help improve them by opening issues with examples, counter-examples, or other thoughts.

Axioms capture the principles you use in your decision-making process

What I’ve noticed is that when I am trying to make some decision – whether it’s a question of language design or something else – I am implicitly bringing assumptions, intuitions, and hypotheses to bear. Oftentimes, those intuitions fly by very quickly in my mind, and I barely even notice them. Ah yeah, we could do X, but if we did that, it would mean Y, and I don’t want that, scratch that idea. I’m slowly learning to be attentive to these moments – whatever Y is right there, it’s related to one of my design axioms – something I’m implicitly using to shape my thinking.

I’ve found that if I can capture those axioms and write them out, they can help me down the line when I’m facing future decisions. It can also help to bring alignment to a group of people by making those intuitions explicit (and giving people a chance to refute or sharpen them). Obviously I’m not the first to observe this. I’ve found Amazon’s practice of using tenets to be quite useful², for example, and I’ve also been inspired by things I’ve read online about the importance of making your hypotheses explicit.³

In proof systems, your axioms are the things that you assert to be true and take on faith, and from which the rest of your argument follows. I choose to call these Rust’s design axioms because that seemed like exactly what I was going for. What are the starting assumptions that, followed to their conclusion, lead you to Rust? The more clearly we can articulate those assumptions, the better we’ll be able to ensure that we continue to follow them as we evolve Rust to meet future needs.

Axioms have a hypothesis and a consequence

I’ve structured the axioms in a particular way. They begin by stating the axiom itself – the core belief that we assert to be true. That is followed by a consequence, which is something that we do as a result of that core belief. To show you what I mean, here is one of the Rust design axioms I’ve drafted:

Rust users want to surface problems as early as possible, and so Rust is designed to be reliable. We make choices that help surface bugs earlier. We don’t make guesses about what our users meant to do, we let them tell us, and we endeavor to make the meaning of code transparent to its reader. And we always, always guarantee memory safety and data-race freedom in safe Rust code.

Axioms have an ordering and earlier things take priority

Each axiom is useful on its own, but where things become interesting is when they come into conflict. Consider reliability: that is a core axiom of Rust, no doubt, but is it the most important? I would argue it is not. If it were, we wouldn’t permit unsafe code, or at least not without a safety proof. I think our core axiom is actually that Rust is meant to be used, and used for building a particular kind of program. I articulated it like this:

Rust is meant to empower everyone to build reliable and efficient software, so above all else, Rust needs to be accessible to a broad audience. We avoid designs that will be too complex to be used in practice. We build supportive tooling that not only points out potential mistakes but helps users understand and fix them.

When it comes to safety, I think Rust’s approach is eminently practical. We’ve designed a safe type system that we believe covers 90-95% of what people need to do, and we are always working to expand that scope. To get that last 5-10%, we fall back to unsafe code. Is this as safe and reliable as it could be? No. That would require 100% proofs of correctness. There are systems that do that, but they are maintained by a small handful of experts, and that idea – that systems programming is just for “wizards” – is exactly what we are trying to get away from.

To express this in our axioms, we put accessible as the top-most axiom. It defines the mission overall. But we put reliability as the second in the list, since that takes precedence over everything else.

The design axioms I really like

Without further ado, here is my current list of design axioms. Well, part of it. These are the axioms that I feel pretty good about. The ordering also feels right to me.

We believe that…

Rust is meant to empower everyone to build reliable and efficient software, so above all else, Rust needs to be accessible to a broad audience. We avoid designs that will be too complex to be used in practice. We build supportive tooling that not only points out potential mistakes but helps users understand and fix them.

Rust users want to surface problems as early as possible, and so Rust is designed to be reliable. We make choices that help surface bugs earlier. We don’t make guesses about what our users meant to do, we let them tell us, and we endeavor to make the meaning of code transparent to its reader. And we always, always guarantee memory safety and data-race freedom in safe Rust code.

Rust users are just as obsessed with quality as we are, and so Rust is extensible. We empower our users to build their own abstractions. We prefer to let people build what they need than to try (and fail) to give them everything ourselves.

Systems programmers need to know what is happening and where, and so system details and especially performance costs in Rust are transparent and tunable. When building systems, it’s often important to know what’s going on underneath the abstractions. Abstractions should still leave the programmer feeling like they’re in control of the underlying system, such as by making it easy to notice (or avoid) certain types of operations.

…where earlier things take precedence.

The design axioms that are still a work-in-progress

These axioms are things I am less sure of. It’s not that I don’t think they are true. It’s that I don’t know yet if they’re worded correctly. Maybe they should be combined together? And where, exactly, do they fall in the ordering?

Rust users want to focus on solving their problem, not the fiddly details, so Rust is productive. We favor APIs where the most convenient and high-level option is also the most efficient one. We support portability across operating systems and execution environments by default. We aren’t explicit for the sake of being explicit, but rather to surface details we believe are needed.

N✕M is bigger than N+M, and so we design for composability and orthogonality. We are looking for features that tackle independent problems and build on one another, giving rise to N✕M possibilities.

It’s nicer to use one language than two, so Rust is versatile. Rust can’t be the best at everything, but we can make it decent for just about anything, whether that’s low-level C code or high-level scripting.

Of these, I like the first one best. Also, it follows the axiom structure better, because it starts with a hypothesis about Rust users and what they want. The other two are a bit older and I hadn’t adopted that convention yet.

Help shape the axioms!

My ultimate goal is to author an RFC endorsing these axioms for Rust. But I need help to get there. Are these the right axioms? Am I missing things? Should we change the ordering?

I’d love to know what you think! To aid in collaboration, I’ve created a nikomatsakis/rust-design-axioms github repository. It hosts the current state of the axioms and also has suggested ways to contribute.

I’ve already opened issues for some of the things I am wondering about, such as:

nikomatsakis/rust-design-axioms#1: Maybe we need a “performant” axiom? Right now, the idea of “zero-cost abstractions” and “the default thing is also the most efficient one” feels a bit smeared across “transparent and tunable” and “productive”.
nikomatsakis/rust-design-axioms#2: Is “portability” sufficiently important to pull out from “productivity” into its own axiom?
nikomatsakis/rust-design-axioms#3: Are “versatility” and “orthogonality” really expressing something different from “productivity”?

Check it out!

I have a Google alert for “Rust” and I cannot tell you how often it seems that some sports team or another shakes off Rust. I’d never heard that expression before signing up for this Google alert. ↩︎
I’m perhaps a bit unusual in my love for things like Amazon’s Leadership Principles. I can totally understand why, to many people, they seem like corporate nonsense. But if there’s one theme I’ve seen consistently over my time working on Rust, it’s that process and structure are essential. Take a look at the “People Systems” keynote that Aaron, Ashley, and I gave at RustConf 2018 and you will see that theme running throughout. So many of Rust’s greatest practices – things like the teams or RFCs or public, rfcbot-based decision making – are an attempt to take some kind of informal, unstructured process and give it shape. ↩︎
I really like this Learning for Action page, which I admit I found just by googling for “strategy articulate a hypotheses”. I’m less into this super corporate-sounding LinkedIn post, but I have to admit I think it’s right on the money. ↩︎

Project Goals

2023-11-28T00:00:00+00:00

Lately I’ve been iterating on an idea I call project goals. Project goals are a new kind of RFC that defines a specific goal that a specific group of people hope to achieve in a specific amount of time – for example, “Rusty Spoon Corp proposes to fund 2 engineers full time to stabilize collections that support custom memory allocations by the end of 2023”.

Project goals would also include asks from various teams that are needed to complete the goal. For example, “Achieving this goal requires a dedicated reviewer from the compiler team along with an agreement from the language design team to respond to RFCs or nominated issues within 2 weeks.” The decision of whether to accept a goal would be up to those teams who are being asked to support it. If those teams approve the RFC, it means they agree with the goal, and also that they agree to commit those resources.

My belief is that project goals become a kind of incremental, rolling roadmap, declaring our intent to fix specific problems and then tracking our follow-through (or lack thereof). As I’ll explain in the post, I believe that a mechanism like project goals will help our morale and help us to get shit done, but I also think it’ll help with a bunch of other ancillary problems, such as providing a clearer path to get involved in Rust as well as getting more paid maintainers and contributors.

At the moment, project goals are just an idea. My plan is to author some sample goals to iron out the process and then an RFC to make it official.

Driving a goal in the Rust project is an uncertain process

Rust today has a lot of half-finished features waiting for people to invest time into them. But figuring out how to do so can be quite intimidating. You may have to trawl through github or Zulip threads to figure out what’s going on. Once you’ve done that, you’ll likely have to work through some competing constraints to find a proposed solution. But that stuff isn’t the real problem. The real problem is that, once you’ve invested that time and done that work, you don’t really know whether anyone will care enough about your work to approve it. There’s a good chance you’ll author an RFC, or a PR, and nobody will even respond to it.

Rust teams today often operate in a fairly reactive mode, without clear priorities. The official Rust procedures are almost exclusively ‘push’, and often based on evaluating artifacts, not intentions – people decide on a problem they would like to see solved, and write an RFC or a PR to drive it forward; the teams decide whether to accept that work. But there is no established way to get feedback from the team on whether this is a problem – or an approach to the problem – that would be welcome. Or, even if the team does theoretically want the work, there is no real promise from the team that they’ll respond, nor accountability when they do not.

We do try to be proactive and talk about our goals. Teams sometimes post lists of aspirations or roadmaps to Inside Rust, for example, and we used to publish annual roadmaps as a project. But these documents have never seemed very successful to me. There is a fundamental tension that is peculiar to open source: the teams are not the ones doing the work. Teams review and provide feedback. Contributors do the work, and ultimately they decide what they will work on (or if they will do work at all). It’s hard to plan for the kinds of things you will do when you don’t know what resources you have. A more reliable barometer of the Rust project’s priorities has been to read the personal blogs of the people doing the work, where they talk about the goals they personally plan to drive.

This uncertainty holds back investment

The uncertainty involved in trying to push an idea forward in Rust is a major deterrent for companies thinking about investing in Rust. I hear about this gap from virtually every angle:

Imagine you’re a developer who wants to use paid time to work on open source. How do you convince your manager it makes sense? Right now, the best you can do is “I think I can make progress, and besides, it’s the right thing to do!”
Imagine you’re a contractor who wants to deliver for a client. They want to pay you to help drive a feature over the finish line – but you can’t be sure if you’re going to be able to deliver, since it will require consensus from a Rust team, and it’s unclear whether it meets their priorities.
Imagine you’re a CTO considering whether to adopt Rust for your company. You see that there are gaps in an area, but you don’t know whether that is something the project is actively looking to close, or what.
Or maybe you’re a CTO who has adopted Rust and is looking to “give back” to the community by contributing. You want to help deliver support for a feature you need and that you know a lot of people in the community would like, but you can’t figure out how to get started, and you can’t afford to have an engineer or two work on something for months without a return.

But some things work really well and we don’t want to lose those

Rust’s development may be chaotic, but there’s a beauty to it as well. As Mara’s classic blog post put it, “Rust is not a company”. Rust’s current structure allows for a feature to make progress in fits and starts, which means we can accommodate many different interest levels and motivations. Someone who is motivated can author and contribute an RFC, and then disappear. Somebody else can pick up the ball and move the implementation forward. And yet a third person can drive the docs and stabilization over the finish line. This is not only cool to watch, it also means that some features get done that would never be “top priority”. Consider let-else – this is one of the most popular features from the last few years, and yet, compared against core enablers like “async fn in trait”, it clearly takes second place in the priority list. But that’s fine, there are plenty of folks who don’t have the time or expertise to work on async fn in trait, but they can move let-else forward. It’s really important to me that we don’t lose this.

Proposal: project goal RFCs

So, top-down roadmaps are a poor fit for open-source. But working purely bottom-up has its own downsides. What can we do?

My proposal is to form roadmaps, but to do it bottom-up, via a new kind of RFC called a project goal RFC. A regular RFC proposes a solution to a problem. A project goal RFC proposes a plan to solve a particular problem in a particular timeframe. This could be specific, like “stabilize support for async closures in 2024”, or it could be more general, like “land nightly support for managing resource cleanup in async functions in 2024”. What it can’t be is non-actionable, such as “simplify async programming in 2024” or “make async Rust nice in 2024”.

Project goal RFCs are opened by the goal owners, the people proposing to do the work. They are approved by the teams which will be responsible for approving that work.¹ The RFC serves as a kind of contract: the owners will drive the work and the team will review that work and/or provide other kinds of support (such as mentorship).

Project goal RFCs are aimed squarely at larger projects

Project goal RFCs are not appropriate for all projects. In fact, they’re not appropriate for most projects. They are meant for larger, flagship projects, the kind where you want to be sure that the project is aligned around the goals before you start investing heavily. Here are some examples where I think project goal RFCs would be useful…

The async WG set an “unofficial” project goal of shipping async functions in traits this year (coming Dec 28!). Honestly, setting a goal like this felt a bit uncomfortable, as we didn’t have a means to make it “official and blessed”. I think that would have also helped during the push to stabilization, since we could reference this goal to help make the case for “time to ship”.
Goals might also take the shape of internal improvements. The types team is driving a flagship goal to ship a new trait solver. Authoring a project goal RFC would help bring visibility to this work and would also make it easier to make the case for funding work on this project.
I sometimes help to mentor collaborations with people in universities or with Master’s students. Project goals would let us set expectations up front about what work we expect to do during that time.
I’d like to drive consensus around the idea of easing tradeoffs with profiles – but I don’t want to start off with an RFC that is going to focus discussion on the details of how profiles are specified. I want to start off by getting alignment around whether to do something like profiles at all. Wearing my Amazon manager hat, having alignment there would also influence whether I allocated some of our team’s bandwidth to work on that. A project goal could be perfect for that.
The Foundation has run several project grant programs, and one of the challenges has been trying to choose projects to fund which will be welcomed by the project. As I’ve been saying, we don’t really have a mechanism for making those sorts of decisions.
The embedded working group or the Rust For Linux folks have a bunch of pain points. I think it’s been hard for us to manage cooperation between those really important efforts and the other Rust teams. Developing a joint project goal would be a way to highlight needs.
Someone who wants to work on Rust at their company could work with a team to develop an official goal that they can show to their manager to get authorized work time.
Companies that want to invest in Rust to close gaps could propose project goals. For example, I frequently get asked how a company can help move custom allocators forward. One candidate that comes up a lot is support for custom allocators and collections with fallible allocation. This same mechanism would also allow larger companies to propose goals that they’d like to drive. For example, there was a recent RFC on debugger visualization aimed at better support for debugging Rust in Windows. I could imagine folks from Microsoft proposing some goals in that area.

Anatomy of a project goal RFC

Project goal RFCs need to include enough detail that both the owners and the teams know what they are signing up for. I believe a project goal RFC should answer the following questions:

Why is this work important?
What work will be done on what timeframe?
- This should include…
  - milestones you will meet along the way,
  - specific use-cases you plan to address,
  - and guiding principles that will be used during design.
Who will be doing the work, and how much time will they have?
What support is needed and from which Rust teams?

The list above is intentionally somewhat detailed. Project goal RFCs are not meant to be used for everything. They are meant to be used for goals that are big enough that doing the planning is worthwhile. The planning also helps the owners and the teams set realistic timelines. (My assumption is that the first few project goals we set will be wildly optimistic, and over time we learn to temper our expectations.)

Why is this work important?

Naturally whenever we propose to do something, it is important to explain why this thing is worth doing. A quality project goal will lay out the context and motivation. The goal is for the owners to explain to the team why the team should dedicate their maintenance bandwidth to this feature. It’s also a space for the owners to explain to the world why they feel it’s worth their time to do the work to develop this feature.

What will be done and on what timeframe?

The heart of the project goal is declaring what work is to be done and when it will be done by. It’s important that this “work to be done” is specific enough to be evaluated. For example, “make async nice next year” is not a good goal. Something like “stabilize async closures in 2024” is good. It’s also ok to just talk about the problem to be solved, if the best solution isn’t known yet. For example, “deliver nightly support for managing resource cleanup in async programs in 2025” is a good goal that could be solved by “async drop” but also by some other means.

Scaling work with timeframes and milestones

Goals should always include a specific timeframe, such as “in 2024” or “in 2025”. I think these timeframes will typically be about a year. If the time is too short, then the work is probably not significant enough to call it a goal. But if the timeframe is much longer than a year, then it’s probably best to scale back the “work to be done” to something more intermediate.

Of course, many goals will be part of a bigger project. For example, if one took a goal to deliver nightly support for something in 2024, then the next year, one might propose a goal to stabilize that support.

Ideally, the goal will also include milestones along the way. For example, if the goal is to have something stable in 1 year, it might begin with an RFC after 3 months, then 3 months of impl, 3 months of gaining experience, and 3 months for stabilization.

Pinning things down with use-cases

Unlike a feature RFC, a project goal RFC does not specify a precise design for the feature in question. Even if the project goal is something relatively specific, like “add support for async functions in traits”, there will still be a lot of ambiguity about what counts as success. For example, we decided to stabilize async functions in traits without support for send bounds. This means that some use cases, notably a crate like tower, aren’t supported yet. Does this count as success? To help pin this down, the project goal should include a list of use cases that it is trying to address.

Establishing guiding principles early

Finally, especially when goals involve a fair bit of design leeway, it is useful to lay down some of the guiding principles the goal owners expect to use. I think having discussion around these principles early will really help focus discussions later on. For example, when discussing how dynamic dispatch for async functions in traits should work, Tyler Mandry and I had an early goal that it should “just work” for simple cases but give the ability to customize behavior. But we quickly found that ran smack into Josh’s prioritization of allocation transparency. This conflict was predictable and I think it would have been useful to have had the discussion around these tenets early as a lang team, rather than waiting.²

Who will be doing the work, and how much time will they have?

Part of the goal is specifying who is going to be doing the work. For example, the goal might say “two developers to work at 50% time”. It might also say something more flexible, like “one developer to create quest issues and then mentor a group of volunteers to drive most of the work”. If possible, including specific names is useful too, particularly in more specialized areas. For example, “Ralf Jung and one graduate student will pursue an official set of rules for stacked borrows”.

What support is needed and from which Rust teams?

This section is where the project goal owners make asks of the project. Here are some typical asks that I expect we will have:

A dedicated reviewer for PRs to the compiler and an expected SLA of reviews within 3 days (or 1 week, or something).
An agreement from the lang team to review and provide feedback on RFCs.
Mentorship on some aspect or other.

I think teams should suggest the expected shape of asks and track their resources. For example, the lang team can probably manage only a small number of “prioritized RFCs” at a time, so if there are more project goals, they may have to wait or accept a lower SLA.

Tracking progress

One of the interesting things about project goals is that they give us an immediate roadmap. I would like to see the project author a quarterly report – which means every 12 weeks, or two release cycles. This report would include all the current project goals and updates on their progress. Did they make their declared milestones? If not, why not? Because project goals don’t cover the entirety of the work we do, the report could also include other significant developments. This would be published on the main Rust blog and would let people follow along with Rust development and get a sense for our current trajectory.

One thing I’ve learned, though: you can’t require the goal owners to author that blog post. It would be much better to have a dedicated person or team authoring the blog posts and pinging the goal owners to get those status updates. Preparing an update so that it can be understood by a mass audience is its own sort of skill. Moreover, goal owners will be tempted to put it off, and the updates won’t happen. I think it’s quite important that these project updates happen every quarter, like clockwork, just as our Rust releases do. This is true even if the update has to ship without an update from some goals.

I envision this progress tracking as providing a measure of accountability. When somebody takes a goal, we’ll be able to follow along with their progress. I’ve seen at Amazon and elsewhere that having written down a goal and declared milestones, and then having to say whether you’ve met them, helps to keep teams focused on getting the job done. I often find that I have a job about 95% done but then, in the week before I have to write an update about it, I’m inspired to go and finish that last 5%.

Conclusion: next steps

My next step is that I am going to fashion an RFC making the case for project goals. This RFC will include a template. To try out the idea, I plan to also author an example project goal for “async function in traits” and perhaps some other ongoing or proposed efforts. In truth, I don’t think we need an RFC to do project goals – nothing is stopping us from accepting whatever RFC we want – but I see some value in spelling out and legitimizing the process. I think this probably ought to be approved by the governance council, which is an interesting test for that new group.

There are some follow-up questions worth discussing. One of the ones I think is most interesting is how to manage the quarterly project updates. This deserves a post of its own. The short version of my opinion is that I think it’d be great to have an open source “reporting” team that has the job of authoring this update and others of its ilk. I suspect that this team would work best if we had one or more people paid to participate and to bear the brunt of some of the organizational lift. I further suspect that the Foundation would be a good place for at least one of those people. But this is getting pretty speculative by now and I’d have to make the case to the board and Rust community that it’s a good use for the Foundation budget, which I certainly have not done.

It’s worth noting that I see project goal RFCs as just one piece of a larger puzzle that is giving a bit more structure to our design effort. One thing I think went wrong in prior efforts was that we attempted to be too prescriptive and too “one size fits all”. These days I tend to think that the only thing we must have to add a new feature to stable is an FCP-binding decision from the relevant team(s). All the rest, whether it be authoring a feature RFC or creating a project goal RFC, are steps that make sense for projects of a certain magnitude, but not everything. Our job then should be to lay out the various kinds of RFCs one can write and when they are appropriate for use, and then let the teams judge how and when to request one.

In theory, anyway. In practice, I imagine that many team maintainers may keep some draft project goal RFCs in their pocket, looking for someone willing to do the work. ↩︎
The question of how to make dyn async traits easy to use and transparent remains unresolved, which is partly why I’m keen on something like profiles. ↩︎

Idea: "Using Rust", a living document

2023-10-20T00:00:00+00:00

A few years back, the Async WG tried something new. We collaboratively authored an Async Vision Doc. The doc began by writing “status quo” stories, written as narratives from our cast of characters, that described how people were experiencing Async Rust at that time and then went on to plan a “shiny future”. This was a great experience. My impression was that authoring the “status quo” stories in particular was really helpful. Discussions at EuroRust recently got me wondering: can we adapt the “status quo” stories to something bigger? What if we could author a living document on the Rust user experience? One that captures what people are trying to do with Rust, where it is working really well for them, and where it could use improvement. I love this idea, and the more I thought about it, the more I saw opportunities to use it to improve other processes, such as planning, public communication, and RFCs. But I’m getting ahead of myself! Let’s dive in.

TL;DR

I think authoring a living document (working title: “Using Rust”) that collects “status quo” stories could be a tremendous resource for the Rust community. I’m curious to hear from folks who might like to be part of a group authoring such a document, especially (but not only) people with experience as product managers, developer advocates, or UX researchers.

Open source is full of ideas, but which to do?

The Rust open-source organization is a raucous, chaotic, and, at its best, joyful environment. People are bubbling with ideas on how to make things better (some better than others). There are also a ton of people who want to be involved, but don’t know what to do. This sounds great, but it presents a real challenge: how do you decide which ideas to do?

The vast majority of ideas for improvement tend to be incremental. They take some small problem and polish it. If I sound disparaging, I don’t mean to be. This kind of polish is absolutely essential. It’s kind of ironic: there’s always been a perception that open source can’t build a quality product, but my experience has often been the opposite. Open source means that people show up out of nowhere with PRs that remove sharp edges. Sometimes it’s an edge you knew was there but didn’t have time to fix; other times it’s a problem you weren’t aware of, perhaps because of the Curse of Knowledge.

But finding those revolutionary ideas is harder. To be clear, it’s hard in any environment, but I think it’s particularly hard in open source. A big part of the problem is that open source has always focused on coding as our basic currency. Discussions tend to orient around specific proposals – that could be as small as a PR or as large as an RFC. But finding a revolutionary idea doesn’t start from coding or from a specific idea.

It all starts with the “status quo”

So how do we go about having more “revolutionary ideas”? My experience is that it begins by deeply understanding the present moment. It’s amazing how often we take the “status quo” for granted. We assume that we know the problems people experience, and we assume that everybody else knows them too. In reality, we only know the problems that we personally experience – and most of the time we are not even fully aware of those!

One thing I remember from authoring the async vision doc is how hard it was to focus on the “status quo” – and how rewarding it was when we did! When you get people talking about the problems they experience, the temptation is to immediately jump to how to fix the problem. But if you resist that, and you force yourself to just document the current state, you’ll find you have a much richer idea of the problem.¹ And that richer understanding, in turn, gives rise to better ideas for how to fix it.

Idea: a living “Using Rust” document

So here is my idea: what if we created a living document, working title “Using Rust”, that aims to capture the “status quo” of Rust today:

What are people building with Rust?
How are people’s Rust experiences influenced by their background (e.g., prior programming experience, native language, etc)?
What is working well?
What challenges are they encountering?

Just as with the Async Vision Doc, I imagine “Using Rust” would cover the whole gamut of experiences, including not just the language itself but tooling, libraries, etc. Unlike the vision doc, I wouldn’t narrow it to async (though we might start by focusing on a particular domain to prove out the idea).

Like the vision doc, I imagine “Using Rust” would be composed of a series of vignettes, expressed in narrative form, using a similar set of personas² to the Async Vision Doc (perhaps with variations, like Spanish-speaking Alano instead of Alan).

I personally found the narratives really helpful to get the emotional “heft” of some of the stories. For example, “Alan started trusting the Rust compiler, but then… async” helped drive home the importance of that “if it compiles, it works” feeling for Rust users, as well as the way that panics can undermine it. Even though these are narratives, they can still dive deep into technical details. Researching and writing “Barbara battles buffered streams”, for example, really helped me to appreciate the trickiness of async cancellation’s semantics.³

I don’t think “Using Rust” would ever be finished, nor would I narrow it to one domain. Rather, I imagine it being a living document, one that we continuously revise as Rust changes.

Improving on the async vision doc

The async vision doc experience was great, but I learned a few things along the way that I would do differently now. One of them is that collecting stories is good, but synthesizing them is better (and harder). I also found that people telling you the stories are not always the right ones to author them. Last time, we had a lot of success with people authoring PRs, but many times people would tell a story, agree to author a PR, and then never follow up. This is pretty standard for open source but it also applies a sort of “selection bias” to the stories we got. I would address both of these problems by dividing up the roles. Rust users would just have to tell their stories. There would be a group of maintainers who would record those stories and then go try to author the PRs that integrate into “Using Rust”.

The other thing I learned is that trying to author a single shiny future does not work. It was meant to be a unifying vision for the group, but there are just too many variables at play to reach consensus on that. We should definitely be talking about where we will be in 5 years, but we don’t have to be entirely aligned on it. We just have to agree on the right next steps. My new plan is to integrate the “shiny future” into RFCs, as I describe below.

Maintaining “Using Rust”

In the fullness of time, and presuming it works out well, I think “Using Rust” should be a rust-lang project, owned and maintained by its own team. My working title for this team is the User Research Team, which has the charter of gathering up data on how people use Rust and putting that data into a form that makes it accessible to the rest of the Rust project. But I tend to think it’s better to prove out ideas before creating the team, so I think I would start with an experimental project, and create the team once we demonstrate the concept is working.

Gathering stories

So how would this team go about gathering data? There are so many ways. When doing the async vision doc, we got some stories submitted by PRs on the repo. We also ran writing sessions where people would come and tell us about their experiences.

I think it’s very valuable to have people gather “in depth” data from within specific companies. For the Async Vision Doc, I also interviewed team members, culminating in the “meta-story” “Alan extends an AWS service”. Tyler Mandry and I also met with members from Google, and I recall we had folks from Embark and a few other companies reach out to tell us about their experiences.

Another really cool idea that came from Pietro Albini: set up a booth at various Rust conferences where people can come up and tell you about their stories. Or perhaps we can run a workshop. So many possibilities!

Integrating “Using Rust” with the RFC process

The purpose of an RFC, in my mind, is to lay out a problem and a specific solution to that problem. The RFC is not code. It doesn’t have to be a complete description of the problem. But it should be complete enough that people can imagine how the problem is going to be solved.

Every RFC includes a motivation, but when I read those motivations, I am often a bit at a loss as to how to evaluate them. Clearly there is some kind of problem. But is it important? How does it rank with respect to other problems that users are encountering?

I imagine that the “Using Rust” doc would help greatly here. I’d like to get to the point where the motivation for RFCs is primarily addressing particular stories or aspects of stories within the document. We would then be able to read over other related stories to get a sense for how this problem ranks compared to other problems for that audience, and thus how important the motivation is.

RFCs can also include a section that “retells” the story to explain how it would have played out had this feature been available. I’ve often found that doing this helps me to identify obvious gaps. For example, maybe we are adding a nifty new syntax to address an issue, but how will users learn about it? Perhaps we can add a “note” to the diagnostic to guide them.

Frequently asked questions

Will this help us in cross-team collaboration?

Like any organization, the Rust organization can easily wind up “shipping its org chart”. For example, if I see a problem, as a lang-team member, I may be inclined to ship a language-based solution for it; similarly, I’ve seen that the embedded community works very hard to work within the confines of Rust as it is, whereas sometimes they could be a lot more productive if we added something to the language.

Although it is not a complete solution, I think having a “Using Rust” document will be helpful. Focusing on describing the problem means it can be presented to multiple teams and each can evaluate it to decide where the best solution lies.

What about other kinds of stories?

I’ve focused on stories about Rust users, but I think there are other kinds of stories we might want to include. For example, what about the trials and travails of Alan, Barbara, Grace, and Niklaus as they try to contribute to Rust?

How will we avoid “scenario solving”?

Scenario solving refers to a pattern where a feature is made to target various specific examples rather than being generalized to address a pattern of problems. It’s possible that if we write out user stories, people will design features to target exactly the problems that they read about, rather than observing that a whole host of problems can be addressed via a single solution. That is true, and I think teams will want to watch out for that. At the same time, I think that having access to a full range of stories will make it much easier to see those large patterns and to help identify the full value for a proposal.

What about a project management team?

From time to time there are proposals to create a “project management” team. There are many different shapes for what such a team would do, but the high-level motivation is to help provide “overall guidance” and ensure coherence between the Rust teams. I am skeptical about any idea that sounds like an “overseer” team. I trust the Rust teams to own and maintain their area. But I do think we can all benefit from getting more alignment on the sets of problems to be solved, which I think this “Using Rust” document would help to create. I can also imagine other interesting mechanisms that build on the doc, such as reviewing stories as a group online, or at “unconferences”.

Call to action: get in touch!

I’m feeling pretty excited about this project. I’m contemplating how to go about organizing it. I’m really interested to hear from people who would like to take part as authors and collators of user stories. If you think you’d be interested to participate, please send me an email. I’m particularly interested to hear from people with experience doing this sort of work (e.g., product managers, developer advocates, UX researchers).

If you’re hearing resonance of the wisdom of the Buddha, it was not intentional when I wrote this, but you are not alone. ↩︎
The personas/characters may look simple, but developing that cast of characters took a lot of work. Finding a set that is small enough to be memorable but which captures the essentials is hard work. One key insight was separating out the projects people are building from the characters building them, since otherwise you get a combinatorial explosion. ↩︎
Async cancellation is an area I desperately want to return to! I still think we want some kind of structured-concurrency-like solution. My current thinking is roughly that we want something like moro for task-based concurrency and something like Yosh’s merged streams for handling “expect one of many possible message”-like scenarios. ↩︎

Eurorust reflections

2023-10-14T00:00:00+00:00

I’m on the plane back to the US from Belgium now and feeling grateful for having had the chance to speak at the EuroRust conference¹. EuroRust was the first Rust-focused conference that I’ve attended since COVID (though not the first conference overall). It was also the first Rust-focused conference that I’ve attended in Europe since…ever, from what I recall.² Since many of us were going to be in attendance, the types team also organized an in-person meetup which took place for 3 days before the conference itself³. Both the meetup and the conference were great in many ways, and sparked a lot of ideas. I think I’ll be writing blog posts about them for weeks to come, but I thought that to start, I’d write up something general about the conference itself, and some of my takeaways from the experience.

It’s great to talk to people using Rust

When I started on Rust, I figured the project was never going to go anywhere — I mean, come on, we were making a new programming language. What are the odds it’ll be a success? But it still seemed like fun. So I set myself a simple benchmark: I will consider the project a success the first time I see an announcement where somebody built something cool with it, and I didn’t know them beforehand. In those days, everybody using Rust was also hanging out on IRC or on the mailing list.

Well, that turned out to be a touch on the conservative side. These days, Rust has gotten big enough that the core project itself is just a small piece of the action. It’s just amazing to hear all the things people are using Rust for. Just looking at the conference sponsors alone, I loved meeting the Shuttle and Tauri/CrabNebula teams and I got excited about playing with both of them. I had a great time talking to the RustRover team about the possibilities for building custom diagnostics and the ways we could leverage their custom GUI to finally get past the limitations of the terminal when we present error messages. But one of my favorite parts happened on the tram ride home, when I randomly met the maintainer of PyO3. Such a cool project, and definite inspiration for work I’ve been doing lately, like duchess.

Rust teachers everywhere

Speaking of Shuttle and Tauri, both of them are interesting in a particular way: they are empowerment efforts in their own right, and so they attract people whose primary interest is not Rust itself, but rather achieving some other goal (e.g., cloud development, or building a GUI application). It’s cool to see Rust empowering people to build other empowerment apps, but it’s also a fascinating source of data. Both of those projects have started embarking on efforts to teach Rust precisely because that will help grow their userbase. The Shuttle blog has all kinds of interesting articles⁴; the Tauri folks told me about their efforts to build Rust articles specifically targeting JavaScript and TypeScript programmers, which required careful choice of terminology and concepts.

The whole RustFest idea seems to have really worked

At some point, RustFest morphed from a particular conference into a kind of ‘meta conference’ organization, helping others to organize and run their own events. Looking over the calendar of Rust events in Europe, I have to say, that looks like it’s worked out pretty dang well: there’s EuroRust, RustLab in Italy, Rust Nation in the UK, and probably a bunch more that I’m not aware of. Hats off to y’all on that.

I should also say that meeting the conference organizers at this conference was very nice. Both the EuroRust organizers (Marco and Sarah, from Mainmatter) were great to talk to, and I finally got to meet Ernest (now organizing Rust Nation in the UK), whom I’ve talked to on and off over the years but never met in person.

I do still miss the cozy chats at Rust Belt Rust (RIP), but this new generation of Rust conferences (and their organizers) is pretty rad too. Plus I get to eat good cheese and drink beer outdoors, two things that for reasons unbeknownst to me are all too rare in the United States.

The kids are all right

One of my favorite things about being involved in the Rust project has been watching it sustain and reinvent itself over the years. This year at the conference I got to see the “new generation” of Rust maintainers and contributors. Some of them, like @davidtwco, I had met before, but they have since gone from “wanna be” Rust contributor to driving core initiatives like the diagnostic translation effort. Others (like @bjorn3, @WaffleLapkin, @Nilstrieb, and even @MaraBos) I had never had a chance to meet before. I love that working on Rust lets you interact with people from all over the world, but there’s nothing like putting a name to a face, and getting to give someone a hug or shake their hand.

But yeah, there’s that thing

So, let me say up front, due to scheduling conflicts, I wasn’t able to attend RustConf this year (or last year, as it happens). But I read Adam Chalmers’s blog post that many people were talking about, and I saw this paragraph…

Rustconf definitely felt sadder and downbeat than my previous visit. Rustconf 2019 felt jubilant. The opening keynote celebrated the many exciting things that had happened over the last year. Non-lexical lifetimes had just shipped, which removed a ton of confusing borrow checker edge cases. Async/await was just a few short months away from being stabilized, unleashing a lot of high-performance, massively-scalable software. Eliza Weisman was presenting a new async tracing library which soon took over the Rust ecosystem. Lin Clark presented about how you could actually compile Rust into this niche thing called WebAssembly and get Rust to run on the frontend – awesome! It felt like Rust had a clear vision and was rapidly achieving its goals. I was super excited to be part of this revolution in software engineering.

…and it made me feel really sad.⁵ Rust’s mission has always been empowerment. I’ve always loved the “can do” spirit of Rust, the way we aim high and try to push boundaries in every way we can. To me, the open source org has always been an important part of how we empower.

Developing a programming language, especially a compiled one, is often viewed as the work of “wizards”, just like systems programming. I think Rust proves that this “wizard-like” reputation has more to do with the limitations of the tools we were using than the task itself. But just like Rust has the goal of making systems programming more practical and accessible, I like to think the Rust org helps to open up language development to a wider audience. I’ve seen so many people come to Rust, full of enthusiasm but not so much experience, and use it to launch a new career.

But, if I’m honest, I’ve also seen a lot of people come into Rust full of enthusiasm and wind up burned out and frustrated. And sometimes I think that’s precisely because of our “sky’s the limit” attitude — sometimes we can get so ambitious, we set ourselves up to crash and burn.

Sometimes “thinking big” means getting nowhere

Everybody wants to “think big”. And Rust has always prided itself on taking a “holistic view” of problems — we’ve tried to pay attention to the whole project, not just generating good code, but targeting the whole experience with quality diagnostics, a build system, an easy way to manage which Rust version you want, a package ecosystem, etc. But when we look at all the stuff we’ve built, it’s easy to forget how we got there: incrementally and painfully.

I mean, in Ye Olde Days of Rust, we didn’t even have a borrow checker. Soundness was an aspiration, not a reality. And once we got one, it sucked to use, because the design was still stuck in some ‘old style’ thinking. And even once we had INHTWAMA⁶, the error messages were pretty confounding. And once we invented the idea of multiline errors, it wasn’t until late 2018 that we had NLL, which changed the game again. And that’s just the compiler! The story is pretty much the same for every other detail of the language. You used to have to build the compiler with a Makefile that was so complex, I wouldn’t be surprised if it were self-aware.⁷

When I feel burned out, one of the biggest reasons is that I’ve fallen into the trap of thinking too big, doing too much, and as a result I am spread too thin and everything seems impossible. Just look back three years ago: the async working group was driving this crazy project, the Async Vision Doc, and it seemed like we were on top of the world. We recorded all these stories of how async Rust was hard, and we were thinking about how we could solve it. Not surprisingly, we found that these stories were sometimes language problems, but just as often they were library limitations, or gaps in the tooling, or the docs. And so we set out an expansive vision, spawning out a ton of subprojects. And all the time, there was a voice in my head saying, “is this really going to work?”

Well, I’d say the answer is “no”. I mean, we made a lot of progress. We are going to stabilize async functions in traits this year, and that is awesome. We made a bunch of improvements to async usability, most notably cjgillot’s fantastic PR that improves the accuracy of send bounds and futures, preventing a whole ton of false errors (though that work wasn’t really done in coordination with the async wg effort per se, it’s just because cjgillot is out there silently making huge refactors⁸).

And yet, there’s a lot we didn’t do. We don’t have generators. We didn’t yet find a way to make futures smaller. We didn’t really drive to ground the conversation on structured concurrency. We also took a lot longer to do stuff than I hoped. I thought async functions in traits would ship in 2021 — it’s shipping now, but it’s 2023.

Focus, focus, focus; iterate, iterate, iterate

One lesson I take away from the async wg experience is focus, focus, focus and iterate, iterate, iterate. You can (almost) never start too small. I think we were absolutely right that “doing async right” demands addressing all of those concerns, but I think that we overestimated our ability to coordinate them up front, and as a result, things like shipping async fn in traits took longer than they needed to. We are going to get the async shiny future, but we’re going to get it one step at a time.

Also: we’re a lot bigger than we used to be

Still, sometimes I find that when I float ideas, I encounter a reflexive bit of pushback: “sounds great, who’s going to do it”. On the one hand, that’s the voice of experience, coming back from one too many Think Big plans that didn’t work out. But on the other, sometimes it feels a bit like “old school” thinking to me. Rust is not the dinky little project it used to be, where we all knew everybody. Rust is used by millions of developers and is one of the fastest growing languages today; it powers the cloud and it’s quite possibly in your kernel. In many ways, this growth hasn’t caught up with the open source org: I’d still like to see more companies hiring dedicated teams of Rust developers, or giving their employees paid time to work on Rust⁹. But I think that growth is coming, especially if we work harder at harnessing it, and I am very excited about what that can mean.

Nothing succeeds like success

Now I know that when we talk about burnout, we’re also talking about other kinds of drama. Maybe you think that things like ‘working iteratively’ and having more people or resources are not going to help when the problem is conflicts between people or organizations. And you’re not wrong, it’s not going to solve all conflict. But I also think that an awful lot of conflict ultimately comes out of zero-sum, scarcity-oriented thinking, or from feeling disempowered to achieve the goals you set out to do. To help with burnout, we need to do better at a number of things, including I think helping each other to practice empathy and manage conflict more productively¹⁰, but I think we also need to do better at shipping product.

Don’t be afraid to fail — you got this

One of my favorite conversations from the whole conference happened after the conference itself. I was in the midst of pitching Jack Huey on some of the organizational ideas that I’m really excited about right now, which I think can help bring the Rust project closer to being the empowering, inclusive open-source project it aspires to be. Jack wasn’t sure if they were going to work. “But”, he said, “what the heck, let’s try it! I mean, what have we got to lose? If it doesn’t work, we’ll learn something, and do something else.”¹¹ Hell yes.

As I usually do, I’ve put my slides online. If you’re curious, take a look! If you see a typo, maybe open a PR. The speaker notes have some of the “soundtrack”, though not all of it. ↩︎
Somehow, I never made it to a RustFest. ↩︎
You can find the agenda here. It contains links to the briefing documents that we prepared in advance, along with loose notes that we took during the discussions. I expect we’ll author a blog post covering the key developments on the Inside Rust blog. ↩︎
Including one I can’t wait to read about OAuth – I tried to understand GitHub’s docs on OAuth and just got completely lost. ↩︎
Side note, but I think Rust 2024 is shaping up to be another hugely impactful edition. There’s a very good chance we’ll have async functions in traits, type alias impl trait, and polonius, each of which is a massive usability and expressiveness win. I’m hoping we’ll also get improved temporary lifetimes in the new edition, eliminating the “blocking bugs” identified as among the most common in real-world Rust programs. And of course the last few years have already seen let-else, scoped threads, cargo add, and a variety of other changes. Gonna be great! ↩︎
INHTWAMA was the rather awkward (and inaccurate) acronym that we gave to the idea of “aliasing xor mutation” — i.e., the key principle underlying Rust’s borrow checker. The name comes from a blog post I wrote called “Imagine never hearing the phrase aliasable, mutable again”, which @pcwalton incorrectly remembered as “Imagine never hearing the words aliasable, mutable again”, and hence shortened to INHTWAMA. I notice now though that this acronym was also frequently mutated to IMHTWAMA which just makes no sense at all. ↩︎
I learned a lot from reading Rust’s Makefile in the early days. I had no idea you could model function calls in make with macros. Brilliant. I’ve always deeply admired Graydon’s Makefile wizardry there, though it occurs to me now that I never checked the git logs – maybe it was somebody else! I’ll have to go look later. ↩︎
Side note, but more often than not, I think cjgillot’s approaches are not going to work. And so far I’m 0 for 2 on this, he’s always been right. To paraphrase Brendan Eich, “always bet on cjgillot”. ↩︎
And I have some thoughts on how we can do better at encouraging them! More on that in some later posts. ↩︎
One of the biggest lessons for me in my personal life has been realizing that not telling people when I feel upset is not necessarily being kind to them and certainly not kind to myself. It seems like avoiding conflict, but it can actually lead to much larger conflicts down the line. ↩︎
Full confession, this quote is made up out of thin air. I have no memory of what words he used. But this is what he meant! ↩︎

Easing tradeoffs with profiles

2023-09-30T00:00:00+00:00

Rust helps you to build reliable programs. One of the ways it does that is by surfacing things to your attention that you really ought to care about. Think of the way we handle errors with Result: if some operation can fail, you can’t, ahem, fail to recognize that, because you have to account for the error case. And yet often the kinds of things you care about depend on the kind of application you are building. A classic example is memory allocation, which for many Rust apps is No Big Deal, but for others is something to be done carefully, and for still others is completely verboten. But this pattern crops up a lot. I’ve heard and like the framing of designing for “what do you have to pay attention to” – Rust currently aims for a balance that errs on the side of paying attention to more things, but tries to make them easy to manage. But this post is about a speculative idea of how we could do better than that by allowing programs to declare a profile.

Profiles declare what you want to pay attention to

The core idea is pretty simple. A profile would be declared, I think, in the Cargo.toml. Profiles would never change the semantics of your Rust code. You could always copy and paste code between Rust projects with different profiles and things would work the same. But it would adjust lint settings and errors. So if you copy code from a more lenient profile into your more stringent project, you might find that it gets warnings or errors it didn’t get before.

Primarily, this means lints

In effect, a profile would be a lot like a lint group. So if we have a profile for kernel development, this would turn on various lints that help to detect things that kernel developers really care about – unexpected memory allocation, potential panics – but other projects don’t, much like Rust-for-Linux’s existing klint project.

So why not just make it a lint group? Well, actually, maybe we should – but I thought Cargo.toml would be better because it would allow us to apply more stringent checks to what dependencies you use, which features they use, etc. For example, maybe dependencies could declare that some of their features are not well suited to certain profiles, and you would get a warning if your application winds up depending on them. I imagine you would select a profile when running cargo new.

Example: autoclone for `Rc` and `Arc`

Let’s give an example of how this might work. In Rust today, if you want to have many handles to the same value, you can use a reference-counted type like Rc or Arc. But whenever you want to get a new handle to that value, you have to explicitly clone it:

let map: Rc<HashMap> = create_map();
let map2 = map.clone(); // 👈 Clone!

The idea of this clone is to call attention to the fact that custom code is executing here. This is not just a memcpy¹. I’ve been grateful for this some of the time. For example, when optimizing a concurrent data structure, I really like knowing exactly when one of my reference counts is going to change. But a lot of the time, these calls to clone are just noise, and I wish I could just write let map2 = map and be done with it.
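To make "custom code is executing here" concrete, here is a minimal runnable sketch of what that clone does today. The `create_map` body and the `share_map` helper are made up for illustration; only `Rc::clone`, `Rc::strong_count`, and `Rc::ptr_eq` are real standard-library calls.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Illustrative stand-in for the `create_map` used in the snippet above.
fn create_map() -> Rc<HashMap<String, u32>> {
    let mut map = HashMap::new();
    map.insert("answer".to_string(), 42);
    Rc::new(map)
}

// Returns (count before clone, count after clone, whether both handles
// point at the same allocation).
fn share_map() -> (usize, usize, bool) {
    let map = create_map();
    let before = Rc::strong_count(&map);
    // The clone copies a pointer and bumps the reference count;
    // the HashMap itself is not copied.
    let map2 = Rc::clone(&map);
    let after = Rc::strong_count(&map);
    (before, after, Rc::ptr_eq(&map, &map2))
}
```

The reference-count bump is the only "custom code" involved, which is why the explicit call often feels like noise in application code.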

So what if we modified the compiler as follows? Today, when you move out from a variable, you effectively get an error if that is not the “last use” of the variable:

let a = v; // move out from `v` here...
...
read(&v); // 💥 ...so we get an error when we use `v`.

What if, instead, when you move out from a value and it is not the last use, we introduced an auto-clone operation? This may fail if the type is not auto-cloneable (e.g., a Vec), but for Rc, Arc, and other types with O(1) clone operations, it would be equivalent to v.clone(). We could designate which types are auto-cloneable with an extra marker trait, for example. This means that let a = v above would be equivalent to let a = v.clone().
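As a rough sketch of the desugaring, and assuming a hypothetical marker trait (I'm calling it `AutoClone` here; neither the trait nor the `auto_clone` helper exists in Rust today), the compiler-inserted operation could be modeled like this:

```rust
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical marker trait: types whose clone is cheap enough to insert implicitly.
trait AutoClone: Clone {}
impl<T: ?Sized> AutoClone for Rc<T> {}
impl<T: ?Sized> AutoClone for Arc<T> {}
// Deliberately no impl for Vec<T>: `auto_clone(&vec![1])` would not compile.

/// Models what the compiler would conceptually insert at a non-final move.
fn auto_clone<T: AutoClone>(value: &T) -> T {
    value.clone()
}

fn read(v: &Rc<String>) -> usize {
    v.len()
}

// Returns (length read through `v`, reference count after the implicit clone).
fn desugared() -> (usize, usize) {
    let v = Rc::new(String::from("hello"));
    // Imagined source: `let a = v;` followed by a later use of `v`.
    let a = auto_clone(&v);
    // `v` is still usable because `a` holds a second handle, not the moved value.
    let len = read(&v);
    (len, Rc::strong_count(&a))
}
```

The marker trait is what keeps expensive clones (like a `Vec`) from happening silently: the bound on `auto_clone` simply fails for them.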

Now, here comes the interesting part. When we introduce an auto-clone, we would also introduce a lint: implicit clone operation. In the higher-level profile, this lint would be allow-by-default, but in the profile for lower-level code, it would be deny-by-default, with an auto-fix to insert clone. Now when I’m editing my concurrent data structure, I still get to see the clone operations explicitly, but when I’m writing my application code, I don’t have to think about it.
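The profile-to-lint-level mapping can be modeled in a few lines. This is purely a toy model: the `Profile` and `Level` enums, the two profile names, and the message text are all assumptions of mine, not a proposed compiler interface.

```rust
/// Hypothetical profiles; the names are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Profile {
    HighLevel,
    LowLevel,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Allow,
    Deny,
}

/// Default level of the imagined "implicit clone" lint under each profile.
fn implicit_clone_level(profile: Profile) -> Level {
    match profile {
        Profile::HighLevel => Level::Allow,
        Profile::LowLevel => Level::Deny,
    }
}

/// Models the lint firing at an auto-clone site: Ok when allowed,
/// Err carrying the suggested auto-fix when denied.
fn check_implicit_clone(profile: Profile, var: &str) -> Result<(), String> {
    match implicit_clone_level(profile) {
        Level::Allow => Ok(()),
        Level::Deny => Err(format!("implicit clone of `{var}`; write `{var}.clone()`")),
    }
}
```

Note that the program's meaning is identical in both arms; only whether it is accepted differs, which is the property profiles are meant to preserve.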

Example: dynamic dispatch with async trait

Here’s another example. Last year we spent a while exploring the ways that we can enable dynamic dispatch for traits that use async functions. We landed on a design that seemed like it hit a sweet spot. Most users could just use traits with async functions like normal, but they might get some implicit allocations. Users who cared could use other allocation strategies by being more explicit about things. (You can read about the design here.) But, as I described in my blog post The Soul of Rust, this design had a crucial flaw: although it was still possible to avoid allocation, it was no longer easy. This seemed to push Rust over the line from its current position as a systems language that can claim to be a true C alternative into “just another higher-level language that can be made low-level if you program with care”.

But profiles seem to offer another alternative. We could go with our original design, but whenever the compiler inserted an adapter that might cause boxing to occur, it would issue a lint warning. In the higher-level profile, the warning would be allow-by-default, but in the lower-level profile, it would be deny-by-default.
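To show where the allocation hides, here is a hand-written version of the kind of boxing adapter being discussed. This is a sketch of the general technique (returning a boxed future from a dyn-compatible trait method), not the design from the earlier posts; the `Fetch` trait, `Constant` type, and `poll_once` helper are invented for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

/// Illustrative trait: the method returns a boxed future so it can be
/// called through `dyn Fetch`.
trait Fetch {
    fn fetch(&self) -> BoxFuture<'_, u32>;
}

struct Constant(u32);

impl Fetch for Constant {
    fn fetch(&self) -> BoxFuture<'_, u32> {
        // This `Box::pin` is the heap allocation that the imagined lint
        // would flag if the compiler inserted it implicitly.
        Box::pin(async move { self.0 })
    }
}

/// A waker that does nothing, enough to poll a future once by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future a single time; returns None if it is not ready yet.
fn poll_once<T>(mut fut: BoxFuture<'_, T>) -> Option<T> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

/// Dynamic dispatch through `dyn Fetch`: one allocation per call.
fn fetch_dyn(source: &dyn Fetch) -> Option<u32> {
    poll_once(source.fetch())
}
```

In a lower-level profile you would want that `Box::pin` to be something you wrote yourself, as here; in a higher-level profile you might be happy for the compiler to write it for you.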

Example: panic effects or other capabilities

If you really want to go crazy, we can use annotations to signal various kinds of effects. For example, as one way to achieve panic safety, we might allow functions to be annotated with #[panics], signaling a function that might panic. Depending on the profile, this might require you to declare that the caller may panic (similar to how unsafe works now).

Depending on how far we want to go here, we would ultimately have to integrate these kinds of checks more deeply into the type system. For example, if you have a fn-pointer, or a dyn Trait call, we would have to introduce “may panic” effects into the type system to be able to track that information (but we could be conservative and just assume calls by pointer may panic, for example). But we could likely still use profiles to control how much you as the caller choose to care.
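There is no #[panics] attribute today, but the "caller must acknowledge it" flavor can be approximated with a different technique: passing a capability token. The sketch below swaps the annotation for a zero-sized `MayPanic` value (a name I made up) that a function demands as an argument, loosely mirroring how `unsafe` forces the caller to opt in.

```rust
/// Hypothetical capability token: holding one means
/// "this code has acknowledged that it may panic".
#[derive(Clone, Copy)]
struct MayPanic(());

impl MayPanic {
    /// Rough analogue of writing `unsafe { ... }`: an explicit opt-in.
    fn acknowledge() -> Self {
        MayPanic(())
    }
}

/// May panic on an out-of-bounds index, so it demands the token.
fn checked_index(items: &[u32], index: usize, _cap: MayPanic) -> u32 {
    items[index]
}

/// Cannot panic, so it needs no token.
fn get_or_zero(items: &[u32], index: usize) -> u32 {
    items.get(index).copied().unwrap_or(0)
}
```

Unlike a real effect system, nothing here stops a token-free function from panicking; the sketch only illustrates how the obligation would show up at call sites.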

Changing the profile for a module or a function

Because profiles primarily address lints, we can also allow you to change the profile in a more narrow way. This could be done with lint groups (maybe each profile is a lint group), or perhaps with a #![profile] annotation.
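Today's lint attributes already give a feel for what narrower scoping looks like. In this sketch the two modules stand in for a stricter and a more lenient profile; the `clippy::unwrap_used` and `clippy::indexing_slicing` lints are existing Clippy restriction lints (only enforced when running Clippy), while the module and function names are mine. A #![profile] annotation itself remains hypothetical.

```rust
// Stand-in for a stricter, lower-level profile, scoped to one module.
#[deny(clippy::unwrap_used, clippy::indexing_slicing)]
mod hot_path {
    /// Written without `unwrap` or direct indexing so the denied lints stay quiet.
    pub fn first_or_default(items: &[u32]) -> u32 {
        items.first().copied().unwrap_or_default()
    }
}

// Stand-in for a more lenient, higher-level profile.
#[allow(clippy::unwrap_used)]
mod app {
    /// Application code that is content to panic on malformed input.
    pub fn parse_port(text: &str) -> u16 {
        text.trim().parse().unwrap()
    }
}
```

Copying `parse_port` into `hot_path` would not change what it does, but Clippy would now reject it, which is exactly the copy-and-paste behavior described for profiles.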

Why I care: profiles could open up design space

So why am I writing about profiles? In short, I’m looking for opportunities to do the classic Rust thing of trying to have our cake and eat it too. I want Rust to be versatile, suitable for projects up and down the stack. I know that many projects contain hot spots or core bits of the code where the details matter quite a bit, and then large swaths of code where they don’t matter a jot. I’d like to have a Rust that feels closer to Swift that I can use most of the time, and then the ability to “dial up” the detail level for the code where I do care.

Conclusion: the core principles

I do want to emphasize that this idea is speculation. As far as I know, nobody else on the lang team is into this idea – most of them haven’t even heard about it!

I also am not hung up on the details. Maybe we can implement profiles with some well-named lint groups. Or maybe, as I proposed, it should go in Cargo.toml.

What I do care about are the core principles of what I am proposing:

Defining some small set of profiles for Rust applications that define the kinds of things you want to care about in that code.
- I think these should be global and not user-defined. This will allow profiles to work more smoothly across dependencies. Plus we can always allow user-defined profiles or something later if we want.
Profiles never change what code will do when it runs, but they can make code get more warnings or errors.
- You can always copy-and-paste code between applications without fear that it will behave differently (though it may not compile).
- You can always understand what Rust code will do without knowing the profile or context it is running in.
Profiles let us do more implicit things to ease ergonomics without making Rust inapplicable for other use cases.
- Looking at Aaron Turon’s classic post introducing the lang team’s Rust 2018 ergonomics initiative, profiles let users dial down the context dependence and applicability of any particular change.

Back in the early days of Rust, we debated a lot about what ought to be the rule for when clone was required. I think the current rule of “memcpy is quiet, everything else is not” is pretty decent, but it’s not ideal in a few ways. For example, an O(1) clone operation like incrementing a refcount is not the same as an O(n) operation like cloning a vector, and yet they look the same. Moreover, memcpy’ing a giant array (or Future) can be a real performance footgun (not to mention blowing up your stack), and yet we let you do that quite quietly. This is a good example of where profiles could help, I believe. ↩︎

Polonius revisited, part 2

2023-09-29T00:00:00+00:00

In the previous Polonius post, we formulated the original borrow checker in a Polonius-like style. In this post, we are going to explore how we can extend that formulation to be flow-sensitive. In so doing, we will enable the original Polonius goals, but also overcome some of its shortcomings. I believe this formulation is also more amenable to efficient implementation. As I’ll cover at the end, though, I do find myself wondering if there’s still more room for improvement.

Running example

We will be working from the same Rust example as the original post, but focusing especially on the mutation in the false branch¹:

let mut x = 22;
let mut y = 44;
let mut p: &'0 u32 = &x;
y += 1;
let mut q: &'1 u32 = &y; // Borrow `y` here (L1)
if something() {
    p = q;  // Store borrow into `p`
    x += 1;
} else {
    y += 1; // Mutate `y` on `false` branch
}
y += 1;
read_value(p); // May refer to `x` or `y`

There is no reason to have an error on the y += 1 in the false branch. There is a borrow of y, but on the false branch that borrow is only stored in q, and q will never be read again. So there cannot be undefined behavior (UB).

Existing borrow checker flags an error

The existing borrow checker, however, is not that smart. It sees read_value(p) at the end and, because that line could potentially read x or y, it flags the y += 1 as an error. When expressed this way, maybe you can have some sympathy for the poor borrow checker – it’s not an unreasonable conclusion! But it’s wrong.

The core issue of the existing borrow check stems from its use of a flow insensitive subset graph. This in turn is related to how it does the type check. In Polonius today, each variable has a single type and hence a single origin (e.g., q: &'1 u32). This causes us to conflate all the possible loans that the variable may refer to throughout execution. And yet as we have seen, this information is actually flow dependent.

The borrow checker today is based on a pretty standard style of type checker applied to the MIR. Essentially there is an environment that maps each variable to a type.

Env  = { X -> Type }
Type = scalar | & 'Y T | ...

Then we have type-checking inference rules that thread this same environment everywhere. Conceptually the structure of the rules is as follows:

construct Env from local variable declarations
Env |- each basic block type checks
--------------------------
the MIR type checks

Type-checking a place then uses this Env, bottoming out in an inference rule like:

Env[X] = T
-------------
Env |- X : T

Flow-sensitive type check

The key thing that makes the borrow checker flow insensitive is that we use the same environment at all points. What if instead we had one environment per program point:

EnvAt = { Point -> Env }

Whenever we type check a statement at program point A, we will use EnvAt[A] as its environment. When program point A flows into point B, then the environment at A must be a subenvironment of the environment at B, which we write as EnvAt[A] <: EnvAt[B].

The subenvironment relationship Env1 <: Env2 holds if

for each variable X in Env2:
- X appears in Env1
- Env1[X] <: Env2[X]

There are two interesting things here. The first is that the set of variables can change over time. The idea is that once a variable goes dead, you can drop it from the environment. The second is that the type of the variable can change according to the subtyping rules.

You can think of flow-sensitive typing as if, for each program variable like q, we have a separate copy per program point, so q@A for point A and q@B for point B. When we flow from one point to another, we assign from q@A to q@B. Like any assignment, this would require the type of q@A to be a subtype of the type of q@B.
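The subenvironment rule is small enough to write down directly. What follows is a minimal sketch, not the compiler's implementation: the `Type` enum, the `Env` alias, and the functions `shared_ref`, `subtype`, and `subenv` are hypothetical names, and origins are represented as plain strings. Each `(a, b)` pair it returns stands for a relation `a : b` between origins.

```rust
use std::collections::BTreeMap;

/// A tiny model of types: a scalar, or a shared reference with an origin.
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Scalar,
    Ref { origin: String, referent: Box<Type> },
}

/// One environment: maps each live variable to its type at a program point.
type Env = BTreeMap<String, Type>;

/// Convenience constructor for `&'origin <scalar>`.
fn shared_ref(origin: &str) -> Type {
    Type::Ref { origin: origin.to_string(), referent: Box::new(Type::Scalar) }
}

/// Relate `sub <: sup`, recording the `a : b` relations this requires.
/// Returns false if the two types do not have the same shape.
fn subtype(sub: &Type, sup: &Type, edges: &mut Vec<(String, String)>) -> bool {
    match (sub, sup) {
        (Type::Scalar, Type::Scalar) => true,
        (
            Type::Ref { origin: a, referent: ta },
            Type::Ref { origin: b, referent: tb },
        ) => {
            edges.push((a.clone(), b.clone()));
            // Shared references are covariant in their referent.
            subtype(ta, tb, edges)
        }
        _ => false,
    }
}

/// `env1 <: env2`: every variable in `env2` appears in `env1` with a subtype.
/// Variables that went dead are absent from `env2` and impose nothing.
fn subenv(env1: &Env, env2: &Env) -> Option<Vec<(String, String)>> {
    let mut edges = Vec::new();
    for (var, ty2) in env2 {
        let ty1 = env1.get(var)?;
        if !subtype(ty1, ty2, &mut edges) {
            return None;
        }
    }
    Some(edges)
}
```

If the environment at A contains both p and q but the environment at B contains only q, then `subenv` succeeds and yields a single relation between the two copies of q's origin. Going the other way fails, because p would have to appear out of nowhere.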

Flow-sensitive typing in our example

Let’s see how this idea of a flow-sensitive type check plays out for our example. First, recall the MIR for our example from the previous post:

flowchart TD
  Intro --> BB1
  Intro["let mut x: i32\nlet mut y: i32\nlet mut p: &'0 i32\nlet mut q: &'1 i32"]
  BB1["BB1:\np = &x;\ny = y + 1;\nq = &y;\nif something goto BB2 else BB3"]
  BB1 --> BB2
  BB1 --> BB3
  BB2["BB2\np = q;\nx = x + 1;\n"]
  BB3["BB3\ny = y + 1;"]
  BB2 --> BB4;
  BB3 --> BB4;
  BB4["BB4\ny = y + 1;\nread_value(p);\n"]

  classDef default text-align:left,fill-opacity:0;

One environment per program point

In the original, flow-insensitive type check, the first thing we did was to create origin variables ('0, '1) for each of the origins that appear in our types. You can see those variables in the chart above. So we effectively had an environment like

Env_flow_insensitive = {
    p: &'0 i32,
    q: &'1 i32,
}

But now we are going to have one environment per program point. There is one program point in between each MIR statement. So the point BB1_0 would be the entry to basic block BB1, and BB1_1 would be after the first statement. So we have Env_BB1_0, Env_BB1_1, etc. We are going to create distinct origin variables for each of them:

Env_BB1_0 = {
    p: &'0_BB1_0 i32,
    q: &'1_BB1_0 i32,
}

Env_BB1_1 = {
    p: &'0_BB1_1 i32,
    q: &'1_BB1_1 i32,
}

...
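To make the naming scheme concrete, here is a minimal sketch of how the per-point environments could be generated. The function name `env_at` and the string representation of types are purely illustrative, and the sketch assumes every variable has a type of the form `&'origin i32`, as in the example.

```rust
/// Build the environment for one program point by giving every origin a
/// point-specific name, e.g. `'0` at `BB1_0` becomes `'0_BB1_0`.
/// `live_vars` lists `(variable, origin)` pairs for the variables that are
/// live at `point`; dead variables are simply left out by the caller.
fn env_at(live_vars: &[(&str, &str)], point: &str) -> Vec<(String, String)> {
    live_vars
        .iter()
        .map(|(var, origin)| (var.to_string(), format!("&{}_{} i32", origin, point)))
        .collect()
}
```

Calling `env_at(&[("p", "'0"), ("q", "'1")], "BB1_0")` produces the two entries shown for Env_BB1_0 above.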

Type-checking the edge from BB1 to BB2

Let’s look at point BB1_3, which is the final line in BB1, which in MIR-speak is called the terminator. It is an if terminator (if something goto BB2 else BB3). To type-check it, we will take the environment on entry (Env_BB1_3) and require that it is a sub-environment of the environment on entry to the true branch (Env_BB2_0) and on entry to the false branch (Env_BB3_0).

Let’s start with the true branch. Here we have the environment Env_BB2_0:

Env_BB2_0 = {
    q: &'1_BB2_0 i32,
}

You should notice something curious here – why is there no entry for p? The reason is that the variable p is dead on entry to BB2, because its current value is about to be overridden. The type checker knows not to include dead variables in the environment.

This means that…

Env_BB1_3 <: Env_BB2_0 if the type of q at BB1_3 is a subtype of the type of q at BB2_0…
…so &'1_BB1_3 i32 <: &'1_BB2_0 i32 must hold…
…so '1_BB1_3 : '1_BB2_0 must hold.

What we just found then is that, because of the edge from BB1 to BB2, the version of '1 on exit from BB1 flows into '1 on entry to BB2.

Type-checking the `p = q` assignment

Next, let’s look at the assignment p = q. This occurs in statement BB2_0. The environment before we just saw:

Env_BB2_0 = {
    q: &'1_BB2_0 i32,
}

For an assignment, we take the type of the left-hand side (p) from the environment after, because that is what we are storing into. The environment after is Env_BB2_1:

Env_BB2_1 = {
    p: &'0_BB2_1 i32,
}

And so to type check the statement, we get that &'1_BB2_0 i32 <: &'0_BB2_1 i32, or '1_BB2_0 : '0_BB2_1.

In addition to this relation from the assignment, we also have to make the environment Env_BB2_0 be a subenvironment of the env after Env_BB2_1. But since the sets of live variables are disjoint, in this case, that doesn’t add anything to the picture.
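Here is a rough sketch of the assignment rule in code. It is simplified on purpose: every live variable is assumed to have type `&'origin i32`, so an environment only records the origin name. The names `Env` and `check_assign` are hypothetical, and skipping the assigned variable when relating the two environments is my reading of the rule rather than something spelled out above.

```rust
use std::collections::BTreeMap;

/// Simplified environment: variable name -> origin of its `&'origin i32` type.
type Env = BTreeMap<&'static str, &'static str>;

/// Relations `(a, b)`, meaning `a : b`, required by the statement `lhs = rhs`.
fn check_assign(
    before: &Env,
    after: &Env,
    lhs: &str,
    rhs: &str,
) -> Vec<(&'static str, &'static str)> {
    let mut edges = Vec::new();
    // The right-hand side is read, so its type comes from the environment
    // before the statement. The left-hand side is written, so its type comes
    // from the environment after the statement.
    if let (Some(&from), Some(&to)) = (before.get(rhs), after.get(lhs)) {
        edges.push((from, to));
    }
    // Any other variable that is live both before and after simply flows
    // across the statement.
    for (var, &to) in after {
        if *var != lhs {
            if let Some(&from) = before.get(var) {
                edges.push((from, to));
            }
        }
    }
    edges
}
```

For `p = q` at BB2_0, the environment before holds only q and the environment after holds only p, so the single relation produced is `'1_BB2_0 : '0_BB2_1`.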

Type-checking the edge from BB1 to BB3

As the final example, let’s look at the false edge from BB1 to BB3. On entry to BB3, the variable q is dead but p is not, so the environment looks like

Env_BB3_0 = {
    p: &'0_BB3_0 i32,
}

Following a similar process to before, we conclude that '0_BB1_3 : '0_BB3_0.

Building the flow-sensitive subset graph

We are now starting to see how we can build a flow-sensitive version of the subset graph. Instead of having one node in the graph per origin variable, we now have one node in the graph per origin variable per program point, and we create an edge N1 -> N2 between two nodes if the type check requires that N1 : N2, just as before. Basically the only difference is that we have a lot more nodes.

Putting together what we saw thus far, we can construct a subset graph for this program like the following. I’ve excluded nodes that correspond to dead variables – so for example there is no node '1_BB1_0, because '1 appears in the variable q, and q is dead at the start of the program.

flowchart TD
    subgraph "'0"
        N0_BB1_0["'0_BB1_0"]
        N0_BB1_1["'0_BB1_1"]
        N0_BB1_2["'0_BB1_2"]
        N0_BB1_3["'0_BB1_3"]
        N0_BB2_1["'0_BB2_1"]
        N0_BB3_0["'0_BB3_0"]
        N0_BB4_0["'0_BB4_0"]
        N0_BB4_1["'0_BB4_1"]
    end

    subgraph "'1"
        N1_BB1_2["'1_BB1_2"]
        N1_BB1_3["'1_BB1_3"]
        N1_BB2_0["'1_BB2_0"]
    end
    
    subgraph "Loans"
        L0["{L0} (&x)"]
        L1["{L1} (&y)"]
    end
    
    L0 --> N0_BB1_0
    L1 --> N1_BB1_2
    
    N0_BB1_0 --> N0_BB1_1 --> N0_BB1_2 --> N0_BB1_3
    N0_BB1_3 --> N0_BB3_0
    N0_BB3_0 --> N0_BB4_0 --> N0_BB4_1
    N0_BB2_1 --> N0_BB4_0

    N1_BB1_2 --> N1_BB1_3
    N1_BB1_3 --> N1_BB2_0
    
    N1_BB2_0 --> N0_BB2_1

Just as before, we can trace back from the node for a particular origin O to find all the loans contained within O. Only this time, the origin O also indicates a program point.

In particular, compare '0_BB3_0 (the data reachable from p on the false branch of the if) to '0_BB4_0 (the data reachable after the if finishes). We can see that in the first case, the origin can only reference L0, but afterwards, it could reference L1.
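To check that comparison mechanically, here is a small sketch that encodes the edges of the graph above and walks them backwards from an origin to collect the loans it may contain. The constant `EDGES` and the function `loans_in` are illustrative names, and node labels are plain strings.

```rust
use std::collections::{BTreeSet, HashMap};

/// Edges `a -> b` of the subset graph for the running example.
const EDGES: &[(&str, &str)] = &[
    ("L0", "'0_BB1_0"),
    ("L1", "'1_BB1_2"),
    ("'0_BB1_0", "'0_BB1_1"),
    ("'0_BB1_1", "'0_BB1_2"),
    ("'0_BB1_2", "'0_BB1_3"),
    ("'0_BB1_3", "'0_BB3_0"),
    ("'0_BB3_0", "'0_BB4_0"),
    ("'0_BB4_0", "'0_BB4_1"),
    ("'0_BB2_1", "'0_BB4_0"),
    ("'1_BB1_2", "'1_BB1_3"),
    ("'1_BB1_3", "'1_BB2_0"),
    ("'1_BB2_0", "'0_BB2_1"),
];

/// Walk edges backwards from `origin` and collect every loan that reaches it.
fn loans_in(origin: &str) -> BTreeSet<&'static str> {
    // Index the graph by target node so we can find predecessors quickly.
    let mut preds: HashMap<&str, Vec<&'static str>> = HashMap::new();
    for &(from, to) in EDGES {
        preds.entry(to).or_default().push(from);
    }
    let mut seen = BTreeSet::new();
    let mut loans = BTreeSet::new();
    let mut stack = vec![origin];
    while let Some(node) = stack.pop() {
        for &p in preds.get(node).into_iter().flatten() {
            if seen.insert(p) {
                // In this sketch, loan nodes are the ones named `L…`.
                if p.starts_with('L') {
                    loans.insert(p);
                }
                stack.push(p);
            }
        }
    }
    loans
}
```

With these edges, `loans_in("'0_BB3_0")` contains only L0, while `loans_in("'0_BB4_0")` contains both L0 and L1.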

Active loans

Just as described in the previous post, to complete the analysis we compute the active loans. Active loans are defined in almost exactly the same way, but with one twist. A loan L is active at a program point P if there is a path from the borrow that created L to P where, for each point along the path…

there is some live variable whose type at P may reference the loan; and,
the place expression that was borrowed by L (here, x) is not reassigned at P.

Note the phrase “whose type at P” in the first condition. We are now taking into account the fact that the type of the variable can change along the path. In particular, it may reference distinct origins.

Implementing using dataflow

Just as in the previous post, we can compute active loans using dataflow. In particular, we gen a loan when it is issued, and we kill a loan L at a point P if (a) there are no live variables whose origins contain L or (b) the path borrowed by L is assigned at P.
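As a sketch of what that dataflow could look like, here is a small forward analysis over the program points of the running example. Everything here is an assumption made for illustration: the `Point` struct, the field names, and the hand-written `live_loans` sets (which stand for the loans found in the origins of the variables live on entry to each point, as read off the subset graph). Rule (b) is applied literally, with no attempt to also report errors.

```rust
use std::collections::BTreeSet;

type Loan = &'static str;

/// One program point of the running example (not rustc's real MIR).
struct Point {
    name: &'static str,
    preds: &'static [usize],         // indices of predecessor points
    issued: Option<Loan>,            // loan created by the statement here
    assigned: Option<&'static str>,  // place overwritten by the statement here
    live_loans: &'static [Loan],     // loans in origins of variables live on entry
}

/// The place borrowed by each loan.
const BORROWED: &[(Loan, &str)] = &[("L0", "x"), ("L1", "y")];

fn running_example() -> Vec<Point> {
    vec![
        Point { name: "BB1_0", preds: &[], issued: Some("L0"), assigned: Some("p"), live_loans: &[] },
        Point { name: "BB1_1", preds: &[0], issued: None, assigned: Some("y"), live_loans: &["L0"] },
        Point { name: "BB1_2", preds: &[1], issued: Some("L1"), assigned: Some("q"), live_loans: &["L0"] },
        Point { name: "BB1_3", preds: &[2], issued: None, assigned: None, live_loans: &["L0", "L1"] },
        Point { name: "BB2_0", preds: &[3], issued: None, assigned: Some("p"), live_loans: &["L1"] },
        Point { name: "BB2_1", preds: &[4], issued: None, assigned: Some("x"), live_loans: &["L1"] },
        Point { name: "BB3_0", preds: &[3], issued: None, assigned: Some("y"), live_loans: &["L0"] },
        Point { name: "BB4_0", preds: &[5, 6], issued: None, assigned: Some("y"), live_loans: &["L0", "L1"] },
        Point { name: "BB4_1", preds: &[7], issued: None, assigned: None, live_loans: &["L0", "L1"] },
    ]
}

/// Forward dataflow: returns the loans active on entry to each point.
fn active_on_entry(points: &[Point]) -> Vec<BTreeSet<Loan>> {
    let mut entry = vec![BTreeSet::new(); points.len()];
    let mut exit = vec![BTreeSet::new(); points.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for (i, pt) in points.iter().enumerate() {
            // Join: union of what the predecessors leave active...
            let mut on_entry: BTreeSet<Loan> =
                pt.preds.iter().flat_map(|&p| exit[p].iter().copied()).collect();
            // ...then kill (a): no live variable can reference the loan here.
            on_entry.retain(|l| pt.live_loans.contains(l));
            // Kill (b): the place borrowed by the loan is assigned here.
            let mut on_exit = on_entry.clone();
            on_exit.retain(|l| {
                !BORROWED.iter().any(|&(b, place)| b == *l && Some(place) == pt.assigned)
            });
            // Gen: a loan issued here is active from the next point onwards.
            on_exit.extend(pt.issued);
            if on_entry != entry[i] || on_exit != exit[i] {
                entry[i] = on_entry;
                exit[i] = on_exit;
                changed = true;
            }
        }
    }
    entry
}
```

Running `active_on_entry(&running_example())` gives {L0, L1} on entry to the terminator BB1_3 but only {L0} on entry to BB3_0, since the only variable live there is p and its origin at that point contains just L0.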

Applying this to our running example

When we apply this to our running example, the unnecessary error on the false branch of the if goes away. Let’s walk through it.

Entry block

In BB1, we gen L0 and L1 at their two borrow sites, respectively. As a result, the active loans on exit from BB1 will be {L0, L1}:

flowchart TD
  Start["..."]
  BB1["BB1:
       p = &x; // Gen: L0
       y = y + 1;
       q = &y; // Gen: L1
       if something goto BB2 else BB3
  "]
  BB2["..."]
  BB3["..."]
  BB4["..."]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB1 highlight

The `false` branch of the `if`

On the false branch of the if (BB3), the only live reference is p, which will be used later on in BB4. In particular, q is dead.

In the flow insensitive version, when the borrow checker looked at the type of p, it was p: &'0 i32, and '0 had the value {L0, L1}, so the borrow checker concluded that both loans were active.

But in the flow sensitive version we are looking at now, the type of p on entry to BB3 is p: &'0_BB3_0 i32. And, consulting the subset graph shown earlier in this post, the value of '0_BB3_0 is just {L0}. So there is a kill for L1 on entry to the block. This means that the only active loan is L0, which borrows x. This in turn means that y = y + 1 is not an error.

flowchart TD
  Start["
    ...
  "]
  BB1["
      BB1:
      p = &x; // Gen: L0
      ...
      q = &y; // Gen: L1
      ...
  "]
  BB2["
      BB2:
      ...
  "]
  BB3["
      BB3:
      // Kill `L1` (no live references)
      // Active loans: {L0}
      y = y + 1;
  "]
  BB4["
      BB4:
      ...
      read_value(p); // later use of `p`
  "]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB3 highlight

The role of invariance: vec-push-ref

I didn’t highlight it before, but invariance plays a really interesting role in this analysis. Let’s see another example, a simplified version of vec-push-ref from polonius:

let v: Vec<&'v u32>;
let p: &'p mut Vec<&'vp u32>;
let x: u32;

/* P0 */ v = vec![];
/* P1 */ p = &mut v; // Loan L0
/* P2 */ x += 1; // <-- Expect NO error here.
/* P3 */ p.push(&x); // Loan L1
/* P4 */ x += 1; // <-- 💥 Expect an error here!
/* P5 */ drop(v);

What makes this interesting? We create a reference p at point P1 that points at v. We then insert a borrow of x into the reference p. After that point, the reference p is dead, but the loan L1 is still active – this is because it is also stored in v. This connection between p and v is what is key about this example.

The way that this connection is reflected in the type system is through variance. In particular, a type &mut T is invariant with respect to T. This means that when you assign one reference to another, the type that they reference must be exactly the same.

In terms of the subset graph, invariance works out to creating bidirectional edges between origins. Take a look at the resulting subset graph to see what I mean. To keep things simple, I am going to exclude nodes for p: the interesting origins here are 'v (the data in the vector v) and 'vp (the data in the vector referenced by p – which is also v).

flowchart TD
    subgraph "Loans"
      L1["L1 (&x)"]
    end
    
    subgraph "'v"
      V_P0["'v_P0"]
      V_P1["'v_P1"]
      V_P2["'v_P2"]
      V_P3["'v_P3"]
      V_P4["'v_P4"]
      V_P5["'v_P5"]
    end

    subgraph "'vp"
      VP_P1["'vp_P1"]
      VP_P2["'vp_P2"]
      VP_P3["'vp_P3"]
    end

    V_P0 --> V_P1 --> V_P2 --> V_P3 --> V_P4 --> V_P5
    
    V_P1 <---> VP_P1
    VP_P1 <---> VP_P2 <---> VP_P3
        
    L1 --> VP_P3

The key parts here are the bidirectional arrows between 'v_P1 and 'vp_P1 and between 'vp_P1 and 'vp_P3. How did those come about?

The first edge resulted from p = &mut v. The type of v (at P1) is Vec<&'v_P1 u32>, and that type had to be equal to the referent of p (Vec<&'vp_P1 u32>). Since the types must be equal, that means 'v_P1: 'vp_P1 and vice versa, hence a bidirectional arrow.
The second edge resulted from the flow from P1 to P3. The variable p is live across that edge, so its type before (&'p_P1 mut Vec<&'vp_P1 u32>) must be a subtype of its type after (&'p_P3 mut Vec<&'vp_P3 u32>). Because &mut references are invariant with respect to their referent types, this implies that 'vp_P1 and 'vp_P3 must be equal.

Put all together, and we see that L1 can reach 'v_P4 and 'v_P5, even though it only flowed into an earlier point in the graph. That’s cool! We will get the error we expect.

On the other hand, we can also see that there is some imprecision introduced through invariance. The loan L1 is introduced at point P3, and yet it appears to flow from 'vp_P3 backwards in time to 'vp_P2, 'vp_P1, over to 'v_P1, and downward from there. If we were only looking at the subset graph, then, we would conclude that both x += 1 statements in this program are illegal, but in fact only the second one causes a problem.
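Both observations can be reproduced with a few lines of code. This is only a sketch: the edge list is transcribed from the graph above, the boolean marks the bidirectional edges that invariance introduces, and `reachable_from` is an illustrative name.

```rust
use std::collections::BTreeSet;

/// Edges of the vec-push-ref subset graph. The flag is `true` for the
/// bidirectional edges that come from invariance.
const EDGES: &[(&str, &str, bool)] = &[
    ("'v_P0", "'v_P1", false),
    ("'v_P1", "'v_P2", false),
    ("'v_P2", "'v_P3", false),
    ("'v_P3", "'v_P4", false),
    ("'v_P4", "'v_P5", false),
    ("'v_P1", "'vp_P1", true),
    ("'vp_P1", "'vp_P2", true),
    ("'vp_P2", "'vp_P3", true),
    ("L1", "'vp_P3", false),
];

/// Every node reachable from `start`, following bidirectional edges both ways.
fn reachable_from(start: &str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(node) = stack.pop() {
        for &(a, b, both) in EDGES {
            let next = if a == node {
                Some(b)
            } else if both && b == node {
                Some(a)
            } else {
                None
            };
            if let Some(n) = next {
                if seen.insert(n) {
                    stack.push(n);
                }
            }
        }
    }
    seen
}
```

Starting from L1, the reachable set includes 'v_P4 and 'v_P5, which is what we want, but it also includes 'v_P2, which is the imprecision described above. It does not include 'v_P0, because the edge from 'v_P0 to 'v_P1 only goes one way.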

Active loans to the rescue (again)

The imprecision we see here is very similar to the imprecision we saw in the original polonius. Effectively, invariance is taking away some of our flow sensitivity. Interestingly, the active loans portion of the analysis makes up for this, in the same way that it did in the previous post. In vec-push-ref, L1 will only be generated at P3, so even though it can reach 'v_P2 via the subset graph, it is not considered active at P2. But once it is generated, it is not killed, even when p goes dead, because it can flow into 'v_P4. Therefore we get the one error we expect.

Conclusion

I’m going to stop this post here. I’ve described a version of polonius where we give variables distinct types at each program point and then relate those types together to create an improved subset graph. This graph increases the precision of the active loans analysis such that we don’t get as many false errors, but it is still imprecise in some ways.

I think this formulation is interesting for a few reasons. First, the most expensive part of it is going to be the subset graph, which has a LOT of nodes and edges. But that can be compressed significantly with some simple heuristics. Moreover, the core operation we perform on that graph is reachability, and that can be implemented quite efficiently as well (do a strongly connected components computation to reduce the graph to a tree, and then you can assign pre- and post-orderings and just compare indices). So I believe it could scale in practice.
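To illustrate the index-comparison trick, here is a minimal sketch that assumes the graph has already been reduced to a rooted tree (the strongly connected components step is not shown). The names `Intervals`, `number_tree`, and `reaches` are hypothetical, and nodes are just indices into a list of child lists.

```rust
/// Pre- and post-order numbers for every node of a rooted tree.
struct Intervals {
    pre: Vec<usize>,
    post: Vec<usize>,
}

/// Depth-first walk that stamps each node on the way in and on the way out.
fn visit(node: usize, children: &[Vec<usize>], clock: &mut usize, iv: &mut Intervals) {
    iv.pre[node] = *clock;
    *clock += 1;
    for &child in &children[node] {
        visit(child, children, clock, iv);
    }
    iv.post[node] = *clock;
    *clock += 1;
}

/// Number the whole tree starting from `root`.
fn number_tree(children: &[Vec<usize>], root: usize) -> Intervals {
    let mut iv = Intervals {
        pre: vec![0; children.len()],
        post: vec![0; children.len()],
    };
    let mut clock = 0;
    visit(root, children, &mut clock, &mut iv);
    iv
}

/// `a` reaches `b` exactly when `b` lies in the subtree of `a`, which is
/// the case when `a`'s interval encloses `b`'s interval.
fn reaches(iv: &Intervals, a: usize, b: usize) -> bool {
    iv.pre[a] <= iv.pre[b] && iv.post[b] <= iv.post[a]
}
```

After a single linear-time numbering pass, each reachability query is two integer comparisons. For a tree where node 0 has children 1 and 2, and node 1 has child 3, node 1 reaches node 3 but node 2 does not.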

I have worked through a few more classic examples, and I may come back to them in future posts; so far this analysis seems to get the results I expect. However, I would also like to go back and compare it more deeply to the original polonius, as well as to some of the formulations that came out of academia. There is still something odd about leaning on the dataflow check. I hope to talk about some of that in follow-up posts (or perhaps on Zulip or elsewhere with some of you readers!).

If this particular example feels artificial, that’s because it is. But the same kind of imprecision causes more common errors, most notably Problem Case #3. ↩︎

Empathy in open source: be gentle with each other

2023-09-27T00:00:00+00:00

Over the last few weeks I had been preparing a talk on “Inclusive Mentoring: Mentoring Across Differences” with one of my good friends at Amazon. Unfortunately, that talk got canceled because I came down with COVID when we were supposed to be presenting. But the themes we covered in the talk have been rattling in my brain ever since, and suddenly I’m seeing them everywhere. One of the big ones was about empathy — what it is, what it isn’t, and how you can practice it. Now that I’m thinking about it, I see empathy so often in open source.

What empathy is

In her book Atlas of the Heart¹, Brené Brown defines empathy as

an emotional skill set that allows us to understand what someone is experiencing and to reflect back that understanding.

Empathy is not about being nice or making the other person feel good or even feel better². Being empathetic means understanding what the other person feels and then showing them that you understand.

Understanding what the other person feels doesn’t mean you have to feel the same way. It also doesn’t mean you have to agree with them, or feel that they are “justified” in those feelings. In fact, as I’ll explain in a second, strong feelings and emotion are by design limited in their viewpoints — they are always showing us something, and showing us something real, but they are never showing us the full picture.

Usually we feel multiple, seemingly contradictory things, which can leave everything feeling like a big muddle. The goal, from what I can see, is to be able to pull those multiple feelings apart, understand them, and then – from a balanced place – decide how we are going to react to them. Hopefully in real time. Pretty damn hard, in my experience, but something we can get better at.

People are not any one thing

Some time back, Aaron Turon introduced me to Internal Family Systems through the book Self Therapy³. It’s really had a big influence on how I think about things. The super short version of IFS is “Inside Out is real”. We are each composites of a number of independent parts which capture pieces of our personality. When we are feeling balanced and whole, we are switching between these parts all the time in reaction to what is going on around us.

But sometimes things go awry. Sometimes, one part will get very alarmed about what it perceives to be happening, and it will take complete control of you. This is called blending. While you are blended, the part is doing its best to help you in the ways that it knows: that might mean making you super anxious, so that you identify risks, or it might mean making you yell at people, so that they will go away and you don’t have to risk them letting you down. No matter which part you are blended with in the moment, though, you lose access to your whole self and your full range of capabilities. Even though the part will help you solve the immediate problem, it often does so in ways that create other problems down the line.

This concept of parts has really helped me to understand myself, but it has also helped me to understand what previously seemed like contradictory behavior in other people. The reason that people sometimes act in extreme ways, ways that seem so different from the person I know at other times, is because they’re blended — they’re not the person I know at that time, they’re just one part of that person. And probably a part that has helped them through some tough times in the past.

Empathy as “holding space”

I’ve often heard the term ‘emotional labor’ and, to be honest, I had a hard time connecting to it. But in Lama Rod Owens’s “Love and Rage”, he talks about emotional labor in terms of “the work we do to help people process their emotions” and, in particular, gives this list of examples:

This includes actively listening to others, asking how people are feeling, checking in with them, letting them vent in front of you, and not reacting to someone when they are being rude or disrespectful.

Now this list struck a chord with me. To me, the hardest part of empathy is holding space — letting someone have a reaction or a feeling without turning away. When people are reacting in an extreme way — whether it’s venting or being rude — it makes us uncomfortable, and often we’ll try to make them stop. This can take many forms. It could mean changing the topic, dismissing it (“get over it”, “I’m sure they didn’t mean it like that”), or trying to fix it (“what you need to do is…”, “let’s go kick their ass!”) For me, when people do that, it makes me feel unseen and kind of upset. Even if the other person is getting righteously angry on my behalf, I feel like suddenly the situation isn’t about me and how I want to think about things.

What does all this have to do with GitHub?

At this point you might be wondering “what do obscure therapeutic processes and Buddhist philosophy have to do with GitHub issue threads?” Take another look at Lama Rod Owens’s list of examples of emotional labor, especially the last one:

not reacting to someone when they are being rude or disrespectful

To be frank, being an open-source maintainer means taking a lot of shit⁴. In his insightful, and widely discussed, talk “The Hard Parts of Open Source”, Evan Czaplicki identified many of the “failure modes” of open source comment threads. One very memorable pattern is the “Why don’t you just…” comment, where somebody chimes in with an obvious alternative, as if you hadn’t thought of it. There is also my personal favorite, what I’ll call the “double agent” comment, where someone seems to feel that your goal is actually to ruin the project you’ve put so much effort into, and so comes in hot and angry.

My goal is always to respond to comments as if the commenter had been constructive and polite, or was my best friend. I don’t always achieve my goal, especially in forums where I have to respond quickly⁵. But I honestly do try. One technique is to find the key points in their comment and rephrase them, to be sure you understand, and then give your take. When I do that, I usually learn things — even when I initially thought somebody was just a blowhard, there is often a strong point underlying their argument, and it may lead me to change course if I listen to it. If nothing else, it’s always good to know the counterarguments in depth.

Empathy as a maintainer

And this brings us to the role of empathy as an open-source maintainer. As I said, these days, I see it popping up everywhere. To start, the idea of responding to someone’s comment, even one that feels rude, by identifying the key points they are trying to make feels to me like empathy, even if those points are often highly technical⁶. Fundamentally, empathy is all about understanding the other person and letting them know you understand, and that is what I am trying to do here.

But empathy comes into play in a more meta way as well. Trying to think how somebody feels — and why they might be feeling that way — can really help me to step back from feeling angry or injured by the tone of a comment and instead to refocus on what they are trying to communicate to me. Aaron Turon wrote a truly insightful and honest series of posts about his perspective on this called Listening and Trust. In part 3 of that series, he identified some of the key contributors to comment threads that go off the rails, what he called “momentum, urgency, and fatigue”. It’s worth reading that post, or reading it again if you already have. It’s a masterpiece of looking past the immediate reactions to understand better what’s going on, both within others and yourself.

Empathy when we surprise people

When Apple is working on a new product, they keep it absolutely top secret until they are ready – and then they tell the world, hoping for a big splash. This works for them. In open source, though, it’s an anti-pattern. The last thing you want to do is to surprise people – that’s a great way to trigger those parts we were talking about.

The difference, I think, is that open source projects are community projects – everybody feels some degree of ownership. That’s a big part of what makes open source so great! But, at the same time, when somebody starts messing with your stuff, that’s sure to get you upset. Paul Ford wrote an article identifying this feeling, which he called “Why wasn’t I consulted?”.

I find the phrase “Why wasn’t I consulted?” a pretty useful reminder for how it feels, but to be honest I’ve never liked it. The problem is that to me it feels condescending. But I totally get the way that people feel. It doesn’t always mean I think they’re right, or even justified in that feeling. But I get it, and I respect it. Heck, I feel it too!⁷

My personal creed these days is to be as open and transparent as I can with what I am doing and why. It’s part of why I love having this blog, since it lets me post up early ideas while I am still thinking about them. This also means I can start to get input and feedback. I don’t always listen to that feedback. A lot of times, people hate the things I am talking about, and they’re not shy about saying so – I try to take that as a signal, but just one signal of many. If people are upset, I’m probably doing something wrong, but it may not be the idea, it may be the way I am talking about it, or some particular aspect of it.

Empathy when we design our project processes

As I prepared this blog post, I re-read Aaron’s Listening and Trust, and I was struck again by how many insights he had there. One of them was that by applying empathy, and looking at our processes from the lens of how it feels to be a participant – what concerns get triggered – we can make changes so that everyone feels more included and less worn down. The key part here is that we have to look not only at how things feel for ourselves, but also how they feel for the participants – and for those who are not yet participating! There’s a huge swath of people who do not join in on Rust discussions, and I think we’re really missing out. This kind of design isn’t easy, but it’s crucial.

Empathy as a contributor

I’ve focused a lot on the role of empathy as an open-source maintainer. But empathy absolutely comes into play as a contributor. There’s a lot said on how people behave differently when commenting on the internet versus in person, and how the tone of a text comment can so easily be misread.

The fact is, when you contribute to an open-source project, the maintainers are going to come up short. They’re going to overlook things. They may not respond promptly to your comment or PR – they’re likely going to hide their head in the sand because they’re overwhelmed.⁸ Or they may snap at you.

So what do you do when people let you down? I think the best is to speak for your feelings, but to do so in an empathetic way. If you are feeling hurt, don’t leave an angry comment. This doesn’t mean you have to silence your feelings – but just own them as your feelings. “Hey, I get that you are busy. Still, when I open a PR and nobody answers, it feels like this contribution is not wanted. If that’s true, just tell me, I can go elsewhere.”⁹

I bet some of you, when you read that last comment, were like “oh, heck no”. It’s scary to talk about how you feel. It takes a lot of courage. But it’s effective – and it can help the maintainer get unblended from whatever part they are in and think about things from your perspective. Maybe they will answer, “No, I really want this change, but I am just super busy right now, can you give me 3 months?” Or maybe they will say, “Actually, you’re right, I am not sure this is the right direction. I’m sorry that I didn’t say so before you put so much work into it.” Or maybe they won’t answer at all, because they’re hiding from the GitHub issue thread – but when they come back and read it much later, they’ll reflect on how that made you feel, and try to be more prompt the next time. Either way, you know that you spoke up for yourself, but did so in a way that they can hear.

Empathy for ourselves and our own parts

This brings me to my final topic. No matter what role we play in an open-source project, or in life, the most important person to have empathy for is yourself. Ironically, this is often the hardest. We usually have very high expectations for ourselves, and we don’t cut ourselves much slack. As a maintainer, this might manifest as feeling you have to respond to every comment or task, and feeling bad when you don’t keep up. As a contributor, it might be feeling crappy when people point out bugs in your PR. No matter who we are, it might be kicking ourselves and feeling shame when we overreact in a comment.

In my view, shame is basically never good. Of course I make mistakes, and I regret them. But when I feel shame about them, I am actually focusing inward, focusing on my own mistakes instead of focusing on how I can make it up to the other person or resolve my predicament. It doesn’t actually do anyone any good.

I think there are different ways to experience shame. I know how I experience it. It feels like one of my parts is kicking the crap out of itself. And that really hurts. It hurts so bad that it tends to cause other parts to rise up to try and make it stop. That might be by getting angry at others — “it’s their fault we screwed up!” — or, more common for me, it might be by feeling depressed, withdrawing, and perhaps focusing on some technical project that can make me feel good about myself.

In their classic and highly recommended blog post, My FOSS Story, Andrew Gallant talked about how they deal with an overflowing inbox full of issues, feature requests, and comments:

The solution that I’ve adopted for this phenomenon is one that I’ve used extremely effectively in my personal life: establish boundaries. Courteously but firmly setting boundaries is one of those magical life hacks that pays dividends once you figure out how to do it. If you don’t know how to do it, then I’m not sure exactly how to learn how to do it unfortunately. But setting boundaries lets you focus on what’s important to you and not what’s important to others.

It can be really easy to overextend yourself in an open-source project. This could mean, as a maintainer, feeling you have to respond to every comment, fix every bug. Overextending yourself in turn is a great way to become blended with a part, and start acting out some of those older, defensive strategies you have for dealing with stress.

Also, I’ve got bad news. You are going to screw up in some way. It might be overextending yourself¹⁰. It might be responding poorly. Or pushing for an idea that turns out to be very deeply wrong. When you do that, you have a choice. You can feel shame, or you can extend compassion and empathy to yourself. It’s ok. Mistakes happen. They are how we learn.

Once you’ve gotten past the shame, and realized that making mistakes doesn’t make you bad, you can start to think about repair. OK, so you messed up. What can you do about it? Maybe nothing is needed. Or maybe you need to go and undo some of what you did. Or maybe you have to go and tell some people that what they are doing is not ok. Either way, compassion and empathy for yourself is how you will get there.

On the limits of my own experience

Before I go, I want to take a moment to acknowledge the limits of my own experience. I am a cis, white male, and I think in this post it shows. When I encounter antipathy, it tends to be targeted at individual things I have done or ideas I am espousing. At most, it might come about because of the role I am playing. I don’t encounter conscious or unconscious bias on the basis of my race, gender, sexual orientation, or any other such thing. This gives me a lot of luxury. For example, for the most part, I can take a rude comment and I can usually find an underlying technical point to focus on in my response. This is not true for all maintainers. In writing this post, I thought a lot about how the dynamics of open source seem almost perfectly designed¹¹ to exclude people who are not from groups deemed “high status” by society.

Rust has a pretty uneven track record here. There are projects that do better. Improving our processes to take better account of how they feel for participants is definitely a necessary step, along with other things. One thing I am convinced of: the more people that get involved in Rust – and especially the more distinct backgrounds and experiences those people have – the better it becomes. Rust is always trying to achieve 6 (previously) impossible things before breakfast, and we need all the ideas we can get.¹²

Be gentle with each other

If I could have just one wish, it would be this bastardized quote from the great Bill and Ted:

We’ve talked a lot about empathy and how it comes into play, but really, in my mind, it all boils down to being gentle when somebody slips up. Note that being gentle doesn’t mean you can’t also be real and authentic about how you felt. We talked earlier about I-messages – by speaking plainly about how somebody made you feel, you can deliver a message that is both gentle and yet incredibly powerful. To me, the key is not to make assumptions about what’s going on for other people. You can never know their motivations. You can make guesses, but they’re always based on incomplete information.

Does this mean I think we should all go running around saying “when you do X, I felt like you were trying to ruin the project?” Well, not really, although I think that would be an improvement. Even better though would be to stop and think, wait, why would they be trying to ruin the project? Instead of assuming what other people are doing, tell them how they are making you feel. Maybe say, “when you do X, I feel like you are saying my use case doesn’t matter”. Or, better yet, say “when you do X, I will no longer be able to do Y, which I find really valuable”. I predict this is much more likely to lead to a constructive discussion.

It’s important to remember that the choice of words can have strong impact, too. For me, words like ruin or phrases like dumpster fire, shitshow, etc, can be quite triggering all on their own. I’m not always consistent on this. I’ve noticed that I sometimes use strong, colorful language because I think it’s funny. But I’ve also noticed that when other people do it, I can get pretty upset (“I know that code is not the best, but it’s worked for the last 3 years dang it.”).

I think you can boil all of this down to be precise and accurate when you communicate. It’s not accurate to say “you are trying to ruin the project”. You can’t know that. It is accurate to talk about what you feel and why you feel it. It’s also not accurate to say something is a dumpster fire, but it is accurate to call out shortcomings and concerns.

Anyway, I’m done giving advice. I’m no expert here, just one more person trying to learn and do the best I can. What I can say with confidence is that the things I’m talking about here have really helped me personally in approaching difficult situations in my life, and I hope that they’ll help some of you too!

I bought this book when it first came out, read a bit of it, and then thought of it more as a reference — a great book for getting clear, distinguished definitions that help to elucidate the subtleties of human emotion. But when I revisited it to prepare for this talk, I was surprised to find it was much more “front-to-back” readable than I thought, and carried a lot of hidden wisdom. ↩︎
Though I think people feeling good and better is always a consequence of having encountered someone else empathetic. ↩︎
By none other than Jay Earley, inventor of the Earley parser! This guy is my hero. ↩︎
And I say this as a cis white man, which means I don’t even have to deal with shit resulting from people’s conscious or unconscious bias. ↩︎
This is one reason I don’t personally like fast moving threads and discussions, and I often limit the venues where I will participate. I need a bit of time to sit with things and process them. ↩︎
It’s worth highlighting that the key points they are trying to make are not always technical. Re-reading Aaron Turon’s Listening and Trust posts for this series, I was reminded of glaebhoerl’s pivotal comment that articulated very well their frustration at the Rust maintainer’s sense of entitlement and superiority, and the reasons for it. As glaebhoerl identified so clearly, it wasn’t so much the technical decision that was the problem — though I think on balance it was the wrong call, it was a debatable point — as the manner of engagement. ↩︎
Like when Disney canceled Owl House without even asking me. WHAT GIVES DISNEY. ↩︎
For example, I’ve been ignoring messages in the Salsa Zulip for a bit, and feeling bad about how I just don’t have the time to focus on that project right now. I’m sorry y’all and I do still expect to come back to Salsa 2022 (which, alas, will clearly not ship in 2022 – ah well, I knew the risks when I put a year into the name). ↩︎
This structure, “when you do X, I feel Y”, is called an I-message. It’s surprisingly hard to do it right. It’s easy to make something that sounds like an I-message, but isn’t. For example, “When you closed this PR without commenting, it showed me I am not welcome here” is very different from “When you closed this PR without commenting, it made me feel like I am not welcome here”. The first one is not an I-message. It’s telling someone else how they feel. The second one is telling someone else how they made you feel. There’s a very good chance those two statements would land quite differently. ↩︎
Unless, perhaps, you are Andrew Gallant, who from what I can see is one supremely well balanced individual. :) ↩︎
This of course is what people mean when they talk about systemic racism, or at least how I understand it: it’s not that open source or most other things were designed intentionally to reinforce bias, but the structures of our society are set up so that if you don’t actively work to counteract bias, you wind up playing into it. ↩︎
I always think of Jessica Lord’s inspirational blog post Privilege, Community, and Open source, which sadly appears to be offline, but you can read it on the web-archive. ↩︎

Polonius revisited, part 1

2023-09-22T00:00:00+00:00

lqd has been doing awesome work driving progress on polonius. He’s authoring an update for Inside Rust, but the TL;DR is that, with his latest PR, we’ve reimplemented the traditional Rust borrow checker in a more polonius-like style. We are working to iron out the last few performance hiccups and thinking about replacing the existing borrow checker with this new re-implementation, which is effectively a no-op from a user’s perspective (including from a performance perspective). This blog post walks through that work, describing how the new analysis works at a high-level. I plan to write some follow-up posts diving into how we can extend this analysis to be more precise (while hopefully remaining efficient).

What is Polonius?

Polonius is one of those long-running projects that are finally starting to move again. From an end user’s perspective, the key goal is that we want to accept functions like so-called Problem Case #3, which was originally a goal of NLL but eventually cut from the deliverable. From my perspective, though, I’m most excited about Polonius as a stepping stone towards an analysis that can support internal references and self borrows.

Polonius began its life as an alternative formulation of the borrow checker rules defined in Datalog. The key idea is to switch the way we do the analysis. Whereas NLL thinks of 'r as a lifetime consisting of a set of program points, in polonius, we call 'r an origin containing a set of loans. In other words, rather than tracking the parts of the program where a reference will be used, we track the places that the reference may have come from. For deeper coverage of Polonius, I recommend my talk at Rust Belt Rust from (egads) 2019 (slides here).

Running example

In order to explain the analyses, I’m going to use this running example. One thing you’ll note is that the lifetimes/origins in the example are written as numbers, like '0 and '1. This is because, when we start the borrow check, we haven’t computed lifetimes/origins yet – that is the job of the borrow check! So, we first go and create synthetic inference variables (just like an algebraic variable) to use as placeholders throughout the computation. Once we’re all done, we’ll have actual values we could plug in for them – in the case of polonius, those values are sets of loans (each loan is a & expression, more or less, that appears somewhere in the program).

Here is our example. It contains two loans, L0 and L1, of x and y respectively. There are also four mutations, labeled (A) through (D):

let mut x = 22;
let mut y = 44;
let mut p: &'0 u32 = &x; // Loan L0, borrowing `x`
y += 1;                  // (A) Mutate `y` -- is this ok?
let mut q: &'1 u32 = &y; // Loan L1, borrowing `y`
if something() {
    p = q;               // `p` now points at `y`
    x += 1;              // (B) Mutate `x` -- is this ok?
} else {
    y += 1;              // (C) Mutate `y` -- is this ok?
}
y += 1;                  // (D) Mutate `y` -- is this ok?
read_value(p);           // use `p` again here

Today in Rust, we get two errors (C and D). If you were to run this example with MiniRust, though, you would find that only D can actually cause Undefined Behavior. At point C, we mutate y, but the only variable that references y is q, and it will never be used again. The borrow checker today reports an error because it’s overly conservative. Polonius, on the other hand, gets that case correct.

Location	Existing borrow checker	Polonius	MiniRust
A	✔️	✔️	OK
B	✔️	✔️	OK
C	❌	✔️	OK
D	❌	❌	Can cause UB, if `true` branch is taken

Reformulating the existing borrow check à la polonius

This blog post is going to describe the existing borrow checker, but reformulated in a polonius-like style. This will make it easier to see how polonius is different in the next post. The idea of doing this reformulation came about when implementing the borrow checker in a-mir-formality¹. At first, we weren’t sure if it was equivalent, but lqd verified it experimentally by testing it against the rustc test suite, where it matches the behavior 100% (lqd is also going to test against crater).

The borrow check analysis is a combination of three things, which we will cover in turn:

flowchart TD
  ConstructMIR --> LiveVariable
  ConstructMIR --> OutlivesGraph
  LiveVariable --> LiveLoanDataflow
  OutlivesGraph --> LiveLoanDataflow
  ConstructMIR["Construct the MIR"]
  LiveVariable["Compute the live variables"]
  OutlivesGraph["Compute the outlives graph"]
  LiveLoanDataflow["Compute the active loans at a given point"]

Construct the MIR

The borrow checker these days operates on MIR². MIR is basically a very simplified version of Rust where each statement is broken down into rudimentary statements. Our program is already so simple that the MIR basically looks the same as the original program, except for the fact that it’s structured into a control-flow graph. The MIR would look roughly like this (simplified):

flowchart TD
  Intro --> BB1
  Intro["let mut x: u32\nlet mut y: u32\nlet mut p: &'0 u32\nlet mut q: &'1 u32"]
  BB1["p = &x;\ny = y + 1;\nq = &y;\nif something goto BB2 else BB3"]
  BB1 --> BB2
  BB1 --> BB3
  BB2["p = q;\nx = x + 1;\n"]
  BB3["y = y + 1;"]
  BB2 --> BB4;
  BB3 --> BB4;
  BB4["y = y + 1;\nread_value(p);\n"]

  classDef default text-align:left,fill-opacity:0;

Note that MIR begins with the types for all the variables; control-flow constructs like if get transformed into graph nodes called basic blocks, where each basic block contains only simple, straightline statements.

Compute the live origins

The first step is to compute the set of live origins at each program point. This is precisely the same as it was described in the NLL RFC. This is very similar to the classic liveness computation that is taught in a typical compiler course, but with one key difference. We are not computing live variables but rather live origins – the idea is roughly that the live origins are equal to the origins that appear in the types of the live variables:

LiveOrigins(P) = { O | O appears in the type of some variable V live at P }

The actual computation is slightly more subtle: when variables go out of scope, we take into account the rules from RFC #1327 to figure out precisely which of their origins may be accessed by the Drop impl. But I’m going to skip over that in this post.
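The LiveOrigins definition above can be sketched directly in Rust. This is only an illustration, not the compiler's code: it assumes the set of live variables has already been computed by an ordinary liveness pass, it ignores the Drop subtlety just mentioned, and the names `live_origins` and `example_types` are made up for this sketch.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Origins are named by strings such as "'0"; variables by their identifiers.
type Origin = &'static str;
type Variable = &'static str;

/// LiveOrigins(P): every origin that appears in the type of some
/// variable that is live at P.
fn live_origins(
    live_variables: &BTreeSet<Variable>,
    origins_in_type: &BTreeMap<Variable, Vec<Origin>>,
) -> BTreeSet<Origin> {
    live_variables
        .iter()
        .filter_map(|v| origins_in_type.get(v))
        .flatten()
        .copied()
        .collect()
}

/// The origins appearing in the types of the running example's variables.
fn example_types() -> BTreeMap<Variable, Vec<Origin>> {
    BTreeMap::from([
        ("x", vec![]),     // x: u32
        ("y", vec![]),     // y: u32
        ("p", vec!["'0"]), // p: &'0 u32
        ("q", vec!["'1"]), // q: &'1 u32
    ])
}
```

Given the live variables `{p, q}` this yields `{'0, '1}`; given only `{q}` it yields `{'1}`.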

Going back to our example, I’ve added comments showing which origins would be live at various points of interest:

let mut x = 22;
let mut y = 44;
let mut p: &'0 u32 = &x;
y += 1;
let mut q: &'1 u32 = &y;
// Here both `p` and `q` may be used later,
// and so the origins in their types (`'0` and `'1`)
// are live.
if something() {
    // Here, only the variable `q` is live.
    // `p` is dead because its current value is about
    // to be overwritten. As a result, the only live
    // origin is `'1`, since it appears in `q`'s type.
    p = q;
    x += 1;
} else {
    y += 1;
}
// Here, only the variable `p` is live
// (`q` is never used again),
// and so only the origin `'0` is live.
y += 1;
read_value(p);

Compute the subset graph

The next step in borrow checking is to run a type check across the MIR. MIR is effectively a very simplified form of Rust where statements are heavily desugared and there is a lot less type inference. There is, however, a lot of lifetime inference – basically when NLL starts every lifetime is an inference variable.

For example, consider the p = q assignment in our running example:

...
let mut p: &'0 u32 = &x;
y += 1;
let mut q: &'1 u32 = &y;
if something() {
    p = q; // <-- this assignment
    ...
} else {
    ...
}
...

To type check this, we take the type of q (&'1 u32) and require that it is a subtype of the type of p (&'0 u32):

&'1 u32 <: &'0 u32

As described in the NLL RFC, this subtyping relation holds if '1: '0. In NLL, we called this an outlives relation. But in polonius, because '0 and '1 are origins representing sets of loans, we call it a subset relation. In other words, '1: '0 could be written '1 ⊆ '0, and it means that whatever loans '1 may be referencing, '0 may reference too. Whatever final values we wind up with for '0 and '1 will have to reflect this constraint.
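To make the step from a subtyping requirement to a subset constraint concrete, here is a small sketch. It handles only shared references whose referent types must match exactly, and the types `RefTy` and `Subset` and the function `require_subtype` are hypothetical names chosen for illustration, not rustc's.

```rust
/// An origin such as `'0`, identified by its number.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Origin(u32);

/// A shared reference type `&'origin referent`, e.g. `&'1 u32`.
#[derive(Clone, Debug, PartialEq, Eq)]
struct RefTy {
    origin: Origin,
    referent: &'static str,
}

/// The constraint `sub ⊆ sup` (written `sub: sup` in outlives notation).
#[derive(Debug, PartialEq, Eq)]
struct Subset {
    sub: Origin,
    sup: Origin,
}

/// Checks `sub_ty <: sup_ty` and returns the subset constraint it requires.
fn require_subtype(sub_ty: &RefTy, sup_ty: &RefTy) -> Result<Subset, String> {
    if sub_ty.referent != sup_ty.referent {
        return Err(format!(
            "mismatched types: {} vs {}",
            sub_ty.referent, sup_ty.referent
        ));
    }
    // `&'a T <: &'b T` holds if `'a: 'b`, i.e. `'a ⊆ 'b`.
    Ok(Subset { sub: sub_ty.origin, sup: sup_ty.origin })
}
```

For `p = q`, calling `require_subtype` with the type of q and then the type of p produces the constraint `'1 ⊆ '0`.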

We can view these subset relations as a graph, where '1: '0 means there is an edge '1 --⊆--> '0. In the borrow checker today, this graph is flow insensitive, meaning that there is one graph for the entire function. As a result, we are going to get a graph like this:

flowchart LR
  L0 --"⊆"--> Tick0
  L1 --"⊆"--> Tick1
  Tick1 --"⊆"--> Tick0
  
  L0["{L0}"]
  L1["{L1}"]
  Tick0["'0"]
  Tick1["'1"]

  classDef default text-align:left,fill:#ffffff;

You can see that '0, the origin that appears in p, can be reached from both loan L0 and loan L1. That means that it could store a reference to either x or y, in short. In contrast, '1 (q) can only be reached from L1, and hence can only store a reference to y.
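The "can be reached from" reading of the graph is plain graph reachability. The sketch below encodes the three edges of the running example and computes the loans contained in an origin. The `Node` and `SubsetGraph` types are assumptions made for this illustration and do not mirror the compiler's data structures.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A node in the subset graph: a loan (such as L0) or an origin (such as '0).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Node {
    Loan(u32),
    Origin(u32),
}

/// Flow-insensitive subset graph: an edge `a -> b` records `a ⊆ b`.
struct SubsetGraph {
    edges: BTreeMap<Node, Vec<Node>>,
}

impl SubsetGraph {
    /// The graph for the running example.
    fn running_example() -> Self {
        let mut edges = BTreeMap::new();
        edges.insert(Node::Loan(0), vec![Node::Origin(0)]); // {L0} ⊆ '0, from `p = &x`
        edges.insert(Node::Loan(1), vec![Node::Origin(1)]); // {L1} ⊆ '1, from `q = &y`
        edges.insert(Node::Origin(1), vec![Node::Origin(0)]); // '1 ⊆ '0, from `p = q`
        SubsetGraph { edges }
    }

    /// All loans with a path to `origin`, i.e. every L such that L ∈ origin.
    fn loans_in(&self, origin: u32) -> BTreeSet<u32> {
        let mut result = BTreeSet::new();
        for &start in self.edges.keys() {
            if let Node::Loan(loan) = start {
                if self.reaches(start, Node::Origin(origin)) {
                    result.insert(loan);
                }
            }
        }
        result
    }

    /// Depth-first search for a path from `from` to `to`.
    fn reaches(&self, from: Node, to: Node) -> bool {
        let mut stack = vec![from];
        let mut seen = BTreeSet::new();
        while let Some(node) = stack.pop() {
            if node == to {
                return true;
            }
            if seen.insert(node) {
                if let Some(successors) = self.edges.get(&node) {
                    stack.extend(successors.iter().copied());
                }
            }
        }
        false
    }
}
```

With this graph, `loans_in(0)` contains both loans while `loans_in(1)` contains only L1.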

Active loans

There is one last piece to complete the borrow checker, which is computing the active loans. Active loans determine the errors that get reported. The idea is that, if there is an active loan of a place a.b.c, then accessing a.b.c may be an error, depending on the kind of loan/access.
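As a rough sketch of "depending on the kind of loan/access", the check below treats two places as overlapping when one path is a prefix of the other, and allows only reads through shared loans. Both rules are simplifications assumed for illustration (the real checker also has to reason about dereferences and about shallow versus deep accesses), and all names here are hypothetical.

```rust
/// The kind of borrow that created a loan.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LoanKind {
    Shared,
    Mutable,
}

/// The kind of access being checked.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AccessKind {
    Read,
    Write,
}

/// A loan of some place: `&a.b.c` is a shared loan of the path ["a", "b", "c"].
struct Loan {
    kind: LoanKind,
    place: Vec<&'static str>,
}

/// Two places overlap when one path is a prefix of the other,
/// so `a.b` overlaps `a.b.c` but not `a.x`.
fn places_overlap(p: &[&str], q: &[&str]) -> bool {
    p.iter().zip(q.iter()).all(|(a, b)| a == b)
}

/// Whether accessing `place` is an error while `loan` is active.
fn is_error(loan: &Loan, access: AccessKind, place: &[&str]) -> bool {
    if !places_overlap(&loan.place, place) {
        return false;
    }
    // Reading through a shared loan is fine; every other combination conflicts.
    !(loan.kind == LoanKind::Shared && access == AccessKind::Read)
}
```

In the running example, L1 is a shared loan of `y`, so writing to `y` while L1 is active is an error, but reading `y` or writing to `x` is not.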

Active loans build on the liveness analysis as well as the subset graph. The basic idea is that a loan is active at a point P if there is a path from the borrow that created the loan to P where, for each point along the path…

there is some live variable that may reference the loan
- i.e., there is a live origin O at P where L ∈ O. L ∈ O means that there is a path in the subset graph from the loan L to the origin O.
the place expression that was borrowed (here, x) is not reassigned
- this isn’t relevant to the current example, but the idea is that you can borrow the referent of a pointer, e.g., &mut *tmp. If you then later change tmp to point somewhere else, then the old loan of *tmp is no longer relevant, because it’s pointing to different data than the current value of *tmp.

Implementing using dataflow

In the compiler, we implement the above as a dataflow analysis. The value at any given point is the set of active loans. We gen a loan (add it to the value) when it is issued, and we kill a loan at a point P if either (1) the loan is not a member of the origins of any live variables or (2) the path borrowed by the loan is overwritten.
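Here is a minimal sketch of that gen/kill dataflow, run over the running example with one point per statement. It is not the compiler's implementation: the `Point` struct, the hard-coded `live_loans` (the loans contained in some origin live on entry, which the liveness and subset-graph phases would normally supply), and the simple iterate-until-stable loop are all assumptions made to keep the example short.

```rust
use std::collections::BTreeSet;

type Loan = &'static str;

/// One program point, annotated with facts the earlier phases computed.
struct Point {
    /// Loan issued by the statement at this point, if any (the "gen").
    issues: Option<Loan>,
    /// Loans whose borrowed path this statement overwrites (kill rule 2).
    overwrites: Vec<Loan>,
    /// Loans contained in some origin that is live on entry to this point;
    /// kill rule 1 removes every other loan.
    live_loans: Vec<Loan>,
    /// Indices of the successor points in the control-flow graph.
    successors: Vec<usize>,
}

/// Iterates to a fixed point; returns the active loans on entry to each point.
fn active_loans_on_entry(points: &[Point]) -> Vec<BTreeSet<Loan>> {
    let mut entry: Vec<BTreeSet<Loan>> = vec![BTreeSet::new(); points.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for (i, point) in points.iter().enumerate() {
            // Transfer function: kill overwritten loans, then gen the issued loan.
            let mut out = entry[i].clone();
            for loan in &point.overwrites {
                out.remove(loan);
            }
            out.extend(point.issues);
            for &succ in &point.successors {
                for loan in &out {
                    // Kill rule 1: only loans in a live origin enter the successor.
                    if points[succ].live_loans.contains(loan) && entry[succ].insert(*loan) {
                        changed = true;
                    }
                }
            }
        }
    }
    entry
}

fn point(issues: Option<Loan>, live_loans: &[Loan], successors: &[usize]) -> Point {
    Point {
        issues,
        overwrites: Vec::new(), // nothing is overwritten in the running example
        live_loans: live_loans.to_vec(),
        successors: successors.to_vec(),
    }
}

/// The running example. Whenever `p` is live, `'0` is live, and the
/// flow-insensitive subset graph says `'0` contains both L0 and L1.
fn running_example() -> Vec<Point> {
    const BOTH: &[Loan] = &["L0", "L1"];
    vec![
        point(Some("L0"), &[], &[1]),  // 0: p = &x        (nothing live yet)
        point(None, BOTH, &[2]),       // 1: y = y + 1     (A)
        point(Some("L1"), BOTH, &[3]), // 2: q = &y
        point(None, BOTH, &[4, 6]),    // 3: if something()
        point(None, &["L1"], &[5]),    // 4: p = q         (only `q` is live)
        point(None, BOTH, &[7]),       // 5: x = x + 1     (B)
        point(None, BOTH, &[7]),       // 6: y = y + 1     (C)
        point(None, BOTH, &[8]),       // 7: y = y + 1     (D)
        point(None, BOTH, &[]),        // 8: read_value(p)
    ]
}
```

Calling `active_loans_on_entry(&running_example())` gives `{L1}` on entry to points 4 and 5 and `{L0, L1}` on entry to points 6 and 7, which lines up with the table of results for B, C, and D.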

Active loans on entry to the function

Let’s walk through our running example. To start, look at the first basic block:

flowchart TD
  Start["..."]
  BB1["// Active loans: {}
       p = &x; // Gen: L0 -- loan issued
       // Active loans: {L0}
       y = y + 1;
       q = &y; // Gen L1 -- loan issued
       // Active loans {L0, L1}
       if something goto BB2 else BB3
  "]
  BB2["..."]
  BB3["..."]
  BB4["..."]

  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4

  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB1 highlight

This block is the start of the function, so the set of active loans starts out as empty. But then we encounter two borrow expressions, &x and &y, and each of them is the gen site for a loan (L0 and L1 respectively). By the end of the block, the active loan set is {L0, L1}.

Active loans on the “true” branch

The next interesting point is the “true” branch of the if:

flowchart TD
  Start["
    ...
    let mut q: &'1 u32;
    ...
  "]
  BB1["..."]
  BB2["
      // Kill L0 -- not part of any live origin
      // Active loans {L1}
      p = q;
      x = x + 1;
  "]
  BB3["..."]
  BB4["..."]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB2 highlight

The interesting thing here is that, on entering the block, there is a kill of L0. This is because the only live reference on entry to the block is q, as p is about to be overwritten. As the type of q is &'1 u32, this means that the live origins on entry to the block are {'1}. Looking at the subset graph we saw earlier…

flowchart LR
  L0 --"⊆"--> Tick0
  L1 --"⊆"--> Tick1
  Tick1 --"⊆"--> Tick0
  
  L0["{L0}"]
  L1["{L1}"]
  Tick0["'0"]
  Tick1["'1"]

  class L1 trace
  class Tick1 trace

  classDef default text-align:left,fill:#ffffff;
  classDef trace text-align:left,fill:yellow;

…we can trace the transitive predecessors of '1 to see that it contains only {L1} (I’ve highlighted those predecessors in yellow in the graph). This means that there is no live variable whose origins contain L0, so we add a kill for L0.

No error on `true` branch

Because the only active loan is L1, and L1 borrowed y, the x = x + 1 statement is accepted. This is a really interesting result! It illustrates how the idea of active loans restores some flow sensitivity to the borrow check.

Why is it so interesting? Well, consider this. At this point, the variable p is live. The variable p contains the origin '0, and if we look at the subset graph, '0 contains both L0 and L1. So, based purely on the subset graph, we would expect modifying x to be an error, since it is borrowed by L0. And yet it’s not!

This is because the active loan analysis noticed that, although in theory p may reference L0, it definitely doesn’t at this point.

Active loans on the `false` branch

In contrast, if we look at the “false” branch of the if:

flowchart TD
  Start["
    ...
    let mut p: &'0 u32;
    ...
  "]
  BB1["..."]
  BB2["..."]
  BB3["
      // Active loans {L0, L1}
      y = y + 1;
  "]
  BB4["..."]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB3 highlight

False error on the `false` branch

This path is also interesting: there is only one live variable, p. If you trace the code by hand, you can see that p could only refer to L0 (x) here. And yet the analysis concludes that we have two active loans: L0 and L1. This is because it is looking at the subset graph to determine what p may reference, and that graph is flow insensitive. So, since p may reference L1 at some point in the program, and we haven’t yet seen references to L1 go completely dead, we assume that p may reference L1 here. This leads to a false error being reported when the user does y = y + 1.

Active loans on the final block

Now let’s look at the final block:

flowchart TD
  Start["
    ...
    let mut p: &'0 u32;
    ...
  "]
  BB1["..."]
  BB2["..."]
  BB3["..."]
  BB4["
        // Active loans {L0, L1}
        y = y + 1;
        read_value(p);
  "]
 
  Start --> BB1
  BB1 --> BB2
  BB1 --> BB3
  BB2 --> BB4
  BB3 --> BB4
 
  classDef default text-align:left,fill:#ffffff;
  classDef highlight text-align:left,fill:yellow;
  class BB4 highlight

At this point, there is one live variable (p) and hence one live origin ('0); the subset graph tells us that p may reference both L0 and L1, so the set of active loans is {L0, L1}. This is correct: depending on which path we took, p may refer to either L0 or L1, and hence we flag a (correct) error when the user attempts to modify y.

Kills for reassignment

Our running example showed one reason that loans get killed: there are no more live references to them. This most commonly happens when you create a short-lived reference and then stop using it. But there is another way to get a kill, which happens from reassignment. Consider this example:

struct List {
    data: u32,
    next: Option<Box<List>>
}

fn print_all(mut p: &mut List) {
    loop {
        println!("{}", p.data);
        if let Some(n) = &mut p.next {
            p = n;
        } else {
            break;
        }
    }
}

I’m not going to walk through how this is borrow checked in detail here, but let me just point out what makes it interesting. In this loop, the code first borrows from p and then assigns that result to p. This means that, if you just look at the subset graph, on the next iteration around the loop, there would be an active loan of p. However, this code compiles – how does that work? The answer is that when we do p = n, we are mutating p, which means that, when we borrow from p on the next iteration, we are actually borrowing from a different node than we borrowed from in the first iteration. So everything is fine. The reason the borrow checker is able to conclude this is that it kills the loan of p.next when it sees that p is assigned to. This is discussed in the NLL RFC in more detail.
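The kill on reassignment can be sketched as a prefix test on the borrowed place: assigning to `p` kills every loan whose borrowed path starts at `p`. The path encoding (with "*" standing for a dereference) and the names `Loan` and `kill_on_assignment` are assumptions for this illustration only.

```rust
/// A loan and the place it borrowed, as path components:
/// `&mut (*p).next` borrows ["p", "*", "next"].
#[derive(Debug, PartialEq)]
struct Loan {
    name: &'static str,
    borrowed: Vec<&'static str>,
}

/// Kills every active loan whose borrowed place has `assigned` as a prefix.
/// After `p = n`, a loan of `(*p).next` refers to the old value of `p`,
/// so it is no longer relevant to the current value of `(*p).next`.
fn kill_on_assignment(active: &mut Vec<Loan>, assigned: &[&'static str]) {
    active.retain(|loan| !loan.borrowed.starts_with(assigned));
}
```

With a loan of `(*p).next` and an unrelated loan of `q` both active, `kill_on_assignment(&mut active, &["p"])` leaves only the loan of `q`.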

Conclusion

That brings us to the end of part 1! In this post, we covered how you can describe the existing borrow check in a more polonius-like style. We also uncovered an interesting quirk in how the borrow checker is formulated. It uses a location insensitive alias analysis (the subset graph) but complements that with a dataflow propagation to track active loans. Together, this makes it more expressive. This wasn’t, however, the original plan with NLL. Originally, the subset graph was meant to be flow sensitive. Extending the subset graph to be flow sensitive is basically the heart of polonius. I’ve got some thoughts on how we might do that and I’ll be getting to that in later posts. I do want to say in passing though that doing all of this framing is also making me wonder – is it really necessary to combine a type check and the dataflow check? Can we frame the borrow checker (probably the more precise variants we’ll be getting to in future posts) in a more unified way? Not sure yet!

You won’t find this code in the current version of a-mir-formality; it’s since been rewritten a few times and the current version hasn’t caught up yet. ↩︎
The origin of the MIR is actually an interesting story. As documented in RFC #1211, ↩︎

New Layout, and now using Hugo!

2023-09-19T00:00:00+00:00

Some time ago I wrote about how I wanted to improve how my blog works. I recently got a spate of emails about this – thanks to all of you! And a particularly big thank you to Luna Razzaghipour, who went ahead and ported the blog over to use Hugo, cleaning up the layout a bit and preserving URLs. It’s much appreciated! If you notice something amiss (like a link that doesn’t work anymore), I’d be very grateful if you opened an issue on the babysteps github repo! Thanks!

Hugo seems fast so far, although I will say that figuring out how to use Hugo modules (so that I could preserve the atom feed…) was rather confusing! But it’s all working now (I think!). I’m still interested in playing around more with the layout, but overall I think it looks good, and I’m happy to have code coloring on the snippets. Hopefully it renders better on mobile too.

Stability without stressing the !@#! out

2023-09-18T00:00:00+00:00

One of Rust’s core principles is “stability without stagnation”. This is embodied by our use of a “release train” model, in which we issue a new release every 6 weeks. Release trains make releasing a new release a “non-event”. Feature-based releases, in contrast, are super stressful! Since they occur infrequently, people try to cram everything into that release, which inevitably makes the release late. In contrast, with a release train, it’s not so important to make any particular release – if you miss one deadline, you can always catch the next one six weeks later. That’s the theory, anyway: but I’ve observed that, in practice, stabilizing a feature in Rust can still be a pretty stressful process. And the more important the feature, the more stress. This blog post talks over my theories as to why this is the case, and how we can tweak our processes (and our habits) to address it.

TL;DR

I like to write, and sometimes my posts get long. Sorry! Let me summarize for you:

Stabilization decisions in Rust are stressful because they are conflating two distinct things: “does the feature do what it is supposed to do” (semver-stability) and “is the feature ready for general use for all its intended use cases” (recommended-for-use).
Open source works incrementally: to complete the polish we want, we need users to encounter the feature; incremental milestones help us do that.
Nightly is effective for getting some kinds of feedback, but not all; in particular, production users and library authors often won’t touch it. This gives us less data to work with when making high stakes decisions, and it’s a problem.
We should modify our process to distinguish four phases
- Accepted RFC – The team agrees idea is worth implementing, but it may yet be changed or removed. Use at your own risk. (Nightly today)
- Preview – Team agrees feature is ready for use, but wishes more feedback before committing. We reserve the right to tweak the details, but will not remove functionality without some migration path or workaround. (No equivalent today)
- Stable – Team agrees feature is done. Semantics will no longer change. Implementation may lack polish and may not yet meet all its intended use cases (but should meet some). (Stable today)
- Recommended – everyone should use this, it rocks. 🎸 (No equivalent today, though some would say stable)
I have an initial proposal for how we could implement these phases for Rust, but I’m not sure on the details. The point is more to identify this as a problem and start a discussion on potential solutions, rather than to drive a particular proposal.

Context

This post is inspired by years of experience trying to stabilize features. I’ve been meaning to write it for a while, but I was influenced most recently by the discussion on the PR to stabilize async fn in trait and return-position impl trait. I’m not intending this blog post to be an argument either way on that particular discussion, although I will be explaining my POV, which certainly has bearing on the outcome.

I will zoom out though and say that I think the Rust project needs to think about the whole “feature design lifecycle”. This has been a topic for me for years – just search for “adventures in consensus” on this blog. I think in the past I’ve been a bit too ambitious in my proposals¹, so I’m thinking now about how we can move more incrementally. This blog post is one such example.

Summary of Rust’s process today

Let me briefly summarize the “feature lifecycle” for Rust today. I’ll focus on language features since that’s what I know best: this material is also published on the “How do I propose a change to the language” page for the lang-team, which I suspect most people don’t know exists².

The path is roughly like this:

Author an RFC that outlines the problem to be solved and the key aspects of your solution. The RFC doesn’t have to have everything figured out, especially when it comes to the implementation – but it should describe most everything that a user of the language would have to know. The RFC can include “unresolved questions” that lay out corner cases or things where we need more experience to figure out the right answer.
- Generally speaking, to avoid undue maintenance burden, we don’t allow code to land until there is an accepted RFC. There is an exception though for experienced Rust contributors, who can create an experimental feature gate to do some initial hacking. That’s sometimes useful to prove out designs.³
Complete the implementation on master. This should force you to work out answers to all the unresolved questions that came up in the RFC. Often, having an implementation to work with also leads to other changes in the design. Presuming these are relatively minor, these changes are discussed and approved by the lang team on issues on the rust-lang repository.
Author a stabilization report, describing precisely what is being stabilized along with how each unresolved question was resolved.

Observation: Stabilization means different things to different people.

In a technical sense, stabilization means exactly one thing: the feature is now available on the stable release, and hence we can no longer make breaking changes to it⁴.

But, of course, stabilization also means that the feature is going to be encountered by users. Rust has always prided itself on holding a high bar for polish and quality, as reflected in how easy cargo is to use, our quality error messages, etc. There is always a concern when stabilizing a long-awaited feature that users are going to get excited, try it out, encounter rough edges, and conclude from this that Rust is impossible to use.

Observation: Open source works incrementally

Something I’ve come to appreciate over time is that open source is most effective if you work incrementally. If you want people to contribute or to provide meaningful feedback, you have to give them something to play with. Once you do that, the pace of progress and polish increases dramatically. It’s not magic, it’s just people “scratching their own itch” – once people have a chance to use the feature, if there is a confusing diagnostic or other similar issue, there’s a good chance that somebody will take a shot at addressing it.

In fact, speaking of diagnostics, it’s pretty hard to write a good diagnostic until you’ve thrown the feature at users. Often it’s not obvious up front what is going to be confusing. If you’ve ever watched Esteban at work, you’ll know that he scans all kinds of sources (github issues, twitter or whatever it’s called now, etc) to see the kinds of confusions that people are having and to look for ideas on how to explain them better.

Observation: Incremental progress boosts morale

The other big impact of working incrementally is for morale. If you’ve ever tried to push a big feature over the line, you’ll know that achieving milestones along the way is crucial. There’s a huge difference between trying to get everything perfect before you can ship and saying: “ok, this part is done, let’s get it in people’s hands, and then go focus on the next one”. This is both because it’s good to have the satisfaction of a job well done, and because stabilization is the only point at which we can truly end discussion. Up until stabilization is done, it’s always possible to stop and revisit old decisions.⁵

Observation: Working incrementally has a cost

Obviously, I am a big fan of working incrementally, but I won’t deny that it has a cost. For every person who encounters a bad diagnostic and gets inspired to open a PR, there are a lot more who will get confused. Some portion of them will walk away, concluding “Rust is too confusing”. That’s a problem.

Observation: A polished feature has a lot of moving parts

A polished feature in Rust today has a lot of moving parts…

a thoughtful design
a stable, bug free implementation
documentation in the Rust reference
quality error messages
tooling support, such as rustfmt, rustdoc, IDE, etc

…and we’d like to add more. For example, we are working on various Rust formalizations (MiniRust, a-mir-formality) and talking about upgrading the Rust reference into a normative specification.

Observation: Distinct skillsets are required to polish a feature

One interesting detail is that, often, completing a polished feature requires the work of different people with different skillsets, which in turn means the involvement of many distinct Rust teams – in fact, when it comes to development tooling, this can mean the involvement of distinct projects that aren’t even part of the Rust org!

Just looking at language features, the design, for example, belongs to the lang-team, and often completes relatively early through the RFC process. The implementation is (typically) the compiler team, but often also more specialized teams and groups, like the types team or the diagnostics working group; RFCs can sometimes languish for a long time before being implemented. Documentation meanwhile is driven by the lang-docs team (for language features, anyway). Once that is done, the rustfmt, rustdoc, and IDE vendors also have work to do incorporating the new feature.

One of the challenges to open-source development is coordinating all of these different aspects. Open source development tends to be opportunistic – you don’t have dedicated resources available, so you have to do a balancing act where you adapt the work that needs to get done to the people that are available to do it. In my experience, it’s neither top down nor bottom up, but a strange mixture of the two.⁶

Because of the opportunistic nature of open-source development, some parts of a feature move more quickly than others – often, the basic design gets hammered out early, but implementation can take a long time. Sadly, the reference is often the hardest thing to catch up, in part because the rather heroic Eric Huss does not implement the Clone trait. 💜

Observation: Polished features don’t stand alone

And yet, to be truly polished, features need more than docs and error-messages: they need other features! It often happens that users using feature X will find that, to complete their task, they also need feature Y. This inevitably presents a challenge to our stabilization system, which judges the stability of each feature independently.

Async functions in trait are a great example: the core feature is working great on stable, but we haven’t reached consensus on a solution to the send bound problem. For some users, like embedded users, this doesn’t matter at all. For others, like Tower, this is a pretty big problem. So, do we hold back async function in traits until both features are ready? Or do we work incrementally, releasing what is ready now and then turning to focus on what’s left?

Observation: Nightly is just the beginning

I can hear readers saying now, “but wait, isn’t this what Nightly is for?” And yes, in principle, the nightly release is our vehicle for enabling experimentation with in-progress features. Sometimes it works great! It can be a great way to get ahead of confusing error messages, for example, or to flush out bugs. But all too often, Nightly is a big barrier for people, particularly production Rust users or those building widely used libraries. And those are precisely the users whose feedback would be most valuable.

What’s interesting is that many production users would be willing to tolerate a certain amount of instability. Many users tell me they wouldn’t mind rebasing over small changes in the feature design⁷, but what they can’t tolerate is building a codebase around a feature and then having it removed entirely, or having support for major use cases dropped without some kind of workaround.

Libraries are another interesting story. Library authors tend to be more advanced than your typical Rust user. They can tolerate a lack of polish in exchange for having access to a feature that lets them build a nicer experience for their users. Generic associated types are a clear example of this. One of the big arguments in favor of stabilizing them was that they often show up in the implementation of libraries but not in the outward interfaces. As one personal example, we’ve been using them extensively in Duchess, an experimental library for Java-Rust interop, and yet you won’t find any mention of them in the docs. Do we sometimes hit confusing errors or other problems? Yes. Is the syntax annoyingly verbose? Yes, absolutely. Am I glad they are stabilized? Hell yes.

Observation: having users help us figure out what else is needed

Remember how I said that it was hard to design quality diagnostics until you had seen the ways that users got confused? Well, the same goes for designing related features. Once production users or library authors start playing with something, they find all kinds of clever things they can do with it – or, often, things they could almost do, except for this one other missing piece. In this way, holding things unstable on Nightly – which means far fewer users can touch it – holds back the whole pace of Rust development significantly.

Prior art

Ember’s feature lifecycle

The Ember and Rust projects have long had a lot of fruitful back-and-forth when it comes to governance and process, thanks in part to the fact that Yehuda Katz was deeply involved in both of them. In 2022, they adopted a revised RFC process in which each feature goes through a number of stages:⁸

Proposed – An open pull request on the emberjs/rfcs repo.
Exploring – An RFC deemed worth pursuing but in need of refinement.
Accepted – A fully specified RFC.
Ready for release – The implementation of the RFC is complete, including learning materials.
Released – The work is published.
Recommended – The feature/resource is recommended for general use.

This is pretty cool! One other interesting aspect for Ember is how they approach editions. Remember I talked about how features don’t stand alone? In Ember, a significant cluster of related features is called an “edition”. New editions are declared when all the pieces are in place to enable a new model for programming. This is pretty distinct from Rust’s time-based editions.

I’m not totally sure how to map Ember’s edition to Rust, but I think that the concept of an “umbrella initiative” is pretty close. For example, the async fundamentals initiative roadmap identifies a cluster of related work that together constitute “async-sync language parity” – i.e., you can truly use async operations everywhere you would like to.

One interesting aspect of Ember’s editions is that they often begin by stabilizing “primitives” – e.g., fundamental APIs that aren’t really meant for end-users, but rather for plugin authors or people in the ecosystem, who can use them to experiment with the right end-user abstractions. I’ve found in Rust that we sometimes do this, though sometimes we find it better to begin with the end-user abstraction, and expose the primitives later.

The TC39 process for ECMAScript

The TC39 committee has a nice staged process. It’s not exactly comparable to Rust, but there are a few things worth observing. First, I love the designation of a champion for a feature, and I think Rust would benefit from being more official about that in some ways. Second, I also love the explainer concept of authoring user documentation as part of the process. Third, before they stabilize, they always make the feature available to end-users, but under gates.

Java’s preview features

Ever since JEP-12, Java has included preview features in their release process. A preview feature is one that is “fully specified, fully implemented, and yet impermanent” – it’s released for feedback, but it may be removed or changed based on the result of the evaluation. The motivation is to get more feedback on the design before committing to it:

To build confidence in the correctness and completeness of a new feature – whether in the Java language, the JVM, or the Java SE API – it is desirable for the feature to enjoy a period of broad exposure after its specification and implementation are stable but before it achieves final and permanent status in the Java SE Platform.

When using preview features, users opt-in both at compilation time and at runtime. In other words, if you compile a Java file that uses preview features to a JAR, and distribute the JAR, people using the JAR must also opt-in.

Proposal

Instead of rehashing the same debate every time we go to stabilize a feature, I think we should look at our feature release process so that we have more gradations of stability:

accepted RFC – With an accepted RFC, the team has agreed that we want the feature in principle. However, the details often change during development, and the feature may even be removed. Use at your own risk.
preview – We are committed to keeping this functionality in some form, but we reserve the right to make changes. We won’t remove functionality from preview state without some kind of workaround. You can use this feature so long as you are willing to update your code when moving to a new version of the compiler. Preview features must be viral, meaning that if I build a crate using preview features, consumers must opt-in to the resulting instability somehow.
semver stable – We have committed to the technical design of this feature and people can build on it without fear of breakage between compiler revisions. The experience may lack polish and some intended use cases may not yet be possible.
recommended for use – This feature has all the documentation, error messages, and associated features that are needed for most Rust users to be successful. USE IT!

Comparison with today’s release trains. In our system today, the first two phases are both covered by “nightly” and the latter two are both covered by “stable”, but of course we don’t draw any formal distinctions. Async function in trait, for example, is clearly past the accepted RFC phase and is now in preview: the team is committed to shipping it in some form, and we don’t expect any major changes. But how would you know this, if you aren’t closely following Rust development? Generic associated types, meanwhile, are clearly semver stable rather than recommended for use – we know of many major gaps in the experience, mostly blocked on the trait system refactor initiative, but how would you know that, unless you were actively attending Rust types team meetings?

Unresolved questions

I am confident that these four phases are important, but there are a number of details of which I am not sure. Let me pose some of the questions I anticipate here.

How committed should we be to preview features?

In my proposal above, I said that the project would not remove functionality without a workaround. This is somewhat stronger than JEP-12, which indicates that preview features “will either be granted final and permanent status (with or without refinements) or be removed”. I said something somewhat stronger because I was thinking of production users. I know many such users would happily make use of preview features, and they are willing to make updates, but they don’t want to get stuck having based their codebase on something that completely goes away. I feel pretty confident that by the time we get to preview state, we should be able to say “yes, we want something like this”. I think it’s fine however if the feature gets removed in favor, say, of a procedural macro or some other solution, so long as the people using that preview feature have somewhere to go. (Naturally, my preference would be to provide as smooth a path as possible between compiler revisions; ideally, we’d issue automatable suggestions using cargo fix, similar to what we do for editions.)

How should the features be reflected in our release trains?

I don’t entirely know! I think there are a lot of different versions. I do know a few things:

Instability should be viral, whether experimental or preview: today, if I depend on a crate that uses nightly features, I must use nightly myself; this falls out from the fact that Rust doesn’t support binary distribution, but is very much intentional. The reason is that a crate cannot truly “hide” instability from its users. They can always upgrade to a new version of Rust and, if that causes the crate to stop compiling, they will perceive this as a failure of Rust’s promise, even if it is a result of the crate having used an unstable feature. We need the same kind of viral behavior for preview features.
Preview and stabilized features need to be internally consistent, but not complete or fully polished: Preview features need to meet a certain quality bar – e.g., support in rustfmt, adequate documentation – but it’s fine for them to be a subset of what we hope to do in the fullness of time. It’s also ok for them to have less-than-ideal error messages. Those things come with time.
Documentation is key: A big challenge for Rust today is that we don’t have a canonical way for people to find out the status of the things they care about. I think we should invest some effort in setting up a consistent format with bot/tooling support to make it easy to maintain. Users will understand the idea that a feature is unpolished if you can direct them to a page where they can understand the context and learn about the workarounds they need in the short term.

With that in mind, here is a possible proposal for how we might do this:

Initially, features are nightly only, as today, and require an individual feature-gate.
- Until there is an accepted RFC, we should have a mandatory warning that the team has not yet decided if the feature is worth including; we also can continue to warn for features whose implementation is very incomplete.
Preview features are usable on stable, but with opt-in:
- Every project that uses any preview features, or which depends on crates that use preview features, must include preview-features = true in their Cargo.toml.
- Every crate that directly uses preview features must additionally include the appropriate feature gates.
- Reaching preview status should require some base level of support
  - core tooling, e.g. rustfmt, rustdoc, must work
  - an explainer must be available, but Rust reference material is not required
  - a nice landing page (or Github issue with known format) that indicates how to provide feedback; this page should also cover polish or supporting features that are known to be missing (similar to the async fn fundamentals roadmap)
  - the feature must be “complete enough” to meet some of its intended use cases; it doesn’t have to meet all of its intended use cases.
- This is an FCP decision, because it commits the Rust project to supporting the use cases targeted by the preview feature (if not the details of how the feature works).
Semver stable features are usable on stable, but we make efforts to redirect users to a landing page that outlines what kind of support is still missing and how to provide feedback.
- Reaching semver stable requires an update to the Rust reference, in addition to the requirements for preview.
- The feature must be “complete enough” to meet some of its intended use cases; it doesn’t have to meet all of its intended use cases.
- This is an FCP decision, because it commits the Rust project to supporting the feature in its current form going forward.
Recommended for use features would be just as today.
- The feature must meet all of the major use cases, which may mean that other features are present.

Conclusion

With apologies to Jane Austen:

“All Rust features are so accomplished. They all have stable semantics and even make helpful suggestions when you go astray. I am sure I never encountered a Rust feature without being informed that it was very accomplished.”

“Your list of the common extent of accomplishments,” said Darcy, “has too much truth. The word is applied to many a feature who deserves it no otherwise than by being stabilized. But I am very far from agreeing with you in your estimation of Rust features in general. I cannot boast of knowing more than half-a-dozen, in the whole range of my acquaintance, that are really accomplished.”

“Then,” observed Elizabeth, “you must comprehend a great deal in your idea of an accomplished feature.”

“Oh! certainly,” cried his faithful assistant, “no feature can be really esteemed accomplished without strong support in the IDE, wondrous documentation, and perhaps a chapter in the Rust book.”

“All this it must possess,” added Darcy, “and to all this it must yet add something more substantial: a host of related features that address common problems our users may encounter.”

“I am no longer surprised at your knowing ONLY six accomplished features. I rather wonder now at your knowing ANY.”

To translate: I think our ‘all or nothing’ stability system is introducing unnecessary friction into Rust development. Let’s change it!

A critique which many people pointed out to me at the time. ↩︎
The whole “How do I…” section on the page has some interesting things, if you’re looking to interact with the lang team! ↩︎
The decision to limit in-tree experimentation to experienced contributors was based on our experience with the earlier initiative system, where we were more open-ended. We found that the majority of those projects never went anywhere. Most of the people who signed up to drive experiments didn’t really have the time or knowledge to move them independently, and there wasn’t enough mentoring bandwidth to help them make progress. So we decided to limit in-tree experimentation to maintainers who’ve already demonstrated staying power. ↩︎
RFC 1122 lays out the lang team’s definition of “breaking change”, which is not quite the same as “your code will always continue to compile”. For example, we sometimes change the rules of inference; we also introduce or modify the behavior of lints (which can cause code that has #[deny] to stop compiling). Finally, we reserve the right to fix soundness bugs. And, in rare cases, we will override the policy altogether, if a feature’s design is so broken, but the bar for that is quite high. ↩︎
One of the things I am proud of about the Rust project is that we are willing to stop and revisit old decisions – I think we’ve dodged a number of bullets that way. At the same time, it’s exhausting. I think there’s more to say about finding ways to enable conversation that are not as draining on the participants, and especially on the designers and maintainers, but that’s a topic for another post. ↩︎
That said, my experience is that Amazon works in a surprisingly similar way – there are top-down decisions, but there are an awful lot of bottom-up ones. I imagine this varies company to company, but I think ultimately every good manager tries to ensure that their people are working on things that are well-suited to their skills. ↩︎
Many of which could be automated via cargo fix! ↩︎
Speaking of Ember-Rust cross-pollination, Peter Wagenet, co-author of the Ember release blog post, also hacks on the Rust compiler from time to time. ↩︎
There’s nothing worse than investing months and months of work into getting something ready for stabilization, endlessly triaging issues, only to open a stabilization PR – the culmination of all that effort – and have the first few comments tell you that your work is not good enough. Oftentimes the people opening those PRs are volunteers, as well, which makes it all the worse. ↩︎

Higher-ranked projections (send bound problem, part 4)

2023-06-12T00:00:00+00:00

I recently posted a draft of an RFC about Return Type Notation to the async working group Zulip stream. In response, Josh Triplett reached out to me to raise some concerns. Talking to him gave rise to a 3rd idea for how to resolve the send bound problem. I still prefer RTN, but I think this idea is interesting and worth elaborating. I call it higher-ranked projections.

Idea part 1: Define `T::Foo` when `T` has higher-ranked bounds

Consider a trait like this…

trait Transform<In> {
    type Output;

    fn apply(&self, input: In) -> Self::Output;
}

Today, given a trait bound like T: Transform<Vec<u32>>, when you write T::Output, the compiler expands that to a fully qualified associated type <T as Transform<Vec<u32>>>::Output. This took a bit of work: the self type (T) of the trait is specified by the user, but the compiler looked at the bounds to select Vec<u32> as the value for In.

But suppose you have a higher-ranked trait bound like T: for<'a> Transform<&'a [u32]>. Then what should the compiler do for T::Output? The compiler would have to do something like <T as Transform<&'b [u32]>>::Output where we pick a specific lifetime 'b. Instead of doing that, the compiler currently gives an error.

But we don’t always need to expand T::Output to a specific type. If T::Output is appearing in a where-clause, we could expand it to a range of types. For example, consider this function, which today will not compile:

fn process<T>()
where
    T: for<'a> Transform<&'a str>,
    T::Output: Send, // ERROR: `T::Output` is not allowed
{ /* … */ }

We could interpret T::Output: Send as a higher-ranked bound, for example:

fn process<T>()
where
    T: for<'a> Transform<&'a str>,
    for<'a> <T as Transform<&'a str>>::Output: Send, // Desugared?
{ /* … */ }
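For comparison, as far as I can tell the desugared form can already be written by hand on stable Rust; the idea is only about letting T::Output stand in for it. Here is a minimal sketch. The implementor `Len`, the helper `assert_send`, and the extra parameters on `process` are hypothetical, invented for illustration.

```rust
trait Transform<In> {
    type Output;

    fn apply(&self, input: In) -> Self::Output;
}

// Hypothetical implementor: maps any `&str` to its length.
struct Len;

impl<'a> Transform<&'a str> for Len {
    type Output = usize;

    fn apply(&self, input: &'a str) -> usize {
        input.len()
    }
}

// Compiles only if `X` is `Send`; returns its argument unchanged.
fn assert_send<X: Send>(x: X) -> X {
    x
}

fn process<'s, T>(t: &T, text: &'s str) -> <T as Transform<&'s str>>::Output
where
    T: for<'a> Transform<&'a str>,
    // The higher-ranked bound, spelled out by hand.
    for<'a> <T as Transform<&'a str>>::Output: Send,
{
    // The `Send` requirement is discharged by the where-clause above.
    assert_send(t.apply(text))
}
```

Calling `process(&Len, "hello")` type-checks because `usize` is `Send` for every choice of `'a`.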

Idea part 2: Fix the bugs on associated type chains

Right now, if you have an iterator that yields other iterators, the compiler won’t let you write things like T::Item::Item…

fn foo<T: Iterator>()
where
    T::Item: Iterator,
    T::Item::Item: Send, // <-- ERROR
{ /* … */ }

…instead you have to write something horrible like <<T as Iterator>::Item as Iterator>::Item. There’s no particularly good reason for this. We should make it work better. One thing that would be useful is if we examined the bounds declared in the trait, so that e.g. if we have a trait like…

trait Factory {
    type Iterator: Iterator;
}

…and a F: Factory, then F::Iterator::Item should work.
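To make the status quo concrete, here is a sketch of the qualified spellings that I believe compile on stable today, using one explicit `as Iterator` step per link in the chain. The function names, the `make` method, and the `UpTo` type are hypothetical additions for the example.

```rust
// What one would like to write as `T::Item::Item`, with an explicit cast.
fn first_of_each<T>(outer: T) -> Vec<<T::Item as Iterator>::Item>
where
    T: Iterator,
    T::Item: Iterator,
    <T::Item as Iterator>::Item: Send,
{
    outer.filter_map(|mut inner| inner.next()).collect()
}

trait Factory {
    type Iterator: Iterator;

    // Hypothetical method, added so the example has something to call.
    fn make(&self) -> Self::Iterator;
}

// Hypothetical factory producing the range `0..n`.
struct UpTo(u32);

impl Factory for UpTo {
    type Iterator = std::ops::Range<u32>;

    fn make(&self) -> Self::Iterator {
        0..self.0
    }
}

// What one would like to write as `F::Iterator::Item: Send`.
fn count_items<F>(factory: &F) -> usize
where
    F: Factory,
    <F::Iterator as Iterator>::Item: Send,
{
    factory.make().count()
}
```

The casts carry no extra information here, since the bounds already say that each link is an `Iterator`; that is the sense in which the restriction looks like a bug rather than a design choice.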

Idea part 3: Associated type for every method in a trait

As the final step, for every method in a trait, we could add an associated type that binds to the “zero-sized function type” associated with that method. So in the Iterator trait…

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

…there’d be two associated types, Item and next. Given T: Iterator, T::next would map to a function type that implements for<'a> Fn(&'a mut T) -> Option<T::Item>.
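The zero-sized function types already exist today; what is missing is a way to name them in type position. A small sketch of that gap follows. The names `call_next` and `demo` are hypothetical, and the concrete iterator type is chosen just for the example.

```rust
use std::mem::size_of_val;
use std::vec::IntoIter;

// Accepts anything with the signature the post attributes to `T::next`.
fn call_next<T, F>(f: F, iter: &mut T) -> Option<T::Item>
where
    T: Iterator,
    F: for<'a> Fn(&'a mut T) -> Option<T::Item>,
{
    f(iter)
}

// Returns the size of the function item for `next` and the result of calling it.
fn demo() -> (usize, Option<u32>) {
    // An expression can already refer to the method's function item...
    let next_fn = <IntoIter<u32> as Iterator>::next;
    let size = size_of_val(&next_fn);

    // ...but its type can only be reached through a generic parameter like `F`.
    let mut iter = vec![7u32, 8].into_iter();
    (size, call_next(next_fn, &mut iter))
}
```

Under the idea described here, `T::next` would be a name for the type that `F` gets inferred to in this sketch.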

Putting it all together

If we put this all together, we can start to put bounds in the return types of async functions. Consider our usual trait:

trait HealthCheck {
    async fn check(&mut self);
}

and then a function like

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    HC::check::Output: Send,
{
    /* … */
}

What does HC::check::Output: Send mean? Note that the Output here is the return type of the function trait, so it refers to the future that you get when you call the async function.

Regardless, by combining ideas part 1, 2, and 3, HC::check::Output can then be expanded to the following:

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    // `HC::check::Output: Send` becomes…
    for<'a> <HC::check as Fn<(&'a mut HC,)>>::Output: Send,
{
    /* … */
}

which, if you really like complex where clauses, you could further expand into a where-clause like this:

for<'a> <
    <HC as HealthCheck>::check
    as
    Fn<(&'a mut HC,)>
>::Output: Send

Comparing this approach and RTN

In many ways, this idea is very similar to RTN. Compare this example…

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    HC::check::Output: Send,
{
    /* … */
}

…to the RTN-based approach…

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck,
    HC::check(): Send,
{
    /* … */
}

In fact, () could be a shorthand for ::Output.

Associated type bounds

Another part of RTN, and in fact the only part that we’ve implemented so far, is the ability to put bounds on function returns “inline”:

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck<check(): Send>,
    //              -------------
{
    /* … */
}

We could in principle do the same thing with ::Output notation:

fn spawn_health_check<HC>(hc: &mut HC)
where
    HC: HealthCheck<check::Output: Send>,
    //              -------------------
{
    /* … */
}

Pro: simpler building blocks

What I really like about this idea is that it doesn’t introduce new concepts or notation, but rather refines and extends ones that exist. We already have T::Output — all this is doing is making it work in contexts where it didn’t work before, and in a fairly logical way. We already have zero-sized function types representing every method, but now we would have a way to name them.

Con: Rust has two namespaces, and this is at odds with that

I said that we can add an associated type for every method in the trait — but what do we do if there is an associated type and a method with the same name? Something like this…

trait Foo {
    type process;
    fn process(&mut self);
}

…that would be weird, but it can certainly happen (in fact, I’ve written proc macros that generate code like this because I was too lazy to transform the name of the associated type).

We have some options here. We could say that we only add associated types for a method if there isn’t an explicit associated type. We can make this shadowing illegal in Rust 2024 (but not earlier Rust editions). We could add associated types only for async functions and RPITIT functions, which are not currently possible, and then forbid shadowing in those cases.

Still, fundamentally, this approach of making a method into an associated type is at odds with Rust’s primary two namespaces (types, values), whereas the RTN approach is working with those two namespaces.
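To see the two namespaces at work, here is a sketch of the clashing trait in use. It assumes, as described above, that a trait may declare an associated type and a method with the same name. The return type on the method, the `Counter` type, and the `run` function are hypothetical additions so that both meanings of the name show up in one place.

```rust
#[allow(non_camel_case_types)]
trait Foo {
    type process;

    // Hypothetical tweak: return the associated type so both names get used.
    fn process(&mut self) -> Self::process;
}

struct Counter(u32);

impl Foo for Counter {
    type process = u32;

    fn process(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

fn run<T: Foo>(t: &mut T) -> T::process {
    // `T::process` in the return type above is the associated type (type namespace);
    // `T::process` in the expression below is the method (value namespace).
    T::process(t)
}
```

If every method also introduced an associated type, the `T::process` in the return type would become ambiguous, which is the conflict the options above try to resolve.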

Con: omg so verbose; and so. many. colons.

The obvious downside of the ::Output notation is that it is significantly more verbose to read and write when compared to RTN, and it puts :: and : in close proximity (admittedly an existing problem with Rust syntax). Consider:

where HC::check(): Send
// vs
where HC::check::Output: Send

RTN also works really well in associated type bound position, but ::Output works less well:

where HC: HealthCheck<check(): Send>
// vs
where HC: HealthCheck<check::Output: Send>

but…

…although it must be said that, in practice, check(): Send isn’t the only thing you have to write. For example, this example only says that the future returned by check() is Send, but in practice you actually need HC to be Send + 'static too. So you would have to write something like…

HC: HealthCheck<check(): Send> + Send + 'static

…and, of course, many traits in practice have a lot more than one method. Consider something like this trait…

trait Resource {
    async fn get(&mut self);
    async fn put(&mut self);
}

…then you would need to write…

R: Resource<get(): Send, put(): Send> + Send + 'static

…and that quickly gets tedious. We encountered this in the case studies that we did, which is why the Google folks created a crate that lets you define a trait alias like SendResource, so that R: SendResource says all the above.
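For a sense of what such an alias amounts to, here is a hand-written sketch of the pattern. It is not the crate’s actual output, just one way to get the same effect, and it assumes a compiler with return-position impl Trait in traits (Rust 1.75 or later). The async functions are written in desugared form, and the `Store` type and the helper functions are hypothetical.

```rust
use std::future::{ready, Future};

// `async fn get(&mut self)` and `async fn put(&mut self)`, desugared.
trait Resource {
    fn get(&mut self) -> impl Future<Output = ()>;
    fn put(&mut self) -> impl Future<Output = ()>;
}

// Hypothetical hand-written "Send variant" of the trait.
trait SendResource: Send + 'static {
    fn get(&mut self) -> impl Future<Output = ()> + Send;
    fn put(&mut self) -> impl Future<Output = ()> + Send;
}

// Every `SendResource` is also a plain `Resource`.
impl<T: SendResource> Resource for T {
    fn get(&mut self) -> impl Future<Output = ()> {
        SendResource::get(self)
    }

    fn put(&mut self) -> impl Future<Output = ()> {
        SendResource::put(self)
    }
}

// Hypothetical implementor backed by a vector.
struct Store(Vec<u32>);

impl SendResource for Store {
    fn get(&mut self) -> impl Future<Output = ()> + Send {
        self.0.pop();
        ready(())
    }

    fn put(&mut self) -> impl Future<Output = ()> + Send {
        self.0.push(0);
        ready(())
    }
}

// Compile-time checks that return `true` if they type-check at all.
fn is_send<X: Send>(_: &X) -> bool {
    true
}

fn is_resource<R: Resource>(_: &R) -> bool {
    true
}

// The single bound `R: SendResource` stands in for
// `R: Resource<get(): Send, put(): Send> + Send + 'static`.
fn check<R: SendResource>(r: &mut R) -> bool {
    let plain = is_resource(&*r);
    let get_ok = is_send(&SendResource::get(&mut *r));
    let put_ok = is_send(&SendResource::put(&mut *r));
    plain && get_ok && put_ok
}
```

The cost of this workaround is that implementors have to pick `SendResource` over `Resource` up front, which is part of why a bound-based notation is still attractive.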

Con: confusion between `Output`

One interesting point that Yosh raised in our lang team design meeting is that people already have the potential to be confused about whether the Send bound applies to the future returned by the async function or the value you get from awaiting the future; the fact that both FnOnce and Future have an Output associated type could well play into that confusion.

One thing we discussed is how one would place bounds on the value returned from a future (versus the future itself). Under the higher-ranked projections proposal described in this blog post, this is fairly clear: you just write ...::Output::Output:

where 
    T::method::Output::Output: Send
    //         ------  ------
    //           |       |
    //           |     Describes value produced by future
    //         Describes the future itself.

For RTN, there are multiple options. One is to use ::Output:

where 
    T::method()::Output: Send,
    //       --  ------
    //       |    |
    //       |  Describes value produced by future
    //       Describes the future itself.

Another is to “double down” on the “pseudo-expression” syntax:

where 
    T::method().await: Send,
    //       -- -----
    //       |    |
    //       |  Describes value produced by future
    //       Describes the future itself.

We don’t have to settle this today, but it’s interesting to think about.
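
For comparison, here is a rough sketch of how the same two-level distinction looks in stable Rust today when the function is passed in as a closure rather than named through a trait. The helper `make_future` and the parameter name `Fut` are made up for illustration; the point is only that "the future is Send" and "the value the future produces is Send" are two separate bounds.

```rust
use std::future::Future;

/// Hypothetical helper: accepts a closure and places separate bounds on
/// the future it returns and on the value that future eventually produces.
fn make_future<F, Fut>(f: F) -> Fut
where
    F: FnOnce() -> Fut,
    Fut: Future + Send, // describes the future itself
    Fut::Output: Send,  // describes the value produced by the future
{
    // Calling the closure only builds the future; nothing is awaited here.
    f()
}
```

This only works because the caller hands us a type parameter for the future. With a trait method there is no such parameter to name, which is exactly the gap that RTN and ::Output are trying to fill.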

Pro: Building blocks first?

I’m torn on this point. Lately I’ve been into the idea of “stabilize the building blocks”. For a mature language like Rust, it is important to work piece by piece. Moreover, thanks to custom derive and procedural macros, people can build really powerful abstractions if they have the building blocks to work with. And it’s sometimes a lot easier to get consensus around the building blocks than the nice syntax on top¹. All of this argues to me for the ::Output approach, which feels to me like more of a general purpose building block.

but…

On the other hand, the () syntax is itself a building block. But it’s a building block that’s actually nice enough to use in simple cases. We’ve often been reluctant to add new bits of syntax to Rust, and I think that’s generally good, but sometimes I look with envy at other languages that are willing to take bold steps to build designs that are aggressively awesome. I’d like us as a language community to dare to ask for more. It’s hard to argue that the ::Output syntax is aggressively awesome. The () syntax may not be aggressively awesome (that’s probably trait transformers), but it’s at least mildly awesome.

Implementation notes

Right now, the only form of RTN that we have implemented is the “associated type bound” notation, e.g., HealthCheck<check(): Send>. If we add RTN, I think we should also support use in where clauses (e.g., HC::check(): Send) and as a type for local variables (e.g., let x: HC::check() = hc.check(…)), pursuant to the “year of everywhere” philosophy, where we try to make Rust notations as uniformly applicable as possible². That said, implementing it in those other places is significantly more complicated in the compiler.

The ::Output notation, in contrast, doesn’t read especially well as an associated type bound (HealthCheck<check::Output: Send> is kind of O_O to me). I think it works better as a standalone where clause like HC::check::Output: Send. It’s not clear how quickly we can implement that. It should be possible, imo, but it requires more investigation.

Conclusion

There isn’t one yet. My sense is that both the ::Output and the RTN approach would work. The ::Output approach feels a bit more “primitive”. It can be used with any higher-ranked trait bound, which means it covers slightly more options, although I don’t have a compelling example of where you would want it right now. In contrast, RTN feels easier to explain and more accessible to newcomers, and it respects Rust’s “two namespaces” approach. Neither feels like a one-way door: we can start with RTN and then add ::Output (in which case, () is a kind of sugar for ::Output), and we can start with ::Output and then add () as a sugar for it later.

Although not always! I think that -> impl Trait is a good example of where stabilizing the syntax first, and working through the semantics and core primitives over time, has paid off. ↩︎
Hat tip to TC for bringing up this slogan in the lang team meeting. ↩︎

Giving, lending, and async closures

2023-05-09T00:00:00+00:00

In a previous post on async closures, I concluded that the best way to support async closures was with an async trait combinator. I’ve had a few conversations since the post and I want to share some additional thoughts. In particular, this post dives into what it would take to make async functions matchable with a type like impl FnMut() -> impl Future. This takes us down some interesting roads, in particular the distinction between giving and lending traits; it turns out that the closure traits specifically are a bit of a special case in terms of what we can do backwards compatibly, due to their special syntax. Read on!

Goal

Let me cut to the chase. This article lays out a way that we could support a notation like this:

fn take_closure(x: impl FnMut() -> impl Future<Output = bool>) { }

It requires some changes to the FnMut trait which, somewhat surprisingly, are backwards compatible I believe. It also requires us to change how we interpret -> impl Trait when in a trait bound (and likely in the value of an associated type); this could be done (over an Edition if necessary) but it introduces some further questions without clear answers.

This blog post itself isn’t a real proposal, but it’s a useful ingredient to use when discussing the right shape for async closures.

Giving traits

The split between Fn and async Fn turns out to be one instance of a general pattern, which I call “giving” vs “lending” traits. In a giving trait, when you invoke its methods, you get back a value that is independent from self.

Let’s see an example. The current Iterator trait is a giving trait:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
    //      ^ the lifetime of this reference
    //        does not appear in the return type;
    //        hence "giving"
}

In Iterator, each time you invoke next, you get ownership of a Self::Item value (or None). This value is not borrowed from the iterator.¹ As a consumer, a giving trait is convenient, because it permits you to invoke next multiple times and keep using the return value afterwards. For example, this function compiles and works for any iterator (playground):

fn take_two_v1<T: Iterator>(t: &mut T) -> Option<(T::Item, T::Item)> {
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    // *Key point:* `i` is still live here, even though we called `next`
    // again to get `j`.
    Some((i, j))
}
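
To make the "giving" behavior concrete, here is a small self-contained sketch. `take_two_v1` is repeated from above so the block stands alone; `demo_giving` is a made-up name. Note that the two items outlive the iterator they came from.

```rust
fn take_two_v1<T: Iterator>(t: &mut T) -> Option<(T::Item, T::Item)> {
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    Some((i, j))
}

/// Hypothetical demo: pulls two owned `String`s out of an iterator and
/// returns them after the iterator itself has gone out of scope.
fn demo_giving() -> Option<(String, String)> {
    let mut words = vec!["a".to_string(), "b".to_string(), "c".to_string()].into_iter();
    let pair = take_two_v1(&mut words);
    drop(words); // the items do not borrow from the iterator
    pair
}
```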

Lending traits

Whereas a giving trait gives you ownership of the return value, a lending trait is one that returns a value borrowed from self. This pattern is less common, but it certainly appears from time to time. Consider the AsMut trait:

trait AsMut<T: ?Sized> {
    fn as_mut(&mut self) -> &mut T;
    //        -             -
    // Returns a reference borrowed from `self`.
}

AsMut takes an &mut self and (thanks to Rust’s elision rules) returns an &mut T borrowed from it. As a caller, this means that so long as you use the return value, the self is considered borrowed. Unlike with Iterator, therefore, you can’t invoke as_mut twice and keep using both return values (playground):

fn as_mut_two<T: AsMut<String>>(t: &mut T) {
    let i = t.as_mut(); // Borrows `t` mutably
    
    let j = t.as_mut(); // Error: second mutable borrow
                        // while the first is still live
    
    i.len();            // Use result from first borrow
}
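
For contrast, here is a sketch of a variant that does compile, because the first borrow is finished before the second one starts. The name `as_mut_twice` is invented for this example, and I am using `Box<String>` in the usage note only because the standard library implements `AsMut<T>` for `Box<T>`.

```rust
/// Hypothetical variant of `as_mut_two`: each borrow of `t` ends
/// before the next one begins, so the borrow checker is satisfied.
fn as_mut_twice<T: AsMut<String>>(t: &mut T) -> usize {
    let i = t.as_mut(); // first mutable borrow of `t`
    i.push_str("ab");
    let len_after_first = i.len(); // copies out a `usize`; last use of `i`

    let j = t.as_mut(); // OK: the first borrow is no longer live
    j.push_str("cd");

    len_after_first + j.len()
}
```

Calling `as_mut_twice(&mut Box::new(String::new()))` leaves the string as "abcd". The only thing we kept from the first borrow is a plain `usize`, which is the "copy out what you need" pattern that lending traits push you toward.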

Lending iterators

Of course, AsMut is kind of a “trivial” lending trait. A more interesting one is lending iterators². A lending iterator is an iterator that returns references into the iterator itself. Typically this is because the iterator has some kind of internal buffer that it uses. Until recently, there was no lending iterator trait because it wasn’t even possible to express it in Rust. But with generic associated types (GATs), that changed. It’s now possible to express the trait, although there are borrow checker limitations that block it from being practical³:

trait LendingIterator {
    type Item<'this>
    where
        Self: 'this;
    
    fn next(&mut self) -> Option<Self::Item<'_>>;
    //      ^                        ^^
    // Unlike `Iterator`, returns a value
    // potentially borrowed from `self`.
}

As the name suggests, when you use a lending iterator, it is lending values to you; you have to “give them back” (stop using them) before you can invoke next again. This gives more freedom to the iterator: it has the ability to use an internal mutable buffer, for example. But it takes some flexibility from you as the consumer. For example, the take_two function we saw earlier will not compile with LendingIterator (playground):

fn take_two_v2<T: LendingIterator>(
    t: &mut T,
) -> Option<(T::Item<'_>, T::Item<'_>)> {
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    // *Key point:* `i` is still live here, even though we called `next`
    // again to get `j`.
    Some((i, j))
}
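
Here is a sketch of what a concrete lending iterator and a well-behaved consumer might look like. The `Labels` type and the `collect_labels` function are invented for this example; the trait is repeated so the block stands alone. The consumer copies each item out of the shared buffer before asking for the next one.

```rust
trait LendingIterator {
    type Item<'this>
    where
        Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Hypothetical lending iterator: writes each label into one reused
/// buffer and lends out a `&str` that points into that buffer.
struct Labels {
    current: u32,
    end: u32,
    buf: String,
}

impl LendingIterator for Labels {
    type Item<'this> = &'this str where Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        if self.current >= self.end {
            return None;
        }
        // Overwrite the shared buffer; any previously lent `&str`
        // must already be dead, or this would not compile.
        self.buf.clear();
        self.buf.push_str("item-");
        self.buf.push_str(&self.current.to_string());
        self.current += 1;
        Some(self.buf.as_str())
    }
}

/// Consumes the iterator one item at a time, copying each label out
/// of the lent buffer before calling `next` again.
fn collect_labels(mut it: Labels) -> Vec<String> {
    let mut out = Vec::new();
    while let Some(label) = it.next() {
        out.push(label.to_owned());
    }
    out
}
```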

An aside: Inherent or accidental complexity?

It seems kind of annoying that Iterator and LendingIterator are two distinct traits. In a GC’d language, they wouldn’t be. This is a good example of what makes using Rust more complex. On the other hand, it’s worth asking, is this inherent or accidental complexity? The answer, I think, is “it depends”.

For example, I could certainly write an Iterator in Java that makes use of an internal buffer:

class Compute
    implements Iterator<ByteBuffer>
{
    ByteBuffer shared = ByteBuffer.allocate(256);
    
    ByteBuffer next() {
        if (mutateSharedBuffer()) {
            return shared.asReadOnlyBuffer();
        }
        return null;
    }
    
    /// Mutates `shared` and returns true if there is a new value.
    private boolean mutateSharedBuffer() {
        // ...
    }
}

Despite the fact that Java has no way to express the concept, this is most definitely a lending iterator. If I try to write a function that invokes next twice, the first value will simply not exist anymore:

Compute c = new Compute();
ByteBuffer a = c.next();
ByteBuffer b = c.next();
byte a0 = a.get(); // a has been overwritten with b..
byte b0 = b.get(); // ..so `a0 == b0` is always true.

In a case like this, Rust’s distinctions are expressing inherent complexity⁴. If you want to have a shared buffer that you reuse between calls, Java makes it easy to make mistakes. Rust’s ownership rules force you to copy out data that you want to keep using, preventing bugs like the one above. Eventually people learn to adopt functional patterns or to clone data instead of sharing access to mutable state. But that requires time and experience, and the compiler and language aren’t helping you do so (unless you use, say, Haskell or O’Caml or some purely functional language). These kinds of patterns are a good example of why Rust code winds up having that “if it compiles, it works” feeling, and how the same machinery that guarantees memory safety also prevents logical bugs.

`Iterator` as a special case of `LendingIterator`

OK, so we saw that the Iterator and LendingIterator traits, while clearly related, express an important tradeoff. The Iterator trait declares up front that each Item is independent from the iterator, but the LendingIterator declares that the Item<'_> values returned may be borrowed from the iterator. This affects what fully generic code (like our take_two function) can do.

But note a careful hedge: I said that the LendingIterator trait declares that Item<'_> values may be borrowed from the iterator. They don’t have to be. In fact, every Iterator can be viewed as a LendingIterator (as you can see in this playground), much like every Fn (which takes an &self) can be viewed as an FnMut (which takes an &mut self). Essentially an Iterator is “just” a LendingIterator that doesn’t happen to make use of the 'a argument when defining its Item<'a>.

It’s also possible to write a version of take_two that uses LendingIterator but compiles (playground)⁵:

fn take_two_v3<T, U>(t: &mut T) -> Option<(U, U)> 
where
    T: for<'a> LendingIterator<Item<'a> = U>
    // ^^^^^^                             ^
    // No matter which `'a` is used, result is always `U`,
    // which cannot reference `'a` (after all, `'a` is not
    // in scope when `U` is declared).
{
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    Some((i, j))
}

The key here is the where-clause. It says that T::Item<'a> is always equal to U, no matter what 'a is. In other words, the item that is produced by this iterator is never borrowed from self – if it were, then its type would include 'a somewhere, as that is the lifetime of the reference to the iterator. As a result, take_two compiles successfully. Of course, it also can’t be used with LendingIterator values that actually make use of the flexibility the trait is offering them.
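
Putting the last two points together, here is a sketch of an adapter that views an ordinary Iterator as a LendingIterator, used with take_two_v3. The `Giving` wrapper is a name I made up for this example (the playground may do it differently), and I am only trying it with an iterator type that contains no borrowed data.

```rust
trait LendingIterator {
    type Item<'this>
    where
        Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Hypothetical adapter: wraps any `Iterator` so it can be used
/// where a `LendingIterator` is expected.
struct Giving<I>(I);

impl<I: Iterator> LendingIterator for Giving<I> {
    // The lifetime parameter is ignored: items never borrow from `self`.
    type Item<'this> = I::Item where Self: 'this;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        self.0.next()
    }
}

fn take_two_v3<T, U>(t: &mut T) -> Option<(U, U)>
where
    T: for<'a> LendingIterator<Item<'a> = U>,
{
    let Some(i) = t.next() else { return None };
    let Some(j) = t.next() else { return None };
    Some((i, j))
}
```

With this in place, `take_two_v3(&mut Giving(vec![1, 2, 3].into_iter()))` type-checks with U inferred as the integer type, since the item type is the same for every 'a.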

Can we “unify” `Iterator` and `LendingIterator`?

The fact that every iterator is just a special case of lending iterator raises the question: can they be unified? Jack Huey, in the runup to GATs, spent a while exploring this question, and concluded that it doesn’t work. To see why, imagine that we changed Iterator so that it had type Item<'a>, instead of just type Item. It’s easy enough to imagine that existing code that says T: Iterator<Item = u32> could be reinterpreted as for<'a> T: Iterator<Item<'a> = u32>, and then it ought to continue compiling. But the scheme doesn’t quite work precisely because of examples like take_two_v1:

fn take_two_v1<T: Iterator>(t: &mut T) -> Option<(T::Item, T::Item)> {...}

This signature just says that it takes an Iterator; it doesn’t put any additional constraints on it. If we’ve modified Iterator to be a lending iterator, then you can’t take two items independently. So we would have to have some way to say “any giving iterator” vs “any lending iterator” – and if we’re going to say those two things, why not make it two distinct traits?

`FnMut` is a giving trait

I started off this post talking about async closures, but so far I’ve just talked about iterators. What’s the connection? Well, for starters, the distinction between sync and async closures is precisely the difference between giving and lending closures.

Sync closures (at least as defined now) are giving traits. Consider a (simplified) view of the FnMut trait as an example:

trait FnMut<A> {
    type Output;
    fn call(&mut self, args: A) -> Self::Output;
    //      ^                      ^^^^^^^^^^^^
    // The `self` reference is independent from the
    // return type.
}

FnMut returns a Self::Output, just like the giving Iterator returns Self::Item.
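
As a quick sketch of what "giving" buys the caller of a closure (with `call_two` being a made-up helper), you can call an FnMut twice and hold on to both results at once, just as with take_two_v1:

```rust
/// Hypothetical helper: calls a giving closure twice and keeps both results.
fn call_two<F>(mut f: F) -> (String, String)
where
    F: FnMut() -> String,
{
    let first = f();
    let second = f(); // `first` is still live here, and that is fine
    (first, second)
}
```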

`FnMut` has special syntax

You may not be accustomed to seeing the FnMut trait as a regular trait. In fact, on stable Rust, we require you to use special syntax with FnMut. For example, you write impl FnMut(u32) -> bool as a shorthand for FnMut<(u32,), Output = bool>. This is not just for convenience, it’s also because we have planned for some time to make changes to the FnMut trait (e.g., to make it variadic, rather than having it take a tuple of argument types), and the special syntax is meant to leave room for that. Pay attention here: this special syntax turns out to have an important role.

Async closures are a lending pattern

Async closures are closures that return a future. But that future has to capture self. So that makes them a kind of lending trait. Imagine we had a LendingFnMut:

trait LendingFnMut<A> {
    type Output<'this>
    where
        Self: 'this;
    
    fn call(&mut self, args: A) -> Self::Output<'_>;
    //      ^                                  ^^^^
    // Lends data from `self` as part of return value.
}

Now we could (not saying we should) express an async closure as a kind of bound on Output:

// Imagine we want something like this...
async fn foo(x: async FnMut() -> bool) {...}

// ...that is kind of this:
async fn foo<F>(f: F)
where
    F: LendingFnMut<()>,
    for<'a> F::Output<'a>: Future<Output = bool>
{
    ...
}

What is going on here? We are saying first that f is a lending closure that takes no arguments (F: LendingFnMut<()>). Note that we are not using the special FnMut sugar here, so this constraint says nothing about the value of Output. Then, in the next where-clause, we are specifying that Output implements Future. Importantly, we never say what F::Output is. Just that it will implement Future. This means that it could include references to self (but it doesn’t have to).
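
To show the shape of this without pulling in an async runtime, here is a synchronous sketch of the same idea: a hand-written "closure" that lends from self, and a caller that only bounds Output rather than naming it. The `Appender` struct and `call_twice` function are invented for this example, and I use AsRef<str> where the async version would use Future.

```rust
trait LendingFnMut<A> {
    type Output<'this>
    where
        Self: 'this;

    fn call(&mut self, args: A) -> Self::Output<'_>;
}

/// Hypothetical "closure" written as a struct: it appends to its
/// captured state and lends a view of that state back to the caller.
struct Appender {
    log: String,
}

impl LendingFnMut<(&'static str,)> for Appender {
    type Output<'this> = &'this str where Self: 'this;

    fn call(&mut self, args: (&'static str,)) -> Self::Output<'_> {
        self.log.push_str(args.0);
        &self.log
    }
}

/// We never say what `F::Output` is, only what it implements.
fn call_twice<F>(f: &mut F) -> usize
where
    F: LendingFnMut<(&'static str,)>,
    for<'a> F::Output<'a>: AsRef<str>,
{
    // Each lent value is used and dropped before the next call.
    let first_len = f.call(("Hello",)).as_ref().len();
    let second_len = f.call(("Rustaceans",)).as_ref().len();
    first_len + second_len
}
```

Just as with the async version, the caller has to finish with one result before calling again, because the result may borrow from the closure's state.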

Note what just happened. This is effectively a “third option” for how to desugar some kind of async closures. In my previous post, I talked about using HKT and about transforming the FnMut trait into an async variant (async FnMut). But here we see that we could also have a lending variant of the trait and then bound the Output of that to implement Future.

Closure syntax gives us more room to maneuver

So, to recap things we have seen:

Giving vs lending traits is a fundamental pattern:
- A giving trait has a return value that never borrows from self
- A lending trait has a return value that may borrow from self
Giving traits are subtraits of lending traits; i.e., you can view a giving trait as a lending trait that happens not to lend.
We can’t convert Iterator to a lending trait “in place”, because functions that are generic over T: Iterator rely on it being the giving pattern.
Async closures are expressible using a lending variant of FnMut, but not the current trait, which is the giving version.

Given the last two points, it might seem logical that we also can’t convert FnMut “in place” to the lending version, and that therefore we have to add some kind of separate trait. In fact, though, this is not true, and the reason is the forced closure syntax. In particular, it’s not possible to write a function today that is generic over F: FnMut but doesn’t specify a specific value for the Output associated type. When you write F: FnMut(u32), you are actually specifying F: FnMut<(u32,), Output = ()>. It is possible to write generic code that talks about F::Output, but that will always be normalizable to something else, because adding the FnMut bound always includes a value for Output.

In principle, then, we could redefine the Output associated type to take a lifetime parameter and change the desugaring for F: FnMut() -> R to be for<'a> F: FnMut<(), Output<'a> = R>. We would also have to make F::Output be legal even without specifying a value for its lifetime parameter; there are a few ways we could do that.
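
To see that the closure sugar always pins down Output, here is a small sketch on stable Rust. The function names `run_unit` and `run_with` are made up; the point is that the body can rely on knowing the return type in both cases.

```rust
/// `F: FnMut(u32)` is sugar that also fixes the return type to `()`.
fn run_unit<F: FnMut(u32)>(mut f: F) {
    // The `()` pattern type-checks only because `Output = ()` is known.
    let () = f(1);
    let () = f(2);
}

/// With `-> R`, the output is fixed to the type parameter `R` instead.
fn run_with<F: FnMut(u32) -> R, R>(mut f: F) -> (R, R) {
    (f(1), f(2))
}
```

There is no way to write a third variant on stable that accepts "any FnMut(u32), whatever it returns" without introducing something like R, which is what gives us the room to change the desugaring.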

How to interpret impl Trait in the value of an associated type

Let’s imagine that we changed the Fn* to be lending traits, then. That’s still not enough to support our original goal:

fn take_closure(x: impl FnMut() -> impl Future<Output = bool>) { }
//                                 ^^^^
// Impl trait is not supported here.

The problem is that we also have to decide how to desugar impl Trait in this position. The interpretation that we want is not entirely obvious. We could choose to desugar -> impl Future as a bound on the Output type, i.e., to this:

fn take_closure<F>(x: F) 
where
    F: FnMut<()>,
    for<'a> <F as FnMut<()>>::Output<'a>: Future<Output = bool>,
{ }

If we did this, then the Output value is permitted to capture 'a, and hence we are taking advantage of FnMut being a lending closure. This means that, when we call the closure, we have to await the resulting future before we can call again, just like we wanted.

Complications

Interpreting impl Trait this way is a bit tricky. For one thing, it seems inconsistent with how we interpret impl Trait in a parameter like impl Iterator<Item = impl Debug>. Today, that desugars to two fresh parameters <F, G> where F: Iterator<Item = G>, G: Debug. We could probably change that without breaking real world code, since if the associated type is not a GAT I don’t think it matters, but we also permit things like impl Iterator<Item = (impl Debug, impl Debug)> that cannot be expressed as bounds. RFC #2289 proposed a new syntax for these sorts of bounds, such that one would write F: Iterator<Item: Debug> to express the same thing. By analogy, one could imagine writing F: FnMut(): Future<Output = bool>, but that’s not consistent with the -> impl Future that we see elsewhere. It feels like there’s a bit of a tangle of string to sort out here if we try to go down this road, and I worry about winding up with something that is very confusing for end-users (too many subtle variations).

Conclusion

To recap all the points made in this post:

Giving vs lending traits is a fundamental pattern:
- A giving trait has a return value that never borrows from self
- A lending trait has a return value that may borrow from self
Giving traits are subtraits of lending traits; i.e., you can view a giving trait as a lending trait that happens not to lend.
We can’t convert Iterator to a lending trait “in place”, because functions that are generic over T: Iterator rely on it being the giving pattern.
Async closures are expressible using a lending variant of FnMut, but not the current trait, which is the giving version.
It is possible to modify the Fn* traits to be “lending” by changing how we desugar F: Fn, but we have to make it possible to write F::Output even when Output has a lifetime parameter (perhaps only if that parameter is statically known not to be used).
We’d also have to interpret FnMut() -> impl Future as being a bound on a possibly lent return type, which would be somewhat inconsistent with how Foo<Bar = impl Trait> is interpreted now (which is as a fresh type).

Hat tip

Tip of the hat to Tyler Mandry – this post is basically a summary of a conversation we had.

Footnotes

There is a subtle point here. If you are iterating over, say, a &[T] value, then the Item you get back is an &T and hence borrowed. It may seem strange for me to say that you get ownership of the &T. The key point here is that the &T is borrowed from the collection you are iterating over and not from the iterator itself. In other words, from the point of view of the Iterator, it is copying out a &T reference and handing ownership of the reference to you. Owning the reference does not give you ownership of the data it refers to. ↩︎
Sometimes called “streaming” iterators. ↩︎
Not to mention that GATs remain in an “MVP” state that is rather unergonomic to use; we’re working on it! ↩︎
Of course, Rust’s notations for expressing these distinctions involve some “accidental complexity” of their own, and you might argue that the cure is worse than the disease. Fair enough. ↩︎
This example, by the way, demonstrates the unergonomic state of GAT support. I don’t love writing for<'a> all the time. ↩︎

Fix my blog, please

2023-04-03T00:00:00+00:00

It’s well known that my blog has some issues. The category links don’t work. It renders oddly on mobile. And maybe Safari, too? The Rust snippets are not colored. The RSS feed is apparently not advertised properly in the metadata. It’s published via a makefile instead of some hot-rod CI/CD script, and it uses jekyll instead of whatever the new hotness is.¹ Being a programmer, you’d think I could fix this, but I am intimidated by HTML, CSS, and Github Actions. Hence this call for help: I’d like to hire someone to “tune up” the blog, a combination of fixing the underlying setup and also the visual layout. This post will be a rough set of things I have in mind, but I’m open to suggestions. If you think you’d be up for the job, read on.

Desiderata²

In short, I am looking for a rad visual designer who also can do the technical side of fixing up my jekyll and CI/CD setup.

Specific work items I have in mind:

Syntax highlighting
Make it look great on mobile and safari
Fix the category links
Add RSS feed into metadata and link it, whatever is normal
CI/CD setup so that when I push or land a PR, it deploys automatically
“Tune up” the layout, but keep the cute picture!³

Bonus points if you can make the setup easier to duplicate. Installing and upgrading Ruby is a horrible pain and I always forget whether I like rbenv or rubyenv or whatever better. Porting over to Hugo or Zola would likely be awesome, so long as links and content can be preserved. I do use some funky jekyll plugins, though I kind of forgot why. Alternatively maybe something with docker?

Current blog implementation

The blog is a jekyll blog with a custom theme. Sources are here:

Deployment is done via rsync at present.

Interested?

Send me an email with your name, some examples of past work, any recommendations etc, and the rate you charge. Thanks!

On the other hand, it has that super cute picture of my daughter (from around a decade ago, but still…). And the content, I like to think, is decent. ↩︎
I have a soft spot for wacky plurals, and “desiderata” might be my fave. I heard it first from a Dave Herman presentation to TC39 and it’s been rattling in my brain ever since, wanting to be used. ↩︎
Ooooh, I always want nice looking tables like those wizards who style github have. How come my tables are always so ugly? ↩︎

Thoughts on async closures

2023-03-29T00:00:00+00:00

I’ve been thinking about async closures and how they could work once we have static async fn in trait. Somewhat surprisingly to me, I found that async closures are a strong example for where async transformers could be an important tool. Let’s dive in! We’re going to start with the problem, then show why modeling async closures as “closures that return futures” would require some deep lifetime magic, and finally circle back to how async transformers can make all this “just work” in a surprisingly natural way.

Sync closures

Closures are omnipresent in combinator style APIs in Rust. For the purposes of this post, let’s dive into a really simple closure function, call_twice_sync:

fn call_twice_sync(mut op: impl FnMut(&str)) {
    op("Hello");
    op("Rustaceans");
}

As the name suggests, call_twice_sync invokes its argument twice. You might call it from synchronous code like so:

let mut buf = String::new();
call_twice_sync(|s| buf.push_str(s));

As you might expect, after this code executes, buf will have the value "HelloRustaceans". (Playground link, if you’re curious to try it out.)

Async closures as closures that return futures

Suppose we want to allow the closure to do async operations, though. That won’t work with call_twice_sync because the closure is a synchronous function:

let mut buf = String::new();
call_twice_sync(|s| buf.push_str(receive_message().await));
//                                                 ----- ERROR

Given that an async function is just a sync function that returns a future, perhaps we can model an async closure as a sync closure that returns a future? Let’s try it.

async fn call_twice_async<F>(mut op: impl FnMut(&str) -> F)
where
    F: Future<Output = ()>,
{
    op("Hello").await;
    op("Rustaceans").await;
}

This compiles. So far so good. Now let’s try using it. For now we won’t even use an await, just the same sync code we tried before:

// Hint: won't compile
async fn use_it() {
    let mut buf = String::new();
    call_twice_async(|s| async { buf.push_str(s); });
    //                   ----- Return a future
}

Wait, what’s this? Lo and behold, we get an error, and a kind of intimidating one:

error: captured variable cannot escape `FnMut` closure body
  --> src/lib.rs:13:26
   |
12 |     let mut buf = String::new();
   |         ------- variable defined here
13 |     call_twice_async(|s| async { buf.push_str(s); });
   |                        - ^^^^^^^^---^^^^^^^^^^^^^^^
   |                        | |       |
   |                        | |       variable captured here
   |                        | returns an `async` block that contains a reference to a captured variable, which then escapes the closure body
   |                        inferred to be a `FnMut` closure
   |
   = note: `FnMut` closures only have access to their captured variables while they are executing...
   = note: ...therefore, they cannot allow references to captured variables to escape

So what is this all about? The last two lines actually tell you, but to really see it you have to do a bit of desugaring.

Futures capture the data they will use

The closure tries to construct a future with an async block. This async block is going to capture a reference to all the variables it needs: in this case, s and buf. So the closure will become something like:

|s| MyAsyncBlockType { buf, s }

where MyAsyncBlockType implements Future:

struct MyAsyncBlockType<'b> {
    buf: &'b mut String,
    s: &'b str,
}

impl Future for MyAsyncBlockType<'_> {
    type Output = ();
    
    fn poll(..) { ... }
}

The key point here is that the closure is returning a struct (MyAsyncBlockType) and this struct is holding on to a reference to both buf and s so that it can use them when it is awaited.
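
To make the desugaring a bit more tangible, here is a sketch of what a complete hand-written version of that future could look like, along with a tiny helper to poll it once. This is my approximation, not what the compiler actually generates; `NoopWake` and `poll_once` are made-up helpers built on the standard `Wake` trait.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct MyAsyncBlockType<'b> {
    buf: &'b mut String,
    s: &'b str,
}

impl Future for MyAsyncBlockType<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // Both fields are `Unpin`, so we can get a plain `&mut Self`.
        let this = self.get_mut();
        // The work happens only now, when the future is polled,
        // which is why the references must stay valid until then.
        this.buf.push_str(this.s);
        Poll::Ready(())
    }
}

/// Hypothetical waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical helper: polls an `Unpin` future exactly once.
fn poll_once<F: Future + Unpin>(mut fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(&mut fut).poll(&mut cx)
}
```

Constructing `MyAsyncBlockType { buf: &mut buf, s: "Hello" }` does nothing to buf; the string is only modified when the struct is polled.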

Closure signature promises to be finished

The problem is that the FnMut closure signature actually promises something different than what the body does. The signature says that it takes an &str – this means that the closure is allowed to use the string while it executes, but it cannot hold on to a reference to the string and use it later. The same is true for buf, which will be accessible through the implicit self argument of the closure. But when the closure returns the future, it is trying to create references to buf and s that outlive the closure itself! This is why the error message says:

= note: `FnMut` closures only have access to their captured variables while they are executing...
= note: ...therefore, they cannot allow references to captured variables to escape

This is a problem!

Add some lifetime arguments?

So maybe we can declare the fact that we hold on to the data? It turns out you almost can, but not quite, and making an async closure be “just” a sync closure that returns a future would require some rather fundamental extensions to Rust’s trait system. There are two variables to consider, buf and s. Let’s begin with the argument s.

An aside: impl Trait capture rules

Before we dive more deeply into the closure case, let’s back up and imagine a top-level function that returns a future:

fn push_buf(buf: &mut String, s: &str) -> impl Future<Output = ()> {
    async move {
        buf.push_str(s);
    }
}

If you try to compile this code, you’ll find that it does not build (playground):

error[E0700]: hidden type for `impl Future<Output = ()>` captures lifetime that does not appear in bounds
 --> src/lib.rs:4:5
  |
3 |   fn push_buf(buf: &mut String, s: &str) -> impl Future<Output = ()> {
  |                    ----------- hidden type `[async block@src/lib.rs:4:5: 6:6]` captures the anonymous lifetime defined here
4 | /     async move {
5 | |         buf.push_str(s);
6 | |     }
  | |_____^
  |
help: to declare that `impl Future<Output = ()>` captures `'_`, you can introduce a named lifetime parameter `'a`
  |
3 | fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> impl Future<Output = ()> + 'a  {
  |            ++++       ++                 ++                                  ++++

impl Trait values can only capture borrowed data if they explicitly name the lifetime. This is why the suggested fix is to use a named lifetime 'a for buf and s and declare that the Future captures it:

fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> impl Future<Output = ()> + 'a

If you desugar this return position impl trait into an explicit type alias impl trait, you can see the captures more clearly, as they become parameters to the type. The original (no captures) would be:

type PushBuf = impl Future<Output = ()>;
fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> PushBuf

and the fixed version would be:

type PushBuf<'a> = impl Future<Output = ()> + 'a;
fn push_buf<'a>(buf: &'a mut String, s: &'a str) -> PushBuf<'a>

From functions to closures

OK, so we just saw how we can define a function that returns an impl Future, how that future will wind up capturing the arguments, and how that is made explicit in the return type by references to a named lifetime 'a. We could do something similar for closures, although Rust’s rather limited support for explicit closure syntax makes it awkward. I’ll use the unimplemented syntax from RFC 3216; you can see the workaround on the playground if that’s your thing:

type PushBuf<'a> = impl Future<Output = ()> + 'a;

async fn test() {
    let mut c = for<'a> |buf: &'a mut String, s: &'a str| -> PushBuf<'a> {
        async move { buf.push_str(s) }
    };
    
    let mut buf = String::new();
    c(&mut buf, "foo").await;
}

(Side note that this is an interesting case for the “currently under debate” rules around defining type alias impl trait.)

Now for the HAMMER

OK, so far so grody, but we’ve shown that indeed you could define a closure that returns a future and it seems like things would work. But now comes the problem. Let’s take a look at the call_twice_async function – i.e., instead of looking at where the closure is defined, we look at the function that takes the closure as argument. That’s where things get tricky.

Here is call_twice_async, but with the anonymous lifetime given an explicit name 'a:

fn call_twice_async<F>(op: impl for<'a> FnMut(&'a str) -> F)
where
    F: Future<Output = ()>,

Now the problem is this: we need to declare that the future which is returned (F) might capture 'a. But F is declared in an outer scope, and it can’t name 'a. In other words, right now, the return type F of the closure op must be the same each time the closure is called, but to get the semantics we want, we need the return type to include a different value for 'a each time.

If Rust had higher-kinded types (HKT), you could do something a bit wild, like this…

fn call_twice_async<F<'_>>(op: impl for<'a> FnMut(&'a str) -> F<'a>)
//                  ----- HKT
where
    for<'a> F<'a>: Future<Output = ()>,

but, of course, we don’t have HKT (and, cool as they are, I don’t think that’s a good fit for Rust right now, it would bust our complexity barrier in my opinion and then some without near enough payoff).

Short of adding HKT or some equivalent, I believe the only workaround is to use a dyn type:

fn call_twice_async(op: impl for<'a> FnMut(&'a str) -> Box<dyn Future<Output = ()> + 'a>)

This works today (and it is, for example, what moro does to resolve exactly this problem). Of course that means that the closure has to allocate a box, instead of just returning an async move. That’s a non-starter.
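Here is a rough sketch of that boxed workaround on stable Rust. A few things are my own assumptions rather than anything from the code above: I use Pin<Box<...>> (via a BoxFuture alias) so the result can be awaited directly, the closure writes into an Rc<RefCell<String>> so that the returned future only borrows from its argument, and block_on/NoopWaker are a toy executor:

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::{pin, Pin};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The boxed future is allowed to borrow from the `&'a str` argument.
type BoxFuture<'a> = Pin<Box<dyn Future<Output = ()> + 'a>>;

async fn call_twice_async(mut op: impl for<'a> FnMut(&'a str) -> BoxFuture<'a>) {
    op("Hello").await;
    op("World").await;
}

// Hypothetical toy executor, good enough for futures that never suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn demo_boxed() -> String {
    let log = Rc::new(RefCell::new(String::new()));
    let sink = Rc::clone(&log);
    block_on(call_twice_async(move |s| {
        let sink = Rc::clone(&sink);
        // One heap allocation per call: this is the cost of the `dyn` approach.
        Box::pin(async move { sink.borrow_mut().push_str(s) })
    }));
    let result = log.borrow().clone();
    result
}
```

The return type of the closure is the same type for every call (a boxed dyn Future), and only its lifetime bound varies, which is what lets the for<'a> signature type check.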

So we’re kind of stuck. As far as I can tell, modeling async closures as “normal closures that happen to return futures” requires one of two unappealing options:

extend the language with HKT, or possibly some syntactic sugar that ultimately however desugars to HKT
use Box everywhere, giving up on zero cost futures, embedded use cases, etc.

More traits, less problems

But wait, there is another way. Instead of modeling async closures using the normal Fn traits, we could define some async closure traits. To keep our life simple, let’s just look at one, for FnMut:

trait AsyncFnMut<A> {
    type Output;
    
    async fn call(&mut self, args: A) -> Self::Output;
}

This is identical to the sync FnMut trait, except that call is an async fn. But that’s a pretty important difference. If we desugar the async fn to one using impl Trait, and then to GATs, we can start to see why:

trait AsyncFnMut<A> {
    type Output;
    type Call<'a>: Future<Output = Self::Output> + 'a;
    
    fn call(&mut self, args: A) -> Self::Call<'_>;
}

Notice the Generic Associated Type (GAT) Call. GATs are basically the Rusty way to do HKTs (if you want to go deeper, I wrote a comparison series which may help; back then we called them associated type constructors, not GATs). Essentially what has happened here is that we moved the “HKT” into the trait definition itself, instead of forcing the caller to have it.

Given this definition, when we try to write the “call twice async” function, things work out more smoothly:

async fn call_twice_async(mut op: impl AsyncFnMut(&str)) {
    op.call("Hello").await;
    op.call("World").await;
}

Try it out on the playground, though note that we don’t actually support the () sugar for arbitrary traits, so I wrote impl for<'a> AsyncFnMut<&'a str, Output = ()> instead.
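For readers who want something that builds on stable Rust, here is a sketch of the GAT desugaring with a hand-written future. Everything here is illustrative: the trait is renamed AsyncFnMutSketch, the Appender and PushFuture types are hypothetical, I left off the + 'a bound on Call to keep the hand-written impl simple, and block_on/NoopWaker are a toy executor:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical spelling of the desugared trait.
trait AsyncFnMutSketch<A> {
    type Output;
    type Call<'a>: Future<Output = Self::Output>
    where
        Self: 'a;

    fn call(&mut self, args: A) -> Self::Call<'_>;
}

// A hand-written "async closure" that appends to a buffer it owns.
struct Appender {
    buf: String,
}

// The future returned by `call`: borrows the buffer for `'a`, the argument for `'s`.
struct PushFuture<'a, 's> {
    buf: &'a mut String,
    s: &'s str,
}

impl Future for PushFuture<'_, '_> {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<usize> {
        let this = self.get_mut();
        this.buf.push_str(this.s);
        Poll::Ready(this.buf.len())
    }
}

impl<'s> AsyncFnMutSketch<&'s str> for Appender {
    type Output = usize;
    type Call<'a> = PushFuture<'a, 's> where Self: 'a;

    fn call(&mut self, args: &'s str) -> Self::Call<'_> {
        PushFuture { buf: &mut self.buf, s: args }
    }
}

// The caller names no future type at all: the "HKT" lives in the trait.
async fn call_twice_async(
    mut op: impl for<'s> AsyncFnMutSketch<&'s str, Output = usize>,
) -> usize {
    op.call("Hello").await;
    op.call("World").await
}

// Hypothetical toy executor, good enough for futures that never suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

fn demo_gat() -> usize {
    block_on(call_twice_async(Appender { buf: String::new() }))
}
```

Each call to op.call produces a future with a fresh borrow of op, which is exactly the per-call lifetime that the plain FnMut signature could not express.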

Connection to trait transformers

The translation between the normal FnMut trait and the AsyncFnMut trait was pretty automatic. The only thing we did was change the “call” function to async. So what if we had an async trait transformer, as was discussed earlier? Then we only have one “maybe async” trait, FnMut:

#[maybe(async)]
trait FnMut<A> {
    type Output;
    
    #[maybe(async)]
    fn call(&mut self, args: A) -> Self::Output;
}

Now we can write call_twice either sync or async, as we like, and the code is virtually identical. The only difference is that I write impl FnMut for sync or impl async FnMut for async:

fn call_twice_sync(mut op: impl FnMut(&str)) {
    op.call("Hello");
    op.call("World");
}

async fn call_twice_async(mut op: impl async FnMut(&str)) {
    op.call("Hello").await;
    op.call("World").await;
}

Of course, with a more general maybe-async design, we might just write this function once, but that’s a separate concern. Right now I’m only concerned with the idea of authoring traits that can be used in two modes, but not necessarily with writing code that is generic over which mode is being used.

Final note: creating the closure in a maybe-async world

When calling call_twice, we could write |s| buf.push_str(s) or async |s| buf.push_str(s) to indicate which traits it implements, but we could also infer this from context. We already do similar inference to decide the type of s for example. In fact, we could have some blanket impls, so that every F: FnMut also implements F: async FnMut; I guess this is generally true for any trait.

Conclusion

My conclusions:

Nothing in this discussion required or even suggested any changes to the underlying design of async fn in trait. Stabilizing the statically dispatched subset of async fn in trait should be forwards compatible with supporting async closures. 🎉
The “higher-kinded-ness” of async closures has to go somewhere. In stabilizing GATs, in my view, we’ve committed to the path that it should go into the trait definition (vs HKT, which would push it to the use site). The standard “def vs use site” tradeoffs apply here, I think: def sites often feel simpler and easier to understand, but are less flexible. I think that’s fine.
Async trait transformers feel like a great option here that makes async closures work just like you would expect.

Must move types

2023-03-16T00:00:00+00:00

Rust has lots of mechanisms that prevent you from doing something bad. But, right now, it has NO mechanisms that force you to do something good¹. I’ve been thinking lately about what it would mean to add “must move” types to the language. This is an idea that I’ve long resisted, because it represents a fundamental increase to complexity. But lately I’m seeing more and more problems that it would help to address, so I wanted to try and think what it might look like, so we can better decide if it’s a good idea.

Must move?

The term ‘must move’ type is not standard. I made it up. The more usual name in PL circles is a “linear” type, which means a value that must be used exactly once. The idea of a must move type T is that, if some function f has a value t of type T, then f must move t before it returns (modulo panic, which I discuss below). Moving t can mean either calling some other function that takes ownership of t, returning it, or — as we’ll see later — destructuring it via pattern matching.

Here are some examples of functions that move the value t. You can return it…

fn return_it<T>(t: T) -> T {
    t
}

…call a function that takes ownership of it…

fn send_it<T>(t: T) {
    channel.send(t); // takes ownership of `t`
}

…or maybe call a constructor function that takes ownership of it (which would usually mean you must “recursively” move the result)…

fn return_opt<T>(t: T) -> Option<T> {
    Some(t) // moves t into the option
}

Doesn’t Rust have “linear types” already?

You may have heard that Rust’s ownership and borrowing is a form of “linear types”. That’s not really true. Rust has affine types, which means a value that can be moved at most once. But we have nothing that forces you to move a value. For example, I can write the consume function in Rust today:

fn consume<T>(t: T) {
    /* look ma, no .. nothin' */
}

This function takes a value t of (almost, see below) any type T and…does nothing with it. This is not possible with linear types. If T were linear, we would have to do something with t — e.g., move it somewhere. This is why I call linear types must move.

What about the destructor?

“Hold up!”, you’re thinking, “consume doesn’t actually do nothing with t. It drops t, executing its destructor!” Good point. That’s true. But consume isn’t actually required to execute the destructor; you can always use forget to avoid it²:

fn consume<T>(t: T) {
    std::mem::forget(t);
}

If it weren’t possible to “forget” values, destructors would mean that Rust had a linear system, but even then, it would only be in a technical sense. In particular, destructors would be a required action, but of a limited form: they can’t, for example, take arguments. Nor can they be async.
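To make the difference between dropping and forgetting concrete, here is a small sketch. The Noisy type and the counter are hypothetical names of mine; the destructor bumps the counter so we can observe whether it ran:

```rust
use std::cell::Cell;

// A value whose destructor bumps a shared counter.
struct Noisy<'a>(&'a Cell<u32>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns the counter after an ordinary drop and again after a `forget`.
fn count_destructor_runs() -> (u32, u32) {
    let counter = Cell::new(0);
    drop(Noisy(&counter)); // destructor runs
    let after_drop = counter.get();
    std::mem::forget(Noisy(&counter)); // destructor is skipped
    (after_drop, counter.get())
}
```

The counter goes up once for the dropped value and stays put for the forgotten one, so nothing obliges a caller to let a destructor run.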

What about `Sized`?

There is one other detail about the consume function worth mentioning. When I write fn consume<T>(t: T), that is actually shorthand for saying “any type T that is Sized”. In other words, the fully elaborated “do nothing with a value” function looks like this:

fn consume<T: Sized>(t: T) {
    std::mem::forget(t);
}

If you don’t want this default Sized bound, you write T: ?Sized. The leading ? means “maybe Sized”: T can now be any type, whether it be sized (e.g., u32) or unsized (e.g., [u32]).

This is important: a where-clause like T: Foo narrows the set of types that T can be, since now it must be a type that implements Foo. The “maybe” where-clause T: ?Sized (we don’t accept other traits here) broadens the set of types that T can be, by removing default bounds.
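As a quick illustration of how ?Sized broadens what T can be, here is a sketch using a hypothetical byte_size helper. Note that an unsized value has to be passed behind a reference:

```rust
// With `?Sized`, `T` may be an unsized type such as `str` or `[u32]`,
// which is why the value is taken by reference rather than by value.
fn byte_size<T: ?Sized>(value: &T) -> usize {
    std::mem::size_of_val(value)
}
```

Here byte_size("abc") picks T = str and byte_size(&[1u32, 2, 3][..]) picks T = [u32]; without the ?Sized annotation neither call would type check, while byte_size(&7u32) works either way.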

So how would “must move” work?

You might imagine that we could encode “must move” types via a new kind of bound, e.g., T: MustMove. But that’s actually backwards. The problem is that “must move” types are actually a superset of ordinary types — after all, if you have an ordinary type, it’s still ok to write a function that always moves it. But it’s also ok to have a function that drops it or forgets it. In contrast, with a “must move” type, the only option is to move it. This implies that what we want is a ? bound, not a normal bound.

The notation I propose is ?Drop. The idea is that, by default, every type parameter D is assumed to be droppable, meaning that you can always choose to drop it at any point. But an M: ?Drop parameter is not necessarily droppable. You must ensure that a value of type M is moved somewhere else.

Let’s see a few examples to get the idea of it. To start, the identity function, which just returns its argument, could be declared with ?Drop:

fn identity<M: ?Drop>(m: M) -> M {
    m // OK — moving `m` to the caller
}

But the consume function could not:

fn consume<M: ?Drop>(m: M) {
    // ERROR: `M` is not moved.
}

You might think that the version of consume which calls mem::forget is sound. After all, forget is declared like so:

fn forget<T>(t: T) {
    /* compiler magic to avoid dropping */
}

Therefore, if consume were to call forget(m), wouldn’t that count as a move? The answer is yes, it would, but we still get an error. This is because forget is not declared with ?Drop, and therefore there is an implicit T: Drop where-clause:

fn consume<M: ?Drop>(m: M) {
    forget(m); // ERROR: `forget` requires `M: Drop`, which isn’t known to hold.
}

Declaring types to be `?Drop`

Under this scheme, all structs and types you declare would be droppable by default. If you don’t implement Drop explicitly, the compiler adds an automatic Drop impl for you that just recursively drops your fields. But you could explicitly declare your type to be ?Drop by using a negative impl:

pub struct Guard {
    value: u32
}

impl !Drop for Guard { }

When you do this, the type becomes “must move” and any function which has a value of type Guard must move it somewhere else. You might wonder then how you ever terminate. The answer is that one way to “move” the value is to unpack it with a pattern. For example, Guard might declare a log method:

impl Guard {
    pub fn log(self, message: &str) {
        let Guard { value } = self; // moves `self`
        println!("{value} = {message}");
    }
}

This plays nicely with privacy: if your type has private fields, only functions within that module will be able to destructure it; everyone else must (eventually) discharge their obligation to move by invoking some function within your module.
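Nothing like impl !Drop exists today, of course. The closest approximation I know of in current Rust is a runtime check, sometimes called a “drop bomb”. The sketch below is my own illustration, not part of the proposal: the discharged field and the new constructor are hypothetical, and the error shows up as a panic at runtime rather than at compile time:

```rust
pub struct Guard {
    value: u32,
    discharged: bool, // hypothetical bookkeeping field, not in the example above
}

impl Guard {
    pub fn new(value: u32) -> Self {
        Guard { value, discharged: false }
    }

    // Consumes the guard and marks the obligation as discharged.
    pub fn log(mut self, message: &str) -> String {
        self.discharged = true;
        format!("{} = {}", self.value, message)
    }
}

impl Drop for Guard {
    fn drop(&mut self) {
        // Runtime stand-in for the compile-time "must move" error.
        if !self.discharged && !std::thread::panicking() {
            panic!("Guard dropped without calling `log`");
        }
    }
}
```

Calling Guard::new(22).log("done") is fine, but simply dropping a fresh Guard panics. A true “must move” type would reject that second program before it ever ran.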

Interactions between “must move” and control-flow

Must move values interact with control-flow like ?. Consider the Guard type from the previous section, and imagine I have a function like this one…

fn execute(t: Guard) -> Result<(), std::io::Error> {
    let s: String = read_file("message.txt")?;  // ERROR: `t` is not moved on error
    t.log(&s);
    Ok(())
}

This code would not compile. The problem is that the ? in read_file may return with an Err result, in which case the call to t.log would not execute! This is a good error, in the sense that it is helping us ensure that the log call to Guard is invoked, but you can imagine that it’s going to interact with other things. To fix the error, you should do something like this…

fn execute(t: Guard) -> Result<(), std::io::Error> {
    match read_file("message.txt") {
        Ok(s) => {
            t.log(&s);
            Ok(())
        }
        Err(e) => {
            t.log("error"); // now `t` is moved
            Err(e)
        }
    }
}

Of course, you could also opt to pass back the t value to the caller, making it their problem.
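Passing the guard back might look something like the following sketch, written in today’s Rust (so nothing actually enforces the move). The read_message function is a hypothetical stand-in for read_file so that the example needs no file system, and I have log return the string instead of printing it:

```rust
struct Guard {
    value: u32,
}

impl Guard {
    fn log(self, message: &str) -> String {
        let Guard { value } = self; // moves `self`
        format!("{value} = {message}")
    }
}

// Hypothetical stand-in for `read_file`.
fn read_message(succeed: bool) -> Result<String, String> {
    if succeed {
        Ok(String::from("hello"))
    } else {
        Err(String::from("file not found"))
    }
}

// On error, the guard travels back to the caller inside the `Err` value,
// so the obligation to move it becomes the caller's problem.
fn execute(t: Guard, succeed: bool) -> Result<String, (Guard, String)> {
    match read_message(succeed) {
        Ok(s) => Ok(t.log(&s)),
        Err(e) => Err((t, e)),
    }
}
```

Both arms move t: one into log, the other into the error tuple.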

Conditional “must move” types

Talking about types like Option and Result — it’s clear that we are going to want to be able to have types that are conditionally must move — i.e., must move only if their type parameter is “must move”. That’s easy enough to do:

enum Option<T: ?Drop> {
    Some(T),
    None,
}

Some of the methods on Option work just fine:

impl<T: ?Drop> Option<T> {
    pub fn map<U: ?Drop>(self, op: impl FnOnce(T) -> U) -> Option<U> {
        match self {
            Some(t) => Some(op(t)),
            None => None,
        }
    }
}

Other methods would require a Drop bound, such as unwrap_or:

impl<T: ?Drop> Option<T> {
    pub fn unwrap_or(self, default: T) -> T
    where
        T: Drop,
    {
        match self {
            // OK
            None => default,

            // Without the `T: Drop` bound, we are not allowed to drop `default` here.
            Some(v) => v,
        }
    }
}

“Must move” and panic

One very interesting question is what to do in the case of panic. This is tricky! Ordinarily, a panic will unwind all stack frames, executing destructors. But what should we do for a ?Drop type that doesn’t have a destructor?

I see a few options:

Force an abort. Seems bad.
Deprecate and remove unwinding, limit to panic=abort. A more honest version of the previous one. Still seems bad, though dang would it make life easier.
Provide some kind of fallback option.

The last one is most appealing, but I’m not 100% sure how it works. It may mean that we don’t want to have the “must move” opt-in be to impl !Drop but rather to impl MustMove, or something like that, which would provide a method that is invoked in the case of panic (this method could, of course, choose to abort). The idea of fallback might also be used to permit cancellation with the ? operator or other control-flow drops (though I think we definitely want types that don’t permit cancellation in those cases).

“Must move” and trait objects

What do we do with dyn? I think the answer is that dyn Foo defaults to dyn Foo + Drop, and hence requires that the type be droppable. To create a “must move” dyn, we could permit dyn Foo + ?Drop. To make that really work out, we’d have to have self methods to consume the dyn (though today you can do that via self: Box methods).

Uses for “must move”

Contrary to best practices, I suppose, I’ve purposefully kept this blog post focused on the mechanism of must move and not talked much about the motivation. This is because I’m not really trying to sell anyone on the idea, at least not yet, I just wanted to sketch some thoughts about how we might achieve it. That said, let me indicate why I am interested in “must move” types.

First, async drop: right now, you cannot have destructors in async code that perform awaits. But this means that async code is not able to manage cleanup in the same way that sync code does. Take a look at the status quo story about dropping database handles to get an idea of the kinds of problems that arise. Adding async drop itself isn’t that hard, but what’s really hard is guaranteeing that types with async drop are not dropped in sync code, as documented at length in Sabrina Jewson’s blog post. This is precisely because we currently assume that all types are droppable. The simplest way to achieve “async drop” then would be to define a trait AsyncDrop { async fn async_drop(self); } and then make the type “must move”. This will force callers to eventually invoke async_drop(x).await. We might want some syntactic sugar to handle ? more easily, but that could come later.

Second, parallel structured concurrency. As Tyler Mandry elegantly documented, if we want to mix parallel scopes and async, we need some way to have futures that cannot be forgotten. The way I think of it is like this: in sync code, when you create a local variable x on your stack, you have a guarantee from the language that its destructor will eventually run, unless you move it. In async code, you have no such guarantee, as your entire future could just be forgotten by a caller. “Must move” types (with some kind of callback for panic) give us a tool to solve this problem, by having the future type be ?Drop. This is effectively a principled way to integrate completion-style futures that must be fully polled.

Finally, “liveness conditions writ large”. As I noted in the beginning, Rust’s type system today is pretty good at letting you guarantee “safety” properties (“nothing bad happens”), but it’s much less useful for liveness properties (“something good eventually happens”). Destructors let you get close, but they can be circumvented. And yet I see liveness properties cropping up all over the place, often in the form of guards or cleanup that really ought to happen. Any time you’ve ever wanted to have a destructor that takes an argument, that applies. This comes up a lot in unsafe code, in particular. Being able to “log” those obligations via “must move” types feels like a really powerful tool that will be used in many different ways.

Parting thoughts

This post sketches out one way to get “true linear” types in Rust, which I’ve dubbed as “must move” types. I think I would call this the ?Drop approach, because the basic idea is to allow types to “opt out” from being “droppable” (in which case they must be moved). This is not the only approach we could use. One of my goals with this blog post is to start collecting ideas for different ways to add linear capabilities, so that we can compare them with one another.

I should also address the obvious “elephant in the room”. The Rust type system is already complex, and adding “must move” types will unquestionably make it more complex. I’m not sure yet whether the tradeoff is worth it: it’s hard to judge without trying the system out. I think there’s a good chance that “must move” types live “on the edges” of the type system, through things like guards and so forth that are rarely abstracted over. I think that when you are dealing with concrete types, like the Guard example, must move types won’t feel particularly complicated. It will just be a helpful lint saying “oh, by the way, you are supposed to clean this up properly”. But where pain will arise is when you are trying to build up generic functions — and of course just in the sense of making the Rust language that much bigger. Things like ?Sized definitely make the language feel more complex, even if you never have to interact with them directly.

On the other hand, “must move” types definitely add value in the form of preventing very real failure modes. I continue to feel that Rust’s goal, above all else, is “productive reliability”, and that we should double down on that strength. Put another way, I think that the complexity that comes from reasoning about “must move” types is, in large part, inherent complexity, and I feel ok about extending the language with new tools for that. We saw this with the interaction with the ? operator: no doubt it’s annoying to have to account for moves and cleanup when an error occurs, but it’s also a key part of building a robust system, and destructors don’t always cut it.

¹ Well, apart from the “must use” lint.
² Or create an Rc-cycle, if that’s more your speed.

Temporary lifetimes

2023-03-15T00:00:00+00:00

In today’s lang team design meeting, we reviewed a doc I wrote about temporary lifetimes in Rust. The current rules were established in a blog post I wrote in 2014. Almost a decade later, we’ve seen that they have some rough edges, and in particular can be a common source of bugs for people. The Rust 2024 Edition gives us a chance to address some of those rough edges. This blog post is a copy of the document that the lang team reviewed. It’s not a proposal, but it covers some of what works well and what doesn’t, and includes a few sketchy ideas towards what we could do better.

Summary

Rust’s rules on temporary lifetimes often work well but have some sharp edges. The 2024 edition offers us a chance to adjust these rules. Since those adjustments change the times when destructors run, they must be done over an edition.

Design principles

I propose the following design principles to guide our decision.

Independent from borrow checker: We need to be able to figure out when destructors run without consulting the borrow checker. This is a slight weakening of the original rules, which required that we knew when destructors would run without consulting results from name resolution or type check.
Shorter is more reliable and predictable: In general, we should prefer shorter temporary lifetimes, as that results in more reliable and predictable programs.
- Editor’s note: A number of people on the lang team questioned this point. The reasoning is as follows. First, a lot of the problems in practice come from locks that are held longer than expected. Second, problems that come from temporaries being dropped too early tend to manifest as borrow check errors. Therefore, they don’t cause reliability issues, but rather ergonomic ones.
Longer is more convenient: Extending temporary lifetimes where we can do so safely gives more convenience and is key for some patterns.
- Editor’s note: As noted in the previous bullet, our current rules sometimes give temporary lifetimes that are shorter than what the code requires, but these generally surface as borrow check errors.

Equivalences and anti-equivalences

The rules should ensure that E and (E), for any expression E, result in temporaries with the same lifetimes.

Today, the rules also ensure that E and {E}, for any expression E, result in temporaries with the same lifetimes, but this document proposes dropping that equivalence as of Rust 2024.

Current rules

When are temporaries introduced?

Temporaries are introduced when there is a borrow of a value-producing expression (often called an “rvalue”). Consider an example like &foo(); in this case, the compiler needs to produce a reference to some memory somewhere, so it stores the result of foo() into a temporary local variable and returns a reference to that.

Often the borrows are implicit. Consider a function get_data() that returns a Vec and a call get_data().is_empty(); because is_empty() is declared with &self on [T], this will store the result of get_data() into a temporary, invoke deref to get a &[T], and then call is_empty.

Default temporary lifetime

Whenever a temporary is introduced, the default rule is that the temporary is dropped at the end of the innermost enclosing statement; this rule is sometimes summarized as “at the next semicolon”. But the definition of statement involves some subtlety.

Block tail expressions. Consider a Rust block:

{
    stmt[0];
    ...
    stmt[n];
    tail_expression
}

Any temporaries created in a statement stmt[i] will be dropped once that statement completes. But the tail expression is not considered a statement, so temporaries produced there are dropped at the end of the statement that encloses the block. For example, given get_data and is_empty as defined in the previous section, and a statement let x = foo({get_data().is_empty()});, the vector will be freed at the end of the let.

Conditional scopes for if and while. if and while expressions and if guards (but not match or if let) introduce a temporary scope around the condition. So any temporaries from expr in if expr { ... } would be dropped before the { ... } executes. The reasoning here is that all of these contexts produce a boolean and hence it is not possible to have a reference into the temporary that is still live. For example, given if get_data().is_empty(), the vector must be safe to drop before entering the body of the if. This is not true for a case like match get_data().last() { Some(x) => ..., None => ... }, where the x would be a reference into the vector returned by get_data().

Function scope. The tail expression of a function block (e.g., the expression E in fn foo() { E }) is not contained by any statement. In this case, we drop temporaries from E just before returning from the function, and thus fn last() -> Option<&Datum> { get_data().last() } fails the borrow check (because the temporary returned by get_data() is dropped before the function returns). Importantly, this function scope ends after local variables in the function are dropped. Therefore, this function…

fn foo() {
    let x = String::new();
    vec![].is_empty()
}

…is effectively desugared to this…

fn foo() {
    let tmp;
    {
        let x = String::new();
        { tmp = vec![]; &tmp }.is_empty()
    } // x dropped here
} // tmp dropped here

Lifetime extension

In some cases, temporary lifetimes are extended from the innermost statement to the innermost block. The rules for this are currently defined syntactically, meaning that they do not consider types or name resolution. The intuition is that we extend the lifetime of the temporary for an expression E if it is evident that this temporary will be stored into a local variable. Consider the trivial example:

let t = &foo();

Here, foo() is a value expression, and hence &foo() needs to create a temporary so that we can have a reference. But the resulting &T is going to be stored in the local variable t. If we were to free the temporary at the next ;, this local variable would be immediately invalid. That doesn’t seem to match the user intent. Therefore, we extend the lifetime of the temporary so that it is dropped at the end of the innermost block. This is the equivalent of:

let tmp;
let t = { tmp = foo(); &tmp };

We can extend this same logic to compound expressions. Consider:

let t = (&foo(), &bar());

we will expand this to

let tmp1;
let tmp2;
let t = { tmp1 = foo(); tmp2 = bar(); (&tmp1, &tmp2) };

The exact rules are given by a grammar in the code and also covered in the reference. Rather than define them here I’ll just give some examples. In each case, the &foo() temporary is extended:

let t = &foo();

// Aggregates containing a reference that is stored into a local:
let t = Foo { x: &foo() };
let t = (&foo(), );
let t = [&foo()];

// Patterns that create a reference, rather than `&`:
let ref t = foo();

Here are some cases where temporaries are NOT extended:

let f = some_function(&foo()); // could be `fn some_function(x: &Vec) -> bool`, may not need extension

struct SomeTupleStruct<T>(T);
let f = SomeTupleStruct(&foo()); // looks like a function call
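The difference between an extended and a non-extended temporary can be observed by logging destructor runs. This is a sketch with hypothetical names (Noisy, name_len, drop_order); each Noisy value records its name in a shared log when it is dropped:

```rust
use std::cell::RefCell;

// Records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Noisy<'_> {
    fn name_len(&self) -> usize {
        self.name.len()
    }
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        // `let x = &temp;` extends the temporary to the end of the block.
        let _extended = &Noisy { name: "extended", log: &log };
        // A temporary used only as a method receiver is dropped at the `;`.
        let _len = Noisy { name: "statement", log: &log }.name_len();
        log.borrow_mut().push("end of block");
    }
    log.into_inner()
}
```

The log comes back as "statement", then "end of block", then "extended": the method-receiver temporary dies at its semicolon, while the extended one survives until the block exits.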

Patterns that work well in the current rules

Storing temporary into a local

struct Data<'a> {
    data: &'a [u32] // use a slice to permit subslicing later
}

fn initialize() {
    let mut d = Data { data: &[1, 2, 3] };
    //                       ^^^^^^^^^^ extended temporary
    d.process();
}

impl Data<'_> {
    fn process(&mut self) {
        ...
        self.data = &self.data[1..];
        ...
    }
}

Reading values out of a lock/refcell

The current rules allow you to do atomic operations on locks/refcells conveniently, so long as they don’t return references to the data. This works great in a let statement (there are other cases below where it works less well).

let result = cell.borrow_mut().do_something();
// `cell` is not borrowed here
...

Error-prone cases with today’s rules

Today’s rules sometimes give lifetimes that are too long, resulting in bugs at runtime.

Deadlocks because of temporary lifetimes in matches

One very common problem is deadlocks (or panics, for ref-cell) when mutex locks occur in a match scrutinee:

match lock.lock().data.clone() {
    //     ------ returns a temporary guard
    
    Data { ... } => {
        lock.lock(); // deadlock
    }
    
} // <-- lock() temporary dropped here

Ergonomic problems with today’s rules

Today’s rules sometimes give lifetimes that are too short, resulting in ergonomic failures or confusing error messages.

Call parameter temporary lifetime is too short (RFC66)

Somewhat surprisingly, the following code does not compile:

fn get_data() -> Vec<u32> { vec![1, 2, 3] }

fn main() {
    let last_elem = get_data().last();
    drop(last_elem); // just a dummy use
}

This fails because the Vec returned by get_data() is stored into a temporary so that we can invoke last, which requires &self, but that temporary is dropped at the ; (as this case doesn’t fall under the lifetime extension rules).
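The workaround under today’s rules is to give the vector a name so that it outlives the statement. A minimal sketch (last_of_data is a hypothetical wrapper so the result can be returned):

```rust
fn get_data() -> Vec<u32> {
    vec![1, 2, 3]
}

fn last_of_data() -> Option<u32> {
    // Naming the vector keeps it alive for the rest of the function body.
    let data = get_data();
    let last_elem = data.last();
    last_elem.copied()
}
```

This compiles because data is an ordinary local variable rather than a temporary, so the reference inside last_elem stays valid after the semicolon.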

RFC 66 proposed a rather underspecified extension to the temporary lifetime rules to cover this case; loosely speaking, the idea was to extend the lifetime extension rules to extend the lifetime of temporaries that appear in function arguments if the function’s signature is going to return a reference from that argument. So, in this case, the signature of last indicates that it returns a reference from self:

impl<T> [T] {
    fn last(&self) -> Option<&T> {...}
}

and therefore, since E.last() is being assigned to last_elem, we would extend the lifetime of any temporaries in E (the value for self). Ding Xiang Fei has been exploring how to actually implement RFC 66 and has made some progress, but it’s clear that we need to settle on the exact rules for when temporary lifetime extension should happen.

Even assuming we created some rules for RFC 66, there can be confusing cases that wouldn’t be covered. Consider this statement:

let l = get_data().last().unwrap();
drop(l); // ERROR

Here, the unwrap call has a signature fn(Option<T>) -> T, which doesn’t contain any references. Therefore, it does not extend the lifetimes of temporaries in its arguments. The argument here is the expression get_data().last(), which creates a temporary to store get_data(). This temporary is then dropped at the end of the statement, and hence l winds up pointing to dead memory.

Statement-like expressions in tail position

The original rules assumed that changing E to {E} should not change when temporaries are dropped. This has the counterintuitive behavior though that introducing a block doesn’t constrain the stack lifetime of temporaries. It is also surprising for blocks that have tail expressions that are “statement-like” (e.g., match), because these can be used as statements without a ;, and thus users may not have a clear picture of whether they are an expression producing a value or a statement.

Example. The following code does not compile:

struct Identity<A>(A);
impl<A> Drop for Identity<A> {
    fn drop(&mut self) { }
}
fn main() {
    let x = 22;
    match Identity(&x) {
        //------------ creates a temporary that can be matched
        _ => {
            println!("");
        }
    } // <-- this is considered a trailing expression by the compiler
} // <-- temporary is dropped after this block executes

Because of the way that the implicit function scope works, and the fact that this match is actually the tail expression in the function body, this is effectively desugared to something like this:

struct Identity<A>(A);
impl<A> Drop for Identity<A> {
    fn drop(&mut self) { }
}
fn main() {
    let tmp;
    {
        let x = 22;
        match {tmp = Identity(&x); tmp} {
            _ => {
                println!("");
            }
        }
    }
}
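
One workaround under today’s rules is to turn the trailing match into a statement by adding a semicolon, so that its temporaries are dropped at the end of that statement while x is still alive. A small sketch, where the wrapper function `run` and its return value are my own additions so the result can be observed:

```rust
struct Identity<A>(A);

impl<A> Drop for Identity<A> {
    fn drop(&mut self) {}
}

// Hypothetical wrapper so the example can return a value.
fn run() -> u32 {
    let x = 22;
    match Identity(&x) {
        _ => {
            println!("");
        }
    }; // <-- the `;` makes this a statement; the temporary is dropped here
    x
}
```

That a single semicolon decides whether the code compiles is a good illustration of why the "statement-like" tail position is surprising.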

Lack of equivalence between if and match

The current rules distinguish temporary behavior for if/while from match/if-let. As a result, code like this compiles and executes fine:

if lock.lock().something { // grab lock, then release
    lock.lock(); // OK to grab lock again
}

but very similar code using a match gives a deadlock:

if let true = lock.lock().something {
    lock.lock(); // Deadlock
}

// or

match lock.lock().something {
    true => lock.lock(), // Deadlock
    false => (),
}
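
To observe the difference without actually deadlocking, here is a sketch using std::sync::Mutex and try_lock, which fails instead of blocking when the lock is already held. The `State` type and the two function names are assumptions made up for this example:

```rust
use std::sync::Mutex;

// Hypothetical data guarded by the lock.
struct State {
    something: bool,
}

// Returns true if the lock is still held while the `if` body runs.
fn held_in_if_body(lock: &Mutex<State>) -> bool {
    if lock.lock().unwrap().something {
        // The guard from the condition has already been dropped,
        // so a non-blocking attempt to lock again succeeds.
        let held = lock.try_lock().is_err();
        held
    } else {
        false
    }
}

// Returns true if the lock is still held while the `match` arm runs.
fn held_in_match_arm(lock: &Mutex<State>) -> bool {
    let held = match lock.lock().unwrap().something {
        // The guard from the scrutinee is still alive here, so
        // `try_lock` fails where a plain `lock` would deadlock.
        true => lock.try_lock().is_err(),
        false => false,
    };
    held
}
```

With a mutex whose `something` field is true, the first function reports that the lock is free inside the body and the second reports that it is still held.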

Partly as a result of this lack of equivalence, we have had a lot of trouble doing desugarings for things like let-else and if-let expressions.

Named block

Tail expressions aren’t the only way to “escape” a value from a block; the same applies to breaking with a named label. Labeled breaks, however, don’t benefit from lifetime extension. The following example, therefore, fails to compile:

fn main() {
    let x = 'a: {
        break 'a &vec![0]; // ERROR
    };
    
    drop(x);
}

Note that a tail-expression based version does compile today:

fn main() {
    let x = { &vec![0] };
    drop(x);
}
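
For the labeled-break version, a workaround that compiles today is to declare the storage outside the labeled block. This is only a sketch; the function name and the `storage` variable are assumptions for illustration:

```rust
fn first_via_labeled_break() -> i32 {
    // Declare the storage outside the labeled block so that it outlives `x`.
    let storage;
    let x: &Vec<i32> = 'a: {
        storage = vec![0];
        // The reference now points at a named local, not a temporary.
        break 'a &storage;
    };
    x[0]
}
```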

Proposed properties to focus discussion

To focus discussion, here are some named examples we can use that capture key patterns.

Examples of behaviors we would ideally preserve:

read-locked-field: let x: Event = ref_cell.borrow_mut().get_event(); releases borrow at the end of the statement (as today)
obvious aggregate construction: let x: Event = Event { x: &[1, 2, 3] } stores [1, 2, 3] in a temporary with block scope

Examples of behavior that we would like, but which we don’t have today, resulting in bugs/confusion:

match-locked-field: match data.lock().unwrap().data { ... } releases lock before match body executes
if-match-correspondence: if E {}, if let true = E {}, and match E { true => .. } all behave the same with respect to temporaries in E (unlike today)
block containment: {E} must not create any temporaries that extend past the end of the block (unlike today)
tail-break-correspondence: {E} and 'a: { break 'a E } should be equivalent

Examples of behavior that we would like, but which we don’t have today, resulting in ergonomic pain (these cases may not be achievable without violating the previous ones):

last: let x = get_data().last(); (the canonical RFC66 example) will extend lifetime of data to end of block; also covers (some) new methods like let x: Event<'_> = Event::new(&[1, 2, 3])
last-unwrap: let x = get_data().last().unwrap(); (extended form of the above) will extend lifetime of data to end of block
tuple struct construction: let x = Event(&[1, 2, 3])

Tightest proposal

The proposal with minimal confusion would be to remove syntactic lifetime extension and tighten default lifetimes in two ways:

Tighten block tail expressions. Have temporaries in the tail expression of a block be dropped when returning from the block. This ensures block containment and tail-break-correspondence.

Tighten match scrutinees. Drop temporaries from match/if-let scrutinees before entering the body of the match. This ensures match-locked-field and if-match-correspondence, and it avoids footguns such as a lock that stays held while the match arms execute.

In short, temporaries would always be dropped at the innermost statement, match/if/if-let/while scrutinee, or block.

Things that no longer build

There are three cases that build today which will no longer build with this minimal proposal:

let x = &vec![] no longer builds, nor does let x = Foo { x: &[1, 2, 3] }. Both of them create temporaries that are dropped at the end of the let.
match &foo.borrow_mut().parent { Some(ref p) => .., None => ... } no longer builds, since the temporary from borrow_mut() is dropped before entering the match arms.
{let x = {&vec![0]}; ...} no longer builds, as a result of tightening block tail expressions. Note however that other examples, e.g. the one from the section “statement-like expressions in tail position”, would now build successfully.
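
Each of these cases can be rewritten with an explicit let, and the rewritten form already compiles today. The sketch below shows two of them; the `Node` type and the function names are hypothetical stand-ins for the `foo` in the examples above:

```rust
use std::cell::RefCell;

// Hypothetical type standing in for the `foo` of the example above.
struct Node {
    parent: Option<String>,
}

fn parent_name(foo: &RefCell<Node>) -> String {
    // An explicit `let` keeps the `RefMut` guard alive across the match arms.
    let guard = foo.borrow_mut();
    match &guard.parent {
        Some(p) => p.clone(),
        None => String::from("<none>"),
    }
}

fn first_element() -> i32 {
    // Instead of `let x = &vec![0];`, name the vector explicitly.
    let v = vec![0];
    let x = &v;
    x[0]
}
```

The rewrite is mechanical, which is part of why this kind of change could plausibly be automated across an edition boundary.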

The core proposal also does nothing to address RFC66-like patterns, tuple struct construction, etc.

Extension option A: Do What I Mean

One way to overcome the concerns of the core proposal would be to extend with more “DWIM”-like options. For example, we could extend “lifetime extension rules” to cover match expressions.

Lifetime extension for let statements, as today. To allow let x = &vec![] to build, we can restore today’s lifetime extension rules.

Pro: things like this will build

let x = Foo { 
    data: &get_data()
    //     ---------- stored in a temporary that outlives `x`
};

Con: the following example would build again, which leads to a (perhaps surprising) panic – that said, I’ve never seen a case like this in the wild, the confusion always occurs with match

use std::cell::RefCell;

struct Foo<'a> {
    data: &'a u32
}

fn main() {
    let cell = RefCell::new(22);
    let x: Foo<'_> = Foo {
        data: &*cell.borrow_mut(),
    };
    *cell.borrow_mut() += 1; // <-- panic
    drop(x);
}

Scope extension for match scrutinees. To allow match &foo.borrow_mut().parent { Some(ref x) => ... } to work, we could include scope extension rules similar to the ones used with let initializers (i.e., if we can see that a ref is taken into the temporary, then extend its lifetime, but otherwise do not).

Pro: match &foo.borrow_mut().parent { .. } works as it does today.
Con: Syntactic extension rules can be approximate, so e.g. match (foo(), bar().baz()) { (Some(ref x), y) => .. } would likely keep the temporary returned by bar(), even though it is not referenced.

RFC66-like rules. Use some heuristic rules to determine, from a function signature, when the return type includes data from the arguments. If the return type of a function f references a generic type or lifetime parameter that also appears in some argument i, and the function call f(a0, ..., ai, ..., an) appears in some position with an extended temporary lifetime, then ai will also have an extended temporary lifetime (i.e., any temporaries created in ai will persist until end of enclosing block / match expression).

Pro: Patterns like let x = E where E is get_data().last(), get_data().last().unwrap(), TupleStruct(&get_data()), or SomeStruct::new(&get_data()) would all allocate a temporary for get_data() that persists until the end of the enclosing block. This occurs because, in each case, the return type references a generic type or lifetime parameter that also appears in the argument.
Con: Complex rules imply that let x = locked_vec.lock().last() would also extend lock lifetime to end-of-block, which users may not expect.

Extension option B: “Anonymous lets” for extended temporary lifetimes

Allow expr.let as an operator that means “introduce a let to store this value inside the innermost block but before the current statement and replace this statement with a reference to it”. So for example:

let x = get_data().let.last();

would be equivalent to

let tmp = get_data();
let x = tmp.last();

Question: Do we keep some amount of implicit extension? For example, should let x = &vec![] keep compiling, or do you have to do let x = &vec![].let?

Parting notes

Editor’s note: As I wrote at the start, this was an early document to prompt discussion in a meeting (you can see notes from the meeting here). It’s not a full proposal. That said, my position when I started writing was different than where I landed. Initially I was going to propose more of a “DWIM”-approach, tweaking the rules to be tighter in some places, more flexible in others. I’m still interested in exploring that, but I am worried that the end-result will just be people having very little idea when their destructors run. For the most part, you shouldn’t have to care about that, but it is sometimes quite important. That leads me to: let’s have some simple rules that can be explained on a postcard and work “pretty well”, and some convenient way to extend lifetimes when you want it. The .let syntax is interesting but ultimately probably too confusing to play this role.

Oh, and a note on the edition: I didn’t say it explicitly, but we can make changes to temporary lifetime rules over an edition by rewriting where necessary to use explicit lets, or (if we add one) some other explicit notation. The result would be code that runs on all editions with the same semantics.

To async trait or just to trait

2023-03-12T00:00:00+00:00

One interesting question about async fn in traits is whether or not we should label the trait itself as async. Until recently, I didn’t see any need for that. But as we discussed the question of how to enable “maybe async” code, we realized that there would be some advantages to distinguishing “async traits” (which could contain async functions) from sync traits (which could not). However, as I’ve thought about the idea more, I’m more and more of the mind that we should not take this step — at least not now. I wanted to write a blog post diving into the considerations as I see them now.

What is being proposed?

The specific proposal I am discussing is to require that traits which include async functions are declared as async traits…

// The "async trait" (vs just "trait") would be required
// to have an "async fn" (vs just a "fn").
async trait HttpEngine {
    async fn fetch(&mut self, url: Url) -> Vec<u8>;
}

…and when you reference them, you use the async keyword as well…

fn load_data<H>(h: &mut impl async HttpEngine, urls: &[Url]) {
    //                       ----- just writing `impl HttpEngine`
    //                             would be an error
    …
}

This would be a change from the support implemented in nightly today, where any trait can have async functions.
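
For comparison, here is a sketch of the "any trait can have async functions" model, written so that it compiles on a toolchain with async fn in traits (Rust 1.75 or later, an assumption on my part). The `EchoEngine` type, the use of `&str` in place of `Url`, and the tiny `block_on` helper are all made up for this example and are not part of any proposal:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// No `async trait` marker: an ordinary trait with an async method.
trait HttpEngine {
    async fn fetch(&mut self, url: &str) -> Vec<u8>;
}

// Hypothetical engine that just echoes the URL back as bytes.
struct EchoEngine;

impl HttpEngine for EchoEngine {
    async fn fetch(&mut self, url: &str) -> Vec<u8> {
        url.as_bytes().to_vec()
    }
}

// Callers write plain `impl HttpEngine`, with no `async` keyword.
async fn load_data(h: &mut impl HttpEngine, urls: &[&str]) -> usize {
    let mut total = 0;
    for url in urls {
        total += h.fetch(url).await.len();
    }
    total
}

// Tiny busy-polling executor, enough for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Running `block_on(load_data(&mut EchoEngine, &["ab", "cde"]))` sums the byte lengths of the echoed URLs. A real application would use a proper executor rather than this busy loop.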

Why have “async traits” vs “normal” traits?

When authoring an async application, you’re going to define traits like HttpEngine that inherently involve async operations. In that case, having to write async trait seems like pure overhead. So why would we ever want it?

The answer is that not all traits are like HttpEngine. We can call HttpEngine an “always async” trait — it will always involve an async operation. But a lot of traits are “maybe async” — they sometimes involve async operations and sometimes not. In fact, we can probably break these down further: you have traits like Read, which involve I/O but have a sync and async equivalent, and then you have traits like Iterator, which are orthogonal from I/O.

Particularly for traits like Iterator, the current trajectory will result in two nearly identical traits in the stdlib: Iterator and AsyncIterator. These will be mostly the same apart from AsyncIterator having an async next function, and perhaps some more combinators. It’s not the end of the world, but it’s also not ideal, particularly when you consider that we likely want more “modes”, like a const Iterator, a “sendable” iterator, perhaps a fallible iterator (one that returns results), etc. This is of course the problem often referred to as the “color problem”, from Bob Nystrom’s well-known “What color is your function?” blog post, and it’s precisely what the “keyword generics” initiative is looking to solve.

Requiring an async keyword ensures consistency between “maybe” and “always” async traits…

It’s not really clear what a full solution to the “color problem” looks like. But whatever it is, it’s going to involve having traits with multiple modes. So instead of Iterator and AsyncIterator, we’ll have the base definition of Iterator and then a way to derive an async version, async Iterator. We can then call an Iterator a “maybe async” trait, because it might be sync but it might be async. We might declare a “maybe async” trait using an attribute, like this¹:

#[maybe(async)]
trait Iterator {
    type Item;

    // Because of the #[maybe(async)] attribute,
    // the async keyword on this function means “if
    // this trait is in async mode, then this is an
    // async function”:
    async fn next(&mut self) -> Option<Self::Item>;
}

Now imagine I have a function that reads urls from some kind of input stream. This might be an async fn that takes an impl async Iterator as argument:

async fn read_urls(urls: impl async Iterator<Item = Url>) {
    //                        ----- specify async mode
    while let Some(u) = urls.next().await {
        //                          ----- needed because this is an async iterator
        …
    }
}

But now let’s say I want to combine this (async) iterator of urls and use an HttpEngine (our “always async” trait) to fetch them:

async fn fetch_urls(
    urls: impl async Iterator<Item = Url>,
    engine: impl HttpEngine,
) {
   while let Some(u) = urls.next().await {
       let data = engine.fetch(u).await;
       …
   }
}

There’s nothing wrong with this code, but it might be a bit surprising that I have to write impl async Iterator but I just write impl HttpEngine, even though both traits involve async functions. I can imagine that it would sometimes be hard to remember which traits are “always async” versus which ones are only “maybe async”.

…which also means traits can go from “always” to “maybe” async without a major version bump.

There is another tricky bit: imagine that I am authoring a library and I create an “always async” HttpEngine trait to start:

trait HttpEngine {
    async fn fetch(&mut self, url: Url) -> Vec<u8>;
}

but then later I want to issue a new version that offers a sync and an async version of HttpEngine. I can’t add a #[maybe(async)] to the trait declaration because, if I do so, then code using impl HttpEngine would suddenly be getting the sync version of the trait, whereas before they were getting the async version.

In other words, unless we force people to declare async traits up front, then changing a trait from “always async” to “maybe async” is a breaking change.

But writing `async Trait` for traits that are always async is annoying…

The points above are solid. But there are some flaws. The most obvious is that having to write async for every trait that uses an async function is likely to be pretty tedious. I can easily imagine that people writing async applications are going to use a lot of “always async” traits and I imagine that, each time they write impl async HttpEngine, they will think to themselves, “How many times do I have to tell the compiler this is async already?! We get it, we get it!!”

Put another way, the consistency argument (“how will I remember which traits need to be declared async?”) may not hold water in practice. I can imagine that for many applications the only “maybe async” traits are the core abstractions coming from libraries, like Iterator, and most of the other code is just “always async”. So actually it’s not that hard to remember which is which.

…and it’s not clear that traits will go from “always” to “maybe” async anyway…

But what about semver violations? Well, if my thesis above is correct, then it’s also true that there will be relatively few traits that need to go from “always async” to “maybe async”. Moreover, I imagine most libraries will know up front whether they expect to be sync or not. So maybe it’s not a big deal that this is a breaking change.

…and trait aliases would give a workaround for “always -> maybe” transitions anyway…

So, maybe it won’t happen in practice, but let’s imagine that we did define an always async HttpEngine and then later want to make the trait “maybe async”. Do we absolutely need a new major version of the crate? Not really, there is a workaround. We can define a new “maybe async” trait (let’s call it HttpFetch) and then redefine HttpEngine in terms of HttpFetch:

// This is a trait alias. It’s an unstable feature that I would like to stabilize.
// Even without a trait alias, though, you could do this with a blanket impl.
trait HttpEngine = async HttpFetch;

#[maybe(async)]
trait HttpFetch { … }

This obviously isn’t ideal: you wind up with two names for the same underlying trait. Maybe you deprecate the old one. But it’s not the end of the world.
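
The blanket-impl variant mentioned in the code comment can be sketched on stable Rust (assuming Rust 1.75 or later). The `#[maybe(async)]` attribute and the `async HttpFetch` mode selection are hypothetical, so they are left out; here HttpFetch simply plays the role of the async version, and `EchoEngine`, `fetch_len`, and `block_on` are names invented for the example:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The newer trait. In the proposal it would carry `#[maybe(async)]`;
// that attribute is hypothetical, so it is omitted here.
trait HttpFetch {
    async fn fetch(&mut self, url: &str) -> Vec<u8>;
}

// The old name, kept as an empty subtrait...
trait HttpEngine: HttpFetch {}

// ...that every `HttpFetch` implementor gets for free.
impl<T: HttpFetch> HttpEngine for T {}

// Hypothetical implementor, written against the new trait only.
struct EchoEngine;

impl HttpFetch for EchoEngine {
    async fn fetch(&mut self, url: &str) -> Vec<u8> {
        url.as_bytes().to_vec()
    }
}

// Existing client code written against the old name keeps compiling.
async fn fetch_len(engine: &mut impl HttpEngine, url: &str) -> usize {
    engine.fetch(url).await.len()
}

// Tiny busy-polling executor, enough for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Unlike a true trait alias, the subtrait is a distinct trait, so downstream crates that implemented HttpEngine directly would still need to switch their impls to HttpFetch.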

…and requiring async composes poorly with supertraits and trait aliases…

Actually, that last example brings up an interesting point. To truly ensure consistency, it’s not enough to say that “traits with async functions must be declared async”. We also need to be careful what we permit in trait aliases and supertraits. For example, imagine we have a trait UrlIterator that has an async Iterator as a supertrait…

trait UrlIterator: async Iterator<Item = Url> { }

…now people could write functions that take an impl UrlIterator, but it will still require await when you invoke its methods. So we didn’t really achieve consistency after all. The same thing would apply with a trait alias like trait UrlIterator = async Iterator.

It’s possible to imagine a requirement like “to have a supertrait that is async, the trait must be async”, but — to me — that feels non-compositional. I’d like to be able to declare a trait alias trait A = … and have the … be able to be any sort of trait bounds, whether they’re async or not. It feels funny to have the async propagate out of the ... and onto the trait alias A.

…and, while this decision is hard to reverse, it can be reversed.

So, let’s say that we were to stabilize the ability to add async functions to any trait. And then later we find that we actually want to have maybe async traits and that we wish we had required people to write async explicitly all the time, because consistency and semver. Are we stuck?

Well, not really. There are options here. For example, we might make it possible to write async (but not required) and then lint and warn when people don’t. Perhaps in another edition, we would make it mandatory. This is basically what we did with the dyn keyword. Then we could declare that changing a trait from always-async to maybe-async is not considered worthy of a major version, because people’s code that follows the lints and warnings will not be affected. If we had transitioned so that all code in the new edition required an async keyword even for “always async” traits, we could let people declare a trait to be “maybe async but only in the new edition”, which would avoid all breakage entirely.

In any case, I don’t really want to do those things. It’d be embarrassing and confusing to stabilize SAFIT and then decide that “oh, no, you have to declare traits to be async”. I’d rather we just think through the arguments now and make a call. But it’s always good to know that, just in case you’re wrong, you have options.

My (current) conclusion: YAGNI

So which way to go? I think the question hinges a lot on how common we expect “maybe async” code to be. My expectation is that, even if we do support it, “maybe async” will be fairly limited. It will mostly apply to (a) code like Iterator that is orthogonal from I/O and (b) core I/O primitives like the Read trait or the File type. If we’re especially successful, then crates like reqwest (which currently offers both a sync and async interface) would be able to unify those into one. But application code I expect to largely be written to be either sync or async.

I also think that it’ll be relatively unusual to go from “always async” to “maybe async”. Not impossible, but unusual enough that either making a new major version or using the “renaming” trick will be fine.

For this reason, I lean towards NOT requiring async trait, and instead allowing async fn to be added to any trait. I am still hopeful we’ll add “maybe async” traits as well, but I think there won’t be a big problem of “always async” traits needing to change to maybe async. (Clearly we are going to want to go from “never async” to “maybe async”, since there are lots of traits like Iterator in the stdlib, but that’s a non-issue.)

The other argument in favor is that it’s closer to what we do today. There are lots of people using #[async_trait] and I’ve never heard anyone say “it’s so weird that you can write T: HttpEngine and don’t have to write T: async HttpEngine”. At minimum, if we were going to change to requiring the “async” keyword, I would want to give that change some time to bake on nightly before we stabilized it. This could well delay stabilization significantly.

If, in contrast, you believed that lots of code was going to be “maybe async”, then I think you would probably want the async keyword to be mandatory on traits. After all, since most traits are maybe async anyway, you’re going to need to write it a lot of the time.

I can feel you fixating on the #[maybe(async)] syntax. Resist the urge! There is no concrete proposal yet. ↩︎

Trait transformers (send bounds, part 3)

2023-03-03T00:00:00+00:00

I previously introduced the “send bound” problem, which refers to the need to add a Send bound to the future returned by an async function. This post continues my tour over the various solutions that are available. This post covers “Trait Transformers”. This proposal arose from a joint conversation with myself, Eric Holk, Yoshua Wuyts, Oli Scherer, and Tyler Mandry. It’s a variant of Eric Holk’s inferred async send bounds proposal as well as the work that Yosh/Oli have been doing in the keyword generics group. Those posts are worth reading as well, lots of good ideas there.¹

Core idea: the trait transformer

A transformer is a way for a single trait definition to define multiple variants of that trait. For example, where T: Iterator means that T implements the Iterator trait we know and love, T: async Iterator means that T implements the async version of Iterator. Similarly, T: Send Iterator means that T implements the sendable version of Iterator (we’ll define both the “sendable version” and “async version” more precisely, don’t worry).

Transformers can be combined, so you can write T: async Send Iterator to mean “the async, sendable version”. They can also be distributed, so you can write T: async Send (Iterator + Factory) to mean the “async, sendable” version of both Iterator and Factory.

There are 3 proposed transformers:

async
const
any auto trait

The set of transformers is defined by the language and is not user extensible. This could change in the future, as transformers can be seen as a kind of trait alias.

The async transformer

The async transformer is used to choose whether functions are sync or async. It can only be applied to traits that opt-in by specifying which methods should be made into sync or async. Traits can opt-in either by declaring the async transformer to be mandatory, as follows…

async trait Fetch {
    async fn fetch(&mut self, url: Url) -> Data;
}

…or by making it optional, in which case we call it a “maybe-async” trait…

#[maybe(async)]
trait Iterator {
    type Item;
    
    #[maybe(async)]
    fn next(&mut self) -> Self::Item;
    
    fn size_hint(&self) -> Option<(usize, usize)>;
}

Here, the trait Iterator is the same Iterator we’ve always had, but async Iterator refers to the “async version” of Iterator, which means that it has an async next method (but still has a sync method size_hint).

(For the time being, maybe-async traits cannot have default methods, which avoids the need to deal with “maybe-async” code. This can change in the future.)

Trait transformer as macros

You can think of a trait transformer as being like a fancy kind of macro. When you write a maybe-async trait like Iterator above, you are effectively defining a template from which the compiler can derive a family of traits. You could think of the #[maybe(async)] annotation as a macro that derives two related traits, so that…

#[maybe(async)]
trait Iterator {
    type Item;
    
    #[maybe(async)]
    fn next(&mut self) -> Self::Item;
    
    fn size_hint(&self) -> Option<(usize, usize)>;
}

…would effectively expand into two traits, one with a sync next method and one with an async version…

trait Iterator { fn next(&mut self) -> Self::Item; ... }
trait AsyncIterator { async fn next(&mut self) -> Self::Item; ... }

…when you have a where-clause like T: async Iterator, then, the compiler would be transforming that to T: AsyncIterator. In fact, Oli and Yosh implemented a procedural macro crate that does more-or-less exactly this.

The idea with trait transformers though is not to literally do expansions like the ones above, but rather to build those mechanisms into the compiler. This makes them more efficient, and also paves the way for us to have code that is generic over whether or not it is async, or expand the list of modifiers. But the “macro view” is useful to have in mind.

Always async traits

When a trait is declared like async trait Fetch, it only defines an async version, and it is an error to request the sync version like T: Fetch; you must write T: async Fetch.

Defining an async method without being always-async or maybe-async is disallowed:

trait Fetch {
    async fn fetch(&mut self, url: Url) -> Data; // ERROR
}

Forbidding traits of this kind means that traits can move from “always async” to “maybe async” without a breaking change. See the frequently asked questions for more details.

The const transformer

The const transformer works similarly to async. One can write

#[maybe(const)]
trait Compute {
    #[maybe(const)]
    fn a(&mut self);
    
    fn b(&mut self);
}

and then if you write T: const Compute it means that a must be a const fn but b need not be. Similarly one could write const trait Compute to indicate that the const transformer is mandatory.

The auto-trait transformer

Auto-traits can be used as a transformer. This is permitted on any (maybe) async trait or on traits that explicitly opt-in by defining #[maybe(Send)] variants. The default behavior of T: Send Foo for some trait Foo is that…

T must be Send
the future returned by any async method in Foo must be Send
the value returned by any RPITIT method must be Send²

Per these rules, given:

#[maybe(async)]
trait Iterator {
    type Item;

    #[maybe(async)]
    fn next(&mut self) -> Self::Item;
}

writing T: async Send Iterator would be equivalent to:

T: async Iterator<next(): Send> + Send

using the return type notation.
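
Neither the Send transformer nor return type notation can be written on stable Rust, so the closest approximation I can offer is a sketch where the Send bound is baked into the trait definition itself (assuming Rust 1.75 or later for impl Trait in trait return position). The names `SendFetch`, `EchoEngine`, `is_send`, and `fetch_future_is_send` are all hypothetical:

```rust
use std::future::Future;

// Stand-in for the "sendable async version" of a trait: the `Send` bound
// on the returned future is written directly into the method signature.
trait SendFetch {
    fn fetch(&mut self, url: String) -> impl Future<Output = Vec<u8>> + Send;
}

struct EchoEngine;

impl SendFetch for EchoEngine {
    // An `async fn` can implement the method as long as its future is `Send`.
    async fn fetch(&mut self, url: String) -> Vec<u8> {
        url.into_bytes()
    }
}

// Compiles only for `Send` values, so calling it is a compile-time check.
fn is_send<T: Send>(_value: &T) -> bool {
    true
}

// Generic code can rely on the future being `Send` for every implementor.
fn fetch_future_is_send<E: SendFetch>(engine: &mut E) -> bool {
    let fut = engine.fetch(String::from("https://example.com"));
    is_send(&fut)
}
```

The drawback of this spelling is that the trait author decides once, for every user, whether the future is Send. The transformer is meant to move that choice to the place where the bound is written.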

The #[maybe(Send)] annotation can be applied to associated types or functions…

#[maybe(Send)]
trait IntoIterator {
    #[maybe(Send)]
    type IntoIter;
    
    type Item;
}

…in which case writing T: Send IntoIterator would expand to T: IntoIterator<IntoIter: Send> + Send.

Frequently asked questions

How is this different from eholk’s Inferred Async Send Bounds?

Eric’s proposal was similar in that it permitted T: async(Send) Foo as a similar sort of “macro” to get a bound that included Send bounds on the resulting futures. In that proposal, though, the “send bounds” were tied to the use of async sugar, which means that you could no longer consider async fn to be sugar for a function with an -> impl Future return type. That seemed like a bad thing, particularly since explicit -> impl Future syntax is the only way to write an async fn that doesn’t capture all of its arguments.

How is this different from the keyword generics post?

Yosh and Oli posted a keyword generics update that included notation for “maybe async” traits (they wrote ?async) along with some other things. The ideas in this post are very similar to those; the main difference is treating Send as an independent transformer, similar to the previous question.

Should the auto-trait transformer be specific to each auto-trait, or generic?

As written, the auto-trait transformer is specific to a particular auto-trait, but it might be useful to be able to be generic over multiple (e.g., if you are maybe Send, you likely want to be maybe Send-Sync too, right?). You could imagine writing #[maybe(auto)] instead of #[maybe(Send)], but that’s kind of confusing, because an “always-auto” trait (i.e., an auto trait like Send) is quite a different thing from a “maybe-auto” trait (i.e., a trait that has a “sendable version”). OTOH users can’t define their own auto traits and likely will never be able to. Unclear.

Why make auto-trait transformer be opt-in?

You can imagine letting T: Send Foo mean T: Foo + Send for all traits Foo, without requiring Foo to be declared as maybe(Send). The problem is that this would mean that customizing the Send version of a trait for the first time is a semver breaking change, and so must be done at the same time the trait is introduced. This implies that no existing trait in the ecosystem could customize its Send version. Seems bad.

Will you permit `async` methods without the async transformer? Why or why not?

No. The following trait…

trait Http {
    async fn fetch(&mut self); // ERROR
}

…would get an error like “cannot use async in a trait unless it is declared as async or #[maybe(async)]”. Ensuring that people write T: async Http and not just T: Http means that the trait can become “maybe async” later without breaking those clients. It also means that people don’t have to remember (when writing async code) whether a trait is “maybe async” or “always async” just to know whether to write T: async Http (for maybe-async traits) or T: Http (for always-async). This way, if the trait has async methods, you write async.

Why did you label methods in a `#[maybe(async)]` trait as `#[maybe(async)]` instead of `async`?

In the examples, I wrote maybe(async) traits like so:

#[maybe(async)]
trait Iterator {
    type Item;

    #[maybe(async)]
    fn next(&mut self) -> Self::Item;
}

Personally, I rather prefer the idea that inside a #[maybe(async)] block, you define the trait as if it were always async…

#[maybe(async)]
trait Iterator {
    type Item;

    async fn next(&mut self) -> Self::Item;
}

…but then the async gets removed when used in a sync context. However, I changed it because I couldn’t figure out the right way to permit #[maybe(Send)] in this scenario. I can also imagine that it’s a bit confusing to write async fn when you mean “maybe async”.

Why use an annotation (`#[..]`) like `#[maybe(async)]` instead of a keyword?

I don’t know, because ?async is hard to read, and we’ve got enough keywords? I’m open to bikeshedding here.

Do we still want return type notation?

Yes, RTN is useful for giving more precise specification of which methods should return send-futures (you may not want to require that all async methods are send, for example). It’s also needed internally by the compiler anyway as the “desugaring target” for the Send transformer.

Can we allow `#[maybe]` on types/functions?

Maybe!³ That’s basically full-on keyword generics. This proposal is meant as a stepping stone. It doesn’t permit code or types to be generic over whether they are async/send/whatever, but it does permit us to define multiple versions of a trait. To the language, it’s effectively a kind of macro, so that (e.g.) a single trait definition #[maybe(async)] trait Iterator effectively defines two traits, Iterator and AsyncIterator, and the T: async Iterator notation is being used to select the second one. (This is only an example, I don’t mean that users would literally be able to reference an AsyncIterator trait.)

What order are transformers applied?

Transformers must be written according to this grammar

Trait := async? const? Path* Path

where x? means optional x, x* means zero or more x, and the traits named in Path* must be auto-traits. The transformers (if present) are applied in order, so first things are made async, then const, then sendable. (I’m not sure if both async and const make any sense?)

Can auto-trait transformers let us generalize over rc/arc?

Yosh at some point suggested that we could think of “send” or “not send” as another application of keyword generics, and that got me very excited. It’s a known problem that people have to define two versions of their structs (see e.g. the im and im-rc crates). Maybe we could permit something like

#[maybe(Send)]
struct Shared<T> {
    /* either Rc or Arc, depending */
}

and then permit variables of type Shared or Send Shared. The keyword generics proposals already are exploring the idea of structs whose types vary depending on whether they are async or not, so this fits in.
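For contrast, here is roughly how one can abstract over Rc and Arc in stable Rust today, using a generic associated type. This is a sketch of a workaround, not the #[maybe(Send)] idea itself; the names `PointerFamily`, `RcFamily`, `ArcFamily`, and `assert_send` are hypothetical and chosen only for this illustration.

```rust
use std::ops::Deref;
use std::rc::Rc;
use std::sync::Arc;

// Hypothetical "family" trait: each implementor selects one pointer type.
trait PointerFamily {
    type Ptr<T>: Deref<Target = T> + Clone;
    fn new<T>(value: T) -> Self::Ptr<T>;
}

struct RcFamily;
struct ArcFamily;

impl PointerFamily for RcFamily {
    type Ptr<T> = Rc<T>;
    fn new<T>(value: T) -> Rc<T> {
        Rc::new(value)
    }
}

impl PointerFamily for ArcFamily {
    type Ptr<T> = Arc<T>;
    fn new<T>(value: T) -> Arc<T> {
        Arc::new(value)
    }
}

// One definition, two instantiations: `Shared<T, RcFamily>` and `Shared<T, ArcFamily>`.
struct Shared<T, P: PointerFamily> {
    inner: P::Ptr<T>,
}

impl<T, P: PointerFamily> Shared<T, P> {
    fn new(value: T) -> Self {
        Shared { inner: P::new(value) }
    }

    fn get(&self) -> &T {
        &*self.inner
    }
}

// Compiles only when `S` is `Send`.
fn assert_send<S: Send>(_value: &S) {}
```

With these definitions, `Shared<u32, ArcFamily>` passes `assert_send`, while `Shared<u32, RcFamily>` would be rejected by it at compile time. The cost is the extra type parameter that every user has to thread through, which is the kind of boilerplate the annotation above would hide.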

Conclusion

This post covered “trait transformers” as a possible solution to the “send bounds” problem. Trait transformers are not exactly an alternative to the return type notation proposed earlier; they are more like a complement, in that they make the easy cases easy, but effectively provide a convenient desugaring to uses of return type notation.

The full set of solutions thus far are…

Return type notation (RTN)
- Example: T: Fetch
- Pros: flexible and expressive
- Cons: verbose
eholk’s inferred async send bounds
- Example: T: async(Send) Fetch
- Pros: concise
- Cons: specific to async notation, doesn’t support -> impl Future functions; requires RTN for completeness
trait transformers (this post)
- Example: T: async Send Fetch
- Pros: concise
- Cons: requires RTN for completeness

I originally planned to have part 3 of this series simply summarize those posts, in fact, but I consider Trait Transformers an evolution of those ideas, and close enough that I’m not sure separate posts are needed. ↩︎
It’s unclear if Send Foo should always convert RPITIT return values to be Send, but it is clear that we want some way to permit one to write -> impl Future in a trait and have that be Send iff async methods are Send. ↩︎
See what I did there? ↩︎

Return type notation (send bounds, part 2)

2023-02-13T00:00:00+00:00

In the previous post, I introduced the “send bound” problem, which refers to the need to add a Send bound to the future returned by an async function. I want to start talking about some of the ideas that have been floating around for how to solve this problem. I consider this a bit of an open problem, in that I think we know a lot of the ingredients, but there is a bit of a “delicate balance” to finding the right syntax and so forth. To start with, though, I want to introduce Return Type Notation, which is an idea that Tyler Mandry and I came up with for referring to the type returned by a trait method.

Recap of the problem

If we have a trait HealthCheck that has an async function check…

trait HealthCheck {
    async fn check(&mut self, server: Server);
}

…and then a function that is going to call that method check but in a parallel task…

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{ 
    …
}

…we don’t currently have a way to say that the future returned by calling H::check() is send. The where clause H: HealthCheck + Send says that the type H must be send, but it says nothing about the future that gets returned from calling check.

Core idea: A way to name “the type returned by a function”

The core idea of return-type notation is to let you write where-clauses that apply to <H as HealthCheck>::check(..), which means “any return type you can get by calling check as defined in the impl of HealthCheck for H”. This notation is meant to be reminiscent of the fully qualified notation for associated types, e.g. <T as Iterator>::Item. Just as we usually abbreviate associated types to T::Item, you would also typically abbreviate return type notation to H::check(..). The trait name is only needed when there is ambiguity.

Here is an example of how start_health_check would look using this notation:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    H::check(..): Send, // <— return type notation

Here the where clause H::check(..): Send means “the type(s) returned when you call H::check must be Send”. Since async functions return a future, this means that future must implement Send.

More compact notation

Although it has not yet been stabilized, RFC #2289 proposed a shorthand way to write bounds on associated types; something like T: Iterator<Item: Send> means “T implements Iterator and its associated type Item implements Send”. We can apply that same sugar to return-type notations:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck<check(..): Send> + Send + 'static,
    //             ^^^^^^^^^

This is more concise, though also clearly kind of repetitive. (When I read it, I think “how many dang times do I have to write Send?” But for now we’re just trying to explore the idea, not evaluate its downsides, so let’s hold on that thought.)

Futures capture their arguments

Note that the where clause we wrote was

H::check(..): Send

and not

H::check(..): Send + 'static

Moreover, if we were to add a 'static bound, the program would not compile. Why is that? The reason is that async functions in Rust desugar to returning a future that captures all of the function’s arguments:

trait HealthCheck {
    // async fn check(&mut self, server: Server);
    fn check<'s>(&'s mut self, server: Server) -> impl Future<Output = ()> + 's;
    //           ^^^^^^^^^^^^                                                ^^
    //         The future captures `self`, so it requires the lifetime bound `'s` 
}

Because the future being returned captures self, and self has type &'s mut Self, the Future returned must capture 's. Therefore, it is not 'static, and so the where-clause H::check(..): Send + 'static doesn’t hold for all possible calls to check, since you are not required to give an argument of type &'static mut Self.

RTN with specific parameter types

Most of the time, you would use RTN to bound all possible return values from the function. But sometimes you might want to be more specific, and talk just about the return value for some specific argument types. As a silly example, we could have a function like

fn call_check_with_static<H>(h: &'static mut H)
where
    H: HealthCheck + 'static,
    H::check(&'static mut H, Server): 'static,

This function has a generic parameter H that is 'static and it gets a &'static mut H as argument. The where clause H::check(&'static mut H, Server): 'static then says: if I call check with the argument &'static mut H, it will return a 'static future. In contrast to the previous section, where we were talking about any possible return value from check, this where-clause is true and valid.

Desugaring RTN to associated types

To understand what RTN does, it’s best to think of the desugaring from async functions to associated types. This desugaring is exactly how Rust works internally, but we are not proposing to expose it to users directly, for reasons I’ll elaborate in a bit.

We saw earlier how an async fn desugars to a function that returns impl Future. Well, in a trait, returning impl Future can itself be desugared to a trait with a (generic) associated type:

trait HealthCheck {
    // async fn check(&mut self, server: Server);
    type Check<'t>: Future<Output = ()> + 't;
    fn check<'s>(&'s mut self, server: Server) -> Self::Check<'s>;
}

When we write a where-clause like H::check(..): Send, that is then effectively a bound on this hidden associated type Check:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    for<'a> H::Check<'a>: Send, // <-- equivalent to `H::check(..): Send`
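
To make the equivalence concrete, the desugared form can be written by hand in stable Rust using a generic associated type. This is only a sketch under some assumptions: `AlwaysHealthy`, `assert_send`, and `check_once` are hypothetical names invented for the example, the `where Self: 't` clause is something stable Rust asks for on such associated types, and the compiler’s real desugaring is not exposed this way.

```rust
use std::future::{ready, Future, Ready};

struct Server;

// Hand-written stand-in for the hidden desugaring of `async fn check`.
trait HealthCheck {
    type Check<'t>: Future<Output = ()> + 't
    where
        Self: 't;

    fn check<'s>(&'s mut self, server: Server) -> Self::Check<'s>;
}

// Hypothetical implementor; its future type is `Ready<()>`, which is `Send`.
struct AlwaysHealthy {
    checks: u32,
}

impl HealthCheck for AlwaysHealthy {
    type Check<'t> = Ready<()>;

    fn check<'s>(&'s mut self, _server: Server) -> Ready<()> {
        self.checks += 1;
        ready(())
    }
}

// Compiles only when `T` is `Send`.
fn assert_send<T: Send>(_value: &T) {}

fn check_once<H>(mut health_check: H, server: Server) -> H
where
    H: HealthCheck + Send + 'static,
    for<'a> H::Check<'a>: Send, // stands in for `H::check(..): Send`
{
    {
        let future = health_check.check(server);
        assert_send(&future); // accepted because of the higher-ranked bound
    }
    health_check
}
```

Removing the `for<'a> H::Check<'a>: Send` line makes the `assert_send` call fail to compile, which is the same failure the original `start_health_check` runs into.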

Generic methods

It is also possible to have generic async functions in traits. Imagine that instead of HealthCheck taking a specific Server type, we wanted to accept any type that implements the trait ServerTrait:

trait HealthCheckGeneric {
    async fn check_gen<S: ServerTrait>(&mut self, server: S);
}

We can still think of this trait as desugaring to a trait with an associated type:

trait HealthCheckGeneric {
    // async fn check_gen<S: ServerTrait>(&mut self, server: S);
    type CheckGen<'t, S: ServerTrait>: Future<Output = ()> + 't;
    fn check_gen<'s, S: ServerTrait>(&'s mut self, server: S) -> Self::CheckGen<'s, S>;
}

But if we want to write a where-clause like H::check_gen(..): Send, this would require us to support higher-ranked trait bounds over types and not just lifetimes:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheckGeneric + Send + 'static,
    for<'a, S> H::CheckGen<'a, S>: Send, // <--
    //      ^ for all types S…

As it happens, this sort of where-clause is something the types team is working on in our new solver design. I’m going to skip over the details, as it’s kind of orthogonal to the topic of how to write Send bounds.

One final note: just as you can specify a particular value for the argument types, you should be able to use turbofish to specify the value for generic parameters. So something like H::check_gen::<MyServer>(..): Send would mean “whenever you call check_gen on H with S = MyServer, the return type is Send”.

Using RTN outside of where-clauses

So far, all the examples I’ve shown you for RTN involved a where-clause. That is the most important context, but it should be possible to write RTN types any place you write a type. For the most part, this is just fine, but using the .. notation outside of a where-clause introduces some additional complications. Think of H::check: the precise type that is returned will depend on the lifetime of the first argument. So we could have one type H::check(&'a mut H, Server) and the return value would reference the lifetime 'a, but we could also have H::check(&'b mut H, Server), and the return value would reference the lifetime 'b. The .. notation really names a range of types. For the time being, I think we would simply say that .. is not allowed outside of a where-clause, but there are ways that you could make it make sense (e.g., it might be valid only when the return type doesn’t depend on the types of the parameters).

“Frequently asked questions”

That sums up our tour of the “return-type-notation” idea. In short:

You can write bounds like <T as Trait>::method(..): Send in a where-clause to mean “the method method from the impl of Trait for T returns a value that is Send, no matter what parameters I give it”.

Like an associated type, this would more commonly be written T::method(..), with the trait automatically determined.

You could also specify precise types for the parameters and/or generic types, like T::method(U, V).

Let’s dive into some of the common questions about this idea.

Why not just expose the desugared associated type directly?

Earlier I explained how H::check(..) would work by desugaring it to an associated type. So, why not just have users talk about that associated type directly, instead of adding a new notation for “the type returned by check”? The main reason is that it would require us to expose details about this desugaring that we don’t necessarily want to expose.

The most obvious detail is “what is the name of the associated type” — I think the only clear choice is to have it have the same name as the method itself, which is slightly backwards incompatible (since one can have a trait with an associated type and a method that has the same name), but easy enough to do over an edition.

We would also have to expose what generic parameters this associated type has. This is not always so simple. For example, consider this trait:

trait Dump {
    async fn dump(&mut self, data: &impl Debug);
}

If we want to desugar this to an associated type, what generics should that type have?

trait Dump {
    type Dump<…>: Future<Output = ()> + …;
    //        ^^^ how many generics go here?
    fn dump(&mut self, data: &impl Debug) -> Self::Dump<…>;
}

This function has two sources of “implicit” generic parameters: elided lifetimes and the impl Trait argument. One desugaring would be:

trait Dump {
    type Dump<'a, 'b, D: Debug>: Future<Output = ()> + 'a + 'b;
    fn dump<'a, 'b, D: Debug>(&'a mut self, data: &'b D) -> Self::Dump<'a, 'b, D>;
}

But, in this case, we could also have a simpler desugaring that uses just one lifetime parameter (this isn’t always the case):

trait Dump {
    type Dump<'a, D: Debug>: Future<Output = ()> + 'a;
    fn dump<'a, D: Debug>(&'a mut self, data: &'a D) -> Self::Dump<'a, D>;
}

Regardless of how we expose the lifetimes, the impl Trait argument also raises interesting questions. In ordinary functions, the lang-team generally favors not including impl Trait arguments in the list of generics (i.e., they can’t be specified by turbofish, their values are inferred from the argument types), although we’ve not reached a final decision there. That seems inconsistent with exposing the type parameter D.

All in all, the appeal of the RTN is that it skips over these questions, leaving the compiler room to desugar in any of the various equivalent ways. It also means users don’t have to understand the desugaring, and can just think about the “return value of check”.

Should H::check(..): Send mean that the future is Send, or the result of the future?

Some folks have pointed out that H::check(..): Send seems like it refers to the value you get from awaiting check, and not the future itself. This is particularly true since our async function notation doesn’t write the future explicitly, unlike (say) C# or TypeScript (in those languages, an async fn must return a task or promise type). This seems true, and it will likely be a source of confusion, but it’s also consistent with how async functions work. For example:

trait Get {
    async fn get(&mut self) -> u32;
}

async fn bar<G: Get>(g: &mut G) {
    let f: impl Future<Output = u32> = g.get();
}

In this code, even though g.get() is declared to return u32, f is a future, not an integer. Writing G::get(..): Send thus talks about the future, not the integer.
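
The same point can be checked in compilable form. The sketch below assumes a toolchain with stable async functions in traits (Rust 1.75 or later); `Counter`, `expect_future`, `get_twice`, and the tiny busy-polling `block_on` are hypothetical helpers written for this illustration, not part of any proposal.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that does nothing; enough for futures that never really wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor for the example: polls in a loop until the future is ready.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future: Pin<Box<F>> = Box::pin(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

trait Get {
    async fn get(&mut self) -> u32;
}

struct Counter(u32);

impl Get for Counter {
    async fn get(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

// Accepts only futures; passing a plain `u32` here would not compile.
fn expect_future<F: Future<Output = u32>>(future: F) -> F {
    future
}

fn get_twice<G: Get>(g: &mut G) -> (u32, u32) {
    let f = expect_future(g.get()); // `f` is a future; the body has not run yet
    let first = block_on(f); // driving the future yields the `u32`
    let second = block_on(g.get());
    (first, second)
}
```

Calling `get_twice` on a `Counter(0)` returns `(1, 2)`: the counter only advances when each future is driven to completion, not when `get` is called.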

Isn’t RTN kind of verbose?

Interesting fact: when I talk to people about what is confusing in Rust, the trait system ranks as high or higher than the borrow checker. If we take another look at our motivating example, I think we can start to see why:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck<check(..): Send> + Send + 'static,

That where-clause basically just says “H is safe to use from other threads”, but it requires a pretty dense bit of notation! (And, of course, also demonstrates that the borrow checker and the trait system are not independent things, since 'static can be seen as a part of both, and is certainly a common source of confusion.) Wouldn’t it be nice if we had a more compact way to say that?

Now imagine you have a trait with a lot of methods:

trait AsyncOps {
    async fn op1(self);
    async fn op2(self);
    async fn op3(self);
}

Under the current proposal, to create an AsyncOps that can be (fully) used across threads, one would write:

fn do_async_ops<A>(async_ops: A, server: Server)
where
    A: AsyncOps<op1(..): Send, op2(..): Send, op3(..): Send> + Send + 'static,

You could use a trait alias (if we stabilized them) to help here, but still, this seems like a problem!

But maybe that verbosity is useful?

Indeed! RTN is a very flexible notation. To continue with the AsyncOps example, we could write a function that says “the future returned by op1 must be send, but not the others”, which would be useful for a function like so:

async fn do_op1_in_parallel(a: impl AsyncOps<op1(..): Send + 'static>) {
    //                                       ^^^^^^^^^^^^^^^^^^^^^^^
    //                                       Return value of `op1` must be Send, static
    tokio::spawn(a.op1()).await;
}

Is RTN limited to async fn in traits?

All my examples have focused on async fn in traits, but we can use RTN to name the return types of any function anywhere. For example, given a function like get:

fn get() -> impl FnOnce() -> u32 {
    move || 22
}

we could allow you to write get() to name the closure type that is returned:

fn foo() {
    let c: get() = get();
    let d: u32 = c();
}

This seems like it would be useful for things like iterator combinators, so that you can say things like “the iterator returned by calling map is Send”.

Why do we have to write ..?

OK, nobody asks this, but I do sometimes feel that writing .. just seems silly. We could say that you just write H::check(): Send to mean “for all parameters”. (In the case where the method has no parameters, then “for all parameters” is satisfied trivially.) That doesn’t change anything fundamental about the proposal but it lightens the “line noise” aspect a tad:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck<check(): Send> + Send + 'static,

It does introduce some ambiguity. Did the user mean “for all parameters” or did they forget that check() has parameters? I’m not sure how this confusion is harmful, though. The main way I can see it coming about is something like this:

check() initially has zero parameters, and the user writes check(): Send.

In a later version of the program, a parameter is added, and now the meaning of check changes to “for all parameters” (although, as we noted before, that was arguably the meaning before).

There is a shift happening here, but what harm can it do? If the check still passes, then check(T): Send is true for any T. If it doesn’t, the user gets an error and has to add an explicit type for this new parameter.

Can we really handle this in our trait solver?

As we saw when discussing generic methods, handling this feature in its full generality is a bit much for our trait solver today. But we could begin with a subset – for example, the notation can only be used in where-clauses and only for methods that are generic over lifetime parameters and not types. Tyler and I worked out a subset we believe would be readily implementable.

Conclusion

This post introduced return-type notation, an extension to the type grammar that allows you to refer to the return type of a trait method, and covered some of the pros/cons. Here is a rundown:

Pros:

Extremely flexible notation that lets us say precisely which methods must return Send types, and even lets us go into detail about which argument types they will be called with.

Avoids having to specify a desugaring to associated types precisely. For example, we don’t have to decide how to name that type, nor do we have to decide how many lifetime parameters it has, or whether impl Trait arguments become type parameters.

Can be used to refer to return values of things beyond async functions.

Cons:

New concept for users to learn — now they have associated types as well as associated return types.

Verbose even for common cases; doesn’t scale up to traits with many methods.

Async trait send bounds, part 1: intro

2023-02-01T00:00:00+00:00

Nightly Rust now has support for async functions in traits, so long as you limit yourself to static dispatch. That’s super exciting! And yet, for many users, this support won’t yet meet their needs. One of the problems we need to resolve is how users can conveniently specify when they need an async function to return a Send future. This post covers some of the background on send futures, why we don’t want to adopt the solution from the async_trait crate for the language, and the general direction we would like to go. Follow-up posts will dive into specific solutions.

Why do we care about Send bounds?

Let’s look at an example. Suppose I have an async trait that performs some kind of periodic health check on a given server:

trait HealthCheck {
    async fn check(&mut self, server: &Server) -> bool;
}

Now suppose we want to write a function that, given a HealthCheck, starts a parallel task that runs that check every second, logging failures. This might look like so:

fn start_health_check<H>(mut health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    tokio::spawn(async move {
        while health_check.check(&server).await {
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
        emit_failure_log(&server).await;
    });
}

So far so good! So what happens if we try to compile this? You can try it yourself if you use the async_fn_in_trait feature gate; you should see a compilation error like so:

error: future cannot be sent between threads safely
  --> src/lib.rs:15:18
   |
15 |       tokio::spawn(async move {
   |  __________________^
16 | |         while health_check.check(&server).await {
17 | |             tokio::time::sleep(Duration::from_secs(1)).await;
18 | |         }
19 | |         emit_failure_log(&server).await;
20 | |     });
   | |_____^ future created by async block is not `Send`
   |
   = help: within `[async block@src/lib.rs:15:18: 20:6]`, the trait `Send` is not implemented for `impl Future`

The error is saying that the future for our task cannot be sent between threads. But why not? After all, the health_check value is both Send and 'static, so we know that health_check is safe to send over to the new thread. But the problem lies elsewhere. The error has an attached note that points it out to us:

note: future is not `Send` as it awaits another future which is not `Send`
  --> src/lib.rs:16:15
   |
16 |         while health_check.check(&server).await {
   |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here

The problem is that the call to check is going to return a future, and that future is not known to be Send. To see this more clearly, let’s desugar the HealthCheck trait slightly:

trait HealthCheck {
    // async fn check(&mut self, server: &Server) -> bool;
    fn check(&mut self, server: &Server) -> impl Future<Output = bool>;
    //                                      ^ Problem is here! This returns a future, but not necessarily a `Send` future.
}

The problem is that check returns an impl Future, but the trait doesn’t say whether this future is Send or not. The compiler therefore sees that our task is going to be awaiting a future, but that future might not be sendable between threads.

What does the async-trait crate do?

Interestingly, if you rewrite the above example to use the async_trait crate, it compiles. What’s going on here? The answer is that the async_trait proc macro uses a different desugaring. Instead of creating a trait that yields -> impl Future, it creates a trait that returns a Pin<Box<dyn Future + Send>>. This means that the future can be sent between threads; it also means that the trait is dyn-safe.
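
To give a feel for that boxing style of desugaring, here is a hand-written approximation in plain Rust. It is a sketch of the general shape rather than the macro’s exact output; `Ping`, `run_check`, `assert_send`, the `healthy` field, and the small `block_on` helper are hypothetical names introduced for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct Server {
    healthy: bool,
}

// Approximate shape of a boxed desugaring: the trait itself promises `Send`.
trait HealthCheck {
    fn check<'a>(
        &'a mut self,
        server: &'a Server,
    ) -> Pin<Box<dyn Future<Output = bool> + Send + 'a>>;
}

struct Ping;

impl HealthCheck for Ping {
    fn check<'a>(
        &'a mut self,
        server: &'a Server,
    ) -> Pin<Box<dyn Future<Output = bool> + Send + 'a>> {
        Box::pin(async move { server.healthy })
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor for the example: polls in a loop until the future is ready.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// Compiles only when `T` is `Send`.
fn assert_send<T: Send>(_value: &T) {}

// Takes a trait object, which works because the boxed signature is dyn-safe.
fn run_check(checker: &mut dyn HealthCheck, server: &Server) -> bool {
    let future = checker.check(server);
    assert_send(&future); // always holds: `Send` is baked into the return type
    block_on(future)
}
```

Every implementor pays for the allocation and must produce a `Send` future, whether or not the caller needs either property.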

This is a good answer for the async-trait crate, but it’s not a good answer for a core language construct as it loses key flexibility. We want to support async in single-threaded executors, where the Send bound is irrelevant, and we also want to support async in no-std applications, where Box isn’t available. Moreover, we want to have key interop traits (e.g., Read) that can be used for all three of those applications at the same time. An approach like the one used in async-trait cannot support a trait that works for all three of those applications at once.

How would we like to solve this?

Instead of having the trait specify whether the returned future is Send (or boxed, for that matter), our preferred solution is to have the start_health_check function declare that it requires check to return a sendable future. Remember that start_health_check already included a where clause specifying that the type H was sendable across threads:

fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    // -----------   ^^^^^^^^^^^^^^ “sendable to another disconnected thread”
    //     |
    //     Implements the `HealthCheck` trait

Right now, this where clause says two independent things:

H implements HealthCheck;

values of type H can be sent to an independent task, which is really a combination of two things

type H can be sent between threads (H: Send)

type H contains no references to the current stack (H: 'static)

What we want is to add syntax to specify an additional condition:

H implements HealthCheck and its check method returns a Send future

In other words, we don’t want just any type that implements HealthCheck. We specifically want a type that implements HealthCheck and returns a Send future.

Note the contrast to the desugaring approach used in the async_trait crate: in that approach, we changed what it means to implement HealthCheck to always require a sendable future. In this approach, we allow the trait to be used in both ways, but allow the function to say when it needs sendability or not.

The approach of “let the function specify what it needs” is very in-line with Rust. In fact, the existing where-clause demonstrates the same pattern. We don’t say that implementing HealthCheck implies that H is Send, rather we say that the trait can be implemented by any type, but allow the function to specify that H must be both HealthCheck and Send.

Next post: Let’s talk syntax

I’m going to leave you on a cliffhanger. This blog post set up the problem we are trying to solve: for traits with async functions, we need some kind of syntax for declaring that you want an implementation that returns Send futures, and not just any implementation. In the next set of posts, I’ll walk through our proposed solution to this, and some of the other approaches we’ve considered and rejected.

Appendix: Why does the returned future have to be send anyway?

Some of you may wonder why it matters that the future returned is not Send. After all, the only thing we are actually sending between threads is health_check — the future is being created on the new thread itself, when we call check. It is a bit surprising, but this is actually highlighting an area where async tasks are different from threads (and where we might consider future language extensions).

Async is intended to support a number of different task models:

Single-threaded: all tasks run in the same OS thread. This is a great choice for embedded systems, or systems where you have lightweight processes (e.g., Fuchsia¹).

Work-dealing, sometimes called thread-per-core: tasks run in multiple threads, but once a task starts in a thread, it never moves again.

Work-stealing: tasks start in one thread, but can migrate between OS threads while they execute.

Tokio’s spawn function supports the final mode (work-stealing). The key point here is that the future can move between threads at any await point. This means that it’s possible for the future to be moved between threads while awaiting the future returned by check. Therefore, any data in this future must be Send.
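
The migration can be imitated with nothing but the standard library. The sketch below is not how Tokio schedules tasks; it polls a future by hand once on the current thread and then finishes it on another, with `YieldOnce`, `task`, and `finishes_on_other_thread` as hypothetical names standing in for a real await point and a real runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Returns `Pending` exactly once, standing in for a real await point.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

async fn task() -> thread::ThreadId {
    let data = String::from("state held across the await");
    YieldOnce { yielded: false }.await;
    // `data` was created before the await and is still used after it,
    // so it is stored inside the future while the task is suspended.
    let _len = data.len();
    thread::current().id()
}

fn finishes_on_other_thread() -> bool {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // The `Send` bound here is what permits the move into `thread::spawn` below.
    let mut future: Pin<Box<dyn Future<Output = thread::ThreadId> + Send>> = Box::pin(task());

    // First poll happens on the current thread and stops at the await point.
    let started_on = thread::current().id();
    assert!(future.as_mut().poll(&mut cx).is_pending());

    // The suspended task, including `data`, now moves to another thread.
    let finished_on = thread::spawn(move || {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(id) => id,
            Poll::Pending => unreachable!("YieldOnce is pending only once"),
        }
    })
    .join()
    .unwrap();

    started_on != finished_on
}
```

If `task` held a non-Send value across the await, the coercion to `dyn Future + Send` would be rejected, which mirrors the error that `tokio::spawn` reports.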

This might be surprising. After all, the most common example of non-send data is something like a (non-atomic) Rc. It would be fine to create an Rc within one async task and then move that task to another thread, so long as the task is paused at the point of move. But there are other non-Send types that wouldn’t work so well. For example, you might make a type that relies on thread-local storage; such a type would not be Send because it’s only safe to use it on the thread in which it was created. If that type were moved between threads, the system could break.
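
As a small aside, the Send-ness of a concrete type can be observed at runtime with a well-known trick in which an inherent associated constant takes priority over a trait-provided one. The `Probe` and `Fallback` names are hypothetical, and the trick only works for concrete types, not inside generic code.

```rust
use std::marker::PhantomData;
use std::rc::Rc;
use std::sync::Arc;

// Never constructed; it only carries the type being probed.
struct Probe<T>(PhantomData<T>);

// Fallback answer, available for every `Probe<T>`.
trait Fallback {
    const IS_SEND: bool = false;
}

impl<T> Fallback for Probe<T> {}

// Inherent constant, only present when `T: Send`; it wins over the trait's constant.
impl<T: Send> Probe<T> {
    const IS_SEND: bool = true;
}

fn rc_is_send() -> bool {
    <Probe<Rc<u32>>>::IS_SEND
}

fn arc_is_send() -> bool {
    <Probe<Arc<u32>>>::IS_SEND
}
```

Here `rc_is_send()` evaluates to false and `arc_is_send()` to true, matching the description of Rc as the most common example of non-Send data.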

In the future, it might be useful to separate out types like Rc from other non-Send types. The distinguishing characteristic is that Rc can be moved between threads so long as all possible aliases are also moved at the same time. Other types are really tied to a specific thread. There’s no example in the stdlib that comes to mind, but it seems like a valid pattern for Rust today that I would like to continue supporting. I’m not sure yet the right way to think about that!

I have finally learned how to spell this word without having to look it up! 💪 ↩︎

Rust in 2023: Growing up

2023-01-20T00:00:00+00:00

When I started working on Rust in 2011, my daughter was about three months old. She’s now in sixth grade, and she’s started growing rapidly. Sometimes we wake up to find that her clothes don’t quite fit anymore: the sleeves might be a little too short, or the legs come up to her ankles. Rust is experiencing something similar. We’ve been growing tremendously fast over the last few years, and any time you experience growth like that, there are bound to be a few rough patches. Things that don’t work as well as they used to. This holds both in a technical sense (there are parts of the language that don’t seem to scale up to Rust’s current size) and in a social one (some aspects of how the project runs need to change if we’re going to keep growing the way I think we should). As we head into 2023, with two years to go until the Rust 2024 edition, this is the theme I see for Rust: maturation and scaling.

TL;DR

In summary, these are (some of) the things I think are most important for Rust in 2023:

Implementing “the year of everywhere” so that you can make any function async, write impl Trait just about anywhere, and fully utilize generic associated types; planning for the Rust 2024 edition.

Beginning work on a Rust specification and integrating it into our processes.

Defining rules for unsafe code and smooth tooling to check whether you’re following them.

Supporting efforts to teach Rust in universities and elsewhere.

Improving our product planning and user feedback processes.

Refining our governance structure with specialized teams for dedicated areas, more scalable structure for broad oversight, and more intentional onboarding.

“The year of everywhere” and the 2024 edition

What do async-await, impl Trait, and generic parameters have in common? They’re all essential parts of modern Rust, that’s one thing. They’re also all, in my opinion, in a “minimum viable product” state. Each of them has some key limitations that make them less useful and more confusing than they have to be. As I wrote in “Rust 2024: The Year of Everywhere”, there are currently a lot of folks working hard to lift those limitations through a number of extensions:

Generic associated types (stabilized in October, now undergoing various improvements!)

Type alias impl trait (proposed for stabilization)

Async functions in traits and “return position impl Trait in traits” (static dispatch available on nightly, but more work is needed)

Polonius (under active discussion)

None of these features are “new”. They just take something that exists in Rust and let you use it more broadly. Nonetheless, I think they’re going to have a big impact, on experienced and new users alike. Experienced users can express more patterns more easily and avoid awkward workarounds. New users never have to experience the confusion that comes from typing something that feels like it should work, but doesn’t.

One other important point: Rust 2024 is just around the corner! Our goal is to get any edition changes landed on master this year, so that we can spend the next year doing finishing touches. This means we need to put in some effort to thinking ahead and planning what we can achieve.

Towards a Rust specification

As Rust grows, there is increasing need for a specification. Mara had a recent blog post outlining some of the considerations — and especially the distinction between a specification and standardization. I don’t see the need for Rust to get involved in any standards bodies — our existing RFC and open-source process works well. But I do think that for us to continue growing out the set of people working on Rust, we need a central definition of what Rust should do, and that we need to integrate that definition into our processes more thoroughly.

In addition to long-standing docs like the Rust Reference, the last year has seen a number of notable efforts towards a Rust specification. The Ferrocene language specification is the most comprehensive, covering the grammar, name resolution, and overall functioning of the compiler. Separately, I’ve been working on a project called a-mir-formality, which aims to be a “formal model” of Rust’s type system, including the borrow checker. And Ralf Jung has MiniRust, which is targeting the rules for unsafe code.

So what would an official Rust specification look like? Mara opened RFC 3355, which lays out some basic parameters. I think there are still a lot of questions to work out. Most obviously, how can we combine the existing efforts and documents? Each of them has a different focus and — as a result — a somewhat different structure. I’m hopeful that we can create a complementary whole.

Another important question is how to integrate the specification into our project processes. We’ve already got a rule that new language features can’t be stabilized until the reference is updated, but we’ve not always followed it, and the lang docs team is always in need of support. There are hopeful signs here: both the Foundation and Ferrocene are interested in supporting this effort.

Unsafe code

In my experience, most production users of Rust don’t touch unsafe code, which is as it should be. But almost every user of Rust relies on dependencies that do, and those dependencies are often the most critical systems.

At first, the idea of unsafe code seems simple. By writing unsafe, you gain access to new capabilities, but you take responsibility for using them correctly. But the more you look at unsafe code, the more questions come up. What does it mean to use those capabilities correctly? These questions are not just academic, they have a real impact on optimizations performed by the Rust compiler, LLVM, and even the hardware.
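
To make the "take responsibility" part concrete, here is a minimal sketch of the usual pattern: a safe function wraps an unsafe operation and upholds its precondition itself. The function name `first_byte` is just an illustration, not something from a real crate.

```rust
/// Returns the first byte of `bytes`, or `None` if the slice is empty.
/// (Hypothetical helper, used only to illustrate a safe wrapper.)
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        return None;
    }
    // SAFETY: we just checked that `bytes` is non-empty, so index 0 is in
    // bounds, which is the precondition `get_unchecked` asks us to uphold.
    Some(unsafe { *bytes.get_unchecked(0) })
}
```

Callers never see the `unsafe`; the bounds check is what makes the wrapper sound. The hard questions start when the precondition is less obvious than "the index is in bounds".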

Eventually, we want to get to a place where those who author unsafe code have clear rules to follow, as well as simple tooling to test if their code violates those rules (think cargo test --unsafe). Authors who want more assurance than dynamic testing can provide should have access to static verifiers that can prove their crate is safe, and we should start by proving the standard library is safe.

We’ve been trying for some years to build that world but it’s been ridiculously hard. Lately, though, there have been some breakthroughs. Gankra’s experiments with strict_provenance APIs have given some hope that we can define a relatively simple provenance model that will support both arbitrary unsafe code trickery and aggressive optimization, and Ralf Jung’s aforementioned MiniRust shows how a Rust operational semantics could look. More and more crates test with miri to check their unsafe code, and for those who wish to go further, the kani verifier can check unsafe code for UB (more formal methods tooling here).

I think we need a renewed focus on unsafe code in 2023. The first step is already underway: we are creating the opsem team. Led by Ralf Jung and Jakob Degen, the opsem team has the job of defining “the rules governing unsafe code in Rust”. It’s been clear for some time that this area requires dedicated focus, and I am hopeful that the opsem team will help to provide that.

I would like to see progress on dynamic verification. In particular, I think we need a tool that can handle arbitrary binaries. miri is great, but it can’t be used to test programs that call into C code. I’d like to see something more like valgrind or ubsan, where you can test your Rust project for UB even if it’s calling into other languages through FFI.

Dynamic verification is great, but it is limited by the scope of your tests. To get true reliability, we need a way for unsafe code authors to do static verification. Building static verification tools today is possible but extremely painful. The compiler’s APIs are unstable and a moving target. The stable MIR project proposes to change that by providing a stable set of APIs that tool authors can build on.

Finally, the best unsafe code is the unsafe code you don’t have to write. Unsafe code provides infinite power, but people often have simpler needs that could be made safe with enough effort. Projects like cxx demonstrate the power of this approach. For Rust the language, safe transmute is the most promising such effort, and I’d like to see more of that.

Teaching Rust in universities

More and more universities are offering classes that make use of Rust, and recently many of these educators have come together in the Rust Edu initiative to form shared teaching materials. I think this is great, and a trend we should encourage. It’s helpful for the Rust community, of course, since it means more Rust programmers. I think it’s also helpful for the students: much like learning a functional programming language, learning Rust requires incorporating different patterns and structure than other languages. I find my programs tend to be broken into smaller pieces, and the borrow checker forces me to be more thoughtful about which bits of context each function will need. Even if you wind up building your code in other languages, those new patterns will influence the way you work.

Stronger connections to teachers can also be a great source of data for improving Rust. If we understand better how people learn Rust and what they find difficult, we can use that to guide our priorities and look for ways to make it better. This might mean changing the language, but it might also mean changing the tooling or error messages. I’d like to see us set up some mechanism to feed insights from Rust educators, both those in universities and trainers at companies like Ferrous Systems or Integer32, into the Rust teams.

One particularly exciting effort here is the research being done at Brown University¹ by Will Crichton and Shriram Krishnamurthi. Will and Shriram have published an interactive version of the Rust book that includes quizzes. As a reader, these quizzes help you check that you understood the section. But they also provide feedback to the book authors on which sections are effective. And they allow for “A/B testing”, where you change the content of the book and see whether the quiz scores improve. Will and Shriram are also looking at other ways to deepen our understanding of how people learn Rust.

More insight and data into the user experience

As Rust has grown, we no longer have the obvious gaps in our user experience that there used to be (e.g., “no IDE support”). At the same time, it’s clear that the experience of Rust developers could be a lot smoother. There are a lot of great ideas for changes to make, but it’s hard to know which ones would be most effective. I would like to see a more coordinated effort to gather data on the user experience and transform it into actionable insights. Currently, the largest source of data that we have is the annual Rust survey. This is a great resource, but it only gives a very broad picture of what’s going on.

A few years back, the async working group collected “status quo” stories as part of its vision doc effort. These stories were immensely helpful in understanding the “async Rust user experience”, and they are still helping to shape the priorities of the async working group today. At the same time, that was a one-time effort, and it was focused on async specifically. I think that kind of effort could be useful in a number of areas.

I’ve already mentioned that teachers can provide one source of data. Another is simply going out and having conversations with Rust users. But I think we also need fine-grained data about the user experience. In the compiler team’s mid-year report, they noted (emphasis mine):

One more thing I want to point out: five of the ambitions checked the box in the survey that said “some of our work has reached Rust programmers, but we do not know if it has improved Rust for them.”

Right now, it’s really hard to know even basic things, like how many users are encountering compiler bugs in the wild. We have to judge that by how many comments people leave on a GitHub issue. Meanwhile, Esteban personally scours Twitter to find out which error messages are confusing to people.² We should look into better ways to gather data here. I’m a fan of (opt-in, privacy preserving) telemetry, but I think there’s a discussion to be had here about the best approach. All I know is that there has to be a better way.

Maturing our governance

In 2015, shortly after 1.0, RFC 1068 introduced the original Rust teams: libs, lang, compiler, infra, and moderation. Each team is an independent, decision-making entity, owning one particular aspect of Rust, and operating by consensus. The “Rust core team” was given the role of knitting them together and providing a unifying vision. This structure has been a great success, but as we’ve grown, it has started to hit some limits.

The first limiting point has been bringing the teams together. The original vision was that team leads, along with others, would be part of a core team that would provide a unifying technical vision and tend to the health of the project. It’s become clear over time though that these are really different jobs. Over this year, the various Rust teams, project directors, and existing core team have come together to define a new model for project-wide governance. This effort is being driven by a dedicated working group and I am looking forward to seeing that effort come to fruition this year.

The second limiting point has been the need for more specialized teams. One example near and dear to my heart is the new types team, which is focused on the type and trait system. This team has the job of diving into the nitty gritty on proposals like Generic Associated Types or impl Trait, and then surfacing up the key details for broader-based teams like lang or compiler where necessary. The aforementioned opsem team is another example of this sort of team. I suspect we’ll be seeing more teams like this.

There continues to be a need for us to grow teams that do more than coding. The compiler team prioritization effort, under the leadership of apiraino, is a great example of a vital role that allows Rust to function but doesn’t involve landing PRs. I think there are a number of other “multiplier”-type efforts that we could use. One example would be “reporters”, i.e., people to help publish blog posts about the many things going on and spread information around the project. I am hopeful that as we get a new structure for top-level governance we can see some renewed focus and experimentation here.

Conclusion

Seven years since Rust 1.0 and we are still going strong. As Rust usage spreads, our focus is changing. Where once we had gaping holes to close, it’s now more a question of iterating to build on our success. But the more things change, the more they stay the same. Rust is still working to empower people to build reliable, performant programs. We still believe that building a supportive, productive tool for systems programming — one that brings more people into the “systems programming” tent — is also the best way to help the existing C and C++ programmers “hack without fear” and build the kind of systems they always wanted to build. So, what are you waiting for? Let’s get building!

In disclosure, AWS is a sponsor of this work. ↩︎

To be honest, Esteban will probably always do that, whatever we do. ↩︎

Rust 2024...the year of everywhere?

2022-09-22T00:00:00+00:00

I’ve been thinking about what “Rust 2024” will look like lately. I don’t really mean the edition itself — but more like, what will Rust feel like after we’ve finished up the next few years of work? I think the answer is that Rust 2024 is going to be the year of “everywhere”. Let me explain what I mean. Up until now, Rust has had a lot of nice features, but they only work sometimes. By the time 2024 rolls around, they’re going to work everywhere that you want to use them, and I think that’s going to make a big difference in how Rust feels.

Async everywhere

Let’s start with async. Right now, you can write async functions, but not in traits. You can’t write async closures. You can’t use async drop. This creates a real hurdle. You have to learn the workarounds (e.g., the async-trait crate), and in some cases, there are no proper workarounds (e.g., for async-drop).

Thanks to a recent PR by Michael Goulet, static async functions in traits almost work on nightly today! I’m confident we can work out the remaining kinks soon and start advancing the static subset (i.e., no support for dyn trait) towards stabilization.

The plans for dyn, meanwhile, are advancing rapidly. At this point I think we have two good options on the table and I’m hopeful we can get that nailed down and start planning what’s needed to make the implementation work.

Once async functions in traits work, the next steps for core Rust will be figuring out how to support async closures and async drop. Both of them add some additional challenges — particularly async drop, which has some complex interactions with other parts of the language, as Sabrina Jewson elaborated in a great, if dense, blog post — but we’ve started to develop a crack team of people in the async working group and I’m confident we can overcome them.

There is also library work, most notably settling on some interop traits, and defining ways to write code that is portable across allocators. I would like to see more exploration of structured concurrency¹, as well, or other alternatives to select! like the stream merging pattern Yosh has been advocating for.

Finally, for extra credit, I would love to see us integrate async/await keywords into other bits of the function body, permitting you to write common patterns more easily. Yoshua Wuyts has had a really interesting series of blog posts exploring these sorts of ideas. I think that being able to do for await x in y to iterate, or (a, b).await as a form of join, or async let x = … to create a future in a really lightweight way could be great.

Impl trait everywhere

The impl Trait notation is one of Rust’s most powerful conveniences, allowing you to omit specific types and instead talk about the interface you need. Like async, however, impl Trait can only be used in inherent functions and methods, and can’t be used for return types in traits, nor can it be used in type aliases, let bindings, or any number of other places it might be useful.

Thanks to Oli Scherer’s hard work over the last year, we are nearing stabilization for impl Trait in type aliases. Oli’s work has also laid the groundwork to support impl trait in let bindings, meaning that you will be able to do something like

let iter: impl Iterator<Item = i32> = (0..10);
//        ^^^^^^^^^^^^^ Declare type of `iter` to be “some iterator”.

Finally, the same PR that added support for async fns in traits also added initial support for return-position impl trait in traits. Put it all together, and we are getting very close to letting you use impl trait everywhere you might want to.

There is still at least one place where impl Trait is not accepted that I think it should be, which is nested in other positions. I’d like you to be able to write impl Fn(impl Debug), for example, to refer to “some closure that takes an argument of type impl Debug” (i.e., can be invoked multiple times with different debug types).

Generics everywhere

Generic types are a big part of how Rust libraries are built, but Rust doesn’t allow people to write generic parameters in all the places they would be useful, and limitations in the compiler prevent us from making full use of the annotations we do have.

Not being able to use generic types everywhere might seem abstract, particularly if you’re not super familiar with Rust. And indeed, for a lot of code, it’s not a big deal. But if you’re trying to write libraries, or to write one common function that will be used all over your code base, then it can quickly become a huge blocker. Moreover, given that Rust supports generic types in many places, the fact that we don’t support them in some places can be really confusing — people don’t realize that the reason their idea doesn’t work is not because the idea is wrong, it’s because the language (or, often, the compiler) is limited.

The biggest example of generics everywhere is generic associated types. Thanks to hard work by Jack Huey, Matthew Jasper, and a number of others, this feature is very close to hitting stable Rust — in fact, it is in the current beta, and should be available in 1.65. One caveat, though: the upcoming support for GATs has a number of known limitations and shortcomings, and it gives some pretty confusing errors. It’s still really useful, and a lot of people are already using it on nightly, but it’s going to require more attention before it lives up to its full potential.

You may not wind up using GATs in your code, but they will definitely be used in some of the libraries you rely on. GATs directly enable common patterns like Iterable that have heretofore been inexpressible, but we’ve also seen a lot of examples where they’re used internally to help libraries present a more unified, simpler interface to their users.
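
To give a flavor of the Iterable pattern, here is a minimal sketch of how such a trait could be written with GATs. The trait and the `count_twice` helper are illustrative definitions of my own, not something from the standard library, and the code assumes a compiler with GAT support (1.65 or later).

```rust
/// A collection that can hand out a borrowing iterator any number of times.
/// (Illustrative trait; not part of std.)
trait Iterable {
    /// Items may borrow from `self`, hence the lifetime parameter.
    type Item<'a>
    where
        Self: 'a;

    /// The iterator type is generic over the same borrow.
    type Iter<'a>: Iterator<Item = Self::Item<'a>>
    where
        Self: 'a;

    fn iter<'a>(&'a self) -> Self::Iter<'a>;
}

impl<T> Iterable for Vec<T> {
    type Item<'a> = &'a T where Self: 'a;
    type Iter<'a> = std::slice::Iter<'a, T> where Self: 'a;

    fn iter<'a>(&'a self) -> Self::Iter<'a> {
        // Go through the slice so this calls the inherent slice iterator
        // rather than recursing into `Iterable::iter`.
        self.as_slice().iter()
    }
}

/// Generic code can now iterate over the same collection more than once.
fn count_twice<C: Iterable>(collection: &C) -> usize {
    collection.iter().count() + collection.iter().count()
}
```

Without the lifetime parameter on `Item` and `Iter`, there is no way for the trait to say "an iterator that borrows from self for as long as this particular call needs".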

Beyond GATs, there are a number of other places where we could support generics, but we don’t. In the previous section, for example, I talked about being able to have a function with a parameter like impl Fn(impl Debug) — this is actually an example of a “generic closure”. That is, a closure that itself has generic arguments. Rust doesn’t support this yet, but there’s no reason we can’t.
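
Until generic closures exist, the usual stand-in is a trait with a generic method. The sketch below shows that workaround; `DebugVisitor`, `Recorder`, and `visit_mixed` are hypothetical names chosen for illustration.

```rust
use std::fmt::Debug;

/// Stand-in for a "generic closure": a trait whose method is itself generic.
/// (Hypothetical trait, for illustration only.)
trait DebugVisitor {
    fn visit<T: Debug>(&mut self, value: T);
}

/// One possible "closure body": record the debug rendering of each argument.
struct Recorder(Vec<String>);

impl DebugVisitor for Recorder {
    fn visit<T: Debug>(&mut self, value: T) {
        self.0.push(format!("{:?}", value));
    }
}

/// Invokes the same visitor with two different argument types, which is
/// what a parameter of type `impl Fn(impl Debug)` would allow.
fn visit_mixed(visitor: &mut impl DebugVisitor) {
    visitor.visit(1_u8);
    visitor.visit("two");
}
```

This works, but you have to define a trait and a struct for what would otherwise be a one-line closure.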

Oftentimes, though, the work to realize “generics everywhere” is not so much a matter of extending the language as it is a matter of improving the compiler’s implementation. Rust’s current traits implementation works pretty well, but as you start to push the bounds of it, you find that there are lots of places where it could be smarter. A lot of the ergonomic problems in GATs arise exactly out of these areas.

One of the developments I’m most excited about in Rust is not any particular feature, it’s the formation of the new types team. The goal of this team is to revamp the compiler’s trait system implementation into something efficient and extensible, as well as building up a core set of contributors.

Making Rust feel simpler by making it more uniform

The topics in this post, of course, only scratch the surface of what’s going on in Rust right now. For example, I’m really excited about “everyday niceties” like let/else-syntax and if-let-pattern guards, or the scoped threads API that we got in 1.63. There are exciting conversations about ways to improve error messages. Cargo, the compiler, and rust-analyzer are all generally getting faster and more capable. And so on, and so on.

The pattern of having a feature that starts working somewhere and then extending it so that it works everywhere seems, though, to be a key part of how Rust development works. It’s inspiring also because it becomes a win-win for users. Newer users find Rust easier to use and more consistent; they don’t have to learn the “edges” of where one thing works and where it doesn’t. Experienced users gain new expressiveness and unlock patterns that were either awkward or impossible before.

One challenge with this iterative development style is that sometimes it takes a long time. Async functions, impl Trait, and generic reasoning are three areas where progress has been stalled for years, for a variety of reasons. That’s all started to shift this year, though. A big part of that is the formation of new Rust teams at many companies, allowing a lot more people to have a lot more time. It’s also just the accumulation of the hard work of many people over a long time, slowly chipping away at hard problems (to get a sense for what I mean, read Jack’s blog post on NLL removal, and take a look at the full list of contributors he cited there; just assembling the list was impressive work, not to mention the actual work itself).

It may have been a long time coming, but I’m really excited about where Rust is going right now, as well as the new crop of contributors that have started to push the compiler faster and faster than it’s ever moved before. If things continue like this, Rust in 2024 is going to be pretty damn great.

Oh, my beloved moro! I will return to thee! ↩︎

Dyn async traits, part 9: call-site selection

2022-09-21T00:00:00+00:00

After my last post on dyn async traits, some folks pointed out that I was overlooking a seemingly obvious possibility. Why not have the choice of how to manage the future be made at the call site? It’s true, I had largely dismissed that alternative, but it’s worth consideration. This post is going to explore what it would take to get call-site-based dispatch working, and what the ergonomics might look like. I think it’s actually fairly appealing, though it has some limitations.

If we added support for unsized return values…

The idea is to build on the mechanisms proposed in RFC 2884. With that RFC, you would be able to have functions that returned a dyn Future:

fn return_dyn() -> dyn Future<Output = ()> {
    async move { }
}

Normally, when you call a function, we can allocate space on the stack to store the return value. But when you call return_dyn, we don’t know how much space we need at compile time, so we can’t do that¹. This means you can’t just write let x = return_dyn(). Instead, you have to choose how to allocate that memory. Using the APIs proposed in RFC 2884, the most common option would be to store it on the heap. A new method, Box::new_with, would be added to Box; it acts like new, but it takes a closure, and the closure can return values of any type, including dyn values:

let result = Box::new_with(|| return_dyn());
// result has type `Box<dyn Future<Output = ()>>`

Invoking new_with would be ergonomically unpleasant, so we could also add a .box operator. Rust has had an unstable box operator since forever; this might finally provide enough motivation to make it worth adding:

let result = return_dyn().box;
// result has type `Box<dyn Future<Output = ()>>`

Of course, you wouldn’t have to use Box. Assuming we have sufficient APIs available, people can write their own methods, such as something to do arena allocation…

let arena = Arena::new();
let result = arena.new_with(|| return_dyn());

…or perhaps a hypothetical maybe_box, which would use a buffer if that’s big enough, and use box otherwise:

let mut big_buf = [0; 1024];
let result = maybe_box(&mut big_buf, || return_dyn()).await;

If we add postfix macros, then we might even support something like return_dyn.maybe_box!(&mut big_buf), though I’m not sure if the current proposal would support that or not.
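
For contrast, here is roughly what you have to write on stable Rust today, where it is the callee rather than the call site that decides how the future is allocated. This is only a sketch: `return_boxed` is a made-up name, and `block_on` is a toy busy-polling executor I include so the example runs without any external crate.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The callee commits to heap allocation in its signature; callers that
/// would prefer an arena or a stack buffer have no say.
fn return_boxed() -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(async move { 42_u32 })
}

/// Waker that does nothing; good enough for a busy-polling toy executor.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (an assumption of this sketch, not a real runtime):
/// polls the future in a loop until it completes.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

With unsized return values, the `Box::pin` would move out of `return_boxed` and into whichever caller actually wants a box.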

What are unsized return values?

This idea of returning dyn Future is sometimes called “unsized return values”, as functions can now return values of “unsized” type (i.e., types whose size is not statically known). They’ve been proposed in RFC 2884 by Olivier Faure, and I believe there were some earlier RFCs as well. The .box operator, meanwhile, has been a part of “nightly Rust” since approximately forever, though it’s currently written in prefix form, i.e., box foo².

The primary motivation for both unsized-return-values and .box has historically been efficiency: they permit in-place initialization in cases where it is not possible today. For example, if I write Box::new([0; 1024]) today, I am technically allocating a [0; 1024] buffer on the stack and then copying it into the box:

// First evaluate the argument, creating the temporary:
let temp: [u8; 1024] = ...;

// Then invoke `Box::new`, which allocates a Box...
let box: *const T = allocate_memory();

// ...and copies the memory in.
std::ptr::write(box, temp);

The optimizer may be able to fix that, but it’s not trivial. If you look at the order of operations, it requires making the allocation happen before the arguments are allocated. LLVM considers calls to known allocators to be “side-effect free”, but promoting them is still risky, since it means that more memory is allocated earlier, which can lead to memory exhaustion. The point isn’t so much to look at exactly what optimizations LLVM will do in practice, so much as to say that it is not trivial to optimize away the temporary: it requires some thoughtful heuristics.
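
As a point of comparison, here is a small sketch of the two routes available on stable Rust today. The function names are mine. The second version sidesteps the temporary by going through `vec!`, which allocates on the heap up front, at the cost of producing a `Box<[u8]>` rather than a boxed array.

```rust
/// Conceptually builds the array as a temporary and then moves it into the
/// box; whether the copy survives is up to the optimizer.
fn boxed_via_temporary() -> Box<[u8; 1024]> {
    Box::new([0; 1024])
}

/// Allocates the buffer on the heap first, so there is no 1024-byte
/// temporary to copy. The result is a boxed slice, not a boxed array.
fn boxed_on_heap() -> Box<[u8]> {
    vec![0; 1024].into_boxed_slice()
}
```

The workaround only helps for the handful of types that have a dedicated heap-first constructor, which is part of why a general in-place initialization mechanism is appealing.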

How would unsized return values work?

This merits a blog post of its own, and I won’t dive into details. For our purposes here, the key point is that somehow when the callee goes to return its final value, it can use whatever strategy the caller prefers to get a return point, and write the return value directly in there. RFC 2884 proposes one solution based on generators, but I would want to spend time thinking through all the alternatives before we settled on something.

Using dynamic return types for async fn in traits

So, the question is, can we use dyn return types to help with async functions in traits? Continuing with my example from my previous post, if you have an AsyncIterator trait…

trait AsyncIterator {
    type Item;

    async fn next(&mut self) -> Option<Self::Item>;
}

…the idea is that calling next on a dyn AsyncIterator type would yield dyn Future<Output = Option<Self::Item>>. Therefore, one could write code like this:

async fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().box.await;
    //       ^^^^
}

The expression di.next() by itself yields a dyn Future. This type is not sized and so it won’t compile on its own. Adding .box produces a Box, which you can then await.³

Compared to the Boxing adapter I discussed before, this is relatively straightforward to explain. I’m not entirely sure which is more convenient to use in practice: it depends how many dyn values you create and how many methods you call on them. Certainly you can work around the problem of having to write .box at each call-site via wrapper types or helper methods that do it for you.

Complication: dyn AsyncIterator does not implement AsyncIterator

There is one complication. Today in Rust, every dyn Trait type also implements Trait. But can dyn AsyncIterator implement AsyncIterator? In fact, it cannot! The problem is that the AsyncIterator trait defines next as returning impl Future<..>, which is actually shorthand for impl Future<..> + Sized, but we said that next would return dyn Future<..>, which is ?Sized. So the dyn AsyncIterator type doesn’t meet the bounds the trait requires. Hmm.

But…does dyn AsyncIterator have to implement AsyncIterator?

There is no “hard and fixed” reason that dyn Trait types have to implement Trait, and there are a few good reasons not to do it. The alternative to dyn safety is a design like this: you can always create a dyn Trait value for any Trait, but you may not be able to use all of its members. For example, given a dyn Iterator, you could call next, but you couldn’t call generic methods like map. In fact, we’ve kind of got this design in practice, thanks to the where Self: Sized hack that lets us exclude methods from being used on dyn values.
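
Here is a minimal sketch of the where Self: Sized hack in action. The `Shape` trait and its methods are invented for this example.

```rust
/// Illustrative trait: one method that works through `dyn`, one that doesn't.
trait Shape {
    fn area(&self) -> f64;

    /// A generic method cannot go in a vtable. The `Self: Sized` bound
    /// excludes it from `dyn Shape` while keeping the trait usable as `dyn`.
    fn area_scaled_by<F: Into<f64>>(&self, factor: F) -> f64
    where
        Self: Sized,
    {
        self.area() * factor.into()
    }
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// `area` is available on the trait objects; calling
/// `shape.area_scaled_by(2)` here would be rejected by the compiler.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```

So `dyn Shape` exists and is useful even though not every member of `Shape` can be called on it.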

Why did we adopt object safety in the first place? If you look back at RFC 255, the primary motivation for this rule was ergonomics: clearer rules and better error messages. Although I argued for RFC 255 at the time, I don’t think these motivations have aged so well. Right now, for example, if you have a trait with a generic method, you get an error when you try to create a dyn Trait value, telling you that you cannot create a dyn Trait from a trait with a generic method. But it may well be clearer to get an error at the point where you try to call that generic method, telling you that you cannot call generic methods through dyn Trait.

Another motivation for having dyn Trait implement Trait was that one could write a generic function with T: Trait and have it work equally well for object types. That capability is useful, but because you have to write T: ?Sized to take advantage of it, it only really works if you plan carefully. In practice what I’ve found works much better is to implement Trait for &dyn Trait.
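
A minimal sketch of that forwarding impl, using an invented `Greet` trait. Once it exists, an ordinary generic function with the default `Sized` bound accepts a reference to a trait object with no special planning.

```rust
/// Illustrative trait.
trait Greet {
    fn greet(&self) -> String;
}

struct English;

impl Greet for English {
    fn greet(&self) -> String {
        "hello".to_string()
    }
}

/// Forward the trait through a reference to the trait object.
impl<'a> Greet for &'a dyn Greet {
    fn greet(&self) -> String {
        // `**self` is the `dyn Greet` behind the reference, so this is a
        // virtual call rather than a recursive one.
        (**self).greet()
    }
}

/// A plain generic function; note the implicit `T: Sized` bound.
fn greet_twice<T: Greet>(greeter: T) -> String {
    format!("{} {}", greeter.greet(), greeter.greet())
}
```

`greet_twice` can be called with `English` directly or with a `&dyn Greet`, and its author never had to think about `?Sized`.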

What would it mean to remove the rule that dyn AsyncIterator: AsyncIterator?

I think the new system would be something like this…

You can always⁴ create a dyn Foo value. The dyn Foo type would define inherent methods based on the trait Foo that use dynamic dispatch, but with some changes:

Async functions and other methods defined with -> impl Trait return -> dyn Trait instead.

Generic methods, methods referencing Self, and other such cases are excluded. These cannot be handled with virtual dispatch.

If Foo is object safe using today’s rules, dyn Foo: Foo holds. Otherwise, it does not.⁵

On a related but orthogonal note, I would like to make a dyn keyword required to declare dyn safety.

Implications of removing that rule

This implies that dyn AsyncIterator (or any trait with async functions/RPITIT⁶) will not implement AsyncIterator. So if I write this function…

async fn use_any<I>(x: &mut I)
where
    I: ?Sized + AsyncIterator,
{
    x.next().await
}

…I cannot use it with I = dyn AsyncIterator. You can see why: it calls next and assumes the result is Sized (as promised by the trait), so it doesn’t add any kind of .box directive (and it shouldn’t have to).

What you can do is implement a wrapper type that encapsulates the boxing:

struct BoxingAsyncIterator<'i, I> {
    iter: &'i mut dyn AsyncIterator<Item = I>
}

impl<'i, I> AsyncIterator for BoxingAsyncIterator<'i, I> {
    type Item = I;

    async fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().box.await
    }
}

…and then you can call use_any(BoxingAsyncIterator::new(ai)).⁷
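
The .box operator is hypothetical, but the shape of this wrapper can be approximated on stable Rust (1.75 or later, where async fn in traits is available). In the sketch below the boxing moves into a separate dyn-compatible companion trait, which I call `ErasedAsyncIterator`; that trait, `Countdown`, and the toy `block_on` executor are all assumptions of the example rather than part of the proposal.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The statically dispatched trait from the post.
trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

/// Hypothetical dyn-compatible companion: here the callee boxes the future.
trait ErasedAsyncIterator {
    type Item;
    fn next_boxed(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>>;
}

impl<T: AsyncIterator> ErasedAsyncIterator for T {
    type Item = T::Item;
    fn next_boxed(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>> {
        Box::pin(AsyncIterator::next(self))
    }
}

/// Wrapper that encapsulates the boxing and implements the static trait.
struct BoxingAsyncIterator<'i, I> {
    iter: &'i mut dyn ErasedAsyncIterator<Item = I>,
}

impl<'i, I> AsyncIterator for BoxingAsyncIterator<'i, I> {
    type Item = I;
    async fn next(&mut self) -> Option<Self::Item> {
        self.iter.next_boxed().await
    }
}

/// Generic code that knows nothing about boxing.
async fn use_any<I>(x: &mut I) -> Option<I::Item>
where
    I: ?Sized + AsyncIterator,
{
    x.next().await
}

/// Example iterator: counts down to zero.
struct Countdown(u32);

impl AsyncIterator for Countdown {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }
}

/// Toy busy-polling executor so the sketch runs without external crates.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = pin!(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

The difference from the proposal is where the allocation decision lives: in this approximation the box is hard-wired into `next_boxed`, whereas with call-site selection the wrapper itself would write .box.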

Limitation: what if you wanted to do stack allocation?

One of the goals with the previous proposal was to allow you to write code that used dyn AsyncIterator which worked equally well in std and no-std environments. I would say that goal was partially achieved. The core idea was that the caller would choose the strategy by which the future got allocated, and so it could opt to use inline allocation (and thus be no-std compatible) or use boxing (and thus be simple).

In this proposal, the call-site has to choose. You might think then that you could just choose to use stack allocation at the call-site and thus be no-std compatible. But how does one choose stack allocation? It’s actually quite tricky! Part of the problem is that async stack frames are stored in structs, and thus we cannot support something like alloca (at least not for values that will be live across an await, which includes any future that is awaited⁸). In fact, even outside of async, using alloca is quite hard! The problem is that a stack is, well, a stack. Ideally, you would do the allocation just before your callee returns, since that’s when you know how much memory you need. But at that time, your callee is still using the stack, so your allocation is in the wrong spot.⁹ I personally think we should just rule out the idea of using alloca to do stack allocation.

If we can’t use alloca, what can we do? We have a few choices. In the very beginning, I talked about the idea of a maybe_box function that would take a buffer and use it only for really large values. That’s kind of nifty, but it still relies on a box fallback, so it doesn’t really work for no-std.¹⁰ Might be a nice alternative to stackfuture though!¹¹

You can also achieve inlining by writing wrapper types (something tmandry and I prototyped some time back), but the challenge then is that your callee doesn’t accept a &mut dyn AsyncIterator, it accepts something like &mut DynAsyncIter, where DynAsyncIter is a struct that you defined to do the wrapping.

All told, I think the answer in reality would be: If you want to be used in a no-std environment, you don’t use dyn in your public interfaces. Just use impl AsyncIterator. You can use hacks like the wrapper types internally if you really want dynamic dispatch.

Question: How much room is there for the compiler to get clever?

One other concern I had in thinking about this proposal was that it seemed like it was overspecified. That is, the vast majority of call-sites in this proposal will be written with .box, which thus specifies that they should allocate a box to store the result. But what about ideas like caching the box across invocations, or “best effort” stack allocation? Where do they fit in? From what I can tell, those optimizations are still possible, so long as the Box which would be allocated doesn’t escape the function (which was the same condition we had before).

The way to think of it: by writing foo().box.await, the user told us to use the boxing allocator to box the return value of foo. But we can then see that this result is passed to await, which takes ownership and later frees it. We can thus decide to substitute a different allocator, perhaps one that reuses the box across invocations, or tries to use stack memory; this is fine so long as we modified the freeing code to match. Doing this relies on knowing that the allocated value is immediately returned to us and that it never leaves our control.
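To make "reuses the box across invocations" concrete, here is a minimal sketch in ordinary Rust. It is only an analogy for what the compiler might do: `BoxCache`, `boxed`, `recycle`, and `two_rounds` are hypothetical names, and the sketch assumes every value has the same type, which real type-erased futures would not.

```rust
/// Hypothetical helper: keeps one heap allocation alive between calls.
struct BoxCache<T> {
    spare: Option<Box<T>>,
}

impl<T> BoxCache<T> {
    fn new() -> Self {
        BoxCache { spare: None }
    }

    /// Box `value`, reusing the spare allocation when one is available.
    fn boxed(&mut self, value: T) -> Box<T> {
        match self.spare.take() {
            Some(mut b) => {
                *b = value; // drops the old contents, keeps the allocation
                b
            }
            None => Box::new(value),
        }
    }

    /// Hand the box back instead of freeing it; this plays the role of
    /// the "freeing code" being changed to match the allocator.
    fn recycle(&mut self, b: Box<T>) {
        self.spare = Some(b);
    }
}

/// Returns the heap addresses used by two consecutive "invocations".
fn two_rounds() -> (usize, usize) {
    let mut cache = BoxCache::new();

    let first = cache.boxed([0u8; 64]);
    let first_addr = &*first as *const [u8; 64] as usize;
    cache.recycle(first);

    let second = cache.boxed([1u8; 64]);
    let second_addr = &*second as *const [u8; 64] as usize;

    (first_addr, second_addr)
}
```

Both rounds land at the same address because the box is only ever handed back to the cache, never to the allocator. That is the "never leaves our control" condition in miniature.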

Conclusion

To sum up, I think for most users this design would work like so…

You can use dyn with traits that have async functions, but you have to write .box every time you call a method.

You get to use .box in other places too, and we gain at least some support for unsized return values.¹²

If you want to write code that is sometimes using dyn and sometimes using static dispatch, you’ll have to write some awkward wrapper types.¹³

If you are writing no-std code, use impl Trait, not dyn Trait; if you must use dyn, it’ll require wrapper types.

Initially, I dismissed call-site allocation because it violated dyn Trait: Trait and it didn’t allow code to be written with dyn that could work in both std and no-std. But I think that violating dyn Trait: Trait may actually be good, and I’m not sure how important that latter constraint truly is. Furthermore, I think that Boxing::new and the various “dyn adapters” are probably going to be pretty confusing for users, but writing .box on a call-site is relatively easy to explain (“we don’t know what future you need, so you have to box it”). So now it seems a lot more appealing to me, and I’m grateful to Olivier Faure for bringing it up again.

One possible extension would be to permit users to specify the type of each returned future in some way. As I was finishing up this post, I saw that matthieum posted an intriguing idea in this direction on the internals thread. In general, I do see a need for some kind of “trait adapters”, such that you can take a base trait like Iterator and “adapt” it in various ways, e.g. producing a version that uses async methods, or which is const-safe. This has some pretty heavy overlap with the whole keyword generics initiative too. I think it’s a good extension to think about, but it wouldn’t be part of the “MVP” that we ship first.

Thoughts?

Please leave comments in this internals thread, thanks!

Appendix A: the Output associated type

Here is an interesting thing! The FnOnce trait, implemented by all callable things, defines its associated type Output as Sized! We have to change this if we want to allow unsized return values.

In theory, this could be a big backwards compatibility hazard. Code that writes F::Output can assume, based on the trait, that the return value is sized – so if we remove that bound, the code will no longer build!

Fortunately, I think this is ok. We’ve deliberately restricted the fn types so you can only use them with the () notation, e.g., where F: FnOnce() or where F: FnOnce() -> (). Both of these forms expand to something which explicitly specifies Output, like F: FnOnce<(), Output = ()>. What this means is that even if you write really generic code…

fn foo<F, R>(f: F)
where
    F: FnOnce() -> R,
{
    let value: F::Output = f();
    ...
}
…when you write F::Output, that is actually normalized to R, and the type R has its own (implicit) Sized bound.

(There was actually a recent unsoundness related to this bound, closed by this PR, and we discussed exactly this forwards compatibility question on Zulip.)

Footnotes

I can hear you now: “but what about alloca!” I’ll get there. ↩︎

The box foo operator supported by the compiler has no current path to stabilization. There were earlier plans (see RFC 809 and RFC 1228), but we ultimately abandoned those efforts. Part of the problem, in fact, was that the precedence of box foo made for bad ergonomics: foo.box works much better. ↩︎

If you try to await a Box today, you get an error that it needs to be pinned. I think we can solve that by implementing IntoFuture for Box<F> and having that convert it to Pin<Box<F>>. ↩︎

Or almost always? I may be overlooking some edge cases. ↩︎

Internally in the compiler, this would require modifying the definition of MIR to make “dyn dispatch” more first-class. ↩︎

Don’t know what RPITIT stands for?! “Return position impl trait in traits!” Get with the program! ↩︎

This is basically what the “magical” Boxing::new would have done for you in the older proposal. ↩︎

Brief explanation of why async and alloca don’t mix here. ↩︎

I was told Ada compilers will allocate the memory at the top of the stack, copy it over to the start of the function’s area, and then pop what’s left. Theoretically possible! ↩︎

You could imagine a version that aborted the code if the size is wrong, too, which would make it no-std safe, but not in a reliable way (aborts == yuck). ↩︎

Conceivably you could set the size to size_of::<SomeOtherType>() to automatically determine how much space is needed. ↩︎

I say at least some because I suspect many details of the more general case would remain unstable until we gain more experience. ↩︎

You have to write awkward wrapper types for now, anyway. I’m intrigued by ideas about how we could make that more automatic, but I think it’s way out of scope here. ↩︎

What I meant by the "soul of Rust"

2022-09-19T00:00:00+00:00

Re-reading my previous post, I felt I should clarify why I called it the “soul of Rust”. The soul of Rust, to my mind, is definitely not being explicit about allocation. Rather, it’s about the struggle between a few key values, especially productivity and versatility¹ in tension with transparency. Rust’s goal has always been to feel like a high-level language but with the performance and control of a low-level one. Oftentimes, we are able to find a “third way” that removes the tradeoff, solving both goals pretty well. But finding those “third ways” takes time, and sometimes we just have to accept a certain hit to one value or another for the time being to make progress. It’s exactly at these times, when we have to make a difficult call, that questions about the “soul of Rust” start to come into play. I’ve been thinking about this a lot, so I thought I would write a post that expands on the role of transparency in Rust, and some of the tensions that arise around it.

Why do we value transparency?

From the draft Rustacean Principles:

🔧 Transparent: “you can predict and control low-level details”

The C language, famously, maps quite closely to how machines typically operate. So much so that people have sometimes called it “portable assembly”.² Both C++ and Rust are trying to carry on that tradition, but to add on higher levels of abstraction. Inevitably, this leads to tension. Operator overloading, for example, makes figuring out what a + b does more difficult.³

Transparency gives you control

Transparency doesn’t automatically give high performance, but it does give control. This helps when crafting your system, since you can set it up to do what you want, but it also helps when analyzing its performance or debugging. There’s nothing more frustrating than staring at code for hours and hours only to realize that the source of your problem isn’t anywhere in the code you can see: it lies in some invisible interaction that wasn’t made explicit.

Transparency can cost performance

The flip-side of transparency is overspecification. The more directly your program maps to assembly, the less room the compiler and runtime have to do clever things, which can lead to lower performance. In Rust, we are always looking for places where we can be less transparent in order to gain performance — but only up to a point. One example is struct layout: the Rust compiler retains the freedom to reorder fields in a struct, enabling us to make more compact data structures. That’s less transparent than C, but usually not in a way that you care about. (And, of course, if you want to specify the order of your fields, we offer the #[repr] attribute.)
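As a small illustration of the struct layout point, the sketch below declares the same three fields twice: once with the default representation, where the compiler is free to reorder them, and once with `#[repr(C)]`, which pins them in declaration order. The struct and function names are made up for the example, and the sizes in the comments assume a typical target where `u32` is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

/// Default representation: field order in memory is up to the compiler.
#[allow(dead_code)]
struct Reorderable {
    a: u8,
    b: u32,
    c: u8,
}

/// `#[repr(C)]`: fields are laid out in declaration order, padding included.
#[allow(dead_code)]
#[repr(C)]
struct DeclarationOrder {
    a: u8,  // offset 0, followed by 3 bytes of padding
    b: u32, // offset 4
    c: u8,  // offset 8, followed by 3 bytes of tail padding
}

/// Returns (size with default repr, size with repr(C)).
fn layout_report() -> (usize, usize) {
    (size_of::<Reorderable>(), size_of::<DeclarationOrder>())
}

/// Alignment of the 4-byte field, which determines the padding above.
fn u32_alignment() -> usize {
    align_of::<u32>()
}
```

On such a target the `repr(C)` struct occupies 12 bytes. The default layout is unspecified, but current compilers typically pack the two `u8` fields together and fit the struct in 8.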

Transparency hurts versatility and productivity

The bigger price of transparency, though, is versatility. It forces everyone to care about low-level details that may not actually matter to the problem at hand⁴. Relevant to dyn async trait, most async Rust systems, for example, perform allocations left and right. The fact that a particular call to an async function might invoke Box::new is unlikely to be a performance problem. For those users, selecting a Boxing adapter adds to the overall complexity they have to manage for very little gain. If you’re working on a project where you don’t need peak performance, that’s going to make Rust less appealing than other languages. I’m not saying that’s bad, but it’s a fact.

A zero-sum situation…

At this moment in the design of async traits, we are struggling with a core question here of “how versatile can Rust be”. Right now, it feels like a “zero sum situation”. We can add in something like Boxing::new to preserve transparency, but it’s going to cost us some in versatility — hopefully not too much.

…for now?

I do wonder, though, if there’s a “third way” waiting somewhere. I hinted at this a bit in the previous post. At the moment, I don’t know what that third way is, and I think that requiring an explicit adapter is the most practical way forward. But it seems to me that it’s not a perfect sweet spot yet, and I am hopeful we’ll be able to subsume it into something more general.

Some ingredients that might lead to a ‘third way’:

With-clauses or capabilities: I am intrigued by the idea of with-clauses and the general idea of scoped capabilities. We might be able to think about the “default adapter” as something that gets specified via a with-clause?

Const evaluation: One of the niftier uses for const evaluation is for “meta-programming” that customizes how Rust is compiled. For example, we could potentially let you write a const fn that creates the vtable data structure for a given trait.

Profiles and portability: Can we find a better way to identify the kinds of transparency that you want, perhaps via some kind of ‘profiles’? I feel we already have ‘de facto’ profiles right now, but we don’t recognize them. “No std” is a clear example, but another would be the set of operating systems or architectures that you try to support. Recognizing that different users have different needs, and giving people a way to choose which one fits them best, might allow us to be more supportive of all our users. Then again, it might make Rust “modal” and more confusing.

Comments?

Please leave comments in this internals thread. Thanks!

Footnotes

I didn’t write about versatility in my original post: instead I focused on the hit to productivity. But as I think about it now, versatility is really what’s at play here — versatility really meant that Rust was useful for high-level things and low-level things, and I think that requiring an explicit dyn adaptor is unquestionably a hit against being high-level. Interestingly, I put versatility after transparency in the list, meaning that it was lower priority, and that seems to back up the decision to have some kind of explicit adaptor. ↩︎

At this point, some folks point out all the myriad subtleties and details that are actually hidden in C code. Hush you. ↩︎

I remember a colleague at a past job discovering that somebody had overloaded the -> operator in our codebase. They sent out an angry email, “When does it stop? Must I examine every dot and squiggle in the code?” (NB: Rust supports overloading the deref operator.) ↩︎

Put another way, being transparent about one thing can make other things more obscure (“can’t see the forest for the trees”). ↩︎

Dyn async traits, part 8: the soul of Rust

2022-09-18T00:00:00+00:00

In the last few months, Tyler Mandry and I have been circulating a “User’s Guide from the Future” that describes our current proposed design for async functions in traits. In this blog post, I want to deep dive on one aspect of that proposal: how to handle dynamic dispatch. My goal here is to explore the space a bit and also to address one particularly tricky topic: how explicit do we have to be about the possibility of allocation? This is a tricky topic, and one that gets at that core question: what is the soul of Rust?

The running example trait

Throughout this blog post, I am going to focus exclusively on this example trait, AsyncIterator:

trait AsyncIterator {
    type Item;

    async fn next(&mut self) -> Option<Self::Item>;
}
And we’re particularly focused on the scenario where we are invoking next via dynamic dispatch:

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut ai); // <-- coercion from `&mut AI` to `&mut dyn AsyncIterator`
}

fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().await; // <-- this call right here!
}
Even though I’m focusing the blog post on this particular snippet of code, everything I’m talking about is applicable to any trait with methods that return impl Trait (async functions themselves being a shorthand for a function that returns impl Future).

The basic challenge that we have to face is this:

The caller function, use_dyn, doesn’t know what impl is behind the dyn, so it needs to allocate a fixed amount of space that works for everybody. It also needs some kind of vtable so it knows what poll method to call.

The callee, AI::next, needs to be able to package up the future for its next function in some way to fit the caller’s expectations.

The first blog post in this series¹ explains the problem in more detail.
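The sizing half of this challenge is easy to observe directly. The sketch below (all function names are made up for illustration) builds two futures whose sizes differ by three orders of magnitude, then shows that a boxed, type-erased future is always the same size: a data pointer plus a vtable pointer.

```rust
use std::future::Future;
use std::mem::{size_of, size_of_val};
use std::pin::Pin;

/// A future with essentially no state.
async fn small() -> u32 {
    1
}

/// A future that must keep a 1 KiB buffer alive across an await point.
async fn large(seed: u8) -> u32 {
    let buf = [seed; 1024];
    std::future::ready(()).await; // `buf` is still needed after this await
    buf.iter().map(|&b| u32::from(b)).sum()
}

/// Sizes of the two concrete future types, in bytes.
fn future_sizes() -> (usize, usize) {
    (size_of_val(&small()), size_of_val(&large(1)))
}

/// Size of a boxed `dyn Future`, regardless of which future is inside.
fn erased_pointer_size() -> usize {
    size_of::<Pin<Box<dyn Future<Output = u32>>>>()
}
```

A caller that only sees `dyn AsyncIterator` cannot know which of these sizes it is dealing with, which is why each option in the tour below either erases the size behind a pointer or fixes it in advance.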

A brief tour through the options

One of the challenges here is that there are many, many ways to make this work, and none of them is “obviously best”. What follows is, I think, an exhaustive list of the various ways one might handle the situation. If anybody has an idea that doesn’t fit into this list, I’d love to hear it.

Box it. The most obvious strategy is to have the callee box the future type, effectively returning a Box<dyn Future>, and have the caller invoke the poll method via virtual dispatch. This is what the async-trait crate does (although it also boxes for static dispatch, which we don’t have to do).

Box it with some custom allocator. You might want to box the future with a custom allocator.

Box it and cache box in the caller. For most applications, boxing itself is not a performance problem, unless it occurs repeatedly in a tight loop. Mathias Einwag pointed out that if you have some code that is repeatedly calling next on the same object, you could have that caller cache the box in between calls, and have the callee reuse it. This way you only have to actually allocate once.

Inline it into the iterator. Another option is to store all the state needed by the function in the AsyncIter type itself. This is actually what the existing Stream trait does, if you think about it: instead of returning a future, it offers a poll_next method, so that the implementor of Stream effectively is the future, and the caller doesn’t have to store any state. Tyler and I worked out a more general way to do inlining that doesn’t require user intervention, where you basically wrap the AsyncIterator type in another type W that has a field big enough to store the next future. When you call next, this wrapper W stores the future into that field and then returns a pointer to the field, so that the caller only has to poll that pointer. One problem with inlining things into the iterator is that it only works well for &mut self methods, since in that case there can be at most one active future at a time. With &self methods, you could have any number of active futures.

Box it and cache box in the callee. Instead of inlining the entire future into the AsyncIterator type, you could inline just one pointer-word slot, so that you can cache and reuse the Box that next returns. The upside of this strategy is that the cached box moves with the iterator and can potentially be reused across callers. The downside is that once the caller has finished, the cached box lives on until the object itself is destroyed.

Have caller allocate maximal space. Another strategy is to have the caller allocate a big chunk of space on the stack, one that should be big enough for every callee. If you know the callees your code will have to handle, and the futures for those callees are close enough in size, this strategy works well. Eric Holk recently released the stackfuture crate that can help automate it. One problem with this strategy is that the caller has to know the size of all its callees.

Have caller allocate some space, and fall back to boxing for large callees. If you don’t know the sizes of all your callees, or those sizes have a wide distribution, another strategy might be to have the caller allocate some amount of stack space (say, 128 bytes) and then have the callee invoke Box if that space is not enough.

Alloca on the caller side. You might think you can store the size of the future to be returned in the vtable and then have the caller “alloca” that space — i.e., bump the stack pointer by some dynamic amount. Interestingly, this doesn’t work with Rust’s async model. Async tasks require that the size of the stack frame is known up front.

Side stack. Similar to the previous suggestion, you could imagine having the async runtimes provide some kind of “dynamic side stack” for each task.² We could then allocate the right amount of space on this stack. This is probably the most efficient option, but it assumes that the runtime is able to provide a dynamic stack. Runtimes like embassy wouldn’t be able to do this. Moreover, we don’t have any sort of protocol for this sort of thing right now. Introducing a side-stack also starts to “eat away” at some of the appeal of Rust’s async model, which is designed to allocate the “perfect size stack” up front and avoid the need to allocate a “big stack per task”.³
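As one concrete point in this space, the Stream-style flavor of the "inline it into the iterator" option can be sketched in today's Rust. The trait below is a hypothetical stand-in for `Stream` (`PollIterator`, `Countdown`, and `drain` are illustrative names, not an existing API): since `poll_next` returns no future, the caller has nothing to size or allocate, even through `dyn`.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical `Stream`-like trait: the implementor itself plays the
/// role of the future, so no separate future value is ever returned.
trait PollIterator {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

struct Countdown(u32);

impl PollIterator for Countdown {
    type Item = u32;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        // All state lives in `self`. A real implementation could return
        // `Poll::Pending` here after registering the waker.
        if self.0 == 0 {
            Poll::Ready(None)
        } else {
            self.0 -= 1;
            Poll::Ready(Some(self.0))
        }
    }
}

/// Waker that does nothing; enough for this always-ready demo.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drives the iterator through dynamic dispatch without any boxing.
fn drain(di: &mut (dyn PollIterator<Item = u32> + Unpin)) -> Vec<u32> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    loop {
        match Pin::new(&mut *di).poll_next(&mut cx) {
            Poll::Ready(Some(v)) => out.push(v),
            Poll::Ready(None) => return out,
            Poll::Pending => continue, // never taken by `Countdown`
        }
    }
}
```

Calling `drain(&mut Countdown(3))` collects the items through a vtable call alone. The price is the one noted in the list: the state machine has to be written by hand inside the implementor, and only one call can be in flight at a time.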

Can async functions used with dyn be “normal”?

One of my initial goals for async functions in traits was that they should feel “as natural as possible”. In particular, I wanted you to be able to use them with dynamic dispatch in just the same way as you would a synchronous function. In other words, I wanted this code to compile, and I would want it to work even if use_dyn were put into another crate (and therefore were compiled with no idea of who is calling it):

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut ai);
}

fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().await;
}
My hope was that we could make this code work just as it is by selecting some kind of default strategy that works most of the time, and then provide ways for you to pick other strategies for those cases where the default strategy is not a good fit. The problem though is that there is no single default strategy that seems “obvious and right almost all of the time”…

| Strategy | Downside |
| --- | --- |
| Box it (with default allocator) | requires allocation, not especially efficient |
| Box it with cache on caller side | requires allocation |
| Inline it into the iterator | adds space to AI, doesn’t work for &self |
| Box it with cache on callee side | requires allocation, adds space to AI, doesn’t work for &self |
| Allocate maximal space | can’t necessarily use that across crates, requires extensive interprocedural analysis |
| Allocate some space, fallback | uses allocator, requires extensive interprocedural analysis or else random guesswork |
| Alloca on the caller side | incompatible with async Rust |
| Side-stack | requires cooperation from runtime and allocation |

The soul of Rust

This is where we get to the “soul of Rust”. Looking at the above table, the strategy that seems the closest to “obviously correct” is “box it”. It works fine with separate compilation, fits great with Rust’s async model, and it matches what people are doing today in practice. I’ve spoken with a fair number of people who use async Rust in production, and virtually all of them agreed that “box by default, but let me control it” would work great in practice.

And yet, when we floated the idea of using this as the default, Josh Triplett objected strenuously, and I think for good reason. Josh’s core concern was that this would be crossing a line for Rust. Until now, there is no way to allocate heap memory without some kind of explicit operation (though that operation could be a function call). But if we wanted to make “box it” the default strategy, then you’d be able to write “innocent looking” Rust code that nonetheless is invoking Box::new. In particular, it would be invoking Box::new each time that next is called, to box up the future. But that is very unclear from reading over make_dyn and use_dyn.

As an example of where this might matter, it might be that you are writing some sensitive systems code where allocation is something you always do with great care. It doesn’t mean the code is no-std, it may have access to an allocator, but you still would like to know exactly where you will be doing allocations. Today, you can audit the code by hand, scanning for “obvious” allocation points like Box::new or vec![]. Under this proposal, while it would still be possible, the presence of an allocation in the code is much less obvious. The allocation is “injected” as part of the vtable construction process. To figure out that this will happen, you have to know Rust’s rules quite well, and you also have to know the signature of the callee (because in this case, the vtable is built as part of an implicit coercion). In short, scanning for allocation went from being relatively obvious to requiring a PhD in Rustology. Hmm.

On the other hand, if scanning for allocations is what is important, we could address that in many ways. We could add an “allow by default” lint to flag the points where the “default vtable” is constructed, and you could enable it in your project. This way the compiler would warn you about the possible future allocation. In fact, even today, scanning for allocations is actually much harder than I made it out to be: you can easily see if your function allocates, but you can’t easily see what its callees do. You have to read deeply into all of your dependencies and, if there are function pointers or dyn Trait values, figure out what code is potentially being called. With compiler/language support, we could make that whole process much more first-class and better.
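Short of such compiler support, one way to audit what callees do today is to count allocations at runtime. The sketch below is an assumption-laden illustration rather than anything the proposal prescribes: it installs a global allocator that wraps the system one and bumps a counter. `CountingAllocator`, `callee`, `caller`, and `allocations_during` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Process-wide count of calls to `alloc`.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Forwards to the system allocator, counting every allocation.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Nothing in this signature hints at heap use...
fn callee(n: u32) -> u32 {
    let values: Vec<u32> = (1..=n).collect(); // ...but this allocates.
    std::hint::black_box(&values).iter().sum()
}

/// Reading only this function, the allocation is invisible.
fn caller(n: u32) -> u32 {
    callee(n)
}

/// Runs `f` and reports how many allocations happened meanwhile.
/// The counter is process-wide, so other threads can inflate it.
fn allocations_during<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}
```

`allocations_during(|| caller(4))` reports at least one allocation even though `caller` contains no visible allocation point. This is a measurement, not a proof: it only covers the paths that actually ran.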

In a way, though, the technical arguments are beside the point. “Rust makes allocations explicit” is widely seen as a key attribute of Rust’s design. In making this change, we would be tweaking that rule to be something like “Rust makes allocations explicit most of the time”. This would be harder for users to understand, and it would introduce doubt as to whether Rust really intends to be the kind of language that can replace C and C++⁴.

Looking to the Rustacean design principles for guidance

Some time back, Josh and I drew up a draft set of design principles for Rust. It’s interesting to look back on them and see what they have to say about this question:

⚙️ Reliable: “if it compiles, it works”

🐎 Performant: “idiomatic code runs efficiently”

🥰 Supportive: “the language, tools, and community are here to help”

🧩 Productive: “a little effort does a lot of work”

🔧 Transparent: “you can predict and control low-level details”

🤸 Versatile: “you can do anything with Rust”

Boxing by default, to my mind, scores as follows:

🐎 Performant: meh. The real goal with performant is that the cleanest code also runs the fastest. Boxing on every dynamic call doesn’t meet this goal, but something like “boxing with caller-side caching” or “have caller allocate space and fall back to boxing” very well might.

🧩 Productive: yes! Virtually every production user of async Rust that I’ve talked to has agreed that having code box by default (but giving the option to do something else for tight loops) would be a great sweet spot for Rust.

🔧 Transparent: no. As I wrote before, understanding when a call may box now requires a PhD in Rustology, so this definitely fails on transparency.

(The other principles are not affected in any notable way, I don’t think.)

What the “user’s guide from the future” suggests

These considerations led Tyler and me to a different design. In the “User’s Guide From the Future” document from before, you’ll see that it does not accept the running example just as is. Instead, if you were to compile the example code we’ve been using thus far, you’d get an error:

error[E0277]: the type `AI` cannot be converted to a `dyn AsyncIterator` without an adapter
 --> src/lib.rs:3:23
  |
3 |     use_dyn(&mut ai);
  |                  ^^ adapter required to convert to `dyn AsyncIterator`
  |
  = help: consider introducing the `Boxing` adapter, which will box the futures returned by each async fn
3 |     use_dyn(&mut Boxing::new(ai));
                     ++++++++++++  +
As the error suggests, in order to get the boxing behavior, you have to opt-in via a type that we called Boxing⁵:

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut Boxing::new(ai));
    //           ^^^^^^^^^^^
}

fn use_dyn(di: &mut dyn AsyncIterator) {
    di.next().await;
}
Under this design, you can only create a &mut dyn AsyncIterator when the caller can verify that the next method returns a type from which a dyn* can be constructed. If that’s not the case, and it’s usually not, you can use the Boxing::new adapter to create a Boxing<AI>. Via some kind of compiler magic that ahem we haven’t fully worked out yet⁶, you could coerce a Boxing<AI> into a dyn AsyncIterator.

The details of the Boxing type need more work⁷, but the basic idea remains the same: require users to make some explicit opt-in to the default vtable strategy, which may indeed perform allocation.
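The compiler magic doesn't exist, but the shape of the opt-in can be approximated in today's Rust (1.75 or later, for `async fn` in traits) with a hand-written companion trait. Everything here is a hypothetical sketch: `DynAsyncIterator` stands in for what `dyn AsyncIterator` would mean, `Boxing` is a plain wrapper rather than the proposed adapter, and `block_on` is a toy executor for always-ready futures.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Statically dispatched trait, as in the running example.
trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

/// Hypothetical dyn-safe companion: the returned future is always boxed.
trait DynAsyncIterator {
    type Item;
    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>>;
}

/// Explicit opt-in wrapper standing in for the proposed `Boxing` adapter.
struct Boxing<T>(T);

impl<T> Boxing<T> {
    fn new(inner: T) -> Self {
        Boxing(inner)
    }
}

impl<T: AsyncIterator> DynAsyncIterator for Boxing<T> {
    type Item = T::Item;

    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<T::Item>> + '_>> {
        Box::pin(self.0.next()) // the allocation the adapter makes visible
    }
}

struct Countdown(u32);

impl AsyncIterator for Countdown {
    type Item = u32;

    async fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0)
        }
    }
}

async fn use_dyn(di: &mut dyn DynAsyncIterator<Item = u32>) -> u32 {
    let mut sum = 0;
    while let Some(v) = di.next().await {
        sum += v;
    }
    sum
}

fn make_dyn<AI: AsyncIterator<Item = u32>>(ai: AI) -> u32 {
    block_on(use_dyn(&mut Boxing::new(ai)))
}

/// Waker that does nothing; fine because these futures never suspend.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The useful property is that the only way to reach `use_dyn` from a plain `Countdown` is to write `Boxing::new`, so the allocation always has a visible marker at the coercion site.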

How does Boxing rank on the design principles?

To my mind, adding the Boxing adapter ranks as follows…

🐎 Performant: meh. This is roughly the same as before. We’ll come back to this.

🥰 Supportive: yes! The error message guides you to exactly what you need to do, and hopefully links to a well-written explanation that can help you learn about why this is required.

🧩 Productive: meh. Having to add a Boxing::new call each time you create a dyn AsyncIterator is not great, but also on-par with other Rust papercuts.

🔧 Transparent: yes! It is easy to see that boxing may occur in the future now.

This design is now transparent. It’s also less productive than before, but we’ve tried to make up for it with supportiveness. “Rust isn’t always easy, but it’s always helpful.”

Improving performance with a more complex ABI

One thing that bugs me about the “box by default” strategy is that the performance is only “meh”. I like stories like Iterator, where you write nice code and you get tight loops. It bothers me that writing “nice” async code yields a naive, middling efficiency story.

That said, I think this is something we could fix in the future, and I think we could fix it backwards compatibly. The idea would be to extend our ABI when doing virtual calls so that the caller has the option to provide some “scratch space” for the callee. For example, we could then do things like analyze the binary to get a good guess as to how much stack space is needed (either by doing dataflow or just by looking at all implementations of AsyncIterator). We could then have the caller reserve stack space for the future and pass a pointer into the callee — the callee would still have the option of allocating, if for example, there wasn’t enough stack space, but it could make use of the space in the common case.

Interestingly, I think that if we did this, we would also be putting some pressure on Rust’s “transparency” story again. While Rust leans heavily on optimizations to get performance, we’ve generally restricted ourselves to simple, local ones like inlining; we don’t require interprocedural dataflow in particular, although of course it helps (and LLVM does it). But getting a good estimate of how much stack space to reserve for potential callees would violate that rule (we’d also need some simple escape analysis, as I describe in Appendix A). All of this adds up to a bit of ‘performance unpredictability’. Still, I don’t see this as a big problem, particularly since the fallback is just to use Box::new, and as we’ve said, for most users that is perfectly adequate.

Picking another strategy, such as inlining

Of course, maybe you don’t want to use Boxing. It would also be possible to construct other kinds of adapters, and they would work in a similar fashion. For example, an inlining adapter might look like:

fn make_dyn<AI: AsyncIterator>(ai: AI) {
    use_dyn(&mut InlineAsyncIterator::new(ai));
    //           ^^^^^^^^^^^^^^^^^^^^^^^^
}
The InlineAsyncIterator type would add the extra space to store the future, so that when the next method is called, it writes the future into its own fields and then returns it to the caller. Similarly, a cached box adapter might be &mut CachedAsyncIterator::new(ai), only it would use a field to cache the resulting Box.

You may have noticed that the inline/cached adapters include the name of the trait. That’s because they aren’t relying on compiler magic like Boxing, but are instead intended to be authored by end-users, and we don’t yet have a way to be generic over any trait definition. (The proposal as we wrote it uses macros to generate an adapter type for any trait you wish to adapt.) This is something I’d love to address in the future. You can read more about how adapters work here.

Conclusion

OK, so let’s put it all together into a coherent design proposal:

You cannot coerce from an arbitrary type AI into a dyn AsyncIterator. Instead, you must select an adapter:

Typically you want Boxing, which has a decent performance profile and “just works”.

But users can write their own adapters to implement other strategies, such as InlineAsyncIterator or CachedAsyncIterator.

From an implementation perspective:

When invoked via dynamic dispatch, async functions return a dyn* Future. The caller can invoke poll via virtual dispatch and invoke the (virtual) drop function when it’s ready to dispose of the future.

The vtable created for Boxing will allocate a box to store the future AI::next() and use that to create the dyn* Future.

The vtable for other adapters can use whatever strategy they want. InlineAsyncIterator, for example, stores the AI::next() future into a field in the wrapper, takes a raw pointer to that field, and creates a dyn* Future from this raw pointer.

Possible future extension for better performance:⁸

We modify the ABI for async trait functions (or any trait function using return-position impl trait) to allow the caller to optionally provide stack space. The Boxing adapter, if such stack space is available, will use it to avoid boxing when it can. This would have to be coupled with some compiler analysis to figure out how much stack space to pre-allocate.

This lets us express virtually any pattern. It’s even possible to express side-stacks, if the runtime provides a suitable adapter (e.g., TokioSideStackAdapter::new(ai)), though if side-stacks become popular I would rather consider a more standard means to expose them.

The main downsides to this proposal are:

Users have to write Boxing::new, which is a productivity and learnability hit, but it avoids a big hit to transparency. Is that the right call? I’m still not entirely sure, though my heart increasingly says yes. It’s also something we could revisit in the future (e.g., by adding a default adapter).

If we opt to modify the ABI, we’re adding some complexity there, but in exchange for potentially quite a lot of performance. I would expect us not to do this initially, but to explore it as an extension in the future once we have more data about how important it is.

There is one pattern that we can’t express: “have caller allocate maximal space”. This pattern guarantees that heap allocation is not needed; the best we can do is a heuristic that tries to avoid heap allocation, since we have to consider public functions on crate boundaries and the like. To offer a guarantee, the argument type needs to change from &mut dyn AsyncIterator (which accepts any async iterator) to something narrower. This would also support futures that escape the stack frame (see Appendix A below). It seems likely that these details don’t matter, and that either inline futures or heuristics would suffice, but if not, a crate like stackfuture remains an option.

Comments?

Please leave comments in this internals thread. Thanks!

Appendix A: futures that escape the stack frame

In all of this discussion, I’ve been assuming that the async call was followed closely by an await. But what happens if the future is not awaited, but instead is moved into the heap or other locations?

fn foo(x: &mut dyn AsyncIterator<Item = u32>) -> impl Future<Output = Option<u32>> + '_ {
    x.next()
}
For boxing, this kind of code doesn’t pose any problem at all. But if we had allocated space on the stack to store the future, examples like this would be a problem. So long as the scratch space is optional, with a fallback to boxing, this is no problem. We can do an escape analysis and avoid the use of scratch space for examples like this.

Footnotes

Written in Sep 2020, egads! ↩︎

I was intrigued to learn that this is what Ada does, and that Ada features like returning dynamically sized types are built on this model. I’m not sure how SPARK and other Ada subsets that target embedded spaces manage that, I’d like to learn more about it. ↩︎

Of course, without a side stack, we are left using mechanisms like Box::new to cover cases like dynamic dispatch or recursive functions. This becomes a kind of pessimistically sized segmented stack, where we allocate for each little piece of extra state that we need. A side stack might be an appealing middle ground, but because of cases like embassy, it can’t be the only option. ↩︎

Ironically, C++ itself inserts implicit heap allocations to help with coroutines! ↩︎

Suggestions for a better name very welcome. ↩︎

Pay no attention to the compiler author behind the curtain. 🪄 🌈 Avert your eyes! ↩︎

e.g., if you look closely at the User’s Guide from the Future, you’ll see that it writes Boxing::new(&mut ai), and not &mut Boxing::new(ai). I go back and forth on this one. ↩︎

I should clarify that, while Tyler and I have discussed this, I don’t know how he feels about it. I wouldn’t call it ‘part of the proposal’ exactly, more like an extension I am interested in. ↩︎

Come contribute to Salsa 2022!

2022-08-18T00:00:00+00:00

Have you heard of the Salsa project? Salsa is a library for incremental computation – it’s used by rust-analyzer, for example, to stay responsive as you type into your IDE (we have also discussed using it in rustc, though more work is needed there). We are in the midst of a big push right now to develop and release Salsa 2022, a major new revision to the API that will make Salsa far more natural to use. I’m writing this blog post both to advertise that ongoing work and to put out a call for contribution. Salsa doesn’t yet have a large group of maintainers, and I would like to fix that. If you’ve been looking for an open source project to try and get involved in, maybe take a look at our Salsa 2022 tracking issue and see if there is an issue you’d like to tackle?

So wait, what does Salsa do?

Salsa is designed to help you build programs that respond to rapidly changing inputs. The prototypical example is a compiler, especially an IDE. You’d like to be able to do things like “jump to definition” and keep those results up-to-date even as the user is actively typing. Salsa can help you build programs that manage that.

The key way that Salsa achieves reuse is through memoization. The idea is that you define a function that does some specific computation, let’s say it has the job of parsing the input and creating the Abstract Syntax Tree (AST):

fn parse_program(input: &str) -> AST { }
Then later I have other functions that might take parts of that AST and operate on them, such as type-checking:

fn type_check(function: &AstFunction) { }
In a setup like this, I would like to have it so that when my base input changes, I do have to re-parse but I don’t necessarily have to run the type checker. For example, if the only change to my program was to add a comment, then maybe my AST is not affected, and so I don’t need to run the type checker again. Or perhaps the AST contains many functions, and only one of them changed, so while I have to type check that function, I don’t want to type check the others. Salsa can help you manage this sort of thing automatically.
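To make the memoization idea concrete, here is a minimal hand-rolled sketch. It is not Salsa’s API: the Memo type, its counters, and the toy “parser” (which just drops blank lines and // comments) are all assumptions invented for illustration. The point is only that a comment-only edit forces a re-parse but leaves the AST equal to the old one, so the type checker can be skipped.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a real AST: the non-comment lines of the input.
#[derive(Clone, Debug, PartialEq)]
pub struct Ast {
    pub items: Vec<String>,
}

/// Hand-rolled memo table (an illustration, not Salsa's API).
#[derive(Default)]
pub struct Memo {
    parsed: HashMap<String, Ast>,
    last_checked: Option<Ast>,
    pub parse_runs: usize,
    pub type_check_runs: usize,
}

impl Memo {
    /// Re-parses only when this exact input text has not been seen before.
    pub fn parse_program(&mut self, input: &str) -> Ast {
        if let Some(ast) = self.parsed.get(input) {
            return ast.clone();
        }
        self.parse_runs += 1;
        let items = input
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty() && !line.starts_with("//"))
            .map(String::from)
            .collect();
        let ast = Ast { items };
        self.parsed.insert(input.to_string(), ast.clone());
        ast
    }

    /// Skips the work when the AST equals the one checked last time.
    pub fn type_check(&mut self, ast: &Ast) {
        if self.last_checked.as_ref() == Some(ast) {
            return;
        }
        self.type_check_runs += 1; // a real type checker would run here
        self.last_checked = Some(ast.clone());
    }

    /// The "big function": parse, then type check.
    pub fn compile(&mut self, input: &str) {
        let ast = self.parse_program(input);
        self.type_check(&ast);
    }
}
```

Comparing whole ASTs for equality is the crudest possible reuse check; it is only meant to show where the savings come from.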

What is Salsa 2022 and how is it different?

The original salsa system was modeled very closely on the rustc query system. As such, it required you to structure your program entirely in terms of functions and queries that called one another. All data was passed through return values. This is a very powerful and flexible system, but it can also be kind of mind-bending sometimes to figure out how to “close the loop”, particularly if you wanted to get effective re-use, or do lazy computation.

Just looking at the parse_program function we saw before, it was defined to return a complete AST:

fn parse_program(input: &str) -> AST { }
But that AST has, internally, a lot of structure. For example, perhaps an AST looks like a set of functions:

struct Ast {
    functions: Vec<AstFunction>
}

struct AstFunction {
    name: Name,
    body: AstFunctionBody,
}

struct AstFunctionBody {
    ...
}
Under the old Salsa, changes were tracked at a pretty coarse-grained level. So if your input changed, and the content of any function body changed, then your entire AST was considered to have changed. If you were naive about it, this would mean that everything would have to be type-checked again. In order to get good reuse, you had to change the structure of your program pretty dramatically from the “natural structure” that you started with.

Enter: tracked structs

The newer Salsa introduces tracked structs, which makes this a lot easier. The idea is that you can label a struct as tracked, and now its fields become managed by the database:

#[salsa::tracked]
struct AstFunction {
    name: Name,
    body: AstFunctionBody,
}
When a struct is declared as tracked, then we also track accesses to its fields. This means that if the parser produces the same set of functions, then its output is considered not to have changed, even if the function bodies are different. When the type checker reads the function body, we’ll track that read independently. So if just one function has changed, only that function will be type checked again.
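The effect described here (only the changed function gets re-checked) can be imitated by hand with a per-function cache. The sketch below is an assumption-laden illustration, not how Salsa implements tracked structs: Checker, checked_bodies, and rechecked are hypothetical names, and function bodies are plain strings.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct AstFunction {
    pub name: String,
    pub body: String,
}

/// Hypothetical per-function cache; Salsa tracks field reads for you instead.
#[derive(Default)]
pub struct Checker {
    /// Body text seen the last time each function was type checked.
    checked_bodies: HashMap<String, String>,
    /// Names of the functions re-checked by the most recent call.
    pub rechecked: Vec<String>,
}

impl Checker {
    pub fn type_check_all(&mut self, functions: &[AstFunction]) {
        self.rechecked.clear();
        for function in functions {
            // Same body as last time: reuse the earlier result.
            if self.checked_bodies.get(&function.name) == Some(&function.body) {
                continue;
            }
            // A real type checker would run here.
            self.checked_bodies
                .insert(function.name.clone(), function.body.clone());
            self.rechecked.push(function.name.clone());
        }
    }
}
```

Writing this bookkeeping manually for every phase is exactly the restructuring that tracked structs are meant to spare you.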

Goal: relatively natural

The goal of Salsa 2022 is that you should be able to convert a program to use Salsa without dramatically restructuring it. It should still feel quite similar to the ’natural structure’ that you would have used if you didn’t care about incremental reuse.

Using techniques like tracked structs, you can keep the pattern of a compiler as a kind of “big function” that passes the input through many phases, while still getting pretty good re-use:

fn typical_compiler(input: &str) -> Result {
    let ast = parse_ast(input);
    for function in &ast.functions {
        type_check(function);
    }
    ...
}
Salsa 2022 also has other nice features, such as accumulators for managing diagnostics and built-in interning.

If you’d like to learn more about how Salsa works, check out the overview page or read through the (WIP) tutorial, which covers the design of a complete compiler and interpreter.

How to get involved

As I mentioned, the purpose of this blog post is to serve as a call for contribution. Salsa is a cool project but it doesn’t have a lot of active maintainers, and we are actively looking to recruit new people.

The Salsa 2022 tracking issue contains a list of possible items to work on. Many of those items have mentoring instructions, just search for things tagged with good first issue. There is also documentation of salsa’s internal structure on the main web page that can help you navigate the code base. Finally, we have a Zulip instance where we hang out and chat (the #good-first-issue stream is a good place to ask for help!)

Many modes: a GATs pattern

2022-06-27T00:00:00+00:00

As some of you may know, on May 4th Jack Huey opened a PR to stabilize an initial version of generic associated types. The current version is at best an MVP: the compiler support is limited, resulting in unnecessary errors, and the syntax is limited, making code that uses GATs much more verbose than I’d like. Nonetheless, I’m super excited, since GATs unlock a lot of interesting use cases, and we can continue to smooth out the rough edges over time. However, folks on the thread have raised some strong concerns about GAT stabilization, including asking whether GATs are worth including in the language at all. The fear is that they make Rust the language too complex, and that it would be better to just use them as an internal building block for other, more accessible features (like async functions and return position impl trait in traits (RPITIT)). In response to this concern, a number of people have posted about how they are using GATs. I recently took some time to deep dive into these comments and to write about some of the patterns that I found there, including a pattern I am calling the “many modes” pattern, which comes from the chumsky parser combinator library. I posted about this pattern on the thread, but I thought I would cross-post my write-up here to the blog as well, because I think it’s of general interest.

General thoughts from reading the examples

I’ve been going through the (many, many) examples that people have posted where they are relying on GATs and looking at them in a bit more detail. A few interesting things jumped out at me as I read through the examples:

Many of the use-cases involve GATs with type parameters. There has been some discussion of stabilizing “lifetime-only” GATs, but I don’t think that makes sense from any angle. It’s more complex for the implementation and, I think, more confusing for the user. But also, given that the “workaround” for not having GATs tends to be higher-ranked trait bounds (HRTB), and given that those only work for lifetimes, it means we’re losing one of the primary benefits of GATs in practice (note that I do expect to get HRTB for types in the near-ish future).

GATs allowed libraries to better hide details from their clients. This is precisely because they could make a trait hierarchy that more directly captured the “spirit” of the trait, resulting in bounds like M: Mode instead of higher-ranked trait bounds (in some cases, the HRTB would have to be over types, like for<X> M: Mode<X>, which isn’t even legal in Rust…yet).

As I read, I felt this fit a pattern that I’ve experienced many times but hadn’t given a name to: when traits are being used to describe a situation that they don’t quite fit, the result is an explosion of where-clauses on the clients. Sometimes you can hide these via supertraits or something, but those complex bounds are still visible in rustdoc, still leak out in error messages, and don’t generally “stay hidden” as well as you’d like. You’ll see this come up here when I talk about how you would model this pattern in Rust today, but it’s a common theme across all examples. Issue #95 on the RustAudio crate for example says, “The first [solution] would be to make PortType generic over a 'a lifetime…however, this has a cascading effect, which would force all downstream users of port types to specify their lifetimes”. Pythonesque made a simpler point here, “Without GATs, I ended up having to make an Hkt trait that had to be implemented for every type, define its projections, and then make everything heavily parametric and generic over the various conversions.”

The “many modes” pattern (chumsky)

The first example I looked at closely was the chumsky parsing library. This is leveraging a pattern that I would call the “many modes” pattern. The idea is that you have some “core function” but you want to execute this function in many different modes. Ideally, you’d like to define the modes independently from the function, and you’d like to be able to add more modes later without having to change the function at all. (If you’re familiar with Haskell, monads are an example of this pattern; the monad specifies the “mode” in which some simple sequential function is executed.)

chumsky is a parser combinator library, so the “core function” is a parse function, defined in the Parser trait. Each Parser trait impl contains a function that indicates how to parse some particular construct in the grammar. Normally, this parser function builds up a data structure representing the parsed data. But sometimes you don’t need the full results of the parse: sometimes you might just like to know if the parse succeeds or fails, without building the parsed version. Thus, the “many modes” pattern: we’d like to be able to define our parser and then execute it against one of two modes, emit or check. The emit mode will build the data structure, but check will just check if the parse succeeds.

In the past, chumsky only had one mode, so they always built the data structure. This could take significant time and memory. Adding the “check” mode lets them skip that, which is a significant performance win. Moreover, the modes are encapsulated within the library traits, and aren’t visible to end-users. Nice!

How did chumsky model modes with GATs?

Chumsky added a Mode trait, encapsulated as part of their internals module. Instead of directly constructing the results from parsing, the Parser impls invoke methods on Mode with closures. This allows the mode to decide which parts of the parsing to execute and which to skip. So, in check mode, the Mode would decide not to execute the closure that builds the output data structure, for example.

Using this approach, the Parser trait does indeed have several ’entrypoint’ methods, but they are all defaulted and just invoke a common implementation method called go:

pub trait Parser<'a, I: Input + ?Sized, E: Error<I::Token> = (), S: 'a = ()> {
    type Output;

    fn parse(&self, input: &'a I) -> Result<Self::Output, E> ... {
        self.go::<Emit>(...)
    }

    fn check(&self, input: &'a I) -> Result<(), E> ... {
        self.go::<Check>(...)
    }

    #[doc(hidden)]
    fn go<M: Mode>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E>
    where
        Self: Sized;
}
Implementations of Parser just specify the go method. Note that the impls are, presumably, either contained within chumsky or generated by chumsky proc-macros, so the go method doesn’t need to be documented. However, even if go were documented, the trait bounds certainly look quite reasonable. (The type of inp is a bit…imposing, admittedly.)

So how is the Mode trait defined? Just to focus on the GAT, the trait looks like this:

pub trait Mode { type Output<T>; ... }
Here, the T represents the result type of “some parser parsed in this mode”. GATs thus allow us to define a Mode that is independent from any particular Parser. There are two impls of Mode (also internal to chumsky):

Check, defined like struct Check; impl Mode for Check { type Output<T> = (); ... }. In other words, no matter what parser you use, Check just builds a () result (success or failure is propagated independently of the mode).

Emit, defined like struct Emit; impl Mode for Emit { type Output<T> = T; ... }. In Emit mode, the output is exactly what the parser generated.

Note that you could, in theory, produce other modes. For example, a Count mode that not only computes success/failure but counts the number of nodes parsed, or perhaps a mode that computes hashes of the resulting parsed value. Moreover, you could add these modes (and the defaulted methods in Parser) without breaking any clients.
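Here is a small self-contained sketch of the pattern, to show how the pieces fit together. It is modeled loosely on the description above and is not chumsky’s actual code: the bind method, the digit and two_digits parsers, and the free functions parse and check are hypothetical names chosen for illustration. The parser logic is written once, generic over M: Mode, and the mode decides whether any output gets built.

```rust
/// A mode picks the output representation and decides whether the
/// output-building closures run at all.
pub trait Mode {
    type Output<T>;

    /// Build an output value, or skip building it.
    fn bind<T, F: FnOnce() -> T>(f: F) -> Self::Output<T>;

    /// Merge the outputs of two sub-parsers.
    fn combine<T, U, V, F: FnOnce(T, U) -> V>(
        x: Self::Output<T>,
        y: Self::Output<U>,
        f: F,
    ) -> Self::Output<V>;
}

/// Emit mode: the output is exactly what the parser produced.
pub struct Emit;
impl Mode for Emit {
    type Output<T> = T;
    fn bind<T, F: FnOnce() -> T>(f: F) -> Self::Output<T> {
        f()
    }
    fn combine<T, U, V, F: FnOnce(T, U) -> V>(
        x: Self::Output<T>,
        y: Self::Output<U>,
        f: F,
    ) -> Self::Output<V> {
        f(x, y)
    }
}

/// Check mode: every output is `()`, and the closures are never called.
pub struct Check;
impl Mode for Check {
    type Output<T> = ();
    fn bind<T, F: FnOnce() -> T>(_f: F) -> Self::Output<T> {}
    fn combine<T, U, V, F: FnOnce(T, U) -> V>(
        _x: Self::Output<T>,
        _y: Self::Output<U>,
        _f: F,
    ) -> Self::Output<V> {
    }
}

/// Hypothetical mini-parser: a single ASCII digit.
pub fn digit<M: Mode>(input: &str) -> Option<(M::Output<u32>, &str)> {
    let value = input.chars().next()?.to_digit(10)?;
    // An ASCII digit is one byte, so slicing at 1 is safe here.
    Some((M::bind::<u32, _>(|| value), &input[1..]))
}

/// Two digits combined into one number; written once, usable in any mode.
pub fn two_digits<M: Mode>(input: &str) -> Option<(M::Output<u32>, &str)> {
    let (hi, rest) = digit::<M>(input)?;
    let (lo, rest) = digit::<M>(rest)?;
    let out = M::combine::<u32, u32, u32, _>(hi, lo, |hi, lo| hi * 10 + lo);
    Some((out, rest))
}

pub fn parse(input: &str) -> Option<u32> {
    two_digits::<Emit>(input).map(|(value, _rest)| value)
}

pub fn check(input: &str) -> bool {
    two_digits::<Check>(input).is_some()
}
```

Note that the only bound the parsers need is M: Mode. Adding a third mode would mean one more impl of Mode and no changes to digit or two_digits.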

How could you model this today?

I was trying to think how one might model this problem with traits today. All the options I came up with had significant downsides.

Multiple functions on the trait, or multiple traits. One obvious option would be to use multiple functions in the parse trait, or multiple traits:

// Multiple functions
trait Parser {
    fn parse();
    fn check();
}

// Multiple traits
trait Parser: Checker {
    fn parse();
}

trait Checker {
    fn check();
}
Both of these approaches mean that defining a new combinator requires writing the same logic twice, once for parse and once for check, but with small variations, which is both annoying and a great opportunity for bugs. It also means that if chumsky ever wanted to define a new mode, they would have to modify every implementation of Parser (a breaking change, to boot).

Mode with a type parameter. You could try defining the mode trait with a type parameter, like so…

trait ModeFor<T> { type Output; ... }
The go function would then look like

fn go<M: ModeFor<Self::Output>>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E>
where
    Self: Sized;
In practice, though, this doesn’t really work, for a number of reasons. One of them is that the Mode trait includes methods like combine, which take the output of many parsers, not just one, and combine them together. Good luck writing that constraint with ModeFor. But even ignoring that, lacking HRTB, the signature of go itself is incomplete. The problem is that, given some impl of Parser for some parser type MyParser, MyParser only knows that M is a valid mode for its particular output. But maybe MyParser plans to (internally) use some other parser combinators that produce different kinds of results. Will the mode M still apply to those? We don’t know. We’d have to be able to write a HRTB like for<O> Mode<O>, which Rust doesn’t support yet:

fn go<M: for<O> Mode<O>>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E>
where
    Self: Sized;
But even if Rust did support it, you can see that the Mode<O> trait doesn’t capture the user’s intent as closely as the Mode trait from Chumsky did. The Mode trait was defined independently from all parsers, which is what we wanted. The Mode<O> trait is defined relative to some specific parser, and then it falls to the go function to say “oh, I want this to be a mode for all parsers” using a HRTB.

Using just HRTB (which, again, Rust doesn’t have), you could define another trait…

trait Mode: for<O> ModeFor<O> {}
trait ModeFor<O> {}
…which would allow us to write M: Mode on go again, but it’s hard to argue this is simpler than the original GAT variety. This extra ModeFor trait has a “code smell” to it: it’s hard to understand why it is there. Whereas before, you implemented the Mode trait in just the way you think about it, with a single impl that applies to all parsers…

impl Mode for Check { type Output<T> = (); ... }
…you now write an impl of ModeFor, where one “instance” of the impl applies to only one parser (which has output type O). It feels indirect:

impl<O> ModeFor<O> for Check { type Output = (); ... }
How could you model this with RPITIT?

It’s also been proposed that we should keep GATs, but only as an implementation detail for things like return position impl Trait in traits (RPITIT) or async functions. This implies that we could model the “many modes” pattern with RPITIT. If you look at the Mode trait, though, you’ll see that this simply doesn’t work. Consider the combine method, which takes the results from two parsers and combines them to form a new result:

fn combine<T, U, V, F: FnOnce(T, U) -> V>(
    x: Self::Output<T>,
    y: Self::Output<U>,
    f: F,
) -> Self::Output<V>;
How could we write this in terms of a function that returns impl Trait?

Other patterns

In this post, I went through the chumsky pattern in detail. I’ve not had time to dive quite as deep into other examples, but I’ve been reading through them and trying to extract out patterns. Here are a few patterns I extracted so far:

The “generic scopes” pattern (smithay, playground):

In the Smithay API, if you have some variable r: R where R: Renderer, you can invoke r.render(|my_frame| ...). This will invoke your callback with some frame my_frame that you can then modify. The thing is that the type of my_frame depends on the type of renderer that you have; moreover, frames often include thread-local data and so should only be accessible during that callback.

I called this the “generic scopes” pattern because, at least from a types POV, it is kind of a generic version of APIs like std::thread::scope. The scope function also uses a callback to give limited access to a variable (the “thread scope”), but in the case of std::thread::scope, the type of that scope is hard-coded to be std::thread::Scope, whereas here, we want the specific type to depend on the renderer.

Thanks to GATs, you can express that pretty cleanly, so that the only bound you need is R: Renderer. As with “many modes”, if you tried to express it using features today, you can get part of the way there, but the bounds will be complex and involve HRTB.

The “pointer types” pattern:

I didn’t dig deep enough into Pythonesque’s hypotheticals, but this comment seemed to be describing a desire to talk about “pointer types” in the abstract, which is definitely a common need; looking at the commits from Veloren that pythonesque also cited, this might be a kind of “pointer types” pattern, but I think I might also call it “many modes”.

The “iterable” pattern:

In this pattern, you would like a way to say where C: Iterable, meaning that C is a collection with an iter method which fits the signature fn iter(&self) -> impl Iterator. This is distinct from IntoIterator because it takes &self and thus we can iterate over the same collection many times and concurrently.

The most common workaround is to return a Box (as in graphene) or a collection (as in metamolecular). Neither is zero-cost, which can be a problem in tight loops, as commented here. You can also use HRTB (as rustc does), which is complex and leaky.
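For comparison, this is roughly what the “iterable” pattern looks like once GATs are available. Treat it as a sketch under assumptions rather than any particular crate’s API: the trait name Iterable, its Iter associated type, and the count_pairs client are made up for illustration.

```rust
/// A collection that can be iterated by shared reference, any number of times.
pub trait Iterable {
    type Item;

    /// The iterator borrows from the collection, hence the lifetime parameter.
    type Iter<'a>: Iterator<Item = &'a Self::Item>
    where
        Self: 'a;

    fn iter(&self) -> Self::Iter<'_>;
}

impl<T> Iterable for Vec<T> {
    type Item = T;
    type Iter<'a> = std::slice::Iter<'a, T> where T: 'a;

    fn iter(&self) -> Self::Iter<'_> {
        // Go through the slice so we call `[T]::iter`, not this method again.
        self.as_slice().iter()
    }
}

/// A client that walks the same collection twice at once, which
/// `IntoIterator` on an owned value would not allow.
pub fn count_pairs<C: Iterable>(collection: &C) -> usize
where
    C::Item: PartialEq,
{
    let mut equal_pairs = 0;
    for a in collection.iter() {
        for b in collection.iter() {
            if a == b {
                equal_pairs += 1;
            }
        }
    }
    equal_pairs
}
```

The client’s only collection bound is C: Iterable; there is no boxing and no HRTB in sight.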

Did I miss something?

Maybe you see a way to express the “many modes” pattern (or one of the other patterns I cited) in Rust today that works well? Let me know by commenting on the thread.

(Since posting this, it occurs to me that one could probably use procedural macros to achieve some similar goals, though I think this approach would also have significant downsides.)

What it feels like when Rust saves your bacon

2022-06-15T00:00:00+00:00

You’ve probably heard that the Rust type checker can be a great “co-pilot”, helping you to avoid subtle bugs that would have been a royal pain in the !@#!$! to debug. This is truly awesome! But what you may not realize is how it feels in the moment when this happens. The answer typically is: really, really frustrating! Usually, you are trying to get some code to compile and you find you just can’t do it.

As you come to learn Rust better, and especially to gain a bit of a deeper understanding of what is happening when your code runs, you can start to see when you are getting a type-check error because you have a typo versus because you are trying to do something fundamentally flawed.

A couple of days back, I had a moment where the compiler caught a really subtle bug that would’ve been horrible had it been allowed to compile. I thought it would be fun to narrate a bit how it played out, and also take the moment to explain a bit more about temporaries in Rust (a common source of confusion, in my observations).

Code available in this repository

All the code for this blog post is available in a github repository.

Setting the scene: lowering the AST

In the compiler, we first represent Rust programs using an Abstract Syntax Tree (AST). I’ve prepared a standalone example that shows roughly how the code looks today (of course the real thing is a lot more complex). The AST in particular is found in the ast module containing various data structures that map closely to Rust syntax. So for example we have a Ty type that represents Rust types:

pub enum Ty {
    ImplTrait(TraitRef),
    NamedType(String, Vec<Ty>),
    // ...
}

pub struct Lifetime {
    // ...
}
The impl Trait notation references a TraitRef, which stores the Trait part of things:

pub struct TraitRef {
    pub trait_name: String,
    pub parameters: Parameters,
}

pub enum Parameters {
    AngleBracket(Vec<Parameter>),
    Parenthesized(Vec<Ty>),
}

pub enum Parameter {
    Ty(Ty),
    Lifetime(Lifetime),
}
Note that the parameters of the trait come in two varieties, angle-bracket (e.g., impl PartialEq or impl MyTrait<'a, U>) and parenthesized (e.g., impl FnOnce(String, u32)). These two are slightly different – parenthesized parameters, for example, only accept types, whereas angle-bracket accept types or lifetimes.

After parsing, this AST gets translated to something called High-level Intermediate Representation (HIR) through a process called lowering. The snippet doesn’t include the HIR, but it includes a number of methods like lower_ty that take as input an AST type and produce the HIR type:

impl Context {
    fn lower_ty(&mut self, ty: &ast::Ty) -> hir::Ty {
        match ty {
            // ... lots of stuff here

            // A type like `impl Trait`
            ast::Ty::ImplTrait(trait_ref) => {
                do_something_with(trait_ref);
            }

            // A type like `Vec<T>`, where `Vec` is the name and
            // `[T]` are the `parameters`
            ast::Ty::NamedType(name, parameters) => {
                for parameter in parameters {
                    self.lower_ty(parameter);
                }
            }
        }
        // ...
    }
}
Each method is defined on this Context type that carries some common state, and the methods tend to call one another. For example, lower_signature invokes lower_ty on all of the input (argument) types and on the output (return) type:

impl Context {
    fn lower_signature(&mut self, sig: &ast::Signature) -> hir::Signature {
        for input in &sig.inputs {
            self.lower_ty(input);
        }
        self.lower_ty(&sig.output);
        ...
    }
}
Our story begins

Santiago Pastorino is working on a refactoring to make it easier to support returning impl Trait values from trait functions. As part of that, he needs to collect all the impl Trait types that appear in the function arguments. The challenge is that these types can appear anywhere, and not just at the top level. In other words, you might have fn foo(x: impl Debug), but you might also have fn foo(x: Box<(impl Debug, impl Debug)>). Therefore, we decided it would make sense to add a vector to Context and have lower_ty collect the impl Trait types into it. That way, we can find the complete set.

To do this, we started by adding the vector into this Context. We’ll store the TraitRef from each impl Trait type:

struct Context<'ast> {
    saved_impl_trait_types: Vec<&'ast ast::TraitRef>,
    // ...
}
To do this, we had to add a new lifetime parameter, 'ast, which is meant to represent the lifetime of the AST structure itself. In other words, saved_impl_trait_types stores references into the AST. Of course, once we did this, the compiler got upset and we had to go modify the impl block that references Context:

impl<'ast> Context<'ast> { ... }
Now we can modify the lower_ty to push the trait ref into the vector:

impl<'ast> Context<'ast> {
    fn lower_ty(&mut self, ty: &ast::Ty) {
        match ty {
            ...
            ast::Ty::ImplTrait(trait_ref) => {
                // 👇 push the types into the vector 👇
                self.saved_impl_trait_types.push(trait_ref);
                do_something();
            }
            ast::Ty::NamedType(name, parameters) => {
                ... // just like before
            }
            ...
        }
    }
}
At this point, the compiler gives us an error:

error[E0621]: explicit lifetime required in the type of `ty`
   --> examples/b.rs:125:42
    |
119 |     fn lower_ty(&mut self, ty: &ast::Ty) -> hir::Ty {
    |                                -------- help: add explicit lifetime `'ast` to the type of `ty`: `&'ast ast::Ty`
...
125 |                 self.impl_trait_tys.push(trait_ref);
    |                                          ^^^^^^^^^ lifetime `'ast` required
Pretty nice error, actually! It’s pointing out that we are pushing into this vector which needs references into “the AST”, but we haven’t declared in our signature that the ast::Ty must actually come from “the AST”. OK, let’s fix this:

impl<'ast> Context<'ast> {
    fn lower_ty(&mut self, ty: &'ast ast::Ty) {
        // had to add 'ast here 👆, just like the error message said
        ...
    }
}
Propagating lifetimes everywhere

Of course, now we start getting errors in the functions that call lower_ty. For example, lower_signature says:

error[E0621]: explicit lifetime required in the type of `sig`
  --> examples/b.rs:71:18
   |
65 |     fn lower_signature(&mut self, sig: &ast::Signature) -> hir::Signature {
   |                                        --------------- help: add explicit lifetime `'ast` to the type of `sig`: `&'ast ast::Signature`
...
71 |             self.lower_ty(input);
   |                  ^^^^^^^^ lifetime `'ast` required
The fix is the same. We tell the compiler that the ast::Signature is part of “the AST”, and that implies that the ast::Ty values owned by the ast::Signature are also part of “the AST”:

impl<'ast> Context<'ast> {
    fn lower_signature(&mut self, sig: &'ast ast::Signature) -> hir::Signature {
        //         had to add 'ast here 👆, just like the error message said
        ...
    }
}
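Putting the pieces so far together, here is a trimmed-down version that compiles on its own. It is a sketch under assumptions, not the code from the repository: the HIR return values are dropped, the ast module is cut down to the bare minimum, and the helpers impl_trait_names and example_signature are hypothetical additions so there is something to run.

```rust
pub mod ast {
    pub struct TraitRef {
        pub trait_name: String,
    }

    pub enum Ty {
        ImplTrait(TraitRef),
        NamedType(String, Vec<Ty>),
    }

    pub struct Signature {
        pub inputs: Vec<Ty>,
        pub output: Ty,
    }
}

pub struct Context<'ast> {
    pub saved_impl_trait_types: Vec<&'ast ast::TraitRef>,
}

impl<'ast> Context<'ast> {
    /// `ty` must come from "the AST", since we may store a reference into it.
    pub fn lower_ty(&mut self, ty: &'ast ast::Ty) {
        match ty {
            ast::Ty::ImplTrait(trait_ref) => {
                self.saved_impl_trait_types.push(trait_ref);
            }
            ast::Ty::NamedType(_name, parameters) => {
                for parameter in parameters {
                    self.lower_ty(parameter);
                }
            }
        }
    }

    /// The `'ast` requirement propagates to every caller of `lower_ty`.
    pub fn lower_signature(&mut self, sig: &'ast ast::Signature) {
        for input in &sig.inputs {
            self.lower_ty(input);
        }
        self.lower_ty(&sig.output);
    }
}

/// Hypothetical helper: names of all `impl Trait` types in a signature.
pub fn impl_trait_names(sig: &ast::Signature) -> Vec<&str> {
    let mut cx = Context { saved_impl_trait_types: Vec::new() };
    cx.lower_signature(sig);
    cx.saved_impl_trait_types
        .into_iter()
        .map(|trait_ref| trait_ref.trait_name.as_str())
        .collect()
}

/// Hypothetical example, roughly `fn foo(x: Box<impl Debug>) -> impl Display`.
pub fn example_signature() -> ast::Signature {
    let impl_trait = |name: &str| {
        ast::Ty::ImplTrait(ast::TraitRef { trait_name: name.to_string() })
    };
    ast::Signature {
        inputs: vec![ast::Ty::NamedType("Box".to_string(), vec![impl_trait("Debug")])],
        output: impl_trait("Display"),
    }
}
```

Everything here borrows from a signature that outlives the Context, which is why the borrow checker is happy with it.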
Great. This continues for a bit. But then… we hit this error:

error[E0597]: `parameters` does not live long enough
  --> examples/b.rs:92:53
   |
58 | impl<'ast> Context<'ast> {
   |      ---- lifetime `'ast` defined here
...
92 |                 self.lower_angle_bracket_parameters(&parameters);
   |                 ------------------------------------^^^^^^^^^^^-
   |                 |                                   |
   |                 |                                   borrowed value does not live long enough
   |                 argument requires that `parameters` is borrowed for `'ast`
93 |             }
   |             - `parameters` dropped here while still borrowed
What’s this about?

Uh oh…

Jumping to that line, we see this function lower_trait_ref:

impl<'ast> Context<'ast> {
    // ...
    fn lower_trait_ref(&mut self, trait_ref: &'ast ast::TraitRef) -> hir::TraitRef {
        match &trait_ref.parameters {
            ast::Parameters::AngleBracket(parameters) => {
                self.lower_angle_bracket_parameters(&parameters);
            }
            ast::Parameters::Parenthesized(types) => {
                let parameters: Vec<_> = types.iter().cloned().map(ast::Parameter::Ty).collect();
                self.lower_angle_bracket_parameters(&parameters); // 👈 error is on this line
            }
        }
        hir::TraitRef
    }
    // ...
}
So what’s this about? Well, the purpose of this code is a bit clever. As we saw before, Rust has two syntaxes for trait-refs, you can use parentheses like FnOnce(u32), in which case you only have types, or you can use angle brackets like Foo<'a, u32>, in which case you could have either lifetimes or types. So this code is normalizing to the angle-bracket notation, which is more general, and then using the same lowering helper function.

Wait! Right there! That was the moment!

What?

That was the moment that Rust saved you a world of pain!

It was? It just kind of seemed like an annoying, and I will say, kind of confusing compilation error. What the heck is going on? The problem here is that parameters is a local variable. It is going to be freed as soon as lower_trait_ref returns. But it could happen that lower_trait_ref calls lower_ty which takes a reference to the type and stores it into the saved_impl_trait_types vector. Then, later, some code would try to use that reference, and access freed memory. That would sometimes work, but often not – and if you forgot to test with parenthesized trait refs, the code would work fine forever, so you’d never even notice.

How to fix it

Maybe you’re wondering: great, Rust saved me a world of pain, but how do I fix it? Do I just have to copy the lower_angle_bracket_parameters and have two copies? ‘Cause that’s kind of unfortunate.

Well, there are a variety of ways you might fix it. One of them is to use an arena, like the typed-arena crate. An arena is a memory pool. Instead of storing the temporary Vec<ast::Parameter> on the stack, we’ll put it in an arena, and that way it will live for the entire time that we are lowering things. Example C in the repo takes this approach. It starts by adding the arena field to the Context:

struct Context<'ast> {
    impl_trait_tys: Vec<&'ast ast::TraitRef>,

    // Holds temporary AST nodes that we create during lowering;
    // this can be dropped once lowering is complete.
    arena: &'ast typed_arena::Arena<Vec<ast::Parameter>>,
}
This actually makes a subtle change to the meaning of 'ast. It used to be that the only things with 'ast lifetime were “the AST” itself, so having that lifetime implied being a part of the AST. But now that same lifetime is being used to tag the arena, too, so if we have &'ast Foo, it means the data is owned by either the arena or the AST itself.

Side note: despite the name lifetimes, which I now rather regret, more and more I tend to think of lifetimes like 'ast in terms of “who owns the data”, which you can see in my description in the previous paragraph. You could instead think of 'ast as a span of time (a “lifetime”), in which case it refers to the time that the Context type is valid, really, which must be a subset of the time that the arena is valid and the time that the AST itself is valid, since Context stores references to data owned by both of those.

Now we can rewrite lower_trait_ref to call self.arena.alloc():

impl<'ast> Context<'ast> {
    fn lower_trait_ref(&mut self, trait_ref: &'ast ast::TraitRef) -> hir::TraitRef {
        match &trait_ref.parameters {
            // ...
            ast::Parameters::Parenthesized(types) => {
                let parameters: Vec<_> = types.iter().cloned().map(ast::Parameter::Ty).collect();
                let parameters = self.arena.alloc(parameters); // 👈 added this line!
                self.lower_angle_bracket_parameters(parameters);
            }
        }
        // ...
    }
}

Now the parameters variable is not stored on the stack but allocated in the arena; the arena has 'ast lifetime, so that’s fine, and everything works!

Calling the lowering code and creating the context

Now that we have added the arena, creating the context will look a bit different. It’ll look something like:

let arena = TypedArena::new();
let context = Context::new(&arena);
let hir_signature = context.lower_signature(&signature);

The nice thing about this is that, once we are done with lowering, the context will be dropped and all those temporary nodes will be freed.

Another way to fix it

The other obvious option is to avoid lifetimes altogether and just “clone all the things”. Given that the AST is immutable once constructed, you can just clone them into the vector:

struct Context {
    impl_trait_tys: Vec<ast::TraitRef>, // just clone it!
}

If that clone is too expensive (possible), then use Rc or Arc (this will require deep-ish changes to the AST to put all the things into Rc or Arc that might need to be individually referenced). At this point you’ve got a feeling a lot like garbage collection (if less ergonomic).
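
To make the Rc variant concrete, here is a minimal sketch. The TraitRef struct and the record and lower_temporary functions are hypothetical stand-ins for the real ast types and lowering code, not taken from the example repo. The only point is that a node created temporarily during lowering stays alive for as long as the context holds an Rc to it, so no lifetime parameter is needed:

```rust
use std::rc::Rc;

/// Hypothetical stand-in for `ast::TraitRef`.
struct TraitRef {
    name: String,
}

/// No lifetime parameter: the context shares ownership of the nodes it saves.
struct Context {
    impl_trait_tys: Vec<Rc<TraitRef>>,
}

impl Context {
    /// Save a node by bumping its reference count (cheap, no deep clone).
    fn record(&mut self, trait_ref: &Rc<TraitRef>) {
        self.impl_trait_tys.push(Rc::clone(trait_ref));
    }
}

/// Creates a temporary node, like the parenthesized case in the post.
/// The local `temp` goes out of scope on return, but the node survives
/// because the context still holds a reference-counted handle to it.
fn lower_temporary(cx: &mut Context) {
    let temp = Rc::new(TraitRef { name: "Fn".to_string() });
    cx.record(&temp);
}
```

After lower_temporary returns, the saved node is still readable through the context, and the context holds the only remaining strong reference.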

Yet another way

The way I tend to write compilers these days is to use “indices as pointers”. In this approach, all the data in the AST is stored in vectors, and references between things use indices, kind of like I described here.
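
As a rough illustration of the index-based style, here is a small sketch. The names TyId, Ty, Ast and lower_impl_trait are all made up for this example. Nodes live in a vector owned by the Ast, the context stores plain indices rather than references, and so temporary nodes created during lowering can simply be pushed onto the same vector:

```rust
/// Index into `Ast::tys`; it plays the role that `&'ast Ty` plays in the
/// reference-based design.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TyId(usize);

/// Hypothetical, heavily simplified type node.
#[derive(Debug, PartialEq)]
enum Ty {
    Named(String),
    ImplTrait(String),
}

#[derive(Default)]
struct Ast {
    tys: Vec<Ty>,
}

impl Ast {
    /// Store a node and hand back its index.
    fn alloc(&mut self, ty: Ty) -> TyId {
        let id = TyId(self.tys.len());
        self.tys.push(ty);
        id
    }

    fn get(&self, id: TyId) -> &Ty {
        &self.tys[id.0]
    }
}

/// The context holds indices, so it needs no lifetime parameter.
#[derive(Default)]
struct Context {
    impl_trait_tys: Vec<TyId>,
}

impl Context {
    /// Create a node during lowering and remember it by index.
    fn lower_impl_trait(&mut self, ast: &mut Ast, bound: &str) -> TyId {
        let id = ast.alloc(Ty::ImplTrait(bound.to_string()));
        self.impl_trait_tys.push(id);
        id
    }
}
```

The trade-off is that an index is only meaningful together with the Ast it came from, and nothing in the type system ties the two together.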

Conclusion

Compilation errors are pretty frustrating, but they may also be a sign that the compiler is protecting us from ourselves. In this case, when we embarked on this refactoring, I was totally sure it was going to work fine, because I didn’t realize we ever created “temporary AST” nodes, so I assumed that all the data was owned by the original AST. In a language like C or C++, it would have been very easy to have a bug here, and it would have been a horrible pain to find. With Rust, that’s not a problem.

Of course, not everything is great. For me, doing these kinds of lifetime transformations is old-hat. But for many people it’s pretty non-obvious how to start when the compiler is giving you error messages. When people come to me for help, the first thing I try to do is to suss out: what are the ownership relationships, and where do we expect these references to be coming from? There are also various heuristics that I use to decide: do we need a new lifetime parameter? Can we re-use an existing one? I’ll try to write up more stories like this to clarify that side of things. Honestly, my main point here was that I was just so grateful that Rust prevented us from spending hours and hours debugging a subtle crash!

Looking forward a bit, I see a lot of potential to improve things about our notation and terminology. I think we should be able to make cases like this one much slicker, hopefully without requiring named lifetime parameters and so forth, or as many edits. But I admit I don’t yet know how to do it! :) My plan for now is to keep an eye out for the tricks I am using and the kinds of analysis I am doing in my head and write out blog posts like this one to capture those narratives. I encourage those of you who know Rust well (or who don’t!) to do the same.

Appendix: why not have Context own the TypedArena?

You may have noticed that using the arena had a kind of annoying consequence: people who called Context::new now had to create and supply an arena:

let arena = TypedArena::new();
let context = Context::new(&arena);
let hir_signature = context.lower_signature(&signature);

This is because Context<'ast> stores a &'ast TypedArena<_>, and so the caller must create the arena. If we modified Context to own the arena, then the API could be better. So why didn’t I do that? To see why, check out example D (which doesn’t build). In that example, the Context looks like…

struct Context<'ast> {
    impl_trait_tys: Vec<&'ast ast::TraitRef>,

    // Holds temporary AST nodes that we create during lowering;
    // this can be dropped once lowering is complete.
    arena: typed_arena::Arena<Vec<ast::Parameter>>,
}

You then have to change the signatures of each function to take an &'ast mut self:

impl<'ast> Context<'ast> {
    fn lower_signature(&'ast mut self, sig: &'ast ast::Signature) -> hir::Signature {...}
}

This is saying: the 'ast parameter might refer to data owned by self, or maybe by sig. That seems sensible, but if you try to build Example D, you get lots of errors. Here is one of the most interesting to me:

error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
  --> examples/d.rs:98:17
   |
62 | impl<'ast> Context<'ast> {
   |      ---- lifetime `'ast` defined here
...
97 |                 let parameters = self.arena.alloc(parameters);
   |                                  ----------------------------
   |                                  |
   |                                  immutable borrow occurs here
   |                                  argument requires that `self.arena` is borrowed for `'ast`
98 |                 self.lower_angle_bracket_parameters(parameters);
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

What is this all about? This is actually pretty subtle! This is saying that parameters was allocated from self.arena. That means that parameters will be valid as long as self.arena is valid.

But self is an &mut Context, which means it can mutate any of the fields of the Context. When we call self.lower_angle_bracket_parameters(), it’s entirely possible that lower_angle_bracket_parameters could mutate the arena:

fn lower_angle_bracket_parameters(&'ast mut self, parameters: &'ast [ast::Parameter]) {
    self.arena = TypedArena::new(); // what if we did this?
    // ...
}

Of course, the code doesn’t do that now, but what if it did? The answer is that the parameters would be freed, because the arena that owns them is freed, and so we’d have a dangling reference. D’oh!

All things considered, I’d like to make it possible for Context to own the arena, but right now it’s pretty challenging. This is a good example of code patterns we could enable, but it’ll require language extensions.

Async cancellation: a case study of pub-sub in mini-redis

2022-06-13T00:00:00+00:00

Lately I’ve been diving deep into tokio’s mini-redis example. The mini-redis example is a great one to look at because it’s a realistic piece of quality async Rust code that is both self-contained and very well documented. Digging into mini-redis, I found that it exemplifies the best and worst of async Rust. On the one hand, the code itself is clean, efficient, and high-level. On the other hand, it relies on a number of subtle async conventions that can easily be done wrong – worse, if you do them wrong, you won’t get a compilation error, and your code will “mostly work”, breaking only in unpredictable timing conditions that are unlikely to occur in unit tests. Just the kind of thing Rust tries to avoid! This isn’t the fault of mini-redis – to my knowledge, there aren’t great alternative patterns available in async Rust today (I go through some of the alternatives in this post, and their downsides).

Context: evaluating moro

We’ve heard from many users that async Rust has a number of pitfalls where things can break in subtle ways. In the Async Vision Doc, for example, the Barbara battles buffered streams and solving a deadlock stories discuss challenges with FuturesUnordered (wrapped in the buffered combinator); the Barbara gets burned by select and Alan tries to cache requests, which doesn’t always happen stories talk about cancellation hazards and the select! or race combinators.

In response to these stories, I created an experimental project called moro that explores structured concurrency in Rust. I’ve not yet blogged about moro, and that’s intentional. I’ve been holding off until I gain more confidence in moro’s APIs. In the meantime, various people (including myself) have been porting different bits of code to moro to get a better sense for what works and what doesn’t. GusWynn, for example, started changing bits of the materialize.io codebase to use moro and to have a safer alternative to cancellation. I’ve been poking at mini-redis, and I’ve also been working with some folks within AWS with some internal codebases.

What I’ve found so far is that moro absolutely helps, but it’s not enough. Therefore, instead of the triumphant blog post I had hoped for, I’m writing this one, which does a kind of deep-dive into the patterns that mini-redis uses: both how they work well when done right, but also how they are tedious and error-prone. I’ll be posting some follow-up blog posts that explore some of the ways that moro can help.

What is mini-redis?

If you’ve not seen it, mini-redis is a really cool bit of example code from the tokio project. It implements a “miniature” version of the redis in-memory data store, focusing on the key-value and pub-sub aspects of redis. Specifically, clients can connect to mini-redis and issue a subset of the redis commands. In this post, I’m going to focus on the “pub-sub” aspect of redis, in which clients can publish messages to a topic which are then broadcast to everyone who has subscribed to that topic. Whenever a client publishes a message, it receives in response the number of other clients that are currently subscribed to that topic.

Here is an example workflow involving two clients. Client 1 is subscribing to things, and Client 2 is publishing messages.

sequenceDiagram
    Client1 ->> Server: subscribe `A`
    Client2 ->> Server: publish `foo` to `A`
    Server -->> Client2: 1 client is subscribed to `A`
    Server -->> Client1: `foo` was published to `A`
    Client1 ->> Server: subscribe `B`
    Client2 ->> Server: publish `bar` to `B`
    Server -->> Client2: 1 client is subscribed to `B`
    Server -->> Client1: `bar` was published to `B`
    Client1 ->> Server: unsubscribe A
    Client2 ->> Server: publish `baz` to `A`
    Server -->> Client2: 0 clients are subscribed to `A`

Core data structures

To implement this, the redis server maintains a struct State that is shared across all active clients. Since it is shared across all clients, it is maintained in a Mutex (source):

struct Shared {
    /// The shared state is guarded by a mutex. […]
    state: Mutex<State>,
    …
}

Within this State struct, there is a pub_sub field (source):

pub_sub: HashMap<String, broadcast::Sender<Bytes>>,
The pub_sub field stores a big hashmap. The key is the topic and the value is the broadcast::Sender, which is the “sender half” of a tokio broadcast channel. Whenever a client issues a publish command, it ultimately calls Db::publish, which winds up invoking send on this broadcast channel:

pub(crate) fn publish(&self, key: &str, value: Bytes) -> usize {
    let state = self.shared.state.lock().unwrap();

    state
        .pub_sub
        .get(key)
        // On a successful message send on the broadcast channel, the number
        // of subscribers is returned. An error indicates there are no
        // receivers, in which case, `0` should be returned.
        .map(|tx| tx.send(value).unwrap_or(0))
        // If there is no entry for the channel key, then there are no
        // subscribers. In this case, return `0`.
        .unwrap_or(0)
}

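
To see the shape of this data structure in isolation, here is a sketch that uses only the standard library. It is not the mini-redis code: std has no broadcast channel, so each topic is modelled as a list of std::sync::mpsc senders (one per subscriber), and the Db, State and subscribe definitions below are simplified assumptions. As in the real code, publish reports how many subscribers received the message:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Simplified stand-in for mini-redis's `State`: each topic maps to the
/// senders of its current subscribers.
struct State {
    pub_sub: HashMap<String, Vec<Sender<String>>>,
}

struct Db {
    state: Mutex<State>,
}

impl Db {
    fn new() -> Self {
        Db { state: Mutex::new(State { pub_sub: HashMap::new() }) }
    }

    /// Register a new subscriber for `key` and return its receiving half.
    fn subscribe(&self, key: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        let mut state = self.state.lock().unwrap();
        state.pub_sub.entry(key.to_string()).or_default().push(tx);
        rx
    }

    /// Send `value` to every subscriber of `key`; returns how many got it.
    fn publish(&self, key: &str, value: &str) -> usize {
        let state = self.state.lock().unwrap();
        state
            .pub_sub
            .get(key)
            // A send fails once the receiver has been dropped, so only
            // live subscribers are counted.
            .map(|subs| {
                subs.iter()
                    .filter(|tx| tx.send(value.to_string()).is_ok())
                    .count()
            })
            // No entry for the topic means no subscribers.
            .unwrap_or(0)
    }
}
```

Publishing to a topic nobody subscribed to returns 0, and dropping a receiver makes later publishes stop counting it.
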
The subscriber loop

We just saw how, when clients publish data to a channel, that winds up invoking send on a broadcast channel. But how do the clients who are subscribed to that channel receive those messages? The answer lies in the Subscribe command.

The idea is that the server has a set subscriptions of subscribed channels for the client (source):

let mut subscriptions = StreamMap::new();
This is implemented using a tokio StreamMap, which is a neato data structure that takes multiple streams which each yield up values of type V, gives each of them a key K, and combines them into one stream that yields up (K, V) pairs. In this case, the streams are the “receiver half” of those broadcast channels, and the keys are the channel names.

When it receives a subscribe command, then, the server wants to do the following:

Add the receivers for each subscribed channel into subscriptions.

Loop:

- If a message is published to subscriptions, then send it to the client.
- If the client subscribes to new channels, add those to subscriptions and send an acknowledgement to the client.
- If the client unsubscribes from some channels, remove them from subscriptions and send an acknowledgement to the client.
- If the client terminates, end the loop and close the connection.

“Show me the state”

Learning to write Rust code is basically an exercise in asking “show me the state” — i.e., the key to making Rust code work is knowing what data is going to be modified and when¹. In this case, there are a few key pieces of state…

The set subscriptions of “broadcast receivers” from each subscribed stream

There is also a set self.channels of “pending channel names” that ought to be subscribed to, though this is kind of an implementation detail and not essential.

The connection connection used to communicate with the client (a TCP socket)

And there are three concurrent tasks going on, each of which access that same state…

Looking for published messages from subscriptions and forwarding to connection (reads subscriptions, writes to connection)

Reading client commands from connection and then either…

- subscribing to new channels (writes to subscriptions) and sending a confirmation (writes to connection);
- or unsubscribing from channels (writes to subscriptions) and sending a confirmation (writes to connection).

Watching for termination and then cancelling everything (drops the broadcast handles in subscriptions).

You can start to see that this is going to be a challenge. There are three conceptual tasks, but each of them needs mutable access to the same data:

flowchart LR
    forward["Forward published messages to client"]
    client["Process subscribe/unsubscribe messages from client"]
    terminate["Watch for termination"]
    subscriptions[("subscriptions:\nHandles from\nsubscribed channels")]
    connection[("connection:\nTCP stream\nto/from\nclient")]

    forward -- reads --> subscriptions
    forward -- writes --> connection
    client -- reads --> connection
    client -- writes --> subscriptions
    terminate -- drops --> subscriptions

    style forward fill:oldlace
    style client fill:oldlace
    style terminate fill:oldlace
    style subscriptions fill:pink
    style connection fill:pink

If you tried to do this with normal threads, it just plain wouldn’t work…

let mut subscriptions = vec![]; // close enough to a StreamMap for now
std::thread::scope(|s| {
    s.spawn(|| subscriptions.push("key1"));
    s.spawn(|| subscriptions.push("key2"));
});

If you try this on the playground, you’ll see it gets an error because both closures are trying to access the same mutable state. No good. So how does it work in mini-redis?

Enter select!, our dark knight

Mini-redis is able to juggle these three threads through careful use of the select! macro. This is pretty cool, but also pretty error-prone — as we’ll see, there are a number of subtle points in the way that select! is being used here, and it’s easy to write the code wrong and have surprising bugs. At the same time, it’s pretty neat that we can use select! in this way, and it begs the question of whether we can find safer patterns to achieve the same thing. I think right now you can find safer ones, but they require less efficiency, which isn’t really living up to Rust’s promise (though it might be a good idea). I’ll cover that in a follow-up post, though, for now I just want to focus on explaining what mini-redis is doing and the pros and cons of this approach.

The main loop looks like this (source):

let mut subscriptions = StreamMap::new();
loop {
    …
    select! {
        Some((channel_name, msg)) = subscriptions.next() => ...
        //                          -------------------- future 1
        res = dst.read_frame() => ...
        //    ---------------- future 2
        _ = shutdown.recv() => ...
        //  --------------- future 3
    }
}

select! is kind of like a match statement. It takes multiple futures (underlined in the code above) and continues executing them until one of them completes. Since the select! is in a loop, and in this case each of the futures is producing a series of events, this setup effectively runs the three futures concurrently, processing events as they arrive:

subscriptions.next() – the future waiting for the next message to arrive on the StreamMap

dst.read_frame() – the async method read_frame is defined on the connection, dst. It reads data from the client, parses it into a complete command, and returns that command. We’ll dive into this function in a bit – it turns out that it is written in a very careful way to account for cancellation

shutdown.recv() – the mini-redis server signals a global shutdown by threading a tokio channel to every connection; when a message is sent to that channel, all the loops cleanup and stop.

How select! works

So, select! runs multiple futures concurrently until one of them completes. In practice, this means that it iterates down the futures, one after the other. Each future gets awoken and runs until it either yields (meaning, awaits on something that isn’t ready yet) or completes. If the future yields, then select! goes to the next future and tries that one.

Once a future completes, though, the select! gets ready to complete. It begins by dropping all the other futures that were selected. This means that they immediately stop executing at whatever await point they reached, running any destructors for things on the stack. As I described in a previous blog post, in practice this feels a lot like a panic! that is injected at the await point. And, just like any other case of recovering from an exception, it requires that code is written carefully to avoid introducing bugs – tomaka describes one such example in his blog post. These bugs are what gives async cancellation in Rust a reputation for being difficult.
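
The poll-in-order-then-drop behavior can be sketched by hand with nothing but the standard library. This is a toy model, not how tokio implements select!: the select2 function, the Either enum and the NeverReady future are invented for illustration, and the loop busy-polls where a real executor would sleep until a waker fires. It does show the key step, namely that the losing future is dropped (running its destructor) as soon as the other one completes:

```rust
use std::cell::RefCell;
use std::future::{ready, Future};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling by hand in a sketch.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical future that never completes and logs when it is dropped.
struct NeverReady {
    log: Rc<RefCell<Vec<&'static str>>>,
}
impl Future for NeverReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}
impl Drop for NeverReady {
    fn drop(&mut self) {
        self.log.borrow_mut().push("losing future dropped");
    }
}

enum Either<A, B> {
    First(A),
    Second(B),
}

/// Poll `a`, then `b`, in order, until one completes. Returning drops the
/// other future at whatever point it had reached.
fn select2<A: Future, B: Future>(a: A, b: B) -> Either<A::Output, B::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut a = Box::pin(a);
    let mut b = Box::pin(b);
    loop {
        if let Poll::Ready(v) = a.as_mut().poll(&mut cx) {
            return Either::First(v);
        }
        if let Poll::Ready(v) = b.as_mut().poll(&mut cx) {
            return Either::Second(v);
        }
        // Busy-polls; a real executor would wait for a wake-up here.
    }
}

/// Race a future that never finishes against one that is ready at once.
fn demo() -> (i32, Vec<&'static str>) {
    let log = Rc::new(RefCell::new(Vec::new()));
    let value = match select2(NeverReady { log: Rc::clone(&log) }, ready(7)) {
        Either::First(()) => unreachable!("NeverReady cannot complete"),
        Either::Second(v) => v,
    };
    let events = log.borrow().clone();
    (value, events)
}
```

Calling demo() yields the value from the ready future together with a log entry showing that the pending future's destructor ran when select2 returned.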

Cancellation and mini-redis

Let’s talk through what cancellation means for mini-redis. As we saw, the select! here is effectively running two distinct tasks (as well as waiting for shutdown):

Waiting on subscriptions.next() for a message to arrive from subscribed channels, so it can be forwarded to the client.

Waiting on dst.read_frame() for the next command from the client, so that we can modify the set of subscribed channels.

We’ll see that mini-redis is coded carefully so that, whichever of these events occurs first, everything keeps working correctly. We’ll also see that this setup is fragile – it would be easy to introduce subtle bugs, and the compiler would not help you find them.

Take a look back at the sample subscription workflow at the start of this post. After Client1 has subscribed to A, the server is effectively waiting for Client1 to send further messages, or for other clients to publish.

The code that checks for further messages from Client1 is an async function called read_frame. It has to read the raw bytes sent by the client and assemble them into a “frame” (a single command). The read_frame in mini-redis is written in a particular way:

It loops and, for each iteration…

- tries to parse a complete frame from self.buffer,
- if self.buffer doesn’t contain a complete frame, then it reads more data from the stream into the buffer.

In pseudocode, it looks like (source):

impl Connection {
    async fn read_frame(&mut self) -> Result<Option<Frame>> {
        loop {
            if let Some(f) = parse_frame(&self.buffer) {
                return Ok(Some(f));
            }

            read_more_data_into_buffer(&mut self.buffer).await;
        }
    }
}

The key idea is that the function buffers up data until it can read an entire frame (i.e., successfully complete) and then it removes that entire frame at once. It never removes part of a frame from the buffer. This ensures that if the read_frame function is canceled while awaiting more data, nothing gets lost.
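
The “all or nothing” buffer discipline can be shown without any async machinery. In the sketch below, the newline-terminated frame format and the receive method (standing in for the awaited socket read) are assumptions made for illustration, not the mini-redis protocol. The property to notice is that parse_frame either removes one whole frame or leaves the buffer untouched, so abandoning a read halfway through loses nothing:

```rust
/// Hypothetical connection whose frames are lines terminated by b'\n'.
struct Connection {
    buffer: Vec<u8>,
}

impl Connection {
    /// Stand-in for the awaited socket read: bytes arrive in arbitrary chunks.
    fn receive(&mut self, chunk: &[u8]) {
        self.buffer.extend_from_slice(chunk);
    }

    /// Remove and return one complete frame. If the buffer does not hold a
    /// complete frame yet, return `None` and leave the buffer as it was.
    fn parse_frame(&mut self) -> Option<String> {
        let end = self.buffer.iter().position(|&b| b == b'\n')?;
        // Take the frame and its terminator out of the buffer in one step.
        let frame: Vec<u8> = self.buffer.drain(..=end).collect();
        Some(String::from_utf8_lossy(&frame[..end]).into_owned())
    }
}
```

If only part of a command has arrived, parse_frame returns None and the partial bytes stay in the buffer, which is exactly the state a later call (after a cancellation) would pick up from.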

Ways to write a broken read_frame

There are many ways to write a version of read_frame that is NOT cancel-safe. For example, instead of storing the buffer in self, one could put the buffer on the stack:

impl Connection {
    async fn read_frame(&mut self) -> Result<Option<Frame>> {
        let mut buffer = vec![];

        loop {
            if let Some(f) = parse_frame(&buffer) {
                return Ok(Some(f));
            }

            read_more_data_into_buffer(&mut buffer).await;
            //                                      -----
            // If future is canceled here,
            // buffer is lost.
        }
    }
}

This setup is broken because, if the future is canceled when awaiting more data, the buffered data is lost.

Alternatively, read_frame could intersperse reading from the stream and parsing the frame itself:

impl Connection {
    async fn read_frame(&mut self) -> Result<Option<Frame>> {
        let mut buffer = vec![];
        let command_name = self.read_command_name().await;
        match command_name {
            "subscribe" => self.parse_subscribe_command().await,
            "unsubscribe" => self.parse_unsubscribe_command().await,
            "publish" => self.parse_publish_command().await,
            ...
        }
    }
}

The problem here is similar: if we are canceled while awaiting one of the parse_foo_command futures, then we will forget the fact that we read the command_name already.

Comparison with JavaScript

It is interesting to compare Rust’s Future model with JavaScript’s Promise model. In JavaScript, when an async function is called, it implicitly creates a new task. This task has “independent life”, and it keeps executing even if nobody ever awaits it. In Rust, invoking an async fn returns a Future, but that is inert. A Future only executes when some task awaits it. (You can create a task by invoking a suitable spawn method on your runtime, and then it will execute on its own.)
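
The “inert until polled” behavior is easy to observe with only the standard library. In this sketch the record function and the no-op waker are invented for the demonstration, and the future is polled by hand instead of through a runtime. Calling record does not run its body; the body only runs once the returned future is polled:

```rust
use std::cell::RefCell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling by hand in a sketch.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical async fn whose only effect is to note that it ran.
async fn record(log: &RefCell<Vec<&'static str>>) {
    log.borrow_mut().push("body ran");
}

/// Returns the log length after *calling* `record` and after *polling* it.
fn call_then_poll() -> (usize, usize) {
    let log = RefCell::new(Vec::new());

    // Calling the async fn only builds the future; nothing has run yet.
    let future = record(&log);
    let before = log.borrow().len();

    // Polling is what actually drives the body.
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    let done = future.as_mut().poll(&mut cx);
    assert!(matches!(done, Poll::Ready(())));

    let after = log.borrow().len();
    (before, after)
}
```

The log is empty after the call and has one entry after the poll, which is the difference from the JavaScript model described above.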

There are really good reasons for Rust’s model: in particular, it is a zero-cost abstraction (or very close to it). In JavaScript, if you have one async function, and you factor out a helper function, you just went from one task to two tasks, meaning twice as much load on the scheduler. In Rust, if you have an async fn and you factor out a helper, you still have one task; you also still allocate basically the same amount of stack space. This is a good example of the “performant” (“idiomatic code runs efficiently”) Rust design principle in action.

However, at least as we’ve currently set things up, the Rust model does have some sharp edges. We’ve seen three ways to write read_frame, and only one of them works. Interestingly, all three of them would work in JavaScript, because in the JS model, an async function always starts a task and hence maintains its context.

I would argue that this represents a serious problem for Rust, because it represents a failure to maintain the “reliability” principle (“if it compiles, it works”), which ought to come first and foremost for us. The result is that async Rust feels a bit more like C or C++, where performant and versatile take top rank, and one has to have a lot of experience to know how to avoid sharp edges.

Now, I am not arguing Rust should adopt the “Promises” model – I think the Future model is better. But I think we need to tweak something to recover that reliability.

Comparison with threads

It’s interesting to compare mini-redis written in async Rust with a mini-redis implemented with threads. It turns out that the threaded version would also be challenging, but in different ways. To start, let’s write up some pseudocode for what we are trying to do:

let mut subscriptions = StreamMap::new();

spawn(async move {
    while let Some((channel_name, msg)) = subscriptions.next().await {
        connection.send_message(channel_name, msg);
    }
});

spawn(async move {
    while let Some(frame) = connection.read_frame().await {
        match frame {
            Subscribe(new_channel) => subscribe(&mut connection, new_channel),
            Unsubscribe(channel) => unsubscribe(&mut connection, channel),
            _ => ...,
        }
    }
});

Here we have spawned out two threads, one of which is waiting for new messages from the subscriptions, and one of which is processing incoming client messages (which may involve adding channels to the subscriptions map).

There are two problems here. First, you may have noticed I didn’t handle server shutdown! That turns out to be kind of a pain in this setup, because tearing down those spawned tasks is harder than you might think. For simplicity, I’m going to skip that for the rest of the post – it turns out that moro’s APIs solve this problem in a really nice way by allowing shutdown to be imposed externally without any deep changes.

Second, those two threads are both accessing subscriptions and connection in a mutable way, which the Rust compiler will not accept. This is a key problem. Rust’s type system works really well when you can break down your data such that every task accesses distinct data (i.e., “spatially disjoint”), either because each task owns the data or because they have &mut references to different parts of it. We have a much harder time dealing with multiple tasks accessing the same data but at different points in time (i.e., “temporally disjoint”).

Use an arc-mutex?

The main way to manage multiple tasks sharing access to the same data is with some kind of interior mutability, typically an Arc<Mutex<T>>. One problem with this is that it fails Rust’s performant design principle (“idiomatic code runs efficiently”), because there is runtime overhead (even if it is minimal in practice, it doesn’t feel good). Another problem with Arc<Mutex<T>> is that it hits on a lot of Rust’s ergonomic weak points, failing our “supportive” principle (“the language, tools, and community are here to help”):

You have to allocate the arcs and clone references explicitly, which is annoying;

You have to invoke methods like lock, get back lock guards, and understand how destructors and lock guards interact;

In async code in particular, thanks to #57478, the compiler doesn’t understand very well when a lock guard has been dropped, resulting in annoying compiler errors – though Eric Holk is close to landing a fix for this one! 🎉

Of course, people who remember the “bad old days” of async Rust before async-await are very familiar with this dynamic. In fact, one of the big selling points of adding async await sugar into Rust was getting rid of the need to use arc-mutex.
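
For reference, here is roughly what the arc-mutex pattern looks like with plain std threads. This is a sketch, not mini-redis code, and the function name is made up. It shows the first two items from the list above: the explicit Arc clone for each thread and the lock guard whose destructor releases the lock:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Two threads push into one shared vector guarded by a mutex.
fn subscribe_from_two_threads() -> Vec<String> {
    let subscriptions = Arc::new(Mutex::new(Vec::<String>::new()));
    let mut handles = Vec::new();

    for key in ["key1", "key2"] {
        // Each thread needs its own handle to the shared state.
        let subscriptions = Arc::clone(&subscriptions);
        handles.push(thread::spawn(move || {
            let mut guard = subscriptions.lock().unwrap();
            guard.push(key.to_string());
            // `guard` is dropped here, which releases the lock.
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // The threads may run in either order, so sort before returning.
    let mut keys = subscriptions.lock().unwrap().clone();
    keys.sort();
    keys
}
```

It compiles and works, but every access now pays for a lock, and the cloning and guard handling have to be written out by hand.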

Deeper problems

But the ergonomic pitfalls of Arc<Mutex<T>> are only the beginning. It’s also just really hard to get Arc<Mutex<T>> to actually work for this setup. To see what I mean, let’s dive a bit deeper into the state for mini-redis. There are two main bits of state we have to think about:

the tcp-stream to the client

the StreamMap of active connections

Managing access to the tcp-stream for the client is actually relatively easy. For one thing, tokio streams support a split operation, so it is possible to take the stream and split out the “sending half” (for sending messages to the client) and the “receiving half” (for receiving messages from the client). All the active threads can send data to the client, so they all need the sending half, and presumably it’ll have to be wrapped in an (async aware) mutex. But only one active thread needs the receiving half, so it can own that, and avoid any locks.

Managing access to the StreamMap of active connections, though, is quite a bit more difficult. Imagine we were to put that StreamMap itself into an Arc<Mutex<T>>, so that both tasks can access it. Now one of the tasks is going to be waiting for new messages to arrive. It’s going to look something like this:

let mut subscriptions = Arc::new(Mutex::new(StreamMap::new()));

spawn(async move {
    while let Some((channel_name, msg)) = subscriptions.lock().unwrap().next().await {
        connection.send_message(channel_name, msg);
    }
});

However, this code won’t compile (thankfully!). The problem is that we are acquiring a lock but we are trying to hold onto that lock while we await, which means we might switch to other tasks with the lock being held. This can easily lead to deadlock if those other tasks try to acquire the lock, since the tokio scheduler and the O/S scheduler are not cooperating with one another.

An alternative would be to use an async-aware mutex like tokio::sync::Mutex, but that is also not great: we can still wind up with a deadlock, but for another reason. The server is now prevented from adding a new subscription to the list until the lock is released, which means that if Client1 is trying to subscribe to a new channel, it has to wait for some other client to send a message to an existing channel to do so (because that is when the lock is released). Not great.

Actually, this whole saga is covered under another async vision doc “status quo” story, Alan thinks he needs async locks.

A third alternative: actors

Recognizing the problems with locks, Alice Ryhl some time ago wrote a nice blog post, “Actors with Tokio”, that explains how to set up actors. This pattern actually helps to address both our problems around mutable state. The idea is to move the connections array so that it belongs solely to one actor. Instead of directly modifying connections, the other tasks will communicate with this actor by exchanging messages.

So basically there could be two actors, or even three:

Actor A, which owns the connections (list of subscribed streams). It receives messages that are either publishing new messages to the streams or messages that say “add this stream” to the list.

Actor B, which owns the “read half” of the client’s TCP stream. It reads bytes and parses new frames, then sends out requests to the other actors in response. For example, when a subscribe message comes in, it can send a message to Actor A saying “subscribe the client to this channel”.

Actor C, which owns the “write half” of the client’s TCP stream. Both actors A and B will send messages to it when there are things to be sent to client.

To see how this would be implemented, take a look at Alice’s post. The TL;DR is that you would model connections between actors as tokio channels. Each actor is either spawned or otherwise set up to run independently. You still wind up using select!, but you only use it to receive messages from multiple channels at once. This doesn’t present any cancellation hazards because the channel code is carefully written to avoid them.
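
To give a feel for the shape of the actor approach without pulling in tokio, here is a sketch of something like Actor A that uses a std thread and std::sync::mpsc channels in place of a spawned task and tokio channels. The Msg enum and spawn_subscription_actor are hypothetical names for this example. The actor is the sole owner of the subscription set, and everyone else talks to it by sending messages:

```rust
use std::collections::HashSet;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Requests the other tasks can send to the actor.
enum Msg {
    Subscribe(String),
    Unsubscribe(String),
    /// Ask how many channels are subscribed; the answer comes back on the
    /// enclosed reply channel.
    Count(Sender<usize>),
}

/// Start the actor and return a handle for sending it messages.
fn spawn_subscription_actor() -> (Sender<Msg>, thread::JoinHandle<()>) {
    let (tx, rx) = channel::<Msg>();
    let handle = thread::spawn(move || {
        // Owned by this thread alone, so no lock is needed.
        let mut subscriptions = HashSet::new();
        // The loop ends once every sender has been dropped.
        for msg in rx {
            match msg {
                Msg::Subscribe(name) => {
                    subscriptions.insert(name);
                }
                Msg::Unsubscribe(name) => {
                    subscriptions.remove(&name);
                }
                Msg::Count(reply) => {
                    // Ignore the error if the asker has gone away.
                    let _ = reply.send(subscriptions.len());
                }
            }
        }
    });
    (tx, handle)
}
```

Shutdown in this sketch falls out of channel closure: dropping the last sender ends the actor's loop, so the thread can be joined.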

This setup works fine, and is even elegant in its own way, but it’s also not living up to Rust’s concept of performant or the goal of “zero-cost abstractions” (ZCA). In particular, the idea with ZCA is that it is supposed to give you a model that says “if you wrote this by hand, you couldn’t do any better”. But if you wrote a mini-redis server in C, by hand, you probably wouldn’t adopt actors. In some sense, this is just adopting something much closer to the Promise model. (Plus, the most obvious way to implement actors in tokio is largely to use tokio::spawn, which definitely adds overhead, or to use FuturesUnordered, which can be a bit subtle as well – moro does address these problems by adding a nice API here.)

(The other challenge with actors implemented this way is coordinating shutdown, though it can certainly be done: you just have to remember to thread the shutdown handler around everywhere.)

Cancellation as the “dark knight”: looking again at select!

Taking a step back, we’ve now seen that trying to use distinct tasks introduces this interesting problem that we have shared data being accessed by all the tasks. That either pushes us to locks (broken) or actors (works), but either way, it raises the question: why wasn’t this a problem with select!? After all, select! is still combining various logical tasks, and those tasks are still touching the same variables, so why is the compiler ok with it?

The answer is closely tied to cancellation: the select! setup works because

the things running concurrently are not touching overlapping state:

one of them is looking at subscriptions (waiting for a message);

another is looking at connection;

and the last one is receiving the termination message.

and once we decide which one of these paths to take, we cancel all the others.

This last part is key: if we receive an incoming message from the client, for example, we drop the future that was looking at subscriptions, canceling it. That means subscriptions is no longer in use, so we can push new subscriptions into it, or remove things from it.
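
The borrow-checker side of this can be reproduced in a few lines of std-only Rust. In the sketch below, NextMessage is a hypothetical stand-in for the future returned by subscriptions.next(): it holds a &mut to the subscription list for as long as it exists. While that future is alive the list cannot be touched, but once the future is dropped (which is what cancellation amounts to) the borrow ends and the list can be modified again:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling by hand in a sketch.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for `subscriptions.next()`: it keeps a mutable
/// borrow of the subscription list while it waits.
struct NextMessage<'a> {
    subscriptions: &'a mut Vec<String>,
}

impl Future for NextMessage<'_> {
    type Output = String;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<String> {
        // No message ever arrives in this sketch, so it just keeps waiting.
        let _watching = self.subscriptions.len();
        Poll::Pending
    }
}

fn subscribe_after_cancel() -> Vec<String> {
    let mut subscriptions = vec!["A".to_string()];
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut waiting = Box::pin(NextMessage { subscriptions: &mut subscriptions });
    assert!(waiting.as_mut().poll(&mut cx).is_pending());
    // A `subscriptions.push(..)` here would be rejected by the compiler:
    // `waiting` still holds the `&mut` borrow.

    drop(waiting); // "cancel" the future, which ends its borrow
    subscriptions.push("B".to_string());
    subscriptions
}
```

This is the “temporally disjoint” access pattern from earlier: the same data is used mutably by two logical tasks, just never at the same time.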

So, cancellation is both what enables the mini-redis example to be performant and a zero-cost abstraction, but it is also the cause of our reliability hazards. That’s a pickle!

Conclusions

We’ve seen a lot of information, so let me try to sum it all up for you:

Fine-grained cancellation in select! is what enables async Rust to be a zero-cost abstraction and to avoid the need to create either locks or actors all over the place.

Fine-grained cancellation in select! is the root cause of a LOT of reliability problems.

You’ll note that I wrote fine-grained cancellation. What I mean by that is specifically things like how select! will cancel the other futures. This is very different from coarse-grained cancellation like having the entire server shutdown, for which I think structured concurrency solves the problem very well.

So what can we do about fine-grained cancellation? Well, the answer depends.

In the short term, I value reliability above all, so I think adopting an actor-like pattern is a good idea. This setup can be a nice architecture for a lot of reasons², and while I’ve described it as “not performant”, that assumes you are running a really high-scale server that has to handle a ton of load. For most applications, it will perform very well indeed.

I think it makes sense to be very judicious about what you put in a select!. In the context of Materialize, GusWynn was experimenting with a Selectable trait for precisely this reason; that trait just permits selecting from a few sources, like channels. It’d be nice to support some convenient way of declaring that an async fn is cancel-safe, e.g. only allowing it to be used in select! if it is tagged with #[cancel_safe]. (This might be something one could author as a proc macro.)

But in the longer term, I’m interested if we can come up with a mechanism that will allow the compiler to get smarter. For example, I think it’d be cool if we could share one &mut across two async fn that are running concurrently, so long as that &mut is not borrowed across an await point. I have thoughts on that but…not for this post.
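For the short-term advice, here is a rough sketch of the actor-like shape. To keep it self-contained it uses plain threads and `std::sync::mpsc` rather than tokio tasks, which is my substitution; the `Request` enum and `spawn_subscription_actor` are hypothetical names. The idea carries over: one owner holds the state, and everyone else talks to it by message.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Messages understood by the (hypothetical) subscription actor.
enum Request {
    Subscribe(String),
    // The caller supplies a channel on which to receive the answer.
    Count(Sender<usize>),
}

// Spawns the actor and returns a handle for sending it requests.
fn spawn_subscription_actor() -> Sender<Request> {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        // The state lives in exactly one place: no locks, no shared `&mut`.
        let mut subscriptions: Vec<String> = Vec::new();
        // The loop ends once every `Sender` has been dropped.
        for request in rx {
            match request {
                Request::Subscribe(topic) => subscriptions.push(topic),
                Request::Count(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(subscriptions.len());
                }
            }
        }
    });
    tx
}

fn count_after_two_subscriptions() -> usize {
    let actor = spawn_subscription_actor();
    actor.send(Request::Subscribe(String::from("news"))).unwrap();
    actor.send(Request::Subscribe(String::from("sports"))).unwrap();

    let (reply_tx, reply_rx) = channel();
    actor.send(Request::Count(reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}
```

Nothing here is ever cancelled halfway through a state update, which is the reliability win. The price is a message hop for every access.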

My experience is that being forced to get a clear picture on this is part of what makes Rust code reliable in practice. ↩︎

It’d be fun to take a look at Reactive Design Patterns and examine how many of them apply to Rust. I enjoyed that book a lot. ↩︎

Coherence and crate-level where-clauses

2022-04-17T00:00:00+00:00

Rust has been wrestling with coherence more-or-less since we added methods; our current rule, the “orphan rule”, is safe but overly strict. Roughly speaking, the rule says that one can only implement foreign traits (that is, traits defined by one of your dependencies) for local types (that is, types that you define). The goal of this rule was to help foster the crates.io ecosystem — we wanted to ensure that you could grab any two crates and use them together, without worrying that they might define incompatible impls that can’t be combined. The rule has served us well in that respect, but over time we’ve seen that it can also have a kind of chilling effect, unintentionally working against successful composition of crates in the ecosystem. For this reason, I’ve come to believe that we will have to weaken the orphan rule. The purpose of this post is to write out some preliminary exploration of ways that we might do that.

So wait, how does the orphan rule protect composition?

You might be wondering how the orphan rule ensures you can compose crates from crates.io. Well, imagine that there is a crate widget that defines a struct Widget:

// crate widget
#[derive(PartialEq, Eq)]
pub struct Widget {
    pub name: String,
    pub code: u32,
}

As you can see, the crate has derived Eq, but neglected to derive Hash. Now, I am writing another crate, widget-factory, that depends on widget. I’d like to store widgets in a hashset, but I can’t, because they don’t implement Hash! Today, if you want Widget to implement Hash, the only way is to open a PR against widget and wait for a new release.¹ But if we didn’t have the orphan rule, we could just define Hash ourselves:

// Crate widget-factory
impl Hash for Widget {
    fn hash(&self) {
        // PSA: Don’t really define your hash functions like this omg.
        self.name.hash() ^ self.code.hash()
    }
}

Now we can define our WidgetFactory using HashSet…

pub struct WidgetFactory {
    produced: HashSet<Widget>,
}

impl WidgetFactory {
    fn take_produced(&mut self) -> HashSet<Widget> {
        std::mem::take(&mut self.produced)
    }
}

OK, so far so good, but what happens if somebody else defines a widget-delivery crate and they too wish to use a HashSet? Well, they will also define Hash for Widget, but of course they might do it differently, maybe even very badly:

// Crate widget-delivery
impl Hash for Widget {
    fn hash(&self) {
        // PSA: You REALLY shouldn’t define your hash functions this way omg
        0
    }
}

Now the problem comes when I try to develop my widget-app crate that depends on widget-delivery and widget-factory. I now have two different impls of Hash for Widget, so which should the compiler use?

There are a bunch of answers we might give here, but most of them are bad:

We could have each crate use its own impl, in theory: but that wouldn’t work so well if the user tried to take a HashSet from one crate and pass it to another crate.

The compiler could pick one of the two impls arbitrarily, but how do we know which one to use? In this case, one of them would give very bad performance, but it’s also possible that some code is designed to expect the exact hash algorithm it specified.

This is even harder with associated types.

Users could tell us which impl they want, which is maybe better, but it also means that the widget-delivery crates have to be prepared that any impl they are using might be switched to another one by some other crate later on. This makes it impossible for us to inline the hash function or do other optimizations except at the very last second.

Faced with these options, we decided to just rule out orphan impls altogether. Too much hassle!
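For reference, this is roughly what the newtype workaround looks like under today's rules. It is only a sketch: the inline `widget` module stands in for the upstream crate, and `HashableWidget` and `distinct_widgets` are names I made up.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Stand-in for the upstream `widget` crate, which derives Eq but not Hash.
mod widget {
    #[derive(PartialEq, Eq)]
    pub struct Widget {
        pub name: String,
        pub code: u32,
    }
}

// A local wrapper type. Because the type is local, the orphan rule lets us
// implement the foreign trait `Hash` for it.
#[derive(PartialEq, Eq)]
struct HashableWidget(widget::Widget);

impl Hash for HashableWidget {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed each field into the hasher in turn.
        self.0.name.hash(state);
        self.0.code.hash(state);
    }
}

// Counts distinct widgets by wrapping each one before it enters the set.
fn distinct_widgets(widgets: Vec<widget::Widget>) -> usize {
    widgets
        .into_iter()
        .map(HashableWidget)
        .collect::<HashSet<_>>()
        .len()
}
```

It works, but every use site now has to wrap and unwrap, and two crates that each invent their own wrapper still cannot exchange a `HashSet` with one another.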

But the orphan rules make it hard to establish a standard

The orphan rules work well at ensuring that we can link two crates together, but ironically they can also work to make actual interop much harder. Consider the async runtime situation. Right now, there are a number of async runtimes, but no convenient way to write code that works with any runtime. As a result, people writing async libraries often wind up writing directly against one specific runtime. The end result is that we cannot combine libraries that were written against different runtimes, or at least that doing so can result in surprising failures.

It would be nice if we could implement some traits that allowed for greater interop. But we don’t quite know what those traits should look like (we also lack support for async fn in traits, but that’s coming!), so it would be nice if we could introduce those traits in the crates.io ecosystem and iterate a bit there — this was indeed the original vision for the futures crate! But if we do that, in practice, then the same crate that defines the trait must also define an implementation for every runtime. The problem is that the runtimes won’t want to depend on the futures crate, as it is still unstable; and the futures crate doesn’t want to have to depend on every runtime. So we’re kind of stuck. And of course if the futures crate were to take a dependency on some specific runtime, then that runtime couldn’t later add futures as a dependency, since that would result in a cycle.

Distinguishing “I need an impl” from “I provide an impl”

At the end of the day, I think we’re going to have to lift the orphan rule, and just accept that it may be possible to create crates that cannot be linked together because they contain overlapping impls. However, we can still give people the tools to ensure that composition works smoothly.

I would like to see us distinguish (at least) two cases:

I need this type to implement this trait (which maybe it doesn’t, yet).

I am supplying an impl of a trait for a given type.

The idea would be that most crates can just declare that they need an impl without actually supplying a specific one. Any number of such crates can be combined together without a problem (assuming that they don’t put inconsistent conditions on associated types).

Then, separately, one can have a crate that actually supplies an impl of a foreign trait for a foreign type. These impls can be isolated as much as possible. The hope is that only the final binary would be responsible for actually supplying the impl itself.

Where clauses are how we express “I need an impl” today

If you think about it, expressing “I need an impl” is something that we do all the time, but we typically do it with generic types. For example, when I write a function like so…

fn clone_list<T: Clone>(v: &[T]) { … }
I am saying “I need a type T and I need it to implement Clone”, but I’m not being specific about what those types are.

In fact, it’s also possible to use where-clauses to specify things about non-generic types…

fn example()
where
    u32: Copy,
{
}

…but the compiler today is a bit inconsistent about how it treats those. The plan is to move to a model where we “trust” what the user wrote: e.g., if the user wrote where String: Copy, then the function would treat the String type as if it were Copy, even if we can’t find any Copy impl. It so happens that such a function could never be called, but that’s no reason you can’t define it².
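Both flavors of “I need an impl” can be tried on today's compiler, as long as the non-generic bound actually holds. In this small sketch the function bodies and the name `double` are mine, added only so there is something to run:

```rust
// Generic form: "I need some T, and I need it to implement Clone".
fn clone_list<T: Clone>(v: &[T]) -> Vec<T> {
    v.to_vec()
}

// Non-generic form: the where-clause names a concrete type. This is accepted
// today because `u32: Copy` really does hold.
fn double(x: u32) -> u32
where
    u32: Copy,
{
    let copy = x; // copying `x` leaves it usable below
    x + copy
}
```

Swapping the bound for `String: Copy` is rejected at the definition on stable Rust today, which is the inconsistency the “trust the user” model would smooth over.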

Where clauses at the crate scope

What if we could put where clauses at the crate scope? We could use that to express impls that we need to exist without actually providing those impls. For example, the widget-factory crate from our earlier example might add a line like this into its lib.rs:

// Crate widget-factory
where Widget: Hash;

As a result, people would not be able to use that crate unless they either (a) supplied an impl of Hash for Widget or (b) repeated the where clause themselves, propagating the request up to the crates that depend on them. (Same as with any other where-clause.)

The intent would be to do the latter, propagating the dependencies up to the root crate, which could then either supply the impl itself or link in some other crate that does.
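Crate-level where-clauses don't exist, but the propagation pattern is the same one generic functions already follow. As a loose analogy only (functions playing the role of crates, with hypothetical names), each layer repeats the requirement and only the root picks a type that satisfies it:

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Plays the role of `widget-factory`: it needs `Hash`, but supplies no impl.
fn count_distinct<W>(produced: Vec<W>) -> usize
where
    W: Hash + Eq,
{
    produced.into_iter().collect::<HashSet<W>>().len()
}

// Plays the role of a crate depending on `widget-factory`: it repeats the
// where-clause, passing the requirement up to its own callers.
fn delivery_report<W>(produced: Vec<W>) -> String
where
    W: Hash + Eq,
{
    format!("{} distinct widgets", count_distinct(produced))
}

// Plays the role of the root crate: it finally commits to a concrete type
// that comes with an actual impl.
#[derive(PartialEq, Eq, Hash)]
struct Widget {
    name: String,
    code: u32,
}
```

Calling `delivery_report` with a `Vec<Widget>` is the point at which the requirement is finally discharged.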

Allow crates to implement foreign traits for foreign types

The next part of the idea would be to allow crates to implement foreign traits for foreign types. I think I would convert the orphan check into a “deny by default” lint. The lint text would explain that these impls are not permitted because they may cause linker errors, but a crate could mark the impl with #[allow(orphan_impls)] to ignore that warning. Best practice would be to put orphan impls into their own crate that others can use.

Another idea: permit duplicate impls (especially those generated via derive)

Josh Triplett floated another interesting idea, which is that we could permit duplicate impls. One common example might be if the impl is defined via a derive (though we’d have to extend derive to permit one to derive on a struct definition that is not local somehow).

Conflicting where clauses

Even if you don’t supply an actual impl, it’s possible to create two crates that can’t be linked together if they contain contradictory where-clauses. For example, perhaps widget-factory defines Widget as an iterator over strings…

// Widget-factory
where Widget: Iterator<Item = String>;

…whilst widget-lib wants Widget to be an iterator over UUIDs:

// Widget-lib
where Widget: Iterator<Item = UUID>;

At the end of the day, at most one of these where-clauses can be satisfied, not both, so the two crates would not interoperate. That seems inevitable and ok.

Expressing target dependencies via where-clauses

Another idea that has been kicking around is the idea of expressing portability across target-architectures via traits and some kind of Platform type. As an example, one could imagine having code that says where Platform: NativeSimd to mean “this code requires native SIMD support”, or perhaps where Platform: Windows to mean “this must support various Windows APIs”. This is just a “kernel” of an idea; I have no idea what the real trait hierarchy would look like, but it’s quite appealing and seems to fit well with the idea of crate-level where-clauses. Essentially the idea is to allow crates to “constrain the environment that they are used in” in an explicit way.

Module-level generics

In truth, the idea of crate-level where clauses is kind of a special case of having module-level generics, which I would very much like. The idea would be to allow modules (like types, functions, etc) to declare generic parameters and where-clauses.³ These would be nameable and usable from all code within the module, and when you referenced an item from outside the module, you would have to specify their value. This is very much like how a trait-level generic gets “inherited” by the methods in the trait.

I have wanted this for a long time because I often have modules where all the code is parameterized over some sort of “context parameter”. In the compiler, that is the lifetime 'tcx, but very often it’s some kind of generic type (e.g., Interner in salsa).

Conclusion

I discussed a few things in this post:

How coherence helps composability by ensuring that crates can be linked together, but harms composability by making it much harder to establish and use interoperability traits.

How crate-level where-clauses can allow us to express “I need someone to implement this trait” without actually providing an impl, providing for the ability to link things together.

A sketch of how crate-level where-clauses might be generalized to capture other kinds of constraints on the environment, such as conditions on the target platform, or to module-level generics, which could potentially be an ergonomic win.

Overall, I feel pretty excited about this direction. I feel like more and more things are becoming possible if we think about generalizing the trait system and making it more uniform. All of this, in my mind, builds on the work we’ve been doing to create a more precise definition of the trait system in a-mir-formality and to build up a team with expertise in how it works (see the types team RFC). I’ll write more about those in upcoming posts though! =)

You could also create a newtype and make your hashmap key off the newtype, but that’s more of a workaround, and doesn’t always work out. ↩︎

It might be nice of us to give a warning. ↩︎

Fans of ML will recognize this as “applicative functors”. ↩︎

Implied bounds and perfect derive

2022-04-12T00:00:00+00:00

There are two ergonomic features that have been discussed for quite some time in Rust land: perfect derive and expanded implied bounds. Until recently, we were a bit stuck on the best way to implement them. Recently though I’ve been working on a new formulation of the Rust trait checker that gives us a bunch of new capabilities; among them, it resolved a soundness problem that would have prevented these two features from being combined. I’m not going to describe my fix in detail in this post, though; instead, I want to ask a different question. Now that we can implement these features, should we?

Both of these features fit nicely into the less rigamarole part of the lang team Rust 2024 roadmap. That is, they allow the compiler to be smarter and require less annotation from you to figure out what code should be legal. Interestingly, as a direct result of that, they both also carry the same downside: semver hazards.

What is a semver hazard?

A semver hazard occurs when you have a change which feels innocuous but which, in fact, can break clients of your library. Whenever you try to automatically figure out some part of a crate’s public interface, you risk some kind of semver hazard. This doesn’t necessarily mean that you shouldn’t do the auto-detection: the convenience may be worth it. But it’s usually worth asking yourself if there is some way to lessen the semver hazard while still getting similar or the same benefits.

Rust has a number of semver hazards today.¹ The most common example is around thread-safety. In Rust, a struct MyStruct is automatically deemed to implement the trait Send so long as all the fields of MyStruct are Send (this is why we call Send an auto trait: it is automatically implemented). This is very convenient, but an implication of it is that adding a private field to your struct whose type is not thread-safe (e.g., a Rc) is potentially a breaking change: if someone was using your library and sending MyStruct to run in another thread, they would no longer be able to do so.
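Here is a small sketch of the client-side code that such a change would break. `MyStruct` is the hypothetical library type from the paragraph above; `assert_send` and `client_code` are helper names I made up.

```rust
use std::thread;

// "Version 1" of the library type: every field is Send, so MyStruct is Send
// automatically, without the author ever writing that down.
pub struct MyStruct {
    count: u32,
}

// Compiles only when T is Send; a common trick for pinning the property down.
fn assert_send<T: Send>() {}

// What a client of the library might reasonably do.
fn client_code() -> u32 {
    assert_send::<MyStruct>();
    let value = MyStruct { count: 7 };
    // Moving `value` into another thread requires `MyStruct: Send`.
    thread::spawn(move || value.count).join().unwrap()
}

// If a later release added a private field such as `cache: std::rc::Rc<u32>`,
// both the `assert_send` call and the `thread::spawn` call above would stop
// compiling, even though no public signature changed.
```

A library author who wants to commit to `Send` can keep a call like `assert_send::<MyStruct>()` in their own test suite, so that an accidental regression fails their build rather than their users' builds.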

What is “perfect derive”?

So what is the perfect derive feature? Currently, when you derive a trait (e.g., Clone) on a generic type, the derive just assumes that all the generic parameters must be Clone. This is sometimes necessary, but not always; the idea of perfect derive is to change how derive works so that it instead figures out exactly the bounds that are needed.

Let’s see an example. Consider this List type, which creates a linked list of T elements. Suppose that List can be deref’d to yield its &T value. However, lists are immutable once created, and we also want them to be cheaply cloneable, so we use Rc to store the data itself:

#[derive(Clone)]
struct List<T> {
    data: Rc<T>,
    next: Option<Rc<List<T>>>,
}

impl<T> Deref for List<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.data
    }
}

Currently, derive is going to generate an impl that requires T: Clone, like this…

impl<T> Clone for List<T>
where
    T: Clone,
{
    fn clone(&self) -> Self {
        List {
            data: self.data.clone(),
            next: self.next.clone(),
        }
    }
}

If you look closely at this impl, though, you will see that the T: Clone requirement is not actually necessary. This is because the only T in this struct is inside of an Rc, and hence is reference counted. Cloning the Rc only increments the reference count, it doesn’t actually create a new T.
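You can check this by writing the impl by hand and leaving the bound off. In the sketch below, `NotClone` and `clone_a_list` are made-up names; the `List` type matches the one above, minus the derive.

```rust
use std::rc::Rc;

struct List<T> {
    data: Rc<T>,
    next: Option<Rc<List<T>>>,
}

// Hand-written impl with no `T: Clone` bound at all.
impl<T> Clone for List<T> {
    fn clone(&self) -> Self {
        List {
            // Bumps the reference count; no `T` is ever copied.
            data: Rc::clone(&self.data),
            next: self.next.clone(),
        }
    }
}

// A payload type that deliberately does not implement Clone.
struct NotClone(u32);

// Returns the payload seen through the clone and the resulting Rc count.
fn clone_a_list() -> (u32, usize) {
    let list = List { data: Rc::new(NotClone(5)), next: None };
    let copy = list.clone();
    (copy.data.0, Rc::strong_count(&list.data))
}
```

The strong count of 2 confirms that the original and the clone share one `NotClone` value rather than each owning a copy.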

With perfect derive, we would change the derive to generate an impl with one where clause per field, instead. The idea is that what we really need to know is that every field is cloneable (which may in turn require that T be cloneable):

impl<T> Clone for List<T>
where
    Rc<T>: Clone, // type of the `data` field
    Option<Rc<List<T>>>: Clone, // type of the `next` field
{
    fn clone(&self) -> Self { /* as before */ }
}

Making perfect derive sound was tricky, but we can do it now

This idea is quite old, but there were a few problems that have blocked us from doing it. First, it requires changing all trait matching to permit cycles (currently, cycles are only permitted for auto traits like Send). This is because checking whether List<T> is Send would now require checking whether Option<Rc<List<T>>> is Send. If you work that through, you’ll find that a cycle arises. I’m not going to talk much about this in this post, but it is not a trivial thing to do: if we are not careful, it would make Rust quite unsound indeed. For now, though, let’s just assume we can do it soundly.

The semver hazard with perfect derive

The other problem is that it introduces a new semver hazard: just as Rust currently commits you to being Send so long as you don’t have any non-Send types, derive would now commit List to being cloneable even when T: Clone does not hold.

For example, perhaps we decide that storing a Rc for each list wasn’t really necessary. Therefore, we might refactor List to store T directly, like so:

#[derive(Clone)]
struct List<T> {
    data: T,
    next: Option<Rc<List<T>>>,
}

We might expect that, since we are only changing the type of a private field, this change could not cause any clients of the library to stop compiling. With perfect derive, we would be wrong.² This change means that we now own a T directly, and so List<T>: Clone is only true if T: Clone.

Expanded implied bounds

An implied bound is a where clause that you don’t have to write explicitly. For example, if you have a struct that declares T: Ord, like this one…

struct RedBlackTree<T: Ord> { … }

impl<T: Ord> RedBlackTree<T> {
    fn insert(&mut self, value: T) { … }
}

…it would be nice if functions that worked with a red-black tree didn’t have to redeclare those same bounds:

fn insert_smaller<T>(red_black_tree: &mut RedBlackTree<T>, item1: T, item2: T) {
    // Today, this function would require `where T: Ord`:
    if item1 < item2 {
        red_black_tree.insert(item1);
    } else {
        red_black_tree.insert(item2);
    }
}

I am saying expanded implied bounds because Rust already has two notions of implied bounds: expanding supertraits (T: Ord implies T: PartialOrd, for example, which is why the fn above can contain item1 < item2) and outlives relations (an argument of type &'a T, for example, implies that T: 'a). The most maximal version of this proposal would expand those implied bounds from supertraits and lifetimes to any where-clause at all.
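For comparison, this is the version that compiles today, with the bound written out. It is only a sketch: a `BTreeSet` stands in for a real red-black tree, and `new` and `contains` are helpers I added so the example can be exercised.

```rust
use std::collections::BTreeSet;

// The struct declares `T: Ord`; a BTreeSet is just a stand-in for the tree.
struct RedBlackTree<T: Ord> {
    items: BTreeSet<T>,
}

impl<T: Ord> RedBlackTree<T> {
    fn new() -> Self {
        RedBlackTree { items: BTreeSet::new() }
    }

    fn insert(&mut self, value: T) {
        self.items.insert(value);
    }

    fn contains(&self, value: &T) -> bool {
        self.items.contains(value)
    }
}

// Today the bound has to be repeated here, even though any valid
// `RedBlackTree<T>` already guarantees it.
fn insert_smaller<T>(tree: &mut RedBlackTree<T>, item1: T, item2: T)
where
    T: Ord,
{
    // `<` comes from PartialOrd, a supertrait of Ord. That is one of the
    // implied bounds Rust already has.
    if item1 < item2 {
        tree.insert(item1);
    } else {
        tree.insert(item2);
    }
}
```

With expanded implied bounds, the `where T: Ord` clause on `insert_smaller` is the part that could be left out.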

Implied bounds and semver

Expanding the set of implied bounds will also introduce a new semver hazard, or perhaps it would be better to say that it expands an existing semver hazard. It’s already the case that removing a supertrait from a trait is a breaking change: if the stdlib were to change trait Ord so that it no longer extended Eq, then Rust programs that just wrote T: Ord would no longer be able to assume that T: Eq, for example.

Similarly, at least with a maximal version of expanded implied bounds, removing the T: Ord from RedBlackTree would potentially stop client code from compiling. Making changes like that is not that uncommon. For example, we might want to introduce new methods on RedBlackTree that work even without ordering. To do that, we would remove the T: Ord bound from the struct and just keep it on the impl:

struct RedBlackTree<T> { … }

impl<T> RedBlackTree<T> {
    fn len(&self) -> usize { /* doesn’t need to compare `T` values, so no bound */ }
}

impl<T: Ord> RedBlackTree<T> {
    fn insert(&mut self, value: T) { … }
}

But, if we had a maximal expansion of implied bounds, this could cause crates that depend on your library to stop compiling, because they would no longer be able to assume that RedBlackTree<X> being valid implies X: Ord. As a general rule, I think we want it to be clear what parts of your interface you are committing to and which you are not.
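Here is a runnable sketch of that relaxed layout, so you can see what the library author gains from it. A sorted `Vec` stands in for the real tree, and `new` and `Opaque` are names I made up.

```rust
// No bound on the struct any more; a sorted Vec stands in for the tree.
struct RedBlackTree<T> {
    items: Vec<T>,
}

// Methods that never compare elements are available for every `T`.
impl<T> RedBlackTree<T> {
    fn new() -> Self {
        RedBlackTree { items: Vec::new() }
    }

    fn len(&self) -> usize {
        self.items.len()
    }
}

// Only the methods that need ordering carry the bound.
impl<T: Ord> RedBlackTree<T> {
    fn insert(&mut self, value: T) {
        // Keep the Vec sorted and skip values that are already present.
        if let Err(index) = self.items.binary_search(&value) {
            self.items.insert(index, value);
        }
    }
}

// A type with no ordering at all.
struct Opaque;
```

`RedBlackTree::<Opaque>::new().len()` now type-checks. The flip side is that a downstream function that had been relying on the struct's bound as an implied `T: Ord` would no longer have it.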

PSA: Removing bounds not always semver compliant

Interestingly, while it is true that you can remove bounds from a struct (today, at least) and be semver compliant³, this is not the case for impls. For example if I have

impl<T: Copy> MyTrait for Vec<T> { }
and I change it to impl<T> MyTrait for Vec<T>, this is effectively introducing a new blanket impl, and that is not a semver compliant change (see RFC 2451 for more details).

Summarize

So, to summarize:

Perfect derive is great, but it reveals details about your fields. Sure, you can clone your List<T> for any type T now, but maybe you want the right to require T: Clone in the future?

Expanded implied bounds are great, but they prevent you from “relaxing” your requirements in the future. Sure, you only ever have a RedBlackTree<T> for T: Ord now, but maybe you want to support more types in the future?

But also: the rules around semver compliance are rather subtle and quick to anger.

How can we fix these features?

I see a few options. The most obvious of course is to just accept the semver hazards. It’s not clear to me whether they will be a problem in practice, and Rust already has a number of similar hazards (e.g., adding an Rc field makes your type no longer Send).

Another extreme alternative: crate-local implied bounds

Another option for implied bounds would be to expand implied bounds, but only on a crate-local basis. Imagine that the RedBlackTree type is declared in some crate rbtree, like so…

// The crate rbtree
struct RedBlackTree<T: Ord> { .. }

…

impl<T> RedBlackTree<T> {
    fn insert(&mut self, value: T) { … }
}

This impl, because it lives in the same crate as RedBlackTree, would be able to benefit from expanded implied bounds. Therefore, code inside the impl could assume that T: Ord. That’s nice. If I later remove the T: Ord bound from RedBlackTree, I can move it to the impl, and that’s fine.

But if I’m in some downstream crate, then I don’t benefit from implied bounds. If I were going to, say, implement some trait for RedBlackTree, I’d have to repeat T: Ord…

trait MyTrait { }

impl<T> MyTrait for rbtree::RedBlackTree<T>
where
    T: Ord, // required
{ }

A middle ground: declaring “how public” your bounds are

Another variation would be to add a visibility to your bounds. The default would be that where clauses on structs are “private”, i.e., implied only within your module. But you could declare where clauses as “public”, in which case you would be committing to them as part of your semver guarantee:

struct RedBlackTree<T: pub Ord> { .. }
In principle, we could also support pub(crate) and other visibility modifiers.

Explicit perfect derive

I’ve been focused on implied bounds, but the same questions apply to perfect derive. In that case, I think the question is mildly simpler: we likely want some way to extend the derive syntax to “opt in” to the perfect version (or “opt out” from it).

There have been some proposals that would allow you to be explicit about which parameters require which bounds. I’ve been a fan of those, but now that I’ve realized we can do perfect derive, I’m less sure. Maybe we should just want some way to say “add the bounds all the time” (the default today) or “use perfect derive” (the new option), and that’s good enough. We could even make there be a new attribute, e.g. #[perfect_derive(…)] or #[semver_derive]. Not sure.

Conclusion

In the past, we were blocked for technical reasons from expanding implied bounds and supporting perfect derive, but I believe we have resolved those issues. So now we have to think a bit about semver and decide how explicit we want to be.

Side note that, no matter what we pick, I think it would be great to have easy tooling to help authors determine if something is a semver breaking change. This is a bit tricky because it requires reasoning about two versions of your code. I know there is rust-semverer but I’m not sure how well maintained it is. It’d be great to have a simple github action one could deploy that would warn you when reviewing PRs.

Rules regarding semver are documented here, by the way. ↩︎

Actually, you were wrong before: changing the types of private fields in Rust can already be a breaking change, as we discussed earlier (e.g., by introducing a Rc, which makes the type no longer implement Send). ↩︎

Uh, no promises — there may be some edge cases, particularly involving regions, where this is not true today. I should experiment. ↩︎

dyn*: can we make dyn sized?

2022-03-29T00:00:00+00:00

Last Friday, tmandry, cramertj, and I had an exciting conversation. We were talking about the design for combining async functions in traits with dyn Trait that tmandry and I had presented to the lang team on Friday. cramertj had an insightful twist to offer on that design, and I want to talk about it here. Keep in mind that this is a piece of “hot off the presses”, in-progress design and hence may easily go nowhere – but at the same time, I’m pretty excited about it. If it works out, it could go a long way towards making dyn Trait user-friendly and accessible in Rust, which I think would be a big deal.

Background: The core problem with dyn

dyn Trait is one of Rust’s most frustrating features. On the one hand, dyn Trait values are absolutely necessary. You need to be able to build up collections of heterogeneous types that all implement some common interface in order to implement core parts of the system. But working with heterogeneous types is just fundamentally hard because you don’t know how big they are. This implies that you have to manipulate them by pointer, and that brings up questions of how to manage the memory that these pointers point at. This is where the problems begin.

Problem: no memory allocator in core

One challenge has to do with how we factor our allocation. The core crate that is required for all Rust programs, libcore, doesn’t have a concept of a memory allocator. It relies purely on stack allocation. For the most part, this works fine: you can pass ownership of objects around by copying them from one stack frame to another. But it doesn’t work if you don’t know how much stack space they occupy!¹

Problem: Dyn traits can’t really be substituted for impl Trait

In Rust today, the type dyn Trait is guaranteed to implement the trait Trait, so long as Trait is dyn safe. That seems pretty cool, but in practice it’s not all that useful. Consider a simple function that operates on any kind of Debug type:

fn print_me(x: impl Debug) {
    println!("{x:?}");
}

Even though the Debug trait is dyn-safe, you can’t just change the impl above into a dyn:

fn print_me(x: dyn Debug) { .. }
The problem here is that stack-allocated parameters need to have a known size, and we don’t know how big dyn is. The common solution is to introduce some kind of pointer, e.g. a reference:

fn print_me(x: &dyn Debug) { … }
That works ok for this function, but it has a few downsides. First, we have to change existing callers of print_me — maybe we had print_me(22) before, but now they have to write print_me(&22). That’s an ergonomic hit. Second, we’ve now hardcoded that we are borrowing the dyn Debug. There are other functions where this isn’t necessarily what we wanted to do. Maybe we wanted to store that dyn Debug into a datastructure and return it — for example, this function print_me_later returns a closure that will print x when called:

fn print_me_later(x: &dyn Debug) -> impl FnOnce() + '_ {
    move || println!("{x:?}")
}

Imagine that we wanted to spawn a thread that will invoke print_me_later:

fn spawn_thread(value: usize) {
    let closure = print_me_later(&value);
    std::thread::spawn(move || closure()); // <-- Error, 'static bound not satisfied
}

This code will not compile because closure references value on the stack. But if we had written print_me_later with an impl Debug parameter, it could take ownership of its argument and everything would work fine.
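For comparison, here is a sketch of that owning version. I've made two changes so the result can be checked: the closure returns the formatted string instead of printing it, and the `Send + 'static` bounds that `std::thread::spawn` demands are written out explicitly.

```rust
use std::fmt::Debug;
use std::thread;

// Takes ownership of `x`, so the returned closure borrows nothing from the
// caller's stack frame.
fn print_me_later(x: impl Debug + Send + 'static) -> impl FnOnce() -> String + Send + 'static {
    move || format!("{x:?}")
}

fn spawn_thread(value: usize) -> String {
    // `value` is moved into the closure rather than borrowed.
    let closure = print_me_later(value);
    // The closure owns everything it needs, so it satisfies the `'static`
    // bound on `thread::spawn`.
    thread::spawn(closure).join().unwrap()
}
```

The catch is that `print_me_later` is now generic, so each caller gets its own monomorphized copy. That is exactly what reaching for `dyn` was supposed to avoid.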

Of course, we could solve this by writing print_me_later to use Box<dyn Debug>, but that’s hardcoding memory allocation. This is problematic if we want print_me_later to appear in a context, like libcore, that might not even have access to a memory allocator.

fn print_me_later(x: Box<dyn Debug>) -> impl FnOnce() {
    move || println!("{x:?}")
}

In this specific example, the Box is also kind of inefficient. After all, the value x is just a usize, and a Box<usize> is the same size as a usize, so in theory we could just copy the integer around (the usize methods expect an &usize, after all). This is sort of a special case, but it does come up more than you would think at the lower levels of the system, where it may be worth the trouble to try and pack things into a usize. There are a number of futures, for example, that don’t really require much state.

The idea: What if the dyn were the pointer?

In the proposal for “async fns in traits” that tmandry and I put forward, we had introduced the idea of dynx Trait types. dynx Trait types were not an actual syntax that users would ever type; rather, they were an implementation detail. Effectively a dynx Future refers to a pointer to a type that implements Future. They don’t hardcode that this pointer is a Box; instead, the vtable includes a “drop” function that knows how to release the pointer’s referent (for a Box, that would free the memory).

Better idea: What if the dyn were “something of known size”?

After the lang team meeting, tmandry and I met with cramertj, who proceeded to point out to us something very insightful.² The truth is that dynx Trait values don’t have to be a pointer to something that implemented Trait — they just have to be something pointer-sized. tmandry and I actually knew that, but what we didn’t see was how critically important this was:

First, a number of futures, in practice, consist of very little state and can be pointer-sized. For example, reading from a file descriptor only needs to store the file descriptor, which is a 32-bit integer, since the kernel stores the other state. Similarly the future for a timer or other builtin runtime primitive often just needs to store an index.

Second, a dynx Trait lets you write code that manipulates values which may be boxed without directly talking about the box. This is critical for code that wants to appear in libcore or be reusable across any possible context.

As an example of something that would be much easier this way, the Waker struct, which lives in libcore, is effectively a hand-written dynx Waker struct.
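To make the Waker comparison concrete, here is a minimal sketch that uses the real std::task::RawWaker API with a data word that is a plain index rather than a pointer. The names WAKE_COUNTS and index_waker are made up for illustration; a real runtime would keep its own bookkeeping.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{RawWaker, RawWakerVTable, Waker};

// Hypothetical "runtime state": one wake counter per task slot.
static WAKE_COUNTS: [AtomicUsize; 4] = [
    AtomicUsize::new(0),
    AtomicUsize::new(0),
    AtomicUsize::new(0),
    AtomicUsize::new(0),
];

// The hand-written vtable: clone, wake, wake_by_ref, drop.
static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, wake_raw, wake_raw, drop_raw);

unsafe fn clone_raw(data: *const ()) -> RawWaker {
    // Copying the data word is all it takes; there is no referent to refcount.
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake_raw(data: *const ()) {
    // The data word is an index, not an address.
    WAKE_COUNTS[data as usize].fetch_add(1, Ordering::SeqCst);
}

unsafe fn drop_raw(_data: *const ()) {
    // Nothing to free: the hidden value is just an integer.
}

/// Builds a `Waker` whose pointer-sized payload is `index` itself.
fn index_waker(index: usize) -> Waker {
    assert!(index < WAKE_COUNTS.len());
    // SAFETY: the vtable functions never dereference `data` and only touch
    // atomics, so the `RawWaker` contract is upheld.
    unsafe { Waker::from_raw(RawWaker::new(index as *const (), &VTABLE)) }
}
```

Nothing here allocates, and nothing here is a pointer in any meaningful sense; the vtable alone decides what the data word means.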

Finally, and we’ll get to this in a bit, a lot of low-level systems code employs clever tricks where they know something about the layout of a value. For example, you might have a vector that contains values of various types, but (a) all those types have the same size and (b) they all share a common prefix. In that case, you can manipulate fields in that prefix without knowing what kind of data is contained within, and use a vtable or discriminant to do the rest.

In Rust, this pattern is painful to encode, though you can sometimes do it with a Vec<S> where S is some struct that contains the prefix fields and an enum. Enums work ok, but if you have a more open-ended set of types, you might prefer to have trait objects.
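As a rough illustration of the Vec<S> encoding, here is a small sketch in which a struct carries the shared prefix fields plus an enum for the payload. All of the names (Header, Payload, Entry and the two functions) are invented for this example.

```rust
/// Fields shared by every entry: the "common prefix".
#[derive(Debug, Default)]
struct Header {
    generation: u32,
    dirty: bool,
}

/// The closed set of payloads; this is the part that does not scale to an
/// open-ended set of types.
#[derive(Debug)]
enum Payload {
    Timer { deadline_ms: u64 },
    Socket { fd: i32 },
}

#[derive(Debug)]
struct Entry {
    header: Header,
    payload: Payload,
}

/// Touches only the prefix; never needs to know which payload is inside.
fn mark_all_dirty(entries: &mut [Entry]) {
    for entry in entries {
        entry.header.dirty = true;
        entry.header.generation += 1;
    }
}

/// Payload-specific work goes through the discriminant.
fn describe(entry: &Entry) -> String {
    match &entry.payload {
        Payload::Timer { deadline_ms } => format!("timer@{deadline_ms}"),
        Payload::Socket { fd } => format!("socket#{fd}"),
    }
}
```

This works, but every new kind of payload means editing the enum, which is exactly where trait objects start to look attractive.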

A sketch: The dyn-star type

To give you a sense for how cool “fixed-size dyn types” could be, I’m going to start with a very simple design sketch. Imagine that we introduced a new type dyn* Trait, which represents the pair of:

a pointer-sized value of some type T that implements Trait (the * is meant to convey “pointer-sized”³)

a vtable for T: Trait; the drop method in the vtable drops the T value.

For now, don’t get too hung up on the specific syntax. There’s plenty of time to bikeshed, and I’ll talk a bit about how we might truly phase in something like dyn*. For now let’s just talk about what it would be like to use it.

Creating a dyn*

To coerce a value of type T into a dyn* Trait, two constraints must be met:

The type T must be pointer-sized or smaller.

The type T must implement Trait.
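Since dyn* is not real syntax, here is a hand-rolled approximation in today's Rust, specialized to Debug, that shows the pair of "pointer-sized value plus vtable" and checks the two constraints at construction time. Everything in it (DynStarDebug, DebugVTable, VTableOf) is a hypothetical name, the size check is a runtime assert rather than a compile-time rule, and a 'static bound stands in for the lifetime bound discussed in the next section.

```rust
use std::fmt::{self, Debug};
use std::marker::PhantomData;
use std::mem::{align_of, size_of, MaybeUninit};
use std::ptr;

/// Hand-written vtable: how to debug-print and how to drop the hidden value.
struct DebugVTable {
    fmt: unsafe fn(*const (), &mut fmt::Formatter<'_>) -> fmt::Result,
    drop: unsafe fn(*mut ()),
}

unsafe fn fmt_impl<T: Debug>(data: *const (), f: &mut fmt::Formatter<'_>) -> fmt::Result {
    unsafe { Debug::fmt(&*data.cast::<T>(), f) }
}

unsafe fn drop_impl<T>(data: *mut ()) {
    unsafe { ptr::drop_in_place(data.cast::<T>()) }
}

/// Gives us one vtable per concrete `T`.
struct VTableOf<T>(PhantomData<T>);

impl<T: Debug> VTableOf<T> {
    const VTABLE: &'static DebugVTable = &DebugVTable {
        fmt: fmt_impl::<T>,
        drop: drop_impl::<T>,
    };
}

/// Stand-in for `dyn* Debug`: pointer-sized storage plus a vtable.
pub struct DynStarDebug {
    data: MaybeUninit<*const ()>,
    vtable: &'static DebugVTable,
}

impl DynStarDebug {
    pub fn new<T: Debug + 'static>(value: T) -> Self {
        // Constraint 1: T must be pointer-sized or smaller (and fit its alignment).
        assert!(size_of::<T>() <= size_of::<*const ()>());
        assert!(align_of::<T>() <= align_of::<*const ()>());
        let mut data = MaybeUninit::<*const ()>::uninit();
        // SAFETY: the asserts above guarantee that `T` fits in the storage.
        unsafe { data.as_mut_ptr().cast::<T>().write(value) };
        // Constraint 2: `T: Debug` is what lets us pick a vtable.
        DynStarDebug { data, vtable: VTableOf::<T>::VTABLE }
    }
}

impl Debug for DynStarDebug {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // SAFETY: `data` holds a valid `T` matching `vtable`.
        unsafe { (self.vtable.fmt)(self.data.as_ptr().cast::<()>(), f) }
    }
}

impl Drop for DynStarDebug {
    fn drop(&mut self) {
        // SAFETY: the hidden value is dropped exactly once, here.
        unsafe { (self.vtable.drop)(self.data.as_mut_ptr().cast::<()>()) }
    }
}
```

With this, DynStarDebug::new(22usize) stores the integer inline, while DynStarDebug::new(Box::new(String::from("hi"))) stores the box and frees it through the vtable when dropped.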

Converting an impl to a dyn*

Using dyn*, we can convert impl Trait directly to dyn* Trait. This works fine, because dyn* Trait is Sized. To be truly equivalent to impl Trait, you do actually want a lifetime bound, so that the dyn* can represent references too:

// fn print_me(x: impl Debug) {…} becomes
fn print_me(x: dyn* Debug + '_) {
    println!("{x:?}");
}

fn print_me_later(x: dyn* Debug + '_) -> impl FnOnce() + '_ {
    move || println!("{x:?}")
}

These two functions can be directly invoked on a usize (e.g., print_me_later(22) compiles). What’s more, they work on references (e.g., print_me_later(&some_type)) or boxed values (e.g., print_me_later(Box::new(some_type))).

They are also suitable for inclusion in a no-std project, as they don’t directly reference an allocator. Instead, when the dyn* is dropped, we will invoke its destructor from the vtable, which might wind up deallocating memory (but doesn’t have to).
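For comparison, the closest thing that compiles today is the generic version. The sketch below accepts the same three kinds of callers (a usize, a reference, a boxed value), but it is monomorphized per argument type instead of going through a vtable. The function name describe_later is made up, and it returns a String instead of printing so that the result is easy to inspect.

```rust
use std::fmt::Debug;

/// Generic stand-in for the `dyn* Debug + '_` signature.
fn describe_later<'a>(x: impl Debug + 'a) -> impl FnOnce() -> String + 'a {
    move || format!("{x:?}")
}
```

Calling describe_later(22usize), describe_later(&some_string) and describe_later(Box::new(some_vec)) all work, and none of them mention an allocator in the signature.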

More things are dyn* safe than dyn safe

Many things that were hard for dyn Trait values are trivial for dyn* Trait values:

By-value self methods work fine: a dyn* Trait value is sized, so you can move ownership of it just by copying its bytes.

Returning Self, as in the Clone trait, works fine.

Similarly, the fact that trait Clone: Sized doesn’t mean that dyn* Clone can’t implement Clone, although it does imply that dyn Clone: Clone cannot hold.

Function arguments of type impl ArgTrait can be converted to dyn* ArgTrait, so long as ArgTrait is dyn*-safe.

Returning an impl ArgTrait can return a dyn* ArgTrait.

In short, a large number of the barriers that make traits “not dyn-safe” don’t apply to dyn*. Not all, of course. Traits that take parameters of type Self won’t work (we don’t know that two dyn* Trait types have the same underlying type) and we also can’t support generic methods in many cases (we wouldn’t know how to monomorphize)⁴.
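To see why "returning Self" is a barrier with today's dyn Trait, consider the usual workaround: since a Clone supertrait would make the trait not dyn-safe, people add a boxed clone method by hand. This is a sketch of that well-known pattern; Shape, Square and clone_box are illustrative names.

```rust
trait Shape {
    fn area(&self) -> f64;

    /// Stand-in for `Clone::clone`; requiring `Clone` itself as a supertrait
    /// would make `Shape` unusable as `dyn Shape`.
    fn clone_box(&self) -> Box<dyn Shape>;
}

#[derive(Clone)]
struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }

    fn clone_box(&self) -> Box<dyn Shape> {
        Box::new(self.clone())
    }
}

// With the helper in place, the boxed trait object can be cloned.
impl Clone for Box<dyn Shape> {
    fn clone(&self) -> Self {
        (**self).clone_box()
    }
}
```

Note that the Box is baked into the trait's own signature here, which is precisely the hardcoding that a sized dyn would avoid.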

A catch: dyn* Foo requires Box<impl Foo>: Foo and friends

There is one catch from this whole setup, but I like to think of it as an opportunity. In order to create a dyn* Trait from a pointer type like Box<Widget>, you need to know that Box<Widget>: Trait, whereas creating a Box<dyn Trait> just requires knowing that Widget: Trait (this follows directly from the fact that the Box is now part of the hidden type).

At the moment, annoyingly, when you define a trait you don’t automatically get any sort of impls for “pointers to types that implement the trait”. Instead, people often define such impls by hand; for example, the Iterator trait has impls like

impl<I> Iterator for &mut I
where
    I: ?Sized + Iterator

impl<I> Iterator for Box<I>
where
    I: ?Sized + Iterator

Many people forget to define such impls, however, which can be annoying in practice (and not just when using dyn).

I’m not totally sure the best way to fix this, but I view it as an opportunity because if we can supply such impls, that would make Rust more ergonomic overall.

One interesting thing: the impls for Iterator that you see above include I: ?Sized, which makes them applicable to Box<dyn Iterator>. But with dyn* Iterator, we are starting from a Box<impl Iterator> type; in other words, the ?Sized bound is not necessary, because we are creating our “dyn” abstraction around the pointer, which is sized. (The ?Sized is not harmful, either, of course, and if we auto-generate such impls, we should include it so that they apply to old-style dyn as well as slice types like [u8].)
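Here is what writing those "bridging" impls by hand looks like for a small user-defined trait. This is only a sketch of the pattern; Counter, Simple and bump_twice are made-up names.

```rust
trait Counter {
    fn bump(&mut self) -> u32;
}

struct Simple(u32);

impl Counter for Simple {
    fn bump(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

// The bridging impls. `?Sized` lets them cover `dyn Counter` as well.
impl<C: ?Sized + Counter> Counter for &mut C {
    fn bump(&mut self) -> u32 {
        (**self).bump()
    }
}

impl<C: ?Sized + Counter> Counter for Box<C> {
    fn bump(&mut self) -> u32 {
        (**self).bump()
    }
}

/// Accepts an owned counter, a `&mut` to one, or a box, thanks to the impls above.
fn bump_twice(mut c: impl Counter) -> u32 {
    c.bump();
    c.bump()
}
```

Without the two forwarding impls, bump_twice(&mut s) and bump_twice(boxed) would both fail to compile, which is the papercut described above.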

Another catch: “shared subsets” of traits

One of the cool things about Rust’s Trait design is that it allows you to combine “read-only” and “modifier” methods into one trait, as in this example:

trait WidgetContainer {
    fn num_components(&self);
    fn add_component(&mut self, c: WidgetComponent);
}

I can write a function that takes a &mut dyn WidgetContainer and it will be able to invoke both methods. If that function takes &dyn WidgetContainer instead, it can only invoke num_components.

If we don’t do anything else, this flexibility is going to be lost with dyn*. Imagine that we wish to create a dyn* WidgetContainer from some &impl WidgetContainer type. To do that, we would need an impl of WidgetContainer for &W, but we can’t write that code, at least not without panicking:

impl<W> WidgetContainer for &W
where
    W: WidgetContainer,
{
    fn num_components(&self) {
        W::num_components(self) // OK
    }

    fn add_component(&mut self, c: WidgetComponent) {
        W::add_component(self, c) // Error!
    }
}

This problem is not specific to dyn. Imagine I have some code that just invokes num_components but which can be called with a &W or with an Rc<W> or with other such types. It’s kind of awkward for me to write a function like that now: the easiest way is to hardcode that it takes &W and then lean on deref-coercions in the caller.

One idea that tmandry and I have been kicking around is the idea of having “views” on traits. The idea would be that you could write something like T: &WidgetContainer to mean “the &self methods of WidgetContainer”. If you had this idea, then you could certainly have

impl<W> &WidgetContainer for &W
where
    W: WidgetContainer
because you would only need to define num_components (though I would hope you don’t have to write such an impl by hand).

Now, instead of taking a &dyn WidgetContainer, you would take a dyn &WidgetContainer. Similarly, instead of taking an &impl WidgetContainer, you would probably be better off taking an impl &WidgetContainer (this has some other benefits too, as it happens).
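Views do not exist today, so the nearest approximation is to split the trait by hand into a read-only supertrait and a mutating subtrait. The sketch below does that; WidgetContainerRef, Panel and count are names invented for the example, and num_components is given a usize return type so there is something to observe.

```rust
use std::rc::Rc;

struct WidgetComponent;

/// Hand-written stand-in for the `&WidgetContainer` view.
trait WidgetContainerRef {
    fn num_components(&self) -> usize;
}

trait WidgetContainer: WidgetContainerRef {
    fn add_component(&mut self, c: WidgetComponent);
}

struct Panel(Vec<WidgetComponent>);

impl WidgetContainerRef for Panel {
    fn num_components(&self) -> usize {
        self.0.len()
    }
}

impl WidgetContainer for Panel {
    fn add_component(&mut self, c: WidgetComponent) {
        self.0.push(c)
    }
}

// The read-only half can be implemented for shared pointers without panicking.
impl<W: ?Sized + WidgetContainerRef> WidgetContainerRef for &W {
    fn num_components(&self) -> usize {
        (**self).num_components()
    }
}

impl<W: ?Sized + WidgetContainerRef> WidgetContainerRef for Rc<W> {
    fn num_components(&self) -> usize {
        (**self).num_components()
    }
}

/// Works with `&Panel`, `Rc<Panel>`, or an owned `Panel`.
fn count(w: impl WidgetContainerRef) -> usize {
    w.num_components()
}
```

The cost is that the trait author has to anticipate the split up front, which is what a built-in notion of views would remove.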

A third catch: dyn safety sometimes puts constraints on impls, not just the trait itself

Rust’s current design assumes that you have a single trait definition and we can determine from that trait definition whether or not the trait ought to be dyn safe. But sometimes there are constraints around dyn safety that actually don’t affect the trait but only the impls of the trait. That kind of situation doesn’t work well with “implicit dyn safety”: if you determine that the trait is dyn-safe, you have to impose those limitations on its impls, but maybe the trait wasn’t meant to be dyn-safe.

I think overall it would be better if traits explicitly declared their intent to be dyn-safe or not. The most obvious way to do that would be with a declaration like dyn trait:

dyn trait Foo { }
As a nice side benefit, a declaration like this could also auto-generate impls like impl Foo for Box<impl Foo> and so forth. It would also mean that dyn-safety becomes a semver guarantee.

My main concern here is that I suspect most traits could and should be dyn-safe. I think I’d prefer if one had to opt out from dyn safety instead of opting in. I don’t know what the syntax for that would be, of course, and we’d have to deal with backwards compatibility.
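For reference, this is roughly how a crate can pin down dyn-safety as a promise today, without any new syntax: a where Self: Sized clause opts individual methods out, and a never-called function taking &dyn Foo turns an accidental loss of dyn-safety into a compile error. It is a sketch of a common idiom, not a proposal; the method and helper names are invented.

```rust
use std::fmt::Display;

trait Foo {
    fn name(&self) -> String;

    /// Generic methods are not dyn-safe; `where Self: Sized` keeps this one
    /// out of the vtable so the trait as a whole stays dyn-safe.
    fn name_with<T: Display>(&self, suffix: T) -> String
    where
        Self: Sized,
    {
        format!("{}{}", self.name(), suffix)
    }
}

// Compile-time check: this stops compiling if `Foo` ever loses dyn-safety.
fn _assert_dyn_safe(_: &dyn Foo) {}

struct Unit;

impl Foo for Unit {
    fn name(&self) -> String {
        "unit".to_string()
    }
}
```

The intent lives in a helper function rather than in the trait declaration, which is the gap that something like dyn trait Foo would close.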

Phasing things in over an edition

If we could start over again, I think I would approach dyn like this:

The syntax dyn Trait means a pointer-sized value that implements Trait. Typically a Box or & but sometimes other things.

The syntax dyn[T] Trait means “a value that is layout-compatible with T that implements Trait”; dyn Trait is thus sugar for dyn[*const ()] Trait, which we might write more compactly as dyn* Trait.

The syntax dyn[T..] Trait means “a value that starts with a prefix of T but has unknown size and implements Trait”.

The syntax dyn[..] Trait means “some unknown value of a type that implements Trait”.

Meanwhile, we would extend the grammar of a trait bound with some new capabilities:

A bound like &Trait refers to “only the &self methods from Trait”;

A bound like &mut Trait refers to “only the &self and &mut self methods from Trait”;

Probably this wants to include Pin<&mut Self> too? I’ve not thought about that.

We probably want a way to write a bound like Rc<Trait> to mean self: Rc<Self> and friends, but I don’t know what that looks like yet. Those kinds of traits are quite unusual.

I would expect that most people would just learn dyn Trait. The use cases for the dyn[] notation are far more specialized and would come later.

Interestingly, we could phase in this syntax in Rust 2024 if we wanted. The idea would be that we move existing uses of dyn to the explicit form in prep for the new edition:

&dyn Trait, for example, would become dyn* Trait + '_

Box<dyn Trait> would become dyn* Trait (note that a 'static bound is implied today; this might be worth reconsidering, but that’s a separate question).

other uses of dyn Trait would become dyn[..] Trait

Then, in Rust 2024, we would rewrite dyn* Trait to just dyn Trait with an “edition idiom lint”.

Conclusion

Whew! This was a long post. Let me summarize what we covered:

If dyn Trait encapsulated some value of pointer size that implements Trait and not some value of unknown size:

We could expand the set of things that are dyn safe by quite a lot without needing clever hacks:

methods that take by-value self: fn into_foo(self, …)

methods with parameters of impl Trait type (as long as Trait is dyn safe): fn foo(…, impl Trait, …)

methods that return impl Trait values: fn iter(&self) -> impl Iterator

methods that return Self types: fn clone(&self) -> Self

That would raise some problems we have to deal with, but all of them are things that would be useful anyway:

You’d need dyn &Trait and things to “select” sets of methods.

You’d need a more ergonomic way to ensure that Box<impl Trait>: Trait and so forth.

We could plausibly transition to this model for Rust 2024 by introducing two syntaxes, dyn* (pointer-sized) and dyn[..] (unknown size) and then changing what dyn means.

There are a number of details to work out, but among the most prominent are:

Should we declare dyn-safe traits explicitly? (I think yes)

What “bridging” impls should we create when we do so? (e.g., to cover Box<impl Trait>: Trait etc)

How exactly do &Trait bounds work — do you get impls automatically? Do you have to write them?

Appendix A: Going even more crazy: dyn[T] for arbitrary prefixes

dyn* is pretty useful. But we could actually generalize it. You could imagine writing dyn[T] to mean “a value whose layout can be read as T”. What we’ve called dyn* Trait would thus be equivalent to dyn[*const ()] Trait. This more general version allows us to package up larger values; for example, you could write dyn[[usize; 2]] Trait to mean a “two-word value”.

You could even imagine writing dyn[T] where the T meant that you can safely access the underlying value as a T instance. This would give access to common fields that the implementing type must expose or other such things. Systems programming hacks often lean on clever things like this. This would be a bit tricky to reconcile with cases where the T is a type like usize that is just indicating how many bytes of data there are, since if you are going to allow the dyn[T] to be treated like a &mut T the user could go crazy overwriting values in ways that are definitely not valid. So we’d have to think hard about this to make it work, which is why I left it for an Appendix.

Appendix B: The “other” big problems with dyn

I think that the designs in this post address a number of the big problems with dyn:

You can’t use it like impl

Lots of useful trait features are not dyn-safe

You have to write ?Sized on impls to make them work

But it leaves a few problems unresolved. One of the biggest to my mind is the interaction with auto traits (and lifetimes, actually). With generic parameters like T: Debug, I don’t have to talk explicitly about whether T is Send or not or whether T contains lifetimes. I can just write a generic type like struct MyWriter<W> where W: Write { w: W, ... }. Users of MyWriter know what W is, so they can determine whether or not MyWriter<Foo>: Send based on whether Foo: Send, and they also can understand that MyWriter<&'a Foo> includes references with the lifetime 'a. In contrast, if we did struct MyWriter { w: dyn* Write, ... }, that dyn* Write type is hiding the underlying data. As Rust currently stands, it implies that MyWriter is not Send and that it does not contain references. We don’t have a good way for MyWriter to declare that it is “send if the writer you gave me is send” and use dyn*. That’s an interesting problem! But orthogonal, I think, from the problems addressed in this blog post.
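The same tension already shows up with Box<dyn Write> today, so it can be illustrated without dyn*. In this sketch the generic struct inherits Send from its parameter, while the type-erased one has to commit to + Send (or not) in its definition. DynWriter, SendDynWriter, assert_send and greet are illustrative names.

```rust
use std::io::{self, Write};

/// Generic version: auto traits and lifetimes flow through `W`.
struct MyWriter<W: Write> {
    w: W,
}

/// Type-erased versions: `Send`-ness must be decided up front.
struct DynWriter {
    w: Box<dyn Write>,
}

struct SendDynWriter {
    w: Box<dyn Write + Send>,
}

fn assert_send<T: Send>() {}

fn auto_trait_checks() {
    assert_send::<MyWriter<Vec<u8>>>(); // Send because Vec<u8> is Send
    assert_send::<SendDynWriter>(); // Send because we wrote `+ Send`
    // assert_send::<DynWriter>(); // would not compile: `dyn Write` is not `Send`
}

impl<W: Write> MyWriter<W> {
    fn greet(&mut self) -> io::Result<()> {
        self.w.write_all(b"hi")
    }
}
```

There is no way to write a single type-erased struct that is "Send if the writer you gave me is Send", which is the open problem described above.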

But, you are thinking, what about alloca? The answer is that alloca isn’t really a good option. For one thing, it doesn’t work on all targets, but in particular it doesn’t work for async functions, which require a fixed size stack frame. It also doesn’t let you return things back up the stack, at least not easily. ↩︎

Also, cramertj apparently had this idea a long time back but we didn’t really understand it. Ah well, sometimes it goes like that — you have to reinvent something to realize how brilliant the original inventor really was. ↩︎

In truth, I also just think “dyn-star” sounds cool. I’ve always been jealous of the A* algorithm and wanted to name something in a similar way. Now’s my chance! Ha ha! ↩︎

Obviously, we would be lifting this partly to accommodate impl Trait arguments. I think we could lift this restriction in more cases but it’s going to take a bit more design. ↩︎

Dare to ask for more #rust2024

2022-02-09T00:00:00+00:00

Last year, we shipped Rust 2021 and I have found the changes to be a real improvement in usability. Even though the actual changes themselves were quite modest, the combination of precise capture closure and simpler formatting strings (println!("{x:?}") instead of println!("{:?}", x)) is making a real difference in my “day to day” life.¹ Just like NLL and the new module system from Rust 2018, I’ve quickly adapted to these new conventions. When I go back to older code, with its clunky borrow checker workarounds and format strings, I die a little inside.²

As we enter 2022, I am finding my thoughts turning more and more to the next Rust edition. What do I want from Rust, and the Rust community, over the next few years? To me, the theme that keeps coming to mind is dare to ask for more. Rust has gotten quite a bit nicer to use over the last few years, but I am not satisfied. I believe that there is room for Rust to be 22x more productive³ and easy to use than it is today, and I think we can do it without meaningfully sacrificing reliability, performance, or versatility.

Daring to ask for a more ergonomic, expressive Rust

As Rust usage continues to grow, I have been able to talk to quite a number of Rust users with a wide variety of backgrounds and experience. One of the themes I like to ask about is their experience of learning Rust. In many ways, the story here is much better than I had anticipated. Most people are able to learn Rust and feel productive in 3-6 months. Moreover, once they get used to it, most people seem to really enjoy it, and they talk about how learning ownership rules influences the code they write in other languages too (for the better). They also talk about experiencing far fewer bugs in Rust than in other languages – this is true for C++⁴, but it’s also true for things written in Java or other languages⁵.

That said, it’s also quite clear that using Rust has a significant cognitive overhead. Few Rust users feel like true experts⁶. There are a few topics – “where clauses”, “lifetimes” – that people mention over and over as being confusing. The more I talk to people, the more I get the sense that the problem isn’t any one thing, it’s all the things. It’s having to juggle a lot of concerns all at once, and having to get everything lined up before you can even see your code run.

These interviews really validate the work we did on the ergonomics initiative and also in Rust 2021. One person I spoke to said the following:

Looking backwards, NLL and match ergonomics were major improvements in getting people to learn Rust. A lot of people suddenly found stuff way easier. NLL made a lot of things with regard to mutability much simpler. One remaining thing coming up is disjoint capture of fields in closures. That’s another example where people just didn’t understand, “why is this compiler yelling at me? This should work?”

As happy as I am with those results, I don’t think we’re done. I would like to see progress in two different dimensions:

Fundamental simplifications: These are changes like NLL or disjoint-closure-capture that just change the game in terms of what the compiler can accept. Even though these kinds of changes often make the analysis more complex, they ultimately make the language feel simpler: more of the programs that should work actually do work. Simplifications like this tend not to be particularly controversial, but they are difficult to design and implement. Often they require an edition because of small changes to language semantics in various edge cases.

One of the simplest improvements here would be landing polonius, which would fix #47680, a pattern that I see happening with some regularity. I think that there are also language extensions, like scoped contexts, some kind of view types, specialization, or some way to manage self-referential structs, that could fit in this category. That’s a bit trickier. The language grows, which is not a simplification, but it can make common patterns so much simpler that it’s a net win.

Sanding rough edges: These are changes that just make writing Rust code easier. There are fewer “i’s to dot” or “t’s to cross”. Lifetime elision is a good example. You know you are hitting a rough edge when you find yourself blindly following compiler suggestions, or randomly adding an & or a * here or there to see if it will make the compiler happy.

While sanding rough edges can benefit everyone, the impact is largest for newcomers. Experienced folks have a bit of “survival bias”. They tend to know the tricks and apply them automatically. Newcomers don’t have that benefit and can waste quite a lot of time (or just give up entirely) trying to fix some simple compilation errors.

Match ergonomics was a recent change in this category: while I believe it was an improvement, it also gave rise to a number of rough edges, particularly around references to copy types (see #44619 for more discussion). I’d like to see us fix those, and also fix “rough edges” in other areas, like implied bounds.

Daring to ask for a more ergonomic, expressive async Rust

Going along with the previous bullet, I think we still have quite a bit of work to do before using Async Rust feels natural. Tyler Mandry and I recently wrote a post on the “Inside Rust” blog, Async Rust in 2022, that sketched both the way we want async Rust to feel (“just add async”) and the plan to get there.

It seems clear that highly concurrent applications are a key area where Rust shines, so it makes sense for us to continue investing heavily in this area. What’s more, those investments benefit more than just async Rust users. Many of them are fundamental extensions to Rust, like generic associated types⁷ or type alias impl trait⁸, which ultimately benefit everyone.

Having a truly great async Rust experience, however, is going to require more than language extensions. It’s also going to require better tooling, like tokio console, and more efforts at standardization, like the portability and interoperability effort led by nrc.

Daring to ask for a more ergonomic, expressive unsafe Rust

Strange as it sounds, part of what makes Rust as safe as it is is the fact that Rust supports unsafe code. Unsafe code allows Rust programmers to gain access to the full range of machine capabilities, which is what allows Rust to be versatile. Rust programmers can then use ownership/borrowing to encapsulate those raw capabilities in a safe interface, so that clients of that library can rely on things working correctly.

There are some flies in the unsafe ointment, though. The reality is that writing correct unsafe Rust code can be quite difficult.⁹ In fact, because we’ve never truly defined the set of rules that unsafe code authors have to follow, you could even say it is literally impossible, since there is no way to know if you are doing it correctly if nobody has defined what correct is.

To be clear, we do have a lot of promising work here! Stacked borrows, for example, looks to be awfully close to a viable approach for the aliasing rules. The rules are implemented in miri and a lot of folks are using that to check their unsafe code. Finally, the unsafe code guidelines effort made good progress on documenting layout guarantees and other aspects of unsafe code, though that work was never RFC’d or made normative. (The issues on that repo also contain a lot of great discussion.)

I think it’s time we paid good attention to the full experience of writing unsafe code. We need to be sure that people can write unsafe Rust abstractions that are correct. This means, yes, that we need to invest in defining the rules they have to follow. I think we also need to invest time in making correct unsafe Rust code more ergonomic to write. Unsafe Rust today often involves a lot of annotations and casts that don’t necessarily add much to the code¹⁰. There are also some core features, like method dispatch with a raw pointer, that don’t work, as well as features (like unsafe fields) that would help in ensuring unsafe guarantees are met.

Daring to ask for a richer, more interactive experience from Rust’s tooling

Tooling has a huge impact on the experience of using Rust, both as a learner and as a power user. I maintain that the hassle-free experience of rustup and cargo has done as much for Rust’s adoption as our safety guarantees – maybe more. The quality of the compiler’s error messages comes up in virtually every single conversation I have, and I’ve lost count of how many people cite clippy and rustfmt as a key part of their onboarding process for new developers. Furthermore, after many years of ridiculously hard work, Rust’s IDE support is starting to be really, really good. Major kudos to both the rust-analyzer and IntelliJ Rust teams.

And yet, because I’m greedy, I want more. I want Rust to continue its tradition of “groundbreakingly good” tooling. I want you to be able to write cargo test --debug and have your test failures show up automatically in an omniscient debugger that lets you easily determine what happened¹¹. I want profilers that serve up an approachable analysis of where you are burning CPU or allocating memory. I want it to be trivial to “up your game” when it comes to reliability by applying best practices like analyzing and improving code coverage or using a fuzzer to produce inputs.

I’m especially interested in tooling that changes the “fundamental relationship” between the Rust programmer and their programs. The difference between fixing compilation bugs in a modern Rust IDE and using rustc is a good illustration of this. In an IDE, you have the freedom to pick and choose which errors to fix and in which order, and the IDEs are getting good enough these days that this works quite well. Feedback is swift. This can be a big win.

I think we can do more like this. I would like to see people learning how the borrow checker works by “stepping through” code that doesn’t pass the borrow check, seeing the kinds of memory safety errors that can occur if that code were to execute. Or perhaps “debugging” trait resolution failures or other complex errors in a more interactive fashion. The sky’s the limit.

Daring to ask for richer tooling for unsafe Rust

One area where improved tooling could be particularly important is around “unsafe” Rust. If we really want people to write unsafe Rust code that is correct in practice – and I do! – they are going to need help. Just as with all Rust tooling, I think we need to cover the basics, but I also think we can go beyond that. We definitely need sanitizers, for example, but rather than just detecting errors, we can connect those sanitizers to debuggers and use that error as an opportunity to teach people how stacked borrows works. We can build better testing frameworks that make things like fuzzing and property-based testing easy. And we can offer strong support for formal methods, so that libraries that want to invest the time can give higher levels of assurance (the standard library seems like a good candidate, for example).

Conclusion: we got this

As Rust sees more success, it becomes harder and harder to make changes. There’s more and more Rust code out there and continuity and stability can sometimes be more important than fixing something that’s broken. And even when you do decide to make a change, everybody has opinions about how you should be doing it differently – worse yet, sometimes they’re right.¹² It can sometimes be very tempting to say, “Rust is good enough, you don’t want one language for everything anyway” and leave it at that.

For Rust 2024, I don’t want us to do that. I think Rust is awesome. But I think Rust could be awesomer. We definitely shouldn’t go about making changes “just because”, we have to respect the work we’ve done before, and we have to be realistic about the price of churn. But we should be planning and dreaming as though the current crop of Rust programmers is just the beginning – as though the vast majority of Rust programs are yet to be written (which they are).

My hope is that for RustConf 2024, people will be bragging to each other about the hardships they endured back in the day. “Oh yeah,” they’ll say, “I was writing async Rust back in the old days. You had to grab a random crate from crates.io for every little thing you want to do. You want to use an async fn in a trait? Get a crate. You want to write an iterator that can await? Get a crate. People would come to standup after 5 days of hacking and be like ‘I finally got the code to compile!’ And we walked to work uphill in the snow! Both ways! In the summer!”¹³

So yeah, for Rust 2024, let’s dare to ask for more.¹⁴

Footnotes

One interesting change: I’ve been writing more and more code again. This itself is making a big difference in my state of mind, too! ↩︎

Die, I tell you! DIE! ↩︎

Because it’s 2022, get it? ↩︎

I talked to a team that developed some low-level Rust code (what would’ve been written in C++) and they reported experiencing one crash in 3+ years, which originated in an FFI to a C library. That’s just amazing. ↩︎

Most commonly, if Rust has an edge over a language like Java, it is because of our stronger concurrency guarantees. But it’s not only that. It’s also that meeting the required performance bar in other languages often requires one to write code that is “rather clever”. Rust’s higher performance means that one can write simpler code instead, which then has correspondingly fewer bugs. ↩︎

The survey consistently has a peak of around 7 out of 10 in terms of how people self-identify their expertise. ↩︎

Shout out to Jack Huey, tirelessly driving that work forward! ↩︎

Shout out to Oliver Scherer, tirelessly driving that work forward! ↩︎

Armin wrote a recent article, Unsafe Rust is Too Hard, that gives some real-life examples of the kinds of challenges you can encounter. ↩︎

…besides boilerplate. ↩︎

Watch the recording of the pernos.co demo that Felix did for the Rustc Reading Club to get a sense for what is possible here! ↩︎

It’s so much easier when everybody else is wrong. ↩︎

I may have gotten a little carried away there. ↩︎

Hey, that rhymes! I’m a poet, and I didn’t even know it! ↩︎

Panics vs cancellation, part 1

2022-01-27T00:00:00+00:00

One of the things people often complain about when doing Async Rust is cancellation. This has always been a bit confusing to me, because it seems to me that async cancellation should feel a lot like panics in practice, and people don’t complain about panics very often (though they do sometimes). This post is the start of a short series comparing panics and cancellation, seeking after the answer to the question “Why is async cancellation a pain point and what should we do about it?” This post focuses on explaining Rust’s panic philosophy and explaining why I see panics and cancellation as being quite analogous to one another.

Why panics are discouraged in Rust

Let’s go back to some pre-history. The Rust design has always included panics, but it hasn’t always included the catch_unwind function. In fact, adding that function was quite controversial. Why?

The reason is that long experience with exceptions has shown that exceptions work really well for propagating errors out, but they don’t work well for recovering from errors or handling them in a structured way. The problem is that exceptions make errors invisible, which means that programmers don’t think about them.

The only time when exceptions work well for recovery is when that recovery is done at a very coarse-grained level. If you have a “main loop” of your application and you can kind of catch the exception and restart that main loop, that can be very useful. You see this insight popping up all over the place; I think Erlang did it best, with their “let it crash” philosophy.

Why exceptions are bad at fine-grained recovery

The reason that exceptions are bad at fine-grained recovery is simple. In most programs, you have some kind of invariants that you are maintaining to ensure your data is in a valid state. It’s relatively straightforward to ensure that these invariants hold at the beginning of every operation and that they hold by the end of every operation. It’s really, really hard to ensure that those invariants hold all the time. Very often, you have some code that wants to make some mutations, put your data in an inconsistent state, and then fix that inconsistency.

Unfortunately, with widespread use of exceptions, what you have is that any piece of code, at any time, might suddenly just abort. So if that function is doing mutation, it could leave the program in an inconsistent state.

Consider this simple pseudocode (inspired by tomaka’s blog post). The idea of this function is that it is going to read from some file, parse the data it reads, and then send that data over a socket:

fn copy_data(from_file: &File, to_socket: &Socket) {
    let buffer = from_file.read();
    let parsed_items = parse(buffer);
    parsed_items.send(to_socket);
}
You might think that since this function doesn’t do any explicit mutation, it would be fine to stop it at any point and re-execute it. But that’s not true: there is some implicit state, which is the cursor in the from_file. If the parse function or the send function were to throw an exception, whatever data had just been read (and maybe parsed) would be lost. The next time the function is invoked, it’s not going to go back and re-read that data, it’s just going to proceed from where it left off, and some data is lost.
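
To make the data loss concrete, here is a minimal runnable sketch. It assumes an in-memory `Cursor` standing in for the file, a `Vec<u8>` standing in for the socket, and a made-up rule that "parsing" panics on a `!` byte; `copy_chunk` and `copy_with_retry` are hypothetical names, not anything from tomaka’s post.

```rust
use std::io::{Cursor, Read};
use std::panic::{self, AssertUnwindSafe};

/// Reads the next 4-byte chunk from `source`, "parses" it, and appends it to
/// `sink`. The parse step (an assumption for this sketch) panics on a `!` byte.
fn copy_chunk(source: &mut Cursor<Vec<u8>>, sink: &mut Vec<u8>) {
    let mut buffer = [0u8; 4];
    // Reading advances the cursor: this is the implicit state.
    let n = source.read(&mut buffer).expect("in-memory reads do not fail");
    let chunk = &buffer[..n];
    if chunk.contains(&b'!') {
        panic!("parse error");
    }
    sink.extend_from_slice(chunk);
}

/// Runs `copy_chunk`, catches the panic from the first chunk, and retries.
fn copy_with_retry() -> Vec<u8> {
    let mut source = Cursor::new(b"abc!defg".to_vec());
    let mut sink = Vec::new();
    let first = panic::catch_unwind(AssertUnwindSafe(|| copy_chunk(&mut source, &mut sink)));
    assert!(first.is_err());
    // The retry does not re-read "abc!": the cursor already moved past it.
    copy_chunk(&mut source, &mut sink);
    sink
}
```

After the retry, the sink holds only `defg`; the bytes read before the panic are gone even though the function never mutated anything explicitly.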

Rust’s compromise

The initial design of Rust included the idea that panic recovery was only possible at the thread boundary. The idea was that threads own all of their state, so if a thread panicked, you would take down the thread, and with it all of the potentially corrupted state. In this way, recovery could be done with some reasonable assurance of success. There are some limits to this idea. For one thing, threads can share state. The most obvious way for that to happen is with a Mutex, but – as the copy_data example shows – you can also have problems when you are communicating (reading from a file, sending messages over a channel, etc). We have extra mechanisms to help with those cases, such as lock poisoning, but the jury is out on how well they work.¹
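
As a small illustration of lock poisoning with the standard library’s `Mutex`, the sketch below has a worker thread panic while holding the lock. The function name `observe_poisoning` and the counter it protects are assumptions made up for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns whether the lock was poisoned and the value found inside it.
fn observe_poisoning() -> (bool, i32) {
    let shared = Arc::new(Mutex::new(0));
    let worker = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut guard = shared.lock().unwrap();
            *guard += 1;
            // Panicking while the guard is alive marks the mutex as poisoned.
            if *guard == 1 {
                panic!("worker failed mid-update");
            }
        })
    };
    // The panic surfaces at the thread boundary as an `Err` from `join`.
    assert!(worker.join().is_err());

    // Bind the result first so the lock guard is released before `shared` goes away.
    let outcome = match shared.lock() {
        Ok(guard) => (false, *guard),
        // `into_inner` hands back the guard so the caller can inspect the data.
        Err(poisoned) => (true, *poisoned.into_inner()),
    };
    outcome
}
```

Every later `lock()` call reports the poisoning, and the caller has to decide explicitly whether the half-updated value is still usable.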

Why ? is good

All of this discussion of course raises the question: how is one supposed to handle error recovery in Rust? The answer, of course, is the ? operator. This operator desugars into a pattern match, but it has the effect of “propagating” the error to the caller of the function. If we look at the copy_data example one more time, but imagine that any potential errors were propagated using results, it would look like:

fn copy_data(from_file: &File, to_socket: &Socket) -> eyre::Result<()> {
    let buffer = from_file.read()?;
    let parsed_items = parse(buffer);
    parsed_items.send(to_socket)?;
    Ok(())
}
The nice thing about this code is that one can easily see and audit potential errors: for example, I can see that send may result in an error, and a sharp-eyed reviewer might see the potential data loss.² Even better, I can do some sort of recovery in the case of error by opting not to forward the error but matching instead. (Note that the send methods typically pass back the message in the event of an error.)

fn copy_data(from_file: &File, to_socket: &Socket) -> eyre::Result<()> {
    let buffer = from_file.read()?;
    let parsed_items = parse(buffer);
    match parsed_items.send(to_socket) {
        Ok(()) => (),
        Err(SendError(parsed_items)) => recover_from_error(parsed_items),
    }
}
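
The same recovery pattern can be tried out with the standard library’s `mpsc` channel, whose `send` returns the rejected message inside `SendError`. This is only a sketch: the whitespace-splitting `parse`, the `unsent` buffer, and the `send_after_disconnect` driver are assumptions standing in for the post’s pseudocode.

```rust
use std::sync::mpsc::{self, SendError, Sender};

/// Stand-in for the post's `parse`: keeps the whitespace-separated integers.
fn parse(buffer: &str) -> Vec<u32> {
    buffer.split_whitespace().filter_map(|word| word.parse().ok()).collect()
}

fn copy_data(buffer: &str, to_channel: &Sender<Vec<u32>>, unsent: &mut Vec<Vec<u32>>) {
    let parsed_items = parse(buffer);
    match to_channel.send(parsed_items) {
        Ok(()) => (),
        // `SendError` carries the rejected message, so nothing is lost.
        Err(SendError(parsed_items)) => unsent.push(parsed_items),
    }
}

/// Drives `copy_data` against a channel whose receiver is already gone.
fn send_after_disconnect() -> Vec<Vec<u32>> {
    let (tx, rx) = mpsc::channel();
    drop(rx); // with the receiver dropped, every send fails
    let mut unsent = Vec::new();
    copy_data("1 2 3", &tx, &mut unsent);
    unsent
}
```

Because the error is an ordinary value, the parsed items come back to the caller and can be kept for a later retry.
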
How does this connect to async cancellation?

I said that, from a user’s perspective, it seems to me that async cancellation and Rust panics should feel very similar. Let me explain.

It sometimes happens that you have spawned a future whose result is no longer needed. For example, you may be running a server that is doing work on behalf of a client, but that client may drop its connection, in which case you’d like to cancel that work.

In Rust, our cancellation story is centered around dropping. The idea is that to cancel a future, you drop it. Whenever you drop any kind of value in Rust, the value’s destructor runs which has the job of disposing of whatever resources that value owns. In the case of a future, the values that it owns are the suspended variables from the stack frame. Consider that same copy_data function we saw earlier, but ported to async Rust:

async fn copy_data(from_file: &File, to_socket: &Socket) {
    let buffer = from_file.read().await;
    let parsed_items = parse(buffer);
    parsed_items.send(to_socket).await;
}
Suppose that, at some point, we pause the program at the final line, parsed_items.send(...).await. In that case, the future would be storing the value of buffer and parsed_items. So when the future is dropped, those values will be dropped.

In effect, if you look at things from the “inside view” of the async fn, cancellation looks like the await call panicking – it unwinds the stack, running the destructors for all values. The analogy, of course, only goes so far: you can’t, for example, “catch” the unwinding from a cancellation. Also, panics arise from code that the thread executed, but cancellations are injected from the outside when the async fn’s result is no longer needed.³
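
The "cancellation runs destructors" behavior can be observed without any runtime by polling a future once by hand and then dropping it. The sketch below is an illustration under assumptions: `SetOnDrop` is a made-up guard type standing in for `buffer`, `std::future::pending` stands in for an await that never completes, and `NoopWaker` exists only so the future can be polled.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// Waker that does nothing; enough to poll a future manually.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Records in a shared flag that its destructor ran.
struct SetOnDrop(Arc<AtomicBool>);
impl Drop for SetOnDrop {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Returns the flag's value before and after the suspended future is dropped.
fn cancel_midway() -> (bool, bool) {
    let dropped = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&dropped);
    let mut fut = Box::pin(async move {
        let _buffer = SetOnDrop(flag); // lives across the await below
        std::future::pending::<()>().await; // suspends and never resumes
    });

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    assert!(fut.as_mut().poll(&mut cx).is_pending());

    let before = dropped.load(Ordering::SeqCst);
    drop(fut); // "cancel": the suspended `_buffer` is dropped with the future
    let after = dropped.load(Ordering::SeqCst);
    (before, after)
}
```

From inside the async block, nothing after the await ever runs; the only code that executes on cancellation is the destructor of the value that was held across the suspension point.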

Next time

In the next post I plan to start looking at examples of async cancellation in practice, trying to pinpoint how it is used and why it seems to cause more problems than panics.

Thanks

Thanks to Aaron Turon, Yoshua Wuyts, Yehuda Katz, and others with whom I’ve deep dived on this topic over the years, and to tomaka for their blog post.

Footnotes

My take is that the concept behind lock poisoning still seems good to me, but the ergonomics of how we implemented it are bad, and make people not like it. That said, I’d like to dig more into this: I’ve been hearing from various people that – even in their limited form – panics are one of the weaker points in Rust’s reliability story, and I’m not yet sure what to think. ↩︎

My experience is that these bugs are hard to spot in review, but that the ? operator is invaluable when debugging – in that case, you are asking the question, “how could this function possibly return early?”, and having the ? operator really helps you find the answer. ↩︎

This could be a crucial difference: I think, for example, it’s the reason that Java deprecated its Thread.stop method. ↩︎

Dyn async traits, part 7: a design emerges?

2022-01-07T00:00:00+00:00

Hi all! Welcome to 2022! Towards the end of last year, Tyler Mandry and I were doing a lot of iteration around supporting “dyn async trait” – i.e., making traits that use async fn dyn safe – and we’re starting to feel pretty good about our design. This is the start of several blog posts talking about where we’re at. In this first post, I’m going to reiterate our goals and give a high-level outline of the design. The next few posts will dive more into the details and the next steps.

The goal: traits with async fn that work “just like normal”

It’s been a while since my last post about dyn trait, so let’s start by reviewing the overall goal: our mission is to allow async fn to be used in traits just like fn. For example, we would like to have an async version of the Iterator trait that looks roughly like this¹:

trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}
You should be able to use this AsyncIterator trait in all the ways you would use any other trait. Naturally, static dispatch and impl Trait should work:

async fn sum_static(mut v: impl AsyncIterator<Item = u32>) -> u32 {
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}
But dynamic dispatch should work too:

async fn sum_dyn(v: &mut dyn AsyncIterator<Item = u32>) -> u32 {
    //                   ^^^
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}
Another goal: leave dyn cleaner than we found it

While we started out with the goal of improving async fn, we’ve also had a general interest in making dyn Trait more usable overall. There are a few reasons for this. To start, async fn is itself just sugar for a function that returns impl Trait, so making async fn in traits work is equivalent to making RPITIT (“return position impl trait in traits”) work. But also, the existing dyn Trait design contains a number of limitations that can be pretty frustrating, and so we would like a design that improves as many of those as possible. Currently, our plan lifts the following limitations, so that traits which make use of these features would still be compatible with dyn:

Return position impl Trait, so long as Trait is dyn safe.

e.g., fn get_widgets(&self) -> impl Iterator

As discussed above, this means that async fn works, since it desugars to a function that returns impl Trait.

Argument position impl Trait, so long as Trait is dyn safe.

e.g., fn process_widgets(&mut self, items: impl Iterator).

By-value self methods.

e.g., given fn process(self) and d: Box<dyn Trait>, able to call d.process()

eventually this would be extended to other “box-like” smart pointers
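
For comparison with the by-value self item above, the workaround available on stable Rust today is to declare the receiver as `self: Box<Self>`, which keeps the method callable through `Box<dyn Trait>`. A minimal sketch, with a made-up `Button` type and `process_dyn` helper:

```rust
trait Widget {
    /// Consumes the widget. The `self: Box<Self>` receiver keeps this method
    /// callable on `Box<dyn Widget>`.
    fn process(self: Box<Self>) -> String;
}

/// Hypothetical implementor used only for this example.
struct Button {
    label: String,
}

impl Widget for Button {
    fn process(self: Box<Self>) -> String {
        // Moving out of the box gives us the value by ownership.
        let button = *self;
        button.label
    }
}

fn process_dyn(d: Box<dyn Widget>) -> String {
    d.process()
}
```

The cost is that the boxing shows up in the trait definition itself, which is exactly the kind of detail the plan described here would let the trait author leave out.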

If you put all three of those together, it represents a pretty large expansion to what dyn safety feels like in Rust. Here is an example trait that would now be dyn safe that uses all of these things together in a natural way:

trait Widget {
    async fn augment(&mut self, component: impl Into<WidgetComponent>);
    fn components(&self) -> impl Iterator<Item = WidgetComponent>;
    async fn transmit(self, factory: impl Factory);
}
Final goal: works without an allocator, too, though you have to work a bit harder

The most straightforward way to support RPITIT is to allocate a Box to store the return value. Most of the time, this is just fine. But there are use-cases where it’s not a good choice:

In a kernel, where you would like to use a custom allocator.

In a tight loop, where the performance cost of an allocation is too high.

Extreme embedded cases, where you have no allocator at all.

Therefore, we would like to ensure that it is possible to use a trait that uses async fns or RPITIT without requiring an allocator, though we think it’s ok for that to require a bit more work. Here are some alternative strategies one might want to support:

Pre-allocating stack space: when you create the dyn Trait, you reserve some space on the stack to store any futures or impl Trait that it might return.

Caching: reuse the same Box over and over to reduce the performance impact (a good allocator would do this for you, but not all systems ship with efficient allocators).

Sealed trait: you derive a wrapper enum for just the types that you need.

Ultimately, though, there is no limit to the number of ways that one might manage dynamic dispatch, so the goal is not to have a “built-in” set of strategies but rather allow people to develop their own using procedural macros. We can then offer the most common strategies in utility crates or perhaps even in the stdlib, while also allowing people to develop their own if they have very particular needs.

The design from 22,222 feet

I’ve drawn a little diagram to illustrate how our design works at a high-level:

[Diagram: the caller invokes a method through the vtable, which dispatches to the normal function found in the impl. Arguments are adapted to the vtable on the caller side and from the vtable on the callee side; return values are adapted to the vtable on the callee side and from the vtable on the caller side. The caller knows the types of impl Trait arguments, but not the type of the callee or the precise return type if the function returns impl Trait. The callee knows its own type and the precise return type, but not the types of impl Trait arguments.]

Let’s walk through it:

To start, we have the caller, which has access to some kind of dyn trait, such as w: &mut dyn Widget, and wishes to call a method, like w.augment().

The caller looks up the function for augment in the vtable and calls it:

But wait, augment takes an impl Into<WidgetComponent>, which means that it is a generic function. Normally, we would have a separate copy of this function for every Into type! But we must have only a single copy for the vtable! What do we do?

The answer is that the vtable encodes a copy that expects “some kind of pointer to a dyn Into”. This could be a Box but it could also be other kinds of pointers: I’m being hand-wavy for now, I’ll go into the details later.

The caller therefore has the job of creating a “pointer to a dyn Into”. It can do this because it knows the type of the value being provided; in this case, it would do it by allocating some memory space on the stack.

The vtable, meanwhile, includes a pointer to the right function to call. But it’s not a direct pointer to the function from the impl: it’s a lightweight shim that wraps that function. This shim has the job of converting from the vtable’s ABI into the standard ABI used for static dispatch.

When the function returns, meanwhile, it is giving back some kind of future. The callee knows that type, but the caller doesn’t. Therefore, the callee has the job of converting it to “some kind of pointer to a dyn Future” and returning that pointer to the caller.

The default is to box it, but the callee can customize this to use other strategies.

The caller gets back its “pointer to a dyn Future” and is able to await that, even though it doesn’t know exactly what sort of future it is.
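
The return-value half of this walkthrough can be approximated by hand in today’s Rust by making the method return a boxed `dyn Future`. The sketch below is not the proposed design, just the "box it" default written out manually; `BoxedAsyncIterator`, `Countdown`, `NoopWaker`, and the tiny `block_on` executor are all assumptions introduced for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written, dyn-compatible stand-in for `AsyncIterator`: `next` returns a
/// boxed `dyn Future` instead of being an `async fn`.
trait BoxedAsyncIterator {
    type Item;
    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + '_>>;
}

/// Hypothetical iterator that counts down from its starting value to 1.
struct Countdown(u32);

impl BoxedAsyncIterator for Countdown {
    type Item = u32;
    fn next(&mut self) -> Pin<Box<dyn Future<Output = Option<u32>> + '_>> {
        // The callee knows the concrete future type and erases it by boxing.
        Box::pin(async move {
            if self.0 == 0 {
                None
            } else {
                self.0 -= 1;
                Some(self.0 + 1)
            }
        })
    }
}

async fn sum_dyn(v: &mut dyn BoxedAsyncIterator<Item = u32>) -> u32 {
    let mut result = 0;
    // The caller awaits a `dyn Future` without knowing what is behind it.
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, just enough to drive the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Every call to `next` allocates here, which is fine most of the time and is precisely what the allocator-free strategies discussed earlier are meant to avoid.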

Upcoming posts

In upcoming blog posts, I’m going to expand on several things that I alluded to in my walkthrough:

“Pointer to a dyn Trait”:

How exactly do we encode “some kind of pointer” and what does that mean?

This is really key, because we need to be able to support

Adaptation for impl Trait arguments:

How do we adapt to/from the vtable for arguments of generic type?

Hint: it involves creating a dyn Trait for the argument

Adaptation for impl trait return values:

How do we adapt to/from the vtable for return values of impl Trait type?

Hint: it involves returning a dyn Trait, potentially boxed but not necessarily

Adaptation for by-value self:

How do we adapt to/from the vtable for by-value self, and when are such functions callable?

Boxing and alternatives thereto:

When you call an async fn or fn that returns impl Trait via dynamic dispatch, the default behavior is going to allocate a Box, but we’ve seen that doesn’t work for everyone. How convenient can we make it to select an alternative strategy like stack pre-allocation, and how can people create their own strategies?
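
To give a feel for the argument adaptation item in the list above, here is how the same idea looks when written by hand in today’s Rust: a generic method that is excluded from the vtable, plus a single dyn-taking method that the vtable does contain. This is a sketch under assumptions; `Accumulator`, `Total`, and `add_through_dyn` are made-up names, and the real design would generate the shim rather than ask the trait author to write it.

```rust
trait Accumulator {
    /// The generic method users call. `Self: Sized` keeps it out of the vtable.
    fn add_all(&mut self, mut items: impl Iterator<Item = u32>)
    where
        Self: Sized,
    {
        // Adapt the generic argument into a pointer to a `dyn Iterator`.
        self.add_all_dyn(&mut items);
    }

    /// The single copy that lives in the vtable.
    fn add_all_dyn(&mut self, items: &mut dyn Iterator<Item = u32>);
}

/// Hypothetical implementor that sums whatever it is given.
struct Total(u32);

impl Accumulator for Total {
    fn add_all_dyn(&mut self, items: &mut dyn Iterator<Item = u32>) {
        for item in items {
            self.0 += item;
        }
    }
}

fn add_through_dyn(acc: &mut dyn Accumulator) {
    // The caller knows the concrete iterator type, keeps it on its own stack,
    // and passes a pointer to it through the vtable entry.
    let mut items = vec![1u32, 2, 3].into_iter();
    acc.add_all_dyn(&mut items);
}
```

No allocation is needed for the argument because the caller owns the value for the duration of the call.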

We’ll also be updating the async fundamentals initiative page with more detailed design docs.

Appendix: Things I’d still like to see

I’m pretty excited about where we’re landing in this round of work, but it doesn’t get dyn where I ultimately want it to be. My ultimate goal is that people are able to use dynamic dispatch as conveniently as you use impl Trait, but I’m not entirely sure how to get there. That means being able to write function signatures that don’t talk about Box vs & or other details that you don’t have to deal with when you talk about impl Trait. It also means not having to worry so much about Send/Sync and lifetimes.

Here are some of the improvements I would like to see, if we can figure out how:

Support clone:

Given trait Widget: Clone and w: Box<dyn Widget>, able to invoke w.clone()

This almost works, but the fact that trait Clone: Sized makes it difficult.

Support “partially dyn safe” traits:

Right now, dyn safe is all or nothing. This has the nice implication that dyn Foo: Foo for all types. However, it is also limiting, and many people have told me they find it confusing. Moreover, dyn Foo is not Sized, and hence while it’s cool conceptually that dyn Foo implements Foo, you can’t actually use a dyn Foo in the same way that you would use most other types.

Improve how Send interacts with returned values (e.g., RPIT, async fn in traits, etc):

If you write dyn Foo + Send, that

Avoid having to talk about pointers so much

When you use impl Trait, you get a really ergonomic experience today:

fn apply_map(map_fn: impl FnMut(u32) -> u32)

fn items(&self) -> impl Iterator + '_

In contrast, when you use dyn trait, you wind up having to be very explicit around lots of details, and your callers have to change as well:

fn apply_map(map_fn: &mut dyn FnMut(u32) -> u32)

fn items(&self) -> Box<dyn Iterator + '_>

Make dyn trait feel more parametric:

If I have a struct Foo<T> { t: Box<T> }, it has the nice property that it exposes the T. This means we know that Foo<T>: Send if T: Send (assuming Foo doesn’t have any fields that are not Send), we know that Foo<T>: 'static if T: 'static, and so forth. This is very cool.

In contrast, struct Foo { t: Box<dyn Trait> } bakes in a lot of details – it doesn’t permit t to contain any references, and it doesn’t let Foo be Send.

Make it sound:

There are a few open soundness bugs around dyn trait, such as #57893, and I would like to close them. This interacts with other things in this list.
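
On the "Support clone" item above: the workaround people commonly write today is a separate boxed-clone method on the trait. A minimal sketch, where `clone_box` and `Button` are names chosen for the example rather than anything built in:

```rust
trait Widget {
    fn name(&self) -> String;
    /// Dyn-compatible stand-in for `Clone::clone`.
    fn clone_box(&self) -> Box<dyn Widget>;
}

/// Hypothetical implementor used only for this example.
#[derive(Clone)]
struct Button {
    label: String,
}

impl Widget for Button {
    fn name(&self) -> String {
        self.label.clone()
    }
    fn clone_box(&self) -> Box<dyn Widget> {
        Box::new(self.clone())
    }
}

// With this impl, `Box<dyn Widget>` is itself `Clone`.
impl Clone for Box<dyn Widget> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}
```

It works, but every implementor has to repeat the `clone_box` boilerplate, which is the friction that makes first-class support attractive.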

This has traditionally been called Stream. ↩︎

Rustc Reading Club, Take 2

2021-11-18T00:00:00+00:00

Wow! The response to the last Rustc Reading Club was overwhelming – literally! We maxed out the number of potential zoom attendees and I couldn’t even join the call! It’s clear that there’s a lot of demand here, which is great. We’ve decided to take another stab at running the Rustc Reading Club, but we’re going to try it a bit differently this time. We’re going to start by selecting a smaller group to do it a few times and see how it goes, and then decide how to scale up.

The ask

Here is what we want from you. If you are interested in the Rustc Reading Club, sign up using the form below!

Rustc reading club signup form

Start small…

As Doc Jones announced in her post, we’re going to hold our second meeting on December 2, 2021 at 12PM EST (see in your timezone). Read her post for all the details on how that’s going to work! To avoid a repeat of last time, this meeting will be invite only – we’re going to “hand select” about 10-15 people from the folks who sign up, looking for a range of experience and interests. The reason for this is that we want to try out the idea with a smaller group and see how it goes.

…and scale!

Presuming the club is a success, we would love to have more active clubs going on. My expectation is that we will have a number of rustc reading clubs of different kinds and flavors – for example, a recorded club, or a club that is held on Zulip instead of Zoom, or clubs in other languages.¹ As we try out new ideas, we’ll make sure to reach out to people who signed up on the google form, so please do sign up if you are interested!

In fact, if you’re really excited, you don’t need to wait for us – just create a zoom room and invite your friends to read some code! Or leave a message in #rustc-reading-club on zulip, I bet you’d find some takers. ↩︎

CTCFT 2021-11-22 Agenda

2021-11-15T00:00:00+00:00

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place next Monday, on 2021-11-22 at 11am US Eastern Time (click to see in your time zone). Note that this is a new time: we are experimenting with rotating in an earlier time that occurs during the European workday. This post covers the agenda. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Agenda

This meeting we’ve invited some of the people working to integrate Rust into the Linux kernel to come and speak. We’ve asked them to give us a feel for how the integration works and help identify those places where the experience is rough. The expectation is that we can use this feedback as an input when deciding what work to pursue and what features to prioritize for stabilization.

(5 min) Opening remarks 👋 (nikomatsakis)

(40 min) Rust for Linux (ojeda, alex, wedsonaf)

The Rust for Linux project is adding Rust support to the Linux kernel. While it is still the early days, there are some areas of the Rust language, library, and tooling where the Rust project might be able to help out - for instance, via stabilization of features, suggesting ways to tackle particular problems, and more. This talk will walk through the issues found, along with examples where applicable.

(5 min) Closing (nikomatsakis)

Afterwards: Social Hour

After the CTCFT this week, we are going to try an experimental social hour. The hour will be coordinated in the #ctcft stream of the rust-lang Zulip. The idea is to create breakout rooms where people can gather to talk, hack together, or just chill.

View types for Rust

2021-11-05T00:00:00+00:00

I wanted to write about an idea that’s been kicking around in the back of my mind for some time. I call it view types. The basic idea is to give a way for an &mut or & reference to identify which fields it is actually going to access. The main use case for this is having “disjoint” methods that don’t interfere with one another.

This is not a proposal (yet?)

To be clear, this isn’t an RFC or a proposal, at least not yet. It’s some early stage ideas that I wanted to document. I’d love to hear reactions and thoughts, as I discuss in the conclusion.

Running example

As a running example, consider this struct WonkaShipmentManifest. It combines a vector bars of ChocolateBars and a list golden_tickets of indices for bars that should receive a ticket.

struct WonkaShipmentManifest {
    bars: Vec<ChocolateBar>,
    golden_tickets: Vec<usize>,
}
Now suppose we want to iterate over those bars and put them into their packaging. Along the way, we’ll insert a golden ticket. To start, we write a little function that checks whether a given bar should receive a golden ticket:

impl WonkaShipmentManifest {
    fn should_insert_ticket(&self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }
}
Next, we write the loop that iterates over the chocolate bars and prepares them for shipment:

impl WonkaShipmentManifest {
    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        for (bar, i) in self.bars.into_iter().zip(0..) {
            let opt_ticket = if self.should_insert_ticket(i) {
                Some(GoldenTicket::new())
            } else {
                None
            };
            result.push(bar.into_wrapped(opt_ticket));
        }
        result
    }
}
Satisfied with our code, we sit back and fire up the compiler and, wait… what’s this?

error[E0382]: borrow of partially moved value: `self`
  --> src/lib.rs:16:33
   |
15 |         for (bar, i) in self.bars.into_iter().zip(0..) {
   |                                   ----------- `self.bars` partially moved due to this method call
16 |             let opt_ticket = if self.should_insert_ticket(i) {
   |                                 ^^^^ value borrowed here after partial move
   |
Well, the message makes sense, but it’s unnecessary! The compiler is concerned because we are borrowing self when we’ve already moved out of the field self.bars, but we know that should_insert_ticket is only going to look at self.golden_tickets, and that value is still intact. So there’s not a real conflict here.

Still, thinking on it more, you can see why the compiler is complaining. It only looks at one function at a time, so how would it know what fields should_insert_ticket is going to read? And, even if it were to look at the body of should_insert_ticket, maybe it’s reasonable to give a warning for future-proofing. Without knowing more about our plans here at Wonka Inc., it’s reasonable to assume that future code authors may modify should_insert_ticket to look at self.bars or any other field. This is part of the reason that Rust does its analysis on a per-function basis: checking each function independently gives room for other functions to change, so long as they don’t change their signature, without disturbing their callers.

What we need, then, is a way for should_insert_ticket to describe to its callers which fields it may use and which ones it won’t. Then the caller could permit invoking should_insert_ticket whenever the field self.golden_tickets is accessible, even if other fields are borrowed or have been moved.
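
For reference, one way to get this past today’s compiler is to stop taking `&self` and pass the one field the helper needs, so that the signature itself says which data is read. The sketch below fills in the surrounding types with assumptions: `ChocolateBar` has just a `name`, `GoldenTicket` is a unit struct, and `WrappedChocolateBar` is built directly instead of through `into_wrapped`.

```rust
struct ChocolateBar {
    name: String,
}

struct GoldenTicket;

struct WrappedChocolateBar {
    bar: ChocolateBar,
    ticket: Option<GoldenTicket>,
}

struct WonkaShipmentManifest {
    bars: Vec<ChocolateBar>,
    golden_tickets: Vec<usize>,
}

impl WonkaShipmentManifest {
    /// Takes only the field it reads, instead of `&self`.
    fn should_insert_ticket(golden_tickets: &[usize], index: usize) -> bool {
        golden_tickets.contains(&index)
    }

    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        // `self.bars` is moved out here...
        for (bar, i) in self.bars.into_iter().zip(0..) {
            // ...while `self.golden_tickets` is borrowed on its own, which the
            // borrow checker accepts because the two fields are disjoint.
            let ticket = if Self::should_insert_ticket(&self.golden_tickets, i) {
                Some(GoldenTicket)
            } else {
                None
            };
            result.push(WrappedChocolateBar { bar, ticket });
        }
        result
    }
}
```

This compiles, but it gives up method-call syntax and leaks the field into the helper’s signature, which is the awkwardness a view type would remove.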

An idea

When I’ve thought about this problem in the past, I’ve usually imagined that the list of “fields that may be accessed” would be attached to the reference. But that’s a bit odd, because a reference type &mut T doesn’t itself have any fields. The fields come from T.

So recently I was thinking, what if we had a view type? I’ll write it {place1, ..., placeN} T for now. What it means is “an instance of T, but where only the paths place1...placeN are accessible”. Like other types, view types can be borrowed. In our example, then, &{golden_tickets} WonkaShipmentManifest would describe a reference to WonkaShipmentManifest which only gives access to the golden_tickets field.

Creating a view

We could use some syntax like {place1, ..., placeN} expr to create a view type¹. This would be a place expression, which means that it refers to a specific place in memory. This means that it can be directly borrowed without creating a temporary. So I can create a view onto self that only has access to golden_tickets like so:

impl WonkaShipmentManifest {
    fn example_a(&mut self) {
        let self1 = &{golden_tickets} self;
        println!("tickets = {:#?}", self1.golden_tickets);
    }
}
Notice the distinction between &self.golden_tickets and &{golden_tickets} self. The former borrows the field directly. The latter borrows the entire struct, but only gives access to one field. What happens if you try to access another field? An error, of course:

impl WonkaShipmentManifest {
    fn example_b(&mut self) {
        let self1 = &{golden_tickets} self;
        println!("tickets = {:#?}", self1.golden_tickets);
        for bar in &self1.bars {
            //      ^^^^^^^^^^
            // Error: self1 does not have access to `bars`
        }
    }
}
Of course, when a view is active, you can still access other fields through the original path, without disturbing the borrow:

impl WonkaShipmentManifest {
    fn example_c(&mut self) {
        let self1 = &{golden_tickets} self;
        for bar in &mut self.bars {
            println!("tickets = {:#?}", self1.golden_tickets);
        }
    }
}
And, naturally, that access includes the ability to create multiple views at once, so long as they have disjoint paths:

impl WonkaShipmentManifest {
    fn example_d(&mut self) {
        let self1 = &{golden_tickets} self;
        let self2 = &mut {bars} self;
        for bar in &mut self2.bars {
            println!("tickets = {:#?}", self1.golden_tickets);
            bar.modify();
        }
    }
}
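
Within a single function, today’s borrow checker already allows this kind of disjointness when the fields are borrowed directly rather than through a view. A minimal sketch, with an assumed `wrapped` flag on `ChocolateBar` and a made-up `wrap_all` method:

```rust
struct ChocolateBar {
    wrapped: bool,
}

struct WonkaShipmentManifest {
    bars: Vec<ChocolateBar>,
    golden_tickets: Vec<usize>,
}

impl WonkaShipmentManifest {
    /// Wraps every bar and returns how many of them get a golden ticket.
    fn wrap_all(&mut self) -> usize {
        let tickets = &self.golden_tickets; // shared borrow of one field
        let bars = &mut self.bars; // mutable borrow of a different field
        let mut ticketed = 0;
        for (i, bar) in bars.iter_mut().enumerate() {
            bar.wrapped = true;
            if tickets.contains(&i) {
                ticketed += 1;
            }
        }
        ticketed
    }
}
```

What views would add is the ability to carry this same field-level information across a function boundary.
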
View types in methods

As example C in the previous section suggested, we can use a view type in our definition of should_insert_ticket to specify which fields it will use:

impl WonkaShipmentManifest {
    fn should_insert_ticket(&{golden_tickets} self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }
}
As a result of doing this, we can successfully compile the prepare_shipment function:

impl WonkaShipmentManifest {
    fn prepare_shipment(self) -> Vec<WrappedChocolateBar> {
        let mut result = vec![];
        for (bar, i) in self.bars.into_iter().zip(0..) {
            //          ^^^^^^^^^^^^^^^^^^^^^
            // Moving out of `self.bars` here....
            let opt_ticket = if self.should_insert_ticket(i) {
                //              ^^^^
                // ...does not conflict with borrowing a
                // view of `{golden_tickets}` from `self` here.
                Some(GoldenTicket::new())
            } else {
                None
            };
            result.push(bar.into_wrapped(opt_ticket));
        }
        result
    }
}
View types with access modes

All my examples so far were with “shared” views through & references. We could of course say that &mut {bars} WonkaShipmentManifest gives mutable access to the field bars, but it might also be nice to have an explicit mut mode, such that you write &mut {mut bars} WonkaShipmentManifest. This is more verbose, but it permits one to give away a mix of “shared” and “mut” access:

impl WonkaShipmentManifest {
    fn add_ticket(&mut {bars, mut golden_tickets} self, index: usize) {
        //              ^^^^  ^^^^^^^^^^^^^^^^^^
        //              |     mut access to golden-tickets
        //              shared access to bars
        assert!(index < self.bars.len());
        self.golden_tickets.push(index);
    }
}
One could invoke add_ticket even if you had existing borrows to bars:

fn foo() {
    let mut manifest = WonkaShipmentManifest { bars, golden_tickets };
    let bar0 = &manifest.bars[0];
    //          ^^^^^^^^^^^^^ shared borrow of `manifest.bars`...
    manifest.add_ticket(22);
    //      ^ borrows `self` mutably, but with view
    //        `{bars, mut golden_tickets}`
    println!("debug: {:?}", bar0);
}
View types and ownership

I’ve always shown view types with references, but combining them with ownership makes for other interesting possibilities. For example, suppose I wanted to extend GoldenTicket with some kind of unique serial_number that should never change, along with an owner field that will be mutated over time. For various reasons², I might like to make the fields of GoldenTicket public:

pub struct GoldenTicket {
    pub serial_number: usize,
    pub owner: Option<String>,
}

impl GoldenTicket {
    pub fn new() -> Self {
        Self { .. }
    }
}
However, if I do that, then nothing stops future owners of a GoldenTicket from altering its serial_number:

let mut t = GoldenTicket::new();
t.serial_number += 1; // uh-oh!
The best answer today is to use a private field and an accessor:

pub struct GoldenTicket {
    serial_number: usize, // private: readable only through the accessor
    pub owner: Option<String>,
}

impl GoldenTicket {
    pub fn new() -> Self { }

    pub fn serial_number(&self) -> usize {
        self.serial_number
    }
}
However, Rust’s design kind of discourages accessors. For one thing, the borrow checker doesn’t know which fields are used by an accessor, so if you have code like this, you will now get annoying errors (this has been the theme of this whole post, of course):

let mut t = GoldenTicket::new();
let n = &mut t.owner;
compute_new_owner(n, t.serial_number());
Furthermore, accessors can be kind of unergonomic, particularly for things that are not copy types. Returning (say) an &T from a get can be super annoying.
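
To see the accessor problem and the usual workaround in compilable form, here is a small sketch. The constructor argument, the body of `compute_new_owner`, and the `reassign` function are all assumptions filled in so the example can run; the rejected lines are left as a comment.

```rust
pub struct GoldenTicket {
    serial_number: usize,
    pub owner: Option<String>,
}

impl GoldenTicket {
    pub fn new(serial_number: usize) -> Self {
        Self { serial_number, owner: None }
    }

    pub fn serial_number(&self) -> usize {
        self.serial_number
    }
}

/// Hypothetical helper: derives an owner name from the serial number.
fn compute_new_owner(owner: &mut Option<String>, serial_number: usize) {
    *owner = Some(format!("holder-{}", serial_number));
}

fn reassign(t: &mut GoldenTicket) {
    // Rejected with E0502, because `t.serial_number()` borrows all of `t`
    // while the mutable borrow of `t.owner` is still live:
    //
    //     let n = &mut t.owner;
    //     compute_new_owner(n, t.serial_number());
    //
    // Workaround: call the accessor first, then borrow the field.
    let serial_number = t.serial_number();
    compute_new_owner(&mut t.owner, serial_number);
}
```

Reordering works here because `usize` is a copy type; for accessors that return references, there is often no such easy way out.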

Using a view type, we have some interesting other options. I could define a type alias GoldenTicket that is a limited view onto the underlying data:

pub type GoldenTicket = {serial_number, mut owner} GoldenTicketData;

pub struct GoldenTicketData {
    pub serial_number: usize,
    pub owner: Option<String>,
    dummy: (),
}
Now if my constructor function only ever creates this view, we know that nobody will be able to modify the serial_number for a GoldenTicket:

impl GoldenTicket {
    pub fn new() -> GoldenTicket { }
}
Obviously, this is not ergonomic to write, but it’s interesting that it is possible.

View types vs privacy

As you may have noticed in the previous example, view types interact with traditional privacy in interesting ways. It seems like there may be room for some sort of unification, but the two are also different. Traditional privacy (pub fields and so on) is like a view type in that, if you are outside the module, you can’t access private fields. Unlike a view, though, you can call methods on the type that do access those fields. In other words, traditional privacy denies you direct access, but permits intermediated access.

View types, in contrast, are “transitive” and apply both to direct and intermediated actions. If I have a view {serial_number} GoldenTicketData, I cannot access the owner field at all, even by invoking methods on the type.

Longer places

My examples so far have only shown views onto individual fields, but there is no reason we can’t have a view onto an arbitrary place. For example, one could write:

struct Point { x: u32, y: u32 }
struct Square { upper_left: Point, lower_right: Point }

let mut s: Square = Square {
    upper_left: Point { x: 22, y: 44 },
    lower_right: Point { x: 66, y: 88 },
};
let s_x = &{upper_left.x} s;
to get a view of type &{upper_left.x} Square. Paths like s.upper_left.y and s.lower_right would then still be mutable and not considered borrowed.
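
For direct field borrows inside one function, today’s Rust already tracks places at this granularity, which is a useful mental model for what a view onto a longer place would mean. A minimal sketch (the `borrow_nested_place` function is made up for the example):

```rust
struct Point {
    x: u32,
    y: u32,
}

struct Square {
    upper_left: Point,
    lower_right: Point,
}

/// Borrows `s.upper_left.x` and mutates the other places while the borrow is live.
fn borrow_nested_place() -> (u32, u32, u32) {
    let mut s = Square {
        upper_left: Point { x: 22, y: 44 },
        lower_right: Point { x: 66, y: 88 },
    };
    let s_x = &s.upper_left.x; // borrows only this nested place
    s.upper_left.y += 1; // sibling field: still mutable
    s.lower_right = Point { x: 0, y: 0 }; // unrelated field: still assignable
    (*s_x, s.upper_left.y, s.lower_right.x)
}
```

The difference is that `&s.upper_left.x` has type `&u32`, whereas the view in the text would have type `&{upper_left.x} Square` and could be passed to methods defined on `Square`.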

View types and named groups

There is another interaction with view types and privacy: view types name fields, but if you have private fields, you probably don’t want people outside your module typing their names, since that would prevent you from renaming them. At the same time, you might like to be able to let users refer to “groups of data” more abstractly. For example, for a WonkaShipmentManifest, I might like users to know they can iterate the bars and check if they have a golden ticket at once:

impl WonkaShipmentManifest {
    pub fn should_insert_ticket(&{golden_tickets} self, index: usize) -> bool {
        self.golden_tickets.contains(&index)
    }

    pub fn iter_bars_mut(&mut {bars} self) -> impl Iterator<Item = &mut ChocolateBar> {
        self.bars.iter_mut()
    }
}
But how should we express that to users without having them name fields directly? The obvious extension is to have some kind of “logical” fields that represent groups of data that can change over time. I don’t know how to declare those groups though.

Groups could be more DRY

Another reason to want named groups is to avoid repeating the names of common sets of fields over and over. It’s easy to imagine that there might be a few fields that some cluster of methods all want to access, and that repeating those names will be annoying and make the code harder to edit.

One positive thing from Rust’s current restrictions is that it has sometimes encouraged me to factor a single large type into multiple smaller ones, where the smaller ones encapsulate a group of logically related fields that are accessed together.[^ex] On the other hand, I’ve also encountered situations where such refactorings feel quite arbitrary – I have groups of fields that, yes, are accessed together, but which don’t form a logical unit on their own.

As an example of both why this sort of refactoring can be good and bad at the same time, I introduced the [cfg] field of the MIR Builder type to resolve errors where some methods only accessed a subset of fields. On the one hand, the CFG-related data is indeed conceptually distinct from the rest. On the other, the CFG type isn’t something you would use independently of the Builder itself, and I don’t feel that writing self.cfg.foo instead of self.foo made the code particularly clearer.
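To make the trade-off concrete, here is a minimal sketch of that kind of refactoring. The `Cfg` and `Builder` types below are hypothetical stand-ins, not the real rustc MIR builder types.

```rust
// Hypothetical stand-ins; these are not the real rustc types.
struct Cfg {
    blocks: Vec<String>,
}

impl Cfg {
    // Only needs the CFG data, so it can live on the smaller type.
    fn start_block(&mut self, label: &str) -> usize {
        self.blocks.push(label.to_string());
        self.blocks.len() - 1
    }
}

struct Builder {
    cfg: Cfg,
    labels: Vec<String>,
}

impl Builder {
    fn build_all(&mut self) -> Vec<usize> {
        let mut ids = Vec::new();
        // Shared borrow of `self.labels` ...
        for label in &self.labels {
            // ... alongside a mutable borrow of `self.cfg` only.
            ids.push(self.cfg.start_block(label));
        }
        ids
    }
}
```

If `start_block` were instead a `&mut self` method on `Builder`, the loop would be rejected, because the call would borrow all of `self` while `self.labels` is still being iterated.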

View types and fields in traits

Some time back, I had a draft RFC for fields in traits. That RFC was “postponed” and moved to a repo to iterate, but I have never had the time to invest in bringing it back. It has some obvious overlap with this idea of views, and (iirc) I had at some point considered using “fields in traits” as the basis for declaring views. I think I rather like this more “structural” approach, but perhaps traits with fields might be a way to give names to groups of fields that public users can reference. Have to mull on that.

View types and disjoint closure capture

Rust 2021 introduced disjoint closure capture. The idea is that closures capture one reference per path that is referenced, subject to some caveats. One of the things I am very happy with is that this was implemented with virtually no changes to the borrow checker: we basically just tweaked how closures are desugared. Besides saving a bunch of effort on the implementation³, this means that the risk of soundness problems is not increased. This strategy does have a downside, however: closures can sometimes get bigger (though we found experimentally that they rarely do in practice, and sometimes get smaller too).

Closures that access two paths like a.foo and a.bar can get bigger because they capture those paths independently, whereas before they would have just captured a as a whole. Interestingly, using view types offers us a way to desugar those closures without introducing unsafe code. Closures could capture {foo, bar} a instead of the two fields independently. Neat!

How does this affect learning?

I’m always wary about extending “core Rust” because I don’t want to make Rust harder to learn. However, I also tend to feel that extensions like this one can have the opposite effect: I think that what throws people the most when learning Rust is trying to get a feel for what they can and cannot do. When they hit “arbitrary” restrictions like “cannot say that my helper function only uses a subset of my fields”⁴ that can often be the most confusing thing of all, because at first people think that they just don’t understand the system. “Surely there must be some way to do this!”

Going a bit further, one of the other challenges with Rust’s borrow checker is that so much of its reasoning is invisible and lacks explicit syntax. There is no way to “hand annotate” the value of lifetime parameters, for example, so as to explore how they work. Similarly, the borrow checker is currently tracking fine-grained state about which paths are borrowed in your program, but you have no way to talk about that logic explicitly. Adding explicit types may indeed prove helpful for learning.

But there must be some risks?

Yes, for sure. One of the best and worst things about Rust is that your public API docs force you to make decisions like “do I want &self or &mut self access for this function?” It pushes a lot of design up front (raising the risk of premature commitment) and makes things harder to change (more viscous). If it became “the norm” for people to document fine-grained information about which methods use which groups of fields, I worry that it would create more opportunities for semver-hazards, and also just make the docs harder to read.

On the other side, one of my observations is that public-facing types don’t want views that often; the main exception is that sometimes it’d be nice for small accessors (for example, a Vec might like to document that one can read len even when iterating). Most of the time I find myself frustrated with this particular limitation of Rust, it has to do with private helper functions (similar to the initial example). In those cases, I think that the documentation is actually helpful, since it guides people who are reading and helps them know what to expect from the function.

Conclusion

This concludes our tour of “view types”, a proto-proposal. I hope you enjoyed your ride. Curious to hear what people think! I’ve opened a thread on internals for feedback. I’d love to know if you feel this would solve problems for you, but also how you think it would affect Rust learning, not to mention better syntax ideas.

I’d also be interested to read about related work. The idea here seems likely to have been invented and re-invented numerous times. What other languages, either in academia or industry, have similar mechanisms? How do they work? Educate me!

Footnotes

Yes, this is ambiguous. Think of it as my way of encouraging you to bikeshed something better. ↩︎

↩︎

Shout out to the RFC 2229 working group folks, who put in months and months and months of work on this. ↩︎

Another example is that there is no way to have a struct that has references to its own fields. ↩︎

Rustc Reading Club

2021-10-28T00:00:00+00:00

Ever wanted to understand how rustc works? Me too! Doc Jones and I have been talking and we had an idea we wanted to try. Inspired by the very cool Code Reading Club, we are launching an experimental Rustc Reading Club. Doc Jones posted an announcement on her blog, so go take a look!

The way this club works is pretty simple: every other week, we’ll get together for 90 minutes and read some part of rustc (or some project related to rustc), and talk about it. Our goal is to walk away with a high-level understanding of how that code works. For more complex parts of the code, we may wind up spending multiple sessions on the same code.

We may yet tweak this, but the plan is to follow a “semi-structured” reading process:

Identify the modules in the code and their purpose.

Look at the type definitions and try to describe their high-level purpose.

Identify the most important functions and their purpose.

Dig into how a few of those functions are actually implemented.

The meetings will not be recorded, but they will be open to anyone. The first meeting of the Rustc Reading Club will be November 4th, 2021 at 12:00pm US Eastern time. Hope to see you there!

Dyn async traits, part 6

2021-10-15T00:00:00+00:00

A quick update to my last post: first, a better way to do what I was trying to do, and second, a sketch of the crate I’d like to see for experimental purposes.

An easier way to roll our own boxed dyn traits

In the previous post I covered how you could create vtables and pair them up with a data pointer to kind of “roll your own dyn”. After I published the post, though, dtolnay sent me this Rust playground link to show me a much better approach, one based on the erased-serde crate. The idea is that instead of making a “vtable struct” with a bunch of fn pointers, we create a “shadow trait” that reflects the contents of that vtable:

// erased trait:
trait ErasedAsyncIter {
    type Item;
    fn next<'me>(&'me mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + 'me>>;
}
Then the DynAsyncIter struct can just be a boxed form of this trait:

pub struct DynAsyncIter<'data, Item> {
    pointer: Box<dyn ErasedAsyncIter<Item = Item> + 'data>,
}
We define the “shim functions” by implementing ErasedAsyncIter for all T: AsyncIter:

impl<T> ErasedAsyncIter for T
where
    T: AsyncIter,
{
    type Item = T::Item;

    fn next<'me>(&'me mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + 'me>> {
        // This code allocates a box for the result
        // and coerces into a dyn:
        Box::pin(AsyncIter::next(self))
    }
}
And finally we can implement the AsyncIter trait for the dynamic type:

impl<'data, Item> AsyncIter for DynAsyncIter<'data, Item> {
    type Item = Item;

    type Next<'me>
    where
        Item: 'me,
        'data: 'me,
    = Pin<Box<dyn Future<Output = Option<Item>> + 'me>>;

    fn next(&mut self) -> Self::Next<'_> {
        self.pointer.next()
    }
}
Yay, it all works, and without any unsafe code!
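For readers who want to poke at this, here is a self-contained sketch of the same shadow-trait setup that should build on stable Rust 1.65 or later (GATs are stable there, with the where clause written after the type). The `Countdown` iterator, the `DynAsyncIter::new` constructor, and the tiny `block_on` driver are assumptions added for illustration. Calls use the explicit `Trait::method(..)` form so that there is no doubt which `next` is meant.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The statically dispatched trait, written with a GAT as in this series.
trait AsyncIter {
    type Item;
    type Next<'me>: Future<Output = Option<Self::Item>>
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

// Dyn-safe shadow trait: same method, but the future is boxed.
trait ErasedAsyncIter {
    type Item;
    fn next<'me>(&'me mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + 'me>>;
}

impl<T: AsyncIter> ErasedAsyncIter for T {
    type Item = T::Item;
    fn next<'me>(&'me mut self) -> Pin<Box<dyn Future<Output = Option<T::Item>> + 'me>> {
        Box::pin(AsyncIter::next(self))
    }
}

struct DynAsyncIter<'data, Item> {
    pointer: Box<dyn ErasedAsyncIter<Item = Item> + 'data>,
}

impl<'data, Item> DynAsyncIter<'data, Item> {
    // Hypothetical constructor (the post does not show one).
    fn new<T: AsyncIter<Item = Item> + 'data>(value: T) -> Self {
        DynAsyncIter { pointer: Box::new(value) }
    }
}

impl<'data, Item> AsyncIter for DynAsyncIter<'data, Item> {
    type Item = Item;
    type Next<'me> = Pin<Box<dyn Future<Output = Option<Item>> + 'me>>
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_> {
        ErasedAsyncIter::next(&mut *self.pointer)
    }
}

// Hypothetical iterator: yields n, n-1, ..., 1 without ever suspending.
struct Countdown(u32);

impl AsyncIter for Countdown {
    type Item = u32;
    type Next<'me> = Ready<Option<u32>>
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_> {
        if self.0 == 0 {
            ready(None)
        } else {
            self.0 -= 1;
            ready(Some(self.0 + 1))
        }
    }
}

async fn sum_items(iter: &mut impl AsyncIter<Item = u32>) -> u32 {
    let mut s = 0;
    while let Some(v) = AsyncIter::next(&mut *iter).await {
        s += v;
    }
    s
}

// Minimal busy-polling driver, enough for futures that are ready at once.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```

The same `sum_items` function accepts both a `Countdown` (static dispatch, no allocation) and a `DynAsyncIter` wrapping one (virtual dispatch, one box per call to `next`).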

What I’d like to see

This “convert to dyn” approach isn’t really specific to async (as erased-serde shows). I’d like to see a decorator that applies it to any trait. I imagine something like:

// Generates the `DynAsyncIter` type shown above:
#[derive_dyn(DynAsyncIter)]
trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}
But this ought to work with any -> impl Trait return type, too, so long as Trait is dyn safe and implemented for Box. So something like this:

// Generates a `DynSillyIterTools` type:
#[derive_dyn(DynSillyIterTools)]
trait SillyIterTools: Iterator {
    // Iterate over the iter in pairs of two items.
    fn pair_up(&mut self) -> impl Iterator<Item = (Self::Item, Self::Item)>;
}
would generate an erased trait that returns a Box<dyn Iterator<Item = (Self::Item, Self::Item)>>. Similarly, you could do a trick with taking any impl Foo and passing in a Box<dyn Foo>, so you can support impl Trait in argument position.

Even without impl trait, derive_dyn would create a more ergonomic dyn to play with.

I don’t really see this as a “long term solution”, but I would be interested to play with it.

Comments?

I’ve created a thread on internals if you’d like to comment on this post, or others in this series.

Dyn async traits, part 5

2021-10-14T00:00:00+00:00

If you’re willing to use nightly, you can already model async functions in traits by using GATs and impl Trait — this is what the Embassy async runtime does, and it’s also what the real-async-trait crate does. One shortcoming, though, is that your trait doesn’t support dynamic dispatch. In the previous posts of this series, I have been exploring some of the reasons for that limitation, and what kind of primitive capabilities need to be exposed in the language to overcome it. My thought was that we could try to stabilize those primitive capabilities with the plan of enabling experimentation. I am still in favor of this plan, but I realized something yesterday: using procedural macros, you can ALMOST do this experimentation today! Unfortunately, it doesn’t quite work owing to some relatively obscure rules in the Rust type system (perhaps some clever readers will find a workaround; that said, these are rules I have wanted to change for a while).

Just to be crystal clear: Nothing in this post is intended to describe an “ideal end state” for async functions in traits. I still want to get to the point where one can write async fn in a trait without any further annotation and have the trait be “fully capable” (support both static dispatch and dyn mode while adhering to the tenets of zero-cost abstractions¹). But there are some significant questions there, and to find the best answers for those questions, we need to enable more exploration, which is the point of this post.

Code is on github

The code covered in this blog post has been prototyped and is available on github. See the caveat at the end of the post, though!

Design goal

To see what I mean, let’s return to my favorite trait, AsyncIter:

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}
The post is going to lay out how we can transform a trait declaration like the one above into a series of declarations that achieve the following:

We can use it as a generic bound (fn foo<T: AsyncIter>()), in which case we get static dispatch, full auto trait support, and all the other goodies that normally come with generic bounds in Rust.

Given a T: AsyncIter, we can coerce it into some form of DynAsyncIter that uses virtual dispatch. In this case, the type doesn’t reveal the specific T or the specific types of the futures.

I wrote DynAsyncIter, and not dyn AsyncIter on purpose — we are going to create our own type that acts like a dyn type, but which manages the adaptations needed for async.

For simplicity, let’s assume we want to box the resulting futures. Part of the point of this design though is that it leaves room for us to generate whatever sort of wrapping types we want.

You could write the code I’m showing here by hand, but the better route would be to package it up as a kind of decorator (e.g., #[async_trait_v2]²).

The basics: trait with a GAT

The first step is to transform the trait to have a GAT and a regular fn, in the way that we’ve seen many times:

trait AsyncIter {
    type Item;

    type Next<'me>: Future<Output = Option<Self::Item>>
    where
        Self: 'me;

    fn next(&mut self) -> Self::Next<'_>;
}
Next: define a “DynAsyncIter” struct

The next step is to manage the virtual dispatch (dyn) version of the trait. To do this, we are going to “roll our own” object by creating a struct DynAsyncIter. This struct plays the role of a Box<dyn AsyncIter> trait object. Instances of the struct can be created by calling DynAsyncIter::from with some specific iterator type; the DynAsyncIter type implements the AsyncIter trait, so once you have one you can just call next as usual:

let mut the_iter: DynAsyncIter<u32> = DynAsyncIter::from(some_iterator);
let total = sum_items(&mut the_iter).await;

async fn sum_items(iter: &mut impl AsyncIter<Item = u32>) -> u32 {
    let mut s = 0;
    // Call `next` on the parameter, not on the outer variable.
    while let Some(v) = iter.next().await {
        s += v;
    }
    s
}
Struct definition

Let’s look at how this DynAsyncIter struct is defined. As noted above, the struct models a Box<dyn AsyncIter> trait object; it will have one generic parameter for every ordinary associated type declared in the trait (not including the GATs we introduced for async fn return types). The struct itself has two fields, the data pointer (a box, but in raw form) and a vtable. We don’t know the type of the underlying value, so we’ll use ErasedData for that:

type ErasedData = ();

pub struct DynAsyncIter<Item> {
    data: *mut ErasedData,
    vtable: &'static DynAsyncIterVtable<Item>,
}
For the vtable, we will make a struct that contains a fn for each of the methods in the trait. Unlike the builtin vtables, we will modify the return type of these functions to be a boxed future:

struct DynAsyncIterVtable<Item> {
    drop_fn: unsafe fn(*mut ErasedData),
    next_fn: unsafe fn(&mut *mut ErasedData) -> Box<dyn Future<Output = Option<Item>> + '_>,
}
Implementing the AsyncIter trait

Next, we can implement the AsyncIter trait for the DynAsyncIter type. For each of the new GATs we introduced, we simply use a boxed future type. For the method bodies, we extract the function pointer from the vtable and call it:

impl<Item> AsyncIter for DynAsyncIter<Item> {
    type Item = Item;

    type Next<'me> = Box<dyn Future<Output = Option<Item>> + 'me>;

    fn next(&mut self) -> Self::Next<'_> {
        let next_fn = self.vtable.next_fn;
        unsafe { next_fn(&mut self.data) }
    }
}
The unsafe keyword here is asserting that the safety conditions of next_fn are met. We’ll cover that in more detail later, but in short those conditions are:

The vtable corresponds to some erased type T: AsyncIter…

…and each instance of *mut ErasedData points to a valid Box<T> for that type.

Dropping the object

Speaking of Drop, we do need to implement that as well. It too will call through the vtable:

impl<Item> Drop for DynAsyncIter<Item> {
    fn drop(&mut self) {
        let drop_fn = self.vtable.drop_fn;
        unsafe {
            drop_fn(self.data);
        }
    }
}
We need to call through the vtable because we don’t know what kind of data we have, so we can’t know how to drop it correctly.

Creating an instance of DynAsyncIter

To create one of these DynAsyncIter objects, we can implement the From trait. This allocates a box, coerces it into a raw pointer, and then combines that with the vtable:

impl<Item, T> From<T> for DynAsyncIter<Item>
where
    T: AsyncIter<Item = Item>,
{
    fn from(value: T) -> DynAsyncIter<Item> {
        let boxed_value = Box::new(value);
        DynAsyncIter {
            data: Box::into_raw(boxed_value) as *mut (),
            vtable: dyn_async_iter_vtable::<T>(), // we’ll cover this fn later
        }
    }
}
Creating the vtable shims

Now we come to the most interesting part: how do we create the vtable for one of these objects? Recall that our vtable was a struct like so:

struct DynAsyncIterVtable<Item> {
    drop_fn: unsafe fn(*mut ErasedData),
    next_fn: unsafe fn(&mut *mut ErasedData) -> Box<dyn Future<Output = Option<Item>> + '_>,
}
We are going to need to create the values for each of those fields. In an ordinary dyn, these would be pointers directly to the methods from the impl, but for us they are “wrapper functions” around the core trait functions. The role of these wrappers is to introduce some minor coercions, such as allocating a box for the resulting future, as well as to adapt from the “erased data” to the true type:

// Safety conditions:
//
// The `*mut ErasedData` is actually the raw form of a `Box<T>`
// that is valid for 'a.
unsafe fn next_wrapper<'a, T>(
    this: &'a mut *mut ErasedData,
) -> Box<dyn Future<Output = Option<T::Item>> + 'a>
where
    T: AsyncIter,
{
    // Go through a raw pointer first: a reference cannot be cast
    // directly to a pointer with a different pointee type.
    let unerased_this: &mut Box<T> = unsafe {
        &mut *(this as *mut *mut ErasedData as *mut Box<T>)
    };
    let future: T::Next<'_> = <T as AsyncIter>::next(unerased_this);
    Box::new(future)
}
We’ll also need a “drop” wrapper:

// Safety conditions:
//
// The `*mut ErasedData` is actually the raw form of a `Box<T>`
// and this function is being given ownership of it.
unsafe fn drop_wrapper<T>(
    this: *mut ErasedData,
)
where
    T: AsyncIter,
{
    let unerased_this = unsafe { Box::from_raw(this as *mut T) };
    drop(unerased_this); // Execute destructor as normal
}
Constructing the vtable

Now that we’ve defined the wrappers, we can construct the vtable itself. Recall that the From impl called a function dyn_async_iter_vtable::<T>. That function looks like this:

fn dyn_async_iter_vtable<T>() -> &'static DynAsyncIterVtable<T::Item>
where
    T: AsyncIter,
{
    const {
        &DynAsyncIterVtable {
            drop_fn: drop_wrapper::<T>,
            next_fn: next_wrapper::<T>,
        }
    }
}
This constructs a struct with the two function pointers: this struct only contains static data, so we are allowed to return a &'static reference to it.

Done!

And now the caveat, and a plea for help

Unfortunately, this setup doesn’t work quite how I described it. There are two problems:

const functions and expressions still have a lot of limitations, especially around generics like T, and I couldn’t get them to work;

Because of the rules introduced by RFC 1214, the &'static DynAsyncIterVtable<T::Item> type requires that T::Item: 'static, which may not be true here. This condition perhaps shouldn’t be necessary, but the compiler currently enforces it.

I wound up hacking something terrible that erased the T::Item type into uses and used Box::leak to get a &'static reference, just to prove out the concept. I’m almost embarrassed to show the code, but there it is.

Anyway, I know people have done some pretty clever tricks, so I’d be curious to know if I’m missing something and there is a way to build this vtable on Rust today. Regardless, it seems like extending const and a few other things to support this case is a relatively light lift, if we wanted to do that.
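As a point of reference, here is a much-simplified sketch of the same “roll your own dyn” structure that should compile on stable Rust. It sidesteps both problems rather than solving them: the trait is synchronous, the item type is fixed to u32 (so no T::Item: 'static question arises), and the vtable lives in an associated const instead of a const block. The names `Counter`, `HasVtable`, `DynCounter`, `UpTo`, and `drain` are all made up for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

type ErasedData = ();

// Simplified, synchronous stand-in for AsyncIter with a fixed item type.
trait Counter {
    fn next(&mut self) -> Option<u32>;
}

struct CounterVtable {
    drop_fn: unsafe fn(*mut ErasedData),
    next_fn: unsafe fn(*mut ErasedData) -> Option<u32>,
}

// Safety: `this` must be the raw form of a live `Box<T>`.
unsafe fn next_wrapper<T: Counter>(this: *mut ErasedData) -> Option<u32> {
    let unerased: &mut T = unsafe { &mut *(this as *mut T) };
    Counter::next(unerased)
}

// Safety: `this` must be the raw form of a `Box<T>` whose ownership is handed over.
unsafe fn drop_wrapper<T: Counter>(this: *mut ErasedData) {
    drop(unsafe { Box::from_raw(this as *mut T) });
}

// One vtable per implementing type, stored in an associated const.
trait HasVtable {
    const VTABLE: &'static CounterVtable;
}

impl<T: Counter> HasVtable for T {
    const VTABLE: &'static CounterVtable = &CounterVtable {
        drop_fn: drop_wrapper::<T>,
        next_fn: next_wrapper::<T>,
    };
}

struct DynCounter {
    data: *mut ErasedData,
    vtable: &'static CounterVtable,
}

impl DynCounter {
    fn new<T: Counter + 'static>(value: T) -> Self {
        DynCounter {
            data: Box::into_raw(Box::new(value)) as *mut ErasedData,
            vtable: <T as HasVtable>::VTABLE,
        }
    }
}

impl Counter for DynCounter {
    fn next(&mut self) -> Option<u32> {
        let next_fn = self.vtable.next_fn;
        // Safety: `data` and `vtable` were created together in `new`.
        unsafe { next_fn(self.data) }
    }
}

impl Drop for DynCounter {
    fn drop(&mut self) {
        let drop_fn = self.vtable.drop_fn;
        // Safety: we own the box and never use `data` again.
        unsafe { drop_fn(self.data) }
    }
}

// Example implementor that counts how often it is dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct UpTo {
    cur: u32,
    end: u32,
}

impl Counter for UpTo {
    fn next(&mut self) -> Option<u32> {
        if self.cur < self.end {
            self.cur += 1;
            Some(self.cur - 1)
        } else {
            None
        }
    }
}

impl Drop for UpTo {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn drain(counter: &mut impl Counter) -> Vec<u32> {
    let mut out = Vec::new();
    while let Some(v) = Counter::next(&mut *counter) {
        out.push(v);
    }
    out
}
```

Whether the associated-const trick can be stretched to cover a generic Item is exactly the open question above, so this should be read as an illustration of the moving parts and not as a workaround.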

Conclusion

This blog post presented a way to implement the dyn dispatch ideas I’ve been talking about using only features that currently exist and are generally en route to stabilization. That’s exciting to me, because it means that we can start to do measurements and experimentation. For example, I would really like to know the performance impact of transitioning from async-trait to a scheme that uses a combination of static dispatch and boxed dynamic dispatch as described here. I would also like to explore whether there are other ways to wrap futures (e.g., with task-local allocators or other smart pointers) that might perform better. This would help inform what kind of capabilities we ultimately need.

Looking beyond async, I’m interested in tinkering with different models for dyn in general. As an obvious example, the “always boxed” version I implemented here has some runtime cost (an allocation!) and isn’t applicable in all environments, but it would be far more ergonomic. Trait objects would be Sized and would transparently work in far more contexts. We can also prototype different kinds of vtable adaptation.

In the words of Bjarne Stroustrup, “What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.” ↩︎

Egads, I need a snazzier name than that! ↩︎

CTCFT 2021-10-18 Agenda

2021-10-13T00:00:00+00:00

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place next Monday, on 2021-10-18 (in your time zone)! This post covers the agenda. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Agenda

The theme for this meeting is exploring ways to empower and organize contributors.

(5 min) Opening remarks 👋 (nikomatsakis)

(5 min) CTCFT update (angelonfira)

(20 min) Sprints and groups implementing the async vision doc (tmandry)

(15 min) rust-analyzer talk (TBD)

The rust-analyzer project aims to succeed RLS as the official language server for Rust. We talk about how it differs from RLS, how it is developed, and what to expect in the future.

(10 min) Contributor survey (yaahc)

Introducing the contributor survey, its goals, methodology, and soliciting community feedback

(5 min) Closing (nikomatsakis)

Afterwards: Social hour

After the CTCFT this week, we are going to try an experimental social hour. The hour will be coordinated in the #ctcft stream of the rust-lang Zulip. The idea is to create breakout rooms where people can gather to talk, hack together, or just chill.

Dyn async traits, part 4

2021-10-07T00:00:00+00:00

In the previous post, I talked about how we could write our own impl Iterator for dyn Iterator by adding a few primitives. In this post, I want to look at what it would take to extend that to an async iterator trait. As before, I am interested in exploring the “core capabilities” that would be needed to make everything work.

Start somewhere: Just assume we want Box

In the first post of this series, we talked about how invoking an async fn through a dyn trait should have the return type of that async fn be a Box<dyn Future>, but only when calling it through a dyn type, not all the time.

Actually, that’s a slight simplification: Box<dyn Future> is certainly one type we could use, but there are other types you might want:

Box<dyn Future + Send>, to indicate that the future is sendable across threads;

Some other wrapper type besides Box.

To keep things simple, I’m just going to look at Box<dyn Future> in this post. We’ll come back to some of those extensions later.

Background: Running example

Let’s start by recalling the AsyncIter trait:

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}
Remember that when we “desugared” this async fn, we introduced a new (generic) associated type for the future returned by next, called Next here:

trait AsyncIter {
    type Item;

    // The future resolves to `Option<Self::Item>`, matching the async fn above.
    type Next<'me>: Future<Output = Option<Self::Item>> + 'me;
    fn next(&mut self) -> Self::Next<'_>;
}
We were working with a struct SleepyRange that implements AsyncIter:

struct SleepyRange { … }

impl AsyncIter for SleepyRange {
    type Item = u32;
    …
}
Background: Associated types in a static vs dyn context

Using an associated type is great in a static context, because it means that when you call sleepy_range.next(), we are able to resolve the returned future type precisely. This helps us to allocate exactly as much stack as is needed and so forth.

But in a dynamic context, i.e. if you have some_iter: Box<dyn AsyncIter> and you invoke some_iter.next(), that’s a liability. The whole point of using dyn is that we don’t know exactly what implementation of AsyncIter::next we are invoking, so we can’t know exactly what future type is returned. Really, we just want to get back a Box<dyn Future<Output = Option<u32>>>, or something very similar.

How could we have a trait that boxes futures, but only when using dyn?

If we want the trait to only box futures when using dyn, there are two things we need.

First, we need to change the impl AsyncIter for dyn AsyncIter. In the compiler today, it generates an impl which is generic over the value of every associated type. But we want an impl that is generic over the value of the Item type, but which specifies the value of the Next type to be Box<dyn Future>. This way, we are effectively saying that “when you call the next method on a dyn AsyncIter, you always get a Box<dyn Future> back”. When you call the next method on a specific type, such as a SleepyRange, you would get back a different type: the actual future type, not a boxed version. If we were to write that dyn impl in Rust code, it might look something like this:

impl<I> AsyncIter for dyn AsyncIter<Item = I> {
    type Item = I;

    type Next<'me> = Box<dyn Future<Output = Option<I>> + 'me>;

    fn next(&mut self) -> Self::Next<'_> {
        /* see below */
    }
}
The body of the next function is code that extracts the function pointer from the vtable and calls it. Something like this, relying on the APIs from [RFC 2580] along with the function associated_fn that I sketched in the previous post:

fn next(&mut self) -> Self::Next<'_> {
    type RuntimeType = ();
    let data_pointer: *mut RuntimeType = self as *mut ();
    let vtable: DynMetadata = ptr::metadata(self);
    let fn_pointer: fn(*mut RuntimeType) -> Box<dyn Future<Output = Option<I>> + '_> =
        associated_fn::<AsyncIter::next>(vtable);
    fn_pointer(data_pointer)
}
This is still the code we want. However, there is a slight wrinkle.

Constructing the vtable: Async functions need a shim to return a Box

In the next method above, the type of the function pointer that we extracted from the vtable was the following:

fn(*mut RuntimeType) -> Box<dyn Future<Output = Option<I>> + '_>
However, the signature of the function in the impl is different! It doesn’t return a Box, it returns an impl Future! Somehow we have to bridge this gap. What we need is a kind of “shim function”, something like this:

fn next_box_shim<T: AsyncIter>(this: &mut T) -> Box<dyn Future<Output = Option<T::Item>> + '_> {
    let future: impl Future<Output = Option<T::Item>> = AsyncIter::next(this);
    Box::new(future)
}
Now the vtable for SleepyRange can store next_box_shim::<SleepyRange> instead of storing <SleepyRange as AsyncIter>::next directly.
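Here is a runnable approximation of that shim on stable Rust (1.65 or later), under a few assumptions. It returns Pin<Box<dyn Future>> instead of a plain Box so that the result can be polled directly. The `SleepyRange` below is a stand-in that never actually sleeps, and `NextSlot`, `collect_via_slot`, and `block_on` are hypothetical helpers. The slot type still names the concrete receiver; erasing it to a raw pointer is the part that needs compiler support.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait AsyncIter {
    type Item;
    type Next<'me>: Future<Output = Option<Self::Item>>
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

// Stand-in for the post's SleepyRange; this version never sleeps.
struct SleepyRange {
    cur: u32,
    end: u32,
}

impl AsyncIter for SleepyRange {
    type Item = u32;
    type Next<'me> = Ready<Option<u32>>
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_> {
        if self.cur < self.end {
            self.cur += 1;
            ready(Some(self.cur - 1))
        } else {
            ready(None)
        }
    }
}

// The shim: call the statically typed `next`, then box the future.
fn next_box_shim<T: AsyncIter>(
    this: &mut T,
) -> Pin<Box<dyn Future<Output = Option<T::Item>> + '_>> {
    Box::pin(AsyncIter::next(this))
}

// A function-pointer type with the boxed-future signature.
type NextSlot =
    for<'a> fn(&'a mut SleepyRange) -> Pin<Box<dyn Future<Output = Option<u32>> + 'a>>;

// Minimal busy-polling driver, enough for futures that are ready at once.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn collect_via_slot(range: &mut SleepyRange) -> Vec<u32> {
    // What a vtable entry would hold: the shim, not `SleepyRange::next`.
    let slot: NextSlot = next_box_shim::<SleepyRange>;
    let mut out = Vec::new();
    while let Some(v) = block_on(slot(&mut *range)) {
        out.push(v);
    }
    out
}
```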

Extending the AssociatedFn trait

In my previous post, I sketched out the idea of an AssociatedFn trait that had an associated type FnPtr. If we wanted to make the construction of this sort of shim automated, we would want to change that from an associated type into its own trait. I’m imagining something like this:

trait AssociatedFn { }

trait Reify<F>: AssociatedFn {
    fn reify(self) -> F;
}
where A: Reify<F> indicates that the associated function A can be “reified” (made into a function pointer) for a function type F. The compiler could implement this trait for the direct mapping where possible, but also for various kinds of shims and ABI transformations. For example, the AsyncIter::next method might implement Reify<fn(*mut ()) -> Box<dyn Future<..>>> to allow a “boxing shim” to be constructed and so forth.

Other sorts of shims

There are other sorts of limitations around dyn traits that could be overcome with judicious use of shims and tweaked vtables, at least in some cases. As an example, consider this trait:

pub trait Append {
    fn append(&mut self, values: impl Iterator<Item = u32>);
}
This trait is not traditionally dyn-safe because the append function is generic and requires monomorphization for each kind of iterator — therefore, we don’t know which version to put in the vtable for Append, since we don’t yet know the types of iterators it will be applied to! But what if we just put one version, the case where the iterator type is &mut dyn Iterator? We could then tweak the impl Append for dyn Append to create this &mut dyn Iterator and call the function from the vtable:

impl Append for dyn Append {
    fn append(&mut self, mut values: impl Iterator<Item = u32>) {
        // A `&mut dyn Iterator` needs a mutable borrow of `values`.
        let values_dyn: &mut dyn Iterator<Item = u32> = &mut values;
        type RuntimeType = ();
        let data_pointer: *mut RuntimeType = self as *mut ();
        let vtable: DynMetadata = ptr::metadata(self);
        let f = associated_fn::<Append::append>(vtable);
        f(data_pointer, values_dyn);
    }
}
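The same “one monomorphic version in the vtable” idea can be approximated in today’s Rust with a hand-written companion trait instead of a tweaked vtable. This is a sketch under that assumption: `AppendDyn`, `Numbers`, and `append_through_dyn` are hypothetical names, and the compiler does none of this automatically.

```rust
// The trait from the post: the generic method means `dyn Append` is not allowed.
pub trait Append {
    fn append(&mut self, values: impl Iterator<Item = u32>);
}

// Hypothetical dyn-safe companion holding the single `&mut dyn Iterator` version.
pub trait AppendDyn {
    fn append_dyn(&mut self, values: &mut dyn Iterator<Item = u32>);
}

// Plays the role of the vtable entry: forward to the real method,
// instantiated for `&mut dyn Iterator`.
impl<T: Append> AppendDyn for T {
    fn append_dyn(&mut self, values: &mut dyn Iterator<Item = u32>) {
        Append::append(self, values)
    }
}

// Plays the role of `impl Append for dyn Append`: erase the iterator, then dispatch.
impl<'a> Append for dyn AppendDyn + 'a {
    fn append(&mut self, mut values: impl Iterator<Item = u32>) {
        self.append_dyn(&mut values)
    }
}

pub struct Numbers(pub Vec<u32>);

impl Append for Numbers {
    fn append(&mut self, values: impl Iterator<Item = u32>) {
        self.0.extend(values);
    }
}

// Calls the generic method through a trait object.
fn append_through_dyn(target: &mut dyn AppendDyn) {
    Append::append(target, 1..4);
}
```

The cost is one extra virtual call per item pulled from the iterator, which is the price of not monomorphizing `append` for each iterator type.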
Conclusion

So where does this leave us? The core building blocks for “dyn async traits” seem to be:

The ability to customize the contents of the vtable that gets generated for a trait.

For example, async fns need shim functions that box the output.

The ability to customize the dispatch logic (impl Foo for dyn Foo).

The ability to customize associated types like Next to be a Box:

This requires the ability to extract the vtable, as given by [RFC 2580].

It also requires the ability to extract functions from the vtable (not presently supported).

I said at the outset that I was going to assume, for the purposes of this post, that we wanted to return a Box, and I have. It seems possible to extend these core capabilities to other sorts of return types (such as other smart pointers), but it’s not entirely trivial; we’d have to define what kinds of shims the compiler can generate.

I haven’t really thought very hard about how we might allow users to specify each of those building blocks, though I sketched out some possibilities. At this point, I’m mostly trying to explore the possibilities of what kinds of capabilities may be useful or necessary to expose.

Dyn async traits, part 3

2021-10-06T00:00:00+00:00

In the previous “dyn async traits” posts, I talked about how we can think about the compiler as synthesizing an impl that performed the dynamic dispatch. In this post, I wanted to start exploring a theoretical future in which this impl was written manually by the Rust programmer. This is in part a thought exercise, but it’s also a possible ingredient for a future design: if we could give programmers more control over the “impl Trait for dyn Trait” impl, then we could enable a lot of use cases.

Example

For this post, async fn is kind of a distraction. Let’s just work with a simplified Iterator trait:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}
As we discussed in the previous post, the compiler today generates an impl that is something like this:

impl<I> Iterator for dyn Iterator<Item = I> {
    type Item = I;

    fn next(&mut self) -> Option<I> {
        type RuntimeType = ();
        let data_pointer: *mut RuntimeType = self as *mut ();
        let vtable: DynMetadata = ptr::metadata(self);
        let fn_pointer: fn(*mut RuntimeType) -> Option<I> =
            __get_next_fn_pointer__(vtable);
        fn_pointer(data_pointer)
    }
}
This code draws on the APIs from RFC 2580, along with a healthy dash of “pseudo-code”. Let’s see what it does:

Extracting the data pointer

type RuntimeType = ();
let data_pointer: *mut RuntimeType = self as *mut ();
Here, self is a wide pointer of type &mut dyn Iterator. The rules for as state that casting a wide pointer to a thin pointer drops the metadata¹, so we can (ab)use that to get the data pointer. Here I just gave the pointer the type *mut RuntimeType, which is an alias for *mut () — i.e., raw pointer to something. The type alias RuntimeType is meant to signify “whatever type of data we have at runtime”. Using () for this is a hack; the “proper” way to model it would be with an existential type. But since Rust doesn’t have those, and I’m not keen to add them if we don’t have to, we’ll just use this type alias for now.
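This step can be tried on stable Rust. The sketch below uses two hypothetical helpers: `data_pointer` performs the wide-to-thin cast, and `wide_and_thin_sizes` reports the pointer sizes. The two-word size of a `&mut dyn Trait` is how current compilers lay it out, not something this snippet guarantees.

```rust
use std::mem::size_of;

// Drop the vtable half of a wide pointer, keeping only the data address.
fn data_pointer(it: &mut dyn Iterator<Item = u32>) -> *mut () {
    // First `as` turns the reference into a wide raw pointer,
    // second `as` discards the metadata.
    it as *mut _ as *mut ()
}

// Sizes in bytes of a wide `&mut dyn` reference and a thin raw pointer.
fn wide_and_thin_sizes() -> (usize, usize) {
    (
        size_of::<&mut dyn Iterator<Item = u32>>(),
        size_of::<*mut ()>(),
    )
}
```

The thin pointer that comes out is the address of the concrete value that was coerced into the trait object.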

Extracting the vtable (or DynMetadata)

let vtable: DynMetadata = ptr::metadata(self);
The ptr::metadata function was added in RFC 2580. Its purpose is to extract the “metadata” from a wide pointer. The type of this metadata depends on the type of wide pointer you have: this is determined by the Pointee trait[^noreferent]. For dyn types, the metadata is a DynMetadata, which just means “pointer to the vtable”. In today’s APIs, the DynMetadata is pretty limited: it lets you extract the size/alignment of the underlying RuntimeType, but it doesn’t give any access to the actual function pointers that are inside.

Extracting the function pointer from the vtable

let fn_pointer: fn(*mut RuntimeType) -> Option<I> = __get_next_fn_pointer__(vtable);
Now we get to the pseudocode. Somehow, we need a way to get the fn pointer out from the vtable. At runtime, the way this works is that each method has an assigned offset within the vtable, and you basically do an array lookup; kind of like vtable.methods()[0], where methods() returns an array &[fn()] of function pointers. The problem is that there’s a lot of “dynamic typing” going on here: the signature of each one of those methods is going to be different. Moreover, we’d like some freedom to change how vtables are laid out. For example, the ongoing (and awesome!) work on dyn upcasting by Charles Lew has required modifying our vtable layout, and I expect further modification as we try to support dyn types with multiple traits, like dyn Debug + Display.

So, for now, let’s just leave this as pseudocode. Once we’ve finished walking through the example, I’ll return to this question of how we might model __get_next_fn_pointer__ in a forwards compatible way.

One thing worth pointing out: the type of fn_pointer is a fn(*mut RuntimeType) -> Option<I>. There are two interesting things going on here:

The argument has type *mut RuntimeType: using the type alias indicates that this function is known to take a single pointer (in fact, it’s a reference, but those have the same layout). This pointer is expected to point to the same runtime data that self points at — we don’t know what it is, but we know that they’re the same. This works because self paired together a pointer to some data of type RuntimeType along with a vtable of functions that expect RuntimeType references.²

The return type is Option<I>, where I is the item type: this is interesting because although we don’t know statically what the Self type is, we do know the Item type. In fact, we will generate a distinct copy of this impl for every kind of item. This allows us to easily pass the return value.

Calling the function

fn_pointer(data_pointer)
The final line in the code is very simple: we call the function! It returns an Option<I> and we can return that to our caller.

Returning to the pseudocode

We relied on one piece of pseudocode in that imaginary impl:

let fn_pointer: fn(*mut RuntimeType) -> Option<I> = __get_next_fn_pointer__(vtable);
So how could we possibly turn __get_next_fn_pointer__ from pseudocode into real code? There are two things worth noting:

First, the name of this function already encodes the method we want (next). We probably don’t want to generate an infinite family of these “getter” functions.

Second, the signature of the function is specific to the method we want, since it returns a fn type (fn(*mut RuntimeType) -> Option<I>) that encodes the signature for next (with the self type changed, of course). This seems better than just returning a generic signature like fn() that must be cast manually by the user; less opportunity for error.

Using zero-sized fn types as the basis for an API

One way to solve these problems would be to build on the trait system. Imagine there were a type for every method, let’s call it A, and that this type implemented a trait like AssociatedFn:

trait AssociatedFn {
    // The type of the associated function, but as a `fn` pointer
    // with the self type erased. This is the type that would be
    // encoded in the vtable.
    type FnPointer;
    … // maybe other things
}
We could then define a generic “get function pointer” function like so:

fn associated_fn<A>(vtable: DynMetadata) -> A::FnPointer where A: AssociatedFn
Now instead of __get_next_fn_pointer__, we can write

type NextMethodType = /* type corresponding to the next method */;
let fn_pointer: fn(*mut RuntimeType) -> Option<I> =
    associated_fn::<NextMethodType>(vtable);
Ah, but what is this NextMethodType? How do we get the type for the next method? Presumably we’d have to introduce some syntax, like Iterator::next.
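One way to picture the shape of this API is to model it against a hand-written vtable struct instead of the real DynMetadata. Everything in this sketch is hypothetical: `IterVtable`, the marker type `NextMethod`, the extra `V` parameter on `AssociatedFn`, and its `get` method are assumptions added so that the model compiles on stable Rust, not part of any real or proposed API.

```rust
// Hypothetical hand-written vtable for "some iterator yielding `I`".
// Real vtables are laid out by the compiler; this struct is only a model.
struct IterVtable<I> {
    next: fn(*mut ()) -> Option<I>,
}

// Stand-in for the `AssociatedFn` trait from the text. The `V` parameter
// and the `get` method are assumptions of this sketch.
trait AssociatedFn<V> {
    type FnPointer;
    fn get(vtable: &V) -> Self::FnPointer;
}

// Marker type playing the role of "the type of the `next` method".
struct NextMethod;

impl<I> AssociatedFn<IterVtable<I>> for NextMethod {
    type FnPointer = fn(*mut ()) -> Option<I>;
    fn get(vtable: &IterVtable<I>) -> Self::FnPointer {
        vtable.next
    }
}

// Generic getter: the method is selected by the type parameter `A`, and
// the returned pointer already has the right signature (no manual cast).
fn associated_fn<A, V>(vtable: &V) -> A::FnPointer
where
    A: AssociatedFn<V>,
{
    A::get(vtable)
}

// A concrete type plus the type-erased function stored in its vtable.
struct Counter {
    n: u32,
}

fn counter_next(data: *mut ()) -> Option<u32> {
    // SAFETY (assumption of this sketch): `data` always points to a live
    // `Counter`, because it is only paired with the vtable built below.
    let counter = unsafe { &mut *(data as *mut Counter) };
    if counter.n < 3 {
        counter.n += 1;
        Some(counter.n)
    } else {
        None
    }
}

// "Dynamic" code: it knows the item type but not the concrete Self type.
fn drain(data: *mut (), vtable: &IterVtable<u32>) -> Vec<u32> {
    let next = associated_fn::<NextMethod, _>(vtable);
    let mut out = Vec::new();
    while let Some(x) = next(data) {
        out.push(x);
    }
    out
}

fn collect_counter() -> Vec<u32> {
    let mut counter = Counter { n: 0 };
    let vtable: IterVtable<u32> = IterVtable { next: counter_next };
    drain(&mut counter as *mut Counter as *mut (), &vtable)
}

fn main() {
    println!("{:?}", collect_counter());
}
```

The part that carries over to the real problem is `associated_fn::<NextMethod, _>`: the caller names the method with a type, and the return type follows from that choice.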

Related concept: zero-sized fn types

This idea of a type for associated functions is very close (but not identical) to an already existing concept in Rust: zero-sized function types. As you may know, the type of a Rust function is in fact a special zero-sized type that uniquely identifies the function. There is (presently, anyway) no syntax for this type, but you can observe it by printing out the size of values (playground):

fn foo() { }
// The type of `f` is not `fn()`. It is a special, zero-sized type that uniquely
// identifies `foo`
let f = foo;
println!("{}", std::mem::size_of_val(&f)); // prints 0
// This type can be coerced to `fn()`, which is a function pointer
let g: fn() = f;
println!("{}", std::mem::size_of_val(&g)); // prints 8 (on a 64-bit target)
There are also types for functions that appear in impls. For example, you could get an instance of the type that represents the next method on vec::IntoIter like so:

let x = <vec::IntoIter<u32> as Iterator>::next;
println!("{}", std::mem::size_of_val(&x)); // prints 0
Where the zero-sized types don’t fit

The existing zero-sized types can’t be used for our “associated function” type for two reasons:

You can’t name them! We can fix this by adding syntax.

There is no zero-sized type for a trait function independent of an impl.

The latter point is subtle³. Before, when I talked about getting the type for a function from an impl, you’ll note that I gave a fully qualified function name, which specified the Self type precisely:

let x = <vec::IntoIter<u32> as Iterator>::next;
//       ^^^^^^^^^^^^^^^^^^ the Self type
But what we want in our impl is to write code that doesn’t know what the Self type is! So this type that exists in the Rust type system today isn’t quite what we need. But it’s very close.
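A small stable-Rust sketch of that gap: the fn item type for `next` is zero-sized, but both it and the fn pointer it coerces to are tied to one concrete Self type. The function names `fn_item_size` and `first_via_pointer` are made up for illustration.

```rust
use std::mem::{size_of, size_of_val};
use std::vec;

/// Size of the zero-sized fn item type for `next` on `vec::IntoIter<u32>`.
fn fn_item_size() -> usize {
    let item = <vec::IntoIter<u32> as Iterator>::next;
    size_of_val(&item)
}

/// Calls `next` through a fn pointer. Note that the pointer's signature
/// has to spell out the concrete Self type, `vec::IntoIter<u32>`.
fn first_via_pointer(values: Vec<u32>) -> Option<u32> {
    let ptr: fn(&mut vec::IntoIter<u32>) -> Option<u32> =
        <vec::IntoIter<u32> as Iterator>::next;
    let mut iter = values.into_iter();
    ptr(&mut iter)
}

fn main() {
    println!("fn item size:    {}", fn_item_size());
    println!("fn pointer size: {}", size_of::<fn()>());
    println!("first element:   {:?}", first_via_pointer(vec![10, 20]));
}
```

There is no way in this code to write "the `next` of Iterator, for a Self I don't know yet", which is exactly the type the dyn impl would need.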

Conclusion

I’m going to leave it here. Obviously, I haven’t presented any kind of final design, but we’ve seen a lot of tantalizing ingredients:

Today, the compiler generates an impl Iterator for dyn Iterator that extracts functions from a vtable and invokes them by magic.

But, using the APIs from RFC 2580, you can almost write that impl by hand. What is missing is a way to extract a function pointer from a vtable, and what makes that hard is that we need a way to identify the function we are extracting.

We have zero-sized types that represent functions today, but we don’t have a way to name them, and we don’t have zero-sized types for functions in traits, only in impls.

Of course, all of the stuff I wrote here was just about normal functions. We still need to circle back to async functions, which add a few extra wrinkles. Until next time!

Footnotes

I don’t actually like these rules, which have bitten me a few times. I think we should introduce an accessor function, but I didn’t see one in RFC 2580 — maybe I missed it, or it already exists. ↩︎

If you used unsafe code to pair up a random pointer with an unrelated vtable, then hilarity would ensue here, as there is no runtime checking that these types line up. ↩︎

And, in fact, I didn’t see it until I was writing this blog post! ↩︎

Dyn async traits, part 2

2021-10-01T00:00:00+00:00

In the previous post, we uncovered a key challenge for dyn and async traits: the fact that, in Rust today, dyn types have to specify the values for all associated types. This post is going to dive into more background about how dyn traits work today, and in particular it will talk about where that limitation comes from.

Today: Dyn traits implement the trait

In Rust today, assuming you have a “dyn-safe” trait DoTheThing, then the type dyn DoTheThing implements DoTheThing. Consider this trait:

trait DoTheThing {
    fn do_the_thing(&self);
}
impl DoTheThing for String {
    fn do_the_thing(&self) {
        println!("{}", self);
    }
}
And now imagine some generic function that uses the trait:

fn some_generic_fn<T: ?Sized + DoTheThing>(t: &T) { t.do_the_thing(); }
Naturally, we can call some_generic_fn with a &String, but — because dyn DoTheThing implements DoTheThing — we can also call some_generic_fn with a &dyn DoTheThing:

fn some_nongeneric_fn(x: &dyn DoTheThing) { some_generic_fn(x) }
Dyn safety, a mini retrospective

Early on in Rust, we debated whether dyn DoTheThing ought to implement the trait DoTheThing or not. This was, indeed, the origin of the term “dyn safe” (then called “object safe”). At the time, I argued in favor of the current approach: that is, creating a binary property. Either the trait was dyn safe, in which case dyn DoTheThing implements DoTheThing, or it was not, in which case dyn DoTheThing is not a legal type. I am no longer sure that was the right call.

What I liked at the time was the idea that, in this model, whenever you see a type like dyn DoTheThing, you know that you can use it like any other type that implements DoTheThing.

Unfortunately, in practice, the type dyn DoTheThing is not comparable to a type like String. Notably, dyn types are not sized, so you can’t pass them around by value or work with them like strings. You must instead always pass around some kind of pointer to them, such as a Box<dyn DoTheThing> or a &dyn DoTheThing. This is “unusual” enough that we make you opt-in to it for generic functions, by writing T: ?Sized.

What this means is that, in practice, generic functions don’t accept dyn types “automatically”, you have to design for dyn explicitly. So a lot of the benefit I envisioned didn’t come to pass.
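Here is a minimal sketch of that opt-in. It uses a variant of DoTheThing whose method returns a String instead of printing (an assumption made so the result is easy to check), and the function names `sized_only` and `maybe_unsized` are made up.

```rust
trait DoTheThing {
    fn do_the_thing(&self) -> String;
}

impl DoTheThing for String {
    fn do_the_thing(&self) -> String {
        format!("did {}", self)
    }
}

// Implicit `T: Sized` bound: cannot be instantiated with `dyn DoTheThing`.
fn sized_only<T: DoTheThing>(t: &T) -> String {
    t.do_the_thing()
}

// Explicitly opts in to unsized types, so `T = dyn DoTheThing` is allowed.
fn maybe_unsized<T: ?Sized + DoTheThing>(t: &T) -> String {
    t.do_the_thing()
}

fn main() {
    let s = String::from("it");
    let d: &dyn DoTheThing = &s;
    println!("{}", sized_only(&s));
    println!("{}", maybe_unsized(&s));
    println!("{}", maybe_unsized(d));
    // sized_only(d); // does not compile: `dyn DoTheThing` is not `Sized`
}
```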

Static versus dynamic dispatch, vtables

Let’s talk for a bit about dyn safety and where it comes from. To start, we need to explain the difference between static dispatch and virtual (dyn) dispatch. Simply put, static dispatch means that the compiler knows which function is being called, whereas dyn dispatch means that the compiler doesn’t know. In terms of the CPU itself, there isn’t much difference. With static dispatch, there is a “hard-coded” instruction that says “call the code at this address”¹; with dynamic dispatch, there is an instruction that says “call the code whose address is in this variable”. The latter can be a bit slower, but it hardly matters in practice, particularly when the CPU correctly predicts the target of the indirect call.
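The two kinds of call can be sketched with plain functions and a fn pointer, no traits needed. The function names are made up, and an optimizer is free to turn the indirect call back into a direct one when it can see the target.

```rust
fn double(x: u32) -> u32 {
    x * 2
}

fn triple(x: u32) -> u32 {
    x * 3
}

// Static dispatch: the callee is named in the source, so the compiler
// knows exactly which function this is.
fn call_static(x: u32) -> u32 {
    double(x)
}

// Dynamic dispatch: the callee's address arrives in a variable, and this
// function cannot know which function it will be.
fn call_indirect(f: fn(u32) -> u32, x: u32) -> u32 {
    f(x)
}

fn main() {
    println!("{}", call_static(4));
    println!("{}", call_indirect(double, 4));
    println!("{}", call_indirect(triple, 4));
}
```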

When you use a dyn trait, what you actually have is a vtable. You can think of a vtable as being a kind of struct that contains a collection of function pointers, one for each method in the trait. So the vtable type for the DoTheThing trait might look like (in practice, there is a bit of extra data, but this is close enough for our purposes):

struct DoTheThingVtable { do_the_thing: fn(*mut ()) }
Here the do_the_thing method has a corresponding field. Note that the type of the first argument ought to be &self, but we changed it to *mut (). This is because the whole idea of the vtable is that you don’t know what the self type is, so we just changed it to “some pointer” (which is all we need to know).

When you create a vtable, you are making an instance of this struct that is tailored to some particular type. In our example, the type String implements DoTheThing, so we might create the vtable for String like so:

static Vtable_DoTheThing_String: &DoTheThingVtable = &DoTheThingVtable {
    do_the_thing: <String as DoTheThing>::do_the_thing as fn(*mut ())
    //            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //            Fully qualified reference to `do_the_thing` for strings
};
You may have heard that a &dyn DoTheThing type in Rust is a wide pointer. What that means is that, at runtime, it is actually a pair of two pointers: a data pointer and a vtable pointer for the DoTheThing trait. So &dyn DoTheThing is roughly equivalent to:

(*mut (), &'static DoTheThingVtable)
When you cast a &String to a &dyn DoTheThing, what actually happens at runtime is that the compiler takes the &String pointer, casts it to *mut (), and pairs it with the appropriate vtable. So, if you have some code like this:

let x: &String = &"Hello, Rustaceans".to_string();
let y: &dyn DoTheThing = x;
It winds up “desugared” to something like this:

let x: &String = &"Hello, Rustaceans".to_string();
let y: (*mut (), &'static DoTheThingVtable) =
    (x as *mut (), Vtable_DoTheThing_String);
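You can observe the “pair of two pointers” on stable Rust by comparing sizes. This is only a quick check of the representation in current compilers; the helper name `thin_and_wide_sizes` is made up.

```rust
use std::mem::size_of;

#[allow(dead_code)]
trait DoTheThing {
    fn do_the_thing(&self);
}

/// Returns the size of a thin reference and of a wide `&dyn` reference.
fn thin_and_wide_sizes() -> (usize, usize) {
    (size_of::<&String>(), size_of::<&dyn DoTheThing>())
}

fn main() {
    let (thin, wide) = thin_and_wide_sizes();
    // The wide pointer carries a data pointer plus a vtable pointer.
    println!("&String: {thin} bytes, &dyn DoTheThing: {wide} bytes");
}
```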
The dyn impl

We’ve seen how you create wide pointers and how the compiler represents vtables. We’ve also seen that, in Rust, dyn DoTheThing implements DoTheThing. You might wonder how that works. Conceptually, the compiler generates an impl where each method in the trait is implemented by extracting the function pointer from the vtable and calling it:

impl DoTheThing for dyn DoTheThing {
    fn do_the_thing(self: &dyn DoTheThing) {
        // Remember that `&dyn DoTheThing` is equivalent to
        // a tuple like `(*mut (), &'static DoTheThingVtable)`:
        let (data_pointer, vtable_pointer) = self;
        let function_pointer = vtable_pointer.do_the_thing;
        function_pointer(data_pointer);
    }
}
In effect, when we call a generic function like some_generic_fn with T = dyn DoTheThing, we monomorphize that call exactly like any other type. The call to do_the_thing is dispatched against the impl above, and it is that special impl that actually does the dynamic dispatch. Neat.
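The whole mechanism can be written out by hand on stable Rust. This is a model of what the compiler does, not its actual implementation: `WidePtr`, `string_do_the_thing`, and `VTABLE_DO_THE_THING_STRING` are names invented for this sketch, the method returns a String so the result can be checked, and the vtable holds only the one function pointer.

```rust
use std::marker::PhantomData;

// Hand-written vtable: one type-erased function pointer per method.
struct DoTheThingVtable {
    do_the_thing: fn(*const ()) -> String,
}

// The `String` version of the method, with the self type erased.
fn string_do_the_thing(data: *const ()) -> String {
    // SAFETY (assumption of this sketch): this function is only stored in
    // the `String` vtable, which is only paired with `&String` pointers.
    let s = unsafe { &*(data as *const String) };
    format!("did {}", s)
}

static VTABLE_DO_THE_THING_STRING: DoTheThingVtable = DoTheThingVtable {
    do_the_thing: string_do_the_thing,
};

// Hand-written wide pointer: data pointer plus vtable pointer.
#[derive(Clone, Copy)]
struct WidePtr<'a> {
    data: *const (),
    vtable: &'static DoTheThingVtable,
    _borrow: PhantomData<&'a ()>,
}

impl<'a> WidePtr<'a> {
    // Models the cast from `&String` to `&dyn DoTheThing`.
    fn from_string(s: &'a String) -> Self {
        WidePtr {
            data: s as *const String as *const (),
            vtable: &VTABLE_DO_THE_THING_STRING,
            _borrow: PhantomData,
        }
    }

    // Models the synthesized dyn impl: fetch the pointer, then call it.
    fn do_the_thing(self) -> String {
        let function_pointer = self.vtable.do_the_thing;
        function_pointer(self.data)
    }
}

fn main() {
    let s = String::from("Hello, Rustaceans");
    let wide = WidePtr::from_string(&s);
    println!("{}", wide.do_the_thing());
}
```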

Static dispatch permits monomorphization

Now that we’ve seen how and when vtables are constructed, we can talk about the rules for dyn safety and where they come from. One of the most basic rules is that a trait is only dyn-safe if it contains no generic methods (or, more precisely, if its methods are only generic over lifetimes, not types). The reason for this rule derives directly from how a vtable works: when you construct a vtable, you need to give a single function pointer for each method in the trait (or, perhaps, a finite set of function pointers). The problem with generic methods is that there is no single function pointer for them: you need a different pointer for each type that they’re applied to. Consider this example trait, PrintPrefixed:

trait PrintPrefixed {
    fn prefix(&self) -> String;
    fn apply<T: Display>(&self, t: T);
}
impl PrintPrefixed for String {
    fn prefix(&self) -> String {
        self.clone()
    }
    fn apply<T: Display>(&self, t: T) {
        println!("{}: {}", self, t);
    }
}
What would a vtable for String as PrintPrefixed look like? Generating a function pointer for prefix is no problem, we can just use <String as PrintPrefixed>::prefix. But what about apply? We would have to include a function pointer for <String as PrintPrefixed>::apply<T>, but we don’t know yet what the T is!

In contrast, with static dispatch, we don’t have to know what T is until the point of call. In that case, we can generate just the copy we need.
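A short sketch of the static-dispatch side, using a variant of PrintPrefixed whose `apply` returns a String rather than printing (an assumption so the output can be checked). Each call site fixes `T`, and the compiler generates one copy of `apply` per distinct `T`.

```rust
use std::fmt::Display;

trait PrintPrefixed {
    fn prefix(&self) -> String;
    fn apply<T: Display>(&self, t: T) -> String;
}

impl PrintPrefixed for String {
    fn prefix(&self) -> String {
        self.clone()
    }
    fn apply<T: Display>(&self, t: T) -> String {
        format!("{}: {}", self, t)
    }
}

fn main() {
    let p = String::from("value");
    // Two different instantiations of the same generic method:
    println!("{}", p.apply(22_u32)); // T = u32
    println!("{}", p.apply("hello")); // T = &str
    // A vtable would need a slot for every possible `T`, so this fails:
    // let d: &dyn PrintPrefixed = &p; // error: the trait is not dyn safe
}
```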

Partial dyn impls

The previous point shows that a trait can have some methods that are dyn-safe and some methods that are not. In current Rust, this makes the entire trait be “not dyn safe”, and this is because there is no way for us to write a complete impl PrintPrefixed for dyn PrintPrefixed:

impl PrintPrefixed for dyn PrintPrefixed {
    fn prefix(&self) -> String {
        // For `prefix`, no problem:
        let prefix_fn = /* get prefix function pointer from vtable */;
        prefix_fn(…)
    }
    fn apply<T: Display>(&self, t: T) {
        // For `apply`, we can’t handle all `T` types, what field to fetch?
        panic!("No way to implement apply")
    }
}
Under the alternative design that was considered long ago, we could say that a dyn PrintPrefixed value is always legal, but dyn PrintPrefixed only implements the PrintPrefixed trait if all of its methods (and other items) are dyn safe. Either way, if you had a &dyn PrintPrefixed, you could call prefix. You just wouldn’t be able to use a dyn PrintPrefixed with generic code like fn foo<T: ?Sized + PrintPrefixed>(t: &T).

(We’ll return to this theme in future blog posts.)

If you’re familiar with the “special case” around trait methods that require where Self: Sized, you might be able to see where it comes from now. If a method has a where Self: Sized requirement, and we have an impl for a type like dyn PrintPrefixed, then we can see that this impl could never be called, and so we can omit the method from the impl (and vtable) altogether. This is awfully similar to saying that dyn PrintPrefixed is always legal, because it means that there is only a subset of methods that can be used via virtual dispatch. The difference is that dyn PrintPrefixed: PrintPrefixed still holds, because we know that generic code won’t be able to call those “non-dyn-safe” methods, since generic code would have to require that T: ?Sized.
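To see the special case in action, here is a sketch where the generic method carries a `where Self: Sized` bound. As before, `apply` returns a String instead of printing, and the helper `prefix_via_dyn` is made up for illustration.

```rust
use std::fmt::Display;

trait PrintPrefixed {
    fn prefix(&self) -> String;

    // Excluded from the vtable: can never be called on `dyn PrintPrefixed`.
    fn apply<T: Display>(&self, t: T) -> String
    where
        Self: Sized;
}

impl PrintPrefixed for String {
    fn prefix(&self) -> String {
        self.clone()
    }
    fn apply<T: Display>(&self, t: T) -> String {
        format!("{}: {}", self, t)
    }
}

// The trait is now dyn safe, so this type is legal.
fn prefix_via_dyn(p: &dyn PrintPrefixed) -> String {
    p.prefix()
}

fn main() {
    let s = String::from("value");
    println!("{}", prefix_via_dyn(&s));
    println!("{}", s.apply(22)); // fine: `String` is `Sized`
    // let d: &dyn PrintPrefixed = &s;
    // d.apply(22); // does not compile: `apply` requires `Self: Sized`
}
```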

Associated types and dyn types

We began this saga by talking about associated types and dyn types. In Rust today, a dyn type is required to specify a value for each associated type in the trait. For example, consider a simplified Iterator trait:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}
This trait is dyn safe, but if you actually have a dyn in practice, you would have to write something like dyn Iterator<Item = u32>. The impl Iterator for dyn Iterator<Item = T> looks like:

impl<T> Iterator for dyn Iterator<Item = T> {
    type Item = T;
    fn next(&mut self) -> Option<T> {
        let next_fn = /* get next function from vtable */;
        return next_fn(self);
    }
}
Now you can see why we require all the associated types to be part of the dyn type — it lets us write a complete impl (i.e., one that includes a value for each of the associated types).
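With the standard library’s Iterator, this looks as follows. The function name `sum_all` is made up; the point is that the item type appears in the dyn type while the concrete iterator type does not.

```rust
// Works for any iterator that yields `u32`, whatever its concrete type.
// Writing just `dyn Iterator` without `Item = ...` would not compile.
fn sum_all(iter: &mut dyn Iterator<Item = u32>) -> u32 {
    let mut total = 0;
    while let Some(x) = iter.next() {
        total += x;
    }
    total
}

fn main() {
    let mut from_vec = vec![1u32, 2, 3].into_iter();
    let mut from_range = 10u32..13;
    println!("{}", sum_all(&mut from_vec));
    println!("{}", sum_all(&mut from_range));
}
```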

Conclusion

We covered a lot of background in this post:

Static vs dynamic dispatch, vtables

The origin of dyn safety, and the possibility of “partial dyn safety”

The idea of a synthesized impl Trait for dyn Trait

Modulo dynamic linking. ↩︎

Dyn async traits, part 1

2021-09-30T00:00:00+00:00

Over the last few weeks, Tyler Mandry and I have been digging hard into what it will take to implement async fn in traits. Per the new lang team initiative process, we are collecting our design thoughts in an ever-evolving website, the async fundamentals initiative. If you’re interested in the area, you should definitely poke around; you may be interested to read about the MVP that we hope to stabilize first, or the (very much WIP) evaluation doc which covers some of the challenges we are still working out. I am going to be writing a series of blog posts focusing on one particular thing that we have been talking through: the problem of dyn and async fn. This first post introduces the problem and the general goal that we are shooting for (but don’t yet know the best way to reach).

What we’re shooting for

What we want is simple. Imagine this trait, for “async iterators”:

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}
We would like you to be able to write a trait like that, and to implement it in the obvious way:

struct SleepyRange {
    start: u32,
    stop: u32,
}
impl AsyncIter for SleepyRange {
    type Item = u32;
    async fn next(&mut self) -> Option<Self::Item> {
        // just to await something :)
        tokio::time::sleep(std::time::Duration::from_millis(1000)).await;
        let s = self.start;
        if s < self.stop {
            self.start = s + 1;
            Some(s)
        } else {
            None
        }
    }
}
You should then be able to have a Box<dyn AsyncIter<Item = u32>> and use that in exactly the way you would use a Box<dyn Iterator<Item = u32>> (but with an await after each call to next, of course):

let mut b: Box<dyn AsyncIter<Item = u32>> = ...;
let i = b.next().await;
Desugaring to an associated type

Consider this running example:

trait AsyncIter {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}
Here, the next method will desugar to a fn that returns some kind of future; you can think of it like a generic associated type:

trait AsyncIter {
    type Item;
    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}
The corresponding desugaring for the impl would use type alias impl trait:

struct SleepyRange {
    start: u32,
    stop: u32,
}
// Type alias impl trait:
type SleepyRangeNext<'me> = impl Future<Output = Option<u32>> + 'me;
impl AsyncIter for SleepyRange {
    type Item = u32;
    type Next<'me> = SleepyRangeNext<'me>;
    fn next(&mut self) -> SleepyRangeNext<'_> {
        async move {
            tokio::time::sleep(std::time::Duration::from_millis(1000)).await;
            let s = self.start;
            ... // as above
        }
    }
}
This desugaring works quite well for standard generics (or impl Trait). Consider this function:

async fn process<T>(t: &mut T) -> u32
where
    T: AsyncIter<Item = u32>,
{
    let mut sum = 0;
    while let Some(x) = t.next().await {
        sum += x;
        if sum > 22 {
            break;
        }
    }
    sum
}
This code will work quite nicely. For example, when you call t.next(), the resulting future will be of type T::Next. After monomorphization, the compiler will be able to resolve <SleepyRange as AsyncIter>::Next to the SleepyRangeNext type, so that the future is known exactly. In fact, crates like embassy already use this desugaring, albeit manually and only on nightly.
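The same desugaring can be tried on stable Rust if the future type is one you can name. The sketch below makes several assumptions that are not in the post: it substitutes `std::future::Ready` for the type-alias-impl-trait future (so nothing actually sleeps), it names the implementing type `QuickRange`, and it uses a tiny hand-rolled `block_on` that just polls in a loop, which is only adequate for futures that complete immediately.

```rust
use std::future::{ready, Future, Ready};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait AsyncIter {
    type Item;
    type Next<'me>: Future<Output = Option<Self::Item>> + 'me
    where
        Self: 'me;
    fn next(&mut self) -> Self::Next<'_>;
}

struct QuickRange {
    start: u32,
    stop: u32,
}

impl AsyncIter for QuickRange {
    type Item = u32;
    // Stand-in for `impl Future`: a nameable, already-completed future.
    type Next<'me> = Ready<Option<u32>> where Self: 'me;

    fn next(&mut self) -> Self::Next<'_> {
        let s = self.start;
        if s < self.stop {
            self.start = s + 1;
            ready(Some(s))
        } else {
            ready(None)
        }
    }
}

// Generic consumer: statically dispatched, and the future type is known
// exactly after monomorphization.
async fn process<T>(t: &mut T) -> u32
where
    T: AsyncIter<Item = u32>,
{
    let mut sum = 0;
    while let Some(x) = t.next().await {
        sum += x;
    }
    sum
}

// Minimal executor for this sketch: polls repeatedly with a no-op waker.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let mut range = QuickRange { start: 0, stop: 4 };
    println!("{}", block_on(process(&mut range)));
}
```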

Associated types don’t work for dyn

Unfortunately, this desugaring causes problems when you try to use dyn values. Today, when you have dyn AsyncIter, you must specify the values for all associated types defined in AsyncIter. So that means that instead of dyn AsyncIter, you would have to write something like

for<'me> dyn AsyncIter<
    Item = u32,
    Next<'me> = SleepyRangeNext<'me>,
>
This is clearly a non-starter from an ergonomic perspective, but it has an even more pernicious problem. The whole point of a dyn trait is to have a value where we don’t know what the underlying type is. But specifying the value of Next<'me> as SleepyRangeNext means that there is exactly one impl that could be in use here. This dyn value must be a SleepyRange, since no other impl has that same future.

Conclusion: For dyn AsyncIter to work, the future returned by next() must be independent of the actual impl. Furthermore, it must have a fixed size. In other words, it needs to be something like Box<dyn Future<Output = Option<u32>>>.

How the async-trait crate solves this problem

You may have used the async-trait crate. It resolves this problem by not using an associated type, but instead desugaring to Box types:

trait AsyncIter {
    type Item;
    fn next<'me>(&'me mut self) -> Box<dyn Future<Output = Option<Self::Item>> + Send + 'me>;
}
This has a few disadvantages:

It forces a Box all the time, even when you are using AsyncIter with static dispatch.

The type as given above says that the resulting future must be Send. For other async fn, we use auto traits to analyze automatically whether the resulting future is Send (it is Send if it can be, in other words; we don’t declare up front whether it must be).
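For comparison, here is a hand-written boxed version that does work with dyn on stable Rust. It is a sketch in the spirit of the desugaring above, not the exact code the async-trait crate generates: it wraps the box in `Pin` (needed to poll the future), it leaves out the `Send` bound, and the names `BoxFuture`, `QuickRange`, `sum_dyn`, `sum_boxed`, and the loop-polling `block_on` are assumptions made for this example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// One fixed-size future type, independent of the implementing type.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

trait AsyncIter {
    type Item;
    fn next(&mut self) -> BoxFuture<'_, Option<Self::Item>>;
}

struct QuickRange {
    start: u32,
    stop: u32,
}

impl AsyncIter for QuickRange {
    type Item = u32;
    fn next(&mut self) -> BoxFuture<'_, Option<u32>> {
        // Every call allocates, even under static dispatch.
        Box::pin(async move {
            let s = self.start;
            if s < self.stop {
                self.start = s + 1;
                Some(s)
            } else {
                None
            }
        })
    }
}

// Dynamically dispatched consumer: only the item type is known.
async fn sum_dyn(iter: &mut dyn AsyncIter<Item = u32>) -> u32 {
    let mut sum = 0;
    while let Some(x) = iter.next().await {
        sum += x;
    }
    sum
}

// Minimal executor for this sketch: polls repeatedly with a no-op waker.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn sum_boxed(start: u32, stop: u32) -> u32 {
    let mut iter: Box<dyn AsyncIter<Item = u32>> = Box::new(QuickRange { start, stop });
    block_on(sum_dyn(&mut *iter))
}

fn main() {
    println!("{}", sum_boxed(0, 4));
}
```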

Conclusion: Ideally we want Box when using dyn, but not otherwise

So far we’ve seen:

If we desugar async fn to an associated type, it works well for generic cases, because we can resolve the future to precisely the right type.

But it doesn’t work well for dyn traits, because the rules of Rust require that we specify the value of the associated type exactly. For dyn traits, we really want the returned future to be something like Box<dyn Future>.

Using Box does mean a slight performance penalty relative to static dispatch, because we must allocate the future dynamically.

What we would ideally want is to only pay the price of Box when using dyn:

When you use AsyncIter in generic types, you get the desugaring shown above, with no boxing and static dispatch.

But when you create a dyn AsyncIter, the future type becomes Box<dyn Future<Output = Option<Self::Item>>>.

(And perhaps you can choose another “smart pointer” type besides Box, but I’ll ignore that for now and come back to it later.)

In upcoming posts, I will dig into some of the ways that we might achieve this.

Rustacean Principles, continued

2021-09-16T00:00:00+00:00

RustConf is always a good time for reflecting on the project. For me, the last week has been particularly “reflective”. Since announcing the Rustacean Principles, I’ve been having a number of conversations with members of the community about how they can be improved. I wanted to write a post summarizing some of the feedback I’ve gotten.

The principles are a work-in-progress

Sparking conversation about the principles was exactly what I was hoping for when I posted the previous blog post. The principles have mostly been the product of Josh and me iterating, and hence reflect our experiences. While the two of us have been involved in quite a few parts of the project, for the document to truly serve its purpose, it needs input from the community as a whole.

Unfortunately, for many people, the way I presented the principles made it seem like I was trying to unveil a fait accompli, rather than seeking input on a work-in-progress. I hope this post makes the intention more clear!

The principles as a continuation of Rust’s traditions

Rust has a long tradition of articulating its values. This is why we have a Code of Conduct. This is why we wrote blog posts like Fearless Concurrency, Stability as a Deliverable and Rust Once, Run Anywhere. Looking past the “engineering side” of Rust, aturon’s classic blog posts on listening and trust (part 1, part 2, part 3) did a great job of talking about what it is like to be on a Rust team. And who could forget the whole “fireflowers” debate?¹

My goal with the Rustacean Principles is to help coalesce the existing wisdom found in those classic Rust blog posts into a more concise form. To that end, I took initial inspiration from how AWS uses tenets, although by this point the principles have evolved into a somewhat different form. I like the way tenets use short, crisp statements that identify important concepts, and I like the way assigning a priority ordering helps establish which should win when they conflict. (That said, one of Rust’s oldest values is synthesis: we try to find ways to resolve constraints that are in tension by having our cake and eating it too.)

Given all of this backdrop, I was pretty enthused by a suggestion that I heard from Jacob Finkelman. He suggested adapting the principles to incorporate more of the “classic Rust catchphrases”, such as the “no new rationale” rule described in the first blog post from aturon’s series. A similar idea is to incorporate the lessons from RFCs, both successful and unsuccessful (this is what I was going for in the case studies section, but that clearly needs to be expanded).

The overall goal: Empowerment

My original intention was to structure the principles as a cascading series of ideas:

Rust’s top-level goal: Empowerment

Principles: Dissecting empowerment into its constituent pieces – reliable, performant, etc – and analyzing the importance of those pieces relative to one another.

Mechanisms: Specific rules that we use, like type safety, that engender the principles (reliability, performance, etc.). These mechanisms often work in favor of one principle, but can work against others.

wycats suggested that the site could do a better job of clarifying that empowerment is the top-level, overriding goal, and I agree. I’m going to try and tweak the site to make it clearer.

A goal, not a minimum bar

The principles in “How to Rustacean” were meant to be aspirational: a target to be reaching for. We’re all human: nobody does everything right all the time. But, as Matklad describes, the principles could be understood as setting up a kind of minimum bar – to be a team member, one has to show up, follow through, trust and delegate, all while bringing joy? This could be really stressful for people.

The goal for the “How to Rustacean” section is to be a way to lift people up by giving them clear guidance for how to succeed; it helps us to answer people when they ask “what should I do to get onto the lang/compiler/whatever team”. The internals thread had a number of good ideas for how to help it serve this intended purpose without stressing people out, such as cuviper’s suggestion to use fictional characters like Ferris in examples, passcod’s suggestion of discussing inclusion, or Matklad’s proposal to add something to the effect of “You don’t have to be perfect” to the list. Iteration needed!

Scope of the principles

Some people have wondered why the principles are framed in a rather general way, one that applies to all of Rust, instead of being specific to the lang team. It’s a fair question! In fact, they didn’t start this way. They started their life as a rather narrow set of “design tenets for async” that appeared in the async vision doc. But as those evolved, I found that they were starting to sound like design goals for Rust as a whole, not specifically for async.

Trying to describe Rust as a “coherent whole” makes a lot of sense to me. After all, the experience of using Rust is shaped by all of its facets: the language, the libraries, the tooling, the community, even its internal infrastructure (which contributes to that feeling of reliability by ensuring that the releases are available and high quality). Every part has its own role to play, but they are all working towards the same goal of empowering Rust’s users.²

There is an interesting question about the long-term trajectory for this work. In my mind, the principles remain something of an experiment. Presuming that they prove to be useful, I think that they would make a nice RFC.

What about “easy”?

One final bit of feedback I heard from Carl Lerche is surprise that the principles don’t include the word “easy”. This is not an accident. I felt that “easy to use” was too subjective to be actionable, and that the goals of productive and supportive were more precise. However, I do think that for people to feel empowered, it’s important for them not to feel mentally overloaded, and Rust can definitely have the problem of carrying a high mental load sometimes.

I’m not sure of the best way to tweak the “Rust empowers by being…” section to reflect this, but the answer may lie with the Cognitive Dimensions of Notation. I was introduced to these by Felienne Hermans’s excellent book The Programmer’s Brain; I quite enjoyed this journal article as well.

The idea of the CDN is to try and elaborate on the ways that tools can be easier or harder to use for a particular task. For example, Rust would likely do well on the “error prone” dimension, in that when you make changes, the compiler generally helps ensure they are correct. But Rust does tend to have a high “viscosity”, because making local changes tends to be difficult: adding a lifetime, for example, can require updating data structures all over the code in an annoying cascade.

It’s important though to keep in mind that the CDN will vary from task to task. There are many kinds of changes one can make in Rust with very low viscosity, such as adding a new dependency. On the other hand, there are also cases where Rust can be error prone, such as mixing async runtimes.

Conclusion

In retrospect, I wish I had introduced the concept of the Rustacean Principles in a different way. But the subsequent conversations have been really great, and I’m pretty excited by all the ideas on how to improve them. I want to encourage folks again to come over to the internals thread with their thoughts and suggestions.

Love that web page, brson. ↩︎

One interesting question: I do think that some tools may vary the prioritization of different aspects of Rust. For example, a tool for formal verification is obviously aimed at users that particularly value reliability, but other tools may have different audiences. I’m not sure yet the best way to capture that, it may well be that each tool can have its own take on the way that it particularly empowers. ↩︎

CTCFT 2021-09-20 Agenda

2021-09-15T00:00:00+00:00

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place next Monday, on 2021-09-20 (in your time zone)! This post covers the agenda. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Agenda

Announcements

Interest group panel discussion

We’re going to try something a bit different this time! The agenda is going to focus on Rust interest groups and domain working groups, those brave explorers who are trying to put Rust to use on all kinds of interesting domains. Rather than having fixed presentations, we’re going to have a panel discussion with representatives from a number of Rust interest groups and domain groups, led by AngelOnFira. The idea is to open a channel for communication about how to have more active communication and feedback between interest groups and the Rust teams (in both directions).

Afterwards: Social hour

After the CTCFT this week, we are going to try an experimental social hour. The hour will be coordinated in the #ctcft stream of the rust-lang Zulip. The idea is to create breakout rooms where people can gather to talk, hack together, or just chill.

Rustacean Principles

2021-09-08T00:00:00+00:00

As the web site says, Rust is a language empowering everyone to build reliable and efficient software. I think it’s precisely this feeling of empowerment that people love about Rust. As wycats put it recently to me, Rust makes it “feel like things are possible that otherwise feel out of reach”. But what exactly makes Rust feel that way? If we can describe it, then we can use that description to help us improve Rust, and to guide us as we design extensions to Rust.

Besides the language itself, Rust is also an open-source community, one that prides itself on our ability to do collaborative design. But what do we do which makes us able to work well together? If we can describe that, then we can use those descriptions to help ourselves improve, and to instruct new people on how to better work within the community.

This blog post describes a project I and others have been working on called the Rustacean principles. This project is an attempt to enumerate the (heretofore implicit) principles that govern both Rust’s design and the way our community operates. The principles are still in draft form; for the time being, they live in the nikomatsakis/rustacean-principles repository.

How the principles got started

The Rustacean Principles were suggested by Shane during a discussion about how we can grow the Rust organization while keeping it true to itself. Shane pointed out that, at AWS, mechanisms like tenets and the leadership principles are used to communicate and preserve shared values.¹ The goal at AWS, as in the Rust org, is to have teams that operate independently but which still wind up “itching in the same direction”, as aturon so memorably put it.

Since that initial conversation, the principles have undergone quite some iteration. The initial effort, which I presented at the CTCFT on 2021-06-21, was quite closely modeled on AWS tenets. After a number of in-depth conversations with both joshtriplett and aturon, though, I wound up evolving the structure quite a bit to what you see today. I expect them to continue evolving, particularly the section on what it means to be a team member, which has received less attention.

Rust empowers by being…

The principles are broken into two main sections. The first describes Rust’s particular way of empowering people. This description comes in the form of a list of properties that we are shooting for:

Rust empowers by being…

⚙️ Reliable: “if it compiles, it works”

🐎 Performant: “idiomatic code runs efficiently”

🥰 Supportive: “the language, tools, and community are here to help”

🧩 Productive: “a little effort does a lot of work”

🔧 Transparent: “you can predict and control low-level details”

🤸 Versatile: “you can do anything with Rust”

These properties are frequently in tension with one another. Our challenge as designers is to find ways to satisfy all of these properties at once. In some cases, though, we may be forced to decide between slightly penalizing one goal or another. In that case, we tend to give the edge to those goals that come earlier in the list over those that come later. Still, while the ordering matters, it’s important to emphasize that for Rust to be successful we need to achieve all of these feelings at once.

Each of the properties has a page that describes it in more detail. The page also describes some specific mechanisms that we use to achieve this property. These mechanisms take the form of more concrete rules that we apply to Rust’s design. For example, the page for reliability discusses type safety, consider all cases, and several other mechanisms. The discussion gives concrete examples of the tradeoffs at play and some of the techniques we have used to mitigate them.

One thing: these principles are meant to describe more than just the language. For instance, the great error messages are one example of Rust being supportive, and Cargo’s lock files and dependency system are geared towards making Rust feel reliable.

How to Rustacean

Rust has been an open source project since its inception, and over time we have evolved and refined the way that we operate. One key concept for Rust is the governance teams, whose members are responsible for decisions regarding Rust’s design and maintenance. We definitely have a notion of what it means “to Rustacean” – there are specific behaviors that we are looking for. But it has historically been really challenging to define them, and in turn to help people to achieve them (or to recognize when we ourselves are falling short!). The next section of this site, How to Rustacean, is a first attempt at drafting just such a list. You can think of it like a companion to the Code of Conduct: whereas the CoC describes the bare minimum expected of any Rust participant, the How to Rustacean section describes what it means to excel.

How to Rustacean

💖 Be kind and considerate

✨ Bring joy to the user

👋 Show up

🔭 Recognize others’ knowledge

🔁 Start somewhere

✅ Follow through

🤝 Pay it forward

🎁 Trust and delegate

This section of the site has undergone less iteration than the “Rust empowerment” section. The idea is that each of these principles has a dedicated page that elaborates on the principle and gives examples of it in action. The example of Raising an objection about a design (from Show up) is the most developed and a good one to look at to get the idea. One interesting bit is the “goldilocks” structure², which indicates what it means to “show up” too little but also what it means to “show up” too much.

How the principles can be used

For the principles to be a success, they need to be more than words on a website. I would like to see them become something that we actively reference all the time as we go about our work in the Rust org.

As an example, we were recently wrestling with a minor point about the semantics of closures in Rust 2021. The details aren’t that important (you can read them here, if you like), but the decision ultimately came down to a question of whether to adapt the rules so that they are smarter, but more complex. I think it would have been quite useful to refer to these principles in that discussion: ultimately, I think we chose to (slightly) favor productivity at the expense of transparency, which aligns well with the ordering on the site. Further, as I noted in my conclusion, I would personally like to see some form of explicit capture clause for closures, which would give users a way to ensure total transparency in those cases where it is most important.

The How to Rustacean section can be used in a number of ways. One thing would be cheering on examples of where someone is doing a great job: Mara’s issue celebrating all the contributions to the 2021 Edition is a great instance of paying it forward, for example, and I would love it if we had a precise vocabulary for calling that out.

Another time these principles can be used is when looking for new candidates for team membership. When considering a candidate, we can look to see whether we can give concrete examples of times they have exhibited these qualities. We can also use the principles to give feedback to people about where they need to improve. I’d like to be able to tell people who are interested in joining a Rust team, “Well, I’ve noticed you do a great job of showing up, but your designs tend to get mired in complexity. I think you should work on start somewhere.”

“Hard conversations” where you tell someone what they can do better are something that managers do (or try to do…) in companies, but which often get sidestepped or avoided in an open source context. I don’t claim to be an expert, but I’ve found that having structure can help to take away the “sting” and make it easier for people to hear and learn from the feedback.³

What comes next

I think at this point the principles have evolved enough that it makes sense to get more widespread feedback. I’m interested in hearing from people who are active in the Rust community about whether they reflect what you love about Rust (and, if not, what might be changed). I also plan to try and use them to guide both design discussions and questions of team membership, and I encourage others in the Rust teams to do the same. If we find that they are useful, then I’d like to see them turned into an RFC and ultimately living on forge or somewhere more central.

Questions?

I’ve opened an internals thread for discussion.

Footnotes

One of the first things that our team did at Amazon was to draft its own tenets; the discussion helped us to clarify what we were setting out to do and how we planned to do it. ↩︎

Hat tip to Marc Brooker, who suggested the “Goldilocks” structure, based on how the Leadership Principles are presented in the AWS wiki. ↩︎

Speaking of which, one glance at my queue of assigned PRs makes it clear that I need to work on my follow through. ↩︎

Next CTCFT Meeting: 2021-09-20

2021-08-30T00:00:00+00:00

Hold the date! The next Cross Team Collaboration Fun Times meeting will be 2021-09-20. We’ll be using the “Asia-friendly” time slot of 21:00 EST.

What will the talks be about?

A detailed agenda will be announced in a few weeks. Current thinking however is to center the agenda on Rust interest groups and domain working groups, those brave explorers who are trying to put Rust to use on all kinds of interesting domains, such as game development, cryptography, machine learning, formal verification, and embedded development. If you run an interest group and I didn’t list your group here, perhaps you want to get in touch! We’ll be talking about how these groups operate and how we can do a better job of connecting interest groups with the Rust org.

Will there be a social hour?

Absolutely! The social hour has been an increasingly popular feature of the CTCFT meeting. It will take place after the meeting (22:00 EST).

How can I get this on my calendar?

The CTCFT meetings are announced on this google calendar.

Wait, what about August?

Perceptive readers will note that there was no CTCFT meeting in August. That’s because I and many others were on vacation. =)

CTCFT 2021-07-19 Agenda

2021-07-12T00:00:00+00:00

The next “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place one week from today, on 2021-07-19 (in your time zone)! What follows are the abstracts for the talks we have planned. You’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Mentoring

Presented by: doc-jones

The Rust project has a number of mechanisms for getting people involved in the project, but most are oriented around 1:1 engagement. Doc has been investigating some of the ways that other projects engage contributors, such as Python’s mentored sprints. She will discuss how some of those projects run things and share some ideas about how that might be applied in the Rust project.

Lang team initiative process

Presented by: joshtriplett

The lang team recently established a new process we call initiatives. This is a refinement of the RFC process to include more explicit staging. Josh will talk about the new process, what motivated it, and how we’re trying to build more sustainable processes.

Driving discussions via postmortem analysis

Presented by: TBD

Innovation means taking risks, and risky behavior sometimes leads to process failures. An example of a recent process failure was the Rust 1.52.0 release, and subsequent 1.52.1 patch release that followed a few days later. Every failure presents an opportunity to learn from our mistakes and correct our processes going forward. In response to the 1.52.0 event, the compiler team recently went through a “course correction” postmortem process inspired by the “Correction of Error” reviews that pnkfelix has observed at Amazon. This talk describes the structure of a formal postmortem, and discusses how other Rust teams might deploy similar postmortem activities for themselves.

Afterwards: Social hour

After the CTCFT this week, we are going to try an experimental social hour. The hour will be coordinated in the #ctcft stream of the rust-lang Zulip. The idea is to create breakout rooms where people can gather to talk, hack together, or just chill.

CTCFT Social Hour

2021-06-18T00:00:00+00:00

Hey everyone! At the CTCFT meeting this Monday (2021-06-21), we’re going to try a “social hour”. The idea is really simple: for the hour after the meeting, we will create breakout rooms in Zoom with different themes. You can join any breakout room you like and hang out.

The themes for the breakout rooms will be based on suggestions. If you have an idea for a room you’d like to try, you can post it in a dedicated topic on the #ctcft Zulip stream. Or, if you see somebody else has posted an idea that you like, then add a 👍 emoji. We’ll create the final breakout list based on what we see there.

The breakout rooms can be as casual or focused as you like. For example, we will have some default rooms for hanging out – please make suggestions for icebreaker topics on Zulip! We also plan to have some rooms where people are chatting while doing Rust work: for example, yaahc suggested a room for folks who want to write mentoring instructions.

Also: a reminder that there is a CTCFT Calendar that you can subscribe to to be reminded of future meetings. If you like, I can add you to the invite, just ask on Zulip or Discord.

See you there!

CTCFT 2021-06-21 Agenda

2021-06-14T00:00:00+00:00

The second “Cross Team Collaboration Fun Times” (CTCFT) meeting will take place one week from today, on 2021-06-21 (in your time zone)! This post describes the main agenda items for the meeting; you’ll find the full details (along with a calendar event, zoom details, etc) on the CTCFT website.

Afterwards: Social hour

After the CTCFT this week, we are going to try an experimental social hour. The hour will be coordinated in the #ctcft stream of the rust-lang Zulip. The idea is to create breakout rooms where people can gather to talk, hack together, or just chill.

Turbowish and Tokio console

Presented by: pnkfelix and Eliza (hawkw)

Rust programs are known for being performant and correct – but what about when that’s not true? Unfortunately, the state of the art for Rust tooling today can often be a bit difficult. This is particularly true for Async Rust, where users need insights into the state of the async runtime so that they can resolve deadlocks and tune performance. This talk discusses what top-notch debugging and tooling for Rust might look like. One particularly exciting project in this area is tokio-console, which lets users visualize the state of projects built on the tokio library.

Guiding principles for Rust

Presented by: nikomatsakis

As Rust grows, we need to ensure that it retains a coherent design. Establishing a set of “guiding principles” is one mechanism for doing that. Each principle captures a goal that Rust aims to achieve, such as ensuring correctness, or efficiency. The principles give us a shared vocabulary to use when discussing designs, and they are ordered so as to give guidance in resolving tradeoffs. This talk will walk through a draft set of guiding principles for Rust that nikomatsakis has been working on, along with examples of how those principles are enacted through Rust’s language, library, and tooling.

Edition: the song

2021-05-26T00:00:00+00:00

You may have heard that the Rust 2021 Edition is coming. Along with my daughter Daphne, I have recorded a little song in honor of the occasion! The full lyrics are below – if you feel inspired, please make your own version!¹ Enjoy!

Video

Lyrics

(Spoken) Breaking changes where no code breaks. Sounds impossible, no? But in the Rust language, you might say that we like to do impossible things. It isn’t easy. You may ask, how do we manage such a thing? That I can tell you in one word… Edition!

(Chorus) Edition, edition… edition!

(Lang) Who day and night Is searching for a change Whatever they can do So Rust’s easier for you Who sometimes finds They have to tweak the rules And change a thing or two in Rust?

(All) The lang team, the lang team… edition! The lang team, the lang team… edition!

(Libs) Who designs the traits that we use each day? All the time, in every way? Who updates the prelude so that we can call The methods that we want no sweat

(All) The libs team, the libs team… edition! The libs team, the libs team… edition!

(Users) Three years ago I changed my code to Rust twenty eighteen Some dependencies did not But they… kept working.

(All) The users, the users… edition! The users, the users… edition!

(Tooling) And who does all this work To patch and tweak and fix Migrating all our code Each edition to the next

(All) The tooling, the tooling… edition! The tooling, the tooling… edition!

(Spoken) And here in Rust, we’ve always had our little slogans. For instance, abstraction… without overhead. Concurrency… without data races. Stability… without stagnation. Hack… without fear. But we couldn’t do all of those things… not without… Edition!

Footnotes

OMG, that would be amazing. I’ll update the post with any such links I find. ↩︎

CTCFTFTW

2021-05-14T00:00:00+00:00

This Monday I am starting something new: a monthly meeting called the “Cross Team Collaboration Fun Times” (CTCFT)¹. Check out our nifty logo²:

The meeting is a mechanism to help keep the members of the Rust teams in sync and in touch with one another. The idea is to focus on topics of broad interest (more than two teams):

Status updates on far-reaching projects that could affect multiple teams;

Experience reports about people trying new things (sometimes succeeding, sometimes not);

“Rough draft” proposals that are ready to be brought before a wider audience.

The meeting will focus on things that could either offer insights that might affect the work you’re doing, or where the presenter would like to pose questions to the Rust teams and get feedback.

I announced the meeting some time back to all@rust-lang.org, but I wanted to make a broader announcement as well. This meeting is open for anyone to come and observe. This is by design. Even though the meeting is primarily meant as a forum for the members of the Rust teams, it can be hard to define the borders of a community like ours. I’m hoping we’ll get people who work on major Rust libraries in the ecosystem, for example, or who work on the various Rust teams that have come into being.

The first meeting is scheduled for 2021-05-17 at 15:00 Eastern and you will find the agenda on the CTCFT website, along with links to the slides (still a work-in-progress as of this writing!). There is also a twitter account @RustCTCFT and a Google calendar that you can subscribe to.

I realize the limitations of a synchronous meeting. Due to the reality of time zones and a volunteer project, for example, we’ll never be able to get all of Rust’s global community to attend at once. I’ve designed the meeting to work well even if you can’t attend: the goal is to have a place to start conversations, not to finish them. Agendas are announced well in advance and the meetings are recorded. We’re also rotating times – the next meeting on 2021-06-21 takes place at 21:00 Eastern time, for example.³

Hope to see you there!

Footnotes

In keeping with Rust’s long-standing tradition of ridiculous acronyms. ↩︎

Thanks to @Xfactor521! 🙏 ↩︎

The agenda is still TBD. I’ll tweet when we get it lined up. We’re not announcing that far in advance! 😂 ↩︎

[AiC] Vision Docs!

2021-05-01T00:00:00+00:00

The Async Vision Doc effort has been going now for about 6 weeks. It’s been a fun ride, and I’ve learned a lot. It seems like a good time to take a step back and start talking a bit about the vision doc structure and the process. In this post, I’m going to focus on the role that I see vision docs playing in Rust’s planning and decision making, particularly as compared to RFCs.

Vision docs frame RFCs

If you look at a description of the design process for a new Rust feature, it usually starts with “write an RFC”. After all, before we start work on something, we begin with an RFC that both motivates and details the idea. We then proceed to implementation and stabilization.

But the RFC process isn’t really the beginning. The process really begins with identifying some sort of problem¹ – something that doesn’t work, or which doesn’t work as well as it could. The next step is imagining what you would like it to be like, and then thinking about how you could make that future into reality.

We’ve always done this sort of “framing” when we work on RFCs. In fact, RFCs are often just one small piece of a larger picture. Think about something like impl Trait, which began with an intentionally conservative step (RFC #1522) and has been gradually extended. Async Rust started the same way; in that case, though, even the first RFC was split into two, which together described a complete first step (RFC #2394 and RFC #2592).

The role of a vision doc is to take that implicit framing and make it explicit. Vision docs capture both the problem and the end-state that we hope to reach, and they describe the first steps we plan to take towards that end-state.

The “shiny future” of vision docs

There are many efforts within the Rust project that could benefit from vision docs. Think of long-running efforts like const generics or library-ification. There is a future we are trying to make real, but it doesn’t really exist in written form.

I can say that when the lang team is asked to approve an RFC relating to some incremental change in a long-running effort, it’s very difficult for me to do. I need to be able to put that RFC into context. What is the latest plan we are working towards? How does this RFC take us closer? Sometimes there are parts of that plan that I have doubts about – does this RFC lock us in, or does it keep our options open? Having a vision doc that I could return to and evolve over time would be a tremendous boon.

I’m also excited about the potential for ‘interlocking’ vision docs. While working on the Async Vision Doc, for example, I’ve found myself wanting to write examples that describe error handling. It’d be really cool if I could pop over to the Error Handling Project Group⁴, take a look at their vision doc, and then make use of what I see there in my own examples. It might even help me to identify a conflict before it happens.

Start with the “status quo”

A key part of the vision doc is that it starts by documenting the “status quo”. It’s all too easy to take the “status quo” for granted – to assume that everybody understands how things play out today.

When we started writing “status quo” stories, it was really hard to focus on the “status quo”. It’s really tempting to jump straight to ideas for how to fix things. It took discipline to force ourselves to just focus on describing and understanding the current state.

I’m really glad we did though. If you haven’t done so already, take a moment to browse through the status quo section of the doc (you may find the metanarrative helpful to get an overview⁵). Reading those stories has given me a much deeper understanding of how Async is working in practice, both at a technical level and in terms of its impact on people. This is true even when presenting highly technical context. Consider stories like Barbara builds an async executor or Barbara carefully dismisses embedded future. For me, stories like this have more resonance than just seeing a list of the technical obstacles one must overcome. They also help us talk about the various “dead-ends” that might otherwise get forgotten.

Those kinds of dead-ends are especially important for people new to Rust, of course, who are likely to just give up and learn something else if the going gets too rough. In working on Rust, we’ve always found that focusing on accessibility and the needs of new users is a great way to identify things that – once fixed – wind up helping everyone. It’s interesting to think how long we put off doing NLL. After all, metajack filed #6393 in 2013, and I remember people raising it with me earlier. But those of us who were experienced in Rust knew the workarounds, so it never seemed pressing, and hence NLL got put off until 2018.⁶ But now it’s clearly one of the most impactful changes we’ve made to Rust for users at all levels.

Brainstorming the “shiny future”

A few weeks back, we started writing “shiny future” stories (in addition to “status quo”). The “shiny future” stories are the point where we try to imagine what Rust could be like in a few years.

Ironically, although in the beginning the “shiny future” was all we could think about, getting a lot of “shiny future” stories up and posted has been rather difficult. It turns out to be hard to figure out what the future should look like!⁷

Writing “shiny future” stories sounds a bit like an RFC, but it’s actually quite different:

The focus is on the end user experience, not the details of how it works.

We want to think a bit past what we know how to do. The goal is to “shake off” the limits of incremental improvement and look for ways to really improve things in a big way.

We’re not making commitments. This is a brainstorming session, so it’s fine to have multiple contradictory shiny futures.

In a way, it’s like writing just the “guide section” of an RFC, except that it’s not written as a manual but in narrative form.

Collaborative writing sessions

To try and make the writing process more fun, we started running collaborative Vision Doc Writing Sessions. We were focused purely on status quo stories at the time. The idea was simple – find people who had used Rust and get them to talk about their experiences. At the end of the session, we would have a “nearly complete” outline of a story that we could hand off to someone to finish.⁸

The sessions work particularly well when you are telling the story of people who were actually in the session. Then you can simply ask them questions to find out what happened. How did you start? What happened next? How did you feel then? Did you try anything else in between? If you’re working from blog posts, you sometimes have to take guesses and try to imagine what might have happened.⁹

One thing to watch out for: I’ve noticed people tend to jump steps when they narrate. They’ll say something like “so then I decided to use FuturesUnordered”, but it’s interesting to find out how they made that decision. How did they learn about FuturesUnordered? Those details will be important later, because if you develop some superior alternative, you have to be sure people will find it.

Shifting to the “shiny future”

Applying the “collaborative writing session” idea to the shiny future has been more difficult. If you get a bunch of people in one session, they may not agree on what the future should be like.

Part of the trick is that, with shiny future, you often want to go for breadth rather than depth. It’s not just about writing one story, it’s about exploring the design space. That leads to a different style of writing session, but you wind up with a scattershot set of ideas, not with a ’nearly complete’ story, and it’s hard to hand those off.

I’ve got a few ideas of things I would like to try when it comes to future writing sessions. One of them is that I would like to work directly with various luminaries from the Async Rust world to make sure their point-of-view is represented in the doc.

Another idea is to try and encourage more “end-to-end” stories that weave together the “most important” substories and give a sense of prioritization. After all, we know that there are subtle footguns in the model as is and we also know that integrating into external event loops is tricky. Ideally, we’d fix both. But which is a bigger obstacle to Async Rust users? In fact, I imagine that there is no single answer. The answer will depend on what people are doing with Async Rust.

After brainstorming: Consolidating the doc and building a roadmap

The brainstorming period is scheduled to end mid-May. At that point comes the next phase, which is when we try to sort out all the contradictory shiny future stories into one coherent picture. I envision this process being led by the async working group leads (tmandry and I), but it’s going to require a lot of consensus building as well.

In addition to building up the shiny future, part of this process will be deciding a concrete roadmap. The roadmap will describe the specific first steps we will take towards this shiny future. The roadmap items will correspond to particular designs and work items. And here, with those specific work items, is where we get to RFCs: when those work items call for new stdlib APIs or extensions to the language, we will write RFCs that specify them. But those RFCs will be able to reference the vision doc to explain their motivation in more depth.

Living document: adjusting the “shiny future” as we go

There is one thing I want to emphasize: the “shiny future” stories we write today will be wrong. As we work on those first steps that appear in the roadmap, we are going to learn things. We’re going to realize that the experience we wanted to build is not possible – or perhaps that it’s not even desirable! That’s fine. We’ll adjust the vision doc periodically as we go. We’ll figure out the process for that when the time comes, but I imagine it may be a similar – but foreshortened – version of the one we have used to draft the initial version.

Conclusion

Ack! It’s probably pretty obvious that I’m excited about the potential for vision docs. I’ve got a lot of things I want to say about them, but this post is getting pretty long. There are a lot of interesting questions to poke at, most of which I don’t know the answers to yet. Some of the things on my mind: what are the best roles for the characters and should we tweak how they are defined¹⁰? Can we come up with good heuristics for which character to use for which story? How are the “consolidation” and “iteration / living document” phases going to work? When is the appropriate time to write a vision doc – right away, or should you wait until you’ve done enough work to have a clearer picture of what the future looks like? Are there lighterweight versions of the process? We’re going to figure these things out as we go, and I will write some follow-up posts talking about them.

Footnotes

Not problem, opportunity! ↩︎

And – heck – we’re still working towards Polonius! ↩︎

Not my actual reason. I don’t know my actual reason, it just seems right. ↩︎

Shout out to the error handling group, they’re doing great stuff! ↩︎

Did I mention we have 34 stories so far (and more in open PRs)? So cool. Keep ’em coming! ↩︎

To be fair, it was also because designing and implementing NLL was really, really hard.² ↩︎

Who knew? ↩︎

Big, big shout-out to all those folks who have participated, and especially those brave souls who authored stories. ↩︎

One thing that’s great, though, is that after you post the story, you can ping people and ask them if you got it right. =) ↩︎

I feel pretty strongly that four characters is the right number (it worked for Marvel, it will work for us!)³, but I’m not sure if we got their setup right in other respects. ↩︎

Async Vision Doc Writing Sessions VII

2021-04-26T00:00:00+00:00

My week is very scheduled, so I am not able to host any public drafting sessions this week – however, Ryan Levick will be hosting two sessions!

When Who

Wed at 07:00 ET Ryan

Fri at 07:00 ET Ryan

If you’re available and those stories sound like something that interests you, please join him! Just ping me or Ryan on Discord or Zulip and we’ll send you the Zoom link. If you’ve already joined a previous session, the link is the same as before.

Extending the schedule by two weeks

We have previously set 2021-04-30 as the end-date, but I proposed in a recent PR to extend that end date to 2021-05-14. We’ve been learning how this whole vision doc thing works as we go, and I think it seems clear we’re going to want more time to finish off status quo stories and write shiny future before we feel we’ve really explored the design space.

The vision…what?

Never heard of the async vision doc? It’s a new thing we’re trying as part of the Async Foundations Working Group:

We are launching a collaborative effort to build a shared vision document for Async Rust. Our goal is to engage the entire community in a collective act of the imagination: how can we make the end-to-end experience of using Async I/O not only a pragmatic choice, but a joyful one?

Read the full blog post for more.

Async Vision Doc Writing Sessions VI

2021-04-19T00:00:00+00:00

Ryan Levick and I are going to be hosting more Async Vision Doc Writing Sessions this week. We’re not organized enough to have assigned topics yet, so I’m just going to post the dates/times and we’ll be tweeting about the particular topics as we go.

When Who

Wed at 07:00 ET Ryan

Wed at 15:00 ET Niko

Fri at 07:00 ET Ryan

Fri at 14:00 ET Niko

If you’ve joined before, we’ll be re-using the same Zoom link. If you haven’t joined, then send a private message to one of us and we’ll share the link. Hope to see you there!

Async Vision Doc Writing Sessions V

2021-04-12T00:00:00+00:00

This is an exciting week for the vision doc! As of this week, we are starting to draft “shiny future” stories, and we would like your help! (We are also still working on status quo stories, so there is no need to stop working on those.) There will be a blog post coming out on the main Rust blog soon with all the details, but you can go to the “How to vision: Shiny future” page now.

This week, Ryan Levick and I are going to be hosting four Async Vision Doc Writing Sessions. Here is the schedule:

When Who Topic

Wed at 07:00 ET Ryan TBD

Wed at 15:00 ET Niko Shiny future – Niklaus simulates hydrodynamics

Fri at 07:00 ET Ryan TBD

Fri at 14:00 ET Niko Shiny future – Portability across runtimes

The idea for shiny future is to start by looking at the existing stories we have and to imagine how they might go differently. To be quite honest, I am not entirely sure how this is going to work, but we’ll figure it out together. It’s going to be fun. =) Come join!

Async Vision Doc Writing Sessions IV

2021-04-07T00:00:00+00:00

My week is very scheduled, so I am not able to host any public drafting sessions this week – however, Ryan Levick will be hosting two sessions!

When Who Topic

Thu at 07:00 ET Ryan The need for Async Traits

Fri at 07:00 ET Ryan Challenges from cancellation

If you’re available and those stories sound like something that interests you, please join him! Just ping me or Ryan on Discord or Zulip and we’ll send you the Zoom link. If you’ve already joined a previous session, the link is the same as before.

Sneak peek: Next week

Next week, we will be holding more vision doc writing sessions. We are now going to expand the scope to go beyond “status quo” stories and cover “shiny future” stories as well. Keep your eyes peeled for a post on the Rust blog and further updates!

The vision…what?

Never heard of the async vision doc? It’s a new thing we’re trying as part of the Async Foundations Working Group:

We are launching a collaborative effort to build a shared vision document for Async Rust. Our goal is to engage the entire community in a collective act of the imagination: how can we make the end-to-end experience of using Async I/O not only a pragmatic choice, but a joyful one?

Read the full blog post for more.

My "shiny future"

2021-04-02T00:00:00+00:00

I’ve been working on the Rust project for just about ten years. The language has evolved radically in that time, and so has the project governance. When I first started, for example, we communicated primarily over the rust-dev mailing list and the #rust IRC channel. I distinctly remember coming into the Mozilla offices¹ one day and brson excitedly telling me, “There were almost a dozen people on the #rust IRC channel last night! Just chatting! About Rust!” It’s funny to think about that now, given the scale Rust is operating at today.

Scaling the project governance

Scaling the governance of the project to keep up with its growing popularity has been a constant theme. The first step was when we created a core team (initially pcwalton, brson, and I) to make decisions. We needed some kind of clear decision makers, but we didn’t want to set up a single person as “BDFL”. We also wanted a mechanism that would allow us to include non-Mozilla employees as equals.²

Having a core team helped us move faster for a time, but we soon found that the range of RFCs being considered was too much for one team. We needed a way to expand the set of decision makers to include focused expertise from each area. To address these problems, aturon and I created RFC 1068, which expanded from a single “core team” into many Rust teams, each focused on accepting RFCs and managing a particular area.

As written, RFC 1068 described a central technical role for the core team³, but it quickly became clear that this wasn’t necessary. In fact, it was a kind of hindrance, since it introduced unnecessary bottlenecks. In practice, the Rust teams operated quite independently from one another. This independence enabled us to move rapidly on improving Rust; the RFC process – which we had introduced in 2014⁴ – provided the “checks and balances” that kept teams on track.⁵ As the project grew further, new teams like the release team were created to address dedicated needs.

The teams were scaling well, but there was still a bottleneck: most people who contributed to Rust were still doing so as volunteers, which ultimately limits the amount of time people can put in. This was a hard nut to crack⁶, but we’ve finally seen progress this year, as more and more companies have been employing people to contribute to Rust. Many of them are forming entire teams for that purpose – including AWS, where I am working now. And of course I would be remiss not to mention the launch of the Rust Foundation itself, which gives Rust a legal entity of its own and creates a forum where companies can pool resources to help Rust grow.

My own role

My own trajectory through Rust governance has kind of mirrored the growth of the project. I was an initial member of the core team, as I said, and after we landed RFC 1068 I became the lead of the compiler and language design teams. I’ve been wearing these three hats until very recently.

In December, I decided to step back as lead of the compiler team. I had a number of reasons for doing so, but the most important is that I want to ensure that the Rust project continues to scale and grow. For that to happen, we need to transition from one individual doing all kinds of roles to people focusing on those places where they can have the most impact.⁷

Today I am announcing that I am stepping back from the Rust core team. I plan to focus all of my energies on my roles as lead of the language design team and tech lead of the AWS Rust Platform team.

Where we go from here

So now we come to my “shiny future”. My goal, as ever, is to continue to help Rust pursue its vision of being an accessible systems language. Accessible to me means that we offer strong safety guarantees coupled with a focus on ergonomics and usability; it also means that we build a welcoming, inclusive, and thoughtful community. To that end, I expect to be doing more product initiatives like the async vision doc to help Rust build a coherent vision for its future; I also expect to continue working on ways to scale the lang team, improve the RFC process, and help the teams function well.

I am so excited about all that we the Rust community have built. Rust has become a language that people not only use but that they love using. We’ve innovated not only in the design of the language but in the design and approach we’ve taken to our community. “In case you haven’t noticed…we’re doing the impossible here people!” So here’s to the next ten years!

Offices! Remember those? Actually, I’ve been working remotely since 2013, so to be honest I barely do. ↩︎

I think the first non-Mozilla member of the core team was Huon Wilson, but I can’t find any announcements about it. I did find this very nicely worded post by Brian Anderson about Huon’s departure though. “They live on in our hearts, and in our IRC channels.” Brilliant. ↩︎

If you read RFC 1068, for example, you’ll see some language about the core team deciding what features to stabilize. I don’t think this happened even once: it was immediately clear that the teams were better positioned to make this decision. ↩︎

The email makes this sound like a minor tweak to the process. Don’t be fooled. It’s true that people had always written “RFCs” to the mailing list. But they weren’t mandatory, and there was no real process around “accepting” or “rejecting” them. The RFC process was a pretty radical change, more radical I think than we ourselves even realized. The best part of it was that it was not optional for anyone, including core developers. ↩︎

Better still, the RFC mechanism invites public feedback. This is important because no single team of people can really have expertise in the full range of considerations needed to design a language like Rust. ↩︎

If you look back at my Rust roadmap posts, you’ll see that this has been a theme in every single one. ↩︎

I kind of love these three slides from my Rust LATAM 2019 talk, which expressed the same basic idea, but from a different perspective. ↩︎

Async Vision Doc Writing Sessions III

2021-03-29T00:00:00+00:00

Ryan Levick and I are hosting a number of public drafting sessions this week. Some of them are scheduled early to cover a wider range of time zones.

When Who Topic

Tue at 14:30 ET Niko wrapping C++ async APIs in Rust futures and other tales of interop

Wed at 10:00 ET Niko picking an HTTP library and similar stories

Wed at 15:00 ET Niko structured concurrency and parallel data processing

Thu at 07:00 ET Ryan debugging and getting insights into running services

Fri at 07:00 ET Ryan lack of polished common implementations of basic async helpers

Fri at 14:30 ET Niko bridging sync and async

If you’re available and those stories sound like something that interests you, please join us! We’re particularly interested in having people join who have had related experiences, as the goal here is to capture the details from people who’ve been there.

In some cases, it may be helpful if you’ve had similar experiences but in other ecosystems:

For example, people who’ve used Kotlin’s coroutines would be most welcome on the Wed session discussing structured concurrency.

Similarly, folks who have used debuggers for other sorts of async systems (such as node.js or C#) would probably have useful info to share on Ryan’s Thursday session.

If you would like to join, ping me or Ryan on Discord or Zulip and we’ll send you the Zoom link. If you’ve already joined a previous session, the link is the same as before.

The vision…what?

Never heard of the async vision doc? It’s a new thing we’re trying as part of the Async Foundations Working Group:

We are launching a collaborative effort to build a shared vision document for Async Rust. Our goal is to engage the entire community in a collective act of the imagination: how can we make the end-to-end experience of using Async I/O not only a pragmatic choice, but a joyful one?

Read the full blog post for more.

Async Vision Doc Writing Sessions II

2021-03-25T00:00:00+00:00

I’m scheduling two more public drafting sessions for tomorrow, March 26th:

On March 26th at 10am ET (click to see in your local timezone), we will be working on writing a story about the challenges of writing a library that can be reused across many runtimes (rust-lang/wg-async-foundations#45);

On March 26th at 2pm ET (click to see in your local timezone), we will be working on writing a story about the difficulty of debugging and interpreting async stack traces (rust-lang/wg-async-foundations#69).

If you’re available and have interest in one of those issues, please join us! Just ping me on Discord or Zulip and I’ll send you the Zoom link.

I also plan to schedule more sessions next week, so stay tuned!

The vision…what?

Never heard of the async vision doc? It’s a new thing we’re trying as part of the Async Foundations Working Group:

We are launching a collaborative effort to build a shared vision document for Async Rust. Our goal is to engage the entire community in a collective act of the imagination: how can we make the end-to-end experience of using Async I/O not only a pragmatic choice, but a joyful one?

Read the full blog post for more.

Async Vision Doc Writing Sessions

2021-03-22T00:00:00+00:00

Hey folks! As part of the Async Vision Doc effort, I’m planning on holding two public drafting sessions tomorrow, March 23rd:

March 23rd at noon ET (click to see in your local timezone)

March 23rd at 5pm ET (click to see in your local timezone)

During these sessions, we’ll be looking over the status quo issues and writing a story or two! If you’d like to join, ping me on Discord or Zulip and I’ll send you the Zoom link.

The vision…what?

Never heard of the async vision doc? It’s a new thing we’re trying as part of the Async Foundations Working Group:

We are launching a collaborative effort to build a shared vision document for Async Rust. Our goal is to engage the entire community in a collective act of the imagination: how can we make the end-to-end experience of using Async I/O not only a pragmatic choice, but a joyful one?

Read the full blog post for more.

The more things change...

2020-12-30T00:00:00+00:00

I’ve got an announcement to make. As of Jan 4th, I’m starting at Amazon as the tech lead of their new Rust team. Working at Mozilla has been a great experience, but I’m pretty excited about this change. It’s a chance to help shape what I hope to be an exciting new phase for Rust, where we grow from a project with a single primary sponsor (Mozilla) to an industry standard, supported by a wide array of companies. It’s also a chance to work with some pretty awesome people – both familiar faces from the Rust community¹ and some new folks. Finally, I’m hoping it will be an opportunity for me to refocus my attention to some long-standing projects that I really want to see through.

New Rust teams are an opportunity, but we have to do it right

The goal for Rust has always been to create a language that will be used and supported by companies throughout the industry. With the imminent launch of the Rust Foundation as well as the formation of new Rust teams at Amazon, Microsoft, and Facebook, we are seeing that dream come to fruition. I’m very excited about this. This is a goal I’ve been working towards for years, and it was a particular focus of mine for 2020.

That said, I’ve talked to a number of people in the Rust community who feel nervous about this change. After all, we’ve worked hard to build an open source organization that values curiosity, broad collaboration, and uplifting others. As more companies form Rust teams, there’s a chance that some of that could be lost, even if everyone has the best of intentions. While we all want to see more people paid to work on Rust, that can also result in “part time” contributors feeling edged out.

Working to support Rust and its community

One reason that I am excited to be joining the team at Amazon is that our scope is very simple: help make Rust the best it can be.

In my view, “making Rust the best it can be” means not only doing good work, but doing that work in concert with the rest of the Rust community. That means sharing in the “maintenance work” of open source: reviews, bug fixes, tracking down regressions, organizing meetings, that sort of thing. But it also means expanding and nurturing the Rust teams we’re a part of. It’s good to fix a bug. It’s better to find a newcomer and mentor them to fix it, or to extend the rustc-dev-guide so that it covers the code that had the bug.

The ultimate goal should be free and open collaboration. We’ll know the Amazon team setup is working well if it doesn’t really matter if the people we’re collaborating with work at Amazon or not.

On pluralism and the Rust organization

I want to zoom out a bit to the broader picture. As I said in the intro, we are entering a new phase for Rust, one where there are multiple active Rust teams at different companies, all working as part of the greater Rust community to build and support Rust. This is something to celebrate. I think it will go a long way towards making Rust development more sustainable for everyone.

Even as we celebrate, it’s worth recognizing that in many ways this exciting future is already here. Supporting Rust doesn’t require forming a full-time Rust team. The Google Fuchsia team, for example, has always made a point of not only using Rust but actively contributing to the community. Ferrous Systems has a number of folks who work within the Rust teams. In truth, there are a lot of employers who give their employees time to work on Rust – way too many to list, even if I knew all their names. Then we have companies like Embark and others that actively fund work on their dependencies (shout-out to cargo-fund, an awesome tool developed by the equally awesome acfoltzer, who – as it happens – works at Fastly, another company that has been an active supporter of Rust).

This kind of collaboration is exactly what we envisioned when we set up things like the Rust teams and the RFC process. The ultimate goal is to have a “rich stew” of people with different interests and backgrounds all contributing to Rust, helping to ensure that Rust works well for systems programming everywhere. In order to do that successfully, you need not only a structure like the Rust org but also an “open source whenever”² setup that accommodates people with different amounts of availability, since the people you’re trying to reach are not all available full time. I think we have room for improvement here – this is what my Adventures in Consensus series is all about – but ain’t that always the truth?

The trick of course is that in order to achieve “open source whenever”, you need full-time people to help pull it all together. This in many ways has been the limiting factor for Rust thus far, and it is precisely what these new Rust teams – with support from the new Rust Foundation as well – can and will change. We have a lot to look forward to!

Footnotes

I’ll let them make their own announcements. ↩︎

Hat tip to Jessica Lord, whose post “Privilege, Community and Open Source” is one I still re-read regularly. ↩︎

Looking back on 2020

2020-12-18T00:00:00+00:00

I wanted to write a post that looks back over 2020 from a personal perspective. My goal here is to look at the various initiatives that I’ve been involved in and try to get a sense for how they went, what worked and what didn’t, and also what that means for next year. This post is a backdrop for a #niko2021 post that I plan to post sometime before 2021 actually starts, talking about what I expect to be doing in 2021.

I want to emphasize the ‘personal’ bit. This is not meant as a general retrospective of what has happened in the Rust universe. I also don’t mean to claim credit for all (or most) of the ideas on this list. Some of them are things I was at best tangentially involved in, but which I think are inspiring, and would inform events of next year.

The backdrop: total hellscape

It goes without saying that it was quite a year. It’s impossible to ignore the pandemic, the killings of George Floyd, Breonna Taylor, Ahmaud Arbery, China’s actions in Hong Kong, massive financial disruption, what can only be described as an attempt to steal the US election, and all the other things that are going on around us. Many of the biggest events in Rust were shaped by this global backdrop. If nothing else, it added to a general ambient stress level that made 2020 a very difficult year for me personally. Not to provide free advertising for anyone, but this match.com commercial really did capture it. Here’s to a better 2021. 🥂

Still, a lot of good stuff happened

Despite all of that, I am pretty proud of a number of developments around Rust that I have been involved in. I think we have done a number of important things, and we have a number of really promising initiatives in flight as well that I think will come to fruition in 2021. I’d like to talk about some of those.

Once I started compiling a list I realized there’s an awful lot, so here is a kind of TL;DR where you can click for more details:

Process and governance

The Major Change Process helped compiler team spend more time on design

Lang Team Project Proposals show promise, but are a WIP

The Lang Team’s Backlog Bonanza was great, and should continue

The Foundation Conversation was an interesting model I think we can apply elsewhere

The Foundation is very exciting

Technical work

The group working on RFC 2229 (“disjoint closure captures”) is awesome

The MVP for const generics is great, and we should do more

Sprints for Polonius are a great model, we need more sprints

Chalk and designs for a shared type library

Progress on ffi-unwind

Progress on never type stabilization

Progress on Async Rust

The Major Change Process helped compiler team spend more time on design

One of the things I am most happy with is the compiler team’s Major Change Process. For those not familiar with it, the idea is simple: if you would like to make a Major Change to the compiler (defined loosely as “something that would change documentation in the rustc-dev-guide”), then you first open an issue (called a Major Change Proposal, or MCP) on the compiler-team repository. In that issue, you describe roughly the idea. This also automatically opens a Zulip thread in #t-compiler/major changes for discussion. If somebody on the compiler team likes the idea, they “second” the proposal. This automatically starts off a Final Comment Period of 10 days. At the end of that, the MCP is approved.

The goal of MCPs is two-fold. The first, and most important, goal is to encourage more design discussion. It would sometimes happen that we had large PRs opened with little or no indication of the greater design that they were shooting for, which made them really hard to review. We can now tell the authors of such PRs “please write an MCP describing the design you have in mind here”. The second goal is to give us a lightweight way to make decisions. It would sometimes happen that PRs kind of got stuck without a clear “decision” having been made.

The MCP process is not without its problems. We recently did a retrospective and while I think the first goal (“design feedback”) has been a big success, the second goal (“clearer decisions”) is a mixed bag. We’ve definitely had problems where MCPs were approved but people didn’t feel their objections had been heard. I think we’ll wind up tweaking the process to better account for that.

Lang Team Project Proposals show promise, but are a WIP

In the lang team, we have been experimenting with a change to our process that we call “project proposals”. The idea is that, before writing an RFC, you can write a more lightweight proposal to take the temperature of the lang team. We will take a look and decide what we think, which might be one of a few things:

Suggest implementing: The idea is good and it is small enough that we think you can just go straight to implementation.

Needs an RFC: The idea is good but it ought to have an RFC. We’ll assign a liaison to work with you towards fleshing it out.

Close: We don’t feel this idea is a good fit right now.

I had a lot of goals in mind for project proposals. First, to help us avoid RFC limbo and unbounded queues. I want to get to the point where the only open RFCs on the repository are ones that are generally backed by the lang team, so that the team is able to keep up with the traffic on them and keep the process moving. But I want to do this without cutting off the potential for people to bring up interesting ideas that weren’t on the team radar.

Another goal is to support RFC authors better. One bit of feedback I’ve received over the years numerous times is that people are intimidated to author RFCs, or consider it too much of a hassle. The idea of assigning a liaison is that they can help on the RFC and give guidance, while also keeping the broader team in the loop.

Finally, I hope that liaisons can serve as part of a clearer path to lang-team membership. The idea is that serving as the liaison for a project can be a way for us to see how people would be as a member of the lang-team and possibly recruit new members.

I would say that the “project” system has been a mixed success. We’ve had a number of successful project groups, but we’ve also had some that are slow to start. We’ve not done a great job of recruiting fresh liaisons and I think the role could use more definition. Finally, we need to have much clearer messaging, and a more finalized “decision” around the RFC process – I’m also concerned if the RFC process starts to diverge too much between teams. I think it’s quite confusing for people right now to know how they’re supposed to “pitch” an idea (and people are often unclear which team is the best fit for an idea).

Josh and I have been iterating on a more complete “staged RFC” proposal that aims to address a number of those points (it’s a refinement and iteration on the older staged RFC idea that I wrote about years ago). This is one of the things I’d really like to focus on next year, along with improving and defining the lang team liaison process.

The Lang Team’s Backlog Bonanza was great, and should continue

This year the lang team did a series of sync meetings that we called the “Backlog Bonanza”, where we went through every pending RFC and tried to figure out what to do with it. This was great not only because we were able to give feedback on every open RFC and (mostly) determine what to do with it¹, but also as a ’team bonding’ exercise (at least I thought so). It helped us to sharpen what kinds of things we think are important.

Next year I hope to extend the Backlog Bonanza towards triaging open tracking issues and features. I’d like this to fit in with the work towards tracking projects. Ideally we’d get to the point where you can very easily tell “what are the projects that are likely to be stabilized soon”, “what are the projects that could use my help”, and “what are the projects that are stalled out” (along with other similar questions).

The Foundation Conversation was an interesting model I think we can apply elsewhere

One of the things that’s been on my mind this year is that we need to be looking for new ways to get “beyond the comment thread” when it comes to engaging with Rust users and getting design feedback. Comment threads are flexible and sometimes fantastic but prone to all kinds of problems, particularly on controversial or complex topics. Last year I wrote about Collaborative Summary Documents as an alternative to comment threads. This year we tried out the Foundation Conversation², and I thought it worked out quite well. I particularly enjoyed the Github Q&A aspect of it.³ It seemed like a good way to take questions and share information.

The way we ran it, for future reference, was as follows:

Open a github repo for a period of time to take questions.

We had a zoom call going with the team all present.

When new issues were opened, we would briefly discuss and assign someone to write a response. After some period of time, we’d review the response and suggest edits (or someone else might take over). This repeated until consensus was reached.

At the end of the day, we collected the answers into a FAQ.

I feel like this might be an interesting model to use or adapt for other purposes. It might have been a nice way to take feedback on async-await syntax, for example, or other extremely controversial topics. In these cases there is often a lot of context that the team has acquired but it is difficult to “share it”.

(One thing I’ve always wanted to do is to collect feedback via google forms or e-mails. We would then read and think about the feedback, maybe contact the authors, and produce a new design in response; we would also publish the feedback we got and our thoughts.)

The Foundation is very exciting

A large part of my life this year has been spent learning and working towards the creation of a Rust Foundation, and I’m very excited that it’s finally taking shape. I think that the Foundation’s mission of empowering Rust maintainers to joyfully do their best work is tremendously important, and I think it will provide a venue for us to do things on Rust that would be hard to do otherwise. If you want to learn more about it, check out the Foundation FAQ or our live broadcasts.

While I’m on the topic, I want to say that I think Mozilla deserves a lot of credit here. It’s not every company that would embark on a project like Rust, much less launch it out into an independent foundation. Huzzah!

The group working on RFC 2229 (“disjoint closure captures”) is awesome

RFC 2229 proposed a change to how closure capture works. Consider a closure like || some_func(&a.b.c). Today, that closure will capture the entire variable a. Under RFC 2229, it would capture a.b.c, which can avoid a number of unnecessary borrow checker conflicts.
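To make the conflict concrete, here is a minimal sketch. The types `Outer` and `Inner`, the `counter` field, and the body of `some_func` are hypothetical, invented purely for illustration; only the `a.b.c` path comes from the example above. The commented-out lines show the pattern that whole-variable capture rejects. The live code uses the manual workaround of borrowing the field first, which should compile under either capture rule.

```rust
// Hypothetical types for illustration; only the `a.b.c` path comes from the text.
struct Inner {
    c: String,
}

struct Outer {
    b: Inner,
    counter: u32,
}

// Hypothetical stand-in for `some_func`: just reports the string length.
fn some_func(s: &str) -> usize {
    s.len()
}

fn capture_demo() -> (usize, u32) {
    let mut a = Outer {
        b: Inner { c: String::from("hello") },
        counter: 0,
    };

    // With whole-variable capture, this closure borrows all of `a`,
    // so the write to `a.counter` would be rejected:
    //
    //     let closure = || some_func(&a.b.c);
    //     a.counter += 1; // error: `a` is already borrowed by the closure
    //     closure();
    //
    // Under RFC 2229 the closure would capture only `a.b.c`, and the
    // same code would be accepted.

    // Workaround that compiles under either rule: borrow the field up
    // front, so the closure captures only that reference.
    let c = &a.b.c;
    let closure = || some_func(c);
    a.counter += 1; // disjoint field, so no conflict with `c`
    let len = closure();

    (len, a.counter)
}

fn main() {
    let (len, counter) = capture_demo();
    println!("len = {}, counter = {}", len, counter);
}
```

The workaround is the sort of boilerplate that disjoint captures are meant to make unnecessary.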

RFC 2229 was approved in 2018 but implementation was stalled while we worked on NLL and other details. Recently though an excellent group of folks decided to take on the implementation work. Over the past year, I’ve been working with them on the design and implementation, and we’ve been making steady progress. The feature is now at the point where it “basically works” and we are working on migration (enabling this feature will require a Rust edition, as it would otherwise change the semantics of existing programs). A particular shout out to arora-aman, who has been the “point person” for the group, helping to collect questions, relay answers, and generally keep things organized.

Given the great progress we’ve been making, I am quite hopeful that we’ll see this feature land as part of a 2021 Rust Edition. The only caveat is that doing the implementation work has raised some questions about the best behavior for move closures and the like, so we may need to do a bit more design iteration before we are fully satisfied.

The MVP for const generics is great, and we should do more

Const generics has been one of those ’long awaited’ features whose fate often felt very uncertain. In July, boats proposed a kind of “MVP” for const generics – a simple subset that enables a number of important use cases and sidesteps some of the areas where the implementation work isn’t done yet. We now have a stabilization PR for that subset in FCP, thanks to a lot of tireless work by lcnr, varkor, and others.
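As a rough sketch of the kind of code this subset is aimed at, the example below parameterizes a function and a struct over an array length. The names `sum_array` and `Buffer` are hypothetical, and the sketch assumes a toolchain on which the MVP subset is stable.

```rust
// Hypothetical helper: works for arrays of any length `N`, where `N`
// is a const generic parameter of type `usize`.
fn sum_array<const N: usize>(values: [i32; N]) -> i32 {
    values.iter().sum()
}

// A type can carry a const parameter too (hypothetical example type).
struct Buffer<const N: usize> {
    data: [u8; N],
}

impl<const N: usize> Buffer<N> {
    fn new() -> Self {
        Buffer { data: [0; N] }
    }

    fn capacity(&self) -> usize {
        self.data.len()
    }
}

fn main() {
    // `N` is inferred from the array argument.
    println!("{}", sum_array([1, 2, 3]));
    println!("{}", sum_array([10; 8]));
    // `N` is given explicitly.
    println!("{}", Buffer::<16>::new().capacity());
}
```

Without const generics, a library would typically need a separate impl per array length, or a macro that generates them.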

I’m very excited about this for two reasons. First, I think the MVP will be really useful to library authors. But secondly, I think this “MVP” strategy is one that we should be deploying more often. For example, oli, matthewjasper and I recently outlined a kind of “MVP” for “named impl trait”, though we have yet to describe or fully propose it. =)

This idea of pushing an MVP to conclusion is something we’ve done a number of times in Rust in the past, but it’s one of those strategies that are easy to forget about when you’re in the thick of trying to work through some problem. I’m hopeful that in 2021 we can make progress on some of our longer running initiatives in this way.

Sprints for Polonius are a great model, we need more sprints

Polonius is another project that has been making slow progress, mostly because other things keep taking higher priority. This year we tried a new approach to working on it, which was to schedule a “sprint week”. The idea was that the entire group would reserve time in their schedules and spend about 4 hours a day over the course of one week to just focus on polonius (some people spent more). For projects like polonius, this kind of concentrated attention is really useful, because there is a lot of context you have to build up in your head in order to make progress.

In a recent compiler team meeting, we discussed the idea of using these “sprints” more generally. For example, we considered having a bi-monthly compiler team sprint, where we would encourage the team (and new contributors!) to clear space in their schedules to help push progress on a particular goal.

I’ve heard from many part-time contributors that this kind of sprint approach can be really useful, as it’s easier to get support for a “week of concentrated work” than for a “steady drip” of tasks. (In the latter case, it’s easy for those tasks to always be pre-empted by higher-priority work items.) It also can create a nice sense of community.

Chalk and designs for a shared type library

Speaking of community, the Chalk project continues to advance, although with the work on the Foundation I at least have not been able to pay as much attention as I would like. Chalk’s integration with rustc has made great progress, and it’s still being used by rust-analyzer as the main trait engine. Lately our focus has been the shared type library that I first proposed in March. A huge shoutout to jackh726, who has not only been writing a lot of great PRs, but also doing a lot of the organizational work. I expect this to be a continued area of focus in 2021.

Progress on ffi-unwind

Unwinding across FFI boundaries has been a persistent annoying pain point for years. We generally wanted it to be UB, but there are some use cases that demand it. Plus, understanding unwinding is really complex and involves lots of grungy platform details. This is a perfect recipe for inaction. This year the ffi-unwind project group finally took the time to dive into the options and make a proposal, resulting in RFC 2945 (which now has a pending implementation PR). Hat tip to Amanieu, BatmanAoD, and katie-martin-fastly for their work on this.

Progress on never-type

Stabilizing the never type (!) is another of those long-standing endeavors that keeps getting blocked by one problem or another. Over the last few months I spent some time working with blitzerr to create a lint for tainted fallback. We succeeded in writing the lint, but found it opened up some new issues, which gave rise to a fresh idea for how to approach fallback which I implemented in #79366. I haven’t had time to revisit this since we did a crater run to assess impact, but I’m hopeful that we’ll be able to finally stabilize the never type in 2021.

Progress on Async Rust

tmandry has been leading the “async foundations working group” for some time. The group has been slowly expanding its focus from polish and fixing bugs towards new RFCs and efforts:

nellshamrell opened an RFC stabilizing the Stream trait, currently in “pre-FCP”, and yoshuawuyts opened a PR with an unstable implementation

blgBV and LucioFranco opened an RFC for a “must not await” lint to help catch values that are live across an await, but should not be

while this is not an “async”-specific effort, sfackler landed an RFC for reading into uninitialized buffers, which potentially unlocks progress on AsyncRead, as he and I discussed in our async interview

continued smaller stabilizations of useful bits of functionality, like core::future::ready

In general, I thought the Async Interviews were a good experience, and I’d like to do more things like that as a way to dig into technical questions. (I actually have one interview that I never got around to publishing – oops. I should do that!)

Conclusion and some personal thoughts

Well, the end of 2020 is coming up quick. We did it. I want to wish all of you a happy end of the year, and encourage everyone to relax and take it easy on yourselves. Despite all odds, I think it’s been a pretty good year for Rust. People who know me know that I have a hard time feeling “satisfied”⁴. I don’t like to count chickens, and I tend to think things will go wrong⁵. Well, as of this year, even I can plainly see that “Rust has made it”. Every day I am learning about new uses for Rust. This isn’t to say we’re done, there’s still plenty to do, but I think we can really take pride in having achieved what initially seemed impossible: launching a new systems programming language into widespread use.

Footnotes

In some cases, we still need to complete the follow-up work, I think, of actually closing and commenting on those RFCs. ↩︎

Hat tip to Ashley Williams (https://twitter.com/ag_dubs) for proposing this communication plan. ↩︎

Well, that and the crude digital editing. ↩︎

Working on it. ↩︎

The major exception is when I am preparing my To Do list. In that case, I seem to think that nothing unexpected ever happens and there are 72 hours in the day. ↩︎

Rotating the compiler team leads

2020-12-11T00:00:00+00:00

Since we created the Rust teams, I have been serving as lead of two teams: the compiler team and the language design team (I’ve also been a member of the core team, which has no lead). For those less familiar with Rust’s governance, the compiler team is focused on the maintenance and implementation of the compiler itself (and, more recently, the standard library). The language design team is focused on the design aspects. Over that time, all the Rust teams have grown and evolved, with the compiler team in particular being home to a number of really strong members.

Last October, I announced that pnkfelix was joining me as compiler team co-lead. Today, I am stepping back from my role as compiler team co-lead altogether. After taking nominations from the compiler team, pnkfelix and I are proud to announce that wesleywiser will replace me as compiler team co-lead. If you don’t know Wesley, there’ll be an announcement on Inside Rust where you can learn a bit more about what he has done, but let me just say I am pleased as punch that he agreed to serve as co-lead. He’s going to do a great job.

You’re not getting rid of me this easily

Stepping back as compiler team co-lead does not mean I plan to step away from the compiler. In fact, quite the opposite. I’m still quite enthusiastic about pushing forward on ongoing implementation efforts like the work to implement RFC 2229, or the development on chalk and polonius. In fact, I am hopeful that stepping back as co-lead will create more time for these efforts, as well as time to focus on leadership of the language design team.

Rotation is key

I see these changes to compiler team co-leads as fitting into a larger trend, one that I believe is going to be increasingly important in Rust: rotation of leadership. To me, the “corest of the core” value of the Rust project is the importance of “learning from others” – or as I put it in my rust-latam talk from 2019¹, “a commitment to a CoC and a culture that emphasizes curiosity and deep research”. Part of learning from others has to be actively seeking out fresh leadership and promoting them into positions of authority.

But rotation has a cost too

Another core value of Rust is recognizing the inevitability of tradeoffs². Rotating leadership is no exception: there is a lot of value in having the same people lead for a long time, as they accumulate all kinds of context and skills. But it also means that you are missing out on the fresh energy and ideas that other people can bring to the problem. I feel confident that Felix and Wesley will help to shape the compiler team in ways that I never would’ve thought to do.

Rotation with intention

The tradeoff between experience and enthusiasm makes it all the more important, in my opinion, to rotate leadership intentionally. I am reminded of Emily Dunham’s classic post on leaving a team³, and how it was aimed at normalizing the idea of “retirement” from a team as something you could actively choose to do, rather than just waiting until you are too burned out to continue.

Wesley, Felix, and I have discussed the idea of “staggered terms” as co-leads. The idea is that you serve as co-lead for two years, but we select one new co-lead per year, with the oldest co-lead stepping back. This way, at every point you have a mix of a new co-lead and someone who has already done it for one year and has some experience.

Lang and compiler need separate leadership

Beyond rotation, another reason I would like to step back from being co-lead of the compiler team is that I don’t really think it makes sense to have one person lead two teams. It’s too much work to do both jobs well, for one thing, but I also think it works to the detriment of the teams. I think the compiler and lang team will work better if they each have their own, separate “advocates”.

I’m actually very curious to work with pnkfelix and Wesley to talk about how the teams ought to coordinate, since I’ve always felt we could do a better job. I would like us to be actively coordinating how we are going to manage the implementation work at the same time as we do the design, to help avoid unbounded queues. I would also like us to be doing a better job getting feedback from the implementation and experimentation stage into the lang team.

You might think having me be the lead of both teams would enable coordination, but I think it can have the opposite effect. Having separate leads for compiler and lang means that those leads must actively communicate and avoids the problem of one person just holding things in their head without realizing other people don’t share that context.

Idea: Deliberate team structures that enable rotation

In terms of the compiler team structure, I think there is room for us to introduce “rotation” as a concept in other ways as well. Recently, I’ve been kicking around an idea for “compiler team officers”⁴, which would introduce a number of defined roles, each of which is set up with staggered terms to allow for structured handoff. I don’t think the current proposal is quite right, but I think it’s going in an intriguing direction.

This proposal is trying to address the fact that a successful open source organization needs more than coders, but all too often we fail to recognize and honor that work. Having fixed terms is important because when someone is willing to do that work, they can easily wind up getting stuck being the only one doing it, and they do that until they burn out. The proposal also aims to enable more “part-time” leadership within the compiler team, by making “finer grained” duties that don’t require as much time to complete.

Oh-so-subtle plug: I really quite liked that talk. ↩︎

Though not always the tradeoffs you expect. Read the post. ↩︎

If you haven’t read it, stop reading now and go do so. Then come back. Or don’t. Just read it already. ↩︎

I am not sure that ‘officer’ is the right word here, but I’m not sure what the best replacement is. I want something that conveys respect and responsibility. ↩︎

Async Interview #8: Stjepan Glavina

2020-07-09T00:00:00+00:00

(removed)

Async interviews: my take thus far

2020-04-30T00:00:00+00:00

The point of the async interview series, in the end, was to help figure out what we should be doing next when it comes to Async I/O. I thought it would be good then to step back and, rather than interviewing someone else, give my opinion on some of the immediate next steps, and a bit about the medium to longer term. I’m also going to talk a bit about what I see as some of the practical challenges.

Focus for the immediate term: interoperability and polish

At the highest level, I think we should be focusing on two things in the “short to medium” term: enabling interoperability and polish.

By interoperability, I mean the ability to write libraries and frameworks that can be used with many different executors/runtimes. Adding the Future trait was a big step in this direction, but there’s plenty more to go.

My dream is that eventually people are able to write portable async apps, frameworks, and libraries that can be moved easily between async executors. We won’t get there right away, but we can get closer.

By polish, I mean “small things that go a long way to improving quality of life for users”. These are the kinds of things that are easy to overlook, because no individual item is a big milestone.

Polish in the compiler: diagnostics, lints, smarter analyses

Most of the focus of wg-async-foundations recently has been on polish work on the compiler, and we’ve made quite a lot of progress. Diagnostics have notably improved, and we’ve been working on inserting helpful suggestions, fixing compiler bugs, and improving efficiency. One thing I’m especially excited about is that we no longer rely on thread-local storage in the async fn transformation, which means that async-await is now compatible with #[no_std] environments and hence embedded development.

I want to give a 👏 “shout-out” 👏 to 👏 tmandry 👏 for leading this polish effort, and to point out that if you’re interested in contributing to the compiler, this is a great place to start! Here are some tips for how to get involved.

I think it’s also a good idea to be looking a bit more broadly. On Zulip, for example, LucioFranco suggested that we could add a lint to warn about things that should not be live across yields (e.g., lock guards), and I think that’s a great idea (there is a clippy lint already, though it’s specific to MutexGuard; maybe this should just be promoted to the compiler and generalized).
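To make the “not live across yields” idea concrete, here is a minimal sketch of the pattern such a lint would push people toward: take the lock inside an inner block so the guard is gone before the await. The `block_on` helper, `do_io`, and `increment` are assumptions made up for this illustration, not existing APIs.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for futures that never truly block.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (an assumption for this sketch): polls in a busy loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Hypothetical stand-in for real asynchronous work.
async fn do_io() {}

/// The guard lives only inside the inner block, so it is released
/// before the `.await` and is never held across a suspension point.
async fn increment(counter: &Mutex<u32>) -> u32 {
    let snapshot = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard dropped here
    do_io().await;
    snapshot
}
```

Moving `do_io().await` inside the inner block would keep the guard alive across the await, which is the shape a generalized lint could flag.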

Another, more challenging area is improving the precision of the async-await transformation and analysis. Right now, for example, the compiler “overapproximates” what values are live across a yield, which sometimes yields spurious errors about whether a future needs to be Send or not. Fixing this is, um, “non-trivial”, but it would be a major quality of life improvement.
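As a rough illustration of how liveness across an await interacts with `Send`, the sketch below confines a non-`Send` value (`Rc`) to an inner block so that it is clearly out of scope before the await. The names `tick`, `scoped_len`, and `require_send` are hypothetical helpers for this example only.

```rust
use std::future::Future;
use std::rc::Rc;

/// Hypothetical stand-in for real asynchronous work.
async fn tick() {}

/// `Rc` is not `Send`, but it is confined to the inner block and is
/// out of scope before the `.await`, so the future stays `Send`.
async fn scoped_len() -> usize {
    let len = {
        let data = Rc::new(vec![1, 2, 3]);
        data.len()
    };
    tick().await;
    len
}

/// Compiles only if the future passed in is `Send`.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}
```

Calling `require_send(scoped_len())` type-checks because only `len`, a plain `usize`, is live across the await. A more precise analysis would let less carefully scoped code pass the same check.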

Polish in the standard library: adding utilities

When it comes to polish, I think we can extend that focus beyond the compiler, to the standard library and the language. I’d like to see the stdlib include building blocks like async-aware mutexes and channels, for example, as well as smaller utilities like task::block_on. YoshuaWuyts recently proposed adding some simple constructors, like future::{pending, ready} which I think could fit in this category. A key constraint here is that these should be libraries and APIs that are portable across all executors and runtimes.
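For a sense of how small such utilities can be, here is a sketch of an executor-neutral `block_on` built only from `std`, used together with `std::future::ready`. This `block_on` is an assumption written for illustration; it is not an existing standard library function.

```rust
use std::future::{ready, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical `block_on`: drives one future to completion on the
/// current thread, parking the thread while the future is pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Uses `std::future::ready` as a leaf future.
async fn add_one(x: u32) -> u32 {
    ready(x).await + 1
}
```

Nothing here depends on a particular runtime, which is the portability constraint described above.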

Polish in the language: async main, async drop

Polish extends to the language, as well. The idea here is to find small, contained changes that fix specific pain points or limitations. Adding async fn main, as boats proposed, might be such an example (and I rather like the idea of #[test] that XAMPRocky proposed on internals).

Another change I think makes sense is to support async destructors, and I would go further and find some solution to the concerns about RAII and async that Eliza Weisman raised. In particular, I think we need some kind of (optional) callback for values that reside on a stack frame that is being suspended.

Supporting interoperability: the stream trait

Let me talk a bit about what we can do to support interoperability. The first step, I think, is to do as Carl Lerche proposed and add the Stream trait into the standard library. Ideally, it would be added in exactly the form that it takes in futures 0.3.4, so that we can release a (minor) version of futures that simply re-exports the stream trait from the stdlib.

Adding stream enables interoperability in the same way that adding Future did: one can now define libraries that produce streams, or which operate on streams, in a completely neutral fashion.

But what about “attached streams”?

I said that I did not think adding Stream to the standard library would be controversial. This does not mean there aren’t any concerns. cramertj, in particular, raised a concern about the desire for “attached streams” (or “streaming streams”), as they are sometimes called.

To review, today’s Stream trait is basically the exact async analog of Iterator. It has a poll_next method that tries to fetch the next item. If the item is ready, then the caller of poll_next gets ownership of the item that was produced. This means in particular that the item cannot be a reference into the stream itself. The same is true of iterators today: iterators cannot yield references into themselves (though they can yield references into the collection that one is iterating over). This is both useful (it means that generic callers can discard the iterator but keep the items that were produced) and a limitation (it means that iterators/streams cannot reuse some internal buffer between iterations).
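The shape described above can be sketched as follows. The `Stream` trait here is a local definition modeled on that description, and `Countdown` and `collect_ready` are made-up names for the illustration. Note that the collected items outlive the stream, which is exactly the “detached” property.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local sketch of a stream trait: the async analog of `Iterator`.
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Yields `remaining, remaining - 1, ..., 1` and then ends.
struct Countdown {
    remaining: u32,
}

impl Stream for Countdown {
    type Item = u32;

    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.remaining == 0 {
            return Poll::Ready(None);
        }
        let item = self.remaining;
        self.remaining -= 1;
        // The caller receives an owned `u32`, not a borrow of the stream.
        Poll::Ready(Some(item))
    }
}

struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drains a stream whose items are always immediately ready.
/// The stream is dropped on return; the items are kept.
fn collect_ready<S: Stream + Unpin>(mut stream: S) -> Vec<S::Item> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut items = Vec::new();
    while let Poll::Ready(Some(item)) = Pin::new(&mut stream).poll_next(&mut cx) {
        items.push(item);
    }
    items
}
```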

We should not block progress on streams on GATs

I hear the concern about attached streams, but I don’t think it should block us from moving forward. There are a few reasons for this. The first is pragmatic: fully resolving the design details around attached streams will require not only GATs, but experience with GATs. This is going to take time and I don’t think we should wait. Just as iterators are used everywhere in their current form, there are plenty of streaming applications for which the current stream trait is a good fit.

Symmetry between sync and async is a valuable principle

There is another reason I don’t think we should block progress on attached streams. I think there is a lot of value to having symmetric sync/async versions of things in the standard library. I think boats had it right when they said that the guiding vision for Async I/O in Rust should be that one can take sync code and make it async by adding in async and await as necessary.

This isn’t to say that everything between sync and async must be the same. There will likely be things that only make sense in one setting or another. But I think that in cases where we see orthogonal problems – problems that are not really related to being synchronous or asynchronous – we should try to solve them in a uniform way.

In this case, the problem of “attached” vs “detached” is orthogonal from being async or sync. We want attached iterators just as much as we want attached streams – and we are making progress on the foundational features that will enable us to have them.

Once we have those features, we can design variants of Iterator and Stream that support attached iterators/streams. Perhaps these variants will deprecate the existing traits, or perhaps they will live alongside them (or maybe we can even find a way to extend the existing traits in place). I don’t know, but we’ll figure it out, and we’ll do it for both sync and async applications, well, synchronously¹.
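As one possible shape for such a variant, here is a sketch of an “attached” (lending) iterator written with generic associated types. The trait name `LendingIterator` and the `Lines` type are assumptions for illustration, not an existing or proposed standard library API.

```rust
/// Hypothetical "attached" iterator: each item may borrow from the
/// iterator itself, which requires generic associated types (GATs).
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

/// Reuses one internal buffer for every trimmed line it hands out.
struct Lines<'s> {
    rest: &'s str,
    buffer: String,
}

impl<'s> LendingIterator for Lines<'s> {
    type Item<'a> = &'a str where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        if self.rest.is_empty() {
            return None;
        }
        // Split off the next line; the remainder stays in `rest`.
        let (line, rest) = self.rest.split_once('\n').unwrap_or((self.rest, ""));
        self.rest = rest;
        // Overwrite the shared buffer instead of allocating per item.
        self.buffer.clear();
        self.buffer.push_str(line.trim());
        Some(&self.buffer)
    }
}
```

Because each item borrows from `buffer`, a caller cannot hold two items at once or keep one after dropping the iterator. That is the trade-off against today’s detached `Iterator`.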

Supporting interoperability: adding async read and write traits

I also think we should add AsyncRead and AsyncWrite to the standard library, also in roughly the form they have today in futures. In short, stable, interoperable traits for reading and writing enables a whole lot of libraries and middleware. After all, the main reason people are using async is to do I/O.

In contrast to Stream, I do expect this to be controversial, for a few reasons. But much like Stream, I still think it’s the right thing to do, and actually for much the same reasons.

First concern about async read: uninitialized memory

I know of two major concerns about adding AsyncRead and AsyncWrite. The first is around uninitialized memory. Just like its synchronous counterpart Read, the AsyncRead trait must be given a buffer where the data will be written. And, just like Read, the trait currently requires that this buffer must be zeroed or otherwise initialized.
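A small sketch of what this requirement looks like with the synchronous `Read` trait: the buffer has to be zeroed (or otherwise initialized) before the call, even though `read` is about to overwrite it. The helper `read_prefix` is a made-up name for this example.

```rust
use std::io::{self, Read};

/// Reads up to `len` bytes from `reader`.
/// The buffer is zeroed first because `Read::read` takes `&mut [u8]`,
/// which must already be initialized memory.
fn read_prefix<R: Read>(mut reader: R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len]; // zeroing cost paid up front
    let n = reader.read(&mut buf)?;
    buf.truncate(n); // keep only the bytes actually read
    Ok(buf)
}
```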

You will probably recognize that this is another case of an “orthogonal problem”. Both the synchronous and asynchronous traits have the same issue, and I think the best approach is to try and solve it in an analogous way. Fortunately, sfackler has done just that. The idea that we discussed in our async interview is slowly making its way into RFC form.

So, in short, I think uninitialized memory is a “solved problem”, and moreover I think it was solved in the right way. Happy days.

Second concern about async read: io_uring

This is a relatively new thing, but a new concern about AsyncRead and AsyncWrite is that, fundamentally, they were designed around epoll-like interfaces. In these interfaces, you get a callback when data is ready and then you can go and write that data into a buffer. But Linux 5.1 added a new interface, called io_uring, and it works differently. I won’t go into the details here, but boats gives a good intro in their blog post introducing the iou library.

My take here is somewhat similar to my take on why we should not block streams on GATs: io_uring is super promising, but it’s also super new. We have very little experience trying to build futures atop io_uring. I think it’s great that people are experimenting, and I think that we should encourage and spread those experiments. After some time, I expect that “best practices” will start to emerge, and at that time, we should try to codify those best practices into traits that we can add to the standard library.

In the meantime, though, epoll is not going anywhere. There will always be systems based on epoll that we will want to support, and we know exactly how to do that, because we’ve spent years tinkering and experimenting with AsyncRead and AsyncWrite. It’s time to standardize them and to allow people to build I/O libraries based on them. Once we know how best to handle io_uring, we’ll integrate that too.

All of that said, I would really like to learn more about io_uring and what it might mean, since I’ve not dug that deeply here. Maybe a good topic for a future async interview!

Looking further out

Looking further out, I think there are some bigger goals that we should be thinking about. The largest is probably adding some form of generator syntax. Anecdotally, I definitely hear about a fair number of folks working with streams and encountering difficulties doing so. As boats said, writing Stream implementations is a common reason that people have to interact directly with Pin, and that’s something we want to minimize. Further, in a synchronous setting, generator syntax would also give us syntactic support for writing iterators, which would benefit Rust overall. Enabling support for async functions in traits would also be high on my list, along with async closures. (The latter in particular would enable us to bring in a lot more utility methods and combinators for futures and streams, which would be great.)

I think though that it’s worth waiting a bit before we pursue these, for several reasons.

Generator syntax would build on a Stream trait anyhow, so having that in the standard library is an obvious first step.

There is ongoing work on GATs and chalk integration in the context of wg-traits, and we’re making quite rapid progress there. The above items all potentially interact with GATs in some way, and it’d be nice if we had more of an implementation available before we started in on them (though it may not be a hard requirement).

Quite frankly, we don’t have the bandwidth. We need to work on building up an effective wg-async-foundations group before we can take on these sorts of projects. More on this point later.

Related and supporting efforts

There are a few pending features in the language team that I think may be pretty useful for async applications. I won’t go into detail here, but briefly:

impl Trait everywhere – finishing up the impl Trait saga will enable us to encode some cases where async fn in traits might be nice, such as Tower’s Service trait;

GATs, obviously – GATs arise around a number of advanced features.

procedural macros – we’ve been making slow and steady progress on stabilizing bits and pieces of the procedural macro story, and I think it’s a crucial enabler for async-related applications (and many others). Things like the #[runtime::main] and async-trait crate are only possible because of the procedural macro support. Both Carl and Eliza brought up the importance of offering procedural macros in expression position without requiring things like proc_macro_hack.

I’ll write more about these points in other posts, though.

Summing up: the list

To summarize, here is my list of what I think we should be doing in “async land” as our next steps:

Continued polish and improvements to the core compiler implementation.

Lints for common “gotchas”, like #[must_use] to help identify “not yield safe” types.

Extend the stdlib with mutexes, channels, task::block_on, and other small utilities.

Extend the Drop trait with “lifecycle” methods (“async drop”).

Add Stream, AsyncRead, and AsyncWrite traits to the standard library.

To be clear, this is a proposal, and I am very much interested in feedback on it, and I wouldn’t be surprised to add or remove a thing or two. However, it’s not an arbitrary proposal: It’s a proposal that I’ve given a fair amount of thought to, and I feel reasonably certain about it.

There are a few things I’d be particularly interested to get feedback on:

If you maintain a library, what are some of the challenges you’ve encountered in making it operate generically across executors? What could help there?

Do you have ideas for useful bits of polish? Are there small changes or stdlib additions that would make everyday life that much easier?

A challenge: growing an effective working group

I want to close with a few comments on organization. One of the things we’ve been trying to figure out is how best to organize ourselves and create a sustainable working group.

Thus far, tmandry has been doing a great job at organizing the polish work that has been our focus, and I think we’ve been making good progress there, although there’s always a need for more folks to help out. (Shameless plug: Here are some tips for how to get involved!)

If we want to go beyond polish and get back to adding things to the standard library, especially things like the Stream or AsyncRead trait, we’re going to have to up our game. The same is true for some of the more diverse tasks that fall under our umbrella, such as maintaining the async book.

To do those tasks, we’re going to need more than coders. We need to take the time to draft designs, incorporate feedback, write the RFCs, and push things through to stabilization.

To be honest, I’m not entirely sure where that work is going to come from – but I believe we can do it! If this is something you’re interested in, definitely drop in the #wg-async-foundations stream on Zulip and say hello, and monitor the Inside Rust blog, as I expect we’ll be posting updates there from time to time.

Comments?

As always, please leave comments in the async interviews thread on users.rust-lang.org.

Footnotes

I couldn’t resist. ↩︎

Library-ification and analyzing Rust

2020-04-09T00:00:00+00:00

I’ve noticed that the ideas that I post on my blog are getting much more “well rounded”. That is a problem. It means I’m waiting too long to write about things. So I want to post about something that’s a bit more half-baked – it’s an idea that I’ve been kicking around to create a kind of informal “analysis API” for rustc.

The problem statement

I am interested in finding better ways to support advanced analyses that “layer on” to rustc. I am thinking of projects like Prusti or Facebook’s MIRAI, or even the venerable Clippy. All of these projects are attempts to layer on additional analyses atop Rust’s existing type system that prove useful properties about your code. Prusti, for example, lets you add pre- and post-conditions to your functions, and it will prove that they hold.

In theory, Rust is a great fit for analysis

There has been a trend lately of trying to adapt existing tools built initially for other languages to analyze Rust. Prusti, for example, is adapting an existing project called Viper, which was built to analyze languages like C# or Java. However, actually analyzing programs written in C# or Java in practice is often quite difficult, precisely because of the kinds of pervasive, mutable aliasing that those languages encourage.

Pervasive aliasing means that if you see code like

a.setCount(0);
it can be quite difficult to be sure whether that call might also modify the state of some variable b that happens to be floating around. If you are trying to enforce contracts like “in order to call this method, the count must be greater than zero”, then it’s important to know which variables are affected by calls like setCount.

Rust’s ownership/borrowing system can be really helpful here. The borrow checker rules ensure that it’s fairly easy to see what data a given Rust function might read or mutate. This is of course the key to how Rust is able to steer you away from data races and segmentation faults – but the key insight here is that those same properties can also be used to make higher-level correctness guarantees. Even better, many of the more complex analyses that analysis tools might need – e.g., alias analysis – map fairly well onto what the Rust compiler already does.
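Here is a small Rust counterpart to the `a.setCount(0)` example, as a sketch of why the signature alone tells an analysis what can change. `Counter` and `reset_first` are hypothetical names for this illustration.

```rust
struct Counter {
    count: u32,
}

impl Counter {
    fn set_count(&mut self, value: u32) {
        self.count = value;
    }
}

/// `a` is an exclusive borrow and `b` is a shared borrow, so the two
/// cannot alias: the call on `a` cannot change what `b` observes.
fn reset_first(a: &mut Counter, b: &Counter) -> u32 {
    let before = b.count;
    a.set_count(0);
    // Guaranteed by the borrow rules, so a verifier can rely on it.
    debug_assert_eq!(before, b.count);
    b.count
}
```

Trying to call `reset_first(&mut c, &c)` with the same counter for both arguments is rejected by the borrow checker, so a contract about `b.count` survives the call on `a`.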

In practice, analyzing Rust is a pain, but not because of the language

Unfortunately, while Rust ought to be a great fit for analysis tools, it’s a horrible pain to try and implement such a tool in practice. The problem is that there is lots of information that is needed to do this sort of analysis, and that information is not readily accessible. I’m thinking of information like the types of expressions or the kind of aliasing information that the borrow check gathers. Prusti, for example, has to resort to reading the debug output from the borrow checker and trying to reconstitute what is going on.

Ideally, I think what we would want is some way for analyzer tools to leverage the compiler itself. They ought to be able to use the compiler to do the parsing of Rust code, to run the borrow check, and to construct MIR. They should then be able to access the MIR and the accompanying borrow check results and use that to construct their own internal IRs (in practice, virtually all such verifiers would prefer to start from an abstraction level like MIR, and not from a raw Rust AST). They should be able to ask the compiler for information about the layout of data structures in memory and other things they might need, too, or for information about the type signature of other methods.

Enter: on-demand analysis and library-ification

A few years back, the idea of enabling analysis tools to interact with the compiler and request this sort of detailed information would have seemed like a fantasy. But the architectural work that we’ve been doing lately is actually quite a good fit for this use case.

I’m referring to two different trends:

on-demand analysis

library-ification

The first trend: On-demand analysis

On-demand analysis is basically the idea that we should structure the compiler’s internal core into a series of “queries”. Each query is a pure function from some inputs to an output, and it might be something like “parse this file” (yielding an AST) or “type-check this function” (yielding a set of errors). The key idea is that each query can in turn invoke other queries, and thus execution begins from the end state that we want to reach (“give me an executable”) and works its way backwards to the first few steps (“parse this file”). This winds up fitting quite nicely with incremental computation as well as parallel execution. (If you’d like to learn more about this, I gave a talk at PLISS that is available on YouTube.)
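To give a feel for the query idea, here is a deliberately tiny sketch: one input (source text), one memoized query (“tokenize this file”), and one query that calls another. The `Database` type and its methods are invented for illustration and are far simpler than what rustc or salsa actually do.

```rust
use std::collections::HashMap;

/// Toy query database: inputs are source texts, derived results are memoized.
struct Database {
    sources: HashMap<String, String>,
    token_cache: HashMap<String, Vec<String>>,
    /// Counts how often the "parse" query actually ran.
    parse_calls: usize,
}

impl Database {
    fn new() -> Self {
        Database {
            sources: HashMap::new(),
            token_cache: HashMap::new(),
            parse_calls: 0,
        }
    }

    /// Sets an input and invalidates results derived from it.
    fn set_source(&mut self, file: &str, text: &str) {
        self.sources.insert(file.to_string(), text.to_string());
        self.token_cache.remove(file);
    }

    /// Query: "parse this file" (here: split into whitespace tokens).
    /// Runs at most once per version of the input.
    fn tokens(&mut self, file: &str) -> Vec<String> {
        if let Some(cached) = self.token_cache.get(file) {
            return cached.clone();
        }
        self.parse_calls += 1;
        let text = self.sources.get(file).cloned().unwrap_or_default();
        let tokens: Vec<String> = text.split_whitespace().map(str::to_string).collect();
        self.token_cache.insert(file.to_string(), tokens.clone());
        tokens
    }

    /// Query that invokes another query on demand.
    fn token_count(&mut self, file: &str) -> usize {
        self.tokens(file).len()
    }
}
```

Asking for `token_count` twice parses once. Changing the input invalidates the cached result, which is the property that makes this style fit incremental computation.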

On-demand analysis is also a great fit for IDEs, since it allows us to do “just as much work as we have to” in order to figure out key bits of information (e.g., “what is the type of the expression at the cursor”). The rust-analyzer project is based entirely on on-demand computation, using the salsa library.

On-demand analysis is a good fit for analysis tools

On-demand analysis is not only a good fit for IDEs: it’d be a great fit for tools like Prusti. If we had a reasonably stable API, tools like Prusti could use on-demand analysis to ask for just the results they need. For example, if they are analyzing a particular function, they might ask for the borrow check results. In fact, if we did it right, they could also leverage the same incremental compilation caches that the compiler is using, which would mean that they don’t even have to re-parse or recompute results that are already available from a previous build (or, conversely, upcoming builds can re-use results that Prusti computed when doing its analysis).

The second trend: Library-ification

There is a second trend in the compiler, one that’s only just begun, but one that I hope will transform the way rustc development feels by the time it’s done. We call it “library-ification”. The basic idea is to refactor the compiler into a set of independent libraries, all knit together by the query system.

One of the immediate drivers for library-ification is the desire to integrate rust-analyzer and rustc into one coherent codebase. Right now, the rust-analyzer IDE is basically a re-implementation of the front-end of the Rust compiler. It has its own parser, its own name resolver, and its own type-checker.

The vision: shared components

So we saw that, presently, rust-analyzer is effectively a re-implementation of many parts of the Rust compiler. But it’s also interesting to look at what rust-analyzer does not have – its own trait system. rust-analyzer uses the chalk library to handle its trait system. And, of course, work is also underway to integrate chalk into rustc.

At the moment, chalk is a promising but incomplete project. But if it works as well as I hope, it points to a promising possibility. We can have the “trait solver” as a coherent block of functionality that is shared by multiple projects. And we could go further, so that we wind up with rustc and rust-analyzer being just two “small shims” over top the same core packages that make up the compiler. One shim would export those packages in a “batch compilation” format suitable for use by cargo, and one as an LSP server suitable for use by IDEs.

The vision: Clean APIs defined in terms of Rust concepts

Chalk is interesting for another reason, too. The API that Chalk offers is based around core concepts and should, I think, be fairly stable. For example, it communicates with the compiler via a trait, the RustIrDatabase, that allows it to query for specific bits of information about the Rust source (e.g., “tell me about this impl”), and doesn’t require a full AST or lots of specifics from its host. One of the benefits of this is that we can have a relatively simple testing harness that lets us write chalk unit tests in a simplified form of Rust syntax.
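
Here is a rough sketch of why a narrow, trait-based interface makes testing easy. To be clear, this is not chalk's actual `RustIrDatabase` API: the `IrDatabase` trait, the `impls_of` method, the `implements` function, and the `TestDatabase` host are all hypothetical names invented for this illustration.

```rust
/// Hypothetical, heavily simplified stand-in for the kind of interface
/// described above. NOT chalk's real `RustIrDatabase`.
trait IrDatabase {
    /// "Tell me about the impls of this trait": the types implementing it.
    fn impls_of(&self, trait_name: &str) -> Vec<String>;
}

/// The "solver" sees only the trait, never an AST or the host's internals.
fn implements(db: &dyn IrDatabase, ty: &str, trait_name: &str) -> bool {
    db.impls_of(trait_name).iter().any(|candidate| candidate == ty)
}

/// A test-harness host: just a list of (type, trait) pairs.
struct TestDatabase {
    impls: Vec<(&'static str, &'static str)>,
}

impl IrDatabase for TestDatabase {
    fn impls_of(&self, trait_name: &str) -> Vec<String> {
        self.impls
            .iter()
            .filter(|(_, tr)| *tr == trait_name)
            .map(|(ty, _)| ty.to_string())
            .collect()
    }
}
```

Because the solver depends only on the trait, a compiler, an IDE, and a unit-test harness can each supply their own host without the solver changing.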

The fact that chalk’s unit tests are “mini Rust programs” is nice because they’re readable, but it’s important for a deeper reason, too. I’ve many times experienced problems when using unit tests where the tests wind up tied very tightly to the structure of the code, and hence big swaths of tests get invalidated when doing refactoring, and it’s often quite hard to port them to the new interface. We don’t generally have to worry about this with rustc, since its tests are just example programs – and the same is true for Chalk, by and large. My sense is that one of the ways that we will know where good library boundaries lie will be our ability to write unit tests in a clear way.

Library-ification can help make rustc more accessible

Right now, many folks have told me that the rustc code base can be quite intimidating. There’s a lot of code. It takes a while to build and requires some custom setup to get things going (not to mention gobs of RAM). Although, like any large code-base, it is factored into several relatively independent modules, it’s not always obvious where the boundaries between those modules are, so it’s hard to learn it a piece at a time.

But imagine instead that rustc was composed of a relatively small number of well-defined libraries, with clear and well-documented APIs that separated them. Those libraries might be in separate repositories and they might not, but regardless you could jump into a single library and start working. It would have a clear API that connects it to the rest of the compiler, and a testing harness that lets you run unit tests that exercise that API (along of course with our existing suite of example programs, which serve as integration tests).

The benefits of course aren’t limited to new contributors. I really enjoy hacking on chalk because it’s a relatively narrow and pliable code base. It’s easy to jump from place to place and find what I’m looking for. In contrast, working on rustc feels much more difficult, even though I know the codebase quite well.

Library-ification will work best if APIs aren’t changing

One thing I want to emphasize: I think that this whole scheme will work best if we can find interfaces between components that are not changing all the time. Frequently changing interfaces would indicate that the modules within the compiler are coupled in ways we’d prefer to avoid, and it will make it harder for people to work within one library without having to learn the details of the others.

Libraries could be used by analysis tools as well

Now we come to the final step. If we imagine that we are able to subdivide rustc into coherent libraries, and that those libraries have relatively clean, stable APIs between them, then it is also plausible that we can start publishing those libraries on crates.io (or perhaps wrappers around them, with simplified and more limited APIs). This then starts to look sort of like the .NET Roslyn compiler – we are exporting the tools to help people analyze and understand Rust code for themselves. So, for example, Prusti could invoke rustc’s borrow checker and read its results directly, without having to resort to elaborate hacks.

On stability and semver

I’ve tossed out the term “stable” a few times throughout this post, so it’s worth putting in a few words for how I think stability would work if we went down this direction. I absolutely do not think we would want to commit to some kind of fixed, unchanging API for rustc or libraries used by rustc. In fact, in the early days, I imagine we’d just publish a new major version of each library with each Rust release, which would imply that you’d have to do frequent updates.

But once the APIs settle down – and, as I wrote, I really hope that they do – I think we would simply want to have meaningful semver, like any other library. In other words, we should always feel free to make breaking changes to our APIs, but we should announce when we do so, and I hope that we don’t have to do so frequently.

If this all really works out, I imagine we’d start to think about scheduling breaking changes in APIs, or finding alternatives that let us keep tooling working. I think that’d be a fine price to pay in exchange for having a host of powerful tooling available, but in any case it’s quite far away.

Conclusion

This post sketches out my vision for Rust compiler development in the long term. I’d like to see a rustc based on a relatively small number of well-defined components that encapsulate major chunks of functionality, like “the trait system”, “the borrow checker”, or “the parser”. In the short term, these components should allow us to share code between rustc and rust-analyzer, and to make rustc more understandable. In the longer term, these components could even enable us to support a broad ecosystem of compiler tools and analyses.

Async Interview #7: Withoutboats

2020-03-10T00:00:00+00:00

Hello everyone! I’m happy to be posting a transcript of my async interview with withoutboats. This particular interview took place way back on January 14th, but the intervening months have been a bit crazy and I didn’t get around to writing it up till now.

Video

You can watch the video on YouTube.

Next steps for async

Before I go into boats’ interview, I want to talk a bit about the state of async-await in Rust and what I see as the obvious next steps. I may still do a few more async interviews after this – there are tons of interesting folks I never got to speak to! – but I think it’s also past time to try and come to a consensus of the “async roadmap” for the rest of the year (and maybe some of 2021, too). The good news is that I feel like the async interviews highlighted a number of relatively clear next steps. Sometime after this post, I hope to post a blog post laying out a “rough draft” of what such a roadmap might look like.

History

withoutboats is a member of the Rust lang team. Starting around the beginning of 2018, they started looking into async-await for Rust. Everybody knew that we wanted to have some way to write a function that could suspend (await) as needed. But we were stuck on a rather fundamental problem which boats explained in the blog post “self-referential structs”. This blog post was the first in a series of posts that ultimately documented the design that became the Pin type, which describes a pointer to a value that can never be moved to another location in memory. Pin became the foundation for async functions in Rust. (If you’ve not read the blog post series, it’s highly recommended.) If you’d like to learn more about pin, boats posted a recorded stream on YouTube that explores its design in detail.

Vision for async

All along, boats has been motivated by a relatively clear vision: we should make async Rust “just as nice to use” as Rust with blocking I/O. In short, you should be able to write code much like you always did, just making the functions which perform I/O async and then adding await here or there as needed.

Since 2018, we’ve made great progress towards the goal of “async I/O that is as easy as sync” – most notably by landing and stabilizing the async-await MVP – but we’re not there yet. There remain a number of practical obstacles that make writing code using async I/O more difficult than sync I/O. So the mission for the next few years is to identify those obstacles and dismantle them, one by one.

Next step: async destructors

One of the first obstacles that boats mentioned was extending Rust’s Drop trait to work better for async code. The Drop trait, for those who don’t know Rust, is a special trait in Rust that types can implement in order to declare a destructor (code which should run when a value goes out of scope). boats wrote a blog post that discusses the problem in more detail and proposes a solution. Since that blog post, they’ve refined the proposal in response to some feedback, though the overall shape remains the same. The basic idea is to extend the Drop trait with an optional poll_drop_ready method:

trait Drop {
    fn drop(&mut self);

    fn poll_drop_ready(
        self: Pin<&mut Self>,
        ctx: &mut Context<'_>,
    ) -> Poll<()> {
        Poll::Ready(())
    }
}

When executing an async fn, and a value goes out of scope, we will first invoke poll_drop_ready, and “await” if it returns anything other than Poll::Ready. This gives the value a chance to do async operations that may block, in preparation for the final drop. Once Poll::Ready is returned, the ordinary drop method is invoked.

This async-drop trait came up in early async interviews, and I raised Eliza’s use case with boats. Specifically, she wanted some way to offer values that are live on the stack a callback when a yield occurs and when the function is resumed, so that they can (e.g.) interact with thread-local state correctly in an async context. While distinct from async destructors, the issues are related because destructors are often used to manage thread-local values in a scoped fashion.

Adding async drop requires not only modifying the compiler but also modifying futures combinators to properly handle the new poll_drop_ready method (combinators need to propagate this poll_drop_ready to the sub-futures they contain).

Note that we wouldn’t offer any ‘guarantee’ that poll_drop_ready will run. For example, it would not run if a future is dropped without being resumed, because then there is no “async context” that can handle the awaits. However, like Drop, it would ultimately be something that types can “usually” expect to execute under ordinary circumstances.

Some of the use cases for async-drop include writers that buffer data and wish to ensure that the data is flushed out when the writer is dropped, transactional APIs, or anything that might do I/O when dropped.
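
As a rough illustration of the buffered-writer case, here is a sketch that emulates the protocol on stable Rust. Since we can't actually extend `Drop`, the method lives on a separate, hypothetical `PollDropReady` trait, and `drop_with_protocol` is an invented helper standing in for what the compiler would do when a value goes out of scope in an async fn. It busy-polls with a no-op waker purely so the example runs synchronously; a real implementation would return `Pending` to the executor instead.

```rust
use std::collections::VecDeque;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in: the proposal adds this method to `Drop` itself.
trait PollDropReady {
    fn poll_drop_ready(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

/// A writer that buffers chunks and flushes them before being dropped.
struct BufferedWriter {
    pending: VecDeque<String>,
    sink: Arc<Mutex<Vec<String>>>,
}

impl PollDropReady for BufferedWriter {
    fn poll_drop_ready(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // `BufferedWriter` is `Unpin`, so getting a plain `&mut` is fine.
        let this = self.get_mut();
        match this.pending.pop_front() {
            Some(chunk) => {
                // Pretend each flush is an I/O step that suspends once.
                this.sink.lock().unwrap().push(chunk);
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            None => Poll::Ready(()),
        }
    }
}

/// Waker that does nothing; sufficient for polling in a plain loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Emulates the protocol: poll until ready, then run the ordinary drop.
/// Returns the number of polls it took.
fn drop_with_protocol<T: PollDropReady + Unpin>(mut value: T) -> usize {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if Pin::new(&mut value).poll_drop_ready(&mut cx).is_ready() {
            break;
        }
    }
    drop(value);
    polls
}
```

Types that don't override `poll_drop_ready` get the default `Poll::Ready(())`, so they are dropped immediately, exactly as they are today.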

block_on in the std library

One very small addition that boats proposed is adding block_on to the standard library. Invoking block_on(future) would block the current thread until future has been fully executed (and then return the resulting value). This is actually something that most async I/O code would never want to do – if you want to get the value from a future, after all, you should do future.await. So why is block_on useful?

Well, block_on is basically the most minimal executor. It allows you to take async code and run it in a synchronous context with minimal fuss. It’s really convenient in examples and documentation. I would personally like it to permit writing stand-alone test cases. Those reasons alone are probably good enough justification to add it, but boats has another use in mind as well.
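
To show just how minimal such an executor can be, here is a sketch of `block_on` written against the standard library only. This is my own illustration, not the API boats proposed: the `ThreadWaker` type is an invented helper, and I use `Box::pin` for simplicity even though it costs an allocation.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread which is blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Runs `future` to completion on the current thread and returns its output.
fn block_on<F: Future>(future: F) -> F::Output {
    // Pin the future on the heap so that it can be polled.
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the waker unparks us. Spurious wakeups are
            // harmless because we simply poll again.
            Poll::Pending => thread::park(),
        }
    }
}
```

With this in hand, a test or doc example can call something like `block_on(async { 20 + 22 })` from ordinary synchronous code, with no runtime to set up.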

async fn main

Every Rust program ultimately begins with a main somewhere. Because main is invoked by the surrounding C library to start the program, it also tends to be a place where a certain amount of “boilerplate code” can accumulate in order to “setup” the environment for the rest of the program. This “boilerplate setup” can be particularly annoying when you’re just getting started with Rust, as the main function is often the first one you write, and it winds up working differently than the others. A similar problem affects smaller code examples.

In Rust 2018, we extended main so that it supports Result return values. This meant that you could now write main functions that use the ? operator, without having to add some kind of intermediate wrapper:

fn main() -> Result<(), std::io::Error> {
    let file = std::fs::File::create("output.txt")?;
    Ok(())
}

Unfortunately, async code today suffers from a similar papercut. If you’re writing an async project, most of your code is going to be async in nature: but the main function is always synchronous, which means you need to bridge the two somehow. Sometimes, especially for larger projects, this isn’t that big a deal, as you likely need to do some setup or configuration anyway. But for smaller examples, it’s quite a pain.

So boats would like to allow people to write an “async” main. This would then permit you to directly “await” futures from within the main function:

async fn main() {
    let x = load_data(22).await;
}

async fn load_data(port: usize) -> Data { ... }

Of course, this raises the question: since the program will ultimately run synchronously, how do we bridge from the async fn main to a synchronous main? This is where block_on comes in: at least to start, we can simply declare that the future generated by async fn main will be executed using block_on, which means it will block the main thread until main completes. For simple programs and examples, this is exactly what you want.

But most real programs will ultimately want to start some other executor to get more features. In fact, following the lead of the runtime crate, many executors already offer a procedural macro that lets you write an async main. So, for example, tokio and async-std offer attributes called #[tokio::main] and #[async_std::main] respectively, which means that if you have an async fn main program you can pick an executor just by adding the appropriate attribute:

#[tokio::main] // or #[async_std::main], etc
async fn main() { .. }

I imagine that other executors offer a similar procedural macro – or if they don’t yet, they could add one. =)

(In fact, since async-std’s runtime starts implicitly in a background thread when you start using it, you could use async-std libraries without any additional setup as well.)

Overall, this seems pretty nice to me. Basically, when you write async fn main, you get Rust’s “default executor”, which presently is a very bare-bones executor suitable only for simple examples. To switch to a more full-featured executor, you simply add a #[foo::main] attribute and you’re off to the races!

(Side note #1: This isn’t something that boats and I talked about, but I wonder about adding a more general attribute, like #[async_runtime(foo)] that just desugars to a call like foo::main_wrapper(...), which is expected to do whatever setup is appropriate for the crate foo.)

(Side note #2: This also isn’t something that boats and I talked about, but I imagine that having a “native” concept of async fn main might help for some platforms where there is already a native executor. I’m thinking of things like GStreamer or perhaps iOS with Grand Central Dispatch. In short, I imagine there are environments where the notion of a “main function” isn’t really a great fit anyhow, although it’s possible I have no idea what I’m talking about.)

async-await in an embedded context

One thing we’ve not talked about very much in the interviews so far is using async-await in an embedded context. When we shipped the async-await MVP, we definitely cut a few corners, and one of those had to do with the use of thread-local storage (TLS). Currently, when you use async fn, the desugaring winds up using a private TLS variable to carry the Context about the current async task down through the stack. This isn’t necessary, it was just a quick and convenient hack that sidestepped some questions about how to pass in arguments when resuming a suspended function. For most programs, TLS works just fine, but some embedded environments don’t support it. Therefore, it makes sense to fix this bug and permit async fn to pass around its state without the use of TLS. (In fact, since boats and I talked, jonas-schievink opened PR #69033 which does exactly this, though it’s not yet landed.)

Async fn are implemented using a more general generator mechanism

You might be surprised when I say that we’ve already started fixing the TLS problem. After all, the reason we used TLS in the first place is that there were unresolved questions about how to pass in data when waking up a suspended function – and we haven’t resolved those problems. So why are we able to go ahead and remove the use of TLS?

The answer is that, while the async fn feature is implemented atop a more general mechanism of suspendable functions¹, the full power of that mechanism is not exposed to end-users. So, for example, suspendable functions in the compiler permit yielding arbitrary values, but async functions always yield up (), since they only need to signal that they are blocked waiting on I/O, not transmit values. Similarly, the compiler’s internal mechanism will allow us to pass in a new Context when we wake up from a yield, and we can use that mechanism to pass in the Context argument from the future API. But this is hidden from the end-user, since that Context is never directly exposed or accessed.

In short, the suspended functions supported by the compiler are not a language feature: they are an implementation detail that is (currently) only used for async-await. This is really useful because it means we can change how they work, and it also means that we don’t have to make them support all possible use cases one might want. In this particular case, it means we don’t have to resolve some of the thorny questions about how to pass in data after a yield, because we only need to use them in a very specific way.
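
To give a feel for what "yield `()` and get a `Context` passed back in" means, here is a hand-written future that behaves roughly like an async fn with a single suspension point. It is only an analogy for what the compiler generates, not the actual desugaring; `YieldOnce`, `NoopWake`, and `drive` are names I made up for this sketch.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written analogue of an `async fn` that suspends exactly once.
enum YieldOnce {
    Start,
    Suspended,
    Done,
}

impl Future for YieldOnce {
    type Output = &'static str;

    // The `Context` arrives as an ordinary argument on every resumption;
    // nothing here relies on thread-local storage.
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        match *this {
            YieldOnce::Start => {
                // The "yield ()": `Pending` carries no value, it only
                // signals that we are not finished yet.
                *this = YieldOnce::Suspended;
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            YieldOnce::Suspended => {
                *this = YieldOnce::Done;
                Poll::Ready("finished")
            }
            YieldOnce::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; enough for driving a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the future by hand and records what each poll returned.
fn drive() -> Vec<String> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = YieldOnce::Start;
    let mut log = Vec::new();
    loop {
        match Pin::new(&mut future).poll(&mut cx) {
            Poll::Pending => log.push("pending".to_string()),
            Poll::Ready(value) => {
                log.push(value.to_string());
                return log;
            }
        }
    }
}
```

The user of an async fn never sees the `Context` or the intermediate `Pending`, which is why the compiler is free to change how both are threaded through.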

Supporting generators (iterators) and async generators (streams)

One observation that boats raised is that people who write async I/O code are interacting with Pin much more directly than was expected. The primary reason for this is that people are having to manually implement the Stream trait, which is basically the async version of an iterator. (We’ve talked about Stream in a number of previous async interviews.) I have also found that, in my conversations with users of async, streams come up very, very often. At the moment, consuming streams is generally fairly easy, but creating them is quite difficult. For that matter, even in synchronous Rust, manually implementing the Iterator trait is kind of annoying (although significantly easier than streams).

So, it would be nice if we had some way to make it easier to write iterators and streams. And, indeed, this design space has been carved out in other languages: the basic mechanism is to add a generator², which is some sort of function that can yield up a series of values before terminating. Obviously, if you’ve read up to this point, you can see that the “suspendable functions” we used to implement async await can also be used to support some form of generator abstractions, so a lot of the hard implementation work has been done here.

That said, supporting generator functions has been something that we’ve been shying away from. And why is that, if a lot of the implementation work is done? The answer is primarily that the design space is huge. I alluded to this earlier in talking about some of the questions around how to pass data in when resuming a suspended function.

Full generality considered too dang difficult

boats however contends that we are making our lives harder than they need to be. In short, if we narrow our focus from “create the perfect, flexible abstraction for suspended functions and coroutines” to “create something that lets you write iterators and streams”, then a lot of the thorny design problems go away. Now, under the covers, we still want to have some kind of unified form of suspended functions that can support async-await and generators, but that is a much simpler task.

In short, we would want to permit writing a gen fn (and async gen fn), which would be some function that is able to yield values and which eventually returns. Since the iterator’s next method doesn’t take any arguments, we wouldn’t need to support passing data in after yields (in the case of streams, we would pass in data, but only the Context values that are not directly exposed to users). Similarly, iterators and streams don’t produce a “final value” when they’re done, so these functions would always just return unit.
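
For a sense of what this would save, compare a hypothetical `gen fn` (shown in a comment, since the syntax is only an assumption and does not exist today) with the iterator you have to write by hand. The `Countdown` type and `countdown` function are invented for this example.

```rust
// Hypothetical syntax, not valid Rust today:
//
//     gen fn countdown(from: u32) -> u32 {
//         let mut n = from;
//         while n > 0 {
//             yield n;
//             n -= 1;
//         }
//     }
//
// The hand-written equivalent has to store the loop state in a struct
// and re-enter the "loop body" on every call to `next`.

struct Countdown {
    n: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // `next` takes no arguments beyond `self`, so nothing needs to be
    // passed back in after a yield.
    fn next(&mut self) -> Option<u32> {
        if self.n == 0 {
            // The generator "returns unit": there is no final value.
            return None;
        }
        let current = self.n;
        self.n -= 1;
        Some(current)
    }
}

fn countdown(from: u32) -> Countdown {
    Countdown { n: from }
}
```

For something this small the manual version is tolerable, but the gap grows quickly once the body has nested loops or borrows, and it grows further still for streams, where `Pin` enters the picture.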

Adopting a more narrow focus wouldn’t close the door to exposing our internal mechanism as a first-class language feature at some point, but it would help us to solve urgent problems sooner, and it would also give us more experience to use when looking again at the more general task. It also means that we are adding features that make writing iterators and streams as easy as we can make it, which is a good thing³. (In case you can’t tell, I was sympathetic to boats’ argument.)

Extending the stdlib with some key traits

boats is in favor of adding the “big three” traits to the standard library (if you’ve been reading these interviews, these traits will be quite familiar to you by now):

AsyncRead

AsyncWrite

Stream

Stick to the core vision: Async and sync should be analogous

One important point: boats believes (and I agree) that we should try to maintain the principle that the async and synchronous versions of the traits should align as closely as possible. This matches the overarching design vision of minimizing the differences between “async Rust” and “sync Rust”. It also argues in favor of the approach that sfackler proposed in their interview, where we address the questions of how to handle uninitialized memory in an analogous way for both Read and AsyncRead.

We talked a bit about the finer details of that principle. For example, if we were to extend the Read trait with some kind of read_buf method (which can support an uninitialized output buffer), then this new method would have to have a default, for backwards compatibility reasons:

trait Read {
    fn read(&mut self, ...);
    fn read_buf(&mut self, buf: &mut BufMut<..>) { }
}

This is a bit unfortunate, as ideally you would only implement read_buf. For AsyncRead, since the trait doesn’t exist yet, we could switch the defaults. But boats pointed out that this carries costs too: we would forever have to explain why the two traits are different, for example. (Another option is to have both methods default to one another, so that you can implement either one, which – combined with a lint – might be the best of both worlds.)
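
Here is a sketch of the "both methods default to one another" option. It uses a simplified, hypothetical `MyRead` trait rather than the real `std::io::Read`, and an invented `OutBuf` type standing in for the `BufMut<..>` above (it does not model uninitialized memory at all). The point is only to show the shape of the mutual defaults.

```rust
/// Hypothetical output buffer standing in for `BufMut<..>`.
struct OutBuf {
    data: Vec<u8>,
    capacity: usize,
}

impl OutBuf {
    fn with_capacity(capacity: usize) -> Self {
        OutBuf { data: Vec::with_capacity(capacity), capacity }
    }

    fn remaining(&self) -> usize {
        self.capacity - self.data.len()
    }

    /// Appends as many bytes as fit and returns how many were taken.
    fn push_slice(&mut self, bytes: &[u8]) -> usize {
        let n = bytes.len().min(self.remaining());
        self.data.extend_from_slice(&bytes[..n]);
        n
    }
}

/// Hypothetical trait; implementors override exactly one of the methods.
/// Overriding neither would recurse forever, hence the need for a lint.
trait MyRead {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let mut out = OutBuf::with_capacity(buf.len());
        self.read_buf(&mut out);
        let n = out.data.len();
        buf[..n].copy_from_slice(&out.data);
        n
    }

    fn read_buf(&mut self, buf: &mut OutBuf) {
        // Falls back to `read` through a zeroed temporary buffer.
        let mut tmp = vec![0u8; buf.remaining()];
        let n = self.read(&mut tmp);
        buf.push_slice(&tmp[..n]);
    }
}

/// Existing-style reader: implements only `read`.
struct OldStyle(&'static [u8]);

impl MyRead for OldStyle {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let n = self.0.len().min(buf.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        self.0 = &self.0[n..];
        n
    }
}

/// New-style reader: implements only `read_buf`.
struct NewStyle(&'static [u8]);

impl MyRead for NewStyle {
    fn read_buf(&mut self, buf: &mut OutBuf) {
        let n = buf.push_slice(self.0);
        self.0 = &self.0[n..];
    }
}
```

Callers can use either method on either reader, so old implementations keep working while new ones only write `read_buf`.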

Generic interface for spawning

Some time back, boats wrote a post proposing global executors. This would basically be a way to add a function to the stdlib to spawn a task, which would then delegate (somehow) to whatever executor you are using. Based on the response to the post, boats now feels this is probably not a good short-term goal.

For one thing, there were a lot of unresolved questions about just what features this global executor should support. But for another, the main goal here is to enable libraries to write “executor independent” code, but it’s not clear how many libraries spawn tasks anyway – that’s usually done more at the application level. Libraries tend to instead return a future and let the application do the spawning (interestingly, one place this doesn’t work is in destructors, since they can’t return futures; supporting async drop, as discussed earlier, would help here.)

So it’d probably be better to revisit this question once we have more experience, particularly once we have the async I/O and stream traits available.

The futures crate

We discussed other possible additions to the standard library. There are a lot of “building blocks” currently in the futures library that are independent from executors and which could do well in the standard library. Some of the things that we talked about:

async-aware mutexes, clearly a useful building block

channels

though std channels are not the most loved, crossbeam’s are generally preferred

interestingly, channel types do show up in public APIs from time to time, as a way to receive data, so having them in std could be particularly useful

In general, where things get more complex is whenever you have bits of code that either have to spawn tasks or which do the “core I/O”. These are the points where you need a more full-fledged reactor or runtime. But there are lots of utilities that don’t need that and which could profitably live in the std library.

Where to put async things in the stdlib?

One theme that boats and I did not discuss, but which has come up when I’ve raised this question with others, is where to put async-aware traits in the std hierarchy, particularly when there are sync versions. For example, should we have std::io::Read and std::io::AsyncRead? Or would it be better to have std::io::Read and something like std::async::io::Read (obviously, async is a keyword, so this precise path may not be an option). In other words, should we combine sync/async traits into the same space, but with different names, or should we carve out a space for “async-enabled” traits and use the same names? An interesting question, and I don’t have an opinion yet.

Conclusion and some of my thoughts

I always enjoy talking with boats, and this time was no exception. I think boats raised a number of small, practical ideas that hadn’t come up before. I do think it’s important that, in addition to stabilizing fundamental building blocks like AsyncRead, we also consider improvements to the ergonomic experience with smaller changes like async fn main, and I agree with the guiding principle that boats raised of keeping async and sync code as “analogous” as possible.

Comments?

There is a thread on the Rust users forum for this series.

Footnotes

In the compiler, we call these “suspendable functions” generators, but I’m avoiding that terminology for a reason. ↩︎

This is why I was avoiding using the term “generator” earlier – I want to say “suspendable functions” when referring to the implementation mechanism, and “generator” when referring to the user-exposed feature. ↩︎

though not one that a fully general mechanism necessarily precludes ↩︎

Async Interview #6: Eliza Weisman

2020-02-11T00:00:00+00:00

Hello! For the latest async interview, I spoke with Eliza Weisman (hawkw, mycoliza on twitter). Eliza first came to my attention as the author of the tracing crate, which is a nifty crate for doing application level tracing. However, she is also a core maintainer of tokio, and she works at Buoyant on the linkerd system. linkerd is one of a small set of large applications that were built using 0.1 futures – i.e., before async-await. This range of experience gives Eliza an interesting “overview” perspective on async-await and Rust more generally.

Video

You can watch the video on YouTube.

The days before question mark

Since I didn’t know Eliza as well, we started out talking a bit about her background. She has been using Rust for 5 years, and I was amused by how she characterized the state of Rust when she got started: pre-“question mark” Rust. Indeed, the introduction of the ? operator does feel like one of those “turning points” in the history of Rust, and I’m quite sure that async-await will feel similarly (at least for some applications).

One interesting observation that Eliza made is that it feels like Rust has reached the point where there is nothing critically missing. This isn’t to say there aren’t things that need to be improved, but that the number of “rough edges” has dramatically decreased. I think this is true, and we should be proud of it – though we also shouldn’t relax too much. =) Getting to learn Rust is still a significant hurdle and there are still a number of things that are much harder than they need to be.

One interesting corollary of this is that a number of the things that most affect Eliza when writing async I/O code are not specific to async I/O. Rather, they are more general features or requirements that apply to a lot of different things.

Tokio’s needs

We talked some about what tokio needs from async Rust. As Eliza said, many of the main points already came up in my conversation with Carl:

async functions in traits would be great, but they’re hard

stabilizing streams, async read, and async write would be great

Communicating stability

One thing we spent a fair while discussing is how to best communicate our stability story. This goes beyond “semver”. semver tells you when a breaking change has been made, of course, but it doesn’t tell you whether a breaking change will be made in the future – or how long we plan to do backports, and the like.

The easiest way for us to communicate stability is to move things to the std library. That is a clear signal that breaking changes will never be made.

But there is room for us to set “intermediate” levels of stability. One thing that might help is to make a public stability policy for crates like futures. For example, we could declare that the futures crate will maintain compatibility with the current Stream trait for the next year, or two years.

These kinds of timelines would be helpful: for example, tokio plans to maintain a stable interface for the next 5 years, and so if they want to expose traits from the futures crate, they would want a guarantee that those traits would be supported during that period (and ideally that futures would not release a semver-incompatible version of those traits).

Depending on community crates

When we talk about interoperability, we are often talking about core traits like Future, Stream, and AsyncRead. But as we move up the stack, there are other things where having a defined standard could be really useful. My go to example for this is the http crate, which defines a number of types for things like HTTP error codes. The types are important because they are likely to find their way into the “public interface” of libraries like hyper, as well as frameworks and things. I would like to see a world where web frameworks can easily be converted between frameworks or across HTTP implementations, but that would be made easier if there is an agreed upon standard for representing the details of an HTTP request. Maybe the http crate is that already, or can become that – in any case, I’m not sure if the stdlib is the right place for such a thing, or at least not for some time. It’s something to think about. (I do suspect that it might be useful to move such crates to the Rust org? But we’d have to have a good story around maintenance.) Anyway, I’m getting beyond what was in the interview I think.

Tracing

We talked a fair amount about the tracing library. Tracing is one of those libraries that can do a large number of things, so it’s kind of hard to concisely summarize what it does. In short, it is a set of crates for collecting scoped, structured, and contextual diagnostic information in Rust programs. One of the simplest use cases is to collect logging information, but it can also be used for things like profiling and any number of other tasks.

I myself started to become interested in tracing as a possible tool to help with debugging and analyzing programs like rustc and chalk, where the “chain” that leads to a bug can often be quite complex and involve numerous parts of the compiler. Right now I tend to just dump gigabytes of logs into files and traverse them with grep. In so doing, I lose all kinds of information (like hierarchical information about what happens during what) that would make my life easier. I’d love a tool that let me, for example, track “all the logs that pertain to a particular function” while also making it easy to find the context in which a particular log occurred.

The tracing library got its start as a structured replacement for various hacky layers atop the log crate that were in use for debugging linkerd. Like many async applications, debugging a linkerd session involves correlating a lot of events that may be taking place at distinct times – or even distinct machines – but are still part of one conceptual “thread” of control.

tracing is actually a “front-end” built atop the “tracing-core” crate. tracing-core is a minimal crate that just stores a thread-local containing the current “event subscriber” (which processes the tracing events in some way). You don’t interact with tracing-core directly, but it’s important to the overall design, as we’ll see in a bit.

The tracing front-end contains a bunch of macros, rather like the debug! and info! you may be used to from the log crate (and indeed there are crates that let you use those debug! logs directly). The major one is the span! macro, which lets you declare that a task is happening. It works by putting a “placeholder” on the stack: when that placeholder is dropped, the task is done:

let s: Span = span!(...); // create a span `s`
let _guard = s.enter(); // enter `s`, so that subsequent events take place "in" `s`
let t: Span = span!(...); // create a *subspan* of `s` called `t`
...

Under the hood, all of these macros forward to the “subscriber” we were talking about earlier. So it might receive events like “we entered this span” or “this log was generated”.

The idea is that events that happen inside of a span inherit the context of that span. So, to jump back to my compiler example, I might use a span to indicate which function is currently being type-checked, which would then be associated with any events that took place.
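To make the mechanism concrete, here is a toy model of a span guard written with only the standard library. It is a sketch, not how the tracing crate is implemented: `ToySpan`, `ToyGuard`, `event`, and the thread-local `CURRENT_SPANS` are all hypothetical names invented for this illustration, and a real subscriber does far more than format a string.

```rust
use std::cell::RefCell;

thread_local! {
    // Stack of the spans currently entered on this thread (toy model only).
    static CURRENT_SPANS: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Hypothetical stand-in for a span; the real `Span` type carries much more.
struct ToySpan {
    name: &'static str,
}

/// The "placeholder" that lives on the stack; dropping it exits the span.
struct ToyGuard;

impl ToySpan {
    fn new(name: &'static str) -> Self {
        ToySpan { name }
    }

    /// Pushes this span onto the thread-local stack and returns the guard.
    fn enter(&self) -> ToyGuard {
        CURRENT_SPANS.with(|spans| spans.borrow_mut().push(self.name));
        ToyGuard
    }
}

impl Drop for ToyGuard {
    fn drop(&mut self) {
        // Exiting a span pops the most recently entered one.
        CURRENT_SPANS.with(|spans| {
            spans.borrow_mut().pop();
        });
    }
}

/// An "event": it is decorated with whatever spans are active right now.
fn event(message: &str) -> String {
    CURRENT_SPANS.with(|spans| format!("[{}] {}", spans.borrow().join(" > "), message))
}

/// Mirrors the compiler example: events inside "fn foo" inherit both spans.
fn type_check_demo() -> Vec<String> {
    let mut log = Vec::new();
    let outer = ToySpan::new("typeck");
    let _outer_guard = outer.enter();
    log.push(event("start"));
    {
        let inner = ToySpan::new("fn foo");
        let _inner_guard = inner.enter();
        log.push(event("checking body"));
    } // `_inner_guard` is dropped here, so "fn foo" is exited
    log.push(event("done"));
    log
}
```

Calling `type_check_demo()` yields three log lines, and only the middle one carries the `fn foo` context, because the inner guard has already been dropped by the time the last event fires.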

There are many different possible kinds of subscribers. A subscriber might, for example, dump things out in real time, or it might just collect events and log them later. Crates like tracing-timing record inter-event timing and make histograms and flamegraphs.

Integrating tracing with other libraries

It seems clear that tracing would work best if it is integrated with other libraries. I believe it is already integrated into tokio, but one could also imagine integrating tracing with rayon, which distributes tasks across worker threads to run in parallel. The goal there would be that we “link” the tasks so that events which occur in a parallel task inherit the context/span information from the task which spawned them, even though they’re running on another thread.

The idea here is not only that Rayon can link up your application events, but that Rayon can add its own debugging information using tracing in a non-obtrusive way. In the ‘bad old days’, tokio used to have a bunch of debug! logs that would let you monitor what was going on – but these logs were often confusing and really targeting internal tokio developers.

With the tracing crate, the goal is that libraries can enrich the user’s diagnostics. For example, the hyper library might add metadata about the set of headers in a request, and tokio might add information about which thread-pool is in use. This information is all “attached” to your actual application logs, which have to do with your business logic. Ideally, you can ignore them most of the time, but if that sort of data becomes relevant – e.g., maybe you are confused about why a header doesn’t seem to be being detected by your appserver – you can dig in and get the full details.

Integrating tracing with other logging systems

Eliza emphasized that she would really like to see more interoperability amongst tracing libraries. The current tracing crate, for example, can be easily made to emit log records, making it interoperable with the log crate (there is also a “logger” that implements the tracing interface).

Having a distinct tracing-core crate means that it is possible for there to be multiple facades that build on tracing, potentially operating in quite different ways, which all share the same underlying “subscriber” infrastructure. (rayon uses the same trick; the rayon-core crate defines the underlying scheduler, so that multiple versions of the rayon ParallelIterator traits can co-exist without having multiple global schedulers.) Eliza mentioned that – in her ideal world – there’d be some alternative front-end that is so good it can replace the tracing crate altogether, so she no longer has to maintain the macros. =)

RAII and async fn don’t always play well together

There is one feature request for async-await that arises from the tracing library. I mentioned that tracing uses a guard to track the “current span”:

let s: Span = span!(...); // create a span `s`
let _guard = s.enter(); // enter `s`, so that subsequent events take place "in" `s`
...

The way this works is that the guard returned by s.enter() adds some info into the thread-local state and, when it is dropped, that info is withdrawn. Any logs that occur while the _guard is still live are then decorated with this extra span information. The problem is that this mechanism doesn’t work with async-await.

As explained in the tracing README, the problem is that if an async fn yields during an await, then it is removed from the current thread and suspended. It will later be resumed, but potentially on another thread altogether. However, the _guard variable is not notified of these events, so (a) the thread-local info remains set on the original thread, where it may no longer belong and (b) the destructor which goes to remove the info will run on the wrong thread.
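Both failure modes can be reproduced without any async machinery by moving a guard between OS threads by hand. The sketch below reuses a toy thread-local span stack; `ToyGuard`, `enter`, `active_spans`, and `migrate_guard` are hypothetical names, and the toy guard is deliberately `Send` so that it can stand in for a task that is suspended on one thread and resumed on another.

```rust
use std::cell::RefCell;
use std::thread;

thread_local! {
    // Toy per-thread stack of entered spans.
    static CURRENT_SPANS: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

/// Toy guard: popping happens on whichever thread drops it.
struct ToyGuard;

fn enter(name: &'static str) -> ToyGuard {
    CURRENT_SPANS.with(|spans| spans.borrow_mut().push(name));
    ToyGuard
}

impl Drop for ToyGuard {
    fn drop(&mut self) {
        CURRENT_SPANS.with(|spans| {
            spans.borrow_mut().pop();
        });
    }
}

fn active_spans() -> Vec<&'static str> {
    CURRENT_SPANS.with(|spans| spans.borrow().clone())
}

/// Enters a span on the calling thread, then "resumes" on a second thread.
/// Returns (spans visible after resuming, spans left behind on the original thread).
fn migrate_guard() -> (Vec<&'static str>, Vec<&'static str>) {
    let guard = enter("request");
    let seen_after_resume = thread::spawn(move || {
        // (a) the new thread knows nothing about the span that was entered
        let seen = active_spans();
        // (b) the destructor runs here and pops from the wrong (empty) stack
        drop(guard);
        seen
    })
    .join()
    .unwrap();
    let left_behind = active_spans();
    // Tidy up so the demo leaves the original thread's state clean.
    CURRENT_SPANS.with(|spans| spans.borrow_mut().clear());
    (seen_after_resume, left_behind)
}
```

After the "migration", the resuming thread sees no active span at all, while the original thread is still marked as being inside `request` even though the guard is long gone.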

One way to solve this would be to have some sort of callback that _guard can receive to indicate that it is being yielded, along with another callback for when an async fn resumes. This would probably wind up being optional methods of the Drop trait. This is basically another feature request for making RAII work well in an async environment (in addition to the existing problems with async drop that boats described here).

Priorities as a linkerd hacker

I asked Eliza to think for a second about what priorities she would set for the Rust org while wearing her “linkerd hacker” hat – in other words, when acting not as a library designer, but as the author of an application that relies on Async I/O. Most of the feedback here, though, had more to do with general Rust features than async-await specifically.

Eliza pointed out that linkerd hasn’t yet fully upgraded to use async-await, and that the vast majority of pain points she’s encountered thus far stem from having to use the older futures model, which didn’t integrate well with rust borrows.

The other main pain point is the compilation time costs imposed by the deep trait hierarchies created by tower’s service and layer traits. She mentioned hitting a type error that was so long it actually crashed her terminal. I’ve heard of others hitting similar problems with this sort of setup. I’m not sure yet how this is best addressed.

Another major feature request would be to put more work into procedural macros, especially in expression position. Right now proc-macro-hack is the tool of choice but – as the name suggests – it doesn’t seem ideal.

The other major point is that support for cargo feature flags in tooling is pretty minimal. It’s very easy to have code with feature flags that “accidentally” works – i.e., I depend on feature flag X, but I don’t specify it; it just gets enabled via some other dependency of mine. This also makes testing of feature flags hard. rustdoc integration could be better. All true, all challenging. =)

Comments?

There is a thread on the Rust users forum for this series.

Async Interview #5: Steven Fackler

2020-01-20T00:00:00+00:00

Hello! For the latest async interview, I spoke with Steven Fackler (sfackler). sfackler has been involved in Rust for a long time and is a member of the Rust libs team. He is also the author of a lot of crates, most notably tokio-postgres.

I particularly wanted to talk to sfackler about the AsyncRead and AsyncWrite traits. These traits are on everybody’s list of “important things to stabilize”, particularly if we want to create more interop between different executors and runtimes. On the other hand, in [tokio-rs/tokio#1744], the tokio project is considering adopting its own variant traits that diverge significantly from those in the futures crate, precisely because they have concerns over the design of the traits as is. This seems like an important area to dig into!

Video

You can watch the video on YouTube. I’ve also embedded a copy here for your convenience:

One note: something about our setup meant that I was hearing a lot of echo. I think you can sometimes hear it in the recording, but not nearly as bad as it was live. So if I seem a bit spacey, or take very long pauses, you might know the reason why!

Background: concerns on the async-read trait

So what are the concerns that are motivating tokio-rs/tokio#1744? There are two of them:

the current traits do not permit using uninitialized memory as the backing buffer;

there is no way to test presently whether a given reader supports vectorized operations.

This blog post will focus on uninitialized memory

sfackler and I spent most of our time talking about uninitialized memory. We did also discuss vectorized writes, and I’ll include some notes on that at the end, but by and large sfackler felt that the solutions there are much more straightforward.

Important: The same issues arise with the sync Read trait

Interestingly, neither of these issues is specific to AsyncRead. As defined today, the AsyncRead trait is basically just the async version of Read from std, and both of these concerns apply there as well. In fact, part of why I wanted to talk to sfackler specifically is that he is the author of an excellent paper document that covers the problem of using uninitialized memory in great depth. A lot of what we talked about on this call is also present in that document. Definitely give it a read.

Read interface doesn’t support uninitialized memory

The heart of the Read trait is the read method:

fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>
This method reads data and writes it into buf and then – assuming no error – returns Ok(n) with the number n of bytes written.

Ideally, we would like it if buf could be an uninitialized buffer. After all, the Read trait is not supposed to be reading from buf, it’s just supposed to be writing into it – so it shouldn’t matter what data is in there.

Problem 1: The impl might read from the buf, even if it shouldn’t

However, in practice, there are two problems with using uninitialized memory for buf. The first one is relatively obvious: although it isn’t supposed to, the Read impl can trivially read from buf without using any unsafe code:

impl Read for MyReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let x = buf[0];
        ...
    }
}

Reading from an uninitialized buffer is Undefined Behavior and could cause crashes, segfaults, or worse.

Problem 2: The impl might not really initialize the buffer

There is also a second problem that is often overlooked: when the Read impl returns, it returns a value n indicating how many bytes of the buffer were written. In principle, if buf was uninitialized to start, then the first n bytes should be written now – but are they? Consider a Read impl like this one:

impl Read for MyReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        Ok(buf.len())
    }
}

This impl has no unsafe code. It claims that it has initialized the entire buffer, but it hasn’t done any writes into buf at all! Now if the caller tries to read from buf, it will be reading uninitialized memory, and causing UB.

One subtle point here. The problem isn’t that the read impl could return a false value about how many bytes it has written. The problem is that it can lie without ever using any unsafe code at all. So if you are auditing your code for unsafe blocks, you would overlook this.
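The lie can be observed safely by handing such a reader a buffer that is already zeroed. In this sketch, `LyingReader` and `read_with_zeroed_buffer` are hypothetical names; the point is only that the reported count and the actual contents of the buffer disagree, and that nothing in the impl needed `unsafe` to get there.

```rust
use std::io::{self, Read};

/// Hypothetical reader that claims to fill the buffer but never writes to it.
struct LyingReader;

impl Read for LyingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // No unsafe code, and no writes into `buf` either.
        Ok(buf.len())
    }
}

/// Calls `read` with a zeroed buffer and returns the reported count and the buffer.
fn read_with_zeroed_buffer() -> io::Result<(usize, [u8; 8])> {
    let mut reader = LyingReader;
    let mut buf = [0u8; 8];
    let n = reader.read(&mut buf)?;
    Ok((n, buf))
}
```

Here the reader reports 8 bytes written while the buffer still holds its original zeroes. With a zeroed buffer that is merely a logic bug; had the buffer started out uninitialized, a caller trusting `n` would go on to read uninitialized memory.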

Constraints and solutions

There have been a lot of solutions proposed to this problem. sfackler and I talked about all of them, I think, but I’m going to skip over most of the details. You can find them either in the video or in sfackler’s paper document, which covers much of the same material.

In this post, I’ll just cover what we said about three of the options:

First, adding a freeze operation.

This is in some ways the simplest, as it requires no change to Read at all.

Unfortunately, it has a number of limitations and downsides.

Second, adding a second read method that takes a &mut dyn BufMut value.

This is the solution initially proposed in [tokio-rs/tokio#1744].

It has much to recommend it, but requires virtual calls in a core API, although initial benchmarks suggest such calls are not a performance problem.

Finally, creating a struct BufMut in the stdlib for dealing with partially initialized buffers, and adding a read method for that.

This overcomes some of the downsides of using a trait, but at the cost of flexibility.

Digression: how to think about uninitialized memory

Before we go further, let me digress a bit. I think the common understanding of uninitialized memory is that “it contains whatever values happen to be in there at the moment”. In other words, you might imagine that when you first allocate some memory, it contains some value – but you can’t predict what that is.

This intuition turns out to be incorrect. This is true for a number of reasons. Compiler optimizations are part of it. In LLVM, for example, an uninitialized variable is not assigned to a fixed stack slot or anything like that. It is instead a kind of “free floating” “uninitialized” value, and – whenever needed – it is mapped to whatever register or stack slot happens to be convenient at the time for most optimal code. What this means in practice is that each time you try to read from it, the compiler will substitute some value, but it won’t necessarily be the same value every time. This behavior is justified by the C standard, which states that reading uninitialized memory is “undefined behavior”.

This can cause code to go quite awry. The canonical example in my mind is the case of a bounds check. You might imagine, for example, that code like this would suffice for legally accessing an array:

let index = compute_index();
if index < length {
    return &array[index];
} else {
    panic!("out of bounds");
}

However, if the value returned by compute_index is uninitialized, this is incorrect. Because in that case, index will also be “the uninitialized value”, and hence each access to it conceptually yields different values. So the value that we compare against length might not be the same value that we use to index into the array one line later. Woah.

But, as sfackler and I discussed, there are actually other layers that rely on uninitialized memory never being read even below the kernel. For example, in the linux kernel, the virtual memory system has a flag called MADV_FREE. This flag is used to mark virtual memory pages that are considered uninitialized. For each such virtual page, the kernel is free to change the physical memory page at will – until the virtual page is written to. At that point, the memory is potentially initialized, and so the virtual page is pinned. What this means in practice is that when you get memory back from your allocator, each read from that memory may yield different values, unless you’ve written to it first.

For all these reasons, it is best to think of uninitialized memory not as having “some random value” but rather as having the value “uninitialized”. This is a special value that can, sometimes, be converted to a random value when it is forced to (but, if accessed multiple times, it may yield different values each time).

If you’d like a deeper treatment, I recommend Ralf’s blog post.

Possible solution to read: Freeze operation

So, given the above, what is the freeze operation, and how could it help with handling uninitialized memory in the read API?

The general idea is that we could have a primitive called freeze that, given some (potentially) uninitialized value, converts any uninitialized bits into “some random value”. We could use this to fix our indexing, for example, by “freezing” the index before we compare against the length:

let index = freeze(compute_index());
if index < length {
    return &array[index];
} else {
    panic!("out of bounds");
}

In a similar way, if we have a reference to an uninitialized buffer, we could conceivably “freeze” that reference to convert it to a reference of random bytes, and then we can safely use that to invoke read. The idea would be that callers do something like this:

let uninitialized_buffer = ...;
let buffer = freeze(uninitialized_buffer);
let n = reader.read(&mut buffer)?;
...

If we could do this, it would be great, because the existing read interface wouldn’t have to change at all!

There are a few complications though. First off, there is no such freeze operation in LLVM today. There is talk of adding one, but that operation wouldn’t quite do what we need. For one thing, it freezes the value it is applied to, but it doesn’t apply through a reference. So you could use it to fix our array bounds checking example, but you can’t use it to fix read – we don’t need to freeze the &mut [u8] reference, we need to freeze the memory it refers to.

Secondly, that primitive would only apply to compiler optimizations. It wouldn’t protect against kernel optimizations like MADV_FREE. To handle that, we have to do something extra, such as writing one byte per memory page. That’s conceivable, of course, but there are some downsides:

It feels fragile. What if linux adds some new optimizations in the future, how will we work around those?

It feels disappointing. After all, MADV_FREE was presumably added because it allows this to be faster – and we all agree that given a “well-behaved” Read implementation, it should be reasonable.

It can be expensive. sfackler pointed out that it is fairly common to “over-provision” your read buffers, such as creating a 16MB buffer, so as to avoid blocking. This is fairly cheap in practice, but only thanks to optimizations (like MADV_FREE) that allow that memory to be lazily allocated and so forth. If we start writing a byte into every page of a 16MB buffer, you’re going to notice the difference.

For these reasons, sfackler felt like freeze isn’t the right answer here. It might be a useful primitive for things like array bounds checking, but it would be better if we could modify the Read trait in such a way that we permit the use of “unfrozen” uninitialized memory.

Incidentally, this is a topic we’ve hit on in previous async interviews. cramertj and I talked about it, for example. My own opinion has shifted – at first, I thought a freeze primitive was obviously a good idea, but I’ve come to agree with sfackler that it’s not the right solution here.

Fallback and efficient interoperability

If we don’t take the approach of adding a freeze primitive, then this implies that we are going to have to extend the Read trait with some sort of second method. Let’s call it read2 for short. And this raises an interesting question: how are we going to handle backwards compatibility?

In particular, read2 is going to have a default, so that existing impls of Read are not invalidated. And this default is going to have to fall back to calling read, since that is the only method that we can guarantee to exist. Since read requires a fully initialized buffer, this will mean that read2 will have to zero its buffer if it may be uninitialized. This by itself is ok – it’s no worse than today.
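A minimal sketch of such a defaulted method might look like the following. The trait `Read2`, the signature of `read2` (taking `&mut [MaybeUninit<u8>]`), and the `Repeat` reader are all assumptions made for illustration; none of the actual proposals is committed to this exact shape.

```rust
use std::io;
use std::mem::MaybeUninit;
use std::slice;

/// Hypothetical trait; `read2` is just the placeholder name used in the text.
trait Read2 {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;

    /// Default: zero the possibly-uninitialized buffer, then fall back to `read`.
    fn read2(&mut self, buf: &mut [MaybeUninit<u8>]) -> io::Result<usize> {
        for byte in buf.iter_mut() {
            byte.write(0);
        }
        // SAFETY: every element was initialized by the loop above, and
        // `MaybeUninit<u8>` has the same layout as `u8`.
        let init = unsafe { slice::from_raw_parts_mut(buf.as_mut_ptr().cast::<u8>(), buf.len()) };
        self.read(init)
    }
}

/// An "existing" reader that only knows about `read`: writes at most two bytes.
struct Repeat(u8);

impl Read2 for Repeat {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = buf.len().min(2);
        buf[..n].fill(self.0);
        Ok(n)
    }
}

/// Drives the default `read2` with an uninitialized buffer.
fn demo_fallback() -> io::Result<(usize, Vec<u8>)> {
    let mut storage = [MaybeUninit::<u8>::uninit(); 4];
    let mut reader = Repeat(7);
    let n = reader.read2(&mut storage)?;
    // SAFETY: the default `read2` zeroed all of `storage` before calling `read`.
    let bytes = storage.iter().map(|b| unsafe { b.assume_init() }).collect();
    Ok((n, bytes))
}
```

Note that this default has no memory: if the same buffer is passed in again, it gets zeroed again from scratch, which is the source of the repeated-zeroing cost that a stateful buffer type can avoid.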

The problem is that some of the solutions discussed in sfackler’s doc can wind up having to zero the buffer multiple times, depending on how things play out. And this could be a big performance cost. That is definitely to be avoided.

Possible solution to read: Take a trait object, and not a buffer

Another proposed solution, in fact the one described in [tokio-rs/tokio#1744], is to modify read so it takes a trait object (in the case of the Read trait, we’d have to add a new, defaulted method):

fn read_buf(&mut self, buf: &mut dyn BufMut) -> io::Result<()>
The idea here is that BufMut is a trait that lets you safely access a potentially uninitialized set of buffers:

pub trait BufMut {
    fn remaining_mut(&self) -> usize;
    unsafe fn advance_mut(&mut self, cnt: usize);
    unsafe fn bytes_mut(&mut self) -> &mut [u8];
    ...
}

You might wonder why the definition takes a &mut dyn BufMut, rather than a &mut impl BufMut. Taking impl BufMut would mean that the code is specialized to the particular sort of buffer you are using, so that would potentially be quite a bit faster. However, it would also make Read not “dyn-safe”¹, and that’s a non-starter.
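The dyn-safety point can be seen in a small sketch. Everything here is hypothetical: `ByteSink` is a simplified, safe-only stand-in for `BufMut`, and `ReadInto` stands in for a reader trait with a `read_buf` method. Because `read_buf` takes a trait object rather than a generic parameter, `ReadInto` itself can still be used as `dyn ReadInto`.

```rust
use std::io;

/// Simplified, safe-only stand-in for the `BufMut` trait (hypothetical).
trait ByteSink {
    fn remaining(&self) -> usize;
    fn put(&mut self, bytes: &[u8]);
}

/// Sink backed by a `Vec` with a fixed capacity limit.
struct VecSink {
    data: Vec<u8>,
    limit: usize,
}

impl ByteSink for VecSink {
    fn remaining(&self) -> usize {
        self.limit - self.data.len()
    }

    fn put(&mut self, bytes: &[u8]) {
        // Copy as many bytes as still fit under the limit.
        let n = bytes.len().min(self.remaining());
        self.data.extend_from_slice(&bytes[..n]);
    }
}

/// Hypothetical reader trait whose buffer argument is a trait object.
trait ReadInto {
    fn read_buf(&mut self, buf: &mut dyn ByteSink) -> io::Result<()>;
    // Adding a generic method such as
    //     fn read_buf_generic(&mut self, buf: &mut impl ByteSink) -> io::Result<()>;
    // (without a `where Self: Sized` bound) would make `dyn ReadInto` a compile error.
}

struct Hello;

impl ReadInto for Hello {
    fn read_buf(&mut self, buf: &mut dyn ByteSink) -> io::Result<()> {
        buf.put(b"hello");
        Ok(())
    }
}

/// Uses the reader through a trait object; every `put` is a virtual call.
fn read_through_trait_object() -> io::Result<Vec<u8>> {
    let mut reader: Box<dyn ReadInto> = Box::new(Hello);
    let mut sink = VecSink { data: Vec::new(), limit: 3 };
    reader.read_buf(&mut sink)?;
    Ok(sink.data)
}
```

The price is visible in `read_buf`: the reader cannot see which concrete sink it is writing into, so each write goes through dynamic dispatch.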

There are some nifty aspects to this proposal. One of them is that the same trait can to some extent “paper over” vectorized writes, by distributing the data written across buffers in a chain.

But there are some downsides. Perhaps most important is that requiring virtual calls to write into the buffer could be a significant performance hazard. Thus far, measurements don’t suggest that, but it seems like a cost that can only be recovered by heroic compiler optimizations, and that’s the kind of thing we prefer to avoid.

Moreover, the ability to be generic over vectorized writes may not be as useful as you might think. Often, the caller wants to know whether the underlying Read supports vectorized writes, and it would operate quite differently in that case. Therefore, it doesn’t really hurt to have two read methods, one for normal and one for vectorized writes.

Variant: use a struct, instead of a trait

The variant that sfackler prefers is to replace the BufMut trait with a struct.² The API of this struct would be fairly similar to the trait above, except that it wouldn’t make much attempt to unify vectorized and non-vectorized writes.

Basically, we’d have a struct that encapsulates a “partially initialized slice of bytes”. You could create such a struct from a standard slice, in which case all things are initialized, or you can create it from a slice of “maybe initialized” bytes (e.g., &mut [MaybeUninit<u8>]). There can also be convenience methods to create a BufMut that refers to the uninitialized tail of bytes from a Vec (i.e., pointing into the vector’s internal buffer).

The safe methods of the BufMut API would permit

writing to the buffer, which will track the bytes that were initialized;

getting access to a slice, but only one that is guaranteed to be initialized.

There would be unsafe methods for getting access to memory that may be uninitialized, or for asserting that you have initialized a big swath of bytes (e.g., by handing the buffer off to the kernel to get written to).

The buffer has state: it can track what has been initialized. This means that any given part of the buffer will get zeroed at most once. This ensures that fallback from the new read2 method to the old read method is reasonably efficient.
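Here is one way such a stateful buffer could be sketched. The name `PartialBuf`, its fields, and its methods are assumptions for illustration rather than the API sfackler proposed, and the `zeroed_bytes` counter exists only so the demo can show how much zeroing actually took place.

```rust
use std::mem::MaybeUninit;
use std::slice;

/// Hypothetical "partially initialized slice of bytes".
/// Invariant: filled <= initialized <= storage.len(), and
/// storage[..initialized] holds initialized bytes.
struct PartialBuf<'a> {
    storage: &'a mut [MaybeUninit<u8>],
    filled: usize,
    initialized: usize,
    zeroed_bytes: usize, // demo-only: counts how many bytes were ever zeroed
}

impl<'a> PartialBuf<'a> {
    fn uninit(storage: &'a mut [MaybeUninit<u8>]) -> Self {
        PartialBuf { storage, filled: 0, initialized: 0, zeroed_bytes: 0 }
    }

    /// Safe: only exposes bytes that have actually been written.
    fn filled(&self) -> &[u8] {
        // SAFETY: storage[..filled] is initialized by the invariant.
        unsafe { slice::from_raw_parts(self.storage.as_ptr().cast::<u8>(), self.filled) }
    }

    /// Safe: copies bytes in and records them as filled and initialized.
    fn append(&mut self, bytes: &[u8]) {
        let end = self.filled + bytes.len();
        assert!(end <= self.storage.len(), "buffer overflow");
        for (slot, byte) in self.storage[self.filled..end].iter_mut().zip(bytes) {
            slot.write(*byte);
        }
        self.filled = end;
        self.initialized = self.initialized.max(end);
    }

    /// Safe: zeroes only the never-initialized tail, then exposes the unfilled
    /// region as a plain `&mut [u8]` that an old-style `read` can accept.
    fn initialize_unfilled(&mut self) -> &mut [u8] {
        let start = self.initialized;
        self.zeroed_bytes += self.storage.len() - start;
        for slot in &mut self.storage[start..] {
            slot.write(0);
        }
        self.initialized = self.storage.len();
        let unfilled = &mut self.storage[self.filled..];
        // SAFETY: the whole of `storage` is initialized at this point.
        unsafe { slice::from_raw_parts_mut(unfilled.as_mut_ptr().cast::<u8>(), unfilled.len()) }
    }

    /// Safe: records that `n` more bytes of the unfilled region were written.
    fn advance(&mut self, n: usize) {
        assert!(self.filled + n <= self.initialized);
        self.filled += n;
    }
}

/// Two fallback rounds against the same buffer; the zeroing happens only once.
fn zeroing_happens_once() -> (usize, Vec<u8>) {
    let mut storage = [MaybeUninit::<u8>::uninit(); 8];
    let mut buf = PartialBuf::uninit(&mut storage);
    buf.append(b"ab");
    for byte in [b'c', b'd'] {
        let unfilled = buf.initialize_unfilled(); // what a defaulted read2 would do
        unfilled[0] = byte; // pretend an old-style `read` wrote one byte
        buf.advance(1);
    }
    (buf.zeroed_bytes, buf.filled().to_vec())
}
```

In the demo, the first fallback round zeroes the six bytes that had never been written, and the second round zeroes nothing, because the buffer remembers that its tail is already initialized.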

Sync vs async, how to proceed

So, given the above thoughts, how should we proceed with AsyncRead? sfackler felt that the question of how to handle uninitialized output buffers was basically “orthogonal” to the question of whether and when to add AsyncRead. In other words, sfackler felt that the AsyncRead and Read traits should mirror one another, which means that we could add AsyncRead now, and then add a solution for uninitialized memory later – or we could do the reverse order.

One minor question has to do with defaults. Currently the Read trait requires an implementation of read – any new method (read_uninit or whatever) will therefore have to have a default implementation that invokes read. But this is sort of the wrong incentive: we’d prefer if users implemented read_uninit, and implemented read in terms of the new method. We could conceivably reverse the defaults for the AsyncRead trait to this preferred style. Alternatively, sfackler noted that we could make both read and read_uninit have a default implementation, one implementing in terms of the other. In this case, users would have to implement one or the other (implementing neither would lead to an infinite loop, and we would likely want a lint for that case).

We also discussed what it would mean if tokio adopted its own AsyncRead trait that diverged from std. While not ideal, sfackler felt like it wouldn’t be that big a deal either way, since it ought to be possible to efficiently interconvert between the two. The main constraint is having some kind of stateful entity that can remember the amount of uninitialized data, thus preventing the inefficient fallover behavior.

Is the ability to use uninitialized memory even a problem?

We spent a bit of time at the end discussing how one could gain data on this problem. There are two things that would be nice to know.

First, how big is the performance impact from zeroing? Second, how ergonomic is the proposed API to use in practice?

Regarding the performance impact, I asked the same question on tokio-rs/tokio#1744, and I did get back some interesting results, [which I summarized in this hackmd at the time][tokio-hackmd]. In short, hyper’s benchmarks show a fairly sizable impact, with uninitialized data getting speedups³ of 1.3-1.5x. Other benchmarks though are much more mixed, showing either no difference or small differences on the order of 2%. Within the stdlib, we found about a [7% impact on microbenchmarks][#26950].

Still, sfackler raised another interesting data point (both on the thread and in our call). He was pointing out #23820, a PR which rewrote read_to_end in the stdlib. The older implementation was simple and obvious, but suffered from massive performance cliffs related to the need to zero buffers. The newer implementation is fast, but much more complex. Using one of the APIs described above would permit us to avoid this complexity.

Regarding ergonomics, as ever, that’s a tricky thing to judge. It’s hard to do better than prototyping as well as offering the API on nightly for a time, so that people can try it out and give feedback.

Having the API on nightly would also help us to make branches of frameworks like tokio and async-std so we can do bigger measurements.

Higher levels of interoperability

sfackler and I talked a bit about what the priorities should be beyond AsyncRead. One of the things we talked about is whether there is a need for higher-level traits or libraries that expose more custom information beyond “here is how to read data”. One example that has come up from time to time is the need to know, for example, the URL or other information associated with a request.

Another example might be the role of crates like http, which aims to define Rust types for things like HTTP header codes that are fairly standard. These would be useful types to share across all HTTP implementations and libraries, but will we be able to achieve that sort of sharing without offering the crate as part of the stdlib (or at least part of the Rust org)? I don’t think we had a definitive answer here.

Priorities beyond async read

We next discussed what other priorities the Rust org might have around Async I/O. For sfackler, the top items would be

better support for GATs and async fn in traits;

some kind of generator or syntactic support for streams;

improved diagnostics, particularly around send/sync.

Conclusion

sfackler and I focused quite heavily on the AsyncRead trait and how to manage uninitialized memory. I think that it would be fair to summarize the main points of our conversation as:

we should add AsyncRead to the stdlib and have it mirror Read;

in general, it makes sense for the synchronous and asynchronous versions of the traits to be analogous;

we should extend both traits with a method that takes a BufMut struct to manage uninitialized output buffers, as the other options all have a crippling downside;

we should extend both traits with a “do you support vectorized output?” callback as well;

beyond that, the Rust org should focus heavily on diagnostics for async/await, but streams and async fns in traits would be great too. =)

Comments?

There is a thread on the Rust users forum for this series.

Appendix: Vectorized reads and writes

There is one minor subthread that I’ve skipped over – vectorized reads and writes. I skipped it in the blog post because this problem is somewhat simpler. The standard read interface takes a single buffer to write the data into. But a vectorized interface takes a series of buffers – if there is more data than will fit in the first one, then the data will be written into the second one, and so on until we run out of data or buffers. Vectorized reads and writes can be much more efficient in some cases.

Unfortunately, not all readers support vectorized reads. For that reason, the “vectorized read” method has a fallback: by default, it just calls the normal read method using the first non-empty buffer in the list. This is theoretically equivalent, but obviously it could be a lot less efficient – imagine that I have supplied one buffer of size 1K and one buffer of size 16K. The default vectorized read method will just always use that single 1K buffer, which isn’t great – but still, not much to be done about it. Some readers just cannot support vectorized reads.
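The fallback is easy to observe with the synchronous `Read` trait in std, whose `read_vectored` method has exactly this kind of default. In the sketch below, `Ones` and `default_vectored_read` are hypothetical names; `Ones` implements only `read`, so it inherits the default vectorized method.

```rust
use std::io::{self, IoSliceMut, Read};

/// Reader that implements only `read`, so it gets the default `read_vectored`.
struct Ones;

impl Read for Ones {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Always has enough data to fill whatever buffer it is given.
        buf.fill(1);
        Ok(buf.len())
    }
}

/// Supplies a small and a large buffer and reports what one call filled in.
fn default_vectored_read() -> io::Result<(usize, [u8; 2], [u8; 4])> {
    let mut reader = Ones;
    let mut small = [0u8; 2];
    let mut large = [0u8; 4];
    let n = {
        let mut bufs = [IoSliceMut::new(&mut small), IoSliceMut::new(&mut large)];
        reader.read_vectored(&mut bufs)?
    };
    Ok((n, small, large))
}
```

Even though the reader could have filled both buffers, a single call touches only the first one: the count returned is 2 and the larger buffer is left untouched.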

The problem here then is that it would be nice if there were some way to detect when a reader supports vectorized reads. This would allow the caller to choose between a “vectorized” call path, where it tries to supply many buffers, or a single-buffer call path, where it just allocates a big buffer.

Apparently hyper will do this today, but using a heuristic: if a call to the vectorized read method returns just enough data to fit in the first buffer, hyper guesses that in fact vectorized reads are not supported, and switches dynamically to the “one big buffer” strategy. (Neat.)

There is perhaps a second, more ergonomic issue: since the vectorized read method has a default implementation, it is easy to forget to implement it, even if you would have been able to do so.

In any case, this problem is relatively easy to solve: we basically need to add a new method like

fn supports_vectorized_reads(&self) -> bool
to the trait.

The matter of deciding whether or not to supply a default is a bit trickier. If you don’t supply a default, then everybody has to implement it, even if they just want the default behavior. But if you do, people who wished to implement the method may forget to do so – this is particularly unfortunate for readers that are wrapping another reader, which is a pretty common case.
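As a sketch of both the query method and the wrapper pitfall, consider the following. Only the `supports_vectorized_reads` signature comes from the text; the trait name `VectoredRead`, the choice of a `false` default, the `Strategy` enum, and the reader types are all assumptions made for illustration.

```rust
use std::io;

/// Hypothetical trait carrying the proposed query method, defaulted to `false`.
trait VectoredRead {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;

    fn supports_vectorized_reads(&self) -> bool {
        false
    }
}

#[derive(Debug, PartialEq)]
enum Strategy {
    ManyBuffers,
    OneBigBuffer,
}

/// The caller picks a call path up front instead of guessing heuristically.
fn choose_strategy(reader: &dyn VectoredRead) -> Strategy {
    if reader.supports_vectorized_reads() {
        Strategy::ManyBuffers
    } else {
        Strategy::OneBigBuffer
    }
}

/// A reader that really does support vectorized reads.
struct Scatter;

impl VectoredRead for Scatter {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Ok(0)
    }

    fn supports_vectorized_reads(&self) -> bool {
        true
    }
}

/// A wrapper that forgot to forward the query: it silently reports `false`.
struct Forgetful<R>(R);

impl<R: VectoredRead> VectoredRead for Forgetful<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }
}

/// A wrapper that forwards the query to the reader it wraps.
struct Forwarding<R>(R);

impl<R: VectoredRead> VectoredRead for Forwarding<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.read(buf)
    }

    fn supports_vectorized_reads(&self) -> bool {
        self.0.supports_vectorized_reads()
    }
}
```

With the default in place, `Forgetful(Scatter)` compiles without complaint and steers callers onto the single-buffer path, while `Forwarding(Scatter)` preserves the inner reader's answer. Without a default, the forgetful wrapper would not compile, at the cost of every impl having to spell the method out.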

Footnotes

Most folks say “object-safe” here, but I’m trying to shift our terminology to talk more about the dyn keyword. ↩︎

Carl Lerche proposed something similar on the tokio thread here. ↩︎

I am defining a “speedup” here as the ratio U/Z, where U and Z are the throughput with uninitialized and zeroed buffers, respectively. ↩︎

Async Interview #4: Florian Gilcher

2020-01-13T00:00:00+00:00

Hello! For the latest async interview, I spoke with Florian Gilcher (skade). Florian is involved in the async-std project, but he’s also one of the founders of Ferrous Systems, a Rust consulting firm that also does a lot of trainings. In that capacity, he’s been teaching people to use async Rust now since Rust’s 1.0 release.

Video

You can watch the video on YouTube. I’ve also embedded a copy here for your convenience:

One note: something about our setup meant that I was hearing a lot of echo. I think you can sometimes hear it in the recording, but not nearly as bad as it was live. So if I seem a bit spacey, or take very long pauses, you might know the reason why!

Prioritize stability, read/write traits

The first thing we discussed was some background on async-std itself. From there we started talking about what the Rust org ought to prioritize. Florian felt like having stable, uniform AsyncRead and AsyncWrite traits would be very helpful, as most applications are interested in having access to a “readable/writable thing” but don’t care that much where the bytes are coming from.

He felt that Stream, while useful, might be somewhat lower priority. The main reason was that while streams are useful, in many of the applications that he’s seen, there wasn’t as much need to be generic over a stream. Of course, having a standard Stream trait would still be of use, and would enable libraries as well, so it’s not an argument not to do it, just a question of how to prioritize.

Prioritize diagnostics perhaps even more

Although we’ve done a lot of work on it, there continues to be a need for improved error diagnostics. This kind of detailed ergonomics work may indeed be the highest priority overall.

(A quick plug for the async await working group, which has been steadily making progress here. Big thanks especially to tmandry, who has been running the triage meetings lately, but also (in no particular order) csmoe, davidtwco, gilescope, and centril – and perhaps others I’ve forgotten (sorry!).)

Levels of stability and the futures crate

We discussed the futures crate for a while. In particular, the question of whether we should be “stabilizing” traits by moving them into the standard library, or whether we can use the futures crate as a “semi-stable” home. There are obviously advantages either way.

On the one hand, there is no clearer signal for stability than adding something to libstd. On the other, the futures crate facade gives a “finer grained” ability to talk about semver.

One thing Florian noted is that the futures crate itself, although it has evolved a lot, has always maintained an internal consistency, which is good.

One other point Florian emphasized is that people really want to be building applications, so in some way the most important thing is to be moving towards stability, so they can avoid worrying about the sand shifting under their feet.

Deprioritize: Attached and detached streams

I asked Florian how much he thought it made sense to wait on things like streams until the GAT story is straightened out, so that we might have support for “attached” streams. He felt like it would be better to move forward with what we have now, and consider extensions later.

He noted an occasional tendency to try and create the single, perfect generic abstraction that can handle everything – while this can be quite elegant, it can sometimes also lead to really confusing interfaces that are complex to use.

Deprioritize: Special syntax for streams

I asked about syntactic support for generators, but Florian felt that it was too early to prioritize that, and that it would be better to focus first on the missing building blocks.

The importance of building and discovering patterns

Florian felt that we’re now in a stage where we’re transitioning a little. Until now, we’ve been tinkering about with the most primitive layers of the async ecosystem, such as the Future trait, async-await syntax, etc. As these primitives are stabilized, we’re going to see a lot more tinkering with the “next level up” of patterns. These might be questions like “how do I stop a stream?”, or “how do I construct my app?”. But it’s going to be hard for people to focus on these higher-level patterns (and in particular to find new, innovative solutions to them) until the primitives even out.

As these patterns evolve, they can be extracted into crates and types and shared and reused in many contexts. He gave the example of the async-task crate, which extracts out quite a bit of the complexity of managing allocation of an async task. This allows other runtimes to reuse that fairly standard logic. (Editor’s note: If you haven’t seen async-task, you should check it out, it’s quite cool.)

Odds and ends

We then discussed a few other features and how much to prioritize them.

Async fn in traits. Don’t rush it, the async-trait crate is a pretty reasonable practice and we can probably “get by” with that for quite a while.

Async closures. These can likely wait too, but they would be useful for stabilizing convenience combinators. On the other hand, those combinators often come attached to the base libraries you’re using.

Communicating over the futures crate

Returning to the futures crate, I raised the question of how best to help convey its design and stability requirements. I’ve noticed that there is a lot of confusion around its various parts and how they are meant to be used.

Florian felt like one thing that might be helpful is to break apart the facade pattern a bit, to help people see the smaller pieces. Currently the futures crate seems a bit like a monolithic entity. Maybe it would be useful to give more examples of what each part is and how it can be used in isolation, or the overall best practices.

Learning

Finally, I posed to Florian the question of how we can help people to learn async coding. I’m very keen on the way that Rust manages to avoid hard-coding a single runtime, but one of the challenges that comes with that is that it is hard to teach people how to use futures without referencing a runtime.

We didn’t solve this problem (shocker that), but we did talk some about the general value in having a system that doesn’t make all the choices for you. To be quite honest I remember that at this point I was getting very tired. I haven’t listened back to the video because I’m too afraid, but hopefully I at least used complete sentences. =)

One interesting idea that Florian raised is that it might be really useful for people to create a “learning runtime” that is oriented not at performance but at helping people to understand how futures, or their own applications, work. Such a runtime might gather a lot of data, do tracing, or otherwise help in visualizing. Reading back over my notes, I personally find that idea sort of intriguing, particularly if the focus is on helping people learn how futures work early on – i.e., I don’t think we’re anywhere close to the point where you could take a production app written against async-std and then have it use this debugging runtime. But I could imagine having a “learner’s runtime” that you start with initially, and then once you’ve got a feel for things, you can move over to more complex runtimes to get better performance.

Conclusion

I think the main points from the conversation were:

Diagnostics and documentation remain of very high importance. We shouldn’t get all dazzled with new, shiny things – we have to keep working on polish.

Beyond that, though, we should be working to stabilize building blocks so as to give more room for the ecosystem to flourish and develop. The AsyncRead/AsyncWrite traits, along with Stream, seem like plausible candidates.

We shouldn’t necessarily try to make those traits be as generic as possible, but instead focus on building something usable and simple that meets the most important needs right now.

We need to give time for people to develop patterns and best practices, and in particular to figure out how to “capture” them as APIs and crates. This isn’t really something that the Rust organization can do, it comes from the ecosystem, by library and application developers.

Comments?

There is a thread on the Rust users forum for this series.

Towards a Rust foundation

2020-01-09T00:00:00+00:00

In my #rust2020 blog post, I mentioned rather off-handedly that I think the time has come for us to talk about forming a Rust foundation. I wanted to come back to this topic and talk in more detail about what I think a Rust foundation might look like. And, since I don’t claim to have the final answer to that question by any means, I’d also like to talk about how I think we should have this conversation going forward.

Hat tip

Before going any further, I want to say that most of the ideas in this post arose from conversations with others. In particular, Florian Gilcher, Ryan Levick, Josh Triplett, Ashley Williams, and I have been chatting pretty regularly, and this blog post generally reflects the consensus that we seemed to be arriving at (though perhaps they will correct me). Thanks also to Yehuda Katz and Till Schneidereit for lots of detailed discussions.

Why do we want a Rust foundation?

I think this is in many ways the most important question for us to answer: what is it that we hope to achieve by creating a Rust foundation, anyway?

To me, there are two key goals:

to help clarify Rust’s status as an independent project, and thus encourage investment from more companies;

to alleviate some practical problems caused by Rust not having a “legal entity” nor a dedicated bank account.

There are also some anti-goals. Most notably:

the foundation should not replace the existing Rust teams as a decision-making apparatus.

The role of the foundation is to complement the teams and to help us in achieving our goals. It is not to set the goals themselves.

Start small and iterate

You’ll notice that I’ve outlined a fairly narrow role for the foundation. This is no accident. When designing a foundation, just as when designing many other things, I think it makes sense for us to move carefully, a step at a time.

We should try to address immediate problems that we are facing and then give those changes some time to “sink in”. We should also take time to experiment with some of the various funding possibilities that are out there (some of which I’ll discuss later on). Once we’ve had some more experience, it should be easier for us to see which next steps make sense.

Another reason to start small is being able to move more quickly. I’d like to see us set up a foundation like the one I am discussing as soon as this year.

Goal #1: Clarifying Rust’s status as an independent project

So let’s talk a bit more about the two goals that I set forth for a Rust foundation. The first was to clarify Rust’s status as an independent project. In some sense, this is nothing new. Mozilla has from the get-go attempted to create an independent governance structure and to solicit involvement from other companies, because we know this makes Rust a better language for everyone.

Unfortunately, there is sometimes a lingering perception that Mozilla “owns” Rust, which can discourage companies from getting invested, or create the perception that there is no need to support Rust since Mozilla is footing the bill. Establishing a foundation will make official what has been true in practice for a long time: that Rust is an independent project.

We have also heard a few times from companies, large and small, who would like to support Rust financially, but right now there is no clear way to do that. Creating a foundation creates a place where that support can be directed.

Mozilla wants to support Rust… just not alone

Now, establishing a Rust foundation doesn’t mean that Mozilla plans to step back. After all, Mozilla has a lot riding on Rust, and Rust is playing an increasingly important role in how Mozilla builds our products. What we really want is a scenario where other companies join Mozilla in supporting Rust, letting us do much more.

In truth, this has already started to happen. For example, just this year Microsoft started sponsoring Rust’s CI costs and Amazon is paying Rust’s S3 bills. In fact, we recently added a corporate sponsors page to the Rust web site to acknowledge the many companies that are starting to support Rust.

Goal #2: Alleviating some practical difficulties

While the Rust project has its own governance system, it has never had its own distinct legal entity. That role has always been played by Mozilla. For example, Mozilla owns the Rust trademarks, and Mozilla is the legal operator for services like crates.io. This means that Mozilla is (in turn) responsible for ensuring that DMCA requests against those services are properly managed and so forth. For a long time, this arrangement worked out quite well for Rust. Mozilla Legal, for example, provided excellent help in drafting Rust’s trademark agreements and coached us through how to handle DMCA takedown requests (which thankfully have arisen quite infrequently).

Lately, though, the Rust project has started to hit the limits of what Mozilla can reasonably support. One common example that arises is the need to have some entity that can legally sign contracts “for the Rust project”. For example, we wished recently to sign up for GitHub’s Token Scanning program, but we weren’t able to figure out who ought to sign the contract.

Is token scanning by itself a burning problem? No. We could probably work out a solution for it, and for other similar cases that have arisen, such as deciding who should sign Rust binaries. But it might be a sign that it is time for the Rust project to have its own legal entity.

Another practical difficulty: Rust has no bank account

Another example of a “practical difficulty” that we’ve encountered is that Rust has no bank account. This makes it harder for us to arrange for joint sponsorship and support of events and other programs that the Rust project would like to run. The most recent example is the Rust All Hands. Whereas in the past Mozilla has paid for the venue, catering, and much of the airfare by itself, this year we are trying to “share the load” and have multiple companies provide sponsorship. However, this requires a bank account to collect and pool funds. We have solved the problem for this year, but it would be easier if the Rust organization had a bank account of its own. I imagine we would also make use of a bank account to fund other sorts of programs, such as Increasing Rust’s Reach.

On paying people and contracting

One area where I think we should move slowly is on the topic of employing people and hiring contractors. As a practical matter, the foundation is probably going to want to employ some people. For example, I suspect we need an “operations manager” to help us keep the wheels turning (this is already a challenge for the core team, and it’s only going to get worse as the project grows). We may also want to do some limited amount of contracting for specific purposes (e.g., to pay for someone to run a program like Increasing Rust’s Reach, or to help do data crunching on the Rust survey).

The Rust foundation should not hire developers, at least to start

But I don’t think the Rust foundation should do anything like hiring full-time developers, at least not to start. I would also avoid trying to manage larger contracts to hack on rustc. There are a few reasons for this, but the biggest one is simply that it is expensive. Funding that amount of work will require a significant budget, which will require significant fund-raising.

Managing a large budget, as well as employees, will also require more superstructure. If we hire developers, who decides what they should work on? Who decides when it’s time to hire? Who decides when it’s time to fire?

This is a bit difficult: on the one hand, I think there is a strong need for more people to get paid for their work on Rust. On the other hand, I am not sure a foundation is the right institution to be paying them; even if it were, it seems clear that we don’t have enough experience to know how to answer the sorts of difficult questions that will arise as a result. Therefore, I think it makes sense to fall back on the approach to “start small and iterate” here. Let’s create a foundation with a limited scope and see what difference it makes before we make any further decisions.

Some other things the foundation wouldn’t do

I think there are a variety of other things that a hypothetical foundation should not do, at least not to start. For example, I think the foundation should not pay for local meetups nor sponsor Rust conferences. Why? Well, for one thing, it’ll be hard for us to come up with criteria on when to supply funds and when not to. For another, both meetups and conferences I think will do best if they can forge strong relationships with companies directly.

However, even if there are things that the Rust foundation wouldn’t fund or do directly, I think it makes a lot of sense to collect a list of the kinds of things it might do. If nothing else, we can try to offer suggestions for where to find funding or obtain support, or perhaps offer some lightweight “match-making” role.

We should strive to have many kinds of Rust sponsorship

Overall, I am nervous about a situation in which a Rust Foundation comes to have a kind of “monopoly” on supporting the Rust project or Rust-flavored events. I think it’d be great if we can encourage a wider variety of setups. First and foremost, I’d like to see more companies that use Rust hiring people whose job description is to support the Rust project itself (at least in part). But I think it could also work to create “trade associations” where multiple companies pool funds to hire Rust developers. If nothing else, it is worth experimenting with these sorts of setups to help gain experience.

We should create a “project group” to figure this out

Creating a foundation is a complex task. In this blog post, I’ve just tried to sketch the “high-level view” of what responsibilities I think a foundation might take on and why (and which I think we should avoid or defer). But I left out a lot of interesting details: for example, should the Foundation be a 501(c)(3) (a non-profit, in other words) or not? Should we join an umbrella organization and – if so – which one?

The traditional way that the Rust project makes decisions, of course, is through RFCs, and I think that a decision to create a foundation should be no exception. In fact, I do plan to open an RFC about creating a foundation soon. However, I don’t expect this RFC to try to spell out all the details of how a foundation would work. Rather, I plan to propose creating a project group with the goal of answering those questions.

In short, I think the core team should select some set of folks who will explore the best design for a foundation. Along the way, we’ll keep the community updated with the latest ideas and take feedback, and – in the end – we’ll submit an RFC (or perhaps a series of RFCs) with a final plan for the core team to approve.

Feedback

OK, well, enough about what I think. I’m very curious (and a bit scared, I won’t lie) to hear what people think about the contents of this post. To collect feedback, I’ve created a thread on internals. As ever, I’ll read all the responses, and I’ll do my best to respond where I can. Thanks!

Async Interview #3: Carl Lerche

2019-12-23T00:00:00+00:00

Hello! For the latest async interview, I spoke with Carl Lerche (carllerche). Among many other crates¹, Carl is perhaps best known as one of the key authors behind tokio and mio. These two crates are quite widely used through the async ecosystem. Carl and I spoke on December 3rd.

Video

You can watch the video on YouTube. I’ve also embedded a copy here for your convenience:

Background: the mio crate

One of the first things we talked about was a kind of overview of the layers of the “tokio-based async stack”.

We started with the mio crate. mio is meant to be the “lightest possible” non-blocking I/O layer for Rust. It basically exposes the “epoll” interface that is widely used on linux. Windows uses a fundamentally different model, so in that case there is a kind of compatibility layer, and hence the performance isn’t quite as good, but it’s still pretty decent. mio “does the best it can”, as Carl put it.

The tokio crate builds on mio. It wraps the epoll interface and exposes it via the Future abstraction from std. It also offers other things that people commonly need, such as timers.

Finally, building atop tokio you find tower, which exposes a “request-response” abstraction called Service. tower is similar to things like finagle or rack. This is then used by libraries like hyper and tonic, which implement protocol servers (http for hyper, gRPC for tonic). These protocol servers internally use the tower abstractions as well, so you can tell hyper to execute any Service.

One challenge is that it is not yet clear how to adapt tower’s Service trait to std::Future. It would really benefit from support of async functions in traits, in particular, which is difficult for a lot of reasons. The current plan is to adopt Pin and to require boxing and dyn Future values if you wish to use the async fn sugar. (Which seems like a good starting place, -ed.)

Returning to the overall async stack, atop protocol servers like hyper, you find web frameworks, such as warp – and (finally) within those you have middleware and the actual applications.

How independent are these various layers?

I was curious to understand how “interconnected” these various crates were. After all, while tokio is widely used, there are a number of different executors out there, both targeting different platforms (e.g., Fuchsia) as well as different trade-offs (e.g., async-std). I’m really interested to get a better understanding of what we can do to help the various layers described above operate independently, so that people can mix-and-match.

To that end, I asked Carl what it would take to use (say) Warp on Fuchsia. The answer was that “in principle” the point of Tower is to create just such a decoupling, but in practice it might not be so easy.

One of the big changes in the upcoming tokio 0.2 crate, in fact, has been to combine and merge a lot of tokio into one crate. Previously, the components were more decoupled, but people rarely took advantage of that. Therefore, tokio 0.2 combined a lot of components and made the experience of using them together more streamlined, although it is still possible to use components in a more “standalone” fashion.

In general, to make tokio work, you need some form of “driver thread”. Typically this is done by spawning a background thread, but you can skip that and run the driver yourself.

The original tokio design had a static global that contained this driver information, but this had a number of issues in practice: the driver sometimes started unexpectedly, it could be hard to configure, and it didn’t work great for embedded environments. Therefore, the new system has switched to an explicit launch, though there are procedural macros #[tokio::main] or #[tokio::test] that provide sugar if you prefer.

What should we do next? Stabilize stream.

Next we discussed which concrete actions made sense next. Carl felt that an obvious next step would be to stabilize the Stream trait. As you may recall, cramertj and I discussed the Stream trait in quite a lot of detail – in short, the existing design for Stream is “detached”, meaning that it must yield up ownership of each item it produces, much like an Iterator. It would be nice to figure out the story for “attached” streams that can re-use internal buffers, which are a very common use case, especially before we create syntactic sugar.

Carl’s motivation for a stable Stream is in part that he would like to issue a stable tokio release, ideally in Q3 of 2020, and Stream would be a part of that. If there is no Stream trait in the standard library, that complicates things.

One thing we didn’t discuss, but which I personally would like to understand better, is what sort of libraries and infrastructure might benefit from a stabilized Stream. For example, “data libraries” like hyper mostly want a trait like AsyncRead to be stabilized.

About async read

Next we discussed the AsyncRead trait a little, though not in great depth. If you’ve been following the latest discussion, you’ll have seen that there is a tokio proposal to modify the AsyncRead traits used within tokio. There are two main goals here:

to make it safe to pass an uninitialized memory buffer to read

to better support vectorizing writes

However, there isn’t a clear consensus on the thread (at least not the last time I checked) on the best alternative design. The PR itself proposes changing from a &mut [u8] buffer (for writing the output into) to a dyn trait value, but there are other options. Carl for example proposed using a concrete wrapper struct instead, and adding methods to test for vectorization support (since outer layers may wish to adopt different strategies based on whether vectorization works).

One of the arguments in favor of the current design from the futures crate is that it maps very cleanly to the Read trait from the stdlib (cramertj advanced this argument, for example). Carl felt that the trait is already quite different (e.g., notably, it uses Pin) and that these more “analogous” interfaces could be made with defaulted helper methods instead. Further, he felt that async applications tend to prize performance more highly than synchronous ones, so the importance and overhead of uninitialized memory may be higher.

About async destructors and other utilities

We discussed async destructors. Carl felt that they would be a valuable thing to add for sure. He felt that the “general design” proposed by boats would be reasonable, although he thought there might be a bit of a duplication issue if you have both an async drop and a sync drop. A possible solution would be to have a prepare_to_drop async method that gives the object time to do async preparations, and then to always run the sync drop afterwards.

We also discussed a few utility methods like select!, and Carl mentioned that a lot of the ecosystem is currently using things like proc-macro-hack to support these, so perhaps a good thing to focus on would be improving procedural macro support so that it can handle expression level macros more cleanly.

Comments?

There is a thread on the Rust users forum for this series.

Footnotes

I think loom (https://crates.io/crates/loom) looks particularly cool. ↩︎

Async Interview #2: cramertj, part 3

2019-12-11T00:00:00+00:00

This blog post is continuing my conversation with cramertj. This will be the last post.

In the first post, I covered what we said about Fuchsia, interoperability, and the organization of the futures crate.

In the second post, I covered cramertj’s take on the Stream, AsyncRead, and AsyncWrite traits. We also discussed the idea of attached streams and the importance of GATs for modeling those.

In this post, we’ll talk about async closures.

You can watch the video on YouTube.

Async closures

Next we discussed async closures. You may have noticed that while you can write an async fn:

async fn foo() { ... }
you cannot write the analogous syntax with closures:

let foo = async || ...;
Such a thing would often be useful, especially when writing the combinators on futures and streams that one might expect (like map and so forth). Unfortunately, async closures turn out to be somewhat more complex than their synchronous counterparts – to get the behavior we probably want, it turns out that they too would require some support for generic associated types (GAT), because they sort of want to be “attached closures”.

An example using iterator

To see the problem, let’s start with a synchronous example using Iterator. Here is some code that uses for_each to process each datum in the iterator and – along the way – it increments a counter found on the stack:

fn process_count(iterator: impl Iterator<Item = Datum>) {
    let mut counter = 0;
    iterator.for_each(|datum| {
        counter += 1;
        process_datum(datum);
    });
    use(counter);
}
So what is actually happening when we compile this? The closure expression actually compiles to a struct that implements the FnMut trait. This struct will hold a reference to the counter variable. So in practice the desugared form might look like:

fn process_count(iterator: impl Iterator<Item = Datum>) {
    let mut counter = 0;
    iterator.for_each(ClosureStruct { counter: &mut counter });
    use(counter);
}
The line counter += 1 is compiled then to the equivalent of *self.counter += 1:

impl FnMut<Datum> for ClosureStruct {
    type Output = ();

    fn call(&mut self, datum: Datum) {
        *self.counter += 1;
        process_datum(datum);
    }
}
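The snippets above are pseudocode (you can't implement the Fn traits by hand on stable Rust), so here is a version that compiles today, written as a sketch. It assumes u32 in place of Datum, an empty process_datum, and a plain call method rather than a real FnMut impl. The two functions should behave identically, one using the closure and one using the hand-written struct.

```rust
// Stand-in for the real per-item work (hypothetical).
fn process_datum(_datum: u32) {}

// Hand-written equivalent of the closure's captured state.
struct ClosureStruct<'c> {
    counter: &'c mut u32,
}

impl ClosureStruct<'_> {
    // Plays the role of `FnMut::call_mut`.
    fn call(&mut self, datum: u32) {
        *self.counter += 1;
        process_datum(datum);
    }
}

// The version you would normally write.
pub fn count_with_closure(data: Vec<u32>) -> u32 {
    let mut counter = 0;
    data.into_iter().for_each(|datum| {
        counter += 1;
        process_datum(datum);
    });
    counter
}

// Roughly what the compiler generates.
pub fn count_with_struct(data: Vec<u32>) -> u32 {
    let mut counter = 0;
    let mut closure = ClosureStruct { counter: &mut counter };
    data.into_iter().for_each(|datum| closure.call(datum));
    counter
}
```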
Converting the example to use stream

So what would happen if we were using an async closure? The ClosureStruct would still be constructed, presumably, in the same way. But the closure trait no longer directly performs the action. Instead, when you call the closure, you get back a future that performs the action; that future is going to need to have a reference to counter too, and that comes from self. So that means that the type of this future is going to have to hold a reference to self, which means that the impl would have to look something like this:

impl AsyncFnMut<Datum> for ClosureStruct {
    type Future<'s> = ClosureFuture<'s>;

    fn call<'s>(&'s mut self, datum: Datum) -> ClosureFuture<'s> {
        ClosureFuture::new(&mut self.counter, datum)
    }
}
As you can see, modeling this properly requires GATs. In fact, async closures are basically “attached” closures which return a value that borrows from self. (And, just as attached iterators might sometimes be useful, I’ve found that sometimes I have need of an attached closure in synchronous code as well.)
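To give a feel for the synchronous flavor of an “attached closure”, here is a sketch using GATs as they exist on stable Rust. The AttachedFnMut trait, the PendingStep type, and run_attached are all names I invented for illustration, and u32 stands in for Datum. The key point is that the value returned by call borrows from self, just as the future would in the async case.

```rust
// Hypothetical trait: like FnMut, but the output may borrow from `self`.
trait AttachedFnMut<Arg> {
    type Output<'s>
    where
        Self: 's;

    fn call<'s>(&'s mut self, arg: Arg) -> Self::Output<'s>;
}

struct ClosureStruct<'c> {
    counter: &'c mut u32,
}

// Stand-in for the future: it holds a reborrow of the counter and does
// the actual work only when `run` is called.
struct PendingStep<'s> {
    counter: &'s mut u32,
    datum: u32,
}

impl PendingStep<'_> {
    fn run(self) -> u32 {
        *self.counter += 1;
        self.datum * 2
    }
}

impl<'c> AttachedFnMut<u32> for ClosureStruct<'c> {
    type Output<'s> = PendingStep<'s> where Self: 's;

    fn call<'s>(&'s mut self, datum: u32) -> Self::Output<'s> {
        // Reborrow the counter for as long as `self` is borrowed.
        PendingStep { counter: &mut *self.counter, datum }
    }
}

// Returns (number of calls, sum of doubled items).
pub fn run_attached(data: &[u32]) -> (u32, u32) {
    let mut counter = 0;
    let mut sum = 0;
    {
        let mut closure = ClosureStruct { counter: &mut counter };
        for &datum in data {
            // Each PendingStep must be finished before the next call.
            sum += closure.call(datum).run();
        }
    }
    (counter, sum)
}
```

Because each PendingStep mutably borrows the closure, you cannot hold two of them at once. That is exactly the restriction an attached design imposes.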

What you can write today

The only thing you can write today is a closure that returns an async block:

let foo = || async move { ... };
But this has rather different semantics. In this case, for example, we would be copying the current value of counter into the future, and not holding a reference to the counter (and if you try to hold a reference, you’ll get an error).
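Here is a small sketch of that copying behavior using only the standard library. The block_on helper is a deliberately naive executor I wrote for the example (it just polls in a loop), and NoopWake and snapshot_demo are likewise made-up names. I have also put move on the outer closure, which is my addition, so that the closure owns its own copy and the borrow checker stays out of the picture.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough for futures that never wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Naive executor: poll until ready. Fine for this demo, not for real I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

// Returns (value computed by the future, final value of the local counter).
pub fn snapshot_demo() -> (u32, u32) {
    let mut counter = 0;
    // The future gets its own copy of `counter`, not a reference to it.
    let foo = move || async move { counter + 1 };
    let fut = foo();
    // Changing the local afterwards does not affect the future's copy.
    counter += 10;
    (block_on(fut), counter)
}
```

The future still sees the old value of 0 and so produces 1, while the local counter has moved on to 10. Nothing the future does can be observed through counter, which is the opposite of the for_each example.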

Conclusion

This wraps up my 3-part summary of my conversation with cramertj. Looking back, I think the main take-aways are:

We could stabilize AsyncRead and AsyncWrite and resolve the questions of uninitialized memory (and presumably vectorized writes, which we didn’t discuss explicitly) in some analogous way with the sync version of the traits.

Stream and async closures would benefit from being “attached”, which requires us to make progress on GATs.

In particular, we would not want to add generator syntax until we have a convincing and complete story.

Similarly, until the async closures story is more complete, we probably want to hold off on adding too many utility functions in the stdlib. Auxiliary libraries like futures allow us to introduce such functions and later make changes.

The select! macro is cool and everybody should read the async book chapter to learn why. =)

Comments?

There is a thread on the Rust users forum for this series.

Async Interview #2: cramertj, part 2

2019-12-10T00:00:00+00:00

This blog post is continuing my conversation with cramertj.

In the first post, I covered what we said about Fuchsia, interoperability, and the organization of the futures crate. This post covers cramertj’s take on the Stream trait as well as the AsyncRead and AsyncWrite traits.

You can watch the video on YouTube.

The need for “streaming” streams and iterators

Next, cramertj and I turned to discussing some of the specific traits from the futures crate. One of the traits that we covered was Stream. The Stream trait is basically the asynchronous version of the Iterator trait. In (slightly) simplified form, it is as follows:

pub trait Stream {
    type Item;

    fn poll_next(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>>;
}
The main concern that cramertj raised with this trait is that, like Iterator, it always gives ownership of each item back to its caller. This falls out from its structure, which requires the implementor to specify an Item type, and that Item type cannot borrow from the self reference given to poll_next.

In practice, many stream/iterator implementations would be more efficient if they could have some internal storage that they re-use over and over. For example, they might have an internal buffer, and when poll_next is called, they would give back (upon completion) a reference to that buffer. The idea would be that once poll_next is called again, they would start to re-use the same buffer.

Terminology note: Detached/attached instead of “streaming”

The idea of having an iterator that re-uses an internal buffer has come up before. In that context, it was often called a “streaming iterator”, which I guess means that we want a “streaming stream”. This is pretty clearly a suboptimal term.

In the call, I mentioned the term “detached”, which I sometimes use to refer to the current Iterator/Stream. The idea is that the Item that gets returned by Stream is “detached” from self, which means that it can be stored and moved about independently from self. In contrast, in a “streaming stream” design, the return value may be borrowed from self, and hence is “attached” – it can only be used so long as the self reference remains live.

I’m not really sure that I care for this terminology. I sort of prefer “owned/borrowing iterator”, where the idea is that in an owned iterator, the iterator transfers ownership of the data to you, and in a borrowing iterator, the data you get back is borrowed from the iterator itself. However, I fear that these terms will be confused for the distinction between vec.into_iter() and vec.iter(). Both of these methods exist today, of course, and they both yield “detached” iterators; however, the former takes ownership of vec and the latter borrows from it. The key point is that vec.iter() is giving back borrowed values, but they are borrowed from the vector, not from the iterator.

(One final note is that this same concept of ‘attached’ vs ‘detached’ will come up when discussing async closures again, which further argues for using terminology other than “streaming”.)

The natural way to write “attached” streams is with GATs

In any case, the challenge here is that, without generic associated types, there is no nice way to write the “attached” (or “streaming”) version of Stream. You really want to be able to write a definition like:

trait AttachedStream {
    type Item<'s> where Self: 's;
    //       ^^^^ ^^^^^^^^^^^^^^ (we likely need an annotation like this
    //       |                    too, for reasons I'll cover in an appendix)
    //       note the `'s` here!

    fn poll_next<'s>(
        self: Pin<&'s mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item<'s>>>;
    //                          ^^^^
    // `'s` is the lifetime of the `self` reference.
    // Thus, the `Item` that gets returned may
    // borrow from `self`.
}
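
As a rough illustration of the same shape, here is a synchronous sketch that compiles on a Rust version with generic associated types. The `AttachedIterator` trait, the `ChunkReader` type, and `total_len` are hypothetical names invented for this example; the point is only that `next` can hand back a slice of a buffer that the iterator re-uses on every call.

```rust
/// A synchronous "attached" iterator: items may borrow from `self`.
trait AttachedIterator {
    type Item<'s>
    where
        Self: 's;

    fn next<'s>(&'s mut self) -> Option<Self::Item<'s>>;
}

/// Hands out chunks of `data` through a single, re-used internal buffer.
struct ChunkReader<'d> {
    data: &'d [u8],
    buf: Vec<u8>,
    chunk_len: usize,
}

impl<'d> ChunkReader<'d> {
    fn new(data: &'d [u8], chunk_len: usize) -> Self {
        // Treat a chunk length of 0 as 1 so `next` always makes progress.
        ChunkReader { data, buf: Vec::new(), chunk_len: chunk_len.max(1) }
    }
}

impl<'d> AttachedIterator for ChunkReader<'d> {
    // The item borrows from the iterator itself, not from `data`.
    type Item<'s> = &'s [u8]
    where
        Self: 's;

    fn next<'s>(&'s mut self) -> Option<&'s [u8]> {
        if self.data.is_empty() {
            return None;
        }
        let n = self.chunk_len.min(self.data.len());
        let (head, tail) = self.data.split_at(n);
        self.data = tail;
        // Re-use the same allocation for every chunk.
        self.buf.clear();
        self.buf.extend_from_slice(head);
        Some(self.buf.as_slice())
    }
}

/// Consumes the reader one chunk at a time and sums the chunk lengths.
fn total_len(mut reader: ChunkReader<'_>) -> usize {
    let mut total = 0;
    while let Some(chunk) = reader.next() {
        total += chunk.len();
    }
    total
}
```

Iterating one chunk at a time works as usual; what the borrow checker would reject is keeping one chunk alive across the next call to `next`, since that chunk still borrows `reader`.
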
“Attached” streams would be used differently than the current ones

There are real implications to adopting an “attached” definition of stream or iterator. In short, particularly in a generic context where you don’t know all the types involved, you wouldn’t be able to get back two values from an “attached” stream/iterator at the same time, whereas you can with the “detached” streams and iterators we have today.

For the most common use case of iterating over each element in turn, this doesn’t matter, but it’s easy to define functions that rely on it. Let me illustrate with Iterator since it’s easier. Today, this code compiles:

/// Returns the next two elements in the iterator.
/// Panics if the iterator doesn't have at least two elements.
fn first_two<I>(mut iterator: I) -> (I::Item, I::Item)
where
    I: Iterator,
{
    let first_item = iterator.next().unwrap();
    let second_item = iterator.next().unwrap();
    (first_item, second_item)
}
However, given an “attached” iterator design, the first call to next would “borrow” iterator, and hence you could not call next() again so long as first_item is still in use.

Concerns with blocking the streaming trait

If I may editorialize a bit, in re-watching the video, I had a few thoughts:

First, I don’t want to block a stable Stream on generic associated types. I do think we should prioritize shipping GATs and I would expect to see progress next year, but I think we need some form of Stream sooner than that.

Second, the existing Stream is very analogous to Iterator. Moreover, there has been a long-standing desire for attached iterators. Therefore, it seems reasonable to move forward with stabilizing stream today, and then expect to revisit both traits in a consistent fashion once generic associated types are available.

“Detached” streams can be converted into “attached” ones

Let’s assume then that we choose to stabilize Stream as it exists today. Then we may want to add an AttachedStream later on. In principle, it should then be possible to add a “conversion” impl such that anything which implements Stream also implements AttachedStream:

impl<S> AttachedStream for S
where
    S: Stream,
{
    type Item<'s> = S::Item where Self: 's;

    fn poll_next<'s>(
        self: Pin<&'s mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item<'s>>> {
        Stream::poll_next(self, cx)
    }
}
~~The idea here is that the AttachedStream trait gives the possibility of returning values that borrow from self, but it doesn’t require that the returned values do so.~~

As far as I know, the scheme above would work. In general, interconversion traits like these sometimes are tricky around coherence, but you can typically get away with “one” such impl. It would mean that types can implement AttachedStream if they need to re-use an internal buffer and Stream if they do not, which is a reasonable design. (I’d be curious to know if there are fatal flaws here.)
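
The same blanket-impl idea can be tried out today in synchronous form. The sketch below is an illustration under assumptions, not the proposed std design: `AttachedIterator`, `next_attached`, and `count_items` are hypothetical names, and the method is called `next_attached` only so it does not collide with `Iterator::next` on the same type.

```rust
/// Synchronous stand-in for `AttachedStream`.
trait AttachedIterator {
    type Item<'s>
    where
        Self: 's;

    fn next_attached<'s>(&'s mut self) -> Option<Self::Item<'s>>;
}

/// The single "conversion" impl: every detached iterator is also an
/// attached one whose items simply never borrow from `self`.
impl<I: Iterator> AttachedIterator for I {
    type Item<'s> = <I as Iterator>::Item
    where
        Self: 's;

    fn next_attached<'s>(&'s mut self) -> Option<<I as Iterator>::Item> {
        Iterator::next(self)
    }
}

/// A consumer written against the more general, attached trait.
fn count_items<A: AttachedIterator>(mut iter: A) -> usize {
    let mut n = 0;
    while iter.next_attached().is_some() {
        n += 1;
    }
    n
}
```

Because of the blanket impl, `count_items` accepts any ordinary iterator, for example `vec![10, 20, 30].into_iter()`.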

Things that consume streams would typically want an attached stream

One downside of adding Stream now and AttachedStream later is that functions which consume streams would at first all be written to work with Stream, when in fact they probably would later want to be rewritten to take AttachedStream. In other words, given some code like:

fn consume_stream(s: impl Stream) { .. }
it is quite likely that the signature should be impl AttachedStream. The idea is that you only want to “consume” a stream if you need to have two items from the stream existing at the same time. Otherwise, if you’re just going to iterate over the stream one element at a time, attached stream is the more general variant.

Syntactic support for streams and iterators

cramertj and I didn’t talk too much about it directly, but there has been discussion about adding two forms of syntactic support for streams/iterators. The first would be to extend the for loop so that it works over streams as well, as boats covers in their blog post on for await loops.

The second would be to add a new form of “generator”, as found in many other languages. The idea would be to introduce a new form of function, written gen fn in synchronous code and async gen fn in asynchronous code, that can contain yield statements. Calling such a function would yield an impl Iterator or impl Stream, for sync and async respectively.

One point that cramertj made is that we should hold off on adding syntactic support until we have some form of “attached” stream trait – or at least until we have a fairly clear idea what its design will be. The idea is that we would likely want (e.g.) a for-await sugar to operate over both detached and attached streams, and similarly we may want gen fn to generate attached streams, or to have the ability to do so.

In fact, generators give a nice way to get an intuitive understanding of the difference between “attached” and “detached” streams: given attached streams, a generator yield could return references to local variables. But if we only have detached streams, as today, then you could only yield things that you own or things that were borrowed from your caller (i.e., references derived from other references that you got as parameters). In other words, yield would have the same limitations as return does today.

The AsyncRead and AsyncWrite traits

Next cramertj and I discussed the AsyncRead and AsyncWrite traits. As currently defined in futures-io, these traits are the “async analog” of the corresponding synchronous traits Read and Write. For example, somewhat simplified, AsyncRead looks like:

trait AsyncRead {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<Result<usize, Error>>;
}
These have been a topic of recent discussion because the tokio crate has been considering adopting a new definition of AsyncRead/AsyncWrite. The primary concern has to do with the buf: &mut [u8] parameter. This parameter supplies a buffer where the data should be written. Therefore, typically, it doesn’t really matter what the contents of that buffer are when the function is called, as they will simply be overwritten with the data generated. However, it is of course possible to write an AsyncRead implementation that does read from that buffer. This means that you can’t supply a buffer of uninitialized bytes, since reading from uninitialized memory is undefined behavior and can cause LLVM to perform mis-optimizations.
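
To see why the signature alone cannot rule this out, here is a small sketch using the synchronous `std::io::Read` trait as a stand-in (the same `buf: &mut [u8]` shape). `PeekingReader` and `demo` are contrived, hypothetical names for this example.

```rust
use std::io::{self, Read};

/// A contrived reader that inspects the caller's buffer before filling it.
struct PeekingReader {
    /// Sum of the bytes found in the buffer on the most recent `read` call.
    seen: u64,
}

impl Read for PeekingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Nothing in the signature forbids reading `buf` first. Had the
        // caller somehow passed uninitialized memory, this would be UB.
        self.seen = buf.iter().map(|&b| u64::from(b)).sum();
        buf.fill(0x2a);
        Ok(buf.len())
    }
}

/// Reads into an initialized buffer and reports what the reader observed.
fn demo() -> (usize, u64, [u8; 3]) {
    let mut reader = PeekingReader { seen: 0 };
    let mut buf = [1u8, 2, 3];
    let n = reader.read(&mut buf).expect("PeekingReader never fails");
    (n, reader.seen, buf)
}
```

The reader is entirely safe code, which is exactly the problem: a caller has no way to know whether a given implementation looks at the buffer before writing to it.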

cramertj and I didn’t go too far into discussing the alternatives here so I won’t either (this blog post is already long enough). I hope to dig into it in future interviews. The main point that cramertj made is that the same issue affects the standard Read trait and that it would make sense to address the design in the same way in both traits. (Indeed, there have been attempts to modify the trait to deal with this, e.g., the initializer method, which also has an analogue in the AsyncRead trait.)

cramertj’s preferred solution to the problem would be to have some “freeze” function that can take uninitialized memory and “bless” it such that it can be accessed without UB, though it would contain “random” bytes (this is basically what people intuitively expected from uninitialized memory, though in fact it is not an accurate model). Unfortunately, figuring out how to implement such a thing in LLVM is a pretty open question, and there are also other problems (such as Linux’s MADV_FREE feature) that may make this infeasible.

EDIT: An earlier draft of this post mistakenly said that we would want some “poison” function, but really the proper term is “freeze”. In other words, some function that – given a bit of uninitialized data – makes it initialized but with some arbitrary value.

Conclusion

This was part two of my conversation with cramertj. Stay tuned for part 3, where we talk about async closures!

Comments?

There is a thread on the Rust users forum for this series.

Async Interview #2: cramertj

2019-12-09T00:00:00+00:00

For the second async interview, I spoke with Taylor Cramer – or cramertj, as I’ll refer to him. cramertj is a member of the compiler and lang teams and was – until recently – working on Fuchsia at Google. He’s been a key player in Rust’s Async I/O design and in the discussions around it. He was also responsible for a lot of the implementation work to make async fn a reality.

Video

You can watch the video on YouTube.

Spreading this out over a few posts

So, cramertj and I had a long conversation, with a lot of technical detail. I was trying to get this blog post finished by last Friday but it took a lot of time! I decided it’s probably too much material to post in one go, so I’m going to break up the blog post into a few pieces (I’ll post the whole video though).

The blog post is mostly covering what cramertj had to say, though in some cases I’m also adding in various bits of background information or my own editorialization. I’m trying to mark it when I do that. =)

On Fuchsia

We kicked off the discussion talking a bit about the particulars of the Fuchsia project. Fuchsia is a microkernel architecture and thus a lot of the services one finds in a typical kernel are implemented as independent Fuchsia processes. These processes are implemented in Rust and use Async I/O.

Fuchsia uses its own unique executor and runtime

Because Fuchsia is not a unix system, its kernel primitives, like sockets and events, work quite differently. Fuchsia therefore uses its own custom executor and runtime, rather than building on a separate stack like tokio or async-std.

Fuchsia benefits from interoperability

Even though Fuchsia uses its own executor, it is able to reuse a lot of libraries from the ecosystem. For example, Fuchsia uses Hyper for its HTTP parsing. This is possible because Hyper offers a generic interface based on traits that Fuchsia can implement.

In general, cramertj feels that the best way to achieve interop is to offer trait-based interfaces. There are other projects, for example, that offer feature flags (e.g., to enable “tokio” compatibility etc), but this tends to be a suboptimal way of managing things, at least for libraries.

For one thing, offering features means that support for systems like Fuchsia must be “upstreamed” into the project, whereas offering traits means that downstream systems can implement the traits themselves.
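
A minimal sketch of the trait-based approach, with everything simplified to synchronous code. The `Transport` trait, `send_request`, and `InMemoryTransport` are hypothetical names made up for this illustration; they are not Hyper's actual interface.

```rust
/// Library side: the only thing the library asks of its environment.
trait Transport {
    /// Sends `bytes` and returns how many were accepted.
    fn send(&mut self, bytes: &[u8]) -> usize;
}

/// Library code written against the trait, not against a specific runtime.
fn send_request<T: Transport>(transport: &mut T, path: &str) -> usize {
    let request = format!("GET {} HTTP/1.1\r\n\r\n", path);
    transport.send(request.as_bytes())
}

/// Downstream side: a platform the library has never heard of can still
/// plug in, without anything being upstreamed.
struct InMemoryTransport {
    sent: Vec<u8>,
}

impl Transport for InMemoryTransport {
    fn send(&mut self, bytes: &[u8]) -> usize {
        self.sent.extend_from_slice(bytes);
        bytes.len()
    }
}
```

With a feature-flag design, by contrast, the library itself would need a dedicated code path for every supported platform.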

In addition, using features to choose between alternatives can cause problems across larger dependency graphs. Features are always meant to be “additive” – i.e., you can add any number of them – but features that choose between backends tend to be exclusive – i.e., you must choose at most one. This is a problem because cargo likes to take the union of all features across a dependency graph, and so having exclusive features can lead to miscompilations when things are combined.

Background topic: futures crate

cramertj and I next talked some about the futures crate. Before going much further into that, I want to give a bit of background on the futures crate itself and how it’s set up.

The futures crate has been very carefully set up to permit its components to evolve with minimal breakage and incompatibility across the ecosystem. However, my experience from talking to people has been that there is a lot of confusion as to how the futures crate is set up and why, and just how much they can rely on things not to change. So I want to spend a bit of time documenting my understanding of the setup and its motivations.

Historically, the futures crate has served as a kind of experimental “proving ground” for various aspects of the future design, including the Future trait itself (which is now in std).

Currently, the futures crate is at version 0.3, and it offers a number of different categories of functionality:

key traits like Stream, AsyncRead, and AsyncWrite

key primitives like “async-aware” locks

traditional locks

“extension” traits like FutureExt, StreamExt, AsyncReadExt, and so forth

these traits offer convenient combinator methods like map that are not part of the corresponding base traits

useful macros like join! or select!

useful bits of code such as a ThreadPool for “off-loading” heavy computations

In fact, the first item in that list (“key traits”) is quite distinct from the remaining items. In particular, if you are writing a library, those key traits are things that you might well like to have in your public interface. For example, if you are writing a parser that operates on a stream of data, it might take an AsyncRead as its data source (just as a synchronous parser would take a Read).

The remaining items on the list fall generally into the category of “implementation details”. They ought to be “private” dependencies of your crate. For example, you may use methods from FutureExt internally, but you don’t require other crates to use them; similarly you may join! futures internally, but that is not something that would show up in a function signature.

the futures crate is really a facade

One thing you’ll notice if you look more closely at the futures crate is that it is in fact composed of a number of smaller crates. The futures crate itself simply ’re-exports’ items from these other crates:

futures-core – defines the Stream trait (also the Future trait, but that is an alias for std)

futures-io – defines the AsyncRead and AsyncWrite traits

futures-util – defines extension traits like FutureExt

…

The goal of this facade is to permit things to evolve without forcing semver-incompatible changes. For example, if the AsyncRead trait should evolve, we might be forced to issue a new major version of futures-io and thus ultimately issue a new futures release (say, 0.4). However, the version number of futures-core remains unchanged. This means that if your crate only depends on the Stream trait, it will be interoperable across both futures 0.3 and 0.4, since both of those versions are in fact re-exporting the same Stream trait (from futures-core, whose version has not changed).
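
The mechanism can be sketched inside a single file by letting modules stand in for crates. Everything here is an assumption for illustration: the module names mimic the crates, the `Stream` trait is reduced to a synchronous `next_item` method, and `Countdown` and `drain` are hypothetical.

```rust
/// Stand-in for the `futures-core` crate, whose version does not change.
mod futures_core {
    pub trait Stream {
        type Item;
        fn next_item(&mut self) -> Option<Self::Item>;
    }
}

/// Stand-in for `futures` 0.3: a facade that only re-exports.
mod futures_0_3 {
    pub use super::futures_core::Stream;
}

/// Stand-in for a later `futures` 0.4 built on the same `futures-core`.
mod futures_0_4 {
    pub use super::futures_core::Stream;
}

/// A crate that depends on "futures 0.3" implements the trait...
struct Countdown(u32);

impl futures_0_3::Stream for Countdown {
    type Item = u32;

    fn next_item(&mut self) -> Option<u32> {
        if self.0 == 0 {
            return None;
        }
        let value = self.0;
        self.0 -= 1;
        Some(value)
    }
}

/// ...and a crate that depends on "futures 0.4" can still consume it,
/// because both facades name the very same trait.
fn drain<S: futures_0_4::Stream>(mut stream: S) -> Vec<S::Item> {
    let mut items = Vec::new();
    while let Some(item) = stream.next_item() {
        items.push(item);
    }
    items
}
```

If the two facades each defined their own `Stream` instead of re-exporting one, `drain(Countdown(3))` would fail to compile.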

In fact, if you are a library crate, it probably behooves you to avoid depending on the futures crate at all, and instead to declare finer-grained dependencies; this will make it very clear when you need to declare a new semver release yourself.

cramertj: the best place for “standard” traits is in std

So, background aside, let me return to my discussion with cramertj. One of the points that cramertj made is that the only “truly standard” place for a trait to live is libstd. Therefore, cramertj feels like the next logical step for traits like Stream or AsyncRead is to start moving them into the standard library. Once they are there, this would be the strongest possible signal that people can rely on them not to change.

we can move to libstd without breakage

You may be wondering what it would mean if we moved one of the traits from the futures crate into libstd – would things in the ecosystem that are currently using futures have to update? The answer is no, not necessarily.

Presuming that some trait from futures is moved wholesale into libstd (i.e., without any modification), then it is possible for us to simply issue a new minor version of the futures crate (and the appropriate subcrate). This new minor version would change from defining a trait (say, Stream) to re-exporting the version from std.

As a concrete example, if we moved AsyncRead from futures-io to libstd (as cramertj advocates for later on), then we would issue a 0.3.2 release of futures-io. This release would replace trait AsyncRead with a pub use that re-exports AsyncRead from std. Now, any crate in the ecosystem that previously depended on 0.3.1 can be transparently upgraded to 0.3.2 (it’s a semver-compatible change, after all)¹, and suddenly all references to AsyncRead would be referencing the version from std. (This is, in fact, exactly what happened with the Future trait; in 0.3.1, it is simply re-exported from libcore.)

on the extension traits

One of the interesting points that cramertj made, though not until later in the interview, is that when it comes to futures there are a number of “smaller design decisions” one might make when it comes to combinators. For example, consider a function like Stream::filter. As defined in the futures crate, this function takes a closure that returns a “future to a boolean”, so the closure has a signature like:

impl FnMut(&Item) -> impl Future<Output = bool>
This is effectively an async closure; I’ll summarize what cramertj had to say about async closures in one of the upcoming blog posts. However, you might plausibly wish instead to have a signature that just returns a boolean directly, like so:

impl FnMut(&Item) -> bool
For this reason, cramertj felt that it may make sense not to add these sorts of utilities into the standard library (or at least not yet), and instead to leave those extension traits in “user space”. Maybe when we have more experience we’ll be able to say what the best definition would be for the standard library.
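
To compare the two predicate shapes side by side, here is a sketch that filters a plain `Vec` rather than a `Stream`. `filter_sync`, `filter_async`, `block_on`, and `NoopWake` are all hypothetical helpers written for this example; only the two closure signatures come from the discussion above.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Predicate returns a boolean directly.
fn filter_sync<T>(items: Vec<T>, mut pred: impl FnMut(&T) -> bool) -> Vec<T> {
    items.into_iter().filter(|item| pred(item)).collect()
}

/// Predicate returns a "future to a boolean", which must be awaited.
async fn filter_async<T, Fut>(items: Vec<T>, mut pred: impl FnMut(&T) -> Fut) -> Vec<T>
where
    Fut: Future<Output = bool>,
{
    let mut kept = Vec::new();
    for item in items {
        if pred(&item).await {
            kept.push(item);
        }
    }
    kept
}

/// Waker that does nothing; enough for futures that never actually wait.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor used only to drive the example to completion.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

A caller with a cheap, synchronous test has to wrap it (for example with `std::future::ready`) to use the async form, which is the kind of small ergonomic trade-off that makes the "right" standard signature hard to pick.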

(If I may editorialize, I do think it’s important that we add these sorts of helper methods to std eventually; even if there’s no single best choice, we should make some decisions, because it’ll be quite annoying to force everything to pull in utility crates for simple things.)

upcoming posts

OK, that wraps it up for the first post. I have two more coming. In the next post, we’ll discuss the design of the Stream, AsyncRead, and AsyncWrite traits, and what we might want to change there. In the final post, we’ll discuss async closures.

Comments?

There is a thread on the Rust users forum for this series.

This change relies on the fact that cargo will generally not compile two distinct minor versions of a crate; so all crates that depend on 0.3.1 would be compiled against 0.3.2. ↩︎

AiC: Improving the pre-RFC process

2019-12-03T00:00:00+00:00

I want to write about an idea that Josh Triplett and I have been iterating on to revamp the lang team RFC process. I have written a draft of an RFC already, but this blog post aims to introduce the idea and some of the motivations. The key idea of the RFC is to formalize the steps leading up to an RFC, as well as to capture the lang team operations around project groups. The hope is that, if this process works well, it can apply to teams beyond the lang team as well.

TL;DR

In a nutshell, the proposal is this:

When you see a problem you think we should try to solve, you open an issue on the lang-team repository. This is called a proposal issue.

In the proposal issue, you include a description of the problem and a link to a thread on internals where the problem is being discussed.

You might have a sketch of a solution in mind, but that’s not required. Even if there is a possible solution, we would always expect to start by looking at different alternatives as well, to make sure we’re headed in the right overall direction.

Proposals would not be expected to use the full RFC template. The idea is to be lightweight.

It is important that discussion does not take place on the issue.

The lang-team periodically reviews those issues. If someone on the team likes the idea, we will create a “project group” around the design. Each project group has a repository, a lang team liaison, and one or more shepherds. The repository houses the draft RFC and potentially other documents, such as design notes.

The project group will continue working on the idea until it is complete, meaning that the design has been implemented and become stable. For smaller ideas, this could go quite quickly; for larger ideas, it might take longer. (Of course, we may also decide to cancel the idea at some point.)

Note that I did not say anything yet about the main RFCs repository. The idea is that, when a project group feels the design is ready, they will open the RFC on the main repository. At that point, the RFC represents a design that has already undergone a fair amount of iteration. Moreover, the shepherds and lang team liaison should ensure that the lang team is getting regular updates on the progress. Therefore, the RFC process itself should go significantly faster.

One of my hopes is that a lighter and faster RFC process will also mean that we can use RFCs for smaller decisions, and not just the final design. For example, I think it’d be useful to write an RFC documenting a major choice in the direction, and then have follow-up RFCs that work out some of the details. (This is somewhat similar to the eRFC idea that we used for coroutines but never formalized.)

Goal: Increased transparency

One of the goals here is to increase our transparency – specifically, I want it to be easier to follow along with the design that is taking place. I also want you to be able to control how “deeply” you follow along. I think that this proposal helps in two ways:

First, the lang team will have an active list of project groups which represent the work that is being monitored by the team. This alone gives a good overview of what we’re doing.

Each project group should also have a repository documenting their meetings and communication channels. A well-run group will also have links to blog posts, discussion articles, or other documents. So if you want to dig deeper into a design, or get involved, you can do it that way.

Finally, the RFC repo itself is a good way to get an overview of “major” decisions that are taking place. Monitoring this repo would be a good way for you to raise a red flag if you see something that has been overlooked. However, since RFCs will often be the result of a lot more iteration and design, it wouldn’t be the best place for smaller bikeshedding.

One thing that is worth emphasizing is that RFCs in this model will not be ’early stage’ ideas. They will be the result of a lot more iteration. This will frequently mean that we are not looking for “general feedback” so much as specific, useful criticism.

Goal: Clearer on-ramp

Another goal is to make a clearer “on-ramp” for getting the lang team’s attention. Right now, there isn’t really a good way to “propose” an idea and bring it to the lang team’s attention. You can create a thread on internals, but that is not guaranteed to be seen. You can open an RFC, but if the idea is half-baked, you will get pushback, and if it’s highly developed, you might find that you’ve been going down the wrong road.

I feel like this procedure offers a clearer “invitation” for bringing an idea forward. I think it’s important though that we couple it with lang-team procedures that help us ensure that we stay on top of meeting proposals.

Putting this idea into practice

One question that arises with this idea is what to do with the existing RFC PRs on the repository. If we adopt this proposal, my plan is to encourage authors to migrate those PRs to proposal issues instead. After some period of time, we will close the RFC PRs (except for those that have an active project group behind them). We could also consider an automatic migration, but I think it might be useful to be a bit more selective.

Lang team practice and serendipity

Although it is not part of the RFC proper, I think that it is also important for the lang-team to restructure how we operate a bit. I would like us to use project groups to expose and declare the things we are actively working on, and I think we should devote most of our time to those things. But I also think we should reserve some time for ideas that are not on that list.

I have two goals here. First, sometimes there are just smaller ideas that will never be a kind of “top priority” but are nonetheless nice to have. A prime example might be a syntactic addition like if let.

Second, sometimes there are nice ideas like RFC 2580. These ideas have been well developed, and it might be good to move forward, but it’s hard to find the time to discuss them. As a result, the RFCs hang about in a sort of “limbo”, where it’s totally unclear whether anything will ever happen.

I also expect that as part of this we will impose certain limits. For example, I don’t think any one person should be shepherding or serving as a liaison for more than a few things at a time – possibly just one if the proposal is big enough. That will put an overall cap on how much the lang team can try to do at one time, but that seems like a good limit. The Shepherding 3.0 blog post had more notes on this topic.

I am hoping that if we have a clearer meeting queue, we can put ideas like that on the list, and at least there will be a clear time to discuss and decide definitively whether we can indeed move forward or not.

Conclusion

In general, you can think of the RFC process as a kind of “funnel” with a number of stages. We’ve traditionally thought of the process as beginning at the point where an RFC with a complete design is opened, but of course the design process really begins much earlier. Moreover, a single bit of design can often span multiple RFCs, at least for complex features, and in our current process we often have changes to the design that occur during the implementation stage as well. This can sometimes be difficult to keep up with, even for lang-team members.

This post describes a revision to the process that aims to “intercept” proposals at an earlier stage. It also proposes to create “project groups” for design work and a dedicated repository that can house documents. For smaller designs, these groups and repositories might be small and simple. But for larger designs, they offer a space to include a lot more in the way of design notes and other documents.

Assuming we adopt this process, one of the things I think we should be working on is developing “best practices” around these repositories. For example, I think that for every non-trivial design decision, we should be creating a summary document that describes the pros/cons and the eventual decision (along with, potentially, comments from people who disagreed with that decision outlining their reasoning).

We are already starting to experiment with this sort of process. The FFI-unwind project group, for example, is pursuing an attempt to decide on the rules regarding unwinding across FFI boundaries. And, as I noted in my post announcing the Async Interviews, I’d like to see us collecting design notes for new traits and features that we propose in the async space.

As always, I’d love to hear your feedback. Please leave any comments in the internals thread devoted to the “Adventures in Consensus” series.

Thanks

I just wanted to add a “Thank you!” to Josh Triplett, who co-developed a lot of these specific ideas with me, but also Withoutboats, Yoshua Wuyts, Centril, Steve Klabnik, and the many others that have been discussing variants of this proposal with me over time.

Rust 2020

2019-12-02T00:00:00+00:00

Technically speaking, it’s past the deadline for #rust2020 posts, but I’m running late this year, and I’m going to post something anyway. In this post, I am focusing on what I see as the “largest scale” issues, and not on technical initiatives. If I have time, I will try to post a follow-up talking about some of the key technical initiatives that I think we should focus on as well.

TL;DR

We should do an edition, and we should plan for it now

The time is ripe to talk about encouraging investment from companies

A foundation is perhaps part of the solution, but not the whole solution; we should encourage active participation from stakeholders

Organizational improvements can also encourage investment

Organizationally, we’ve done a lot in 2019, and we can do more in 2020

We should think on longer timescales

One of the questions we asked this year was whether we should plan for a Rust 2021 edition. I feel pretty strongly that the answer is yes. There are a few reasons for this.

I think one of the biggest parts for me is that I think it is very healthy for us to be planning on a 3-year timescale. The fact is that many of our projects these days take years to bring to completion. It is good for us to talk about roadmaps, but it is also good for us to look to a slightly longer horizon.

I don’t necessarily think this kind of “long range” planning should be about specific goals and features, but more about areas of focus. Moreover, I think we should be careful to control our ambitions – I think for example that, in thinking about 2019, we outlined a number of features that are far more realistic on a multi-year timescale.

Plan for edition changes early

Editions also, of course, give us the option to make changes we couldn’t otherwise make. For those who aren’t familiar, editions let us make “backwards incompatible” changes to Rust – but in a way that keeps old code working. These changes might be something as small as adding a keyword, or as large as the module reform we made in Rust 2018. The beauty of editions is that, since they are opt-in at a crate granularity, we are able to keep supporting older crates seamlessly. This means we can improve the language gradually without forcing the entire ecosystem to upgrade in a coordinated fashion.

In Rust 2018, we made a number of these sorts of “migrations”:

We modified use statements to introduce the use crate::foo notation

We transitioned to the dyn Trait syntax

We introduced a few keywords

Crucially, we also provided tooling to automate these migrations. This is what made a change like the use statement reform possible at all, since that change affected almost every crate ever written.
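For readers who have not seen these migrations side by side, here is a minimal sketch of what the three Rust 2018 changes listed above look like in code. The module and item names (`shapes`, `Shape`, `Square`, `total_area`) are made up for illustration and do not come from any real crate.

```rust
// Hypothetical module, used only to have something to import.
mod shapes {
    pub trait Shape {
        fn area(&self) -> f64;
    }

    pub struct Square(pub f64);

    impl Shape for Square {
        fn area(&self) -> f64 {
            self.0 * self.0
        }
    }
}

// Paths: `crate::` names the root of the current crate explicitly.
use crate::shapes::{Shape, Square};

// Trait objects: written `Box<dyn Shape>` rather than the older `Box<Shape>`.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Keywords: `async` is reserved in Rust 2018, but the raw identifier
// syntax `r#async` still lets code use it as an ordinary name.
fn r#async() -> &'static str {
    "still callable"
}
```

The raw-identifier escape hatch is part of what keeps a new keyword from stranding older crates: a 2018 crate can still call an item named `async` that was defined in a 2015 crate.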

I don’t expect us to do anything as dramatic as changing use statements in Rust 2021, but I am confident we are going to want to make a few backwards incompatible changes. I don’t know exactly what they will be yet, but I do know that now is the time to start planning them – we want to be front-loading that kind of work so that we can have time to work on the documentation, migration tooling, and other things that we will need.

Yosh’s #rust2020 post covered this topic quite well, I think. In the timeline section, he breaks down the time available, concluding with:

All together that leaves us with about 12 months total to plan and prepare the next edition release, starting January 2020. This should be enough time to successfully plan and draft a new edition, with some slack to work with.

We are seeing increased investment from many companies

2019 marked a real turning point when it comes to companies using and supporting Rust. I remember the time when everybody I met who used Rust was a hobbyist. Then we started to see startups and smaller companies experimenting with Rust, looking for a way to boost their productivity when writing low-level systems code. And now we have major companies like Microsoft, Amazon, Facebook, and Google adopting Rust for major projects. Somewhat unexpectedly, to me anyway, Rust has become the language of choice for a lot of blockchain companies.

This increasing adoption has also begun to translate to increased investment in Rust itself. Microsoft and Amazon, for example, are now sponsoring the majority of Rust’s CI costs. A big part of the async-await development was done by developers on Google’s Fuchsia team. And so forth.

But we need more people paid to work on Rust

Nonetheless, for Rust to really thrive, we need to see more people paid for their work on Rust teams. As Erin put it in her #rust2020 post,

When 1.0 launched there was ~30 members of The Rust Programming Language, now in 2019 we have ~200 members. This is nearly 7x the amount of members, yet we’ve changed very little to be able to adapt to this growth. No where is this more evident than out of the now 200 or so members, the number that are paid for their time on Rust is still in the single digits, and this doesn’t look to change any time soon.

One thing I’ve observed time and time again is that bigger, complex projects really require dedicated leadership and organization – and this often takes a vast amount of time. I talked some about this in my post on “More than coders”. The plain fact is that this kind of time is often unavailable on a volunteer basis.

Shifting the focus from adoption to investment

In years past, when we thought about companies and Rust, a big part of the focus was on encouraging adoption. But I think at this point it’s time for us to start encouraging investment. There are a lot of companies using Rust now, and the time is ripe to ask ourselves how we can help those companies to help Rust.

But when we ask those questions, I want us to be careful in our thinking. I don’t think there’s a single, simple answer for how to increase investment in Rust. In fact, I don’t even think there’s a single answer to what investment is. I would love, of course, to see more people hired to work 100% on Rust. But there are so many other ways to invest in Rust:

Sponsoring Rust conferences, meetups, or other social events

Sponsoring employees to attend the Rust All Hands

Encouraging employees to spend work time working on Rust as a sort of “20% project”

Building ecosystem libraries that everyone can use

Sponsoring Rust’s CI or other infrastructure

Sponsoring the Rust All Hands, Increasing Rust’s Reach, or other Rust org initiatives directly

Using contracting or grants to support the maintainers of the Rust project or key figures in the Rust ecosystem

Many are stronger than one

Even though Rust was started by Mozilla, Mozilla never wanted to “own” Rust. We’ve always wanted Rust to have its own identity and to be supported by many companies and groups, big and small. Fundamentally, this is because having many stakeholders makes for a better, more robust language. Part of what accounts for Rust’s success is that we’ve attracted a diverse set of contributors, who were able to push us to improve the design in any number of ways.

I mention this because, as we talk about money, I think we will also need to address the question of whether to form a Rust foundation. I am increasingly thinking that this is a good idea. I think that having a central legal entity that represents Rust could solve some challenges for us, for example, and I also think having a central bank account could help for “group funding” of infrastructure, events like the Rust All Hands, or programs like Increasing Rust’s Reach. But I don’t expect this Rust foundation to directly “solve the problem” of paying people to work on Rust, nor do I think it should. I would expect it rather to be one piece of a larger puzzle.

Whatever we wind up with, I think it’s important to encourage companies that use Rust to employ key figures that actively participate in the Rust organization (whether that be full- or part-time). We don’t want a setup where the Rust organization is the foundation, supported financially by others. We want a setup where the Rust organization is directly composed, as much as possible, of its users and stakeholders, all working together.

Improving our organization can lead to increased investment

I think a lot of the changes we need may be more organizational than specifically to do with money. For example, I enjoyed reading Parity’s #rust2020 post, and I was particularly struck by this paragraph (emphasis mine):

For many of the issues raised above, we are also happy to jump in and help out–and on other issues as well. We are a Rust company after all—we believe in the language, its ecosystem and the community, and want to be a valuable participant in it. … However, it’s often unclear whether the work is worthwhile. To a business, it is hard to argue that one might spend a month or two working on a new feature without any assurance that the approach taken would be accepted.

This is something that’s been on my mind quite a bit lately as well. If you are a company or organization that would like to help make changes to Rust, how do you go about it? I’ve been getting this question more and more lately as I go and talk to companies. Sometimes, the question pertains to a single feature, like custom test frameworks, or custom allocators. Other times, the question is about a broader initiative – think of the Sealed Rust pitch that Ferrous Systems posted some time back.

In principle, the RFC process is supposed to help serve these needs, but I don’t think in practice it’s working very well. I think though that we can tweak and improve our system to overcome some of those shortcomings. What’s more, if we do, I think that same system won’t be specific to companies. After all, if you’re a volunteer contributor interested in pushing on a specific feature, you face the same problem. (This, for example, is precisely the problem that shepherding is taking aim at.)

2019 saw a lot of progress in organizational matters

Organizationally, I’m quite proud of all the work that we did during 2019, even though I still think we’ve got a long way to go. Just looking at the compiler team, for example, we really refined the concept of working groups, we clarified the concept of compiler team contributors, and we introduced other innovations like the weekly design meetings. These meetings have meant not only that we have a lot more communication as a team, they’re also great for people looking to eavesdrop and learn more about how the compiler works. The lang team is publishing its minutes and (frequently) recordings of our meetings, which are also open for anyone to attend; the core team is also publishing recordings on a best-effort basis. The infrastructure team has made great strides in documenting their procedures on forge, as have other teams. At the project level, we introduced the leadership sync meeting, the Inside Rust blog, and we’ve been trying to get a governance-focused WG off the ground.

We can do even more in 2020

Over the next year, I’d like to see more progress on how the project operates. Some of the goals I think we should be working towards:

active mentorship to help leads formulate roadmaps and plans, as was discussed in the recent compiler-team retrospective

documenting all of our governance procedures and other details on forge

more transparency about our priorities, and a clearer process for requesting that something be made into a priority

improving follow-through and avoiding unbounded queues; when we start designing a feature, we should see that effort through to the end before we pick up new things

extending our governance to cover “cross-cutting projects”, which draw on the expertise from many teams; right now, for example, the “handoff” between the lang team doing the design for a feature and the compiler team starting to implement is informal and often just fails to happen

Conclusion

As I wrote in the beginning, I’ve not tried to address technical initiatives in this post. I have thoughts on those too, and I think I will try to do some follow-ups there. In summary, for Rust 2020, I believe:

We should do a 2021 Edition, and we should start the planning now.

We’ve succeeded at encouraging Rust adoption, and we should start thinking about encouraging investment.

Improving how the Rust organization operates continues to be a pressing need, and will help everything, including investment.

Strategy	Downside
Box it (with default allocator)	requires allocation, not especially efficient
Box it with cache on caller side	requires allocation
Inline it into the iterator	adds space to `AI`, doesn’t work for `&self`
Box it with cache on callee side	requires allocation, adds space to `AI`, doesn’t work for `&self`
Allocate maximal space	can’t necessarily use that across crates, requires extensive interprocedural analysis
Allocate some space, fallback	uses allocator, requires extensive interprocedural analysis or else random guesswork
Alloca on the caller side	incompatible with async Rust
Side-stack	requires cooperation from runtime and allocation

When	Who
Wed at 07:00 ET	Ryan
Wed at 15:00 ET	Niko
Fri at 07:00 ET	Ryan
Fri at 14:00 ET	Niko

When	Who	Topic
Thu at 07:00 ET	Ryan	The need for Async Traits
Fri at 07:00 ET	Ryan	Challenges from cancellation

When	Who	Topic
Tue at 14:30 ET	Niko	wrapping C++ async APIs in Rust futures and other tales of interop
Wed at 10:00 ET	Niko	picking an HTTP library and similar stories
Wed at 15:00 ET	Niko	structured concurrency and parallel data processing
Thu at 07:00 ET	Ryan	debugging and getting insights into running services
Fri at 07:00 ET	Ryan	lack of a polished common implementations of basic async helpers
Fri at 14:30 ET	Niko	bridging sync and async
