Dyn async traits, part 7: a design emerges?

7 January 2022

Hi all! Welcome to 2022! Towards the end of last year, Tyler Mandry and I were doing a lot of iteration around supporting “dyn async trait” – i.e., making traits that use async fn dyn safe – and we’re starting to feel pretty good about our design. This is the start of several blog posts talking about where we’re at. In this first post, I’m going to reiterate our goals and give a high-level outline of the design. The next few posts will dive more into the details and the next steps.

The goal: traits with async fn that work “just like normal”

It’s been a while since my last post about dyn trait, so let’s start by reviewing the overall goal: our mission is to allow async fn to be used in traits just like fn. For example, we would like to have an async version of the Iterator trait that looks roughly like this[1]:

trait AsyncIterator {
    type Item;
    
    async fn next(&mut self) -> Option<Self::Item>;
}

You should be able to use this AsyncIterator trait in all the ways you would use any other trait. Naturally, static dispatch and impl Trait should work:

async fn sum_static(mut v: impl AsyncIterator<Item = u32>) -> u32 {
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}

But dynamic dispatch should work too:

async fn sum_dyn(v: &mut dyn AsyncIterator<Item = u32>) -> u32 {
    //               ^^^
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}
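
For comparison, here is a minimal sketch of how sum_dyn can be approximated by hand on stable Rust, where the trait method returns a boxed, type-erased future instead of being an async fn. The names BoxedAsyncIterator, BoxFuture, Countdown, and block_on are invented for this illustration, and block_on is a toy busy-polling executor that is only adequate for futures that never actually wait.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical alias: a heap-allocated future whose concrete type is erased.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

// Hand-written, dyn-compatible stand-in for the AsyncIterator trait.
trait BoxedAsyncIterator {
    type Item;
    fn next(&mut self) -> BoxFuture<'_, Option<Self::Item>>;
}

// Example iterator: Countdown(3) yields 3, 2, 1 and then None.
struct Countdown(u32);

impl BoxedAsyncIterator for Countdown {
    type Item = u32;
    fn next(&mut self) -> BoxFuture<'_, Option<u32>> {
        // Every call allocates a Box for the returned future.
        Box::pin(async move {
            if self.0 == 0 {
                None
            } else {
                self.0 -= 1;
                Some(self.0 + 1)
            }
        })
    }
}

// Same body as sum_dyn above, but written against the boxed trait.
async fn sum_dyn(v: &mut dyn BoxedAsyncIterator<Item = u32>) -> u32 {
    let mut result = 0;
    while let Some(i) = v.next().await {
        result += i;
    }
    result
}

// Toy executor (assumption: the future never needs a real wake-up).
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Calling block_on(sum_dyn(&mut Countdown(3))) drives the loop to completion. The cost of this hand-written version is the explicit Box in the trait signature, which every implementation has to repeat.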

Another goal: leave dyn cleaner than we found it

While we started out with the goal of improving async fn, we’ve also had a general interest in making dyn Trait more usable overall. There are a few reasons for this. To start, async fn is itself just sugar for a function that returns impl Trait, so making async fn in traits work is equivalent to making RPITIT (“return position impl trait in traits”) work. But also, the existing dyn Trait design contains a number of limitations that can be pretty frustrating, and so we would like a design that improves as many of those as possible. Currently, our plan lifts the following limitations, so that traits which make use of these features would still be compatible with dyn:

  • Return position impl Trait, so long as Trait is dyn safe.
    • e.g., fn get_widgets(&self) -> impl Iterator<Item = Widget>
    • As discussed above, this means that async fn works, since it desugars to a function returning impl Future.
  • Argument position impl Trait, so long as Trait is dyn safe.
    • e.g., fn process_widgets(&mut self, items: impl Iterator<Item = Widget>).
  • By-value self methods.
    • e.g., given fn process(self) and d: Box<dyn Trait>, able to call d.process()
    • eventually this would be extended to other “box-like” smart pointers

If you put all three of those together, it represents a pretty large expansion to what dyn safety feels like in Rust. Here is an example trait that would now be dyn safe that uses all of these things together in a natural way:

trait Widget {
    async fn augment(&mut self, component: impl Into<WidgetComponent>);
    fn components(&self) -> impl Iterator<Item = WidgetComponent>;
    async fn transmit(self, factory: impl Factory);
}
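
To see what is being lifted, here is a rough sketch of how a trait with these three features has to be written today to remain usable as dyn Widget. It is synchronous to keep it short, and the struct Basic, the tuple struct WidgetComponent(u32), the method bodies, and demo are all assumptions made for illustration. Each limitation shows up as an explicit pointer type in the signature.

```rust
#[derive(Debug, Clone, PartialEq)]
struct WidgetComponent(u32);

trait Widget {
    // Instead of `items: impl Iterator<Item = WidgetComponent>`.
    fn process_widgets(&mut self, items: &mut dyn Iterator<Item = WidgetComponent>);
    // Instead of `-> impl Iterator<Item = WidgetComponent>`.
    fn components(&self) -> Box<dyn Iterator<Item = WidgetComponent> + '_>;
    // Instead of by-value `self`.
    fn transmit(self: Box<Self>) -> usize;
}

// Hypothetical implementation used only to exercise the trait.
struct Basic {
    parts: Vec<WidgetComponent>,
}

impl Widget for Basic {
    fn process_widgets(&mut self, items: &mut dyn Iterator<Item = WidgetComponent>) {
        self.parts.extend(items);
    }

    fn components(&self) -> Box<dyn Iterator<Item = WidgetComponent> + '_> {
        Box::new(self.parts.iter().cloned())
    }

    fn transmit(self: Box<Self>) -> usize {
        // Consumes the widget and reports how many components it held.
        self.parts.len()
    }
}

fn demo() -> (Vec<WidgetComponent>, usize) {
    let mut w: Box<dyn Widget> = Box::new(Basic { parts: Vec::new() });
    // The caller has to build the `&mut dyn Iterator` itself.
    w.process_widgets(&mut (1..=3).map(WidgetComponent));
    let seen: Vec<_> = w.components().collect();
    // Works only because the receiver is spelled `self: Box<Self>`.
    let sent = w.transmit();
    (seen, sent)
}
```

Under the plan described here, the trait could keep the impl Trait and self spellings and still be used through dyn.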

Final goal: works without an allocator, too, though you have to work a bit harder

The most straightforward way to support RPITIT is to allocate a Box to store the return value. Most of the time, this is just fine. But there are use-cases where it’s not a good choice:

  • In a kernel, where you would like to use a custom allocator.
  • In a tight loop, where the performance cost of an allocation is too high.
  • Extreme embedded cases, where you have no allocator at all.

Therefore, we would like to ensure that it is possible to use a trait that uses async fns or RPITIT without requiring an allocator, though we think it’s ok for that to require a bit more work. Here are some alternative strategies one might want to support:

  • Pre-allocating stack space: when you create the dyn Trait, you reserve some space on the stack to store any futures or impl Trait that it might return.
  • Caching: reuse the same Box over and over to reduce the performance impact (a good allocator would do this for you, but not all systems ship with efficient allocators).
  • Sealed trait: you derive a wrapper enum for just the types that you need.
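
As one illustration, the sealed-trait strategy can be written by hand as a wrapper enum that forwards each method with a match, so no Box and no vtable are involved. This sketch assumes a toolchain where async fn in traits is available for static dispatch (Rust 1.75 or later); AnyIter, Countdown, Repeat, sum_all, and the toy block_on executor are names invented for the example, and a real solution would presumably derive the enum with a macro rather than write it out.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

// Countdown(2) yields 2, 1 and then None.
struct Countdown(u32);
// Yields `value` exactly `remaining` times.
struct Repeat {
    value: u32,
    remaining: u32,
}

impl AsyncIterator for Countdown {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            return None;
        }
        self.0 -= 1;
        Some(self.0 + 1)
    }
}

impl AsyncIterator for Repeat {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(self.value)
    }
}

// Wrapper enum over the closed set of types we need to mix at runtime.
enum AnyIter {
    Countdown(Countdown),
    Repeat(Repeat),
}

impl AsyncIterator for AnyIter {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        // The match plays the role of the vtable lookup.
        match self {
            AnyIter::Countdown(it) => it.next().await,
            AnyIter::Repeat(it) => it.next().await,
        }
    }
}

// A heterogeneous collection without `dyn` and without allocation per call.
async fn sum_all(iters: &mut [AnyIter]) -> u32 {
    let mut total = 0;
    for it in iters.iter_mut() {
        while let Some(i) = it.next().await {
            total += i;
        }
    }
    total
}

// Toy executor (assumption: the future never needs a real wake-up).
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The trade-off is that the set of types is fixed at the point where the enum is defined, which is why this only suits traits whose implementors are all known up front.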

Ultimately, though, there is no limit to the number of ways that one might manage dynamic dispatch, so the goal is not to have a “built-in” set of strategies but rather allow people to develop their own using procedural macros. We can then offer the most common strategies in utility crates or perhaps even in the stdlib, while also allowing people to develop their own if they have very particular needs.

The design from 22,222 feet

I’ve drawn a little diagram to illustrate how our design works at a high level:

[Diagram: a call from the caller, through the vtable, to the callee. Stages shown: argument adaptation to vtable; vtable; argument adaptation from vtable; normal function found in the impl; return value adaptation to vtable; return type adaptation from vtable. The caller knows the types of impl Trait arguments, but does not know the type of the callee or the precise return type if the function returns impl Trait. The callee knows the type of the callee and the precise return type if the function returns impl Trait, but does not know the types of impl Trait arguments.]

Let’s walk through it:

  1. To start, we have the caller, which has access to some kind of dyn trait, such as w: &mut dyn Widget, and wishes to call a method, like w.augment().
  2. The caller looks up the function for augment in the vtable and calls it:
    • But wait, augment takes an impl Into<WidgetComponent>, which means that it is a generic function. Normally, we would have a separate copy of this function for every Into type! But we must have only a single copy for the vtable! What do we do?
    • The answer is that the vtable encodes a copy that expects “some kind of pointer to a dyn Into<WidgetComponent>”. This could be a Box but it could also be other kinds of pointers: I’m being hand-wavy for now, I’ll go into the details later.
    • The caller therefore has the job of creating a “pointer to a dyn Into<WidgetComponent>”. It can do this because it knows the type of the value being provided; in this case, it would do it by allocating some memory space on the stack.
  3. The vtable, meanwhile, includes a pointer to the right function to call. But it’s not a direct pointer to the function from the impl: it’s a lightweight shim that wraps that function. This shim has the job of converting from the vtable’s ABI into the standard ABI used for static dispatch.
  4. When the function returns, meanwhile, it is giving back some kind of future. The callee knows that type, but the caller doesn’t. Therefore, the callee has the job of converting it to “some kind of pointer to a dyn Future” and returning that pointer to the caller.
    • The default is to box it, but the callee can customize this to use other strategies.
  5. The caller gets back its “pointer to a dyn Future” and is able to await that, even though it doesn’t know exactly what sort of future it is.
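
The steps above can be imitated by hand, which may help make the roles concrete. In this sketch, assuming Rust 1.75 or later, DynWidget stands in for the vtable signature and the blanket impl stands in for the shim; neither is the actual design, which is left to later posts. To keep it small, the argument is impl Display rather than impl Into<WidgetComponent>, the argument is passed as a plain &dyn reference, and the return value is always boxed. Label, call_via_dyn, and the toy block_on executor are also invented for the example.

```rust
use std::fmt::Display;
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// The trait as a user would write it: generic argument, async fn.
trait Widget {
    async fn augment(&mut self, component: impl Display) -> usize;
}

// Stand-in for the vtable entry: one non-generic signature for all callers.
trait DynWidget {
    fn augment_dyn<'a>(
        &'a mut self,
        component: &'a dyn Display,
    ) -> Pin<Box<dyn Future<Output = usize> + 'a>>;
}

// Stand-in for the shim (steps 3 and 4): it receives the erased argument,
// calls the normal function from the impl, and erases the returned future.
impl<W: Widget> DynWidget for W {
    fn augment_dyn<'a>(
        &'a mut self,
        component: &'a dyn Display,
    ) -> Pin<Box<dyn Future<Output = usize> + 'a>> {
        Box::pin(Widget::augment(self, component))
    }
}

// Hypothetical implementor: appends the component's text to a label.
struct Label(String);

impl Widget for Label {
    async fn augment(&mut self, component: impl Display) -> usize {
        self.0.push_str(&component.to_string());
        self.0.len()
    }
}

// The caller (steps 1, 2, and 5): it knows the argument type but not the callee.
async fn call_via_dyn(w: &mut dyn DynWidget) -> usize {
    // Step 2: the argument lives on the caller's stack and is passed as a pointer.
    let component = 42u32;
    // Step 5: await a future whose concrete type the caller never sees.
    let new_len = w.augment_dyn(&component).await;
    new_len
}

// Toy executor (assumption: the future never needs a real wake-up).
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Starting from Label("id-"), running call_via_dyn through block_on appends "42" to the label. What the hand-written version cannot express is the flexibility in "some kind of pointer": here the choice of &dyn for the argument and Box for the return value is hard-coded into DynWidget.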

Upcoming posts

In upcoming blog posts, I’m going to expand on several things that I alluded to in my walkthrough:

  • “Pointer to a dyn Trait”:
    • How exactly do we encode “some kind of pointer” and what does that mean?
    • This is really key, because we need to be able to support
  • Adaptation for impl Trait arguments:
    • How do we adapt to/from the vtable for arguments of generic type?
    • Hint: it involves creating a dyn Trait for the argument
  • Adaptation for impl Trait return values:
    • How do we adapt to/from the vtable for return values of impl Trait type?
    • Hint: it involves returning a dyn Trait, potentially boxed but not necessarily
  • Adaptation for by-value self:
    • How do we adapt to/from the vtable for by-value self, and when are such functions callable?
  • Boxing and alternatives thereto:
    • When you call an async fn or fn that returns impl Trait via dynamic dispatch, the default behavior is going to allocate a Box, but we’ve seen that doesn’t work for everyone. How convenient can we make it to select an alternative strategy like stack pre-allocation, and how can people create their own strategies?

We’ll also be updating the async fundamentals initiative page with more detailed design docs.

Appendix: Things I’d still like to see

I’m pretty excited about where we’re landing in this round of work, but it doesn’t get dyn where I ultimately want it to be. My ultimate goal is that people are able to use dynamic dispatch as conveniently as you use impl Trait, but I’m not entirely sure how to get there. That means being able to write function signatures that don’t talk about Box vs & or other details that you don’t have to deal with when you talk about impl Trait. It also means not having to worry so much about Send/Sync and lifetimes.

Here are some of the improvements I would like to see, if we can figure out how:

  • Support clone:
    • Given trait Widget: Clone and w: Box<dyn Widget>, able to invoke w.clone()
    • This almost works, but the fact that trait Clone: Sized makes it difficult.
  • Support “partially dyn safe” traits:
    • Right now, dyn safe is all or nothing. This has the nice implication that dyn Foo: Foo for all types. However, it is also limiting, and many people have told me they find it confusing. Moreover, dyn Foo is not Sized, and hence while it’s cool conceptually that dyn Foo implements Foo, you can’t actually use a dyn Foo in the same way that you would use most other types.
  • Improve how Send interacts with returned values (e.g., RPIT, async fn in traits, etc):
    • If you write dyn Foo + Send, that
  • Avoid having to talk about pointers so much
    • When you use impl Trait, you get a really ergonomic experience today:
      • fn apply_map(map_fn: impl FnMut(u32) -> u32)
      • fn items(&self) -> impl Iterator<Item = Item> + '_
    • In contrast, when you use dyn trait, you wind up having to be very explicit around lots of details, and your callers have to change as well:
      • fn apply_map(map_fn: &mut dyn FnMut(u32) -> u32)
      • fn items(&self) -> Box<dyn Iterator<Item = Item> + '_>
  • Make dyn trait feel more parametric:
    • If I have a struct Foo<T: Trait> { t: Box<T> }, it has the nice property that it exposes the T. This means we know that Foo<T>: Send if T: Send (assuming Foo doesn’t have any fields that are not Send), we know that Foo<T>: 'static if T: 'static, and so forth. This is very cool.
    • In contrast, struct Foo { t: Box<dyn Trait> } bakes in a lot of details – it doesn’t permit t to contain any references, and it doesn’t let Foo be Send.
  • Make it sound:
    • There are a few open soundness bugs around dyn trait, such as #57893, and I would like to close them. This interacts with other things in this list.
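
For the first item, the workaround commonly used today looks roughly like the sketch below: since Widget: Clone would make the trait unusable as dyn Widget, cloning is routed through a helper supertrait that returns a Box. WidgetClone, clone_box, Button, and name are hypothetical names chosen for this example.

```rust
// Helper supertrait: a dyn-compatible way to ask a value to clone itself.
trait WidgetClone {
    fn clone_box(&self) -> Box<dyn Widget>;
}

trait Widget: WidgetClone {
    fn name(&self) -> String;
}

// Every sized, cloneable Widget gets clone_box for free.
impl<T: Widget + Clone + 'static> WidgetClone for T {
    fn clone_box(&self) -> Box<dyn Widget> {
        Box::new(self.clone())
    }
}

// With that in place, Box<dyn Widget> itself can implement Clone.
impl Clone for Box<dyn Widget> {
    fn clone(&self) -> Self {
        // Deref through the Box so the call goes through the vtable.
        (**self).clone_box()
    }
}

#[derive(Clone)]
struct Button {
    label: String,
}

impl Widget for Button {
    fn name(&self) -> String {
        format!("button:{}", self.label)
    }
}

fn demo() -> (String, String) {
    let original: Box<dyn Widget> = Box::new(Button { label: "ok".to_string() });
    let copy = original.clone();
    (original.name(), copy.name())
}
```

This gets w.clone() to work for Box<dyn Widget>, but only at the price of an extra trait and a 'static bound, and it has to be repeated for each trait that wants it.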

  1. This has traditionally been called Stream.