Async Interview #7: Withoutboats
10 March 2020
Hello everyone! I’m happy to be posting a transcript of my async interview with withoutboats. This particular interview took place way back on January 14th, but the intervening months have been a bit crazy and I didn’t get around to writing it up till now.
Video
You can watch the video on YouTube. I’ve also embedded a copy here for your convenience:
Next steps for async
Before I go into boats’ interview, I want to talk a bit about the state of async-await in Rust and what I see as the obvious next steps. I may still do a few more async interviews after this – there are tons of interesting folks I never got to speak to! – but I think it’s also past time to try to come to a consensus on the “async roadmap” for the rest of the year (and maybe some of 2021, too). The good news is that I feel like the async interviews highlighted a number of relatively clear next steps. Sometime after this post, I hope to post a blog post laying out a “rough draft” of what such a roadmap might look like.
History
withoutboats is a member of the Rust lang team. Starting around the beginning of 2018, they started looking into async-await for Rust. Everybody knew that we wanted to have some way to write a function that could suspend (await) as needed. But we were stuck on a rather fundamental problem, which boats explained in the blog post “self-referential structs”. This blog post was the first in a series of posts that ultimately documented the design that became the Pin type, which describes a pointer to a value that can never be moved to another location in memory. Pin became the foundation for async functions in Rust. (If you’ve not read the blog post series, it’s highly recommended.) If you’d like to learn more about pin, boats posted a recorded stream on YouTube that explores its design in detail.
Vision for async
All along, boats has been motivated by a relatively clear vision: we should make async Rust “just as nice to use” as Rust with blocking I/O. In short, you should be able to write code much like you always did, but marking functions that perform I/O as async and then adding await here or there as needed.
Since 2018, we’ve made great progress towards the goal of “async I/O that is as easy as sync” – most notably by landing and stabilizing the async-await MVP – but we’re not there yet. There remain a number of practical obstacles that make writing code using async I/O more difficult than sync I/O. So the mission for the next few years is to identify those obstacles and dismantle them, one by one.
Next step: async destructors
One of the first obstacles that boats mentioned was extending Rust’s Drop trait to work better for async code. The Drop trait, for those who don’t know Rust, is a special trait in Rust that types can implement in order to declare a destructor (code which should run when a value goes out of scope). boats wrote a blog post that discusses the problem in more detail and proposes a solution. Since that blog post, they’ve refined the proposal in response to some feedback, though the overall shape remains the same. The basic idea is to extend the Drop trait with an optional poll_drop_ready method:
trait Drop {
    fn drop(&mut self);

    fn poll_drop_ready(
        self: Pin<&mut Self>,
        ctx: &mut Context<'_>,
    ) -> Poll<()> {
        Poll::Ready(())
    }
}
When a value goes out of scope during the execution of an async fn, we will first invoke poll_drop_ready, and “await” if it returns anything other than Poll::Ready. This gives the value a chance to do async operations that may block, in preparation for the final drop. Once Poll::Ready is returned, the ordinary drop method is invoked.
This async-drop trait came up in early async interviews, and I raised Eliza’s use case with boats. Specifically, she wanted some way for values that are live on the stack to receive a callback when a yield occurs and when the function is resumed, so that they can (e.g.) interact with thread-local state correctly in an async context. While distinct from async destructors, the issues are related, because destructors are often used to manage thread-local values in a scoped fashion.
Adding async drop requires not only modifying the compiler but also modifying futures combinators to properly handle the new poll_drop_ready method (combinators need to propagate poll_drop_ready to the sub-futures they contain).
Note that we wouldn’t offer any ‘guarantee’ that poll_drop_ready will run. For example, it would not run if a future is dropped without being resumed, because then there is no “async context” that can handle the awaits. However, like Drop, it would ultimately be something that types can “usually” expect to execute under ordinary circumstances.
Some of the use cases for async-drop include writers that buffer data and wish to ensure that the data is flushed out when the writer is dropped, transactional APIs, or anything that might do I/O when dropped.
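To make the buffered-writer case concrete, here is a rough sketch of how such a type might opt in to the proposed method. This is purely illustrative – the proposal is not implemented, so this would not compile today, and the BufferedWriter type and its poll_flush helper are hypothetical:

use std::pin::Pin;
use std::task::{Context, Poll};

struct BufferedWriter {
    buffer: Vec<u8>,
    // ... plus some underlying async writer ...
}

impl BufferedWriter {
    // Hypothetical helper: tries to flush the buffer to the underlying
    // writer, returning Poll::Pending if that would block.
    fn poll_flush(self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<()> {
        // ...
        Poll::Ready(())
    }
}

impl Drop for BufferedWriter {
    fn drop(&mut self) {
        // Last-ditch synchronous cleanup, exactly as today.
    }

    // Under the proposal, this is awaited before `drop` runs whenever the
    // value goes out of scope inside an async fn, giving the writer a
    // chance to flush without blocking the thread.
    fn poll_drop_ready(self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<()> {
        self.poll_flush(ctx)
    }
}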
block_on in the std library
One very small addition that boats proposed is adding block_on to the standard library. Invoking block_on(future) would block the current thread until future has been fully executed (and then return the resulting value). This is actually something that most async I/O code would never want to do – if you want to get the value from a future, after all, you should do future.await. So why is block_on useful?
Well, block_on is basically the most minimal executor. It allows you to take async code and run it in a synchronous context with minimal fuss. It’s really convenient in examples and documentation. I would personally like it to permit writing stand-alone test cases. Those reasons alone are probably good enough justification to add it, but boats has another use in mind as well.
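To give a sense of just how minimal block_on is, here is one possible sketch built on today’s std APIs (it uses the std::task::Wake helper trait; a production version would avoid the boxing, among other refinements):

use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A waker that unparks the thread blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(future: F) -> F::Output {
    // Pin the future on the heap for simplicity.
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until the future's waker wakes this thread again.
            Poll::Pending => thread::park(),
        }
    }
}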
async fn main
Every Rust program ultimately begins with a main somewhere. Because main is invoked by the surrounding C library to start the program, it also tends to be a place where a certain amount of “boilerplate code” can accumulate in order to “setup” the environment for the rest of the program. This “boilerplate setup” can be particularly annoying when you’re just getting started with Rust, as the main function is often the first one you write, and it winds up working differently than the others. A similar problem affects smaller code examples.
In Rust 2018, we extended main so that it supports Result return values. This meant that you could now write main functions that use the ? operator, without having to add some kind of intermediate wrapper:
fn main() -> Result<(), std::io::Error> {
    let file = std::fs::File::create("output.txt")?;
    Ok(())
}
Unfortunately, async code today suffers from a similar papercut. If you’re writing an async project, most of your code is going to be async in nature: but the main function is always synchronous, which means you need to bridge the two somehow. Sometimes, especially for larger projects, this isn’t that big a deal, as you likely need to do some setup or configuration anyway. But for smaller examples, it’s quite a pain.
So boats would like to allow people to write an “async” main. This would then permit you to directly “await” futures from within the main function:
async fn main() {
    let x = load_data(22).await;
}

async fn load_data(port: usize) -> Data { ... }
Of course, this raises the question: since the program ultimately starts from a synchronous entry point, how do we bridge from the async fn main to a synchronous main? This is where block_on comes in: at least to start, we can simply declare that the future generated by async fn main will be executed using block_on, which means it will block the main thread until main completes. For simple programs and examples, this is exactly what you want.
But most real programs will ultimately want to start some other executor to get more features. In fact, following the lead of the runtime crate, many executors already offer a procedural macro that lets you write an async main. So, for example, tokio and async-std offer attributes called #[tokio::main] and #[async_std::main] respectively, which means that if you have an async fn main program you can pick an executor just by adding the appropriate attribute:
#[tokio::main] // or #[async_std::main], etc
async fn main() {
    ..
}
I imagine that other executors offer a similar procedural macro – or if they don’t yet, they could add one. =)
(In fact, since async-std’s runtime starts implicitly in a background thread when you start using it, you could use async-std libraries without any additional setup as well.)
Overall, this seems pretty nice to me. Basically, when you write async fn main, you get Rust’s “default executor”, which presently is a very bare-bones executor suitable only for simple examples. To switch to a more full-featured executor, you simply add a #[foo::main] attribute and you’re off to the races!
(Side note #1: This isn’t something that boats and I talked about, but I wonder about adding a more general attribute, like #[async_runtime(foo)], that just desugars to a call like foo::main_wrapper(...), which is expected to do whatever setup is appropriate for the crate foo.)
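For illustration, the desugaring I have in mind might look something like this – to be clear, both the attribute and the main_wrapper function are hypothetical:

// What the programmer writes (hypothetical attribute):
#[async_runtime(foo)]
async fn main() {
    do_stuff().await;
}

// What it might expand to: the crate `foo` supplies `main_wrapper`,
// which sets up its runtime and drives the future to completion.
fn main() {
    foo::main_wrapper(async {
        do_stuff().await;
    })
}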
(Side note #2: This also isn’t something that boats and I talked about, but I imagine that having a “native” concept of async fn main might help for some platforms where there is already a native executor. I’m thinking of things like GStreamer or perhaps iOS with Grand Central Dispatch. In short, I imagine there are environments where the notion of a “main function” isn’t really a great fit anyhow, although it’s possible I have no idea what I’m talking about.)
async-await in an embedded context
One thing we’ve not talked about very much in the interviews so far is using async-await in an embedded context. When we shipped the async-await MVP, we definitely cut a few corners, and one of those had to do with the use of thread-local storage (TLS). Currently, when you use async fn, the desugaring winds up using a private TLS variable to carry the Context for the current async task down through the stack. This isn’t necessary; it was just a quick and convenient hack that sidestepped some questions about how to pass in arguments when resuming a suspended function. For most programs, TLS works just fine, but some embedded environments don’t support it. Therefore, it makes sense to fix this bug and permit async fn to pass around its state without the use of TLS. (In fact, since boats and I talked, jonas-schievink opened PR #69033, which does exactly this, though it’s not yet landed.)
Async fns are implemented using a more general generator mechanism
You might be surprised when I say that we’ve already started fixing the TLS problem. After all, the reason we used TLS in the first place is that there were unresolved questions about how to pass in data when waking up a suspended function – and we haven’t resolved those problems. So why are we able to go ahead and remove the use of TLS?
The answer is that, while the async fn feature is implemented atop a more general mechanism of suspendable functions¹, the full power of that mechanism is not exposed to end-users. So, for example, suspendable functions in the compiler permit yielding arbitrary values, but async functions always yield up (), since they only need to signal that they are blocked waiting on I/O, not transmit values. Similarly, the compiler’s internal mechanism will allow us to pass in a new Context when we wake up from a yield, and we can use that mechanism to pass in the Context argument from the future API. But this is hidden from the end-user, since that Context is never directly exposed or accessed.
In short, the suspendable functions supported by the compiler are not a language feature: they are an implementation detail that is (currently) only used for async-await. This is really useful because it means we can change how they work, and it also means that we don’t have to make them support all possible use cases one might want. In this particular case, it means we don’t have to resolve some of the thorny questions about how to pass in data after a yield, because we only need to use them in a very specific way.
Supporting generators (iterators) and async generators (streams)
One observation that boats raised is that people who write async I/O code are interacting with Pin much more directly than was expected. The primary reason for this is that people are having to manually implement the Stream trait, which is basically the async version of an iterator. (We’ve talked about Stream in a number of previous async interviews.) I have also found that, in my conversations with users of async, streams come up very, very often. At the moment, consuming streams is generally fairly easy, but creating them is quite difficult. For that matter, even in synchronous Rust, manually implementing the Iterator trait is kind of annoying (although significantly easier than streams).
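For reference, the Stream trait as it exists in the futures crate today looks roughly like this (eliding the provided size_hint method); note how implementing poll_next by hand means dealing with Pin directly:

use std::pin::Pin;
use std::task::{Context, Poll};

pub trait Stream {
    // The type of value yielded by the stream.
    type Item;

    // Like Iterator::next, but may return Poll::Pending to indicate that
    // the next item is not ready yet; None means the stream is finished.
    fn poll_next(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
    ) -> Poll<Option<Self::Item>>;
}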
So, it would be nice if we had some way to make it easier to write iterators and streams. And, indeed, this design space has been carved out in other languages: the basic mechanism is to add a generator², which is some sort of function that can yield up a series of values before terminating. Obviously, if you’ve read up to this point, you can see that the “suspendable functions” we used to implement async-await can also be used to support some form of generator abstraction, so a lot of the hard implementation work has been done here.
That said, supporting generator functions has been something that we’ve been shying away from. And why is that, if a lot of the implementation work is done? The answer is primarily that the design space is huge. I alluded to this earlier in talking about some of the questions around how to pass data in when resuming a suspended function.
Full generality considered too dang difficult
boats, however, contends that we are making our lives harder than they need to be. In short, if we narrow our focus from “create the perfect, flexible abstraction for suspended functions and coroutines” to “create something that lets you write iterators and streams”, then a lot of the thorny design problems go away. Now, under the covers, we still want to have some kind of unified form of suspendable functions that can support async-await and generators, but that is a much simpler task.
In short, we would want to permit writing a gen fn (and async gen fn), which would be some function that is able to yield values and which eventually returns. Since the iterator’s next method doesn’t take any arguments, we wouldn’t need to support passing data in after yields (in the case of streams, we would pass in data, but only the Context values that are not directly exposed to users). Similarly, iterators and streams don’t produce a “final value” when they’re done, so these functions would always just return unit.
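To make that concrete, here is a purely hypothetical sketch of what such a function might look like – none of this syntax exists or has been decided:

// Hypothetical: a `gen fn` would compile to a type implementing
// Iterator<Item = u32>, with each `yield` producing the next item.
gen fn countdown(mut n: u32) yields u32 {
    while n > 0 {
        yield n;
        n -= 1;
    }
    // No final value: the function just returns unit when it is done.
}

// Hypothetical usage: calling it produces an ordinary iterator.
fn sum_countdown() -> u32 {
    countdown(10).sum()
}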
Adopting a more narrow focus wouldn’t close the door to exposing our internal mechanism as a first-class language feature at some point, but it would help us to solve urgent problems sooner, and it would also give us more experience to use when looking again at the more general task. It also means that we would be adding features that make writing iterators and streams as easy as we can make them, which is a good thing³. (In case you can’t tell, I was sympathetic to boats’ argument.)
Extending the stdlib with some key traits
boats is in favor of adding the “big three” traits to the standard library (if you’ve been reading these interviews, these traits will be quite familiar to you by now):
- AsyncRead
- AsyncWrite
- Stream
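For reference, here is roughly the shape of AsyncRead as the futures crate defines it today (provided methods like poll_read_vectored elided); the eventual std version may well differ, particularly around uninitialized buffers, as discussed below:

use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};

pub trait AsyncRead {
    // The async analogue of Read::read: attempt to read into `buf`,
    // returning Poll::Pending if no data is available yet.
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>>;
}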
Stick to the core vision: Async and sync should be analogous
One important point: boats believes (and I agree) that we should try to maintain the principle that the async and synchronous versions of the traits should align as closely as possible. This matches the overarching design vision of minimizing the differences between “async Rust” and “sync Rust”. It also argues in favor of the proposal that sfackler made in their interview, where we address the question of how to handle uninitialized memory in an analogous way for both Read and AsyncRead.
We talked a bit about the finer details of that principle. For example, if we were to extend the Read trait with some kind of read_buf method (which can support an uninitialized output buffer), then this new method would have to have a default, for backwards compatibility reasons:
trait Read {
    fn read(&mut self, ...);
    fn read_buf(&mut self, buf: &mut BufMut<..>) { }
}
This is a bit unfortunate, as ideally you would only implement read_buf. For AsyncRead, since the trait doesn’t exist yet, we could switch the defaults. But boats pointed out that this carries costs too: we would forever have to explain why the two traits are different, for example. (Another option is to have both methods default to one another, so that you can implement either one, which – combined with a lint – might be the best of both worlds.)
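Here is a sketch of what “defaulting to one another” could look like. Everything here is hypothetical – in particular the BufMut type, which stands in for whatever uninitialized-buffer abstraction is adopted – and a real design would need that lint, since implementing neither method results in infinite recursion:

use std::io;

// Hypothetical stand-in for the uninitialized-buffer type; the real
// proposal tracks how much of the buffer has been initialized.
pub struct BufMut<'a> {
    data: &'a mut [u8],
    filled: usize,
}

pub trait Read2 {
    // Default written in terms of read_buf...
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut buf = BufMut { data: buf, filled: 0 };
        self.read_buf(&mut buf)?;
        Ok(buf.filled)
    }

    // ...and a default written in terms of read, so implementors can
    // supply whichever method is most natural (a lint would require
    // that at least one of the two is actually implemented).
    fn read_buf(&mut self, buf: &mut BufMut<'_>) -> io::Result<()> {
        let n = self.read(&mut *buf.data)?;
        buf.filled = n;
        Ok(())
    }
}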
Generic interface for spawning
Some time back, boats wrote a post proposing global executors. This would basically be a way to add a function to the stdlib to spawn a task, which would then delegate (somehow) to whatever executor you are using. Based on the response to the post, boats now feels this is probably not a good short-term goal.
For one thing, there were a lot of unresolved questions about just what features this global executor should support. But for another, the main goal here is to enable libraries to write “executor independent” code, but it’s not clear how many libraries spawn tasks anyway – that’s usually done more at the application level. Libraries tend to instead return a future and let the application do the spawning. (Interestingly, one place this doesn’t work is in destructors, since they can’t return futures; supporting async drop, as discussed earlier, would help here.)
So it’d probably be better to revisit this question once we have more experience, particularly once we have the async I/O and stream traits available.
The futures crate
We discussed other possible additions to the standard library. There are a lot of “building blocks” currently in the futures library that are independent from executors and which could do well in the standard library. Some of the things that we talked about:
- async-aware mutexes, clearly a useful building block
- channels
  - though std channels are not the most loved, crossbeam’s are generally preferred
  - interestingly, channel types do show up in public APIs from time to time, as a way to receive data, so having them in std could be particularly useful
In general, where things get more complex is whenever you have bits of code that either have to spawn tasks or which do the “core I/O”. These are the points where you need a more full-fledged reactor or runtime. But there are lots of utilities that don’t need that and which could profitably live in the std library.
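As a small example of the kind of executor-independent building block we discussed, here is the oneshot channel from today’s futures crate; nothing below needs a runtime beyond something to drive the receiving future:

use futures::channel::oneshot;
use futures::executor::block_on;

fn main() {
    let (tx, rx) = oneshot::channel::<u32>();

    // The sender half can hand off a value from anywhere...
    tx.send(42).unwrap();

    // ...and the receiver half is just a future to be awaited.
    let value = block_on(async { rx.await.unwrap() });
    assert_eq!(value, 42);
}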
Where to put async things in the stdlib?
One theme that boats and I did not discuss, but which has come up when I’ve raised this question with others, is where to put async-aware traits in the std hierarchy, particularly when there are sync versions. For example, should we have std::io::Read and std::io::AsyncRead? Or would it be better to have std::io::Read and something like std::async::io::Read (obviously, async is a keyword, so this precise path may not be an option)? In other words, should we combine sync/async traits into the same space, but with different names, or should we carve out a space for “async-enabled” traits and use the same names? An interesting question, and I don’t have an opinion yet.
Conclusion and some of my thoughts
I always enjoy talking with boats, and this time was no exception. I think boats raised a number of small, practical ideas that hadn’t come up before. I do think it’s important that, in addition to stabilizing fundamental building blocks like AsyncRead, we also consider improvements to the ergonomic experience with smaller changes like async fn main, and I agree with the guiding principle that boats raised of keeping async and sync code as “analogous” as possible.
Comments?
There is a thread on the Rust users forum for this series.
Footnotes
1. In the compiler, we call these “suspendable functions” generators, but I’m avoiding that terminology for a reason. ↩︎
2. This is why I was avoiding using the term “generator” earlier – I want to say “suspendable functions” when referring to the implementation mechanism, and “generator” when referring to the user-exposed feature. ↩︎
3. though not one that a fully general mechanism necessarily precludes ↩︎