Several months ago, on May 1st, I spoke to Stjepan Glavina about his (at the time) new crate, smol. Stjepan is, or ought to be, a pretty well-known figure in the Rust universe. He is one of the primary authors of the various crossbeam crates, which provide core parallel building blocks that are both efficient and very ergonomic to use. He was one of the initial designers for the async-std runtime. And so when I read stjepang’s blog post describing a new async runtime smol that he was toying with, I knew I wanted to learn more about it. After all, what could make stjepang say:
It feels like this is finally it - it’s the big leap I was longing for the whole time! As a writer of async runtimes, I’m excited because this runtime may finally make my job obsolete and allow me to move onto whatever comes next.
If you’d like to find out, then read on!
You can watch the video on YouTube. I’ve also embedded a copy here for your convenience:
What is smol?
smol is an async runtime, similar to tokio or async-std, but with a distinctly different philosophy. It aims to be much simpler and smaller. Whereas async-std offers a kind of “mirror” of the libstd API surface, but made asynchronous, smol tries to get asynchrony by wrapping and adapting synchronous components. There are two main ways to do this:
- One option is to delegate to a thread pool. As we’ll see, stjepang argues that this option can be much more efficient than people realize, and that it makes sense for things like accesses to the local file system. smol offers the blocking! macro as well as adapters like the reader function, which converts impl Read values into impl AsyncRead values.
- The other option is to use the Async&lt;T&gt; wrapper to convert blocking I/O sockets into non-blocking ones. This works for any I/O type T that is compatible with epoll (or its equivalent; on Mac, smol uses kqueue, and on Windows, smol uses wepoll).
Delegation to a thread pool
One of the debates that has been going back and forth when it comes to
asynchronous coding is how to accommodate things that need to block.
Async I/O is traditionally based on a “cooperative” paradigm, which
means that if your thread is going to do blocking I/O – or perhaps
even just execute a really long loop – you ought to use an explicit
spawn_blocking that tells the scheduler what’s going on.
Earlier, stjepang had introduced a new async-std scheduler, inspired by Go. This scheduler would automatically detect when tasks were taking too long and spin up more threads to compensate. This was simpler to use, but it also had some downsides: it could be too pessimistic at times, creating spikes in the number of threads.
Therefore, in smol, stjepang returned to the approach of explicitly
labeling your blocking sections, this time via the blocking!
macro. This macro moves the “blocking code” off the
cooperative thread pool and onto one where the OS manages the scheduling.
Explicit blocking is often just fine
In fact, you might say that the core argument of smol is that
blocking! is often “good enough”. Rather than reproducing
or cloning the libstd API surface to make it asynchronous, it is often
just fine to use the existing API but with a blocking!
wrapped around it.
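To make the idea concrete, here is a pure-std sketch of what “delegate the blocking work to another thread” means. This is my own illustration, not smol’s actual implementation: smol’s blocking! hands the closure to a managed pool of OS threads and gives you back a future, whereas this toy version just spawns a fresh thread and returns a channel.

```rust
use std::sync::mpsc;
use std::thread;

// Toy version of "move blocking work off the cooperative thread":
// run the closure on its own OS thread and hand the result back
// through a channel. (smol uses a managed pool and a future instead.)
fn offload<T: Send + 'static>(f: impl FnOnce() -> T + Send + 'static) -> mpsc::Receiver<T> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Sending fails only if the receiver was dropped; ignore that.
        let _ = tx.send(f());
    });
    rx
}

fn main() {
    // Stand-in for a blocking call like std::fs::read_to_string.
    let rx = offload(|| "hello".to_string());
    assert_eq!(rx.recv().unwrap(), "hello");
}
```

The calling thread stays free to do other work until it decides to block on recv – which is exactly the property an async runtime wants for its cooperative worker threads.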
The Async wrapper
But of course if you were spawning threads for all of your I/O,
this would defeat the purpose of using an async runtime in the first
place. Therefore, smol offers another approach: the Async&lt;T&gt; wrapper.
The idea of Async&lt;T&gt; is that you can take a blocking abstraction,
like the TcpStream found in the standard library, and convert it
to be asynchronous by creating an Async&lt;TcpStream&gt;. This works for
any type that supports the AsRawFd trait, which gives access to
the underlying file descriptor. We’ll explain that in a bit.
So what can you do with an Async&lt;TcpStream&gt;? The core operations
Async&lt;T&gt; offers are the async functions read_with and
write_with. They allow you to wrap blocking operations and have
them run asynchronously. For example, given a socket of type
Async&lt;UdpSocket&gt;, you might write the following to send data:
let len = socket.write_with(|s| s.send(msg)).await?;
How the wrappers work: epoll
So how do these wrappers work under the hood? The idea is quite
simple, and it’s connected to how epoll works. The idea with a
traditional Unix non-blocking socket is that it offers the same
interface as a blocking one: i.e., you still invoke functions like
send. However, if the kernel would have had to block, and the socket
is in non-blocking mode, then it simply returns an error code instead.
Now the user’s code knows that the operation wasn’t completed and it
can try again later (in Rust, this is
io::ErrorKind::WouldBlock). But how does it know when to try
again? The answer is that it can invoke
epoll to find out when the
socket is ready to accept data.
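You can see the first half of this mechanism with nothing but the standard library. The sketch below (no smol involved) connects a TcpStream to a local listener, switches it to non-blocking mode, and tries to read before any data has arrived, which surfaces the WouldBlock error:

```rust
use std::io::{ErrorKind, Read};
use std::net::{TcpListener, TcpStream};

// Connect a socket to a local listener, switch it to non-blocking
// mode, and read before any data has been sent. A blocking socket
// would park the thread here; a non-blocking one errors instead.
fn read_before_data() -> std::io::Result<usize> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let mut stream = TcpStream::connect(listener.local_addr()?)?;
    let _peer = listener.accept()?; // keep the peer end alive
    stream.set_nonblocking(true)?;
    let mut buf = [0u8; 16];
    stream.read(&mut buf)
}

fn main() {
    let err = read_before_data().unwrap_err();
    assert_eq!(err.kind(), ErrorKind::WouldBlock);
    println!("got WouldBlock");
}
```

The missing second half – finding out when it is worth retrying – is exactly the job epoll (and smol’s reactor on top of it) performs.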
The read_with and write_with methods build on this
idea. Basically, they execute your underlying operation just like
normal. But if that operation returns WouldBlock, then the
function will register the underlying file descriptor (obtained via
AsRawFd) with smol’s runtime and yield the current
task. smol’s reactor will invoke epoll and, when epoll indicates that
the file descriptor is ready, it will wake up your task, which will
run your closure again. Hopefully this time it succeeds.
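In simplified, synchronous form – my own caricature, not smol’s code – the retry loop looks something like this, with a sleep standing in for “register the fd with the reactor and yield until epoll reports readiness”:

```rust
use std::io::{self, ErrorKind};

// Synchronous caricature of write_with: run the closure; if it says
// WouldBlock, wait for "readiness" and try again. smol yields the
// task and lets epoll decide when to retry, rather than sleeping.
fn retry_on_would_block<T>(mut op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    loop {
        match op() {
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                std::thread::sleep(std::time::Duration::from_millis(1));
            }
            other => return other,
        }
    }
}

fn main() -> io::Result<()> {
    // An operation that "blocks" twice before succeeding.
    let mut attempts = 0;
    let n = retry_on_would_block(|| {
        attempts += 1;
        if attempts < 3 {
            Err(io::Error::new(ErrorKind::WouldBlock, "not ready"))
        } else {
            Ok(attempts)
        }
    })?;
    assert_eq!(n, 3);
    Ok(())
}
```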
If this seems familiar, it should. Async&lt;T&gt; is basically the same
as the core Future interface, but “specialized” to the case of
pollable file descriptors that return WouldBlock instead of
Poll::Pending. And of course the core Future trait
was very much built with interfaces like epoll in mind.
The read_with and write_with wrappers are very general, but
not the most convenient to use. Therefore, smol offers some “convenience impls”
that basically wrap existing methods for you. So, for example, given a
socket: Async&lt;UdpSocket&gt;, earlier we saw that I can send data like so:
let len = socket.write_with(|s| s.send(msg)).await?;
but I can also invoke the send method directly:
let len = socket.send(msg).await?;
Bridging the sync vs async worlds
stjepang argues that basing the runtime around this idea of adapting
synchronous components not only makes for a smaller runtime, but
also has the potential to help bridge the gap between the “sync” and
“async” worlds. Basically, users today have to choose: do they base
their work around the synchronous I/O interfaces, like
Write, or the asynchronous ones? The former are more mature, and
there are a lot of libraries available that build on them, but the
latter seem to be the future.
smol presents another option. Rather than converting all libraries to
async, you can just adapt the synchronous libraries into the async
world, either through Async&lt;T&gt;, where that applies, or through the
blocking! macro and related adapters.
We walked through the example of the inotify crate. This is an
existing library that wraps the inotify interface in the
Linux kernel in idiomatic Rust. It is written in a synchronous style,
however, and so you might think that if you are writing async code,
you can’t use it. However, its core type implements AsRawFd, which
means that you can create an Async&lt;Inotify&gt; instance and invoke all
its methods by using the read_with and write_with methods (or
create ergonomic wrappers of your own).
Digging into the runtime
In the video, we spent a fair amount of time digging into the guts of how smol is implemented. For example, smol never starts threads on its own: instead, users start their own threads and invoke functions from smol that put those threads to work. We also looked at the details of its thread scheduler, and compared it to some of the recent work towards a new Rayon scheduler that is still pending. (Side note, there’s a recorded deep dive on YouTube that digs into how the Rayon scheduler works, if that’s your bag.) In any case, we kind of got into the weeds here, so I’ll spare you the details. You can watch the video. =)
The importance of auditing and customizing
One interesting theme that we came to later is the importance of being able to audit unsafe code. stjepang mentioned that he has often heard people say that they would be happy to have a runtime that doesn’t achieve peak performance, if it makes use of less unsafe code.
In fact, I think one of the things that stjepang would really like to see is people taking smol and, rather than using it directly, adapting it to their own codebases. Basically using it as a starting point to build your own runtime for your own needs.
Towards a generic runtime interface?
It’s not a short-term thing, but one of the things that I personally am very interested in is getting a better handle on what a “generic runtime interface” looks like. I’d love to see a future where async runtimes are like allocators: there is a default one that works “pretty well” that you can use a lot of the time, but it’s also really easy to change that default and port your application over to a more specialized runtime that works better for you.
I’ve often imagined this as a kind of trait that encapsulates the
“core functions” a runtime would provide, kind of like the
GlobalAlloc trait for allocators. But stjepang pointed out that
smol suggests a different possibility, one where the std library
offers a kind of “mini reactor”. This reactor would offer functions to
“register” sockets, associate them with wakers, and a function that
periodically identifies things that can make progress and pushes them
along. This wouldn’t in and of itself be a runtime, but it would be a
building block that other runtimes can use.
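To make that concrete, a hypothetical interface might look something like the trait below. To be clear: nothing like this exists in std today, and every name here is invented purely for illustration.

```rust
use std::io;
use std::os::unix::io::RawFd;
use std::task::Waker;

// Hypothetical "mini reactor" interface of the kind discussed above.
// All names are invented; this is a sketch, not a proposal.
trait MiniReactor {
    /// Register a file descriptor with the reactor.
    fn register(&self, fd: RawFd) -> io::Result<()>;
    /// Ask to be woken when the fd next becomes ready.
    fn wake_on_ready(&self, fd: RawFd, waker: Waker);
    /// Poll the OS (e.g. via epoll) once, fire the wakers whose fds
    /// became ready, and return how many tasks were woken.
    fn tick(&self) -> io::Result<usize>;
    /// Remove a file descriptor from the reactor.
    fn deregister(&self, fd: RawFd) -> io::Result<()>;
}

fn main() {
    // A do-nothing implementation, just to show the shape compiles.
    struct Noop;
    impl MiniReactor for Noop {
        fn register(&self, _fd: RawFd) -> io::Result<()> { Ok(()) }
        fn wake_on_ready(&self, _fd: RawFd, _waker: Waker) {}
        fn tick(&self) -> io::Result<usize> { Ok(0) }
        fn deregister(&self, _fd: RawFd) -> io::Result<()> { Ok(()) }
    }
    assert_eq!(Noop.tick().unwrap(), 0);
}
```

Runtimes would then differ in how they schedule tasks, while sharing the registration and wakeup plumbing.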
Anyway, as I said above, I don’t think we’re at the point where we know what a generic runtime interface should look like. I’m particularly a bit nervous about something that is overly tied to epoll, given all the interesting work going on around adapting io-uring (e.g., withoutboats’ Ringbahn) and so forth. But I think it’s an interesting thing to think about, and I definitely think smol stakes out an interesting point in this space.
My main takeaways from this conversation were:
- The “core code” you need for a runtime is really very little.
- Adapters like Async&lt;T&gt; and offloading work onto thread pools can be a helpful and practical way to unify the question of sync vs async.
- In particular, while I knew that Futures were conceptually quite close to epoll, I hadn’t realized how far you could get with a generic adapter like Async&lt;T&gt;, which maps between the I/O WouldBlock error and Poll::Pending.
- In thinking about the space of possible runtimes, we should be considering not only things like efficiency and ergonomics, but also the total amount of code and our ability to audit and understand it.
There is a thread on the Rust users forum for this series.