Baby Steps

A blog about programming, and tiny ways to improve it.

Rayon: Data Parallelism in Rust

Over the last week or so, I’ve been working on an update to Rayon, my experimental library for data parallelism in Rust. I’m pretty happy with the way it’s been going, so I wanted to write a blog post to explain what I’ve got so far.

Rayon’s goal is to make it easy to add parallelism to your sequential code – so basically to take existing for loops or iterators and make them run in parallel. For example, if you have an existing iterator chain like this:

let total_price = stores.iter()
                        .map(|store| store.compute_price(&list))

then you could convert that to run in parallel just by changing from the standard sequential iterator to Rayon’s parallel iterator:

let total_price = stores.par_iter()
                        .map(|store| store.compute_price(&list))

Of course, part of making parallelism easy is making it safe. Rayon guarantees you that using Rayon APIs will not introduce data races.

This blog post explains how Rayon works. It starts by describing the core Rayon primitive (join) and explains how that is implemented. I look in particular at how many of Rust’s features come together to let us implement join with very low runtime overhead and with strong safety guarantees. I then explain briefly how the parallel iterator abstraction is built on top of join.

I do want to emphasize, though, that Rayon is very much work in progress. I expect the design of the parallel iterator code in particular to see a lot of, well, iteration (no pun intended), since the current setup is not as flexible as I would like. There are also various corner cases that are not correctly handled, notably around panic propagation and cleanup. Still, Rayon is definitely usable today for certain tasks. I’m pretty excited about it, and I hope you will be too!

Virtual Structs Part 4: Extended Enums and Thin Traits

So, aturon wrote this interesting post on an alternative virtual structs approach, and, more-or-less since he wrote it, I’ve been wanting to write up my thoughts. I finally got them down.

Before I go any further, a note on terminology. I will refer to Aaron’s proposal as the Thin Traits proposal, and my own previous proposal as the Extended Enums proposal. Very good.

(OK, I lied, one more note: starting with this post, I’ve decided to disable comments on this blog. There are just too many forums to keep up with! So if you want to discuss this post, I’d recommend doing so on this Rust internals thread.)


Let me lead with my conclusion: while I still want the Extended Enums proposal, I lean towards implementing the Thin Traits proposal now, and returning to something like Extended Enums afterwards (or at some later time). My reasoning is that the Thin Traits proposal can be seen as a design pattern lying latent in the Extended Enums proposal. Basically, once we implement specialization, which I want for a wide variety of reasons, we almost get Thin Traits for free. And the Thin Traits pattern is useful enough that it’s worth taking that extra step.

Now, since the Thin Traits and Extended Enums proposal appear to be alternatives, you may wonder why I would think there is value in potentially implementing both. The way I see it, they target different things. Thin Traits gives you a way to very precisely fashion something that acts like a C++ or Java class. This means you get thin pointers, inherited fields and behavior, and you even get open extensibility (but, note, you thus do not get downcasting).

Extended Enums, in contrast, is targeting the fixed domain use case, where you have a defined set of possibilities. This is what we use enums for today, but (for the reasons I outlined before) there are various places that we could improve, and that was what the extended enums proposal was all about. One advantage of targeting the fixed domain use case is that you get additional power, such as the ability to do match statements, or to use inheritance when implementing any trait at all (more details on this last point below).

To put it another way: with Thin Traits, you write virtual methods whereas with Extensible Enums, you write match statements – and I think match statements are far more common in Rust today.

Still, Thin Traits will be a very good fit for various use cases. They are a good fit for Servo, for example, where they can be put to use modeling the DOM. The extensibility here is probably a plus, if not a hard requirement, because it means Servo can spread the DOM across multiple crates. Another place that they might (maybe?) be useful is if we want to have a stable interface to the AST someday (though for that I think I would favor something like RFC 757).

But I think there a bunch of use cases for extensible enums that thin traits don’t cover at all. For example, I don’t see us using thin traits in the compiler very much, nor do I see much of a role for them in LALRPOP, etc. In all these cases, the open-ended extensibility of Thin Traits is not needed and being able to exhaustively match is key. Refinement types would also be very welcome.

Which brings me to my final thought. The Extended Enums proposal, while useful, was not perfect. It had some rough spots we were not happy with (which I’ll discuss later on). Deferring the proposal gives us time to find new solutions to those aspects. Often I find that when I revisit a troublesome feature after letting it sit for some time, I find that either (1) the problem I thought there was no longer bothers me or (2) the feature isn’t that important anyway or (3) there is now a solution that was either previously not possible or which just never occurred to me.

OK, so, with that conclusion out of the way, the post continues by examining some of the rough spots in the Extended Enums proposal, and then looking at how we can address those by taking an approach like the one described in Thin Traits.


Around four years ago, when I had first decided to start at Mozilla research, I had planned to write an LR(1) parser generator. It seemed like a good way to get to know Rust. However, I found that newborns actually occupy somewhat more time than anticipated (read: I was lucky to squeeze in a shower), and hence that never came to pass.

Well, I’m happy to say that, four years later, I’ve finally rectified that. For a few months now I’ve been working on a side project while I have my morning coffee: LALRPOP (pronounced like some sort of strangely accented version of lollypop). LALRPOP is an LR(1) parser generator that emits Rust code. It is designed for ease of use, so it includes a number of features that many parser generators are missing:

  • Regular-expression-like notation, so you can write Id* for any number of `Id` or Id? for an optional `Id`.
  • User-defined macros, so you can make a macro like Comma<Id> that means comma separated list of `Id` with optional trailing comma.
  • Conditional macros, so you can easily generate a subset of your grammar for some particular context (like, all expressions that don’t end in {).
  • Support for synthesizing tokenizers (currently somewhat limited, but sufficient for many uses) as well as external tokenizers (very flexible). If you’re using an external tokenizer, you don’t even need to be parsing input strings at all really, any iterator of matchable values will do.
  • Easy to get access to positional information.
  • Easy to write fallible rules where the action code can generate a parse error.

If you’d like to learn more about LALRPOP, I recently started a tutorial that introduces LALRPOP in more depth and walks through most of its features. The tutorial doesn’t cover everything yet, but I’ll try to close the gaps.

Why LR(1)? After all, aren’t LR(1) generators kind of annoying, what with those weird shift/reduce errors? Well, after teaching compiler design for so many years, I think I may have developed Stockholm syndrome – I kind of enjoy diagnosing and solving shift/reduce failures. ;) But more seriously, I personally like that once I get my grammar working with an LR(1) generator, I know that it is unambiguous and will basically work. When I’ve used PEG generators, I usually find that they work great in the beginning, but once in a while they will just mysteriously fail to parse something, and figuring out why is a horrible pain. This is why with LALRPOP I’ve tried to take the approach of adding tools to make handling shift/reduce errors relatively easy – basically automating the workarounds that one typically has to do by hand.

That said, eventually I would like LALRPOP to support a bunch of algorithms. In particular, I plan to add something that can handle universal CFGs, though other deterministic techniques, like LL(k), would be nice as well.

Performance. Another advantage of LR(1), of course, it that it offers linear performance. That said, I’ve found that in practice, parsing based on a parsing table is not particularly speedy. If you think about it, it’s more-or-less interpreting your grammar – you’ve basically got a small loop that’s loading data from a table and then doing an indirect jump based on the results, which happen to be the two operations that CPUs like least. In my experience, rewriting to use a recursive descent parser is often much faster.

LALRPOP takes a different approach. The idea is that instead of a parsing table, we generate a function for every state. This ought to be quite speedy; it also plays very nicely with Rust’s type system, since types in Rust don’t have uniform size, and using distinct functions lets us store the stack in local variables, rather than using a Vec. At first, I thought maybe I had invented something new with this style of parsing, but of course I should have known better: a little research revealed that this technique is called recursive ascent.

Now, as expected, recursive ascent is supposed to be quite fast. In fact, I was hoping to unveil some fantastic performance numbers with this post, but I’ve not had time to try to create a fair benchmark, so I can’t – since I haven’t done any measurements, LALRPOP’s generated code may in fact be quite slow. I just don’t know. Hopefully I’ll find some time to rectify that in the near future.

100% stable Rust. It’s probably worth pointing out that LALRPOP is 100% stable Rust, and I’m committed to keeping it that way.

Other parser generators. Should LALRPOP or LR(1) not be too your fancy, I just want to point out that the Rust ecosystem has grown quite a number of parser combinator and PEG libraries: nom, oak, peg, nailgun, peggler, combine, parser-combinators, and of course my own rusty-peg. I’m not aware of any other LR(1) (or GLL, GLR, etc) generators for Rust, but there may well be some. There are also two LALR parser generators for Rust you may want to check out, racc and and lemon_rust.

Future plans. I’ve got some plans for LALRPOP. There are a host of new features I’d like to add, with the aim of eliminating more boilerplate. I’d also like to explore adding new parser algorithms, particularly universal algorithms that can handle any CFG grammar, such as GLL, GLR, or LL(*). Finally, I’m really interesting in exploring the area of error recovery, and in particular techniques to find the minimum damaged area of a parse tree if there is an incorrect parse. (There is of course tons of existing work here.)

Plea for help. Of course, if I wind up doing all of this myself, it might take quite some time. So if you’re interested in working on LALRPOP, I’d love to hear from you! I’d also love to hear some other suggestions for things to do with LALRPOP. One of the things I plan to do over the next few weeks is also spend some more time writing up plans in LALRPOP’s issue database, as well as filling out its wiki.

Virtual Structs Part 3: Bringing Enums and Structs Together

So, in previous posts, I discussed the pros and cons of two different approaches to modeling variants: Rust-style enums and C++-style classes. In those posts, I explained why I see Rust enums and OO-style class hierarchies as more alike than different (I personally credit Scala for opening my eyes to this, though I’m sure it’s been understood by others for much longer). The key points were as follows:

  • Both Rust-style enums and C++-style classes can be used to model the idea of a value that be one of many variants, but there are differences in how they work at runtime. These differences mean that Rust-style enums are more convenient for some tasks, and C++-style classes for others. In particular:
    • A Rust-style enum is sized as large as the largest variant. This is great because you can lay them out flat in another data structure without requiring any allocation. You can also easily change from one variant to another. One downside of Rust enums is that you cannot refine them to narrow the set of variants that a particular value can have.
    • A C++-style class is sized to be exactly as big as one variant. This is great because it can be much more memory efficient. However, if you don’t know what variant you have, you must manipulate the value by pointer, so it tends to require more allocation. It is also impossible to change from one variant to another. Class hierarchies also give you a simple, easily understood kind of refinement, and the ability to have common fields that are shared between variants.
  • C++-style classes offer constructors, which allows for more abstraction and code reuse when initially creating an instance, but raise thorny questions about the type of a value under construction; Rust structs and enums are always built in a single-shot today, which is simpler and safer but doesn’t compose as well.

What I want to talk about in this post is a proposal (or proto-proposal) for bridging those two worlds in Rust. I’m going to focus on data layout in this post. I’ll defer virtual methods for another post (or perhaps an RFC). Spoiler alert: they can be viewed as a special case of specialization.

I had originally intended to publish this post a few days after the others. Obviously, I got delayed. Sorry about that! Things have been very busy! In any case, better late than never, as some-great-relative-or-other always (no doubt) said. Truth is, I really miss blogging regularly, so I’m going to make an effort to write up more in progress and half-baked ideas (yeah yeah, promises to blog more are a dime a dozen, I know).

Note: I want to be clear that the designs in this blog post are not my work per se. Some of the ideas originated with me, but others have arisen in the course of conversations with others, as well as earlier proposals from nrc, which in turn were heavily based on community feedback. And of course it’s not like we Rust folk invented OO or algebraic data types or anything in the first place. :)

Virtual Structs Part 2: Classes Strike Back

This is the second post summarizing my current thoughts about ideas related to virtual structs. In the last post, I described how, when coding C++, I find myself missing Rust’s enum type. In this post, I want to turn it around. I’m going to describe why the class model can be great, and something that’s actually kind of missing from Rust. In the next post, I’ll talk about how I think we can get the best of both worlds for Rust. As in the first post, I’m focusing here primarily on the data layout side of the equation; I’ll discuss virtual dispatch afterwards.

Virtual Structs Part 1: Where Rust's Enum Shines

One priority for Rust after 1.0 is going to be incorporating some kind of support for efficient inheritance or virtual structs. In order to motivate and explain this design, I am writing a series of blog posts examining how Rust’s current abstractions compare with those found in other languages.

The way I see it, the topic of virtual structs has always had two somewhat orthogonal components to it. The first component is a question of how we can generalize and extend Rust enums to cover more scenarios. The second component is integrating virtual dispatch into this picture.

I am going to start the series by focusing on the question of extending enums. This first post will cover some of the strengths of the current Rust enum design; the next post, which I’ll publish later this week, will describe some of the advantages of a more class-based approach. Then I’ll discuss how we can bring those two worlds together. After that, I will turn to virtual dispatch, impls, and matching, and show how they interact.

A Few More Remarks on Reference-counting and Leaks

So there has been a lot of really interesting discussion in response to my blog post. I wanted to highlight some of the comments I’ve seen, because I think they raise good points that I failed to address in the blog post itself. My comments here are lightly edited versions of what I wrote elsewhere.

Isn’t the problem with objects and leak-safe types more general?

Reem writes:

I posit that this is in fact a problem with trait objects, not a problem with Leak; the exact same flaw pointed about in the blog post already applies to the existing OIBITs, Send, Sync, and Reflect. The decision of which OIBITs to include on any trait object is already a difficult one, and is a large reason why std strives to avoid trait objects as part of public types.

I agree with him that the problems I described around Leak and objects apply equally to Send (and, in fact, I said so in my post), but I don’t think this is something we’ll be able to solve later on, as he suggests. I think we are working with something of a fundamental tension. Specifically, objects are all about encapsulation. That is, they completely hide the type you are working with, even from the compiler. This is what makes them useful: without them, Rust just plain wouldn’t work, since you couldn’t (e.g.) have a vector of closures. But, in order to gain that flexibility, you have to state your requirements up front. The compiler can’t figure them out automatically, because it doesn’t (and shouldn’t) know the types involved.

So, given that objects are here to stay, the question is whether adding a marker trait like Leak is a problem, given that we already have Send. I think the answer is yes; basically, because we can’t expect object types to be analyzed statically, we should do our best to minimize the number of fundamental splits people have to work with. Thread safety is pretty fundamental. I don’t think Leak makes the cut. (I said some of the reasons in conclusion of my previous blog post, but I have a few more in the questions below.)

Could we just remove Rc and only have RcScoped? Would that solve the problem?

Original question.

Certainly you could remove Rc in favor of RcScoped. Similarly, you could have only Arc and not Rc. But you don’t want to because you are basically failing to take advantage of extra constraints. If we only had RcScoped, for example, then creating an Rc always requires taking some scoped as argument – you can have a global constant for 'static data, but it’s still the case that generic abstractions have to take in this scope as argument. Moreover, there is a runtime cost to maintaining the extra linked list that will thread through all Rc abstractions (and the Rc structs get bigger, as well). So, yes, this avoids the split I talked about, but it does it by pushing the worst case on all users.

Still, I admit to feeling torn on this point. What pushes me over the edge, I think, is that simple reference counting of the kind we are doing now is a pretty fundamental thing. You find it in all kinds of systems (Objective C, COM, etc). This means that if we require that safe Rust cannot leak, then you cannot safely integrate borrowed data with those systems. I think it’s better to just use closures in Rust code – particularly since, as annodomini points out on Reddit, there are other kinds of cases where RAII is a poor fit for cleanup.

Could a proper GC solve this? Is reference counting really worth it?

Original question.

It’ll depend on the precise design, but tracing GC most definitely is not a magic bullet. If anything, the problem around leaks is somewhat worse: GC’s don’t give any kind of guarantee about when the destructor bans. So we either have to ban GC’d data from having destructors or ban it from having borrowed pointers; either of those implies a bound very similar to Leak or 'static. Hence I think that GC will never be a fundamental building block for abstractions in the way that Rc/Arc can be. This is sad, but perhaps inevitable: GC inherently requires a runtime as well, which already limits its reusability.

On Reference-counting and Leaks

What’s a 1.0 release without a little drama? Recently, we discovered that there was an oversight in one of the standard library APIs that we had intended to stabilize. In particular, we recently added an API for scoped threads – that is, child threads which have access to the stack frame of their parent thread.

The flaw came about because, when designing the scoped threads API, we failed to consider the impact of resource leaks. Rust’s ownership model makes it somewhat hard to leak data, but not impossible. In particular, using reference-counted data, you can construct a cycle in the heap, in which case the components of that cycle may never be freed.

Some commenters online have taken this problem with the scoped threads API to mean that Rust’s type system was fundamentally flawed. This is not the case: Rust’s guarantee that safe code is memory safe is as true as it ever was. The problem was really specific to the scoped threads API, which was making flawed assumptions; this API has been marked unstable, and there is an RFC proposing a safe alternative.

That said, there is an interesting, more fundamental question at play here. We long ago decided that, to make reference-counting practical, we had to accept resource leaks as a possibility. But some recent proposals have suggested that we should place limits on the Rc type to avoid some kinds of reference leaks. These limits would make the original scoped threads API safe. However, these changes come at a pretty steep price in composability: they effectively force a deep distinction between leakable and non-leakable data, which winds up affecting all levels of the system.

This post is my attempt to digest the situation and lay out my current thinking. For those of you don’t want to read this entire post (and I can’t blame you, it’s long), let me just copy the most salient paragraph from my conclusion:

This is certainly a subtle issue, and one where reasonable folk can disagree. In the process of drafting (and redrafting…) this post, my own opinion has shifted back and forth as well. But ultimately I have landed where I started: the danger and pain of bifurcating the space of types far outweighs the loss of this particular RAII idiom.

All right, for those of you who want to continue, this post is divided into three sections:

  1. Section 1 explains the problem and gives some historical background.
  2. Section 2 explains the status quo.
  3. Section 3 covers the proposed changes to the reference-counted type and discusses the tradeoffs involved there.

Modeling Graphs in Rust Using Vector Indices

After reading nrc’s blog post about graphs, I felt inspired to write up an alternative way to code graphs in Rust, based on vectors and indicates. This encoding has certain advantages over using Rc and RefCell; in particular, I think it’s a closer fit to Rust’s ownership model. (Of course, it has disadvantages too.)

I’m going to describe a simplified version of the strategy that rustc uses internally. The actual code in Rustc is written in a somewhat dated Rust dialect. I’ve also put the sources to this blog post in their own GitHub repository. At some point, presumably when I come up with a snazzy name, I’ll probably put an extended version of this library up on Anyway, the code I cover in this blog post is pared down to the bare essentials, and so it doesn’t support (e.g.) enumerating incoming edges to a node, or attach arbitrary data to nodes/edges, etc. It would be easy to extend it to support that sort of thing, however.

Little Orphan Impls

We’ve recently been doing a lot of work on Rust’s orphan rules, which are an important part of our system for guaranteeing trait coherence. The idea of trait coherence is that, given a trait and some set of types for its type parameters, there should be exactly one impl that applies. So if we think of the trait Show, we want to guarantee that if we have a trait reference like MyType : Show, we can uniquely identify a particular impl. (The alternative to coherence is to have some way for users to identify which impls are in scope at any time. It has its own complications; if you’re curious for more background on why we use coherence, you might find this rust-dev thread from a while back to be interesting reading.)

The role of the orphan rules in particular is basically to prevent you from implementing external traits for external types. So continuing our simple example of Show, if you are defining your own library, you could not implement Show for Vec<T>, because both Show and Vec are defined in the standard library. But you can implement Show for MyType, because you defined MyType. However, if you define your own trait MyTrait, then you can implement MyTrait for any type you like, including external types like Vec<T>. To this end, the orphan rule intuitively says either the trait must be local or the self-type must be local.

More precisely, the orphan rules are targeting the case of two cousin crates. By cousins I mean that the crates share a common ancestor (i.e., they link to a common library crate). This would be libstd, if nothing else. That ancestor defines some trait. Both of the crates are implementing this common trait using their own local types (and possibly types from ancestor crates, which may or may not be in common). But neither crate is an ancestor of the other: if they were, the problem is much easier, because the descendant crate can see the impls from the ancestor crate.

When we extended the trait system to support multidispatch, I confess that I originally didn’t give the orphan rules much thought. It seemed like it would be straightforward to adapt them. Boy was I wrong! (And, I think, our original rules were kind of unsound to begin with.)

The purpose of this post is to lay out the current state of my thinking on these rules. It sketches out a number of variations and possible rules and tries to elaborate on the limitations of each one. It is intended to serve as the seed for a discussion in the Rust discusstion forums.