Baby Steps

A blog about programming, and tiny ways to improve it.

Intersection Impls

As some of you are probably aware, on the nightly Rust builds, we currently offer a feature called specialization, which was defined in RFC 1210. The idea of specialization is to improve Rust’s existing coherence rules to allow for overlap between impls, so long as one of the overlapping impls can be considered more specific. Specialization is hotly desired because it can enable powerful optimizations, but also because it is an important component for modeling object-oriented designs.
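
To make this concrete, here is a minimal sketch of the feature as it works on nightly today (my example, not taken from RFC 1210):

#![feature(specialization)]

trait Greet {
    fn greet(&self) -> String;
}

// A blanket impl that applies to every type; `default` marks the
// method as overridable by more specific impls.
impl<T> Greet for T {
    default fn greet(&self) -> String {
        String::from("hello")
    }
}

// This impl overlaps with the blanket impl above, but it is more
// specific, so the specialized coherence rules accept it.
impl Greet for String {
    fn greet(&self) -> String {
        format!("hello, {}", self)
    }
}

fn main() {
    println!("{}", 22.greet());                    // "hello"
    println!("{}", String::from("world").greet()); // "hello, world"
}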

The current specialization design, while powerful, is also limited in a few ways. I am going to work on a series of articles that explore some of those limitations as well as possible solutions.

This particular post serves two purposes: it describes the running example I want to consider, and it describes one possible solution: intersection impls (more commonly called lattice impls). We’ll see that intersection impls are a powerful feature, but they don’t completely solve the problem I am aiming to solve, and they also introduce other complications. My conclusion is that they may be a part of the final solution, but they are not sufficient on their own.
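
As a preview, here is the shape of the problem in hypothetical code (illustrative only; neither the overlap nor the proposed fix is accepted by any compiler today):

// Two overlapping impls where *neither* is more specific than the
// other, so specialization alone cannot order them:
//
//     impl<T: Clone> MyTrait for T { ... }
//     impl<T: Debug> MyTrait for T { ... }
//
// An intersection (lattice) impl would resolve the ambiguity by
// explicitly covering the overlap, i.e. the types that are both
// Clone and Debug:
//
//     impl<T: Clone + Debug> MyTrait for T { ... }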

Thoughts on Trusting Types and Unsafe Code

I’ve been thinking about the unsafe code guidelines a lot in the back of my mind. In particular, I’ve been trying to think through what it means to trust types – if you recall from the Tootsie Pop Model (TPM) blog post, one of the key examples that I was wrestling with was the RefCell-Ref example. I want to revisit a variation on that example now, but from a different angle. (This, by the way, is one of those “Niko thinks out loud” blog posts, not one of those “Niko writes up a proposal” blog posts.)

The 'Tootsie Pop' Model for Unsafe Code

In my previous post, I spent some time talking about the idea of unsafe abstractions. At the end of the post, I mentioned that Rust does not really have any kind of official guidelines for what kind of code is legal in an unsafe block and what is not. What this means in practice is that people wind up writing what seems reasonable and checking it against what the compiler does today. This is of course a risky proposition, since it means that if we start doing more optimization in the compiler, we may well wind up breaking unsafe code (the code would still compile; it would just not execute like it used to).

Now, of course, merely having published guidelines doesn’t entirely change that dynamic. It does allow us to assign blame to the unsafe code that took actions it wasn’t supposed to take. But at the end of the day we’re still causing crashes, so that’s bad.

This is partly why I have advocated that we try to arrive at guidelines which are human friendly. Even if we have published guidelines, I don’t expect most people to read them in practice. And fewer still will read past the introduction. So we had better be sure that reasonable code works by default.

Interestingly, there is something of a tension here: the more unsafe code we allow, the less the compiler can optimize. This is because it would have to be conservative about possible aliasing and (for example) avoid reordering statements. We’ll see some examples of this as we go.
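
As a quick illustration of that tension (my example, not one from the post): when raw pointers are involved, the compiler must assume they may alias.

unsafe fn write_both(a: *mut i32, b: *mut i32) -> i32 {
    *a = 1;
    *b = 2;
    // If `a` and `b` alias, `*a` is now 2, so the compiler can neither
    // assume the first write still holds nor reorder the two writes.
    *a + *b // 3 if a != b, but 4 if a == b
}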

Still, to some extent, I think it’s possible for us to have our cake and eat it too. In this blog post, I outline a proposal to leverage unsafe abstraction boundaries to inform the compiler where it can be aggressive and where it must be conservative. The heart of the proposal is the intuition that:

  • when you enter the unsafe boundary, you can assume that the Rust type system invariants hold;
  • when you exit the unsafe boundary, you must ensure that the Rust type system invariants are restored;
  • in the interim, you can break a lot of rules (though not all the rules).

I call this the Tootsie Pop model: the idea is that an unsafe abstraction is kind of like a Tootsie Pop. There is a gooey candy interior, where the rules are squishy and the compiler must be conservative when optimizing. This is separated from the outside world by a hard candy exterior, which is the interface, and where the rules get stricter. Outside of the pop itself lies the safe code, where the compiler ensures that all rules are met, and where we can optimize aggressively.
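
In code, the boundary might look like this minimal sketch (my example): a safe function whose body cuts a corner internally but whose signature upholds the usual invariants.

fn first<T>(slice: &[T]) -> Option<&T> {
    if slice.is_empty() {
        None
    } else {
        // The gooey interior: skip the bounds check, relying on the
        // emptiness test above to guarantee that index 0 is in bounds.
        Some(unsafe { slice.get_unchecked(0) })
    }
}

// Callers are outside the pop: they see an ordinary safe function, and
// the compiler remains free to optimize around the call aggressively.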

One can also compare the approach to what would happen when writing a C plugin for a Ruby interpreter. In that case, your plugin can assume that the inputs are all valid Ruby objects, and it must produce valid Ruby objects as its output, but internally it can cut corners and use C pointers and other such things.

In this post, I will elaborate a bit more on the model, and in particular cover some example problem cases and talk about the grey areas that still need to be hammered out.

Unsafe Abstractions

The unsafe keyword is a crucial part of Rust’s design. For those not familiar with it, the unsafe keyword is basically a way to bypass Rust’s type checker; it essentially allows you to write something more like C code, but using Rust syntax.

The existence of the unsafe keyword can come as a surprise at first. After all, isn’t the point of Rust that Rust programs should not crash? Why, then, would we make it so easy to bypass Rust’s type system? It can seem like a kind of flaw in the design.

In my view, though, unsafe is anything but a flaw: in fact, it’s a critical piece of how Rust works. The unsafe keyword basically serves as a kind of escape valve – it means that we can keep the type system relatively simple, while still letting you pull whatever dirty tricks you want to pull in your code. The only thing we ask is that you package up those dirty tricks with some kind of abstraction boundary.
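
The canonical example of such an abstraction boundary is splitting one mutable slice into two disjoint halves. The sketch below is a simplified version of what the standard library’s split_at_mut does:

use std::slice;

fn split_at_mut<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    assert!(mid <= len);
    let ptr = values.as_mut_ptr();
    // The type checker cannot see that the two halves are disjoint, so
    // we assemble them from raw pointers; the assert above is what
    // makes this sound. Callers see only the safe signature.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}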

This post introduces the unsafe keyword and the idea of unsafety boundaries. It is in fact a lead-in for another post I hope to publish soon that discusses a potential design of the so-called Rust memory model, which is basically a set of rules that help to clarify just what is and is not legal in unsafe code.

Non-lexical Lifetimes Based on Liveness

In my previous post I outlined several cases that Rust’s current borrow checker handles poorly and that we would like to improve. This post discusses one possible scheme for addressing them. The heart of the post is two key ideas:

  1. Define a lifetime as a set of points in the control-flow graph, where a point here refers to some particular statement in the control-flow graph (i.e., not a basic block, but some statement within a basic block).
  2. Use liveness as the basis for deciding where a variable’s type must be valid.

The rest of this post expounds on these two ideas and shows how they affect the various examples from the previous post.
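
As a tiny preview (my sketch, not one of the post’s examples), liveness is what lets the borrow below end at its last use rather than at the end of the block:

fn example() {
    let mut v = vec![1, 2, 3];
    let p = &v[0];     // borrow of `v` begins here
    println!("{}", p); // last use of `p`: after this point it is dead
    v.push(4);         // accepted: no live borrow of `v` remains
}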

Non-lexical Lifetimes: Introduction

Over the last few weeks, I’ve been devoting my free time to fleshing out the theory behind non-lexical lifetimes (NLL). I think I’ve arrived at a pretty good point and I plan to write various posts talking about it. Before getting into the details, though, I wanted to start out with a post that lays out roughly how today’s lexical lifetimes work and gives several examples of problem cases that we would like to solve.
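
To give a flavor of those problem cases up front (a sketch in the spirit of the post’s examples): the borrow below is last used in the call to capitalize, but because lifetimes today are lexical, it extends to the end of the block and the push is rejected.

fn capitalize(data: &mut [char]) {
    for c in data {
        *c = c.to_ascii_uppercase();
    }
}

fn caller() {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..]; // lexically, this borrow lasts to the end of the block
    capitalize(slice);
    data.push('d'); // ERROR today, even though `slice` is never used again
}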

Nice Errors in LALRPOP

For the last couple of weeks, my mornings have been occupied with a pretty serious revamping of LALRPOP’s error message output. I will probably wind up doing a series of blog posts about the internal details of how it works, but I wanted to write a little post to advertise this work.

Typically when you use an LR(1) parser generator, error messages tend to be written in terms of the LR(1) state generation algorithm. They use phrases like shift/reduce conflict and talk about LR(1) items. Ultimately, you have to do some clever thinking to relate the error to your grammar, and then a bit more clever thinking to figure out how you should adjust your grammar to make the problem go away. While working on adapting the Rust grammar to LALRPOP, I found I was wasting a lot of time trying to decrypt the error messages, and I wanted to do something about it. This work is the result.
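
For example (a fragment of my own, written in the style of the LALRPOP tutorial), this innocent-looking grammar is ambiguous: for an input like 1 - 2 - 3, the parser cannot decide between reducing 1 - 2 first or shifting the second minus, and a traditional generator would report that as a shift/reduce conflict over LR(1) items rather than pointing at the grammar itself.

use std::str::FromStr;

grammar;

pub Expr: i32 = {
    <l:Expr> "-" <r:Expr> => l - r, // ambiguous: no associativity given
    Num,
};

Num: i32 = <s:r"[0-9]+"> => i32::from_str(s).unwrap();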

An aside: It’s definitely worth citing Menhir as an inspiration, which is an awesome parser generator for OCaml. Menhir offers a lot of the same features that LALRPOP does, and in particular generates errors very similar to those I am talking about here.

What I’ve tried to do now in LALRPOP is to do that clever thinking for you, and instead present the error message in terms of your grammar. Perhaps even more importantly, I’ve also tried to identify common beginner problems and suggest solutions. Naturally this is a work-in-progress, but I’m already pretty excited about the current status, so I wanted to write up some examples of it in action.

Parallel Iterators Part 2: Producers

This post is the second post in my series on Rayon’s parallel iterators. The goal of this series is to explain how parallel iterators are implemented internally, so I’m going to be going over a lot of details and giving a lot of little code examples in Rust. If all you want to do is use parallel iterators, you don’t really have to understand any of this stuff.

I’ve had a lot of fun designing this system, and I learned a few lessons about how best to use Rust (some of which I cover in the conclusions). I hope you enjoy reading about it!

This post is part 2 of a series. In the initial post I covered sequential iterators, using this dot-product as my running example:

vec1.iter()
    .zip(vec2.iter())
    .map(|(i, j)| i * j)
    .sum()

In this post, we are going to take a first stab at extending sequential iterators to parallel computation, using something I call parallel producers. At the end of the post, we’ll have a system that can execute that same dot-product computation, but in parallel:

vec1.par_iter()
    .zip(vec2.par_iter())
    .map(|(i, j)| i * j)
    .sum()

Parallel producers are very cool, but they are not the end of the story! In the next post, we’ll cover parallel consumers, which build on parallel producers and add support for combinators which produce a variable number of items, like filter or flat_map.
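
To give a flavor of where this is headed, here is a much-simplified sketch of the producer idea (my own simplification, not Rayon’s actual trait definitions): a producer can either split itself into two independent halves or fall back to a sequential iterator, and a join-style primitive runs the halves in parallel.

trait Producer: Sized + Send {
    type Item;
    type IntoIter: Iterator<Item = Self::Item>;

    /// How many items this producer will yield.
    fn len(&self) -> usize;

    /// Split into two producers covering the left and right halves.
    fn split_at(self, index: usize) -> (Self, Self);

    /// Fall back to ordinary sequential iteration.
    fn into_iter(self) -> Self::IntoIter;
}

// Recursively split until the pieces are small, then sum sequentially;
// `rayon::join` runs the two recursive calls in parallel.
fn sum_producer<P: Producer<Item = u32>>(producer: P) -> u32 {
    if producer.len() <= 1024 {
        producer.into_iter().sum()
    } else {
        let mid = producer.len() / 2;
        let (left, right) = producer.split_at(mid);
        let (a, b) = rayon::join(|| sum_producer(left), || sum_producer(right));
        a + b
    }
}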