Baby Steps

A blog about programming, and tiny ways to improve it.

Thoughts on DST, Part 2

In the previous post I elaborated a bit on DSTs and how they could be created and used. I want to look a bit now at an alternate way to support the combination of vector types and smart pointers (e.g., RC<[uint]>). This approach avoids the use of DSTs. We’ll see that it also addresses some of the rough patches of DST, but doesn’t work quite as well for object types.

This is part 2 of a series:

  1. Examining DST.
  2. Examining an alternative formulation of the current system.
  3. …and then a twist.
  4. Further complications.

Thoughts on DST, Part 1

In the past, I’ve been quite the champion of dynamically sized types (DST). Specifically what this means is that things like [T] and Trait would be “types” in the Rust type system. Lately I’ve been investing a lot of effort thinking through the ramifications of offering better support for smart pointers, and in particular how this interacts with dynamically sized types, and I am no longer persuaded that DSTs offer the best way forward. I’m a bit unsure, though, and the topic is complicated, so I wanted to stop and write up a short series of posts laying out my thought process thus far. This post will describe what it would mean to offer DST in more detail. I don’t plan to give a lot of Rust background, since there’s enough to talk about.

This is part 1 of a series:

  1. Examining DST.
  2. Examining an alternative formulation of the current system.
  3. …and then a twist.
  4. Further complications.

Optimizing SIMD, Part 2

A quick follow-up to my previous post. The approach I suggested (“generate boxing instructions but bypass them when possible”) is in some sense pessimistic: we generate the instructions we need for the worst case and then clean up afterward. Like many problems in computer science, it has an optimistic dual: we could generate unboxed data and then insert boxes where needed. In fact, we have an existing mechanism for doing that, called type policies. Basically, there is a phase in which each MIR opcode examines the types of its inputs and attempts to reconcile those types with what it needs, boxing or unboxing as needed.
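The type-policy idea can be sketched in plain JavaScript. This is purely illustrative: the node shapes and names (`applyTypePolicy`, `simdAdd`, etc.) are made up for this post, and the real pass lives in IonMonkey’s C++.

```javascript
// Illustrative sketch of a "type policy" pass, with invented node shapes.
// Each node records the type it produces; the policy walks a node's inputs
// and inserts box/unbox conversions wherever an input's type does not
// match what the opcode expects.
function applyTypePolicy(node, expectedInputType) {
  node.inputs = node.inputs.map(function (input) {
    if (input.type === expectedInputType) {
      return input; // already in the right representation
    }
    if (expectedInputType === 'object') {
      // Optimistic dual: data was left unboxed; box it on demand.
      return { op: 'box', type: 'object', inputs: [input] };
    }
    // Otherwise, unbox the object into the raw representation.
    return { op: 'unbox', type: expectedInputType, inputs: [input] };
  });
  return node;
}

// Example: an add of two float32x4 values, one still boxed as an object.
var unboxed = { op: 'simdConst', type: 'float32x4', inputs: [] };
var boxed = { op: 'param', type: 'object', inputs: [] };
var add = applyTypePolicy(
  { op: 'simdAdd', type: 'float32x4', inputs: [unboxed, boxed] },
  'float32x4'
);
```

The already-unboxed input passes through untouched, while the boxed one picks up an unbox conversion — the “insert boxes (or unboxes) where needed” step described above.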

In a comment on my original post, Haitao suggested using the more optimistic approach. It is also something I had discussed briefly with Luke in the past. I agree it is a good way to go; the main reason I held off on suggesting it is that I thought using the pessimistic approach (particularly to start) would be a faster way of getting going. Also, I wasn’t sure if having objects (the float32x4 wrappers) be represented by unpacked values during the IonBuilder phase would violate some invariants.

The reason that I think it’ll be easier to get started if we create the wrapped instructions to begin with is that they can be used in the bailouts. This means that we’ll be able to implement and test the required changes to the register allocator and code generator first. Of course, performance will be lackluster since we’ll still be allocating temporaries.

The pessimistic approach is also easier to implement. It requires only a few lines of code, basically just a helper for unboxing float32x4 values which first checks “is this a boxed float32x4? let me just pull out the input and return that directly rather than generating a load instruction”. However, it won’t yield as good results, since it doesn’t handle phis as well. Until Haitao suggested it, I hadn’t thought about using the optimistic approach to address phis.
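That bypass helper might look something like the following — again a JavaScript sketch with invented node shapes (`unboxFloat32x4`, `loadFloat32x4`), not the actual C++:

```javascript
// Illustrative sketch of the pessimistic "bypass the box" helper. When
// asked for the unboxed form of a value, first check whether the value is
// itself the result of a box instruction; if so, return the box's input
// directly rather than generating a load instruction.
function unboxFloat32x4(def) {
  if (def.op === 'box' && def.valueType === 'float32x4') {
    return def.input; // bypass: reuse the unboxed input of the box
  }
  // Worst case: emit an explicit load from the boxed object.
  return { op: 'loadFloat32x4', valueType: 'float32x4', input: def };
}

var raw = { op: 'simdConst', valueType: 'float32x4' };
var boxed = { op: 'box', valueType: 'float32x4', input: raw };
var bypassed = unboxFloat32x4(boxed); // reuses `raw` directly
var loaded = unboxFloat32x4({ op: 'param', valueType: 'object' });
```

The helper only peeks one definition upward, which is why it cannot see through phis — the limitation noted above that the optimistic approach addresses.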

In any case, either technique seems like a good choice to me. In the interest of making changes as small as possible, what I’d prefer is to start with the pessimistic approach and then move to the optimistic approach once things work.

If we take this approach (pessimistic first, optimistic later), then this is the high-level set of tasks. I used indentation to convey dependencies.

  • Add the simd module, either in C++ code or (preferably) ported to self-hosting. We have to be careful to get the float32 addition semantics right here, but thanks to Math.fround that’s not too hard.
  • Add MIR types and operations, using pessimistic optimization to start
    • Augment register allocator and LIR to handle vector types
    • Augment bailout code to accept unboxed vectors and box them
      • Implement changes to type policy and generate unboxed data by default (“optimistic approach”)
  • Eventually, correct the value type semantics – right now we are representing float32x4 values as objects. The correct strategy for permitting new kinds of values (in the sense of a distinct typeof) is being discussed actively in bug 654416, and we will eventually latch onto whatever that strategy becomes.
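The Math.fround point in the first task is worth spelling out: rounding to the nearest float32 after each double-precision operation gives the same result as native float32 arithmetic, so the lane semantics can be expressed in plain JavaScript. The lane-wise `addFloat32x4` below is a made-up name for illustration:

```javascript
// float32 addition semantics via Math.fround: compute in doubles, then
// round each intermediate result to the nearest float32.
function addFloat32(a, b) {
  return Math.fround(Math.fround(a) + Math.fround(b));
}

// A made-up lane-wise float32x4 add built on top of it.
function addFloat32x4(xs, ys) {
  return xs.map(function (x, i) { return addFloat32(x, ys[i]); });
}

// 0.1 and 0.2 are not exactly representable; float32 rounding gives a
// different result than double rounding, which is exactly the
// difference fround captures.
var sum = addFloat32(0.1, 0.2);
```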

Optimizing SIMD

There is currently some ongoing effort to implement the proposed JavaScript SIMD API in Firefox. The basic idea of the API is to introduce explicit vector value types called float32x4 and int32x4. These types fit into the typed objects hierarchy, so you can create arrays of them, embed them in structs, and so forth.

The semantics of these vector types are designed to make it possible for JIT engines to detect and optimize their use. One crucial bit is that they are values and hence do not have identity. Basically float32x4 values work like numbers and strings do today – they are equal if they have the same value. This is quite different from objects, which may be unequal even when their properties are the same (e.g., {} !== {}).
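Today’s primitives already show the distinction. The float32x4 comparison below is only a simulation (`simdEquals` is invented here), since the proposed type doesn’t exist in engines without the API:

```javascript
// Objects have identity: two structurally identical objects compare unequal.
var objectsEqual = {} === {};            // false

// Strings (and numbers) are compared by value.
var stringsEqual = 'abc' === 'ab' + 'c'; // true

// A float32x4 value type would behave like the latter: two values are
// equal whenever all four lanes are equal. Simulated here structurally.
function simdEquals(xs, ys) {
  return xs.length === 4 &&
    xs.every(function (x, i) { return x === ys[i]; });
}
```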

The purpose of this post is to outline one possible strategy for optimizing the use of SIMD types in IonMonkey. The initial set of optimizations doesn’t require any new analyses, though we can add an optional simple analysis in order to optimize more programs. This analysis is actually fairly general and can be applied to other kinds of typed objects.

Parameter Coercion in Rust

Alex Crichton recently sent a message to the rust-dev mailing list discussing the fate of parameter coercion in Rust. I’ve been thinking about this for a while and feeling conflicted. As is my wont, I decided to try and write up a blog post explaining precisely what’s under debate and exploring the tradeoffs.

Treating Vectors Like Any Other Container

Some astute comments on a recent thread to rust-dev got me thinking about our approach to vectors. Until now, we have focused on having built-in support for vectors via the vector type (~[T]) and slice types (&[T]). However, another possible approach would be to move vector support out of the language (almost) entirely and into the standard libraries. I wanted to write up a post exploring this idea; I find it brings some simplifications and reduces the need for DST. Seems like an idea worth considering. Consider this a thought experiment, not exactly a proposal.

Intermingled Parameter Lists, Take 2

I got a lot of feedback on my post about intermingled parameter lists – most of it negative – and I’ve been thinking about the issue over the weekend. Truth is, I wasn’t terribly fond of the proposal myself – making the position in the list significant feels wrong – but I felt it was the least bad of the various options. However, I’ve had a change of heart, and thus have a new “least bad” proposal.

The high-level summary of the plan is as follows:

  • Parameter lists are not intermingled; instead, we begin with some number of lifetime parameters followed by some number of type parameters, as today.
  • Lifetime parameters on fn items are considered late bound unless they appear within a type bound.
  • When referencing a fn item (or any generic path), users would have the option of either (1) supplying values for no parameters at all; (2) only supplying values for type parameters; or (3) supplying values for both lifetime and type parameters.
  • Users may always use _ to have the compiler select a suitable default for the value of some type or lifetime parameter. This is equivalent to not supplying a value at all.

This plan is fully backwards compatible (all existing code retains its existing meaning).

Aside #1: Some good arguments were advanced in favor of making all lifetime parameters late bound and relying instead on higher-kinded types (HKT) or some kind of explicit universal quantification syntax. While both things – and in particular, HKT – might be nice to have, I don’t think that they fully eliminate the need for early/late binding, as I will explain later.

Aside #2: I’d like to find a better name for early and late binding. If there is a standard term for this sort of distinction, I am not aware of it. I suspect this is because most languages which feature universally quantified types support quantification over other types – it is much cleaner from a pure type system point-of-view – and thus they do not have to discriminate between early and late binding. The situation is also related to let polymorphism and the distinction between type schemes and types in ML or Haskell, so perhaps there is a related term I can borrow from there. It occurs to me that it’s also worth digging into the literature about type erasure, as this distinction arises at least partly due to our use of monomorphization as an implementation scheme. Anyway, suggestions for better names or else other reading material welcome.

PJS Roadmap

I think someone reading this blog would be forgiven for thinking that I must spend most of my energy thinking about Rust. In fact I spend a good part of my working hours hammering on PJS. I thought I’d try to write up a bit of a preview of the things we are working on.

Parallel methods on arrays

Right now, on Nightly Firefox builds, you can use the parallel methods mapPar, filterPar, and reducePar on normal JS arrays. These work basically like their sequential equivalents except that they execute in an undefined order (for reducePar, that can be a more significant difference, since both the left and right operand might be the result of a reduction). That means you can write code like:

var x = [...];
x.mapPar(function(e) { return ...; })

We don’t expect major changes to these methods, except that things will continue to work faster and with a broader range of functions. The major change I expect, which is backwards compatible, is that mapPar and filterPar will change from having an undefined execution order to having a defined order that is the same as map and filter (in other words, they will become “drop-in” replacements for map and filter). But we’re not ready yet to commit that things won’t change, and hence these methods will stay confined to Nightly builds only for the time being.
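The ordering caveat for reducePar matters because the callback must tolerate either operand being a partial result. Since mapPar and friends are Nightly-only, the sketch below simulates a chunked, order-flexible reduction with plain arrays (`reduceParSim` is an invented name):

```javascript
// Simulate reducePar's unspecified order: reduce each chunk separately,
// then combine the partial results. With an associative callback like
// addition the answer matches a sequential reduce; a non-associative
// callback (e.g. subtraction) generally would not.
function reduceParSim(array, f, chunkSize) {
  var partials = [];
  for (var i = 0; i < array.length; i += chunkSize) {
    partials.push(array.slice(i, i + chunkSize).reduce(f));
  }
  return partials.reduce(f);
}

var data = [1, 2, 3, 4, 5, 6, 7, 8];
var add = function (a, b) { return a + b; };
var parallel = reduceParSim(data, add, 3);   // chunked, like a worker pool
var sequential = data.reduce(add);           // ordinary left-to-right reduce
```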

Besides improving the runtime itself, the two major areas of development for these methods are:

  1. Developer feedback: We’d like to improve the feedback that one gets from the JIT in general, and particularly with respect to parallel execution, to make it easier to track down why you may not be getting parallel execution.

  2. Supporting a wider set of JS features: there are a few common constructs that block parallel execution right now. Prime among them is regular expressions. It’d be great to fix those.

Typed objects

My main focus has been on implementing the typed objects API and integrating it into our existing parallel support. Typed objects on Nightly are in a transitional state: they partially conform to the final API, but a lot of work has not yet landed. I hope to land the remainder over this week and next.

Two big pieces of functionality are still missing. The first is the ability to create a typed object that is layered atop an array buffer. The second is to be able to extract an array buffer from a typed object. In both cases, the best way to handle this would be to merge typed arrays and typed objects completely, but since the performance of typed arrays is so crucial, I expect to first create bridges between the two types and then finally remove the bridge altogether. The latter is a long-term goal.
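Typed arrays already support the “view layered atop a buffer” half of this today, which is what makes the bridge plausible:

```javascript
// Two views layered over one ArrayBuffer: writes through one view are
// visible through the other. Typed objects over buffers would need to
// share this aliasing behavior.
var buffer = new ArrayBuffer(16);       // room for 4 float32 values
var floats = new Float32Array(buffer);  // view the whole buffer as floats
var bytes = new Uint8Array(buffer);     // byte-level view of the same data

floats[0] = 1.0;
// Writing through the float view changed the underlying bytes
// (regardless of platform endianness, some byte is now nonzero).
var bytesChanged = bytes.some(function (b) { return b !== 0; });
```

Extracting a buffer from a view is just the `.buffer` accessor; the missing pieces described above are the equivalents for typed objects.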

I implemented the basic optimization strategy that I described before, but it is still necessary to extend that code to handle array elements (that is, we will currently optimize an expression like foo.x or bar.x = 22, but not foo[1] or bar[3] = 22, where foo and bar are typed objects or handles).

The other big piece of work is that we need to optimize typed object creation. I would like idiomatic code like the following:

var point = new PointType({x: 22, y: 44});

to compile efficiently, without creating any intermediate objects. The same optimizations I have in mind will also make large assignments like the following more efficient:

line.from = {x: 22, y: 44}; // where line.from is a PointType
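What “compile efficiently” means here can be sketched by hand. The `PointType` below is a hand-rolled stand-in for the real typed objects StructType, not the actual API: each point owns two slots in a flat typed array, so the ideal compiled form of the constructor call is just two raw stores rather than allocating the `{x, y}` literal at all:

```javascript
// Hand-rolled stand-in for a struct type backed by flat storage.
function PointType(init) {
  this.storage = new Float64Array(2);
  this.storage[0] = init.x; // what the JIT should ideally lower the
  this.storage[1] = init.y; // object literal to: two direct stores
}
// Field accessors that read/write the flat storage directly.
Object.defineProperty(PointType.prototype, 'x', {
  get: function () { return this.storage[0]; },
  set: function (v) { this.storage[0] = v; },
});
Object.defineProperty(PointType.prototype, 'y', {
  get: function () { return this.storage[1]; },
  set: function (v) { this.storage[1] = v; },
});

var point = new PointType({ x: 22, y: 44 });
```

The same lowering applies to an assignment like `line.from = {x: 22, y: 44}`: with the layout known, it is two stores into `line`’s storage, with no temporary object.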

Integrating the two

Once typed objects are fully working, we need to ensure that parallel higher-order methods like mapPar etc. work and work efficiently. We have currently built a prollyfill implementing roughly the desired API. The prollyfill works, but only on my branch at the moment. It’ll work on Nightly once the typed object patches are fully landed. However, the prollyfill is pretty slow right now due to the absence of good optimization for typed objects and the absence of parallel execution.

There are two main work items here, one high and one low priority. The high priority item is that we’ll have to adjust the write guard code to understand out pointers.

The low priority item is that we’ll need to implement fallback paths that ensure that even when the JIT cannot fully optimize typed objects, we can still stay in parallel execution. I am not quite sure how this will look. Most likely it will mean extending the ICs to support typed objects (a good idea that will benefit sequential code too); I’ve also made a point to self-host the typed objects implementation where possible, which should help with parallelization.

Timeline

The current goal is to have efficient parallel execution for typed objects by January. I still think this can be achieved. If we can have the optimized variations on typed objects working by November, that gives two months to integrate which seems reasonable, though not generous.

Intermingled Parameter Lists

I’ve been hard at work finishing up Rust’s “new” syntax for lifetimes – I put “new” in quotes because the syntax has been in use for some time, but in rustc itself the change was only half-baked. In effect, the new syntax was a kind of “bridge” to code that was designed for the older type system. This resulted in some artificial limitations: for example, types could only have a single lifetime parameter, and it had to be named 'self. Under my pull request, these limitations are lifted. However, in the process of implementing things, I realized one minor problem with the new syntax that must be rectified. In this post I describe the problem and my proposed solution.

For the impatient, the changes I propose are summarized here. These are pretty nitty-gritty changes that won’t affect most programs, but are needed to make the current type system more coherent. The changes are needed to fix issue #5121. The full motivation for these changes is somewhat subtle and is described below the fold.

  • Permit lifetime and type parameters to be intermingled within a parameter list, rather than requiring all lifetime parameters to appear up front.
  • Do not bring all parameters into scope as a unit but rather one by one.
  • Require that, when explicit parameters are supplied, values for all parameters must be supplied, unlike today where values for lifetime parameters can always be omitted.
  • Introduce _ as a notation for an unspecified lifetime or type. It can be used in any type that appears in a fn body or signature.
  • For fn items, trailing lifetime parameters are considered “late bound”, meaning that their values are substituted when the fn is called rather than when it is referenced. This is in contrast to type parameters, which are “early bound”. More on this point below.
  • Make both type and lifetime parameters required on references to types in all contexts.