Self-hosted Parallel JS
5 December 2012
The blog has been silent for a while. The reason is that I’ve been hard at work on Parallel JS. It’s come a long way: in fact, the goal is to land an initial version in the next couple weeks!
One of the very exciting developments has been that we switched to a
self-hosting implementation. Self-hosting is a very cool new
direction for the SpiderMonkey engine being introduced by
Till Schneidereit. The idea is to implement large parts of
the engine itself in JavaScript, similar to projects like
Squeak, Jikes RVM, Maxine, PyPy and numerous
others. As an example, imagine the standard JavaScript function
Array.map
. In SM, this is currently implemented with approximately
80 lines of C++ code. This function must handle all sorts of
annoying conditions, such as ensuring that objects are rooted,
checking for interrupts, and using an
optimized call sequence to make it faster to invoke the JS code.
If the implementation were written in JS, however, all of these issues
would be handled automatically by the engine itself, just as they are
for any other JS function.
Besides complexity, another downside to the C++ implementation of
Array.map
is that it is slow. This may seem surprising. Isn’t
C++ supposed to be faster than JS? Well, the answer is that yes, C++
programs can be faster than JS, but only if they’re operating on C++
data structures, which are significantly different than their JS
equivalents. A C++ array, for example, is just a continuous series of
memory locations, so an array store like a[i] = v
can be compiled
with just a few assembly instructions. In JavaScript, though, arrays
are much more flexible: they can be sparse, for example, and they can
be dynamically grown and shrunk. All of this means that that same
array store in JavaScript can be a significant more complex
operation. Therefore, the C++ equivalent is a
call to the JS VM function SetArrayElement()
.
This function is very general and can handle all of the various
possibilities that come up.
Using a helper function is unfortunate because, in the common case, a
JS array is actually represented in the engine using a simple C++
array, meaning that the store a[i] = v
could (normally) be compiled
into a simple store, just like in C++. And in fact, if the loop were
written in JS, the JIT would do just that; it would only resort to
something as general as SetArrayElement()
if it proved to be
necessary during execution.
There are other reasons that the C++ code is slower than JS code would be. The interfacing to the GC is more awkward, for example, since the compiler cannot generate a stack map to tell us where pointers are located. The transition between C++ and JS code requires a bit of adaptation since the IonMonkey compiler uses a calling convention that is specialized to the needs of JS. And so on.
Intrinsics
Of course, replacing C++ with JavaScript is not all rainbows and sunshine. There are complications. For one thing, the C++ code is able to do special things that normal JavaScript is not capable of; it is also involved with some internal details that JavaScript functions do not normally need to be concerned with. To help with these cases, self-hosting introduces the idea of intrinsic function. An intrinsic function is a special JS function, implemented in C++, that is only available to the self-hosted code. Intrinsics can be used to expose extra capabilities to the self-hosted code, which is really internal to the engine, without exposing them to all JavaScript code.
To set them apart, intrinsic functions are currently given a name that
starts with a %
sign (e.g., %Foo()
). The plan as I understand it,
though, is to replace these names with normal JS identifiers.
In Parallel JS, we use intrinsics to expose a single, underlying
parallel operation. This operation is then used to implement the
other higher-level parallel ops (map()
, filter()
, reduce()
,
scan()
, and scatter()
). Unlike the higher-level ops, the
intrinsic parallel operation is not safe. It can induce data races.
It will also fail if the function it is applied to cannot be compiled
for parallel execution for whatever reason, unlike the higher-level
ops.