Optimizing complex typed object assignments
4 November 2013
I want to optimize assignments to struct-typed fields in typed objects. This post is an effort to work through my optimization plan.
The goal
Imagine some code like this:
var PointType = new StructType({x: int32, y: int32});
var LineType = new StructType({from: PointType,
to: PointType});
var line = new LineType();
line.to = {x: 22, y: 44};
The last line in particular is the one I am interested in. Today we execute this in the most naive way. The code which ion generates looks something like:
var tmp = {x: 22, y: 44};
SetProperty(line, "to", tmp) // a C++ helper
This means that a fresh temporary object tmp is allocated. There is no special optimization for setting properties whose types are complex types like PointType, so the assignment to line.to results in a call into the interpreter, which eventually calls the self-hosted function ConvertAndCopyTo. This function will reflectively walk tmp, essentially doing the equivalent of:
var tmp = {x: 22, y: 44};
line.to.x = int32(tmp.x);
line.to.y = int32(tmp.y);
There are many sources of inefficiency here:
- Constructing the temporary object tmp.
- Going into C++ code from JS then back to self-hosted JS.
- The reflective walk done by ConvertAndCopyTo.
The pending patch in bug 933289 eliminates point 2.
Basically all the patch does is replace the generic SetProperty call, which goes into C++ and then into self-hosted code, with a direct invocation of the self-hosted code. The generated code now looks like:
var tmp = {x: 22, y: 44};
ConvertAndCopyTo(Point, line, 8, tmp)
This is a slight optimization in that we’ve baked in the offset of the to field (8 bytes) and we avoid the JS-to-C++-to-JS transition, but we are still allocating a temporary and we’re still using a reflective walk to read out and copy the properties. What I’m looking at now is how we can eliminate those two parts.
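As a point of reference, the 8 here comes from the layout I am assuming for a LineType instance, with each int32 field packed into 4 bytes:
// offset  0: from.x
// offset  4: from.y
// offset  8: to.x   <- the "to" struct starts here
// offset 12: to.y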
Optimizing in stages
My plan is to add two optimizations. The first would expand calls to ConvertAndCopyTo into a series of assignments in the case where we know the type of the value being assigned. Therefore, a call like ConvertAndCopyTo(Point, line, 8, tmp) would be expanded in place to something like:
var tmp = {x: 22, y: 44};
line.to.x = int32(tmp.x);
line.to.y = int32(tmp.y);
Next, a second optimization would detect reads like tmp.x
where
we can statically determine the result and propagate the value. This
means that the code above would be optimized to:
var tmp = {x: 22, y: 44};
line.to.x = int32(22);
line.to.y = int32(44);
Finally, dead-code elimination should be able to remove the temporary.
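Putting the three pieces together, the hoped-for end state is roughly:
// no temporary object is allocated at all
line.to.x = int32(22);
line.to.y = int32(44);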
This approach has the advantage that the second optimization can benefit general code. For example, a common pattern in some of the PJS code I’ve looked at is to use constant vectors like so:
var constantFactors = [3.14159, 2.71828, 6.67384];
var x = constantFactors[0] * vector[0];
var y = constantFactors[1] * vector[1];
var z = constantFactors[2] * vector[2];
The same optimization would constant-propagate those uses. (I don’t believe we optimize this kind of code today; of course, if we do, that’s great, less work for me.)
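To illustrate, after constant propagation that snippet would become something like:
// the reads of constantFactors[i] are replaced by the literal values
var x = 3.14159 * vector[0];
var y = 2.71828 * vector[1];
var z = 6.67384 * vector[2];
If the array is otherwise unused, dead-code elimination could then remove constantFactors itself.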
One tricky case
However, after some discussion with jandem on IRC, I realized that this strategy was overly ambitious. In particular, the first step, which expands calls to ConvertAndCopyTo, provides no way to recover in case the JIT code should be invalidated. In the running example this isn’t an issue, but in general a read like tmp.x or tmp.y could in fact invoke a getter and have arbitrary side-effects. That means that we must be able to bail out of the jitted code and resume execution in the interpreter.
For example, imagine some code like:
line.to = evilObject;
This line would be optimized to:
ConvertAndCopyTo(Point, line, 8, evilObject)
and that call would in turn be expanded to:
line.to.x = int32(evilObject.x);
line.to.y = int32(evilObject.y);
Now imagine that evilObject.x is in fact a getter that modifies some global variables, and evilObject.y is a getter that throws an exception. Since throwing an exception boots us out of jitted code, that means we have to bail out to the interpreter while reading evilObject.y. The problem is that the interpreter only has a single bytecode for the entire assignment; it doesn’t have a way to represent having partially progressed through the assignment. And, since accessing evilObject.x actually mutates a global variable, we can’t just re-do the entire assignment.
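To make this concrete, here is a sketch of an object with that kind of behavior (the getters are purely illustrative):
var counter = 0; // stand-in for some global state
var evilObject = {
  // reading .x has a globally visible side-effect
  get x() { counter++; return 22; },
  // reading .y throws, booting us out of jitted code mid-assignment
  get y() { throw new Error("boom"); }
};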
To avoid this scenario, we can lean on TI to ensure that we only expand ConvertAndCopyTo calls when the source value’s properties are known to be normal data properties, and hence accessing them will not produce globally visible side-effects. This means that even if we do have to throw or revert, we can always bail out and repeat the entire assignment without ill effects.
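In other words, the check boils down to distinguishing cases like these (an illustrative sketch, not the actual TI query):
// OK to expand: x and y are plain data properties, so re-reading them
// when the assignment is repeated has no observable side-effects.
line.to = {x: 22, y: 44};

// Not OK to expand: reading x runs arbitrary code, so we keep the
// unexpanded ConvertAndCopyTo call.
line.to = { get x() { return Math.random(); }, y: 44 };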
At least that’s my plan! I hope to hack some on this tonight / tomorrow, so we’ll see if I encounter any surprises.