<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Baby Steps]]></title>
  <link href="http://smallcultfollowing.com/babysteps/atom.xml" rel="self"/>
  <link href="http://smallcultfollowing.com/babysteps/"/>
  <updated>2013-05-14T10:34:06-04:00</updated>
  <id>http://smallcultfollowing.com/babysteps/</id>
  <author>
    <name><![CDATA[Nicholas D. Matsakis]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Procedures, continued]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/05/14/procedures/"/>
    <updated>2013-05-14T09:20:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/05/14/procedures</id>
    <content type="html"><![CDATA[<p>So, I didn&#8217;t actually <em>mean</em> to post that previous post, I had
intended to think more on the idea. But oh well, cat&#8217;s out of the
bag. In any case, I&#8217;ve been thinking about the &#8220;closures&#8221; vs
&#8220;procedures&#8221; idea that I jotted down there and decided to try and
elaborate on it a bit more, since I find it has a lot of appeal. In
particular I think that the current collection of closure types is
addressing too many distinct use cases and the result is confusing.</p>

<p><em>UPDATE 2013.05.14:</em> Edited to tweak various errors and to
add some variations at the end that I prefer.</p>

<h3>Today: by-reference vs copying closures</h3>

<p>Today we offer three different kinds of closures (<code>&amp;fn</code>, <code>@fn</code>, and
<code>~fn</code>), but these closures can really be divided into two basic
categories: by-reference and copying closures. A by-reference closure
is the usual kind: it is allocated on the stack and has full access to
the variables in the creating stack frame. It can read them, write
them, and borrow them. These are used with for loops and the like.</p>

<p>Copying closures, on the other hand, are somewhat different. They are
not tied to any particular stack frame. Instead, they <em>copy</em> the
current values of the variables which they close over into their
environment (like all the default Rust copies, this is a shallow copy,
so if the value being closed over contains <code>~</code> pointers, it will no
longer be accessible from the creator). These closures are used
primarily as task bodies and for futures. There are some scattered
uses of <code>@fn</code> closures in the compiler but as far as I can tell they
are all legacy code that should eventually be purged and rewritten to
use traits (i.e., the visitor, the AST folder).</p>

<p>Loosely speaking, a <code>&amp;fn</code> closure is by-reference and <code>@fn</code> and <code>~fn</code>
closures are copying closures. But this is not strictly true. In fact,
the <code>&amp;fn</code> type can be either a by-reference closure <em>or</em> a copying
closure, because you are permitted to borrow a <code>@fn</code> or <code>~fn</code> to a
<code>&amp;fn</code>.  So the type in isolation does not tell you whether a closure
is by-reference or not. In fact, there is no explicit indication at
all. Instead, when you create a closure today (i.e., with a <code>|x, y|
...</code> expression), the compiler infers based on the expected types
whether this should be a by-reference closure or a copying
closure. Because the semantics of these two vary greatly, I find this
potentially quite confusing and unfortunate.</p>
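
<p>As a rough illustration of the two categories, here is a minimal
sketch in present-day Rust syntax, which differs from the syntax used
in this post. The function names are hypothetical, and the <code>move</code>
keyword stands in for what I call a copying closure here:</p>

<pre><code>// Present-day Rust; the function names are made up for illustration.

// By-reference: the closure borrows `total` from the enclosing stack
// frame and mutates it in place, as closures used with loops do.
fn sum_by_reference(items: &amp;[i32]) -&gt; i32 {
    let mut total = 0;
    items.iter().for_each(|x| total += *x);
    total
}

// Copying: `move` copies `base` into the closure's own environment, so
// the closure is not tied to this stack frame and can be returned.
fn make_adder(base: i32) -&gt; Box&lt;dyn Fn(i32) -&gt; i32&gt; {
    Box::new(move |x| base + x)
}
</code></pre>

<p>Here <code>sum_by_reference(&amp;[1, 2, 3])</code> evaluates to 6 and
<code>make_adder(10)(5)</code> to 15. The second closure outlives the call that
created it, which a by-reference closure could not do.</p>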

<h3>Tomorrow (perhaps): closures and procedures</h3>

<p>In general, I would prefer to draw a starker line between copying and
by-reference closures. I propose to use the term <em>closure</em> to refer
only to by-reference, stack-allocated closures. We could then use
another term, perhaps <em>procedure</em>, to refer to the copying
closures. This would mean that our type hierarchy would look like:</p>

<pre><code>T = S               // sized types
  | U               // unsized types
S = fn(S*) -&gt; S     // closures (*)
  | &amp;'r T           // region ptr
  | @T              // managed ptr
  | ~T              // unique ptr
  | [S, ..N]        // fixed-length array
  | uint            // scalars
  | ...
U = [S]             // vectors
  | str             // string
  | Trait           // existential ("exists S:Trait.S")
  | proc(S*) -&gt; S   // procedures (*)
</code></pre>

<p>This chart is basically the same as the one you will find in the
<a href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/dynamically-sized-types/">dynamically sized types</a> post from before with one crucial
difference: closure types have been split from procedures, and closure
types have moved into the category of <em>sized types</em>, meaning that you
no longer write an explicit sigil when you use one. This is because
the representation of a closure would always be a pair of a borrowed
pointer into the stack and a function pointer: the type has a fixed
size (two words) and requires no memory allocation.</p>

<p>I have chosen to leave procedures as unsized, since a procedure must
allocate memory on the heap, and this allows the user to select which
heap is used; in earlier drafts of this idea, I had modified
procedures to implicitly use the exchange heap, meaning that a type like
<code>proc()</code> always represented an exchange heap allocation. But I think
it&#8217;s more consistent to have that type be written <code>~proc</code>, and it
maintains the general Rust invariant &#8220;you don&#8217;t have allocation unless
you see a sigil&#8221;.</p>

<p><em>UPDATE:</em> bstrie on IRC asked about fn items, which never have any
environment. As today, these would continue to be coercible to either
a closure or a procedure.</p>

<h3>Closure and procedure expressions</h3>

<p>Closures would still be created with the form <code>|x, y|
expr</code>. Procedures would be created using the keyword <code>proc</code>: <code>proc(x,
y) expr</code>. If desired, we could integrate procedures into <code>do</code> using
some syntax like one of the following, depending on whether we wish
to make the sigil explicit:</p>

<pre><code>do spawn proc { ... }   // sigil inferred

do spawn ~proc { ... }  // sigil explicit
</code></pre>

<h3>Closure and procedure types in more detail</h3>

<p>The full function or procedure type would look something like this
(<code>[]</code> indicates optional content):</p>

<pre><code> [once] (fn|proc) [:['r] [Bounds]] &lt;'a...&gt; (S*) -&gt; S
 ^~~~~^           ^~~~~~~~~~~~~~~^ ^~~~~~^ ^~~^    ^
   |                     |            |     |      |
   |                     |            |     |  Return type
   |                     |            |    Argument types
   |                     |          Bound lifetime names
   |               Lifetime and trait bounds
Onceness
</code></pre>

<p>Here the &#8220;onceness&#8221; indicates whether the closure/procedure can be
called more than once. The &#8220;lifetime and trait bounds&#8221; indicate
constraints on the environment. The lifetime bound <code>'r</code> indicates the
minimum lifetime of the variables that the closure/procedure closes
over, and the &#8220;bounds&#8221; (if any) would give bounds on the types of
those variables. Finally, you have the argument and return types.</p>

<p>If omitted, the default bounds for a closure would be a fresh lifetime
and no type bounds. The default bounds for a procedure would be the
static lifetime and <code>Owned</code>.</p>
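
<p>To make &#8220;onceness&#8221; concrete, here is a small sketch in present-day
Rust syntax (not the syntax proposed above). The names <code>call_once</code>
and <code>demo_once</code> are hypothetical, and the <code>FnOnce</code> bound is used
as a rough stand-in for <code>once</code>:</p>

<pre><code>// Present-day Rust; `call_once` and `demo_once` are hypothetical names.

// The `FnOnce` bound plays roughly the role of `once`: the callee may
// invoke `f` at most one time.
fn call_once&lt;F: FnOnce() -&gt; String&gt;(f: F) -&gt; String {
    f()
}

fn demo_once() -&gt; String {
    let greeting = String::from("hello");
    // Returning `greeting` moves it out of the closure's environment,
    // which is only sound because the closure cannot run a second time.
    call_once(move || greeting)
}
</code></pre>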

<h3>Use cases</h3>

<p>Let&#8217;s look briefly at the use cases I listed before.</p>

<h4>Higher-order and once functions</h4>

<p>Typical uses for higher-order and once functions look much the same as
before, but minus a sigil.</p>

<pre><code>impl&lt;T:Sized&gt; for [T] {
    pub fn map&lt;U:Sized&gt;(f: fn(&amp;T) -&gt; U) -&gt; ~[U] { ... }
                        // ^~~~~~~~~~~
}

impl&lt;T:Sized&gt; for Option&lt;T&gt; {
    // `each` on an option type can only execute at most once:
    pub fn each(f: once fn(&amp;T) -&gt; bool) -&gt; bool { ... }
                // ^~~~~~~~~~~~~~~~~~~
}
</code></pre>

<p>For contrast, these are <code>&amp;fn(&amp;T) -&gt; U</code> and <code>&amp;once fn(&amp;T) -&gt; bool</code> today.</p>

<h4>Sendable functions and sendable once functions</h4>

<p>Here is an example of a sendable once function:</p>

<pre><code>fn spawn(f: ~once proc()) {...}
         // ^~~~~~~~~~
</code></pre>

<p>As we saw before, one would write one of the following to call this
function:</p>

<pre><code>do spawn proc { ... }
spawn(proc { ... })
</code></pre>

<p>Creating a future would look like <code>future(proc expr)</code> (vs <code>future(||
expr)</code> today).</p>

<h4>Const closures</h4>

<p>One could still use const closures to achieve lightweight parallelism:</p>

<pre><code>impl&lt;T:Sized&gt; for [T] {
    pub fn par_map&lt;U:Sized&gt;(f: fn:Const(&amp;T) -&gt; U) -&gt; ~[U] { ... }
                            // ^~~~~~~~~~~~~~~~~
}
</code></pre>

<p>However, I have been thinking that we&#8217;ll have to be careful here: we
need some way to guarantee that the closure does not move from its
environment and then replace the moved value. Today this is illegal,
but if we can prevent closures from recursing (which we must do
anyhow) then we could make such moves legal, and it would be useful
sometimes. One simple solution is to say that if the closure type has
a <code>Const</code> bound, moves are illegal, but it&#8217;s a bit&#8230;ad-hoc, since the
bounds are only supposed to be constraining the <em>types</em> of the
variables that are closed over. Still, it might be good enough.</p>

<h4>Sendable const functions and combinators</h4>

<p>As I argued before, I think these are not important use cases, but
with procedures they actually work out fine (though not with
variations 2 and 3 below). A sendable const function can be expressed
with the type <code>~proc:Owned+Const()</code>, which is complex, but then it
<em>is</em> a complex idea. Combinator types would likely look like <code>@proc</code>
or <code>@proc:'r</code>, in the case where the combinator closes over borrowed
data.</p>

<h3>Variation #1: Leaving procedures out of the core language</h3>

<p>In fact, I think <code>proc</code> types need not be built into the language; you
could model them with traits, though you&#8217;d probably want a macro like
<code>proc!(...)</code> for defining the proc body. This would also mean that
procedures can&#8217;t be used with the <code>do</code> form.</p>

<h3>Variation #2: Limit procedures to execute once</h3>

<p>I don&#8217;t know of any (good) use cases for non-once procedures.  I
think they should just always be <code>once</code>. This would mean that the only
closure types that are commonly needed would be:</p>

<ol>
<li><code>fn(T)</code> &#8211; normal higher-order functions</li>
<li><code>once fn(T)</code> &#8211; higher-order functions that execute at most once</li>
<li><code>~proc(T)</code> &#8211; procedures</li>
</ol>


<p>Because procedures can always be desugared into a struct and a trait,
this would not lose any expressiveness.</p>
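
<p>A minimal sketch of that desugaring, again in present-day Rust syntax
rather than the syntax of this post. The names <code>Proc</code>, <code>Greet</code>,
and <code>run</code> are hypothetical; the struct holds the values that a
procedure would have copied into its environment:</p>

<pre><code>// Present-day Rust; `Proc`, `Greet`, and `run` are hypothetical names.

// Stand-in for a once-only procedure type: calling consumes the box.
trait Proc {
    fn call(self: Box&lt;Self&gt;) -&gt; String;
}

// The environment: values copied out of the creating stack frame.
struct Greet {
    name: String,
}

impl Proc for Greet {
    fn call(self: Box&lt;Self&gt;) -&gt; String {
        format!("hello, {}", self.name)
    }
}

// A function that accepts any heap-allocated, once-only procedure.
fn run(p: Box&lt;dyn Proc&gt;) -&gt; String {
    p.call()
}
</code></pre>

<p>Because <code>call</code> takes the box by value, a second call on the same
procedure is rejected at compile time, which matches the always-<code>once</code>
restriction.</p>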

<h3>Variation #3: Limit procedures to execute once and use exchange heap</h3>

<p>For maximum streamlining, we could make <code>proc</code> implicitly use <code>~</code>,
in which case the commonly needed types would be written:</p>

<ol>
<li><code>fn(T)</code> &#8211; normal higher-order functions</li>
<li><code>once fn(T)</code> &#8211; higher-order functions that execute at most once</li>
<li><code>proc(T)</code> &#8211; procedures</li>
</ol>


<p>These types read pretty well, I think.</p>

<h3>Summary</h3>

<p>I have long been unsatisfied with the implicit and confusing divide
between &#8220;by reference&#8221; and &#8220;copying&#8221; closures. Splitting them into two
concepts seems to address a lot of issues and be an overall win to me.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Mutable fn alternatives]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/05/13/mutable-fn-alternatives/"/>
    <updated>2013-05-13T17:31:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/05/13/mutable-fn-alternatives</id>
    <content type="html"><![CDATA[<p>I&#8217;ve been thinking about what I wrote in my last post regarding
closures and I am beginning to change my opinion about the correct
solution. <code>fn~</code> just seems so unfortunate. So, besides writing <code>fn~</code>,
what are the other options?  I just thought I&#8217;d write down a few of
the other ideas I&#8217;ve come up with for later reference.  Not saying any
of the ideas in this post are good yet.</p>

<h3>Just write <code>&amp;mut fn()</code></h3>

<p>Maybe it&#8217;s not so bad. It is advertising the possibility that the
closure may mutate its environment. This would mean that while <code>&amp;fn()</code>
is a valid type, it is a type that does not permit the function to be
called, much as <code>&amp;&amp;mut</code> (pointer to a mutable borrowed pointer) does
not permit the mutable borrowed pointer to be used.</p>

<p>At first I was thinking that there is also a valid interpretation for
<code>&amp;fn</code>, meaning a function that does not mutate the variable in its
environment, but then I realized that per the DST proposal any <code>&amp;mut
fn</code> could be borrowed to <code>&amp;fn</code>, and so that would not be sound.</p>

<h3>Remove everything but borrowed closures</h3>

<p>We could just <em>only have</em> borrowed closures. The type would be written
<code>fn[:bounds]()</code> or <code>once fn[:bounds]()</code>. There&#8217;d be no need to notate
the kind of environment pointer: it&#8217;s always a borrowed pointer. All
other uses of closures would be expressed using traits and impls.</p>

<p>Mainly this means that code which spawns tasks would get somewhat
verbose, because you would need to create a struct or some other type
to capture all of the upvars. For larger tasks, this is not a big
deal, but for some code it could be rather annoying. I imagine futures
in particular would become much more verbose; enough so as to be
nearly unusable.</p>

<p>On the upside, there&#8217;d be no more confusion about whether a closure
copies its environment or not (no, it never does). Closure types would
be simpler (no need to worry about sigils). You&#8217;d write <code>fn()</code> or
<code>once fn()</code> in all but the most esoteric cases. The code to manage
closures would become much simpler.</p>

<h3>Add a new keyword for what is now called an &#8220;owned closure&#8221;</h3>

<p>This is basically the <code>fn~</code> solution with another name. Rather than
writing <code>fn~</code> to indicate a closure value that owns its environment,
we could write <code>proc</code> (for procedure) or something like that.  This
avoids the annoying &#8220;sigil after the name&#8221;, at the cost of a new
keyword.</p>

<p>Procedures could probably <em>always</em> be single-shot (that is, <code>once</code>).
Almost all use cases for them (futures, tasks, etc) are single-shot,
and the others could probably be accommodated with traits instead. But
we could also distinguish between a <code>proc</code> and a <code>once proc</code> if we
wanted.</p>

<p>Procedures would probably be less interoperable with functions, since
the name does not particularly suggest interoperability. For example,
I imagine you could not use a <code>proc</code> where a <code>fn</code> is expected. I don&#8217;t
know of any time that this is actually important.</p>

<p>Using a different name also helps to draw a clear line between
&#8220;closures&#8221; (which reference the variables in the stack frame that
created them) and &#8220;procedures&#8221; (which copy out from that stack frame).
I personally would prefer to designate procedures with a different
syntax, e.g., <code>proc(x, y) { ... }</code> in place of <code>|x, y| ...</code>, but this
is not <em>necessary</em> (as an aside, I had hoped to write some today about
why I think our current use of <code>||</code> to designate any kind of closure
is troublesome and should be changed, before I realized that we&#8217;d have
to address this problem I&#8217;m thinking over instead).</p>

<h3>More ideas?</h3>

<p>Ok, that&#8217;s most of the more radical ideas I&#8217;ve had so far. I&#8217;ll have
to keep thinking on it.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Recurring closures and dynamically sized types]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/05/13/recurring-closures-and-dynamically-sized-types/"/>
    <updated>2013-05-13T10:35:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/05/13/recurring-closures-and-dynamically-sized-types</id>
    <content type="html"><![CDATA[<p>I realized today that there is an unfortunate interaction between the
proposal for <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/dynamically-sized-types/">dynamically sized types</a> and closure types. In
particular, in the <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/the-case-of-the-recurring-closure/">case of the recurring closure</a>, I described
the soundness issues that arise in our language when closures are able
to recurse.</p>

<p>My solution for this was to make the type system treat a <code>&amp;fn()</code> value
the same way it treats <code>&amp;mut T</code> pointers: they would be non-copyable,
and when you invoke them, that would be effectively like a &#8220;mutable
borrow&#8221;, meaning that for the duration of the call the original value
would become inaccessible. So in short the type system would guarantee
that when you call a closure, that same closure is not accessible from
any other path in the system, just as we now guarantee that when you
mutate a value, that same value is not accessible from any other path
in the system.</p>

<p>This is all well and good, and I think this treatment would be largely
invisible to the user under common access patterns. However, it does
not play well with the proposal for <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/dynamically-sized-types/">dynamically sized types</a>,
because under this proposal all things written <code>&amp;T</code> must behave the
same, no matter what <code>T</code> is. This is in fact <em>the whole point</em> of the
proposal! But here I want to treat <code>&amp;fn</code> specially.</p>

<p>I&#8217;ve been pondering various solutions this morning. I have come up
with two possible avenues:</p>

<ol>
<li><p>Instead of writing <code>&amp;fn()</code> you could write <code>&amp;mut fn()</code>. This is
perhaps the &#8220;principled&#8221; solution, but I consider it rather a
non-starter.  Writing <code>&amp;fn()</code> for a closure is&#8230;tolerable, but <code>&amp;mut
fn()</code> is not. It&#8217;s verbose and it seems sort of nonsensical (although
there is some logic to it, when you consider that calls to the
function may mutate the environment and so forth).</p></li>
<li><p>We go back to the older notation and move sigils for closures
<em>after</em> the fn. This actually has some notational perks. For example,
rather than writing <code>&amp;fn()</code> we can just write <code>fn()</code> (if there is no
sigil, we can default to <code>&amp;</code>). On the minus side, a sendable closure
would be written <code>fn~()</code>&#8212;but, then again, under the dynamically
sized types proposal, sendable closures were going to be written
<code>~fn:Owned()</code>, so is <code>fn~()</code> really so bad?</p></li>
</ol>


<p>More details after the fold.</p>

<!-- more -->


<p>OK, let&#8217;s dig into the details a bit more. As anyone who has been
following my blog posts probably knows by now, there are many, many
use cases for closures. I want to dive into the use cases that are on
my mind and elaborate on them. I also want to take this chance to write
up a bit more thoroughly how I think closures should work, including a
few unrelated issues.</p>

<h3>Syntax and use cases</h3>

<p>Here is a list of use cases to be accommodated:</p>

<ol>
<li>&#8220;Higher-order functions&#8221;: simple functions like <code>map</code>, <code>fold</code>
and so forth. By far the most common use case.</li>
<li>&#8220;Once functions&#8221;: functions that can only execute once. This means
that they can move values out of their environment.</li>
<li>&#8220;Sendable functions&#8221;: functions that can be sent between tasks.
This means that they only close over &#8220;sendable&#8221; values (no
garbage-collected data or borrowed pointers).</li>
<li>&#8220;Sendable once functions&#8221;: sendable functions that can only execute
once. This is what a task body will be.</li>
<li>&#8220;Const functions&#8221;: functions that do not close over mutable state.
We don&#8217;t make much use of this yet, but I plan to do so in order to
achieve lightweight fork-join parallelism a la <a href="http://smallcultfollowing.com/babysteps/blog/categories/pjs">PJS</a>.</li>
</ol>


<p>The use cases above seem to me to be the &#8220;bread and butter&#8221; cases that
will arise frequently. I will go over the syntax and give an example
for each of those use cases shortly. Interestingly, I think that all
of them actually read reasonably well if the sigils are moved after
the <code>fn</code> keyword, and in some cases the examples read much better.</p>

<p>However, there are two additional use cases that I have considered in
the past which I left out. These use cases become significantly harder
to read under the new proposal (though they were always hard to read).
Interestingly, I realized while writing this blog post that I think
these use cases are no longer terribly important, since both of them
can be expressed equally well using objects instead of closures, as I
will explain shortly. The two use cases are:</p>

<ol>
<li>&#8220;Sendable const functions&#8221;: functions that can be sent between tasks
<em>and</em> do not close over mutable state. You could safely share such
functions between tasks in an ARC (atomically reference-counted
container) and execute them multiple times in parallel.</li>
<li>&#8220;Combinators&#8221;: combinator libraries create <em>and return</em> closures that
close over their arguments, which may include borrowed values.</li>
</ol>


<h4>Higher-order functions</h4>

<p>Here is an example of a simple higher-order function (with the closure
type highlighted):</p>

<pre><code>impl&lt;T:Sized&gt; for [T] {
    pub fn map&lt;U:Sized&gt;(f: fn(&amp;T) -&gt; U) -&gt; ~[U] { ... }
                        // ^~~~~~~~~~~
}
</code></pre>

<p>For contrast, this is <code>&amp;fn(&amp;T) -&gt; U</code> today.</p>

<h4>Once functions</h4>

<p>Here is an example of a higher-order function that executes at most
once:</p>

<pre><code>impl&lt;T:Sized&gt; for Option&lt;T&gt; {
    pub fn each(f: once fn(&amp;T) -&gt; bool) -&gt; bool { ... }
                // ^~~~~~~~~~~~~~~~~~~
}
</code></pre>

<p>For contrast, this is <code>&amp;once fn(&amp;T) -&gt; bool</code> today.</p>

<h4>Sendable functions and sendable once functions</h4>

<p>Here is an example of a sendable once function:</p>

<pre><code>fn spawn(f: once fn~()) {...}
         // ^~~~~~~~~~
</code></pre>

<p>The <code>~</code> after the <code>fn</code> tells the type system that the environment for
this function is allocated using an owned pointer. It also implies a
default bound of <code>Owned</code>. The <code>once</code> tells the type system that the
function will only execute once.</p>

<p>For contrast, this is <code>~once fn()</code> today.</p>

<h4>Const functions</h4>

<p>Here is an example of how I would use a const function to achieve
lightweight parallelism:</p>

<pre><code>impl&lt;T:Sized&gt; for [T] {
    pub fn par_map&lt;U:Sized&gt;(f: fn:Const(&amp;T) -&gt; U) -&gt; ~[U] { ... }
                            // ^~~~~~~~~~~~~~~~~
}
</code></pre>

<p>This is a parallel map function. It is similar to the regular map
except that its iterations execute in parallel. As a consequence, it
demands a <code>fn:Const</code> rather than a <code>fn</code>&#8212;the <code>Const</code> bound specifies
that all the environmental state must be immutable. This is exactly
the &#8220;patient parent&#8221; or &#8220;parallel closures&#8221; model that is used in
<a href="http://smallcultfollowing.com/babysteps/blog/categories/pjs">PJS</a> and described in <a href="https://www.usenix.org/conference/hotpar12/parallel-closures-new-twist-old-idea">this HotPar paper I wrote</a>.</p>
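
<p>For a feel of how such a parallel map behaves, here is a sketch in
present-day Rust syntax using scoped threads from the standard library.
It is not the proposed <code>fn:Const</code> design: the <code>Sync</code> bound is
only a loose analog of <code>Const</code>, and spawning one thread per element
is an assumption made to keep the example short:</p>

<pre><code>use std::thread;

// Present-day Rust. `Sync` on the closure is only a loose analog of the
// `Const` bound: it lets many threads call `f` through a shared reference.
fn par_map&lt;T, U, F&gt;(items: &amp;[T], f: F) -&gt; Vec&lt;U&gt;
where
    T: Sync,
    U: Send,
    F: Fn(&amp;T) -&gt; U + Sync,
{
    let f = &amp;f;
    // The scope plays the patient parent: it waits for every child
    // before returning, so children may borrow from this stack frame.
    thread::scope(|s| {
        // One thread per element keeps the sketch short; it is wasteful.
        let handles: Vec&lt;_&gt; = items
            .iter()
            .map(|item| s.spawn(move || f(item)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
</code></pre>

<p>Results come back in input order because the handles are joined in
the order they were spawned.</p>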

<p>For contrast, this is <code>&amp;fn:Const()</code> today.</p>

<h4>Sendable const functions</h4>

<p>Sendable const functions are one of the two cases that I said would
become less attractive under the new proposal. They would look
something like <code>fn~:Const</code> (vs <code>~fn:Const</code> today). The newer syntax
works and should be available, but it&#8217;s hard to read, due I think to
the juxtaposition of <code>~</code> (which specifies the kind of pointer used for
the environment) and the <code>:</code> that begins the bound specifier <code>:Const</code>.
If this use case were important, I might be worried that the syntax is
too ugly, but when I tried to come up with an example for where this
use case would be needed, I realized that time has left the use case
behind to some extent.</p>

<p>The primary use case for a sendable const function initially was to
allow hashtables to be placed in ARCs&#8212;the reason for this was that a
<code>HashMap</code> requires closures for computing the hash function of its
argument, and so to share the hashmap (and perform parallel
lookups) we had to be sure that the closures would not mutate any
state. However, this is somewhat outdated, because hashing and
equality comparison today are based on traits rather than closures.</p>

<p>Now, using traits is somewhat limited, because due to coherence it
means that any one type can only be hashed in one way, and sometimes
you would like to have specialized hashing for specific circumstances.
But these use cases can easily be accommodated in three ways:</p>

<ol>
<li>Using newtyped keys (<code>struct MyKey(key)</code>) and defining different
implementations for the hashing and equality traits on <code>MyKey</code>.</li>
<li>If a newtyped key is not acceptable, you can write a hash table
that takes a simple function pointer (<code>extern "Rust" fn</code>) rather
than a closure. Function pointers carry no state, but state is
rarely needed for equality comparisons.</li>
<li><p>If you really need state, then you can write a specialized trait
in lieu of a closure:</p>

<pre><code>trait HashFuncs&lt;K&gt; {
    fn hash(&amp;self, k: &amp;K) -&gt; uint;
    fn eq(&amp;self, k1: &amp;K, k2: &amp;K) -&gt; bool;
}
</code></pre>

<p>Now your hashtable can either take a <code>~HashFuncs</code> object to use
for hashing and equality comparison or, if you wish to avoid
dynamic dispatch for performance reasons, you can parameterize
your hashtable type by the instance of <code>HashFuncs</code> that it should use:</p>

<pre><code>struct MyHashMap&lt;K,V,F:HashFuncs&lt;K&gt;&gt; {
    f: F,
    ...
}
</code></pre></li>
</ol>
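
<p>The third option can be fleshed out as follows. This is a sketch in
present-day Rust syntax, not the syntax of this post, and everything
beyond the <code>HashFuncs</code> trait and the <code>f</code> field is an assumption
added for illustration: the <code>CaseInsensitive</code> instance, the
<code>buckets</code> field, and the method bodies.</p>

<pre><code>// Present-day Rust. `CaseInsensitive`, `buckets`, and the method
// bodies are assumptions added for illustration.

trait HashFuncs&lt;K&gt; {
    fn hash(&amp;self, k: &amp;K) -&gt; u64;
    fn eq(&amp;self, k1: &amp;K, k2: &amp;K) -&gt; bool;
}

// A stateful instance: case-insensitive string keys with a seed.
struct CaseInsensitive {
    seed: u64,
}

impl HashFuncs&lt;String&gt; for CaseInsensitive {
    fn hash(&amp;self, k: &amp;String) -&gt; u64 {
        k.bytes().fold(self.seed, |h, b| {
            h.wrapping_mul(31).wrapping_add(b.to_ascii_lowercase() as u64)
        })
    }
    fn eq(&amp;self, k1: &amp;String, k2: &amp;String) -&gt; bool {
        k1.eq_ignore_ascii_case(k2)
    }
}

// Static dispatch: the map is parameterized by the `HashFuncs` instance.
struct MyHashMap&lt;K, V, F: HashFuncs&lt;K&gt;&gt; {
    f: F,
    buckets: Vec&lt;Vec&lt;(K, V)&gt;&gt;,
}

impl&lt;K, V, F: HashFuncs&lt;K&gt;&gt; MyHashMap&lt;K, V, F&gt; {
    // Assumes `n_buckets` is greater than zero.
    fn new(f: F, n_buckets: usize) -&gt; Self {
        MyHashMap { f, buckets: (0..n_buckets).map(|_| Vec::new()).collect() }
    }

    fn index(&amp;self, k: &amp;K) -&gt; usize {
        (self.f.hash(k) % self.buckets.len() as u64) as usize
    }

    fn insert(&amp;mut self, k: K, v: V) {
        let i = self.index(&amp;k);
        let f = &amp;self.f;
        let bucket = &amp;mut self.buckets[i];
        // Replace the value if an equal key (per `f.eq`) already exists.
        let found = bucket.iter().position(|entry| f.eq(&amp;entry.0, &amp;k));
        match found {
            Some(p) =&gt; bucket[p].1 = v,
            None =&gt; bucket.push((k, v)),
        }
    }

    fn get(&amp;self, k: &amp;K) -&gt; Option&lt;&amp;V&gt; {
        self.buckets[self.index(k)]
            .iter()
            .find(|entry| self.f.eq(&amp;entry.0, k))
            .map(|entry| &amp;entry.1)
    }
}
</code></pre>

<p>With this instance, inserting under <code>"Key"</code> and then under <code>"KEY"</code>
overwrites the first value, and a lookup of <code>"key"</code> finds it. The state
(the seed) lives in the <code>HashFuncs</code> value, so no closure is needed.</p>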


<h4>Combinators</h4>

<p>General purpose combinators are the other case that (might) get less
attractive. This is less clear cut. The idea of a combinator library
is that you have functions that return functions, and then you can
compose these functions into bigger functions. The most common example
is a <a href="http://en.wikipedia.org/wiki/Parser_combinator">parser combinator</a>, which is a simple way to create
inefficient and buggy parsers (ok, that&#8217;s unfair, but I couldn&#8217;t
resist; I&#8217;ve had some bad experiences trying to scale up parser
combinators&#8212;truth is, they are super nice to work with, at least
until things go wrong).</p>

<p><em>Anyway,</em> a typical parser combinator library would begin with a primitive
like the following:</p>

<pre><code>fn expect(c: char) -&gt; fn@(&amp;mut ParseState) -&gt; Result&lt;(), Err&gt; { ... }
                   // ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
</code></pre>

<p>Note that the function returns a closure. We used <code>fn@</code> because this
closure must be allocated on some heap in order for us to return it,
and because using the type <code>fn@</code> (vs say <code>fn~</code>) would allow us to
close over managed and other task-local data. So far, I think this
example works out fine.</p>

<p>Where things get more complex is if we want to close over borrowed
pointers.  For example, imagine an <code>expect</code> function that takes a
slice:</p>

<pre><code>fn expect_string&lt;'a&gt;(s: &amp;'a str)
                     -&gt; fn@:'a(&amp;mut ParseState) -&gt; Result&lt;(), Err&gt; {...}
                     // ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
</code></pre>

<p>Here the type system will require that the lifetime <code>'a</code> of the input
slice <code>s</code> appear in the resulting function type, so that it can be
sure that the function is not used after the slice is no longer valid.
This makes the type more complicated: <code>fn@:'a</code> (vs the
also-not-especially-intuitive notation of <code>@'a fn</code> today).</p>

<p>Of course, one could address this problem by having <code>expect_string</code>
take a <code>~str</code> or <code>@str</code> instead of a borrowed string, but in some use
cases borrowed pointers may make perfect sense. For example, I had once
thought to use this pattern to create a combinator library for
expressing iteration primitives like <code>enumerate</code> and so forth
(similar, experimental work is now underway in the <code>iter</code> module).</p>

<p>Interestingly, just as with sendable const closures, objects and
traits can provide an alternative that is ultimately (I think) a
better and more readable design anyway. We could rewrite the return
type from a closure into a trait:</p>

<pre><code>trait Parser&lt;R&gt; {
    fn parse(&amp;self, state: &amp;mut ParseState) -&gt; Result&lt;R,Error&gt;;
}

fn expect(c: char) -&gt; @Parser&lt;()&gt;;
fn expect_string&lt;'a&gt;(s: &amp;'a str) -&gt; @Parser:'a&lt;()&gt;;
</code></pre>

<p>Here in the <code>expect_string</code> case I have taken advantage of the fact
that object types will also carry bounds similar to closure types.  An
advantage of this design is that using a trait allows the <code>Parser</code>
objects to carry more methods as well.</p>

<p>If we were to extend the example to include an actual <em>combinator</em>,
I imagine it would look something like this:</p>

<pre><code>fn or&lt;'a, R&gt;(p1: @Parser:'a&lt;R&gt;, p2: @Parser:'a&lt;R&gt;) -&gt; @Parser:'a&lt;R&gt; {...}
</code></pre>

<p>Of course, for maximum efficiency, one would avoid using object types
altogether. Then you would just implement <code>Parser</code> directly on
the <code>char</code> and <code>&amp;str</code> types, and perhaps write the <code>or</code> combinator
like so:</p>

<pre><code>struct or&lt;P1,P2&gt;(P1, P2);

impl&lt;R,P1:Parser&lt;R&gt;,P2:Parser&lt;R&gt;&gt; Parser&lt;R&gt; for or&lt;P1,P2&gt; {
  fn parse(&amp;self, state: &amp;mut ParseState) -&gt; Result&lt;R,Error&gt; {
    let (ref p1, ref p2) = *self;
    state.try(); // (*)
    match p1.parse(state) {
      Ok(r) =&gt; { state.confirm(); Ok(r) }
      Err(_) =&gt; { state.backtrack(); p2.parse(state) }
    }
  }
}

// (*) Here you see my imperative roots. A true functional
// programmer would not use in-place mutation here but rather
// clone and return a new parser state.
</code></pre>
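
<p>For readers who want something they can compile, here is a compact
version of the same design in present-day Rust syntax, which differs
from the syntax in this post. The <code>ParseState</code> layout (input bytes
plus a cursor), the unit-like <code>Error</code> type, and the capitalized name
<code>Or</code> are assumptions, and backtracking is done by saving and
restoring the cursor rather than through <code>try</code>/<code>confirm</code>/<code>backtrack</code>
methods:</p>

<pre><code>// Present-day Rust; `ParseState`, `Error`, and `Or` are assumptions.

// Hypothetical parse state: the input bytes plus a cursor.
struct ParseState&lt;'a&gt; {
    input: &amp;'a [u8],
    pos: usize,
}

#[derive(Debug, PartialEq)]
struct Error;

trait Parser&lt;R&gt; {
    fn parse(&amp;self, state: &amp;mut ParseState) -&gt; Result&lt;R, Error&gt;;
}

// `Parser` implemented directly on `char`: expect exactly that character.
impl Parser&lt;()&gt; for char {
    fn parse(&amp;self, state: &amp;mut ParseState) -&gt; Result&lt;(), Error&gt; {
        match state.input.get(state.pos) {
            Some(&amp;b) if b as char == *self =&gt; {
                state.pos += 1;
                Ok(())
            }
            _ =&gt; Err(Error),
        }
    }
}

// And on `&amp;str`: expect the whole string at the cursor.
impl&lt;'s&gt; Parser&lt;()&gt; for &amp;'s str {
    fn parse(&amp;self, state: &amp;mut ParseState) -&gt; Result&lt;(), Error&gt; {
        let rest = &amp;state.input[state.pos..];
        if rest.starts_with(self.as_bytes()) {
            state.pos += self.len();
            Ok(())
        } else {
            Err(Error)
        }
    }
}

// The combinator is a plain struct; no closures or objects are involved.
struct Or&lt;P1, P2&gt;(P1, P2);

impl&lt;R, P1: Parser&lt;R&gt;, P2: Parser&lt;R&gt;&gt; Parser&lt;R&gt; for Or&lt;P1, P2&gt; {
    fn parse(&amp;self, state: &amp;mut ParseState) -&gt; Result&lt;R, Error&gt; {
        let start = state.pos; // remember where to backtrack to
        match self.0.parse(state) {
            Ok(r) =&gt; Ok(r),
            Err(_) =&gt; {
                state.pos = start;
                self.1.parse(state)
            }
        }
    }
}
</code></pre>

<p>A value such as <code>Or("let", 'x')</code> then accepts either the string
<code>let</code> or the character <code>x</code>, and the whole combinator is resolved
statically.</p>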

<h3>Summary</h3>

<p>Another long post mostly targeted at rust devs and myself. Sorry about
that. I think the bottom line is that we should move sigils for
closures and have them appear after the <code>fn</code> keyword. This makes me
sad, because this is how things used to be, and in fact one of the
main goals of the <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/dynamically-sized-types/">dynamically sized types (DST)</a> proposal was to
move the sigils in closure types in front. But of course soundness
comes first, and I think the general wins of the DST proposal
(consistent behavior for all <code>&amp;T</code>, <code>@T</code>, <code>~T</code> etc) outweigh the need
to write <code>fn~</code> on occasion (I don&#8217;t really see much use for <code>fn@</code>).</p>

<p>There is also one final solution I didn&#8217;t mention in my initial
paragraphs. We could adopt the &#8220;principled&#8221; solution of using <code>&amp;mut</code>
for closures but change the way we notate <code>&amp;mut</code>. I have largely
avoided thinking about it because I want to avoid destabilizing syntax
changes. However, I have toyed on occasion with an idea for
reorganizing our types to emphasize ownership and de-emphasize
mutability, which goes in this direction.  I may indulge myself and
write it up at some point. Still, I largely consider this a
non-starter.</p>

<p>Adopting the &#8220;move sigils in back&#8221; proposal does have another
casualty, though. There has been some talk of figuring out ways to
make <code>@</code> and <code>~</code> less special (as in, allowing user-defined pointer
types like <code>RefCounted&lt;T&gt;</code> that are on equal footing). The DST
proposal is clearly a step in that direction. Moving the sigils
backwards on <code>fn</code> types is, well, a step backward, because closures
would always be allocated using a limited set of allocators (stack,
<code>~</code>, or <code>@</code>).</p>

<p>In an odd way, finding this interaction makes me feel good. I&#8217;ve been
concerned that the DST proposal seemed too easy, which meant we
weren&#8217;t thinking hard enough about it. But there is another reason as
well: I have also been concerned that closure types were becoming a
bit too&#8230;  special, particularly with regard to
copyability. Basically I&#8217;ve been concerned that although the syntax
for a borrowed closure was <code>&amp;fn</code>, borrowed closures didn&#8217;t really
behave like <code>&amp;</code> pointers&#8212;without the DST proposal, this was something
that we could safely enforce as part of the type system, but it&#8217;s
still confusing for users. So I think the DST proposal forces us to be
more honest, and that&#8217;s a good thing all around.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Dynamically sized types, revisited]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/dynamically-sized-types/"/>
    <updated>2013-04-30T20:06:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/04/30/dynamically-sized-types</id>
    <content type="html"><![CDATA[<p>Recently, separate discussions with pnkfelix and graydon have prompted
me to think a bit about &#8220;dynamically sized types&#8221; once again. Those
who know Rust well know all about the sometimes annoying discrepancy
between a type like <code>~T</code> (owned pointer to <code>T</code>) and <code>~[S]</code> (owned
vector of <code>S</code> instances)&#8212;in particular, despite the visual
similarity, there is no type <code>[S]</code>, so <code>~[S]</code> is not an instance of
<code>~T</code> for any <code>T</code>. This design was the outcome of a lot of
back-and-forth and I think it has generally served us well, but I&#8217;ve
always had this nagging feeling that we can do better. Recently it
occurred to me how we could, though it&#8217;s not without its price.</p>

<p>In the spirit of &#8220;no stone left unturned&#8221;, I thought I&#8217;d write out
this idea. At first I thought this was a rather futile exercise, since
any large changes to Rust have to pass a pretty high bar at this
point, but now that I&#8217;ve thought the idea through, I think it has a
lot of merit and is worth considering.</p>

<!-- more -->


<h3>A change to representation</h3>

<p>For the purposes of simplicity, I will focus on vector types in this
blog post, though I think that many of the same considerations apply
to other types like closure and trait types (as well as strings, but
those are really just newtyped vectors to the compiler).</p>

<p>In the compiler today, both a <code>~[T]</code> and an <code>@[T]</code> are represented as
a <code>Box&lt;Vector&lt;T&gt;&gt;*</code> where the <code>Box</code> and <code>Vector</code> types are defined as
follows (here <code>N</code> is the length of the vector, which naturally is not
known until runtime):</p>

<pre><code>template&lt;class T&gt;
struct Box {
    type_descriptor_t *type_desc;
    ...
    T payload;
}

template&lt;class T&gt;
struct Vector {
    unsigned length;
    T[N] elements;
}
</code></pre>

<p>(The fact that <code>~[T]</code> uses a box is not actually necessary, it was
done as part of the early work on tracing GC and will eventually be
undone, at least for those cases where the type <code>T</code> does not itself
include managed pointers)</p>

<p>However, today, a slice <code>&amp;[T]</code> is represented quite differently. It is
in fact a <code>Slice&lt;T&gt;</code> type, where <code>Slice</code> is defined as follows:</p>

<pre><code>template&lt;class T&gt;
struct Slice {
    T* elements;
    unsigned length;
}
</code></pre>

<p>The reason for this is that we wish a slice to be a subset of another
vector, which is enabled by this two-word representation.</p>

<p>What I&#8217;d like to do is to use two words for all vectors. Therefore,
the layout for <code>~[T]</code> and <code>@[T]</code> will be:</p>

<pre><code>template&lt;class T&gt;
struct Vector {
    Box&lt;Elements&lt;T&gt;&gt;* elements;
    unsigned length;
}

template&lt;class T&gt;
struct Elements {
    T[N] elements;
}
</code></pre>

<h3>What does this new representation buy us?</h3>

<p>Notice that, apart from the box header, this means that a <code>~[T]</code> or a
<code>@[T]</code> is in fact a valid slice. This is exactly like any other <code>~T</code>
or <code>@T</code> pointer, which has the same format as a <code>&amp;T</code> pointer but for
the box. This is actually quite similar to how we handle object types
(<code>@Trait</code> vs <code>&amp;Trait</code>) and closure types (<code>@fn()</code>, <code>&amp;fn()</code>).</p>

<p>This means that we can define our Rust type hierarchy as follows:</p>

<pre><code>T = S            // sized types
  | U            // unsized types
S = &amp;'r T        // region ptr
  | @T           // managed ptr
  | ~T           // unique ptr
  | [S, ..N]     // fixed-length array
  | uint         // scalars
  | ...
U = [S]          // vectors
  | str          // string
  | Trait        // existential ("exists S:Trait.S")
  | fn(S*) -&gt; S
</code></pre>

<p>Note that I have divided the types into two groups. <em>Sized</em> types
indicate values whose size is known to the compiler. <em>Unsized</em> types
represent values whose size is <em>not</em> known to the compiler (this
terminology is somewhat imprecise; unsized values do in fact have a
size, but it is not known until runtime). Note that unsized types are
generally only legal behind a pointer; that is, you can&#8217;t have a type
like <code>~[[int]]</code>, which would be an array of arrays, where each
subarray could have a different size. You could have <code>~[~[int]]</code>&#8212;an
array of pointers to arrays&#8212;or <code>~[[int, ..4]]</code>, an array of
fixed-length arrays of size 4.</p>

<p>Pointers to values of unsized type (e.g., <code>@U</code>, <code>&amp;U</code>) are &#8220;fat&#8221;
pointers, meaning that at runtime they are represented by a pair
<code>(pointer, meta)</code>.  The first word is a pointer to the data, and the
second word (<code>meta</code>) is some kind of descriptor that indicates what
size the data has. The exact nature of this descriptor will change
depending on the type <code>U</code>, but there is always something there (for
vectors, the meta value is just a length; for objects, it&#8217;s a vtable;
etc). Standard pointer operations (notably borrowing) are applied to
the <code>pointer</code> portion of this pair but leave the <code>meta</code> portion
intact.</p>
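
<p>As a point of comparison, present-day Rust (which postdates this post)
does represent pointers to unsized types as two words, and that can be
observed directly. The sketch below is only an illustration: the helper
names <code>words</code> and <code>pointer_widths</code> are made up, and the exact layout is
what current compilers do in practice rather than something this post
specifies.</p>

<pre><code>use std::fmt::Debug;
use std::mem::size_of;

// Size of the type `P`, measured in machine words.
fn words&lt;P&gt;() -&gt; usize {
    size_of::&lt;P&gt;() / size_of::&lt;usize&gt;()
}

// Widths of a thin pointer, a slice pointer, and an object pointer.
fn pointer_widths() -&gt; (usize, usize, usize) {
    (
        words::&lt;&amp;u32&gt;(),       // sized pointee: pointer only
        words::&lt;&amp;[u32]&gt;(),     // vector: pointer plus length
        words::&lt;&amp;dyn Debug&gt;(), // object: pointer plus vtable
    )
}
</code></pre>

<p>The owned forms behave the same way: <code>Box&lt;[u32]&gt;</code> carries a length
alongside its pointer, just as the <code>~[T]</code> layout proposed here would.</p>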

<h3>Writing generic code in the face of unsized types</h3>

<p>Using this definition of types means that we can write and compose
generic impls that operate over types like <code>@T</code>, <code>~T</code>, and <code>[T]</code>.
For example:</p>

<pre><code>impl&lt;T:ToStr&gt; ToStr for @T {
    fn to_str(&amp;self) -&gt; ~str {
        let @ref v = *self;
        fmt!("@%s", v.to_str())
    }
}

impl&lt;T:ToStr&gt; ToStr for ~T {
    fn to_str(&amp;self) -&gt; ~str {
        let ~ref v = *self;
        fmt!("~%s", v.to_str())
    }
}

impl&lt;T:ToStr+Sized&gt; ToStr for [T] {
    fn to_str(&amp;self) -&gt; ~str {
        let mut result = ~"";
        let mut prefix = "";
        result.push_char('[');
        for self.each |v: &amp;T| {
            result.push_str(prefix);
            result.push_str(v.to_str());
            prefix = ",";
        }
        result.push_char(']');
        result
    }
}
</code></pre>

<p>This replaces the impls we must write today, which would be over <code>~T</code>,
<code>@T</code>, <code>~[T]</code>, <code>@[T]</code> (and <code>&amp;T</code> and <code>&amp;[T]</code>, typically, but I didn&#8217;t
include those in the above example).</p>

<p>However, there is a catch. The compiler must ensure that unsized types
do not appear in illegal locations. For example, we cannot have a
local variable of unsized type, because that would require an unknown
amount of stack space. Similarly, we cannot have a vector whose
elements are unsized. In fact, this is visible in the previous code
snippet: if you look carefully at the impl for <code>[T]</code>, you will see
that the type <code>T</code> is declared with a bound <code>Sized</code>:</p>

<pre><code>impl&lt;T:ToStr+Sized&gt; ToStr for [T] { ... }
</code></pre>

<p>This indicates that the type <code>T</code> must be a sized type.</p>

<p>In practice, I suspect we wouldn&#8217;t have to write the <code>Sized</code> bound very
often. This is because the traits <code>Copy</code> and <code>Clone</code> must extend
<code>Sized</code>, since they return a new instance of the receiver, and you
can&#8217;t return an unsized type (note that functions must take sized
arguments and return sized values). Today, most generic functions fall
into two categories: those that copy values around, and those that
manipulate them solely by reference. The former would require a
<code>Sized</code> bound, but then they also require a <code>Copy</code> bound, which
implies <code>Sized</code>. The latter do not require <code>Sized</code> at all.</p>
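
<p>For readers coming from present-day Rust, note that the language
eventually settled on the opposite default: type parameters are assumed
to be sized unless they opt out with <code>?Sized</code>. The two categories of
generic function described above can be sketched as follows; the function
names are invented for illustration.</p>

<pre><code>use std::fmt::Debug;

// Manipulates its argument solely by reference, so `T` may be unsized
// (for example `[i32]` or `str`). `?Sized` lifts the implicit bound.
fn describe&lt;T: Debug + ?Sized&gt;(value: &amp;T) -&gt; String {
    format!("{:?}", value)
}

// Copies values around, so `T` must be sized. No extra bound is needed
// because `Clone` already requires `Sized`.
fn duplicate&lt;T: Clone&gt;(value: &amp;T) -&gt; (T, T) {
    (value.clone(), value.clone())
}
</code></pre>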

<h3>In summary</h3>

<p>In summary, I think we can have our cake and eat it too. If we change
the representation of vectors and slices, we can have composable types
<em>and</em> all the efficiency and flexibility of the current system. The
price is that we must distinguish &#8220;sized&#8221; from &#8220;unsized&#8221; type
parameters. I argue that this is likely to be a minor cost, since most
of the time parameters that would require a <code>Sized</code> bound will already
have a <code>Copy</code> or <code>Clone</code> bound anyhow. I think that&#8217;s pretty exciting,
since the non-composability of vector types has always seemed like a
language wart in the making.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Parallelizable JavaScript Subset]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/parallelizable-javascript-subset/"/>
    <updated>2013-04-30T11:30:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/04/30/parallelizable-javascript-subset</id>
    <content type="html"><![CDATA[<p>I want to look at an interesting topic: what subset of JavaScript do
we intend to support for parallel execution, and how long will it take
to get that working? As my dear and loyal readers already know, our
current engine supports a simple subset of JavaScript but we will want
to expand it and make the result more predictable.</p>

<p>From my point of view, the subset below includes basically all the
JavaScript syntax that I ever use. There are two primary limitations
that I think people will encounter in practice:</p>

<ol>
<li><em>The fact that mutating shared state boots you out of parallel
mode.</em> This restriction is (I think) easy to understand, but it
will often take some restructuring to obey it.</li>
<li><em>The fact that strings and many native objects (regular
expressions, DOM objects, etc) are currently not supported.</em> I
expect we&#8217;ll improve the support for strings in the near term and
also for some native objects, but for others&#8212;notably DOM&#8212;it
will be <em>very</em> challenging to make things work in parallel, due to
the many complex implementation details.</li>
</ol>


<p>OK, let&#8217;s get to the details. In the list that follows, we often refer
to a <em>plain old JavaScript object</em> (POJO), which means an object
defined in JavaScript, not a built-in object. It should be created
either with a literal (<code>{...}</code>, <code>[...]</code>) or a <code>new C</code> expression where
<code>C</code> is a user-defined function. I&#8217;ve also written <em>bug</em> for cases
where the current implementation is more limited than it should be.</p>

<ul>
<li><code>a</code>: Variable access

<ul>
<li>You should be able to access any variable in scope, so long as you
do not use <code>with</code> or <code>eval</code>. I think this all works fine today,
though there may be bugs in the implementation relating to
infrequently used access patterns, such as <code>"use strict"</code>.</li>
</ul>
</li>
<li><code>a.b</code>: Property access

<ul>
<li>If <code>a</code> is a POJO and <code>b</code> is a data property

<ul>
<li>no getters (should work someday)</li>
</ul>
</li>
</ul>
</li>
<li><code>a[e]</code>: Element access

<ul>
<li>If <code>a</code> is a POJO or a TypedArray</li>
</ul>
</li>
<li><code>a + b</code>, <code>a - b</code>, etc: Binary operators

<ul>
<li>for any primitive values (someday we should be able to support all objects)</li>
<li><em>bug:</em> today, only works if <code>a</code> and <code>b</code> are both numbers or bools
 (in any combination)</li>
</ul>
</li>
<li><code>a === b</code>, <code>a !== b</code>: Strict equality and inequality

<ul>
<li>always works</li>
</ul>
</li>
<li><code>a == b</code>, <code>a &gt; b</code>, etc: Loose relational operators

<ul>
<li>works if <code>a</code> and <code>b</code> are both numbers or bools (in any combination)</li>
</ul>
</li>
<li><code>a[i] = b</code>: Numeric property assignment

<ul>
<li>if <code>a</code> is a POJO or TypedArray owned by current task, and <code>i &lt;=
a.length</code> (no holes&#8212;<em>not yet, anyway</em>)</li>
<li><em>bug:</em> today, we must successfully predict that <code>a</code> will be an array
or typed array. In practice, this prediction is reasonably successful,
but problems arise when the same function is called many times from
different contexts.</li>
</ul>
</li>
<li><code>a.e = b</code> or <code>a["e"] = b</code>: String property assignment

<ul>
<li>if <code>a</code> is a POJO owned by the current task</li>
<li><em>bug:</em> today, we must successfully predict the offset of <code>e</code> within <code>a</code>.
In practice, this prediction often fails, as the code must be written
very, very carefully for it to work.</li>
</ul>
</li>
<li><code>{...}</code>, <code>[...]</code>, <code>new C(...)</code>: object literals and creation

<ul>
<li>for <code>new C(...)</code>, C must be a JavaScript function</li>
</ul>
</li>
<li><code>f()</code> and <code>a.m(...)</code>: Function and method calls

<ul>
<li>If the function being called is a user-implemented function, or
one of the functions in the following list:

<ul>
<li>higher-order functions like <code>map</code>, <code>reduce</code>, etc</li>
<li>parallel higher-order functions like <code>pmap</code>, <code>preduce</code>, etc</li>
<li><code>Array.push</code> (presuming the receiver is writable)

<ul>
<li><em>bug:</em> today only <code>a[a.length] = e</code> works, not <code>a.push(e)</code></li>
</ul>
</li>
<li><code>Math.*</code>

<ul>
<li><em>bug:</em> today, most but not all <code>Math</code> functions work, and only
if we predict the function that will be called and are able to
inline it. In practice, this prediction is almost always
successful.</li>
</ul>
</li>
<li>(more to come)</li>
</ul>
</li>
</ul>
</li>
<li><code>function(a, b) { ... }</code> or <code>(a, b) =&gt; { ... }</code>: closure creation

<ul>
<li>this should basically always work, I think</li>
<li><em>bug:</em> the <code>=&gt;</code> syntax doesn&#8217;t work yet in parallel execution, I
 don&#8217;t believe</li>
</ul>
</li>
<li><code>if</code>, <code>while</code>, etc

<ul>
<li>works fine</li>
</ul>
</li>
</ul>


<p>One caveat I should point out in the current implementation: even if
you stick to the above subset, it is possible that <em>some</em> parallel
iterations will abort, generally because of mispredicted types or
other transient errors. But I consider a parallel abort to be ok if
the engine will eventually stabilize and all subsequent runs will be
successful. This is the same as what happens with JIT engines, which
often generate code that is later invalidated and recompiled.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The Case of the Recurring Closure]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/04/30/the-case-of-the-recurring-closure/"/>
    <updated>2013-04-30T10:51:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/04/30/the-case-of-the-recurring-closure</id>
    <content type="html"><![CDATA[<p>Yesterday I realized that you can violate Rust&#8217;s memory safety
guarantees by using &#8220;stack closures&#8221;, meaning closures that are
allocated on the stack which can refer to and manipulate the
local variables of the enclosing stack frame. Such closures are
ubiquitous in Rust, since every <code>for</code> loop makes use of them (and
virtually every higher-order function). Luckily, this hole can be
fixed with (I think) very little pain&#8212;in fact, I think fixing it
can also help us make other analyses a little less strict.</p>

<p>The problem stems from the fact that, if you are clever, you can get a
stack closure to recurse (that is, to call itself again with the same
environment). This would mean that while the stack closure has a new
set of local variables, the variables <em>it inherits from its
environment</em> are the same. When the borrow checker was first written,
such recursion was not possible, but it is now, since closures were generalized in
the meantime.</p>

<h3>Executive Summary</h3>

<p>My proposed fix is a change that will guarantee statically that stack
closures cannot recurse (that is, cannot call themselves with the same
environment).  I&#8217;ll go into the details of the problem and my proposed
fix in the post, but I wanted to start by briefly summarizing what the
effects would be on end-users.</p>

<ul>
<li>Almost all if not all existing higher-order functions would still
work fine. In particular, you can call <code>&amp;fn</code> closures normally and
pass them as a parameter to another function and so on. What would
be a little more subtle is storing <code>&amp;fn</code> closures into data
structures; it&#8217;d still be supported, but some patterns that would be
legal today would become illegal.</li>
<li><p>The &#8220;liveness&#8221; pass, which checks that all variables are
initialized, can be generalized to permit closures to move out from
local variables so long as they move a new value back in. This means
that <code>foldl</code> can be implemented without copies:</p>

<pre><code>fn foldl&lt;A,B,I:Iterable&lt;B&gt;&gt;(a0: A,
                            iter: &amp;I,
                            op: &amp;fn(A,&amp;B) -&gt; A) -&gt; A {
    let mut result = a0;

    // Here I am deliberately desugaring the `for` syntax,
    // because I want to emphasize that this is a closure:
    iter.each(|b| {
        // Note: A is not copyable, therefore this call
        // *moves* from result into `op()` and then restores
        // it. This used to be illegal because we are
        // executing in a closure, and we were afraid
        // that `op` might in fact somehow recurse, in which
        // case it would find that `result` is uninitialized.
        result = op(result, b);
        true
    });
    result
}
</code></pre></li>
</ul>


<p>What would not work would be:</p>

<ul>
<li>Passing the same <code>&amp;fn</code> closure as a parameter more than once,
or calling a <code>&amp;fn</code> closure with itself as an argument (no Y combinators).</li>
<li><p>Making a struct <code>S</code> that contains <code>&amp;fn</code> closures and then calling those
closures via an <code>@S</code> or
<code>&amp;S</code> pointer, like so:</p>

<pre><code>struct S {f: &amp;fn()}
fn foo(s: &amp;S) { s.f() } // would be illegal
</code></pre>

<p>You could write this code using an <code>&amp;mut S</code> pointer, though:</p>

<pre><code>struct S {f: &amp;fn()}
fn foo(s: &amp;mut S) { s.f() } // would be legal
</code></pre>

<p>The reason for this is that <code>&amp;mut</code> pointers are non-aliasable.
Similar rules arise in the revised borrow checker I&#8217;ve been working
on for various corner cases where aliasing is a concern. I&#8217;ll have a
post on that at some point too.</p></li>
</ul>


<!-- more -->


<p></p>

<h3>What is the problem, anyway?</h3>

<p>Here is an example of an unsound function:</p>

<pre><code>struct R&lt;'self&gt; {
    // This struct is needed to create the
    // otherwise infinite type of a fn that
    // accepts itself as argument:
    c: &amp;'self fn(&amp;R)
}

fn innocent_looking_victim() {
    let mut vec = ~[1, 2, 3];
    conspirator(|f| {
        if vec.len() &lt; 100 {
            vec.push(4);
            for vec.each |i| {
                f.c(&amp;f)
            }
        }
    })
}

fn conspirator(f: &amp;fn(&amp;R)) {
    let r = R {c: f};
    f(&amp;r)
}
</code></pre>

<p>What happens when you run this function is that the vector <code>vec</code> is
pushed to while it is also being iterated over, which is supposed to
be impossible. The root cause of this problem is that the borrow
checker generally assumes that <code>&amp;fn</code> closures do not recurse (which,
when it was first written, was true). Because of this, the closure <code>f</code>
which is passed to <code>conspirator</code> is permitted to freeze <code>vec</code>, because
it looks to the borrow checker like it can track all the possible
aliases of <code>vec</code> and it sees that this action is ok. But the borrow
checker is of course mistaken here, since the closure <code>f</code> is passed to
itself as an argument, and thus there <em>is</em> an alias of <code>vec</code>, captured
in the closure environment.</p>

<p>The problem lies in the <code>&amp;fn</code> closures, which effectively create
implicit references to the data they capture. I tried to make up an
example showing what that function looks like if state is passed
explicitly, but due to the problem of recursive types it is quite
tedious, so I&#8217;m going to, um, leave it as an exercise to the reader.</p>

<p>Anyhow, my solution has two parts:</p>

<ol>
<li>Modify the borrow checker to treat these implicit references just
like any other reference in the borrow checker. Basically the model
should be that when a stack closure with lifetime <code>'a</code> is created,
its contents are opaque to the creator, except that any data which
it references is considered borrowed for the lifetime <code>'a</code>. The
type of borrow will depend on how the variable is used (for
example, is it read? mutated?  borrowed from within the
closure?). In the case above, the variable <code>vec</code> would be borrowed
mutably, since it is pushed to.</li>
<li>Guarantee that closures cannot recurse, because otherwise we&#8217;d have
to treat every upvar as potentially aliased, which would make most
programs illegal.</li>
</ol>


<p>Let&#8217;s look at those changes in more detail.</p>

<h3>Modifications to the borrow checker</h3>

<p>The basic idea would be to examine the body of each closure as we are
conducting the borrow check to examine what free variables it
references and how.  This is fairly straightforward to do: the borrow
checker conducts a walk of the AST already to find all the functions
it must check, so basically what we would do is to analyze functions
on the way up the tree. So we would analyze each closure first,
assuming it has total access to the upvars of the parent. We would
then compute a list of the upvars that the closure borrowed and what
level of access it required.  In the parent fn, when we find a closure
expression, we would not examine the body of the closure but rather
just treat it as taking out loans that persist for the lifetime of the
closure. This is very similar to what we have to
do for <code>once</code> fns and also what we do for moves (I guess that&#8217;s
another post, though).</p>

<h3>Guaranteeing closures cannot recurse</h3>

<p>The idea here would be to make all <code>&amp;fn</code> closures
non-copyable. Basically this would mean the only copyable closure type
would be an <code>@fn</code>:</p>

<ul>
<li><code>~fn</code> is non-copyable</li>
<li><code>~once fn</code> is non-copyable</li>
<li><code>&amp;fn</code> <em>will become</em> non-copyable</li>
<li><code>&amp;once fn</code> is non-copyable</li>
<li><code>@fn</code> is copyable</li>
<li><code>@once fn</code> is non-copyable</li>
</ul>


<p>At first I thought that <code>@once fn</code> was not necessary, but in fact it
is potentially useful for combinator libraries and the like, as it
allows you to return a fn that can move out of its environment.</p>
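
<p>The closest analogue in present-day Rust (which postdates this post) is
probably a boxed <code>FnOnce</code>: a heap-allocated closure that may be called at
most once and is therefore free to move values out of its environment.
A minimal sketch, with a made-up function name:</p>

<pre><code>// Returns a closure that owns `name` and gives it away when called.
// Because calling it consumes the closure, it cannot be called twice.
fn deferred_owner(name: String) -&gt; Box&lt;dyn FnOnce() -&gt; String&gt; {
    Box::new(move || name)
}
</code></pre>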

<p>For fun, let&#8217;s review the <a href="http://smallcultfollowing.com/babysteps/blog/2012/10/23/function-and-object-types/">full closure type specification</a> from
a previous blog post, modernized somewhat and taking these changes
into account.</p>

<pre><code>(&amp;'r|~|@'r) [unsafe] [once] fn [:K] (S) -&gt; T
^~~~~~~~~~^ ^~~~~~~^ ^~~~~^    ^~~^ ^~^    ^
   |          |        |        |    |     |
   |          |        |        |    |   Return type
   |          |        |        |  Argument types
   |          |        |    Environment bounds
   |          |     Once-ness (a.k.a., affine)
   |        Effect
Allocation type and lifetime bound
</code></pre>

<p>One part I had hoped to remove was the environment bounds, but I think
they are still necessary. The only real use case for this is <code>:Const</code>,
which would be a way of saying that the closure only closes over
deeply immutable data. This enables parallelism in various ways
(putting closures in ARCs, fork-join parallelism a la PJS, etc).
Conceivably we could also support <code>:Clone</code>, which would permit
closures to be cloned, but we&#8217;d need some magic support in trans (code
which, admittedly, mostly exists) to make that work.</p>

<h3>Some musings on orthogonality or lack thereof</h3>

<p>It annoys me that the rules for closures feel&#8230; one-off. I considered
briefly if we were not categorizing things correctly. To some extent,
the answer is clearly yes: there are many partly orthogonal
characteristics of closures (once-ness, type of pointer used to
reference its data, kinds of loans it requires, etc). Ultimately, we
are trying to boil this down into a relatively small set of types that
covers all important use cases.</p>

<p>A similar phenomenon occurs with <code>&amp;</code> and <code>&amp;mut</code>: there are really two
characteristics, aliasability and mutability, and we have joined them
together, such that <code>&amp;</code> references are immutable and aliasable and
<code>&amp;mut</code> are mutable and unaliasable. This is typically what you want,
but there are rare occasions where you must use <code>&amp;mut</code> solely for its
non-aliasable nature and not because of mutability. In particular, if
you want access to other non-aliasable things, such as other <code>&amp;mut</code>
pointers or (per this post) <code>&amp;fn</code> closures.</p>

<p>In the beginning of the post I wrote that the following example will
not work:</p>

<pre><code>  struct S {f: &amp;fn()}
  fn foo(s: &amp;S) { s.f() } // would be illegal
</code></pre>

<p>The reason for this has to do precisely with the fact that <code>&amp;S</code>
pointers are always aliasable. Hence we could not permit <code>s.f</code> to be
called because we can&#8217;t guarantee that there are no aliases to <code>s</code>
lurking around, and thus creating aliases to <code>s.f</code>. You could fix this
program by using <code>&amp;mut</code>:</p>

<pre><code>  struct S {f: &amp;fn()}
  fn foo(s: &amp;mut S) { s.f() } // legal
</code></pre>

<p>In this case, we don&#8217;t care about mutability, but we do care about
uniqueness.</p>
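
<p>Present-day Rust, which postdates this post, enforces much the same
rule for closures that mutate their environment: such a closure
implements <code>FnMut</code>, and calling it requires unique access. The sketch
below uses <code>&amp;mut dyn FnMut()</code> as a stand-in for a borrowed <code>&amp;fn()</code>; the
function names are invented. Changing <code>call_twice</code> to take <code>&amp;S</code> instead
makes the compiler reject the calls.</p>

<pre><code>struct S&lt;'a&gt; {
    f: &amp;'a mut dyn FnMut(),
}

// Accepted: the `&amp;mut` borrow guarantees there is no alias of `s.f`.
fn call_twice(s: &amp;mut S&lt;'_&gt;) {
    (s.f)();
    (s.f)();
}

// Builds a counting closure, calls it through `S`, and reports the count.
fn count_calls() -&gt; u32 {
    let mut calls = 0;
    let mut bump = || calls += 1;
    let mut s = S { f: &amp;mut bump };
    call_twice(&amp;mut s);
    calls
}
</code></pre>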

<p>I was debating for a time whether to suggest adding more facets to the
various types. For example, one could imagine <code>&amp;</code>, <code>&amp;alias</code>, <code>&amp;mut</code>,
and <code>&amp;mut alias</code>. But I think that ultimately, this is a bad idea.
For one thing, the correct aliasability default varies, so you&#8217;d
<em>probably</em> want something like <code>&amp;</code>, <code>&amp;noalias</code>, <code>&amp;mut</code>, <code>&amp;mut
alias</code>. The type system ultimately feels more complex, with many
branches (case in point: see the full closure type specification
above!).</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Nested lifetimes]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/04/04/nested-lifetimes/"/>
    <updated>2013-04-04T19:04:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/04/04/nested-lifetimes</id>
    <content type="html"><![CDATA[<p>While working on <a href="https://github.com/mozilla/rust/issues/5656">issue #5656</a> I encountered an interesting
problem that I had not anticipated.  The result is a neat little
extension to the region type system that increases its expressive
power.  The change is completely internal to the type rules and
involves no user-visible syntax or anything like that, though there
are some (basically nonsensical) programs that will no longer compile.
Anyway I found it interesting and thought I would share.</p>

<!-- more -->


<h3>Background: Issue #5656 and the case of the many cursors</h3>

<p>To explain the problem I encountered, consider this running example
(written as one would write it after my patch):</p>

<pre><code>struct OwnedCursor&lt;T&gt; {
    buffer: ~[T],
    position: uint
}

pub impl&lt;T&gt; OwnedCursor&lt;T&gt; {
    fn get&lt;'c&gt;(&amp;'c self) -&gt; &amp;'c T {
        &amp;self.buffer[self.position]
    }

    fn move(&amp;mut self, by: int) -&gt; bool {
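        if by &gt; 0 {
            self.position += by as uint;
        } else {
            // Negate before casting: casting a negative `int`
            // directly to `uint` yields a huge positive value.
            self.position -= (-by) as uint;
        }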
        self.position &lt; self.buffer.len()
    }
}
</code></pre>

<p>This defines a type <code>OwnedCursor</code> that owns a vector and a position within
that buffer.  The type offers two methods, <code>get()</code> and <code>move()</code>.  The
method definitions themselves should be fairly
self-explanatory. <code>get()</code> returns a pointer to the current item and
<code>move()</code> modifies the current position.</p>

<p>What is interesting is the lifetimes on the function <code>get()</code>. The
signature indicates that it takes a borrowed pointer to an
<code>OwnedCursor</code> with lifetime <code>'c</code> and returns a pointer with that same
lifetime.  In other words, the <code>&amp;'c self</code> declaration means that the
method <code>get()</code> is roughly equivalent to a function written as follows:</p>

<pre><code>fn get&lt;'c, T&gt;(self: &amp;'c OwnedCursor&lt;T&gt;) -&gt; &amp;'c T {
    &amp;self.buffer[self.position]
}
</code></pre>

<p>The reason that we can say that the returned value has the same
lifetime as the input is that (1) we know that the pointer <code>self</code> will
be valid for the entirety of the lifetime <code>'c</code> and (2) <code>self</code> is an
immutable pointer, so the field <code>buffer</code> will not be mutated.  So we
can say that, for the lifetime <code>'c</code>, the <code>OwnedCursor</code> object will not
be freed and it is immutable, therefore we can take a pointer into
<code>self.buffer</code> and know that this memory is also valid.</p>

<p>Now let&#8217;s suppose that we wanted to develop many kinds of cursors.
We might introduce a generic trait and convert the impl to use it:</p>

<pre><code>trait Cursor&lt;T&gt; {
    fn get&lt;'c&gt;(&amp;'c self) -&gt; &amp;'c T;
    fn move(&amp;mut self, by: int) -&gt; bool;
}

impl&lt;T&gt; Cursor&lt;T&gt; for OwnedCursor&lt;T&gt; {
    // as before
}
</code></pre>

<p>Now we can introduce a second kind of cursor, one that doesn&#8217;t <em>own</em>
the vector that it iterates over:</p>

<pre><code>struct BorrowedCursor&lt;'b, T&gt; {
    buffer: &amp;'b [T],
    position: uint
}

impl&lt;'b, T&gt; Cursor&lt;T&gt; for BorrowedCursor&lt;'b, T&gt; {
    fn get&lt;'c&gt;(&amp;'c self) -&gt; &amp;'c T {
        &amp;self.buffer[self.position]
    }

    fn move(&amp;mut self, by: int) -&gt; bool {...}
}
</code></pre>

<p>This definition is very similar, except that the type and impl are
parameterized by a lifetime <code>'b</code>, representing the lifetime of the
<em>b</em>uffer.  Everything seems fine, but when I tried running this
example through the compiler with my patch, I encountered a type error
on the <code>get()</code> routine.  The compiler reported that the lifetime of
<code>&amp;self.buffer[self.position]</code> was not <code>'c</code> but rather <code>'b</code>&#8212;the
lifetime of the buffer, and so the return type of the function was
invalid.</p>

<p>Unfortunately, the compiler is quite correct!  The function signature
states that we return a pointer with the lifetime <code>'c</code>, but here <code>'c</code>
is the lifetime of the <em>cursor</em>.  Our pointer is a pointer into the
<em>buffer</em>, so it will have the lifetime <code>'b</code>.</p>

<p>In the original <code>OwnedCursor</code> type, the buffer was owned by the
cursor, so the buffer and the cursor had identical lifetimes.  But in the case of
<code>BorrowedCursor</code>, the lifetime of the buffer is not tied to the cursor
object, which after all is only borrowing the buffer.</p>

<p>Of course, it&#8217;s kind of nonsensical to have a cursor that outlives the
buffer it&#8217;s working with.  But there is in fact nothing in the type
system that would prevent you from doing that&#8212;it&#8217;s just that once
your buffer became invalid you would be prevented from actually
<em>using</em> the cursor anymore.</p>

<h3>The solution</h3>

<p>This problem was quite vexing at first.  The example I just gave is
perfectly reasonable and it really should work (and, in fact, the old
system I am attempting to replace allowed it, though in a kind of
roundabout and possibly unsound way).  The solution I decided on was
just to formalize the common sense rule that a pointer should not have
a longer lifetime than any other pointers it points at.  In other words,
if I create a pointer to a <code>BorrowedCursor&lt;'b, T&gt;</code>, my pointer cannot
have a lifetime that exceeds the lifetime <code>'b</code> of the buffer.</p>

<p>If we assume this rule holds, then when you have a function that takes
an argument of type <code>&amp;'c BorrowedCursor&lt;'b, T&gt;</code> (such as the <code>self</code>
argument to <code>get()</code>), the compiler can deduce that <code>'c</code> must be a
smaller lifetime than <code>'b</code>, because otherwise the caller would have
encountered a type error.  This means that the <code>get()</code> method for
<code>BorrowedCursor&lt;'b, T&gt;</code> is permitted to return a pointer with lifetime
<code>'c</code>&#8212;the largest possible lifetime is still <code>'b</code>, but <code>'c</code> is a
sound approximation (it can only be shorter than <code>'b</code>, after all).</p>
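
<p>As a rough illustration, here is the same pair of cursors sketched in
present-day Rust syntax rather than the 2013 syntax used above (so
<code>~[T]</code> becomes <code>Vec&lt;T&gt;</code> and <code>uint</code> becomes <code>usize</code>).
The helper <code>current_items()</code> is a made-up name for this sketch.  The
point is only that a type like <code>&amp;'c BorrowedCursor&lt;'b, T&gt;</code> lets
the compiler assume that <code>'b</code> outlives <code>'c</code>, so both impls can
return <code>&amp;'c T</code>:</p>

<pre><code>// Present-day Rust sketch of the cursor example (not the 2013 syntax of the post).
trait Cursor&lt;T&gt; {
    fn get&lt;'c&gt;(&amp;'c self) -&gt; &amp;'c T;
}

struct OwnedCursor&lt;T&gt; {
    buffer: Vec&lt;T&gt;,
    position: usize,
}

struct BorrowedCursor&lt;'b, T&gt; {
    buffer: &amp;'b [T],
    position: usize,
}

impl&lt;T&gt; Cursor&lt;T&gt; for OwnedCursor&lt;T&gt; {
    // The buffer lives inside the cursor, so borrowing the cursor for 'c
    // is enough to hand out a pointer valid for 'c.
    fn get&lt;'c&gt;(&amp;'c self) -&gt; &amp;'c T {
        &amp;self.buffer[self.position]
    }
}

impl&lt;'b, T&gt; Cursor&lt;T&gt; for BorrowedCursor&lt;'b, T&gt; {
    // The pointer really has lifetime 'b, but because the argument type
    // &amp;'c BorrowedCursor&lt;'b, T&gt; implies that 'b outlives 'c, it can be
    // returned at the shorter lifetime 'c.
    fn get&lt;'c&gt;(&amp;'c self) -&gt; &amp;'c T {
        &amp;self.buffer[self.position]
    }
}

// Hypothetical helper: reads the current item from one cursor of each kind.
fn current_items() -&gt; (i32, i32) {
    let data = vec![10, 20, 30];
    let borrowed = BorrowedCursor { buffer: &amp;data, position: 1 };
    let owned = OwnedCursor { buffer: vec![7, 8, 9], position: 2 };
    (*borrowed.get(), *owned.get())
}
</code></pre>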

<p>I wonder if this problem arises in other similar type systems. I
remember Safe Java had a rule that the &#8220;main owner&#8221; of an object must
own all the other owners, or something like that, but I think this was
a soundness concern having to do with downcasting. I&#8217;ll have to go and
re-read the various papers. Anyway, I&#8217;ve been debating whether to
allow types like <code>&amp;'a &amp;'b uint</code> where <code>'a</code> outlives <code>'b</code> for a while.
They would seem to have no practical use but I couldn&#8217;t think of a
reason to add extra rules to prohibit them&#8230;  until now, anyway.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A tour of the Parallel JS implementation (Part 2)]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/04/04/a-tour-of-the-parallel-js-implementation-part-2/"/>
    <updated>2013-04-04T10:17:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/04/04/a-tour-of-the-parallel-js-implementation-part-2</id>
    <content type="html"><![CDATA[<p>In my <a href="http://smallcultfollowing.com/babysteps/blog/2013/03/20/a-tour-of-the-parallel-js-implementation">last post about ParallelJS</a>, I discussed the <code>ForkJoin()</code>
intrinsic and showed how it was used to implement the parallel map
operation.  Today I want to write about the high-level changes to
IonMonkey that are needed to support <code>ForkJoin()</code>.  IonMonkey, of
course, is the optimizing JIT compiler in SpiderMonkey, our JavaScript engine.</p>

<h3>Parallel execution mode</h3>

<p>To support ParallelJS, we introduce a second mode of compilation
called <em>parallel execution mode</em>. JavaScript compiled in this mode
produces executable code that is suitable to be run in parallel.  To
accommodate this new mode, each <code>JSScript*</code> potentially contains
pointers to two <code>IonScript*</code> data structures, one for standard
sequential mode and one for parallel mode.</p>

<p>Execution normally stays confined within one mode.  So if you are
running a function <code>f</code> in sequential mode and it invokes another
function <code>g</code>, then we will run the sequential mode version of <code>g</code>.
But if you are running <code>f</code> in <em>parallel mode</em>, it will call the
parallel version of <code>g</code>.  The only place where we move between modes
is in the <code>ForkJoin</code> intrinsic, which invokes the parallel mode script
for the first time.</p>

<p>You may wonder why we permit each script to be compiled in both modes
simultaneously. The reason is that it is possible to have helper
functions and code that runs in both sequential and parallel mode.
Imagine, for example, that you have a helper function for searching
an array to find the object with a given name:</p>

<pre><code>function findObject(list, name) {
    for (var i = 0; i &lt; list.length; i++) {
        if (list[i].name === name)
            return list[i];
    }
    throw Error("No object with name " + name + " found!");
}
</code></pre>

<p>It is perfectly reasonable to want to invoke this helper function both
from sequential and from parallel code. If we only permitted a
function to be compiled in one mode or the other, we would always be
recompiling <code>findObject</code> each time we started or finished a parallel
operation.</p>

<h3>Differences between parallel and sequential execution mode</h3>

<p>The biggest difference between parallel and sequential mode is that
code executing in parallel mode is guaranteed to be <em>pure</em>.  That is,
it can never write to any shared state that might be visible from
other threads. This purity requirement generally includes not only
user-visible JavaScript state but also internal engine details. For
example, in sequential mode code, after we have done several property
lookups on an object that has a large number of properties, we will
&#8220;hashify&#8221; the property chain, meaning that we convert it from an array
into a dictionary to make later lookups faster. This hashification
operation is not visible to the JavaScript user (except insofar as
subsequent property lookups are faster), but it is still disallowed in
parallel execution mode because it would cause data races.</p>

<p>There are some exceptions to the purity requirement.  The first and
most obvious is the <code>UnsafeSetElement</code> intrinsic I discussed in
<a href="http://smallcultfollowing.com/babysteps/blog/2013/03/20/a-tour-of-the-parallel-js-implementation">part one</a>, which is used to track the progress of parallel
work. The second exception is that it is ok to modify internal engine
details so long as those modifications are threadsafe. For example, in
<a href="https://bugzilla.mozilla.org/show_bug.cgi?id=846111">bug 846111</a>, Shu has implemented threadsafe inline-caching, which is of
course a mutation to shared state.</p>

<p>Generally speaking, though, when you call a parallel mode function you
can be sure that it will either complete successfully or bail out.  In
either case, you know that it has no lasting effects that are visible
to end-user JavaScript code, except those that might have occurred via
the <code>UnsafeSetElement</code> intrinsic (which of course is only usable from
self-hosted code and which must be carefully audited).</p>

<h3>Changes to the Ion compilation process</h3>

<p>There are two major changes when compiling in <code>ParallelExecutionMode</code>.
The first change is the so-called &#8220;parallel array analysis&#8221;, which
analyzes the actions that the scripts take and modifies them as needed
to ensure that each action is either threadsafe (and pure) or else
that the script bails out. The second change is that we do not compile a
single script in isolation but rather attempt to compile the
transitive closure of a starting script and all scripts that it may
call.</p>

<h4>Parallel array analysis</h4>

<p>The parallel array analysis can be found in
<a href="http://hg.mozilla.org/mozilla-central/file/c232bec6974d/js/src/ion/ParallelArrayAnalysis.cpp"><code>js/src/ion/ParallelArrayAnalysis.cpp</code></a>, in the function
<code>ParallelCompileContext::analyzeAndGrowWorklist()</code>.  It runs after the
normal suite of optimizations has taken place.  Its primary goal is
to ensure that the parallel code will be pure and threadsafe.</p>

<p>To that end, it performs a walk of the control-flow graph and examines
each MIR instruction using a visitor. The MIR instructions are
<a href="http://hg.mozilla.org/mozilla-central/file/c232bec6974d/js/src/ion/ParallelArrayAnalysis.cpp#l121">categorized into one of several categories</a>, as follows:</p>

<ul>
<li><em>Safe operations</em> are operations that can be safely executed in parallel
without changes, such as <code>Constant</code> or <code>Box</code>.</li>
<li><em>Write-guarded operations</em> are operations that are safe as long as
the value being modified is not shared.  To verify this, we insert a
write guard before the operation in question.  The write guard will
cause a bailout should the object be shared (more on the details of
this check to come in a later post). N.B.&#8212;write guards are not to
be confused with write <em>barriers</em>, which have to do with incremental
and generational garbage collection.</li>
<li><em>Specialized operations</em> are numeric operations that are safe so long
as they are operating over scalar data, such as <code>Add</code>, <code>Mul</code>, etc.</li>
<li><em>Unsafe operations</em> are operations that are just plain disallowed in
parallel execution, generally because we have not made an
equivalent threadsafe path.  An example is <code>RegExp</code>.</li>
<li><em>Custom operations</em> are, well, everything else.  Generally speaking
these are operations that are not safe by default in parallel mode,
but where there exists an alternative version that <em>is</em> safe,
such as <code>NewArray</code> or <code>NewObject</code>.</li>
</ul>


<p>The categorization of instructions is <a href="http://hg.mozilla.org/mozilla-central/file/c232bec6974d/js/src/ion/ParallelArrayAnalysis.cpp#l29">done using macros</a>. The
visitor expects one method per MIR instruction type. There are
macros for the above categories, and each macro expands into a
pre-canned method definition (in the case of custom operations, the
macro expands to an out-of-line method, and the method body appears
later in the file).</p>

<p>I&#8217;ll talk a little bit more about the safe and unsafe operations now,
and I&#8217;ll cover the other cases (write guards, memory allocation, etc)
in later posts.</p>

<p>Safe operations are simply left unchanged, and they execute just as
they would in sequential mode (though in some cases there are checks
in the <code>CodeGenerator</code> so that the MIR behaves somewhat differently).</p>

<p>When an unsafe operation is encountered, the basic block in which it
resides is removed from the graph along with its dominated subtree.
In its place, we add a bailout block that will cause parallel
execution to bail out should it ever execute. This ensures that unsafe
operations that never execute do not prohibit safe code from running.</p>

<h4>Transitive compilation</h4>

<p>In normal sequential mode, if we encounter a call to a script that is
not compiled, we just invoke the interpreter. In parallel mode this
option is not available. So what we do instead is to take advantage of
the information that TI makes available and, when compiling a script
<em>x</em>, collect all scripts that <em>x</em> might call. Then, once we have
compiled <em>x</em>, we go on and compile those scripts. The process is
transitive, meaning that we will then continue on to compile the
scripts that <em>x</em>&#8217;s callees might call and so forth until we reach a
fixed point.</p>
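
<p>The fixed-point process can be sketched with an ordinary worklist.
The following is only an illustration written in Rust, not the actual
IonMonkey code (which is C++): the <code>CallGraph</code> type, the script names,
and the functions <code>transitive_closure()</code> and <code>example_graph()</code> are
all invented for this sketch, and &#8220;compiling&#8221; a script is reduced to
recording its name.</p>

<pre><code>use std::collections::{BTreeMap, BTreeSet, VecDeque};

// Hypothetical stand-in for the callee information that TI provides:
// script name -&gt; names of the scripts it might call.
type CallGraph = BTreeMap&lt;&amp;'static str, Vec&lt;&amp;'static str&gt;&gt;;

// Returns every script reachable from `start`, in the order in which the
// worklist visits (and would compile) them.
fn transitive_closure(calls: &amp;CallGraph, start: &amp;'static str) -&gt; Vec&lt;&amp;'static str&gt; {
    let mut seen = BTreeSet::new();
    let mut worklist = VecDeque::new();
    let mut order = Vec::new();

    seen.insert(start);
    worklist.push_back(start);

    while let Some(script) = worklist.pop_front() {
        order.push(script); // stands in for compiling the script
        if let Some(callees) = calls.get(script) {
            for &amp;callee in callees {
                // Enqueue each callee once; this is what makes the loop
                // terminate (reach a fixed point) even with recursion.
                if seen.insert(callee) {
                    worklist.push_back(callee);
                }
            }
        }
    }
    order
}

// A small made-up call graph: x calls f and g, f calls g and h, h calls x.
fn example_graph() -&gt; CallGraph {
    let mut calls = CallGraph::new();
    calls.insert("x", vec!["f", "g"]);
    calls.insert("f", vec!["g", "h"]);
    calls.insert("h", vec!["x"]);
    calls
}
</code></pre>

<p>Starting from <code>"x"</code> in the example graph, the worklist visits
<code>x</code>, <code>f</code>, <code>g</code>, and <code>h</code>, each exactly once, despite the
recursive call from <code>h</code> back to <code>x</code>.</p>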

<p>Note that we do not monitor for hot paths, as we do in sequential
mode.  That is, we don&#8217;t care if the script has been called 10 times
or 100 times before.  This is for two reasons: one, we assume that
parallel paths are going to be hot, since they are going to be called
over all the entries in a large array.  Two, seeing as we will have to
bail out if we encounter a call to an uncompiled script, it&#8217;s worth
erring on the side of more compilation rather than less. We do however
check that the use count of the script is at least <em>one</em>, so as to
avoid compiling things that never run.</p>

<p>At runtime, when we see a call to a JavaScript function, we check
whether it has been compiled for parallel execution.  If so, we can
simply call it as normal and carry on.  This is the expected case, of
course.</p>

<p>If we encounter a call to an uncompiled script, which can happen
either because our transitive compilation was incomplete or because
the callee was invalidated or garbage-collected in the meantime, we
bail out with an &#8220;uncompiled script&#8221; error.  At this point, control
returns to
<a href="http://smallcultfollowing.com/babysteps/blog/2013/03/20/a-tour-of-the-parallel-js-implementation">the <code>ForkJoin</code> function I described in my previous post</a>.
Presuming that we haven&#8217;t encountered too many bailouts yet,
<code>ForkJoin</code> will cycle around and try to compile the uncompiled script.</p>

<p>When compiling an uncompiled script, we also set a flag on all the
currently executing scripts in the stack trace.  This flag is a
warning that execution of that script is likely to encounter an
uncompiled script.  The purpose of this flag is to notify later
callers that while the script itself is valid, it likely has callees
that have not been compiled, so before running the script in parallel
we should re-walk the transitive closure of things it might call and
check for anything that is missing.</p>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Associated items continued]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/04/03/associated-items-continued/"/>
    <updated>2013-04-03T08:37:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/04/03/associated-items-continued</id>
    <content type="html"><![CDATA[<p>I want to <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/02/associated-items">finish my discussion of associated items</a> by taking a
look at how they are handled in Haskell, and what that might mean in
Rust.</p>

<p>These proposals have the same descriptive power as what I described
before, but they are backwards compatible.  This is nice.</p>

<!-- more -->


<h3>Object-oriented style name resolution</h3>

<p>In the object-oriented, C++-like version of associated items that I
introduced before, the names of associated items and methods were
resolved relative to a type.  To see what I mean by this, consider a
(slightly expanded) variant of the graph example I introduced before:</p>

<pre><code>trait Graph {
    type Node;      // associated type
    static K: uint; // associated constant
}

mod graph {
    fn depth_first_search&lt;G: Graph&gt;(
        graph: &amp;G) -&gt; ~[G::Node]
    {
        let k = G::K;
        ...
    }
}
</code></pre>

<p>Consider a path like <code>graph::depth_first_search</code>, which names an item
within a module.  This kind of path is based solely on the module
hierarchy and can be resolved without knowing anything about types
whatsoever.</p>

<p>Now consider the paths <code>G::Node</code> and <code>G::K</code> that appear in
<code>depth_first_search</code>. These paths <em>look</em> similar in form to
<code>graph::depth_first_search</code>, but they are resolved quite
differently. Because <code>G</code> is a type and not a module, if we want to
figure out what names <code>Node</code> and <code>K</code> refer to, we don&#8217;t examine the
module hierarchy but rather the properties of the type <code>G</code>. In this
case, <code>G</code> is a type parameter that implements the <code>Graph</code> trait, and
the <code>Graph</code> trait defines an associated type <code>Node</code> and an associated
constant <code>K</code>.</p>

<p>Note that the name lookup process here is exactly analogous to what
happens on a method call. With an expression like <code>a.b()</code>, the meaning
of the name <code>b</code> is resolved by first examining the type of the
expression <code>a</code> and then checking to see what methods that type
offers. The module hierarchy is not consulted.</p>

<p>The object-oriented style of naming specification is not fully
explicit. In particular, a path like <code>G::Node</code> does not specify the
trait in which <code>Node</code> was defined; the compiler must figure it out.
It is also possible that the type <code>G</code> implements multiple traits that
have an associated item <code>Node</code>, so the syntax could be ambiguous.  To
make things fully explicit, I proposed in my previous post that the
full syntax would be <code>Type::(Trait::Item)</code>.  So the fully explicit
form of <code>G::Node</code> would be <code>G::(Graph::Node)</code>, since the type <code>Node</code>
is defined in the trait <code>Graph</code>.</p>
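
<p>For comparison, here is a small sketch in present-day Rust, which
resolves <code>G::Node</code> and <code>G::K</code> relative to a type in much this
way, but spells the fully explicit form <code>&lt;G as Graph&gt;::Node</code>
rather than <code>G::(Graph::Node)</code>.  The <code>Line</code> type, the <code>root()</code>
method, and the two functions are made up for the illustration:</p>

<pre><code>trait Graph {
    type Node;       // associated type
    const K: usize;  // associated constant
    fn root(&amp;self) -&gt; Self::Node;
}

// A made-up graph type whose nodes are plain integers.
struct Line;

impl Graph for Line {
    type Node = u32;
    const K: usize = 2;
    fn root(&amp;self) -&gt; u32 {
        0
    }
}

// Short form: G::Node and G::K are resolved through the bound G: Graph.
fn short_form&lt;G: Graph&gt;(graph: &amp;G) -&gt; (G::Node, usize) {
    (graph.root(), G::K)
}

// Fully explicit form: the trait is named alongside the type.
fn explicit_form&lt;G: Graph&gt;(graph: &amp;G) -&gt; (&lt;G as Graph&gt;::Node, usize) {
    (&lt;G as Graph&gt;::root(graph), &lt;G as Graph&gt;::K)
}
</code></pre>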

<h3>Functional-style name resolution (take 1)</h3>

<p>In Haskell, all name resolution is done based on lexical scoping and
the module hierarchy. This is no accident. It means that the Haskell
compiler can figure out what each name in a program refers to without
knowing anything about the types involved, which is helpful when
performing aggressive type inference.</p>

<p>What this means is that we can&#8217;t use a path like <code>G::Node</code> to mean
&#8220;the type <code>Node</code> relative to the type <code>G</code>&#8221;, because interpreting this
path would require examining the definition of the type <code>G</code> (as we saw
before). Instead, if we were to use a syntax analogous to what Haskell
uses, one would write something like <code>Graph::Node&lt;G&gt;</code>.  Note that all
the names here (<code>Graph::Node</code>, <code>G</code>) can be resolved using only the
module hierarchy.</p>

<p>So the example I gave before would look as follows:</p>

<pre><code>trait Graph {
    type Node;      // associated type
    static K: uint; // associated constant
}

mod graph {
    fn depth_first_search&lt;G: Graph&gt;(
        graph: &amp;G) -&gt; ~[Graph::Node&lt;G&gt;]
    {
        let k = Graph::K::&lt;G&gt;;
        ...
    }
}
</code></pre>

<p>Note that where before we wrote <code>G::K</code> to refer to the constant <code>K</code>
associated with the type <code>G</code>, we would now write <code>Graph::K::&lt;G&gt;</code>.  As
is typical in Rust, the extra <code>::</code> that appears before the type
parameter <code>&lt;G&gt;</code> is necessary to avoid parsing ambiguities when the
path appears as part of an expression.</p>

<p>Let&#8217;s look a bit more closely at what&#8217;s going on here.  Effectively
what is happening is that, for each associated item within a trait, we
are adding a synthetic type parameter. For any reference to an
associated item, this type parameter tells the compiler which type is
implementing the trait. The path <code>Graph::Node</code> by itself is not
complete; <code>Graph::Node&lt;G&gt;</code> means &#8220;the type <code>Node</code> defined for the type
<code>G</code>&#8221;.</p>

<p>Let&#8217;s dig into this Haskell-style convention a bit to see some of
the implications.</p>

<h4>Return type inference</h4>

<p>One benefit of the Haskell style convention is that the values for
the type parameters can often be deduced by inference. For example,
let&#8217;s return to the trait <code>FromStr</code> from my <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/02/associated-items">previous post</a>.
The trait <code>FromStr</code> is used to parse a string and produce a value of
some other type:</p>

<pre><code>trait FromStr {
    fn parse(input: &amp;str) -&gt; Self;
}
</code></pre>

<p>We might implement <code>FromStr</code> for unsigned integers as follows:</p>

<pre><code>impl FromStr for uint {
    fn parse(input: &amp;str) -&gt; uint {
        uint::parse(input, 10) // 10 is the radix
    }
}
</code></pre>

<p>Now we could write a function that invokes <code>parse()</code> like so:</p>

<pre><code>fn parse_strings(v: &amp;[&amp;str]) -&gt; ~[uint] {
    v.map(|s| FromStr::parse(*s))
}
</code></pre>

<p>Note that when we called <code>FromStr::parse(*s)</code>, we did not say what
type it should parse to.  The compiler was able to infer that we wanted
to parse a string into a <code>uint</code> based on the return type of
<code>parse_strings()</code> as a whole.  A fully explicit version of <code>parse_strings</code>
would look like:</p>

<pre><code>fn parse_strings(v: &amp;[&amp;str]) -&gt; ~[uint] {
    v.map(|s| FromStr::parse::&lt;uint&gt;(*s))
    //                        ^~~~~~ specify return type
}
</code></pre>
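
<p>The same kind of inference can be tried out in present-day Rust.
This is a sketch under a few assumptions: the trait is renamed <code>Parse</code>
(a made-up name, to avoid confusion with the standard library&#8217;s
<code>FromStr</code>), malformed input simply yields 0, and the explicit form
uses today&#8217;s <code>&lt;u32 as Parse&gt;::parse</code> spelling rather than the
notation discussed in this post:</p>

<pre><code>// Local stand-in for the FromStr trait used in the post.
trait Parse {
    fn parse(input: &amp;str) -&gt; Self;
}

impl Parse for u32 {
    fn parse(input: &amp;str) -&gt; u32 {
        // Illustrative choice: fall back to 0 on malformed input.
        input.parse::&lt;u32&gt;().unwrap_or(0)
    }
}

// The implementing type is never named at the call site; it is inferred
// from the declared return type Vec&lt;u32&gt;.
fn parse_strings(v: &amp;[&amp;str]) -&gt; Vec&lt;u32&gt; {
    v.iter().map(|s| Parse::parse(*s)).collect()
}

// Fully explicit version: the implementing type is spelled out.
fn parse_strings_explicit(v: &amp;[&amp;str]) -&gt; Vec&lt;u32&gt; {
    v.iter().map(|s| &lt;u32 as Parse&gt;::parse(*s)).collect()
}
</code></pre>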

<h4>Generic traits</h4>

<p>Imagine that we have a generic trait, like this <code>Add</code> trait:</p>

<pre><code>trait Add&lt;Rhs&gt; {
    type Sum;

    fn add(&amp;self, r: &amp;Rhs) -&gt; Sum&lt;Self, Rhs&gt;;
}
</code></pre>

<p>This trait is very similar to the trait used in Rust to implement
operator overloading, except it has been adapted to use an associated
type for the <code>Sum</code> (which is probably how the Rust trait should be
defined as well, since the type of the sum ought to be determined by
the types of the things being added).</p>

<p>Previously, with the associated type <code>Node</code>, we said that any
reference to <code>Node</code> had to include a single type parameter to indicate
the type that was implementing <code>Graph</code>.  But with a generic trait like
<code>Add</code> a single type parameter is not enough.  To fully specify all the
types involved, we need to include both the <code>Self</code> type and any type
parameters.  This is why the return type of the method <code>add()</code> is
<code>Sum&lt;Self, Rhs&gt;</code>&#8212;a mere reference to <code>Sum</code> or <code>Sum&lt;Self&gt;</code> would be incomplete.</p>

<p><strong>Comparison to object-oriented form.</strong> Interestingly, this case is
something that the object-oriented style of naming cannot handle very
well. This is because the object-oriented convention is strongly
oriented towards specifying the <code>Self</code> type but does not easily expand
to accommodate generic traits. Using the fully explicit syntax that I
suggested in my <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/02/associated-items">previous post</a>, I think the result would look
like <code>Self::(Add&lt;Rhs&gt;::Sum)</code>. (It&#8217;s plausible that, within the trait
definition itself, one could simply write <code>Sum</code>, but from outside the
trait definition I think it would be necessary to specify the full
type parameters).</p>
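
<p>To make the generic-trait case concrete, here is a sketch in
present-day Rust, where the fully explicit path is written
<code>&lt;L as Add&lt;R&gt;&gt;::Sum</code>.  The trait below is a local copy of the
<code>Add</code> trait from this section (the standard library&#8217;s
<code>std::ops::Add</code> has a similar shape, with the associated type named
<code>Output</code>), and <code>sum_of()</code> is a made-up helper.  Note that the same
<code>Self</code> type has two different <code>Sum</code> types, which is why naming
<code>Sum</code> requires both <code>Self</code> and <code>Rhs</code>:</p>

<pre><code>// Local version of the Add trait from the post.
trait Add&lt;Rhs&gt; {
    type Sum;
    fn add(&amp;self, r: &amp;Rhs) -&gt; Self::Sum;
}

// i32 + f64 produces an f64 ...
impl Add&lt;f64&gt; for i32 {
    type Sum = f64;
    fn add(&amp;self, r: &amp;f64) -&gt; f64 {
        *self as f64 + *r
    }
}

// ... while i32 + i32 produces an i32.
impl Add&lt;i32&gt; for i32 {
    type Sum = i32;
    fn add(&amp;self, r: &amp;i32) -&gt; i32 {
        *self + *r
    }
}

// From outside the trait, Sum is only meaningful once both the Self
// type (L) and the trait parameter (R) are given.
fn sum_of&lt;L, R&gt;(l: &amp;L, r: &amp;R) -&gt; &lt;L as Add&lt;R&gt;&gt;::Sum
where
    L: Add&lt;R&gt;,
{
    l.add(r)
}
</code></pre>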

<h4>Generic items</h4>

<p>It is also possible to have an associated item which itself has
type parameters.  For example, we might want to have a graph
where each kind of node can carry its own userdata:</p>

<pre><code>trait Graph {
    type Node&lt;B&gt;;

    fn get_node_userdata&lt;B&gt;(n: &amp;Node&lt;Self, B&gt;) -&gt; B;
}
</code></pre>

<p>Here we see that when we refer to the <code>Node</code> type in
<code>get_node_userdata</code>, we specify both the <code>Self</code> type parameter and the
type parameters defined on <code>Node</code> itself.  I think this is a bit
surprising.</p>

<p><strong>Comparison to object-oriented form.</strong> The object-oriented naming
scheme handles this case very naturally.  For example, <code>get_node_userdata()</code>
would be declared as follows:</p>

<pre><code>fn get_node_userdata&lt;B&gt;(n: &amp;Self::Node&lt;B&gt;) -&gt; B;
</code></pre>
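
<p>Present-day Rust can express this with a generic associated type, and
the path is written in the object-oriented form <code>Self::Node&lt;B&gt;</code>.
The sketch below makes a few assumptions of its own: <code>SimpleGraph</code>,
<code>SimpleNode</code>, and <code>make_node()</code> are invented names, and
<code>get_node_userdata()</code> returns a reference to the userdata rather than
the userdata itself, so that no copy is needed:</p>

<pre><code>trait Graph {
    // A generic associated type: each node carries userdata of type B.
    type Node&lt;B&gt;;

    fn make_node&lt;B&gt;(data: B) -&gt; Self::Node&lt;B&gt;;
    fn get_node_userdata&lt;B&gt;(n: &amp;Self::Node&lt;B&gt;) -&gt; &amp;B;
}

// Made-up implementation for the illustration.
struct SimpleGraph;

struct SimpleNode&lt;B&gt; {
    userdata: B,
}

impl Graph for SimpleGraph {
    type Node&lt;B&gt; = SimpleNode&lt;B&gt;;

    fn make_node&lt;B&gt;(data: B) -&gt; SimpleNode&lt;B&gt; {
        SimpleNode { userdata: data }
    }

    fn get_node_userdata&lt;B&gt;(n: &amp;SimpleNode&lt;B&gt;) -&gt; &amp;B {
        &amp;n.userdata
    }
}
</code></pre>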

<h3>Functional-style name resolution (take 2)</h3>

<p>In the previous section we added implicit type parameters to each
associated item.  Particularly in the cases of generic traits or
generic items, this can be a bit confusing. You wind up mixing type
parameters that were declared on the trait together with type
parameters declared on the item.</p>

<p>An alternative that would be a bit more explicit is to (1) designate
the implicit parameter for the trait&#8217;s Self type using a special
keyword, such as <code>for</code> or <code>self</code> (I prefer <code>for</code> since it echoes the
<code>impl Trait for Type</code> form) and (2) push the trait type parameters
into the path itself.  So instead of writing <code>Graph::Node&lt;G&gt;</code> you
would write <code>Graph::Node&lt;for G&gt;</code>, and instead of <code>Add::Sum&lt;Lhs, Rhs&gt;</code>
you would write <code>Add&lt;Rhs&gt;::Sum&lt;for Lhs&gt;</code>. You&#8217;ll see more examples of
how this looks in the next section.</p>

<h3>Conclusion: Comparing the conventions</h3>

<p>I think none of these conventions is perfect. Each has cases where it
is a bit counterintuitive or ugly. To try and make the comparison
easier, I&#8217;m going to create a table summarizing the object-oriented,
functional 1, and functional 2 styles, and show how each syntax looks
for each of the use cases I identified in this post. For each use
case, I&#8217;ll provide both the shortest possible form and the fully
explicit variant.</p>

<p><table class="hor-minimalist-a">
<tr><th>Object-oriented</th><th>Functional 1</th><th>Functional 2</th></tr>
<tr><th colspan=3>Reference to an associated type</th></tr>
<tr><td>G::Node</td><td>Node&lt;G&gt;</td><td>Node&lt;for G&gt;</td></tr>
<tr><td>G::(Graph::Node)</td><td>Graph::Node&lt;G&gt;</td><td>Graph::Node&lt;for G&gt;</td></tr>
<tr><th colspan=3>Reference to an associated constant</th></tr>
<tr><td>G::K</td><td>K::&lt;G&gt;</td><td>K::&lt;for G&gt;</td></tr>
<tr><td>G::(Graph::K)</td><td>Graph::K::&lt;G&gt;</td><td>Graph::K::&lt;for G&gt;</td></tr>
<tr><th colspan=3>Call of an associated function</th></tr>
<tr><td>uint::parse()</td><td>parse()</td><td>parse()</td></tr>
<tr><td>uint::(FromStr::parse())</td><td>FromStr::parse::&lt;uint&gt;()</td><td>FromStr::parse::&lt;for uint&gt;()</td></tr>
<tr><th colspan=3>Generic trait</th></tr>
<tr><td>Self::(Add&lt;Rhs&gt;::Sum)</td><td>Sum&lt;Self,Rhs&gt;</td><td>Add&lt;Rhs&gt;::Sum&lt;for Self&gt;</td></tr>
<tr><td>Self::(Add&lt;Rhs&gt;::Sum)</td><td>Add::Sum&lt;Self,Rhs&gt;</td><td>Add&lt;Rhs&gt;::Sum&lt;for Self&gt;</td></tr>
<tr><th colspan=3>Generic associated item</th></tr>
<tr><td>Self::Node&lt;B&gt;</td><td>Node&lt;Self,B&gt;</td><td>Node&lt;B for Self&gt;</td></tr>
<tr><td>Self::(Graph::Node&lt;B&gt;)</td><td>Graph::Node&lt;Self,B&gt;</td><td>Graph::Node&lt;B for Self&gt;</td></tr>
</table></p>


<p>Based on this table, my feeling is that the object-oriented style
handles the simple cases the best (<code>G::Node</code>, <code>G::K</code>), but it handles
the &#8220;generic trait&#8221; case very badly.</p>

<p>There are also some side considerations:</p>

<ol>
<li>Functional 1 is (mostly) backwards compatible with the current code.</li>
<li>Functional 1 provides return-type inference, which many people find
appealing.</li>
<li>The object-oriented style means that <code>a.b(...)</code> is always
sugar for <code>T::b(a, ...)</code> where <code>T</code> is the type of <code>a</code>, which is
elegant.</li>
<li>The functional styles mean that <code>::</code> is always module-based name
resolution and <code>.</code> is always type-based resolution, which has an
elegance of its own.</li>
</ol>


<p>It&#8217;s a tough call, but right now I think on balance I lean towards one
of the two functional notations, probably functional 2 because,
despite being wordier, it seems a bit clearer what&#8217;s going on. Just
appending the type parameters from the trait and the method together
is confusing.</p>

<h3>Appendix A. Functional notation (take 3)</h3>

<p>There is one other way you might handle the placement of type
parameters in the functional style.  You might take the &#8220;self&#8221; type
and place it on the trait: i.e., instead of <code>Graph::Node&lt;for G&gt;</code> you&#8217;d
write <code>Graph&lt;for G&gt;::Node</code>.  This is arguably more correct if you
think about traits in terms of Haskell type classes, since the self
type is really the same as any other type parameter on the generic
trait.  But when I experimented with it I found that it was so wordy
and ugly it was a non-starter.</p>

<h3>Appendix B. Haskell and functional dependencies</h3>

<p>In addition to associated types, Haskell also offers a feature called
functional dependencies, which is basically another, independently
developed, means of solving this same problem.  The idea of a
functional dependency is that you can define when some type parameters
of a trait are determined by others.  So, if we were to adapt
functional dependencies in their full generality to Rust syntax, we
might write out the graph example as something like this:</p>

<pre><code>// Associated types:
trait Graph {
    type Node;
    type Edge;
}

// Functional dependencies:
trait Graph&lt;Node, Edge&gt; {
    Self -&gt; Node;
    Self -&gt; Edge;
    ...
}
</code></pre>

<p>The line <code>Self -&gt; Node</code> states that, given the type of <code>Self</code>, you can
determine the type <code>Node</code> (and likewise for <code>Edge</code>).  You can see that
associated types can be translated to functional dependencies in a
quite straightforward fashion.</p>

<p>When functional dependencies have been declared, it implies that there
is no need to specify the values of all the type parameters.  For
example, it would be legal to write our <code>depth_first_search</code>
routine without specifying the edge type parameter <code>E</code> on <code>Graph</code>:</p>

<pre><code>fn depth_first_search&lt;N, G: Graph&lt;N&gt;&gt;(
    graph: &amp;mut G,
    start_node: &amp;N) -&gt; ~[N]
{
    /* same as before */
}
</code></pre>

<p>The reason that we do not have to specify <code>E</code> is that (1) we do not
use it and (2) it is fully determined by the type <code>G</code> anyhow, so there
is no ambiguity here.  In other words, there can&#8217;t be multiple
implementations of <code>Graph</code> that have the same self type but different
edge types.</p>

<p>Functional dependencies are more general than associated types.  They
allow you to say a number of other things that you could never write
with an associated type, for example:</p>

<pre><code>trait Graph&lt;Node, Edge&gt; {
    Node -&gt; Edge;
    ...
}
</code></pre>

<p>This trait declaration says that, if you know the type of the nodes
<code>Node</code>, then you know the type of the edges <code>Edge</code>.  However, knowing the
type <code>Self</code> isn&#8217;t enough to tell you either of them.  I don&#8217;t know of
any examples where expressiveness like this is useful, however.</p>

<h3>Appendix C. &#8220;where&#8221; clauses.</h3>

<p>There is one not-entirely-obvious interaction between associated types
and other parts of the syntax.  Suppose that I wanted to write a
function that worked over any graph whose nodes were represented as
integers (it is very common to represent graph nodes as integers when
working with large graphs). If we defined the graph trait using a
simple type parameter, like so:</p>

<pre><code>trait Graph1&lt;N&gt; { ... }
</code></pre>

<p>then I could write a depth-first-search routine that expects
a graph with <code>uint</code> nodes as follows:</p>

<pre><code>fn depth_first_search_over_uints&lt;G: Graph1&lt;uint&gt;&gt;(graph: &amp;G) { ... }
</code></pre>

<p>But we saw in <a href="http://smallcultfollowing.com/babysteps/blog/2013/04/02/associated-items">the previous post</a> that this definition of
<code>Graph</code> has a number of downsides. In fact, it was the motivating
example for associated types. So we&#8217;d rather write the trait
like so:</p>

<pre><code>trait Graph {
    type Node;
    ...
}
</code></pre>

<p>But now it seems that I cannot write a <code>depth_first_search_over_uints</code>
routine anymore! After all, where would I write it?</p>

<pre><code>fn depth_first_search_over_uints&lt;G: Graph&gt;(graph: &amp;G) { ... }
</code></pre>

<p>Many languages answer this problem by adding a separate clause that
can be used to specify additional constraints.  In Rust we might write
it like so (hearkening back to the typestate constraint syntax):</p>

<pre><code>fn depth_first_search_over_uints&lt;G: Graph&gt;(graph: &amp;G)
    : G::Node == uint
{ ... }
</code></pre>

<p>This is not the end of the world, but it&#8217;s also unfortunate, since
this kind of clause leaks into closure types and all throughout the
language. But while discussing associated types with Felix
at some point I realized that there is a workaround for this
situation. If you have a trait like <code>Graph</code> that uses an associated
type, but you would like to write a routine like
<code>depth_first_search_over_uints</code>, you can write an adapter:</p>

<pre><code>trait Graph1&lt;N&gt; { ... } // as before
impl&lt;G: Graph&gt; Graph1&lt;G::Node&gt; for G { ... }
</code></pre>

<p>Now I can write <code>depth_first_search_over_uints</code> and have it work for
any type that implements <code>Graph</code>.</p>

<p>This adapter trait is not the most elegant solution but it works. I
would not expect this situation to arise that frequently, but it will
come up from time to time. The <code>Add</code> and <code>Iterable</code> traits come to
mind.</p>
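<p>To make the two options concrete, here is a rough sketch written in the syntax of later Rust releases (<code>usize</code> instead of <code>uint</code>, <code>Vec</code> instead of <code>~[T]</code>). It assumes a language that accepts a binding such as <code>Graph&lt;Node = usize&gt;</code> in a <code>where</code> clause, which is not the syntax proposed above. The <code>AdjacencyList</code> type and the <code>successors</code> and <code>successors1</code> methods are made up for illustration.</p>

```rust
// A trait that uses an associated type for its nodes.
trait Graph {
    type Node;
    fn successors(&self, n: &Self::Node) -> Vec<Self::Node>;
}

// Hypothetical graph whose nodes are plain indices.
struct AdjacencyList {
    edges: Vec<Vec<usize>>,
}

impl Graph for AdjacencyList {
    type Node = usize;
    fn successors(&self, n: &usize) -> Vec<usize> {
        self.edges[*n].clone()
    }
}

// Option 1: a separate clause constrains the associated type.
fn count_successors_over_uints<G>(graph: &G, node: usize) -> usize
where
    G: Graph<Node = usize>,
{
    graph.successors(&node).len()
}

// Option 2: the adapter. `Graph1` takes the node type as a plain type
// parameter, and the blanket impl covers every type that implements `Graph`.
trait Graph1<N> {
    fn successors1(&self, n: &N) -> Vec<N>;
}

impl<G: Graph> Graph1<G::Node> for G {
    fn successors1(&self, n: &G::Node) -> Vec<G::Node> {
        self.successors(n)
    }
}

fn count_successors_via_adapter<G: Graph1<usize>>(graph: &G, node: usize) -> usize {
    graph.successors1(&node).len()
}
```

<p>Both functions accept any <code>Graph</code> whose nodes are <code>usize</code>. The adapter version keeps the constraint inside the ordinary bound list, at the cost of an extra trait.</p>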
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Associated items]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/04/02/associated-items/"/>
    <updated>2013-04-02T14:21:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/04/02/associated-items</id>
    <content type="html"><![CDATA[<p>I&#8217;ve been doing a lot of thinking about Rust&#8217;s trait system lately.
The current system is a bit uneven: it offers a lot of power, but the
implementation is inconsistent and incomplete, and in some cases we
haven&#8217;t thought hard enough about precisely what should be allowed and
what should not.  I&#8217;m going to write a series of posts looking at
various aspects of the trait system and trying to suss out what we
should be doing in each case. In particular I want to be sure that our
trait design is <em>forwards compatible</em>: that is, I expect that we will
defer final decisions about various aspects of the trait system until
after 1.0, but we should look now and try to anticipate any future
difficulties we may encounter.</p>

<p>As the inaugural post in this series, I want to take a look at
<em>associated items</em> (e.g., associated types, constants, functions,
etc).  Associated items are requested often, though under various
names.  When I first started this post, it was actually part of a
larger post, but I quickly found that the topic of associated items
was too large to be a footnote of another post.  In fact, I&#8217;m finding
it&#8217;s too large to fit into one post at all.  So I&#8217;ll be breaking this
post up into multiple pieces.  This first post will cover what an
associated item <em>is</em> and what you might want to use it for, and it
will do so from a C++ perspective.</p>

<p><strong>I will also propose some changes to how we handle so-called &#8220;static&#8221;
fns (which I will be calling &#8220;associated&#8221; functions, because the name
&#8220;static&#8221; gives all the wrong connotations).</strong> These changes are not
backwards compatible.  I do not take such an idea lightly; we are
trying very hard to stabilize Rust so such changes must pass a high
bar (I personally think the change would be worth it, but opinions
will vary).  In the next post, I will present the Haskell approach to
associated items, which is closer to what we have today and which can
be adapted in a mostly backwards-compatible fashion.</p>

<!-- more -->


<p>Associated items sound like some kind of crazy language extension, but
they&#8217;re actually pretty straightforward and natural.  They are used
very frequently both in C++ and Haskell, as well as other languages.
To get an idea what you might want one for, imagine you were going to
design a generic graph library, and you want to implement some
algorithms that operate over any sort of graph.</p>

<p>You might begin with defining a generic graph trait that defines
the interface your algorithms will expect to manipulate the graph:</p>

<pre><code>trait Graph&lt;Node&gt; {
    fn get_visited(&amp;self, n: &amp;Node) -&gt; bool;
    fn set_visited(&amp;mut self, n: &amp;Node);
    fn get_successors(&amp;self, n: &amp;Node) -&gt; ~[Node];
    ...
}
</code></pre>

<p>The details are not too important but you get the idea.  Now, we might
implement a function like <code>depth_first_search</code>, which executes a depth
first search and returns the nodes we visited in order:</p>

<pre><code>fn depth_first_search&lt;N, G: Graph&lt;N&gt;&gt;(
    graph: &amp;mut G,
    start_node: &amp;N) -&gt; ~[N]
{
    let mut nodes = ~[];
    let mut stack = ~[start_node];
    while !stack.is_empty() {
        let node = stack.pop();
        if graph.get_visited(node) {
            loop; // already visited
        }
        graph.set_visited(node);
        nodes.push(node);
        stack.push_all(graph.get_successors(node));
    }
    return nodes;
}
</code></pre>

<p>Notice that <code>depth_first_search</code> takes two type parameters, <code>N</code> and
<code>G</code>, where <code>N</code> is the type of the nodes used by the graph <code>G</code>.  If you
think about it, this is a bit odd, because these two type parameters
are not really independent.  Typically, when one implements a graph,
you implement it for a specific kind of node, and only that kind of
node.  Now, so long as the <code>Graph</code> trait is only parameterized by the
type of the nodes, this is not so bad, but in practice a real graph
library will grow a number of similar type parameters. For example, we
might want the type of the edges, which would give us two type parameters:</p>

<pre><code>trait Graph&lt;Node, Edge&gt; {
    ...
}

fn depth_first_search&lt;N, E, G: Graph&lt;N, E&gt;&gt;(
    graph: &amp;mut G,
    start_node: &amp;N) -&gt; ~[N]
{
    ...
}
</code></pre>

<p>Already you can see that our signatures are getting complicated.
There is another problem as well: even though <code>depth_first_search</code>
does not need to consider edges (we saw the implementation earlier, and
it only needed the type <code>N</code>), we must still include the edge type
<code>E</code> in the signature.</p>

<p>Now imagine that we want to make an efficient graph type.  It is
likely that we can use a specialized type to represent a set of edges
or nodes; a bitset, for example.  In that case, we would want a third
and maybe even a fourth type parameter (<code>NodeSet</code> or <code>EdgeSet</code>).  The
list just keeps growing.  And for each such type parameter, we will
have to extend the signature of <code>depth_first_search</code> along with every
generic function that is implemented over our graph.  This is not only
unwieldy, it&#8217;s a refactoring hazard that will limit the ability of
people to write generic libraries.</p>

<h3>Enter associated types</h3>

<p>C++ had a similar problem in the design of the STL.  Because C++
templates are basically just macros, however, clever C++ programmers were
able to come up with a useful pattern that avoids all these hazards (I
probably have my history wrong here, no doubt C++ programmers adapted
a solution first used in other languages, perhaps without even knowing
it, but it reads better this way, doesn&#8217;t it?).  Instead of defining
the trait <code>Graph</code> as being parameterized over the node type <code>Node</code>,
define the node type <code>Node</code> as an &#8220;associated type&#8221;:</p>

<pre><code>trait Graph {
    type Node; // associated type

    fn get_visited(&amp;self, n: &amp;Node) -&gt; bool;
    fn set_visited(&amp;mut self, n: &amp;Node);
    fn get_successors(&amp;self, n: &amp;Node) -&gt; ~[Node];
    ...
}
</code></pre>

<p>Notice that the definition of <code>Node</code> has moved <em>inside</em> the trait.
The meaning of this is that any given <code>Graph</code> implementation will
define a type <code>Node</code> that represents nodes.  That is, rather than
<code>Node</code> being an &#8220;input&#8221; to the trait, it is an &#8220;output&#8221;, just like the
functions <code>get_visited()</code> etc are &#8220;outputs&#8221;.</p>

<p>Now we can adapt our <code>depth_first_search</code> routine as follows:</p>

<pre><code>fn depth_first_search&lt;G: Graph&gt;(
    graph: &amp;mut G,
    start_node: &amp;G::Node) -&gt; ~[G::Node]
{
    /* same as before */
}
</code></pre>

<p>Note that <code>depth_first_search</code> only takes one type parameter, the
graph type <code>G</code>.  The type of the node is then relative to <code>G</code> (so
<code>G::Node</code> would be &#8220;the node type used by the graph type <code>G</code>&#8221;).</p>

<p>Interestingly, I can now add as many associated types to <code>Graph</code> as I
like without affecting the signature of <code>depth_first_search</code> in the
slightest.</p>
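<p>For reference, here is a minimal sketch of the same design in the syntax of later Rust releases. The <code>AdjacencyList</code> type is a made-up example graph, and the <code>Clone</code> bound on <code>Node</code> is an assumption added only so the search can keep its own copies of nodes.</p>

```rust
trait Graph {
    // Associated type: each implementation picks its own node type.
    type Node: Clone;

    fn get_visited(&self, n: &Self::Node) -> bool;
    fn set_visited(&mut self, n: &Self::Node);
    fn get_successors(&self, n: &Self::Node) -> Vec<Self::Node>;
}

// Hypothetical graph whose nodes are indices into `edges`.
struct AdjacencyList {
    edges: Vec<Vec<usize>>,
    visited: Vec<bool>,
}

impl Graph for AdjacencyList {
    type Node = usize;

    fn get_visited(&self, n: &usize) -> bool {
        self.visited[*n]
    }
    fn set_visited(&mut self, n: &usize) {
        self.visited[*n] = true;
    }
    fn get_successors(&self, n: &usize) -> Vec<usize> {
        self.edges[*n].clone()
    }
}

// Only one type parameter; the node type is named relative to `G`.
fn depth_first_search<G: Graph>(graph: &mut G, start_node: &G::Node) -> Vec<G::Node> {
    let mut nodes = Vec::new();
    let mut stack = vec![start_node.clone()];
    while let Some(node) = stack.pop() {
        if graph.get_visited(&node) {
            continue; // already visited
        }
        graph.set_visited(&node);
        stack.extend(graph.get_successors(&node));
        nodes.push(node);
    }
    nodes
}
```

<p>Adding an <code>Edge</code> or <code>NodeSet</code> associated type to this trait would require changes to the implementations, but not to the signature of <code>depth_first_search</code>.</p>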

<h3>Associated constants</h3>

<p>It is not hard to imagine extending this idea to other kinds of
associated members.  For example, we might write up a trait like
<code>Vector</code> that has an associated constant specifying the number of
dimensions in vectors of this type:</p>

<pre><code>trait Vector {
    static dims: uint;
    fn get(&amp;self, dim: uint) -&gt; uint;
}
</code></pre>

<p>Now I can write up an implementation, say for a two-dimensional point
type:</p>

<pre><code>struct Point2D { x: uint, y: uint }

impl Vector for Point2D {
    static dims: uint = 2;
    fn get(&amp;self, dim: uint) -&gt; uint {
        assert!(dim &lt; 2);
        if dim == 0 {self.x} else {self.y}
    }
}
</code></pre>

<p>And then I can use this with generic code:</p>

<pre><code>fn sum&lt;V: Vector&gt;(v: &amp;V) -&gt; uint {
    let mut sum = 0;
    for uint::range(0, V::dims) |i| {
        sum += v.get(i);
    }
    return sum;
}
</code></pre>
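<p>As a point of comparison, the same example can be sketched with the associated-constant syntax of later Rust releases, where the member is declared with <code>const</code> rather than <code>static</code>. The names <code>DIMS</code> and <code>total</code> are my own choices here, not part of the proposal.</p>

```rust
trait Vector {
    // Associated constant: every implementation supplies its own value.
    const DIMS: usize;
    fn get(&self, dim: usize) -> u32;
}

struct Point2D {
    x: u32,
    y: u32,
}

impl Vector for Point2D {
    const DIMS: usize = 2;

    fn get(&self, dim: usize) -> u32 {
        assert!(dim < Self::DIMS);
        if dim == 0 { self.x } else { self.y }
    }
}

// Generic code reads the constant through the type parameter.
fn sum<V: Vector>(v: &V) -> u32 {
    let mut total = 0;
    for i in 0..V::DIMS {
        total += v.get(i);
    }
    total
}
```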

<h3>Associated functions</h3>

<p>Associated functions are useful in a couple of different contexts.
One common example is where you would like to define a trait that
includes some sort of constructor, such as <code>FromStr</code>:</p>

<pre><code>trait FromStr {
    fn parse(input: &amp;str) -&gt; Self;
}
</code></pre>

<p>Here the trait defines an associated function <code>parse()</code> that will
parse a string and return an instance of the <code>Self</code> type.  I could
for example implement <code>FromStr</code> for integers:</p>

<pre><code>impl FromStr for uint {
    fn parse(input: &amp;str) -&gt; uint {
        uint::parse(input, 10) // 10 is the radix
    }
}
</code></pre>

<p>Using <code>FromStr</code>, I can write a generic routine that, for example,
parses a comma-separated list of values:</p>

<pre><code>fn parse_comma_separated&lt;T: FromStr&gt;(input: &amp;str) -&gt; ~[T] {
    let substrings = input.split(",");
    substrings.map(|substring| T::parse(substring))
}
</code></pre>

<p>Experienced Rust users might note that the syntax in that example is
actually not what one would write today.  This is the
&#8220;non-backwards-compatible change&#8221; I alluded to earlier.  In Rust
today, when one invokes an associated function, it is not named via
the self type as I did above, but rather it is named via the trait to
which the function belongs:</p>

<pre><code>substrings.map(|substring| FromStr::parse(substring))
</code></pre>

<p>The compiler uses inference to decide that the return type here is <code>T</code>
and therefore the self type for this call to <code>parse</code> must be <code>T</code>. This
approach is elegant in many ways, as I&#8217;ll cover in the next post in
more detail, but it also has some downsides.  Perhaps the most serious
is that, if the associated function does not return an instance of
<code>Self</code>, then the compiler cannot disambiguate what version of the
function you are trying to call!</p>
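<p>The two naming styles can be put side by side. The sketch below uses the syntax of later Rust releases and a hypothetical trait called <code>Parse</code> (renamed so it does not clash with the standard library&#8217;s <code>FromStr</code>). The first function names the function via the self type. The second names it via the trait and relies on the expected type of <code>value</code> to select the implementation.</p>

```rust
// Hypothetical trait with an associated function that returns `Self`.
trait Parse {
    fn parse(input: &str) -> Self;
}

impl Parse for u32 {
    fn parse(input: &str) -> u32 {
        // Panics on malformed input, which is acceptable for a sketch.
        input.trim().parse::<u32>().expect("not an integer")
    }
}

// Named via the self type: `T::parse`.
fn parse_comma_separated<T: Parse>(input: &str) -> Vec<T> {
    input.split(',').map(|s| T::parse(s)).collect()
}

// Named via the trait: the annotation on `value` tells the compiler
// that the self type of this call must be `T`.
fn parse_comma_separated_inferred<T: Parse>(input: &str) -> Vec<T> {
    input
        .split(',')
        .map(|s| {
            let value: T = Parse::parse(s);
            value
        })
        .collect()
}
```

<p>The second form works only because <code>parse</code> returns <code>Self</code>. Without a return type (or argument) that mentions the self type, there is nothing for inference to work from.</p>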

<p>To see where you might have an associated function that does not
return <code>Self</code>, consider a trait like the following:</p>

<pre><code>trait TemperatureUnit {
    fn to_kelvin(f: float) -&gt; float;
}
</code></pre>

<p>Using the C++-approach I have been describing thus far, I could write
a generic function like:</p>

<pre><code>fn do_some_chemistry&lt;TU: TemperatureUnit&gt;(f: float) -&gt; float {
    let kelvin = TU::to_kelvin(f);
    ...
}
</code></pre>

<p>Of course, this example is somewhat artificial, because one would be
better off integrating the temperature units as types in the type
system rather than using floats. But real examples like this do come
up. The associated constant <code>V::dims</code> is an example.</p>
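<p>Here is a small sketch of the <code>TemperatureUnit</code> idea in the syntax of later Rust releases. The <code>Celsius</code> and <code>Fahrenheit</code> marker types and the <code>is_above_boiling</code> function are invented for illustration. Since <code>to_kelvin</code> never mentions <code>Self</code>, the caller has to name the unit type explicitly.</p>

```rust
trait TemperatureUnit {
    // Associated function that neither takes nor returns `Self`.
    fn to_kelvin(value: f64) -> f64;
}

// Hypothetical marker types, one per unit.
struct Celsius;
struct Fahrenheit;

impl TemperatureUnit for Celsius {
    fn to_kelvin(value: f64) -> f64 {
        value + 273.15
    }
}

impl TemperatureUnit for Fahrenheit {
    fn to_kelvin(value: f64) -> f64 {
        (value - 32.0) * 5.0 / 9.0 + 273.15
    }
}

// The unit is chosen through the type parameter `TU`; writing
// `TemperatureUnit::to_kelvin(reading)` would give the compiler no way
// to pick an implementation.
fn is_above_boiling<TU: TemperatureUnit>(reading: f64) -> bool {
    TU::to_kelvin(reading) > 373.15
}
```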

<h3>So is there a proposal here?</h3>

<p>Yes and no. Partially I just wanted to explain what an associated item
is and what you might use it for. But I&#8217;ve also kind of baked in an
alternate proposal for how we should address associated items, which
is to switch from a Haskell-like approach to a C++-like approach.  In
the next post, I&#8217;ll explain how the Haskell solution works, and what
it would look like in Rust.  Frankly the difference is not so great so
it&#8217;s a matter of taste.</p>

<p>Anyway, if you wanted to implement the scheme I&#8217;ve described in this
post, it would work as follows.  When resolving a path, if you find
that some prefix of the path evaluates to a type, then later elements
in the path are resolved using the same algorithm that we use today
for method lookup.  So, to look at our examples, if I wrote <code>G::Node</code>,
the path <code>G</code> here is a type, which means that the type <code>Node</code> would be
determined by examining the traits that are in scope to see whether
any of them both (1) define a type member <code>Node</code> and (2) are
implemented by <code>G</code>.</p>

<p>This is exactly analogous to how method lookup operates.  When you see
a call <code>a.b()</code>, we determine the type <code>T</code> of the expression <code>a</code> and
then look to see whether any of the traits which are in scope (1)
offer a method <code>b()</code> and (2) are implemented by <code>T</code>.</p>

<p>In fact, it&#8217;s a bit more complex, because we also consider the
inherent members of a type that are defined without any trait at all.
We can do the same thing when resolving associated items.</p>

<p>Interestingly, unifying the algorithm used to specify associated items
and method calls also allows us to say that a call like <code>a.b(...)</code> is
just sugar for <code>T::b(a, ...)</code> where <code>T</code> is the type of <code>a</code>.</p>

<h3>Corner cases</h3>

<p>There are a few corner cases to consider in this proposal.</p>

<h4>Ambiguous references</h4>

<p>It is possible to have two traits <code>A</code> and <code>B</code> that define the same
associated item <code>I</code>.  If both those traits are imported, and both
those traits are implemented by the same type <code>T</code>, then a reference
like <code>T::I</code> could refer to the item defined by <code>A</code> or the item defined
by <code>B</code>.  If we wish to provide an explicit syntax to disambiguate the
reference, it could be something like <code>T::(A::I)</code>.  That is, we refer
to the item <code>I</code> as defined in the trait <code>A</code> implemented for the type
<code>T</code>.</p>

<p>Another possible ambiguity can arise when you have a generic trait.
Consider something like the following:</p>

<pre><code>trait Getter&lt;T&gt; {
    static default: T;
    fn get(&amp;self) -&gt; T;
}
</code></pre>

<p>Now imagine that I have some type with two implementations of
<code>Getter</code>:</p>

<pre><code>struct Circle {
    center: Point, radius: float
}

impl Getter&lt;Point&gt; for Circle {
    static default: Point = Point {x: 0, y: 0};
    fn get(&amp;self) -&gt; Point { self.center }
}

impl Getter&lt;float&gt; for Circle {
    static default: float = 0.0;
    fn get(&amp;self) -&gt; float { self.radius }
}
</code></pre>

<p>If I then write a generic routine such as:</p>

<pre><code>fn is_default&lt;G: Getter&lt;float&gt; Getter&lt;Point&gt;&gt;(g: &amp;G) -&gt; bool {
    let x = G::default;
    let y = g.get();
    x == y
}
</code></pre>

<p>Then which <code>default</code> will <code>G::default</code> refer to?
The <code>Point</code> or the <code>float</code>?</p>

<p>Using the syntax that I proposed, one could write this unambiguously,
if verbosely:</p>

<pre><code>fn is_default&lt;G: Getter&lt;float&gt; Getter&lt;Point&gt;&gt;(g: &amp;G) -&gt; bool {
    let x = G::(Getter::&lt;float&gt;::default);
    let y = g.(Getter::&lt;float&gt;::get)();
    x == y
}
</code></pre>
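<p>For comparison, later Rust releases spell this kind of disambiguation as <code>&lt;T as Trait&gt;::item</code> rather than <code>T::(Trait::item)</code>. The sketch below redoes the <code>Getter</code> example in that later syntax. The <code>Point</code> definition and the use of <code>f64</code> in place of <code>float</code> are my assumptions.</p>

```rust
#[derive(Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

trait Getter<T> {
    const DEFAULT: T;
    fn get(&self) -> T;
}

struct Circle {
    center: Point,
    radius: f64,
}

impl Getter<Point> for Circle {
    const DEFAULT: Point = Point { x: 0, y: 0 };
    fn get(&self) -> Point {
        self.center
    }
}

impl Getter<f64> for Circle {
    const DEFAULT: f64 = 0.0;
    fn get(&self) -> f64 {
        self.radius
    }
}

fn is_default<G: Getter<f64> + Getter<Point>>(g: &G) -> bool {
    // Both lines name the trait instantiation explicitly, so neither
    // the constant nor the method call is ambiguous.
    let x = <G as Getter<f64>>::DEFAULT;
    let y = <G as Getter<f64>>::get(g);
    x == y
}
```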

<h4>Not all types are paths</h4>

<p>Another problem is that if you wanted to get an associated member
of a type like <code>~[int]</code>, you couldn&#8217;t write <code>~[int]::foo</code>.  But
this is easily circumvented by creating a type alias</p>

<pre><code>type T&lt;U&gt; = U;
</code></pre>

<p>and writing <code>T::&lt;~[int]&gt;::foo</code>, or else by permitting the syntax
<code>&lt;~[int]&gt;::foo</code>.</p>
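<p>The second option, wrapping the type in angle brackets, is the form that later Rust releases accept. A tiny sketch, using the standard slice and <code>Vec</code> types in place of <code>~[int]</code> (the two function names are invented):</p>

```rust
// `[i32]` is not a path, so it is wrapped in angle brackets before `::`.
fn slice_len_via_type_path(values: &[i32]) -> usize {
    <[i32]>::len(values)
}

// The same bracketed form also works for types that are already paths.
fn empty_vec_via_type_path() -> Vec<i32> {
    <Vec<i32>>::new()
}
```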
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Guaranteeing parallel execution]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/03/21/guaranteeing-parallel-execution/"/>
    <updated>2013-03-21T10:38:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/03/21/guaranteeing-parallel-execution</id>
    <content type="html"><![CDATA[<p>One common criticism of the work on ParallelJS is that the API itself
does not guarantee parallel execution.  Instead, our approach has been
to offer methods whose definition makes parallel execution <em>possible</em>,
but we have left it up to the engines to define the exact set of
JavaScript that will be safe for parallel execution.</p>

<p>Now, I definitely think it is a good idea to clearly define the subset
of JavaScript that our engine will be able to execute in parallel.  As
I wrote in my <a href="http://smallcultfollowing.com/babysteps/blog/2013/03/20/parallel-js-lands/">previous post</a>, I want to do this both via
documentation and via developer tools that provide live feedback.  In
some cases, I think, the rules will probably depend on type inference
or other dynamic analysis techniques that are subtle and hard to
explain, but live feedback should be helpful in detecting and
resolving those cases.</p>

<p>Nonetheless, I do not think that the <em>formal specification</em> of
ParallelJS should include these sorts of details.  In my view, this
would be similar to having the ECMAScript committee define what
patterns in JavaScript will be efficiently JITted and which will not.
This is ultimately going to vary depending on the implementation.</p>

<p>In particular, the JavaScript subset that will be acceptable is going
to vary substantially depending on what techniques are used to
implement the parallelization.  On the extreme end, if we had an
implementation based on transactional memory, you could imagine that
the <em>full JavaScript language</em> might be accepted.  If you think that&#8217;s
science fiction, consider that
<a href="http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions">newer Intel chips will have hardware support for a limited form of transactional memory</a>.
On the other extreme, engines that utilize the GPU will only support a
very limited subset, one that most likely excludes memory allocation.</p>

<p>I find the precedent of <a href="http://asmjs.org/">asm.js</a> to be a more promising
approach.  The formal specification should only state what the
parallel methods are.  Preferably, this specification should be loose
enough to accommodate as many different parallel execution techniques
as possible, but strict enough to prevent wide divergence between
engines.  I have argued in the past for
<a href="http://smallcultfollowing.com/babysteps/blog/2013/01/02/deterministic-or-not/">&#8220;equivalent to some sequential execution&#8221;</a>, and I still think
that&#8217;s the right standard, but there&#8217;s room for discussion on this
point.</p>

<p>Meanwhile, there can be several independent specifications that
provide guidance as to what subset of JavaScript should be supported
to parallelize in different ways.  Writing such a specification now is
probably premature; I think it would be better to have multiple
JavaScript engines involved so that the specification is not tailored
to SpiderMonkey.</p>

<p>It will be challenging, I think, to come up with a specification that
offers the very strong guarantees that &#8220;asm.js&#8221; can offer (no
recompilation, no bailouts, etc).  This is because &#8220;asm.js&#8221; is a
<em>very</em> narrow slice of JS intended to be output by compilers, not by
humans.  It excludes, for example, all normal JavaScript objects.
Now, a specification like this might be useful in a parallel context
as well; it could serve as the backend for other languages.  But I
would hope that we have some broader specifications that define code
that humans can write.</p>

<p>It is also important to point out that &#8220;asm.js&#8221; builds upon a lot of
precedent.  Smart folk working on projects like <a href="https://github.com/kripken/emscripten">Emscripten</a> and
<a href="http://www.mandreel.com/">Mandreel</a> have already done a lot of the leg work to define the
idioms that &#8220;asm.js&#8221; codifies.  I hope that as ParallelJS evolves
we&#8217;ll also evolve a common set of idioms and &#8220;ways of doing things&#8221;
that we can then formalize.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[A tour of the Parallel JS implementation (Part 1)]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/03/20/a-tour-of-the-parallel-js-implementation/"/>
    <updated>2013-03-20T16:30:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/03/20/a-tour-of-the-parallel-js-implementation</id>
    <content type="html"><![CDATA[<p>I am going to write a series of blog posts giving a tour of the
current Parallel JS implementation in SpiderMonkey.  These posts are
intended to serve partly as documentation for the code.  The plan is
to begin high level and work my way down to the nitty gritty details,
so here we go!</p>

<p>I will start my discussion at the level of the intrinsic <code>ForkJoin()</code>
function.  As an intrinsic function, <code>ForkJoin()</code> is not an API
intended for use by end-users.  Rather, it is available only to
self-hosted code and is intended to serve as a building block for
other APIs (<code>ParallelArray</code> among them).</p>

<!-- more -->


<h3>The ForkJoin function</h3>

<p>The idealized view of how <code>ForkJoin</code> works is that you give it a
callback <code>fjFunc</code> and it will invoke <code>fjFunc</code> once on each worker thread.
Each call to <code>fjFunc</code> should therefore do one slice of the total work.</p>

<p>In reality, though, the workings of <code>ForkJoin</code> are rather more
complex.  The problem is that we can only execute Ion-compiled code in
parallel. Moreover, we can only handle Ion-compiled code that avoids
the interpreter, since most of the pathways in the interpreter are not
thread-safe. This means that we require accurate type
information. However, we can only <em>get</em> accurate type information by
running the code (and, moreover, we can&#8217;t run the code in parallel
because the type monitoring infrastructure is not thread-safe).</p>

<p>What we wind up doing therefore is using sequential executions
whenever we can&#8217;t run in parallel.  This might be because there isn&#8217;t
enough type information, or it might be because the code contains
operations that require parts of the interpreter that are not
thread-safe.</p>

<p>In the general case, a single call to <code>ForkJoin</code> can move back and
forth between parallel and sequential execution many times, gathering
more information and potentially recompiling at each step.  After a
certain number of bailouts, however, we will just give up and execute
the remainder of the operations sequentially.</p>

<p>The <code>ForkJoin</code> function is designed such that the caller does not have
to care whether the execution was done in parallel or sequential.
Either way, presuming that the callback <code>fjFunc</code> is properly written,
the same results will have been computed in the end.</p>

<h4>The arguments to ForkJoin</h4>

<p>ForkJoin expects one or two arguments:</p>

<pre><code>ForkJoin(fjFunc)               // Either like this...
ForkJoin(fjFunc, feedbackFunc) // ...or like this.
</code></pre>

<p>Both arguments are functions.  <code>fjFunc</code> defines the operation that
will execute in parallel and <code>feedbackFunc</code> is used for reporting on
whether bailouts occurred and why.  <code>feedbackFunc</code> is optional and may
be undefined or null.  Not passing in feedback will result in slightly
faster execution as less data is gathered.  The <code>ForkJoin</code> function
does not return anything and neither <code>fjFunc</code> nor <code>feedbackFunc</code> are
expected to return any values; instead, <code>fjFunc</code> and <code>feedbackFunc</code>
are expected to mutate values in place to produce their output.</p>

<h4><code>fjFunc</code>: The parallel operation</h4>

<p>The signature of <code>fjFunc</code> is as follows:</p>

<pre><code>fjFunc(sliceId, numSlices, warmup)
</code></pre>

<p>Here <code>sliceId</code> and <code>numSlices</code> are basically the thread id and the
thread count respectively (though we purposefully distinguish between
the <em>slice</em>, a unit of work, and the <em>worker thread</em>&#8212;today there is
always one slice per worker thread, but someday we may improve the
scheduler to support work-stealing or other more intelligent
strategies for dividing work and then this would not necessarily be
true).</p>

<p>The <code>warmup</code> flag indicates whether the function is being called in
<em>warmup mode</em>.  As will be explained in the next section, we expect
<code>fjFunc</code> to generally track how much work it has done so far.  When
<code>warmup</code> is true, the function should do &#8220;some&#8221; of the remaining work,
but not too much.  When <code>warmup</code> is false, it should attempt to do all
the remaining work.  Thus, if <code>fjFunc</code> successfully returns when
<code>warmup</code> is false, then <code>ForkJoin</code> can assume that all of the work for
that slice has been completed.</p>

<h4>Warmups and bailouts</h4>

<p>On the very first call to <code>ForkJoin</code>, it is very likely that the
callback <code>fjFunc</code> has never been executed and therefore no type
information is available.  In that case, <code>ForkJoin</code> will begin by
invoking the callback <code>fjFunc</code> <em>sequentially</em> (i.e., with the normal
interpreter) and with the <code>warmup</code> argument set to true.  We currently
invoke <code>fjFunc</code> once for each slice. As we just said, because <code>warmup</code>
is true, each call to <code>fjFunc</code> should do some of the work in its slice
but not all (<code>fjFunc</code> is responsible for tracking how much work it has
done; I&#8217;ll explain that in a second).  Once the calls to <code>fjFunc</code>
return, and presuming no exceptions are thrown, <code>ForkJoin</code> will
attempt compilation for parallel execution.</p>

<p>Presuming compilation succeeds, <code>ForkJoin</code> will attempt parallel
execution.  This means that we will spin up worker threads and invoke
<code>fjFunc</code> in each one.  This time, the <code>warmup</code> argument will be set to
false, so <code>fjFunc</code> should try and do all the rest of the work that
remains in each slice.  If all of these invocations are successful,
then the <code>ForkJoin</code> procedure is done.</p>

<p>However, it is possible that one or more of those calls to <code>fjFunc</code> may
<em>bailout</em>, meaning that it will attempt some action that is not
permitted in parallel mode.  There are many possible reasons for
bailouts but they generally fall into one of three categories:</p>

<ul>
<li>The type information could be incomplete, leading to a failed type
guard;</li>
<li>The script might have attempted some action that is not (yet?)
supported in parallel mode even though it seems like it might be
theoretically safe, such as access to a JS proxy, built-in C++
function, or DOM object;</li>
<li>The script might have attempted to mutate shared state.</li>
</ul>


<p>What we do in response to a bailout is to fall back to another
sequential, warmup phase.  As part of this fallback, we typically
invalidate the parallel version of <code>fjFunc</code> that bailed out, meaning
that we&#8217;ll recompile it later.  Next, just as we did in the initial
warmup, we invoke <code>fjFunc</code> using the normal interpreter once for each
slice with the <code>warmup</code> argument set to true.</p>

<p>Once this &#8220;recovery&#8221; warmup phase has completed, we will re-attempt
parallel execution.  The idea is that we now have more accurate type
and profiling information, so we should be able to compile
successfully this time.</p>

<p>This process of alternating parallel execution and sequential recovery
runs continues until either (1) a parallel run completes without error
(in which case we&#8217;re done) or (2) we have bailed out three times
(which is a random number, obviously, that we probably want to
tune). Once we&#8217;ve had three bailouts, we&#8217;ll give up and just invoke
<code>fjFunc</code> sequentially with <code>warmup</code> set to false.</p>
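
<p>To summarize the control flow described above, here is a rough sketch in plain JavaScript.  It is not the engine&#8217;s code: <code>forkJoinSketch</code>, <code>runParallel</code>, and <code>MAX_BAILOUTS</code> are names made up for illustration, and <code>runParallel</code> stands in for compiling <code>fjFunc</code> and running it on the worker threads, returning <code>false</code> on a bailout.  The description above does not say whether a recovery warmup also runs after the final bailout; this sketch assumes it does not.</p>

```javascript
// Hypothetical sketch of the ForkJoin control flow; not the engine's code.
// `runParallel(fjFunc, numSlices)` is an assumed stand-in for the parallel
// attempt: it returns true on success and false if any worker bailed out.
const MAX_BAILOUTS = 3;

function forkJoinSketch(fjFunc, numSlices, runParallel) {
  // Invoke fjFunc sequentially, once per slice.
  function sequentialPass(warmup) {
    for (let sliceId = 0; sliceId !== numSlices; sliceId++) {
      fjFunc(sliceId, numSlices, warmup);
    }
  }

  let bailouts = 0;
  sequentialPass(true); // initial warmup
  while (true) {
    if (runParallel(fjFunc, numSlices)) {
      return "parallel"; // a parallel run completed without error
    }
    bailouts++;
    if (bailouts === MAX_BAILOUTS) {
      break; // give up on parallel execution
    }
    sequentialPass(true); // recovery warmup before the next attempt
  }
  sequentialPass(false); // finish whatever work remains, sequentially
  return "sequential";
}
```

<p>The returned string is only there to make the outcome visible in a test; the real procedure has no such return value as far as this post describes.</p>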

<h4>An example: ParallelArray.map</h4>

<p>To make this more concrete, I want to look in more detail at how
<code>ParallelArray.map</code> is implemented in the self-hosted code.  All the
other <code>ParallelArray</code> functions work in a similar fashion so I will
just focus on this one.</p>

<p>The semantics of a parallel map are simple: when the user writes
<code>pa.map(kernelFunc)</code>, a new ParallelArray is returned with the result of
invoking <code>kernelFunc</code> on each element in <code>pa</code>. In effect, this is just
like <code>Array.map</code>, except that the order of each iteration is
undefined.</p>

<p>Our implementation works by dividing the array to be mapped into
chunks, which are groups of 32 elements.  These chunks are then
divided evenly amongst the <code>N</code> worker threads.  The implementation
relies on shared mutable state to track how many chunks each thread
has been able to process thus far.  There is a private array called
<code>info</code> that stores, for each worker, a start index, end index, and a
current index.  The start and end indices simply reflect the range of
chunks assigned to the worker.  The current index, which is initially
the same as the start index, indicates the next chunk that the worker
should attempt to process.  This array is shared across all threads
and is unsafely mutated using special intrinsics (thus bypassing the
normal restrictions against mutating shared state).</p>
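
<p>As a rough illustration of that bookkeeping, here is one way such an array could be built.  This is a sketch under assumptions, not the self-hosted code: the three-entries-per-slice layout, the accessor functions (the real code uses macros), the name <code>prepareInfoArraySketch</code>, and the rule for handing out leftover chunks are all guesses made for the example.</p>

```javascript
// Hypothetical layout: three entries per slice (start, end, current position),
// all measured in chunks. The real macros and layout may differ.
const CHUNK_SIZE = 32;
const SLICE_START = (sliceId) => sliceId * 3;
const SLICE_END = (sliceId) => sliceId * 3 + 1;
const SLICE_POS = (sliceId) => sliceId * 3 + 2;

function prepareInfoArraySketch(length, numSlices) {
  const numChunks = Math.ceil(length / CHUNK_SIZE);
  const base = Math.floor(numChunks / numSlices);
  const extra = numChunks % numSlices; // the first `extra` slices get one more chunk
  const info = [];
  let next = 0;
  for (let sliceId = 0; sliceId !== numSlices; sliceId++) {
    const start = next;
    const end = start + base + (sliceId >= extra ? 0 : 1);
    info[SLICE_START(sliceId)] = start;
    info[SLICE_END(sliceId)] = end;
    info[SLICE_POS(sliceId)] = start; // nothing processed yet
    next = end;
  }
  return info;
}
```

<p>With this layout, a worker is finished once the entry at <code>SLICE_POS(sliceId)</code> has caught up with the entry at <code>SLICE_END(sliceId)</code>.</p>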

<p>The <code>ParallelArray</code> map function is built on <code>ForkJoin</code>.  A simplified
version looks something like this:</p>

<pre><code>function map(kernelFunc) {
    // Compute the bounds each slice will have to operate on:
    var self = this;  // captured by mapSlice below
    var length = self.length;
    var numSlices = ForkJoinSlices();
    var info = prepareInfoArray(length, numSlices);

    // Create the result buffer:
    var buffer = NewDenseArray(length);

    // Perform the computation itself, writing into `buffer`:
    ForkJoin(mapSlice);

    // Package up the buffer in a parallel array and return it:
    return NewParallelArray(buffer);

    function mapSlice(sliceId, numSlices, warmup) {
        // ... see below ...
    }
}
</code></pre>

<p>Here is the source to the map callback function <code>mapSlice</code>:</p>

<pre><code>function mapSlice(sliceId, numSlices, warmup) {
  var chunkPos = info[SLICE_POS(sliceId)];
  var chunkEnd = info[SLICE_END(sliceId)];

  if (warmup &amp;&amp; chunkEnd &gt; chunkPos)
    chunkEnd = chunkPos + 1;

  while (chunkPos &lt; chunkEnd) {
    var indexStart = chunkPos &lt;&lt; CHUNK_SHIFT;
    var indexEnd = std_Math_min(indexStart + CHUNK_SIZE, length);

    // Process current chunk:
    for (var i = indexStart; i &lt; indexEnd; i++)
      UnsafeSetElement(buffer, i, kernelFunc(self.get(i), i, self));

    UnsafeSetElement(info, SLICE_POS(sliceId), ++chunkPos);
  }
}
</code></pre>

<p>This same code is used both for parallel execution and the sequential
fallback.  Each time <code>mapSlice</code> is invoked, it will use the <code>sliceId</code>
it is given to look up the current chunk (<code>info[SLICE_POS(sliceId)]</code>;
<code>SLICE_POS</code> is a macro that computes the correct index).  It will then
process that chunk and update the shared array with the index of the
next chunk (results are unsafely written into the result array
<code>buffer</code>&#8212;note that this result is not yet exposed to non-self-hosted
code).  If we are in warmup mode, it will stop and return once it has
processed a single chunk.  Otherwise, it keeps going and processes the
remaining chunks, updating the shared array at each point.</p>

<p>The purpose of updating the shared array is to record our progress in
the case of a bailout.  If a bailout occurs, it means that after
processing some portion of the current chunk, the function will simply
exit.  As a result, the &#8220;current chunk&#8221; will not be incremented, and
the next time that <code>mapSlice</code> is invoked with that same <code>sliceId</code>, it
will pick up and start re-processing the same chunk.  This does mean
that if a bailout occurs we will process some portion of the chunk
twice, once in parallel mode and then again in sequential mode after
the bailout.  This is unobservable to the end user, though, because
parallel executions are guaranteed to be pure and thus the user could
not have modified shared state or made any observable changes.</p>
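
<p>The resume-after-bailout behavior can be simulated in ordinary JavaScript.  In the sketch below a thrown exception stands in for a bailout, plain array writes stand in for <code>UnsafeSetElement</code>, there is only one slice, and the chunk size is shrunk to 4 so the trace stays readable.  All of the names (<code>makeMapSlice</code>, <code>CHUNK</code>, the <code>info</code> object shape) are invented for the example.</p>

```javascript
// Simulation only: a thrown exception stands in for a bailout, and plain
// array writes stand in for UnsafeSetElement. Single slice for brevity.
const CHUNK = 4; // small hypothetical chunk size

function makeMapSlice(input, buffer, info, kernelFunc) {
  return function mapSlice(warmup) {
    let chunkPos = info.pos;
    let chunkEnd = info.end;
    if (warmup) chunkEnd = Math.min(chunkEnd, chunkPos + 1); // at most one chunk
    while (chunkPos !== chunkEnd) {
      const indexStart = chunkPos * CHUNK;
      const indexEnd = Math.min(indexStart + CHUNK, input.length);
      for (let i = indexStart; i !== indexEnd; i++) {
        buffer[i] = kernelFunc(input[i], i);
      }
      chunkPos++;
      info.pos = chunkPos; // progress is recorded only after a whole chunk
    }
  };
}

const input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const buffer = new Array(input.length);
const info = { pos: 0, end: Math.ceil(input.length / CHUNK) };
let kernelCalls = 0;
let failAt = 5; // simulate a single bailout at element 5
const kernel = (x, i) => {
  kernelCalls++;
  if (i === failAt) {
    failAt = -1;
    throw new Error("simulated bailout");
  }
  return x * 2;
};

const mapSlice = makeMapSlice(input, buffer, info, kernel);
try {
  mapSlice(false); // "parallel" attempt: finishes chunk 0, bails out inside chunk 1
} catch (e) {
  // info.pos still points at chunk 1 because that chunk never completed.
}
const posAfterBailout = info.pos;
mapSlice(true);  // recovery warmup: re-processes chunk 1 from its start
mapSlice(false); // final run: processes the remaining chunk
```

<p>Elements 4 and 5 are passed to the kernel twice here, which is harmless only because the kernel has no observable side effects on shared state.</p>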

<p>The various worker threads will unsafely mutate this shared array to
track their progress.  Unsafe mutations make use of the intrinsic
<code>UnsafeSetElement(array, index, value)</code>, which is more-or-less
equivalent to <code>array[index] = value</code> except that (1) it assumes the
index is in bounds; (2) it assumes that <code>array</code> is a dense array or a
typed array; (3) it does not do any data race detection.  In other
words, you have to know what you&#8217;re doing.  The same intrinsic is also
used to store the intermediate results.</p>

<h3><code>feedback</code>: Reporting on bailouts</h3>

<p>The precise API for <code>feedback</code> is to some extent still being hammered
out. Right now this function is used primarily for unit testing so
that we can be sure that parallelization works when we think it
should.  The eventual goal is to display information in the profiler
or other dev tools that indicates what happened. This post is already
long enough, so I&#8217;ll defer a discussion of the precise process by
which we gather bailout information.  Suffice it to say that in the event
of a bailout, each thread records the cause of the bailout (e.g.,
&#8220;write to illegal object&#8221; or &#8220;type guard failure&#8221;) along with a stack
trace showing the script and its position.</p>

<h3>Note</h3>

<p>This note describes the code as it is found on <a href="https://github.com/syg/iontrail">our branch</a>.
This differs slightly from what is currently landed on trunk.  In
particular, there have been some recent refactorings that renamed the
<code>ParallelDo</code> function to <code>ForkJoin</code>.  This is because we recently
refactored the source so that <code>ParallelDo.cpp</code> and <code>ForkJoin.cpp</code>,
originally two distinct but tightly interwoven layers, are now fused
into one abstraction.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Parallel JS lands]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/03/20/parallel-js-lands/"/>
    <updated>2013-03-20T09:56:00-04:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/03/20/parallel-js-lands</id>
    <content type="html"><![CDATA[<p>The <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=829602">first version of our work on ParallelJS</a> has just been
promoted to mozilla-central and thus will soon be appearing in a
Nightly Firefox build near you.  I find this pretty exciting.  In
honor of the occasion, I wanted to take a moment to step back and
look at what has landed now, what we expect to land soon, and the
overall trajectory we are aiming for.</p>

<!-- more -->


<h3>What is available now</h3>

<p>Once Nightly builds are available, users will be able to run what is
essentially a &#8220;first draft&#8221; of Parallel JS.  The code that will be
landing first is not really ready for general use yet.  It supports a
limited set of JavaScript and there is no good feedback mechanism to
tell you whether you got parallel execution and, if not, why not.
Moreover, it is not heavily optimized, and the performance can be
uneven.  Sometimes we see linear speedups and zero overhead, but in
other cases the overhead can be substantial, meaning that it takes
several cores to gain from parallelism.  Nonetheless, it is pretty
exciting to see multithreaded execution landing in a JavaScript
engine.  As far as I know, this is the first time that something like
this has been available (WebWorkers, with their Share Nothing, Copy
Everything architecture, do not count).</p>

<p><strong>UPDATE:</strong> It has been pointed out to me that WebWorkers were
recently extended to support <em>moving</em> typed arrays from place to
place, though there is still no way for multiple workers to <em>share</em> a
read-only view on a typed array.</p>

<p>We have already written several patches that we hope to land in the
near future.  These patches expand the set of JavaScript functions
that can run in parallel.  They also help to reduce compilation
overheads and generally improve performance, as well as making the
code less vulnerable to disruptions from garbage collection.</p>

<h3>Where we are going in the medium term</h3>

<p>Looking at the medium term, the main focus is on ensuring that there
is a large, usable subset of JavaScript that can be reliably
parallelized.  Moreover, there should be a good feedback mechanism to
tell you when you are not getting parallel execution and why not.</p>

<p>I think that we can achieve a state where if you write a pure function
(meaning one that does not mutate shared state) in &#8220;plain vanilla&#8221; JS,
it will basically work.  &#8220;Plain vanilla&#8221; is of course a highly
technical industry term meaning &#8220;no weird stuff&#8221;.  Intuitively, I mean
code that uses only JS objects (i.e., no DOM objects) and avoids some
of the more advanced JS features like proxies. A more rigorous
definition of what I mean by &#8220;plain vanilla&#8221; is a big piece of this
medium-term work.</p>

<p>Supporting &#8220;plain vanilla&#8221; JS is mostly a matter of going through
individual code paths in SpiderMonkey and refactoring them so that
they can cleanly support parallel execution.  It is difficult to do
this and keep the code relatively DRY.  We are currently exploring the
best techniques for this.  I think there is no magic bullet here,
though; the code was written to assume single-threaded execution and
is riddled with various bits of cleverness that unfortunately make it
hard to parallelize.</p>

<p>The other part of the story is providing feedback that informs users
when parallelization has failed and why.  Once we support a large
enough portion of JS, I think good feedback is probably even more
important than expanding the subset we support.</p>

<p>Finally, there will always be ongoing work on lowering overhead and
improving performance.  Some of that can come from more advanced
optimization techniques (like vectorized compilation or GPU support),
but to some extent this also arises just from looking over the
relevant code paths and tuning them repeatedly.</p>

<h3>Where we are going in the long term</h3>

<p>I am basically obsessed with the idea of making parallelism easy and
omnipresent wherever I can.  The code we are landing now is a very
significant step in that direction, though there is a long road ahead.</p>

<p>I want to see a day where there are a variety of parallel APIs for a
variety of situations.  I want to see a day where you can write
arbitrary JS and know that it will parallelize and run efficiently
across all browsers.</p>

<p>I expect the final APIs with which we expose parallel execution will
evolve over time.  There will be debate, some of which is already
visible on this blog.  For example, should we just offer a
<code>ParallelArray</code> type or instead <a href="http://smallcultfollowing.com/babysteps/blog/2013/02/26/splitting-the-pjs-api/">attach methods to Array</a>?  How
<em>should</em> we specify the <a href="http://smallcultfollowing.com/babysteps/blog/2013/01/02/deterministic-or-not/">semantics</a> of parallel execution,
<a href="http://smallcultfollowing.com/babysteps/blog/2013/01/03/the-case-for-deterministic-results/">precisely</a>?  I expect that once we have good prototypes
available, this dialog will grow, particularly as the JS community and
ECMAScript committee get involved (neither group is exactly known for
a shortage of opinions, and rightly so).</p>

<p>I also want to add better support for task parallelism via something
like the <a href="http://smallcultfollowing.com/babysteps/blog/2012/01/09/parallel-javascript/">PJs</a> API I have talked about before.  Part of the goal
with the current work has been to lay the foundations to make it
possible to iterate on APIs and introduce new ones.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Splitting the PJs API]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/02/26/splitting-the-pjs-api/"/>
    <updated>2013-02-26T15:08:00-05:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/02/26/splitting-the-pjs-api</id>
    <content type="html"><![CDATA[<p>Lately, I&#8217;ve been thinking about the ParallelJS API that we want to
expose.  In particular, I&#8217;ve been considering offering methods on the
normal array type for basic parallel operations.  I think this opens
up some interesting doors.</p>

<p><em>Note:</em> To give credit where credit is due, I should note that a lot
of the ideas in this post originate with other members of the Parallel
JS team (Shu-yu Guo, Dave Herman, Felix Klock).  But I don&#8217;t want to
speak for them, since we seem to each have our own opinions on the
best arrangement, so I&#8217;m writing the post from the first person
singular (&#8220;I&#8221;) and not a team perspective (&#8220;we&#8221;).  This does not imply
&#8220;ownership&#8221; of the ideas within.</p>

<h3>The basic idea</h3>

<p>The basic idea is to add &#8220;unordered&#8221; or parallel variants of the
standard higher-order methods to JavaScript arrays as well as to typed
arrays (and <a href="http://wiki.ecmascript.org/doku.php?id=harmony:binary_data">binary data arrays</a> when those become available).
For example, in addition to <code>map()</code> and <code>reduce()</code>, we&#8217;d offer
<code>unorderedMap()</code> and <code>unorderedReduce()</code> (in the case of typed arrays,
I think we&#8217;d have to add <code>map()</code> as well).</p>

<p>The <em>semantics</em> of the unordered variants are the same as their
ordered cousins, except that the ordering in which they perform their
iterations is not defined.  However, if you use the unordered
variants, we will attempt parallel execution where possible.</p>

<h3>Why call the methods &#8220;unordered&#8221;?</h3>

<p>I chose the (admittedly somewhat clunky) prefix <code>unordered</code> because I
want to emphasize the fundamental contract our parallel execution
engine offers, which is that parallel execution is equivalent to
<em>some</em> sequential ordering, but <a href="http://smallcultfollowing.com/babysteps/blog/2013/01/02/deterministic-or-not/">it doesn&#8217;t say which one</a>.
This is a <a href="http://smallcultfollowing.com/babysteps/blog/2013/01/03/the-case-for-deterministic-results/">somewhat controversial design</a>, but I still feel it&#8217;s
the right one.  In any case, it&#8217;s basically orthogonal to this post.</p>

<p>Note that there is no reason we can&#8217;t someday try parallel execution
for the ordered <code>map()</code> as well.  However, we&#8217;d have to be very
careful to avoid introducing overhead in the case that parallelization
fails or would change the semantics of the program.  The use of the
unordered variant effectively serves as a hint that parallelization is
likely to pay off.</p>

<h3>What about immutability?</h3>

<p>Some readers will remember that <code>ParallelArray</code> objects are immutable
while normal JS arrays are not.  This is true but it&#8217;s not a big
obstacle.  During any parallel operation, mutations to pre-existing
objects are forbidden and must be detected; in the case of a call like
<code>array.unorderedMap(func)</code>, the array <code>array</code> that is being mapped is
itself a pre-existing object and thus would be at least temporarily
immutable.</p>

<p>There are of course some good reasons to have immutable data,
particularly if we wind up doing GPU operations, in which case memory
will have to be transferred back and forth, and we may have to worry
about invalidation.  If this ever becomes an issue, we can accommodate
these more advanced use cases either by the existing freezing
interfaces that JS provides or through the multi-dimensional API
described below.</p>

<h3>What are the benefits of this API?</h3>

<p>The biggest benefit of this approach, I think, is that it&#8217;s about the
simplest way to offer parallelism.  You can work with the JS array
types we all know and love (or hate, as you prefer).  Moreover,
integration with existing codebases becomes easier.  If you have some
loops that are performing pure transformations, such as filtering out
records on some criteria, you can change them to execute in parallel
just by changing the name of the method you use.  On other or older
browsers, it&#8217;s trivial to polyfill <code>unorderedMap</code> as equivalent to
<code>map</code>.</p>
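
<p>For instance, a fallback along the following lines would do.  The method names <code>unorderedMap</code> and <code>unorderedReduce</code> are the ones proposed in this post and are not part of any standard; the sketch simply delegates to the ordered built-ins, since plain left-to-right order is one of the orderings the unordered contract permits.</p>

```javascript
// Fallback sketch for engines without the proposed methods. The names
// unorderedMap/unorderedReduce come from the proposal in this post, not from
// any standard. Sequential order is one legal "unordered" order.
function defineFallback(name, impl) {
  if (typeof Array.prototype[name] !== "function") {
    Object.defineProperty(Array.prototype, name, {
      value: impl,
      writable: true,
      configurable: true,
      enumerable: false, // keep it out of for-in loops over arrays
    });
  }
}

defineFallback("unorderedMap", function (func, thisArg) {
  return this.map(func, thisArg);
});

defineFallback("unorderedReduce", function () {
  // Forward the arguments untouched so a missing initial value stays missing.
  return Array.prototype.reduce.apply(this, arguments);
});

const doubled = [1, 2, 3].unorderedMap((x) => x * 2);
const total = [1, 2, 3].unorderedReduce((a, b) => a + b);
```

<p>Code written against the fallback should still avoid depending on iteration order, otherwise it may behave differently once a real parallel implementation is available.</p>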

<h3>What does this mean for ParallelArray?</h3>

<p>Right now, the <code>ParallelArray</code> API serves two masters.  It tries to be
a very lightweight one-dimensional array but it also tries to be a
fairly powerful multi-dimensional matrix.  If we offer parallel
transformations on normal arrays, that frees up <code>ParallelArray</code> so
that it can be targeted at more advanced use cases.  In particular, it
can be (1) always multi-dimensional and (2) type-annotated to permit
efficient storage when you have a matrix of scalar values like bytes
or ints.  I am right now working on another post regarding some ideas
relating to how we can handle the multi-dimensional case; it was
originally part of this post but this post was rapidly becoming too
long.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Interfacing with C functions in Rust]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/02/22/interfacing-with-c-functions-in-rust/"/>
    <updated>2013-02-22T16:19:00-05:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/02/22/interfacing-with-c-functions-in-rust</id>
    <content type="html"><![CDATA[<p>One of the things that I&#8217;ve been working on for some time now is the
proper integration of C functions.  As with virtually every other
facet of the design of Rust, we&#8217;ve been slowly moving from a model
where Rust tried to hide low-level details for you to one where Rust
offers tight control over what&#8217;s going on, with the type system
intervening only as needed to prevent segfaults or other strange
behavior.  This blog post details what I consider to be the best
proposal so far; some of the finer points are a bit vague, however.</p>

<h3>Extern Function Types</h3>

<p>One thing we need is a type for a simple function pointer.  Rust&#8217;s
function types to date have always been closure types, meaning that
they referred to the combination of a function pointer and some
environment.  So we have added an &#8220;extern fn&#8221; type, which is written
as follows:</p>

<pre><code>extern "ABI" fn(T) -&gt; U
</code></pre>

<p>Here the <code>"ABI"</code> string must be some ABI that is supported by the Rust
compiler.  The most common values will be either <code>C</code> or <code>Rust</code>, I
imagine, but <code>stdcall</code> (or <code>pascal</code>) may be used occasionally as well,
and who knows what we&#8217;ll support in the future.</p>

<p>I imagine that the default for &#8220;ABI&#8221; should be &#8220;C&#8221;, as it will be the
most common thing people really want to use. Calling any extern
function with a non-Rust ABI is an unsafe action.</p>

<h3>Extern Blocks and Function Declarations</h3>

<p>We are moving towards a model where function declarations are placed
within extern blocks.  This looks something like:</p>

<pre><code>extern "C" {
    fn foo();
    fn bar();
}
</code></pre>

<p>In this case, the type of <code>foo</code> and <code>bar</code> would be <code>extern "C" fn()</code>.</p>

<p>The reason that we declare extern functions in extern blocks, as
opposed to individually, is that on some platforms it is necessary to
load blocks of functions that are defined by a common library
together.</p>

<h3>&#8220;crust&#8221; functions</h3>

<p>In addition to being able to call C functions from within Rust, it is
useful to be able to call Rust functions from within C.  To this end
the compiler will permit Rust fns to be declared with a specific ABI
like so:</p>

<pre><code>extern "C" fn crust(t: T) -&gt; U {
}
</code></pre>

<p>If you declare a function as having a non-Rust ABI, then this implies
a few things:</p>

<ul>
<li>A reference to <code>crust()</code> will have type <code>extern "C" fn(T) -&gt; U</code>.</li>
<li>We cannot catch and process failure for you, since the propagation
of failure results is ABI specific.  Thus if the Rust code within an
external function fails, it will cause the process to abort.  We may
later add some way to catch failure so that you can propagate it
yourself (perhaps by returning false, etc).</li>
</ul>


<h3>Stack Switching</h3>

<p>Now we come to the interesting (and tricky) part.  Internally, Rust
makes use of a split stack approach where stack segments are allocated
dynamically as the stack grows.  This allows us to have a very large
number of threads without exhausting our address space (particularly
on 32-bit systems).  This also allows your programs to recurse as long
as there is memory available, which is sometimes useful.  It is not,
however, what C expects.  C functions just expect to have a big chunk
of stack available.  Hopefully infinite.</p>

<p>Therefore, whenever we recurse into C code, we must make sure that a
lot of stack is available.  The way we do this today is somewhat
magical: functions declared as extern are not in fact the raw C
function, but rather a wrapper around the C function that will switch
over from the Rust stack (which may be small) to a very big stack.
This was more-or-less an ok solution back before we had the idea of
getting a raw pointer to a C function and so forth but it&#8217;s not very
appealing now.  Also it can be a performance bottleneck.</p>

<p>The new proposal is to say that when you call an <code>extern "C"</code> fn,
nothing magical happens.  The stack stays just as it was.  To perform
the stack switching, we offer a function in the runtime (perhaps a
number of functions) called <code>prepare_extern_call()</code>, which can be
used like so:</p>

<pre><code>let my_c_function: extern "C" fn() = ...;
do prepare_extern_call {
    my_c_function()
}
</code></pre>

<p>Of course, it would be easy to forget to use this function, which
would be a recipe for stackfaults.  Therefore, we will also offer a
lint-mode check that defaults to error.  This check will trigger if we
see a call to a function of non-Rust ABI that is not lexically
enclosed within a call to <code>prepare_extern_call</code>.</p>

<p>There will be variants of <code>prepare_extern_call</code> that allow you to
specify the amount of stack size to guarantee more precisely if you
prefer, along with other options as those arise.</p>

<h3>Auto-generating wrappers</h3>

<p>It is our expectation that most people will not directly call C
functions.  Instead, you will wrap them in a Rust-friendly wrapper
that performs some sanity checking, converts from Rust types, etc.
This wrapper will also perform the stack switching shown above.</p>

<p>In some cases, though, writing such wrappers can be tedious, so we can
supply some annotations in the compiler that will autogenerate these
wrappers.  This is basically a macro.  I am envisioning something like
this:</p>

<pre><code>#[auto_wrap] // autogenerate wrappers for enclosed functions
extern "C" {
    #[no_wrap] // ...not this one, I'll do it by hand
    fn my_func1(x: *char) -&gt; bool;

    fn my_func2();
}

fn my_func1(x: ~str) -&gt; bool {
    do x.as_c_string |p| {
        do prepare_extern_call {
        }
    }
}
</code></pre>

<p>which would then expand into:</p>

<pre><code>extern "C" {
    fn my_func1(x: *char) -&gt; bool;
    fn my_func2();
}

fn my_func2() {
   do prepare_extern_call {
       my_func2()
   }
}
</code></pre>

<p>One issue that is obvious here is the name collisions.  I&#8217;m not sure
how to resolve that.  It seems like the older way of placing native functions
within their own module (<code>extern "C" mod foo</code>) would solve it.  Well,
we&#8217;ll do something.  And the precise details of this auto-generation
remain to be resolved.  But you get the idea.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Destructors and finalizers in Rust]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/01/17/destructors-and-finalizers-in-rust/"/>
    <updated>2013-01-17T09:45:00-05:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/01/17/destructors-and-finalizers-in-rust</id>
    <content type="html"><![CDATA[<p>Rust features destructors and, as of this moment, they are simply not
sound with respect to many other features of the language, such as
borrowed and managed pointers.  The problem is that destructors are
granted unlimited access to arbitrary data, but the type system and
runtime do not take that into account.  I propose to fix this by
limiting destructors to <em>owned</em> types, meaning types that don&#8217;t contain
borrowed or managed pointers.</p>

<!-- more -->


<h3>Dangers today</h3>

<p>The root of our problems lies in the fact that if you have a struct
type <code>S</code> that has a destructor, it is legal to place an instance of
<code>S</code> into a managed box (<code>@S</code>).  This is problematic because it implies
that the destructor will run when the managed box is collected, which
can occur at any arbitrary time (in fact, if the garbage collector
were to run on a different thread, it could even occur in parallel
with the owning thread!).  I will use the term <a href="http://en.wikipedia.org/wiki/Finalizer">finalizer</a>
to mean a destructor associated with an object that is owned by a
managed box.  In other words, a destructor that can run asynchronously
with respect to the main program.</p>

<p>Note: Many of the thoughts in this post were inspired by Hans Boehm.
For those seeking a deeper understanding, I recommend his paper
<a href="http://www.hpl.hp.com/techreports/2002/HPL-2002-335.html">&#8220;Destructors, Finalizers, and Synchronization&#8221;</a>.</p>

<h4>Problem number one: finalizers and borrowed pointers</h4>

<p>In our current system, there is nothing to prevent a borrowed pointer
from being stored in a managed box.  Although it is sometimes
surprising to people that it is legal, this scenario is generally
harmless.  Although the managed box may outlive the data that the
borrowed pointer references, the type system will guarantee that the
managed box will never be <em>dereferenced</em> once the loan expires.  In
other words, you can put a pointer into your stack frame into a
managed box, but you could never return that managed box to your
caller or store it into any data structure that outlives your stack
frame.  So we know for certain that <em>if we were to run the garbage
collector, that box would be collected</em>.  Finalizers change this
equation.  A finalizer provides a backdoor that would allow borrowed
pointers in managed boxes to be dereferenced.  See <a href="https://github.com/mozilla/rust/issues/3167">issue 3167</a> for
examples of dangerous programs and more details.</p>

<p>The only way I can see to address this unsoundness is to create a new
intrinsic trait that indicates when data can safely be placed into a
managed box.  I have some thoughts on this at the end.</p>

<h4>Problem number two: finalizers and managed data</h4>

<p>There is another dangerous situation that can arise which has nothing
to do with borrowed pointers.  Imagine we have a cycle of managed data
and two objects on that cycle have a finalizer.  Which finalizer do
you run first?  Normally, you want to finalize an object X before you
finalize any object Y that X references, but because there is a cycle
that is impossible to guarantee.  Different systems have solved this
problem in different ways, none of which are wholly satisfactory.</p>

<h4>Problem number three: finalizers and mutable state</h4>

<p>Another more subtle problem which can occur with finalizers is that
the finalizer may have access to mutable state which is not yet dead.
Imagine, for example, a struct whose job is to increment and decrement
a counter automatically:</p>

<pre><code>struct SomeDataStructure { value: uint, ... }

struct Counter { s: @mut SomeDataStructure }
impl Counter: Drop {
    fn new(s: @mut SomeDataStructure) -&gt; Counter {
        s.value += 1;
        Counter { s: s }
    }
    fn drop(self) { self.s.value -= 1; }
}
</code></pre>

<p>As long as this counter is stored on the stack frame, everything
should be fine.  But if you were to place this counter into a managed
box, suddenly you have a ticking time bomb: now the field <code>s.value</code>
will be decremented at some random time, whenever the garbage
collector elects to collect this managed box.  Even if the garbage
collector does not run in parallel with the mutator thread, this can
essentially cause <code>s.value</code> to be decremented in between virtually any
statement, leading to race conditions that are very similar to those
problems you face with threads and mutable state.  Note that due to
compiler optimizations and so forth it is entirely possible for <code>s.value</code>
to be decremented earlier than you might expect as well as later.</p>

<p>Of all the problems, I am perhaps most worried about this one, because
it is relatively easy to overlook.  It&#8217;s not a soundness issue per se
but it can lead to very surprising bugs, particularly in light of
aggressive compiler optimization.  Hans Boehm goes so far as to say
that finalizers <em>require</em> a multithreaded, shared memory context to
make any sense, precisely because of Problem #3.  Basically, the
asynchrony inherent in finalizers is more natural in a parallel
language and you have tools like locks to defend against it.  If
finalizers run in the mutator thread, locks lead to deadlocks and not
having locks leads to bugs.</p>

<p>You might think that moving data into a managed box can only cause the
destructor to be delayed from when it would otherwise run, but this is
not the case.  In fact the destructor can also run much <em>earlier</em> than
you might expect.  Consider this Java program from Boehm&#8217;s paper:</p>

<pre><code>class X {
    Y mine;
    public void foo() { Y m = mine; ...; m.bar(); }
    public void finalize() { mine.baz(); }
}
</code></pre>

<p>Here, in the <code>foo()</code> method, the <code>this</code> pointer may actually be dead
right after the first statement, and so <code>this</code> can be collected before
<code>m.bar()</code> is called.  Boehm&#8217;s point with this example is that, in
Java, this could result in <code>m.bar()</code> and <code>mine.baz()</code> executing in
parallel, but in general the behavior is very surprising.  I recall
that similar problems were prevalent with Apple&#8217;s failed attempt at an
Objective-C garbage collector.</p>

<h3>Restricting to owned data</h3>

<p>All of these problems are solved by limiting destructors to types
which contain only owned data.  Borrowed pointers and managed pointers
are disallowed, so problems one and two cannot arise. Problem three
cannot arise because there is no way for the destructor to directly
access shared, mutable state.</p>

<p>Limiting to owned data still permits many interesting use cases for
destructors. You can embed a file descriptor and guarantee it gets
closed.  You can ensure that random C resources, such as database
descriptors or blocks of memory obtained from <code>malloc()</code>, are cleaned
up, since these are typically described by unsafe pointers anyhow.
You can also embed a channel and use it to send messages from the
destructor.</p>

<p>There is one very useful scenario that is ruled out, however, which is
basically the &#8220;auto counter&#8221; (or any &#8220;auto adjustment&#8221;) type from
problem number three.  That is, it is often very useful to have some
adjustment that will automatically occur when a stack frame exits, and
destructors are one common way to achieve that.  Of course this is
dangerous if abused, as we have seen, but what about the good guys,
who <em>don&#8217;t</em> put an auto-object into managed data?</p>

<p>The good news is that even with the limitation I propose there are
still two valid ways to achieve the auto-pattern, depending on your
precise needs.  First, if you don&#8217;t care whether the auto code
executes on failure&#8212;and you probably don&#8217;t, remember that Rust
failures are unrecoverable&#8212;you can just use a function with a
closure argument:</p>

<pre><code>fn auto_adjust&lt;R&gt;(s: @mut SomeDataStructure, f: &amp;fn() -&gt; R) -&gt; R {
    s.value += 1;
    let v = f();
    s.value -= 1;
    return v;
}
</code></pre>

<p>Now in your code you can write:</p>

<pre><code>do auto_adjust(s) { ... }
</code></pre>

<p>But what if you really <em>do</em> care about failure <em>and</em> you need access
to the current stack frame when unwinding?  We should be able to
provide a function in the standard library to handle this case.  That
function would look something like:</p>

<pre><code>do defer(|| {
    /* This code will execute once the block below exits, even on failure */
}) {
    /* This code executes immediately */
}
</code></pre>

<p>Naturally we can play around with the precise signature of this
function a bit, but you get the idea: you supply two closures to a
library function, it executes them as appropriate.  Internally, the
function would use unsafe pointers and a destructor, but it would
never expose the object that carries the destructor to the outside,
and thus could ensure that this object is never placed into a managed
box.</p>
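<p>To make that last point concrete, here is a minimal sketch in present-day Rust syntax (not the 2013 syntax used above). The names <code>Guard</code> and <code>demo</code> are hypothetical, and this version needs no unsafe pointers because current closures can borrow the stack frame directly. Treat it as one possible shape for such a function, not a settled design.</p>

```rust
use std::cell::Cell;

// Hypothetical private guard: it owns the cleanup closure and runs it on drop.
// The type is never exported, so callers cannot stash it in a box.
struct Guard<F: FnMut()> {
    cleanup: F,
}

impl<F: FnMut()> Drop for Guard<F> {
    fn drop(&mut self) {
        // Runs on normal exit and also while unwinding from a panic.
        (self.cleanup)();
    }
}

// Runs `body` immediately, then `cleanup` once `body` has exited.
pub fn defer<R>(cleanup: impl FnMut(), body: impl FnOnce() -> R) -> R {
    let _guard = Guard { cleanup };
    body()
}

// The "auto counter" pattern: returns (value seen inside, value afterwards).
pub fn demo() -> (i32, i32) {
    let counter = Cell::new(0);
    let inside = defer(
        || counter.set(counter.get() - 1),
        || {
            counter.set(counter.get() + 1);
            counter.get()
        },
    );
    (inside, counter.get())
}
```

<p>The guard lives only inside <code>defer()</code>, which is what keeps the object carrying the destructor from escaping to the caller.</p>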

<h3>Caveat: Not quite future proof</h3>

<p>In some sense, I am advocating the conservative approach: we begin
with a narrow set of types that can have a destructor, and we can then
expand later if that proves to be insufficient. However, there is a
catch.  If we ever wanted to permit borrowed pointers to be referenced
by destructors, the only way that this can be made sound is to limit
the set of types that can be placed into a managed box.  Since, at the
moment, any type can be placed into a managed box, this is a backwards
incompatible change.  To see what I mean, consider a function like
<code>box()</code>:</p>

<pre><code>fn box&lt;A&gt;(a: A) -&gt; @A { @a }
</code></pre>

<p>This function is legal today, but it would become illegal.  This is because
there is no guarantee that <code>A</code> can be placed into a managed box.  So you&#8217;d
need to write something like:</p>

<pre><code>fn box&lt;A:Manageable&gt;(a: A) -&gt; @A { @a }
</code></pre>

<p>where <code>Manageable</code> is the hypothetical intrinsic trait that
characterizes types that can safely be placed into managed boxes.  Of
course we could change the defaults, so that <code>&lt;A&gt;</code> no longer means
&#8220;any type at all&#8221; but rather &#8220;the usual set of types you want to do
the usual set of operations&#8221; (in which case, perhaps <code>A:</code> would mean
&#8220;any type at all&#8221;, I don&#8217;t know).  But that too is backwards
incompatible.</p>
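<p>The effect of such a bound can be sketched in present-day Rust. Everything here is an assumption made for illustration: <code>Manageable</code> is an ordinary hand-written marker trait rather than a compiler intrinsic, <code>Rc</code> merely stands in for the managed box <code>@A</code>, and <code>Resource</code> is a made-up type.</p>

```rust
use std::rc::Rc;

// Hypothetical marker trait playing the role of `Manageable`.
pub trait Manageable {}

impl Manageable for i32 {}
impl Manageable for String {}

// A type with a destructor that deliberately does NOT implement the trait.
pub struct Resource {
    pub fd: i32,
}

impl Drop for Resource {
    fn drop(&mut self) {
        // A real type would release `fd` here.
    }
}

// `Rc` stands in for the managed box `@A`.
pub fn boxed<A: Manageable>(a: A) -> Rc<A> {
    Rc::new(a)
}

pub fn demo() -> i32 {
    let b = boxed(41);
    // boxed(Resource { fd: 3 }); // rejected: `Resource: Manageable` is not satisfied
    *b + 1
}
```

<p>Every generic function that boxes its argument would need the extra bound, which is exactly the backwards-incompatible part.</p>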
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Revised for loop protocol]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/01/16/revised-for-loop-protocol/"/>
    <updated>2013-01-16T12:13:00-05:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/01/16/revised-for-loop-protocol</id>
    <content type="html"><![CDATA[<p>In Rust today, there is a
<a href="http://brson.github.com/rust/2012/04/05/new-for-loops/">special <code>for</code> syntax designed to support interruptible loops</a>.
Since we introduced it, this has proven to be a remarkable success.
However, I think we can improve it very slightly.</p>

<h3>Current for protocol</h3>

<p>The current &#8220;for protocol&#8221; is best explained by giving an example of
how to implement it for slices:</p>

<pre><code>fn each&lt;E&gt;(v: &amp;[E], f: &amp;fn(&amp;E) -&gt; bool) {
    let mut i = 0;
    let n = v.len();
    while i &lt; n {
        if !f(&amp;v[i]) {
            return;
        }
        i += 1
    }
}
</code></pre>

<p>As you can see, the idea is that the last parameter to the <code>each()</code>
method is a function of type <code>&amp;fn(&amp;E) -&gt; bool</code>, which means that it is
given a pointer to an element in the collection and it returns true or
false.  The return value indicates whether we should continue
iterating.</p>

<p>A little known fact is that the <code>for</code> statement returns whatever the
<code>each()</code> method returns.  This means that <code>each()</code> methods typically
have unit return type so that the Rust compiler doesn&#8217;t require a
semicolon, which would be used to disregard the result of the <code>for</code>
expression.</p>

<h3>Problems</h3>

<p>The biggest problem with this protocol is that it is not easily
composable.  In particular, imagine that I have a simple tree like
this:</p>

<pre><code>struct Tree&lt;E&gt; {
    elem: E,
    children: ~[Tree&lt;E&gt;]
}
</code></pre>

<p>Now let&#8217;s try to implement the pre-order traversal method for such a
tree.  You might think you could do it like this:</p>

<pre><code>fn each&lt;E&gt;(t: &amp;Tree&lt;E&gt;, f: &amp;fn(&amp;E) -&gt; bool) {
    if !f(&amp;t.elem) {
        return;
    }

    for t.children.each |child| { each(child, f); }
}
</code></pre>

<p>While this will compile, it will not work as expected. For example, this
program:</p>

<pre><code>fn main() {
    let t = Tree {
        elem: 0,
        children: ~[
            Tree { elem: 1, children: ~[
                Tree { elem: 2, children: ~[] }
            ] },
            Tree { elem: 3, children: ~[] }
        ]
    };

    for each(&amp;t) |e| {
        io::println(fmt!("%d", *e));
        if *e == 1 { break; }
    }
}
</code></pre>

<p>should print &#8220;0&#8221; and &#8220;1&#8221;, but it prints &#8220;0&#8221;, &#8220;1&#8221;, and &#8220;3&#8221;.  The reason
is that while <code>each()</code> does indeed return early when the iteration
function returns false, it doesn&#8217;t abort the entire iteration, only
the current subtree.</p>
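<p>The same behavior can be reproduced in present-day Rust, which no longer has this <code>for</code> protocol. The sketch below is an approximation: closures are passed explicitly, <code>each_slice</code> and <code>visited</code> are hypothetical helpers, and the trailing <code>true</code> models a <code>for</code> body that falls off the end.</p>

```rust
pub struct Tree {
    pub elem: i32,
    pub children: Vec<Tree>,
}

// Mirrors the current protocol: returns unit, so the caller cannot tell
// whether `f` asked to stop.
pub fn each_slice<T>(v: &[T], f: &mut dyn FnMut(&T) -> bool) {
    for x in v {
        if !f(x) {
            return;
        }
    }
}

pub fn each(t: &Tree, f: &mut dyn FnMut(&i32) -> bool) {
    if !f(&t.elem) {
        return; // only abandons this subtree
    }
    each_slice(&t.children, &mut |child| {
        each(child, f);
        true // the loop body finished normally, i.e. "keep going"
    });
}

pub fn sample() -> Tree {
    let leaf = |elem: i32| Tree { elem, children: Vec::new() };
    Tree {
        elem: 0,
        children: vec![Tree { elem: 1, children: vec![leaf(2)] }, leaf(3)],
    }
}

// Records every element handed to the callback, which stops at `stop_at`.
pub fn visited(stop_at: i32) -> Vec<i32> {
    let mut seen = Vec::new();
    each(&sample(), &mut |e| {
        seen.push(*e);
        *e != stop_at
    });
    seen
}
```

<p>Calling <code>visited(1)</code> records 0, 1, and 3: the stop request from the callback is lost as soon as the recursive call returns.</p>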

<p>One way to fix this is to wrap the <code>each()</code> function around an inner
each function that returns a bool to indicate whether iteration should
continue:</p>

<pre><code>fn each1&lt;E&gt;(t: &amp;Tree&lt;E&gt;, f: &amp;fn(&amp;E) -&gt; bool) {
    each_inner(t, f);

    fn each_inner&lt;E&gt;(t: &amp;Tree&lt;E&gt;, f: &amp;fn(&amp;E) -&gt; bool) -&gt; bool {
        if !f(&amp;t.elem) {
            return false;
        }

        for t.children.each |child| {
            if !each_inner(child, f) {
                return false;
            }
        }

        return true;
    }
}
</code></pre>

<h3>Making <code>each()</code> composable</h3>

<p>I think that we should change the standard <code>each</code> signature to:</p>

<pre><code>fn each&lt;E&gt;(c: &amp;Coll&lt;E&gt;, f: &amp;fn(&amp;E) -&gt; bool) -&gt; bool
</code></pre>

<p>Here the return value of <code>each</code> is always a boolean, and it will be false
if the last call to <code>f()</code> returned false, and true otherwise.  This
makes it easier to write composed <code>each()</code> methods.  We would also
adjust <code>for</code> statements so that they always return unit and do not
return the result of <code>each()</code>.</p>

<p>Under this definition, we could write the tree iterator as follows:</p>

<pre><code>fn each2&lt;E&gt;(t: &amp;Tree&lt;E&gt;, f: &amp;fn(&amp;E) -&gt; bool) -&gt; bool {
    f(&amp;t.elem) &amp;&amp; t.children.each(|c| each2(c, f))
}
</code></pre>

<p>This is clearly an improvement over <code>each1()</code>!</p>
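<p>For anyone who wants to check the claim, here is a rough translation of the proposed protocol into present-day Rust. It is only a sketch: <code>each_slice</code> stands in for the standard slice <code>each</code> with the new signature, and <code>sample</code> and <code>visited</code> are hypothetical helpers.</p>

```rust
pub struct Tree<E> {
    pub elem: E,
    pub children: Vec<Tree<E>>,
}

// Stand-in for the proposed slice `each`: false if `f` asked to stop,
// true otherwise.
pub fn each_slice<T>(v: &[T], f: &mut dyn FnMut(&T) -> bool) -> bool {
    for x in v {
        if !f(x) {
            return false;
        }
    }
    true
}

// Direct counterpart of `each2()`: the bool result propagates the stop request.
pub fn each2<E>(t: &Tree<E>, f: &mut dyn FnMut(&E) -> bool) -> bool {
    f(&t.elem) && each_slice(&t.children, &mut |c| each2(c, f))
}

pub fn sample() -> Tree<i32> {
    let leaf = |elem: i32| Tree { elem, children: Vec::new() };
    Tree {
        elem: 0,
        children: vec![Tree { elem: 1, children: vec![leaf(2)] }, leaf(3)],
    }
}

// Records every element handed to the callback, which stops at `stop_at`.
pub fn visited(stop_at: i32) -> Vec<i32> {
    let mut seen = Vec::new();
    each2(&sample(), &mut |e| {
        seen.push(*e);
        *e != stop_at
    });
    seen
}
```

<p>With the boolean result threaded through, <code>visited(1)</code> stops after recording 0 and 1.</p>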
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Lifetime notation redux]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/01/15/lifetime-notation-redux/"/>
    <updated>2013-01-15T08:34:00-05:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/01/15/lifetime-notation-redux</id>
    <content type="html"><![CDATA[<p>In a <a href="http://smallcultfollowing.com/babysteps/blog/2012/12/30/lifetime-notation/">previous post</a> I outlined some of the options for updating our
lifetime syntax.  I want to revisit those examples after having given
the matter more thought, and also after some discussions in the
comments and on IRC.</p>

<p>My newest proposal is that we use <code>&lt;&gt;</code> to designate lifetime
parameters on types and we lean on semantic analysis (the resolve
pass, more precisely) to handle the ambiguity between a lifetime name
and a type name.  Before I always wanted to have the distinction
between lifetimes and types be made in the parser itself, but I think
this is untenable.  This proposal has the advantage that the most
common cases are still written as they are today.</p>

<p>Here is the example from the previous post in my proposed notation:</p>

<pre><code>struct StringReader&lt;&amp;self&gt; {
                 // ^~~~~ Lifetime parameter designated with &amp;
    value: &amp;self/str,
        // ^~~~~~~~~ Same as today.
    count: uint
}

impl StringReader {
    fn new(value: &amp;self/str) -&gt; StringReader&lt;&amp;self&gt; {
                             //              ^~~~~
                             // Interpreted as a lifetime reference due to
                             // the declaration of StringReader, which states
                             // that first parameter is a lifetime.
        StringReader { value: value, count: 0 }
    }
}

fn value(s: &amp;v/StringReader&lt;&amp;v&gt;) -&gt; &amp;v/str {
         // ^~~~~~~~~~~~~~~~~~~     ^~~~~~
         // As today, lifetime names that appear in a function declaration
         // do not have to be declared anywhere and are implicitly scoped
         // to the containing function declaration.
    return s.value;
}

fn remaining(s: &amp;StringReader&lt;&amp;&gt;) -&gt; uint {
             // ^~~~~~~~~~~~~~~~
             // A bare &amp; in a fn decl means "use a fresh name",
             // so this is equivalent to &amp;x/StringReader&lt;&amp;y&gt;.
             // This may be the right thing, see Option 2 below.
    return s.value.len() - s.count;
}
</code></pre>

<p>What follows are miscellaneous notes and thoughts.  There are a few
options that could be tweaked, which I have noted.</p>
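<p>None of the notations discussed in this post compile with a current compiler. For readers who want to run the example, the following is a rough equivalent in the <code>'a</code> syntax that Rust uses today. It is included only so the semantics can be checked and is not one of the options under discussion; <code>demo</code> is a hypothetical helper.</p>

```rust
pub struct StringReader<'a> {
    pub value: &'a str,
    pub count: usize,
}

impl<'a> StringReader<'a> {
    pub fn new(value: &'a str) -> StringReader<'a> {
        StringReader { value, count: 0 }
    }
}

// Counterpart of `&v/StringReader<&v>`: one name ties both lifetimes together.
pub fn value<'v>(s: &'v StringReader<'v>) -> &'v str {
    s.value
}

// Counterpart of `&StringReader<&>`: both lifetimes are left anonymous.
pub fn remaining(s: &StringReader<'_>) -> usize {
    s.value.len() - s.count
}

pub fn demo() -> (String, usize) {
    let text = String::from("hello");
    let mut r = StringReader::new(&text);
    r.count = 2;
    (value(&r).to_string(), remaining(&r))
}
```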

<h3>Considerations</h3>

<p>The only way I have found to distinguish lifetime names purely in the
parser that is also visually appealing is to use braces to designate
lifetimes (options 7 and 8 in my <a href="http://smallcultfollowing.com/babysteps/blog/2012/12/30/lifetime-notation/">previous post</a>).  As a reminder,
the impl of <code>StringReader</code> would look like:</p>

<pre><code>impl StringReader {
    fn new(value: &amp;{self} str) -&gt; StringReader{self} {
        StringReader { value: value, count: 0 }
    }
}
</code></pre>

<p>The major problem here is that, as bstrie pointed out on IRC, it&#8217;s
ambiguous: the <code>{self}</code> which appears in the return type could be
interpreted as the function body.  His proposed fix was to use
whitespace sensitivity, so that <code>StringReader{self}</code> and <code>StringReader
{self}</code> are parsed differently, but whitespace sensitivity is
something we have always tried to avoid.</p>

<p>I personally find it appealing to use <code>&lt;&gt;</code> both for lifetime and type
parameters, because I think it gives the right intuition.  A
lifetime-parameterized declaration is just like a type-parameterized
declaration with regard to how it works in the type system.</p>

<p><strong>OPTION 1:</strong> I opted to include <code>&amp;</code> in the lifetime parameters to a
type for consistency (this way, a lifetime name is always preceded by
<code>&amp;</code>).  However, they are not strictly necessary and they are visually
heavy.  We could remove them, which would mean you have
<code>&amp;v/StringReader&lt;v&gt;</code> and <code>StringReader&lt;self&gt;</code> and not
<code>&amp;v/StringReader&lt;&amp;v&gt;</code> and <code>StringReader&lt;&amp;self&gt;</code>.  However, the default
would still have to be written <code>&amp;</code>, so you&#8217;d still have
<code>StringReader&lt;&amp;&gt;</code>.</p>

<h3>The default lifetime &amp;</h3>

<p>In this proposal, the &#8220;default lifetime&#8221; <code>&amp;</code> would only be usable
inside a function declaration or function body.  In a function
declaration, it means &#8220;use a fresh lifetime&#8221;.  In the function body it
means &#8220;use inference&#8221;.</p>

<p><strong>OPTION 2:</strong> It would be possible to make <code>&amp;</code> a little smarter, as it
is today.  Today it means &#8220;use a fresh name unless <code>&amp;</code> appears on a
nested type, then use the lifetime you are nested within&#8221;.  If we took
that interpretation, then <code>&amp;StringReader&lt;&amp;&gt;</code> would be equivalent to
<code>&amp;x/StringReader&lt;&amp;x&gt;</code> and not <code>&amp;x/StringReader&lt;&amp;y&gt;</code>.  This is more
likely to be what the user wanted, though I don&#8217;t think it makes much
difference in practice.  I&#8217;d probably just want to experiment a bit
here: start with the simpler version, as I proposed here, and then see
how many type errors we get.</p>

<p><strong>OPTION 3:</strong> We could also allow users to leave off the <code>&lt;&gt;</code> if the
only parameter is a lifetime parameter, in which case it would be
equivalent to <code>&lt;&amp;&gt;</code>.  This means that you could write <code>&amp;StringReader</code>
instead of <code>&amp;StringReader&lt;&amp;&gt;</code>.</p>

<p><strong>OPTION 4:</strong> The one place that I opted to eschew explicit declarations
is on functions.  If we wanted, we could always require that all named
lifetimes be declared, which would mean that the function <code>value()</code>
above would be written:</p>

<pre><code>fn value&lt;&amp;v&gt;(s: &amp;v/StringReader&lt;&amp;v&gt;) -&gt; &amp;v/str {
    return s.value;
}
</code></pre>

<p>I can&#8217;t decide about this option.  It strikes me as a reasonably
simple story, which appeals to me, but it&#8217;s also fairly heavyweight.</p>

<h3>How complex can it get?</h3>

<p><strong>UPDATE</strong>: Per bstrie&#8217;s request, here is an example of a type that
uses both lifetime and type parameters with trait bounds:</p>

<pre><code>struct Foo&lt;&amp;self, T: Reader+Eq&gt; {
    value: &amp;self/T,
    count: uint
}

fn operate&lt;R: Reader+Eq&gt;(f: Foo&lt;&amp;, R&gt;)
{
    ...
}
</code></pre>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[The case FOR deterministic results]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/01/03/the-case-for-deterministic-results/"/>
    <updated>2013-01-03T19:05:00-05:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/01/03/the-case-for-deterministic-results</id>
    <content type="html"><![CDATA[<p>In my last post, I made the case against having a deterministic
semantics.  I&#8217;ve gotten a fair amount of feedback saying that, for a
Web API, introducing nondeterminism is a very risky idea.  Certainly
the arguments are strong.  Therefore, I want to take a moment and make
the case <em>for</em> determinism.</p>

<!-- more -->


<h3>Why determinism?</h3>

<p>All things being equal, it&#8217;s clear that deterministic execution
semantics are preferable.  They&#8217;re easier to debug and they avoid the
question of browser incompatibilities.</p>

<p>One interesting observation (most recently pointed out by
<a href="http://smallcultfollowing.com/babysteps/blog/2013/01/02/deterministic-or-not/#comment-753987533">Roc in this comment</a>) is that while the intention of a
nondeterministic ordering is to free up the implementor, what
sometimes happens is that everyone winds up relying on the behavior
of one implementation, and then the others follow suit.</p>

<p>That said, there seem to be numerous examples of nondeterministic
portions of JavaScript: the iteration order for properties, for
example, or the order of callbacks to the comparator in <code>Array.sort</code>.
But then there are plenty of examples of implementations being
constrained by arbitrary behavior inherited from legacy interpreters.
In any case, I am not an expert in these kinds of nitty-gritty
cross-browser compatibility details.</p>

<p>If we did opt for nondeterministic semantics, it might be plausible to
use a cheap PRNG like <a href="http://en.wikipedia.org/wiki/Xorshift">Xorshift</a> in the sequential fallback so as
to make it more likely that unwanted dependencies on execution
ordering would be seen during testing (though, of course, this adds
overhead too!).</p>
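<p>As a rough idea of what that could look like, here is a sketch in Rust (used purely for illustration; the real fallback lives inside the JavaScript engine). The generator is the 32-bit xorshift with the shift triple 13, 17, 5; <code>shuffled_map</code> is a hypothetical stand-in for the sequential fallback of <code>map()</code>.</p>

```rust
// One step of the 32-bit xorshift generator. The state must be nonzero.
pub fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

// Hypothetical sequential fallback: applies `kernel` to every index exactly
// once, but in a shuffled order, so hidden ordering dependencies show up.
pub fn shuffled_map(input: &[i32], seed: u32, mut kernel: impl FnMut(i32) -> i32) -> Vec<i32> {
    let mut state = if seed == 0 { 1 } else { seed };
    let mut order: Vec<usize> = (0..input.len()).collect();
    // Fisher-Yates shuffle driven by the PRNG (the modulo adds a slight bias,
    // which does not matter for this purpose).
    for i in (1..order.len()).rev() {
        let j = (xorshift32(&mut state) as usize) % (i + 1);
        order.swap(i, j);
    }
    let mut out = vec![0; input.len()];
    for &i in &order {
        out[i] = kernel(input[i]);
    }
    out
}
```

<p>A pure kernel produces the same array for every seed; a kernel that mutates shared state will typically not, which is the point of the exercise.</p>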

<h3>What about performance?</h3>

<p>It is true that nondeterministic semantics give maximum efficiency,
but the magnitude of these performance gains is not entirely clear.
In my previous post, for example, I stressed the behavior around
bailouts.  It is true that guaranteeing deterministic semantics will
result in wasted work and in general make bailouts less efficient.
<em>However,</em> it is also true that bailouts are the exceptional case: if
things are working properly, they should only occur at the beginning
of execution.  Once sufficient type information has been gathered,
parallel execution without bailouts should be the norm&#8212;unless of
course it turns out the code is not parallelizable, either because it
is impure or because it uses some unsupported language features, in
which case we will simply use the sequential fallback from the start
and not even <em>attempt</em> parallelism.</p>

<p>So, at least in the case of functions like <code>map()</code>, efficiency in the
<em>steady state</em> should not be negatively impacted by deterministic
semantics, presuming that the kernel function is pure.</p>

<h3>But what about reduce, scan, and scatter?</h3>

<p>As I wrote in the previous post, the current semantics of
<code>ParallelArray</code> are inconsistent.  They give deterministic results for
<code>map()</code> but not for <code>reduce()</code>, <code>scan()</code>, or <code>scatter()</code>.  The core
problem here is that the standard sequential ordering for <code>reduce()</code>
(i.e., left-to-right) is inherently sequential&#8212;and the most
efficient ordering will depend on the precise implementation strategy
(how many worker threads are involved, etc). But, at the cost of some
efficiency, we <em>can</em> choose a deterministic ordering that still
permits parallel execution.</p>

<p>For reduce, a good ordering might be to evaluate in a tree-like
fashion.  So we would first reduce indices 0 and 1, then 2 and 3, 4
and 5, and so on, resulting in an array with length <code>N/2</code>.  We can
then repeat the reduction until the result has length 1.  If at any
step we get an array with an odd length, we can reduce the final
element in with the final pair.  A similar ordering can be used for
scan, though the need to preserve intermediate results creates
complications.</p>
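<p>The ordering for reduce can be sketched as follows, in Rust for illustration only. This is a sequential model under my own naming (<code>reduce_round</code>, <code>tree_reduce</code>); in a real implementation the pairs within one round are independent and could be handed to different workers.</p>

```rust
// One round: combine neighbours (0,1), (2,3), ...; with an odd length the
// leftover element is folded into the final pair.
pub fn reduce_round<T: Clone>(items: &[T], f: &dyn Fn(&T, &T) -> T) -> Vec<T> {
    let pairs = items.len() / 2;
    let mut out: Vec<T> = (0..pairs)
        .map(|i| f(&items[2 * i], &items[2 * i + 1]))
        .collect();
    if items.len() % 2 == 1 {
        let last = &items[items.len() - 1];
        match out.pop() {
            Some(final_pair) => out.push(f(&final_pair, last)),
            None => out.push(last.clone()), // single-element input
        }
    }
    out
}

// Repeats rounds until one value is left; returns None for empty input.
pub fn tree_reduce<T: Clone>(items: &[T], f: &dyn Fn(&T, &T) -> T) -> Option<T> {
    if items.is_empty() {
        return None;
    }
    let mut level = items.to_vec();
    while level.len() > 1 {
        level = reduce_round(&level, f);
    }
    level.pop()
}
```

<p>The grouping depends only on the length of the array, never on thread count or timing, so even a non-associative kernel gives the same answer on every run (though not the left-to-right answer).</p>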

<p>The scatter operation, at least when a conflict function is provided,
is much more difficult to parallelize.  I think that the only way it
is possible is to make each thread walk the entire list of targets but
only process writes to a specific subset of the array.  If no conflict
function is provided, or if the conflict function is one that is known
to be associative and commutative (such as integer addition&#8212;though
not floating point, sadly), then parallelization of scatter is also
relatively straightforward.</p>
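<p>Here is a sketch of the "every thread walks the whole list" strategy, again in Rust for illustration. The signature is hypothetical and not the <code>ParallelArray</code> API: the argument order of <code>conflict</code> (old value, then new value) is an assumption, and out-of-range targets are silently skipped.</p>

```rust
use std::thread;

pub fn scatter(
    source: &[i64],
    targets: &[usize],
    default: i64,
    len: usize,
    workers: usize,
    conflict: fn(i64, i64) -> i64,
) -> Vec<i64> {
    let mut out = vec![default; len];
    let mut written = vec![false; len];
    let workers = workers.max(1);
    let chunk = ((len + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let parts = out.chunks_mut(chunk).zip(written.chunks_mut(chunk));
        for (w, (out_part, seen_part)) in parts.enumerate() {
            let lo = w * chunk;
            s.spawn(move || {
                // Every worker walks the full target list, in source order...
                for (&value, &t) in source.iter().zip(targets) {
                    // ...but only applies writes that land in its own slice.
                    if t >= lo && t < lo + out_part.len() {
                        let k = t - lo;
                        out_part[k] = if seen_part[k] {
                            conflict(out_part[k], value)
                        } else {
                            value
                        };
                        seen_part[k] = true;
                    }
                }
            });
        }
    });
    out
}
```

<p>Since each output slot is owned by exactly one worker and conflicts are resolved in source order, the result does not depend on the number of workers. The cost is that the target list is scanned once per worker.</p>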

<h3>So what&#8217;s the right thing to do?</h3>

<p>At this point, the answer is probably &#8220;measure&#8221; or perhaps &#8220;wait and
see&#8221;.  I am somewhat concerned though about basing too many decisions
on the performance of our current implementation, both because it has
not been heavily optimized and because it is only one model of
execution (parallel worker threads).  But we&#8217;ve got to go on
something.</p>

<p>If the API were designed for <em>me personally</em> to use, I would want it
to have nondeterministic semantics.  This gives maximum flexibility to
the implementation without opening the door to data races.  However, I
am certainly appreciative of the concerns regarding
debuggability&#8212;performance is not everything!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Deterministic or not?]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2013/01/02/deterministic-or-not/"/>
    <updated>2013-01-02T12:30:00-05:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2013/01/02/deterministic-or-not</id>
    <content type="html"><![CDATA[<p>One of the interesting questions with respect to Parallel JS is what
the semantics ought to be if you attempt a parallel operation with a
kernel function that has side-effects.  There are basically three
reasonable options:</p>

<ol>
<li><em>Deterministic results where possible:</em> The function behaves &#8220;as
if&#8221; it executed sequentially, executing the kernel from 0 to n,
just like <code>Array.map</code>.</li>
<li><em>Error:</em> An exception is thrown.</li>
<li><em>Nondeterministic results:</em> The function behaves &#8220;as if&#8221; it
executed sequentially, but the items were mapped in an unspecified
order.</li>
</ol>


<p>The <a href="https://github.com/syg/iontrail">branch</a> currently implements option 3: I believe it is
the most consistent and will yield the best performance.  However,
reasonable people can differ on this point, so I want to make my case.</p>

<!-- more -->



<h3>Why not ensure deterministic results?</h3>

<p>At first glance, at least, it seems like having deterministic results
would be the most convenient option.  After all, <code>ParallelArray.map()</code>
could then be used as a drop-in equivalent to <code>Array.map()</code>.  However,
there are two reasons that I am concerned about this option:</p>

<ol>
<li>Deterministic results are not possible for all operations.</li>
<li>Deterministic ordering can hamper efficient parallel execution.</li>
</ol>


<h4>Deterministic results are not possible for all operations.</h4>

<p>If you examine <a href="http://wiki.ecmascript.org/doku.php?id=strawman:data_parallelism#reduce">the specification for <code>ParallelArray.reduce()</code></a>,
you will see that it permits <code>reduce()</code> to reduce the items in the array
in any order:</p>

<blockquote>
Reduce is free to group calls to the elemental function and reorder
the calls. For an elemental function that is associative and
commutative the final result will be the same as reducing from left to
right&#8230;reduce is only required to return a result consistent with
some call ordering and is not required to chose the same call ordering
on subsequent calls.
</blockquote>


<p>This means that the result of <code>reduce()</code> is only deterministic if the
kernel function is associative and commutative.  Without this
requirement, implementations would be severely limited in how they
could perform parallel reduction, and in some cases parallel execution
would be too expensive to be worthwhile (interestingly, the sequential
fallback would be more expensive too, because the best sequential
ordering for reduction is not parallelizable at all).</p>
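<p>A tiny worked example shows why the kernel matters. The sketch below (in Rust, purely for illustration, with hypothetical helper names) reduces four numbers under two of the call orderings the specification permits.</p>

```rust
// Left-to-right grouping: f(f(f(a, b), c), d).
pub fn left_to_right(xs: &[i64; 4], f: impl Fn(i64, i64) -> i64) -> i64 {
    xs[1..].iter().fold(xs[0], |acc, &x| f(acc, x))
}

// Pairwise grouping: f(f(a, b), f(c, d)).
pub fn pairwise(xs: &[i64; 4], f: impl Fn(i64, i64) -> i64) -> i64 {
    f(f(xs[0], xs[1]), f(xs[2], xs[3]))
}

// Returns the two results side by side so they can be compared.
pub fn compare(f: fn(i64, i64) -> i64) -> (i64, i64) {
    let xs = [1, 2, 3, 4];
    (left_to_right(&xs, f), pairwise(&xs, f))
}
```

<p>With addition both groupings give 10. With subtraction, left-to-right gives ((1 - 2) - 3) - 4 = -8 while the pairwise grouping gives (1 - 2) - (3 - 4) = 0, so the result depends on which ordering the implementation happens to pick.</p>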

<p>For this reason, I am wary of telling users that <code>ParallelArray</code>
methods have equivalent semantics to <code>Array</code> methods.  This is only
partially true and it can never be fully true.  In general, I think
users will want the <em>best parallel performance they can get</em>, and they
will accept whatever restrictions are required to get it.</p>

<h4>Deterministic ordering can hamper efficient parallel execution.</h4>

<p>We just saw that, in the case of <code>reduce()</code>, any deterministic
ordering either prevents efficient parallel execution or it prevents
efficient sequential execution.  To a lesser degree, the same is true
of <code>map()</code>.  Unlike <code>reduce()</code>, though, the problems with <code>map()</code> are
somewhat subtle.</p>

<p>You may recall that when we perform parallel execution, there is
always the possibility of <em>bailout</em>.  A bailout can occur because we
detect that a write would cause a visible side-effect, but it can also
occur for arbitrary, internal reasons.  Bailouts often occur because
there is some portion of the code that has not yet been executed in
the warmup runs, and hence the type information that we gathered was
inaccurate.</p>

<p>The question is, when a bailout occurs, what do we do with the results
that were successfully computed by the previous parallel iterations?
It would be very nice if we were able to make use of those results.
If we do guarantee deterministic results, however, this is somewhat
tricky.</p>

<p>Imagine for a moment there are just two threads processing a
1000-element array.  In our current system, each worker will be
responsible for mapping half of the array.  So imagine that the first
worker has processed indices 0 to 22 when it encounters a bailout due
to insufficient type information.  To gather type information, we need
to execute iteration 23 sequentially in the interpreter.  In fact,
while we&#8217;re in the interpreter, it&#8217;s probably best if we do a chunk of
iterations&#8212;say, 23 to 32 or so, just to gather up more
data. Meanwhile, let&#8217;s say that the second worker has processed
entries 500 to 600 before it notices that the first worker bailed out
and follows suit.</p>

<p>Unfortunately, if we are guaranteeing deterministic results, it is
very difficult for us to make use of the results for entries 500 to
600 that were already computed.  It&#8217;s possible, after all, that when
we re-run iterations 23 to 32 in the interpreter, they will modify
shared state, and that could affect the computations that already
occurred. In effect, all parallel iterations are inherently
<em>speculative</em>.</p>

<p>As annoying as it is to have to throw away indices 500 to 600, it is
equally annoying if the bailout should occur in the second worker
(say, while processing index 601).  In that case, we have a problem:
we&#8217;d like to run index 601 in the interpreter, but it&#8217;s possible that
this too will cause side-effects.  That means that we must ensure that
all indices prior to 601 are fully processed (which, at the time of
bailout, they are not).  Given enough bookkeeping, we can manage this
too: we could perhaps re-spawn parallel workers to process from 23 to
600, for example, and then re-spawn to process from 600 to 1000.</p>

<h3>What if we don&#8217;t guarantee deterministic ordering?</h3>

<p>If, however, we choose <em>not</em> to guarantee deterministic results, these
problems become much simpler.  When a bailout occurs, we can simply
have each worker execute sequentially from wherever it left off.  So
to continue with our example, worker 0 would run from indices 23 to 30
or so, and perhaps worker 1 would run from 601 to 632.  This way they
can both gather a bit more data.  Then we resume parallel execution.</p>

<p>This is in fact more-or-less exactly what
<a href="https://github.com/syg/iontrail">our current code does</a>.  Each worker tracks its current
position.  Whenever the kernel function is invoked, it processes the
next data it has to process, and it updates its current position as
it goes.  If a bailout occurs, then the current position is not
updated, so when the kernel function is invoked next (which will be
from the interpreter), it will pick up where it left off and process
for a while.  I can dive into the intimate details of this in a
separate post.</p>
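<p>Until then, the bookkeeping can be modelled with a toy sketch. This is Rust used for illustration only, not the engine code: <code>Worker</code>, <code>run</code>, and <code>demo</code> are hypothetical names, and a step returning <code>false</code> stands for a bailout.</p>

```rust
pub struct Worker {
    pub pos: usize, // next index this worker still has to process
    pub end: usize, // one past the last index it owns
}

impl Worker {
    // Processes indices until the range is exhausted, `budget` items are
    // done, or `step` reports a bailout by returning false. `pos` advances
    // only after a successful step, so a bailed-out index is retried by the
    // next call. Returns false if a bailout occurred.
    pub fn run(&mut self, budget: usize, step: &mut dyn FnMut(usize) -> bool) -> bool {
        let mut done = 0;
        while self.pos < self.end && done < budget {
            if !step(self.pos) {
                return false; // bailout: position left untouched
            }
            self.pos += 1;
            done += 1;
        }
        true
    }
}

// Logs (index, mode) pairs: 'P' for parallel code, 'S' for the interpreter.
pub fn demo() -> Vec<(usize, char)> {
    let mut log = Vec::new();
    let mut w = Worker { pos: 20, end: 30 };
    // The compiled kernel bails out when it reaches index 23.
    let ok = w.run(usize::MAX, &mut |i| {
        if i == 23 {
            return false;
        }
        log.push((i, 'P'));
        true
    });
    if !ok {
        // Interpreter: process a small chunk from where the worker left off.
        w.run(3, &mut |i| {
            log.push((i, 'S'));
            true
        });
    }
    // Resume parallel execution for the remainder of the range.
    w.run(usize::MAX, &mut |i| {
        log.push((i, 'P'));
        true
    });
    log
}
```

<p>In this model no index is processed twice and none is skipped; the only thing the bailout changes is which mode handles indices 23 to 25.</p>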

<h4>What about reporting an error?</h4>

<p>Prohibiting impure operations altogether does enable the same
optimizations as nondeterministic ordering, but (as I
<a href="http://smallcultfollowing.com/babysteps/blog/2012/10/24/purity-in-parallel-javascript/">argued in a previous post</a>) it does so by imposing a
significant and unnecessary implementation and performance burden on
the sequential fallback code.  So I am not a fan of this approach.</p>

<h3>The take-away</h3>

<p>My feeling is this.  Deterministic ordering sounds like it makes
things easier for the end-user, but the story is actually more
complex, as only some operations are deterministic.  Moreover it
imposes performance burdens on the implementation, so even the &#8220;good
guys&#8221; who stick to pure operations will pay the price, at least until
optimizations become more highly tuned to cover all kinds of corner
cases.</p>

<p>In general, I think people will turn to <code>ParallelArray</code> when they have
pure operations to perform: for pure operations, deterministic
ordering simply imposes a performance burden. If you have an impure
operation for which ordering is significant, it seems to me you would
be better off just using normal arrays or coding the operation in a
sequential fashion.</p>

<p>Clearly this is an area where reasonable people can differ, though,
and I would not be surprised if we ultimately decide to guarantee
determinism where possible (i.e., <code>map()</code> and <code>filter()</code>).</p>
]]></content>
  </entry>
  
</feed>
