Recurring closures and dynamically sized types
13 May 2013
I realized today that there is an unfortunate interaction between the proposal for dynamically sized types and closure types. In particular, in the case of the recurring closure, I described the soundness issues that arise in our language when closures are able to recurse.
My solution for this was to make the type system treat a &fn() value the same way it treats &mut T pointers: they would be non-copyable, and when you invoke them, that would be effectively like a “mutable borrow”, meaning that for the duration of the call the original value would become inaccessible. So in short the type system would guarantee that when you call a closure, that same closure is not accessible from any other path in the system, just as we now guarantee that when you mutate a value, that same value is not accessible from any other path in the system.
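To recap the problem briefly: the hole appears when a closure can reach a second alias of itself and re-enter itself while one of its captured variables is borrowed. Here is a condensed, illustrative sketch of the offending pattern in roughly today’s syntax (do_something stands in for any use of the borrowed string):

struct R<'self> {
    c: &'self fn(&R, bool)
}

fn conspirator(f: &fn(&R, bool)) {
    let r = R { c: f };
    f(&r, false)                  // hand the closure an alias to itself
}

fn innocent_looking_victim() {
    let mut t = Some(~"hello");
    do conspirator |r, writer| {
        if writer {
            t = None;             // frees the owned string
        } else {
            match t {
                Some(ref m) => {
                    (r.c)(r, true);   // re-enter ourselves via the alias; t becomes None
                    do_something(m);  // use-after-free: *m was freed by that call
                }
                None => fail!()
            }
        }
    }
}

Treating the call like a mutable borrow closes this hole: for the duration of the outer call the closure is inaccessible through any other path, so the re-entrant call through r.c would be rejected.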
This is all well and good, and I think this treatment would be largely invisible to the user under common access patterns. However, it does not play well with the proposal for dynamically sized types, because under this proposal all things written &T must behave the same, no matter what T is. This is in fact the whole point of the proposal! But here I want to treat &fn specially.
I’ve been pondering various solutions this morning. I have come up with two possible avenues:
- Instead of writing &fn() you could write &mut fn(). This is perhaps the “principled” solution, but I consider it rather a non-starter. Writing &fn() for a closure is…tolerable, but &mut fn() is not. It’s verbose and it seems sort of nonsensical (although there is some logic to it, when you consider that calls to the function may mutate the environment and so forth).
- We go back to the older notation and move sigils for closures after the fn. This actually has some notational perks. For example, rather than writing &fn() we can just write fn() (if there is no sigil, we can default to &). On the minus side, a sendable closure would be written fn~()—but, then again, under the dynamically sized types proposal, sendable closures were going to be written ~fn:Owned(), so is fn~() really so bad?
More details after the fold.
OK, let’s dig into the details a bit more. As anyone who has been following my blog posts probably knows by now, there are many, many use cases for closures. I want to dive into the use cases that are on my mind and elaborate on them. I also want to take this chance to write up a bit more thoroughly how I think closures should work, including a few unrelated issues.
Syntax and use cases
Here is a list of use cases to be accommodated:
- “Higher-order functions”: simple functions like map, fold, and so forth. By far the most common use case.
- “Once functions”: functions that can only execute once. This means that they can move values out of their environment.
- “Sendable functions”: functions that can be sent between tasks. This means that they only close over “sendable” values (no garbage-collected data or borrowed pointers).
- “Sendable once functions”: sendable functions that can only execute once. This is what a task body will be.
- “Const functions”: functions that do not close over mutable state. We don’t make much use of this yet, but I plan to do so in order to achieve lightweight fork-join parallelism a la PJS.
The use cases above seem to me to be the “bread and butter” cases that will arise frequently. I will go over the syntax and give an example for each of those use cases shortly. Interestingly, I think that all of them actually read reasonably well if the sigils are moved after the fn keyword, and in some cases the examples read much better.
However, there are two additional use cases that I have considered in the past which I left out. These use cases become significantly harder to read under the new proposal (though they were always hard to read). Interestingly, I realized while writing this blog post that I think these use cases are no longer terribly important, since both of them can be expressed equally well using objects instead of closures, as I will explain shortly. The two use cases are:
- “Sendable const functions”: functions that can be sent between tasks and do not close over mutable state. You could safely share such functions between tasks in an ARC (atomically reference counted container) and execute them multiple times in parallel.
- “Combinators”: combinator libraries create and return closures that close over their arguments, which may include borrowed values.
Higher-order functions
Here is an example of a simple higher-order function (with the closure type highlighted):
impl<T:Sized> for [T] {
    pub fn map<U:Sized>(f: fn(&T) -> U) -> ~[U] { ... }
    //                     ^~~~~~~~~~~
}
For contrast, this is &fn(&T) -> U today.
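Note that the change here is only to the written type; a call site looks the same either way. For example (an illustrative sketch, not from the proposal itself):

let nums = ~[1, 2, 3];
let doubled = nums.map(|x| *x * 2);   // closure literal syntax is unchanged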
Once functions
Here is an example of a higher-order function that executes at most once:
impl<T:Sized> for Option<T> {
    pub fn each(f: once fn(&T) -> bool) -> bool { ... }
    //             ^~~~~~~~~~~~~~~~~~~
}
For contrast, this is &once fn(&T) -> bool today.
Sendable functions and sendable once functions
Here is an example of a sendable once function:
fn spawn(f: once fn~()) {...}
//          ^~~~~~~~~~
The ~ after the fn tells the type system that the environment for this function is allocated using an owned pointer. It also implies a default bound of Owned. The once tells the type system that the function will only execute once.

For contrast, this is ~once fn() today.
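At the call site this remains the familiar task-spawning pattern, with the do-notation supplying the closure (an illustrative sketch):

do spawn {
    // the task body: it may only close over sendable (Owned) data,
    // and the type system knows it will execute at most once
    ...
}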
Const functions
Here is an example of how I would use a const function to achieve lightweight parallelism:
impl<T:Sized> for [T] {
    pub fn par_map<U:Sized>(f: fn:Const(&T) -> U) -> ~[U] { ... }
    //                         ^~~~~~~~~~~~~~~~~
}
This is a parallel map function. It is similar to the regular map except that its iterations execute in parallel. As a consequence, it demands a fn:Const rather than a fn—the Const bound specifies that all the environmental state must be immutable. This is exactly the “patient parent” or “parallel closures” model that is used in PJS and described in this HotPar paper I wrote.

For contrast, this is &fn:Const() today.
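In use, the caller writes the same sort of closure as for map; the Const bound simply rejects any closure that captures mutable state (an illustrative sketch):

let v = ~[1, 2, 3];
let squares = v.par_map(|x| *x * *x);     // ok: the closure captures no mutable state

// let mut count = 0;
// v.par_map(|x| { count += 1; *x });     // error: the closure would mutate captured state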
Sendable const functions
Sendable const functions are one of the two cases that I said would become less attractive under the new proposal. They would look something like fn~:Const (vs ~fn:Const today). The newer syntax works and should be available, but it’s hard to read, due I think to the juxtaposition of ~ (which specifies the kind of pointer used for the environment) and the : that begins the bound specifier :Const.

If this use case were important, I might be worried that the syntax is too ugly, but when I tried to come up with an example of where this use case would be needed, I realized that time has left the use case behind to some extent.
The primary use case for a sendable const function initially was to allow hashtables to be placed in ARCs—the reason for this was that a HashMap requires closures for computing the hash function of its argument, and thus to share the hashmap (and perform parallel lookups) we had to be sure that the closures would not mutate any state. However, this is somewhat outdated, because hashing and equality comparison today are based on traits rather than closures.
Now, using traits is somewhat limited: due to coherence, any one type can only be hashed in one way, and sometimes you would like to have specialized hashing for specific circumstances. But these use cases can easily be accommodated in three ways:

- Using newtyped keys (struct MyKey(key)) and defining different implementations for the hashing and equality traits on MyKey.
- If a newtyped key is not acceptable, you can write a hash table that takes a simple function pointer (extern "Rust" fn) rather than a closure. Function pointers carry no state, but state is rarely needed for equality comparisons.
- If you really need state, then you can write a specialized trait in lieu of a closure:

trait HashFuncs<K> {
    fn hash(&self, k: &K) -> uint;
    fn eq(&self, k1: &K, k2: &K) -> bool;
}

Now your hashtable can either take a ~HashFuncs object to use for hashing and equality comparison or, if you wish to avoid dynamic dispatch for performance reasons, you can parameterize your hashtable type by the instance of HashFuncs that it should use:

struct MyHashMap<K,V,F:HashFuncs<K>> { f: F, ... }
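For instance (a hypothetical illustration, not an existing API), a hasher that needs a seed would carry that state as a field of the type implementing HashFuncs rather than as a captured variable:

struct SeededHash {
    seed: uint
}

impl HashFuncs<uint> for SeededHash {
    // mix the key with the seed (a stand-in, not a real hash function)
    fn hash(&self, k: &uint) -> uint { *k ^ self.seed }
    fn eq(&self, k1: &uint, k2: &uint) -> bool { *k1 == *k2 }
}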
Combinators
General purpose combinators are the other case that (might) get less attractive. This is less clear cut. The idea of a combinator library is that you have functions that return functions, and then you can compose these functions into bigger functions. The most common example is a parser combinator, which is a simple way to create inefficient and buggy parsers (ok, that’s unfair, but I couldn’t resist; I’ve had some bad experiences trying to scale up parser combinators—truth is, they are super nice to work with, at least until things go wrong).
Anyway, a typical parser combinator library would begin with a primitive like the following:
fn expect(c: char) -> fn@(&mut ParseState) -> Result<(), Err> { ... }
//                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note that the function returns a closure. We used fn@ because this closure must be allocated on some heap in order for us to return it, and because using the type fn@ (vs say fn~) would allow us to close over managed and other task-local data. So far, I think this example works out fine.
Where things get more complex is if we want to close over borrowed pointers. For example, imagine an expect function that takes a slice:
fn expect_string<'a>(s: &'a str)
    -> fn@:'a(&mut ParseState) -> Result<(), Err> {...}
//     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here the type system will require that the lifetime 'a of the input slice s appear in the resulting function type, so that it can be sure that the function is not used after the slice is no longer valid. This makes the type more complicated: fn@:'a (vs the also-not-especially-intuitive notation of @'a fn today).
Of course, one could address this problem by having expect_string take a ~str or @str instead of a borrowed string, but in some use cases borrowed pointers make perfect sense. For example, I had once thought to use this pattern to create a combinator library for expressing iteration primitives like enumerate and so forth (similar, experimental work is now underway in the iter module).
Interestingly, just as with sendable const closures, objects and traits can provide an alternative that is ultimately (I think) a better and more readable design anyway. We could rewrite the return type from a closure into a trait:
trait Parser<R> {
    fn parse(&self, state: &mut ParseState) -> Result<R,Error>;
}
fn expect(c: char) -> @Parser<()>;
fn expect_string<'a>(s: &'a str) -> @Parser:'a<()>;
Here in the expect_string case I have taken advantage of the fact that object types will also carry bounds similar to closure types. An advantage of this design is that using a trait allows the Parser objects to carry more methods as well.
If we were to extend the example to include an actual combinator, I imagine it would look something like this:
fn or<'a, R>(p1: @Parser:'a<R>, p2: @Parser:'a<R>) -> @Parser:'a<R> {...}
Of course, for maximum efficiency, one would avoid using object types altogether. Then you would just implement Parser directly on the char and &str types, and perhaps write the or combinator like so:
struct or<P1,P2>(P1, P2);

impl<R,P1:Parser<R>,P2:Parser<R>> Parser<R> for or<P1,P2> {
    fn parse(&self, state: &mut ParseState) -> Result<R,Error> {
        let or(ref p1, ref p2) = *self;
        state.try(); // (*)
        match p1.parse(state) {
            Ok(r) => { state.confirm(); Ok(r) }
            Err(_) => { state.backtrack(); p2.parse(state) }
        }
    }
}
// (*) Here you see my imperative roots. A true functional
// programmer would not use in-place mutation here but rather
// clone and return a new parser state.
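Used this way, a small grammar is just a nesting of structs. A sketch of a call site, assuming Parser is implemented for char and &str as described above (parse_sign is an illustrative name):

fn parse_sign(state: &mut ParseState) -> Result<(), Error> {
    let sign = or('+', '-');   // try '+', falling back to '-'
    sign.parse(state)
}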
Summary
Another long post mostly targeted at rust devs and myself. Sorry about that. I think the bottom line is that we should move sigils for closures and have them appear after the fn keyword. This makes me sad, because this is how things used to be, and in fact one of the main goals of the dynamically sized types (DST) proposal was to move the sigils in closure types in front. But of course soundness comes first, and I think the general wins of the DST proposal (consistent behavior for all &T, @T, ~T, etc.) outweigh the need to write fn~ on occasion (I don’t really see much use for fn@).
There is also one final solution I didn’t mention in my initial paragraphs. We could adopt the “principled” solution of using &mut for closures but change the way we notate &mut. I have largely avoided thinking about this because I want to avoid destabilizing syntax changes. However, I have toyed around on occasion with an idea for reorganizing our types to emphasize ownership and de-emphasize mutability, which goes in this direction. I may indulge myself and write it up at some point. Still, I largely consider this a non-starter.
Adopting the “move sigils in back” proposal does have another casualty, though. There has been some talk of figuring out ways to make @ and ~ less special (as in, allowing user-defined pointer types like RefCounted<T> that are on equal footing). The DST proposal is clearly a step in that direction. Moving the sigils backwards on fn types is, well, a step backward, because closures would always be allocated using a limited set of allocators (stack, ~, or @).
In an odd way, finding this interaction makes me feel good. I’ve been concerned that the DST proposal seemed too easy, which meant we weren’t thinking hard enough about it. But there is another reason as well: I have also been concerned that closure types were becoming a bit too… special, particularly with regard to copyability. Basically I’ve been concerned that although the syntax for a borrowed closure was &fn, borrowed closures didn’t really behave like & pointers—without the DST proposal, this was something that we could safely enforce as part of the type system, but it’s still confusing for users. So I think the DST proposal forces us to be more honest, and that’s a good thing all around.