Fn types
12 July 2012
I am trying to mop up some of the remaining work for regions now. One of the big remaining areas is dealing with function and iface types. This proposal is certainly influenced by my previous proposals. However, we have backed away from the idea of dynamically-sized types for vectors and so I will do the same here.
The design
My current design includes the following kinds of function types (written as I expect them to commonly be written; some details are omitted):
&fn(S) -> T
@fn(S) -> T
~fn(S) -> T
Similar types apply to iface types (using the iface to_str
as an example):
&to_str
@to_str
~to_str
These types work like their corresponding pointer types, so a value of
type @fn(S) -> T
or ~fn(S) -> T
can be “borrowed” into a &fn(S) -> T
. (Today, there is a subtyping relationship)
Similarly, the explicit syntax for creating closures will become
@fn(s: S) -> T { ... }
and ~fn(s: S) -> T { ... }
. I will make it
legal to omit the argument and return types and infer them from
context. The closures produced from such expressions are always
copying closures, in that they copy or move values from their
environment when created.
The sugared notation |s| expr
as well as the do
and for
loop
notation will use inference to decide which of the three kinds of
closures is produced. If a sugared closure resolves to a &fn
, then
a “by-reference” closure will be produced. That is, one which can
modify the values in its enclosing environment. Otherwise, a copying
closure is created. (As today)
Borrowed closures
Borrowed closures in their most explicit form are written &r/fn(S) -> T
. They are like fn()
today but they are associated with a region
r
(the notation for region pointers is changed from &r.T
to
&r/T
, for those of you following along). The region type system
will ensure they do not escape, so we can lift the various
restrictions about using borrowed closures only within parameters and
so forth.
This will require some tweaks to various analyses (notably borrowck) that take advantage of the limited possibilities of the current by-reference closures. I don’t think this will be terribly difficult.
Including an explicit region also enables a lot of very useful things. Closures which reference their environment can be stored in other data structures. Further, we should be able to make use of the region bound to allow borrowed closures to deinitialize variables in their environment—see below for more details.
Shared closures
Shared closures, written like @fn(S) -> T
, are exactly like the
fn@(S) -> T
that we use today. They consist of a shared, boxed
environment which may close over arbitrary data. They cannot close
over borrowed pointers.
Unique closures
Unique closures in their most explicit form are written like ~fn:K(S) -> T
, where K
is a set of bounds like those found on type
parameters. If K
is omitted it defaults to send
. They cannot
close over borrowed pointers and, furthermore, all data that they do
close over must conform to the given bound. I expect it will rarely
be necessary to write an explicit bound, but it may be useful
sometimes to write ~fn:send copy(S) -> T
, to indicate that the data
that is closed over is not only sendable but also copyable. Unlike
all other closure types, unique closures are themselves sendable
provided that send
is listed among the bounds K
.
Representation, bare functions
The representation of function pointers will remain a pair (code, env)
of pointers. This is not as pure as my prior proposals but has
some advantages, not the least of which being that it will require
less code churn to achieve. It should also make bare functions more
efficiently represented.
The bare function type goes away entirely, to be replaced with an inference technique similar to what Lindsey did with integer literal inference.
Preventing capture
There is one annoying complication. As they have no region bounds, we
need to prevent the @fn()
and ~fn()
types from closing over
borrowed pointers. I plan to do this by introducing a new kind
owned
which means “copyable and does not contain borrowed pointers”.
Functions can only close over owned data. The reason that a kind is
necessary is due to the possibility of functions like the following:
iface foo { fn foo(); }
fn generator<A: copy foo>(a: A) -> @fn() {
@fn() { a.foo() }
}
This function closes over a
. So now if we invoked generator()
with A
bound to a type like &int
, it would close over region data.
Any time the returned closure is used, it would make use of that
region data. Because the region does not appear in the type of the
closure (@fn()
), the type system would be powerless to prevent this
borrowed pointer from leaking out beyond its lifetime. Bad.
In my current design, the function above would be illegal. One would have to write:
iface foo { fn foo(); }
fn generator<A: owned foo>(a: A) -> @fn() {
@fn() { a.foo() }
}
Note that the bound owned
replaced copy
. Without this bound it
would not be legal to copy a
into the closure. This bound also
prevents generator()
from being called with a borrowed pointer.
This is actually a simplified approach. Another option might be to
add region bounds on all fn
types. We could then allow fn
types
to close over borrowed pointers. I hestitate to do this though
because I think that the region bounds on @fn
and ~fn
would want
different defaults than all other region references. For example, all
region references appearing in a function signature default to a
region parameter, including those that appear within shared boxes and
so forth (this will not be a common occurrence, I think, but of course
it can happen). So if I write:
fn some_func(x: &int, y: @&int) { ... }
then both x
and *y
have the same lifetime (the default lifetime
parameter). If we were consistent, then, it would mean that a function
like this:
fn some_func(x: &int, f: @fn(int)) { ... }
would imply that the lifetime of x
and the region bound of f
would
both be the same as well (the default lifetime parameter). But this
is likely not what you wanted; probably you planned to store f
into
a data structure or something and actually wanted a region bound of
&static
(that is, does not close over region data).
Basically, the difference between an @fn
type and other shared types
with regard to regions is that we never know what an @fn
closes over.
With a normal shared type, we only add region bounds when regions are
actually used: but for @fn
we’d have to choose a different default.
Maybe this is not so bad, but even if we added region bounds to @fn
,
we’d still need the owned
kind, or else we’d need to be able to
add region bounds to type variables. This all seems like layers and
layers of complexity that won’t be necessary, as a @fn
type that
closes over borrowed pointers isn’t all that useful (there are places
I can imagine it being useful, though).
Anyway this part is actually easily tweaked if it seems more expressiveness is warranted.
Goals that are achieved by this proposal
Stack-allocated and sendable iface instances. I’ve wanted this for a long time.
More flexibility with stack-allocated closures. The current requirement that all closures which go into data structures be heap-allocated is sometimes a real drag. It means that things like the visitor pattern we implement in rustc can’t make use of borrowed pointers—though this would also be ameliorated with traits, I suppose. But anyhow.
Prefix-sigils. Postfix sigils like fn@
are so Rust 0.2.
Support for sendable, copyable unique closures. Now that send
no
longer implies copy
, we need to distinguish these things.
Fewer closure types. We currently have way too many (5, I think). With this proposal there are still three, but they look and feel like one. Mostly this is achieved through better inference techniques for native functions and eliminating the “any closure” type (which, admittedly, I introduced).
Goals that might be achieved by this proposal
Moving within closures. Actually I only think this can be done
safely. This is a fairly complex topic. The basic goal is to be
able to write something like this fold()
operation (here I am using
some non-implemented things, like unary move, in place of the current
<-
and last use):
fn fold<A,B>(iter: &iterable<A>, b0: B, op: &fn(B, &A) -> B) -> B {
let b = move b0;
for iter.each |a| {
b = op(move b, a);
}
ret b;
}
The interesting part is that we move the intermediate value b
out of the stack frame in the call to op()
and then replace it
with a new value before the next iteration of the loop. Note that the
loop body here is actually a closure.
This may seem trivially safe, but given the possibility of recursion
it is not so. We need to prove that the closure never recurses
while some upvars are uninitialized. I think regions give us the
tools to do that: we can see that the region of op
is bigger than
the method body itself (it’s a region parameter, in fact). This
implies that op()
cannot reach the closure which is executing,
which is defined with a tighter region. However, there are some
complications I haven’t thought through—for example, what if we
provided the closure as an argument to op()
? This will turn out
being quite complex in the end I fear. Maybe there is a better way.
Goals that are not achieved by this proposal
One-shot closures. We still have no support for one-shot closures.
It is becoming increasingly clear that these would be helpful: it is
common to want to move some value into the closure and then move it
out again, and the current unique closure design does not permit this.
You can work around it with a series of painful swaps and calls to
option::unwrap()
, but it’s inefficient and inelegant.
pcwalton ran an idea by me which basically converted today’s fn~
into a one-shot closure: if you wanted to call the closure mutiple
times, you had to copy it. This has a certain appeal but it doesn’t
work with the system as it is today nor the system I proposed just
now, because fn~
(at least today) can be upcast to fn
. We’d have
to make fn~
not borrowable.
I was initially enthusiastic about this idea but am now less so. It
seems to be a strike against orthogonality: ~fn()
types would behave
differently from all other closure types. I’d rather give them a
different name, like procedures or even ~fn1()
, to make the
distinction more clear. Not sure what is best here.