Intermingled parameter lists
29 October 2013
I’ve been hard at work finishing up work on Rust’s
“new” syntax for lifetimes – I put “new” in quotes because
the syntax has been in use for some time, but in rustc itself the
change was only half-baked. In effect, the new syntax was a kind of
“bridge” to code that was designed for the older type system. This
resulted in some artificial limitations: for example, types could only
have a single lifetime parameter, and it had to be named 'self
.
Under my pull request, these limitations are lifted. However, in the
process of implementing things, I realized one minor problem with the
new syntax that must be rectified. In this post I describe the problem
and my proposed solution.
For the impatient, the changes I propose are summarized here. These are pretty nitty-gritty changes that won’t affect most programs, but are needed to make the current type system more coherent. The changes are needed to fix issue #5121. The full motivation for these changes is somewhat subtle and is described below the fold.
- Permit lifetime and type parameters to be intermingled within a parameter list, rather than requiring all lifetime parameters to appear up front.
- Do not bring all parameters into scope as a unit but rather one by one.
- Require that, when explicit parameters are supplied, values for all parameters must be supplied, unlike today where values for lifetime parameters can always be omitted.
- Introduce
_
as a notation for an unspecified lifetime or type. It can be used in any type that appears in a fn body or signature. - For fn items, trailing lifetime parameters are considered “late bound”, meaning that their value are substituted when the fn is called rather than when it is referenced. This is contrast to type parameters, which are “early bound”. More on this point below.
- Make both type and lifetime parameters required on references to types in all contexts.
Early- vs late-bound lifetimes
In general lifetime parameters, work a lot like type parameters. For
example, consider the struct VecIndex
, which has both a lifetime
parameter 'l
and a type parameter T
:
struct VecIndex<'l, T> {
vec: &'l [T],
index: uint
}
Now if I were to write a type like VecIndex<'foo, uint>
, that is
basically equivalent to search-and-replacing 'l
with 'foo
and
T
with uint
, so wind up with a type like:
struct VecIndex {
vec: &'foo [uint],
index: uint
}
However, when it comes to functions, lifetime parameters are more
flexible than type parameters. In particular, we can often wait to
specify the value for a lifetime parameter until the function is
called. To see what I mean, consider the function get_index
, which
is again parameterized by both a lifetime 'l
and a type T
:
fn get_index<'l, T>(v: &'l [T], index: uint) -> &'l T {
&v[index]
}
Now suppose I were to call get_index
twice, supplying
two different types of vectors as inputs:
let vec1 = [1, 2, 3];
let addr1 = get_index(vec1, 1);
let vec2 = ['1', '2', '3'];
let addr2 = get_index(vec2, 1);
Although they look like the are calling the same function, these two
calls to get_index
are in fact executing completely different code
at runtime. This is because Rust uses a monomorphization scheme for
handling type parameters (similar to C++), which means that we must
create a duplicate copy of get_index
for each set of types. To put
it another way, behind the scenes every reference to get_index
must
specifiy a concrete set of type parameters (though the compiler
normally infers their values for us). We could therefore rewrite the
code example above in a more explicit way (for now, just ignore the
lifetime parameter, I will get to it):
let vec1 = [1, 2, 3];
let addr1 = get_index::<int>(vec1, 1);
let vec2 = ['1', '2', '3'];
let addr2 = get_index::<char>(vec2, 1);
Monomorphization is generally invisible but becomes visible if you
try to obtain a function pointer to get_index
:
let func = get_index::<?>; // must choose int or char here!
let vec1 = [1, 2, 3];
let addr1 = func(vec1, 1);
let vec2 = ['1', '2', '3'];
let addr2 = func(vec2, 1);
You can see that when we store get_index
into a variable, we must
specify the types it will operate over. So we can pass func
a slice
of ints or chars, but not both.
To put it another way, when you specify a closure type like |uint| -> float
, we don’t permit that type to be generic with respect to
types. That is, you can’t have a a type like <T> fn(T) -> uint
, which
would be a function that converts any type to a uint
(I am using the
new syntax that was recently proposed). Even when you
define a generic function (like fn foo<T>(t: T) -> uint
), the type
of a reference to that function at any point in time will have some
concrete type substituted for T
. I will call type parameters
early bound, meaning that we substitute concrete values for them as
early as possible.
All this time I’ve been ignoring the lifetime parameter. Because
lifetimes are erased at runtime, meaning that they do not influence
the code we generate, they don’t have to share the same limits. We do
in fact permit closures that bind lifetime names, meaning I can write
a type like <'a> fn(&'a [uint], uint) -> &'a T
(that is, a closure
that takes a slice with lifetime 'a
and returns a pointer with
lifetime 'a
– but the lifetime 'a
can be different each time the
closure is called). I will call this a late-bound lifetime
parameter, because we don’t need to substitute a specific lifetime
right away, but rather we can wait until the function is called.
Let’s revisit our get_index
in light of this distinction:
fn get_index<'l, T>(v: &'l [T], index: uint) -> &'l T {
&v[index]
}
As we said, the type parameter T
(like all type parameters) must be
early bound. We can’t actually generate code for v[index]
, for
example, without knowing T
– we wouldn’t know how big T
is and
thus how many bytes to skip over. However, for the lifetime 'l
there
is no problem. When we generate code for get_index
, it will not
matter what lifetime 'l
represents, we just generate code that takes
a slice and indexes into it – the type system guarantees us that this
dereference will not crash, but doesn’t affect the code we generate to
actually do the dereference. Therefore, 'l
can be late bound. We
could say, for example, that the type of an expression like
get_index::<int>
is <'l> fn(&'l [uint]) -> &'l uint
.
Not all lifetime parameters can be late bound. Lifetime parameters on
types, for example, are early bound. To see what I mean, consider a
struct like Foo
:
struct Foo<'a> { x: &'a int }
You cannot refer to a type Foo
without specifying what 'a
is
(well, you can write Foo
in some cases, but the compiler will
insert defaults or make use of inference to expend it to Foo<'z>
for
some lifetime 'z
).
I used to think that this then was the divide: lifetime parameters on types are early bound, lifetime parameter on fns are late bound. But I’ve realized that this is not necessarily true. Consider a function like the following:
trait Allocator<'arena> {
fn new_box(&mut self) -> &'arena mut Box;
}
fn with_alloc<'arena,A:Allocator<'arena>>(
alloc: &mut A,
...)
{
...
}
What is important here is that the type parameter A
has a bound that
references the lifetime parameter 'arena
. This means that we
cannot know A
unless we know 'arena
. Since A
is a type
parameter, and thus early bound, this implies that 'arena
must also
be early bound.
So we can see that lifetime parameters on fns may be early or late bound. Unfortunately, our current syntax offers no way to distinguish between early- and late-bound lifetimes. What’s worse, we currently require all lifetimes to go first in the list of parameters (ironically, this is because I wanted them to be able to appear within trait bounds, as shown in that last example, but I didn’t consider the full implications of that).
My proposed solution
I think we should allow type and lifetime parameters to be freely
intermixed. Moreover, we should require that when you specify values
for generic parameters, you must always specify both lifetime and type
parameters, in the proper position. To make this less onerous, we’ll
add a new specifier _
that can be used to omit a lifetime/type
parameter and have the compiler fill in a default. _
can be used to
supply a value for either a type or region parameter. Parameter names
are in scope for the bounds of all parameters that appear later in the
list.
One exception to the previous rules: on fn items, any trailing lifetimes will be considered late bound. Late-bound lifetimes may be but need not be specified on reference to a fn. If the value for a late-bound lifetime is omitted, then the lifetime becomes bound in the resulting fn type. This convention reserves room for adding late-bound lifetimes to types, once some sort of semantics is defined for that. ;)
That is pretty dense. Let me give some examples. First, our get_index
function that we saw before:
fn get_index<'l, T>(v: &'l [T], index: uint) -> &'l T {
&v[index]
}
The way this function is written, 'l
would be considered early
bound because it comes before the type parameter T
. That means
that the following code which I wrote before is now illegal:
let func = get_index::<int>;
This code is illegal because there are two early-bound parameters
('l
, T
) and the code only supplies a value for one of them. The
user could supply a named lifetime, like:
fn foo<'a>(v: &'a [int]) -> &'a int{
let func = get_index::<'a, int>;
func(v, 0)
}
Or, if they would prefer to allow the compiler to use inference,
they could just write _
:
fn foo<'a>(v: &'a [int]) -> &'a int{
let func = get_index::<_, int>;
func(v, 0)
}
In fact, they could write _
for both parameters:
fn foo<'a>(v: &'a [int]) -> &'a int{
let func = get_index::<_, _>;
func(v, 0)
}
All of these examples are equivalent.
What the user could not do, given this definition of get_index
, is to
apply func
to slices of two distinct lifetimes. For example:
fn bar<'a, 'b>(v: &'a [int], w: &'b [int]) {
let func = get_index::<_, _>;
let x: 'a int = func(v, 0); // Infers lifetime 'l to be 'a
let y: 'b int = func(w, 0); // <-- Error: 'l is 'a, not 'b
}
To achieve this, the parameter 'l
must be late bound, which means that
it must be moved to the end of the list:
fn get_index_late<T, 'l>(v: &'l [T], index: uint) -> &'l T {
&v[index]
}
Now, because 'l
is late bound, one can leave it out of the list of
parameters entirely, and simply refer to get_index_late::<int>
:
fn bar<'a, 'b>(v: &'a [int], w: &'b [int]) {
let func = get_index_late::<int>;
let x: 'a int = func(v, 0); // OK
let y: 'b int = func(w, 0); // OK
}
The type of func
here is <'l> fn(&'l [int]) -> &'l int
– that is,
'l
remains bound, and hence can be given different values when the
function is called multiple times.
If we go to our example which required an early-bound lifetime
parameter, we see that everything works out fine, because the lifetime
parameter 'arena
appears first in the list and hence is early bound:
trait Allocator<'arena> {
fn new_box(&mut self) -> &'arena mut Box;
}
fn with_alloc<'arena,A:Allocator<'arena>>(
alloc: &mut A,
...)
{
...
}
The scoping rules would prevent a function definition like the following, which attempts to reference a late-bound lifetime parameter inside of a type bound:
fn with_alloc_bad<A:Allocator<'arena>, 'arena>(
// Note that 'arena appears second ^~~~~~
alloc: &mut A,
...)
{
...
}
Note that the _
notation can be used in other type contexts as well
as a convenient shorthand. So I could write a statement like:
let v: ~[_] = vec.iter().map(...).collect();
This expression specifies that the type of v
is some kind of owned
vector, but leaves the contents unspecified (presumably they can be
inferred). Today one can either omit a type in its entirety or specify
it down the smallest detail. This feature has been independently
requested as issue #9508.
_
for types would be legal only within a function body. _
for
lifetimes is legal either within a function body or in a function
signature: in the signature, it would mean “a fresh lifetime
parameter”. (As an extension, we could potentially have _
have the
same meaning for types)
With this change, I propose that we make lifetime parameters on types
mandatory. Today they are mandatory in type signatures but not in fn
signatures or fn bodies. If omitted in those contexts, they default to
a fresh lifetime. Using this proposal, we could simply write _
instead. So code like:
struct MyMap<'a> { ... }
fn foo(x: &MyMap) { ... }
would change to:
struct MyMap<'a> { ... }
fn foo(x: &MyMap<_>) { ... }
which is arguably more consistent and clear.