Intermingled parameter lists, take 2
4 November 2013
I got a lot of feedback on my post about intermingled parameter lists – most of it negative – and I’ve been thinking about the issue over the weekend. Truth is, I wasn’t terribly fond of the proposal myself – making the position in the list significant feels wrong – but I felt it was the least bad of the various options. However, I’ve had a change of heart, and thus have a new “least bad” proposal.
The high-level summary of the plan is as follows:
- Parameters lists are not intermingled, instead we begin with some number of lifetime parameters and then some number of type parameters, as today.
- Lifetime parameters on fn items are considered late bound unless they appear within a type bound.
- When referencing a fn item (or any generic path), users would have the option of either (1) supplying values for no parameters at all; (2) only supplying values for type parameters; or (3) supplying values for both lifetime and type parameters.
- Users may always use
_
to have the compiler select a suitable default for the value of some type or type/lifetime parameter. This is equivalent to not supplying a value at all.
This plan is fully backwards compatible (all existing code retains its existing meaning).
Aside #1: Some good arguments were advanced in favor of making all lifetime parameters late bound and relying instead on higher-kinded types (HKT) or some kind of explicit universal quantification syntax instead. While both things – and in particular, HKT – might be nice to have, I don’t think that they fully eliminate the need for early/late binding, as I will explain later.
Aside #2: I’d like to find a better name for early and late binding. If there is a standard term for this sort of distinction, I am not aware of it. I suspect this is because most languages whcih feature universally quantified types support quantification over other types – it is much cleaner from a pure type system point-of-view – and thus they do not have to discriminate between early and late binding. The situation is also related to let polymorphism and the distinction between type schemes and types in ML or Haskell, so perhaps there is a related term I can borrow from there. It occurs to me that it’s also worth digging into the literature about type erasure, as this distinction arises at least partly due to our use of monomorphization as an implementation scheme. Anyway, suggestions for better names or else other reading material welcome.
Prelude: How much is this likely to impact people?
I want to point out that the distinction between early and late bound lifetime parameters won’t matter so much in regular usage. In particular, the only time that the sort of binding comes into play is when functions are used as values.
Example 1: A late-bound lifetime parameter
Let’s begin with the get_index
early from my earlier post:
fn get_index<'l, T>(v: &'l [T], index: uint) -> &'l T {
&v[index]
}
In this case, the lifetime 'l
would be late-bound, because it does
not appear in any type bounds (in fact, there are no type bounds at
all).
As today, when users take the value of get_index
, there is no need
to specify explicit values for any of the parameters. This simply
means that the compiler will use inference to find the values it
needs.
let a = get_index; // use inference for everything
a([3]);
In this case, we would infer that the type T
should be int
, and
hence that the type of a
is <'l> fn(&'l [int], uint) -> &'l int
.
Note that the type of a
is generic with respect to the lifetime
'l
. That is, we could call a
many times, supply slices with
different lifetimes, and each time the return value would have the
same lifetime as the input.
The proposal also adds the _
specification for types/lifetimes,
which makes it possible to make the defaults explicit:
let a = get_index; // implicitly use inference for everything
let c = get_index::<_>; // explicitly use inference for types,
// implicitly use inference for lifetimes
let b = get_index::<_, _>; // explicitly use inference for everything
As you can see, users can choose between supplying values for all
parameters (b
) or just for type parameters (c
). The compiler uses
the number of values provided to determine the distinction.
Thanks to _
, it is also possible to specify values for just some
of the parameters. If we did not want to rely on inference to
determine that T
was int
but instead wanted to specify it
manually, we might write:
let d = get_index::<_, int>; // just specify types
let e = get_index::<int>; // equivalent to above
In all of these previous cases, no concrete value was given for the
lifetime parameter 'l
. Because 'l
is late-bound, this results in a
final fn type that is generic with respect to 'l
. If we wish,
though, we could specify a specific lifetime as well:
let f = get_index::<'static, int>; // specify all values manually
f
would have the type fn(&'static [int], uint) -> &'static int
.
Unlike the other examples, f
could only be used with slices of
static lifetime.
An aside: the subtyping rules for universally quantified functions
also specify that a generic type like <'a> fn(&'a int)
is a subtype
of a precise type fn(&'static int)
, exactly because we could
transform the former into the latter by substituting 'static
for
'a
. Simon Peyton-Jones et al. have written a
wonderfully readable, but rather long, paper on this topic if
you’d like to read more.
Example 2: An early-bound lifetime parameter
I’m going to use a slightly different example of an early-bound
lifetime parameter than I gave last time, so that I can explain things
a bit more clearly. Let’s turn to a familiar friend, the Iterator
trait (amazing how iteration manages to be simultaneously the most
pedestrian topic imaginable but also so rich with respect to exposing
weaknesses in type systems):
trait Iterator<A> {
fn next(&mut self) -> Option<A>;
}
Now, imagine I want to write a function that iterates over some values and pulls out the first meeting some criteria or other:
fn first<'a, I:Iterator<&'a Value>>(i: I) -> Option<&'a Value> {
for f in i {
if some_condition_or_other_holds(f) {
return Some(f);
}
}
return None;
}
Unlike the previous case, the lifetime parameter 'a
appears within
the type bound for I
, which means that it is early bound. In case
you forgot from my previous post, this means that we cannot wait until
the fn is called to decide what lifetime to use for 'a
, but
rather must choose one immediately.
Review: The reason for this is that, because we don’t support types
that are universally quantified over other types, we must select
concrete values for all type parameters in order to create the type
for (some specific reference to) first
. We will then need to check
that the value we chose for I
meets the bound Iterator<&'a Value>
,
which implies that we must know a specific value for 'a
. (Some
commentors argued that HKT or skolemization provides an
alternative; I disagree, as I’ll explain below.)
In any case, all this machinery is mostly invisible when you are using
Rust because the compiler provides defaults. Under this proposal,
all of these different ways of referring to first
would be equivalent:
let a = first; // all implicit
let b = first::<_>; // types explicit, lifetimes implicit
let c = first::<_,_>; // all explicit
The catch though is that, because 'a
is early bound, the type of
first is not going to be universally quantified over 'a
, but rather
it’ll refer to some concrete lifetime. In this case, the type would be
fn(Foo) -> Option<&'Bar>
where Foo
and 'Bar
are the values we
infer for I
and 'a
respectively.
As before, it is also an option to partially specify either the lifetime or the types. For example, I could do:
let d = first::<'static, _>; // Don't specify iterator type, just lifetime
let e = first::<_, Foo>; // Just specify iterator type
Can we avoid distinguishing early- and late-bound lifetime parameters at all?
It was proposed that we could use HKT to avoid the need for
early-bound lifetime parameters at all. Unfortunately, I don’t think
this works. The basic idea would be to replace any type parameter that
uses a lifetime with a higher-kinded type parameter instead, so that
we don’t have to reference the lifetime parameter from within the type
bound. This is easier to explain by example. The function first
that
I showed before was defined like:
fn first<'a, I:Iterator<&'a Value>>(i: I)
-> Option<&'a Value> {...}
Under this HKT-based approach, it would instead be defined as follows (I’ve highlighted the key differences):
fn first<'a, I<'b>:Iterator<&'b Value>>(i: I<'a>)
/* ^~~~~ ^~~~~ */
-> Option<&'a Value> {...}
You can see that by making I
take a lifetime parameter, I’ve severed
the inherent connection between the bound of I
and 'a
. Instead,
that connection is recreated in the parameter list with I<'a>
.
I’ve done a lot of hand-waving in this example. I think that Haskell’s
notion of HKT’s is significantly more restrictive than this, in that
it is sensitive to ordering, and one would be expected to provide
bounds for I
that are of kind LT -> *
(note that Iterator
alone
is of kind * -> *
, so I guess you’d need an adapter). There are good
reasons for these restrictions, it makes inference and type class
matching much more tractable. But let’s leave all that aside for the
moment (besides, I need to inform myself more deeply on this topic,
it’s quite possible that I am out of date) and assume the example
worked as written, as there is a more fundamental problem.
The fundamental problem is that for this approach to work, the
Iterator
protocol must be implemented in a way that is completely
generic with respect to the lifetime. But this may not apply to all
types. This is because we must validate I
against Iterator
without
knowing what lifetimes it may be applied to. For example, imagine I
had a type Alphabet
that permitted iteration over some pre-defined,
static set of values:
struct Alphabet { index: uint }
static v0: Value = ...;
static v1: Value = ...;
impl Iterator<&'static Value> for Alphabet {
fn next(&mut self) -> Option<&'static value> {
let i = self.index;
self.index += 1;
match i {
0 => Some(&v0),
1 => Some(&v1),
_ => None,
}
}
}
This is an example of an impl that is not defined generally with
respect to all lifetimes. Rather, it is defined with respect to one
specific lifetime. For this reason, Alphabet
would not
(necessarily, anyhow) be usable as a binding for the type parameter
I
in our example. This is a special case of the phenomenon that
Simon Peyton-Jones and Mark Jones describe in section 2.2
(“Overloading with constrained parameters”) of their paper covering
the design space for multi-parameter type classes. In short,
higher-kinded types are only applicable for a type-class when you can
be sure that every instance of the type-class will be applicable to
any type (resp. lifetime).
That said, in this specific case, as the data is 'static
, and
'static
exceeds all other lifetimes, we could have created an impl
that is generic with respect to any impl. This would not always be the
case, though. For example, imagine a trait like:
trait<'a> PipelineSink {
fn consume(x: Data<'a>);
}
One example of a PipelineSink
might be one that sends messages to
another task. Such messages would have to include only 'static
data
to be sendable.
struct TaskSender { chan: Channel<Data<'static>> }
impl PipelineSink<'static> for TaskSender {
fn consume(x: Data<'static>) {
self.chan.send(x);
}
}
In this case, it would not be possible to create a generic impl
.
Detailed rules
The more detailed rules for my proposal are as follows:
- Parameter lists consists of some number of lifetimes and some number of types (i.e., the two are not intermingled, for now anyway).
- Lifetime parameters on fns or methods that appear within a type bound are considered early bound. Those that do not are considered late bound.
- Whenever a type- or lifetime-parameterized path is referenced, users
have the option of either:
- supplying values for all parameters, both lifetime and type; or,
- supplying values for only the type parameters, in which case defaults/inference will be used for the lifetimes; or,
- supplying no values at all, in which case defaults/inference will be used for all parameteters.
- Defaults, whether explicit or implicit, are only permitted in the
following contexts:
- In a function signature, values for lifetimes parameters may be
defaulted. All such lifetimes will occur in types or trait
references, and hence are early bound. They are replaced by
anonymous bound lifetime parameters as today.
- If these missing values appear in the list of arguments, the resulting parameters will be late bound. If the missing values appear in within a type bound, the resulting parameters will be early bound.
- In a function body, values for types may be defaulted. They will be replaced with fresh type variable.
- In a function body, values for lifetimes may be defaulted.
- If the lifetime parameter is early bound, a fresh lifetime variable is substituted.
- If the lifetime parameter is late bound, then it must appear as a parameter to a fn item, and results in a bound parameter scoped to the resulting fn type.
- In a function signature, values for lifetimes parameters may be
defaulted. All such lifetimes will occur in types or trait
references, and hence are early bound. They are replaced by
anonymous bound lifetime parameters as today.