Reducing DST Annotation

6 June 2013

So Ben Blum has been doing some investigation into the full implications of the Sized bound that I proposed as part of the dynamically sized types post. It’s clear that, if we change nothing else, the impact of Sized will be somewhat greater than I thought. He estimates that somewhere around 40% of the files in libstd need at least one Sized bound; the actual number may wind up being somewhat higher.

It is not entirely clear to me if this is a problem. I imagine that the number of Sized bounds will be highest in container and other library code. But it is worse than I hoped. So I wanted to briefly explore some of the alternatives, assuming that the Sized annotation burden is too high.

No dynamically sized types

One option of course is to keep the status quo. I feel that the current scheme, in which &[T] is a type but not an instance of the general form &T, is a “wart in the making”. It’s a pain to write generic impls for universal traits like Eq and so forth. It’s annoying to me that I can’t borrow a ~[T] by writing &*v. The compiler is full of all kinds of special-case code paths to accommodate this situation.
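
To make the impl pain concrete, here is a rough sketch (bodies elided, borrowed forms omitted) of the duplication that universal traits require today, versus what a single impl could look like under DST:

// Today, universal traits need one impl per vector form, roughly:
impl<T: Eq> Eq for ~[T] { ... }
impl<T: Eq> Eq for @[T] { ... }
// (and similarly for borrowed slices &[T])

// Under DST, [T] becomes a type in its own right, so a single impl
// could serve ~[T], @[T], and &[T] alike:
impl<T: Eq> Eq for [T] { ... }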

Patrick Walton’s thoughts about improving language integration for smart pointers work out more smoothly with DST, since it means that one can write Gc<[T]>, Rc<str>, and so forth. Admittedly, it is also plausible to only support Gc<~[T]>—certainly @mut ~[T] is a common type in Rust today, since it permits appending and growing the length.

Some elements of my original proposal don’t work out like I thought, though. For example, pcwalton and I realized that it’s better to have the “non-borrowed” forms of [T] and so forth use a different layout than the borrowed forms.

It turns out that function types just don’t fit well as DSTs, leading to alternative proposals, like the fn vs thunk post or bblum’s proposal, which changes [&|@|~]fn to fn[&|@|~] and makes clever use of bounds.

Option 1: Different defaults

This may be a case where the defaults are just wrong. pcwalton proposed having all type parameters implicitly have a Sized bound unless you prefix it with unsized. So you’d write <unsized T> to get the widest possible type parameter. This seems like it might work quite well, and it’s quite simple to explain. However, I was thinking about other possible approaches.
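
Here is a rough sketch of how the defaulting proposal would read; the <unsized T> syntax is pcwalton’s, and byte_len is just an illustrative name:

// A plain type parameter would implicitly carry a Sized bound:
fn push<T>(vec: &mut ~[T], value: T) { ... }

// Opting out yields the widest possible parameter, usable with
// unsized types such as [T] and str:
fn byte_len<unsized T>(x: &T) -> uint { ... }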

Option 2: A limited form of inference

One thought I had was that most of these Sized bounds are really quite redundant. As a simple example, consider the function push:

fn push<T>(vec: &mut ~[T], value: T) { ... }

We already know that whenever you call a function, all of the values you supply as arguments must be sized. Moreover, we know that you can’t have a vector type ~[T] unless T is Sized. So in a lot of ways the Sized bound is quite redundant: even without looking at the fn body, we can deduce that it would only be possible to invoke push with a T that is sized.

Therefore, we could have the type checker take advantage of this when checking the body of push. Essentially it would augment the declared bounds of T with Sized if it can deduce from the types of the parameters that the function could only be called with a sized type.
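
Concretely, the sketch below contrasts the bound that the basic DST proposal would require with the unannotated signature that the inference rule would accept:

// Under the basic DST proposal, push would need an explicit bound,
// since ~[T] is only a legal type when T is sized:
fn push<T: Sized>(vec: &mut ~[T], value: T) { ... }

// With the inference rule, the parameter types ~[T] and T already
// guarantee that every caller supplies a sized T, so the compiler can
// treat T as Sized in the body without any annotation:
fn push<T>(vec: &mut ~[T], value: T) { ... }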

My feeling is that this rule would basically eliminate Sized bounds from all fns in practice, though they might still appear on type declarations (currently, we don’t allow bounds at all in type declarations, but I think we have to permit at least builtin bounds for smoother integration with deriving and also for destructors; more on that below).

I am somewhat concerned that the rule is perhaps overly clever. It’s a bit subtle and hard to explain, particularly compared to pcwalton’s simple defaulting proposal. Also, as a rule we don’t do any inference around function signatures. The rule I propose here does not in fact violate that principle, since the function signature of push would not change, but it feels like a violation in spirit: we are essentially inferring, based on the types of the parameters, that push can only be called with sized types bound to T.

An aside: bounds in type declarations

Currently we do not allow bounds in generic type declarations. As bblum points out, though, it is hard to properly support deriving this way. You might think that deriving could just add Sized bounds to all the type parameters, but that is not valid for a type like Foo where T could legally be unsized:

pub struct Foo<T> {
    v: @T
}
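
For instance, if deriving(Eq) simply added Sized everywhere, the generated impl would look roughly like this (a hypothetical expansion, not what deriving actually emits):

// Hypothetical expansion: adding Sized to every parameter would wrongly
// rule out Foo<[int]>, even though @[int] is a legal field type under DST.
impl<T: Eq + Sized> Eq for Foo<T> { ... }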

There is another, unrelated, use case for such bounds, which is destructors. Currently we only permit destructors to be defined on Owned types. But suppose I have a generic type like ArrayMessage that is only used with owned values:

pub struct ArrayMessage<T> {
    data: ~[T]
}
pub type U8Message = ArrayMessage<u8>;
pub type U16Message = ArrayMessage<u16>;

If I were to define a destructor for ArrayMessage, I would need to ensure that T is owned:

impl<T:Owned> Drop for ArrayMessage<T> { ... }

But the compiler gives me an error here, because we don’t permit types to “sometimes” have destructors: this impl would imply that ArrayMessage has no destructor when instantiated with a non-owned type. Of course, we know that in practice ArrayMessage is only used with Owned arguments, but we currently have no way to tell the compiler that.

So I think we can just allow bounds in type declarations, though I am inclined to limit it to the builtin traits (what we sometimes call “kinds”). We would validate that the type of every expression meets whatever bounds are defined. If we decided to opt for my inference suggestion, then we would not want to validate arbitrary types that appear in e.g. fn parameter lists. Instead, we would leverage the fact that we know that this type cannot be instantiated without a bound to aid the inference scheme (in other words, if I have an argument of type ArrayMessage<T>, that would imply to the compiler that T must be Owned).
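
A rough sketch of how this might look for ArrayMessage, assuming bounds are permitted on type declarations (send is just an illustrative name):

// Declaring the bound on the type itself records that ArrayMessage is
// only ever instantiated with owned types...
pub struct ArrayMessage<T: Owned> {
    data: ~[T]
}

// ...so the Drop impl is no longer “conditional” and can be accepted:
impl<T: Owned> Drop for ArrayMessage<T> { ... }

// Under the inference scheme, a parameter of type ArrayMessage<T> would
// by itself tell the compiler that T must be Owned:
fn send<T>(msg: ArrayMessage<T>) { ... }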

I am inclined to limit the bounds to builtin traits for two reasons. First, it ensures that the validation that a type meets its defined bounds is cheap: we don’t want to be doing trait resolution on every expression, even if we made more progress on memoization. Second, I think the inference scheme I discussed is only feasible for simple, builtin traits, and I don’t want to consider what would happen if we applied it more widely.

So what to do?

I’d be happy with either defaults or this inference scheme. I might prefer defaults, actually: easier to explain. And I feel like it’s ok to have “unsized” type parameters be declared differently, since they will not be used as widely as normal ones. There is the potential issue that sometimes people will use a normal type parameter when an unsized type parameter could have worked, but I suspect that unsized type parameters will not be widely used; rather, they will appear in particular patterns:

  • Smart pointers, and
  • Impls for “universal” traits like Eq and so forth.

In those cases, the library author will quickly find out that their type or impl can’t be applied to [T].
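
For example, under the defaulting proposal those two patterns might look roughly like this (the unsized syntax is hypothetical, and Gc stands in for pcwalton’s smart-pointer work):

// Smart pointers want the widest possible parameter, so that types like
// Gc<[T]> and Gc<str> are legal:
pub struct Gc<unsized T> { ... }

// Impls of universal traits likewise want to reach unsized types:
impl<unsized T: Eq> Eq for Gc<T> { ... }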