Rust RFC: Opt-in builtin traits
28 February 2014
In today’s Rust, there are a number of builtin traits (sometimes
called “kinds”): Send
, Freeze
, Share
, and Pod
(in the future,
perhaps Sized
). These are expressed as traits, but they are quite
unlike other traits in certain ways. One way is that they do not have
any methods; instead, implementing a trait like Freeze
indicates
that the type has certain properties (defined below). The biggest
difference, though, is that these traits are not implemented manually
by users. Instead, the compiler decides automatically whether or not a
type implements them based on the contents of the type.
In this proposal, I argue to change this system and instead have users
manually implement the builtin traits for new types that they define.
Naturally there would be #[deriving]
options as well for
convenience. The compiler’s rules (e.g., that a sendable value cannot
reach a non-sendable value) would still be enforced, but at the point
where a builtin trait is explicitly implemented, rather than being
automatically deduced.
There are a couple of reasons to make this change:
- Consistency. All other traits are opt-in, including very common
traits like
Eq
andClone
. It is somewhat surprising that the builtin traits act differently. - API Stability. The builtin traits that are implemented by a type are really part of its public API, but unlike other similar things they are not declared. This means that seemingly innocent changes to the definition of a type can easily break downstream users. For example, imagine a type that changes from POD to non-POD – suddenly, all references to instances of that type go from copies to moves. Similarly, a type that goes from sendable to non-sendable can no longer be used as a message. By opting in to being POD (or sendable, etc), library authors make explicit what properties they expect to maintain, and which they do not.
- Pedagogy. Many users find the distinction between pod types (which copy) and linear types (which move) to be surprising. Making pod-ness opt-in would help to ease this confusion.
- Safety and correctness. In the presence of unsafe code, compiler inference is unsound, and it is unfortunate that users must remember to “opt out” from inapplicable kinds. There are also concerns about future compatibility. Even in safe code, it can also be useful to impose additional usage constriants beyond those strictly required for type soundness.
I will first cover the existing builtin traits and define what they are used for. I will then explain each of the above reasons in more detail. Finally, I’ll give some syntax examples.
The builtin traits
We currently define the following builtin traits:
Send
– a type that deeply owns all its contents. (Examples:int
,~int
, not&int
)Freeze
– a type which is deeply immutable when accessed via an&T
reference. (Examples:int
,~int
,&int
,&mut int
, notCell<int>
orAtomic<int>
)Pod
– “plain old data” which can be safely copied via memcpy. (Examples:int
,&int
, not~int
or&mut int
)
We are in the process of adding an additional trait:
Share
– a type which is threadsafe when accessed via an&T
reference. (Examples:int
,~int
,&int
,&mut int
,Atomic<int>
, notCell<int>
)
Proposed syntax
Under this proposal, for a struct or enum to be considered send, freeze, pod, etc, those traits must be explicitly implemented:
struct Foo { ... }
impl Send for Foo { }
impl Freeze for Foo { }
impl Pod for Foo { }
impl Share for Foo { }
For generic types, a conditional impl would be more appropriate:
enum Option<T> { Some(T), None }
impl<T:Send> Send for Option<T> { }
// etc
As usual, deriving forms would be available that would expand into impls like the one shown above.
Whenever a builtin trait is implemented, the compiler will enforce the same requirements it enforces today. Therefore, code like the following would yield an error:
struct Foo<'a> { x: &'a int }
// ERROR: Cannot implement `Send` because the field `x` has type
// `&'a int` which is not sendable.
impl<'a> Send for Foo<'a> { }
These impls would follow the usual coherence requirements. For
example, a struct can only be declared as Share
within the crate
where it is defined.
For convenience, I also propose a deriving shorthand
#[deriving(Data)]
that would implement a “package” of common traits
for types that contain simple data: Eq
, Ord
, Clone
, Show
,
Send
, Share
, Freeze
, and Pod
.
Pod and linearity
One of the most important aspects of this proposal is that the Pod
trait would be something that one “opts in” to. This means that
structs and enums would move by default unless their type is
explicitly declared to be Pod
. So, for example, the following
code would be in error:
struct Point { x: int, y: int }
...
let p = Point { x: 1, y: 2 };
let q = p; // moves p
print(p.x); // ERROR
To allow that example, one would have to impl Pod
for Point
:
struct Point { x: int, y: int }
impl Pod for Point { }
...
let p = Point { x: 1, y: 2 };
let q = p; // copies p, because Point is Pod
print(p.x); // OK
Effectively this change introduces a three step ladder for types:
- If you do nothing, your type is linear, meaning that it moves from place to place and can never be copied in any way. (We need a better name for that.)
- If you implement
Clone
, your type is cloneable, meaning that it moves from place to place, but it can be explicitly cloned. This is suitable for cases where copying is expensive. - If you implement
Pod
, your type is plain old data, meaning that it is just copied by default without the need for an explicit clone. This is suitable for small bits of data like ints or points.
What is nice about this change is that when a type is defined, the user makes an explicit choice between these three options.
Consistency
This change would bring the builtin traits more in line with other
common traits, such as Eq
and Clone
. On a historical note, this
proposal continues a trend, in that both of those operations used to
be natively implemented by the compiler as well.
API Stability
The set of builtin traits implemented by a type must be considered
part of its public inferface. At present, though, it’s quite invisible
and not under user control. If a type is changed from Pod
to
non-pod, or Send
to non-send, no error message will result until
client code attempts to use an instance of that type. In general we
have tried to avoid this sort of situation, and instead have each
declaration contain enough information to check it indepenently of its
uses. Issue #12202 describes this same concern, specifically with
respect to stability attributes.
Making opt-in explicit effectively solves this problem. It is clearly
written out which traits a type is expected to fulfill, and if the
type is changed in such a way as to violate one of these traits, an
error will be reported at the impl
site (or #[deriving]
declaration).
Pedagogy
When users first start with Rust, ownership and ownership transfer is
one of the first things that they must learn. This is made more
confusing by the fact that types are automatically divided into pod
and non-pod without any sort of declaration. It is not necessarily
obvious why a T
and ~T
value, which are semantically equivalent,
behave so differently by default. Makes the pod category something you
opt into means that types will all be linear by default, which can
make teaching and leaning easier.
Safety and correctness: unsafe code
For safe code, the compiler’s rules for deciding whether or not a type is sendable (and so forth) are perfectly sound. However, when unsafe code is involved, the compiler may draw the wrong conclusion. For such cases, types must opt out of the builtin traits.
In general, the opt out approach seems to be hard to reason about: many people (including myself) find it easier to think about what properties a type has than what properties it does not have, though clearly the two are logically equivalent in this binary world we programmer’s inhabit.
More concretely, opt out is dangerous because it means that types with
unsafe methods are generally wrong by default. As an example,
consider the definition of the Cell
type:
struct Cell<T> {
priv value: T
}
This is a perfectly ordinary struct, and hence the compiler would
conclude that cells are freezable (if T
is freezable) and so forth.
However, the methods attached to Cell
use unsafe magic to mutate
value
, even when the Cell
is aliased:
impl<T:Pod> Cell<T> {
pub fn set(&self, value: T) {
unsafe {
*cast::transmute_mut(&self.value) = value
}
}
}
To accommodate this, we currently use marker types – special types
known to the compiler which are considered nonpod and so forth. Therefore,
the full definition of Cell
is in fact:
pub struct Cell<T> {
priv value: T,
priv marker1: marker::InvariantType<T>,
priv marker2: marker::NoFreeze,
}
Note the two markers. The first, marker1
, is a hint to the variance
engine indicating that the type Cell
must be invariant with respect
to its type argument. The second, marker2
, indicates that Cell
is
non-freeze. This then informs the compiler that the referent of a
&Cell<T>
can’t be considered immutable. The problem here is that, if
you don’t know to opt-out, you’ll wind up with a type definition that
is unsafe.
This argument is rather weakened by the continued necessity of a
marker::InvariantType
marker. This could be read as an argument
towards explicit variance. However, I think that in this particular
case, the better solution is to introduce the Mut<T>
type described
in #12577 – the Mut<T>
type would give us the invariance.
Using Mut<T>
brings us back to a world where any type that uses
Mut<T>
to obtain interior mutability is correct by default, at least
with respect to the builtin kinds. Types like Atomic<T>
and
Volatile<T>
, which guarantee data race freedom, would therefore have
to opt in to the Share
kind, and types like Cell<T>
would simply
do nothing.
Safety and correctness: future compatibility
Another concern about having the compiler automatically infer membership into builtin bounds is that we may find cause to add new bounds in the future. In that case, existing Rust code which uses unsafe methods might be inferred incorrectly, because it would not know to opt out of those future bounds. Therefore, any future bounds will have to be opt out anyway, so perhaps it is best to be consistent from the start.
Safety and correctness: semantic constraints
Even if type safety is maintained, some types ought not to be copied
for semantic reasons. An example from the compiler is the
Datum<Rvalue>
type, which is used in code generation to represent
the computed result of an rvalue expression. At present, the type
Rvalue
implements a (empty) destructor – the sole purpose of this
destructor is to ensure that datums are not consumed more than once,
because this would likely correspond to a code gen bug, as it would
mean that the result of the expression evaluation is consumed more
than once. Another example might be a newtype’d integer used for
indexing into a thread-local array: such a value ought not to be
sendable. And so forth. Using marker types for these kinds of
situations, or empty destructors, is very awkward. Under this
proposal, users needs merely refrain from implementing the relevant
traits.
The Sized
bound
In DST, we plan to add a Sized
bound. I do not feel like users
should manually implemented Sized
. It seems tedious and rather
ludicrous.
Counterarguments
The downsides of this proposal are:
There is some annotation burden. I had intended to gather statistics to try and measure this but have not had the time.
If a library forgets to implement all the relevant traits for a type, there is little recourse for users of that library beyond pull requests to the original repository. This is already true with traits like
Eq
andOrd
. However, as SiegeLord noted on IRC, that you can often work around the absence ofEq
with a newtype wrapper, but this is not true if a type fails to implementSend
orPod
. This danger (forgetting to implement traits) is essentially the counterbalance to the “forward compatbility” case made above: where implementing traits by default means types may implement too much, forcing explicit opt in means types may implement too little. One way to mitigate this problem would be to have a lint for when an impl of some kind (etc) would be legal, but isn’t implemented, at least for publicly exported types in library crates.