Thoughts on DST, Part 4
2 December 2013
Over the Thanksgiving break I’ve been devoting a lot of time to thinking about DST and Rust’s approach to vector and object types. As before, this is very much still churning in my mind so I’m just going to toss out some semi-structured thoughts.
Brief recap
Treating vectors like any other container. Some time back, I wrote up a post about how we could treat vectors like any other container, which would (to some extent) avoid the need for DST.
Dynamically sized types (DST). In Part 1 of the series, I sketched out how “Dynamically Sized Types” might work. In that scheme, `[T]` is interpreted as an existential type like `exists N. [T, ..N]`, and `Trait` is interpreted as `exists T:Trait. T`. The type system ensures that DSTs always appear behind one of the builtin pointer types, and those pointer types become fat pointers:
- Advantage. Impls for objects and vectors work really well.
- Disadvantage. Hard to square with user-defined smart pointers like `RC<[int]>`. The problem is worse than I presented in that post; I’ll elaborate a bit more below.
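To make the fat-pointer idea concrete, here is a rough sketch of what the two kinds of fat pointers amount to. The struct names are mine, purely for illustration; the compiler would build these representations implicitly.

// What a borrowed `&[int]` would look like under DST:
struct IntSliceRepr {
    data: *int,    // thin pointer to the elements
    length: uint,  // the forgotten length N
}

// What a borrowed `&Trait` would look like under DST:
struct TraitObjectRepr {
    data: *(),     // thin pointer to the forgotten T
    vtable: *(),   // vtable for T's impl of Trait
}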
Statically sized types (SST). In Part 2 of the series, I sketched out an alternative scheme that I later dubbed “Statically Sized Types”. In this scheme, in some ways similar to today, `[T]` and `Trait` are not themselves types, but rather shorthands for existential types where the `exists` qualifier is moved outside the smart pointer. For example, `~[T]` becomes `exists N. ~[T, ..N]`. The scheme does not involve fat pointers; rather, the existential type carries the length, and the thin pointer is embedded within the existential type.
- Advantage. It is easy to create a type like `RC<[int]>` from an existing `RC<[int, ..N]>` (and, similarly, an `RC<Trait>` from an existing `RC<T>`).
- Disadvantage. Incompatible with monomorphization except via virtual calls. I described part of the problem in Part 3 of the series. I’ll elaborate a bit more here.
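For contrast, here is a rough sketch of the SST representation of `~[int]` (again, the struct is purely illustrative). The existential wrapper carries the length and the pointer it embeds stays thin, which is why converting an `RC<[int, ..N]>` into an `RC<[int]>` is trivial: the `RC` value itself is not touched.

// Under SST, ~[int] is shorthand for `exists N. ~[int, ..N]`,
// which could be represented as:
struct OwnedIntVecRepr {
    length: uint,  // the existentially bound N
    data: *int,    // thin pointer to the N elements of the ~[int, ..N]
}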
Where does that leave us?
So, basically, we are left with two flawed schemes. In this post I just want to elaborate on some of the thoughts I had over Thanksgiving. Roughly speaking, they are three:
- DST and smart pointer interaction is even less smooth than I thought, but workable for `RC` at least.
- SSTs, vectors, and smart pointers are just plain unworkable.
- SSTs, objects, and smart pointers work out reasonably well.
At the end, I suggest two plausible solutions that seem workable to me at this point:
- DST, after elaborating more examples to see whether they work;
- Treat vectors like any other container, combined with SST for object types.
Making DST work with RC requires some contortions
In Part 1, I gave the example of how the scheme could accommodate a smart pointer type like `RC`. I defined the `RC` type as follows:
struct RC<T> {
    priv data: *T,
    priv ref_count: uint,
}
Unfortunately, as Patrick pointed out on reddit, this simply doesn’t work. The ref count needs to be shared amongst all clones of the `RC` pointer. Embarrassing. Anyway, the correct definition for `RC` is more like the following:
struct RCData<T> {
    priv ref_count: uint,
    priv t: T,
}

struct RC<T> {
    priv data: *mut RCData<T>
}
In order to be sure that I’m not forgetting details, permit me to sketch out roughly how an `RC` implementation would look in actual code. To start, here is the code to allocate a new `RC` pointer, based on an initial value. I’m going to allocate the memory using a direct call to malloc, both so as to express the “maximally customized” case and because this will be necessary later on.
impl<T> RC<T> {
    pub fn new(t: T) -> RC<T> {
        unsafe {
            let data: *mut RCData<T> =
                transmute(malloc(sizeof::<RCData<T>>()));

            // Intrinsic `init` initializes memory that contains
            // uninitialized data to begin with:
            init(&mut *data, RCData { ref_count: 1, t: t });

            RC { data: data }
        }
    }
}
One could dereference and clone an `RC` pointer as follows:
impl<T> Deref for RC<T> {
    fn deref<'a>(&'a self) -> &'a T {
        unsafe { &self.data.t }
    }
}
impl<T> Clone for RC<T> {
    fn clone(&self) -> RC<T> {
        unsafe {
            self.data.ref_count += 1;
            RC { data: self.data }
        }
    }
}
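Putting those pieces together, a usage sketch (assuming the impls above) would look like:

fn example() {
    let x = RC::new(22);    // allocates RCData { ref_count: 1, t: 22 }
    let y = x.clone();      // shares the allocation; ref_count becomes 2
    assert!(*x.deref() == *y.deref());
    // When both x and y go out of scope, the second drop frees the memory.
}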
The destructor for an `RC<T>` would be written:
impl<T> Drop for RC<T> {
    fn drop(&mut self) {
        unsafe {
            let rc = self.data.ref_count;
            if rc > 1 {
                self.data.ref_count = rc - 1;
                return;
            }

            // Intrinsic `drop` runs T's destructor; then free the memory:
            drop::<T>(&mut self.data.t);
            free(self.data);
        }
    }
}
OK, everything seems reasonable. Only one problem – this whole scheme is incompatible with DST! To see why, consider again the type `RCData`:
struct RCData<T> {
    priv ref_count: uint,
    priv t: T,
}
And, as you can see here, it stores the `T` value inline, without using any kind of pointer indirection. But for `T` to be unsized, it must always appear behind a `*T` or something similar. This is precisely the example that I showed in the section “Limitation: DSTs must appear behind a pointer” in Part 1.
Now, it turns out we could rewrite `RC` to make it DST compatible. The idea is to use the standard trick of storing the reference count at a negative offset. Let’s write up an `RC1` type that shows what I mean:
struct RC1Header {
    priv ref_count: uint,
}

struct RC1<unsized T> {
    priv data: *mut T
}
In this scheme, the field `data` is a `*mut T` that points directly at the value. This means that the compiler could “coerce” an `RC1<[int, ..3]>` into an `RC1<[int]>` by expanding `data` into a fat pointer. It does have the side-effect of making the code that allocates an `RC1` and manipulates its ref count a bit more complex, since more pointer arithmetic is involved.
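Before looking at the code, here is how I picture the layout of an `RC1` allocation (my sketch of the scheme, nothing the language dictates):

// Layout of the malloc'd block backing an RC1<T>:
//
//   +-------------+----------------------+----------------
//   | RC1Header   | padding up to        |  the T value ...
//   | (ref_count) | MAXIMAL_ALIGNMENT    |
//   +-------------+----------------------+----------------
//   ^ start of the allocation            ^ self.data points here
//
// The header can always be found by subtracting MAXIMAL_ALIGNMENT
// from data, even when T's real size and alignment are unknown.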
Here is the code to allocate an `RC1` instance. Hopefully it’s fairly clear. One interesting aspect is that, for allocation, we don’t need to accept unsized types `T`, since at allocation time the full type is known. However, later on, we may “forget” the precise type of `T` and convert it into an unsized, existential type like `[U]` or `Trait`. In that case, we still need to be able to find the reference count, even without knowing the size or alignment of `T`. Therefore, we must be conservative and do our calculations based on the maximal possible alignment requirement for the platform.
static MAXIMAL_ALIGNMENT: uint = 16; // platform specific

impl<T> RC1<T> {
    pub fn new(t: T) -> RC1<T> {
        unsafe {
            // We need to be able to compute the size of the header
            // without knowing T, so be conservative:
            assert!(MAXIMAL_ALIGNMENT > sizeof::<uint>());
            let header_size = MAXIMAL_ALIGNMENT;

            // Allocate memory for header + data.
            let size = header_size + sizeof::<T>();
            let alloc: *mut u8 = malloc(size) as *mut u8;

            // Initialize the reference count.
            let header: *mut RC1Header = alloc as *mut RC1Header;
            (*header).ref_count = 1;

            // Initialize the data itself.
            let data: *mut T = (alloc + header_size) as *mut T;
            init(&mut *data, t);

            // Construct the RC1 value.
            RC1 { data: data }
        }
    }
}
Here is a helper to obtain a pointer to the ref count from an `RC1` instance. Note that it is carefully written to be compatible with an unsized `T`.
impl<unsized T> RC1<T> {
    fn header(&self) -> *mut RC1Header {
        let data: *mut u8 = self.data as *mut u8;
        let header_size = MAXIMAL_ALIGNMENT;
        (data - header_size) as *mut RC1Header
    }
}
Based on this we can rewrite deref, clone, and drop in a fairly obvious way. All of them are compatible with unsized types.
impl<unsized T> Deref for RC1<T> {
    fn deref<'a>(&'a self) -> &'a T {
        unsafe { &*self.data }
    }
}
impl<unsized T> Clone for RC1<T> {
    fn clone(&self) -> RC1<T> {
        unsafe {
            self.header().ref_count += 1;
            RC1 { data: self.data }
        }
    }
}
impl<unsized T> Drop for RC1<T> {
    fn drop(&mut self) {
        unsafe {
            let rc = self.header().ref_count;
            if rc > 1 {
                self.header().ref_count = rc - 1;
                return;
            }

            // Intrinsic `drop` runs T's destructor; then free the
            // allocation, which begins at the header:
            drop::<T>(&mut *self.data);
            free(self.header());
        }
    }
}
OK, so we can see that DST does permit `RC<[int]>`, but only barely. It makes me nervous. Is this a general enough solution to scale to future smart pointers? It’s certainly not universal.
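For what it’s worth, the payoff would be code along these lines; the coercion syntax is speculative, but the point is that forgetting the length expands `data` into a fat pointer while the header trick keeps the ref count reachable:

fn example() {
    let x: RC1<[int, ..3]> = RC1::new([1, 2, 3]);
    // Forget the length: `data` becomes a (pointer, length) fat pointer.
    let y: RC1<[int]> = x as RC1<[int]>;
    // Cloning and dropping still work, because the ref count is found
    // at a fixed offset before the data, not via the type of T.
    let z = y.clone();
}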
Why SST just doesn’t work with vector types.
The SST approach does not employ fat pointers in the same sense and thus is largely free of the limitations on smart pointer layout that DST imposes. But not entirely. In Part 3 I described the problem of finding the correct monomorphized instance of `deref()`. In general, this is not possible, though in many instances the compiler could deduce that it doesn’t matter which type of pointee `deref()` is specialized to. I thus proposed that a solution might lie in formalizing this idea by permitting a type parameter `T` to be labeled `erased`, which would cause the compiler to guarantee that the generated code will be identical no matter what type `T` is instantiated with. This seems nice, but there are many complications in practice. Let me sketch them out.
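To recall what that would look like, here is a sketch using the hypothetical `erased` annotation and a made-up `strong_count` helper. The annotation would be a promise, checked by the compiler, that the body compiles to the same code for every `T`:

impl<erased T> RC<T> {
    fn strong_count(&self) -> uint {
        // ref_count is the first field of RCData<T>, so its offset
        // does not depend on T, making this a plausible candidate
        // for erasure.
        unsafe { self.data.ref_count }
    }
}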
First, it is rare that a type can be entirely erased, even in dereference routines. For example, consider the straightforward `RC` type that I sketched out before, where the header was made explicit in the representation, rather than being stored at a negative offset. Here is the `Deref` routine:
impl<T> Deref for RC<T> {
    fn deref<'a>(&'a self) -> &'a T {
        unsafe { &self.data.t }
    }
}
At first, it appears that the precise type `T` is irrelevant, but in fact we must know its alignment to compute the offset of the field `t`. This precise situation is why the alternative scheme `RC1` made conservative assumptions about the alignment of `t`. We could address this, though, by manually annotating the alignment of the `t` field (something we do not yet support, but ought to in any case):
struct RCData<T> {
    priv ref_count: uint,
    #[alignment(maximum)]
    priv t: T,
}
A deeper problem lies with the `drop` routine. The destructor for an `RC<T>` needs to do three things, and in a particular order:
- Decrement the ref count, returning if it is not yet zero.
- Drop the value of `T` that we encapsulate.
- Free the memory we allocated.
The tricky part is that step 2 requires knowledge of `T`. I thought at first we might be able to finesse this problem by having the destructor run after the contained data had been freed, but that doesn’t work because in this case the data is found at the other end of an unsafe pointer, and the compiler doesn’t traverse that – and worse, we don’t always want to free the `T` value of an `RC<T>`, only if the ref count is zero.
Despite all the problems with `Drop`, it’s possible to imagine that we define some super hacky custom drop protocol for smart pointers that makes this work. But that’s not enough. There are other operations that make sense for `RC<[T]>` types beyond indexing, and they have the same problems. For example, perhaps I’d like to compare two values of type `RC<[T]>` for equality:
fn foo(x: RC<[int]>, y: RC<[int]>) {
    if x == y { ... }
}
This seems reasonable, but we immediately hit the same problem: what `Eq` implementation should we use? Can `Eq` be defined in an “erased” way? Let’s not forget that `Eq` is currently defined only between instances of equal type. This winds up being basically the same problem as drop – we can only circumvent it by adding a bunch of specialized logic for comparing existential types.
Another problem lies in the case where the length of a vector is not statically known. The underlying assumption of all this work is that a type like `~[T]` corresponds to a vector whose length was once statically known but has been forgotten. We were going to move the “dynamic length” case to a type like `Vec<T>`, that supports `push()` and so on. But the idea was that `Vec<T>` should be convertible to a `~[T]` – frozen, if you will – once we were done building it. And that doesn’t work at all, because for such a vector there never was a statically known length to forget.
Finally, even if we could, we don’t want to generate those monomorphized variants anyhow. Even if we could overcome all the above challenges, it’s still silly to have a type like `RC<[int]>` delegate to some specific destructor for `[int, ..N]` for whatever length `N` it happens to be. That implies we’re generating code for every length of vector that occurs in practice. Not good, and DST wouldn’t have this problem.
OK, so I hope I’ve convinced you that SST and vector types just do not mix.
Why SST could work for object types.
You’ll note I was careful not to toss out the baby with the bathwater. Although SST doesn’t work well with vector types, I think it still has potential for object types. There are a couple of crucial differences here:
- With object types, we carry a vtable, permitting us to make crucial operations – like drop – virtual calls.
- Object types like `RC<Trait>` support a much more limited set of operations:
  - drop;
  - invoke methods offered by `Trait`.
There are many ways we could make `RC<Trait>` work. Here is one possible scheme that is maximally flexible and does not require the notion of erased type parameters. When you cast an `RC<T>` to an `RC<Trait>`, we pair it with a vtable. This vtable contains an entry for drop and an entry for each of the methods in `Trait`. These entries are set up to take an `RC<T>` as input and to handle the dereferencing etc. themselves, delegating to a monomorphic variant specialized to `T`. Let me explain by example. First let’s create a simple trait:
trait Mobile {
    fn hit_points(&self) -> int;
}

struct PC { ... }
impl Mobile for PC { ... }

struct NPC { ... }
impl Mobile for NPC { ... }
Now imagine I have a routine like:
fn interact(pc: RC<PC>, npc: RC<NPC>) {
    let pc_mob: RC<Mobile> = pc as RC<Mobile>;   // convert to object type
    let npc_mob: RC<Mobile> = npc as RC<Mobile>; // convert to object type
}
The idea would be to package up the original `RC` pointer with a vtable containing adapter routines. These routines would be auto-generated by the compiler, and would look roughly like the following:
fn RC_PC_drop(r: *RC<PC>) {
    drop(*r)
}

fn RC_PC_hit_points(r: *RC<PC>) -> int {
    let pc: &PC = (*r).deref();
    pc.hit_points()
}
Thus, when we convert an `RC<PC>` to an `RC<Mobile>`, we would pair the `RC` pointer with a vtable consisting of `RC_PC_drop` and `RC_PC_hit_points`. There are some minor complications to work out around the various `self` pointer types, but that seems relatively straightforward (famous last words). Anyway, the key idea here is to specialize the vtable routines to the smart pointer type, by moving the required deref into the generated method itself. This avoids the need for us to ever invoke code in an erased fashion.
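To spell out what gets passed around, the value produced by the cast would pair the untouched `RC<PC>` bits with a table of those adapters, roughly like the following (the names and layout are invented for illustration):

// What an RC<Mobile> object might boil down to:
struct MobileVtable {
    drop: fn(*u8),               // really *RC<T> for the erased T; calls RC_PC_drop
    hit_points: fn(*u8) -> int,  // calls RC_PC_hit_points, etc.
}

struct RcMobileObject {
    rc: *u8,                     // the original RC<PC> value, bits unchanged
    vtable: &'static MobileVtable,
}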
If we added the `erased` keyword, it could still be used to permit the reuse of these adaptor methods across distinct pointer types. But this can also be done without a special keyword as an optimization (unlike before, it’s not necessary for the type to be erased, merely helpful).
Squaring the circle
I think we could maybe make DST work, but I still worry it is too magical. It has some real advantages though so perhaps the right thing is to try and elaborate more examples of smart pointer types we anticipate and see whether they can be made to work.
Another solution is to remove vectors from the language, treat them like any other container, and use the SST approach for object types. But there are lots of micro-decisions to be made there, many of which boil down to usability things. For example, what is the meaning of the literal syntax and so on? I’ll leave those thoughts for another day.