Virtual Structs Part 1: Where Rust's enum shines
5 May 2015
One priority for Rust after 1.0 is going to be incorporating some kind of support for “efficient inheritance” or “virtual structs”. In order to motivate and explain this design, I am writing a series of blog posts examining how Rust’s current abstractions compare with those found in other languages.
The way I see it, the topic of “virtual structs” has always had two somewhat orthogonal components to it. The first component is a question of how we can generalize and extend Rust enums to cover more scenarios. The second component is integrating virtual dispatch into this picture.
I am going to start the series by focusing on the question of
extending enums. This first post will cover some of the strengths of
the current Rust enum
design; the next post, which I’ll publish
later this week, will describe some of the advantages of a more
“class-based” approach. Then I’ll discuss how we can bring those two
worlds together. After that, I will turn to virtual dispatch, impls,
and matching, and show how they interact.
The Rust enum
I don’t know about you, but when I work with C++, I find that the
first thing that I miss is the Rust enum
. Usually what happens is
that I start out with some innocent-looking C++ enum, like
ErrorCode
:
enum ErrorCode {
FileNotFound,
UnexpectedChar
};
ErrorCode parse_file(String file_name);
As I evolve the code, I find that, in some error cases, I want to
return some additional information. For example, when I return
UnexpectedChar
, maybe I want to indicate what character I saw, and
what characters I expected. Because this data isn’t the same for all
errors, now I’m kind of stuck. I can make a struct, but it has these
extra fields that are only sometimes relevant, which is awkward:
struct Error {
ErrorCode code;
// only relevant if UnexpectedChar:
Vector<char> expected; // possible expected characters
char found;
};
This solution is annoying since I have to come up with values for all
these fields, even when they’re not relevant. In this case, for
example, I have to create an empty vector and so forth. And of course
I have to make sure not to read those fields without checking what
kind of error I have first. And it’s wasteful of memory to boot. (I
could use a union
, but that is kind of a mess of its own.) All in
all, not very good.
One more structured solution is to go to a full-blown class hierarchy:
enum ErrorCode {
FileNotFound,
UnexpectedChar
};
class Error {
public:
Error(ErrorCode ec) : errorCode(ec) { }
const ErrorCode errorCode;
};
class FileNotFoundError : public Error {
public:
FileNotFound() : Error(FileNotFound);
};
class UnexpectedChar : public ErrorCode {
public:
UnexpectedChar(char expected, char found)
: Error(UnexpectedChar),
expected(expected),
found(found)
{ }
const char expected;
const char found;
};
In many ways, this is pretty nice, but there is a problem (besides the
verbosity, I mean). I can’t just pass around Error
instances by
value, because the size of the Error
will vary depending on what
kind of error it is. So I need dynamic allocation. So I can change my
parse_file
routine to something like:
unique_ptr<Error> parse_file(...);
Of course, now I’ve wound up with a lot more code, and mandatory memory allocation, for something that doesn’t really seem all that complicated.
Rust to the rescue
Of course, Rust enums make this sort of thing easy. I can start out with a simple enum as before:
enum ErrorCode {
FileNotFound,
UnexpectedChar
}
fn parse_file(file_name: String) -> ErrorCode;
Then I can simply modify it so that the variants carry data:
enum ErrorCode {
FileNotFound,
UnexpectedChar { expected: Vec<String>, found: char }
}
fn parse_file(file_name: String) -> ErrorCode;
And nothing really has to change. I only have to supply values for
those fields when I construct an instance of UnexpectedChar
, and I
only read the values when I match a given error. But most importantly,
I don’t have to do dummy allocations: the size of ErrorCode
is
automatically the size of the largest variant, so I get the benefits
of the a union
in C but without the mess and risk.
What makes Rust and C++ behave differently?
So why does this example work so much more smoothly with a Rust enum than a C++ class hierarchy? The most obvious difference is that Rust’s enum syntax allows us to compactly declare all the variants in one place, and of course we enjoy the benefits of match syntax. Such “creature comforts” are very nice, but that is not what I’m really talking about in this post. (For example, Scala is an example of a language that offers great syntactic support for using “classes as variants”; but that doesn’t change the fundamental tradeoffs involved.)
To me, the key difference between Rust and C++ is the size of the
ErrorCode
types. In Rust, the size of an ErrorCode
instance is
equal to the maximum of its variants, which means that we can pass
errors around by value and know that we have enough space to store any
kind of error. In contrast, when using classes in C++, the size of an
ErrorCode
instance will vary, depending on what specific variance
it is. This is why I must pass around errors using a pointer, since
I don’t know how much space I need up front. (Well, actually, C++
doesn’t require you to pass around values by pointer: but if you
don’t, you wind up with object slicing, which can be a particularly
surprising sort of error. In Rust, we have the notion of DST to
address this problem.)
Rust really relies deeply on the flat, uniform layout for
enums. For example, every time you make a nullable pointer like
Option<&T>
, you are taking advantage of the fact that options are
laid out flat in memory, whether they are None
or Some
. (In Scala,
for example, creating a Some
variant requires allocating an object.)
Preview of the next few posts
OK, now that I spent a lot of time telling you why enums are great and subclassing is terrible, my next post is going to tell you why I think suclassing is sometimes fantastic and enums kind of annoying.
Caveat
I’m well aware I’m picking on C++ a bit unfairly. For example, perhaps
instead of writing up my own little class hierarchy, I should be using
boost::any
or something like that. Because C++ is such an extensible
language, you can definitely construct a class hierarchy that gives
you similar advantages to what Rust enums offer. Heck, you could just
write a carefully constructed wrapper around a C union
to get what
you want. But I’m really focused here on contrasting the kind of “core
abstractions” that the language offers for handling variants with
data, which in Rust’s case is (currently) enums, and in C++’s case is
subtyping and classes.