gnome-class: Integrating Rust and the GNOME object system
2 May 2017
I recently participated in the GNOME / Rust “dev sprint” in Mexico City. (A thousand thanks to Federico and Joaquin for organizing!) While there I spent some time working on the gnome-class plugin. The goal of gnome-class was to make it easy to write GObject implementations in Rust which would fully interoperate with C code.
Roughly speaking, my goal was that you should be able to write code that looked and felt like Vala code, but where the method bodies (and types, and so forth) are in Rust. The plugin is in no way done, but I think it’s already letting you do some pretty nice stuff. For example, this little snippet defines a Counter class offering two methods (add() and get()):
gobject_gen! {
    class Counter {
        struct CounterPrivate {
            f: Cell<u32>
        }
        fn add(&self, x: u32) -> u32 {
            let private = self.private();
            let v = private.f.get() + x;
            private.f.set(v);
            v
        }
        fn get(&self) -> u32 {
            self.private().f.get()
        }
    }
}
You can access these classes from Rust code in a natural way:
let c = Counter::new();
c.add(2);
c.add(20);
Under the hood, this is all hooked up to the GNOME runtime. So, for example, Counter::new() translates to a call to g_object_new(), and the c.add() calls translate into virtual calls passing through the GNOME class structure. We also generate extern "C" functions so you should be able to call the various methods from C code.
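To make the idea of "virtual calls passing through the class structure" a bit more concrete, here is a minimal sketch in plain Rust. It does not touch the real GObject types: CounterClass, CounterInstance, counter_new(), counter_add() and counter_get() are hypothetical names I made up for illustration, and the code the macro actually generates may look quite different.

```rust
use std::cell::Cell;

// Hypothetical stand-in for a class structure: a table of function
// pointers shared by every instance of the class.
struct CounterClass {
    add: fn(&CounterInstance, u32) -> u32,
    get: fn(&CounterInstance) -> u32,
}

// Hypothetical stand-in for an instance: a pointer to its class plus data.
struct CounterInstance {
    class: &'static CounterClass,
    f: Cell<u32>,
}

fn add_impl(this: &CounterInstance, x: u32) -> u32 {
    let v = this.f.get() + x;
    this.f.set(v);
    v
}

fn get_impl(this: &CounterInstance) -> u32 {
    this.f.get()
}

// One class structure for the whole program.
static COUNTER_CLASS: CounterClass = CounterClass {
    add: add_impl,
    get: get_impl,
};

fn counter_new() -> CounterInstance {
    CounterInstance { class: &COUNTER_CLASS, f: Cell::new(0) }
}

// The public entry points look the method up in the class structure
// rather than calling add_impl/get_impl directly. That indirection is
// what would let a subclass substitute its own implementation.
fn counter_add(this: &CounterInstance, x: u32) -> u32 {
    (this.class.add)(this, x)
}

fn counter_get(this: &CounterInstance) -> u32 {
    (this.class.get)(this)
}
```

Calling counter_add(&c, 2) here goes through the function-pointer table on every call, which is roughly the shape of dispatch described above.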
Let’s go through this example bit-by-bit and I’ll show you how each part works. Along the way, we can discuss the GNOME object model. Finally, we can cover some of the alternative designs that I considered and discarded, and a few things we could change in Rust to make everything smoother.
Mapping between GNOME and Rust ownership
The basic GNOME object model is that every object is ref-counted. In general, if you are given a Foo* pointer, it is assumed you are borrowing that ref, and if you want to store that Foo* value somewhere, you should increment the ref-count for yourself. However, there are other times when ownership transfer is assumed. (In general, GNOME has strong conventions here, which is great.)
I’ve been debating how best to mirror this in Rust. My current branch works as follows, using the type Counter as an example.
- Counter represents an owned reference to a Counter object. This is implicitly heap-allocated and reference-counted, per the object model.
  - Counter implements Clone, which will simply increment the reference count but return the same object.
  - Counter implements Drop, which will decrement the reference count.
  - In terms of its representation, Counter is a newtype’d *mut GObject.
- &Counter is used for functions that wish to “borrow” a counter; if they want to store a version for themselves, they can call clone().
  - Hence the methods like add() are &self methods.
  - This works more-or-less exactly like passing around an &Rc<T> or &Arc<T> (which, incidentally, is the style I’ve started using all of the time for working with ref-counted data).
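The handle scheme in the list above can be sketched without any GNOME code at all. In the following, RawObject is a made-up stand-in for the C-side object (the real GObject layout and the real ref/unref calls are not used), and ref_count() is a helper I added purely so the effect is observable.

```rust
use std::cell::Cell;

// Made-up stand-in for the heap-allocated, ref-counted C object.
struct RawObject {
    ref_count: Cell<u32>,
    value: Cell<u32>,
}

// The handle is a newtype around a raw pointer, as described above.
struct Counter(*mut RawObject);

impl Counter {
    fn new() -> Counter {
        let raw = Box::new(RawObject {
            ref_count: Cell::new(1),
            value: Cell::new(0),
        });
        Counter(Box::into_raw(raw))
    }

    fn raw(&self) -> &RawObject {
        // Sound here because a live handle always holds one reference.
        unsafe { &*self.0 }
    }

    // Illustration-only helper to observe the count.
    fn ref_count(&self) -> u32 {
        self.raw().ref_count.get()
    }

    fn add(&self, x: u32) -> u32 {
        let v = self.raw().value.get() + x;
        self.raw().value.set(v);
        v
    }

    fn get(&self) -> u32 {
        self.raw().value.get()
    }
}

impl Clone for Counter {
    // Cloning bumps the count and returns a handle to the *same* object.
    fn clone(&self) -> Counter {
        let raw = self.raw();
        raw.ref_count.set(raw.ref_count.get() + 1);
        Counter(self.0)
    }
}

impl Drop for Counter {
    // Dropping a handle releases one reference; the last one frees.
    fn drop(&mut self) {
        let remaining = self.raw().ref_count.get() - 1;
        self.raw().ref_count.set(remaining);
        if remaining == 0 {
            unsafe { drop(Box::from_raw(self.0)) };
        }
    }
}
```

A function that only needs to look at a counter would take &Counter, and call clone() only if it wants to keep its own reference.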
Note that since every Counter is implicitly ref-counted data, there isn’t much point to working with an &mut Counter. That is, you may have a unique reference to a single handle, but you can’t really know how many aliases of Counter are out there from other sources. As a result, when you use gobject_gen!, all of the methods and so forth that you define are always going to be &self methods. In other words, you will always get a shared reference to your data.
Because we have only shared references, the fields in your GNOME classes are going to be immutable unless you package them up in Cell or RefCell. This is why the counter type, for example, stores its count in a field f: Cell<u32> – the Cell type allows the counter to be incremented and decremented even when aliased. It does imply that it would be unsafe to share the Counter across multiple threads at once; but this is roughly the default in GNOME (things cannot be shared across threads unless they’ve been designed for that).
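As a small standalone illustration of that point, the sketch below mutates state through two shared references to the same struct. The Private struct, its history field and the record() function are invented for this example; only Cell and RefCell come from the standard library.

```rust
use std::cell::{Cell, RefCell};

// Invented example struct: Cell for a Copy value, RefCell for a Vec.
struct Private {
    count: Cell<u32>,
    history: RefCell<Vec<u32>>,
}

// Takes only a shared reference, yet updates both fields.
fn record(p: &Private, x: u32) -> u32 {
    let v = p.count.get() + x;
    p.count.set(v);
    // borrow_mut() is checked at runtime and panics if already borrowed.
    p.history.borrow_mut().push(v);
    v
}
```

Neither Cell nor RefCell is Sync, so the compiler will refuse to share a struct like this across threads, which matches the GNOME default described above.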
Private data in GNOME
When it comes to data storage, the GNOME object model works a bit differently than a “traditional” OO language like Java or C++. In those more traditional languages, an object is laid out with the vtable first, and then the fields from each class, concatenated in order:
object --> +-------------------+
           | vtable            |
           | ----------------- |
           | superclass fields |
           | ----------------- |
           | subclass fields   |
           +-------------------+
The nice thing about this is that the object pointer can safely be used as either a Superclass pointer or a Subclass pointer. But there is a catch. If new fields are added to the superclass, then the offset of all my subclass fields will change – this implies that all code using my object as a Subclass has to be recompiled. What’s worse, this is true even if all I wanted to do is to add a private field to the superclass. In other words, adding fields in this scheme is an ABI-incompatible change – meaning that we have to recompile all downstream code, even if we know that this compilation cannot fail.
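The offset shift is easy to observe directly. The sketch below models two versions of a superclass with #[repr(C)] structs (so that fields are laid out in declaration order, as in C) and measures where the subclass field ends up. All of the struct and function names are invented for the example.

```rust
// Version 1 of a hypothetical superclass.
#[repr(C)]
struct SuperV1 {
    vtable: usize,
    a: u32,
}

#[repr(C)]
struct SubV1 {
    parent: SuperV1,
    b: u32,
}

// Version 2 adds one (conceptually private) field to the superclass.
#[repr(C)]
struct SuperV2 {
    vtable: usize,
    a: u32,
    extra: u64,
}

#[repr(C)]
struct SubV2 {
    parent: SuperV2,
    b: u32,
}

// Byte offset of the subclass field `b` when built against version 1.
fn offset_of_b_v1() -> usize {
    let s = SubV1 { parent: SuperV1 { vtable: 0, a: 0 }, b: 0 };
    (&s.b as *const u32 as usize) - (&s as *const SubV1 as usize)
}

// Byte offset of the same field when built against version 2.
fn offset_of_b_v2() -> usize {
    let s = SubV2 { parent: SuperV2 { vtable: 0, a: 0, extra: 0 }, b: 0 };
    (&s.b as *const u32 as usize) - (&s as *const SubV2 as usize)
}
```

Code compiled against version 1 has the first offset baked in, so it would read the wrong bytes if handed an object laid out by version 2.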
Therefore, the GNOME model works a bit differently. While you can have fields allocated inline as I described, the recommendation is instead to use a facility called “private data”. With private data, you define a struct of fields accessible only to your class; these fields are not stored “inline” in your object at some statically predicted offset. Instead, when you allocate your object, the GNOME memory manager will also allocate space for the private data each class needs, and you can ask (dynamically) for the offset. (Appendix A goes into details on the actual memory layout.)
The gobject_gen! macro is set up to always use private data in the recommended fashion. If we take another look at the header, we can see the private data struct for the Counter class is defined at the very beginning, and given the name CounterPrivate:
gobject_gen! {
    class Counter {
        struct CounterPrivate {
            f: Cell<u32>
        }
        ...
    }
}
In the code, when we want to access the “private” data, we use the private() method. This will return to us a &CounterPrivate reference that we can use. For example, defining the get() method on our counter looks like this:
fn get(&self) -> u32 {
    self.private().f.get()
}
Although the offset of the private data for a particular class is not known statically, it is still always constant in any given execution. It’s just that it can change from run to run if different versions of libraries are in use. Therefore, in C code, most classes will inquire once, during creation time, to find the offset of their private data, and then store this result in a global variable. The current Rust code just inquires dynamically every time.
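If the Rust code wanted to mimic the C habit of asking once and caching the answer, one option would be a lazily initialized global. This is only a sketch of that idea, not what the macro does: query_private_offset() is a placeholder for the real runtime query, the value it returns is arbitrary, and the QUERIES counter exists only to show how often the query runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times we "asked the runtime" (illustration only).
static QUERIES: AtomicUsize = AtomicUsize::new(0);

// Placeholder for the real, dynamic offset lookup.
fn query_private_offset() -> isize {
    QUERIES.fetch_add(1, Ordering::SeqCst);
    -16 // arbitrary made-up value
}

// Filled in on first use, then reused for the rest of the execution.
static PRIVATE_OFFSET: OnceLock<isize> = OnceLock::new();

fn private_offset() -> isize {
    *PRIVATE_OFFSET.get_or_init(query_private_offset)
}

fn query_count() -> usize {
    QUERIES.load(Ordering::SeqCst)
}
```

Because the offset is constant within a single execution, caching it this way would not change behavior, only avoid repeated lookups.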
Object construction
gobject_gen! does not expose traditional OO-style constructors. Instead, you can define a function that produces the initial values for your private struct – if you do not provide anything, then we will use the Rust Default trait.
The Counter example, in fact, provided no initialization function, and hence it was using the Default trait to initialize the field f to zero. If we wanted to write this explicitly, we could have added an init { } block. For example, the following variant will initialize the counter to 22, not 0:
gobject_gen! {
    class Counter {
        struct CounterPrivate {
            f: Cell<u32>
        }
        init {
            CounterPrivate {
                f: Cell::new(22)
            }
        }
        ...
    }
}
Note that init blocks take no parameters – at the time when one executes, the object’s memory is still not fully initialized, and hence we can’t safely give access to it. (Unlike in Java, we don’t necessarily have a “null” value for all types.)
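The two initialization paths can be compared in ordinary Rust, outside the macro. In this sketch I assume the private struct derives Default (the macro may arrange this differently), and default_private() and init_private() are names invented to stand for "no init block" and "an init block" respectively.

```rust
use std::cell::Cell;

// Assumption: the private struct gets a Default impl via derive.
#[derive(Default)]
struct CounterPrivate {
    f: Cell<u32>,
}

// What you would get with no init block: Cell<u32>::default() is 0.
fn default_private() -> CounterPrivate {
    CounterPrivate::default()
}

// What an explicit init block would produce. Note that it takes no
// parameters and never sees the object itself.
fn init_private() -> CounterPrivate {
    CounterPrivate { f: Cell::new(22) }
}
```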
The general consensus at the design sprint was that the best practice for writing a GNOME object is to avoid a “custom constructor” and instead to define public properties and have creators specify those properties at construction time. I did not yet model properties, but it seems like that would fit nicely with this initialization setup.
There is also a hook that one can define that will execute once the “initial set” of properties has been initialized – I’d like to expose this too, but didn’t get around to it. This would be similar to init, presumably, except that it would give access to a &self pointer.
Similarly, we could extend gobject_gen! to offer a more “traditional” OO constructor model, similar to the one that Vala offers. This too would layer on top of the existing code: so your init() function would run first, to generate the initial values for the private fields, but then you could come afterwards and update them, making use of the parameters. (You can model this today just by defining an fn initialize(&self) method, effectively.)
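Here is a rough model of that two-step pattern in plain Rust: default-initialize the private data first, then run an initialize method that takes parameters. The Counter struct below holds its private data inline purely to keep the example self-contained, and with_start() is a name I invented for the combined "constructor".

```rust
use std::cell::Cell;

#[derive(Default)]
struct CounterPrivate {
    f: Cell<u32>,
}

// Simplified model: the private data is stored inline here.
struct Counter {
    private: CounterPrivate,
}

impl Counter {
    // Step 1: private fields get their initial values, no parameters.
    fn new() -> Counter {
        Counter { private: CounterPrivate::default() }
    }

    // Step 2: an ordinary &self method updates them using parameters.
    fn initialize(&self, start: u32) {
        self.private.f.set(start);
    }

    // Invented convenience wrapper that runs both steps.
    fn with_start(start: u32) -> Counter {
        let c = Counter::new();
        c.initialize(start);
        c
    }

    fn get(&self) -> u32 {
        self.private.f.get()
    }
}
```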
What still needs work?
So we’ve seen what does work (or what kind of works, in the case of subclassing). What work is left? Lots, it turns out. =)
Private data support could be smoother
I would prefer if you did not have to type self.private() to access private data. I would rather you could just write self.f to get access to a private field f. For that to work, though, we’d need to have something like the fields in traits RFC – and probably an expanded version that has a few additional features. In particular, we’d need the ability to map through derefs, or possibly through custom code; read-only fields would likely help too. Now that this blog post is done, I plan to post a comment on that RFC with some observations and try to get it moving again.
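To show the ergonomics I have in mind, here is an approximation that works in today's Rust by implementing Deref from the object type to its private struct. To be clear, this is not what the macro does and not what the RFC proposes; it is just a stand-in that makes self.f compile, with the private data stored inline to keep the example self-contained.

```rust
use std::cell::Cell;
use std::ops::Deref;

struct CounterPrivate {
    f: Cell<u32>,
}

// Simplified model: the private data is stored inline here.
struct Counter {
    private: CounterPrivate,
}

impl Counter {
    fn new() -> Counter {
        Counter { private: CounterPrivate { f: Cell::new(0) } }
    }

    fn private(&self) -> &CounterPrivate {
        &self.private
    }

    // With the Deref impl below, `self.f` resolves to the private field.
    fn add(&self, x: u32) -> u32 {
        let v = self.f.get() + x;
        self.f.set(v);
        v
    }
}

impl Deref for Counter {
    type Target = CounterPrivate;

    fn deref(&self) -> &CounterPrivate {
        self.private()
    }
}
```

One drawback of this approximation is that field access through Deref works for any code that can see the fields of CounterPrivate, not just for methods of the class, so privacy has to be enforced by module visibility instead.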
Interfacing with C
I haven’t really implemented this yet, but I wanted to sketch how I envision that this macro could interface with C code. We already handle the “Rust” side of this, which is that we generate C-compatible functions for each method that do the correct dispatch; these follow the GNOME naming conventions (e.g., Counter_add() and Counter_get()). I’d also like to have the macro generate a .h file for you (or perhaps this should be done by a build.rs script, I’m not yet sure), so that you can easily have C code include that .h file and seamlessly use your Rust object.
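For a sense of what such C-compatible functions look like, here is a hand-written sketch. CounterData is an invented stand-in for the instance data (it is not the real GObject instance struct), counter_data_new() is a helper for the example, and the functions skip the class-structure dispatch entirely. To actually export these symbols to C you would also need a no-mangle attribute on each function, which I have left out so the snippet stays independent of the Rust edition.

```rust
use std::cell::Cell;

// Invented stand-in for the instance data passed across the C boundary.
#[repr(C)]
pub struct CounterData {
    f: Cell<u32>,
}

pub fn counter_data_new() -> CounterData {
    CounterData { f: Cell::new(0) }
}

// C-callable wrapper following the Counter_add() naming convention.
// Unsafe because the caller must pass a valid, live pointer.
#[allow(non_snake_case)]
pub unsafe extern "C" fn Counter_add(this: *const CounterData, x: u32) -> u32 {
    let this = unsafe { &*this };
    let v = this.f.get() + x;
    this.f.set(v);
    v
}

// C-callable wrapper following the Counter_get() naming convention.
#[allow(non_snake_case)]
pub unsafe extern "C" fn Counter_get(this: *const CounterData) -> u32 {
    unsafe { (*this).f.get() }
}
```

A generated header would then only need to declare an opaque struct type and these two prototypes.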
Interfacing with gtk-rs
There has already been a lot of excellent work mirroring the various GNOME APIs through the gtk-rs crates. I’m using some of those APIs already, but we should do some more work to make the crates more intercompatible. I’d love it if you could easily subclass existing classes from the GNOME libraries using gobject_gen!. It should be possible to make this work, it’ll just take some coordination.
Making it more convenient to work with shared, mutable data
Since all GNOME objects are shared, it becomes very important to have ergonomic libraries for working with shared, mutable data. The existing types in the standard library – Cell and RefCell – are very general but not always the most pleasant to work with.
If nothing else, we could use some convenient types for other scenarios, such as a Final<T> that corresponds to a “write-once” variable (the name is obviously inspired by final fields in Java, though ivars is another name commonly used in the parallel programming community). Final<T> would be nice for fields that start out as null but which are always initialized during construction and then never changed again. The nice thing would be that Final<T> could implement Deref (it would presumably panic if the value has not yet been assigned).
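A Final<T> along these lines could be sketched on top of the standard library's std::cell::OnceCell. The type below is my own guess at a reasonable shape, including the choice to panic on a second assignment; none of it exists in gnome-class.

```rust
use std::cell::OnceCell;
use std::ops::Deref;

// Write-once slot: starts empty, may be assigned exactly one time.
pub struct Final<T> {
    slot: OnceCell<T>,
}

impl<T> Final<T> {
    pub fn new() -> Self {
        Final { slot: OnceCell::new() }
    }

    // Assigns through a shared reference; a second call panics.
    pub fn set(&self, value: T) {
        if self.slot.set(value).is_err() {
            panic!("Final<T> assigned more than once");
        }
    }
}

impl<T> Deref for Final<T> {
    type Target = T;

    // Reading before assignment panics, as suggested above.
    fn deref(&self) -> &T {
        self.slot.get().expect("Final<T> read before assignment")
    }
}
```

Because set() takes &self, this fits fields of an object that is only ever reachable through shared references.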
Supporting more of the GNOME object model
There are also many parts of GNOME that we don’t model yet.
We don’t really support subclassing yet. I have a half-executed plan for supporting it, but this is a topic worthy of a post of its own, so I’ll just leave it at that.
Properties are probably the biggest thing; they are fairly simple conceptually, but there are lots of knobs and whistles to get right.
We don’t support constructing an object with a list of initial property values nor do we support the post-initialization hook. In C code, when constructing a GNOME object, one can use a var-args style API to supply a bunch of initial values:
g_object_new(TYPE_MEDIA,
             "inventory-id", 42,
             "orig-package", FALSE,
             NULL);
I imagine modeling this in Rust using a builder pattern:
Media::with()
    .inventory_id(42)
    .orig_package(false)
    .new()
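Written out by hand, the builder might look something like the sketch below. Media here is a plain Rust struct rather than a GNOME object, and MediaBuilder is a name I made up; a real version would presumably collect the properties and hand them to g_object_new() instead.

```rust
// Plain-struct stand-in for the object being constructed.
#[derive(Debug, PartialEq)]
struct Media {
    inventory_id: u32,
    orig_package: bool,
}

// Hypothetical builder: one setter per property, defaults otherwise.
#[derive(Default)]
struct MediaBuilder {
    inventory_id: u32,
    orig_package: bool,
}

impl Media {
    // Entry point, matching the Media::with() call above.
    fn with() -> MediaBuilder {
        MediaBuilder::default()
    }
}

impl MediaBuilder {
    fn inventory_id(mut self, id: u32) -> Self {
        self.inventory_id = id;
        self
    }

    fn orig_package(mut self, orig: bool) -> Self {
        self.orig_package = orig;
        self
    }

    // Consumes the builder and produces the finished object.
    fn new(self) -> Media {
        Media {
            inventory_id: self.inventory_id,
            orig_package: self.orig_package,
        }
    }
}
```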
We don’t support signals, which are a kind of message bus system that I don’t really understand very well. =)
Procedural macro support on Rust is young
There is still a long way to go before the gobject_gen! plugin is really usable. For one thing, it relies on a number of unstable Rust language features – not the least of them being the new procedural macro system. It also inherits one very annoying facet of the current procedural macros, which is that all source location information is lost. This means that if you have type errors in your code it just gives you an error like “somewhere in this usage of the gobject_gen! macro”, which is approximately useless since that covers the entire class definition. This is obviously something we aim to improve through PRs like #40939.
Conclusion
Overall, I really enjoyed the sprint. It was great to meet so many GNOME contributors in person. I was very impressed with how well thought out the GNOME object system is.
Obviously, this macro is in its early days, but I’m really excited about its current state nonetheless. I think there is a lot of potential for GNOME and Rust to have a truly seamless integration, and I look forward to seeing it come together.
I don’t know how much time I’m going to have to devote to hacking on the macro, but I plan to open up various issues on the repository over the next little while with various ideas for expansions and/or design questions, so if you’re interested in seeing the work proceed, please get involved!
Finally, I want to take a moment to give a shoutout to jseyfried and dtolnay, who have done excellent work pushing forward with procedural macro support in rustc and the quote! libraries. Putting gobject_gen! together was really an altogether pleasant experience. I can’t wait to see those APIs evolve more: support for spans, first and foremost, but proper hygiene would be nice too, since gobject_gen! has to generate various names as part of its mapping.
Appendix A: Memory layout of private data
My understanding is that the private data feature evolved over time. When the challenges around ABI compatibility were first discovered, a convention developed of having each object have just a single “inline” field. Each class would then malloc a separate struct for its private fields. So you wound up with something like this:
object --> +--------------------+
           | vtable             |
           | ------------------ |
           | SuperclassPrivate* | ---> +-------------------+
           | ------------------ |      | superclass fields |
           | SubclassPrivate*   | --+  +-------------------+
           +--------------------+   |
                                    +--> +-----------------+
                                         | subclass fields |
                                         +-----------------+
Naturally any class can now add private fields without changing the offset of others’ fields. However, making multiple allocations per object is inefficient, and it’s easy to mess up the manual memory management involved as well. So the GNOME runtime added the “private” feature, which allows each class to request that some amount of additional space be allocated, and provides an API for finding the offset of that space from the main object. The exact memory layout is (I presume) not defined, but as I understand it things are currently laid out with the private data stored at a negative offset:
           +--------------------+
           | subclass fields    |
           | ------------------ |
           | superclass fields  |
object --> + ------------------ +
           | vtable             |
           +--------------------+
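The negative-offset idea can be imitated in a few lines of Rust. This is a toy model under my own assumptions, not the real runtime layout: Allocation places one private struct immediately before a Header that stands in for the object proper, and the "object pointer" handed out points at the header, so the private data is found by stepping backwards from it.

```rust
use std::cell::Cell;
use std::ptr::addr_of_mut;

struct Private {
    f: Cell<u32>,
}

// Stand-in for the part of the object that the object pointer refers to.
struct Header {
    type_id: u32,
}

// Toy layout: private data first, then the header, in one allocation.
#[repr(C)]
struct Allocation {
    private: Private,
    header: Header,
}

// Offset from the object pointer to the private data; negative here.
fn private_offset() -> isize {
    let probe = Allocation {
        private: Private { f: Cell::new(0) },
        header: Header { type_id: 0 },
    };
    let base = &probe as *const Allocation as isize;
    let header = &probe.header as *const Header as isize;
    base - header
}

// Allocates the whole block and returns a pointer to the header only.
fn allocate() -> *mut Header {
    let base = Box::into_raw(Box::new(Allocation {
        private: Private { f: Cell::new(0) },
        header: Header { type_id: 1 },
    }));
    unsafe { addr_of_mut!((*base).header) }
}

// Finds the private data by applying the offset to the object pointer.
// The pointer must come from allocate() and not yet be freed.
unsafe fn private_of<'a>(obj: *mut Header) -> &'a Private {
    unsafe { &*((obj as *mut u8).offset(private_offset()) as *const Private) }
}

// Recovers the start of the allocation and releases it.
unsafe fn free(obj: *mut Header) {
    unsafe {
        let base = (obj as *mut u8).offset(private_offset()) as *mut Allocation;
        drop(Box::from_raw(base));
    }
}
```

The same object pointer serves both as a pointer to the header and, via the offset, as a key to the private data, without any extra field in the header.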
Although no longer necessary, it is also still common to include a single “inline” field that points to the private data, set up during initialization time:
           +--------------------+ <---+
           | subclass fields    |     |
           | ------------------ | <-+ |
           | superclass fields  |   | |
object --> + ------------------ +   | |
           | vtable             |   | |
           + ------------------ +   | |
           | SuperclassPrivate* | --+ |
           | ------------------ |     |
           | SubclassPrivate*   | ----+
           +--------------------+