The 'Tootsie Pop' model for unsafe code
27 May 2016
In my previous post, I spent some time talking about the idea of unsafe abstractions. At the end of the post, I mentioned that Rust does not really have any kind of official guidelines for what kind of code is legal in an unsafe block and what is not. What this means in practice is that people wind up writing what “seems reasonable” and checking it against what the compiler does today. This is of course a risky proposition, since it means that if we start doing more optimization in the compiler, we may well wind up breaking unsafe code (the code would still compile; it would just not execute like it used to).
Now, of course, merely having published guidelines doesn’t entirely change that dynamic. It does allow us to “assign blame” to the unsafe code that took actions it wasn’t supposed to take. But at the end of the day we’re still causing crashes, so that’s bad.
This is partly why I have advocated that I want us to try and arrive at guidelines which are “human friendly”. Even if we have published guidelines, I don’t expect most people to read them in practice. And fewer still will read past the introduction. So we had better be sure that “reasonable code” works by default.
Interestingly, there is something of a tension here: the more unsafe code we allow, the less the compiler can optimize. This is because it would have to be conservative about possible aliasing and (for example) avoid reordering statements. We’ll see some examples of this as we go.
Still, to some extent, I think it’s possible for us to have our cake and eat it too. In this blog post, I outline a proposal to leverage unsafe abstraction boundaries to inform the compiler where it can be aggressive and where it must be conservative. The heart of the proposal is the intuition that:
- when you enter the unsafe boundary, you can rely that the Rust type system invariants hold;
- when you exit the unsafe boundary, you must ensure that the Rust type system invariants are restored;
- in the interim, you can break a lot of rules (though not all the rules).
I call this the Tootsie Pop model: the idea is that an unsafe abstraction is kind of like a Tootsie Pop. There is a gooey candy interior, where the rules are squishy and the compiler must be conservative when optimizing. This is separated from the outside world by a hard candy exterior, which is the interface, and where the rules get stricter. Outside of the pop itself lies the safe code, where the compiler ensures that all rules are met, and where we can optimize aggressively.
One can also compare the approach to what would happen when writing a C plugin for a Ruby interpreter. In that case, your plugin can assume that the inputs are all valid Ruby objects, and it must produce valid Ruby objects as its output, but internally it can cut corners and use C pointers and other such things.
In this post, I will elaborate a bit more on the model, and in particular cover some example problem cases and talk about the grey areas that still need to be hammered out.
How do you define an unsafe boundary?
My initial proposal is that we should define an unsafe boundary as being “a module that has unsafe code somewhere inside of it”. So, for example, the module that contains split_at_mut, which we have seen earlier is a fn defined with unsafe code, would form an unsafety boundary. Public functions in this module would therefore be “entry points” into the unsafe boundary; returning from such a function, or issuing a callback via a closure or trait method, would be an exit point.
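To make the shape of this concrete, here is a minimal sketch of the kind of module I have in mind (the module name and the free-function formulation are mine, purely for illustration; the real method is defined on [T] in the standard library):

mod slice_utils {
    // Unsafe code appears somewhere inside this module, so under the
    // proposed rule the whole module forms an unsafe boundary.

    // A public, safe function: calling it is an entry point into the
    // boundary; returning from it is an exit point.
    pub fn split_at_mut<T>(data: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
        assert!(mid <= data.len());
        let len = data.len();
        let ptr = data.as_mut_ptr();
        unsafe {
            // Gooey interior: raw-pointer manipulation that the type
            // system cannot check.
            (std::slice::from_raw_parts_mut(ptr, mid),
             std::slice::from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
        }
    }
}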
Initially, when considering this proposal, I wanted to use an unsafe boundary defined at the function granularity. So any function which contained an unsafe block but which did not contain unsafe in its signature would be considered the start of an unsafe boundary; and any unsafe fn would be a part of its caller’s boundary (note that its caller must contain an unsafe block). This would mean that e.g. split_at_mut is its own unsafe boundary. However, I have come to think that this definition is too precise and could cause problems in practice – we’ll see some examples below. Therefore, I have loosened it.
Ultimately I think that deciding where to draw the unsafe boundary is still somewhat of an open question. Even using the module barrier means that some kinds of refactorings that might seem innocent (migrating code between modules, specifically) can change code from legal to illegal. I will discuss various alternatives later on.
Permissions granted/required at the unsafe boundary
In the model I am proposing, most of your reasoning happens as you cross into or out of an unsafe abstraction. When you enter into an unsafe abstraction – for example, by calling a method like split_at_mut, which is not declared as unsafe but uses unsafe code internally – you implicitly provide that function with certain permissions. These permissions are derived from the types of the function’s arguments and the rules of the Rust type system. In the case of split_at_mut, there are two arguments:
- The slice self that is being split, of type &'a mut [T]; and,
- the midpoint mid at which to perform the split, of type usize.
Based on these types, the split_at_mut method can assume that the variable self refers to a suitably initialized slice of values of type T. That reference is valid for the lifetime 'a, which represents some span of execution time that encloses at least the current call to split_at_mut. Similarly, the argument mid will be an unsigned integer of suitable size.
At this point we are within the unsafe abstraction. It is now free to do more-or-less whatever it likes, so long as all the actions it takes fall within the initial set of permissions. More on this below.
Finally, when you exit from the unsafe boundary, you must ensure that you have restored whatever invariants and permissions the Rust type system requires. These are typically going to be derived from the types of the function’s outputs, such as its return type. In the case of split_at_mut, the return type is (&mut [T], &mut [T]), so this implies that you will return a tuple of slices. Since those slices are both active at the same time, they must (by the rules of Rust’s type system) refer to disjoint memory.
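To see why that exit guarantee matters, here is a sketch of the kind of reliance a safe caller places on it (the caller function here is hypothetical; split_at_mut itself is the real standard-library method):

fn caller(data: &mut [i32]) {
    assert!(data.len() >= 4);
    let (left, right) = data.split_at_mut(2);
    // The return type promises two simultaneously live `&mut [i32]` values,
    // so safe code (and the optimizer) may assume they are disjoint:
    left[0] = 1;
    right[0] = 2;
    // The write through `right` cannot have clobbered `left[0]`.
    assert_eq!(left[0], 1);
}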
Specifying the permissions
In this post, I am not trying to define the complete set of permissions. We have a reasonably good, though not formalized, notion of what these permissions are. Ralf Jung and Derek Dreyer have been working on making that model more precise as part of the RustBelt project. I think writing up those rules in one central place would obviously be a big part of elaborating on the model I am sketching out here.
If you are writing safe code, the type system will ensure that you never do anything that exceeds the permissions granted to you. But if you dip into unsafe code, then you take on the responsibility for verifying that you obey the given permissions. Either way, the set of permissions remains the same.
Permissions on functions declared as unsafe
If a function is declared as unsafe, then its permissions are not defined by the type system, but rather in comments and documentation. This is because the unsafe keyword is a warning that the function’s arguments may carry additional requirements for its caller – or that it may return values that don’t meet the full requirements of the Rust type system.
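As a minimal (and hypothetical) sketch of what this looks like in practice, such a function states its extra requirements in its documentation rather than in its types:

/// Reads the value stored at `ptr`.
///
/// Safety: `ptr` must be non-null, properly aligned, and point to a live,
/// initialized `i32` for the duration of the call. None of this is enforced
/// by the type system; it is a contract the caller must uphold.
unsafe fn read_raw(ptr: *const i32) -> i32 {
    *ptr
}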
Optimizations within an unsafe boundary
So far I’ve primarily talked about what happens when you cross an unsafe boundary, but I’ve not talked much about what you can do within an unsafe boundary. Roughly speaking, the answer that I propose is: “whatever you like, so long as you don’t exceed the initial set of permissions you were given”.
What this means in practice is that when the compiler is optimizing code that originates inside an unsafe boundary, it will make pessimistic assumptions about aliasing. This is effectively what C compilers do today (except they sometimes employ type-based alias analysis; we would not).
As a simple example: in safe code, if you have two distinct variables that are both of type &mut T, the compiler would assume that they represent disjoint memory. This might allow it, for example, to re-order reads/writes or re-use values that have been read if it does not see an intervening write. But if those same two variables appear inside of an unsafe boundary, the compiler would not make that assumption when optimizing. If that was too hand-wavy for you, don’t worry, we’ll spell out these examples and others in the next section.
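Concretely, the kind of transformation at stake looks something like this (the function is my own illustration, not taken from real code):

fn sum_twice(a: &mut i32, b: &mut i32) -> i32 {
    let x = *a; // first read of `*a`
    *b += 1;    // write through `b`
    let y = *a; // second read of `*a`
    x + y
}

In safe code, the compiler may assume that a and b are disjoint and therefore re-use the first read, returning x + x without loading *a again. If the same code appears inside an unsafe boundary, under this model the compiler would have to re-load *a after the write through b, since the two references might in fact alias.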
Examples
In this section I want to walk through some examples. Each one contains unsafe code doing something potentially dubious. In each case, I will do the following:
- walk through the example and describe the dubious thing;
- describe what my proposed rules would do;
- describe some other rules one might imagine and what their repercussions might be.
By the way, I have been collecting these sorts of examples in a repository, and am very interested in seeing more such dubious cases which might offer insight into other tricky situations. The names of the sections below reflect the names of the files in that repository.
split-at-mut-via-duplication
Let’s start with a familiar example. This is a variant of the split_at_mut method that I covered in the previous post:
impl [T] {
pub fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
let copy: &mut [T] = unsafe { &mut *(self as *mut _) };
let left = &mut self[0..mid];
let right = &mut copy[mid..];
(left, right)
}
}
This version works differently from the ones I showed before. It doesn’t use raw pointers. Instead, it cheats the compiler by “duplicating” self via a cast to *mut. This means that both self and copy are &mut [T] slices pointing at the same memory, at the same time. In ordinary, safe Rust, this is impossible, but using unsafe code, we can make it happen.
The rest of the function looks almost the same as our original attempt at a safe implementation (also in the previous post). The only difference now is that, in defining right, it uses copy[mid..] instead of self[mid..]. The compiler accepts this because it assumes that copy and self, since they are both simultaneously valid, must be disjoint (remember that, in unsafe code, the borrow checker still enforces its rules on safe types; it’s just that we can use tricks like raw pointers or transmutes to sidestep them).
Why am I showing you this? The key question here is whether the optimizer can “trust” Rust types within an unsafe boundary. After all, this code is only accepted because the borrowck thinks (incorrectly) that self and copy are disjoint; if the optimizer were to think the same thing, that could lead to bad optimizations.
My belief is that this program ought to be legal. One reason is just that, when I first implemented split_at_mut, this was the most natural thing to write. And hence I suspect that many others would write unsafe code of this kind.
However, to put this in terms of the model, the idea is that the unsafe boundary here would be the module containing split_at_mut. Thus the dubious aliasing between left and right occurs within this boundary. In general, my belief is that whenever we are inside the boundary we cannot fully trust the types that we see. We can only assume that the user is supplying the types that seem most appropriate to them, not necessarily that they are accounting for the full implications of those types under the normal Rust rules. When optimizing, then, the compiler will not assume that the normal Rust type rules apply – effectively, it will treat &mut references the same way it might treat a *mut or *const pointer.
(I have to work a bit more at understanding LLVM’s annotations, but I think that we can model this using the aliasing metadata that LLVM provides. More on that later.)
Alternative models. Naturally alternative models might consider this code illegal. They would require that one use raw pointers, as the current implementation does, for any pointer that does not necessarily obey Rust’s memory model.
(Note that this raises another interesting question, though, about what the legal aliasing is between (say) a &mut and a *mut that are actively in use – after all, an &mut is supposed to be unique, but does that uniqueness cover raw pointers?)
refcell-ref
The borrow() method on the type RefCell returns a value of a helper type called Ref:
pub struct Ref<'b, T: ?Sized + 'b> {
value: &'b T,
borrow: BorrowRef<'b>,
}
Here the value field is a reference to the interior of the RefCell, and the borrow is a value which, once dropped, will cause the “lock” on the RefCell to be released. This is important because it means that once borrow is dropped, value can no longer safely be used. (You could imagine the helper type MutexGuard employing a similar pattern, though actually it works ever so slightly differently for whatever reason.)
This is another example of unsafe code using the Rust types in a “creative” way. In particular, the type &'b T is supposed to mean: a reference that can be safely used right up until the end of 'b (and whose referent will not be mutated). However, in this case, the actual meaning is “until the end of 'b or until borrow is dropped, whichever comes first”.
So let’s consider some imaginary method defined on Ref, copy_drop(), which works when T == u32. It would copy the value and then drop the borrow to release the lock.
use std::mem;
impl<'b> Ref<'b, u32> {
pub fn copy_drop(self) -> u32 {
let t = *self.value; // copy contents of `self.value` into `t`
mem::drop(self.borrow); // release the lock
t // return what we read before
}
}
Note that there is no unsafe code in this function at all. I claim then that the Rust compiler would, ideally, be within its rights to rearrange this code and to delay the load of self.value so that it occurs later, sort of like this:
mem::drop(self.borrow); // release the lock
let t = *self.value; // copy contents of `self.value` into `t`
t // return what we read before
This might seem surprising, but the idea here is that the type of self.value is &'b u32, which is supposed to mean a reference valid for all of 'b. Moreover, the lifetime 'b encloses the entire call to copy_drop. Therefore, the compiler would be free to say “well, maybe I can save a register if I move this load down”.
However, I think that reordering this code would be an invalid optimization. Logically, as soon as self.borrow is dropped, *self.value becomes inaccessible – if you imagine that this pattern were being used for a mutex, you can see why: another thread might acquire the lock!
Note that because these fields are private, this kind of problem can only arise for the methods defined on Ref itself. The public cannot gain access to the raw self.value reference. They must go through the Deref trait, which returns a reference for some shorter lifetime 'r, and that lifetime 'r always ends before the ref is dropped.
So if you were to try and write the same copy_drop routine from the outside, there would be no problem:
let some_ref: Ref<u32> = ref_cell.borrow();
let t = *some_ref;
mem::drop(some_ref);
use(t);
In particular, the let t = *some_ref desugars to something like:
let t = {
let ptr: &u32 = Deref::deref(&some_ref);
*ptr
};
Here the lifetime of ptr is just going to be that little enclosing block there.
Why am I showing you this? This example illustrates that, in the presence of unsafe code, the unsafe keyword itself is not necessarily a reliable indicator of where “funny business” could occur. Ultimately, I think what’s important is the unsafe abstraction barrier.
My belief is that this program ought to be legal. Frankly, to me, this code looks entirely reasonable, but also it’s the kind of code I expect people will write (after all, we wrote it). Examples like this are why I chose to extend the unsafe boundary to enclose the entire module that uses the unsafe keyword, rather than having it be at the fn granularity – because there can be functions that, in fact, do unsafe things where the full limitations on ordering and so forth are not apparent, but which do not directly involve unsafe code. Another classic example is modifying the length or capacity fields on a vector.
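To illustrate that last point, here is a much-simplified stand-in for a vector type (my own sketch; the real Vec in the standard library declares the corresponding method as unsafe, but within an unsafe boundary it is easy to write a private helper like this and call it “safe”):

struct MyVec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}

impl<T> MyVec<T> {
    // No `unsafe` block appears in this method, yet it is entangled with the
    // unsafe invariants of the type: if `new_len` exceeds the number of
    // initialized elements, other methods of MyVec will read uninitialized
    // memory, even though they look like ordinary safe code.
    fn set_len(&mut self, new_len: usize) {
        self.len = new_len;
    }
}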
Now, I chose to extend to the enclosing module because it corresponds to the privacy boundary, and there can be no unsafe abstraction barrier without privacy. But I’ll explain below why this is not a perfect choice and why we might consider others.
usize-transfer
Here we have a trio of functions that collaborate to hide a reference in a usize and then later dereference it:
// Cast the reference `x` into a `usize`
fn escape_as_usize(x: &i32) -> usize {
// interestingly, this cast is currently legal in safe code,
// which is a mite unfortunate, but doesn't really affect
// the example
x as *const _ as usize
}
// Cast `x` back into a pointer and dereference it
fn consume_from_usize(x: usize) -> i32 {
let y: &i32 = unsafe { &*(x as *const i32) };
*y
}
pub fn entry_point() {
let x: i32 = 2;
let p: usize = escape_as_usize(&x);
// (*) At this point, `p` is in fact a "pointer" to `x`, but it
// doesn't look like it!
println!("{}", consume_from_usize(p));
}
The key point in this example is marked with a (*). At that point, we have effectively created a pointer to x and stored it in p, but the type of p does not reflect that (it just says it’s a pointer-sized integer). Note also that entry_point does not itself contain unsafe code (further evidence that private helper functions can easily cause unsafe reasoning to spread beyond the border of a single fn). So the compiler might assume that the stack slot x is dead and reuse the memory, or something like that.
There are a number of ways that this code might be made less shady. escape_as_usize might have, for example, returned a *const i32 instead of usize. In that case, consume_from_usize would look like:
fn consume_from_usize(x: *const i32) -> i32 { ... }
This itself raises a kind of interesting question, though. If a function is not declared as unsafe, and it is given a *const i32 argument, can it dereference that pointer? Ordinarily, the answer would clearly be no. It has no idea what the provenance of that pointer is (and if you think back to the idea of permissions that are granted and expected by the Rust type system, the type system does not guarantee you that a *const can be dereferenced). So effectively there is no difference, in terms of the public permissions, between x: usize and x: *const i32. Really I think the best way to structure this code would have been to declare consume_from_usize() as unsafe, which would have served to declare to its callers that it has extra requirements regarding its argument x (namely, that it must be a pointer that can be safely dereferenced).
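A sketch of that restructuring might look like the following (the documented requirement is my own wording; the point is just that the unsafe keyword moves the obligation into the signature):

/// Casts `x` back into a pointer and dereferences it.
///
/// Safety: `x` must have been produced by `escape_as_usize` (or otherwise be
/// the address of a live, initialized `i32`).
unsafe fn consume_from_usize(x: usize) -> i32 {
    let y: &i32 = &*(x as *const i32);
    *y
}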
Now, if consume_from_usize() were a public function, then not having an unsafe keyword would almost certainly be flat out wrong. There is nothing that stops perfectly safe callers from calling it with any old integer that they want; even if the signature were changed to take *const i32, the same is basically true. But consume_from_usize() is not public: it’s private, and that perhaps makes a difference.
It often happens, as we’ve seen in the other examples, that people cut corners within the unsafe boundary and declare private helpers as “safe” that are in fact assuming quite a bit beyond the normal Rust type rules.
Why am I showing you this? This is a good example for playing with the concept of an unsafe boundary. By moving these functions about, you can easily create unsafety, as they must all three be contained within the same unsafe boundary to be legal (if indeed they are legal at all). Consider these variations:
Private helper module.
mod helpers {
pub fn escape_as_usize(x: &i32) -> usize { ... }
pub fn consume_from_usize(x: usize) -> i32 { ... }
}
pub fn entry_point() {
... // calls now written as `helpers::escape_as_usize` etc
}
Private helper module, but with visibility restricted to an outer scope.
mod helpers {
pub(super) fn escape_as_usize(x: &i32) -> usize { ... }
pub(super) fn consume_from_usize(x: usize) -> i32 { ... }
}
pub fn entry_point() {
... // calls now written as `helpers::escape_as_usize` etc
}
Public functions, but restricted to an outer scope.
pub mod some_bigger_abstraction {
mod helpers {
pub(super) fn escape_as_usize(x: &i32) -> usize { ... }
pub(super) fn consume_from_usize(x: usize) -> i32 { ... }
pub(super) fn entry_point() { ... }
}
}
Public functions, but de facto restricted to an outer scope.
pub mod some_bigger_abstraction {
mod helpers {
pub fn escape_as_usize(x: &i32) -> usize { ... }
pub fn consume_from_usize(x: usize) -> i32 { ... }
pub fn entry_point() { ... }
}
// no `pub use`, so in fact they are not accessible
}
Just plain public.
pub fn escape_as_usize(x: &i32) -> usize { ... }
pub fn consume_from_usize(x: usize) -> i32 { ... }
pub fn entry_point() { }
Different crates.
// crate A:
pub fn escape_as_usize(x: &i32) -> usize { ... }
// crate B:
pub fn consume_from_usize(x: usize) -> i32 { ... }
// crate C:
extern crate a;
extern crate b;
pub fn entry_point() {
...
let p = a::escape_as_usize(&x)
...
b::consume_from_usize(p)
...
}
My belief is that some of these variations ought to be legal. The current model as I described it here would accept the original variation (where everything is in one module) but reject all other variations (that is, they would compile, but result in undefined behavior). I am not sure this is right: I think that at least the “private helper module” variations seem reasonable.
Note that I think any or all of these variations should be fine with appropriate use of the unsafe keyword. If the helper functions were declared as unsafe, then I think they could live anywhere. (This is actually an interesting point that deserves to be drilled into a bit more, since it raises the question of how distinct unsafe boundaries “interact”; I tend to think of there as just being safe and unsafe code, full stop, and hence any time that unsafe code in one module invokes unsafe code in another, we can assume they are part of the same boundary and hence that we have to be conservative.)
On refactorings, harmless and otherwise
One interesting thing to think about with any kind of memory model or other guidelines is what sorts of refactorings people can safely perform. For example, under this model, manually inlining a fn body is always safe, so long as you do so within an unsafe abstraction. Inlining a function from inside an abstraction into the outside is usually safe, but not necessarily – the reason it is usually safe is that most such functions have unsafe blocks, and so by manually inlining, you will wind up changing the caller from a safe function into one that is part of the unsafe abstraction.
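Here is a small sketch of that common, usually-safe case (the functions are hypothetical and just for illustration):

// Before: the caller lives outside the boundary and merely calls into it.
fn first_or_zero(v: &[i32]) -> i32 {
    helpers::first_or_zero(v)
}

mod helpers {
    pub fn first_or_zero(v: &[i32]) -> i32 {
        if v.is_empty() { 0 } else { unsafe { *v.get_unchecked(0) } }
    }
}

// After manual inlining: the caller now contains an `unsafe` block itself,
// so its module becomes (part of) an unsafe boundary and the reasoning
// still holds.
fn first_or_zero_inlined(v: &[i32]) -> i32 {
    if v.is_empty() { 0 } else { unsafe { *v.get_unchecked(0) } }
}

The riskier case is a function like copy_drop above, which has no unsafe block of its own and hence would not pull its new home into the boundary when inlined.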
(Grouping items and functions into modules is another example that may or may not be safe, depending on how we chose to draw the boundary lines.)
EDIT: To clarify a confusion I have seen in a few places. Here I am talking about inlining by the user. Inlining by the compiler is different. In that case, when we inline, we would track the “provenance” of each instruction, and in particular we would track whether the instruction originated from unsafe code. (As I understand it, LLVM already does this with its aliasing metadata, because it is needed for handling C99 restrict.) This means that when we decide e.g. if two loads may alias, if one (or both) of those loads originated in unsafe code, then the answer would be different than if they did not.
Impact of this “proposal” and mapping it to LLVM
I suspect that we are doing some optimizations now that would not be legal under this proposal, though probably not that many – we haven’t gone very far in terms of translating Rust’s invariants to LLVM’s alias analysis metadata. Note though that in general this proposal is very optimization friendly: all safe code can be fully optimized. Unsafe code falls back to more C-like reasoning, where one must be conservative about potential aliasing (note that I do not want to employ any type-based alias analysis, though).
I expect we may want to add some annotations that unsafe code can use to recover optimizations. For example, perhaps something analogous to the restrict keyword in C, to declare that pointers are unaliased, or some way to say that an unsafe fn (or module) nonetheless ensures that all safe Rust types meet their full requirements.
One of the next steps for me personally in exploring this model is to try and map out (a) precisely what we do today and (b) how I would express what I want in LLVM’s terms. It’s not the best formalization, but it’s a concrete starting point at least!
Tweaking the concept of a boundary
As the final example showed, a module boundary is not clearly right. In particular, the idea of using a module is that it aligns with privacy, but by that definition it should probably include submodules (that is, any module where an unsafe keyword appears either in the module itself or in some parent of the module is considered to be an unsafe boundary module).
Conclusion
Here I presented a high-level proposal for how I think a Rust “memory model” ought to work. Clearly this doesn’t resemble a formal memory model and there are tons of details to work out. Rather, it’s a guiding principle: be aggressive outside of unsafe abstractions and conservative inside.
I have two major concerns:
- First, what is the impact on execution time? I think this needs to be investigated, but ultimately I am sure we can overcome any deficit by allowing unsafe code authors to “opt back in” to more aggressive optimization, which feels like a good tradeoff.
- Second, what’s the best way to draw the optimization boundary? Can we make it more explicit?
In particular, the module-based rule that I proposed for the unsafe boundary is ultimately a kind of heuristic that makes an “educated guess” as to where the unsafe boundary lies. Certainly the boundary must be aligned with modules, but as the last example showed, there may be a lot of ways to set things up that “seem reasonable”. It might be nicer if we could have a way to declare that boundary affirmatively. I’m not entirely sure what that would look like. But if we did add some way, we might then say that if you use the unsafe keyword in the older style – where the boundary is implicit – we’ll just declare the whole crate as being an “unsafe boundary”. This likely won’t break any code (though of course I mentioned the “different crates” variation above…), but it would provide an incentive to use the more explicit form.
For questions or discussion, please see this thread on the Rust internals forum.
Edit log
Some of the examples of dubious unsafe code originally used transmute and transmute_copy. I was asked to change them because transmute_copy really is exceptionally unsafe, even for unsafe code (type inference can make it go wildly awry from what you expected), and so we didn’t want to tempt anyone into copy-and-pasting them. For the record: don’t copy and paste the unsafe code I labeled as dubious – it is indeed dubious and may not turn out to be legal! :)