Thoughts on trusting types and unsafe code
12 September 2016
I’ve been thinking about the unsafe code guidelines a lot in the back
of my mind. In particular, I’ve been trying to think through what it
means to “trust types” – if you recall from the
Tootsie Pop Model (TPM) blog post, one of the key examples
that I was wrestling with was the RefCell-Ref
example. I want to
revisit a variation on that example now, but from a different
angle. (This by the way is one of those “Niko thinks out loud” blog
posts, not one of those “Niko writes up a proposal” blog posts.)
Setup
Let’s start with a little safe function:
fn patsy(v: &usize) -> usize {
    let l = *v;
    collaborator();
    use(l)
}
The question is, should the compiler ever be able to optimize this function as follows:
fn patsy(v: &usize) -> usize {
    collaborator();
    use(*v)
}
By moving the load from v after the call to collaborator(), we avoided the need for a temporary variable. This might reduce stack size or register pressure. It is also an example of the kind of optimizations we are considering doing for MIR (you can think of it as an aggressive form of copy-propagation). In case it’s not clear, I really want the answer to this question to be yes – at least most of the time. More specifically, I am interested in examining when we can do this without doing any interprocedural analysis.
Now, the question of “is this legal?” is not necessarily a yes or no question. For example, the Tootsie Pop Model answer was “it depends”. In a safe code context, this transformation was legal. In an unsafe context, it was not.
What could go wrong?
The concern here is that the function collaborator() might invalidate *v in some way. There are two ways that this could potentially happen:
- unsafe code could mutate *v;
- unsafe code could invalidate the memory that v refers to.
Here is some unsafe code that does the first thing:
static mut data: usize = 0;

fn instigator() {
    patsy(unsafe { &data });
}

fn collaborator() {
    unsafe { data = 1; }
}
Here is some unsafe code that invalidates *v using an option (you can also write code that makes it get freed, of course). Here, when we start, data is Some(22), and we take a reference to that 22. But then collaborator() reassigns data to None, and hence the memory that we were referring to is now uninitialized.
static mut data: Option<usize> = Some(22);

fn instigator() {
    patsy(unsafe { data.as_ref().unwrap() })
}

fn collaborator() {
    unsafe { data = None; }
}
So, when we ask whether it is legal to optimize patsy by moving the *v load after the call to collaborator(), our answer affects whether this unsafe code is legal.
The Tootsie Pop Model
Just for fun, let’s look at how this plays out in the Tootsie Pop model (TPM). As I wrote before, whether this code is legal will ultimately depend on whether patsy is located in an unsafe context. The way I described the model, unsafe contexts are tied to modules, so I’ll stick with that, but there might also be other ways of defining what an unsafe context is.
First let’s imagine that all three functions are in the same module:
mod foo {
    static mut data: Option<usize> = Some(22);

    pub fn instigator() {...}
    fn patsy(v: &usize) -> usize {...}
    fn collaborator() {...}
}
Here, because instigator and collaborator contain unsafe blocks, the module foo is considered to be an unsafe context, and thus patsy is also located within the unsafe context. This means that the unsafe code would be legal and the optimization would not. This is because the TPM does not allow us to “trust types” within an unsafe context.
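For concreteness, here is a sketch of that arrangement with the elided bodies filled in (nothing new here, just the Option-based snippets from above assembled into a single module):

mod foo {
    static mut data: Option<usize> = Some(22);

    pub fn instigator() {
        // hands `patsy` a reference into `data`...
        patsy(unsafe { data.as_ref().unwrap() });
    }

    fn patsy(v: &usize) -> usize {
        let l = *v;
        collaborator();
        use(l)
    }

    fn collaborator() {
        // ...which this write may invalidate mid-call
        unsafe { data = None; }
    }
}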
However, it’s worth pointing out one other interesting detail. Just because the TPM does not authorize the optimization, that doesn’t mean that it could not be performed. It just means that performing the optimization would require detailed interprocedural alias analysis. That is, a highly optimizing compiler might analyze instigator, patsy, and collaborator and determine whether or not the writes in collaborator can affect patsy (of course here they can, but in more reasonable code they likely would not). Put another way, the TPM basically tells you “here are optimizations you can do without doing anything sophisticated”; it doesn’t put an upper limit on what you can do given sufficient extra analysis.
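For instance, imagine a hypothetical variant (not the example above) in which collaborator only touches unrelated state. A whole-program alias analysis could then prove that the load of *v is unaffected and still perform the code motion, even inside an unsafe context:

// Hypothetical variant: `collaborator` writes only to its own static and
// never to the memory behind `v`, so an interprocedural alias analysis
// could still justify moving the load of `*v` after the call.
static mut counter: usize = 0;

fn collaborator() {
    unsafe { counter += 1; }
}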
OK, so now here is another recasting where the functions are spread between modules:
mod foo {
    use bar::patsy;

    static mut data: Option<usize> = Some(22);
    pub fn instigator() {...}
    pub fn collaborator() {...}
}

mod bar {
    use foo::collaborator;

    pub fn patsy(v: &usize) -> usize {...}
}
In this case, the module bar does not contain unsafe blocks, and hence it is not an unsafe context. That means that we can optimize patsy. It also means that instigator is illegal:
fn instigator() {
    patsy(unsafe { &data });
}
The problem here is that instigator is calling patsy, which is defined in a safe context (and hence must also be a safe function). That implies that instigator must fulfill all of Rust’s basic permissions for the arguments that patsy expects. In this case, the argument is a &usize, which means that the usize must be accessible and immutable for the entire lifetime of the reference; that lifetime encloses the call to patsy. And yet the data in question can be mutated (by collaborator). So instigator is failing to live up to its obligations.
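By contrast, a caller can live up to those obligations by making sure nothing can touch the referent for the duration of the call – for example (a hypothetical sketch, assuming the usize-valued data static from the first example), by copying the value out of the static first:

// Hypothetical legal caller: the reference handed to `patsy` points at a
// local copy, so nothing `collaborator` does to the static can invalidate it.
fn careful_instigator() {
    let copy: usize = unsafe { data };
    patsy(&copy);
}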
TPM has interesting implications for the Rust optimizer. Basically, whether or not a given statement can “trust” the types of its arguments ultimately depends on where it appeared in the original source. This means we have to track some info when inlining unsafe code into safe code (or else “taint” the safe code in some way). This is not unique to TPM, though: similar capabilities seem to be required for handling e.g. the C99 restrict keyword, and we’ll see that they are also important when trusting types.
What if we fully trusted types everywhere?
Of course, the TPM has the downside that it hinders optimization in the unchecked-get use case. I’ve been pondering various ways to address that. One thing that I find intuitively appealing is the idea of trusting Rust types everywhere. For example, the idea might be that whenever you create a shared reference like &usize, you must ensure that its associated permissions hold. If we took this approach, then we could perform the optimization on patsy, and we could say that instigator is illegal, for the same reasons that it was illegal under TPM when patsy was in a distinct module.
However, trusting types everywhere – even in unsafe code – potentially interacts in a rather nasty way with lifetime inference. Here is another example function to consider, alloc_free:
fn alloc_free() {
    unsafe {
        // allocates and initializes an integer
        let p: *mut i32 = allocate_an_integer();

        // create a safe reference to `*p` and read from it
        let q: &i32 = &*p;
        let r = *q;

        // free `p`
        free(p);

        // use the value we loaded
        use(r); // but could we move the load down to here?
    }
}
What is happening here is that we allocate some memory containing an integer, create a reference that refers to it, read from that reference, and then free the original memory. We then use the value that we read from the reference. The question is: can the compiler “copy-propagate” that read down to the call to use()?
If this were C code, the answer would pretty clearly be no (I presume, anyway). The compiler would see that free(p) may invalidate q, and hence it would act as a kind of barrier.
But if we were to go “all in” on trusting Rust types, the answer would be (at least currently) yes. Remember that the purpose of this model is to let us do optimizations without doing fancy analysis. Here what happens is that we create a reference q whose lifetime will stretch from the point of creation until the end of its scope:
fn alloc_free() {
    unsafe {
        let p: *mut i32 = allocate_an_integer();
        let q: &i32 = &*p;  // ------------+ lifetime of the reference
        let r = *q;         //             | as defined today
                            //             |
        free(p);            //             |
                            //             |
        use(r);             // <-----------+
    }
}
If this seems like a bad idea, it is. The idea that writing unsafe Rust code might be even more subtle than writing C seems like a non-starter to me. =)
Now, you might be tempted to think that this problem is an artifact of how Rust lifetimes are currently tied to scoping. After all, q is not used after the let r = *q statement, and if we adopted the non-lexical lifetimes approach, that would mean the lifetime would end there. But really this problem could still occur in an NLL-based system, though you have to work a bit harder:
fn alloc_free2() {
    unsafe {
        let p: *mut i32 = allocate_an_integer();
        let q: &i32 = &*p;          // --------+
        let r = *q;                 //         |
        if condition1() {           //         |
            free(p);                //         |
        }                           //         |
        if condition2() {           //         |
            use(r);                 //         |
            if condition3() {       //         |
                use_again(*q);      // <-------+
            }
        }
    }
}
Here the problem is that, from the compiler’s point of view, the reference q is live at the point where we call free. This is because it looks like we might need it to call use_again. But in fact the programmer knows that condition1() and condition3() are mutually exclusive, and so she may reason that the lifetime of q ends earlier when condition1() holds than when it doesn’t.
So I think it seems clear from these examples that we can’t really fully trust types everywhere.
Trust types, not lifetimes?
I think that whatever guidelines we wind up with, we will not be able to fully trust lifetimes, at least not around unsafe code. We have to assume that memory may be invalidated early. Put another way, the validity of some unsafe code ought not to be determined by the results of lifetime inference, since mere mortals (including its authors) cannot always predict what it will do.
But there is a more subtle reason that we should not “trust lifetimes”. The Rust type system is a conservative analysis that guarantees safety – but there are many notions of a reference’s “lifetime” that go beyond its capabilities. We saw this in the previous section: today we have lexical lifetimes. Tomorrow we may have non-lexical lifetimes. But humans can go beyond that and think about conditional control-flow and other factors that the compiler is not aware of. We should not expect humans to limit themselves to what the Rust type system can express when writing unsafe code!
The idea here is that lifetimes are sometimes significant to the model – in particular, in safe code, the compiler’s lifetimes can be used to aid optimization. But in unsafe code, we are required to assume that the user gets to pick the lifetimes for each reference, so long as those choices would still type check. I think that in practice this would roughly amount to “trust lifetimes in safe contexts, but not in unsafe contexts.”
Impact of ignoring lifetimes altogether
This implies that the compiler will have to use the loads that the user wrote to guide it. For example, you might imagine that the compiler can move a load from x down in the control-flow graph, but only if it can see that x was going to be loaded anyway. So if you consider this variant of alloc_free:
fn alloc_free3() {
    unsafe {
        let p: *mut i32 = allocate_an_integer();
        let q: &i32 = &*p;
        let r = *q; // load but do not use
        free(p);
        use(*q); // not `use(r)` but `use(*q)` instead
    }
}
Here we can choose to either eliminate the first load (let r = *q) or else replace use(*q) with use(r). Either is ok: we have evidence that the user believes the lifetime of q to enclose free. (The fact that it doesn’t is their fault.)
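Concretely, taking the second option, the rewritten function might look like this (a sketch; the name alloc_free3_optimized is just for illustration):

fn alloc_free3_optimized() {
    unsafe {
        let p: *mut i32 = allocate_an_integer();
        let q: &i32 = &*p;
        let r = *q;  // keep the load the user wrote...
        free(p);
        use(r);      // ...and reuse it here, eliminating the second load of `*q`
    }
}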
But now let’s return to our patsy() function. Can we still optimize that?
fn patsy(v: &usize) -> usize {
    let l = *v;
    collaborator();
    use(l)
}
If we are just ignoring the lifetime of v, then we can’t – at least not on the basis of the type of v. For all we know, the user considers the lifetime of v to end right after let l = *v. That’s not as unreasonable as it might sound; after all, the code looks to have been deliberately written to load *v early. And after all, we are trying to enable more advanced notions of lifetimes than those that the Rust type system supports today.
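To make that concrete, here is the earlier Option-based caller again, annotated with the lifetime the unsafe author might have in mind (the annotations are my reading of the argument, not anything the compiler infers):

static mut data: Option<usize> = Some(22);

fn instigator() {
    // The author reasons: the reference only needs to be valid long enough
    // for `patsy` to execute `let l = *v`...
    patsy(unsafe { data.as_ref().unwrap() });
}

fn collaborator() {
    // ...so invalidating the memory here is fine in their model. A compiler
    // that moved the load of `*v` after this call would then read
    // invalidated memory.
    unsafe { data = None; }
}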
It’s interesting that if we inlined patsy into its caller, we might learn new information about its arguments that lets us optimize more aggressively. For example, imagine a (benevolent, this time) caller like this:
fn kindly_fn() {
    let x = &1;
    patsy(x);
    use(*x);
}
If we inline patsy into kindly_fn, we get this:
fn kindly_fn() {
    let x = &1;
    {
        let l = *x;
        collaborator();
        use(l);
    }
    use(*x);
}
Here we can see that *x must be valid after collaborator(), and so we can optimize the function as follows (we are moving the load of *x down, and then applying CSE to eliminate the double load):
fn kindly_fn() {
    let x = &1;
    let l;
    {
        collaborator();
        l = *x;
        use(l);
    }
    use(l);
}
There is a certain appeal to “trust types, not lifetimes”, but ultimately I think it is not living up to Rust’s potential: as you can see above, we will still be fairly reliant on inlining to recover needed context for optimizing. Given that the vast majority of Rust is safe code, where these sorts of operations are harmless, this seems like a shame.
Trust lifetimes only in safe code?
An alternative to the TPM is the “Asserting-Conflicting Access” model (ACA), which was proposed by arielb1 and ubsan. I don’t claim to be precisely representing their model here: I’m trying to (somewhat separately) work through those rules and apply them formally. So what I write here is more “inspired by” those rules than reflective of them.
That caveat aside, the idea in their model is that lifetimes are significant to the model, but you can’t trust the compiler’s inference in unsafe code. There, we have to assume that the unsafe code author is free to pick any valid lifetime, so long as it would still type check (not “borrow check” – i.e., it only has to ensure that no data outlives its owning scope). Note the similarities to the Tootsie Pop Model here – we still need to define what an “unsafe context” is, and when we enter such a context, the compiler will be less aggressive in optimizing (though more aggressive than in the TPM). (This has implications for the unchecked-get example.)
Nonetheless, I have concerns about this formulation because it seems to assume that the logic for unsafe code can be expressed in terms of Rust’s lifetimes – but as I wrote above, Rust’s lifetimes are really a conservative approximation. As we improve our type system, they can change and become more precise – and users might have in mind more precise and flow-dependent lifetimes still. In particular, it seems like the “ACA” would disallow my alloc_free2 example:
fn alloc_free2() {
    unsafe {
        let p: *mut i32 = allocate_an_integer();
        let q: &i32 = &*p;
        let r = *q;                 // (1)
        if condition1() {
            free(p);                // (2)
        }
        if condition2() {
            use(r);                 // (3)
            if condition3() {
                use_again(*q);      // (4)
            }
        }
    }
}
Intuitively, the problem is that the lifetime of q must enclose the points (1), (2), (3), and (4) that are commented above. But the user knows that condition1() and condition3() are mutually exclusive, so in their mind, the lifetime can end when we reach point (2), since they know that this means that point (4) is unreachable.
In terms of their model, the conflicting access would be (2) and the asserting access would be (1). But I might be misunderstanding how this whole thing works.
Trust lifetimes at safe fn boundaries
Nonetheless, perhaps we can do something similar to the ACA model and say that we can trust lifetimes in “safe code” but totally disregard them in “unsafe code” (however we define that). If we adopted these definitions, would that allow us to optimize patsy()?
fn patsy<'a>(v: &'a usize) -> usize {
    let l = *v;
    collaborator();
    use(l)
}
Presuming patsy() is considered to be “safe code”, the answer is yes. This in turn implies that any unsafe callers are obligated to consider patsy() as a “black box” in terms of what it might do with 'a.
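For instance, here is a minimal sketch of what that obligation means for an unsafe caller (the allocation helpers are hypothetical, not a real API):

fn unsafe_caller() {
    unsafe {
        let p: *mut usize = allocate_a_usize(); // hypothetical allocator
        *p = 22;

        // Handing `&*p` to the safe fn grants it permission to read the
        // memory at any point during the call, so `*p` must stay valid
        // for the entire call...
        let result = patsy(&*p);

        // ...and may only be freed once `patsy` has returned.
        free_a_usize(p); // hypothetical deallocator
        use(result);
    }
}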
This flows quite naturally from a “permissions” perspective — giving a reference to a safe fn implies giving it permission to use that reference any time during its execution. I have been (separately) trying to elaborate this notion, but it’ll have to wait for a separate post.
Conclusion
One takeaway from this meandering walk is that, if we want to make it easy to optimize Rust code aggressively, there is something special about the fn boundary. In retrospect, this is really not that surprising: we are trying to enable intraprocedural optimization, and hence the fn boundary is the boundary beyond which we cannot analyze – within the fn body we can see more.
Put another way, if we want to optimize patsy() without doing any interprocedural analysis, it seems clear that we need the caller to guarantee that v will be valid for the entire call to patsy:
fn patsy(v: &usize) -> usize {
    let l = *v;
    collaborator();
    use(l)
}
I think this is an interesting conclusion, even if I’m not quite sure where it leads yet.
Another takeaway is that we have to be very careful trusting lifetimes around unsafe code. Lifetimes of references are a tool designed for use by the borrow checker: we should not use them to limit the clever things that unsafe code authors can do.
Note on comments
Comments are closed on this post. Please post any questions or comments on the internals thread I’m about to start. =)
Also, I’m collecting unsafe-related posts into the unsafe category.