Query structure in chalk
25 May 2017
For my next post discussing chalk, I want to take kind of a different turn. I want to talk about the general struct of chalk queries and how chalk handles them right now. (If you’ve never heard of chalk, it’s sort of “reference implementation” for Rust’s trait system, as well as an attempt to describe Rust’s trait system in terms of its logical underpinnings; see this post for an introduction to the big idea.)
The traditional, interactive Prolog query
In a traditional Prolog system, when you start a query, the solver will run off and start supplying you with every possible answer it can find. So if I put something like this (I’m going to start adopting a more Rust-like syntax for queries, versus the Prolog-like syntax I have been using):
?- Vec<i32>: AsRef<?U>
The solver might answer:
Vec<i32>: AsRef<[i32]>
continue? (y/n)
This continue
bit is interesting. The idea in Prolog is that the
solver is finding all possible instantiations of your query
that are true. In this case, if we instantiate ?U = [i32]
, then the
query is true (note that the solver did not, directly, tell us a value
for ?U
, but we can infer one by unifying the response with our
original query). If we were to hit y
, the solver might then give us
another possible answer:
Vec<i32>: AsRef<Vec<i32>>
continue? (y/n)
This answer derives from the fact that there is a reflexive impl
(impl<T> AsRef<T> for T
) for AsRef
. If were to hit y
again,
then we might get back a negative response:
no
Naturally, in some cases, there may be no possible answers, and hence
the solver will just give me back no
right away:
?- Box<i32>: Copy
no
In some cases, there might be an infinite number of responses. So for
example if I gave this query, and I kept hitting y
, then the solver
would never stop giving me back answers:
?- Vec<?U>: Clone
Vec<i32>: Clone
continue? (y/n)
Vec<Box<i32>>: Clone
continue? (y/n)
Vec<Box<Box<i32>>>: Clone
continue? (y/n)
Vec<Box<Box<Box<i32>>>>: Clone
continue? (y/n)
As you can imagine, the solver will gleefully keep adding another
layer of Box
until we ask it to stop, or it runs out of memory.
Another interesting thing is that queries might still have variables in them. For example:
?- Rc<?T>: Clone
might produce the answer:
Rc<?T>: Clone
continue? (y/n)
After all, Rc<?T>
is true no matter what type ?T
is.
Do try this at home: chalk has a REPL
I should just note that ever since aturon recently added a REPL to chalk, which means that – if you want – you can experiment with some of the examples from this blog post. It’s not really a “polished tool”, but it’s kind of fun. I’ll give my examples using the REPL.
How chalk responds to a query
chalk responds to queries somewhat differently. Instead of trying to enumerate all possible answers for you, it is looking for an unambiguous answer. In particular, when it tells you the value for a type variable, that means that this is the only possible instantiation that you could use, given the current set of impls and where-clauses, that would be provable.
Overall, chalk’s answers have three parts:
- Status: Yes, No, or Maybe
- Refined goal: a version of your original query with some substitutions applied
- Lifetime constraints: these are relations that must hold between the lifetimes that you supplied as inputs. I’ll come to this in a bit.
Future compatibility note: It’s worth pointing out that I expect some the particulars of a “query response” to change, particularly as aturon continues the work on negative reasoning. I’m presenting the current setup here, for the most part, but I also describe some of the changes that are in flight (and expected to land quite soon).
Let’s look at these three parts in turn.
The status and refined goal of a query response
The “status” tells you how sure chalk is of its answer, and it can be yes, maybe, or no.
A yes response means that your query is uniquely provable, and in that case the refined goal that we’ve given back represents the only possible instantiation. In the examples we’ve seen so far, there was one case where chalk would have responded with yes:
> cargo run
?- load libstd.chalk
?- exists<T> { Rc<T>: Clone }
Solution {
successful: Yes,
refined_goal: Query {
value: Constrained {
value: [
Rc<?0>: Clone
],
constraints: []
},
binders: [
U0
]
}
}
(Since this is the first example using the REPL, a bit of explanation
is in order. First, cargo run
executs the REPL, naturally. The first
command, load libstd.chalk
, loads up some standard type/impl
definitions. The next command, exists<T> { Rc<T>: Clone }
is the
actual query. In the section of Prolog examples, I used the Prolog
convention, which is to implicitly add the “existential quantifiers”
based on syntax. chalk is more explicit: writing exists<T> { ... }
here is saying “is there a T
such that ...
is true?”. In future
examples, I’ll skip over the first two lines.)
You can see that the response here (which is just the Debug
impl for
chalk’s internal data structures) included not only Yes
, but also a
“refined-goal”. I don’t want to go into all the details of how the
refined goal is represented just now, but if you skip down to the
value
field you will pick out the string Rc<?0>: Clone
– here the
?0
indicates an existential variable. This is saying thatthe
“refined” goal is the same as the query, meaning that Rc<T>: Clone
is true no matter what Clone
is. (We saw the same thing in the
Prolog case.)
So what about some of the more ambiguous cases. For example, what
happens if we ask exists<T> { Vec<T>: Clone }
. This case is
trickier, because for Vec<T>
to be clone, T
must be Clone
, so it
matters what T
is:
?- exists<T> { Vec<T>: Clone }
Solution {
successful: Maybe,
... // elided for brevity
}
Here we get back maybe. This is chalk’s way of saying that the
query is provable for some instants of ?T
, but we need more type
information to find a unique answer. The idea is that we will
continue type-checking or processing in the meantime, which may yield
results that further constrain ?T
; e.g., maybe we find a call to
vec.push(22)
, indicating that the type of the values within is
i32
. Once that happens, we can repeat the query, but this time with
a more specific value for ?T
, so something like Vec<i32>: Clone
:
?- Vec<i32>: Clone
Solution {
successful: Yes,
...
}
Finally, some times chalk can decisively prove that something is not provable. This would occur if there is just no impl that could possibly apply (but see aturon’s post, which covers how we plan to extend chalk to be able to reason beyond a single crate):
?- Box<i32>: Copy
`Copy` is not implemented for `Box<i32>` in environment `Env(U0, [])`
Refined goal in action
The refined goal so far hasn’t been very important; but it’s generally a way for the solver to communicate back a kind of substitution – that is, to communicate back what values the type variables have to have in order for the query to be provable. Consider this query:
?- exists<U> { Vec<i32>: AsRef<Vec<U>> }
Now, in general, a Vec<i32>
implements AsRef
twice:
Vec<i32>: AsRef<Slice<i32>>
(chalk doesn’t understand the syntax[i32]
, so I made a typeSlice
for it)Vec<i32>: AsRef<Vec<i32>>
But here, we know we are looking for AsRef<Vec<U>>
. This implies
then that U
must be i32
. And indeed, if we give this query, chalk
tells us so, using the refined goal:
?- exists<U> { Vec<i32>: AsRef<Vec<U>> }
Solution {
successful: Yes,
refined_goal: Query {
value: Constrained {
value: [
Vec<i32>: AsRef<Vec<i32>>
],
constraints: []
},
binders: []
}
}
Here you can see that there are no variables. Instead, we see
Vec<i32>: AsRef<Vec<i32>>
. If we unify this with our original query
(skipping past the exists
part), we can deduce that U = i32
.
You might imagine that the refined goal can only be used when the response is yes – but, in fact, this is not so. There are times when we can’t say for sure if a query is provable, but we can still say something about what the variables must be for it to be provable. Consider this example:
?- exists<U, V> { Vec<Vec<U>>: AsRef<Vec<V>> }
Solution {
successful: Maybe,
refined_goal: Query {
value: Constrained {
value: [
Vec<Vec<?0>>: AsRef<Vec<Vec<?0>>>
],
constraints: []
},
binders: [
U0
]
}
}
Here, we were asking if Vec<Vec<U>>
implements AsRef<Vec<V>>
. We
got back a maybe response. This is because the AsRef
impl
requires us to know that U: Sized
, and naturally there are many
sized types that U
could be, so we need to wait until we get more
information to give back a definitive response.
However, leaving aside concerns about U: Sized
, we can see that
Vec<Vec<U>>
must equal Vec<V>
, which implies that, for this query
to be provable, Vec<U> = V
must hold. And the refined goal reflects
as much:
Vec<Vec<?0>>: AsRef<Vec<Vec<?0>>>
Open vs closed queries
Queries in chalk are always “closed” formulas, meaning that all the
variables that they reference are bound by either an exists<T>
or a
forall<T>
binder. This is in contrast to how the compiler works, or
a typical prolog implementation, where a trait query occurs in the
context of an ongoing set of processing. In terms of the current rustc
implementation, the difference is that, in rustc, when you wish to do
some trait selection, you invoke the trait solver with an inference
context in hand. This defines the context for any inference variables
that appear in the query.
In chalk, in contrast, the query starts with a “clean slate”. The only context that it needs is the global context of the entire program – i.e., the set of impls and so forth (and you can consider those part of the query, if you like).
To see the difference, consider this chalk query that we looked at earlier:
?- exists<U> { Vec<i32>: AsRef<Vec<U>> }
In rustc, such a query would look more like Vec<i32>: AsRef<Vec<?22>>
, where we have simply used an existing inference
variable (?22
). Moreover, the current implementation simply gives
back the yes/maybe/no part of the response, and does not have a notion
of a refined goal. This is because, since we have access to the raw
inference variable, we can just unify ?22
(e.g., with i32
) as a
side-effect of processing the query.
The new idea then is that when some part of the compiler needs to
prove a goal like Vec<i32>: AsRef<Vec<?22>>
, it will first create a
canonical query from that goal
(chalk code is in query.rs
). This is done by replacing
all the random inference variables (like ?22
) with existentials. So
you would get exists<T> Vec<i32>: AsRef<Vec<T>>
as the output. One
key point is that this query is independent of the precise inference
variables involved: so if we have to solve this same query later, but
with different inference variables (e.g., Vec<i32>: AsRef<Vec<?44>>
), when we make the canonical form of that query, we’d
get the same result.
Once we have the canonical query, we can invoke chalk’s solver. The code here varies depending on the kind of goal, but the basic strategy is the same. We create a “fulfillment context”, which is the combination of an inference context (a set of inference variables) and a list of goals we have yet to prove. (The compiler has a similar data structure, but it is setup somewhat differently; for example, it doesn’t own an inference context itself.)
Within this fulfillment context, we can
“instantiate”
the query, which means that we replace all the variables bound in an
exists<>
binder with an inference variable (here is
an example of code invoking instantiate()
. This
effectively converts back to the original form, but with fresh
inference variables. So exists<T> Vec<i32>: AsRef<Vec<T>>
would
become Vec<i32>: AsRef<Vec<?0>>
. Next we can actually try to prove
the goal, for example by searching through each impl,
unifying the goal with the impl header,
and then
recursively processing the where-clauses on the impl
to make sure they are satisfied.
An advantage of the chalk approach where queries are closed is that they are much easier to cache. We can solve the query once and then “replay” the result an endless number of times, so long as the enclosing context is the same.
Lifetime constraints
I’ve glossed over one important aspect of how chalk handles queries, which is the treatment of lifetimes. In addition to the refined goal, the response from a chalk query also includes a set of lifetime constraints. Roughly speaking, the model is that the chalk engine gives you back the lifetime constraints that would have to be satisfied for the query to be provable.
In other words, if you have a full, lifetime-aware logic, you might
say that the query is provable in some environment Env
that also
includes some facts about the lifetimes (i.e., which lifetime outlives
which other lifetime, and so forth):
Env, LifetimeEnv |- Query
but in chalk we are only giving in Env
, and the engine is giving
back to us a LifetimeEnv
:
chalk(Env, Query) = LifetimeEnv
with the intention that we know that if we can prove that LifetimeEnv
holds, then Query
also holds.
One of the main reasons for this split is that we want to ensure that
the results from a chalk query do not depend on the specific lifetimes
involved. This is because, in part, we are going to be solving chalk
queries in contexts when lifetimes have been fully erased, and hence
we don’t actually know the original lifetimes or their relationships
to one another. (In this case, the idea is roughly that we will get
back a LifetimeEnv
with the relationships that would have to hold,
but we can be sure that an earlier phase in the compiler has proven to
us that this LifetimeEnv
will be satisfied.)
Anyway, I plan to write a follow-up post (or more…) focusing just on lifetime constraints, so I’ll leave it at that for now. This is also an area where we are doing some iteration, particularly because of the interactions with specialization, which are complex.
Future plans
Let me stop here to talk a bit about the changes we have planned. aturon has been working on a branch that makes a few key changes. First, we will replace the notion of “refined goal” with a more straight-up substitution. That is, we’d like chalk to answer back with something that just tells you the values for the variables you’ve given. This will make later parts of the query processing easier.
Second, following the approach that aturon outlined in their blog post, when you get back a “maybe” result, we are actually going to be considering two cases. The current code will return a refined substitution only if there is a unique assignment to your input variables that must be true for the goal to be provable. But in the newer code, we will also have the option to return a “suggestion” – something which isn’t necessary for the goal to be provable, but which we think is likely to be what the user wanted. We hope to use this concept to help replicate, in a more structured and bulletproof way, some of the heuristics that are used in rustc itself.
Finally, we plan to implement the “modal logic” operators, so that you can make queries that explicitly reason about “all crates” vs “this crate”.