Polonius and the case of the hereditary harrop predicate
21 January 2019
In my previous post about Polonius and subregion obligations, I mentioned that there needs to be a follow-up to deal with higher-ranked subregions. This post digs a bit more into what the problem is in the first place and sketches out the general solution I have in mind, but doesn’t give any concrete algorithms for it.
The subset relation in Polonius is not enough
In my original post on Polonius, I assumed that when we computed a
subtype relation T1 <: T2
between two types, the result was either a
hard error or a set of subset
relations between various regions.
So, for example, if we had a subtype relation between two references:
&'a u32 <: &'b u32
the result would be a subset
relation 'a: 'b
(or, “'a
contains a
subset of the loans in 'b
”).
For a more complex case, consider the relationship of two fn types:
fn(&'a u32) <: fn(&'b u32)
// ^^^^^^^ ^^^^^^^^^^
// | A fn expecting a `&'b u32` as argument.
// |
// A fn expecting a `&'a u32` as argument.
If we imagine that we have some variable f
of type fn(&'a u32)
–
that is, a fn that can be called with a 'a
reference – then this
subtype relation is saying that f
can be given the type fn(&'b u32)
– that is, a fn that can be called with a 'b
reference. That
is fine so long as that 'b
reference can be used as a 'a
reference: that is, &'b u32 <: &'a u32
. So, we can say that the two
fn types are subtypes so long as 'b: 'a
(note that the order is
reversed from the first example; this is because fn types are
contravariant in their argument type).
Unfortunately, this structure isn’t flexible enough to accommodate a subtyping question involving higher-ranked types. Consider a subtype relation like this:
fn(&'a u32) <: for<'b> fn(&'b u32)
// ^^^^^^^
// Unlike before, the supertype
// expects a reference with *any*
// lifetime as argument.
What subtype relation should come from this? We can’t say 'b: 'a
as
before, because the lifetime 'b
isn’t some specific region –
rather, the supertype says that the function has to accept a reference
with any lifetime 'b
. In fact, this subtyping relation should
ultimately yield an error.
Richer constraints
To express the constraints that arise from higher-ranked subtyping (and trait matching), we need a richer set of constraints than just subset. In fact, if you tease it all out, we need something more like this:
Constraint = Subset
| Constraint, Constraint // and
| forall<R1> { Constraint }
| exists<R1> { Constraint }
Subset = R1: R2
Now we can say that
fn(&'a u32) <: for<'b> fn(&'b u32)
holds if the constraint forall<'b> { 'b: 'a }
holds, which implies
that 'a
has to contain all possible loans. This isn’t possible, and
so we would treat this as an error.
Interestingly, if we reverse the order of the two types:
for<'b> fn(&'b u32) <: fn(&'a u32)
we get the constraint exists<'b> { 'a: 'b }
(for
binders on the
subtype side are instantiated with “there exists”, not “for
all”). That is, the region 'a
must be a subset of some possible set
of loans 'b
. This constraint is trivially solveable: 'b
could
always be exactly 'a
itself.
The role of free vs bound regions
As one final example, consider what happens here, where we added a
return type and another region ('c
):
for<'b> fn(&'b u32) -> &'b u32
<:
fn(&'a u32) -> &'c u32
This gives rise to the following constraint:
exists<'b> {
'a: 'b, // from relating the parameter types
'b: 'c, // from relating the return types
}
Here, the constraint is solveable, but only if 'a: 'c
. Therefore, if
we think back to Polonius with its simple “subset” relations, we can
effective reduce this “rich” constraint to the subset relation 'a: 'c
.
To do this reduction, we draw a distinction between the bound and
the free regions. Bound regions are those that are bound within a
forall
and exists
quantifiers (e.g., 'b
), and free regions
those that are not. When we are reducing, we only care about two things:
- Do we have something unsatisfiable about the constraint? This
often happens when a bound “forall” region is on the right-hand
side.
- We saw this with
forall<'b> { 'b: 'a }
. - Another example is
forall<'x, 'y> { 'x: 'y }
. - Reducing something unsatisfiable is obviously an error.
- We saw this with
- What are the effects on the free regions? Othertimes, bound
regions effectively as a “go-between”, creating subset relations
between the free regions.
- We saw this with
exists<'b> { 'a: 'b, 'b: 'c }
. - Another, stranger example might be
forall<'b> { 'a: 'b }
: here, this is satisfiable, but only if'a: 'static
. This is true because'static: 'b
is implicitly true ('static: R
is true for any region R).- In Polonius terms,
'static
represents an “empty set” of loans, so this effectively means that'a
can be a subset of any region'b
by being the empty set.
- In Polonius terms,
- We saw this with
Wait, those “richer” constraints look familiar…
The “richer” constraints I mentioned in the previous section basically
arise from taking a base predicate (R1: R2
) and “adding in” richer
constraint forms like “for all” and “there exists”. This may sound
familiar – if you recall my very first Chalk post, I talked
about the need to go beyond Prolog’s core “Horn clauses” and to
support “Hereditary Harrop” (HH) predicates. The basic idea was
to extend simple Horn clauses with “for all” and “there exists”, along
with a few other things.
In fact, “hereditary harrop predicates” are a kind of generic structure that we can apply to any base set of predicates. So, if we wanted, we might say that the region constraints we are creating can be extended to the full hereditary harrop form, which would look like so:
Constraint = Subset
| Constraint, Constraint // and
| Constraint; Constraint // or
| forall<R1> { Constraint }
| exists<R1> { Constraint }
| if (Assumption) { Constraint }
Assumption = Subset
| forall<R1> { Assumption }
| if (Constraint) { Assumption }
Subset = R1: R2
Here we support not only “for all” and “there exists” but also “implication” and even “or”. rustc doesn’t use constraints this rich today, but for various reasons I think we will want to eventually.
Why is it useful to talk about HH predicates? Well, HH predicates have the nice property that we can use basic Prolog-style search to find and enumerate all possible solutions to them. Besides, “hereditary harrop” is really fun to say.
Conclusion
So now we have this problem. To encode the “solutions” to
higher-ranked subtyping and trait matching, we need to use this richer
notion of constraints that include forall
and exists
quantifiers. Once we add those, we are basically talking about
“hereditary harrop region constraints”. We’ve also talked about the
idea of mapping these complex constraints down to the simple subset
relation that Polonius uses, but here I only gave examples and didn’t
really give any sort of algorithm. I’ve done some experiments here,
and I may try to write them up in a future post, but I’m also curious
to know if somebody else has already solved this problem. I definitely
have that “reinventing the wheel” feeling here.
One really nice aspect of this general direction, though, is that it means that Polonius effectively doesn’t care about these “richer” constraints. The idea is that our subtyping and trait matching algorithms can produce hereditary harrop region constraints (or some subset thereof). These can be reduced to simpler subset constraints, which are then passed to Polonius to do the final reasoning. (And, of course, any of these steps may also produce an error.)
Comments, as usual, are requested in the internals thread for this blog post series.