Revised for loop protocol
16 January 2013
In Rust today, there is a
special for
syntax designed to support interruptible loops.
Since we introduced it, this has proven to be a remarkable success.
However, I think we can improve it very slightly.
Current for protocol
The current “for protocol” is best explained by giving an example of how to implement it for slices:
fn each<E>(v: &[E], f: &fn(&E) -> bool) {
let mut i = 0;
let n = v.len();
while i < n {
if !f(&v[i]) {
return;
}
i += 1
}
}
As you can see, the idea is that the last parameter to the each()
method is a function of type &fn(&E) -> bool
, which means that it is
given a pointer to an element in the collection and it returns true or
false. The return value indicates whether we should continue
iterating.
A little known fact is that the for
statement returns whatever the
each()
method returns. This means that each()
methods typically
have unit return type so that the Rust compiler doesn’t require a
semicolon, which would be used to disregard the result of the for
expression.
Problems
The biggest problem with this protocol is that it is not easily composable. In particular, imagine that I have a simple tree like this:
struct Tree<E> {
elem: E,
children: ~[Tree<E>]
}
Now let’s try to implement the pre-order traversal method for such a tree. You might think you could do it like this:
fn each<E>(t: &Tree<E>, f: &fn(&E) -> bool) {
if !f(&t.elem) {
return;
}
for t.children.each |child| { each(child, f); }
}
While this will compile, it will not work as expected. For example, this program:
fn main() {
let t = Tree {
elem: 0,
children: ~[
Tree { elem: 1, children: ~[
Tree { elem: 2, children: ~[] }
] },
Tree { elem: 3, children: ~[] }
]
};
for each(&t) |e| {
io::println(fmt!("%d", *e));
if *e == 1 { break; }
}
}
should print “0” and “1”, but it prints “0”, “1”, and “3”. The reason
is that while each()
does indeed return early when the iteration
function returns false, it doesn’t abort the entire iteration, only
the current subtree.
One way to fix this is to wrap the each()
function with an inner
each function that returns a bool to indicate whether execution should
stop:
fn each1<E>(t: &Tree<E>, f: &fn(&E) -> bool) {
each_inner(t, f);
fn each_inner<E>(t: &Tree<E>, f: &fn(&E) -> bool) -> bool {
if !f(&t.elem) {
return false;
}
for t.children.each |child| {
if !each_inner(child, f) {
return false;
}
}
return true;
}
}
Making each()
composable
I think that we should change the standard each
signature to:
fn each<E>(c: &Coll<E>, f: &fn(&E) -> bool) -> bool
Here the return value of each
is always a boolean, and it will be false
if the last call to f()
returned false, and true otherwise. This
makes it easier to write composed each()
methods. We would also
adjust for
statements so that they always return unit and do not
return the result of each()
.
Under this definition, we could write the tree iterator as follows:
fn each2<E>(t: &Tree<E>, f: &fn(&E) -> bool) -> bool {
f(&t.elem) && t.children.each(|c| each2(c, f))
}
This is clearly an improvement over each1()
!