Performance

Which version of our grep functions is faster: the version with an explicit for loop or the version with iterators? We ran a benchmark by loading the entire contents of "The Adventures of Sherlock Holmes" by Sir Arthur Conan Doyle into a String and looking for the word "the" in the contents. Here are the results for the version of grep using the for loop and the version using iterators:

test bench_grep_for  ... bench:  19,620,300 ns/iter (+/- 915,700)
test bench_grep_iter ... bench:  19,234,900 ns/iter (+/- 657,200)
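
Output in this format comes from Rust's built-in benchmark harness, which at the time of writing is only available on the nightly toolchain. We won't reproduce the exact code we used, but a minimal sketch of how a benchmark like this could be set up looks something like the following; the grep_for and grep_iter helpers and the sherlock.txt file are illustrative stand-ins, not the project's real names:

#![feature(test)] // the built-in benchmark harness is nightly-only
extern crate test;

use test::Bencher;

// Hypothetical stand-ins for the two implementations being compared;
// both have the same signature as the I/O project's search function.
fn grep_for<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }
    results
}

fn grep_iter<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents.lines()
            .filter(|line| line.contains(query))
            .collect()
}

// Assumes the full text of the book has been saved next to this file
// as sherlock.txt.
static CONTENTS: &str = include_str!("sherlock.txt");

#[bench]
fn bench_grep_for(b: &mut Bencher) {
    b.iter(|| grep_for("the", CONTENTS));
}

#[bench]
fn bench_grep_iter(b: &mut Bencher) {
    b.iter(|| grep_iter("the", CONTENTS));
}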

The iterator version ended up slightly faster! We won't go through the benchmark code in any more detail than the sketch above, because the point is not to prove that the two versions are exactly equivalent, but to get a general sense of how these two implementations compare. For a real benchmark, you'd want to check various texts of various sizes, different words, words of different lengths, and all kinds of other variations.

The point is this: iterators, although a high-level abstraction, get compiled down to roughly the same code as if you'd written the lower-level code yourself. Iterators are one of Rust's zero-cost abstractions, by which we mean that using the abstraction imposes no additional runtime overhead, in the same way that Bjarne Stroustrup, the original designer and implementer of C++, defines zero-overhead:

In general, C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.

  • Bjarne Stroustrup, "Foundations of C++"

As another example, here is some code taken from an audio decoder. This code uses an iterator chain to do some math on three variables in scope: a buffer slice of data, an array of 12 coefficients, and an amount by which to shift data in qlp_shift. We've declared the variables within this example but not given them any values; while this code doesn't have much meaning outside of its context, it's still a concise, real-world example of how Rust translates high-level ideas to low-level code:

let buffer: &mut [i32];
let coefficients: [i64; 12];
let qlp_shift: i16;

for i in 12..buffer.len() {
    let prediction = coefficients.iter()
                                 .zip(&buffer[i - 12..i])
                                 .map(|(&c, &s)| c * s as i64)
                                 .sum::<i64>() >> qlp_shift;
    let delta = buffer[i];
    buffer[i] = prediction as i32 + delta;
}

To calculate the value of prediction, this code iterates through each of the 12 values in coefficients and uses the zip method to pair each coefficient with one of the previous 12 values in buffer. Then, for each pair, it multiplies the values together, sums all the results, and shifts the bits in the sum qlp_shift bits to the right.
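
To make that calculation concrete, here is a self-contained version of the same loop you can run, with arbitrary made-up values filled in for buffer, coefficients, and qlp_shift; the numbers are purely illustrative and don't come from a real decoder:

fn main() {
    // Arbitrary sample values so the snippet runs; a real decoder
    // would fill these from the encoded audio stream.
    let buffer: &mut [i32] = &mut [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 0, 0, 0];
    let coefficients: [i64; 12] = [2, -1, 3, 0, 1, 1, -2, 4, 0, 1, 2, 1];
    let qlp_shift: i16 = 4;

    for i in 12..buffer.len() {
        // Pair each coefficient with one of the previous 12 samples,
        // multiply each pair, sum the products, then shift the sum
        // right by qlp_shift bits before adding the stored delta.
        let prediction = coefficients.iter()
                                     .zip(&buffer[i - 12..i])
                                     .map(|(&c, &s)| c * s as i64)
                                     .sum::<i64>() >> qlp_shift;
        let delta = buffer[i];
        buffer[i] = prediction as i32 + delta;
    }

    println!("{:?}", buffer);
}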

Calculations in applications like audio decoders often prioritize performance most highly. Here, we're creating an iterator, using two adaptors, and then consuming the value. What assembly code would this Rust code compile to? Well, as of this writing, it compiles down to the same assembly you'd write by hand. There's no loop at all corresponding to the iteration over the values in coefficients: Rust knows that there are twelve iterations, so it "unrolls" the loop, an optimization that removes the overhead of the loop-control code and instead generates repeated straight-line code for each iteration. All of the coefficients get stored in registers (which means accessing the values is very fast). There are no bounds checks on the array access. It's extremely efficient.

Now that you know this, go use iterators and closures without fear! They make code feel higher-level, but don't impose a runtime performance penalty for doing so.

Summary

Closures and iterators are Rust features inspired by functional programming language ideas. They contribute to Rust's ability to clearly express high-level ideas. The implementations of closures and iterators, as well as other zero-cost abstractions in Rust, are such that runtime performance is not affected.

Now that we've improved the expressiveness of our I/O project, let's look at some more features of Cargo that will help us get ready to share the project with the world.