This post is a reply / extension to Shane Osbourne’s post on binary search. So please give that a read before continuing here. It’s a great introduction to the algorithm with animations and all.

Motivation

One big selling points of Rust is zero-cost abstractions. So using zero-cost abstractions arguably makes Rust code more idiomatic. The binary search implementation presented in the original offers some potential into that direction.

Binary search also may optionally provide the position where an element would fit, if it were to be inserted.

In this post I want to explore some of these possibilities.

Base

This is the code from the original blog post.

pub fn binary_search(k: i32, items: &[i32]) -> Option<usize> {
    if items.is_empty() {
        return None;
    }

    let mut low: usize = 0;
    let mut high: usize = items.len() - 1;

    while low <= high {
        let middle = (high + low) / 2;
        if let Some(current) = items.get(middle) {
            if *current == k {
                return Some(middle);
            }
            if *current > k {
                if middle == 0 {
                    return None;
                }
                high = middle - 1
            }
            if *current < k {
                low = middle + 1
            }
        }
    }
    None
}

More Types

The original code only works on the i32 type. Using the std::cmp::Ord trait we can use the function for all number types and even all other types that define a natural ordering.

use std::cmp::Ord;

pub fn binary_search<T:Ord>(k: &T, items: &[T]) -> Option<usize> {
    if items.is_empty() {
        return None;
    }

    let mut low: usize = 0;
    let mut high: usize = items.len() - 1;

    while low <= high {
        let middle = (high + low) / 2;
        if let Some(current) = items.get(middle) {
            if current == k {
                return Some(middle);
            }
            if current > k {
                if middle == 0 {
                    return None;
                }
                high = middle - 1
            }
            if current < k {
                low = middle + 1
            }
        }
    }
    None
}

#[test]
fn test() {
    let numbers = [1,3,8,10];
    assert_eq!(Some(1), binary_search(&3, &numbers));
    assert_eq!(Some(3), binary_search(&10, &numbers));
    assert_eq!(None, binary_search(&11, &numbers));
    assert_eq!(None, binary_search(&-5, &numbers));
    assert_eq!(None, binary_search(&4, &numbers));
}

Note that for consistency with more complex types, we have to pass in the key to search for as a reference now, e.g. &3 instead of just 3.

Insertion Index

When a key cannot be found, it might be relevant, where in the input the key would have to be inserted. Java does this by returning a negative number instead (e.g. -2 if the key should be inserted at index 1). But in Rust we can use the Result type: If the key is found, we return its index inside an Ok, and if it is not found, we return the index, where the element would fit inside an Err.

use std::cmp::Ord;

pub fn binary_search<T:Ord>(k: &T, items: &[T]) -> Result<usize, usize> {
    if items.is_empty() {
        return Err(0);
    }

    let mut low: usize = 0;
    let mut high: usize = items.len() - 1;

    while low <= high {
        let middle = (high + low) / 2;
        if let Some(current) = items.get(middle) {
            if current == k {
                return Ok(middle);
            }
            if current > k {
                if middle == 0 {
                    return Err(0);
                }
                high = middle - 1
            }
            if current < k {
                low = middle + 1
            }
        }
    }
    Err(low)
}

#[test]
fn test() {
    let numbers = [1,3,8,10];
    assert_eq!(Ok(1), binary_search(&3, &numbers));
    assert_eq!(Ok(3), binary_search(&10, &numbers));
    assert_eq!(Err(4), binary_search(&11, &numbers));
    assert_eq!(Err(0), binary_search(&-5, &numbers));
    assert_eq!(Err(2), binary_search(&4, &numbers));
}

And indeed, this is the exact way, the standard library provides binary search on Vec and on slices. Looks like we reached maximum idiomaticity! Time to stop, right? Right??

Returning Less

A Result<usize,usize> takes up more space than a usize, because the compiler cannot assume any usize values to be unattainable in this case. Tackling this issue is definitely a strong case of premature optimization, but the Java folks are more efficient than us with their negative number trick, and we can’t have Java be more efficient than Rust, can we?

We can do better, if we really want to: We can create our own return type to save space, using the negative number trick internally. So we get the efficiency of Java mixed with Rust’s convenience.1

#[derive(Debug,PartialEq,Eq)]
pub struct BinarySearchResult {
    data: isize
}

impl BinarySearchResult {
    pub fn get_index(&self) -> usize {
        if !self.is_found() {
            panic!("key not found")
        }
        self.data as usize
    }
    pub fn is_found(&self) -> bool {
        self.data >= 0
    }
    pub fn get_insertion_index(&self) -> usize {
        if self.is_found() {
            panic!("key already exists")
        }
        (-1 - self.data) as usize
    }
    pub fn found(index: usize) -> Self {
        BinarySearchResult { data: index as isize}
    }
    pub fn not_found(insertion_index: usize) -> Self {
        BinarySearchResult { data: -(insertion_index as isize) - 1 }
    }
}

Be aware that this type loses the ability to talk about the index usize::MAX. This returns false on my machine.

BinarySearchResult::found(usize::MAX).is_found()

Rust programs panic for overflows and underflows in debug mode, but they do not panic for precision loss by casting. So to be 100% safe, some checks should be added to the found and not_found constructors.2

The BinarySearchResult has another disadvantage compared to Result: It cannot be used with the questionmark operator. In future Rust versions (or current nightly versions) our type can implement the Try trait to make this possible.

Taking Abstraction too far

So far we have only been searching on slices. But binary search works on anything we can index. For that there exists the Index trait. It comes without the indexable’s length though, so we would need to add that as an explicit parameter, which makes the function more cumbersome to use.

We can decide, if we only want to support Index<usize>, or if we want to go even further and support any index type that implements the operations we need (Add and Div or Shr).

That would be taking it way too far though, unless you actually have this requirement in your project.

Thanks for reading :-)

About This Blog

I haven’t blogged in a long while and I’m not happy with the setup my blog used to use anymore. But I’m also not ready with moving it to where I want it, yet. So this posting is kind of hacky, using Hugo locally with a default theme to generate it and then just manually scp-ing the generated page to the public html directory of my Uberspace. The post will move once the new blog setup is done and this link will become a redirect.


  1. This one is fun to quote out of context :D ↩︎

  2. Man, Java people don’t have those problems with their “all numbers are signed” policy. But there is still opportunity for Rust smugness here, because if a Java dev also were to create a new type for the result type to make it easier to use, it would have to be a class with heap allocation and garbage collection, ugh glad we’re not that guy. Also they can’t handle arrays longer than i32::MAX at all with their return type, which is way less than our isize::MAX on computers that weren’t made in the nineties, as Java was (BURN!). ↩︎