The Unreasonable Effectiveness of Linear Search

### [ 2018-September-10 10:53 ]

Thanks to programming interviews, we have memorized that a binary search performs O(log n) comparisons, while a linear search performs O(n), so a binary search is going to be faster. Unfortunately, this ignores the constant factors. Linear search is the same or slightly faster for arrays of less than about 100 integers, since it is simpler than a binary search. This ignores the cost of sorting the array, so the advantage could be slightly larger for real programs. My conclusion is that if you only have a tiny number of elements (say less than ~30), it probably doesn't really matter what data structure you use, and an old fashioned array is a good choice due to simplicity and memory efficiency. For larger data sets the algorithmic advantages are huge and you need to choose carefully.

I was recently reminded of this fact when I was browsing the ChangeLog for the STX B-Tree, a C++ in-memory B-Tree. The first line was "changing find_lower() to not use binary search for small node sizes." After investigating, it was changed because it improved performance. I wasted a lot of time during my PhD tweaking an in-memory tree and had "discovered" the same optimization. I remember writing some tiny benchmark and being surprised that it remained true for fairly large values of N. I decided to revisit this and wrote a quick-and-dirty benchmark. The short summary is that up to about 100 integers, the linear search is better or competitive. This is amazing if you consider the comparisons. With 100 elements, the linear search performs on average 50 comparisons, while the binary search performs only 6 or 7, so it is doing about 10X more "work" in the same amount of time.

If you really like optimization, you should read two posts that use vector and conditional move instructions to optimize both the linear and binary search. The first by Paul Khuong from 2012 concludes that "binary search is faster for short [...] and long vectors". The second from 2017 revisits these results and finds that vectorized linear search is at least competitive up to 128 elements, but concludes "there is no reason to prefer linear search over binary search." These results, and mine below, show that the actual difference is small and can often vary depending on compiler optimizations and the differences between specific CPUs.

The chart below shows my benchmark results for the time for an individual lookup for the different implementations.

The implementations are as follows:

• binary/linear: Binary or linear search of an array (`std::vector`).
• plain/split: Without the suffix the array contains (key, value) pair structs. With `split` the keys and values are in separate arrays.
• map: Using a binary tree implemented using `std::map`.

As expected for small N, linear search performs slightly better than binary search (29.6 ns compared to 35.7 ns, a 27% improvement). Surprisingly, the `std::map` implementation is faster than binary search on an array. This suggests to me that I either have a bug in my code, or there is something about my benchmark that causes Clang's optimizer to do something strange. I don't understand how the pointer chasing the happens with a binary tree could possibly be better than a plain binary search over an array. As the data sets get much larger, binary search is better than the map. Possibly this just means that modern CPUs are really good at pointer chasing, if the data is all in the L1 cache.

The test was run on a Google Cloud n1-highcpu-2 machine on their "Intel Skylake" platform. The code was compiled using: `clang++-6.0 -O3 --std=c++11 -Wall -Werror -march=native -mtune=native -o linearsearch linearsearch.cc`, using `clang version 6.0.0-1~bpo9+1`.

Download the code: `linearsearch.cc`.