Matteo Frigo, Charles Leiserson, Harald Prokop, Sridhar Ramachandran:
Cache-Oblivious Algorithms
This paper introduced a new model of computation, a cache-oblivious algorithm, that proved to be very influential, both in theoretical and systems research. Standard (RAM) algorithms make the assumption that access to all memory locations incurs the same time delay, an assumption that is obviously violated when multiple levels of caches are present, leading to poor performance of the algorithms in certain settings. The cache-oblivious model addresses this problem while still being simple enough to enable theoretical analysis: Different levels of the memory hierarchy have different costs and both the total work as well as the number of cache misses in the slowest cache are analyzed. The crux is, however, that a cache-oblivious algorithm works without knowing hardware parameters such as the size of the caches and the cache line length, which are often not known by the programmer or which can vary depending on where the program is running.
This paper also presents the first asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting in this model, which have been widely used in follow-up work.