Cache-friendly Database Query Execution
Traditionally, query execution engines construct a tree of operators (called an execution plan) that describes how a query is to be executed. Rows are then passed one by one from the leaves up to the root of this tree. This allows executing the query with a relatively low memory footprint, since most operators do not require all rows to be available at once, and intermediate result sets rarely have to be materialized.
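This row-at-a-time style is often called the iterator (or "Volcano") model. A minimal sketch in Python, assuming hypothetical scan, select, and project operators and an illustrative users table (none of these names come from a specific engine):

```python
def scan(table):
    # Leaf operator: yields rows from a base table one at a time.
    for row in table:
        yield row

def select(child, predicate):
    # Filter operator: pulls rows from its child, passes matches upward.
    for row in child:
        if predicate(row):
            yield row

def project(child, columns):
    # Projection operator: keeps only the requested columns.
    for row in child:
        yield {c: row[c] for c in columns}

# Example plan for: SELECT name FROM users WHERE age >= 18
users = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": 17},
    {"name": "Eve", "age": 29},
]
plan = project(select(scan(users), lambda r: r["age"] >= 18), ["name"])
result = list(plan)  # rows flow one by one up the operator tree
```

Because the operators are generators, only one row is "in flight" at any time, which is what keeps the memory footprint low.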
However, this approach is not very cache-friendly: handing rows one by one through the tree causes frequent cache misses, at least one between the handling of each row. Along the lines of data-oriented design, an alternative approach is to linearize the tree into a sequence of transformations and apply each transformation to the entire intermediate result accumulated so far. This should significantly reduce the number of cache misses, as each operation works linearly through a contiguous area of memory. However, the memory footprint of this approach can be expected to be higher, since intermediate results must be fully materialized between transformations.
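The linearized alternative can be sketched as follows: the plan becomes a flat list of whole-batch transformations, each consuming the full intermediate result before the next one runs. The function and table names are hypothetical, chosen to mirror the sketch of the tree-based model:

```python
def select_all(rows, predicate):
    # Batch filter: scans the whole intermediate result in one pass.
    return [r for r in rows if predicate(r)]

def project_all(rows, columns):
    # Batch projection: materializes a new intermediate result.
    return [{c: r[c] for c in columns} for r in rows]

users = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": 17},
    {"name": "Eve", "age": 29},
]

# Linearized plan for: SELECT name FROM users WHERE age >= 18
pipeline = [
    lambda rows: select_all(rows, lambda r: r["age"] >= 18),
    lambda rows: project_all(rows, ["name"]),
]

result = users
for step in pipeline:
    result = step(result)  # each step sweeps a contiguous intermediate result
```

Each step here allocates a fresh list, which illustrates the trade-off: sequential, cache-friendly sweeps in exchange for materializing every intermediate result.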
The goal of this thesis is to devise methods for linearizing a given execution plan while keeping the memory overhead as small as possible. These methods should then be benchmarked against an implementation of a traditional execution plan, to determine whether performance improves and how much additional memory they require.