Authors: Y Klonatos, C Koch et al EPFL


The authors argue for a high-level programming language to implement efficient engines. The key identified is to use generative programming that compiles to low level code, and unlock the ability to continuously optimize the query engine (or, on-the-fly for each individual query). They ran the experient for TPC-H benchmark and outperforms an in-memory database with very few lines of code.


This paper is based on some advanced compilation techniques and some low level programming details which I didn’t know about before reading the paper so laying them out here

First, compilers!

  • Low Level Virtual Machine, LLVM is a compiler middle layer that takes a (low level) intermediate representation (IR) code from a compiler and emitting an optimized one, which could be converted and linked to assembly.
  • Lightweight Modular Staging (LMS) compiler takes a program in Scala, converts to high level IR (one node in LMS IR could represent the creation of a hash map). Then LMS do transformation passes to apply all possible user defined optimizations. It provides many standard optimizations, such as “function inlining, common subexpression and dead code elimination, constant propagation, loop fusion, deforestation, and code motion”
    • An example optimization transformation to show why this could be useful is changing hash maps to native arrays. In general the optimization transformation capabilities is critical for LegoBase’s final performance.
    • One neat thing they could do is present-stage vs. future-stage, where present stage computation is executed right away and future-stage is placed later in the IR graph to be optimized by later optimization passes. This is useful for not loading all the data immediately but perhaps after some predicates are applied.

The authors mention that fundamentally LMS is better than LLVM for this task because “frameworks like LLVM cannot automatically detect the data-structure, data flow or operator optimizations that we support in LegoBase: the scope of optimization is too coarse-grained to be detected by a low-level compiler.”

System Design


  • get SQL query
  • generate query plan based on traditional query optimization (the authors chose a generic one)
  • LegoBase takes the physical plan and instantiates the operators with Scala functions (not very special)
  • Then apply domain specific LMS optimizations (this is where all the wins come in)
    • “We also use LMS to recompile the components of our query engine that are affected by a runtime change.” (e.g. no logging)
  • The authors extended LMS to generate C code, which is tricky because they have to implement garbage collection.
  • The C code is then compiled with generic cmake compiler.



(a) inter-operator optimizations for query plans

  • pull to push based: this is essentially reversing the dataflow, and could be achieved by changing the callees to callers (and vice versa). Operators that carry state, like joins, becomes tricky and the authors exhaust the iterator by calling the next function to eagerly evaluate everything. (yifan’s question: I wonder if they will see an improvement if they implemented eddies)
  • Eliminating Redundant Materializations: one motivating example here is template-based generation, where the operators are not aware of each other (yifan: what does aware mean?) Also dont quite follow the example in Figure 6. The idea seems to be if the operated object is note connected to the operator but its child then it doesn’t need to do the work anymore?

(b) transparent data-structure modifications

The authors demonstrate with HashMap as an example:

  • we convert the hash map to an array that stores only the values, and not the associated key
  • inline the function calls which improves branch prediction and cache locality (not sure why…)
  • preallocate all the necessary memory (as opposed adding as needed) and the size could be obtained by performing worst case analysis.

(c) changing the data layout: LegoBase makes it easy to switched from row base to column based layout for each operator.

(d) traditional compiler optimizations like dead code elimination: Apparently it also automatically skips row reconstruction when using column layout.


At the rate over which a new database comes out, it is compelling to have a high level language to program databases and abstract away a lot of the basic technique that presumably gets written over and over again.

I didn’t realize code rewriting could be so intelligent and transform based on such high level rules. Very cool!