Authors: G Graefe W McKenna
The Volcano Optimizer Generator takes as inputs data models, logical and physical algebra, and optimizations rules and generates optimizer source code that could take queries and generate an optimized physical plan. The image below illustrates how it fits into the DBMS pipeline:
What exactly is a optimizer generator? “Optimization consists of mapping a logical algebra ex- pression into the optimal equivalent physical algebra ex- pression. In other words, a generated optimizer reorders operators and selects implementation algorithms.”
It is different from previous attempts of this sort, such as EXODUS, as it supports more features (See the Comparison below).
- Make it work for non-relational engines, such as object.
- Using rules is an extensible way to describe inputs.
- Algebraic equivalences (as sopposed to more imperative alternative).
- Rule interpretation vs. compilation: they chose compilation for speed.
How the Sausage Gets Made
I’m not sure how this section was tied together — seems like some discrete ideas, but either way here they are:
- How to store the properties? Logical properties are attached to equivalence classes - sets of equivalent logical expres- sions and plans - whereas physical properties are attached to specific plans and algorithm choices.
- How to derive the logical and physical properties? Property Functions. Note also that the set of physical properties is summarized for each intermediate result in a physical property vector, which is defined by the optimizer implementor and treated as an abstract data type by the Volcano optimizer generator and its search engine
- How are the properties used?
- Some of the physical algebra operators do not correspond to logical algebra operators (e.g. sort/decompression) but some of them are required so these operators are called enforcers.
- How does the generated optimizer work? “In order to decide whether or not an algorithm or enforcer can be used to execute the root node of a logical expression, a generated optimizer matches the implementation rule, executes the condition code associated with the rule, and then invokes an applicability function that determines whether or not the algorithm or enforcer can deliver the logical ex- pression with physical properties that satisfy the physical property vector.”
Based on dynamic programming (as was System R’s) and made to be very goal oriented (because of bad experience in EXODOS) — they use backward chaining and call it “directed dynamic programming”.
This is 14 years after the System R optimizer, also after EXODOS (1987). The main differences seem to be:
- more expressive/powerful: non-trivial cost models (replacing heuristic with more dynamic evaluation)
- better design: the authors who made EXODOS mentioned that EXODOS’ MESH design was too cumbersome due to unclean abstraction.
- MESH was a cached layer of intermediate result, and the plans have to be reevaluated when MESH is updated, which the authors mentioned was a huge performance issue. The number of nodes was high in MESH because the physical and logical algebra were combined.
- Generalized abstraction for cost (no longer just a single numeric value).
- better algorithms: dynamic programming, goal directed search, and branch-and-bound pruning.
- The authors mentioned that EXODUS evaluates too eagerly whereas Volcano goes top down and prunes faster.
Rant: I didn’t quite like the way the paper was structured. The comparison and design principles seem to be sprinkled into a mix of good ideas but lacking structure.