Authors: R Avnur, J Hellerstein
Run-time variations in selectivity and in network transfer rates were largely ignored by previous query optimizers. An eddy reorders operators while the query plan is running, in contrast to the traditional coarse-grained, compile-time plan choices.
Examples to motivate the system:
- Salary > 10K is correlated with age (so the selectivities are not independent).
- The network speed changed, and a relation that was streaming fast is now streaming slowly.
Here is an image that illustrates what’s going on!
An eddy is a query processing operator that continuously reorders the application of pipelined operators in a query plan, on a tuple-by-tuple basis. The more “moments of symmetry” an operator has (i.e. points where reordering requires no blocking), the more an eddy can do to speed things up. One example for joins is Ripple Join, a variant of nested loop join that allows streaming from both relations at varying rates.
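The ripple-join idea above can be sketched in a few lines. This is a minimal illustration (the function name and structure are mine, not the paper's code): alternately pull one tuple from each input and join it against every tuple already seen from the other input, so neither side has to be fully consumed first and the pull rates can be rebalanced at any moment of symmetry.

```python
# Minimal ripple-join sketch (illustrative, not the paper's implementation):
# alternately draw one tuple from each input and join it against all tuples
# previously seen from the other input.
def ripple_join(left, right, predicate):
    seen_left, seen_right = [], []
    left_iter, right_iter = iter(left), iter(right)
    while True:
        progressed = False
        for it, mine, others, flip in (
            (left_iter, seen_left, seen_right, False),
            (right_iter, seen_right, seen_left, True),
        ):
            try:
                t = next(it)
            except StopIteration:
                continue  # this input is exhausted; keep draining the other
            progressed = True
            mine.append(t)
            for o in others:
                pair = (o, t) if flip else (t, o)
                if predicate(*pair):
                    yield pair
        if not progressed:
            return  # both inputs exhausted

# Example: equi-join on value.
# list(ripple_join([1, 2, 3], [2, 3, 4], lambda a, b: a == b))
# → [(2, 2), (3, 3)]
```

A real ripple join also varies the per-side rates (the "rectangle aspect ratio") instead of strictly alternating, but the symmetry is the point: either side can be favored at any step.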
Eddies are plugged into River. From the original paper: River is “a data-flow programming environment and I/O substrate for clusters of computers” and “this is achieved using two basic system mechanisms: a distributed queue (DQ) balances work across consumers of the system, and a data layout and access mechanism called graduated declustering (GD) dynamically adjusts the load generated by producers”.
One thing needed to make eddies work in River is choosing which relations to put into the same eddy for joining (equivalent to choosing a spanning tree of the query graph). The initial heuristic is to choose joins whose result sets won't be too large (probably some hand-tuned threshold in the research prototype).
Now, because there is a set of operators, the eddy needs to keep state and not release a tuple until it is done, and this is achieved via simple labeling (with Ready and Done bits). I'm a bit confused how this works with River, because it seems that an eddy is itself a queue that River has… Overlapping functionality? Or a similar idea at different granularities?
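A rough sketch of the Ready/Done bookkeeping, with my own names (the paper stores these bits in each tuple's descriptor): a tuple enters the eddy with a Ready bit set for every eligible operator; routing to an operator clears that operator's Ready bit, the operator sets its Done bit on return, and the tuple leaves the eddy once all Done bits are set.

```python
# Illustrative sketch of Ready/Done bitmaps (names are mine, not the
# paper's code). Bit i corresponds to operator i.
NUM_OPS = 3
ALL_DONE = (1 << NUM_OPS) - 1

class TupleState:
    def __init__(self, data):
        self.data = data
        self.ready = ALL_DONE  # initially eligible for every operator
        self.done = 0          # no operator has processed it yet

    def route_to(self, op_index):
        # Hand the tuple to an operator: it is no longer ready for that op.
        self.ready &= ~(1 << op_index)

    def returned_by(self, op_index):
        # Operator passed the tuple back: mark that op as done.
        self.done |= 1 << op_index

    def finished(self):
        return self.done == ALL_DONE

# A tuple flows through the three operators in some data-driven order:
t = TupleState({"salary": 12000})
for op in (2, 0, 1):
    t.route_to(op)
    t.returned_by(op)
assert t.finished()
```

In the paper, joins can also manipulate the Ready bits of other tuples (e.g. output tuples inherit bits), but the bitmap per tuple is the core trick.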
“An eddy module directs the flow of tuples from the inputs through the various operators to the output, providing the flexibility to allow each tuple to be routed individually through the operators. The routing policy used in the eddy determines the efficiency of the system.”
“An eddy’s tuple buffer is implemented as a priority queue with a flexible prioritization scheme. An operator is always given the highest-priority tuple in the buffer that has the corresponding Ready bit set”
The naive eddy algorithm does well because an edge in River has fixed-size queues, so production is limited by consumption (back-pressure). If the eddy passes data to whichever operator is free, utilization increases when operators have different costs.
The problem comes when selectivities vary. The authors use lottery scheduling to enhance the eddy. It works as follows:
- Each time the eddy sends a tuple to an operator, that operator is credited one ticket; each time the operator returns a tuple to the eddy, one ticket is debited from the eddy's running count for that operator.
- When an eddy is ready to send a tuple to be processed, it “holds a lottery” among the operators eligible for receiving the tuple.
Combined, these two rules encourage operators that consume tuples quickly (low cost) and return few of them (high selectivity) to receive more work. The ticket counts capture overall run-time performance, not just a single precomputed optimal plan.
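The two rules above can be sketched directly (class and method names are mine): credit a ticket when the eddy hands an operator a tuple, debit when the operator returns one, and route each new tuple by a lottery weighted by ticket counts among the eligible operators.

```python
import random

# Hedged sketch of the lottery routing policy (illustrative names):
# fast, selective operators accumulate tickets and win more lotteries.
class Lottery:
    def __init__(self, num_ops):
        # Start every operator with one ticket so none is starved.
        self.tickets = [1] * num_ops

    def on_send(self, op):
        self.tickets[op] += 1  # credit: eddy handed the operator a tuple

    def on_return(self, op):
        # Debit: the operator returned a tuple; floor at 1 ticket.
        self.tickets[op] = max(1, self.tickets[op] - 1)

    def pick(self, eligible):
        # Weighted lottery among operators whose Ready bit is set.
        weights = [self.tickets[op] for op in eligible]
        return random.choices(eligible, weights=weights, k=1)[0]

# Usage: operator 0 consumes 5 tuples and returns none (very selective),
# so its ticket count grows and it tends to win future lotteries.
lot = Lottery(3)
for _ in range(5):
    lot.on_send(0)
```

A selective filter rarely returns tuples, so its debits are rare and its ticket balance stays high, which is exactly the behavior the notes describe.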
Here are some results:
When there are dynamic fluctuations, the authors further introduce a window scheme so that the lottery tickets don't reach too far back in history (and reflect more recent behavior).
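A simplified sketch of the windowing idea (details and names are mine; the paper uses an escrow-style scheme): tickets are accumulated in a current window while routing uses the counts from the previous window, and the windows swap every W events, so old behavior ages out.

```python
# Simplified windowed-ticket sketch (illustrative, not the paper's exact
# escrow mechanism): the lottery reads `previous` while `current` fills.
class WindowedTickets:
    def __init__(self, num_ops, window=100):
        self.window = window
        self.events = 0
        self.current = [1] * num_ops   # counts being accumulated now
        self.previous = [1] * num_ops  # counts the lottery actually uses

    def credit(self, op):
        self.current[op] += 1
        self._tick()

    def debit(self, op):
        self.current[op] = max(1, self.current[op] - 1)
        self._tick()

    def _tick(self):
        self.events += 1
        if self.events >= self.window:
            # Swap windows: recent behavior becomes the routing weights.
            self.previous = self.current
            self.current = [1] * len(self.previous)
            self.events = 0

    def weight(self, op):
        return self.previous[op]
```

With a small window, an operator that was cheap an hour ago but is slow now loses its advantage within W events instead of coasting on history.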
Q: why do joins capture state?
This was very abstract initially, but the example in Section 2.1 really helped. In a merge join, the cursor into one relation must wait until the other relation's cursor reaches a value at least as large as the one currently being inspected.
Q: why do we care about the interleaving of requests?
Interleaving means we might need to block and wait (and keep around extra state).
Q: how is this different from a scheduler?
It seems that a scheduler needs global knowledge, while an operator can encapsulate just local knowledge; as a result it also keeps less state. Eddies also make a decision per tuple, a granularity a scheduler usually cannot reach.
- It seems that thinking about reordering has to be end-to-end, like how eddies depend on the ripple join algorithm. Though I guess the good thing is that there always exists a reordering-friendly alternative.
Adaptivity was not a new idea; there were earlier techniques like “query scrambling” by Urhan et al., an iterative process of (1) rescheduling operators when a delay is detected and (2) synthesizing new operators when the plan stalls. Eddies take the notion of adaptivity further, down to the level of in-flight operators, digging into the semantics of the join algorithms to make it happen.