Year: 2000

Authors: JM Hellerstein, Avnur and Raman


This is a multi-year project that accumulated multiple research papers, all with the shared goal of developing systems for interactive analysis of large data sets.

Online Query Processing Applications

It started with online aggregation, which samples the underlying data and returns running estimates with statistical guarantees. That in turn motivated the online query processing algorithms: ripple join and online reordering. The alternative approach is to pre-compute summaries and give approximate results based on those.
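To make the idea of a running estimate with a statistical guarantee concrete, here is a minimal sketch of an online AVG. It is my own illustration, not the paper's estimator: the name `online_avg`, the `report_every` parameter, and the use of a plain normal-approximation interval (with no finite-population correction) are all assumptions, and the in-memory shuffle stands in for random delivery from storage.

```python
import math
import random

def online_avg(values, z=1.96, report_every=1000, seed=0):
    """Yield (rows_seen, running_mean, half_width) while scanning in random order.

    half_width is z * sqrt(sample_variance / n), a CLT-style interval.
    This is an illustrative assumption, not the estimator from the paper.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)            # stand-in for random delivery of tuples
    n, mean, m2 = 0, 0.0, 0.0     # Welford's running mean and squared deviations
    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0 or n == len(order):
            var = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(var / n)

if __name__ == "__main__":
    for seen, est, half in online_avg(range(10_000)):
        print(f"{seen:>6} rows: avg = {est:.1f} +/- {half:.1f}")
```

The user can stop as soon as the interval is tight enough, which is the whole point: the estimate is usable long before the scan finishes.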

The online aggregation line of work inspired online enumeration, which is essentially streaming in tuples (marketed as “scalable spreadsheets”).

If we combine the above two we get online data visualization!

And if we take a step further and start supporting complex algorithms we could start supporting online data mining.

Algorithms for Online Query Processing

The initial research called for better algorithms to make these interfaces possible.

First, to support sampling and statistical guarantees, the system needs to support random delivery, that is, handing tuples to the operators in random order.

Next is online reordering, which lets the user control what they see first (since time is the most expensive thing!). Index stride helps with online reordering by using an index to find the key ranges and then reading directly from each range in turn, so that the tuples delivered are evenly split across the value space.
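A rough sketch of the index stride idea, under my own assumptions: a dictionary of per-group lists stands in for a real index on the grouping column, and the hypothetical `weights` argument models user preferences (a larger weight means more rows from that group per cycle, 0 means paused). The names `index_stride`, `key`, and `weights` are illustrative and do not come from the paper.

```python
from collections import defaultdict

_DONE = object()  # sentinel marking an exhausted group

def index_stride(rows, key, weights=None):
    """Yield rows by cycling over groups, taking weights[g] rows per group per cycle."""
    buckets = defaultdict(list)       # stand-in for an index on the grouping column
    for row in rows:
        buckets[key(row)].append(row)
    weights = weights or {}
    cursors = {g: iter(b) for g, b in buckets.items()}
    while cursors:
        progressed = False
        for g in list(cursors):
            for _ in range(weights.get(g, 1)):
                row = next(cursors[g], _DONE)
                if row is _DONE:
                    del cursors[g]    # group fully delivered
                    break
                progressed = True
                yield row
        if not progressed:
            return                    # only paused (weight 0) groups remain

if __name__ == "__main__":
    data = [("a", 1), ("a", 2), ("a", 3), ("b", 10), ("c", 20), ("c", 21)]
    print(list(index_stride(data, key=lambda r: r[0])))
    print(list(index_stride(data, key=lambda r: r[0], weights={"a": 2})))
```

Without weights, small groups get represented early instead of being drowned out by large ones; raising a group's weight is one simple way to model "speed up" for that group.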

Joins are probably the most expensive operator. Ripple join avoids the large blocking cost of a traditional join by streaming result tuples out as soon as they are found, without waiting for the whole join to finish.
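Here is a minimal sketch of a "square" ripple join as I understand it: each step pulls one new tuple from each input and joins it against everything seen so far from the other side, so matches stream out immediately. The function name and the in-memory lists are assumptions for illustration; the sketch ignores the random ordering of inputs, the statistical estimators, and the adaptive sampling rates that a real implementation would need.

```python
_END = object()  # sentinel marking an exhausted input

def ripple_join(r_rows, s_rows, predicate):
    """Yield matching (r, s) pairs incrementally; every pair is tested exactly once."""
    seen_r, seen_s = [], []
    r_iter, s_iter = iter(r_rows), iter(s_rows)
    r_done = s_done = False
    while not (r_done and s_done):
        if not r_done:
            new_r = next(r_iter, _END)
            if new_r is _END:
                r_done = True
            else:
                # Join the new R tuple with all S tuples seen so far.
                for old_s in seen_s:
                    if predicate(new_r, old_s):
                        yield (new_r, old_s)
                seen_r.append(new_r)
        if not s_done:
            new_s = next(s_iter, _END)
            if new_s is _END:
                s_done = True
            else:
                # Join the new S tuple with all R tuples seen so far,
                # including the one just added in this step.
                for old_r in seen_r:
                    if predicate(old_r, new_s):
                        yield (old_r, new_s)
                seen_s.append(new_s)

if __name__ == "__main__":
    for pair in ripple_join([1, 2, 3, 4], [3, 1, 4, 5], lambda r, s: r == s):
        print(pair)
```

The total work is the same as a nested-loop join, but the order of the work changes: results over a growing sample of both inputs are available at every step, which is what an online aggregate on top of the join needs.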

End to End Systems

Online processing makes more demands on the client than traditional query processing and requires a more complex GUI to capture user intentions. The output API contains CONFIDENCE_AVG, and the input API contains group preference specifications such as pause group(x) and speed up group(x).

Traditional query optimizers don't fully apply to this new paradigm, so Eddies and River were introduced (and I think the work was carried on into the TelegraphCQ stream processing ideas as well).

Random comment: I was surprised that this didn’t take off commercially. I guess people would rather be wrong than uncertain…