Authors: V Harinarayan, A Rajaraman, J Ullman
It was the early days of OLAP and the then hot “Decision Support Systems”. The authurs focus on choosing the right subset to materialize, Harinarayan et al proposed the lattice framework for expressing “dependencies” (defined as: can be answered from) among views. Desphande et al view CUBE computation as hierarchy of group-by operations. Though they didn’t mention lattice explicitly its the same geometric object since the edges are defined in the same way.
SIGMOD 2006 Test of Time Award!
“The paper presents an elegant formulation of the problem of selecting views to materialize in a data cube to speed-up query processing. It proposes a simple solution and an interesting analysis. This paper has had significant impact in terms of the way one now thinks about materialized data cubes and on implementations in commercial products.”
The issue with CUBEs is that there are space constraints (back then, now is more time constraints), so the question is which cuboids (group by’s) should the system materialize (just the count, not total space) to get reasonable performance (average time taken to evaluate a set of queries)?
- Based on the lattice framework, developed a linear cost model: the time to answer a query is taken to be equal to the space occupied by the view from which the query is answered. Sometimes the intermediate states dont fit into memory, but that on average requires a small number of further passes (2, per experiment conducted).
- Based on the cost model, developed greedy algorithm to select the best set of views (i.e. cuboids) that improves the cost of evaluating other groups (including itself) given existing materialized choices. For instance, if we have ABC materialized, then adding AB will decrease the cost of AB and A along with B. The gain of which one would be better depends on the statistic of the tables.
The authors did mention some of the simplifying assumptions could be relaxed to represent a more realistic world.