Year: 2006

Authors: A Arasu, S Babu, J Widom

Notes: references from these slides

Summary

CQL is a language and execution framework for continuous queries (think streams). The context for when the language was introduced is when there was no agreed on semantics and when queries get more complex the situations becomes “murky”.

The authors contributed by classifying 2 data types: streams and relation. Based on these 2 data types, there are 3 classes of operations: (1) stream to relation, (2) relation to relation and (3) relation to stream.

Their goals for the language is to make it easy to use and hard to make mistakes, and the goals for the execution plan is for modularity, capture the stream semantics, and make experimentation easy (cool idea).

Semantics

Definitions

How is stream fundamentally different from a relation? The following definitions speak to the question:

  • (Definition 4.1) A stream S is a possibly infinite bag of elements <s, t>, where s is a tuple belonging to the schema of S and t in T (time domain) is the timestamp of the element.
    • Note that there is also the split of base stream and derived stream
  • (Defintion 4.2) A relation R is a mapping from T to a finite but unbounded bag of tuples belonging to the schema of R.

How to think about time

  • T is discrete and ordered
  • Inputs are either streams or relations
  • Output are either stream (up to t) or a relation at time t (R(t)).
  • Time only advances from t - 1 to t after all inputs up to t-1 have been processed. (yifan’s note: Bloom has a similar semantics here, very important for clarity of reasoning about things.)

Operators

Stream-to-Relation: sliding window, with three following classes,

  • time based: S [Range T]
  • tuple base: S [Rows N]
  • partitions: S [Partition By A1…Ak Rows N]

Relation-to-Stream (I think this is the most original one/harder to understand):

  • Insert stream: Istreami(R) contains a stream element <s,t> whenever tuples s is in R(t)-R(t-1)
  • Delete stream: Dstream(R) similar to above, flip the subtraction: R(t-1)-R(t)
  • Relation stream Rstream whenever tuple is in R at time t.

Note that to derive stream to stream semantics (which is what TelegraphQL has).

Equivalences

There are 2 new stream-based equivalences in CQL:

  • window reduction
    • example: SELECT Istream(L) FROM S [Range Unbounded] WHERE C is equivalent to SELECT Rstream(L) FROM S [Now] WHERE C
  • filter-window commutativity:
(SELECT L
 FROM S
 WHERE C) [Range T]

is equivalent to

SELECT L
FROM S [Range T]
WHERE C

Execution

Reordering: heartbeats

Assumption: The network conveying the stream elements to the DSMS may not guarantee in-order transmission.

Heartbeat: it’s a timestamp, after which the tuples whose timestamp are smaller (older) are rejected. It could be genearted via a clock of the DSMS when the tuples arrive (that way nothing is rejected), or it could be the source of the input stream.

Question: it seems strange to just drop stream elements if there are reordering though.

Running queries continuously

  • Queues: Connect its input operator to its output operator. (yifan: reminds me of the exchange operator of Volcano)
  • Synopsis: Store the intermediate state needed by continuous query plans. (yifan: it’s almost like a materialized view but with stream semantics)

Query Optimization

  • The notion of “equivalences” has always been important for query optimization, and its no different for CQL! Some where mentioned in the earlier part.
  • Share synopses whenever possible