Authors: A Arasu, S Babu, J Widom
Notes: references from these slides
CQL is a language and execution framework for continuous queries (think streams). The context for when the language was introduced is when there was no agreed on semantics and when queries get more complex the situations becomes “murky”.
The authors contributed by classifying 2 data types: streams and relation. Based on these 2 data types, there are 3 classes of operations: (1) stream to relation, (2) relation to relation and (3) relation to stream.
Their goals for the language is to make it easy to use and hard to make mistakes, and the goals for the execution plan is for modularity, capture the stream semantics, and make experimentation easy (cool idea).
How is stream fundamentally different from a relation? The following definitions speak to the question:
- (Definition 4.1) A stream S is a possibly infinite bag of elements <s, t>, where s is a tuple belonging to the schema of S and t in T (time domain) is the timestamp of the element.
- Note that there is also the split of base stream and derived stream
- (Defintion 4.2) A relation R is a mapping from T to a finite but unbounded bag of tuples belonging to the schema of R.
How to think about time
- T is discrete and ordered
- Inputs are either streams or relations
- Output are either stream (up to t) or a relation at time t (R(t)).
- Time only advances from t - 1 to t after all inputs up to t-1 have been processed. (yifan’s note: Bloom has a similar semantics here, very important for clarity of reasoning about things.)
Stream-to-Relation: sliding window, with three following classes,
- time based: S [Range T]
- tuple base: S [Rows N]
- partitions: S [Partition By A1…Ak Rows N]
Relation-to-Stream (I think this is the most original one/harder to understand):
- Insert stream:
Istreami(R)contains a stream element <s,t> whenever tuples s is in
- Delete stream:
Dstream(R)similar to above, flip the subtraction:
- Relation stream
Rstreamwhenever tuple is in R at time t.
Note that to derive stream to stream semantics (which is what TelegraphQL has).
There are 2 new stream-based equivalences in CQL:
- window reduction
SELECT Istream(L) FROM S [Range Unbounded] WHERE Cis equivalent to
SELECT Rstream(L) FROM S [Now] WHERE C
- filter-window commutativity:
(SELECT L FROM S WHERE C) [Range T]
is equivalent to
SELECT L FROM S [Range T] WHERE C
Assumption: The network conveying the stream elements to the DSMS may not guarantee in-order transmission.
Heartbeat: it’s a timestamp, after which the tuples whose timestamp are smaller (older) are rejected. It could be genearted via a clock of the DSMS when the tuples arrive (that way nothing is rejected), or it could be the source of the input stream.
Question: it seems strange to just drop stream elements if there are reordering though.
Running queries continuously
- Queues: Connect its input operator to its output operator. (yifan: reminds me of the exchange operator of Volcano)
- Synopsis: Store the intermediate state needed by continuous query plans. (yifan: it’s almost like a materialized view but with stream semantics)
- The notion of “equivalences” has always been important for query optimization, and its no different for CQL! Some where mentioned in the earlier part.
- Share synopses whenever possible