Year: 2002

Authors: S Madden, MJ Franklin, JM Hellerstein, W Hong

Background

From wiki: wireless sensor networks are spatially distributed autonomous sensors to monitor physical or environmental conditions. There are different topologies for WSNs and information propagates through the hops sof the network. At Berkeley small sensor devices called motes, in addition to having a suite of sensors, radio, battery, also have a processor and memory. TinyOS is a basic design to faciliate deployments of motes in ad-hoc networks, which decouples the implementation from a fixed network topology. Often times the tasks require some data aggregation rather than raw reads. Up until the paper, practitioners use low level programming abstractions and program the aggregation specific to the application.

TAG is a service for aggregation for ad-hoc TinyOS networks of low-power, wireless sensors. It offers two main advantages:

  • simple declarative interface for data collection
  • intelligently distributes and executes agg queries in a time and power efficient manner, with fault tolerance.

Basic Architecture

TAG operates via the following steps:

  • users send queries from basestation
  • operators for the query are distributed into the network (piggy backing on existing protocol)
  • sensors route the data back towards user (via a routing tree)

Query Syntax

TAG queries look like the following

SELECT {agg(expr), attrs} FROM sensors

  WHERE {selPreds}

  GROUP BY {attrs}

  HAVING {havingPreds}

  EPOCH DURATIONi i

Note that TAG does have to deal with stream semantics: the output is a stream. Each record consists of one <group id, aggregate value> pair per group. “Each group is time-stamped and the readings used to compute an aggregate record all belong to the same time interval, or epoch”.

Aggregation Properties

The way TAG does evaluation is via merging, initializer, and evaluator. The example used is AVERAGE, whose partial states record consist of SUM and COUNT, and to merge two records the function simply adds the two values together. The authors inspect different classes of aggregates based on the following properties (see table 1), they are pretty self explanatory and the key idea is that some of the properties could be exploit for relaxing execution constraints.

aggregations

  • Duplication Sensitive
  • Exemplary or Summary: important distinction because
    • exemplary aggregates behave unpredictably in the face of loss, and, for the same reason, are not amenable to sampling
    • summary aggregates can be treated as a robust approximation of the true aggregate value
  • Monotonic: intuitive, it means that when a boolean is evaluated it never changes. E.g. if we are getting the max of a set of values that are streamed in, and if we have a predicate that says, MAX(s) < 10, then if we ever see a value 11 we could safety evaluate this predicate to false before seeing all the elements of the stream.

This is important when determining whether some predicates (such as HAVING) can be applied in network, before the final value of the aggregate is known. Early predicate evaluation saves messages by reducing the distance that partial state records must flow up the aggregation tree.

  • Partial State: a few types:
    • distributive: e.g. max)
    • algebraic: avg, since its sum/count)
    • holistic: median, since it needs to keep track of everything)
    • unique: similar to holistic but limited to distinct values)
    • content-sensitive: e.g. historgram, the size depends on the unique values of the aggregate column

Aggregation Techniques

in Extreme Resource Constrained Environments

  • Attribute Catalog: basically every mote tries to fetch the data required, if its not present the programmer could either specify the mote not to participate or to give NULL value. This way there doesn’t need to be a centralized place for schema management.

  • Epoch: “Given our goal of using as few messages as possible, the collection phase must ensure that parents in the routing tree wait until they have heard from their children before propagating an aggregate value for the current epoch. We will accomplish this by having parents subdivide the epoch such that children are required to deliver their partial state records during a parent-specified time interval.”

  • Monotonicity: “The predicate is only sent if it can potentially be used to reduce the number of mes sages that must be sent: e.g. if the predicate is of the form MAX(attr) < x, then information about groups with MAX(attr) > x need not be transmitted up the tree, and so the predicate is sent down into the net- work”

  • Eviction: “Grouping introduces an additional problem: the number of groups can exceed available storage on any one (non-leaf) device”, in this case some groups are evicted up the tree, and the basestation will need to re-aggregate, and this technique was from another paper (Data reduction by partial preaggregation).

Dealing with Loss

To maintain an update to date topology, two ideas are used:

  • maintaining a cache of neighbors and track the quality by how many packets are received from each neighbor.
  • assume the parent has failed or moved away if it hasn’t heard for a while.

Another technique (caching):

parents remember the partial state records their children reported for some number of rounds, and use those previous values when new values are unavailable due to lost child messages.

An alternative when the previous is not possible is to use redundancy:

Consider a mote with two possible choices of parent: instead of sending its aggregate value to just one parent, it can send it to both parents.

Insights

  • “The problem of computing aggregate queries in large clusters of nodes has been addressed in the context of shared- nothing parallel query processing environments” (references Adaptive parallel aggregation algorithms)
  • “Literature on active networks identified the idea that the network could simultaneously route and transform data, rather than simply serving as an end-to-end data conduit.”