Year: 2012

Authors: A Thomson, T Diamond, SC Weng, K Ren P Shap D Abadi

Summary

“Calvin is a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. Unlike previous deterministic database system prototypes, Calvin supports disk-based storage, scales near-linearly on a cluster of commodity machines, and has no single point of failure. By replicating transaction inputs rather than effects, Calvin is also able to support multiple consistency levels—including Paxos-based strong consistency across geographically distant replicas—at no cost to transactional throughput.” (emphasis mine)

From Google Spanner paper: “Calvin eliminates concurrency control: it pre-assigns timestamps and then executes the transactions in timestamp order.”

Background: Determinism in Databases

The fundamental issue that makes distributed transactions expensive is reaching agreement (as seen in 2PC). Deterministic databases help alleviate this issue.

The goal is to only synchronously replicate batches of transaction requests. This is not possible in traditional database implementation because the processing is equivalent to some serial ordering. However with small edits (e.g. deterministic lock manager), the DBMS could make all replicas emulate the same serial order.

Fundamentally this eliminates the need to check for a node failure which could cause the transaction to abort. However, by committing the system to a predefined sequence of oder the execution is now less flexible. This is why it has to do the trick about learning about the read/write set up front and

Architecture

  • Sequencing takes transacts and places them into a global sequence, by which all replicas will ensure serial equivalence. Need to handle the replication and logging of this sequence.
  • Scheduling execute using a deterministic locking to support concurrency while maintain specified serial order.
  • Storage is fairly simple (any CRUD interface works).

This architecture separates the replication mechanism, transactional functionality and concurrency control from the storage system. This makes the system more modular and enable certain algorithms such as ARIES and next-key locking (predicate locking).

Sequencer

Since Calvin is assigning order to a set of transactions, it needs to wait for some amount of time (managed by epochs) to collect the input transactions. > Calvin divides time into 10-millisecond epochs during which every machine’s sequencer component collects transaction requests from clients.

After the collection, replication starts, and supports two modes:

  • sync
    • paxos based, uses ZooKeeper
    • con: very slow (50 to 100x) depending on number of data centers
  • async
    • pro: low latency before a transaction can begin being executed at the master replication
    • con: significant complexity in failover: (1) which batch was the last valid batch (2) what transactions that batch contained

Scheduler and Concurrency Control

When the transactional component of a database system is unbundled from the storage component, it can no longer make any assumptions about the physical implementation, so all logging have to be logical. This is why it needs to have deterministic locking. To this end Calvin manages virtual resources that can be logically locked in the transactional layer (local to the individual nodes). They use modified strict 2PL: if A and B are both requesting exclusive lock on local record R, give the one earlier in the serial order first.

Once a transaction has acquired all the locks, it’s assigned to a worker to be executed, which happens in 5 phases:

  1. read/write set analysis, noting local/remote reads, and what nodes need to do writes (active) and who just reads (passive).
  2. local read
  3. remote read
  4. active node collect reads
  5. apply writes

Because of this process, Calvin doesn’t support transactions which must perform reads to determine the full read/write sets. It uses a “reconaissance query” to figure out what the read/write set will be and then execute.

Working With Disks

They introduce an artificial delay before issuing an action with anticipated long I/O time and “warm-up” disk-resident records. And they have some algorithms for estimating disk I/O latency.

Checkpointing

They use a modification of a Zig-Zag algorithm to make sure that no checkpoints are not too behind.

Evaluation

Calvin does better under high contention than other systems with ACID guarantees.

Notes:

  • I’m very curious how this compares with Spanner. One day when I have time it would be cool to map out where all the database systems lie in the design space and their respective performance curves.
  • It’s interesting how Calvin introduces artificial delays at many points of its processing (first to collect transactions, then before scheduling certain action) to optimize for future time loss.
  • Calvin is super cool because the engineering and ideas are pretty neat.