Authors: C Mohan et al
For a transaction to commit atomicaly (all or nothing) to a distributed system with replicas as if its one system requires coordination and consensus, which is challenging in the face of uncertain network and potential crashes. There are existing commit protocols based around the two-phase commit protocol. From the paper the authors listed some desirable characteristics in a commit protocol
- guaranteed transaction atomicity always
- ability to “forget” outcome of commit processing after a short amount of time
- minimal overhead in terms of log writes and message traffic,
- optimized performance in the no-failure case,
- exploitation of completely or partially read-only transactions, and
- maximizing the ability to perform unilateral aborts.
The authors voiced concern about the state of evaluation of performance of the different algorithms (immediately the year after there is the paper Concurrency Control Performance Modeling)
Overview of 2PC
Basic structure: have a coordinator that is connected to user application, which all the subordinates communicate with (and not among themselves)
Communication protocol under no failures:
- user commits a transaction
- coordinator initiates the 1st phase by sending PREPARE in parallel to subordinates
- subordintates either sends YES or NO for the vote, and if NO, it vetos and could abort the transaction (forgets its information) locally without waiting for coordinator
- If all votes YES, then the coordinator force-write (flush to disk) a commit record and send COMMIT messages to all subordinates. Otherwise force-write abort record, and send ABORTs to all subordinates.
- when the subordinates receives a COMMIT, it force-writes its commit record, send ACK to coordinator and commits the transaction and “forgets” it. Same for receiving ABORT (different record)
There is a great image that illustrates the process in the original paper:
An invariant maintained that ensures correctness is the fact that “if a subordinate acknowledges the receipt of any particular message, then it should make sure (by forcing a log record with the information in that message before sending the ACK) that it will never ask the coordinator about that piece of information.”
The paper then goes on to discuss failure scenarios for 2PL (which vaguely reminds me of ARIES in writing style… which is not surprising since its the same main author), for which there are quite a few due to the combinatoric nature of the states.
- Extending 2PL with a hierarchical tree process: because both R* and ENCOMPASS use such schemes. This introduces a new kind of process that organizes a subset of subordinates and combines their result to the root coordinator.
- Presumed Abort: realized that in the NO VOTE, the coordinator does not need to force write abort log (both coordinator and subordinates), or for the subordinates to send ACK.
- Partial read-only queries: read only queries are so great that we want to make use of it as much as possible. Which adds the READ vote, in which case the subordinate writes no log records. If everything is a READ VOTE there is no need to log for the coordinator.
- Presumed Commit: if we eliminated ACKs for COMMITs would things be faster? yes, but there is a case when the coordinator crashes after a subordinates sends a YES VOTE, and the coordinator upon recovery will have aborted the transaction but the subordinator will have assumed COMMIT. To fix this, the coordinators need to force write the names of subordinates.
The morning paper has a great writeup for a detailed description of the algorithm.
- I like how the authors survey the desirable properties of a feature and identifies where the state of art is missing. They then combine the goals with system insights (high-level heuristics) to motivate their design.
- Are the force writes all just optimizations?
- What makes hierarchical good?