Year: 1981

Authors: PL Lehman, SB Yao

Summary

B-tree and its variants have been found to be highly useful, but concurrent operations on them are hard. The author’s key insight is that a single additional “link” pointer in each node allows a process to easily recover from tree modifications performed by other concurrent processes. Their solution uses only a small (constant) number of locks at any time and no search through the tree is ever prevented from reading any node.

What goes wrong

Example that show the naive approach to concurrent operation on B*-trees is erroneous: basically a search for a value a and insertion of value b, the insert would split the node and move a to a new node. The search first returns a pointer to the node b resides on and then reads the page containing b. Between these two operations, the insertion process has altered the tree.

Image of the example:

tree

Solution

Add a link field that points to the next node at the same level of the tree as the current node (except that the link pointer of the rightmost node on a level is a null pointer).

The purpose of the link pointer is to provide an additional method for reaching a node.

The intent of this scheme is that the two nodes, since they are joined by a link pointer, are functionally essentially the same as a single node until the proper pointer from their father can be added.

Link pointers have the advantage that they are introduced simultaneously with the splitting of the node. If the search key exceeds the highest value in a node (as indicated by the high key), it indicates that the tree structure has been changed, and that the twin node should be accessed using the link pointer.

It’s only slightly less efficient: might need to dereference the new pointer.

An additional advantage of the B-linl-tree structure is that when the tree is searched serially, the link pointer is useful for quickly retrieving all of the nodes in the tree in “level-major” order, or, for example, retrieving only leaves.

Algorithms

Search > In each node, the comparisons produce a pointer to follow from that node, whether to the next level, or to a leaf (record) node. If the search process examines a node and finds that the maximum value given by that node is less than u, then it infers some change has taken place in the current node that had not been indicated in the father at the time the father was examined by the search. The current node must have been split into two (or more) new nodes.

Insert, similar to search to find the place to insert, say node a, the only unsafe case is when a split is needed, and it’s illustrated below

tree

Note that this procedure works its way back up the tree one level at a time. Further, at most three nodes are ever locked simultaneously, and this occurs relatively infrequently: only when it is necessary to follow a link pointer while inserting a pointer to a split node.

Importance of the remembered parent > In particular, we always have some idea of the correct insertion position for an object (associated with some value) at any level, that is, the “remembered” node through which our search passed at that level. If the correct insertion has been moved, it has done so in a known fashion, that is, via a node splitting to the right, leaving link pointers with which a search (or insertion) can find it. So the correct insertion position for an object is always accessible by a process starting at the old “expected” insertion position.

Correctness Intuition? > The correctness of the algorithm relies on the fact that any change in the tree structure (i.e., any splitting of a node) incorporates a link pointer; a split always moves entries to the right in the tree, where they are reachable by following the link pointer.

The paper then goes into many details about corner cases and proofs for correctness.

Questions

From Berkeley Prelim Questions

  • Review the assigned Lehman/Yao B-Link scheme for concurrent B-trees. Design an efficient logging and recovery technique that integrates with the B-Link scheme.
  • Concurrency control for B+Trees on both the physical level and logical level