# Improved Query Performance With Variant Indexes

## Summary

Problem: try to use indices answer as a broad range of queries as possible. Traditional indices can only answer a certain type of queries well.

Solution: the authors propose value-list index, projection and bit sliced indexes, and then show how to answer aggregates using these indices, and lastly how to work with “non-dimensional” selection.

## Indices

The authors proved a cool theorem (2.1) that any two of the three indices could be used to create the other one.

### Value-List Index

- Background: the leaf level of B+-tree index consists of a sequence of entries for index keyvalues, which reflects the value of the indexed column or columns in one or more rows in the table. In traditional B-trees the values only appear once so the keyvalue partitions the rows, but in object-oriented DBs they don’t. To address each individual rows, there is a Row IDentifier (RID), which is a list stored with each distinct keyvalue entry. The problem with RIDs: large number of RIDs. Bit map is an alternative method to represent RID.
- Definition of Bitmap: “A “Bitmap B” is defined on table T as a sequence of M bits, where for each row with ordinal number j, we set the jth bit in B if that row satisfies the property of the index”
- Advantages:
- Space efficient: easy to compress, good for caching
- CPU efficient: boolean operations used for AND, OR, and NOT are fast
- Note that to work with NOT, we need an additional “Existence Bitmap” since SQL has True, False
*and*NULL.

- Note that to work with NOT, we need an additional “Existence Bitmap” since SQL has True, False

### Projection Index

- Definition: “A projection index for column duplicates all column values for lookup by ordinal number” (yifan’s note: this feels like a precursor to column store ideas)
- Advantage: don’t need to retrieve other columns in the row and could fit in smaller number of pages.

### Bit Sliced Index

- Definition: “A Bit-Sliced index for a column creates Bitmaps for significant bits in the column value”.
- So for example, a list of [1,2,3] would have a bit sliced index of [[0,0,1],[0,1,1]]

- This is a very neat idea as it makes RANGE selection, MAX calculation, and SUM extremely fast (all bit operations). Highlight of the contribution.
- Example: Algorithm 4.2 shows how to do Bit-Sliced Index do range predicate. As a concrete example, for greater than comparison, one just need to compare for the most significant bit for which the two numbers differ (and smaller than is vice versa).
- Intuitively it’s a vertical partitioning of the data — instead of looking at the numbers one by one, we are looking at the i-th significant bit all together, so they are literally “orthogonal” to the data in the Projection index.

### Using the Indexes

- Aggregation: sum could be via projection index, or bitmap index (add over the bit-shifted counts) and bit sliced (the two algorithms shown below, respectively):

```
if (COUNT (Bf AND Bnn) == 0)
Return null;
SUM = 0.0;
for each non-null value v in the index for C {
Designate the set of rows with value v as Bv
SUM += v * COUNT(Bf AND BV);
}
Return SUM;
```

```
if (COUNT (Bf AND Bnn) == 0)
Return null;
SUM = 0.0;
for i = 0 to N
SUM += 2i * COUNT(Bi AND Bf);
Return SUM;
```

For how to implement other types of queries the paper goes into details.

- Queries that group by different combinations of columns are called dimensions, and if there isn’t an existing summary table (e.g. CUBE) other indices are completely useless. This is a problem. The authors created Join Indexes and Bitmap-Join-Indexes to fix this (note that this paper did not contribute the index, a previous 1995 paper did).
- Naive join index: “an index on one table that involves a column value from different table through a commonly encountered join.”
- Issue: combinatorial explosion of join indexes in terms of the number of useful columns
- How bitmap-join-index solve the problem? just do pairwise joins, “an index on a table T based on a single column of a table S, where S commonly joins with T in a specified way”

### Evaluation

These results really motivate the need of different indices.

Bit-Sliced Index is not the best for MAX or MIN makes sense because Value-List Index apparently works better for narrow ranges, but I’m not sure why this would still hold if the selectivity of the predicate is high…