Year: 1990

Authors: N Beckmann, HP Kreiegel R Schneider, B Seeger

Note: I referenced slides here: The original paper was very painful to read due to the lack of periods…


R-tree is great for accessing rectangles, which is useful for geometrical data and multi-dimensional data. Data is represented by a minimum bounding box. Each node could.

R*-tree achieves extra:

  • Supports spatial join (overlap)
  • Supports points (as well as rectangles)

Vanilla R-Tree

An R-tree is a B+-tree like structure which stores multidimensional rectangles as complete objects without clipping them or transforming them to higher dimensional points.

A non-leaf node contarns entries of the form (cp, Rectangle) where cp 1s the address of a child node m the R-tree and Rectangle 1s the minimum boundingg rectangle of all rectangles which are entries m that child node A leaf node contams entries of the form (Old, Rectangle) where Old refers to a record m the database, describing a spatial oblect and Rectangle 1s the enclosrng rectangle of that spatial oblect

How to access?

the structure must allow overlapping directory rectangles Thus it cannot guarantee that only one search path is required for an exact match query

Research Methods

“A heuristic approach is applied, which is based on many different experiments carried” out in a systematic framework” And they have found the following suitable heuristics:

(01) The area covered by a directory rectangle should be minimized (02) The overlap between directory rectangles should be minimized (03) The sum of the lengths of the edges of a rectangle of a directory rectangle should be minimized (04) Storage utilization should be optimized

With these goals in mind they set out to design algorithms for working with the R*-tree. Note that they were constrained severely by CPU back then and now we could probably do a lot more extensive optimization.

Main Algorithm

  • ChooseSubTree: determine the nearly minimum overlap cost
  • Split of the R*-tree: there is a goodness value for area, margin, and overlap.
  • Forced Reinsertion at Node Overflow: reinsert is good because this way the ChooseSubtree algorithm has a new chance of distributing entries into different nodes.
    • overall heuristic: Since the outer rectangles of a node are reinserted, the shape of the directory rectangles will be more quadratic As discussed before, this is a desirable property

The rest of the paper were dedicated to experiment setup to account for different scenarios.