tl;dr: wow this is really old
This is back in the days when relational algebra and relational database was too out-there. System R is an experimental system to demonstrate that the basic ideas works. We’ve come a long way from implementing a basic store with some consistency in under 50 lines of code (via bloom-lang)!
It talked about some features and implementations that are now standard text-book material, such as the following:
- data independence: user productivity & system performance
- people were concerned if declarative approaches were fast enough (thus the System R optimizer).
VIEWs, no materialization strategy though
- Authorization: groups of users, and change of ownership (in general these features are not much discussed)
- “Reloading table” — its a rather strange idea thinking from today.
- Recovery using shadow paging, which they found were very bad for performance and the authors hinted at WAL.
- Isolation Level (same as the one in Jim Gray’s paper):
- Level 1: read uncommitted data (no locks for reading)
- Level 2: do not read uncommitted data, but could still be non-repeatable read (the value changes the second time it does the read).
- Level 3: Repeatable read! Guaranteed by holding a read lock until end of transaction.
- Convoy phenomenon: for high contention locks, when P release a lock its given to the first waiting process (and P cannot access it again soon since it will be at the end of the queue). The solution is to allow the other waiting process to run but not grant them the lock immediately so P has a window of time to re-aquire the lock.
- System R compiles the query into machine code (so that the high level language is comparably fast as low level language). They mentioned that for an environment with short repetitive processes compilation was a great approach. Of course there is a lot of room for improvement here. They show that only 20% of the overhead was from the DBMS (and the 80% from the relational storage system underneath). Another benefit is ad-hoc query and canned queries could execute with the same interface, making the system easier to engineer.
- They didn’t use hashing because it didnt have the ordering property of a B-tree index, and the savings of potential direct access is amoitized with complex access patterns and thus not very attractive.
Users liked the SQL language (uniformity of the syntax across environments)