Authors: M Stonebraker, G Kemnitz
I was very surprised by the introduction because I expected some more beefed up version of System R. It actually talks about how new database services needs to encompass data, object and knowledge (which as we know now didn’t happen and people moved away from monolithic structures)
Postgres supported classes, inheritance, and types, where a class is basically a schema, and each attribute has a type, and inheritance basically copies over some attributes.
They also allow abstract data type (ADT) to support custom types.
They allow query via path expressions using nested dot notation, like the following
etrieve (EMP.manager.age) where EMP.name = "Joe"
There are two ways to implement rules, one is forward chaining (activated by updates) and the other is backward chaining (need to “awaken” other rules to find the answer). The better choice depends on what happens more often. It’s quite analogous to eager vs lazy evaluation in my opinion.
The markers that are needed to implement the rules are rather interesting. It contains the identifier of the corresponding rule and the types of events its sensitive to (and is on a per record basis). There is a trade-off between doing the trigger on a per record basis or per table. When the updates are sparse per tuple is better, otherwise a query rewrite may be a better option.
There are different activation policies (consistency) where it could be immediate or deferred, combined with whether its the same transaction or different transactions (so 2x2=4 options).
I imagine the rules system is highly inefficient and thus fell out of popularity. The rules can also be semantically ambiguous.
User defined functions: “fast path” because the programmer could invoke calls directly to Postgres internals.
Departing from the mainstream WAL approach for a storage manager, the Postgres team use a no-overwrite storage managaer in place of a write ahead log, All Postgres logs only contain 2 bits per transaction indicating whether each transaction committed, aborted or is in progress. It has two nice properties:
- Instantaneous crash recovery: no need to process UNDO for aborts because the previous states were stored and accessible via log.
- Time travel
It is challenging to implement well due to the lack of stable memory, so Postgres has to force to disk (FORCE) at commit time all pages written by a transaction to make it durable, whereas a conventional one do not need to FORCE (recall Aries). So no overwrite has to suffer from random I/O (which is not so bad now with SSDs! and event better when non-volatile main memory arrives).
I found it very interesting that the authors considered the following “simple example”: “an application that stores and manipulates text and graphics to facilitate the layout of newspaper copy” — this is essentially holding the philosophy that the DBMS will handle most of the application logic!
They also supported CODASYL, a navigational interface, back then before it soon died out.
The saw the need for the DB to be good interfaces with different programming languages.
Back then it seemed very important that the system remains open to external code and modifications (Volcano optimizer generator makes similar arguments) but it seems that now no one cares.
“When considering the POSTGRES storage system, we were guided by a missionary zeal to do something different.” — Hipster research methodology! And they sure don’t consider Postgres to be a relational system: “As can be seen, POSTGRES beats the relational system by a substantial factor.”