This paper covers techniques that improve file system throughput.
Issues with the Old File System
- segregates the inode information from the data: accessing a file incurs a long seek from the file’s inode to its data.
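- inodes of files in a single directory are not allocated consecutively, so operating on several files in one directory touches many non-consecutive inode blocks
- allocation of data blocks to files is also suboptimal: the free list becomes scrambled as files are created and removed, so consecutive blocks of a file end up scattered across the disk and nearly every block access needs a seek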
Some initial improvements:
- Maintain reliability: staging modifications to critical file system information so that they could either be completed or repaired cleanly by a program after a crash.
- Bigger block size: from 512 to 1024 bytes, which improved performance by more than a factor of two
New File System
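- A file system:
  - is described by its super-block, located at the beginning of the file system’s disk partition
  - each disk drive contains one or more file systems
- A cylinder group:
  - a disk partition is divided into multiple cylinder groups
  - each consists of one or more consecutive cylinders on a disk
  - each holds a redundant copy of the super-block plus bookkeeping information (space for inodes, a bit map of free blocks, usage summaries)
  - the bookkeeping information begins at a varying offset from the beginning of the cylinder group, about one track further than in the preceding group, so the redundant copies spiral down the disk and losing any single track, cylinder, or platter does not lose all copies of the super-block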
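To make the staggered placement concrete, here is a minimal sketch under invented geometry. The constants `SECTORS_PER_TRACK`, `TRACKS_PER_CYLINDER` and `CYLINDERS_PER_GROUP` and the helpers `cg_base` and `cg_metadata_start` are hypothetical, and wrapping the offset after one cylinder's worth of tracks is an assumption, not the paper's exact rule.

```python
# Hypothetical disk geometry; these numbers are illustrative, not from the paper.
SECTORS_PER_TRACK = 32
TRACKS_PER_CYLINDER = 8          # one track per recording surface
CYLINDERS_PER_GROUP = 16
SECTORS_PER_CYLINDER = SECTORS_PER_TRACK * TRACKS_PER_CYLINDER


def cg_base(cg: int) -> int:
    """First sector of cylinder group `cg` (groups are runs of consecutive cylinders)."""
    return cg * CYLINDERS_PER_GROUP * SECTORS_PER_CYLINDER


def cg_metadata_start(cg: int) -> int:
    """First sector of the group's bookkeeping area (super-block copy, inodes, bit map).

    Assumption: each group shifts its bookkeeping one track further than the
    previous group and wraps after TRACKS_PER_CYLINDER groups, so consecutive
    copies sit on different tracks, i.e. different recording surfaces.
    """
    track = cg % TRACKS_PER_CYLINDER
    return cg_base(cg) + track * SECTORS_PER_TRACK


for cg in range(4):
    print(f"group {cg}: starts at sector {cg_base(cg)}, "
          f"bookkeeping at sector {cg_metadata_start(cg)}")
```

With these numbers each group spans 4096 sectors, and group 1's bookkeeping starts 32 sectors (one track) into the group.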
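The new file system uses a block size of at least 4096 bytes, so larger blocks can be transferred in a single disk transaction, greatly increasing file system throughput.
But large blocks waste space on small files and so decrease storage utilization. To compensate, each block can be divided into 2, 4, or 8 addressable fragments; small files and the tail ends of files are stored in fragments, so several files can share one data block.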
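The space trade-off is simple arithmetic. The sketch below assumes a 4096-byte block split into four 1024-byte fragments (one of the configurations described above); `allocated_bytes` is a hypothetical helper, and it simplifies by letting the tail of any file go into fragments.

```python
BLOCK_SIZE = 4096      # bytes
FRAGMENT_SIZE = 1024   # bytes: each block is split into 4 fragments


def allocated_bytes(file_size: int, use_fragments: bool) -> int:
    """Disk space allocated for a file of `file_size` bytes."""
    full_blocks, tail = divmod(file_size, BLOCK_SIZE)
    if tail == 0:
        return full_blocks * BLOCK_SIZE
    if use_fragments:
        # The tail gets just enough fragments (ceiling division).
        tail_frags = -(-tail // FRAGMENT_SIZE)
        return full_blocks * BLOCK_SIZE + tail_frags * FRAGMENT_SIZE
    # Without fragments the tail occupies a whole block.
    return (full_blocks + 1) * BLOCK_SIZE


for size in (100, 11000):
    print(size, allocated_bytes(size, True), allocated_bytes(size, False))
```

For an 11000-byte file this gives 11264 bytes with fragments versus 12288 without, i.e. 264 rather than 1288 bytes of waste.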
File system parameterization
What is parameterized: processor capabilities and mass storage characteristics, including the speed of the processor, the hardware support for mass storage transfers, and the characteristics of the mass storage devices.
Why? So that blocks can be allocated in an optimal, configuration-dependent way and the file system adapts to the characteristics of the disk on which it is placed, e.g. by spacing consecutive blocks of a file so that the next block comes under the disk head just when the processor is ready to transfer it.
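One configuration-dependent decision is how far apart to place consecutive blocks of a file around a track. A minimal sketch, assuming the only inputs are the rotation speed, the number of blocks per track, and the time the processor needs between finishing one transfer and starting the next; `blocks_to_skip` and its parameter names are hypothetical.

```python
import math


def blocks_to_skip(rpm: float, blocks_per_track: int, service_delay_ms: float) -> int:
    """Rotational block positions to leave between consecutive blocks of a file.

    The next block should arrive under the head no earlier than the processor
    can issue the next transfer, so skip as many positions as pass by during
    the service delay (rounded up).
    """
    ms_per_rev = 60_000.0 / rpm
    ms_per_block = ms_per_rev / blocks_per_track
    return math.ceil(service_delay_ms / ms_per_block)


print(blocks_to_skip(3600, 8, 4.0))
```

For a 3600 RPM disk with 8 blocks per track, one block passes under the head roughly every 2.08 ms, so a 4 ms service delay means skipping 2 block positions; a faster processor or slower disk would yield a smaller gap.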
The global policies try to balance the two conflicting goals of localizing data that is concurrently accessed while spreading out unrelated data.
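- Inodes of files in the same directory are frequently accessed together, so the policy places all the inodes of files in a directory in the same cylinder group. A new directory goes to a cylinder group with a greater than average number of free inodes and the fewest directories already in it.
- Data blocks:
  - The policy tries to place a file's data blocks in the same cylinder group as its inode.
  - The problem with allocating all the data blocks in the same cylinder group is that large files will quickly use up available space in the cylinder group, forcing a spill over to other areas.
  - So block allocation is redirected to a different cylinder group when a file exceeds 48 KB, and again at every megabyte thereafter; the new group is chosen from those with a greater than average number of free blocks.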
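These global policies can be sketched in a few lines. This is a simplified model, not the 4.2BSD code: `CylinderGroup`, `pick_group_for_new_directory`, `pick_group_for_spill` and `file_section` are hypothetical; the directory rule uses "at least average" free inodes so that a candidate always exists; the spill rule scans forward from the current group, an arbitrary order; and "every megabyte thereafter" is read as every 1 MB of file offset past the first 48 KB.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB
FIRST_REDIRECT = 48 * KB  # file offset where data first leaves the inode's group


@dataclass
class CylinderGroup:
    free_inodes: int
    free_blocks: int
    directories: int


def pick_group_for_new_directory(groups: list[CylinderGroup]) -> int:
    """Index of the group for a new directory: plenty of free inodes, fewest directories."""
    avg_free = sum(g.free_inodes for g in groups) / len(groups)
    # ">=" (rather than ">") guarantees at least one candidate.
    candidates = [i for i, g in enumerate(groups) if g.free_inodes >= avg_free]
    return min(candidates, key=lambda i: groups[i].directories)


def pick_group_for_spill(groups: list[CylinderGroup], current: int) -> int:
    """Next other group, scanning forward from `current`, with above-average free blocks."""
    avg_free = sum(g.free_blocks for g in groups) / len(groups)
    n = len(groups)
    for step in range(1, n):
        i = (current + step) % n
        if groups[i].free_blocks > avg_free:
            return i
    return current  # no better group exists: stay put (assumption)


def file_section(offset: int) -> int:
    """Section a byte offset falls in; each new section is allocated in a new group."""
    if offset < FIRST_REDIRECT:
        return 0  # stays in the inode's cylinder group
    return 1 + (offset - FIRST_REDIRECT) // MB
```

A caller would compare `file_section` of consecutive blocks and call `pick_group_for_spill` whenever the section number changes.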
When the requested block is not available, the local allocator falls back through four levels:
- Use the next available block rotationally closest to the requested block on the same cylinder.
- If there are no blocks available on the same cylinder, use a block within the same cylinder group.
- If that cylinder group is entirely full, quadratically hash the cylinder group number to choose another cylinder group to look for a free block.
- If the hash fails, apply an exhaustive search to all cylinder groups.
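The four-level fallback can be modeled on a toy disk where `disk[cg][cyl][pos]` is `True` when the block at rotational position `pos` of that cylinder is free. This sketch is illustrative only: all names are hypothetical, level 2 simply takes the first free block in the group rather than the closest one, and the quadratic probe uses offsets `i * i`, which is one possible sequence since the notes do not fix it.

```python
Disk = list[list[list[bool]]]  # disk[cg][cyl][rot_pos] is True if the block is free


def make_disk(ncg: int = 4, ncyl: int = 2, npos: int = 4) -> Disk:
    """Build a toy disk with every block free."""
    return [[[True] * npos for _ in range(ncyl)] for _ in range(ncg)]


def _take(disk: Disk, addr: tuple[int, int, int]) -> tuple[int, int, int]:
    """Mark a block as allocated and return its address."""
    cg, cyl, pos = addr
    disk[cg][cyl][pos] = False
    return addr


def _first_free_in_group(disk: Disk, cg: int):
    for cyl, positions in enumerate(disk[cg]):
        for pos, free in enumerate(positions):
            if free:
                return (cg, cyl, pos)
    return None


def alloc_block(disk: Disk, cg: int, cyl: int, pos: int):
    """Allocate near (cg, cyl, pos); return the block's address, or None if the disk is full."""
    # Level 1: next free block rotationally, on the same cylinder.
    positions = disk[cg][cyl]
    for step in range(len(positions)):
        p = (pos + step) % len(positions)
        if positions[p]:
            return _take(disk, (cg, cyl, p))
    # Level 2: any free block in the same cylinder group.
    found = _first_free_in_group(disk, cg)
    if found:
        return _take(disk, found)
    ncg = len(disk)
    # Level 3: quadratic probing over cylinder group numbers.
    for i in range(1, ncg):
        found = _first_free_in_group(disk, (cg + i * i) % ncg)
        if found:
            return _take(disk, found)
    # Level 4: exhaustive search of all the other cylinder groups.
    for i in range(1, ncg):
        found = _first_free_in_group(disk, (cg + i) % ncg)
        if found:
            return _take(disk, found)
    return None
```

With four groups, the probe offsets 1, 4 and 9 revisit groups 1, 0 and 1, so group 2 is only reached by the exhaustive search; this illustrates why level 4 is needed as a backstop.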
Functional enhancements
- Long file names: nearly arbitrary length (up to 255 characters)
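- File locking: previously, programs emulated locking by creating a separate lock file.
  - The authors chose the simpler approach of serializing access to a file with locks, which may be shared or exclusive. Some applications want to lock pieces of a file, but for the standard system applications whole-file granularity was judged sufficient.
  - A hard lock is always enforced when a program tries to access a file; an advisory lock is only applied when it is requested by a program. They chose advisory locks because super-user programs may override any protection scheme and many programs that need locks run as the super-user; advisory locks leave the override policy to user programs instead of adding a kernel protection scheme that system administration programs could not use.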
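- Symbolic links: the old system allows multiple directory entries in the same file system to reference a single file (hard links), but such links cannot span physical file systems or machines. A symbolic link removes this restriction: it is a file containing a pathname, which the system follows when it encounters the link during pathname interpretation.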
- Rename: now a single system call, which guarantees that the target name exists even if the system crashes during the rename.
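Advisory, whole-file locking of this kind is what the BSD `flock(2)` call exposes on Unix-like systems, and Python's `fcntl.flock` wraps it. A minimal sketch (Unix only; the temporary file name is arbitrary): a second open of the file cannot acquire the exclusive lock, yet nothing stops it from writing, because the lock is only advisory.

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.txt")

holder = open(path, "w")
fcntl.flock(holder, fcntl.LOCK_EX)  # exclusive lock on the whole file

other = open(path, "a")  # a separate open, so its lock request conflicts
try:
    fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
    got_lock = True
except BlockingIOError:
    got_lock = False  # a cooperating program sees the lock and backs off...

other.write("written anyway\n")  # ...but the kernel does not block a plain write
other.flush()

fcntl.flock(holder, fcntl.LOCK_UN)
holder.close()
other.close()
print("second opener got lock:", got_lock)
```

Only programs that call `flock` themselves are serialized; this is the trade-off the authors accepted in exchange for leaving the override policy to user programs.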