The Apple File System has some support for data deduplication. You can create a copy-on-write clone of a file, using no extra disk space (until you start modifying either the original or the clone).

I wanted to use this for my games volume. Multiple games using the same game engine often have many support files in common. Additionally, multiple versions of the same game often have many game-specific files in common. Keeping track of this is not difficult. You simply need to maintain a database with a file table having size and content hash columns.

But what about partial matches? For example, if you append files to a ZIP archive, the old and new versions of the archive will have a common prefix. When installing the new version, you could make a clone of the old one, then copy only the non-matching parts from the source. Keeping track of **this** gets more elaborate. You need to store hashes of each allocation block of each file. Here's the final design:

```
create table file (
    fileid integer primary key,
```
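
To make the partial-match idea concrete, here is a minimal Python sketch of per-block hashing and prefix comparison. It is not the schema or the tooling from this post. The 4096-byte block size, the choice of SHA-256, and the helper names `block_hashes` and `common_prefix_blocks` are all assumptions made for illustration; the real allocation block size should be read from the volume.

```python
import hashlib

# Assumed allocation block size in bytes; check the actual volume.
BLOCK_SIZE = 4096


def block_hashes(path, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per block_size chunk of the file.

    The last chunk may be shorter than block_size; it is hashed as-is.
    """
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            hashes.append(hashlib.sha256(block).digest())
    return hashes


def common_prefix_blocks(old_hashes, new_hashes):
    """Count how many leading blocks have identical hashes in both lists."""
    count = 0
    for old, new in zip(old_hashes, new_hashes):
        if old != new:
            break
        count += 1
    return count
```

With a 4096-byte block size, an archive that grows from 10,240 to 15,240 bytes by appending shares only its first two blocks with the old version. The old file's trailing partial block no longer matches, so it has to be copied from the source along with everything after it.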