ZFS is a hybrid filesystem and volume manager that has become quite popular recently, but it has some important and unexpected problems.
It has many good features, which are probably why it is used: snapshots (with send/receive support), checksumming, RAID of some kind (with scrubbing support), deduplication, compression, and encryption.
But ZFS also has a lot of downsides. It is not the only way to achieve those features on Linux, and there are better alternatives.
Terminology
In this post I will refer to the ZFS on Linux project as ZoL. It was renamed to OpenZFS once ZoL gained FreeBSD support, and FreeBSD’s own in-tree ZFS driver was deprecated in favor of periodically syncing from the out-of-tree ZoL code.
What is “Scrubbing”? If a disk hits an unrecoverable read error (URE) when reading a sector, it’s possible to repair the sector by rewriting its contents; the physical disk detects the rewrite over an unreadable sector and remaps it in firmware. The RAID layer can do this automatically by relying on its redundant copy. Scrubbing is the process of periodically, preemptively reading every sector to check for UREs and repair them early.
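To make this concrete, here is a minimal sketch of kicking off a scrub by hand, assuming a ZFS pool named tank and an md array at /dev/md0 (both names illustrative):

    # ZFS: read and verify every block, repairing from redundancy where possible
    zpool scrub tank
    zpool status tank    # shows scrub progress and any repaired errors

    # Linux md: ask the array to read-check every sector
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat     # shows check progress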
Bad things about ZFS
Out-of-tree and will never be mainlined
Linux drivers are best maintained when they’re in the Linux kernel git repository along with all the other filesystem drivers. Mainlining ZFS is not possible because it is under the CDDL license, and Oracle are unlikely to relicense it, if they are even legally able to.
Therefore, just as all proprietary software eventually finds a GPL implementation and then a BSD/MIT one, ZFS will eventually be superseded by a mainline solution, so don’t get too used to it.
As an out-of-tree, GPL-incompatible module, it is regularly broken by upstream Linux changes, for example when ZoL was discovered to be abusing GPL-only symbols, causing long periods of unavailability until a workaround could be found.
When compiled together, loaded and running, the resulting kernel is a combined work of both GPL and CDDL code. It’s all open source, but your right to redistribute the work to others requires compliance with both the CDDL and the GPL, which can’t be satisfied simultaneously.
It’s still easy to install on Debian. They ship only ZoL’s source code with a nice script that compiles everything on your own machine (zfs-dkms), so it is technically never redistributed in this form, which satisfies both licenses.
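In practice that looks roughly like the following on Debian (package names from the contrib section; exact names vary by release):

    # zfs-dkms ships only source and builds the module locally via DKMS
    apt install linux-headers-amd64 zfs-dkms zfsutils-linux
    modprobe zfs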
Ubuntu ships ZFS as part of the kernel, not even as a separate loadable module. This redistribution of a combined CDDL/GPLv2 work is probably illegal.
Red Hat will not touch this with a bargepole.
You could at least consider trying the FUSE ZFS instead of the in-kernel one; as a userspace program it is definitely not a combined work.
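If your distribution still packages it, that is just a userspace daemon installed the usual way (treat this as a sketch; availability varies by release):

    # userspace implementation: no kernel module, no combined work
    apt install zfs-fuse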
Slow performance of encryption
ZoL did work around the Linux symbol issue above by disabling all use of SIMD for encryption, reducing its performance versus an in-tree filesystem.
Rigid
First, to clarify ZFS’s custom terminology (not counted as a really bad point, because LVM2 is also guilty of using custom terminology here):
- A “dataset” is a filesystem you can mount. It might be the main filesystem or perhaps a snapshot.
- A “pool” is the top-level block device. It is a union span (call it RAID-0, stripe, or JBOD if you like) of all the vdevs in the pool.
- A “vdev” is the 2nd-level block device. It can be a passthrough of a single real block device, or a RAID of multiple underlying block devices. RAID happens at the vdev layer.
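A minimal sketch of how the three layers fit together (pool and dataset names are made up):

    # pool "tank" made of two mirror vdevs: a stripe of mirrors
    zpool create tank mirror sda sdb mirror sdc sdd

    # datasets are the mountable filesystems carved out of the pool
    zfs create tank/home
    zfs snapshot tank/home@before-upgrade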
This RAID-X0 (stripe of mirrors) structure is rigid; you can’t do 0X (mirror of stripes) instead at all, and you can’t stack vdevs in any other configuration.
For argument’s sake, let’s assume most small installations would have a pool with only a single RAID-Z2 vdev.
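That hypothetical pool would be created with something like (device names illustrative):

    # one pool containing a single six-disk RAIDZ2 vdev
    zpool create tank raidz2 sda sdb sdc sdd sde sdf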
Can’t add/remove disks to a RAID
You can’t shrink a RAIDZ vdev by removing disks, and you can’t grow a RAIDZ vdev by adding disks.
All you can do in ZFS is expand your pool by creating a whole second RAIDZ vdev and striping the pool across it, creating a RAID60 – you can’t just have one big RAID6. This could badly affect your storage efficiency.
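Continuing the hypothetical RAID-Z2 pool from above, the only supported way to add whole disks is roughly:

    # this does NOT widen the existing RAIDZ2 vdev; it adds a second
    # RAIDZ2 vdev and stripes the pool across both, i.e. a RAID60 layout
    zpool add tank raidz2 sdg sdh sdi sdj sdk sdl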
(Just for comparison, mdadm has let you grow a RAID volume by adding disks since 2006 and shrink it by removing disks since 2009.)
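For comparison, an mdadm reshape looks roughly like this, assuming an existing six-disk RAID6 at /dev/md0 (sizes and names illustrative):

    # grow: add a disk and reshape onto seven devices
    mdadm --add /dev/md0 /dev/sdg
    mdadm --grow /dev/md0 --raid-devices=7

    # shrink (sketch): shrink the filesystem first, then reduce the
    # array size and reshape down to fewer devices
    mdadm --grow /dev/md0 --array-size=<new-size>
    mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0.backup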
Growing a RAIDZ vdev by adding disks is at least coming soon. It is still a WIP as of August 2021 despite a breathless Ars Technica article about it in June.
There are several Ars Technica links in this blog post. I like Ars a lot and appreciate the Linux coverage, but as an influential outlet, why are they so bullish about ZFS? It turns out all their ZFS articles are written by one person, who is also a mod of /r/zfs and hangs out there a lot. At least he is highly informed on the topic.
RAIDZ is slow
For some reason, ZFS’s file-level RAIDZ means IOPS only scale per vdev, not per underlying device: a 10-disk RAIDZ2 has IOPS similar to a single disk.
(Just for comparison, mdadm’s block-level RAID6 will deliver more IOPS.)
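If you want to check this on your own hardware, a small-random-read fio run against each layout is a reasonable test (filename and sizes are illustrative):

    # 4k random reads are where per-vdev IOPS scaling hurts the most
    fio --name=randread --filename=/tank/fio.test --size=4G \
        --rw=randread --bs=4k --iodepth=32 --ioengine=libaio \
        --runtime=60 --time_based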