[TUHS] The evolution of Unix facilities and architecture
Theodore Ts’o
tytso at mit.edu
Sun May 14 14:30:24 AEST 2017
- Previous message (by thread): [TUHS] The evolution of Unix facilities and architecture
- Next message (by thread): [TUHS] The evolution of Unix facilities and architecture
On Thu, May 11, 2017 at 03:25:47PM -0700, Larry McVoy wrote:
> This is one place where I think Linux kicked Unix's ass. And I am not
> really sure how they did it, I have an idea but am not positive. Unix
> file systems up through UFS as shipped by Sun, were all vulnerable to
> what I call the power out test. Untar some big tarball and power off
> the machine in the middle of it. Reboot. Hilarity ensues (not).
>
> You were dropped into some stand alone shell after fsck threw up its
> hands and it was up to you to fix it. Dozens and dozens of errors.
> It was almost always faster to go to backups because figuring that
> stuff out, file by file (which I have done more than once), gets you
> to the point that you run "fsck -y" and go poke at lost+found when
> fsck is done, realize that there is no hope, and reach for backups.
>
> Try the same thing with Linux. The file system will come back, starting
> with, I believe, ext2.
>
> My belief is that Linux orders writes such that while you may lose data
> (as in, a process created a file, the OS said it was OK, but that file
> will not be in the file system after a crash), the rest of the file
> system will be consistent. I think it's as if you powered off the
> machine a few seconds earlier than you actually did; some stuff is in
> flight, and until it can be written out in the proper order you may
> lose data on a hard reset.

So the story is a bit complicated here, and may be an example of "worse
is better" --- which, ironically, is one of the explanations offered for
why BSD/Unix won even though Lisp was technically superior[1] --- but in
this case, it's Linux that did something "dirty", and BSD that did
something that was supposed to be the "better" solution.

[1] https://www.jwz.org/doc/worse-is-better.html

So first let's talk about ext2 (which indeed does not have file system
journalling; that came in ext3). The BSD Fast File System goes to a huge
amount of effort to make sure that writes are sent to the disk in
exactly the right order so that fsck can actually fix things. This
requires that the disk not reorder writes (e.g., write caching is
disabled or in write-through mode).

Linux, in ext2, didn't bother with trying to get the write order correct
at all. None. Nada. Zip. Writes would go out in whatever order the
elevator scheduler dictated, so on a power failure or a kernel crash,
the order in which metadata writes reached the disk was completely
unconstrained.

Sounds horrible, right? In many ways, it was. And I lost count of how
often NetBSD and FreeBSD users would talk about how primitive and
horrible ext2 was in comparison to FFS, which had all of this excellent
engineering work to make sure writes happened in the correct order such
that fsck was guaranteed to always be able to fix things.

So why did Linux get away with it? When I wrote the fsck for ext2, I
knew that anything could and would happen, so I implemented it to be
extremely paranoid about never losing any data. If there was a chance
that an expert could recover the data, e2fsck would stop and ask the
system administrator to take a look. If the user ran with fsck -y, the
default was to drop files into lost+found, whereas the FFS fsck, in some
cases, "knew" from the order in which writes were staged out that the
right thing to do was to let the unlink complete, so it would let the
refcount go to zero, or stay at zero.

The other thing that we did in Linux is that I made sure we h
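A minimal sketch of the window Larry describes above ("the OS said it
was OK, but that file will not be in the file system after a crash"):
on a POSIX system, write() returning success only means the data reached
the page cache, so an application that wants a newly created file to
survive the power-out test has to fsync() the file and then fsync() the
containing directory so the new name is durable too. The file name and
paths below are illustrative, not taken from the discussion above.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char buf[] = "data we cannot afford to lose\n";

        /* Create and write the file; at this point a crash can still
         * throw all of it away, even though every call "said it was OK". */
        int fd = open("important.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
            perror("write"); return 1;
        }

        /* Step 1: force the file's data and inode out to stable storage. */
        if (fsync(fd) < 0) { perror("fsync file"); return 1; }
        if (close(fd) < 0) { perror("close"); return 1; }

        /* Step 2: fsync the containing directory as well, so the new
         * directory entry (the *name*) is also on disk after a crash. */
        int dfd = open(".", O_RDONLY);
        if (dfd < 0) { perror("open dir"); return 1; }
        if (fsync(dfd) < 0) { perror("fsync dir"); return 1; }
        close(dfd);

        return 0;
    }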
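And a minimal sketch of the "go poke at lost+found" step: e2fsck
reconnects orphaned inodes under names like "#12345" (the inode number),
so one quick way to see whether anything worth salvaging survived an
"fsck -y" is to walk that directory and print sizes. The default mount
point below is an assumption for the example; pass the real lost+found
path as the first argument.

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        /* Default path is an assumption for the example. */
        const char *dir = (argc > 1) ? argv[1] : "/mnt/lost+found";
        char path[4096];
        struct stat st;
        struct dirent *de;

        DIR *d = opendir(dir);
        if (!d) { perror(dir); return 1; }

        /* List every recovered entry with its size, skipping . and .. */
        while ((de = readdir(d)) != NULL) {
            if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
                continue;
            snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
            if (stat(path, &st) == 0)
                printf("%-20s %10lld bytes%s\n", de->d_name,
                       (long long) st.st_size,
                       S_ISDIR(st.st_mode) ? " (directory)" : "");
        }
        closedir(d);
        return 0;
    }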