I recently happened upon an implementation of popen() (different API, same idea) using clone(2), and so I opened an issue requesting use of vfork(2) or posix_spawn() for portability. It turns out that on Linux there’s an important advantage to using clone(2). I think I should capture the things I wrote there in a better place. A gist, a blog, whatever.
So here goes.
Long ago, I, like many Unix fans, thought that fork(2) and the fork-exec process spawning model were the greatest thing, and that Windows sucked for only having exec*() and _spawn*(), the last being a Windows-ism.
After many years of experience, I learned that fork(2) is in fact evil. And vfork(2), long said to be evil, is in fact goodness. A slight variant of vfork(2) that avoids the need to block the parent would be even better (see below).
Extraordinary statements require explanation, so allow me to explain.
I won’t bother explaining what fork(2) is: if you’re reading this, I assume you know. But I’ll explain vfork(2) and why it was said to be harmful. vfork(2) is very similar to fork(2), but the new process it creates runs in the same address space as the parent as if it were a thread, even sharing the same stack as the thread that called vfork(2)! Two threads can’t share a stack, so the parent is stopped until the child does its thing: either exec*(2) or _exit(2).
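To make that concrete, here is a minimal sketch of the vfork-then-exec pattern. The helper name run_with_vfork() is made up for illustration, and error handling is kept to the bare minimum (for instance, it does not retry waitpid() on EINTR).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] (looked up in PATH) via vfork(2).
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
int run_with_vfork(char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /*
         * Child: we are borrowing the parent's address space and stack,
         * so do nothing here except exec or _exit.
         */
        execvp(argv[0], argv);
        _exit(127); /* exec failed; 127 follows the shell convention */
    }

    /* Parent: it resumes only once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argv such as {"sh", "-c", "exit 3", NULL} should return 3, assuming sh is on the PATH.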
Now, 3BSD added vfork(2), and years later 4.4BSD removed it as it was by then considered harmful. Most subsequent man pages say as much. But the derivatives of 4.4BSD restored it and do not call it harmful. There’s a reason for this: vfork(2) is much cheaper than fork(2): much, much cheaper. That’s because fork(2) has to either copy the parent’s address space, or arrange for copy-on-write (which is supposed to be an optimization to avoid unnecessary copies). But even COW is very expensive because it requires modifying memory mappings, taking expensive page faults, and so on. Modern kernels tend to seed the child with a copy of the parent’s resident set, but if the parent has a large memory footprint (e.g., is a JVM), then the RSS will be huge. So fork(2) is inescapably expensive except for small programs with small footprints (e.g., a shell).
So you begin to see why fork(2) is evil. And I haven’t yet gotten to fork-safety perils! Fork-safety considerations are a lot like thread-safety, but it is harder to make libraries fork-safe than thread-safe. I’m not going to go into fork-safety here: it’s not necessary.
(Before I go on I should admit to hypocrisy: I do write code that uses fork(2), often for multi-processing daemons, as opposed to multi-threaded ones, though I often write the latter as well. But the forks there happen very early on when nothing fork-unsafe has happened yet and the address space is small, thus avoiding most evils of fork(2). vfork(2) cannot be used for this purpose. On Windows one would have to use CreateProcess() or _spawn() to implement multi-processed daemons, which is a huge pain in the neck.)
Why did I ever think fork(2) was elegant then? It was the same reason that everyone else did and does: CreateProcess*(), _spawn(), posix_spawn() and such functions are extremely complex, and they have to be because there is an enormous number of things one might do between fork() and exec() in, say, a shell. But with fork() and exec() one does not need a language or API that can express all those things: the host language will do! fork(2) gave Unix’s creators the ability to move all that complexity out of kernel-land into user-land, where it’s much easier to develop software. It made them more productive, perhaps much more so. The price Unix’s creators paid for that elegance was the need to copy address spaces. Since back then programs and processes were small, that inelegance was easy to overlook. But now processes tend to be huge, and that makes copying even just a parent’s resident set, and page table fiddling for the rest, extremely expensive.
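As a rough illustration of that trade-off, here is one small between-fork-and-exec task, redirecting the child's stdout to a file, written both ways. The helper names (wait_exit_status(), run_redirected_fork(), run_redirected_spawn()) are made up for this sketch, and the exit codes 126 and 127 are just the shell convention. Note how the fork() version expresses the redirection as ordinary C, while the posix_spawn() version has to describe it as data for the library to interpret.

```c
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: reap pid; return its exit status, or -1. */
static int wait_exit_status(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* fork/exec style: the redirection is plain code run in the child. */
int run_redirected_fork(char *const argv[], const char *out_path)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd == -1 || dup2(fd, STDOUT_FILENO) == -1)
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }
    return wait_exit_status(pid);
}

/* posix_spawn style: the same redirection, described as a file action. */
int run_redirected_spawn(char *const argv[], const char *out_path)
{
    posix_spawn_file_actions_t actions;
    pid_t pid;
    int err;

    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;
    /* Ask for out_path to be opened as fd 1 in the child. */
    err = posix_spawn_file_actions_addopen(&actions, STDOUT_FILENO, out_path,
                                           O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (err == 0)
        err = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    if (err != 0)
        return -1;
    return wait_exit_status(pid);
}
```

One redirection is manageable either way; the gap widens as the list of things to do in the child grows (process groups, signal masks, resource limits, and so on), since each one needs its own posix_spawn attribute or file action, if one exists at all.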
But vfork() has all that elegance, and none of the downsides of fork()!
vfork() does have one downside: that the parent (specifically: the thread in the parent that calls vfork()) and child share a stack, necessitating that the parent (thread) be stopped until the child exec()s or _exit()s. (This can be forgiven due to vfork(2)’s long preceding threads: when threads came along the need for a separate stack for each new thread became utterly clear and unavoidable. The fix for threading was to use a new stack for the new thread and use a callback function and argument as the main()-alike for that new stack.) But blocking is bad because synchronous behavior is bad, especially when it’s the only option yet it could have been better. An asynchronous version of vfork() would have to run the child in a new/alternate stack. Let’s call it afork(), or avfork().
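On Linux, something in this spirit can be approximated with the glibc clone() wrapper: CLONE_VM shares the address space as vfork() does, and leaving out CLONE_VFORK means the parent is not suspended. What follows is only a rough, Linux-specific sketch under those assumptions. The names struct avfork_child, avfork_exec() and avfork_wait() are invented here, the 64 KiB stack size is an arbitrary guess, and the sketch ignores real hazards such as the child sharing the parent thread's errno and signal-handling subtleties.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024) /* arbitrary size for this sketch */

/* Hypothetical handle for a child that is still running. */
struct avfork_child {
    pid_t pid;
    void *stack; /* the child's private stack, owned by the parent */
};

/* The main()-alike that runs on the child's own stack. */
static int child_main(void *arg)
{
    char *const *argv = arg;

    execvp(argv[0], argv);
    _exit(127); /* exec failed */
}

/*
 * Start argv[0] without blocking the caller. argv must stay valid and
 * unmodified until the child has exec'd; this sketch simply assumes the
 * caller keeps it alive until avfork_wait() returns.
 */
int avfork_exec(struct avfork_child *c, char *const argv[])
{
    c->stack = malloc(CHILD_STACK_SIZE);
    if (c->stack == NULL)
        return -1;

    /*
     * CLONE_VM: share the address space, as vfork() does.
     * No CLONE_VFORK: the parent keeps running.
     * SIGCHLD: exit signal, so waitpid() works as usual.
     * The stack grows down on most architectures, so pass its top.
     */
    c->pid = clone(child_main, (char *)c->stack + CHILD_STACK_SIZE,
                   CLONE_VM | SIGCHLD, (void *)argv);
    if (c->pid == -1) {
        free(c->stack);
        c->stack = NULL;
        return -1;
    }
    return 0;
}

/* Reap the child and release its stack. Returns exit status, or -1. */
int avfork_wait(struct avfork_child *c)
{
    int status;

    if (waitpid(c->pid, &status, 0) == -1)
        return -1; /* child may still be on its stack: do not free it */
    free(c->stack); /* safe now: the child is gone */
    c->stack = NULL;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Between avfork_exec() and avfork_wait() the parent is free to do other work, which is the whole point. The simplification here is holding on to the stack until the child exits; a more careful version would release it as soon as the child has exec'd, which needs some way for the parent to learn that the exec happened.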