nullprogram.com/blog/2023/02/15/
Seven years ago I wrote about “freestanding” Windows executables.
After an additional seven years of practical experience both writing and
distributing such programs, half using a custom-built toolchain,
it’s time to revisit these cabalistic incantations and otherwise scant
details. I’ve tweaked my older article over the years as I’ve learned, but
this is a full replacement and does not assume you’ve read it. The “why”
has been covered and the focus will be on the “how”. Both the GNU
and MSVC toolchains will be considered.
I no longer call these “freestanding” programs since that term is, at
best, inaccurate. In fact, we will be actively avoiding GCC features
associated with that label. Instead I call these CRT-free programs,
where CRT stands for the C runtime, the Windows-oriented term for
libc. This term communicates both intent and scope.
Entry point
You should already know that main is not the program’s entry point, but
a C application’s entry point. The CRT provides the entry point, where it
initializes the CRT, including parsing command line options, then
calls the application’s main. The real entry point doesn’t have a name.
It’s just the address of the function to be called by the loader without
arguments.
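As a rough illustration of that division of labor, here is a minimal sketch of what a CRT-provided entry point does. The names crt_entry and app_main are hypothetical stand-ins, and the argument list is hard-coded where a real CRT would parse the command line.

```c
// Hypothetical stand-in for the application's main. In a normal program
// this function would be named main and the CRT would supply its caller.
static int app_main(int argc, char **argv)
{
    (void)argv;
    return argc == 1 ? 0 : 1;
}

// Hypothetical stand-in for the CRT's entry point. The loader calls it
// with no arguments, so it must build argc/argv itself before handing
// control to the application.
int crt_entry(void)
{
    static char name[] = "app";  // assumption: a real CRT parses the command line
    static char *argv[] = {name, 0};
    return app_main(1, argv);
}
```

A CRT-free program takes over the role of crt_entry and does only as much setup as it actually needs.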
You might naively assume you could continue using the name main and tell
the linker to use it as the entry point. You would be wrong. Avoid the
name main! It has a special meaning in C and gets special treatment. Using
it without a conventional CRT will confuse your tools and may cause build
issues.
While you can use almost any other name you like, the conventional names
are mainCRTStartup (console subsystem) and WinMainCRTStartup (windows
subsystem). It’s easy to remember: Append CRTStartup to the name you’d
use in a normal CRT-linking application. I strongly recommend using these
names because it reduces friction. Your tools are already familiar with
them, so you won’t need to do anything special.
int mainCRTStartup(void); // console subsystem
int WinMainCRTStartup(void); // windows subsystem
The MSVC linker documentation says the entry point uses the __stdcall
calling convention. Ignore this and do not use __stdcall for your
entry point! Since entry points take no arguments, there is no practical
difference from the __cdecl calling convention, so it does not actually
matter. Rather, the goal is to avoid __stdcall function decorations.
In particular, the GNU linker --entry option does not understand them,
nor can it find decorated entry points on its own. If you use __stdcall,
then the 32-bit GNU linker will silently (!) choose the beginning of your
.text section as the entry point.
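To make the decoration concrete: on 32-bit x86, a __cdecl function named f becomes the symbol _f, while a __stdcall function becomes _f@N, where N is the size of its arguments in bytes. The helper below is an illustrative sketch only. The name decorate is hypothetical, and since it uses stdio it is a hosted demonstration, not part of a CRT-free program.

```c
#include <stdio.h>

// Writes the 32-bit x86 symbol name for a C function into dst and
// returns its length. decorate is a hypothetical helper for illustration.
int decorate(char *dst, size_t len, const char *name, int is_stdcall,
             unsigned argbytes)
{
    if (is_stdcall) {
        // __stdcall: leading underscore, then @ and the argument size
        return snprintf(dst, len, "_%s@%u", name, argbytes);
    }
    // __cdecl: leading underscore only
    return snprintf(dst, len, "_%s", name);
}
```

For an entry point with no arguments, the __stdcall form is _mainCRTStartup@0 and the __cdecl form is _mainCRTStartup. The trailing @0 is the part that trips up the linker.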
If you’re using C++, then of course you will also need to use extern "C"
so that it’s not name-mangled. Otherwise the results are similarly bad.
If using -fwhole-program, you will need to mark your entry point as
externally visible for GCC so that it knows it’s an entry point. While
linkers are familiar with conventional entry point names, GCC the
compiler is not. Normally you do not need to worry about this.
__attribute__((externally_visible)) // for -fwhole-program
int mainCRTStartup(void)
{
    return 0;
}
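Since externally_visible is specific to GCC and extern "C" only applies to C++, one way to keep a single source file portable is to hide both behind the preprocessor. A minimal sketch, in which the ENTRYPOINT macro name is hypothetical and the Clang exclusion is an assumption about that compiler's attribute support:

```c
// ENTRYPOINT is a hypothetical macro name. Assumption: Clang defines
// __GNUC__ but does not implement externally_visible, so exclude it.
#if defined(__GNUC__) && !defined(__clang__)
#  define ENTRYPOINT __attribute__((externally_visible))
#else
#  define ENTRYPOINT
#endif

#ifdef __cplusplus
extern "C"  // keep the entry point from being name-mangled
#endif
ENTRYPOINT
int mainCRTStartup(void)
{
    return 0;
}
```

The attribute is harmless without -fwhole-program, so the macro can stay in place regardless of build flags.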
The entry point returns int. If there are no other threads then the
process will exit with the returned value as its exit status. In practice
this is only useful for console programs. Windows subsystem programs have
threads started automatically, without warning, and it’s almost certain
your main thread is not the last thread. You probably want to use
ExitProcess or even TerminateProcess instead of returning. The latter
exits more abruptly and can avoid issues with certain subsystems, like
DirectSound, not shutting down gracefully: It doesn’t even let them try.
int WinMainCRTStartup(void)
{
    // ...
    TerminateProcess(GetCurrentProcess(), 0);
}
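When the less abrupt exit is acceptable, the same pattern works with ExitProcess. A minimal sketch, where run is a hypothetical placeholder for the application and the _WIN32 guard exists only so the file also compiles on other hosts:

```c
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical placeholder for the application logic. Returns the
// desired exit status.
static int run(void)
{
    return 0;
}

int WinMainCRTStartup(void)
{
    int status = run();
#ifdef _WIN32
    // Ends the process and all its threads, including any that were
    // started behind the program's back.
    ExitProcess((UINT)status);
#endif
    return status;  // only reached on non-Windows hosts
}
```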
Compilation
Starting with the GNU toolchain, you have two ways to get into “CRT-free
mode”: -nostartfiles and -nostdlib. The former is more dummy-proof,
and it’s what I use in build documentation. The latter can be more
complicated, but when it succeeds you get guarantees about the result. I
use it in build scripts I intend to run myself, which I want to fail if
they don’t do exactly what I expect. To illustrate, consider this trivial
program:
#include <windows.h>
int mainCRTStartup(void)
{
    ExitProcess(0);
}
This program uses ExitProcess from kernel32.dll. Compiling is easy:
$ cc -nostartfiles example.c
The -nostartfiles prevents it from linking the CRT entry point, but it
still implicitly passes other “standard” linker flags, including libraries
-lmingw32 and -lkernel32. Programs can use kernel32.dll functions
without explicitly linking that DLL. But, hey, isn’t -lmingw32 the CRT,
the thing we’re avoiding? It is, but it wasn’t actually linked because the
program didn’t reference it.
$ objdump -p a.exe | grep -Fi .dll
DLL Name: KERNEL32.dll
However, -nostdlib does not pass any of these libraries, so you need to
do so explicitly.
$ cc -nostdlib example.c -lkernel32
The MSVC toolchain behaves a little like -nostartfiles, not linking a
CRT unless you need it, semi-automatically. However, you’ll need to list
kernel32.dll and tell it which subsystem you’re using.
$ cl example.c /link /subsystem:console kernel32.lib
However, MSVC has a handy little feature to list these arguments in the
source file.
#ifdef _MSC_VER
#pragma comment(linker, "/subsystem:console")
#pragma comment(lib, "kernel32.lib")
#endif
This information must go somewhere, and I prefer the source file rather
than a build script. Then anyone can point MSVC at the source without
worrying about options.
I try to make all my Windows programs so simply built.
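Putting these pieces together, here is a sketch of a complete single-file console program that should build with the commands shown above on either toolchain. The message text and the _WIN32 guard are illustrative assumptions. GetStdHandle and WriteFile are kernel32 functions, so no CRT is involved.

```c
#ifdef _WIN32
#include <windows.h>
#endif

#ifdef _MSC_VER
#pragma comment(linker, "/subsystem:console")
#pragma comment(lib, "kernel32.lib")
#endif

int mainCRTStartup(void)
{
    // static storage avoids a stack initialization that the compiler
    // might otherwise implement with a call into the CRT
    static const char msg[] = "hello\n";
    int ok = 1;
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD len = (DWORD)(sizeof(msg) - 1);
    DWORD written = 0;
    ok = WriteFile(out, msg, len, &written, 0) && written == len;
#else
    (void)msg;  // nothing to do on non-Windows hosts
#endif
    return !ok;  // console program: returning sets the exit status
}
```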
Stack probes
On Windows, it’s expected that stacks will commit dynamically. That is,
the stack is merely reserved address space, and it’s only committed when
the stack actually grows into it. This made sense 30 years ago as a memory
saving technique, but today it no longer makes sense. However, programs
are still built to use this mechanism.
To function properly, programs must touch each stack page for the first
time in order. Normally that’s not an issue, but if your stack frame
exceeds the page size, there’s a chance it might step over a page. When a
function has a large stack frame, GCC inserts a call to a “stack probe” in
libgcc that touches its pages in the prologue. It’s not unlike stack
clash protection.
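The idea behind a probe can be modeled in ordinary C. This is only a sketch of the access pattern, not the libgcc routine itself: probe_pages and PAGE_BYTES are hypothetical names, and a 4KiB page size is assumed.

```c
#include <stddef.h>

enum { PAGE_BYTES = 4096 };  // assumption: 4KiB pages, as on x86

// Touches a buffer the way a probe walks a large stack frame: starting
// from the high end, which borders memory already in use, and moving
// down in steps of at most one page so that no page is skipped.
// Returns the number of touches. size must be at least 1.
int probe_pages(volatile char *base, size_t size)
{
    int touches = 0;
    size_t off = size;
    while (off > PAGE_BYTES) {
        off -= PAGE_BYTES;
        base[off] = 0;
        touches++;
    }
    base[0] = 0;  // final touch at the low end of the frame
    touches++;
    return touches;
}
```

Because no step is larger than a page, every page between the already-committed memory and the bottom of the frame is touched in order, which is what the dynamic commit mechanism requires.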
For example, if I have a 4kiB local variable:
int mainCRTStartup(void)
{
char buf[1<<12] = {0};
return