Last week, I was working on a crash bug in a game running in Proton.
About 90% of the time, the game would crash at some random point during
the publisher logo screens. Rarely, it would succeed and get to the
game’s main menu. That kind of inconsistent behavior points to some kind
of invalid memory bug.
While debugging this, I mentioned to my coworker Paul Gofman that it
looked like a bogus write to an important part of memory. Paul suggested
that I try setting a hardware write watch breakpoint to determine where
the bad write was coming from. I hadn’t done that before, so Paul
provided me with example code. I found it really useful, so here’s a
walkthrough of how I debugged the issue and used the x86 debug registers
to diagnose the problem and develop a fix.
Here’s the fatal exception from a WINEDEBUG log of the game crashing:
0124:0128:trace:seh:dispatch_exception code=c0000005 flags=0 addr=00000001700680B4 ip=00000001700680B4 tid=0128
I then ran the game under Wine’s debugger, winedbg, to discover where
the crash was occurring (output heavily trimmed to relevant
sections):
[aeikum@aeikum ~]$ /tmp/proton_aeikum/winedbg_run
WineDbg starting on pid 00e0
0x0000000170057a59 ntdll+0x57a59: ret
Wine-dbg>c
...
Unhandled exception: page fault on read access to 0xffffffffffffffff in 64-bit code (0x00000001700440f8).
Wine-dbg>info share 0x1700440f8
Module Address Debug info Name
PE 0000000170000000-00000001700a1000 Export ntdll
This shows that it is crashing in Wine code, somewhere in ntdll. I
used some printf-debugging to discover exactly where the crash was
occurring. I narrowed it down to the function get_full_path_helper in
dlls/ntdll/path.c.
Specifically it’s crashing when dereferencing a bogus cd
pointer in this code:
    case RELATIVE_DRIVE_PATH:   /* c:foo */
        dep = 2;
        if (wcsnicmp( name, cd->Buffer, 2 ))
Here’s the bogus pointer value from one run (the value changes in
every run):
0124:0128:err:file:get_full_path_helper cd: 6F72507865546E8F
That’s obviously garbage (read in little-endian byte order, it’s
actually the ASCII string "\x8FnTexPro"). Where is this garbage coming
from? Earlier in the function, you can find where cd is calculated:
    if (NtCurrentTeb()->Tib.SubSystemTib)  /* FIXME: hack */
        cd = &((WIN16_SUBSYSTEM_TIB *)NtCurrentTeb()->Tib.SubSystemTib)->curdir.DosPath;
    else
        cd = &NtCurrentTeb()->Peb->ProcessParameters->CurrentDirectory.DosPath;
One more debug line to print the value of SubSystemTib shows the
problem:
0124:0128:err:file:get_full_path_helper SubSystemTib: 6F72507865546E6F
As you can read from the Wine code above, a non-NULL SubSystemTib
indicates that Wine is dealing with a 16-bit process, so it uses the
16-bit thread information block (TIB) structure instead of the modern
Windows TIB. This is obviously not a 16-bit game, so the problem is that
SubSystemTib is somehow being se