November 22, 2013 – by TJ Fontaine
A few weeks ago, Eran Hammer of Walmart Labs came to the Node.js core team complaining of a memory leak he had been tracking down for months. By expending a lot of effort over those months he had taken the memory leak from a couple hundred megabytes a day down to a mere eight megabytes a day. That in and of itself is an impressive feat; however, the source of the final portion of the leak remained elusive.
So he spent more effort and produced a test case that reproduced the leak using only Node core APIs, which placed the fault squarely at Node.js’ doorstep. He found that the more HTTP client requests he made, the more memory his Node process consumed, though the growth was very slow. Here’s the graph from his initial data:

The top line is barely trending up, but it’s enough to manifest in frustrating ways for Walmart. The good news is that Node.js is getting mature; the bad news is that a more mature Node.js means bugs are harder to find. So with the test client and server in hand, I started on my three-week journey to track down this memory leak. Lucky for you, you’re getting the abbreviated version here.
$ node server.js &
$ UMEM_DEBUG="default,audit=200" node --expose-gc --always-compact standalone.js
First things first, I started the server and put it in the background, since we were only interested in the client. Then I ran Node with UMEM_DEBUG set to record various important information about memory allocations. This added a certain amount of overhead to the application, but since we were just using a test script it was completely safe. We also exposed the GC so we could trigger it from inside the script, while at the same time telling V8 to keep the heap as compact as possible.
Since it took a while to manifest, I had a little script to grab interesting information periodically from the client process. Every hour, it grabbed the output of pmap -x and a core file and stored those in Joyent Manta so we didn’t have to worry about storage space.
```bash
#!/bin/bash
set -xe

DEST=~~/public/walmart.graphs

mmkdir -p $DEST
for i in $(seq 1 100); do
  pmap -x $1 | mput $DEST/pmap.$i
  gcore $1
  mput -f core.$1 $DEST/core.$i
  mput -f symboltable.json $DEST/
  sleep 3600
done
```
Then I let this run for 15 hours. To start, let’s compare the output of the first pmap with that of the 15th.
diff
diff -u pmap.1 pmap.15
--- pmap.1 2013-11-21 01:42:27.929994800 +0000
+++ pmap.15 2013-11-21 01:42:09.622017684 +0000
@@ -4,7 +4,7 @@
08040000 32 32 32 - rw--- [ stack ]
08050000 14944 14944 - - r-x-- node
08EF7000 56 56 16 - rwx-- node
-08F05000 15204 15104 15104 - rwx-- [ heap ]
+08F05000 181288 180784 180784 - rwx-- [ heap ]
80199000 4 4 4 - rw--- [ anon ]
8019A000 4 4 4 - rw--- [ anon ]
8019B000 8 8 8 - rw--- [ anon ]
@@ -15,12 +15,13 @@
801D9000 256 256 256 - rw--- [ anon ]
80219000 512 512 512 - rw--- [ anon ]
86300000 100 100 100 - rw--- [ anon ]
-86B00000 1024 1024 1024 - rw--- [ anon ]
+86700000 1024 636 636 - rw--- [ anon ]
+8A500000 36 36 36 - rw--- [ anon ]
+8A50A000 980 484 484 - rwx-- [ anon ]
8F140000 64 64 64 - rw--- [ anon ]
92A00000 36 36 36 - rw--- [ anon ]
92A0A000 164 164 164 - rwx-- [ anon ]
95B00000 228 228 228 - rw--- [ anon ]
-9A300000 1024 664 664 - rw--- [ anon ]
9C000000 4096 4096 4096 - rw--- [ anon ]
9C400000 2048 2048 2048 - rw--- [ anon ]
9C600000 1024 1024 1024 - rw--- [ anon ]
@@ -29,12 +30,11 @@
9CC00000 2048 2048 2048 - rw--- [ anon ]
9CE00000 1024 1024 1024 - rw--- [ anon ]
9CF00000 1024 1024 1024 - rw--- [ anon ]
-A0600000 36 36 36 - rw---