Nocc – A Distributed C++ Compiler by gjvc

nocc propagates a compiler invocation to a remote machine: nocc g++ 1.cpp calls g++ remotely, not locally.

nocc speeds up compilation of large C++ projects: when you have multiple remotes, many local jobs are distributed between them and run in parallel.

Its most significant benefit, however, is greatly speeding up recompilation across CI/CD build agents and across developers working on the same project,
because they share remote caches.
Once a .cpp file has been compiled, the resulting object file is reused by other agents without launching compilation again.

nocc integrates easily into any build system, since the build system only needs to prefix the commands it executes with nocc.

The reason why nocc was created

nocc was created at VK.com to speed up KPHP compilation.
KPHP is a PHP compiler: it converts PHP sources to C++.
The VK.com codebase is huge: at the moment we have about 150 000 autogenerated cpp files.

Our goal was to greatly improve the performance of the “C++ → binary” step.

Starting in 2014, we used distcc.
In 2019, we patched distcc to support precompiled headers. That gave us a 5x performance boost.
In 2021, we decided to implement a distcc replacement. As a result, we got a 2x to 9x speedup over the patched version.

Installation and configuration

The easiest way is to download ready-made binaries: go to the releases page
and download the latest .tar.gz for your system. Extracting it gives you 3 binaries.

You can also build nocc from source; see the installation page.

For a test launch (to make sure that everything works), proceed to the test launch section.

For a list of command-line arguments and environment variables, visit the configuration page.

Consider the following file named 1.cpp:

#include "1.h"

int square(int a) { 
  return a * a; 
}

and let 1.h be just:

int square(int a);

When you run nocc g++ 1.cpp -o 1.o -c, the compilation is done remotely.

What’s actually happening here:

1. nocc parses the command-line invocation: input files, include dirs, cxx flags, etc.
2. For the input file (1.cpp), nocc finds all dependencies: it traverses all #include directives recursively (which here yields just one file, 1.h).
3. nocc uploads the files to a server and waits.
4. nocc-server executes the same command line (same cxx flags, but with modified paths).
5. nocc-server sends the compiled object file back.
6. nocc saves 1.o, exactly as if it had been compiled locally.
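Step 2 is the interesting one: before anything is uploaded, the client has to know every file the compilation will read. Below is a minimal C++17 sketch of such a recursive #include walk. It assumes that only quoted includes matter: angle-bracket system headers are skipped, and #if conditions and macro-computed include names are ignored. The function names are hypothetical, not nocc's actual code. The seen set prevents a header from being visited twice, which also covers include cycles.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Extracts NAME from a line of the form  #include "NAME"  (spaces allowed after '#').
// Angle-bracket includes such as <vector> are ignored in this sketch.
static bool parse_quoted_include(const std::string& line, std::string& name) {
  std::size_t pos = line.find_first_not_of(" \t");
  if (pos == std::string::npos || line[pos] != '#') return false;
  pos = line.find_first_not_of(" \t", pos + 1);
  if (pos == std::string::npos || line.compare(pos, 7, "include") != 0) return false;
  std::size_t open = line.find('"', pos + 7);
  if (open == std::string::npos) return false;
  std::size_t close = line.find('"', open + 1);
  if (close == std::string::npos) return false;
  name = line.substr(open + 1, close - open - 1);
  return true;
}

// Looks for NAME next to the including file first, then in each -I directory.
static std::optional<fs::path> resolve(const std::string& name, const fs::path& from_dir,
                                       const std::vector<fs::path>& include_dirs) {
  std::vector<fs::path> candidates{from_dir};
  candidates.insert(candidates.end(), include_dirs.begin(), include_dirs.end());
  for (const fs::path& dir : candidates) {
    fs::path p = dir / name;
    if (fs::is_regular_file(p)) return fs::weakly_canonical(p);
  }
  return std::nullopt;  // not found: likely a system header, skipped here
}

// Depth-first walk; `seen` stops cycles and repeated visits of the same header.
static void collect(const fs::path& file, const std::vector<fs::path>& include_dirs,
                    std::set<fs::path>& seen) {
  if (!seen.insert(file).second) return;
  std::ifstream in(file);
  std::string line, name;
  while (std::getline(in, line)) {
    if (!parse_quoted_include(line, name)) continue;
    if (auto dep = resolve(name, file.parent_path(), include_dirs)) collect(*dep, include_dirs, seen);
  }
}

// Returns the .cpp file itself plus every header it transitively includes.
std::vector<fs::path> find_dependencies(const fs::path& cpp, const std::vector<fs::path>& include_dirs) {
  std::set<fs::path> seen;
  collect(fs::weakly_canonical(cpp), include_dirs, seen);
  return std::vector<fs::path>(seen.begin(), seen.end());
}
```

On the example above, find_dependencies("1.cpp", {}) should return 1.cpp and 1.h, which are the two files that would be uploaded.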

Besides the object file, nocc-server sends back the exit code, stdout, and stderr of the C++ compiler, and the nocc process reproduces them as its own output.
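Replaying the remote result is what makes nocc transparent to the build system. Compiler diagnostics appear exactly as they would locally, and a failed compilation still fails the build. Here is a minimal sketch that assumes a hypothetical RemoteResult struct; the real message format is not described on this page.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical shape of what comes back from the server besides the .o file.
struct RemoteResult {
  int exit_code;
  std::string stdout_text;
  std::string stderr_text;
};

// Writes the remote compiler's output as if it had run locally and returns
// the exit code that the wrapper process should exit with.
int replay(const RemoteResult& r, std::ostream& out, std::ostream& err) {
  out << r.stdout_text;
  err << r.stderr_text;
  out.flush();
  err.flush();
  return r.exit_code;
}
```

In a wrapper's main, this would be used as return replay(result, std::cout, std::cerr);.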

In production, you have multiple compilation servers

Conceptually, you can think of the working scheme as follows.

Many nocc processes are launched simultaneously, far more than you could run if you used g++ locally.

Every nocc invocation handles exactly one .cpp -> .o compilation; this is by design.
It performs the remote compilation and exits: nocc is just a front-end layer between any build system and a real C++ compiler.

For every invocation, a remote server is chosen, all dependencies are detected, missing dependencies are uploaded,
and the server streams back a ready obj file.
This happens in parallel for all command lines.
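The text does not say how the server is chosen. One plausible policy, assumed here purely for illustration, is to hash a key such as the file's path relative to the project root and take it modulo the number of servers. That way, repeated compilations of the same .cpp land on the same machine and can reuse its caches. FNV-1a is used because its output is fully specified and identical across processes, which std::hash does not guarantee. choose_server and the addresses in the test are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a, 64-bit: a simple hash whose result does not vary between runs or machines.
std::uint64_t fnv1a64(const std::string& s) {
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical policy: the same key always maps to the same server.
// Precondition: `servers` is not empty.
const std::string& choose_server(const std::string& key, const std::vector<std::string>& servers) {
  return servers[fnv1a64(key) % servers.size()];
}
```

A plain modulo reshuffles most files when a server is added or removed. Consistent hashing or rendezvous hashing would limit that, at the cost of slightly more code.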

For efficiency, all connections are actually served by a single background nocc-daemon.

nocc-daemon is written in Go, whereas nocc is a very lightweight C++ wrapper
whose only purpose is to pipe the command line to the daemon, wait for the response, and exit.
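Piping a command line through a socket needs some framing. Individual arguments may contain spaces or quotes (think -DMSG="a b"), so joining them with spaces would be ambiguous. The sketch below assumes a simple length-prefixed format: a little-endian u32 argument count, then each argument as a u32 length followed by its bytes. This format is hypothetical and is not nocc's actual wire protocol, which this page does not describe.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical framing: u32 argc, then per argument a u32 length and the raw bytes.
std::string encode_argv(const std::vector<std::string>& args) {
  std::string msg;
  auto put_u32 = [&msg](std::uint32_t v) {
    for (int i = 0; i < 4; ++i) msg.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
  };
  put_u32(static_cast<std::uint32_t>(args.size()));
  for (const std::string& a : args) {
    put_u32(static_cast<std::uint32_t>(a.size()));
    msg += a;
  }
  return msg;
}

// Inverse of encode_argv; throws on a truncated message.
std::vector<std::string> decode_argv(const std::string& msg) {
  std::size_t pos = 0;
  auto get_u32 = [&]() {
    if (msg.size() - pos < 4) throw std::runtime_error("truncated message");
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
      v |= std::uint32_t(static_cast<unsigned char>(msg[pos + i])) << (8 * i);
    pos += 4;
    return v;
  };
  std::uint32_t argc = get_u32();
  std::vector<std::string> args;
  for (std::uint32_t i = 0; i < argc; ++i) {
    std::uint32_t len = get_u32();
    if (msg.size() - pos < len) throw std::runtime_error("truncated message");
    args.push_back(msg.substr(pos, len));
    pos += len;
  }
  return args;
}
```

Because every argument carries its own length, empty arguments and arguments with spaces round-trip unchanged, and the daemon can rebuild argv byte for byte.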

So, the final working scheme is the following:

- The very first nocc invocation starts nocc-daemon: the daemon serves gRPC connections and does all the actual work of remote compilation.
- Every nocc invocation pipes a command line (g++ ...) to the daemon via a Unix socket; the daemon compiles it remotely and writes the resulting .o file, then the nocc process exits.
- nocc jobs start and exit: the build system launches and balances them.
- nocc-daemon exits 15 seconds after nocc processes stop connecting (that is, after the compilation process finishes).
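The daemon lifetime rule can be expressed as a small idle timer. This sketch assumes two things beyond the text. First, the daemon also counts open connections, so it never exits while a compilation is still running. Second, the clock is injected rather than read internally, so the logic can be tested without sleeping. IdleShutdown is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper that decides when an idle daemon may shut down.
class IdleShutdown {
 public:
  using Clock = std::chrono::steady_clock;

  IdleShutdown(Clock::duration timeout, Clock::time_point now)
      : timeout_(timeout), last_activity_(now) {}

  // A nocc process connected: the daemon stays busy until it disconnects.
  void on_connect(Clock::time_point now) {
    ++active_;
    last_activity_ = now;
  }

  // A nocc process received its result and left; the idle countdown restarts.
  void on_disconnect(Clock::time_point now) {
    --active_;
    last_activity_ = now;
  }

  // True once nobody is connected and `timeout` has elapsed since the last activity.
  bool should_exit(Clock::time_point now) const {
    return active_ == 0 && now - last_activity_ >= timeout_;
  }

 private:
  Clock::duration timeout_;
  Clock::time_point last_activity_;
  int active_ = 0;
};
```

A daemon's accept loop could call should_exit(IdleShutdown::Clock::now()) periodically, for example once per second, and shut down when it returns true.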

For more details, see the nocc architecture page.

nocc is also a remote src/obj cache

The main idea behind nocc is that the 2nd, the 3rd, the Nth runs are faster than the first,
even if you clean the build directory, even on another machine, even in a renamed folder.

This is thanks to remote caches.
nocc does not upload files that have already been uploaded: that’s the src cache.
nocc does not compile files that have already been compiled: that’s the obj cache.
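Why do these caches survive a clean build directory, another machine, or a renamed folder? A cache key only has to depend on what actually determines the output. The sketch below is an assumption about how such a key could be built, not nocc's documented scheme. It hashes the compiler flags (assumed already path-normalized, with the output path removed) plus the relative name and contents of every file the compilation reads. Absolute paths are deliberately left out. FNV-1a is reused for brevity; a real shared cache would want a collision-resistant hash such as SHA-256. Dependency and obj_cache_key are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// FNV-1a (64-bit), continuing from a running state `h`; deterministic across runs.
std::uint64_t fnv1a64(const std::string& s, std::uint64_t h = 14695981039346656037ULL) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Feeds one length-prefixed field into the hash, so that ("ab", "c")
// and ("a", "bc") do not hash the same byte stream.
std::uint64_t mix(std::uint64_t h, const std::string& field) {
  h = fnv1a64(std::to_string(field.size()) + ":", h);
  return fnv1a64(field, h);
}

// Hypothetical: one file read by the compilation, named relative to the project root.
struct Dependency {
  std::string relative_name;
  std::string contents;
};

// Hypothetical obj-cache key: depends on flags and file contents, never on the
// absolute directory, so a renamed folder or another machine yields the same key.
std::string obj_cache_key(const std::vector<std::string>& flags,
                          const std::vector<Dependency>& deps) {
  std::uint64_t h = fnv1a64("obj-cache-v1");  // version tag, lets the scheme evolve
  h = mix(h, std::to_string(flags.size()));  // separates the flag list from the files
  for (const std::string& f : flags) h = mix(h, f);
  for (const Dependency& d : deps) {
    h = mix(h, d.relative_name);
    h = mix(h, d.contents);
  }
  char buf[17];
  std::snprintf(buf, sizeof buf, "%016llx", static_cast<unsigned long long>(h));
  return std::string(buf);
}
```

Dependencies must be passed in a deterministic order, for example sorted by name. Otherwise the same inputs could produce different keys.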

This approach dramatically decreases compilation times if your CI uses different build machines or your builds start from a fresh copy.
Switching and merging git branches also benefits greatly from remote caching.

When CMake generates build files for your C++ project, you typically launch the build process with make or ninja.
These build systems launch and balance processes, and keep doing so until all C++ files are compiled.

Our goal is to tell CMake to launch nocc g++ instead of g++ (or any other C++ compiler). This can be done
with -DCMAKE_CXX_COMPILER_LAUNCHER:

cmake -DCMAKE_CXX_COMPILER_LAUNCHER=/path/to/nocc ..

Then make runs every compilation through nocc.

CMake sometimes invokes the C++ compiler with -MD/-MT flags to generate a dependency list.
nocc supports them out of the box: depfiles are generated on the client side.
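Generating depfiles on the client side is natural. The client has already collected the full dependency list in order to upload it, so it can write the depfile itself without asking the server. Below is a simplified sketch of the Make-style format that -MD produces: the target, a colon, then the prerequisites. make_depfile and escape_make are hypothetical names, and GCC's real escaping covers more corner cases than the sketch does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Escapes characters that are special in Make rules: space and '#' get a
// backslash, '$' is doubled. Simplified compared to GCC's own rules.
std::string escape_make(const std::string& path) {
  std::string out;
  for (char c : path) {
    if (c == ' ' || c == '#') out += '\\';
    if (c == '$') out += '$';
    out += c;
  }
  return out;
}

// Builds a one-line Make-style depfile, e.g. "1.o: 1.cpp 1.h\n".
std::string make_depfile(const std::string& target, const std::vector<std::string>& deps) {
  std::string text = escape_make(target) + ":";
  for (const std::string& d : deps) text += " " + escape_make(d);
  text += "\n";
  return text;
}
```

For the 1.cpp example, make_depfile("1.o", {"1.cpp", "1.h"}) produces the same rule that g++ -MD would write locally, so make or ninja can track header changes without knowing the compilation ran remotely.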

Ninja is a build system that CMake can use instead of make.

nocc works with ninja, but there are 2 points to keep in mind:
