tl;dr: Remote caching is simple to set up and makes repeated builds 50x faster.
In my previous article, I benchmarked the Bazel build system on a C++ project and found that it performs well, but doesn't really offer a compelling story over the alternatives. In this article, I'll explore one of Bazel's unique selling points, remote caching, and show that it delivers great speedups.
How remote caching works
Most build systems only track file timestamps or other metadata, but Bazel hashes the content of source code as part of its strict dependency checks. With such a system in place, caching is trivial, since intermediate build artifacts can simply be stored and retrieved by recursively aggregating the hashes of their dependencies in the build graph.
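The idea can be sketched as follows. This is a minimal illustration, not Bazel's actual key derivation; `blob_digest` and `action_key` are hypothetical names:

```python
import hashlib

def blob_digest(data: bytes) -> str:
    # Hash the *content* of a source file, not its timestamp or metadata.
    return hashlib.sha256(data).hexdigest()

def action_key(command: str, input_digests: list[str]) -> str:
    # An intermediate artifact is stored and retrieved under a key that
    # aggregates the command line and the digests of all of its inputs,
    # which are themselves derived recursively through the build graph.
    h = hashlib.sha256(command.encode())
    for digest in sorted(input_digests):
        h.update(digest.encode())
    return h.hexdigest()

# Unchanged inputs produce an identical key, and hence a cache hit,
# on any machine; any change to an input yields a different key.
src = blob_digest(b"int main() { return 0; }")
key = action_key("gcc -c main.c -o main.o", [src])
```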
For this benchmark, I've followed the documentation and configured the simplest supported cache backend: an nginx server with PUT support for storing new files. This is all handled by the same docker-compose setup that configures the Docker container for the actual build (the code is here).
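For reference, a cache backend along these lines might look like the following; paths and the port are illustrative, not the exact setup from the linked repository:

```nginx
# Illustrative nginx cache backend: GET serves cached artifacts,
# PUT (via the WebDAV module) stores new ones.
location /cache/ {
    root /srv/bazel-cache;
    dav_methods PUT;          # allow Bazel to upload artifacts
    create_full_put_path on;  # create intermediate directories as needed
    client_max_body_size 0;   # cached artifacts can be large
}
```

Bazel is then pointed at the server with something like `build --remote_cache=http://localhost:8080/cache` in `.bazelrc`.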
The benchmark is the same as my previous article, and simulates a standard development workflow:
- do a fresh build of the project
- build again, without modifying any file (no-op build)
- checkout a different commit, 2 weeks apart
- do an incremental build of this new commit
- revert to the original commit
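The steps above can be sketched roughly as follows; the commit reference is a placeholder, and the actual benchmark harness lives in the linked repository:

```shell
bazel clean --expunge                 # start from a fresh build
time bazel build //...                # 1. fresh build
time bazel build //...                # 2. no-op build, nothing changed
git checkout <commit-two-weeks-later> # 3. switch to a different commit
time bazel build //...                # 4. incremental build of the new commit
git checkout -                        # 5. revert to the original commit
```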
The plot compares the running times for each of these steps, without and with remote caching enabled:
Enabling remote caching slightly slows down the first two builds, since pushing the intermediate artifacts to the cache adds latency. In exchange, the third build completes in ~20 seconds (a 50x speedup), since all the artifacts are already in the cache. For reference, the total size of the cache is ~2.5GB, so the bandwidth requirements are significant and the network may be a bottleneck during cached builds.
I repeated the test from a fresh Docker container and confirmed that, as expected, the cache is usable across different instances.
Simply pointing Bazel to an nginx server is enough to massively speed up repeated builds of C++ projects. In real life, some additional setup would be needed to keep the cache size bounded, and to make sure that build machines have enough bandwidth to the cache server.