🚨 The Day a 12-Year-Old Bug Became a QA Validation

We’re beyond excited to finally roll out cmake-re --distributed, our shiny new feature that lets you scale CMake builds across thousands of remote cores on EngFlow’s Remote Build Execution (RBE) clusters.

But before we get to the good stuff, let us tell the story of how a dusty bug buried deep in Boost.Log's macros helped us!

This tale includes:

  • 🧠 Preprocessor metaprogramming gone rogue
  • 🐛 A recursion unroller that forgot how to recurse
  • 🧮 A modulo that wanted to divide
  • ⏱️ Three weeks of intense debugging... all for a one-character fix

What is cmake-re --distributed?

With a single CLI switch, cmake-re --distributed spreads build workloads across an EngFlow RBE cluster. That means:

  • Smarter parallelization of translation units
  • Lightning-fast compile times

To accomplish this, we wrap each compiler invocation in our compiler driver, which lets us identify the files a given compilation needs before the compilation begins.

In a distributed build system, the input processor operates on the client side. Its role is to determine which files are needed to perform a specific action, ensuring those files are available on the node where the action will be executed.

Our input-processor performs a preliminary preprocessing of C and C++ code to identify the #include files required for compilation. It then directs the cluster to supply these files to the nodes handling the distributed compilation.
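To make the idea concrete, here is a deliberately naive sketch of include discovery. The function name scan_includes is hypothetical and this is not our actual input-processor: it only recognizes literal #include lines and ignores macros and #if conditions, which is precisely the part that makes the real job hard.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, NOT the real input-processor: collects the targets of
// literal #include directives, one source line at a time. It neither expands
// macros nor evaluates #if conditions, so a computed include such as
// `#include BOOST_PP_ITERATE()` is simply skipped.
std::vector<std::string> scan_includes(const std::string& source) {
    static const std::regex include_re(
        R"re(^\s*#\s*include\s*[<"]([^>"]+)[>"])re");
    std::vector<std::string> found;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch match;
        if (std::regex_search(line, match, include_re)) {
            found.push_back(match[1].str());  // path between <> or ""
        }
    }
    return found;
}
```

A real input processor has to go further and behave like a preprocessor (macro expansion, conditional evaluation, include path lookup), because the set of included files can depend on all of these.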

And that one — the input-processor — is where the story begins.

🔍 Meet the Whistleblower: Boost.Log’s Recursion Unroller Bug

As part of prepping our --distributed public beta, we refactored our input-processor to handle C++ integral constants bigger than 32 bits. But... it also woke a sleeping giant in Boost.Log — a bug hiding in plain sight for over 12 years.

While testing the update on Boost itself (because if it builds Boost, it builds anything), we hit a strange input-processing issue with missing Boost.Preprocessor files. After some serious head-scratching, we discovered:

🤔 Boost.Log’s Attribute Value visitor recursion unrolling logic... isn’t unrolling at all.

A Boost.Log attribute is a piece of meta-information that can be used to specialize a log record.

Here’s what was happening:

The code defines BOOST_LOG_VALUE_REF_VISITATION_UNROLL_COUNT, which defaults to 8, but this value, which is supposed to be used later in the file, never is. Instead, lines 69 to 73 of boost/log/detail/value_ref_visitation.hpp set up a preprocessing loop for the visitation unrolling via #include BOOST_PP_ITERATE(), running from 1 to BOOST_PP_INC(BOOST_LOG_VALUE_REF_VISITATION_VTABLE_SIZE):

#define BOOST_LOG_AUX_CASE_ENTRY(z, i, data)\
    case i: return visitor(*static_cast< typename mpl::at_c< SequenceT, i >::type const* >(p));


#define BOOST_PP_FILENAME_1 <boost/log/detail/value_ref_visitation.hpp>
#define BOOST_PP_ITERATION_LIMITS (1, BOOST_PP_INC(BOOST_LOG_VALUE_REF_VISITATION_VTABLE_SIZE))
#include BOOST_PP_ITERATE()

#undef BOOST_LOG_AUX_CASE_ENTRY

The macro BOOST_PP_ITERATION_LIMITS, which should have expanded to (1, 9), instead ends up looking like (1, BOOST_PP_INC_BOOST_LOG_VALUE_REF_VISITATION_VTABLE_SIZE).

And since BOOST_LOG_VALUE_REF_VISITATION_VTABLE_SIZE is never defined, we will see later that the Boost.Preprocessor slots machinery transformed it into a 0. So the recursion unrolling loop? It never loops.

Worse — it actually loops once in reverse in all compilers.

Indeed, this causes the #if condition on line 44 of boost/preprocessor/iteration/detail/iter/forward1.hpp to evaluate to true and include <boost/preprocessor/iteration/detail/iter/reverse1.hpp>:

# if (BOOST_PP_ITERATION_START_1) > (BOOST_PP_ITERATION_FINISH_1)
#    include <boost/preprocessor/iteration/detail/iter/reverse1.hpp>
# else

But Boost.Log's bug isn’t fatal to its own functionality — it just skips some performance optimizations.

🧮 A modulo that wanted to divide

The actual bug was in our code.

On that # if (BOOST_PP_ITERATION_START_1) > (BOOST_PP_ITERATION_FINISH_1) line, instead of computing #if (1) > (0), our input-processor saw #if (0) > (0), misread the iteration direction, and failed to include reverse1.hpp.

This happened because our input-processor mishandled the Boost.Preprocessor slots mechanism: it transformed the very simple 1 literal defining the lower bound in BOOST_PP_ITERATION_LIMITS into a 0.

Who would think that this simple 1 literal, which looked so much like a constant, would also go through the Boost.Preprocessor slots transformation mechanism? 🧠

Boost.Preprocessor slots are a complex piece of machinery for storing numeric constants. They have the nice side effect of cleaning up non-numeric inputs and returning 0 instead (which saves the Boost.Log code above from failing at compilation). In this machinery, retrieving the stored numeric values from the slots is implemented with modulo operations like:

# define BOOST_PP_SLOT_OFFSET_10(x) (x) % 1000000000UL
# define BOOST_PP_SLOT_OFFSET_9(x) BOOST_PP_SLOT_OFFSET_10(x) % 100000000UL
# define BOOST_PP_SLOT_OFFSET_8(x) BOOST_PP_SLOT_OFFSET_9(x) % 10000000UL
# define BOOST_PP_SLOT_OFFSET_7(x) BOOST_PP_SLOT_OFFSET_8(x) % 1000000UL
# define BOOST_PP_SLOT_OFFSET_6(x) BOOST_PP_SLOT_OFFSET_7(x) % 100000UL
# define BOOST_PP_SLOT_OFFSET_5(x) BOOST_PP_SLOT_OFFSET_6(x) % 10000UL
# define BOOST_PP_SLOT_OFFSET_4(x) BOOST_PP_SLOT_OFFSET_5(x) % 1000UL
# define BOOST_PP_SLOT_OFFSET_3(x) BOOST_PP_SLOT_OFFSET_4(x) % 100UL
# define BOOST_PP_SLOT_OFFSET_2(x) BOOST_PP_SLOT_OFFSET_3(x) % 10UL

BOOST_PP_ITERATION_START_1 ended up being computed as 0 instead of the correct 1, due to a mishandled modulo operation.
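To see what a conforming preprocessor makes of this chain, here is a small sketch that copies the macros quoted above under a hypothetical DEMO_ prefix, so it compiles without Boost. The constant names are made up for illustration.

```cpp
// Local copies of the offset macros quoted above, renamed with a DEMO_
// prefix so this sketch does not depend on Boost being installed.
#define DEMO_SLOT_OFFSET_10(x) (x) % 1000000000UL
#define DEMO_SLOT_OFFSET_9(x) DEMO_SLOT_OFFSET_10(x) % 100000000UL
#define DEMO_SLOT_OFFSET_8(x) DEMO_SLOT_OFFSET_9(x) % 10000000UL
#define DEMO_SLOT_OFFSET_7(x) DEMO_SLOT_OFFSET_8(x) % 1000000UL
#define DEMO_SLOT_OFFSET_6(x) DEMO_SLOT_OFFSET_7(x) % 100000UL
#define DEMO_SLOT_OFFSET_5(x) DEMO_SLOT_OFFSET_6(x) % 10000UL
#define DEMO_SLOT_OFFSET_4(x) DEMO_SLOT_OFFSET_5(x) % 1000UL
#define DEMO_SLOT_OFFSET_3(x) DEMO_SLOT_OFFSET_4(x) % 100UL
#define DEMO_SLOT_OFFSET_2(x) DEMO_SLOT_OFFSET_3(x) % 10UL

// Expands to (1) % 1000000000UL % 100000000UL % ... % 10UL, which is 1.
#if DEMO_SLOT_OFFSET_2(1) == 1
constexpr bool one_survives_chain = true;
#else
constexpr bool one_survives_chain = false;
#endif

// The full chain keeps only the last decimal digit: 42 becomes 2.
#if DEMO_SLOT_OFFSET_2(42) == 2
constexpr bool chain_keeps_last_digit = true;
#else
constexpr bool chain_keeps_last_digit = false;
#endif

// If the very first step were a division instead of a modulo,
// the literal 1 would already collapse to 0.
#if (1) / 1000000000UL == 0
constexpr bool division_collapses_one = true;
#else
constexpr bool division_collapses_one = false;
#endif
```

All three constants come out true with a standard preprocessor. The third one shows the value our input-processor ended up with once the modulo was evaluated as a division.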

The root cause? A single-character typo deep inside our input-processor's integral constant modulo calculation code. This is the fix:

  if (r.unsigned_) {
-    r.value = static_cast<uint64_t>(v1.value) / static_cast<uint64_t>(v2.value);
+    r.value = static_cast<uint64_t>(v1.value) % static_cast<uint64_t>(v2.value);
  } else {
    r.value = v1.value % v2.value;
  }

Yup. 🙉 A / instead of a %, only in the case of <XXXX>UL unsigned literals! After three weeks of sweat and breakpointing, the contribution had to be a single-character change.

All of this was discovered thanks to this 12-year-old Boost.Log bug. We are truly grateful to the author of Boost.Log, who added this small optimization bug before Boost was imported on SVN 12 years ago. Who would have thought that someone else's bug could become a QA check? 😀
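For readers who want to see the shape of such an evaluator, here is a minimal sketch. The field names value and unsigned_ follow the diff above; the PpValue struct, the pp_mod function, the 64-bit storage, and the rule used to derive unsigned_ are assumptions made for illustration, not our production code.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical value type for a preprocessor expression evaluator. Field
// names follow the diff above; the rest is assumed for illustration.
struct PpValue {
    std::int64_t value;  // raw 64-bit storage (assumed width)
    bool unsigned_;      // true when the operand carries a U suffix
};

// Modulo as #if arithmetic defines it: when either operand is unsigned,
// the operation is carried out in the unsigned domain.
// Note: the signed case INT64_MIN % -1 is not handled in this sketch.
PpValue pp_mod(PpValue v1, PpValue v2) {
    if (v2.value == 0) {
        throw std::domain_error("modulo by zero in #if expression");
    }
    PpValue r{0, v1.unsigned_ || v2.unsigned_};
    if (r.unsigned_) {
        r.value = static_cast<std::int64_t>(
            static_cast<std::uint64_t>(v1.value) %
            static_cast<std::uint64_t>(v2.value));
    } else {
        r.value = v1.value % v2.value;
    }
    return r;
}
```

A regression check in the spirit of this story would be that pp_mod({1, false}, {1000000000, true}) yields 1 rather than 0. The unsigned branch matters for negative inputs too: -1 % 10UL is evaluated on the 64-bit unsigned representation of -1 and gives 5, whereas the signed -7 % 3 gives -1.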

⚡️ The Payoff: Blazing Fast Builds with -j1000

Now that our input-processor bug is squashed and the Boost.Log recursion is, as expected, still not unrolling, you can enjoy blisteringly fast distributed builds:

cmake-re --build . --distributed -j1000

Let EngFlow’s Clusters handle the heavy lifting — compile Boost in seconds, not minutes.

👋 Try It Yourself

Head over to request a trial for an EngFlow cmake-re cluster and give cmake-re --distributed a go.

If you love debugging war stories like this one, sign in and we will keep you updated about our blog and product. We’ve got more tales coming soon: some funny, some painful, all real.

Happy building!


Damien Buhl

@daminetreg

tipi.build by EngFlow, CEO