Builds are Complicated: C/C++ (evanjones.ca)

[ 2012-March-03 11:38 ]

At first glance, building C and C++ code seems straightforward. Source files (.c/.cc) are compiled into object files (.o/.obj), which are linked into executables or libraries. The programmer manually lists the object files or libraries that comprise a given executable. This creates a clear dependency graph, making it easy to figure out what needs to be done to build a given output file. The build tool just needs to check the timestamps on the input and output files. However, there are a number of challenges in larger projects, mostly related to reusing code.
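The timestamp rule described above can be sketched in a few lines of Python. This is a minimal, make-style check, not any particular tool's implementation. The function name `is_out_of_date` is hypothetical, and the sketch assumes every listed input file exists.

```python
import os


def is_out_of_date(output: str, inputs: list[str]) -> bool:
    """Return True if `output` must be rebuilt from `inputs`.

    Make-style rule: rebuild if the output does not exist yet, or if any
    input was modified more recently than the output. Assumes every input
    exists (os.path.getmtime raises FileNotFoundError otherwise).
    """
    if not os.path.exists(output):
        return True
    output_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > output_mtime for path in inputs)
```

For example, `is_out_of_date("main.o", ["main.cc"])` is true until `main.o` exists and is newer than `main.cc`.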

Implicit Dependencies

An object file depends not only on its .c/.cc source file, but also on every header file that is transitively included when compiling it. If any of these headers changes, the object file must be rebuilt. The full set of inputs can only be determined by preprocessing the source file. Thankfully, compilers like GCC can emit this dependency information as a side effect of compilation (see the -MD switch), and build systems like Ninja can read it. Thus, this is actually a "solved" problem, although some tools don't do this by default.
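With -MD, GCC writes a small Makefile-style rule next to the object file: the object is the target, and the source plus every included header are its prerequisites, with long lines continued by a trailing backslash. The sketch below (a hypothetical `parse_depfile`) shows roughly how a build tool could read such a file. It assumes a single target and paths without spaces or colons, so it ignores escaping and Windows drive letters.

```python
def parse_depfile(text: str) -> tuple[str, list[str]]:
    """Parse a Makefile-style depfile into (target, prerequisites).

    Only the first rule is read, so extra phony rules for headers
    (as emitted by GCC's -MP) are ignored. Assumes paths contain no
    spaces or colons.
    """
    # Join backslash-newline continuations into one logical line.
    logical_lines = text.replace("\\\n", " ").splitlines()
    first = next(line for line in logical_lines if line.strip())
    target, sep, prereqs = first.partition(":")
    if not sep:
        raise ValueError("not a depfile rule: " + first)
    return target.strip(), prereqs.split()


example = "main.o: main.cc util.h \\\n /usr/include/stdio.h\n"
target, deps = parse_depfile(example)
# target is "main.o"; deps lists main.cc, util.h, and the system header
```

The build tool would then treat every path in `deps` as an input of `main.o` on the next build.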

Static Libraries

At first glance, sharing code using static libraries is straightforward: when the static library is updated, anything that depends on it needs to be relinked. However, if that library (e.g. libParent.a) depends on other libraries (e.g. libChild.a), then an executable that links the parent library must explicitly link all transitive dependencies (both libParent.a and libChild.a). Some linkers, such as GNU ld, care about the order: they search each library only once, at the point where it appears on the command line, and only for symbols that are still undefined at that point. Thus, the libraries must be ordered so that "higher level" libraries appear first, with their dependencies later.
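Computing a valid order is a topological sort of the library dependency graph. A minimal sketch, assuming the graph is acyclic (circular dependencies between static libraries need extra tricks, such as listing a library twice) and using a hypothetical `link_order` function:

```python
def link_order(roots: list[str], deps: dict[str, list[str]]) -> list[str]:
    """Order libraries so each one appears before all of its dependencies.

    Uses reverse depth-first postorder, which is a topological order for
    an acyclic graph. Cycles are not detected.
    """
    visited: set[str] = set()
    postorder: list[str] = []

    def visit(lib: str) -> None:
        if lib in visited:
            return
        visited.add(lib)
        for dep in deps.get(lib, []):
            visit(dep)
        postorder.append(lib)  # appended only after all its dependencies

    for root in roots:
        visit(root)
    return list(reversed(postorder))


libs = link_order(["libParent.a"], {"libParent.a": ["libChild.a"]})
# libs == ["libParent.a", "libChild.a"]
```

The resulting list goes after the object files on the link command, e.g. `g++ main.o libParent.a libChild.a -o app`. Because the traversal is transitive, the executable only has to name libParent.a; libChild.a is added automatically.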

Generated Code

Tools that generate source code are not uncommon. They are widely used for data exchange and RPC systems, such as Protocol Buffers or Thrift. This means that first a generator tool must be compiled, next the tool generates the code, and finally the generated code is compiled. Generated headers are particularly problematic. C/C++ programmers expect to be able to include any header without needing to do any build configuration. Since that dependency is never declared, the build system could try to compile a file before generating the headers it needs, causing the compiler to complain about missing includes. To fix this, the build system must support some "special" ordering rules for generated headers. One solution: if a library includes generated code, its headers must be generated before anything that depends on the library is compiled (Ninja supports order-only dependencies to handle this).
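The distinction an order-only dependency draws can be modeled with two kinds of edges: explicit inputs, whose changes force a rebuild, and order-only inputs, which only need to be built first. The sketch below is a toy model, not Ninja's code; the `Target` class, `schedule`, `is_dirty`, and the file names in `graph` (including a generator built from a single `protoc.cc`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    # Explicit inputs: a newer input means this target must be rebuilt.
    inputs: list[str] = field(default_factory=list)
    # Order-only inputs: built first, but newer timestamps on them do
    # NOT by themselves force a rebuild.
    order_only: list[str] = field(default_factory=list)


def schedule(graph: dict[str, Target], goal: str) -> list[str]:
    """Return the buildable targets needed for `goal`, dependencies first."""
    order: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        target = graph.get(name)
        if target is None:  # plain source file: nothing to build
            return
        for dep in target.inputs + target.order_only:
            visit(dep)
        order.append(name)

    visit(goal)
    return order


def is_dirty(graph: dict[str, Target], name: str, mtimes: dict[str, float]) -> bool:
    """Timestamp check that deliberately ignores order-only inputs."""
    if name not in mtimes:
        return True  # never built
    # A missing input is treated as infinitely new, i.e. dirty.
    return any(mtimes.get(dep, float("inf")) > mtimes[name]
               for dep in graph[name].inputs)


graph = {
    "protoc": Target(inputs=["protoc.cc"]),                 # generator tool
    "msg.pb.h": Target(inputs=["msg.proto", "protoc"]),     # generated header
    "main.o": Target(inputs=["main.cc"], order_only=["msg.pb.h"]),
}
# schedule(graph, "main.o") builds protoc, then msg.pb.h, then main.o.
```

Once main.o has been compiled, the compiler's depfile records msg.pb.h as an implicit input, so later changes to the generated header still trigger a rebuild. The order-only edge matters mainly for the first build, when no depfile exists yet.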

Third-Party Dependencies

The worst complications come from third-party libraries. To include a dependency in your project, you can either require everyone to install the "system" version of a library (e.g. using your system's package manager on Linux), or you can import the source and build it with your project. I prefer the latter approach, because it means that you automatically have access to all the source code, and developers just pull down a single source tree to build the project. This is what Google's Chromium project does. It contains a third_party directory with tons of dependencies.

Third-party projects make different assumptions about where header files can be found and which libraries need to be passed on the link line, and they may require certain macros to be defined. Any code that depends on such a library may need the same include paths and macros in order to be compiled correctly. This means the build system needs some understanding of search paths, macros, and compiler options.
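One way to give the build system this understanding is to let each library declare the include paths and macros its users need, and to collect them transitively. This is roughly the idea behind CMake's PUBLIC/INTERFACE usage requirements. The sketch below is a toy version; the `Library` class, `flags_for`, and the libfoo/libbar paths and `BAR_STATIC` macro are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    name: str
    # Flags that code compiling against this library's headers needs.
    public_includes: list[str] = field(default_factory=list)
    public_defines: list[str] = field(default_factory=list)
    deps: list["Library"] = field(default_factory=list)


def flags_for(lib: Library) -> list[str]:
    """Collect -I and -D flags from `lib` and all its transitive deps."""
    includes: list[str] = []
    defines: list[str] = []
    seen: set[str] = set()

    def visit(current: Library) -> None:
        if current.name in seen:
            return
        seen.add(current.name)
        for path in current.public_includes:
            if path not in includes:
                includes.append(path)
        for macro in current.public_defines:
            if macro not in defines:
                defines.append(macro)
        for dep in current.deps:
            visit(dep)

    visit(lib)
    return [f"-I{p}" for p in includes] + [f"-D{m}" for m in defines]


# Hypothetical third-party libraries: libfoo depends on libbar.
bar = Library("libbar", public_includes=["third_party/bar/include"],
              public_defines=["BAR_STATIC"])
foo = Library("libfoo", public_includes=["third_party/foo/include"], deps=[bar])
app = Library("app", deps=[foo])
```

Here `flags_for(app)` includes libbar's search path and macro even though `app` only names libfoo, which is exactly the propagation described above.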