It's not actually all that hard to make well-parallelized makefiles, provided you follow a few basic rules.
- Each build step has a unique artifact.
- That artifact is visible to Make as a file in the filesystem.
- That artifact is named $@ in the rule's recipe.
- Every time the recipe is executed, $@ is updated on success.
- If the recipe fails, it must return nonzero to Make.
- All of the dependencies of the artifact are represented in the Makefile.
You forgot the most important and most difficult part: ensuring that transitive dependencies actually make their way into the makefile without huge manual effort. For C-like languages, merely adding an additional #include to a header file will break most makefiles. To solve this you need the compiler to generate dependency makefiles (e.g. with -MMD) which are then included from your main makefile, and this is not at all obvious.
As Make was mainly intended to build C projects, the fact that these batteries aren't included is why I think the parent, and many others, consider makefiles needlessly complex.
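A minimal sketch of that approach, assuming GCC/Clang-style -MMD/-MP flags (paths and variable names here are illustrative, not from the parent's project):

```make
# Ask the compiler to emit a .d dependency fragment next to each object.
# -MMD writes makefile rules listing the non-system headers each .c pulls in;
# -MP adds phony targets so a deleted header doesn't break the build.
DEPFLAGS = -MMD -MP

SRCS := $(wildcard src/*.c)
OBJS := $(SRCS:.c=.o)

%.o: %.c
	$(CC) $(CFLAGS) $(DEPFLAGS) -c $< -o $@

app: $(OBJS)
	$(CC) $^ -o $@

# Pull in the generated fragments; the leading '-' ignores them
# on a clean build, when they don't exist yet.
-include $(OBJS:.o=.d)
```

With this in place, touching a header rebuilds exactly the objects that include it, with no hand-maintained dependency lists.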
That's a nice set of rules, although I don't think they're all easy to follow, or easy to verify that 1000 lines of Make is following them.
I considered 3 main use cases, and I wrote Makefiles from scratch for all of them. Make works to an extent for each case, but I still have problems.
1. Building mixed Python/C app bundles [1]
2. Building my website [2]. Notably people actually do complain about Jekyll build speed, to the point where they will use a different system like Hugo. So incremental/parallel builds are really useful in this domain!
3. Doing analytics on web log files (e.g. time series from .gz files)
One thing I didn't mention is that they all involve some sort of build parameterization or "metaprogramming". That requirement interacts with the problem of parallel and incremental builds.
For example, for #1, there is logic shared between different bundles. Pattern rules aren't really expressive enough, especially when you have two dimensions, like (app1, app2, ...) x (debug, release, ASAN, ...).
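One common workaround for the two-dimension case (not something plain pattern rules give you) is to generate a rule per (app, config) pair with $(foreach)/$(eval)/$(call); all names below are made up for illustration:

```make
APPS    := app1 app2
CONFIGS := debug release asan

CFLAGS_debug   = -O0 -g
CFLAGS_release = -O2
CFLAGS_asan    = -O1 -g -fsanitize=address

# $(1) = app, $(2) = config. Each pair gets its own output path,
# so debug and release artifacts never collide.
define APP_CONFIG_RULE
build/$(2)/$(1): $(1)/main.c
	mkdir -p build/$(2)
	$$(CC) $$(CFLAGS_$(2)) $$< -o $$@
endef

$(foreach a,$(APPS),$(foreach c,$(CONFIGS),\
  $(eval $(call APP_CONFIG_RULE,$(a),$(c)))))

all: $(foreach a,$(APPS),$(foreach c,$(CONFIGS),build/$(c)/$(a)))
```

This works, but it is exactly the kind of metaprogramming-in-Make that makes people reach for a generator instead.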
A pet peeve of mine is having to "make clean" between a debug and a release build, and nearly all usages of Make have that problem, e.g. Python's and Bash's build systems. You could say they are violating your rules because each artifact doesn't have a unique name on the file system (i.e. debug and release versions of the same object file).
Likewise, Make isn't exactly flexible about how the blog directory structure is laid out. I hit the multiple-outputs problem: I have Jekyll-style metadata at the front of each post (title, date, tags), so each .md file is split into 2 files. The index.html file depends on all the metadata, but not the data.
All of them have dynamic dependencies too:
1. I generate dependencies using the Python interpreter
2. I add new blog posts without adding Make rules
3. I add new web log files without adding Make rules
Make does handle this to an extent, but there are definitely some latent bugs. I have fixed some of them, but without a good way of testing, I haven't been motivated to fix them all.
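For cases 2 and 3, the usual Make idiom for dynamic inputs is to discover files with $(wildcard) and derive outputs by substitution, so new files need no new rules; a sketch with hypothetical paths:

```make
# Discovered when make parses the file; adding a new post
# requires no Makefile edit.
POSTS := $(wildcard posts/*.md)
PAGES := $(POSTS:posts/%.md=out/%.html)

out/%.html: posts/%.md
	./render.py $< > $@

site: $(PAGES)
```

One latent bug of exactly the kind mentioned above: $(wildcard) only sees files that exist at parse time, so deleting a post leaves its stale output behind unless you clean up separately.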
I wrote up some more problems in [3], but this is by no means exhaustive. I'm itching to replace all of these makefiles with something that generates Ninja. It's possible I'll hit some unexpected problems, but we'll see.
My usage is maybe a bit out of the ordinary, but I don't see any reason why a single tool shouldn't handle all of these use cases.
> A pet peeve of mine is having to "make clean" between a debug and a release build, and nearly all usages of Make have that problem, e.g. Python's and Bash's build systems. You could say they are violating your rules because each artifact doesn't have a unique name on the file system (i.e. debug and release versions of the same object file).
There are at least two ways this problem can be addressed. One is to support out-of-tree builds, with one build directory per configuration; autotools-based builds do this by default.
The other is to use a separate build directory per configuration within the source tree. My current project uses local in-tree directories named .host-release, .host-debug (including ASAN), .host-tsan, .cross-release, and .cross-debug. All of them are built in parallel with a single invocation of Make, and I use target-scoped variables to control the various toolchain options.
The engineer's incremental work factor to add another build configuration isn't quite constant time, since each top-level target needs to opt into each build configuration that is relevant for that target.
> I hit the multiple outputs problem
I wouldn't really classify that as a problem in GNU Make, as long as you can specify the rule as a pattern rule.
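For context: in GNU Make, a pattern rule with multiple target patterns is treated as one recipe invocation that produces all of them, unlike an explicit multi-target rule. A sketch for the post-splitting case described above (split_post and build_index are hypothetical tools):

```make
# One invocation of the recipe is understood to produce both outputs.
out/%.html out/%.meta: posts/%.md
	./split_post $< out/$*.html out/$*.meta

# The index depends only on the metadata halves, so editing a
# post body doesn't rebuild it.
index.html: $(patsubst posts/%.md,out/%.meta,$(wildcard posts/*.md))
	./build_index $^ > $@
```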
I hear you on the testing problem. Make certainly behaves as if a Makefile's correctness isn't decidable. Even if you restricted the question to Makefiles whose recipes are well-behaved, I'm not sure correctness would be decidable.
For example, here is how the format checks are run in my current project for some C code. Its mission: to verify that the source files under the aegis of clang-format are correctly formatted. BUILD_DIRS is a list of directories containing source code. CFORMATTER is the name of the formatting program. Not everything is under clang-format control, so FORMATTED_SRCS is used to opt in to it.
    BUILD_DIRS_FORMAT = $(addprefix .format-check/,$(BUILD_DIRS))

    $(BUILD_DIRS_FORMAT):
    	mkdir -p $@

    # There ought to be a better way to control the suffix without becoming
    # a match-anything rule...
    .format-check/%.c: %.c | $(BUILD_DIRS_FORMAT)
    	$(CFORMATTER) $< -style=file > $@

    .format-check/%.h: %.h | $(BUILD_DIRS_FORMAT)
    	$(CFORMATTER) $< -style=file > $@

    # Record the fact that each format check passed by touching a
    # uniquely-named file. Note the call to `false` on error, since
    # `echo` always succeeds.
    .format-check/%.diffed: .format-check/%
    	@(diff -u --color=always -- $* $< && touch $@) || \
    		(echo Formatting errors exist in $* ; false)

    check-formatting: $(addsuffix .diffed,$(addprefix .format-check/,$(FORMATTED_SRCS)))
Its artifacts are:
- A formatted source file for each repository source file
- An empty file in the filesystem for each formatted source file that is identical to the repository source file.
- A tree of directories for the above.
Each format check is run exactly once, and only when source files change. If anything fails, `make` returns nonzero and the build fails. It's also fully parallelized, since there aren't any neck-down points in the dependency graph. Every one of our pre-commit checks is structured this way. Build verification is as parallel as possible for fresh builds, and engineers can resolve and verify their problems quickly and incrementally when they fail.