Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

I'm only going to answer this part of your question:

Is there some way to build my packages that is determinstic/repeatable?

The Catkin side to this is already largely deterministic I believe.

Catkin performs two main tasks before it asks CMake to build everything:

  1. discover all packages in your src space and construct a graph where the vertices are the packages and the edges encode dependencies (ie: if pkg A depends on B, there will be an edge between the vertices representing A and B expressing that dependency)
  2. run a topological sort over that graph, to determine the order in which everything should be built which satisfies those dependencies

Your packages state their dependencies using build_depends in their manifests, which Catkin parses and then uses to build the graph. After the sort, it kicks off the build by asking CMake to start the build by building the packages which:

  1. have no dependencies (rare),
  2. for which all dependencies are already built,
  3. depend only on packages present in an underlay (or its underlay, ad infinitum)
  4. do not depend on each other

for all of those, the order in which they are built does not really matter, as they have no depends relationship at all.

After the first layer of packages has been built, Catkin moves up to the next, which will be packages which depend on (some, or all) of the packages it just built. This process then repeats, until the "top" package(s) is (are) built.

Note: only the build_depend entries in your package.xml are taken into account, not the setup.py or CMakeLists.txt (actually, there's also build_export_depend et al., but I'll ignore those here, they don't significantly change the explanation).

Now back to your question:

Sometimes other people with the same workspace will build and find that their build fails with some missing dependency in a CMakeLists.txt somewhere. [..] If I run it several times in succession, more and more of the packages will be successfully built.

As I wrote: only package.xml entries are used to create the graph. So it's entirely possible for a pkg A which depends on pkg B, but which does not state a <build_depend>B</build_depend> to end up on the same 'layer' of the graph and thus get built at the same time. Obviously this fails, as A does need B.

Catkin can not really be blamed in such cases I believe: according to the information you provided, A and B fall into (at least) category 4 above.

However, it's possible A can be partly built, perhaps because it only needs some parts of B, and this is where the non-determinism comes in: depending on how your operating system schedules the cmake, make and gcc processes, those "some parts of B" may or may not have already completed building.

If they were finished by the time A tries to use them: you'll never know there was a problem with the build_depends of A. If they were not finished, A's build will fail, probably with some sort of #include error, complaining No such file of directory: "B/some_include.h" and Bs build is also aborted (together with any other jobs which may be running for other pkgs).

A good example of this situation is packages which provide messages, services and/or actions. The headers are generated during the build, and it's imperative message generation has completed before other packages try to #include those files, or GCC (or whichever compiler you're using) will not be able to find the files and abort.

If I run catkin build && catkin clean several times, different numbers of packages will be successfully built.

So where does the non-determinism come in, and why does restarting the build (sometimes) help?

Each time you restart your build, B's build processes will get a little further before they are aborted due to A's build failing. At some point, enough of B will have finished for A's build to no longer fail (fi: the message headers #included by A may have actually been generated by B's build). But how much of B succeeds is slightly non-deterministic, due to the stochasticity which results from OS process & thread scheduling I mentioned above (it not only depends on available jobs in the build, but on all other processes on your machine fi).

As to why different numbers of packages may succeed: a topological sort does not always have a unique solution, and in addition, again due to scheduling, there is a non-zero chance packages with the same or similar dependency sets will be scheduled "at the same time", but really only start building at slightly different times. Together with the semi-randomness of tasks succeeding in dependencies (fi: B's tasks getting scheduled and/or aborted at slightly different times), this leads to a situation where packages on the same layer in the dependency graph not always progressing in the exact same way in their build. This can appear to be random, but should be explainable if total insight into scheduling of tasks, progress of the build, file system state, etc would be available.

I'm only going to answer this part of your question:

Is there some way to build my packages that is determinstic/repeatable?

The tl;dr: if you don't state your dependencies correctly, Catkin is free to start building pkgs, which may in fact depend on each other, in parallel, leading to compiler errors and ultimately failed and aborted builds.

As the exact point in time where builds get aborted is (almost) stochastic (due to scheduling, CPU and IO performance, etc), and dependents typically will not need everything from dependencies, partial builds of dependencies may result in successful builds of dependents.

This leads to successive builds appearing to make progress, as more-and-more dependents will find their dependencies where they expect them, leading to those builds succeeding, leading to their dependents being able to start their build. Etc, etc.


Long version: the Catkin side to this is already largely deterministic I believe.believe. It's the other parts which may not be.

Catkin performs two main tasks before it asks CMake to build everything:

  1. discover all packages in your src space and construct a graph where the vertices are the packages and the edges encode dependencies (ie: if pkg A depends on B, there will be an edge between the vertices representing A and B expressing that dependency)
  2. run a topological sort over that graph, to determine the order in which everything should be built which satisfies those dependencies

Your packages state their dependencies using build_depends in their manifests, which Catkin parses and then uses to build the graph. After the sort, it kicks off the build by asking CMake to start the build by building the packages which:

  1. have no dependencies (rare),
  2. for which all dependencies are already built,
  3. depend only on packages present in an underlay (or its underlay, ad infinitum)
  4. do not depend on each other

for all of those, the order in which they are built does not really matter, as they have no depends relationship at all.

After the first layer of packages has been built, Catkin moves up to the next, which will be packages which depend on (some, or all) of the packages it just built. This process then repeats, until the "top" package(s) is (are) built.

Note: only the build_depend entries in your package.xml are taken into account, not the setup.py or CMakeLists.txt (actually, there's also build_export_depend et al., but I'll ignore those here, they don't significantly change the explanation).

Now back to your question:

Sometimes other people with the same workspace will build and find that their build fails with some missing dependency in a CMakeLists.txt somewhere. [..] If I run it several times in succession, more and more of the packages will be successfully built.

As I wrote: only package.xml entries are used to create the graph. So it's entirely possible for a pkg A which depends on pkg B, but which does not state a <build_depend>B</build_depend> to end up on the same 'layer' of the graph and thus get built at the same time. Obviously this fails, as A does need B.

Catkin can not really be blamed in such cases I believe: according to the information you provided, A and B fall into (at least) category 4 above.

However, it's possible A can be partly built, perhaps because it only needs some parts of B, and this is where the non-determinism comes in: depending on how your operating system schedules the cmake, make and gcc processes, those "some parts of B" may or may not have already completed building.

If they were finished by the time A tries to use them: you'll never know there was a problem with the build_depends of A. If they were not finished, A's build will fail, probably with some sort of #include error, complaining No such file of directory: "B/some_include.h" and Bs build is also aborted (together with any other jobs which may be running for other pkgs).

A good example of this situation is packages which provide messages, services and/or actions. The headers are generated during the build, and it's imperative message generation has completed before other packages try to #include those files, or GCC (or whichever compiler you're using) will not be able to find the files and abort.

If I run catkin build && catkin clean several times, different numbers of packages will be successfully built.

So where does the non-determinism come in, and why does restarting the build (sometimes) help?

Each time you restart your build, B's build processes will get a little further before they are aborted due to A's build failing. At some point, enough of B will have finished for A's build to no longer fail (fi: the message headers #included by A may have actually been generated by B's build). But how much of B succeeds is slightly non-deterministic, due to the stochasticity which results from OS process & thread scheduling I mentioned above (it not only depends on available jobs in the build, but on all other processes on your machine fi).

As to why different numbers of packages may succeed: a topological sort does not always have a unique solution, and in addition, again due to scheduling, there is a non-zero chance packages with the same or similar dependency sets will be scheduled "at the same time", but really only start building at slightly different times. Together with the semi-randomness of tasks succeeding in dependencies (fi: B's tasks getting scheduled and/or aborted at slightly different times), this leads to a situation where packages on the same layer in the dependency graph not always progressing in the exact same way in their build. This can appear to be random, but should be explainable if total insight into scheduling of tasks, progress of the build, file system state, etc would be available.

I'm only going to answer this part of your question:

Is there some way to build my packages that is determinstic/repeatable?

tl;dr: if you don't state your dependencies correctly, Catkin is free to start building pkgs, which may in fact depend on each other, in parallel, leading to compiler errors and ultimately failed and aborted builds.

As the exact point in time where builds get aborted is (almost) stochastic (due to scheduling, CPU and IO performance, etc), and dependents typically will not need everything from dependencies, partial builds of dependencies may result in successful builds of dependents.

This leads to successive builds appearing to make progress, as more-and-more dependents will find their dependencies where they expect them, leading to those builds succeeding, leading to their dependents being able to start their build. Etc, etc.

With enough iterations, the whole graph of dependencies gets built, until finally all dependencies are built and there are no more failures.

As to the question: always running catkin_make -j1 could help make things fail at almost the same place, however it's no guarantee, as other factors may still cause slight ordering differences in operations.


Long version: the Catkin side to this is already largely deterministic I believe. It's the other parts which may not be.

Catkin performs two main tasks before it asks CMake to build everything:

  1. discover all packages in your src space and construct a graph where the vertices are the packages and the edges encode dependencies (ie: if pkg A depends on B, there will be an edge between the vertices representing A and B expressing that dependency)
  2. run a topological sort over that graph, to determine the order in which everything should be built which satisfies those dependencies

Your packages state their dependencies using build_depends in their manifests, which Catkin parses and then uses to build the graph. After the sort, it kicks off the build by asking CMake to start the build by building the packages which:

  1. have no dependencies (rare),
  2. for which all dependencies are already built,
  3. depend only on packages present in an underlay (or its underlay, ad infinitum)
  4. do not depend on each other

for all of those, the order in which they are built does not really matter, as they have no depends relationship at all.

After the first layer of packages has been built, Catkin moves up to the next, which will be packages which depend on (some, or all) of the packages it just built. This process then repeats, until the "top" package(s) is (are) built.

Note: only the build_depend entries in your package.xml are taken into account, not the setup.py or CMakeLists.txt (actually, there's also build_export_depend et al., but I'll ignore those here, they don't significantly change the explanation).

Now back to your question:

Sometimes other people with the same workspace will build and find that their build fails with some missing dependency in a CMakeLists.txt somewhere. [..] If I run it several times in succession, more and more of the packages will be successfully built.

As I wrote: only package.xml entries are used to create the graph. So it's entirely possible for a pkg A which depends on pkg B, but which does not state a <build_depend>B</build_depend> to end up on the same 'layer' of the graph and thus get built at the same time. Obviously this fails, as A does need B.

Catkin can not really be blamed in such cases I believe: according to the information you provided, A and B fall into (at least) category 4 above.

However, it's possible A can be partly built, perhaps because it only needs some parts of B, and this is where the non-determinism comes in: depending on how your operating system schedules the cmake, make and gcc processes, those "some parts of B" may or may not have already completed building.

If they were finished by the time A tries to use them: you'll never know there was a problem with the build_depends of A. If they were not finished, A's build will fail, probably with some sort of #include error, complaining No such file of directory: "B/some_include.h" and Bs build is also aborted (together with any other jobs which may be running for other pkgs).

A good example of this situation is packages which provide messages, services and/or actions. The headers are generated during the build, and it's imperative message generation has completed before other packages try to #include those files, or GCC (or whichever compiler you're using) will not be able to find the files and abort.

If I run catkin build && catkin clean several times, different numbers of packages will be successfully built.

So where does the non-determinism come in, and why does restarting the build (sometimes) help?

Each time you restart your build, B's build processes will get a little further before they are aborted due to A's build failing. At some point, enough of B will have finished for A's build to no longer fail (fi: the message headers #included by A may have actually been generated by B's build). But how much of B succeeds is slightly non-deterministic, due to the stochasticity which results from OS process & thread scheduling I mentioned above (it not only depends on available jobs in the build, but on all other processes on your machine fi).

As to why different numbers of packages may succeed: a topological sort does not always have a unique solution, and in addition, again due to scheduling, there is a non-zero chance packages with the same or similar dependency sets will be scheduled "at the same time", but really only start building at slightly different times. Together with the semi-randomness of tasks succeeding in dependencies (fi: B's tasks getting scheduled and/or aborted at slightly different times), this leads to a situation where packages on the same layer in the dependency graph not always progressing in the exact same way in their build. This can appear to be random, but should be explainable if total insight into scheduling of tasks, progress of the build, file system state, etc would be available.

Disabling parallelism during the build (both process and thread) will significantly reduce these effects and likely cause failures in the same or similar spots. With catkin_make, this can be done with catkin_make -j1. catkin_tools supports similar flags.

I'm only going to answer this part of your question:

Is there some way to build my packages that is determinstic/repeatable?

tl;dr: if you don't state your dependencies correctly, Catkin is free to start building pkgs, which may in fact depend on each other, in parallel, leading to compiler errors and ultimately failed and aborted builds.

As the exact point in time where builds get aborted is (almost) stochastic (due to scheduling, CPU and IO performance, etc), and dependents typically will not need everything from dependencies, partial builds of dependencies may result in successful builds of (some) dependents.

This leads to successive builds appearing to make progress, as more-and-more dependents will find their dependencies where they expect them, leading to those builds succeeding, leading to their dependents being able to start their build. Etc, etc.

With enough iterations, the whole graph of dependencies gets built, until finally all dependencies are built and there are no more failures.

As to the question: always running catkin_make -j1 could help make things fail at almost the same place, however it's no guarantee, as other factors may still cause slight ordering differences in operations.


Long version: the Catkin side to this is already largely deterministic I believe. It's the other parts which may not be.

Catkin performs two main tasks before it asks CMake to build everything:

  1. discover all packages in your src space and construct a graph where the vertices are the packages and the edges encode dependencies (ie: if pkg A depends on B, there will be an edge between the vertices representing A and B expressing that dependency)
  2. run a topological sort over that graph, to determine the order in which everything should be built which satisfies those dependencies

Your packages state their dependencies using build_depends in their manifests, which Catkin parses and then uses to build the graph. After the sort, it kicks off the build by asking CMake to start the build by building the packages which:

  1. have no dependencies (rare),
  2. for which all dependencies are already built,
  3. depend only on packages present in an underlay (or its underlay, ad infinitum)
  4. do not depend on each other

for all of those, the order in which they are built does not really matter, as they have no depends relationship at all.

After the first layer of packages has been built, Catkin moves up to the next, which will be packages which depend on (some, or all) of the packages it just built. This process then repeats, until the "top" package(s) is (are) built.

Note: only the build_depend entries in your package.xml are taken into account, not the setup.py or CMakeLists.txt (actually, there's also build_export_depend et al., but I'll ignore those here, they don't significantly change the explanation).

Now back to your question:

Sometimes other people with the same workspace will build and find that their build fails with some missing dependency in a CMakeLists.txt somewhere. [..] If I run it several times in succession, more and more of the packages will be successfully built.

As I wrote: only package.xml entries are used to create the graph. So it's entirely possible for a pkg A which depends on pkg B, but which does not state a <build_depend>B</build_depend> to end up on the same 'layer' of the graph and thus get built at the same time. Obviously this fails, as A does need B.

Catkin can not really be blamed in such cases I believe: according to the information you provided, A and B fall into (at least) category 4 above.

However, it's possible A can be partly built, perhaps because it only needs some parts of B, and this is where the non-determinism comes in: depending on how your operating system schedules the cmake, make and gcc processes, those "some parts of B" may or may not have already completed building.

If they were finished by the time A tries to use them: you'll never know there was a problem with the build_depends of A. If they were not finished, A's build will fail, probably with some sort of #include error, complaining No such file of directory: "B/some_include.h" and Bs build is also aborted (together with any other jobs which may be running for other pkgs).

A good example of this situation is packages which provide messages, services and/or actions. The headers are generated during the build, and it's imperative message generation has completed before other packages try to #include those files, or GCC (or whichever compiler you're using) will not be able to find the files and abort.

If I run catkin build && catkin clean several times, different numbers of packages will be successfully built.

So where does the non-determinism come in, and why does restarting the build (sometimes) help?

Each time you restart your build, B's build processes will get a little further before they are aborted due to A's build failing. At some point, enough of B will have finished for A's build to no longer fail (fi: the message headers #included by A may have actually been generated by B's build). But how much of B succeeds is slightly non-deterministic, due to the stochasticity which results from OS process & thread scheduling I mentioned above (it not only depends on available jobs in the build, but on all other processes on your machine fi).

As to why different numbers of packages may succeed: a topological sort does not always have a unique solution, and in addition, again due to scheduling, there is a non-zero chance packages with the same or similar dependency sets will be scheduled "at the same time", but really only start building at slightly different times. Together with the semi-randomness of tasks succeeding in dependencies (fi: B's tasks getting scheduled and/or aborted at slightly different times), this leads to a situation where packages on the same layer in the dependency graph not always progressing in the exact same way in their build. This can appear to be random, but should be explainable if total insight into scheduling of tasks, progress of the build, file system state, etc would be available.

Disabling parallelism during the build (both process and thread) will significantly reduce these effects and likely cause failures in the same or similar spots. With catkin_make, this can be done with catkin_make -j1. catkin_tools supports similar flags.

I'm only going to answer this part of your question:

Is there some way to build my packages that is determinstic/repeatable?

tl;dr: if you don't state your dependencies correctly, Catkin is free to start building pkgs, which may in fact depend on each other, in parallel, leading to compiler errors and ultimately failed and aborted builds.

As the exact point in time where builds get aborted is (almost) stochastic (due to scheduling, CPU and IO performance, etc), and dependents typically will not need everything from dependencies, partial builds of dependencies may result in successful builds of (some) dependents.

This leads to successive builds appearing to make progress, as more-and-more dependents will find their dependencies where they expect them, leading to those builds succeeding, leading to their dependents being able to start their build. Etc, etc.

With enough iterations, the whole graph of dependencies gets built, until finally all dependencies are built and there are no more failures.

As to the question: always running catkin_make -j1 could help make things fail at almost the same place, however it's no guarantee, as other factors may still cause slight ordering differences in operations.


Long version: the Catkin side to this is already largely deterministic I believe. It's the other parts which may not be.

Catkin performs two main tasks before it asks CMake to build everything:

  1. discover all packages in your src space and construct a graph where the vertices are the packages and the edges encode dependencies (ie: if pkg A depends on B, there will be an edge between the vertices representing A and B expressing that dependency)
  2. run a topological sort over that graph, to determine the order in which everything should be built which satisfies those dependencies

Your packages state their dependencies using build_depends in their manifests, which Catkin parses and then uses to build the graph. After the sort, it kicks off the build by asking CMake to start the build by building the packages which:

  1. have no dependencies (rare),
  2. for which all dependencies are already built,
  3. depend only on packages present in an underlay (or its underlay, ad infinitum)
  4. do not depend on each other

for all of those, the order in which they are built does not really matter, as they have no depends relationship at all.

After the first layer of packages has been built, Catkin moves up to the next, which will be packages which depend on (some, or all) of the packages it just built. This process then repeats, until the "top" package(s) is (are) built.

Note: only the build_depend entries in your package.xml are taken into account, not the setup.py or CMakeLists.txt (actually, there's also build_export_depend et al., but I'll ignore those here, they don't significantly change the explanation).

Now back to your question:

Sometimes other people with the same workspace will build and find that their build fails with some missing dependency in a CMakeLists.txt somewhere. [..] If I run it several times in succession, more and more of the packages will be successfully built.

As I wrote: only package.xml entries are used to create the graph. So it's entirely possible for a pkg A which depends on pkg B, but which does not state a <build_depend>B</build_depend> to end up on the same 'layer' of the graph and thus get built at the same time. Obviously this fails, as A does need B.

Catkin (Catkin can not really be blamed in such cases I believe: according to the information you provided, A and B fall into (at least) category 4 above.above).

However, it's possible A can be partly built, perhaps because it only needs some parts of B, and this is where the non-determinism comes in: depending on how your operating system schedules the cmake, make and gcc processes, those "some parts of B" may or may not have already completed building.

If they were finished by the time A tries to use them: you'll never know there was a problem with the build_depends of A. If they were not finished, A's build will fail, probably with some sort of #include error, complaining No such file of directory: "B/some_include.h" and Bs build is also aborted (together with any other jobs which may be running for other pkgs).

A good example of this situation is packages which provide messages, services and/or actions. The headers are generated during the build, and it's imperative message generation has completed before other packages try to #include those files, or GCC (or whichever compiler you're using) will not be able to find the files and abort.

If I run catkin build && catkin clean several times, different numbers of packages will be successfully built.

So where does the non-determinism come in, and why does restarting the build (sometimes) help?

Each time you restart your build, B's build processes will get a little further before they are aborted due to A's build failing. At some point, enough of B will have finished for A's build to no longer fail (fi: the message headers #included by A may have actually been generated by B's build). But how much of B succeeds is slightly non-deterministic, due to the stochasticity which results from OS process & thread scheduling I mentioned above (it not only depends on available jobs in the build, but on all other processes on your machine fi).

As to why different numbers of packages may succeed: a topological sort does not always have a unique solution, and in addition, again due to scheduling, there is a non-zero chance packages with the same or similar dependency sets will be scheduled "at the same time", but really only start building at slightly different times. Together with the semi-randomness of tasks succeeding in dependencies (fi: B's tasks getting scheduled and/or aborted at slightly different times), this leads to a situation where packages on the same layer in the dependency graph not always progressing in the exact same way in their build. This can appear to be random, but should be explainable if total insight into scheduling of tasks, progress of the build, file system state, etc would be available.

Disabling parallelism during the build (both process and thread) will significantly reduce these effects and likely cause failures in the same or similar spots. With catkin_make, this can be done with catkin_make -j1. catkin_tools supports similar flags.

I'm only going to answer this part of your question:

Is there some way to build my packages that is determinstic/repeatable?

tl;dr: if you don't state your dependencies correctly, Catkin is free to start building pkgs, which may in fact depend on each other, in parallel, leading to compiler errors and ultimately failed and aborted builds.

As the exact point in time where builds get aborted is (almost) stochastic (due to scheduling, CPU and IO performance, etc), and dependents typically will not need everything from dependencies, partial builds of dependencies may result in successful builds of (some) dependents.

This leads to successive builds appearing to make progress, as more-and-more dependents will find their dependencies where they expect them, leading to those builds succeeding, leading to their dependents being able to start their build. Etc, etc.

With enough iterations, the whole graph of dependencies gets built, until finally all dependencies are built and there are no more failures.

As to the question: always running catkin_make -j1 could help make things fail at almost the same place, however it's no guarantee, as other factors may still cause slight ordering differences in operations.


Long version: the Catkin side to this is already largely deterministic I believe. It's the other parts which may not be.

Catkin performs two main tasks before it asks CMake to build everything:

  1. discover all packages in your src space and construct a graph where the vertices are the packages and the edges encode dependencies (ie: if pkg A depends on B, there will be an edge between the vertices representing A and B expressing that dependency)
  2. run a topological sort over that graph, to determine the order in which everything should be built which satisfies those dependencies

Your packages state their dependencies using build_depends in their manifests, which Catkin parses and then uses to build the graph. After the sort, it kicks off the build by asking asks CMake to start the build by building the packages which:

  1. have no dependencies (rare),
  2. for which all dependencies are already built,
  3. depend only on packages present in an underlay (or its underlay, ad infinitum)
  4. do not depend on each other

for all of those, the order in which they are built does not really matter, as pkgs in category 1 don't depend on anything, in category 2 and 3 have all their dependencies ready and in category 4 cannot fail when built at the same time as they have no depends relationship at all.don't need anything from each other.

After the first layer of packages has been built, Catkin moves up to the next, which will be packages which depend on (some, or all) of the packages it just built. This process then repeats, until the "top" package(s) is (are) built.

Note: only the build_depend entries in your package.xml are taken into account, not the setup.py or CMakeLists.txt (actually, there's also build_export_depend et al., but I'll ignore those here, they don't significantly change the explanation).

Now back to your question:

Sometimes other people with the same workspace will build and find that their build fails with some missing dependency in a CMakeLists.txt somewhere. [..] If I run it several times in succession, more and more of the packages will be successfully built.

As I wrote: only package.xml entries are used to create the graph. So it's entirely possible for a pkg A which depends on pkg B, but which does not state a <build_depend>B</build_depend> to end up on the same 'layer' of the graph and thus get built at the same time. Obviously this fails, as A does need B (Catkin can not really be blamed in such cases I believe: according to the information you provided, A and B fall into (at least) category 4 above).

However, it's possible A can be partly built, perhaps because it only needs some parts of B, and this is where the non-determinism comes in: depending on how your operating system schedules the cmake, make and gcc processes, those "some parts of B" may or may not have already completed building.

If they were finished by the time A tries to use them: you'll never know there was a problem with the build_depends of A. If they were not finished, A's build will fail, probably with some sort of #include error, complaining No such file of directory: "B/some_include.h" and Bs build is also aborted (together with any other jobs which may be running for other pkgs).

A good example of this situation is packages which provide messages, services and/or actions. The headers are generated during the build, and it's imperative message generation has completed before other packages try to #include those files, or GCC (or whichever compiler you're using) will not be able to find the files and abort.

If I run catkin build && catkin clean several times, different numbers of packages will be successfully built.

So where does the non-determinism come in, and why does restarting the build (sometimes) help?

Each time you restart your build, B's build processes will get a little further before they are aborted due to A's build failing. At some point, enough of B will have finished for A's build to no longer fail (fi: the message headers #included by A may have actually been generated by B's build). But how much of B succeeds is slightly non-deterministic, due to the stochasticity which results from OS process & thread scheduling I mentioned above (it not only depends on available jobs in the build, but on all other processes on your machine fi).

As to why different numbers of packages may succeed: a topological sort does not always have a unique solution, and in addition, again due to scheduling, there is a non-zero chance packages with the same or similar dependency sets will be scheduled "at the same time", but really only start building at slightly different times. Together with the semi-randomness of tasks succeeding in dependencies (fi: B's tasks getting scheduled and/or aborted at slightly different times), this leads to a situation where packages on the same layer in the dependency graph not always progressing in the exact same way in their build. This can appear to be random, but should be explainable if total insight into scheduling of tasks, progress of the build, file system state, etc would be available.

Disabling parallelism during the build (both process and thread) will significantly reduce these effects and likely cause failures in the same or similar spots. With catkin_make, this can be done with catkin_make -j1. catkin_tools supports similar flags.