13

Here's a small illustration of my question:

Assume a build job that consists of 4 independent tasks named A-D. D takes longer than A, B, and C combined.

A build system that cannot incorporate the relative task times might schedule the tasks like this:

---------------------------------------
CPU1: A  |    C   |
---------------------------------------
CPU2: B    | D                        |
---------------------------------------

In contrast, if the scheduler is aware of the task time differences, it could come up with this much shorter schedule:

---------------------------------------
CPU1: A  |  B    |   C   |
---------------------------------------
CPU2: D                        |
---------------------------------------
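
To make the comparison concrete, here is a minimal sketch of both schedules using greedy list scheduling (each task goes to whichever CPU frees up first). The durations are hypothetical numbers chosen only so that D is longer than A, B, and C combined, and `makespan` is an illustrative helper, not part of any real build system. The "aware" variant simply sorts tasks longest-first, which is one common heuristic and not necessarily what an actual build tool would do.

```python
import heapq

def makespan(durations, order, workers=2):
    """Greedy list scheduling: each task in `order` runs on the worker that is idle first."""
    free_at = [0.0] * workers  # min-heap of the times at which each worker becomes idle
    heapq.heapify(free_at)
    for task in order:
        start = heapq.heappop(free_at)            # earliest available worker
        heapq.heappush(free_at, start + durations[task])
    return max(free_at)                           # time at which the last worker finishes

# Hypothetical durations (seconds); D is longer than A + B + C.
durations = {"A": 2, "B": 3, "C": 4, "D": 12}

# Scheduler unaware of task times: tasks are started in declaration order.
naive = makespan(durations, ["A", "B", "C", "D"])

# Scheduler aware of task times: longest task first.
longest_first = sorted(durations, key=durations.get, reverse=True)
aware = makespan(durations, longest_first)

print(naive, aware)
```

With these assumed numbers, the first schedule ends only after B and D have run back to back on one CPU, while the second ends as soon as D alone has finished.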

My questions:

  1. Are there any build systems that incorporate relative expected task times into the schedule?
  2. What academic research into build systems of this kind exists?
  3. Where do these build systems (if they exist) take the time information from? Heuristics, timings collected during previous builds?
  4. If such build systems do not exist, why? Is there a gotcha that would make them less worthwhile than they appear at first glance?
sjakobi

2 Answers

3

Microsoft Visual Studio Team System (formerly TFS) does consider build action times and parallel builds, and it takes the data from previous build history. I don't believe you can get the behavior you want out of the box, but you may be able to customize it.

Here is an example of some custom tasks aimed at optimizing build performance:

https://veegens.wordpress.com/2013/03/26/tfs-2010-build-performance-report/

Bruno Guardia
0

This is based on the wrong assumption that "building" a single task is not itself parallel.

Many compilers are multi-threaded, so a single task A will use all CPUs. The order therefore doesn't matter. I/O-bound tasks, especially those involving networking, are also better started in parallel right from the beginning: most of their time will be spent waiting for a response.

In other words, ordering does not matter, since the individual tasks (compiling, for instance) are typically parallelized themselves.


Edit:

Actually, the notion of "Task A on CPU 1" is flawed too. Even for single-threaded tasks, the OS scheduler may move the process or thread from CPU to CPU on each context switch. I guess most build systems will just run all tasks in parallel and let the OS do the scheduling. Longer tasks will take longer, and that's about it.

Assuming you have a long-running, single-threaded task that is not I/O bound, it would be far easier for the build system to assign it a priority/importance than to attempt to delay smaller tasks in order to reduce context switches by the OS.

Even if you have such unusual tasks, which is quite rare in practice, and a fancy scheduling build system that works on heuristics based on previous runs (the only way to know), the benefit you get from it may be rather small, while you take on a bunch of added complexity to maintain.
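
For reference, a heuristic based on previous runs could be as simple as the sketch below: keep one running estimate per task and blend in each newly measured duration. Everything here is an assumption made for illustration (the names `update_estimate` and `estimate`, the smoothing factor `alpha`, the default for unseen tasks, and the sample timings); it is not taken from any particular build system.

```python
def update_estimate(previous, measured, alpha=0.5):
    """Exponential moving average: blend the latest measurement into the running estimate."""
    if previous is None:          # first time this task has been timed
        return measured
    return alpha * measured + (1 - alpha) * previous

history = {}  # task name -> estimated duration in seconds

# Hypothetical timings recorded over earlier builds.
for task, seconds in [("D", 10.0), ("A", 2.0), ("D", 14.0)]:
    history[task] = update_estimate(history.get(task), seconds)

def estimate(task, default=1.0):
    """Estimated duration; tasks never seen before fall back to an assumed default."""
    return history.get(task, default)

print(estimate("D"), estimate("A"), estimate("B"))
```

Even this small version needs persistent storage between builds, invalidation when a task changes, and a policy for new tasks, which is part of the added complexity mentioned above.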

dagnelies