How to test and benchmark mutex implementations

Question

As the title says: how do you properly test and benchmark different mutex implementations in C++?

Essentially, I wrote my own std::mutex-like class for a project running on a 2-core ARMv7 system, with the aim of minimizing the overhead in the uncontended case. Now I'm considering using said mutex in more places and also on different architectures, but before I do this I'd like to make sure that

it is actually correct
there aren't any pathological cases in which it performs much worse than a standard std::mutex.

Obviously, I wrote a few basic unit tests and micro-benchmarks and everything seems to work, but in multi-threaded code "seems to work" doesn't give me great comfort.

So, are there any established static or dynamic analysis techniques?
What are common pitfalls when writing unit tests for mutex classes?
What are typical edge cases one should look out for (performance-wise)?

I'm only using standard library types for the implementation, which includes non-sequentially-consistent load and store operations on atomics. However, I'm mainly interested in implementation-agnostic advice, since I'd like to use the same test harness for other implementations, too.

score 1 · Answer 1 · answered Jan 16 '18 at 12:03

The issue is complex. Some sources of complexity include:

How many context switches take place? This matters a great deal depending on the platform the tests run on; some platforms handle context switches better than others.
Are the functions in which the mutexes are tested inlined or not? I.e., does the mutex perform well only in well-optimized or optimizable code?
Are the mutexes designed for cache locality? Will cache misses before and after the mutex is entered significantly reduce performance or cause more context switches?
Will the mutex itself cause a loss of cache locality, e.g. because the mutex state data is dynamically allocated?
Will the mutexes perform well when context switches occur while the mutex is held, e.g. I/O, malloc, etc.?
Will the mutex perform well when kernel time is spent while the mutex is held, e.g. dynamic memory allocation and deallocation?
Does the performance hold when running inside VMs?
Is construction or destruction of the mutex expensive, e.g. because state data is located in dynamic memory?
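
Several of these points (the uncontended cost, behaviour under contention, and work done while the lock is held) can be probed with one small harness templated on the mutex type. The following is only a minimal sketch, not an established benchmark: the names BenchResult, bench_lock_unlock and work_in_lock are made up for illustration, and it assumes the mutex under test offers lock() and unlock() like std::mutex.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Result of one run (hypothetical helper type for this sketch).
struct BenchResult {
    double ns_per_op;            // average wall-clock ns per lock/work/unlock cycle
    unsigned long long counter;  // final value of the protected counter
};

// Each of `threads` threads performs `iters` lock/work/unlock cycles.
// threads == 1 measures the uncontended path; more threads add contention.
// `work_in_lock` runs while the mutex is held (e.g. nothing, malloc, I/O).
// Note: the measured time includes thread creation and join.
template <class Mutex, class Work>
BenchResult bench_lock_unlock(unsigned threads, unsigned long long iters,
                              Work work_in_lock) {
    Mutex m;
    unsigned long long counter = 0;  // protected by m
    std::vector<std::thread> pool;
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (unsigned long long i = 0; i < iters; ++i) {
                std::lock_guard<Mutex> guard(m);
                work_in_lock();
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    auto stop = std::chrono::steady_clock::now();
    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return {ns / (static_cast<double>(threads) * static_cast<double>(iters)),
            counter};
}
```

Running it once as bench_lock_unlock<std::mutex>(1, 1000000, []{}) and once with your own class, then again with as many threads as cores (or more) and with a callable that allocates memory, gives comparable numbers per scenario. The counter doubles as a sanity check: it should equal threads * iters. Use a large iters so that thread start-up cost does not dominate.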

score -1 · Answer 2 · answered Nov 27 '17 at 11:34

Your idea is very interesting: a compliance benchmark against which a mutex implementation could be tested.

Unfortunately, as far as I can see, there's no widely known compliance benchmark for mutex implementations. So I guess you have in your hands the very interesting problem of creating a proposal for such a compliance benchmark.

And, since you've been involved in the creation of a benchmark implementation, you're the guy.

If you allow me a suggestion, maybe you could start this research with the POSIX standard for threads on one side and, on the other, some study of the theoretical literature on concurrent processing, such as CSP (Communicating Sequential Processes). Papers of this kind usually deal with the classical concurrency problems, like the Dining Philosophers.

An implementation of them could be an interesting part of your compliance benchmark, I guess.
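
As a rough illustration of what such a test could look like, here is a sketch of a Dining Philosophers stress test that works with any mutex type offering lock() and unlock(). The names Fork and dining_philosophers are invented for this example. It assumes at least two philosophers, and it always takes the lower-numbered fork first so that the test itself cannot deadlock. It counts cases where two threads hold the same fork at once; it does not prove that the mutex provides the right memory ordering, since the atomic counters add synchronization of their own.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// One fork: the mutex under test plus a holder count used to detect
// two philosophers holding the same fork at the same time.
template <class Mutex>
struct Fork {
    Mutex m;
    std::atomic<int> holders{0};
};

// Runs `n` philosophers (n >= 2) for `meals` rounds each and returns the
// number of observed mutual-exclusion violations (0 for a correct mutex).
template <class Mutex>
unsigned long long dining_philosophers(std::size_t n, unsigned meals) {
    std::vector<Fork<Mutex>> forks(n);
    std::atomic<unsigned long long> violations{0};
    std::vector<std::thread> philosophers;
    for (std::size_t i = 0; i < n; ++i) {
        philosophers.emplace_back([&, i] {
            // Fixed global lock order: lower-numbered fork first.
            std::size_t first = i, second = (i + 1) % n;
            if (first > second) std::swap(first, second);
            for (unsigned k = 0; k < meals; ++k) {
                std::lock_guard<Mutex> a(forks[first].m);
                std::lock_guard<Mutex> b(forks[second].m);
                // While both locks are held, nobody else may hold either fork.
                if (forks[first].holders.fetch_add(1) != 0) ++violations;
                if (forks[second].holders.fetch_add(1) != 0) ++violations;
                std::this_thread::yield();  // widen the window for races
                forks[first].holders.fetch_sub(1);
                forks[second].holders.fetch_sub(1);
            }
        });
    }
    for (auto& p : philosophers) p.join();
    return violations.load();
}
```

A call such as dining_philosophers<std::mutex>(5, 100000) should return 0, and the same call with the custom mutex type should too. A run that never finishes points at a lost wake-up or a deadlock inside the mutex, because the lock order used here rules out deadlock in the test logic.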
