QMutex, what Nokia never told you
Who is a big fan of easy-to-use, third-party cross-platform code? Yeah, hands down. I love it just as much as the rest of you.
Over the years, I have come to realize that one disadvantage of using third-party libraries is that application developers often neglect to consider what is actually happening behind the scenes. Of course, this becomes even more of an issue with closed-source libraries.
In one of our multithreaded applications, we noticed that the application was spending an awful amount of unnecessary time in kernel mode. We had long had our suspicions about the culprit, but it wasn’t until recently (when performance became an issue) that we conducted a more thorough investigation. Using Intel’s VTune, we were able to pinpoint the culprit: Qt’s QMutex.

Let’s talk a little bit about contention. If you have ever taken a university-level course in operating systems, you will know that context switches are expensive. Wikipedia offers a great analogy:
“To give an analogy, multiple threads in a process are like multiple cooks reading off the same cook book and following its instructions, not necessarily from the same page.”
Any time the page is busy being turned in the cook book, time is being wasted by all of the cooks. The goal in multithreaded applications is to turn the pages as little as possible AND only as and when it is needed. When the pages need to be turned, it must be quick so that the cooks can resume their work.
Although there are other operations which require an application to enter kernel mode, a high ratio of kernel time to user time was indicative that a combination of our threads and points of mutual exclusion were causing issues.
Diving into the depths of Qt (4.6.3), I found that QMutex was using the following code on a Windows machine:
handle = ::CreateMutex( NULL, FALSE, NULL );
Now this is a rather heavy-handed approach to mutual exclusion. In short, this code makes use of Win32 mutexes. What exactly does MSDN have to say?
“You can use a mutex object to protect a shared resource from simultaneous access by multiple threads or processes. Each thread must wait for ownership of the mutex before it can execute the code that accesses the shared resource.”
The key phrase here is, “…use a mutex object to protect a shared resource from simultaneous access by multiple … processes”. If you want mutual exclusion across the threads of a single process and nothing more, a Win32 mutex object is far too heavy.
Digging through MSDN, I came across something that Windows calls a critical section. A critical section is similar to a mutex, except that it can be used only by the threads of a single process.
“Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization (a processor-specific test and set instruction).”
Taking advantage of RAII, it’s simple to construct a class to instantiate critical section objects.
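#include <windows.h>
#include <QtGlobal>   // quint32, Q_DISABLE_COPY

class CCriticalSection
{
public:
    // The spin count is how many times a thread spins on a contended
    // critical section before it falls back to waiting in the kernel.
    CCriticalSection(quint32 nSpinCount = 4000)
    {
        ::InitializeCriticalSectionAndSpinCount
        (
            &m_critSection,
            static_cast<DWORD>(nSpinCount)
        );
    }
    ~CCriticalSection()
    {
        ::DeleteCriticalSection(&m_critSection);
    }
    void lock() { ::EnterCriticalSection(&m_critSection); }
    bool tryLock() { return (TRUE == ::TryEnterCriticalSection(&m_critSection)); }
    void unlock() { ::LeaveCriticalSection(&m_critSection); }
private:
    // A CRITICAL_SECTION must not be copied or moved once initialized.
    Q_DISABLE_COPY(CCriticalSection)
    ::CRITICAL_SECTION m_critSection;
};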
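The class above ties the lifetime of the critical section to an object, but each lock() still has to be paired with an unlock() by hand. Here is a minimal sketch of a scoped guard that applies the same RAII idea to the lock itself. ScopedLock, CountingLock and guardedDepth are hypothetical names of my own, not part of Qt or Win32, and CountingLock is only a stand-in so the sketch compiles on any platform; the guard should work with any type that exposes lock() and unlock(), which includes CCriticalSection and QMutex.

```cpp
#include <cassert>

// Generic RAII guard (hypothetical helper): locks on construction and
// unlocks on destruction, so early returns and exceptions cannot leak a lock.
template <typename Lockable>
class ScopedLock
{
public:
    explicit ScopedLock(Lockable& lockable) : m_lockable(lockable)
    {
        m_lockable.lock();
    }
    ~ScopedLock()
    {
        m_lockable.unlock();
    }
private:
    ScopedLock(const ScopedLock&);            // not copyable
    ScopedLock& operator=(const ScopedLock&); // not assignable
    Lockable& m_lockable;
};

// Stand-in lock for illustration only: it does no real synchronization,
// it just records how it was used.
struct CountingLock
{
    int depth;        // currently held count
    int acquisitions; // total number of lock() calls
    CountingLock() : depth(0), acquisitions(0) {}
    void lock() { ++depth; ++acquisitions; }
    void unlock() { assert(depth > 0); --depth; }
};

// Returns the depth seen while the guard is alive; the guard's
// destructor releases the lock after the return value is computed.
int guardedDepth(CountingLock& lock)
{
    ScopedLock<CountingLock> guard(lock);
    return lock.depth;
}
```

With the class above you would write ScopedLock<CCriticalSection> guard(m_lock); at the top of the protected scope. For QMutex itself, Qt already ships QMutexLocker, which does the same job.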
The two big advantages of critical sections are clear: they’re quick, and they don’t enter kernel mode unless there is contention. According to Microsoft,
“Unlike events, mutexes, and semaphores, which are also used for multithreaded synchronization, critical sections don’t always perform an expensive control transfer to kernel mode. As you’ll see later, acquiring an unheld critical section requires, in effect, just a few memory modifications and is very quick. Only if you try to acquire an already-held critical section does it jump into kernel mode.”
I wrote a simple unit test using the common producer/consumer scenario to pit a Win32 critical section against a Qt QMutex. One thread (the producer) adds 100000 integer items to a shared vector. Meanwhile, another thread (the consumer) runs in parallel and attempts to remove the same 100000 items from the shared vector.
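For anyone who wants to try something similar, here is a rough sketch of what such a test could look like. It is not the original test code: it assumes a C++11 compiler, uses std::thread and std::chrono rather than Qt's thread and timer classes, and BenchResult and runProducerConsumer are hypothetical names of my own. The lock type is a template parameter, so QMutex, the CCriticalSection class above, or std::mutex (used here as a portable stand-in) can be plugged in, as long as the type is default-constructible and has lock() and unlock().

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type: items removed and wall-clock time taken.
struct BenchResult
{
    std::size_t consumed;
    long long microseconds;
};

// One producer pushes itemCount integers into a shared vector while one
// consumer removes them; every access to the vector is guarded by Lockable.
template <typename Lockable>
BenchResult runProducerConsumer(std::size_t itemCount)
{
    Lockable lock;
    std::vector<int> shared;
    std::size_t consumed = 0; // written only by the consumer thread

    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();

    std::thread producer([&] {
        for (std::size_t i = 0; i < itemCount; ++i) {
            lock.lock();
            shared.push_back(static_cast<int>(i));
            lock.unlock();
        }
    });

    std::thread consumer([&] {
        while (consumed < itemCount) {
            lock.lock();
            const bool gotItem = !shared.empty();
            if (gotItem) {
                shared.pop_back();
                ++consumed;
            }
            lock.unlock();
            if (!gotItem) {
                std::this_thread::yield(); // nothing to take yet
            }
        }
    });

    producer.join();
    consumer.join();

    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();

    assert(shared.empty()); // every produced item was consumed

    BenchResult result;
    result.consumed = consumed;
    result.microseconds = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());
    return result;
}
```

Calling runProducerConsumer<QMutex>(100000) and runProducerConsumer<CCriticalSection>(100000) on the same machine gives a like-for-like comparison. Absolute timings depend heavily on hardware and build settings, so only the ratio between the two is worth reading into.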
The results are in, and there is a clear winner. Firstly, I’ll start with how the QMutex performed:
1 Writer, 1 Readers - 393us
Here’s how the Win32 critical section performed in comparison:
1 Writer, 1 Readers - 21us
The difference in results between the Qt and Win32 implementations was fairly consistent regardless of the number of writer (producer) and reader (consumer) threads.
[EDIT:] The new, revised implementation of QMutex in 4.7.4 does not perform any better.
Gavin is a software developer developing automation solutions for the broadcast industry. When he's not churning out code, he functions as a rugby fanatic, aquarium enthusiast, fiancé, and blogger.