PROGRAMMING
Concurrency. Tools
2021-02-20 | by David "DeMO" Martínez Oliveira

Debugging, profiling and optimising concurrent applications is not easy, and there are not that many tools out there to deal with the specific issues of these special programs. Helgrind, part of the Valgrind project, is one of those tools.

So I decided to give it a try, learn how to use it, and figure out whether it would be useful for me in the future. I have used the Valgrind Memcheck tool in the past to find and fix memory leaks, so I already have a good vibe with this suite.

Our Specimen

In order to play with Helgrind we are going to use a modified version of the producer/consumer example from last instalment. This is the starting code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#include <pthread.h>

#define RUN_TIME       5
#define MAX_SIZE       64
#define MAX_THREADS    8
#define MAX_PRODUCERS  3


typedef struct t_par {
  int  id;
  int  flag;
} TPAR;

/* Shared buffer */
int shared_buf[MAX_SIZE];
int n = 0;
int n_transactions = 0;

int  max_buf = 0;
long acc_buf = 0;
int  prod_wait = 0;
int  cons_wait = 0;

pthread_mutex_t  mutex;

void *producer (void *arg)  {
  TPAR   *par = (TPAR*)arg;
  int id = par->id;

  while (!par->flag) {
    pthread_mutex_lock (&mutex); // Lock the resource
    while (n == MAX_SIZE - 1) {
      prod_wait++;
      pthread_mutex_unlock (&mutex);
      usleep (0);
      if (par->flag) return 0;
      pthread_mutex_lock (&mutex);
    }
    shared_buf[n] = rand () % 100;
    printf ("\033[32m Producer %d : %d (%d items)\033[m\n", id, shared_buf[n], n+1);

    n++;
    if (n > max_buf) max_buf = n;
    acc_buf += n;
    pthread_mutex_unlock (&mutex);
    usleep (100);
  }
  return NULL;
}

void *consumer (void *arg)  {
  TPAR   *par = (TPAR*)arg;
  int     id = par->id;

  while (!par->flag) { 
    pthread_mutex_lock (&mutex);
    while (n == 0) {
      cons_wait++;
      pthread_mutex_unlock (&mutex);
      usleep (0);
      if (par->flag) return 0;
      pthread_mutex_lock (&mutex);
    }
    n_transactions ++;
    n--;
    printf ("\033[31m Consumer %d : %d (%d items-%d)\033[m\n", id, shared_buf[n], n, n_transactions);    
    pthread_mutex_unlock (&mutex);
    usleep (100);
  }
  return NULL;
}


int main () {
  pthread_t   tid[MAX_THREADS];
  TPAR        tpar[MAX_THREADS] = {0};
  int         i;

  srand (time(NULL));
  /* Initialise synchronisation objects*/
  pthread_mutex_init (&mutex, NULL);

  /* Launch threads */
  for (i = 0; i < MAX_THREADS; i++) {
    tpar[i].id = i;
    pthread_create (&tid[i], NULL, (i > MAX_PRODUCERS - 1 ? consumer : producer), (void*)&tpar[i]);
  }

  sleep (RUN_TIME);
  for (i = 0; i < MAX_THREADS; i++ ) tpar[i].flag= 1; 
  for (i = 0; i < MAX_THREADS; i++ ) pthread_join(tid[i], NULL);

  printf ("WAIT: Prod: %d || Cons: %d\n", prod_wait, cons_wait);  
  printf ("Buffer size: %d (%d-%ld)|| %d Producers  || %d Consumers\n",
      MAX_SIZE, max_buf, acc_buf/n_transactions, MAX_PRODUCERS, MAX_THREADS-MAX_PRODUCERS);
  printf ("%d transactions processed\n", n_transactions);
  

  /* With the ordered shutdown above, we do reach this point */
  return 0;
}

This is basically the last version of the program from the previous instalment, with an ordered shutdown of the threads added.
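In case you want to follow along, these are plain pthreads programs; assuming gcc and the file names used below, something like this should build them:

$ gcc -o fix02 fix02.c -pthread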

First try

No more talking, let's jump straight into the thingy. I'll take the last test program we wrote while exploring the Producer/Consumer Problem and start from there:

$ valgrind ./test04
==2926== Memcheck, a memory error detector
==2926== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==2926== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==2926== Command: ./test04
==2926==
vex amd64->IR: unhandled instruction bytes: 0xF 0xAE 0x64 0x24 0x40 0x48 0x8B 0x73
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==2926== valgrind: Unrecognised instruction at address 0x4017898.
==2926==    at 0x4017898: _dl_runtime_resolve_xsave (dl-trampoline.h:116)
==2926== Your program just tried to execute an instruction that Valgrind
==2926== did not recognise.  There are two possible reasons for this.
==2926== 1. Your program has a bug and erroneously jumped to a non-code
==2926==    location.  If you are running Memcheck and you just saw a
==2926==    warning about a bad jump, it's probably your program's fault.
==2926== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2926==    i.e. it's Valgrind's fault.  If you think this is the case or
==2926==    you are not sure, please let us know and we'll try to fix it.
==2926== Either way, Valgrind will now raise a SIGILL signal which will
==2926== probably kill your program.
==2926==
==2926== Process terminating with default action of signal 4 (SIGILL)
==2926==  Illegal opcode at address 0x4017898
==2926==    at 0x4017898: _dl_runtime_resolve_xsave (dl-trampoline.h:116)
==2926== valgrind: Unrecognised instruction at address 0x4017898.
==2926==    at 0x4017898: _dl_runtime_resolve_xsave (dl-trampoline.h:116)
==2926== Your program just tried to execute an instruction that Valgrind
==2926== did not recognise.  There are two possible reasons for this.
==2926== 1. Your program has a bug and erroneously jumped to a non-code
==2926==    location.  If you are running Memcheck and you just saw a
==2926==    warning about a bad jump, it's probably your program's fault.
==2926== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2926==    i.e. it's Valgrind's fault.  If you think this is the case or
==2926==    you are not sure, please let us know and we'll try to fix it.
==2926== Either way, Valgrind will now raise a SIGILL signal which will
==2926== probably kill your program.
==2926==
==2926== Process terminating with default action of signal 4 (SIGILL)
==2926==  Illegal opcode at address 0x4017898
==2926==    at 0x4017898: _dl_runtime_resolve_xsave (dl-trampoline.h:116)
==2926==
==2926== HEAP SUMMARY:
==2926==     in use at exit: 0 bytes in 0 blocks
==2926==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==2926==
==2926== All heap blocks were freed -- no leaks are possible
==2926==
==2926== For counts of detected and suppressed errors, rerun with: -v
==2926== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Oof... Oof, ooff. That doesn't look good. Apparently the Valgrind version on my Ubuntu 18.04 is not recent enough to deal with my concurrent test programs. Next step: compile the latest version of Valgrind.

This worked like a charm. I just downloaded the latest version of the Valgrind source code and did the well-known:

 ./configure --prefix=/tmp/valgrind && make && make install
 

Let's try again.

Second Try

When we run the program again using our newly compiled tool, this is the output:

$ /tmp/valgrind/bin/valgrind --tool=helgrind ./fix02

(...)
6470 transactions processed
==1111==
==1111== Use --history-level=approx or =none to gain increased speed, at
==1111== the cost of reduced accuracy of conflicting-access information
==1111== For lists of detected and suppressed errors, rerun with: -s
==1111== ERROR SUMMARY: 16 errors from 3 contexts (suppressed: 169793 from 44)

Note: We have renamed test04 to fixXX, as we will be implementing the fixes incrementally, so fix02 is actually exactly the same as test04... why 2?... don't ask.

So Helgrind has found 16 errors in a program that we believe is pretty much correct and "works" fine. Let's go through each one of the errors and try to solve them.

==1111==
==1111== Possible data race during write of size 4 at 0x1FFF000174 by thread #1
==1111== Locks held: none
==1111==    at 0x108D62: main (fix02.c:95)
==1111==
==1111== This conflicts with a previous read of size 4 by thread #2
==1111== Locks held: none
==1111==    at 0x108B31: producer (fix02.c:35)
==1111==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==1111==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==1111==    by 0x518571E: clone (clone.S:95)
==1111==  Address 0x1fff000174 is on thread #1's stack
==1111==  in frame #0, created by main (fix02.c:80)
==1111==
==1111== ----------------------------------------------------------------

So, let's see what is at line 95:

  for (i = 0; i < MAX_THREADS; i++ ) tpar[i].flag= 1; 

And this is what we have at line 35:

  while (!par->flag) {

So, Helgrind is completely right: we are writing to a variable in one thread and reading it from another thread, so this is potentially an issue. Fortunately for us, Intel processors perform word-sized, aligned store and load operations atomically... so, depending on the code generated by the compiler, the code above may be correct. But Helgrind only deals with the POSIX threads primitives and cannot reason about low-level synchronisation constructs.

Let's take a quick look at the generated code for the main function (obtained with something like objdump -d):

 d49:   e8 22 fb ff ff          callq  870 <sleep@plt>
 d4e:   c7 85 6c ff ff ff 00    movl   $0x0,-0x94(%rbp)
 d55:   00 00 00
 d58:   eb 17                   jmp    d71 <main+0x12f>
 d5a:   8b 85 6c ff ff ff       mov    -0x94(%rbp),%eax
 d60:   48 98                   cltq
 d62:   c7 44 c5 b4 01 00 00    movl   $0x1,-0x4c(%rbp,%rax,8)
 d69:   00
 d6a:   83 85 6c ff ff ff 01    addl   $0x1,-0x94(%rbp)
 d71:   83 bd 6c ff ff ff 07    cmpl   $0x7,-0x94(%rbp)
 d78:   7e e0                   jle    d5a <main+0x118>

Let's rewrite it in a clearer way:

; Local Variables
; RBP-0x94 -> i
; RBP-0x4c -> tpar
;
           call  sleep
           mov   0, i            ; for (i = 0...
           jmp   loop_condition
loop:   
           mov   i, %eax
           mov   1, (tpar + %eax*8)  ; tpar[i].flag = 1
           add   1, i                ; .... i++)
loop_condition: 
           cmp   7, i             ;  ... i < MAX_THREADS ....
           jle   loop

This is the annotated assembly for the code:

  for (i = 0; i < MAX_THREADS; i++ ) tpar[i].flag= 1; 

As we can see, the compiler sets the flag with just one mov which, in principle, is an atomic operation for Intel processors. This means that we could just consider this a false positive and instruct Helgrind to ignore this error.... or we could fix it.

Obviously, fixing it is the right solution, so that our C code doesn't depend on the Intel platform. If we ever compile the program for a different architecture, it might otherwise fail.
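As an aside, suppression files (described next) are not the only way to silence a false positive: Valgrind also ships a header with client requests that let us tell Helgrind, from the source code, to leave a given memory range alone. A minimal sketch of the idea; the macro name is taken from the Helgrind manual, so double-check it against the valgrind/helgrind.h shipped with your version:

#include <valgrind/helgrind.h>

/* Advisory request: stop race-checking this flag (we believe the
   access pattern is safe on our platform). It expands to a no-op
   when the program is not running under Valgrind. */
VALGRIND_HG_DISABLE_CHECKING (&tpar[i].flag, sizeof (tpar[i].flag));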

Suppressing Helgrind Errors

So, first, let's see how we can suppress errors from Helgrind's output. This is something we have to do when it is reporting false positives: errors that are actually not errors, but that look like errors to the tool because it lacks some information. This happens sometimes when dealing with highly optimised libraries.

To achieve this, Valgrind provides a generic mechanism to suppress errors from any of the tools in the suite (remember, Valgrind provides multiple tools). This is done through the so-called Suppression Files.

For that, we can use the --gen-suppressions=all flag. This flag will attach to each error the associated suppression block that we need to include in a suppression file to ignore that specific error. So, let's go:

$ /tmp/valgrind/bin/valgrind --gen-suppressions=all  --tool=helgrind ./fix02
(...)
==1733== ----------------------------------------------------------------
==1733==
==1733== Possible data race during write of size 4 at 0x1FFF000174 by thread #1
==1733== Locks held: none
==1733==    at 0x108D62: main (fix02.c:95)
==1733==
==1733== This conflicts with a previous read of size 4 by thread #2
==1733== Locks held: none
==1733==    at 0x108B31: producer (fix02.c:35)
==1733==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==1733==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==1733==    by 0x518571E: clone (clone.S:95)
==1733==  Address 0x1fff000174 is on thread #1's stack
==1733==  in frame #0, created by main (fix02.c:80)
==1733==
{
   <insert_a_suppression_name_here>
   Helgrind:Race
   fun:main
}

So, now we can give that suppression a name (e.g. pt-stop) and add it to a file (e.g. pt-stop.sup). The file would then look like this (only the name in the first line is ours; the rest comes straight from the generated block):
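{
   pt-stop
   Helgrind:Race
   fun:main
}

Now we can re-run the tool with this suppression file: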

$ /tmp/valgrind/bin/valgrind --suppressions=pt-stop.sup --tool=helgrind ./fix02
(..)
==1884== ERROR SUMMARY: 8 errors from 2 contexts (suppressed: 218577 from 37)

We can use the -s flag to get a summary of the suppressions:

$ /tmp/valgrind/bin/valgrind -s --suppressions=pt-stop.sup --tool=helgrind ./fix02
(..)

--1942--
--1942-- used_suppression:      8 pt-stop pt-stop.sup:2
--1942-- used_suppression: 215544 helgrind-glibc2X-005 /tmp/valgrind/lib/valgrind/default.supp:943
--1942-- used_suppression:     68 helgrind-glibc2X-004 /tmp/valgrind/lib/valgrind/default.supp:929
--1942-- used_suppression:      7 helgrind-glibc2X-101 /tmp/valgrind/lib/valgrind/default.supp:978
==1942==
==1942== ERROR SUMMARY: 8 errors from 3 contexts (suppressed: 215627 from 38)

We can see the 8 errors matched by our suppression file.

Fixing the error

So, the error is actually due to writing the stop flag from the main thread. That variable is shared, and we are writing it from one thread and reading it from another without any protection. We can fix this error in different ways: we can use the mutex object we already have to block all other threads while the main thread sets the flag to 1, or we can ensure that the access to that variable is atomic.

The first fix will lead to this code:

  pthread_mutex_lock (&mutex);
  for (i = 0; i < MAX_THREADS; i++ ) tpar[i].flag= 1;
  pthread_mutex_unlock (&mutex);

The other solution forces an atomic update of the flag:

  for (i = 0; i < MAX_THREADS; i++ )  __sync_fetch_and_add (&tpar[i].flag, 1);

or, more compactly:

for (i = 0; i < MAX_THREADS; __sync_fetch_and_add (&tpar[i++].flag, 1));
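As an aside, __sync_fetch_and_add is one of GCC's legacy atomic builtins. On a compiler supporting C11, the same idea can be expressed portably with stdatomic.h. A minimal sketch, under the assumption that we are free to change the type of the flag field:

#include <stdatomic.h>

typedef struct t_par {
  int        id;
  atomic_int flag;   /* C11 atomic integer instead of plain int */
} TPAR;

/* Main thread: request the shutdown. */
for (i = 0; i < MAX_THREADS; i++)
  atomic_store (&tpar[i].flag, 1);

/* Worker threads: poll the flag atomically. */
while (!atomic_load (&par->flag)) { /* produce or consume */ }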

If we now run Helgrind again.....

$ /tmp/valgrind/bin/valgrind --tool=helgrind ./fix04
(...)
==2440== Use --history-level=approx or =none to gain increased speed, at
==2440== the cost of reduced accuracy of conflicting-access information
==2440== For lists of detected and suppressed errors, rerun with: -s
==2440== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 229923 from 36)

Wow... we have solved all the issues in one go!! Very good.

However, for the sake of completeness, let's take a look at the 8 errors left when we just used the mutex, and then figure out how they are related to the atomic modification of tpar[i].flag.

==2193== ----------------------------------------------------------------
==2193==
==2193== Possible data race during write of size 4 at 0x1FFF000174 by thread #1
==2193== Locks held: none
==2193==    at 0x108D62: main (fix03.c:96)
==2193==
==2193== This conflicts with a previous read of size 4 by thread #2
==2193== Locks held: none
==2193==    at 0x108B31: producer (fix03.c:35)
==2193==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==2193==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==2193==    by 0x518571E: clone (clone.S:95)
==2193==  Address 0x1fff000174 is on thread #1's stack
==2193==  in frame #0, created by main (fix03.c:80)
(...)
==2193== ----------------------------------------------------------------
==2193==
==2193== Possible data race during read of size 4 at 0x1FFF0001A4 by thread #8
==2193== Locks held: none
==2193==    at 0x108B97: consumer (fix03.c:66)
==2193==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==2193==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==2193==    by 0x518571E: clone (clone.S:95)
==2193==
==2193== This conflicts with a previous write of size 4 by thread #1
==2193== Locks held: none
==2193==    at 0x108D62: main (fix03.c:96)
==2193==  Address 0x1fff0001a4 is on thread #1's stack
==2193==  in frame #3, created by main (fix03.c:80)

As we can see, there are two different errors, but both involve the same write in the main function (fix03.c:96, our flag-setting loop). Even though that write is now protected by the mutex, the reports show the producer and the consumer still reading par->flag with no locks held, so protecting just the writer was not enough. The atomic update of tpar[i].flag, on the other hand, silenced all of them at once.

A deadlock

We now have a rough idea of what Helgrind can do for us to profile and troubleshoot multithreaded applications. Let's try one more example. This time, let's see what the tool gives us in the case of a deadlock.

The following is an implementation of the simplest deadlock program that can be written. Two threads work on two resources. One thread locks resource 1 and then tries to lock resource 2, but the other thread has already locked resource 2 and is waiting for resource 1..... So both threads stay blocked forever and the program effectively hangs.

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

pthread_mutex_t lock1, lock2;

void *p1 (void *arg) {
  
  pthread_mutex_lock(&lock1);
  
  printf ("Thread 1 acquires Resource 1....\n");
  sleep (1);
  
  printf ("Thread 1. Trying to acquire Resource 2\n");
  
  pthread_mutex_lock (&lock2); 
  printf ("Thread 1... Working with Resource 2\n");
  pthread_mutex_unlock (&lock2);
  
  printf ("Thread 1. All done with Resource 1..\n");
  
  pthread_mutex_unlock (&lock1);
  
  pthread_exit (NULL);
}

void *p2 (void *arg) {
  
  pthread_mutex_lock(&lock2);
  
  printf ("Thread 2 acquires Resource 2....\n");
  sleep (1);
  
  printf ("Thread 2. Trying to acquire Resource 1\n");
  
  pthread_mutex_lock (&lock1); 
  printf ("Thread 2... Working with Resource 1\n");
  pthread_mutex_unlock (&lock1);
  
  printf ("Thread 2. All done with Resource 2..\n");
  
  pthread_mutex_unlock (&lock2);
  
  pthread_exit (NULL);
}


int main() {
  pthread_t t1, t2;  
  pthread_mutex_init (&lock1, NULL);
  pthread_mutex_init (&lock2, NULL);
  
  pthread_create (&t1, NULL, p1, NULL);
  pthread_create (&t2, NULL, p2, NULL);

  pthread_join (t1, NULL);
  pthread_join (t2, NULL);
  
  return 0;
  
}

If you compile and run this program you will see that it never finishes. Both threads get blocked waiting for a resource held by the other one, and the main thread is also blocked waiting for those two threads to finish.

Let's run Helgrind:

$ /tmp/valgrind/bin/valgrind --tool=helgrind ./deadlock 2> deadlock.log
Thread 1 acquires Resource 1....
Thread 2 acquires Resource 2....
Thread 1. Trying to acquire Resource 2
Thread 2. Trying to acquire Resource 1

This time we have redirected stderr (where Helgrind dumps its data) to a file for easier inspection. Also note that the program just hangs, so you have to press CTRL+C to finish the execution.

The deadlock.log

So, let's take a look at the errors in the previous program. Helgrind reported 105 errors, but let's hope most of them are the same.

==4663== ----------------------------------------------------------------
==4663==
==4663==  Lock at 0x30A080 was first observed
==4663==    at 0x4C3A339: pthread_mutex_init (hg_intercepts.c:785)
==4663==    by 0x108A42: main (in /mnt/mia/work/projects/concurrency/30cond/deadlock)
==4663==  Address 0x30a080 is 0 bytes inside data symbol "lock2"
==4663==
==4663== Possible data race during write of size 1 at 0x5C561AF by thread #3
==4663== Locks held: 1, at address 0x30A080
==4663==    at 0x4C3E50B: mempcpy (vg_replace_strmem.c:1536)
==4663==    by 0x50EFA63: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1258)
==4663==    by 0x50E4B6E: puts (ioputs.c:40)
==4663==    by 0x1089AD: p2 (in /mnt/mia/work/projects/concurrency/30cond/deadlock)
==4663==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==4663==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==4663==    by 0x518571E: clone (clone.S:95)
==4663==  Address 0x5c561af is 31 bytes inside a block of size 1,024 alloc'd
==4663==    at 0x4C3326B: malloc (vg_replace_malloc.c:307)
==4663==    by 0x50E226B: _IO_file_doallocate (filedoalloc.c:101)
==4663==    by 0x50F2448: _IO_doallocbuf (genops.c:365)
==4663==    by 0x50F1567: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:759)
==4663==    by 0x50EFABC: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1266)
==4663==    by 0x50E4B6E: puts (ioputs.c:40)
==4663==    by 0x108935: p1 (in /mnt/mia/work/projects/concurrency/30cond/deadlock)
==4663==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==4663==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==4663==    by 0x518571E: clone (clone.S:95)
==4663==  Block was alloc'd by thread #2
==4663==

Well, this seems to be related to puts. The Valgrind web site mentions that the standard I/O functions (printf, fread, ...) have some issues. This is what they say:

  1. POSIX requires that implementations of standard I/O (printf, fprintf, fwrite, fread, etc) are thread safe. Unfortunately GNU libc implements this by using internal locking primitives that Helgrind is unable to intercept. Consequently Helgrind generates many false race reports when you use these functions.

Helgrind attempts to hide these errors using the standard Valgrind error-suppression mechanism, so, at least for simple test cases, you don't see any. Nevertheless, some may slip through. Just something to be aware of.

So, it looks like this one was not caught by the default suppression file. Fortunately, we now know how to create our own. This is what --gen-suppressions=all produces for the above error:

{
   puts-p2
   Helgrind:Race
   fun:mempcpy
   fun:_IO_file_xsputn@@GLIBC_2.2.5
   fun:puts
   fun:p2
   fun:mythread_wrapper
   fun:start_thread
   fun:clone
}

Running again with the new suppression file:

$ /tmp/valgrind/bin/valgrind --suppressions=deadlock.sup --tool=helgrind ./deadlock 2> deadlock.log; tail deadlock.log
(...)
==4948== ERROR SUMMARY: 35 errors from 3 contexts (suppressed: 124 from 56)

Not bad... Let's go for the next one:

==5079== ----------------------------------------------------------------
==5079==
==5079==  Lock at 0x30A040 was first observed
==5079==    at 0x4C3A339: pthread_mutex_init (hg_intercepts.c:785)
==5079==    by 0x108A31: main (deadlock.c:50)
==5079==  Address 0x30a040 is 0 bytes inside data symbol "lock1"
==5079==
==5079==  Lock at 0x30A080 was first observed
==5079==    at 0x4C3A339: pthread_mutex_init (hg_intercepts.c:785)
==5079==    by 0x108A42: main (deadlock.c:51)
==5079==  Address 0x30a080 is 0 bytes inside data symbol "lock2"
==5079==
==5079== Possible data race during write of size 1 at 0x5C561B0 by thread #2
==5079== Locks held: 1, at address 0x30A040
==5079==    at 0x4C3E50B: mempcpy (vg_replace_strmem.c:1536)
==5079==    by 0x50EFA63: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1258)
==5079==    by 0x50E4B6E: puts (ioputs.c:40)
==5079==    by 0x10894B: p1 (deadlock.c:14)
==5079==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==5079==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==5079==    by 0x518571E: clone (clone.S:95)
==5079==
==5079== This conflicts with a previous write of size 1 by thread #3
==5079== Locks held: 1, at address 0x30A080
==5079==    at 0x50F1415: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:795)
==5079==    by 0x50E4C41: puts (ioputs.c:41)
==5079==    by 0x1089AD: p2 (deadlock.c:31)
==5079==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==5079==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==5079==    by 0x518571E: clone (clone.S:95)
==5079==  Address 0x5c561b0 is 32 bytes inside a block of size 1,024 alloc'd
==5079==    at 0x4C3326B: malloc (vg_replace_malloc.c:307)
==5079==    by 0x50E226B: _IO_file_doallocate (filedoalloc.c:101)
==5079==    by 0x50F2448: _IO_doallocbuf (genops.c:365)
==5079==    by 0x50F1567: _IO_file_overflow@@GLIBC_2.2.5 (fileops.c:759)
==5079==    by 0x50EFABC: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1266)
==5079==    by 0x50E4B6E: puts (ioputs.c:40)
==5079==    by 0x108935: p1 (deadlock.c:11)
==5079==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==5079==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==5079==    by 0x518571E: clone (clone.S:95)
==5079==  Block was alloc'd by thread #2
==5079==

So, this is the same error but from the point of view of p1. Let's remove this one too:

{
   puts-p1
   Helgrind:Race
   fun:mempcpy
   fun:_IO_file_xsputn@@GLIBC_2.2.5
   fun:puts
   fun:p1
   fun:mythread_wrapper
   fun:start_thread
   fun:clone
}
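Incidentally, the fun: lines in suppression files accept shell-style wildcards (* and ?), so the two entries could probably be collapsed into a single one. A sketch (untested):

{
   puts-p1-p2
   Helgrind:Race
   fun:mempcpy
   fun:_IO_file_xsputn@@GLIBC_2.2.5
   fun:puts
   fun:p?
   fun:mythread_wrapper
   fun:start_thread
   fun:clone
}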

After adding the p1 entry to our suppressions file we get just:

==5171== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 157 from 57)

The actual deadlock

When checking the Helgrind output, everything looks normal until we press CTRL+C:

==5171== Process terminating with default action of signal 2 (SIGINT)
==5171==    at 0x4E4DD2D: __pthread_timedjoin_ex (pthread_join_common.c:89)
==5171==    by 0x4C36103: pthread_join_WRK (hg_intercepts.c:551)
==5171==    by 0x4C39F80: pthread_join (hg_intercepts.c:570)
==5171==    by 0x108A8D: main (deadlock.c:58)

Then two errors are shown that may give us a hint about the issue:

==5171== ----------------------------------------------------------------
==5171==
==5171== Thread #3: Exiting thread still holds 1 lock
==5171==    at 0x4E561FD: __lll_lock_wait (lowlevellock.S:135)
==5171==    by 0x4E4F024: pthread_mutex_lock (pthread_mutex_lock.c:80)
==5171==    by 0x4C3646B: mutex_lock_WRK (hg_intercepts.c:900)
==5171==    by 0x4C3A35C: pthread_mutex_lock (hg_intercepts.c:923)
==5171==    by 0x1089CF: p2 (deadlock.c:36)
==5171==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==5171==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==5171==    by 0x518571E: clone (clone.S:95)
==5171==
(...)
==5171== ----------------------------------------------------------------
==5171==
==5171== Thread #2: Exiting thread still holds 1 lock
==5171==    at 0x4E561FD: __lll_lock_wait (lowlevellock.S:135)
==5171==    by 0x4E4F024: pthread_mutex_lock (pthread_mutex_lock.c:80)
==5171==    by 0x4C3646B: mutex_lock_WRK (hg_intercepts.c:900)
==5171==    by 0x4C3A35C: pthread_mutex_lock (hg_intercepts.c:923)
==5171==    by 0x108957: p1 (deadlock.c:16)
==5171==    by 0x4C39077: mythread_wrapper (hg_intercepts.c:387)
==5171==    by 0x4E4C6DA: start_thread (pthread_create.c:463)
==5171==    by 0x518571E: clone (clone.S:95)
==5171==

Helgrind tells us that, at the time of killing the process, there were still two threads running, each waiting on a lock. The first one was waiting in function p2 at line 36, while the second one was waiting in function p1 at line 16:

p1:16 pthread_mutex_lock (&lock2); 
p2:36 pthread_mutex_lock (&lock1); 

That is exactly the point where both threads were blocked... But Helgrind doesn't say anything about the actual deadlock. It is up to us to analyse the code and figure out that the threads are blocking each other.
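The classic fix, once the circular wait has been identified, is to make every thread acquire the locks in the same global order. A minimal sketch of p2 rewritten that way; note this is our own rework and it slightly changes the program, since thread 2 now holds both resources at once:

void *p2 (void *arg) {
  /* Same order as p1: lock1 first, then lock2. No circular wait. */
  pthread_mutex_lock (&lock1);
  pthread_mutex_lock (&lock2);

  printf ("Thread 2... Working with both resources\n");

  /* Release in reverse acquisition order. */
  pthread_mutex_unlock (&lock2);
  pthread_mutex_unlock (&lock1);

  pthread_exit (NULL);
}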

mutextrace

There is another tool available in the standard repositories: mutextrace (on Debian/Ubuntu, an apt install mutextrace should do). It is supposed to deal specifically with locks and mutexes, rather than with the general case. Again, it is a low-level debugging tool that just gives us information to help us figure out what the problem is. Its output for our deadlock program is:

$ mutextrace ./deadlock
[1] mutex_init(1, FAST)
[1] mutex_init(2, FAST)
[2] started
[2] mutex_lock(1)
Thread 1 acquires Resource 1....
[3] started
[3] mutex_lock(2)
Thread 2 acquires Resource 2....
Thread 1. Trying to acquire Resource 2
[2] mutex_lock(2) <waiting for thread 3> ...
Thread 2. Trying to acquire Resource 1
[3] mutex_lock(1) <waiting for thread 2> ...

In this case the deadlock is a bit more obvious, as the output says that thread 2 is waiting on lock 2 for thread 3, and thread 3 is waiting on lock 1 for thread 2.

Again, the version of mutextrace on Ubuntu 18 seems a bit outdated, so I tried the latest version, which I got from this repo. That version has an additional feature: it shows a warning when a thread has been waiting for a lock for too long. Using it, we get the following output:

$ /tmp/mutextrace/bin/mutextrace ./deadlock
[1] mutex_init(1, FAST)
[1] mutex_init(2, FAST)
[2] started
[2] mutex_lock(1)
Thread 1 acquires Resource 1....
[3] started
[3] mutex_lock(2)
Thread 2 acquires Resource 2....
Thread 1. Trying to acquire Resource 2
Thread 2. Trying to acquire Resource 1
[2] mutex_lock(2) <waiting for thread 3> ...
[3] mutex_lock(1) <waiting for thread 2> ...
[3] ... mutex_lock(1) <still waiting for thread 2> ...
[2] ... mutex_lock(2) <still waiting for thread 3> ...
[2] ... mutex_lock(2) <still waiting for thread 3> ...
[3] ... mutex_lock(1) <still waiting for thread 2> ...
[2] ... mutex_lock(2) <still waiting for thread 3> ...
[3] ... mutex_lock(1) <still waiting for thread 2> ...

The last two messages keep repeating every few seconds, which somehow makes the deadlock problem more visible.

Conclusion

We have taken a look at Helgrind, a tool included in the Valgrind package to help us debug POSIX threads applications, and used it in a couple of examples to get some hands-on experience of what we may expect in a real case. As you can see, these tools are still quite low level and, even for cases as simple as the ones we used as examples, require quite some interpretation from the programmer to figure out what the problem is... in case there is one.

Do you know other debugging tools for multi-threaded applications? Let me know in the comments.


 