gdb + valgrind

During FOSDEM 2014, I attended a great talk by Philippe Waroquiers, "Combining the power of Valgrind and GDB". Since the video of the talk is not available, I wanted to share what combining gdb and valgrind can do.

Down the rabbit hole

valgrind can run a gdb server, so that gdb can connect to it and control the program running under valgrind. Two options control that:

--vgdb=yes,
--vgdb-error=0.

The second option tells valgrind how many errors to report before it freezes the program and waits for gdb to connect. Setting it to 0 makes valgrind wait for gdb to connect before it executes the program at all.

So you start your program under valgrind:

$ valgrind --vgdb=yes --vgdb-error=0 ./foo

valgrind will tell you to run

$ gdb ./foo

and then enter at the gdb prompt the following:

(gdb) target remote | vgdb

Let’s show some examples.

Leak search

    #include <stdlib.h>
    #include <string.h>
    
    int main(int argc, char *argv[]) {
        char *lost = strdup("catch me if you can!");
        lost = NULL;
        strdup("foobar");
        return 0;
    }

Passing commands from gdb to valgrind is done with the monitor keyword. Showing the help message is as follows:

(gdb) monitor help

Let’s put breakpoints at lines 6, 7 and 8 and continue the execution of the program.

(gdb) target remote | vgdb
(gdb) break 6
(gdb) break 7
(gdb) break 8
(gdb) continue

Is there a leak?

Breakpoint 1, main (argc=1, argv=0xffeffbe68) at leak_check.c:6
6           lost = NULL;
(gdb) monitor leak_check
==7756== LEAK SUMMARY:
==7756==    definitely lost: 0 (+0) bytes in 0 (+0) blocks
==7756==    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
==7756==      possibly lost: 0 (+0) bytes in 0 (+0) blocks
==7756==    still reachable: 21 (+21) bytes in 1 (+1) blocks
==7756==         suppressed: 0 (+0) bytes in 0 (+0) blocks
==7756== Reachable blocks (those to which a pointer was found) are not shown.
==7756== To see them, add 'reachable any' args to leak_check
==7756==

Not yet :) Let’s continue to line 7, where lost no longer points to "catch me if you can!".

(gdb) continue
Breakpoint 2, main (argc=1, argv=0xffeffbe68) at leak_check.c:7
7           strdup("foobar");
(gdb) monitor leak_check
==7756== LEAK SUMMARY:
==7756==    definitely lost: 0 (+0) bytes in 0 (+0) blocks
==7756==    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
==7756==      possibly lost: 0 (+0) bytes in 0 (+0) blocks
==7756==    still reachable: 21 (+0) bytes in 1 (+0) blocks
==7756==         suppressed: 0 (+0) bytes in 0 (+0) blocks
==7756== Reachable blocks (those to which a pointer was found) are not shown.
==7756== To see them, add 'reachable any' args to leak_check
==7756==

Strange! valgrind says the memory is still reachable! Let’s see why!

(gdb) monitor block_list 1
==7756== 21 (+0) bytes in 1 (+0) blocks are still reachable in loss record 1 of 1
==7756==    at 0x4C2C430: malloc (vg_replace_malloc.c:291)
==7756==    by 0x4EB7EF9: strdup (in /lib64/libc-2.17.so)
==7756==    by 0x4005A8: main (leak_search.c:5)
==7756== 0x51E1040[21]

So who is pointing at this part of the memory?

(gdb) monitor who_points_at 0x51E1040 21
==7756== Searching for pointers pointing in 21 bytes from 0x51e1040
==7756== tid 1 register RAX pointing at 0x51e1040

So there is still a register pointing to 0x51e1040: RAX still holds the return value of the first strdup call.

Continue to line 8 and let’s see what changed since line 7 (monitor l f c is the abbreviated form of the same command):

(gdb) continue
Breakpoint 3, main (argc=1, argv=0xffeffbe68) at leak_check.c:8
8           return 0;
(gdb) monitor leak_check full changed
==7756== 21 (+21) bytes in 1 (+1) blocks are definitely lost in loss record 3 of 3
==7756==    at 0x4C2C430: malloc (vg_replace_malloc.c:291)
==7756==    by 0x4EB7EF9: strdup (in /lib64/libc-2.17.so)
==7756==    by 0x4005A8: main (leak_search.c:5)
==7756==
==7756== LEAK SUMMARY:
==7756==    definitely lost: 21 (+21) bytes in 1 (+1) blocks
==7756==    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
==7756==      possibly lost: 0 (+0) bytes in 0 (+0) blocks
==7756==    still reachable: 7 (-14) bytes in 1 (+0) blocks
==7756==         suppressed: 0 (+0) bytes in 0 (+0) blocks
==7756== Reachable blocks (those to which a pointer was found) are not shown.
==7756== To see them, add 'reachable any' args to leak_check
==7756==

The register no longer contains a pointer to 0x51e1040, so the 21-byte block is now definitely lost. The 7 bytes that are still reachable are "foobar": there is still a pointer (a register in this case) that points to it.
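For comparison, here is a minimal sketch of the same two allocations written so that nothing is lost. The function name dup_and_free is made up for illustration, and aborting on allocation failure with assert is a simplification for the sketch, not something the original program does.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same allocations as the example above, but every pointer is kept
   until its block has been freed. Returns the number of bytes that
   were allocated and released (string lengths plus their NULs). */
size_t dup_and_free(void) {
    char *kept  = strdup("catch me if you can!");
    char *other = strdup("foobar");
    assert(kept != NULL && other != NULL); /* sketch: abort if out of memory */

    size_t bytes = (strlen(kept) + 1) + (strlen(other) + 1);

    free(other);
    free(kept);
    return bytes;
}
```

The 21 and 7 bytes seen in the leak summaries above are exactly these two strings including their terminating NUL, so the function should return 28.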

Which parts of memory are (un)initialized?

    #include <stdlib.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    
    struct s_a {
        char c;
        int  i;
    };

    int main(int argc, char *argv[]) {
        struct s_a *a = malloc(sizeof(struct s_a));
        int fd = open("/dev/null", O_APPEND|O_WRONLY);
        a->c = 'c';
        a->i = 42;
        write(fd, a, sizeof(*a));
        close(fd);
        return 0;
    }

Running this code under valgrind reports the following:

==8592== Syscall param write(buf) points to uninitialised byte(s)
==8592==    at 0x4F103B0: __write_nocancel (in /lib64/libc-2.17.so)
==8592==    by 0x4006CB: main (init.c:18)
==8592==  Address 0x51e1041 is 1 bytes inside a block of size 8 alloc'd
==8592==    at 0x4C2C430: malloc (vg_replace_malloc.c:291)
==8592==    by 0x400688: main (init.c:14)

We can inspect the validity bits of the 8 bytes at 0x51e1040 with get_vbits:

(gdb) monitor get_vbits 0x51e1040 8
00ffffff 00000000

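The output is in hexadecimal, two digits per byte. A 0 bit means that the bit is valid, a 1 bit that it’s invalid. Here, the first byte (c) and the last four bytes (i) are valid, while the three bytes of padding between c and i are indeed not initialized.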
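To see where those three undefined bytes come from, the layout of struct s_a can be computed with offsetof. The sketch below builds the validity bits one would expect after assigning c and i. The helper names are hypothetical, and the space every 4 bytes only mimics the grouping in the output above. It assumes nothing about valgrind itself, only about how the compiler lays out the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct s_a {
    char c;
    int  i;
};

/* Returns 1 if byte `off` of struct s_a belongs to a named member,
   0 if it is padding inserted by the compiler. */
static int is_member_byte(size_t off) {
    size_t c_start = offsetof(struct s_a, c);
    size_t i_start = offsetof(struct s_a, i);
    return (off >= c_start && off < c_start + sizeof(char)) ||
           (off >= i_start && off < i_start + sizeof(int));
}

/* Writes "00" for every member byte and "ff" for every padding byte,
   with a space every 4 bytes. Returns 0 on success, -1 if `out` is
   too small to hold the result and its terminating NUL. */
int expected_vbits(char *out, size_t out_size) {
    assert(out != NULL);
    size_t pos = 0;
    for (size_t off = 0; off < sizeof(struct s_a); off++) {
        if (off > 0 && off % 4 == 0) {
            if (pos + 1 >= out_size) return -1;
            out[pos++] = ' ';
        }
        if (pos + 2 >= out_size) return -1;
        memcpy(out + pos, is_member_byte(off) ? "00" : "ff", 2);
        pos += 2;
    }
    out[pos] = '\0';
    return 0;
}
```

On a typical ABI where int is 4 bytes wide and 4-byte aligned, calling expected_vbits with a 32-byte buffer fills it with 00ffffff 00000000, the same pattern get_vbits reported. Other platforms may lay the struct out differently.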

`watch` on a memory range

Another great feature of using valgrind with gdb is being able to have lots of watchpoints, or to watch a large memory range.

Let’s consider this piece of code:

    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int *array = calloc(4096, sizeof(int));
        array[rand() % 4096] = 42;
        return 0;
    }

Let’s break on line 5. We would like to know when array is modified. With plain gdb, the watch command is of little help here: the number of hardware watchpoints is very limited (usually 4), so they cannot cover the whole array. valgrind can handle that just fine:

Breakpoint 1, main (argc=1, argv=0xffeffbe68) at watch.c:5
5           array[rand() % 4096] = 42;
(gdb) p array
$1 = (int *) 0x51e1040
(gdb) watch (int[4096]) *0x51e1040
Hardware watchpoint 2: (int[4096]) *0x51e1040
(gdb) continue
Continuing.
Hardware watchpoint 2: (int[4096]) *0x51e1040

Old value = {0 <repeats 4096 times>}
New value = {0 <repeats 1383 times>, 42, 0 <repeats 2712 times>}
(gdb)
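A bit of arithmetic shows how far out of reach this range is for real hardware watchpoints. The sketch below assumes x86-64, where each of the 4 debug registers can watch at most 8 bytes, and a 4-byte int. Both figures are assumptions about the platform, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of watch registers needed to cover `bytes` bytes when each
   register covers at most `bytes_per_reg` bytes (rounded up). */
size_t watch_regs_needed(size_t bytes, size_t bytes_per_reg) {
    assert(bytes_per_reg > 0);
    return (bytes + bytes_per_reg - 1) / bytes_per_reg;
}
```

With 4096 ints of 4 bytes each, the array is 16384 bytes, so watch_regs_needed(16384, 8) gives 2048 registers where the hardware offers 4.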

Conclusion

If you write C code, you must look into valgrind and gdb. They are really awesome tools for C developers.