Using valgrind to debug Xen toolstacks
What is Valgrind?
Valgrind is a framework for building dynamic analysis tools. The Valgrind distribution includes several useful tools, among them a memory error detector (memcheck), a cache profiler (cachegrind), a heap profiler (massif) and a thread error detector (helgrind), and the framework can also be used to build new tools.
The Valgrind tool which I find most useful, and the one I have most experience with, is memcheck. This tool can detect all manner of memory management problems, including use after free, use of uninitialized data, memory leaks and double frees. Between them, these checks can save many hours of staring at core dumps and gdb backtraces.
How Does memcheck Work?
At its core Valgrind is a dynamic binary translation engine, which is used to instrument the code at runtime. In the case of memcheck this is used to track properties of the memory in a process, such as whether each bit (yes, bit) of memory has been initialized since it was allocated. It also tracks this information as data is loaded into registers, so it knows whether a given register is currently tainted with uninitialized data. As well as instrumenting code through binary translation, Valgrind supplies replacement versions of the C dynamic memory allocation functions (malloc, free and friends), which memcheck uses to track whether each byte of memory is currently allocated or free. Memory that has been freed is marked as inaccessible, so a later access through a dangling pointer is reported.
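To make bit-precise definedness tracking a little more concrete, here is a toy model in C. It is a sketch under my own simplifying assumptions, not memcheck's implementation: each 32-bit value carries a hypothetical shadow mask (`vbits`) in which a 1 bit means "undefined". The AND and OR rules capture the key subtlety that a defined 0 (for AND) or a defined 1 (for OR) fixes the result bit even when the other operand is garbage. Memcheck applies rules in this spirit to every operation and only complains when an undefined value actually affects behaviour, for example a conditional branch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: a value paired with a shadow mask, 1 bit = undefined. */
typedef struct {
    uint32_t val;
    uint32_t vbits;
} shadowed_u32;

/* A freshly allocated word: contents are garbage, every bit undefined. */
static shadowed_u32 sh_fresh(uint32_t garbage)
{
    shadowed_u32 r = { garbage, 0xFFFFFFFFu };
    return r;
}

/* A constant written by the program: every bit defined. */
static shadowed_u32 sh_const(uint32_t v)
{
    shadowed_u32 r = { v, 0 };
    return r;
}

/* AND: a result bit is undefined only if some input bit is undefined and
 * neither input holds a *defined* 0 there (0 & anything == 0). */
static shadowed_u32 sh_and(shadowed_u32 a, shadowed_u32 b)
{
    shadowed_u32 r;
    r.val = a.val & b.val;
    r.vbits = (a.vbits | b.vbits) & (a.val | a.vbits) & (b.val | b.vbits);
    return r;
}

/* OR: the dual rule; a defined 1 forces the result bit to a defined 1. */
static shadowed_u32 sh_or(shadowed_u32 a, shadowed_u32 b)
{
    shadowed_u32 r;
    r.val = a.val | b.val;
    r.vbits = (a.vbits | b.vbits) & (~a.val | a.vbits) & (~b.val | b.vbits);
    return r;
}

/* A branch on this value would only be safe if no bit is undefined. */
static int sh_is_defined(shadowed_u32 x)
{
    return x.vbits == 0;
}
```

For example, masking an uninitialized word with the constant 0xFF leaves only the low 8 bits undefined, while ANDing it with 0 yields a fully defined zero.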
A large part of memcheck’s functionality is built upon Valgrind’s ability to determine when memory has been initialized. However, although binary translation can instrument the points at which the application itself initializes memory, it cannot instrument the behaviour of system calls, because the kernel code does not run under Valgrind. In other words, Valgrind cannot automatically determine when memory has been initialized by, for example, a read(2) system call. For this reason Valgrind has baked-in knowledge about the behaviour of system calls on the various platforms it supports.
For example, let's consider the read(2) system call. Its prototype is:
ssize_t read(int fd, void *buf, size_t count);
This system call reads up to count bytes from the file descriptor fd into the memory pointed to by buf and returns the number of bytes read. Here is the code within Valgrind which handles this system call (found in coregrind/m_syswrap/syswrap-generic.c):
```c
PRE(sys_read)
{
   *flags |= SfMayBlock;
   PRINT("sys_read ( %ld, %#lx, %llu )", ARG1, ARG2, (ULong)ARG3);
   PRE_REG_READ3(ssize_t, "read",
                 unsigned int, fd, char *, buf, vki_size_t, count);

   if (!ML_(fd_allowed)(ARG1, "read", tid, False))
      SET_STATUS_Failure( VKI_EBADF );
   else
      PRE_MEM_WRITE( "read(buf)", ARG2, ARG3 );
}

POST(sys_read)
{
   vg_assert(SUCCESS);
   POST_MEM_WRITE( ARG2, RES );
}
```
These two functions are called before and after the system call respectively. The main pieces of functionality are the calls to PRE_MEM_WRITE and POST_MEM_WRITE. Here PRE_MEM_WRITE checks that the memory starting at ARG2 (recall that this is the supplied buffer) and extending for ARG3 bytes (the system call's count argument) consists of allocated memory. After the system call is complete, the post function calls POST_MEM_WRITE to tell Valgrind that RES bytes of the buffer (RES being the system call's return value, i.e. the number of bytes actually read) have now been initialized.
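The PRE/POST pattern can be mimicked in a few lines of C. This sketch uses a hypothetical three-state per-byte shadow (no access, undefined, defined) rather than Valgrind's real representation, and all the `toy_*` names are made up for illustration. It wraps a real read(2): the pre-hook checks that all count bytes are allocated, and the post-hook marks only the bytes actually returned as defined, which is why POST(sys_read) uses RES rather than ARG3. As I understand it, memcheck reports a bad buffer and lets the call proceed; the toy simply refuses the call instead.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Toy per-byte shadow states (not Valgrind's real representation). */
enum shadow_state { SH_NOACCESS, SH_UNDEFINED, SH_DEFINED };

#define SHADOW_MAX 256

struct toy_buffer {
    unsigned char data[SHADOW_MAX];
    enum shadow_state shadow[SHADOW_MAX];
};

/* Model malloc(n): the first n bytes become addressable but undefined. */
static void toy_alloc(struct toy_buffer *b, size_t n)
{
    assert(n <= SHADOW_MAX);
    for (size_t i = 0; i < SHADOW_MAX; i++)
        b->shadow[i] = i < n ? SH_UNDEFINED : SH_NOACCESS;
}

/* Analogue of PRE_MEM_WRITE: every byte the kernel may write must be
 * addressable. Returns the number of offending bytes (0 = OK). */
static size_t toy_pre_mem_write(const struct toy_buffer *b, size_t count)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++)
        if (i >= SHADOW_MAX || b->shadow[i] == SH_NOACCESS)
            bad++;
    return bad;
}

/* Analogue of POST_MEM_WRITE: only the bytes actually written are defined. */
static void toy_post_mem_write(struct toy_buffer *b, size_t res)
{
    for (size_t i = 0; i < res && i < SHADOW_MAX; i++)
        b->shadow[i] = SH_DEFINED;
}

/* Wrapped read(2): pre-check, real system call, then post-update.
 * Returns -1 without calling read() if the pre-check fails. */
static ssize_t toy_read(int fd, struct toy_buffer *b, size_t count)
{
    if (toy_pre_mem_write(b, count) != 0)
        return -1;   /* memcheck would report unaddressable byte(s) here */
    ssize_t res = read(fd, b->data, count);
    if (res > 0)
        toy_post_mem_write(b, (size_t)res);
    return res;
}
```

For example, if 5 bytes are waiting in a pipe and toy_read() is called with a count of 64 on a 64-byte allocation, it returns 5 and leaves bytes 5 to 63 marked undefined.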
The above is a relatively simple case, but these two hooks must be supplied for every system call on every platform which Valgrind wishes to support. One of the most problematic system calls is ioctl(2). The ioctl call takes a file descriptor, a request number and a pointer to a per-request argument structure, and implements device-specific control requests. There are literally dozens of devices, many with their own specific ioctls. For this reason a sizable portion of Valgrind's code is dedicated to decoding the behaviour of the myriad potential ioctls and providing pre- and post-call instrumentation for them.
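Many Linux ioctl request numbers are built with the _IOR, _IOW and _IO macros from <linux/ioctl.h>, which pack a transfer direction and an argument size into the number itself. As I understand it, Valgrind can fall back on these bits for ioctls it has no specific wrapper for, but a request built without them gives it nothing to go on. The sketch below decodes those bits; the `TOY_*` request numbers and the `ioctl_hints` structure are hypothetical.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Hypothetical request numbers, built with the standard Linux macros. */
#define TOY_GET_COUNT _IOR('x', 1, int)   /* kernel writes an int back  */
#define TOY_SET_COUNT _IOW('x', 2, int)   /* kernel reads an int        */
#define TOY_RESET     _IO('x', 3)         /* no direction or size hints */

struct ioctl_hints {
    int kernel_reads;    /* kernel reads the argument (cf. PRE_MEM_READ)   */
    int kernel_writes;   /* kernel writes the argument (cf. POST_MEM_WRITE) */
    unsigned size;       /* argument size encoded in the request number */
};

/* Recover what a generic handler can learn from the request number alone.
 * The direction names are from userspace's point of view: _IOC_WRITE means
 * the program writes (so the kernel reads); _IOC_READ means the kernel writes. */
static struct ioctl_hints decode_ioctl(unsigned long request)
{
    struct ioctl_hints h;
    h.kernel_reads  = (_IOC_DIR(request) & _IOC_WRITE) != 0;
    h.kernel_writes = (_IOC_DIR(request) & _IOC_READ) != 0;
    h.size          = _IOC_SIZE(request);
    return h;
}

/* A request is "self-describing" only if it has a direction and a size. */
static int ioctl_has_hints(unsigned long request)
{
    struct ioctl_hints h = decode_ioctl(request);
    return (h.kernel_reads || h.kernel_writes) && h.size != 0;
}
```

Requests like TOY_RESET, which carry no direction or size, are exactly the ones a tool has to be taught about explicitly.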
Interacting with the Hypervisor from Userspace
Under Xen the toolstack is a normal userspace process running in the control domain (typically domain 0). As part of its operation the toolstack needs to communicate with the hypervisor, and to do this it uses hypercalls. However, a userspace process cannot simply make hypercalls on its own; instead it must ask the OS kernel to make the hypercall on its behalf. This prevents arbitrary userspace processes from making hypercalls, since the kernel first checks the privilege of the process making the request.
Under Linux a userspace process makes hypercalls by using the ioctl(2) system call on a special “Privileged Command” (privcmd) device, whose interface is defined in http://xenbits.xen.org/hg/xen-unstable.hg/file/tip/tools/include/xen-sys/Linux/privcmd.h. In this case the toolstack uses the IOCTL_PRIVCMD_HYPERCALL ioctl request, whose argument is a structure containing the hypercall number and the hypercall arguments. Other OSes which can be used as a control domain possess similar interfaces.
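For illustration, here is roughly what issuing a hypercall through privcmd looks like from userspace, using the xen_version hypercall because it needs no memory buffer. The structure layout, the encoding of the request number, the hypercall and sub-op numbers and the /dev/xen/privcmd path are my assumptions based on the Linux and Xen public headers, and all `toy_*` names are hypothetical; real toolstacks go through libxc rather than open-coding this. Note that the request is assumed to be encoded with _IOC_NONE, so the number alone tells Valgrind nothing about the argument.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ioctl.h>

/* Assumed layout of Linux's struct privcmd_hypercall. */
struct toy_privcmd_hypercall {
    uint64_t op;        /* hypercall number    */
    uint64_t arg[5];    /* hypercall arguments */
};

/* Assumed encoding of IOCTL_PRIVCMD_HYPERCALL: _IOC_NONE, so no
 * direction hints for a generic ioctl handler. */
#define TOY_IOCTL_PRIVCMD_HYPERCALL \
    _IOC(_IOC_NONE, 'P', 0, sizeof(struct toy_privcmd_hypercall))

/* Values assumed from Xen's public xen.h and version.h. */
#define TOY_HYPERVISOR_xen_version 17
#define TOY_XENVER_version          0

/* Ask Xen for its version via the privcmd device at `path`.
 * Returns 0 and stores the result on success, -1 with errno set on failure. */
static int toy_xen_version(const char *path, long *version)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct toy_privcmd_hypercall call = {
        .op  = TOY_HYPERVISOR_xen_version,
        .arg = { TOY_XENVER_version, 0 /* no guest buffer needed */ },
    };
    int ret = ioctl(fd, TOY_IOCTL_PRIVCMD_HYPERCALL, &call);
    int saved = errno;
    close(fd);
    if (ret < 0) {
        errno = saved;
        return -1;
    }
    *version = ret;     /* assumed to be (major << 16) | minor */
    return 0;
}
```

On a suitably privileged dom0, toy_xen_version("/dev/xen/privcmd", &v) would be expected to succeed; on a machine without Xen the open fails and errno says why.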
Making memcheck Work With Xen Toolstacks
As might be expected, the majority of the work to allow Valgrind to work on Xen toolstack processes was teaching it about the IOCTL_PRIVCMD_HYPERCALL ioctl. Fortunately it was not necessary to teach Valgrind about every hypercall, only about those used by toolstacks. These are a smallish subset of all hypercalls, including the domctl interfaces and a handful of others.
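One way to picture this per-hypercall knowledge is a table that a post-hook consults to decide how many bytes of a guest buffer to mark as initialized. The sketch below does this for a few xen_version sub-operations. The sub-op numbers and sizes are as I recall them from Xen's public version.h and should be treated as assumptions, and the table-driven layout is my own illustration rather than how Valgrind's Xen wrappers are actually written.

```c
#include <assert.h>
#include <stddef.h>

/* Sub-op numbers and output sizes assumed from Xen's public version.h. */
enum {
    TOY_XENVER_version      = 0,  /* result is the return value, no buffer */
    TOY_XENVER_extraversion = 1,  /* char[16] */
    TOY_XENVER_compile_info = 2,  /* four char arrays: 64 + 16 + 32 + 32 */
    TOY_XENVER_capabilities = 3,  /* char[1024] */
    TOY_XENVER_changeset    = 4   /* char[64] */
};

struct subop_rule {
    unsigned cmd;
    size_t out_bytes;   /* bytes Xen writes into the buffer argument */
};

static const struct subop_rule xen_version_rules[] = {
    { TOY_XENVER_version,      0    },
    { TOY_XENVER_extraversion, 16   },
    { TOY_XENVER_compile_info, 144  },
    { TOY_XENVER_capabilities, 1024 },
    { TOY_XENVER_changeset,    64   },
};

/* Post-hook lookup: how many bytes to mark as initialized after the call.
 * Returns -1 for sub-ops the tool has not been taught about, where a real
 * tool should warn about an unhandled hypercall rather than guess. */
static long xen_version_post_bytes(unsigned cmd)
{
    size_t n = sizeof xen_version_rules / sizeof xen_version_rules[0];
    for (size_t i = 0; i < n; i++)
        if (xen_version_rules[i].cmd == cmd)
            return (long)xen_version_rules[i].out_bytes;
    return -1;
}
```

Keeping the knowledge in data rather than code makes it cheap to add the next missing sub-op when a toolstack run turns one up.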
The initial batch of patches covers all of the hypercalls needed to start new domains, shut them down and migrate them using the XL toolstack. Support for further hypercalls will be added as they are discovered to be missing.
Getting and Using This Functionality
The initial set of patches was accepted into the trunk of Valgrind's Subversion repository as revision 13081 in October 2012 and is expected to be part of the 3.9.0 release.
Until 3.9.0 is released it will be necessary to get this functionality from SVN. Fortunately this is relatively simple and follows the usual autofoo dance, although the usual caveats regarding running unreleased software apply.
```
$ svn co svn://svn.valgrind.org/valgrind/trunk valgrind
$ cd valgrind
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
```
Once Valgrind is built and installed, simply run it, passing the command to be debugged as its arguments:
```
# valgrind xl list
==16186== Memcheck, a memory error detector
==16186== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==16186== Using Valgrind-3.9.0.SVN and LibVEX; rerun with -h for copyright info
==16186== Command: xl list
==16186==
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   512     4     r-----     332.5
==16186==
==16186== HEAP SUMMARY:
==16186==     in use at exit: 0 bytes in 0 blocks
==16186==   total heap usage: 16 allocs, 16 frees, 93,529 bytes allocated
==16186==
==16186== All heap blocks were freed -- no leaks are possible
==16186==
==16186== For counts of detected and suppressed errors, rerun with: -v
==16186== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 39 from 6)
```
Here we see that xl list has no leaks! However if I comment out the call to libxl_dominfo_list_free in tools/libxl/xl_cmdimpl.c:main_list() then instead Valgrind reports:
```
[...]
==16203== HEAP SUMMARY:
==16203==     in use at exit: 90,112 bytes in 1 blocks
==16203==   total heap usage: 16 allocs, 15 frees, 93,529 bytes allocated
==16203==
==16203== LEAK SUMMARY:
==16203==    definitely lost: 0 bytes in 0 blocks
==16203==    indirectly lost: 0 bytes in 0 blocks
==16203==      possibly lost: 90,112 bytes in 1 blocks
==16203==    still reachable: 0 bytes in 0 blocks
==16203==         suppressed: 0 bytes in 0 blocks
==16203== Rerun with --leak-check=full to see details of leaked memory
==16203==
==16203== For counts of detected and suppressed errors, rerun with: -v
==16203== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 39 from 6)
[...]
```
So it has caught the memory leak! Rerunning with the suggested --leak-check=full goes even further and tells me where the leaked memory was allocated:
```
[...]
==16204== 90,112 bytes in 1 blocks are possibly lost in loss record 1 of 1
==16204==    at 0x4024480: calloc (vg_replace_malloc.c:593)
==16204==    by 0x4050CE4: libxl_list_domain (libxl.c:548)
==16204==    by 0x805EEC7: main_list (xl_cmdimpl.c:3931)
==16204==    by 0x804D6DD: main (xl.c:285)
[...]
```
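To try the same experiment without a Xen host, here is a self-contained sketch of the pattern: a list function that allocates with calloc and a matching free function that the caller must remember to call. The `toy_*` names are hypothetical stand-ins for libxl_list_domain(), libxl_dominfo_list_free() and main_list(), not the libxl API. Calling toy_main_list(1) from a small main, building with -g and running under valgrind --leak-check=full should make memcheck report the block allocated in toy_list_domain(); I have not reproduced the exact output here, and which loss category the block is reported under can vary.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a domain info record; not libxl_dominfo. */
struct toy_dominfo {
    unsigned domid;
    unsigned long mem_kb;
};

/* Like libxl_list_domain(): returns an array the caller must free. */
static struct toy_dominfo *toy_list_domain(int *nb)
{
    struct toy_dominfo *info = calloc(1, sizeof(*info));
    if (!info)
        return NULL;
    info[0].domid = 0;             /* pretend only Domain-0 exists */
    info[0].mem_kb = 512 * 1024;
    *nb = 1;
    return info;
}

/* Like libxl_dominfo_list_free(): releases the array. */
static void toy_dominfo_list_free(struct toy_dominfo *list, int nb)
{
    (void)nb;                      /* a richer type might free each entry */
    free(list);
}

/* Like main_list(): pass leak != 0 to skip the free, mimicking the
 * commented-out call, so memcheck has a block to report. */
static int toy_main_list(int leak)
{
    int nb = 0;
    struct toy_dominfo *info = toy_list_domain(&nb);
    if (!info)
        return -1;
    int count = nb;
    if (!leak)
        toy_dominfo_list_free(info, nb);
    return count;
}
```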
To ease debugging of domain creation I find it useful to pass the -F option to xl create, which stops xl from forking and daemonizing. Most xl subcommands do not daemonize and so need no special treatment.
For more information on running Valgrind see the Valgrind Documentation.
Conclusion
Valgrind is a powerful tool in the C programmer's arsenal. It can catch a large number of common and hard-to-debug issues before they cause trouble, which can save many hours of tedious debugging. The ability to use tools such as this on Xen toolstack processes is of enormous benefit and has already found several bugs in the XL toolstack.