Skip to content

Using dtrace to track down memory leaks

June 30, 2009

I’ve been working with a customer to try and find a memory “leak” in their application. Many things have been tried, libumem, and the mdb ::findleaks command all with no success.

So I was, as I am sure others before me have, pondering if you could use dtrace to do this. Well I think you can. I have a script that puts probes into malloc et al and counts how often they are called by this thread and when they are freed often free is called.

Then in the entry probe of the target application note away how many calls there have been to the allocators and how many to free and with a bit of care realloc. Then in the return probe compare the number of calls to allocate and free with the saved values and aggregate the results. The principle is that you find the routines that are resulting in allocations that they don’t clear up. This should give you a list of functions that are possible leakers which you can then investigate1.

Using the same technique I for getting dtrace to “follow fork” that I described here I ran this up on diskomizer, a program that I understand well and I’m reasonably sure does not have systemic memory leaks. The dtrace script reports three sets of results.

  1. A count of how many times each routine and it’s descendents have called a memory allocator.

  2. A count of how many times each routine and it’s descendents have called free or realloc with a non NULL pointer as the first argument.

  3. The difference between the two numbers above.

Then with a little bit of nawk to remove all the functions for which the counts are zero gives:

# /usr/sbin/dtrace -Z -wD TARGET_OBJ=diskomizer2 -o /tmp/out-us \ 	-s /tmp/followfork.d \ 	-Cs /tmp/allocated.d -c \          "/opt/SUNWstc-diskomizer/bin/sparcv9/diskomizer -f /devs -f background \           -o background=0 -o SECONDS_TO_RUN=1800" dtrace: failed to compile script /tmp/allocated.d: line 20: failed to create entry probe for ‘realloc’: No such process dtrace: buffer size lowered to 25m dtrace: buffer size lowered to 25m dtrace: buffer size lowered to 25m dtrace: buffer size lowered to 25m   # nawk ‘$1 != 0 { print  $0 }’ < /tmp/out.3081 allocations            1 diskomizer`do_dev_control            1 diskomizer`set_dev_state            1 diskomizer`set_state            3 diskomizer`report_exit_reason            6 diskomizer`alloc_time_str            6 diskomizer`alloc_time_str_fmt            6 diskomizer`update_aio_read_stats            7 diskomizer`cancel_all_io            9 diskomizer`update_aio_write_stats           13 diskomizer`cleanup           15 diskomizer`update_aio_time_stats           15 diskomizer`update_time_stats           80 diskomizer`my_calloc          240 diskomizer`init_read          318 diskomizer`do_restart_stopped_devices          318 diskomizer`start_io          449 diskomizer`handle_write          606 diskomizer`do_new_write         2125 diskomizer`handle_read_then_write         2561 diskomizer`init_buf         2561 diskomizer`set_io_len        58491 diskomizer`handle_read        66255 diskomizer`handle_write_then_read       124888 diskomizer`init_read_buf       124897 diskomizer`do_new_read       127460 diskomizer`expect_signal freecount            1 diskomizer`expect_signal            3 diskomizer`report_exit_reason            4 diskomizer`close_and_free_paths            6 diskomizer`update_aio_read_stats            9 diskomizer`update_aio_write_stats           11 diskomizer`cancel_all_io           15 diskomizer`update_aio_time_stats           15 diskomizer`update_time_stats           17 diskomizer`cleanup          160 diskomizer`init_read          318 diskomizer`do_restart_stopped_devices          318 diskomizer`start_io          442 diskomizer`handle_write          599 diskomizer`do_new_write         2125 diskomizer`handle_read_then_write         2560 diskomizer`init_buf         2560 diskomizer`set_io_len        58491 diskomizer`handle_read        66246 diskomizer`handle_write_then_read       124888 diskomizer`do_new_read       124888 diskomizer`init_read_buf       127448 diskomizer`cancel_expected_signal mismatch_count      -127448 diskomizer`cancel_expected_signal           -4 diskomizer`cancel_all_io           -4 diskomizer`cleanup           -4 diskomizer`close_and_free_paths            1 diskomizer`do_dev_control            1 diskomizer`init_buf            1 diskomizer`set_dev_state            1 diskomizer`set_io_len            1 diskomizer`set_state            6 diskomizer`alloc_time_str            6 diskomizer`alloc_time_str_fmt            7 diskomizer`do_new_write            7 diskomizer`handle_write            9 diskomizer`do_new_read            9 diskomizer`handle_write_then_read           80 diskomizer`init_read           80 diskomizer`my_calloc       127459 diskomizer`expect_signal  # 

From the above you can see that there are two functions that create and free the majority of the allocations and the allocations almost match each other, which is expected as they are effectively constructor and destructor for each other. The small mismatch is not unexpected in this context.

However it is the vast number of functions that are not listed at all as they and their children make no calls to the memory allocator or have exactly matching allocation and free that are important here. Those are the functions that we have just ruled out.

From here it is easy now to drill down on the functions that are interesting you, ie the ones where there are unbalanced allocations.

I’ve uploaded the files allocated.d and followfork.d so you can see the details. If you find it useful then let me know.

1Unfortunately the list is longer than you want as on SPARC it includes any functions that don’t have their own stack frame due to the way dtrace calculates ustackdepth, which the script makes use of.

2The script only probes particular objects, in this case the main diskomizer binary, but you can limit it to a particular library or even a particular set of entry points based on name if you edit the script.


From → Solaris

  1. Andreas Jantos permalink

    That’s very interesting. Memory leaks are the modern bane of the C++ programmer. I always cringe when I get asked to give a time estimate for finding *the* leak in an application (yeah, like there is just one).
    I will give this a try the next time.

  2. Arghya Kusum Das permalink

    I am working a customer issue of memory leak. May I get the scripts allocated.d and followfork.d. Files are seemed to be removed from the website. I will really appreciate if you can send me those scripts.
    Thanks and Regards,
    Arghya Kusum Das

  3. Hi Chris,
    Such a simple idea and elegant solution to give insight into a very complex problem. Can I request access to the two D scripts for testing too?
    Many Thanks

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: