Measuring C++ Allocator Performance

What follows compares the performance of size-based pools with that of the standard STL allocator across a variety of value type sizes. But as we can’t really know how clever that allocator is itself, we’ll also compare the STL allocator to a thin wrapper around C-style malloc/free. If both perform similarly well, then the STL allocator is fairly likely to be little more than a similarly thin wrapper around the C memory management APIs.
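To make "thin wrapper" concrete, here is a minimal sketch of an allocator that forwards straight to malloc/free. The names malloc_allocator and sum_with_malloc_allocator are hypothetical and not taken from the tested code, and the sketch relies on the C++11 minimal allocator interface, which is newer than the compiler used for the measurements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>
#include <vector>

// Hypothetical name: a sketch of a thin malloc/free wrapper,
// not the heap_pool that was actually measured.
template <typename T>
struct malloc_allocator {
  typedef T value_type;

  malloc_allocator() {}
  template <typename U>
  malloc_allocator(const malloc_allocator<U>&) {}

  T* allocate(std::size_t n) {
    // Guard against overflow in n * sizeof(T).
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
      throw std::bad_alloc();
    }
    std::size_t bytes = n * sizeof(T);
    // malloc(0) may return a null pointer, so always request at least one byte.
    void* p = std::malloc(bytes ? bytes : 1);
    if (!p) throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) { std::free(p); }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const malloc_allocator<T>&, const malloc_allocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const malloc_allocator<T>&, const malloc_allocator<U>&) {
  return false;
}

// Example use: fill a vector backed by malloc/free and return the sum.
inline int sum_with_malloc_allocator(int count) {
  assert(count >= 0);
  std::vector<int, malloc_allocator<int> > values;
  for (int i = 0; i < count; ++i) values.push_back(i);
  int sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) sum += values[i];
  return sum;
}
```

Any container that accepts an allocator template argument can be pointed at such a wrapper in the same way, which is what makes a like-for-like comparison with the default allocator possible.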

The stopwatch used to measure the performance [1] measures both the process’ system and user time. The sum of the two is a decent approximation of how long the process would take if it had exclusive use of the test machine’s CPU [2]. The system time on its own should give a rough indication of how many system calls were made due to memory allocation [3].
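For readers who want to reproduce this kind of measurement, here is a rough sketch of reading user and system time through the POSIX getrusage call. This is not the stopwatch actually used for the tests; the names cpu_times, to_ms, process_cpu_times and elapsed are made up for illustration.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/time.h>

// Hypothetical helper types; not the stopwatch used for the measurements.
struct cpu_times {
  double user_ms;    // time spent executing in user mode
  double system_ms;  // time spent in the kernel on behalf of the process
  double total_ms() const { return user_ms + system_ms; }
};

// Convert a timeval (seconds + microseconds) to milliseconds.
inline double to_ms(const timeval& tv) {
  return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Snapshot of the CPU time consumed so far by the calling process.
inline cpu_times process_cpu_times() {
  struct rusage usage;
  int rc = getrusage(RUSAGE_SELF, &usage);
  assert(rc == 0);
  (void)rc;  // silence the unused-variable warning when NDEBUG is defined
  cpu_times t = { to_ms(usage.ru_utime), to_ms(usage.ru_stime) };
  return t;
}

// Difference between two snapshots, i.e. the cost of the code in between.
inline cpu_times elapsed(const cpu_times& start, const cpu_times& stop) {
  cpu_times d = { stop.user_ms - start.user_ms,
                  stop.system_ms - start.system_ms };
  return d;
}
```

Taking one snapshot before and one after the code under test, and passing both to elapsed, yields the user, system and combined figures of the kind reported in the tables.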

The machine used for testing isn’t the latest and greatest by today’s standards: an Intel Core 2 Duo processor running at 2.33 GHz, with 4 MiB of L2 cache and 2 GiB of main memory. The machine runs Mac OS X 10.5.6. At no time during the test was swap space used.

The compiler and flags used to produce the code should also be of interest: GCC 4.0.1 (Apple Inc. build 5493), run with -O3, -finline and -fstrict-aliasing enabled.

Tests were run with value sizes ranging from 4 bytes to 128 bytes. As the size-based pool approach only pre-allocates pools for objects up to 32 bytes in size, the results for larger sizes are of special interest, as they should be pretty much identical amongst all allocators tested.
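To illustrate why sizes above the 32-byte limit should behave like plain heap allocation, here is a heavily simplified sketch of a size-based pool. It is not the implementation that was measured: the class name size_pool_sketch, the 8-byte size classes and the chunk size of 256 blocks are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical sketch of a size-based pool; not the code that was measured.
class size_pool_sketch {
 public:
  static const std::size_t max_pooled = 32;        // largest pooled size
  static const std::size_t granularity = 8;        // classes: 8, 16, 24, 32
  static const std::size_t blocks_per_chunk = 256; // blocks added per refill

  size_pool_sketch() {
    for (std::size_t i = 0; i < num_classes; ++i) free_[i] = 0;
  }
  ~size_pool_sketch() {
    for (std::size_t i = 0; i < chunks_.size(); ++i) std::free(chunks_[i]);
  }

  bool is_pooled(std::size_t size) const {
    return size != 0 && size <= max_pooled;
  }

  void* allocate(std::size_t size) {
    if (!is_pooled(size)) {
      // Anything larger than max_pooled falls through to the heap.
      void* p = std::malloc(size ? size : 1);
      if (!p) throw std::bad_alloc();
      return p;
    }
    std::size_t c = class_index(size);
    if (!free_[c]) refill(c);
    node* n = free_[c];  // pop the head of the free list
    free_[c] = n->next;
    return n;
  }

  void deallocate(void* p, std::size_t size) {
    if (!is_pooled(size)) { std::free(p); return; }
    std::size_t c = class_index(size);
    node* n = static_cast<node*>(p);  // push the block back onto its list
    n->next = free_[c];
    free_[c] = n;
  }

 private:
  struct node { node* next; };
  static const std::size_t num_classes = max_pooled / granularity;

  // Sizes are rounded up to the next multiple of granularity,
  // so a 4-byte request is served from the 8-byte class.
  static std::size_t class_index(std::size_t size) {
    return (size - 1) / granularity;
  }

  // Carve one malloc'ed chunk into equally sized blocks.
  void refill(std::size_t c) {
    std::size_t block = (c + 1) * granularity;
    char* chunk = static_cast<char*>(std::malloc(block * blocks_per_chunk));
    if (!chunk) throw std::bad_alloc();
    chunks_.push_back(chunk);
    for (std::size_t i = 0; i < blocks_per_chunk; ++i) {
      node* n = reinterpret_cast<node*>(chunk + i * block);
      n->next = free_[c];
      free_[c] = n;
    }
  }

  node* free_[num_classes];
  std::vector<void*> chunks_;

  size_pool_sketch(const size_pool_sketch&);             // non-copyable
  size_pool_sketch& operator=(const size_pool_sketch&);  // non-copyable
};
```

In this sketch a request for 64 or 128 bytes never touches a free list and goes straight to malloc/free, so for those sizes it does essentially the same work as a plain heap wrapper.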

For each combination of value sizes and allocators, twenty test runs were made, and averages and medians compared.
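Reducing the twenty samples per combination to an average and a median can be sketched as follows. The function names are illustrative only. Looking at both is useful because the median is far less sensitive to a single unusually slow run than the average is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the samples (for example, run times in milliseconds).
inline double mean(const std::vector<double>& samples) {
  assert(!samples.empty());
  return std::accumulate(samples.begin(), samples.end(), 0.0) /
         static_cast<double>(samples.size());
}

// Median of the samples. The vector is taken by value so that sorting
// does not disturb the caller's data.
inline double median(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  std::size_t mid = samples.size() / 2;
  if (samples.size() % 2 == 1) return samples[mid];
  // Even number of samples: average the two middle values.
  return (samples[mid - 1] + samples[mid]) / 2.0;
}
```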

You can download the raw performance measurement data, but here’s a summary of the results:

Averages of combined system + user time, in ms, by value size in bytes

        4        8         16        32        64         128
std     22.92    114.38    281.23    543.11    1075.63    2460.16
heap    22.85    112.52    281.42    543.22    1072.00    2442.20
size    22.93    113.28    278.99    542.26    1067.54    2443.31

Averages of system time, in ms, by value size in bytes

        4       8       16      32      64       128
std     0.41    1.53    2.78    4.97    10.08    23.94
heap    0.35    1.14    2.92    5.11    8.90     19.22
size    0.34    1.20    2.18    4.23    8.23     19.65

At first glance, the numbers for all allocators are surprisingly similar. The only thing that jumps out is that the standard STL allocator appears to spend more system time than the other approaches at most sizes, most clearly at 64 and 128 bytes. Given that the heap_pool simply calls malloc/free, I have to assume that the C library’s malloc/free is better optimized to reuse freed memory than GCC’s standard C++ library.

Despite the minuscule differences in performance, though, you can graph very definite trends if you assume either the heap_pool or the standard STL allocator as the baseline. The standard STL allocator should make for a better baseline if you are using STL containers a lot, while heap_pool makes for a better baseline if you want to make a comparison with C-style allocations.
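One plausible way to express a measurement relative to a baseline is as a percentage difference. This is an assumption about the normalization, not a description of how the graphs were actually produced, and the function name percent_over_baseline is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Percentage by which `value` differs from `baseline`.
// Positive results mean `value` took longer than the baseline.
inline double percent_over_baseline(double value, double baseline) {
  assert(baseline > 0.0);
  return (value - baseline) / baseline * 100.0;
}

// Element-wise version for a whole row of measurements,
// one entry per value size.
inline std::vector<double> percent_over_baseline(
    const std::vector<double>& values, const std::vector<double>& baseline) {
  assert(values.size() == baseline.size());
  std::vector<double> result;
  result.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    result.push_back(percent_over_baseline(values[i], baseline[i]));
  }
  return result;
}
```

Fed with the system time averages for 128-byte values, for instance, percent_over_baseline(23.94, 19.22) puts the standard STL allocator a little under 25% above heap_pool, a gap that is hard to see in the combined times.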

  1. Part of the fhtagn_util library, incidentally.
  2. Wall time will usually be a bit higher.
  3. It is assumed that libc abstracts this somewhat; in fact, I seem to remember that glibc adds its own pool allocation on top of the system APIs, but I may well be wrong about that.


Copyright © 2007 - 2017 by the respective authors.
Other content on this website is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 Germany License.