
First performance pass!

Jeremy Soller requested to merge perf into master

Created by: ticki

This PR contains a large set of patches introducing new features that drastically improve performance. The biggest change is fully thread-local (TLS) allocators. This approach is very different from the one taken by jemalloc: it is faster, but more memory-hungry.


  1. A safe implementation of thread-local storage and thread destructors. The symbols are provided by ralloc_shim and are then wrapped in a safe interface in the library.
  2. Introduce the Global-Local allocator model, where a global allocator acts as the provider for a thread-local one, efficiently "emulating" the previous model. This means that locking the mutex is rarely necessary: only when the local allocator runs out of memory does it request more from the global one.
  3. Tweak BRK (fix #29 (closed)) to take less memory.
  4. Full-blown metacircular allocations! The block pool vector is now allocated by the allocator itself, using a scheme that ensures unbounded recursion never takes place. This drastically reduces the number of syscalls made.
  5. Multiple changes were made to logging. Most importantly, the logging symbols are now provided by ralloc_shim and can thus easily be replaced to redirect output elsewhere.
  6. Every allocator is assigned an ID making it possible to distinguish them in the logs.
  7. Instead of implementing methods directly on the bookkeeper, we add a trait, Allocator, which provides the methods we need.
  8. Bump to 1.0.0!

and many small things...
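
To illustrate the Global-Local model from item 2, here is a minimal toy sketch. It is not ralloc's actual code: the names `Block`, `Global`, `Local`, `GLOBAL_LOCKS`, `CHUNK` and `local_alloc` are hypothetical, the `Allocator` trait here only borrows the name from item 7, and "memory" is just offsets in an imaginary address space. The point is only to show why the mutex is rarely locked: the thread-local allocator takes the lock only when it needs a refill.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Toy block: an offset and a size in an imaginary address space.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Block {
    pub start: usize,
    pub size: usize,
}

/// Hypothetical trait shared by both allocators (assumed shape, not ralloc's).
pub trait Allocator {
    fn alloc(&mut self, size: usize) -> Block;
}

/// Global allocator: a bump pointer standing in for BRK.
pub struct Global {
    next: usize,
}

impl Allocator for Global {
    fn alloc(&mut self, size: usize) -> Block {
        let block = Block { start: self.next, size };
        self.next += size;
        block
    }
}

/// The single global allocator, protected by a mutex.
pub static GLOBAL: Mutex<Global> = Mutex::new(Global { next: 0 });
/// Counts how often any thread had to lock the global allocator.
pub static GLOBAL_LOCKS: AtomicUsize = AtomicUsize::new(0);

/// Size of the chunk a local allocator requests on refill (arbitrary choice).
pub const CHUNK: usize = 4096;

/// Thread-local allocator: serves requests from its own chunk, lock-free.
pub struct Local {
    free: Block,
}

impl Allocator for Local {
    fn alloc(&mut self, size: usize) -> Block {
        if self.free.size < size {
            // Out of local memory: lock the global allocator and refill.
            // (This toy simply discards whatever was left of the old chunk.)
            GLOBAL_LOCKS.fetch_add(1, Ordering::SeqCst);
            self.free = GLOBAL.lock().unwrap().alloc(size.max(CHUNK));
        }
        // Carve the request off the front of the local chunk.
        let block = Block { start: self.free.start, size };
        self.free.start += size;
        self.free.size -= size;
        block
    }
}

thread_local! {
    /// Each thread starts with an empty local allocator.
    static LOCAL: RefCell<Local> =
        RefCell::new(Local { free: Block { start: 0, size: 0 } });
}

/// Allocate from the calling thread's local allocator.
pub fn local_alloc(size: usize) -> Block {
    LOCAL.with(|local| local.borrow_mut().alloc(size))
}
```

In this sketch, a fresh thread making one hundred 16-byte allocations locks the global mutex once, for the initial refill, and serves the rest from its own chunk. The trade-off mirrors the one described above: each thread holds on to memory it may never use.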


Thanks a lot to @NilSet, who helped debug this patch.


Pinging @NilSet @jackpot51 @stratact
