Add multithreaded gpu_cache rasterization

Alex Butler requested to merge (removed):multi-rasterize into master

This change adds multithreaded rasterization in environments where more than one CPU thread is available. This can significantly improve worst-case performance of the gpu_cache, e.g. during initialization or resizing.

Gpu cache work

  • Glyph rasterization work is generally uneven: one glyph may require much more work than the next.
  • The pixel data upload function may only be called by the "main" thread.
  • We want to spread out work, and have all cores stay as busy as possible.
  • We want to avoid holding all glyph pixel data in memory before passing to upload.

To handle the above I used crossbeam-deque: n-1 threads are work stealers that rasterize glyphs and send the pixels to the "main" thread. The main thread itself continually rasterizes and uploads its own glyphs, then uploads any work the work stealers have completed.

In this way all threads are working without blocking, and no more than the necessary pixel data is held in memory before being uploaded.
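The scheme described above can be sketched roughly as follows. This is an illustrative, std-only sketch and not the code in this merge request: it swaps crossbeam-deque for a shared `Mutex<VecDeque>` job queue and an `mpsc` channel, and `rasterize`, `rasterize_and_upload` and the `(id, cost)` job shape are hypothetical names made up for the example. The property it aims to show is that `upload` only ever runs on the calling ("main") thread, while that thread also takes rasterization jobs itself.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for rasterizing one glyph: produces `cost` bytes of "pixels".
fn rasterize(glyph_id: usize, cost: usize) -> Vec<u8> {
    vec![(glyph_id % 256) as u8; cost]
}

/// Rasterizes every `(id, cost)` job using `threads` threads in total and calls
/// `upload` for each result on the calling ("main") thread only.
/// Returns the number of uploads performed.
fn rasterize_and_upload<F>(jobs: Vec<(usize, usize)>, threads: usize, mut upload: F) -> usize
where
    F: FnMut(usize, &[u8]),
{
    // Assumption: a locked queue stands in for the crossbeam-deque work-stealing queue.
    let queue = Arc::new(Mutex::new(VecDeque::from(jobs)));
    let (tx, rx) = mpsc::channel::<(usize, Vec<u8>)>();

    // Spawn n-1 workers; each takes jobs until the queue is empty.
    let mut workers = Vec::new();
    for _ in 1..threads.max(1) {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        workers.push(thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement,
            // so rasterization happens without holding the lock.
            let job = queue.lock().unwrap().pop_front();
            match job {
                Some((id, cost)) => {
                    if tx.send((id, rasterize(id, cost))).is_err() {
                        break;
                    }
                }
                None => break,
            }
        }));
    }
    // Drop the main thread's sender so `rx.iter()` ends once all workers finish.
    drop(tx);

    let mut uploaded = 0;
    loop {
        let job = queue.lock().unwrap().pop_front();
        let (id, cost) = match job {
            Some(job) => job,
            None => break,
        };
        // The main thread rasterizes and uploads its own job...
        let pixels = rasterize(id, cost);
        upload(id, &pixels);
        uploaded += 1;
        // ...then uploads whatever the workers have finished so far, without blocking.
        for (id, pixels) in rx.try_iter() {
            upload(id, &pixels);
            uploaded += 1;
        }
    }

    // The queue is empty: wait for the workers' remaining results.
    for (id, pixels) in rx.iter() {
        upload(id, &pixels);
        uploaded += 1;
    }
    for worker in workers {
        worker.join().unwrap();
    }
    uploaded
}
```

Because the main thread drains finished results after each of its own jobs, completed pixel buffers tend to be uploaded and freed soon after they are produced rather than accumulating until the end. With `threads` set to 1 the sketch degrades to plain single-threaded rasterize-then-upload.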

Most of the gpu_cache benchmarks only rasterize during warmup, but the population, thrashing & resizing benchmarks are significantly improved by multithreading.

Benchmark comparison with a 4-core Haswell

name                                    control ns/iter  change ns/iter  diff ns/iter   diff %  speedup 
cache::multi_font_population            8,361,106        2,704,365         -5,656,741  -67.66%   x 3.09 
cache_bad_cases::moving_text_thrashing  21,818,522       7,153,560        -14,664,962  -67.21%   x 3.05 
cache_bad_cases::resizing               15,417,159       4,812,120        -10,605,039  -68.79%   x 3.20
Edited by Alex Butler
