
GpuCache: Constant time hash lookups of matching glyph textures

Jeremy Soller requested to merge alexheretic:master into master

Created by: alexheretic

I had a thought after the last optimisation of the glyph search: instead of storing info for each cached glyph and searching through it, we could use a function that produces hashable data such that two matching glyphs produce the same hash. That way we could look up glyphs in constant time.

tldr: You can, and it's way faster.

Implementation

I've implemented this idea by keying the cache on lossy integer scales & offsets instead of the glyphs' exact scale & offset. These integers are obtained by dividing the actual values by the scale_tolerance or position_tolerance as applicable, then rounding to an integer.

So with scale_tolerance = 0.2, a glyph with scale 10.0 matches a glyph with scale 10.09, as the lossy scale-over-tolerance value is 50 for both. The matching bracket is (9.9, 10.1).

This is a little different from before, where adding a glyph with scale 10.33 made it a match for anything within the tolerance, i.e. (10.23, 10.43) for the default 0.1 tolerance. With the lossy approach the bracket already conceptually exists, so the glyph with scale 10.33 matches anything in its bracket, (10.25, 10.35).
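The bucketing above can be sketched as a small standalone function (this is an illustrative sketch of the idea, not the actual rusttype implementation; the `bucket` name is hypothetical):

```rust
/// Lossy bucketing sketch: divide by the tolerance, then round to the
/// nearest integer so that "matching" glyphs produce the same hashable key.
fn bucket(value: f32, tolerance: f32) -> i32 {
    (value / tolerance).round() as i32
}

fn main() {
    let scale_tolerance = 0.2;
    // Both scales fall in the same (9.9, 10.1) bracket: lossy value 50.
    assert_eq!(bucket(10.0, scale_tolerance), 50);
    assert_eq!(bucket(10.09, scale_tolerance), 50);
    // A scale just past the bracket edge lands in the next bucket.
    assert_eq!(bucket(10.11, scale_tolerance), 51);
}
```

In the real cache this integer would be part of the hash key alongside the font and glyph ids, so matching glyphs collide in the map and are found in constant time.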

Impacts

The old approach meant that in the worst case glyphs could be rendered with textures up to 2× the scale/position tolerance away from the ideal. The new approach caps the absolute maximum error at the tolerance itself, but glyphs close to each other may now each require their own texture if they lie in adjacent brackets.

So different usages could see size requirements go up, but those that were previously seeing the worst case may be able to simply relax their tolerances.
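To make the adjacent-bracket effect concrete, here is a hypothetical sketch (the `bucket` helper is illustrative, not rusttype's API) showing two scales within tolerance of each other that still fall in different brackets, and how a relaxed tolerance merges them:

```rust
/// Same lossy bucketing idea as described above: round value/tolerance.
fn bucket(value: f32, tolerance: f32) -> i32 {
    (value / tolerance).round() as i32
}

fn main() {
    // With the default 0.1 tolerance, scales 10.34 and 10.36 differ by
    // only 0.02 but straddle the bracket edge at 10.35, so each would
    // need its own cached texture.
    assert_eq!(bucket(10.34, 0.1), 103);
    assert_eq!(bucket(10.36, 0.1), 104);
    // Relaxing the tolerance widens the brackets and the two scales
    // now share a bucket (and hence a texture).
    assert_eq!(bucket(10.34, 0.2), 52);
    assert_eq!(bucket(10.36, 0.2), 52);
}
```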

Other than the subtle change in what the tolerances mean, the API is mostly unaffected, except that gpu_cache::Cache::set_scale_tolerance & gpu_cache::Cache::set_position_tolerance are now deprecated. These methods invalidate the cache, as they move our hashing goalposts; they function equivalently to recreating the Cache. It looks like no-one is using them anyway (crates.io/github quick search).

Because of this we should probably bump to 0.6, even though all 0.5 code should still compile.

Performance

All this yields a large performance improvement compared with 0.5.2 (using b4ad64e6 to test, which has the latest bench code). The moving text benchmark goes from 4.1ms -> 1.5ms frame latency. General performance is 1.6× to 3× faster. First-run performance looks about the same, though the change may simply have the benchmark scenario generating more glyph textures than before, as discussed above.

name                                                         control ns/iter  change ns/iter  diff ns/iter   diff %  speedup 
gpu_cache::cache_bench_tests::bench_high_position_tolerance  1,813,760        1,113,060           -700,700  -38.63%   x 1.63 
gpu_cache::cache_bench_tests::bench_moving_text              4,086,238        1,456,736         -2,629,502  -64.35%   x 2.81 
gpu_cache::cache_bench_tests::bench_multi_font               3,628,038        1,309,421         -2,318,617  -63.91%   x 2.77 
gpu_cache::cache_bench_tests::bench_multi_font_population    12,649,773       12,160,079          -489,694   -3.87%   x 1.04 
gpu_cache::cache_bench_tests::bench_single_font              3,799,776        1,255,568         -2,544,208  -66.96%   x 3.03

Code Changes

The code is overall reduced. BTreeMap search logic was a big part of the old code and is now gone entirely. I also removed the very short-lived use of rayon, as with the new data structures it didn't yield a significant benefit.
