So, last year at this conference I proposed looking at new hash functions for hash-based maps, and I did some benchmarks. The original intent was to look at the stack trace map, and I will address that last; I also looked into how beneficial it is for a hashmap to use a different hash function. The short summary is that jhash2 works pretty well for small keys, and it also does well in terms of collisions, but for bigger keys there is a way to make hashmaps two to three times faster. Let's look at some plots.

This one compares the jhash, jhash2 and xxh3 hash functions. We can see that xxh3 wins for most key sizes, but it is not evident what happens for very small keys, and it happens that Cilium, for example, is primarily interested in these small keys. So here is a bigger picture, where the right edge is a key size of 1K. Another hash function, which actually beats everything else for large keys, is SpookyHash, the next generation of jhash; but it only starts to win at key sizes around 10K, so it doesn't make sense to use it. The blue line is xxh3: starting at a key size of 240 it begins to underperform compared to the previous generation, XXH64 and XXH32, because at that key size the original implementation switches from the scalar code to the vector code path, and the scalar build of that path is not very good. Another hash function that was mentioned last year is SipHash, and it turns out that in terms of speed it underperforms everything else.

Q: Sorry, a question about the previous slide. Are any of those implementations vector-based, like AVX?

A: No, no, it's all scalar; for BPF it doesn't make any sense to use vectors. Another comment here is that all the functions except xxh3 were developed with -O2 optimization in mind; xxh3 performs better if it is compiled with -O3. The funny thing, however, is that if I compile both xxh3 and the hashmap with -O3, it actually behaves worse.

Q: Can we compile a single .c file in the kernel with -O3 and everything else with -O2?

A: I did compile just hashtab.o and xxh3 with -O3, and it performed worse than with -O2. So if we have xxh3 in the kernel in a separate .c file, then we can compile that one file with -O3, because then nothing else is affected.

So SipHash gets no speed benefit compared to jhash, for example. And here is an example of how using xxh3 in a hashmap affects the map: this is a map of 100K entries at 100% full, which is actually the worst case, with the key size on the x-axis. We see that for smaller keys jhash actually outperforms xxh3 here. It also turns out that the beginning of this plot looks different on different architectures: for some key sizes we win on AMD, for others on Intel, but on all architectures the curves diverge at a key size of about 28, so 32 is a good threshold for switching.
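As a rough illustration of that threshold, a hashmap could pick its hash function by key size. This is a minimal sketch, not actual kernel code: the jhash/jhash2 part mirrors what the kernel's hashtab does today, while xxh3_64bits_withseed() is a hypothetical kernel-style wrapper around upstream xxHash's XXH3_64bits_withSeed().

```c
#include <linux/jhash.h>

/* Curves diverge at ~28 bytes on all tested arches; use 32 as the cutoff */
#define HTAB_XXH3_THRESHOLD 32

static u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
{
	if (key_len <= HTAB_XXH3_THRESHOLD) {
		/* jhash2 consumes u32 words, so the size must be 4-aligned */
		if (likely(key_len % 4 == 0))
			return jhash2(key, key_len / 4, hashrnd);
		return jhash(key, key_len, hashrnd);
	}
	/* xxh3 wins for larger keys (hypothetical in-kernel wrapper) */
	return (u32)xxh3_64bits_withseed(key, key_len, hashrnd);
}
```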
And this is another slice: a hashmap of 100,000 keys with a key size of 64, going from empty on the left to full on the right. Here you can see that for a 20-40% full map, the xxh3-based hashmap works about 45% faster than the jhash-based map, so for bigger keys it definitely makes sense to use a new hash function for hashmaps. Those numbers are for lookups, but updates are literally the same, just with an offset.

One practical question is how to actually utilize this, and used this way it actually works: this hash function makes all the hashmaps faster than the current implementation, and it stays the same for small keys, because I recently replaced jhash there with jhash2. One interesting question is whether it is possible to configure the hash function when you create a hashmap. I did measurements for this: I created ten different hash functions and substituted them into the map's gen_lookup op, while the slow function was used on the user-space side. In that case xxh3 outperforms all the existing hash functions if we use the pieces of xxh3 specialized per particular key size, because the generic xxh3 is just a switch: if the key size is less than 7, if it is less than 15, if it is less than 32, and so on. So, are there any ideas on whether it is possible to create dynamic lookup functions? The implementation I did was just a hack: I used a macro to create a hash function per key size and then used those in the hashmap, something like the sketch below. But you would have to do this for every map type where you want it, and if you want to change anything you need to change 50 functions or so, which is not the best thing.
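A sketch of that macro hack, under the assumption that an in-kernel xxh3 exists: stamping out one lookup per key size gives the compiler a constant length, so it can collapse xxh3's internal size switch at build time. DEFINE_HTAB_LOOKUP, __htab_lookup() and xxh3_64bits_withseed() are hypothetical names for illustration.

```c
#define DEFINE_HTAB_LOOKUP(size)					\
static void *htab_lookup_##size(struct bpf_htab *htab, const void *key)\
{									\
	/* constant size lets the compiler pick the right xxh3 path */	\
	u32 hash = (u32)xxh3_64bits_withseed(key, size, htab->hashrnd);	\
	return __htab_lookup(htab, key, size, hash);			\
}

DEFINE_HTAB_LOOKUP(8)
DEFINE_HTAB_LOOKUP(16)
DEFINE_HTAB_LOOKUP(32)
/* ...one per supported key size, for every map type that wants this,
 * which is exactly why it does not scale. */
```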
Q: Do we have a jhash for 64 bits or something like that?

A: jhash is 32-bit based, and we never actually utilize the full 32 bits anyway; but all the new hash functions are 64-bit based.

As I said, I believe the original intent was to see whether the new generation of hash functions behaves better for the stack trace map, and I actually didn't see any difference. It doesn't make the stack trace map faster, because most of the time for a stack trace is spent in get_perf_callchain; that is about 95% of the time. So using a new hash function makes the stack trace map about 5% faster; it's better than before, but it's not a dramatic change.

In terms of collisions, I did some tests. I created a hacky kernel which creates stack trace maps with different hashes; I run bpftrace for a while, then load the system with some artificial load and look at how many collisions each map generated. When I run it until the map is, I don't remember exactly, about 40% full, there shouldn't be too many collisions, and indeed all the hash functions show less than 1% collisions, all really pretty close; the funny fact is that jhash always won compared to SipHash and xxh3. If I run the same test a little longer, the number of collisions obviously grows, but again they are not distinguishable in such a test. And if I run the same test for a full night, the number of collisions stops growing at some point, probably because I don't have that many different stack traces in my artificial test; but on a full map, if we just keep getting new stack traces, all these hashes will approach the theoretical limit determined by the number of different stack traces. So it still makes sense to use xxh3 for the stack trace map, because it does run faster; but on the main question, it doesn't help with collisions at all. Thank you.

Q: You said SipHash; do you use HalfSipHash or full SipHash?

A: What is HalfSipHash?

Q: It's the 32-bit version of SipHash, basically.

A: I think this is the full one, with the secret generated and 64 bits.

Q: It might be interesting to run it with the half version.

A: Yeah, it's a simple test, so we can run it.

Q: The only question I have is: when are you going to upstream this?

A: Soon.

Q: How much code is it for xxh3? I looked at it, and it's a lot of special-casing for different ranges of keys and such. How bad was it?

A: It's maybe 5,500 lines, maybe 6,000, I don't remember already. It's not that bad; it's straightforward to port to kernel style. Internally, xxh3 uses a generic secret: when you create the hash function, you first create a secret of 256 bytes or something like that, which it then utilizes. So for the hashmap we would use a generated secret instead of hashrnd.

Q: Why do you need a secret? Why can't you just do a one-shot for each key?

A: It's how it is implemented: when you pass it a seed, it generates a secret. For a hashmap you just generate it at allocation time and then pass the secret in; otherwise it plays the same role as jhash's initval. (A sketch of this appears below.)

Q: Could you go back to the slide with the test program you were running? Would it be possible to run that and grab the user stack as well, and see what the effect on collisions is?

A: Yeah, sure, thanks. This is the test; the only really hacky part, because I didn't have time, is that I just patched the kernel so that it statically creates the stack trace maps with the different hash functions. But yeah.

All right, thank you very much.
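The secret handling described above might look roughly like this. It is a sketch assuming an in-kernel xxh3 port: upstream xxHash derives a reusable secret from a seed via XXH3_generateSecret_fromSeed() (the default secret is 192 bytes), so a hashmap could generate it once at allocation instead of seeding every call. The kernel-style names xxh3_generate_secret_from_seed() and xxh3_64bits_withsecret() are hypothetical.

```c
#include <linux/random.h>

#define XXH3_SECRET_SIZE 192	/* upstream xxHash default secret size */

struct bpf_htab {
	/* ...existing fields... */
	u8 xxh3_secret[XXH3_SECRET_SIZE];
};

/* Done once at map allocation, taking the place of today's hashrnd */
static void htab_init_secret(struct bpf_htab *htab)
{
	xxh3_generate_secret_from_seed(htab->xxh3_secret, get_random_u64());
}

static u32 htab_map_hash(const struct bpf_htab *htab,
			 const void *key, u32 key_len)
{
	return (u32)xxh3_64bits_withsecret(key, key_len, htab->xxh3_secret,
					   XXH3_SECRET_SIZE);
}
```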