We'll spend the next hour together, and I'll do my best to explain what consistent hashing is and what you can do with it. A quick word about myself: you can find me under the handle ultrabug. I'm a Gentoo Linux developer, where I focus mainly on clustering and NoSQL/distributed packages, and I also work at Numberly.

The goal of this talk is really to introduce you to consistent hashing. Before we get to the "what", I'll have to introduce the "how", so we'll start with the basics: what led to consistent hashing and what kind of problem it solves.

But first, a bit of history. The concept of consistent hashing is actually twenty years old. It came from a paper by the people at Akamai, written when they had problems with distributed caching. It then got onto the radar of P2P networks: systems like Chord use consistent hashing to keep track of where parts of a file live. Then it reached the database world, mainly after Amazon used it in Dynamo for data partitioning. It's strange to me that it's still fairly unknown to the casual developer, so maybe this is a good chance for me to make it a bit better known.

So let me first tell you a story about the basics, a story about mapping. A map is a way to take something that points to another item and return that second item, OK?
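In Python terms, that map idea looks like this; a tiny illustrative sketch with made-up phone-book data:

```python
# A dict is Python's built-in map: a referential (the key)
# points to a stored item (the value).
phone_book = {
    "Alice": "555-0101",
    "Bob": "555-0202",
}

# Looking up the referential returns the associated item.
print(phone_book["Alice"])  # 555-0101
```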
It's a fairly basic concept. The idea is that you have a referential that you use to point at, and retrieve, some information. This idea of looking things up is the core of all mapping functions. It's like a phone book: you have a name, and you want to find the number of the person you're looking up. The basics of a map are that you have a referential selection step, then you apply a logical operation on it, and that points to the location where the information you're looking for is stored. The first two steps of this process have a great impact on your lookup efficiency.

So mapping is just a map, and as developers, most of us know this as a key relating to a value. When you think about this, you usually think about a dict, right? A Python dict, for instance, is a map: you have a key and a value associated with this key.

So let's take a deep dive, or a quick dive, sorry, into how the Python dict is implemented. The truth about the Python dict is that it's what we call a hash table. In a hash table, you apply a hash function to the key, and then you do another logical operation that points to the value. The role of the hash function is very important, because it has a direct relationship with how your keys are spread over the locations storing the values. These two steps represent the actual implementation. So how is the Python dict implemented? Well, it applies this function.
It takes the hash of the key and applies a binary operation based on the size of the in-memory array that stores all the values. If you take the hash of "a", you get a number. That's the key point here: a hash function has to return a number, since you'll do a logical operation on it. You apply the binary operation against the array size, 11 in this case, and you get index 0. That means that at index 0 of your in-memory array you'll find the value for key "a". That's a hash table. If I do the same for key "c", you see that it points to index 2; if I do the same for key "b", it points to index 3. OK? Sounds pretty easy and pretty solid. This hash function is a built-in of Python.

There are some key factors to consider here, because you expect this to be an efficient process, right? Looking up a key must return the value fairly easily and quickly. So distribution and balancing are the responsibility of the hash function, and the logical operation determines the accuracy and performance of the location step and of the whole implementation.

When you add the storing of this data, you start talking about scaling. So does the Python dict actually scale well? The answer, as some of you who have tried to put a lot of information into a Python dict will know, is no. The reason is that the basic hash function, and the implementation itself, are meant for very fast, and consistently fast, key lookups, at the expense of memory. Why? Because the hash function does not distribute the keys and their values evenly in the in-memory array. If we take the earlier example, you see that index 1 consumes some memory with no information in it.

So how do you scale out from a Python dict, or from a hash table that does not spread its keys evenly? This is where distributed hash tables come in. Their little name is DHT.
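Before scaling out, the single-machine lookup just described (hash the key, then map the number onto a fixed-size array) can be sketched as a toy hash table. This is a drastic simplification of what CPython really does, with no collision handling or resizing, and all names here are made up:

```python
class ToyHashTable:
    """A drastically simplified hash table: hash the key, then map
    the resulting number onto a fixed-size in-memory array.
    Collisions and resizing, which a real dict handles, are ignored."""

    def __init__(self, capacity=11):
        self.capacity = capacity
        self.slots = [None] * capacity  # mostly empty: memory traded for speed

    def _index(self, key):
        # Step 1: the hash function turns the key into a number.
        # Step 2: a logical operation maps that number onto a slot.
        return hash(key) % self.capacity

    def put(self, key, value):
        self.slots[self._index(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._index(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        raise KeyError(key)

table = ToyHashTable()
table.put("a", 1)
print(table.get("a"))  # 1
```

Most of the eleven slots stay empty, which is exactly the memory-for-speed trade-off discussed above.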
I guess most of you have heard of them, but the idea is still pretty simple. You split your keys into buckets. The hash of your key is still fed to an operator, but instead of pointing to a direct location, let's say an in-memory location, it just references a bucket ID. The bucket itself is a hash table, so when you point to a bucket, you first have to reach that bucket and then do another lookup over there. Fine. Then you take those buckets and you spread them across machines, across servers. At this point, the distributed hash table allows you to spread your keys, and the data that comes with them, around different machines, which is how you'll be able to scale. If you wanted Python dicts inserted into buckets, you could have Python dicts on different machines and scale out what wouldn't fit on a single machine, right?

So the key question now is: what's the best operator for the accuracy and efficiency of this implementation? What's the best operator to point to the right server in the most efficient way?

Let's approach this problem with a naive DHT implementation. The naive one uses the modulo operator on the number of buckets. The idea is that I take a hash of my key. In this case I wouldn't use Python's built-in hash function, because it doesn't mix the keys evenly: the hash of "a" gives me a number that is very close to the hash of "b". That doesn't mix well.
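You can see this lack of mixing directly with integers (string hashing is randomized per interpreter run, so integers make the cleaner demo): Python's built-in hash of a small int is the int itself.

```python
# The built-in hash does no mixing at all for small integers:
print([hash(n) for n in range(5)])  # [0, 1, 2, 3, 4]

# So a modulo placement sends consecutive keys to consecutive
# buckets: fine for an in-process dict, a poor spread for a DHT.
print([hash(n) % 3 for n in range(6)])  # [0, 1, 2, 0, 1, 2]
```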
The idea is to have a random-looking number coming out of the hash function, and the best-known hash function that comes to mind is MD5, which at least mixes the keys well, is easily implemented, and is pretty standard as well. So in this case I would take the MD5 of "d", call hexdigest() to get a hexadecimal string out of it, and convert that string with base 16 to get an actual number, on which I would apply a modulo on the number of servers, or buckets, that I have. Doing so with "d" and, let's say, three servers gives me: OK, your key is on server 0. Then you can connect there and get the value you're looking for.

It seems pretty simple and pretty solid, and that's what we actually had in the '90s, before consistent hashing came into play. And the problem we'll see now is what got Akamai thinking about consistent hashing, because this won't scale if the number of your buckets changes. The modulo operator depends on the number of buckets, so when that number changes, the index the same key was referencing will change, or has a very high probability of changing. In this case, "d" modulo 4 now points to server 1, whereas before it was pointing to server 0.

That means that on distributed systems, where you need to tolerate the failure of a server or to scale up and add new servers, the naive modulo-based DHT implementation gives you serious problems. The number of key remappings, that is, the keys whose index changes, follows this relation: for 2,000 keys spread over 100 servers, you would get around 1,900 keys pointing to a bad location if you changed, added, or removed just one server from your cluster or topology. It would be a disaster,
even more so at scale, right? So I guess we need help on this; we need consistency. Well, at least better consistency, let's tell the truth. OK.

And here I introduce you to the hash ring. The hash ring is the concept behind consistent hashing. The idea is pretty simple, actually, when you think about it or when you hear about it, and I hope I'll do a decent job of explaining it. The idea is that you place your servers on a ring, which is also called the continuum. You can picture it quite easily: you take an array and you bend it to form a circle. Then you hash the name of your server, which gives you a number, and you place it on the ring. Simple enough for now. In this case, I have three servers: I hash their names and I place their numbers on my ring. OK.

Then how do you look it up? How do you look up a key, and how do you know which server the key belongs to? Well, you take the hash of the key as well, you place it on the ring, and then you go clockwise until you hit a server. That will be the server responsible for the key. Fair enough.
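Here is a toy side-by-side of the naive modulo scheme and the clockwise ring lookup just described. This is my own sketch with invented server names and key counts, not a real library:

```python
import bisect
import hashlib

def h(value: str) -> int:
    """MD5 hash: hexdigest converted with base 16 to an integer."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def modulo_server(key, servers):
    """Naive DHT: the server index is hash(key) modulo the server count."""
    return servers[h(key) % len(servers)]

class TinyRing:
    """Consistent hashing: each server becomes a point on a continuum."""

    def __init__(self, servers):
        self._points = sorted((h(name), name) for name in servers)
        self._hashes = [p for p, _ in self._points]

    def get(self, key):
        # Place the key on the ring, then walk clockwise to the next
        # server point, wrapping around past the end of the circle.
        idx = bisect.bisect(self._hashes, h(key)) % len(self._points)
        return self._points[idx][1]

servers = [f"server{i}" for i in range(100)]
keys = [f"key-{i}" for i in range(2000)]

# Add one server and count how many keys move in each scheme.
moved_modulo = sum(
    modulo_server(k, servers) != modulo_server(k, servers + ["server100"])
    for k in keys
)
ring_a, ring_b = TinyRing(servers), TinyRing(servers + ["server100"])
moved_ring = sum(ring_a.get(k) != ring_b.get(k) for k in keys)

print(moved_modulo, moved_ring)  # almost all keys move vs. only a few
```

With the modulo scheme, nearly all 2,000 keys relocate when the 101st server appears; on the ring, only the keys in the arc claimed by the new server move, on the order of 1/N of them.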
OK. If you do this, only one over N of your keys will be remapped, relocated, if you add or remove a server on the ring. That's fairly clearly better. What's cool with this is that the more servers you add to your topology, the more robust you get. OK.

But there's still a problem. The implementation itself can be nice, but if the hash function is not nice, you still have the problem of distribution. Even with the hash ring, consistent hashing does not by itself solve the distribution of your servers around the circle; the hash function does. So you could end up, because it's pretty random at some point if you think about it, with the kind of distribution where server 0 has a very large partition of the circle belonging to it, because when we placed our three points, the hash function didn't distribute them evenly. That means server 0 would become what we call a hotspot: it would get a higher load than the others, while server 1 here would do almost nothing, right? So the hash function, and the choice of hash function, is very important: on the performance level, because it requires computation, but most importantly for distribution. And hash functions are not perfect.

So the question you're asking now is: OK, now it's time you tell me which hash function to use. There are two kinds of hash functions. In the first example I went for the most common one, maybe the first that comes to mind, which is MD5; it belongs to the cryptographic hash functions, and you know the others, like SHA-1 and the SHA-2 variants. Those ones are pretty standard and have wide adoption, but, as you saw before, they need a conversion to an integer, so that takes some computation as well. On the other hand, you have non-cryptographic algorithms that exist and that are optimized for key lookups.
Those ones don't include any cryptographic, anti-tampering machinery in their methods or code; you only need them for actual key lookups, right? I guess the most famous of them are MurmurHash, whose MurmurHash3 is the most recent implementation, and CityHash from Google. They are fast, since they are optimized for this, and their direct output is a number. The only drawback I see about them is that they need C bindings, C libraries, to work, so they don't work out of the box; you have to install them. At the bottom of the slide, I give you a rough estimation of the speed comparison between all these hash functions. MurmurHash is pretty solid, near the top of the list, while CityHash-32 is the fastest of all.

So this helps you with balancing, and now you have to reduce the load variance on your circle, on your ring. For this, there's the concept of vnodes. A vnode is just the fact that you take your server and duplicate it multiple times on the ring, to augment the number of points on your ring and to reduce the size of all the partitions. In the consistent hashing world, the most commonly acknowledged number of vnodes for a host on the ring is 160. That gives you a pretty solid idea of the number of times you have to duplicate your host on the ring.

The second aspect I want to highlight is this concept of weights. You can have servers with different capabilities or computing power on your ring, and you would want to adapt your load based on this as well. So you can say: OK, this server has a weight of 2, so it will be duplicated more, it will get more vnodes on the ring, and thus more of the load spread onto it. Fine.

So now that you all want to do consistent hashing, how do you do it in Python?
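Before looking at libraries, the vnode and weight ideas above can be sketched by hand. Again, this is my own toy code, not a real library; only the 160-points-per-host convention comes from the talk:

```python
import bisect
import hashlib

def h(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class WeightedRing:
    VNODES = 160  # commonly acknowledged number of points per host

    def __init__(self, weights):
        """weights maps a server name to its relative capacity."""
        points = []
        for name, weight in weights.items():
            # A weight-2 server contributes twice as many vnodes,
            # so it owns roughly twice as much of the keyspace.
            for i in range(self.VNODES * weight):
                points.append((h(f"{name}-{i}"), name))
        self._points = sorted(points)
        self._hashes = [p for p, _ in self._points]

    def get(self, key):
        idx = bisect.bisect(self._hashes, h(key)) % len(self._points)
        return self._points[idx][1]

ring = WeightedRing({"small": 1, "big": 2})
counts = {"small": 0, "big": 0}
for i in range(3000):
    counts[ring.get(f"user-{i}")] += 1
print(counts)  # "big" should receive roughly twice the keys of "small"
```

Note that a vnode is just the server name with a suffix, hashed again: each point logically refers to the same physical server, but lands somewhere else on the ring.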
Let's look at the existing implementations, the most famous ones, or at least those that exist and have been done so far. When I went looking into this, I needed it at work, and I said: OK, I want consistent hashing, I understand the basics, and I want to get my hands on real code and real applications, right? I was kind of disappointed, I must say, by the libraries that were present on PyPI at the time. So I decided to go ahead and write my own, which is uhashring, and which is the one I'll use for the rest of the slides. The first ones are mostly academic implementations: they only do the hashing, but you don't have easy functions to work with them in actual code.

(That's bad, Google Slides, I don't know... OK, well, here it is, you can see it. It may be a problem later, but we'll see; let's go like this.)

OK, so the idea with uhashring is that you create some nodes, and for each node you add to your ring, you can have an instance associated with it, which can be any kind of object, actually. Here I demonstrate that you can get the node for the hashed key "coconut". But if I wanted to use it, I would like to be able to use it in a straightforward manner, right? The actual instance associated with this node is an open file on the system, and `hr["coconut"].write` actually points to a file descriptor, so I can use it directly like this. I don't have to get the node and then get some other object; I can use my ring in a straightforward manner. That's just what I wanted to highlight here.
Here you see the node declarations with `open`: each node has an open file descriptor within its declaration, and then I embed this in my ring. Then I can use the ring directly with the key I'm looking for. OK.

(Let me switch now... I don't know if we have time to see it live, so I'll leave it at that once again.) So `hr["coconut"]`, like here, points to a file descriptor, and then I can use it directly, with the file descriptor methods and functions, like this. Then I can open the file and check that "hey coconut" has been written into it.

OK, so let me switch to some example use cases before I finish. If you have a laptop or a telephone or something, bring it up: I'll have a, hopefully working, live demo. I simulated a raffle using consistent hashing, and I will ask you a question about it. I have some Raspberry Pi Zeros to give away, so you will get a chance to win something out of this talk. If you have a device, prepare yourself; I will give you a URL to connect to. Anyway.

So, some examples. First, a way to distribute data across database instances. You could use consistent hashing to do it properly, and to be able to add or remove databases from your topology with the minimum of data relocation to do when that happens. Here, client A has some data, and I want it to point to a random database, but in a consistent way, right? So here I would import uhashring, I would create my nodes, where each node's instance property points to an actual MySQL connection, and I would reference every server that I have for usage on my ring afterwards. Then I would create my ring using those nodes. Then I have some data, and I will use the key of this data dict to select my node and write this data into the database. For this I would use a partition key, which
would be client A, client B, client C, or client D. And what's interesting is this one: `hr[partition_key]` points directly to the right server, to the right connection of the right server, for the key we are evaluating. Then I can execute my insert with my data and do my commit. Great.

Sample use case number two. The thing to understand here is that we can also spread disk and network I/O. Let's say you have multiple NAS boxes; they don't do any clustering, you just have some NAS appliances and you want to make the most out of them. So you want to distribute your workload across those network NAS. You would mount them on your filesystem, let's say, or anything else, and you could open the files just like in the example I gave earlier, and then distribute your data to the right NAS based on your key. In another piece of code, you could have the reader part, which is implemented here as well: based on the key, it will find the right NAS where the data was written. OK.

Number three is also kind of interesting. Let's say you have events or pieces of data continuously coming in; the user session is usually a good example. A user session generates data keyed by the user ID, and you would like to make sure that a given worker or server is responsible for processing all the data of the current user. You could do this to get greater performance: you could cache or do some clever stuff just by making sure that, in a consistent way, all the data from the same user will be processed by the same worker.

And it also works for logs. Let's say you have ten machines, and every machine writes the user ID in its logs. If you wanted to trace everything a user did, you would have to gather the logs from all the machines.
Using consistent hashing like this, you would not have to do that: it would be much easier, and at least consistent. It would do its best for your logs not to be spread around, at least as long as you don't have a failure and don't add a new node to your topology. So that can be a pretty good use case as well. OK.

All these examples, and their source code, are online, so don't bother reading them like this on the slides.

And I couldn't do this presentation without mentioning caching, right? I don't know about you; maybe you can raise your hand if you use the python-memcached library. OK, quite a lot of you. Well, about the python-memcached server selection: when you create your memcache client, you can specify multiple nodes to do distributed caching, right? Well, the python-memcached implementation uses the naive modulo DHT for its server selection. So whenever one of your cache servers fails, or you add another one, almost all of your cache will be invalidated, and that will generate some load if you reload this data from a database or something. It will be, I guess, bad. So with uhashring, I did some simple monkey-patching of the python-memcached library to change only the server selection, to make it use consistent hashing. It's easy: you don't have to modify the rest of your code, you just do this and you get this new safety.

OK, so let's try to finish with a silly raffle. The idea of the raffle: usually, when you join a raffle, you want to win, right? So I did two implementations of this raffle. The basic concept is: OK, I have a list of gifts, and one of the gifts is the winner gift, which you can see here. Every time one of you connects to the game, we will simulate the addition of a node to the topology, right?
And I have implemented this raffle using either modulo or consistent hashing. If we were using the modulo operator, that would mean that every time someone connected, so every time a new player joined the raffle, the likelihood of the winner changing would be higher, right? In doing so, we would favor all the people who are not currently winning. If we were using consistent hashing, every time a new participant joined the game, there would be less chance of the winner changing. So we will play the game, but you get to choose: do we use the modulo one, favoring most of you, let's say favoring randomness, or do we use the consistent server and favor whoever is going to win? It's your luck.

So let's do a show of hands for the modulo... OK. For the consistent one... I would almost have to count, but you chose consistency. Yeah, OK, I'm surprised, I don't know.

OK, so the concept of the game is clear? It's OK? All right. So this is the URL, you can connect now: ep17.nbly.co. You have it at the top of the screen, you see it over there, OK? Wait... yeah, I know, I need to run either the modulo or the consistent server. It's going to work now, right? But it was fair play: you had to choose first. I guess I lost connection; that's why it failed before. Anyway.

OK, the consistent one is here. Ten participants already... 30. The ID of the winner is shown over there. 59, really, guys? So here, for those who don't have a phone or are not playing right now, you can see what the other people are seeing. Basically, here, I'm not winning, right? I don't have the winner gift displayed. Most of you have a funny gift displayed instead, so even if you're losing, you are at least laughing. Anyway. 93! I wasn't expecting 93 people playing the game, actually, so it's a nice figure. 99... I don't know who the current winner is. Did it change a lot over there? Did you win the first time, and it didn't change?
No? OK, you're at 101. So now people are trying to figure out a way to abuse this, I guess. I thought of it already, so there are some safety measures in the code. Anyway.

OK, so I wanted just to show a quick look at the two implementations. At the top is the modulo one, which uses a simple list to do this. The other one obviously uses the hash ring. Adding a node in the consistent implementation is pretty easy; you see that it becomes a bit more complex when using the modulo one, but still, it's not insane. I have a node cleaner for when people are leaving the game. You see that people are leaving the game, so the likelihood is still changing, and the winner may still be changing. And here you can see the winner selection as well. With the consistent one, it's pretty easy: I pass the URL of the winner gift to the ring and I get the winning node. It's as simple as that.

That's all for me. Thanks for playing, and thanks for being here. If you have any questions, I'd be happy to answer them. So, you're still the winner? Yeah? No, it just changed. It's OK, you'll win anyway.

OK, the source code and all the rest of the stuff is there; the libraries are on PyPI, and I welcome any contribution, obviously. So I hope you now understand hash tables better, even the Python dict, and distributed hash tables, and now why, and what, you could do with consistent hashing and the type of problems it solves. Thank you. Yeah, so thank you very much. Any questions?
[Audience] I think I just have a small comment. This is all really cool, and I don't want to discourage you from doing this, but Python's dictionary memory layout has actually improved recently. The first graph you showed, of how Python's dictionaries are internally stored in memory, is no longer true for Python 3.6. And I really recommend, for everybody in the room, if you liked this talk, to watch the talk by Raymond Hettinger from PyCon 2017 about dictionaries, because Python 3.6 really improves the situation: your dictionaries' memory use is going to be sliced by half. That's a really good improvement if you don't want to have a distributed system and you just want to improve the memory performance of your own single process.

Thanks for this insightful comment. Yeah, you're right.

[Audience] Hi, thanks for the talk. You mentioned replicating nodes on the ring to kind of reduce the probability of getting an uneven distribution. How do you distribute the nodes on the ring? It seems to me that if you do this in a predictable way, or with the same distance everywhere, you don't improve the distribution at all. Can you elaborate on how this works?

Can you go back to your slide...

[Audience] Can you go back to the slide where you distribute, or repeat, the nodes on the ring? This one, yes. Can you explain how this improves the situation? Because it seems to me that you just replicate the same thing five times.

Yep. Well, if you compare this to this, it improves the situation, doesn't it? If you take the slice of each node, it's lighter now than it was before. So here I took server 0, server 1, and server 2, and for each of them,
I would add "-0", "-1", "-2", "-3" to their names, then hash those again and place the results on the ring. Each of those points would logically point to the same physical server, but would still be a different point on the ring. So I would reduce the load variance of my ring by reducing the size of the partitions on the ring. Right. Thanks for letting me clarify this.

[Audience] About the variability of the imbalance: even using replicas, do you have an estimate of the imbalance between the most loaded node and the least loaded node when using this library?

So, the value you get is not dependent on the library implementation; it's dependent on the hash function. It depends on the hash function that you use: basically, the hash function has a range, and the range of your hash function will be the range of your ring, the total number of possible points on your ring. So if you know all the data points you have on your ring, and uhashring provides a way to get them with the print_continuum function I showcased, you can calculate the partitions of each node, and then, based on your hash function, you can know for sure what the load of each server will be.

[Audience] OK, I actually have two questions regarding the ring. First, why do I need to use a hash function to distribute the nodes? Can't I just distribute them evenly, or split the server which currently has the most load when I add a new one? What is the benefit of the hash function over evenly distributed points? And the second one is: why is it a ring and not a line, or whatever?

So, I understood the second question; the first one, with the echo, I'm sorry, I didn't understand it at all. But why a ring instead of a line?
I guess the ring is partly a representation for us, but the mathematics behind it also make sense on a ring, because it calculates angles to represent partitions. It's in the paper, the Akamai paper. So it's like a trigonometric circle, sorry, and it makes more sense that way. And on the first question, I'm sorry, maybe you can catch me at the end, because with the echo I didn't understand it. I'm sorry.

[Host] OK... thanks again, Alexys. Thanks!