Hey, everyone. In today's session, I'm going to cover how we built a consistent hash ring for gRPC while we were implementing SpiceDB. Before we jump into that, I'll give a little background on where I've been and what I've done, just to build some credibility, so the context in which we went about solving this problem and building this library makes better sense.

My name is Jimmy Zelinskie. I'm a co-founder of AuthZed and one of the creators of SpiceDB. I'm going to get into SpiceDB a little later because it's a core part of the story: why we built this library, how it fits into our application, and how it might fit into your applications as well. Prior to all of this, I worked at Red Hat by way of the CoreOS acquisition. At Red Hat I was a product manager and also an SRE, and I worked on basically the largest service running at Red Hat, which came over with CoreOS; you might know it, it's called Quay.io. Effectively, I've worked as a software engineer and a product manager, always carrying a pager, always working on distributed systems. In the greater open source community, I'm an OCI maintainer, which is the standards body for containers if you're unfamiliar. I've also helped create a couple of CNCF landscape projects, notably Operator Framework, which we started at CoreOS; it was inherited by Red Hat and eventually incubated by the CNCF. I've worked on a whole bunch of other things in this space too, and I've been around since before the CNCF was founded: I started at CoreOS in 2014, and the CNCF and Kubernetes came around in 2015. So I'm familiar with containers, with cloud-native ecosystems, and with building things in the cloud in a cloud-native fashion, with or without Kubernetes.

I'll throw some contact info up here; feel free to come back to this slide whenever. I prefer that folks reach out to me by email. It doesn't matter what the question is, I'm willing to spend time with anyone answering questions and guiding you to the right place, and email is the best way to reach me asynchronously. You can also find me on GitHub or Twitter. If you want to ping me and have a real-time conversation, you can find me in the SpiceDB Discord under the username jzelinskie.

Speaking of SpiceDB, what exactly is SpiceDB? Understanding it is critical for the rest of this talk. SpiceDB is a framework for building secure, modern authorization systems. Fundamentally, folks get product requirements, usually when they land their first big enterprise customer, that dictate some kind of hierarchy around permissions and access to the data stored in their system. They might require, for example, a higher-level concept like an organization; organizations can have teams, within a team there might be roles, and users are assigned roles. That's a typical hierarchy. A lot of people would call the system I just described RBAC, but there are a lot of subtle details in how different applications implement RBAC and in what their requirements are.
And because all of those things can vary greatly, the recommendation from security professionals is to build on top of a pre-existing authorization toolchain rather than building these systems yourself. If you build them yourself, you're very likely to introduce security flaws, or to make design decisions that fundamentally limit your product so you can't implement features your customers are going to require in the future. Ask me how I know: I've experienced all of these pain points, which is why we left to start our own company, AuthZed, and build SpiceDB.

So why is SpiceDB different from your home-grown system? It's inspired by a technology built at Google called Zanzibar, and SpiceDB is the most mature open source implementation of that design. It's been used in production for a couple of years now by companies large and small, so it's been tested across the full gamut. The real core of its power is its design around relationship-based access control, or ReBAC, which is not to be confused with RBAC. ReBAC is a lower-level design where the chain of relationships between objects defines whether or not a person has access. This approach first got started at Facebook, where they protected their social network using graphs. Because Facebook's data is fundamentally a graph, it was a very natural fit; but what Google found was that even applications whose core data is not a graph benefit from modeling at least the authorization system as a graph, making it a separate and more understandable system.

What's even cooler is that because you're modeling a subset of your data as a graph, the graph system can run as an entirely separate service. That would be SpiceDB in this case, and your applications call into SpiceDB and ask questions such as: can this user access this document? That's really powerful, because none of that logic lives inside your applications anymore, and any application can make those queries at any point in the lifecycle of handling a request or user interaction. You don't have to know ahead of time exactly which permissions need to be checked to handle a request; any service can check permissions as needed, at any point while a request is being processed, in something like a microservice architecture.

I said earlier that SpiceDB is a permissions database. That means you write a schema that models the objects in your system and the relationships across them, you write data into the database with that schema applied, and then you can query that data efficiently to answer permission questions. On the left I've got an example schema. It models a very simple system that looks a bit like Google Docs: there are documents, there are roles you can assign on each individual document (you can be a reader of a document or a writer of a document), and there are permissions on that document, like whether you can edit or view it. If you look at the very bottom, where we have the view permission, you'll see that view is defined to include both readers and writers.
On the right-hand side you can see what happens when someone does a permission check. This uses zed, the official command-line tool for SpiceDB, which lets you query SpiceDB from your terminal, check data, and really understand the flows through the system, along with timing information and debugging metadata. In this example we've done that check, and you'll see it evaluates to true, so Fred does have access. It even explains why: it first checked the set of writers and did not find Fred, then it checked the set of readers and did find him. So Fred was granted access by means of being a reader.

Cool. Now we have a basic understanding of SpiceDB, but what's really important is this nested behavior. There's a reason the debug information is displayed as a tree: you can break that check request, checking whether Fred has access to the document, into subrequests and evaluate a bunch of them in parallel. Fundamentally, that's how SpiceDB works: it breaks a request down into subrequests, evaluates them in parallel, and tries to cache those values as much as possible, because authorization checks are in the critical path of absolutely everything you do in your systems. For example, when a request comes in to your API, the first thing that happens, before your application does any other work, is a check to see whether the user is allowed to perform the action they're attempting. That means we're targeting response times of a few milliseconds from SpiceDB, which in turn means almost everything needs to have already been precomputed and sitting in a cache; we want to serve as much as possible from memory. So the solution to serving answers that quickly is a sophisticated distributed caching system. SpiceDB is not only a recursive, parallel graph engine; it also has a distributed caching mechanism built throughout the whole system.

If you're familiar with caching in a regular application architecture, what normally happens is that requests flow into the system and get randomly assigned to a server; the circles here are the instances. Each of those servers has its own independent in-memory cache, and if the value the application is looking for is not in that cache, it reaches back to storage, probably a database, to compute the result, returns it, and probably inserts it into its cache. This is pretty standard caching in web applications; often the cache itself is something like Redis. Distributed caching adds additional arrows to that picture: there's now bidirectionality between the caches of the application instances themselves. The instances are aware of each other and know which values are in each other's caches, so instead of reaching out to storage they can reach out to a neighbor's cache and get the value straight from memory. You might be thinking: what's the difference between a hop to storage and a hop to my neighbor's cache? While you're paying the same network latency, querying the database is likely going to involve more overhead; it's probably doing way more computation.
And it's almost certainly not going to be a simple read of an already precomputed result straight from memory. The more expensive the computation the storage system has to do to produce a result, the more the cache is saving you from that work. So it's usually worth making the extra hop to your neighbor, even though you're still paying the network latency.

All right, now that we understand SpiceDB's use case, it's time to dive into what a consistent hash ring is. But before we answer that question, we have to answer another one: what is consistent hashing? Consistent hashing is a concept that was mostly researched in the 90s. At the time it didn't really have a name, but nowadays a lot of folks are familiar with it because it's so ubiquitously used in distributed systems. The idea is that you map a key to a finite set of nodes, and you do that using a hashing algorithm. If you take three values, foo, bar, and baz, and run them through a hash algorithm, each evaluates to a number, and that number maps to one of the nodes. The simplest hash algorithm you can use is a modulus; in this example it's mod three. That means when you run foo through it, it evaluates to one, and every single time you run foo through that hash algorithm it must evaluate to that same value. That determinism is the core of picking which node we route to and which node should hold the value in its in-memory cache. This is roughly the first stab from the research: you'll see that foo and baz both get mapped to node number one, and bar gets mapped to node number three.

Where this really falls over is a problem called rebalancing. Rebalancing is meant to deal with what happens when one of these nodes goes missing. In this example I made node one go missing, but other events, such as adding more nodes to the system, would also incur a rebalancing. The problem is that if we remove node one and only have nodes two and three, keys that hash to node one under mod three still map to node one: foo and baz still want to go to node one, but node one is gone. So if we were using this to store data, we've effectively lost that data. We don't get an answer to this problem until an algorithm called rendezvous hashing, described in a paper from 1996. But I'm actually less interested in rendezvous than in a paper published the following year, in 1997, which is where the consistent hash ring is introduced. While not the first solution to this problem, it has become the lasting one.
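To make the failure mode concrete before moving on to the ring, here is a minimal Go sketch of the naive "hash mod N" scheme just described. The FNV hash is an arbitrary choice for illustration; the point is only that when the node count changes, the modulus changes, and keys get remapped.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// node maps a key onto one of n nodes using the naive "hash mod n" scheme.
func node(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

func main() {
	keys := []string{"foo", "bar", "baz"}

	// With three nodes, every key has a stable home.
	for _, k := range keys {
		fmt.Printf("%s -> node %d (of 3)\n", k, node(k, 3))
	}

	// Remove a node and the modulus itself changes, so most keys can land
	// somewhere new; this is the rebalancing problem described above.
	for _, k := range keys {
		fmt.Printf("%s -> node %d (of 2)\n", k, node(k, 2))
	}
}
```

Running it shows that shrinking from three nodes to two can move almost every key, which is exactly the rebalancing problem the ring is designed to soften.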
The idea behind a consistent hash ring is that you have an array that wraps around at the end: if you iterate past the last value in the array, you arrive back at the first. Folks typically call an array that works this way a ring buffer, and they like to visualize it as a circle, similar to the wall clock I've drawn here. Nodes one, two, and three are mapped to positions on the clock, effectively times on the clock face. But you'll notice a whole bunch of other tick marks: those are all the possible values a key can hash to.

So when we run a hash on foo, bar, and baz, they get mapped to one of those individual positions, not specifically to node one, two, or three. The idea is that we then round each value to the nearest node: we move clockwise until we arrive at a node that actually exists on the ring. So foo and baz both get mapped to node one, just like in the previous example, and bar gets mapped to node three, also like the previous example. But we now have additional metadata: the location on the clock. When we get a rebalancing event, such as node four being introduced between foo and baz, foo now points to node four while baz still points to node one. That means we only have to update foo, moving it from node one to node four. We've minimized the damage: a bunch of the keys don't have to change at all when the new node is introduced. So we now have a guarantee about how much work has to be done when a node is added or removed.

Now, looking at this example, you may be thinking: wow, that's a really bad distribution; node four and node one are really close together. How can you make guarantees about the spacing of these things to minimize disruption? What you do in practice is create what are called virtual nodes. Where we had these four nodes, in practice you'd have something like a thousand nodes that exist only virtually, and those virtual nodes map back to the real physical nodes in the system. That keeps the rebalancing to a minimal amount when you need to add or remove nodes and there is disruption.

It also lets you do a really interesting thing, which is configure replication. Say we didn't want to lose data when one of these nodes ceased to exist. We can modify how we store things on the ring: once something is mapped, for example foo to node four, we can run another hash, say of foo combined with node four, which maps to yet another location on the ring, and we store the data there as well. That's a replication factor of two. If the replication factor were three, you'd hash again, find yet another location on the ring, and store the data there too. That's a configurable knob you can use for resiliency as nodes come and go, so you don't lose data when one particular node leaves. This is why the hash ring algorithm has stuck around for such a long time: it has a lot of properties that are beneficial for configuring and dynamically adapting to the different requirements folks have in their applications.
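To show that clockwise lookup in code, here is a toy Go sketch of a ring with virtual nodes. It is not the library discussed next, just an illustration under simple assumptions: virtual node positions live in a sorted slice, and a key is owned by the first position at or after its hash, wrapping around to the start of the slice at the end of the ring.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a toy consistent hash ring: each physical node is hashed onto the
// ring many times ("virtual nodes"), and a key is owned by the first virtual
// node found moving clockwise from the key's hash.
type ring struct {
	points []uint32          // sorted virtual-node positions on the ring
	owner  map[uint32]string // virtual-node position -> physical node
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(nodes []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			p := hash32(fmt.Sprintf("%s#%d", n, i))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// find walks clockwise from the key's position, wrapping around at the end,
// just like the wall-clock picture above.
func (r *ring) find(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrapped past the last position, back to the first
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"node1", "node2", "node3"}, 100)
	for _, k := range []string{"foo", "bar", "baz"} {
		fmt.Println(k, "->", r.find(k))
	}
}
```

Replication of the kind described above could be layered on top, for example by hashing the key together with its owner to pick additional positions, but it is omitted here to keep the sketch small.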
We have an implementation of all of this, everything just described about the hash ring, in Go, and it's completely agnostic to the use case. You can install it in your Go application and start using the following API. What makes it generic is that you plug in your own hash function. You could literally use a modulus like in the example, or a very fast hash algorithm that is not cryptographically secure, or a cryptographically secure hash algorithm that is expensive to compute without hardware acceleration, for example. All of that is completely configurable, but there's a basic API that stays consistent across hash rings: adding and removing members, listing the members, and finding the set of members responsible for a key. So this is totally agnostic and can be adopted by anyone, for any use case, as long as they're writing Go.
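As a rough sketch of that surface, here is what the API described above amounts to in Go. The identifiers are illustrative stand-ins rather than the library's exact exported names, so consult the package documentation for the real ones.

```go
// Package hashring sketches the shape of the generic API described above.
// These names are stand-ins for illustration, not the library's exact
// exported identifiers.
package hashring

// HashFunc is the pluggable hash: anything from a fast non-cryptographic
// hash (FNV, for example) to a slower cryptographic one.
type HashFunc func(data []byte) uint64

// Member is anything that can be placed on the ring; in the gRPC use case
// a member would wrap a connection and its address.
type Member interface {
	Key() string
}

// Hashring is the use-case-agnostic surface: membership changes,
// enumeration, and key-to-members lookup.
type Hashring interface {
	Add(m Member) error
	Remove(m Member) error
	Members() []Member
	// FindN returns the n members responsible for key; asking for n > 1 is
	// how a replication factor is expressed.
	FindN(key []byte, n uint8) ([]Member, error)
}
```

The important property is that nothing here knows anything about gRPC; the gRPC integration is layered on top, which is where we go next.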
But the whole topic of this talk is gRPC and how this all fits into gRPC. So now that we have this agnostic, neutral implementation of the hash ring concept, we map it back into the gRPC world so we can actually balance gRPC requests using the hash ring logic. If we dig into google.golang.org/grpc/balancer, the Go package distributed as part of the official Go gRPC packages, there's an interface called Balancer, and the documentation lists all of its methods. The only method we really care about is UpdateClientConnState, which I'm going to show in depth later. If we go through the doc comment at the very bottom, it says that Balancer takes input from gRPC, manages SubConns, and collects and aggregates connectivity states; it also generates and updates the Picker used by gRPC to pick SubConns for RPCs. So fundamentally, the first sentence tells us we're going to have to do a bunch of bookkeeping that gRPC expects, and the second sentence tells us this is our hook for providing a Picker implementation: the thing that decides how we map a request to a particular connection.

Great, so what's a Picker? A Picker has this one method, Pick, and that's the thing that actually does the mapping. So what we need to do is implement a Picker that pulls nodes out of our member list, selects the right one from the hash ring, and sends the request to that particular member.

Fantastic, so let's start implementing the UpdateClientConnState method. We've got a ring balancer implementation here, and I've cut out a lot of stuff; I'm going to focus on this one method and skip boilerplate like locking and things that aren't tied to the hash ring logic. You still have to maintain a lot of the invariants that gRPC assumes when you hook in and implement Balancer, and I don't really want to focus on that, although I'll cover some of it by happenstance. I want to focus on where we update our hash ring within this implementation, because that's the novel and interesting part of this library.

This method is called whenever there's a change to the client state, so we handle a bunch of different conditions that can occur for connections. The first thing we check is whether the service config has changed. If the config has changed, we probably need to start from scratch. In that scenario we allocate a new hash ring; we assume constructing the hash ring can't fail here, so we use the must-style constructor, allocate the new ring, and set the config on our ring balancer. The next conditional below that handles another state: we've gotten this call but we haven't actually been set up yet, we haven't received a service config with any settings, so this is the default, unconfigured state. In that scenario we just set a picker that throws an error. That's the recommended behavior across the different gRPC implementations, and the base package provides a nice utility for this: its error picker returns that failure until we reach a state where we're valid and can actually start picking.

Now we get into the meat and potatoes: tracking changes to the member set. First we look for members that are not yet in our member list, the new connections. We iterate over the list of addresses from the resolver state that came in with the update and check whether each connection exists in the set of connections we maintain ourselves. If it's not there, we do some bookkeeping, but fundamentally, at the very bottom, you'll see hashring.Add: we make sure the new connection we weren't tracking before is added to our hash ring so we can start mapping requests to it. The next code block does the opposite, removing anything that shouldn't be in our hash ring anymore. We iterate over the set of connections we know are in our hash ring and look each one up in that same set; if it's not there, we remove it from the hash ring, so at the very bottom you'll see the call to hashring.Remove. There's also a bunch of internal bookkeeping that's just part of being a normal gRPC balancer. Then at the end of the function we call UpdateState, which does the heavy lifting for us: we provide the state we just computed, and also the picker.

The picker is really, really important. As I said earlier, the picker is what actually maps a particular request to one of the members of our hash ring, and it's elegant enough to fit cleanly on this slide as a very small function. Fundamentally, we smuggle in a context key: the key for the hash ring is provided with the request context, and we use it to find the members in our hash ring. You'll see a variable called spread, and we use spread not only in the call to FindN but also a little lower, in the index we use to pick, out of that member list, the ultimate connection we're going to send the request over.
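Pulling the pieces together, here is an illustrative reconstruction of such a picker in Go. It is a sketch rather than the library's source: the context key, the member type, and the ring interface are assumptions made for this example, while balancer.Picker, balancer.PickInfo, and balancer.PickResult are the real gRPC types.

```go
package ringbalancer

import (
	"errors"
	"math/rand"

	"google.golang.org/grpc/balancer"
)

// hashringKey is the context key under which the caller smuggles the routing
// key into the RPC's context (assumed name for this sketch).
type hashringKey struct{}

// subConnMember pairs a ring member with the gRPC SubConn it represents.
type subConnMember struct {
	key string
	sc  balancer.SubConn
}

func (m subConnMember) Key() string { return m.key }

// ring is the minimal slice of the hash ring API this picker needs (assumed shape).
type ring interface {
	FindN(key []byte, n uint8) ([]subConnMember, error)
}

type hashringPicker struct {
	ring   ring
	spread uint8      // how many replicas each key is spread across
	rand   *rand.Rand // picks one replica per request
}

// Pick maps a single RPC to a SubConn by hashing the key found in the
// request context and choosing one of the `spread` members that own it.
func (p *hashringPicker) Pick(info balancer.PickInfo) (balancer.PickResult, error) {
	key, ok := info.Ctx.Value(hashringKey{}).([]byte)
	if !ok {
		return balancer.PickResult{}, errors.New("no hashring key in request context")
	}

	members, err := p.ring.FindN(key, p.spread)
	if err != nil {
		return balancer.PickResult{}, err
	}
	if len(members) == 0 {
		return balancer.PickResult{}, balancer.ErrNoSubConnAvailable
	}

	// Choosing randomly among the replicas spreads load and limits the blast
	// radius if one of them disappears.
	chosen := members[p.rand.Intn(len(members))]
	return balancer.PickResult{SubConn: chosen.sc}, nil
}

// The sketch must satisfy the real gRPC interface.
var _ balancer.Picker = (*hashringPicker)(nil)
```

In the real balancer, UpdateClientConnState rebuilds a picker like this whenever the ring membership changes and hands it to gRPC via UpdateState, which is exactly the flow described above.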
The idea here is that if you have a replication factor greater than one, we can load balance requests across those replicas. Fundamentally, we pick a random node out of the set of replicas the key has been mapped to. The utility isn't just spreading load across them: if one of those nodes disappears, you'll see less disruption, because you weren't sending all of that traffic directly to that one node; you'd been spreading it across the different replicas. So you minimize the disruption by spreading across the replication factor. And that, ultimately, is our picker.

So that is the core logic behind github.com/authzed/consistent, our library for implementing all of this in gRPC. There's a lot more boilerplate, but this is the meat and potatoes. Honestly, adding this to your codebase takes about an import of our library, registering the balancer, and setting that initial service config, and that's about it.

I'd like to thank everyone at AuthZed who helped build this library, and also the folks over at gRPC who write all the Go tooling, which I believe is both the Go team and the gRPC team. In addition, I'd like to thank the folks who developed a library called kuberesolver. kuberesolver is how we actually detect and find that member list, so that SpiceDB can self-cluster on Kubernetes: it doesn't need any configuration from the user, it just auto-detects nodes and starts clustering itself, and all the dispatching and caching comes for free. If you're interested in implementing anything like that, feel free to jump into our Discord, the SpiceDB Discord. It's not just for folks trying to solve authorization problems, but also for folks interested in distributed systems or building orthogonal tooling in the cloud-native ecosystem. And if you're interested in other distributed systems topics, I've also given a previous webinar on database consistency and consistency in general, which you can find at the YouTube link here. Thanks for listening.