Thank you for coming to the talk this close to the end of the conference. As Randy said, the title is Context Aware Traffic Routing. Here's the agenda for the talk: we're going to talk a little bit about ourselves, explain the motivation you might have for doing context-aware routing, discuss the issues involved, talk about the implementation, look at results, and then talk more about our prototype. We'll definitely have time at the end for questions. My name is Piau. I'm a senior staff software engineer at Niantic. Hi, I'm Renana. I'm a staff software engineer at Niantic.

Let's jump right in. You probably know about the usual context-aware routing that nearly every website uses: the URL directs your particular request to a backend that then serves it. You can also get data from the request header to decide routing; that's usually done by a load balancer. And you can have side channels that are used to direct traffic; this might be an application that tries to do smart load balancing by checking which servers can take the load. What this talk is about is using the actual content of the request message to do the routing.

Why might you want to do this? In our particular case, we wanted to retrofit smart routing based on data that was already being passed in the packet. That routing wasn't previously done, and we figured we could do it without changing the application. You can also do backend mapping based on personal information that's already in the packet, so user X always goes to this server, or based on whatever intellectual property you might have.
In one of our applications, we actually piggyback multiple protocol buffers into a single request, and when the monolithic application is broken up into microservices, one useful thing you can do is take each of those individual sub-messages in the protocol buffer and route them to different services simultaneously.

This is what the custom Envoy filter looks like. The client sends a protocol buffer message, Envoy captures it, deserializes the entire protocol buffer, applies the routing algorithm, and sends it to the correct backend. We did two runs using this naive approach, one with an average message size around half a megabyte, and the other with average sizes in the two-megabyte range. The entire test turned out to be bandwidth limited. For small messages you can do a lot of requests at once; for large messages, because of the amount of bandwidth you're using, you can only do about 400 per second before your network link gets saturated, and because you end up with only about 400 messages per second, you need a lot less CPU. Both the small and large message runs move about 168 megabytes per second, and this makes sense because that's basically what your bandwidth can do.

Looking at a graph, you can see some interesting characteristics. For instance, for the large messages the memory use is much more spiky, and we speculate this is because of the heap allocator: you get a message, you spend some time defragmenting your free list, or allocating free lists and breaking them up, and that's where you see those ups and downs in memory use. On the CPU side, you can sometimes get burstiness in the number of messages coming in, and that's what you see on the CPU graph for the small messages. Okay, let's see if we can improve it.
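The naive approach above can be sketched as follows. This is a minimal Python illustration, not the actual Envoy C++ filter; the JSON stand-in for protobuf parsing, the `player_id` routing field, and the backend names are all assumptions made so the sketch stays self-contained.

```python
import json
import zlib

def deserialize(raw: bytes) -> dict:
    """Stand-in for full protobuf deserialization (JSON keeps this runnable)."""
    return json.loads(raw)

def route_naive(raw_request: bytes, backends: list) -> str:
    """Naive filter: parse the WHOLE message just to read one routing field,
    so the parsing cost scales with message size, not with the routing key."""
    msg = deserialize(raw_request)
    key = msg["player_id"]                  # hypothetical routing field
    return backends[zlib.crc32(key.encode()) % len(backends)]

# A half-megabyte request, mirroring the small-message test run.
request = json.dumps({"player_id": "p-42", "payload": "x" * 500_000}).encode()
backend = route_naive(request, ["backend-a", "backend-b", "backend-c"])
```

The point of the sketch is the cost structure: every request pays for deserializing the full payload even though only one small field influences the routing decision.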
To be able to peek inside the actual message without having the protocol buffer definition, you need to decode the protocol buffer wire format. It turns out there are only five wire types in a protocol buffer, and one of them has been deprecated, so you don't even have to think about it; that leaves four.

The key one, the most important, is the varint. That's what encodes the field numbers you write on the right side of a protocol buffer definition when you're specifying a field. It's a variable-length integer, taking anywhere from one to ten bytes, and that's also why you try to use the smaller numbers on that right-hand side: smaller field numbers produce smaller tags and use less space and bandwidth. The fixed 64-bit and fixed 32-bit fields are self-explanatory; they're usually used for doubles and floats. The interesting one from our point of view is the length-delimited field. Those are used for both strings and embedded protocol buffer messages nested inside the overall protocol buffer.

So our insight is: when you see one of these length-delimited fields, you can try to unpack it and treat it like a message. If you fail, it was a string. If you succeed, you can peek inside and say, let me take a look at what's in here. In the example on the screen, we have an envelope that contains four strings. The last string, envelope ID, is used to inject a unique string that lets you say, this particular thing is one of these envelopes that can be used for routing. The other three fields, app ID, server ID, and player ID, could easily be the content on which you do your routing.

A few interesting observations. First of all, the envelope can be retrofitted. If you had an existing protocol buffer where the payload already used up, say, the first 10 fields, you could use field 12 to inject an envelope if you needed to.
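The wire-format decoding described above can be sketched in Python (the real filter runs inside Envoy in C++; the field numbers and string values below are hypothetical, mirroring the envelope on screen):

```python
def decode_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7

def parse_fields(buf):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_no, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                      # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 1:                    # fixed64 (e.g. double)
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 5:                    # fixed32 (e.g. float)
            value, pos = buf[pos:pos + 4], pos + 4
        elif wire_type == 2:                    # length-delimited: string or message
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:                                   # groups (3, 4) are deprecated
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_no, wire_type, value

def try_parse_message(buf):
    """The insight above: attempt to parse length-delimited bytes as an
    embedded message; None means it was (probably) just a string."""
    try:
        return list(parse_fields(buf))
    except (IndexError, ValueError):
        return None

def string_field(field_no, text):
    """Encode one short string field (field numbers < 16 fit in one tag byte)."""
    data = text.encode()
    return bytes([field_no << 3 | 2, len(data)]) + data

# A hypothetical envelope with routing fields, mirroring the one on screen.
envelope = (string_field(1, "app-123") + string_field(2, "srv-7")
            + string_field(3, "player-42") + string_field(4, "envelope-v1"))
fields = {n: v.decode() for n, _, v in parse_fields(envelope)}
```

Note that the unpack-and-see trick is a heuristic: arbitrary string bytes can occasionally parse as a valid message, which is exactly why the unique envelope ID string is useful as a confirmation marker.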
When you have something like an envelope ID, a unique string used to identify the envelope, you can change the contents of that string to indicate the version number of the envelope. That lets you signal a backwards-incompatible change, for instance if you needed to change the shape of the envelope. And lastly, as previously stated, you can use a combination of the data in the envelope to do the routing; you don't have to depend on just one field or another.

So we implemented this partial deserialization, and this is what it looks like. For small messages, we handled about the same number of calls, but the amount of memory used was reduced by more than 50%. This makes sense: for a small message, the envelope is nowhere near half a megabyte, so you're only deserializing as much as you need to do the routing, and the rest of the protocol buffer is just handed off. For large messages, you're still bandwidth limited; that doesn't change. But because you're processing less of the protocol buffer before you route it, you actually improve CPU utilization. You also save some memory, though not as much because the payload is so big; in this case, about 30%.

This is the interesting chart. What's happened in the large-message case is that you no longer see that jiggle, because you're no longer fragmenting your heap as much as you used to. The CPU on both sides looks pretty much the same. In this particular case, we were pretty happy with the performance improvement we got.

So now we'd like to shift focus to context-aware routing for the case of our stateful services, and what that means for Niantic. At Niantic, when you play Pokemon Go, the mobile client is assigned a server the first time it connects. That server is where the player state is maintained.
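The partial deserialization described above can be sketched like this: scan only the top-level fields, skip everything that isn't the routing field, and stop as soon as the key is found, never touching the payload bytes. The field numbers and sizes here are made up for illustration.

```python
def decode_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        result |= (byte & 0x7F) << shift
        pos += 1
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_varint(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)

def find_routing_key(buf, want_field):
    """Partial deserialization: skip over every field except the routing one,
    and return the moment it is found. The payload is never materialized."""
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_no, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            _, pos = decode_varint(buf, pos)
        elif wire_type == 1:
            pos += 8
        elif wire_type == 5:
            pos += 4
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            if field_no == want_field:
                return bytes(buf[pos:pos + length])   # stop early
            pos += length                             # skip without copying
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return None

# Field 1: a half-megabyte payload; field 2: the hypothetical routing key.
payload = b"\x00" * 500_000
msg = (bytes([1 << 3 | 2]) + encode_varint(len(payload)) + payload
       + bytes([2 << 3 | 2]) + encode_varint(5) + b"srv-7")
key = find_routing_key(msg, want_field=2)
```

This is why the memory savings were larger than the CPU savings in the small-message run: the work that disappears is mostly allocation and copying of fields that routing never needed.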
And from that point on, we want requests to be as fast as possible, more in the area of milliseconds than seconds. So how does it look? When a client connects for the first time, it sends a handshake request. The server then goes to the storage where the player state is kept and asks the storage to assign the player to itself. The storage then checks: is that possible? Is the player already assigned a server? If it is, the storage returns the ID of the server the player is already assigned to; if not, it says, sure, it's your client. At that point, the server replies to the client and tells it the ID of the server it needs to keep talking to, either the server itself or the server the player was already assigned to. The already-assigned case happens, by the way, on things like a restart or a player going offline for a couple of minutes.

So what does the client do at that point? As you can see (oh, I can use my mouse), the client takes the ID it just received from the server and sends it with every request. That request then reaches Envoy, which already has a routing mapping of listeners and clusters and knows which backend each ID maps to; it strips out the ID and sends the request there. The servers themselves, to make sure everything works correctly, talk to the storage where the player state is kept. In that storage we have mechanisms, of course, for timeouts when a client hasn't connected for a long time, and locks to make sure each client is assigned only a single server. All this flow adds a lot of latency. Now, it's true this usually happens only on first connect or after a long time without the client connecting.
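The handshake's assignment step can be sketched as follows, a minimal in-process stand-in for the real shared storage (the lock and the get-or-assign semantics are the point; the timeout machinery mentioned above is omitted, and all names are illustrative):

```python
import threading

class PlayerStore:
    """Sketch of the shared storage: each player is assigned exactly one
    server, guarded by a lock, as in the handshake flow described above."""
    def __init__(self):
        self._lock = threading.Lock()
        self._assigned = {}   # player_id -> server_id

    def assign(self, player_id, requesting_server):
        with self._lock:
            # If the player is already assigned (e.g. after a restart or a
            # short disconnect), return the existing server; otherwise grant
            # the player to the requesting server.
            return self._assigned.setdefault(player_id, requesting_server)

store = PlayerStore()
first = store.assign("player-42", "server-a")   # fresh: server-a wins
again = store.assign("player-42", "server-b")   # already assigned: server-a
```

Every round trip to this store sits on the request path during handshake, which is where the latency described next comes from.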
It means the request can take up to two seconds, just because that's the timeout we currently have configured on that connection, if, for example, the database starts to get very slow.

So what is the proposed solution? If we look at what Piau was just presenting, about how we can take the context out of the messages themselves: whenever the client sends a message, we can look in the header with other methods, but with our method we look in the content, apply a routing algorithm, and reroute. In our proposed solution, we take the client-to-server mapping and move it closer: in this case, the client-to-server mapping lives in the memory of the running Envoy. For every message that comes in, we pull the context, in this case the client ID, and check. If there's already a mapping, we know where to route it. If not, we take the selected load balancer algorithm, apply it, take whatever Envoy proposes, and update the mapping accordingly. So where before we needed the ID on the client and the storage in the backend, now the client no longer needs to know which server it's talking to; that no longer has any impact. Envoy holds the client-to-server mapping, and the servers themselves don't need to talk to the storage to make sure they're talking with the right client.

One of the important things to note here is that we guarantee 100% correctness. And why is that important? Well, first of all, most of you have already noticed that this works only if you have one Envoy. Because if the mapping is in memory, what happens with the rest of them? You now need a way to synchronize it. So our proposed solution was to add a service called the state coordinator. This is the origin of your data, your source of truth; this is where the client-to-server mapping actually lives.
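The check-the-mapping-or-apply-the-load-balancer step can be sketched like this. Round-robin stands in for whatever load-balancing policy Envoy is configured with; the class and backend names are hypothetical.

```python
import itertools

class InMemoryRouter:
    """Sketch of the proposed in-Envoy mapping: known clients hit the
    in-memory cache; unknown clients go through the configured LB algorithm
    and the chosen backend is recorded so they stick to it afterwards."""
    def __init__(self, backends):
        self._mapping = {}                      # client_id -> backend
        self._lb = itertools.cycle(backends)    # stand-in for Envoy's LB policy

    def route(self, client_id):
        if client_id not in self._mapping:
            self._mapping[client_id] = next(self._lb)   # first contact: ask the LB
        return self._mapping[client_id]                 # after that: cached

router = InMemoryRouter(["backend-a", "backend-b"])
first = router.route("client-1")    # LB picks a backend
sticky = router.route("client-1")   # same backend on every later request
```

This is the single-Envoy version of the idea; the state coordinator discussed next exists precisely because this dictionary cannot live in just one proxy's memory.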
Each Envoy has a cache of that mapping. Whenever a new client connects, Envoy checks whether it has it in the cache. If it's in the cache, great, it just routes it where it should go. If it's not, Envoy goes and asks the state coordinator to assign a server. One of the other things the state coordinator has is the ability to monitor which servers are alive. If a server is restarted, it can reroute clients, and it has a lot of logic to support that; it will reassign a new backend. The moment a new backend is assigned, your origin is already updated, and it replies with the server that Envoy needs to route to.

The mechanism by which we update the cache on each Envoy is very similar to the Cosmos DB change feed. I won't try to start explaining that one because it's quite involved, but there's a link here, and we'll upload the updated presentation with it. The way we update is that whenever a change happens on the client-to-server mapping, we just stream the change itself. That allows us to do fast updates and keeps the Envoys as up to date as possible.

Getting back to the subject of 100% correctness: if the state coordinator returns an error because it's not able to assign the client, Envoy will return an error. It will not try to route anywhere else. And that's how we actually replaced the stateful database and all the logic behind it with this component.

I'm sorry I talk fast, I do tend to talk fast, but: conclusions. In this case, we took a lot of the computation, the logic, and the state that was maintained in the backend and shifted it closer to where it's actually needed, which is where we're routing messages. It definitely improved performance. And then, when we started handling larger messages, we saw things costing us more and taking a bit more time.
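The coordinator-plus-cache arrangement can be sketched like this: the coordinator owns the source-of-truth mapping and streams each individual change to every subscribed Envoy cache (a change-feed-like mechanism), while each Envoy serves from its cache and falls back to the coordinator on a miss. This is an in-process sketch with hypothetical names, not the real networked service.

```python
class StateCoordinator:
    """Source of truth: owns the client->server mapping, assigns backends,
    and streams every change to subscribed Envoy caches."""
    def __init__(self, backends):
        self._backends = list(backends)
        self._mapping = {}
        self._subscribers = []

    def subscribe(self, cache):
        self._subscribers.append(cache)

    def assign(self, client_id):
        if not self._backends:
            # 100% correctness: on failure, the error propagates; nobody guesses.
            raise RuntimeError("no backend available")
        server = self._mapping.setdefault(
            client_id, self._backends[len(self._mapping) % len(self._backends)])
        for cache in self._subscribers:     # stream the change, not a snapshot
            cache[client_id] = server
        return server

class EnvoyCache:
    """Each Envoy keeps a local cache of the mapping, kept fresh by the
    coordinator's change stream, and asks the coordinator on a miss."""
    def __init__(self, coordinator):
        self.cache = {}
        coordinator.subscribe(self.cache)
        self._coordinator = coordinator

    def route(self, client_id):
        if client_id in self.cache:
            return self.cache[client_id]
        return self._coordinator.assign(client_id)   # errors propagate as-is

coord = StateCoordinator(["backend-a", "backend-b"])
envoy1, envoy2 = EnvoyCache(coord), EnvoyCache(coord)
server = envoy1.route("client-1")   # miss: coordinator assigns and streams
```

Streaming individual changes rather than periodically shipping the whole mapping is what keeps every Envoy's cache close to the origin between full syncs.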
And then we started investigating partial message deserialization to help with those large messages. So thank you, everybody, for coming. I'm Renana, and this is Piau. And we have a plug for our employer, if you find this type of work interesting: yes, Niantic is hiring. If this was interesting, we would be very happy. Any questions?