So we have Doug Lardo from Riot Games. Go ahead, Doug, take it away.

Hi. I'm going to apologize in advance: my presentation's a little less stiff than most of the ones I've seen so far, so if any of the jokes make you feel uncomfortable, let me know. My name is Doug, and I'm a data center architect at Riot Games. I've been working there for about three years now. I really like packets, whiteboards, Contrail, Python, and my dog, and I really hate pickles. So if you want to get on my good side, talk to me about any of those things.

Riot Games makes the largest online game in the world, League of Legends. We have a presence in the United States, South America, Europe, China; we're basically all over. It's a free-to-play game, and it consists of two teams. Each team has five players and a base that they protect, and each team is made up of different roles: one guy with a sword, another guy who can heal you, maybe someone shooting a bow and arrow, another guy casting spells. Each person has a unique role on that team, kind of like each player on a basketball court or a football field, and the two teams fight it out and attempt to destroy each other's base.

We have a ton of really passionate players from all over the world. We have world championships every year, and it's legally a sport now, so players from other parts of the world can get visas to come and play League of Legends. It's a pretty big deal, and it's pretty cool. But with that come very passionate players, and they really love their low ping times. They love low latency, and they expect the service to be up at all hours of the day, especially in the middle of the night when they're running 12- or 13-hour sessions. So we have a lot going on, and we definitely have demanding players. And we'd like to do the best for our players, because we're players ourselves. So if you ever want to talk about my favorite champion or what video games I play, feel free; I was just having that conversation a minute ago.

So let's get started. Long story short, our developers just can't ship code fast enough. As I was saying, we have a global footprint, our data centers are all over the world, and the designs range from very old to very new. We've been around probably seven or eight years now, and the first data centers were basically built out of duct tape and baling wire and whatever we could find at Costco or Best Buy that we could slap together under somebody's desk. Those data centers slowly evolved, and now we also have very modern designs: layer 3 Clos architectures, spine and leaf, very modern. But our developers were suffering, because they would try to ship to these different parts of the world and the data centers were all different. Our operations teams would go in and have to know where the firewall rules lived and how to change a VLAN, and it was different every time. So it took us months to actually deploy a new service, just because of so many differences across our data centers.

So what we did was build our own infrastructure-as-a-service platform, kind of containers-as-a-service, which we call RCluster. At its core, RCluster is Docker-based, and we use Contrail to connect all the different virtual networks to the containers. Everything is self-service, and everything is infrastructure-as-code-based.
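As a concrete sketch of the kind of blueprint Doug is about to describe: every field name below is invented for illustration, since the talk doesn't show the real schema; only the idea (applications, instance counts, locations, and app-to-app access instead of IP addresses) comes from the talk.

```python
# Hypothetical shape of a self-service blueprint, written as the
# Python equivalent of the JSON blob described in the talk.
blueprint = {
    "application": "match-history",      # the app this blueprint owns
    "instances": 4,                      # how many containers to run
    "locations": ["ams1", "las2"],       # which data centers
    "network": {
        # Access is expressed app-to-app; no IPs or VLANs ever
        # appear in the blueprint.
        "allow-from": ["game-platform", "player-accounts"],
    },
}
```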
You go in and you create a JSON blob that defines your application, how many instances of that application you want, and in what location. We also define the networking there, and I'm just going to focus on the networking part here. The one piece that we like to tout a lot is the declarative language we've developed. We never expose IP addresses to our customers. We always talk about applications and what application can talk to what other application, and we do that through this JSON blob that we call a self-service blueprint. Every application has its own security policy, so I own my application's security policy: we go in and say this source is allowed to talk to me, or that source is allowed to talk to me. And if someone wants access from one app to another, we use GitHub as a back end: they create a pull request on the owner's repository, the owner reviews it and approves it, the merge happens, and access is essentially granted.

This is awesome for the infrastructure team, because we're no longer the bottleneck. People can go in and do whatever they want, and when they get stuck, they come to us for assistance. So we've come a long way from trying to figure out where to configure a VLAN and what the next free IP address is, and from dealing with the idiosyncrasies of each data center. Now we're completely defining our infrastructure as code in an app-to-app language, and it's been really successful for us.

Under the hood, all these blueprints are stored in GitHub. David Press isn't here — he's my associate back home — but he's the champion of the RFCs, the internal documents we use to talk about ideas and decide what we're going to build. One of his ideas was to name all of our projects after the RFC that describes them, so this one is called RFC 291 Transformer. What it does is scrape in all those blueprints, so it has a global view of the world: it knows what apps can talk to what. Then it cranks out all the networking policies we're going to need and just tramples whatever is inside of Contrail. This job runs out of a Jenkins server every five minutes or so, and we just say: if it's in the blueprints, then that's what exists in Contrail.

What this does for us is effectively make Contrail stateless. Yes, we still track which Docker containers are attached to which virtual interfaces, but if we were to lose the whole Contrail cluster, we could rebuild everything from scratch, because we know everything is in code. Rebuilding a lost controller is no problem, and if we ever need to do disaster recovery, we can do that too.

Here's a quick overview of how this all works together. An engineer goes in and makes the blueprint, the blueprint gets rendered by the transformer and pushed to Contrail, and the iteration loop just continues as things change. One added benefit is that since all of our security policies live in one central source on GitHub, the security team can go in, look at the JSON, and say whether a policy is reasonable or not. They have one central point of truth that they know is definitely what's in effect in the infrastructure because, like I said, we trample everything in production every five minutes.
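A cartoon of that "trample everything" reconciliation loop might look like the following. This is a sketch, not the real RFC 291 Transformer: the `contrail` client and its `list_policies`/`apply_policy`/`delete_policy` methods are hypothetical stand-ins, not the actual Contrail API.

```python
from typing import Dict

def reconcile(desired: Dict[str, dict], contrail) -> None:
    """Make Contrail match the policies rendered from the blueprints.

    `desired` maps policy name -> policy body, rendered from the Git
    repo of blueprints. `contrail` stands in for some policy client.
    """
    current = set(contrail.list_policies())
    # If it's in the blueprints, it exists in Contrail: create or
    # overwrite every rendered policy unconditionally.
    for name, policy in desired.items():
        contrail.apply_policy(name, policy)
    # Anything in Contrail that isn't in code gets trampled.
    for name in current - set(desired):
        contrail.delete_policy(name)
```

Overwriting unconditionally on every run is what makes the loop idempotent, and it's why Contrail can be treated as effectively stateless: the Git repo is the only source of truth.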
That's much easier to reason about, since it's easier to audit, and they know who changed what, because they get that from GitHub. There are a bunch of benefits to having everything in code rather than in IP addresses: the developers have full visibility, and the security teams can audit.

And since we've now unlocked the potential for our developers to help us with their infrastructure problems, some of the developers said, well, I don't want to write JSON, this is annoying, I want a GUI. So without really talking to the network team — because they didn't have to talk to us — they just went off and made a GUI. And it's beautiful. They used a cool new framework, it's modern looking, it kind of looks like GitHub a little bit. They implemented all the business logic inside that system, so users get feedback early. It's just been great. We estimated that putting in this system probably saved us about three years' worth of manually editing access lists, and that's not including all the troubleshooting that would inherently come with that.

Any questions before I move on from the networking-as-code piece?

[Audience] Could you use the mic, for the remote attendees? You mentioned the security audit can take place out of band. Is that done as part of the process of the teams building their configuration files, or is it done afterwards? And if so, is that a bit of a risk?

Yeah. So we've got maybe five guys on the security team, and we have hundreds of developers. While the security team would love to sit there and audit every change, it's just not practical. So they make best-practice recommendations and periodically audit, but it's up to the developer to be in charge of their own application security; they ultimately own the risk. What we're trying to do as experts is advise them on why least privilege is good and why you shouldn't just allow 10/0. Have you ever logged into EC2 where developers configured their own policies? You look at it and go: why would you even do this? So there's that line there. We don't want to block people, but at the same time, we don't want to be insecure, so we're constantly adjusting that threshold. It's a tough balance. Anything else?

[Audience] You said the networking is stateless. So what happens if it loses connection with where you keep the state, and you restart a node?

The forwarding plane is already programmed from Contrail. If the controllers were completely erased, the cluster would collapse at that point, although there's a headless mode that would continue forwarding while the controllers are gone. But this is assuming total loss of the database — a crater hit it, or a power surge took everything out. We're talking disaster level. We could spin up three new controllers, fire up Ansible, push the button on the transformer, and all our policies would be back in there.

[Audience] Are you saying there is state in the controllers? Because what if the controllers are fine, but one of the compute nodes goes away and comes back? It's lost all its state, and you've kept your state outside the controllers. Is that right, or did I misunderstand?

I think you're talking more about the architecture of Contrail, right?
[Audience] You said Contrail is stateless, right? The way you put all of your configuration in an external database?

What I'm trying to say is that I can take that Contrail cluster and everything in it, destroy it, build a brand new cluster, push the config in, and it looks exactly the same as the old cluster.

[Audience] So there's still temporary state in Contrail and in your application, but your master state is offline?

Yeah. If you want to talk more about how the failover and the router flows work, we can talk about that after. All right, cool.

So, self-service networking right now only exists in RCluster. Our existing platform, which is League of Legends based, is still hanging around and hasn't changed very often. The core platform was built when the game was originally built, and those aren't cloud-native apps, so it's still in that world where the failure of a particular node is pretty disruptive. We've done a lot of work to bring up HA — putting in better database replication, for example — but the platform itself is still kind of fragile if the infrastructure gets ripped out from underneath it. So we're more or less not touching that part much. But as we add new systems, we're putting them as close to the platform as possible.

One thing we did recently is called Loot, which is a way for players to buy shards — kind of like chunks of stuff. After a game I might get a piece of a champion, and if I get three of those pieces, I can unlock the champion. It's a microtransaction, open-the-box, see-what-you-get kind of system. But it was really important for Loot to be very close to the database and the core platform, because there are so many transactions and messages going back and forth. We needed those two systems to coexist.

And to complicate things, a lot of our platforms are now moving entirely to Amazon. So if we want to keep our goal — developers get one ecosystem, they write to RCluster, and wherever they ship in the world it feels exactly the same — we needed to bring our infrastructure into Amazon, and we needed to do that elegantly and transparently to our developers. But as you probably know, Amazon is a challenge. "Oh, let's put Contrail on Amazon, it should be easy. Hey, what are you doing this afternoon? You want to spin up a vRouter? Oh yeah, no problem, it's all software." Wrong. It takes you months.

So we tried to just put a gateway right in there, figuring it was probably an easy thing to do. But at Riot everyone has a ton of ideas, and we realized this was a good inflection point; maybe we should consider our options. We thought about writing a brand new transformer that wrote things specifically for Amazon. We thought about ditching IP-based security entirely — why do we use IP addresses anyway? They're annoying. Couldn't we do some sort of PKI app-to-app encryption thing? I won't go into that, because it's just years of work. We could write to iptables — that's everywhere, right? Everyone loves iptables; there's no way that would cause administrative headaches or operational challenges. Well, it's an option. Or — the one we ultimately decided on as our first stab at the thing — we could put the gateway as close to Amazon as possible.
If we just bring a physical gateway and plug it in right outside the Amazon peering point, we can loop traffic out and bring it right back in again. Should be good, right? But we wanted to try the simplest thing first, which was: let's just spin up a gateway inside Amazon. We tried the vSRX — the vMX wasn't out yet. It was simple, and it kind of worked, though not without some caveats and challenges. You can use the vSRX, but you have to turn off the firewall part of its brain, because this doesn't really work unless you're in packet mode; so it's overkill. And then in Amazon you also have to disable source/destination checking, because the whole system does a reverse kind of check: did you send that? If that's not your IP, I'm chucking it. So you have to disable that everywhere. But what we found is that it only works within a single VPC. So we got these questions: why is this happening? Why would it work within a VPC and not across VPCs? This must be something Amazon is doing.

Probably most frustrating to me as a network engineer is that there's no concept of routing protocols. You can't run BGP inside of Amazon, or to Amazon. You can't run ECMP, you can't run OSPF, anything. You get one static route to one interface, and you can't even do two statics with a floating static. It's really hard; it's like the 90s. We couldn't believe there were no options. We talked to the product team about the roadmap and so on, but we needed something today. And without routing protocols, how are we going to do HA? How are we going to do N+1? What if we get to the biggest instance size and I can't send any more packets through this thing? I've got a million players who are pissed at me right now — what do I do? Here's a quick little diagram summarizing all that: basically, we tried putting a single gateway inside, and failed.

So, hardware gateways — those work, right? As I said before, we'd basically slap a gateway in the pops where we had Direct Connect peering and just try to leverage those. And this should be awesome, because we know the gateways work physically, and we can GRE tunnel in. What's a few more boxes? The Juniper sales guy who showed up that day loved it: "This is a great idea. Ship them all over the world."

The big thing, though, is that we get what I call the sad trombone: traffic heads all the way out to the pop, loops around, and then works its way all the way back. We measured this, we did some estimations, and we said, this should probably be fine; most of our apps aren't that sensitive, right? Well, yeah, they are, and they got cranky. Plus, it felt really weird: you're in the same VPC, but you're getting this ridiculously long delay. Why is this happening? Developers didn't understand it, and when we told them we made their traffic go to Jehunga and back, they wanted to throw us out. They thought we were kidding. They were not happy. And sometimes we didn't even have Direct Connect; we just didn't have that option. And some of our sites are in kind of remote parts of the world where we have no desire to build our own data center, so it kind of defeats the point: being able to build a site completely in the cloud is one of the reasons you buy cloud in the first place. If you're going to get a cloud and then bring your own hardware, it just feels terrible. So yeah, it just feels bad, man.
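One practical note on the source/destination check mentioned above: it's a per-network-interface attribute in EC2, and it can at least be scripted. A minimal sketch with boto3, where the region and instance ID are placeholders for one of the virtual gateways:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")
GATEWAY_INSTANCE = "i-0123456789abcdef0"  # placeholder gateway instance

# The check is per-ENI, so walk every network interface attached to
# the gateway instance and switch it off, so the instance can forward
# packets that don't carry its own addresses.
resp = ec2.describe_instances(InstanceIds=[GATEWAY_INSTANCE])
for reservation in resp["Reservations"]:
    for instance in reservation["Instances"]:
        for eni in instance["NetworkInterfaces"]:
            ec2.modify_network_interface_attribute(
                NetworkInterfaceId=eni["NetworkInterfaceId"],
                SourceDestCheck={"Value": False},
            )
```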
There's got to be a better way, right? We looked at the pieces and said, well, EC2 understands instances, and that's about it. So maybe there's a way we can dumb our routers down to speak "instance." We started doing the math: there's an eight-interface limit per instance, and we can put 30 IPs per interface, so that's a couple hundred IPs to work with. Okay, let's try it. And long story short, it ended up working.

One of the benefits is that it uses the native address space of the VPC. That was part of why those packets were being dropped when they moved between VPCs: Amazon has this virtual gateway you traverse between VPCs, and while the source/destination check that lives on each interface of each instance can be enabled and disabled, when you move between VPCs there's no checkbox, and no phone call you can make, that lets you turn it off. So what we ended up doing was steal a bunch of IPs from the VPC itself and bring them all into the router. From there, we can GRE tunnel to wherever we need to go, especially back to our data centers, where our core platform sometimes lives.

More importantly — and I won't go too much into this — we've built a global backbone, basically to get players into our network quicker. One reason is to have no politics, as in: AT&T is peering with Cox or Verizon or whoever (not to pick on AT&T), there's contention over who's going to pay for the bandwidth, and there's this indirect issue where players coming from some directions get a brownout that we can't pinpoint, because our stuff looks fine while all those players are in pain. So we bring them into our network quicker, shoot them across the MPLS backbone, and then dump them in. It's also a huge DDoS mitigation network: we leverage anycast, and we have these boxes we call unicorns out at the edges doing mitigation. So we really want to make sure our players come in through our edge, but we want to leverage the compute within Amazon.

So it works. No more sad trombone; we leverage our edge, and we can still use GRE tunnels to do whatever we need. A couple of little things we still needed to iron out: you have to monitor the gateways and make sure they don't die, and to do failover effectively, you need to make a bunch of API calls into EC2. If I have four gateways, each with 200 IPs on it, I need a fifth one sitting empty — the N+1 node. When I notice gateway one is down, I run to EC2 and say: for all those secondary IP addresses that were previously living on router one, quick, reprogram them and move them over to gateway five. And that happens pretty quickly, in about two seconds. So as long as you can detect the outage and have a script that moves the IPs, it happens pretty quick.
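The "reprogram the secondary IPs" step maps onto a real EC2 API, `assign_private_ip_addresses` with `AllowReassignment`. Here's a minimal sketch with boto3; the ENI IDs are placeholders, and the health-check logic that decides a gateway is dead is omitted:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

FAILED_ENI = "eni-0aaaaaaaaaaaaaaaa"  # interface on the dead gateway
SPARE_ENI = "eni-0bbbbbbbbbbbbbbbb"   # interface on the empty N+1 gateway

def fail_over() -> None:
    # Find every secondary private IP currently on the dead gateway.
    eni = ec2.describe_network_interfaces(
        NetworkInterfaceIds=[FAILED_ENI]
    )["NetworkInterfaces"][0]
    secondaries = [
        addr["PrivateIpAddress"]
        for addr in eni["PrivateIpAddresses"]
        if not addr["Primary"]
    ]
    # AllowReassignment lets EC2 move the addresses even though they
    # are still assigned to the failed interface.
    ec2.assign_private_ip_addresses(
        NetworkInterfaceId=SPARE_ENI,
        PrivateIpAddresses=secondaries,
        AllowReassignment=True,
    )
```

In practice you'd run one reassignment per interface on the failed gateway; the few seconds Doug quotes is roughly the API round-trip once the outage is detected.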
One other thing that's nice is that it's N+1 scalable now. We're getting around a gig or so of traffic out of a virtual MX right now, so if we need more, we just add another virtual MX, and Contrail sees it as an equal-cost next hop. It scales pretty nicely.

A few optimizations if you're going to try this yourself. You don't need to put IPs on the gateway interfaces; just configure family inet on them, and traffic still goes in. You save yourself seven IPs there, and it simplifies management too. And like I said before, we're using the hardware gateways for the public edge, which also gives us ECMP, which is really nice. You can't get ECMP within Amazon, because it has no notion of such a thing; it relies on DNS or an ELB. But if you pipe traffic out to an external gateway, you can anycast it and then ECMP that traffic over GRE tunnels to your compute nodes. One other thing we did is take the same idea but make a centralized VPC: we took all the gateway vMXs we wanted and put them in a central VPC, and as teams spin up their own VPCs, or we spin up more clusters, we just peer with that through VPC-to-VPC peering. Then we can scale that central resource, as opposed to each team having to own and run their own set of gateways.
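The peering side of that centralized-VPC design is also scriptable. A minimal sketch with boto3, assuming both VPCs live in the same account and region; all IDs and the CIDR are placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

GATEWAY_VPC = "vpc-0aaaaaaaaaaaaaaaa"       # central VPC holding the vMXs
TENANT_VPC = "vpc-0bbbbbbbbbbbbbbbb"        # a team's new VPC
TENANT_ROUTE_TABLE = "rtb-0cccccccccccccccc"

# Peer the tenant VPC with the central gateway VPC. Same-account
# peering can be accepted immediately by the same caller.
peering = ec2.create_vpc_peering_connection(
    VpcId=TENANT_VPC, PeerVpcId=GATEWAY_VPC
)["VpcPeeringConnection"]
ec2.accept_vpc_peering_connection(
    VpcPeeringConnectionId=peering["VpcPeeringConnectionId"]
)

# Point tenant traffic for the rest of the network at the peering
# connection (destination CIDR is a placeholder).
ec2.create_route(
    RouteTableId=TENANT_ROUTE_TABLE,
    DestinationCidrBlock="10.0.0.0/8",
    VpcPeeringConnectionId=peering["VpcPeeringConnectionId"],
)
```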
This brings me to my hopes and dreams. Wouldn't it be great if I didn't have to deal with gateways? That's by far the most contentious part of the solution. When you compare Contrail with, say, Calico, the first thing people bring up is that you don't need gateways; it just magically works. But part of the value of Contrail is that you can do service chaining and all these intelligent things with it. So we asked: what if we just put packets right on the wire, would that work? If a top-of-rack switch could just learn the routes for the tenants from the central controller, that would fix all of my BGP/IGP-style Clos fabric routing: I'd know the next hop for this tenant's destination is the compute node, I'd put policy on it just like I always would, and when I need to service chain, I could still leverage the tunnels. We're hoping this can come to fruition, because we really view it as the best of both worlds. You could put in a compute node and have traffic immediately exit the vRouter if you don't need multi-tenancy, which is great for the enterprise, but it also works great on Amazon, because you don't have to deal with any of the complications we're working with right now. So please, Contrail, please.

So yeah, that's the end. There's a cool League of Legends graphic, and — any questions? Thank you.

[Audience] What sort of services are you looking to have chained?

We have no services chained right now, actually. We've looked into it a little bit. For most of what would be service chaining, we actually have the application developers leverage service discovery: my app looks up in a central database which other apps it needs to talk to, and then they just create TCP connections between them. So we haven't had the need to use the network to bend traffic. But we've thought about it, maybe for an IPS or some sort of transparent bump-in-the-wire service that we want to hide so people can't see it. Or — I won't tell you what the project was actually called, but we'll call it latency-as-a-service, named after a particular ISP I picked out. We'd want to put that in and slow everyone's packets down to simulate a poor network connection, so we could test what that experience would be like. So those are some things we've thought about.

[Audience] I've done that sort of use case before with Calico; let's talk offline about ways we can do that.

Yeah, yeah, absolutely. Any other questions? My favorite champion is Thresh, thanks for asking.

Thank you, Doug, for everything. Let's give him another round of applause.