Hi everybody, this talk is What We Learned from the Gateway API: Designing Linkerd's New Policy CRDs. This room is so big, I don't know where to look. This is a talk by my colleague, Matei David. He was not able to make it, unfortunately, because of visa issues, so I'm giving this talk instead of him; hopefully I do a good job with it. My name is Alex, I'm a software engineer at Buoyant. We're the creators of Linkerd. I've been a Linkerd maintainer since the beginning of the project, and it's something I'm really passionate about. I think service meshes are very, very cool, and we have recently done a lot of really interesting stuff with the Gateway API that I want to talk about.

Okay, so show of hands: who is familiar with Linkerd here? Okay, a lot of people, great. What about who's familiar with the Gateway API? Still a good number, but fewer, okay, cool. So I guess I don't have to spend too much time talking about Linkerd itself. It's a service mesh, it's a graduated project in the CNCF, it's used in production in many different places by many different people, and it has a very collaborative and active open source community. As a service mesh, that means there's a sidecar proxy in every pod that's part of the mesh, and it adds a lot of really interesting functionality: mTLS between all your services, reliability features like retries and timeouts, and a lot of observability features like layer-seven metrics on success rate, request rate, and latency. And all with a major focus on operational simplicity: the whole thing works out of the box and doesn't require you to dedicate a lot of brain power to making it work. The main focuses behind Linkerd are that it's supposed to be very, very lightweight, both in terms of resource usage, meaning it doesn't take up a lot of memory and doesn't add a lot of latency, but also lightweight conceptually, in that it's a very simple model that you don't have to think too much about, and you can operate it without getting bogged down in the details. Being simple and secure right out of the box has been a guiding principle for the project. The slide says the control plane is written in Go, but it's actually now a combination of Go and Rust, and the data plane is a custom-built proxy written in Rust, built to be ultralight.

Okay, so I'm going to give a little bit of background here about how authorization works in Linkerd. This is a fairly new feature that was added in Linkerd 2.11, I think earlier this year or maybe late last year. That sets the stage for why we care about the Gateway API and how it's going to help us. The idea is that we wanted to add a way to do authorization in Linkerd. To take a step back: Linkerd has mTLS, and it's had mTLS for a very long time. What that means is that every pod in your service mesh has a workload identity, built up from its service account token, which gets transformed into a certificate that the pod uses for all of its communication to other pods. So when two pods talk to each other, the "m" in mTLS means that they can mutually authenticate their identities. If I'm the foo pod and I'm talking to the bar pod, I can cryptographically know that the party on the other side of this connection has the bar identity, and likewise they know that the party on my side of the connection has the foo identity.
And this is really awesome, because it means that not only is that connection encrypted, but we also know who the other party is, and we know that securely. We've had that in Linkerd for a really long time, but what we haven't had is a way to act on that information, a way to do authorization, which is to say: I'm only going to allow certain identities to talk to me, or I'm only going to authorize certain parties to establish connections. We wanted to add that, and it was a major focus of Linkerd 2.11.

As we were designing that feature, one of the things we really had to think about is: where does that configuration live? As I mentioned on one of the earlier slides, one of the guiding principles of Linkerd is to minimize configuration, make it easy to use, make it just work out of the box. But there are a few places where you really do need user input in order to decide what the behavior should be. We can't automatically determine who should be allowed to talk to whom. That really has to be configured, because only the people operating the cluster know what that behavior is supposed to be. So this began the thought process of where we should put this configuration.

A natural place to put that configuration would be on the service, right? When we think about services in Kubernetes, we think: hey, I want to restrict access to this service to only certain identities. So that seems like a natural place to put that configuration, to somehow attach that authorization policy onto a service. But as we got into this, what we realized is that the service might not actually be the best place for it. The reason is that when you think of a service, there are actually two concepts that are sometimes co-mingled. I think the Gateway API refers to them as service front ends and service back ends; I like to think of them as service targets and service receivers. What I mean is that the service has two parts. There's the service front end, or the service target, which is usually a cluster IP or a DNS record. That's something a client sends traffic to, so it's a target. On the other hand, there's the place where the traffic actually goes after it's sent to that service: usually a list of endpoints, or a list of pods, or a list of back ends. That's where the traffic goes when you send it to the service. A Service object co-mingles those two concepts.

If you wanted to attach authorization policy onto that, you run into a problem: you can have back-end pods which are in multiple services, and you can have back-end pods which are not in any service. So when a back-end pod receives traffic, it really doesn't know which service was used, which service was targeted, to send traffic to it. In other words, if we had authorization policy attached to these services, we wouldn't know which policy to use. And furthermore, if a client connected directly to that pod without using a service, does that mean it should bypass the authorization policy? Of course, this doesn't make sense. So we had to come up with a different resource to encapsulate this idea of a traffic receiver rather than a traffic target. In other words, put the authorization policy directly on the service back end, not on the service front end.
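To make that ambiguity concrete, here's a minimal sketch (the names are illustrative): two ordinary Services whose selectors match the same pods. When a request arrives at one of those pods, nothing in the request records which Service, if any, the client targeted, so there would be no way to pick between per-Service policies.

```yaml
# Two Services that both select the same back-end pods. A request
# arriving at one of those pods carries no record of which Service
# (front end) the client used, or whether a Service was used at all,
# so policy attached to a Service can't be resolved at the receiver.
apiVersion: v1
kind: Service
metadata:
  name: bar
spec:
  selector:
    app: bar
  ports:
    - port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: bar-internal
spec:
  selector:
    app: bar        # same pods as the "bar" Service
  ports:
    - port: 80
```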
So we came up with these new resources: Server and ServerAuthorization. If we take a look at what's in the Server resource, it's very, very simple. There's just a pod selector, which selects which pods this Server refers to, and a port. This is another difference from services: a Service can have many ports defined, and you can actually use a Service to target a port that's not defined on that Service. A Server, by contrast, is specifically about one single port. So it defines a traffic receiver very precisely, and therefore we can set authorization policy on that Server.

We can say: for this Server, here are the clients who I want to authorize, who should be allowed to talk to the Server. That's what the ServerAuthorization resource is. It's a resource which selects which Server it applies to, and then gives a list of clients that are allowed to connect to it. That list of clients can be defined in a number of ways. You can say: I'm going to allow anyone, unauthenticated, anyone's allowed to connect. Or: anyone, as long as they're in the mesh and have a valid identity. Or you can restrict it down and say only these specific identities are allowed. Those two resources together make up the Linkerd authorization primitives.

Taking a step back, this is what that looks like. On the right there's the bar pod, which is the one we're trying to control access to. Above it, there's a Server defined for it, which selects a certain port; in this case, we're just talking about port 80. And above that, there's the ServerAuthorization, which says: here are the clients which are allowed to connect to this Server. In this case, we're only going to allow mTLS connections from pods that have the foo service account identity. On the left-hand side of the slide, at the top, there's the baz pod, which does not have the right identity, so it's not allowed to connect, and that connection will be rejected by Linkerd. And at the bottom, the foo pod, which does have the right identity, so it's allowed to connect.
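As a sketch, the Server and ServerAuthorization from that foo/bar example might look roughly like this (the namespace and resource names are illustrative; the shape follows the policy.linkerd.io types):

```yaml
# A Server naming one port on the bar pods as a traffic receiver.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  namespace: demo
  name: bar-http
spec:
  podSelector:
    matchLabels:
      app: bar
  port: 80
  proxyProtocol: HTTP/1
---
# A ServerAuthorization allowing only mTLS connections from clients
# running under the "foo" service account; baz's connections are denied.
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: demo
  name: bar-http-foo-only
spec:
  server:
    name: bar-http
  client:
    meshTLS:
      serviceAccounts:
        - name: foo
```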
So what does this have to do with the Gateway API? Well, let me give a little bit of an overview of what the Gateway API is, for those who aren't familiar. The Gateway API is a set of Kubernetes resources that are very useful for defining the behavior of gateways. One of the key ideas is that there are a bunch of different resources which exist at different layers and are owned by different personas. At the top there's the GatewayClass, which is a resource representing a type of gateway. Below that you have the Gateway resource itself, and these might be owned by cluster operators. And below that you've got HTTPRoutes, which can be owned by individual application developers. Those attach up to the Gateways and say: for this route, I'd like to attach to this Gateway, and if you get any traffic matching this route, please send it to this service; here's another route, and if you get any traffic for it, please send it to this other service. So there are some cool ideas in here: resources at different layers which attach up to each other, and which can be owned by different personas.

This is how that looks as ingress traffic is coming in. The traffic first goes to the Gateway, which then looks at which HTTPRoutes are attached to it, and whichever route matches that traffic determines where the request goes.

So what does this have to do with service meshes, and how is this useful? All of this is about ingress, which is not something that Linkerd really does right now; Linkerd is more interested in managing east-west, service-to-service traffic within the cluster. So how does this apply? If you take a look at the structure of the HTTPRoute resource, there are a few interesting parts to it. At the top right you'll see there's a parent ref on the HTTPRoute, which describes what that route attaches to. In all of the ingress cases, that was attaching up to a Gateway, but you could imagine it attaching to something else. You could imagine having an HTTPRoute which is attached to a server or a service in your cluster, and defining policy for service mesh traffic rather than for ingress traffic. Down at the bottom, you see each rule is made up of two parts. There's the match, which defines what kind of traffic matches that rule; that's usually path-based matching, or header-based matching, or a combination. And then on the right you have the behaviors that should be taken for traffic which matches that route: either a set of filters that apply some logic, or some backend refs that indicate where that traffic should go. The slide shows an example of what that spec might look like in YAML.

The idea that we really latched onto here, which we thought was really interesting, is that this gives us a way to talk about routes, which allows us to specify policy in a more fine-grained way. It means that when we're talking about authorization policy, we don't have to authorize an entire server and say: for this server, here are the clients that are allowed to talk to it. We can be more specific and say: we actually only want to authorize these clients for this route on this server. For example, if you have an admin server, you might want unauthenticated access to the liveness and readiness probe endpoints, because those are going to be hit by the kubelet. You might want to make sure that Prometheus has access to scrape the metrics endpoint. And you might want to further lock down other endpoints that let people do administrative tasks, and make sure those are only accessible to the people, or service accounts, that should really have access to them. The other interesting thing is that this gives us a way to attach not just authorization policies, but potentially other kinds of policies as well, onto these resources.

So this is the structure that informed Linkerd. We adopted the HTTPRoute type from the Gateway API; we had to modify it a little bit to fit our purposes, but we now have support for HTTPRoutes, and those can attach onto Servers, so that you can authorize routes rather than authorizing an entire service. That gives you more fine-grained control. We also made this a little more generic, so that an authorization policy can target a wide range of different resources, depending on how granular you want that authorization to be.
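As a sketch of how this fits together in the Linkerd 2.12 policy API (the namespace and names here are illustrative), here's an HTTPRoute whose parentRef is a Server rather than a Gateway, together with an AuthorizationPolicy that authorizes Prometheus for just that route:

```yaml
# An HTTPRoute attached to a Linkerd Server (instead of a Gateway),
# matching only the metrics endpoint on the admin port.
apiVersion: policy.linkerd.io/v1alpha1
kind: HTTPRoute
metadata:
  namespace: demo
  name: bar-metrics
spec:
  parentRefs:
    - group: policy.linkerd.io
      kind: Server
      name: bar-admin
  rules:
    - matches:
        - path:
            type: Exact
            value: "/metrics"
---
# A MeshTLSAuthentication naming which identities count as "Prometheus".
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  namespace: demo
  name: prometheus-authn
spec:
  identityRefs:
    - kind: ServiceAccount
      name: prometheus
      namespace: monitoring
---
# An AuthorizationPolicy authorizing that identity for just this route.
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  namespace: demo
  name: bar-metrics-prometheus
spec:
  targetRef:
    group: policy.linkerd.io
    kind: HTTPRoute
    name: bar-metrics
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: MeshTLSAuthentication
      name: prometheus-authn
```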
So you can authorize access to an entire namespace, to a specific resource, or, even more specifically, to a specific route. This was added in Linkerd 2.12, which was released fairly recently, so we now have much more granular support for the way you can do server authorization.

Okay, so looking to the future: what's coming next? If anyone is familiar with some of the more advanced features in Linkerd, one of them is called service profiles. Service profiles are a feature in Linkerd that allows you to configure things like retries, retry budgets, and timeouts, and to do that on a per-route basis. You can say: these routes are retryable, these routes have timeouts. That's going to sound pretty similar to the stuff we've been talking about, which is attaching policy onto HTTPRoutes defined in the Gateway API. These are two parallel ways of doing the same thing, and we want to slowly unify them: move away from service profiles and toward this Gateway API style of policy attachment, where you have resources which represent policies, like retryability or timeouts or authorization, and you attach those policies onto HTTPRoutes. This gives us an approach that's a lot more consistent with the way things are done in the Gateway API, and feels more Kubernetes-native than what we had before.
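To illustrate that direction, here's a purely hypothetical sketch in the Gateway API's policy-attachment style (GEP-713): a standalone retry policy attached to an HTTPRoute via a targetRef. Neither this group nor this kind existed in Linkerd at the time of the talk; the fields are modeled loosely on service profile retry budgets.

```yaml
# Hypothetical resource, for illustration only: a per-route retry
# policy attached to an HTTPRoute via targetRef, following the
# Gateway API policy-attachment pattern.
apiVersion: example.linkerd.io/v1alpha1
kind: RetryPolicy
metadata:
  namespace: demo
  name: bar-retries
spec:
  targetRef:
    group: policy.linkerd.io
    kind: HTTPRoute
    name: bar-metrics
  isRetryable: true
  retryBudget:
    retryRatio: 0.2          # allow up to 20% additional load from retries
    minRetriesPerSecond: 10  # floor so low-traffic routes can still retry
```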
So why are we doing this? What is the purpose of adopting the Gateway API? I think there are a few reasons why the Gateway API is really interesting for defining these types of things. One of the primary ones is that the Gateway API types give us a standard way to define this, rather than having to come up with new structures or new types ourselves. Rather than needing to maintain our own version of HTTPRoute, we can rely on what already exists in the Gateway API; that can be maintained independently of Linkerd, and we know it's going to be kept up to date. That's a lot easier for us as maintainers. It's also a lot easier for adopters who are looking to use these technologies: if they're familiar with the Gateway API because they've learned Kubernetes, then when they're trying to adopt Linkerd they don't have to go and learn an entirely new set of APIs, an entirely new set of resources, and an entirely new set of concepts. They can just use what they already know, and it should feel intuitive and natural. And the Gateway API is just a well-designed API. It's very well thought out, it has extensibility points built in in places that make sense, and it's really nice and intuitive to work with.

The cons of adopting the Gateway API are that it's still fairly new and still evolving. There may be things we don't know yet or that we discover over time, and there's always the possibility of API churn as it continues to mature. But I think being part of that conversation and adopting it now makes a lot of sense for the project, and I'm very excited to see where it goes. In fact, the Gateway API was originally intended to be used for gateways, but there's an initiative called GAMMA, the Gateway API for Mesh Management and Administration, which is working through how the Gateway API should evolve to accommodate service mesh use cases. We're very involved in these discussions, and if this is something that's of interest to you, I highly recommend you get involved as well. As these discussions continue and we continue to work on this, we're going to be at the forefront of supporting whatever GAMMA recommends for using the Gateway API in a service mesh context.

Okay, so what are the main ideas here? I think the main thing we learned from working with the Gateway API, and trying to adopt it and use it for Linkerd, is that this difference between service front ends and service back ends was really important. Reasoning about the difference between a traffic target, which in the case of a service is something like a cluster IP or a DNS record, and a traffic back end or receiver, which is the actual endpoint or pod receiving that traffic, matters; it's really important to keep that distinction clear in your head, otherwise things can get very, very muddy. The Gateway API does a really good job of calling out that difference explicitly. The other idea we really wanted to lean into is to not reinvent the wheel. We wanted to use the Gateway API types because they made sense for us, and it didn't make sense to come up with those APIs on our own when we could be engaging with the community and playing nice in the ecosystem. And this policy-attachment idea from the Gateway API, where you have policy resources that represent things like client-side policy (routing, circuit breaking, retries, timeouts) or server authorization, and attach them onto HTTPRoutes or onto higher-level resources, is a really nice framework. It makes a lot of sense for gateways, and it also makes a lot of sense for meshes, so we wanted to lean into that.

If you're interested in diving deeper into a lot of these concepts, around the way that Linkerd uses the Gateway API or other deep-dive Linkerd topics, there is a Service Mesh Academy that I can highly recommend. It's a monthly hands-on training; the next one looks like it's from November 11th to 17th, and if that's of interest to you, I highly recommend you check it out at buoyant.io/SMA. And if you're interested in running Linkerd while taking some of the administrative burden off your team, there's also a fully managed Linkerd called Buoyant Cloud, which can handle things like automated upgrades, version tracking, certificate rotation, and a bunch of other administrative tasks that would otherwise fall to the cluster administrator. So if that's of interest, I highly recommend you check that out as well.

I also want to shout out once again that this talk is by my colleague Matei. He did a really good job with it; if you liked this talk, please contact him on Slack or on Twitter and let him know. I can't take credit. But other than that, thanks for listening, and I'm happy to take any questions.

You mentioned one of the cons. Is this a replacement for your ingress, or can it complement your ingress, like Traefik?

It complements your ingress. Your ingress may be using the Gateway API, but we are also using the Gateway API for east-west traffic inside the cluster.

Awesome, thank you. Staying with the cons for a moment: the other con you talked about was that the Gateway API is kind of new, especially when working with a service mesh. So, we don't have a service mesh yet.
We have ingresses that are Kong, and I understand Kong can work with the Gateway API. So if I go back to my team and say, okay, let's start using the Gateway API for our Kong stuff, and now we can start getting into using Linkerd through the Gateway API as well, am I jumping the gun? Is this production-ready? Or is it too early to be thinking that way?

No, I don't think you're jumping the gun. I think these are mostly separate concerns, right? You've got Kong for your ingress; you perhaps have service mesh needs, and you can use Linkerd for your service mesh. I think there's going to be a bit of a confluence in the future between the types that they use: if you have HTTPRoutes defined on your ingress, you may eventually be able to use those to attach policy that Linkerd will respect. But there's no real conflict between the two.