We're going to go ahead and get started. Thank you all for coming on a Friday; I really appreciate everyone coming out for our talk. The title of our talk is One API to Rule Them All. I'm Keith Mattix. I'm a senior engineering lead for the Open Service Mesh project at Microsoft. I'm also a co-lead of the GAMMA initiative, which we'll talk about soon. I'm John Howard. I'm a software engineer at Google working on the Istio project. I'm also a co-lead on the GAMMA initiative, which we'll learn about soon. Oops, wrong way.

All right. Before we get into it, I just want to give a very, very brief, I promise, history of where we're coming from, to give some more context on where we're going. So in the beginning, Kubernetes was launched. It gave us a conference to go to and talk about things. It had some service mesh features — it has Service as a resource, so it gives you some amount of functionality there, but not all the rich functionality that we want out of a service mesh today. So pretty shortly after, new service mesh products started to pop up. Linkerd2 and Istio were some of the first ones, bringing things like HTTP and gRPC load balancing and routing, mTLS, telemetry and tracing, and much, much more. Over the years, a lot of other products started popping up as well, giving more and more service mesh implementations, each one offering its own feature set and custom API.

The custom API part is really what we're going to be focusing on today. So if you look at a service mesh, maybe you want to do a canary deployment — that's what you're adopting a service mesh for — or some other traffic routing mechanism. In the landscape today, if you want to do that, you need to decide: do I want to do it with a VirtualService, a TrafficRoute, a ServiceRouter, a ServiceProfile, a VirtualRouter, and many, many others? These are all real APIs, from different service mesh offerings, for configuring the same thing. And while they all do the same thing, or roughly the same thing, they all have different communities, different documentation, testing, and ecosystems built around them, and slightly different semantics. So it becomes a bit of a mess to figure out what you want to do for something that's actually fairly standard and simple, like a canary rollout.

Yeah, and so this landscape — I'll go back a slide just to really hone in on this — this landscape is a lot to work with. When you are somebody who's trying to build software on top of a service mesh, when you are trying to provide tooling to the ecosystem, trying to make your way through this proliferation of resources is difficult. And honestly, it prevented real innovation in the space. And so circa 2018 or so, the SMI spec was launched. SMI stands for Service Mesh Interface, and the goal of the project was to create a single set of resources, of APIs, that are supported by all service mesh offerings. And we saw good success with SMI. Projects like Flagger and Argo Rollouts supported the Service Mesh Interface as a common specification for service mesh implementations to use, and they would add functionality like canary deployments and progressive rollouts, A/B testing, experiments, things of that nature. And the standard looks something like this. These are both SMI resources. You've got your HTTPRouteGroup that matches an HTTP header — say, a user agent containing Firefox somewhere.
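Roughly, the pair looks like this. This is a sketch put together from the SMI spec rather than copied off the slide, so the exact apiVersions and field shapes may differ from what was shown:

```yaml
# SMI route group: classify traffic whose User-Agent matches Firefox.
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup
metadata:
  name: website-routes
spec:
  matches:
  - name: firefox-users
    pathRegex: ".*"
    methods: ["*"]
    headers:
    - user-agent: ".*Firefox.*"   # regex match on the User-Agent header
---
# SMI traffic split: shift all matched traffic to website-v2.
apiVersion: split.smi-spec.io/v1alpha4
kind: TrafficSplit
metadata:
  name: website-split
spec:
  service: website            # the apex service that clients address
  matches:
  - kind: HTTPRouteGroup
    name: website-routes      # reference to the route group above
  backends:
  - service: website-v1
    weight: 0                 # no traffic to v1
  - service: website-v2
    weight: 100               # everything to v2
```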
And then you've got that HTTPRouteGroup referenced in a TrafficSplit resource. In this example specifically, there's no traffic going to website-v1; all the traffic goes to website-v2. But as you can see, that's configurable, and it matches the route group that we referred to earlier. And so SMI provided a single interface for this kind of functionality across meshes, and we saw it take off. Linkerd implements a couple of SMI resources; Open Service Mesh, the mesh I work on, implements a lot of the SMI spec. And things were pretty good.

But over time, we started to see some cracks in the foundation. SMI, because of where it started, made some design decisions in its specifications that were incongruent with where the community decided to go overall when it comes to routing resources. Its authorization policy was very route- and service-account-focused. There was no space to insert workload selectors or things of that nature; there was no concept of a workload in general within the SMI spec. There is a part of the SMI spec aiming towards unified traffic metrics, but about six months after that spec was introduced, OpenTelemetry became a thing, and we weren't ready for that. We love the work that the OpenTelemetry project is doing — they've got a lot of exciting news that we learned about here during this conference — but we already had one direction we were going, and OpenTelemetry went a different direction. And so there was some friction in the community at that point. We had some key maintainers move on to other projects, and we struggled to gain more widespread adoption and usage, and to move resources from alpha to beta. And fast-forwarding now, doing a bit of a time skip to the present day: we're re-evaluating the future of the SMI spec. So if you use SMI, if you've got opinions, come talk to me. Let me know. I'd love to get some feedback from the community there.

So we've talked all about service mesh here, but if you look over into the ingress world of Kubernetes, we saw kind of the same thing happening. From the beginning, Kubernetes had this Ingress resource to describe how to get traffic into your cluster, and it started out as one unified API that would be implemented by a lot of different vendors. And there are tons of vendors that implement this API. But the problem was that it was so simple that it really had this lowest-common-denominator feel to it, and there weren't really points to extend the API for adding vendor-specific things. So what started to happen was implementations would add all sorts of annotations — I think Nginx has over 100 different annotations you can add to customize things.
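Just to illustrate the pattern: the annotation keys below are real ingress-nginx ones, but the manifest itself is a made-up example, not something from our slides:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: website
  annotations:
    # Behavior the core Ingress API can't express gets bolted on
    # through implementation-specific annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-body-size: "8m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: website
            port:
              number: 80
```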
Or implementations would just make their own completely different APIs — like in Istio, that's the Gateway, which is a confusingly overlapping name — and there are many other ones as well. And so over and over again, we saw the same pattern: there are no extension points, it's not portable, and we have all this mess in the ecosystem.

So that's where the Gateway API comes in. This is kind of new-ish — I guess it's actually been around a while now; we were talking about this last year as well. It's been introduced into Kubernetes as a unified service networking model for Kubernetes. It's learned from the mistakes of both the Ingress API and all the other vendor APIs that have been around for the past few years — learned how users want to configure things, what the right way to arrange resources is, how to correctly hand those to different roles, all sorts of things like that. And one of the key parts of it is that it's extensible. While a common core will be there in the API, there's always going to be some obscure vendor thing that they want to add to these APIs, right? So extensibility is built into the core to try to ensure that we don't get that same proliferation of annotations, people forking off new resources, that sort of thing. And this API has been really successful. It recently graduated to beta. There are over 15 implementations — you can see a whole list here, and I'm sure more to come as well. So this has actually been going quite well for the ingress space.

So, just a brief overview — there are actually some deep dives in other talks you can go to this week if you want to learn more, and I have a bunch of links at the end. Rather than a single resource like Ingress, the Gateway API has been split up into many different resources to give a role-oriented resource model, so that different roles can configure different resources and they can be segmented with RBAC, et cetera. The core resource really is Gateway, which is like the entry point to your cluster; that typically would actually go provision a real load balancer, or a real pod that implements the proxying. And then application developers can add HTTPRoutes — and there are also TCPRoutes, TLSRoutes, UDPRoutes, and now even gRPCRoutes — to add routes to this load balancer. And you can see in this example, we have two different application developers living in two different namespaces that are both configuring the shared Gateway resource. Here's a concrete example of what this may look like, for probably the simplest possible route: just exposing this foo service through the foo gateway. And you can see these resources are interconnected by references to each other.
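A minimal sketch of that pairing — the gateway class, namespaces, and port here are illustrative stand-ins, not the slide's exact values:

```yaml
# Owned by the cluster operator: provisions the actual entry point.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: foo-gateway
  namespace: infra
spec:
  gatewayClassName: example    # selects which implementation backs this Gateway
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: All              # let app namespaces attach their routes
---
# Owned by the application developer: attaches to the shared Gateway.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: foo
  namespace: app
spec:
  parentRefs:
  - name: foo-gateway
    namespace: infra           # cross-namespace reference to the Gateway
  rules:
  - backendRefs:
    - name: foo                # the foo Service
      port: 8080
```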
So I'm gonna... no, no, I'm gonna stay on this slide for a bit and just back up and think about the narrative that we've been talking about here. These two initiatives, Gateway API and SMI, happened around the same time, and it was a bunch of people coming together and trying to solve the same problems. And as an SMI maintainer, I became aware of what was happening over in Gateway API world for ingress, and a thought occurred to me: why does service mesh networking need to be standardized in a separate workstream from ingress networking? And so I sent — this is an excerpt from an email I sent to a lot of my very talented colleagues in the service mesh community, and I'll read it here: "The features present in the spec" — referring to Gateway API — "are by no means specific to ingress traffic. In fact, just about every project represented here" — in the email thread — "exposes near-identical functionality to their users. Therefore, the greater cloud native community would greatly benefit from Gateway API being a universal set of resources to describe all Kubernetes traffic, north-south and east-west."

SMI and Gateway API both sought to become standards for their respective network directions, but what we've never had in Kubernetes is a single way to describe traffic going both in and out of the cluster and across different services. And to that end, I pulled an Avengers move and tried to get together a group of very talented people to work on this problem. Around early-to-mid July, these two blog posts were authored simultaneously — one from the SMI blog, the other from the Istio blog — announcing a new collaboration between members of the SMI community and Istio, as well as other service meshes throughout the ecosystem, to form something called the GAMMA initiative. Yes, it does sound like an Avengers team. I'm a big Marvel nerd, if you can't tell already. And so the naming was somewhat accidental — wink, wink.

Yeah, so the GAMMA initiative — an acronym that was, let's say, forced to fit: Gateway API for Mesh Management and Administration, very fancy — was, like we said, an initiative to bring all the benefits that the Gateway API has started to deliver in the ingress space to service mesh. So it's actually a very fancy name for quite a simple concept: how do we apply the same lessons we've learned in APIs, but for the mesh world? This is not a new project or product or anything; it's more just a group of people that meet every week and talk about mesh and Gateway. So yeah, like I said, our goal is to have this unified API not just across the mesh vendors, but also between ingress and mesh. If you want to do a canary rollout, for example, you might want to apply that to external traffic and also the same to internal traffic. Without a unified API between those two, you need to go learn two different APIs to solve the exact same problem, and that doesn't sound great to me.

So we've been meeting ever since July. We have weekly meetings — if you want to get involved, I have a bunch of resources at the end. And since we started, we've gotten a huge number of different service meshes on board. These are all the people I saw coming into the meetings, sharing these ideas, interested in implementing this. If you look at this, this is not a group of people you typically see all agreeing on one thing. I was a bit surprised myself, but it's true — it's happening. So really the full ecosystem is starting to coalesce around the Gateway API, both in the service mesh and in the ingress ecosystem. And this is really powerful.

This slide's "15 seconds ago" is now probably 15 minutes ago, but we have just recently merged the first big milestone for the project. If you recall, I showed an HTTPRoute example, which matches and routes HTTP traffic; we just added support for really defining how that works in a service mesh. That's the base foundation for the project, so that we now have a common language used across all the implementations, and I'll get more into what's coming next. So here's an example of what this may look like for a very simple traffic split between two different services. Before, we had a parentRef attaching the route to a Gateway; for a mesh, we attach directly to a Service, and this says that of all traffic going to the foo service, 90% should go to foo, but 10% should go to foo-v2. So this enables you to do a canary rollout.
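Something like this — a sketch based on the GAMMA proposal as it stood at the time, so field details may shift as the GEP evolves:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: foo-canary
spec:
  parentRefs:
  - group: ""        # core API group: attach to a Service instead of a Gateway
    kind: Service
    name: foo
  rules:
  - backendRefs:
    - name: foo      # 90% keeps going to the existing version
      port: 8080
      weight: 90
    - name: foo-v2   # 10% is shifted to the canary
      port: 8080
      weight: 10
```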
Yeah, actually, let me keep going. So one thing I want to say: we've talked a lot about uniformity between the vendors. And something I get asked a lot is, you know, I use service mesh X, I don't care about service mesh Y — why do I care that they have the same API? I'm not going to run them both at once, right? And I get it; that's actually a fairly valid point. But one thing that I think gets missed a lot is the value of having one ecosystem. Even if you only use one mesh, it's still valuable that the other meshes are doing the same thing. I saw a talk earlier on Flagger, which automates canary rollouts, and I looked at their docs: they had a page listing the 15 different vendors they support. They have 15 different APIs, 15 different code bases for this — it's a huge mess for them to maintain. With something like the Gateway API, that can be dropped down to just one API that they implement. There's one ecosystem. There's one set of documentation for how the Gateway API works, rather than 15 different meshes trying to make their own documentation. There's one set of tests. People post tutorials and whatnot on YouTube; those will all be Gateway API. And it's much the same reason why Kubernetes itself has been so successful: it's a common set of APIs across cloud vendors, et cetera. So even if you use one mesh — even if you don't use a mesh — this uniformity will still be great for you. Go ahead.

Yeah, and hopping on top of that, I think it's important to call out when this was done. Our first meeting was in July; it's now October. Do some quick math, and that's about three months, and we've been meeting weekly. This may not look like a lot for three months, but we've got links at the end to the minutes from our agendas, and there has been a lot of conversation driving these changes — a lot of consideration of different methods. We went through like eight or nine different approaches by the time we got to the end of everything, to figure out what's going to be best for the users in the ecosystem, for mesh implementers, for end users — what's going to be the most natural set of APIs. So a lot of back and forth was done, both from a technical-merit perspective and from a user experience and developer experience perspective, for something that is one line of difference in a YAML file. But it comes after months of deliberation, and we want you to be a part of it.

So now you may be wondering: where are we going next? We've taken care of HTTPRoute. There are a couple of follow-up PRs to clarify some of the language — you can actually go and see that Gateway Enhancement Proposal on the Gateway API website right now if you want to, and there are some proposals we're going to be adding to clarify some language and some edge cases. But what's next for the GAMMA initiative? One of the big things we're talking about right now is authorization policy. Right now, again, similar to routing, just about every single service mesh has its own authz mechanism. With Istio, you've got AuthorizationPolicy. With OSM, you've got TrafficTarget. I'm blanking on some of the other meshes. Yeah — Consul has intentions, there we go. This is still a fragmented ecosystem, and if we can come up with a set of patterns for how service meshes, and potentially gateways, can do authorization policy, we'll be better off. And so that's one of the low-hanging fruits that we're deliberating on.
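Just to show how differently the same intent reads today, here are two rough sketches based on each project's docs — both say "allow traffic from the frontend service account to the backend":

```yaml
# Istio: AuthorizationPolicy selects the backend workloads and allows
# requests coming from the frontend's identity.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: default
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/frontend"]
---
# SMI (used by OSM): TrafficTarget binds source and destination
# service accounts, plus the routes the source is allowed to call.
apiVersion: access.smi-spec.io/v1alpha3
kind: TrafficTarget
metadata:
  name: allow-frontend
  namespace: default
spec:
  destination:
    kind: ServiceAccount
    name: backend
    namespace: default
  sources:
  - kind: ServiceAccount
    name: frontend
    namespace: default
  rules:
  - kind: HTTPRouteGroup
    name: backend-routes     # assumes a route group defined elsewhere
    matches:
    - all-routes
```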
Egress has come up a lot. When you talk about getting traffic in and out of a cluster, and traffic going between different services: how do I limit not just the number, but the specific destinations that a service or a gateway is allowed to send traffic to? We've kind of deferred that in our initial set of standards, but it's getting to a point where we're starting to really re-examine at what stage we need to tackle this.

Policy attachment has been a huge conversation for us as well. For those of you who might not know, the Gateway API actually has a first-class idea of attaching policy to a route — to any of, I think it's like seven, different layers of abstraction throughout the API. And it's a bit of uncharted territory; there haven't been a ton of examples of it in the wild. If I had to explain it briefly: the idea of Gateway API policy attachment is that you've got any CRD you could think of — any set of functionality you really want to put into a resource — plus a system of setting defaults and overrides that plays really well with — I'm just going to go all the way back — that plays really well with this Gateway API model. One of the big ideas behind the Gateway API is persona-based development, so your application developers don't need your cluster operator's permissions. Policy attachment does that same thing. If I, as an application developer, want to set a timeout policy for my service, I should be able to do that without having to use cluster operator permissions, and put that right in my namespace that I own.
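As a hypothetical sketch of that pattern — "TimeoutPolicy" below is an invented CRD, not a real Gateway API resource; only the targetRef/defaults shape comes from the policy attachment proposal (GEP-713):

```yaml
apiVersion: policy.example.dev/v1alpha1   # hypothetical group/version
kind: TimeoutPolicy                       # hypothetical policy CRD
metadata:
  name: foo-timeouts
  namespace: my-app      # lives in the app developer's own namespace
spec:
  targetRef:             # GEP-713-style attachment to another resource
    group: ""
    kind: Service
    name: foo
  default:               # a default the developer sets; a cluster
    requestTimeout: 10s  # operator could still apply overrides
```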
And so that contract exists already for Gateway API. How do you bring that into mesh? How do you bring that same persona-based development into the mesh world? We're having conversations about that. There are different kinds of policies you can think about us adding here — I mentioned authorization policy is one; it's kind of weird, a little bit different from some of the other ones you might think of, so that might be a special case. But rate limiting, circuit breaking — all these service mesh features that you've come to know and love and rely on for your business-critical applications — we want to bring those in, standardize them, and make them simpler. Maybe that's through a set of generic patterns; maybe that's through a collection of resources at some point. We don't know yet. We take a very iterative approach to our development cycles.

And one of the other really awesome things about the Gateway API that makes it a great environment to do this work in is that it's got different levels of conformance and stability guarantees throughout the specification. You've got the standard and experimental channels for different features, to allow us to get work out there. Again, we got this resource out there in three months; we were able to do that with a relatively high degree of confidence because we know we can mark it as experimental. And so we're going to continue to move at a relatively quick pace to try to get these things out there. And we want you — anybody who's implementing a mesh, anybody who's using a mesh — to get your eyes on these things, use them, and give us feedback so we can create a better spec.

That was a lot. But yeah — how to get involved. Like I mentioned, several resources are available. We've got the Kubernetes Gateway API website, with a whole page on things you can do in order to start contributing to the specification. I'm not sure what this points to anymore, but yeah, that's probably just the GAMMA page on the Gateway API website, where we have links to our calendar and links to our meeting notes. You've got another link to our meeting notes there. And then there are issues — our issues are filed under the Gateway API repo in kubernetes-sigs. We're going to try to do a better job of labeling things as good-first-issue. A big thanks to Shane, who created a milestone to track the work we're doing in order to get all of our GAMMA — our mesh-focused Gateway API work — to a point where implementations can go out and implement it. And a lot of them already have plans to do so as soon as that light gets switched on.

Like we mentioned before, we have weekly meetings, and we actually do something a little interesting with them: we alternate time slots. So the meeting that was supposed to be this week — which we canceled for KubeCon — was going to be at 8 a.m. Pacific time, but next week's meeting will be at 3 p.m. Pacific time. This is just to try to be as inclusive of other time zones as possible; there are people who can't make it to one, so they can come to the other. And we've implemented some processes to try to make sure we're keeping track of context between meetings. But please, if you can make it, we'd really love for you to join our weekly meetings. There are PRs and GEPs out there to read on the website. And we're in the SIG Network Gateway API channel in Slack — it's the same community between ingress and mesh, trying to make Kubernetes a better place and easier for everyone to use. So thank you all. Thank you. That's the end of our talk. You can clap if you want to.

So, we realize that was probably a lot of new information. If there are any questions, we've got Mike over here with a microphone who can run around. We've got one right here, and one queuing up behind him.

On your slide that showed all the mesh people that were participating — you said you pulled a Marvel move, an Avengers move or whatever. What actually is the backstory? Did it really just happen that easily, or was there a little more back and forth to get people engaged?

Good question. Yeah, to be honest, this was my goal ever since we first introduced the Gateway API, before we even really made publicity about it — I wanted to use it for mesh as well. At the time, Istio — or maybe just me personally — was really the only one interested in that. And so I really wanted to start doing this right off the bat, but I was worried about having only one vendor behind it, and that we would accidentally make it a new Istio API that no one else adopted. So we really were trying to get people on board — actually, last year I was just going around to everyone, like, how do you feel about Gateway for mesh? How do you feel about Gateway for mesh? Eventually, different vendors had different reasons for adopting it. Some of them may have said, oh, everyone else is doing it, let's go for it. Linkerd had a great blog post about why that's not what they care about — they don't care that everyone else is doing it; they just really like the API, and it solved real problems for them. So to be honest, it did kind of just happen, but there was a lot of work behind the scenes to get everyone converged and on board.
So we don't see the success until we hear about it. Exactly, yes — that's a good way to phrase it. We had Mike — one right in front of you, right there. Thank you.

Hi. So with the move to standardize the APIs and find generic patterns: what do you suspect will become the decision criteria for one proxy versus another?

Ooh, good question. So as far as how we're trying to design the API, we're doing our absolute best to remain data-plane agnostic. And that's actually a principle that also exists within the ingress side of the Gateway API. You've got Envoy-based implementations of the ingress Gateway API; you've got platform, cloud-provider-level implementations like GKE, for example; you've got Nginx, HAProxy. All those have different data planes. And one of the criteria for introducing APIs is that we try as hard as we can to make them work for all of them. If something can't, there is a custom conformance level in the Gateway API for data-plane-specific implementations to host their policies, host their implementation, things like that. So we hope that we don't have to make that decision. We hope we can create an ecosystem where different data plane implementations can work together and come up with best practices for their proxy, and then consumers of that proxy can use them. Hope that answers your question.

Oh, hi. I think this is maybe related to that previous question. Do you see the Gateway API as — if my application is going to use a service mesh, am I going to be deploying my application solely with Gateway CRDs? Or do you envision it as, well, probably 80% will be these Gateway core CRDs and 20% will still be vendor-specific configuration?

Yeah, so that's a great question. A core API cannot possibly have all the functionality that all of these different service meshes offer; that just doesn't work. So like I said, extensibility is really the name of the game. We're hoping that most of the core use cases can be defined in the core APIs, and the more specific use cases live in the extensions. So if you're just getting started, it's probably good enough to just use the core APIs. Once you start getting into more and more bespoke requirements, you may start needing to leverage those vendor extensions. Over time, some more may move into core, and that may become less and less. Right now, there's not a whole lot in there — it's just routing. So as soon as you want to do something like authorization policy, you'll need to reach for a vendor extension. In the future, we hope that gets less and less.

Mike, there's someone right here. Based on the slides I've seen here, the Gateway API actually means, to me at least, north-south traffic, not necessarily east-west. I may be wrong, but I'm just trying to understand your opinion about east-west traffic for the Gateway API. Is it even a thing?

Yeah, yeah — so that's kind of the whole reason we started investigating this work stream. Gateway was initially focused towards ingress, but from the earliest days, talking to Rob and Shane and Nick and some of those early maintainers, it was always designed so that mesh could work if somebody was willing to investigate it. Naming is very hard, as I'm sure many of us in the room know, and the Gateway API actually had a previous name before that — one that was more generic, but that caused some confusion.
And so, despite the name Gateway, we really do see the specification itself as being useful and applicable to both north-south and east-west traffic.

Yeah, the reason I ask is that it's a tradeoff between complexity and keeping things simple — the difference between having everything handled by, let's say, Envoy — east-west or north-south, ingress, egress, and internal — versus simplifying that into a core Kubernetes API. We're trying to sort that out, just getting some more clarity on what exactly the vision is. Are we going to be doing something similar to what a service mesh does? And then there are a lot of questions about what a service mesh even is now.

Gotcha, okay — so, vision. I heard another question similar to this somewhere else, but I personally believe that service mesh implementations will always be separate from Kubernetes. Our vision — at least for me — is a set of APIs in Kubernetes to describe traffic. The Gateway API exists as a CRD, but it's within the Kubernetes organization, which actually helps a lot with acceleration, velocity, and productivity; you don't have to go through the lengthy process. But if there is a single set of APIs in Kubernetes for routing traffic, period, why shouldn't your service mesh implementation understand that? And that's the vision for us. A new person who wants to use Kubernetes can — you know, ten years in the future — treat the Gateway API as the de facto thing they see in the tutorials they're learning from. And then, when it comes time for their organization to adopt a mesh — for regulation, for mTLS, or for more powerful extensions like canary and things of that nature — they can use those same APIs that they essentially grew up with in Kubernetes, and use them for their service mesh. No new overhead to go learn a whole new set of APIs just for the base-level functionality. Our goal is that people can grow with their products, and they don't have to have this high barrier to entry at the very beginning. That's my vision, at least.

Anyone else? Questions? All right. Well, we ended about four minutes early. Thank you all again for your attendance. Thank you, guys.