Okay, thank you all for coming. I'm happy to see the room is so packed. If this is your first time attending this session: we give a similar session at every KubeCon, and naturally there are a lot of updates to get people up to speed with. But if this is your first time, this is a pretty laid-back session. We're just going to talk a little bit about Linkerd, about service meshes, what's been happening, and where we're going. I hope I'll finish in due time so that at the end you can ask me a bunch of questions. So please prepare questions, otherwise we'll have to sit in awkward silence, and I don't like that; I don't want to do that. So, yeah. Before we start, just so I know what to talk about, can I have a quick show of hands from people who have heard about service meshes before? Oh, okay, okay. Anyone here ever used Istio in production? All right, what about Linkerd? Where are the Linkerd droids? All right, all right. Cool, last question, I promise, and then I'll start: anyone here ever needed to do circuit breaking in production? Okay, well, you're in for a treat. A little bit about myself: my name is Matei. I'm one of the maintainers of Linkerd, and a software engineer at Buoyant. I've been involved with the project for over three years now. I started as a mentee in a community program sponsored by the CNCF, and since then I've been working on it full time. If you want to get in contact with me after the talk is over, you can find me on Twitter. I'm also on GitHub; you can't really talk to me on GitHub, but you can tag me and I'll try to respond. And finally, I'm in a couple of Slack workspaces. You see the Linkerd Slack here, but I'm also in the CNCF one and the Kubernetes one. I don't think there are a lot of Matei Davids around, so you're going to have an easy time finding me. And yeah, that's a picture of me, in case it wasn't obvious. Cool. Linkerd is a different kind of service mesh.
It's an ultralight, ultra-simple, easy-to-use service mesh. We've been in production for a large number of years; ignore the slide, it says "four-plus years," but it's heavy on the plus side. We have a very vibrant community, and we like to help each other out. The Slack channel in particular was my starting point with the community, and if you're not there, I really encourage you to join. We have a bunch of companies that have adopted Linkerd; you can see some of them here. And we're also the only CNCF-graduated service mesh. But I think the takeaway from the slide is that Linkerd is a service mesh. So what is a service mesh? For the people who didn't raise their hands, you might think it's a very complex tool. It has definitely been associated with complexity in the industry, and I think that's a little unfair, because basically a service mesh is just a platform-level tool that gives you a bunch of goodies out of the box. You get observability: service-level golden metrics, protocol-level metrics such as HTTP success rates, and service topologies. You get reliability features: a bunch of abstractions, traffic routing, traffic shaping, retries, timeouts, load balancing. And you get security: mTLS out of the box and authorization policies. The reason I say it's unfair to tie it down with complexity is that all of you probably, at some point, had to do one of these things in your applications. And doing all of these things in your applications, across different stacks, is much more complex than managing one more tool in your production environment. Now, Linkerd itself as a service mesh is focused on simplicity. That's a core philosophy of ours. We don't want to be simple just to contribute to; we also want to be simple to use. Now, the way service meshes work in general... oh, that's a loud door.
The way service meshes work in general, and this is not exclusive to Linkerd, all of them use a similar model: you have a control plane that you deploy in your Kubernetes cluster, and then you have a data plane. We use a sidecar proxy for our data plane. It's something we built purposely for Linkerd, and it's written in Rust. For those of you who don't know, Rust is the up-and-coming systems language, and it has no memory-safety CVEs, unlike C++... oops, spoilers again. It's ultra-fast (I keep going to the next slide), it's ultra-fast, ultra-light, and built on a state-of-the-art network stack. A bunch of my former and current colleagues contributed to the libraries that power Linkerd and now power much of the Rust ecosystem: Tokio is the asynchronous runtime, Hyper is the HTTP library that most people use and build on top of, and Tower is a very fundamental networking library in Rust. Now, philosophically speaking, we consider the proxy to be an implementation detail. You shouldn't have to worry about the proxy, right? You shouldn't have to be an expert; you shouldn't have to know how to write Rust, how Rust works, or how the proxy itself works, because at the end of the day Kubernetes gives you enough headaches. What we want to do with Linkerd is alleviate those headaches. We want you to actually have a good night's sleep and not be woken up at 2 a.m. to fix stuff. Now, a common question that we used to get (now it's eBPF, but I'm sure someone will ask me by the end of this presentation), a common question is: how does Linkerd differ from all of the other service meshes out there? Obviously you have to take everything with a grain of salt, because you're talking to a Linkerd maintainer. So I'm going to tell you that Linkerd is the best, the only choice out there, but that's not entirely true. It's very dependent on your environment.
I would boil it down to the philosophical stances and approaches we take to our applications, the way we develop things, and the way we want things to work. For one, service meshes built on Envoy cater to a wide variety of use cases. Philosophically speaking, we just want to stay in our corner and make things simpler to use, and a lot of the time that means not having a wide configuration surface for you to mess around with. I would also say that Linkerd gives you fewer footguns: you don't have a lot of configuration, so there are fewer places where you can mess up, and we're very opinionated about the way we do things. But at the end of the day, it is just a philosophical approach. If you need feature parity, if you need a lot of features, then you should look at something like Envoy, which is a bit more bloated. We like to do things a bit differently. Now, looking back at the past year: around this time, when I gave this talk in Valencia, we were at about Linkerd 2.11, and that's when we introduced authorization policies. With authorization policies, the problem is flipped on its head a little, because you don't have a lot of primitives you can actually leverage to build the APIs you want to build. The Service object in Kubernetes, which is already super overloaded, cannot be used at all for authorization policies, right? There's no way for you, as a server, to know which client connected to you, or which Service the client that connected to you used, or whether they used a Service at all. You just get a connection. And based on that, you want an expressive, powerful way to express fine-grained authorization policies. So in Linkerd 2.11 we rolled our own CRD, ServerAuthorization, which basically selected a set of ports on your workloads and let you apply fine-grained authorization policies per port.
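To make that 2.11-era model concrete, here's a minimal sketch of a Server selecting a port plus a ServerAuthorization allowing specific meshed clients. The names, namespace, labels, and port are illustrative, and the exact apiVersion may differ depending on your Linkerd version, so treat this as a shape rather than a copy-paste manifest:

```yaml
# A Server names a port on a set of workloads, selected by pod labels.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: web-http
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: web
  port: 8080
  proxyProtocol: HTTP/1
---
# A ServerAuthorization grants access to that Server, here only to
# meshed clients running under the "frontend" service account.
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: web-allow-frontend
  namespace: demo
spec:
  server:
    name: web-http
  client:
    meshTLS:
      serviceAccounts:
        - name: frontend
          namespace: demo
```

Any connection to port 8080 on `app: web` pods that doesn't match an authorization like this one would be denied by the policy controller.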
As part of Linkerd 2.11, we also introduced some other things, such as gRPC retries. When you have HTTP POST requests with bodies (and most of them have bodies), retrying them is a hard problem, because you have to buffer all of those requests in memory. So how do you ensure you don't get OOM-killed? For us, believe it or not, even with Rust, which doesn't actually introduce any CVEs and lets you manage memory very efficiently, it was a problem we banged our heads against the wall on for a while. But in 2.11 we shipped it: there's a 64-kilobyte maximum payload that you can buffer for retries. Aside from that, we also made some performance adjustments. We wanted the proxy footprint to be even smaller than it was; on average, I think we have about a 10-megabyte overhead. And we reduced the control plane down to just three deployments, because we thought a lot of the stuff in there was unnecessary, so we slimmed it down and packaged it differently. Looking back, 2.11 was very feature-packed. We also added multi-cluster headless service support: if you want to talk to just one instance across clusters, you can do that. You just mirror the headless service, the DNS records get created, and it all just works. We added fuzz testing, which is not really a user-facing feature, but it's cool to say. And we also added CLI tab completion. And in this very detailed diagram, you can see how authorization is supposed to work; it's pretty self-explanatory. Also in Linkerd 2.11, we extended Rust to the control plane, which was a major milestone for us, because prior to that the control plane was written fully in Go. For those of you who are new to Linkerd, the control plane is made up of a couple of deployments. We have a destination service that we use for service discovery.
It caches endpoints, more or less, and serves them to the proxy. We have an identity component that lets each proxy generate its own certificate; certificates have a maximum bound of 24 hours, but it depends on the issuer. And then we have a proxy injector, a mutating webhook server that lets you add the sidecar proxies without doing any work: you just add an annotation to your workload, and it gets the sidecar proxy with whatever configuration you've set for the proxy. But then we also added a policy container, and this policy container is responsible for discovering those ServerAuthorization policies we introduced. By the way, writing Rust is always good, and writing the control plane in Rust, introducing more Rust, has been a breeze for us. It's been pretty fun so far. But I don't want to dig in too much. In Linkerd 2.12, we introduced per-route policy. One thing we noticed with authorization policies in general is that, okay, you have them scoped per port, but you want to be even finer-grained than that. When you have a liveness check, a readiness check, or maybe a metrics endpoint that you want to expose to multiple consumers, having something tied to a port doesn't make a lot of sense. So we wanted to find a different approach and give users a little more control. And you'll see that this is one of the trends we're following: we want to give you more and more control while still limiting the footguns we hand you. So in 2.12 we borrowed the HTTPRoute resource from the Gateway API, but we put it into our own API group so we could take some things out, add some things, and not be fully spec-compliant. And in this picture (diagram, whatever) you can see what the HTTPRoute resource looks like. You attach it to a Server. A Server is a construct we use in Linkerd to say which ports a workload exposes.
Then you add the paths you want the request to match on and tie an authorization policy to it. An AuthorizationPolicy, when you have it per route, is associated with one or more routes or servers, and it gives you this fine-grained control. AuthorizationPolicy is also the new resource we came up with to replace ServerAuthorization, because we realized we want to do things in a more Kubernetes, Gateway API way. I probably forgot to mention this at the start of the slide, but one good thing about the Gateway API is that it introduces a pattern called policy attachment, which lets you create resources that configure other resources. So instead of relying on annotations, which, to be honest, can get a little messy in prod once you start using them, and which you don't have a lot of visibility into, you use actual typed resources that you attach to other resources to configure them. Hope that makes sense; if not, add it to the list of questions. Cool, what's new? And let me check my time a little, because I also want to give a demo. All right, we're good. So what's new? Linkerd 2.13 was released two weeks ago, and we'd been working on it for a few months. At the moment 2.12 was releasing, we had a vague idea of what we wanted to do. We thought about the Gateway API, about the HTTPRoute resource, and about the fact that you can now have more expressive authorization policies. So naturally, the next step for us was: when are we going to extend this to client policies as well? HTTPRoute is a great resource for leveraging Kubernetes-native primitives to express a bunch of configuration without introducing additional types. And that's exactly what we did.
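Here's a hedged sketch of what the per-route policy attachment described above looks like: an HTTPRoute in Linkerd's own policy.linkerd.io group, attached to a Server, with an AuthorizationPolicy targeting just that route. All names are illustrative, and the apiVersions and the exact authentication resource (here a NetworkAuthentication, the documented pattern for unauthenticated probes) should be checked against the docs for your Linkerd version:

```yaml
# A route, in Linkerd's own group, that matches only the probe endpoint
# on the ports named by the "web-http" Server.
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: web-probes
  namespace: demo
spec:
  parentRefs:
    - group: policy.linkerd.io
      kind: Server
      name: web-http
  rules:
    - matches:
        - path:
            value: /healthz
---
# Policy attachment: this AuthorizationPolicy configures the route above
# rather than the whole port, allowing probes from cluster networks.
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: web-probes-open
  namespace: demo
spec:
  targetRef:
    group: policy.linkerd.io
    kind: HTTPRoute
    name: web-probes
  requiredAuthenticationRefs:
    - group: policy.linkerd.io
      kind: NetworkAuthentication
      name: cluster-networks
```

The point is the shape: a typed resource (AuthorizationPolicy) configuring another typed resource (HTTPRoute) via `targetRef`, instead of a pile of annotations.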
In 2.13, we started to move away from service profiles. For those of you who are new to Linkerd, a ServiceProfile is a resource we use for a bunch of resiliency configuration, such as timeouts and retries, scoped to a service. We started moving away from it because, instead of maintaining our own type, we can maintain an HTTPRoute type that's shared with your ingress and with other components that might use it. In other words, we wanted to converge on an interop layer that makes it easier for you to manage all of your configuration. The good thing about HTTPRoutes is that they give you a very native way to do dynamic request routing, such as traffic splitting. If any of you ever had to do traffic splitting, you may be familiar with the SMI spec; SMI was more or less abandoned a year or so ago, and there wasn't much happening there. But with HTTPRoutes you can do traffic splitting: you have a parent reference that you attach the route to, and then backends that you forward the requests to. You can also do header-based routing now, with Linkerd and the HTTPRoute: you can say, if a request matches a specific header, send it to this backend. What I'm trying to say is that by introducing the HTTPRoute resource, we got a more native approach to traffic splitting. And finally (and here I'm looping back to my question from before I started), we introduced circuit breaking. This was a huge milestone for us. You can hold the applause until the end, but it was a huge milestone because a lot of people requested it. I think when I started three years ago, someone wrote a proposal for circuit breaking, and they weren't even the first person to do it. So we finally added it; in Linkerd, we refer to it as failure accrual. And before I go deeper: does anyone here not know what circuit breaking actually means?
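The parent-reference-plus-backends pattern just described can be sketched as a single route that does both header-based routing and a weighted split. This is an illustrative example, not from the talk; service names, weights, and the apiVersion are assumptions to be checked against your Linkerd version's dynamic request routing docs:

```yaml
# Attach a route to the Service clients actually call; steer matching
# requests to a canary backend and split the rest 90/10.
apiVersion: policy.linkerd.io/v1beta3
kind: HTTPRoute
metadata:
  name: web-split
  namespace: demo
spec:
  parentRefs:
    - name: web          # the Service this route governs
      kind: Service
      group: core
      port: 80
  rules:
    # Header-based routing: requests carrying this header go to the canary.
    - matches:
        - headers:
            - name: x-canary
              value: "true"
      backendRefs:
        - name: web-canary
          port: 80
    # Everything else: a weighted traffic split across two backends.
    - backendRefs:
        - name: web-primary
          port: 80
          weight: 90
        - name: web-canary
          port: 80
          weight: 10
```

This is the SMI TrafficSplit use case expressed with a Gateway API shape: one parentRef, weighted backendRefs, no mesh-specific type.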
I can explain it, please don't be shy. Okay, yeah, we have a few hands raised. Circuit breaking is a failure-resiliency pattern that you apply in a load balancer, so it ties in with other load balancer policies. Basically, when you have a group of endpoints and one of them returns successive errors, you don't want to keep forwarding requests to that endpoint. You want to take it out of circulation, let it cool off for a bit, put it in a corner, and send your requests somewhere else. You usually apply a backoff period and periodically poke the endpoint, and when it's ready, you bring it back into the fold. So this helps you manage your load. We also (oops, spoiler) introduced a few more layers of load shedding and in-memory buffering in the proxy, and all of the nerdy stuff people usually like when they look at the proxy code. In 2.13 we also started messing around with a whole new discovery API that's part of the policy controller. Part of what we wanted is a nice way to configure the proxy going forward. Circuit breaking through failure accrual was the first step, but the next step will be to let you configure fail-fast timeouts. If you've used Linkerd before, you might know that fail-fast timeouts are an error users always run into, and it's a bit scary, because out of nowhere you get a fail-fast timeout when you cannot connect. It's basically circuit breaking at the load balancer level. We're going to make it more configurable: we'll let you configure the in-memory buffer, the queues, and all of that. So again, we're offering more and more knobs you can use to make sure Linkerd is right for you. And finally, as part of 2.13 we got involved with the Gateway API for Mesh Management and Administration. It's a mouthful; that's why we call it GAMMA.
That's the name we settled on; it also sounds cool and is easier to say. GAMMA is a subset of the Gateway API: a bunch of people who want to see how the Gateway API can actually apply to mesh use cases. So together with the other service meshes in the industry, we're driving this effort toward more standardization and figuring out what to do with resources such as HTTPRoute. Cool, and now I'm going to try to do a demo. I've never done a demo during one of these talks, so let's see if it goes well. Can you all see the screen? Can you read the text? Okay, cool. I'm going to demonstrate how failure accrual works. And believe it or not, it's much harder to do a demo when you're on stage, in case that was not obvious. First we'll have a look at the demo application; I actually have it running in the browser here. It's called Faces. It basically just displays smiley faces. When you see a face being green, it generally means the request was a success. When it's red, there was a service error. When it's sleeping, it's a timeout, and so on and so forth. Over here we see the pods that we're actually hitting from the GUI. The GUI sends requests, displays them here, then sends them to this pod. And in just a little bit, when we introduce failure, we'll see how circuit breaking actually works in practice. So first we get the pods; everything's running. Now, about failure accrual in Linkerd: I guess I should have talked a little bit more about this. Circuit breaking is done through failure accrual. You can do circuit breaking in a number of different ways, but we chose to go with failure accrual, and right now we only support consecutive-failure accrual, that is, a maximum number of consecutive failures for failure accrual. That's a mouthful; try saying it three times fast. What it means, basically, is that you set a maximum number of requests that may fail, and when those requests fail, you trip the breaker for that endpoint.
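To make the breaker mechanics concrete, here's a toy Python sketch of consecutive-failure accrual with a backoff-then-probe cycle. This is purely illustrative (class and parameter names are mine), not how the Rust proxy actually implements it:

```python
import time


class ConsecutiveFailureBreaker:
    """Toy model of consecutive-failure accrual: after `max_failures`
    consecutive errors the endpoint is taken out of rotation, and it is
    only probed again once a backoff period has elapsed."""

    def __init__(self, max_failures=7, backoff_s=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.backoff_s = backoff_s
        self.clock = clock
        self.consecutive_failures = 0
        self.tripped_at = None  # None means the circuit is closed

    def available(self):
        """Can the load balancer send this endpoint a request right now?"""
        if self.tripped_at is None:
            return True
        # Circuit is open: allow a probe only once the backoff has elapsed.
        return self.clock() - self.tripped_at >= self.backoff_s

    def record(self, success):
        """Report the outcome of a request sent to this endpoint."""
        if success:
            # Any success resets the count and closes the circuit.
            self.consecutive_failures = 0
            self.tripped_at = None
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                # Trip the breaker (or re-arm the backoff after a failed probe).
                self.tripped_at = self.clock()
```

A real implementation also adds jitter to the backoff and grows the penalty between failed probes; this sketch keeps a single fixed penalty for clarity.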
What I'm showing on screen is just how to configure the maximum consecutive failures, but you can also set the penalty for it: what backoff strategy you want, what jitter factor you want. To demonstrate, we have the same app here as another deployment, and you'll see that it has a 100% error fraction, so all of the requests that go to this second deployment are going to fail instantly. We're going to have a service backed by both deployments: the one I showed you before, which actually returns successes, and this one, which only returns failures. So I'm going to apply that. I'm not typing this out, by the way; it's a recorded demo. I've learned my lesson. Now if I flip back, we'll start seeing some of the requests fail. I'll wait a little longer just so you can get the hang of it. All right, that's enough. Cool. Right now, circuit breaking is configured only through annotations. That's because, in order for us to provide a more structured way, we need to wait a little and see how the community feels about it and where we want to go with this. So expect point releases and future releases to expand on this a little more, but for now, annotations on a Service will do. The basic idea is that you have a target, a Service, and you want to say: okay, for this Service, configure the load balancer policy, configure circuit breaking. We add this annotation, and the changes should be picked up immediately. I think we set the consecutive-failure limit to 30 there, so after 30 failed requests from when I applied the annotation, we should see the breaker trip and the endpoint taken out of circulation. And now it's all successes again, because we don't consider that endpoint anymore. Occasionally the backoff period elapses and we send a request through; the request still fails, so the endpoint keeps sitting in its corner. And that's the gist of it.
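For reference, the annotation-based configuration just demonstrated looks roughly like this. The annotation names below are as I recall them from the 2.13-era circuit-breaking docs and the values are illustrative; verify both against the documentation for your Linkerd version before relying on them:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: faces
  namespace: demo
  annotations:
    # Enable consecutive-failure accrual for this Service's endpoints.
    balancer.linkerd.io/failure-accrual: "consecutive"
    # Trip the breaker after this many consecutive failed requests.
    balancer.linkerd.io/failure-accrual-consecutive-max-failures: "30"
    # Bounds on the backoff penalty while the breaker is open.
    balancer.linkerd.io/failure-accrual-consecutive-min-penalty: "1s"
    balancer.linkerd.io/failure-accrual-consecutive-max-penalty: "60s"
    # Jitter applied to the backoff, to avoid synchronized probes.
    balancer.linkerd.io/failure-accrual-consecutive-jitter-ratio: "0.5"
spec:
  selector:
    app: faces
  ports:
    - port: 80
```

Because these are annotations on the Service (not a typed policy resource yet), the proxies watching that Service pick the change up as soon as it's applied.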
Sorry, it's a very anticlimactic ending, but cool; where are my slides? All right. So we had a look at what we've been up to in 2.13. I know my voice doesn't have a lot of excitement in it, but I'm super excited, because it took us a lot of work to get here. I hope all of you are excited and ready to play with it. But a question you may have (and if you don't have it, I'm going to impose it on you now) is: what's next? What are we going to work on next? We have a very ambitious plan for 2.14 and beyond. First of all, in the very near future, we need to adopt more of the route resources that come out of the Gateway API. The Gateway API itself has been seeing a lot of new contributions, and it's in a more stable state now, so we want to adopt more route types when the time is right. We want to introduce gRPC circuit breaking through dedicated annotations, we want to do policy attachment for a bunch of the load balancer policies, and we'll probably also adopt TCPRoute at some point, so you can do this native traffic splitting for gRPC, TCP, and so on and so forth. Up next (I mentioned this in passing throughout the presentation), we want finer-grained proxy configuration, so even more knobs for you to twist, and that extends to the load balancer policy. Right now we do EWMA, the exponentially weighted moving average: a load-balancing policy that relies on the power of two choices. Whenever you compare two endpoints, you take the one with the lower latency estimate, and you just keep going on and on. But for some people this doesn't seem to do the trick, right? For a very long time, opinionated as we are, we said this is probably going to be enough. But no two environments are the same, and no two environments are created equal.
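The EWMA-plus-power-of-two-choices policy described above can be sketched in a few lines. Again, this is a toy illustration (names are mine), not the proxy's actual peak-EWMA implementation, which also factors in in-flight load:

```python
import random


class EwmaEndpoint:
    """One endpoint tracked with an exponentially weighted moving average
    of observed request latency, a simplified stand-in for the proxy's
    per-endpoint load estimate."""

    def __init__(self, name, alpha=0.3):
        self.name = name
        self.alpha = alpha      # weight given to the newest sample
        self.ewma_ms = 0.0

    def observe(self, latency_ms):
        # Standard EWMA update: each sample pulls the average toward it.
        self.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ewma_ms


def pick_p2c(endpoints, rng=random):
    """Power of two choices: sample two distinct endpoints at random and
    return the one with the lower latency estimate."""
    a, b = rng.sample(endpoints, 2)
    return a if a.ewma_ms <= b.ewma_ms else b
```

The appeal of P2C is that it avoids both the herd behavior of always picking the global minimum and the cost of scanning every endpoint on every request, while still strongly favoring less-loaded endpoints.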
So if it's important for people to have this configuration, we've reached the point where we say, well, okay, let's do it. And we're finally in a good place, where our API is designed in such a way as to support this at an even bigger scale. The last three items are super exciting for me personally, because I know a lot of people have asked about them. I've been working the booth for the past two days; that's why my voice is raspy and my eyes just kind of stare out. It's been a lot of work. But mesh expansion: this is something people have been asking about for a very long time. You want to run Linkerd with your EC2 instances, maybe, or you have a hybrid architecture where you want to use a mix of both. This is going to be possible in the very near future. Ingress: this might come as a bit of a shock. What does he mean by ingress? Well, we want to roll out our own ingress controller. Linkerd, in case you're not aware, works with any ingress controller out there, as long as you have a pod in the cluster to inject the sidecar proxy into. But the next step for us, after adopting the Gateway API and hearing users' requests, was to think through our own implementation of ingress and the Gateway API and roll out something people can use, so you have a unified stack. And of course, once you handle the ingress side, you also have to handle the egress side. I know, again, a lot of people want to manage external connections, maybe a connection to a database. You want to get metrics, you want to do mTLS; all of this we're going to handle with egress. And now comes the fun part. This is more of a call to action. I don't know how many of you here are in the Slack channel, or how many of you have ever contributed, but we're a very friendly bunch. All of our development is done on GitHub, and everything is open source, so if you want to look at anything, definitely hit us up.
I see a few of you here that I've seen contribute, and it makes my heart grow when that happens. We also have formal announcements and mailing lists on CNCF, so if that's your thing and you like to receive newsletters and emails, we always publish those, especially when we do releases. We also have formal third-party security audits. So if you have any doubts about our security practices, you shouldn't; but if you do, there are third-party audits that have been done. We also rolled out a support forum. One thing with Slack (and I've been on Slack for a very long time) is that the history just isn't preserved, and sometimes it's hard to search for issues. So we rolled out a forum where people can post: if you have articles you want to write, or questions you want to ask, it all goes in there. And finally, we run a monthly, hands-on, engineering-focused training to get you up to speed. A lot of people, again, say it's complex. I beg to differ (again, I'm very biased, I work on it day to day), but that's why we want to help you be successful. And if you want an easier way to run Linkerd on any Kubernetes cluster, Buoyant, the creator of Linkerd, also has a SaaS product to help you out. And finally, my favorite part: questions. Yeah, should I just hand the mic out? How does this work? All right, he's got the mic. I'll let you pick. Thanks for all the nice new features. The Gateway API spec with the HTTPRoute seems to be very tightly coupled to having a gateway, which kind of sounds like ingress. How does it work between multiple services in the same mesh? Yeah, that's a really good question. You're correct: the Gateway API was supposed to standardize some of the ingress stuff, and it's mostly for north-south traffic.
And that's why we came up with this whole GAMMA movement, which standardizes the resources for east-west traffic, for mesh traffic. We're working on it: we have all of the people who work on service meshes in general, whether they're Linkerd-proxy-based or Envoy-based, adapting the HTTPRoute resources so that they make sense for meshes. The Gateway resource, for example, is not something we use in a mesh, but the route resources are, because they allow you to shape traffic and do all of this dynamic request routing. Does that answer your question? Okay, so if I use the HTTPRoute resource, or type, it will already work in the mesh? Yeah. Between peers? Yeah, as long as you attach it to a service that is meshed. Right. Yeah. Thanks. No problem. Yeah, keep them coming. Hello. So with the circuit breaking, what if all of my pods start throwing errors all at once? Do they all get taken out of circulation, or is there some kind of maximum percentage out of circulation going on? A very good question. If all of your pods start returning 500 errors all at once, they're all going to be taken out of circulation, and they'll all get the backoff penalty applied. So if one of them comes back on, we'll probe it, and if it responds, we'll start routing requests to it again. Otherwise, tough luck. Keep them coming. Okay, we've got one more at the back. I'm getting ready for the eBPF question at some point. Thanks for the presentation. I just wanted to ask: are you supporting IP version 6? No, not yet. That's a very good point. We don't, but it's on the books, if you want to speed up the process. We know a bunch of people have started adopting dual stack. It's not trivial to implement: there's the iptables layer, and we have a few more changes to make, because we use IPv4 by default in the proxy. It's not trivial, but it's something we can do.
We just need to see how much the community is currently moving to dual stack to know how to prioritize it and what direction to move in. So if you want to use IPv6, there's an issue open in the GitHub repo. Just go on there, tag me, bug other people, and we'll get it on the roadmap. No more questions? Come on, I need at least two more. Is HTTP/3 support coming soon? No. Sorry, that was a quick answer, but it's not on the books for now. Again, if you think you have a use case (and this is something we generally tell people), if you think you have a good use case for it, we're super happy to have a chat. But again, we aim to be super simple in the stuff that we do, so everything has to be thought out and passed through our own opinionated filter. It helps to have an issue, it helps to track it, and it helps to see how many people want it done. One more. You kind of hinted at it already: eBPF, sidecars, what's your stance? Okay, thank you for finally asking. It's a tough one. Again, this is my opinion as a maintainer; it's not even the opinion of my colleagues. I like eBPF. I think it's a very cool technology, and being able to have all of this observability built into the kernel is super cool. There's a big "but" coming, though. I think the big problem with eBPF is that you cannot really do any state management inside of it. The verifier is super strict, you cannot have unbounded loops, and there's not a lot you can do in there. eBPF really shines when you need tracing, packet tracing, or socket-level load balancing. All of that is cool, but as soon as you start abstracting things away and you need to do things like circuit breaking, retries, or timeouts, you can't really do that with eBPF.
And that's why even the service mesh solutions that run a mix of eBPF and something else usually use Envoy to handle all of this layer-7 stuff. So I think eBPF is cool. We support eBPF CNIs: if you want to use Calico or whatever else, you can go ahead and use it, and Linkerd supports it. But when it comes to doing service mesh things, which means failure resiliency, higher-level observability that's protocol-related, and mTLS, there's no substitute for the sidecar proxy. Thank you.