Hi, everyone. This is the Overview and State of Linkerd talk, so welcome. I'm going to start with a show of hands. Who here has heard the term service mesh before? Should be everyone, because I just said it. Who could explain what a service mesh is? OK, about half the people, maybe. Who has used a service mesh before? More people, somehow. Who's using a service mesh in production? That's still a good number. And who's using Linkerd in production? Still pretty good. I have to ask that one to stroke my ego. Speaking of my ego, my name is Alex. I'm a Linkerd maintainer, and I've been working on Linkerd since the beginning of the project. I'm very proud to be part of it. Linkerd is a very cool project. It's used by a lot of different organizations all over the world and has been for a long time, and we have a very vibrant open source community. If you ever pop into the Linkerd Slack or go onto GitHub, you'll see lots of issues, lots of pull requests, lots of activity, people helping each other out. It's really nice. We're a CNCF project, a graduated project, and we're very proud of that.

So what does Linkerd do? Linkerd is a service mesh, for those who said they couldn't explain what a service mesh is. A service mesh, as we use the term, means there's a sidecar proxy in every pod that's part of the mesh. All network traffic entering or leaving the pod is intercepted and redirected through that proxy, and the proxy can add a bunch of functionality there: observability, so you get golden metrics like request rate, success rate, and latency; and service topology, because you know who is calling whom.
You also get reliability features like retries and timeouts, load balancing, traffic shifting, A/B deploys, and latency-aware load balancing, which is a really neat feature: traffic automatically shifts toward replicas that are responding faster and away from replicas that are responding slower. And then, perhaps most importantly, transparent mTLS between all services, on by default. This is the thing people are usually most interested in; a lot of people come to Linkerd, or to service meshes generally, because they want mTLS and they don't want to work for it. They just want to install Linkerd and have it work, which is what we do. And of course you can do things like cert management, rotation, and access policy, which I'll talk about more in a little bit.

I meant to say this earlier, but the way I want to do this talk is to cover what's new in Linkerd and what's coming up on the roadmap, and to get through all of that as quickly as I can to leave plenty of time for questions. I really want this to be a bit more interactive, to hear what you all want to know about Linkerd, and hopefully to answer it. So if we get to the end and there are no questions, that will be very awkward for me.

OK, so what's the philosophy behind Linkerd? Linkerd has gone through a couple of iterations. There was a previous version, Linkerd 1, written on the JVM, and we learned a lot from it. It was very, very powerful and very, very configurable, and while that was really awesome, it was also very hard to use. A lot of people trying to use it had the experience of: "You know what we're trying to do. Why doesn't it just work? Why do I have to configure all of this?" We took those lessons into what we sometimes call Linkerd 2, though we've mostly dropped the two by now; it's just Linkerd.
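As an aside, the "just install it and have it work" experience usually amounts to a single annotation. A minimal sketch, assuming a hypothetical `web` Deployment (the name, labels, and image here are made up for illustration; the `linkerd.io/inject` annotation is the real mechanism):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels: {app: web}
  template:
    metadata:
      labels: {app: web}
      annotations:
        linkerd.io/inject: enabled   # Linkerd's proxy injector adds the sidecar
    spec:
      containers:
      - name: web
        image: example/web:latest    # placeholder image
```

With that annotation in place, the sidecar is injected at admission time and mTLS between meshed workloads comes along for free.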
And the idea is that it should just work right out of the box for 99.9% of use cases. You should be able to install it and have it do the right thing, and anything that needs user input or configuration should come after that; it shouldn't be an upfront barrier to getting started. The other big philosophy behind this project was for it to be really, really lightweight, with really, really low resource utilization. That's why we wrote a proxy from scratch: we're not using Envoy or any other off-the-shelf proxy for Linkerd. We have a proxy called linkerd2-proxy, written in Rust and designed to be as lightweight and fast as possible. And operational simplicity has been one of the most important goals from the beginning: Linkerd should not only be easy to get started with and easy to install, but also easy to operate over time, easy to reason about, and easy to understand without a lot of black magic or weird things happening in your system. So, as I mentioned, we use a purpose-built micro-proxy. It's very boring to anyone who operates it, which is the goal. It's called linkerd2-proxy, and it's something we're really proud of. It's one of the most invisible parts of the service mesh, because as an operator you should never have to worry about it, but it has some very cool technology under the hood. It's written in Rust on cutting-edge libraries that we helped develop, like Tokio, Hyper, h2, and Tower. And because it's written in Rust, we get a lot of security benefits as well: we don't have to worry about whole classes of memory-safety issues. But as the slide says, the philosophy is that the proxy should be an implementation detail. It's not something we think an operator should ever have to worry about. And this leads into our philosophy on security. We want to have secure foundations.
We want to be built on secure libraries in secure languages, and we want to leverage Kubernetes and be Kubernetes-native as much as possible. That means, for example, that when we're doing mTLS, all of the mTLS identities are bootstrapped from Kubernetes service accounts; that's the primitive we build everything on top of. And in order to be secure you really need to reduce the barriers to security, because security features that nobody uses are not useful. So mTLS is on by default. There's nothing you need to configure, because we know you've got a service account; that's your credential for getting a certificate, and once you've got a certificate you can do mTLS. So we have workload identity, and all of this just works without any special configuration.

People like to ask us all the time: what's the difference between Linkerd and Istio, and how do the two compare? I don't want to go through all of these points, especially because both projects are changing at a rapid pace, so anything that's true one day might not be true the next. But I think the biggest difference is philosophy: the Linkerd project is really, really focused on operability and simplicity rather than trying to satisfy every use case under the sun. I think that's what makes it easier to use, what gives people more confidence in it, and what lets it get to production faster. There are also performance benefits, I think, to Linkerd and the Linkerd proxy; we've done some really interesting benchmarks, which you can look up if you're interested.

OK, so what's new in the project over the past year or so? Linkerd 2.11 was a really exciting release. I forget exactly when it came out, but it was sometime in the last year, and it's when we first added server-side authorization. We've had mTLS for a long time, which gives us authentication.
We know when one service calls another who that is: they have an identity, bootstrapped from the service account, so we know who is calling whom. But for the first time, in 2.11, we gave you the ability to restrict whom you accept requests from. A service could say: I only want to accept requests from this other service, or from this subnet, or some combination thereof. So finally there was fine-grained access control, and you could start implementing the policies that a lot of people had been asking for and that are very, very useful. It took us a while to figure out what model we wanted here. We wanted to do it in a way that was Kubernetes-native, that felt natural to people already on Kubernetes, and that wasn't a whole new system to learn on top of what they already knew. So we introduced a new resource type called Server, which lets you identify a specific port on your workload. As soon as you have a Server, you can define authorizations that say: here are the identities allowed to access this server. That's all enforced on the server side, so it's very secure. And because of the sidecar model, with a proxy inside each pod, every pod becomes its own trust boundary and can decide for itself which traffic to accept based on those policies.

There were a bunch of other things in that release that were really exciting. In particular, gRPC retries were a long time coming. We've had retries in Linkerd for a long time, but they were very restricted in terms of which requests could be retried. Specifically, we could never retry anything with a body, because we would have to buffer that body so that, if the request eventually failed, we could send it again.
That buffering is what we added in 2.11, so that gRPC requests in particular, which always have a body, can be retried, up to a maximum payload of 64 KiB. So we'll buffer a certain amount. This is another Linkerd philosophy, and something we get from working in Rust: we make very conscious decisions about how much data we buffer in the proxy. We have fixed-size queues, we decide how much we want to buffer, and we use things like backpressure to push back and make sure our resource usage doesn't grow beyond what's acceptable. We also made a bunch of performance improvements; we always keep one eye on performance to make sure nothing runs away. And a lot of other cool features, too.

The other big thing in 2.11 is that we changed the control plane architecture a little by adding a new piece called the policy controller. The policy controller serves the policy API, which powers all of that server-side access control I just talked about. But what's exciting is that the policy controller was our first controller written in Rust. Before this, all of the control plane was written in Go and only the proxy was in Rust, so this was the first time Rust entered the control plane side of things. I think that's a very, very exciting development, because we got to try out what it was like to write a Kubernetes controller in Rust. Typically these are written in Go; there's a big Go ecosystem around Kubernetes controllers and a lot of really good libraries for writing them. But that ecosystem on the Rust side is really starting to develop now. I worked a bit on that policy controller, and it was a joy to write, actually.
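To make the 2.11 server-side policy concrete, here's roughly what the Server and ServerAuthorization pair looks like. This is a sketch: the names, namespace, and port are illustrative, and the exact `apiVersion` may differ by Linkerd release.

```yaml
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  namespace: emojivoto            # illustrative namespace and names
  name: voting-grpc
spec:
  podSelector:
    matchLabels:
      app: voting
  port: grpc                      # the named container port this Server covers
  proxyProtocol: gRPC
---
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  namespace: emojivoto
  name: voting-grpc-authz
spec:
  server:
    name: voting-grpc
  client:
    meshTLS:
      serviceAccounts:
      - name: web                 # only the "web" service account may connect
```

The Server names a port on a set of pods; the ServerAuthorization says which mTLS identities (here, one service account) are allowed to reach it, and the sidecar enforces it server-side.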
I was dreading it a little when I first got started, thinking, oh man, writing a controller in Rust is going to be a nightmare. But it was great. Those libraries have really matured a lot, and as we develop more things in the control plane, I'm excited to do more and more of it in Rust.

OK, so fast-forward to 2.12, the most recent release of Linkerd. This took server-side authorization a step further. One of the big drawbacks of the server-side authorization in 2.11 was that, while we called it fine-grained, it was fine-grained per Server, which meant per port. You could say: this port has this policy, these are the types of requests it accepts, and here's who it accepts them from. What we really found is that we needed to get even more fine-grained than that and have per-route policy, meaning we look at the HTTP path or method and decide: is this a liveness check? If so, it needs a different authorization policy than a metrics scrape, an application request, or anything else. People ran into a lot of problems specifically with liveness probes, because those come from the kubelet. They were not authenticated and not part of the mesh, so in a lot of cases we were rejecting them: people set up their Servers and authorization policies, the liveness probes failed, and then the application failed because it was detected as down. Per-route policy helped clean that up a lot. We can now say: these clients can talk to this route, and those clients can talk to that route, which gives us a lot more control. That's been a huge step forward.
The other reason this is really exciting is that when we were developing this feature, we needed some way for people to specify which routes were allowed and which weren't; users needed some way to say what a route is. It turns out similar work was already happening at the same time in the Gateway API, where you also need to define what a route is. That was a really good opportunity, because we could use those types and make sure we were developing in a direction compatible with what Kubernetes and the Gateway API were doing. So that's what we did: we used the HTTPRoute type from the Gateway API to define HTTP routes, and in Linkerd you can attach server policy to them, saying: when talking to this Server on this route, here's the authorization policy to use. I think this sets us up in a good direction, because there's ongoing work right now on the GAMMA initiative, a project to adapt the Gateway API to service meshes in general. That's obviously of quite a bit of interest to us, and it means we're well positioned to be compatible with it; we're very involved in that conversation. The Gateway API is very cool if you haven't checked it out. I'm giving a talk tomorrow about the Gateway API and specifically how Linkerd uses it, so if you're interested in hearing more, I'll be talking about it then. In the meantime, just know it's very, very cool.

What else? Oh yeah, we also added automatic support for health checks. Even if you don't manually configure the paths you've defined as health checks, we'll permit them automatically, because we know what they are: they're in your container spec.
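In 2.12 terms, per-route authorization looks roughly like the sketch below: an HTTPRoute attached to a Server, plus an AuthorizationPolicy targeting that route. The resource names are illustrative, and the exact `apiVersion` for these types has shifted across releases, so treat this as a shape rather than a copy-paste recipe.

```yaml
apiVersion: policy.linkerd.io/v1alpha1   # Gateway API-shaped HTTPRoute vendored by Linkerd
kind: HTTPRoute
metadata:
  namespace: emojivoto
  name: voting-metrics
spec:
  parentRefs:
  - group: policy.linkerd.io
    kind: Server
    name: voting-http              # the Server this route belongs to
  rules:
  - matches:
    - path:
        value: /metrics            # only the metrics endpoint
---
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  namespace: emojivoto
  name: metrics-scrape
spec:
  targetRef:
    group: policy.linkerd.io
    kind: HTTPRoute
    name: voting-metrics
  requiredAuthenticationRefs:
  - kind: ServiceAccount
    name: prometheus               # only Prometheus may hit /metrics
```

This is exactly the liveness-probe and metrics-scrape situation from above: different routes on the same port, each with its own authorization.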
And we know those are coming from the kubelet, so as long as they come from the right place, we don't require mTLS on them, and they just work. That's been a big quality-of-life improvement.

OK, so what's coming up next? Linkerd 2.13 will be our next release. I don't know exactly when it's planned to come out; it comes out when it's ready. But the major focus for 2.13 is going to be client-side policy. In 2.11 and 2.12 the major focus was server-side policy, specifically admission policy: we let requests in or denied them based on identity. In 2.13 we really want to focus on client-side policy. There are a lot of topics in there, but specifically we want to focus on header-based routing and circuit breaking, two features that have been highly, highly requested. It's the natural time to take what we learned doing server policy and move it over to the client side. Again, we want to continue down the path we've been on, using the HTTPRoute type from the Gateway API, so this feels Kubernetes-native and natural to anybody running on Kubernetes. For example, HTTPRoutes have these things called backendRefs, which are very useful for ingresses: as an ingress, you want to say, for this route, here's where I'm going to send the request. In the same way, it's natural to use that for east-west traffic: when I send a request to this service, actually send it to this other backend instead; or split it and send a portion of the traffic to another backend; or split based on a header, so requests with this header go here and requests with that header go there.
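Sketched with Gateway API types, a header-based split over east-west traffic might look like the following. This is prospective: the client-side work described here was still in progress at the time, so the service names, header, and exact attachment semantics are all illustrative assumptions.

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: checkout-split
spec:
  parentRefs:
  - kind: Service
    name: checkout                 # east-west: attach to the Service being called
  rules:
  - matches:
    - headers:
      - name: x-canary             # header-based routing: canary traffic goes
        value: "true"              # straight to the canary backend
    backendRefs:
    - name: checkout-canary
      port: 8080
  - backendRefs:                   # default rule: a 90/10 traffic split
    - name: checkout
      port: 8080
      weight: 90
    - name: checkout-canary
      port: 8080
      weight: 10
```

The key idea is that the same HTTPRoute type that an ingress uses to pick a backend is reused by the mesh client side to split or redirect service-to-service traffic.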
So that's something we're working on right now that's very, very exciting. We've also had a lot of requests for circuit breaking. This is an interesting and slightly nebulous one, because everyone who asks for circuit breaking means something a little different by it, so we're never quite sure exactly what it means. But in general there's this need to specify load balancer policy, and load balancer policy encompasses a lot of things. Right now, Linkerd uses a load balancing algorithm called EWMA, exponentially weighted moving average, which is a very, very cool algorithm that looks at historical latency data and uses it to weight how much traffic goes to each backend. Backends that are performing well get more traffic; backends performing more poorly get less; and it automatically adjusts over time as the performance of those backends changes. It's really, really good at minimizing latency. We use EWMA in 100% of cases, and performance is great, but some people want more input into how load balancing happens: more control over when backends are in the pool or out of the pool. That's largely what people mean by circuit breaking: there have been a bunch of failures on this backend, so maybe kick it out of the load balancer pool for a while and let it recover. So there's a lot of surface area for configuration in exactly how you want load balancing to behave, and client policy is a really natural place for users to specify that configuration. Maybe we need different policies for different routes.
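To illustrate the idea behind EWMA-weighted balancing, here is a toy sketch in Python. This is not the linkerd2-proxy implementation (that's Rust, and uses a peak-EWMA variant); it just shows the core mechanism: each endpoint's latency estimate decays over time, and the balancer prefers the cheaper of two randomly sampled endpoints ("power of two choices").

```python
import math
import random

class EwmaEndpoint:
    """Tracks an exponentially weighted moving average of observed latency.

    Timestamps are passed in explicitly so the decay is easy to reason
    about (and to test); a real implementation would read a monotonic clock.
    """
    def __init__(self, name, decay_s=10.0, now=0.0):
        self.name = name
        self.decay_s = decay_s
        self.cost = 0.0        # current latency estimate, in seconds
        self.last = now

    def observe(self, rtt_s, now):
        """Fold one observed round-trip time into the moving average."""
        dt = max(0.0, now - self.last)
        self.last = now
        # The old estimate's weight decays with time since the last sample,
        # so stale measurements matter less and less.
        w = math.exp(-dt / self.decay_s)
        self.cost = self.cost * w + rtt_s * (1.0 - w)

def pick(endpoints, rng=random):
    """Power-of-two-choices: sample two endpoints, use the lower EWMA cost."""
    a, b = rng.sample(endpoints, 2)
    return a if a.cost <= b.cost else b

# Feed in some observations: one replica is consistently fast, one slow.
fast, slow = EwmaEndpoint("fast"), EwmaEndpoint("slow")
for t in range(1, 11):
    fast.observe(0.010, now=float(t))   # 10 ms responses
    slow.observe(0.500, now=float(t))   # 500 ms responses
```

After those observations, `pick([fast, slow])` always returns the fast replica, and if the slow replica later speeds up, its cost decays back down and it naturally re-earns traffic, which is the "automatically adjusts over time" behavior described above.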
Maybe this route fundamentally behaves differently and needs a different load balancer policy than that one; maybe we want circuit breaking here but not there, because the nature of those services is different.

Now, if you're a Linkerd expert, some of this may sound familiar from service profiles. Service profiles exist in Linkerd today as a way of specifying which routes are retriable and which routes have timeouts, and that overlaps a lot with what I just described, because retries and timeouts are client policy. Because of that overlap, we want to slowly move away from service profiles for configuring these things and toward something that feels more natural in Kubernetes: doing things with HTTPRoutes and the Gateway API. This isn't going to be a hard cutover. We'll gradually deprecate things from service profiles and migrate that functionality over to the new world of HTTPRoutes and the Gateway API, just because that feels a lot more Kubernetes-y. And as I said before, we've got our eye on GAMMA; we're very involved in those conversations, and to the extent possible we want to be compatible with it, with all of that server-side and client-side functionality configured in a way that makes sense with the Gateway API.

If this sounds interesting, we would love for you to get involved. All the development is open source, of course, on GitHub, so you can follow along, file issues, and open PRs.
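For reference, the service profiles mentioned above look roughly like this: per-route retries and timeouts hang off a ServiceProfile named for the service's FQDN. The service, route, and budget numbers here are illustrative.

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: voting.emojivoto.svc.cluster.local   # must match the service's FQDN
  namespace: emojivoto
spec:
  routes:
  - name: VoteDoughnut
    condition:
      method: POST
      pathRegex: /emojivoto\.v1\.VotingService/VoteDoughnut
    isRetryable: true          # opt this route into retries
    timeout: 500ms             # per-route timeout
  retryBudget:
    retryRatio: 0.2            # retries may add at most 20% extra load
    minRetriesPerSecond: 10
    ttl: 10s
```

It's exactly this kind of per-route, client-side configuration that the HTTPRoute-based model is meant to eventually absorb.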
If you're trying to run Linkerd and you run into any problems, you can always hop into the Linkerd Slack at slack.linkerd.io; there are always maintainers and other users there helping each other out and being friendly and nice. We've got some CNCF mailing lists where we announce releases and make other announcements. There have been some security audits, which are very interesting if you care to read security audits. And we'd love to have everyone participate. If you're looking for a more hands-on way of learning about Linkerd, there's a hands-on Service Mesh Academy. The next one is in November, and it gets you hands-on and teaches you, right down to the nitty-gritty, the details about Linkerd that you might not pick up on your own. If that's of interest to you, I highly recommend checking it out. And if you're running Linkerd in production and need a bit of extra help, or want to alleviate some of the burden of running it, Buoyant offers a fully managed Linkerd that takes care of things like automatic upgrades, certificate rotation, and alerts, which takes some of the administrative burden off of you and gives you one less thing to worry about. That's buoyant.io/demo if you're interested in getting a demo.

OK, so hopefully I went fast enough to leave a lot of time for questions, and I want to know what you want to know about Linkerd.

Istio is going to have something like a proxyless mode using ambient mesh. Do you have any plans along those lines?

Yeah, good question. Istio's got an ambient, proxyless mode; is Linkerd ever going to have something like that? Probably not. We're really committed to the sidecar model. We think it has a lot of advantages in terms of security, operability, and understandability.
The pod is a very natural security boundary, and when you do something like ambient, where you take the proxy, run it somewhere else, and make it multi-tenant, you lose a lot of those benefits. With one proxy serving multiple pods, the security boundaries get muddled, and you potentially have keys from multiple different accounts coexisting in that one proxy. So we don't see any major advantages to the sidecar-less model that outweigh the drawbacks, and we're probably sticking with the sidecar for at least the medium to long term. And that's a great point: someone just pointed out that Linkerd 1 had that architecture, and that's true. Linkerd 1 had a per-node proxy, and one of the lessons we learned was that there are a lot of drawbacks to it. It's very difficult to do security correctly in that model, and it introduces a lot of complexity that you can drop if you move to the more isolated proxy-per-pod model.

Hi. We've been running Linkerd in production since 2020, but we're on 2.9 and it's hard to upgrade. It's essentially a one-way upgrade, and for production workloads it's very hard to guarantee that we can revert if there's a problem, especially since you have to rotate everything per the recommendations. Right now our plan for safely upgrading to a new version is to have a separate cluster running the exact same workload on the same network but with a different Linkerd version, and if something goes wrong we just switch back to the older cluster.

Yeah, so the question is: how do you upgrade in a way that gives you confidence you can roll back? I hear that. It has been an issue in the past; rolling back to a previous version has been very, very difficult. We're trying to get better about that going forward, but it has historically been a problem.
My advice is, if you're running into problems, engage with the Linkerd team directly and we can try to help you sort out the specifics. But going backwards, doing a downgrade, has not been a great experience in the past, so I hear you.

Obviously, with the new server-side policies you can enforce security within Linkerd. Is there a security reason to do both? I know you're enforcing at the proxy, but is it a concern that we should apply it on both sides? If we have pods that we don't want anything talking to, do you need to do it twice, or is one good enough?

Is the question about whether you would also do security in the application as well?

No, I mean at the firewall level. You could do it at the Linkerd level and at the firewall level, through something like a CNI.

I see. Yeah, you could; it's a belt-and-suspenders type of approach. They're slightly different things. When Linkerd talks about security and mTLS, what we're talking about is workload identity: we have identities tied to certificates that are bootstrapped from service account tokens. So we don't necessarily say that a request is coming from a certain place, but rather that it has a cryptographic identity representing a workload. If you want another layer on top of that, or below it, that can potentially make sense as well; it's just a different layer and a different thing.

For the cryptographic identity and the certificates, how are you issuing those? Is it through ACME? Do you link into an ACME provider using cert-manager, or does Linkerd handle it entirely?

By default it's handled entirely by Linkerd. We do integrate with cert-manager, so, for example, if you want automatic certificate rotation, cert-manager can do that, and that works great.
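To make the cert-manager integration concrete, it usually means having cert-manager mint and rotate the identity issuer certificate in the linkerd namespace. A rough sketch along the lines of the official docs, with illustrative durations, assuming you've already created an Issuer named `linkerd-trust-anchor` backed by your trust root:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer   # consumed by Linkerd's identity controller
  duration: 48h
  renewBefore: 25h                      # rotate well before expiry
  issuerRef:
    name: linkerd-trust-anchor          # assumed Issuer backed by your trust root
    kind: Issuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
  - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth
```

cert-manager then keeps the issuer certificate fresh, and Linkerd's identity system issues the per-workload certificates off of it.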
In the default case, we generate all of those certificates inside Linkerd, bootstrapped from the service account tokens, but the trust chain is self-contained; it's not connected to a well-known CA or anything like that.

OK, so if you wanted to operate your own ACME-compliant CA and hook it into cert-manager, would you still be able to get the same identity management and rotation?

Yep, you can supply your own trust root via your own cert-manager, and everything hangs off of that.

Thank you for the presentation, it was great. I'm a long-time Istio user but also a fan of Linkerd personally. One of the things I was looking at, in terms of day-two operations and visibility: Istio, for example, makes it very easy to use something like Kiali. Is there something equivalent inside the Linkerd community? Or is there an opportunity there, something we as a community should look at, in terms of viewing traffic routing inside the cluster?

Yeah, a lot of that is built into the Linkerd Viz extension. If you use Linkerd Viz, you get things like service maps and dashboards and so on, so a lot of the functionality that's in Kiali is also available through the Linkerd dashboard. On top of that, all of that data is in Prometheus, and you can build any visualization you want on top of it.

Two quick questions. You mentioned that the control plane was down to three components, but I counted five on the slide; is there some doubling up there?

Let's take a look, and see whether I'm wrong or the diagram's wrong. So the destination and policy controllers are still there, but they're bundled together in one pod, and the public API doesn't exist anymore. That's three, so the slide is slightly wrong.
OK, yeah, the slide's out of date, because I also think your Twitter handle was wrong on there. Oh, was it? Well, I found you and followed you anyway. The other thing you mentioned briefly was back pressure. Does Linkerd provide a back pressure mechanism?

Yes, that's a great question. For back pressure, this is essentially the HTTP/2 back pressure mechanism: as the proxy handles traffic, it takes bytes in and sends them out, and as soon as it sends them out, it signals upstream that it's ready for more, using the natural flow-control windows of HTTP/2.

Hi. I'm interested in Linkerd's authorization policies. They're pod-to-pod, workload authorization. Does Linkerd have a story around user authorization between pods, or for accessing services or endpoints in pods, further up the stack, where a gateway wouldn't necessarily be able to block that traffic?

Yeah, good question: no. We've always been focused on workload identity rather than user identity. I think that may change a little in the future as we start to take on more ingress functionality; that's a more natural place for it. But for now, all of the east-west stuff is workload auth.

Yeah, sure, I can take this one. The question was: can you explain the difference between east-west and north-south traffic? North-south generally refers to traffic coming in from the outside world through the ingress or gateway, and east-west is service-to-service within the cluster, from one service in the cluster to another.

So could it work with something like Open Policy Agent to have data policies on service-to-service traffic?
Yeah, potentially. There aren't any plans right now, but Open Policy Agent is something we've looked at in the past. It's very interesting and potentially useful for a lot of things, but we don't have anything concrete right now.

Can you say anything about the communication between your data plane and control plane? For example, Istio uses something called xDS, because that's what Envoy exposes. Can you say anything about what kind of protocol you use?

Sure. The question was about how communication between the control plane and the data plane works. Conceptually it's very similar to Envoy's xDS APIs, but it's a separate gRPC API: each of the control plane components exposes a gRPC API, and the proxies connect to the control plane to get that information. So, conceptually very similar to xDS, but a different thing.

OK, I think that's time. Thank you, everybody. I'll be hanging out if you want to ask me more questions.