Welcome everyone to Cloud Native Live, where we dive into the code behind Cloud Native. I'm Taylor Dolezal, head of ecosystem at the CNCF, where I work closely with end users as they navigate their Cloud Native journeys. Every week, we bring a new set of presenters to showcase how to work with Cloud Native technologies. They will build things, they will break things, and they will answer your questions. In today's session, Jason Morgan has joined us to talk about service mesh in production 101 with Linkerd. This is an official live stream of the CNCF, and as such is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of the code of conduct. Basically, please be respectful to all of your fellow participants and presenters. And with that, I'd love to kick it over to Jason to get us started on today's presentation. Jason. Yeah, hey, how's it going? Thanks for having me on again. I'm excited to talk to everybody about Linkerd in production. So I've got some slides here, and I'm gonna do my best to not go back to the slides after one more — it's got my headshot. So you can see it there — here, this guy. You can find me on Twitter, GitHub — not that I can see any reason you'd want to find me on GitHub — or on the Linkerd Slack as @jason, and probably on the CNCF Slack. All right, that's it for slides. What we're gonna talk about today is how to run Linkerd in production. So the first thing — and we'll get it posted in the chat — is that Buoyant, the company I work for, the people that make Linkerd, publishes a runbook on how to do Linkerd in prod. And it does a couple of things that are pretty important: specifically, it talks about things you need to be aware of before you deploy, potential gotchas, and things to check on. And that's essentially what today is gonna be about. Sound good to folks? Yes. All right, well, it sounds good to Taylor, and that works. Hi, Chivonc, all right. So, sure, great. Oh, one other thing, a great thing to check out: if you like this stream, this is based off a webinar that one of my colleagues, Charles Pretzer, did, that's a recorded Service Mesh Academy presentation. And let me just open that up. Nope, wrong, sorry, totally wrong thing. Service Mesh Academy, there we go. This thing: Linkerd in production. So let me just copy this link and get it to the folks to paste in the chat. It's a video that dives into basically everything I'm gonna do today, but in more depth. So, let's start. What this is not, right: we're not gonna do an introduction to Linkerd or an introduction to what a service mesh is. I'll give you a quick overview, because I feel like that's useful every single time I do it. Service mesh is kind of a weird name for a network of a bunch of little load balancers that sit between your applications and do stuff with your application traffic. So they intercept and then use your application traffic to do stuff. Typically, that stuff is adding some security by encrypting the traffic and identifying the various parties in the conversation. It makes things a little bit easier to monitor by adding some standard observability, where you can see things like success rate, saturation, and latency for your applications, and then it can improve the reliability of your service-to-service communication. So that's the very fast service mesh definition, and that's as far as I'm going into that.
This is also not gonna be a super deep dive into how do I do certificate rotation, or even how do I do monitoring. I'm gonna tell you a little bit, right? If you wanna learn more about debugging, monitoring, anything like that, go check out that Service Mesh Academy link. The resolution — oh yeah, sorry, Jonathan. Is this a little bit better? Hopefully, and I'll make all of them a little bit bigger. So if you check out the whole Service Mesh Academy link, you'll see all the Buoyant webinars, right? And of course there's a Linkerd 101 on this channel too. Yeah, and so that's the start. So before we do anything else, let's talk about what you need to do before you go to production. And I'm a huge fan of checklists. I just think it helps me not forget things that I shouldn't forget, right? Fonts are not super clear — I can't do anything about the font, but I'll take that feedback to the people that do the website, because yeah, I guess I don't know a ton about fonts. Sorry about that, Shellscript Code. So yeah, I think a checklist is really handy. The production runbook here has a bit of a checklist built into it. The things I would tell you: one, first off, look at this runbook before you go to prod with Linkerd — and look at a runbook for your thing, whatever your product is, before you go to production, right? Or build one yourself if there isn't one that already exists. Two, let's just check out Linkerd and the linkerd install command. So the Linkerd CLI has this install mode, and if I look at the help, it's got some flags I can set, right? Bunch of stuff to say about the flags, but an important one is that Linkerd has an HA mode for installing. This isn't on by default. It forces you to have at least three nodes in a cluster if you wanna install Linkerd in HA mode, because Linkerd's HA mode splits the control plane components into three replicas instead of just one, right? And those replicas have anti-affinity rules on them, so they can't run on the same node, right? So the bare minimum you wanna do, if you're running Linkerd in production, is have three replicas and have them not on the same node — so if one node goes down, you can recover, right? The other thing HA mode does that is awesome is it will enforce a policy that your pods can't start if the Linkerd proxy injector isn't running, right? So let's talk about why that is real quick. If you're using a service mesh because mTLS is important or required for your environment — like, I need to ensure that every app, when it's talking app to app, is using identified and encrypted channels, where both parties are mutually identified and the data they send is encrypted — or if that's a regulatory requirement, then you don't want your pod to start if it's not getting a proxy injected into it, right? So HA mode will say: hey, pods, you can't start if the Linkerd proxy injector isn't working. Sounds nice, right? But maybe you see where this might be a problem. It does this cluster-wide, and there are some components that, one, don't use the proxy, and two, need to work before the proxy can work, right? Specifically, if you have something like a CNI that runs in its own namespace, or you have kube-system. Both of those are gonna need to be available and ready before Linkerd starts.
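For reference, here's a minimal sketch of that HA install and the namespace exclusion — assuming a 2.x-era linkerd CLI, and noting that per the Linkerd HA docs the exclusion is actually a label on the namespace:

```sh
# Render the HA control plane manifests (3 replicas per component,
# anti-affinity rules, injector failure policy) and apply them.
linkerd install --ha | kubectl apply -f -

# Let kube-system (and similar infrastructure namespaces) start pods
# even when the proxy injector is down, by excluding them from the
# injector's admission webhook.
kubectl label namespace kube-system config.linkerd.io/admission-webhooks=disabled

# Sanity check.
linkerd check
```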
So when you install with HA mode, you wanna be sure that you annotate kube-system, and these infrastructure-specific namespaces, with a little note — that's the label you saw in the sketch above; I forget it off the top of my head, but that's why I have my slides here. Let's go to HA mode. There we go. There's a little annotation you can put in, and I'll try and get that sent to the chat. Oh yeah, here it is in slideshow mode. Hold on, I've got a little laser pointer. Come on buddy, please work. No, it's not gonna do it. Anyway, there's a little annotation we'll put in the chat. Oh, great. That basically tells the namespace — Thomas Ruecker is asking if this is the CNCF Finland event live stream. It is not, I don't think, Taylor. Yeah, it is not, but I could definitely check on that and get you some info, Thomas. Yeah, sorry, you're in the wrong classroom. All right, so that's a quick bit about HA mode; be sure that you've got the right thing installed. The other thing I would say is, if you're installing Linkerd and you're going to production, consider using Helm and not linkerd install, right? The Linkerd CLI is great. It's handy. It's a good tool that you'll wanna use, especially in production, as you debug things or get to know your environment, right? But the Linkerd Helm charts are what we see most folks that are running Linkerd in production use, and Helm has a bunch of features that make it extremely consistent when you upgrade and install, and it's very compatible with your version control system. The Linkerd CLI, say what you want about it, in general puts you into a bit more of an imperative mode, right? I am saying: hey, cluster, go do this, that, or the other thing. Oh, thanks, Gerald. So Gerald just posted the link to the Linkerd runbooks. Thank you for that. But the CLI kind of gets you into telling the cluster what to do in an imperative fashion, where what you really wanna be doing is something like Helm, to do it in a bit more of a declarative way. And I guess the CLI's not strictly imperative, but it feels more like that method. And I'll tell you, I see most people that are successful with Linkerd in prod using Helm, and we recommend it. Oh, and store your configs: take your values file and all the overrides that you use for the Helm chart, and store them in version control. This is a heavy ask for all of you out there: please store your configs and all your values in version control. It's crazy clutch, and if you lose that, you could find yourself in a bad spot. The other one I'd say is: use your own image registry, right? When you go pull images — in fact, let's see if I can install real quick: linkerd install --ignore-cluster. Let's just grab that. All right, we're gonna see a bunch of stuff, but this URL is the path to our image, and it's using an internet-based registry. Oh, hey, Sam, great to see you. It's using an internet-based registry that might be available, but it might not be available, right? Run your own registry if you want stuff to be highly available. It's a great call. There's a ton of great open source registries — I'm wearing my VMware shirt today, so, you know, check out Harbor. It's awesome. There's a bunch of other great ones too. I'm not telling you Harbor is better than your other registry, but it's a good one, and I like it personally. Yeah, run your own registry. It's really worthwhile.
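What that Helm-based, version-controlled install can look like — a sketch based on the 2.11-era linkerd2 chart; chart names and value keys shift between Linkerd versions, so treat the specifics as illustrative:

```sh
# Add the chart repo and install the control plane, passing in the
# trust anchor and intermediate issuer certs (more on those below).
helm repo add linkerd https://helm.linkerd.io/stable
helm install linkerd2 linkerd/linkerd2 \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key \
  -f values-ha.yaml   # HA overrides shipped with the chart

# Keep values-ha.yaml and every override you apply (including image
# values repointed at your own registry mirror) in version control.
```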
And again, using Helm, you can replace the URL so that you've got it pointing to your local registry — and then if your local registry is not available, go talk to the person that runs the registry and be like: hey, come on, let's get this going, I've got a production deploy that's held up, right? You get all that accountability internally. After the registry, there are a couple of things that you wanna plan for. Linkerd is good and it's stable in general. People find that it just works, right? Which is really good — it's a pretty good thing to say about the project. But at the end of the day, nothing always just works, right? You have to monitor and alert on your Linkerd environment. You have to know what you're gonna do for that, right? Am I making sense so far? Taylor, this is all resonating with you, right? It's kind of simple advice, and we'll do a bit more in demos in a minute, but I just wanna make sure that y'all are thinking about this before you go. So make a plan for monitoring and alerting. That runbook will help you think about it. I'll dive into a whole section on monitoring and alerting in a bit. Also, you don't need to become a deep Linkerd expert, right? One of the things that we say positively about Linkerd is that you don't need to develop deep Linkerd expertise. But how do I know if Linkerd is healthy on a cluster? Well, I know enough to just go in here and run linkerd check, right? And it's gonna tell me the current state of my environment. But you do need to know how Linkerd works, what you can do if something goes wrong, and how to debug it. And then really be aware of and think about how you're gonna do upgrades — make a plan, and I'm gonna talk about what that plan should look like — and understand the certificate hierarchy. So this is one that I just wanna dive into really quick, and this might be good for a little whiteboard. Oh, hold on, I'm just gonna try and get my laser pointer to work, because I absolutely love it. It'd be great if they put a replace-batteries function in there at some point. Well, it was fully charged yesterday, but all right, I'm gonna give up. So here I've got a fictional Kubernetes cluster. I have an ingress and a bunch of proxies. These proxies are the data plane, and together the data plane and the control plane make up your service mesh, right? And for all these things, there are three tiers of certificates. There is a root certificate, which isn't ever gonna be in the cluster, but is the source of the trust that everything else uses. And when you install Linkerd, especially when you install it with the Helm chart, you're gonna be forced to generate an intermediate CA — a certificate authority that you'll give to the control plane, and that the control plane will use to create and rotate the certificates for your workloads. And then every workload is gonna get its own certificate, right? That will be generated by the control plane and used to positively identify that workload. And if you watch some of the Linkerd overviews, we talk about what these certificates are and a bit more about how they work. But you need to have a sense of what certificates are involved, and what you're gonna use for your root CA. Are you gonna generate your own? Are you gonna use the corporate CA that you've got in your environment?
Or are you gonna build one that's purpose-built for this environment? And are you gonna rotate that root CA, right? That's a decision you have to make. I see folks that are like: hey, we're gonna rotate that root certificate every year — which is not a painless process, but it's something that they're okay with. Or we see folks that are like: hey, we're gonna treat the root CA for our service mesh like we treat the root CA for our Kubernetes cluster. We're gonna set it to a lifespan that should outlive the cluster, and that's it, and we'll go from there. And if you're gonna do multi-cluster, you need to share a root CA between clusters in order to do multi-cluster. And if y'all want, we can do a multi-cluster live stream, because that's one of my favorite topics. Wait, what's that? Sorry, my Amazon spy device was talking to me. Okay, so certificates: figure out what you're doing. There's a great tool to create them, too. Man, Gerald, you are hitting it up — thank you so much for this feedback. All right, so you need a plan for certificates. And then have a plan for how you're gonna rotate certificates in your environment, specifically this intermediate certificate. Most of the time our docs will tell you to make this last about a year, but I know folks that make it last seven days, and I know folks that make it last 10 years, right? You have to decide how you're gonna rotate it. Check out tools like cert-manager, right? It's super handy, it's worth using. It'll work with — well, not whatever certificate authority, but it'll work with a lot of certificate authorities. It makes it pretty straightforward to programmatically generate the intermediary certificate. And then of course, plan out the integration with your certificate authority. Quick check-in: making sense so far? That's most of what I'm gonna say about certificates. So folks, if this is a topic where you're wishing for way more depth — I don't think you do, Gerald, this has been fantastic — but if you want a deeper dive on this, go check out that Service Mesh Academy link that you saw earlier. That will talk about Linkerd and certificate authorities, and there's a talk in there, either one that exists now or one we're getting ready to do fairly soon, that's a deep dive into certificate management with Linkerd. And also check out the production runbook; it talks about this. All right, so that's certificates. Quick talk about HA mode, right? This control plane here is really a couple of different components, but when you install in HA mode, we make multiple replicas. So let's just look at this. Let me change my namespace — tab-complete that. So I'm now in the linkerd namespace. You can take a look at our pods, where we can see that I've got three copies of each of my components. I also have resource requests and limits set for all of these things that are fairly reasonable, but they're not gonna work for every single environment, right? So one thing I'd ask you to do: set up monitoring, look at your environment, test out the things that you do before you make changes to your production cluster, and see — do you have the resource requests and resource limits that work for your application? Are you seeing pods get OOM-killed? If you are, do something about it, right?
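Circling back to the certificate plan for a second: the tool being gestured at here is likely smallstep's step CLI, which the Linkerd docs use for exactly this. A sketch, following those docs:

```sh
# Root CA (trust anchor) -- keep this outside the cluster.
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure

# Intermediate issuer CA -- handed to the control plane, which uses it
# to mint and rotate per-workload certs. 8760h is roughly one year.
step certificate create identity.linkerd.cluster.local issuer.crt issuer.key \
  --profile intermediate-ca --not-after 8760h \
  --ca ca.crt --ca-key ca.key --no-password --insecure
```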
And speaking of things going wrong: we're gonna talk about debugging right after monitoring, and I'm gonna show you a little bit of how we're gonna use the CLI to dig in. Still okay as far as content? Any questions so far? Folks, the more I'm interrupted, the better. I can talk for hours if you lead me to it. Yeah, please let us know if you have any questions, just feel free to throw them in the chat. We'll get them asked — if Jason doesn't see it, I'll be sure to raise it up to him as well. Cool, thanks. And yeah, you interrupt me too, Taylor, if there's anything you want me to double-tap on. Absolutely. Going back: be sure that you set that annotation, at a minimum, on your kube-system namespace, to allow pods in that namespace to start even if the proxy injector isn't working, right? So there's a rule now in place, because we installed in HA mode, that says: don't let a pod start if this isn't running, right? But — all right, so we have a question from Guyon Raj, and I'm sorry, I can't explain Knative. Yeah, but happy to — you know, if anyone wants to see Knative and Linkerd at some point, we can also do a live stream on that. Anyway, this thing relies on kube-system. If kube-system won't start without it, and it won't start without kube-system, you get into a vicious cycle. Oh man, we're getting amazing questions. So let's hop in. Guyon Raj, again: Linkerd and Istio — you'd typically use one or the other, right? They're competing products, so in general, you'd use one or the other. I'll tell you to use Linkerd, but I work for Buoyant, so that's my job, right? But Istio is also a great tool, depending on your use case. Shellscript Code wants to know how proxy container injection works and how the change gets picked up on kubectl apply. Yeah, awesome. Amazing question, right? So the way proxy injection works is we just set an annotation. So we set an annotation — am I choppy or am I okay? My things started looking choppy. You set an annotation, and you can set it at a few levels. You can set it on the namespace. You can set it on the actual workload object — and when I say workload, I mean deployment, stateful set, or daemon set; I think those are the workloads that I can think of. So you set an annotation inside the specification that says: hey, please inject this workload with the Linkerd proxy. And you set any rules on it that you need — you get to tell it to skip certain ports, or treat certain ports in a special way, whatever you wanna do. But the annotation is the way that you set up injection. So going back to your question: the way the change gets detected is that the object gets modified, and when the object gets modified, the Kube API sees that change and automatically tries to roll the pod — when you put the annotation on the workload. When you put it at the namespace level, it won't take effect until the pod gets restarted for whatever reason, right? So if you wanna annotate a namespace — in fact, I think I've got emojivoto on here. Let's just do this: k get pods -A. Oh, emojivoto is already injected. Okay, well, less good. But I could set it on the namespace. Well, I'm not gonna do it now, because I'd have to look stuff up, and that would be annoying. But you can set it at the namespace level and just do a kubectl rollout restart on the namespace, and everything will pick up the proxy.
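A sketch of both levels, using the stock emojivoto demo namespace as the example; linkerd.io/inject: enabled is the documented annotation:

```sh
# Namespace-level: annotate once, then restart the workloads so the
# injector (a mutating admission webhook) sees the re-created pods.
kubectl annotate namespace emojivoto linkerd.io/inject=enabled
kubectl rollout restart deploy -n emojivoto

# Workload-level: the same annotation goes on the pod template instead
# (spec.template.metadata.annotations in a Deployment), and takes
# effect as soon as the Deployment rolls.
```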
And I think that's all your questions, Shellscript, but please let me know if that works. Mr. Feed — I think; sorry if I said your name wrong — asks: is the Viz component normally installed in production environments? Oh man, that's a great question. I'm just gonna park that, because I'm just about to go into it. Short answer is yes, it is commonly installed, but it's not commonly installed with just the vanilla installation, and I'll dive right into that. Thomas Rucker — Rucker, Rucker, I think. Ganraj is asking some interesting questions in here. I love it, man, thank you. So Thomas Rucker, I hope I said that right: would anyone happen to know when the non-K8s Linkerd — oh yeah, yeah man, so I can actually answer that, well, as well as anyone can publicly answer it, because we don't know exactly when Linkerd is gonna start supporting non-Kubernetes workloads, but the current tentative plan is that the 2.13 release, or somewhere around the 2.13 release of Linkerd, should include the ability to run Linkerd outside of a Kubernetes cluster. And there's a ton more I can say about that, and please feel free to connect with me — I'm on Twitter, I'm in the CNCF Slack and the Linkerd Slack — and I'm happy to give you way more details if you're interested. I'm not gonna answer any of the Istio questions. But Babu — I'm sorry, I'm not gonna say your name correctly here — asks if they can run two different service mesh technologies on one K8s cluster. Yes, but also, like, why? That's not a good idea. Oh, Shellscript Code asks if they can hard-code the proxy container in the deployment. Probably, but don't do that, right? You want the proxy injector to do its work. It's a mutating webhook, right? So that means it looks at the object and then it deliberately mutates the YAML for you on the back end. And with that process — because we control the version of Linkerd, and the pod specification is in our version control or whatever deployment system you use — you're gonna get the exact same output every time, right? And if you were to try and hard-code it, you'd add in a lot of extra work for yourself. It wouldn't add value to your environment, and it would make your whole deployment more brittle. So it's something that you could do, but I would recommend against it. And if you were to ask the community for help while you were doing that, the first thing people would say would be: get that stuff out of there and then try again, right? So it'd be setting yourself up for failure. All right, so going back — someone asked a great question that I'm gonna get to, I promise. Oh, Mr. Feed. So let's hop into your stuff now. Monitoring Linkerd: it's something you need to plan for. So if you look at Linkerd — in fact, I've got a live Linkerd dashboard that anyone can look at; if you feel like looking at it, go check out my Linkerd dashboard. I know I share this link every time I do one of these. But check this out, right? With this dashboard you can see a couple of things. I can see the state of my cluster from the perspective of Linkerd, and this is the Linkerd dashboard, or Linkerd Viz, right? It's an extension — a separate component from core Linkerd — that gives you some cool information about your environment, right?
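Since Viz is an extension, you can bring it up and tear it down independently of the core install — a sketch with the 2.11-era CLI:

```sh
linkerd viz install | kubectl apply -f -    # add the extension
linkerd viz check                           # wait for it to be ready
linkerd viz dashboard                       # pop open the dashboard
linkerd viz uninstall | kubectl delete -f - # remove it when done
```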
And one of the things that you'll see here is that the Linkerd control plane components are themselves in the service mesh — so the tools I can use to look at and diagnose my applications, I can use to look at and diagnose my Linkerd install itself, and you can see that here, right? Linkerd Viz — because of how my cluster's nodes are set up, some of the pods have restarted and they haven't been injected, right? So I can go do a rollout restart. So here's just a quick little bit of debugging right here: k rollout restart — I didn't get that. Yeah, it's because I'm not talking to you, watch — every bit of my technology is trying to interrupt me today. Okay, so I'm just going to go ahead and — well, it's probably a bad idea to do it right during the... oh, I'm in the wrong cluster, sorry. k ctx gitops, and a rollout restart: deploy, -n linkerd-viz. Yeah, come on, here we go. So we're gonna lose this dashboard for a bit, right? But you can use the tooling in that Viz component to look at your own Linkerd environment. It's actually really handy, so I would use it when you go to production. But you need more than just Linkerd and that Linkerd Viz component: by default, Viz installs an in-memory Prometheus and Grafana that you can use to pull up metrics, right? And if you've used Prometheus in production, the first thing you should think is: wow, I bet an in-memory Prometheus doesn't last very long — and it doesn't, right? You can't rely on that when you go to prod. So you have to have a plan for what you're going to do for collecting the Prometheus data from the proxies. You don't need to have Linkerd Viz installed to grab the data from the proxies with Prometheus. The documentation on how to use an external Prometheus is out there, it's public, it's a very common procedure — there's a sketch of such a scrape job just after this — but just make sure that you've seen it. And again, go back to that Linkerd production runbook: it will talk about some options you have for getting that data out, and some good patterns. Then go beyond that, right? Whether or not you're using Prometheus and Grafana to monitor it, it might be wise, depending on what you do for monitoring, to look at a third-party or paid tool. There's Datadog, there's New Relic, there are other monitoring tools — feel free to put them in the chat. But there are tools out there to make monitoring easier. Also, the company I work for, Buoyant, has made a paid tool that you can use to do a lot of the alerting and monitoring around Linkerd, and ultimately it's there to try and turn it into a bit more of a managed service for you. So if you're going to production, you can also do less of this checklist and just pay some people some money — but that's kind of your call. But no matter what, make a plan for storing your metrics. Like all of this: make a plan before you go to prod, have an idea of what you're gonna do before you run into the situation, so you're not scrambling, right? Linkerd, or any service mesh, is in the critical path for your applications. So that means if it's suffering, your applications are suffering, and that's a bad place to be, right? You don't wanna be on the hook, not understanding how to address this, that, or the other failure scenario, while you're losing production traffic. That's painful. So plan for alerting and monitoring. When you look at the Linkerd control plane itself — I know, but you're gonna be okay, buddy. Well, you'll be okay eventually. Sorry.
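On that external-Prometheus point: a rough sketch of the proxy scrape job, modeled on the config in the Linkerd external-Prometheus docs but trimmed down, so treat the details as illustrative:

```yaml
scrape_configs:
- job_name: 'linkerd-proxy'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only the proxy's admin port on meshed pods.
  - source_labels:
    - __meta_kubernetes_pod_container_name
    - __meta_kubernetes_pod_container_port_name
    action: keep
    regex: ^linkerd-proxy;linkerd-admin$
```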
It's taking longer to come up than it ought to. Anyway, if you look at the Linkerd dashboard and it's healthy, you'll see the success rates for the Linkerd control plane components. And there's no reason those should ever go — some of y'all try to get me in trouble — there's no reason it should ever go below 100%. I wanna be clear to everybody on the call: I work for the company that makes Linkerd. I think Linkerd is a great tool, and I'm gonna say positive things about everything we talk about. I'm not gonna tell you one way or the other. I'd say if you're evaluating service meshes, try some different options, consider what the things are that you care about for the mesh, and evaluate based on that. Linkerd is fast, it's lightweight, and I think it's absolutely awesome. Yeah. So, make sure that your success rates are never below 100%. Look at the events — the Kubernetes events in your environment, right? If you're seeing control plane components get OOM-killed — out-of-memory killed — you have bad resource requests and limits, and you're gonna need to address that, right? And please make a plan to monitor your certificates. I can't tell you how many times I've seen people come into the Linkerd Slack because they've had their issuer certificate expire. If this intermediate certificate expires, your stuff's not gonna be able to talk anymore. Your workload certificates aren't gonna be able to get generated, and that's gonna be a big problem, because your components aren't gonna trust each other anymore. So watch your certs — and again, Buoyant Cloud automatically watches your certs for you and is really aggressive about notifying you. All right, quick hop over to questions. Babu — sorry, I'm not sure how I'm gonna say your name there — if you're using Linkerd with something like Ceph storage, how does it work? I don't know a ton about Ceph and the integration, but it's like anything else: Linkerd will handle your TCP traffic. It's gonna do really intelligent stuff with your HTTP and gRPC traffic, and the TCP streams it's not clear about, it's gonna treat as generic TCP streams — so you're gonna get different metrics off those than you will off something it understands really well. How hard is it to enable trace logs for your sidecar? So actually, we're gonna talk about that in just a minute. Fias — sorry if I said your name wrong — we'll talk about enabling logs in just a second. So with that, that's my monitoring pitch. The long story short is: have a plan, check that production runbook because it's got steps, and then just make sure that you understand it well enough for when something starts to happen. How do I know if a pod is injected? You should have that answer for yourself, right? When I look at all the pods in my cluster — this is on my demo cluster — I can tell by looking at it what things are in a good state, what things aren't, and how many pods are injected, right? So I can tell that this thing isn't injected, because it only has one container per pod. Anything that's injected with Linkerd has at least two containers. That's one quick view, right? And you should be aware of stuff like that before you go.
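And one low-tech way to watch that issuer certificate mentioned above — a sketch assuming the default secret layout, where the issuer cert lives in the linkerd-identity-issuer secret under crt.pem:

```sh
# Print the expiry date of the intermediate issuer certificate.
kubectl -n linkerd get secret linkerd-identity-issuer \
  -o jsonpath='{.data.crt\.pem}' | base64 -d \
  | openssl x509 -noout -enddate

# linkerd check also warns when the issuer cert is close to expiry.
linkerd check
```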
All right, let me go back to my demo cluster here. So let's talk about debugging. First off, Linkerd is a Kubernetes tool, right? Let's actually — let's do this, sorry, there we go. Well, that's something weird in my script; regardless, let's get our pods again. I've got Linkerd running. If I wanna debug the control plane, the first thing I do is just go check out the events. So I'm in the namespace; let's just get events, right? And we can sort by timestamp, which is a handy one — I always save this command off to the side, because I never remember the syntax for sorting by timestamp, right? That didn't copy; give me just a sec, buddy — there we go. So I get events, I sort by timestamp — let me make this a little bit smaller. Look through the events, because they'll tell you if something's going on; this is the history of the pods in this namespace, right? So, like everything else you do in Kubernetes: events, logs, custom objects. Now, for the default Linkerd installation, you don't have any custom objects you need to worry about, but using Kubernetes events will get you a lot of details. Checking out the pods: are all the pods in my Linkerd namespace healthy? Well, okay, everything's up and I don't have any restarts, so that's a pretty good start. I can do logs, so let's just get logs for one of the identity pods. So, what are these various components? I've got the destination service — the destination pods — which tell me about the endpoints and help my proxies pick where to route traffic. I've got the identity service, which is actually generating all those certificates that we need — so it's clutch, right? So let's check out the logs for the identity service. And last but not least, we have our proxy injector, which is getting that proxy out into the environment. All important components. So let's see some logs: k logs, this guy, and — oh, right, you have to pick a container. So a tip that I use all the time: normally it tab-completes, but whatever — you just type logs and don't type a container, and it'll tell you that you have to pick one of the containers, right? So we can look at the identity container, we can look at the proxy, and we can look at the init container. And let's just talk about all those real quick. So let's see: identity. Now, this is the default log level, right? And now I need my slides again, because I don't remember this off the top of my head. So let's just share this real quick. I can set the log level in a couple of different places. Be aware: Linkerd's control plane used to be all written in Go, and now some components are starting to get written in Rust — there's some stuff about why it's now easier to do Rust for Kubernetes control plane components, but it has become easier, so you should expect more control plane components to get written in Rust, because Rust is a really interesting language — and this isn't the time for that talk, but there's some stuff there. Suffice to say, be aware: different controllers use different formats for setting the log level. Everything is out there in the documentation, and you can see it all in the Helm chart — in the Helm chart docs specifically — how you set the log level. In general, you can set it globally at the cluster level or per individual workload, right?
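Pulling that flow together — a sketch assuming the default control plane component names; the proxy-log-level annotation at the end is the documented per-workload knob, shown here against a hypothetical emojivoto deployment:

```sh
# Events in the linkerd namespace, oldest first.
kubectl -n linkerd get events --sort-by=.metadata.creationTimestamp

# Are the control plane pods healthy? (HA mode: three replicas each.)
kubectl -n linkerd get pods

# Logs, per container: the identity controller, its proxy sidecar,
# and the init container that rewrites iptables.
kubectl -n linkerd logs deploy/linkerd-identity -c identity
kubectl -n linkerd logs deploy/linkerd-identity -c linkerd-proxy
kubectl -n linkerd logs deploy/linkerd-identity -c linkerd-init

# Bump one workload's proxy to debug while reproducing, then revert.
kubectl -n emojivoto patch deploy web --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"config.linkerd.io/proxy-log-level":"debug"}}}}}'
```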
So just be aware of how you wanna do it. Typically what we see people do is leave the default log level and just turn on other log levels as they need them, but that is totally up to your use case and your environment, and feel comfortable customizing. There's no set recommendation, beyond: obviously, the more verbose your log level, the more intensive the logging operation is gonna be for your system, so be sure that you're compensating for that. So there are different log levels to set — which is the whole point of that story. Again, in the messages we can see here, I can see that certificates are being generated and why, and I can see some information about my issuer certs — some just generic stuff about this environment. And this is a great place to go if you wanna debug what's happening with certificates, and the deeper your log level, the more information you're gonna get. So that's the identity service. I could also check out the actual proxy — the linkerd-proxy container, right? This one's actually pretty simple, because the default log level isn't very verbose. If folks are — yeah, also great to see you, Owen — there is, I know, a New Relic integration. I'm not sure about Dynatrace, but I'd be surprised if there wasn't one. I'm just not super familiar with Dynatrace, if I'm being honest, but check them out. And if you're dealing with a company like Dynatrace, New Relic, anybody like that, and you're like, hey, I wish your Linkerd support was better — send them my way; I'm happy to talk to them and get them to a place where they're comfortable. It's like anything: where does a given organization best spend its time? And we're happy to make it easier for them to support you. So going back, looking at the proxy logs: I can see data about specifically who's talking to this component and why, right? I've got some values about inbound connections and what's going on. So if I were debugging, I'd put this into trace mode and output the logs somewhere, because even just with no traffic, this is already a lot of data. And if you're seeing traffic and you're in trace mode, it's gonna be so much chattier that you basically just wanna set the log level, save the logs, try to reproduce the behavior that you saw, and then set the log level back to normal, right? That's the basic flow. And last but not least, I wanna talk about the init container. So let me just look at the init container logs. This is your dive into what's happening with iptables, right? It'll output the existing iptables configuration when it starts, it'll output what it's doing, and it'll output what's changed at the end. Now, I'm not an iptables person, so this largely looks like noise to me, but if you're trying to debug and you wanna get into the iptables level, this is the place to go for that sort of information. Make sense? I'm just kind of pointing you at different places you can go with the tooling. That's the basic Kubernetes tooling. And we also have Linkerd's CLI itself, which gives us some tools that are fairly simple to consume. So first off — and really the most important one — just check your version, right? You wanna be on the latest patch version for whatever release you're on, and in general support is n minus two. So we're on 2.11, which means you can be on 2.11, 2.10, or 2.9 today, right? But if you're on 2.9, go be on 2.9.5.
If you're on 2.10, be on whatever the latest 2.10 patch release is — I don't remember what that is off-hand. So: get the latest patch version of whatever you're on. Beyond that, we have this handy-dandy linkerd check. linkerd check can also output data as JSON, so I've seen people put it in a little container, run it periodically, and just dump the output somewhere — you've got all kinds of options. Also, another Buoyant Cloud plug: it's got all this data for you, so you can get a sense of what's going on in your environment. But this just tells me: am I healthy? And there are some things I can do with linkerd check beyond just running the generic one. I can check my proxies, I can check my config, I can pre-check before I do an install. It's a great command — use it. It should be the first step if you suspect something's wrong: run linkerd check. I can get proxy metrics with linkerd diagnostics, and I can get details about identity or authorization. All right, there are a couple of other things I can do as I look in there, but I'm not really gonna go too much further down that road unless anyone has a specific debugging question. Also, hi to Billy — I hope I said your name right there. Oh, last but not least: tap. So Linkerd's dashboard comes with this tap functionality, and tap allows you to see metadata about the requests between your components. It's what gives you that "hey, when I called this path, I succeeded; when I called this other path, it failed." That data is super valuable. You can access the tap data via the Linkerd CLI: linkerd viz tap, right? I'm not gonna actually do a tap now — oh, thanks, Gerald — but I could tap a particular deployment and see what requests are coming in. I could tap an individual pod, and I can make really specific rules that filter that down. It provides a default output, and it can also output raw JSON data, which is super detailed, so you've got all sorts of stuff in there. So check out Linkerd tap when you're debugging. It's a great one.
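A sketch of those commands, again using the emojivoto demo app as the stand-in target:

```sh
# Health checks: cluster-wide, pre-install, and data plane.
linkerd check
linkerd check --pre       # before an install
linkerd check --proxy     # data plane checks
linkerd check -o json     # machine-readable, for periodic jobs

# Raw proxy metrics for a workload.
linkerd diagnostics proxy-metrics -n emojivoto deploy/web

# Tap live request metadata, optionally filtered, optionally as JSON.
linkerd viz tap deploy/web -n emojivoto
linkerd viz tap deploy/web -n emojivoto --to deploy/voting
linkerd viz tap deploy/web -n emojivoto -o json
```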
I wanna take the rest of this to just talk about upgrading — or Taylor, are there any questions we should address? I don't think there's too much. I've got some smaller ones around, you know, why Linkerd — I know the answer and love the blog post, so that's kind of what I'm getting at — but the reason why Linkerd doesn't use Envoy. Yeah, okay, great. So there is a blog post, which let me pop up here: why Linkerd doesn't use Envoy. I'll post that in the chat. The short version: for us, when we wrote Linkerd 2, there was a deliberate decision made to write our own proxy. And it comes down to, more than anything — not to say anything bad about Envoy — the Rust programming language was becoming more serious at the time we were rewriting Linkerd. At the end of the day, when you're writing a proxy, you want two things — I guess three things: you want it to be small, you want it to be secure, and you want it to be fast, right? So you need it to be in a compiled language that's not garbage collected, because garbage collection — that process of cleaning out your memory — adds all sorts of safety, which is great, but it also slows things down while the collection runs, right? Then after that, you want it to be small, so again, a compiled language that's very efficient is a big deal. And you want it to be secure. The downside of writing in C or C++, or of not having garbage collection, is that C and C++ are kind of famous for being hard to manage memory in. And I'm not a good enough programmer to explain to you why that is or how that works — I wouldn't even consider writing those things, because that's hard and I'm just not that good at it, right? Rust's compiler is specifically built to avoid memory management errors, or to catch them as it does the compilation. And again, I don't know enough about Rust to explain why that is, but dig in for yourself and see what makes sense. By writing in Rust, we made the Linkerd proxy really performant, really small, and really memory safe. So in our view, you want it to be fast, small, and secure, and you get that just by using the Linkerd proxy, written in Rust. And that was worth the cost of having to build our own proxy. And it's purpose-built, right? Envoy's great, but it's general purpose. Linkerd's proxy just does what the control plane tells it to do. Go try and use it for something else — I saw someone on Twitter talking with one of our maintainers about, hey, could I use your Linkerd proxy to do this? And she's like, no — you can take some of the libraries we use and build your own Rust proxy, but Linkerd's proxy only does what the control plane tells it to do. It doesn't leave room for a misconfiguration, right? If you've got something that manages Envoy — Envoy's hard, right? There's great stuff in there, but it's hard. So be sure that whatever you're using, you know that it's really good at configuring Envoy. And watch it, because no matter what service mesh you're using, vulnerabilities will come up due to configuration mismatches between those two projects — which we get to avoid, because we have a purpose-built proxy. And I talk about that more in the article. All right, I want to do five minutes on upgrading. Can we talk about that real quick? All right. So the first thing I'm going to tell you about upgrading Linkerd versions is: test, test, test it in non-production. Be real — keep your stuff in version control, try a dry run of what you're doing, and validate that. The second thing I'll tell you is that the data plane — those proxies — will work with a control plane up to two versions ahead, right? So you can roll the control plane forward, upgrade it, make sure that you're happy with it, then upgrade your workloads: do that kubectl rollout restart to get your workloads onto the later proxy version. You don't want a long-running version mismatch between the data plane components — those proxies — and the control plane, right? And you can see it in the Kubernetes cluster.
And again, if you're using my company's commercial product for Linkerd, it makes it really easy to see. But just check that. Be aware and pay attention. Have a checklist when you do your upgrade — just like you should have a checklist for going to prod, have a checklist for doing your upgrade, and make sure it includes upgrading your proxies and doing your testing. Don't skip versions, okay? Don't go from 2.9 to 2.11 if you're doing an in-place upgrade. I wanna be very clear about what I mean there. There are essentially two ways to upgrade: you can upgrade your live environment, or you can build a brand new cluster at the new version and switch over to it. I'm not gonna tell you which way to do it, because I think ultimately that's gonna depend on you. There are a lot of folks making a lot of hay about why GitOps is a great operating pattern, and I agree with them — but I don't know everybody's situation. I think if you can do it with fully disposable clusters, do it that way: don't ever upgrade anything in place; just deploy new clusters with new infrastructure and go from there. But if you are upgrading, don't skip versions. And even if you are doing a wholesale new cluster, test before you shift your traffic over. I know this is obvious, right? But I just want everyone to really have that in their mind, because it's an easy place to trip yourself up. And actually, yeah, Gerald just posted a link to upgrading. The other one I'd say is: look at the docs about upgrading versions. Specifically, when folks went from Linkerd 2.9 to 2.10, we did a whole bunch of talks and a whole bunch of messaging around: hey, listen, behavior's changed. You could see it in our upgrade notes. We talked about behavior changing — but is everyone reading upgrade notes? I've upgraded things in production without reading upgrade notes before; I'll admit it. Maybe I'm the only one, but I doubt it, right? So consider strongly reading upgrade notes, definitely test, and be aware that in some versions — especially the change from 2.9 to 2.10 — there are behavior changes that are significant and can cause problems if you're not tracking them, if you're not testing. Beyond that, I would again strongly recommend you use Helm. And if you're using Helm, know Helm, right? I want you all to see something: one of our Linkerd experts put this together and talked about upgrading, and they talk a lot about helm upgrade and --reuse-values versus --reset-values. I don't have time to dive into it now — just be aware of what you're doing and when. And another weird one: if you have to roll back using Helm, you're gonna helm upgrade to the new version you want, and then if you roll back, you're gonna helm upgrade to the old version, okay? So be aware of Helm and how it works. That's the big story: check upgrade notes, don't skip versions, and, like with everything else, make a plan — measure twice, cut once — especially when you're dealing with production traffic, because Linkerd sits right in the middle of your production traffic.
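What that looks like in practice — a sketch against the 2.11-era linkerd2 chart; adjust the names for your version, and decide deliberately between --reuse-values and --reset-values:

```sh
# Control plane first: fetch new charts, upgrade with explicit values.
helm repo update
helm upgrade linkerd2 linkerd/linkerd2 --reset-values -f my-values.yaml
linkerd check

# Then the data plane: restart meshed workloads so they pick up the
# new proxy (keep control plane / data plane version skew short).
kubectl rollout restart deploy -n emojivoto

# "Rolling back" is just another helm upgrade, pinned to the old chart.
helm upgrade linkerd2 linkerd/linkerd2 --version <previous> -f my-values.yaml
```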
Man, Gerald, I just wanna say: hit me up after this on Slack. I'd love to send you a thank you. You've posted so many great links; I'm crazy grateful. So now I guess we've got a couple of minutes. Is it okay to do questions, or do you wanna wrap? No, I'd say, yeah, if anyone has any questions, please feel free to throw them in the chat. We'll get them asked and answered. And then, if nobody has any questions, we can go ahead and wrap from there. I wanna make sure that I answered — someone had a question that I loved earlier. Yeah, so Mr. Feed there. Yes, they asked: is the Viz component normally installed in production environments? Yes, but typically folks externalize the Prometheus. They use their own Prometheus instance and connect the Linkerd dashboard to that; they don't use the built-in Prometheus that comes with it. Linkerd Viz is handy in production environments. And you can always just install it and uninstall it, right? Linkerd Viz is much less critical than the rest of it. So here in my cluster, I don't have Linkerd Viz installed. Even if I installed Linkerd itself with Helm, I can just go in here and type linkerd viz install, wait for it to be ready, and then uninstall it later if I don't want it. I can bring it up and bring it down, use it, get some tap data should I want it, pop open a dashboard, right? Let's do linkerd — sorry, linkerd viz check, right? So now I'm just checking the Viz extension. Again, like I told you, check has different modes it can be in. I can wait for it to be ready, I can pop open a dashboard, I can do the debugging I wanna do, and then I can turn it off. I'd say in general, it's not a high-cost thing to leave up and running, and I wouldn't remove it unless that's a particular pattern you have in your organization — but you can just install and remove it as you need it. So let me see if there are any new questions up here. Oh yeah, Shellscript Code, that's a great one: the resource utilization comparison. You know, again — remember, we're the Buoyant folks; different people will say different things — we did a benchmark, Istio versus Linkerd, right? We used the benchmarking suite from the folks over at Kinvolk, and I'll put the link in the chat. We did it twice last year. This article talks all about the ways we did it, but we found, when we tested with this third-party benchmarking suite — not our benchmarking suite — a dramatic difference in latency and resource consumption with Linkerd versus Istio. But again, do your own testing, right? It's totally worth it. Yeah, Gerald, I will look you up right after this on LinkedIn; I've got you copied and pasted there. All right, Taylor, I'll stop, man. Feel free to do the wrap. Absolutely — no, thank you so much, Jason. It's been wonderful to have you on again. I love that we keep meeting like this on these streams — and very soon at KubeCon. So just a reminder to all of you attending KubeCon EU: make sure to grab your reservations, both your hotel and your flight, if you're looking into that. I just booked mine recently and saw that things were filling up. The CNCF is looking at getting a few more blocks on that front, but time is definitely of the essence. It looks like it's going to be a popular conference. It'll be a lot of fun, and I hope to see you there. If not, the talks will be available afterwards. So lots of good sessions, lots of good Day Zero events — should be fun.
Thank you, everyone, for joining the latest episode of Cloud Native Live. It was great to learn from Jason. We enjoyed the interaction and questions from the audience, always a lively discussion. Next week we'll be joined by Richard Collins on the topic of workload misconfiguration, the number one security threat when using Kubernetes. Thank you so much for joining us today. Thank you again so much, Jason, and we hope to see you soon. Thanks, everybody. Yeah, thank you. Hope to see you at KubeCon.