All right, this is on. So welcome to this session on Multicluster Failover with Linkerd. I'm not gonna say much other than if you have any questions, there's a microphone down here in the middle, so please go up there and ask your questions. I'm sure Charles is gonna instruct how he wants his questions as well. But without that, I'm just gonna hand it over to Charles. So please give him a big round of applause. Well, this is by far the biggest audience that I've had the opportunity to speak with. So first of all, welcome, everybody, back to KubeCon. It's so exciting to see folks in person. We've done the virtual thing for the last couple of years, and I am just really energized by all of your presence right now. So what are we gonna do today? We're gonna talk about multiple Kubernetes clusters. We're gonna talk about why we're gonna do that. And then we're gonna talk about what it looks like to fail over your services, your traffic, between these two clusters. I'll be honest with you, I woke up in the middle of the night last night because I'm still on California time, and I had this vision of getting everybody up on the stage and involved, doing failover. What does it mean to have a single point of failure? Why do we need to rely on each other? So if you will experiment with me on this journey, I will, in my head, call out to you, and maybe some of you can share this vision of relying on each other, trusting each other. That's what I think is really, tremendously important about being back here in person: trusting each other. And, you know, let's have fun. So this is a workshop. I don't think we properly communicated that to you. I hope that you all have a couple of Kubernetes clusters that you can easily access. If you don't, you'll see me executing my commands up here on the screen. I don't have a ton of slides. We're gonna do a lot of walking through the terminal. You'll see my aliases. You'll probably see my secrets. You'll see my passwords. Do your worst. That's what I have to say. But again, trust. I'm so happy to see you all here. If there are any questions during this, the microphone, as Casper mentioned, is just there. This is meant to be a dialogue. I wanna share this conversation with you, share the space with you. So anything that you want me to stop and pause on, let me know. Like I said, I'm from California. I have a tendency to talk very quickly. If I'm talking too quickly, slow me down, raise your hand, whatever you need, I'm here for you. So with that being said, let's... So here's something I don't get to do anymore. Have you ever tried, during a Zoom meeting, to be like, so who here has worked with multiple Kubernetes clusters? This doesn't work in Zoom. Cause you're like, grid view, grid view, grid view, too many squares. Yes, I'm so happy. To me that looked like about 70, 75% of the room. Awesome. Great. In production, who is using multiple clusters? Yeah, that looks a lot less. That looks like 40%. I'm guesstimating. In production, who is using a service mesh? Look at y'all. And then in production, who's doing it properly with Linkerd? I'm heavily biased. I'm heavily biased. You'll get that from me during this whole time. So, I don't know, should I click on this? Just kidding. Okay. Let's get started. So let's talk about this dream that I had last night. Failover's not new. Back in the good old, bad old days, whatever you want to call it, we had multiple servers. You had VMs. You had actual physical servers.
And when things went bad with one server, you had something that was smart enough to route traffic to another server. Whether it's a database or whatever, none of this is a new idea. So everybody is familiar with failover. To me, it's a category of circuit breaking. It makes sense: when something fails, you don't want to have a single point of failure. Which, I've come to realize last night, I am a single point of failure. If I didn't show up today, you would all be staring at this screen without these slides. So the point is, this is not a new concept. What is new, and what we are excited about, is that the way that Linkerd sits in your Kubernetes architecture situates it perfectly to be able to fail over traffic when a service becomes unavailable. We'll talk about what that means. So let's start with that. I think the network connection is slow. If anybody is mining Bitcoin or downloading torrents, whatever's happening, maybe it's me. I might go back on the Wi-Fi. There we go. Okay. So we're gonna talk about service mesh. So again, I know some of you are running a service mesh in production. Who knows what a service mesh is? Who cares about service mesh? Who's here to learn about service mesh? Yeah. The first couple of slides are for you. The rest of the presentation is for everybody else. Yeah. So what is Linkerd? It's a service mesh. Why is it important? Why do people care about it? The history of Linkerd is that it started outside of Kubernetes clusters. It was a standalone JVM application. So you might call it monolithic. You might not. And it was built on Finagle and Netty, which are libraries; Finagle is a library that came out of Twitter. And its responsibility was to manage your traffic: to encrypt, well, not only to encrypt, but also to load balance traffic between services in a service-oriented architecture. So this is not new for some of you, but for others it might be. The really interesting thing about it, and the reason that I care about service mesh, is because as a former application developer and a current slide developer, I develop only slides now, when I was writing my code, if I wanted traffic to be secured between my services or whatever my endpoints were, I would actually have to include libraries. If I wanted to do load balancing or retries or anything like that, I would have to include this logic in my code. And I was not good at that. Who has written a retry block that looks like, while retries is less than retry count, retry? Anybody? Okay, good. I don't feel so bad. There's a smarter way to do that, and it's not to say that you've done anything wrong. This is something that Linkerd implements. So we've talked about Linkerd 1; we're gonna move into Linkerd 2, talking about the Kubernetes space specifically. So what is Linkerd? It's a service mesh. What do we care about as part of a service mesh? I mentioned security already. Reliability is that load balancing piece, and observability is being able to collect those metrics. All those things work together. For those of you running a service mesh in production, I make the assumption that this is obvious, and it's not obvious: when you add a new service, or when you scale the replicas of a single instance of a service and you see those latencies go down, how do you observe that? Where do you get that information? You get it from Linkerd.
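As a rough illustration of moving that retry logic out of application code: with Linkerd, a ServiceProfile can mark a route as retryable so the proxy does the retrying. A minimal sketch, assuming an emojivoto-style web-svc; the route and budget values are illustrative, not from the talk:

```bash
# The proxy retries failed requests on routes marked isRetryable,
# replacing hand-written "while retries < retryCount" loops.
cat <<EOF | kubectl apply -f -
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: web-svc.emojivoto.svc.cluster.local
  namespace: emojivoto
spec:
  routes:
    - name: GET /api/vote
      condition:
        method: GET
        pathRegex: /api/vote.*
      isRetryable: true        # the proxy, not your code, performs retries
  retryBudget:
    retryRatio: 0.2            # retries may add up to 20% extra load
    minRetriesPerSecond: 10
    ttl: 10s
EOF
```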
So when we talk about a service mesh, this is what's incredibly important to understand: all of the traffic between your services, within your cluster and across clusters, which we'll see today, happens thanks to Linkerd. Your services are not communicating directly with each other unless they are what we call unmeshed. Everything that we're gonna do today is gonna be meshed traffic. Any questions? Does that all make sense to folks? Do I sound crazy yet? Good. Okay, so when we talk about failover, what we're gonna dive into, it's important to understand, is the failover extension, and there's a concept there that I've already assumed: Linkerd supports extensions. So you've got your core Linkerd control plane, and then on top of that we have different extensions. Today we're gonna see the Viz extension, the multicluster extension, and the failover extension, which is like the grand poobah of this whole thing. If you don't know what a grand poobah is, it's because I'm really old and some of you are younger than me, and the grand poobah, I don't know, it's like the guy who is important. Anyway, today the failover extension is the most important part. I'm probably gonna cruise very quickly through the Linkerd deployment, the Viz extension deployment, and the multicluster deployment to focus specifically on the failover piece, because I expect thousands of oohs and aahs when you see failover happening. Oh, we're only 11 minutes in, goodness. I've got to talk slower. Okay, yeah, so the failover piece is an extension. You can run all of Linkerd without any of these extensions. You can just have the core components running, and you can trust that your application is secure, the traffic is secure, and that the load balancing is happening properly. But to really understand and get those metrics, which I'll show you, you're gonna want to include some of the extensions. So let's cruise through this. I promised you not too many slides. Here's architecture. I actually updated this slide. It used to look terrible. Now it looks less terrible. So you have services A, B and C. The way that Linkerd works, again, these proxies are injected as a sidecar into your pods, and we'll show you how that happens. This is all thanks to the control plane, which reads an annotation. Who knows annotations? I'm gonna keep doing the raise-your-hand thing all day because I missed it so much. Annotations, labels, who knows the difference between annotations and labels? Yes, okay, good. So yeah, we annotate workloads, whether it's a stateful set, a daemon set, jobs. Ask me about jobs later. Don't ask me about them right now. Deployments, daemon sets, stateful sets. Most of the workloads that you care about, we inject with an annotation. There is a control plane component, which I will show you, that is called the proxy injector. It says, this workload is annotated; I must add the Linkerd proxy to the pod. Who knows about the Linkerd proxy? Okay, who knows what language it's written in? Who thinks it uses Envoy? Yes, back in the old, the bad old days of KubeCon, people were like, yeah, Linkerd uses Envoy. It does not. The Linkerd micro-proxy is a purpose-built proxy for the Linkerd service mesh. It's written in Rust. Our team of maintainers spends a lot of time in the Rust community, contributing and giving back to many of the open source projects there. Tokio is one of them; Hyper is one of them. So we're very happy with how Rust turned out for Linkerd.
One of the questions that I get probably every six months or so is, can I run the Linkerd proxy as, like, an Nginx? No, you cannot. Again, it's purpose-built for Linkerd. That's why it works so well. The endpoints that are used to configure the proxy come from this control plane that we'll discuss very shortly. Does this all make sense? Nods, yes, good. Okay. This is where we're gonna end up. So I just jumped from, like, generic services A, B, C to: we've got a failover controller, we're talking about traffic splits, the service mesh interface. I've got some services here. I've got a service with a dash two. That means that, like, it's super important. But at the end of the day, we're gonna have two separate clusters running the same application and traffic being sent across them. In addition to that, I am going to tear down this emoji service right here. I'm gonna scale its replicas to zero, and what I want you to take away from this is the failover that you're gonna see happen. The failover controller is going to say: there are no endpoints for this service; I am responsible for routing traffic to whatever the backup services are. The backup services in this case are going to be emoji-2 and something that is called emoji-svc-east. And I'll show you what all that means. Does that all make sense? If not, let's go into the terminal. Again, I don't have a ton of slides. So as we talk through this, hopefully my fingers work. We'll see how that goes. But I haven't even set up my kubeconfig at the moment. So yeah, we're getting warmed up here. If you have any questions, now is a great time to ask. Yeah. Okay. I have a kubeconfig file. I have two clusters that I've already spun up. I already had Linkerd running on them. I tore it all down this morning because I'm kind of a masochist like that, and we're gonna start from fresh. Let's see. I have to hide something from y'all for a minute. So apologies to the AV crew, who, by the way, have been so professional this morning. I really appreciate them. I'm gonna unplug this. I apologize. He told me it gives him a heart attack any time I unplug it. So yeah, that's why. Yeah, you bet. You got it. I was just asked to increase the font size, which I'm happy to do. We already thought it was massive. What I'll tell you, though, is that we're gonna see some log files that are gonna get wide. So we'll do our best. If there are any other requests, please let me know and we'll take care of this. So, here we go. I'm in, actually I wanted to go back there. Okay, so you'll see that I have, if you don't already use kubectx or kubens, I will use those frequently. It's awesome, great software, easy for managing Kubernetes. You'll see that I have two clusters, east and west. They're both running in Civo Cloud, which, no plug there, other than that their clusters are easy to spin up through an API, whatever you wanna do. So nothing up my sleeve. I started these clusters 19 days ago. I've actually installed, deployed, and removed Linkerd multiple times from them. My coworker used them yesterday for a very similar presentation. So what we're gonna do now is deploy Linkerd to these clusters. Like I said, there's nothing up my sleeve. Oops. There we go. Yes, hey. So in my west and east clusters, pure Kubernetes, nothing special happening. There we go. Okay. I already have, in this directory, some certificates. Certificates are important.
While I deploy Linkerd, let's talk about why certificates are important and what they mean in Linkerd. Does anybody remember which cluster I was using? East or west? That's okay. Let's take a look. Well, east. The reason that I care about this is because, come on, there we go. East, this all looks good. Okay. So certificates are important, especially when we talk about multicluster communication. The certificates that you deploy with Linkerd, especially when you're deploying using Helm, are a requirement. In kind of a dev mode, you can use the CLI to deploy Linkerd. It'll generate YAML for you. In fact, I'll show you as soon as this is done what that looks like. When you use the Linkerd CLI to install and deploy the control plane, it has a CA. What happened? Oh no, I'm in the wrong namespace. Okay. Thank you. Let's do, right. I need to do create namespace, I think. Let's try this. I'm gonna look to you for the hand up again in case something goes wrong while I blabber on. So there is a CA that comes with Linkerd. The reason that certificates are important, and that it all matters, is because of that mutual TLS that I talked about earlier. Your services can communicate with each other just using normal plain HTTP traffic. When Linkerd is proxying that traffic, it will detect whether it's TLS or not, encrypt it, and send it over. This all happens through the magic of PKI. Who cares about PKI? Yes. PKI is public key infrastructure. So we create a root certificate, and we have issuer certificates. The reason that we have those two levels of security: the root certificate is the trust anchor, the mutually agreed-upon certificate used by the Linkerd proxies. When a proxy sends a certificate signing request, it's saying, please issue me a certificate that I can use to talk to my equivalent proxies throughout the service mesh. The identity issuers are intermediate certificates that are generated from the root CA. And you'll see in my directory, excuse me, I specified west and east. It's important in a multicluster Linkerd configuration that the root CA is the same CA, and the identity issuers are intermediate certificates from that same certificate key pair. The reason for this is that if, for some reason, your identity issuer certificates become compromised in one cluster, that is the blast radius of that compromised security boundary. Nobody can go into your other cluster and begin to tamper with the mutual TLS happening in that other cluster. Does that make sense? Security is very important. That's all I know. That's what they told me to say. Security is important. Okay, so let's see. I'm gonna get confused a lot about which context I'm in. We're currently in the east context. So if we look at this, we'll see that, yeah, we have, I'm gonna try, is this big enough for everybody? Okay, so we're gonna see log lines that are gonna overflow, and it might get a bit confusing, but the Linkerd control plane is deployed in HA mode. I have three nodes running in this one cluster, and you'll see I have three replicas of each of the control plane components. This is the default for Linkerd's high availability mode. You'll see from the command that I ran that I have values-ha.yaml specified there.
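A minimal sketch of how those certificates are typically generated with the step CLI, per the Linkerd docs; the filenames and the one-year issuer lifetime are assumptions:

```bash
# Root trust anchor, shared by both clusters:
step certificate create root.linkerd.cluster.local ca.crt ca.key \
  --profile root-ca --no-password --insecure

# One intermediate (identity issuer) per cluster, both signed by that
# same root, so a compromised issuer is contained to its own cluster:
for cluster in west east; do
  step certificate create identity.linkerd.cluster.local \
    issuer-${cluster}.crt issuer-${cluster}.key \
    --profile intermediate-ca --not-after 8760h --no-password --insecure \
    --ca ca.crt --ca-key ca.key
done
```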
That values-ha.yaml does things like configure the resources, requests and limits in Kubernetes, as well as set those three replicas. So you may notice a couple of buzzwords, maybe, I don't know. Things that I've mentioned: identity, a very important Linkerd component, and the proxy injector, also a very important Linkerd component. Let's do the same thing for the west cluster. So we're deploying Linkerd to, gosh, I hope I did this right. The Helm flag, which drives me crazy. The Helm flag for specifying a context is dash dash kube dash context. For kubectl, oh, we're gonna get into it now. Who says kube control? Nobody. Who says kube cuddle? Who just calls it k? There you go, kubectl for the win. Okay, so that binary just has a flag that's dash dash context. Helm, it has to be dash dash kube dash context. I don't know why. But yeah, so I've got the control plane in one cluster. I'm getting it across to the other cluster. At the moment, the clusters are not communicating with each other. We've got a few steps to go before we get there. Who's following along, by the way? Like, who wants to follow along? Okay, good. Everything going well so far? Awesome. Okay. So, let's talk about what we're building. I showed you a diagram that had a few things missing in the slides. The things that are missing, like the whole Linkerd control plane, are meant to be on these clusters. It's assumed. What we're building is identical clusters, with Linkerd deployments across the two clusters. So, one of the really interesting pieces of the Linkerd architecture, and you all can tell me if you think this is interesting or not: the clusters, when you have a multicluster environment, are meant to be agnostic. There's no one single control plane for all. There's no control plane of control planes. Each cluster has its own control plane, and that's what we're building here. So, I stalled long enough to make that happen. So, po is my alias for kubectl get pods. In the west cluster, in the linkerd namespace, I have the three replicas of each of the control plane components. In the east cluster, same thing. So, we have identical clusters that are not yet ready to communicate with each other. Next, we are going to, and I'm gonna do the, oh, I promised to show you: we did the Linkerd Helm installation, which is the 100% recommended way to do this in production. I think this is right. Ignore context, I don't know. Anyway, if we do linkerd install with the Linkerd CLI, there are probably a couple things I should show you first, but I wanted to make sure that I got that flag right, which should be ignore context. Skip it. Okay, well, here are a couple things you can do with the Linkerd CLI. You can do linkerd check. cli is not a subcommand. I had straight mouth-to-finger contact there. linkerd check is going to go through and tell me whether the cluster, the control plane, is ready, whether it's healthy. There's also a pre-install check: linkerd check dash dash pre. Before you deploy the control plane, it makes sure that you have the correct roles and permissions to be able to deploy the Linkerd control plane. So I think it's worth mentioning here that we consider Linkerd an extension of Kubernetes. You could also think of it as an application that runs in Kubernetes, but it should be no surprise that anything that's managing all the traffic within your cluster is gonna have permissions that are elevated.
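The HA Helm install being typed here, reconstructed as a sketch; this assumes the pre-2.12 linkerd2 chart layout and the certificate filenames from the step above, and the exact flags on screen may have differed:

```bash
helm repo add linkerd https://helm.linkerd.io/stable

for cluster in west east; do
  helm install linkerd linkerd/linkerd2 \
    --kube-context ${cluster} \
    --set-file identityTrustAnchorsPEM=ca.crt \
    --set-file identity.issuer.tls.crtPEM=issuer-${cluster}.crt \
    --set-file identity.issuer.tls.keyPEM=issuer-${cluster}.key \
    -f values-ha.yaml   # HA mode: 3 replicas plus resource requests/limits
done
```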
In this case, specifically, those elevated permissions are an init container that has the NET_RAW and NET_ADMIN capabilities. Is that what it is, Casper? NET_RAW, NET_ADMIN. So it's an init container; it can configure iptables. That's what it can do. There is a CNI plugin, if you're so inclined, where you can defer those responsibilities to a daemon set, and the pods themselves don't need those elevated privileges. All make sense? Okay, good. So we did, oh, thank you. We did linkerd check. We know that the cluster is healthy in the, this is the east. So we'll do the same thing in the west. I feel boring. Is this getting boring for anybody? How do we spice it up? Be honest with me. We're here for trust. We're here for trust, friends. Yeah, so this check's gonna run. It's gonna tell me everything's healthy. These clusters have been true to me for the last 20 days. So the next thing we wanna do is deploy Linkerd Viz. But as it stands right now, in fact, let's do something fun, which would be to deploy our demo application. I'm going to create a namespace called emojivoto. Emojivoto is our demonstration application. And, oh, it already exists. Great. Let's see if we can annotate it. Let's edit. So I didn't do a great job of cleaning up this cluster. What I'm looking for when we edit this namespace is an annotation, which doesn't exist. I'm gonna add it right now. This is going to be linkerd.io/inject: enabled. This is going to tell the Linkerd proxy injector, which we saw in the control plane: any time a pod is scheduled in this namespace, inject it with the Linkerd proxy. There we go. And then I'm going to do, ka is kubectl apply -f, emojivoto.yml. Conference networks, notoriously slow. Things are configured, things are created, things are looking good. Okay, okay. So what I care about here, I've seen this so many times, I know exactly what I'm looking for. What I care about here is the number of containers in the pod: two of two. This means that the Linkerd proxy has already been injected into the pod. What we can do is kev, which is kubectl get events. kev is also a very good friend of mine. It's one of my favorite aliases. What I want to see, actually, is in the linkerd namespace. In fact, the reason that I have this alias is because when you do kubectl get events, it doesn't automatically sort by timestamp, which drives me bonkers. So I have this alias that does it for me. So what I'm looking for here, oh, lots of failed stuff, that's fun. This was all happening before, but what I care about is the proxy injector. I may not have the right logging level. What I wanted to show you is that the proxy injector is a, who's familiar with mutating and/or validating webhook configurations? Yes, they're so much fun. The proxy injector is one of those. We've got another one called the tap injector. Any time a pod is scheduled, it reads the annotation and updates the YAML in real time, injecting the proxy into the pod. So I think what really matters here, let's look at one of the pods in emojivoto. Here we go. Actually, space, emojivoto. What did I do wrong? Oh, actually, I see. Thank you. Yes. Oh, what is this? You're right, you're right. Thank you so much. My fingers. Unfortunately, you have given your money to the CNCF to watch me type poorly for the next hour. So thank your company for that. But this is a live demo. Yeah. emojivoto, -o yaml. Blah, there we go. Thank you for your help on that.
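The namespace annotation step as a one-liner, for anyone following along; the effect is the same as the edit shown in vi, and the manifest filename is assumed:

```bash
# Annotate the namespace so the proxy-injector webhook adds the Linkerd
# proxy to every pod scheduled there, then deploy the demo app:
kubectl annotate namespace emojivoto linkerd.io/inject=enabled
kubectl apply -f emojivoto.yml   # the demo manifest; filename assumed
```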
So, looking at this pod: if we looked at the deployment, we would not see something like this init container, which is the Linkerd init container. This is the one that configures iptables. And here is the Linkerd proxy. Outside of that, my actual service, which is the web service, which is, I think, a React application. Here are all its variables, values. If we looked at the deployment YAML, that is all we would see. Does that all make sense? So the proxy has been injected by an annotation. Give me a hands-up if you get that. And this hands-up. Come on, help me out here. Okay, good. So you can run Linkerd like this in production, and that's it. You have load balancing. You have mutual TLS. But I can't actually show it to you, because I don't have... The next thing that we're gonna install is Linkerd Viz. And I could use Helm for this as well. I'm being lazy because my fingers are tired. And I'm going to install the Linkerd Viz components into the west cluster, and we'll do it in the east cluster as well. This is going to give me the ability to show you what I want to show you, which is the secure connections between the services, as well as the actual metrics. If you remember, from my very important second slide, the thing that Linkerd is really good at is capturing those metrics and giving them to you in an actionable way. It's one thing to have information and look at it and say, okay, I see my latency is X number of milliseconds. We'll see exactly what those are in just a second. To me, it's really important, and what Linkerd will open up for you, is exposing these Prometheus metrics in a raw format that you can then take action on. So one of the cases where I've worked with customers, in fact, Casper and I have talked a couple of times, and maybe some of you in this room, has anybody actually talked to me on Zoom? Like, are you all new to me? That's okay, if you are. When I've worked with customers directly, talking about solving problems with them and coming up with solutions for their architecture, what we get into oftentimes is these raw Prometheus metrics, which, like I said, I'm gonna show you. What I care about is that we can do something with those metrics. It's not just, yeah, they're there, they're great; we can actually do something with them. So if you recall, I have two clusters. I'm going to install Linkerd Viz into both clusters. While that's happening, let's see. Okay, so, yeah. So linkerd viz stat, this gives me the statistics. Viz is visibility, stat gives me the statistics. Let's get all of the deployments in the emojivoto namespace. The context is west. So I did that right. Linkerd Viz is not found. East, okay. So I think the pods are probably still starting up. What did I do? Linkerd, what's that? Oh, right, the Linkerd Viz. So we do want to get the deployments in the emojivoto namespace. Let's see if we can do this. Here's namespace not found, that's weird. Something should be happening. Okay, Casper, what am I doing wrong here? Say it again? linkerd viz stat, right? Let's look at the pods, see if we can get the pods. Okay, so let's do what that tells me to do and say linkerd viz check, west context, Viz not found. Did I, Viz, yeah. Linkerd, why is it, okay. I've done something wrong here. So let's, we'll look at all of the pods in the west cluster. Oh, I didn't deploy Linkerd Viz there, that's why. So let's go, Linkerd Viz. I didn't do it, okay. linkerd viz install, dash context west, dash, dash.
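A hedged reconstruction of the viz setup being typed here; the exact pipelines on screen may have differed:

```bash
# Install the viz extension on both clusters (Helm works here too):
linkerd viz install --context west | kubectl --context west apply -f -
linkerd viz install --context east | kubectl --context east apply -f -

# Golden-signal stats (success rate, RPS, latencies) per deployment:
linkerd viz stat deploy -n emojivoto --context west
```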
I did do it in the east though, right? Yes, okay. I forgot the context, ah, good call. Okay, so I'm so glad you're here, because I do this frequently, and if this were me running through my presentation in the hotel room, I would be banging my head against the wall. So thank you so much for saving me the skull fractures. Yeah, we're in good shape. So as we go through this, linkerd viz check, dash context east, one green check, two green checks. So, production tip, I'll show you as soon as this is done running: we have a good number of customers and end users, you know, the Linkerd community. I talk about users and customers interchangeably, so please don't be offended if I call you a customer, and please don't be offended if I call you a user. One of the really awesome things that folks will do is, you can specify a specific output format for any of the Linkerd checks, including the Viz one, JSON specifically. So some of the operators, people who are using Linkerd on a day-to-day basis, will just run this as a cron job, pipe it out to whatever their system is, and ingest that JSON. And if something doesn't look right, like their network is slow, then they will alert themselves on it. So again, actionability is what is important. Is that a word, actionability? If not, I coined it. Okay, so we've got Linkerd, the control plane, deployed, and we've got the Viz extension deployed. I don't know what's going on with this cluster. Needless to say, there are some hamster wheels spinning very quickly trying to get this data back to us so that we can display it, up, there it goes. So like I said, JSON: consume it, parse it, do whatever you need to do with it, put it into your alerting system and be able to understand what's going on. So now that we know that Linkerd Viz is installed and running, what I wanted to show you is linkerd viz stat, specify the context explicitly, and we'll look at all of the pods in the emojivoto namespace. Did I deploy it? I only deployed emojivoto into one cluster, didn't I? So there we go. What I'm gonna do is deploy emojivoto here as well. So you'll see I'm slowly building these clusters to be identical, and that's typically what we see in a multicluster configuration with folks who are in production today. It's not necessarily required, and Linkerd multicluster doesn't require that you have identical services running in every cluster. What it cares about is, if you want to route traffic, which we'll see, from one cluster to another, that service just has to exist in both clusters. So it can be one single service, not all the services. Does that make sense? Okay, good. Now I want to do linkerd viz stat po -n emojivoto, west. Hopefully this works. While that's happening, gonna find my water, a bit parched. No traffic found. Why not? Why not? Let's go to the east cluster where we know, I think what's happened is I've just deployed emojivoto. There's a traffic generator, and we use Prometheus metrics to display these statistics. What I think has happened is there just aren't enough metrics yet to display what we want to see. So if this says no traffic found, then we're in trouble. I'm gonna have to do some hand waving and distract your attention to somewhere else. Yeah. Any questions while we wait? No, okay. So we've got Linkerd, the control plane, deployed. We've got the Linkerd Viz extension deployed. Here are my metrics. Lovely. Saved the day.
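That production tip as a sketch: run the check on a schedule, emit JSON, and alert on anything that isn't a success. The JSON field names here are sketched from the check output and worth verifying against your CLI version:

```bash
# Run as a cron job, keep the JSON, alert on anything not "success".
linkerd viz check --context east -o json > viz-check.json

# Field names (.categories[].checks[].result) are an assumption; verify.
jq -e '[.categories[].checks[] | select(.result != "success")] | length == 0' \
  viz-check.json || echo "linkerd viz check reported failures"
```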
And often what I do is I will watch these stats, because it's just so fascinating to see the requests per second, all of this information. And again, you can output this as JSON if you want. You can consume it with whatever you need to consume it. People who use Linkerd in production have used different APM tools; I refuse to name any of them by name, but providers. Anything that can consume Prometheus metrics, you can export everything from Linkerd out to your provider of choice. Does that make sense? Okay, good. So we're watching; every two seconds this is gonna update. I've got some success rates, failure rates. These look like the golden signals. Who knows what the golden signals are? Who wants to raise their hand again? Yeah, nobody wanted to raise their hand again. Okay, yeah, the golden signals are specified by the Google SRE book, which, as far as I know, is basically the Bible for managing your DevOps practice. I use the term Bible with a lowercase b. You can also say dictionary. Your guide, your guidebook. Let's call it that. Yeah, so we've got metrics here. Perfect. linkerd viz check. Next thing we're gonna do is deploy, so again, our clusters are identical, next thing we're gonna do is deploy the Linkerd multicluster extension. mc is short for multicluster. If we look at the help for that, we see that there are a few different options. And I'm just gonna linkerd multicluster install onto west. There we go. And thank you to the person who reminded me that I needed to do this: dash context west, dash f, dash. So again, Helm charts are available for this. I'm just taking the shortcut because I can. Because we're in Spain, and I can ask you, who wants me to not take shortcuts, and you get to raise your hand. Yeah, there we go. You got it. That guy got it. Okay, so we're gonna run the same command on the east cluster. Hopefully I did that right. Yes, okay. So we're deploying the Linkerd multicluster extension to the east cluster as well. So again, our clusters, we're slowly building them up to be identical. You will see a couple of things being created here. We've got a gateway and a service mirror. This is all that's in the multicluster extension. So it doesn't matter which cluster I'm in. I'll start with east. The way that I'm gonna set this up is, I'm going to have the west cluster sending traffic to the east cluster. So, one of the things, if you all are interested in helping improve an open source project: we initially talked about, like, source and target, destination, and, I forget what else, local and remote. A lot of terms here that really just mean which cluster has received the traffic and which cluster is serving the response. Today I'm going to follow our docs, and I think our docs say local and remote. So for the sake of clarity, our local cluster is going to be the west cluster, and our remote cluster is going to be the east cluster. Does that all make sense? Okay, good. Yeah, okay. So you will see that I've got Linkerd gateways in both of my clusters. If we look at the service definitions for these gateways, we'll see that there is a public IP address. So get your mining bots ready, because you are about to see an IP address for my cluster. So this is the east endpoint, and then the west endpoint. I'm not going to do this today, but you can put this behind an ingress if you want.
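The multicluster install, reconstructed as a sketch of the shortcut being taken here:

```bash
# The multicluster extension, installed the shortcut way on both clusters:
linkerd multicluster install --context west | kubectl --context west apply -f -
linkerd multicluster install --context east | kubectl --context east apply -f -

# Each cluster gets a gateway (with a public IP) and, once linked, a
# service mirror:
kubectl --context west -n linkerd-multicluster get deploy,svc
```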
The thing that's important here, and I think it's actually misunderstood, or not well explained: these two gateways are communicating with each other using the same mutual TLS that Linkerd uses between services on the same cluster. Does that make sense? So if any of you were to try and go and poke this 212.246-whatever at 4143, you're going to get a 402, 401, 403, I don't know. Does anybody know about the coffee break HTTP code? What is it? 418. Look up HTTP 418, it's awesome. I learned about it a couple of years ago. So anyway, this is all secured through the same mechanism, those certificates that I used when I deployed Linkerd. We've got the trust root, and the Linkerd gateways themselves are Linkerd proxies with a pause container. So some of you out there might be thinking, hey, this would be a great way for Linkerd to act as a gateway, because it's called gateway. It's coming in the future. I get to tell you that it was announced at KubeCon; we talked about it already. It's on the roadmap in the open source project. Linkerd will eventually present a gateway as an ingress as well as an egress. So that's on the roadmap, something that folks like you have asked for for a very long time. Okay, so this all makes sense. We have two clusters. They have public endpoints. They're still not communicating with each other. What do we have to do to make that happen? We have to link them. So this is the one part that requires the Linkerd CLI. Everything else that I'm doing today I could have done through Helm. The linkerd multicluster link command. I'm gonna link east, with cluster name east. So you'll see from this command, I'm calling linkerd multicluster link. Because I have the kubeconfig and I have cluster admin rights on both clusters, the CLI is going to, in the east cluster, generate a service account and basically a kubeconfig file with the appropriate permissions, not too much, not too little. It's the Goldilocks level: the appropriate permissions for the two clusters to communicate with each other, and specifically to be able to create endpoints. I will admit, all I know is that with the Linkerd multicluster functionality, we'll see the service mirror component spin up as soon as I do this. Whenever Kubernetes creates an endpoint in the remote cluster for a service, that endpoint is translated by this Linkerd gateway to an endpoint that the source, or local, cluster can use to communicate. So let's do that. I hope that made sense. Again, the folks who wrote that code are here. I can read the code; we can sit down and read it together and understand it together. I cannot, as of this moment, tell you all of the things that are actually happening there to make this actually work. Is that okay? I apologize. They told you you were gonna get a Linkerd expert. They lied. Okay. So let's look at the linkerd-multicluster namespace in the west context, and what we should see, we've got this linkerd-service-mirror. Bueno. If we do linkerd multicluster check in the west context, what we should see is that, in addition to making sure that all of the services are running in the linkerd-multicluster namespace, it's now aware of a cluster named east that has a mirror and that it can talk to. So now, unlike five minutes ago, our two clusters are able to communicate with each other. If I run the same command in the east context, it's only gonna check to make sure that the components are there and exist, because we have a one-way connection right now.
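The link step as a sketch, following the documented pattern: generate credentials against the remote (east) cluster and apply them to the local (west) cluster:

```bash
# The one CLI-only step: mint a service account and kubeconfig in east,
# and create the Link (and service mirror) in west:
linkerd multicluster link --context east --cluster-name east \
  | kubectl --context west apply -f -

# Verify the one-way link, including the cross-cluster connectivity checks:
linkerd multicluster check --context west
```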
So, a couple of notes on that. You can have bidirectional communication between all of your clusters. Linkerd multicluster supports N number of clusters. So this architecture that we talked about, with not having a control plane of control planes, is a trade-off. Now every cluster looks like the same cluster. If one cluster goes away, you can still have that cross-cluster communication. And the important thing there, the thing I hope you're all asking yourselves, is: does it do circular checks? And yes, it does. So you can imagine a scenario, we're gonna export a service in a minute here: if you have bidirectional, or multi-directional, communication for N number of clusters, and you export one service between, just to keep it simple, east and west. Let's say I have my emoji service, which is the one that we're gonna export, exported in both clusters. Does it make sense where I'm trying to get with this? If there's circular communication, a request would just fall down into the ether and never be handled with a response. This linkerd multicluster check checks for that. Raise your hand if you get it. All right, I promise not to abuse my raising-hand power anymore. Okay, so the clusters are communicating. They're nearly identical, but for the fact that our west cluster is the source cluster that is going to send traffic to the east cluster. How in the world are we going to expose our services in the east cluster to our services in the west cluster? We've got our service mirror component that we saw, and the answer to the question that I just posed is that we add a label. The service mirror component is a controller that watches for specific labels on, sorry, on service resources. So I could do a kubectl label, but I'm gonna edit this, because I think it's more explicit if we edit directly in vi. So, edit the emoji service in the east context. Okay, great. Okay, so I've got, under metadata, we're going to just add a simple label. And I believe it is mirror.linkerd.io/exported: true. Oh, thank you. Okay, now for the moment of truth. At this point the service mirror in the west cluster should have created a service in the west cluster that can now route traffic from the west to the east cluster. So what I want to look at is, in the west cluster, the logs for the service mirror pod, to see if that actually happened. And I could probably do a kubectl get events and it would give me the same thing. But let's do this the hard way. West, and it's service mirror, I think. Wrong namespace. Okay, so what do we see here? What do I care about here? So the emoji service: creating a new service mirror, emojivoto, emoji service. What I wish that I had done before I showed you this: if we look at the service resources in the emojivoto namespace in the west cluster, what we'll see is that the service mirror created, you can see 103 seconds ago, emoji-svc-east. This was automatically created by the service mirror because I added that label. Straightforward. Does that all make sense? Okay. Still, traffic is not going between the two services. What do we need to get there? Does anybody know what we need to get traffic from one cluster to another? I know you know, Casper.
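The export step as a one-liner instead of the vi edit; emoji-svc is the emojivoto service name, and the -east suffix on the mirror is what the service mirror creates:

```bash
# Export the service from the east cluster with the mirror label:
kubectl --context east -n emojivoto label svc/emoji-svc \
  mirror.linkerd.io/exported=true

# The service mirror in west then creates the mirrored service:
kubectl --context west -n emojivoto get svc emoji-svc-east
```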
Who's familiar with the service mesh interface specification? That's the last time I'm gonna ask you to raise your hand, I promise. Yeah, so the service mesh interface specification was written by the folks at Deis Labs in conjunction with the folks at Linkerd and Buoyant, and I think some of the Google folks were involved. Anybody else know? Do you know who else was part of that? Okay. So what the specification seeks to do is define a set of specifications for doing things like traffic policy and traffic management, and Linkerd today supports the traffic split specification. So I have, and you might see some secret things here. So, okay, good. This is perfect. This is exactly what I want. Okay, this is a traffic split specification. API version split.smi-spec.io. If you are interested in more about the service mesh interface specification, it's smi-spec.io. We actually support v1alpha2. I don't know why I have v1alpha1. They're up to v1alpha4 at this point. It doesn't make a difference, although Linkerd only goes up to v1alpha2. A TrafficSplit resource. It has a name. It has to be in the same namespace. Who has seen this before? Because it took me a minute. So here's the thing. There's the concept of an apex service and a leaf service. In this definition, you see that service field, right? It says the root service. That's synonymous with the apex service. It's the one that decides where traffic goes. The reason that I struggled with this so much: in this case, the apex service is a concrete service. This apex service, or root service, can also be an abstract service, which means you have just a service definition with no pods backing it. Is that confusing to anyone? Don't raise your hand, just nod. We're done raising hands today. So it confused me. It really confused me a lot. I finally sorted it out. Any time that I do traffic splits today, I have a concrete root service, and this is the part that can be confusing: the backends, which actually handle the traffic, have to be concrete services. In this case, our concrete service, our apex service, is also a backend. So I will ask you to nod aggressively if that makes sense to you, because it was super confusing to me at first. Okay, good. Okay, great. So you all are brilliant geniuses. We're gonna apply this, and, you know what I didn't do? The right context. So this is actually interesting. It doesn't matter that I have it in the east context; it's gonna ignore it. I don't need to specify the namespace, because that's already part of our definition. What does matter is that it's in the west context. This traffic split is configured to route traffic from our source cluster, the west cluster, to the east cluster. So we now have traffic going between the clusters. Do you all believe me, or would you like to see it? I'll show you, don't worry. That question answers itself. Don't believe me, ever. linkerd viz stat, context west, emojivoto. And specifically, the stat command has a ts subcommand, for traffic split. Ignore this part. That's fine. This is talking about a future version, Linkerd 2.12, where there are gonna be some extra steps involved. It's not important today. What's important today is that we're all in Spain, hanging out together. There we go, okay. So let's watch this one. We're still waiting for traffic to come through, but what we should see is those latencies, the success rates, all those golden signals should begin to populate as those Prometheus metrics are collected.
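A hedged reconstruction of the traffic split being applied; the resource name is an assumption, and the weights shown are the 75/25 split that shows up on screen shortly:

```bash
# emoji-svc is both the apex (root) service and a concrete backend;
# emoji-svc-east is the mirrored backend that routes to the east cluster.
# v1alpha1 weights use milli-units.
cat <<EOF | kubectl --context west apply -f -
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: emoji-split
  namespace: emojivoto
spec:
  service: emoji-svc           # apex/root service: decides where traffic goes
  backends:
    - service: emoji-svc       # concrete local backend (west)
      weight: 250m
    - service: emoji-svc-east  # mirrored service, routes via the gateway to east
      weight: 750m
EOF
```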
See me separately, I'll be at the booth. If you find me out on the floor anywhere: personally, I think knowing how to get to the raw Prometheus metrics in Linkerd is super important. I don't have time to go through it and show you today, but it's really neat. And if you find me, I will show you how to do it. It's also a very handy debugging trick for Linkerd. So guess who has egg on their face? Why do I not have any traffic? Say it again. Oh, did I, oh, brilliant. In the west context, right? Brilliant, okay. So I showed you how to annotate the namespace, which is gonna do everything. Since we're here doing this, we could actually kubectl edit the deployment itself to add the annotation. One of the simplest, easiest mistakes to make when you're using Linkerd is to put the annotation in the wrong place. It should be in the pod spec. So whatever you have, your deployment, your stateful set, your daemon set, you've got the template, and then you have the pod spec with the annotations. What I'm gonna do right now is some CLI magic. Let's do... kubectl get... actually, deploy, emojivoto, west. This is gonna get the YAML for all the deployments. I am going to pipe that into linkerd inject. I don't think I need the context here, but I'm gonna do it just to be safe. Again, this isn't going to apply any... Ooh, what'd I do? This isn't gonna apply anything. It's going to generate a bunch of YAML, and it's, again, up to me to apply it, but we see the CLI reports that things were injected. So, specify the context. That's good to go. We haven't even gotten to failover yet, and I have 15 minutes. Anybody else nervous? Okay, so we see the pods, if we do watch... Nope, emojivoto. We should see these pods rolling. Oh, wrong context. So as you can imagine, whatever CI/CD system you have set up, you will want to use the context as a template parameter. So we see our pods are all restarting. We have two of two containers inside each of the pods; one of those is the Linkerd proxy, the other is the original service. Now, thank you to my friend. Visit the booth, we'll get you some swag, because you saved my bacon. Let's take a look at... We'll watch these again, and we should actually see these metrics now. We'll see the traffic going 75% over to the east cluster and 25% going to the west cluster. So what's interesting here? Anything that's going off-cluster, from the west cluster to the east cluster, we would expect to see higher latencies overall. Requests per second, the success rates are interesting. I don't know why that's that low. We do have a built-in error in our application, but it's usually around 83%, 85%, something like that. So that's interesting to me. Outside of that, this is traffic splitting. Our cluster in the east is handling traffic that's coming into our cluster in the west. Make sense? Great, okay, let's get to failover, because 13 minutes, I'm scared. The thing I want to do is talk about this traffic split itself. What I'm going to do is ship all the traffic to the east cluster. So if you're familiar with Kubernetes resource requests and limits, this should be familiar syntax. It uses, micro particles? I don't know. Anyway, it uses millis. Oops. So what I'm going to do, you can also do. Oops. Lots of oopses going on over here. Sorry. Hi, there we go.
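The inject-everything pipeline from a moment ago, as a sketch; this regenerates the deployment YAML with the proxy annotation in the pod spec and applies it back:

```bash
# Re-fetch the deployments, inject the proxy annotation into each pod
# spec, and apply the result; inject itself changes nothing until the
# output is applied:
kubectl get deploy -n emojivoto --context west -o yaml \
  | linkerd inject - \
  | kubectl apply --context west -f -
```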
0% of the traffic is going to go to the local cluster, the west cluster. 100% of the traffic is going to go to the east cluster, edited just the way I wanted it to. And we should see that update immediately in the stats output. Make sense? Everybody sees the weights. They've changed. This is where, okay, this is where it gets fun. You ready, y'all? Couple of things that we want to take a look at. I have a repository for you to visit. Don't look at this stuff. It's not important. Let's go back here. Oh, wow. Okay, we want to go to demos. Come on, Google. This is where I keep files for, dot, dot, dot, demos during presentations, GitHub, et cetera. So the important bits are in this linkerd-failover directory. I've actually forgotten what other stuff is in here. There might be something interesting. There might not be. So the setup.sh is a bash file that will use k3d, if you've got it, to set up your clusters for you and get everything in the right state. All it takes to do this failover that we've talked about is these three YAML files. I've got this emoji-2 service, which is actually just another set of pods that do the same exact thing as the emoji service. And the part that really matters, again, is this traffic split. So this is how we've written the Linkerd failover functionality. The part that matters is failover.linkerd.io/primary-service. And you know what I haven't done? I haven't installed the Linkerd failover extension. Let's see. So because we are short on time, let me save myself this energy. We'll go to linkerd-failover. It's in its own repository. It's an extension. I'm going to use Helm to deploy it. Today it works with Linkerd 2.11.2 and greater. It does not work with anything prior to that version, just so you know. I didn't do the right context on this, so we'll have to go back and do that. The failover controller is another controller written in Rust. It looks for that label that I pointed out in this YAML file, failover.linkerd.io, or sorry, annotation. And it says, if this exists, then I will keep an eye on the health of the service and make sure that if that service is no longer available, traffic is routed over there. Remember that thing that I said early on? Past me? Brilliant. Current me? Not so much. Yeah. Seven minutes. We're winding down. I apologize, you can tell that my gas tank is getting empty. But hopefully this is all making sense, and you're going to see some magic here that I am very excited to show you. Let's go back over here. Great. So we already have our Linkerd failover controller running. I am going to delete this existing traffic split, because it's confusing to have both of them there. I'm going to put up this other traffic split shortly. Yeah. So let's go. Great. Okay. And then what I want to do is, not that one. Okay. Let's go back over here. We'll get these raw. I need to get all of the different YAMLs. So we're going to create a new deployment, a new service. And again, these services do exactly the same thing as the existing service. Hey, thank you. There we go. Where are you? There you go. Okay. So we should see a new set of pods called emoji-2 being created in the emojivoto namespace, context west. Good. Okay. And then we will do the service. Five minutes. So nervous. Don't mess this up. Like, I came to Spain to be on time. I did all this. And now I'm so worried that I'm not going to pull this off. Okay. This created that traffic split for us. Yeah. Okay. So if I do ets, nope. linkerd viz ts -n emojivoto, west. There we go. Stat. I forgot the stat. Okay.
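The failover extension install and the managed traffic split, sketched from the linkerd-failover README of the era; the chart repo, label, and annotation are as documented there, while the split name and backend names are assumptions:

```bash
# Failover controller install, per the linkerd-failover README:
helm repo add linkerd-edge https://helm.linkerd.io/edge
helm install linkerd-failover linkerd-edge/linkerd-failover \
  --kube-context west -n linkerd-failover --create-namespace --devel

# A managed TrafficSplit: the annotation names the primary to watch,
# and the label opts the split in to the controller.
cat <<EOF | kubectl --context west apply -f -
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: emoji-failover
  namespace: emojivoto
  labels:
    app.kubernetes.io/managed-by: linkerd-failover
  annotations:
    failover.linkerd.io/primary-service: emoji-svc
spec:
  service: emoji-svc
  backends:
    - service: emoji-svc        # primary: takes traffic while it has endpoints
      weight: 1
    - service: emoji-2          # local fallback
      weight: 0
    - service: emoji-svc-east   # cross-cluster fallback
      weight: 0
EOF
```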
So we're going to put a watch on this once it comes through. No traffic found. No. What's that? I'm sorry. Oh, did I not inject it again? Brilliant. You are brilliant. emoji-2, -n emojivoto. This one we'll go into manually, in the spec. Whoa, whoa, whoa. Nope. Down, down. Like I said, this is the part where folks, myself included, mess up the most. Here we go. So this goes on our pod spec. We do linkerd.io/inject: enabled. Did I do that right? I did. Okay. So hopefully this is going to start giving us some output. 90 seconds. From 90 minutes to 90 seconds, I am going to... Oh, no. It all came down to this. Why? Why, why, why? I applied the service. East? And we're going to scale down, oops, context west, replicas zero, emoji, in the emojivoto namespace. Let's make sure that we have no pods running there. Terminating, great. Oh, interesting. Right. Okay. The emoji service, emojivoto namespace. I pulled it straight from the GitHub repository. Anybody figure out what I've done wrong? Other than show up today with a terrible demo? I swear it worked yesterday. Okay. Well, thank you all for your patience. The demo gods were not on my side today. What you would have seen, as we watch the traffic split statistics when there's actually traffic being sent, is that all of the traffic, when I scaled down that deployment of the emoji service on the local west cluster, would have been routed to emoji-2 and emoji-svc-east. So, apologies. Find me on the floor, and we'll get this sorted out. Like, I'm probably two keystrokes away from figuring out what went wrong. Thank you so much for your attention. I really appreciate it.
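For the record, the failover moment the demo gods denied would look roughly like this sketch:

```bash
# Scale the primary's pods to zero; the failover controller sees the
# service has no endpoints left and shifts the TrafficSplit weights to
# emoji-2 and emoji-svc-east:
kubectl --context west -n emojivoto scale deploy/emoji --replicas=0

# Watch the weights and golden signals move in real time:
watch linkerd viz stat ts -n emojivoto --context west
```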