Thank you so much, everyone, for coming. It's lovely to see such a full room, so I really appreciate you spending time here to learn about Cilium and how you can use Cilium Cluster Mesh to make it very, very easy to deploy your services across multiple clouds and multiple clusters. I'm going to try and get these out of the way. I don't think they make any difference because I'm using this. OK, great.

I work with Isovalent, the company that originally created Cilium, so I have the pleasure and privilege of working with some incredibly talented engineers there. I've also been very involved in the CNCF over the years, with the Technical Oversight Committee and now the governing board, and I'm on the OpenUK board as well.

Cilium Cluster Mesh is not really a new thing, so there's no new launch today. This is really about diving a little bit deeper into what Cluster Mesh is, how it works, and how easy it is to use, and I also want to talk a little bit about my experiences of connecting clusters in multiple clouds. I'm going to do something ever so slightly reckless: I have a cluster running in EKS and another cluster running in Google Cloud, so I'm going to be relying on the Wi-Fi. Let's hope I'll be able to keep talking to those clusters for the next half hour or so. Is that big enough for everyone to see? Yep, great, good.

Cluster Mesh enables us to distribute the functionality you get in Cilium across multiple clusters, whether those clusters are co-located in the same cloud, spread across multiple regions, or, as I'm going to do today, running in different public clouds, and there are all sorts of reasons why you might want to do that.

The first thing you need to do is actually connect those two clusters, and this isn't really anything to do with Cilium; this is about how you set up a connection. In my case, I have connected my VPC in Google and my VPC in AWS with a VPN connection. One thing you have to do is learn the terminology for what these things are called in AWS, what they're called in Google, or what they're called in Azure. The terminology might be slightly different in different places, and the interfaces might look slightly different, but you are essentially trying to set up something symmetrical. I've got a VPC with essentially one subnet in the Amazon cloud, and I want connectivity between that and my VPC in Google Cloud. To keep things easy, I tried to use 10.5 addresses for my Google cloud and 10.6 for my Amazon cloud, although it turned out Google also wanted to allocate a few other ranges for pods and services, so I may refer to them as 5 and 6 occasionally.

We have this VPN connection, which actually consists of two VPN tunnels linking those two clusters, but we also have to advertise the addresses from either end, so that if the Google cluster sees an address that starts 10.6, it knows it's going to reach it through the VPN connection, and if a node on the AWS side is looking for a 10.5 address, or one of those couple of other subnets, it knows it's going to reach it through the VPN connection.
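If it helps to see the shape of that setup, here is a rough sketch of the AWS side using the AWS CLI (the Google side mirrors it with a Cloud VPN gateway and a Cloud Router). The IDs, IPs and ASNs here are placeholders, not my actual configuration:

```bash
# A virtual private gateway attached to the 10.6.0.0/16 VPC.
aws ec2 create-vpn-gateway --type ipsec.1 --amazon-side-asn 64512
aws ec2 attach-vpn-gateway --vpn-gateway-id vgw-EXAMPLE --vpc-id vpc-EXAMPLE

# A "customer gateway": the virtual representation of the remote (Google) end.
aws ec2 create-customer-gateway --type ipsec.1 \
    --public-ip 203.0.113.10 --bgp-asn 65000

# The VPN connection itself comes up as two tunnels, and BGP runs over them
# to advertise the 10.5 ranges into AWS and the 10.6 range back to Google.
aws ec2 create-vpn-connection --type ipsec.1 \
    --vpn-gateway-id vgw-EXAMPLE --customer-gateway-id cgw-EXAMPLE
```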
So, you set up something like a virtual private gateway in your cloud, attach it to your VPC, you have a kind of virtual representation of the remote end, and you run BGP to advertise the addresses through that VPN connection. Cilium does not really care about any of this, which is why if you run Cluster Mesh in kind nodes on a local machine, it will look exactly the same as it does here running in two completely different clouds. Cilium really just doesn't care about how those two clusters are connected. It does, however, care about being able to route, or "rowt" if there are any Americans in the audience, to be able to understand how to reach IP addresses. And they need to be unique. Here I've got 10.5 in Google; I couldn't also have 10.5, the same subnet, in the Amazon cloud, because it would be very confusing: how would you know where to route to? So we need unique, non-overlapping sets of IP addresses for the pods, and also for the nodes.

So once you set up this VPN connection, with BGP announcing routes from one end to the other, you get a route table at either end. This is the Amazon end, and what you can see here is that the orange entry in this table says everything that starts 10.6 is local, it's on this VPC. Everything that's 10.5, or those other two subnets that Google allocated, I'm going to reach through this virtual gateway, which really says go through the VPN tunnel. And then anything else is just on the internet, so reach it through the internet gateway to the outside world. And there's a symmetric version of the same thing at the Google end. Again, here it's got those three subnets that are in the Google cloud, and they're routed locally. If you want to reach the Amazon cloud, for reasons that I can't give you a good explanation for, the Google side lists the two tunnels separately, whereas it's just seen as one combined connection on the Amazon side. There are two tunnels in both cases, but from Google's perspective you can reach those addresses through either of those VPN tunnels. And then if you had an address that wasn't 10.5, or 10.6, or 10-whatever, you would go through the default internet gateway. So these route tables tell the nodes in one cluster which interface to use to reach the remote end.

Now, I could probably spend 50 minutes doing a talk about how long it took me to realise you also have to propagate the routes. This is a pretty annoying checkbox if you're trying to set this up. You have to tell the virtual private gateway to inform the nodes in your subnet about those remote addresses, otherwise they just don't know how to reach the remote end. In addition, having set up the routes, you also have to have the right permissions to send traffic. Typically, when you set up a VPC in a cloud, it will create some default rules for you that let you send traffic within that VPC, so maybe it's going to permit all the 10.5 traffic in my 10.5 VPC. But you have to configure your own rules to say, I also want to accept traffic from this remote cluster. We don't have the time or inclination, and it's really nothing to do with Cilium, but you need to know to set up the firewall rules, the security groups or the network access control lists, to allow the remote traffic at both ends. Otherwise it's not going to work: the traffic can get to the remote VPC, but then it can't get to the destination node.
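For what it's worth, those last two gotchas look roughly like this on the command line. The route table, security group, network names and ranges are placeholders standing in for my setup:

```bash
# Tell the AWS route table to learn the BGP-advertised 10.5 routes from the
# virtual private gateway -- the "propagate routes" checkbox that cost me hours.
aws ec2 enable-vgw-route-propagation \
    --route-table-id rtb-EXAMPLE --gateway-id vgw-EXAMPLE

# Permit the remote traffic at both ends. AWS side: let the Google ranges in.
aws ec2 authorize-security-group-ingress \
    --group-id sg-EXAMPLE --protocol all --cidr 10.5.0.0/16

# Google side: the mirror-image firewall rule for the AWS range.
gcloud compute firewall-rules create allow-from-aws \
    --network my-vpc --direction INGRESS \
    --allow tcp,udp,icmp --source-ranges 10.6.0.0/16
```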
So having got your connection between your two different clusters, now we can start talking about Cilium, and we can start talking about enabling Cilium Cluster Mesh. We have to give each cluster a unique name and ID, because otherwise they'd get confused about each other. So I've called them five and six; one of them is EKS, one of them is GKE. And they do need to be using the same routing mode. Cilium can either use native routing, or encapsulation, an overlay mode in which your packets get encapsulated inside a bigger packet. You have to be using the same mode at either end, otherwise the packet you receive would have a weird address that you didn't expect. In my case, I've got native routing at both ends.

OK, I'm not going to do this live because cluster mesh is already enabled. When you run the cluster mesh enable command, what it creates for you is this cluster mesh API server. So let's just make sure that our cluster meshes are up and running, that our whole systems are running correctly, and let's hope the Wi-Fi is going to tell us that Cilium is up and running. We'll come back to it. What we should see is something like this: it's deployed OK for cluster mesh, and it will have deployed a pod and a service for the cluster mesh API server. We'll come back to that in a second; let me just talk about the API server for a moment. When you enable cluster mesh, you get this API server that allows remote clusters to learn about the services on that cluster. Each Cilium agent listens to the remote API servers it's connected to over cluster mesh, so that it can learn about the services running on those clusters. So let's see whether... yeah, OK, it's come back in both places, and we can see cluster mesh is OK in EKS and it is OK in GKE, great. And if we take a look at the services, I think it's in kube-system, let's just check. Yeah, we should see here: we've got the cluster mesh API server running, accessible externally, great.

So the bit that I skipped over is running cluster mesh status, which tells us whether or not clusters are connected. In my slides I haven't connected them yet, so they're not, but we would see that the cluster information is available on this service. I'm now ready to be connected to a remote cluster, and we do that with cluster mesh connect. All we need to do is pass in the kubeconfig context, in the same way that you set up your context if you're running a kube control command. Or how many people say kube-cuttle? How many people say kube-C-T-L? Do we have any kube control folk? I hope to convert you: kube control. Or you can just say "k" like I do and have an alias, that's the easy way. So if you're running kube control, by default you're going to pick up your kubeconfig, or you can specify a context, or specify a config file. Cilium cluster mesh essentially runs against your Kubernetes cluster locally, and also against the remote cluster that you're connecting to.
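For reference, here is roughly what that looks like from the command line with the Cilium CLI. The cluster names, IDs and kubeconfig context names ("eks" and "gke") are just stand-ins for my demo setup:

```bash
# Each cluster gets a unique name and ID at install time (Helm values
# cluster.name / cluster.id), and both clusters must use the same routing mode.
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
    --set cluster.name=eks --set cluster.id=6

# Enable the cluster mesh API server in each cluster and wait for it to be ready.
cilium clustermesh enable --context eks
cilium clustermesh status --context eks --wait

# Connect the two clusters; the CLI reads both kubeconfig contexts and gives
# each cluster the details of the other's cluster mesh API server.
cilium clustermesh connect --context eks --destination-context gke
```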
So in this example, when I ran this, I ran it on cluster six and said I want to connect to cluster five, and it looked locally and remotely, figured out how to connect those two clusters to each other, and sent each one the information about the remote API server. So if I look at the cluster mesh status, we should be connected in both cases, I hope. OK, good. We've got a global service count of one in both cases, and we can see the remote end in both cases. So, good. It looks like that.

So now we can run global Kubernetes services that are accessible from both of those places. Or, since I like to have Star Wars-themed demos, let's consider them intergalactic Kubernetes services. All we need to do to make a service global, intergalactic, is annotate it with global set to true. There are a couple of other annotations I can use that I will be showing, and to make my life easier, I've added them all in already. If I look at my rebel-base service, we should see... yeah, it's got global marked as true. And the same should also be true at the remote end. Yeah, global is true. The shared annotation means not only is this a global service, but I am also prepared to advertise it to my neighbouring galaxies.

So that means I can... actually, let's run it a few times. I can exec into my X-wing in this galaxy, and if I send a message to the rebel base, sometimes it's going to get to the local galaxy, sometimes it's going to get to the remote galaxy, and sometimes we're going to hit live demo issues, which we'll gloss over. But yeah, it's randomly picking between the local and the remote cluster. And we can do the same thing from this end as well. It's taking a bit longer. Actually, I've got a timeout on this one for a reason; I'll do that again without the timeout, and hopefully we will see... I hope we're going to see responses from both clusters. Maybe the EKS cluster is responding a bit more slowly. Maybe it's the Wi-Fi, I don't know. Hopefully we'll see both ends. Live demos!

OK, so what's happening here? We've got an X-wing pod, and in this particular case I am sending requests from both sides, but my diagram shows my X-wing making a request to the rebel-base service. I used the name, so there was a DNS lookup that converts the name to a service IP address, and then, like in any other Kubernetes service, we're going to load balance from that service IP address to one of the available backends.

OK, so let's have a look at the services. Get service... and we can see there are a few services here. Here's the rebel-base service, and if I describe that service... OK, I did this a minute ago, but we're going to look at something slightly different. So that's got two backend endpoints, I hope that's visible enough. There are two IP addresses, and they correspond to the local pods. So here are two pods, 10.6.1.60 and 10.6.9.41. And yet, when I ran my request, sometimes it was getting responded to from my remote cluster. If I give up waiting for this and try to look at the pods over here, these have 10.172 addresses. So how, if Kubernetes only knows about the local pods, are requests sometimes getting sent to the remote addresses? And the answer is that Cilium knows about them. So if I exec into my Cilium pod here, I think it needs to be in kube-system, and I run a command... I think it's endpoint list. Let's try that.
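Just to show how little there is to it, this is a sketch of what the annotated service looks like, based on the standard Cilium rebel-base demo manifests; the service name, port and selector labels are from that demo rather than anything special to my clusters:

```bash
# A global service is just a normal Kubernetes Service with Cilium annotations.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: rebel-base
  annotations:
    service.cilium.io/global: "true"   # load balance across all connected clusters
    service.cilium.io/shared: "true"   # advertise my backends to neighbouring clusters
spec:
  ports:
  - port: 80
  selector:
    name: rebel-base
EOF
```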
And we can see the endpoints that this Cilium agent knows about. And if I run service list, that's the one that's more interesting: service list tells me how a service relates to the pods that are the backends of that service. I've got a great memory, so I can tell you that this was the service address for rebel-base, and there are four entries here. We've got a couple of 10.6 addresses, which are the local pods, and we've also got some 10.172 addresses, which are the remote pods. Cilium is going to load balance across those four pods, and sometimes it's going to choose a 10.6 address that just gets routed locally in the local VPC, and sometimes it's going to pick a 10.172 address, which gets routed over that VPN tunnel to the remote cluster. It's as simple as that. Kubernetes doesn't know about it, but Cilium does. This is also why it's important for those pod addresses not to overlap: we need to be able to route to the different destination pods.

So what can we do with the ability to send a request to remote galaxies, other than being able to contact different rebel bases? Well, we might choose to do topology-aware routing. We might say, I would prefer a local backend if one is available, but if the rebel base gets blown up, or in fact doesn't exist on Dantooine, then we'll just send traffic to another cluster. So if I edit the service... oh yeah, is that right? ...and change this affinity to say I would prefer local pods. So now if I make some requests here, since I'm running on the EKS cluster, we should hopefully see them all being responded to locally from Dantooine. I'm going to have to find out what it is that causes that output to crash sometimes, but you can still see the responses coming back on a regular basis, always from Dantooine. That's because Dantooine is local: it's the EKS cluster, and I've preferred those pods. Now, if I scale down those pods locally, we basically remove them, so there are no rebel bases locally. Let's check whether that's happened. OK, we've got no rebel-base pods running locally now. And if I rerun my request, we should see them, fingers crossed, getting sent to and responded to from Alderaan. Great. So that allows us to do things like failover, or gracefully moving traffic from one galaxy to another, for example during an upgrade.

We also have the option to prefer a remote cluster. So we could scale the pods back up again, let's do that, and I'll edit the service again to prefer the remote pods. Even though we've hopefully got some pods back up again here, we've got two rebel bases here, we should nevertheless see the responses preferentially routed to Alderaan in the Google cluster. The other thing I could do is decide that Alderaan doesn't want to receive this traffic, so we could change that side to say, you know what, although it is a global service and I would like to be able to reach both clusters, I don't want to share. So we should find this side can go to either destination, but this side... I don't know what's happening, why it's not going through. So even though on the EKS side I would prefer to go to the remote cluster, because the remote cluster is not prepared to share with me, I'm getting all my responses locally.
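The knobs I'm turning here are all just annotations as well. This is a rough sketch of the equivalent commands, assuming the rebel-base demo names and my "eks" and "gke" kubeconfig contexts (and note that in recent Cilium versions the binary inside the agent pod is cilium-dbg rather than cilium):

```bash
# See the service's backends as Cilium sees them, across both clusters.
kubectl -n kube-system exec ds/cilium -- cilium service list

# Prefer local backends, failing over to the remote cluster if none exist locally.
kubectl --context eks annotate service rebel-base \
    service.cilium.io/affinity=local --overwrite
kubectl --context eks scale deployment rebel-base --replicas=0   # force the failover

# ...or prefer the remote backends instead.
kubectl --context eks annotate service rebel-base \
    service.cilium.io/affinity=remote --overwrite

# Stop the Google cluster advertising its backends to its neighbours.
kubectl --context gke annotate service rebel-base \
    service.cilium.io/shared=false --overwrite
```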
The live demo gods are not behaving very well for me in Google Cloud, apparently. OK. Right. So we can prefer a local endpoint, we can prefer a remote endpoint, and we can prefer local but fail over to remote if there are no local endpoints available.

Another thing that just works seamlessly is network policy, and we can apply Cilium network policies with cluster-specific information. An example would be something like this, where we say I'm only going to allow something that has the X-wing class label, and it also has to be in my local cluster, the EKS cluster; those are the only ones that are going to be allowed to contact the rebel base. I'm beginning to have a suspicion why that wasn't working a minute ago. So if I apply that policy... policy? Yeah, it's unchanged. That's why the remote end can't get there. Let's delete it. Actually, yeah, let's delete the CNP, CNP ingress, hopefully. That might well explain why I wasn't getting responses at this end.

So if I put the policy back in again, we won't see those responses, and we'd be able to see that. So let's take a look at some Hubble traffic; we want to look at some policy verdicts, so I need to port-forward my Hubble. Hubble is going to let us see individual packet flows, and I'm just going to look at the ones in the galaxy far, far away that are policy verdicts. And... oh, I removed that policy. Let me add the policy back again. Sorry, policy. I think we should see occasionally that... there we go, yes, we start seeing requests from the Google cluster getting denied because of that policy.

Now, one thing that's just a little subtlety: if I look at the pod, let's get pods and look at the labels, there is a label talking about the X-wing class, but there is no label saying that it's in that particular cluster. So it's an implicit, automatically applied label that we're matching with this policy: it says match labels, but it's an implicit label. And if we look in the... I'm going to actually do it here because I know I've got this environment variable set up. This is why I wanted to do endpoint list, because somewhere in here we should see... well, I'll try and find the right X-wing. Where's X-wing? You're in here somewhere, I'm sure you are. There's a lot of... here we go. So this is the X-wing pod, or the endpoint that happens to be in that X-wing pod, and we can see, hopefully somewhere, there should be, yes, the implicit label saying that the cluster is the EKS cluster. That's what Cilium is using to match against that policy that specifies the cluster.

OK, so it is a very big universe out there, so the question is, how many galaxies can we connect? The answer is that it used to be a limit of 255 clusters, and it's now 511, which it turns out is all ones in binary, so I think that's why. I've actually currently got it configured for only 255, because I really wouldn't have the patience to set up another 509 clusters. The trade-off is that, because we need to be able to convey the identity, and the identity includes the cluster, you have fewer possible security identities in each cluster if you decide to go beyond that 255 limit. So if you are working in a universe with a lot of galaxies, you may have to accept that, but it's still quite a lot of security identities.
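For completeness, the cluster-scoped policy from a minute ago looks something like this. It's a sketch: the selector labels come from the rebel-base demo manifests, the cluster value has to match the cluster.name I gave the EKS cluster, and the io.cilium.k8s.policy.cluster label is the implicit one I just mentioned. Below it are the Hubble commands for watching the resulting policy verdicts:

```bash
# Only X-wings in the local (EKS) cluster may reach the rebel base.
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-local-xwing-ingress
spec:
  endpointSelector:
    matchLabels:
      name: rebel-base
  ingress:
  - fromEndpoints:
    - matchLabels:
        class: xwing
        # implicit, automatically applied label; nothing to add to the pod itself
        io.cilium.k8s.policy.cluster: eks
EOF

# Watch the policy verdicts with Hubble.
cilium hubble port-forward &
hubble observe --type policy-verdict --follow
```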
Yeah, I was going to mention KVStoreMesh, which is really the change that went in relatively recently, in 1.14 I think, to essentially keep a cache of all the information we're learning from those remote clusters' API servers. That enables us to scale beyond 255 towards the 511 limit and still get decent performance.

So I hope that's shown you that cluster mesh is pretty... I mean, all you have to do is annotate a service and you can make it available across clusters. You can decide whether to prefer traffic locally or remotely, and you get all of these capabilities of Cilium shared across clusters, which I think is almost magical. I hope you want to try it out for yourselves, and if you do, there are some really great labs at isovalent.com/labs. They are available merely for the price of your email address, and you can explore cluster mesh without having to go through the trouble of setting up clusters; the labs will walk you through the whole process. You can also come to the Isovalent booth, where we've got a number of different labs running, and later on this evening, at half past six, I'm going to be signing some books if you want to come along and say hello. So thank you for exploring the galaxy of cluster mesh with me today.