So thank you, everyone, for attending. I know it's Friday and a bit late, so we really appreciate it, and we hope it will be an interesting talk and that you learn something new. The talk is about implementing mTLS in Istio multi-cluster environments using SPIRE. My name is Samu Veloso, I'm a software engineer at Solo.io, where I'm building a multi-cluster service mesh based on Istio and Envoy. And my name is Edu Bonilla, I'm a customer success engineer at Solo.io, working with open source technologies: Istio, Envoy, SPIRE. We hope you like the talk. OK, let's start from the beginning: what is mTLS? Most of you probably already know it, but for those who don't, let's recap. Most of you are familiar with TLS, the traditional protocol many websites use to encrypt traffic. In TLS, the client validates the server's certificate in order to establish an encrypted connection. With mTLS, both the client and the server authenticate each other: the server also has to validate the client's certificate before the encrypted connection is established. This is more powerful, because with mTLS you can define policies and restrict access to your server to only specific clients. To use mTLS, you need to generate a certificate and a key for both the client and the server. This is simple if you only have two apps, a client and a server, but the enterprise Kubernetes environments you are probably more familiar with are really dynamic: pods are recreated very often. To handle the certificates there, you need some kind of tooling, because it's impossible to manage that churn with manual actions. The technology we are going to show that solves this problem is SPIFFE and SPIRE.
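As a concrete illustration of the material involved — not part of the talk's demo, and with hypothetical file names — here is a minimal openssl sketch: a throwaway CA signs one server certificate and one client certificate, the minimum each side needs before an mTLS handshake can succeed.

```shell
# Throwaway CA (illustrative only; never do this for production trust roots)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=demo-ca" -keyout ca-key.pem -out ca.pem

# One key pair + CA-signed certificate for each side of the mTLS connection
for who in server client; do
  openssl req -newkey rsa:2048 -nodes -subj "/CN=$who" \
    -keyout "$who-key.pem" -out "$who.csr"
  openssl x509 -req -in "$who.csr" -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial -days 1 -out "$who.pem"
done

# Both certificates chain to the same CA, so each side can validate the other
openssl verify -CAfile ca.pem server.pem client.pem
```

The point of the talk is precisely that minting and rotating these files by hand does not scale to a dynamic cluster — which is what SPIFFE/SPIRE automates.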
So let's review SPIFFE first, and later we will review SPIRE. SPIFFE stands for Secure Production Identity Framework For Everyone. It's an open source standard for securely identifying software systems in dynamic environments. Basically, SPIFFE is just a standard: it describes how to design your system so that it is SPIFFE-compliant. There are several concepts in the SPIFFE architecture that I'm going to go through. The first one is the workload API. The workload API is in charge of generating a document called an SVID — I'm going to talk a lot about SVIDs. An SVID is a verifiable identity document: it's the document the workloads use to establish their mTLS connections, because inside the SVID we put a SPIFFE ID. The SPIFFE ID identifies a specific workload within a trust domain. For example, for all the workloads running in a cluster, the cluster can be the trust domain; as we will see later with Edu, we are going to have two different clusters, so we will have two different trust domains. The SPIFFE ID is just a URI: it contains the trust domain and the workload ID, and the workload ID depends a lot on your environment configuration. The important thing here is the SVID. There are two formats of SVID: the X.509 certificate and the JWT token. The X.509 format is preferred over the JWT one, so in the rest of the talk, whenever we say SVID we mean an X.509 certificate. It's a certificate generated by the workload API with the SPIFFE ID inside. As you can see here, it's a normal X.509 certificate, but in the extensions, in the subject alternative name, we put the SPIFFE ID. The workload API returns the SVID, the private key, and the trust bundle of the CA of the trust domain to the workload.
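To make that format concrete, here is a hedged sketch — with a hypothetical trust domain and workload path, not values from the demo — that uses openssl to create a self-signed certificate carrying a SPIFFE ID as a URI SAN, the way an X.509 SVID does, and then reads the extension back (`-addext` needs OpenSSL 1.1.1 or newer):

```shell
# Self-signed stand-in for an SVID: the SPIFFE ID goes into the URI SAN
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes -days 1 \
  -subj "/O=SPIRE" -keyout svid-key.pem -out svid.pem \
  -addext "subjectAltName=URI:spiffe://cluster1.example/ns/default/sa/frontend"

# The SPIFFE ID shows up in the Subject Alternative Name extension
openssl x509 -in svid.pem -noout -ext subjectAltName
```

A real SVID is signed by the trust domain's CA rather than self-signed, but the SAN layout is the same: one URI of the form `spiffe://<trust-domain>/<workload-path>`.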
And the workload will use that information to be validated by the other workloads in the trust domain. As the talk is about multi-cluster, we are going to have different trust domains, and every trust domain has to trust the rest of the trust domains. SPIFFE declares that every workload API has to expose the trust bundle — the public information of the certificate authority that is issuing the certificates — to the rest of the trust domains. With that trust bundle information you can federate your trust domains with each other, as we will review later in the demo. So that's SPIFFE, and SPIRE is just one implementation of SPIFFE. There are several others, but here we are going to review SPIRE. SPIRE has a more complex architecture: instead of having only the workload API, we now have an agent and a server. The agent generates the SVIDs for the workloads in a process known as workload attestation. To do that, we first need to register the different workload entries through the registration API exposed by the server. The server stores these identities in a database, and SPIRE supports several different storage backends through data store plugins that you can configure. The same goes for the certificate authority: as the server has to sign certificates for the different SVIDs, you can plug in an external certificate authority through a CA plugin, or by default it will auto-generate a certificate authority that it uses to sign the certificates. That's the server. Now, the agent has to be able to trust the server, and vice versa — the server also has to trust the agent — so the agent is authenticated in a process called node attestation. We are going to see how this works: the registration API, node attestation, and workload attestation in Kubernetes.
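As an illustration of those plugin points and of the bundle endpoint, a SPIRE `server.conf` might look roughly like this — a sketch with hypothetical trust domains, hostnames, and paths, not the demo's actual configuration:

```hcl
server {
  trust_domain = "cluster1.example"            # hypothetical trust domain
  data_dir     = "/run/spire/data"

  # Expose this trust domain's bundle so other trust domains can federate with it
  federation {
    bundle_endpoint {
      address = "0.0.0.0"
      port    = 8443
    }
    federates_with "cluster2.example" {        # the other cluster's trust domain
      bundle_endpoint_url = "https://spire-server.cluster2.example:8443"
      bundle_endpoint_profile "https_spiffe" {
        endpoint_spiffe_id = "spiffe://cluster2.example/spire/server"
      }
    }
  }
}

plugins {
  # Storage backend for registration entries (SQLite here; other SQL backends exist)
  DataStore "sql" {
    plugin_data {
      database_type     = "sqlite3"
      connection_string = "/run/spire/data/datastore.sqlite3"
    }
  }
  # Optional: chain SPIRE's CA to an external CA instead of the auto-generated one
  UpstreamAuthority "disk" {
    plugin_data {
      cert_file_path = "/run/spire/ca/ca.crt"
      key_file_path  = "/run/spire/ca/ca.key"
    }
  }
}
```

The `federates_with` block is what lets one server fetch and keep refreshing the other trust domain's bundle, which is the mechanism the demo relies on later.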
So the first thing is how we register our workloads with SPIRE in a Kubernetes environment. The SPIRE server is a pod running in the spire-system namespace, and inside that pod we have two containers: the SPIRE server itself and the controller manager. The controller manager is continuously watching for pods that match the specification in the ClusterSPIFFEID custom resource. For example, we define a pod selector so that only pods with certain labels get registered in the SPIRE server database, and the SPIFFE ID generated for those workloads will match what we define in the SPIFFE ID template — in this case, the trust domain, the namespace of the workload, and the service account of the workload. This is the format we are going to use with Istio later on. Once the workloads are registered in the SPIRE server, the agent needs to fetch the SVIDs for those workloads. This is where node attestation comes in: the SPIRE server needs to trust only the genuine SPIRE agents, to prevent someone impersonating an agent and having SVIDs returned to it. The SPIRE agent generates a CSR, a certificate signing request, and sends it along with its service account token. The SPIRE agent runs as a DaemonSet in Kubernetes: on every node there is a SPIRE agent running with a service account. Here we need a TLS connection, because we are sending confidential data — the service account token — to the SPIRE server, so this communication has to be secure. With that token, the SPIRE server goes to the Kubernetes API and, using the TokenReview API, retrieves the metadata for the agent. With that information, it registers the agent in the SPIRE server database.
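The registration just described can be expressed with a ClusterSPIFFEID resource along these lines — a sketch with hypothetical names, labels, and trust domains, following the spire-controller-manager CRD:

```yaml
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: istio-workloads            # hypothetical name
spec:
  # trust domain + namespace + service account, the format used with Istio later
  spiffeIDTemplate: "spiffe://{{ .TrustDomain }}/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  # only pods carrying this label are registered in the SPIRE server database
  podSelector:
    matchLabels:
      spiffe.io/spire-managed-identity: "true"
  # trust domains the resulting entries should federate with (hypothetical)
  federatesWith:
    - "cluster2.example"
```

The controller manager watches for pods matching `podSelector` and reconciles one registration entry per matching workload, which is exactly the behavior shown in the server logs later in the demo.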
The server also generates an SVID for the agent, but in this case the format is quite different, because it is based on the node attestor plugin you are using to register the agent — in this example, Kubernetes projected service account token, since we are using the path of the service account token. The selectors are retrieved from the TokenReview API and are used to identify the agent inside the database. Once the agent is registered, it gets an SVID that it uses to establish an mTLS connection with the server, and then it retrieves from the server all the SVIDs for the workloads running on its node and caches them. Finally, how do the workloads get their SVIDs from the SPIRE agent? The workload runs on the same node as the SPIRE agent, and they communicate over a Unix socket. That Unix socket is mounted into the workload by a different component, the SPIFFE CSI driver, and the workload uses it to talk to the SPIRE agent. The SPIRE agent exposes a gRPC API on that socket — the SPIFFE workload API — and the workload requests its SVID from that endpoint. The agent gets the PID of the workload's process, because that information cannot be impersonated: the workload doesn't decide its PID, you cannot set the PID when you create a workload. With the PID, the agent gets from the kernel, from the cgroups, the container ID and the pod UID for that workload, and from the kubelet API running on that node it gets the rest of the information — the selectors we used to register the workload in the SPIRE server — and matches them against the list of SVIDs cached in the agent.
With those selectors, the agent finds the matching SVID and finally returns it to the workload, which will use it to establish mTLS within the trust domain. So that's a bit of theory about how SPIRE and SPIFFE work. Now Edu is going to show you all of this in action and how to set up a multi-cluster environment with it. Thanks, Samu. Now I'm going to talk about how you can integrate SPIRE and Istio in a real environment, and then I'm going to show a live demo. First of all, let me give a high-level architecture of our demo, which is composed of two Kubernetes clusters, istio-cluster-1 and istio-cluster-2. In each of them we deploy one SPIRE server, which owns the trust domain of its cluster. Then we deploy one SPIRE agent per cluster, because this demo uses only one Kubernetes node per cluster, so that's one SPIRE agent per node. Both SPIRE servers will also be in communication with each other, because we will federate the trust bundles between them, so that each cluster knows which identities exist in the other. To be able to establish communication between the applications, we create a mesh with Istio: we deploy Istio in a multi-cluster architecture using the same network, so every application is deployed in network1. Then we deploy some apps, as you can see in the image. We have our Istio ingress gateway, through which our users access the mesh; users are routed to our frontend application, which is productpage, and from productpage to the backend applications. So the flow is: users come to the ingress gateway, then to the frontend, and from the frontend to the backends. Also, just for some observability for demo purposes, we have deployed Prometheus on each cluster to collect metrics from the application traffic; these metrics are federated into Thanos, and we will show some observability using Kiali.
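Before moving on to the Istio part, it may help to see the two kinds of SPIFFE IDs from the preceding walkthrough side by side — illustrative values only, with a hypothetical trust domain:

```
# Agent identity, produced by node attestation with the k8s_psat attestor:
spiffe://cluster1.example/spire/agent/k8s_psat/cluster1/<node-uid>

# Workload identity, produced by workload attestation from the ID template:
spiffe://cluster1.example/ns/bookinfo/sa/bookinfo-productpage
```

The agent's path encodes how it was attested, while the workload's path follows the template configured in the registration (namespace and service account), which is the shape Istio authorization policies match against later.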
First of all, let me talk about the Istio multi-cluster architecture. In this case we deploy both clusters on the same network, and the way upstream Istio discovers the services deployed on the other cluster is by making requests to its API server. That's why we use this command, which basically creates a secret in cluster one containing the kubeconfig of cluster two, so that the istiod in cluster one is able to communicate with the API server of cluster two. This way, all the services in cluster one are aware of which services are also running in the second cluster, and we can establish direct communication between service A and service B. How have we set this up in Istio? Basically, we deploy a multi-cluster architecture: we define the name of each cluster, both of them have the same mesh ID, and both of them run on the same network. We also have to specify the trust domain for each cluster, and we have to add trust domain aliases so that the applications in cluster one also trust the applications from the trust domain of cluster two. Now that we have Istio multi-cluster, let's go on to SPIRE federation. How does this work in a Kubernetes environment? We have a SPIRE server in cluster one and another in cluster two, and each one needs to communicate with the other in order to fetch its trust bundle. For that we enable federation, and we define the SPIFFE ID template, which consists of the trust domain, the namespace, and the service account of the application — this can be modified, but it's what we chose for the demo — and we declare that it federates with istio-cluster-2. For each federated domain we can set some values: the trust domain of the federated cluster and the bundle endpoint URL.
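The per-cluster Istio installation just described might look roughly like this IstioOperator fragment — a sketch with hypothetical cluster names and trust domains, not the exact values from the demo repo:

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    trustDomain: cluster1.example          # this cluster's trust domain (hypothetical)
    trustDomainAliases:
      - cluster2.example                   # also trust workloads from the peer domain
  values:
    global:
      meshID: mesh1                        # same mesh ID on both clusters
      multiCluster:
        clusterName: cluster1              # unique per cluster
      network: network1                    # same network on both clusters in this demo
```

Cross-cluster service discovery is then granted with `istioctl create-remote-secret`, run against one cluster and applied to the other, so istiod can read the peer API server.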
This bundle endpoint URL points to the Kubernetes service of SPIRE server two, so we can fetch its bundle and have all the identities from that cluster available as well. Now that we have Istio multi-cluster and SPIRE federation, let's see how the two integrate. Before explaining that, I'm going to explain how Istio normally obtains the certificates. You need to know that a workload pod in Istio contains two containers: one is the application container, and the other is the istio-proxy container. The istio-proxy container is made up of two components, the Istio agent and Envoy. What the Istio agent does is basically bootstrap the proxy and provision identities to the workloads. How does it do it? It makes a certificate signing request to istiod, istiod signs the certificate, and the Istio agent and Envoy communicate through the Secret Discovery Service, SDS, which is basically a Unix domain socket. Envoy talks over SDS through the Envoy SDS API, and it gets the certificate and the keys, which are stored in Envoy's memory. Now that we know how Istio does it, let's see how SPIRE and Istio do it together. Instead of having the Istio agent talk to Envoy via SDS, we establish direct communication between the SPIRE agent and Envoy using SDS. How do we do it? In the Istio installation we configure the injection templates so that, for every Istio workload that gets deployed, the SPIFFE CSI driver that SPIRE places on the node mounts the socket at a known location in the workload. The SPIRE agent and Envoy then communicate via that socket on the same node, through the discovery service. This is the way we provision the certificates to Envoy. So what does SPIRE do each time we deploy an Istio workload in our cluster?
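A sketch of what that socket mount can look like on the istio-proxy container, assuming the upstream SPIFFE CSI driver name `csi.spiffe.io` and the socket path used in Istio's SPIRE integration docs (illustrative, not copied from this demo's repo):

```yaml
# Pod-spec fragment: the CSI driver surfaces the SPIRE agent's Unix socket,
# and the sidecar mounts it where Envoy expects its SDS socket.
volumes:
  - name: workload-socket
    csi:
      driver: "csi.spiffe.io"
      readOnly: true
containers:
  - name: istio-proxy
    volumeMounts:
      - name: workload-socket
        mountPath: /run/secrets/workload-spiffe-uds
        readOnly: true
```

With this in place, Envoy fetches its certificate and key over SDS directly from the SPIRE agent instead of from the Istio agent.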
As I was saying before, we have the SPIRE controller manager, which is in charge of watching which workloads are created on our cluster and reconciling their identities. In this case, checking the logs, the SPIRE controller manager saw that this pod was deployed on our cluster and didn't have any identity yet, so it created an entry on the SPIRE server with its SPIFFE ID, its selectors, and the trust domain it federates with. Then, if we list the entries in the other container, the SPIRE server, we can see that one entry was created for each identity. In this case it's the entry for our Bookinfo productpage application, with the entry ID; the SPIFFE ID, in the same format we defined before; the parent ID, which is the SPIRE agent; and — this is interesting — the selectors used to identify the workload. This is the default installation, so we only have the pod UID selector, which is the same UID you can check in the YAML of the pod running in Kubernetes, but we can customize this by adding more selectors to the identity of each workload. The entry also states that it federates with istio-cluster-2. How does the configuration look in Envoy? It means Envoy will accept two SANs, two subject alternative names, for each peer certificate, matching both trust domains for the same workload: for Bookinfo productpage, the SPIFFE ID from istio-cluster-1 and the one from istio-cluster-2. Also, if we check the trusted CAs in Envoy, it trusts certificates from both bundles coming from the two SPIRE servers. So now let's go to the demo. First of all, here is the repo we used for this demo. It's public, and it's very easy to install with kind: we have made a Makefile so you can easily deploy it in your environment. OK, let me show you what we have. We have two contexts.
If we go to kind-istio-cluster-1, you can see that we have Bookinfo with the frontend application and also the backends. We have the Istio ingress gateway in istio-system, the istiod control plane, and the observability components: Kiali, Prometheus, and Thanos. In the spire namespace, you can see the SPIRE agent and the SPIRE server. Let's go to istio-cluster-2: as you can see, we have all the backend applications that the frontend will call, plus Prometheus for observability, the istiod control plane, the SPIRE agent, and the SPIRE server. Let me make this screen bigger. Apart from that, we have added more security to make the demo more interesting: we have added some JWT policies to the Istio ingress gateway, so we only accept JWTs signed by this key. You can see it in the repo as well: we are fetching the JWT token from this URL. We have also exposed the gateway over HTTPS with a certificate, so we also need to present a certificate to go through our Istio ingress gateway. So this will be our JWT token; here are the issuer and the subject, which is testing@secure.istio.io, and we will only allow JWTs with this issuer and subject. As you can see here, in the RequestAuthentication we specify the key, and in the AuthorizationPolicy we specify the request principals. Let's test it. If we make the request with the right token, it works; if we try with a fake token, the gateway denies the request. So it's an extra step for securing our gateway. Another thing you can do for security, playing with the SPIFFE IDs, is adding some authorization policies.
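The ingress JWT setup just described might be written roughly like this — hypothetical resource names, using the public Istio sample issuer and JWKS purely as an illustration:

```yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: ingress-jwt                     # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
    - issuer: "testing@secure.istio.io"
      jwksUri: "https://raw.githubusercontent.com/istio/istio/master/security/tools/jwt/samples/jwks.json"
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ingress-require-jwt             # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: ALLOW
  rules:
    - from:
        - source:
            # <issuer>/<subject> of the accepted token
            requestPrincipals: ["testing@secure.istio.io/testing@secure.istio.io"]
```

With the ALLOW policy in place, requests without a validated token matching that principal are rejected at the gateway.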
In this case, for example, we say that only Bookinfo productpage, which is our frontend, is able to access our backend application, reviews, with this method and this path. Any other application that tries to access the reviews backend will be denied. Let's test it: if we try to call the reviews application from the sleep pod, we are not able to make any request to it. So this is another way to play with security and SPIFFE IDs. Now I'm going to leave a for loop running here that makes a request to our Istio ingress gateway every two seconds. If we check the logs here, we can see the requests coming into our upstream service. As you can see, this is the SPIFFE ID of our application: the request is made to Bookinfo details with this SPIFFE ID. You can see the hash of the certificate, the forwarded client certificate with the organization and the URI, and the SPIFFE ID of the application that made the request. If we check this in our graph — we are using Kiali for this — we can see the whole flow of our communication. This box is istio-cluster-1, this one is istio-cluster-2, and if we display the security badges, we can see that all the communication is encrypted. If we click on any edge, we see that mTLS is enabled, and we can also see the SPIFFE ID that is making the request and the one that is receiving it: in this case, the caller is Bookinfo productpage and the receiver is Bookinfo reviews. Everything is encrypted, and just to double-check, I have a sniffer pod running and we captured some traffic into Wireshark. This is the IP of the source node, and this is the IP of the application pod; this is the request, this is the response, and this is the ACK of the request.
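The SPIFFE-ID-based policy described at the start of this section could be expressed roughly like this — hypothetical namespace, labels, trust domain, and path; note that Istio writes the principal as the SPIFFE ID without its `spiffe://` prefix:

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: reviews-allow-productpage       # hypothetical name
  namespace: bookinfo                   # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
    - from:
        - source:
            # the frontend's SPIFFE ID, minus the spiffe:// scheme
            principals:
              - "cluster1.example/ns/bookinfo/sa/bookinfo-productpage"
      to:
        - operation:
            methods: ["GET"]
            paths: ["/reviews/*"]       # hypothetical path restriction
```

Because the ALLOW rule only matches productpage's identity, calls from any other workload — the sleep pod in the demo — fall through and are denied.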
As you can see, everything is ciphered with mTLS: in Wireshark we see transport layer security, encrypted application data. So that was the demo — different ways of securing your mesh and your clusters, also using SPIFFE IDs. We hope you liked it. We have here some references that we used for this talk, in case you want to check them; we have also uploaded these slides to the KubeCon schedule, and here is a QR code to leave us some feedback. So, any questions? Feel free to ask; we have a mic in the center. First question: thank you for the presentation — how is certificate rotation managed in this multi-cluster setup? Is it managed by SPIRE and not by Istio itself? Yes, in this case it's managed by SPIRE; you can configure the rotation period, so you can rotate every hour or however often you prefer. Next question: what would change if you had chosen a different network architecture? Here the two clusters are on the same network — what if they were on different networks? For SPIRE, nothing would change; it would change for Istio. You would not be able to access the API server of the remote cluster, because you are not on the same network, so you would have to deploy an Istio east-west gateway to do a passthrough and reach the services. So the Istio architecture would change, but not the SPIRE one. Follow-up: would SPIRE still be trusted even though you are communicating through a different gateway — through the Istio east-west gateway instead of the ingress? Yes, SPIRE would still be trusted, because each SPIRE server federates the bundle from the other clusters.
As long as both SPIRE servers have both bundles, the Istio east-west gateway just does a passthrough, so it forwards the same certificates — it runs in passthrough mode, that's correct. In this reference here, the istio.io docs, you have all the different multi-cluster configurations, and all of them are supported. Next question: in terms of use cases, what are the benefits of having multiple trust domains over a single trust domain with one root of trust? When would you advise one versus the other? That's a very good question. I think it depends on the company — on your business logic. Maybe one department maps to one trust domain; it depends on your organization, the different components, and the relationships between those departments. It depends a lot on your business domain. Next question: is the SPIRE server a single point of failure, or is it possible to run it in high availability? Can you repeat the question, please? Whether it's a single point of failure. In our case it is, because it's a demo, but SPIRE supports a high-availability configuration. You can even deploy it outside of Kubernetes: it's a component that has to be very secure, so you can deploy it on hardened virtual machines in high availability. The same goes for the database: in our case we are just using SQLite inside the Kubernetes pod, but in production you want to use something like Postgres outside the cluster, with replication, run in a production-grade way. Next question: in your example the two Istio clusters are on a single network, connected as part of the same network.
I'm guessing it's possible to do it across multiple networks — and if so, how much more complicated does it get? Yes, it is possible; if we go to this link, you can see all the supported topologies. It's similar to the earlier question: you need to deploy an Istio east-west gateway, because you don't have direct communication to the API server, so you must do a passthrough via the gateway. As for complexity, it depends on the number of clusters you have. If you have ten clusters and you want complex routing between them, then it becomes a bit complicated just to manage all the objects. It's easy to do for plain traffic, but if you then want to implement policies — JWT policies, rate limiting — the more clusters you have, the more complex it gets. But it definitely is possible, and I have seen it in many places; many customers run this. One more: sorry, could you put the GitHub link up one more time? Yes, there it is — that one, right? If you hit any issues, just open a GitHub issue and we will try to address or answer your questions. Any other questions? You can also ask offline, or on our social media. So thank you, everyone, for attending. Have a good day. Bye. Thank you.