Hello everyone. Welcome to this session. My name is Wen Cheng. My colleague Oliver and I will talk today about how SPIFFE helps Istio with service mesh federation. Both of us have been working on Istio since it started. How many of you have heard of Istio? Great. And how about SPIFFE? Wonderful. I'm so glad to see that. It looks like the popularity of both has increased quite rapidly over the past year. Let me start with a little background. Istio is mostly known as a service mesh. To be more specific, it's an open platform for managing service interactions across workloads running anywhere: Kubernetes, VMs, on-prem, public cloud. It solves three major problems for service communication. First, observability. It gives you uniform visibility into what's happening to your service: who is accessing it, what the latency is, what the error rate is, which methods are called. Second, operational agility. Istio does advanced load balancing and traffic shaping to help you manage your traffic easily and roll out a new version of a service safely. Third, and not least, is policy-driven security. Istio provides declarative policy: you describe your intent, and Istio enforces the desired security for you. It has features like mutual TLS to encrypt data in transit, and it also gives your service triple-A protection: authentication, to verify who is accessing your service; authorization, to control who is allowed to access your service; and audit, so you know who accessed your service. Next is SPIFFE. SPIFFE is a set of open-source standards that help you build a secure production identity framework in heterogeneous environments like Istio, VMs, Kubernetes, on-prem, and others. The SPIFFE standards include two parts. The first is the SVID, which defines what a SPIFFE identity is and how it is presented in an X.509 certificate.
The second part is the SPIFFE APIs, which describe how to securely provision a SPIFFE identity to each workload from a certificate authority and, if you have SPIFFE identities provisioned by different certificate authorities, how to federate them so the workloads can talk to each other. What's the relationship between SPIFFE and Istio? Hopefully this slide gives you a rough idea. Istio gives you a secure identity framework that provides strong identity for service-to-service authentication, and Istio heavily leverages SPIFFE to build this framework. First, every Istio workload identity is also a SPIFFE identity: the identity is carried in an X.509 certificate following the SPIFFE SVID standard. We are also working on supporting the SPIFFE Federation API. We will talk more about that in the following slides. Now let's talk about service mesh federation. Why do we need it? Imagine you have workloads running in different meshes. How can these workloads talk to each other? The main point of service mesh federation is to provide interoperability between different meshes. The meshes may come from different organizations or from different departments in the same organization, or you may have one mesh on-prem and another on a public cloud, or one mesh purely for Kubernetes workloads and another for VM workloads. Service mesh federation lets these workloads talk to each other. There are many challenges in service mesh federation: identity federation, service discovery federation, and observability federation. In this presentation we focus mostly on identity federation, where there are two fundamental challenges. First, we need to build trust between meshes. When I receive a request from another mesh, how do I trust the identity, so that I can apply the appropriate policy and give that identity the right privileges to access my service?
The second challenge is identity isolation. I don't want the other mesh to issue the same identity as an identity in my mesh; otherwise it can easily mount an impersonation attack. Now I'm going to hand over to Oliver to talk more about how we deal with these two challenges. Thank you, Wen Cheng. In the following slides I'm going to talk about the technical details and the approaches we recommend for service mesh federation. Let's go back to the question of what a service mesh and a trust domain are. In terms of security, in current Istio the applications in a service mesh share a common root of trust, and they are within the same trust domain. A trust domain could represent an individual, an organization, an environment, or a department running its own independent SPIFFE infrastructure. The trust domain is encoded in Istio identities, which are compliant with the SPIFFE standard, in the following format: spiffe://<trust domain>/ns/<namespace>/sa/<service account>. In these slides we are talking specifically about identity federation. For applications in two different service meshes to authenticate each other, each side needs to verify the other's certificate using its own root of trust. For example, in this graph we have service mesh one, with the trust domain foo.com, and service mesh two, with the trust domain bar.com. If you want a service in mesh one to authenticate a service in mesh two, that service must be able to authenticate the certificate presented by the other end using its own root of trust. So how do we do it? In terms of scenarios, let's first talk about federation within an organization. Suppose one organization uses a common root CA, as shown in this picture, and has multiple trust domains.
Each trust domain has an intermediate CA. The intermediate CAs use certificates issued by the root CA, and each intermediate CA is responsible for issuing certificates for the services running in its own mesh. In this case, CA1 issues the certificate for service A and CA2 issues the certificate for service B. Both service A and service B trust the root CA's certificate, so through this complete certificate chain they can easily authenticate each other. Beyond that, our recommendation is that name constraints on the intermediate CAs can help isolate the trust domains. For example, suppose you apply a name constraint so that intermediate CA1 can only issue certificates for team1.foo.com. If CA1 is compromised and issues a certificate for team2, which belongs to the other part, service B will check the name constraint and decline the connection. Okay, next we talk about federation across organizations. In this scenario, there is usually one root CA per trust domain. Here, the left side is trust domain foo.com and the right side is trust domain bar.com. Each root CA signs the certificates for the services running in its own service mesh, and you can see that the services trust their own root CA's certificate. For mesh federation, some of you might think: okay, we can cross-sign the root certificates to build a different certificate chain and enable mutual trust. Here is how that works. Root CA1 signs the public key of root CA2 to generate a new intermediate certificate, so root CA2's public key is now represented by two certificates. You can then build a new certificate chain from service B, through this intermediate certificate, to root CA1.
Then, for service A to authenticate service B, it uses this new certificate chain to verify service B's certificate against its own root CA certificate. The drawback of this approach is high complexity. Cross-signing is hard to automate, and if you have, for example, N service meshes to federate, you need on the order of N squared cross-signings for the N trust domains. So what we recommend is this: the SPIFFE trust bundle. The core advantages of the SPIFFE trust bundle are, first, automation of the root-of-trust exchange and, second, that authentication can use the root certificates corresponding to the peer's trust domain. In this graph the setup is the same: root CA1 signs the certificate for service A and root CA2 signs the certificate for service B. But beyond that, you will notice the red box: for the trust domain foo.com, you should only use the red certificate, the root CA1 certificate, to verify it, and for bar.com you should only use the root CA2 certificate. How does this entire thing work? Be patient, I will talk about it in the following slides. Okay. First, let me spend one minute on the SPIFFE trust bundle. The SPIFFE trust bundle is an RFC 7517 compliant JWK Set containing a trust domain's cryptographic keys for validating the certificates issued in that trust domain. If you are familiar with the JWKS standard, you will figure out how this works, but here is an example. This is an example of the JWKS for a trust bundle. You have keys representing the certificates for this trust domain. It's an array, which means you may have multiple keys that you can use to verify certificates in that trust domain. The use field is required by the SPIFFE standard and is set to x509-svid. SVID means SPIFFE Verifiable Identity Document. The x5c field is critical. It carries the base64-encoded DER of the X.509 certificate that you use to verify certificates for that trust domain.
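As a rough sketch of the bundle layout just described, the Python below parses a JWK Set and pulls the DER certificate bytes out of each x509-svid key. The bundle contents are made up for illustration: the x5c value is base64 of placeholder bytes rather than a real certificate, and the helper name x509_roots_from_bundle is hypothetical, not part of Istio or SPIRE.

```python
import base64
import json

# Illustrative bundle only: the x5c entry is base64 of placeholder bytes,
# not a real DER certificate.
PLACEHOLDER_DER = b"placeholder-der-bytes"
EXAMPLE_BUNDLE = json.dumps({
    "keys": [
        {
            "kty": "EC",
            "use": "x509-svid",
            "x5c": [base64.b64encode(PLACEHOLDER_DER).decode("ascii")],
        },
        # A key meant for JWT validation carries no x5c certificate.
        {"kty": "EC", "use": "jwt-svid", "kid": "example-key"},
    ]
})


def x509_roots_from_bundle(bundle_json):
    """Return the DER bytes of every certificate carried by x509-svid keys."""
    roots = []
    for key in json.loads(bundle_json).get("keys", []):
        if key.get("use") != "x509-svid":
            continue  # skip keys that are not for X.509 verification
        for cert_b64 in key.get("x5c", []):
            # x5c uses standard base64 (not base64url), per RFC 7517.
            roots.append(base64.b64decode(cert_b64))
    return roots
```

In a real consumer, the returned bytes would be handed to an X.509 library as trust anchors for that one trust domain.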
One more interesting field is this one, the SPIFFE refresh hint, which here means the bundle will be valid for 10 minutes, 600 seconds. The key type and those four fields are redundant here. They are more useful for JWT-type trust bundles. One thing to note is that the SPIFFE trust bundle serves not only X.509 certificates but also setting up trust for JWT tokens. In that case, you won't have the x5c field, and you will use these fields to obtain the public key for verification. We put them here because they are required by the JWKS standard. Now, federation with the SPIFFE trust bundle, going a little into the technical details. For a trust bundle, there is a publishing side and a consuming side. The publishing side needs to expose an HTTPS endpoint secured by a TLS certificate based on Web PKI or the SPIFFE standard. On the consuming side, the Istio admin needs to configure a mapping from trust domain to endpoint. When Istio gets that mapping, it authenticates the endpoint and retrieves the bundle from it. It then builds a message containing the trust domain and bundle tuples and propagates it to the workloads, and the workloads can use it in certificate verification. To give you a more vivid explanation of this flow, I have drawn it as pictures. This is an example. On the left side you have Citadel, which is the CA in Istio. On the right side is the SPIRE server, which is the CA in the SPIFFE implementation developed by the company Scytale. Suppose you want to federate the left side with the right side. For this to work, you have a trust bundle management module running in Citadel and in the SPIRE server. This module is in charge of both publishing its own trust bundle endpoint and consuming the trust bundles from the other endpoints. In this example, Citadel exposes HTTPS endpoint one, and the other side exposes HTTPS endpoint two.
The Istio admin configures the consumer side. For example, the foo.com certificate is local, which means you don't need to go to an endpoint to get it; it's used locally. But bar.com, which is this domain, points to endpoint two, which is this one. Right? On the other side it's the same kind of thing. After you configure all of this and everything is up and running, the trust bundle management modules start to retrieve the trust bundles. Citadel distributes its trust bundle, containing its own root certificate, to the SPIRE server, and the SPIRE server distributes its trust bundle to Citadel. After that, Citadel creates a trust bundle message that maps foo.com to its own root certificate and bar.com to the certificate from the SPIRE server side, and propagates it to service A. The other side does the same. Now service A has the trust bundle, and service B has a very similar trust bundle on its side. Service A can use the root certificate mapped to bar.com, the green one, to authenticate service B, and service B uses the root certificate mapped to foo.com to authenticate service A. That's the basic flow. Now, about identity isolation. Suppose the SPIRE server is compromised. What will happen? If the SPIRE server issues a certificate for, say, foo.com, service A will get service B's certificate and examine its trust domain, and it will figure out, oh, it claims to be from foo.com, which is wrong, right? Then it will look at the trust bundle and see, oh, for foo.com we should use this red certificate to verify. It will try to use that one, its own root certificate, and figure out, oh, this isn't working, the certificate is not signed by this root. Then the handshake will fail.
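The isolation check just described can be sketched as a toy model in Python. This is only an illustration of the selection logic, with no real cryptography: the PeerCert class, its signed_by label, and the root names root-ca-1 and root-ca-2 are all hypothetical stand-ins, and a real workload would do actual X.509 signature verification during the TLS handshake.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class PeerCert:
    """Toy stand-in for a peer certificate (no real cryptography)."""
    spiffe_id: str  # identity claimed in the certificate
    signed_by: str  # label of the root that actually signed it


def trust_domain_of(spiffe_id):
    """Extract the trust domain from a spiffe:// URI."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return parsed.netloc


def verify_peer(cert, bundles):
    """Accept the peer only if a root mapped to its claimed trust domain signed it."""
    roots = bundles.get(trust_domain_of(cert.spiffe_id), set())
    return cert.signed_by in roots


# Service A's view: each trust domain maps only to its own root(s).
BUNDLES = {"foo.com": {"root-ca-1"}, "bar.com": {"root-ca-2"}}

# A legitimate bar.com workload, signed by bar.com's root.
honest = PeerCert("spiffe://bar.com/ns/default/sa/service-b", "root-ca-2")
# A compromised bar.com CA issuing a foo.com identity.
forged = PeerCert("spiffe://foo.com/ns/default/sa/service-a", "root-ca-2")
```

Here verify_peer(honest, BUNDLES) is true, while verify_peer(forged, BUNDLES) is false, because root-ca-2 is not among the roots mapped to foo.com.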
So that's how identity isolation works in the mesh federation scenario. Okay, I think that's it. Any questions? Oh, please use the microphone, otherwise they can't hear. Oh, that's a good question. His question is that there's a 600-second expiration for the trust bundle, and I didn't show it in this picture. Basically, going back to this slide, retrieving the trust bundles is periodic. It should happen within, in that case, 600 seconds. You retrieve the new trust bundle and compare it with the trust bundle that you cached. If there is any difference, you propagate the new trust bundle to the workloads. So the question is about this encoding, how it works and how it's encoded, right? Yeah, the x5c certificate is encoded in base64, so verification is not using this public key. But this public key is the same as the public key in the certificate. It's kind of redundant here; because it is required by the JWKS standard, we have to put something there. But beyond that, suppose the trust bundle is for JWT: you won't have the x5c field, and in that case this will be meaningful. That's right, yeah. This is the elliptic curve encoding and algorithm, sorry, and X and Y are the coordinates. You can use them to get the public key. Increasing this to a larger number is no problem. Practically, this is very short. You don't want it to be this short. Any other questions? A follow-up question on the refresh seconds: what happens to existing connections when the trust bundle gets refreshed? That's a good question. If the refresh interval has expired and the bundle has changed, right? The TLS handshake verifies the certificates only at the beginning of the connection, and currently we don't have a renegotiation mechanism implemented in Envoy by default in Istio.
So if you created the connection before the change, it will keep working until you disconnect. If you then redo the handshake and the bundle has changed so the certificate is no longer valid, the handshake will fail. How do you think about the connection between SPIFFE, IAM, and Istio security? Sorry, could you repeat your question? IAM and... IAM is mostly about authorization, right? Your question is, once we have a SPIFFE identity, how do we leverage IAM to enforce policy and apply access control? That is orthogonal to federation. The key is that you want to understand who is calling you and which identity is being used, and then you can apply policy to that identity. What is specific to mesh federation is that when an identity comes from another mesh, it carries a trust domain, which you can use to identify which mesh the identity comes from. So you can set your IAM policy appropriately: for identities within your mesh you apply one set of IAM policies, and for identities from a different mesh you apply policies separately. Did that answer your question? No, thanks. I just wondered how easy this is to use today. How do I turn it on? Is it command-line arguments to Citadel, or is there a YAML resource or a Helm values file for this? So you are asking how easy it is to use, right? Yeah, I already have three Kubernetes clusters, each with a service mesh, federated using the new 1.1 features, right? Currently they just share a root CA cert. I want to move to this. It looks better, but how much work is that? I think you are talking about federating through the SPIFFE trust bundles, right? Yeah, yeah. So I have to configure each Citadel with all of them. I have to turn on my local endpoint for publication. I have to configure it with the map of all the remote endpoints. Do I have to go hacking around in the Citadel pod today, or is there a nice YAML? This is a work in progress. First of all, yes.
And in the final state, it will be very easy. What you need to do is configure this endpoint and the trust bundle config. It's basically a YAML file. But there is a YAML file? Yes, yes. Is it in 1.2? What do I have to wait for? That's a good question. It's still a work in progress. We'll try to make it happen. Yeah, very quickly: where does the trust domain string come from? Kubernetes clusters don't have a first-class name, so I don't know what it's set to today. The trust domain is configurable in Istio. It's not correlated with GKE clusters yet. It's entirely an Istio concept. Yeah, it's not a Kubernetes concept yet. No, sure. But you could, maybe in the future, or you customize it to... But if I don't set it, do I just get a random string? Because I can't think of what you would default to otherwise. The default is cluster.local. Otherwise, you configure it. So I need to override it if I want to federate? Yes, override it. Yeah, thanks. Any other questions? All right, thank you guys. All right, thank you.