So, hey everyone, and thanks for attending this session today. Can everyone hear me okay? Okay, great. I'm Naranjan, a software engineer at Microsoft working on the Istio add-on for AKS. Specifically right now, I'm focusing on multi-cluster, very topical, edge traffic management, and various other Istio features that we're incorporating into our add-on. So today I wanted to cover some important security considerations of using Istio, based both on feature requests and frequent questions that we've received for our add-on, as well as some broader guidelines and best practices that have been adopted throughout the Istio community. Throughout the ecosystem, it's become pretty common to refer to Istio as being secure by default. But to borrow a phrase from the book Istio in Action, which I highly recommend everyone here check out at some point, it's more appropriate to consider Istio almost secure by default. Even though a lot of Istio's security capabilities, like mTLS and automated certificate management, are enabled out of the box, there are still various vulnerabilities and loopholes that operators need to be aware of and take action on accordingly. Now, there is work being done by the project maintainers and various stakeholders to make the default settings more secure, but there's a limit to how robust these baseline security settings can be. Just given the variation in architectures and networking topologies with Istio and Kubernetes, a one-size-fits-all approach to security is just not feasible. In terms of what these potential attack vectors and security risks are, there are quite a few worth considering. Out of the box, we still haven't fully locked down communication, especially when it comes to restricting inbound and outbound traffic.
We haven't addressed the possibility of misconfiguration and configuration drift, which is a problem throughout the cloud native and Kubernetes ecosystem. Staying on outdated Istio versions with CVEs is another major one. And even after we take some steps to harden our environment, there are still avenues for bypassing our security controls, and a lot of that stems from relying exclusively on Istio as the mechanism to secure our applications. Also, without the right level of telemetry, we might not have enough insight into our environment to detect anomalous behavior; or, on the other hand, we might end up being too verbose with our logging and inadvertently leak sensitive information. So what we should be striving for is to implement zero trust principles: never trust and always verify, least privilege, and assuming that an attacker is already in our network while attempting to limit the damage they can do. But fully putting these principles into practice requires some additional steps on our end. We also want to achieve defense in depth, building a holistic security architecture so that there isn't any single point of failure. So not only do we have more Istio configurations that we need to tweak, but we also need to look beyond the service mesh itself to incorporate additional tools and establish guardrails at multiple layers. Some of the areas we'll be looking at are authentication and authorization, edge traffic, PKI, and a few others. And to conclude, I'll showcase a brief demo of what a more secure setup that incorporates multiple controls could look like in practice. I also just wanted to point out that this list is not all-inclusive or exhaustive; it's pretty much impossible to squeeze every single threat you might face into a 25-minute session. Believe me, I tried.
But at the same time, you don't necessarily need to account for all of these risks or take all of these measures. The goal of this talk is to provide an overview of various security risks that could be applicable to you, and propose some solutions and tactics for you to explore later. So, to start off, let's talk about authentication and encryption. As I was highlighting earlier, Istio does enable auto mTLS out of the box, but it's important to delve deeper into what exactly this means. There are two levels to mTLS. One is what kind of traffic our servers accept, which is governed by the PeerAuthentication resource. The other is what kind of traffic the client sidecars send out, which is controlled by the DestinationRule. The default PeerAuthentication setting in Istio is permissive mode, meaning that even though auto mTLS is enabled by default, the sidecars can still accept plaintext traffic from non-meshed workloads. The reason for having it permissive is to make it easier to migrate your applications to the mesh, especially for larger organizations where onboarding can be a very complex process. But once you're done onboarding all your applications to Istio, you should set the mTLS mode to strict to enforce that mTLS is used. So after migrating all of your workloads, you can create a PeerAuthentication like this in the istio-system namespace to configure strict mTLS at a global level. However, this global mTLS setting can be bypassed. In Istio, a workload-level configuration takes precedence over a namespace-level configuration, and the namespace level in turn takes precedence over a global configuration.
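The mesh-wide strict policy described above can be sketched like this, assuming istio-system is the mesh's root namespace:

```yaml
# Mesh-wide strict mTLS: a PeerAuthentication named "default" in the
# root namespace applies to every workload without a more specific policy.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the mesh root namespace
spec:
  mtls:
    mode: STRICT            # sidecars reject plaintext traffic
```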
So if a non-admin, for example, creates a PeerAuthentication with the mode set to DISABLE or PERMISSIVE that targets a particular workload, then this would override our global strict mTLS policy. The same, by the way, also applies to DestinationRules: by setting the TLS mode to SIMPLE or DISABLE, we can override the global auto mTLS setting. As I'll talk about in more depth later, the solution to prevent bypassing of the PeerAuthentication and DestinationRule settings that we want is to leverage policy enforcement. So now that we've touched on authenticating and encrypting communication, we can use the AuthorizationPolicy resource in Istio to set access controls, meaning which workloads are authorized to communicate with each other. And we can be pretty granular about how we set these rules, for instance authorizing requests based on the originating namespace or service account, or based on specific HTTP methods or request headers. A recommended best practice is to create a deny-by-default AuthorizationPolicy in the root namespace and then, from there, explicitly set your allow permissions. That way we're implementing the principle of least privilege by explicitly permitting communication on a case-by-case basis. And also, as per the Istio documentation, some recommended safe patterns for authorization are to use ALLOW with positive matching and DENY with negative matching, and to use URL normalization to prevent bypasses or policy mismatches. While Istio's AuthorizationPolicy on its own is pretty powerful and flexible, it's possible that you might need a more sophisticated authorization mechanism like Open Policy Agent or OAuth2 Proxy, for instance.
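A minimal sketch of the deny-by-default pattern, with one explicit allow rule; the namespaces, service account, and method here are illustrative placeholders, not names from the talk:

```yaml
# Deny by default: an empty spec matches no requests, so everything
# is denied unless another policy explicitly allows it.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: istio-system   # root namespace, applies mesh-wide
spec: {}
---
# Example allow rule: permit only GETs from a specific service account.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: backend
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/frontend/sa/frontend-sa"]
    to:
    - operation:
        methods: ["GET"]
```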
If this applies to you, you can configure an external authorization provider through the Istio mesh config and set a CUSTOM action in the AuthorizationPolicy to delegate the authorization decision to the external service. We also need to validate that the end user has valid credentials from an identity provider. You can use the RequestAuthentication resource to verify the end user's JWT based on the URL of the provider's public key set, and the RequestAuthentication will then extract the claims. But if you actually want to deny requests or set access controls based on the claims, or deny requests without JWTs, then you need an AuthorizationPolicy on top of the RequestAuthentication. Sometimes folks forget about the AuthorizationPolicy. In Istio, this end-user authentication and authorization would typically be done at the ingress gateway, at least in most cases. It is worth pointing out, though, that while you could use JWTs at a workload level for communication inside your mesh, this is typically considered less secure compared to mTLS. If you do want to use JWTs or end-user verification for intra-mesh traffic, I would advise using some kind of token exchange mechanism, and using JWT authentication on top of mTLS inside the mesh, not as a replacement for it. Also, because Istio can't get access to all the JWT fields besides the issuer, and different providers might put more information in other fields, you'll likely need to integrate with an external authorizer to write more granular policies. The next crucial consideration for improving Istio's security posture is how we manage traffic at the edge.
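A sketch of the JWT validation pair at the ingress gateway; the issuer and JWKS URL are placeholder values for whatever identity provider you use:

```yaml
# Validate JWTs presented to the ingress gateway.
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: ingress-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
  - issuer: "https://idp.example.com"                          # placeholder
    jwksUri: "https://idp.example.com/.well-known/jwks.json"   # placeholder
---
# RequestAuthentication alone only rejects *invalid* tokens; this
# policy is what actually denies requests that carry no JWT at all.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["*"]   # any authenticated request principal
```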
So a good rule of thumb is to have the ingress and egress gateways in their own separate namespaces, and then we can restrict access to and configuration of the gateways to administrators, or whichever team in your organization is managing traffic at the network boundaries. And when we expose our services externally in the Gateway custom resource, we want to define them individually and explicitly, like the configuration on the right, instead of using a wildcard host, as is the case with the example here on the left. To secure traffic through the ingress gateways, we want to ensure that the incoming traffic is encrypted. We can do that by using either SNI passthrough or TLS termination, depending on whether the backend application is serving HTTP or HTTPS. And, as touched on earlier, we can also have RequestAuthentications and AuthorizationPolicies that target our ingress gateways. It's also not uncommon for users to integrate their ingress gateway with a web application firewall to help protect against various types of web exploits, like SQL injection, cross-site scripting, or DDoS attacks. You could do this either by integrating an open source tool like OWASP Coraza into your ingress gateway with a Wasm plugin, or by putting a cloud-based WAF in front of your gateway. Another option, if you're looking specifically for rate limiting, is to configure this through an EnvoyFilter. I would add that though we're talking specifically about ingress right now, you can use web application firewalls, or global or local rate limiting, for intra-mesh traffic as well. There's a great lightning talk from earlier today that unpacked this more.
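A sketch of the explicit-hosts pattern, with the wildcard anti-pattern shown in a comment; the hostnames, namespace, and credential name are assumptions for illustration:

```yaml
# Avoid a wildcard server, which exposes anything routed to the gateway:
#   hosts:
#   - "*"
# Prefer explicitly listed hosts:
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-ingress    # dedicated gateway namespace
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE              # TLS termination at the gateway
      credentialName: app-tls   # secret holding the serving certificate
    hosts:
    - "app.example.com"
    - "api.example.com"
```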
Now, for managing egress traffic, the first thing we need to do is set the outbound traffic policy mode to REGISTRY_ONLY from the default value of ALLOW_ANY. Then the mesh administrators can create ServiceEntries to selectively add external hosts to Istio's service registry, making them accessible from inside the mesh. Then, once we route traffic through the egress gateway, there are two layers of controls we can use to ensure that the egress gateway isn't bypassed. One, we can use the Sidecar custom resource in Istio to limit the scope of the outbound listeners in the proxies themselves. And on top of that, we can use Kubernetes network policies to block any traffic from pods that isn't destined for the egress namespace. Another piece of advice is to perform TLS origination at the egress gateway, and have mTLS set between your sidecars and gateways in your destination rules, as opposed to originating HTTPS traffic from the applications directly. The advantage here is that you can then target the egress gateway with authorization policies and have even more fine-grained control of your outbound traffic. It's also best practice to have the egress gateway deployed onto its own dedicated node pool, and then have cloud firewall rules block any requests that aren't routed through the egress gateway nodes. If you're interested in diving deeper into egress traffic management with Istio, I cover this in greater detail in a previous lightning talk from last year's IstioCon. Next, let's talk about some additional steps for certificate management with Istio. By default, istiod will act as both the certificate authority and the registration authority, meaning that it both verifies the certificate signing requests and issues and signs workload certificates.
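The registry-only step can be sketched as a mesh config fragment plus a ServiceEntry; the edition.cnn.com host mirrors the Istio egress documentation examples:

```yaml
# Mesh config: block outbound traffic to any host that isn't in the
# service registry (default is ALLOW_ANY).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY
---
# ServiceEntry: explicitly register one external host so workloads
# inside the mesh can reach it.
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: cnn
spec:
  hosts:
  - edition.cnn.com
  ports:
  - number: 443
    name: tls
    protocol: TLS
  resolution: DNS
```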
And to do that, it uses a self-signed root certificate, meaning that your root of trust is generated by Istio itself and that signing key is just living as a secret directly in your cluster. That typically isn't safe for production. One alternative framework is to use a plug-in cert model. Here, istiod would have an intermediate certificate issued by a root CA that lives offline, and the root cert can be secured, for instance, in some cloud key vault or key management service. Istio would then use that intermediate cert to issue and sign the workload certificates. The other approach would be to use an external CA, where we delegate the responsibility of issuing and signing certificates to an external service altogether. There are multiple options here: you could use cert-manager, SPIRE, the Kubernetes CSR API, or even different combinations of these. In the example up here, for instance, you have cert-manager acting as the root CA to issue certificates to istiod and the SPIRE server, and you have SPIRE receiving the CSRs and issuing the workload certificates. As for which of these PKI models is better, it really depends. Each one has its own respective advantages and disadvantages, organizations are using different deployment models, and they have varying requirements in terms of their PKI infrastructure. If you want a more detailed overview of the pros and cons of each of these approaches, I would highly suggest checking out some additional resources online, and these two talks right here from last year's KubeCon and IstioDay would be really good places to start. Another important mechanism for enhancing our security framework is to use policy enforcement.
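In the plug-in cert model, istiod reads the intermediate signing material from a secret named cacerts in istio-system. A sketch of its shape; the angle-bracket values stand in for base64-encoded PEM payloads:

```yaml
# Plug-in CA: istiod picks up this secret at startup instead of
# generating a self-signed root. Payloads below are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: cacerts
  namespace: istio-system
type: Opaque
data:
  ca-cert.pem: <base64 intermediate cert>
  ca-key.pem: <base64 intermediate key>       # ideally synced from a key vault
  root-cert.pem: <base64 offline root cert>
  cert-chain.pem: <base64 intermediate + root chain>
```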
So, as I was mentioning earlier, we can use solutions like Gatekeeper or Kyverno to block PeerAuthentications and DestinationRules that could override our desired global mTLS setting. An administrator might also want to enforce that all workloads are injected with a sidecar, say, if a malicious user were trying to bypass Envoy. They could do that by rejecting pods that try to disable sidecar injection through a resource-level annotation. Additionally, given that misconfiguration can easily undermine our overall system security, it would be prudent to block, or set limits around, some of the higher-risk and error-prone features like EnvoyFilters, allowing them only in more limited capacities. It's also a good idea to disallow custom resources that are experimental or alpha; you can get more details on that in the Istio feature status doc. We might also want to set some fine-grained validations around, say, authorization policies and gateways, to enforce that some of the safe patterns I was alluding to earlier, like avoiding overly broad hosts, are actually being adhered to. To secure our environment, we also need adequate visibility to determine who did what and when. Either through the mesh config or the Telemetry API, we can enable Envoy access logging, and from there you can forward these access logs to an analytics workspace for your specific cloud provider, where you can leverage some rich alerting and visualization tools. At the same time, however, it's important to verify that our telemetry isn't leaking any sensitive information. For instance, if some of your logging data has personally identifiable information, you want to redact or encode that information accordingly.
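The Telemetry API route for access logging can be sketched like this, assuming the built-in "envoy" access log provider:

```yaml
# Enable Envoy access logging mesh-wide via the Telemetry API.
# A "default" Telemetry in the root namespace applies to all workloads.
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  accessLogging:
  - providers:
    - name: envoy   # the built-in stdout access log provider
```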
There are also some request and response headers that Istio injects that users often want to remove; x-envoy-peer-metadata is a particularly notorious one. While it's possible to remove or reformat some of these headers with EnvoyFilters, Lua scripts, and VirtualServices, there's ongoing work by the community and the project maintainers to help simplify this process further. Finally, I wanted to quickly touch on some other honorable mentions to harden our environment even further. We want to be mindful of workload and image security. A very useful tip here is to use the Istio CNI plugin, running as a chained CNI, instead of the istio-init container, because the init container requires elevated capabilities to write iptables rules. You should also prioritize upgrades and stay on an Istio minor version that's still receiving security patches. One helpful way to avoid remaining on outdated or older versions of Istio or Envoy that still have CVEs is to deploy and upgrade Istio through a GitOps workflow like Argo CD. This way, you also get the added benefit of minimizing configuration drift. And lastly, to reiterate the importance of defense in depth, you'd want to use Kubernetes network policies to address ports and protocols that aren't captured by Istio.
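The network-policy layer of defense in depth mentioned above might look like the following sketch; the apps namespace and the istio-egress namespace name are assumptions:

```yaml
# Default-deny egress for application pods, allowing only DNS and
# traffic toward the egress gateway namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: apps
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:                    # only the egress gateway namespace
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: istio-egress
  - to:                    # allow DNS lookups from any namespace
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
```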
And that's also a perfectly valid way of considering it. But regardless of how you define these layers, what's more important is that you have comprehensive, holistic coverage over all of them. As an analogy, if you were preparing to dress for a snowstorm or very cold weather, it wouldn't make much sense to have a thick jacket on top of a sweater on top of thermals, but then go out walking barefoot in shorts, right? Likewise, if you have a setup that incorporates all the Istio security tools but doesn't take advantage of the controls offered by Kubernetes or your cloud provider or other extensions, then you're going to be vulnerable to various threat types and malicious actors. So, as one example of what a more hardened setup could look like in production, in this demo I've used various Istio security features and configuration options like the chained CNI plugin, the outbound traffic policy, and strict peer authentication. I'm also using Gatekeeper to prevent bypassing of a lot of these settings. At the ingress gateway, I'm authorizing requests based on the JWT, and I also have a web application firewall installed as a Wasm plugin to protect against other types of web-based attacks. I also explicitly needed to create a NAT rule to make the ingress gateway accessible through the firewall, so that's another layer of security right there. I'm also controlling outbound traffic through the egress gateway and network policies, and then I have the firewall as another line of defense that only allows traffic to leave the nodes if it's routed through the egress gateway nodes. And on top of that, I have a cloud-based key vault to secure the root certificate, and I'm forwarding the access logs that I've enabled to an analytics workspace.
So now, just for a quick run-through, let's take a closer look at the security architecture. As you can see here, I'm leveraging GitOps with Argo CD as a handy tool to facilitate installations and upgrades, and also to monitor configuration drift. For my PKI model, I'm using a plug-in cert with Azure Key Vault to store the certificates and keys. You can see the cacerts secret here in the istio-system namespace, and then the SecretProviderClass that maps the secret to the key vault. Now let's see what happens to incoming traffic entering the mesh. We see that a request without a JWT gets rejected by the authorization policy with a 403. A request with an invalid token gets denied by the request authentication. And a request with a valid bearer token succeeds as expected. But what happens if I try, say, a cross-site scripting attack? Because I have the Coraza web application firewall running in the ingress gateway to detect these kinds of exploits, we can see that the malicious attempt also gets blocked. And if you look at the ingress gateway logs, we can see Coraza doing its work and reporting the anomaly. So now let's take a look at communication inside the mesh. I have a strict PeerAuthentication in the istio-system namespace, and if I try creating a permissive policy that overrides it, we see that it gets blocked by Gatekeeper. Right? I also have a deny-by-default authorization policy, so if my sleep application tries to communicate with httpbin, it returns a 403 because that communication is not authorized. Okay, cool. Now let's take a look at egress traffic. Here, I needed to explicitly add a ServiceEntry for cnn.com to make it accessible from inside the mesh, because I have the outbound traffic policy in the mesh config set to registry-only. But if, sure, sure, sorry.
Yeah. In the interest of time, I'm just going to wrap that up. But the main thing there is having multiple lines of defense: basically, even if traffic bypasses the egress gateway, the requester will still end up getting denied. And if you want to try any of this on your own on a different cloud provider, here are the equivalents for GKE and EKS. The slides will also be available online, so feel free to download them and take a closer look later. Here are a few other additional resources as well if you want to check them out. Also, Zach has done a lot of work with Istio and zero trust, so he's the go-to person for that. So yeah, that's it for my talk today. I hope you found it helpful, and thanks so much for attending. Please feel free to reach out on LinkedIn, and please scan the QR code and leave feedback if you enjoyed the session. That's pretty much it for me today; please enjoy the rest of Istio Day. Thank you so much, Nirajan. I don't think we'll have much time, but actually, we might be able to take one question while John comes up and sets up the laptop. Any questions for Nirajan? I have a question. Yes. Do you want to answer the question? Sure. What is your opinion on using security frameworks like SELinux or AppArmor together with Kubernetes and Istio? Sorry, could you repeat the question? I didn't hear. About the use of SELinux or AppArmor within Linux to harden the containers. That is the question: what is your opinion about this? I'm not as familiar with those specific container security tools, but there are various ways you can go about hardening your container images, and you do want to have that in your environment along with the other Istio and Kubernetes security controls.
I did have something briefly there about workload security, but yeah, hardening your container images, for example by using distroless bases, is always good practice. Okay. Thank you.