Hello, everyone, and welcome to IstioCon 2023. My name is Naranjan, and I'm a software engineer on the Azure Kubernetes Service team at Microsoft. I work on the Istio add-on for AKS, and I've also contributed to the Istio code base and documentation. In this presentation, I'm going to cover some best practices for securing egress traffic from your Kubernetes cluster with Istio.

If you're already familiar with Istio, you most likely know that it secures communication between applications in the mesh with mTLS by default. The next step for operators is securing traffic entering and exiting the mesh, also known as north-south traffic. This is important because we want to prevent attackers from infiltrating the network, or, in the event that they do, prevent them from reaching unauthorized and harmful endpoints on the internet. Now, Istio does offer several options to control outgoing traffic, including mesh-wide settings, several custom resources, and an egress gateway that can perform TLS origination. However, these features and configurations by themselves still aren't enough to fully secure outbound communication. For instance, organizations often require that egress traffic flow from dedicated nodes and be logged. And to truly achieve defense in depth, we need to integrate our mesh with cloud security tools such as firewalls, NAT services, and security groups, as well as Kubernetes network policies.

Now, let's take a look at an example of how exactly Istio can integrate with these external mechanisms in five steps. I'm going to quickly break down these steps individually and highlight some caveats to keep in mind throughout the process. Then we'll take a closer look at a real-world example of what this could look like in practice. I've added these steps and configurations to my personal GitHub repository, which I've linked on a subsequent slide.

The first step is to set up our firewall rules and create our cluster in an associated virtual network or subnetwork. A few important caveats to keep in mind here. The cluster administrator needs to ensure that applications can communicate with the API server and pull images from Docker Hub, MCR, or other image registries. Since we're setting up monitoring extensions on our cluster, those specific endpoints need to be allowed by the firewall as well. When we create our cluster, we also want to be sure to use a CNI plugin that supports Kubernetes network policies. Additionally, administrators should provision a dedicated egress gateway node pool that can communicate with external services. We apply taints to these nodes to repel application pods that don't have the corresponding tolerations.

The next step is to install Istio and the egress gateway with the following specifications. This can be done with Helm or the Istio Operator API, though the latter is discouraged. When installing Istio, we set the outbound traffic policy mode to REGISTRY_ONLY. This restricts workloads to communicating only with external services that have been selectively added to Istio's service registry. The egress gateway should be installed in its own namespace for separate administrative control of the gateway. Also remember to apply the appropriate tolerations and node selector to the gateway deployment to ensure that the gateway pods get scheduled onto the dedicated egress nodes; I've sketched what that could look like below.
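As a reference point, here is a minimal sketch of what that installation could look like with the istiod and gateway Helm charts. The release names, namespaces, node pool label, and taint key and value here are assumptions for illustration; the exact configurations are in the repository I mentioned.

```yaml
# istiod-values.yaml -- sketch of values for `helm install istiod istio/istiod -n istio-system`
meshConfig:
  accessLogFile: /dev/stdout   # enable Envoy access logging mesh-wide (used in step five)
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY        # block outbound hosts that aren't in the service registry
```

And for the egress gateway chart, the tolerations and node selector pin the gateway pods to the dedicated egress node pool:

```yaml
# egress-values.yaml -- sketch of values for `helm install istio-egress istio/gateway -n istio-egress`
service:
  type: ClusterIP              # the gateway only needs to be reachable from inside the cluster
nodeSelector:
  agentpool: egress            # assumption: label on the dedicated egress node pool
tolerations:
- key: dedicated               # assumption: taint applied to the egress nodes in step one
  operator: Equal
  value: egressgateway
  effect: NoSchedule
```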
Third, we onboard our applications to the mesh and apply the appropriate custom resources. One thing to note here is that when creating the service entries, remember to set the resolution to DNS so that the gateway resolves DNS for the host name it receives in the incoming request. Now, in terms of encrypting outbound traffic, there are two options. The applications can send HTTPS requests to external services directly, or the applications can send plaintext requests to the gateway and the gateway originates TLS with the external host. The benefit of the TLS origination approach is that it gives us the additional option of using authorization policies, which allows for even more fine-grained control of egress traffic.

Next, we configure Kubernetes network policies. As we will see, this ensures that even if attackers bypass the proxy, they still can't gain unauthorized access to external services. Keep in mind that operators need to apply the policies individually to each namespace in which they want them enforced.

Finally, we need to log and monitor traffic exiting the mesh and the cluster. To enable Envoy's access logging mesh-wide, you can set the accessLogFile field in the Istio mesh config. Or, to enable access logging on a per-workload basis, we can use Istio's Telemetry API. We can then view access logs and firewall logs in a Log Analytics workspace associated with our cluster's monitoring extension.

Now, it's time to take a look at a real-world example of what such a setup looks like in practice. This demo uses Istio deployed onto an AKS cluster; however, keep in mind that these general principles and guidelines apply to other Kubernetes distributions as well. So let's examine the cluster setup and the demo environment. Again, the tutorials repository I've linked provides these steps and configurations for you to try out on your own.

I've taken the steps outlined in Microsoft's user-facing docs to set up my firewall and routing rules. I then deployed the AKS cluster in a subnetwork associated with the firewall, like what we saw in that diagram. My cluster has Azure Monitor Container Insights enabled, and I've also set the CNI plugin to Azure CNI to enable Azure network policies. Additionally, the outbound type for this cluster is user-defined routing, which allows us to customize our cluster egress. As you can see from the output of `kubectl get nodes` up top, I've created two node pools: the first one being the default that AKS provisions, and the second one that I created specifically for the egress gateway.

For the mesh environment, I have the egress gateway deployed in the istio-egress namespace, and as you can see, it got scheduled onto my dedicated egress node. In the Istio mesh config, notice I have the outbound traffic policy set to REGISTRY_ONLY, and I have Envoy access logging enabled. My CNN service entry adds the host edition.cnn.com to Istio's service registry; this way, sidecars and gateways can communicate with CNN. I have two destination rules configured here: one, deployed to each test namespace, to set up mTLS between the sidecars and the gateway, and the other to configure the gateway to perform TLS origination. I've created network policies to restrict pod traffic to the kube-system, istio-system, and istio-egress namespaces, and I've applied these policies to each namespace individually. Below are sketches of what these resources could look like; again, the exact manifests are in the linked repository.
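Here is roughly what that service entry looks like; it follows the egress gateway examples in the Istio docs, with the `resolution: DNS` setting I called out in step three:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: cnn
spec:
  hosts:
  - edition.cnn.com
  ports:
  - number: 80
    name: http
    protocol: HTTP
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS              # the gateway resolves DNS for the host in the incoming request
```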
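The destination rule that performs TLS origination can be sketched as follows; the sidecar-to-gateway rule is analogous but uses `tls.mode: ISTIO_MUTUAL` against the gateway's host instead. Plaintext traffic reaches the gateway on port 80, and the gateway upgrades the connection to TLS on port 443 toward the external host:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: originate-tls-for-cnn
spec:
  host: edition.cnn.com
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 443
      tls:
        mode: SIMPLE           # the gateway originates TLS with edition.cnn.com
```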
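The per-namespace network policy can be sketched like this. It allows egress only to the three namespaces I listed, so even traffic that bypasses the sidecar proxy has nowhere else to go; the selectors assume the standard `kubernetes.io/metadata.name` namespace label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: test-egress       # applied individually to each application namespace
spec:
  podSelector: {}              # select every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system    # DNS
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: istio-system   # istiod
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: istio-egress   # the egress gateway
```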
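Since this demo enables access logging through the mesh config, here, for completeness, is the per-workload alternative from step five using the Telemetry API. Applied to the istio-egress namespace, it turns on access logs for the workloads there, which in this setup is just the gateway:

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: egress-gateway-logging
  namespace: istio-egress      # scopes the setting to workloads in this namespace
spec:
  accessLogging:
  - providers:
    - name: envoy              # Istio's built-in Envoy access log provider
```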
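Lastly, an authorization policy on the gateway controls which namespaces are allowed to egress to CNN, as we're about to see in the test runs. A sketch using an ALLOW action could look like the following; the pod label is an assumption, and any request whose source matches no rule gets rejected:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-cnn-egress
  namespace: istio-egress
spec:
  selector:
    matchLabels:
      istio: egressgateway     # assumption: label on the egress gateway pods
  action: ALLOW
  rules:
  - from:
    - source:
        namespaces: ["test-egress"]   # only this namespace may send traffic through the gateway
```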
There are three namespaces in this test environment. In the default namespace, the sleep application does not have a sidecar. In the test-egress-unauthorized namespace, the sleep pod does have a sidecar but lacks the corresponding authorization policy rule to communicate with CNN. And the sleep pod in test-egress is authorized to communicate with CNN.

So let's see what happens if I try to send a request to CNN from the sleep pod in the default namespace, without a sidecar. As you can see, the request hangs because the network policy is blocking it. However, even if I delete the network policy, as I'm about to do right now, the request will still fail. This is because the firewall prevents workloads not deployed onto the egress nodes from communicating with unauthorized services. My request from the sleep pod in the unauthorized namespace will also fail, with a 403, because the authorization policy denies it. However, because the test-egress namespace is authorized to communicate with CNN, the egress gateway will permit this request, which succeeds.

We can gain some additional insight into our test environment and firewall on the Azure portal. As for how only the egress nodes can communicate with specific external services, I configured the following outbound firewall network rule: allow traffic with a source IP of either of the egress nodes to edition.cnn.com. I set the protocol and port to TCP on port 443, because this is what the gateway uses when communicating with CNN over TLS. This works because the source IP of external traffic is translated to the node IP by default. Also, here are the application rules I mentioned earlier that enable applications to pull images from Docker Hub and Azure Monitor to forward logs to the Log Analytics workspace.

In our Log Analytics workspace, we can query the firewall logs. This gives us important information, like whether traffic was denied or allowed, and the protocol, source, and target of each request. For instance, in this query here, I can see the firewall permitting the egress gateway's request to CNN that we just saw. Azure Monitor is also forwarding standard error and standard out from every container in the cluster to the workspace, so the Envoy access logs from the egress gateway are visible in the Log Analytics workspace as well. For example, here's the access log for the request that the gateway authorized from the test-egress namespace.

Before concluding, I wanted to highlight some additional steps platform administrators may want to take to lock down egress traffic even further. First, with Kubernetes role-based access control, access to the istio-system and istio-egress namespaces can be restricted to admins. Second, policy enforcement through admission controllers can ensure that application pods don't get scheduled onto the egress nodes. Finally, operators may want to strip outbound requests of Envoy headers that can potentially leak sensitive information.

For more information on outbound traffic with Istio, firewalls, or the relevant cloud tools, check out some of these resources. Thank you for your time. I hope you found this session helpful, and please enjoy the rest of IstioCon.