Hi everyone, welcome to the session on Calico security policy best practices. My name is Adil Abdul Majeed. I'm a solutions architect with Tigera's customer success team, helping our customers deploy Calico solutions. Okay, let's start off with the agenda for today's session. We're going to be looking at security challenges in Kubernetes and how Kubernetes disrupts traditional networking and security paradigms. We'll look at Calico security policies, the anatomy of a Calico security policy and the declarative policy language. Next, we'll move on to best practices for security policy implementation, with some examples of security policy patterns. I think one of the best ways to understand security policies is to look at some examples and how those examples can be incorporated into your environments to build a security policy model. Once we have the model built, we will then look at how you can introduce policy governance to that policy model. Let's have a brief look at the Kubernetes networking model. The model stipulates that pods can communicate with all of the pods on any other node without the requirement for network address translation. Agents on a node, such as system daemons and the kubelet, can communicate with all pods on that node. And the most common container runtimes use Container Network Interface (CNI) plugins to manage their networking and security capabilities, Calico being one of the most widely adopted CNI plugins for Kubernetes. So at a high level, the Kubernetes networking model provides for a flat network where pods in the cluster can freely talk to each other. There are benefits to this model: it alleviates complexities that could have arisen from the underlying network. However, the model also introduces certain security challenges. Kubernetes disrupts traditional networking and security paradigms. In traditional networks, when security is enforced, it's done so at certain choke points in the network.
And controls are enforced when traffic traverses or transits those choke points, typically using some form of networking construct. Now, in Kubernetes, pod scheduling is dynamic, IP addresses are ephemeral and scheduling is typically non-deterministic. What that means is you can't bind a workload identity to a networking construct anymore. Also, firewalls external to the cluster cannot map IP addresses to workload identity. So when traffic egresses the cluster, upstream firewalls cannot enforce security policies based on IP addresses, because the IP addresses for a pod are non-deterministic. With those challenges also comes an opportunity, and we'll look at Calico a bit in this slide. Calico, among other functionalities, primarily offers the container network interface for Kubernetes, IP address management and the security policy engine. Now Calico is quite flexible in the routing modes it supports, as well as the data planes it supports for policy enforcement. So for routing you can choose between IP-in-IP or VXLAN overlay modes, or native BGP, and for the data plane you can choose between eBPF, iptables and Windows HNS if you have Windows nodes in your clusters. Now, the primary benefit of this tight integration with the orchestration plane is that workload identity is tightly coupled to networking and security in a way that was never possible prior to Kubernetes. And this offers a number of advantages that we will explore in the subsequent slides. All right, moving on, let's look at the characteristics of security policies. Security policies are label-based, and key-value pairs are the primary selectors used to scope policies and to refer to source and destination endpoints in policies. The policies are declarative: Calico offers a very flexible declarative policy language, and what that means is that the underlying implementation of the policy is abstracted from the user, the underlying implementation being the data planes.
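As a rough illustration of how these routing and data plane choices surface in practice, the Tigera operator's Installation resource lets you select them declaratively. This is a minimal sketch, not taken from the session, and the exact fields and values depend on your Calico version and install method:

```yaml
# Hypothetical example: selecting the eBPF data plane and VXLAN
# encapsulation via the Tigera operator's Installation resource.
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: BPF          # alternative: Iptables (the default)
    ipPools:
      - cidr: 192.168.0.0/16
        encapsulation: VXLAN     # alternatives: IPIP, None (native BGP)
```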
For example eBPF, iptables or Windows HNS — that is abstracted from the user. What the user gets is this declarative policy language that can be used to define security intentions. And policies are highly dynamic: what that means is the underlying policy implementation in the data plane exists for every pod in the cluster. So as pods are created, so are policies, and as pods move, so do the policies. The policies are highly dynamic, but you as a user only have to specify the security intention using the declarative policy language, and Calico will take care of the underlying policy implementation using a data plane of your choosing. Okay, so those are some of the characteristics of security policies. All right, next let's move on to Calico security policy features. Calico supports network policy and global network policy, and when I say security policy I'm referring to both network policy and global network policy. Calico also supports policy ordering. Now Calico policies are an extension of Kubernetes policies; however, Calico offers several more extensions than what's available in native Kubernetes policies, policy ordering being one of those, and we'll look at how you could leverage policy ordering when building a policy model. Policies can be applied to any kind of endpoint, be it pods, VMs or hosts external to the cluster, and policy rules support allow, deny, log and pass actions. When you're specifying sources and destinations in a policy rule there are a number of match criteria: for example, you can match on a port, and this could be based on port numbers, ranges of ports or even named ports in Kubernetes. You can specify a protocol such as TCP, UDP or ICMP, as well as HTTP attributes, ICMP attributes, IP versions, IPs, CIDRs, network sets and global network sets. We'll explore some of these in the subsequent slides.
There are also endpoint selectors, and this is where you'd use label expressions to select pods, VMs or other host interfaces, namespace selectors or service account selectors. Now policies also support optional packet handling capabilities such as disabling connection tracking, applying before NAT, applying to forwarded traffic, etc., and these primarily apply to host endpoint policies. Now there's a lot going on in this slide, and it's not possible to cover all the policy features in a single session, so in this session we're going to focus on policies for pods and containers. We're not going to be discussing policies for host endpoints or VMs, or policies for nodes outside the cluster, and in terms of actions we're primarily going to be focusing on the allow and deny actions. We're also not going to be discussing the optional packet handling capabilities. The best place to learn about policies and all that they're capable of is the Project Calico docs site, and there's a link to that down below. Okay, so let's have a look at the anatomy of a Calico security policy. A security policy has a scope: it could either be a namespace-scoped network policy, which applies to a specific namespace, or a global network policy, which could apply to multiple namespaces or even all the workload endpoints in the cluster. Now there are a few selector options. You could use endpoint labels to select specific workload endpoints; if it's a global network policy you can also use a namespace selector to limit that global network policy to specific namespaces; and you could also use service accounts as selectors.
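To make the scope and selector portion of that anatomy concrete, here is a minimal skeleton of each kind. This is an illustrative sketch only — the names and labels are hypothetical placeholders, not manifests from the session:

```yaml
# Namespace-scoped policy: applies only to endpoints in 'my-namespace'.
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: example-policy          # hypothetical name
  namespace: my-namespace
spec:
  selector: app == 'example'    # endpoint labels scope the policy
  types: [Ingress, Egress]
---
# Cluster-wide policy: optionally narrowed with namespace or
# service account selectors.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: example-global-policy
spec:
  namespaceSelector: env == 'prod'           # limit to matching namespaces
  serviceAccountSelector: role == 'backend'  # or by service account
  types: [Ingress, Egress]
```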
Now a policy is going to have one or more ingress and/or egress rules, and if you look at a particular rule, it could have an allow, deny, log or pass action. In an egress rule there's a `to` field where you specify the destination, and there's also a `from` field that you can use to specify the source in an ingress rule. Now, when specifying the source or destination, again you can use endpoint selectors, namespace selectors or service accounts, or it could be networks, network sets or global network sets. Network sets and global network sets are a way of grouping IP addresses or CIDR blocks, and we'll explore some examples in the subsequent slides. When it comes to protocols, again we looked at some protocols in the previous slide: for example you could choose between TCP, UDP and ICMP, and even HTTP matches if you're using layer 7 policies. All right, so quite a bit going on here, right? Just park this for the moment; I think we'll be able to bring this together when we look at the examples in the subsequent slides. But the idea is that there's a lot of flexibility within a policy, and understanding some policy patterns is the best way to approach implementing a security policy model for your environments. Alrighty, so before we look at some examples, let's look at security policy behavior. Now if no network policies, or global network policies for that matter, are applied to a pod, then all traffic to and from that pod is allowed. So by default, if a network policy is not applied to a workload, all traffic is allowed; that's the default allow-all behavior in Kubernetes. Now if one or more network policies apply to a pod containing ingress rules, then only ingress traffic specifically allowed by those policies is allowed for those endpoints.
If one or more network policies apply to a pod containing egress rules, then only egress traffic specifically allowed by those policies is allowed for those endpoints. What this simply means is that once an endpoint is matched by a policy, only flows allowed by that policy or any other policy are allowed for that endpoint. So once you've matched an endpoint in a policy, there's a default deny behavior for that endpoint. All right, best practices for security policy implementation. You should look at grouping all the rules that apply to a given workload, or a group of workloads, into a single policy with multiple ingress and/or egress rules, instead of having multiple security policies. Implement a hierarchical design that allows for optimizing the number of security policies by filtering out non-compliant flows at the top of the funnel. When we look at the policy model in subsequent slides, you will see how, for example, global network policies are used to implement certain high-level controls so that we can filter out unwanted or non-compliant flows at the top of the funnel. Use global network policies to implement high-level guardrails and network policies to implement fine-grained controls. Global network policies are what span across multiple namespaces, or even all the workloads in the cluster, so you could use global network policies to enforce high-level security intentions and network policies for fine-grained controls that apply to very specific workloads. Leverage policy order with allow and deny actions when developing security policies.
This is a powerful capability in Calico policies. Also, use network sets and global network sets to group IPs and CIDRs so that they can be referenced by multiple policies; it's just a bit more efficient doing it that way. So with these best practices in mind, let's look at some example security policy patterns. Now, the cluster shown here will be used as an example to demonstrate certain security policy patterns. This particular cluster has two tenants, and a tenant is simply a logical isolation, so it could be any form of logical isolation. For example, in your environment it could be a PCI environment that you'd like to isolate from the rest of the cluster workloads. If you're a hosting provider, a tenant could be a customer; if you have multiple customers, you have to make sure that they remain isolated in a shared cluster. Also, if you have a cluster that's shared between various development teams, you may want to have a logical isolation. So a tenant is any form of logical isolation that you'd like to create for your cluster. Now a tenant could have one or more namespaces. For example, in this cluster tenant 1 has two namespaces, hipstershop and yaobank, and tenant 2 has a single namespace called bookinfo. Now, the applications in these tenants are exposed using an ingress controller in the ingress-nginx namespace. So hopefully this cluster is similar to your environment; of course you could scale horizontally, you can have more tenants and more namespaces, but the pattern hopefully is similar to your environments. All right, so now let's look at some example policies. Before we implement policies, I'm assuming this is a live production cluster, so I don't want to break anything, and that's why I've started off with a default allow here. The default allow is simply a policy that allows all traffic.
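A failsafe of this kind could look roughly like the following. This is a sketch under my own naming, not the exact policy from the session; giving it a high order value means it's evaluated after the more specific policies:

```yaml
# Hypothetical default-allow failsafe, evaluated last (high order value)
# so it only catches traffic no earlier policy has matched.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: default-allow
spec:
  order: 2000          # Calico evaluates lower order values first
  ingress:
    - action: Allow
  egress:
    - action: Allow
```

The point of this pattern is purely to avoid breaking live traffic while the policy model is still incomplete; it is removed (or replaced by default denies) once coverage is done.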
Now remember, by default Kubernetes has an allow-all behavior; however, once you've matched pods in a policy there is a default deny behavior, an implicit deny behavior. I'm using this default allow as a failsafe so that I don't impact any live production traffic in this cluster. So my first policy is a deny-list policy, and the deny-list policy is a Calico global network policy which applies to all cluster endpoints — all the more reason for that default allow. So now I have a policy that applies to all cluster endpoints. What that means is all cluster endpoints now have a default deny behavior for egress traffic, since this policy matches egress traffic. So this policy has an egress rule which denies traffic to the IPs and CIDRs in the ip-deny-list global network set, and I've shown the global network set on the right-hand side as well. Now recall that global network sets are a way of organizing IPs or CIDR blocks and referencing those in policies. The selector field is used in the egress rule to match the global network set based on the gns == ip-deny-list label. So if you look at the global network set, we have a gns: ip-deny-list label, which is referenced in the egress rule implemented in the security policy. But there's also a namespace selector in the security policy, and the namespace selector uses a global() match criterion, and that's because this particular network set is a global network set. So what this policy does is, if any of the cluster workloads try to connect to the IP addresses specified in this global network set, those flows are going to be denied by this policy. For example, you could retrieve these IPs from a threat feed and update the global network set, and then you could effectively deny all cluster workloads access to those malicious IP addresses. Now if you look at the actual underlying policy implementation, the policy is implemented for every endpoint in the cluster using the data plane.
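Putting that together, the deny-list pattern could be sketched roughly as follows. The names, labels and addresses here are illustrative placeholders rather than the exact manifests from the slides:

```yaml
# Hypothetical global network set holding deny-listed IPs/CIDRs,
# e.g. populated from a threat feed.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkSet
metadata:
  name: ip-deny-list
  labels:
    gns: ip-deny-list
spec:
  nets:
    - 198.51.100.0/24    # example CIDR (RFC 5737 documentation range)
    - 203.0.113.55/32
---
# Deny egress from every cluster endpoint to anything in that set.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-list-egress
spec:
  order: 100             # evaluated before broader allow policies
  egress:
    - action: Deny
      destination:
        selector: gns == 'ip-deny-list'
        namespaceSelector: global()   # matches non-namespaced resources,
                                      # i.e. the GlobalNetworkSet
```

Updating the GlobalNetworkSet's `nets` list is then enough to change what all workloads are denied, without touching the policy itself.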
However, for simplicity, I've grouped all the endpoints and shown that traffic is denied to the ip-deny-list global network set. All right, so that was the first example; let's move on. The second example is the kube-dns policy. Now, the kube-dns policy is a global network policy which applies to all cluster endpoints. It has an ingress rule which allows DNS traffic from all endpoints to kube-dns, and an egress rule which allows DNS traffic from all endpoints to kube-dns. The selector field is used to match the kube-dns endpoints using the k8s-app == kube-dns label, and the namespace selector field is used to select all cluster endpoints using the all() match operator. So if you look at this policy, this is a global network policy, and in the ingress direction the destination is kube-dns, and we've allowed UDP port 53, so the protocol is UDP. If you were to visualize this policy, you have the kube-dns endpoints, and you've created this pinhole in the ingress direction from all cluster endpoints to kube-dns. We've matched all cluster endpoints by using the namespace selector with the all() match operator. Now, in the egress direction, the source of the egress rule again is all cluster endpoints, and the destination again is kube-dns on UDP port 53. So in the egress direction, for all cluster endpoints, we've created this pinhole permitting traffic if it's destined to kube-dns on UDP port 53. Again, recall that the pinhole is in fact created for all endpoints in the cluster; however, for simplicity, I've grouped those endpoints and shown a single pinhole. So this is the visual representation of this policy, if you'd like to visualize what the policy looks like. And this policy is quite powerful, isn't it? You now have a single declarative policy that's protecting all DNS traffic in the cluster. So you may have thousands of pods and thousands of DNS flows in the cluster that are now protected by this single policy.
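A single policy along those lines could be sketched as below. Again, this is an approximation: the k8s-app == kube-dns label follows common Kubernetes conventions but your cluster may differ, and in practice you'd typically also allow TCP port 53:

```yaml
# Hypothetical cluster-wide DNS policy: every endpoint may send to
# kube-dns on UDP 53, and kube-dns may receive that traffic.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: kube-dns
spec:
  order: 200
  ingress:
    - action: Allow
      protocol: UDP
      source:
        namespaceSelector: all()        # from all cluster endpoints
      destination:
        selector: k8s-app == 'kube-dns'
        ports: [53]
  egress:
    - action: Allow
      protocol: UDP
      destination:
        selector: k8s-app == 'kube-dns'
        ports: [53]
```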
Okay, let's look at the next policy. This policy is called tenant-1-restrict. This is a Calico global network policy which applies to all cluster endpoints in tenant 1. The namespaceSelector field is used in the security policy with the projectcalico.org/name label to select endpoints in the tenant 1 namespaces. It has an ingress rule which denies all traffic except from the specified namespaces, and an egress rule which denies traffic except to the specified namespaces, and the notSelector field is used in the ingress and egress rules with the projectcalico.org/namespace label to exempt endpoints that should not be denied. So this is a global network policy, and in the policy the namespaceSelector is used to specify the hipstershop and yaobank namespaces, so this policy applies to the hipstershop and yaobank endpoints. In the ingress direction there is a deny action; however, the notSelector is used to exempt the endpoints that should not be denied. Similarly, in the egress direction the action is a deny, and again the notSelector is used to exempt endpoints that should not be denied. Now, projectcalico.org/name is a label that can be used to identify namespaces, and the value of the label is the name of the namespace. So you could use the projectcalico.org/name key with the value of the namespace, and Calico will then select all the endpoints inside that namespace to scope the policy or the rule. Similarly, projectcalico.org/namespace is a label that Calico associates with an endpoint, and the value of that label is the name of the namespace the endpoint belongs to. So since we're using a not operator, we're using the projectcalico.org/namespace label and providing the names of the namespaces, so that those endpoints are exempted from the security policy.
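An approximate manifest for this pattern is shown below. Treat it as a sketch: the exact set of exempted namespaces and the order value are assumptions based on the description, not the slide's manifest:

```yaml
# Hypothetical tenant isolation policy: endpoints in tenant 1's
# namespaces may only exchange traffic with exempted namespaces;
# everything else is denied.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: tenant-1-restrict
spec:
  order: 300
  namespaceSelector: projectcalico.org/name in {'hipstershop', 'yaobank'}
  ingress:
    - action: Deny
      source:
        notSelector: projectcalico.org/namespace in {'hipstershop', 'yaobank', 'ingress-nginx'}
  egress:
    - action: Deny
      destination:
        notSelector: projectcalico.org/namespace in {'hipstershop', 'yaobank', 'ingress-nginx'}
```

Note that the Deny rules do the isolation; the notSelector merely carves out traffic that later, more specific policies are expected to allow.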
Now, if you want to visualize this policy: we have the tenant 1 endpoints selected using the namespaceSelector, so basically the endpoints in hipstershop and yaobank. All traffic to other endpoints is denied, except for what's exempted using the notSelector. So note that this policy does not permit any traffic. Given that you've specified a tenant-1-restrict policy, a subsequent policy must permit the traffic flows that should be permitted for tenant 1 workloads. All this policy does is isolate the tenant 1 workloads and deny traffic to all other cluster endpoints, except for what's exempted using the notSelectors in the rules. Now, we're going to have a similar policy for tenant 2 as well: again a global network policy using the namespaceSelector with the projectcalico.org/name label to select the tenant 2 endpoints, and it has ingress and egress rules, again with notSelectors using the projectcalico.org/namespace label to exempt endpoints that should not be denied.
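Following the same pattern, a tenant 2 counterpart might look like this; again a sketch, with the exempted namespaces assumed from the description:

```yaml
# Hypothetical tenant 2 isolation policy, mirroring tenant-1-restrict.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: tenant-2-restrict
spec:
  order: 300
  namespaceSelector: projectcalico.org/name == 'bookinfo'
  ingress:
    - action: Deny
      source:
        notSelector: projectcalico.org/namespace in {'bookinfo', 'ingress-nginx'}
  egress:
    - action: Deny
      destination:
        notSelector: projectcalico.org/namespace in {'bookinfo', 'ingress-nginx'}
```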
So, similar to the previous policy, the bookinfo and ingress-nginx namespaces are exempted. Of course, we're exempting the ingress-nginx namespace, as in the previous case, because the applications are exposed via that namespace, so we'd expect to receive inbound traffic from it. Again, we are not yet permitting that traffic; when you look at subsequent policies, you will see how traffic from ingress-nginx is permitted to workloads that are exposed to external consumers. However, with this policy, what we've done, similar to the previous policy, is isolate the tenant 2 endpoints from the rest of the cluster. So with the two restrict policies you've created a certain level of isolation, a high-level guardrail demarcating those workloads from the rest of the cluster workloads.

All right, so now we have a policy called the frontend policy. The frontend policy is a network policy which applies to the frontend endpoints in the hipstershop namespace in tenant 1. The selector field is used to select the frontend endpoints using the app == frontend label. It has an ingress rule which permits traffic from the ingress controller: the namespaceSelector field is used in the rule with the projectcalico.org/name label to select the ingress-nginx namespace, and the selector field is used with multiple labels to select the ingress controller. It also has egress rules, with the selector field matching app == <service> labels, to select the other endpoints in the same namespace that the frontend endpoints communicate with. So let's go through this policy a bit. This is a network policy, which means it applies to a specific namespace, the namespace being hipstershop. However, in the policy we are using an endpoint selector — the selector field with the app == frontend label — and what this means is that the policy applies only to the frontend endpoints. In the ingress direction, we are using a namespace selector to select the ingress-nginx namespace, but we're also using an endpoint selector: the selector field with an AND operator specifying a couple of labels to match the ingress controller. So when you're using selectors and labels, you can use AND, OR and NOT operators; in this case we're using an AND operator to make sure the controller has both labels, and that's how we identify the ingress controller in this particular rule. Now, in the egress direction, the frontend is sending traffic to adservice, where we've specified a port, and similarly sending traffic to checkoutservice. This rule continues; I've truncated it. What's important is not the number of services it's talking to in the same namespace — there are quite a few of them for this particular application — but that the pattern is the same; it repeats. Given that this policy applies to the hipstershop namespace, we're using the selector field to select the other workloads that the frontend workload can communicate with in the same namespace. Now, bear in mind that just because traffic is permitted from the frontend workload doesn't mean that we've allowed end-to-end communication. If you were to visualize this policy: we have the frontend endpoints, and in the ingress direction we've created a pinhole so that the frontend endpoints can receive traffic from ingress-nginx, and in the egress direction we've created a pinhole so that the frontend endpoints can send traffic to other services in the same namespace. However, for those services — and we look at this in a subsequent policy — you also have to make sure that you create an ingress pinhole for those endpoints, which we've not done yet. So just because there's an egress rule permitting traffic from the frontend to the checkoutservice, it doesn't mean that the flow is allowed end-to-end.

All right, so now let's move on to the checkoutservice policy. Again, this is a network policy which applies to the checkoutservice endpoints in the hipstershop namespace. The selector field is used in the security policy to select the checkoutservice endpoints using the app == checkoutservice label. It has an ingress rule with the selector field to select the frontend endpoints using the app == frontend label, and several egress rules with the selector field matching app == <service> labels to select the other endpoints in the same namespace that the checkoutservice endpoints need to communicate with. So in this example the policy is a network policy, very similar to the policy we saw for the frontend service. However, if you look at the ingress rule, we are simply permitting the frontend service to talk to the checkoutservice, so there's now a pinhole created in the ingress direction for the checkoutservice endpoints. And if you recall the previous policy, the egress pinhole for the frontend endpoints was created in the frontend policy. So with this policy we now have the end-to-end flow from the frontend to the checkout permitted. The checkoutservice also has egress rules, so pinholes in the egress direction permitting it to communicate with the other endpoints in the same namespace that it should be able to communicate with. Again, the same logic holds: for example, if it's communicating with the payment service, then in the policy that we have for the payment service we've got to make sure that flow is allowed in the inbound direction. So the policies we're developing for the hipstershop namespace are very fine-grained, granular policies. The policies apply to selected endpoints, endpoints representing a particular microservice, and for those endpoints we are applying rules in both the inbound and the outbound directions. Of course, for the checkoutservice we are not permitting traffic from the ingress, because it's the frontend that receives traffic from the ingress; the checkoutservice will receive traffic from the frontend and, in the egress direction, communicate with other microservices or endpoints in the same namespace.

All right, let's move forward. I've not shown the rest of the policies for the hipstershop namespace; those policies continue, but the pattern remains the same. So, assuming that we've completed the policies for the hipstershop namespace, let's now look at the policies for the yaobank namespace. This policy is a network policy which applies to all endpoints in the yaobank namespace. It has an ingress rule which permits traffic from the ingress controller: the namespaceSelector field is used in the rule with the projectcalico.org/name label to select the ingress-nginx namespace, and the selector field is used with multiple labels to select the ingress controller, with traffic permitted to the customer endpoints using a destination selector with the app == customer label. The policy also has an ingress rule which permits traffic from all other endpoints in the same namespace using the selector field with the all() match operator, and similarly an egress rule which permits traffic to all endpoints in the same namespace using the selector field with the all() match operator. So this is an interesting pattern, very similar to the frontend policy we looked at, but with a distinction. This is a network policy that applies to the yaobank namespace, and we've not specified a top-level selector, which means all endpoints in the namespace are matched by this policy. If you look at the first ingress rule, very similar to some of the ingress rules we looked at previously, we are using the namespace selector and the endpoint selector to select the nginx ingress controller; however, this particular rule also has a destination field, and the destination for the ingress rule is the customer endpoints. The second ingress rule means that all endpoints in the yaobank namespace can receive traffic from all other endpoints in the yaobank namespace: given that this is a network policy and we've not specified selectors, all endpoints can receive traffic from all other endpoints. And the egress rule is similar: there are no selectors, which means all endpoints can send traffic to all other endpoints in the yaobank namespace. So this is what the pattern looks like: ingress-nginx can send traffic to the customer endpoints, governed by the first rule in the ingress direction, and the endpoints within the namespace can freely talk to each other, since we've allowed inbound and outbound communications between all endpoints in the same namespace. Now, this is a bit of a coarse-grained policy. At times you may want to implement such policies: if you trust all the workloads inside the namespace, and it's a namespace that is not too critical but has several workloads, you can create a guardrail around the namespace rather than around every group of endpoints inside it. This differs from the policies we created for the frontend, the checkout and the other services in the hipstershop namespace, because those policies were very specific to a group of endpoints within the namespace, and even when they had to communicate with other endpoints in the same namespace, that had to be explicitly permitted using rules or policies.

All right, moving on to the bookinfo policy. It's a very similar pattern to yaobank: again, traffic from the ingress is permitted, this time to the productpage deployment, which is exposed to external consumers, and all communication inside the namespace, inbound and outbound between all the endpoints inside the namespace, is allowed. So a very similar pattern to what we saw in the yaobank policy. It's not important that you understand all the microservices involved in some of these examples; the idea is that you understand the pattern. In the yaobank example it was the customer endpoints that were exposed to external consumers, and in the bookinfo example it's the productpage endpoints, and those are the endpoints that have the pinhole in the ingress direction permitting traffic from the ingress controller.

All right, so we've looked at a few policy patterns. We started off with the deny-list, we had a pattern for kube-dns, we built policies to restrict the tenant 1 and tenant 2 workloads, we looked at some granular policies for the endpoints in the hipstershop namespace, and then looked at some coarse-grained policies for yaobank and bookinfo. Once you're done building policies for a certain set of namespaces, you can then enforce a default deny for those namespaces. So in this case we're first enforcing a default deny for tenant 1, and what this says is: deny all traffic in the ingress and egress directions for the tenant 1 workloads. This policy is a global network policy, and the workloads are identified using the namespaceSelector; in this case we've matched the hipstershop and yaobank namespaces. Similarly, we have a tenant 2 default deny with very similar logic: it's a global network policy, we've matched the namespaces using a namespaceSelector, in this case just the bookinfo namespace, and all ingress and egress traffic is denied. So hopefully now you understand the reason for that default allow. If you look at this policy model, I have policies where we've matched all the endpoints in the cluster, for example kube-dns; however, we've not yet built all the policies for all the endpoints in the cluster, and what this means is that if that default allow wasn't there, I would be denying all the other flows in the cluster. So the idea is that you take a progressive approach when developing your policy model. You may want to restrict certain environments first; in this case, assuming the tenants were actually customers, you'd want to secure those customer environments and isolate them from the rest of the cluster, and that's what we've done. Now you can continue with the rest of the policy development. Hopefully the patterns we've shown — the global network policies and network policies, how environments were isolated from the rest of the cluster, and the coarse-grained and fine-grained policies — help you think about how you should be approaching policies for your environments.

All right, so with the policy model built, you're now in a position to introduce security policy governance. When you're thinking about governance, you've got to think in terms of the policy model rather than individual policies, and leverage security policy ordering, role-based access control and admission control to enforce policy governance. For example, if application developers are authoring policies for a particular namespace, you could assign a certain order range for the policies that they are allowed to author. That ensures they're not able to circumvent the high-level controls you have enforced; for example, controls such as the restrict policies and the deny-list may be under the purview of the security team, and when you're permitting other authors to apply policies, you may want to have some governance around that. Policy ordering is a feature you can use to create that governance structure. Of course, policies are Kubernetes resources and as such are subject to Kubernetes RBAC; again, when allowing other authors to apply policies, you can control which authors can apply policies to which namespaces using RBAC, and then tie all of this together with admission control.

Alrighty, so there's a lot going on in a Calico policy, and I think we've simply scratched the surface here. The best place to understand all the features, capabilities, fields and operators available in a Calico policy is the documentation; I've put the links here to the network policy and the global network policy documentation for Project Calico.

All right, so with that, let's talk a bit about Calico Enterprise and Calico Cloud. Calico OSS is the foundation for Calico Enterprise and Calico Cloud: we looked at the policy features in Calico OSS, and Calico Enterprise and Cloud build on top of this and offer certain advanced security policy features. For example, you could use policy tiering: when we built policies here, the policy model was in a single tier, but with Calico Enterprise and Cloud you have the option of multiple policy tiers. There's a policy UI editor, a policy recommender to suggest policies based on active flows in the cluster, policy dashboards with Prometheus metrics, policy auditing and logging capabilities, an endpoint browser to identify which policies are applied to certain endpoints, a service graph to identify and understand security policy evaluation, flow visualization for troubleshooting, and compliance reporting. So, add-on functionality; however, the functionality provided by Calico OSS remains the foundation for these additional capabilities.

All right, with that I think we're going to wrap this session. Again, thanks for your time; I hope it was beneficial. For more information you can find us on the Project Calico Slack channel, and see the Project Calico documentation for further information on some of the policies and the policy features that we discussed in this session. With that, thanks for watching; I hope you have a good rest of the day. Thank you.