Hello, welcome everyone to another KubeCon session. I hope you enjoyed the event. In this talk, I'll show you the different ways to attack storage volumes that are dynamically provisioned with a CSI driver, and how you, as an SRE, can prevent that. My name is Hendrik. I'm a solution architect at NetApp, and I help our customers with storage and data management solutions for Kubernetes, both in public cloud as well as in their own data centers. One of the questions I get a lot is around CSI, the Container Storage Interface, which is really nice because it orchestrates the creation of volumes, snapshots, and clones. But the question is: how does it protect my data from unauthorized access? In this session, I will show you what is available in Kubernetes, and how to enable and configure it. We will have a live Q&A at the end, so if you have any questions, please let me know. You can also find me on the KubeCon Slack if you'd like to discuss anything.

So just as a very brief recap, how does dynamic storage provisioning with CSI work? For the purpose of this session, I'm going to focus on the pod level. Of course, in a real-world environment, it would typically be a StatefulSet, if it is a stateful workload, or a Deployment or ReplicaSet. I'm leaving these details out as they're not relevant for this topic. I'm also focusing on dynamic storage provisioning with the Container Storage Interface.

So with CSI, along with the pod, the user requests another Kubernetes resource, the persistent volume claim, or short, PVC. That volume claim references a storage class. The storage class allows us to offer multiple service levels for storage in our cluster, so the user can decide what type of storage service they need by using a specific storage class for the volume claim. And this is where the Container Storage Interface comes into play: the storage class specifies a CSI provisioner, and this is how you can integrate different storage solutions into Kubernetes.
Each solution has its own provisioner that takes care of everything that's required for that specific solution. But for the user or application, all of these complexities are abstracted away. You just create a PVC that references the storage class; everything else happens automatically by whatever CSI provisioner is running in your cluster. The provisioner creates an actual storage volume. Depending on the solution you're using, that can be a software-defined storage running inside your Kubernetes cluster, or an external storage array or storage server. The volume from your storage solution is then represented inside Kubernetes as a persistent volume, or short, PV. And that PV is bound to the volume claim and finally made available as a file system inside your container. So much for the basics.

So let's think about how an attacker might try to get access to our data. The first one is easy. Assume the attacker can run another pod in the cluster. Now, an obvious target are our Kubernetes resources, the PV and the PVC. If someone could get access to these, they would be able to mount the volume into their own pod and access the data. Fortunately, the answer to this problem is easy. What do we use to separate resources within our cluster? Correct, namespaces.

The interesting point when it comes to storage is that only the PVC is namespaced. The underlying PV, as well as the storage class, are global resources in the cluster; they are not within a specific namespace. That is okay for the storage class, as it's only used during the dynamic provisioning of a PV but doesn't give any access to the volume itself. We're going to look into limiting access to a storage class later on. For now, just remember that there can be one and only one default storage class. Whenever a user doesn't specify a storage class as part of the PVC, the default class will be used. And most users don't specify a class, so the default class will typically be used a lot.
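To make the user's side of this concrete, here is a minimal sketch of such a PVC. All names and sizes are illustrative, not from the demo:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data        # illustrative name
  namespace: my-app    # PVCs are namespaced
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gold   # omit this field to fall back to the default storage class
```

This is all the user creates; the CSI provisioner behind the referenced storage class takes care of the rest.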
And that means you should be extra careful when deciding what the default is. The PV, though, that looks like it might be a problem, right? It's not part of the namespace, but it directly gives access to the underlying storage volume. The good news is that while it's not part of the namespace, Kubernetes still sort of attaches it to the namespace once it's bound to a volume claim. And since we are dynamically provisioning with CSI, the binding between PVC and PV happens immediately. It might look like the PV is unprotected because it's outside the namespace, but Kubernetes ensures that no one else can access it. Let's take a look at how that works.

I have a Kubernetes cluster with three storage classes: silver, gold, and basic. I'm using the NetApp Trident provisioner simply because I'm most familiar with that one, but all of this would work the same with any other CSI provisioner. So let's create a volume and a pod that uses that volume. I have a PVC resource, just a few gigabytes, that uses the basic storage class. And I have a very simple Alpine pod that mounts this volume as /data, and otherwise really doesn't do anything useful. Let's go ahead and create the PVC and the pod in my tatooine namespace. That only takes a few seconds. We can now check the volume claim, and we see that it is bound already. So all provisioning is completed and the volume is ready to use. We can also check the underlying PV, and that is bound as well. We also see that the PV has a reclaim policy of Retain, so the volume will continue to exist even if the claim is deleted. This is especially useful if you want to delete the PVC but keep the data for later reuse or restore requirements.

Now let's take a closer look at the PV by getting the full YAML output. The magic that protects the PV from being accessed by anyone outside my namespace is in the claimRef. Kubernetes adds this to the PV automatically as soon as it is bound to a claim.
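To give a rough idea of what that looks like, here is an abridged sketch of a bound PV. The PV name, PVC name, and UID are invented for illustration:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0a1b2c3d-1111-2222-3333-444455556666
spec:
  capacity:
    storage: 10Gi
  persistentVolumeReclaimPolicy: Retain
  claimRef:                 # added by Kubernetes when the PV is bound
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: my-data
    namespace: tatooine
    uid: 0a1b2c3d-1111-2222-3333-444455556666
```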
And this contains a reference to the PVC: its namespace, as well as the specific UID of the claim that it is bound to. As long as this claimRef is in place, Kubernetes will not allow any other PVC or namespace to claim this volume. What happens if I now delete the PVC? The PV is still there, because the reclaim policy is set to Retain, but the status changes from Bound to Released. Even in the Released state, no one can reuse the volume or access the data, because the claimRef still exists. I can now patch the PV and remove the UID reference of the old claim. And when I do that, the status of the PV changes to Available. So now I can reuse the PV and bind it to another volume claim. But the claimRef still contains the namespace and PVC name; I only deleted the UID. So the PV is still protected and cannot be accessed from any other namespace. If I want to make it available to another namespace, I would need to delete that part of the claimRef as well. Since the PV is a global resource in Kubernetes, no one but a cluster admin can make that change. The inherent security model of Kubernetes protects the PV even though it's not part of the namespace, and it also protects it beyond the lifecycle of the volume claim.

So protection mechanism number one is kind of obvious: make use of namespaces, and make sure that no one has access rights to the persistent volume resource. So if PVC and PV are protected, what else might an attacker do? Well, bypass the whole mechanism of PVC and PV and directly access the underlying storage volume. This is often connected via standard protocols such as NFS or iSCSI. And Kubernetes has built-in functionality to do inline NFS or iSCSI access in a pod without using a PVC or PV. Fortunately, there are ways to prevent that, in particular pod security policies and role-based access control. Let's take a look at how that works. We have our pod that has the volume attached.
We can take a closer look at that if we exec into the pod and check what is mounted. And from there, we see that /data is backed by the NFS mount, simply because my storage class uses NFS storage. If this were iSCSI or anything else, we would see some more details about that, such as the device path on the Kubernetes host. The actual mount is performed by the Kubernetes node, not by the pod, but we see the mount from within the pod as well.

Let's assume that someone found out about this NFS mount. It contains the UID of the PV, so nothing you could easily guess. But there are tools such as showmount to discover all exports on an NFS server, or iSCSI discovery to find all iSCSI targets. With that knowledge, an attacker could simply do an inline NFS mount in the pod. And that would look like this: we have a simple pod definition, we specify a volume, and we tell Kubernetes that this should be accessed via NFS at the same server and export path that our PV is using. So we bypass the whole PVC and PV mechanism and do an inline storage access. And Kubernetes will happily do the NFS mount for us and make the data available to the pod. I'm using NFS here in the example, but this would work the same with other protocols as well.

So how do we prevent someone from doing this? How do we control what a pod can do? Pod security policies. Let's take a look at an example. Besides the regular security policy settings to prevent pods from running as root, we can also specify what volume types a pod can use. There are a couple that we should include here, such as config maps and secrets, because we want and need these, and in Kubernetes they work like a volume as well. And we want to have persistent volume claims so users can dynamically provision volumes. What we don't want is volume types such as NFS or iSCSI. So make sure these are not in your pod security policy.
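As a sketch of both sides of this, here is a pod doing an inline NFS mount of the kind just described, and a pod security policy whose volumes list allows PVCs but not raw nfs or iscsi volumes. The server address, export path, and all names are invented for illustration:

```yaml
# Attack sketch: inline NFS mount, bypassing the PVC/PV mechanism entirely
apiVersion: v1
kind: Pod
metadata:
  name: sneaky-pod
spec:
  containers:
    - name: shell
      image: alpine
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: stolen-data
          mountPath: /data
  volumes:
    - name: stolen-data
      nfs:
        server: 192.0.2.10            # invented NFS server address
        path: /exports/pvc-0a1b2c3d   # invented export path
---
# Defense sketch: a PSP that omits nfs/iscsi from the allowed volume types
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-volumes
spec:
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:                  # note: no nfs, no iscsi, no hostPath
    - configMap
    - secret
    - emptyDir
    - downwardAPI
    - projected
    - persistentVolumeClaim
```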
If I now switch to a user that is limited by a pod security policy and then try to create the pod with the inline NFS mount, Kubernetes will prevent me from doing so. The policy provides protection against that attack.

But there's one more thing our attacker could do. Instead of doing the NFS mount inline in the pod, they could create a static persistent volume that does the NFS access. And that would look like this: we simply define a persistent volume and tell Kubernetes to use the NFS export as the backing storage for that PV. The attacker could then simply use a PVC that claims this specific, manually created PV. We allow PVCs for the pod, because this is what we want to use, so we cannot remove the volume claim from the list of allowed volume types. But we also cannot prevent a PVC from claiming a specific PV that already exists in the environment. Pod security policies do not help with this. What we can do is use role-based access control and disallow the creation of PVs. This does not prevent dynamic provisioning, as the PV is created by the CSI driver, and that has permissions for it. But it prevents an attacker from manually creating a PV that accesses storage it shouldn't have access to. So these attacks are prevented by enabling a pod security policy that disallows direct storage access, and by configuring role-based access control so that users cannot manually create a PV.

We talked about preventing access to the PVC, the PV, and the underlying storage volume. But what about a valid application that should access the volume? How do we ensure that the file system permissions are correctly set for our application? Kubernetes can take care of that for us; we just need to enable it in the security context of our pod. Let's take a look. Within the pod, we can set the security context to control the user that our app is running with, as well as the group. In this case here, I set the user ID to 100,000 and the group ID to 200,000.
In addition to that, I can set the fsGroup parameter, and this specifies the group permissions on the volume. So when the container starts, Kubernetes actually sets the group of all files and folders in our PVC to this value. And the container is started with this group, in my example 300,000, as a supplemental group, so the application can access the data on the volume.

There are two caveats with this. First, even if you specify the fsGroup in the security context, there are some cases in which Kubernetes does not apply these permissions, simply because it's not possible, as the underlying storage does not support that, or in cases where applying those permissions might break things. For example, if you have a read-write-many volume with multiple pods that access the same data, you cannot have each pod apply different permissions to the volume; one pod would effectively lock out the other pod. Also, if the fsType is not specified in the storage class, Kubernetes will not apply those permissions, assuming that it is a file system that does not support this. So in those cases, Kubernetes does not apply the fsGroup permissions, even if you set them in the security context. There's a recent addition in the CSI spec that allows drivers to specify whether they support fsGroup changes or not. That will hopefully make this a little bit more transparent in the future.

The second caveat with fsGroup is performance. By default, Kubernetes will do a recursive modification across the entire file system every time the PVC is mounted. Usually you would only need that operation the first time the PVC gets mounted, but Kubernetes doesn't have any way to figure out if this is the first time the volume gets used or not. So it goes through all files and directories every single time. And if you have a large file system with lots of small files, that can take many minutes, and your pod won't come up until this is done.
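Putting the security context settings just described together, a sketch might look like this. The IDs match the example above; pod, image, and PVC names are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  securityContext:
    runAsUser: 100000    # user the container processes run as
    runAsGroup: 200000   # primary group of the processes
    fsGroup: 300000      # group applied to files on the volume,
                         # also added as a supplemental group
  containers:
    - name: app
      image: alpine
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-data   # illustrative PVC name
```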
So every time your pod is rescheduled to another node, the volume has to be mounted again and this process is triggered. In order to address this, Kubernetes 1.20 introduced a new option in the security context: you can now set the fsGroupChangePolicy to OnRootMismatch. Kubernetes will then check the permissions of the volume root. If they already match the requested fsGroup, Kubernetes assumes that the rest of the file system is okay as well and skips the recursive modification of permissions. So unless you or your application mess around with the permissions, this option enables Kubernetes to only apply permissions once and avoid the performance penalty on every mount. Let's see this in action. We create the PVC and the pod that has the fsGroup set to 300,000. If we now exec into the pod and check the permissions of our /data mount, we see that they are correctly applied.

So we looked into securing the PVC and PV, preventing inline mounts of the storage volume, and controlling file system permissions. But how do we ensure that the users on our platform behave? How do we control the amount of capacity consumed by a namespace? Well, the same way we control usage of CPU and memory: resource quotas and limit ranges. Let's explore that. This is an example of a resource quota that controls storage consumption. We can limit the number of PVCs as well as set a limit for the total capacity of all PVCs. We can also break those limits down by storage class. So in this example, the user can consume a total of 200 gigabytes from storage class basic, with a maximum of five PVCs, and in addition to that, 600 gigabytes from storage class gold, with a maximum of 10 PVCs. Let's see if that works. We have our example pod, and now I'm requesting a 500 gigabyte volume from storage class basic. Remember, the total limit for all volumes in that class was 200 gigabytes, so I shouldn't be able to get a single 500 gigabyte volume. And I can't.
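A sketch of such a resource quota, using the gold class from the example and a second class I'll just call basic here (both names are from my demo; substitute your own):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: tatooine
spec:
  hard:
    # Per-storage-class limits: total requested capacity and PVC count
    basic.storageclass.storage.k8s.io/requests.storage: 200Gi
    basic.storageclass.storage.k8s.io/persistentvolumeclaims: "5"
    gold.storageclass.storage.k8s.io/requests.storage: 600Gi
    gold.storageclass.storage.k8s.io/persistentvolumeclaims: "10"
```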
Now, this sets upper limits for the whole namespace. In addition to that, we can also apply limits for each individual volume. We do that with a limit range. For example, we could specify that each PVC has to have a minimum size of one gigabyte and a maximum size of 10 gigabytes. Now, if I modify my pod to request a 20 gigabyte volume, the limit range will prevent me from doing so.

Last but not least, there are a few things outside of Kubernetes itself that you can do to improve security. First, on the protocol side. Many CSI drivers use standard protocols such as NFS or iSCSI in the background. We won't have time today to go over all of the best practices for these protocols, but here are a few things you should consider. For NFS, the first question is about protocol version. NFS version 3 has been around for a very long time; the code for the client is very robust and optimized. The NFSv4 standard also has been published for quite a while now, but adoption was slow. In general, security was much more of a design focus for v4, so it should be considered the more secure version. However, a lot of that is based on using Kerberos authentication, something you cannot easily implement in your Kubernetes cluster, as there's no way to distribute the required keytab across all the nodes. So you won't really benefit from Kerberos, but if possible, I'd still say go for NFS version 4. Most Linux systems are configured to try version 4 first and then fall back to version 3 if necessary. But you can also enforce a specific version with mount options specified in your storage class.

As all mounting of volumes is handled by your CSI driver, there's no need for the showmount utility that reveals the list of all exports on the NFS server. As we saw earlier, the mount path can be used to create an inline mount in a pod. So not revealing that mount path in the first place is a good additional step you can take.
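Two of those points as sketches: a limit range bounding individual PVC sizes, and a storage class that pins the NFS version via mount options. The class name is illustrative, and I'm assuming NetApp Trident's CSI provisioner here as the example:

```yaml
# Per-PVC size bounds for the namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: pvc-size-limits
  namespace: tatooine
spec:
  limits:
    - type: PersistentVolumeClaim
      min:
        storage: 1Gi
      max:
        storage: 10Gi
---
# Enforcing NFSv4.1 through storage class mount options
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-v4-class                 # illustrative name
provisioner: csi.trident.netapp.io   # assumption: NetApp Trident's provisioner
mountOptions:
  - nfsvers=4.1
```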
So if your NFS server allows you to disable showmount functionality, then make use of that. Root squash in NFS allows you to map any file activity of the root user to someone else, typically the user nobody. As your containers shouldn't run as root anyway, enabling root squash does not impact your applications, but it makes sure that nobody can access files with root permissions. This is already enabled by default on most NFS servers. Most servers also allow you to configure export policies that limit access to an NFS export to specific IP addresses or host names. It acts a little bit like a firewall, just for NFS. As the nodes in the Kubernetes cluster might change over time, your CSI driver should have the ability to dynamically update these export policies.

For iSCSI as a protocol, two things are relevant. First, iSCSI allows you to configure authentication between client and server. This is based on CHAP. That might not be the most secure authentication mechanism in the world, but it definitely is better than no authentication at all. If you have a good CSI driver, it should automatically handle the CHAP setup on the nodes in your cluster. And then, similar to the export policies for NFS servers, many iSCSI servers support initiator groups as a way to limit storage access to specific nodes. In the case of iSCSI, that is usually not based on IP addresses, but on the IQN identifier of the client. And again, if you have a good CSI driver, it should manage this automatically for you. So whenever a new node is added to your cluster, the CSI driver automatically adds the IQN of the node to the list on the storage side.

And beyond that, please check the documentation of your CSI driver. Many drivers support additional features that are vendor-specific and allow you to control encryption of data in flight and at rest, configure quality of service to guarantee certain performance levels, but also to limit storage consumption based on the number of IOPS or bandwidth.
And best practices around securing the driver itself and the communication between driver and backend system. Let's do a quick recap. Make use of namespaces, and the inherent security model of Kubernetes will protect the PVC in the namespace as well as the PV that lives outside of the namespace. Configure a pod security policy that prevents inline NFS and iSCSI access. Enable role-based access control so users cannot manually create a PV; everything should be based on dynamic provisioning. Set the security context of your pod to control the file system permissions of the volume. Make use of resource quotas and limit ranges to control storage consumption. If you use storage protocols such as NFS and iSCSI, configure them correctly. And check what additional features your CSI driver can offer. I hope you enjoyed this session. If you have any questions or comments, please let me know. Thank you.