Okay. Hi, everyone. Thank you for coming. My name is Ashna, and I'm a software engineer at Microsoft. Hi, I'm Peter. I'm also a software engineer at Microsoft. Thank you all for coming today. We're really excited to talk about Eraser. Ashna and I and a whole heap of others have been working on it for some time now, and we're excited to share it with you. Yeah. So Eraser, briefly, is a tool to clean up vulnerable images from Kubernetes nodes. And before we get into the technical details of Eraser, we want to give a little background on why we created it. So the first reason is to eliminate the risk of spinning up vulnerable images. As you can see, software supply chain attacks jumped over 300% in 2021. And by leaving lingering images on nodes, we're giving attackers the opportunity to spin up vulnerable containers. The second reason is to eliminate alerts for non-compliant images. A lot of teams and organizations are overwhelmed with the amount of alerts they receive, and there's no easy and effortless solution to this problem. So a common approach developers will take is they'll create a cron job and they'll use containerd to target the non-running images in the cluster. But this is a manual approach, so it takes up a lot of time, and it can be more error prone. And a common follow-up is usually: what about the current Kubernetes garbage collection? Unfortunately, it isn't as flexible, since it's triggered by disk usage thresholds. So once the disk usage reaches 85%, it'll start cleaning up the unused images until usage drops back to 80%. And while you can customize these values, you don't have any control over when it happens or what images it's targeting. And finally, these solutions don't have any customization features. So as you can see, Eraser uses a config map, and you can plug in different components like the scanner or the repeat period. You can also have control over the cleanup and more. So as a solution, we created Eraser.
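For reference, the built-in garbage collection thresholds mentioned above are kubelet configuration fields; the two settings below are real kubelet options, with the defaults the talk describes:

```yaml
# KubeletConfiguration excerpt: image garbage collection starts when disk
# usage reaches 85% and deletes unused images until usage drops below 80%.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
imageGCHighThresholdPercent: 85
imageGCLowThresholdPercent: 80
```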
Eraser is a CNCF sandbox project, and what distinguishes Eraser is the control it provides the developer, so you decide what gets removed and when. Now I'll pass it on to Peter to go over the architecture. Yeah, thanks, Ashna. As the slide says, this is architecture, and we have this architecture diagram. It's a little bit dense with a lot of arrows and words. So we are going to break it down a little bit and talk about it piece by piece. Before I get into the questions that Eraser answers and the actions that it takes, I just kind of wanted to talk a little bit on a higher level about how it works. So there are two basic modes that it can operate in. In the first mode, you can remove images essentially on demand. You provide a list of images that you'd like to remove, and Eraser will just act on your behalf and remove those from the nodes. That mode is always active, so whenever you provide an image list, which is a custom resource, the controller will pick it up and act on your behalf. You can also provide a wildcard if you want, or a partial wildcard. For example, to remove all images, you just provide a star. And then the other mode, and this is enabled by default, is that it will run on a schedule. By default, it's 24 hours, but you can schedule it to run as often or as rarely as you need. And what's going to happen is, whether you trigger it manually or whether it's running on a schedule, there's going to be one pod deployed for each node. You can exclude nodes if you like by adding a label to them. And Windows nodes are currently skipped by default. But one pod per one node, and it will collect information about the node and answer these three questions. The first question, very simply, is: what images are present on the node? Kind of self-explanatory. The second question is: of those images, which are not tied to a currently running container?
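As a sketch of what the scheduled mode's configuration might look like, here is an Eraser manager config fragment; the key names here are an assumption from memory of the Eraser docs, so verify them against the version you install:

```yaml
# Illustrative Eraser manager configuration (Helm values / config map).
# NOTE: the key names below are an assumption; check the Eraser docs
# for the version you deploy.
manager:
  scheduling:
    repeatInterval: 24h   # run the cleanup every 24 hours (the default)
```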
It's an important question to answer because pods are supposed to be self-healing: if your pod crashes and the image has been removed from every node in the cluster, there's going to be latency while the pod attempts to spin back up and the image has to be re-pulled. You don't want your garbage collector to destroy your cluster. And just as an aside, I asked DALL·E to visualize what it would be like if your garbage collector destroyed your cluster. And it gets better every time I look at it, because I guess that says Kubernetes, kind of. And there's all our pods in a trash can, or are they nodes? Nobody knows, because everything in Unix is a file, everything in Kubernetes is a box, and they're all connected by ribbons, just like in the diagrams. Anyway, the third and potentially the most important question is: of the images that aren't tied to running containers, which ones contain vulnerabilities or a known CVE? And so the process of determining what to remove consists of listing the images, removing the ones that are tied to a currently running container, and then filtering out the images that are not vulnerable. What remains is the final list of images that we will delete. So now we can start to look at the architecture diagram and piece things together. The goal here is to give you enough information that you can extend Eraser, because it is extensible. We have a pluggable model that allows you to configure its behavior down to pretty fine-grained details. So the important part here is the cube in the middle, the pod. The pod consists of three different containers, and they kind of act as a pipeline. So the collector container's output goes into the scanner. The scanner's output goes into the remover. They share a small slice of the file system and communicate using named pipes just to get the information from one place to another. And they run in sequence. And they don't necessarily correspond. The three containers don't correspond one-to-one to the three questions.
The first two questions are answered by the collector. And it collects information from the node using the container runtime interface. Most of you probably know what that is, but if you don't, the container runtime interface is the common interface that all of the runtimes provide. So it doesn't matter if you're using containerd, CRI-O, even dockershim, they all provide the CRI as a gRPC service. And you can do things like list images and list containers, so you can see how you can filter down that list. That list is then fed to the scanner. By default, the scanner that ships with Eraser is Trivy. Again, most people here probably know what Trivy is, but if you don't, it's an open-source vulnerability scanner. It will analyze the file system layers of your image and compare it to a database of known CVEs. It will look at various bits of OS metadata. It'll tell you how severe the vulnerability is. It'll tell you whether the image is end-of-life and no longer supported upstream. It'll tell you if there's a version of the package in, say, the Debian repository for a Debian image that fixes that vulnerability, and if so, which version it is. And you can configure Eraser, with its stock Trivy image, using the config map to act differently based on those criteria. But even more cool, I think, is that you can write your own scanner. You don't even need to use vulnerability as the criteria if you don't want to. Essentially, at the end of the day, you're just deciding what to remove. And Ashna will go through it at the end, but we have a template set up on GitHub that handles the communication between the container that comes before and the container that comes after. And all you have to do is provide the custom logic and return a list of images from a function, based on whatever criteria you choose.
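The collector's filtering step described above can be sketched like this; the real collector calls ListImages and ListContainers over the CRI gRPC socket, while here plain slices stand in for those calls so the logic is runnable on its own:

```go
// Simplified sketch of the collector's filtering step. In the real
// component, "images" and "runningContainerImages" come from the CRI's
// ListImages and ListContainers gRPC calls; here they are in-memory
// stand-ins so this runs anywhere.
package main

import "fmt"

// nonRunningImages returns the images on a node that no running
// container currently references.
func nonRunningImages(images, runningContainerImages []string) []string {
	inUse := make(map[string]bool)
	for _, img := range runningContainerImages {
		inUse[img] = true
	}
	var candidates []string
	for _, img := range images {
		if !inUse[img] {
			candidates = append(candidates, img)
		}
	}
	return candidates
}

func main() {
	images := []string{"nginx:1.25", "alpine:3.7", "kube-proxy:v1.27"}
	running := []string{"kube-proxy:v1.27"}
	fmt.Println(nonRunningImages(images, running))
}
```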
So then that final list is fed to the remover container, which also uses the container runtime interface to call delete images on each of the images in that final list. And then your cluster is clean. And Ashna is going to walk through our first demo. Okay, so our first demo is to use Eraser to clean up images periodically. So by default, this happens every 24 hours, but you can change that. So you deploy Eraser once and it'll handle the scheduling to clean up the images every 24 hours. So for this demo, we're going to start by creating a kind cluster. And we can call it eraser-demo. And for the sake of the demo, it's going to have one node. And I can skip this. And then we're going to apply a DaemonSet with a vulnerable alpine image. So we're going to use this to load the image into our cluster. And then we can check that the pods are running. And now we're going to delete the DaemonSet so that we have the alpine image present as a vulnerable and non-running image in our cluster. And we can check the pods again. And then we can exec into our node to make sure the alpine image is present there. And then we're ready to install Eraser. So we're going to use Helm to install Eraser, and we're installing it in the eraser-system namespace. And once we install Eraser, it'll schedule the controller manager, which will then schedule the Eraser pod. So if we skip ahead, we can see when we get the pods, there is the controller manager pod, and then the one Eraser pod, since we have one node in our cluster. And we can also see the three containers that Peter mentioned before have completed. There was the collector container to get the list of non-running images, the scanner container to scan the images using Trivy, and then the remover container to remove the images that were found vulnerable. And they ran one after the other and communicated with named pipes. And since this pod has completed, the controller manager will go in and remove that pod.
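The vulnerable-image DaemonSet used in a demo like this might look roughly as follows; the image tag is illustrative, and any image with known CVEs would do:

```yaml
# Demo-style DaemonSet that pulls an old alpine image onto every node.
# NOTE: the tag below is illustrative, not necessarily the one in the talk.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: alpine
spec:
  selector:
    matchLabels:
      app: alpine
  template:
    metadata:
      labels:
        app: alpine
    spec:
      containers:
        - name: alpine
          image: alpine:3.7.3
          command: ["sleep", "infinity"]
```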
So we're left with just the controller manager now. And if we exec into the node again... I'll just skip back to the end. If we exec into the node again, there's no return value for the alpine image. So we can see Eraser was able to successfully target that image and remove it from the cluster. And so that's Eraser running kind of in its scheduled mode. And I'm going to demonstrate how you can use an image list to skip the scanner and collector entirely and just... There we go. And just remove the images on demand. So similarly, I'm creating a kind cluster here. I'll skip that. There are three nodes in this cluster. And we're just going to look at the images that are on one of the nodes. And it's a whole bunch of stuff. For example, there are Kubernetes images, kube-proxy, the API server, all that stuff. There's nothing... No NGINX there. So I'll skip ahead. We're applying this DaemonSet here. One thing I forgot to mention earlier is people have asked us why we don't use a DaemonSet to run Eraser, since it would be a lot simpler to not have to manage the pods ourselves. The reason for that is that a DaemonSet doesn't run to completion. If the process exits, the DaemonSet will attempt to restart it, and exiting is considered an error condition. The other reason is... Well, the reason that we don't want to have a long-running process is that in order to communicate with the runtime, we need access to the socket, which requires root access. We don't want to have a long-running root process in our cluster. So I've applied... Coming back to the demo, I've applied an NGINX DaemonSet just to get NGINX on every node so that we can look at any node and see that it's there. So if we exec into the node, these images are one and the same. One is just by tag, the other by digest; they have the same digest. You add the Eraser repo so that you can get the Helm chart. The installation here is a little bit different because I don't want the scheduler to immediately kick off a job.
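The Helm values that disable the collector and scanner for this manual mode might look like the fragment below; the exact key paths are an assumption (they mirror the config map one-to-one), so check them against the chart you install:

```yaml
# Illustrative Helm values for "manual" mode: no collecting, no scanning,
# only removal of whatever the ImageList names.
# NOTE: the key paths below are an assumption; verify against the Eraser
# chart documentation for your version.
runtimeConfig:
  components:
    collector:
      enabled: false
    scanner:
      enabled: false
```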
I'm going to set some Helm values to disable the collector and to disable the scanner. So it won't scan at all. It's just going to remove what we tell it to. So I just set these values to false. You can also do that in the config map. They are 100% mapped to each other. So we have the controller manager pod, which will pick up the image list that we will eventually create. We delete the DaemonSet, and you can see that those images are cached on the node. They will just sit there taking up space unless you intervene and remove them. So I'm going to zoom out just for this part of the demo so you can see the cause and effect here. It might be a little hard to see, but on the top we're watching what's happening in the eraser-system namespace. On the bottom we are creating an image list. The API version is eraser.sh/v1. The kind is ImageList. Kind of importantly, for now we can only have one ImageList in the cluster at a time, and the name has to be imagelist. And then the real meat of this is the list of images. And so we're just going to give it that NGINX image and check whether or not it gets deleted. So this is about to get applied, so I will move up here so we can see what happens when it gets applied. One pod for one node. They all run. Let's do this quickly. And now we can check the node again. Okay, it's gone. There's nothing called NGINX. Let's just double check to see whether or not that digest starting with add479 is there. No, that's gone as well. The image has been deleted from our cluster entirely. So that's what Eraser can do for you. And I'd just like to take a minute to talk about our future work. Hopefully, what we'd like to see is people participating, with a lot of the future work being scanners that people write for themselves, issues that people file, and feature requests that people bring to us or want to contribute themselves. That's our ideal future work.
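The ImageList resource dictated in the demo comes out to roughly this manifest:

```yaml
# The ImageList custom resource from the demo. Only one ImageList may exist
# in the cluster at a time, and it must be named "imagelist".
apiVersion: eraser.sh/v1
kind: ImageList
metadata:
  name: imagelist
spec:
  images:
    - nginx   # a wildcard "*" would instead match every non-running image
```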
But things that we do have planned right now are, possibly, surfacing vulnerable images: giving some kind of report of what's found on the cluster and what failed the vulnerability scan. Staggering the load, so you don't overwhelm the cluster by scheduling all the pods on all the nodes all at once; instead you have the jobs run in waves. It's a little more complicated to implement. Also, the container runtime interface has the concept of pinned images: the core Kubernetes images, for example, will be pinned so they don't get removed by the garbage collector. We don't currently respect that, and we need to do that in the future. And then we want people to be able to have a custom location for the runtime socket. Currently we expect it to be in its default location. So we really want to encourage everyone to get involved. We've linked our contributing guide as well as our Slack channel and GitHub repo, where you can find us and request any features or bring up any issues that you might see. And we've also linked the scanner template repo if you want to try Eraser with your own scanner. And since we have time, I can show you that really quickly. So this is the link in the previous slide. And we have an example to show you where you can implement your own scanner. So all you have to do is put your implementation in the scan function, and we've handled the communication with our collector container to get the list of images and the remover container that will remove the result of your scan function. And you don't have to use this for vulnerable images. Like Peter said, you can have Eraser flag any type of image by putting the functionality in this scan function. And then once you have your code there, you just compile this and put it in an image. And then in the config map or in the Helm chart you provide that image as the scanner image. And finally, we would of course like to say a huge thanks to all the Eraser devs and the CNCF. This is my first KubeCon. It's Ashna's first KubeCon.
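The scan-function contract described above can be pictured roughly like this; the function name, types, and policy below are illustrative stand-ins, not Eraser's actual template API (the real template on GitHub handles the named-pipe plumbing to the collector and remover for you):

```go
// Hypothetical sketch of a custom scanner. In the real template, the list
// of candidate images arrives from the collector over a named pipe and the
// returned list is handed to the remover; here plain slices stand in for
// that plumbing, and the names below are illustrative.
package main

import (
	"fmt"
	"strings"
)

// scan decides which candidate images should be removed. This toy policy
// flags anything tagged "latest"; your logic could instead invoke a
// vulnerability scanner, check an allowlist, or apply any criteria at all.
func scan(candidates []string) []string {
	var remove []string
	for _, img := range candidates {
		if strings.HasSuffix(img, ":latest") {
			remove = append(remove, img)
		}
	}
	return remove
}

func main() {
	fmt.Println(scan([]string{"nginx:latest", "alpine:3.18"}))
}
```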
We couldn't feel more welcome. I'd also like to thank my friend Josh who knew I'd be a little nervous speaking and showed up in the front row dressed as a hot dog. Thank you, Josh. Thanks everyone. And you can find us at the end if you have any questions. Yeah, come find us if you have questions. Happy to talk about it. Thanks guys.