Hi everyone. My name is Adam Wallace. I'm a Senior Security Software Engineer at NVIDIA, where I've been for three years, with a focus on DevOps and application development. I'd like to talk today about how NVIDIA is working to secure containers, not only within our own infrastructure, but also the ones we deliver to our customers. I'm going to talk a little about NVIDIA's background and history, and then about our NGC container catalog. I'm also going to talk about our approach to security and some of the practical ways we're securing those containers. Then I'll wrap up at the end with bring your own policy.

So, many know us for the graphics card. We've been synonymous with graphics cards pretty much since their invention, and they've been a really important core business for decades, with a focus on gaming. But there's been a shift over the last decade or so. The importance of the GPU has really changed as massive pushes into machine learning, deep learning, and artificial intelligence put these incredibly complicated systems directly in front of consumers. These pushes change the way we look at the graphics card and how we support it: there's now a full software stack required to support this new development, and that changes the way we have to look at securing that stack, not only internally, but for our customers.

NGC is an exciting service that we offer: a catalog of toolkits that enable and accelerate machine learning, deep learning, and artificial intelligence. The catalog offers a few key features. It stores Docker images, so it is a full Docker registry. It stores Helm charts for your different Kubernetes workloads. It also stores artificial intelligence models, which really helps you get off the ground quickly if you're new to AI, with models that are proven to work. This is all done with the goal of simplifying the effort required to enable GPU-optimized workloads. Additionally, a private registry is offered so that enterprises or teams can collaborate in a private space and store the Docker images, Helm charts, et cetera, that they're not yet ready to share or don't intend to share. We use Anchore Enterprise as part of NGC to scan for security issues and let our customers know what vulnerabilities might apply to the images they're deploying. You can see a link in the bottom right for more information on NGC.

So often security gets a bad rap. What happens is, three days before a release, a developer comes to us having realized they have a bunch of security problems they didn't address or weren't even aware of, and they want us to drop everything and help them right away. Hopefully this doesn't actually come out of security's mouth, but the snarky response would be something like: sure, I'm not doing anything else, I'll just drop everything and help you. And by the way, you're probably going to have a bunch of issues you won't have time to fix, so you'll have to decide either to ship with them and acknowledge the risk, or to delay your release, which no developer wants to hear, and no program manager wants to hear.
So this presentation is really about moving away from the end of the process and shifting left in how you think about your development life cycle. One view of it: as development kicks off, think of rapid prototyping, writing software and probably not even really testing it. Things start off hot and heavy as you move down this hill we have here. Then you realize you should be unit testing, so you bolt that on after the development and start improving your code coverage a little. You move on to integration testing to make sure the different components really interact together as expected, and you keep moving down the hill. This is where you might hit your first big hiccup: depending on your teams' review styles, you might get a lot of feedback based on opinion, maybe coding style, or maybe true problems or corner cases are found that weren't anticipated, and you have to go back and readdress all three phases we just talked about. Finally, you make it through that phase and get to the point where you're ready to merge, and then you think about security. As a security team, we don't want to be the police; we want to be a partner. We want to provide guardrails, not gates. So how can we help these teams? Are we just plugging in after the merge point, looking for security problems? The answer is probably no; it's a little late in the game to do that. So, as you can imagine, we keep moving up the hill, looking for the best places to plug in, and the best place to start is at the beginning.

Some of the practical ways we can hook in are described in this graphic. We'll start with the first bucket, your code. Open source screening, as described down there, might be something like static analysis checkers. These run basic security checks to make sure you're writing secure code. They aren't going to catch everything, but they're a really good thing to automate. This also means reviewing your open source dependencies, looking for known problems and vulnerabilities directly in the source code repository. This can even be done in the developer's workspace, potentially right in the code editor: think of a VS Code plugin that offers static analysis or security checks. If developers can be alerted before they ever commit the code, they get the opportunity to address issues right up front.

Moving on to the build phase. This is running scans and checks as part of every merge request, or even better, every commit. It isn't just security; it can be code formatting, linting, and type checking, for example mypy in Python. In addition to those security scans, this is probably your best time to evaluate your dependencies, and not just your direct dependencies. In Python, if you ask for the requests library to be part of your package, you're also indirectly, or transitively, requesting urllib3 and a bunch of other Python packages. So this means evaluating your direct and your transitive dependencies, and committing those lock files directly into your source code repo so that you understand the artifacts you intend to ship. These dependency files might not be published in your final artifact, and I would even argue they probably shouldn't be.
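To make the direct-versus-transitive distinction concrete, here's a minimal sketch that walks the dependency closure of an installed Python package using only the standard library. It's illustrative, not a replacement for a real resolver or a lock file tool like pip-tools.

```python
# A minimal sketch: enumerate the direct and transitive dependencies of an
# installed package using only the standard library (importlib.metadata).
# Run it in an environment where the package is actually installed.
from importlib.metadata import requires, PackageNotFoundError
import re

def transitive_deps(package: str, seen: set[str] | None = None) -> set[str]:
    """Recursively collect the dependency closure of an installed package."""
    seen = set() if seen is None else seen
    try:
        reqs = requires(package) or []
    except PackageNotFoundError:
        return seen  # not installed (e.g. an optional extra); nothing to walk
    for req in reqs:
        # Strip version specifiers, extras, and environment markers, e.g.
        # 'urllib3 (<3,>=1.21.1)' -> 'urllib3'
        name = re.split(r"[ ;(<>=!~\[]", req, maxsplit=1)[0]
        if name and name not in seen:
            seen.add(name)
            transitive_deps(name, seen)  # recurse into indirect dependencies
    return seen

if __name__ == "__main__":
    # For requests, this typically surfaces urllib3, idna, certifi, and
    # charset-normalizer: packages you never asked for directly.
    print(sorted(transitive_deps("requests")))
```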
So this might be your best, and only, time to really evaluate those dependencies against what's expected.

Moving on to packaging. We're at KubeCon, so the obvious thought here is Docker. This is combining all of the artifacts into a single package, a container, whatever you want to call it, that you're going to deliver to the customer. The artifacts produced in the previous two stages really need to be looked at through the lens of the operating environment. What I mean by that is, say you rely on libssl: whether libssl is running in Ubuntu versus Red Hat has security implications, because a vulnerability may have been patched in one environment and not the other. So it's important to look at it through that lens. This is also a great time to compare the artifacts that were actually built into the container against the dependency list you published in the previous stage. Did the direct and transitive dependencies you requested actually show up in the container? What other dependencies are now showing up, for example from the operating system you've chosen to deploy on? (A rough sketch of that kind of comparison follows below.)

Moving on to release. This is hopefully just a set of checks and balances confirming that the previous stages meet the certifications your team has to adhere to, and it should tie back through all three of those stages.

Moving on to config. Ideally, your configuration is being stored and treated as code. For example, say you're using Terraform to keep track of your AWS security groups: those should be tracked and committed as code that describes your expected state. Then, as you move on to monitoring, you can check whether there's any drift in your configuration. You want to make sure there is no drift, and if there is drift, do you know how to alert the appropriate parties? Do you actually have alerting in place to contact those parties in an automated fashion? (A simplified drift check is sketched below as well.)

This is just a quick view of an internal tool my team develops called Inspect. It's a product security catalog: the idea is to give you a single-pane view of your risk posture against your bill of materials. It's not tied to just one scanning tool on the back end; it combines multiple tools into this single pane. It lets you see different CVEs against your open source dependencies, et cetera. It also lets you sort by vulnerability and see which linked packages are introducing them, so that we can work with the teams and see whether those vulnerabilities really apply to them. Oftentimes you'll find that a vulnerability doesn't apply because of the way the software was deployed, or because the team isn't using the particular functionality the CVE applies to.

Moving on to container security. As many of you here are probably experiencing, NVIDIA is also dealing with containers at scale. We're talking about hundreds of thousands of individual container digests that we care about, and thousands of containers that must be scanned every day. And I don't know what it is about machine learning, deep learning, and AI containers, but for some reason they tend to be large. I'm talking 20 gigabytes or more, and as you know, that's pretty large for a Docker image. So there are challenges that come along with these large containers.
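Here's the comparison sketch promised above: it checks a committed, pip-style lock file against what actually landed inside the built image. The image name is a hypothetical placeholder, and it assumes pip is available inside the container.

```python
# A rough sketch: compare the dependencies pinned in a committed lock file
# against what actually ended up inside the built container image. Assumes
# a pip-style lock file ("name==version" lines) and an image with pip
# available; the image name below is hypothetical.
import subprocess

def read_pins(path: str) -> dict[str, str]:
    """Parse 'name==version' lines into a dict, ignoring comments."""
    pins = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "==" in line:
                name, version = line.split("==", 1)
                pins[name.lower()] = version
    return pins

def image_packages(image: str) -> dict[str, str]:
    """Ask the container itself what is installed (pip freeze)."""
    out = subprocess.run(
        ["docker", "run", "--rm", image, "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {
        line.split("==", 1)[0].lower(): line.split("==", 1)[1]
        for line in out.splitlines() if "==" in line
    }

expected = read_pins("requirements.lock")
actual = image_packages("registry.example.com/team/app:1.2.3")  # hypothetical
missing = {n: v for n, v in expected.items() if n not in actual}
unexpected = sorted(set(actual) - set(expected))  # e.g. extras pulled in by the base image
print("missing from image:", missing)
print("present but not in lock file:", unexpected)
```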
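And here's a simplified version of the drift check, following the Terraform/AWS security group example: it compares committed, expected ingress ports against what's live in AWS. The group IDs and the expected-state file are hypothetical stand-ins for whatever your IaC actually tracks, and it assumes boto3 credentials are configured.

```python
# A simplified configuration drift check: compare committed, expected
# ingress ports for each security group against what is live in AWS.
# The expected-state file and group IDs are hypothetical placeholders.
import json
import boto3

EXPECTED_FILE = "expected_security_groups.json"  # committed alongside your IaC

def live_ingress_ports(group_id: str) -> set[int]:
    """Fetch the ingress ports currently open on a security group."""
    ec2 = boto3.client("ec2")
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    return {perm["FromPort"] for perm in group["IpPermissions"] if "FromPort" in perm}

def check_drift() -> None:
    # File format (hypothetical): {"sg-0abc123": [22, 443], ...}
    expected = json.load(open(EXPECTED_FILE))
    for group_id, ports in expected.items():
        drift = live_ingress_ports(group_id) ^ set(ports)  # symmetric difference
        if drift:
            # In production this would page someone or post to a channel,
            # not just print.
            print(f"DRIFT in {group_id}: ports {sorted(drift)} differ from expected")

if __name__ == "__main__":
    check_drift()
```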
However, having all these things containerized really does bring some exciting possibilities, along with new risks and considerations for any developer about how this code is being deployed. This is definitely not an all-encompassing list of those challenges, just a few I wanted to highlight.

Tags are not immutable. This is probably not news to anybody here, but a tag is just a pointer to a digest. While that digest is immutable, the tag is intentionally changeable. It's a feature, and one that we all rely on; it allows easy updates for the consumers of your container. But because tags are not immutable, reliance on tagging strategies can introduce risk if you're not paying attention. This can range from a developer changing the contents of a container out from underneath you, which might be unintentional, careless, or even malicious, to a very intentional change the developer wanted in the container: they've been broadcasting it, but perhaps your team hasn't been paying attention to their release notes, or isn't paying attention at all, and you're not ready for those changes. (A short sketch of resolving a tag to its digest, so you can pin it, follows below.) In the context of Kubernetes, your different pull policies can affect whether you're pulling the latest containers down at all. You might be stuck on a really old container image because your Kubernetes deployment references an old image and you've been telling it not to update.

Often, the Docker daemons shipped with popular distributions don't enable user namespaces by default, so you might be over-provisioning access to the containers that run within your infrastructure, granting increased privilege to resources on the host that you may not wish to expose.

We've already talked a little about some of NVIDIA's large Docker images, and scanning these containers brings additional challenges: it requires more CPU, more memory, and more disk space compared to a regular static analysis code checker.

Not to mention secrets and sensitive data. I actually believe it's easier to accidentally leave a secret or something sensitive in a container than it is to properly leave it out during a build; there are a lot of ways to mess it up. It's a huge risk to an organization if an API token or an encryption key is accidentally left in a container that gets published. A container digest can persist forever, and a tag of latest, as you know, induces a false sense of security: that container might be the latest, but it might be three years old and not receiving regular updates at all. (A toy example of scanning an exported image for secret-looking strings is also sketched below.)

Scanning these containers also grants us some unique opportunities. Packages and artifacts can be inspected in the environment in which they will actually run. For example, you can validate that a compiled binary and its linkages, the libraries it relies on, exist within the container and meet the expectations the developer had. Scanning those packages, as I talked about before, gives you the ability to look through a lens that reduces false positives. If Ubuntu has in fact patched a version of libssl and you're using that container, you don't want to see all the CVEs that don't apply to you in your list; that's extra noise you don't want to deal with. Containers should also be continually checked against the feeds for the ever-growing lists of vulnerabilities.
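As mentioned above, one mitigation for mutable tags is to resolve a tag to the digest it currently points to and pin deployments to that digest. Here's a minimal sketch against the Docker Registry v2 API; the registry and repository names are placeholders, and most real registries will additionally require a bearer token.

```python
# Since a tag is only a mutable pointer, resolve it to the immutable digest
# it currently references and pin your deployments to that digest instead.
# Registry and repository are placeholders; auth is omitted for brevity.
import requests

REGISTRY = "https://registry.example.com"  # placeholder
REPO = "team/app"                          # placeholder
TAG = "latest"

resp = requests.get(
    f"{REGISTRY}/v2/{REPO}/manifests/{TAG}",
    headers={"Accept": "application/vnd.docker.distribution.manifest.v2+json"},
    timeout=10,
)
resp.raise_for_status()
digest = resp.headers["Docker-Content-Digest"]  # e.g. "sha256:3f5a..."

# Referencing the image by digest instead of tag makes the pin immutable:
print(f"{REPO}@{digest}")
```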
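And here's the toy secrets scan: a rough illustration of searching an exported image for secret-looking strings. Real scanners cover far more patterns and encodings; the two patterns and the tar path here are just examples.

```python
# An illustrative sketch of scanning an exported image for secret-looking
# strings. Export an image first, e.g. "docker save team/app:1.2.3 -o image.tar".
# Note: the save format nests uncompressed layer tars inside the outer tar,
# so a raw byte search over each member still catches plaintext matches.
# This reads whole files into memory, which is fine for a demo but not for
# the 20 GB images discussed above.
import re
import tarfile

# Two example patterns: AWS access key IDs and PEM private key headers.
PATTERNS = {
    "aws_access_key": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(rb"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_image_tar(path: str) -> None:
    with tarfile.open(path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            for name, pattern in PATTERNS.items():
                if pattern.search(data):
                    print(f"possible {name} in {member.name}")

scan_image_tar("image.tar")  # hypothetical path
```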
The different feeds we have access to, from NVD, Red Hat, Ubuntu, et cetera, are updated practically daily, sometimes hourly. So you can't just set and forget; you need to constantly monitor and be aware of the vulnerability posture of your deployed code.

One of the exciting changes in version 2 of the Docker registry was multi-architecture support under a single tag name. It's really nice from an organizational standpoint, because you can have a single tag that covers both ARM64 and AMD64, for example, but it also introduces the challenge of knowing the risks associated with each of the container images that reside within that tag. You might have a vulnerability that only affects ARM but doesn't affect Intel or AMD64. (A sketch of enumerating the per-architecture digests behind one tag follows below.)

So, some of the ways we're approaching this at NVIDIA. I'm going to switch gears a little and talk about the evolution of our approach. Initially, we used a decentralized approach directly in developers' pipelines. We used Anchore's open source engine, which allowed the scan to run right in the pipeline, so it was nice: each team could use its own resources to handle it. The results were captured as artifacts in the teams' pipelines, but they weren't stored in any common database or behind an API where the security team could go and query the security posture of different teams.

The next logical step was purchasing Anchore Enterprise and deploying our Anchore scanners within Kubernetes. There are a lot of benefits here, really. You have a centralized API where you can retrieve those results, and it's constantly refreshing the feed databases, so if you ever need to check your security posture against the policies and checks you have deployed, you can get that on a practically real-time basis. But there were some drawbacks to this approach. We had to store teams' API tokens within our SaaS instance so that we had access to the different private registries teams have within NVIDIA. Performance could also be a bottleneck: a whole bunch of teams queuing up at the same time could cause a tremendous slowdown in pipelines.

What we're working towards now is a decentralized-plus-SaaS architecture. This uses the open source tool from Anchore called Syft, which generates a bill of materials within the teams' pipelines. The heavy lifting of generating the bill of materials is done within the teams' own pipelines, and that bill of materials is promoted back to the SaaS endpoint for quick processing to do a vulnerability check. Another advantage is that teams don't have to provide any credentials at all: since Syft operates within their infrastructure, that's where the images are inspected, so we don't need those credentials. We retain the benefits of the SaaS API, the catalog, vulnerabilities on demand, without having to run a whole bunch of infrastructure within SaaS to support it, and each team can scale as they need to based on their number of images. (A sketch of this flow follows below as well.)

Continuing along this whole idea of guardrails, not gates: we realized pretty quickly that policies are not a one-size-fits-all idea across different teams. We do want to provide guidance and a sane set of default policies for teams. This is along the lines of vulnerabilities, so certain CVSS scores we might want to mark as critical or high and not allow a pipeline to pass, and looking for secrets and credentials, perhaps stored within the different layers of an image.
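Here's the promised sketch of enumerating what's behind a multi-architecture tag: the manifest list gives you one digest per platform, and each digest deserves its own scan. The registry, repository, and tag are placeholders, and authentication is omitted.

```python
# To know the per-architecture risk hiding behind a single multi-arch tag,
# enumerate the manifest list: one digest per platform, each scanned on
# its own. Registry/repo/tag are placeholders; auth is omitted.
import requests

MANIFEST_LIST = "application/vnd.docker.distribution.manifest.list.v2+json"

resp = requests.get(
    "https://registry.example.com/v2/team/app/manifests/1.2.3",  # placeholder
    headers={"Accept": MANIFEST_LIST},
    timeout=10,
)
resp.raise_for_status()
for entry in resp.json().get("manifests", []):
    platform = entry["platform"]
    # Each architecture has its own digest and therefore its own scan result.
    print(f'{platform["os"]}/{platform["architecture"]}: {entry["digest"]}')
```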
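And a sketch of the decentralized-plus-SaaS flow: Syft runs in the team's own pipeline to generate the bill of materials, and only that SBOM travels to a central endpoint for vulnerability matching. The endpoint URL and token variable are hypothetical; the syft invocation follows its documented CLI.

```python
# Decentralized-plus-SaaS sketch: generate the SBOM locally with Syft, so
# no registry credentials ever leave the team, then ship only the SBOM to
# a central service for matching against vulnerability feeds. The endpoint
# and SCAN_TOKEN are hypothetical.
import json
import os
import subprocess

import requests

IMAGE = "registry.example.com/team/app:1.2.3"  # placeholder

# Heavy lifting happens in the team's pipeline, on the team's resources.
sbom = subprocess.run(
    ["syft", IMAGE, "-o", "json"],
    capture_output=True, text=True, check=True,
).stdout

# Only the bill of materials crosses the boundary to the SaaS endpoint.
resp = requests.post(
    "https://security.example.com/api/v1/sboms",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {os.environ['SCAN_TOKEN']}"},
    json={"image": IMAGE, "sbom": json.loads(sbom)},
    timeout=60,
)
resp.raise_for_status()
print("vulnerability summary:", resp.json())
```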
This might be internal or secret project names that we're not quite ready to get Twitter or Reddit excited about. Malware: we can run malware scans across the different images as part of the sane defaults. Even things like looking for mistakes, typos, or typosquatting that might slip into an image and direct customers' traffic to incorrect domains. What we really want to do here at NVIDIA, though, is encourage each team to sign off on the set of policies appropriate to their program, commit those policies as code directly within their source code repositories to keep track of them, and then submit them to our SaaS endpoint to evaluate their risk posture. This really allows policy on the fly, and it has worked out well for our teams so far. (A toy sketch of that kind of policy evaluation follows at the end of this section.)

As I'm wrapping up, I just want to talk about how we're trying to support developers at NVIDIA: how we're trying to make their lives easier, how to help them without inhibiting them. One of the first things we look at when we investigate a new tool is its API posture; we're API first. We use a variety of build tools that developers are already comfortable with, and by build tools I'm talking more along the lines of pipelines: Jenkins, GitLab CI/CD, GitHub Actions, TeamCity. We're working in a variety of environments, so APIs are really important. Teams don't have the time or the luxury to learn a new UI for every single tool we roll out. And even worse, we know they're not going to check that UI on any reliable schedule to see if they're now vulnerable to some newly released CVE. So we have to use the API within their pipeline to query and return their results. We need robust APIs for the services deployed at NVIDIA.

We want to minimize the impact on development teams. Scan times have to be reasonable. If we're adding 15 minutes, or even five minutes, to a pipeline, developers are likely to start looking for ways to bypass our scans or reduce their frequency. This just makes sense: as a developer, you're on a tight schedule with a lot of work to get done, so we don't want to slow you down. Combined with that is alert fatigue. We've really found that it's counterproductive to over-alert teams. Receiving a Slack message for every vulnerability report or every CVE is just going to make them invest more time in learning how to mute channels in Slack than in fixing their vulnerabilities. And then there's optimizing the signal-to-noise ratio. Every tool is going to generate some threshold of false positives, or even worse, false negatives, so we're looking for tools that do a good job of minimizing that. That's why it's important to look at vulnerabilities against a vendor feed, against the operating system in which they run; that's a high priority for us and a primary consideration.

The last thing I'll say is that you really want to provide guardrails, not gates. We work carefully and closely with our product security incident response team, and we hope to encourage teams and guide them through the process when a vulnerability is found within their products. We also help provide a healthy set of guidelines so that teams don't all need to be experts on every facet of every tool that is provided to them.
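Here's the toy policy evaluation mentioned earlier. The policy schema and findings format are invented for illustration, not Anchore's actual policy bundle format; the point is just that the policy lives in the team's repo and the pipeline enforces it.

```python
# A toy illustration of "bring your own policy": a team commits a small
# policy next to its code, and the pipeline evaluates scan findings against
# it. The policy schema and findings file are invented for illustration.
import json
import sys

policy = {
    "max_severity": "high",  # fail on "high" or "critical" vulnerabilities
    "block_secrets": True,   # any detected secret fails the build
}
SEVERITY_ORDER = ["negligible", "low", "medium", "high", "critical"]

def evaluate(findings: list[dict]) -> bool:
    """Return True if the build should pass under the committed policy."""
    threshold = SEVERITY_ORDER.index(policy["max_severity"])
    for f in findings:
        if f["type"] == "secret" and policy["block_secrets"]:
            print(f"FAIL: secret found at {f['location']}")
            return False
        if f["type"] == "vulnerability" and SEVERITY_ORDER.index(f["severity"]) >= threshold:
            print(f"FAIL: {f['id']} is {f['severity']}")
            return False
    return True

# Hypothetical findings file produced by whatever scanner the team runs.
findings = json.load(open("scan_findings.json"))
sys.exit(0 if evaluate(findings) else 1)
```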
Just wrapping up here: we're going to have a Q&A session now, so if you have any questions, feel free to ask them. If you have more interest in NGC, you can find a link below. And if you're interested in Syft for getting an idea of your bill of materials, please check out the GitHub link below. Thanks for your time, everybody.