 Hi, welcome to software supply chain aspects in infrastructure as code. My name is Lior Kaplan and I'm the Open Source Officer at Checkmarx. I'll start with a short background about myself. I started as a Linux sysadmin before there was cloud, or before the role was referred to as DevOps. I'm a Debian GNU/Linux developer, and I've worked on PHP security both while working for Zend, the PHP company, and as part of my work in Debian. Today I manage the open source program office at Checkmarx and I lead the KICS open source project, which stands for Keep Infrastructure as Code Secure. Today we'll talk about infrastructure as code and its relation to the software supply chain, and what we can learn from software supply chain security when it comes to infrastructure as code. Let's do a short review of the software supply chain. In the slide before you, courtesy of Justin Murphy from DHS, you'll see logos of major software security incidents related to open source, including Shellshock, Heartbleed, Meltdown and many others. But what's common to all of them? They were completely unintentional. They weren't malicious. The developers of these open source projects made an ordinary mistake while developing — what we know as a bug. When the bug was reported, they did their best to fix it and share it with the community in a responsible way. The developers didn't try to use the problem to their advantage; they just tried to fix the mistake they had made accidentally. When we talk about software security, we recall the famous quote: "those who cannot remember the past are condemned to repeat it." And as we see with current cases of software security and the CVEs being reported, there is an increase both in the number of problems we see and in our dependency on such software. The graph that you see shows the trend.
I think they created the graph for 2022 recently, in the last few months, as you can see from the table below. We still have December left this year, and we've already surpassed what was reported for last year. It's not a huge change, but the idea is that we get more and more software and, of course, more and more security reports along with it. But when we talk about anything "as code", we might ask ourselves what happens to those who don't learn from the present — not only from the past, but from things we see at this very moment. We can't talk about anything as code without noting there's a lot to learn from software — and of course, software is code. Let's quickly review some of the major incidents we had in the last few months. I'll talk mostly about August, but also reference November, which had a few incidents of its own. The research I'm going to show you was done by the Checkmarx software supply chain research team, and they share their results on Medium, so you can follow up on that quite easily. In August we started to see large-scale campaigns creating fake clones of GitHub projects, with fake commits adding malware to them. Attackers try to mimic an existing open source project, adding a malicious bit to it, using misleading names and so on, just trying to get people to use these projects. We saw a typosquatting campaign targeting PyPI packages, generating code automatically and injecting malware, and we started to see these as widespread campaigns, which led to a threat actor publishing more than 1,000 malicious PyPI and npm packages, trying to persuade people to use them. At some point we started to see malicious actors striking again with another set of tools from the supply chain world to mislead people.
And at the end of August we saw the first known phishing attack against PyPI users, meaning attackers targeted the maintainers of packages, trying to steal their credentials in order to be able to release in their names. In November, we started to see new ways for attackers to get these malicious packages into action, and we actually saw TikTok challenges — popular videos on TikTok — being used as the basis for encouraging people to install malicious packages. In this case, a package that supposedly helps reverse the effect of the invisible filter in TikTok videos. And then we see more sophisticated attacks trying to steal credentials — in this case Discord credentials — trying to steal cryptocurrency wallets, and a lot of problems based on these kinds of malicious packages. Going back to anything as code: we might have more than just the regular software we were used to. It might be infrastructure, it might be policy, it might be configuration. And there's also a trend called "everything as code", in which anything we can create files from might be treated as code, if we can formalize its structure and so on. Everything as code brings a lot of the good things we see in software and its code: efficiency, repeatability, reuse, the ability to scale. And of course code management through Git, the software development lifecycle, and GitOps. There's a lot of benefit in those, but we also inherit the problems. For example, dependencies — and in some cases, as we know, dependency hell. And with dependencies we also get the software supply chain, or the effects of the supply chain, because we're not only talking about software anymore. If we talk about infrastructure as code, let's talk about some of the efficiency it brings us. We have the FROM clause in Dockerfiles — by the way, an INCLUDE instruction was never accepted; I checked their GitHub repository and its history.
So we can reuse base images; with Terraform we have templates; with Kubernetes we have Helm charts; with CloudFormation we have the AWS::Include transform. In all of these cases we can reuse resources we wrote previously. In a lot of cases we can also reuse resources written by others, not necessarily from our own organization. And that raises the question of trust: when we use code from someone else, how do we make sure it's secure, it's safe, and so on? Before we jump into that, we might first need to find the right container — something to use for our own toolchain or project. Let's say we want to find the right container for our job. That would be quite similar to finding the right npm or PyPI package, or whatever repository you work with. We might check whether something is relevant by the number of downloads, by the last-updated date, by the number of stars and popularity, and of course by the indication that it's from a verified publisher. And in a lot of cases we can check something else: we go to the GitHub repository and look at the number of stars. In this case I use MinIO as a reference. It's quite a popular project providing an S3-compatible interface; as you can see, it has more than 33,000 stars on GitHub, and it's very successful. Let's say we want to find the Helm chart for that project. I went to a Helm chart search engine called Artifact Hub — it's a CNCF project being incubated — and I searched for MinIO; I'll show you the results. We see we have a few results, and I marked some of them so we can discuss them easily. We see there's something called minio; the repository is referenced as "minio official"; it only has two stars, and it was last updated 10 days ago. Next, we see there is another result, also called minio. It has 65 stars, and it was updated a day ago.
Quite interesting — I wonder which option you would choose? We see there's another option which is called minio; the repository itself is also called minio, same as on GitHub. It only has one star, and it was updated two years ago, which in a lot of cases looks quite weird. We see that in some cases we have the logo with the name; in some cases we also have the icon, like in the bottom right. This makes it quite confusing to choose the right option, because all of them have the same name. We need to distinguish between the different characteristics on our own, and we don't have any indication which one is the official one. Talking about the levels of typosquatting — typosquatting is when you try to mislead someone by choosing a similar name; in this case we have the exact same name but different details. When talking about infrastructure as code, there are different levels of typosquatting. There might be typosquatting in the Helm chart or other infrastructure as code files; there might be typosquatting in the container name; and of course regular typosquatting in software, where the names of packages might be misleading. In case we want to find the right container, and not the Helm chart as we did previously, again we go to a search engine and search for MinIO. And it's the same question: how would we rate the results? Downloads, last update date, stars, verified publishers and so on. In this case, you can see we have quite a lot of results — 2,300 results and a little bit more. The first result is Bitnami's minio; there's an indication it's from a verified publisher; it has a high number of downloads, more than 10 million, and about 50 stars. Then there is another verified publisher with one million downloads, but that one was updated two years ago; and another one by IBM, which was updated three years ago and also has a verified publisher badge.
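All of these typosquatting levels come down to the same root cause: lookalike names. As a minimal sketch of how one might flag near-miss names against a trusted list — the threshold and the sample names are purely illustrative, not a real detection policy:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def lookalikes(candidate: str, trusted: list[str], max_dist: int = 2) -> list[str]:
    """Return trusted names the candidate is suspiciously close to.
    An exact match is fine; a near-miss is the typosquatting red flag."""
    return [t for t in trusted
            if t != candidate and edit_distance(candidate, t) <= max_dist]
```

Note this only catches the simple case of a slightly altered name; as the Artifact Hub example shows, attackers can use the exact same name in a different repository, which no string distance will catch.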
We might be able to trust all of these, given both the verified publisher badges and the numbers of downloads. The update date might be a little bit suspicious for the last two, but it can be confusing for people. Also notice the names: the Rancher one is minio-minio, but the Bitnami one is minio, and IBM's is also called minio. The official result is on the next pages, and as you can see, it has 500 million downloads and 600 stars — but it's not the top result. Which means that when we search anything on the repository — or the registry, to be exact — the first results might be misleading here as well. In this case we might trust the container we get from Bitnami, but it's not the official one; the official one comes from the open source project. And you can also see that it doesn't have any logo. It says it's by MinIO, but we have no way to verify that. And that's another big problem. Another thing: from the same author we have a different artifact, and you can see that mc, the MinIO client, was updated 15 days ago, while this image was updated 15 hours ago, which is also something that might mislead a lot of people. Besides selecting the right Helm charts and containers, there's a question of how much time you invest in actually reading those, or checking that whatever you decided to use is from a trusted source — did you check the commands, or the ingredients being used to create them? In a lot of cases people go to the documentation, copy-paste commands, copy-paste instructions, and just install. Any mistake in these commands or references to files might have a catastrophic effect, whether it's an unintentional mistake or something malicious. As an example, we'll talk about the WordPress container, and we'll demonstrate the container supply chain with it.
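To make the trade-off between these signals concrete, here is a toy heuristic that combines them into a single ranking score. The weights, thresholds, and field names are entirely illustrative assumptions, not any registry's real ranking:

```python
def trust_score(image: dict) -> float:
    """Combine the signals discussed above (verified publisher, official
    status, downloads, staleness) into a rough ranking score.
    All weights here are illustrative, not a real policy."""
    score = 0.0
    if image.get("verified_publisher"):
        score += 3.0
    if image.get("official"):
        score += 5.0
    # Downloads help, but cap their effect so popularity alone can't win.
    score += min(image.get("downloads", 0) / 100_000_000, 5.0)
    if image.get("days_since_update", 9999) > 365:
        score -= 4.0  # a years-stale image is a red flag
    return score
```

Even a crude heuristic like this would rank the official, heavily downloaded image above a verified-but-stale one — which is more than the default search ordering did in the MinIO example.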
Whenever you use the WordPress container — the official container, by the way — you actually have a FROM clause that says: I want to use PHP at some version and some distribution variant. In this case, PHP version 7.4, based on the Apache web server. If you go to that container as well, you'll see that it is based on a container from Debian — in this case buster-slim. Buster is the name of a Debian release, and slim means it's a smaller install to lower the footprint. This indicates that WordPress uses a PHP container built from source — that's instead of using Debian's already available images, which is what the PHP image actually does. If you look at the Debian Dockerfile, you'll see that it builds FROM scratch: it just adds a root filesystem tarball and sets the command line. And here there's a problem, because you don't know anything about these files; you just know it's a tarball being extracted, and that's our container. If you look at the URL below, you'll see that it's not coming from the official GitHub organization for Debian. It comes from somewhere else; although it says docker-debian-artifacts, the name of the user looks a little bit suspicious — and I'm saying that as a Debian developer. I just kept digging, trying to figure out what the sources are, and I was able to find a page in Debian that lists the checksums for all of the containers. That was a way for me to verify that whatever I downloaded was an official image. I'm not sure everyone who uses containers would go and check the supply chain for their containers, do the research on all of them, and verify the checksums. And of course there's a difficulty in doing that every time you want to use a container. And supply chains might be much longer than the three links in the supply chain I showed you.
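That manual verification step can be sketched in a few lines. This assumes you have the rootfs tarball locally and the expected SHA-256 value copied from the distribution's published checksum page; the function names are my own, not part of any tool:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a large rootfs tarball
    doesn't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_rootfs(path: str, expected_sha256: str) -> bool:
    """Compare the local tarball against the digest published
    by the distribution (e.g. Debian's checksum page)."""
    return sha256_of(path) == expected_sha256.lower()
```

The check itself is trivial; the hard part, as described above, is finding an authoritative place to get `expected_sha256` from in the first place.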
So you might feel safe in this case, because we were able to trace the steps from WordPress to PHP to Debian, check the Dockerfiles and the sources, and it all looked reasonable; we were able to verify the checksum for the base image, and everything else was fetched from official sources. So it looks safe. My question to you at this point: do you feel safe? And if you say you do, then I have a follow-up question: who said that the Dockerfiles I showed you are the ones actually used to create the containers? Even if you had the files, how would you know that these are the files that were actually used to create the containers — or how would you verify that they weren't? Remember that whenever you download a container, you just download the artifact; you don't get the sources, and that's a major problem. Now, if your answer is that you're safe because you can just scan the container, you're right, to a certain degree. But here I'll quote a blog post by Dan Lorenc, who is also a keynote speaker at this conference. He checked, and he actually claims you're wrong, because what your container scanner doesn't know can hurt you. Results might vary between container scanners, but the most important point from his blog post is that scanners check for very specific things inside the container, and most of them depend on metadata available in the container and on mechanisms available as part of the container or its operating system. For example, scanners might query the operating system's package manager; they might query npm, PyPI — or pip, rather — or other installers available as part of the container. That makes metadata quite important: if container scanners mostly verify the packages they can see, we should treat this information, this metadata, as important. Otherwise, we see nothing.
We might want to invest in custom scanning rules for popular or special edge cases — for example Node, where we just fetch the binary rather than installing it from the operating system. But that's an endless game of cat and mouse: we create custom rules, then we find out there's a package that wasn't scanned, and that's a problem. The first thing is that we want to reuse existing artifacts instead of rebuilding them. In the example you saw with WordPress and PHP, WordPress uses the official PHP containers, and those are built from source. It means they don't come from the Debian distribution, although Debian has its own PHP packages; those are maintained there, and Debian's security team maintains the packages on top of whatever the PHP maintainers do. I know that first-hand, because in a few cases I've been involved in securing these packages, working with the security patches and making sure the PHP maintainers took them, or making sure the PHP security team put them in place. Some of the metadata we have also comes from the distribution itself. In this case I opened the tarball we saw a reference to earlier from Debian, and each package ships both the files themselves — there's a list of files — and also the MD5 sums of the files being supplied. This is metadata we can verify against, making sure that whatever we see in the container comes from a verified publisher and the result is something official. And that's great news for us. But the problem is that this metadata is not always present. What I'm referring to is distroless containers. They're still based on Linux distributions — which might be a little counter-intuitive given the name — but the idea with distroless containers is that they've been stripped down so as not to carry the Linux distribution they were built from. The idea is to save size.
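As a sketch of what verifying against that shipped metadata could look like — Debian packages install an md5sums file whose lines pair an MD5 digest with a relative path; the parsing and directory layout here are simplified assumptions for illustration:

```python
import hashlib
import os

def parse_md5sums(text: str) -> dict[str, str]:
    """Parse dpkg-style md5sums lines: '<md5hex>  <relative path>'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, path = line.split(None, 1)
        entries[path] = digest
    return entries

def verify_tree(root: str, md5sums: dict[str, str]) -> list[str]:
    """Return the paths whose on-disk MD5 doesn't match the shipped
    metadata — i.e. files that were altered after packaging."""
    mismatches = []
    for rel_path, expected in md5sums.items():
        full = os.path.join(root, rel_path)
        with open(full, "rb") as f:
            actual = hashlib.md5(f.read()).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches
```

This is exactly the kind of check that becomes impossible in a distroless image: strip out the package database and its md5sums files, and there is nothing left to verify against.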
There are claims that it reduces attack vectors, or attack surface, which is true: if you have fewer files in your containers, there might be less software in them, and then fewer security vulnerabilities. We try to eliminate all of them. But the problem is that without the metadata, visibility is limited. We have no idea which versions of the packages are installed, what their MD5 sums are, or how to verify the origins of these binaries. And don't forget that the base image for whatever you use is like giving someone root access on your machine. For example, if someone gets a malicious binary executable onto your build system, they might inject a lot of stuff into the software you build in that container. One of the solutions we have in this case is reproducible builds. From my perspective, this is the only way to be safe: to have the ability to retrace the actions and verify the binary results. If I can run the same process as you and get the exact same results, I know your system isn't compromised — and, in the long run, that whatever you said or claimed you do, you actually do, because I can trace that. That's the real way of fully leveraging the transparency of open source, and that's a big advantage, because with open source we already have the sources; we can rebuild it; we can get the instructions. That's something we can't do with closed software, and there's a big benefit here. I want to use this presentation to commend Debian's work on reproducible builds — it has been doing that for years. It makes Debian a much safer distribution, because we can verify everything. That work is done by the core team of the Reproducible Builds project, which is now much bigger than only Debian, but it started with Debian.
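The core idea of reproducible builds — and the classic thing that breaks them — can be shown with a toy "build" that stamps its output with a build time. This is purely illustrative (the real fix in the Reproducible Builds project is conventions like the SOURCE_DATE_EPOCH environment variable, which pins such timestamps):

```python
import hashlib
import time

def build(source: bytes, timestamp=None) -> bytes:
    """A toy 'build' that embeds a build time in the artifact.
    Embedded timestamps are a classic source of non-reproducibility;
    pinning them (cf. SOURCE_DATE_EPOCH) restores bit-identical output."""
    ts = time.time() if timestamp is None else timestamp
    return source + ("\nbuilt-at: %s" % ts).encode()

def digest(artifact: bytes) -> str:
    """Reproducibility check: two independent builds count as identical
    only if they are bit-for-bit the same, i.e. their digests match."""
    return hashlib.sha256(artifact).hexdigest()
```

Run `build` twice with the timestamp pinned and the digests match; let the timestamp float and they don't — which is exactly why "we rebuilt it and it looks similar" is not enough, and bit-for-bit verification is the goal.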
If you think about it, it's similar to the service CentOS provides for Red Hat: it takes their sources, creates the binaries again, and verifies that everything is similar and compatible. There's a big advantage in having more and more projects checking that the recipes being used produce exactly the results we expected. We have the same effect with Ubuntu and others doing it for Debian, and we said CentOS and others do it for Red Hat. But who does that for containers? We have it for Linux distributions, but there's nothing that re-checks containers — that says: OK, give me your sources, give me your Dockerfiles, I'll recreate it and verify that it has the exact same results. In a lot of cases, are we sure we even have all the sources available? In a lot of cases we don't, and there's no way to get identical containers; we can create them again, but we have no way to guarantee it's the exact same thing, because in a lot of cases we don't have all the ingredients, even if we have some of them. There are other solutions based on signatures — for example cosign. It can really help against typosquatting; it can help make sure we got the package from an official source. Someone who wants to clone our software might create a container from our sources, and we might be able to sign ours. But the problem is that they can sign their container as well. If you only check that a signature exists, then we might get signed artifacts — containers or whatever — from malicious attackers, because they also have the ability to sign. So we need to make sure it's signed by the right person, and that's again a harder problem, as we saw with the other typosquatting cases. And even if you get something signed, it doesn't mean it's safe. That's another problem regarding containers — especially containers, but also true for other artifacts. For example, even malicious releases can be signed.
Recently we had a case where hackers injected malware into multiple extensions from FishPig. It's a vendor of Magento-WordPress integrations, and of course their extensions have a lot of downloads because they're quite popular. The attackers took control of FishPig's server infrastructure and added the malicious code to the vendor's proprietary software — not to the open source parts — first because it has less transparency, but also because their build and distribution system automatically signed the software as it was produced. That made everything get distributed in high numbers in a short time. In this case, I think the transparency of open source is yet again an important way to protect us, but we need to be able to check the sources and not trust anything just because it is signed. This example shows why signing is not the only solution. So, if we talk about what might be a solution: first, let's talk about container creation best practices. We need to make sure we're getting the code from verified sources. We should prefer Git for accountability and transparency, because we have the history; with tarballs or other artifacts we don't have any change history. So whatever we use, we prefer it to have a way to see what was changed in the last release, so we can verify that things are being done as we expect. And don't forget we also have SHA-1 — moving to SHA-256 — so we can verify the commits and the artifacts. Of course we want software reproducibility, and we want the same for containers. We want access to Dockerfiles, which is not always available, as I mentioned — and even when it is available, we don't always know it's the same file that was used to create the container. Whenever you upload containers to a registry, you don't have to upload the Dockerfile. I think trying to tie these together would be a major step forward, both for cloud native environments and for infrastructure as code security.
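One concrete best practice that follows from all of this is pinning base images by immutable digest rather than by mutable tag. A minimal sketch of such a check over a Dockerfile's FROM lines (a simplified lint for illustration, not any particular tool's implementation):

```python
import re

# A FROM reference may carry a tag ("php:7.4-apache") or an immutable
# digest ("debian@sha256:<64 hex chars>"); only the digest pins content.
FROM_RE = re.compile(r"^\s*FROM\s+(\S+)", re.IGNORECASE | re.MULTILINE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

def unpinned_base_images(dockerfile: str) -> list[str]:
    """Return FROM references that rely on a mutable tag (or no tag
    at all) instead of an immutable content digest."""
    refs = FROM_RE.findall(dockerfile)
    return [r for r in refs if not DIGEST_RE.search(r)]
```

A tag like `php:7.4-apache` can point at different content tomorrow; a `@sha256:` digest cannot, which is why digest pinning pairs naturally with the checksum verification discussed earlier.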
So if you want to secure container creation, the recommendation would be to start with KICS, as mentioned at the beginning. It stands for Keep Infrastructure as Code Secure. It's a set of OPA-based rules to secure your Dockerfiles, Helm charts, and other infrastructure as code formats, including whatever the cloud vendors create specifically. If we're talking about the SLSA diagram, we want to secure things as early as possible, but also to go over the supply chain and make sure that whatever we use is secured as well. Going into a little more detail about securing container creation: one example could be that downloading a tarball inside your container is a bad practice, and you might have a rule against it. This is actually one of the things we saw with the WordPress container, which takes a tarball from the WordPress website and deploys it as part of the container. That means there's no way to verify the details of the tarball — the signature and so on — because it also downloads the signature from the same place. If I were able to change something on the WordPress website, I would have an easy way to influence the container, even though I might not have access to the Git repository. In this case I would prefer to see a Git repository being cloned into the container environment, with the checksum published as part of that container. With base images, we want to make sure we have rules that help us verify them, and hopefully make sure they're reproducible: we want to make sure a specific container version, or checksum, is reproducible. Then we can say: OK, this is secure, I know, I verified the sources, I might use it again — either we do the check or someone we trust has done it. Briefly about KICS: it has a lot of supported platforms; it has already been adopted by GitLab as its infrastructure as code scanner, and I recommend you adopt it as well.
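To give a feel for the shape of such a rule — KICS queries are actually written in Rego, so this Python version is only an illustrative approximation of a "no remote downloads at build time" check, with a deliberately naive pattern:

```python
import re

# Naive approximation of a policy check: flag RUN instructions that
# fetch remote content with curl/wget (real KICS queries use Rego and
# a parsed Dockerfile model rather than line-by-line regex matching).
DOWNLOAD_RE = re.compile(r"\b(curl|wget)\b[^\n]*https?://", re.IGNORECASE)

def flag_remote_downloads(dockerfile: str) -> list[str]:
    """Return the RUN lines that download content at build time."""
    findings = []
    for line in dockerfile.splitlines():
        if line.strip().upper().startswith("RUN") and DOWNLOAD_RE.search(line):
            findings.append(line.strip())
    return findings
```

Run against a Dockerfile that mirrors the WordPress pattern — `RUN curl ... https://wordpress.org/latest.tar.gz` — this flags exactly the line where unverifiable remote content enters the build.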
If we talk about key takeaways: infrastructure as code may be called "infra", but it's still software, and as with any software, it has the same problems, including security. The biggest challenge for us is the software supply chain and its security implications, and we should verify everything we can across the software supply chain, because we don't take code from strangers. Here's a short example about typosquatting, for all of you, and especially those of you around the OpenSSF — you know about SLSA, which is a framework, a set of ways to secure your artifacts. If you go to salsa.dev, you might get this page, which is something not at all related to software security. If you go to github.com/salsa, you get this. And all of this is because SLSA, both on the web and on GitHub, is written without that extra "a" — slsa, not salsa. That's a short example of typosquatting, and of how easy it is to get it wrong, even when you're sure you have the right address. Thank you very much everyone.