Thank you, everybody, for joining us today. We'll be talking about how to secure your software supply chain at scale. I'm Hemil, and I'm a software engineer on the Yahoo security team. And this is my colleague. Hi, I'm Yonghe, also from the Yahoo security team. Our security team is also known as the Paranoids; here is the logo that identifies us. We are very excited to be here.

All right, so here's the agenda for today. By the end of this talk, we would like to show you how a lot of the existing open source tools can be deployed and integrated with your existing ecosystem, along with some of the policies you can create to safeguard against supply chain attacks. So let's get started.

So what is the software supply chain? The software supply chain is made up of everything and everyone that touches your code as part of the software development lifecycle, beginning with application development, through the CI/CD pipeline, and eventually deployment to production. It also includes information about the software: its components, the source, the people who wrote the source code, and where those components come from. It includes information about known vulnerabilities, supported versions, and so on. Basically, everything that touches your software at some point.

So what is the problem, and what attacks are possible against the software supply chain? There are quite a lot of places an attacker can target in your infrastructure: compromising your source code system (an example being PHP's Git server when it was attacked), compromising your build system (a recent-ish example being the SolarWinds attack), or tricking users into using bad artifacts, which is known as typosquatting and has been happening often in package managers.

So why is software supply chain security important to us? These are some of the recent attacks and headlines related to software supply chain security. At the bottom, you can see the most recent attack on PyTorch, where a nightly version of PyTorch released during the holiday season most likely carried a rogue package that was able to siphon off sensitive data from user systems. This kind of attack is known as dependency confusion, and it has been affecting a lot of development environments and package managers. Right at the top, you can see a well-planned spear-phishing attack, where the attackers were able to hit more than 130 organizations and get around 10,000 Okta and 2FA credentials.

So here's an obligatory slide showing you some scary figures. Industry data suggests anywhere between 85% and 97% of enterprise code bases use open source. This means that most of your application consists of code that you didn't write, and vulnerabilities in third-party open source dependencies can pose a significant security risk. Supply chain attacks are on the rise: according to a recent report by Sonatype, supply chain attacks have gone up by an average of 742% per year over the last three years, and three out of five companies have been targeted by supply chain attacks, based on Anchore's report as well.

So what is the industry doing to secure the software supply chain? Actually, quite a lot. There are a lot of great tools and resources, both open source and proprietary, with a wealth of security guidance that can aid in one's security journey.
Recently, the CNCF Security Technical Advisory Group also published a document that lists publications and references that can help you on that journey; in particular, it has information on policies, security assessments, use cases, and best practices. However, it can be really hard and intimidating to find the right choices for your company or your situation, especially when many of these tools either overlap in functionality or depend on one another. And in a company where the infrastructure is already set up, it is really challenging to figure out how to integrate these tools into the existing ecosystem. When we at Yahoo were starting our journey in software supply chain security, we faced a similar conundrum, and my colleague here will talk more about what tools we used and how we integrated them into our existing ecosystem.

Thanks, Hemil. Yahoo indeed has a large software system. We support yahoo.com, which is among the top ten most visited websites in the world, and we have a lot of services and products serving heavy traffic. Internally, there are about 60,000 build jobs running and about 5,000 images published to the registry on a daily basis. At the same time, there are more than 700 Kubernetes clusters running more than 100,000 pods. And that is still not the whole picture of Yahoo's software supply chain: there are still many other kinds of artifacts that are not deployed to Kubernetes clusters, going instead to on-prem virtual machines, for example.

So Yahoo's software supply chain is large not only in data quantity but also in terms of tooling choices. Within Yahoo, we have different teams with different requirements and needs, and each team may have its own path to deliver its software. This diagram lists some of the major tools we use at Yahoo, from source code management tools to build systems, artifact stores, and, finally, deployment environments. The variety of tools makes securing Yahoo's software supply chain challenging, so we had to simplify the problem at hand. When we first started our journey, we decided to pilot the security measures and produce real value on one specific software supply chain path, shown here as the cloud-native path: GitHub Enterprise as the source code management tool, Screwdriver as the build pipeline, an internal OCI registry as the artifact store, and finally Kubernetes as the deployment environment. For those who are not aware, Screwdriver is a CI/CD platform built and open sourced by Yahoo. You can find the project on the CNCF cloud-native landscape website, and it is widely used internally at Yahoo.

OK, so as we all know, software supply chain security has a lot of aspects. Even with only one path, there are still a lot of security controls and best practices to follow. Luckily, we did not need to start from scratch, because we already had some fundamental security controls in place, like static code scanning, repo branch protection, pull-request reviews, and so on. After evaluating the existing security controls and open source standards, we realized there were three major gaps. Firstly, even though we have static code scanning to check our proprietary code, we are not able to detect vulnerabilities in our open source dependencies, so we decided to introduce software composition analysis to fill this gap. In addition, we were not able to check the software and block the deployment after the code left the repository.
So to fill that gap, we decided to add two checkpoints to our software development lifecycle: one in the build stage and another in the deployment stage. We will walk you through these three gaps and what we did in the following slides.

First, software composition analysis, also known as SCA. As Hemil said earlier, most of our application consists of code we didn't write. Traditional static code scanning just looks at the flow and logic of your proprietary code and reports any potential vulnerabilities; it does not look at the open source dependencies. That's why we need software composition analysis to fill this gap: SCA tools focus specifically on vulnerability checks for open source dependencies. They can identify the open source dependencies' package names and versions, check them against vulnerability databases, and report any issues they find. They can also raise pull requests to bump the versions of your open source dependencies, which in most cases fixes the potential vulnerabilities. So by applying SCA tools, we make the process of detecting and remediating vulnerabilities in your application code much easier.

However, images deployed to Kubernetes clusters always have components other than the application binaries, like the base operating system and any other packages you add. So having a build-time vulnerability assessment that scans the whole image is very necessary, because it can catch not only the vulnerabilities left unresolved in your application code but also the issues in those extra components that are pulled in at build time. In practice, we use Syft and Grype to generate the SBOM and vulnerability data used for the assessment. We also provide an option to block specific pipelines if they find any severe vulnerability. So next time another Log4j-style crisis occurs, we can take action and prevent those vulnerable images from being published to our registry.
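As an aside, Grype can already fail a pipeline step on its own via its `--fail-on <severity>` flag. For pipelines that need custom blocking logic on top of the scan (for example, cross-referencing an internal severity shortlist, as discussed later in the Q&A), a build-stage gate might look like this minimal Go sketch. It is an illustration only: the JSON field names are assumptions based on Grype's current output format, so verify them against the Grype version you run.

```go
// vulngate: a build-stage gate that runs Grype against an image and
// fails (non-zero exit) if any blocking-severity vulnerability is found.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// grypeReport models only the fields this gate needs. These names are
// assumptions based on Grype's current JSON output.
type grypeReport struct {
	Matches []struct {
		Vulnerability struct {
			ID       string `json:"id"`
			Severity string `json:"severity"`
		} `json:"vulnerability"`
	} `json:"matches"`
}

// blocking lists the severities that should stop a publish; in a real
// setup this could be combined with an internal severity shortlist.
var blocking = map[string]bool{"Critical": true, "High": true}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: vulngate <image>")
		os.Exit(2)
	}

	// Equivalent to running: grype <image> -o json
	out, err := exec.Command("grype", os.Args[1], "-o", "json").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "grype failed:", err)
		os.Exit(2)
	}

	var report grypeReport
	if err := json.Unmarshal(out, &report); err != nil {
		fmt.Fprintln(os.Stderr, "cannot parse grype output:", err)
		os.Exit(2)
	}

	bad := 0
	for _, m := range report.Matches {
		if blocking[m.Vulnerability.Severity] {
			fmt.Printf("blocking vulnerability %s (%s)\n",
				m.Vulnerability.ID, m.Vulnerability.Severity)
			bad++
		}
	}
	if bad > 0 {
		os.Exit(1) // non-zero exit blocks the pipeline step
	}
}
```

A non-zero exit from a step like this is what lets the build system stop a vulnerable image from ever reaching the registry.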
So after we had some guardrails in place in both the source code management tool and the build system, we wanted to check all the images that will be deployed to our Kubernetes clusters. There are a lot of use cases here. For example, if an image is not signed, or it is from an untrusted registry, or it still contains unresolved vulnerabilities, we can block it, or at least inform engineers about those violations. To achieve this deployment-time verification, we utilize dynamic admission control in Kubernetes. It allows us to implement an HTTP webhook that receives admission requests, checks them against predefined policies, and finally decides whether or not to deploy a resource. In practice, we use both a Yahoo proprietary webhook and Kyverno to achieve our goal.

So let's talk about some of the checks we do at deployment time. The first one is ensuring that provenance exists for images. Provenance is a set of records that tell you where an image comes from. It can include information like who committed the code, which repo and branch the image was originally built from, build information, and so on. We collect our provenance data from GitHub, Screwdriver, and the registry, which covers all three major steps in our software supply chain, and the data is stored in a provenance database. This provenance store is based on Grafeas, an open source project that standardizes a provenance format along with an API and a backend implementation.
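Before the demo, here is a minimal sketch of what such a validating admission webhook can look like, written against the Kubernetes `admission/v1` types. It is an illustration only: the provenance lookup is a hypothetical stand-in for a query against the Grafeas-backed store, and a real webhook would also need proper TLS and error handling.

```go
// Minimal validating admission webhook in the spirit of the
// provenance check described above.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// hasProvenance is hypothetical: a real implementation would query
// the Grafeas-backed store for source-repo, build-job, and registry
// records for this image and apply policy to them.
func hasProvenance(image string) bool {
	return false // placeholder: no provenance found
}

func handle(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil || review.Request == nil {
		http.Error(w, "bad admission review", http.StatusBadRequest)
		return
	}

	var pod corev1.Pod
	_ = json.Unmarshal(review.Request.Object.Raw, &pod)

	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	for _, c := range pod.Spec.Containers {
		if !hasProvenance(c.Image) {
			resp.Allowed = false
			resp.Result = &metav1.Status{Message: fmt.Sprintf(
				"no source repo or build job provenance found for %s", c.Image)}
			break
		}
	}

	review.Response = resp
	_ = json.NewEncoder(w).Encode(review)
}

func main() {
	http.HandleFunc("/validate", handle)
	// Real admission webhooks must serve HTTPS with a certificate the
	// API server trusts; plain HTTP here keeps the sketch short.
	_ = http.ListenAndServe(":8443", nil)
}
```

Kyverno covers many checks declaratively; a custom webhook like this sketch is the kind of component that allows organization-specific lookups such as the provenance query.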
So here is a demo showing how the provenance check can protect you from some attacks. In this demo, we will deploy three images. The first one has no provenance. The second one has complete provenance, but it is problematic. The third one has complete, valid provenance. We designed three policies to verify these three image deployments correspondingly; the first two images are the malicious cases, and the third one is the positive case, a good case.

So this is the first one. This image was uploaded directly by an attacker to the registry without passing through the expected stages of the pipeline. In this case, our webhook cannot get any provenance data from our database, so it rejects the deployment. In the error message, you can see there is no source repo or build job information found. The second image was built and uploaded from a forked repo with malicious changes. In this case, even though the image has complete provenance, the provenance shows the image originally came from a forked repo, not a trusted repo, so our webhook still rejects it based on our policy. As you can see in the error message, it says the source repo mismatches: we find it is from a forked repo, but we expect it to come from a trusted repo. OK, lastly, this image was built from a valid source and pipeline and finally lives in a trusted registry. In this case, the image has complete provenance, and the provenance information shows no problem, so our webhook allows the deployment. That's the end of this demo.

The second check is the image signature check. We want to make sure all images being deployed to Kubernetes clusters were signed in the build system, because that helps us verify the integrity and the publisher of the images. Yahoo currently uses self-managed, long-lived keys to sign images, and this signing flow has been integrated into most of our standard templates that build and publish images. Actually, this signing mechanism has existed for a long time, but there was no enforcement to verify the signature before deployment, so this check also fills that gap. Here's a demo for this. Firstly, because the signature check is integrated directly into the webhook, there is no policy to show as in the previous demo. In this signature check demo, we will deploy two images. The first one was not signed in the build pipeline, so our webhook cannot find a signature attached to this image in our signature database, and it rejects it. The second image was signed in the build system, so our webhook allows the deployment.
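As an illustration of the same idea with open tooling, a key-based verification step might look like this small Go sketch, which shells out to the Cosign CLI (the tool the team mentions adopting later). The image name and key path here are hypothetical.

```go
// Sketch of a key-based signature check: shell out to the Cosign CLI
// and treat a non-zero exit as "unsigned or invalid".
package main

import (
	"fmt"
	"os/exec"
)

func verifySignature(image, pubKeyPath string) error {
	// Equivalent to running: cosign verify --key <pubkey> <image>
	cmd := exec.Command("cosign", "verify", "--key", pubKeyPath, image)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("signature verification failed for %s: %v\n%s", image, err, out)
	}
	return nil
}

func main() {
	// Hypothetical image and key path, for illustration only.
	if err := verifySignature("registry.example.com/app:1.2.3", "cosign.pub"); err != nil {
		fmt.Println("reject:", err)
		return
	}
	fmt.Println("allow: signature verified")
}
```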
OK, so here comes the third check, the image freshness check. The motivation behind this check is to encourage people to upgrade their images regularly, because an older image will usually have more vulnerabilities. If you haven't updated your image for a long time, by the time you have to make a security fix, you may find the patch delta is so large that the change is difficult and risky to make. Lastly, regularly updating your image means regularly triggering your build pipeline, so you notice build pipeline issues in time. To achieve the image freshness check, we utilize Kyverno and Kyverno policies. Here is another demo, for the freshness check. Firstly, this is the Kyverno policy for this check; you can see it simply blocks any image built more than six months ago. You can also find this policy on the official Kyverno website. OK, this demo will also deploy two images. The first one was built about a year ago, so it is too stale; in this case, Kyverno rejects it simply because it was built more than six months ago. The other image was built within the last month, so Kyverno allows it.
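The Kyverno policy enforces the cutoff declaratively; as an illustration of the underlying logic, here is a minimal Go sketch, assuming the image's `created` timestamp has already been fetched from its OCI image config (fetching it is omitted).

```go
// Core of an image-freshness check: reject images older than the
// cutoff, as the Kyverno policy in the demo does.
package main

import (
	"fmt"
	"time"
)

const maxAge = 6 * 30 * 24 * time.Hour // roughly six months

func isStale(created time.Time) bool {
	return time.Since(created) > maxAge
}

func main() {
	yearOld := time.Now().AddDate(-1, 0, 0)  // like the first demo image
	monthOld := time.Now().AddDate(0, -1, 0) // like the second demo image
	fmt.Println(isStale(yearOld))  // true  -> reject
	fmt.Println(isStale(monthOld)) // false -> allow
}
```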
OK, so the checks we've shown you at deployment time, and even at build time, are by no means the full story. We could choose to evaluate nearly arbitrary policies given the necessary data. For example, to achieve a vulnerability check at deployment time, we are actually ingesting vulnerability data into our Grafeas database so that our webhook can fetch and validate the vulnerability data along with the provenance data. And the next two checks can both be achieved by applying a single Kyverno policy. OK, so that's the end of the details and demos. Now my colleague Hemil will summarize our work so far.

Thanks, Yonghe. So, just a brief summary of the tools we have been using and the timeline of how we got here. We started our journey around mid-2020. When we started, not a lot of open source tools and solutions were available, so we built a lot of in-house tooling and used Grafeas as the provenance store. By early-to-mid 2021, we were able to collect most of the source provenance and around 10% of the build provenance through our build pipelines. Later in 2021, we open sourced grafeas-rds, which adds Postgres database support to Grafeas. In early 2022, Sigstore had emerged as a really promising solution for both signing container images and attesting software, so we started exploring it internally for some of our build pipelines. While doing that, we were also working on adding the deployment checks Yonghe showed earlier, using both Kyverno and Yahoo's proprietary policy checks. Later that year and at the beginning of this year, we incorporated some of the Cosign functionality into our existing build pipelines, and we are also working on GUAC for visualizing the SBOM data generated by Syft and Grype.

So what did we learn from our journey so far? Given the hybrid environment at Yahoo, with different teams working on different projects with different tools, we faced quite a few interesting challenges, and a few lessons were learned along the way.

The first one is to enhance existing developer workflows automatically. If the tool being built can do its job behind the scenes, or if it can be integrated with the existing tools themselves, it reduces the onboarding and learning effort for developers. By doing that, we reduce the friction between the security team and the development teams, because they don't need to update their projects on a regular basis; it can all be done behind the scenes. This also reduces our time to production. An example of this lesson: when we started collecting provenance, as mentioned earlier, we used Grafeas. We integrated source metadata collection into GitHub webhooks, and for the build metadata, we integrated with the build teardown steps, so any build that runs at Yahoo sends its provenance to the Grafeas store. That gave us around 70% to 80% of the provenance; the remainder comes from builds that don't use the standard tooling. When we tried to do the same for the admission webhook, the problem was that a lot of individual teams and owners run their own clusters, and in order to add the webhook to their clusters, we had to work with each team individually. That took a lot of time because the priorities were different, and eventually it took us around six months to make the webhook the default.

That brings me to my next point, one of the most important lessons we learned, which is around adoption of tools and services like the admission webhook across the company. Even if there is only a small portion of the company that is not onboarded to these tools, we may still be susceptible to software supply chain attacks, so we need to ensure that everyone in the company gets onboarded to the admission webhook. As an example, we made the webhook the default for all Kubernetes deployments, but there were some other teams using non-standard tools or different deployment setups. For them to be onboarded, we had to either create a Helm chart so they could install it manually, or we needed to get behind them to make sure they could actually adopt it. It's still a challenge, and we are still working on increasing adoption within the company.

A follow-up to that is ensuring there is enough visibility for the project. We need to convey the business value to the execs, so periodic status updates to the stakeholders are important. Along with selling the overall mission, we also need to demonstrate the incremental value of the project.

The next one, and an important one, is embracing open source technologies. A lot of the solutions and tools we use for software supply chain security are already available as open source, so embracing open source technologies is quite important. We should engage with the open source community and try to make meaningful contributions; that builds expertise and also gives back to the community.

And the last one is around continuous feedback. Performing continuous testing on your solution and getting periodic feedback from stakeholders is important; ensuring the requirements and use cases are covered is key to the success of the project. This project has truly been a team effort, and there are a few folks we would like to thank for their guidance. Thank you.

[Q&A]

Sure, so OPA is a great tool, and it's quite powerful. When we started our journey, we started working with OPA, but as we proceeded, we realized that OPA is very powerful while our needs are not as granular as what OPA offers. Also, Rego requires some expertise to learn, so it wasn't easy to adopt and to add more policies; at the end of the day, it's not just the security folks who would be adding policies, and we wanted any cluster admin to be able to add policies to their own clusters. Our in-house webhook and the Kyverno policies are quite intuitive and easy to learn. In the future, we may revisit OPA if we see the need.

For the provenance check, as I showed in the demo, there are basically several checks. We can check the source repo information, the build pipeline information, and which registry the image comes from. And beyond provenance, there is another kind of data stored in the Grafeas database: vulnerability data. We can fetch the vulnerability data along with the provenance data; it is not provenance itself, but we can check it alongside the provenance. We can set policies to block on a certain vulnerability, like a severe CVE. So basically there are four kinds of checks at deployment time: three for provenance, one for vulnerabilities.

I can take that. So, internally we mirror most of the open source packages and repos. Obviously those don't go through our build pipelines, because they are open source, so we won't have provenance for those builds; but for those artifacts, making sure they are internally signed is what we verify as part of the provenance check.

That's a good question. It's always a challenge to get a lot of folks working on things like this, because unless you are attacked, there's not a lot of value you can prove to the execs. I would say around four to five people are working on this, and not full time; it's not the only project they work on. It varies, and I would say around 60% of their time has gone to this project over the past three years or so. We are the Paranoids security team, and within that we are the engineering pillar: we work primarily on building security and DevSecOps tools to make sure the infrastructure is set up, and around four to five of us work on this part. I want to add to that: we have already finished most of the coding and infrastructure-building part, so now we are more focused on how to increase the adoption numbers, how to collect those numbers from our systems, and how to surface them to inform engineers so they can improve their systems.

Good, yeah. So when a CI/CD job tries to push an artifact it generates to Artifactory or an OCI registry, we need to explicitly tell Artifactory to only allow an artifact generated by pipeline X, rather than pipeline Y, to be pushed to it. If, say, there is a Node package that is authorized to be created by pipeline X, Artifactory has that information, so we can map the image ID to the pipeline ID, and that's how we ensure that only images that are so marked are allowed to be published as part of the pipeline. Does that answer your question?
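As a conceptual illustration of the lookup just described, with hypothetical types and data rather than Artifactory's actual API, the publish-authorization check reduces to a mapping from artifact to the pipeline allowed to publish it:

```go
// Conceptual sketch of publish authorization: only the pipeline
// recorded as an artifact's owner may push it. Types and data are
// hypothetical simplifications of what the registry stores.
package main

import "fmt"

// authorizedPipeline maps an artifact name to the pipeline ID that is
// allowed to publish it.
var authorizedPipeline = map[string]string{
	"node-pkg-foo": "pipeline-X",
}

func mayPublish(artifact, pipelineID string) bool {
	owner, ok := authorizedPipeline[artifact]
	return ok && owner == pipelineID
}

func main() {
	fmt.Println(mayPublish("node-pkg-foo", "pipeline-X")) // true: allowed
	fmt.Println(mayPublish("node-pkg-foo", "pipeline-Y")) // false: blocked
}
```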
Right, Grafeas has authorization. Internally, we use a product called Athenz; it is also an open source product, a kind of RBAC system, and it is also a CNCF project, so you can find it on their website. Every time we send provenance to the Grafeas database, we actually send an authorization request to Athenz first; if the caller gets that permission, it can put the provenance into the database, and otherwise it cannot. So yeah, that's what we did.

OK, next one. Yeah, I think those decisions were made way before we actually did this, and that signing setup has been around for a long time. But now, with Cosign and Sigstore being widely adopted, we are exploring solutions with either short-lived keys or ephemeral keys, going the keyless route, to use instead. Firstly, it could just be using Cosign with long-lived signing keys, and then we can gradually add the required infrastructure to achieve keyless signing. It's not an easy path, and we will gradually work towards it.

No. Yeah, I don't think it's a requirement, and I may be wrong again, but we are working with the right folks within Yahoo to make sure we change it, because it's been around for so long, and the open source tools have only been arriving recently, so there's a learning curve to the change as well. That's what we're working on.

Sorry. So, we've just started collecting those and doing the vulnerability checks. The plan is to tie the SBOMs to the freshness policy: since we ensure that an image or artifact cannot be deployed once it is more than six months old (a stale image), keeping an SBOM older than six months may not add value, because we won't have any running artifacts older than six months. So the plan is to keep them for a year or so, because teams may have exceptions and be allowed to deploy an older image. But that's more of a policy question than a developer or engineering question, I feel. And we are not sending the vulnerability or SBOM data from build time to any database right now; we do collect the vulnerability data, but it comes from an asynchronous source, not from the build stage. Go ahead.

So, to answer your first question: we use Grype and Syft internally to generate the SBOM and vulnerability data, and currently that spits out a lot of information about vulnerabilities, ranging from critical to medium level. We use that information along with the information generated by the vulnerability team within Yahoo, which may determine that not all vulnerabilities are critical to Yahoo and can shortlist them. We then decide, based on the vulnerabilities Yahoo marks as critical and on what Syft generates, whether to block that image or just report it and continue processing. Right now it's implemented internally, because this is a build-time check, not a deployment-time one. But I think we are out of time; thanks for your question. You had a follow-up question as well? We can.

So it's based on create time; right now the check uses the image's create time, and we have a plan to check the create time of every component as well. We are still working on that and don't have a final solution yet. We can take the remaining questions offline; I think we are out of time. Yeah, yeah. Sorry, can you repeat the question once more? So, we currently integrate the sd-cmd that Screwdriver provides into all the standard templates. Before an image gets published to Artifactory, that sd-cmd runs in all of the existing pipelines, so the image gets scanned and we determine whether to proceed with publishing it or not. As mentioned, we use Grype and Syft to get that information, plus some proprietary code to decide whether to push or not. I think we are done. Thank you, guys. Thank you, guys.