Good afternoon, everybody! There are an awful lot of talks here, so thank you for taking the time to attend mine. My name is Joshua Lock, and I am going to try and provide an acronym-free introduction to software supply chain security. Let's start with the steps that go into producing a piece of software: you'll have someone writing some code, they'll push the code somewhere, hopefully a revision control system; the code will get picked up and built, hopefully by some kind of automation, and it will pull in a bunch of dependencies, increasingly many dependencies in this day and age; and it will produce some kind of packaged artefact that it then sends somewhere to be run.
And this is a nice simple model of our software supply chain, but of course in reality they are much more complex: each dependency we pull into our package typically has its own set of dependencies, and there are additional complexities in the boundaries of where our common tools can observe. The language package managers we typically interact with when we're building software don't connect to the operating system package managers that provide the base images we're running our software on, or to the infrastructure that sits underneath that. So even this diagram is a relatively simple view of a software supply chain. And if we want to think about securing a software supply chain, it's useful to have a shared understanding of what that means. Most people, when they talk about software supply chain security, are really thinking about how they can protect against unintended modifications of some software. I really like this picture: it's tamper-resistant tape on some medication, and this is how I think of software supply chain security. We want to prevent someone from tampering with the software in production, or, if we can't do that, we'd like to know that someone has tampered with it. With the tape here, if someone's opened it, you have clear evidence that the thing has been opened and tampered with before you've received it. So how does that map to our model software supply chain? There are innumerable places where we can tamper with the software in production: the developer's device might be compromised, or the developer themselves might be compromised and introduce some malicious changes; the revision control repository could be compromised, or the system that builds the software. Basically, what I'm saying is that any step here is an opportunity to introduce some unintended changes, and all of the links between those steps as well
are opportunities for a malicious actor to do some tampering. One thing that is increasingly recognised, but for a very long time wasn't, is that the devices, the machines, the infrastructure we're building the software on have to be cared for with the same level of attention as where we're deploying it. So, with that very high-level overview of software supply chains and securing them, I want to do a brief tour of what can go wrong when a software supply chain is compromised, and give some examples of how these compromises manifest by looking at some real-world incidents from previous years. This is a really interesting one, actually six years old now: in 2018 the npm package event-stream was compromised. The maintainer of the package had less time to spend on it, and another contributor came in, made really high-quality contributions, and eventually became a maintainer of the package. A little while into their maintenance they introduced an additional dependency, and if you looked at the source code for this dependency it all seemed fairly innocuous, but that dependency had additional malicious content that was included in the package on the registry and wasn't in the revision control system. The way this manifested is that people who installed this dependency were installing some malware that would look for a specific crypto wallet and effectively exfiltrate your cryptocurrency to the attacker. It took about two months to discover this compromise before it was fixed. In a similar timeframe there was the Webmin project, which isn't a very cloud native project, I'll admit, but it's a web interface for doing server administration. They had a build machine on which they would build their project for release, and that machine was effectively under some contributor's desk, or a similar setup. Someone was able to get to that machine; it's obviously a high-value target if
you can compromise a bunch of servers for free. So someone was able to get onto this build machine and introduce a change in the packaging step, and nobody discovered this for around 18 months; every release of this project for about a year and a half had this compromise in it, where the attacker could basically just get onto a bunch of servers on the internet that were running Webmin and do who knows what. And then a really big one: in the winter of 2020 a lot of people working in security didn't have a Christmas, effectively, because the SolarWinds Orion product had a compromise that was delivered to all of their customers: Fortune 500 companies, federal government, lots of places where you don't want data exfiltration to happen. The way this attack was delivered was that a nation-state attacker figured out that they could get onto the build server for the SolarWinds corporation and substitute a modified source code file at build time. This file lived in the malware's memory, was substituted in at build time, and I don't think it was ever actually persisted to disk, so it was really hard to detect. I think it took around a year before people noticed there was extra traffic coming out of their systems, and yeah, a lost winter, which is sadly a recurring theme in supply chain security. Another one, almost the last, one more after this and then we'll get back to what we can do about it and take the power back. This was an interesting one: Codecov is a tool for doing coverage analysis of software code, how well the code is covered by tests, and they have an uploader script that would upload your coverage results to their server and do fancy analysis with nice graphs and everything. You could get this script through a Docker image to integrate into your container pipeline, and the Docker image build process wasn't verifying that the script they were downloading from their own servers was the script they expected. Someone was able
to compromise their servers, put in a substitute script with some malicious content, and then Codecov would build this into the container images which they recommended their customers use. So for around three months, anyone who was using Codecov had put this malicious content into their CI pipeline, where it could see the environment variables, the source code, and probably get access to the data stores, all of this stuff that's part of your CI pipeline that you don't want random people to be able to access, just kind of unwittingly exposed. And then finally my favourite one, which isn't actually a security compromise, but as with many software security issues, I think there's a strong overlap between software quality and the rigour of our engineering practices, and the security of the software we deliver. The EV vendor Rivian delivered an infotainment update that broke their systems, because someone fat-fingered the build and pushed out a build with the wrong security certificates in it. So all of their EV systems that were deployed to customers suddenly couldn't use the infotainment system, because they basically had a test certificate. I have to scratch my head, and probably cry into my pillow, thinking about how you accidentally release that to all of your very expensive electric vehicles, and even describing it as a fat-finger implies a very manual process for releasing this software that, yeah, really worries me. Don't buy cars, I guess. So what can we do, as software producers, as people involved in software production, to secure our software supply chains? Many people get into software as a creative act, and I love doing creative software development, but I think the creativity should be at your editor, and everything that happens after that should be as bland as possible. So I'm going to make a very tortured analogy to fast-food franchises as I talk
about software supply chain security for the next ten minutes or so. Basically, the thesis is that your software production should be as predictable as a fast-food franchise: I can go to a coffee chain anywhere in the world and get an equally mediocre coffee; that consistency is replicated across the franchise, and you want the same thing for your software. Another way of putting it is that the creativity happens when you're preparing the menu, not when you're preparing the individual meals. So the first thing that any food-handling organisation should do is practice good hygiene, and in software terms this is what I like to refer to as covering your assets. You need to have strong hygiene around your secrets; this is going to be a synthesis of things like using passkeys, multi-factor authentication, short-lived secrets, and not storing your secrets in your revision control repository, because that is a common way people find credentials for getting into networks. Equally, we have to practice infrastructure hygiene: the configuration and deployment of our infrastructure should be highly automated and should ideally follow principles of least privilege. And, this is a fairly big ask, but if you can have two-party review on your infrastructure changes, the same way you might on your code changes, and have someone verify that the infra change is what you expect, you'll catch a bunch of human error and potentially malicious changes. The final piece is that we need to make sure our infrastructure, our laptops, and our development tools are all kept up to date, so we don't have any known security vulnerabilities in those. Following these good hygiene practices will help mitigate credential leaks, prevent the exploitation of known exploitable vulnerabilities, and also help protect against human error, which is a frequent source of security problems.
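One piece of the hygiene advice, keeping secrets out of revision control, can be partially automated. This is a minimal sketch of the kind of pattern matching that dedicated scanners like gitleaks or trufflehog do at much greater depth; the two patterns here are illustrative only, not an exhaustive or authoritative rule set:

```python
import re

# Hypothetical patterns; real secret scanners ship hundreds of these.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    # key/secret/token assignments with a quoted value of 8+ characters
    re.compile(r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{8,}"),
]

def scan_for_secrets(text: str) -> list:
    """Return the suspicious substrings found in `text`."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits
```

A check like this wired into pre-commit or CI catches the accidental paste before it ever reaches the repository, which is far cheaper than rotating a leaked credential afterwards.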
The next thing you want to do is have a good understanding of the ingredients that you're including in your software. We want to make sure that the software components we pull into our build are coming from reputable suppliers: trusted upstream repositories like well-known Linux distribution vendors, and making sure that the things we're pulling in are from the original developer we expect, the canonical upstream authority. It's really easy to accidentally pull in the wrong thing, especially if, like me, you speak a variant of English that is different from American English and you spell words differently: you might think you're introducing a package into your dependency chain like colorama, but you've put in that u we Brits like to use, and suddenly you've got colourama, something with a very different intended outcome. So we need to make sure that what we're pulling in is what we expect. There's a bunch of great work happening to make that easier, but some of it is just people taking the extra few minutes to look at the source of what they're pulling in. And once you are pulling in dependencies, once you're sourcing these ingredients, you want them ready at hand while you're building your software. Reaching out to the internet during your build process is rife with risk: the original upstream location may disappear for whatever reason, or it may get tampered with. Having some kind of artifact store caching your dependencies, within your control, gives you a lot of risk mitigation and can provide something of a fire door against the upstream being changed unexpectedly. And of course fresher is better, so keep up to date with your upstream; don't pull in a dependency, leave it in your build for three years, and never take the newer versions that include a bunch of fixes. I've put a little asterisk here because volatile dependency ecosystems like npm or PyPI make it hard to keep up to date.
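The artifact store idea can be made concrete with digest pinning: record the hash of each dependency when you first vet and cache it, and refuse anything that doesn't match. A minimal sketch; the artifact name and pinned digest here are hypothetical (the digest is simply the SHA-256 of the bytes b"test", so the example is self-contained):

```python
import hashlib

# Hypothetical pin file: artifact name -> expected SHA-256 digest, recorded
# when the dependency was first vetted and added to the cache.
PINNED_DIGESTS = {
    "example-lib-1.2.3.tar.gz": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def verify_artifact(name: str, content: bytes) -> bool:
    """Return True only if the artifact's digest matches its recorded pin."""
    expected = PINNED_DIGESTS.get(name)
    if expected is None:
        return False  # unknown artifacts are rejected, never trusted by default
    return hashlib.sha256(content).hexdigest() == expected
```

Several package managers offer this natively (pip's hash-checking mode with `--require-hashes`, Go's go.sum); the point is that the pin lives alongside your code, so a swapped upstream or mirror changes the digest and fails the build.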
If you try to keep up with all of the changes in your dependencies, you're probably going to be applying them on a daily basis, so you want to think about what is a reasonable cadence for your organisation, every three months, every six months, monthly, whatever makes sense, and then have some kind of automated and followable process in place to do that. But you really want those fresh ingredients; don't let your ingredients rot. That will help prevent use of compromised dependencies, provide that firebreak to the upstream, and prevent you from being exposed to known exploitable vulnerabilities. On this slide I have the term software bill of materials. I wanted to avoid all acronyms, but SBOM is such a term of power already that it was kind of unavoidable, so I will introduce you to that acronym: a software bill of materials is effectively the ingredients list for your software project, which things go into producing your software and where they came from. That's really useful information for your own team, and it's also increasingly desired by your downstreams, the users of your project. The next theme of advice is that, like a fast-food restaurant, the default configuration of your dependencies, container images, and software project is going to be what most people choose, so I'm really encouraging everyone to think hard about what they include in their software projects and try to ensure they're only using what they need. There's a Go proverb, "a little copying is better than a little dependency", and I undoubtedly get questions every time I use it: I'm an open source person, I've been working in open source for 15 years, and people hear me say that and ask, so you're telling me not to use open source, to write everything myself and not have dependencies? That's not the case, but I think we have to be more considerate about the number of dependencies we add to our software.
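As a toy illustration of that ingredients list, the sketch below emits a minimal SBOM-like record. Real SBOM formats and generators (SPDX, CycloneDX, and tools built on them) capture far more: licences, digests, and relationships between components. This just shows the core idea of recording what went into a build and where it came from; the field names are my own, not any standard's:

```python
import json

def make_ingredients_list(dependencies):
    """Build a minimal, SBOM-like JSON record of a project's ingredients.

    `dependencies` is a list of (name, version, origin) tuples. Sorting keeps
    the output stable, so two builds with the same inputs produce the same
    document, which matters once you start diffing or signing these records.
    """
    return json.dumps(
        {
            "components": [
                {"name": name, "version": version, "origin": origin}
                for name, version, origin in sorted(dependencies)
            ]
        },
        indent=2,
        sort_keys=True,
    )
```

In practice you would generate this from the lockfile or the build itself rather than by hand, so the record reflects what was actually built, not what someone remembered to write down.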
The more dependencies we have, the more attack vectors there are and the more maintenance churn we have. Fundamentally, open source software is free of monetary cost, but we pay that cost in maintaining those dependencies and keeping up to date with the upstream, so you have to think about whether this is a cost you want to pay. In a similar vein, when you're building container images, base images, VM images, starting small and building up is really hard to do; it's much easier to just have a big image that includes everything you might need. But those things have a tendency to get shipped into production, be a nightmare to update, and present a very big threat surface, and they also just mean much higher storage and transfer costs for your artifacts. Sadly there's no perfect tooling for this today, but there are a bunch of open source projects you can learn about at this conference that help make it easier to build container images more suited to your specific task, and to streamline adding additional content at a later date. Chainguard are a software supply chain security start-up, and their CTO coined this term, the principle of minimalism, that I really like: they say the default should be the lowest common denominator of what you actually need. You can apply this principle across your software production to help ensure you're not introducing a lot of risk to your projects by adding a bunch of stuff you might need later. By following the principle of minimalism we reduce attack surface, we help focus our remediation efforts, and we probably get cost benefits in storage and transfer as well, so a nice little bonus there. And finally, I already hinted at this one: like a fast-food chain, consistency is really crucial in software production. It helps us to be more secure, it helps us avoid fat-fingering a build and releasing something we didn't intend to, and it helps us have a much more stress-free software development process.
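Consistency breaks down through surprisingly small sources of non-determinism. A sketch of one classic culprit, archive timestamps: the same inputs produce bit-for-bit different artifacts unless the build pins them (the file contents here are arbitrary, purely for demonstration):

```python
import hashlib
import io
import zipfile

def build_archive(files, timestamp):
    """Build a zip archive in memory, stamping every entry with `timestamp`
    (a (year, month, day, hour, minute, second) tuple)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # sorted: iteration order must not leak in
            info = zipfile.ZipInfo(name, date_time=timestamp)
            zf.writestr(info, files[name])
    return buf.getvalue()

files = {"app.py": "print('hello')\n", "README": "demo\n"}

# Same inputs, different build times: the archives differ bit for bit.
a = build_archive(files, (2024, 1, 1, 0, 0, 0))
b = build_archive(files, (2024, 6, 1, 0, 0, 0))

# Pinning the timestamp to a fixed epoch restores reproducibility.
c = build_archive(files, (2024, 1, 1, 0, 0, 0))

print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # False
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(c).hexdigest())  # True
```

Timestamps are only one of many such sources (file ordering, build paths, compiler randomisation), which is part of why full reproducibility is so hard to reach in practice.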
If we can have a recipe that enables us to rebuild our software on demand with controlled steps, that gives us a repeatable build, and that, in my opinion, is table stakes for software development. The other aspects of this consistency notion are increased effort for increased reward, and many people will stop after that initial repeatable build, but I like to refer to builds as being either repeatable, rebuildable, or reproducible. A rebuildable build is one where you can replicate the build with the exact same ingredients, regardless of whether the latest tag moved on your container images or the upstream released a newer version of your dependency. And then having a reproducible build process, where you can reproduce a bit-for-bit equivalent binary, has a lot of security and debuggability benefits but takes a lot of effort. There was an article published recently where a project that specifically set out to provide reproducible builds found that, five years later, they couldn't reproduce a bunch of the things they had built, and to me that indicates how hard this is. There are so many sources of non-determinism in software production nowadays that even if you think you're controlling for them all, you're probably not, and that's why I've put here that reproducibility is hard; it's not worth it for everyone. But there are significant benefits to this consistency notion: you can prevent and detect mishaps, make it much easier to replicate and debug your software, and make modification of that software more detectable. So, given the tortured analogy and the distracting pictures, I wanted to summarise the things I just said; this slide provides the four things we talked about in one view. When you're doing your software production: practice good hygiene (infrastructure, secrets, development tools); know what you're including in your software; have it
at hand; try to use as little as possible, only what you need; and ensure you have consistent processes. So that was my whistle-stop tour of the fundamental principles I think you need to be aware of to produce a secure software supply chain. What can you do now that you have this grounding? Fortunately, at this conference there are a bunch of open source projects that work in this space. I couldn't list them all, so, selfishly, I've listed those which I have had some involvement in. The in-toto project is a CNCF project that enables you to rigorously define a supply chain that has end-to-end integrity, and there are several talks at this conference about in-toto. The Update Framework is a CNCF project for doing secure content delivery, typically of software updates, but we've actually seen some ingenious applications of this technology to legal documents and to the root certificates for a trusted system. The Update Framework is at the conference as well; there'll be a few talks, and both in-toto and TUF have a kiosk in the project area, so you can come and talk to developers of the projects. Then, outside of the CNCF, the Supply-chain Levels for Software Artifacts project is a set of standards and controls to prevent tampering with software in production, pretty much what I've been talking about here, but with a much more rigorous, or at least semi-formal, definition. I highly encourage you to check out that project if you've got a team managing your infrastructure that would like to secure it further. And the Sigstore project is a project to help you sign software artifacts without having to manage keys, which is really the most difficult part of signing: managing your keys in a way that doesn't open you up to more risk by leaving your YubiKey on a tram or accidentally putting your tokens in your git repository. All you have to be able to do is authenticate against your email account, and they will give you a short-lived
certificate that can be used to sign a software artifact. It's a very cool project out of the Open Source Security Foundation. Then, finally, the Secure Supply Chain Consumption Framework, an acronym that I promised not to use, provides some guidance around how to securely consume third-party dependencies into your software, so it gives a lot more detail about some of the things I've only touched on in this talk. And then there are some reference materials for people who are interested. The CNCF has a Security Technical Advisory Group where a bunch of leading-edge thinkers in software supply chain security produce really useful artifacts. There's a catalogue of supply chain compromises, which covers many of the attacks that have happened and which bits of the system were attacked, starting to classify them and provide a really good corpus of information. There's a white paper on supply chain security, very detailed, a really good read if you've got the time to go through it. The same team then produced a follow-on paper describing how to mitigate a bunch of the problems they talked about with a software factory process, and that's a much more hands-on document describing how to address these concerns. And I've linked to the Chainguard blog on the principle of minimalism because I was very fond of it. The final thing I want to say is that software supply chain security, even when you talk about it for 25 minutes, is a big problem space. There's lots you could be doing, and it's really easy to become overwhelmed by the scope of the work. So the thing I like to say is that security is not a boolean property; it's not either secure or insecure, it's always fuzzy, you are only ever more secure. If you can do one thing, you will help improve the security of your software supply chain. So I want to encourage all of you to do one thing, and to let you know that taking that one
step can be pretty easy. I'm confident that you've got this, so please do that. I'm coming in a couple of minutes early, I think, but that is the end of my content, so thank you for listening. If you've got any questions, I'd be happy to try and answer them. Yeah, so one question here: what would be your recommendation if we're building artifacts and so on? You recommended pinning versions, but sometimes you have this trade-off where you need to decide between pinning the specific hash that you're using, which is very secure and very stable, versus pinning the major version, which gets you the fixes and the patches, or pinning a more specific version; what's your take on that? Yeah, great question. So, when you're thinking about ensuring that you have deterministic inputs to your build, you can pin the major version of a release or the specific version, a patch version, and I tend to favour pinning the specific release very explicitly, then having a process by which I can get informed about newer versions, and having that automated. If you're on GitHub you can use Dependabot; it's pretty great. You can configure it to tell you every week or two weeks which of your dependencies are out of date, and it will even write a patch for you to include the newer versions. This is where there's that overlap between security and quality and rigour in software engineering that I mentioned at the beginning of the talk: if you've got tests and you can take changes in with some level of confidence, then having this automated process in place is a relatively low overhead. So I'm in favour of, effectively: pin your dependencies explicitly, have tests that help you ingest newer versions with confidence, and then have a process which preferably automatically pulls in those newer versions.
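The "pin explicitly, then automate updates" advice can also be enforced mechanically. This is a toy check in the spirit of that answer, flagging Python-style requirement lines that aren't pinned to an exact version; real tooling (pip-compile, Dependabot, lockfiles) handles this far more thoroughly:

```python
def find_unpinned(requirements):
    """Return requirement lines that are not pinned to an exact version.

    A line counts as pinned only if it uses `==`; ranges (`>=`, `~=`) and
    bare package names are flagged for review. Comments and blank lines
    are ignored.
    """
    unpinned = []
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

Run as a CI gate, a check like this makes the pinning policy self-enforcing instead of relying on reviewers to spot a stray version range.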
A very fast follow-up on that one: if we're going to use Dependabot and all those tools, is there any recommendation? Because sometimes developers get overwhelmed by all of the Dependabot updates, and sometimes they just bump the version because, you know, "I'm done with this this week", right? So, I think your question, or maybe statement, was about the frequency of updates, and that's why I had the asterisk on the slides: I think it's very important to define a cadence that is sustainable for your team. If you're a team of two people or a team of ten people, what is sustainable is very different, and it's also affected by which language ecosystems you're using. But the key thing is to define a cadence, have a process, have as much automation as possible, and then follow it. If you think daily updates are too much, I'd be inclined to agree, but maybe once every two weeks is something your team could manage: spending Friday morning with a coffee just reviewing the dependency updates. So set that process in place and have the expectation that the team follows it; make it right, but achievable. Thank you. No problem. Just a quick question: we talked about reproducible builds and so on and so forth; I would like your opinion on systems such as GNU Guix or NixOS, which actually try to implement these things. So, my opinion on GNU Guix and NixOS and tools like that: Guix is the project I referenced in the talk, where I said they set out to be explicitly reproducible, tried to reproduce builds from five years ago, ran into a bunch of issues, and then fixed them. So I'm a huge fan of GNU Guix and NixOS, while recognising that for many teams and organisations that level of rigour is unsustainable and unobtainable. I feel like there's an intermediate ground between
how most people are building software and the fully defined, upfront rigour of NixOS and Guix, but I haven't found it yet, and I haven't even figured out how to delineate it yet; it's something I noodle about on my whiteboard, a bit of a brain worm for me. So I'm a big fan of those systems, but the pragmatist in me recognises that they're probably too much to ask of most people. It's a big leap to get there: you have to be able to define how to package every single component in your software to get the full benefits, and that's a huge ask. Thank you. No problem, thank you. Thank you for the talk. I heard yesterday, in a different talk, about Kubescape; how does this fit in your suggested approach? I have not heard of Kubescape, so I'm afraid I cannot provide a satisfying answer to that question; the cloud native space is ginormous, so sorry, I don't have an opinion on that one. No worries, thank you. All right, thanks, we've got about three minutes I think, so, yes sir. Hi there, thanks for the talk. I know there are a lot of package repositories out there with lots of open source, you know, npmjs and the Maven repository, these kinds of repositories. Is there a good strategy, when you're bringing down dependencies, for actually scanning them for vulnerabilities, or are these repositories inherently becoming more secure because they're applying their own security scanning? Great question. So: is there a good strategy for scanning dependencies you ingest, and are the repositories becoming inherently more secure? Most of the dependency scanning tools can only tell you about known vulnerabilities, which is great, you don't want to be pulling something in that has a known remote code execution, but the NVD base scores for these vulnerabilities are the reasonable worst-case scenario, so we can't always take them at face value. You have to think about how it's integrated, whether you've got sandboxing of
the component, whether you're even using the vulnerable API, and things like that. So the scanning tools provide you metadata that is a good starting point, but not sufficient in my opinion. But yes, the repositories themselves are becoming more secure: they are integrating tools and techniques to make it more likely that the person who uploaded a package is the person they were supposed to be, for example trusted publishers on registries like PyPI. And npm have a great feature they're working on that provides a cryptographic link between the source code used to build an artifact and the artifact itself. Right now, as I mentioned, one of the attacks was that someone uploaded a package that had content that wasn't in the source repository, and these kinds of techniques make that much less likely. So the registries are becoming more secure, and the scanning tools are a good starting point, but there's more that has to happen after you scan the package, effectively. Thank you. No problem, go ahead; I think we've got one minute. Yeah, thanks, thanks for the talk, it was really practical. I think this is related to the previous questions: you've told us how we can improve our security when building software, but how can we evaluate whether the dependencies we are using are applying those principles? Yeah, great question, and we definitely don't have time to dig into that. I don't know that there is a good way to evaluate it. One of the reasons I think it's important for people to be much more explicit about their dependencies is that the only reasonable way to do it today is to individually research each of those projects and see whether you think they are doing software development in a meaningful way. The CHAOSS project, I can't remember exactly what the acronym stands for, but it's community health and something, they define metrics for understanding how healthy an open source component's community is, and
that's a good risk analysis tool for deciding whether to pull this thing into a product that you'll ship for ten years: is it likely to disappear because a single maintainer got fed up? I think that's a good piece of metadata to include in your analysis, but it's not the only piece. There's a bunch of tools out there, like the CHAOSS metrics and the OpenSSF Scorecard, that will rank a bunch of security metrics: whether people are using code review on their pull requests, whether they're doing branch protection, whether people can push directly to the main branch, and all of these sorts of things. But there's absolutely not a single tool that can provide confidence, and a lot of it comes down to the age-old security person's answer, which is that you have to understand what your threat model is and make a risk decision. Ultimately there's a lot of data available; we have to decide which signals are important, and no one's doing that for us today. Yeah, an unsatisfying answer for the end of the talk, but that's all I have, I'm afraid. All right, thank you, thanks everyone for showing up.