So my talk is about learning from supply chain failures and best practices in other industries, not software. A little bit about me: I'm Damian Ginther, and I'm a Distributed Systems Engineer at SuperOrbital. I'm going to talk today about some supply chain failures and practices in the non-software world that we can learn from and apply to our own work in software. I've worked as a sysadmin for over 20 years, moving into what's now called DevOps. During that time I've worn a lot of hats, but security has always been at the core of what I do. I put some of my employers up here, but I've done network administration, incident response, penetration testing, sysadmin work, physical security for a data center, and I've worked for governmental agencies. So if anybody wants to talk about any of that, please feel free to say hello.

So what is a supply chain? This is kind of a silly question at this conference, but you've probably heard about supply chains in other industries: food, goods and commerce, medical, chemical. Entertainment has a supply chain; so do energy and utilities. They all share a lot of common concepts, but the overall definition of a supply chain is here: a network of entities (individuals, groups, resources, and technology) involved in the creation and sale of a product. Or, put differently, the entire system of producing and delivering a product or service.

In the past few years we've seen a lot of growth in the complexity of software and a huge growth in open source software, and we've seen threats emerge in many, many parts of the supply chain for all of that. Focus had largely been on endpoint protection, phishing, and other email issues, and around 2020 it started to become painfully clear that the supply chain for the software everyone was relying on was really becoming a big threat.

Of course there had been supply chain issues before 2020, but in 2020 there was the SolarWinds Orion attack, an infection of a system that built commercial software. An estimated 18,000 organizations, including government entities and Fortune 500 companies, installed the backdoored update, and nine federal agencies and more than a hundred other companies were compromised. The estimated economic impact was about $90 million in incident response, forensics, cleanup, PR, fines, and lost business; that's from a study by BitSight after the fact.

In 2021, PHP suffered a compromise of their self-hosted Git server (and my Keynote just crashed there, I apologize, that was weird). Someone pushed malicious commits with forged developer signatures, and a remote code execution backdoor was added to the PHP code. The remediation included moving to two-factor authentication and moving to hosted GitHub; it turns out maintaining your own infrastructure is difficult. It also impacted the credibility and perception of PHP in a lasting way, in my opinion.

In mid-2021, there were vulnerabilities in the build system for Homebrew, if anybody's ever used that to install software on their Mac, which allowed untrusted actors to inject arbitrary code using a pull request to the repository. It would basically allow anybody to insert any code they wanted into anything in the Homebrew ecosystem. So obviously secure configuration of your pipelines is important, as well as trusting your developers.

And then there's Log4j, which we've heard a lot about at this conference.
There was a vulnerability discovered in the released code which allowed unsophisticated attackers to fully control servers. A huge cross-section of devices and software was affected, and a lot of people didn't actually know they had Log4j in their software at all, because it was largely a dependency pulled in by other packages.

Now, the initial incidents that caused these failures maybe were not preventable. I think a determined attacker can and will eventually find a hole somewhere. But the follow-on impacts, the overall exposure of information, and the cost to remediate could, I think, have been reduced by better supply chain security.

So let's talk about some non-software supply chain concepts. Some of them apply and some don't. Traceability: where did these raw ingredients come from? Transparency: who made the product? What's in the product? What are their qualifications? Safety and quality: in the non-software supply chain, these are things like storage of the product, freshness, and proper packaging, but safety and quality are certainly relevant concepts to apply. Logistics: in this case, we can think of logistics for software as our pipelines and the resource utilization of those pipelines. Tracking distribution: where is the software deployed? What version is deployed? Do I know what dependencies that version has? Can I track that piece of software down quickly if I need to? And after-sales tracking: where is the product in use? Who are my customers?

Here are some things that don't apply so much: production, processing, packaging, storage, physical distribution, those kinds of things. Most of this is for the wholesale and retail sale of physical goods, and there are some parallels you could draw, but it's mostly not applicable to what I'm trying to discuss here.

Think about the last time you ate romaine lettuce: do you know where that lettuce was grown and what farm produced it? (I had this slide here before Brandon showed it in his keynote.) Where did the honey you ate yesterday originate, and what plants did it come from? Why might that be important for allergy mitigation, or for the taste of the honey? As Scott Lee points out in this quote, the public and economic costs of non-transparent supply chains are considerable.

If we apply this concept of transparency to our software supply chain: think about the last time you used a Node or Python package in your project. Do you always go check out their code, and then follow the rabbit hole to look at all the code they included and what their dependencies are? We all know the joke about node_modules being heavier than an actual black hole. How long did your last npm install take, and did you recognize all those packages? How do you determine where you have those products installed, and how do you detect if there's a new vulnerability in them? And how do we make sure we're not that XKCD cartoon, where we build a fantastic new product but somewhere in the dependency chain there's a project that's under-supported, vulnerable, or problematic?
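To make that concrete, here's a minimal sketch of how you might actually look at that tree. It shells out to `npm ls --all --json`, which is a real npm command that reports the full dependency tree; the counting logic, and the assumption that you're sitting in a Node project where `npm install` has already run, are mine.

```python
#!/usr/bin/env python3
"""Sketch: how deep does your dependency tree actually go?

Assumes `npm` is on PATH and you run this from a Node project where
`npm install` has already completed. `npm ls --all --json` is a real npm
command; the walking/counting below is just illustrative.
"""
import json
import subprocess

def count_deps(node, seen=None):
    """Recursively count unique (name, version) pairs under `dependencies`."""
    seen = set() if seen is None else seen
    total = 0
    for name, child in (node.get("dependencies") or {}).items():
        key = (name, child.get("version"))
        if key in seen:  # the same package often appears in many subtrees
            continue
        seen.add(key)
        total += 1 + count_deps(child, seen)
    return total

# npm can exit non-zero on peer-dependency warnings, so don't use check=True.
out = subprocess.run(
    ["npm", "ls", "--all", "--json"],
    capture_output=True, text=True,
)
tree = json.loads(out.stdout)
print("direct dependencies:      ", len(tree.get("dependencies") or {}))
print("transitive (deduplicated):", count_deps(tree))
```

If you run something like this on a typical web project, the gap between the first number and the second is usually the whole point of this talk.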
Let's talk about safety and quality. In many other industries, safety and quality are about the freshness of the product, whether it was stored properly, and how outside factors might have affected it: rotten food, stale food, degradation from exposure to light or moisture. That's particularly a problem in chemical and medical supply chains, but it definitely applies to others. Or, in the case of the entertainment supply chain, the safety of the workers, or the safety of the truck drivers in logistics and transportation.

Much of that doesn't apply to software. Luckily, software doesn't go bad; it doesn't have an expiration date, although you could argue that it probably should. But these concepts do apply. Most of the safety and quality of software comes down to the previously covered traceability, and to your ability as the developer to make sure you're auditing and reviewing what you put in your code, and that you're not including toxic ingredients. Does anybody remember Mr. Yuk? Maybe I'm showing my age. We'll talk later about some supply chain concepts that can help you make sure you're not including those toxic ingredients in your code.

In other industries there are many more physical concerns than in the software supply chain, but we have a lot of corresponding concerns. You can think of software supply chain logistics as: how does my software get from the developer's keyboard into the code repository (you are using a code repository, right?), through a series of tests, to an eventual code merge? How does my software get distributed to the end user? How does it get deployed, and how do I know where it's deployed and how it's performing? We aren't driving trucks or ships here, but there is a pipeline that's analogous to those physical-world concepts.

Here are some common tools, both open source and paid, that you can use for those analogous processes. It's not a full list, but these are tools I've used and thought were good quality. Whatever tools you choose, you must make sure to incorporate best practices: use security at every step, scan during builds, store your code securely, make sure your repositories are secure, and make sure contributors are vetted and aware of security.

Regularly scan your old builds and SBOMs (software bills of materials; everybody's talking about SBOMs) for new vulnerabilities. This is something that was mentioned in the keynote: there's a gap, I think, in the way we handle SBOMs after they're created. We don't want them to just sit there and gather dust. New vulnerabilities are discovered all the time, so it's possible that old SBOMs will show you new vulnerabilities. Scan your software before you deploy it. Use an admission controller to deny untrusted containers. And ensure you have a way to know what's running and where, and keep a record of what you deployed and what version it is, so you can find it in a hurry if you need to.
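Here's a minimal sketch of what rescanning old SBOMs could look like, assuming you archive one SBOM per build under a directory (that layout is my assumption) and have the grype CLI installed. grype really can scan an existing SBOM file directly via its `sbom:` scheme, so you can surface newly published CVEs without rebuilding anything.

```python
#!/usr/bin/env python3
"""Sketch: periodically rescan archived SBOMs for newly published vulnerabilities.

Assumes the `grype` CLI is installed and that past builds' SBOMs live under
./sboms/ (a hypothetical layout). `grype sbom:<path>` scans an existing SBOM
against grype's current vulnerability database.
"""
import json
import pathlib
import subprocess

SBOM_DIR = pathlib.Path("sboms")  # hypothetical archive of per-build SBOMs

for sbom in sorted(SBOM_DIR.glob("*.json")):
    result = subprocess.run(
        ["grype", f"sbom:{sbom}", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    matches = json.loads(result.stdout).get("matches", [])
    severe = [
        m for m in matches
        if m["vulnerability"]["severity"] in ("High", "Critical")
    ]
    if severe:
        # In a real system you'd page someone or open a ticket here.
        print(f"{sbom.name}: {len(severe)} high/critical findings")
        for m in severe[:5]:
            print(f"  {m['vulnerability']['id']} in {m['artifact']['name']}")
```

Run something like this on a schedule and an SBOM from six months ago stops being shelf decoration and starts being an early warning system.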
So let's talk about failure, since it's in the title of the talk. I love failure. I think it's the best way to learn. In the moment it can be pretty awful, and dire consequences can happen. But across a lot of other industries you see things like Toyota's concept of kaizen, or continuous improvement. In order to have continuous improvement, you must be open to failure; you must identify your mistakes and work toward developing solutions. In many ways, this is also what drives agile software development: you iterate, you determine what didn't work, you plan your next moves, and then you move on. In my opinion, failure is what drives creativity in problem solving, and it can lead to a eureka moment, which is very gratifying.

So let's take a look at some failures. Let's talk about why blockchain is going to save the world. Not really. In November of 2018, there was an outbreak of E. coli in the U.S. and Canada which was linked to romaine lettuce. Sixty-two people in 16 states were infected over roughly two months. The source of the outbreak was identified and the produce was recalled, and on the surface it looks like a success. But economic analysis shows that the impact of the outbreak could have been lessened had there been more transparency and traceability for the affected produce. If there had been a system in place that allowed consumers to know where their food was grown, it could have reduced the time to determine the source of the lettuce, narrowed the scope of the recall, and lessened the subsequent drop in lettuce sales. A lot of people couldn't determine whether the lettuce they saw in the store was affected, so they just didn't buy any romaine lettuce at all.

For example, before this event, Walmart's food safety teams could take as many as seven days to trace where food sold in their stores originated, because it was all tracked on paper: it had to be traced through farm, processing, packaging, distribution, and sales. Since then, Walmart has implemented a blockchain-based tracking system, which seems like overkill to me, but I didn't design it. It has reduced the time to determine that entire chain to mere seconds, which I think is excellent. What we can take from this is that the ability to determine where your dependencies came from is vital in detecting and reacting to potential problems.

Another case: the Peanut Corporation of America. Between 2006 and 2009, they were releasing peanuts contaminated with salmonella into the food supply. There were over 700 infections, nine people died, and there was a big cover-up by the people who ran the company. The eventual fallout was federal prosecution of company officials on a 76-count indictment, with successful convictions and a huge amount of fines. During this process, some companies were notified that they had contaminated peanuts, but there was a lack of full visibility into the entire chain of supply. That was not only because of the intentional tampering with the reporting of the contamination, but also because most companies rely on what's called a "one up, one back" approach: companies only know their direct supplier, where they got the peanuts, and their direct customer. So some companies purchased the contaminated peanuts from a third party without knowing that the original source was the Peanut Corporation of America. Even after the recall, they continued to sell the products, because they didn't have sufficient visibility into the entire supply chain.

We as software developers don't have any such excuse. We have the technology: if we were selling peanuts, we could know the individual customers who bought the peanuts, the peanut farmer could know the name of the person who purchased each batch, the customer could know who farmed the peanuts, and each party could know where all those peanuts went from start to finish. I don't think we're necessarily quite there yet, but it's certainly possible with the technology we have. So we can take from this that it's not enough to know what your dependencies are. You also have to know what your dependencies' dependencies are, and what their dependencies are, and so on and so forth. So how do we know where our dependencies come from, and where their dependencies come from? It's turtles all the way down.
You can use an SBOM. What does the software, or the container, or the Helm chart contain? What pieces of software are used in the creation of your software? You want to use a tool that creates a multi-level SBOM, because it's not enough to just know your own dependencies; you've got to know the dependencies of those dependencies as well. And very importantly, in order to lighten the burden of creating, storing, and signing those SBOMs, you must use an automated process. You cannot do this by hand; it's too onerous, too prone to mistakes, and not auditable. And you should sign your SBOMs. You should attest that they are what they say they are, because for integrity you want to make sure your SBOM has not been modified since it was created. There's also a lot of question about how we can verify that an SBOM is correct in the first place; I haven't seen a good solution for that yet, but I hope one crops up. And finally, if you have an SBOM and a digital signature, you can use an automated check or admission controller to make sure you're not running unsigned or unchecked deliverables.
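Here's a rough sketch of what that automated process could look like, using syft to generate the SBOM and cosign to sign the image and attach the SBOM as a signed attestation. The syft and cosign invocations are real; the image reference and key paths are placeholders you'd swap for your own, and wiring them together like this is the illustrative part.

```python
#!/usr/bin/env python3
"""Sketch: automate SBOM generation and signing for a freshly built image.

Assumes the `syft` and `cosign` CLIs are installed. The image name and key
paths are hypothetical placeholders.
"""
import subprocess

IMAGE = "registry.example.com/myapp:1.2.3"  # hypothetical image reference
SBOM_PATH = "sbom.cdx.json"

# 1. Generate a CycloneDX SBOM for the image. syft catalogs the image's
#    filesystem, so packages from base layers are included too.
with open(SBOM_PATH, "w") as f:
    subprocess.run(["syft", IMAGE, "-o", "cyclonedx-json"], stdout=f, check=True)

# 2. Sign the image itself.
subprocess.run(["cosign", "sign", "--key", "cosign.key", IMAGE], check=True)

# 3. Attach the SBOM as a signed attestation, so consumers can verify both
#    who produced it and that it hasn't been modified since it was created.
subprocess.run(
    ["cosign", "attest", "--key", "cosign.key",
     "--predicate", SBOM_PATH, "--type", "cyclonedx", IMAGE],
    check=True,
)
```

Because every step is scripted, it runs on every build, it leaves an audit trail, and nobody has to remember to do it.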
We've all seen this picture. In March of 2021, the ship called the Ever Given got stuck in the Suez Canal. Plan B for ships that need to traverse the canal but can't is to embark on a voyage around the Cape of Good Hope. It's a dangerous trip through dangerous waters, and it adds weeks to the transport, so it wasn't the best plan B for a lot of those goods. Additionally, some of those products could have been loaded onto a plane instead of a container ship, depending on the monetary value of the goods; there are economic concerns there, but it could have mitigated some of the losses. And poor inventory and tracking of the goods on the ship, because it's just a bunch of giant containers and the counting in a lot of those systems is poor, meant that suppliers and customers didn't really know whether their goods were on that ship at all. They were in limbo about whether their goods were affected and when they might arrive, which caused a lot of churn in the supply chain. There was a ripple effect: the ship was a bottleneck stuck there for six days until they finally dug it out, and it created months of downstream effects and a huge financial impact, something like $60 billion. That's a lot.

What we can take from this is that it's important to have the infrastructure and resources available to support a plan B, and maybe a plan C and a plan D, if the plan you started with turns out not to be tenable in the first place. Plan B, plan C, plan nine. In order to ensure that your supply chain is resilient to disruptions and unforeseen events, you should dedicate some thought to redundancy. If your budget allows, use multiple regions in your cloud provider. If you're on premises, use multiple data centers; lacking that, please at least use redundant power and cooling. I worked for a data center that had cooling failures, often at one in the morning, and I had to get up and drive to the office to flick a switch. Don't be me. If your supply chain pipeline breaks down, have a plan in place, a human process, for how you're going to track and trace your software. Kubernetes can assist with some redundancy and self-healing, and you want to make sure you have monitoring and alerting in place for resource utilization and errors, and visibility into your applications.

Logs are great, and they can assist in troubleshooting and in maintaining the integrity and traceability of your code pipeline, but they're not enough. Logs are just indicators of what already happened. In order to avoid problems before they happen, you need to start developing a system which verifies everything; we're starting to talk about a zero trust architecture here. And if you take the time to actually develop a plan B, you must practice that plan B, or plan C and D. If your human processes are too slow, maybe you can't get around the Cape of Good Hope fast enough to prevent the goods from spoiling, and your plan won't be helpful in an emergency or an unforeseen event like that. If you want to develop really resilient systems, I'd recommend looking into chaos engineering, so that your engineers can start to learn what happens in each part of your supply chain when things begin to fail.

There was a retrospective study done in 2011 of a hospital's blood transfusion system in Spain, at the Hospital de Navarra. It was determined that, with the manual process for identifying transfusion materials and patients, where a nurse or a tech would look at the patient's bracelet and then look at the blood transfusion materials, there was a 48% error rate. That's terrible, and people can die; if you give them the wrong blood, they will pass away. This was just due to human error, manual mistakes. So they implemented an electronic system with a scanner: staff could scan the bracelet, scan the materials, and it led to a massive increase in traceability and a huge reduction in errors, which obviously led to better outcomes for the patients. After 2005, they found that traceability was over 99 percent, a very small error rate. How did this happen? Automation is what allowed it. Always automate your systems. Manual systems are impossible to audit, maintain, or properly rely on. And in your automation, you need to add checks to ensure that you're using the right products, deliverables, and software packages, and that they've been checked for vulnerabilities.
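As a sketch of what that kind of automated check could look like in a software pipeline, here's a hypothetical pre-deploy gate: it refuses to ship an image whose signature doesn't verify or whose scan turns up high-severity findings. The cosign and grype invocations are real; the image reference, key path, and severity policy are my assumptions.

```python
#!/usr/bin/env python3
"""Sketch: an automated pre-deploy gate, in the spirit of the hospital's scanner.

Assumes the `cosign` and `grype` CLIs are installed. The image reference,
public key, and severity threshold are hypothetical placeholders.
"""
import subprocess
import sys

IMAGE = "registry.example.com/myapp:1.2.3"  # hypothetical image reference

def gate():
    # 1. Never deploy an image we can't prove we built and signed.
    subprocess.run(
        ["cosign", "verify", "--key", "cosign.pub", IMAGE],
        check=True,
    )
    # 2. Rescan right before deploy; grype exits non-zero when anything at
    #    or above the --fail-on severity is found.
    subprocess.run(
        ["grype", IMAGE, "--fail-on", "high"],
        check=True,
    )

if __name__ == "__main__":
    try:
        gate()
    except subprocess.CalledProcessError as exc:
        sys.exit(f"deploy blocked: {exc.cmd[0]} check failed")
    print("all checks passed; proceeding with deploy")
```

The point isn't these particular tools; it's that the check happens every time, by machine, the way the barcode scanner does, instead of relying on a human to remember to look.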
I'm sure you've heard the old proverb "trust, but verify." I would much rather hear you say "never trust, always verify." Verification is the cornerstone of a good security program and the basis of a zero trust architecture. I won't get into the complete zero trust conversation here, because I only have six minutes left, but some good things to think about: use strong authentication in your processes, use least-access policies, and make sure you're validating your application's behavior. Don't just assume your application is going to behave the way it says on the tin, and use the same validation and verification procedures for all components of your system, including your supply chain.

You cannot have a security system which relies on human beings to manually review scan results and detect vulnerabilities. You also can't have a system which relies entirely on computer automation, not yet anyway; maybe ChatGPT can start analyzing and fixing our vulnerabilities someday. Someone has to look at logs, someone has to look at vulnerability scan output, and your team has to take action to remediate. And preferably you'll have another team, using separation of duties, that will verify the remediation has occurred and audit you. But in the meantime, you can automate some of the easy stuff.

Things like version bumps in dependencies: use Dependabot or another tool like it to analyze your dependencies and alert you when there are updates, and have it make a PR if you can. Automate tests, and use TDD to build your software, so you can know if a version bump breaks it. That reduces the need for your team to manually test when versions change; if they can just go into GitHub and see that a change passed all the tests even though it upgraded a package, they can merge it. Use automated scans in your pipelines, and make sure that if a vulnerability is discovered, it will prevent a merge. This can sometimes make things difficult for your developers, but if you cultivate an attitude that security is everybody's responsibility and surface problems early and often, your team will have an easier time dealing with it, they'll get used to it, and your software will be more secure. Create and store SBOMs with your container images, and sign them. And, a missing piece in my opinion: develop automation which will detect new vulnerabilities in those SBOMs. Make sure you know exactly where you're using a container and which version, and make sure you can know it quickly and easily in case something new is detected in an old image. And of course, admission controllers: if you're using containers or Kubernetes, it's pretty easy to implement an admission controller which will ensure validation and verification of all the things we're talking about here.

I'm not going to read through this slide; we've talked about these things. But that's my talk. Here are some credits. Are there any questions?

[Audience question:] "Do you have any tools for figuring out: I have this dependency, and maybe that's not a great dependency to take, it's that one guy in Topeka?" I don't know any good way of figuring that out; it's certainly a hole. There are tools that use whatever metrics there have always been, you know, how many contributors, how active is the community. But unless anybody else has any other answers... I think somebody mentioned in the keynote yesterday about stars being used as kind of a dark pattern. It's difficult to know. Personally, I look at when the last commits were and the rate at which the project has been developed, but that's a manual process too. So, go ahead.

[Inaudible audience question about blockchain.] Yeah, thank you. It's my personal opinion, but I think blockchain has a very narrow application set, and a lot of the things people want to use blockchain for could be much better managed with traditional database-type systems. I don't know why people haven't done more experimentation with it, but I do know that it's a huge thing for physical supply chains right now; everybody's rushing to try to do it. I think I need to do more research into how they're using it. So I don't really have a good answer to that question. Well, it's 11:30. Thank you all for coming.