Hello, EnvoyCon. I'm Harvey Tuch. I work at Google, where I tech lead the Envoy platform team. I'm going to be presenting today with Michael Payne, who leads Kubernetes architecture at JPMorgan Chase. And we're going to be talking about Envoy's supply chain. What is this supply chain? Why do we even care? Security is only as strong as the weakest link, and the weakest link in software is often the components which you don't see. Any software system has a number of software components, some of which come to it indirectly through the transitive chain of dependencies. There are very real attacks that have been staged in open source against the software supply chain, in particular in the managed language runtimes such as Python, Ruby, and JavaScript. But there have also been attacks, such as XcodeGhost, which attack developer infrastructure and those aspects of the supply chain. These are very real attacks which exist. It behooves us to understand better what Envoy's exposure to these is, what we can do about them in open source, and what you as an organization can do to limit your exposure to the supply chain. So this talk is going to focus on how the supply chain contributes to Envoy's trusted computing base (TCB), how we manage and maintain this, and how you can understand it. As a starting point, we can look at Envoy's code base itself, the direct code that you depend upon from Envoy. And this is now around 170,000 lines of mostly C++ code. It's grown roughly linearly since the beginning of 2018. It's actually superlinear if you look a bit further back, but this is the current trend. And we can actually break that down and view it between Envoy's core in blue and extensions in beige. 
And you can see that right now it's roughly a 50-50 split between the two, and the extension mechanism, which was introduced back at the beginning of 2018, has actually been very effective at helping to temper the growth in Envoy's code base, at least the mandatory code that you must include in your binary. But the core still continues to grow. Okay, so looking at the same time period, but instead of looking at Envoy source code, we look at the number of dependencies that Envoy directly relies upon. And we're just looking at one set of dependencies here: the ones that are linked directly into the binary and feature in the data and control planes. They've grown from roughly 20 or so dependencies to roughly 60 today, again roughly linear growth. And if we slice this data and look at it in terms of extensions, core, and other components, like dependencies that are there for build and test and APIs, it's roughly a third for Envoy's core, a third for extensions in the actual binary, and a third for the remainder. Even core dependencies in Envoy have grown from roughly 10 to 20 today. And the sheer number of extensions, over 60, means you're in a situation where it's very hard to keep in your head exactly what Envoy is depending upon at any point in time. We've introduced some metadata to help with this, which I'll talk about later. The next few slides will look at Envoy's dependencies through a different visualization: a tree map. In this tree map, the rectangles are sized based on the number of lines that each component contributes to the binary, as determined by the debug lines which exist in the Envoy binary. So it's somewhat of an overapproximation, but it's the most direct way to map back from an Envoy binary artifact to the underlying dependencies and Envoy source code. So we'll go with that. 
It provides a reasonable first approximation of the relative size of each of these dependencies, both amongst themselves and relative to Envoy. And there are some interesting things to note from this visualization. To begin with, in pink, Envoy's flagship color, we have all of Envoy's direct contributions to the TCB, which is about 20 to 25%. The rest is actually external dependencies, and these are there for various reasons. We have BoringSSL for our transport security. We have QUICHE, nghttp2, and http-parser for our codecs. We have things like LuaJIT for dynamic code execution, gRPC and protobuf for the control plane, and so on. So we have roughly 20 to 30 dependencies which we would see in a normal Envoy binary and which we're linking against. We can look at this data through a different lens and ask which are essential and which we can make optional and compile out. In blue, we have the core components, which dominate the view, and in beige we have the optional components, the extensions. By removing these you can significantly reduce the size of Envoy's TCB, by maybe 20 to 25%, but most of these dependencies are actually there to stay. Looking at this set of dependencies in terms of organization, the only organization which is consistently responsible for maintaining, or being the provenance of, dependencies is Google. They're responsible for roughly, maybe, 80% of our TCB there, which is actually pretty encouraging, because while there are no uniform standards across these projects, there are at least some core common software engineering principles that developers coming from Google will bring to the table, which is not necessarily true of other projects. 
Looking at the projects through a different view again, we can think about how vulnerable they are: how many security vulnerabilities have occurred over a period of time. The methodology I used on this slide is to look back since the beginning of 2018 and ask how many CVEs have occurred in each project. This doesn't take into account how many of these CVEs affected Envoy, or how many of them would have even mattered because these dependencies weren't linked into Envoy at the time, but it gives a rough idea of how scary different projects are. And the outlier here by far is libcurl, with something like 14 CVEs in this period. So this is actually really scary. We actually don't need to rely on libcurl, so that one's probably going to go away; we have an open issue to remove it, and anyone who's interested in helping out has our full support there. We have a trickle of CVEs in a few other dependencies, and some have had none whatsoever, which is either encouraging or just means they haven't had enough eyeballs. Looking at just CVEs is no panacea. Many projects don't issue CVEs when they discover security bugs, some security bugs are treated as functional bugs but are really security bugs, and some CVEs don't get properly correlated with their CPEs, and so on. We can take another view of how much we trust these dependencies, or what we should be doing about them, by seeing how often we update them and how often they should be updated. In terms of updates, BoringSSL gets the most frequent updates, and we'll understand why when we get to the next slide, but it's encouraging, because that is one of the most significant attack surfaces that we have. Other dependencies like gRPC or QUICHE or protobuf are also updated relatively frequently. We can now look at the number of actual version-tagged releases on GitHub for these different dependencies. 
And this creates a very different picture, because many of our dependencies actually have no versioned releases, which is kind of scary in so many ways, because we're talking about things like BoringSSL or QUICHE or TCLAP. It's a little concerning that we don't actually have a way to know when we should be updating, other than watching for CVEs or trying to pay attention to mailing lists, that kind of thing. So this is something we actually want to address, and I'll talk about that in a few slides' time. A final way to slice this data is to look at it in terms of which are the scariest dependencies from a security threat model perspective. Here I've manually color coded in red those dependencies which feature in the data plane, which will interact with untrusted data plane traffic, and there's quite a significant number of these. There are ones in orange and yellow which are control plane or observability related, which are a little less scary. And in white, we have some dependencies, like TCLAP or PGV (protoc-gen-validate), which aren't going to be super important from a threat model perspective. So how do we even generate these visualizations, and how do we keep track of all this today? Well, this is actually a shifting story. In the last month or two, there have been a number of improvements here, and we now have pretty complete metadata, which used to live only in comments or not at all, about most of the projects that we depend upon. We have project metadata, versioning information, pinning, SHA pinning. We have when these dependencies were last updated, and then information on the use category, extension status, and CPEs, which was actually very useful in generating the visualizations which preceded this slide. We're also providing a dashboard which tracks the current versions, externalizes all this information, and links to CVE search and that kind of thing. 
And this now gets included in the Envoy documentation in each release. So far, we've largely talked about things which are informational; they're not necessarily actionable in and of themselves. We're trying to actually take this information and use it to improve Envoy's security posture, and the way we're doing this is to start by formulating a policy. This policy is already up, but we're working on refining it and making it enforceable. It will say things like: you don't get to add a dependency to Envoy unless it has peer review of code, version releases, release notes, testing, fuzzing, all this kind of thing. Or at least you don't get to make it a dependency of the core or of things like robust extensions. Having a policy like this is going to help control the growth in extensions, and it's going to allow us to have greater confidence in them. We'll probably have to grandfather in certain dependencies which don't quite match up to this standard, but where possible, we'll also use this to drive a policy around replacing these older dependencies. The other way we can make use of this information is by trying to follow best practices ourselves when it comes to dependency management. This includes providing a clear bill of materials, which is essentially what the dependency dashboard documentation does, and we're working on making this more complete over time; we recently added support for API dependencies. It also includes making sure we're using SHA pinning wherever possible; we've been doing this for a while for core dependencies, but some of the Python stuff only recently got hashes. And then we can maintain these dependencies and update them as needed, on a regular cadence or as releases occur. 
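To make the metadata idea concrete, here is a rough sketch in plain Python, mirroring the shape of the per-dependency entries being described. The field names and values are illustrative and may not exactly match Envoy's current metadata file; the dependency itself is hypothetical.

```python
# Illustrative sketch of a dependency metadata entry, similar in spirit to
# Envoy's dependency metadata. Names and values are examples, not copied
# from the real file.
dependency = {
    "project_name": "example-http-parser",        # hypothetical dependency
    "project_url": "https://example.com/parser",  # upstream homepage
    "version": "1.2.3",                           # pinned release version
    "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "release_date": "2020-09-01",                 # when upstream cut this release
    "use_category": ["dataplane"],                # drives threat-model views
    "cpe": "cpe:2.3:a:example:parser:*",          # enables CVE correlation
}

# A dashboard or policy check can then assert invariants over entries,
# e.g. that every data-plane dependency is SHA-pinned:
def is_sha_pinned(dep):
    return len(dep.get("sha256", "")) == 64

print(is_sha_pinned(dependency))
```

With entries in this shape, the dashboard, the CVE-search links, and an enforceable dependency policy can all be generated from the same source of truth.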
And so for the rest of this talk, I'm going to hand you over to Michael, who has really been phenomenal. The Envoy community owes him a huge debt of gratitude for basically being a one-man shop, maintaining and keeping us up to date on all of our dependencies and supply chain. A large part of our security story really rests on his shoulders. So with that, I'll hand it over to Michael.

Thanks very much, Harvey. Yeah, so dependency maintenance has been an interesting journey for me. I actually went to one of the first Envoy meetups that Matt Klein gave in the Bay Area many years ago, became very intrigued with the project, and tried to jump in straight away. I'm not a strong C++ programmer, so I thought: how can I add value here? And I got really interested in how we maintain dependencies, how we maintain their currency throughout the project. So I started doing some of this work and thinking about how to actually go about it. Essentially what I've done over the years is build up a process around maintaining the dependencies, and I'm going to share that with you. Now, some of the dependencies get bumped through the natural course of development on Envoy: people are adding features, and they require specific features that are in dependency versions or commits that we don't currently have in the project. So those PRs will bump those specific dependencies, and we go through the validation process that Harvey went through before. But then there's just the regular maintenance, and that's largely what I've been doing. What I've done is built a large RSS feed that tracks most of the dependencies. When a new dependency gets added (there was actually another one added today), I modify my RSS feed, and then I monitor the updates from releases, and for the dependencies without releases, I monitor the commits. 
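As a sketch of the idea: GitHub exposes per-repository Atom feeds for releases (`releases.atom`) and for commits on a branch (`commits/<branch>.atom`), so building such a watch list can be as simple as the following. The repository list and the choice of branch are illustrative.

```python
# Build Atom feed URLs for a watch list of GitHub-hosted dependencies.
# Projects with tagged releases get a releases feed; projects without
# releases fall back to a commits feed on their default branch.
WATCHED = {
    "grpc/grpc": "releases",        # has versioned releases
    "google/boringssl": "commits",  # no releases; watch commits instead
}

def feed_url(repo: str, mode: str, branch: str = "master") -> str:
    if mode == "releases":
        return f"https://github.com/{repo}/releases.atom"
    return f"https://github.com/{repo}/commits/{branch}.atom"

urls = [feed_url(repo, mode) for repo, mode in WATCHED.items()]
for u in urls:
    print(u)
```

Dropping such URLs into any feed reader (or an OPML file, as mentioned below) gives you a passive notification stream for dependency updates.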
And as they come through, I can open those, look at them, and determine whether we actually need to make a change, whether it's significant, or whether we can skip it. That's just built up from understanding, over time, where all the dependencies sit. There's a link down the bottom there to a GitHub gist which has that RSS OPML feed, for those interested in looking at the frequency of these updates. And then we've got some teams that also use bots watching GitHub release emails: they can take the emails that come out of releases or commits and modify their repos based on those watches. Next slide, Harvey. And so, just quickly, here's a table that lays out where all of the dependencies in the Envoy codebase live. There used to be more locations than this; we've actually been trying to consolidate where the dependencies live, because it used to be quite spread out. You can see that most of the heavy lifting for the core dependencies is in the Bazel repository locations file. Then we've got some API dependencies at the top left in those Bazel files, down at the bottom right we have some of the other test categories, and then the Python pip installs that we do as well. And as Harvey mentioned, we've just recently added SHA pinning for those too. Next slide. What I want to talk about here is how to actually go about testing these dependencies. What I've found over time is that quite a few of them can be relatively volatile: between commits and even version changes, their interaction with the Envoy codebase can change, APIs change. In particular, things like protobuf, gRPC, and rules_go have been quite volatile, so they require a lot of testing. I do all of this testing locally. I run a Linux x86-64 environment, an ARM64 environment, and then macOS as well. 
And I've found that having these multiple architectures really helps with testing speed and quality as well, because I can find platform-specific issues before PRs are raised and we kick off the CI jobs that test the pull requests. Another key thing here is that I don't rely on Docker for any of this. I've gotten into Twitter fights with people about this, but I generally find that adding Docker either slows you down or adds another layer of abstraction, if you like, that can make this sort of testing more difficult. What that meant is that up until recently (Bazel 3.5.0), I was actually compiling Bazel from source on my Raspberry Pi 4; I do all the ARM64 work on a Raspberry Pi 4. Next slide, please. So what are some of the tools that I use in this process? Mainly this refers to how we generate the SHAs that we use to pin against in the build process. The first one is just sha256sum, a Linux tool that's very freely available. Essentially what we're doing here is downloading dependencies out of GitHub, or wherever the source of the dependency is, running the tool against them, and generating the SHA. That then goes into the metadata that Harvey showed before, which is used for verification as part of the build process. We just added the same requirements for the pip installs; we use a tool there called hashin, and essentially you run that against a requirements.txt file, passing in the dependencies, and it generates the SHAs there. I actually think this is fairly important: the integrity of our docs matters to us as well, and we don't want anything that has the ability to modify or interfere with the quality of the docs side. So we think these additional hashes are important too. Then the last one, and this is API only, is some of the Go modules, and I've had a really hard time working out how to generate these sums. 
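Before getting to the Go modules, the sha256sum and hashin steps just described look roughly like this. The tarball here is a stand-in rather than a real dependency, and the hashin invocation is shown as a comment since it needs network access to PyPI.

```shell
# Stand-in for a downloaded dependency archive (normally fetched from
# GitHub or the project's release page).
printf 'hello' > /tmp/example-dep.tar.gz

# Generate the SHA-256 that gets pinned in the dependency metadata;
# the build then verifies the downloaded archive against this value.
sha256sum /tmp/example-dep.tar.gz

# For Python dependencies, hashin adds per-package hashes to
# requirements.txt, along the lines of (shown as a comment because it
# fetches metadata from PyPI):
#   hashin PyYAML==5.3.1 -r requirements.txt
```

With hashes in requirements.txt, pip's hash-checking mode will refuse any package whose contents don't match, which is what protects the docs build described above.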
And I've tried everything from custom go.mod files to playing around with the Go command line tools. What I've resorted to now is: I find the version bump in my RSS feed, I bump the version number, then run a test and have it fail, which generates the actual correct sum that I can then put in. So it's not ideal, and I'm open to any suggestions on how to auto-generate the Go module sums and hashes. Next slide. What I wanted to talk about here is strategies to minimize your supply chain, that is, how to minimize the dependency lines of code in your binary. I feel very confident saying that there are many extensions that Harvey showed before that you don't use, or won't need to use, as part of your Envoy deployment. The current images are built as what's called a kitchen sink image: they have all of the extensions enabled as part of the build process. And the reason kitchen is misspelled there is that, if you actually look at that issue, Matt Klein misspelled kitchen sink, and whenever I'd go back to find a reference to it, I'd remember it by knowing it was misspelled and searching for kitchen without a T. So there are a couple of strategies here on how to minimize your Envoy binary. The first one is a custom .bazelrc file; there's a link there to the documentation on how to do this. Essentially it gives you a way of disabling certain features of the Envoy binary as you build it. An example here is hot restart: there are quite a few scenarios where you don't require hot restart, and it's a fairly complicated feature, so you can just disable it in the build profile. And then secondly, and this is probably the biggest bang for the buck, is compiling out extensions. There is a fairly large extension definition file. 
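As a sketch, the build-time feature disabling described above looks something like the following. The `hot_restart` define comes from Envoy's build documentation; treat the exact flag names and targets as version-dependent.

```
# .bazelrc (sketch) -- disable optional features at build time
build --define hot_restart=disabled

# then build the static binary as usual:
#   bazel build //source/exe:envoy-static
```

Putting such defines in a checked-in .bazelrc means every developer and CI build of your fork gets the same slimmed-down feature set without remembering extra flags.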
And even within that file, you can just comment out or remove the extensions that you don't require. One example here is ALTS; that's a protocol that you don't need if you're not running on GCP, for example. Another example, out of a list of many, is the Dubbo RPC protocol. And this goes on for all of the different extension types, whether it's observability (Datadog versus Lightstep) or database filters; it's a very large list. So you'll find that you can significantly reduce your exposure to the extensions via this method. And then lastly, for teams that are importing Envoy into their internal repos, you can use tools like Copybara to actually delete the extensions out of the source tree, rewrite the Bazel files, and essentially take those extensions out of your import, so you end up with a much slimmer and more streamlined Envoy source code base. All of these methods can be combined; you can really reduce your build times and, more importantly, reduce your exposure to the dependency supply chain. And so this is the last slide here. Just to point out, there are multiple ways to maintain these dependencies, so it's certainly not one size fits all. You can rely on upstream Envoy to maintain and track those dependencies for you, as most consuming projects do. You can rely on your local package management systems (RPM, APT, and a variety of others). Or you can maintain your own versions and override the dependencies in the WORKSPACE file in the root of the Envoy repo. So there are many different approaches here, depending on your needs, your sophistication, and your requirements. That's it. Back to you, Harvey.

Okay, I think we're done, and we're happy to take any questions. Thanks. Hey, Michael. Yeah, happy to answer any questions, or talk about Envoy dependencies or supply chain, or anything generally, if there are any questions. 
I fear that this is not as exciting as some of the other tracks, Harvey. Oh, okay. Well, I think that's why it's super exciting. So do I. So do I. But yeah, we can hang out a few seconds; otherwise, I think maybe we can just leave our Twitter and email addresses for folks to reach out to. Okay. Yeah. And that one's for you, Michael. Yeah: do we have to make many changes to dependencies? Yes, quite often. What we try to minimize is carrying patches. If you look at the actual Envoy code base, you'll see that there are quite a few dependencies that have patch files, and Harvey and I are both committed to trying to reduce or eliminate those patches. So we often go back to the upstream dependencies and ask them to make changes. Sometimes that's easy, because they're controlled by friends of Envoy, in particular Google and others; sometimes that's more difficult. But yeah, we often have to make changes to dependencies as we bring them in. Yeah, I think that's pretty much it. I was just going to point to this other question that came up from Matt, which is: how does this apply to other projects, as an industry problem? I think it's well recognized that supply chains are a big problem in open source specifically, and GitHub and a number of other organizations have an initiative, the OpenSSF I think it's called, something like that. Anyway, they're basically planning to put together a scorecard which will allow us to get an idea of what qualifies as, for example, a good dependency versus a bad dependency, in terms of trying to audit the supply chain: the kinds of things we did in this exercise, making sure you're not out of date, and scanning for CVEs. Yeah, thanks, Brian, that's the link. 
There's actually a lot of tooling out there which does this today; there are companies whose whole business is to scan your project for this sort of thing. But they mostly work with managed languages, or languages with manifest files: Python, Node.js, Ruby, even Rust. Unfortunately, there's no standardized C or C++ format for this, so we're essentially having to make things up and build our own tooling as we go, because C and C++ are sort of the less popular languages, the ones for which folks aren't as interested in building standardized solutions. And it's just really hard to do. So that's kind of where we're at right now. But we do plan on sharing what we've been doing, once we're in a little more of a final state, in a blog post about that. We're also doing some work to integrate the CodeQL capability of GitHub; there's been some initial work there as well, doing additional scanning. That doesn't apply to our dependencies, unfortunately. That's true. And it's already slow enough just running on Envoy alone, which is a bit of a problem. There was also a question from Fabrizio: is there any documentation on how to actually minimize the build? And I don't think we have a good guide there. So I think opening an issue around that (or I can do that, actually), or contributing a PR, would be very useful, because I think it's not obvious how to reduce the number of extensions you have compiled in, and most folks aren't quite clear on what that is. This gets back to a point Matt made in one of the earlier talks, which is having profiles, like a minimal profile or a secure profile. And actually opting out of most extensions by default in the build would probably be advantageous. Yeah, I've documented some of this internally, but I'm happy to help out there and write up a more concise guide about how to go through binary minimization. 
Okay, great. Thanks. I think we're out of time now. Thanks everyone for listening and all the great questions. Thank you very much.