Great, thank you. Hi, everybody. Thanks for joining us today to talk about dependency management in C and C++. My name is Aaron, I'll be your host for today's discussion, and I'm joined by Jessica Black, a senior software engineer here at FOSSA. Welcome, Jess. And a big thanks to the Linux Foundation for hosting us today. As was mentioned, we'll be happy to take questions throughout the presentation, so feel free to drop them into the Q&A at any point. We'll try to take as many of them as we can in real time, and we'll get to the ones we can't in the Q&A at the end. So let's take a look at our agenda. We're going to start with a quick overview of the C and C++ ecosystem. Then we'll look at some of the methods that exist for including dependencies in your C or C++ project, including vendoring, static linking, and dynamic linking. Then we'll dig a little deeper to see how FOSSA identifies those dependencies across all three scenarios, and we'll wrap things up with the Q&A. Now, before I hand things off to Jess, a quick word for folks who are not familiar with FOSSA. We provide technology that makes it easier and more efficient for organizations to manage the open source software they use. This includes identifying and remediating open source vulnerabilities as well as identifying and complying with the licenses in your open source code. So basically, if you're using open source, then, like Slack, Uber, and Confluent, you should be using FOSSA to manage it. Great. Now let's talk dependencies, starting first with a quick intro of our distinguished speaker. Jess is a senior software engineer. She works on features like our C and C++ dependency scanner, and she specializes in relational databases, server software, and CLIs, primarily using languages like Go, Haskell, and Rust. So we're super happy to have Jessica here with us today. Thanks again for joining us, Jess; I'll let you take it from here. All right. Hi, everyone. Next slide, please. So, C and C++, as I'm sure most people here are aware, are frequently used in what we call systems programming: performance-critical or lower-level areas like games, hardware drivers, databases, and networking stacks. According to the 2021 Stack Overflow Developer Survey, over 20% of developers did some form of C or C++ development in the last year. It's been a popular language family forever, and that's not changing. Next slide, please. So whenever we use open source, it's always great to be able to get off the ground easily and fill gaps we don't have to develop ourselves. But the biggest issue with using open source, alongside that convenience and velocity, is that it can sometimes be risky, both from a license compliance standpoint and a security standpoint. There's all kinds of concern about using, say, GPL-licensed software, or about security issues like Heartbleed and other major security problems, or minor ones like buffer overflows. So tracking those risk vectors is really important, and that's what we're looking to do here. Next slide, please. So a big difference with C and C++, compared to other languages, is that C and C++ projects usually don't have any kind of package manifest listing the packages they depend upon. On the right-hand side, there's an example manifest from a Rust project, and you can see all of the different dependencies and their versions. So with other languages, you can look through this list and get a good starting point.
This isn't the whole story, but it's a good starting point for what dependencies you're using. With C and C++, as I'm sure most people here are aware, we largely don't have that. Obviously, there are some open source package managers; Conan is a big one. But by and large, most C and C++ projects don't use those and instead use some mix of locally vendored dependencies and system dependencies that get pulled in. Finding these is hard, and organizations often rely on software composition analysis tools to inventory their dependencies and then flag the compliance and security issues they have. Next slide, please. Actually, just before we go to the next slide: you talked about dependencies, but each of those dependencies has dependencies, which have their own dependencies, right? So it's turtles all the way down. The problem you're talking about really covers not just the complete set of direct dependencies, but also the indirect dependencies that a project might have, right? Yep, exactly. I touched on that with "this isn't the whole story," but to elaborate: essentially, every one of these dependencies can have its own dependencies, and sometimes those can even be hidden. For example, in this Cargo project, a dependency might actually statically link in other dependencies in a way that doesn't even show up in this manifest. So yeah, the whole story is pretty complicated and multi-layered. Great. So before we discuss identifying dependencies, we'll go through some common methods of including them in C and C++ applications. This is not necessarily the entire list of ways you can include dependencies, but these are what we've seen as the most common. A very popular way to include dependencies is to vendor them. That essentially looks like placing some subset of a dependency, maybe the whole thing, maybe just part of it, into some kind of subdirectory of your project, for example a vendor directory. This screenshot shows the Facebook Folly project placed in a vendor directory inside the project directory. The benefit is that it's relatively straightforward to get started: you just copy and paste the code and integrate it with your build system, which can be messy at times, but then you have full control over how you compile it and exactly what parts of the dependency are compiled, and it just slots in. Next slide, please.
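To make the vendoring setup concrete, here is a minimal C++ sketch. The library name, layout, and API are hypothetical, not taken from the talk; the point is that once the files are copied into the tree, nothing in the source marks them as third-party code.

```cpp
// Hypothetical layout: the dependency's source is simply copied into the project.
//
//   myproject/
//     vendor/semver/semver.h   <- copied from an upstream open source repository
//     src/main.cpp             <- first-party code
//
// ---- vendor/semver/semver.h (the vendored file; name and API are made up) ----
#pragma once
namespace semver {
// Toy helper: read the major version digit out of an "X.Y.Z" string.
inline int major_version(const char* v) { return v[0] - '0'; }
}  // namespace semver

// ---- src/main.cpp: includes the vendored header like any project header, ----
// ---- so no manifest or package metadata records the dependency.          ----
#include <cstdio>
#include "vendor/semver/semver.h"

int main() {
    std::printf("major version: %d\n", semver::major_version("2.1.0"));
    return 0;
}
```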
Another common way to include dependencies is static linking. Vendoring is potentially easy and gives you a lot of control, but sometimes it's preferable to compile a dependency into a binary statically, distribute those binaries, and include them that way. This is really common for internal teams developing a library that's consumed internally, or for vendors who develop a library that you either pay for or that's simply offered as pre-compiled binaries; either way, these are common ways to include those dependencies. And sometimes this can be centralized into a package manager as well: a Linux package manager will sometimes pull in statically linked libraries too. Next slide, please. And then the last common way we've seen dependencies included is dynamic linking. This is very similar to static linking in that you link against a pre-compiled binary. The main difference is that with static linking, that linking, that compiling together, is performed at build time, whereas with dynamic linking it's done at runtime. So whenever you distribute the application to the end computer, that library has to be present on the computer. But this is a very, very common way to link in especially standard libraries or very commonly used libraries across the ecosystem, just because you can often assume they're there, or make sure they're there by installing them first with some installation script. That makes your deployment a little bit easier, not to mention you can patch your dependencies and thereby patch all your applications at the same time. Next slide, please. So the pros and cons of these dependency inclusion methods really vary by project, and even by dependency. A lot of the time, C and C++ projects will use all three of them in some mixture, because some are better for certain dependencies than others. Vendoring, where you build the dependency from source alongside your own code, provides maximal flexibility. The con, I'd say, is that it's difficult for SCA tools to track, since there's no package manifest and these are just files that are part of your project tree; there's no obvious way to rope them off as a dependency. That vendor folder I mentioned earlier can sometimes be a clue, but there's no standardized way to refer to it, so it's very much a mixed bag as far as analysis goes. Static linking is a great middle ground between vendoring and dynamic linking. One of its biggest cons, I would say, is that it's hard for compilers to fully optimize across a pre-built binary, so the binary that's actually produced is often larger or less efficient than if you compile the source yourself, where the compiler can reason about the code more fully. It's also really, really hard for SCA tools to track without some kind of package manager or linker integration, largely because once that statically linked binary is compiled in, it largely disappears; there's not even source code to statically analyze like we can do with vendoring. Often these are totally invisible without some ability to integrate with what the linker is actually doing. And the last one we covered is dynamic linking. It's really easy to include these in a project. Its biggest con is that it's difficult to create a reproducible build and runtime environment. As I mentioned, whenever you deploy a binary built with dynamic linking, you have to ensure the dynamically linked library is on the target system, and sometimes it's there but it's not exactly the same one you built against, so things behave a little bit differently. That can be tricky, certainly manageable, but tricky sometimes. It's also really easy for SCA tools to track, because we can just ask the system what dynamic dependencies a binary is using. We can look those up against the system package manager, or, like I mentioned with Conan, if Conan were to pull in something like this, we could theoretically ask it. So yeah, it's pretty straightforward.
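As a rough illustration of the difference Jess describes, here is a minimal program with one external dependency (zlib, chosen only as an example), with the two ways of linking it shown as build commands in the comments. The library path and flags are assumptions about a typical Linux toolchain and will vary.

```cpp
// main.cpp -- a trivial program that depends on zlib, used here only to make
// the static-versus-dynamic linking discussion concrete.
#include <cstdio>
#include <zlib.h>  // the dependency's header, from the system or a vendored copy

int main() {
    // zlibVersion() is resolved from the library's object code at build time
    // when statically linked, or from libz.so at program load time when
    // dynamically linked.
    std::printf("built against zlib %s\n", zlibVersion());
    return 0;
}

// Static linking: the library's code is copied into the executable, so the
// shipped binary carries no obvious reference back to zlib.
//   g++ main.cpp -o app_static /usr/lib/x86_64-linux-gnu/libz.a
//
// Dynamic linking: only a reference to libz.so is recorded in the binary's
// dynamic section; the shared object must exist on the machine that runs it.
//   g++ main.cpp -o app_dynamic -lz
//   ldd ./app_dynamic   # lists libz.so.1 among the runtime dependencies
//
// (Paths and flags are illustrative; they differ across distributions and toolchains.)
```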
Yeah, this is great, Jess. Just one quick thing to make it maybe a little more concrete: in the title of this talk we mention a kind of blind spot around dependency management, and I can see how, with these different inclusion methods, it's easy for things to get lost. You mentioned the Heartbleed example in OpenSSL a little earlier; is that the kind of thing that could get lost in this blind spot if you're not able to see all of these things at the depth you need to? Yeah, good question. I definitely think so. Stuff like Heartbleed, and, I know this isn't C and C++, but stuff like Log4j, is definitely the kind of thing we can easily lose track of in the dependency graphs of modern projects. C and C++ projects often try to use fewer dependencies than other ecosystems, like, say, JavaScript with npm, but the dependency graph is still usually relatively large and relatively hard to track, especially for the developers who are trying to fix and ship things. We have one more quick question that came in on the Q&A as well: are you familiar with Nix, and if so, where do you think that fits in here? Great question. Yeah, I think Nix fits in really well; I think Nix is very well positioned to solve a lot of these problems. I'll caveat this by saying I haven't personally gone too deeply into Nix; I've done a lot of reading about it but not actually used it. Actually, before you go too deep: for folks who don't know what Nix is, maybe start with what Nix is and then how you think it fits in. Oh, yeah, good idea. So Nix, at a high level, is essentially a deterministic package manager, or at least that's how I would describe it. With a standard package manager like, say, apt, you might say "apt-get this package," and its dependencies kind of depend on what the package manager wants to pull in at that time, whatever it thinks is most appropriate. The version you get will also vary: if you ask for a specific version, that's fine, but if you ask for the latest, it depends on what the latest is. So the whole graph of dependencies of a thing you pull in through the package manager is very dependent on the time at which you pull it in, or can be. Nix is much more deterministic: usually all of that is more static and more reproducible, so when you pull something in, you know what you're getting. I think Nix works really well for dynamic linking. I remember reading that it can sometimes be hard to set it up well with certain compilers and some non-standard locations, but assuming you work around all that, I think the reproducibility frustrations of dynamic linking are largely solved with Nix. I think static linking is largely the same. Vendoring doesn't really apply, because that's not pulled in with the package manager, so there you're still on your own, but Nix can definitely help with the other two. Great. So we've talked a little bit about C and C++ dependencies and how we include them, so now we'll talk about how FOSSA identifies them. Next slide.
So our design principles with our C and C++ support at FOSSA are maintainability over correctness over speed. I'm sure this will be familiar if you've looked at any of FOSSA's other blog posts, because it's a very common talking point for us. Essentially, we think maintainability leads to long-term correctness, and our customers care more about correctness than speed, so that's where we focus. What you're seeing today is the result of those principles, with maintainability first and then correctness; this product is still in its early stages, so we'll keep following those principles as we go. I'll talk about the strategies in the same order as before. For vendored code, our strategy is VSI, vendored source identification. VSI compares fingerprints of the files in your project with the ones we've seen on the internet. We have a separate process that crawls open source repositories we know about, records fingerprints for all of the files we find, and records metadata about those dependencies, and we store that in a large database in the cloud. Then, when you actually scan your project, our tool performs the same fingerprinting on your files and uploads those fingerprints and their file paths to our backend, which essentially compares the fingerprints of your source code against the open source projects we've seen. Our algorithm is, I believe, fairly unique; obviously, who really knows what everybody's algorithms are, but ours seems to provide results that are a little more accurate, because what we try to do is apply what we think of as a file subtree mask to your project. Essentially, whenever we look at your project, rather than looking file by file for matches, we look subtree by subtree: we look at all of the files in a directory, and all of the files in a given subtree, and compare that against an open source subtree we've seen before. If it matches closely enough, we'll say, ah yes, we believe this to be this dependency. So we think it's pretty resistant to a lot of the noise issues that similar solutions have. Next slide, please. Sorry, quick question on this, Jess: is VSI the same technology that's usually used for OSS snippet scanners? Oh, sorry, I didn't catch all of that. Sorry, is VSI the same technology that's usually used for OSS snippet scanners? Yeah, it's very similar. Snippet scanners will typically categorize the contents of a file into different snippets and then compare those, and often what they'll then do is try to group those up and eliminate super noisy snippet matches. They tend to be pretty noisy, because they're looking at all the different pieces of a file, and a lot of files have a lot in common, even across disparate projects. What we do instead of snippet scanning is, rather than using those sub-file pieces for lookups, we work above the file level: files are our base unit, and then we look more at the directory level or the subtree level rather than at smaller pieces of a single file.
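To give a rough feel for the subtree idea, here is a toy C++17 sketch, not FOSSA's actual algorithm: fingerprint each file, then fold the file fingerprints into a per-directory value that could be compared against fingerprints of known open source subtrees. A real scanner would use a proper content hash and a more careful match score.

```cpp
// A toy sketch of file- and subtree-level fingerprinting (not FOSSA's real
// algorithm): hash each file's contents, then combine the hashes of a
// directory's files into a directory-level fingerprint. std::hash stands in
// for a real content hash such as SHA-256.
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Fingerprint a single file by hashing its full contents.
std::size_t fingerprint_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return std::hash<std::string>{}(contents.str());
}

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " <project-dir>\n";
        return 1;
    }
    // Fold each file's fingerprint into its parent directory's fingerprint.
    // XOR keeps this toy order-independent; a real tool would sort and hash.
    std::map<fs::path, std::size_t> dir_fingerprint;
    for (const auto& entry : fs::recursive_directory_iterator(argv[1])) {
        if (!entry.is_regular_file()) continue;
        dir_fingerprint[entry.path().parent_path()] ^= fingerprint_file(entry.path());
    }
    // In a VSI-style system, these directory fingerprints would be compared
    // against fingerprints recorded from crawled open source repositories.
    for (const auto& [dir, fp] : dir_fingerprint) {
        std::cout << dir.string() << " -> " << std::hex << fp << "\n";
    }
    return 0;
}
```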
Right. So the next one is static linking. FOSSA identifies statically linked dependencies mostly with user-provided data. Essentially, users tell FOSSA about a compiled binary they're distributing. In this screenshot we have libjson-internal.so; in this scenario, the team that builds libjson-internal.so would tell FOSSA about this binary and say, I've built and compiled this binary, it's this internal project named libjson-internal, it has this license, and so on. Or, if it's an open source project and they're just pulling in the open source project, building it, and distributing it, then they could fill in the open source project's information instead. Either way, they tell FOSSA about this binary. Then later, when users are scanning their projects, our system, the VSI system I talked about earlier, is able to look for these user-provided matches first and eliminate them from the match tree. So the first thing it does is say, well, this file is one that was registered with the algorithm as being this dependency, so that's a match, and it ropes it off from the rest of the algorithm. Essentially, you have to provide the data to FOSSA once, but after that's done, it can detect that binary in any other project in the organization. Next slide. Quick question on static linking that we got in the Q&A: if you're using an artifact repository like JFrog Artifactory, or something like Spack, to manage dependencies with static linking, can you also use that to generate SBOMs and perform SCA, or how do those tools help? Yeah, the way to integrate that right now would be to pull the artifacts from that artifact repository instance, which could be an automated process, and then register them into FOSSA, saying this is that artifact and it has this information. And then yes, once that's done, we'll be able to report it as a dependency and include it in SBOMs and other reports. Great. And then the last method we've talked about is dynamic linking. I touched on this earlier when I said that for SCA tools, dynamic linking is pretty simple, and we essentially take advantage of that for our dynamic linking support. We just read the dynamic section of the binary; on platforms where we can do this, we rely on ldd to tell us the contents of the binary, and then we use the system package manager to associate those linked binaries with packages. So if we run ldd on a program and it says you're using libc.so.6, we just ask the local package manager, okay, what package owns this binary, and then we report that along with all of its metadata, such as licenses. For dependencies that are linked but not owned, for example a library distributed outside the package manager, the current solution is to flag it as an unlicensed dependency: you see it and you see the path to it, but it just doesn't have any license information immediately attached. Next slide whenever you're ready.
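The dynamic-linking lookup Jess describes can be approximated with a short sketch: run ldd on a binary, then ask the Debian package manager which package owns each resolved library. The tool invocations, output formats, and parsing here are assumptions about a typical Debian/Ubuntu system, not FOSSA's implementation.

```cpp
// Rough sketch: list a binary's dynamically linked libraries with ldd, then
// ask dpkg which package owns each resolved path (Debian/Ubuntu assumed).
#include <array>
#include <cstdio>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>

// Run a shell command and capture its standard output.
std::string run(const std::string& cmd) {
    std::array<char, 256> buf{};
    std::string out;
    std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd.c_str(), "r"), pclose);
    if (!pipe) return out;
    while (fgets(buf.data(), static_cast<int>(buf.size()), pipe.get()) != nullptr) {
        out += buf.data();
    }
    return out;
}

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " <binary>\n";
        return 1;
    }
    // ldd prints lines like: "libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x...)"
    std::istringstream ldd(run(std::string("ldd ") + argv[1]));
    std::string line;
    while (std::getline(ldd, line)) {
        auto arrow = line.find("=> /");
        if (arrow == std::string::npos) continue;
        std::string path = line.substr(arrow + 3);
        path = path.substr(0, path.find(' '));
        // dpkg -S <path> prints "package:arch: /path/to/file" for owned files.
        std::cout << path << " -> " << run("dpkg -S " + path);
    }
    return 0;
}
```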
Great, and thanks everybody for putting your questions into the Q&A. Keep them coming; they've been great so far. I've got another one here for you, Jess. Among these dependency management methods, how do you feel each fits in an environment where you're using code security scanning tools like SonarQube? Vendored code seems to be the obvious answer, but they're looking for a more nuanced take, and I think you've touched on it, because all three methods get used. From your perspective, how do you think each of them fits? When it comes to tools like SonarQube, I'm not sure, but I don't think they can scan things that are not code, so off the top of my head I think vendored code is the only thing they can support; I'm not certain that's the case, but that's my understanding. I think that, ideally, for tools like this and other static analysis tools like SonarQube, the ideal would probably be vendored code. For static or dynamic dependencies, the best bet would probably be to have the dependency as its own project that you scan with those tools; then you track the dependency through FOSSA or some other SCA, and you also scan that project with SonarQube, and you have more insight into what it's doing. I think that would be best; apologies if I'm not remembering SonarQube's capabilities correctly. Okay, we're putting you on the spot here, Jess, throwing you all of the curveballs. We can jump to the next slide. Yeah, so I think the key takeaways are that, because C and C++ dependencies are typically so unmanaged, SCA tools like FOSSA have traditionally struggled to accurately identify the dependencies in these languages, just because they're so varied and the use cases are so specific; everyone uses them in different scenarios, so we have to develop solutions for each different situation. And this has left organizations more vulnerable to open source compliance and security risk; you can't mitigate what you don't know is there. So we think that what we're bringing to the table now is going to significantly improve support for everybody who's using C and C++, so they can view their dependency graph in a more fully featured way. Another question here for you, and I think this goes back to some of the things you mentioned when talking about vendored code. For some C and C++ dependencies that don't have a pre-built package, a team might download the source code from GitHub, build it in their CI/CD process, and then statically link it. In that case, how would they generate an SBOM, given that the hash value or fingerprint of the dependency may not match the open source database? Yeah, I think what I'm hearing is that because it's being downloaded, then built, and then statically linked in a separate step, the concern is that we wouldn't see that statically linked dependency, because the source is no longer there. The best way to make that work with our current support is during that process where you're building it into a static binary.
Essentially, we support the ability to scan that original source code and store it in FOSSA, and then link the built binary to it through the static linking process I talked about before. That way, you have all of the information we found for that dependency during the initial scan; it then gets statically linked in and identified by a VSI scan further down the pipeline. I don't know if that explains it well, it's a slightly complicated process, but our static linking support is designed to handle a pipelining system like that. That makes sense: you basically keep the source code so that you can do that original scan, and then link it to the binary that gets produced. That makes sense. Yeah, essentially. It happens during the build process, so it'll be something where you run gcc with whatever flags, and right after that command, before the source code is cleaned up or before it moves on to the next stage in the CI pipeline, we would just run FOSSA and record that linking. Great, thanks again for the question. Do we have any other questions? I'll give it maybe 15 or 30 seconds here for folks to jump into the Q&A and ask any final questions; otherwise, we're going to wrap things up. I don't see anything just yet, so I'll jump to the thank-you screen. I want to thank the Linux Foundation for hosting us today, and thank you, Jess, for sharing all of your experience with us. If anybody has any further questions... and I knew one would jump in right when I was ready to wrap things up. Jess, how do we define package URLs for C and C++ open source libraries? Because there's no central ecosystem for C and C++, it's very difficult to identify the source of a dependency like you would for JavaScript, where they have npm. Yeah, definitely. In our case, it again depends on the way you're including them. In the case of dynamic linking, and we'll start there because it's the simplest, the package URLs are dependent on the system package manager. So if you're on Debian and using apt, and we find the dependency inside of your package manager, we'll just rely on apt's information: whatever homepage is given for that package is the homepage we report. For vendored source code, it essentially depends on where we've seen that source code. When I first talked about the VSI process, I mentioned that we scan known open source code hosts, for example GitHub; we'll scan GitHub open source repositories so that we can match them up with your dependencies later. We know where we scanned the code from, and therefore we're able to tell you where that code is, or at least where we saw it. So depending on where we got it from, whether that's a specific project on GitHub or another code host like, for example, SourceForge, which we don't support today but are planning to work on, we would know that this dependency came from that place and point you there. And then the last one is static linking, which, since it's binary only, doesn't give us a great way to tell where the source code came from: it's a binary, so there's no source code attached, and it's usually not in a system package manager, so with that one we rely on the information that's given at the time the binary is registered, or linked, into FOSSA.
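As a rough illustration of what those identifiers can look like, here are package-URL-style strings for the three cases, written in C++ only because that's the language of the talk. The purl types (deb, github, generic) come from the package URL spec, but the specific package names and versions are made up, and the talk doesn't state exactly which purl forms FOSSA emits.

```cpp
// Illustrative package URLs (purls) for the three dependency origins discussed;
// names and versions are hypothetical examples, not output from FOSSA.
#include <iostream>
#include <string>

int main() {
    // Dynamically linked library owned by the Debian package manager.
    const std::string dynamic_dep = "pkg:deb/debian/zlib1g@1.2.13";
    // Vendored source matched by VSI against a known GitHub repository.
    const std::string vendored_dep = "pkg:github/madler/zlib@v1.2.13";
    // Statically linked internal binary registered manually by the user.
    const std::string static_dep = "pkg:generic/libjson-internal@1.0.0";

    std::cout << dynamic_dep << "\n" << vendored_dep << "\n" << static_dep << "\n";
    return 0;
}
```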
That's great. And I'm purposely speaking slowly now in case anybody has any last questions they want to toss into the Q&A. I know one will come in as soon as I start wrapping things up, but I'll do it anyway. Thanks again to the Linux Foundation, and thank you, Jess, for sharing all of your experience with us. If you have any more questions, or you'd like to talk with us more about dependency management, especially with C and C++, you can reach us at resources@fossa.com. And if you'd like to give FOSSA a try, you can try us at try.fossa.com. I want to wish everybody very wonderful holidays. Thanks for joining us, and thanks again, Jess. Thank you. Thank you so much, Jessica and Aaron, for your time today, and thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today. We hope you join us for future webinars. Have a wonderful day.