Welcome back everyone to theCUBE's live coverage here in Vancouver, British Columbia for Open Source Summit 2023. I'm John Furrier, host of theCUBE. My co-host, Rob Strechay, breaking down all the analysis in Open Source as the winds of change are here. Open Source is seeing a lot of security momentum, massive AI, and a migration of new capabilities. This is going to shake and rattle the innovation cage of Open Source. And Ed Warnicke is here, Distinguished Engineer at Cisco, theCUBE alumni and co-founder of OmniBOR, an exciting new project. We're going to discuss, we're going to demystify SBOMs and break down what is the future of security in cloud native and in networking and network meshes. All things security. Ed, great to see you. Good to see you too. Good to be here. I'm super excited that I ran into you because one, you've been on theCUBE many, many moons ago. We're now in our 13th season. God, I can't believe it's been 13 years. It's been an amazing run. So much has changed in Open Source. It's been such a great journey. Open Source has won. It's now the software industry. So now it's not really like a thing, it's everything. This has been a huge thing. Just what's your personal take as you pinch yourself right now and say, hey, we've won. It's winning. Now there's a whole nother journey ahead of us. Well, a lot of it really comes down to people trying to figure out what to do about the fact that an Open Source ecosystem is intrinsically highly distributed. So I can sort of remember when the realization started filtering through that a significant amount of the software that was being shipped was actually not software that was written by the supplier. It was all of this third-party software coming from Open Source, but also from other commercial suppliers, who themselves had commercial suppliers, who themselves were using Open Source, through this very vast tree that is sort of the software supply chain.
And it's a much more complicated space than most people appreciate. Well, I want to get into this segment because this is really one of the hottest topics. We've been talking about it for many years on theCUBE around security and Open Source and supply chain, but also developer productivity. But SBOM hit the scene about a year and a half ago as kind of like a conversation. Software bill of materials. What's inside the packages? What's inside the containers? Kubernetes is there. Microservices are more complex, and, you know, there's a lot of service mesh activity going on. What is SBOM? And demystify the reality. Is it helping things? Is it hurting things? Where are we? What's going on with SBOM? Take us through the state of the market with software bill of materials. Fundamentally, SBOMs are a response to that realization that you have this whole supply chain sitting behind any software that you're using whatsoever. So a software bill of materials is attempting to provide a means to describe what it is that went into the software that you're shipping. So people will often talk about it as a list of ingredients on the back of a soup can or something of that nature. The goal being to understand the kinds of things that can go right and go wrong with your software. So the one that everyone is very focused on right now is security, you know, am I vulnerable to CVEs? But it actually has its roots in software licensure. It has applicability for things like looking for latent performance issues in the systems that you're dealing with. Anything that involves understanding the root cause of your problems requires the level of understanding that SBOM is attempting to bring onto the scene. Yeah, I think you brought this up. It's not a new problem. And I think, you know, in our pre-talk, you bear this badge proudly for many years here. It's a very old problem. So the sins of one's youth often haunt you throughout the rest of your life.
And what are the reasons that you have some perspective? Explain. Not proud of all of them. One of the reasons I have some perspective on this is that I actually built Cisco's industrialized SBOM facility in the mid-2000s, where we needed to be able to capture and take action on what was in tens of thousands of software releases a year that were going out the door. And that's a very daunting problem. And so one of the first things we realized, we started doing it for reasons of licensure, making sure we were in compliance with open source licenses, et cetera. And we realized very quickly its applicability to security, because we had come to the realization that the majority of the security surface was actually coming from what we referred to as third parties at that point, right? Third-party security was coming from open source that was being consumed, was coming from the commercial software that was being consumed, the things that they were consuming, all the way down the line. And one of the first things that's very difficult to wrap your head around is how deep the tree is. Because people tend to think, oh, well, what did I ship? So I typed apt-get install foo. What they don't immediately realize is the dependency tree for foo contains 200 things in it when you actually chase it all the way down. And that's just at the level of the components. It doesn't get into the fact that each of those components isn't really a thing, it's a bag of parts, right? So if you look at the Linux kernel, for example, there are about 50,000 source code files in the Linux kernel. A typical build of the kernel uses a few single-digit thousands of them. And so if I tell you that a particular version of the Linux kernel has a CVE and you look at your little list and say, okay, well, I'm using that version of the Linux kernel, you have no idea whether you're vulnerable. And in a world where you have this firehose of CVEs coming at you all the time, that matters.
I mean, the running joke right now is we should start talking about systems in terms of the molarity of CVEs involved. For those of you who remember your chemistry back to Avogadro's number and 10 to the 23 things, it becomes almost impossible to address all of them. And so you need to prioritize. And this is made worse by the fact that often, the way we constructed our SBOMs, together with the fact that components are not actually the fundamental objects, an enormous percentage of the CVEs that you might think you have, looking at it from an SBOM perspective, you don't have. And so often the most common resolution of a CVE in an SBOM world is to investigate and discover you don't have it. So how does OmniBOR help with this? So OmniBOR was actually a side effect of both past experience I had in the early days of SPDX, together with the new prominence of SBOMs that came after the Colonial Pipeline hack. So what's SPDX, for those who don't know? For those who don't know, SPDX is sort of the ISO standard for how you write down your SBOM so you can exchange it with other people. And this exchange of information becomes crucial because, again, if it has to pass through a supply chain where it goes through 17 hands before it arrives at the person who's going to ship you the product that you're actually dealing with, each of those 17 hands has to be able to, in a consistent way, pass that information along. Right. So OmniBOR helps in what way with the SPDX? So OmniBOR takes a very interesting attitude. It sort of turns the problem on its head. I very firmly believe if you don't like the outcome, you should change the rules. And clearly, coming in after the fact and trying to scan to determine components was producing a lot of false positives. Clearly it was producing a lot of very, very messy data. But what you really have in any piece of software is software artifacts, right? So I give you a Docker container or an executable.
And each of those software artifacts themselves were built from other software artifacts, maybe object files, class files, jars, Python files. And each of those gets built by other software artifacts like source code files. And at the fundamental level, those source code files are where all the CVEs live. Right. You very rarely, there are some exceptions, very rarely see CVEs that don't originate in a source code file. So OmniBOR takes the attitude of working out a very simple means of, when I have a build tool like a compiler or a linker, allowing it to capture just the identity of its inputs into an input manifest, and associate that input manifest, typically by embedding in the output artifact an identifier, with that artifact that you've produced. So if you could imagine just something as simple as building a Hello World program in C. Yeah. The compiler can see what .c and .h files are included. It can construct the input manifest, capturing all the Git object IDs, which is how we are indexing all of this in our source code repositories with Git. Right. And then take the Git object ID of that manifest and insert it into the resulting executable, say in an ELF section, there are places to put metadata in pretty much all output artifacts. Right. So that when I pick that output artifact up, I can say, okay, what's in it, and get what we call the artifact dependency graph, which is literally just these hashes, these identifiers, all the way down. Where what OmniBOR does with the artifact dependency graph differs from an SBOM is that an SBOM captures a lot of the metadata that you care about. What component and version did it come from? Do we know the license associated with it? Who's the contact person for it? All kinds of things that are crucially important, but are complicated. And by just focusing on the artifact dependency graph, we get things that can be embedded into the compilers, the linkers, the runtimes, et cetera.
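The capture step Ed describes can be sketched in a few lines. This is a simplified illustration, not the actual OmniBOR wire format: `gitoid` computes a Git-style blob ID (hash over a `blob <length>\0` header plus the content, which is how Git indexes objects), and the input-manifest layout and file contents here are invented for the example — the real spec defines sha1/sha256 variants and a structured manifest:

```python
import hashlib

def gitoid(data: bytes, algo: str = "sha1") -> str:
    """Git-style blob ID: hash of a 'blob <len>\\0' header followed by the content."""
    h = hashlib.new(algo)
    h.update(b"blob %d\x00" % len(data))
    h.update(data)
    return h.hexdigest()

def input_manifest(input_blobs: list[bytes]) -> bytes:
    """Simplified manifest: the sorted gitoids of every input, one per line.
    The manifest is itself a blob, so it gets its own gitoid too."""
    lines = sorted(gitoid(b) for b in input_blobs)
    return ("\n".join(lines) + "\n").encode()

# Hypothetical inputs to a compile step: a .c file and the .h it includes.
hello_c = b'int main(void) { return 0; }\n'
hello_h = b'#define GREETING "hello"\n'

manifest = input_manifest([hello_c, hello_h])
manifest_id = gitoid(manifest)  # this ID is what would be embedded in the output
print(manifest_id)              # e.g. stored in an ELF section of the executable
```

The useful property is that the whole scheme is content-derived: anyone compiling the same inputs computes the same manifest ID, with no registry or coordination needed.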
And then you can map onto this skeleton the sort of flesh and organs of the metadata that you have in your SBOM. And so it gives you the ability to get an absolutely precise X-ray of your software all the way down to the source code files. And then you can reason about it at that level. You can understand, okay, Log4j's problem comes from the JNDI, mumble mumble, class. And so I can see from the artifact dependency graph that I have that in my running system, and so I should be concerned. And this gives you the ability to prioritize, because it may be the case that I have something with a vulnerable Log4j, but I'm not loading that class. And so I'm not actively vulnerable. So if I've got 2,000 instances that I need to go remediate and 50 of them have loaded that class, I know which 50 I should be addressing first. So the accuracy on where to go, look, and remediate becomes a key benefit. A key benefit there is the accuracy of where to go and remediate, but another key benefit is a lot of the people doing the construction of these software artifacts are in the open source community, where they don't need yet another step in their life. And by making this something that simply happens automatically because you built. I mean, it seems like- Hey, the idea is something simpler. Yeah, well, that's what I was going to get at. I think that, again, this is something I've been involved with all the way back to my IT days when I had to do this for compliance reasons at a financial services company, to understand, and also so if the software police came in, we knew what we were building and we were in compliance. But the thing that's very interesting is that I think there's not a lot of people who talk about getting rid of complexity. There's a lot of projects out there. How does this fit in with the stuff that the CD Foundation's doing and others that are out there? The intention is to sort of borrow from the UNIX model: do one thing and do it well.
So OmniBOR gives you the ability to construct this artifact dependency graph at complete precision all the way down the line. There are a lot more things you have to do to actually get to supply chain security, but there are many people working in those areas. So we don't concern ourselves with how you should sign these input manifests or sign your artifact dependency graph, because Sigstore does it really well. We don't concern ourselves with how you might want to produce other kinds of attestations along the line, because in-toto does that well. But we're complementary to them, because if I have an SBOM, I can reference the identifiers in the artifact dependency graph. I can index things and use an artifact dependency graph to generate an SBOM. If I'm using something like in-toto to produce attestations about who built what, how and where, who tested what, how and where, all they have to do is capture the Git object IDs of their artifacts, which you don't have to be in a Git repo to do. The actual computation of that hash is very simple. And now I've got back pointers into the skeleton that allow me to do all kinds of innovative things in other areas. And I've got to ask you, you mentioned skeleton a few times, you mentioned graph. Is it a graph database that's the key here, or is that the key secret sauce? Well, it's remarkably simple. The most common confusion we get is people who read into it and are absolutely convinced they don't understand, because it couldn't be that simple. That's me, by the way. It's something you could load into a graph database, but a graph database is really about how you store and make queryable information. The fundamental structure when you build anything is in fact a graph structure. It's a structure where object files have a bunch of input source code files. Where executables have a bunch of object files that have been linked into them. Where, you know, I've got Java files that compile to class files that get loaded by class loaders into JVMs.
And when you look at the structure of those, and you squint hard and you drop all the particular context of all the many different kinds of languages and environments we work in, you realize it's all just this graph. And so we provide a very simple way of capturing and characterizing that graph, and just that graph, none of the metadata information about it. So I can say things like: this CVE comes from this list of object identifiers for the source code files. And then I can see where I have those in running things. Because that's the key, right, is traceability. And I think we were talking about it with the CDEvents announcement that was earlier today, and how do you get and understand how something got to where it is. So that would seem to be, and you tell me if I'm right or wrong, a huge advantage to OmniBOR, that it gives you that traceability. It gives you that traceability, and it gives it to you in a way that's actually usable. One of the tensions that you have in SBOM always is between wanting to publish the information out to the broader world so they can use it, but not wanting to reveal too much of the secret sauce of what you're doing. Because the artifact dependency graph is just a bunch of opaque identifiers, the fact that I see an identifier there for some proprietary piece of software, some proprietary source code file, doesn't mean that I know the name of the file. It doesn't mean that I know what's in it. It doesn't mean any of that stuff. And so you can reveal the parts that are actually interesting and crucial to your customers, or that may become interesting and crucial to your customers, without having to have the difficult conversations about exposing proprietary information. Yeah, that's key. And I think that scares a lot of people off from SBOMs, is like, you know, how much of my proprietary stuff do I put out there? How do I actually make that public, to what levels, and understanding of what severity as well?
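The "hashes all the way down" idea is easy to picture as a toy graph walk. A hypothetical sketch — the IDs and graph shape here are invented for illustration (real OmniBOR identifiers are Git-style object hashes), but the query is exactly the one described: given a CVE pinned to a source-file identifier, which deployed artifacts actually contain it?

```python
# Hypothetical artifact dependency graph: each artifact ID maps to the set of
# IDs in its input manifest. Real IDs would be opaque gitoid hashes.
adg = {
    "exe1":  {"obj_a", "obj_b"},
    "obj_a": {"src_jndilookup", "src_main"},   # object file built from sources
    "obj_b": {"src_util"},
}

def depends_on(adg: dict[str, set[str]], root: str, target: str) -> bool:
    """Walk the graph of opaque identifiers; no names or metadata required."""
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adg.get(node, ()))
    return False

# A CVE advisory pins the flaw to a specific source file's identifier.
vulnerable_src = "src_jndilookup"
affected = [a for a in ("exe1",) if depends_on(adg, a, vulnerable_src)]
```

Because the traversal only touches identifiers, a consumer can answer "do I ship this vulnerable file?" without ever learning the names or contents of any proprietary files in the graph.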
Is there any linkage to, because you talked about it, hey, we don't load that class, for instance, or we're not using it, the package is there, but it's not installed or something of that nature. Part of what gets interesting is, if you think about it, and this is something we don't talk about enough, but if I give you an executable and it has an artifact dependency graph, unless that executable is statically linked, it's not actually what's happening at runtime. It's a part of what's happening at runtime. It may link in other things dynamically at runtime with their own artifact dependency graphs. OmniBOR gives you a way to talk about that. So, one of the examples I will sometimes describe is, if I tell you that I have a package dependency on coreutils, which contains dozens of binaries, I've told you too much. But if I've got a running executable from that, I can see just the things that it depends on, not necessarily the other things that package depends on. Packages fundamentally, components as we talk about them, are bags of stuff, they're parts bins. They're not actually things in a concrete sense. And so, by being able to say, I'm going to talk about the artifact dependency graph of this executable that sits on the disk versus what it's like when it's running, particularly with dynamically loading languages. Like Java, it's all about what's dynamically loaded by your runtime. Node, it's all about what's dynamically been loaded into your Node instance. And I don't care nearly as much what's in the file system in your container. I mean, I may care, as a matter of hygiene, I may want you to remediate it, but what I really care about in terms of prioritization is, am I vulnerable because I'm actually running that code? Right. So, this is, I get that piece, I see definite benefits, so I've got to ask you though, where did the origination of this idea come from?
Obviously, you mentioned you built Cisco's SBOM facility, was it, you had an itch to scratch and you saw the need, was there an existing problem with SBOMs today? We heard from earlier guests on theCUBE that a lot of people are building SBOMs, but they don't do anything about the consumption side. And so, we're hearing a lot, I'm hearing a lot about SBOMs kind of aren't up to snuff. Or is that the issue, or what's the core problem? SBOMs at their core, so a lot of this came from recognizing some of the problems and limitations of SBOMs. So, SBOMs fill a very important ecosystem role, but they're intrinsically messy data. So, if I'm talking about, as the fundamental identifier for the thing that I'm dealing with, a component name and a version and a supplier, which are the fundamental fields in an SBOM, well, for OpenSSL, you capitalize the O, I don't. I capitalize the O, but I capitalize the SSL part, you don't. When you're talking about the version, you put little tiny Vs in front of your versions because you find it aesthetically pleasing, neither of us do. This guy, when he's got something that's X dot Y dot zero, just clips the dot zero off the version. I don't. And so, just the simple matter of how do you identify even a component. I use vi, what do you use? It becomes very, very hard to produce consistently usable data, and part of that is we're using things that mean things to humans, but that are imprecise, right? And so, an identifier for a thing should be canonical. Everybody presented with the same binary should agree on the same ID. They should be unique. Every unique artifact should have its own. And they should be immutable. If I change the thing, its identifier should change. And that's what we do in OmniBOR, and that's quite distinct from the kinds of valuable information that SBOM brings, which intrinsically is just a larger and messier problem space. And so, some of what makes SBOM useful is also some of what makes SBOM unuseful.
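The canonical/unique/immutable properties Ed lists fall out of deriving the identifier from the bytes of the artifact rather than from human-chosen names and version strings. A minimal sketch, assuming a Git-style blob hash as the ID scheme (the vendor names and library bytes are made up for the example):

```python
import hashlib

def artifact_id(data: bytes) -> str:
    """Content-derived ID (Git-style blob hash). Canonical: anyone given the
    same bytes computes the same ID, regardless of what they call the thing."""
    return hashlib.sha256(b"blob %d\x00" % len(data) + data).hexdigest()

# Two suppliers describe the "same" component inconsistently in their SBOMs...
same_bytes = b"...identical library bytes shipped by both vendors..."
sbom_entry_a = ("OpenSSL", "3.0.8")   # one naming convention
sbom_entry_b = ("openssl", "v3.0.8")  # another; name-based matching fails

# ...but the content-derived IDs agree, because only the bytes matter.
id_a = artifact_id(same_bytes)
id_b = artifact_id(same_bytes)

# Immutable: patch one byte and the identifier changes with it.
patched = same_bytes + b"\n# backported fix"
id_patched = artifact_id(patched)
```

So where name/version/supplier matching needs normalization heuristics, content-derived IDs can be compared with simple equality, which is what makes the graph machine-reasonable at scale.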
It seems that your approach and the thinking of the team is to make it more scalable, more operationalized at scale, with multiple parties involved. And something that can be reasoned about more intrinsically by machine. And that's where AI will come in. You can parse through the dependencies, look at priorities, maybe put a workflow together. Good, clean labels are key if you're going to train your AI model, and we've got very, very simple labeling in an artifact dependency graph. And then you can look at the messier reasoning that comes about with the metadata that you're constructing the SBOM around. My final question for you, first of all, thank you for coming on. I know you're super busy and appreciate you always coming on, great resource to share with the community. I need to understand your view on AI. Good, bad, ugly for the community. How does open source, are they ready for the tornado that's coming? Or is it a tornado? Or is it just another tool? AI is strange. So, my interactions with it thus far, everything I've actually asked ChatGPT about that I know a lot about, it gets wrong. It seems to be beyond deceitful. It will fabricate information when it doesn't know. And for people who sort of think that AI is thinking, this is highly confusing. But AI is sort of like, and I'm sure we've all encountered them, there's a certain species of person who is extraordinarily eloquent and entirely vacuous. And we sometimes mistake the fact that they're able to speak eloquently for them actually knowing anything. And that's the state of AI right now. I think we call them politicians. Dave Vellante says it makes an average writer look like they went to an Ivy League school. Exactly. And so, like, I don't expect AI, like, there are plenty of people predicting that AI is going to put programmers out of work.
But it's auto-generating code, I mean, there is some automation mindset, I mean, put ChatGPT aside, as you have more augmentation to the human element, writing code, doing heavy lifting, this kind of thing, it has opportunities and some challenges. But augmentation is the key word there, right? Because the people I know who are getting huge productivity shifts in programming from utilizing AI as an assistant, it's because they know their space deeply and they can catch the errors the AI makes. And it's often also because the code they're generating is very narrow in its scope. It's dealing with a very small problem. I don't see nearly as much success where I have to design a system that's more complicated. It's not a crutch, it's not the silver bullet either. It can get you going, help you brainstorm an article, it can help you brainstorm an approach. It's about as useful as an intern. Yeah, it's a pretty good intern as far as I'm concerned. I'll delete that when redundant. But it's in its wheelhouse there. Its wheelhouse is saying things well. And it's extremely good at that. Yeah, well, like I said, I'm worried about our job being replaced here on theCUBE. And finally, how do people get involved in the project? Put a plug in for OmniBOR and what you guys are trying to do. Take that last minute to explain what's going on and how people get involved and what's your north star. Absolutely. So OmniBOR has a website at omnibor.io. We have a Twitter handle at OmniBOR, and that's Omni, B-O-R, bill of receipts, for Universal Bill of Receipts. We have a vibrant community of people working- Hold on, you want to tell them, the autocorrect goes to OmniVor. Be careful, make sure it's OmniBOR. Yeah, we're not here to eat all the things. Yes, the autocorrect will get you. We have a very vibrant community of people you can plug into via the Twitter handle or the website who are writing a variety of tools.
We've got people who have done work that's eventually going to be upstreamed for the major C compilers like LLVM and GCC, for the major linkers, LLVM has one, GCC has one. We have people doing work to instrument JVMs to capture OmniBOR data as they run, similarly for Python. We've had involvement in the community from folks in Rust and Go and other languages. So what we really need in terms of community participation right now is people who are interested in producing work to upstream into all those communities for the build tools, and then people on the consumption side doing creative work in terms of consuming that data to produce actionable results. And the benefits of having a clean, solid SBOM standard is reliability, security, consistency. Reliability, consistency, security. I've talked to people who want to use it to identify source code files that have well-known performance issues. So you can sort of look at your code and say, well, if it's not performing, before we start throwing profilers at it, do we have places we just know we're broken? Because it turns out security issues are just one flavor of bug. The world is full of all kinds of other flavors of bugs, and if you want to ask yourself what kinds of things do I need to fix or improve, the very first step is knowing what you have. Yep, yep. Ed, we'll leave it there. Ed Warnicke, here on theCUBE, back again. CUBE alumni, Distinguished Engineer at Cisco Systems, co-founder of OmniBOR. Check it out, the project's got some legs. SBOMs, super relevant in this cloud-native world. We're going to need that with more containers, more Kubernetes, more microservices, all happening now, all the explosion of cloud and open source. It's theCUBE, of course, bringing all the coverage here in beautiful Vancouver at Open Source Summit 2023. I'm John Furrier with Mike Kessrich, Rob Strechay, co-hosts. We'll be right back.