Welcome back, everyone, to theCUBE's coverage. I'm John Furrier, with Rob Strechay, here for the Open Source Summit. Let's say cloud native, KubeCon, because we were just talking about it coming in. This is Open Source Summit 2023, and we've got a great guest: Ayse Kaya, Senior Director of Strategic Insights and Analytics at Slim.AI. She was also on the community panel yesterday with Liz Rice and team, talking about the conversations here and the keynotes at Open Source Summit in Vancouver. Ayse, great to see you. Thanks for coming back. Wonderful to be back. So Slim.AI is doing some great work around containers and container security. That's your wheelhouse: container security, a lot of reports. Give us a quick background on yourself and what Slim.AI does before we get into it. Sure. I am a data scientist, not a developer or a DevOps engineer, and I have been working at Slim for the last three years. Slim.AI is a SaaS platform; we do container intelligence, container optimization, and container security, and we scan millions of containers on a regular basis. It's a data scientist's dream to dive into the data and understand what's going on under the hood. And there's been a lot of talk about the security of containers, the software supply chain, and developer productivity, all threaded together. You've done a lot of reports over the years on container security. Is it moving in the right direction? Can you give us an update on where things stand? You recently talked about this at RSA just a few weeks ago. Yes, I spoke there as well. What's the current state of the art in container security? So let's do this: let's focus on recent and present trends, but let's also extrapolate into the future and talk about generative AI and its implications for cybersecurity, which was a core topic at RSA and is important here as well.
So, John and Rob, I think you'll agree when I say 2022 marked a turning point for software supply chain security, in the aftermath of multiple security incidents. Because we are scanning all these containers on a regular basis at Slim.AI, we have seen the magnitude of the challenge through a quantitative lens, as well as how we are doing in terms of our response. During and after 2022 there was an industry-wide, renewed sense of awareness, and we have seen a lot of effort put into vulnerability detection and remediation. Customers and governments started asking for zero vulnerabilities. But here is the bottom line up front from all the reports we've published in the last 18 months: there is not a single container category or distro that has fewer vulnerabilities than before. And it is not just the number of vulnerabilities; the complexity has been rising significantly as well. The number of components, the packages, the special permissions, libraries, licenses, even the sizes of these containers and the metadata about them, have all increased significantly in the last 12 months. It is not that we are not remediating vulnerabilities; we are. But for every CVE we see remediated, we see four new CVEs added to the system. And the repair cycles I talk about are very slow: when we detect a CVE in, say, the top publicly available containers, the likelihood that it gets resolved in the next 180 to 195 days is less than 20%. So it's very, very small. And this is without AI-generated code, the coming influx of new code. What I'll say, without fear-mongering, is that what we are doing is not working. We do not seem to be keeping up with the challenges, and we are definitely not ahead of the curve. When you say "we," you mean the industry? As an industry, yes, as an industry.
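The dynamics Ayse describes, four new CVEs for every one remediated, and under a 20% chance a detected CVE is resolved within roughly 180 days, can be sketched as a simple back-of-envelope model. The figures come from the interview; the model itself (and the independence assumption across cycles) is purely illustrative.

```python
# Back-of-envelope model of the CVE dynamics described above.
# Figures are from the interview; the model is illustrative only.

remediated_per_period = 1      # CVEs fixed in a scan-to-scan window
added_per_period = 4           # new CVEs appearing in the same window
net_growth = added_per_period - remediated_per_period
assert net_growth > 0          # the backlog grows even as remediation happens

# Probability a CVE found in a top public container is resolved
# within ~180-195 days, per the scans quoted above ("less than 20%"):
p_resolved_180d = 0.20

# Expected fraction of today's CVEs still open after two such cycles,
# assuming (simplistically) independent resolution per cycle:
still_open = (1 - p_resolved_180d) ** 2
print(f"~{still_open:.0%} of today's CVEs likely still open after a year")
```

Even with the generous 20% upper bound, roughly two-thirds of today's CVEs would still be open a year later, while four times as many new ones arrive.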
Again, we are looking into the problem by scanning all these containers and seeing where they are going: publicly available containers, private registries. We scan a huge portion of the landscape. And the scan is only one part of it. Yes, and then my team decomposes these containers to understand what's going on inside, in an effort to understand what makes up these containers. So we as an industry are not doing well. Yes. Okay, continue. But what I want to focus on is the story behind the story. It's not that everything is going in the wrong direction. We are putting in the right effort, but the way we are trying to solve this is by throwing more humans at the problem. So humans are the bottleneck. Well said. You need machines to scale up; there's more data coming in than humans can handle. Yes, let me give you data to support that hypothesis. Look at the roughly 55,000 packages in the top publicly available containers. 14% of the packages account for all the CVEs; 1% of the packages account for 25% of CVEs; 86% of packages have zero vulnerabilities. And these are not just noise; this is software at the core of these systems. So 86% of the code we see has no CVEs attached to it. And when you look at which packages are vulnerable, you realize that the most popular packages have the most CVEs. Does that mean the rest are fine? I call this the popularity trap: the more popular a package is, the more CVEs it has. Does that mean they are the riskiest software? Definitely not. What it says is that the things under the spotlight get more attention, and we notice their imperfections faster. But there is a huge amount of software that never goes under security review. Right.
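The "popularity trap" numbers quoted above can be checked with simple arithmetic. The 55,000-package universe, the 86% zero-CVE share, and the 1%-of-packages / 25%-of-CVEs split come from the interview; since 86% of packages carry zero known CVEs, the remaining 14% account for every CVE by definition.

```python
# Sketch of the concentration figures quoted above, using the
# interview's numbers for packages in top publicly available containers.
total_packages = 55_000

pct_with_zero_cves = 0.86          # 86% of packages: no known CVEs
pct_explaining_all_cves = 0.14     # the remaining 14% account for every CVE
assert abs(pct_with_zero_cves + pct_explaining_all_cves - 1.0) < 1e-9

top_pct = 0.01                     # the most popular 1% of packages...
cve_share_of_top = 0.25            # ...account for ~25% of all CVEs

top_package_count = int(total_packages * top_pct)
print(f"{top_package_count} packages carry ~{cve_share_of_top:.0%} of CVEs")

# How over-represented the popular 1% is among CVE counts:
concentration = cve_share_of_top / top_pct
print(f"popular packages are ~{concentration:.0f}x over-represented in CVEs")
```

A 25x over-representation of the most-pulled packages is consistent with the point being made: scrutiny, not inherent riskiness, is what drives CVE counts up.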
I think that's the interesting thing: to your point, just because there isn't a CVE doesn't mean that package isn't vulnerable. And on generative AI, one of the things we've been talking about with teams here is that because it's trained on code that may have bugs or as-yet-undiscovered CVEs, if you use generative AI to build the next set of software, you're injecting those flaws without even knowing it, because you're taking the code as is. Is this something you're seeing as a second wave of that? You said 1% of packages account for 25% of vulnerabilities. A lot of percentages there. So a small number of packages account for most CVEs. Right. And as you said, that's because people are actually using those packages, so the issues get noticed often. Is that 1% usually the core infrastructure pieces, the Linux distros and things of that nature? This is the set of top publicly available containers, with billions of pulls. There is a variety of distros here, general-purpose as well as special-purpose containers, so it is representative of a very large portion of real-world usage. What about the bloat? Having spoken with you over at KubeCon and CloudNativeCon. I'm going to screw it up now too. Now you've got me going. KubeCon, I know. So while we were over there, part of the conversation was about the bloat as well. Are you seeing, and able to see, things that are simply not active in these containers? That bloat aspect is intriguing to me.
So when we harden and slim containers, the percentage of packages removed is in the range of 50 to 90%, depending on the type of application and the type of container we are slimming. More often than not, a significant percentage, a majority, of the packages being shipped into production is excess. And by the way, a ton of those packages have zero known vulnerabilities, largely because we simply don't do security reviews on most packages. A ton of it is excess, and that's the main point here. We cannot fight this complexity without automation. One of the interesting things I found by running a global randomized survey of developers and DevOps engineers was that the majority of companies are trying to solve this problem of excess packages, slimming and hardening, through manual processes. If vulnerability detection, hardening, and slimming are manual at this day and age, at this scale of problem, we are basically swimming against the current. And organizations that focus only on compliance, and we are even having a hard time doing that, saying "we are working toward zero vulnerabilities," are missing the big point. The big point is that there is a lot of excess software being shipped into production, and the uncharted territory of unknown vulnerabilities is huge. Bottom line up front: we cannot solve this problem by doing what we are doing with more people. So how do you scale up the machines to close the gap in container security? You've got SBOMs, dependencies; a vulnerability might not exist in the database but might be in the container. All kinds of new things come up. There are new approaches, people are trying new things. How do you automate the machine side of it, at scale?
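To make the 50-90% removal range quoted above concrete, here is a minimal sketch of what slimming does to the package count a container ships. The removal rates are from the interview; the starting package count and the helper function are hypothetical, for illustration only.

```python
# Illustrative math for the slimming figures quoted above: hardened
# ("slimmed") containers drop 50-90% of their packages, and since most
# packages carry zero *known* CVEs, the big win is cutting unreviewed
# attack surface, not just CVE counts.

def slimmed_package_count(original_packages: int, removal_rate: float) -> int:
    """Packages left after slimming at a given removal rate (0.5-0.9 typical)."""
    assert 0.0 <= removal_rate <= 1.0
    return round(original_packages * (1 - removal_rate))

original = 400  # hypothetical package count for a typical public base image
for rate in (0.5, 0.7, 0.9):
    kept = slimmed_package_count(original, rate)
    print(f"{rate:.0%} removal -> {kept} packages shipped to production")
```

Even at the low end of the range, half the shipped software never needed to be in production at all.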
So machines, AI, I think are going to create a lot of problems before they solve a lot of problems for us. Maybe something to highlight, because you probably noticed this at RSA as well: security teams are not exactly happy about the fact that anybody who can prompt can now generate code. Liz Rice was telling us yesterday that security teams were probably not happy their developers were writing code anyway, but now there will be more code than ever before, an influx of new code. And there are security researchers already starting to use AI to augment their research process, so they will find the needle in the haystack much faster. There will also be bad actors writing more robust malicious code, which we talked about yesterday. So there is this dual nature to AI. We will see a ton of new code, and we will probably detect more, especially as these systems get smarter. Yesterday we talked about AI starting to try to understand itself; OpenAI has said that GPT-4 is being used to understand GPT-2. There is that recursive self-improvement cycle. So it's going to first create a ton of code and a ton of CVEs, and the problem is going to scale to a point where we cannot handle it by throwing more security researchers at it. Then, I think, we will realize it all needs to be automated, and slimming and hardening should be part of that. Is there a best practice right now, in your opinion, Ayse? What's the current best practice? Know what's inside your containers and ship only what's needed to production. It's easier said than done; I don't want to oversimplify. But what we see is bloated containers in production systems, more often than not pulled from publicly available registries. And it's great that developers have fun with this excess of packages, experiment, innovate.
But when you ship those to production, you need to be very mindful about the attack surface you are shipping. Yeah, I think this is super interesting, because it's going to take analytics and AI and models that keep learning. We've had some people on from FINOS, the financial-industry folks contributing back into open source, and having come out of that world: you may have an application that uses a package just once in an entire year. So it looks like bloat today, but on April 15th, tax day, it isn't bloat, because that's the one day the package is actually used. What's your view on how we deal with the complexity of these custom applications built on top of open source packages? Testing the edge cases. But I don't think the problem right now is the packages that are rarely used. The problem is the packages that are never used, that are redundant, that are excess, and are still in production systems. And most of them have zero vulnerabilities in them. Zero known CVEs, I should say; there are definitely vulnerabilities in them. Zero-day opportunities in there too. If it's not on the list, it's a zero-day. And I hear ChatGPT is writing zero-days too. So again, code pollution coming in, vulnerabilities there. Docker says they're verifying containers, but I think they're just scanning. There's a need for verification. Is there going to be a blue check mark in the future? I'm just goofing on Twitter, but the point is people want secure validation, a flag that says, hey, this is secure. Verification is a good starting point; that's probably what we can do with it. But this is a cumulative process.
So even if you start with a blue verified tick, you keep adding things in and taking things out. This needs to be continuous: scanning, verification, removal of vulnerabilities. It needs to be an end-to-end system. You cannot just say "this is a good starting point" and then forget about what happens next. So we had the Open Source Security Foundation, OpenSSF, on theCUBE earlier. They have a whole foundation dedicated to this one problem. Is that good for the industry? Are you involved in that? What's the OpenSSF about? Josh Bressers was talking about how several different organizations are thinking about adding more security researchers to these open source projects, which is a great starting point. But being the data scientist, let me speak with numbers a little, so you can see why I still lean toward machines, deploying more digital minds into the problem, over more security researchers. Here's the thing: in the npm ecosystem alone, there are 32 million packages, and we're adding a million packages a month. Suppose we wanted to review just the top 10,000 open source projects, hire 1,000 security researchers, and do this on a regular basis. I think it was Josh who calculated that it would take around 3,000 years just to work through the npm ecosystem with that human effort. So yes, we need to start somewhere. Yes, we need to learn how to improve all the software we are putting out there, in containers, in open source, in companies and elsewhere. But given the influx of new code, throwing more people at this is not going to solve the issue. What we can do is collect more intelligence, learn about these systems, and scale from there by automating it, using generative AI, I hope.
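The scale argument above can be sanity-checked with simple arithmetic. The package counts and growth rate are from the conversation; the per-researcher review throughput is a hypothetical assumption chosen so the result lands near the quoted "~3,000 years."

```python
# Back-of-envelope check of the human-review scale argument above.
# npm figures are as quoted in the conversation; the per-researcher
# throughput is an illustrative assumption.
npm_packages = 32_000_000          # packages in the npm ecosystem (quoted)
new_per_month = 1_000_000          # new packages added monthly (quoted)
researchers = 1_000                # proposed human review workforce (quoted)
reviews_per_researcher_year = 10   # assumed deep security reviews per person/year

annual_capacity = researchers * reviews_per_researcher_year
years_for_backlog = npm_packages / annual_capacity
print(f"~{years_for_backlog:,.0f} years just for today's backlog")

# Meanwhile the backlog grows far faster than it can be reviewed:
annual_influx = new_per_month * 12
print(f"influx {annual_influx:,} pkgs/yr vs capacity {annual_capacity:,} reviews/yr")
```

At 10,000 reviews a year against 12 million new packages a year, human effort alone never catches up, which is the case being made for automation.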
So it's a data problem, an opportunity to use data. More data, not more people. What's your vision for data usage in container security? Well, I have the luxury of looking at tons of container scans and learning from them, and there is a ton we are not taking advantage of right now. We can get a ton of actionable insights and build recommendation engines: what's needed, what's not needed, helping developers do their best work with machine intelligence in the loop. So there's a lot of potential. But I'm also worried that because a lot of people who don't know about the security dimension will be generating code, that gap will get larger first. So it's going to be a journey we are on for a while. Ayse, it's great to have you on theCUBE. Thanks for coming on. One final question: tell us about your journey in tech. Sure. I am an AI ethics advisory board member at Northeastern, and I'm a data scientist. My career has been at the intersection of cybersecurity and AI for the last 15 years. I'm an MIT graduate; I studied emerging technologies at MIT, looking into technological forecasting, how new technologies grow over time. I should say that I'm very surprised by ChatGPT's and generative AI's hockey-stick curve, even though I've been in these circles for a while. As for my journey: I grew up in rural Turkey, and my parents were teachers. The fact that I'm here speaking to you right now, talking about open source technology, surrounded by brilliant minds, is humbling. I'm honored to be here. And I'm somebody who is just like you, John, I think: a cautious optimist. I love technology. I love numbers.
I enjoy looking into the numbers, but also navigating the waters, reading the industry's direction through a data lens as well as from a strategy and high-level insights perspective. Yeah, cautious, maybe not cautious. Optimist. Incurable optimist. You're cautious. Yeah, I'm reckless, everyone knows that. But I love AI. I think it's the most important movement, because the timing of everything, the perfect storm, is here: the compute, the supercloud, the data, and obviously the apps. AI is up and down the stack. It's really been quite a run. Yeah, and I think this is also a place where the data has to lead us, because there is going to be so much regulation pushed, and without the data, some bad decisions may be made on the regulation side. So the way you're digging into the data and presenting it back to the community is super helpful for the folks involved in some of these regulations. We were talking earlier with the people involved with the UN on the sustainability side; I think there's a play there for AI longer term as well. You're so right. And the thing that gets me even more excited beyond that is that, to me, it's a cultural match that is igniting a bonfire of innovation and excitement among smart people. I've read more academic papers in six months than I have in six years. Every time you turn around there's a new paper about vector databases and chaining, the success of LangChain, Haystack, all these new technologies. Even companies like theCUBE, with a language problem, are enabled. Everybody now has an opportunity to capture value, and open source has always been about giving more value than you take. I think open source is going to be rocked to its core by AI, and that will either cripple it or expand it rapidly.
I think the question we're watching is: the open source community wasn't built for this kind of velocity, so will the spaceship handle the new turbulence? What do you think? I think we need to be very thoughtful, deliberate, and intentional about the next steps here. But things are shifting so fast. Eight months ago on my ethics advisory board, ChatGPT wasn't even out, but we were very familiar with language models, and we were saying, this is like autocorrect; it's just playing the game of guess-the-next-word. A couple of months later, we started saying it is simple but surprisingly lifelike. A couple of weeks after that, we were saying this may actually steal Google's digital crown; there's something to this. And then all of a sudden, with AutoGPT and all the APIs, we are talking about AI becoming a real decision-making agent for us. Now I'm seeing another current. I was at MIT's generative AI summit last month, and Eric Schmidt was talking about how we are playing with fire when it comes to AI. Eric Schmidt. I have heard Max Tegmark talking about how AI is becoming a reasoning engine, and sentient, very fast. We are talking about superintelligence and whatnot. But I think I'm with Satya Nadella, the Microsoft CEO, on this topic. He is so optimistic that we are empowered like never before; we will do a lot more with less. What is happening is basically a user-interface revolution. The technology was here, but now users are enabled like never before. And I believe in humanity; I think we will do the right thing and take advantage of this. It's more a question of: are we ready? Are companies thinking about this properly? Are individuals thinking about this properly? Because there are a lot of opportunities here.
Well, the people we're finding here are pragmatic; they're ready. I don't think it's going to be a shock to the system. I think it's going to be an expansion, a glorious expansion for open source, a whole other level. Ayse, great to have you on theCUBE. Thanks for coming on; really appreciate it. Thanks for the extra insight into AI, and your cautious optimism is contagious. Thank you, I feel great. Thank you for having me. Rob, thank you as well. Okay, theCUBE's coverage continues on day two. We have one more day of our three days of wall-to-wall coverage. We'll be right back in Vancouver with theCUBE after this short break.