This is going to be one of the more difficult talks that I have given. Not because of the things that I'm about to share, but because of the things that I need to keep to myself. And you might be asking, "Why? What is she talking about?" I'm a keynote speaker on the KubeCon stage in two days, and my research team and I have been working relentlessly on this report, this thought leadership article on the top publicly available containers on Docker Hub. In the last couple of weeks we have been burning the midnight oil, and I'll be sharing those results on stage right after we publish the report online tomorrow. I'll be talking about the high-level insights, but I wish this conversation were a couple of days after that point, so that we could do a first-principles deep dive into the findings and talk about what we found in that research. But today is not going to be that day. My growth team didn't allow me to share a lot of those insights.

The cool thing is that I would like to give you an insight into the journey that we embarked upon. We published the same report last year, so I'll talk about that a little bit, and I'm going to share some insights. Being a data scientist, I did some sneaking in and pulled some clusters in the data set, some samples that will signal the findings, so we will go into that. I know that I might be able to get you excited about that, but my hope is that you download that report and maybe come to the KubeCon keynotes so that you can hear the message. My goal for that talk is basically just to get out of the way of the message, because, and I don't say this lightly, this time I think the data speaks for itself.

Before I go into any detail, I would like to talk a little bit about myself and Slim.AI, the startup that I work for. Before I do that, though, can I see a show of hands? How many of you have heard about Slim.AI or DockerSlim? Oh, impressive, several people.
So, Slim.AI. Or maybe I should start with DockerSlim, because that's our open source genesis, our DNA. DockerSlim is a popular GitHub project on container optimization. It has been on GitHub for about seven years now, with a lot of stars; we passed 15,000 stars a while ago. Again, it is our open source genesis and DNA. With Slim.AI, we take this technology and build upon it. This is the core foundation that we are extending, making it simpler, more secure, layered, and more efficient with Slim.AI, our SaaS platform.

At Slim, we are scanning hundreds of thousands of containers regularly. This year alone, we have seen more than 900,000 unique containers in our SaaS platform. As a data scientist working with this data, with my research team doing the same, we observe these containers and deconstruct them. We try to understand what makes these containers developer friendly and production ready. Last year around this time, we published this report, the Slim.AI Top Public Containers Report 2021. I will go into the details a little bit, and again, I'm going to talk about the delta between then and now.

But before doing that, I would like to say something about the data. This is the type of data that, as a data scientist, I really enjoy, because it's dry, hard, cold facts, right? I can go into the system, pull the data, look into the trends and whatnot. But for me, although data is beautiful, it is decision intelligence that matters, and for that you need to look into the people and the processes, the data's impact on people. To be able to do that, this year we partnered with a research firm, Dimensional Research, and carried out a global randomized survey, asking questions to developers and DevOps engineers. Being a data scientist, looking at people, I see data points. I'm joking.
I would like to ask you some of the questions that we asked those developers, and I would love to hear your opinions. I would basically like to take that survey for a test drive before we go into the technical results.

There were multiple questions, but I picked a few. The first one was simple, a soft one: would you like to know more about containers than you do today? It might start from a very simple conceptual understanding of containers, but it extends to a deeper understanding of container slimming and hardening, and of how containerization works at scale. We were looking for these things, but I would love to hear your answer on this one. So, repeating the question: would you like to know more about containers than you do today, or do you think your knowledge is pretty good already? Okay, some people want to learn more. 92% of the people that we asked admitted that they would like to learn more about containers.

This one is the sequel to that question, and it was eye-opening to me. The question says: which of the following do you fully understand about containers? You can see that most of the developers said they conceptually understand how containers work, container deployment, creating containers, and whatnot. But when it comes to more advanced topics, like all the components within a container or container orchestration, those percentages start dropping. And when we asked about the hardening and slimming of containers, only one in four developers said they understand these areas fully.

Another question: did your company ever use publicly available containers? Can I see a show of hands?
Publicly available containers? That's less than what I saw in the data. 66% of the developers that we asked said, "Yes, we still do." 15% said, "Yes, we don't use them anymore, but we did use them in the past." And only 19%, less than 20, said, "No, we have developed all of our containers in house."

Next one, we're getting close. In your experience, is it challenging to ensure containerized applications are free from vulnerabilities? What would you say? Removing vulnerabilities from containers? Okay, some. This was an overwhelming yes in the survey. People said it's hard to remove these issues from the containers. I want to fast forward to this one. People said the complexity of these containers, the numerous components with dependencies in them, was the number one contributing factor for why these vulnerabilities were hard to deal with. Others mentioned that it takes a lot of time, and some people mentioned manual processes. Actually, the manual processes one was interesting, because more than 50% of the developers who answered said they are slimming and hardening their containers via these manual processes.

And one last question. There is a lot of discussion in the industry about this one. Some people think that it's a land that's out of reach; others think that it should be enforced. But I would like to ask you and get your opinion. Are your customers or other end users demanding that your containers have zero vulnerabilities? Can I see a show of hands? Customers asking for zero vulnerabilities in their containers? Very few. Okay. In the survey, 70% of the people said yes. That covers end users, whether it's other departments in our company or our customers.
They're asking for zero vulnerabilities.

So let's move forward to the research that we have done. This is more of the hard-core quantitative analysis part. Before I go into the details of the results, I would like to give you a quick overview of the methodology. We went into Docker Hub and pulled the top 165 containers. You might think that's a very small sample size. After all, in Docker Hub there are more than 10 million images, with more than 320 billion in pull volume, right? So 165 may seem like a handful. But together these containers are a huge sample in terms of pull count, because they account for more than 30% of the pull volume. There are certain containers in the sample set that have been pulled five, six, seven billion times.

After we pulled the latest versions of them, we sorted them into nine different categories. You can see some of them in that circular dendrogram, which may be too small to read. We have programming languages, general-purpose categories, and special-purpose categories like local development, infra, and data science; those are the high-level categories. We scanned them using standard open source tools. Some of my favorites come from Anchore. I'm a huge fan: I use their Docker extension, and the Syft and Grype tools for SBOM and vulnerability analysis. We also used DockerSlim's xray tool. The questions that we were trying to answer were: are these containers easy to use? Are they efficient and safe? And will they cause any issues when we push them to production with our applications? We have been looking into these regularly, scanning them every day and doing a time series analysis of how the vulnerabilities are evolving in these containers.
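The sample-size argument in the methodology can be checked with a few lines of arithmetic. This is only a sketch using the round figures quoted in the talk; all four are stated as "more than", so the results are rough lower bounds, and the function name is made up for illustration.

```python
# Round figures quoted in the talk (each one is a "more than" lower bound).
TOTAL_PULLS = 320_000_000_000   # total pull volume on Docker Hub
SAMPLE_SHARE_PERCENT = 30       # share of pulls attributed to the sample
SAMPLE_SIZE = 165               # containers in the sample
TOTAL_IMAGES = 10_000_000       # images on Docker Hub


def sample_coverage(total_pulls, share_percent, sample_size, total_images):
    """Return (pulls covered, mean pulls per sampled container, fraction of images sampled)."""
    covered = total_pulls * share_percent // 100
    mean_pulls = covered / sample_size
    image_fraction = sample_size / total_images
    return covered, mean_pulls, image_fraction


covered, mean_pulls, image_fraction = sample_coverage(
    TOTAL_PULLS, SAMPLE_SHARE_PERCENT, SAMPLE_SIZE, TOTAL_IMAGES
)
print(f"pulls covered by the sample: {covered:,}")
print(f"mean pulls per sampled container: {mean_pulls:,.0f}")
print(f"fraction of all images sampled: {image_fraction:.6%}")
```

Under these figures, a tiny fraction of the images accounts for tens of billions of pulls, which is why a sample of 165 can still be representative of what people actually run.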
At that time, here were the high-level findings. The first one was that there was this perfect correlation between the scan times of these containers and their sizes. For every 500 megabytes added to the system, we were seeing 50 seconds longer scan times. If it is a few containers being shipped by a developer every now and then, that's one thing. But we know that scale changes everything, right? If I ask you to buy me a pint of ice cream, that's one thing. If I ask you to buy me 20 million of them by next Monday without letting any one of them melt, it's a different problem. So at scale, these things start mattering.

The second finding was about the complexity in these containers. There are a few box plot charts on the right-hand side, where we're looking into special permissions, libraries, and packages in these containers. Let's do a deeper dive into one of them, the packages. On the x-axis I'm showing programming languages, web development, all the different categories that we have. On the y-axis you see the number of packages in these container categories. You can see that it is not just the outliers; even the averages are surprisingly high. In the programming languages category, for example, the average package count is about 350, and this number is supposed to be the tip of the iceberg. There are multiple studies on this, and I'm sure you might remember this one. It is 2019 research from Darmstadt University, where they looked into a specific ecosystem, the npm ecosystem, and they said the package reach of the top five packages was between 134,000 and 166,000 other packages. That is for a single package.
One of the top packages, obviously, might reach a hundred thousand other packages. So with a container having four hundred packages, each one connected to thousands, in certain cases hundreds of thousands, of other packages, you realize that the tip of the iceberg is an iceberg in this case.

Maybe one note to make here about the packages. You might think that I am proposing that we should be removing these packages, these tools, from these containers. I don't think in those terms. I think they make developers' lives really easy. They make it very experimental and fun to work with these containers. You can build, test, and deploy applications more easily with this type of tooling. But the problem is that they also represent an attack surface, and that implies a need to automatically remove these things, especially if they are redundant to the system. Otherwise, they will make it to production. So the point that I'm making here is that if we don't make the conscious effort to remove these unnecessary, redundant packages, we will be incurring tech debt down the road.

The third learning, building upon this packages category, is about the vulnerabilities. I'm not going to talk about the vulnerability count right away, because, and there are security experts here, I'm sure, and I have spent a lot of my career in the cybersecurity industry myself, the attack surface is more than a vulnerability count. There are the user permissions that we were just talking about, right? If the user is root, that can create a ton of issues. We talked about packages; there might be a ton of other issues that are important for an attack surface. But I'll say this: even the average count of vulnerabilities was concerning to me, and the outliers obviously were too high. In fact, some of these had 2,000 vulnerabilities. I was looking at these and I thought, if a DevOps engineer has to ship some of these containers, where should they even start, right?
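The iceberg point about packages comes down to the gap between what is installed directly and what is pulled in transitively. Here is a minimal sketch of that idea, assuming a toy dependency graph; the package names and the graph are hypothetical and are not data from the report or from the npm study.

```python
from collections import deque

# Hypothetical toy dependency graph: package -> its direct dependencies.
DEPENDENCIES = {
    "app": ["web", "log"],
    "web": ["http", "json"],
    "log": ["json"],
    "http": ["socket"],
    "json": [],
    "socket": [],
}


def transitive_dependencies(graph, package):
    """Return every package reachable from `package`, excluding the package itself."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, []))
    seen.discard(package)  # guards against cycles that lead back to the start
    return seen


direct = len(DEPENDENCIES["app"])
total = len(transitive_dependencies(DEPENDENCIES, "app"))
print(f"direct dependencies: {direct}, transitive dependencies: {total}")
```

Running the same traversal on the reversed graph would give the set of packages that depend on a given package, which is the other way to read how far a single package reaches.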
So we looked into the severity categories of these vulnerabilities and realized that 20% of them belonged to a high or critical severity category. My expectation was not more than five percent, and it was 20%. For this year, I can't give you numbers, but it increased significantly. You can see that across all categories, from web development to programming languages, there are a lot of critical and high severity vulnerabilities. The exception is the base OS category, which has some vulnerabilities, but definitely not in the critical and high categories that we had here.

So let's focus on one of those categories, the programming languages. This category seemed to be very much in line with the averages that we were seeing. So it is not necessarily the most vulnerable, or the highest in terms of package counts and whatnot, but it is also not innocent. Let me pull a few things and start comparing 2022 with last year, 2021. I pulled Golang, Node, Python, and Rust. I could have pulled many more. Are these in line with the things that you work with? Are there any other programming languages that you would love to see in such a comparison, or are these interesting enough? Good.

Okay, so in this chart I am just showcasing the number of vulnerabilities year over year, 2021 versus 2022, for Golang, Node, Rust, and Python. As you can see, the only one that seems to have improved slightly is Node. No, not slightly; it seems substantial. As for the other ones, Python has increased by about 50 percent, Rust by about 50 percent, and Golang by 20 percent. So all three of them have more vulnerabilities, and it is not as if we haven't seen certain CVEs resolved from the system. For each and every one of them, we have seen that certain CVEs were being removed, but there were more CVEs, three, four, five times more, and in this subset three times more, getting added into the system.
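The year-over-year comparison separates two things: which CVEs disappeared and which ones appeared. A small sketch of that bookkeeping follows, assuming each scan is reduced to a set of CVE identifiers for the same image. The identifiers below are placeholders, not real CVEs and not numbers from the report.

```python
def compare_scans(previous, current):
    """Compare two sets of CVE identifiers from scans of the same image."""
    if not previous:
        raise ValueError("previous scan must contain at least one CVE")
    resolved = previous - current   # present last year, gone this year
    added = current - previous      # new this year
    percent_change = (len(current) - len(previous)) / len(previous) * 100
    return {
        "resolved": len(resolved),
        "added": len(added),
        "percent_change": percent_change,
    }


# Placeholder identifiers for illustration only.
scan_2021 = {"CVE-A", "CVE-B", "CVE-C", "CVE-D"}
scan_2022 = {"CVE-C", "CVE-D", "CVE-E", "CVE-F", "CVE-G", "CVE-H"}

result = compare_scans(scan_2021, scan_2022)
print(result)
```

In this made-up case two CVEs were resolved while four were added, so the total still grows even though remediation happened. That is the pattern described above: fixes are real, but additions outpace them.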
So our remediation rate is much slower than our detection rate for these containers.

This slide looks into the severity levels. I hope you can see the details here. It's the same order, with Golang and Node on top, now including 2022. This time you're looking at the percentages of the vulnerabilities that we have seen. For example, we said Node has fewer vulnerabilities this year. Look at the 2021 vulnerability distribution for Node, and you will see that there are a lot of low and negligible category vulnerabilities. If you look at this year, the level of high and critical vulnerabilities has tripled. So in a year, yes, Node made progress in terms of removing a ton of vulnerabilities, but then we also added three times more high and critical vulnerabilities, which casts a shadow on the improvement that we made. For the others, as we already said, they have seen more vulnerabilities: some were resolved, but a lot more have been added. And when you look at Python, for example, the percentages of critical, high, and medium incidents are also much higher compared to last year.

I can give you a deeper dive later on, when we publish that report, but the main conclusion that I had here was that even when you look into specific categories with very strong communities, and sometimes containers that have a strong enterprise behind them, we do not see a lot of improvement, especially for this category. For programming languages in general, I can say that we are no more secure than we were in 2021. And this is in the aftermath of multiple security incidents last year and an intense focus on software supply chain security. Now,
I'm hopeful that, as an industry, we will be making the right decisions, and the conversations have started. I do think that this container landscape is providing a ton of opportunities to developers in terms of scale, but you can see that it is also presenting certain risks. Was there a question? Okay. We will be doing more and deeper research, hopefully leading to actionable findings. But I feel like we need to start thinking a lot more about how to improve as an industry.

What do we want as an industry, right? Our customers want zero vulnerabilities, or to get close to that dreamland. Our developers want to enjoy coding, not to be overwhelmed by infrastructure. They should not need a PhD in infrastructure to understand every little detail, all the complexities of these containers. But we also want to push production-ready containers to production. To be able to do that, for us at Slim, we think you should know what's inside your containers and push only what's needed to production. We believe in automation through intelligent optimization, and we are seeking to solve this problem for all of the world's containers. There are other teams with different approaches, and they're relentless and brilliant, and they're losing sleep over these problems. Because of that, I'm very hopeful about the future.

I'm hopeful, but I'm concerned. I think it is troubling that, after this much conversation, the trend is not heading in the right direction. But again, I do think that the more we talk about these, especially with these decision intelligence points, these analytical insights, the better the improvements we will be making. So with that, thank you so much for listening to me. I would love to hear more of your comments online, on Twitter and on LinkedIn. Let's connect and continue the conversation. Thank you so much for your time, and I'm looking forward to connecting at the conference. Can you take one question? Yes?
I think we have time. Thank you. That's a great point. Not all vulnerabilities are created equal, and some of them might not even be exploitable. I definitely agree with this. I have seen certain packages sitting in different paths in a container, for example, right? The same package can be present in the same container in 40 different paths, and several different packages can have the same CVEs. That said, the same CVE in a different package in a different runtime represents a whole different risk level. So even the same CVE, even the same package, might have a different security rating within a container itself. It's a very nuanced, detailed problem.

But try selling that idea to a government, a Department of Defense in any government, saying, "Hey, you know what, we have 1,000 vulnerabilities here, but we don't think they're exploitable." We might not think they're exploitable, but we know that they are there. It's uncomfortable, and there might be certain issues in the future. So I have a feeling that customers and governments will be demanding more of this. But we need to have a nuanced understanding of what matters, and especially we need to understand what the highest priority is so that we can make some progress, right? If you go into a container as a DevOps engineer and see 2,000 vulnerabilities, 40% high and critical, where do you even start? Which ones should I start removing? And it is not just one container that you need to work with in a company. In our survey, we asked questions about the scale; that can be 400, 4,000, or 400,000 containers in an environment, right? And everything is changing all the time, so it is a living organism. To be able to do that at scale becomes a huge issue. Without prioritization, just saying, "I'm going to have zero vulnerabilities, period," I don't think you can make anything work for you. But that's a great question. Can I take one more question, maybe? No one? Okay, any other
questions? Go ahead. So the question is: we saw 20% critical and high category vulnerabilities in these containers in the previous report. How did we go about scanning, finding, and labeling them? Yeah, depending on which scanner you use, you get different results. Oh my god, I have pulled my hair out so many nights just trying to figure that question out. I'm so glad that you asked.

There are certain scanners that don't even reveal certain CVEs; some others do. Some people say certain scanners are over-counting things, like counting the same CVE in the same package on a different path as a different incident. I think that shows it is an occurrence, but it's not necessarily a unique CVE. So these scanners need to be standardized. Hopefully we will do that soon, because, again, I have spent so much time trying to understand and compare them. We are putting that into our Slim.AI platform so that you can easily diff these vulnerability scanners. The differences can be huge: the same container might have 1,000 vulnerabilities in one scanner and 2,500 in another, and they are both correct. Sometimes you need to take a step back and understand why, but you don't have the time to do this, because of the scale issue we just talked about, right? We need to standardize these, for sure. I think I understand how each and every scanner works from a design perspective. I used Grype, Trivy, and several other tools, you know, the usual suspects. I understand their design principles, but they are very different from each other, and it's a headache. That's my answer, I guess. Thank you.
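The difference between an occurrence and a unique CVE is easy to show with a few lines of code. This is a sketch under the assumption that a scanner's output has already been reduced to (CVE, package, path) records. The record layout, the identifiers, and the paths are all hypothetical and do not reflect any particular scanner's format.

```python
# Hypothetical normalized findings: (cve_id, package, path).
FINDINGS = [
    ("CVE-X", "libfoo", "/usr/lib/libfoo.so"),
    ("CVE-X", "libfoo", "/opt/app/vendor/libfoo.so"),  # same CVE, same package, other path
    ("CVE-X", "libbar", "/usr/lib/libbar.so"),         # same CVE, different package
    ("CVE-Y", "libbar", "/usr/lib/libbar.so"),
]


def count_findings(findings):
    """Return (occurrences, unique CVE/package pairs, unique CVEs)."""
    occurrences = len(findings)
    per_package = len({(cve, package) for cve, package, _ in findings})
    unique_cves = len({cve for cve, _, _ in findings})
    return occurrences, per_package, unique_cves


def diff_scanners(cves_a, cves_b):
    """Split two scanners' CVE sets into (only in A, only in B, in both)."""
    return cves_a - cves_b, cves_b - cves_a, cves_a & cves_b


print(count_findings(FINDINGS))
print(diff_scanners({"CVE-X", "CVE-Y"}, {"CVE-Y", "CVE-Z"}))
```

The same four records give three different totals depending on the counting rule, which is one way two scanners can both be "correct" and still report very different numbers for the same container.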