What a great honor to participate in a conference that combines two things that fascinate me: Latin America and open source software, specifically new technologies built with a cooperative, altruistic mindset through open, accessible, life-changing software. My first open source conference was a couple of months ago in Austin. I hope you're feeling the synergy that I feel at these open source conferences: the opportunity to learn from the best, ask questions freely, share heart-to-heart successes as well as failures, but more importantly, enjoy the presence of the like-minded, brilliant yet truly humble human beings that make up this community. I'm beyond grateful to contribute in my own small way today. So what I'm going to do is go ahead and share my screen and go into presentation mode. But before we dive into the technical topic that we have here, which is the container insights that my team at Slim.AI unveiled, I would like to share a very quick story on the genesis of my fascination with the Latin American tech space, and I think it's rather unusual. Back in 2009, I was taking this Power and Negotiation class at MIT, and as a data scientist and engineer, I was a bit skeptical about the effectiveness of this class. In the first lecture, the professor gave us a real-life case study on the 1999 Ecuador-Peru border dispute that could have led to a war, and he asked us to form teams and come up with a win-win strategy. Just as we were ready to present, he announced that we had a guest, and he opened the door saying, "Class, please welcome the prime minister of Ecuador at the time of the conflict. You will be presenting your recommendations to one of the top stakeholders of the issue at the time." You might guess that that changed the entire dynamic in that classroom. The hour that followed is probably one of the only lectures that I remember to a T in my entire student life.
We talked about the collapse of language that we call war, but more importantly, we talked about what was at stake: the rich and layered history, the diversity, the colors of the continent and all of its nations, the lasting contributions to human progress, and, being at MIT, the emerging tech scene. What would a war do to that innovative spirit, right? Ever since, I've been reading, researching, and learning more about the tech space in Latin America, and as a data scientist who follows tech trends closely, I keep getting more and more impressed. Honestly, just looking at how well the open source community is doing in Latin America, I hope that we can avoid war at all costs, everywhere, for countless reasons, one of them being the fact that we can only keep the innovative spirit up in peaceful environments. So with that little digression, let's get back to business. Today, we'll be talking about what we learned dissecting the world's most popular containers, and I would like to leave you with a few core messages. But before we do that, let's set the stage a little bit. We will go through a quick overview of why we do what we do at Slim.AI. I would like to quickly go over the container landscape and how it is evolving over time. We will be looking at growth through a lens of opportunities and challenges. Then we will do a deeper dive into the research insights, and once we have gone through them, we will start brainstorming about the future: what can we do as an industry to improve the status quo? Before I go any deeper, though, I would like to leave you with one core message, bottom line up front: this is a very complex landscape, and it is not getting any simpler without us intervening. At Slim.AI, and with DockerSlim, what we do is help developers build more secure containers faster, and we help organizations secure their software supply chains automatically. We have an open source DNA.
We have a very well-known open source project in the container optimization space called DockerSlim. It was invented by our current CTO, Kyle Quest, and it is about to surpass 15K stars, which brings it into the top 0.01% of repos on GitHub. It has been integrated into thousands of CI/CD systems across global companies and many teams. On the SaaS side, with Slim.AI, we are building on this very strong core and making it faster and more efficient, richer and multi-layered. And what we are doing is looking at the container landscape with a very quantitative, very analytical lens; data at Slim is our central nervous system. Just a tiny bit about myself and why I'm presenting today: I spent a substantial amount of my career in the cloud, in cloud cybersecurity, in startups and larger companies. I'm an engineer by training. My research and grad work focused on operations research and systems in industrial engineering and supply chain management, and that cybersecurity and supply chain management background is now allowing me to look at the evolving digital software landscape with a relatively wide angle, especially now that we are speaking about 2022 as the year of software supply chain security. So without further ado, let's start looking at that landscape. I'll say that it's yesterday's news that containers have become the norm. Some of you might remember this graph from the 2020 CNCF survey, which showed an undeniable increase in the usage of containers across development, testing, and production. In that bar chart on the far right, you can see how containers in the production environment increased from 23% to 84%. That was a result of organizations putting more trust in containers and leveraging them more in their user-facing applications.
Proof of concept was the only area where we saw a gradual decline over time, which basically meant that containers were less of an idea and were being adopted in production, in the real world. Fast forward to 2022. One of my favorite moments of the year is that time when Stack Overflow publishes their developer survey, because I think continually taking the pulse of the development community is key to understanding the true development trends across the globe. Earlier this year, when they published the results, one thing that wasn't surprising was that Docker came up as the number one most loved development tool, and it remained the number one most wanted tool. And speaking of Docker, last time I checked, which was only a couple of days ago, there were more than 9.5 million images on Docker Hub alone. And while it is still the most popular container registry out there, it is home to about half of those public containers. Between 2020 and 2021, the number of all-time pulls on Docker Hub nearly tripled, from 130 billion to 318 billion. I'll take a pause there, because think about how long it took to get to that first 130 billion (more than six years), and we nearly tripled it in a year. As a data scientist, I don't use terms like geometric or exponential growth lightly, but it is safe to say that containers are being used everywhere, all the time, for almost everything related to software, across all verticals. Combine this with the Gartner prediction that 70% of organizations will be running multiple containerized apps by 2023, next year, and I think it wouldn't be far from the truth to say that momentum is only growing here, with no sign of slowing down in the near future. And while this is great, and these atomic units of building and shipping software that we call containers are helping us with the agility, speed, and efficiency of our development cycles, it's not all roses.
I have heard experts talk about the turbulent age of container management we are living in. Thousands of these containers start powering our microservices and our user applications; we copy and paste code and shift these containers around. But what I have personally seen, in both startups and larger enterprises, is that you start losing control pretty easily, pretty fast. I would even go so far as to say that growth, I believe, has outpaced our capacity to understand and to manage by orders of magnitude, creating a gap in our ability to give this container management problem the good, efficient treatment it deserves. It is becoming harder and harder, as far as we can tell. And I say this not just from experience, but through analytical insights. At Slim, we care a lot about what's inside your containers. In fact, we process hundreds of thousands of containers on our platform daily, and we are constantly trying to understand what makes these containers efficient, secure, and safe, but we also want to make sure that they are easy for developers to work with. Since we began the company back in 2020, our data research team has been analyzing the most popular public containers, collecting data on every new version that we see on our platform and analyzing them through container analytics tools to better understand the challenges that developers face. There are a series of different analysis and research projects going on; in this specific presentation, I'll be focusing on a study we did of the 130 most popular containers available on Docker Hub. And just a little bit about the background methodology, for those data nerds like myself who are interested in the behind-the-scenes of this stuff: to select the containers, the criteria were that we looked at containers in common use. We looked at pull volume, Docker Official Image status, and popularity.
We also added a layer of qualitative selection based on the expertise of our in-house container experts, and we looked at the usage of our Slim.AI beta platform and captured the most scanned and investigated containers there. Now, you might be asking a question about the sample size: why start with the top containers? After all, 130 may seem like a handful given the vast population we just talked about. I will say this: some of the images we looked at in this study have been pulled one and a half to two billion times each. So if you think about the entire pull volume, which was 318 billion if you remember from a previous slide, these containers represent a significant percentage of all use, of all daily activity, on Docker Hub. From that angle, I believe the sample is statistically representative of the population, especially for the things that are used a lot. After selecting these containers, we grouped them into nine different categories, and the idea was to understand them better via these in-group, out-group dynamics. Within these categories, as you can see on the circular dendrogram on the right, we had more general purpose ones, such as data science, web development, and programming languages. We also had more special purpose ones, such as build tools, DevOps, and infrastructure. In terms of the process of analyzing these containers, what we did was model a real-world scenario of a developer going out there, finding one of these public containers, and using it for one of their projects. To that end, we developed a mock CI/CD pipeline, and we started pulling the containers and scanning them through the standard tools I listed here: xray from our DockerSlim tool that I mentioned, but also two great tools from our great friends at Anchore, Syft and Grype, for SBOMs and vulnerabilities.
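As a rough sketch of what that per-image scanning step could look like, here is a hypothetical Python rendering, assuming Syft and Grype are installed locally (`syft <image> -o json` for the SBOM, `grype <image> -o json` for vulnerability findings). The image name and helper functions are illustrative, not Slim's actual pipeline code.

```python
import shlex

def sbom_command(image: str) -> list[str]:
    # Syft emits a software bill of materials for the image as JSON
    return ["syft", image, "-o", "json"]

def vuln_command(image: str) -> list[str]:
    # Grype emits vulnerability findings for the image as JSON
    return ["grype", image, "-o", "json"]

def scan_plan(images: list[str]) -> list[str]:
    """Render the shell commands the mock pipeline would run, one per line."""
    commands = []
    for image in images:
        commands.append(shlex.join(sbom_command(image)))
        commands.append(shlex.join(vuln_command(image)))
    return commands

# Example: the two commands for a single (hypothetical) image under test
plan = scan_plan(["python:3.10-slim"])
```

In a real pipeline, each command would then be executed (for example via `subprocess.run`) and timed, which is how per-image scan-time measurements could be gathered.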
If you're interested in this space, I definitely recommend trying out these open source tools. What we did was run our containers through these tools, measure the time it takes to scan an image, and then analyze the contents of the resulting reports. Basically, the bottom-line questions of interest for us were: is this container going to be easy to use? Is it going to cause any issues when I ship my application? Is it going to be safe and efficient? So the first finding, probably not a huge surprise, but interesting to validate, was that by not optimizing their containers, companies might be risking wasted time and decreased productivity in their CI/CD systems. Our analysis showed a nearly perfect correlation between container size and scan time, meaning that within this set, every 500 megabytes added another 50 seconds of scan time. And as images grew larger, we realized that the relationship wasn't linear anymore; it might indicate an S-curve, an exponential relationship, at larger container sizes. Having talked to a lot of companies in the field, we find a lot of containers in use that are significantly larger than these relatively large public containers. So that number, the 50 seconds I mentioned, may sound trivial for shipping a single container to production. But think about a real-life company environment: thousands of images in use, hundreds, maybe hundreds of thousands, of developers shipping images multiple times a day. This might mean a real productivity loss for companies that are not optimizing. It could be linked to operational costs, and it could be impacting delivery and velocity in these environments. As you can see on the left-hand side, I added the size-to-category relationship, so you can see that some of the more special purpose containers, like base OS images and, obviously, DevOps and infrastructure, tend to be relatively small.
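Going back to the size and scan-time relationship, here is a minimal back-of-the-envelope sketch of that productivity cost. The roughly linear slope (50 seconds per 500 megabytes) comes from the study; the fleet numbers and helper names are invented for illustration.

```python
def estimated_scan_seconds(image_mb: float, secs_per_mb: float = 0.1) -> float:
    """Linear estimate from the study: ~50 s per 500 MB, i.e. ~0.1 s per MB."""
    return image_mb * secs_per_mb

def daily_scan_overhead_hours(scans_per_day: int, avg_image_mb: float) -> float:
    """Aggregate scanner time across a hypothetical CI/CD fleet, in hours."""
    return scans_per_day * estimated_scan_seconds(avg_image_mb) / 3600.0

# A made-up mid-size organization: 2,000 image scans a day at ~800 MB each
overhead = daily_scan_overhead_hours(2000, 800)
```

Note that a linear model like this would understate the cost for very large images if the relationship really is an S-curve, as the data hinted.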
The programming languages and data science containers tend to be larger. But there is a distribution: it's not like all programming language containers are that large, and not everything in the special purpose categories is that small or innocent. The second finding is about complexity. What we did was look into the distributions of several components. We did some component analysis, looking into things like packages, licenses, special permissions, and a lot more, but those are the ones I wanted to showcase here. This was one of those constellations of graphs that I sat back and looked at on my computer screen with a blank expression for a while, because I was expecting large outliers in each category for several different containers, but it turned out that even the averages were surprisingly high. We were seeing hundreds of packages even in small special purpose containers. Let me actually pull up one of these so that you can take a closer look. What I am showing here is a box plot of the different categories of containers, where you can see, for example for programming languages, the lower and upper quartiles, the median, as well as the largest and smallest outliers. As you can see, there are hundreds, sometimes thousands, of packages in a single container, and even in small special purpose containers we see tons of different packages. And don't get me wrong: developer-friendly tooling, such as shells, package managers, and libraries, is all great and often necessary for experimentation, making things easier and more fun for developers. But it means that if you're starting with these publicly available containers and not making a conscious effort to remove some of these unnecessary packages when shipping to production, you're incurring inevitable tech debt downstream, period.
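For the fellow data nerds: the box plots shown here boil down to simple per-category quartile statistics, which could be computed like the sketch below. The package counts are invented for illustration; they are not the study's data.

```python
import statistics

def box_stats(package_counts: list[int]) -> dict[str, float]:
    """Median, quartiles, and extremes: the ingredients of one box in a box plot."""
    q1, median, q3 = statistics.quantiles(package_counts, n=4)
    return {
        "min": min(package_counts),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(package_counts),
    }

# Hypothetical per-image package counts for one category
programming_languages = [210, 340, 480, 950, 1200, 2100, 3500]
stats = box_stats(programming_languages)
```

Comparing these five numbers across categories is what surfaces the in-group and out-group differences discussed above.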
There's a lot that I would like to say about the package landscape, but we don't have time to discuss that part in depth. I just wanted to pull up this academic study that I became aware of through a Sonatype State of the Software Supply Chain report. It came from Darmstadt University, where researchers were looking into a specific type of package, the typical npm package environment, and into package reach. So what we have just seen was the count of packages in those containers, but that is only the tip of the iceberg, and it turned into a whole iceberg when we saw those numbers, right? If you think about it, there might be certain packages, as these researchers found in a specific case, that reach hundreds of thousands of other dependencies, and very few maintainers are involved in these top packages, right? You invite a ton of issues in the supply chain security space just by introducing all these unnecessary packages into your CI/CD flows. Again, a topic for another conversation maybe, but that entire constellation of box plots was mind-blowing to me. So let's build on the previous category and start talking about the attack surface. If there are security experts in the community listening to this presentation right now, they will know that attack surface is more than just a vulnerability count. In the container landscape, you can think of it as a combination of things: known vulnerabilities, their levels of criticality, files with special permissions, the packages we just looked at (any of which could be a potential zero-day issue down the road), the user being root, et cetera. But even looking at the counts alone was mind-blowing. Some of the popular containers that I looked at had more than 2,000 known vulnerabilities in them. And yet that wasn't the most surprising insight, although it was interesting.
What was surprising was the distribution of the severity of these vulnerabilities. About 20% of all vulnerabilities in these containers belonged to the high or critical severity categories. I'll repeat that: 20% of all of the vulnerabilities we looked at were high or critical severity. You can see that divided into the different categories here: high, critical, and I wanted to highlight the unknowns as well. Interestingly, some of the perceived-as-innocent, small, special purpose container categories, such as build tools, DevOps, and local development, had the largest percentages of critical and high vulnerabilities, which could be a blind spot for our teams: for DevOps teams, for our developers, for DevSecOps teams. And since these are publicly available containers, attackers know which vulnerabilities they have and which packages they contain. It means that if you are, again, not making a conscious effort to remove these packages and these vulnerabilities from your systems, once an attacker discovers such a container running in your environment, their first order of business is going to be trying those known exploit vectors as attack paths into your organization. So what comes next? We talked about how size is a huge issue, how there's a lot of complexity, a lot of redundant packages, libraries, and special permissions, and how vulnerability counts and the criticality of those vulnerabilities are off the charts. This research was just the first phase of our in-depth container research. What we found is a very vast, very complex world that gives developers massive opportunities to scale, but it also presents risks on both the security and the productivity sides. This data and our ongoing research will enable us to ask questions that have yet to be explored, let alone answered, by the industry.
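As an aside, that severity breakdown is easy to reproduce from a scanner report. The sketch below tallies severity shares from a simplified, Grype-style list of findings; the input shape is a loose approximation, and the sample counts are fabricated to mirror the roughly 20% high-plus-critical share reported in the study.

```python
from collections import Counter

def severity_shares(findings: list[dict]) -> dict[str, float]:
    """Fraction of findings per severity level, defaulting to Unknown."""
    counts = Counter(f.get("severity", "Unknown") for f in findings)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {severity: n / total for severity, n in counts.items()}

# Fabricated report: 100 findings, 20 of them high or critical severity
sample = (
    [{"severity": "Critical"}] * 5
    + [{"severity": "High"}] * 15
    + [{"severity": "Medium"}] * 50
    + [{"severity": "Low"}] * 30
)
shares = severity_shares(sample)
```

Running the same tally per container category is what surfaces blind spots like the build tools and DevOps categories mentioned above.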
Things like: how are these containers evolving and changing over time, not just from a security perspective, but in general, from an efficiency and productivity perspective? What are the supply chain implications of these vulnerabilities? What are the rippling effects on our technology ecosystem? More research will definitely enable a greater understanding of the cost, time, and productivity impacts, even the evolution of the attack surface as it pertains to containers and the cloud native ecosystem. But ultimately, what do we want? We want our developers to have the best experience possible. We don't want them to feel constantly behind, to feel that they are not enough, or to think that they need to comprehend every little detail, but at the same time, we want our systems to be production-ready. We need to automate our container optimization processes and make these flows as smooth as possible to achieve that. And if we have learned anything from decades of studying complex systems in various fields, and certainly in software, through tools like system dynamics and systems thinking, we should know that complex systems are inherently riskier. As you have seen, as an industry, we have not yet focused that much on intelligent optimization of these containers. But I believe the future is bright. That's what my team at Slim is working on, and I know there are other companies thinking a lot about this space, making sure that the developer experience gets better and better while production-ready containers become the norm. Next time I speak, I fully believe we will be making waves in this area, and I'll be bringing much better news. So with that, I hope this was helpful. If there was any confusion, or if you have any questions, please feel free to ask; I'll be hanging out in the Q&A. And please reach out and connect.
And we would love to hear your feedback on this research and the new research that we are working on going forward. Thank you.