that has become increasingly important to pay attention to from a security perspective. So that's mostly what we're going to talk about today. We'll level set a little bit on open source today: how folks interact with open source, and what we've seen when working with clients and organizations. We'll also share a set of best practices that we've seen on how exactly to start stepping into securing your supply chain. And I'd say, with everyone's experience here at the summit, this space is still being fleshed out, right? There isn't necessarily a full solution or industry standard yet. So we'll share the best practices we've developed with our clients. We'll also share pitfalls, things we've seen organizations do that may be fine on the surface but don't necessarily make them any safer in securing their supply chain; in fact, they're still opening themselves up to a lot of risk. So we'll share what those risk profiles look like. We'll also share a case study, and I think it's a really, really good one: it's about how a security researcher was able to use a manipulative tactic called dependency confusion, creating packages and uploading them to a public repo. Then we'll have a bit of a discussion on what that would look like in the Anaconda space, and why, with us, that type of situation wouldn't be as significant. And that's really where it comes to how Anaconda helps. At the end of the day, we're all here to learn about CVEs and securing the software supply chain, and we also want to share what Anaconda is doing and what our technologies are to help you in this space. So, as a brief level set, what is happening today?
Someone in the organization obviously has a need for Python and R, so they reach out to the open network. They download a conda installer or a Python installer, and it's preconfigured to the public network. At that point, they're installing it directly to their workstations. So for the organization, for the administration team or the security team, there simply isn't any oversight with that approach. And while this may get the job done, the reason it's significant is that with those public repositories, especially in this space now, anyone can build and upload a package and have it be downloaded or mirrored inside of an organization and cause some significant harm. That's actually one of the things we'll talk about during our case study. So what are the different primary approaches? In the previous slide, we talked about the main activity one does to get started with open source, but once open source starts being used inside of an organization, how do those organizations tend to manage that supply chain? First and foremost, we speak to lots of organizations and clients where there's no oversight. It's simply, hey, I've got a team of 20, 30, 40, 50 folks, they need it for business purposes, they need it for the analytics that we heavily depend on, but we've mostly been content with them just reaching out to these open networks, reaching out to these repositories and downloading these artifacts. And while this may be an approach, it does bring some significant risk. Sure, it gets the job done, but ultimately a single real vulnerability or ransomware attack through that approach can cost as much as $1.8 million; that's been the average cost of a ransomware attack across a couple of studies we've analyzed. But then there are also manual processes, which are a step in a better direction.
It's, hey, we do have a security team; however, it's quite manual. And that manual process introduces a lot of pain. Not only does it introduce pain, but we also have to ask: are we getting the most productivity out of our end users through this approach? So it's both. What a manual process really means is: hey, they've approved some packages for me, but I need a new package, so I have to open up a ticket. They spend a week or two analyzing it, and I finally get it approved. But guess what? It's two weeks after I really needed to use it. So that goes back to the productivity aspect. And then there's an automated approach, and there's a reason I put an asterisk there, which I'll get to. The automated approach is, hey, we've got a security team, they've used tons of other scanning tools, I'm sure they've got this covered. And while that may be true, as we've come to learn through the summit, there are so many different aspects to security beyond just the score, beyond just the CVE. So even though there may be automated approaches today, they may not be running against the same data or information that the vulnerabilities should be compared to. There's proprietary data that we provide, which we'll talk about, as the builders and maintainers, that doesn't necessarily compare to public sources, or to the simple scanning tools or malware tools that some IT security teams might use. Ultimately, that automated stage is where most organizations want to get to; because of that asterisk, it's just important to get there with the right information. And obviously everyone here is very familiar with the Python space: packages depend on packages, and there are dependencies. This is just a visualization, but as we can see, it's a complex web, right? So already it's a complex web.
And this is currently a 2D image, but if you really think about it, you can actually add a couple of other dimensions as well: versions, for example. This ultimately is what happens to organizations using Python or packages, especially those with just an open-network approach: these packages may not have vulnerabilities themselves, but they have ties to packages or dependencies that do. And if we extrapolate this to a 3D image, just add another axis, and now imagine all the different builds and all the different versions. All of a sudden this becomes a huge, huge web of complexity, and inside of it, from a conda perspective, there's mostly gray area: which artifacts should or shouldn't be used, which ones are affected, which dependencies are vulnerable and which aren't. And this is often the clarity or understanding, from an information perspective, that organizations need to at least start to secure their open source pipeline, or at least make sure they get started on the right foot. So there's clarity that we provide on which lines are red and which lines are green. Another thing to think about is that this is just what it is at this current snapshot. Tomorrow, certain red lines turn green, certain green lines turn red, certain gray lines turn red. It's a constantly evolving ecosystem, and it's not stopping; this is something that's absolutely not stopping in this space. And just as a testament to that: conda itself has 28 million users. So we can see quite a drastic increase in conda, and we're seeing quite a drastic increase in PyPI and in all sorts of programming languages that use packages and artifacts.
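To make that transitive-risk point concrete, here is a toy sketch (my own illustration, not Anaconda tooling; the package names, graph, and vulnerability set are invented) of how a package with no CVEs of its own can still inherit risk through the dependency web:

```python
# Hypothetical dependency graph: package -> direct dependencies
deps = {
    "app": ["pandas", "requests"],
    "pandas": ["numpy"],
    "requests": ["urllib3"],
    "numpy": [],
    "urllib3": [],
}

# Pretend a CVE was just published for this package
vulnerable = {"urllib3"}

def transitively_vulnerable(pkg, deps, vulnerable, seen=None):
    """Return True if pkg is vulnerable or depends on a vulnerable package."""
    if seen is None:
        seen = set()
    if pkg in vulnerable:
        return True
    seen.add(pkg)
    return any(
        transitively_vulnerable(d, deps, vulnerable, seen)
        for d in deps.get(pkg, ())
        if d not in seen
    )

# "app" has no CVE itself, but inherits risk through requests -> urllib3
print(transitively_vulnerable("app", deps, vulnerable))     # True
print(transitively_vulnerable("pandas", deps, vulnerable))  # False
```

In practice the graph has thousands of nodes and changes daily, which is exactly why this kind of check has to be automated rather than done by hand.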
And so this is great for the ecosystem. This is great for the community. It obviously means that enterprise organizations are getting a lot of value out of it. But at the same time, there are a ton of vulnerabilities popping up; I believe in the last six years alone, the count has tripled, and we're going to see more and more. And while that does seem like a scary item, it's not necessarily something that should prohibit us from using this technology. The same thing happened with Microsoft and Windows: it was a very popular OS, a bunch of malware and vulnerabilities were created to attack that specific OS, and yet it flourishes very successfully today. So the same thing applies in the Python ecosystem: sure, there's an increase in Python usage and an increase in vulnerabilities, but how can we be more intelligent about which components we do or do not use? We obviously need them; this risk is becoming more and more apparent and detrimental, but how do we actually be intelligent about it? And I'm not going to hit you over the head with it; I'm sure other talks and other folks have during this week of the summit, but there have been tons of attacks that we're all familiar with: Log4j, SolarWinds, and tons of smaller ones that may not catch the headlines but still impact organizations heavily. One that we'll specifically be talking about in this session is actually that top left one: dependency confusion. It's actually a very common exploit tactic when it comes to public repositories. And this is obviously very significant; it impacts tons of people, folks in open source, folks in commercial entities. And this is also why we have a cyber executive order, and why it was signed earlier this year.
So there's an initiative to have a better understanding of what is in your software supply chain, how to secure it, and how to actually be able to prove that these components are there and are safe: I've done my due diligence, and this is the evidence I can present. And this affects all industries: manufacturing, energy, health care, finance. So again, the space is still being fleshed out. We have pressure at the federal level, we have a need at the user level, but we also have risk. With the combination of those three factors, it's quite an interesting time in the open source ecosystem, but we'll start to talk about how things can help. So obviously we know this is significant. What is the process of stepping up one's program from an open source perspective? Well, first and foremost, we really have to understand which artifacts are being used inside of the organization: who's using what, in what context. We really have to understand our pipeline; that's the first piece of the puzzle. Once we understand that pipeline, we can take security actions against those components, against that pipeline. We can start to filter items out. We can start to gain visibility into which components were downloaded, and what the security status of each one is. And while we can do that at a package-by-package level, which is what folks do in that manual process, ultimately, because of how often this CVE space changes, based on that web, this should be automated. But it can't necessarily be automated with just simple public CVE data. That's often just the tip of the iceberg when it comes to CVE information; it's publicly rumored data.
That's based on the reports and chitter-chatter of the internet, but it isn't necessarily validated; there are steps that we take to help with that. Ultimately, what you want to do is automate it with intelligent and curated security data, to ensure that you're as risk-free as possible. And once you're taking those actions, the next step is: once I am impacted by a vulnerability, how do I remediate it? There are a few other technologies and steps we're taking at Anaconda to provide information on how to remediate those vulnerabilities. This is a little similar to the last slide, but think about this open-source maturity curve. Again, someone has to get started somehow, so it starts with that initial download: users inside of the organization start to interact with Python or conda via that open network access, and that's really where the lack of oversight or visibility comes from. Then we step into a centralized approach. Now, instead of business groups or different teams accessing packages from the open web, with unknown visibility on our end, let's centralize that in one place, so that we have one source of truth and folks can't necessarily reach out to the open network. Once we have that centralization, and we understand which components are actually being used, we'll effectively have an understanding of what their security statuses are. And from a security status perspective, CVE information can be updated as often as every four hours, so it's important to stay on top of it, but it's also your own decision how often you choose to update that CVE data. And ultimately, it's about automating it. Automating means a little more than just filtering; again, this space is still being fleshed out.
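As a concrete illustration of that centralization step, conda clients can be pointed away from the public defaults with a `.condarc` file. This is only a sketch: the server URL is a placeholder, and your channel names will differ.

```yaml
# Hypothetical admin-level .condarc: route all channel lookups through an
# internal, curated mirror instead of the public repository.
channels:
  - approved                                  # a channel hosted on your own server
channel_alias: https://conda.internal.example.com
allow_other_channels: false                   # block channels outside the alias
```

With every download flowing through that one server, the audit and filtering practices discussed in this talk become possible.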
And that's often where folks are very focused: automating the filtering of those components. But the next step of the puzzle is how you actually automate monitoring. How do you wake up one day and say, hey, based on today's vulnerability information, I've identified which user groups are impacted and which production jobs are impacted? And this really brings us into the best practices. To further echo that sentiment, creating that central server, that central package repository, is going to be really key in this process. It allows you to quickly identify which components your organization should be using, which packages your organization will be supporting from an enterprise perspective. It also allows you to gain visibility, not only into the packages that are being used, but also into the history, the audit, and the logs. So you can see: hey, I noticed one of my user groups downloaded this vulnerable package a week ago. The vulnerability only came out yesterday, but a whole team has been developing on this vulnerable package for the past two weeks. Let's create a mechanism to notify them so we can stop development on that build, and kill two birds with one stone: we don't want to continue to build with a vulnerable package, nor do we want to hit production and then say, oh well, we have to rework this. And that's again where the automation of enforcement comes into play: creating and maintaining an audit trail. So, a couple of the pitfalls to avoid; this is what we've seen in this space. Ultimately, everyone is in pursuit of both innovation and security. That's why we use open source, that's why we do analytics, that's why we do the things that we do.
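The monitoring idea above, noticing after the fact which users pulled a package that has since turned out to be vulnerable, can be sketched in a few lines. The log format, names, and dates here are invented for illustration; this is not Anaconda's actual audit schema:

```python
# Download history captured by a central package repository (hypothetical)
audit_log = [
    {"user": "alice", "package": "badlib", "version": "1.0", "date": "2021-11-30"},
    {"user": "bob",   "package": "numpy",  "version": "1.21.0", "date": "2021-12-01"},
    {"user": "carol", "package": "badlib", "version": "1.0", "date": "2021-12-02"},
]

def impacted_users(log, package, bad_versions):
    """Return the users who downloaded an affected version of a package."""
    return sorted({
        entry["user"] for entry in log
        if entry["package"] == package and entry["version"] in bad_versions
    })

# A vulnerability for badlib 1.0 is announced today -- who pulled it earlier?
print(impacted_users(audit_log, "badlib", {"1.0"}))  # ['alice', 'carol']
```

The point is that this query is only possible because the central repository recorded who downloaded what, and when.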
But some of the pitfalls I've noticed when talking to certain clients is that there really are extremes on this spectrum. On the left-hand side, we have a completely locked-down approach. This is where folks say, hey, we're very, very sensitive about this, and because of it, they often can't make any progress on their initiatives. And then there's a fully-ignore approach: hey, my users are using it, it gets the job done, I'm building my models, I'm hitting production, that's great, it's fine, right? But while that may get the job done, it inherently poses a lot of risk, by allowing dependencies to get into your system, by allowing vulnerabilities to get into your system. And now you might have a Log4j, or some other type of attack, or lots of vulnerabilities. And in the middle, we also have a manual approach. Actually, one last thing I wanted to say about the high-risk approaches. I was thinking about what analogy would be applicable here, and it's like getting in the car and going from point A to point B. I can do it without a seatbelt, right? And I can get there; I can get to the grocery store and get everything done that I need to get done. There's no direct cost to that, but there's the risk that's associated with it. It's the risk of not wearing the seatbelt to the grocery store, and it's the risk of not understanding which components are in your pipeline. Is it really worth that risk? That ultimately is what it comes down to. And that brings us to the manual process. Those are exactly the folks who say, hey, maybe it's not worth the risk of completely ignoring this; we've got to do something. That's where the manual process comes into play.
And that manual process, again, is an antiquated system of request and approval. Let me request these packages and dependencies, okay? It ships over to a different team, that team uses various public tools or very basic security checks, and we call it security. We call it vulnerability information, but again, it's a very basic type of security scanning. And through that process, not only is it a basic analysis, but we've also wasted two weeks during which that developer could have really used that package for whatever their initiative was. So that's what ends up happening: you have folks who make no progress, folks who get the job done but at what cost from a risk perspective, and then a bit of an antiquated system here in the middle. One last thing to say about the manual process: if we go back to that web of lines, that web changes constantly. And manual efforts, individual security teams, a team of five at an organization, aren't necessarily going to be able to keep up with the ever-evolving space of thousands and thousands of packages, dependencies, builds, versions, OSs, and more. So these are all key things to keep in consideration. Again, we may be able to get the job done today, but there are some larger things that really need to be thought about as we take this more seriously. And why should we take it more seriously? This is actually a really good example of something that has actually happened, and it didn't happen in a completely malicious way. This comes from a security researcher, Alex Birsan; hopefully I'm saying that name correctly. It's a Medium article, and we'll link it in some of the materials afterwards as well.
But essentially, he went on an experiment: can I identify packages that enterprise organizations depend on? Can I build packages that mimic or look like these packages? And can I get those packages inside of the organization, to then send information back, information such as the host name, IP, or basic desktop information? And spoiler alert, he was successful. But ultimately, that was the question that started the experiment, and that's what he did. This screenshot here is what the dependency tree looked like in GitHub. He would go to GitHub and see these code commits. He would identify that, hey, there are certain packages they depend on that are public packages (those items in blue), and there are certain packages they've created themselves that they depend on (those items in red). Well, what if I went ahead and built those internal-looking packages myself and uploaded them to a public repo? I can obviously see that their workloads depend on them, and depending on the configuration of various people's desktops, my package actually might slip through. And that's effectively what happened. With the understanding of which packages to build, he built those packages, and they were uploaded to a public repository. And again, I do want to come back to this notion of a public repo versus a curated repo in a second, so keep that in the back of your mind. And that's effectively what happened: one of two things. One, those desktops were connecting directly to the open networks and downloading the packages directly onto the machine. Or, possibly even worse, in existing organizations, an Artifactory, Sonatype, or other artifact repository would mirror these packages, based on the requirements, based on the need, and then host them internally and distribute them to their internal users.
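Here is a toy illustration of the dependency confusion mechanic itself (this is not any real installer's resolution code; the package name and versions are made up): if a client consults both an internal and a public index and naively prefers the highest version number, an attacker who publishes the internal package's name publicly with an inflated version wins.

```python
def pick_candidate(name, internal_index, public_index):
    """Naively choose whichever index offers the highest version."""
    candidates = []
    if name in internal_index:
        candidates.append((internal_index[name], "internal"))
    if name in public_index:
        candidates.append((public_index[name], "public"))
    # versions compared as tuples of ints, e.g. (1, 2, 0) < (99, 0, 0)
    return max(candidates)[1]

internal = {"acme-utils": (1, 2, 0)}   # the company's private package
public = {"acme-utils": (99, 0, 0)}    # attacker's lookalike upload

print(pick_candidate("acme-utils", internal, public))  # 'public'
```

This is why installers need a single trusted index, or explicit priority rules, rather than treating all indexes as equal.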
And so that's effectively what happened: those packages started hitting desktops, servers, virtual machines, and effectively they were all within the organization. And they did send information back to Alex: information such as the host name, the user, basic IP address information. That was information he shared directly back with these entities. Obviously he was a security researcher; this was done in an effort for improvement, and for a bounty, but it highlights how easy it is to take advantage of these public repositories. And these are just the techniques we know of today. Perhaps the reason I wanted to bring up that public repository versus curated repository notion is: hey, pip and Anaconda, they're both Python, right? But there are some variations between them, and one of those variations is that the conda repo we maintain is a curated repo. We don't allow anonymous people or one-off projects to be added to our repository. We build packages from source, we build them from the project maintainer, and we curate to determine which popular packages are needed by the ecosystem and community. So that's a big difference: the activities that Alex did with a public repository wouldn't necessarily be possible with a curated repository. And it's not necessarily a conversation to sway pip users to use conda; it's a conversation of, hey, for supporting your existing conda users today, that's an important characteristic to know, so you can have a little more comfort and relief that, hey, when I'm using conda, at least I'm staying away from these lower-level types of attacks.
So now that we've been through this full cycle of how open source is used, the pitfalls, the best practices, and a real-world example, let's also talk about the maturity of a program. What are the critical questions you have to start answering to build this type of program? First and foremost comes that trusted repository. Do you have a central place where you can confidently say: these are my artifacts, these are the vulnerability scores associated with them, I'm taking the right packages out, I'm taking packages with confirmed active vulnerabilities out of the pipeline, and I'm staying away from certain license types because of my legal or security policies? Number two: do you know what's actually being used? And once you know what's actually being used, can you identify where it is in your pipeline? If a vulnerability comes out in the morning, can you quickly identify and say, hey, this vulnerability is currently exposed to this user base or these groups? How can I immediately remove that item and reduce my risk? And while we can do that on a point-in-time, case-by-case basis, there's also the question of, because this ecosystem evolves so quickly, how can we do this proactively? How can we automate it? And finally, if you are actually hit: how do you know where you're hit, and how badly, and what do you do about it? How do you remediate it? How do you locate it and notify that end user? How do you get an understanding of what package to use in its place? So here are a couple of pieces of information on how Anaconda is stepping into this space from an information and technology perspective.
Ultimately, because of that curated repository notion, we're building these packages from scratch. And by building these packages from scratch, those same builders and maintainers are the same folks who actually review the public CVE database. And I think I have a slide on this, yeah. So if we look at this, on our left we have public data. Another analogy, in case it's fun: the way I like to think about it is kind of like kids at recess, or at the playground. This is just the chitter-chatter, the internet chitter-chatter of what we've been hearing from the different pockets of the internet: support posts, various third-party advisories, forums, GitHub comments. It's an aggregation of, hey, this is initially what we think that risk score to be; this is initially what we think those configurations to be, from a package, version, OS, and build perspective. What we do is not just simply match against that; that's what a lot of other scanning services do. That's also why we were making the point earlier: just because you have automated policies in place, it's important to understand what data feeds those automated policies are running against. That's really where we start to enrich that data feed. So when we speak about this human curation, it's effectively those same builders and maintainers of the packages themselves; they're the same folks who actually curate those CVEs. We're really taking that chitter-chatter on the playground and actually doing our due diligence to determine what is true and what isn't. And through that process, we enrich that CVE data. And that brings us back to the right side: what does that enrichment really look like? It's the clarity of which components, which builds, which OSs are actually affected.
Hey, this is coming from Anaconda, the builders and maintainers of that repo, of that distribution. And there are lots of instances where we'll actually do a lot of cleanup behind NIST. What that cleanup looks like is: hey, we've cleared this package; this particular version isn't applicable to that vulnerability. Sure, your scanning service may rely simply on public data, but again, we've gone a little bit further. That's really where the Anaconda CVE data comes into play. Some of that cleanup looks like the cleared flag, where we've in fact cleared a certain package: hey, we've found that this vulnerability doesn't apply to this version, or perhaps isn't applicable to this package at all. This happens quite often. There's also a mitigated flag, which is: hey, this publicly looks like it's still risky, but in fact a bug fix has been applied, there's been a code patch, so this is actually something you should feel comfortable using, but we give you visibility that it was affected at one point. And then there's a disputed flag, which is: hey, the original project maintainers themselves, at the source, are pushing back against it, and we're just letting you know. So if we think about the conversation we've had thus far, at this point it's all information, okay? But now that we have information, how do we take that next step? That next step is: now that we have that critical understanding of the packages and the CVE information itself, how do we set policies in place? These policies are essentially a combination of your logic and the information that we provide. Some organizations may want to filter on security scores ranging from six, seven, eight. Some may want to completely filter out active vulnerabilities.
Some may want to keep vulnerabilities if they have a very low risk score; that might be because they still depend on that package, or it's just not too risky for them yet. So this is completely customizable, and if you have your own logic, your own rules, this is also where we can help. This also brings us to a very good conversation: we work with very large organizations and very small organizations, and we work with organizations that are doing this in lots of different ways. And I think, yeah. The reason I wanted to highlight this is that when it comes to some of the logos or brands you see here, it's not like this is net new information for them. In fact, they're doing a lot of security work themselves, either with other tools or with their own security folks. But even so, even through this process, we're still tightly integrated with them, and we're tightly integrated with them because of that CVE enrichment data that we have. And it's okay; even if you have your own storage, even if you have your own existing security policies today, ultimately what we're articulating is that CVE data feed. I've worked with many organizations. In fact, one of my large financial organizations has thousands of developers. They've got a few different scanning tools; they've got a very proprietary system for how they approve packages and how they move packages from analysis to dev use. And even through that process, conda and our conda CVE data is a key metric, a key information feed. So it's a complementary approach, not necessarily a replacement approach: hey, this is CVE information coming directly from the builders themselves. And with that information, we can ingest those policies, we can enforce them, we can automate this and keep it up to date.
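As a sketch of what such a policy might look like in practice (the record shape, field names, and statuses below are simplified stand-ins, not Anaconda's actual CVE feed schema):

```python
def allowed(pkg, max_score=7.0):
    """Apply a simple org policy to one package record."""
    for cve in pkg["cves"]:
        # curated statuses: 'cleared' and 'mitigated' are treated as safe;
        # anything still 'active' is judged by its score
        if cve["status"] in ("cleared", "mitigated"):
            continue
        if cve["score"] >= max_score:
            return False
    return True

packages = [
    {"name": "pkg-a", "cves": []},
    {"name": "pkg-b", "cves": [{"id": "CVE-X", "status": "active", "score": 9.8}]},
    {"name": "pkg-c", "cves": [{"id": "CVE-Y", "status": "mitigated", "score": 9.8}]},
    {"name": "pkg-d", "cves": [{"id": "CVE-Z", "status": "active", "score": 3.1}]},
]

approved = [p["name"] for p in packages if allowed(p)]
print(approved)  # ['pkg-a', 'pkg-c', 'pkg-d']
```

The logic stays the organization's own; the curated statuses and scores are simply the data feed it runs against.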
And so these are really those first two or three steps into securing your open source pipeline: it's that centralization, it's that visibility, and then it's that enforcement of those components going forward. And like I mentioned, there are obviously various industries and lots of different use cases, and we work with these organizations in lots of different ways. Some of them just take our CVE data feed, some of them combine lots of different CVE data feeds, and we're tightly integrated with them at all different levels. And because supply chain security is still being figured out, based on our experience here, we'd love to hear from you. There are actually security consultations that we do. If you have questions or use cases of your own, or items you're already doing where you're trying to understand how Anaconda might be helpful in that process, feel free to set up a security consultation; you'll get paired up with one of our security consultants, or even myself. And if you'd like to talk about some of the other security aspects, if you don't necessarily want a consultation itself but want a deeper dive on security, our CVE data, and perhaps some other ways we help in the open source ecosystem beyond security: there's obviously the focus of this summit, but there's also lots of work that we do in the development, collaboration, and deployment space. So, we're Anaconda, we have a ton of smart folks (excuse me, it's Friday afternoon), so happy to help. If there are any questions, I'll stick around. If not, thank you for attending.

[Audience question] When we heard about the Log4j vulnerabilities, we used a tool called Starfish, and that broke it down to the user ID level. Does conda do the same thing?

[Speaker] So we do have logging. And the question, to repeat it for the mic, is whether we link it to the user ID. Is that fair? Yeah. So we do.
So there is a capability, and I would say there are two ways to do that currently. One, there's an API where you can actually ask: hey, what has this user downloaded in the past 48 hours, or over the past week? That's one capability. And then there's also an audit log within Anaconda, and that audit log essentially tracks when artifacts were downloaded, so it can tie the download event to the user itself. The way it would work (the follow-up question was: what change would need to be implemented for this to be possible, and how do I gain that visibility?) is that instead of pointing conda at the public repo, you simply change conda's configuration to point at your server, your own repo. That's the first change that needs to be made. By making that change, when I pull a package down, I'm interacting with the internal server, so I have to pass over my credentials. I pass over my credentials, and the technology logs that artifact download. So those are the two components. Awesome. Any other questions for a late Friday afternoon session? Cool. Well, I appreciate it. And if you do have any other questions, feel free to reach out to Anaconda via those links. So thanks, everyone. Have a good weekend.