All right. So welcome, everybody, and thank you for taking the time to see our talk. We are happy to be here at the DEF CON Red Team Village, and we are very excited to share with you some of the findings from the work we've been doing since our talk at a previous Red Team Village summit, where we tried to showcase and expose how dangerous the leakage of credentials on the internet is. Today my colleague Jose Hernandez and I are going to be presenting Have My Keys Been Pwned: API Edition. So let's get going.
So most of you know who we are. I'm a principal security research engineer at Splunk. I worked with Jose at Prolexic, which is now Akamai. I worked at Akamai for a while, then I went to Caspida, and at one point I came back to Splunk. I co-founded the HackMiami and Pacific Hackers meetups and conferences, and I wrote my own CTFs. Some of you may have played one of them, the Command and Control CTF. Jose, can you tell us a little bit about yourself?
Yeah, I'm also a principal security researcher at Splunk. Rod is a long-time friend of mine from our Prolexic days, which pretty much became Akamai. I co-founded a company called Zenedge, which is now Oracle's web application firewall and DDoS services. And then I turned to Splunk to do security research. This is one of the things that we've been working on that I'm super excited about.
Awesome. So let's move on. Before we get into what we're going to show you today, it is important that we recap a little bit how we got to this point. And in order to understand how we got to this very bad situation, we need to take a look at what DevOps is. As we have explained before, DevOps is a set of practices that combines software development and IT operations. It has become very popular.
It's not really new, but it has become very popular, and it has been widely adopted as most companies start to get a foot in cloud platforms. Some of them have radically moved most of their operations to the cloud. And when we're talking about developing and producing software, planning it, coding it, building it, testing it, releasing it, the first thing that comes to mind is DevOps. DevOps is a set of practices usually guided by some software development principles, and the most popular currently is Agile. Some of you might be familiar with Agile if you have worked in a software development company. I have now worked for three software development companies, and all of them were using Agile. This is how I was exposed not only to the infrastructure, but to the risks that are associated with it. So next slide, please.
So one of the things that I noticed throughout the years, and that Jose and I have been researching, is that when you take this set of principles and divide it into things that are called toolchains, which go into the DevOps process, there are a number of products and tools that are constantly used, reused, shared and repurposed. One of the characteristics of the platform that is used for software development is, at times, its ephemeral character, meaning, for example, that you can create containers, destroy them, and then simply recreate them again. So in order to give a little more structure to what we can look at when we're trying to gauge the exposure and risks associated with DevOps, we had to look at DevOps toolchains. DevOps toolchains are basically a combination of tools that aid in the delivery, development, and management of software and applications for the entire cycle. As you saw previously, there is an actual cycle where you plan, you code, you build, you test, you release, and then you go back again.
And if you are using a specific methodology, for example Agile, usually there are sprints, and sprints are set periods of time in which you have to produce a number of features or bug fixes. All of this is based on a flow that goes through what they call the toolchains. So in the toolchains, you can see there are things that are used for planning, such as Git or Jira. There are things that are used for coding, which involve code repositories. There are things used for testing, which involve things such as Selenium or Vagrant or Docker containers, for example. There are tools that are made for software builds, such as Ansible or Terraform or Chef. And then, of course, there are things within the toolchains that they call deployment, which involve things such as Kubernetes. Docker, for example, like Kubernetes, is at times present throughout the entire process. And depending on your cloud provider, you may be very familiar with some orchestration and automation tools such as Ansible or Terraform. And then finally, part of this toolchain is the monitoring part. The main two monitoring tools are usually either based on ELK, which is Elasticsearch, or, of course, Splunk. Next, please.
So once we have seen the picture of the toolchains that are associated with software development, plus the cycle of planning it, coding it, building it, testing it, releasing it, and then coming back, there's one thing that we're focusing on today, and that is credentials. Credentials are part of the entire process. Credentials are needed for many reasons. And here are some of the highlights of what happens in this process with credentials. So developers, for example, usually have high-privilege credentials. Why? Because they have to be able to test things that would not run with low privileges.
They are supposed to develop things that may be kernel libraries, or sockets that need to be created with services or connections. These environments are at times ephemeral, like I said, and as a result of that, many times they're dismissed and poorly monitored. We're seeing cases of developers downloading anything from Docker Hub, or from who knows what container repository on the internet, and simply putting it in their DevOps toolchains without any checking, without any scanning. And now you may have implanted containers, vulnerable libraries, or vulnerable operating systems that will eventually affect and be published into your production environment.
Obviously, this is related as well: there's a disconnection between development and security operations. I've been part of this problem many times when I have interacted with developers. Developers usually don't like it when you tell them that their code has bugs or vulnerabilities. Most of the time there's no straight link between security and what they're developing. So I'll give you an example. You have a development department and they download, for example, all these containers or libraries or code. Many times there are no tools to verify this. Many times there's no inventory of what they're downloading. What is it that they're using? What libraries are they putting into the software packages? So this by itself is a risk.
This next one is also very popular, and it is part of the nature of the DevOps process: there's widespread use of open source tools and code. Many times I notice in this environment that this code is trusted by default. Meaning they just go, oh, I know this developer. Oh, I know this group. I'm just going to download this and use it in my application. And again, this goes back to the disconnection. There is some sort of an honor code in the open source community, where "do no harm" is always the driver of software development.
But that does not mean it always holds. We have seen in supply chain attacks that malicious actors, nation states, and criminals in general may target these repositories and open source communities and embed bad stuff in them. Also, embedded credentials usually end up in public repositories. And that's pretty much what we're going to show you today: how this happens even to big companies and to individuals, because unfortunately there are no mechanisms in place. All in all, we don't blame them for what you're about to see. We don't think that they're purposely leaking these credentials. However, it is important that we point this out in order to bring awareness that mechanisms have to be created to avoid this rampant leakage of credentials, because there's no other name for it. And we will show today how bad it can get.
So when development departments and DevOps processes give most of these developers high-privilege credentials or permissions, there is obviously a higher risk of insider threat, because it takes less effort to cause harm, to embed malicious stuff, or even to destroy things. We have seen some cases before where a person that was part of a development department, or a sysadmin, has come back and done harm to an employer.
Another couple of points: due to the CI/CD nature, the continuous delivery, basically that cycle that goes around building, testing, developing and coming back to planning and so forth, these things get published immediately. That's part of the nature of the DevOps process. The DevOps process has shortened the time in which you plan, code, build, test, and release, so it basically becomes something almost immediate that goes into production.
And this by itself represents a risk because, as I once told somebody, when you have vulnerabilities in environments that are driven by CI/CD, things like, for example, an implanted container, and you have very large environments, the risk and the opportunity for exploitation increase by orders of magnitude. So finally, and just to make this even worse, unfortunately, cloud environments make this risk even higher. Why? Because basically you are connected to the internet, you publish right away, you stage right away, and if there are attackers that are knowledgeable and are able to pretty much footprint your process, they can do a lot of harm. Next slide, please.
So here's a little bit of reference on how the cloud providers manage credentials. We're going to focus mostly on cloud-related environments. So here's, for example, AWS. They have their own IAM credential service, which usually handles things such as passwords, access keys, key pairs, or SSH keys. They also have a number of temporary security credentials that can be created on the go, and with that feature you can give a specific user temporary access to a resource that the user otherwise does not have access to. So next slide.
Most of the providers have similar systems for managing credentials. However, I have to give kudos to Microsoft, because they are actually trying to tackle this problem. As you can see here, this is an example of how they manage credentials. Within Azure Active Directory there is a framework that tries to avoid the embedding of credentials by using a different mechanism. We're not going to focus on this feature of Azure in this talk, but it's important for you to consider it and look at it, because they definitely seem to be aware of the issue. Next slide, please.
And then here, we wanted to give you a little sample of what's happening with the three main cloud providers, which are AWS, GCP, and Azure. In the case of GCP, or Google Cloud, most of the stuff is based on OAuth, which is a protocol that's used for identity federation and single sign-on. For the most part, because of the constant interaction between services that are present in cloud environments, these providers obviously have to come up with a way to allow the interaction of devices, users and services, while dividing them and trying to establish boundaries between these entities. And as we will see soon, this is very challenging. Next slide.
So here's a general list of things that you need to consider when you're looking at credentials, things that are commonly used and as such potentially exposed when you have developers publishing or storing code in public repositories. So, things such as an email and password, or username and password. Remember, there's a difference between your local directory, for example Active Directory or LDAP, and a cloud identity and access management service. Sometimes this brings up a lot of confusion. And depending on the integration that you have with your cloud environment, this may or may not play in your favor, meaning that if you're not very integrated, losing, for example, the username and password for a specific cloud service will not allow the attacker to access your internal environment. Also, multi-factor authentication is something that's been coming, and at times it can be bypassed by certain frameworks. We've done some work before with Evilginx, which is basically able to capture the authentication or the TOTP, whatever interface is presented to the user, and bypass MFA. Access keys, we talked about those. Key pairs, we talked about those. Specific account identifiers. And at times they use X.509 certificates. Next slide.
So what are the primary sources of leaked credentials?
Well, as you will see soon, GitHub is probably the most popular code repository on the internet. GitHub is now used for many other things, like storing files or even hosting web pages, which is kind of cool. So GitHub is the reference when it comes to the leading internet code repository, not only publicly, but many companies use it too. And then we also have GitLab, and we also have Amazon S3 bucket storage. The reason why I put this here is that you can definitely search for Amazon S3 buckets that are open or have write or read privileges. And there's not only data stored in them, but tons of code with possibly embedded keys is usually found in these environments. Next slide.
All right. So as I was explaining, GitHub and GitLab and even S3 buckets are not the only source of leaked credentials. As you can see right now, I basically Google-dorked "AKIA", which is how Amazon permanent keys usually start, and I was able to find this snippet. Fortunately, the person that posted this sanitized his keys. But that doesn't mean this happens all the time. With this, I just wanted to show you an example that it's not just code repositories. It can be anything. I actually had a friend that lost the username and password for his Gmail, and it turned into an absolute nightmare. The attackers actually reset everything he had. And it took him about a week, even with him being part of the community, to get hold of Google in order to reset this. So please be very careful with these things. My friend was working for a very large company, but there are other examples that we're going to show you where the attacker may not be so obvious, yet can cause even more damage. So let's go on to the next one.
So before we continue with this presentation, it is important for you to understand that when we look at the context and nomenclature of attacks on the internet, in the cloud, or inside the perimeter, we always look at the MITRE ATT&CK matrix, in this case the cloud matrix. Like I said before, we are going to focus on cloud-related environments. So in this case, we're talking about unsecured credentials. For example, if you leave a credential in an Amazon S3 bucket or just embedded in some code on GitHub, it can lead to things such as what is called valid accounts, and valid accounts can be used for initial access, persistence, lateral movement and privilege escalation. I'm going to give you an example of it as we move on in this presentation. So just keep this in mind. Let's move on.
One of the things that we're looking at here is a technique, which is T1078.004, Valid Accounts: Cloud Accounts. So you're obtaining cloud accounts. In one of the scenarios that we're going to propose, we're able to find access keys that then allow you not only to access the provider, but to move laterally and escalate privileges. These attack vectors are real. And many times they get dismissed because the company, to be honest with you, does not have an awareness of the reach of the cloud within its perimeter. As we move on in cloud adoption, there are many hybrid environments, and in hybrid environments, parts of your cloud infrastructure will allow access to your perimeter, either because the developers set it up or because you are in IT operations and there are some servers that have some access to S3 buckets. For example, you have a WAN or you have a cloud VPN. So these are scenarios that are important to consider and that are real. The line between the perimeter and the internet gets blurred or even disappears with the adoption of cloud technologies.
And here's an example, something that you should read up on. I know this is an evolving framework, and we may be giving you a number that will change tomorrow, or even its definition may change. But it's important to understand that there is some work associated with these attack vectors and these vulnerabilities, and this is what we're trying to showcase today. Next, please.
So here, as I set the stage, I wanted to give you a little bit of an example of something that we are actually working on right now, and we will be presenting more research on it in the future, which is lateral movement and escalation of privilege by simply obtaining keys. You can obtain keys. Jose and I did a presentation called Red Teaming DevOps at the Mayhem Summit of Red Team Village, where we explained ways of either phishing or finding these credentials, part of which we're going to show today. But for example, I'm sure you remember how I was able to Google for permanent keys in AWS, which start with AKIA. Depending on the actual user, you can even create new role trust policies. You can add yourself to role trust policies. So for example, let's say there is a trust policy that allows you to access certain buckets where there is sensitive data. You will be able to access that data by simply starting from the compromise of these keys, or you can create temporary keys with either STS AssumeRole or GetSessionToken. It will depend on the policies that are in place and the privileges of the users. However, the boundaries are not that strict, and because this is an evolving and new technology, it's not that difficult or far-fetched to say that if you're able to get permanent keys that have been leaked on the internet, you might be able to move around in a specific environment.
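To make the key prefixes concrete, here is a minimal sketch of how one might tell permanent keys from temporary ones in a piece of text. It assumes access key IDs are 20 characters long (a four-letter prefix followed by 16 uppercase letters or digits), that AKIA marks long-term keys and that ASIA marks temporary STS keys. The function name and labels are hypothetical and only for illustration.

```python
import re

# Prefix -> kind of credential (AKIA = permanent, ASIA = temporary/STS).
KEY_KINDS = {
    "AKIA": "long-term access key",
    "ASIA": "temporary (STS) access key",
}

# Assumption: 4-letter prefix + 16 uppercase letters/digits = 20 characters.
ACCESS_KEY_ID = re.compile(r"\b(AKIA|ASIA)[A-Z0-9]{16}\b")


def classify_access_key_ids(text):
    """Return (key_id, kind) tuples for every access key ID found in text."""
    return [
        (match.group(0), KEY_KINDS[match.group(1)])
        for match in ACCESS_KEY_ID.finditer(text)
    ]


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE"
    for key_id, kind in classify_access_key_ids(sample):
        print(key_id, "->", kind)
```

A match only says the string looks like a key ID; whether the key is live, and what it can do, depends on the policies attached to that user.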
So here's an example of how you can abuse tokens, specifically in AWS, by either obtaining permanent keys or compromising a session where they're already using temporary keys, which usually start with ASIA. And then from there, you can abuse temporary tokens using things such as AssumeRole. AssumeRole is a cross-account feature provided by AWS to, for example, give a user temporary access to a resource that they may not otherwise have access to. Or there are things like GetSessionToken, which gives you temporary tokens that can be used for specific features. So please be on the lookout for these things, because this has not really exploded publicly yet, but it's definitely something that is happening right now. Next, please. And with that, I'm going to pass it to Jose and we're going to see another demo.
Hey, so give me two seconds here. I'm going to go ahead and share my terminal as well as the slides. Can you confirm you can see both the terminal and the slides? Perfect. Okay. So I want to do a live demo today, if everybody's okay with that. I don't usually do this, but I'm so confident about how comfortably this tool runs and finds leaks that I want to do it live. What I'm going to show you in the demo today are basically three basic steps of how you would use the tool that we built to find leaked credentials. The first step is going to be installing and deploying it. The second step is going to be searching for leaked AWS credentials. And for the third step, we're going to really quickly dig into the data that gets generated from one of our searches or hunts.
So the tool, I already have the tab here prepared. The tool is on GitHub. Unsurprisingly, we're searching GitHub for leaks and we're also using GitHub to host our code. It's under d1vious. So I'm just going to go ahead and clone the project really quickly here. And that's going to bring our project down. I'm just going to flip back to our slides.
So just a few notes while this is cloning down. To Rod's point, leaking credentials is totally understandable. I've actually made these mistakes before. I've been in incident responses where colleagues have made these mistakes. It happens. There are very few mechanisms today out of the box to protect against it, but there are good mitigation mechanisms out there. It's just that, again, to Rod's point, it hasn't exploded yet, so it's not actively being mitigated either.
So here we go. We've cloned down git-wild-hunt. The first step to install it is to prepare a virtual environment, and I use virtualenv for this. So we're creating a virtual environment in Python 3, and this is just so we can install all our dependencies separately from our system Python. The next step here is that I'm going to activate it and then install the required dependencies for the project. The project is pretty lightweight. There are not a whole lot of dependencies in use, so it should complete pretty fast. And now the tool should execute just fine. The tool is pretty straightforward. There's a config file that you really don't have to do much with besides configuring the actual GitHub API token. So now that we've configured our token in there, we should be ready to go.
What I'm going to do in this case is use one of the example searches that the tool already has out of the box. By the way, before I jump into running it, I didn't really walk you through how the tool functions. The first function is that you pass it a search parameter, and this is the equivalent of a GitHub advanced search. We have a few examples in here, like how to find GCP JWT tokens, AWS API secrets, Azure JWT tokens, and so on and so forth. And then what it does is go ahead and search GitHub for files that match these patterns.
And then it reads every single file that is returned in the results and checks whether that file has a valid credential, using a set of regexes. And by the way, huge kudos and credit to truffleHog, which is the project we're actually borrowing a lot of these regexes from. So it's a two-step tool, right? First, it searches for potentially leaked files, and then it verifies that those files actually have credentials in them.
In this case, again, let's try to run an example for pulling back AWS secrets. Once we've run it, the first thing we get back is the total number of results that we found on GitHub for files that match this. Then the tool goes ahead and processes every single result. It's printing out the actual URLs here, checking whether there are actual credentials in them. And here we have a first hit, from Bill Rosie, where he actually committed an AWS secret. If we go to this URL, we should be able to see his AWS access key. I'm just going to open it here in my browser. And yeah, there you go. That's literally his secret key, and it's out of the EU Central region. So as I explained earlier, with this key we literally have the exact same permissions that Bill Rosie had in managing AWS.
And so the tool runs, and it's going to take a few minutes because there's 200, some of the results, right? We're at about, you know, leak number 595. It collects all this data and essentially saves it into a JSON file. Now, while the tool is running, I want to go ahead and show you a bit of what we've gathered. I'll show you at the end of the execution what the JSON file looks like, but we grabbed all that data. We've been collecting these leaks for about seven days now, and I pulled some quick reports on what we've collected so far in the past seven days.
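The second step of that two-step flow, checking each candidate file against a set of regexes, can be sketched roughly as follows. This is not the tool's actual code: the two patterns are illustrative stand-ins for the larger set borrowed from truffleHog, and `verify_candidates` and its input shape (a mapping of URL to file content, standing in for the output of the GitHub search step) are assumptions made for the example.

```python
import re

# Illustrative checks only; a real scanner would carry many more patterns.
CHECKS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"),
    "private_key": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def verify_candidates(candidates):
    """Step 2: keep only the candidate files that contain a credential.

    candidates maps a URL to the file content returned by step 1
    (the search); here it is passed in directly to avoid network calls.
    """
    hits = []
    for url, content in candidates.items():
        for check_name, pattern in CHECKS.items():
            # De-duplicate so a key repeated in one file is reported once.
            matches = sorted(set(pattern.findall(content)))
            if matches:
                hits.append(
                    {"url": url, "check": check_name, "matches": matches}
                )
    return hits


if __name__ == "__main__":
    demo = {
        "https://example.invalid/config.ini":
            "aws_access_key_id = AKIAIOSFODNN7EXAMPLE",
        "https://example.invalid/readme.md": "nothing sensitive here",
    }
    for hit in verify_candidates(demo):
        print(hit["url"], hit["check"], len(hit["matches"]))
```

Separating the search from the verification keeps the noisy step (search results) apart from the precise one (regex checks), which is why a search can return hundreds of files while only some become hits.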
And for example, for top leaks by technology, we have by far a whole lot of AWS API keys out there that we've collected. Second to that is actually GCP service accounts. This did not surprise me. I don't know, Rod, if you got surprised when you saw this data set, but this was somewhat expected. AWS, by far, is the secret that gets leaked the most. Somewhat surprising is that there are still a lot of private keys out there, which I was not expecting.
Another thing, again just another curiosity, is that if you break this down over the last seven days, AWS secrets don't vary a whole lot. It seems that, by far, AWS is the thing that normally gets leaked the most. Second to that is GCP, although what the tool calls YouTube OAuth tokens are actually generic OAuth tokens for Google as well. The tool just considers them a specific YouTube one, but they're not. And so you can see that this actually varies every so often across days. So for the most part, people are leaking their Google Suite tokens, which is really what the YouTube OAuth tokens are, or their GCP tokens, which is pretty bad. Again, now you have full access to whatever they can do with Google on their account.
We did a breakdown by top companies as well. And mind you, where this data comes from is the user profile. Because we're searching GitHub, and GitHub gives users the ability to input things like what blog they have, what their Twitter handle is, and what company they work for, we pull that information back when we have a match, since that's open data. And this is just a very, very ugly picture of which companies had the most credentials that we collected. XBioinch is by far the company that leaked the most, or at least that was labeled the most, but we got some pretty big ones in here, like Nordstrom and VMware. And again, these are at the top.
The "other" label that you see here means that there's a bunch of credentials that were leaked, but not necessarily multiple of them per company. For the companies listed, these are multiple credentials. Microsoft was in there. Yeah, pretty bad, to say the least.
Yeah, so this is very revealing, because many times, and it's like what Jose was just saying, there doesn't seem to be awareness of how bad this is. We're trying to show you a picture: we collected for, what, a week or so, and look at all the stuff we have. And we basically did this for analytic purposes and to raise awareness. We could have dug even deeper into this data, and who knows what we would have been able to get. And if we are doing it, bad guys are doing it too. So it's very remarkable that you see some big names in there. I understand there are a number of mitigating factors. You can say, well, you can get my key, but if your IP is not in my security group, you will not be able to log in. True, but we don't know that. And the fact that those keys are being leaked opens the possibility. It's almost like an open port. That's how I see it. It's almost like an open port on a vulnerable application.
I wanted to say something about all this data also, and that's a really good point. Again, this is what individuals have put in their company profile. There's some garbage in here, like this LinkedIn profile or ABC, which we haven't cleaned out of our data yet. But this is straight up, again, what the users we've seen leaks for have put in their profiles as their company. And again, we also broke this down by region. My favorite region so far that users have listed is "in your heart". This is pretty sweet. But unsurprisingly, and actually, Rod, you made this point to me, there are big cities that have a lot of development footprint.
Like, cities with a lot of developers and a huge tech footprint are showing up here, right? New York, Mark, this is the March Club in New York, Santa Monica, San Francisco, Sweden, Russia, Pittsburgh. So again, cities that typically have huge companies that do development, or a lot of developers in them. Again, we haven't really cleaned up the data, hence why you have "in your heart" in here. But I just want to give you a really rough view of what it looks like if we aggregate this really, really quickly over seven days.
Right. And if you were targeting a company, which is what you and I were talking about yesterday, we can hypothesize from the number of leaks, and if we narrow that down to the regions, because we know where most of the developers are working, even though that's changing a little bit because of work from home, that cuts down a little bit of the work that an adversary needs to do in order to try to infiltrate one of these big companies that, by omission or willingly, may have developers that are posting keys and revealing too much.
That's true, right? By the way, our tool finished here. I just want to give the viewers a quick preview of what the results file looks like. So if I pass this to jq, since this is valid JSON, you can see here the data we're collecting, right? So again, we're dumping every result that gets matched into an array of JSON objects, and we collect the URL where we found the leak, the check that it actually matched, and the different matches. In the matches, I purposely went out of my way to not store the actual secret, but merely the keys if possible. Then the owner, so who owns the repository that leaked that credential, the owner URL, and the type. Again, if it's a company, it's going to get listed as a company.
The name, the email, again, the company if they listed it in their GitHub profile, their blog, the location if it got listed, and this is where the data over here is coming from, the Twitter handle, and so on and so forth, right? Some of these fields tend to be null if the users don't allow their GitHub profile to show this data. That's the only way we can actually read it. But yeah, again, pretty telling. A pretty telling data set.
And with that, I actually want to talk about one example that stood out: the CTO of a telecom. I'm not going to say more than that. It was pretty wild that, because the CTO had listed his Twitter handle as well as his personal blog site on his GitHub profile, we were able to find his LinkedIn and, essentially, pretty much everything. We found pretty much everything about this gentleman, and obviously we have sanitized everything so you can't find it yourself. But just like him, there are many, many cases of people that we found. This is just an example. We've contacted them to clean things up. Our intent is to make sure these things get cleaned up. Again, we want to bring awareness that this is a big issue. And again, I've made these mistakes in the past, but this is how bad things can get, essentially.
It's amazing that you can go from a simple key to pretty much everything, from a leaked key to opening the doors of your company and revealing your personal life and personal things. So please be careful with these things and try to implement some measures.
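As a rough illustration of the results file described earlier, here is a sketch of how a match plus the public profile fields could be turned into one JSON record without keeping the full secret, and how records could then be counted by company. The field names, the `profile` dictionary and the masking approach are all assumptions for the example; the tool's real schema and its way of avoiding secret storage may differ.

```python
import json
from collections import Counter


def redact(value, keep=4):
    """Keep only the first `keep` characters so the secret is not stored."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def build_record(url, check, matches, profile):
    """Combine one verified hit with the owner's public profile fields."""
    return {
        "url": url,
        "check": check,
        "matches": [redact(m) for m in matches],
        "owner": profile.get("login"),
        # These stay None (null in JSON) when the profile does not show them.
        "company": profile.get("company"),
        "location": profile.get("location"),
    }


def top_companies(records, n=5):
    """Count records per listed company, skipping profiles with no company."""
    counts = Counter(r["company"] for r in records if r["company"])
    return counts.most_common(n)


if __name__ == "__main__":
    record = build_record(
        "https://example.invalid/config.ini",
        "aws_access_key_id",
        ["AKIAIOSFODNN7EXAMPLE"],
        {"login": "someuser", "company": "Example Corp"},
    )
    print(json.dumps([record], indent=2))
    print(top_companies([record]))
```

Because the company and location come straight from free-text profile fields, any aggregation like `top_companies` inherits the noise mentioned in the talk and would need cleaning before drawing conclusions.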