Hello, everyone. I'm Alex Chantavy. And I'm Marco Lancini. Today, we're going to tell you about a free, open source graph tool that we work on called cartography. Our talk is called Using Graphs to Improve and Scale Security Decision Making. It's an honor for us to present cartography here at Cloud Native Security Day. We hope you find it and the techniques we present today useful in your own teams, and we also look forward to receiving your feedback and contributions.

Just to tell you a little bit about who we are: I'm Alex. I'm a software engineer at Lyft. I have a background in red teaming cloud environments, so I look at all of these security problems from a bit of an offensive slant. And I'm Marco Lancini. I'm a cloud security engineer at Thought Machine in the UK, and I'm also the curator of CloudSecList.

I'd like to start off this presentation by telling you the story of why cartography was designed in the first place. It all began with a common problem that security teams at all cloud native companies are facing: things are moving too fast. When things move fast, lots of security and tech debt accrues. Because things are moving so fast, there's no time to document things, and everybody develops their own sort of tribal knowledge. With hypergrowth comes a large attack surface. One of the problems we found ourselves needing to address was: how can we understand, track, and manage our infra as it changes over time?

This is compounded by the fact that modern infra is complicated. There are very complicated permissions models on all the major cloud providers. For example, Amazon Web Services has Identity and Access Management, and on GCP, Google has their own solution for IAM. So you'll have one solution that sets up identities and accesses in one way, and then another cloud system that does it differently.
And then you'll also need to deal with plain old username and password pairs on a storage resource or a database resource. You need to answer the question: who can become whom? How does this IAM solution allow identities to become other identities, and how are you opening up your environment to transitive risk? The question of which identities may access which resources is not immediately obvious. With all of these vendors, and all of these knobs and dials, it's easy to get this wrong, and there are big consequences for getting it wrong.

A couple of the scenarios we wanted to address: we wanted something that could check and audit accesses, we wanted to understand policy grants for cloud resources, and we wanted to understand the effect of changes to network policy. And of course, we wanted it all yesterday, and for minimal cost. Every company faces this problem of limited resources. We had a small team, we needed to automate where it was practical, and we had to aggressively prioritize projects. Even if you have a large team, you're still going to run into this problem of balancing everything.

So what could we do about these problems? First, I have to give credit to Sacha Faust, who originally built cartography by asking this question: can we apply an offensive security approach to these keep-the-lights-on problems, the problems that deal with essentially running the business and keeping everything alive? What offensive approach was he actually talking about? Well, there's this famous quote, and if you've been in InfoSec the past couple of years, you've almost certainly seen it at some point: "Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win." Pardon the overused Drake meme, I'm honestly sorry about that. This quote has been cited so much that it's almost become as ubiquitous as the Drake meme. But I promise I'm bringing it up again for good reason.
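As an aside, the "who can become whom" question from a moment ago is, at its core, a graph reachability problem. Here is a minimal Python sketch; the identities and trust edges are invented purely for illustration:

```python
from collections import deque

def reachable_identities(trust_edges, start):
    """Breadth-first search over 'can assume' edges: which identities
    can `start` transitively become? trust_edges maps an identity to
    the identities it may directly assume."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in trust_edges.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

# Hypothetical trust relationships: a developers role can assume an
# auditors role, which in turn can assume a finance read-only role.
edges = {
    "okta:alex": ["aws:role/developers"],
    "aws:role/developers": ["aws:role/auditors"],
    "aws:role/auditors": ["aws:role/finance-readonly"],
}
print(sorted(reachable_identities(edges, "okta:alex")))
# -> ['aws:role/auditors', 'aws:role/developers', 'aws:role/finance-readonly']
```

Each hop in the result is exactly the kind of transitive risk described above: nothing grants `okta:alex` finance access directly, yet the chain of assume-role edges gets there.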
So John Lambert at Microsoft, who made this quote, has an influential paper where he explains the idea: if you only look at your high-value assets in terms of lists of people who may access them, you're missing the opportunities that an attacker is going to have to move laterally within your environment. His particular example deals with a Windows on-prem domain environment, where an attacker could gain access to a terminal server, dump creds from memory, and then use those to move closer and closer to the objective, dumping creds along the way. What are the graph paths to get there?

In the cloud, a technique like this is even more effective, because an attacker can take advantage of the patchwork set of permissions models and the multiple zigzaggy ways to reach target assets. If you're a cloud-first, cloud-native company, where all of your infra is cloud-based, you have your security work cut out for you.

It became clear to us that we needed a self-maintaining map. We wanted something that could highlight structural risks and answer hard-to-answer questions. We came up with these use cases: we wanted a central view of technical assets; we wanted something that could help us in incident response, security research, and blue teaming; and we wanted to quickly finish the drudgery of compliance reports and audits. Essentially, such a graph would help us show non-obvious connections from one piece of our security posture to another.

The existing solutions that we looked into were either really expensive, proprietary and locked down, or too focused and limited in scope. Because we had many different vendors and many different data sources, the ability to quickly extend the product and build plugins for ourselves was a key scenario, and that was the main blocker for us in considering some of the other products we looked at.
Enter cartography. We're really excited that last year, in March of 2019, we open sourced this idea to GitHub. Come check us out at github.com/lyft/cartography. It's a Python tool that pulls data from many different sources and puts them together in the form of a graph database. What this approach lets us do is show these non-obvious relationships between each and every one of these assets. I'm going to show you how we modeled our organization as a graph and why this approach has been so useful and effective for us.

First off, I'll show that we can model our Okta infrastructure at Lyft as a graph. This is an Okta organization; Okta is an identity provider. As a person, as a human, I have an Okta user identity, and I can be a member of a group. You can set up Okta to delegate access into your AWS infrastructure, so if you are a member of an Okta group, then you may assume an AWS role: you can become an Amazon identity. Those AWS identities are grouped into accounts. An account is a notion of separation between business units; many companies will set up different departments as separate AWS accounts, for example. It's meant as an organizational way for you to split up different concerns.

We can tie this personnel data to Workday or any other HR system, adding in new context: the reporting structure of the organization, and who belongs to which teams. We can augment this further: we have Okta identities, now let's add G Suite identities too. We can also get visibility into the Chrome extensions that our organization installs, using something called CRXcavator. The thing I want to call out here is that if we were to represent this in a relational database, every single one of these edges would be a join.
And if you were to do this for every single one of these edges, just typing join on table name, join on table name, it gets really old very quickly. Representing this as a graph lets us see all of these connections in a much more intuitive way. I'm going to walk you through a scenario that hopefully makes this a little more compelling.

The point here is that we have an extensible, pluggable platform. Your graph becomes most useful when you can take our existing modules and join them with your own, and doing so is a straightforward process. So, in the scenario I was just describing: I have an Okta identity, and I am a member of the developers group in Okta. This lets me assume an AWS role named developers, and this developers role belongs to the developers AWS account. So far so good; this AWS-and-Okta delegation is a well-documented scenario, and it's exactly what we want.

Now, here's the problem. I mentioned that in AWS it is possible for one role to assume another role, for one identity to become another identity; this is used as a feature of most IAM solutions. Using cartography, we have discovered that as a developer I can assume this auditors role. Is this expected or not? Let's find out. This auditors role belongs to the finance account, and, like I mentioned earlier, AWS accounts are used as a way to split up different concerns. With a dev account and a finance account, the way this organization was designed was for those things to be separate.
And yet here we have this opportunity for developers to become auditors and view assets in the finance account, potentially violating some of our assumptions about how our organization is segmented, and potentially highlighting compliance issues, auditing issues, et cetera. So you want to have this visibility in your organization, to see opportunities to move between roles, especially when they cross trust boundaries. We think this is a compelling and powerful scenario that lends itself well to a graph approach.

I'll talk briefly about how our tool does this. The core part of cartography is the core sync. Cartography, like I said, is a Python tool, and every one of its data modules is something we call an Intel module. Each Intel module pulls from its data source and writes to a graph database powered by Neo4j. All that data is then exposed either via a web browser interface or via an API powered by Bolt. That's the basic idea.

We also have cleanup jobs. On every sync we set a timestamp, so that we always keep the freshest data in the graph, and we run a cleanup job so that nodes that are no longer there in real life get deleted and cleaned up from the graph. The last box on the right is for enrichment jobs: after we have all this data in the graph, we can perform additional analysis and additional enhancements on that data. This is another scenario where it's useful to represent the data the way we have, and I'm going to show you what this looks like right now. A compelling scenario is: is my compute instance open to the internet? We've just run our cartography sync and we've got all these nodes in the graph. What can we do with all this data? Can we exploit it?
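The timestamp-based cleanup just described can be sketched in a few lines of Python. The real implementation runs as Cypher cleanup jobs against Neo4j, but the idea is the same; the asset names and tags here are invented:

```python
def sync(graph, observed_assets, update_tag):
    """Upsert every asset seen in this sync run, stamping it with the
    run's update tag, then delete nodes the sync did not touch --
    i.e. assets that no longer exist in real life."""
    for asset in observed_assets:
        graph[asset] = update_tag          # upsert with the fresh timestamp
    stale = [a for a, tag in graph.items() if tag != update_tag]
    for a in stale:
        del graph[a]                       # the cleanup job
    return graph

graph = {"i-oldserver": 100, "i-webserver": 100}
# The next sync run (tag 200) no longer observes i-oldserver,
# so it gets cleaned out while i-webserver is refreshed:
sync(graph, ["i-webserver", "i-newserver"], 200)
print(graph)  # -> {'i-webserver': 200, 'i-newserver': 200}
```

This is why the graph always reflects the freshest view of the environment: anything not re-observed by the latest sync simply ages out.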
Can we make use of it in a more intuitive way? I want to answer this question: is this EC2 instance, at the bottom left-hand corner of your screen, open to the internet, given all of the relationships available to it in the graph? If you look at the top, we have an IP range representing the whole internet: 0.0.0.0/0. We can ask ourselves by forming a query in the Cypher query language. We say: OK, let's draw a path from an IP range with the zero subnet and connect it to AWS inbound IP permission rules. Do any of those IP rules map back to an existing EC2 security group? If so, let's keep going; let's keep drawing this path and build on this query. From an EC2 security group, can I map that back to a network interface, and map that back to an Amazon EC2 instance? If this whole path matches, let's set exposed_internet on that instance to true. This way, later on, we can query for all EC2 instances that have this exposed_internet label set to true, without needing to run the long query I just showed you and go through the entire path traversal. What you're able to do with our platform is come up with your own enhancements, your own analysis jobs, to build these sorts of shortcuts. And with that, I'm going to hand the mic over to Marco to tell you about a real-life deployment.

Hello, let me just share my screen. OK. So as Alex was just saying, staying on top of cloud environments is a challenge that many organizations face. I work for a software company building a core banking solution in a highly regulated environment. We are also, by default, a multi-cloud native organization. Hence, we needed a way to detect, identify, categorize, and visualize all the different assets we deployed in our estate, regardless of the cloud provider in use, whether AWS, Azure, or GCP.
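The path-matching enrichment Alex walked through can be written as a single Cypher query. A sketch follows; note that the node labels and relationship types below are approximations of cartography's AWS schema for illustration, not quoted from it:

```python
# Sketch of an analysis/enrichment job like the one described above.
# Labels and relationship names approximate cartography's schema.
EXPOSED_INTERNET_QUERY = """
MATCH (:IpRange{id: '0.0.0.0/0'})-[:MEMBER_OF_IP_RULE]->
      (rule:IpPermissionInbound)-[:MEMBER_OF_EC2_SECURITY_GROUP]->
      (sg:EC2SecurityGroup)<-[:MEMBER_OF_EC2_SECURITY_GROUP]-
      (nic:NetworkInterface)<-[:NETWORK_INTERFACE]-(instance:EC2Instance)
SET instance.exposed_internet = true
"""

# With the official neo4j Python driver this would run as, e.g.:
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687",
#                               auth=("neo4j", "..."))
# with driver.session() as session:
#     session.run(EXPOSED_INTERNET_QUERY)

# Later lookups become a cheap property match instead of re-walking
# the whole path:
FIND_EXPOSED = "MATCH (i:EC2Instance{exposed_internet: true}) RETURN i"
```

The payoff is the shortcut: the expensive traversal runs once per sync, and every subsequent question ("show me everything exposed") is a one-hop query.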
Here, I want to briefly describe the process we went through to adopt cartography and to find a way to effectively use, and act upon, the data we collected with it. We started with a very high-level overview. This picture shows our multi-cloud setup at a glance. This is something we run in production, and it's up 24/7. The bundle of cartography and Neo4j runs in a Kubernetes cluster in a GCP project we dedicate to internal tooling. From there, we instructed cartography to pull assets from every GCP project and every AWS account in our estate. To do so, we had to grant cartography the minimum set of permissions necessary to pull data without introducing risk to the infrastructure; as we said, it's highly regulated. For AWS, we use the hub-and-spoke model of cross-account roles; for GCP, we rely on service accounts at the organization level.

In more detail, the overall deployment is made of two main components: a StatefulSet for Neo4j and a CronJob for cartography. Starting with Neo4j, the StatefulSet is made of two containers: one for the database itself, and one dedicated to OAuth2 Proxy, a reverse proxy we use to integrate with our identity provider. There are two services exposing the relevant parts: one service for all the HTTP-based pieces, like the Neo4j web interface, and another, the Bolt service, for interacting with Neo4j programmatically, which is what cartography uses, for example. In addition, we have a Kubernetes Ingress used as an entry point for connecting to the database, and PersistentVolumes for storing the database's data. We also integrated HashiCorp Vault with our GKE cluster to provide secrets to the running containers; for example, the Neo4j password is stored within Vault and retrieved at runtime. The cartography setup, instead, is simpler. Cartography is released as a Python package, so it is fairly straightforward to containerize. We then rely on a CronJob, which is set to run daily.
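A trimmed sketch of what the cartography CronJob half of such a deployment might look like. Every name, the schedule, the image, and the flags here are illustrative assumptions, not the real manifests (and on older clusters the apiVersion would be batch/v1beta1):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cartography
  namespace: security-tooling        # hypothetical internal-tooling namespace
spec:
  schedule: "0 2 * * *"              # run the sync once a day
  concurrencyPolicy: Forbid          # never let two syncs overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cartography
              image: registry.example.com/cartography:latest
              args: ["--neo4j-uri", "bolt://neo4j-bolt:7687"]
              env:
                # In the setup described above, credentials actually come
                # from HashiCorp Vault at runtime; a plain Secret is shown
                # here only to keep the sketch self-contained.
                - name: NEO4J_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: neo4j-credentials
                      key: password
```

The `concurrencyPolicy: Forbid` choice matters for a sync tool like this: a second run starting mid-sync could race the first run's cleanup job.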
And as before, we integrated with Vault so that cartography can fetch credentials for both AWS and GCP at runtime. So we have our setup, with cartography running daily and pulling assets from all our environments. The next thing we wanted to see was how we could put the collected data to work. The most direct way to interact with this data is to connect to the browser built into Neo4j: you connect via the Ingress and you can manually run queries. Although the Neo4j browser gives you full freedom to explore the data, and it's quite powerful, this method is also completely manual, as you have to keep copy-pasting queries into the web interface. This doesn't scale, as it's too manual, so we had to change our approach.

We wanted to be able to run queries automatically against our dataset. First, we had to define a structured format for storing these queries, which allowed us to enrich them with metadata. We stored them in JSON as a list of dictionaries, where each dictionary is a query enriched with metadata: a name, a description, tags for easy filtering, and a human-readable list of the fields returned, all alongside the main Cypher query itself. So we defined a custom query format, and we defined multiple queries that were important for our organization: like Alex was saying, EC2 instances public to the world, S3 buckets with anonymous access, IAM issues, and many more.

The last piece of the puzzle was to find a method for empowering people and teams to perform analysis of the collected data on demand. What better than Jupyter notebooks? They are already heavily used by the security community for investigation purposes, so it felt natural to create runbooks for self-service consumption of the data. To start, we created dashboards specific to three main domains: one for security, one for generic inventory, and one for networking, for both AWS and GCP.
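A minimal sketch of such a query catalog, and of filtering it by tag to build a dashboard. The field names and the two sample queries are one plausible shape for the format described above, not the exact one used:

```python
import json

CATALOG = json.loads("""
[
  {
    "name": "ec2_exposed_to_internet",
    "description": "EC2 instances reachable from 0.0.0.0/0",
    "headers": ["account", "instance_id", "region"],
    "tags": ["security", "networking", "aws"],
    "query": "MATCH (i:EC2Instance{exposed_internet: true}) RETURN i"
  },
  {
    "name": "s3_anonymous_access",
    "description": "S3 buckets allowing anonymous access",
    "headers": ["account", "bucket"],
    "tags": ["security", "aws"],
    "query": "MATCH (b:S3Bucket{anonymous_access: true}) RETURN b"
  }
]
""")

def queries_tagged(catalog, tag):
    """Return the names of queries carrying a given tag, e.g. to
    assemble the 'networking' or 'security' dashboard."""
    return [q["name"] for q in catalog if tag in q["tags"]]

print(queries_tagged(CATALOG, "networking"))  # -> ['ec2_exposed_to_internet']
```

Keeping the Cypher next to its metadata is what makes the later steps possible: the same JSON entry can drive a notebook cell, a Kibana visualization, or an alert, without duplicating the query.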
However, we quickly realized that Jupyter notebooks on their own were a bit too restrictive and limited in their capabilities, and we started looking for alternatives that provided better integration with the rest of the security tools we already used in our organization. That's why we turned to Elasticsearch next. Our security monitoring team already made extensive use of the Elastic Stack, hence integrating with Elasticsearch was the most obvious option after Jupyter notebooks. In particular, we had two main goals in mind: the first was to provide security analysts with a current snapshot of the infrastructure, so that cartography data could enrich security investigations; the second was to alert on any new instance of drift, since cartography itself can be used to detect drift within ephemeral environments.

Here you can see the high-level setup. In the bottom right corner you can note a new AWS account: a completely separate account used by the monitoring team. There, we already have Elasticsearch deployed for all the other security logging, already integrated with ElastAlert, which can send notifications to JIRA and Slack. What we had to do was create an integration between the cartography Neo4j database and Elasticsearch. We created a custom ingester, deployed as a Kubernetes CronJob, which periodically pulls data from the Neo4j database and forwards it to Elasticsearch.

With cartography data now getting ingested into Elasticsearch, we were able to start using many of the features of Kibana. The most direct way, again, is to start by browsing the Discover section of Kibana, which, as you can see in the screenshot, reports data as it gets ingested. From there, we wanted to recreate the dashboards we already had in Jupyter, and create more advanced ones within Kibana itself, to aggregate a whole view of the infrastructure and give a quick glance at the main issues and misconfigurations we had at any moment.
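The ingester's core job is a pull-transform-push loop. Here is a sketch with the pure transform separated out so it can be tested without live services; the index name and document shape are invented, and the commented client calls assume the standard neo4j and elasticsearch Python packages:

```python
from datetime import datetime, timezone

def to_es_doc(query_name, record):
    """Flatten one Neo4j result row into an Elasticsearch document,
    tagging it with the query it came from and an ingest timestamp."""
    doc = dict(record)
    doc["cartography_query"] = query_name
    doc["@timestamp"] = datetime.now(timezone.utc).isoformat()
    return doc

# The CronJob body would then look roughly like:
#
# from neo4j import GraphDatabase
# from elasticsearch import Elasticsearch, helpers
#
# driver = GraphDatabase.driver("bolt://neo4j-bolt:7687", auth=("neo4j", "..."))
# es = Elasticsearch(["https://elasticsearch:9200"])
# with driver.session() as session:
#     rows = [dict(r) for r in session.run(query["query"])]
# helpers.bulk(es, (
#     {"_index": "cartography", "_source": to_es_doc(query["name"], r)}
#     for r in rows
# ))

doc = to_es_doc("ec2_exposed_to_internet", {"instance_id": "i-12345"})
print(doc["cartography_query"])  # -> ec2_exposed_to_internet
```

Tagging each document with its originating query name is what lets Kibana later build one visualization per query with a simple term filter.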
We ended up creating one Kibana visualization for each of the custom cartography queries we created, around 125 of them. They're all open source, of course, like cartography. We then aggregated these visualizations into dashboards, again starting with one specific to security, one for inventory, and one for networking. What we can also do with Elastic is show trends over time, and not only security-related ones: we can provide data to other teams, like SREs, to see what kind of AMIs we are using in different accounts, and provide a history of how the accounts evolve.

These screenshots show an excerpt of some of these dashboards on some test data. As you can see, Kibana dashboards are perfect for providing snapshots of the current state; it's easy to see what the environment looks like today. These visualizations can help quickly identify specific misconfigurations; for example, you can create a table showing all the EC2 instances exposed to the internet, which is easy to scan. However, this interaction still relies on manual involvement and lacks the automation needed to be more proactive in remediating any potential misconfiguration: analysts still need to log in and look at these dashboards. We wanted something even more automated. That's why, once in Elasticsearch, we had the chance to leverage further integrations with tools like ElastAlert, as we were seeing before. We defined ElastAlert rules so that we get notified in Slack of any misconfiguration and/or drift: for every new entry in any of the dashboards, especially the most critical ones, we also get an alert in Slack, so we can go and investigate.

A question someone might ask is: many companies are shifting to infrastructure as code and using Terraform, so why use cartography data to perform drift detection rather than Terraform itself?
Well, Terraform provides drift detection capabilities out of the box, and it is excellent at detecting drift for resources managed by Terraform itself. What Terraform lacks is support for any resource that might have been created by other means, like the console or the command line. That's why we decided to use cartography as a complement to Terraform's drift detection, to catch everything that gets created, regardless of the source. And that's our journey so far. We also plan to integrate with many of the other providers Alex was talking about, like G Suite and others. But back to Alex now.

Thank you very much, Marco. I'd like to talk about what we have planned next for cartography. The first big improvement we would like to bring to cartography is a DAG-based data sync, a directed acyclic graph data sync, so that the sync is more reliable: if one sync fails, it doesn't fail the entire rest of your data sync. We would also like a nicer plug-in framework. I mentioned that we have a very extensible way of adding your own context and adding new data modules; that said, there are improvements to be made there, and we would like to make the process even easier for new developers getting introduced to our platform. We are also working on near-real-time updates. I mentioned earlier that the main cartography sync works as a job that pulls in all assets from multiple sources and then wipes away anything from the previous state that is no longer there. This is slow. An area where we see room for improvement is being able to consume a stream of events, perhaps from AWS CloudTrail and similar providers, and apply those changes little by little instead of all at once in batch. Those are just a couple of the ideas we are exploring.
And as always, we are adding more data types to the graph, because the more the merrier: the more context you can throw at this, the more powerful your graph can be. With that, I would like to end with my call to action: go get started. Please go play with your own graph. Check out our tool at github.com/lyft/cartography. Join our community, say hi on Slack, participate in our monthly video discussion, and tell us how we can be useful for you. Your opinion matters. We're absolutely thrilled to be here at Cloud Native Security Day. Our goal is to grow and improve the project to meet the bar for high-quality open-source security tools, and we want to continually improve this project. With that, thank you very much for your time here today, and we hope to say hi to you directly on Slack or in any other venue. Thank you. Thank you, everyone.