So, for those of you who don't know me, my name's Armon Dadgar. You'll find me all around the internet as just Armon. And I'm one of the co-founders and CTO of HashiCorp. For those of you who don't know about HashiCorp, don't worry, it's very common. People know us much more for our tool names than they know the company. So, it's part of our regular education campaign that there is also a company behind it. Most folks are probably familiar with our Vagrant tool, maybe Terraform. We make a handful of different tools, but really our overarching mission is looking at the challenges around delivering an application, whether you're an IT operator trying to figure out how to provision and manage infrastructure, whether you're a security engineer trying to figure out how do we deal with this brave new world of cloud and zero trust networks, or whether you're an application developer who just wants a runtime platform, wants to find your upstream services, and doesn't really care about the bits and bytes and how they all connect together. And so, because we try to look at so many different problems, we have quite a handful of different tools. So, I'm not going to spend a lot of time talking about most of them. Instead, I want to specifically spend time today on Vault and the security challenges we face as we go to the cloud. So briefly, I'm going to touch on a few different things. For folks who are less familiar with Vault, I'm going to give a quick intro there. What are some of the use cases? How does it work? And then, from there, talk about a new feature that I'm very excited about that landed in Vault 0.9, and then talk about Kubernetes integration since we're all here at KubeCon. So, for folks who've never heard about Vault, the way we like to describe it is it really solves three different challenges that are all pretty related. The first one, which is really why Vault exists, came out of the need to deliver secrets to our end applications. 
So, as we adopt tools like Chef and Puppet and Kubernetes and get richer in our orchestration and deployment and automation, we have this interesting challenge, which is how do our applications actually get all of the credentials they need, whether it's database credentials, whether it's cloud API tokens, whether it's TLS/SSL certificates, so on and so forth. We have all of this privileged material that our applications need. And as we're sort of driving towards this infrastructure as code vision, whether we're committing Chef code or Terraform code or our YAML configurations for Kubernetes into version control systems, we don't want to pollute all of that with sensitive information. Because otherwise, it's being stored in plain text. Most of the version control systems we use are not really designed for sensitive materials, so they don't really have things like access logs and fine-grained capabilities. And so, this is the first use case: how do we think about managing this privileged information? How do we deliver it to the applications that need it, but then still do it in an automated way so that these systems don't get in our way? Another sort of interesting class of challenge is, great, we're delivering our database credentials to our application and we're giving them a 32-byte encryption key and saying, God be with you, get the crypto right. This kind of works, but what we find in practice is that cryptography is really, really hard and key management is even harder. So, how do we make doing security right a lot easier? And so, how do we think about providing higher-level APIs, things like encryption as a service, so that we're not handing an application 32 random bytes and saying, have fun, but instead providing sort of higher-level APIs that are less error-prone? And then the last one is, as we're doing this, what about the people? We've spent a lot of time talking about machines and automations and applications, but there's still people in the loop. 
So, how do we as developers and operators and security people get access to the privileged systems that we need? And the last bit is exactly that: access management. So, with that, let's jump right into secret management. What does this world really look like? I think it starts first by understanding what a secret actually is. The way we like to think about it is, it's any piece of information that you can use for authentication or authorization. So, it's something that I can provide to another system that's granting me access to it or proving my identity or elevating my capability, right? So, here's a few examples of it. Sensitive information is sort of a different class. It's things we'd like to keep confidential but that don't necessarily grant me additional access. So, maybe, you know, the things that my customers order, I consider that sensitive, but it's not a secret piece of information. It's sensitive. And so, the open questions as we talk about secret management are really things like, okay, how do our applications get these secrets? How do our human operators get these secrets? How do we update these things? What happens if we need to revoke access? How do we know who's used what? And what do we do in the event of a compromise? These are sort of the hard questions in secret management that we'd like to have an answer to. The standard state of the world, I guess, in some sense, the answer we typically see to many of these questions, is not great. Typically, it's what we would define as secret sprawl. These secrets are defined everywhere, whether hard-coded in an application, living in plain text in a config file, you know, strewn about GitHub or, you know, our Puppet servers, so on and so forth. And so, these things kind of live everywhere. As a result of this, there's an extreme decentralization, which means we have relatively limited visibility into who's doing what, when, and where. 
And in the case of a compromise, we have, you know, minimal, if any, procedures around how to do break glass. How do we recover from this? And so, this was really the kind of problem space that we started thinking about when we started working on Vault: how do we get to a better set of answers for those questions than secret sprawl? And so, the kind of driving goal behind the system was to first and foremost get to a single source of truth, right? Moving away from this decentralization, this sort of sprawl, to a single sort of broker for all of these values. And if we do that, then we have to recognize there are two very different audiences for the secrets. There's our applications and our automation tooling, and they want automated API-driven access. And then we have our human operators who want the exact opposite of that, right? They want CLIs, they want UIs, they want to interact with it in a very different way. And so, as we think about these things, what's the right place to be on the spectrum between sort of theoretically perfect security and theoretically perfect usability, right? In some sense, I think this is a painful trade-off, but it is one that we have to deal with in security land. We can create these perfect systems that are so difficult to use that no user uses them, right? And that's what we see in practice: organizations who will adopt these things, and it's so onerous that one app uses it and everything else just skips it. And so, you have to find that balance of it being practical enough that applications will use it even if it's, you know, not perfect. So, not letting the perfect be the enemy of the good. The last one is modern data center friendly, and what we really mean by that is, you know, pure software. It shouldn't depend on hardware devices because those things are relatively inaccessible if we're in a cloud environment. 
So, as we translate the sort of top-line goals we had, what does that mean in terms of the features and functionality of Vault itself, right? The sort of table stakes for the system is secure bit storage, right? Bits in, bits out, encrypted at rest, encrypted in transit; that sort of table stakes. Where I think things start getting a little more interesting is when we start talking about dynamic secrets, and I'll get into that. And there's a whole notion around how do we generate these things on demand, and what does that mean in terms of tracking their lifespan and dealing with things like revocation. And then we have all of our good bread and butter security things, right? How do we think about auditing? How do we think about fine-grained access control? How do we think about authentication? And so, while we're doing all of this, what are the set of principles we want to bring to bear, right? When we really look at, you know, what are the things that you have to care about when you look at security, they haven't changed a whole lot, right? The core bread and butter has been the same for a long time, right? We want to provide confidentiality of all data that's stored. We want to provide integrity. We want to provide high availability. But then more so, we want to do things like least privilege, right? Don't give access unnecessarily, because someone inevitably gets compromised, right? There's a phishing attack and they click the link, right? The things that they're allowed to access will be accessed, and so minimizing what people have access to is a prudent thing to do. How do we get some element of privilege separation? Does everyone need to be an administrator? Can some people be read-only users? How do we sort of bracket who can do what? Privilege bracketing, you know, how do we time-bound access to things? So access doesn't mean access forever. 
Non-repudiation: how do we get to a place where we have sufficient audit logging that we can say with certainty someone did something at some time? And then how do we apply defense in depth? How do we not put all of our eggs in one basket? And so as we go through all these things, you know, I'll talk about how we kind of weave some of these principles in, because the goal is to bring all of these sort of security best practices into the way Vault works, into the workflow, but do it in a way that's sort of invisible. You're kind of applying best practices without having to be overly cognizant of it. So like I mentioned, the table stakes of the system is just secure bit storage. You want to be able to write something to the system and pull it back out, and that should be secured as it's going over the network, secured while it's at rest. And so, of course, we do that. Everything at rest is AES-256, everything over the wire is TLS 1.2. All of this is pure software, so there are no hardware security modules required. And the goal here is providing confidentiality and integrity. Doing it is actually pretty simple. So, you know, for all that we've mentioned so far, this is what writing a secret and reading it back would look like. So here we're just going to write some arbitrary value, bar=bacon, at the path secret/foo. The system's architected roughly like a hierarchical file system. And as we read it back, we get our arbitrary bar=bacon back. Simple. So like I said, table stakes; the going gets interesting when we start talking about things like dynamic secrets. One of the challenges with something that's just a static bit store where I put in bar=bacon is that every client that's reading that value out sees the same value. So to make this concrete, if I have 50 web servers that all go in and say, give me the database password, all 50 will see the same username and password. 
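The path-based write/read workflow just described can be sketched as a toy, in-memory store. This is a hedged illustration only: real Vault encrypts every value before it reaches the storage backend and serves it over TLS, and the `SecretStore` class and its methods here are invented for this sketch, not Vault's API.

```python
# Toy sketch of Vault's hierarchical, path-addressed secret storage.
# Real Vault encrypts values (AES-256) before writing to its backend;
# this toy keeps them in a plain dict purely to show the interface.

class SecretStore:
    def __init__(self):
        self._data = {}  # path -> dict of key/value pairs

    def write(self, path, **kv):
        self._data[path] = dict(kv)

    def read(self, path):
        return self._data.get(path)

    def list(self, prefix):
        # Like listing a directory in a hierarchical filesystem.
        return sorted(p for p in self._data if p.startswith(prefix))

store = SecretStore()
store.write("secret/foo", bar="bacon")   # like: vault write secret/foo bar=bacon
print(store.read("secret/foo"))          # -> {'bar': 'bacon'}
print(store.list("secret/"))             # -> ['secret/foo']
```

The hierarchical paths are what let policies and operations (list, revoke, audit) apply to whole subtrees at once.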
And so now we have this interesting challenge where it's like, what if that thing leaks? Which node was the point of compromise? All 50 of the machines were using the same credential. All of our developers who accessed it had the same credential. And so now it's really hard to understand, okay, where was the source of the problem? So we've lost that non-repudiation. We can't say with any certainty which actor was responsible. Instead, what we'd like to be able to have is a unique credential per client that's generated on demand. So when we see this client, this token that gets leaked, we actually have a very high level of certainty that actually it was Armon who accessed it at 2:45 p.m. And that credential was unique to him. And so how do we do this? This mechanism within Vault is called dynamic secrets. And so there's a few interesting fallouts that come from this, and we'll talk about them. At the highest level, the way it works is pretty simple. A client, like myself, connects to Vault and says, I want a database credential. Vault ensures that I'm actually authenticated and able to do this and that I should be allowed. And then it goes and talks to the database and says, create a new dynamic user with a random username and a random password. Restrict the set of things that username can do. So scope it down, potentially with a grant statement. And then audit that Armon has created this thing with this particular unique username. And then hand it back to me. So as a user, I'm getting back a username and password. I go and connect to the database using my normal client driver. And I'm off to the races. In terms of the surface area of what Vault supports today, it's pretty huge. Almost any real RDBMS system. Pick your favorite NoSQL systems. Pick your favorite messaging queues. The list sort of goes on and on. And this is made possible because this is a plugin point. So what we've seen is the community shows up and says, hey, I need support for this system. 
And then they're like, here's the 300 lines of code that connects MongoDB or Cassandra or RabbitMQ. And this plugin is split away from the core. So this is one of the core tenets of the design of Vault: how do we minimize the size of our trusted computing base and have this sort of layered model for defense in depth. And so these plugins have very little access to, like, the innards of Vault and aren't really a security sensitive thing. And there's a clean separation of responsibility. What this comes with, then, is if this secret is dynamic and unique per client, we need some way of tracking it. We need some amount of metadata so that we know which client has access to what. And if these things are time bounded, when do they expire. And so Vault rolls all of that up into the concept of a lease. So when we generate a dynamic secret, the value is returned back with a lease. And so this lease says, this is for you, Armon. It's valid for the next two hours. This is your username and password. And the lease is something concrete that we can talk about. So we can renew a lease. We can revoke a lease. We can query what are the outstanding leases in the system. And so this actually lets us start defining break-glass procedures. So as an operator, if I know MySQL was compromised, but I don't know who exactly was the source, I can go to Vault and say, you know what, just revoke all of the MySQL leases. And then we're going to do the forensics and figure out the point of compromise later. Or if you know it was just Armon's lease, just go in and revoke my lease, and there's no reason to impact all of the other outstanding leases. So why do this? This seems complicated. It seems like it adds a lot of nuance to the system. Partly it's because it gives us all four of these properties. It gives us this notion of privilege bracketing, so that when I give you a credential for the database, it's not a forever credential. 
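The dynamic-secret-plus-lease idea above can be sketched in a few lines. This is a hedged toy, not Vault's implementation: the `LeaseManager` class, its method names, and the lease-ID format are all invented here for illustration, and a real backend would also create and delete the actual database user.

```python
# Toy sketch of dynamic secrets with leases: every request yields a
# unique, time-bounded credential, and leases can be renewed or revoked
# individually or by prefix (the "revoke all MySQL leases" break-glass).
import secrets
import time

class LeaseManager:
    def __init__(self):
        self.leases = {}  # lease_id -> (credential, expires_at)

    def issue(self, backend, client, ttl=7200):
        # Unique credential per request: non-repudiation comes for free.
        cred = {"username": f"v-{client}-{secrets.token_hex(4)}",
                "password": secrets.token_urlsafe(16)}
        lease_id = f"{backend}/creds/{client}/{secrets.token_hex(8)}"
        self.leases[lease_id] = (cred, time.time() + ttl)
        return lease_id, cred

    def renew(self, lease_id, ttl=7200):
        cred, _ = self.leases[lease_id]
        self.leases[lease_id] = (cred, time.time() + ttl)

    def revoke_prefix(self, prefix):
        # Break-glass: wipe every outstanding lease under a backend.
        for lid in [l for l in self.leases if l.startswith(prefix)]:
            del self.leases[lid]  # real Vault also drops the DB user

mgr = LeaseManager()
lid1, cred1 = mgr.issue("mysql", "web-01")
lid2, cred2 = mgr.issue("mysql", "web-02")
assert cred1 != cred2          # every client sees a unique credential
mgr.revoke_prefix("mysql/")
print(len(mgr.leases))         # -> 0
```

Because each lease carries who it was issued to and when it expires, revocation and audit questions become simple queries over this metadata.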
I can time bound it as valid for a day, a week, a month, and I can have some guarantee that the client is forced to move on to a new credential; otherwise that credential just gets wiped out. I get non-repudiation. So I know of my 50 web servers, this one was the point of compromise. I can also bound the time it takes for a secret to update. If I know every client is forced to play ball and respect this lease contract, then I have some certainty that if I change my policy, then within the time bound of the leases, all of my clients are going to get the updated version. And then the last one is it enables revocation, because now we have this rich metadata on who has what, and it's unique. If it's a shared secret or it's something that's not time bounded, it's a lot harder. So then we get the sort of security bread and butter: the authentication, authorization, and auditing. As far as authentication goes, this is kind of an interesting part, because here we have the split between what machines want to use and what humans want to use. As an application, you're not going to log in with a username and password and enter a 2FA code. You have to be able to use something like mutual TLS, bearer tokens, or another mechanism we call AppRole, which is like a username and password for applications. And then users are going to do things like single sign-on with LDAP or Active Directory, maybe an OAuth provider like GitHub, something like that, a little more human friendly. And what we do in Vault is basically map down these different authentication mechanisms into a uniform authorization language. So you apply a consistent set of policy over the different users of the system, whether they're an application or a user. And so while there's many, many different authentication backends, there's one way you handle authorization. And the system models this as a default-deny sort of system. So it's a need-to-know basis. 
And this is really in keeping with least privilege. Unless you really need access to a secret, you probably shouldn't have it, because that's just an unnecessary surface area of attack. And then with request-response logging, everything can flow through a number of audit backends. So you can configure, you know, one to n audit backends. And the idea behind allowing multiple is it allows us to design the system in a fail-closed way. So we can allow you to set up two different auditing backends and then make the assertion that if none of the backends allow the request to get logged, the system rejects the request. So why would you do this? This allows us to protect that non-repudiation property. So if I as a malicious operator know how the system is set up, I don't go in and just turn off Splunk for a little while, pull out a bunch of secrets, and then, you know, now that it's convenient, turn Splunk back on. Instead, you'd much rather the system say, you know what, I can't log this, so no; it's better to have an audit trail than to have this sort of gap in the history. So one of the interesting properties of a system like this is that you're pulling secrets out from it for your app to actually be able to function. Your app doesn't have the database credential until it talks to Vault and gets it. And so you kind of care about availability of a system like this, right? So from the very, very beginning, Vault 0.1, the system was highly available. At that time, it only supported Consul. Now it supports etcd and ZooKeeper and a few other systems for doing leader election. And so you kind of point Vault at these systems and it'll automatically do an active-standby model. So one of the Vaults becomes active and services requests. The others forward things to the active instance. And if it fails, it will automatically flip over to one of the standbys. And this is providing high availability. 
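The two properties above, default-deny authorization and fail-closed audit logging, can be combined into one small request-pipeline sketch. The structure and names here are illustrative, not Vault's internals.

```python
# Toy request pipeline: default-deny authorization, then fail-closed
# audit logging (if no configured audit backend can record the request,
# the request is rejected rather than served unlogged).

def is_allowed(policies, path):
    # Default deny: access exists only if some policy grants the path.
    return any(path.startswith(prefix) for prefix in policies)

def handle_request(policies, path, audit_backends):
    if not is_allowed(policies, path):
        return "403 permission denied"        # need-to-know basis
    logged = False
    for backend in audit_backends:
        try:
            backend(f"read {path}")
            logged = True
        except IOError:
            continue                           # try the next backend
    if not logged:
        # Better to refuse service than to lose the audit trail.
        return "500 no audit backend available"
    return "200 ok"

log = []
def splunk(entry): raise IOError("splunk is down")
def file_log(entry): log.append(entry)

assert handle_request([], "secret/foo", [file_log]) == "403 permission denied"
assert handle_request(["secret/"], "secret/foo", [splunk]) == "500 no audit backend available"
assert handle_request(["secret/"], "secret/foo", [splunk, file_log]) == "200 ok"
```

Note how a malicious operator who disables one backend (the `splunk` stub here) cannot create an unlogged window as long as another backend still accepts entries, and loses service entirely if none do.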
One of the challenges of Vault is, if we're providing this promise where we say data is encrypted in transit and at rest, it means we're encrypting Vault's own data. And so when Vault starts, it has an interesting problem, which is how does it decrypt its own data? One of the decisions we made very, very early on was to say this key must be provided online. You can't provide the unseal key to Vault via a configuration file or a CLI flag, because inevitably what would happen is that that config file that gets managed by Chef would have the key back in it, which then lives back in GitHub. And so the entire sort of system becomes defeated. So this unseal key, the one that allows the system to continue to function, in some sense acts as this key to the kingdom. Because if we had this key, we could just bypass all of the rules and access controls that are set up in the system. We would just go to the cold storage, decrypt whatever we wanted, since we have the decryption key, and not bother with auditing and access controls. And so we have to be very, very sensitive with this particular key. Particularly, we have to be worried about insider attack, people who do have access to the system and may have access to this key. So the approach we take is what's referred to as a multi-man rule. And the idea behind it is you actually just don't give out the master key to anyone. So Vault has this notion of a key hierarchy. The sort of innermost keys, the keys that protect all of the data Vault stores, are referred to as encryption keys. These can be rotated online all day and night. Nobody has access to these things. They're sort of an internal detail. That key, that key ring, I should say, is protected by the master key. The master key then allows you to get to the key ring, which then allows you to get to all of the other things that are stored. And so this is really the true sensitive one. This key doesn't even get stored anywhere. 
Instead, what we do is we split it out into a number of key shares. So we use this algorithm called Shamir key splitting, where we'll split this one key into, call it five, 10, 50 parts, whatever you want it to be. And you need to provide some threshold of them that are then recombined together to rebuild the master key. So by default, Vault says, I'm going to take the master key and split it into five, and you need to provide three keys. And then what you do is you give these keys out to different key-holding officers at the company. And when you need to unseal Vault, you know, it's kind of like the Red October scene, right? Everyone puts in their keys and turns it. And once we get to the threshold, Vault is able to rebuild the master key, which is able to get back to the encryption key ring, and boom, we're in business. This can be slightly annoying if we're doing automation and doing things like putting it inside of, you know, systems that are going to move us around. So there are mechanisms for doing it in an automated way. Effectively, we replace our human key holders with a machine key holder that we trust. And so this could either be, you know, a hardware security module that we're racking, or we say, you know what, we trust our cloud provider enough to be our key holder and we're going to delegate this to them. So there's ways of automating this, but out of the box, the sort of Shamir splitting approach is the default. So, sort of a high-level summary: what we really set out to solve with Vault is the secret sprawl problem. It's really looking at saying, how do we get away from this decentralization, this sort of sprawl of definition where we have low access control, low visibility, and bring it all into one place, where we can have a uniform policy, a uniform way of auditing, a uniform way of authorizing everything. And in doing so, we're really thinking about two different types of threats. The bigger, more real one is insider threat. 
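Shamir's scheme, as named above, can be sketched in a few dozen lines. This is a teaching toy, not production cryptography (Vault uses a hardened implementation; here the secret is just an integer in a prime field, and `split`/`combine` are names chosen for this sketch):

```python
# Sketch of Shamir secret sharing: split a secret into n shares so that
# any k of them reconstruct it, while fewer than k reveal nothing.
# Educational only; not production crypto.
import random

P = 2**127 - 1  # prime modulus; the secret must be an int < P

def split(secret, n=5, k=3):
    # Random polynomial of degree k-1 with the secret as constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation at x
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def combine(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

master_key = 0xDEADBEEF
shares = split(master_key, n=5, k=3)      # one share per key officer
assert combine(shares[:3]) == master_key   # any 3 officers can unseal
assert combine([shares[0], shares[2], shares[4]]) == master_key
```

Any three of the five shares reconstruct the key, matching the 3-of-5 default described above; two shares give an attacker nothing, which is what makes the split safe to distribute among officers.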
It's, you know, folks that should have some access to the system. You know, it's people who are inside or are going to have access to our private GitHub, and all of our keys and passwords are in there, right? And so how do we think about that? That's really about imposing things like ACLs and secret sharing and sort of isolation of the secrets. And then there's the external threat problem. These are people who should have no access to the system at all, and that's really where having a strong crypto system comes into play. And lastly, we really want to do this in a way that applies security principles, but without making this painful, right? Like keeping to that practicality. So for folks that are familiar with Vault and maybe started using it earlier, a feature that we, you know, recently announced in Vault 0.9, which was only a few weeks ago, is the stronger notion of identity. And so at its core, what this is really about is saying, you know what, there are actually entities who are going to use the system and interact with the system in a sort of longer-term, persistent way. Like Armon as a user is going to interact with this on a daily, weekly, monthly basis. And so there's kind of different things you can do, different ways of thinking about it, when you actually model these entities as a first-class thing. So an entity could be a single person, it could be a single system, it could be a single application. It's sort of, you know, a bit of a flexible concept in that sense. And what it lets us do is assign policies and metadata to an entity. So we can say, hey, Armon actually has this other metadata I'm going to annotate him with, because I want visibility on that in audit logs, or I want to assign, you know, special or maybe more restricted privileges to this entity. And then what Vault lets us do is map this to multiple different aliases. 
So I might have one persistent entity, which is Armon, but I might exist in many different ways: my GitHub username and my, you know, LDAP sign-on, and maybe I have an X.509 cert that's assigned to me. But all these things still represent the same entity. So whether I'm using a smart card that represents me or I'm typing in my LDAP password, it's the same person. So I should have potentially the same level of access without having to deal with me in three or four different ways. What this lets us do, once we start modeling in this more sort of first-class way, is start grouping people together. So entities can be placed into arbitrary group hierarchies. So then you can start building sort of a group notion in a much richer way, right? And so, you know, one question is like, well, why deal with this in Vault if I have a notion of groups in something like LDAP? The challenge becomes, how do you start bridging all of this when you have one set of apps where your source of truth, your sort of set of trust, is Amazon. You're using Amazon as a trusted third party to say, hey, this is a web server, and you have users who are coming in via LDAP, and you have a separate set of applications that are authenticating against your Kubernetes cluster in GKE. There's no longer a single identity provider, right? There's no single source of truth of who's in my web server group, right? And, you know, in my Kubernetes cluster, being in my web server group means I'm a pod with a particular tag, in Amazon it might mean I have a certain tag on my VM or I'm a certain AMI, and in LDAP it means I'm in the web dev group. And so what this really starts to let us do is sort of take a step back from the platform-specific nature of our identity and map all of these things into sort of a more cohesive shared set of identity. 
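The entity-and-alias model just described can be sketched as a small lookup structure. The `IdentityStore` class, its fields, and the example alias names are invented for this illustration; they are not Vault's identity API.

```python
# Toy sketch of Vault 0.9-style identity: one entity, many aliases
# (GitHub login, LDAP DN, X.509 cert subject), plus group membership,
# so policy attaches once to the entity instead of once per auth method.

class IdentityStore:
    def __init__(self):
        self.entities = {}   # name -> {"policies": set, "groups": set}
        self.aliases = {}    # (auth_backend, external_id) -> entity name

    def create_entity(self, name, policies=(), groups=()):
        self.entities[name] = {"policies": set(policies),
                               "groups": set(groups)}

    def add_alias(self, backend, external_id, entity):
        self.aliases[(backend, external_id)] = entity

    def login(self, backend, external_id):
        # However you authenticate, you resolve to the same entity
        # and therefore to the same set of policies.
        entity = self.aliases[(backend, external_id)]
        return entity, self.entities[entity]["policies"]

ids = IdentityStore()
ids.create_entity("armon", policies={"web-dev"}, groups={"web-servers"})
ids.add_alias("github", "armon", "armon")
ids.add_alias("ldap", "cn=armon,ou=eng", "armon")
ids.add_alias("cert", "CN=armon", "armon")

# Three different auth paths, one identity, one authorization answer.
assert ids.login("github", "armon") == ids.login("ldap", "cn=armon,ou=eng")
```

The point of the indirection is exactly what the talk describes: policy and audit metadata hang off the entity, so adding a new auth backend means adding one alias, not re-stating the authorization three or four times.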
So I can just go in and say, you know what, my whole web dev group has one set of authorization, and it doesn't matter whether they're authenticating against my Kubernetes cluster or as an Amazon AMI or they're coming in via LDAP. And so that's what it's really about: allowing it to be much simpler to have this consistent way of doing it so you're not sort of repeating the work three or four times. I guess I should have just gone to this slide. But yes, the basic idea is bridging across multiple different sources of identity as we're sort of losing this single notion of truth. So moving on a little bit now is the Kubernetes auth backend. We released this one with Vault 0.8.3, and we worked pretty closely with the folks at Google and other members of the community who have much more expertise in the platform than we do. And so the basic idea at the very, very highest level is how do we bind something in Kubernetes that's going to provide a sense of identity for our application, our pod, to something that Vault can actually verify in a sort of cryptographically trustable way, so that we can assert strongly the identity of the other end, basically the caller. And so the mechanism that Vault really relies on here is the notion of a service account. And so what we're doing is binding roles within Vault to roles within Kubernetes using these shared service accounts. The nice thing about doing this is that we have this cryptographic property we can lean on with the JWTs to actually make sure that, hey, is this just a fabricated thing the pod has come up with, or is this actually officially signed by the platform that we're trusting? And the nice thing is, because these are built into Kubernetes already, there's no additional moving pieces. So for folks who have been kind of following the Vault and Kubernetes, you know, the evolution of the integration story, for a while there were a few different approaches, but they required using different service brokers. 
So there were a few different moving pieces that you had to coordinate, and as different APIs would break on the two sides, you were getting the paper cuts of that. So the nice thing here is it's a pure native integration in Vault, it's a pure native integration in Kubernetes, and there's no additional moving pieces. So just to flow through what it actually looks like as the bits flow: in this sort of starting state, we have Vault running somewhere, maybe within Kubernetes, maybe outside of Kubernetes, it doesn't matter for this example, and then we have a pod that's been scheduled somewhere with a service account token. So the first thing that happens is when our container boots, it's reading the service account's JWT. This might be the app, or it might be a sidecar that's doing some of this lifting on behalf of the app, but we read the JWT, then we send it over to Vault and say, here's a thing that should prove that I am who I say I am, I'd like to have access to Vault now. Then what Vault does is verify that this JWT is actually cryptographically signed and valid, that this thing is who it claims to be, and then we go even further and call into the Kubernetes TokenReview API and say, can you also verify this and return any additional information you might have about this entity. And then finally, once we've checked it out and everything is verified and looks good to go, then we return the token back to the container. And at this point the container is holding a Vault token. So the nice thing about this integration, as opposed to a few others that were sort of considered, is this means the application can make use of any of Vault's capabilities. 
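The flow just walked through can be sketched end to end in miniature. This is heavily hedged: real service account tokens are JWTs signed with the cluster's key pair and real Vault also calls the TokenReview API, whereas here a shared HMAC key stands in for the signature so the sketch is self-contained, and every name (`CLUSTER_KEY`, `ROLES`, the function names) is invented for illustration.

```python
# Toy sketch of the Kubernetes auth flow: the pod presents its service
# account token, "Vault" checks the signature and the service-account /
# namespace binding for the role, then issues a Vault token.
import base64
import hashlib
import hmac
import json
import secrets

CLUSTER_KEY = b"k8s-signing-key"   # stands in for the cluster's key pair

def issue_sa_token(namespace, service_account):
    # What the platform does: sign claims about the workload.
    claims = json.dumps({"ns": namespace, "sa": service_account}).encode()
    sig = hmac.new(CLUSTER_KEY, claims, hashlib.sha256).hexdigest()
    return base64.b64encode(claims).decode() + "." + sig

ROLES = {  # role -> (bound service account, bound namespace, policies)
    "demo": ("vault", "default", ["default", "kube-auth"]),
}

def vault_k8s_login(role, sa_token):
    claims_b64, sig = sa_token.rsplit(".", 1)
    claims = base64.b64decode(claims_b64)
    # 1. Is this actually signed by the platform we trust, or fabricated?
    expect = hmac.new(CLUSTER_KEY, claims, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expect):
        raise PermissionError("token not signed by cluster")
    c = json.loads(claims)
    sa, ns, policies = ROLES[role]
    # 2. Check the role binding: right service account, right namespace.
    if (c["sa"], c["ns"]) != (sa, ns):
        raise PermissionError("service account not bound to role")
    # 3. Hand back a Vault token carrying the role's policies.
    return {"client_token": secrets.token_hex(12), "policies": policies}

token = issue_sa_token("default", "vault")
login = vault_k8s_login("demo", token)
assert "kube-auth" in login["policies"]
```

The key property the sketch preserves is that the pod cannot fabricate its own identity: only tokens signed by the trusted platform, bound to the expected service account and namespace, resolve to a Vault token.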
So not just thinking about Vault as sort of a backend for Kubernetes secrets where we're going to just treat it as sort of a bit locker; we can actually leverage dynamic secrets, cryptographic offload, dynamic TLS certificates, the whole sort of gamut of Vault is available because we have an actual Vault token. In terms of how we actually configure this thing, it's relatively straightforward. So the first thing we do in Vault land is enable the Kubernetes backend. So everything in Vault is sort of off by default, so you have to sort of opt in. So we opt in and say, I'm going to set up a Kubernetes backend as a system that I trust. And then the first thing we have to do is configure it and say, where is the Kubernetes cluster? What are the CA certificates? How do we establish trust of the platform that we want to authenticate against? So this is sort of a one-time configuration. We set up the sort of linkage between Vault and our cluster. Then what we do is, for each role that we care about, and this might be on sort of a per-service level, we define what that mapping is between something we trust in Kubernetes and the policy definitions within Vault. So here we're just writing a demo role where we say, okay, if you see anything that has the vault service account in the default namespace, map that into Vault's kube-auth policy. So anything that logs in with that service account will get whatever is granted to it by the kube-auth policy within Vault. And then on the side of the client, let's say now we've actually scheduled something that has access to that JWT. It boots up, grabs its JWT, and it's effectively issuing the API equivalent of this call, probably not using the CLI, providing the JWT and saying, hey, I'm logging in as this demo role. And so once Vault verifies it, you know, out comes the token. So here we'll see sort of the standard login where we're getting a token back as well as the policy that's associated. 
In this case, we can see the default as well as the kube-auth policies are associated with this token, and some amount of metadata. So the metadata just helps us within sort of the audit trail understand what's going on, how does this thing tie back to the Kubernetes system. So briefly, just to close out: you know, when we think about Vault, there's these three primary things we talk about. The first one is really secret management and how we thread through these credentials, especially in an automated way, right? What we don't want to do is boot a machine and then file a ticket against our security team and say, by the way, this machine is running somewhere, you know, please go get a database password and put it on that machine, right? We'd like to be able to do end-to-end automation and bring our secrets along with us. The second use case that we didn't talk about at all is encryption as a service, which really is about how do we push the key management out of our applications and up to Vault, and let Vault handle key versioning, key rotation, key decommissioning, and then just provide high-level APIs to say encrypt, decrypt, sign, verify, HMAC, the whole nine yards. And the final bit is, great, we've centralized all this, we've, you know, defeated secret sprawl, everything is in one place. How do our human operators access everything, right? We don't want them to interface just with, you know, an API that's sort of painful. So instead, really thinking about identity and access control and sort of user management as a first-class challenge within Vault. And so within that, we sort of talked on the secret management bits, we talked about the Kubernetes integration, but there's lots and lots we didn't cover. So if this was interesting, please check out either Vault's website or, you know, search YouTube, there's a bunch of other talks on this, or check out the HashiCorp blog. Thank you so much. 
And, you know, quick: there are some stickers up in the front. So if you use any of the Hashi tools, Vault included, please help yourself. Thanks so much.