Here we go. I'm assuming everybody in this room can answer this question, or at least feels pretty comfortable that they know what Vault is. Show of hands: are you using Vault? Do you know what Vault is? Some people who are going to be watching this maybe don't, so we're going to do it quick.

These are the three main functions of Vault. When we talk about Vault, we're talking about either a secure key/value store, a secret generation tool that's able to generate dynamic secrets from a bunch of different sources, or encryption as a service through the transit back end. At Zipcar we're primarily using the secure key/value store, and we're sometimes doing dynamic secret generation. We never use the transit back end. So when I start talking about using Vault at scale and in multiple data centers, none of that information is about transit back end usage. If you are using the transit back end, maybe there's another talk that's good for that.

So this is the network diagram of a basic Vault interaction. An application authenticates to Vault in some nebulous way and uses the secrets it gets to communicate with services that need authentication. We use this model quite a lot too. You have a CI/CD setup that needs to read some secrets from Vault. Awesome: it builds up an app manifest, a config file, something like that, and your application can run and communicate with services. Maybe you've got a scheduled environment, some kind of scheduler running a lot of containers. Same deal, right? The scheduler authenticates to Vault in some nebulous way, reads secure keys, and gets the values it needs to do its work. When you generate secrets, same deal: some way to authenticate to Vault, whatever you pick, and Vault can also reach out to some service on the app's behalf, generate a secret, and give it back to the app.
The nice thing about that is that you can now manage that application's access through Vault. You can revoke its access, you can audit it, see what's going on, how often it's requesting access. Super useful and cool. Vault has tons of secret back ends. I couldn't fit them all on a slide, I don't know how to do two columns, I'm sorry.

But Vault's also for people, right? And if you have a lot of environments (we have a lot more than three), it gets kind of wild: you're a person, you're an operator, and you need to bounce around between multiple Vaults. At Zipcar, we have a global Vault that stores high-level infrastructure things, compiled versions of manifests, stuff like that. And then inside each environment, we have an environment-specific Vault that holds the secrets for just that environment. Maybe certain things from global end up in there as well, like an API key that's common across the board, so that applications do not have to contact the global Vault. The global Vault is for people and the scripts that people run. And that's a pain. So does anybody have multiple environments with multiple Vaults? If you don't and you're already using Vault, you will. Sooner or later, someone is going to say, let's do a separate Vault for that.

If you need to bounce between Vaults, safe is an incredible CLI tool to do that. It's from Stark & Wayne; my good friend Jay Kron is a big contributor to it. It has a safe target command, which is going to capture your authentication information, your tokens for each Vault. That's not really any different from what the Vault CLI does when you authenticate: it writes a .vault-token file in your home directory, so there's no real additional security concern there. And it allows pass-through of plain vault commands, which is super useful for anything safe doesn't support. It's not a one-to-one match with the API. They're not trying to replace the Vault CLI. They're trying to give you helpers.
The Vault CLI can do things like list paths, which is useful, but it's not recursive. There aren't a lot of recursive tools in the CLI, and that's by design. safe, on the other hand, does have those things. It lets you visualize paths and trees. It lets you do recursive removals and know you've got everything, which is awesome. And there are lots of other helpful wrappers that we're going to discuss shortly.

This is what targets looks like. A list of Vaults. You pick one. You switch. It tells you which one you're on. Great. And this is what the pass-through looks like. That's just a regular vault token lookup command. You can see in this particular case, this person was authenticated with a root token, and you get information about that. This could be any vault command: safe vault, then whatever you would run normally if you were running the Vault CLI.

This is what some of that recursive tooling looks like. The nice thing specifically about safe paths is that if you run just vault list, you don't get the full secret path, and that means if you're writing automation scripts, you need to build up the whole path to that secret again, which is a little more error prone. Whereas in this case, you can dump a whole bunch of paths, grep out the one that you want, and get the whole path to the secret, which is cool. Tree can also be useful if you're trying to troubleshoot a problem and you want to see what the state of your Vault is.

The way that we bounce between all of our Vaults is we have a single infrastructure repo for all the deployments that we manage, and that contains operator scripts, maintenance scripts, and metadata for each environment. Each environment has a folder, and there's a bunch of metadata in there. We use BOSH. Anybody in the Cloud Foundry, BOSH community? Cool. So we're bouncing between environments, compiling manifests to deploy infrastructure, and that's where those scripts live.
And those metadata files that the scripts work with are the source of truth for what your infrastructure should look like. This is a quick bash example. You can imagine the Vault address being fetched from some metadata file instead of being hard coded like it is here. One thing to note: this script is going to switch between Vaults, and it doesn't require every operator who might use Vault to use the exact same alias name. Because that's string magic, not fun. The URL, however, really can't change. If you're talking to the right Vault, it's at that URL. There are exceptions, but I don't want to get everybody scared now. I'm not going to talk about that. It's a different talk.

You will notice that safe's informational output (safe targets, for example, prints that kind of thing) all goes to standard error. So you'll see this script redirecting standard error to standard out. For those of you familiar with bash, that's that wacky 2>&1 thing. Then you just grep for your address and use awk to get the proper alias. And if it's not there, you can tell the operator, hey, there's no target for this address. So that's how we safely bounce around between our Vaults in automation scripts. If you have multiple Vaults, or you think you might, you should try out safe. And you should write custom scripts like the ones you've seen.

Now, that's how Vault works for people. I want to dive into the main portion of this talk, which is: what do I mean by scale? What am I talking about when I say we're managing a lot of Vaults? Zipcar has a lot of users. We're in 10 countries. We have 25 production environments. Some of those are multi-tenant white labels, not that it matters. And we have over 6,000 container instances running across all of our environments right now. We have 12 HA Vault deployments, which is just part of a larger service landscape at Zipcar.
And our engineers are shipping code to production all day, every day. What I get out of this slide is that Vault has to be up. If you're using Vault in any of those network diagrams we talked about before, Vault being down is a huge problem, because so many pieces of infrastructure and so many people are relying on it. Sure, the apps that are already running may continue to run, depending on how you set it up, if Vault is down for some time. But all of the reasons you're using schedulers, all of the reasons you have self-healing infrastructure: if your Vault service is down, you no longer have that. You just have old, broken infrastructure that makes everyone sad. So we want to make sure that we can deal with that kind of thing.

These are the things I think about when I think about scale, and I don't think this list is specific to Vault. This is for anything that you want to scale. You have to understand how it's going to behave under load. You have to be able to define what your load is going to be. You need to be able to meet and measure your SLA requirements. This isn't necessarily something where a lawyer is going to come and make you sign it or make your manager sign it. It might just be, as I said, that your colleagues are depending on your infrastructure to be up to do their work. Maybe you've got multiple environments, multiple data centers. You certainly should be confident in how the application is supposed to work, so you can spot the anomalies. And I don't think you can effectively do scale without good tooling, because you can't rapidly iterate. Every mistake is that much worse. It's fail early, fail often, right? But if you can't iterate to recover from that, it just ends up being, again, broken and making everyone sad.

So how does Vault scale? And remember, this is how we're using Vault. Not the transit back end. Mostly secure key/values. Vault's pretty chill.
It's not doing a lot. We see maybe 89 megahertz of average load, compute-wise, on Vault. On most Vault servers, we've got four gigabytes of active memory. You do need a lot of network performance. Vault's not transmitting huge packets or anything like that, but it certainly is going to have a lot of concurrent connections, especially as your infrastructure grows.

So, that's the resource consumption. What should we monitor? Everything. Yes, everything. You have some agent-based setup that's sending full resource consumption information. You've got logs. You have the audit back end to be able to see what's going on. And this is a DevOps conference, right? So if you're not sure what I mean by monitoring everything, I know for a fact there are lots of people around the room who want to sell you something. That way you will know when your single point of failure is down.

Is anybody here dealing with compliance issues, stuff like that? OK, yeah. So you read that last slide like this, right? Like there's nothing else on the slide, just this in big bold: a lawyer is going to be so mad and I'm going to get fired. When somebody asks you about this, that's not a thing. You can't do that. You've got this option too: you could be the person who says, I don't know. That's not you. Don't do that. So here's what I say: what kind of down are we talking about?

These are the four main ways that I have found Vault can go down. My favorite: not down. Best one. Sealed. It can be bricked. It can be gone: your server fell out of the sky, somebody deleted the whole thing. It's not good.

The first place to go: the health check endpoint. If you haven't used this, oh, it's so good. You can pass so many parameters to this so that, whatever health checking thing you are using, you get the status code you need.
Every state of Vault is enumerated, and for each one you can pass a parameter that says, send this status. If you've got just basic monitoring that asks, is it 200? Great, it's 200, then that second option is what you want: standbyok=true. Because a standby in HA mode (and we'll get to that in a little bit, so if you don't know what I'm talking about yet, hopefully you will by the end of this talk) is OK, right? It's unsealed. It can forward requests. It's fine. So you can say standbyok=true. If you don't want to do that and your monitoring is fancy: a standby that is unsealed and ready to work, but just chilling right now, returns a 429 status. So if you can configure your check to accept 429, then you're probably going to be fine.

Maybe it's not down. Maybe it just seems like it's down. The person who told you it was down can't get at the thing they want, because they're not connected to the VPN, or your buddy changed the firewall rules, and that was sad. My favorite thing to do is talk to them and look at it. Ask them how they're authenticating to Vault. You can look at metadata for current leases; we'll talk about that in a little bit. vault token lookup is a great thing. It'll list what policies that user has. Maybe they're just trying to access something they're not allowed to access. Tokens also have accessors. So if you don't know exactly what token they're using, or they don't want to give it to you, they can give you the accessor instead. It is a non-secret reference value to a token. All you can do with an accessor is look up the token's metadata or revoke it. So look it up first, just to see. You don't want them telling you it's down and then you go and revoke their token.

If it's sealed, you have audit logging, and you can go figure out why it was sealed. Maybe it was an accident. Maybe it's actually a security response, and you want Vault to stay sealed. Otherwise, just unseal it. This is another easy one. It makes you feel good. Like, I can fix this.
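To make the health check options above concrete, here is a minimal sketch in Python. It assumes the endpoint lives at /v1/sys/health and that the default status codes are 200 for an active node, 429 for an unsealed standby, 501 for an uninitialized Vault, and 503 for a sealed one. The function names and the example address are made up for illustration, and nothing here performs a real HTTP request.

```python
from urllib.parse import urlencode

# Default status codes of Vault's health endpoint (Enterprise-only codes omitted).
HEALTH_STATES = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    501: "not initialized",
    503: "sealed",
}


def health_url(vault_addr, standby_ok=False):
    """Build the health check URL for a Vault node (hypothetical helper).

    With standby_ok=True the query string asks Vault to answer a standby
    with the same code as the active node, for monitors that only accept 200.
    """
    url = vault_addr.rstrip("/") + "/v1/sys/health"
    if standby_ok:
        url += "?" + urlencode({"standbyok": "true"})
    return url


def is_up(status_code, accept_standby=False):
    """Decide whether a node counts as up for a monitor.

    A fancier monitor can leave the URL alone and accept 429 itself.
    """
    if status_code == 200:
        return True
    return accept_standby and status_code == 429


def describe(status_code):
    """Translate a status code into a human-readable state."""
    return HEALTH_STATES.get(status_code, "unknown state")
```

A basic monitor would poll health_url(addr, standby_ok=True) and alert on anything other than 200. A configurable one could poll the plain URL and call is_up(code, accept_standby=True) instead.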
You probably have your unseal keys stored somewhere secure. And if you don't, we'll talk about that in a minute. When you initialize a Vault, and any of y'all who have a Vault have seen this, it's going to spit out five keys for you, and you have to give three of them back to unseal. That's Vault using Shamir's secret sharing. Shamir is the S in RSA. It's an awesome way to make sure that no single developer, no single team can totally own your Vault server. But we have a lot of ways to manage access to information in the enterprise, right? We can use our password managers to set up teams, and we can see what people are accessing. So you might not need or use Shamir's secret sharing. And if that's the case, I suggest you rekey with a single key and make your life easier. Don't paste a bunch of keys in there, right? Just rekey with a single key. Rekeying is a whole other thing; the documentation is pretty good about it. The idea of the split is to spread the keys around. If you group the keys together and put them in one place, you already have a single key. So don't get scared just because it's encryption.

If it's bricked, you have more questions than anything. This is where monitoring everything comes into play. Why is it bricked? Is it taking a long time? Does Vault need more resources? Is it under some kind of attack? Give it what it wants. Sometimes, though, you don't know how to do that. And I have great news, because there is a new on-call support number for HashiCorp. I believe it rings directly to Mitchell's phone. It's a very easy number to remember. You just pick up the phone and you dial 0118 999 881 999 119 725... Don't leave me hanging. 3. Yes!

Your Vault's gone. It's terrible. Mitchell's not answering. It's fine. We'll turn it around. We get scared because it's encryption. It's a single point of failure. Everybody needs it. It has to be up.
I can't shut it down. You can shut it down. Maybe. There are three questions you need to answer for yourself and your team. And this is the main theme now, so if you've got notebooks, this is the time. How reliably can we boot the right version of the binary in the right place? How reliably can we reproduce the right configuration in the right place? And how reliably can we restore the back-end data storage? Vault is a totally stateless app. We do that kind of thing all the time. HA: this is the most complicated Vault network diagram available. I checked. It's no big deal. You have to answer these questions for yourself and your team, and iterate on getting the answers better. Look at where you're at today. How long is it going to take you to get that thing back up?

Just like you were rolling out an application, right? Use a consistent version. Vault is so friendly. It tries not to break compatibility. But there are certain upgrades that do. So it's best, if you're in a disaster scenario, to not have more unknowns, right? If you know you're going to spin up the same version, you're already ahead of the game.

Templatize your config. I'm legally required to talk about infrastructure as code at this conference, so here's that slide. It does not matter what you use. It really doesn't. If you can recreate your config in a reliable, safe way that people on your team understand, you win.

This is the number one thing. Somebody is going to adopt Vault and they say, hey, you like Vault a lot. And I say, I do. And then they say, what would you want me to know about Vault? And I would say, pick a back end you already know. Vault supports over 18 back ends. It's crazy. These are the ones that support HA mode, which you may or may not need. Some of those are cluster technologies that are kind of tricky to manage if you're not already doing it. Don't learn Vault and Consul. Don't learn Vault and etcd.
If you don't know how to restore etcd, don't take something that's a single point of failure and have it backed by etcd. Use something that you feel comfortable with. Why live rough? Live life smooth. And you might say, well, I'm already stuck, because I have Vault today. It doesn't have to be complicated. We can migrate our back end. It's going to be so easy.

I've got three more questions for you. I have a lot more, but this slide has three more questions for you. Do you have a lot of dynamic secrets in your config, and do you know how to reconfigure them? Well, wait. You want to be able to reconfigure. Do you have a lot of auth options configured, or are you not scared of reconfiguring them? Do you have limited access to Vault? Are you not a root user? Because if you're not a root user, don't try to migrate Vault, because there's going to be stuff that you leave behind and it's going to be sad. Make sure you're a root user before you do what I'm about to show you.

You can use safe. I have a love-hate relationship with this first command. safe export goes into your key/value store, decrypts everything, and dumps it into a file. Oh, so awesome. But also, oh my God, what? No. You're going to be a root user and you're going to put everything in a file? Oh, it's bad. But it's a trade-off: you just do that once, and then you delete that file. That's winning again. Don't be scared. That's it. You're done. Reconfigure your auth back ends. Reconfigure your dynamic secrets.

We had a few Vaults that we were worried about, and we wanted to move them off of one back end and onto another. And we remembered that the data in Vault is just like any other data. It's encrypted, it's garbled, it's nonsense. That kind of works to our benefit, because you can't effectively query garbled, nonsense data, so the schema that Vault uses in all of its back ends is really, really easy to understand. And you move the entire back end, everything, as is.
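Because the stored values are opaque encrypted blobs, moving a back end mostly means reshaping keys. As a rough sketch in Python, assume the source store keeps each entry under one flat key such as a/b/c, and the destination wants the parent path and the leaf name in separate fields. The field names Path, Key, and Value, and both function names, are illustrative assumptions, not the schema of any particular Vault back end.

```python
def split_entry(full_key, value):
    """Turn one flat "a/b/c" entry into a record with the parent path in its own field.

    The value is copied untouched: it is an encrypted blob, so there is
    nothing to interpret or transform.
    """
    parent, _, leaf = full_key.rpartition("/")
    return {"Path": parent, "Key": leaf, "Value": value}


def convert(flat_entries):
    """Convert a whole flat export (dict of key -> blob) into a list of records."""
    return [split_entry(key, blob) for key, blob in sorted(flat_entries.items())]


if __name__ == "__main__":
    # Hypothetical export with made-up keys and placeholder blobs.
    export = {
        "logical/app/db": "blob-1",
        "sys/token/id": "blob-2",
    }
    for record in convert(export):
        print(record)
```

The point of the sketch is that the conversion never touches the values, which is why unseal keys, tokens, and leases would keep working after the move.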
Then you know that everything you had works. Users with active leases: the leases still work. Your same unseal keys still work. Your root token still works. What that comes down to is you take the data you get and you run it through a loop. Boom, that's a one-liner: Consul KV export, then the DynamoDB import. One line of jq. It may have taken me two days, but I understand that's what makes me a senior engineer: I can take two days until I've got it. You see what I'm saying, right? There's not a lot going on here. One back end puts it at the end, and the other one has a little field for it. jq can do that.

OK, a few gotchas. They burned me. Maybe they have burned you. If you feel brave, and I hope it feels safe here, as I'm talking about these just raise your hand if you've been burned by one.

Vault HA is not horizontally scalable. Unless you're paying for Vault Enterprise, Vault HA mode is leader/follower, so if one gets sealed the next one can take the requests and you're all good. But if the reason you're getting sealed is that you're getting DDoSed, and your DNS and your load balancer point everything at the other Vault server, that one just gets sealed too. It also means that when you think about sizing an HA Vault setup, you really need to have each node large enough to handle all the traffic. Each node needs to be able to do everything that the other node can do. That's what it looks like. In reality, there is a connection between the secondary and the data store, but it's only in the beginning, just to say, hey, I'm here.

I'm quoting this from the documentation. I don't know who wrote it. I wasn't going to git blame, but it's a good quote: everything in Vault has a lease. So secrets, tokens, there's no exception. The reason I'm telling you this will become apparent in a minute. Remember I said we had 6,000 containers?
When users or machines authenticate to Vault with something other than a token, Vault creates and returns a token that it expects to be reused until it expires. And the default lease time in Vault, if you don't configure it, is about a month (768 hours). If you don't reuse those tokens, Vault doesn't know that you're not going to reuse them. It's going to keep them and persist the leases. And then you're going to do a DR test one day, maybe, and it's going to take 10 minutes to unseal your Vault, because it has to read 600,000 leases into memory. Fortunately, you can use the Vault API to look at all the leases that are open and see their accessors. Remember I mentioned that with an accessor you can look at the metadata, so you can try to find the ones that are not used anymore. Then fix the thing that is authenticating. In our case it was a scheduler. The scheduler was re-authenticating every time. And you're kind of used to that from other auth systems, right? The only actual way to authenticate to Vault is with a token. They provide auth services that you can hook up to make it easier for us to log in, and to make it easier to do authorization on those paths. But in reality, all Vault wants is a token.

Speaking of paths. The API and the ACLs that you configure in Vault give the appearance of a hierarchy. But that hierarchy is not real. They're just strings, and every key/value store has to implement this abstraction. This is something that was not obvious to me. When you sit down at your nice desk and you think, I wonder how Vault works, you might come up with the fact that that's how it works: yes, of course, it's an abstraction and every backend must implement it. That was not obvious to me.
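The idea that the hierarchy is only an appearance can be sketched with a toy store in Python. This is not Vault's storage code: FlatStore and its methods are hypothetical names, and the only behavior borrowed from Vault is that a listing returns one level of names, with a trailing slash on entries that have children.

```python
class FlatStore:
    """Toy key/value store: keys are plain strings, folders do not exist."""

    def __init__(self):
        self._data = {}

    def put(self, path, value):
        self._data[path] = value

    def delete(self, path):
        self._data.pop(path, None)

    def list(self, prefix):
        """Emulate a directory listing by scanning keys that share a prefix."""
        if prefix and not prefix.endswith("/"):
            prefix += "/"
        children = set()
        for key in self._data:
            if key.startswith(prefix):
                rest = key[len(prefix):]
                # Keep only the next path segment; mark it as a folder
                # when more segments follow.
                head, sep, _ = rest.partition("/")
                children.add(head + sep)
        return sorted(children)


if __name__ == "__main__":
    store = FlatStore()
    store.put("secret/app/db/password", "blob-1")
    store.put("secret/app/api_key", "blob-2")
    print(store.list("secret/app"))
    store.delete("secret/app/db/password")
    print(store.list("secret/app"))
```

In this toy version a folder disappears as soon as its last key is deleted, because the folder was never stored. A back end that keeps explicit linkage entries for listing has to clean those up itself, and that cleanup is one more place where a bug can hide.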
And we actually ran into a bug in the DynamoDB backend where, if you deleted something that was deeply nested and there were no other secrets along that nested path, the cleanup function was removing all the linkage nodes all the way down to the base level. That means when you would go list what was in your Vault, it would say there's nothing in Vault. We found that out during a deploy one night, man. Someone was like, I think Vault is empty, and we looked at the commands and we were like, there's no way it could be empty. That's the right command. That's the right command. It was right after we moved to DynamoDB, my brilliant idea. I didn't feel good. And then we found out what happened: it was just the implementation of the DynamoDB backend. And because Vault is open source, we found the problem, we wrote a test for it, we tested it.

As you think about migrating back ends: Vault is very, very stable. Vault is solid software. And because you can go in and write tests and make it better, that makes it even better. So keep in mind that there are abstractions like that. It's not perfect software. The other thing is you should totally help out. All the back ends have to conform to one test suite and one interface. So there is a lot of testing that goes on, and for the most part everything is totally stable. So anything you pick can work great for you.

The last one. Yes, come on. I know you've done it. I know you've done it. vault write overwrites everything in the secret path. safe, however, does not function like this. I think the official Vault recommendation is that you should really only store a single thing at any given path. But that can be kind of complicated. So I recommend safe. Just make sure that you don't run safe vault write, because that's just vault write right there.
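The difference between overwriting a path and updating one key in it can be shown with a small Python sketch. A plain dict stands in for the key/value store, and write_replace and write_merge are hypothetical helper names that model the two behaviors described above; neither one calls Vault or safe.

```python
def write_replace(store, path, **fields):
    """Replace everything at the path with the given fields (overwrite semantics)."""
    store[path] = dict(fields)


def write_merge(store, path, **fields):
    """Update only the given fields and keep whatever else is at the path."""
    merged = dict(store.get(path, {}))
    merged.update(fields)
    store[path] = merged


if __name__ == "__main__":
    store = {}
    write_replace(store, "secret/app", username="svc", password="old")

    # Rotating one field with overwrite semantics silently drops the other.
    write_replace(store, "secret/app", password="new")
    print(store["secret/app"])

    # Starting over, the merge keeps the username in place.
    write_replace(store, "secret/app", username="svc", password="old")
    write_merge(store, "secret/app", password="new")
    print(store["secret/app"])
```

With overwrite semantics, a script that only means to rotate a password has to read the existing fields and write them all back, which is the step people forget.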
Anyway, I hope that we all know now how to run Vault at scale, and that you're ready to go and break infrastructure and migrate your back end by answering these three questions, the same way you would for anything else. How do you scale Vault? The same way you scale any other stateless application. So thank you so much. If you have any questions... we don't have time. No time. Bye-bye.