So, let's just get started. A quick intro about me: I'm fairly active in the Chef and broader DevOps world. I'm coderanger almost everywhere except Twitter, where the name was already taken, so there I'm kantrn. I work for Bloomberg on miscellaneous open source ecosystem stuff, but that's not really related to this talk. To lead off, this talk is going to be about secrets as they pertain to infrastructure. So I'm not talking about password managers for your laptop, and I'm not talking about web development. If you force me to give a one-word answer for each, I use 1Password and PBKDF2, but this talk is not about that. Feel free to escape, no hard feelings. So, what is a secret as it pertains to infrastructure? You could treat all private information as secret, like all user data, but that would get ungainly really, really quickly. So to keep us focused, we're going to look at three properties that define a secret as we'll talk about it today. The first is that it has to be small. Again, you could say everything in your database that relates to users is secret, but that's going to be really hard to manage, so instead we're going to talk about things like a passphrase or an encryption key. The actual secret part of that is very small. Next, it has to be radioactive. When you're logging into something, you have both a username and a password. We tell which one of those is the secret by which one other people are allowed to know. The secret part is the password; it's radioactive. If an attacker knew it, that would be bad. Compare that with a user ID: if it's a vendor platform, maybe you don't really want people knowing your user ID, but it's not radioactive in the same way that a leaked password or key would be. And third, it has to be required. Usually when we're talking about secrets, it's something the operation can't function without, so graceful degradation is going to be a secondary concern to everything else we're talking about.
There are four types of secrets we're going to use as guiding use cases. The first are passwords. Again, we're not talking about humans here, we're talking about machine-to-machine authentication, but usually when you're talking about passwords, it's systems that were originally designed for a human. For example, when you log into PostgreSQL or MySQL, you give it a username and password. That was originally designed for a human operator, but these days a lot of times it's going to be a web application framework or something like that. So it's server to server instead of human to server. In general, passwords are going to be very small, usually below 1 KB, and they'll be some sequence or string of ASCII bytes. Some other examples, like I said: SQL database passwords, HTTP proxy passwords, or Linux login passwords. In contrast to passwords, tokens are usually for APIs built from the ground up with the understanding that they're going to be used server to server, machine to machine. They also usually can't be hashed. With passwords, we can sometimes cheat and store some kind of irreversible hash of the password, so we're not storing it in plain text. Tokens usually have to actually be in plain text for them to function. Some examples include API credentials, like PagerDuty credentials, or OAuth refresh and access tokens for users. Keys are usually larger than passwords and tokens. They have some structure and semantics inside the file, so they have headers, they have newlines, stuff like that: TLS keys, SSH keys. But other than that, they're very similar. And then finally, there's a long tail of miscellaneous. Some of them, like say Kerberos machine tickets, look kind of like keys. We can manage them basically like we do other kinds of key material, except that we'll have to call kadmin in a couple of places, but it's mostly the same. Others, like records covered by HIPAA, the healthcare privacy law in the United States, are a different story.
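The "irreversible hash" trick for passwords only works on the side that checks a password, such as the database server, not on the client that has to present it. Here is a minimal sketch of that verifying side using PBKDF2 from Python's standard library; the iteration count and salt length are illustrative assumptions, not recommendations from this talk:

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Illustrative work factor (an assumption); tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Derive a key from the password; store (salt, key), never the password."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(candidate: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive a key from the candidate and compare it in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", candidate.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(derived, stored_key)


salt, stored_key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored_key))  # True
print(verify_password("wrong guess", salt, stored_key))  # False
```

Only the salt and derived key get stored, so a leaked copy of that table does not directly hand out the password. A token has no equivalent: the client must send the token itself, so it has to exist in plain text somewhere.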
Things like HIPAA records or PCI DSS audit records require totally custom tooling. With PCI DSS, whenever you take credit card information, you have to store audit records. Those all require dedicated tools. They are still secrets, but we're not really going to talk about them today. So now we know what types of secrets we're talking about. Next, we need to take their temperature. Hot secrets, or online secrets, are things used during the normal operation of an infrastructure. This means that a server has to be able to access and use that secret during normal operations without human intervention. For example, your web application needs to authenticate to a database, so it needs the database password. You could have a human operator sit there and type in the password every time a web request came in, but your application's not going to get a whole lot done. So we think of that as an online or hot secret. In comparison, cold secrets are things that we want to keep stored and safe, but they can be put behind a lot more walls: for example, AWS master passwords or revocation certificates. These are things we won't need very often, and when we do, it's going to be humans making the request. This hot versus cold dichotomy is rarely 100% clear in practice; most secrets and tools fall somewhere in the middle. For example, with a small web app cluster, when you spin up a new box, it takes human intervention to get the initial passwords and whatnot onto the box, but after that it's going to run autonomously. So it sort of starts cold and ends hot. With online or hot secrets, there's another sub-spectrum around how often they change. Most traditional online secrets management systems are built around slow secrets.
Once a slow secret is set, it usually only changes either because you had some kind of emergency, like a compromise, or because an industry standard like PCI DSS forces you to rotate your encryption keys. Rotating a slow secret is usually a human-initiated action, and it's usually not trivial enough that you want to do it very often. Take TLS keys: we all know they change, and we've all had to sit and renew TLS certificates plenty of times, but day to day, you think of them as relatively static. And how many people have been bitten by "oops, I forgot to renew a certificate"? Some newer platforms are bringing in the concept of fast secrets: secrets that change in hours or minutes instead of days, weeks, or months. For example, with OCSP stapling in TLS, the server periodically fetches a fresh, short-lived, signed OCSP response for its certificate, say every 15 minutes, and staples it to the handshake. Or there are Amazon EC2 role credentials, which rotate automatically every six hours. Rotating a secret usually invalidates all previous versions of it, which means that if an old version had leaked without being detected, the leaked copy is now useless. This does, however, usually require more explicit coordination between the secrets consumer and the secrets manager, because the consumer has to understand expiration timers and refresh mechanisms. All right, so those are the properties of secrets. Let's talk a little bit about the properties of secrets management systems. The principle of least access, or principle of least privilege, at least as it pertains to computer science, is generally attributed to Jerry Saltzer in a 1974 ACM paper. This is mostly common sense, but it is so often ignored that it bears strenuous repetition. In short, a service or tool should have access only to the secrets it requires and nothing else. The quality of every secrets management platform should be judged on two main points.
The first is the principle of least privilege, which we just saw, and the second is how much audit information the system records, so that when something goes wrong, and it will, you can sort out what happened. Other features are important, and they'll make or break a specific tool for your use case, but careful analysis of these two properties should always come first. All right, let's do it. Let's manage some secrets. Boom, database password managed, hard-coded right there in the application code. We're done, we can all go home, right? We've all done this. We all knew it was a bad idea, but maybe not why it was a bad idea. What we've done breaks both of our guiding principles. First, we've irrevocably tied access to the database password to access to the code. That means everywhere we clone this code, it gets the database password whether we like it or not. Second, we have very little access logging. If we're not using GitHub, maybe we can see who clones the repository, but at best we get the clone records. We have no idea who read this file, because that happens locally and is totally out of our sphere of control. So now we have a strong gut feeling, or maybe more than that, that we want to improve how we handle this. The next thing we have to figure out is what kinds of threats we want to protect against. Not every secret is going to be equally valuable, but whatever system or tools you use need to be strong enough to protect your most valuable secrets. Threat modeling is an examination of where attackers are most likely to strike and what a successful compromise of each particular attack surface would bring to the table. These are the eight major levels that I use. Again, I'm not talking about web application security, because that's a whole other talk, and I'm sure there are several of those at this conference.
So we're looking specifically at vulnerable points within infrastructure where an attack might have specific consequences. First up, brute force attacks. If you have a server on the internet, you see a nonstop parade of them. It's been happening for decades and shows no signs of slowing. On the plus side, because it's been happening so long, we have lots of tools and techniques to work around it. There are three R's that I use. First, always rate limit access: logins, API calls, whatever it is, if it's a thing that could be brute forced, rate limit it. Second, restrict access. Say we're again talking about a hypothetical web application cluster running in the cloud somewhere. That database server doesn't need to be on the internet, so don't put it on the internet. Then there are no brute force attempts, or at least way fewer. And third, rotate secrets. If you've got a relatively complex password, it could take, say, 10 years to brute force it. If the secret rotates every 15 minutes, that's not a problem. Beyond the three R's, you can also use technologies that are currently beyond brute forcing, like high bit count RSA keys or elliptic curve keys. However, remember that that's always going to be a moving target: someone could capture traffic or data now and possibly decrypt it in the future. All right, next attack surface: a code leak. This might be serious to the business, but we're not really concerned with that here. From the infrastructure and security point of view, code leaks shouldn't be a problem. We've all heard that security through obscurity is not really better than no security at all. So hopefully no one is hard-coding passwords all over their web apps, and this really shouldn't be a big deal. Next, backup leaks. There was an Instagram hack a couple of years ago that was one of these. They uploaded a backup file to S3, probably forgot about it, and never audited what was in that backup or what impact it could have.
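The first of the three R's, rate limiting, is commonly implemented as a token bucket. Here is a minimal in-process sketch, assuming one bucket per client address; the capacity, refill rate, and the `login_allowed` helper are hypothetical, and a real deployment would usually enforce this in a shared store or at a proxy rather than inside a single process:

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` attempts, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so behaviour can be tested deterministically
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed, but never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical per-source registry: 5 attempts per minute for each client address.
buckets = {}


def login_allowed(source: str) -> bool:
    if source not in buckets:
        buckets[source] = TokenBucket(capacity=5, rate=5 / 60)
    return buckets[source].allow()


# Deterministic demo with a fake clock.
fake_now = [0.0]
demo = TokenBucket(capacity=3, rate=1.0, clock=lambda: fake_now[0])
print([demo.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 1.0
print(demo.allow())  # True
```

The bucket tolerates a short burst of legitimate retries, but it caps a sustained guessing campaign at the refill rate, which is what makes brute forcing a reasonably strong password impractical.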
A backup leak is usually a superset of a code leak, because the code is probably going to be part of the backup files just because we're all lazy, but backups also include things like config files or database dumps. This gets a little more hairy than a code leak, because maybe you put your database password into that config file and forgot to exclude it from the backup. The best way to work around this is to constantly audit your backup system, understand what data is in there, and be very careful to always exclude things when necessary. You can also store secrets only in RAM, which usually makes it a lot harder to accidentally catch them in a backup. Next, traversal attacks. As far as infrastructure goes, this lumps together a whole bunch of different types of web application attacks, like directory traversal or SQL injection. From an infrastructure point of view, a traversal attack targets a secret that the application has legitimate access to but the user isn't supposed to have access to. The best defense here is to not give apps secrets they shouldn't have access to. Again, principle of least privilege: if the app doesn't have access to a secret, no amount of traversal attack is going to give an attacker access to that secret. Also practice good web application security: don't have SQL injection holes, all that good stuff. As an aside, a frequent traversal-style attack takes advantage of storing secrets in environment variables. Even Heroku's Twelve-Factor App manifesto goes so far as to call storing keys and tokens in environment variables a best practice. I am not a fan of this. A lot of debugging tools, things like Sentry, et cetera, automatically slurp up all environment variables and ship them off to a log server. So that database password or whatever that you were so careful to protect is now chilling in plain text in a log file on your error server somewhere.
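If secrets do end up in environment variables, one partial mitigation is to scrub anything secret-looking before the environment gets attached to an error report. Here is a minimal sketch; the name patterns are assumptions, and wiring it into your particular error tracker's reporting hook is left out:

```python
import os
import re

# Heuristic variable-name patterns (an assumption); extend them for your own naming.
SENSITIVE = re.compile(
    r"(PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY|PRIVATE_?KEY|CREDENTIAL)", re.IGNORECASE
)


def scrubbed_environ(environ=None) -> dict:
    """Return a copy of the environment that is safer to attach to an error report."""
    environ = os.environ if environ is None else environ
    return {
        name: ("[REDACTED]" if SENSITIVE.search(name) else value)
        for name, value in environ.items()
    }


print(scrubbed_environ({"DATABASE_PASSWORD": "hunter2", "HOME": "/home/app"}))
# {'DATABASE_PASSWORD': '[REDACTED]', 'HOME': '/home/app'}
```

Note what the heuristic misses: a `DATABASE_URL` such as `postgres://app:hunter2@db/app` passes straight through, because only the variable name is checked.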
You could store secrets in environment variables carefully if you were very, very precise about everything, but here be dragons. Once an attacker has code execution, we're on to the next level, and web app security no longer matters. Once code execution is happening, the only things we have left are structural protections. Against code execution, we want to use things like privilege dropping. If you have a key file, you can make it readable only by root. Your application starts as root, reads the key file into memory, and then drops privileges. If the attacker gets code execution, they still can't read that file. You can also use things like namespaces, chroots, containers, all that good stuff that limits the permissions of the application's runtime. When Dante passes through the gates of Hell, he notes the inscription: "Abandon all hope, ye who enter here." A root code execution vulnerability is approaching the worst-case scenario. Literally all we have left is structural protections. If there was a secret on the box, assume it is compromised. This is where being able to tell which secrets a box ever accessed gets very, very useful, because we want to rotate all of those very quickly. Many people ignore this attack surface as unlikely, but it can happen, so be careful. Another commonly ignored attack surface is laptop theft. Getting access to a developer workstation, especially at a lot of smaller companies, gives you root on every server. So that's the last slide times the number of servers you have. Fortunately, laptops are usually used by humans instead of other computers, so we can cheat a little bit. Humans are allowed to know a password that they don't tell anyone else, so you can use something like disk encryption to hopefully make this much less of a problem. And then finally, the higher power attack surface.
This is where a lot of people draw the line, either voluntarily or because their industry regulations don't allow telling the government to go to hell. Things like state-sponsored hacker groups, or advanced persistent threats, which is often code for China and North Korea, get increasingly difficult to handle. Here you'll have to ask how far you want to go, how far you can go, and what you can do to protect your systems and users. All right, so that's a whole bunch of theory. Let's talk about some actual tools to manage secrets. Starting from the top again: manually moving files around, basically what we did before. Sometimes you'll just put things into text files in the application's repo. Sometimes you'll have a repo called secrets, or sometimes you'll just scp files around; that's how a lot of people manage TLS keys. You get a file from the CA, you scp it to your three web servers, and you forget about it. But we already talked about why this is bad, so let's move on. The next step that a lot of people reach for is "I want to encrypt my data, because encryption makes it safer." git-crypt is the best of these, but there are a whole lot of tools that do Git encryption, and they all come with major downsides. We still have no real least privilege, because again we just have secrets sort of chilling in Git. Maybe some of the tools can get a little closer, but audit logs are really not going to be a thing, because everything is distributed, so the actual file reads happen on every individual server. Also, most of them are opt-in systems. You have to explicitly mark which files to encrypt, meaning that if you forget to mark a new file as encrypted, you will push it in plain text and may never realize, and then you get to experience the joy that is expunging files from your Git history. All right, before we move on, let's talk about different types of encryption. This is going to be just a quick primer. Symmetric encryption: on our workstation we have a secret.
We want to send this somewhere else safely, so we generate a random key. We use that key to make an encrypted blob. We somehow copy that key to a target server. We somehow copy the encrypted blob to the target server, and we use the key to decrypt the blob and get the original secret back. In contrast, asymmetric cryptography: we have a secret. We generate a public/private key pair on the server. We copy the public key up to our workstation, or retrieve it via some kind of service. We generate an encrypted blob using that public key. We copy it down to the machine. We use the private key from the key pair to decrypt the blob. What this means is that with symmetric key systems you have to distribute that single shared key to every target machine, whereas with asymmetric keys you generate a separate key pair for each machine. Just keep those roughly in mind. All right, cluster managers. This is going to be sort of one step past Git. There aren't really a lot of commonalities between them, but I lumped them together anyway: ZooKeeper, Consul, etcd. They all do have ACL systems, and you can use those to implement the principle of least access. Consul's and etcd's are okay. ZooKeeper ACLs, I'm not sure any person on the planet has ever implemented correctly, so I probably wouldn't try it, but if you really want to, go for it. They do have all of the tech in place to implement decent secrets management, but it's very, very difficult. Next, Chef encrypted data bags. I'm very active in the Chef world, so I see a lot of people trying this. It says "encrypted" on it, and just like we saw with Git, encryption makes us feel warm and fuzzy. But again, this is a symmetric system, which means we now have a new secret, the encryption key. All of a sudden we're trying to move that key around in the same way that we move all of our other secrets, and we have just moved the problem.
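The symmetric flow from the primer, and the reason it turns into turtles, fits in a few lines. This sketch uses a one-time pad (XOR with a random key exactly as long as the message) only because Python's standard library ships no block cipher; real tools use a vetted cipher from a proper cryptography library, and the variable names here are hypothetical:

```python
import secrets


def generate_key(length: int) -> bytes:
    # One-time pad key: random, exactly as long as the message, never reused.
    return secrets.token_bytes(length)


def xor_with_key(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same function encrypts and decrypts.
    if len(key) != len(data):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


# On the workstation: generate a key and turn the secret into an encrypted blob.
secret = b"db-password"
key = generate_key(len(secret))
blob = xor_with_key(secret, key)

# The blob can go anywhere, even somewhere public, but the target server can
# only recover the secret if `key` reaches it too: the key is the new secret.
recovered = xor_with_key(blob, key)
print(recovered == secret)  # True
```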
When I say "turtles all the way down", what I mean is that with most of these symmetric tools, and there will be a whole bunch of them, the key used to manage all of the other secrets is itself a secret. So you're not really solving the problem so much as going down a level of recursion. Ansible Vault is similar to Chef encrypted data bags, but it takes advantage of Ansible's push-based nature, so instead of needing the decryption key on all of the target machines, you only need it on the workstation. It's a little better, but it's still the same fundamental problem: how do you get that key onto the workstation? Hiera eYAML is the closest analog to encrypted data bags and Ansible Vault in the Puppet world. The difference is that Puppet does all of the decryption on the Puppet master instead of on the target servers. What that means is that the Puppet master can see all of the secrets for every machine, and we tell it to give each secret only to certain machines according to a policy. We call this a trusted third party system. The trusted third party, in this case the Puppet master, has access to the universe, and we are putting our faith in its internal access controls to only give things out according to the policy we've written. As far as I know, the Puppet master is fine and does this properly, but any time you're evaluating this kind of trusted third party system, check your faith in its access controls. Another Chef-specific tool is chef-vault, no relation to Ansible Vault. It takes advantage of the fact that Chef uses RSA key pairs for API authentication to build a key distribution system for the symmetric keys. It's still kind of turtles, though: the RSA key pairs used for Chef authentication are themselves generally not very well managed. You can do key rotation and key management on those, but very few people do. So if you're going to go this route, just be aware that you're probably going to have to write a whole bunch of extra tooling to do it safely.
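A trusted third party such as the Puppet master is also where the two judging criteria from earlier, least privilege and audit information, actually get enforced: it holds every secret and hands each one out only according to a policy. Here is a minimal in-memory sketch of that shape; the class, method names, and client IDs are hypothetical and do not mirror the Puppet master's or any other tool's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TrustedThirdParty:
    """Holds every secret in plaintext and releases each one only per its policy."""

    secrets: dict = field(default_factory=dict)  # secret name -> value
    policy: dict = field(default_factory=dict)  # secret name -> allowed client IDs
    audit_log: list = field(default_factory=list)  # (timestamp, client, name, granted)

    def put(self, name: str, value: str, allowed: set) -> None:
        self.secrets[name] = value
        self.policy[name] = set(allowed)

    def get(self, client: str, name: str) -> str:
        granted = client in self.policy.get(name, set())
        # Record every request, granted or not, so a compromise can be traced later.
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), client, name, granted))
        if not granted:
            raise PermissionError(f"{client} may not read {name}")
        return self.secrets[name]


ttp = TrustedThirdParty()
ttp.put("db_password", "hunter2", allowed={"web-b", "web-c"})
print(ttp.get("web-b", "db_password"))  # hunter2
try:
    ttp.get("web-a", "db_password")
except PermissionError as err:
    print(err)  # web-a may not read db_password
print([(client, name, ok) for _, client, name, ok in ttp.audit_log])
# [('web-b', 'db_password', True), ('web-a', 'db_password', False)]
```

Everything hinges on the `get` check being correct, which is exactly the "check your faith in its access controls" question for any real trusted third party.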
All right, another aside. I mentioned trusted third party systems before; those contrast with pre-encryption systems. A pre-encryption system looks kind of like the symmetric system we had before. We have a secret, and we have a key. We make an encrypted blob out of the secret. We copy the key down to only the machines that should have access. We copy the encrypted blob down; usually this involves putting it on some kind of storage system where everybody can download it if they want, but only the machines that have the key are able to decrypt it. So basically we're doing all of the access control work ahead of time, based on whom we send the key to. Once the key is on the machine, we can send it new encrypted blobs as often as we want, but we have to do this initial work ahead of time to get keys onto machines. Compare this with a trusted third party system: we've got four different key pairs and a secret. We send the secret over to the trusted third party system. Notice I never encrypted it. There's no encryption here. It's probably going to use TLS between all of the things, but at rest it's effectively unencrypted, and the trusted third party system has access to it in plain text all the time. But we attach a little policy saying "only send this to B and C", so only B and C can access it. So again, the trusted third party system effectively has access to all of your secrets in plain text, even if they're encrypted at rest, and we just give it a policy saying where each one should go. That leads into a couple more of these trusted third party systems. HashiCorp Vault is probably one of the newer things I'm mentioning, but it's already making waves in the secrets world. It's a dedicated secrets management platform, so it supports all of the features you'd expect.
Audit trails, granular ACLs, modular storage backends, and a built-in auto-rotation system for fast secrets. Slightly older but still very solid is Square's Keywhiz. It's got a more limited data model than HashiCorp Vault and some of the other things we've seen, but that also means it's battle-tested to a much higher degree. I'll talk about keywhiz-fs in a moment, but Keywhiz excels at managing key-type secrets and other whole files. It can be used for passwords and tokens, but that's not really what it's for. For people on AWS, a simple starting point is to make a private S3 bucket and use IAM policies. I've got some stuff online if you're interested in this, but if you are 100% AWS all the time, the simplest version is to use IAM roles and IAM policies to control access, along with S3 or DynamoDB for storage. Before I talk about the next couple of these, I need to mention Amazon KMS. Amazon KMS is not in itself a secrets management tool; it would be better described as a key escrow platform. All those times I said "generate a key and make an encrypted blob", it handles that part. You create a key, it lives inside KMS, and you can send data to it to be encrypted or decrypted, but it doesn't actually manage the storage. We can tie it in with some other tools, though, to get a more complete solution. Sneaker is a command line tool from Coda Hale. It uses KMS for encryption and decryption and S3 for storage. Again, just like with the S3 and IAM approach, this is all built around the AWS ecosystem, so you have to be willing to be tied to AWS forever. But if that's not a problem for you, and it's not a problem for a lot of people, this kind of stuff is cool. Confidant from Lyft is another solution built on top of KMS. In this case, instead of S3 it uses DynamoDB for storage, and instead of being a command line tool it's a REST API. It's got a nice little web interface.
It's got a versioning system for seeing the history of your secrets, all that kind of stuff. Going back to command line tools, Trousseau is similar to Sneaker, but instead of KMS it uses GPG for encryption, and it has modular storage backends, though usually you're going to use S3. This means the encryption is not tied to AWS and is battle-tested because it's GPG, but it can also be a little difficult to work with. GPG is not well known for being user friendly, and it has some provisions for automated key management and distribution, but not a whole lot. So if you're going to do large-scale key distribution with GPG, expect to feel a little pain there. SOPS from Mozilla combines properties of the last two tools: it uses KMS, GPG, or both, but it doesn't handle storage. If you've got a hybrid AWS and non-AWS infrastructure, though, it could be cool to look at. Red October, from Cloudflare, is very, very different from the last couple I mentioned. It's built from the ground up for cold secrets. If you remember the old movies, when you want to launch a nuclear missile, you need two people turning keys at the same time. It's like that, but for secrets. You can set up secrets as two of three or three of five, where a certain number of key holders need to coordinate to unlock a high-value secret. For stuff like AWS master passwords, this can be very, very useful. You could use it for hot secrets by making a secret one of five, so any one holder can access it, but that's not really the point. Mentioned for completeness: Barbican was supposed to be the OpenStack answer to Amazon KMS, but it's not happening, sorry. I'm mentioning Conjur specifically because it's the one I see most often, but this applies to all closed-source security products: Thycotic, ArcServe, whatever, all of them.
With things like KMS or other cloud-specific tools, you have to somewhat take their word for it, because they don't really publish their source code, and that's just the way the world works. But if you're going to run something yourself, maybe demand a higher standard there. In general, if you can't verify someone's security guarantees, assume they're false until shown otherwise. And finally, the biggest gun. In this context, HSMs are little bits of hardware that exist to hold a key such that the key cannot be extracted without disassembling the chip and examining it with an electron microscope. Most modern physical servers come with a tiny version of an HSM called a TPM, or Trusted Platform Module. It does a whole bunch of stuff other than what HSMs do, but it does have a key in it. Otherwise, HSMs are big bucks: if you want to go down this route, they are phenomenally expensive, but also phenomenally unbreakable if used properly. There are tons of them and they vary widely, so if you go down this route, expect to hire expensive consultants too. All right, so far we keep dancing around the hard problem of secrets management. Deep down, any secrets management system needs to establish an identity relationship between the thing that wants secrets and the thing that has them. This is generally referred to as secure introduction. In a lot of our systems, this initial trust relationship boils down to: I'm going to SSH (or WinRM, if it's a Windows machine) to an IP that I got from somewhere I trust, and whatever answers that SSH connection, I'm going to assume is who it says it is, because I have no real way to verify it. Some clouds have better mechanisms, for example the instance identity document on EC2, but in a lot of cases you can't do better than just trusting the network.
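One small improvement on "whatever answers that SSH connection" is to compare the host key against a fingerprint obtained out of band, for example one your provider exposes through the instance's console output. Whether such a channel exists is an assumption about your environment, and this narrows the trust-the-network gap rather than solving secure introduction. Here is a minimal sketch that computes the same `SHA256:` fingerprint format that `ssh-keygen -lf` prints:

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Return an OpenSSH-style 'SHA256:...' fingerprint for a public key line."""
    # An OpenSSH public key line looks like "<key type> <base64 key blob> [comment]".
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 with the trailing '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_matches(public_key_line: str, expected_fingerprint: str) -> bool:
    """True if the presented host key matches the fingerprint obtained out of band."""
    # Fingerprints are public values, so a plain comparison is fine here.
    return ssh_fingerprint(public_key_line) == expected_fingerprint.strip()
```

Output from `ssh-keyscan` puts the hostname in front of the key type, so drop that leading field before passing a line in.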
As a corollary, if you take it as a given that you need to build this concept of identity, the secure introduction system, then in a lot of cases you can skip secrets management altogether and just use TLS client certificate authentication. MySQL and PostgreSQL, for example, directly support TLS client certificates, so you can skip having a password at all. This does mean you have to manage a PKI, and public keys and certificates do need to be handled carefully, but a public key and a certificate are not radioactive in the same way that a secret is. All right, quickly talking about integration, because I'm running low on time. The easiest way in a lot of cases is to read directly from one of these API services in your code: hvac for Vault, or botocore for KMS. If you're writing, say, a Django app, use hvac in your settings.py and read things directly as you need them. Next up is config management. Like I said, I use Chef, but this applies to basically all of them: tools that are command line based will very often be driven through your config management layer. keywhiz-fs is fairly unique to Keywhiz. It's a FUSE file system driver that acts as a client for Keywhiz's REST API, which means you can use it with external tools like Nginx. You tell Nginx to load its TLS private key from a path on the keywhiz-fs file system, so you don't need to modify Nginx at all, and you get safe, happy, in-memory-only, never-buffered-to-disk key management. consul-template was originally designed for HashiCorp's Consul service discovery tool, but it's been extended to work with HashiCorp Vault as well. You can use it in conjunction with your config management system if you need a higher rate of change than your CM tool provides. So you could have Chef or Puppet or Ansible running every hour, and consul-template can handle refreshing at 15-minute intervals.
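Whether the refresher is consul-template on a timer or a client library in your app, consuming fast secrets means understanding expiration timers and refresh mechanisms. Here is a minimal sketch of the consumer side, assuming a hypothetical `fetch` callable that returns a value and its time-to-live in seconds; it is not consul-template's or Vault's actual interface:

```python
import threading
import time


class RefreshingSecret:
    """Caches a secret and re-fetches it shortly before its lease expires."""

    def __init__(self, fetch, refresh_margin: float = 60.0, clock=time.monotonic):
        # `fetch` is a hypothetical callable returning (value, ttl_seconds), standing
        # in for whatever your secrets manager's client actually provides.
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._expires_at = float("-inf")  # forces a fetch on first use

    def get(self):
        with self._lock:
            # Refresh once we're inside the margin before expiry, not after it.
            if self._clock() >= self._expires_at - self._margin:
                value, ttl = self._fetch()
                self._value = value
                self._expires_at = self._clock() + ttl
            return self._value


# Deterministic demo: a fake clock and a fake fetch with 900-second (15-minute) leases.
now = [0.0]
versions = iter(["v1", "v2"])
cached = RefreshingSecret(lambda: (next(versions), 900), refresh_margin=60, clock=lambda: now[0])
print(cached.get())  # v1
now[0] = 800.0
print(cached.get())  # v1 (refresh only starts 60 s before the lease ends)
now[0] = 850.0
print(cached.get())  # v2
```

Refreshing a margin before expiry, rather than at expiry, leaves slack for the fetch itself to be slow or to fail once; the lock keeps concurrent callers from all re-fetching at the same moment.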
envconsul is similar to consul-template, but instead of rendering templates, it puts values into environment variables. Remember, I said I don't like this because of logging and exception handlers, but it's there if you need it. Summon is similar, but it has pluggable providers. It was originally written by Conjur, but it's open source, so I trust it a lot more than their commercial offerings. But again: environment variables, here be dragons, all that stuff. So to summarize: check your privilege and your audit trail in whatever tool you're using. Pick your types and temperatures of secrets. Think about the attack surfaces and what you're going to do if they are successfully attacked, and have a disaster plan. Thank you very much, and we don't really have time for questions. Thank you very much. So in principle we don't have any time, but maybe one or two if there are some very important questions. Just come find me. I'll stand here; just come up and ask me. Okay, so.