Hello, thank you all for coming to my talk post-lunch. Not an easy thing to do. I only had broccoli for lunch, very uncharacteristic, but it's gonna keep me spry.

So my name is Matt Sarabian. I work at Zipcar. Our home offices are right here in Boston. That's Andy. We are hiring, so if you think any of this stuff I'm about to talk about is cool, you should talk to him, not me.

Zipcar was among the first IoT platforms. It started seven years before the iPhone was invented. We currently have over a million customers across a hundred ten countries, and Cloud Foundry and BOSH tooling really helped us to standardize across all of our public and private data centers. That's been huge. We've been using Cloud Foundry in production for about three years now, which you may have heard from Andy if you caught any of the keynotes. And if you caught any of my colleagues' talks, like Jason, who is here, and Derek, you may have seen these stats.

What running in production means for us is that we have over 25 production environments between dev and prod. We have over 6,000 containers running right now, kind of a lot. Our engineers control their own application release cycles. They release code every day of the week, all the time. And if you're laughing right now, I know you have bad taste in jokes and read slides.

So let's talk secrets. These are the things we think about when we're talking about secrets, right? If you've seen any other secrets talks here, you've probably seen this list many, many times. We want to keep these things secret. There's some big pile of secrets, and an application needs some subset. Some subset we need at runtime, some subset we need during deploy time, and we also want infrastructure as code, and those things fight, right?

So what are we gonna do? We're gonna store them somewhere secure when we're not using them. But as a result of doing that, secrets are gonna end up kind of in plaintext on our stack, right?
We're gonna try to restrict access to them, but we still need them at some point. So we're gonna use a templating solution, or we're gonna find ways to inject them into the environment, but, you know, that's not ideal. When your clicker does... very, very bad. Hold on. Yes.

So we inject them into our environment, right? `cf env`: you may have heard about that at some security talks. You can see all the secrets that some container needs. That's painful. cfdot is even worse: if you're in a Diego cell and you check out the state of your cell, you're gonna see every container in that cell's secrets, because they live in the environment.

When you get a scheduler involved, of course, your scheduler (in this case Diego, or if you're on Kubernetes, something else) is gonna need your secrets to inject into the environment. So it's gonna persist them in its own data store, so that when your scheduler wants to heal your infrastructure, or scale you up, or deploy a new thing, it's gonna stick that stuff in the environment for you. Very helpful, also a little scary.

Or you're gonna do something clever and you're gonna say, well, I'm not gonna put secrets in my environment. Instead I'm gonna give the app something that's gonna let it auth to my super secret store. But in order to auth to your secret store, you need to give it something to auth with, which is also a secret. And so that's a pain. It means that we're transiting secrets through multiple pieces of our platform, and that's turtles all the way down, right? James Hunt referred to it as the secret zero problem.

So I don't have tons of time. Instead I want to do threat modeling with really big and broad strokes, on a scale of script kiddie to the Mossad, with an incompetent or malicious employee landing right in the middle. What is our major concern?
So I'm guessing that because you're at this talk, which is about managing container secrets at scale, you're all comfortable setting up a network perimeter: firewall rules, role-based access credentials, VPNs, auditable authentication. You know who's in there and what they're doing. Congratulations, you're safe from, you know, warez from the '90s and script kiddies trolling on the internet. Probably.

Probably most of you are here: you think your co-workers are kind of cool, right? Like, they wouldn't do anything crazy like leave their unencrypted laptop on a bus. You know, they're not gonna do something nuts like publish, I don't know, classified information into a publicly available S3 bucket. That's never happened. And so we trust them, and we trust the way that they work, and we're gonna put in internal authorization levels so that we limit the number of people who can do the inside job. And that's cool.

The problem is that if you can't detect or resist that incompetent or malicious employee, you probably cannot detect or resist a modern attack. At a certain point that modern, sophisticated attack breaks through all your script-kiddie-level safety, and it's in your network, and now you can't tell the difference between that attack and that malicious employee.

And so this is a great time to stop and ask ourselves: why do I care? Oh my god, right? Like, you want to talk to me about secure introduction? I've got default passwords over there. So, you know, solving secure introduction is only really a problem if you have secrets to protect and you understand your threat model. Nobody's gonna come attack your super secret secret-protection architecture if everything else isn't sorted out first. So basically what I'm saying is, if you're not sure you need this, you probably don't need this. It's not fun. Secret management is not something you're gonna do for funsies and a hobby. I think it's fun because I'm insane. Secrets don't make friends.
That's a saying for a reason. So if you think you need this, it's probably because you've got regulatory requirements, right? Like, somebody is gonna be looking at you if you leak secrets.

So here's where we want to get. We can't resist the NSA or the Mossad or some state actor with unlimited compute, unlimited funds, spy satellites that we don't have. But we can probably resist a malicious employee. It'd be nice. And, you know, anybody can get zero-dayed. There's all kinds of bugs out there that people are just sitting on that nobody knows about, and so all your cool security checking is not gonna catch them, because no one knows about them. What we're saying is, wouldn't it be nice if we could design a system where we're just shy of trusting nobody, ever?

How do you get there? I think there's two keys. The first is expiring trust. Yeah, I'm gonna trust my machines, I'm gonna trust my employees, but I'm gonna put some kind of onus on them to consistently prove that they're actually worthy of that trust. And, you know, my key here is: if you're gonna implement automated credential rotation, if you're gonna swap out your certs, you're gonna swap out your keys, delete the old ones. You can laugh, but credential rotation is actually really hard to do, and there's a lot of funky and annoying edge cases that can cause you to leave those old credentials around. If you do that for a few hours or days, it's probably not a problem. But I'm sure some of you have looked into a credential rotation scheme and gone, oh my god, we've got six months of old credentials here. Not good. You'd be so much better off if you never did that, because then you'd know the state of the situation: there's one secret, and as long as it didn't leak,
I'm probably good, and when I rotate it I'm doing it manually, and oh, it's so painful, but it really is rotated.

The other one is we're gonna keep our eyes on people and things. We want a system where, if our secrets are compromised, it's relatively easy and quick to find out in some way.

We have some options. We have to do this ourselves. Some of these may sound familiar: committing secrets, baking them into your base image or base container, injecting them into the environment (kind of the status quo), or fetching them at runtime. Three of these things fall just shy of resisting your inside job, and one of them is slightly better. And I think the thing to remember is that it's still totally a valid strategy to shrink the group of people who have access to that giant pile of secrets. That is totally, totally fine, because this stuff is hard, and doing it wrong is way worse than not doing it at all.

So, committing secrets: don't make me talk about this. You know... oh, yeah, laugh. You're doing it. I know you're doing it. I promised myself I'm not gonna get upset. So I'm also not gonna talk about baking secrets into a base image.

I want to talk about this: injecting them into the environment. This is where we're at today. There've been some cool talks that show how you don't necessarily have to do this anymore, and we'll get into that in a minute. Just a reminder: the environment can be seen by every process, friendly and hostile. If you have a developer that writes an angry application that logs your environment to a log or a server or an S3 bucket, that's bad. Your environment ending up in logs is really bad for two reasons. One, it's like, oh my god, it's in the logs. And two, because you're lowering that technical barrier, right? I'm gonna restrict access to my jump box, I'm gonna restrict access to my containers, and nobody can get into my system. Are you restricting access to your Kibana logs or Elasticsearch?
Probably not, because somebody who's gonna debug this stuff needs to go in there, and they might not have that same level of access as that tight-knit group of people. And Lucene queries are hard, right? But they're not so hard that I can't put one together that says, like, `PGPASSWORD`, right? That's not that bad.

If I want to go a little bit closer to state-actor level, I can say that my hypervisor and my scheduler both have access to my container's environment, and so that means that if there were ever zero-days on those, all my creds would get leaked. And we already talked about how rotation is a pain.

So fine, great, I've convinced you, I can tell. We want to fetch them at runtime, but how are we gonna do that? mTLS: very popular, super awesome. A secret that lets us get other secrets: we know that's bad. Whitelisting with network information: it's not so bad. I think it's like a base level, right? Is this request coming from a place it should come from? We should be asking that question. It's low-hanging security fruit. It keeps you safe from script kiddies, not from a malicious modern attack.

So this is the secure introduction problem. When something requests credentials at runtime, why do I trust that request at all? And if I do trust that request, how am I gonna get the credentials to it in the first place?

So you only have two options. There's probably more, but these are the two; I mean, I'm not gonna talk about any more today. CredHub and Vault.

Vault has some downsides. Oh, I love Vault, but oh man. If you've never worked with a Vault server, it can be challenging. There's a lot of tricky things. There's a safe release that can make that easier. And it can easily become a snowflake or a single point of failure. That's a pain. There's no built-in BOSH support. There's things like Spruce that can help.
There's other third-party tools that can help us, but there's nothing built in. There's lots of auth options, but they need to be manually configured, which can also be kind of tricky and get you close to that snowflake state, maybe. And the seal and unseal behavior people usually hate: I just restarted, I just redeployed Vault, and it's sealed, and I have to wake somebody up at two in the morning to unseal it. I think that is a feature, and you can bother me about that later. I don't get those pings, by the way.

So Vault is also awesome at a lot of things. It has granular RBAC support. It has parent-child token relationships, and that means that if I have a child token that is doing this awesome second-bullet thing of generating dynamic secrets for me from tons of external services, I can revoke that parent token and all the other credentials get revoked for me. I don't care what they are. I don't know what they are, right? That's awesome. It also has lots of built-in auth options. Even though you have to manually configure them, there's tons of them. Super cool.

I say zero-downtime credential rotation is solved. There are lots of tools in the Vault ecosystem that make this easier. It does put some onus on app developers to solve this, right? If your app is gonna pull from the environment to read credentials, or pull from some other place to read credentials, and it only does that at boot time, well, you're probably gonna have to reboot your app. There might be some downtime, and we can talk about ways to solve that.

It does have built-in Concourse support. If you do a lot of things with Concourse, like we do at Zipcar, your life can be much easier. Vault also has mind share both in and out of the CF community.
So just like all open-source software, while you're sleeping, Vault is getting better. Vault has detailed audit logs and tons of places you can ship those logs to, which is great. Vault is FIPS compliant, which... I don't know, is that a concern for anybody here? Congratulations, that's great. It's not a concern for me either. But they will give you a letter that says they are FIPS compliant. You can show that to a lawyer and you are set. Vault also has multiple backend storage options, right? Vault really wants to function as a dumb host. Vault doesn't want your data; it doesn't want your secrets. It's just gonna pass them through and encrypt them somewhere else. That's nice.

CredHub has tons of benefits too. Thanks to Diego, CredHub can do mTLS auth for you. This slide used to be much different, by the way, and I had to rewrite this talk after I went to a bunch of talks here. That was kind of stressful for me, but I have a lot of nice things to say now. So mTLS auth is done for you, right? Diego is gonna give you a cert, and as I recently found out, you can have Diego rotate that cert constantly for you, up to every hour. That's amazing. That kind of solves your problem, right?

It also integrates with UAA so people can auth into it, which is nice. It has built-in support for BOSH. It has built-in support for Concourse. And if you've been following CredHub development at all: when they first announced CredHub and its integration with service brokers, there was a bit of a security concern that some of us in the community had, that we're passing those credentials through tons of components and it wasn't very secure. Like, why can't the apps just talk to CredHub? And since this spring they've solved that, which is awesome. It's another example of an active open-source project getting better while you're sleeping.

CredHub also has a detailed audit log. Currently
it's only in MySQL, which can be a problem, but that is changing. It also has Spring CredHub integration, so if you're using Spring, CredHub is an awesome option for you.

It has some downsides. It does not yet have anything like parent-child token relationships or automatic expiration via TTL, which is painful. It means you have to do it yourself, or you have to figure out a way to rotate credentials without downtime, or at least with minimal downtime, because CredHub is really, right now, relying on that fixed path. I have to request a path. If some of my containers have a certain cred and some of them have another one, when do I expire the old ones? These are the fun edge cases that come up when you have to implement credential rotation.

Mind share is also, right now at least, confined to the CF community. So if you're using all CF or mostly CF, that's great; CredHub is probably the option you want to go with. It does not yet have dynamic secret generation like Vault does, so it can't generate and expire credentials for you from third-party locations. Maybe not a problem for you. It's not yet integrated with CFCR; that's on the roadmap. There's only one storage backend, as I mentioned. There's only one audit backend, which is also MySQL. So that can cause pain points. They're able to be worked around, and like I said, smart people are working on this. This is gonna get better.

How do we choose? Like I said, if you're already using CredHub today and you don't have any of those concerns that I outlined, keep using CredHub. If you're using Vault today and you want to answer this question: is it possible to authenticate with a secret service without first seeding an app with a golden secret? What do you think?

Let's talk about my favorite software. Yeah. All right, I already talked to you about all the awesome things that Vault can generate and expire secrets for. This is a small list of those things. There's, you know, infrastructure providers and their services.
There's things you can run yourself. There's things that can be hosted elsewhere. And it also has really cool security primitives. I would love to see these things integrated into CredHub or any other security software, they're so awesome: periodic tokens, response wrapping, and the cubbyhole backend. Has anybody in here used one or more of these? Yes, yes, talk to me. Awesome. If you haven't, let's review them.

Okay. Typical Vault tokens have a renewal time and a maximum lifetime. That means that, you know, you're using it, and you can use it for so long, and you can say you want it some more, but eventually it's gonna go away. That can be sad, especially for long-running applications and containers.

Periodic tokens do not have a max lifetime. Usually what you do is set a very short renewal time, called the period, on it. So you might say that's 10 minutes, 5 minutes, 30 seconds if you're crazy. As long as whatever is using that token is calling out to Vault and saying, I am still using this token, that token will live forever. Very cool. You can also associate these periodic tokens with parent tokens, which can also be periodic tokens themselves. And if you revoke the parent token, all of its child tokens are revoked, even if they're being renewed. The same goes for any periodic token: if you have the accessor to it, you can revoke it even if it's being renewed.

Response wrapping is probably the closest thing that we're gonna get to those self-destructing spy messages you see in, like, Get Smart or Mission: Impossible or, like, Inspector Gadget, which was the Get Smart guy again, Don Adams. And this is built into Vault. This requires no special configuration. It has a flag.
So when you do a `vault read`, like that one there, you just pass a flag, `-wrap-ttl`, and give it a time. Instead of giving you back the secret that you just requested, it's gonna give you back a token that kind of looks like a GUID, and that token can fetch a single secret, one time, within that specified amount of time. And that is so cool I'm gonna say it so many times: you get a single secret, one time, within a specified amount of time. After you read it: useless. If the time expires: useless. Instead of issuing a read, you issue an unwrap. This is really cool, because I don't know where that secret is coming from. See, the read has my path in it. The unwrap does not. Very awesome.

Cubbyhole. This is a private, token-scoped key-value store. Each token has their own. No other token, including the root token, can do anything with another token's cubbyhole. You don't even know if there's anything in it. You can do nothing with it. When that token is destroyed, whether it's revoked or expired, doesn't matter, everything inside the cubbyhole goes away. To use it, all you do is write something to `cubbyhole/` instead of `secret/`. That's it.

Let's put this all together. If you're a big fan of Dwayne "The Rock" Johnson, as I am, or you have children, you may have seen this scene. It is Maui, in the movie Moana, jumping into the realm of monsters, and that is what we are gonna do. So let's do this.

Okay. You probably remember from just a few short minutes ago that in order to use secrets at runtime, the app needs to authenticate to Vault and get a Vault token some way. We're gonna say it needs a username and password. This could be an mTLS cert, it could be a couple of other things. AppRoles are a very common way for machines to authenticate with Vault, so that's what we're gonna go with for this example, but this could be anything. So in order to do that, we're gonna need to give it a role ID, which is not so secret, and a secret ID. We can't just give it to it, right?
We need to conceal it in some way, give it some cover. Think back to those two things we talked about at the beginning of this talk: frequent credential rotation with cleanup of the old stuff, and watching. Does it sound like anything? It should sound like these two features. Cubbyhole gives us a secret place nobody else can read unless they've compromised your token. Response wrapping gives us a single-use token that is able to fetch a single secret, one time, within a specified amount of time. And the corollary to that is that if something else reads it and then you try to read it, oops, that's bad: you try to read it and you can't, so something else did. That's pretty cool. I tried to read it, I couldn't, something else did. Very awesome.

Wow, look at that. Whoo. All right, we're gonna do this in a couple of pieces. So first, actually, let's start here, right? This orchestration API. This could be a service broker, could be anything, right? It could be an external service outside of Cloud Foundry. You need something that is kind of a snowflake. You probably already have this if you're using Vault.

That orchestration API is going to generate a periodic token, probably under a parent token that we scope at whatever granular level makes us happy for revocation. We're gonna inject that token into every application instance, the same way we inject stuff into the environment. And you say, wait, you're injecting a secret? No, because this token cannot do anything inside Vault. All it can access is its own cubbyhole.

Okay, so what? So now our orchestration API server is going to generate a secret ID for the application to use. But it's not going to read it like a monster. It's going to get it as a wrapped response, and so what it's going to get back is something that can only be read once. And it's going to stick it in that injected token's cubbyhole. So it's using its weakness as a strength.
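The two primitives described above can be modeled in a few lines. What follows is a minimal in-memory sketch of the semantics only, not Vault's API: the names `WrapStore` and `Cubbyholes`, the token strings, and the 60-second TTL are all hypothetical, chosen for illustration. It shows the orchestration side wrapping a secret ID and leaving the wrapping token in the injected token's cubbyhole, the app reading it once, and a second read failing, which is the signal that something else got there first.

```python
import secrets
import time


class WrapStore:
    """Toy stand-in for response wrapping. NOT Vault's API; names are hypothetical."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._wrapped = {}  # wrapping token -> (secret, expiry timestamp)

    def wrap(self, secret, ttl_seconds):
        """Hide a secret behind a random single-use token with a time limit."""
        token = secrets.token_hex(16)
        self._wrapped[token] = (secret, self._clock() + ttl_seconds)
        return token

    def unwrap(self, token):
        """Return the secret once. A used, expired, or unknown token fails."""
        entry = self._wrapped.pop(token, None)  # pop: a token never works twice
        if entry is None:
            raise PermissionError("wrapping token unknown or already used")
        secret, expires_at = entry
        if self._clock() >= expires_at:
            raise PermissionError("wrapping token expired")
        return secret


class Cubbyholes:
    """Toy per-token private key-value store. NOT Vault's API."""

    def __init__(self):
        self._boxes = {}  # owner token -> {key: value}

    def write(self, owner_token, key, value):
        self._boxes.setdefault(owner_token, {})[key] = value

    def read(self, owner_token, key):
        # Only the owning token's box is consulted; other tokens see nothing.
        return self._boxes.get(owner_token, {}).get(key)

    def revoke(self, owner_token):
        # Destroying the token destroys everything in its cubbyhole.
        self._boxes.pop(owner_token, None)


wraps = WrapStore()
cubbies = Cubbyholes()
injected_token = "injected-periodic-token"  # hypothetical placeholder value

# Orchestration side: wrap the secret ID, leave only the wrapping token behind.
wrapping_token = wraps.wrap("example-secret-id", ttl_seconds=60)
cubbies.write(injected_token, "secret_id", wrapping_token)

# App side at boot: read the cubbyhole, unwrap once.
first_read = wraps.unwrap(cubbies.read(injected_token, "secret_id"))

# Anyone unwrapping afterwards fails, which is the tamper signal.
try:
    wraps.unwrap(cubbies.read(injected_token, "secret_id"))
    second_read_failed = False
except PermissionError:
    second_read_failed = True
```

In this sketch a leaked environment only exposes `injected_token`, which leads to a wrapping token that is either already spent or about to expire.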
The platform has to keep that injected token around to reboot these applications; it has to keep that injected token in the environment. And so we're gonna put stuff in that injected token's cubbyhole that is valid for such a small amount of time that if it's ever disclosed, good luck.

So at boot time, the application is going to read that injected token from its environment, and it's going to loop through the entries in its cubbyhole. If it finds one, that's awesome. It now has its username and password, so it can go about its business, get a negotiated token, and start working with Vault. It's not going to do anything crazy like write that negotiated token back out to the environment, or to disk, or to S3. It's gonna just keep it in memory. And if it loops through the whole array and it can't find a wrapped response token that's valid, it is gonna freak out. We're gonna ping people. We're gonna revoke the parent token. We're gonna do some measured thing. Or we're in dev, and we're gonna tweak our timeout, because it happens. It happens.

Once the app is running, we have everything we need, right? We have that token that's in our environment, which is now useless because we've read everything out of that array. Nothing of value there. Read my environment, I don't care. And we have the negotiated token, which is really important, which we're gonna keep in our memory. And we're gonna poll Vault with both those things and say, I'm using this, I'm using this, I'm using this. That's gonna help us out, so that those things don't expire. And remember, when they do expire, whether we do it, or your app crashes, or you just shut down this whole deployment and all the apps are gone, all of the secrets that they requested from those dynamic secrets backends go with them. I don't have to worry about, like, oh my god, where are those even? They're gone. Vault does that for me.

So, one more time, for people that like diagrams like this. I wanted to deploy two instances of an application.
I talk to my orchestration API. It talks to Vault and makes a periodic token for me. Maybe it's gonna make a parent and store the accessor so that we can do revocations at some granular level, and you have a little control here, right? It might be on an app basis, might be an org basis, might be a space basis, whatever you want. Then it's gonna request a wrapped response for the secret thing that whatever you're deploying needs to authenticate to Vault, and it's gonna stick that in the cubbyhole.

And Diego is gonna do its normal thing. We're gonna stick that token into the environment, in the manifest, or we're gonna put it somewhere else. We're gonna get it into our application's environment, usually via Diego. And now our app's gonna read that out of its environment and loop through its cubbyhole. So it's there, and hopefully it reads something. If it doesn't, this is our break glass, right? This is where we would stop and do something to notify somebody and revoke credentials.

Assuming it finds it, it's gonna then be able to authenticate to Vault. It's gonna say, hey, can you actually give me a real token now, please? And Vault's gonna do that, and then the app can just talk to Vault and go provision its secrets. It can do anything it would normally do with Vault, and you can set roles all along the way, right? It doesn't have to have free rein over all of Vault. You can namespace its paths, you can mount a custom backend for your service, whatever you want.

Once the app's running, it is gonna communicate with Vault and do its normal thing, requesting credentials. Now, here's the cool part, right?
I said automatic credential rotation was solved. What I mean by that is, if you were to develop something in your applications, or a shared piece of code across all your applications, like an SDK or just a library that you want to share to connect to these things (you could probably do it in a service broker, too), you'd say, okay, I want to use these creds. Your app is gonna get the creds and start using them, and it's gonna implement a small retry loop. It's gonna go out and try to communicate with these things, to run a query on Postgres, say, and it can't. Instead of just crashing and failing and being scared, it's gonna go: hold on, let me try to negotiate new credentials with that same secrets backend. Which it should be able to do, because it already has a negotiated token to auth to Vault and generate as many creds as it wants. Vault expired my Postgres credentials because I was really nice and I wrote a role that made sure my dynamic backend was expiring my credentials every 30 minutes or something wild, making security people so happy. And my application is gonna get new creds and keep going on with its life.

And so that's not bad. That gives us credential rotation. We don't have any downtime. We just have a little retry loop. And of course, if it can't do that, then we probably don't need to trigger break glass, but we might want to crash the app. And crashing the app, depending on how many instances there are, will cause that large diagram from the beginning to happen again.

If a single app instance crashes, this is what's gonna happen. Number one, we're gonna listen to that crash event, which we can do, right? Diego emits events that say something just crashed, and we listen to that. Number two, we reach out to Vault and say, hey, I need a new wrapped response for the secret ID.
It's gonna put that back in the cubbyhole. Number three, the token is gonna be injected back into the environment. Fine. Number four, the app is gonna go out to Vault and loop through that array, and because in number two I already put the thing in the cubbyhole for it, it's gonna find it. Or it's not, and we're gonna freak out. If it does, then in number five it's gonna go about its business as it was when the app was running fine.

There's two risks that I can think of here. The first I kind of touched on a little, right? Something reads the cubbyhole. It's problematic, but only for a very small amount of time. If you're aggressive with your credential rotations, you're making it really hard. Like, yeah, I exfiltrated those creds and now I want to go use them, but maybe they're not valid, because they have a really short TTL. Kind of nice. And that holds most of the time, unless you have an adversary who can willfully crash your apps and keep them crashed before something else in your monitoring catches it. Like, huh, this app that communicates with Postgres has been down for a couple hours now and it keeps going down. I mean, something should notify you that something's not quite right there, right?

The second risk is that something compromises an application's periodic token. The app does something reckless and writes it to disk, or something can debug your app in real time and get that token out of its active memory, whatever it is. That's kind of bad, because we wouldn't know. But if we used Concourse or something cool to redeploy our applications regularly, and part of that redeployment was going through and revoking the parent token that we created at the very beginning of this, which has never left our orchestration API, all those creds would be revoked for us.

So again, I haven't totally solved it.
Maybe it can't be solved. But we've significantly increased the difficulty, and there is always an easier target, in the enterprise space especially. So I think that is my modest proposal for how we can solve this. If any of that seems like too much, here's some legacy technology that you can use to keep yourself safe.

Miraculously, I think we have maybe time for one question, or a couple more, and you can harangue me in the back about Vault literally the rest of the night. Any questions? Oh, it's so easy. Great. Thank you so much for coming to my talk. I'll hang around back there. I really love talking about this. I don't know why, but I just do.
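The small retry loop described earlier for zero-downtime credential rotation might look roughly like this. It is a minimal sketch under stated assumptions: `run_with_rotation`, `CredentialsExpired`, and the two callables are hypothetical names, `fetch_credentials` stands in for whatever asks the dynamic secrets backend for a fresh credential using the in-memory negotiated token, and `query` stands in for the real database call.

```python
class CredentialsExpired(Exception):
    """Raised by the (hypothetical) query function when the creds are rejected."""


def run_with_rotation(query, fetch_credentials, creds=None, max_attempts=3):
    """Run query(creds); if the creds were expired, fetch new ones and retry.

    Returns (result, creds) so the caller can keep using the latest creds.
    Re-raises CredentialsExpired after max_attempts, at which point the caller
    would typically let the app crash so the platform restarts the instance.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if creds is None:
        creds = fetch_credentials()
    for attempt in range(max_attempts):
        try:
            return query(creds), creds
        except CredentialsExpired:
            if attempt == max_attempts - 1:
                raise  # out of retries: give up instead of looping forever
            creds = fetch_credentials()  # negotiate fresh creds, then retry


# Usage with fakes: the first credential issued has already been expired.
issued = []


def fake_fetch():
    issued.append(f"creds-{len(issued) + 1}")
    return issued[-1]


def fake_query(creds):
    if creds == "creds-1":
        raise CredentialsExpired()
    return f"ok with {creds}"


result, current_creds = run_with_rotation(fake_query, fake_fetch)
```

Returning the credentials alongside the result keeps the rotation invisible to the rest of the app: the next call passes `current_creds` back in and only pays for a new negotiation when the backend has expired them again.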