So thank you for coming to this talk. We're going to talk about SHIELD today, but first, we have to have a fire slide chat. This is Tom in the room. Tom's boycotting because of the fire slide chat. We haven't had any fires here at CF Summit yet, but Boston City Fire Code does require this slide be presented. So, presented. SHIELD. Today we're going to talk about SHIELD, which is a data protection system written by Stark and Wayne, open sourced in 2016, I believe. We're going to talk about where we've been, what we've done, where we're going, and then do a live demo, because one way to end out Summit is a live demo, because nothing could possibly go wrong with a live demo. Well, I'm James Hunt, available on Twitter at @iamjameshunt. I am the SHIELD architect, one of the original implementers, designers, chief errand boy, GitHub issue herder. I work at Stark and Wayne. We do a lot of Cloud Foundry stuff. We do a lot of BOSH stuff. We do a lot of Concourse stuff, and we try to protect your data. And that's what SHIELD is. SHIELD is a data protection system. We used to call SHIELD a backup system. And then we realized that's a terrible name, because you can take backups all day, but if you can't restore those backups, all you have is a neat way to eat up S3 resources. So how does SHIELD work? SHIELD is a plugin-based system, and we basically simplified backups down probably to the point of absurdity, but it does seem to work. And we treat data protection and recovery as two different operations, or two different paths. The first of which is your data protection, which is two separate operations. First we have to back up the data, which is going to the data system and pulling the data out, and then we have to store that somewhere else. Preferably S3 or an NFS share or something out of the room, out of the building, out of the city, maybe out of the state.
So that if there's an emergency or an outage or a failure of the power grid or data center or server rack or network switches, you can still pull the data back. Inside SHIELD, this might be a Postgres plugin doing a backup, going out to Postgres, logging in on port 5432, authenticating, pulling back an exact dump via something like pg_dump, and then streaming that out into a storage plugin like S3, which is going to take that input stream and tag it with a resource handle or an identifier and then stuff it in a bucket somewhere so that it's safe. Safe inside Amazon's data centers, which never go down, never have issues, everything's good. In the event that you have to restore from a backup, recover, it's the same process in reverse. You have to retrieve the data that you stored in the cloud, pull it back into your data center, and then put it back on the data system. Continuing with our example, we're going to go out to S3. We're going to hook up to our bucket with our key ID and our secret key. We're going to say, I want backup number 4672. Stream that over to the Postgres guy. He's going to psql it back into a reconstituted version of the Postgres server, maybe a whole different Postgres server. These are the four verbs of SHIELD. Backup and store, retrieve and restore. And on this framework, we've built plugins and a plugin-based system of extending SHIELD. Because in truth, SHIELD does not know how to do backups, and SHIELD doesn't know how to do recovery. SHIELD knows how to tell other things to do backup and store, retrieve and restore. Backup and restore, of course, are the purview of target plugins. We have target plugins for Postgres. We have one for MariaDB. We have another for MySQL and XtraBackup. We have Redis. We do Consul. We do MongoDB. We do a whole bunch of data systems. Store and retrieve belong to storage plugins. Surprise, surprise. We have storage plugins for most of the major cloud providers. S3, Azure Blob Store, Google Cloud Storage.
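To make the four verbs concrete, here is a minimal sketch in Python of how a target plugin and a storage plugin might be wired together. It is not SHIELD's actual plugin API: the class names, method signatures, and the `archive-N` handle format are all assumptions made up for illustration, and both plugins are in-memory stand-ins for a real data system and a real blob store.

```python
import io
import itertools
from typing import BinaryIO, Dict, Protocol


class TargetPlugin(Protocol):
    """Hypothetical interface: knows how to backup and restore a data system."""

    def backup(self, out: BinaryIO) -> None: ...
    def restore(self, src: BinaryIO) -> None: ...


class StoragePlugin(Protocol):
    """Hypothetical interface: knows how to store and retrieve archives."""

    def store(self, src: BinaryIO) -> str: ...
    def retrieve(self, handle: str, out: BinaryIO) -> None: ...


class InMemoryTarget:
    """Stand-in for a data system such as a database."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data

    def backup(self, out: BinaryIO) -> None:
        out.write(self.data)  # dump the data system to a byte stream

    def restore(self, src: BinaryIO) -> None:
        self.data = src.read()  # replay the byte stream into the data system


class InMemoryStore:
    """Stand-in for cloud storage; hands back an opaque handle per archive."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._ids = itertools.count(1)

    def store(self, src: BinaryIO) -> str:
        handle = f"archive-{next(self._ids)}"  # made-up handle format
        self._blobs[handle] = src.read()
        return handle

    def retrieve(self, handle: str, out: BinaryIO) -> None:
        out.write(self._blobs[handle])


def run_backup(target: TargetPlugin, store: StoragePlugin) -> str:
    """Data protection path: backup, then store."""
    buf = io.BytesIO()
    target.backup(buf)
    buf.seek(0)
    return store.store(buf)


def run_restore(handle: str, store: StoragePlugin, target: TargetPlugin) -> None:
    """Recovery path: retrieve, then restore."""
    buf = io.BytesIO()
    store.retrieve(handle, buf)
    buf.seek(0)
    target.restore(buf)
```

The orchestrating functions never look inside the bytes. They only connect one plugin's output to another plugin's input, which is the sense in which the coordinator does not itself know how to do backups. A real implementation would stream between processes rather than buffer a whole archive in memory.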
We have a couple for in-house. If you want to run in-house, you can actually use the S3 plugin for a work-alike, if you're running something like ECS or Minio or any of those. If you run Scality, we've got a plugin for that. If you just want to run WebDAV somewhere, we have a plugin for that now. The future of SHIELD is an interesting place to be. Around the end of 2017, actually I guess it was the end of 2016, we open sourced in 2015, right? I think that was true. I was off by a year and I apologize. At the end of 2016, we'd been running in production with SHIELD for a year on several clients. We had picked up a bit of steam in the open source community. We had a lot of people interested, a lot of buzz. We kind of went around and we talked to people. Having clients using SHIELD, Stark and Wayne clients, gave us the unique opportunity to watch operations teams in the wild, in their natural habitat, hunched over MacBooks, attempting to restore things in the heat of the moment because somebody deleted Cloud Foundry. Oops. We got to see a lot of pain points. We got to see a lot of confusion, a lot of fear and uncertainty about, is this the right backup? What's the command line to do the restore? When it comes to backup systems, those are the things you don't want. You don't want confusion. You don't want fear or uncertainty. We also talked to our open source friends. We got their feedback on what are the things you'd like to see us do with SHIELD, what are the things you're doing with SHIELD, and we also looked through our GitHub issue backlog, because people will often report things in tickets that they won't tell you when you ask them. Coming out of that, we built the SHIELD roadmap, which was a best guess on where we wanted to take SHIELD in the major features for the next iteration, from SHIELD v6 on.
That roadmap culminated in SHIELD v7, or at least it was supposed to, until I accidentally clicked plus on the wrong button in Concourse and bumped the major version and then released it to bosh.io, and then Dimitri refused to take it down, because it had been done and what had been done could never be undone, so beware the power of your automation. It shifted an entire roadmap a version ahead. As I was saying, the SHIELD roadmap culminated in v8 of SHIELD, and there's a lot of new stuff in v8. I encourage you to take it for a spin. It's got a whole BOSH release. We actually have a Docker image if you want to run on Docker. I'm told you can run it on Kubernetes, although I haven't tried that myself. But I want to focus on three things in v8, and the first of those is multi-tenancy. Anyone who's used SHIELD in the pre-v8 iteration will note that there's no access control, none whatsoever. We have basic auth on the web UI. We have basic auth on the CLI. But once you're in, you're in, and you have free run of the system. You can do anything. You can be anybody. You can configure new systems. You can see all the creds. You can pretty much be the admin, because there are no users. What that means is that if you're trying to propagate the usage of SHIELD and the practice of backups through your organization, you have two options. Everything in ops, which is terrible. As an ops person, I can say please don't do this. It basically means that all of your app developers will be coming to you when they need to configure new backups. They will assume that you are testing their backups for them. And when it's time to restore their application data, it's not going to work out very well. The other option is a SHIELD in every pot. Everybody gets their own SHIELD. The front-end devs, they get a SHIELD. The ops guys get a SHIELD. The security team gets a SHIELD. The back-end devs get a SHIELD. The DBAs get a SHIELD. SHIELD, SHIELD, SHIELD, SHIELD, SHIELD.
SHIELD, as far as the eye can see. This is equally bad, because anyone who's deployed things more than once can tell you they always drift. They never stay in sync. And it's no fun. So in v8, we introduced multi-tenancy. In v8, you can actually log in as an admin and you can create a tenant in SHIELD. And that tenant has a name, an identity, a whole set of users assigned to it in different capacities. Administrators, managers, engineers, operators. The different levels of access control are designed to meet what we feel are the archetypal roles inside of an organization. So an operator on a tenant can run backup jobs. They can pause backup jobs. Conversely, they can unpause backup jobs. They cannot reconfigure jobs. So your operators can't go in and say, you know what? Our 3 a.m. morning backups, they now happen at 6 p.m. See you guys. I'm going on vacation. They can't see credentials, and they can't do much outside of using the system for its intended purpose. Technicians have a little bit more access. So technicians can come in and they can create target data systems. They can configure things, set up your credentials into your systems. They can reschedule backups. They can set your retention policies and all the things you expect to have to do to manage the data of a backup system. But they can't bring anyone else in. They can't hold the door open and let anybody in to make changes. That's where administrators come in. The thing I like about multi-tenancy is it allows you to expand SHIELD throughout your organization without having to worry about the two problems we talked about. You can have your development team, or multiple development teams, each in a different tenant. They have their own backups and their own schedules, and they don't see other people's stuff. They don't accidentally delete other people's stuff, and they sure as heck don't accidentally restore their data to someone else's system. The next big feature in v8 is encryption. At rest.
I want to stress this. If you are a current SHIELD user, all of your data in flight is still secured. We use SSH, we use TLS. All the data as it transits the network in both the backup and restore directions is still encrypted and always has been. What we punted on in the original was encrypting it at rest, because we left that up to the storage systems. S3 should be able to encrypt at rest. Whatever you're configuring internally, you should have your own configuration and security controls. And that's a case where I think I personally misjudged the state of the world in encryption. So now what SHIELD does is it encrypts your backup archives as they're created, and it dynamically decrypts them as they get replayed. So if we go back to this diagram, the backup to store, retrieve to restore. What we did is we shimmed in an encryption pipe or filter or tee. And as the data streams off the backup target system via the target plugin, SHIELD wraps it in a level of AES-256 symmetric encryption. Every backup job gets its own key and its own initialization vector, or IV. And the keys and IVs are randomly generated via a cryptographically secure random number generator. Super secure. That means that the store plugins never see decrypted data. It also means that the target plugins don't have to participate in the encryption. So we don't have to rewrite any of our existing code. If you've added plugins to a SHIELD deployment, they will immediately start encrypting data by virtue of taking part in the SHIELD system, not by virtue of knowing about encryption. Obviously if we encrypt, we probably ought to decrypt. I'm not real sure, but I don't think you can untar a ciphertext and get back anything useful. So we do the same thing in reverse. When we go to restore a backup, we retrieve it from S3, and at that point it's an encrypted blob that's completely unintelligible, we hope.
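As a rough illustration of the "own key and own IV per backup" idea, here is a small Python sketch of the bookkeeping side only. It is an assumption-laden toy, not SHIELD's code: the `KeyRegistry` and `ArchiveSecret` names are invented, a plain dictionary stands in for whatever secure store holds the secrets, and the AES-256 cipher itself is left out because the Python standard library does not ship one. The sizes follow from AES: a 256-bit key is 32 bytes, and the block size (and so the IV) is 16 bytes.

```python
import secrets
from dataclasses import dataclass
from typing import Dict

KEY_BYTES = 32  # AES-256 uses a 256-bit key
IV_BYTES = 16   # AES has a 128-bit block, so the IV is 16 bytes


@dataclass(frozen=True)
class ArchiveSecret:
    """Key material for exactly one archive (hypothetical type)."""

    key: bytes
    iv: bytes


class KeyRegistry:
    """Toy stand-in for a secure secret store, keyed by archive ID."""

    def __init__(self) -> None:
        self._secrets: Dict[str, ArchiveSecret] = {}

    def new_secret(self, archive_id: str) -> ArchiveSecret:
        """Generate fresh random key material for a new archive."""
        if archive_id in self._secrets:
            raise ValueError(f"archive {archive_id!r} already has a key")
        # secrets.token_bytes draws from the OS cryptographic RNG.
        secret = ArchiveSecret(
            key=secrets.token_bytes(KEY_BYTES),
            iv=secrets.token_bytes(IV_BYTES),
        )
        self._secrets[archive_id] = secret
        return secret

    def lookup(self, archive_id: str) -> ArchiveSecret:
        """Fetch the key material needed to decrypt an existing archive."""
        return self._secrets[archive_id]
```

Because every archive gets fresh random material, backing up identical data twice does not produce identical ciphertext. The cost is that the registry must outlive every archive it covers: lose an entry, and that archive can no longer be decrypted.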
And as we stream it out of the storage plugin to do the restore operation, SHIELD unwraps that encryption and gives the plain text back to the target plugin. In order to do this, SHIELD actually has to keep track of every key and every IV that it has ever generated. The main reason we randomize keys and IVs, by the way, is in case you're backing up the same data over and over again. There's a ciphertext stream coming out, and there's a bit of a statistical analysis attack that can be mounted against your archive storage. So we do randomize for that reason, but it does mean that we have to keep track of all these keys. We set them up in Vault, because of course we do. The SHIELD BOSH release for v8 actually has a Vault built in that runs on loopback. It's not exposed to any external network traffic. It's wholly managed by SHIELD. The SHIELD keys are managed by SHIELD in an encrypted file, and SHIELD itself, when it boots up, will ask for a master password and will refuse to do any backup-related activities until that master password is provided. That master password never lives on disk, it doesn't stay in memory, and we never tell anybody about it. It's the purview of the administrators to remember that password, set that password, and then re-key their Vaults or their SHIELDs as often as they like. We could talk about encryption probably until next Summit, because it is a fascinating topic, but the biggest feature in v8 has to be the Web UI. I don't know how many people have actually used SHIELD v6 or v7. I know you have, and I know all the people in red have. The SHIELD v7 UI is terrible. Let's be honest. I won't say it was written in an afternoon to meet a deadline, but it might have been written in two afternoons to meet a deadline. We basically took the CLI and we said, how would we make this into web forms? If you have used the v6 UI, my condolences. If you haven't, you're lucky, but it basically amounted to pasting JSON into text areas. No fun.
In v8, we watched how people used SHIELD, both on the CLI and in the Web UI, and we talked to them about how they reason about their archives. When they say, I'm going to restore something, here's the things that I think I need to know in order to effect that change in my infrastructure. From those interactions and that feedback, we built an operator-focused Web UI that is both pretty and functional and tries to cut down on the fear and the uncertainty related specifically to the restore, because in the middle of a restore, the last thing you want is to doubt that the archive you're using is the right one. Can I get an amen from Mr. Weibull? I could talk about the Web UI, but I'd rather show it to you. So we're going to do a live demo here, which, as we all know, having watched several talks, they never, ever have problems. Apparently I just did an update, that's cool. It's so pretty. It doesn't look anything like the old one, but we're going to go ahead and log in. I'm going to log in as an admin first. One of the things that I didn't actually put in the demo is we do support GitHub as a back-end authentication, and you can map your orgs and your GitHub teams into tenants, and you can assign roles based on team memberships inside your GitHub orgs. We also support Cloud Foundry UAA, of course, and obviously, because it's a great authentication system, and you can map your SCIM rights and your UAA groups into tenants so that you can have some semblance of parity between your Cloud Foundry orgs and spaces and your SHIELD tenants. As I mentioned, these things never go without a hitch. Where's my SHIELD and the Wi-Fi? You guys didn't tear down, though. Yeah. Okay, so, um, like we can do test dev live on conference? Yes. As I said, no hitches whatsoever. Everything is perfect. Pay no attention to the massive fires burning in Google right now. I have a Vault running so much. I have to change it because of my Vagrant setup, unless Vagrant's not running. So this was port equals 717.
No, it's not liking this schema. Let me make sure I'm up to date. I think I just need to recompile, because isn't this the thing, Dave, where we're not actually making SHIELD in the dev target anymore? Well, in the absence of that, do you have one? Where are my heroes, sir? It is. You're good. Hotswap, HA. Always have a backup. Even if that backup's name is the inimitable David Lowell. Right. Do you want to zoom with me? Always have a backup to your backup? They always say, hey, I'll trade you. Shield them in. It's probably the default. Oh, you're on... Yeah. 10, 200, 195, 15. 195, 15. Can someone get another laptop for XJ to hold? I would greatly appreciate that. 195, 15. Have we angered the AV equipment? I think we must have. Cool. Wow. So many hitches. So many hitches. Does everyone want to gather around? Sorry, this isn't a demo. This is a pre-recorded video where everything works without issue. Did I pull it out of the back of the... Unplug it from that side? It says DAC70. I think I broke anything. Can we get some more nerds up here to look at the AV? Say what? Have fun. Fear, uncertainty, confusion, of course. The nemesis of all presentations. I mean, I guess we could do questions now. If anyone has any. I'll repeat the question if anyone has any. Aside from why doesn't the demo work. Mr. Seravian. So we actually encrypt the byte stream as it comes out of the target plugin, right? Because SHIELD does not... It's really just a carrier of bytes. The file system plugin, for example, is tar-based. So when you say, hey, back up our /var/vcap/store, it's going to go through, find all the files, create entries in the tarball for directories, set the ownership permissions and the access rights and all that stuff.
And it's going to emit that tarball in plain text, at which point SHIELD, via a process called SHIELD pipe, which runs locally on the box doing the backups, will run that through CBC, cipher block chaining, and actually do the key and IV magic to encrypt that before the storage plugin sees it. And then, of course, the reverse happens with the decryption side. Oh, yes, I should talk about BBR. BOSH Backup and Restore, in case you haven't heard, is the way that the Cloud Foundry teams, the BOSH team, Concourse, et cetera, are packaging up backup scripts, systems, integrations that are aware of how the deployment operates. The example I like to use is the UAA database. So in Cloud Foundry, the UAA node will have a BBR hook script for doing a backup of the UAA database. And what that hook script does, at least last time I checked, is it would stop the UAA itself so that nobody's accessing whatever the backend database is, and then connect in using the same creds that the UAA is using. So all you have to do is say, hey, BBR, I need a backup of the UAA database. And what you get from BBR, just straight BBR, is then an archive on your machine. I believe it uses SSH and some BOSH magic to pull it off the box. You're then responsible for keeping track of that archive. You're responsible for putting it somewhere safe. You get to do the store and retrieve operations on recovery. Oh, cool. Likewise, with BBR restore, you have to give it that archive and it will, again, drop the UAA process, connect into the database, do the restore, and then bring UAA back up. BBR is great for integrating with BOSH releases that have BBR support, which is a bit of a tautology, but it's true. Until BBR gets everywhere, we still have a lot of play for SHIELD plugins. But we do have SHIELD integration with BBR. So if you want, we have two plugins, BBR Deployment and BBR Director, because we have to handle BOSH a little bit differently than most deployments.
And you can have SHIELD connect to the box that's doing the BBR stuff and say, hey, do BBR, but when you get that archive back, stream it back to me so I can put it in whatever storage I want. And then SHIELD will handle your retention policies and automatic expiry of your archives, purging them off. And then when you do a retrieve or recovery, it will go out to your cloud storage, pull whatever BBR gave it, now it will decrypt it and play it back through the BBR tool chain. And that way we kind of bridge the best of both worlds, right? You get the scheduling policy of SHIELD, because BBR doesn't have its own scheduler. You have to set that up. And you get the retention policies of SHIELD, but you also get the intimate tie-ins to your deployments that BBR brings you. This is up. Let me get on the VPN. And what was the final count? Three, four, four nerds to bring back there. And of course, use nerd in the best case. Is it straight HTTPS? Yes, okay. Got to love lab environments. All right. So, pretending none of that happened. Boom, boom! Admin? There we go. All right. So, I guess we get to go through a full-on setup. So this is the new Web UI. Up across the top, I'm trying to make this a little bit bigger without destroying the... going into phone mode. So across the top, you've got identity. This is the alpha SHIELD. So if you run multiple SHIELDs, as I know several people we work with do, you can kind of keep, you know... Was I in Amazon West? Or was I in the Azure one? Or was I in vSphere Dev? Because those are very different environments. And then it gives you this lovely little heads-up pane that shows you fun things like the IP address, the name, the version of SHIELD that you're running. Actually, no, we don't do that here. That's in Admin. The middle pane there, it says there's a big red angry thing. It says SHIELD is uninitialized. That is a quick heads-up on the health of your SHIELD environment.
It shows you things like whether or not cloud storage that's configured is working. Because we now test that in v8, where we'll run just a sample. Hey, can I put something in S3? Has someone rotated the creds on me and this isn't working anymore? So that you find that out at 9 o'clock on Friday morning instead of 3 o'clock Saturday morning. A key difference in the on-call there. Also, all jobs succeeding, to show you, hey, there might be something wrong with one of your backup jobs. It's not pinging properly. Over here on the far side, we've got the data protection summary that shows you how many backup jobs you're running, how many archives you've created, how much size and space you're using on your blob store. And then the daily storage increase is a nice little, am I adding more to my backups in total, if I take into account purgation, retention and expiry, and net new data. And then all this other stuff down here is mostly admin stuff. So we're going to go ahead and initialize the SHIELD. Pick our new master password. It's master. We'll initialize the SHIELD. SHIELD gives you this fixed key. Just a quick note. When you're trying to restore in a disaster recovery scenario, having a Vault hold all of your backup encryption keys is a bad idea, since the encryption keys are in the backup that you need to use to decrypt the backup that has the encryption keys. So what we do is we generate this very random fixed key that you keep somewhere else, and then you can tell SHIELD this is a fixed key backup, so don't go creating a new key and IV. That's mostly for SHIELD itself. Thanks. So now we're all healthy. Everything's good to go. I'm going to quickly attempt to create a tenant. And we're going to call it Stark and Wayne. And I'm going to go ahead and invite myself as an admin to make life easier. Now I'm going to log out, log back in. Now I have a tenant, this Stark and Wayne tenant, and I can access these guys.
And I can see my systems, my configured storage, and my retention policies. Over here on the sidebar, we have a whole bunch of wizards for things like, I need to run an ad hoc backup. I just need to know what data system you want me to run against, and we'll just do what we need to do to get it into storage. Restoring data from a backup is one of the biggest areas we've improved with Shield V8, and that's to remove that fear and that uncertainty. That wizard works and we literally don't have time to go through it. The way that wizard works is it shows you the data systems. It says, do you want to back up or restore Postgres? Do you want to restore CC database? Something else. And then when you click on that, the next step is here's all the archives. Here's the most recent. Here's when I took it, how big it was. And then a big blue button that says, this is the one I want, because nine times out of ten, the backup you want is the most recent backup, not the fourth from the most recent. But you can choose those if you're into that sort of thing. And then configuring a new backup job takes you through just a nice little wizard of where's your data, how often should I back it up, how long should I keep that, and where should I keep it. And then there's this lovely timeline view that I hope you all get to experiment with where you can annotate task logs and you can say this backup failed because the storage team revoked our creds for some reason. So don't worry, we went ahead and manually ran it, so the next one up should be successful, and that way you've got some history and audit logging going along with your backups because the backup system is not something you need to be staring at day in, day out, and you need to be able to account for any and all failures that it's reporting to you. So I wish we had some more time. I believe I'm officially three minutes over. 
If there are any questions, feel free to stop by the stage, or we can go out in the hallway if we get kicked out of the room. I'd be more than happy to show you something and we can hack on some things, talk, answer questions, whatever. As I said, I'm James Hunt, I work with Stark and Wayne, and that was SHIELD.