Hey, everybody. Thanks for coming. I'm Josh Krock. I work for Pivotal. I'm one of the solutions architects on the customer success team. So my job is actually the coolest job in the world. I spend about half my time with customers, getting CF up and running and just helping people be successful. Sometimes it's app devs, sometimes it's operations. And the rest of my time is spent working on Cloud Foundry to make it better, taking what we learn in the field and putting it back into the product. And that's really where this whole story about using service brokers to manage data, instead of just connecting to it, came from. My third or fourth week on the job, I was with a customer in Seattle, and we started to hear this need: I've got data already, and I need you to connect to it. Which is great. But then we started to talk about lifecycles. And we hit some problems: we can migrate an application through its lifecycle, but we can't necessarily help it on the data side. So to set that up, I want to talk about what data looks like. Those of you who operate your apps probably already understand some of this. Some of you don't, or it's hidden and you kind of forget, right? But in the first 12 hours of an application's life (and this came from my old job at a backup company; this was a big retail company's backup schedule), you end up taking snapshots hourly, and you end up making a DR copy that's maybe an asynchronous replica, but it's something that's far away. You end up doing a backup. And then you get to 24 hours, and these things start to double. You've now got two backups. You've got another 12 snapshots. So you've got a big pile of data that relates to your application at certain points in time. And if you expand this out over a year, you end up with 5,476 copies or something like that. I think that number's wrong, but it's close. And these aren't full copies, but they're still copies.
There's an entity out there that represents your application, or some portion of your application. And there are a lot of them. They're hard to manage, right? The problem isn't the capex. The size of the copy is almost irrelevant; just go buy another array, right? Every time I get to a customer site, there's this initial reaction of, oh, this costs a lot of money, and then the next question is, how can I capitalize this? The real problem with all these copies is that they're managed by three systems. You've got a storage subsystem, an array or a JBOD or whatever you're keeping your data on. You've got a backup. And you've got a database, an application-level entity that understands your apps, your stuff. And then you've got a whole bunch of teams involved in these copies. All these groups; maybe this maps to your org, maybe it doesn't. Opex is really the difficult part of this sprawl of data in the environment, right? You can't buy more people to scale with the amount of data you create. There's another problem here, too: we make all these copies, and when we talk about what we use them for, they're all recovery, right? So most of the time, they're doing nothing. They're just sitting there, hanging out. So that's the first part, whatever you want to call it, the assumption: there's a bunch of copies out there, not doing much. This is Cloud Foundry, right? We don't often talk about backup, and we don't actually talk a ton about data other than that we're going to connect to it and consume it. But I think we all know: to write good code, we need good tests. And to write good tests, we need good data. We need data that represents the real world. Mock data will get us quite far, but it won't get us all the way there.
And when I talk to customers, especially customers that are transitioning from client-server apps into kind of cloud native, they have a problem where they've got big systems that need to interact, and they need to be able to reproduce the entirety of the system to test it. So we have to get them a copy. And so I have a little play here. This is a real-life experience from my old job, when we tried to figure out how to do this. You start with a conversation about how do I get a copy of our production data? And the first thing somebody always says is, we don't have any copies of that. And then we all kind of look at each other, and we realize they might be right. We don't actually know. Some time goes by, and we finally figure out through the organization who the operations team is, and we get to a situation like this: yes, indeed, we do have a copy of the production data. You can't have it. Some more time goes by and you get here, right? Everybody's decided to work together, so file a ticket, or file seven tickets, right? Because it's not enough to have a copy. You've got to have a machine to put it on, and you've got to be able to bring it up, and all that stuff. It's the IT process we're used to for launching a new VM, now applied to a copy of stuff. So obviously, I think we've created a new problem here. Once you find a copy, you've got to do certain things to it, right? You don't just get handed the keys. You've got to size it. And I'm going to get on my soapbox for a minute here. We spend a lot of time trying to size copies of data for big things like regression tests. This was a really common thing we used to do at my old job. We tried to make our data sets smaller so things would go faster. And we ended up with problems like this. We tried to curate the data, but we don't know what we don't know about our data, so we don't know if our tests are any good anymore. We also restricted our tests to what we left in the data.
And the other thing that really jumped out at me as we went through this process is that we were able to hide performance issues. We missed an index on a field and didn't catch it until production. So sizing: maybe it's not worth it. There are some things you absolutely have to do, though. We have to sanitize data. We cannot put credit card numbers and social security numbers and other sensitive information into tests. The other thing we have to do is delete data. Data rots; it gets old. It makes our tests irrelevant if we hang onto it long enough. Prod and whatever your copy is will drift if you don't delete it and refresh it. So, right, we start over. Curation is a really expensive problem. Every time I work with a customer, I encounter people whose job it is to curate test data. That's the good side. Or I encounter customers who have test data that is three or four years old, and they've been stuck in QA for a year and a half. So, what can we do? I think we can let CF manage the data with service brokers. We have a really good hook in a service broker. We have spaces. We have something that gets created in our space. Can we inject one of those copies into our space as we move apps through their deployment lifecycle? When we do this, we have to remember that our backup copies, any secondary copy, probably exist for a reason. So we can't hurt them. We need to come up with a way that we can leverage this copy without destroying it. I think there's a reusable pattern, right? I just alluded to it. It looks something like this. And I've got a demo of this. So, I've gone ahead and, oh, right, wavy part in the middle. I've got a demo of this where I'm running Postgres inside of a VM on AWS. And in the process of provisioning the service, we'll go ahead and clone that VM, bring it up, and connect to it. And then our test data will be able to operate. There are other ways to solve this problem, right?
There are lots of other enterprise products out there: Actifio, Delphix, EMC. Lots of people make products that are supposed to do this. They will manage a data source for you and expose a data source to you, in a form factor you can consume. That's great. It's probably a faster solution. But you've got to ask yourself, does this work for you? And does it work for you here, in the context of our Cloud Foundry applications? Can I consume it as a service? Or you've got to look at building something. And this is definitely the approach I've gone down: leveraging some IaaS primitives in order to build, essentially, a clone of a VM. So, I want to show you what this looks like. And then, with any luck, we'll get through the demo and Amazon will play nice today. I've got a pretty typical layout, I think, for a demo environment. If I run cf spaces, I've got a development space and a production space. And we'll start in prod. I've already pushed an app up here that's connected to my production data source. You can see we've got a demo app, and we've got a Postgres database exposed to it from our special service broker here. One of the things we can do is take a look at that. The app is pretty simple. It grabs the last 10 records in the database. We've got just over 430,000 rows, so we've got a decent amount of data here. This is certainly not production anything, but it's big enough to not be a toy. So, if we pretend we're the deployment pipeline, we've got a new version of code, we want to test some things, and we want to walk through all this. The first thing we do, or a tool would do, is target development. And so we've got an empty space here. We've come into nothing. So we can say cf create-service. Actually, sorry, let me step back. First we can see what's exposed to us.
Our platform admin has given us something that exposes a Postgres running in Amazon, and they've given us our production copy. That's what we saw in the other space: I'm connected to my production data there. And now we can create a copy of that. So we go ahead and kick this off, and then I'm going to flip over to the logs, and we're going to do that demo where we watch Amazon do stuff. Nope, I'm going to flip over, because I don't trust my logs thing. Okay, sorry about that. What we'll see as we tail the logs is that I now have a service broker running inside of Cloud Foundry, and I've requested: please make me a copy of our production data. And what we'll see is that it's now going out and talking to Amazon. Sorry, everyone, I apologize. Here we go. We came to the party a little late; I'm totally floundering, apologies. What happened here is that we reached out to Amazon and said: Amazon, please create me an AMI of my running instance. And what Amazon does on our behalf is it goes and takes a snapshot. So now we've effectively got a backup copy. We take that backup copy, and then we tell Amazon: run this thing. Create us a new VM from it. And we can see now that Amazon has given us a new VM instance, and it's starting to transition its state to running. Excuse me. So, at this point, we've got a running VM, and we had to go out and apply an IP to it. We've added an elastic IP to the box, so now we can reach it from the outside world. And the service broker knows that it's got to wait for this thing to boot before it can do anything with it. The contract we expose as a service broker for create-service is: give me this thing back when I can consume it, when I can bind to it. So we're using the async service broker API right now. We're essentially just waiting for Amazon. We've told Cloud Foundry: we're going to do this in the background; come back and check with us.
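The async flow just described (the broker kicks off the clone, tells Cloud Foundry it's in progress, and the platform polls until the copy is usable) can be sketched roughly like this. This is a toy in-memory illustration, not the actual broker code; FakeCloud stands in for AWS, and all names here are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the async create-service flow: the broker starts a
// clone, reports "in progress", and Cloud Foundry polls until it's done.
// FakeCloud stands in for AWS; names are illustrative, not the real broker.
public class AsyncProvisionSketch {

    enum State { PENDING, RUNNING }

    // Pretends to be the IaaS: an instance stays PENDING for a few polls,
    // then transitions to RUNNING, like an EC2 instance booting.
    static class FakeCloud {
        private final Map<String, Integer> polls = new HashMap<>();

        String cloneInstance(String sourceId) {
            String newId = sourceId + "-clone";
            polls.put(newId, 0);
            return newId;
        }

        State describe(String id) {
            int n = polls.merge(id, 1, Integer::sum);
            return n < 3 ? State.PENDING : State.RUNNING;
        }
    }

    public static void main(String[] args) {
        FakeCloud aws = new FakeCloud();

        // create-service: start the clone and return immediately (202 Accepted).
        String instanceId = aws.cloneInstance("i-prod-postgres");
        System.out.println("provisioning " + instanceId + " in progress");

        // Cloud Foundry's side: poll last_operation until the copy is usable.
        while (aws.describe(instanceId) != State.RUNNING) {
            System.out.println("last_operation: in progress");
        }
        System.out.println("last_operation: succeeded");
    }
}
```

The important part of the contract is that the broker never blocks Cloud Foundry: provisioning returns immediately, and the platform keeps checking back until the broker reports success.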
And we'll see that as Amazon does its thing and we spin up, the service broker will just come back and poll. We also did something else interesting here. As I brought up this copy, I knew I had to sanitize it. So we were able to run a script before we handed over control of the test database to Cloud Foundry. So, it's up, it's running. We've started the instance successfully, and we can connect to it. The service broker, while it still has control, before it's handed control over to the application or a consumer, goes ahead and runs a sanitize script. The sanitize script we have here is fairly basic: we just set all of the credit card numbers we saw before to zeros. Theoretically, this changes our data just enough that our tests can proceed, but we're not exposing anything risky. And then we're done. We hand back control, and we should be able to see our app now. Or our service, excuse me. Okay, so Cloud Foundry has polled again and seen that our create-service command has completed successfully. I'm going to push an app, and we're going to go over and look at AWS and see what's happened. We have a very simple app here. Our manifest just binds it to the service we created. So, if we look at AWS, this is what we started with: our one instance running Postgres. And this screen was last updated before I ran that create-service command. So now I can refresh, and we'll see we've got another instance here. That instance is up and running. It's got an elastic IP associated with it. And if we were to click through the console, we'd see all the other artifacts associated with it. We've now got an AMI, which maps to that instance at a point in time. You can see this is what we created. You would also have a snapshot that maps to this, and all the other artifacts involved in AWS. And that's great.
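The sanitize step in the demo was a script that zeroed out card numbers before handing the copy over. The same masking idea can be sketched in plain Java against in-memory rows; the real broker runs a script against the cloned Postgres instance, and the row layout here is purely illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the sanitize step: zero out credit card numbers before the
// cloned copy is handed over to the application. The demo did this with
// a script against the clone; this does the same masking in memory.
public class SanitizeSketch {

    // Replace any 16-digit card-number-looking value with zeros.
    static String maskCardNumber(String value) {
        return value.replaceAll("\\b\\d{16}\\b", "0000000000000000");
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "alice,4111111111111111,WA",
            "bob,5500005555555559,OR");

        List<String> sanitized = rows.stream()
            .map(SanitizeSketch::maskCardNumber)
            .collect(Collectors.toList());

        sanitized.forEach(System.out::println);
    }
}
```

In the demo, the equivalent work would be a single UPDATE in the sanitize script setting the card-number column to zeros (table and column names being whatever your schema uses).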
So now we've got a copy that's up, and as soon as our app push is done, we'll be able to interact with it. But we've now got kind of an interesting problem. It goes back to where I asked: did we create another problem when we started making copies of things? How are all these artifacts managed? I can't just leave all this hanging out. But for the moment, let's just assume that was good enough, and we can say cf open demo. And this is our new app. And we see pretty much what we expect: I have a new copy of my data. My data's still got 430,000 rows, the same as prod. So production data just made it into prod. I'm sorry: production data just made it into test. And we sanitized it before we gave it over to the application. Everybody with me? Cool. So that's great, but we need to be able to reverse this process, right? For this to work at scale, I need to be able to say cf delete, and then I need to be able to delete my service. So now we can pretty much watch this process in reverse. We're going to reach out to AWS and say: terminate this instance. And then we're going to go down the list and ask it to terminate all the artifacts behind the instance. When this is done running, we're left with a clean space; we're left with what we started with. So there are some pretty interesting implications to this, if we look at it as a way to build test environments to move apps through. The first thing is, we definitely took advantage of everything Cloud Foundry gives us, in that the deployment environment for our application remains the same. All that really changed was the space. My apps are connected to the same service. We're still binding via the name of the service; in this case, it was Postbroker, or Postgres. We specified it in the manifest.
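The delete-service path just described is the provisioning process run backwards: every artifact the broker created (snapshot, AMI, instance, elastic IP) has to be cleaned up so nothing is orphaned. One way to sketch that, with made-up artifact names and a stack so teardown unwinds in reverse creation order:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the delete-service path: every artifact created during
// provisioning is recorded, and teardown unwinds them in reverse
// creation order so nothing is orphaned. The artifact IDs here are
// illustrative; the real broker tracks actual AWS resource IDs.
public class TeardownSketch {

    private final Deque<String> created = new ArrayDeque<>();

    void record(String artifact) {
        created.push(artifact); // most recent on top
    }

    int pending() {
        return created.size();
    }

    void deprovision() {
        while (!created.isEmpty()) {
            // In the real broker this would be an AWS API call per artifact.
            System.out.println("deleting " + created.pop());
        }
    }

    public static void main(String[] args) {
        TeardownSketch broker = new TeardownSketch();
        broker.record("snapshot snap-123");
        broker.record("AMI ami-456");
        broker.record("instance i-789");
        broker.record("elastic IP 203.0.113.10");
        broker.deprovision();
    }
}
```

Recording artifacts as they're created, rather than enumerating them at delete time, is what makes it safe to say "we're left with what we started with" at the end.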
The other interesting thing about this to me is that my service actually stayed the same. What better way is there to have production parity than to actually clone the thing in prod? I realize that if you have an Exadata, you can't clone that rack. But you can ask Oracle to clone your database. We can ask VMware to clone an instance that's providing a service to us. OpenStack, right? And so on, ad nauseam. All of our IaaS tools, our production arrays, all these big things we have in our enterprise, are generally capable of providing us a space-efficient copy of something. Something which I can take. And maybe it was provisioned by a backup application; maybe we didn't actually have to cause the copy creation to happen. But I can get hold of it, and I can write to it. And I can write just the changed blocks off somewhere, like a writable snapshot. And then I can throw them away when I'm done. So that makes our service provisioning really fast. I just walked us through all this, and this is written against AWS, but it could just as easily be written against vSphere. It could just as easily be written against Oracle. And this is where the talk goes from, like, hey, this is a Cloud Foundry Summit talk, to, hey, this is a cry for help for an open-source project. So we're good here. It's gone. And if we go back into AWS (whoops, let's go back to instances), we'll see that we're cleaned up. Our artifact is gone. Cloud Foundry's cleaned it up. So I think this is really powerful, just to hit it home: we're now tying our data's lifecycle to our application's lifecycle. There were no tickets. There was no pre-provisioned thing out there that I had to deal with. I was able to create it on the fly and on demand. The last little bit I want to cover is the sanitize script, and then I'm going to show you two interfaces for how we extend this thing, and beg for help.
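The writable-snapshot idea mentioned above (keep the shared base copy read-only, write only changed blocks to an overlay, throw the overlay away when done) is what makes these clones space-efficient and fast. A toy copy-on-write sketch, purely to illustrate the mechanism; real arrays and hypervisors do this at the block layer:

```java
import java.util.HashMap;
import java.util.Map;

// Toy copy-on-write "writable snapshot": reads fall through to the shared
// read-only base unless a block has been overwritten; writes land only in
// a per-clone overlay, which is discarded when the clone is deleted.
public class WritableSnapshotSketch {

    private final String[] base;                 // shared, never modified
    private final Map<Integer, String> overlay = new HashMap<>();

    WritableSnapshotSketch(String[] base) {
        this.base = base;
    }

    String read(int block) {
        return overlay.getOrDefault(block, base[block]);
    }

    void write(int block, String data) {
        overlay.put(block, data); // only the changed block is stored
    }

    int changedBlocks() {
        return overlay.size();
    }

    public static void main(String[] args) {
        String[] prod = {"cust-data", "card-4111111111111111", "orders"};
        WritableSnapshotSketch clone = new WritableSnapshotSketch(prod);

        clone.write(1, "card-0000000000000000"); // sanitize touches one block
        System.out.println(clone.read(0));        // unchanged, from base
        System.out.println(clone.read(1));        // from overlay
        System.out.println(clone.changedBlocks() + " changed block(s)");
    }
}
```

Notice that sanitizing a 430,000-row copy only costs storage for the blocks the sanitize script actually touches, which is why provisioning and deleting these copies can be cheap.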
You saw earlier that I called out a sanitize script for the service broker. So, the service broker is just another Cloud Foundry app. Those of us who have written service brokers in the past understand that there are five interfaces you implement in a service broker. As long as you fulfill the contract, you can run that thing anywhere; you just tell Cloud Foundry about the endpoint. So we deploy ours into Cloud Foundry. And our service broker presents a pretty simple interface that says, number one, log in, because you don't want everybody to be able to specify this. And then you just specify the sanitize script. This is what we ran against that data as we provisioned it. When we first had this idea, we were talking, and we thought, this is crazy, right? This whole thing sounds really hard. Can we make it work? And that's kind of how I got here. I was like, well, why not? Let's experiment. We think we can. So I spent a lot of time thinking about this, and about how to make it flexible. And there were a couple of obvious answers. We use Java, because Java probably connects to your legacy data source somehow; there's a driver, there's something. We use Spring, because Spring lets us inject a whole bunch of stuff. If we actually look at the service broker, we see that we just (target, oops) inject all of the variables the service broker needs to do its job. It interacted with AWS, so it needs a key; we interacted with some data source, so we need its URI and some login credentials, all that kind of stuff. We're able to give the service broker enough information to do its job. And we're able to do that separately, because of the way you can deploy applications in Cloud Foundry: the service broker is deployed to a space where only your admin or your DBA has control of those things. So we can still maintain separation of concerns where organizations need that.
So, to do that, there's a body of code out here that I'll put the URL up for. And there are two interfaces that we're asking people to implement. We. Me. There's a copy provider, the thing that makes the copy. This is what interacted with Amazon in the example you just saw. And it's a pretty straightforward thing. We've decided that there's an opaque instance ID, which you need to talk to me in terms of. In vSphere, this would be a VM GUID; I think in OpenStack they use UUIDs to talk about VMs. Maybe it's a DSN. Whatever; give us a string. Go out and create your copy, and the contract is: you're done. Delete is simply the reverse of that: go out and delete this thing, and come back when you're done. These are all implemented synchronously; the broker handles all the async for you. And then, lastly, we have an operation to get credentials. You're going to create a copy, and there's going to be some change in either the URI to address this thing, or the usernames and passwords, depending on what you implemented in the broker. So that gets your copy up and running. And then you need to run the sanitize script against it. And that's where we end up here, with a pretty clean interface that just runs a sanitize script. We're going to pass it to you as a string; do what you need to do. So that's it. The last thing you can do in here is change the password, so I don't have to expose the production database password to my test instances. You just saw where I ran cf env and pulled back a whole bunch of stuff. One of the things we pull back in there is VCAP_SERVICES. So you can change the login creds while you're doing your sanitization. That's one of the best practices I would suggest. But that's really it. It's that simple. There are what, four, five APIs, something like that. And it seems pretty flexible. And so this is one of the things that I'm looking for validation on. Is this something that makes sense?
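To make the ask concrete, the two extension points just described might look roughly like this. The signatures are paraphrased from the talk, not copied from the actual repository, and the in-memory implementation exists only to show the contract:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the broker's two extension points as described in the talk:
// a copy provider (create/delete a copy, hand back credentials) and a
// sanitizer that runs a script against the fresh copy. Signatures are
// paraphrased from the talk, not copied from the actual repository.
public class BrokerSpiSketch {

    // Everything is keyed by an opaque instance ID string: a vSphere VM
    // GUID, an OpenStack UUID, a DSN, whatever your backend uses.
    interface CopyProvider {
        String createCopy(String sourceInstanceId); // synchronous; broker handles async
        void deleteCopy(String copyInstanceId);
        Map<String, String> getCredentials(String copyInstanceId);
    }

    interface Sanitizer {
        void sanitize(String copyInstanceId, String script);
    }

    // Trivial in-memory implementation, just to show the contract.
    static class InMemoryProvider implements CopyProvider, Sanitizer {
        final Map<String, Map<String, String>> copies = new HashMap<>();

        public String createCopy(String sourceInstanceId) {
            String id = sourceInstanceId + "-copy";
            Map<String, String> creds = new HashMap<>();
            creds.put("uri", "postgres://" + id + "/db");
            creds.put("password", "prod-password");
            copies.put(id, creds);
            return id;
        }

        public void deleteCopy(String id) { copies.remove(id); }

        public Map<String, String> getCredentials(String id) { return copies.get(id); }

        public void sanitize(String id, String script) {
            // A real implementation runs the script against the copy; rotating
            // the password here means tests never see the production password.
            copies.get(id).put("password", "rotated-" + id);
        }
    }

    public static void main(String[] args) {
        InMemoryProvider provider = new InMemoryProvider();
        String id = provider.createCopy("prod-postgres");
        provider.sanitize(id, "UPDATE customers SET card = '0';");
        System.out.println(id + " -> " + provider.getCredentials(id).get("password"));
        provider.deleteCopy(id);
    }
}
```

Note that the sanitize step here also rotates the password, matching the best practice mentioned above: the credentials that end up in VCAP_SERVICES for test apps are never the production ones.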
Is this something people need? Certainly I hear it from the customers I interact with, but in a good year I get to interact with maybe 20 customers. There's a lot more represented in this room. So that's what I'm looking for from you. Is this a path that makes sense for service brokers? Is this something we should continue to look at, that I should continue to look at? Questions? This is where it lives. Ideally it'll go into the incubator if there's value in the idea. But yeah, that's it. That's the whole thing, presentation and demo. I'd love to answer some questions or, like I said, take feedback. Does this meet needs? Is it totally crazy? Yeah, over here. So from the marketplace, you create a service and attach it to an application. How can you configure where to store your backups? Can you extract your backups and store them on different media, like in a block store or anywhere else? So I would pass that off as an operational question: what does the operations team want to do? Tell us where the backups are, and let's write the service broker to consume those backups. So for instance, if you use NetBackup, or Tiblea, or TSM, or S3, inject that into the broker so the broker knows where to go get things. Today I showed you a broker that took a snapshot of a VM. You would have to rewrite that create-copy portion to interact with a backup application, or to interact with S3. But one way or another, you inject the coordinates and the smarts into the implementation of the broker. So the demo was on AWS, and you said it could in theory work with, say, vSphere. Is it currently set up to do that, or is that where you're asking for help? So, actually, my buddy got a vSphere environment provisioned for me today inside of Pivotal. So that's probably what I'll spend the rest of the week doing. Man, if you can get there first, I'm totally happy to work with you. That is my next thing: vSphere, and then something kind of legacy, Informix maybe.
Because, right, it's one thing to do it with AWS and Postgres; it's quite another to do it with the stuff we all have to live with day in and day out. That's on my list of things to do. And as far as feedback goes, I know that in our organization we were talking about this exact need, so it looks spot on to me. Okay, cool. Thanks. I can't see anything. I think that's it. Cool. Thanks, everybody. I'll go take whatever's left of this. I'll be back.