Hello, hello, is this thing on? Oh, it is, excellent. All right, welcome everyone, thanks for coming. My name is Ted Young. I'm the lead on persistence at Cloud Foundry. A year ago, I came and spoke at this conference about a proposal to add persistence to Cloud Foundry, and I'm happy to say today that we've gone and done it. So I'd like to welcome onto stage my co-workers, Pramod Mandagere from IBM Research and Paul Warren from EMC. We're going to talk to you about the persistence options we've added to Cloud Foundry today and the ones that are coming soon. All right, Pramod.

Good morning, everybody. A quick outline for today's talk. First, we'll talk about why we need persistence for CF applications, and by this I mean native persistence. We'll back it up with a few use cases that we have come across from multiple customer scenarios and multiple application use cases. We'll follow that with the architecture we came up with, then go through a couple of walkthroughs, one for WordPress and one for a sample big data analytics application, and finish with futures and comments.

So Cloud Foundry is a very good runtime for web applications built using the 12-factor paradigm. The CF runtime manages application instances: it manages cells and VMs and deploys the application instances in isolated containers using buildpacks. There is an ephemeral file system available inside a container which is at the disposal of the application, but as the name suggests, it's ephemeral. There are no guarantees on this file system, and it's local to the container. If you have multiple instances of your application, chances are they're deployed on different cells, and this local ephemeral storage is not shareable across instances, so there's no common namespace.

The only way to do state persistence in Cloud Foundry today is via the services route. You have a bunch of different types of services available in Cloud Foundry today. There are relational services like DB2 and Oracle, there are NoSQL engines like Cloudant, and there are object storage services like Swift, which you can use for persistence. But the key point to note here is that the services themselves don't run on the Cloud Foundry runtime. They run on other runtimes externally or directly on infrastructure.

If you look at the current model, and then look at typical legacy applications, which have a lot of file system dependencies in them, porting them to Cloud Foundry is not a trivial task. Eliminating all of the file system dependencies in your code is quite complicated, and often you're better off rewriting the entire application from scratch, but that is not feasible for all application scenarios. And when it comes to 12-factor conformance, achieving the last mile is the most crucial and non-trivial part, and that makes porting existing legacy applications to Cloud Foundry a really arduous task. Further, most applications use a lot of third-party dependencies and libraries, accrued over time for different purposes, and eliminating file system dependencies from them is often a non-starter for porting applications to Cloud Foundry. The other type of application, composite applications, are nowadays composed of multiple microservices, each deployed on a different runtime.
You might have some microservices running on Cloud Foundry and others running directly on infrastructure providers, and in these cases porting the existing data sets, which are the heart of all these applications, becomes a very arduous task. This is unstructured data; it's not always possible to put it behind an API, and the file system is the only common way of sharing it across runtimes. You could potentially do an ETL phase where you transform all the file-system data into objects or NoSQL, but this is often very arduous. And often not all data is suitable for NoSQL, object, or DB backends; it's so unstructured that a file system is the best choice for maintaining it. Further, it's not just a matter of doing a one-time ETL at the beginning of your deployment phase. Chances are that, since your deployment spans multiple runtimes, you need to be in a constant cycle of doing this ETL on a day-to-day basis. If your data changes in one runtime, you need to do the ETL from file system to objects or NoSQL engines again, ingest it into Cloud Foundry, and then consume it in your Cloud Foundry applications. So it's not a one-time ETL process, it's a constant process for a lot of applications, and this is a nightmare. Such applications typically cannot be ported to Cloud Foundry today.

There is, however, one alternative to overcome these limitations, and that's using FUSE-enabled file systems. What you do in this case is use user-space file systems like SSHFS to mount remote file systems into the container directly. By doing this, you can access external remote storage as a file system, your applications don't need to eliminate their file system dependencies, and they can be readily ported to Cloud Foundry. However, this comes with a bunch of challenges. First and foremost is security. When you're trying to do a FUSE mount inside a container, your container needs to be running in privileged mode, because otherwise containers don't support mount commands. This is often a non-starter, because running containers in privileged mode opens you up to a lot of security challenges in a provider runtime. The second challenge is concurrency. When you're doing FUSE mounts and you're mounting the same file system across multiple containers, there are no consistency guarantees achievable in such systems. The third important point is performance. Since you're using a user-space file system, it cannot match the native file system access you get from traditional file systems. So these limitations make the FUSE-based approach a non-starter for quite a few situations.
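To make that FUSE workaround concrete, here is a minimal sketch of what such a mount might look like from inside a privileged container. The remote host, export path, and key file are hypothetical, and the exact commands were not part of the talk:

```
# Hypothetical sketch of the FUSE workaround: mounting a remote directory into
# the container with SSHFS (a user-space, FUSE-based file system).
# This only works if the container is running in privileged mode.
mkdir -p /home/vcap/app/shared-data
sshfs storage-user@remote-nas.example.com:/exports/appdata \
      /home/vcap/app/shared-data \
      -o reconnect -o IdentityFile=/home/vcap/app/keys/id_rsa
```

Every application instance would have to perform this mount itself, which is exactly where the privileged-container requirement, the lack of consistency guarantees, and the FUSE performance overhead come from.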
With all these in mind, we started designing capabilities in the runtime to support cluster file systems and different types of persistence mechanisms, and Paul will take us through the design of this.

Yeah, thanks, Pramod. So design goals, let's have a look at those. We knew right up front that we weren't going to build hooks for all of the storage systems out there, and neither did we want to. So the primary design goal was to be open and extensible in the solution we provided. We also decided, for a couple of reasons, to surface this through the standard service broker semantics and model. Two main reasons there: first, we wanted it to be familiar, just like attaching any other service to your app. And secondly, we know that even within the same category, all of these storage systems have subtleties in the way they work, so we wanted the app developer to be cognizant of the type of volume he was attaching to. And obviously we wanted to solve that security problem that Pramod talked about: we wanted to do this so that the containers didn't need to run with any special elevated privileges but could still mount volumes.

So implementation, how did we actually solve that? Looking around the community, there was no standards body out there for volume mounting, but Docker had done quite a lot of work in this area, and having a look at their stuff, they had a fairly nice volume API and a nice ecosystem of volume drivers written to that API. So we decided to take the Docker volume plugin API and implement a volume manager against it. The service broker was pretty good as is; we didn't really need to make any changes apart from one area, where we extended it a little bit so that it could incorporate volume mount instructions. In Diego, for this first phase anyway, we only had to make changes in a couple of areas. One, we had to add a little bit of volume information to Diego so that the auctioneer could be a little bit clever about where it placed apps on cells; clearly we don't want it to place an app on a cell that doesn't have the right volume drivers for the type of volume the app needs to mount. And secondly, we had to make Diego do the mounting itself, in a couple of respects: it needs to mount onto the cell first, and then it needs to take that volume and mount it into the container.

A picture paints a thousand words, so let's have a look at that. Here's a rather simplified version of the Cloud Foundry architecture; don't pick me up on this, it's simplified. So let's see what bits we added. First and foremost, we've got the storage sitting outside of Cloud Foundry, some sort of volume service. It could be EMC ScaleIO for block, it could be EMC Isilon OneFS for NFS, it could be IBM Spectrum Scale. We chose to build a reference implementation, so there's something that works out of the box, and we chose Ceph for that because it provided what we needed from day one and actually provides a bit of what we need moving forward. As I said, we surface this through the standard service broker semantics, so we front it with a volume-specific service broker; in modern lingo, that provides the control plane for the storage system. Then we added a volume manager onto each cell, and we call this thing Volman. Volman doesn't actually talk to the storage systems itself, because, again, open and extensible. Instead we add volume drivers: we co-locate volume drivers onto the cell at the point you install Cloud Foundry, and those things, as we've already talked about, are exposed via the Docker volume API. So Volman talks to those drivers using Docker volume API calls. Those are the pieces we add.
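To make that driver interface concrete, here's a rough sketch of the kind of request Volman might send to a co-located driver. The driver name, socket path, and volume name are made up for illustration; the endpoint and payload shape follow the Docker volume plugin API that the talk references:

```
# Hypothetical sketch: Volman driving a volume driver over its unix socket
# using the Docker volume plugin API.
curl --unix-socket /var/vcap/data/voldrivers/cephdriver.sock \
     -H "Content-Type: application/json" \
     -d '{"Name": "service-instance-1234"}' \
     http://localhost/VolumeDriver.Mount
# A driver typically replies with the path it mounted on the cell, e.g.:
# {"Mountpoint": "/var/vcap/data/volumes/service-instance-1234", "Err": ""}
```

Because any driver that speaks this API can be dropped onto a cell, the same mechanism works for Ceph, NFS-backed systems, or anything else a vendor ships a driver for.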
How does it really work? Again, reinforcing the point that this is just another service, but it's a little bit of a special service. Okay, so your developer needs to bind his app to some volumes. He just issues the same cf create-service call, right? Cloud Controller receives that and forwards it onto the volume-specific service broker, the one that can handle it, and that does whatever it needs to do to create a service instance. In our reference implementation for Ceph, for a couple of reasons, we just made it do something really dumb: it creates a top-level folder named using the service ID. But we envisage production versions of this doing something slightly more sophisticated than that, obviously.

Then, when you actually wanna bind it, you just issue a cf bind-service call. Cloud Controller receives that and forwards it onto the volume-specific service broker. The best way to think about this is that, in response, the volume-specific service broker issues one or more volume mount instructions back to Cloud Foundry. Cloud Foundry then takes those and does a couple of things. First, the BBS talks to the volume manager, and the volume manager talks to the volume-specific driver on the cell to perform the actual mount onto the VM, onto the cell itself. And then secondly, the BBS instructs Garden to take that volume mount we just did at cell level and mount it into the container that's gonna host the app you're pushing. And that's pretty much it. So you can see it's a fairly simple picture, but whilst it's simple, we think it caters for all of the current use cases and all of the future use cases we've got moving forward. I'm gonna hand you over to Ted, who's gonna talk a little bit about those.

Yeah, thanks, Paul. So let's look at a couple of specific examples to really drive home how this works exactly the same way other services work. Let's talk about shared storage. This is the most obvious example. You have an application, let's say it's a large WordPress application. You have a very high-traffic blog, and you'd like to run WordPress on Cloud Foundry, but you'd like to use it the way WordPress was intended to be used, which is to have a site administrator able to install themes and plugins directly through the admin panel, rather than having a developer do that for them. You'd also like to dynamically scale the write load. In other words, it's not enough just to cache the responses from WordPress. You also have a lively comments section, maybe a forum, something with a write load that can spike, and you'd like to be able to scale horizontally. Cloud Foundry allows you to do that, provided you have a distributed file system in your service marketplace. So if we look at pushing WordPress, then creating a CephFS service and attaching it, and then scaling WordPress up, you can see it works like any other service. And if we focus on the calls we're specifically making to the service broker, you can see we're calling create-service with a premium plan. There are some storage-specific options here, but these are specific to this service. It wants to know which storage tier you'd like: do you want SSD or spinning platters? Do you want any kind of out-of-band backup to be happening? This particular service broker provides nightly backups, so we're turning that on. And then when you bind it to your application, you're asking the service broker to bind it to a specific mount point within your application. For WordPress, that's the wp-content directory; that's where all the themes and plugins go. Then you start WordPress, and then you scale it, and it works just like usual.
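That WordPress sequence maps onto ordinary cf CLI calls. Here's a hedged sketch; the service, plan, instance, and configuration parameter names are illustrative stand-ins, since the real broker defines its own keys:

```
# Hypothetical sketch of the WordPress walkthrough described above.
# Create a shared-volume service instance on the premium plan, asking the
# broker for the SSD tier and nightly out-of-band backups.
cf create-service cephfs premium wp-content-volume \
   -c '{"storage_tier": "ssd", "backup": "nightly"}'

# Bind the volume, asking the broker to mount it at WordPress's wp-content directory.
cf bind-service wordpress wp-content-volume \
   -c '{"mount": "/var/www/wordpress/wp-content"}'

cf start wordpress

# Scale horizontally; every instance sees the same shared wp-content volume.
cf scale wordpress -i 5
```

The `-c` flag is just the standard way of passing broker-specific parameters, which is what keeps this looking like any other service binding.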
Let's go through another example. This is an example of working with an external dataset. To set this up, imagine you have satellite images, and every day new satellite images come into your system. You've been doing this for years, and those images are stored on a variety of storage back ends. The oldest stuff is still on tape. You've got things on EMC Isilon. You've got something new, maybe, that you're rolling out tomorrow that you're gonna add to that cluster. And currently your developers have to know where these things live if they wanna access them. You'd like to move those data processing workloads onto Cloud Foundry, but you can't move all of them onto Cloud Foundry; some of them still need to be running in virtual machines or on some older backend. So how would you do that? Well, you do it with a custom service broker. So we push an image processor app to Cloud Foundry and then bind it to our satellite image service broker. If we look at the options we're passing when we bind to that service broker, we're saying we wanna bind to the daily snapshots, and we're interested in some image sets, specifically the four-band images and the multi-spectrum images for this particular use case. Then, when we bind it to our application, we're giving it a top-level data path, but our expectation is that it's going to mount multiple things under that: it's going to mount the ingestion data, the input, as a read-only mount; it's going to mount the output somewhere else; and then possibly it mounts some local scratch space in which to do our ETL workload. Then we start our image processor and we're off to the races.

Now notice there was nothing in this create-service or bind-service call where I talked about what type of storage was running back there. I'm not saying please mount this thing from this Isilon cluster, please mount this thing from the tape drive. As the application developer, you don't really care about those details. You just care about wanting the four-band daily snapshots, and the service broker is capable of isolating all of that information from the application developer, because ultimately it's the thing that's in charge of the volume mounts. So I think this shows that this clean separation using service brokers really allows you to continue operating up at the domain level, rather than delving into the implementation details of how the service actually works.
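As with WordPress, this is just a bind call with broker-specific parameters. A hedged sketch, with made-up broker and parameter names, might look like this:

```
# Hypothetical sketch of binding the image processor to the satellite image broker.
# The developer asks for data sets in domain terms (daily snapshots, image sets,
# a top-level data path) and never names Isilon, tape, or any other backend.
cf bind-service image-processor satellite-images \
   -c '{
         "snapshots": "daily",
         "image_sets": ["four-band", "multi-spectral"],
         "mount": "/data"
       }'

cf start image-processor
```

It is then the broker's job to translate those domain-level options into whatever volume mount instructions (read-only input, output, scratch space) it hands back to Cloud Foundry.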
All right, so let's talk about the future. We've talked about distributed file systems, and that's what we have available today, but there are two other kinds of storage that we're looking at. One is local scratch space and the other is single-attach volumes. So what is local scratch space? Well, you have ephemeral space available to your app, but that's running on a layered file system in kind of a shared environment, and it's not necessarily all that performant for heavy read-write workloads. You might be running in an environment where you have better disks available, and you'd like to make those available specifically to the apps that have these heavy read-write workloads. So we plan on extending the Docker volume plugin interface to allow drivers to advertise local resources, and to have Diego be able to take advantage of them. In this case, every new instance would get a new volume whenever it spun up, and when the application instance was spun down, that volume would be reaped and recycled. So there would be no permanent data, but there would be temporary scratch space, and this works nicely with our current stateless environment.

Taking that a step further, you have single-attach volumes. This is the same thing as scratch space, but you're saying: when I start the application instance again, I would like it to reattach to the same piece of data, so that there is a consistent relationship between my app instance and the application data. This is something we can build from a persistence layer, but it's problematic on a couple of other layers. Notably, you're now bringing identity and state into the mix on a platform that's primarily associated with running stateless application logic. So in order to do that, we would need to extend the scheduler so that a consistent identity can be maintained across restarts. And you would also need to be running things that can actually take advantage of these volumes, and mostly what people think about in those cases is databases. But the problem with databases is that the ones people wanna run in production today don't really cleanly support a separation between DevOps, the person operating the database, and CloudOps, the person operating the machine the database is running on, pushing CVE patches to it, things of that nature. So there are a couple of other problems that need to be solved, beyond just persistence, to make the single-attach volume use case work, but we're very interested in that on Cloud Foundry. You'll see further discussion coming in the future about how we can make that happen, and in fact the container networking team down in LA is already starting to work on this problem.

So to sum up: we currently support distributed file systems, we'll soon be adding support for scratch space, and we eventually hope to support single-attach block devices. And that's where we're at. If you have any questions, feel free to ask them at the microphone here, or find us after the show.