Welcome everyone. My name is Dinesh Israni. I'm a software engineer at Portworx, and today I'm going to be talking about how you can use external persistent volumes to achieve higher availability and reduce recovery times when running stateful services on DC/OS. So let's jump right in.

Here are the topics I'm going to cover today. We'll talk about the different types of stateful services. Then I'll go through the advantages of using external persistent volumes. I'll also give you an introduction to Portworx and how you can deploy services on DC/OS to take advantage of Portworx volumes. Then I'll do a demo showing how you can install Portworx as well as Cassandra on Portworx volumes, and demonstrate failover and some other useful scenarios.

So let's talk about stateful services. There are basically two types of stateful services when it comes to persisting data. The first are simple applications which don't do their own replication; they rely on the underlying storage layer to be always available. WordPress and MySQL are examples of this kind of application. The second type are applications that do their own replication across nodes, so if a node dies or fails, there is always another copy of the data available somewhere, or something is able to replicate the data onto a replacement node. If a node that has crashed comes back online, the application takes care of repairing the data on that node. This repair can be either manual or automatic, depending on the application. Commonly used applications like Cassandra and HDFS are examples of this kind of stateful service.

Now you might be asking why this replication strategy is important. Well, it's because bad things happen all the time, right? Your nodes could crash, your network could have issues, disks on your nodes could fail. You might also have power outages that take down complete racks of nodes. For applications that do their own replication, there's always another copy on one of the other nodes in the cluster, so they can continue to serve I/O. And if you have to replace a node, you can just bootstrap a new one and repair all the necessary data back onto it. This does end up taking a long time, depending on how much data you had on the node that failed. For non-clustered applications, though, if you had no backup and they were using local storage, your application is doomed. You will not be able to bring it back up unless you can either move your data disk to another node or restore from a backup. And if your disk has actually failed, you will lose your data and you will not be able to bring your service back up.

So how can persistent storage help with all of this? For applications like MySQL and WordPress which don't do their own replication, it can provide high availability for your services and eliminate downtime. And for services that do their own replication, it can reduce recovery times by a large amount, because you don't have to bootstrap a new node every time a node fails. All you have to do is start the same task on another node; the external volume gets mounted on the new node, and you only have to run a repair to restore the data for the writes that were missed while the node was down. I'll talk a little more about this in the next slides.
Before we get to those scenarios, I'd like to give a brief introduction to Portworx, because a lot of the scenarios I'll talk about are based on software-defined storage solutions like Portworx. Portworx is the first production-ready software-defined storage solution designed from the ground up with microservices in mind. Using Portworx, you can provision and manage container-granular virtual block devices, and a tight integration with schedulers and container orchestrators like DC/OS and Kubernetes helps run your workloads local to where the data is, so you don't spend a lot of network bandwidth moving data between nodes. We also have features like snapshots and cloud snapshots for backup and DR, and we have support for encryption, so you can integrate with Vault, including DC/OS's own Vault-based secrets, to provide the keys used to encrypt your volumes. All of this is done with automated provisioning and control, so you can repeat the entire provisioning flow using RESTful APIs. Portworx itself runs as a container. We are also working towards integration with CSI, so when DC/OS has support for CSI, we will be able to support it too.

This is how the Portworx architecture looks when it's deployed. Portworx consolidates all your storage infrastructure, whether it's bare-metal servers, existing NAS or SANs, clouds or hybrid clouds, into a unified data layer, shown as the orange part of this diagram. By deploying Portworx, you have persistent storage ready to be provisioned for all your stateful containers and apps, driven by the scheduler of your choice. The block-layer replication performed by Portworx is synchronous, so all the replicas always have the same data. This ensures that your application is always able to come up on another node in case one of the nodes where its data is placed crashes. Once the failed node comes back up, Portworx automatically repairs the data that node missed and brings it back in sync with the current state of the volume. And if the node goes away permanently, Portworx re-replicates the volume onto another node entirely, which ensures that Portworx can deal with any further crashes and the data for your volume stays available.
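Just to make the automated-provisioning point concrete: with the pxctl CLI that ships with Portworx, creating a replicated, optionally encrypted volume is a one-liner. This is a sketch from memory, so check the pxctl reference for the exact flags:

```sh
# Create a 10 GiB volume with 3 synchronous replicas
pxctl volume create --size 10 --repl 3 demo-vol

# The same create, but encrypted; this assumes a secrets provider
# (e.g. Vault) has already been configured for the cluster
pxctl volume create --size 10 --repl 3 --secure demo-vol-enc
```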
Now let's look at how recovery times differ between local volumes and external persistent volumes. With local volumes, your data is pinned to a particular node. So if a node crashes, the faster it comes back online, the better for your application, because there is less data to repair. In reality, that is very hard to enforce: any kind of maintenance on your servers can take a while, and when the node does come back online, you may have to repair a lot of data to get it back in sync with the rest of your cluster. Servers also take time to reboot, whether they're bare metal or just VMs in the cloud. And if your node fails permanently, recovery takes even longer, because you have to provision a new node, add it back to your application cluster, and bootstrap it with all the data that was missed while the node was down. That's a two-step process: with an application like Cassandra, you have to bootstrap the node and then run a repair to bring it in sync with the rest of the Cassandra cluster.

Compare this to external storage, where your volumes are accessible from any node in the cluster. If a node goes down, you don't need to wait for it to come back up for its task to rejoin your application cluster; your scheduler just needs to start the same task on another node, using the same volume from the node that went down. And if a node fails permanently, the same holds: you can use the same volume to bring the task back online elsewhere. In the Cassandra example, you would not have to run the bootstrap step at all; you would only have to repair the data that was missed while the node was down.

So here are some of the advantages of using Portworx. You might ask: why can't you just use a SAN or a NAS? The fact is that using a SAN or NAS is something of an anti-pattern for container workloads, because it involves static, out-of-band provisioning, which doesn't fit well with the current DevOps flow for new-generation apps. Using a SAN or NAS also introduces latency, since you have to go over the network to reach your storage, which is not what you want for high-performance apps. And if you have used SAN or NAS services, you know there are a lot of issues just around automating the entire workflow with containers and schedulers: you'll frequently hit cases where, after a node fails, you cannot use the same volume on another node. And if you hit network issues contacting your NAS or SAN, your entire service is down; you won't be able to provision anything just because that one link to your NAS or SAN is down. The advantage of Portworx here is that it's built from the ground up with microservices and containers in mind, and our goal has been a tight integration with schedulers to avoid exactly these pitfalls during failure scenarios. With Portworx you also get a unified solution for hybrid deployments: if you have an on-prem deployment as well as a cloud deployment, you don't need two strategies to deploy your apps. You use the same Portworx solution in the cloud and on-prem, so all your automation can be consolidated.

You might also wonder: why can't you just use EBS directly? Well, EBS doesn't work for hybrid deployments. It's fine in the cloud, but an on-prem deployment would need a different strategy. Also, EC2 instances have a limit of 16 EBS volumes per instance, so you won't be able to deploy many stateful apps per node if each one gets its own EBS volume. And if you have actually used EBS, you know that EBS volumes frequently get stuck in the attaching or detaching state: when an EC2 instance fails, you often cannot use its EBS volumes on other instances without going in and manually force-detaching them so they're usable somewhere else. Performance is also not very good with EBS volumes unless you use provisioned IOPS, and then you pay a large cost for it. Finally, with EBS your failover is slow by design: AWS first has to realize that your EC2 instance is down.
Then it needs to make sure the EBS volume is no longer in use on the old node, spin up a new EC2 instance, and attach the EBS volume there.

So, on to using Portworx with stateful services. There are three ways to use Portworx with stateful services on DC/OS. You can use Marathon to deploy services on Portworx volumes. We also have a couple of services in the DC/OS Universe, based on the DC/OS Commons SDK, which we've extended with support for Portworx volumes; they're available in the Universe and you can deploy them from there. The changes we've made to the DC/OS Commons framework are also available on GitHub, so you can write your own services on top of this framework if you want to use Portworx volumes.

Here's an example of using Marathon to deploy apps on Portworx volumes. If you're familiar with deploying apps through Marathon, you specify your app in a JSON format: which container image to use, the volumes you want, and your port specification. To use Portworx, all you have to do is specify pxd as the volume driver, along with the options for the volume you want. There is no out-of-band provisioning: as soon as you start the app with the volume options you want, the volume is created and the app can use it. This same method works for deploying apps with either Docker or UCR on DC/OS.
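To make that concrete, here is a rough sketch of such a Marathon app definition, using Marathon's external-volume format with the pxd driver. The image, paths, and option values are illustrative; check the Portworx docs for the full set of supported volume options:

```json
{
  "id": "/mysql",
  "instances": 1,
  "cpus": 1,
  "mem": 1024,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mysql:5.7",
      "network": "BRIDGE",
      "portMappings": [{ "containerPort": 3306, "hostPort": 0 }]
    },
    "volumes": [
      {
        "containerPath": "/var/lib/mysql",
        "mode": "RW",
        "external": {
          "name": "mysql_volume",
          "provider": "dvdi",
          "options": {
            "dvdi/driver": "pxd",
            "dvdi/size": "10",
            "dvdi/repl": "3"
          }
        }
      }
    ]
  },
  "env": { "MYSQL_ROOT_PASSWORD": "changeme" },
  "upgradeStrategy": { "minimumHealthCapacity": 0, "maximumOverCapacity": 0 }
}
```

The upgradeStrategy settings make sure Marathon stops the old instance before starting a replacement, so the volume is only mounted in one place at a time.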
The second way is to deploy apps through the Universe. We have several packages published in the Universe, built on the DC/OS Commons framework, that use Portworx volumes: Cassandra, HDFS, Elasticsearch, and Kafka. Apart from adding support for Portworx volumes, our changes also support failover of tasks when nodes crash. This gives your services higher uptime, and it reduces recovery times because you don't have to wait for a failed node to come back up: as soon as DC/OS recognizes that a node is down, the framework gets a notification and can launch the same task on another node using the same volume. We've also updated the framework to co-locate your tasks with your data: because we control the framework, we can query Portworx, figure out where the data for a volume lives, and launch the task local to that node. This reduces latency as well as congestion on your network.

Here's an example of using the updated DC/OS Commons framework with Portworx volumes. DC/OS Commons makes it very simple to build stateful services, but upstream it only supports ROOT and MOUNT volumes, which is not very efficient when you're dealing with failures, and failures will happen in large clusters. What we've done is add support for Portworx volumes in the SDK. If you're writing an application on DC/OS Commons, all you need to do is specify the volume name and volume options in the YAML service specification, and your application will automatically use Portworx volumes when it starts up. Once you have your application specified, you just build it and deploy it, and you're running on Portworx volumes. And again, just like launching tasks through Marathon, all your volumes are provisioned automatically, so you don't need to wait for a storage admin to come and provision volumes for you. The changes we made to DC/OS Commons are available on GitHub at github.com/portworx/dcos-commons.
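For illustration, here is roughly what the volume section of a DC/OS Commons YAML service spec looks like with these changes. The key names for the Docker-volume fields below are assumptions from memory; the authoritative schema is in the github.com/portworx/dcos-commons repo:

```yaml
# Sketch of a dcos-commons service spec pod using a Portworx volume.
# Field names for the Docker-volume entries are illustrative assumptions.
name: "cassandra"
pods:
  node:
    count: 3
    tasks:
      server:
        goal: RUNNING
        cpus: 1.0
        memory: 4096
        volume:
          path: "cassandra-data"            # mount point inside the task sandbox
          type: DOCKER                       # Docker volume driver instead of ROOT/MOUNT
          docker-volume-driver: "pxd"        # the Portworx volume driver
          docker-volume-name: "CassandraVolume"
          docker-driver-options: "repl=3"    # Portworx volume options, comma-separated
          size: 10240                        # MB
```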
Okay, now I'm going to show you a demo of how easy it is to install Portworx and, on top of that, how easy it is to install a Cassandra service that uses those Portworx volumes. Then I'll demonstrate what happens when a node fails, plus some other scenarios. Like I mentioned, all of these services are available in the DC/OS Universe. First, I'm going to install the Portworx service. I'll just change the image name to use the enterprise image, and since I have a five-node DC/OS cluster, I'll set it to install on all five private agents.

I've sped the video up a little, because it takes about eight minutes to bring the whole thing up. What happens is it launches the entire stack: first a three-node etcd cluster, then an etcd proxy, then an InfluxDB node, which is used to store the stats for our UI. Then it starts a Lighthouse task, which is the Portworx UI, and after all that is done, it installs Portworx on all five private agents. As you can see here, the three etcd nodes have started, InfluxDB has started, and Lighthouse is staging at this point. Since Lighthouse is installed on a private agent, it isn't accessible from outside the DC/OS cluster, so the next step is to start an HAProxy service, which proxies from a public agent to the private task that's running. All of this information is available on the docs website, and it's also linked when you start the Portworx service. This is just starting a Marathon app to run the HAProxy service, and once that comes up, you can go to your public agent on port 9999 and access the Lighthouse UI.

Once this comes up, the Portworx install proceeds on the private agents, and we start seeing the nodes appear on the dashboard. As soon as each one comes up, it contacts Lighthouse and reports in, and as you can see, five nodes are installed. It's a little blurry in the video, but all of this happened in about eight minutes: we stood up an etcd cluster, the UI, and Portworx, ready to be provisioned, in under eight minutes. Now that this is done, you can also look at the status of your Portworx cluster by running the pxctl command-line tool: all you have to do is log into one of the private agents and run pxctl status. The color isn't very visible, but as shown in green, there are five nodes that are now online and ready to have volumes provisioned.
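For reference, checking cluster health from an agent looks roughly like this. pxctl lives under /opt/pwx/bin in a default install; the path and flags are worth verifying against the docs:

```sh
# SSH into any private agent where Portworx was installed
ssh <user>@<private-agent-ip>

# Print the cluster summary: cluster ID, capacity, and the
# status of every node (we expect all five to show as Online)
/opt/pwx/bin/pxctl status

# List the volumes known to the cluster (empty right now,
# since nothing has been provisioned yet)
/opt/pwx/bin/pxctl volume list
```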
Now that Portworx is set up on all the private agents, I'm going to install Cassandra, which is also available in the DC/OS Universe. This Cassandra service has been updated to automatically use Portworx volumes when it gets provisioned, so the defaults are already set to use Portworx volumes, and we can also pass in additional options for those volumes. What I'm going to do here is set up the Portworx volumes with a replication factor of three, so that if the node where a volume is provisioned goes down, we can still use the same volume on another node. Here is the Portworx volume name and here are the Portworx volume options, and I'm just going to specify repl=3. You can specify multiple options here, comma-separated; if you wanted this to be an encrypted volume, you would also add secure=true, and you'd need to make sure Portworx is set up to pull keys from a secrets store like Vault. Once that's set, you can also specify the size of the volume; I've currently set it to 10 GB. Then I just click review and install, and install.

Looking at the volume list, I'm confirming there are no volumes yet, and I'll keep an eye out for the volumes that get provisioned automatically. Going back to DC/OS: from Lighthouse you can also see that no volumes are currently provisioned, and as soon as the Cassandra nodes come up, the volumes get provisioned automatically and become visible both through the CLI and through Lighthouse. I've sped the video up a little because it takes time for the nodes to come up, but at this point the three-node Cassandra cluster is up, and as you can see, three volumes were created, all with a replication factor of three and a size of 10 GB.

Now that this is up, what we're going to do is power off one of the nodes. With the regular Cassandra framework, if you powered off this node, the framework would not bring up the same task on another node, because all your data would be local to the node that was powered off. But what we'll see here is that since these volumes are backed by Portworx with a replication factor of three, DC/OS is actually able to bring the same Cassandra node up on another physical node. So I'm going to SSH into one of the nodes and power it off. First I confirm that the volume was mounted and the task was running here, and then I power the node off. Going back to the UI, you'll see that this task disappears: DC/OS already sees that the node is offline. In some time the node disappears from the list and another one gets spun up. It's not very clear in the video, but the IP is one of the other nodes that held a replica of the volume's data, so the task joins back with the same identity as the node that went down. And we can see that Portworx also marks the powered-off node as down, so it knows the volume is safe to attach on any other node. In scenarios like this, if you were using EBS or a SAN or NAS, you would run into cases where the storage solution cannot determine whether a volume is still being used by another node, and therefore whether it's actually safe to attach it somewhere else.

One more interesting scenario I'm going to show you: as I pointed out, I started these Cassandra nodes with 10 GB of storage. If you're running an application in production, 10 GB of storage per Cassandra node is very low, right? And if you were using ROOT or MOUNT volumes, you would have no way of increasing the size of those volumes; you would have to destroy the service, start a new service with an increased volume size, and copy all the data onto it. What I'm going to show you here is that you can resize the volume online, without taking your service offline. I'm not sure if it's very clear, but this shows that the volume was mounted on that node. Resizing the volume is as simple as running the CLI: all you need to do is run pxctl volume update, specify the name of the volume and the new size, and it goes ahead and updates the size of the volume. Here I'm increasing the size of the volume from 10 GB to 100 GB, and as you can see, it succeeded. If I do a volume list again, you can see (it's not very clear in the video) that the size has been increased to 100 GB. And the application actually sees this size increase right away; you don't even have to restart your application to see the increased size.
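Roughly, the resize step from the demo looks like this. The flag names are from my recollection of pxctl, and the volume name matches the earlier illustrative spec:

```sh
# Confirm the current size of the Cassandra volume (10 GB)
/opt/pwx/bin/pxctl volume list

# Grow the volume online from 10 GB to 100 GB; the mounted
# filesystem is expanded in place, with no task restart needed
/opt/pwx/bin/pxctl volume update --size 100 CassandraVolume

# Verify the new size
/opt/pwx/bin/pxctl volume list
```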
Okay, I guess that's it for the presentation. Does anybody have any questions? Sure. Yes.

So, you can start a service through Marathon to use Portworx volumes. For the example I gave for MySQL, the only thing you would need to do is use a Postgres container and specify the same parameters, and it would start a Postgres service. Any service that you can start using Marathon can use Portworx volumes for persistence, and you can use Docker as well as UCR.

Is all the replication synchronous? It is. There will be a total of three replicas: one local and two remote. We do wait for all the I/O to be persisted before it's acknowledged, so there is a slight overhead, but it provides the advantage of high availability in case the node goes down. Yes, and we see in most customer deployments that most customers have a high-throughput, low-latency backbone connecting the nodes, so the impact is very minimal.

As for our changes to DC/OS Commons, they're actually very minimal, and it's all open source, like I pointed out, so you can go ahead and look at the changes too. The changes involve adding support for Docker volume drivers, through which we support Portworx, and the second major change is supporting failover of tasks: in the upstream DC/OS Commons package, your tasks cannot fail over between nodes even if the node goes down, so that is something we added. There are these two major changes, but it does not diverge a lot from DC/OS Commons, and we try to keep up to date with DC/OS Commons releases, so we push changes into the Universe as soon as DC/OS Commons makes changes too.

Hi. So, I have two questions. Can you talk about running Portworx in an environment where only some of the Portworx nodes contribute volumes shared with the overall Portworx environment, making that storage available to the other nodes in your cluster — headless, I guess, is what you call that? And then, can you also talk about your roadmap for removing etcd from your dependency chain? One thing we found was that running etcd within Mesosphere is not fast enough and fails often, so we moved our etcd outside of the cluster altogether. If you could talk to those two things, that'd be great.

So, two things. The first, if I got your question right, is how you specify that Portworx storage lives only on a subset of nodes and not on the rest. Portworx can be installed on nodes that have storage as well as nodes that don't. If you install Portworx on nodes that don't have storage, you will still be able to mount volumes whose data is placed on the other Portworx nodes; that all works seamlessly. And if you want DC/OS to run your jobs only on the nodes without storage, because they have a lot more compute power, you can always specify constraints: place labels on your headless nodes, and when you start services like Cassandra or HDFS, say that you want those tasks to run only on the headless nodes. What ends up happening is that the block devices are attached and mounted remotely on your headless nodes, while the data lives on your storage nodes. Yes, they all have the same licensing. Yes.

And about your question on etcd: yes, we have also realized that running etcd within this entire framework causes issues, because if you hit issues with DC/OS in general, you lose the state in the ROOT and MOUNT volumes backing etcd. For a test scenario, using the entire stack to start etcd as well as Portworx in one shot is a good way to make sure everything fits together end to end. But for production, it is advisable to provision etcd separately, or on a completely different cluster, so that if your DC/OS cluster goes down entirely, you can still bring Portworx up, because your etcd lives externally. Portworx doesn't have any dependency on DC/OS; all it really depends on is that your data disks are not touched. So even if DC/OS goes down and you have to reinstall DC/OS, you will still be able to start Portworx back up and reuse the same volumes. And in the next release, we are planning to move the KVDB into Portworx itself, so you won't need to provision an external etcd server at all. We will not use the root or boot disk for that; we'll create a partition on one of the data disks and store the etcd information there too. So in that case as well, if you have to reinstall DC/OS, you'll still be able to bring Portworx up in exactly the same state, and all you'd need to do is redeploy your apps and you'd be back in production.

From some of the answers you've given here, it sounds like there's always the potential that you're crossing network boundaries when addressing storage, and obviously with synchronous replication there's some write overhead. Is there a capability to get additional read performance from having more replicas? Does it have the intelligence to distribute reads among the replicas, or is it always focused only on the local copy, even when, say, you're running headless compute against storage nodes across the network?

So, our reads are optimized in the sense that we send reads to all the replicas, and we try to send more reads to the nodes that are replying faster.
So in that case, we are able to utilize all the replicas to perform reads. And this is all consistent, so reads can come from any replica and they'll always be consistent.

Second question: since this spreads storage across network boundaries as well, do you do storage pooling across network boundaries? Say a single storage node does not have the capacity to service what you need, for a stateful application that doesn't support sharding — would you be able to use Portworx? Yes, we have support for aggregated volumes: apart from doing replication, you can split a volume across up to three aggregation sets, so you can stripe it across three nodes. And this has an advantage, right? If you want to scale, you can either scale your nodes up or scale them out: you can add more disks to your existing nodes, or if you don't have enough capacity or drive slots, you can add more nodes, and you'll be able to resize your volumes across nodes in that case too.

Any more questions? Yes. So, if the new disk is not the same size, you'll end up creating a new pool, but it will still be usable for any new volumes that you want to create. What we do is create different pools for different types of disks: if you have SSDs as well as HDDs, we'll end up creating two pools for those disks. Then, depending on your application — for example, Cassandra, where you want low latency — you can specify the class of service you want and use the SSD pool for your Cassandra services. But if you have something like Hadoop, where you want more throughput, you can specify HDDs to be used instead, and it will automatically place all the replicas on the different nodes using the same class of service.

Yeah, go ahead. So, do you mean replication across availability zones? Sorry, I didn't get your question. Would these be two completely different clusters, or part of the same cluster — or do you mean two different nodes in two different racks? Okay. So right now, we don't have support for replication across clusters, because if I'm understanding correctly, these would be two completely different Portworx clusters, right? Within a single cluster, yes, you can do that, but obviously you have to realize there will be latency involved when you're replicating across something like a WAN. It all depends on your application's needs and what your application requires. For example, if you're running Cassandra, that really won't help your application, right? So it can be done, yeah; there's no limitation on our side.

Any more questions? If you want more information, you can visit us at our booth, or visit our website at portworx.com. Like I mentioned, all our services are in the DC/OS Universe, so you can just search for Portworx and they will all show up. And if you want to request a demo, you can always contact us at info@portworx.com. Thank you.