It's good to see you guys and it's good to be here in Prague, coming all the way from Chicago, so a bit of a long flight, but it's good to be here. My name is Chris Splinter and I'm a product manager at DataStax. Yeah guys, just come on in. Today I'm going to be talking a bit about the DataStax Enterprise integration on DC/OS. Specifically, we're going to talk about why production-quality stateful services are hard. We'll talk a bit about the evolution of the framework, how we went from ground zero to where we're at today. Then I'll go through the current state of the project and show you the couple of tiles that we have in the DC/OS Universe today. Then we'll talk about some customer deployments and how they're getting real business value out of this service. And finally we'll conclude with a couple of demos just to show you guys how this integration is working today.

So I want to start off real quick just by giving an overview of what DataStax is, to try to provide some context for the integration. It's a peer-to-peer NoSQL database. At the core is Cassandra, and we build in enterprise features such as search, graph, and analytics, and we add some security on top of it as well. We see a lot of personalization and customer-360 apps built with it, messaging programs. We get a lot of sensor data, IoT stuff. Fraud detection is a big thing with our analytics and graph features. And then, interestingly enough, we see it being used for playlists as well.

So now I'm going to go into what really goes into developing these production-quality stateful services, and some of the technical challenges behind the complexities that are involved. First I'm going to start with some terminology, and this is specific to Cassandra and DataStax Enterprise. A node is a single instance, so a single unit of DSE. These make up data centers, and the data centers can be either logical data centers or physical data centers.
And Cassandra, the way it works is it replicates your data based upon a snitch, and you can configure how many replicas you want in each one of your data centers. The largest unit of work is a cluster, and the data centers add up to this larger cluster; you can have several data centers in a single cluster. And in a world where data autonomy is becoming more and more important, it's important to be able to deploy these clusters either on-premise or in the cloud. This is going to be a big piece of our DC/OS integration that I'll go into a bit more later.

The next thing is the gossip protocol, and this is kind of what makes it all work. This is what makes it peer-to-peer. These nodes are constantly talking, and they exchange these messages every second. In these messages are the health of the nodes, whether they're up or down, and where they're located. This is really how the different nodes learn about each other, and they're all communicating with each other; it's not just a one-to-one relationship. Based upon this communication, the different requests are routed. So if all of a sudden one node gets very slow, the cluster knows, and it'll start steering requests around that slow node in order to keep the different SLAs and latencies in a good state.

The next thing I want to touch on here is the write path. When a request comes in, it's first written to a commit log, which is just a sequential log-based format on disk. Then it goes to the memtable, where it's stored in memory. Once this memtable fills up, it'll be flushed to disk. And when it's on disk, compaction runs; this will load the data back up into memory, sort it, and then write it back to disk in these immutable SSTables. This is all important because one of the features the integration with DC/OS provides is node recovery.
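That write path (commit log, memtable, flush to immutable SSTables) can be sketched as a toy model; this is illustrative pseudologic to show the flow, not DataStax code:

```python
# Toy model of the Cassandra write path described above: every write is
# appended to a commit log, kept in an in-memory memtable, and flushed
# to an immutable, sorted "SSTable" once the memtable fills up.

class ToyWritePath:
    def __init__(self, memtable_limit=3):
        self.commit_log = []      # sequential on-disk log (durability)
        self.memtable = {}        # in-memory, most recent value per key
        self.sstables = []        # immutable, sorted tables on disk
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. append to commit log
        self.memtable[key] = value             # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. flush when full

    def flush(self):
        # Sort the memtable contents and write them out as an
        # immutable SSTable, then start a fresh memtable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        # Check the memtable first, then newest-to-oldest SSTables.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

db = ToyWritePath()
for i in range(4):
    db.write(f"k{i}", i)
print(len(db.sstables), db.read("k0"), db.read("k3"))  # → 1 0 3
```

Real Cassandra also merges SSTables back together during compaction, which is exactly why the data directories matter for node recovery, as discussed next.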
So it's very important for these different data directories to be able to be stored locally, with the possibility of them being attached later on.

The next thing I'm going to touch on is configuration management. There are more than 30 configuration files in DSE, so there's a lot to keep track of if you're trying to do this all by hand. And depending on your workload, you can have upwards of 1,000 different settings. Especially when you're making changes in production, which changes are likely, it can be very hard to keep track of if you're trying to do that all manually. Specifically, the cassandra.yaml file has a huge number of settings, and the dse.yaml has a comparable amount as well.

I describe all these things to get to the point that stateful services are hard, and managing, deploying, and configuring stateful services is hard as well. But this is where the DC/OS SDK really comes in and adds value to the DataStax deployment. It can manage deployment, like I said before, either on-premise or in the cloud. It can perform maintenance for these stateful services automatically: if a node goes down, we can try to bring it back up automatically, up to a certain point, until we deem it down and in need of manual intervention. It can also do things like backup and restore configuration, and for each one of these pieces we configure the DataStax scheduler to do it specific to the DSE tasks that I just described.

At the core of this is DC/OS Commons, the open source library to manage these sorts of deployment and configuration strategies. It makes things really convenient when you're operating in this distributed environment or you want to deploy microservices in a uniform fashion. It makes this sort of thing very, very easy, and much more doable than having to do it all manually.
And it accomplishes these tasks through a declarative, goal-oriented approach. To show you a little example of that, I have this goal-oriented design here, where it's really operating with two states: the current state and the target state. So in this example, if we're starting with CPU 2 and memory 4 gigs, and we want to get to CPU 1 and memory 8 gigs, first the scheduler knows to unreserve one CPU and reserve an additional four gigs of memory, and then it will communicate with the DSE scheduler to launch a new node with the configuration values specific to DSE that will achieve this target goal.

So now I want to talk more specifically about the integration of these two things and how we went from having nothing to where we're at today. As I'm sure you guys may know, engineering resources can be strapped sometimes, and the number of customers you need to serve and the needs you need to meet outnumber what you have, so you need to be smart with your design. In version one, we were really looking to go low-touch and make something that would be easy to develop and easy to customize and expand upon. And we wanted all this to be in just a single Universe package on DC/OS. The outcome of this, though: we thought, hey, this is great, we can give it to anybody and anybody can use it how they want. We found out that customers actually wanted it all done for them, and they didn't necessarily want to build it up and extend it themselves.

So that brought us to version two. And I have this 'give them the plane' phrase up there, because DSE can really be like the control panel of a plane, where there are a million different knobs and switches that you want to turn. These customers wanted the ability to turn all of them from this interface on DC/OS, and we really just needed a tighter integration than what we had in version one.
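The current-state/target-state reconciliation described above can be sketched roughly like this; it's a deliberate simplification of what the DC/OS Commons SDK actually does:

```python
# Toy version of the declarative, goal-oriented approach: compare current
# resource reservations against a target state, and emit the unreserve /
# reserve operations needed to converge on the target.

def plan(current, target):
    ops = []
    for resource in sorted(set(current) | set(target)):
        delta = target.get(resource, 0) - current.get(resource, 0)
        if delta < 0:
            ops.append(("unreserve", resource, -delta))  # give back excess
        elif delta > 0:
            ops.append(("reserve", resource, delta))     # claim more
    return ops

# The example from the talk: CPU 2 / 4 GB memory -> CPU 1 / 8 GB memory.
current = {"cpus": 2, "mem_gb": 4}
target = {"cpus": 1, "mem_gb": 8}
print(plan(current, target))
# → [('unreserve', 'cpus', 1), ('reserve', 'mem_gb', 4)]
```

In the real scheduler, the resulting operations are then carried out against Mesos, and a new DSE node is launched with the configuration matching the target state.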
So the first step was to dedicate a DataStax engineering team to this effort, where we could provide the DataStax expertise needed to make the integration flow much more naturally, and to build DSE-specific resiliency factors into the platform and into the integration. We were working hand in hand with Mesosphere on this. At first we were meeting every single day; that eventually turned into a few times a week, and now it's pretty much just as needed, now that the integration is more stable and the foundation is there. We really beta'd the heck out of this thing; I think we went through five different betas before we went to GA. So we had a pretty tight feedback loop with the customers that were trying this out. They really let us know exactly what they wanted, it tailored our design, and we were able to fulfill those requests for them. And finally, the joint support agreement was something that was definitely needed, because with a platform like this, when the integration is so tight, as a customer you just have this error, but you don't necessarily know whether it's coming from DSE or whether it's coming from DC/OS. So we had to set up a system where customers could contact us through either channel, and we would then take that request and go from there, based upon whether it was a DC/OS bug or whether it was specific to DataStax Enterprise.

And the outcome of this version two was that we were definitely better together. Like I said before, with all these configuration files and all these different knobs to turn, that's a very difficult task to do manually. DC/OS gives us the ability to have all these things in a standard format. You can fill it out, as I'll show you later, in the GUI itself and check off boxes for what you want to deploy. And a big advantage to this too is that you can roll back these changes.
So let's say that you are a retail company and Black Friday is coming up, and you have a certain performance configuration for these increased-traffic times. DC/OS makes it very convenient to roll out these changes for these higher-traffic times, which is a very, very useful feature of the integration. The next bullet I have here is the automated vertical and horizontal scaling. Users of Cassandra and DSE are very used to this ability to scale horizontally: if you need more throughput, you can just add more nodes and it will scale linearly. But DC/OS gives us the ability to also scale vertically. So in the cases where you might be restricted in the number of nodes you have, but they have the ability to use more resources, DC/OS can allocate those for you. So now with this integration you can scale both vertically and horizontally. And finally, the other big advantage that we've seen is the uniform deployment of these enterprise applications. We have a few customers who are using this to do things like platform as a service and also microservices. So when you need to deploy things uniformly and efficiently, having it at the click of a button in DC/OS has been a real advantage to those customers.

All right, so now we'll touch on the current state of the project and what it actually looks like in the DC/OS Universe. We have these two tiles: the first is DataStax DSE and the second is DataStax Ops. The DataStax DSE tile is the actual DataStax server. Through this panel here, you can just click configure, configure everything that you want, and then hit deploy; I'll go through an example of that in a bit. The second panel is DataStax Ops, and that is our OpsCenter product, which provides the ability to get more of a view into the JMX metrics that DSE produces, as well as letting you do things like backup and restore, scheduled repair services, and so on. So we provide both of these products in the DC/OS Universe.
So in this most recent version we have full platform support. That means we support the DataStax Enterprise features like advanced replication, search, graph, and analytics, as well as some of our advanced security features. We have node placement and node task failure recovery. I touched on this briefly before, but essentially what that means is that if DC/OS senses that a node has gone down, it will try to start it back up automatically until it eventually gives up, because sometimes it does require manual intervention. This is good because it adds some resiliency to these distributed systems, where failures can happen really all over the place. The next thing that we support is strict mode, which allows you to run on air-gapped networks. Multitenancy means that you can have multiple DSE nodes on the same physical host. This is good because, if you have beefy boxes, you can build denser and denser nodes, but there are some replication considerations to take into account there. The next one here is pod replace with local storage. If you have a pod failure and you have your local storage backed up, there's the capability to launch a new pod and reattach that storage to the new pod, which again just builds more resiliency into the platform itself. Then there's the network management and CNI support; this also comes into play for the dense node support, where you can have multiple DSE nodes on the same host. It's just really important for all that stuff to be working well, because if the gossip protocol that I mentioned before isn't able to function, your server is going to be a mess. And finally we have expanded monitoring, where you can see some of the errors and some of the DSE statistics right in the DC/OS UI. So that's where we're at today.

I touched on these briefly before, but I'll just go over them again: we have a couple of customer deployments, or a couple of patterns in customer deployments.
The first one is a large travel company using it to deploy microservices in a very efficient manner. They typically don't have a lot of time to deploy these services, so being able to just hit a button in DC/OS and have these things fire up has been a real value-add for them. It really comes back to having a repeatable, consistent deployment: when they have a configuration that's working in dev, they can take it directly to test and then directly to production, knowing that it's exactly the same, without having to do any manual porting. The second one I mentioned here is a platform as a service. This is similar to the first, but it's also very important for them to have this uniform deployment, such that when they need more resources or more endpoints to meet their internal on-prem cloud demand, they can just go ahead and fire those up through the DC/OS and DSE integration.

All right, now I'm going to get to a couple of demos. They're pre-recorded; I couldn't do them live. I tried last night and it didn't work, so I ended up having somebody else send me some videos. In the first one I'll show you how to install the database through the DC/OS catalog. So here, if we go to the catalog section, we can see the couple of panels that I mentioned before. Let's try that again. So we'll go to the catalog, and you'll see the couple of panels that I mentioned before. If you go ahead and click configure here, you can see the different options for enabling any of our full platform support, as well as selecting the number of nodes and the different options that you would see in the Cassandra and DSE YAMLs. As you can kind of tell from this, there are a lot of options. So a big piece of this is that after you set all these things, you can just download that configuration file and save it so that you can roll back later on. It continues to flip through some of the configuration options.
The takeaway here is that there's a lot. After you set all these things, you can also deploy OpsCenter for GUI management. Once everything is set, you can go ahead and review, and, like I said, download the configuration and then deploy it. This will take you to the service panel, where you can start to see these nodes being launched; in our lab it usually takes about 10 minutes or so. This is sped up a bit just to make a clear point, but in this case it's a three-node DSE cluster, and on each one of those nodes there's an agent as well as the DSE service running. And like I said, this is sped up just a little bit, to make the point that you can see the resources that each one of the machines has allocated towards it.

So that's installing DSE, and now we'll go through a quick one for installing OpsCenter, which is the management GUI. It's a separate panel here, the DataStax Ops panel, but the same sort of thing where you configure it, and there are far fewer options for this service than for the DSE service. One of the caveats here is that right now the OpsCenter service runs on a single node; in the future we're going to have it running on multiple nodes, so that it's actually using a distributed storage system as well. So, same sort of system here, just launching that OpsCenter node, which is going to give you a view into your DSE cluster and let you manage things like backup and restore and repair. Now I'll go over to the command line interface to show how you can pull up that GUI and get the endpoint that it launched at. And my name is not Catherine Erickson, but she was the one who provided these demos for me. So you use this dcos command, and you have to install the CLI for the DataStax Ops service first. We see that going on here. Then we'll use this endpoints command to grab the endpoint, which we can then throw in the browser to get to this GUI management service.
So we're just grabbing the endpoint here, and we can see the address right there. We take that and put it into the browser, and this brings up the OpsCenter panel, which gives you another level of visibility into your cluster. Here you can see that we have this three-node DataStax Enterprise cluster standing, which we just launched through the DC/OS catalog.

The final thing I want to show you is the most recent command line interface. I don't know if any of you have used this in the past, but there have been a lot of improvements here to make it more usable and just overall more enjoyable. So we see our DSE nodes here again, and now we can click on this 'install CLI' and it gives us the commands that we need to install this interface. And she has to use sudo for her curl command here, so luckily her password is hidden. Go ahead and install that. Really, what this command line interface does is give you some more touch points for your cluster. If you want to essentially SSH directly in, you can do that through this command line interface. Again, we see this install for the DataStax Ops command line interface as well; you have to install a separate command line interface for each one of the packages, for DSE and for OpsCenter. Now we'll go ahead and look at our DSE endpoints here, and the one that we're interested in is the native client, which is the communication port that you would use to issue regular queries to DataStax. So we grab the IP addresses and the ports here, and then we can go ahead and open a shell on that first node. Then we'll go into CQLSH and show that we can execute queries directly on this node. CQLSH is just the shell that is used to interact with Cassandra to issue requests. So we'll go ahead and create a keyspace, which is essentially just the bin for the DataStax schema. We'll create a table called customer sales.
Then we'll just go ahead and insert a couple of rows and do a quick query there, just to show how this has been much improved. I don't know if you guys used it before, but this is great. So, just a normal select query there, over the rows that we inserted with the timestamps above. And that's the command line interface.

Next we'll show some of the value in how easy it is to add a node and scale horizontally. This is really great because typically, if you are deploying Cassandra or DataStax Enterprise, you'd have to do all this stuff manually. You can see the configuration files over here, and on this side are all the environment variables that you configure. Now all you have to do is change this pod count to 4, and, like we said before with the configuration, it takes this new request, reserves the new resources, and then goes ahead and launches that new node for you. So we can go from our 3-node cluster to our 4-node cluster through a UI, with a configuration that we know is working and that exists and is managed in a single place. We go ahead and hit deploy, and now the DC/OS service is launching this with the same resource allocation that was on the previous 3 nodes that we had. It makes it really easy to scale horizontally. So we'll go ahead here and do the endpoints command again, to show that one node was added to the previous 3-node cluster. Just like that, we now have these 4 hosts here, and our cluster was scaled horizontally. That's really what I wanted to show with this added-node demo. But this is the sort of thing where, when you are deploying these platforms as a service, if you do need more resources or more endpoints, it's as easy as a few clicks of a button. Now we'll go to OpsCenter again, just to show that the same change was reflected in the OpsCenter GUI as well.
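Put together, the CQLSH portion of that demo looks roughly like the following; the keyspace, table, and column names here are illustrative, not the exact ones from the recording:

```sql
-- Create a keyspace (the "bin" for the schema) with a replication
-- setting. SimpleStrategy is fine for a demo; production clusters
-- would use NetworkTopologyStrategy with per-data-center replicas.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- A hypothetical customer-sales table, as in the demo.
CREATE TABLE IF NOT EXISTS demo.customer_sales (
  customer_id text,
  sale_time   timestamp,
  amount      decimal,
  PRIMARY KEY (customer_id, sale_time)
);

-- Insert a couple of rows, then query them back.
INSERT INTO demo.customer_sales (customer_id, sale_time, amount)
  VALUES ('c1', '2017-10-25 10:00:00', 19.99);
INSERT INTO demo.customer_sales (customer_id, sale_time, amount)
  VALUES ('c1', '2017-10-26 11:30:00', 42.50);

SELECT * FROM demo.customer_sales WHERE customer_id = 'c1';
```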
So we load up OpsCenter, and now we have 4 nodes in our DSE cluster. I put this up here, and on this MesosCon link we have a bunch of examples if you want to try this for yourself, going through some of the exact things that we just ran in this demo. These slide decks are on the MesosCon Europe website, so I'd recommend, if you guys are interested, going to this link and taking it for a spin. And that's it. Do you guys have any questions about the platform, or about anything that we've built here, or anything that was presented today?

Thanks Chris, great talk. I have a couple of questions. So the Mesos framework that DataStax built, it's tied to DC/OS, right? So if I want to run a vanilla Mesos, that will be like tough nuggets, right?

Correct. It's built on DC/OS; it requires DC/OS, so you need that service to be able to run it.

And the second question is, you mentioned that the way you work with local disks allows replication and fast moving of the container to another node. Can you elaborate a little bit on this?

Sure. So when you persist the volume, it gets stored to the underlying host. With the pod replace mechanism, that data doesn't actually go anywhere: the previous pod that you had before will go down, but then a new pod will be launched, and you can tell it to map that local storage, which was there the whole time, into this new pod, so you can access that same data there.

So there is no rebuild associated when we recreate the pod on the same node?

Correct.

And do you work with external storage there as well?

I honestly don't know. I know more about the DataStax side of things, so I don't know exactly, but we can follow up later if you want to get an answer to that.

Thanks.
Just one note on that, though: the OpsCenter management GUI that I mentioned before does provide the ability to back up to S3 or any external storage that you have; I'm just not positive about the DC/OS side of things working with external storage.

Thank you for the nice presentation. A question: when you add nodes, how does it handle tokens underneath the covers?

Yep. So that's going to depend on whether you are using single tokens or vnodes in Cassandra. If you're using virtual nodes, that allocation algorithm works automatically. So let's say you have 256 tokens per node and you launch a new node: the way the token allocation works is that it will automatically select the tokens that best balance your existing cluster. All of that is done through DSE and Cassandra automatically. But if you're using single tokens, then you would have to configure that yourself for the best token allocation for your single-token architecture.

Okay, so you'd put that into, I guess, the configuration, and then when you're done you'd still have to clean up?

Yeah, you'd still have to clean up if you're doing single tokens.

Cool, thank you.

And you would just do that via the CLI; you can go into each node and run the cleanup command there.

Was that running containers, or was that actually running the process?

Yeah, it's running containers.

Okay, cool. Any other questions? All right, thanks guys.
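The vnode allocation idea from that answer, a joining node picking tokens that best balance the existing ring, can be sketched like this; the real Cassandra allocator (e.g. `allocate_tokens_for_keyspace`) is considerably more sophisticated than this greedy toy version:

```python
# Toy illustration of vnode-style token allocation: when a node joins,
# it repeatedly takes a token in the middle of the largest gap on the
# ring, a crude way to "best balance" the existing cluster.

def allocate_tokens(existing, num_new, ring_size=2**16):
    tokens = sorted(existing)
    new_tokens = []
    for _ in range(num_new):
        # Find the largest gap between consecutive tokens (wrapping
        # around the end of the ring back to the start).
        gaps = []
        for i, t in enumerate(tokens):
            nxt = tokens[(i + 1) % len(tokens)]
            width = (nxt - t) % ring_size or ring_size
            gaps.append((width, t))
        width, start = max(gaps)
        # Claim the midpoint of that gap as the new token.
        candidate = (start + width // 2) % ring_size
        new_tokens.append(candidate)
        tokens = sorted(tokens + [candidate])
    return new_tokens

# Three existing tokens; a joining node asks for two vnode tokens.
print(allocate_tokens([0, 100, 200], 2))  # → [32868, 49202]
```

Each new token halves the widest remaining range, so ownership evens out as nodes join, which is roughly the behavior the automatic vnode allocator gives you without any manual token math.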