Thank you so much. First of all, it really feels great: this morning I was first up on stage, and tonight I get to be last on stage. That's really cool. Thanks that you're all still here for the last session. No, not the last session, but my last session.

What I want to talk about a little is distributed data. We talk a lot about persistence, and we talk a lot about how to store different data sets. For example, in the workshop today we had a lot of discussions about what the best data models are to pick and where to store my data. Usually this talk would have been given by Max from ArangoDB, because we're going to be talking about ArangoDB. He unfortunately couldn't make it, and he asked me to do it. ArangoDB is one of those exceptions where I'm actually more than willing to do it, because they have been really great partners around Mesos for a long time. They really helped us implement persistent volumes, and they were always among the first guinea pigs to test new Mesos features with their framework. This is why I'm standing here today and not Max. Max is a senior software architect at ArangoDB; he wrote most of the data-center-to-data-center replication we will look at here. And I, as I was introduced, am a distributed systems engineer at Mesosphere.

Why distributed data? As we are distributing our infrastructure, and this is really the promise of Apache Mesos, we also have to distribute our data. As we are scaling, and this is one of the reasons why we need so many large clusters, we have a lot of web-scale applications. Look at Twitter.
Look at Netflix. These applications are generating a lot of data, and one of the realizations was that the existing monolithic systems, be it a large SAP instance or a large Oracle instance, do not really scale well to these kinds of workloads. This is why we have seen many large companies, like Facebook, develop their own solutions for storing data in a distributed fashion.

The realization behind those new tools which these companies have been developing is that they no longer run on a single box, and they often use different data models. If I have a traditional relational database, whenever I scale it I'll probably hit certain limits. The guarantees which a relational data store gives me are really great, but they also restrict the model in certain ways. Those big relational databases were made for a different era. Don't get me wrong: we still need relational databases, but we might not need only relational databases; we might want to talk about different data stores as well. (And the clicker doesn't quite work.)

This is why we nowadays talk about so-called multi-model databases. What does that actually mean? I can have different models for my data: I can store it, for example, in a relational database, I can store it in a graph store, and whatever else is popping up here; we'll see that in a second.
I can store it as a graph, or I can store it in a document store, and all those data models have their valid use cases. If I'm restricted to only a relational database, I have to map all of those different data models to a relational model. That means I have to break up my JSON, I have to break up my key-value pairs and generate a very simple relation for them. And for my graph: there are multiple ways of representing graphs in relational databases, but usually they are rather inefficient. So it doesn't scale out the way we would need it to for this new era of web applications. Also, those traditional databases were usually single-node instances; they haven't been designed with much resilience or automatic failover in mind, so they cannot deal well with failing tasks. (And I'm just wondering why this keeps popping up.)

And this is exactly the point: if I'm a database vendor and I have to write my distributed database, I want to focus on storing data, on being a database. All those new requirements, that I have to be able to spread out across a large cluster, that I have to be able to detect failures, that I have to restart tasks, and so on, are really more of an annoyance, because they are not the core business I want to focus on. This is why ArangoDB was one of the first to write their own, really impressive, Mesos framework, following this approach of integrating with orchestration tools so that they don't have to implement everything themselves.

As mentioned, ArangoDB is a native multi-model database: I can store JSON data in a native representation, I can store graph data in a native representation, and also key-value data. All of these I can store, and I can interlink them. For example, I could have a graph where each node is actually a JSON document, and that makes querying this distributed structure rather nice.
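The interlinking idea just described, a graph whose vertices are full JSON documents, can be sketched in a few lines of plain Python (no ArangoDB client involved; all names here are illustrative, not ArangoDB APIs):

```python
# Each vertex is a full JSON-like document; a separate edge collection
# links them, so a graph traversal returns rich documents.

persons = {
    "alice": {"name": "Alice", "age": 34, "skills": ["C++", "Mesos"]},
    "bob":   {"name": "Bob", "city": "Cologne"},   # a different schema is fine
    "carol": {"name": "Carol", "age": 28},
}

# Edge collection: "who knows whom" as (from, to) pairs.
knows = [("alice", "bob"), ("bob", "carol")]

def neighbors(key):
    """Documents directly reachable from `key` via `knows` edges."""
    return [persons[t] for (f, t) in knows if f == key]

def reachable(key, depth):
    """Keys reachable within `depth` hops (breadth-first)."""
    seen, frontier = {key}, {key}
    for _ in range(depth):
        frontier = {t for (f, t) in knows if f in frontier} - seen
        seen |= frontier
    return seen - {key}
```

Here `neighbors("alice")` hands back Bob's full document even though its schema differs from Alice's; in ArangoDB the same idea is expressed in a single query instead of application code.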
They have their own query language to enable that, to query across all those different models, and they're actually quite scalable: all of this can run in a distributed fashion, and I can also query across distributed graphs.

If we look at a model, maybe this helps us understand a little better why we actually need those different models. In a relational world my tables look as on the left here, where I have columns of data and each column has the same data type. I also have something like graphs, where I have connections between certain entities. And the last thing, if we talk about different models, is the document structure. A document structure might look similar to the relational structure on the left, but it's different in that the schema of each document can vary. Whereas in a relational model I really have the same schema for each row, with documents each one, which would be a row in the relational world, can have a quite different schema.

So this is why different data models make sense, and we could talk about the other ones as well: time series, which is really good if I have a lot of sensor data that looks very similar; columnar data stores, if I have repeating values of the same type and I want aggregate queries across a certain column; or key-value stores, if I have a very simple representation. All of them have their use cases, and they help me to natively match my data to my data store. Otherwise, if I have to map everything into one data store, I always have to rewrite data, which first of all adds logic, and secondly adds a lot of complexity to my data store and usually duplicates data. So the native approach of ArangoDB is to take documents, graphs, and key-value stores, put them all into one big database, and make them available via one query language and one deployment artifact.
So I don't have to install a document database plus a graph database plus a key-value store and then write queries in between them; I can query them all in a similar, cohesive fashion.

How does it look if I deploy that? We'll see how that looks on DC/OS in just a second. I have DB servers, which are responsible for storing the data, and then coordinators, which are responsible for coordinating queries across all nodes. I can have multiple coordinators and a number of DB servers, and this is the way we can scale: a coordinator carries a certain overhead for running a query, so if I see that my query performance is going down, I can deploy more coordinators; if my storage space or my compute capacity is running out, I would usually add more DB servers. Then there's the agency, which you can roughly replace in your head with ZooKeeper or etcd, but it's their own implementation. It helps coordinate all of this: you still need a distributed system, you still need some kind of majority decision and an overall way to persistently store certain decisions, for example where data is stored, or who is responsible for, who is owning, that data right now.

So as we've seen on this slide, this is a distributed setup. Even deploying that on Mesos, and Mesos already makes writing something like that much, much simpler, is still very challenging, especially getting the scheduling logic right: where do you want to deploy which tasks, and what do you do in certain failure scenarios?
Failure scenarios are rather easy if we're talking about stateless services: if my NGINX container fails, I usually just restart it, great. But if my ArangoDB database server fails, I actually need to recover data, and I need to make sure that my number of replicas is consistent again.

The first iteration of this ArangoDB framework consisted of over 5,000 lines of C++ code. With that number, keep in mind that this was the first implementation of a framework really using persistent volumes and reservations in Mesos, so it also took us some time to get a feeling for how they should be used. If we rewrote it from scratch, it would probably be a little smaller by now, but still on the order of thousands of lines of code. What this framework does: it handles deployments, it deals with persistent volumes and reservations, it takes care of failover, and it also enables scaling up and scaling down, which I personally like a lot.

Just to give you a feeling for how complex this framework is: this is the state diagram for creating reservations. If you want to create a new persistent volume, you first try to reserve; once you have reserved, you try to persist, and so on, and at each step it might fail. So just getting a real reservation and a persistent volume which can be used by the framework requires rather complex code. And as mentioned this morning, this is exactly the idea behind the DC/OS SDK, because this pattern is the same for almost any framework that wants to use persistent volumes; they all have to go through the same steps. The idea of the DC/OS SDK is to pull this logic out and make it available as a library, or as a generator. If you're using the defaults, if that's exactly what you want, you only have to write your YAML, similar to writing a Docker Compose file. You don't have to write any real Java code, and you also don't have to be an expert in the app.
You simply say: I want to have n database servers up and running. If you need custom failover logic, or, in the case of ArangoDB, custom logic to scale up and scale down, you might need to write a little code. You can still use your overall YAML to generate your scheduler, but you might override certain parts with your own custom logic.

Take scaling down, for example; this isn't implemented in the SDK yet, and it's one of the biggest missing pieces the team is currently working on. If I'm scaling down a stateful service, I have to make sure that I migrate the data first: I first have to drain the database server, I have to move the data somewhere else, and only once the server is empty, once there are no connections and no data anymore which needs to be replicated, can I shut it down. ArangoDB has that implemented in their own framework, but this is not yet easy with the SDK, so we felt they would have to write a lot of code if they wanted to implement that on top of the SDK. For now, ArangoDB is actually still down here, using their own scheduler, but they're really trying to move up the stack, because it simplifies things. As said, for now, and also because they started before the SDK was around, they have their own scheduler. As soon as we feel the SDK has production quality similar to the existing framework, they will probably move up the stack.

Now, replication. Even though we support failover by having persistent volumes and by being able to shut down an individual database server, this still will fail if an entire data center is failing. So a common problem is: how can we replicate data across two data centers? We have different options for that. The first one, or the first level, is simply to use replication as an off-site backup: I create a database dump every hour and write it to an external data center.
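The scale-down procedure described above, drain first and shut down only when empty, can be sketched roughly like this. This is a toy in-memory model, not an ArangoDB or SDK API; `cluster`, `drain_and_remove`, and the shard names are all hypothetical:

```python
# Hedged sketch of scaling down a stateful DB server: move every shard off
# the victim server, verify it is empty, and only then release it.

def drain_and_remove(cluster, victim):
    """Migrate all shards off `victim`, then remove it from the cluster."""
    others = [s for s in cluster if s != victim]
    if not others:
        raise RuntimeError("cannot scale below one server")
    # Move each shard to the least-loaded remaining server.
    while cluster[victim]:
        shard = cluster[victim].pop()
        target = min(others, key=lambda s: len(cluster[s]))
        cluster[target].append(shard)
    assert not cluster[victim], "server must be empty before shutdown"
    del cluster[victim]   # only now is it safe to destroy task and volume

cluster = {"db1": ["s1", "s2"], "db2": ["s3"], "db3": ["s4", "s5"]}
drain_and_remove(cluster, "db3")
# db3 is gone; shards s4 and s5 now live on the remaining servers.
```

The important ordering is the one from the talk: data migration completes before the task and its persistent volume are released, never the other way around.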
This is one option, and the goal is basically to have an off-site backup of my data. The second one is disaster recovery, which means that my secondary data center can take over the workload if it needs to. And the last level, the most complicated one, is to have two active data centers, which can then also be used for geo-location, meaning I can actually route requests to both data centers.

If we look down here: the first approach basically means dump your data to the remote data center somewhere else. The second approach means you have exactly this picture, where requests come to one data center, but if that data center fails, you can, via a load balancer or some kind of switch, redirect the requests to the second cluster. There might be a small lag, some minimal data might have been lost, but the applications can keep running as before.

In the first implementation, ArangoDB focused on the first two parts. As mentioned, offering geo-location services would require two-way communication and two-way synchronization between the clusters, because all of a sudden you have two masters up and running which can both accept requests. So for now they simply decided to have a secondary cluster as a fallback, and you can only send requests and writes to one of the clusters at a time. For this first iteration, the goal is to run database clusters in both data centers, or in more if you want to, but the current customers are doing it in two, and then replicate data automatically between them. When one of the data centers fails, they can switch over. This is the goal of what we are presenting here today.
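The level-two switchover just described, a load balancer redirecting writes when the active data center fails, might look like this in miniature. `Datacenter` and `FailoverSwitch` are toy stand-ins invented for illustration, not real ArangoDB or DC/OS components:

```python
# Hedged sketch: all writes go to the active data center; when a health
# check fails, the switch promotes the secondary and traffic keeps flowing.

class Datacenter:
    def __init__(self, name):
        self.name, self.healthy, self.data = name, True, []

    def write(self, record):
        self.data.append(record)

class FailoverSwitch:
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def route_write(self, record):
        if not self.primary.healthy:
            # Promote the secondary. Due to asynchronous replication,
            # the very latest writes may be lost, as discussed in the talk.
            self.primary, self.secondary = self.secondary, self.primary
        self.primary.write(record)
        return self.primary.name

paris, vilnius = Datacenter("paris"), Datacenter("vilnius")
switch = FailoverSwitch(paris, vilnius)
switch.route_write("r1")               # served by paris
paris.healthy = False                  # simulate a data-center outage
served_by = switch.route_write("r2")   # now served by vilnius
```

The point of the sketch is the trade-off from the talk: applications keep running through the switch, and the price is that writes in flight at failure time may not have reached the secondary.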
The V1 implementation basically takes the default ArangoDB clusters, including all user settings, and replicates them to the other data center. They already have an existing replication tool called ArangoSync, which used to be used to sync clusters if you restarted, or if you wanted to switch between different versions, for example. What it does in this new scenario: it uses Kafka on both sides. ArangoSync on one side writes all changes into the Kafka queue, and on the other side the other ArangoSync reads them out of it and applies them to the other cluster.

Using Kafka here has multiple advantages. First of all, we can handle spike workloads in data center A (data center A being the active one): if at some point the workload gets too high, ArangoSync can still dump the changes even if it's too slow to apply them in real time on the other cluster; Kafka can serve as a kind of buffer and help us reduce the risk of losing data. Also, Kafka already comes with between-data-center replication, which is helpful because it has already been tested between different data centers. Similarly, as with spike workloads in general, it helps to distribute the pressure, because it can also give back pressure to the other side. And, important for advanced setups, I actually have encrypted communication between the clusters, so I'm not sending any unencrypted, potentially sensitive information across the internet.

So this is basically how it looks: I have an ArangoSync on each side, aware of each other, and they use Kafka underneath to synchronize the data. Data center A, usually being the active one, dumps the data via ArangoSync into the Kafka queue; on the other side, ArangoSync reads from the Kafka queue and applies the changes to the database servers, and the second cluster is then up to date with the first one.
In this first V1 implementation, the replication is asynchronous. So if data is written here, it's not immediately applied there; if I have a transaction, for example a user writing data, the user will be told "your data is written" before it's necessarily synced to the second data center. This mostly has to do with the overhead: Kafka adds a certain latency, and it would really reduce the performance of ArangoDB if they waited for each of those events to be synced. Secondly, in many applications a tiny bit of data loss doesn't matter too much. If I lose a few events in many of their customers' scenarios, it doesn't matter too much; what matters is that I'm able to keep running. So the most important goal here is to keep the data center up and running, and not so much to prevent the loss of the very latest tiny bit of data. The other disadvantage is that I basically have a completely duplicated data center, which usually might include somewhere around three to five nodes, but I'm not using those nodes for any performance gains for my users. If I had a hot-hot, double-master setup, so if I could actually query both of them at the same time, I could split up my user workload between those two data centers.

As we're talking about ArangoDB on Mesos and ArangoDB on DC/OS, let's briefly talk about replication between two data centers, or the possibility of running DC/OS or Mesos across two data centers. What actually works rather well is to distribute agents across regions, across availability zones. What is a lot harder?
So what I can easily set up is to have all my masters in one availability zone and then additional agents in different data centers. What's a little more challenging is to distribute the masters, and this is mainly due to the fact that ZooKeeper doesn't really handle high-latency links very well. What some people are doing, and it works rather well, is to distribute them inside an availability zone, in Amazon terms, so across fault domains with near-local, not-too-high-latency links. But what we recommend not to do is to distribute the masters across different regions. This would be the synchronous option, because you basically just split up one cluster. The other option is to spin up two clusters and make them aware of each other; there's ongoing work on that, but the current awareness only consists of the two clusters knowing of each other, so you can easily switch between them; they wouldn't replicate any data. So for replicating data in today's world you have those two options: either you distribute your nodes, so you distribute one cluster across two availability zones, or you add a proxy up front which duplicates all requests. I've seen some people doing that, for example for Marathon, where they have a proxy in front of Marathon which distributes all requests to two Marathon instances in two data centers.

All right, this actually gives us time for the demo. The first thing I would like to show, if my cluster is here, is simply ArangoDB up and running. I spun up a new cluster; this is an EE cluster, so it will tell us that it doesn't know the certificate, and here we have an empty cluster. ArangoDB is in the Universe, so you can simply go here and install it. Right now the scheduler is coming up. And maybe one neat feature, which isn't enabled by default, but is now available since two days ago, that's when they changed it:
We can also deploy it in a UCR configuration. What we installed right now is using the Docker containerizer, but we can also switch it to use the Mesos containerizer, which is hard to see here. Our main scheduler is up and running, and right now the database and coordinator services are coming up; we'll have a look at what that means in just a second. That looks good. Let's see whether the UI is... not yet available. It's slow, and it's not healthy yet. So while it's still deploying, let me jump to the second part of the demo, and then we'll just switch back, in the interest of time.

What I've set up here before, or what they have set up for me, is two clusters. The first one is based in Paris, as we had two French keynote speakers today, and the second one is based in Vilnius. They are the same cluster setups, they have the same user settings, and they are linked to each other. So this is data center one. Let me actually create a collection; a collection is the term for the place where I can store documents, so in a traditional database, imagine it being like a space with multiple tables. "mesoscon", it should be of type document, two shards. So here we are generating our collection. It should be up; yes, it's here. Now let's have a look at when it shows up on the other side. It's already here, so we were too slow to switch over. But as you'll notice, we can put in more and more data. I can just create a random document here, key "test", so I really create an empty document "test". It's saved. See how quickly it shows up over here: already there. So this is syncing the data over from one data center to the other. And I believe, and this is why I find it important that people are trying this out,
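The behavior just demonstrated, a write acknowledged in one data center and showing up shortly afterwards in the other, can be sketched as an asynchronous, queue-buffered replicator. Plain Python standard library; the dicts stand in for the two clusters and `queue.Queue` plays the role of the Kafka topic, so every name here is a hypothetical stand-in:

```python
import queue

# Toy stand-ins for the two clusters and the Kafka queue between them.
active, passive = {}, {}
changelog = queue.Queue()           # plays the role of the Kafka topic

def write(key, value):
    """Write to the active DC; acknowledged before replication happens."""
    active[key] = value
    changelog.put((key, value))     # buffered, so apply-side spikes are absorbed
    return "acknowledged"           # the user sees success immediately

def apply_pending():
    """The passive side drains the queue whenever it gets around to it."""
    while not changelog.empty():
        key, value = changelog.get()
        passive[key] = value

ack = write("test", {})             # like creating the empty document "test"
# At this instant the passive DC may still be behind...
behind = "test" not in passive
apply_pending()                     # ...until the replicator catches up
```

This is exactly the trade-off discussed earlier: the acknowledgement does not wait for the remote apply, so there is a replication lag window, and the queue between the two sides is what absorbs spikes and provides back pressure.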
this is going to be one of the challenges if we really want to support fault tolerance across data centers. We've seen that one of the first data stores to provide that was probably Cassandra, but over time more and more services will have to provide it, and I believe this is something really relevant for those of us working with Mesos, that we care about it early. This is why I like to see people experimenting with these kinds of options.

OK, I hope my... yes, my ArangoDB is green now, and now I also get the UI. It's a similar UI to the one we've just seen; the only difference is that this is a Community Edition, but it still works for everything we want to show. I won't create another collection right now, but what I really like about their integration into Mesos is that I can actually scale up and scale down from the UI. This is a special Mesos UI, and we can look here at the resource utilization as we scale up. The easy part to scale up is the coordinators, because they are stateless; there's a third one coming up. Let's also scale up a database server, which will probably take a little longer. After this is deployed, look at the resource utilization; it should hopefully go up in a second. Yes, we see the resource utilization going up, and if we look at the different tasks here, it's already running. Good, now we have three up and running. Let's do something similar with the database servers: we're bringing up a new database server. That looks good. Right now it's resharding the data. What will happen, as we're adding one more server, and this should actually be production-ready, I mean, we don't have much data on there, so it shouldn't take too long... Yes, it's up there, and it will actually reshard the data as we bring it up: it will take the existing data which is on those two database servers and distribute it across all servers. Now, and this is the fun part.
What I wanted to show is that we're also able to scale down. Just briefly switching back here, we see there was another increase in resource utilization as we launched a new task, and now we can also scale down. This will most likely take even longer, but I believe it's a really nice feature, because it allows us to scale in both directions. And actually, we can try that as well: I won't be able to scale down to fewer than two servers, because we have defined that we need two shards for the one collection we created earlier. Now we've scaled down, and we also see here that the resources are freed again and can be used by different tasks.

All right, this was actually already the demo, and it lets me finish way too early, but at least it gives us some time for questions. Sorry, it's not my talk, so Max could probably have spent some more time on certain slides. I can repeat the question: what's the consistency model it uses? The consistency model: they have a notion of distributed transactions, but by default it's going to be persisted per agent, so you have a notion of atomicity per single key space, which is stored on a single server. OK, thank you. OK, behind you. Oh, sorry.

Next question: "I have a, maybe weird, question. You're talking about people building schedulers and things like that, and you want to see more people develop stuff like this. So my question is: how? Basically, ContainerCon was there, and we can all kind of see that Kubernetes has a huge push from the community. So what is Mesosphere, if you will, or what is the Mesos community, doing in that department,
regarding trying to get hold of people to actually build schedulers? There were some steps with the SDK, and OK, now it's a little bit easier, but it's still not dead simple, if you agree."

Yes. What I believe, and this is also why I believe it's still hard with the SDK, also talking to people outside: it's probably hard for people who are external, or not collaborating with Mesosphere, to use the SDK. What I've really enjoyed seeing in the last months is that we're adding more and more documentation, and a lot of community frameworks based on the SDK are coming up, so I believe it will get easier over time. What I like about Mesos is that it simply gives me the choice: if you say there's a lot of push around Kubernetes, I can run Kubernetes on top, because it's actually an application scheduler, and if I want to run multiple Kubernetes clusters, Mesos enables me to do that.

"Yeah, I completely agree with that, but this is not a technical question at all; this is a community question."

I'll come to the community question, as this is probably especially relevant to me because I'm part of the community team. I believe we are working on doing a better job there, but overall I think a lot of it comes down to documentation, and this is actually where the community can help as well, by contributing documentation. As you mentioned, these are hard things, but I believe that in certain areas Mesos takes the more professional approach: for example, some default values, or some very conservative values, where Kubernetes is more developer-friendly. And sorry for taking up that example again, but DC/OS basically chooses an approach which will work in all distributed scenarios, whereas Kubernetes often, or sometimes, gives you choices which make it easy to develop, but if you really want to deploy that into production, it makes it kind of harder.
So what I believe is happening in the future: we are working hard on making things easier, like with the SDK, the Metrics API, and, in my opinion most importantly, adding documentation about how to do things. With Kubernetes, for example, I was just at Spark Summit yesterday, and there was a talk about running Spark on Kubernetes, and that was really, really hacky; in my opinion, I could count at least three cases where it would probably fail in production if those failure scenarios happened. This is something I like about Mesos: in the Mesos world we take this conservative approach, which is not that developer-friendly, but will work in production in the end. I believe Mesos, or DC/OS for that matter, will move more towards developers, and Kubernetes will in certain aspects also become more complex; whenever you start a greenfield project, it's usually easier. Did that kind of go in the direction you wanted? OK, I tried, and I'm happy to discuss it afterwards for longer. Then thank you very much, and enjoy the last sessions of MesosCon!