We already talked a lot about graphs and graph databases themselves, so I don't want to follow that path directly; I want to talk about how we can deploy them. Just one question up front: who has heard about Mesos, or has even used Mesos or DC/OS before? Wow, that's actually a lot of people. And who has used Neo4j on Mesos or DC/OS? One hand. I hoped for at least one hand, okay. So that's what we're going to talk about here.

To start out: I'm Jörg, and I'm working as a distributed systems engineer and developer advocate at Mesosphere. That means I'm mostly coding on Apache Mesos, as you can see on my shirt, or I'm out giving talks, developing demos, and talking to folks such as Neo4j to get more partners involved in this ecosystem. That is actually also how we got the idea for this talk.

First question: why do we actually need a cluster? Neo4j runs quite decently on a single box; you can already handle quite large graphs on one machine. The cool thing about Neo4j is that we can cluster both for high availability and for read replicas serving specific use cases. For those of you who might not have seen it yet: Neo4j has core servers which, in the current version, have synchronized writes, so they are the read/write part of the cluster, and they elect one leader which is the server in charge. Around that core we can create a bunch of read replicas, which are only there to read data. The cool part is that I can have dedicated read replicas for specific purposes or topics: some optimized, for example, for reporting or analytics, others targeted at specific query types. So by utilizing multiple cluster nodes I can efficiently answer all, or most, of my queries from the cluster. That is one use case where we need a cluster, and then of course the question is: how can we actually manage this kind of large-scale cluster?
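To make the core/read-replica split a bit more concrete, here is a minimal sketch (not shown in the talk) of how an application might use such a cluster with a recent official Neo4j Python driver: write transactions are routed to the elected leader, while read transactions can be served by read replicas. The host name and credentials are placeholders.

```python
# Minimal sketch, assuming a recent official Neo4j Python driver.
# Host name and credentials are placeholders, not from the demo.
from neo4j import GraphDatabase

# The neo4j:// (routing) scheme lets the driver discover core servers
# and read replicas and route queries accordingly.
driver = GraphDatabase.driver("neo4j://neo4j.example.com:7687",
                              auth=("neo4j", "password"))

def add_user(tx, name):
    tx.run("CREATE (:User {name: $name})", name=name)

def count_users(tx):
    return tx.run("MATCH (u:User) RETURN count(u) AS c").single()["c"]

with driver.session() as session:
    session.write_transaction(add_user, "alice")   # routed to the leader
    print(session.read_transaction(count_users))   # may be served by a read replica

driver.close()
```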
When we at Mesosphere look at clusters, what we often see is a kind of static partitioning across different workloads. Imagine, for example, four servers being dedicated to Neo4j; this doesn't necessarily mean four physical servers, it can mean four virtual servers or instances basically reserved for Neo4j. Then we have another part of the cluster dedicated to Kafka, another part maybe dedicated to a second Neo4j cluster (this might for example be the test cluster some other team is running), we might have a bunch of microservices, and, as we heard about before, we might have Apache Flink in there. So what we end up doing is subpartitioning the cluster into a number of smaller sub-clusters, each dedicated to a specific service or framework.

This has multiple disadvantages. The first one, from an operational standpoint, is that as an operator I have to take care of a lot of manual shuffling. For example, what happens if one of the nodes in my main Neo4j cluster fails? As an operator I would have to go in and move over a node, for example from the Flink cluster, so it involves a lot of manual work. The second big disadvantage of this model is that we end up with very bad resource utilization. Usually those clusters end up at a resource utilization somewhere between 20 and 50 percent, because each of those sub-clusters, my Neo4j cluster, my Flink sub-cluster, has to be provisioned for its highest load, so during normal operations I'm wasting a lot of resources.

The target picture we would like to get to is this one: we don't care so much about individual machines anymore, we treat our data center like one big box, one big pool of resources, and simply deploy all those services into it. As an operator I don't care too much on which particular node Neo4j, for example, is running, as long as my cluster is up, running and healthy; the system should take care of that. This picture is exactly the reason why Apache Mesos was originally developed, and we're going to see a bit of the history in just a second. Basically it's this ease of operation (how can I deploy my large compute cluster, how can I deploy multiple Neo4j instances?) that was one of the main reasons, and the second reason is increasing resource utilization throughout the cluster.

Going back to this image briefly: one of the main drivers, in my opinion, behind the increasing need for a system like Mesos is that the number of frameworks, the number of systems that would like to own their own sub-cluster, keeps growing. Nowadays we have Spark, Flink, Neo4j, and so on, so there are a lot of frameworks and a lot of sub-clusters I could potentially create. Back in the days when Hadoop was the one system that wanted to own the entire cluster, that was still okay-ish, but now we have a lot of different systems each wanting to own a cluster.
That's not as workable anymore. The way Mesos handles this is through this fancy term called two-level scheduling. Two-level scheduling refers to the fact that Mesos, on the one hand, just deals with resource allocation, and then, for each of the frameworks we saw on the left side, there is a scheduler. The scheduler is the component which controls which resources are used in the cluster and how to react when one of those nodes fails. There is even a default scheduler: for the Neo4j example we will see in a second, we didn't have to write a scheduler; there is a default scheduler I can simply use which takes care of scheduling my stuff and restarting it if it fails somewhere in the cluster.

This diagram symbolizes the two-level scheduling as well: here in the middle is the Mesos layer, the Mesos abstraction layer, which abstracts away from the schedulers. Here, for example, we have the Marathon scheduler (Marathon is the default scheduler I was just talking about), then this could be a dedicated Neo4j scheduler, and this could be a Spark scheduler. They are separated by this Mesos layer, by the Mesos masters, from the underlying nodes, from the underlying hardware. So those schedulers don't actually see the entire infrastructure; they only see the subset of resources offered to them by the Mesos masters. That makes it very flexible and resilient to failure: if one of the nodes fails, the scheduler can restart the task on another node and the application can continue running. Just to have the complete picture: the Mesos master level is also highly available. There is a ZooKeeper quorum which makes sure we always elect a leading master, so we have multiple standby masters in case the leading master fails.
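To make the "default scheduler" idea concrete: Marathon exposes a REST API, and a long-running service is described as an app definition posted to it; Marathon then keeps the requested number of instances running. A hypothetical sketch, where the Marathon URL and the app contents are placeholders, not from the demo:

```python
# Hypothetical sketch: submitting a long-running service to Marathon,
# the default scheduler mentioned above. Marathon keeps the requested
# number of instances running and restarts them on other nodes if an
# agent fails. URL and image are placeholders.
import requests

app = {
    "id": "/hello-world",
    "cpus": 0.1,
    "mem": 128,
    "instances": 2,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "nginx:alpine"}
    }
}

# POST /v2/apps is Marathon's endpoint for creating an application.
resp = requests.post("http://marathon.example.com:8080/v2/apps", json=app)
resp.raise_for_status()
print(resp.json()["id"], "submitted")
```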
The history is actually also kind of interesting. Mesos originated as a class project at UC Berkeley, in the same lab, the AMPLab, where Spark was initially developed; Spark was originally written as a demo framework to show how easy it is to build distributed systems together with Mesos. The students developing it at Berkeley then gave a talk about it at Twitter, and Twitter, if you remember, had all those issues with the fail whale whenever the infrastructure couldn't cope with all the users. So Twitter was really happy about it, and, which in my opinion was a really cool decision, they decided to make it an Apache open-source project pretty soon. In 2010 it became an Apache Incubator project, and I believe about nine months later it was a full top-level Apache project. The last step on this slide is DC/OS. DC/OS, as we're going to see, is an open-source distribution of Mesos which brings all the tooling around it, making it easy to deploy and easy for other frameworks, such as Neo4j, to integrate.

I've mentioned most of this already, so just for reference, who is actually using it: Twitter is heavily based on it, Airbnb (whoever booked their Airbnb for staying here in Brussels has used it), PayPal, Netflix, and actually anyone who has ever used Apple's Siri has used Mesos underneath.

When we're talking about databases, of course, we also have to talk about storage, and different kinds of applications, different kinds of data applications, have different needs for storage. The really easy case is at the top: front-end or non-persistent applications. If I have my Node.js application which doesn't have any state, or my nginx which doesn't really hold much state, or none at all, then I don't care on which node it restarts. Imagine my nginx service failing: I don't care where it restarts, it can be any node, as long as it comes back up quickly.

If I'm running a database such as Neo4j, it's slightly different, because databases actually have state, and there I end up with two different kinds of models. There are databases like Neo4j which are already written for this kind of distributed ecosystem: as on the second slide, they have built-in replication, so they can survive a single node failure, or potentially multiple, depending on how many replicas you have. But there are other databases; take a standard, single MySQL instance. That won't survive a single node failure on its own: if that node is gone, basically all my data is gone.

For those two models we have different storage options inside Mesos. For the ones like Neo4j, which already have built-in replication, we have something called local persistent volumes. That means you get fast storage directly attached to the node; it's just the hard disk, flash drive, or SSD inside the node, and I can use it as fast as I would use a normal local disk. The nice thing is, whatever happens in the cluster, when that node fails or when the Neo4j instance fails, whenever it comes back up I still get those resources back, so I can keep running on that particular node. If, on the other side, I'm running something like an old-fashioned single MySQL, it's slightly different: in that case I cannot survive a single node failure, and I want to make sure I can restart on any node in the cluster, because if that node fails I need to be able to retrieve my data. For that we have external storage, which lets us restart on any node and retrieve the data from distributed storage. If I'm something like Neo4j, I don't want to use distributed storage, because then I would have distributed I/O on two levels: first inside Neo4j, the database replicating to each replica, and then again on the storage layer. So if I already have replication built into the database layer, I don't want it again underneath.
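To give a feel for what a local persistent volume looks like in practice, here is a rough, hypothetical fragment of a Marathon app definition that requests one; the command, sizes and paths are made up for illustration and are not what the Neo4j package actually generates.

```python
# Rough, hypothetical fragment of a Marathon app definition asking for
# a local persistent volume: the task gets a reserved chunk of the
# agent's local disk and is re-launched on that same node so it finds
# its data again. Values are illustrative only. This dict could be
# POSTed to Marathon's /v2/apps endpoint as shown earlier.
stateful_app = {
    "id": "/stateful-sketch",
    "cmd": "while true; do date >> data/log.txt; sleep 10; done",
    "cpus": 0.5,
    "mem": 1024,
    "instances": 1,
    "container": {
        "type": "MESOS",
        "volumes": [{
            "containerPath": "data",        # path visible inside the task sandbox
            "mode": "RW",
            "persistent": {"size": 1024}    # MiB reserved on the node's local disk
        }]
    },
    # Keep waiting for the original node instead of moving the task,
    # since the data lives on that node's disk.
    "residency": {"taskLostBehavior": "WAIT_FOREVER"}
}
```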
DC/OS, as mentioned before, is basically a distribution around Mesos. It comes with all the features, service discovery, load balancing, a nice UI which we're going to see, so it helps me deploy things without worrying about all those nifty details; it all comes out of the box. I can install it on premises, the packages are available, or I can install it with cloud templates, so I can also choose where I want to run it. This is the UI, which we're going to see in a second, and maybe most important, as we're talking about particular applications, we have this cool thing called the Universe, basically the app store for your cluster. Here, with one click, you can install Spark, Kafka, or also our different Neo4j packages. As I have some minutes left, I'd like to just go to my cluster and show that.

Here I have my cluster running; it's running on Amazon, and as we can see, my components are healthy. What I already did, because it takes some time, is install the Neo4j core service. If we go to the Universe app store, we have multiple packages available. Neo4j... how readable is that? On the beamer, kind of okay. So in this app store we have three different packages available. The first one is the core package, and what it does, and what I have already done, is install Neo4j inside my cluster. "Inside my cluster" means that a typical DC/OS cluster has internal nodes, which are not exposed to the outside for security reasons, and external nodes. The typical pattern, which we're also going to follow here, is to install a proxy, a load balancer, on those publicly available nodes, and then proxy into the cluster to my particular application. That's something I can do right now, so I'll go here and install my Neo4j proxy. Cool, it's been installed. If we go back to the dashboard, we see my CPU allocation is increasing slightly, and my memory allocation a bit as well; it doesn't take too much memory, and I now have four tasks running. If I go in here, I see I now have my core and my proxy running.

Let's hope it's really running and available. So this is a public node and... yes, here we go. Let me just double-check that I'm not using Bolt. Yes, because we are dealing with Amazon-internal IPs, unfortunately I cannot use Bolt through the public proxy, but that's all configured nicely. Let's have a look at our cluster. What can I see here? I now have three nodes available in my cluster: I have this one leader here, and I have the followers. In a second we can also try to kill them and see that it stays up, but first I want to go back to the Universe and install the third package we saw before, which is the read replica. So we're installing those read replicas; let's look at the dashboard again. Yes, this takes more CPU and memory.
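As an aside, the same package installs can be done from the DC/OS CLI rather than the UI. A minimal sketch; the package names here just follow the three packages shown in the demo's Universe and may differ in your cluster.

```python
# Sketch: installing the Neo4j packages from the Universe with the
# DC/OS CLI instead of clicking through the UI. Package names
# ("neo4j", "neo4j-proxy", "neo4j-replica") follow the demo and may
# differ; `dcos package search neo4j` lists what your cluster offers.
import subprocess

for package in ["neo4j", "neo4j-proxy", "neo4j-replica"]:
    subprocess.run(["dcos", "package", "install", package, "--yes"], check=True)
```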
So the read replicas are starting up, and currently they're still unhealthy. If we take a look here, we can follow what's happening. They're actually running, but they have health checks defined, and those health checks are an important measure because they tell multiple components within the system which parts can be used and which cannot. For example, if one of them stays unhealthy for a certain amount of time, that's a signal to the system: hmm, maybe I should restart it, because it isn't working as expected, and in that case the system goes and restarts that one unhealthy instance. The same goes for load balancers: if I have a load balancer in front, and it's a smart load balancer, it will only use those back-end instances which are currently healthy and only send traffic to the healthy ones. So if we go here and reload, we now see we have two read replicas, cool.

But now we need even more, so I can just go here and scale them up. Everything I'm doing here in the UI I can also do via API endpoints, via the REST API, or from the CLI. Now we're deploying our third read replica; currently it's staging. I hope I have enough resources left... yes, now it's running, and hopefully it's also going to be healthy in just a second; that's what it's still waiting for, and once that's done we're also going to see it in the Neo4j tab. Let's reload here... yep, there is our third read replica.

Now let's go here and kill one of them. A kill could be anything: some error happened, my application was killed because it was using too much memory, there was a programming error and it just crashed. So my one instance is gone, killed. We didn't really see much, because it happened so fast, but what we can see here is that it updated a second ago: as soon as one of those instances dies, the system recognizes it, and the system tries to keep the invariant that there are always three instances running, so it goes and restarts one of them. Wait... yeah, and there it's restarting. Cool.

But this isn't too cool yet, because what we actually want to do is run some stuff on it. And I hope this is... yeah, healthy, good. So what I prepared here, as mentioned before, is the CLI support, and I'm just going to show you a short app definition. An app definition is how the system is told to start something, and this one uses the Neo4j Twitter load generator, which artificially generates Twitter data, and I'm going to post it to the cluster in just a second. One nice bit I wanted to point out is how we can address those servers within the cluster. If I'm writing such an app definition, I don't know where the service is running in the cluster, and I don't want to hard-code that into my app definition. So what I do is use a service discovery name, in this case a named virtual IP (VIP); a sketch of what that can look like follows below.
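As an illustration, here is a rough, hypothetical sketch of such an app definition being posted to Marathon. The Docker image, the environment variable name, and the VIP name "neo4j" are made-up placeholders; only the address format (name.marathon.l4lb.thisdcos.directory:port) follows the DC/OS named-VIP convention.

```python
# Hypothetical sketch of an app definition for a load-generator-style
# app that talks to Neo4j through a named VIP. Image, env var and VIP
# name are made up; the point is that the address stays stable no
# matter where the Neo4j tasks are actually running.
import requests

load_generator = {
    "id": "/twitter-load-generator",
    "cpus": 0.5,
    "mem": 512,
    "instances": 1,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/neo4j-twitter-load:latest"}
    },
    "env": {
        # DC/OS named-VIP address: load-balanced and location-independent.
        "NEO4J_BOLT_URI": "bolt://neo4j.marathon.l4lb.thisdcos.directory:7687"
    }
}

requests.post("http://marathon.example.com:8080/v2/apps",
              json=load_generator).raise_for_status()
```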
No matter where that service is running in the cluster, I can always reach it under this address. In this case it is also a load balancer: if I have multiple back ends, this address will load-balance to them anywhere in my cluster. That's really useful, because I can just hard-code that name in my application and I don't have to care where the other application is running in the cluster. I could have done the same from the UI, I could have just copied it here and posted it to an endpoint, and now we see it's staging. That usually means it's pulling the Docker container; in production use cases you would have all your Docker images in a private registry inside your cluster, but for demo purposes I'm using the normal Docker Hub, so the Docker pull takes a while. But it seems to be deploying... hopefully... still staging. The demo gods are not being nice to me. Let's see what's happening inside Neo4j, whether data is actually being pushed in. I'm not seeing my nodes yet... yeah, it's failing containers, so let's do some live debugging and figure out what's wrong. I can just go here and check the standard error: "No write operations are allowed directly on this database", so it's not finding the leader. Hmm, I do see one leader. Okay, I'm going to debug this after the talk, but basically what we would see is that this app definition is a really, really easy way to deploy applications talking to Neo4j, once it finds the leading server; I'll figure out after the talk what's wrong there and why it isn't deploying. This is why it keeps restarting and failing: it can't talk to the leader. The nice part, as we could see, is that this is easy to figure out, because we have an easy way to get to the logs: I don't have to worry about which servers the instances are running on, I can simply go there and figure out what's wrong and why it isn't running.

Okay, that brings me back to almost my last slide. This would have been the demo: we would load our Twitter data, see it running in the cluster, create different read replicas for it, and scale the cluster up and down as we did before. And that brings me to my last slide: whoever wants to try it out, the code is available online, feel free to play with it, and the packages are available whenever you install DC/OS anywhere. And as it's open source, feel free to contribute and give us feedback on what we can improve. Okay, cool, thanks a lot. Any questions?

(Audience question, partly inaudible.) What do you mean? So we actually have both running; this is only about the setup across the cluster, the clustering and the failover and so on. Oh, okay, that's a different thing. Oh, you mean... basically you can limit that as you like, because core servers can have two roles:
a core server can be either the leader or a follower within the core. In general, as you see it, one of them is the leader in the core and the other ones are followers; it's an automatic setting, so they fail over, and the leader coordinates the Raft commit and the cluster membership.

(Audience question about performance.) You mean for Neo4j in particular, or in general for the overall system? Okay, for Neo4j I would actually forward that to the Neo4j folks. For DC/OS we have metric endpoints, and this is also where an application like this would integrate: if you output your metrics on, say, a StatsD endpoint, the system is going to pick them up and aggregate them, and you can see that these belong to Neo4j, to that particular cluster, so you see all the performance statistics together, on the system level as well. For example, what we saw there is the allocation: I'm saying my Neo4j read replica should have at most two gigabytes of RAM, and one thing you should always monitor is how much it is actually using. Is it always using 1.99 and really close to being killed, or is it actually using much less and you're wasting resources in your cluster? I'm not sure who was first...

(Audience question about core server failure.) So you mean if one of the core servers is dying? If one of those core servers dies, one of the others takes over, and the system as a whole keeps running, but you mean how is the data recovered? So this one is gone, what's going to happen? What happens depends on how you set different configuration knobs. By default, the system will try to put the task back on that same server, and you can also specify that after a while it should stop trying and go somewhere else, so you can define this failover behavior, what should happen. But as we're talking about distributed systems, often what happens is that you just have a short network partition between some servers, because networks are so reliable, and so usually you want to wait either for that server to rejoin completely or, if it's just a task failure, if just the core server task has failed, you want to wait until it's restarted, which is also what we saw happen: in that case it's restarted on the same node and by default it picks up its state again.

(Audience question about the load-balancing method.) That depends on the method you're using. What we saw were virtual IPs; you can configure that, it's a bit hidden if you don't know the right endpoints, but you can select your method. For the other kind of load balancing, which we call external load balancing and which is basically an HAProxy, you can also configure the HAProxy load-balancing method.

(Audience question: during a disconnection, can I keep feeding in data and synchronize later?) You mean with the read replica? A read replica is just a read-only instance; you can have a large number of them, but you can't write to a read replica. You can write to nodes in the core, but only if you have a write quorum; otherwise you have to buffer the input in a queue or something like that. Exactly, a read replica itself is just a throwaway instance; it doesn't have to stay up, so it can be killed and a new one will come up. Read replicas don't really have an identity or anything like that;
it's just a machine in the cluster that provides data, so you can't really say "I want this particular instance to be up", because the system will just restart instances as it needs to. What I could imagine you could do is, at the load-balancer level, if the cluster is for instance in a partitioned, split state where there is no write majority, you queue up the write requests in the load balancer until the core has healed the partition, and then it continues and writes them back. That's a good point; it would make sense to provide some documentation on how to set something like that up, so that you can have an application that sends events to the cluster and can then replay or recover those events after the partition has healed. Yeah, in general that's a problem for any distributed system, right? Cool, thank you very much.