So I, myself, I'm a distributed systems engineer at Mesosphere, which means I mostly work on Mesos. But I've also traveled quite a lot, I've seen Spark running on it, for example, and I try to keep a connection to the community, talking about exactly these topics: what kinds of applications do we want to run, what kinds of architectures do we want to run on top of Mesos? In the beginning it was all pretty simple. We had batch processing going on, and Hadoop was crunching all our data. It was quite simple: we had just one single distributed application running in our clusters. But actually, we need to turn around faster. Today, Hadoop and batch processing aren't enough anymore. Consider that YouTube is ingesting something like 300 hours of video every minute. Every minute, Google sees over three million search queries, and probably Bing and the Chinese services see similar numbers. Twitter gets around 400,000 to 500,000 tweets. That's way too much data, and too fast, to aggregate all of it and run one big batch job on top of it at the end of the day. I also see IoT use cases where we have a lot of sensors. A new Airbus, for example, has an estimated 10,000 to 15,000 sensors on each wing, and on each flight it produces around 7 terabytes of data which need to be processed as a stream. Other use cases are in traffic control, where you also have a lot of sensors. Basically, in the world we're living in, more and more sensors are collecting more and more data which need to be processed, like all this traffic data, where the system tries to adjust in real time, controlling the lights and controlling where the traffic is flowing.
It doesn't help me if I run my batch job at the end of the day when I actually want to use the data in near real time, and that's what we're talking about with fast data. What we're seeing on modern clusters, therefore, is several kinds of applications or analytics running. There are still use cases for batch processing, where I collect all my data and, at the end of the day or the end of the month, run one big job across all of it. But it's moving more towards micro-batches, as, for example, Spark Streaming does, where I run batches of a really small size, in the one or two megabyte range, so I can get results within seconds of starting the job instead of within hours. And when I need even faster feedback, I'm really moving into the event processing field, as, for example, Flink does, where I get results in under a second, down to milliseconds. That helps when, for example, I click in some store and it needs to compute and deliver the best recommendations to me right away; I cannot wait until the next day, because otherwise the user experience would just be bad. For all of those areas, for all of those timings, there are different applications I can run, and that's what makes this entire field kind of difficult, because I always have to decide what I want to use and which response times I need. The usual flow in this universe of fast data or IoT is that we have some sensors or sources; in the plane example, we might collect data from the wings, from all the engines and all their sensors. This is usually the input that's getting into my system.
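To make the micro-batching idea concrete, here is a minimal sketch in plain Python, not any particular framework's API: an unbounded stream gets cut into small batches, and the same aggregation query runs on each batch as it closes. The event shape and batch size are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator

def micro_batches(events: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Group an (in principle unbounded) event stream into small batches."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

def count_by_type(batch: list) -> Counter:
    """The per-batch 'query': aggregate events by type."""
    return Counter(e["type"] for e in batch)

# Simulated events arriving on the stream
events = [{"type": "click"}, {"type": "view"}, {"type": "click"},
          {"type": "view"}, {"type": "click"}]

# Results appear after every small batch, not after one huge end-of-day job
for batch in micro_batches(events, batch_size=2):
    print(count_by_type(batch))
```

Native streaming would be the degenerate case `batch_size=1`: every record updates the result immediately, at the cost of doing the bookkeeping per record.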
Then we usually have some storage component, as we also want to store either the sensor data directly or the processed outputs. So I need something with persistent storage, where I can get at my results or feed my input into a query again. Then, probably most important, is the actual data processing. This is the part which runs the queries and extracts the features I want out of my data. I also need some kind of actor acting upon the results coming out of the data processing: whenever I do data processing, I write results somewhere, and usually I want some kind of app on top. In the app store example from before, that would be the app store presenting the best recommendations to you. It's basically some application which utilizes the output of the data processing: some actor. And to connect all of those components, especially to connect the sensors to the data processing component, we often have a message queue in these systems. You can imagine it as a buffer between the producers, which are the sensors, and the consumers, which are either the storage or the data processing pipeline. This is especially important in the case of a failover: imagine my data processing pipeline, or part of it, is failing, or has lower throughput. I need something in between to buffer the input, so that once the pipeline has recovered, it can retrieve all the outstanding data from the message queue. There are a bunch of different message queues; probably the most frequently used one is Kafka. I personally would classify the field of message queues into two areas. First of all, there are the typical message brokers, which are basically a broker for your messages, where you can have your input.
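The buffering role of the message queue can be sketched in a few lines of plain Python. This is a toy in-memory stand-in, not Kafka's actual API: producers keep appending even while the consumer is down, and the consumer drains the backlog once it recovers.

```python
from collections import deque

class MessageQueue:
    """A toy in-memory buffer decoupling producers from consumers.
    A real broker like Kafka persists and replicates this buffer."""

    def __init__(self):
        self._buffer = deque()

    def produce(self, message):
        """Producers append regardless of whether any consumer is alive."""
        self._buffer.append(message)

    def consume(self, max_messages):
        """A consumer takes at most max_messages, oldest first."""
        out = []
        while self._buffer and len(out) < max_messages:
            out.append(self._buffer.popleft())
        return out

queue = MessageQueue()
# Sensors keep writing while the processing pipeline is down...
for reading in ["t=1 rpm=900", "t=2 rpm=910", "t=3 rpm=905"]:
    queue.produce(reading)
# ...and once it recovers, it catches up on the backlog.
print(queue.consume(max_messages=10))
```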
You can define, for example, what in Kafka are called topics, and then decide which of those topics go to which outputs. So you could decide that certain topics should go to your Spark cluster, while other messages should just be written to persistent storage, because you write them but don't care about processing them right now. This is what the typical message broker does: it routes those messages and does simple filtering on them, but no complex aggregation. Then, especially for the field of logging, there are log-centric queues, like the ELK stack, which take a lot of logging output and help you do something similar. In principle it's still a queue, a buffer in between, but it helps you do log-specific operations, like filtering or extracting features from the logs. As I said, Kafka is the most frequently used one, so I just depicted Kafka here, and we can nicely see this idea of decoupling the producers from the consumers of the data.
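As a sketch of that topic-based routing, here is a minimal Python version. The topic names, sink names, and routing table are invented for illustration; a real broker like Kafka expresses this through topics and consumer subscriptions rather than a dict.

```python
def route(messages, routes):
    """Fan messages out to sinks based on their topic: the kind of
    simple filtering/routing a broker does, with no heavy aggregation."""
    # Collect every sink mentioned in the routing table.
    sinks = {sink: [] for targets in routes.values() for sink in targets}
    for topic, payload in messages:
        for sink in routes.get(topic, []):  # unknown topics are dropped
            sinks[sink].append(payload)
    return sinks

# Hypothetical routing table: which topic feeds which consumer.
routes = {
    "engine-metrics": ["spark"],      # analyzed in the processing layer
    "audit-log": ["storage"],         # just persisted
    "alerts": ["spark", "storage"],   # both
}
messages = [("engine-metrics", "rpm=900"),
            ("audit-log", "user=42 login"),
            ("alerts", "engine overheating")]
print(route(messages, routes))
```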
So we basically have several partitions, and the nice thing is, first of all, that this helps me to scale: I can simply add more producers, as long as I also add more partitions, the partitioned queues or brokers in Kafka, so I can easily scale Kafka up to handle a larger message workload. On the other side, I can have an arbitrary number of consumers to which I route those messages, and while the messages are in the system, they are buffered within those partitions. When choosing one of these systems, the main area most of us care about is delivery guarantees. We would expect that each message I put in on this side is delivered exactly once to one of those consumers, but in distributed systems this is really, really hard, because at each step in this pipeline we might have failures, so it's really difficult to guarantee exactly-once delivery. What most message queues actually deliver in reality is either at most once, which means a message is delivered once or not at all, but is never going to end up at the consumer twice, or at least once, which Kafka, for example, provides: each message is delivered to the consumer at least once, but it might also be delivered several times. So depending on the failures happening in the system, the same message might end up twice at the consumer end of your pipeline. This is something you need to consider when writing your applications or designing your system: what kind of delivery guarantee your message queue has. With at least once, for example, you might have to deal with duplicated values, because a message might be delivered twice or three times. And maybe just one word about exactly once: when you read the descriptions of many of those
systems, you're going to see that they tell you, yes, we have exactly-once guarantees. But you should read the fine print underneath, because it usually says, basically: we have exactly-once guarantees in cases where there are no failures. If there are no failures, it's not too complex to reach exactly-once guarantees, but as soon as you introduce failures, it's at the very least really, really hard, if not impossible, to ensure them. So whenever one of those systems guarantees that to you, read on to see in which cases it's actually guaranteed. Alright, let's move on to the next component, and this is stream processing. Stream processing is a current hype topic, a new buzzword; in the last six months, a number of new systems have been pushed out or marketed, so there's a really large number of options you can choose from right now. There's Spark Streaming, for example, which a lot of people use; there's Flink, another Apache project; there's Storm, and the C++ rewrite of it called Heron; there's Apache Apex and Apache Samza. So you have a large field to choose from. The general idea of stream processing is that you have a stream of data and you want to run some queries on it, for example to aggregate over the last two minutes of a specific type of event. In the engine case, say, you might collect engine failures over the last hour, and if the count is greater than five, raise an error to the pilot, just as an example. And as there are so many options, there are also a number of guidelines for choosing which system might be best suited for you. First, and in my opinion quite important, is the execution model. The two big models are called native
streaming and micro-batching. Micro-batching is basically what Apache Spark is doing with streaming: they don't run a continuous query, but instead take really small batches of data and actually do batch processing on them. Because the batches are so small, usually in the range of multiple milliseconds of batching time, people usually don't notice. If you need lower latency guarantees than that, you should look into native streaming, where you don't have batches: the system takes each record, one at a time, processes it, and updates the model, whereas with micro-batches it takes, say, ten records and then updates the model. From my perspective, micro-batching is sufficient for most workloads. Architects often ask for real time, but it's really worth asking which response times you actually need, and often micro-batching is sufficient. The second criterion I would look at is the fault tolerance granularity: at which granularity are checkpoints taken, and if something fails, from which point can I rerun or restart a workload? There are systems which do that per record, so they take a checkpoint for each record and don't have to reprocess it in case of failure, and there are systems which do it per batch. Within the per-batch model there are different granularities again: some systems take rather coarse-grained batches per checkpoint, and some take really fine-grained batches, like ten records at a time. Again, this is a trade-off: the more often you take a checkpoint, the more it costs you in performance, but on the other hand, in case of failure you can restart more quickly. If you take really fine-grained
checkpoints, you don't have a high restart cost, but on the other hand you pay a constant cost and get lower throughput, as you're always taking those checkpoints. Then, similar to the message queues and their delivery guarantees, streaming systems also differ between models, and it's worth looking at what you really need. Certain streaming systems don't guarantee that each tuple, each record, gets processed; they might drop records in case of failures. You need to decide whether that's okay for you, or whether you really need every individual record and cannot afford to lose any. Something which also varies quite a lot, if you look at the GitHub repositories of all those projects, is community activity. Some of them are really active and gaining activity, with lots of growth; others are more in decline. I would go for one which will still be supported in a while: I believe the market is quite crowded, and I don't think all of them are going to survive the next two years, so I would look at which direction the community is going when deciding on one of them. And of course, for us the most important criterion is Mesos support. Many of them support Mesos as a native scheduler, and many are currently working on either improving or adding Mesos support. Most notably, Flink is going to have native Mesos support in its next release; they already have a very basic, undocumented version in the current release, but from version 1.2 they are going to have really nice support for using Mesos as a scheduler for Flink. The Apache Apex people are also just starting to work on Mesos integration, so it seems to be a real topic for them, and we also hear customers and users asking whether it's possible to run such workloads natively on Mesos. As an example of a streaming system, I just picked Spark
Streaming, as that's what I still see used most frequently. In my opinion, the big advantage of Spark Streaming is that it integrates nicely into the entire Spark ecosystem: if you're already running Spark jobs, you can reuse the same jobs and just run them in a streaming fashion on your data, so you don't have to rewrite your jobs from scratch; you can take them and throw them at the streaming part of your platform. As mentioned before, Spark Streaming uses micro-batches: it takes really small chunks of the input data and then runs the queries over those small batches, and that's also why you can reuse the same queries, because underneath it's still batch processing, and nothing changes in the end. Alright, now the storage part of the picture. The storage field is even larger than the streaming field, because there's been a really large number of projects, so I'll just try to classify them a little bit into different areas; which kind of system you want really depends on your use case. Time series databases, for example, are really great for IoT data, like sensor data, because you usually have the same record structure for each point in time, so it fits and compresses nicely in one of those databases. For other use cases, the NoSQL databases, for example document databases or graph databases, might be a better fit for your model. And if you're in a more traditional world, the big SQL systems might be the right tool for you, because you have existing queries or, for other reasons, want to keep support for a SQL interface. If you're just collecting large chunks of data and want to write them to files, plain file systems are also quite interesting. As an example here, I picked Cassandra. Cassandra is what I would call a column-oriented key-value store, so not
necessarily a database, but a key-value store which allows you to have several columns of data. Internally it's replicated, so you can nicely survive failures, and there are some large companies using it in relatively large production settings, which is the nice part: it's validated and proven in large running systems. Alright, all together there's actually a common combination, and it's called the SMACK stack; I'm just going to put it up here. This is what a number of people are running on top of Mesos or on top of DC/OS, and it's a nice and proven combination, especially for those IoT-like workloads. The S comes from Spark, which we just talked about. The M, of course, is for Mesos, which is what we're here at MesosCon for. The A is for Akka; Akka is an actor-based framework for writing applications, which fits really nicely into distributed environments. For storage we have Cassandra in there, and as a message queue we use Kafka. This SMACK stack, as mentioned, is used by multiple people, but even though it's easy to deploy on top of Mesos, there are still challenges. In general, distributed computing is hard: even though it might be easy to install those systems, you still have to keep monitoring them. System elasticity can also be difficult: usually you need to be able to scale the individual components up and down, and if you have set up this big stack, the components usually don't all scale by the same factor. For example, you don't need to increase the storage by the same factor you would need to increase your data processing layer when the input increases. Imagine you now have double the number of users: usually each of the components in the stack is going to have a different scaling factor. And of course, a really difficult topic in general is
how to figure out when something is wrong in your cluster. As you have a lot of components and they're basically chained, it can be difficult to figure out at which point in this chain an error occurred and what's actually wrong. As mentioned, a lot of people start out with separate clusters, and this is the typical motivation to use Mesos: initially, in a pre-Mesos data center, you would pick certain sub-clusters, basically partitioning your big cluster, one part for doing data analytics with Spark or Hadoop, three or four nodes for Kafka, two nodes for MySQL, five nodes for your microservices, and another number of instances for running your storage, for example with Cassandra. As you always have to provision the maximum number of nodes you might need, to serve the maximum workload which could ever occur, you usually waste a lot of resources and have a relatively low resource utilization. This, as mentioned, is the prime use case for Mesos, which allows you to consolidate all those different workloads onto fewer machines, as you can co-locate them and no longer have the static partitioning across your cluster. Just one slide on DC/OS: DC/OS enables you to do this really nicely, especially because you have the Universe package repository, from which you can easily install all those different applications. So you can set up this entire SMACK stack just by installing the individual packages from the Universe; you only need to write your Akka application, and you have the SMACK stack running on DC/OS rather quickly. Still, even though it's really quick to set up, what you should keep in mind is how to operate it. This is what we usually refer to as data operations. How do you do updates, for
example, how do you update an individual service like Cassandra, and how do you update the underlying DC/OS system? These are all operational points you should keep in mind even before designing the architecture of such a system. Also, how do you do general maintenance? Even with built-in fault tolerance, you should have backups to be able to restore in case of critical failures where you need to restart your cluster. You should monitor progress, and basically monitor metrics, to figure out when something is really going wrong in your cluster: are too many tasks failing, is a task restarting over and over again, is the master flapping and restarting and restarting? Those are all important points to keep in mind when operating your cluster, and they will also make it easier to debug potential runtime problems, because you understand your cluster. We actually had a talk just two hours ago about the DC/OS SDK, so if you want to run exactly the SMACK stack we've been talking about, it's all there. But you might decide to write your own framework, or integrate one, as, for example, the Flink people or the Apache Apex people are doing, and there's an experimental SDK which can help you develop such new stateful services, where you actually want to store and write data. So it's rather simple to write a new framework for doing data analytics, or to integrate one of those new frameworks. And this SMACK stack is actually running in production at rather large scales: for example, Uber is using Apache Cassandra on Mesos, Bing is using Kafka on DC/OS, and Verizon is also using Cassandra and Kafka. Those are a nice validation that it runs safely at scale on top of Mesos or on top of DC/OS. Alright, this brings me to the demo I briefly wanted to show. It's the Esri demo, which was mentioned yesterday
by Aaron. Esri is a big geospatial data processing company, and what we'll see here is how taxis are being tracked driving throughout New York City. It's a nice use case for the architecture we just saw, so let's go through it before seeing the demo. Underneath we have a storage layer, where we're using Elasticsearch, so it's not exactly the SMACK stack; we're replacing the C with an E for Elasticsearch, but the rest is the SMACK stack, and we're using Spark for the data processing. Then, as input, we have all those taxis driving around New York City, or simulating New York City in the demo. The flow is going to be: we have the event sources here, and then we use Kafka as the message queue in between, so everything first goes into Kafka; from there the Spark analytics pick it up and push the output into Elasticsearch, from where it's picked up and displayed on our map. So the map we're going to see in a second is backed by the Elasticsearch storage. I actually wanted to run the demo live here, but it was kind of hard to set up, as there's no network connection for your laptop, so I'll just show a video and narrate throughout. We start by seeing our cluster with 11 nodes. This is a rather large cluster for a demo, but the demo data set is quite large, so we consciously picked a large cluster and fitted the data set to it. Here we see this Universe we talked about before, which enables you to install the SMACK stack just by clicking install package. There are the official packages, which have been certified and well tested, and there are also all those community
packages. This is what I like about the Universe: anyone can put a package there, bring out a new package, and enable a lot of people to use it. Here we see the Elasticsearch UI, and what was quite cool when running Elasticsearch here is that we could easily scale our service up and down: just by clicking, we increased the number of servers we wanted for our Elasticsearch. Here we're checking that Kafka is running. I can control Kafka via the DC/OS CLI, so I don't have to go to the nodes and deal with Kafka-specific configuration; I can do all of that from the DC/OS CLI, and we just checked that all the brokers are running. Marathon is used as the init system for the cluster, so it's running all the services we just installed, and we see all of them are healthy and running. And now we start deploying the so-called RATs; that stands for real-time analytic tasks, and these are basically the Spark Streaming jobs. We're deploying two different RATs and one data source, which we see here. The data source is the simulated data of taxis driving around New York City, and we can check that they are all running, actually all on different hosts. If we check what those RATs actually are, they're just normal Spark Streaming jobs using the Mesosphere Spark Docker image and a simple spark-submit to run the job: nothing special, nothing fancy. And this is the source, which, as I said, is going to simulate the taxis driving around New York City; again, this is just a Docker image creating that data and pushing it into Kafka. And as Mesos is powering all of this, as it's all built on top of Mesos, we can actually check within Mesos itself which tasks are currently running within the cluster.
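Checking which tasks are running can also be done against the Mesos master's `/state` HTTP endpoint, which returns JSON describing the frameworks and their tasks. Below is a small sketch that parses such a response; the framework and task names are made up, and the exact fields can vary between Mesos versions, so treat this as an illustration rather than a stable contract.

```python
def running_tasks(state: dict) -> list:
    """Extract (framework, task) pairs for running tasks from the JSON
    shape returned by the Mesos master's /state endpoint (fetch it
    yourself, e.g. from http://<master>:5050/state)."""
    tasks = []
    for framework in state.get("frameworks", []):
        for task in framework.get("tasks", []):
            if task.get("state") == "TASK_RUNNING":
                tasks.append((framework.get("name"), task.get("name")))
    return tasks

# A hand-made example in the shape of a /state response:
state = {"frameworks": [
    {"name": "marathon", "tasks": [
        {"name": "kafka", "state": "TASK_RUNNING"},
        {"name": "old-job", "state": "TASK_FINISHED"}]},
    {"name": "spark", "tasks": [
        {"name": "taxi-rat", "state": "TASK_RUNNING"}]},
]}
print(running_tasks(state))
```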
Alright, and here, this is now the actual JavaScript application displaying all that data. What we see here is an aggregated view of how many taxis there are within the region I select, and the further we zoom in, the finer the granularity gets: we see for each of those rectangles how many taxis are currently in it. If we want, we can switch to a more hexagonal view and see the aggregated view there as well. If we zoom in even further, we see individual taxis as individual data points; those are the simulated real-time taxis driving around New York City. Now we're going to take a look at JFK airport and see what's going on there, how many taxis there are. And we can also identify individual taxis: here we're checking out one taxi, how many passengers it has, and we can see the taxi ID, so we can identify and drill back down to individual data points. The other thing we can do, as this data is geo-temporal, is replay time: we can go back in time and view how something evolved over a specific time frame, and we can even run queries on it. We saw this one taxi before, with ID 180 I think, and we can just say: I want to track what this taxi did over the last hour at JFK. Now we zoom back out, and I think we're slowly zooming over to Central Park, where we can do a more IoT-like use case. What we have here is a simulated person, and the area you're seeing is the one-minute radius from which that person can be reached by a taxi, factoring in the current traffic and road situation. We just added another data source, which deploys a new taxi driving around, and as soon as that taxi hits this area, we can inform the person, and yeah, he's
happy now, because the taxi is almost there and going to reach him. So this is an example of how easy it was to deploy all that: the standard components are already there, so I basically just had to connect them and provide my data input, which in this case was simulated, and then I can easily run analyses like this one-minute radius from which someone can be picked up by a taxi. Now we're zooming further north, and what we can also do is display a heat map. Over time, so again this is the temporal aspect, we can see the hotspots where taxis used to be, and this could, for example, be used to match where people are with where taxis are: if I'm operating a taxi service, where are my hotspots, where should I deploy more taxis at which time of day, because there's more demand there and too few taxis showing up. Alright, that was the demo video. As I said, I wish I could have given it live, but it was difficult to set up here. That actually brings us to the end. I can only urge you: you can run this yourself. I have the links to the demo video in here, so go there, try it out, and then build your own application on top. And with that, I'd like to open up the room for your questions.