Yeah, welcome to the first session this morning. What I would like to talk about is Apache Flink on Apache Mesos. Maybe just a quick raise of hands: how many of you have used Flink before? How many of you have used Flink on Mesos? One, okay. Let's maybe change that a bit. Why I'm really excited about this project is that it's one of those showcases of community work coming together. There's the Apache Flink scheduler being written as a collaboration of people from Lightbend, EMC, data Artisans, and Mesosphere, and there are also other individual community contributions. This is why I especially like this project: it's a lot of people coming together developing really cool software, really cool integrations of two open source projects. This talk was prepared together with Till, who works at data Artisans. They are right now finishing a release, which we'll see at the end, with some pretty cool new features giving us even more power when running on Mesos, giving us more elasticity. That's why he unfortunately couldn't make it here to MesosCon, but he's still part of this presentation at heart. If we look back at computing, maybe 10 or even 15 years, time flies, it was pretty simple. We had one option if we were open source, if we weren't Google or Facebook, and that was: build a huge Hadoop MapReduce cluster and crunch your data. That kind of worked. We didn't have many other choices, and that was basically our one big cluster. Nowadays it's unfortunately a little more complex, and we need to turn around faster. We have realized that MapReduce isn't really efficient, isn't really fast when it comes down to latency. So we actually need something faster here. What do we actually mean by faster? What we often see is that people implement something which is called a SMACK stack.
So the SMACK stack, this is basically the typical iteration if I want to implement something which needs to be faster. On the left side I now have events. If we look back to MapReduce, it was often large chunks of batch data we collected over an entire month or week, but when we're talking about fast data, we're talking about events. Events could be, for example, credit card transactions; that can be a plane, that can be any kind of infrastructure which has sensors attached; self-driving cars probably come to mind, Uber, and all those technologies which are collecting a lot of data in real time. With such systems the infrastructure becomes a little more complex. All of a sudden I can't simply write it all to a big file and then digest it at one point, because I want to do it in real time. So what I'll usually do instead is write it into an ingestion queue, which could be, for example, Apache Kafka. Then I have my analytics layer, which could be Spark, which could be Flink, or any tool I want here. The results I then usually store somewhere in some distributed data store. And storing, of course, is not sufficient by itself; I actually have to act upon the results as well. So in such an infrastructure for processing fast data, I usually also have an actor, which is often implemented in Akka because it's really nice, but there are many other implementations as well. With this stack we're fulfilling one part of the picture, which would be the event processing part. But if we look at the overall picture of our data analytics needs, it's not just this SMACK stack, it's not just fast event processing, it's actually more.
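The pipeline just described, events into an ingestion queue, then an analytics layer, a store, and an actor that reacts to results, can be mimicked in a toy Python sketch. All names here are illustrative stand-ins, not real APIs: the queue plays Kafka, a dict plays the data store, and a list collects what the actor layer would act on.

```python
from queue import Queue

# Toy stand-ins for the SMACK layers: an ingestion queue (Kafka-like),
# an analytics step (Flink/Spark-like), a store (Cassandra-like),
# and a list of alerts the "actor" layer (Akka-like) would act on.
ingestion = Queue()
store = {}
alerts = []

def produce(events):
    for e in events:
        ingestion.put(e)     # events land in the queue, not in one big file

def process():
    # the analytics layer consumes events one by one, in (near) real time
    while not ingestion.empty():
        event = ingestion.get()
        key = event["card"]
        store[key] = store.get(key, 0) + event["amount"]
        if store[key] > 10_000:      # act upon the result, don't just store it
            alerts.append(key)

produce([{"card": "A", "amount": 9_000}, {"card": "A", "amount": 2_000}])
process()
print(alerts)  # ['A']
```

The point of the sketch is only the shape of the flow: produce into a queue, consume and aggregate, store, then act.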
We still have use cases for batch data: for example, cases where I don't really care about latency, where I can collect all my data and then run it at the end of the month with best effort on my infrastructure, basically just maxing out whatever compute resources are left over from my other services. Then for many other services, for example if I'm Amazon and I want to show you new product recommendations, or update my real-time pricing depending on what users are willing to pay or how many users there are, I want something faster, because I cannot wait for days or hours to update that; but minutes, or maybe tens of seconds, is okay. That's when we usually talk about micro-batches. And then there's this last use case where I need response times immediately: if I'm using my credit card here in Prague, I don't want to wait 10 or 20 seconds until the bank authorizes it and says, yeah, this is an okay transaction. I want an answer as quickly as possible. So as we've seen, there are use cases for many of those in our infrastructure; we can't reduce it to just one of them, we need all of this in some, or in most, scenarios. And for me, this SMACK stack, coming back to the picture, as you might have guessed, comes from the individual names: the S stands for Spark, the M stands for Apache Mesos, as we're here at MesosCon, the A stands for Akka, C stands for Cassandra, and K represents Kafka in our case. For me, talking about this SMACK stack is not so much about the individual technologies making up the name, because in each of those layers we have a number of options to implement it differently. Here I would like to take a look at the analytics layer and what kind of options we have there.
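The three latency tiers just mentioned are, at heart, the same aggregation run over very different window sizes. A minimal sketch (plain Python, names purely illustrative) makes the continuum concrete: a month-long window is classic batch, a seconds-long window is a micro-batch, and shrinking the window further approaches per-event processing.

```python
from collections import defaultdict

def windowed_sums(events, window_seconds):
    """Group events into fixed windows by timestamp and sum the amounts.

    The same logic covers all three tiers: a month-long window is batch,
    seconds-long windows are micro-batches, and window_seconds -> 0
    approaches processing each event individually.
    """
    sums = defaultdict(float)
    for ts, amount in events:
        window_start = ts - (ts % window_seconds)
        sums[window_start] += amount
    return dict(sums)

events = [(0, 10.0), (30, 5.0), (90, 1.0)]        # (timestamp sec, amount)
print(windowed_sums(events, window_seconds=60))    # {0: 15.0, 60: 1.0}
print(windowed_sums(events, window_seconds=3600))  # {0: 16.0}
```

What changes between the tiers is not the aggregation but how long a result may wait before it is produced.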
So if we look at stream processing, even though the SMACK stack's first letter comes from Spark, there are actually many other options. The first one around being widely used was probably Apache Storm; then Spark is very common, still probably the favorite tool set used here; but we also have other tools such as Samza, Flink, or Apex, which is becoming really popular right now. I was really happy to see the Apex stickers over at the Apache booth, for example. And if we take a look at the different cloud providers, most of them are offering their own solutions, like Kinesis or Dataflow. So all of a sudden I have a lot of choices. If we take a look at Spark, and I said Spark is probably still the most commonly used one, that's probably for this reason: it's not just for stream processing. A lot of people have already implemented jobs in Spark, for example for their batch processing; they already have an infrastructure set up which they might use for any of those use cases up there: machine learning, graph processing, even Spark SQL. There's an entire ecosystem around Spark, and Spark Streaming is just one part which nicely fits into this picture. If we look at Spark Streaming, or actually, as I was just at Spark Summit, we're mostly talking about Spark Streaming 2.0 nowadays, one implementation detail which is important when you consider whether it's suitable for you or not is the way in which Spark deals with streams.
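The execution-model difference discussed next, processing each tuple individually versus collecting micro-batches, can be sketched in a few lines. This is plain Python, purely illustrative; the function names are mine, not any framework's API.

```python
def per_tuple(events, handle):
    # Native streaming: each tuple is processed individually. Low latency,
    # but with per-record bookkeeping overhead.
    return [handle([e]) for e in events]

def micro_batch(events, handle, batch_size=5):
    # Micro-batching: overhead is amortized over the batch, but a tuple may
    # wait until its batch fills up before it is processed.
    out = []
    batch = []
    for e in events:
        batch.append(e)
        if len(batch) == batch_size:
            out.append(handle(batch))
            batch = []
    if batch:                     # flush the incomplete last batch
        out.append(handle(batch))
    return out

events = list(range(7))
print(per_tuple(events, sum))     # one result per tuple: [0, 1, 2, 3, 4, 5, 6]
print(micro_batch(events, sum))   # two results: [10, 11]
```

The trade-off in the sketch is exactly the one in the talk: `per_tuple` calls the handler seven times, `micro_batch` only twice, but the tuple arriving first in a batch waits for the batch to fill.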
Spark was originally written as a batch processor, and it utilizes the same code paths and the same structure for its streaming jobs. The way it does that is by collecting individual tuples: it might, for example, collect five tuples and then go and process this micro-batch similarly to how it would process a large batch of data, so it can reuse the same jobs and the same code. That is very nice because, first of all, it's efficient, and secondly, it's really cool for me because I don't necessarily have to write new code. On the other hand, it adds some latency, because all of a sudden I have to wait for the micro-batch to fill up. So if I have to choose between Spark and some of those other implementations, some things I should consider are, for example, the execution model: whether I have the latency need for native streaming, which would process each tuple individually, or whether I'm okay with micro-batches. I should also consider which fault-tolerance guarantees I need and how quickly I should recover from failures. For example, if I'm running on a lot of spot instances and I really expect many of them to fail, or be shut down, during the runtime of a job, maybe I should consider how long it takes to recover from a failed job. If we look at the execution model, and this is basically what we've been talking about, the main distinction is that Spark collects micro-batches of data, while most of the other frameworks process each tuple individually. This is not necessarily purely better, just different, because it also adds some overhead, such as per-record accounting: if I can process one batch at a time, that is of course more efficient than paying the accounting overhead for each individual tuple. And this is also true for the fault-tolerance
guarantees. As we can see, Flink is kind of cheating, or trying to be more efficient, there as well: while it keeps track of fault tolerance, basically saying, checkpoint, this tuple has been processed, and if there is a failure I don't have to reprocess it, it doesn't want to do that per individual tuple. So what Flink does is the acknowledgement per batch. And I think these should be in the other order, sorry for that: what Storm does is acknowledge each individual record, and what Flink does is a checkpoint per batch. Flink processes each tuple individually, but the checkpoints it takes are per batch, or per micro-batch, of data. And Spark, as it processes micro-batches anyhow, obviously does the same for fault tolerance. On to delivery guarantees; somehow on this projector I think my headings got messed up. What Storm provides, or claims to provide, is at-least-once semantics: it says, I'm really just going to process each record at least once. What many other frameworks write on the outside is that they support exactly-once guarantees, but you should be careful about what's meant by that. We are talking about distributed systems, and in distributed systems you can never guarantee to process each tuple just once: there's always one failure mode where this might break, because the failure happens at exactly the instruction where you would checkpoint something. So you should be careful when writing your applications about whether you're actually going to receive or process tuples multiple times, or whether you expect them to be processed just once. A typical pattern, for example, is to include some key in your data, and once you have processed that key, you don't process it again. What does that mean for our data centers? It means those times where
you would own the entire cluster are basically over. Nowadays we talk a lot about different sub-partitions in our cluster: we might have a Flink sub-partition, a Kafka sub-partition, a microservice sub-partition of, like, 10 or 20 nodes for our microservices. This is usually very annoying, because first of all it adds operator overhead, and secondly utilization really goes down, because I'm wasting resources in each of those sub-clusters. As we're at MesosCon, this is exactly the vision of Mesos: to unify all those resources into one big pool and treat them as one big resource. Hence I don't care on which node, for example, Flink is being scheduled; I simply say I expect Flink to have, I don't know, 20 CPUs' worth of computing time. Still, if I'm just looking at pure Mesos, there are challenges. As we've heard in the DC/OS talks, Mesos is just a kernel, and we need things around it: for example scheduling, monitoring, security, a CLI. Companies such as Criteo, Apple, or Netflix all have large teams around it which can build that themselves, but in general we don't want to build all of this ourselves; we just want to install our operating system and be able to roll. This is the vision of open source DC/OS, where I can basically install all of that out of the box. I don't want to go into more detail on DC/OS, because I think we heard a lot about it this morning. Still, developing all those services, developing a Spark service on DC/OS, developing a Kafka service, developing a Cassandra service, is really hard. For example, here is a state diagram for the persistence in one of those frameworks. You don't have to understand it in detail, but you can see it has some complexity to really ensure that you have reserved persistent volumes in a meaningful fashion. And there are other challenges, such as how do I support multiple
frameworks, how do I support upgrades between framework versions, and so on. This is exactly where, when we need to operate or develop those distributed services, the SDK which was already mentioned this morning comes in: I can simply write a YAML file and potentially extend it with custom strategies, so I don't have to write the scheduler from scratch with all the complexities such as reserving persistent volumes; I can just get that out of the box. And this is what we are currently working on for Flink as well. Flink currently is one of those schedulers down here, in the build-your-own-scheduler category, but we are progressing rather quickly, both in bringing that scheduler forward and in moving towards the SDK. Right now our focus is to implement new Flink features, and we'll talk about this really cool proposal called FLIP-6 in just a second, which adds a lot of elasticity. That will still be implemented in the original scheduler which lives in the Flink code, but in the future we are planning to move to the SDK. So why is this actually a good match? What do we need if we are Flink? We need to be able to run multiple applications, multiple jobs; we need dynamic resource allocation in a large cluster; and most importantly, we need to be available: if our streaming jobs are running and a node fails, I still want the job to be up and running, because it should process all the events coming in. Why does Apache Mesos help me there? Because Mesos is written exactly for this goal: to implement fault-tolerant and elastic distributed applications. And as an interesting fact, together with the Flink community we ran a survey of how people are using Flink, and 30% of the survey respondents said they were running Flink on Mesos even before we introduced the
official Mesos framework. They were running it in Marathon or some other kind of setup, simply relying on it to keep Flink fault-tolerant, and that was kind of the signal for us to write a proper Mesos integration into the Flink code base. It's a traditional scheduler, if you want to call it that. The scheduler part is there on the left, where we have the resource manager, which talks to the Mesos master, and then the job manager, which is basically responsible for spawning all the tasks. So the two responsibilities of a framework, first resource management and secondly task management, have been divided into those two components. If we look a little deeper into the resource manager, it has four components. First, simply the connection manager, which is checking: hey, can I still talk to the Mesos master, or has there been a new leader election, has there been a failure in the Mesos master? Then there is the task monitor, which is monitoring the tasks, and the launch coordinator, which is responsible for launching them. The interesting part about the launch coordinator is that we chose to write it with Fenzo. How many of you know Fenzo? Many, actually. It's a library which allows me to easily write schedulers, and in particular to easily write offer-matching logic. If I'm implementing my own scheduler, I constantly have to decide whether I want to accept an offer or not, and this should be based on certain criteria: do I have enough resources, is this co-located with other things? Basically, what most scheduler implementations do is collect some offers and then decide: should I keep or reject them, and how many of them should I accept? Fenzo makes that very simple to implement, so you don't have to implement all that logic from scratch. The reconciliation
coordinator is responsible for the case when there has been a failure, a master switch, or the framework has been restarted somewhere else because the scheduler failed for some reason. We always need to reconcile the state between the master and the scheduler, so that both have the same view on the cluster again, and this is what the reconciliation coordinator is for. The interplay is basically this: the Mesos master sends out offers, and the launch coordinator decides whether it should start something. The launch coordinator receives startable tasks, for example from the reconciliation coordinator, and then it launches them. Once it has launched them, the task monitor is responsible for monitoring them; that's the component which processes the status updates it receives from the master. And if something fails, the reconciliation coordinator coordinates with the master to restart and recover those tasks. As mentioned, regarding this Fenzo library: if you ever implement your own scheduler without the SDK, I can highly recommend taking a look at it. It's really helpful because it has this pluggable fitness evaluator, which, as mentioned, helps you decide whether you want to accept an offer or not, and it's integrated into the launch coordinator here. The launch coordinator receives a task it wants to launch, with its task description, and Fenzo can then automatically match it to the matching resource offers and return which ones should be started. And now, this is probably my favorite slide: this is the new architecture which will be in the next Flink version, and it's what I previously described as FLIP-6. We are restructuring a lot of the code. First of all, we're going to have a proper Mesos dispatcher, and this actually helps if
we want to spin up multiple of those. The way you would currently start Flink is basically to spin up this component here, but now you have this long-running dispatcher, and for any job you start, you spin up one of those new resource-manager/job-manager process bundles, which makes it much easier to run multiple jobs on Flink. The other big advantage we get, which is maybe not quite apparent on the slide: as we restructured this code, we made the resources you can allocate to a running job flexible. Imagine you have a long-running Flink job. At the beginning you might want to use more resources, but at the end, or maybe at night, you don't have that many users, you don't have that many events coming in, so now you can scale it down and then scale it up again while the job is running. You can basically spin up those task managers more dynamically during the runtime of a job, which previously was only possible when you started the system. This gives us a lot of flexibility and better resource utilization when running large, long-running Flink jobs on top of Mesos. All right, let's flip over here, because, how much time do I have left? Really? Okay, awesome. This gives us enough time to go through the demo, and in the demo I simply want to show one of those SMACK-like pipelines together with Flink. On the left we have our data generator, which is basically just putting out financial transactions. Our goal with this demo is to detect fraud, fraud in the sense of money laundering: a transaction is me transferring money from account A to account B, and whenever this, over multiple transactions, sums up to more than 10,000 US dollars, there's potential money laundering, and someone should be alerted. So here is how this demo is set up; as mentioned, this is the typical SMACK-stack
setup. On the left is our data generator; it writes all those transactions, A to B plus the amount, into Kafka. Then I have multiple options: I can either run Spark or run Flink, and those consume the data out of Kafka, aggregate it over certain time windows, and try to detect whether the sum of transactions in a time window is greater than 10,000 US dollars. We'll see that in code in just a second. What I really like about it, and this is one of the advantages of using Kafka in such a SMACK-stack architecture, is that here in the middle I have a lot of flexibility in what I want to do: I can run Spark and Flink simultaneously, processing the same data from the Kafka queue, because Kafka is also persisting the data, so I can have multiple consumers consuming the same data. We make use of this persistence feature of Kafka in the fourth step as well: we actually use Kafka as the data store, so the results of the Flink job are written back into Kafka as the persistence layer in our SMACK stack. Then we have a short little display which will show the results at the end. So let me, oh, and people are already clapping over there, do I really have half an hour left? Okay, I'll try to type quickly. If you haven't seen DC/OS, it comes with this app store, and the app store makes it really easy to install things. First of all I need to install Cassandra, no, I don't, I'm talking about the wrong demo. The first thing I should install is Flink, so let me install Flink here. Great. And the second thing we need, recalling my slide and recalling the right demo, is Kafka, so let's also install Kafka, and I want the normal Kafka, not the Confluent version. Great. If we look at our services, we see that both Flink and Kafka are deploying, and while they're
doing that, we can already go in and deploy our data generator. And actually this demo is online, so anyone who wants to run it, feel free to do so. Just to have a look at it: the generator is a really easy JSON file, and all it does is basically curl the binary and then run it. In my opinion, this is something really nice about DC/OS and Mesos: I don't have to construct a full container image. Usually I would expect to have packaged this into a Docker container, pulled that container, and run it; here I can just run it as-is, and a container is constructed on the fly. So let's deploy that: dcos marathon app add generator.json. This is deploying, so we'll hopefully see it here in a second. Yes, here it's coming up, and it's already running, because I don't have to pull large images; I simply need to pull, I think it's a two-megabyte binary, and it's up and running. If we take a look into the logs, it's hopefully already producing, yes, it's already producing transactions. Kafka is also up and running, so the next thing I need to do is create those pipes in Kafka. I can do that from the CLI as well: dcos kafka topic, I guess it's just topic create fraud; let me create the first one, which would be the output topic. As I installed Kafka from the UI, I don't have the CLI extension installed, so let me quickly do that: dcos package install kafka, yes, I want the CLI extension. Now I can hopefully create my topic. That's looking good, because it's taking long. Great. Now let me also create a transactions topic. Transactions, ah, it already exists, because my generator is already up and running: the generator writes into transactions and was faster in bringing it up, since once we write data into a non-existent topic, it's automatically created. Good, we have both our topics. Great, so let's look at Flink here
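The data generator just deployed can be mimicked by a toy sketch. The real generator is a small Go binary writing JSON transactions into the Kafka "transactions" topic; this Python version only appends to a list so it stays self-contained, and the account names and amount range are invented for illustration.

```python
import json
import random
import time

def generate_transactions(n, seed=42):
    """Emit n transaction records with the event-creation timestamp embedded.

    A real producer would send each JSON line to Kafka; here we just
    collect them in a list. Account names and amounts are made up.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        record = {
            "from": rng.choice(["acct-a", "acct-b", "acct-c"]),
            "to": rng.choice(["acct-x", "acct-y"]),
            "amount": round(rng.uniform(10, 5000), 2),
            "timestamp": time.time(),   # event time, attached at creation
        }
        records.append(json.dumps(record))
    return records

for line in generate_transactions(3):
    print(line)
```

Note the timestamp is attached when the event is created; that detail matters for the event-time windowing discussed next.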
and Flink comes with the usual web UI you might be used to. By the way, I can do the same from the CLI; similar to Kafka, Flink also has a CLI extension, but I just like this squirrel so much, so I usually go this way. Many folders; it's in my Go directory because there's some Go code in there; almost there. Okay, Flink job, and now I'm simply going to upload the jar. Upload. If we care about the code, I don't think we have time, so we'll skip the code; the code is online. What I like about the Flink code, and this is actually why I chose Flink for this demo, is simply that it has a very nice way of dealing with event time. If we're doing stream processing, we have multiple options to deal with time. Say I'm doing a window over a day, so I'm saying that within a day I don't want to see transactions summing up to more than ten thousand dollars; then the question is what notion of time we're talking about. Is it the time at which the event arrives at the stream processor, or the time at which the event was created? Flink has very nice support for this event time: the data generator adds the timestamp at which the event is created, and Flink makes it really easy to utilize that event time, to sum over event time, or to size the window over event time. Let me just start the job. Cool, you see it's up and running, and now this is basically a streaming job which could run forever in the background. For example, when you upgrade your Flink job or your Flink version, you can keep the job up and running, which is necessary if you want high availability and you don't want to impact your users simply because you're upgrading your system. Cool, we see that everything is running now, that's great. The last thing we need is our monitoring tool, and this is also here in the repo; as I said, the link is also
on the slides. This is our actor; I'm simply going to deploy that. Again, this is a simple Go binary, so it should be up rather quickly: it's deploying, pulling the binary, and now it's up and running. And we already have the first detected fraud. What the system does is show us from which timestamp to which timestamp we have detected transactions summing up to more than ten thousand dollars; here, for example, is one transaction over three thousand three hundred dollars and one over eight thousand dollars, which together sum up to more. If we keep it running, it's going to detect more and more fraud over time. So this is how easy it is to set up such a pipeline in an orchestrated fashion. We see that our cluster utilization is going up, and that is one of those metrics I should monitor: how good my cluster utilization is. Which brings me to my next slide, about how to keep all this up and running. Usually when giving a demo I really like this demo effect where everything is up and running at the end, but if you're an operator, or even a developer developing such a pipeline, you should remember that the hard part comes afterwards: how do I keep it up and running, how can I update my Kafka, my Spark, my Flink, while keeping this pipeline available for the users? Because if I'm using my credit card, and the system wouldn't just detect money laundering but would check whether my credit card transaction is okay, then I don't want to wait for the system to be upgraded before I can use my credit card; this should be up and running all the time. So, conclusion: Flink has a really nice integration with Mesos, and as I said, it's going to be even nicer in the next Flink release, which should be out next month, because we'll support dynamic resource allocation for a running job.
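The detection logic the demo job implements can be sketched roughly like this. This is plain Python, not the actual Flink job (which is online); the shape follows the demo description: group transactions into event-time windows per account pair and alert once a window's sum exceeds $10,000. Window size and field names are my stand-ins.

```python
from collections import defaultdict

ALERT_THRESHOLD = 10_000.0   # dollars, per the demo's money-laundering rule
WINDOW_SECONDS = 86_400      # one-day event-time windows, as in the talk

def detect_fraud(transactions):
    """Return (window_start, from, to) keys whose summed amount exceeds the threshold.

    Windows are assigned by the event-creation timestamp embedded in each
    record (event time), not by arrival order.
    """
    sums = defaultdict(float)
    alerts = []
    for tx in transactions:
        window = tx["timestamp"] - (tx["timestamp"] % WINDOW_SECONDS)
        key = (window, tx["from"], tx["to"])
        before = sums[key]
        sums[key] += tx["amount"]
        if before <= ALERT_THRESHOLD < sums[key]:   # alert once per window
            alerts.append(key)
    return alerts

txs = [
    {"from": "a", "to": "b", "amount": 3_300.0, "timestamp": 100},
    {"from": "a", "to": "b", "amount": 8_000.0, "timestamp": 200},    # same window
    {"from": "a", "to": "b", "amount": 500.0,  "timestamp": 90_000},  # next day
]
print(detect_fraud(txs))  # [(0, 'a', 'b')]
```

The two amounts in the example, $3,300 and $8,000, mirror the talk's detected case: individually below the threshold, together above it within one window.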
And together with DC/OS and the other packages available in DC/OS, it's a really easy way, first of all, of running Flink, and secondly, of building the entire pipeline which is usually needed to run Flink efficiently in a streaming architecture. Thank you very much for listening. As I said, Till was also quite involved here, but he's finishing the next Flink release, so we'll have even cooler things to showcase next time around. Any questions? I believe you were first, or behind you. Thanks, a question about the upgrades and deployments of new Flink versions or new job versions: if we fix something in the job, does it use savepoints? How does it work to minimize the downtime? To minimize that downtime, you should configure Flink together with a stable store, usually HDFS, and then it writes those checkpoints to stable storage. When upgrading, and this will also be easier with the next version, you basically kill one worker, the new-version worker spins up, picks up a checkpoint, and so on, and you just cycle through. Can I spin up the new version while the old version is running and then do the switch? You can do the same: what I described first would be a rolling upgrade, the second would be a blue-green deployment style of upgrade. Okay, thanks. And, by the way, if you're interested in exactly these kinds of questions, in the DC/OS Slack, which is chat.dcos.io, there's a Flink channel where exactly those things are discussed. Thank you for the talk, I have one question, maybe for my lack of knowledge of Flink, but you talk about dynamic resource allocation; how does Flink manage a stateful situation? I always think of Spark, where Spark keeps the state by key, and to handle this relocation, you have to base your programming on external services. Is it the same for Flink,
or does it have another architecture? The architecture is slightly different. Yesterday at Spark Summit I had a really interesting discussion about exactly this architectural difference. The TL;DR is that Flink keeps track of the state differently: if you spin up a new worker, in the new version, and I'm talking about the unreleased version right now, it will basically spin up a new worker and then, using techniques similar to dynamic hashing, redistribute new incoming queries, and it can automatically gather the state, as the checkpoints are written to HDFS or some kind of stable storage in the cluster, so the new worker can also retrieve it from that stable storage. That's the TL;DR; I'm happy to share the design docs if you care about more detail. If I understand well, I think you said that the framework is moving to the Commons in the future? Yeah, not for the next release, but there are plans to move it. It kind of depends, but we are actively discussing it, let's just put it this way; there are meetings, and as I said, if you're interested, you can input your opinions there. There's a lot of discussion balancing pure Mesos support against DC/OS support. The point we are seeing is that some people running it on DC/OS would benefit from better security integration: right now you can set it up, but it's pretty difficult, you have to set a lot of parameters, whereas if it's generated by the DC/OS SDK, you get it all in a very simple setup. What most likely will happen is that in the beginning we're going to maintain two versions, until either the DC/OS SDK becomes runnable on pure Mesos, let's see whether that happens, or we have all features completed. Let's just say we'll always support the Mesos users as well who are not running on DC/OS; this is an important part of our discussion about the Commons too. The actual intent is more
related to starting a framework than processing one, I suppose? You mean, for example, the dispatcher architecture? Yeah, right, and this is also the second reason why we're still waiting on SDK support for that. In general we also found some ways to hack around it, for the TensorFlow framework, which has the same problem, so we could figure out a way to do it; it's just that once the SDK properly supports this kind of dispatcher architecture, which you also have in Spark and in many other frameworks, it's going to be much nicer. And the last question is about Fenzo: how did you decide to include Fenzo at the core of this scheduler, versus implementing that kind of custom logic yourselves? First of all, we looked at the logic in Spark. The history of how this scheduler got written is that the initial support in Flink was YARN only, and then we moved over: we refactored the Flink code and moved it to a more general architecture, so we can support both YARN and Mesos. In doing that, we figured out that we would have to do a lot of offer matching, and you know that YARN has kind of a different, request-based model, so when writing that, it was kind of holding us up. Fenzo was really helpful in writing it, because it packed all this logic we would otherwise have to write and maintain into a very simple component. Thank you so much.