So, I do not have to explain why it is called Tez, at this conference at least, but this particular effort started for us as something built on top of what MapReduce had built over the years in Hadoop: a way of having a data pipeline where you have a constant stream of data coming in, being processed, being queried, and being consumed by people in various ways. So, at a very high level, Tez is a distributed execution framework which lets you process data. It uses something called a directed acyclic graph (DAG) to schedule things, and it is written to be very, very flexible and customizable for anybody to use for their special cases. It is in some sense a generalization of all the things that people have asked for over the years from MapReduce, and it is built on top of YARN, which is what the shift from Hadoop 1 to Hadoop 2 was about. To clarify, I am assuming that almost everybody here has heard of Hadoop. How many of you use Hadoop? How many of you use Hadoop 2? Smaller hands today. So, Hadoop 1 was a single-purpose framework. It had MapReduce and it had HDFS: two things which you could use with each other to do everything you wanted to do. The other side of this was that if you wanted to do something else, you had to convert whatever you wanted to do into a MapReduce job running out of HDFS. This was apparently enough for people to build a large number of things, including an SQL engine called Hive, a data processing system called Pig, and the basic MapReduce API that people wrote Java applications against, and that worked very well. It was scalable and it did a lot of things, but what it didn't do is let people make framework decisions that benefited their workload. If you had a workload where you are processing data continuously, for instance, you couldn't actually host it on MapReduce. You'd have to start a job to start processing data, and when the job ended, the data processing was over.
So, YARN came on top of this, or rather below MapReduce, in a way that lets people run various kinds of workloads on the cluster without the framework making assumptions like "it is written in Java." MapReduce had that strong assumption. YARN lets you do things like run applications which are not written in Java, written in any language you want, but on top of that you had to build all the handling yourself. I think this statement has been made many times in the last two days: distributed systems are hard, and distributed systems that are fast and scalable are doubly hard. Picking one of those two is really easy; I'll have a fast system that fails occasionally, that is easy in relation to building a system that is both scalable and fast. And you ended up with a bunch of data flow mechanisms, traditionally written using MapReduce, which needed a sort of new home. That is where Tez really comes in, and the key innovation that lets us do this is that the bottom layer is YARN, which splits the execution part away from the Hadoop cluster part.
So you run a YARN cluster, and you can run Storm on it, HBase on it, Pig on it, Hive on it, MapReduce on it — all of that used to be the same thing before. Now we put in a layer at the bottom, called Tez, that provides some of the things MapReduce used to provide, but specifically tuned for things like Pig and Hive. Now, the fact that MapReduce is slow has been said by so many people over the last five years, and the solution most people have is to go back and invent something completely new. People have tried to build SQL engines which do not use MapReduce, and arguably there are some arguments in favor of building SQL outside of Hadoop, but if you wanted to do SQL in Hadoop, you had to split away from MapReduce and lose all the nice things that people have put into MapReduce over the years. If you build something new today, with a new architecture, it is very difficult to get somebody like Yahoo or Amazon or Google to actually test it on 10,000 nodes and figure out which of the decisions you made for 100 nodes still work at 10,000 nodes. People use the word "legacy" to describe MapReduce. I want to use the same word, but as somebody would describe a legacy that you inherit: it's a large amount of collective work and time spent optimizing and fixing things at scale that we do not want to lose in the process. On top of that, Hadoop had a lot of security work done for MapReduce on YARN, which we didn't want to lose when we built Tez on top of YARN. So Tez addresses really three main problems that MapReduce had. First, you had to convert everything you wanted to do into a MapReduce job. What if what you wanted to do didn't actually fit into MapReduce? What if you wanted to put something in the middle? You had no option of living outside that particular constrained framework, and in some sense
people built sequences of MapReduce jobs chained to each other as a way of describing what they actually wanted to do. This is not the most elegant way of doing things, but as long as you only had MapReduce, you had no choice but to use that to run things on a Hadoop cluster. So expressing the computation better is the first thing that we looked at. The second thing is performance, because, as people said, MapReduce is slow. Some of the slow parts are because you couldn't express what you wanted to do in MapReduce, but the next part is the actual runtime. If you could express what you wanted to do — say you want to run a map and then a reduce and then another reduce — could you do things smarter, using the fact that the distributed system has distances between nodes? Running on 20 nodes basically means every node is equally accessible to every other node; when you go to 40 nodes, half the nodes have twice the bandwidth to their own half, and the other ones have to go across the switch. So could you use information that you knew about the future jobs you are going to run on this data, and plan things so that you can run things faster? Plus, we discovered that there is a bottom 20 seconds or so in MapReduce: a lot of things that you did in MapReduce that you really do not have to do if you are running the same job jar every single time. MapReduce was written in a time when you would write a Java application using MapReduce and run it, whereas we are living in a time when you run Hive or Pig, which are built using MapReduce, and run the same jar file multiple times, hundreds of thousands of times, once per job. The last part is something that people do not usually appreciate about this: it was written for operational simplicity. Upgrading a MapReduce cluster, say from Hadoop 1 to Hadoop 2, is a ridiculously hard challenge. It requires you to stop things
running on it to do that, and when you run 10,000 machines, stopping 10,000 machines and starting them up will take two days. For two days your cluster is not operational, and that is not an acceptable thing. So you want a way to deploy this that doesn't require admin interaction, without requiring anything other than a file being copied into a location. The significant advantage YARN brought us is that YARN lets us isolate different versions of Tez running in the same cluster from each other. I can run one version, somebody else can run another version, somebody else can run their own build of Tez on the very same cluster, whereas in the MapReduce case the runtime was part of the cluster, so if you had to upgrade the runtime, you had to upgrade the cluster. For the simplifying-operations part, we use the most common mechanism for sending files across a Hadoop cluster, which is to copy them into HDFS. Tez is a pure YARN application, so as long as the only deployment that you ever make is YARN — you deploy a Hadoop 2 cluster — you do not have to deploy Tez after that. Tez is something that you can run as part of a single user submission, so it doesn't require an admin to actually go and install it, and this is a big deal because it basically lets us coexist with a MapReduce job, with an HBase job, with a Storm job in the same cluster. Next: how do we express a computation? MapReduce used to be a map connected to a reduce using one edge, which is always sorted. Some people don't want the sort; some people want to put something in between the map and the reduce, as in this particular case. So we have a generalization of MapReduce as a bipartite graph of two sets of vertices connected to each other, something like this. We took MapReduce and wrote out its assumptions: the map processor reads data from a split generator and produces on-disk sorted output,
and the next stage actually figures out that it is supposed to merge it, so it runs a shuffle-merge input. In this case it is legacy, because it assumes some things about the input format, and the reducer is actually an independent vertex which has no real logic saying it's a reducer or a map; we just call it that because it makes it easy for people to see. Now it is not immediately clear, when we go from MR to this generalization, how things change. So let me ask a question: how many of the people here run more than one join in a Hive or Pig job? For all those people, the next slide should be very interesting. This is a seven- or eight-way join that you actually run as a single job in Tez. Being split apart from just being MR, to being able to connect vertices in any way you want, basically lets us express the fact that we are doing this particular multi-way join directly. Hive gets to not have to figure out at which point to split up maps and reduces, only where the vertices end. I'm not sure it's very readable, but there are some edges in this particular image which are not sorted, and things like that. The DAG API basically lets us set up a vertex graph that describes what we actually want to do, rather than trying to fit it into what it would have been in MR.
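The vertex-and-edge idea can be sketched in miniature. This is not the real Tez API (which is Java); it is a hypothetical Python model showing how a multi-way join becomes one graph with a per-edge data-movement type, instead of a chain of MapReduce jobs where every edge is a sorted shuffle. All names here are made up for illustration.

```python
from collections import defaultdict

class DAG:
    """Toy stand-in for a Tez-style DAG: vertices plus typed edges."""
    def __init__(self):
        self.vertices = set()
        self.edges = defaultdict(list)   # src -> [(dst, edge_type)]

    def add_vertex(self, name):
        self.vertices.add(name)
        return self

    def add_edge(self, src, dst, edge_type="scatter_gather"):
        # edge_type could be "scatter_gather" (the classic sorted shuffle),
        # "broadcast", or "one_to_one" -- not every edge has to be sorted.
        self.edges[src].append((dst, edge_type))
        return self

# A three-table join expressed directly, with no artificial map/reduce split:
dag = (DAG()
       .add_vertex("scan_sales").add_vertex("scan_inventory")
       .add_vertex("scan_stores").add_vertex("join").add_vertex("aggregate"))
dag.add_edge("scan_sales", "join")
dag.add_edge("scan_inventory", "join")
dag.add_edge("scan_stores", "join", edge_type="broadcast")  # small table
dag.add_edge("join", "aggregate")
```

The point of the sketch is only the shape: the planner hands the framework one graph, and each edge independently declares how its data moves.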
Now this is not an easy translation. It took a while for Hive to catch up to using this; Pig is using this now. Then at the bottom layer of things there's a runtime API that forms the interface between the edges and the data. There are some assumptions made about what people who are familiar with MapReduce already want — like, as I said, on-disk sorted output. On-disk sorted output is no different from in-memory sorted output; the only difference is that on-disk sorted output is what people are used to, so if you just wanted to port your multi-stage MapReduce pipeline onto this, you would use on-disk sorted output on every edge. So the API sort of looks like this, and I do not know if this particular join graph makes sense to people, but the traditional way of doing MapReduce joins is to go through both sides of the table in the same map task and do a lot of magic in the runtime: write an integer which says whether a record came from the left-hand or the right-hand table, sort based on the key and that integer, then merge it back, and so on. This is far cleaner, far better understood, and far easier to schedule, because if we know that the left-hand side produces 50 GB of data and the right-hand side produces 120 GB of data, we can actually figure out how many data items are involved and then how many processes are required in the next stage.
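The "integer tag" trick just described can be shown concretely. This is a minimal, single-machine sketch of the traditional MapReduce reduce-side join, not anyone's production code: tag each record 0 (left) or 1 (right), sort by (key, tag) — the job the shuffle used to do — then merge per key.

```python
from itertools import groupby

def reduce_side_join(left, right):
    """left, right: lists of (key, value) pairs; returns joined triples."""
    # Tag records with their table of origin, as the MapReduce trick did.
    tagged = [(k, 0, v) for k, v in left] + [(k, 1, v) for k, v in right]
    tagged.sort(key=lambda t: (t[0], t[1]))      # stands in for the shuffle sort
    out = []
    for key, group in groupby(tagged, key=lambda t: t[0]):
        rows = list(group)
        lvals = [v for _, tag, v in rows if tag == 0]
        rvals = [v for _, tag, v in rows if tag == 1]
        out.extend((key, lv, rv) for lv in lvals for rv in rvals)
    return out

print(reduce_side_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")]))
# -> [(1, 'a', 'x')]
```

Every join has to be contorted into this key-plus-tag sort; expressing the join as its own vertex with two incoming edges removes the contortion.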
So what Tez does is take this logical plan and produce a physical plan at runtime. People who have used MapReduce will know that you run something called a split generator. The split generator is hard-coded: how many splits of data do I process in parallel, and what are they? Tez lets you decide those things after getting on the cluster instead of before. This basically means you get to figure out, for example: how big is my queue — do I have 200 containers available or do I have 100 containers available? I can potentially generate 200 splits from the 50 GB I have, or 100 splits, depending on how many containers I am likely to have. And by taking information from the left-hand and right-hand sides independently, it can figure out how many join tasks I actually need to run, again at runtime. Now, we put safety bars on this: we let people set a min and a max and let Tez figure out where to land between min and max, because just in case all the data is going to come from the last 10 mappers you ran, you do not want to get hit by a pessimistic assumption. On top of the edge part there are the actual processors, which are the vertices; a vertex has its two ends and a processor of sorts. We have a classical map and a classical reduce processor, which both end up having a very standard input and output. Now, the real advantage of the intermediate reduce over the classical reduce is that its output is not going to HDFS. The output that I am generating from this particular reducer is not going to HDFS, whereas in MapReduce every single output a reducer produces, in the traditional case at least, goes directly into HDFS. Now this is not really a big problem for most people, except when you try to figure out what happens when you have 4 MB of data each from 600 mappers into 600 reducers: you will end up with a very large number of files. So to say it in a different way, I should explain:
everybody's familiar with SQL, right? So there is a SELECT AVG case with a GROUP BY that we're joining with something else. In the traditional MapReduce case, Hive cannot figure out how big the data will be before it actually starts, so to do something like this, Hive has to run that query, then run something on the gateway box you're running from, and then send it back again to the cluster to do the rest. Now this would make sense if you're running it on a single machine which is a gateway machine within the cluster, but for production use cases, for a secure cluster, you are never really given access to the cluster. The admins have access to the cluster; people running jobs don't. So this operation is really slow, because you write to HDFS, come back, decide to do something again, generate splits again, go back to the cluster, ask for resources, and start doing work again. So we have something called a broadcast edge, which describes not whether the data is sorted, but how the data is distributed. The traditional MapReduce edge is called a shuffle; it is called a shuffle because you generate, say, 100 items from each task and hand each item to its reducer, like the merging of cards in a shuffle. What we do in this particular case is a broadcast edge, which provides all tasks with all the data generated. The average we calculated is sent to all tasks, because it's a very small amount of data, instead of forcing a join that actually goes through HDFS. The next important part, where Tez becomes more interesting than a simple DAG scheduler, is that it supports something called vertex manager plug-ins and edge manager plug-ins. The edge manager plug-in lets you decide which task gets which data item, independent of what Tez decides, so the application can write something that will route data depending on information it has. Now this particular one is
actually a join from a sales table to an inventory table, where both are stored bucketed by item. If I know that bucket 0 of store_sales and bucket 0 of inventory are the only ones that need to join with each other, then, having divided by 100 buckets, only 1% of the data needs to be read by a single task to do the join. So it lets me do something I couldn't do before — a map join with distribution — and be able to say: duplicate this data to these different places. Something similar used to exist in Hive before as well. The biggest pain it had was that after you decided bucket 0 would be processed by one task, it didn't matter how big bucket 0 was: if bucket 0 is 200 GB, 200 GB is going to go through that one task. So it was potentially slower to do it the fast logical way than the slow logical way. With this, Tez basically says: from inventory, I'll take bucket 0 and give it to everybody who is processing sales bucket 0. So store_sales suddenly became a distributable problem; that 200 GB suddenly became distributed, and in general, taking a slow problem and distributing it makes it faster — taking a slow thing and making it faster and distributed is always a win. Being able to write these custom edges is something that no other DAG framework in popular use supports, and this is a big win for things like SELECT * WHERE x < 10 LIMIT 10. You want to run the tasks only until you have collected 10 results globally; you do not want to go through all the data, collect 10 results from every task, and then finally decide to keep only the first set. You want to exit when you hit the limit: the very first task finds 10 records, you exit, and you do not go over the remaining 200 GB. Being able to put in fast exit paths is why edge managers and vertex managers make this very, very flexible. The other part that most people don't know about — and this is a SQL-ish aberration that Hive has — is that Hive allows you to run one
query and insert into multiple outputs. This basically means that if you have a very large table, you can calculate many, many different queries on the same large table in one pass. For most reporting use cases, you will have today's data and you want to generate 15 different reports. The way most people do it in the traditional SQL world is to run a query that goes over the data once, and then do that same operation 15 times. What this lets you do is go over the data once, generate those 15 outputs, and send them to independent reducers. This works because we are not limited to one reducer for a given mapper; we've broken away from that limitation. So it lets you do things far more cheaply, because you do not have to write to HDFS in the middle. And I might be banging on about why HDFS is slow for a while, but HDFS is not slow because HDFS is bad. HDFS is slow because HDFS offers very strong guarantees: if an entire UPS for a rack dies, HDFS still has to keep your data. So what HDFS does is that when data is generated and written to HDFS, it writes to approximately three machines. It will try more if any of them fail, but it writes three replicas, and one of those replicas is guaranteed to be in a different UPS or rack section from the one you are currently in. Now, none of these delays matter at small scale, because if you have 20 machines in one rack — or alternatively, if you're on EC2 and you haven't figured out how to spread things across AWS zones —
it is not particularly slow to write to three machines versus one machine. But when you have to cross a rack boundary, or cross two or three rack boundaries, it really starts to slow down. Tez doesn't need such strong guarantees on its intermediate data, because Tez knows exactly what to run to recover that chunk of data. If one chunk of data — the output of one reducer — gets lost, or a machine gets lost, it knows exactly which things to retry. And being able to reliably retry is another one of those hard scalability problems that affects your ability to be really, really fast; but there is no real value in being really fast 90% of the time and not working the other 10%. That is not something you can rely on in a production use case. Then you get into the really exotic things you can do. There are a number of scenarios where you go over the same data multiple times to do different things. Sometimes you might not be able to do all of it in one go: you'll have to go over all the data, then go over all the data again based on the information you collected from the first pass. So Pig has something called skewed joins: if a given date has 100 times the traffic of every other date, and you did a uniform join, whoever gets that date will be horribly, horribly slow. To be able to split that particular date apart into different buckets or different processes, Pig comes with the skewed join. You'll be able to say: most of the time my data is not loaded like this, but make sure that when I do a join, you distribute the keys that carry the largest load. It used to be that Pig would go over the data once, generate a data file, read it back, and then decide how to do the partition-and-join part. What Tez lets us do is put in one-to-one edges that let me calculate an aggregate and hand data to the next stage at the same time.
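The skewed-join planning step can be sketched as a tiny function. This is a hypothetical illustration of the idea, not Pig's actual algorithm: count records per key in a first pass, then ceil-divide so hot keys get split over several tasks instead of landing on one unlucky reducer.

```python
def plan_skewed_join(key_counts, avg_per_task):
    """Return {key: number_of_tasks}; keys with more records than one
    task should handle are spread over multiple tasks (ceil division)."""
    return {k: max(1, -(-c // avg_per_task)) for k, c in key_counts.items()}

# One date carries 100x the traffic of the others:
counts = {"2014-06-01": 1000, "2014-06-02": 10, "2014-06-03": 12}
print(plan_skewed_join(counts, 100))
# the hot date is spread over 10 tasks; the quiet dates keep one task each
```

What the one-to-one edges buy you is being able to compute `key_counts` and feed the next stage in the same DAG, instead of writing a stats file to HDFS and reading it back.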
In fact, we use the information that Pig gives us — saying that it's a one-to-one edge — to make sure that the second task runs on the same machine as the first task, provided it has capacity. The real advantage of that is that the data never moves over the network, which means that as you add more machines to the cluster, it doesn't get slower. If you use a completely connected pipeline system, which moves data from one end of the query to the other, you will end up moving a lot of data over the network. It wouldn't matter so much if your queries were narrowing down the output, but if you pump data in and pump the same amount of data out — like in this particular case, where you're doing a join without any selection — you will end up with a significant number of problems moving data over the network if you just randomly distribute it. Being able to schedule the next task on the same machine, and not having to move data over HDFS, both make this significantly faster for Pig. There is one thing that I didn't talk about: YARN has something called an application master. The reason YARN made things like this possible is that MapReduce, by contrast, had something called a job tracker, which tracked every job. That is a horrible way to build a distributed system, because having one central thing that keeps track of everything happening in a cluster is not very scalable. So YARN splits it into two things: a resource manager, which is generic, and an application master, which is application-specific. Tez is an application master which understands everything I talked about earlier, while the MapReduce application master doesn't have to know anything about any of this.
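The one-to-one placement described above can be sketched as a toy scheduler. This is an assumption-laden illustration, not Tez's scheduler: task i of the downstream vertex prefers the host where upstream task i ran, falling back to any host only when that machine has no free container.

```python
def place_tasks(upstream_hosts, capacity):
    """upstream_hosts[i]: host where upstream task i ran.
    capacity: {host: free_containers}. Returns (task, host, is_local)."""
    placements = []
    for i, host in enumerate(upstream_hosts):
        if capacity.get(host, 0) > 0:
            capacity[host] -= 1
            placements.append((i, host, True))    # local: data never moves
        else:
            placements.append((i, "any", False))  # fall back, pay the network
    return placements

hosts = ["node1", "node2", "node1"]
out = place_tasks(hosts, {"node1": 1, "node2": 1})
# tasks 0 and 1 land locally; node1 is full by task 2, which goes anywhere
print(out)  # -> [(0, 'node1', True), (1, 'node2', True), (2, 'any', False)]
```

The "provided it has capacity" caveat in the talk is exactly the fallback branch: locality is a preference, never a hard requirement that would stall the job.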
So you can have the isolation of two different architectures in the same YARN cluster, because the resource manager is the same for both: both of them need memory, both need CPU, both need disk. Now, the performance gains are mostly due to the fact that we are now doing exactly what Hive and Pig want; we're not doing stuff that MapReduce has that Pig has to try to use anyway. One of the few good things that happened in this process is that people from the Hive and Pig communities worked very closely with us, telling us what they were missing in MapReduce and telling the Tez community what to build. That close interaction — trying to describe what Pig and Hive and Cascading have to do — has made the DAG expression about as close to the best-case scenario for all these applications as we could get. Now the next part is efficient use of resources. There are a lot of things you could do differently if you knew what you were going to do later. The co-scheduling I described earlier was: I know I'm going to read this data in a few seconds, and I know this is the machine I will send that task to. In MapReduce, I would let go of that machine when my task completed on it. Tez, because it knows you are going to do something later, can send the new task to that machine and have it waiting there before all its dependencies accumulate. So if you have a one-to-one edge, it will finish one task and start the next task if there are no pending tasks for the first stage. Being able to use global information to do task scheduling makes a lot of these things faster, if you pay attention to all the details. The next part is being able to change, at runtime, the plan you made. In MapReduce, after you decide what your MapReduce plan is going to be,
you cannot change it once the run starts. After it starts, you may realize you could have done something better based on information you got from the first phase of tasks, but there is no way to influence a reducer after you start the map. As you saw earlier in the edge manager and vertex manager cases, we have a plug-in mechanism which lets you control parallelism, edges, and scheduling for each vertex independently of the previous vertices. The predictability of this has a very different angle to it as well. Usually, if you're submitting a single query, all of it is going to run as the same job in Tez, and the reason we can assume that is because it's all going to run as the same user, in the same security context, in the same queue. If you had to run three separate jobs, every time a job finished you'd have to give up all the resources you got for that job and pick up new resources, and for that you'd have to enter a queue where you're behind everybody else. Imagine running a three-stage process where, after your first two stages, somebody else runs their first two stages. This is not so bad for utilization of the cluster, but it is really, really bad for your query and for theirs. It would be better for both of you to get out of the cluster sooner, by making sure that whatever resources you need for the next stage are kept from the resources you collected for the previous stage. On top of that, within a stage, within a DAG, we actually reuse JVMs as well. People say Java is slow, right? And it's slow if you hit the GC; if you allocate in a for loop, it is slow. But Java is not slow after the JIT kicks in. The problem is that for the JIT to kick in and gather enough information to figure out what you're doing, you need something that runs for a few seconds, and you usually do not get tasks that run for a few seconds.
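The runtime parallelism decision with min/max "safety bars," mentioned earlier, can be sketched as a tiny chooser — a hypothetical stand-in for what a vertex manager plug-in decides once it is on the cluster, with made-up numbers, not actual Tez logic.

```python
def choose_parallelism(input_bytes, bytes_per_task, available_containers,
                       min_tasks, max_tasks):
    """Pick a task count from observed data size and available containers,
    clamped between user-supplied min and max safety bars."""
    want = max(1, input_bytes // bytes_per_task)
    want = min(want, available_containers)   # don't plan beyond the cluster
    return max(min_tasks, min(max_tasks, want))

# 50 GB at ~512 MB per task wants ~100 tasks -- if the containers exist:
print(choose_parallelism(50 << 30, 512 << 20, 200, 10, 150))  # -> 100
print(choose_parallelism(50 << 30, 512 << 20, 40, 10, 150))   # -> 40
```

The min bar protects you from the pessimistic case the talk mentions, where nearly all the data turns out to come from the last few upstream tasks and a naive estimate would under-provision.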
Obviously, the goal of this is to run things in one or two seconds. If I'm going to run my entire query in two seconds, I obviously get no benefit from the JIT, and that's bad. So what we do is not tie the task to the JVM. We have the container as a completely independent entity from the task: each task asks the application master "what should I do?", does it, and gets out, and the container stays alive. So imagine you're doing a table scan in Hive: the WHERE-clause filter operator gets progressively better as you just keep using it, and you don't lose that information every time a job ends. This is compounded by the fact that if you run HiveServer2 in BI mode, we actually keep a pool of live sessions — a few sessions alive, each holding 10 or 15 containers. So when you fire a query over a very small amount of data, it executes immediately and gets out, instead of spending four or five seconds on setup. And we have code in Tez today which does things like keep one container per node: instead of just saying "keep 10 containers," you can say keep 10 containers, but never have two containers on the same node if possible, and if you have more than 10 machines, make sure all of them have one. This gives you a lot of locality. Not only that — a map join is a very classic case in Hive, where you load up a small dictionary and stream a big table through it, checking whether the keys you find are in the dictionary. An example would be joining store_sales with the store location data: find all stores in this particular state, then find me all sales that happened in that state. The way MapReduce Hive used to work is that it would start up, load this dictionary, go through the big table, and then exit.
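The map-join pattern just described can be sketched in a few lines. This is an illustrative Python model, not Hive's implementation: the small side is loaded into an in-memory dictionary once, and the module-level cache stands in for a warm, reused container's memory surviving across tasks of the same vertex.

```python
def map_join(big_rows, small_rows, _cache={}):
    """Stream the big side through a dictionary built from the small side.
    The (deliberate) mutable default dict models a reused container: the
    dictionary is built on the first call and kept for later calls."""
    if "dict" not in _cache:
        _cache["dict"] = dict(small_rows)     # loaded only once
    d = _cache["dict"]
    return [(k, v, d[k]) for k, v in big_rows if k in d]

stores = [("s1", "CA"), ("s2", "NY")]         # the small dictionary side
print(map_join([("s1", 100), ("s3", 7)], stores))  # -> [('s1', 100, 'CA')]
print(map_join([("s2", 50)], stores))         # second chunk: no reload
```

Without container reuse, every chunk of the big table pays the dictionary-loading cost again; with it, only the first chunk does.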
What Tez does is ask: am I going to run a different chunk of store_sales before I exit? If Tez knows it is going to scan through store_sales again, it will not drop the store data, so when the next chunk comes in, you don't have to go through the effort of loading the dictionary back. Only when the vertex changes will it drop this data. So while the map vertex runs, until the map is done, all this data is kept in memory. This basically means that when Map 2 runs, the next Map 2 task is faster than the first, because it doesn't have to load any data into memory. The core engine has more things than what I described. It does priorities; it lets you override a ridiculous number of things, to the point where it is actually hard to use. But for the general cases we have figured out — vertex managers, edge managers, inputs, outputs, and processors — we have built a library of pieces that you can compose in any way you want. Now, most of what I talked about had to do with how we designed this and what tradeoffs we made while building it, so here is a before-and-after. This is not a trivial query; it is a large number of joins on 10 TB of data, on 20 nodes, each with 64 GB of RAM — not really a big cluster — and four HDFS disks each (I wanted six, but fine). Query 68, at this point, never finished in MR after seven hours. I think it scales, but the improvement we've gotten over this period has come significantly from being able to express what you want as a DAG, and from being able to run it really, really fast based on all the runtime improvements I talked about. If the runtime gave you 2x and the plan gave you 10x, that is actually 20x together, even though neither of those two things alone sounds that dramatic; people don't usually multiply the two together.
We spent a lot of time with Hive, figured out what Hive needs, and then looked at Pig, because obviously Pig has very similar requirements that MapReduce never satisfied. Now, Pig did the simplest thing possible: Pig took its work plan and just converted its map and reduce phases directly into Tez. It didn't do any of the complicated edges; it didn't do any of those things at first. So most of the improvements you're seeing do not even use the complicated machinery. And you might notice that all the work we did to make queries run in 4 seconds can also be reaped by queries that run in 4,500 seconds — and a 3x gain at that scale means tens of thousands of servers not being bought. So Yahoo did this particular benchmark where they tried to figure out whether to use Tez or some alternative to MapReduce, and they found that Tez was the only thing that reliably scaled to 300-400 nodes and handled tens of TBs of data, beyond the obvious Hive and Pig cases. The next common use case people bring up is that MapReduce really, really sucks for iterative algorithms. If you want to go over data and run k-means clustering, MapReduce is really horrible, mostly because you have to go to the beginning of the queue, ask for containers, get a JVM, and lose all context every time you exit a stage of your loop. So the Pig guys built k-means, not by unrolling the loop, but as a very heavyweight loop: load data, run job, decide whether to run the next phase, run job, run job, run job. It's not the most effective way of doing iteration, but even with such a simple approach we got something like a 14-15x improvement. Now, the one-to-one edges I talked about earlier basically let you do iteration even faster: they let you take your entire iterative job and not have to wait for the first stage's reducers to finish.
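The iteration argument above comes down to simple arithmetic, sketched here with made-up costs: in the MapReduce style, every k-means pass pays the full startup cost (queue wait, JVM spin-up, data load), whereas with a reused session only the first pass does.

```python
def total_iteration_cost(num_passes, pass_cost, startup_cost, reuse_session):
    """Toy cost model for an iterative job; all units are arbitrary seconds."""
    total = 0
    for i in range(num_passes):
        if i == 0 or not reuse_session:
            total += startup_cost   # queue + JVM spin-up + dictionary load
        total += pass_cost          # the actual k-means pass
    return total

# 10 passes, 5s of real work each, 20s of startup overhead per cold start:
print(total_iteration_cost(10, 5, 20, reuse_session=False))  # -> 250
print(total_iteration_cost(10, 5, 20, reuse_session=True))   # -> 70
```

The more passes your algorithm needs and the shorter each pass is, the more the startup term dominates, which is why iterative workloads were the loudest MapReduce complaint.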
You can just run through a vertex group of 10 reducers before you exit, right? You can actually chain your reducers with one-to-one edges, or tasks with one-to-one edges, and make sure that the first one to reach the end leaves the cluster free for the other ones to do their work, right? Not being a selfish thing that does a land grab in a cluster is an important thing for people who run multi-tenant systems. If you spun up a cluster for every job you ran, you wouldn't need most of the stuff that is limiting us from scaling very, very far; but something that does a land grab in Hadoop, takes up hundreds of GB, and stays resident until the last thing exits is a very, very bad way of getting multi-tenancy and utilization. So one of the key things that we did in Tez was to decide that we are building this for really busy, big clusters. We are not building this for a cluster which is going to have only a few boxes, and we are not going to connect from every box to every box, right? If you have 20 boxes, 20 squared connections is not a lot; 400 squared connections is a lot, right? The other thing is that this was designed to work better as the cluster gets busy; it is designed to work better than MapReduce as the cluster gets busy. That is because, if you have a four-stage operation, you do not go to the beginning of the queue every time each stage finishes. The next thing is, if you have intermediate output between jobs, you're not writing it over the network, you're not writing it across racks, so you have less overhead for that operation, and you are not bothering anybody who actually needs to move data across, right? And more importantly, if you run a significantly large cluster, you will end up having a NameNode which uses a significant amount of memory, and that memory is tied to how many files you have. So the fewer files you write to HDFS, the better HDFS performs from the NameNode perspective.
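The "every box to every box" point is just quadratic growth: in a worst-case all-to-all shuffle, the number of connections grows with the square of the node count, which the talk's 20-box vs. 400-box numbers illustrate:

```python
def all_to_all_connections(nodes):
    # Worst-case fan-out if every node opens a connection to every node
    # (including itself, matching the talk's "n squared" framing).
    return nodes * nodes

print(all_to_all_connections(20))   # -> 400: not a lot
print(all_to_all_connections(400))  # -> 160000: a lot
```

Going from 20 to 400 nodes is 20x the boxes but 400x the connections, which is why a design that assumes full connectivity stops being viable on big, busy clusters.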
So the fewer small files I write, the better it is, and this basically helps me do that. And as the container reuse example showed, MapReduce traditionally used to work even if you threw hundreds of terabytes of data at it; this is designed to actually work better and better as you throw more data at it, because of the container reuse and the shuffle performance improvements we did. Plus, if you do not have that much data, if you had like 10 containers' worth of data, you had the BI session pool lying there just to run those 10-container jobs, but those sessions are not limited to 10 containers. They can grow to 200 and shrink back to 10, right? Being able to grow and shrink elastically is very, very hard if you do in-memory work, right? Because when you kill a container, what do you do with the data it has in memory? You either recompute it later or you lose it. Finally, for things like map joins in Hive, we moved a lot of work away from the gateway boxes, which are the boxes you log in to, or, for people running Hive, HiveServer2; we moved that work into the cluster. Now, it doesn't actually speed things up a lot, but what it does is prevent your HiveServer2 from dying from load, right? You could run like one or two gateway boxes and they wouldn't die. And that is a big win if you actually have heavy production traffic at the same time from different people. So the real problem in a busy cluster is: what if you only get 50 containers and you have a thousand tasks, right? So what we do is we pack everything as tightly as possible into the same containers. So this is container number 9, which is running all these tasks one after the other without relaunching the JVM, right? To actually do this, we had to go through a lot of QA to find code where people had made assumptions that they could allocate memory and not free it. Thankfully Java is easy to debug for that, because the GC very clearly shows which objects are referred to by whom.
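The "50 containers, 1000 tasks" situation can be sketched as simple round-robin packing: each container is a long-lived JVM that runs many tasks back to back instead of being torn down after each one. This is an illustrative simulation, not the actual Tez scheduler (which also considers locality and priority):

```python
from collections import defaultdict

def pack_tasks(num_tasks, num_containers):
    """Assign tasks round-robin to reusable containers (the JVMs stay up)."""
    assignment = defaultdict(list)
    for task in range(num_tasks):
        assignment[task % num_containers].append(task)
    return assignment

plan = pack_tasks(1000, 50)
print(len(plan))       # -> 50: every granted container is used
print(len(plan[9]))    # -> 20: container 9 runs 20 tasks, one after the other
print(plan[9][:3])     # -> [9, 59, 109]
```

The win is that container 9 pays JVM startup once for its 20 tasks instead of 20 times, which is exactly why reused containers had to be QA'd against code that leaked memory between tasks.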
But after doing that, this was a huge win, because in a gang-scheduling system, if you need 10 containers' worth of memory, you wait for 10 containers before saying, okay, you can load data. Being able to say that I will run with nine containers, but only a little more slowly, is a bigger win for production use cases than waiting for 10 containers to become completely ready. So adoption is currently Hive, Pig, Cascading; and the good thing about Cascading is that once Cascading supports it, we automatically get Scalding and Scala support, which will help a lot of the iterative ML use cases I talked about earlier. And a bunch of commercial vendors are trying to use this instead of MapReduce, because I think for hundreds and thousands of TB of data, for a petabyte warehouse, this is one of the ways you will be able to survive in a busy cluster. And we have a bunch of things coming up. We are building an in-memory HDFS layer. We are building out all the pieces we have built, the edges and things like that, so that when people come up with a new use case that we haven't run into, we will try to generalize it and add it to this. We are planning to improve the core scheduling and preemption: basically, what happens when a cluster is 100% occupied, and how to let go of the right thing. And there's a huge community behind it; you can send mail to the mailing list. This week we got out of the incubator. And that is it; any questions? We have time for about two or three questions, three at most. Anybody who has questions, raise your hands; now is the time to ask. Oh, how is it different from Apache Spark? So Apache Spark is not just a data processing system; it is actually something that combines data and processing into one thing. It has RDDs, which hold data in memory, and code that runs over them, right? That works if you have a workload that fits entirely in memory for you.
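The gang-scheduling trade-off above can be put in rough numbers: waiting for the tenth container to be granted versus starting immediately on nine and running proportionally slower. The figures are made up, purely to show the shape of the win on a busy cluster:

```python
def gang(work, containers_needed, wait_for_last):
    # Gang scheduling: block until ALL containers are granted,
    # then run at full parallelism.
    return wait_for_last + work / containers_needed

def elastic(work, containers_granted):
    # Elastic start: begin with whatever was granted; just run slower.
    return work / containers_granted

WORK = 900.0  # container-seconds of work, illustrative only

print(gang(WORK, 10, wait_for_last=60.0))  # -> 150.0: blocked on the 10th grant
print(elastic(WORK, 9))                    # -> 100.0: 9 containers, no waiting
```

On an idle cluster the wait is near zero and gang scheduling wins slightly; the busier the cluster, the longer the wait for that last container, and the bigger the advantage of running slightly degraded right away.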
Okay, so what is the USP for Tez? So I can say this: everybody who was benchmarking against their own product marked Tez as number two, which should tell you that no matter which use case you throw at it, it'll be good. But being number one at the expense of affecting production is probably a bad idea, if you really want to scale. Yeah, so there's a question.