I just want to say that, to make the presentation lively, I've used a few cartoonish characters. I hope that's OK. For those more inclined to code, there's a bunch of code, and if there is time at the end, I'll probably show a demo. But because it's a short talk, I've pasted the code inside the slides, so I hope that's OK. The second point I want to make is that I'm going to be comparing what I'm presenting with other products out there, but by no means am I making them less significant. They are all useful; every tool has its place. I'm doing the comparison only to highlight what I'm presenting. So with those two points out of the way, let me start. So, timely dataflow. This is a project written by Frank McSherry. If you go to the GitHub page, it reads as a low-latency, cyclic dataflow computational model. Basically, it's a stream processing library for your graph. So what's the agenda? This is a topic I picked out of my own passion, so I'm going to give my perspective of what it is and answer the prime directive questions: what does it do, how does it do it, what's in it for me? Then you're going to see some code, and hopefully you'll have some takeaways by the end. So you can ask this question: there are tons and tons of stream processing systems out there in the market. You have Spark, you have Flink, there is Beam coming up. Why bother with another stream processing platform? This is my first reason. There is this world of academics who produce brilliant papers, and there is this world of enterprise software that solves real-world problems. Timely dataflow sits in that intersection: it started as an academic research paper, but instead of it just being a paper, the authors have taken the pains to implement it as a working solution that people can adopt. So it's in that intersection.
Second, it's not intrusive. It's a very thin library, so it can coexist with whatever else you have, which is very critical if you want to play around. And third, I don't know how many of you are aware, it's actually written in Rust, which is a new language from Mozilla. So if you're a geek and you want to try out a new language, it's an awesome opportunity. That's my answer to "why bother". So what is timely dataflow? This will sound like buzzword bingo: it's a real-time, distributed, data-parallel library, and hopefully by the end of the talk I'll have justified what I'm claiming. I just want to give you a perspective of what the stack looks like. At the bottom of the stack is Abomonation, and no, that's not misspelled; it's an abomination by design. Any data framework needs some way to serialize data from one machine, one process, one thread to another, and it so happens that in Rust you can get very fast serialization. But the author has gone a step further and written unsafe code just to make sure it is as optimized as possible, hence the name. And it's isolated in this one layer; the rest of the stack is all safe Rust code. The network serialization is not something you have to tweak, but it is something you should be aware of: for example, to stream a vector from one process to another, it essentially copies the raw bytes, and the deserialized vector is backed by that byte buffer, so the vector basically becomes unmodifiable. That's OK if you use it that way, but it's something you have to be aware of. On top of this layer sits timely dataflow itself. People would recognize the picture as a DAG. Yes, it's a DAG, but it's also a special one: how many products out there let you form a directed graph that has a cycle in it? When I say cycle, I mean a loop. In timely dataflow you can model such a graph, because of its characteristics.
On top of this, there is another layer called differential dataflow. I'll come back to that later, but the hypothesis is that people are very good with collections: given a bunch of things, I can write a program to transform my domain into my range. Functional programming handles that pretty well. But there are not a lot of options out there that can handle the case where the collection changes, which is your stream. That's what differential dataflow is for. So what will you do? You will write your application on top of these layers. You could use differential dataflow, you could use timely dataflow; it's very unlikely you will drop down to the network layer. So you can ask me: what do I get in return? Why take the pain of learning a new language, and learning something from academia that is not enterprise-ready? You get as much low latency as you possibly can, and that really matters. You also get as much high throughput as you can. If you have worked with such systems, you know it's quite hard to achieve both: you can have low latency but compromise throughput, or high throughput but compromise latency. Here you can push as far as possible in both dimensions. And the claim comes with numbers. This is what I mentioned about other systems; the numbers are public, from the paper called COST. COST stands for the Configuration that Outperforms a Single Thread. You have big data systems out there running on clusters of 128 cores; can such a system outperform one single thread? That's the question the paper tries to answer, and you can see the numbers out there: real numbers from real systems. Spark on 128 cores, and you see the figures; I don't have to read them out. At the bottom you have timely, both on a single core and on the big cluster.
So that's the performance you can expect, mainly because of the care you take in writing a program in timely dataflow. It's not out of the box; you have to handcraft most things, and that is what you get in the end. I also mentioned this: if your DAG needs loops in it, there are not many options out there that let you model them. All right, so if you're now convinced that you want to try out timely, what do you do with it? You get a handle to the root. You write your program as a closure, and the timely dataflow library takes that and runs it. Then there is the concept of a scope: think of a scope as the context in which the dataflow happens. You can create as many scopes as you want. Once you have the scope where the flow happens, you express your directed graph, your processing nodes, inside the scope. It's the scope that actually makes loops possible: if you look at the picture, there is an outer scope with a bunch of operators, and there is also an inner scope, because that's where the loop happens. And then you have the concept of an operator; everything happens in an operator. There are the predefined operators you would expect: map, filter, concat, inspect. There are also a few exotic operators, and I do mean exotic: these are the operators that make loops possible. And if you think these operators are not enough, because what you have is unique and you want to write something specific, you can implement your own operator; those are the unary and binary generic operators. You can write your own logic inside an operator and express that in the graph. So let me try to justify the distributed, data-parallel nature, and I'm going to do that with a Hello World.
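Before the Hello World, the shape of the idea, building a graph out of operators and pushing a stream through it, can be sketched in plain Rust. This is a toy model I made up for illustration (the `Stream`, `Operator`, `unary`, and `run_graph` names here are not the timely dataflow API):

```rust
// Toy model of a dataflow: a stream is a Vec, an operator is a boxed
// closure from stream to stream, and a graph is a list of operators.
// This sketches the composition idea only; it is not the timely API.
type Stream = Vec<i64>;
type Operator = Box<dyn Fn(Stream) -> Stream>;

// A generic "unary" operator built from user logic, in the spirit of
// timely's generic operator builders: return Some(y) to pass y on, None to drop.
fn unary<F: Fn(i64) -> Option<i64> + 'static>(logic: F) -> Operator {
    Box::new(move |input| input.into_iter().filter_map(&logic).collect())
}

fn run_graph() -> Stream {
    // Build a tiny graph: a "map" (double) followed by a "filter" (keep > 4).
    let graph: Vec<Operator> = vec![
        unary(|x| Some(x * 2)),
        unary(|x| if x > 4 { Some(x) } else { None }),
    ];
    // Push a stream through the graph, one operator at a time.
    graph.iter().fold(vec![1, 2, 3, 4], |s, op| op(s))
}

fn main() {
    println!("{:?}", run_graph()); // [6, 8]
}
```

In the real library the graph is built inside a scope and executed by workers, but the mental model is the same: user logic wrapped in operators, composed into a graph.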
So let's say you have two workers, and these two workers are going to print from 0 to 10. That's what they're going to do, and that would be your code. It's in Rust, so if you're new to the language it's a little hard to digest, but this is the part you need to care about: you have a scope, you're creating a stream where all you do is inspect what comes in, and there is an input on which you can send the numbers. And what do you get? If you send 0 to 10, it will print 0 to 10, but that has nothing to do with parallelism yet; there is no way to partition the data. So if you want to say that all 0s go to one worker, all 1s to the other, and so on, how do you do that? There is an operator called exchange. What it basically does is let you write a function that partitions your data, so the data can be shuffled. These workers can be anywhere: inside the same process, or outside the process on a different machine, not a problem. Exchange lets workers exchange data so that you can partition it. So you put an exchange operator at the end, like that, and you simply say: whatever number comes in, that number is what I'm partitioning by. What happens then? All 0s end up on one worker, all 1s on the other. But you still have an issue: if you look closely, things can go out of order. If you expect 0 to 9 to be printed in order just by partitioning, that's not going to happen, because you're only partitioning; there is no coordination. That's where the fourth operator comes in. I think by now you get a sense of what I'm trying to emphasize: you build a graph with operators, and there are built-in operators to do the basic things.
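The partitioning function can be sketched like this. The closure you hand to exchange maps each record to a number, and the record lands on worker `number % peers`; the `route` function below is a made-up name standing in for that routing step, not the timely API:

```rust
// Sketch of exchange-style routing: the user supplies a key per record,
// and the record goes to worker (key % number_of_workers).
// This mimics the idea, not timely's actual implementation.
fn route(key: u64, peers: u64) -> u64 {
    key % peers
}

fn main() {
    let peers = 2; // two workers, as in the example
    for record in 0..10u64 {
        // With a key function like |x| *x, the record itself is the key:
        // all even numbers go to worker 0, all odd numbers to worker 1.
        println!("record {} -> worker {}", record, route(record, peers));
    }
}
```

Note this only decides placement; it says nothing about ordering, which is exactly why the output can still interleave.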
So what probe lets you do is progress in lockstep: probe lets you proceed only once all operations of the previous step have completed. And that's roughly how you would write a probe. Basically what happens is this: every worker, once it is done, indicates that it is done with this job, and things move on. What's very interesting is that unlike other systems, where you either synchronize or you don't, in timely the probe is very flexible. For example, if I have a bunch of nodes and not every node has the same capacity, some nodes will run ahead of the others. If you want to leverage that processing power, you can build some slack into the system: say I'm OK with being two steps out of sync. You can express that right in your graph. Just by declaring that you're OK with the probe being two steps behind, you can say minus 2 on the input time. Then your job will not be fully synchronous, but you will get more throughput, because you're leveraging the faster nodes. So much for timely; on to differential dataflow. Let's say you have a domain of some values and you wrote a program to translate that into a range of some values. What if the input changes? One dumb way is to recompute the result for every change. That's the most straightforward way, but not optimal, and very costly: if you have a heavy computation over a lot of data, it's going to take time. Instead, if you express your operations as a differential dataflow, then when the input changes, your computation automatically catches up with the change, and you don't have to do anything extra. Once you express what you want in the differential dataflow operators, the result is kept up to date automatically. You can ask me: isn't this what view updates, materialized views, do in the SQL world?
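The lockstep-with-slack idea from the probe discussion can be sketched as a simple progress check. This is a made-up model (the `may_proceed` function is mine, not timely's frontier API, whose real mechanics are richer):

```rust
// Sketch of lockstep progress with slack: a worker may run at most
// `slack` steps ahead of the slowest worker's completed step.
// Toy model only, not the timely probe/frontier API.
fn may_proceed(my_step: u64, min_completed: u64, slack: u64) -> bool {
    my_step <= min_completed + slack
}

fn main() {
    let slack = 2; // "I'm OK with the probe being two steps behind"
    // The slowest worker has completed step 3: a worker at step 5 may
    // proceed, but a worker at step 6 must wait.
    println!("{}", may_proceed(5, 3, slack));
    println!("{}", may_proceed(6, 3, slack));
}
```

With `slack = 0` you get strict lockstep; a larger slack trades synchrony for throughput, as described above.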
Yes, in a sense, but here you can do it on a graph, and on a graph that has cycles, so you get all those benefits too. So what does differential dataflow have? It has almost all the operators of timely dataflow, plus a few more. The essence, the secret of keeping the output in sync with the input, is what's implemented inside differential dataflow: it's called Möbius inversion. If your collection is just integers, then the diff is easy; but if you have pairs of things, it gets difficult, and that's what has been solved in differential dataflow. Just to give an example, let's say you have a graph G with n nodes and m edges, and you want to do a breadth-first search. What would you do if a node were randomly added or an edge randomly removed? You pretty much have to recompute over the whole graph again; there is no way to know whether one node can still reach another if edges are changing and nodes are being added. But if you express this in differential dataflow, you can actually make it work, and it's also very performant. This is another public number, from the BigDatalog paper at SIGMOD 2016, comparing systems solving reachability: those are their numbers, and that's the number for differential dataflow, which is pretty interesting. But what's more interesting is that while the other systems don't let the input change, in differential dataflow you can change it and get a response in microseconds. That depends on how much change you're making to the graph: in a reasonably dense graph, if you cut just one node or one edge, not a lot is going to change, so the response time scales with how dense the graph is. I just want to say that this is not the first project of its kind.
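To make the "output catches up with the input" idea concrete, here is a toy diff-based aggregate: instead of recomputing from scratch, the result is maintained by applying (key, +1/-1) changes. This is only a sketch of the differential idea under my own made-up `apply_diffs` helper, nothing like the real differential dataflow API:

```rust
use std::collections::HashMap;

// Sketch of the differential idea: instead of recomputing an aggregate
// from scratch, apply (key, +1 / -1) diffs and keep the result current.
// Toy model only, not the differential dataflow API.
fn apply_diffs(counts: &mut HashMap<&'static str, i64>, diffs: &[(&'static str, i64)]) {
    for &(key, delta) in diffs {
        let c = counts.entry(key).or_insert(0);
        *c += delta;
        if *c == 0 {
            counts.remove(key); // a fully retracted key leaves the collection
        }
    }
}

fn main() {
    let mut counts = HashMap::new();
    // Initial collection: {"a": 2, "b": 1}.
    apply_diffs(&mut counts, &[("a", 1), ("a", 1), ("b", 1)]);
    // The input changes: one "a" is retracted, one "c" arrives.
    // Only the touched keys are updated; nothing else is recomputed.
    apply_diffs(&mut counts, &[("a", -1), ("c", 1)]);
    println!("{:?}", counts.get("a")); // Some(1)
}
```

The work done is proportional to the size of the change, not the size of the collection, which is why a single edge cut in a dense graph can be answered in microseconds.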
The base project it builds on is called Naiad, from Microsoft Research, but for some reason Microsoft shut it down, and hence the author went ahead and reimplemented it in Rust. It was also reviewed on The Morning Paper, if you follow Adrian Colyer; he reviewed Naiad some time back. So you can ask me, what would I use it for? If you have a graph workload that requires low latency, you can use it. It's not a very good fit if you want to do something like machine learning. If you have something in Python and you want to integrate, I've not done this myself, but Rust is very embeddable, so you can embed Rust in your Python flow if you want. Questions? I also have some appendix slides, but I'm good. [Audience question about keeping state.] Yes, you can. In all the operators, and I didn't show this because it's a short talk, you are free to implement things however you want, so you can keep state in memory, or in whatever form you want. What's very nicely implemented in timely dataflow is the concept of vector times: that's what probe actually gives you, so you know for sure that all data for a given time has been processed before you move on. That's what probe lets you control, so you are basically solving data completeness and all of that. [Audience question.] Oh, OK, sure. Yeah: does it coexist with the rest of the ecosystem? Is the reading part directly from a socket, or can it coexist with queues, maybe HDFS, or other products out there? Yeah, so that's something to know: there are no connectors, if that's what you're looking for; no ready-made connectors out there. It's basically TCP, and you just write to it. You can embed it: for example, if you have an HTTP server, you can embed this in there and make them talk, but you have to write Rust for that. And as I said, you can embed one language in another; Rust is very much embeddable in any other language.
So you can do that as well. Thank you. [Audience question:] You said you started with Rust and enabled unsafe code; why not C++, its close counterpart? OK, so the question is: there is unsafe code in the Abomonation library, why not C++? That's really a language question rather than a timely dataflow question, but I'll answer anyway. If you're from a systems programming background, the ownership model and compile-time safety you get in Rust outweighs C++. For pretty much any new system being built out there, Rust is a very shiny option, as opposed to C++. [Audience question:] Any plans to go back to C++, for this project? I don't think so. There is a talk at the San Francisco Rust Meetup; the author loves Rust, and I don't think he's going back. The first implementation was in C#, and the performance numbers, the ones you saw on the left, were actually from the C# implementation. So it's not so much about the language as about the concept; it could be in any other language. [Audience question:] Can you tell me about the exchange operator you were using? How does it shuffle, what optimizations happen, is it spilling to disk, what exactly happens? No. So the question is, what's happening inside the exchange operator? Exchange is just a function; it's up to you to partition your stream of data in that function. There is hardly any state. Unlike other systems, because everything here happens in the micro- and millisecond range, there's not a lot of state, and not much reliance on fault tolerance at this point in time. So if you ask me, is there high availability? No. Is there any state maintained in exchange? That's up to you: if you want to keep state in there, you can, because it's just a function you're writing. No, no.
Everything happens in memory. OK, cool. Hey, thanks.