So this one is going to be a beginner talk; I've kept it quite simple. I'll talk about the basic concurrent and distributed programming constructs which you get in Julia out of the box. Stefan talked about this in the morning: tasks, also known as coroutines or cooperative multitasking, are present in Julia. That's one set of APIs which I'll cover. And the second is distributed workers, some examples of which we just saw. The thing with Julia is that the base distribution comes with a reasonable amount of support to run Julia code across many machines, and there are packages which enable you to run on Amazon. We have had demos where we have spun up thousands of cores in a matter of minutes and then released them to the cloud again. So I'll just cover both of these approaches.

Tasks in Julia are backed by a single OS thread of execution. There is work on multi-threading, which is a work in progress being worked on in a branch; at this time, I don't know how it will map onto the task infrastructure itself. What we have today is basically a libuv-backed event-driven I/O mechanism which multiplexes all your I/O tasks, that's network and file I/O. The other useful thing for tasks is the implementation of timers for background tasks. Since it's cooperative multitasking, there's no preemptive scheduling; so if your task is CPU bound, nothing else runs. But it's good in a lot of ways: it simplifies your code, and you don't have to deal with locks and a whole bunch of that kind of stuff. The high-level API, again, we saw in the morning: @async and @sync. I'll just talk about it a little more. Typically, the @async block takes an expression. So as an example here, you want to collect the results of some URLs you want to download. The @async block can do the actual fetch, which is a network call, which allows other tasks to run. And then you process it; the processing, of course, prevents any of the other tasks from running. And then you push the result and collect it.
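The URL-collection pattern just described can be sketched like this; `fetch_page` and `process` are hypothetical stand-ins for a real HTTP download and your own processing code:

```julia
# Hypothetical stand-ins: a slow "network" fetch and some CPU-bound processing
fetch_page(url) = (sleep(0.1); "body of $url")
process(body) = uppercase(body)

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
results = String[]

@sync for url in urls
    @async begin
        body = fetch_page(url)         # I/O: this task yields here, others run
        push!(results, process(body))  # CPU-bound: no other task runs during this
    end
end
# @sync blocks until every @async task started inside the block has finished
```

Because the tasks all run on one thread, the three fetches overlap in time, but the `process` calls run strictly one after another.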
For task communication, there is the older model of produce/consume, where two or more tasks can sort of work in lockstep. The producer task could be reading off a database or a network or a file system, or it could be generating jobs within the same Julia process. produce blocks till a consumer needs the value; it is like a queue of length one. And you could have one or more consumers who would consume whatever value is produced from task t, and they block till the producer has a value. So it is sort of a synchronization mechanism in that sense. Julia 0.4 has support for channels, and the channels are type aware. As an example here, you create a Channel of Int64 of size 1000. So this channel can store objects of type Int64, that is 64-bit integers, and the maximum size is 1000. Some of the calls will block if the size of 1000 is reached: anybody trying to add to the channel with a put! call, that task will block. So that was about tasks.

Then for multi-processing, we have the same model for leveraging multiple cores on a machine or distributing computation across machines. The user API in this case is more of a remote function execution model, as opposed to message passing. If people are familiar with MPI, there you are sort of shipping data around, and the same program runs at every node; whatever data each one receives, it works on. In this case, it is more of a remote call. The low-level APIs are like remotecall: on a particular process, execute function f with a variable number of arguments. The function f can be a closure which you create there, an anonymous function, or a function in a module which you wrote. remotecall basically just executes a function and gives you a handle to the result; it's like a future. remotecall_fetch would block till the function finishes execution and returns the result. And you've got a couple of macros: @spawn takes an expression and runs it on a worker.
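A minimal sketch of the typed, bounded channel just described (this constructor syntax also works on current Julia):

```julia
ch = Channel{Int64}(1000)   # typed channel; holds at most 1000 Int64 values

put!(ch, 7)     # add a value; blocks only if the channel already holds 1000
put!(ch, 8)
v = take!(ch)   # remove and return the oldest value; blocks if the channel is empty
```

Since put! and take! block the calling task rather than the whole process, a channel doubles as a task-synchronization primitive, which is the role produce/consume played earlier.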
It just cycles through all the worker processes. @spawnat runs the expression at the particular process indicated by the pid. Both of these return a future, which we call a remote reference.

The multi-processing model is set up so that the initial process, the one you start Julia with, has got a pid of 1. We call it the master process or the driver process. It launches worker processes via cluster managers. The YARN cluster manager is something Tanmay just demoed. So a cluster manager is responsible for launching workers and providing information on how to connect to those launched workers; those are the only two main things it does. So in this case, we're using YARN: YARN has got APIs for starting Julia processes, so you leverage that, and then you provide information on how to connect all these Julia processes into a cluster. By default, all the Julia workers are connected to each other, but you can specify in the addprocs API that you want only the master to connect to all the workers. The problem in the first case is that if you have a hundred nodes, that's about 5,000 TCP connections, and you start hitting limits like the number of open file descriptors and other system limits. And in some use cases we don't even require the workers to communicate with each other; if you have a problem statement which requires that the master generates jobs for the workers to work on, you can just opt for a master-slave topology. So addprocs is the API which adds workers. There are two cluster managers that ship as part of Base. The LocalManager adds processes on a single node: if you call addprocs with just an integer value, that'll add that many workers locally. And the SSHManager uses SSH to remotely connect to a list of hosts that you specify and launch workers on those machines. Both of these are available as part of Base.
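A sketch of these low-level calls. I've used the current spelling, where this lives in the Distributed standard library and remotecall takes the function first; in the 0.4 of the talk it was all in Base, with the pid as the first argument:

```julia
using Distributed   # on Julia 0.4 these were in Base

addprocs(2)                      # LocalManager: two extra workers on this machine
# addprocs(["host1", "host2"])   # SSHManager: workers on the listed hosts

r1 = remotecall(+, 2, 1, 2)      # run +(1, 2) on pid 2; returns a future at once
fetch(r1)                        # block until the result is available → 3

remotecall_fetch(myid, 2)        # call-and-fetch in one round trip → 2

r2 = @spawn myid()               # macro form: picks a worker for you
r3 = @spawnat 2 myid()           # macro form: run on the given pid
fetch(r3)                        # → 2
```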
There's a package called ClusterManagers.jl which has got support for various clustering technologies, like Sun Grid Engine, or SLURM, which I have used a little, not much.

On top of that, the higher-level distributed APIs we have are the macro @parallel for, which is typically used to distribute a large number of small tasks, and pmap, which would distribute more compute-intensive tasks over your workers. And for remote function execution there are remotecall, remotecall_fetch and a bunch of others. So I'll just show you one. Here what I'm doing is just a coin toss: I simulate a coin toss and run it 10^8 times. The @parallel macro takes in a reducer, in this case just the addition function, and it splits the loop across workers. If you go line by line: if I have got more than one worker, first I'm removing all the workers, just so that I can run this multiple times. We run it without adding any processes, in which case this code runs on the current master process. The first run is just to get the compilation of the code out of the way. In any benchmark you do in Julia, this is one thing you need to be careful about: you run your code once so that all the code paths get compiled, and then you do the actual timing. I just ran this; it took around 4.3 seconds. I've done the same thing again after adding four workers, and that's a pretty good speedup. This is a four-core machine with eight hyperthreads. Hyperthreads usually don't have too much effect on really CPU-bound stuff, and we can try it out. Yeah, there's no change. So four real cores, and that was a pretty good speedup. There will be a lot more examples in the workshop tomorrow, which runs through a lot of good examples. So that was @parallel for, which takes the range and then splits it up across workers. And pmap is like a parallel map: it takes in a function which it will execute on the workers, and you can pass it lists of the data you want to work on.
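The coin-toss benchmark can be sketched as follows. The talk's Julia 0.4 spelling is `@parallel (+) for ...`; this sketch uses the current name `@distributed`, which behaves the same way for a reducing loop:

```julia
using Distributed   # on Julia 0.4, @parallel lived in Base

addprocs(4)   # four local workers, as in the demo

# Warm-up run so compilation does not distort the timing
nheads = @distributed (+) for i = 1:10^8
    Int(rand(Bool))   # one coin toss; (+) reduces the per-worker partial sums
end

# The timed run; with four real cores this should be roughly 4x faster
# than running the same loop on the master process alone
@time @distributed (+) for i = 1:10^8
    Int(rand(Bool))
end
```

With a reducer present the macro blocks until all workers finish and returns the reduced value, so no explicit @sync is needed.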
So for example, pmap with two lists would call the function with parameters (1, a), (2, b), (3, c), and so forth on the workers. remotecall would execute the function with the given parameters on a particular pid. Remote references, when they're returned from the remotecall functions, are more like futures. Typically a remotecall would execute a function on a remote process and store the value in the reference, and a handle to that value is passed back. By default, all these APIs return a handle to a channel of size 1: at any time, it can hold only one value. The remote references themselves can be serialized across processes; they're pretty small, and the actual data is not sent. As for the calls on remote references, the first API is wait: you could execute a bunch of functions on remote workers and then wait on all of them. isready just tells you whether the data can be fetched. There's a possible race condition here, because if you've got multiple processes testing with isready and then doing a take! or a fetch, somebody else may have removed the value in between. So you need to be careful about that when you're using it in the context of distributed computation. put! sets a value into the remote reference; it blocks if the reference is full. take! removes and returns the value; it blocks if it's empty. And fetch just returns the stored value without removing it, and it also blocks if it's empty.

So there are two types of synchronization which you can do. Condition variables, which have basically wait and notify calls. Once you have a condition variable, it is edge-triggered, in the sense that only tasks already waiting on the condition are notified. If you want level-triggered behavior, you need to keep some state, in which case you can use the Channel type: it can be a channel of size one and you can test if there is data in it. So those are the two types of synchronization mechanisms available right now. With 0.5 and the threading work, I'm sure we'll have quite a few more.
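The reference semantics above can be sketched on a size-1 remote store. The talk's 0.4 RemoteRef was later split into Future and RemoteChannel; this sketch uses the RemoteChannel equivalent:

```julia
using Distributed

rr = RemoteChannel(() -> Channel{Int}(1))  # size-1 store, like the default refs

put!(rr, 42)    # store a value; would block if the store were already full
wait(rr)        # returns once a value is available
isready(rr)     # true -- but with several consumers another process may
                # take!() in between, so isready-then-take! is a race
fetch(rr)       # → 42, and the value stays in place
take!(rr)       # → 42, and the store is now empty
```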
The other relevant packages are MPI.jl for traditional MPI-style parallelism. One nice, interesting thing about this package is that you could also use MPI as the transport for regular Julia multi-processing calls, that is, @parallel for and pmap. The typical setup uses TCP sockets, but some of these MPI clusters have their own high-speed interconnects and other technologies, and we can use MPI for shipping our messages. Shared arrays leverage shared memory; they are single node, and the arrays have to be of a bits type. Distributed arrays are for working on arrays which are quite large and can't fit into RAM on a single machine, so they span multiple nodes. Each worker creates the local part, and you would use pmap or some mechanism like that to work on the local part and then fetch the results. And there's the AWS.jl package, which has an API for launching machines on EC2 and starting workers on them; once your work is done you can release the resources. And that brings me to the end of my presentation. Any questions?

On the question of unifying these APIs: I think we can maybe consolidate them. So there's this low-level yield stuff for tasks, and then we have both channels and produce/consume. I think produce/consume is more historical, so that is something we can have a debate about and unify a little bit. Remote references and channels, again, in 0.5 I think there will be some amount of work towards rationalizing that part. But I think at a fundamental level the in-process stuff, that is the tasks, and the distributed computation are just sort of two different models, and each will have its own vocabulary. But a certain amount of simplification, yeah, I guess we can have. My understanding of Erlang is that it's a whole lot of message passing, right? And it's slow. Erlang has its uses in the telecommunications world. No, I'm saying, as in, there's a benchmark I had done quite a while back.
For the same task, for the same load, what could get done using one core with C code took Erlang around 12 cores; everything else was the same. And one of the foundations of Julia is its high performance for numerically intensive tasks. So I'm not saying we can't take the good ideas from Erlang, but there will be some trade-offs before it becomes common for everybody to use. I hear you on that. I'd also like to just say, the MPI crowd, right, the traditional high-performance computing crowd, their concerns are all quite different. The kind of optimization they look at is: while there are network calls going on, can I keep my CPU loaded? So the direction from which they come will probably be different, and that's fine; the cloud has got a different set of concerns. And for a lot of enterprise stuff today, this model is definitely more developer friendly. I think the key is flexibility, right? To be able to support different models in Julia itself. But the problem is, if you have too many models, then it becomes a question of what choices you make, right? And if you have one single model to use as a developer, then the framework or the library underneath can be different, and the developers don't have to worry about that. Depending upon my problem statement, depending upon my domain, I make a selection, and the rest becomes the framework's job. Yeah, to add on to that, right? You already have tasks which run on the same machine. When you add in threads to that, and then you add in the master-worker model, as a developer I have like four or five options. But the thing is that if you have a task, it shouldn't matter whether you run it on this machine or somewhere else. Stefan already introduced the idea of tasks: ideally I take the same task, run it on my machine, and if I have to, run it somewhere else. So that is something we can maybe work towards.
So there is a package called MessageUtils.jl where I implemented channels on Julia 0.3, both in-process and across the cluster. So I think some concepts from that are like the Erlang kind of thing. Yeah, so we can look at having that kind of structure in Base. It would be great if you guys can get on GitHub and get in on the discussions; it's always useful. How does the scheduler decide to switch from one task? So it's not preemptive, right? It has a queue, and I think Jeff can talk a little more about it, but it basically just runs through the queue, sees which task is runnable right now, and then executes it. And the underlying libuv layer notifies the scheduler depending on the events; libuv is the event-driven I/O engine, including timers. So that's I/O, or when you want to sleep for a couple of seconds. So all the tasks are cooperatively scheduled, and if any of them takes control of the CPU with a compute-intensive task, the other tasks don't run, which is not good. These are all cooperative switches; it's not preemptive.