So what Elly provides is a Julia filesystem API for HDFS, similar to what Stefan talked about in his talk, and a Julia cluster manager as an interface to YARN, which is similar to the interface Amit is going to talk about later. In addition to these two generic interfaces, it also provides the YARN- and HDFS-specific APIs, which map directly onto the underlying systems. Elly is pure Julia; it doesn't have any dependencies on the JVM or on C libraries, so it's easy to install and work with. It speaks the Hadoop wire protocol, so we use that protocol to talk to the HDFS namenode and the YARN resource manager directly.

So let's go on and see some examples. I have a standalone cluster on my laptop with a single node. To start, you load the package with `using`. To connect to HDFS, you create an HDFS client and give it the namenode host and port. At this point I'm not yet connected, I just have a client handle, so the connected check prints false; you get connected on the first call. [Audience: what is the file size here?] Yeah. So you use the regular filesystem calls, `readdir` and so on, and they work directly; everywhere you pass the connection, and `dfs` is the connection that we got. `readdir` gives the contents of the current directory, and so on and so forth. You can stat a file, which gives you the file size and the HDFS block size. You can get the block information: for this particular file there are these many blocks, and these are the offsets. The first number is the byte offset, and the second array is the list of nodes that this block is replicated on. You could use this information as it is and do your own scheduling, and so on. To represent a file, you create something called an HDFSFile, and you can open it like a regular file: the open API, or open call, gives you an IO stream, and you can read and write. So here I'm opening this file, writing data to it, and reading it back.

Okay, a short introduction to YARN.
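The HDFS calls described above might look roughly like this in Julia. This is a hypothetical sketch: the constructor arguments, paths, and helper names (`hdfs_blocks` in particular) are written from memory and not taken from the talk, so check the Elly.jl source for the exact API.

```julia
using Elly

# Connect lazily to the namenode; the actual connection is
# established on the first RPC call.
dfs = HDFSClient("localhost", 9000)

readdir(dfs, "/")                    # directory listing, passing the connection
st = stat(dfs, "/data/twitter.csv")  # file size and HDFS block size

# Block locations: for each block, a byte offset and the list of
# datanodes holding a replica (useful for locality-aware scheduling).
blocks = hdfs_blocks(dfs, "/data/twitter.csv")

# Files are wrapped as HDFSFile and opened like regular files.
f = HDFSFile(dfs, "/tmp/hello.txt")
open(f, "w") do io
    write(io, "hello from Julia\n")
end
open(f, "r") do io
    println(readline(io))
end
```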
You create a YARN cluster manager by instantiating a YarnManager and pointing it to the resource manager; the port number has a default. So now I'm going to connect it to my cluster. I can do `addprocs`, which is the Julia API to add new Julia processes, and what I'm saying is: on this YARN cluster, add one process. I can also pass environment variables on to my workers. After I add the node, I print the Julia process ID on every node. I got output from worker 2, so I have one more process in addition to my master node; that's printed right there. And once you're done, you can call `rmprocs` and disconnect.

As I said, Elly also exposes the native HDFS and YARN APIs, so that it can integrate in a more fine-grained manner. With the native YARN APIs you get much more fine-grained access to the resources, and you can optimize your containers, but the code can be more complex. Just to show an example: this is how you connect with your user information and get a YarnClient. I can print some information about the cluster nodes. Then I create a YARN application master and register two callbacks with it, so that when a container is allocated I get notified and can schedule what runs on it. I submit my application master. Once I have this, I can request containers; I'm allocating one container now. On this container I can run an application, and I can specify the command and the environment. Once I launch an application, I can use it as and when I want, and when I don't need it any more I can stop and release the containers and finish. So if you need this level of control, you can still get it.

That's a very brief introduction to the APIs. We are probably lacking in documentation, but if you just go to Elly.jl and look at the main source file, Elly.jl, you can see all the exported functions, and that will give you an idea of what's available.
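Both routes, the ClusterManager and the native YARN API, can be sketched like this. Again a hypothetical sketch from memory: the keyword names, ports, and callback signatures may differ from the real Elly API.

```julia
using Elly, Distributed

# 1. ClusterManager route: workers come up as YARN containers.
yarncm = YarnManager(yarnhost="localhost", rmport=8032)
addprocs(yarncm; np=1)
println(workers())        # one worker in addition to the master
rmprocs(workers())

# 2. Native route: fine-grained control over containers.
yarn = YarnClient("localhost", 8032)
println(nodecount(yarn))  # cluster node information

on_alloc(cid)  = println("allocated container ", cid)
on_finish(cid) = println("finished container ", cid)
am = YarnAppMaster("localhost", 8030)
submit(yarn, am)          # register the application master
container_allocate(am, 1) # request one container; on_alloc fires later
```

The trade-off is as described in the talk: `addprocs` is one line, while the native route lets you decide exactly what runs in each container.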
So these are the exported functions. And since we use ProtoBuf, all the APIs that are generated from the protobuf definitions are also available, though not exported. There is a sub-module of Elly called Hadoop, and if you import that, you have access to everything at that layer.

I'll use this on a slightly bigger cluster. Again, a dummy cluster: it's hosted on one single machine, but inside Docker. It will just start now. So this is a 10-node cluster on a single machine; everything is containerized. I'm on the master node right now. I have this file, Twitter link data. It's a series of edges on the Twitter graph: each row says that the user with one ID is following the user with the other ID. We intend to do a PageRank on this.

So what I've done is I have this small package, which I load along with Elly and a few other packages. One of the other packages I've used is something called Blocks. Blocks lets you represent a large entity as chunks: if I do Blocks of a file, what I basically get is splits of the file, and if I do Blocks of an HDFS file, I get a split for each HDFS block. And then there are constructs that let me run parallel operations over blocks. So I have used these packages. The data set I'm going to show is 1.3 GB; it's not a very large file. We'll just go to the source to see how it works; I've also put it on the slides.

What I'm doing here is starting Julia with a machine file. A machine file has the list of all the slave nodes, so Julia starts up and gives me a Julia instance on each of those nodes. I'm not using YARN for this; I'm just using plain SSH. So what I've got now is one master process on this machine, and an attached Julia process on each of the slave nodes. Then I load the package, and this loads the package on all the nodes. The next step takes a while, and there are lots of warnings because I'm using the very latest version of Julia.
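To make the setup concrete, here is a hypothetical sketch of the Blocks idea; the `Blocks`, `File`, and `HDFSFile` constructors follow the description above, not a checked API:

```julia
using Elly, Blocks

# A local file splits into roughly equal byte ranges.
local_splits = Blocks(File("twitter_edges.csv"))

# An HDFS file splits into one chunk per HDFS block, so each
# chunk can be processed on a node that holds a replica of it.
dfs = HDFSClient("localhost", 9000)
hdfs_splits = Blocks(HDFSFile(dfs, "/data/twitter_edges.csv"))
```

The cluster itself is started with something like `julia --machinefile slaves` (the flag of Julia at that time), which SSHes to each host listed in the file and starts a worker process there.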
So after this I'm going to read my data set as a distributed sparse matrix. Each node is going to read whatever blocks it has and parse them into a sparse matrix, so what I have is a list of sparse matrices which is distributed; in particular, it lives in the worker processes. What this is going to do is figure out all the blocks of the file and schedule jobs to read and parse each one as a sparse matrix on all the nodes. This is done by Blocks.

In the meantime, I'll show you the code for this. I have a reader: the reader reads a record from a block, and in this case we are treating a whole block as a record, reading the whole block as an array. The map function is supposed to map each record; in this case we map it to a matrix, and all we are doing is creating a sparse array. And collect merges all the individual matrices on each node and reduces them into the result. So the distributed sparse constructor is calling all these internally to create a sparse distributed matrix. And finally I have the PageRank computation using power iteration. Let's see here. So I'm going to normalize the matrix, this is again in parallel, and then run the iteration over the matrix. [Audience: what kind of iteration is this?] This is power iteration, finding the dominant eigenvector. And this is again distributed. This takes a while; I have limited the number of iterations, but it still takes some time.

[Audience: how big of a graph do you think this is?] This is around 11 million vertices, with 80 million edges. So it gave me one number: 3493, it says, is the most influential user in all the data that we have. [Audience: what are the dimensions?] Yeah, it has around 200,000 by 200,000. Yeah. So that's all, I guess.

[Audience: is Julia running on each node on the system itself?] Yes. [Audience: how is that coordinated? For example, a block can be replicated on multiple nodes.] Yes.
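The power iteration itself is ordinary linear algebra. Here is a minimal single-process sketch of the PageRank step, assuming entry (i, j) means user j follows user i; the talk runs the same computation over the distributed sparse matrix, so this is an illustration, not the code from the demo.

```julia
using SparseArrays

function pagerank(A::SparseMatrixCSC; d=0.85, iters=20)
    n = size(A, 1)
    # Normalize each column to sum to 1 (out-link probabilities).
    outdeg = vec(sum(A, dims=1))
    P = A * spdiagm(0 => [o > 0 ? 1/o : 0.0 for o in outdeg])
    r = fill(1/n, n)                    # start from the uniform vector
    for _ in 1:iters                    # power iteration with damping d
        r = d .* (P * r) .+ (1 - d) / n
    end
    r
end

# 4-node toy graph: A[i,j] = 1 means node j links to node i.
is = [2, 3, 1, 3, 4, 1]
js = [1, 1, 2, 2, 3, 4]
A = sparse(is, js, ones(length(is)), 4, 4)
r = pagerank(A)
println(argmax(r))   # index of the most "influential" node
```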
[Audience: how do you make sure that a replicated block is not processed by more than one worker?] So Blocks has a scheduler of its own. It creates one queue for each node, for the worker on that node. If a block is replicated on multiple nodes, it is scheduled on the queues of all those nodes, and once it is processed by one of the nodes, it is removed from the other node queues.

[Audience: how do you specify that the package is available everywhere?] All right. So when I did `using` on the package: in multiprocessing Julia, if I load a package on the master, it is actually loaded on all the workers. But you do have to have Julia running on all the nodes first. [Audience: Julia has to be running on all the nodes?] Yes. You can either use the machine file for that, or you can use the YARN cluster manager. [Audience: okay, but I'll have to talk to you later.] Yeah.

[Audience: how does it compare with Spark?] We don't have a comparison; nobody has done the timing comparisons yet, so I guess that's something we need to do. [Comment: I'd like to invite the two projects to compare notes; that would be a great thing to do.] The only comparison that I have is a regular file copy using this versus using the Hadoop tools from the command line, and both are similar.

[Audience: I was just looking at DataFrames, and it looks like it is fairly mature now. What would it take to have distributed DataFrames? How much work is involved in combining Blocks and DataFrames?] Actually, we had an implementation of DataFrames on Blocks that is no longer maintained. [Audience: is it a lot of work to revive?] No, it is not. Even though the old code is not current, it shouldn't have major issues. Maybe a couple of weeks of work, I think that should be enough.
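The replica-aware scheduling described in that first answer can be sketched in a few lines. This is a toy model with made-up names, not the actual Blocks scheduler:

```julia
# Each block is known to live on one or more nodes (its replicas).
replicas = Dict(:blk1 => [:node1, :node2],
                :blk2 => [:node2, :node3])

# Build one queue per node; a replicated block appears in every
# queue of a node that holds a copy of it.
queues = Dict{Symbol,Vector{Symbol}}()
for (blk, nodes) in replicas, n in nodes
    push!(get!(queues, n, Symbol[]), blk)
end

# Workers drain their own node's queue; a block already processed
# by another replica holder is skipped, so each block runs once.
done = Set{Symbol}()
for node in keys(queues), blk in queues[node]
    blk in done && continue
    # ... process `blk` with the worker on `node` ...
    push!(done, blk)
end
```

The point of the design is locality: a block is usually picked up by a node that already has the bytes on local disk, and the shared "done" set prevents duplicate work.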