So this is the first case where we're actually in this property business. How does it work? So essentially we acquire properties. We're building a platform where we can rent out these properties. So basically we list the properties, and people can come to them through the platform. We take the whole property, because, looking at the demand, the demand is great, and we pay a certain amount of money to the property owner. Okay. So based on the data you figure that out. Yes. So how long have you been using it? Only three years. Okay. So what brings you here? I've been using Julia for quite some time; it's a good opportunity to learn. Awesome. So you guys are using it? Yes, I have to start using it. So two things bring me here, but mostly learning more about that. Okay. So where have you guys put up? Just a place nearby. Okay. So planning to put up a team and stuff? Yeah, I've been doing this for a year. Now it's profitable enough to scale up, so next year we're going to scale up. So let's keep in touch. Sure. Let me note this down so that I can connect offline. Is this fine? Yep. Thank you.
Have you, you know... yeah, no, it's fine with me at 40 minutes. Probably done. Yeah, I think it's fine. I didn't change anything. I'm about to check the audio; the switching between hearing and not hearing, that's the worry. Okay, I think that's fine.

Just a bunch of announcements before we get started today. The Wi-Fi to connect to is Geek, and the password is GeeksRUs. You can see the placards put up; you can just sign in using that. Also, all the volunteers are wearing a gray T-shirt, or you'll see us against the yellow backdrop, so you can approach us for any questions or any help that you need. I'm just going to bring in Shreyas to talk us through the QR code and how the contact exchange is happening. So, just a moment. So we have been trying this out for a while. Apologies if you couldn't access the app earlier than I thought; we're working on it. But the idea is that you download the app, you log in, the QR code gets scanned, and the contact is added to your contact list in the app. So that's the quick session on that.

And the last announcement I'm going to make is that we have a conference dinner happening tonight at 7:00, and I know Anivita will deliver. Anybody interested can sign up for a charge of 1,500 at the check-in counter. Thank you.

First off, we have Stefan, who is going to talk to us about solving the two-language problem in data science. Okay, thank you. I think having this on the chair here is probably good enough. Is that okay, or should it be different? No? Okay. I think this is the first time I've given a keynote address while sitting down. It feels like we're in a living room just having a conversation, so I think that's good. Hello everyone. I just got into Bangalore last night. So far I've enjoyed it; I've mostly spent my time in traffic or in bed.
And as introduced, I'm going to be talking about solving the two-language problem in data science, but that's not quite right. I'm going to be talking about solving the four-language problem in data science. And what four-language problem is this? You may not have any idea what I'm talking about: the two-language problem, maybe, but the four-language problem you certainly don't know. I'm actually thinking of a very specific problem that I had at one point. Back in 2009, I was still in grad school. I was a data scientist, although I didn't realize it, because it was 2009 and the term data scientist hadn't really taken off yet. I did a Google Trends search earlier today and verified this: in 2009 there was barely any talk about data scientists.

But I had this stack that I was using for one project. I was using MATLAB for numerical linear algebra. That seemed pretty reasonable. But I was also using R for statistical analysis. So I would do some experiments with matrices, and then I'd collect some data about them, but doing statistical analysis in MATLAB was just very painful. So I'd export the data as CSV, load it into R, and do the statistical analysis and visualization in R. Okay, so far not terrible, with two systems. But then some of the stuff had to be very fast, and I couldn't get it fast enough in either of those two languages, so I ended up using C for that. So there's C code in there, too. And then, because you're loading data and saving data and producing all these CSV files, shuttling all that around in any of these languages is terrible. You don't want to write that code in C, or in R, or in MATLAB. So I had Ruby files to handle all of that. And at some point you step back, you look at it, and you're like: what have I created? This is a monstrosity.
I don't know if anyone here knows Rube Goldberg. He was an illustrator who was pretty famous in the U.S.; they even made a stamp honoring him at some point in the 90s. He drew these contraptions where one thing hits another, which hits another, which eventually, you know, accomplishes something useful. And that was exactly what this was like: all these pieces coming together. It was actually generating R and C and MATLAB code from inside of Ruby. Total nightmare.

So from my perspective, this very much was my motivation for Julia. I was like: we have to be able to do something better. I was hanging out with Viral one day, complaining about this, and he said, you know what, there's this guy Jeff that you absolutely have to talk to, because we've been talking about this for years, and we think we can do better and we should do better. And so the three of us started talking over email about that and decided to try to do something. And Julia is what came of that. And I have a confession to make, which is that I've actually done worse than this: I've done up to six or seven programming languages in one project. And there's always some justification for every single one. But it's a terrible confession to have to make.

So what is the two-language problem in the first place? This is a term that people may not be familiar with. There's this other thing that may be more familiar, maybe not: it's called Ousterhout's dichotomy, named after this guy John Ousterhout, who is actually the creator of Tcl. So, you know, we can blame him for that. But he observed that there was this division between programming languages that seemed to be happening. I think he observed this at some point in the 80s. He noticed that there are systems languages and there are scripting languages, and the world seems to be divided this way. Systems languages are static; scripting languages are dynamic.
Systems languages are compiled; scripting languages are interpreted. Systems languages let the user define their own types; scripting languages have standard types. They just have arrays and strings and dicts, and that's it. You don't define the types. Maybe you can define classes, but I think even back then the idea of having classes in scripting languages hadn't really become much of a thing. And systems-language programs tend to be pretty standalone: you create an executable, you run it, it does its thing, and after a little while it produces its output. Scripting languages are often used as glue; they connect lots of other pieces together.

So at the time this was a pretty clear divide. There were a lot of examples of it, and it seemed like there were just these two different worlds. So why is this a problem? It's an observation; it's not really a problem. The problem comes in because, due to this dichotomy, we typically end up making a two-tier compromise. For convenience, because writing in these systems languages is usually a little bit difficult (there's a lot of extra typing; they're kind of verbose), we use a high-level dynamic language that falls very much into that scripting-language category. It has all of those properties: it's dynamic, it's interpreted, et cetera. We use that for convenience, but then we end up doing all of the hard stuff in a systems language. So all of the actual work gets done in C or C++ or Fortran, and all you really do is use this high-level language, like MATLAB or R or Python, to tie it all together. And this is actually pretty good. It works well. I've written many systems where I write the part that has to be fast in C and make it a Ruby extension or something like that, and then I play around with it and build all the high-level logic in Ruby. It's a pretty good design. It's actually really practical.
So what's the problem? It has some issues. The first one is that it's exactly the hard parts where you would really benefit from having an easier language. If all you're doing is the easy stuff in the easy language, that's fine, as long as all you have to do is the easy stuff. But when you're the person actually implementing some numerical algorithm or data analysis, if you have to do it in C because it has to be fast, now you're not benefiting from the convenience and ease of use that this other language is supposed to give you. So it's really only the end users who get the benefit, which is unfortunate.

Second, it tends to force vectorization if you want performance. Anybody who's written MATLAB or R code is familiar with: don't use a for loop. Like, God forbid you use a for loop; it's a terrible idea because it will be very slow. Vectorization can sometimes be very nice and convenient: you can write these tiny one-line expressions that do a lot of stuff and are pretty efficient. But sometimes it's awkward; sometimes you really wish you could just use a for loop. Other times it's wasteful, because you end up allocating a lot of memory for temporary arrays.

Finally, and this I didn't realize at first (when we started this project I had no idea what a big deal this was going to be): this separation into two levels creates a real social barrier. It builds a big wall between users and developers. The developers live in the low-level language and the users live in the high-level language. And if you're a user and you're trying to figure out what's going on, maybe there's a bug, maybe you're just confused about how something works, you can step through, say, some MATLAB code or R code or Python code. But as soon as you hit the built-in wall in MATLAB, you're stuck, because you don't get to see the code: it's proprietary.
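As a rough sketch of the vectorization trade-off described above, here is a hypothetical Julia snippet with made-up data: the one-liner allocates a temporary array for the broadcasted result, while the explicit loop allocates nothing extra, and in Julia both styles are fast.

```julia
# Hypothetical example: sum of squared differences of two vectors.
x = rand(1_000)
y = rand(1_000)

# Vectorized one-liner: concise, but allocates a temporary array
# for the fused broadcast (x .- y) .^ 2 before summing it.
s_vec = sum((x .- y) .^ 2)

# Explicit loop: exactly what MATLAB/R folklore tells you to avoid,
# but perfectly fine (and allocation-free) in Julia.
function ssd(x, y)
    s = 0.0
    for i in eachindex(x, y)
        s += (x[i] - y[i])^2
    end
    return s
end

s_loop = ssd(x, y)
```

Both compute the same value; the choice is purely about style and memory behavior, not speed.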
But, you know, in R and Python you could go look at the C code. I program in C all the time, so in theory I could do that. I've never done it. Because once you hit that wall, once you hit the C library, you're just like: ah, I can't be bothered, it's too much of a hurdle. If you can tear down the difference between users and developers, then your users will automatically become developers over time.

So, this is that dichotomy again. And what Julia lets us do is tear it down. That was supposed to be a slide build. I totally worked on that. Oh well. Very upsetting. But yeah, Julia is dynamic and it's compiled, which is one item from one column and one item from the other. It lets users define their own types, but it also provides all the convenient standard types, like arrays and dicts and whatever else you're used to from your dynamic programming language, plus a nice string type. You can produce standalone programs or use it as glue.

Another way to look at this, which I found striking, is that it puts Julia in a unique position in terms of speed versus productivity. This graph is an aggregate over some benchmarks, where the score of a language on a benchmark is its time relative to C, assuming that C is about as good as it gets. You can see that C is on here; it's on this list of languages, and it's at exactly one, so C is as good as itself in terms of performance. And then there are normalized lines of code. I don't remember exactly how the normalization is done, but essentially what it says is that zero is the smallest number of lines of code per benchmark and one is the largest.
And this is a little weird, because JavaScript looks really verbose here, and that's actually because in JavaScript we had to implement our own matrix multiply. So it's a bit of a cheat; JavaScript isn't really that verbose. But you can see that Fortran is very far to the right, C is very far to the right, and Java is slightly less far to the right, but you still tend to write a lot of code. And in terms of conciseness, you have other things. Let's see, Lua, I think, is pretty far left. Oh no, that's R; R is actually that purple thing over there. So essentially, these are the systems languages and these are the scripting languages, and what you want is to be in the corner between both of those, and that's exactly where Julia is. So I think that's really satisfying. That's kind of the whole pitch: you can have your cake and eat it too.

Okay, so let's look at that Rube Goldberg data science stack. Today I use Julia for numerical linear algebra instead of MATLAB. I use Julia for statistical analysis instead of R. I use Julia for the fast stuff too. And I use Julia to tie it all together. So now we've eliminated this four-language problem. A lot of this could also be done with Python; Python has been growing very popular for data analysis. The one thing you can't really get rid of is the C for the fast stuff. So you can get it down to a two-language problem, but you can't get it down to a one-language problem in Python. There are systems like Cython, but I think that's a little bit of a cheat, because Cython is not really Python. It's like Python and C had a weird love child. And I don't really like it. Some people really like it; I find it very confusing every time I look at it. But to each their own.
I like having one language, truly just one language, in which I can do all of these things. People are going to talk about statistical analysis, visualization, machine learning, and numerical linear algebra later today, so I'm going to focus on this last item: how Julia lets you tie all these things together, how it acts as a replacement for that Ruby or Rakefile or Python layer.

So what does it mean to tie things together? Well, one of the natural things you have to do is read, parse, and write files. You can't live without that: to get to the computation, you need to have done all of this grunt work just to get there, and then you have to write the results out again. Running external commands: this is huge. Sometimes you just want to call a Unix command; all you want to do is call find or sort or something like that. You need to do networking: you need to pull stuff off the network, and you need to be able to serve things up to people. And you also want to be able to call other programming languages. Just by way of example, you would not generally, if you're a sane person, consider writing a web server in MATLAB or in R. I believe people have actually written web servers in R, but that just seems crazy. In Python, it's totally reasonable. So that again is another difference: Python is a general-purpose language which has sort of had the numerical stuff wrapped into it, so it's very different from these other numerically oriented languages.

So now let's see some code. Let's see, it's 9:55. How much time do we have? Eight minutes. That's great. So now I'm going to turn the mirroring on. Okay, so we've started Julia. The first thing we might want to do is download a file, and there's a handy download command. Now, I actually have this in my history. This guy, Jared Lander, a friend of mine who's in New York, does a lot of R work. Somehow we've managed to stay friends anyway.
He even helped me run the Julia NYC Meetup. He's very supportive of Julia, but he still does all of his stuff in R. So we can download this housing data, which, if it's too slow, I have a copy, so I'll just not download it. But oh, it's pretty quick. Okay. And you can see that it switched into a shell mode. People may or may not be familiar with this, but if you type a semicolon at the beginning of a line, you get shell mode, and then you can just type shell commands. So we can look at the top of this file, and what we can see is that it's a CSV file with a lot of data about housing prices and various other things in New York.

So let's load it. There's this readcsv command. What we get is an Any array; it's just got a bunch of junk in it, all sorts of things. If we look at some of these (let's just assign it to h so it's easier to type), let's look at the fifth column. Those are all integers, but it actually ends up being an Any-typed array, because it includes that first header entry. We'd really like these to just be straight-up integers; this is kind of awkward. Well, it turns out there's a whole package for dealing with this kind of thing: DataFrames. And you'll note that loading it, which on version 0.3 was painfully slow, on 0.4 is so quick. It's so nice. This is because we can now precompile and cache packages that we've already loaded before, which is a really huge blessing. So instead of doing readcsv, we just change this to readtable. And now we have something that's a little nicer: we can take column 5 of h and see that it's actually a data array of Ints, and we can do much more with that. It's a lot easier. I'm not going to spend time actually analyzing this data set; I just wanted to show how you can download things and then read them. And I kind of thought this next bit was fun.
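The readcsv and readtable calls in this demo are Julia 0.3/0.4-era APIs. In current Julia, the closest standard-library equivalent is readdlm from the DelimitedFiles stdlib; a minimal sketch with a made-up stand-in for the housing file, showing the same Any-typed-array issue from the demo:

```julia
using DelimitedFiles  # stdlib; the old top-level readcsv lives on as readdlm

# A tiny, hypothetical stand-in for the housing CSV.
path = tempname() * ".csv"
write(path, "Borough,Value\nManhattan,100\nBrooklyn,85\n")

# With header=true, readdlm returns (data, header); a file with mixed
# string and numeric columns gives a Matrix{Any}, just like the demo's array.
data, header = readdlm(path, ',', header=true)
col  = data[:, 2]   # numeric values sitting in an Any-typed column
ints = Int.(col)    # narrow them to plain integers explicitly
```

With the DataFrames.jl and CSV.jl packages installed, `CSV.read(path, DataFrame)` plays the role that readtable did here, giving properly typed columns without the manual narrowing.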
Here's an example of the kind of thing you can do in Python, in Ruby, in Perl, but that I actually think is nicer and easier to do in Julia than in any of those other languages. This example is a little fancier than I intended, but what it shows is that you can open a command. So in Julia, let me start with the basic thing: if you want to run an external command, you put it in backticks. This is pretty straightforward. Now, in other programming languages, backticks will actually execute the command immediately. I had a friend I was working with who writes a lot of Ruby code, and he's one of the most productive programmers I know. He also does these things that drive me crazy, because in Ruby, what backticks actually mean is: take the output of this command and save it into a string. But he would just use them to run stuff, and he did it everywhere, and I'd sort of twitch a little every time I saw this code. But it didn't matter; it worked. It worked until it didn't work, and there was some subtle bug, because he was splicing in a variable that was a file name with a space in it, and then the whole thing breaks. But watching him, it would bother me. I'd get that little twitch every time I saw him do something I knew was technically incorrect. But then I thought: why is he wrong? He's not wrong. The programming language should actually make what he wants to do right, instead of giving him a hard time and saying, no, no, no, don't do it that way, do it this other way that's much more difficult. At the time I was developing Julia on the side; I was one of these crazy people who had his own language, because it was 2010 and nobody had ever heard of it. And the way he used backticks in Ruby inspired me to try to do this better in Julia.
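The command API demonstrated in what follows can be sketched in a few lines; the file names here are hypothetical:

```julia
# Backticks build a Cmd object; nothing runs until you ask.
path = tempname()
write(path, "a,1\nb,2\nc,3\nd,4\n")

cmd = `head -n 3 $path`            # interpolation splices the variable in
run(cmd)                           # execute; output goes to the terminal

pl = pipeline(cmd, `cut -d, -f1`)  # compose commands like the Unix | character
s  = read(pl, String)              # or capture the pipeline's output as a string

# Interpolated values become single arguments: no shell ever parses this,
# so a file name containing spaces needs no quoting dance.
file = "housing with spaces.csv"
`head -n 3 $file`.exec             # ["head", "-n", "3", "housing with spaces.csv"]
```

The `.exec` field at the end is the literal argument array handed to the exec system call, which is why interpolation is safe by construction.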
So, let me show you some examples. Let's take the first, I don't know, three lines of that housing file. So we have the command, and we actually just want to run it. Okay, we get it; that's cool. Now, let's say we wanted to cut out a couple of columns using just Unix tools. We can build a pipeline: cut -d, -f1,2 will do it. We'll see. Sorry. Okay, so there you go. One comma two; maybe three, four. Okay, so now we're using Unix tools to cut out a little tiny slice of this thing. So what's going on here? pipeline is constructing a pipeline between these different commands, the same way you'd use the pipe character in Unix, and run runs the resulting thing. And you can see that if we don't run it, we just get an object that represents the pipeline; it sets the standard output to be this other command. But we can run it.

Now, the key reason I didn't want backticks to just go run the thing is that a lot of the time what you want to do is something else. You might want to read it: instead of printing the data, I want to capture it into a string. So the idea was that you have this object that represents the thing you want to do, and then you can actually read from it. And of course, you don't typically have a fixed file name; you have some file name in a variable that you want to splice in there. That's not going to work as-is, because the variable isn't defined. But let's say we set it to temp housing.csv. Okay, and now that works; it does the thing we wanted. Let's just run it. Okay. But now let's say we move that temp file to "temp housing" with spaces in the name. So now we can see in the terminal that if we cat this with appropriate quoting, we do get the file. But our command obviously isn't going to work now, because the old file doesn't exist. So let's try changing this to "housing with spaces". Now this works. Why does this work?
This is a really good question. This would totally break if we were doing it in Ruby or Perl or something like that. And the reason is that if you take a look at what happens with this object when we actually interpolate: it quotes it for you. What is going on? Is this magic? What's actually happening is that this backtick syntax, the easiest way to think about it, is actually a very strange syntax for array literals. If you take this object, it actually has a field called exec, which is the array of arguments that actually gets passed to the exec system call. And what that means is that we're never calling a shell. We don't do the whole thing where you pass the buck to a shell, which does all of the work and constructs the pipeline for you, because the problem then is that you have to work around all the things that the shell treats as special. So instead, the mantra here is: don't use a shell, be a shell. The idea is that Julia implements all of the things that your shell implements, with similar syntax. And that way it knows about things like files: it actually just takes the file name and puts it in there as the third argument, because it knows that's what's supposed to happen here. If you have, for example, multiple files, each one just becomes its own argument. And if you look at the graphs, the red there, obviously there's a 550% reduction on that part, which is what's interesting when you look at the database performance. And I'm on time. Thank you. So, questions please. Yes sir. You said that you spent almost a year or so on some of this stuff.
How much of the gain actually came just from making room in your organization to experiment with a different system, and how much of it came from Julia itself?

So let's talk about where we are: we are currently on C and PHP, with PHP modules calling C. The kind of maturity we have as an organization around the business tools, obviously Julia has no contribution to that part. So we were looking for a better and faster way to compute. One of the problems with C is that C code is very unforgiving: the kind of skill set I need to write an optimal program often becomes a constraining factor, and hence a lot of the on-the-fly business logic ends up getting done in PHP, which adds to the problem. So the whole process, all three of these together, has to be done, and resources are part of the problem.

We did look at alternatives. As we talked about in the previous session, we looked at Go, specifically Go with Lua embedded, so that although Lua doesn't thread, Go threads: if you create instances of Lua from inside Go, effectively you are making Lua multi-threaded. We tried that out. We used, I don't know if you have heard of Sphinx, a full-text search engine where you can do some interesting stuff; you can put compute on the index, in C. And we tried Python too, because for us the P is not just PHP, it's Python now.

So what really took the cake with Julia was that it was very intuitive to program, and my business users, who are actually data scientists, mathematicians who understand the maths and economics of it, were able to deploy a program with slightly more volume. Otherwise, programming used to be a spectator sport: the subject-matter expert from the business would sit there telling the programmer, no, no, I don't want that, I want this.
So we have taken that out, and that's the biggest advantage of a language like Julia coming to business: it really makes it effective to code as you see, or code as you visualize. I guess that's the biggest advantage. Does that answer your question? Thank you. Any more? Do we have time for one more question? Yeah.

I mean, when you look at the graph, you'd ask: what were you doing for seven years? So I would prefer not to answer that. But see, we got the business to a state where we can deploy Julia, so it has merit. Of course, with time you have to move and accept the fact that it is enormously faster, and that gives me scope for doing more things. So I would actually bring Julia to that, but probably with 40,000 more applications. Thanks guys.

First of all, it's a pleasure to be here. Special thanks to the Julia folks for inviting us to come here. So I'm going to try and make it very, very short, and we're actually going to present my part in two parts. First of all, my understanding is that with this crowd, the uptake is probably going to be really, really fast, so I'm going to go briefly through the slides. Stop me if you guys have questions. Before we get to the Julia computing and analytics part of what we do, I just want to set context about why we do what we're doing. Essentially, we believe that the Indian retail industry, very similar to what Desdael spoke about, is very, very distributed, very fragmented, and very large: a $600 billion retail industry, but extremely low on data. There are millions and millions of retailers out there who have no clue who their consumers are. And that really happened because of the last 25 years of rapid economic development and, most importantly, organization, which broke down the consumer-retailer relationship.
Earlier we knew each other personally; that has broken down, and today we have this whole large country with a massive industry sitting there with practically zero knowledge of who their consumers are. Literally 1% of retailers actually know who their consumers are. It's that bad. So we operate in that context, and our aspiration is to try and help that situation. So imagine a situation where you have 500 million retail consumers, actual consumers who are spending every day in various places, and say 500,000 retail outlets across India, both plugged into a common platform which is creating an unbelievable amount of consumer insights and analytics. And I'm not talking about somebody poking somebody or somebody making a comment; I'm talking about actual consumer financial transactions, walking into the store and buying something, across those 500,000 stores and 500 million consumers. So our actual aspiration is to create something like that: to bring retail all together onto a common platform. And if I go back three or four years and had said something like this, it would have sounded ridiculous for any single small organization. How can we do this?

So that's been our approach. Before I actually go into how we do it: today as an organization we manage loyalty programs for about 409 million debit or credit cards in India, and we process close to 40 million transactions every month. Cumulatively it's just over a billion transactions; we actually completed a billion transactions in the middle of September. And when I say a billion transactions: one individual walks into a store, swipes, and buys something, and we call that one transaction. We know who that individual is and what the profile is: first name, last name, email, mobile, all of that. And we also know what he bought and what kind of store it is. It's a massive amount of information.
We have even had consumers redeem close to 2 billion loyalty points, consumers actually stepping forward, after having earned so many points, to redeem them, and that also throws up a whole lot of consumer behavior insights. So how do we do this? We essentially run loyalty programs for most of the largest banks in the country. When we say run, we actually keep all the consumer data with us: we get all the transaction data, we process it and award points, and when those consumers, after having accumulated points for a while, redeem those points, we process the redemption. And all of this actually means a massive amount of consumer data, behavioral data. Then we go and create a lot of retail partnerships, and we tell them: you guys actually don't have a lot of information about who your consumers are, and that's perhaps why you run so many sales, back-to-school sales, Diwali sales, generally all kinds of sales. And that's essentially because the retail industry really does not have much understanding of its consumers and their behavior.

So that is really what we do, and that's where the analytics come in. The aspiration is to create a common platform, which is actually called Max Get More, where different kinds of retailers come together and plug into it, and there are millions and millions of consumers plugged into it, and as an organization we try to see how we can learn more and more about consumer behavior: who likes what, likes, dislikes, preferences, frequency, all kinds of information, and help the retail industry do business better with the large consumer base of this country. That's really the aspiration, and that's where the analytics come in: serious, heavy-duty, high-end, very-large-volume analytics. And we've been playing with Julia for some time, actually even before I met Viral, and we're still trying to learn more and progress further on that
But Wipo here, who heads analytics for us, our entire analytics department, is going to take us through how we do the analytics part of it. But again, it's wonderful to be here, and as a disclosure I should say that I did my bachelor's and master's in history, so all of this is Greek and geek to me. Thanks, Viral; you should take the Greek part now.

Hello everyone, good morning. We just talked about what we do and what our overall vision is; I'll talk about the analytics vision: what kinds of problems we are trying to solve, what we have solved so far, and how we are using Julia, with a couple of case studies. So we have all this data: 400 million plus profiles, 1 billion plus transactions across multiple years, with more being processed every day; a lot of response data collected from the millions of campaigns that we send; and a lot of data being generated and collected right now from our mobile apps and web properties. What are we trying to do with all of this? There are two main beneficiaries of our analytics, so to speak. One is retailers: we want to build a platform where they can access data-driven insights about their consumer base and take informed decisions about their consumers. Right now, as we just said, retailers run sale after sale without really knowing what their consumers want, so we want the merchants to be enabled. The other part is the consumers: we want to create a platform through which they receive the best possible offers available in the market without having to search for them; those should be sent to them using our predictive abilities. And because there is so much marketing going on in today's age, we really do not want to bombard our consumers with too many communications. So this is the data that we have.
I'll talk briefly about it. We have consumer demographics for over 400 million consumers: basically name, address, age, and gender. We also have their full transaction history, going up to 5 years, from which we can generate all the different dimensions or variables that let us infer behavioral insights about them. Then there is a lot of campaign-response data: millions of campaigns are sent every month, and we capture the responses, how many people clicked, how many actually responded, how many took the offer, and so on. Because the program itself is a loyalty program, there is also a redemption angle, so there is redemption behavior as well, and from the app we get interaction data and location data. This slide covers our current analytics capability; I'll just mention the tools, since this conference is mainly about the programming language. Currently we use SQL, R, and Python, and we adopted Julia recently, actually a year back, though we haven't done a lot with it yet; there are a couple of case studies I'll talk about. In terms of machine-learning techniques, we use regression, decision trees, gradient-boosted trees, clustering, and collaborative filtering, and probably we will use Julia for these things later on. In terms of the problems we are solving, just to give you a brief sense of their complexity, so that you can understand the real need for Julia at an organization like ours: for any analytics organization, data quality is a real issue, and we get data from public-sector banks, where data quality is somewhat worse than at other banks, so there are a lot of data-quality issues.
We have done a lot of projects to improve this data quality through analytics. For example, we built a project to identify which transactions actually happened online, which says a lot about consumer behavior in terms of who is and is not willing to purchase online. Or things like getting the location from the consumer's residential address: in India residential addresses are not at all standardized, and in the data there are too many mistakes, truncations and spelling errors, so getting latitude and longitude from a geolocation API is difficult with the original address. We had to clean it and derive a searchable address, so to speak, that could be fed into a geolocation API. Then gender identification from name, age estimation from name, other kinds of identification from name: all of these problems we have solved. The other thing is consumer-behavior prediction. What we really want is to predict consumer behavior as accurately as possible, because that's how we identify what to offer to whom and send the right communication at the right time. For this we use techniques like logistic regression and gradient boosting on the data we have, to predict what a consumer is going to do in the next month using their 3-month or 6-month history. In terms of the future problems we are going to solve, one is real-time location-based offers. This is going to be the future, because we are going to be very mobile-app heavy and everything will happen through the app, so real-time location-based offers become a big requirement. The kind of problem we need to solve there is, again, predicting consumer behavior, searching across the thousands of different offers that are available, finding the right offer at the right time, and sending it to the consumer.
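As a sketch of that behavior-prediction step: below is a minimal logistic-regression model, fit by plain gradient descent, predicting next-month activity from three months of spend history. The features, the toy data, and the `fit_logistic`/`predict` names are invented for illustration; in practice this would run over millions of real profiles.

```julia
# Toy sketch: logistic regression on spend history, fit by gradient descent.
sigmoid(z) = 1 / (1 + exp(-z))

# Each row: [spend_m1, spend_m2, spend_m3]; y: did the consumer transact next month?
X = [1.0 0.5 0.0;
     2.0 1.8 2.2;
     0.1 0.0 0.0;
     3.0 2.5 2.8]
y = [0.0, 1.0, 0.0, 1.0]

function fit_logistic(X, y; iters=5_000, lr=0.1)
    n, d = size(X)
    w, b = zeros(d), 0.0
    for _ in 1:iters
        p = sigmoid.(X * w .+ b)       # predicted probabilities
        g = p .- y                     # gradient of the log-loss wrt the logits
        w .-= lr .* (X' * g) ./ n
        b -= lr * sum(g) / n
    end
    w, b
end

w, b = fit_logistic(X, y)
predict(x) = sigmoid(x' * w + b)
println(predict([2.5, 2.0, 2.4]))   # high-spend history: well above 0.5
println(predict([0.05, 0.0, 0.0])) # dormant history: well below 0.5
```

Gradient boosting, the other technique the talk mentions, would replace the linear score with a sum of trees but keep the same predict-a-probability interface.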
sending that offer out also means identifying the better creatives for it, and in the case of the self-serve platform we are going to create for merchants, we also need to identify which creatives are good and screen out anything objectionable or controversial; this has to be done through machine learning, it cannot be done manually. The other one is a frequency-capping framework, which will let us send communications to consumers at the right time while still not bombarding them with too many messages. Now, this is the first Julia case I want to talk about, and it was implemented a year back. Both of the cases I'll talk about are not really very complex problems, so to speak, but they are very computation intensive, and both are related to data quality. The first problem: say you have a merchant name and you want to identify all the transactions that happened at that merchant, because in the transaction data you will find all possible variations of the merchant's name. For example, say there is a brand called Irwin Brands: you will find all kinds of variants, with spelling mistakes, truncations, everything. So the problem was really to match a string against the different variants present in the data; it was not exact matching but a closest-possible-match, fuzzy string-matching problem. For this, we first tried the standard edit-distance algorithm, but it wasn't really working well for us. So we created a generalized edit-distance algorithm, where we defined 15 different operators based on observations from the data, for example: add or delete punctuation marks, add or delete spaces, replace the first character, delete the last characters, all the different combinations that we found by observing
the data. We then assigned different weights to these operators and tuned the weights from the data. This algorithm had earlier been implemented as a CLR function written in C# on Microsoft SQL Server, and it took about 36 hours to do a first matching on a set of 400,000 strings. That was far too long, because we wanted to run this exercise every month and it was a time-critical job; we wanted it done within a couple of hours, which is why we had to shift to something else. Based on our experience with R and Python, we knew it would be very difficult to achieve that kind of performance in those two languages, and because I knew Julia from some of my other work, I implemented it in Julia. The function was written, tested, and optimized in less than a day, by people who didn't know Julia before but had only worked in Python: pretty fast development time. Without parallelization, this code runs the same job in less than two hours, a performance gain of about 20x, and with parallelization on an 8-core setup it becomes about 7 to 8 times faster still, so it is hundreds of times faster now, and the monthly job is no longer a worry. The next case is the one I mentioned earlier: because residential addresses in India are not standardized, it becomes very difficult to find a match through geolocation APIs. Here, again, starting from the original residential address, we wanted to derive a searchable address based on the sub-area or area. For example, "Shri Shri Kupa Society, Rajam Sarked, Neymar Nagar, Kharish, Mumbai" would just become "Neymar Nagar, Kharish, Mumbai". But then again, this problem is difficult because of the spelling mistakes and truncations that happen in the data.
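Returning to the first case: the generalized edit distance can be sketched as an ordinary Levenshtein dynamic program whose operation costs depend on the characters involved. The operators and weights below (near-free edits for spaces and punctuation) are invented stand-ins for the talk's 15 tuned operators:

```julia
# Weighted ("generalized") edit distance: a standard DP where the cost of an
# insert/delete depends on the character. Weights here are illustrative only.
cheap(c) = c in (' ', '.', ',', '-', '&')   # punctuation/space: near-free edits

function weighted_edit_distance(a::AbstractString, b::AbstractString)
    s, t = collect(lowercase(a)), collect(lowercase(b))
    m, n = length(s), length(t)
    D = zeros(Float64, m + 1, n + 1)
    for i in 1:m; D[i+1, 1] = D[i, 1] + (cheap(s[i]) ? 0.1 : 1.0); end
    for j in 1:n; D[1, j+1] = D[1, j] + (cheap(t[j]) ? 0.1 : 1.0); end
    for i in 1:m, j in 1:n
        del = D[i, j+1] + (cheap(s[i]) ? 0.1 : 1.0)
        ins = D[i+1, j] + (cheap(t[j]) ? 0.1 : 1.0)
        sub = D[i, j] + (s[i] == t[j] ? 0.0 : 1.0)
        D[i+1, j+1] = min(del, ins, sub)
    end
    D[m+1, n+1]
end

# Punctuation-only variants score as nearly identical under these weights:
println(weighted_edit_distance("Irwin Brands", "IRWIN-BRANDS."))  # small
println(weighted_edit_distance("Irwin Brands", "Nirvana Foods"))  # much larger
```

Matching 400,000 strings is then repeated pairwise evaluation of this kernel, which is the part that parallelizes trivially across cores.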
So for the address problem, the approach was: for each city, we found which words appear very frequently; the top words are the ones spelled correctly, the correct variants of the actual words. Using those words we cleaned the rest of the data, then computed the frequencies of one-word, two-word, and three-word variants, mapped them to areas or sub-areas, and searched those areas and sub-areas in the geolocation API, and we found we were getting a very good hit rate from that. In terms of performance, the earlier code was written in R, highly vectorized, pretty difficult to read and manipulate, and it took more than 7 hours. [inaudible] Thank you.

A few announcements: all participants coming in for the workshop tomorrow, please retain your badges; new badges will not be issued. We'd also like to thank our sponsors for supporting JuliaCon India: Ola, Intel, and Fausti Loyalty Rewards Global. There is a conference dinner tonight at 7:30 at High Note Bar on 100 Feet Road; if you're interested, please sign up at the check-in counter; the cover charge is 1,500 rupees. And a last request: please keep your phones on silent. Thank you.

Good afternoon everybody, my name is Fominathik, and I'm here representing the data sciences group at Ola Cabs. It's great to be here at the first Julia conference happening in India and to meet all of you, so I'd like to thank the sponsors and the organizers. The purpose of this talk is to give you an idea of the problems the data sciences group at Ola Cabs is working on, to tell you about the techniques and algorithms we use, at a high level perhaps, and to try to pique your interest. This is going to be a really short presentation, so we'll get right into the talk, and I'm going to
be touching upon as many areas as possible in this time, and if any of you would like a deeper engagement or discussion on any of these problems, please reach out to me after the presentation. We also have a few data scientists from Ola Cabs who are here today and tomorrow, so please reach out to them or to me for any deeper discussion. Ola is today one of the largest B2C service providers in the country, so it doesn't need much of an introduction, but I still wanted to highlight a couple of points about Ola that I think not many people would be aware of. One is the scale at which we operate. We have a fleet of close to a million vehicles today: cabs including compact cars, sedans, luxury vehicles, and buses and shuttles as well. The reason we operate such a large fleet is to cater to an audience of several million active users who use our service on a recurring basis, and we have a presence in all of the tier-1 cities in the country and most of the tier-2 cities, so we operate across the length and breadth of the country. That scale brings with it opportunities for optimization, and the data sciences group at Ola is interested in optimizing this business to make it more efficient for the customers and for our driver partners. The second thing I wanted to highlight is the nature of the demand spectrum in India, especially viewed from the perspective of affordability: there is a wide range of demand. For instance, today our default taxi service, Ola Mini, is priced at a 10-rupees-per-kilometer price point, the prevailing price in most tier-1 and tier-2 cities, and this is affordable for the common man in a big city, whereas if you go towards smaller cities or towns, 10 rupees per kilometer is not an
affordable price point. So over there we are looking at services like Ola Share, which essentially allows a user to programmatically share a ride with a friend or someone they know, which roughly halves the price to a 5-rupees-per-kilometer price point, and at that point you have more of the audience coming in. Beyond that, you look at people who use buses for their daily commutes and travel at a price point of 1 to 2 rupees per kilometer. For them we have a service known as Ola Shuttle, which we are just launching: if you have a bus which can seat 40 people, then assuming it's fully occupied, even if each person pays a rupee per kilometer, that's 40 rupees per kilometer, which is enough to run the bus sustainably. The reason I wanted to make this point is that we have so far only scratched the surface of the demand for transportation in the country. Every time we launch a new service, no matter how small the start, like the shuttle and share services today, we find it immediately unlocks a new set of people, a new pool of demand which was not visible earlier: these people were not engaging with us when we offered our service at the 10-rupees-per-kilometer price point. This market is bottom-heavy, which means we are still to reach most of the people in India, and that is one of the missions behind operating in this logistics sector for Ola. We also have other business lines: Ola Money, a currency which allows cross-selling between the different products that we have; Ola Cafe, a food-delivery service; Ola Store, a groceries-delivery service; and so on. So what is the data that we generate? Every taxi, every vehicle we have on the road, is essentially a source of data for us. Every taxi sends a beacon to one of our servers once every few seconds, and it shares a wide range of information including accelerometer readings, odometer readings, GPS
location, status codes, and so on. This helps us assess the prevailing level of traffic on a road at any point in time: given our vehicles and the signals coming from them, we can model the traffic at that time on that particular road, and we can also derive relationships and understand traffic flow better. So we might know that if point A is going to get clogged at 5 p.m., then some portion of its traffic, say 25%, will flow into point B by 6 p.m., because B cannot handle even a fourth of the traffic that A can. These are the sorts of relationships that help us forecast traffic, and with that we can make the service more efficient in terms of giving better ETAs and suggesting alternate routes which should be faster. That is the main source of data. Apart from it we also have customer data, in terms of who is using our app, ride details, and so on. We give the utmost importance to user privacy: all this data is used only for improving the efficiency of the business for the customer, and for no other purpose. We also have data from the other business lines I mentioned, which helps us understand more about the user. Now, an overview of the kinds of problems we are faced with. Pricing is a very important problem, as I mentioned earlier, and there are different flavors of it. There is surge pricing, which you would have noticed if you have used Ola recently, which increases the price to leverage the mismatch between demand and supply; it works along the lines of how airline pricing works: when you have more demand and less supply, you surge the price automatically. Apart from surge pricing, pricing itself is a fairly deep area, in the sense that you could price the service at 10 rupees per kilometer and allow it to surge up to 2x, which gives you a max price of 20 rupees, whereas an alternate
philosophy is to set the base price at 5 rupees per kilometer and allow a surge up to 4x. These two philosophies get radically different reactions from the audience. The low-base philosophy, for instance, tends to smooth demand out over time: there will be a lot of people who wait for the surge price to drop and then take the ride at a much lower price point, and at the scale we are at, that can influence the traffic flow in the city to some extent, or at least ensure faster times of arrival for the users who do travel. So pricing is a predominant area. Forecasting is another, where we would like to know what is going to happen in the future. The traffic relationships I gave as an example are an important thing to forecast; we also forecast demand from users so that we can source supply appropriately, and we forecast supply in order to ensure we meet the needs of demand. Essentially we are running a marketplace, where demand is represented by the customers who want to use the service and supply by the drivers we work with, and we are the manager of that marketplace. Any marketplace initiative, as you would know, impacts both supply and demand, typically in opposite directions, so we have to hit a sweet spot between them. For instance, if you increase the price a lot, supply benefits and drivers are happy, but users are turned off at the same time; going the other direction produces the opposite reaction. So it is important to strike the sweet spot with every initiative, and also, given a bunch of initiatives, a new sourcing strategy, a new incentive strategy, to ensure that all these strategies work together and the balance in the marketplace is not tilted.
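The two pricing philosophies above (a 10-rupee base capped at 2x versus a 5-rupee base capped at 4x) can be compared with a tiny sketch; the rates are the ones from the talk, and `effective_rate` is an invented helper:

```julia
# Philosophy A: base 10 Rs/km, surge capped at 2x  -> max 20 Rs/km.
# Philosophy B: base  5 Rs/km, surge capped at 4x  -> max 20 Rs/km.
effective_rate(base, surge, cap) = base * clamp(surge, 1.0, cap)

for surge in (1.0, 1.5, 3.0, 5.0)
    a = effective_rate(10.0, surge, 2.0)
    b = effective_rate(5.0, surge, 4.0)
    println("surge=$surge  A=$a Rs/km  B=$b Rs/km")
end
```

Both philosophies hit the same 20-rupee ceiling, but at low demand B is half the price of A, which is what smooths demand out over time.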
The moment the balance in the marketplace is tilted, one side defects to the competition, and the moment you start losing supply, demand falls away as well. So managing this seesaw is a very important part of the problem we work on, and forecasting helps because it gives us a view into the future and lets us foresee what's going to happen. Next, ETA and arrival-time prediction. ETA stands for estimated time of arrival: whenever you book a taxi on the Ola app, it shows you the time in minutes that the driver is going to take to reach you. This is a very sensitive metric, because we know that the moment the ETA promise is violated, cancellation rates spike; once you cross the promised ETA, the probability of cancellation just shoots up exponentially. So it is business-critical, in the sense that you can satisfy your demand better and manage expectations better if you can predict ETA correctly, and predicting ETA requires the understanding of traffic that I talked about earlier. There is also guaranteeing arrival times: when you have a sharing service, for instance, when you share your ride programmatically with someone else, it is important that your time to the destination is bounded; your route cannot be deflected too much, so as to ensure the other person still gets picked up. So guaranteeing arrival times is another area for which ETA prediction matters. Then destination prediction. Today we don't ask for the user's destination, except in the case of very specific destination points like airports. The reason is that we don't want to bias in favor of people who are going to travel longer distances; irrespective of the destination, you get the same quality of service. And so destination
prediction is important because it allows us to foresee where a cab is going to end up, and potentially the cab can take a booking even before it actually gets there; estimating this supply pipeline helps us show the user the real supply that is available. Some of the other areas we work on: programmatic incentives, incentives given out to drivers to ensure they stay on the platform, often decided programmatically. There is more work on the driver side, such as predicting attrition, for drivers as well as users. Once you figure out from a driver's usage pattern that he is no longer as active as he used to be on our network, you immediately try to give him a better package or a better set of incentives so that he resumes working with us; the same applies to users, so that when a user is slipping away, we give them the right set of incentives to bring them back on board. Marketplace optimization is something we talked about: we have thresholds on critical metrics set forth by the business, ETA being one of them, conversion rate under surge pricing being another. When we run this whole spectrum of projects, each one could pull a metric in some direction, so it is important to keep the right balance between supply-centric and demand-centric initiatives, and that too is handled in an automated fashion. As I said, we get a lot of data from the taxis we have on the roads; the driver's accelerometer data, for example, helps us understand his driving profile, how risk-taking he is, and so on. And every time you complete a ride, you rate the
driver, and the driver rates you as well. So to build a driver-rating system, a straightforward way of doing it is to say that the drivers who get rated highly by customers who are themselves rated highly are the actually good drivers, which gets you into doing something like PageRank on the ratings. Loyalty programs I think I talked about, so you tie the drivers in; today we have initiatives where we source cars for some of the drivers. Drivers are pretty interesting, because this is a new classification of employment that we've created: these are self-made entrepreneurs who are not strictly tied to any service per se, and they are able to sustain themselves by getting access to demand in a very programmatic and predictable way. Earlier the driver ecosystem was not properly managed in this country: in tourism season your earnings would peak, and the next month they would just tank, so people were not really sure that being a driver was a good long-term occupation. By programmatically providing a channel and a source of demand, we help ensure the driver ecosystem stays healthy. User-intent prediction is another thing: to understand as much as possible about the user, and to recognize the moment there is a need for him to take a ride, and so on and so forth. Very quickly, some of the models. For regression we use GLMs, generalized linear models; there are a lot of regression tasks, like predicting what the conversion rate would be at a certain surge price in a certain location at a certain time. Pricing is very sensitive to location and to the category of service, and there are relationships across categories: when you price one category at 1.5x of another there is a certain relationship, a certain flow between them, and if you get it right the flow is better.
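The "PageRank on the ratings" idea for drivers can be sketched as a mutual-reinforcement iteration (closer in spirit to HITS than to classic PageRank): customers who rate good drivers highly earn trust, and drivers rated highly by trusted customers score well. The rating matrix here is toy data:

```julia
using LinearAlgebra

# R[i, j] = rating customer i gave driver j (3 customers × 2 drivers, invented).
R = [5.0 1.0;
     4.0 2.0;
     1.0 5.0]

function rating_rank(R; iters=100)
    ncust, ndriv = size(R)
    c = fill(1.0 / ncust, ncust)      # customer trust scores
    d = fill(1.0 / ndriv, ndriv)      # driver quality scores
    for _ in 1:iters
        d = normalize(R' * c, 1)      # drivers scored by trusted customers
        c = normalize(R * d, 1)       # customers trusted if they rate good drivers highly
    end
    d
end

println(rating_rank(R))   # driver scores, normalized to sum to 1
```

With these toy ratings, driver 1 (rated highly by two of three customers) ends up with the higher score.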
Understanding the optimal price point given the category, the time, and the location is important, and that is one of the most important regression problems we work on. There are others as well, like regressing ETA, or regressing some property of demand for a given user, and so on. We have a combination of categorical and numerical data to train these models, so we decouple the categorical attributes and use tree-based models, like gradient-boosted trees and random forests, to leverage the categorical attributes; then, once you've partitioned the data using the trees, on each of the resulting partitions you apply your numerical techniques. Time-series modeling is important for forecasting: we use standard time-series models like ARIMA, and we also look at Fourier coefficients and project them into the future. Dimensionality handling is again taken care of by standard techniques like PCA and regression. And, as I said, with every taxi being a data source there is a lot of data, so almost all of the algorithms run on distributed systems, some well known and some still picking up; one of the reasons we are here is to see how Julia will be able to help us in these initiatives. Unsupervised learning, of course: given that we have categorical data, we use tree models for clustering as well, and so on and so forth. There is a further graph problem: to understand and model the nature of traffic, we have a large distributed graph powered by Apache Giraph, an open-source large-graph processing system, with graphs of tens of millions of nodes, each node representing some point of significance in the city, and on top of it we train our ETA models.
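As a sketch of the Fourier-coefficient approach to forecasting mentioned above: fit the sine/cosine components of a daily cycle by least squares and project forward. The demand series and the 24-hour period are invented for illustration:

```julia
using LinearAlgebra

period = 24.0                       # hours in the assumed seasonal cycle
t = collect(0.0:1.0:72.0)           # three days of hourly observations
demand = 100 .+ 20 .* sin.(2π .* t ./ period) .+ 5 .* cos.(4π .* t ./ period)

# Design matrix: intercept plus the first two harmonics of the daily cycle.
X = hcat(ones(length(t)),
         sin.(2π .* t ./ period), cos.(2π .* t ./ period),
         sin.(4π .* t ./ period), cos.(4π .* t ./ period))
β = X \ demand                      # least-squares fit of the Fourier coefficients

forecast(h) = dot([1.0, sin(2π*h/period), cos(2π*h/period),
                   sin(4π*h/period), cos(4π*h/period)], β)

println(forecast(96.0))   # next-day, same hour as t = 0: ≈ 105
```

An ARIMA model would add autoregressive terms on the residuals of this seasonal fit; here the synthetic series is exactly in the span of the basis, so the projection recovers it.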
Constraint optimization is also important, because when we manage a marketplace there are business-sensitive metrics that have to be preserved, as I said: when we have all these initiatives and experiments happening, how do you ensure the market balance is not let go? And with that I am basically done with what I came to say. Do you have any questions?

The problem you described is a classic perishable-inventory problem, where you want to forecast the demand, and on top of that, if demand is greater than supply, you want to optimize that demand. The last block you mentioned, optimization: is that what you were referring to, is that what you are using for optimization? That is one of the things. Ultimately it is a constraint-optimization problem; everything is a constraint-optimization problem. But the individual constituents of it, the objective function and the constraints, have to be predicted, because they are not known in advance. It is important to train models for them: for example, if you want to predict what the conversion rate would be when you move the price by a certain factor, at a setting that may never have been tried before, you have to predict it. So: constraint optimization with the individual constituents plugged in via prediction; that is the way it works.

Just to add on to that: so it is essentially a cascaded model, where one set of sub-problems finally funnels into your LP, which gets solved, and then you make sure your cascades are consistent? Yes, it is a multi-level cascaded problem in some sense: at the bottom you have the constraint optimization, and above it a cascade of models flowing into it. One last question: are you distributing the LP at all? We are not distributing it. How large is the
LP? We have a number of constraints running into about 200.

Did you arrive at the price of 10 rupees by optimization? It was initially arrived at by trial and error, and right now we do some amount of optimization on top of that. We cannot optimize it freely; we cannot have a price like 8.7, so we have to ensure the prices are smooth. But there is some amount of experimentation, and that is the reason you find the price varies a lot across cities; that is the current level of optimization. Following up on trial and error: are you doing systematic experiment design? When the company was started there was no data-science group, but right now, yes, there is some systematic experimentation at the levels at which we can set price; business sustainability is important, so given the restrictions set by the business, there are certain levers we play with. It gets more complicated because pricing one category at a certain point influences the other categories: for example, if you make your compact cars very cheap, sedan drivers will be upset and they will defect. And it is effectively a machine that is running, with side effects: it is not restricted to our own services, it extends to auto-rickshaws, for instance. If you price the Mini too low, people who use auto-rickshaws will deflect to the Mini, and at the same time people who use sedans will also deflect to the Mini, so both sides are upset; it is important to get the price spectrum right across the range of services. When the Mini surges, it can approach the price of a sedan; what is the market reaction there? Interestingly, people are still not deflecting to the sedan; that is something we observe. Even when the Mini, after we apply that multiplier of 1.5 or whatever, exceeds the price of a
sedan, people are still booking the Mini, and if they don't find a Mini they just go away. I hope that answers it; thank you. Are the activities that the user does on the Ola app recorded? Yes, that is something we look at: when do you open the app, what do you view, when do you use any of the other services; these are signals about the user. Any other questions? Do you get to know when I open the Ola app? I cannot say, but we know. Alright, thank you; please do talk to me if you want to have any further discussion.

Just one quick announcement: for the next session we will be distributing feedback forms during the Q&A, so we'd really appreciate it if you take some time and rate each talk; you can also give us the feedback in person.

Alright, so we've covered our session on applications of Julia in, hopefully, the real world. Now we move to a bit more of our open-source and academic pursuits, and it's great to see all these different things come together: academia, research, open-source contributors, businesses, everyone using Julia. The next session is on Julia and parallel computing. We have Ranjan, who is going to talk about multi-threading in Julia; then we have Panmay, who is going to talk about the data problems and the ecosystems around them; and we have Amit afterwards, who is going to talk about concurrent and parallel programming. All of these guys have been contributing to Julia for a long time, and they are core contributors. So, without further ado.

Thank you for the introduction. Good afternoon everyone, my name is Ranjan, I work for Julia Computing, and I'm going to be talking to you about multi-threading Julia. Most of this work was done with Dr.
Kiran Pamnany from Intel Labs. So let's get started. Let us ask a fundamental question: why threads? But before that, why multi-core? All right, is this better? Or maybe I could just hold this; is this better? Okay. So, why multi-core? As you can see in this all-too-familiar graph, CPU clock speeds have plateaued at about 3.3 GHz, so chip manufacturers are essentially putting more cores on one chip. The responsibility has been transferred to the programmer: to think about the application and write it in parallel, so it can use all the cores on the chip. This is essentially what's happening in high-performance computing today. Most HPC codes are moving towards hybrid MPI-plus-X models. MPI stands for Message Passing Interface: you have an application and a cluster, so your application is run by many different processes sitting on different nodes, and since each of those nodes is itself a multi-core computer, the processes use a shared-memory model within each node. This of course gives much better performance, but it is also prone to programmer error, and you'll see why. So, how do you divide work amongst threads? There are a couple of ideas around this. One is called task parallelism: different threads execute different pieces of code. Unfortunately this is a challenge in terms of correctness; how do you verify that your final answer is correct? Task parallelism is notorious for Heisenbugs, bugs that surface now and then but at other times do not surface at all. So in Julia we have implemented data parallelism. Data parallelism means you have a large block of data, you divide it into chunks, and threads operate on those chunks: each thread executes the same code, but everything happens in parallel. So where does
Julia fit into all this? If you take a look at this graph, there are a bunch of bars here, but the key takeaway is that each bar stands for the relative difference in performance between tuned parallel code and naive serial C code, and there is a 50x factor in there. This means that to get optimal performance on a large parallel system, a lot of architecture-specific hand tuning is needed, and it grows as the number of cores grows. That has unfortunately become the industry norm, and hence it takes a long time to write a parallel program with good performance. This is where Julia fits in: we are going to try to make that simple. Julia has a multi-threading infrastructure which is currently on the jn/threading branch on GitHub and will be merged into the 0.5 master branch as soon as all the conflicts are resolved. There is support for locks and atomics. To compile this branch you need to set the JULIA_ENABLE_THREADING flag in one of the header files, and then you can start using threading. So what exactly is the threading model? It revolves around one macro, @threads. Essentially what we are implementing is loop parallelism: you take an iteration space, split it between your threads, and everything runs in parallel, as in the example code there. You can do this in a number of ways: you can split the iteration space of a loop for k = 1 to n, you can put a bunch of code within a begin/end block, or you can get all your threads to call a function f. So what exactly is happening here? First there is a broadcast operation, and at the end there is a barrier. The @threads macro just expands to a ccall; ccall is how you interface with C functions in Julia, the way you call a C function from within Julia. So what essentially happens is that all the threads
receive f and its arguments via a reference, and all the threads simultaneously invoke that function f. As soon as a thread is done, it waits on a barrier so that all the other threads can catch up, and then your code continues after that point. There are a number of things to consider while doing multi-threading in Julia. Firstly, there should be enough parallelism in the problem. What if there isn't? Think about it this way: the chunk of data you operate on in parallel should be big enough. If it were small and you kept throwing threads at it, you would run into something called oversubscription, and the broadcast and barrier latencies associated with invoking your threads would overpower any speedup you get. The other thing to think about is thread safety. Thread safety means manipulating your shared data structures, the ones all your threads operate on, in a safe manner: multiple threads shouldn't write at the same time, that particular bit must be serialized, and so on. Or, in the case of @threads, to give a very specific example: your second iteration shouldn't depend on the first. If it did, those two operations would be inherently serial, and we shouldn't be using threads at all. So, we decided to test this out on a few workloads. Does it really work?
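As a concrete sketch of the loop form just described, here it is in the modern spelling (today `@threads` lives in `Base.Threads` and no special build flag is needed; on the 0.5-era branch the setup differed):

```julia
using Base.Threads   # @threads, nthreads(), threadid()

n = 1_000
a = zeros(Int, n)

# Loop parallelism: the iteration space 1:n is split between the available
# threads, and an implicit barrier at the end waits for all of them.
@threads for k in 1:n
    a[k] = k^2       # iterations are independent of each other, so thread-safe
end

sum(a) == sum(k^2 for k in 1:n)   # true: same answer as the serial loop
```

The thread count is fixed at startup (for example `JULIA_NUM_THREADS=4 julia script.jl`); with a single thread the loop simply runs serially, which is also why the answer is easy to check against the serial version.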
Let's see. This is a summary graph of the entire workload effort we put in. Julia threading is the purple bar, the fourth one, and as you can see it's pretty effective compared to all the others; even single-threaded Julia does a good job, and threading takes that further. The applications here: one is a Monte Carlo stock-correlation simulation, another is a Lattice Boltzmann model, a fluid-dynamics model. These simulations were run on a Haswell server with 18 cores per socket and an NVIDIA K80 GPU (Kepler architecture). Let's look into these. Julia threading is the second-to-last column and single-threaded Julia is the last; in this particular diagram Julia threading is pretty effective. For every workload I'm going to show you a scaling graph, which plots performance versus the number of threads used. Here you see it scales to about 5 or 6 threads, for the reasons I mentioned earlier: even though there are 18 cores, and you'd honestly love to see an ideal scaling graph with speedup all the way to 18 cores, due to those latencies it plateaus at about 5 or 6 threads. In this next case, though, we have a pretty good scaling graph, with good scaling up to 18 cores. All these workloads have been benchmarked against MATLAB and against MATLAB accelerated with a GPU. This particular column stands for GPU-accelerated MATLAB; that number is about 2.96 seconds, and Julia with threading is 3.01, just to show you that with your existing infrastructure you can do a whole lot with the multiple cores on your machine. This is the final workload; as you can see, Julia threading is again pretty effective compared to the others. Apologies for the blank column here: we unfortunately
couldn't get that particular MATLAB directive to run, but Julia threading in this case as well scales to about 5 or 6 threads and then plateaus. There are a couple of other workloads whose results are a little preliminary. We actually show this one to demonstrate the effectiveness of Julia's new garbage collector, which current Julia now uses; earlier we had a few issues, and with the new collector it gave us a speedup of about 5x. This is a 2D wave equation, actually a showcase workload from MATLAB's website for GPU acceleration, which is why GPU acceleration is very effective here. That is in part because it is very FFT-intensive, and cuFFT, CUDA's FFT library, is highly optimized; on the CPU we call FFTW. Julia threading still has a ways to catch up on that one. So, does it really work? Let me see if I can show you a little demo at this point. Is the font visible? Great, let me hold it this way. Okay. I implemented something called a neoclassical growth model, a model in macroeconomics which is solved in an iterative manner, just to show you how simple threading is. I am going to start up Julia and run this model for you; I am doing it with one hand, so bear with me. Let's time it. As you can see, a number of iterations execute here, about 250 of them, and it takes about 18 seconds. Let's see what happens when we thread it. I have the cursor exactly where I need it: all you need to do is add a couple of words to partition the iteration space. Let's compile this again. Oh, and it is a good 2x speedup. So there you have it, folks: threading in Julia. How many threads is this using? This particular one
has 4 threads; you can check that with the nthreads() command. So you get a 2x speedup on 4 threads? Yes, 2x on 4 threads. It could scale further; this is a 16-core machine, and I haven't tried increasing the thread count to see how it scales. What is the exact program you are running, what is it doing? Maybe I'll take that question offline. You said it depends on the number of cores you have? Yes, it is important to vary your thread count only up to the number of cores; if you go beyond that you are probably going to plateau, and you might even get negative scaling. What about architectures other than x86, ARM for instance? I see; this current work is just for x86. We are in the process of getting Julia running on ARM, and as soon as we get there we will find out; it should just work. This threading works on the cores of a processor, not on the GPU? No, it does not work on the GPU. Are there any plans for that, since you mentioned the machine has a GPU? Essentially we were doing a comparison with GPU-accelerated MATLAB, which is why I showed you that number. Yes, Julia could be accelerated by a GPU too, but here we are just on the processor cores. And you are reaching close to GPU-accelerated MATLAB? The question is whether Julia with multi-threading is as good as or better than MATLAB on a GPU: on one problem it was actually coming close, yes. How do you tell it to execute on the GPU? I can answer that after the session. Thank you.

Elly is pure Julia.
It doesn't have any dependencies on the JVM or C libraries, so it is easy to install and quite portable. It uses protocol buffers to talk to the Hadoop JVM services. So, let's go on and see some examples. I have a standalone cluster on my laptop. To start with Elly, you load the package: using Elly. To connect to HDFS, you create an HDFS client and give it the namenode host. Here I am now connected. Actually, I am not connected; I just have a connection object. It says connected: false; it will get connected on first use. You can use the regular file-system calls, pwd and so on, and everywhere you pass the connection; dfs is the connection that we got. readdir gives the contents of the current directory, cd changes directory, and so on and so forth. You can stat a file, which gives you the file size and the HDFS block size. You can get the block information: for this particular file, there are this many blocks, and these are the offsets. The first number is the byte offset, and the second array is the list of nodes where that block is present. You could use this information as it is and do your own scheduling, and so on. To represent a file, you create something called an HDFSFile, and you can open it as if it were a regular file, with the usual open call. It gives you an IO stream, and you can read and write it; here I open this file, write to it, and read it back. Okay, a short introduction to YARN with Elly. You create a YarnManager by instantiating it and pointing it at the YARN resource manager; the port number here is the default. Now I'm connected to my cluster. I can call addprocs, which is the Julia API to add Julia processes, and what I'm saying is: on this YARN cluster, I want one more processor, and I can pass that in.
After I add the node, I print the node ID, the Julia process ID, on every node. I got output from worker 2, so I have one more worker in addition to my master node; that printed right. Once you're done, you can call rmprocs and disconnect. As I said, Elly also exposes the native HDFS and YARN APIs, so that you can integrate in a more fine-grained manner. With the native YARN APIs you get much more fine-grained access to the resources, and you can optimize your resource usage, but the code can be more complex. Just to show an example, this is how you connect: you create a user object and you get a YARN client. Once connected, I have one node, and I can print some more information about it. I create a YARN application master and register two callback methods on it, so that when a container is allocated I get notified and can schedule a run on it. Then I submit my application. Once I have this, I can request containers; here I am allocating one container. On that container I can run an application, specifying the command line for it. Once I launch an application, I can use it as and when I want; when I don't need it, I can stop it, release the containers, and finally unregister. So that's a very brief introduction to the APIs. We are probably lacking in documentation, but if you go to Elly.jl and look at the main source file, you can see all the exported functions, and that will give you an idea of what's possible. Since we use protobuf, all the APIs generated from the protobuf definitions are also available, though not exported. And there is a submodule for the Hadoop protocols; if you import that, you have access to everything it defines. So, now I'll use this on a slightly bigger cluster.
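Condensed, the HDFS and YARN walkthrough above looks roughly like this. The host names, ports, and file path are made up, and the API names are as I recall them from the Elly.jl of that era, so treat this as an illustrative sketch rather than runnable code:

```julia
using Elly

# HDFS: connect to the namenode and use familiar file-system verbs.
dfs = HDFSClient("localhost", 9000)
readdir(dfs)                        # list the current HDFS directory
f = HDFSFile(dfs, "data/twitter.csv")
open(f, "r") do io                  # behaves like a regular IO stream
    first_line = readline(io)
end

# YARN: request Julia workers from the resource manager.
yarncm = YarnManager(yarnhost = "localhost", rmport = 8032)
addprocs(yarncm, np = 1)            # one extra Julia worker on the cluster
# ... distributed work happens here ...
rmprocs(workers())                  # release the containers when done
```

The point of the design is that HDFS files and YARN workers slot into Julia's existing `open`/`read` and `addprocs` idioms, rather than introducing a separate API surface.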
Again, a dummy cluster hosted on one single machine, but inside Docker. This is a 10-node cluster on a single machine; every node is a Docker instance, and I'm on the master node right now. I have this file of Twitter link data: each record says the user with one ID is following the user with another ID. We are going to attempt PageRank on this. What I've done is put together a small package which pulls in Elly and a few other packages. One of them is called Blocks. Blocks lets us represent a large entity as chunks: blocks of a regular file gives me splits of that file, and blocks of an HDFS file gives me a split for each HDFS block. Then there is something called do-blocks, which lets me run parallel constructs over blocks. So I have used these three packages. The data set I'm going to show you is 1.3 GB; it's not a very large file. Let's go to the next step. What I'm doing here is starting Julia with a machine file. A machine file lists all the slave nodes, so this starts up Julia and gives me a Julia instance on each of those nodes; I'm not using YARN for this, just SSH. What I've got now is one master process on this machine, attached to a Julia process on each of the slave nodes. I load my package, which loads it on all nodes. You see lots of warnings because I'm using the very latest version of Julia. After this, I read my data set as a distributed sparse matrix: each node reads whatever blocks it has and creates a sparse matrix, and at the end I have a distributed list of sparse matrices. So this figures out all the blocks of the file and schedules jobs to read and parse them into sparse matrices on all the nodes. This is done by do-blocks.
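On a toy scale, the two steps being run here, turning follower edges into a sparse matrix and then ranking users by power iteration, can be sketched serially. The four-node graph and the damping factor are illustrative, not from the talk:

```julia
using SparseArrays

# Toy follower graph: an edge (i, j) means user i follows user j.
src = [1, 2, 4, 3]
dst = [3, 3, 3, 1]
n = 4
A = sparse(src, dst, ones(length(src)), n, n)

# Power-iteration PageRank (d is the usual damping factor).
function pagerank(A; d = 0.85, iters = 50)
    n = size(A, 1)
    out = vec(sum(A, dims = 2))     # out-degree of every user
    out[out .== 0] .= 1.0           # avoid dividing by zero for dangling users
    r = fill(1.0 / n, n)            # start from the uniform distribution
    for _ in 1:iters
        # each user's rank is split among the users they follow
        r = (1 - d) / n .+ d .* (A' * (r ./ out))
    end
    return r
end

r = pagerank(A)
argmax(r)   # user 3, who is followed by everyone else, ranks highest
```

In the cluster version the matrix-vector product is the part that gets distributed: each worker holds the sparse block built from its local HDFS blocks and computes its share of `A' * (r ./ out)`.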
So, in the meantime, let me briefly show you the code for this. I have a reader; the reader reads records from a block, and in this case we treat the whole block as one record, read into an array. The map step is supposed to map each record; here it turns the array into a sparse matrix. Collect merges the individual matrices on each node, and reduce returns references to them. So distributed-sparse calls all these intermediaries to create a sparse distributed matrix. And finally, here is the PageRank implementation using power iteration: I normalize, and then run the iteration, again in parallel. This is again distributed, so it takes a while. Actually, I have cheated: I have limited the number of iterations, but it is still fine. This data set has around 11 million users. And it gave me a number, 3493: that user is the most influential. What is the count of connections that user has? It has around 212,000 incoming links. That is all, I guess. Is Julia running on each node on the system itself? Yes. How does data locality work? For example, a block can be replicated on multiple nodes; when you submit a job, how do you make sure the Julia process works on a local block? Do-blocks has a scheduler: it creates one queue per worker, holding the blocks present on that node. If a block is on multiple nodes, it is scheduled for all of them, and once it is processed by one node, it is removed from the queues of the others. How do you specify the code that runs on the workers? When I did 'using' the package: in a multiprocessing setup, loading a package on the master actually loads it on all the workers.
But you are only saying that you want Julia to be running on all the nodes? Yes, Julia has to be running on all the nodes; you can either use YARN for that or SSH, as here. How does it compare with Spark? We do not have a timing comparison; it has not been done yet, and I would like to invite folks to partner with these open-source projects; that would be a great thing to do. The one comparison I do have is a plain file copy into HDFS using this versus using Hadoop's command line, and both are similar. But I don't know how it compares against Spark. I was just looking at the functions; the package looks fairly mature now. What would it take to have distributed data frames? How much work is involved in combining these blocks with data frames? Actually, we had an implementation of data frames on blocks, but unfortunately it is not maintained; it has not kept up with changes. I am guessing maybe two weeks of work; that should be enough.

Next, we have Amit on concurrency. Quite simply, I will talk about the basic concurrent and distributed programming constructs that are available in Julia. Stefan talked about this in the morning: tasks, also known as coroutines or cooperative multitasking, are present in Julia. That is one set of APIs which I will cover; the second is distributed workers, some examples of which you just saw. The thing with Julia is that the base distribution comes with a reasonable amount of support to run Julia code across many machines, and there are packages which enable you to run on Amazon. We have had demos where we spin up thousands of cores in a matter of minutes: you do your work and you release it back to the cloud again.
So, I'll cover both of these approaches. Tasks in Julia are backed by a single OS thread of execution. There is work on multi-threading in progress for 0.5; at this time I don't know how it will map onto the task infrastructure itself. But what we have today is a libuv-backed, event-driven IO mechanism which multiplexes all your IO tasks, whether network or file. The other useful thing about tasks is the implementation of timeouts and background work. Since it is cooperative multitasking, there is no preemptive scheduling: if your task is CPU-bound, nothing else runs. Still, it is good in a lot of ways; it simplifies your code, since you don't have to deal with locks and that whole class of problems. As high-level API, again, in the morning we saw @async and @sync. Typically the @async block takes an expression. As an example: say you want to collect the results of some URLs you download. The @async block does the actual fetch, which is a network call, and that allows other tasks to run; then you process the response. The processing, of course, prevents any other task from running. Then you push your results and collect them under @sync. For task communication there is the produce/consume model: two or more tasks can work in lockstep. A producer task could be reading off a database, the network, or a file system, generating jobs within the same Julia process. produce() blocks till a consumer takes the value; it is like a queue of length one. You can have one or more consumers, who consume whatever values are produced by task t, and consume() blocks till the producer has a value. So it is a synchronization mechanism in that sense. Julia 0.4 also has support for channels, and channels are type-aware: as an example, you create a Channel of Int64 of size 1,000.
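A minimal sketch of both patterns, written for Julia 1.x where produce/consume has been folded into Channels (the URLs are stand-ins, and `sleep` stands in for a real network fetch):

```julia
# @sync/@async: overlap the waits; while one task sleeps in its "fetch",
# the scheduler runs the others.
results = String[]
@sync for url in ["http://a.example", "http://b.example"]
    @async begin
        sleep(0.1)                      # the network call; others run meanwhile
        push!(results, "fetched $url")  # processing runs without interruption
    end
end

# Producer/consumer in lockstep: a Channel of length 1 behaves like the old
# produce()/consume() pair; put! blocks until take! empties the slot.
ch = Channel{Int64}(1)
@async for job in 1:3
    put!(ch, job)        # blocks till the consumer has taken the previous job
end
consumed = [take!(ch) for _ in 1:3]   # [1, 2, 3]
```

Because all of this runs on one OS thread, `push!` to the shared vector needs no lock; tasks only switch at blocking points like `sleep`, `put!`, and `take!`.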
So, this channel can store objects of type Int64, 64-bit integers, up to a maximum size of 1,000; some of the calls will block once that size is reached. Anybody can add to a channel using the put! call, and if the channel is full, that task blocks. So, that was tasks. For multi-processing we have a single model covering both leveraging multiple cores on one machine and distributing computation across machines. The user API in this case is remote function execution, as opposed to message passing. If people are familiar with MPI: there, the different programs run in lockstep, each working on whatever data it receives; here it is more of a master-driven model. The low-level APIs are of the form remotecall: on a particular process, execute function f with a variable number of arguments. The function f can be a closure, an anonymous function, or a function in a module. A remotecall just starts the function and gives you a handle to the result; it is like a future. remotecall_fetch blocks till the function finishes execution and returns the result. And we have a couple of macros; sorry, they haven't rendered properly here. @spawn takes an expression and places it on a worker, cycling through all the worker processes; @spawnat runs the expression at the particular process you indicate. Both return a future, which we call a remote reference. The multi-processing model is set up so that the initial process, the one you start Julia with, has a PID of 1; we call it the master, or driver, process. It launches worker processes via cluster managers; the YARN cluster manager was just demoed. A cluster manager is responsible for launching workers and for providing the information on how to connect to those launched workers; those are the only two main things it does. So, in that demo we were using YARN.
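The primitives above, in their modern spelling (after Julia 0.4 they moved into the Distributed standard library, and remotecall's argument order became function-first), with plain local workers standing in for the YARN-launched ones:

```julia
using Distributed

addprocs(2)      # two local workers; a cluster manager could supply these instead

# remotecall: start f on worker 2 and get a future back immediately.
fut = remotecall(+, 2, 1, 2)      # compute 1 + 2 on worker 2
a = fetch(fut)                    # blocks till done; a == 3

# remotecall_fetch: the blocking call-and-fetch in one step.
b = remotecall_fetch(myid, 2)     # b == 2, the worker reports its own PID

# @spawnat: run an arbitrary expression at a chosen worker.
c = fetch(@spawnat 2 sum(1:100))  # c == 5050

rmprocs(workers())                # release the workers when done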
YARN has APIs for starting the worker processes, so you leverage those, and then you provide the information on how to connect all these Julia processes into a cluster. By default all the Julia workers are connected to each other, but you can specify in addprocs that you want only the master connected to the workers. The problem with the all-to-all case is that with a thousand nodes you end up with around 500,000 connections, and you start hitting system limits like the number of open file handles. In some use cases you may not even need the workers to communicate with each other: if your problem only requires the master to generate jobs for the workers, you can just opt for a master-worker topology. The addprocs API is what adds workers, and there are two cluster managers shipped as part of Base. The local manager adds processes on a single node: if you call addprocs with just an integer, it adds that many workers locally. The SSH manager uses SSH to connect to a list of hosts that you specify and launches workers on those machines. Both are in Base. There is also a package called ClusterManagers.jl with code for various clustering technologies, like Sun Grid Engine, or Slurm, which I have used a little, not much. On top of this sits the distributed API. At the high level we have a macro, @parallel for, which is typically used to distribute a large number of small tasks; with pmap you distribute fewer, compute-intensive tasks over your workers; and for remote function execution there are remotecall, remotecall_fetch, and a bunch of others. So, here what I am doing is a Monte Carlo estimate of pi by throwing random points, and I run it 10^8 times.
The @parallel macro takes a reducer, in this case just the addition function, and splits the range across the workers. Going line by line: first, if I have more than one worker, I remove all the workers, just so that I can run this code multiple times. Then we run it without adding any processes, in which case the code runs on the current master process. The first run is just to trigger compilation; in any benchmark you do in Julia, this is one thing to be careful about: run your code once so that all the code paths get compiled, and then do the actual timing. So I run this, and it took around 4.3 seconds. I do the same thing again after adding 4 workers, and it is a pretty good speedup. This is a 4-core machine with 8 hyper-threads; hyper-threading usually doesn't have much effect on truly CPU-bound work, and we can try it out: there is no change. So, 4 real cores, and a pretty good speedup. There will be more examples in the workshop tomorrow, which runs through a lot of good ones. That was @parallel for, which takes a range and splits it across workers. pmap is a parallel map: it takes a function, which it executes on the workers, and you pass it lists of the data you want to work on. For example, pmap over the lists (1, 2, 3) and (a, b, c) evaluates the function on (1, a), (2, b), and (3, c), each on some worker. remotecall executes a function with variable arguments, as we just discussed. Remote references, when returned from the remotecall functions, are like futures: a remotecall executes a function on a remote process and stores the value in the reference, and a handle to that value is passed back. All these APIs return a handle to a channel of size 1, so it holds only one value at a time. The remote references themselves can be serialized across processes.
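The benchmark pattern being described can be sketched with today's names (@parallel for became @distributed in Julia 1.0; the iteration count is scaled down from the talk's 10^8 so it runs quickly):

```julia
using Distributed

addprocs(4)   # four local workers, matching the demo's 4-core machine

# Monte Carlo π: each iteration throws one random point at the unit square;
# the (+) reducer sums the hits inside the quarter circle across all workers.
function est_pi(n)
    hits = @distributed (+) for _ in 1:n
        rand()^2 + rand()^2 <= 1 ? 1 : 0
    end
    return 4 * hits / n
end

est_pi(100)                      # warm-up, so compilation is not timed
p = est_pi(10^7)                 # ≈ 3.14

# pmap is the complement: a few heavy tasks rather than many tiny iterations.
squares = pmap(x -> x^2, 1:4)    # [1, 4, 9, 16], spread over the workers

rmprocs(workers())
```

The division of labour matches the talk: @distributed for cheap iterations where per-task dispatch overhead would dominate, pmap where each call does enough work to justify a round trip.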
They are pretty small; the actual data is not sent. The calls on a remote reference are: wait, which blocks till there is data that can be fetched; isready, which tells you whether the data is there, though this is a possible race condition, because if multiple processes test isready and then do a take! or fetch, somebody else may already have removed the value, so you need to be careful about that in the context of distributed computation; put!, which sets a value into the remote reference and blocks if the reference is full; take!, which removes and returns a value, blocking if it is empty; and fetch, which returns the stored value without removing it, also blocking if it is empty. There are two types of synchronization you can do. Condition variables have wait and notify calls, and they are edge-triggered, in the sense that only tasks already waiting on the condition are notified. If you want level-triggered behaviour, you need to keep some state, in which case you can use the Channel type: it can have a size of 1, and you can test whether there is data in it. Those are the two synchronization types available right now; in 0.5, with multi-threading, I'm sure we'll have a few more. The other relevant packages: MPI.jl, for traditional MPI-style parallelism. One interesting thing about this package is that you can also use MPI as the transport for the regular Julia parallel API calls, such as @parallel for and pmap. The typical setup uses TCP sockets, but some MPI clusters have their own high-speed interconnects, and then you can use MPI for shipping the messages. SharedArrays leverage shared memory on a single node; the arrays have to be of a bits type. DistributedArrays are for arrays which are quite large and cannot fit into RAM on a single machine, so they span multiple nodes.
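The edge-triggered versus level-triggered distinction can be sketched in a single process with task-level synchronization (the 0.1-second sleep just stands in for some background work):

```julia
cond = Condition()          # edge-triggered: only tasks already waiting see it
done = Channel{Bool}(1)     # level-triggered: the state persists until taken

@async begin
    sleep(0.1)              # pretend to do some work
    notify(cond)            # wakes the task blocked in wait() below
    put!(done, true)        # and records the fact for anyone who checks later
end

wait(cond)                  # we were already waiting before notify, so this returns
fetch(done)                 # true, no matter when we ask
```

Had `notify` fired before anyone called `wait`, the notification would simply be lost; the channel is how you keep the "it happened" state around, which is the point made above.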
Each worker creates its local part, and you use pmap or some mechanism like that to work on the local parts and then fetch them when the work is done. There is also a package, AWS.jl, with an API for launching machines on EC2 and starting workers on them; once your work is done you can release the resources. And that brings me to the end of my presentation. Any questions? Yes, I think produce/consume is more historical, so we can probably have a debate and do away with it in favour of the remote reference types and channels; in 0.5 I think there will be some amount of work towards rationalizing that part. But the in-process stuff, the tasks, and the distributed computation are two different models, and each will keep its own character; some amount of unification, yes, I guess we can have. What about Erlang? My understanding of Erlang is that it is a whole lot of message passing, and it is slow; Erlang has its uses in the telecommunications world. No, I mean there is a benchmark I did quite a while back: the same load that got done using one core with C code took around 12 cores, everything else being the same. One of the foundations of Julia is high performance for numerically intensive tasks. I'm not saying we can't take the good ideas from Erlang, but there will be trade-offs. I hear you on that, but I'd also point to the MPI crowd, the traditional high-performance computing crowd: their concerns are quite different. The kind of optimization they look at is, while network calls are going on, can I keep my CPU loaded. So the direction they come from will probably be different. That's fine. And for a lot of enterprises today this model is definitely much more developer-friendly. I think the key is flexibility, right? To be able to support different models in Julia itself.
But the problem is, if you have too many models, then it becomes a question of what choices you make. If there is one single model to develop against, then the framework can hide the differences so that the developer does not have to worry about them. Otherwise it becomes overhead: depending upon my problem statement, depending on my domain, I need to select one. To add on to that, right — you already have tasks which run on the same machine; when you add threads to that, and then the master–worker model, as a developer I have like four or five options. But the thing is that you have a task, and it shouldn't matter whether you run it on this machine or somewhere else. Stefan already introduced the idea of tasks — can I take the same task, run it on my machine, and then run it somewhere else? Maybe that is how we can unify things. So there is a package called MessageUtils where I implement inter-task channels; Julia has an equivalent for in-process channels but not in the distributed setting. We can look at having that kind of construct in Base — it would be great if you guys can get in on the discussions. It's always useful. — How does the scheduler decide to switch from one task? — So it's not preemptive, right? It has a queue — and I think Jeff can talk a little more about it — but it basically just runs through the queue, sees which task is runnable right now, and executes it. And the underlying libuv layer notifies the scheduler depending on the event. libuv is an event-driven I/O engine, including timers — so that covers I/O, or when you want to sleep for a couple of seconds. The tasks are just run serially, one after the other, and if any of them takes control of the CPU — say, a computationally intensive task — it's not going to switch. It's not preemptive. I hope everybody has collected the feedback forms. There are two different versions of the feedback form; you can choose to fill one in and submit. Please.
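The cooperative, non-preemptive scheduling just described can be shown in a few lines: a task only gives up the CPU at a yield point (I/O, sleep, or an explicit yield), so the interleaving below is deterministic. A minimal sketch in current Julia syntax:

```julia
# Cooperative scheduling: control switches only at explicit yield points.

order = String[]

t = @task begin
    push!(order, "task: started")
    yield()                      # give the CPU back; without a yield point,
                                 # a compute-heavy task would never be switched out
    push!(order, "task: resumed")
end

schedule(t)                      # enqueue the task
yield()                          # let the scheduled task run until its yield
push!(order, "main: running")
yield()                          # let the task finish
wait(t)

@assert order == ["task: started", "main: running", "task: resumed"]
```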
Thank you. Thank you, Lanshna. I'm able to select this; I'll use this. Okay, yeah, that's amazing. So a print-out is available upstairs if you'd like one. — Maybe I should ask: are there a lot of Spark users, a lot of Scala? So if you're from Python, then you'll be able to use it. R is what my team uses; this goes through Scala too, but R is more natural, because with R you're basically just trying to get an entry point into Spark and so on. As for how it scales — if you want to write in R, it's possible, but it goes through two processes. Yeah, we are not doing any distribution using R; it's just for benchmarking. It's preliminary, so it's small enough. But otherwise, they basically use Scala and Spark for most of the things; for the other developers, we could do all of the prototyping. — So can I take your email? Yeah. We are friendly. — So with the first few teams, we are not finding suitable people, so it's good competition for me. Competition is always good for the consumers? Yeah, it's good. — So what are you working on? We are mainly into — we have a couple of products; one of the products is for managing the auto sales cycle, so that industry matters to us. There we need some kind of analytics where, you know, we have many dealers purchasing similar kinds of inventory from many different distributors and manufacturers — how can we match that? These dealers are, again, asking for similar kinds of products, or approximately the same quantities, something like that.
Also, you know, the mapping of this whole catalogue of N products — how to manage the master catalogue. So I think it's very similar to that problem — not the entire thing, just the matching of the addresses, names and so on. We have one of the modules built with Scala and so on, so that's why I mentioned it. So is there any benefit to moving directly to this, with all these startups and departments and so on? I think we ought to do some kind of actual prototyping first. So how is it done in your company? We have a separate analytics team also — data platform, analytics; the analysts are separate. They get data from third-party sources and do a preliminary analysis before they give it to us. You do the more sophisticated bit. Good to see you guys. Nice to meet you. — In simulation, you can't do it. Look inside the driver-location bit. First the state change; first the whole app has to be in here. And then you'll have a polyline which has native points as well as an observable, which will be given to you via your custom polyline. See, the plain polyline is only for the trips page; the custom polyline is for the main app and the driver TDP page, where the driver is driving on it — that's for the observable. If you look at the trips page, you can see that the inner polyline has an observable; it has an if condition. I'll have to leave for lunch; I'll talk to you later. This should clear your list quickly — go through the list and I'll just tell you whether we have handled each item or not. Auto-update is broken: yes, we know that. If I hit reload on the driver TDP page, the bounds are not set properly: yes, that's also just auto-update being broken. The start marker is missing: yes, we need to handle that. So if you can take the missing start marker and the auto-update, that would be awesome.
I can update the driver TDP page. — The app crashes on the driver details page a few minutes after the trip ends. So if you just look through the console at what error it gives, you'll get to know about it; keep an eye out for it. Driver details: on the right-hand side we need to show the speed, timestamp and the direction. — I'll take that out. We can't do that; that would be the maximum. So remove that. — I can't; it will be very difficult. Instead, what I would suggest you do is branch from a commit which does not have RxJS and stuff like that, then merge that branch into master and promote it. We'll have those conflicts; we'll have to manage that. It will be a bit difficult. Are you at the office? Is Wagni at the office? Just talk to Wagni as well — obviously you want it to be promoted, so you'll have a better idea. If you're promoting master, the deployment will happen only at night, right? I think by that time we'll have completed it. Sure. — Kind of flaky; works sometimes, now it doesn't. We could actually do one — it's just one and two. So that's how you do it. Yeah, we can't do that. Maybe it's the first time right now. No, the server is the first one; the company is the first one. It should be fine. No, nothing much — it's actually the second one, right? Yeah, you already got it. The server leading to this kind of a thing — understood. No, I don't think it's the first time. I think I'm going to answer it. Yeah, it's the first time. Yeah, good. Sorry? I'm talking to you. Yes, you can do that. Do a card. Yes, check that. Welcome back to the post-lunch session.
I'd just like to apologize, on behalf of all the organizers, for the delay in lunch. This has genuinely never happened before, and we're trying to investigate and find out what happened. It seems like the power outage in this area messed up a lot of the logistics with the caterer. This will not repeat again. So I hope everybody's had a good lunch — and if you missed out on lunch, please let us know; we will make sure that we organize lunch for you. But otherwise, welcome back to the post-lunch sessions, and I hope you have a good conference for the rest of the day. So the next session is going to be about machine learning with Julia. We have three speakers. We have Dinkar here from Decimal Point Analytics. I think he has a fascinating story about how he started using Julia — in fact, it was one of the first programming languages he learned, and I think we're going to have a great tour of what he's been doing with Julia. Interesting story. Then we have a talk on a large-scale recommendation system in Julia with Abhijeet — he's right there at the back. And the third talk is by Professor Kannan. That's actually not on machine learning, but you don't always have three matching talks for every session, so you have to mix and match a few. That one is on teaching with Julia, and some very interesting experiences. Professor Kannan has been an advocate of open source — he's been sort of instrumental in Scilab adoption in Indian universities — and he's going to talk about those experiences, getting Julia running on their $100 laptop, and a bunch of other interesting things. So that's all I have to say about this session. While Dinkar is setting up — just let me know when you're ready — I'm just going to poll the whole audience. How many of you are actually using Julia, or have used it in any form or shape until now? Just give a show of hands. Okay.
So that's about a little less than half, maybe. That's pretty good — I was actually expecting much less. It's always good to be surprised. What other languages around the room? Like Python — I'm sure a lot of hands will go up. Yeah, R. I'm always surprised at how few people use R, even though everyone tells me that they use R. Okay, what else should we try? Python — Python is like the whole room right now. Matlab — about as good as R. SAS — a couple of people, okay. Scala — Scala, yes, of course. One, two. Okay, a couple of people for Scala. Hey, we're doing better than Scala. How about Go? We've got a few hands on Go. Java? More hands, actually — amazing. All right, C and C++. That's still alive, okay. All right, you ready? Perfect. — So, as Viral mentioned, this is probably a change from the heavy technical stuff we've been going through in the morning and what's going to come. I'm not a programmer by training. I wrote my first program about five years back. Back when I studied, there were no programming courses for mechanical engineers. About five years back, the first program that I wrote was in Matlab — my wife taught me how to write a function in Matlab. That's how I started out. And I got into this area of machine learning. I work for a company which has a division that makes products for the investment and trading domain, and we make those products using machine learning. So I got into this area, and for a year I used Matlab. And I found that, as a newbie to programming, Matlab was extremely easy to learn, extremely easy to work with. It was after a year that my colleague introduced me to Julia. Then I started working with Julia — I switched to Julia from Matlab. That's about three years, more than three years, back. And I have not used anything else after that.
So I've been working with Julia for the last three years, and it is really the first programming language that I properly worked in — that's probably what we wanted to highlight. Hello? Yeah. So, I guess to start with Matlab: when I switched over from Matlab to Julia, one issue I faced at that point was that there was not as much information on the web about Julia as there was about Matlab. For someone who's starting out, obviously that's the only way to learn. Now, over this period of time, I've seen that the information on the web has dramatically increased, and I no longer have the issue of not finding an answer, or having to ask someone how to write a function or which function will work well. On the other hand, what I found really good was this: in Matlab, when I used to write functions, my code was kind of slowing down, and scripting was faster. I do not know whether I was doing something wrong, and I don't know whether it's the same right now, but at that time scripting was faster for me than having a lot of functions — which is not what you want; I wanted to write functions. So that was one thing. And when I switched over to Julia, I realized that here it doesn't work that way: functions are better. You write more functions, and that's better. So that was the advantage I saw when I switched over three, three and a half years back. And I've worked exclusively in Julia, as I mentioned, and I've used a couple of machine learning packages. There are quite a few packages in Julia for machine learning, and I really can't evaluate the pros and cons of one package versus the others. What I have seen, and what I want to point out, is that with the packages in Julia you have code that is transparent — which is not the case with some of the languages that I've seen. MATLAB is a case in point: MATLAB has a lot of packages, obviously more than what Julia has right now.
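The point about functions being faster in Julia is worth making concrete. The contrast below is a standard illustration, not from the talk: code inside a function gets compiled against concrete types, while the same loop at global scope pays for untyped-global access.

```julia
# In Julia, put hot loops inside functions: the compiler can infer a
# concrete type for `s` here and emit a tight loop.

function sum_to(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

@assert sum_to(1_000_000) == 500_000_500_000

# The same loop written at top level ("script style") works, but every
# read/write of an untyped global goes through dynamic dispatch and is
# typically orders of magnitude slower.
```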
But with some of those packages, when I wanted to see the code behind them, I wasn't able to — and that kind of impeded progress. So, in machine learning, what I've seen is this: machine learning is a set of tools, right, as many of you might know. And these tools can be used by anyone who wants to learn machine learning and knows the maths — you can learn it. But the key thing, I believe, is to apply the tools innovatively, in a way that fits your domain, whatever application you want to apply them to. And for that, you necessarily have to make changes in the code — in the package code. When you use a package, there will be a number of times when you want to make changes to the package code. I want to quickly mention a couple of examples from my experience of doing this. These are the two packages which I have used, and if you look at the code for these packages, it is crisp, it is short, and yet elaborate — it has enough functionality. They're still being built, they keep changing, people are contributing. So, for example, DecisionTree. I hope people are familiar with basic machine learning: when you use a random forest, you have a package in Julia, DecisionTree, which lets you do random forests. You train the trees, and the trees are generally trained based on a certain objective function. Now, you need flexibility in this objective function — you don't want to be stuck with a single one. Usually, any standard package, any standard training mechanism, will train based on accuracy: how many instances are classified correctly? That's the usual objective function used for training a forest or a tree. But you might want to do something else, because when you use machine learning — maybe I should give an example.
Let's say you're facing the problem of predicting how likely a person is to have a heart problem. You have a lot of features, and you want to build a decision tree which helps you see whether they're at high risk or low risk. A typical machine learning problem, where the features would be things like the cholesterol level or the fitness regime — it could be a lot of such features. And when you train it, you might not really want to stick to the accuracy objective function alone; you might want to bring in some other objective function. And what would otherwise be a lot of changes become very few changes in these packages — that's the biggest advantage I've seen. You just have to write a couple of lines: you can get into the package and write a few lines to change the objective function. And then there's validation. Validation is where, after you train, you want to see how well the model has performed, and the standard method of validation is n-fold cross-validation. You might want to change it: you might want to weight things. You don't want to weight all the instances in the same way; you might want to weight certain instances by a certain measure. For instance, in finance — I work in finance — you might not want accuracy, as in the number of instances classified correctly; you might want to base the model on returns, so that you have a global model rather than a local one. And if you want to base it on returns, you'll have to change the objective function. So that's the biggest value I've seen in my experience of using Julia: you can just change code in packages easily, which is helpful. And that's probably enough about machine learning — I don't want to get into it further. There is the other package I've used, for text analysis, which is basically sentiment analysis using textual features.
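The heart-risk workflow described above can be sketched with the DecisionTree.jl package the speaker mentions. The data here is invented for illustration, and the call signatures follow the package's documented API in recent versions (older releases may differ):

```julia
# Hedged sketch: train a small random forest on toy "heart-risk" data
# with DecisionTree.jl (an external package).

using DecisionTree

# Rows are patients; columns are hypothetical features
# (say, cholesterol level and a fitness score).
features = [220.0 3.1;
            180.0 7.5;
            250.0 2.0;
            170.0 8.2]
labels = ["high", "low", "high", "low"]

# build_forest(labels, features, n_subfeatures, n_trees)
model = build_forest(labels, features, 1, 10)

preds = apply_forest(model, features)
@assert length(preds) == 4
```

Changing the objective function, as the talk suggests, would mean editing the package's split criterion directly — the code being short and readable is what makes that practical.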
You look at documents, look at all the words, and try to extract information; it can help you in decision making. That's text analysis. So that's another package I've used extensively, and the same thing goes for that package too: if anyone is interested, you could look at these packages and see how crisp the code in there is. The second part, which I want to talk about, is data manipulation. When you work in the finance domain, more time goes into data manipulation than the actual modelling — that's what usually happens, because you deal with large arrays of data. I haven't worked in many languages, but what I've noticed, and what I've heard from my colleagues who've observed Julia, is that it's much easier to manipulate arrays, to manipulate data into the shape you want. So that's the other advantage. I could just quickly show you something. This is a very simple thing, but I want to show it just so you understand how to deal with large arrays in Julia. What's important here is that you'll have to do a lot of input/output — reading from a file on your hard disk and writing back into a file. That is very easy here. I mean, you can do it from any programming language, but the syntax is very simple here. So I have a file saved here, demo.csv. I'm just picking up a path — that particular path — and I'll read that file, using "$path", because path is already defined over here. So you have 24,000, 25,000 rows and four columns in here. Now let's say I want to strip out the column labels. This is assigned to a variable, data. — Control-Shift-Plus. Control-Shift-Plus. Can you see it now? — I'm removing the first row, so now I have it without the header, to make it clear. Now let's say the fourth column has probabilities, and I want to pick out all the rows which have a probability greater than 0.7. So we do that, and I get all the rows which have probabilities of more than 0.7.
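The read–filter–write demo above used the 0.4-era readcsv/writecsv; in current Julia the equivalents live in the DelimitedFiles standard library. A small self-contained sketch (file names are illustrative, not the speaker's actual data):

```julia
# Read a CSV into a matrix, filter rows by a column threshold, write back out.

using DelimitedFiles

path = "demo.csv"                                  # hypothetical data file
writedlm(path, [1 "a" 0.9; 2 "b" 0.5; 3 "c" 0.8], ',')

data = readdlm(path, ',')                          # whole file as a matrix

# keep only the rows whose third column exceeds 0.7
hits = data[data[:, 3] .> 0.7, :]
@assert size(hits, 1) == 2

writedlm("out.csv", hits, ',')                     # write filtered rows out
```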
Which is around 9,000 rows. Let's say I want to make the first column integers — so I convert it, and now I have these as integers. Let's say I want to convert these dates. This is a very important thing, because in finance we deal with lots of dates, and date conversion sometimes becomes major work, because you get dates in various formats. So here I'll just use the Dates package. I just have to give the date format, which is like this: y, m, d. For this — that's done. Now I have dates in the Date format instead of the strings we had earlier, and I can actually use them. For example, I can find the number of days between two rows: say data[10, 3], the date column of the tenth row, minus data[1, 3], the first row. So I get 28 days. I can now basically do anything with these dates. And if I want to concatenate two arrays, it's very simple. If I want to write it out to a file, it's just writecsv(path, data) — that's done. So the reading and writing, which is the best part for people like us who work with a lot of data. And that's probably what I wanted to say. If you want to see anything else that has been done here, I can try it. Any other questions on using Julia? — So I just started with machine learning, and so far I've built static models: there's a data set, I train on it, and I get a model. How do I make something dynamic? Does Julia have packages that make it dynamic — as in, new data comes in and it retrains, a sort of updated model every day? — You want to automate that? — Not automate; it's more like dynamic. Automating I can do with a batch file or something. But is there something like that in Julia?
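The date handling in that demo maps directly onto the Dates standard library (in the 0.4 era it was a separate Dates package). The specific dates below are made up to reproduce the 28-day difference mentioned:

```julia
# Parse date strings with an explicit y-m-d format, then do date arithmetic.

using Dates

d1 = Date("2015-10-09", dateformat"y-m-d")
d2 = Date("2015-11-06", dateformat"y-m-d")

@assert d2 - d1 == Day(28)      # difference between two dates, as in the demo
```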
So what we do is, once you have the model ready, once you have the coded model ready, we put the new data through it — that's what we do. What you're saying, to make it dynamic, is that you could feed it new data somehow. — The problem is this: my data is 100 GB and every day I get 1 GB. So right now I'm basically appending this data and then running the whole script again and again. Is there something where the new data is appended and the update happens only on that? — Could you explain it this way: first you train the model with 100 instances, and then you have one new instance, and you drop off the first one — I don't think it's going to make a huge difference. — Unless it's an outlier. — Even if it is, if it's an outlier you don't want it to make a big difference, because you want the model not to fit to that data; the idea is not to overfit. So usually, when you add one single data point, it's not going to change much — we generally don't do that. You could say we change the data when, let's say, 10–20 percent new data has come in; then we add that to the training. — Okay. So how does your model work in your finance setting? — Can we take this offline? This process is called online learning, and there's a separate package for it. — That's not a Julia question; I have a Julia question. Can Julia work with data streams — streams of data coming in? — You can. I mean, if you have a database where it's... — No, it would be online streaming. — Yeah, I haven't done that. In general, is there support for data stream processing in Julia? Do you know? — Sorry, I was just... Does Julia have support for streaming — I mean, like Spark Streaming? — There are various streaming packages, actually, that people have written in Julia, and actually you can just...
In pretty much basic Julia code, you can just stream things through without making too much of a fuss about it, because the performance is good, so you don't need an elaborate system. But there are online statistics, streaming statistics packages that people have written which make it a little more convenient. One last question — I think there's one. Yeah. — So I wanted to ask this: are there packages, released by now — you've talked about normal machine learning — do they have neural networks, deep neural networks, CNNs and so on? — Neural networks, probably not in the way you're thinking of. — No, I'm talking about multi-layer... — Oh, I understand. There's the Mocha.jl package, which does multi-layer networks. — But not on GPU, as Julia currently... — It does, actually, but let's take this offline — why don't you talk to me or someone at the tea break? — I think we are short of time, so please move on. But things are going on for the rest of the day, right? Yeah. Sorry. — My name is Aruch and I am from Bukkar. I'm the person responsible for your breakfast and lunch today. I've had the most emotionally distraught half day — probably the worst professional day of my life. I'm really sorry for whatever happened today. We take complete responsibility, and you can hurl all the brickbats, all the abuses, directly at us. HasGeek is probably the most organized conference organizer, and we really did not live up to their standards. I have one more chance tomorrow; I hope I can make up for all that went wrong today. Really sorry. For any other feedback you have, please share it with us — we will learn from it, hopefully, and crack the thing tomorrow. Sorry again for leaving a lot of you hungry in the midst of the conference today. Thank you.
...back from Australia after completing my master's, all I knew was a little bit of math and some Matlab, and I was having a hard time finding a job here. So, luckily, thanks to Zanab, I was introduced to Julia at Fifth Elephant, and, yeah, I've been doing well since then. So the point is, Julia is very welcoming and friendly to everyone — both as a language and as a community. So now, on to recommender systems. I want to make this an intermediate-level talk, a little toward advanced. Everybody knows what recommender systems are: they help us in cross-selling, converting browsers into buyers, building loyalty, and so on. The example I am taking is a movie recommender system. For example, there are four movies and four users, and you can see that there are blanks where the users haven't seen the corresponding movies. I will be using this as a running example. Now, a little bit of background. This rating matrix — let's call it R, the rating matrix — is usually very sparse; it's like 99% sparse. My thesis was on the Netflix problem: the Netflix data set had around half a million users and thousands of movies, and it was 99% sparse — there was only 1% of ratings. So you can imagine the scope of the problem. And what we do is factorize the rating matrix into a user matrix and a movie matrix. When you multiply U and M, the missing entries get filled in; you can then set a threshold, say rated above 4, and for the movies the user hasn't seen, you can go and recommend those. It's as simple as that. But there's a little bit of math in here. So this is the rating matrix: number of users by number of movies, and you want to factorize it into U and M, as I said. You can observe there's a K here: U is users-by-K, M is K-by-movies, and their product gives you users-by-movies — this is a form of dimensionality reduction. The number of movies, as I said, was around 20,000; I will show some graphs in which I got the best results for K around 50.
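The factorization being described can be written out compactly. With $u$ users, $m$ movies, and inner dimension $k$:

```latex
R \approx U M, \qquad
R \in \mathbb{R}^{u \times m},\quad
U \in \mathbb{R}^{u \times k},\quad
M \in \mathbb{R}^{k \times m},\quad
k \ll \min(u, m),
```

so each predicted rating is $\hat{r}_{ij} = \sum_{t=1}^{k} U_{it} M_{tj}$, and only the $1\%$ of observed entries of $R$ constrain the fit.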
Beyond that I didn't gain anything — there was just a lot of noise. There are two methods I've used: I've tried it out with SVD, and I've tried a method called Alternating Least Squares. Alternating Least Squares does the same factorization, and how do we do it? It's analogous to Ax = b, just that x and b are also matrices in this case. We alternately solve for U — initializing M randomly and solving with M as known — and then we take M as the unknown, and we keep doing this until convergence. For example, here's how one solve works: say user 1 has seen movies 1, 2, 3, 5, 6 and 10, and we have initialized M somehow; then the user's row of U is the unknown. So it's like ordinary least squares. You do it across all the users; now you know your U, and alternately you find M in a similar way. Let me show you, starting from the toy rating matrix. This is the sparse matrix as it looks — this is the matrix you saw in the example, and the zeros are the entries you want to predict. So let's do that. For the parameters, we give R, we say run for 30 iterations, and the inner dimension is 4 — it's a very small matrix, so we might as well use all 4. Now you have your predictions, and this turned out very well: we have predicted that if that user sees that movie he's going to rate it 4, so we're going to recommend it. All right, that was using ALS. Now, what happens if we use SVD? Sigma is a diagonal matrix, and if you observe here, this SVD has reconstructed the original matrix — which is of no use to us, right? It reconstructs the zeros as well. Why did this happen? I used a full-rank SVD here. Now, if we do the same thing reducing the rank — the smallest singular values are not zero, but they're almost close to zero, and at the dimensions we're talking about, around 20,000, for some rank like 50 you're going to get some pretty good results out of it. So no, SVD is not wrong.
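The ALS iteration just described — alternately solving least-squares problems over only the observed entries — can be sketched compactly. This is a from-scratch illustration (not the speaker's package code), with a tiny invented rating matrix where 0 marks "not rated":

```julia
# Alternating Least Squares on the observed entries of a rating matrix.

using LinearAlgebra, Random

Random.seed!(1)
R   = [5.0 4.0 0.0;
       4.0 0.0 1.0;
       0.0 1.0 5.0]          # 0 == unobserved
obs = R .> 0
u, m, k = size(R, 1), size(R, 2), 2

U = rand(u, k)
M = rand(k, m)

for _ in 1:50
    for i in 1:u                         # fix M, solve for user row i
        js = findall(obs[i, :])
        U[i, :] = M[:, js]' \ R[i, js]   # ordinary least squares
    end
    for j in 1:m                         # fix U, solve for movie column j
        is = findall(obs[:, j])
        M[:, j] = U[is, :] \ R[is, j]
    end
end

# observed entries should now be fit closely; U*M fills in the zeros
err = maximum(abs.((U * M)[obs] .- R[obs]))
@assert err < 0.1
```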
SVD, when you use it at full rank, is definitely wrong here. Because what SVD does is factorize your input matrix into an orthonormal decomposition of the same rank: all the singular vectors you get are orthogonal to each other. So just 50 of those vectors, for a matrix of around 20,000, are enough to span the space effectively represented by your R matrix. That's how SVD works — you go wrong when you use SVD in the full-rank sense; you should always reduce the rank. — How did you pick the rank? — I started at 10 and incrementally went up. What factor analysis gives you is this decomposition matrix, for example. This array of numbers makes no sense on its own, but let's pick one of the columns, and if I sort it, you can see a pattern. You might ask how I'm sure that the singular vectors can be treated as meaningful basis vectors — but when you sort, there is some kind of pattern here. How many of you have heard of or seen the movies on this side? Flirting with Disaster? The Bandwagon? Oh really? That's really funny — the majority say otherwise. Okay, how about Schindler's List, Titanic, The Shawshank Redemption, Dirty Dancing on the other hand? There's a clear pattern — at least I can see some pattern here. Well, it looks a bit cherry-picked, but yeah, there's a pattern. This is a plot that I had for SVD — as I mentioned, this was on the Netflix problem: after rank 50 you start overfitting. But in ALS, when you decompose, your inner vectors are not orthonormal, so you can push further without really overfitting — you get comparable results. So it's not that SVD is bad; we were just trying to use SVD in the wrong way. This game is all about dimensions, and the trick is to get the dimension right — and this is your right dimension. — Is it trial and error? — Pardon me?
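The rank-reduction step being argued for — keep only the top-k singular vectors and reconstruct — looks like this on a small invented matrix:

```julia
# Truncated SVD: the rank-k reconstruction from the top-k singular triplets.

using LinearAlgebra

A = [5.0 4.0 1.0;
     4.0 5.0 1.0;
     1.0 1.0 5.0]
F = svd(A)

k  = 2
Ak = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]

# By the Eckart–Young theorem, this is the best rank-k approximation;
# the spectral-norm error equals the first discarded singular value.
@assert opnorm(A - Ak) <= F.S[k + 1] + 1e-8
```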
That's the RMSE. This is what it was: it took around 10 seconds for 400,000 points, 5 minutes for 20 million points. It's parallelized — SVD is not easily parallelizable, but ALS is embarrassingly parallel; very easy. And these are the things we are currently working on, and you can always go to the package and contribute. But I have another very nice thing — thanks to Shashi for this. We've come up with a nice UI, an interface which shows you random movies that you can rate. Say I rate these randomly — I don't even know those movies — but the interesting thing here, as somebody was asking: if you observe, you can see the values change. Let's say I have given these ratings and I submit — hopefully this will work, it's a work in progress — and these are the recommendations based on the ratings I have given. So this is like the online-learning question somebody was asking: incremental learning is something you have to take care of yourself. We cannot generalize that in a package — you know your data, you know what model you are using, so you have to fit it into your code. — How much time did it take for you to code this? You already knew the math, so how long did it take? — Very interesting question: on my first day at the office I was able to do the entire thing. I think we should show the code — it's around 100 lines. It's unoptimized code, but it's fast enough, so we didn't even bother. That's the package where all the code is hosted. — What did you say you're currently working on? — These things — this is the roadmap. — Do you have principal component extraction somewhere in the package? — Yeah. I'm using it; you may have to wait until it's done.
On the 20-million data set, retraining the entire model is not something we do very regularly. Probably, once you accumulate a few new users, you retrain the model, so a weekly run just takes those 400 seconds. What's different in ALS compared to SVD; is it a non-negative factorization? Yes. Let's thank Shruti. And there are also other things, like SymPy. SymPy is a package for symbolic manipulation. Here we are differentiating sin(x^2) six times; I can increase that, run it again, and see the output. You can also define your own output methods for your own types in your own packages. There's this function called writemime: you just add a method to it and it just starts working. So I created this thing called MyType where I show each character in a different color; if I create an object of my type, it's going to call this writemime method and pick a color for each character. So how do you see data? I'm using this RDatasets package, which basically collects data sets used to teach working with tabular data, and I'm loading a data set from there. Its type is actually a DataFrame, from the DataFrames package Stefan showed in the morning; it's the same thing, but in IJulia it looks like this. (I need the mic for this; is that good enough for you? Okay.) So the other package for Julia is this thing called Gadfly. It implements something called a grammar of graphics; you might be familiar with this if you have used ggplot in R. It's a grammar for constructing visualizations from data. Over here I'm plotting from the iris data set: for the different species of the iris plant, I plot sepal length against sepal width and color the dots according to species. I can also color the dots according to, say, petal width, and it makes a continuous color scale for the plot. There are these features in Gadfly that you can use to make very complex plots. The other fun package is Interact.
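The custom-display mechanism mentioned above can be sketched in current Julia, where the old `writemime` function became a `Base.show` method taking a MIME type (`MyType` and the span markup are invented for illustration):

```julia
# A type that gets a rich HTML rendering in addition to plain text.
struct MyType
    text::String
end

# Plain-text display
Base.show(io::IO, x::MyType) = print(io, "MyType(\"", x.text, "\")")

# HTML display: wrap each character in its own <span>, which is where a
# per-character color style could go.
function Base.show(io::IO, ::MIME"text/html", x::MyType)
    for c in x.text
        print(io, "<span>", c, "</span>")
    end
end

html = sprint(show, MIME("text/html"), MyType("abc"))
println(html)   # <span>a</span><span>b</span><span>c</span>
```

Front ends like IJulia pick the richest MIME they can render, which is how one object can look different at the REPL and in a notebook.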
The Jupyter notebook is not complete without the @manipulate macro. Here you see a simple for loop where n goes from 1 to 100, and in the body I'm creating a random matrix. I prefix this for loop with the macro @manipulate, and what happens is that n just becomes a slider. I can increase or decrease it, and as you can see, the size of the array varies; each change to the slider gets the array updated. And it's not just a slider you can use with it: I can use it with, say, a text box, here just reusing the writemime method for my type from before, or you can use it with a bunch of selection widgets; if you just give it a list, it becomes a selection widget. Again, I can have interactive plots that work this way. One of the very useful things you can do is explore mathematical functions by varying their parameters. This is the Beta function, and I'm plotting the probability density of the Beta distribution. I had no idea what the Beta distribution's parameters did before somebody showed it to me in Interact: basically, alpha sets how far left the bump is, and beta sets how short the bump is. Yeah. Another thing you can do is vary the parameters of an equation. Okay, so that was supposed to render as LaTeX; sorry about that. That's the MathJax library, which I suppose is not loaded. And that's an animation: I'm just varying the phase of the sine wave according to the time. It involves this fps function, which generates updates, say, 30 times a second. It's working now, okay. So in that case, are you regenerating the entire plot, or are you able to manipulate just a section of it? It regenerates the entire plot, but there's some clever stuff going on; I'll come to that. It doesn't redraw the entire plot, however: it changes only whatever needs to change.
It generates the whole plot, then takes a diff with the previous one, figures out only what changed in the plot, and updates just those parts. Okay. So let's look at the other plotting packages. PyPlot is a binding for matplotlib, the Python plotting library; there's Vega, which I'm very excited about, which renders with vega.js, another plotting package from the University of Washington database group; and there's Winston, which you can also use to plot. Then there's this other package I'm working on. It's called Escher, and you're looking at it right now: these slides are made in Escher. It lets you build web UIs entirely in Julia. The way you build them is you take the simple parts that Escher provides and stack them up to make more complicated UIs. I'm going to show some examples of this. The most basic thing, which is not even Escher itself but the most basic data type under it, is called the Elem type. What it does is create a DOM node. If you're familiar with JavaScript, you know what the DOM is: the document object model, which is what your HTML pages become in the browser. So Julia can actually just generate the DOM and then draw it in the browser directly, without going through the intermediate phase of generating HTML and parsing it back into the DOM. I have this code editor here, made in Escher, so I can edit the style of this thing: as you can see, I've given it a padding of one and filled it with the color blue, and I can change these things, run it, and that exactly reflects in the DOM. Similarly, this extends to SVG, so you can start composing these into functions: I've created an emptycircle function, which abstracts the job of creating a circle, and then I have a bunch of functions here. Oh, the projected text is a bit too small for this. Okay. It also allows me to create custom HTML elements of my own choosing; you can create your own custom HTML elements these days.
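The regenerate, diff, and patch behaviour described a moment ago can be sketched as a toy (this is only the concept; it is not Patchwork.jl's actual API or data model):

```julia
# A miniature virtual-DOM diff: compare two trees and emit only the changes.
struct Node
    tag::String
    text::String
    children::Vector{Node}
end
Node(tag, text) = Node(tag, text, Node[])

function diff(old::Node, new::Node, path=Int[], patches=Any[])
    if old.tag != new.tag
        push!(patches, (path, :replace, new))
    elseif old.text != new.text
        push!(patches, (path, :settext, new.text))
    else
        for i in 1:min(length(old.children), length(new.children))
            diff(old.children[i], new.children[i], [path; i], patches)
        end
    end
    return patches
end

a = Node("div", "", [Node("h1", "Hello"), Node("p", "world")])
b = Node("div", "", [Node("h1", "Hello"), Node("p", "Julia")])
println(diff(a, b))   # only the <p> text change is reported
```

Sending only these patches to the browser, instead of the whole page, is what makes the live updates cheap.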
Okay. And here's the latest custom HTML element I created: in Escher it's just one node, but if you inspect it you'll see a million of these things underneath that you would otherwise have had to generate yourself. That's how Escher works, but there are also very high-level abstractions. Escher can convert any Julia value into a UI. For example, here's a simple string; that works, as you can see. And I can create markdown text; that also works. Let me add a heading: as you can see, it gets converted into an Escher UI. Even code is just another element in Escher; that's how this code editor thing is working. The code slide you're seeing is actually a function inside this UI, so I can use it to create one more code slide if I want to. And of course I can use Compose as well. Compose is a very powerful vector graphics package; you can do very complicated things with it. The picture I showed you in the Escher introduction was actually generated in Julia from a single fish; you can do things like that with Compose. If you google it, you'll find the paper where the construction of this artwork by M.C. Escher is explained. And this is the Sierpinski triangle. It's a recursively defined figure: for each triangle you divide it into three more sub-triangles and draw a level-(n-1) triangle in each, and if I start at 0 it's just one single triangle. This is the definition in Compose, which reads almost exactly like that description. You can also take Gadfly plots, and Escher can draw them. You have these things for layouts, so I just created a bunch of boxes. They haven't been given a layout, which is why they're being shown as code over here, but I can stack them vertically with vbox and align them to the center, or stack them horizontally.
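The recursion just described has a very small core; here is a sketch (without Compose, so nothing is actually drawn) that just counts the triangles each level produces:

```julia
# Each level replaces a triangle with three half-size copies,
# so level n contains 3^n triangles.
sierpinski(n) = n == 0 ? 1 : 3 * sierpinski(n - 1)

println([sierpinski(n) for n in 0:5])   # [1, 3, 9, 27, 81, 243]
```

In the Compose version, the recursive call returns a composed context of three sub-contexts instead of a count, but the shape of the definition is the same.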
And since I'm using a powerful programming language to create layouts, I can do other things as well: I can just shuffle the order, for example. That's how you make layouts, and you can have higher-order layouts, like tabs where the pages come below the tabs, and things like that. There's also a typography scale you can use right away: all the fonts you're seeing are right inside Escher, so you don't have to spend hours and hours getting your typography good. And finally, interactivity: how do you do interactive stuff in Escher? We use this thing called reactive programming. There's a package called Reactive.jl, which was originally written to implement Interact; it's actually a very general package that lets you deal with values that vary over time, so you can have UIs that vary over time. That's how we animate the Sierpinski triangle with this slider; I'm just drawing it again on each change. Okay, I'm almost done. Here's the animation controlled by the switch; that's all the code that was required to write it. And there are these things called behaviors. These widgets are nothing special; you can make widgets out of anything in Escher. So here's a button; you just attach the behavior to it, and you can create a counter with it. Here's a counter: I have these two buttons, and they each emit the constants plus one and minus one; those go into this input signal called delta, and then you fold over those inputs and show the count in the UI. That's how the counter works. I wanted to show more demos; let me see. Yeah, this is an animation one of my colleagues and I made; it's showing the behavior of birds flocking. I guess I'll stop here and take questions. Can you export these, for example Gadfly plots? Gadfly plots, thanks to all these people, you can write to PNG files, SVG files, PDF, PGF; the interaction of course won't be available for you without a browser. What are
you using behind the scenes? SVG? It's SVG and HTML. So, this next package does 3D interactive graphics in the browser. The animation you see over there is basically the boids demo, but in 3D; it's a gif of that, and I'll show you an example later. ThreeJS.jl is basically a WebGL rendering package for Julia. WebGL is the technology that lets you use the GPU right in the browser, so you get all the power that comes with the GPU, just in the browser. This lets you share 3D content over the web: you can use it in Julia notebooks, in JuliaBox, or over a normal server; it's pretty nice to do. And it's a wrapper around the three.js JavaScript library. three.js is a very popular library that makes WebGL easier to work with: it eliminates a lot of the boilerplate that raw WebGL has, and you can do a lot of stuff much more easily. For example, the cube demo is around 100 lines in plain JavaScript and WebGL, around 30 lines with three.js, and around 10 in Julia. I'll show you how that works; let's actually take a look at the cube. This is the cube, a very basic scene: I have a cube in the center, a light in one corner, and the cube is colored red. That's all it is; it builds the scene right now. In 3D graphics, the first step is to build a scene. You have a scene object which does the initial setup: it renders a DOM element for you on the screen and sets up a WebGL renderer, and you add all the other things that you want displayed to the scene. So this is the function that you use to create a scene, and then you make a list and put whatever you want to see inside the list. So when you just create an empty scene, you can go back and look at the HTML and JavaScript and you can
see that a canvas element has been created. Yeah, there's a canvas element, and if you look at the logs you can see there's actually a WebGL renderer being set up there. So the next step would be to add a camera, because unlike in 2D scenes we actually have to say where to look at the 3D scene from; cameras are very nice to have, you can rotate around the scene, and that extra dimension is what you need a camera to set. In ThreeJS.jl, adding a camera is just a function: you say camera, give whatever coordinates you want the camera placed at, and add it to the scene list. So once this runs again, you still have nothing to see; you get nothing right now. The next step would be to add the cube. In 3D scenes, the way you add stuff is to create a mesh, and in a mesh you actually have two things: a geometry and a material. The geometry is basically telling you the shape you want to draw, like a cube or a sphere, and the material tells you the properties of the geometry; earlier I said the cube would be red, right? So over here I am creating this box geometry, which is a cube geometry, and I am giving it 1, 1, 1 for width, height and depth, and for the material I am saying I want it to be red and I want the kind to be a Lambert material. Then I just add this mesh to the scene. So now I actually get something drawn on the screen, but it is black, and we wanted the cube to be red, right? That's because we haven't added any lights to the scene, and this kind of material depends on lights to show color. So the next step is to add lights. How do you add lights? You can add three kinds of lights in three.js, and this is one of those kinds: I am adding a point light and giving it the coordinates 3, 3, 3, which puts it at a corner above the cube. So I just put it all together, and you see I have an initscene function, and the
inside has a mesh, which holds a box geometry and a material, plus a point light and a camera, and then this renders like this, so I get a nice visible scene. It's around 10 lines of code, and that's all it takes to draw a scene in ThreeJS.jl. And the scene as such is interactive: you can rotate it (you see one side is black because there is no light falling there), I can pan it, I can zoom, in any scene. But that's not all: as I showed earlier, you can use UI elements to interact with the scene. How do you do this? You make a slider. In the previous scene we had a static cube; what we can do is make a slider and have the size of the cube update with it. I just add a couple of lines. The function over here is basically the same as earlier, except that I have the size replaced by the variable I want, and I have a slider here which I made. This is the size of the cube, so I can just increase the slider and the cube grows; again, rotate and zoom and all that stuff are still there. The next step would be to make animations, which are logically just like the sliders, except that instead of the updates coming from UI elements, you drive them with a timer. This is the rotating cube I made using this. In the actual scene I have to rotate the cube, and I am doing that using Reactive again: I have an event loop here which says, several times a second, increase the rotation by 0.5 and render it again. This is around 20 lines of code, and I rotate the cube like that. So how does it all work? Let's go over the stack for a bit. The first step is Polymer. Polymer is a library from Google, and it lets you create custom HTML elements, so you can wrap a lot of
functionality inside them. For example, in the demo I just showed, I have a custom element, and that is what gets rendered. So what I do is combine three.js and Polymer to make three.js custom elements: these custom elements wrap three.js functionality into a component, so I can say something like a three-js-box element, and all the three.js calls happen inside it; I don't have to bother about them, I just need the HTML import link. So that's the web side covered; to drive it from Julia there is a library called Patchwork. Patchwork basically gives me a virtual DOM in Julia: I create HTML elements in Julia, it does the diff on the server side, and it updates only the required elements in the browser. So I combine Patchwork and the three.js custom elements. Now what happens is that Patchwork creates the custom elements from Julia, and in an IJulia notebook it outputs these HTML elements, and Polymer and three.js take over from there. Again, the good thing about this is that it lets me use the goodness of Patchwork: I get the diffing capability. For example, when the size of the cube updates, only that part of the tree is sent across to the browser, and the browser makes the change there; the whole thing is not redrawn, only the cube is updated. This improves performance a lot. So yeah, this is the tech stack: we have ThreeJS.jl at the top creating those elements using Patchwork; Patchwork outputs these three.js custom elements, which go through Polymer; and then three.js, the graphics library, draws the stuff. So when I say something like ThreeJS.box(1, 1, 1), it actually turns into a custom element with its width, height and depth attributes, which renders as HTML elements over there, and Patchwork does the patching. Now I have a couple of demos. This is Compose3D, which was basically my main project in Julia Summer of Code. Compose3D essentially extends the Compose library to 3D, so you can create
complex figures from simple primitives, and it lets you take a context and use measures relative to it; pretty powerful stuff. This is not completely done yet, so I will be finishing it up soon. And now I have a couple of demos. The first demo is a surface plot using ThreeJS.jl: this is a plot of a function like sin(x) times cos(y). This is what I have done: I have a function, I wrap it, and I get a surface plot here. This is the surface for the function sin(x^2) + sin(y); I can just change this, say to y^2, hit enter, and it will draw the updated one. I can similarly do mesh plots; this is a mesh plot implementation, and you can see I can rotate them. The good thing about this is that you can do all of this in IJulia notebooks, and it's all Julia, so it's quite nice. Another example I have: in 3D modeling you have these .obj files, which are basically model objects, and now I have the capability of loading mesh objects into the scene; we made a pull request for this, so you can do stuff like this. This is a cat: I just say load cat.obj and it draws itself. And this is Interact; you saw the talk about the 2D stuff, and this is it in 3D. I have a behavior here, so I can update the mesh with it, and this keeps going. And this is the last example I had, which I promised in the beginning: this is basically boids in 3D, a flocking simulation. You can rotate this, you can zoom; this is the flock going around. The original example was done by Ian Dutting in Escher in 2D, and I was excited to do it in 3D in the first week. Thanks to Shashi and Saigandh
for mentoring me. Thank you. The cat that you loaded from the file, which library is it drawn in? It is drawn with the same stack, but the loading of the file comes from the mesh libraries: there is a file-loading library being made that handles the .obj files. The nice thing is there is an organization for this; a couple of people have been working for months on these geometry packages, focused on making everything work together, so they have all these packages which are finely aligned, with a homogeneous mesh type, and we just convert that to a three.js entity and can draw it with this stack. Not a question, really, but this summer was the first Julia Summer of Code, which was sponsored by the Moore Foundation (we previously did a Google Summer of Code), and I think it is pretty cool that both of your mentors were students from last year and they paid it forward that way; I think it works out really well. This is great, though. That's it for the evening; we have just a couple of announcements. One, whoever is attending the workshop tomorrow, please retain your badges; fresh badges will not be issued. And the second one: there is pizza along with the tea, and there is actually a lot of pizza, so you guys have to finish it in this tea break. Also, the feedback form, please. Now, are we going to take a picture? Yeah, let's do the picture; let's gather everyone. Okay, I'll take a photo, and I'll take a video as well. Okay. So that's, it's still a work in progress; like I showed you
what it is, but I tried the same thing in another environment and that wasn't working very well, because the JavaScript integration there is a problem, so I will look into that and hopefully get it up. Could it become like Escher, where you have that code thing and you are updating it live? That is what I want to do soon; I think we will be there soon. Basically, the direction going forward would be to actually build a plotting platform around this; that's what everybody wants. How tightly is this integrated with the existing UI libraries; can we use this with the other interactive libraries available? Everything is done using Escher, like I showed you, so I am using Escher's UI. Escher lets you build UI frameworks using just HTML, so all of this I can actually show. For example, if you have an already-existing UI, your own application, say using jQuery, JavaScript and all that: this is just HTML, so you should be able to just have an iframe in there. And if you want the web component itself, that's also available on GitHub, so you can take the component directly if you want and build another app around it. The web component, basically, when I wrote it, I modified it for my needs mostly, so a lot of it has been implemented, but not everything that's available in three.js is still there; whatever is available you can still use, so basically you can write three.js-style HTML and it will just work. Okay. Cool. Can you pass the mic back? Yeah, sure. Okay, I'll leave it there. Thank you.
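The scene-building recipe walked through in the talk above (scene, then camera, then mesh with geometry and material, then light) can be summarized in a schematic sketch; the types below are invented for illustration and are not ThreeJS.jl's actual API:

```julia
# A scene is just a list of things to display: a mesh (geometry + material),
# a light, and a camera.
struct BoxGeometry; w::Float64; h::Float64; d::Float64; end
struct Material; kind::Symbol; color::String; end
struct Mesh; geometry::BoxGeometry; material::Material; end
struct PointLight; x::Float64; y::Float64; z::Float64; end
struct Camera; x::Float64; y::Float64; z::Float64; end

scene = Any[
    Mesh(BoxGeometry(1, 1, 1), Material(:lambert, "red")),  # the red cube
    PointLight(3, 3, 3),                                    # light above a corner
    Camera(0, 0, 10),                                       # where we look from
]
println(length(scene), " objects in the scene")
```

The renderer then walks this list; without the PointLight entry, the Lambert material would show up black, exactly as in the demo.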
(Mic check and setup chatter: the wire was short, so it got pulled out. What's the time? No, 4:35, so we have time; five more minutes. Thank you.) All right guys, for the next session we have a slight change. The original talk was "Analyzing stress on aircraft wings with Julia"; it was actually meant to be an IoT talk, and there were a couple of problems that were going to be discussed. The first problem was analyzing stress on aircraft wings, and the second one was the problem that Ajnish was going to talk about. It was supposed to be two things, but with a slight change in plans, he will be talking about the use of Julia for power systems. That's Ajnish, and after that we have a couple of talks on databases, and we'll end the day. Thank you. Hello, I'm Ajnish. I'm working with Schneider Electric; it's one of the core power companies, and we manufacture a lot of power systems and related products, and provide interfaces for the easy use of this data, utilizing and harnessing power-system aspects for industrial and domestic purposes. I would like to share the screen right now and show you, in a slightly informal way, something live; I'll get the complete screen out there.
Okay, this is a live characteristic that we are streaming from the device online, live from the system. It's nothing but a small device here; I hope you can see it. I cannot really pull it out because it's on a short cable. It's connected to a kind of self-loading mechanism, so it can measure the power consumed by the entire system. We use this kind of equipment, the meters or relays or IEDs, in power systems. Let me speak briefly about the power system, where we have a lot of communication protocols involved, and why we have them in the first place. Going back to the power system: we have something called an automated system, where almost everything in the system is monitored under one screen, called SCADA. So let's go through the first slide, and I'll just take control; yeah, thank you. I'm sorry, we're not quite ready with these intermediate steps. The first slide is a typical, simple schematic of a power system: we have a generation unit, then transmission and distribution, transforming again to the different voltage levels of power required for domestic and industrial purposes. This by itself would be very monotonous and hectic to monitor in a huge grid or a huge network. So what do we do? We have a centralized station, which is the schematic shown here as a SCADA system. You can see a couple of routers, communication hubs, servers, and the IEDs symbolically represented here. What are these devices doing? They have transducers or sensors connected to the system at the voltage level and the current level, and they convert the language of the power system into digital signals. In return, we have the metered values or the information available in digital form; that is, you have numerical relays. The numerical relays by themselves will not suffice for our requirement here. So what do we do?
We further boil down the mode of communication. A person cannot go to hundreds of relays or thousands of meters to read off the values. So of late, all the electricity boards have come up with a smart way of catching the data: meters on GPRS, which transmit the amount of energy consumed at the domestic level to the servers that fetch the values. That is one example of a SCADA system, and it is still evolving right now. So let's go on to how it is monitored and utilized. This is exactly how SCADA is implemented application-wise in a typical system: each vendor makes its own proprietary tools for the devices it manufactures, with all the communication protocol interfaces. We store the data in a Microsoft SQL Server, or whatever is compatible, because the industry is heavily Microsoft-dominated, so they prefer those solutions. Then we compute using MATLAB, or the equivalent in R or Python, and classify the data according to what is required, what is critical, what has to be eliminated, and what goes into billing; everything ends up in the billing part at an electricity board, and everyone knows that whatever power they utilize has to be billed accordingly. This role is taken by MATLAB, or R (not Python so much), in our industry today. And then, on top, we have a proprietary application interface that shows the schematic of what we are doing, and it is live in its presentation the whole time it is running: if you draw more power, the entire line turns red; optimum power turns it yellow; and low or under-utilized power, which is the baseline consumption, shows blue.
We present this kind of schematic so that the consumer's statistical data is projected for the supplier: the supplier knows at what rate he consumes, at what time of the day, how the industry is running, whether the industrial or the domestic side carries the peak load, and what happens during the peak hours, when they have to generate more power for it. So we tried to work around this. There is a disadvantage, as shown here: we have different tools at different levels, we need a license for each of them to use, and it is a complex architecture to mesh between two different types of application and exchange the data. Finally, analytics and fetching become a huge process and task; it is not a day-to-day activity. And last but not least is the third-party device problem. Suppose we have a company A that manufactures the IEDs and meters, and a company B that provides the interface; they don't really fall on the same platform for fetching all the data. That gap is a real challenge today: people have tried to fix it so that, irrespective of the make of the product, it appears on the screen and gives all the live values. So now, the first practical workout was placing Julia at the bay level. Here Julia is actually fetching the real-time values over one of the communication protocols the device is equipped with, and it can even further compute on the raw information it is getting and produce the statistical output for the user for the next stage of utilization. So how have we done this? This is the kind of device I have put up here. It works on the Modbus protocol, one of the industry-standard protocols for automation and power systems. We have a Modbus library for Julia working directly on the physical layer to connect and fetch all the information.
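A small sketch of the kind of decoding such a client has to do (the helper below is illustrative, not the library's API): Modbus holding registers are 16-bit, so a 32-bit float measurement is typically split across two consecutive registers, with the word order varying by device. This assumes the high word comes first:

```julia
# Reassemble a Float32 measurement from two 16-bit Modbus registers.
function registers_to_float32(hi::UInt16, lo::UInt16)
    reinterpret(Float32, (UInt32(hi) << 16) | UInt32(lo))
end

# Round-trip check: split the bit pattern of 230.0f0 into two registers.
bits = reinterpret(UInt32, 230.0f0)
hi, lo = UInt16(bits >> 16), UInt16(bits & 0xffff)
println(registers_to_float32(hi, lo))   # 230.0
```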
This is a PCB, and we have a client set up on it; this in turn returns all the values to Julia, and here we are using the same analytical representation of how we are catching the data. We can process some real-time data with the different characteristics the device supports, and some values are actually computed to plot on a graph in real time. Now let me give a small explanation of this. We have two quantities, prediction and present; they are characteristics of the energy value we are reading. Prediction is a forecast value for a given interval, and present is the real-time value being read. So what happens between prediction and present? Prediction says that, on an estimate, over 10 minutes or 5 minutes this will be the consumption, and present is the chased, read value, because it is the real-time consumption. At any given time, we want to make sure the present is not going to exceed the prediction value, because if it exceeds, then as a consumer you are taking more than what is being supplied to you. You cannot all of a sudden switch on 10 geysers and 20 boilers at home with 15 TVs in the house; that is just a random example. The probability that so many users will consume this much load is the basis on which electricity generation planning takes place. So, at the system level, this graph gives information to the generation unit: should the prediction be this high, or is the current consumption so high that the prediction must change? They evaluate or forecast it over the longer run, in a given interval, and make sure they have sufficient energy generated. I'll come to the details in the slide after this, but we can position Julia here, where we are actually fetching the data from the runtime system.
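The invariant described above, present staying under prediction for the interval, is simple to state in code; a toy sketch with made-up numbers:

```julia
# The real-time "present" readings should stay under the forecast cap.
prediction = 5.0                      # forecast consumption for this interval
present    = [3.1, 3.4, 4.2, 4.9]     # per-second readings from the device

over = any(p > prediction for p in present)
println(over ? "consumption exceeded prediction" : "within the predicted cap")
```

In the live demo this check effectively runs every second, since the device is sampled at one-second intervals.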
Here we have a database where we can use the same language, with different algorithms and different libraries from the package repositories, making it easy for the system to fetch its own values, compute on them further, and produce the statistical output for the next stage of delivery — analytics, monitoring and control. It also supports the physical layer in the other direction: we can write commands back to the device and control it. This was tried out, but we had an internet issue with the interface, and we took this up as an easy case within a short span, so we could only do one kind of demo. This is sufficient, though. We have the interval set to one minute. Within that minute you can see the present value ramping up every second, because the device samples once per second; we are dynamically fetching the value from the device every second and writing it here. Prediction is the peak value — you can see a cap here which is always higher; the estimate is kept slightly above the actual value, so this stays well within the cut-off. Suppose I want to change the characteristics: I put a load on the device, and you can see the curve change dynamically. It has its own response time, and depending on that response time the curve takes a different shape, because the load on the device has changed. When we extrapolate this to a wider system, we have a single language that helps us fetch the value, compute the value, process it, and also enable the user to control the device, with no proprietary or third-party limitations entering the picture. That is actually a benchmark for us to use this on a larger scale. Of course there are a lot of challenges involved, like how we handle the UI part and make the presentation as dynamic as the proprietary tools, which are very polished — so we have a long way to go on this. Any questions?
Q: This Modbus protocol — is it based on the RS-485 physical layer? A: We have RS-485, and TCP also. Q: But does it support the RS-232 protocol? A: No, the Modbus standard here supports only 485. Q: I am just asking whether Julia supports RS-232. A: Julia supports serial communication, so RS-232, yes — but 232 cannot be taken directly from this device, so we use a converter to go from 485 to 232; it is an external hardware piece. Q: No, leave RS-485 aside. Say I have a controller — a PIC or AVR or whatever — and on my system side there is Julia. Is there a library so that I can talk to it? A: Yes, it is available; right now you can directly talk to a device on RS-232 if you have a proper command set defined to fetch the values. Q: What is the name of that library? A: Right now we have implemented this for TCP; there is a subsequent repository for serial as well. Host: I am going to request that you take this discussion offline. Okay, let us move on — thank you. In the next session we are going to have a couple of talks on databases. These are going to be very quick, lightning talks actually. The first one will be Prajip, who is going to talk about the various databases Julia supports, and the second talk will be from Nishant about his experience implementing MySQL support in Julia. If we have time we will have an extra talk on testing with Julia, but it looks like we may run out of time and just do that as part of the workshop tomorrow. Prajip, are you ready?
We have only a few minutes, so this will be a quick introduction to the databases we support with Julia. So basically, here is a list of supported databases. If you look at it, Julia supports the majority of databases: you can see relational databases, NoSQL, and in-memory databases — some of the top ones are here. The key point is that it supports a wide variety of databases. Then there are the DB drivers: you can see the MySQL driver, which works with MySQL, and the ODBC and JDBC drivers, which can work with the majority of databases, as you will see shortly. Through the JDBC driver you can also reach things like Apache Phoenix. Then there is a feature-wise comparison — that is a separate topic in itself — of the various features each of these drivers provides: for example, prepared statements in most of the drivers, and support for multi-queries in the MySQL driver. One of the ideas here is to have a kind of common interface, so that we have the same kind of interface for all the databases. Then there is the list of supported backends: as I said before, the ODBC and JDBC drivers can more or less connect to any kind of backend. We also did some benchmarking. It is not a big one — we actually want to go to billions of records; that is still a problem, and unfortunately we could not get that done in time, but it is something we are doing, so that you at least get an idea of how the drivers operate on billions of records. If you look at it, most of the drivers here are quite strong in terms of performance; they are doing very well. So here is
the same thing I said before: if you want to see how you actually connect to a database, it is very simple — in most cases just two to four lines of code. For MySQL you connect with your basic connection parameters and then call execute. MySQL also supports prepared statements, in which case it is probably one more line of code — about four lines in all. It is as simple as that. And notice here — one of the things I said earlier about a common interface layer — there is a difference between the interfaces shown here. That is one thing we are working on: a common interface across the databases, which means you would be able to switch between databases very easily without changing any of your code. Q: Are both of them returning data frames on the right-hand side? A: Yes — in fact, all of them return data frames. In terms of roadmap, as I said, we want a common interface layer across all the databases, and support for logs. We do support MongoDB, as you saw in the earlier list, and there too we would want some kind of common interface; we will probably also add more databases to the list. Another thing, which I did not mention here: we will probably do this benchmarking for all the databases and publish the results, so that before you use them you have some idea of what you are getting. And we need more and more contributors, so all of you are welcome. That was a short introduction to the databases — any questions? Q: Is the common interface a design like R's DBI library, which is a sort of common interface for accessing any kind of relational database? A: In fact there is a DBI package even in Julia.
But unfortunately not all the drivers actually conform to it yet. We want to get there as soon as we can — the package exists, but not all the drivers build on it, and that is where we want to get to. Host: Okay, thank you very much. Q: Is the design able to handle streams — a get-next-tuple loop? What I am saying is that the data frame is one usage model, which works for people like me, but there are lots of applications, especially streaming ones, where you don't want to materialize everything; you want a streaming model where you just fetch the next tuple. [Next speaker] I am going to talk about MySQL.jl. MySQL.jl is a Julia package which lets you execute SQL commands from Julia. The objective here is to cover three things. First, MySQL.jl usage — it is a really simple API, so there is not much to say about it, which is why I thought of adding a bit about its implementation and some things that are perhaps not talked about much in Julia. The second thing is the Julia–C interface, and the last thing I will close with is improving performance. You might wonder why you would bother with the Julia–C interface as, say, a data scientist or someone working at a high level. There is a lot of code already written in C over the decades, and the Julia–C interface lets you call C, get data back from C, and — as I will show — get it as a Julia data structure, like an array. That is a useful thing to know, because you can write packages in a few lines of code that do a lot, just by calling into C. This is the basic API of MySQL.jl; there are some more functions which I am not going to cover, in the interest of time. You include the package with using MySQL, and you connect with mysql_connect, passing the host, user name and password; it returns a connection handle which you use in the other MySQL functions. Then there is a handy wrapper called mysql_execute_query; it pretty much does everything you need.
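Sketched in code, the flow just described might look like the following. The function names are as given in the talk (mysql_connect, mysql_execute_query, mysql_disconnect), but exact signatures differ between MySQL.jl versions, and the host, credentials, and table are placeholders.

```julia
# Sketch of the MySQL.jl flow described above; names follow the talk and
# may differ in current versions of the package. Needs a running server.
using MySQL

con = mysql_connect("127.0.0.1", "user", "password", "mydb")  # placeholders
df  = mysql_execute_query(con, "SELECT * FROM mytable")       # -> DataFrame
mysql_disconnect(con)
```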
If you pass it a SELECT statement, it returns a data frame; if you pass it an UPDATE statement, it tells you how many rows were affected. Suppose you don't want the output as a data frame: you can say so in the third argument, MYSQL_ARRAY, and you get an array of arrays. You can also go through each row of the table with an iterator in a for loop: you use the MySQL row iterator, and here I am iterating through and printing each row. In the MySQL APIs I have written, I have not put an error check inside every call, because that would be an overhead if you are calling it, say, 10,000 times; but if you want to check for errors, you call mysql_display_error and pass it the response returned by any of the MySQL APIs. The error you see here is from MySQL, not from Julia — the rest is a stack trace from Julia, but the message itself is from MySQL, saying that some table doesn't exist. And you disconnect with mysql_disconnect. That is the basic API. Now I am going to talk about the Julia–C interface. I think it is really simple and easy to use, and we will see why. Suppose you have a C function — mysql_query in C — and I had to write a wrapper for it in Julia. A function has three parts: it has a name, a return type, and some arguments. So how do I call this from Julia? It is really simple. The C declaration is there, and the Julia code is here; if you look back and forth, the return type is here, the arguments are here, and the function name is here. But you also have to say which library that function comes from, so there is a string here — actually a string assigned to a variable — which is basically libmysql (like libmodbus in the Modbus case). Then you pass the arguments, and that becomes the wrapper. The simplest example would have been to just call printf or something, but this is from the MySQL code itself. So that is how you call C functions.
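The three-part mapping just described can be seen in a tiny, self-contained example. Since the MySQL wrapper itself needs libmysql installed, this sketch wraps the C standard library function strlen instead; the structure is the same as the mysql_query wrapper from the talk.

```julia
# ccall takes the function name (and library, which can be omitted for
# C standard library symbols), then the return type, the tuple of
# argument types, and finally the arguments themselves.
#
# C declaration:  size_t strlen(const char *s);
c_strlen(s::AbstractString) = ccall(:strlen, Csize_t, (Cstring,), s)

# For a function in another library you would write, for example:
#   ccall((:mysql_query, "libmysql"), Cint, (Ptr{Cvoid}, Cstring), con, sql)
println(c_strlen("hello"))   # 5
```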
Now, how do you deal with C data types? You have C data and you want Julia to understand it, or the other way around — you have Julia data and you want C to understand it. Say you have something like this: a simple pointer to a pointer in C. It says that this name, MYSQL_ROW, should represent char ** — that is how each row in MySQL is actually represented. In Julia that is simple: just replace the stars with pointer types, and MYSQL_ROW becomes a pointer to a pointer to Cchar. It is as simple as that. If you have a struct, what do you do? If you keep looking back and forth you will see the pattern: a star becomes a pointer, a pointer to the struct itself is declared the same way inside the Julia type, and unsigned long becomes Culong. In the same way, float becomes Cfloat, double becomes Cdouble, and int becomes Cint. Now I have a bunch of question-and-answer items, so we will go through them quickly. A C function gives me a pointer — how do I get a Julia array from it? An array in C is a pointer. I have a C pointer to 64-bit integers here, and all I have to do is call pointer_to_array and give it the length. C just gives you a pointer, and usually the C APIs also give you a length, unless you are getting a string, which is null-terminated; if you are getting a bunch of numbers, you need the length separately. So: take this pointer, give me back a Julia array — and there it is, just the array 1, 2, 3. Next question: how do I do the reverse — how do I get a C pointer from a Julia array? This is really simple: all you have to do is call pointer on the array. That is it.
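Both directions of the array question above fit in a few lines. Note that pointer_to_array was the name in the Julia of this talk; in current Julia the same operation is unsafe_wrap, used here.

```julia
# char ** from C becomes a nested pointer type in Julia.
const MYSQL_ROW = Ptr{Ptr{Cchar}}

a = Int64[1, 2, 3]

# Julia array -> C pointer: just call pointer (keep `a` alive while using it).
p = pointer(a)

# C pointer -> Julia array: wrap the pointer, supplying the length yourself,
# since C only implies a length for NUL-terminated strings.
b = unsafe_wrap(Array, p, 3)
println(b)    # [1, 2, 3]
```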
Here is an example from the MySQL code: there is a ccall that requires a pointer to the MYSQL_BIND structure — it is going to bind the results there, put the results there; that is how prepared statements work. All I do is call pointer and pass it in, and it works. Another question: C gave me a pointer of type void — how do I get a floating-point value from it? C uses void pointers for genericness; a void pointer can point to anything. All you have to say is reinterpret — that is a very well-named function — reinterpret the pointer as a float pointer, and now I have a float pointer. How do I get a float value from it? I say unsafe_load. unsafe_load is like writing *variable in C; anything from C is "unsafe" — that is not a limitation, it just means Julia won't know about it. Here is something I found interesting, from the Modbus work — the device from the earlier talk — and its floating-point handling. The API there gives you a floating-point number, but it hands it to you as raw bits. How do you get that into a floating-point value? If I just call float on it, and the bits happened to equal twelve, it would give me 12.0 — that is not what I want; I want the bits decoded into a floating-point number. In Julia it is really simple: all I say is reinterpret, and I get 3.14 — those bits are 3.14 in the IEEE format, in binary, as you can check with bits. The last one: you know how C frees memory, right? C doesn't free memory — it doesn't have a GC, a garbage collector; you have to free things yourself. So if a ccall you made allocated some memory, Julia doesn't know about it. How will Julia manage that memory? It won't, actually — but you can still get it garbage collected, so you don't have to free it by hand.
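The two reinterpret tricks just described are runnable as-is in Julia:

```julia
# 1. A void pointer from C, reinterpreted as a typed pointer and then
#    dereferenced with unsafe_load (the * of C).
x  = Float32[2.5f0]
vp = convert(Ptr{Cvoid}, pointer(x))   # what a C API might hand you
fp = reinterpret(Ptr{Float32}, vp)     # now a float pointer
println(unsafe_load(fp))               # 2.5

# 2. Raw IEEE-754 bits decoded into a float. A plain conversion would
#    take the integer's value; reinterpret takes its bit pattern.
bits = 0x4048f5c3                      # the bits of Float32(3.14)
println(reinterpret(Float32, bits))    # 3.14
```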
The way you do it is really simple, although it looks a little big: you call finalizer and pass it the variable to which you assigned whatever was allocated, and you say: when Julia garbage collects this, also make this ccall, which frees the result. I can't just write the ccall there directly, because then it would be called immediately, and I don't want to call it now — this arrow thing is the syntax for an anonymous function. I want it called when the GC kicks in: the GC sees the variable is out of scope, nobody is using it, and cleans it up — and while it is cleaning up, it also makes this call, so whatever memory C allocated gets cleared up too. Okay, that's all on that; I am going to quickly go over performance. Why performance, first of all? What some software developers usually do is make a prototype in Ruby or Python, decide it is too slow, and then write the production version in C++ because it is faster and the typing is more static. How much time do I have — three minutes? Okay, I'll just finish this. When you are dealing with performance, the first question to ask is: what should I improve? Which parts of my code are taking the most time? How do you find out? There are tools for this: there is the macro @time, there is ProfileView, and there is Lint. I am going to show some screenshots. This mysql_execute_query, when you pass it a small table like this, actually does some work converting the result into a data frame — that is the part that takes time, and that is what we are trying to optimize. So we give it a 10,000-row table, because performance matters most at scale; at scale we found it took 5 seconds, and then we looked into where we could improve — I am going to show you how. Basically, @time tells you how much memory was allocated, and over here it tells you how much GC happened — in this run, really no GC is happening.
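The @time macro mentioned above can be tried on anything; its relatives @timed and @allocated, which return the numbers instead of printing them, come up again in a later talk. Field access on @timed's result is as in recent Julia; older versions returned a plain tuple.

```julia
x = rand(1000)

@time sum(x)                # prints elapsed time, allocations, and GC share

t = @timed sum(x)           # returns the measurements instead of printing
println(t.value)            # the result of the expression itself
println(t.gctime)           # seconds spent in garbage collection

println(@allocated sum(x))  # bytes allocated by the expression alone
```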
We removed all of those allocations; if you see 20% GC time, you are doing really badly. This is another tool, called ProfileView. Each color represents a function — this function calls this one, and so on, like a stack — and the horizontal length represents how much time each took. If you hover your mouse over it — it is actually a big image showing all the functions in your code, but I'll just show the screenshot with the mouse pointer — it tells you which function each block represents. Then there is a tool that just shows you how sloppy a programmer you are: it tells you, for instance — if you see here — that I am assigning a 64-bit Int value to an unsigned int, which could potentially be a big bug; it finds problems like that. And, typical of what GCC would tell you, it will say you have a variable iter that you never used. I'll spend just one minute on this. A simple tip: put const in front of your global variables and you will make things faster. Another one: Julia has concrete types and abstract types. Abstract types are like polymorphism — decided at runtime — and that takes more time, so if you know the type beforehand, you can tell Julia. I could write Any here, an array of Any, but instead I give it a concrete type like this. Plus, you can preallocate memory for the array — say how long you want the array to be — instead of pushing to it as if it were a queue. And another thing — what is this here? I am assigning a zero of this type, because if I just assign a plain Int zero, and somewhere else assign a UInt, the type changes; the point is to avoid changing the types of variables in Julia. If you want to avoid changing types, you write things like this. You can just google "performance tips Julia" and it will take you to the docs page, where you can read more about this.
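The tips above, side by side, in a small sketch:

```julia
# const global: the compiler can rely on its type never changing.
const SCALE = 10_000

function squares(n)
    a = Vector{Int}(undef, n)  # concrete element type, preallocated length,
    for i in 1:n               # instead of push!-ing into a Vector{Any}
        a[i] = i * i
    end
    a
end

total = zero(Int64)            # zero(T): keep the accumulator's type fixed
for v in squares(4)
    global total += v
end
println(total)                 # 1 + 4 + 9 + 16 = 30
```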
In conclusion — although this was a MySQL talk — in Julia you can do high-level programming, data science, and all of that; everyone knows about that. But you can also interact easily with C, and you can improve performance without having to re-implement your code in a lower-level language. That's all. We'll take one question while the next speaker sets up. Q: You talked about specifying types. For functions that use multiple dispatch, is there a performance impact depending on whether you specify the types? A: If I don't declare the types of the arguments, then when it does the multiple dispatch it figures out the types from what is actually passed, and that happens at compile time — so any difference happens at compile time; it is not slower. Host: Okay, great. All right, thank you. [Next speaker] Since morning we have been discussing how to develop code in Julia; we had a talk on connecting Julia to a database, and we even had a talk on optimization. So now we will focus on testing — on how to make your code reliable. Julia gives you a number of packages for testing, and we will discuss them today. The first topic is code coverage. Julia has built-in support for code coverage. Let me first explain what coverage actually is: say you have a test file, and you don't know how much of your code is actually being exercised by your test cases. That is where code coverage comes in. To use it, you start Julia with the coverage flag — this is the command to start Julia — and once you are inside, you just run your tests and then exit. Now start Julia again in the regular mode, and use a package called Coverage.
The first thing is to change your directory to where your project is located. Then there is a function called process_folder. What it does is this: when you run your tests in coverage mode, Julia creates a .cov file for every Julia file that ran, and process_folder reads these files and stores the whole summary in a variable — we have named the variable coverage in this case. From that summary we extract two numbers: the number of lines covered, and the total number of lines. Given those, it is just a matter of taking the percentage to get the overall coverage of your project. Next, there may be a situation where you want to check the memory allocation of your program. Once again, Julia gives you built-in support for this. You start Julia in memory-allocation mode — julia --track-allocation — that is how you start it. The next step is to run your program twice; not just once, run your test scripts twice. Once you are done, exit the memory-allocation mode and go back into Julia in regular mode. For both purposes, you just enter the project folder and call one of these functions depending on your Julia version: on 0.3 you call the clear-malloc function, and on 0.4 it is Profile.clear_malloc_data. Once you have run your program in memory-allocation mode, Julia does something similar to what we covered for coverage.
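Both workflows just described can be sketched with the Coverage package. The project path is a placeholder, and function availability varies across Julia and Coverage versions.

```julia
# Step 1 (shell): run the tests with the relevant flag, e.g.
#   $ julia --code-coverage=user test/runtests.jl
#   $ julia --track-allocation=user test/runtests.jl   (run twice)
using Coverage

cd("/path/to/MyPackage")                 # placeholder project path

# Coverage: process the *.cov files written for every file that ran.
coverage = process_folder()
covered, total = get_summary(coverage)
println(100 * covered / total, "% of lines covered")

# Allocation: analyze the *.mem files for per-line allocation counts.
allocs = analyze_malloc("src")
```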
It creates a .mem file for every .jl file in your project, and the analysis function reads these .mem files and gives you output showing where exactly memory is being allocated; if there are any anomalies, you can optimize your code from that. Next: apart from these, Julia provides several macros with which you can measure performance. The first is @time, which Nishant already discussed, so I will not cover it again. There is one more, called @timed. There is a difference between @time and @timed: @timed returns a different set of values — it returns the value of the expression, that is, the value of the function you actually called, along with the time; and the garbage-collection time is the extra value it gives you that @time doesn't return. @allocated is similar: it gives you the memory allocation your program is actually doing, so you can use @allocated to check how much memory your program allocates. Next is Lint — my friend already covered Lint, so I will not go through it again. Then there is Base.Test, the built-in package for testing. What Base.Test does is make your test cases readable. The simplest form is @test. Say you have to prepare test cases for your project, and you want those test cases displayed in a very readable format — this is where Base.Test comes in. You just compare your test results against the expected values.
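A minimal example of the style described above. In current Julia the module is simply Test; in the 0.3/0.4 era of this talk it was Base.Test, with @test_approx_eq playing the role of the ≈ comparison shown here.

```julia
using Test

@test 1 + 1 == 2                     # exact comparison, passes silently
@test sqrt(2)^2 ≈ 2.0                # approximate comparison for floats
@test_throws DomainError sqrt(-1.0)  # assert that an error is thrown
```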
You can even attach error messages to these test cases, shown when the actual output is not equal to what you expect. Base.Test also gives you approximate comparison: there are two macros related to this, and both are for floating-point numbers. What they do is compare values approximately — for example, comparing an integer value against a floating-point number. Then FactCheck: FactCheck is a replacement for Base.Test. It gives you a lot more flexibility than Base.Test, and it is even simpler to use. The improvement over Base.Test is that you have many more functions: of course you can compare exact values, you can compare roughly — the approximate comparison we just saw is present here too — and apart from that there are many other checkers, covering the same kinds of checks that are present in Base.Test. That's it. If you have any questions, they can come up tomorrow in the workshop, but we'll move on now. [Keynote] It's great to be here. I'm Jeff Bezanson; Viral, Stefan and I started creating Julia in 2009. Thank you all for coming, and thanks to Shashi for suggesting the topic for my talk. I'm going to talk about why Julia is fast. People talk about performance in relation to Julia all the time, so I'm going to try to delve into that a bit. So how does performance work? Ultimately, I think the speed of everything is limited by the laws of physics — there is a light-speed limit, right?
And for, say, your car, there is some theoretical maximum speed dictated by physics; and for your CPU, there is a theoretical maximum rate at which it can do operations. So when talking about performance, I think it's much easier, instead of asking "why is it fast?", to ask "why would it be slow?" Why aren't we just getting the maximum speed we can get? Why doesn't everything just go at light speed? How come I can't drive 100 miles an hour all the time? I think that's an easier way to frame the question. So what causes these kinds of slowness? My answer is: various forms of uncertainty are what cause slowness. It has to do with not knowing what's going to happen — having to respond to many possibilities rather than just doing the one thing you want to do, having to deal with many, many irrelevant things. That is what really slows you down. As an example, here is something that is slow — this is really, really slow if you're sitting in it. And why is it slow? Because if you're sitting in one of those cars, you don't know what's going to happen in front of you. The car in front of you could just stop; somebody could run into the street; somebody could turn. All these things could happen at any time, and you don't know which, so you have to proceed very carefully, watching for all these things and checking for them all the time. That's why this is slow. And I think the exact same thing happens in programming languages: instead of just doing the one thing you want to do, systems often end up having to check for lots of possibilities all the time. The extreme case of that, in programming, is running in an interpreter. An interpreter is a program that takes a program as its input, along with the input to that program, and runs the program you give it as it reads it.
You can feed it the input data as it goes. In this kind of execution, you're constantly rechecking the program to see whether maybe it changed. As it goes through a loop and repeats each iteration, it rereads the code every time: maybe this time around the loop something different will happen; maybe they changed the program. With i equals 10 — no, it's still the same program. With i equals 11, maybe it's going to be different — no, still the same program. But it has to waste its time re-interpreting and re-figuring out what the code should do. This kind of system gives you the maximum dynamism you can get: you could change anything about the program at any time. But really, how often do you need that? You don't change your program all that often — you write your program and then you run it, and while it's running, it's not changing, right? This actually has some uses: for some kinds of interaction or debugging you might want something like this, but in general you don't need your program to be changing constantly. At some point it's fixed and you just want to run it. So that's the most extreme example of unnecessary uncertainty about what's going to happen at runtime. But there are also more fine-grained sources of uncertainty in a program as it runs. A variable might start out with a value of one type, and then, based on some condition, you might change its value to something else. Its value is certainly going to change — and in a dynamic language like Python or Julia, even the type of the variable could change: it could be an integer, and under some condition I change it to a string. And then also, if I call a function, the system might not know what the function I'm calling does.
So it doesn't know what the function could return: maybe this time it returns an integer, but maybe the next time I call it, it returns something else. So there's uncertainty from there. There's also a lot of uncertainty that comes from manipulating data structures. If I'm looking at a data structure, maybe somebody else is also looking at it: I read it once, but while I'm doing something else, they might change it, so the next time I look at it I have to check it again, reload information from it — even though it might not have actually changed, I have to allow for the possibility that it could have. That's a source of overhead. So for all these kinds of variations and updates, you have to be able to reason about when they can happen; the language, and the compiler in particular, reason about when those things can happen to try to cut out this kind of overhead. But here is what's interesting. There are a couple of what I consider important informal results in the programming world about what actually happens. A lot of people have studied dynamic-language programs, in Ruby and Python — there are a lot of papers studying their behavior — and people have found that, in general, these programs are not as dynamic as people think. There's always a lot of regularity. Even though the system is designed to allow all kinds of runtime variation, it doesn't actually happen: in reality, people process the same types of things over and over again. Even when people use the more dynamic features, those features tend to occur in the code, but they don't necessarily execute often at runtime. And if you think about the idea of an inner loop — a performance-critical loop — it tends to be repetitive. In fact, it's hard to say that in a way that doesn't sound like a tautology: of course it's repetitive, it's the inner loop.
It's a performance-critical, high-iteration-count loop, so it's almost certainly doing something very similar on each iteration. That's just a very common pattern. So there really isn't all that much variation at runtime in running programs, and we can exploit that. There are now a lot of systems that can run programs in dynamic languages very, very fast — the JITs are an especially good example — and they basically exploit these kinds of findings. There's a lot of spurious uncertainty: in general you think you can't predict what's going to happen, but in reality you really can. It's like what I learned about yesterday: when you're driving down the street here, there really isn't any reason to slow down, because the pedestrian is going to get out of the way, right? I mean, if you value your life, you're going to get out of the way. So why should I slow down? Of course they're going to move. That's how it works in traffic, and the same thing can happen in programs. A lot of these systems will say: okay, in general I might not know what's going to happen in this loop, but let's just assume the best case. Assume it's going to be really easy and repetitive, generate code for that, run it, and then try to back out — slam on the brakes at the last minute — if that turns out not to work. So that's the first category, and you can get really big speed-ups from doing that; I'll show an example. But that can only get you so far. To really get to the next level and get even more performance, you have to try to remove some more uncertainty. This has played out recently in JavaScript execution engines. And JavaScript is a really dramatic example because it's a very, very dynamic language.
It has this very flexible object model where every object can have any fields, and you can change the fields in any way you want at any time. It doesn't even have classes; you can just stuff anything into any object at any point. It's super dynamic, so it's a very hard case for this. But nevertheless, look at the JITs here — these are execution times on this graph. The blue and red bars are the state-of-the-art JavaScript JITs, which use these optimistic, speculative optimizations, and indeed you get really good performance from that. There's a comparison here: the little yellow-orange bar is the time for highly optimized C code to run these benchmarks, and JavaScript with these optimizations can get within a factor of 5 to 20x. That's 5 to 20x of more or less the fastest code you can get, which is actually really good performance considering what's happening. But that's not quite enough, right? We don't want to be 5x slower; we want to be 1x slower. So there's been this project to take it to the next level, this asm.js thing — that's the green bar. They sprinkled some extra magic dust and actually got this down to within a factor of 2 of C. And once you get within a factor of 2, I think a lot of other factors come into play. At that point it might just be a matter of details — instruction selection and instruction scheduling can account for a factor of 2 — so there might not be anything left for the compiler to do except routine optimizations. So how did they do that? How did they get down from 5–20x slower to just 2x slower? What they did was write `| 0` ("or zero") on lots of expressions in the JavaScript program. And that's a very clever hack: it exploits the fact that `|` is an integer bitwise operator that only produces int32s — 32-bit integers — in JavaScript.
So this is basically a way to force every result to be an int32. In this example it doesn't look too bad, but in a bigger piece of code there's just `| 0` on everything — you have to put it everywhere. But when you do that, and you write your JavaScript engine to recognize and exploit it correctly, you can get very fast, register-sized operations from all of these. So they do that, and they also add typed byte arrays — just arrays of bytes, like memory. And once you have those two things, you basically have what I would call a C machine: effectively a processor that can run a language like C, where you have an array that's the memory, you have 32-bit registers, you have arithmetic, and you can do whatever you want. And indeed, once they had that in JavaScript, people compiled huge programs — even 3D games — to it, ran them, and got great performance. They have these amazing demos of 3D games running in the browser using this. So this is kind of surprising, and it's worth taking a step back to ask what was going on here. We had this very high-level object-oriented language, and we seem to have suddenly thrown it all away; now we just have 32-bit integers and byte arrays, and everybody loves it, because now we can have 3D games and all this crazy stuff in the browser. How is it a revelation to have 32-bit integers? Has the whole world gone crazy? What is going on here? I think what this really means is that it's important to have efficient abstractions. You have to have something very efficient in the language, and this is a dramatic example of that: just adding a little bit of the efficient building blocks that you need to something.
And then, once you have those efficient pieces in there, you build everything else on top of them, rather than trying to start with something very high-level. So in Julia, essentially, our approach is to have a much more general version of that `| 0` thing — we just generalize it. Historically it didn't happen in that order — we weren't directly influenced by this or anything — but that's kind of what's happening. So what does the more general version look like? First of all, instead of having just `| 0`, which implicitly means int32, we want a whole vocabulary of these data types available — types in the sense of data types, like what size of number you have. You want a general, combinatorial vocabulary of those. Once you have that, you can do type assertions and conversions: you can declare that something has to be of a certain type, or convert something to a certain type. So where they have `| 0`, we would just say convert to Int32 — and clearly any other type could go where the Int32 goes. The other thing you can do with that is have typed storage locations: a location in an array or an object that has a type attached to it somehow, so that every time someone stores to it, the value is ensured to be converted to that type, and any time you load something out of it, you know what type it's going to be. That makes programs much more predictable. And the last ingredient is to try to move a lot of program behavior into type-based dispatch. Julia is based on multiple dispatch; everything is a generic function. Essentially, when you're writing your libraries, you write multiple definitions for every function, and you can say what all the argument types have to be for a certain definition to apply.
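Those ingredients can be sketched in a few lines of Julia (my own illustrative code; the `Counter` and `describe` names are made up):

```julia
# Conversion: the general version of JavaScript's `| 0` trick.
x = convert(Int32, 7)        # x is an Int32, not the machine-default Int

# Type assertion: errors at runtime if the value isn't the claimed type.
y = (x + Int32(1))::Int32

# Typed storage location: this field can only ever hold an Int16,
# so every store is converted/checked and every load has a known type.
mutable struct Counter
    n::Int16
end
c = Counter(0)
c.n = 300                    # converted and checked against Int16

# Type-based dispatch: behavior selected by declared argument types.
describe(v::Integer) = "an integer"
describe(v::AbstractString) = "a string"
```

Because the behavior of `describe` is chosen by type rather than by explicit branches, a compiler can resolve the call statically whenever the argument type is known.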
And that has a lot of flexibility, because you can talk about the types of all the arguments, and the types have a combinatorial, nested structure that's very expressive. So that lets you move a fair amount of program behavior into a regular, type-based system that's very easy to statically analyze. Doing things that way, instead of with a lot of explicit logic and branches, makes the program's behavior more predictable to a compiler. One of the Julia contributors, Oscar Blumberg, summed this up nicely once: Julia is designed to make it easy to write programs that are easy to statically analyze. And this is really what we're about. Julia right now actually does not have a lot of the really fancy just-in-time compiler optimizations that a JavaScript engine has. If you try to write JavaScript-style code in Julia, it will run much slower than it does on the V8 JavaScript engine, because our answer is just to say: put a few type declarations in a couple of places and you get all the performance — so you might as well just do that. But the really neat thing is that, using type inference, we don't require you to actually put types everywhere. Essentially, somebody writes a type somewhere once and it propagates through the system. I can start by making an Int32, and then everywhere I pass it — to the plus operator, or to some function that makes arrays — that type flows to all of those places, and now the compiler knows all of that code is dealing with Int32s. It propagates automatically, so you don't need to write the type again everywhere; the compiler does that for you. But at some point, someone has to write the type — that's the key. So, in a bit more detail, here's how this plays out. Traditionally there have been two ways to represent data structures.
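A tiny sketch of that propagation (illustrative, not from the talk): none of these functions mentions a type, yet once an Int32 enters, inference specializes everything downstream:

```julia
# No type annotations anywhere in these definitions.
addone(x) = x + one(x)                       # one(x) matches x's type
triple(x) = addone(x) + addone(x) + addone(x)

n = Int32(5)      # the one place a concrete type is written
r = triple(n)     # inference propagates Int32 through both functions

# @code_typed triple(n) would show fully inferred Int32 arithmetic:
# the compiler specialized triple for Int32 without any annotation in it.
```

The single `Int32` at the call site is enough; everything it flows through gets specialized code.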
There was the dynamic-language way, which is what was done in Lisp and in Python — the core Python language, not NumPy, of course — where everything is a pointer, basically. This gives you the most dynamism possible: since everything is a pointer, anything can be switched to anything at any time. It's the ultimate flexibility. And then a language like C has much more sophisticated control over memory layout. It's much more complicated, in a way, because you have to talk about where all the bits and bytes are in great detail, and all of that is mostly handled at compile time. So you have a more complicated type system at compile time, but a faster runtime. I think for a long time these just seemed to be an inherent dichotomy — exactly part of the two-language dichotomy that Stefan talked about in his talk: depending on whether you're in a systems language or a scripting language, you pick which one to use. But an interesting piece of wisdom has accumulated gradually, and it reveals, I think, that the dynamic-language, everything-is-a-pointer approach is just really not right. It's just not optimal. There's a really nice recent paper about exactly that — "Storage Strategies for Collections in Dynamically Typed Languages" — which basically showed that you really don't want that everything-is-a-pointer style of representing things, ever. Even in Python, you don't want it. What they did was modify PyPy, the Python implementation, to add strongly typed representations of all the data structures under the hood. So they added the ability to have a 32-bit integer array, a float array, and all that kind of stuff underneath — not exposed in the language; the language was still just Python. They would represent things that way within the system and just put a wrapper on top that gave you the Python interface to it. So that involves a lot of extra complexity.
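In Julia the same distinction is visible directly (a small sketch of my own): a `Vector{Int32}` stores the numbers inline, while a `Vector{Any}` stores pointers to boxed values:

```julia
a = Int32[1, 2, 3]   # Vector{Int32}: elements stored inline, 4 bytes each
b = Any[1, 2, 3]     # Vector{Any}: each slot is a pointer to a boxed value

# isbitstype tells us the element type has a fixed, pointer-free layout,
# which is what makes the specialized, unboxed array possible.
isbitstype(eltype(a))   # true  -> inline, unboxed storage
isbitstype(eltype(b))   # false -> everything-is-a-pointer storage

sizeof(a)               # 12: three Int32s packed contiguously
```

This is roughly the representation the PyPy storage strategies recover under the hood — except in Julia the element type is part of the language, so no hidden representation switching is needed.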
There's some extra dispatching and switching to pick different code for different arrays. Sometimes you actually have to switch representations: something starts out as an integer array, and then someone stores a string into it, and the system has to change the representation. So there's overhead, and it's more complicated. But amazingly enough, it actually turns out to be worth it. You can have all that extra complexity, and specializing the storage of data structures is such a big deal — so useful for performance — that it ends up being worth it at the end of the day. So that basically means the everything-is-a-pointer approach is just not the right thing to do. Julia, I think, was designed to exploit this from the beginning, but this paper provides some very nice substantiating evidence for it. All right, here's an example of how this works mechanically. This is loosely what happens inside the storage-strategy system, and it's what happens inside Julia. In the middle there, we have some memory location — one of our typed locations — holding 16 bits. And somewhere there is a tag that says it's 16 bits. The tag isn't stored in the location itself; it's stored somewhere else — in fact, the fact that it's 16-bit might only exist at compile time, just somehow attached to it. Then we have operations on this location: stores and loads, on the left and right. And there are cases where we know what types we're dealing with, and cases where we don't. In the case where we know the types of everything, of course it's very fast: we have a 16-bit value, we know we're storing it to a location that's 16 bits, so we can just store it directly. The same thing on the loading side.
If we're starting with data we don't know much about, we'll probably have some sort of boxed object — a heap-allocated thing with a tag on it. We check the tag: okay, yes, this is an Int16, that's good. Then we load the data out of it and put it in the location. So there are a couple of extra steps. It's not horribly slow, but there's a little extra indirection — a little bit of extra work. And finally, if we're going to load something out of this location and put the data in a place where we don't know the type, then we have to actually box it, because the receiver isn't prepared to receive exactly an Int16. So we have to heap-allocate a box for it in that situation, and that is slow. That's the one bad case. The good thing about that case is that the box we allocate is probably very ephemeral, and ephemeral garbage objects are actually pretty cheap. So if that box becomes unreferenced very quickly, this actually isn't that bad. But it's definitely the slow case. Basically what you see here is one case that's kind of a wash, two cases that are very, very fast, and one case that's bad. So overall, I think this turns out to be worth it. That's basically what happens. And this is the exact same scheme we use for specializing code as for data. In the case of data, the idea is that the types imply a memory layout: starting from this Int16 thing, you can find out that it's 2 bytes, it needs 2-byte alignment, and so on — and for more complicated types there'd be more information about the sizes and offsets of everything inside. Similarly for code: starting with types, you can generate machine code. In that case the function is a lot more complicated.
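To make the fast and slow cases concrete, a sketch (mine, not the speaker's; `Slot` is a made-up name): storing an Int16 into a typed location is a plain store, while handing it to an untyped location forces a box:

```julia
mutable struct Slot
    v::Int16             # typed location: direct stores, loads of known type
end

s = Slot(0)
s.v = Int16(42)          # fast case: known type -> a direct 16-bit store

untyped = Vector{Any}(undef, 1)
untyped[1] = Int16(42)   # slow case: the Int16 must be boxed (heap-allocated),
                         # because a Vector{Any} slot holds a pointer, not raw bits
```

The boxed value in `untyped[1]` still compares equal to `Int16(42)`; the cost is in the allocation and the tag check on every later load.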
You have to do type inference and code generation, but it's really the same basic idea: the types imply what the data layout has to be. In the data case, where we have full type information, we can do the direct store and load; similarly for code, if we know exactly what's happening, we can do a direct call, or even inline the code being called. That's the same pattern. And in the data case where we don't have good type information, we have to box values; in the case of code, that corresponds to doing dynamic dispatch. So it's a very, very parallel structure. Overall, in a system like this, you get a pattern of specialized things with occasional, more dynamic dispatch going on around them — a dynamic glue around efficient data structures, or around fast compute kernels. If you take a step back, that's really very similar to the design of a system like R or MATLAB, where you have a lot of pre-written fast compute kernels and then do dynamic dispatch on top of them. It's really a very similar pattern, but the key difference is that since this is part of our language from the beginning, you don't have to tediously, manually separate out which part is going to be in a kernel. In systems like NumPy, people have to say: okay, this function is going to be a fast kernel, and this other part is written in Python. Each time, you have to manually decide which thing goes where, and then every time you do that, you have to figure out what the interface is going to be — how is the high-level program going to end up calling the fast thing underneath?
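The code side of that parallel can be seen in Julia itself (an illustrative sketch): with concrete element types the call compiles to a direct, typically inlined call; with `Any` it becomes a runtime dispatch on every iteration:

```julia
double(x) = 2x

function sum_known(v::Vector{Int})
    s = 0
    for x in v
        s += double(x)   # element type known: direct call, typically inlined
    end
    return s
end

function sum_unknown(v::Vector{Any})
    s = 0
    for x in v
        s += double(x)   # element type unknown: dynamic dispatch each time
    end
    return s
end

# @code_llvm sum_known([1, 2, 3]) shows tight integer code; the Any
# version instead calls into the runtime's generic dispatch machinery.
```

Both compute the same result; the difference is purely in how much the compiler can prove about the types.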
That gets rethought every time. But in our case, it's basically just the types — the interface is the type. If you have the type, you can dispatch to the fast thing that handles it, in every case. Also, when you do it the manual way, you can actually end up being slower even in the dynamic case, because you sometimes need multiple layers of dispatch. This happens in NumPy, for example: when you make certain NumPy calls, first there's a Python method invocation, which is one dynamic dispatch, and then NumPy has its own dispatch tables to pick among its various compute kernels. Whereas we can do this with just one dispatch — the dispatch system we have is good enough to handle all of that. And this pattern of dynamic dispatch over fast code happens in a bunch of domains. Numerical computing is a big one: you have a lot of important compute kernels, like matrix multiply, where the computation inside the kernel is very, very regular, but selecting which kernel to run is a very irregular process. I think this also happens in database systems, where you often have dynamic schemas: you don't know the schema until run time, but once you know it, you could potentially do something very efficient. I'll show a really quick example of that. Let's see — I don't think this is showing everything; I'm going to show it in the notebook. All right, so what I did here, to simulate this kind of dynamic situation: I created an IOBuffer, which just simulates some kind of data source, and I serialized a couple of types into it — just to pretend we had something stored on disk that tells you what the type is. Now, in my program, at run time, I can read what those types are and make a tuple type from them, and then immediately do things with it. So here I constructed this type from the data I read in; you can see it becomes a tuple type, and we can see it takes
up 8 bytes. And I can right away start doing things with it — I can convert some other kind of tuple to that type. This type thing gives you a sort of common currency: once you have it, you can spend it anywhere; you can do anything in the system with it. In fact, just to show what happens, we can call `code_llvm` on that convert operation I did, and it will show the LLVM code we generate. You can see we actually get specialized code here: if you read it, it returns an i8 and an i32, just as we expect, and it takes an i64 and a double, since that's the argument I gave it. So this code gets generated internally, and you gain access to it automatically just by having the type. When you construct the type and call convert, this code might be generated right then and there, on demand, or it might be pre-existing — if the code already existed, it would just be found and run; everything is cached. And you can see there have to be a couple of checks — a check for overflow — but then this generates very specialized, quite compact code. You could take this a bit farther — well, it's really big now, so I won't bother with that. Okay, so I'm being a little bit loose with the terminology here. When I talk about these types, you typically think of data types — numeric types like Int32 — but actually you can put a lot more information in them. You can put almost arbitrary values into what we call types in Julia, so you can specialize code on many other things. Intuitively, you have to generate specialized code for different data types — for example, the CPU has different instructions for integer operations and floating-point operations, so you need different code in that case. But there are cases where you need different code for all kinds of other situations — even for a different integer value, for instance. So here's a function specialized for an integer: a function that applies another function
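The notebook demo can be sketched roughly like this (my reconstruction; the exact types the speaker serialized aren't recorded in the transcript, so `Int8` and `Int32` here are assumptions):

```julia
using Serialization

# Pretend a data source on disk tells us the field types of a record.
io = IOBuffer()
serialize(io, Int8)
serialize(io, Int32)
seekstart(io)

# At run time, read the types back and build a tuple type from them.
T = Tuple{deserialize(io), deserialize(io)}   # Tuple{Int8, Int32}

# The type is now a first-class value: convert data to it, ask its size, ...
row = convert(T, (1, 2))   # an (Int8, Int32) pair with specialized layout
sizeof(T)                  # 8 bytes, with alignment padding

# code_llvm(stdout, convert, (Type{T}, Tuple{Int64, Float64})) would show
# the specialized machine code, generated on demand and cached.
```

Once `T` exists, every generic function in the system can dispatch on it exactly as if it had been written in the source.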
to the numbers one through n. This one is specialized for the case where n equals two; if n were three, we'd need to do f(1), f(2), f(3). This kind of situation actually happens pretty often in mathematical computing, where you sometimes need to generate very different code based on an integer parameter, and people have used this very productively in the Julia world. So this gives you an interface to specialized code generation that's pretty general. Now, some things in Julia are still slow. Right now, higher-order functions — passing functions to other functions, and using anonymous functions — actually don't perform very well. I'm working on that; it's going to be fixed very soon. Global variables are still very slow: we basically treat global variables like an untyped dictionary. It's
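A sketch of specializing on an integer value (illustrative; the talk's exact function isn't shown in the transcript): lifting n into the type domain with `Val` gives each n its own compiled specialization, so the calls f(1), ..., f(n) can be fully unrolled:

```julia
# Apply f to the numbers 1..n, with n carried in the type domain.
# Each distinct n produces its own specialized method instance.
applyn(f, ::Val{N}) where {N} = ntuple(f, Val(N))

applyn(x -> x^2, Val(2))   # specialized for n = 2: (f(1), f(2))
applyn(x -> x^2, Val(3))   # a different specialization: (f(1), f(2), f(3))
```

Because `N` is part of the type, the compiler sees a constant iteration count in each specialization, which is exactly the "different code for a different integer value" case described above.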