I hate mics, sorry about that. "If you can elaborate a little on your role; and of course the talk is going to be on the Shogun machine learning library." Yeah. Hey everybody, kudos that you didn't go to lunch. And I'm sorry, I hate mics, so I might take it off and shout, because I usually shout. So I'm Viktor, I'm a core developer of Shogun, and I wish this were my full-time job. No, this is my hobby: I started it during my PhD studies and I have never stopped since. I work as a data scientist for a startup, but this is just a hobby, and we're trying to keep up the open source spirit in the library itself. So let's talk a little bit about what Shogun is, not about me, there's not much of interest there, and then I'll try to go into all the aspects. Please, if you have any questions, ask them, because otherwise I will just keep talking; I'm very hectic when I give these kinds of talks, and that doesn't make much sense to anybody else. So, let's go into it. Shogun is a unified, efficient machine learning library. Terms and conditions may apply to any of these statements. We have been around since 1999, and still almost nobody knows us. "Nobody literally uses us or knows us": joke, there is some usage of it, maybe even outside of universities. These are all of our current contributors, previous contributors, and the two guys who actually started it. Why the name Shogun? Because it was started by Soeren and Gunnar: SHOgun. So it has nothing to do with any Japanese roots; they are both from Germany, from Tübingen, and that's where they started it. Their specialty was bioinformatics, in particular gene applications and mining, so they developed a lot of their stuff there. But it is a full C++ library, and I'll go into more detail later. What is it for? We have a wide range of models. The problem lately is that we have so many models that we can't keep up with the maintenance of all of them.
So sometimes we just throw out some stuff because we can't maintain it: the contributor is gone and we don't know what the code is doing anymore, and we changed so much that we don't know what's happening. But we still have unit tests running, so what it does is actually predictable; it's not that bad. It's for hackers, because it's all written in C++ with some unified interfaces, which I'll go into later on. We don't care what you like: if you like C, R, Python, it's all good for us. We love all of them. It's for scientists: as I said, both of the current core contributors, one of them me, the other Heiko, who is at UCL, University College London, were doing our PhDs during this time, and that's how we ended up doing Shogun at all. And then there are the lawyers, because it's GPLv3. Because of that, many of the possible applications in industry are not happening: we are GPLv3, while most companies will only go with Apache 2.0 or a BSD license. We aim to move to something like that, but relicensing is a complicated matter. Because we are in C++, we run on Linux, in any distribution you would wish, of course OS X, and lately Windows. Thank God Microsoft started doing open source things, and their compiler is starting to accept the C++ standards. Also FreeBSD, NetBSD, OpenBSD, and of course your toaster as well; whatever you want, we run on it. Yesterday I heard the talk about TensorFlow, where somebody is running it on a Raspberry Pi. Yeah, we did that like six years ago. We have full Fedora support, meaning we ship pre-compiled packages for any architecture you name: MIPS, ARMv7, ARMv8, ARM64, whatever you want. But what is it, really? It's a bunch of commits: we have more than 15,000 commits and a lot of lines of code.
As I said, this jump back in the lines of code happened in October, when we threw out a lot of things. We depend quite a bit on our many contributors. For the last seven years we have run Google Summer of Code, which is where most of our contributions come from; there was one year when we missed it. If anybody here happens to be a student, please come and find me: I would be more than happy to mentor somebody, and I usually mentor about five students a year. It's an awesome thing. So, let's go a little bit into this whole unified interface story, and the language religion, especially among data scientists. What I see is that everybody goes "R, Python, R, Python", constantly. And then you show me the Python code and, no, that's horrible. So this slide just comes from trawling random job descriptions, what people tend to use or want to use, and this is the overlap. I don't know how correct it is, but I tried to do some research in this area to be reasonably accurate. It seems to be a nice partitioning of what people are capable of programming in, or want to. Don't get me wrong, we can go into these discussions about what is wrong with each of them; I think I might have offended somebody a week ago with comments like this, and sorry, I didn't mean to do that. So, what is wrong with them? Python... [inaudible audience comment] Thank you. What the hell? Or, for me, the whole R syntax is horrendous. Java: anybody using Java in production? Yeah, it's sometimes a hassle, especially with the GC. And C++, well, just because of that; if you know what that means, cool, and if you don't, that's fine too. And there is some other, not so interesting stuff there as well. Anyhow, the problem is usually this pattern I see lately in how everybody is developing and testing.
I love Python as well, don't get me wrong, in the sense of testing ideas quickly, to see how the idea performs. But the problem starts from the moment you decide it's good and you want to bring it into production. I still believe, and I can have hours of discussion about this, that Python is not meant for production. It's a dynamically typed language: with fifty people developing on top of that, anything can come in and anything can go back out. And I know about the recent attempts to add type annotations to the language, but then, why do you do this? Why use this language if you end up typing it anyway? And it's very slow; for me it's super slow. I can compare it with MATLAB, and MATLAB is not a high-performance thing either; they're in the same ballpark. Anyhow, I love it, but then you have the problem of deployment, of running in production and having it be fast. And that's what we try to address. So, Shogun has interfaces for everybody; we don't mind, we want every interface possible. Currently we support Python, obviously, R, Lua, Ruby, the JVM, meaning Java or anything running on the JVM, and C#, okay, if anybody uses it; it's a nice project, actually. And currently I'm working on JavaScript support. The way we are capable of doing this is that everything is written in C++, and there is a very nice tool called SWIG, and SWIG helps us generate all the interfaces. So I'll try to show you what happens here. Can you see this, or should I zoom in? Okay. So this is our website with snippets of what you can do with us, the different machine learning models you can use. This is by far not all of them.
It's just that we haven't had time to port all of them to this example page. But as you can see here, say you like random forests and you want to use one in Shogun. Here is the listing: what the input is, you say it's classification, so the combination rule is a majority vote, right, and then it's a random forest, and so forth and so on. Currently it's in Python; as you can see, you can just switch to Octave, Java, Ruby, C++, and most of the time, actually all of the time, the function names do not change. So once you get the hang of the way the Shogun API works, you can just keep switching back and forth between languages. And again, all of it runs under the hood in C++; you're just using an interface. TensorFlow actually does it this way as well. So, back to the story about interfaces: the only thing I don't understand to this day is why JavaScript is not used much more by data scientists. I know there are no tools for it, but its speed is significantly better than any of those dynamically typed alternatives. That's why we are actually trying to go there, and if anybody here is from Mozilla, this is something you should check out, because it's something really new and really nice. But we have this huge problem: we have students coming in, trying to get into Google Summer of Code, and every year the same thing keeps happening. People come in, try to use the library, and there is this huge chunk of C++ code, and we usually didn't have pre-compiled packages on Debian or Red Hat, only on Fedora. Of course, the core developers always knew how to compile it, but then people came in saying, yeah, but it just doesn't work, you know, the typical thing.
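As a side note on that random forest snippet: the "majority vote" combination rule it configures is easy to state in code. Here is a minimal pure-Python sketch of the idea; it is illustrative only, not Shogun's actual API, and all names in it are made up:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree class predictions into one label per sample.

    tree_predictions[t][i] is the class predicted by tree t for sample i.
    """
    n_samples = len(tree_predictions[0])
    combined = []
    for i in range(n_samples):
        # Count the votes of every tree for sample i; most frequent class wins.
        votes = Counter(tree[i] for tree in tree_predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical trees voting on four samples:
trees = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
print(majority_vote(trees))  # -> [0, 1, 1, 0]
```

In Shogun's example listings this combination rule is just a parameter of the random forest; the point is only that the per-language snippets all configure the same underlying C++ object.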
People come to the IRC channel and they're just like, "this does not work, and it's a complete pile of whatever". And you're like, oh yeah, I've worked on this for years. Anyhow, we realized that there is this huge gap, that getting into the project is very frightening, so we started to think about what to do about it. Currently we have a showcase of IPython notebooks, which are all about education in machine learning. For instance, last week we had a course at UCL by Heiko: he ran a graduate student course, a lab with about 30 people, as an introduction to machine learning. And each year, in the Google Summer of Code that we run, everybody has to write an IPython notebook, explaining different aspects of the model itself, as well as doing a presentation of it. This is one of my favorite ones; it was written a long time ago. It's about independent component analysis: if you have different signals coming in mixed together, sound, even video, whatever, you can actually separate them back out and get the information you want. And the demo goes like this, I don't know if you can hear it. "Good day, commander." You can't hear it, right? Okay, so you have that one, and then you have the other one: "You want a piece of me, boy?" Anybody, Star Trek? And a third one. You take these three signals, right, and then you make a mixing matrix here, and you use it to mix them up together. So this is the mixed signal, and you can do various mixtures of it. Then you take the mixed signals as a simple feature set, just doubles, and you train Jade, which is an independent component analysis algorithm.
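Under the hood, the demo is just linear algebra: the sources are multiplied by a mixing matrix, and recovered by multiplying the mixtures with the estimated inverse. A minimal plain-Python sketch of that pipeline, assuming the unmixing matrix is known perfectly (real ICA such as Jade estimates it from the mixtures alone, only up to permutation and scaling of the sources; this is not Shogun code):

```python
def mix(A, signals):
    """Matrix-multiply: output row i at time t is sum_j A[i][j] * signals[j][t]."""
    n = len(signals[0])
    return [[sum(A[i][j] * signals[j][t] for j in range(len(signals)))
             for t in range(n)] for i in range(len(A))]

def invert_2x2(A):
    """Inverse of a 2x2 mixing matrix (an ICA algorithm would estimate this)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two toy "sound" signals and a mixing matrix:
s = [[0.0, 1.0, 0.0, -1.0],   # source 1
     [1.0, 0.5, -0.5, 0.0]]   # source 2
A = [[1.0, 0.5],
     [0.3, 1.0]]

x = mix(A, s)                       # what the "microphones" record
recovered = mix(invert_2x2(A), x)   # apply the unmixing matrix: back to s
```

The notebook does exactly this with real audio: mix the clips with a matrix, estimate the unmixing matrix with ICA, and apply it to get the clips back.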
Training gives you the estimated mixing matrix, and basically, if you apply the inverse of this estimated mixing matrix to your input vector of sound, you should be able to get back the original sound. This is actually the original sound that you got back from unmixing the mixture. And the only thing I want to say here is that there are many, many notebooks like this one, and they are now available online. Thanks to AWS, which gave us a significant amount of credit, you can go to Cloud Shogun ML right now, log in with your GitHub account, and you will have all the notebooks available for free. You can run them and use Shogun; it's actually a simple Jupyter notebook setup, and all the demos and all the data are there for you. And it's the SciPy stack, so you can use other libraries as well. It's free for the time being, while we have the support from AWS. With this, we are hoping that there is a shorter path into the project. Two things: when you first log in, there is Marathon, because we use DC/OS, so the very first time you will need up to 60 seconds to get in. Everything is persisted, so you can get your data in and out. And please use the Python 2 kernel, because I actually only built the Python 2 interface there, not the Python 3 one. Then there are applications in the real world. One thing I did my PhD about is organ segmentation: you can read about how I structured the prediction, which is all implemented in Shogun, to first detect the organ and then do segmentation, with part-based lung detection. These are the results, not so important. And there is one very important one, kernel two-sample testing, which Heiko is doing. It's very interesting. Let's say you have two data sources of the same type of data, but different.
They follow two different probability distributions, and you want to find a way to distinguish them: when the next data point comes in, unmarked, so you don't know which probability distribution it is coming from, you should be able to tell, and you can learn this. And they did a paper on this. This was our first runtime, as you can see; it was really bad, and with the new version we actually got it down to about 10% of the runtime, and we made it properly threaded as well. The paper is going to appear at ICLR, and there is also a notebook about this that you can read.

One more minute, just to say that Google Summer of Code accepted us as an organization. It's three months of open source hacking, and it's really nice; I did it twice in my life, and you get about five, five and a half thousand dollars. The project ideas are all listed, but of course, come with your own ideas. April 4th is the deadline, and send PRs today. Thank you. If you have any questions, please.

"Thank you very much. I actually have two questions for you. First off, could you elaborate briefly on any parallelization that is done in the code, for example for the random forest? Secondly, as you know, Julia is becoming a pretty hot language among data scientists. Do you have a Julia wrapper in the code?"

So, okay, first of all, Julia. Officially I'm not allowed to say anything about this yet, but we are actually being accepted this week as part of NumFOCUS, the organization where, for example, Julia resides. We met them at a summit last year. For me it's not of that much interest, but it is for some other people, so somebody is working on getting Julia integrated. Second, parallelization: we are currently working on a huge rebase of all our linear algebra implementation. A student from New York did major work on this in Google Summer of Code.
We now fully support OpenMP when you are running on the CPU, but you can dynamically switch between GPU and CPU: you can basically run something, then just switch to the GPU and run the same thing on it, and go back and forth. And it's run time, not compile time. And now that TensorFlow came out with XLA, we are just going to add that support as well. "That's cool. Thank you." Thank you.