No, no, turn it off, because I usually shout a lot. Hey, how's it going? I won't be using a microphone, I usually shout, but if you can't hear me, please just say something. There's my Twitter account, in case something stops working in the meantime, but I don't see anybody here with a laptop, so that's alright, you'll see it later.

So hey, I'm Viktor, and I've been one of the core developers of Shogun, a machine learning library, for a long time now. I joined the project while I was doing my PhD; basically everything I did during my PhD, I contributed to this library, and I've kept at it ever since. The project has been running for a long, long time: it's been open source for at least 10, maybe 15 years. One problem is that it's written in C++, so people face a barrier to entry, but there are many good features in there, as you'll see. We've been around since about '99, and basically we're practitioners, all of the core contributors. We also take on younger people: we've had many Google Summer of Code students (this is my sixth year mentoring), and of those students, four entered MIT after Shogun, two went to CMU, and one to Stanford. So we have a good track record of people contributing and then ending up doing their research somewhere. Usually they leave, because we're a big library and they only need one niche thing for themselves. Nonetheless, there are many, many algorithms in there. The problem is that some of them are not maintained anymore, for example the genomics parts, but you can still use them, and currently some hackers are picking them up again. We are C++ as is: no scripting languages in the core, and you'll see why later. And we're scientists: as I said, I did my PhD on this, the two founders did their PhDs around it, and other core contributors are currently doing their PhDs over it too. We have no industrial funding; we survive on Google Summer of Code and our own passion.

And yeah, I'm very hectic today, sorry about that. This is not my day job, I do it out of passion, and we've had a crisis at my company this week. Please ask questions as we go; I'm more than happy to make this interactive. We are fully open source, and it's actually GPL 3, not BSD, not Apache, so if you use us, you have to contribute back, basically like Linux. There's currently a push to make a BSD-licensed version of ourselves, but it's a mess with the licensing: as you'll see later, we have around 150 contributors, and chasing them all down to get agreement for a relicense is not easy. We run on Linux, obviously, all the Debians and Red Hats, whatever you want, plus OS X, Windows, FreeBSD, NetBSD, OpenBSD, and probably your toaster as well, because we're C++; we don't care about the environment. And yes, I have tested it on many embedded systems, including my phone, and it works nicely.

So what is it, really? I keep saying it's a great, big thing, but nobody hears about us, because everybody's using XGBoost, and deep learning is the hip thing today. Random fun fact: since '99 we've been column-based, and if you do big data, you'll understand what that means. It's not a new idea at all.

So how can you describe us? These are our statistics. We've currently cut the codebase down to about 400,000 lines, but it used to be half a million lines of code, and maintaining that is really, really hard. As you can see, it grew really fast. I joined somewhere around here; maybe that bump is mine, I don't know. I hadn't touched it much for the last two years, but then Heiko, another core contributor who is at UCL, University College London, and I started actively working on it again over the last half a year. There are still a lot of lines of code, and we want to cut that down a lot. You can see the mixture of languages: there's some Python in there because of our build system, which is a mess, but otherwise it's mostly C++. As a community, we have roughly 175 contributors, give or take; it's probably closer to 150 once you merge people with different aliases. The ratio of comments to code is high.

What we always intended to do was to stop this religious war about languages. Most data scientists are super opinionated about R or Python, the big two, some people at least talk about Java, and JavaScript is quite nice. I have many opinions about all of them; no hard feelings. Just wondering: how many people here use Python most regularly? Okay. R? A bit fewer. C#? Cool, that's rare; awesome, it's actually a good language, I love it, shame it didn't take off as well as Java did. Java, anybody? Okay,
a few, cool. Anyhow, here's how these languages interact with each other: you can see most people are very opinionated about their own language and keep using it no matter what. My problem is, well, a couple of comments. Python: are you serious that an object is a hash map? Sorry, that's a no-go. R: I never got used to the syntax; my brain doesn't compile it, sorry. Java is awesome, I use it daily, but the GC in the JVM drives me crazy. I used to work at a recommendation-engine company where the whole stack was on the JVM, and we had to pull many, many tricks just to meet the SLAs we had agreed to. That's a no-go. And C++: I love it, but then again, somebody else hates it.

So which interfaces do we support? There's a library called SWIG, and with it we can say: we want interfaces for everyone. We don't care about your language; we want to help you. The problem, especially with scripting languages like Python, is that you can get to a good model really fast, but if you then try to take it to production, you're doomed. I've been through a couple of companies attempting exactly that: a Python stack they try to push into production. Dynamic typing, I'm too old for that; whatever comes in and whatever comes back, it's scary in production. Same with R, and the same for speed: in both cases the runtime is horrendous even for simple operations; you don't want to know what's happening underneath. Lua: I love that language, Lua is cool. Ruby, Java or anything JVM, C# as I said, and JavaScript is in progress. If you look around, I don't know why data scientists aren't picking up JavaScript; it's vastly faster than Python and R. Okay, you don't have the supporting libraries, but the runtime is amazing. Anyhow, we support all these languages, and here's how: SWIG lets us automatically generate interfaces to any language. We basically feed the meta-information to SWIG, it generates all the language bindings you need, and you can use them right away.

In October we released this new feature; here's the website where you can check it, with a handful of examples. Any opinions on what you'd like to see? No? Okay, then this is obviously my favorite: support vector machines with kernels. Here's the optimization problem, the primal objective and the constraints it's subject to, and I'm not going to go into the math details. Here's the example, currently in Python, of the code you need: you create your features, you create your Gaussian kernel, you stick them into LibSVM, set the epsilon if you want, do the training, do the applying for binary classification, get the bias, and compute some accuracy measures. This is actual Python code you write, and it's going to work. Same thing in, oh, Octave, I totally forgot. Who knows Octave? Wow. We're actually working on a MATLAB interface as well; someone is working on the SWIG extension for MATLAB, and it's really good, it just needs time. In Octave, the same example works as is. We built a framework of meta-examples for this, which we transcribe into all of the target languages.
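The Gaussian kernel at the heart of that example is easy to compute by hand. Here's a minimal NumPy sketch of the Gram matrix that a kernel SVM like LibSVM consumes; note the width parameterisation here is my own choice for illustration, so check Shogun's documentation for its exact convention:

```python
import numpy as np

def gaussian_kernel_matrix(X, Y, width=2.0):
    # Pairwise squared Euclidean distances between rows of X and rows of Y
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Toy parameterisation: k(x, y) = exp(-||x - y||^2 / width)
    return np.exp(-sq / width)

X = np.random.RandomState(0).randn(5, 3)
K = gaussian_kernel_matrix(X, X)
# K is symmetric, positive, and has ones on the diagonal
```

The SVM never needs the explicit feature map, only this matrix of pairwise similarities; that is the whole point of the kernel trick.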
From the meta-examples we regenerate the example in every language, and the generated code also drives integration tests over the library, so we actually verify that the bindings for every language do the same thing they are supposed to do. Side note: if you want the library natively, the C++ implementation uses some C++11 extensions. Awesome stuff. Then there are the other algorithms you'd expect: random forests, the usual suspects. And actually, this is one of our strong sides: a contributor who is currently in Japan wrote basically our whole stack of Gaussian processes. Anybody heard of those? I love them, except for the fact that you have to invert a matrix, which is the slightly nasty part, although there are ways around it. I love them because you get a density over your classification as well. His work makes our library one of the most complete implementations of Gaussian processes at the moment, and you can check the benchmarks: he wrote a whole IPython notebook on how much faster it is compared to other implementations out there. Funnily enough, the one good existing GP implementation he started from was from my lab, by the way.

So you can use any of these interfaces. The only problem is that if you go and check how you actually have to install and use us, it gets messy, really messy. That's why we started thinking about what to do about it. Something like it existed before: a thing I wrote over a weekend, back when Docker was brand new, a Flask application that brought up Docker containers on my own machine, and I had to randomly kick people out because we kept running out of instances.
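About that matrix inversion: the core of GP regression is a linear solve against the kernel matrix. Here is a rough NumPy sketch of the posterior mean, using my own toy kernel parameterisation rather than Shogun's API:

```python
import numpy as np

def gp_posterior_mean(X_train, y_train, X_test, width=1.0, noise=1e-2):
    # Toy Gaussian kernel k(x, y) = exp(-||x - y||^2 / width)
    def k(A, B):
        sq = (np.sum(A**2, axis=1)[:, None]
              + np.sum(B**2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-sq / width)

    # The infamous "inversion": in practice you solve K alpha = y
    # (or use a Cholesky factorisation) rather than forming K^-1.
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return k(X_test, X_train) @ alpha

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, (30, 1))
y = np.sin(X[:, 0])
mean = gp_posterior_mean(X, y, np.array([[0.0]]))
# mean[0] should land close to sin(0) = 0
```

That O(n^3) solve is exactly why big training sets hurt, and why the real implementations put so much effort into approximations.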
Anyhow, Shogun was and still is used at UCL for education, actually for teaching master's students in data science, and we have a showcase of longer explanations of different algorithms you might want to learn about, and of how to use Shogun to run them. It's done with IPython notebooks, and now with Jupyter you can do even nicer things. For example, there's a simple introduction to machine learning: what it's all about, how you load a dataset, what a dataset even is (features and labels), and what the label set tells you: is it binary classification, is it regression, is it maybe clustering if it's unsupervised. You learn how to load data into Shogun and how to learn more about your data, so you can do the preprocessing. As you've heard, feature extraction or dimensionality reduction is very important, because sometimes your data is just so big that your machine, or your algorithm, can't handle it. Then you see basic examples; I think this one is about plasma glucose concentration, classified using LibSVM. Here's the classification, the density and the boundary LibSVM learned, and you get an accuracy number; it's always a question of context how good that number is. There's a housing dataset in there as well. It's pretty simple stuff: if you already know some data science you won't be very interested in this one, but if you're a starter it helps, and it's all just generated HTML from an IPython notebook you can actually open, modify, and play with.

Then there's clustering; I think the example clusters digits, so you have images of numbers, you want to recognize them and see how they group. Then a nice autoencoder over the same kind of data; I think it's USPS, which is a dataset of handwritten digit images. Then there's a comparison I find quite nice of a linear SVM versus a kernel SVM. If your data looks like these rings and you have a linear SVM, it simply doesn't know where to put the separating plane, because the boundary has to be linear, meaning a straight line. So what do you do? You take the kernel machine, and it's awesome, you get this curved boundary, but you pay a huge price in memory and runtime. Nobody tells you that in the beginning; it all looks amazing. There are many, many kernels, and you can learn about a lot of things here, like naive Bayes and how it works, with more explanation. And bugs, bugs, bugs: yes, we are open source, so if you see a bug and you hate it, come with a PR and fix it, please; we will be very happy. Then some Gaussians, probability and likelihood, and you can generate very nice plots. I'm not going to go into the details, because you can read it yourself. Let me just show my two favorite examples. The first is a very simple blind source separation over audio files with independent component analysis. It was written by somebody from Canada as a Google Summer of Code project, including the whole backend, and I really like it. What he wants to show is: if you have a mixed, noisy signal, how can you somehow separate it and get the information back?
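A quick aside on that linear-versus-kernel picture: the reason the kernel machine wins on ring-shaped data is that it implicitly maps points into a space where a straight boundary suffices. You can see the idea with an explicit, hand-picked feature map (the squared radius, my own choice for this sketch); a real kernel SVM does this implicitly, and for much richer maps:

```python
import numpy as np

rng = np.random.RandomState(0)
# Inner disk (class -1) and outer ring (class +1): not linearly separable in 2D
r = np.concatenate([rng.uniform(0.0, 1.0, 100),   # inner radii
                    rng.uniform(2.0, 3.0, 100)])  # outer radii
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.concatenate([-np.ones(100), np.ones(100)])

# Explicit feature: phi(x) = ||x||^2. In this one extra dimension the two
# classes separate with a simple threshold, i.e. a linear boundary.
z = np.sum(X**2, axis=1)
pred = np.where(z > 1.5**2, 1.0, -1.0)  # any threshold between 1 and 4 works
accuracy = np.mean(pred == y)
```

The price the talk mentions is real, though: a kernel machine generally has to hold and work with an n-by-n Gram matrix, which is where the memory and runtime go.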
You can think of images or sound equally as data; it doesn't matter, it's any kind of data. So he shows an example of how you can do this, with a basic implementation. Anybody recognize this? Right: StarCraft, of course. He took three different audio signals from StarCraft. You load in the data (what's on screen is just a representation, like in the UI), so you have the first, second, and third signal. Then you define a mixing matrix over them: you take the signals and mix them together, deliberately, and then there's another mixing on top, so what you end up with is this mixture of signals. Ideally, you'd now like to separate them again, to find an algorithm that can actually do it. And you realize: wow, there is an algorithm for this. You take the mixed signals as features (it's just a byte series, nobody cares), you convert them, and you run JADE, which is an ICA, independent component analysis, algorithm. That gives you an estimated mixing matrix for the input, and then you just plot things back: here's the first signal recovered, here's the third one. These really came out of the mixed input, we didn't sneak in the originals, so it genuinely works.

My other favorite is from Sergey. Sergey is at Yandex at the moment, and he's amazing; you can learn so many good things from him. He wrote this library called Tapkee, which is a header-only library.
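To make that mixing story concrete, here is a tiny NumPy sketch. Real ICA (JADE in the demo) has to estimate the unmixing matrix from the mixed data alone; here I cheat and use the true inverse, purely to show what mixing and unmixing mean:

```python
import numpy as np

t = np.linspace(0, 1, 1000)
# Two independent source signals (stand-ins for the StarCraft audio clips)
s1 = np.sin(2 * np.pi * 5 * t)               # smooth tone
s2 = np.sign(np.sin(2 * np.pi * 3 * t))      # square wave
S = np.vstack([s1, s2])                      # sources, shape (2, 1000)

# Mixing matrix: each observed channel is a weighted sum of the sources
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = A @ S  # the "mixed recordings" you actually observe

# ICA estimates an unmixing matrix W ~ A^-1 from X alone; here we use
# the true inverse just to demonstrate that unmixing recovers the sources.
W = np.linalg.inv(A)
S_hat = W @ X
```

The hard part, and the whole contribution of algorithms like JADE, is finding W without ever seeing A, using only the statistical independence of the sources.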
Header-only, like Eigen: awesome stuff, if you know what that means. It's all about preprocessing, unsupervised preprocessing of your data, namely dimensionality reduction. Sometimes you have a huge feature space; I saw somebody mention a hundred thousand features, and I don't understand what you get out of that, but fine, huge feature spaces exist, and there are datasets that machine learning libraries simply cannot handle at that size. So he wrote a full set of different dimensionality reduction algorithms that automatically detect structure and somehow unfold your input. You'll see here, for example, a spiral, but a 3D spiral, where the color of the points is a different labeling of the data. This is the three-dimensional representation, this is just a cut through it one way, and this is the actual automatic unfolding into two dimensions: you just give isomap your target dimension, and boom, you've got it. It's really nice. Same goal here: you have these spirals, you say it's a stochastic proximity embedding with two target dimensions, and it comes out straightened. I don't want to go any further with these, because I actually wanted to show you something else.

Shogun cloud. Why? As I was saying, we have all these IPython notebooks and this huge library, and nobody touches it, because everybody is doing Spark, or whatever the hipster thing in data science is right now. Sorry, I can't hold it back; there is so much hype and so many bugs in that world. I do this out of passion, and at least when we write code, it's meant to be somewhere near performant. Sorry about that.
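Back to that unfolding idea for a second: the simplest possible instance of "give me a target dimension" reduction is plain PCA. Isomap and stochastic proximity embedding preserve geodesic or proximity structure instead of plain variance, but the interface is the same. A NumPy sketch:

```python
import numpy as np

def pca_reduce(X, target_dim):
    # Center the data, then project onto the top principal directions.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:target_dim].T

# A nearly flat 3D point cloud: almost all variance lives in two directions
rng = np.random.RandomState(0)
plane = rng.randn(200, 2)
X3 = np.c_[plane, 0.01 * rng.randn(200)]  # tiny noise along the 3rd axis
X2 = pca_reduce(X3, 2)                    # unfold back down to 2D
```

PCA can only find linear subspaces, which is exactly why a curled-up 3D spiral needs the nonlinear methods Tapkee provides: a linear projection cannot uncurl it.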
So the problem is that there's a huge entry barrier for everybody: installation, understanding C++, understanding how to get the library into your own language. It's stressful, and we know that. I tried to fix it once before, and we decided we need to fix it properly, so people can just start using it. Because it is insane: people out there are using random forests for trend detection, please, come on, and XGBoost is apparently for everything, right? Okay.

So the release date is now, today: here is Shogun cloud for you, thanks to AWS. If you have any trouble, go to this address; you just need your GitHub account to log in. What you get comes with a disclaimer, a warning. Currently data is not persistent, so anything you do on it, you're going to lose. It's Python 2 only for now, because of our IPython setup, but it runs in Jupyter, which I'll show you. And it's running on spot instances, so when an instance goes away, nothing is persisted and everything is gone; make sure you keep downloading your stuff. This will be fixed by next week, by FOSSASIA, for sure. So here's what you get: the cloud Shogun setup. You go in, you log in, here are the sample datasets we'll be using for the demos, and here (come on, why is it so slow?) are all the IPython notebooks I was just showing parts of. Let's open one. Do we do the same one again, or the image version? Let's do the audio one; I know that works, and I'm not sure the image one will. Come on, why is it so slow? It's AWS; it should not be this slow, it's not our fault. Okay, wrong format, doesn't matter.
As I said, make sure your kernel is not Python 3, which it will be by default; switch it to Python 2, it will start the kernel, and here you see the same notebook, not yet executed. You can just go to Cell, Run All, or run the cells manually one by one and see what happens. And there you go, it's running everything; there are the signals. That was the same thing, done on the fly, in the cloud. You can use all of this with the restrictions I mentioned, which I'll hopefully fix; I mean, I put this cloud thing together just this afternoon. But it's 2017 now, and many things are much easier than they were in 2011, when I did the first version.

So, Shogun in the real world. I usually get this question: who is using you? There are rumors about people doing things in various places; I think there was somebody in Hong Kong who wanted to use it, but we never got a reply about why. As I said, a lot of people come from research, especially because the earliest parts of Shogun were mostly implementations for bioinformatics and genomics, so there are a lot of tools in there that are not available anywhere else. People still come for those, you show them around, and off they go. And then there are the weird requests. We know something fishy is going on: there is currently a bug down in the representation of matrices, in that we only use 32 bits for indexing a matrix. Somebody asked for 64-bit indexing, and we asked: what machine do you have, that you need memory you can only index with 64 bits? And he said: it's a big one.
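For context, a back-of-the-envelope check on where 32-bit indexing actually breaks (just the arithmetic of the threshold, nothing about Shogun's internals):

```python
import numpy as np

# A signed 32-bit index tops out at 2**31 - 1, roughly 2.1 billion, so any
# matrix with more elements than that cannot be linearly indexed by an int32.
n = 50_000
elements = n * n                     # 2.5 billion entries in a 50k x 50k matrix
int32_max = np.iinfo(np.int32).max   # 2_147_483_647
overflows = elements > int32_max     # True: this matrix already needs 64 bits
gib = elements * 8 / 1024**3         # float64 storage: roughly 18.6 GiB
```

So a dense double matrix only about 50,000 on a side, under 20 GiB of RAM, is already past the 32-bit limit; on a terabyte-of-RAM machine you hit it long before you run out of memory.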
Okay, and what's the data, then? Anyway, I'm currently working on that feature, and the feature branch is open, so you can use it and help fix it. That perspective is actually getting more and more popular nowadays; in the last three months I've heard the same conversation at least four times. Everybody asks how to scale horizontally, but a machine with about a terabyte of RAM is not that expensive anymore if you compare it to your Hadoop cluster. Sure, with a Hadoop cluster and all its nodes you can scale horizontally like crazy, but then you hit the bottleneck of Spark GC-ing all the time (they even made a breaking change in 1.6, because half of Spark's time was going into GC), and you hit the bottleneck of your network. It's awesome for storing data, but if you really want to do this kind of computation, honestly, with a terabyte of data you might get much further on one big machine than with your Spark setup. And no, I use Spark on a daily basis, I'm not saying it's rubbish; I'm just saying you need to understand the limitations of both.

So, back to the real world, if it was the real world, I don't know. I'm going to talk a little bit about the segmentation I did for my own PhD, and Shogun is what I used for it. I did a lot of work in medical image processing, and the usual problem is this: you have patients, and radiologist time is super expensive, so expensive that everywhere in the world people optimize to get even one extra patient's scans examined by a radiologist. So if you can help guide the radiologist's work in any way, that's super important. That's why we do automatic segmentation, automatic detection of malignant regions, in the
lungs, in the hippocampus, wherever you want. This part of my work was about segmentation of the hippocampus and the lungs. What you see here are the opinions of different radiologists, superimposed; this is the mean of them. Inside you have the heart and the two lungs, and with a colleague of mine I wanted to create a segmentation framework that you could use not only on this, but on other anatomy as well, because every segmentation method is awesome on one particular organ and really bad on other organs, and we wanted a framework that auto-tunes for that. So our first idea was: let's build a system that can detect where an organ is and which organ it is. If we can do that, we can then choose, from the set of segmentation algorithms we have, the right one, and the result is going to be much better, right? The part I wrote in Shogun is a latent structural SVM. If you don't know what that is, it's some fancy stuff and it doesn't matter; it's the same kind of thing you can do in deep learning nowadays with TensorFlow in minutes, where I spent months. Yay. Anyhow, I used part-based detection, which was super popular at the time for detecting people and objects in images. The idea was that as soon as we have detected the organ, we can specify areas of the organ itself, because each segmentation method works much better if you are able to supply a prior, meaning which area is inside the organ and which area is outside. Whether you then do a graph cut or a level-set segmentation doesn't matter: with the prior it grows nicely and better, and you can shape it at runtime. So we learned that model, and of course we improved on the state of the art, and we were all happy. These are the results: the green area is correctly detected, the red area is incorrectly missed, and the rest is false detection. That's what I was doing; it was awesome, it's all in Shogun, and I suppose nobody uses it.

Now, is there anybody here who's a university student? Yeah? Cool, I'm talking to you now; sorry for the rest of you. Google Summer of Code: we are an accepted organization, and you get three months of work. You apply to us with some ideas, or pick from ours, you show us who you are and what your skills are, and if we choose you, you get 5,500 US dollars for your work over the three summer months. There are project ideas on the wiki, which I'm showing here. Am I running out of time? Oh, sorry. The application deadline is the 4th of April, so please do apply if you want. I loved it; I did it twice, once back in 2010 with a really good media framework, and once when I first joined Shogun, through my own Summer of Code, before I joined Google. It's really good: you can do genuinely good things on your own, nobody constrains you, and your mentor just tries to help you make things better. Send us your PRs and we'll review them; send in your application, and hopefully it will work out, and you'll understand why we like it so much. Because it's a really good thing, and I have a feeling that the people
who come out of it get a lot from it. Usually we get people like Pan, a student from China who is now living in New York. She joined us last year for Google Summer of Code; we chose her because she seemed willing to do things, but then we saw her early code and thought, oh my god, what are we going to do with her? She had literally zero understanding of C++ at the time. And now she's a core contributor: she actually ended up writing half of our linear algebra stack, the part where you can choose at runtime between GPU and CPU and mix the two, all of it templated C++. And she's a core member not because we are desperate to find new members, really not; she did an amazing job. We've seen this happen a few times, and we feel that if you join and do this job, which isn't really a job, because it's an experience, there is a lot to learn, and not only about the technical work. You learn about open source communities: what they mean, how we work, and how you're supposed to interact with them. Because I hate it when somebody comes into the channel and says, this is rubbish, it doesn't work, it doesn't compile, and you're like: yes, we've been working on that for the last, I don't know, five years, thanks a lot for your contribution, bye. It's really good, and I'm sad to see that while some projects work out nicely (the Apache projects, I guess, are not only used but also get contributions back, and some projects even graduate into Apache projects), there are many, many projects I see dying slowly, because companies just take and never give back. Which is sad, because
so many good things started off as open source; your kernel, your Linux, right? Okay, that's it. Sorry, we're running over, and I'm sorry if there wasn't too much data science in here.

[Host] Yes, there will be a bit more of that at the summit, going into detail about what data science actually is. And what we want to keep doing is getting more of the companies out there, and more people on the ground, to contribute back to the community, to awesome projects like this one. That is really the sole purpose here: to give projects the opportunity to showcase their open source work. If you have any open source project you're working on (I'm working on one myself, just me and someone else, a very small project), and corporate companies are doing their own open source projects too, this is one of the things we're trying to promote, to get more people to contribute back. Also, at FOSSASIA, during the summit next week, Friday to Sunday, there are a lot of amazing people flying in from the US, from Europe, and from across Asia, and we have a few speakers from Singapore here as well. So thank you very much for contributing to the open source community and sharing your knowledge and contributions with FOSSASIA, or rather with the conference itself. Even if you can't go next week, your company, your friends, or your communities might; we still have lots of pamphlets and brochures you can stick up on your notice boards to share this conference. A community ticket is really cheap, about $42 for all three days, and honestly speaking, lunch is provided on all three days; that is essentially what the $42 pays for. We don't need anything more than that. So with that, I hope you enjoyed this sneak preview of what's coming at the FOSSASIA Summit 2017 next week. The speakers will still be here for the next few minutes if you'd like to talk with them, but with that, thank you very much.