Christfried Webers is also a lecturer at ANU, and he actually teaches at ANU on the particular topic he's chosen for you today, so without any more ado, over to you.

Thank you very much. Good morning, everybody. Thank you very much for giving me the chance to talk about what we are doing at NICTA. We are machine learning researchers, so before I go into tools or toolkits I will tell you a little bit about machine learning and the problems we face when we do machine learning, to get everybody on the same page. I'll give you a short demo of Elefant, then I'll talk about the Elefant architecture and implementation issues, and then put that into a broader picture of what I think machine learning in the future could be.

I'll just give you some examples. Junk mail filtering: you have a lot of emails, and a lot of them are often junk, so you as a user label them as junk or not junk, and then you feed a machine learning algorithm which learns what your preferences are. Now, I have only shown you junk here; your private mails would all be in here too, and each of these would be labelled junk while the others would get the label not junk. So the task is to learn to identify new incoming junk mail, so you can throw it away or put it in an extra folder.

Handwritten digit recognition: twenty years ago that was a real challenge. People wanted to automate the snail mail office, so you had all these handwritten digits on your letters, and as you see, there are very, very different ways of writing digits. So you take a lot of these as input examples, you tell the machine this is a 2, and this is a 2, and this is a 6, and then, hopefully, you have chosen the right algorithm and your machine is fast enough that it can learn something about how humans perceive digits, and when you present it with a new digit it will hopefully recognize the correct one.

This one is already a little bit more advanced. We are given a set of photos containing natural scenes, and we are trying to look at the statistics of the patches in these scenes. So we're taking five-by-five-pixel patches here, and these patches are sorted by probability. You see these are natural scenes: the most important colours are blue (whole patches of blue, probably sky) and green (whole patches of green, probably grass or something else green), and then you get certain patterns, and you go on and on. If you think in terms of a Fourier transformation, these are the lower frequencies, and then you move on to higher frequencies in the patterns you find. So you learn the statistics over natural images, and then you take this information, grab a new, not yet seen image, and put a lot of noise onto it; for the noise we have not chosen red or some colour which is not in the scene, we have chosen green. Then you take the statistics you have from the natural scenes, you are able to denoise the image, and you come up with this as a result. So you see, here we are not directly given a label saying this is junk or not junk, but we're still collecting some information about what a natural scene is: it's not a randomly created set of pixels, it has a structure, a certain probability structure for the patches.
So if we generalize this a little bit more: we have inputs, and we possibly have labels which tell the machine what we want, but that's not always necessary. If labels are available we call it supervised learning; if not, we call it unsupervised learning. We may have other information or other assumptions, like for instance the statistics over natural images which we had here. Then we choose a model, a mathematical model, for instance a very simple polynomial model with some parameters. So that's the first step: we have to find the best model, which of course takes a lot of time to learn, because you have to understand which model is good for which kind of task. But even once you've chosen a model, you still have the parameters, so you have to do a lot of experiments to figure out what the best parameters for your model are.

To make it a little bit more general and formal, here is the definition of what machine learning is, by one of the famous machine learners, Tom Mitchell: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. It sounds silly, but I can tell you, when people say they are doing machine learning, ask them: can you please define what is the experience you're using, what is the performance measure you're using to evaluate, and how do you measure improvement? It's good to keep that in mind.

I want to stress another point here: machine learning is not memorizing. It's not just writing everything you have into a big table and then looking it up. It's more than that, and usually we use the term generalizing, because you have unseen examples which sit somewhere between what you have seen, and you want to generalize what you have learned to these unseen examples. Generalizing means you want to give a good prediction, for these unseen examples, of what your machine learning algorithm should output.

So I'm going through a very small example here, just to give you a taste of what this means. I have a set of data points; they are all scalar, and each data point has a target, which is what we measure. Now we want to learn something from this data. That's just a formal definition; I normally write vectors as column vectors, to not confuse anybody, so that's obviously a transpose there, which makes it easier when you have to implement this stuff. The question is, what can we generalize here? You can come up with many, many answers. One possibility is to say: let's assume we have some kind of polynomial model, and we want to fit some smooth curve to the data. So we take the input, we take some settings for our weights, and we calculate the output by multiplying the weights with different powers of the input variable; this is just a shorter notation, and this an even shorter notation for the same thing. That's a linear model: it's nonlinear in x, because you have powers of x up to M, but it's a linear function of the unknown model parameters.

So the question is: how can we find good model parameters for our specific problem here? Well, remember the definition from two slides before: we need to define some performance measure, and in our case a good idea is to take the difference between what we have seen in our data and what our function, our polynomial, would give us, and square it. So we take this squared error. That has the nice side effect that we don't need to care about the sign, but there are much deeper reasons why a squared error is often a good performance measure.
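Here is a minimal numpy sketch of what was just described (illustrative only, not Elefant code): fit the polynomial y(x, w) = w_0 + w_1 x + ... + w_M x^M to ten scalar data points by least squares, i.e. by minimizing the squared error over the weights w. The data generation and the choice of orders are assumptions made for this sketch.

```python
import numpy as np

# Toy data: ten scalar inputs x_n with noisy targets t_n.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

def fit_polynomial(x, t, order):
    """Least-squares fit of y(x, w) = w_0 + w_1 x + ... + w_M x^M."""
    return np.polynomial.polynomial.polyfit(x, t, deg=order)

def squared_error(w, x, t):
    """Performance measure: sum of squared differences between targets and model."""
    y = np.polynomial.polynomial.polyval(x, w)
    return np.sum((t - y) ** 2)

for order in (0, 1, 3, 9):
    w = fit_polynomial(x, t, order)
    print(f"M = {order}: training error = {squared_error(w, x, t):.4f}")
# The degree-9 polynomial drives the training error to (almost) zero,
# a perfect fit of 10 points, but it will predict wildly for unseen inputs.
```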
If the situation is not very strange, we normally have a nice minimum, so we can take this error function and look for its minimum over all possible settings for w, so that we find a smooth curve through our data.

The simplest model does not really take our input data into account: we set M to zero. Well, we do take the targets into account, because we're calculating their mean. If we fit this model to our data we get the red line here, and I've also drawn the curve from which the data were really produced, but normally you don't know that; it's not given. We can go a little bit further and take a linear function; that fits a little bit better, our squared errors are going down. We can take a third-order polynomial; that's already pretty good. (I skipped the second order because you cannot fit an odd data set to an even function.) But we can go further: why stop at three? Let's take a ninth-order polynomial, and oh, beautiful, it all fits; our learning task is done. The term for this is overfitting, because if you have n data points given (we have ten data points here), you can always fit them perfectly with a polynomial of order n minus 1. But you can imagine, if I take another point which may live here, my prediction would be somewhere up off the slide, and that's not good. There is a lot of theory behind how you can avoid overfitting; the general principle is: take the simplest model which is good enough to explain your data. It's sometimes called Occam's razor, and it's a good guideline whenever you do science: don't take the most complicated model.

This is a slightly different setting here. When we do machine learning we generally assume that even if our data live in a very high dimension, they are somehow sitting on some smooth surface which has low dimension. So you can take all the pixels in your picture, which gives you hundred-thousand-dimensional spaces, but you still assume that the data are living on some surface in there; it's technically called a manifold. Here, for instance, the data is really sitting on a two-dimensional manifold, although the whole thing is rolled up in 3D. So what you really want is a projection into two dimensions which somehow preserves the topology of your data, because if you can throw away dimensions, if you have not a thousand but only two or three, you can learn something about your data; you cannot visualize a thousand dimensions. That's called nonlinear dimensionality reduction, and it's a very powerful technique for looking at nonlinear data.

Another application of the same technique: you take photos of the face of the same person with different facial expressions, and you throw them into 560 dimensions; each pixel is one dimension, just the greyscale value. When you do a projection onto two dimensions, you can learn something about these faces. So here you have the really happy face, here a more neutral face, and well, I don't know how you would describe this one here, but it's definitely different from the others.
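As a concrete illustration of this idea (again not Elefant code; this sketch uses scikit-learn, which is assumed here purely for illustration, and Elefant ships its own implementations): the classic Swiss roll is a 2-D sheet rolled up in 3-D, and a manifold-learning method such as Isomap can unroll it into two dimensions while trying to preserve the neighbourhood structure.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-D points that actually live on a rolled-up 2-D sheet (a manifold).
X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Project down to 2-D while preserving local neighbourhoods (topology).
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(X.shape, "->", embedding.shape)   # (1000, 3) -> (1000, 2)
```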
So what should these examples illustrate, besides giving you a little taste of how machine learning is done? We normally deal with very large data sets: lots of data, lots of dimensions. Ten thousand dimensions is very normal, and if you try to approach problems like labelling images by the objects in them, you have to work with feature spaces which are hundred-thousand-dimensional. You need to do a lot of experiments; it's not an algorithm where you say, okay, here is my data, plug it in, give me the result, thank you. You experiment with different models, you have to try different algorithms and methods, and you also have to deal with hyperparameters. Hyperparameters are parameters like, for instance, the order of the polynomial: you have to set them from the beginning, you are not learning them from your data. And if you do a lot of experiments, you have to record and document the results, and if you publish a paper you also have to be able to prove that what you claimed in your paper has really been done; one good way is of course to have the code and the data available. And then, if you teach, you would like some tool to help students learn statistical machine learning without doing everything from scratch.

So that was the motivation why we started to build our toolbox, Elefant, which stands for Efficient Learning, Large-scale inference and optimisation toolkit. Elefant is written with an F because the initiator was a German (not me) and the second person came from India; so the name does not mean it's big and bulky, it's a wise elephant.

I'll show you a little demo here, so we'll switch over. You start with an empty slate; you can take data readers and writers, you can set filters, you can choose algorithms, and they all come as building blocks which you can connect. So you plug the data in, for instance the test data, and then you can go on in your data flow; you are really building a data flow. These connections are typed (I'll come to that later), so you cannot draw just any connection to any port. On the left-hand side you have an explanation of what the ports are, and you have properties for all of these components; for instance you can choose different kinds of kernels (I won't go into the details of what that means here), and if I choose the wrong one I get an error message before I run the whole setup.

Now I'll go back to something which I have prepared before, and the nice thing is you can start an experiment directly from a file. We have a regression module here, we have some input data from which we want to learn, and we have another data source which provides the test data; we want the machine learning algorithm we have trained to output our prediction, and then we plot that on a monitor. So I'm just running that... done. You see the blue diamonds are the input data from which we try to learn something, and the red line is a large number of test data points which take the learned function and produce a result on the y-axis. You can change things here: for instance we can take another kernel, or we can change the scale of the Gaussian kernel. If I put in a hundred, you get an error message, because (I don't know if you can read it) this property is not an integer, it's a floating point number, so we have to provide a floating point number here. And I think I somehow have a problem, because my interface is stuck.
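To give a feel for what a demo like this computes (a hypothetical stand-in only; the talk doesn't show Elefant's internals, so this is plain numpy): Gaussian-kernel ridge regression, where scale is the kernel-width hyperparameter being edited in the GUI, the training points play the role of the blue diamonds, and the dense test grid produces the red curve. All names and the ridge term are assumptions of this sketch.

```python
import numpy as np

def gaussian_kernel(A, B, scale=1.0):
    """k(a, b) = exp(-(a - b)^2 / (2 * scale^2)) for 1-D inputs."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2.0 * scale ** 2))

def fit_predict(x_train, t_train, x_test, scale=1.0, ridge=1e-6):
    # Kernel ridge regression: solve (K + ridge*I) alpha = t, then predict.
    K = gaussian_kernel(x_train, x_train, scale)
    alpha = np.linalg.solve(K + ridge * np.eye(len(x_train)), t_train)
    return gaussian_kernel(x_test, x_train, scale) @ alpha

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-3, 3, 20))           # the "blue diamonds"
t_train = np.sinc(x_train) + rng.normal(scale=0.1, size=20)
x_test = np.linspace(-3, 3, 200)                    # dense grid, the "red line"
y = fit_predict(x_train, t_train, x_test, scale=0.5)
```

Changing the scale parameter here has the same effect as editing the Gaussian kernel's scale property in the GUI: a large scale oversmooths the curve, a small one chases the noise.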
You can normally get all the data over the internet; you can provide URLs, so you don't have to have the data locally, but I did it locally here. You can change things in the plotting if you want: colours, transparency, the style of the line, whatever. If you're happy with your experiment, or even if you are not, you might want to save it, so you can save the whole experiment as, for instance, "regression one", quit, go home, and the next day you continue by asking for "regression one" and you get back your setup. If you don't want to store the visual interface and just want to generate code, you can save your whole experiment as Python code. Sorry, I should have tried that before; I have a problem with 32-bit versus 64-bit on this machine. What you really can do is just execute the code, because it is plain Python code. You have all the parameter settings in there, which also means you can easily reproduce your experiment, because it's all stored in this file.

Of course we can do other things, like classification: we're given a set of data points which come from different classes, and we would like to figure out how to cover this area in such a form that we can later take another point and figure out to which class it belongs. So the algorithm is running... and that's the result you get.

Here, just to mention the people of our group who have contributed: Kishor Gawande, my colleague, a software engineer, has done the graphical user interface, and we have many students and also researchers contributing their code to the toolbox. The whole toolbox is provided under the Mozilla license for Linux, Windows and Mac OS. You can see this is an older version here, because we had an extra data flow layer, but we got rid of that; I'll talk about it in a moment.

I'll talk a little bit about the architecture. We have several layers; let's look at the lower part here. We have external modules which we use, for instance external C and C++ packages, very large-scale scientific packages for distributed systems, solvers for distributed systems, and interfaces so that C, C++ and similar code can be used inside our modules. We also rely heavily on the SciPy and NumPy implementations, which give us the really fast mathematical processing, and we have our own solvers and optimizers, and then the different machine learning algorithms. There is a complete functional API on top of that, and then a component architecture; that's what gives you the graphical interface. It wraps around the algorithms, it does pre-processing, model selection and visualization, and then you can build your applications with that. We have documentation in different formats: our own scientific documentation, but also Doxygen-generated Python, C and C++ documentation. The website is elefant.developer.nicta.com.au, and we also have some mailing lists.

When you deal with real data, you're dealing with a lot of different data formats which people have decided to store their data in. For instance, labels are used differently: for binary labels you have minus one and one, or you have zero and one, and if you have more than two classes the labels run from zero to n minus 1, or from one to n. The data readers basically shield you from a lot of this complexity: you can define the separators, you can define where in your file the data sit and in which column the label sits. We can also read MATLAB data, of course, and create random data.
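As a small illustration of the label problem the readers have to shield you from (an illustrative helper, not Elefant's actual reader API; the function name is made up): normalising the common binary label conventions to a single one.

```python
import numpy as np

def normalise_binary_labels(labels):
    """Map the common binary label conventions onto {-1, +1}."""
    labels = np.asarray(labels)
    values = set(np.unique(labels).tolist())
    if values <= {-1, 1}:
        return labels                      # already in {-1, +1}
    if values <= {0, 1}:
        return 2 * labels - 1              # {0, 1} -> {-1, +1}
    raise ValueError(f"not a recognised binary label convention: {values}")

print(normalise_binary_labels([0, 1, 1, 0]))   # [-1  1  1 -1]
```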
We have a number of algorithms; these are just some examples, and I won't go into too much detail: belief propagation, regression, classification, online learning, Gaussian processes, feature selection, and some others. We also have some modules which are not integrated into the graphical user interface; for instance, a module which can deal with very large data sets that don't fit into your memory. It's using an Intel threading library which we don't provide, you have to get that from Intel, but the rest, which we provide, is open source.

And then you already saw the graphical user interface: quick prototyping, designing an application workflow, quickly sharing the experimental setup with other users. It uses the component framework, for instance, to dynamically query the files which you want to load, because MATLAB data is not a format where everything is provided in ASCII; you really have to query the data to figure out what the parameter settings inside the MATLAB data format are. The loading and saving of the experiment is done in XML (I'll come to that soon), and I hope you believe me that you could have executed the Python code.

When we thought about what kind of features we would like: we want a MATLAB-like language, because people are very used to writing MATLAB experiments in machine learning, and that's why Python was a good choice for us; it has its drawbacks, and we'll come to those in a moment. We need efficiency, so the libraries are in C or Fortran, and you can plug in your ATLAS-tuned LAPACK and BLAS libraries. We want reproducibility for verification, so we have command-line executables. Ease of use: the GUI. And then the other things I already mentioned: we want it to go out together with the papers we publish, so it has to be open source, but we also want companies to be able to use the code in their applications, so we put it under the Mozilla license.

I'm going into a little bit of detail here on some implementation issues, which might be interesting for those of you programming in Python. I already mentioned we have the API, and one of the main decisions was that we want an API which is completely independent of the GUI, for people who would like to use Elefant without the GUI. But how do you reconcile that? The problem is what to do with your instance variables: do you install them in the algorithm, which is accessible via the API, or in the component? Then you would have to duplicate them. What we did is say: everything has to be in the API, because we want the API to stand alone, and everything for the GUI is derived from that.

After that decision, we realized we could do more. When you define, for instance, that you have a degree of 2 for your polynomial, and you're using that in your interface, you don't want 2.0 or 2.5 there, you don't want a floating point number, you also don't want a negative number, and you might even want some extra information in your GUI about what this number means. So we used our own custom properties: when you define a class PolynomialKernel, you write down that degree is a custom property, it has type int, it has the initial value 2, it can range from 1 to the maximum integer (this avoids getting negative numbers), and it gets some explanation. That means you don't need to do anything in the GUI: the GUI can generically check whether the user has put in a positive integer, and if not, it doesn't accept it. It also allows you to check the connections: if the data type you're providing as an output and the data type expected as an input don't match, then you cannot connect those two ports. How it's done is that we are tampering with the metaclass; the metaclass is the implementation of a class in Python, so we override that, and when you create a new class, our code gets called and makes the adjustments. So you can write it in the simple form I've shown here: you just write degree equals this, without having to do anything else.
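Here is a simplified sketch of that idea (hypothetical names throughout; Elefant's real classes will differ): a metaclass collects the declared properties so that type and range checks, and the GUI's generic introspection, come for free from the declaration alone.

```python
class CustomProperty:
    """A declared, typed, range-checked property with a doc string."""
    def __init__(self, ptype, default, low, high, doc=""):
        self.ptype, self.default = ptype, default
        self.low, self.high, self.doc = low, high, doc

    def check(self, value):
        if not isinstance(value, self.ptype):
            raise TypeError(f"expected {self.ptype.__name__}, got {value!r}")
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} outside [{self.low}, {self.high}]")
        return value

class PropertyMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Collect the declared properties so the GUI (and the XML
        # marshaller) can discover and validate them generically.
        props = {k: v for k, v in namespace.items()
                 if isinstance(v, CustomProperty)}
        cls = super().__new__(mcls, name, bases, namespace)
        cls._properties = props
        return cls

class PolynomialKernel(metaclass=PropertyMeta):
    degree = CustomProperty(int, default=2, low=1, high=2**31 - 1,
                            doc="Order of the polynomial kernel.")

    def __init__(self, **kwargs):
        for key, prop in self._properties.items():
            setattr(self, key, prop.check(kwargs.get(key, prop.default)))

k = PolynomialKernel(degree=3)       # fine
# PolynomialKernel(degree=-1)        # ValueError: -1 outside [1, ...]
# PolynomialKernel(degree=2.5)       # TypeError: expected int, got 2.5
```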
We found it a little bit stressful that you always have to define the execution flow when you work with the GUI. Then we realized that for many applications the execution flow is already there: the graph itself is an acyclic graph, because the data come from one port and go into another port, and the operational units, like the algorithm here, have a natural ordering; you need to do something with the training data first before you can do something with the test data. If you have all this information, you can run an algorithm and automatically figure out one possible execution flow. So we just define, on each of the components, the properties of each of its ports: whether it's an input port, and what kind of data structures can go in there (in this case dense matrices and different kinds of sparse matrices). Then you run a topological sort over all these components, taking the partial order into account, and you come up with one of the possibly many execution orders.
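A minimal sketch of that scheduling step (not Elefant's actual scheduler; the component names are invented): components are nodes, port connections are edges, and Kahn's topological sort yields one valid execution order, or detects a cycle.

```python
from collections import deque

def execution_order(components, connections):
    """components: iterable of names; connections: (producer, consumer) pairs."""
    indegree = {c: 0 for c in components}
    successors = {c: [] for c in components}
    for producer, consumer in connections:
        successors[producer].append(consumer)
        indegree[consumer] += 1
    ready = deque(c for c in components if indegree[c] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("data flow contains a cycle")
    return order

print(execution_order(
    ["train_reader", "test_reader", "regressor", "plot"],
    [("train_reader", "regressor"), ("test_reader", "regressor"),
     ("regressor", "plot")]))
# ['train_reader', 'test_reader', 'regressor', 'plot']
```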
When you want to save and load these experiments, well, you get pickle and unpickle, dump and load, in Python, but there's a problem; does somebody have an idea what the problem could be? It works the first time, but once you have loaded data into your object, it dumps all your data out as well. So it works, but it gives you a huge data file for your object. What we did is write our own pickling, or technically speaking our own marshalling and unmarshalling algorithm, which is nicely supported by the properties: because everything we want to know about our object is in the properties, we can just write out everything from the properties and have a nice XML representation, which can then be read back in, and the experiment is set up in the same way as it was saved.

Okay, good, a few more slides: some problems and challenges. Dynamic typing in Python is a problem, especially when you work with a larger set of modules and several people work on the same project, so you have to write a lot of good unit tests to make sure things don't break. Everything is public in Python; that just demands a lot of discipline. We are currently in the transition to Python 3, and of course we also depend on the NumPy and SciPy modules moving to the same version. The interfaces to PETSc and TAO, which are big systems, also change when the versions on the other side change. And if you want to use these algorithms, you don't always have a Python interpreter, so what do you do on an embedded system? Should we provide more modules as C or C++ code?

A direction we are looking into is, of course, multi-core processors. Machine learning will benefit a lot from having multiple cores, but only if we rewrite our algorithms, because many machine learning algorithms are not easily parallelizable: you have algorithms running over graphs, so you can't just do the nice vector operations you can do in linear algebra, where you simply partition things. The same goes for GPUs; we're getting more and more powerful GPUs. Maybe OpenMP. There are interesting projects which use machine learning to learn to optimize code patterns for compilers, MILEPOST and cTuning. Structurally, if we think about the GUI, we could have loops in our data flow, and then we would get close to what Simulink is in MATLAB, but we would get it for free if we manage to do it. And of course we can think of an experiment as a block with some input and output, and then you can think about a hierarchical system; that's what we're currently thinking about.

As I already mentioned, the name Elefant was not chosen because the elephant is big, but we continue to use the name, so there is a joke here. But this is really the big problem, or the big challenge: we are not alone in the world. There are great machine learning toolboxes: Weka from the University of Waikato; yesterday we had a presentation about Hadoop; Mallet; and there are many more. Should we not really go in the direction of thinking about structures and protocols, so that we can use data from one place, run an algorithm in another place, and send the result to a third place which runs another algorithm on the data? That is probably the direction we as a community will go, so that's the fourth point here: protocols and structures. There is a project at ANU and NICTA, by Mark Reid and Robert Williamson, which deals with what the generic elements in machine learning are when you want to define structures and protocols, from the practical side but also from the theoretical side, because in machine learning we are realizing that many different problems can be rephrased in the form of one another. There is some kind of unification going on, so that we don't have to start from scratch coming up with new ideas and solutions all the time.

Other questions, of course: how do you support users who don't know much about machine learning? I already mentioned standard formats for data exchange. And what is an experiment in machine learning anyway? Replicability might not be the same as reproducibility; that's a thought I would like to leave you with, and Chris Drummond put it out there. The ability to run the same experiment with exactly the same result: what does it mean? You can replicate it, okay. But what science is really about is this: if I change things a little bit, does the output also change a little bit, or do I suddenly get a very different result, and what does it mean if I get a different result? That's what is meant by reproducibility: you're not trying to just get exactly the same number at the end, you really want to vary things and figure out what's happening. And there is an ongoing discussion, going in the right direction, that if you are writing good papers you should also write good code, so other people can verify that what you have done makes sense; again, because replicability, just rerunning an experiment and getting the same result, is not very powerful.

Okay, the summary; I think I have mentioned all of these things already, and I think we are about done with time, so I leave you with the last slide: the URL for Elefant, if you want to look at what our group is doing. NICTA currently has 12 open-source projects;
you find all of them under this address, and if you're interested in the protocols and structures for inference there's a URL here, and if you want to learn something about machine learning you'll find 800 slides about machine learning there. Thank you very much for your time.

And last, this magnificent gift; I don't know if in the other halls you've seen it, but the gift here is made out of compressed macadamia nutshell, a memento of this magnificent conference we've had here. For all of you, hope you've enjoyed it; we've got another one on in just 10 minutes.