Thank you for your patience with the technical problems — hopefully the last of them. I'm from Wolfram Research. What we do at Wolfram is automate computation. We do that to make experts more productive, but also to allow people without expertise to be able to use powerful computational tools. In this talk I'm going to focus very much on that second end of things: I'm going to treat you all like complete novices, and my claim is that in about 30 minutes or so I can show you enough that, with the automation provided by the Wolfram Language, you can actually be productive as an AI expert. That's my claim. I'm going to throw in some live computation and a few random things that may or may not work along the way.

Let's start with some conceptual background, because you need the basic ideas before we can actually do anything. I'm going to do the opposite of what my industry is supposed to do: I'm going to unhype machine learning. All of this stuff that you hear about — deep learning, recurrent neural networks, convolutional neural networks — my claim is that it's all just fitting. It's all just a fancier version of what you did in school when you were probably 15 or 16.

So let's go right back to basics and ask: what do I mean by fitting? Here's the kind of thing you did in school. You had some data — maybe this was how far somebody seemed to have travelled over time — and you had a model in your mind. You said: we're going to make this a straight line. That's our model. You came along with that model, put your ruler on the paper, and drew a line that you thought was a good fit. The basic idea of the model is that you can then start filling in the gaps. You can say: we know what would have happened here; if we'd measured a point here, it would have been at this distance at this time. To some extent you can extrapolate as well — although, of course, it's a leap of faith to say that what happens after the data was collected continues to behave in a similar way — but that is also a benefit of having a fitted model. And anyone who's done fitting on a computer has already, automatically, done machine learning, because when you do this automatically, the two numbers that describe the straight line are learned from the data.

But there are a couple of problems with this kind of fitting that you learned in school. One is that I had to come in as a human at the beginning and say: that's a straight line. That's not completely artificial intelligence — I had to provide the insight, and the computer just automated the application of my model. Of course, we can change that and try to adapt to different kinds of data: if the data wasn't straight, we could say, well, what if our model was curved in some way, and change it. But we're always having to intervene as a human. Then there's another problem, which is that if I intervene too much and have a really rich model, crazy stuff starts happening. This is called overfitting: I've now given the model too much freedom, so instead of capturing the essence of the thing that was going on — the car going on a journey — it's trying to capture the noise, the randomness of the bad timings and bad record keeping. We end up with a model that doesn't help us with any prediction.
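To make that concrete, here is a minimal Wolfram Language sketch of that school-style fit and of overfitting; the time/distance numbers are made up for illustration and are not the talk's data.

    (* some made-up time/distance observations *)
    data = {{0, 0.}, {1, 1.2}, {2, 1.9}, {3, 3.2}, {4, 3.8}, {5, 5.1}};

    (* the model we chose as humans: a straight line; the two numbers are learned from the data *)
    line = Fit[data, {1, t}, t]

    (* fill in a gap (or extrapolate) by evaluating the fitted model at a new time *)
    line /. t -> 2.5

    (* give the model too much freedom: a degree-5 polynomial passes through every point, noise and all *)
    wiggly = Fit[data, {1, t, t^2, t^3, t^4, t^5}, t]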
This is where we come back up to date and ask: what is it that's exciting in the AI world, in this whole automated machine learning, that's different from that? Now we have a collection of tools for doing this fitting that are much more flexible. We don't intervene so much. They're more robust against overfitting, so they capture the predictable essence of what we're observing rather than the noise. But the biggest thing is dimensionality. In my how-far-have-I-travelled-over-time example, I had one dimension in and one dimension out: time in, distance out. It's a totally one-dimensional problem in both directions. Usually, though, we want to solve interesting problems, and interesting problems have very high-dimensional inputs — we have lots of data, not just one number. Very often we want high-dimensional output too, something that isn't just a number. And there's the added complication that the data isn't always numbers. If we look at what these good models can do — models that aren't restricted to low-dimensional, pre-decided forms in too rigid a way — then it takes a bit of a leap of imagination to think what it means to fill in the gap in the data, like in my original plot.

Well, let's have a look at a few real models trained from real data. This one is taking very high-dimensional input in: it's seeing a person, so it's getting maybe 10,000 data points in for the image, and the idea is that I can hold up objects and it can figure out what they are — water bottle and goblet, that'll do. Or I could ask it to look at me in particular, give it some different background information, and ask: what kind of person do I look like? I look male. And now, how old do I look? It's being very unkind to me — yeah, it's being unkind again. I'm 47, but if I try not to look too wrinkled it hit 46 for a moment; it tends to put me in my 50s. I guess it's the miles. So that's large dimension in, one dimension out, because we're just getting a class out.

Here's large dimension in again — exactly the same kind of thing, images in. I'm going to do a quick Google image search here for pretty villages and get some pictures back from Google; that's what I'm going to use as my source, and hopefully, if I've got a good net connection, I can pick a picture that looks like where I came from. This one looks like where I come from. So I'm clicking on that, and I'm telling it: look at that picture and give me two dimensions out this time — not just a class that might say "village". I want a latitude and a longitude, which now lets me throw that onto the map and say: based on past experience, here's where it looks like that picture came from. And I guess this one looks a little less like Oxfordshire — if I click on that, we get a different estimate, that this looks like southern Spain.

Right, what else should we do here? Let's have a look at something where we're getting high-dimensional output. If I get myself in front of the camera here and grab a current image off the camera — if you want to; let's say yes to that, whatever that was — then hopefully what it'll do is take two lots of high-dimensional input, a picture by Kandinsky and a grab off my webcam, take the essence of the Kandinsky style, and impose it on the photograph from the webcam. So now we're getting two lots of 10,000-pixel input and one lot of 10,000-pixel output. We're very high dimensional in both directions.
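For reference, a minimal sketch of those last two demos using built-in functions; ImageIdentify, ImageRestyle and CurrentImage are standard Wolfram Language functions, while kandinskyPainting stands for an imported image of the painting and isn't defined here.

    (* high dimension in, one class out *)
    ImageIdentify[CurrentImage[]]

    (* two lots of high-dimensional input, one high-dimensional output:
       impose the painting's style on the webcam grab *)
    ImageRestyle[CurrentImage[], kandinskyPainting]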
And maybe I have time for one more here. Maybe — yeah, one more. One thing I suggested was that you fill in the missing gaps, and very often thinking about what's missing in things is again a bit of a leap of imagination. So I've got a little photo here — the code's not very interesting — but if I run this, we can fill in the missing information, which is a fairly obvious bit of missing information: it was a black-and-white photo, so we've used past experience to guess the colour. But we can go further than that and fill in another bit of missing information. It's a photograph, which is two-dimensional, so hopefully, with a little bit of deep thought from a pre-trained model, we can get an estimate of how deep the photo is at different points — and we can now see the dog's nose sticking out, and its forehead sticking out further than the background.

One thing worth remembering before I continue down this AI and machine learning route is that it's not the only trick in town. It is not going to replace everything you've ever known. It's a tool to use when you're data-rich and understanding-poor. There's all kinds of other cool computation for when you're the opposite way around: if you actually have an understanding of the physics or the chemistry or whatever of the situation, and you don't have heaps of data, then you're better off using classical modelling techniques.

So let's get on to what we can do here. There's a pipeline that needs to be followed in the machine learning process. The idea is that we start with some training data, and then we have to encode it into numbers. That's a step we have to get through in order to use all of the modern techniques, which are numerical. We have to have a way of taking that photo and turning it into numbers — that's fairly easy. Taking words and turning them into numbers involves a little more choice about how to go about it. That gives us a bunch of numbers. We then take our model and train it, so that it adapts — just like we did with fitting — to fit the data, and we end up with a trained model. That can be expensive: that image identifier takes many hours to train on a GPU. But then the evaluation step is very fast: we do the same process with the input, we encode it, we get vectors, we put them through the trained model, we get vectors back, and we decode those into some meaning at the end.

Now, the good news is that where we have low-dimensional output — things like classification or just predicting numbers — that stage of machine learning is essentially fully automated. You don't really need to know anything to be productive, apart from how to prepare the data and how to tell the computer you want to do machine learning. So let me walk you through that process. It covers things like classification and prediction; sequence prediction is low-dimensional output as well, because what comes next in a sequence is typically just a name or a number or a letter.

So let's do a bit of classic machine learning from scratch. I'm going to drop into code now, so you can see what you would have to do as a developer in order to do the machine learning. Here's my classic machine learning dataset: some passengers from the Titanic. The first passenger here travelled first class, was 29 years old and female, and she survived when the ship sank. The task is: can we predict what would happen if I were to go back in time and travel on the ship? What would be my chances? So the task here is classification. And our approach to the way we expose models to users is, as a start, to say what you want done, not how it's done.
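A minimal sketch of what that training data looks like, assuming the built-in machine learning example dataset; the exact field formatting in that dataset may differ slightly from the table shown in the talk.

    titanic = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
    RandomSample[titanic, 3]
    (* rules of the form {class, age, sex} -> "survived" or "died" *)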
So we'll say: classify the data, and it'll go away and do some thinking about that data. Hopefully it only takes a few seconds. It's a bit small print, I'm afraid — I probably should have made this bigger — but what's interesting about this example is buried in the small print, in this third line here, which says (I'll read it out for you if you can't see from the back): method, decision tree. One of the models for machine learning is decision trees, and it's decided on its own that that's the best model to use for this data. Now, I might be an expert and say I want to use k-nearest neighbours or neural networks, and if I am using nearest neighbours I want a neighbourhood of four with a Euclidean distance — all of these details you might care about if you're an expert. But if we're trying to tackle the question of how to make non-experts productive, then the first step should always be to automate all of that: the choice of method, the hyperparameters, and that pipeline of encoding. You'll see here it's said the data is nominal, numerical, nominal — a category, a number, and a category. It's figured that out for itself; I don't have to go through lots of work trying to encode the thing into the correct vector for training. And the net result is that I can now say — actually, I've had a birthday since I wrote this example — okay, I'm 48 and male, and obviously I've travelled first class, and I would not have made it. We can drill into that model a bit more deeply and ask what my chances were, and you can see why it made that prediction: by this model, I had a 64% chance of drowning.

Now, what we've gone through there is the whole of training and application. We also have a deployment step. It's not, strictly speaking, anything to do with AI and machine learning, but automation should take care of that as well, so I can go the final step, describe this sort of symbolic user interface for the predictor, and deploy it to a website so that I can make it available to you. And now, if we open that website, I get a form where we can fill in some different values — let's say a 66-year-old; we don't think we want 667, that's extrapolating too far — and submit, and get back the results of the model. So we've gone through the whole compute-develop-deploy cycle in a totally automated way in, whatever that was, eight lines of code. That's our aim for everything in computation, although the level of automation varies by the type of computation.
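A minimal sketch of that classify-and-deploy flow, again assuming the built-in Titanic example data (whose class field is encoded as "1st"/"2nd"/"3rd" — the talk's own data may be encoded differently); the form layout is an illustrative stand-in for the one shown in the talk.

    titanic = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
    c = Classify[titanic];                    (* it picks the method and the encoding itself *)

    c[{"1st", 48, "male"}]                    (* the predicted class *)
    c[{"1st", 48, "male"}, "Probabilities"]   (* drill in: e.g. the 64%-chance-of-drowning readout *)

    (* deploy a little web form that calls the classifier *)
    CloudDeploy[
      FormFunction[
        {"class" -> {"1st", "2nd", "3rd"}, "age" -> "Number", "sex" -> {"male", "female"}},
        c[{#class, #age, #sex}, "Probabilities"] &],
      Permissions -> "Public"]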
Now, let me show you a completely different example. That was some nice numerical and textual data; let's do some computer vision, like the examples we saw before. I'm going to start by capturing some data, and — let's see if we can get this to work — I'm going to teach it rock, paper, scissors. Here are some examples of rock, and we'll get a few of those; then we'll do some paper — whoops, getting it on screen would help — and when we've got some of those we'll give it some scissors, and when we've got some of those we'll say stop. All I've done here is make a little user interface to capture this input data, but it's just like the Titanic data: we've got some training examples and some outputs. And because of the automation of the feature extraction and the conversion to vectors, all I have to do as a developer is exactly the same thing.

I say: just classify that data. And it will do whatever it thinks is appropriate here — it's going to say logistic regression on image features — and now, hopefully, if I put the camera up, I can put up rock, paper... oh, go on, paper, paper. A little bit of paper. Scissors. Rock. As long as I stay away from paper too much, it seems to be working okay. I think it's pretty amazing that it's exactly the same task to deal with Titanic survival statistics and with computer vision, because we can abstract away what it means to do all of those different steps.

So, if you're going to be an expert — or at least a fake expert — in this field, then you also have to think a little about methodology. A lot of people ask: well, how does this work? That's not really the way you want to think about AI. You don't ask how; you ask: does it work? Typically, when you do classical modelling — if I measure projectiles, say — then when I fit the parameters, they tell me something about the real world: what the air resistance was on the object, or the air density, or things like that. In machine learning, typically, the parameters aren't meaningful in any way we can interpret. The question is: does it give me useful predictions? We have to treat it a little bit like magic — but magic that we don't trust. So one of the key parts we can't skip in doing machine learning work is validation. Here's the basic idea: you never use all of your data. In theory you'd get a better model if you used all of it — more data is good — but you don't use all of your data for training the model. You hold some back, and then you measure whether, for the data where you knew the answer but the model didn't because it never saw it, it gives a good prediction.

Let's do that quickly with a similar example to before. I'm going to classify some flowers. Here's the data: four measurements of each of these three different kinds of flower. I'm going to split that set into 100 samples, chosen randomly, that I'll use for training, and the rest — about 30, I think it is, in this set — I'm going to hold back for testing. Then I do the same classification I did before; it's getting easy now, we've seen this a few times. But now I'm going to pass that classifier itself, together with the data it hasn't seen, through a measuring object, and then we can query that for various metrics. This is the headline one: it's saying it was 96% accurate. Well, that's a very contextual answer. If I'm packing up bulbs of irises for a flower shop, that's probably pretty good. If it's a self-driving car making the right decision 96% of the time, that's pretty disastrous — I'll be lucky to get home alive. So you have to think a little about what these numbers mean. And then there are all kinds of ways to drill in and ask, for example, where did it go right and wrong? We can see from this example that there were 19 examples of setosa and it got every one of them right, but it got a little confused between versicolor and virginica: it got one wrong in each direction. That might guide us to where we need more training data, or tell us what kind of failures we're going to get. Again, that's fairly contextual.
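A minimal sketch of that hold-out validation, assuming the built-in Fisher iris example data; the split sizes here follow the talk, and the 96% figure is just the kind of number you should expect to see, not a guaranteed result.

    iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];
    train = RandomSample[iris, 100];          (* 100 random examples for training *)
    test = Complement[iris, train];           (* the rest held back for testing *)

    c = Classify[train];
    cm = ClassifierMeasurements[c, test];     (* measure on data the classifier never saw *)

    cm["Accuracy"]                            (* the headline number, e.g. around 0.96 *)
    cm["ConfusionMatrixPlot"]                 (* where versicolor and virginica get confused *)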
What those measures are varies a little depending on the type of machine learning you're doing. If we do something like prediction — another classic example: predicting the price of houses in Boston — I've got these different data points on these flats and houses, and the data looks like... I've lost my mouse, there it is... looks like this: we've got some numbers, it's a pretty straightforward numerical dataset. Here my task is prediction, because I want a number out, not a class. I don't want the word "expensive" or "cheap"; I want an actual number predicted. And that changes the way we measure it a little. When we measure the thing, we can then ask what it looks like numerically, and the perfect outcome in this prediction plot — where you've got the actual values on one axis and the predicted values on the other — is that everything sits on the straight line. We don't care which end of the line, we just want it on the line. We can see that mostly it's doing a fairly good job, with some outliers up here where it does poorly. But really the headline measure here is that it's within about $4,000 of the correct price on average, across that spread.
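A minimal sketch of the equivalent prediction-and-measurement flow, assuming the built-in Boston homes example dataset; the property names are the standard ones, and the figures in the comments are illustrative, not guaranteed.

    boston = ExampleData[{"MachineLearning", "BostonHomes"}, "TrainingData"];
    btest  = ExampleData[{"MachineLearning", "BostonHomes"}, "TestData"];

    p = Predict[boston];                      (* a number out, not a class *)
    pm = PredictorMeasurements[p, btest];

    pm["ComparisonPlot"]                      (* predicted vs actual: we want it on the straight line *)
    pm["StandardDeviation"]                   (* the kind of "within about $4,000 on average" headline *)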
OK. So far I've focused on one very big domain — probably the main domain of AI and machine learning — which is supervised learning. The supervision was that I told it the correct answer on some examples and said: you learn what it is that makes those outputs what they are. But I did tell it what some correct answers were. Sometimes there's no truth: you have data, but you don't know anything about what it means, or what a correct prediction would or wouldn't be. I've picked an example to illustrate this, chosen because it's something we are very good at. What I've got is some photographs of dogs, but I'm not telling the computer that they're dogs. I've just imported a random selection of three breeds of dog, and if we take a few samples — here are the first eight from my random collection — we've got, I think I picked... oh, it's in the code: Labradors, Chihuahuas and Basset Hounds. I haven't labelled these things as Basset Hound or Labrador; I've just said: photographs. What we can do is have the computer look for patterns that the pictures have in common, and things that are different, and get something that tries to reduce these 10,000 data points down to a small number of numbers that is actually useful. We don't want to care unnecessarily about the colour of the top right-hand corner pixel, but we might really care how big the largest circles are, and features like that. So now we can take this picture of a Basset Hound and reduce it down to those learned features.

Now, it's nice to imagine that these are things we have names for: the first number could be how big the nose is, and the second number how pointy the ears are. The reality is, as I said, that it's very hard to understand what's going on inside machine learning; if we could identify these features, they would probably be something we have no human word for — some kind of zigzaggy splodginess or something like that — that is a thing it sees and that can be represented by one of these numbers. But that doesn't matter, because as soon as we can represent the knowledge in this image in a space learned from these pictures of dogs, in these 50 or 60 numbers, we can start doing calculations on the numbers.

One obvious thing we can do is ask how close two pictures are, if we imagine them as points in that 60-dimensional space. We've got two pictures of Basset Hounds here, and hopefully, if this has worked, when I compare that distance with the distance between the same Basset Hound and a Chihuahua picture, we'll get a bigger number. Within the features it discovered to be common and different across pictures of dogs, it has learned a notion of similarity that says these two Basset Hounds look more similar than the Basset Hound and the Chihuahua. Now, maybe it's because of the breed, maybe it's because they both had sky in the background or didn't, or whatever — there could be some feature involved that is not what we're looking for — but it's something we can actually work with. I can take that to an extreme by squashing it down to two numbers: we'll represent all these dogs with just two numbers, a composite of those 60, and when it's had a chance to think about this, it'll use those two numbers to stick the different pictures on the screen in different places. This is rather cluttered, so let's make it nice and big. Let's see if it's worked. Hopefully — if we look over in this right-hand corner, these appear to be mostly Basset Hounds; in the middle we've got all the Labradors; and I'm guessing these on the left are the Chihuahuas, a little out of focus, but there are certainly some in there. There seem to be dark Labradors at the top, so it has somehow separated those out as a different class from the other Labradors.

This clustering is doing one of the things that unsupervised learning in particular can do in a very useful way, which is to introduce insight in an undirected way. The way we used computers for a long time is like 1950s management: you had a manager who was the boss and said, you will do this and you'll do it like that, and the workers just did what they were told — and computers were like that. Unsupervised learning is a little more like a modern office, where you want your staff to have ideas of their own, to contribute, and to have a discussion with you. Here it's really obvious to us as humans that what it's found is dog breeds. Sometimes I do this example with the last 60 days' closing prices on a bunch of assets in the financial markets, and you get clusters in just the same way. What are the "breeds" of asset behind those numbers? I don't know — that's something for us to think about and do further research on — but the fact that you get a group that has something in common within the space of financial assets is already an interesting insight. And it may be a useful one. There's a leap of faith that says if I've got two dogs that are close to each other, maybe they'll behave in a similar way — that's probably a bit of a stretch for photographs — but it may well be that in financial markets, having assets that sit in a cluster helps us use some of them to predict the behaviour of others. Investigation and validation would be needed to know whether such a leap of faith was correct, but starting with the clusters is a good first step.
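A minimal sketch of those unsupervised steps, assuming dogPhotos is a list of imported, unlabelled images and that bassetA, bassetB and chihuahua are three of them; the function names are the standard ones, and the comments describe the typical rather than guaranteed outcome.

    fe = FeatureExtraction[dogPhotos];        (* learn a reduced numeric representation from the photos *)
    fe[bassetA]                               (* one photo as a short vector of learned features *)

    (* similarity as distance in the learned feature space *)
    EuclideanDistance[fe[bassetA], fe[bassetB]]
    EuclideanDistance[fe[bassetA], fe[chihuahua]]   (* typically a bigger number *)

    (* squash everything down to two numbers and lay the photos out by those numbers *)
    FeatureSpacePlot[dogPhotos]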
OK, now we end my utopian bit of the talk, because I'd love to be able to say: you don't need to know anything, we've done it all for you, buy Mathematica and you're done. When you get to high-dimensional output, we're still not at the phase where you can just point the machine at the data and magic happens, and of course in pathological cases and with trickier data there's still work to be done as well. In this space of high-dimensional output there are things we need to contribute in order to make it do the right thing. There are notions of what it means to be close: with a number, we have a very obvious measure of whether it's close to the number you wanted it to be, but if you're trying to make a picture, measuring whether that picture looks the way you want it to is actually a complicated task in itself to define, so we need some kind of language to express that. Different neural networks have different characteristics — some are very good with images, some are very good at predicting sequences — so there needs to be a language to describe the characteristics we're trying to achieve, and therefore a way to express that to the computer. You need some kind of programming language for neural networks, and for now I don't think there's really much getting away from that.

But let's not be too depressed: there is plenty we can do. Let me give you the basic concept again, for those who aren't familiar. The idea of neural networks is that they're basically just fitting again, but they're fits that feed more fits that feed more fits, and in theory they get more and more abstract. In that image recognition, the bottom might be fitting for corners and lines, the next bit might be fitting for shapes using the corners and lines, then the shapes might be used to decide whether it looks like certain objects, and then you might have something that takes those concepts and decides whether it's a bottle of water or a glass or whatever, based on the layers below. The basic idea is that you mix linear and nonlinear layers alternately. Vaguely speaking, the linear layers are just matrix multiplication — it's just fitting — and the nonlinear layers are there to make things complex enough that something interesting happens, because if you just multiply a bunch of matrices together, the result is just another matrix and it's never going to do anything interesting on its own. By mixing in things that are a little less straightforward, interesting features that can be trained pop out. And then there's a collection of special layers with particular characteristics, which is where your expertise needs to develop if you want to code these things from scratch. But we can still do a lot to automate, so even though you can't entirely get away from that language of neural networks, there is quite a bit that can be done.

So first, let's take a really simple example here — the first one I could think of that would do something. We have a symbolic language, just like when we type equations in and ask it to solve them or do integrals; we have a language here to describe neural networks. This network is a linear layer, which is basically a 10×10 matrix, then a nonlinear layer that applies the tanh function, then another linear layer with some other number of nodes — I rather picked these numbers randomly; they seemed to work — and then we go down to 1, so that we can map the output to a scalar, and we tell it the input is going to be a scalar too. So: take one number in, do a bunch of magic, one number out. And then NetInitialize says: the weights, the things that we want to learn — make them up, pick some random ones.
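Here is a minimal sketch of that little chain, run right through to the sine-fitting training step that's walked through next; the four layers follow the talk's description, and the training data is simply points sampled from the sine function.

    net = NetChain[
      {LinearLayer[10], Tanh, LinearLayer[20], LinearLayer[1]},
      "Input" -> "Scalar", "Output" -> "Scalar"];

    random = NetInitialize[net];              (* make up the weights: effectively a random function *)
    Plot[random[x], {x, -3, 3}]

    (* some inputs and outputs from the sine function, then let training adjust the weights *)
    data = Table[x -> Sin[x], {x, -3., 3., 0.05}];
    trained = NetTrain[net, data];
    Plot[{Sin[x], trained[x]}, {x, -3, 3}]    (* the trained net as an approximation to sine *)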
So if we do that, we've now got this thing that represents the network. It hasn't done anything yet apart from filling in some random weights that are buried inside it, but what it effectively is, is a random function. Here's the function I just created; if I did this again I'd get some completely different random function. Let's plot that again — not very different; sooner or later I'll get something more interesting... OK, there's something a bit more interesting. So: random functions, but buried in there are a bunch of numbers — this 10×20 matrix, I think it is. In fact, we have one number coming in, so there's some plumbing: to get from the 10 layer to the 20 layer, this one actually needs to be a 10×20 matrix, and then we need a 20×1 matrix to get down to one output. That kind of inferred dimensionality is automated for us; we don't need to worry about the plumbing, we just say: here's what our four layers are.

And now we train it, just like in the pipeline we had before, when it was fully automated. I'm going to take some data here, which is the sine function — here are some inputs and outputs, just like we saw in the really simple case — and train the model using those numbers. It'll think about it for a few seconds. This line at the bottom is the error, so we want it to go down; at some point it may go flat, when it stops learning anything. It's still doing well, so I'll leave it to go, and what we end up with, once trained, starts looking like a decent approximation to the sine function. Here I've just taken one number in and one number out — not the high-dimensional things I said neural networks were for — but it's something where you can kind of see the entire process. So it's just like the fully automated case, except that I had to get involved in the network: I had to define it, in exactly the way I said at the beginning was the bad thing about models, the thing we want to try to avoid. But there's no real avoiding it once you get into the neural network space.

Now, the collection of layers available is quite large. Let's make this a bit bigger — now I haven't got them fitting on screen properly without line breaking — but here's a whole bunch of layers, and they have special purposes, and this is where you need to start building up expertise if you want to synthesise these things from scratch. Knowing that a gated recurrent layer is useful in sequence prediction is part of that education. But I promised you could do AI like an expert without having to study hard, so let's ignore the knowledge you could acquire and ask: what can you do if you aren't going to learn all these different layers in the big neural network game? Well, there are a couple of things we've done to make that easier. One is that we're building a big repository of ready-to-go, ready-trained neural networks. You can visit it on the web — it's called the Wolfram Neural Net Repository — and there are nice web pages describing each one, but when I'm working within the Wolfram Language, the idea is that it's plumbed into the language, so I can just go and help myself to a network and reuse it. So I can take an existing network — here I'm going to get the LeNet network trained on MNIST data, slightly more complicated than the one I had before: this one's got 11 layers, and you can see convolution layers and pooling layers — and we don't really need to care what they do, because we can use this model straight out of the box and ask it to predict what these pictures of handwritten digits are. It's saying here that the first digit looks like an 8, then a 0, 4, 1, 6 — it's got them all right in this case.
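A minimal sketch of pulling that ready-trained net out of the repository and using it out of the box; the model name is the repository's, and digitImages stands for your own list of handwritten-digit images.

    NetModel[]                                 (* list the names of the models in the repository *)

    lenet = NetModel["LeNet Trained on MNIST Data"];
    lenet[digitImages]                         (* e.g. {8, 0, 4, 1, 6} for images like the talk's *)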
So there's a big collection of these models, and we can help ourselves to something more complicated. For example, I could take the net model of — let's say the one I showed at the beginning, the image identifier — and go back and get a picture. Let's have a picture of a tiger, copy that image, and apply the model to it, and we can see it's making a prediction of a tiger. We don't have to concern ourselves with the fact that this particular model is really quite complicated: there are about 300 layers in this thing — 20 outer layers, but some of the inner layers are themselves complicated layers with lots of sub-layers in them. We don't have to synthesise that thing; we can just take it and use it.

But we want to do more than that. We also want to be able to do new things, not just the out-of-the-box things that other people have solved for us. One of the things we can do is take those models — and because we have the full specification, just like the one I wrote myself but better — we can retrain them. I can take that MNIST net we had a moment ago and give it this Arabic digit here — it's a 7 — but the network was only trained on Latin characters, so it doesn't know that; it doesn't know how to recognise it. It does its best: it thinks it looks like a 4 that's fallen over and doesn't have the vertical stroke, or something like that. But we can repurpose that network with a different training set. So here I'm going to take the network we just grabbed, which was pre-trained, and train it with a new dataset, which is going to be some Arabic digits from 1 through to 9. It's a small training set, so I'm not going to give it too long — that should be enough, let's stop it early — and now, when we take this newly trained net, you can see it's making a prediction that is more useful. I'm actually cheating here, of course, because I'm doing exactly what you shouldn't do with validation: the example I used for training is the one I'm reusing here, and you really need more than one example of each digit to do a decent job. But this idea — that you could take something like the image identifier and retrain it for cancer cells versus non-cancer cells, instead of water bottles versus glasses — is just a retraining exercise. It's a case of finding the network that is nearest in purpose to the one you want and seeing if you can just hit it with new data.

We can also adapt a network to a new task. Now, this does take some knowledge, but a lot less knowledge than synthesising a network from scratch. Let's go back to this existing network — not that one; the character recogniser, this one here. These layers are targeted at recognising the digits 0 through 9, and that's really buried in the last couple of layers. See, this last layer here is a linear layer of dimension 10 — 10 being the 10 classes we're going to end up in — and then there's a softmax layer, which is sort of a probability layer: it turns the features into something that adds up to 1, so that if we ask what the chance is that this is an 8, a 7, a 6 or a 5, those numbers will add up to 1. Those two layers, and the decoder that says how to go from some numbers to something we present, are built into the network for the purpose it was designed for. But if we want to repurpose it, we don't have to rewrite the whole thing, and we don't even need to throw away the training it's already had: we can adapt the network with a little bit of surgery. Because we've got a symbolic object for the network, it's just another piece of data, and I can program operations on that data just as if I were swapping the numbers in a matrix or colouring a picture — I'm just changing the object. So here's my basic surgery: I take the first 9 layers, and instead of the 10 and the softmax, I've put in a 2 and a softmax, so that I only have two classes of output; and then I've come up with a custom decoder that says: when we finally get to some numbers, interpret them as one of two classes, Arabic or Latin. That's my surgery, and it has taken it from being a character recogniser to a language recogniser. So let's do that surgery, and now I'm going to train it with some new data, which has the various digits we've seen before, but now labelled in a different way: this character goes to Arabic, and another character goes to Arabic — we're only interested in these two classes now. If we give it a few seconds to think about that — let's again stop it early — then hopefully, when we ask it what it knows about those characters, it's got its predictions: it says the first ones are all Arabic characters and the last ones are all Latin characters. And I can drill into that softmax layer and ask for the probabilities. I'm reusing the training data, which is why it's so incredibly confident that it's definitely, definitely an Arabic character — because it's actually seen that exact example before.
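A minimal sketch of that surgery, under the assumptions that the talk's setup suggests: the layer counts follow the talk, scriptExamples stands for your own list of image -> "Arabic" | "Latin" training rules, and someDigit for a test image; the exact way the pre-trained sub-net carries its image encoder may need adjusting in practice.

    lenet = NetModel["LeNet Trained on MNIST Data"];

    langNet = NetChain[{
        NetTake[lenet, 9],        (* keep the first 9 layers: the features it already learned *)
        LinearLayer[2],           (* 2 classes instead of 10 *)
        SoftmaxLayer[]},          (* turn them into probabilities that add up to 1 *)
      "Output" -> NetDecoder[{"Class", {"Arabic", "Latin"}}]];

    trained = NetTrain[langNet, scriptExamples];
    trained[someDigit]                          (* "Arabic" or "Latin" *)
    trained[someDigit, "Probabilities"]         (* drill into the softmax probabilities *)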
So what is the human's role in all this, then? We still have a little bit of work to do in telling it things like the neural network structure, but the first role for us is to see the opportunities. We've got to make the conceptual leaps, to say what it is we're trying to fill in, and what prediction means in this context. We really have to worry about the data that comes with it. We also want to be able to focus the AI: sometimes we can help, because we know something it doesn't that we can start it off with. I think I've got time for just two more examples.

Here's an example of focusing the AI. I've got some text here that I was classifying as being about either a cat or a dog, and what we want, of course, is for it to spot that the word "cat" is significant. As a human, I can make this dataset more powerful by telling it, first of all, that we're not interested in individual letters — there are a lot of letters in there, and maybe Cs are important, maybe Ds are important, but we can tell it we care about words, and that gives it a head start. I can then tell it that umlauts and accents don't matter, so that the ö here, in this "dög", is treated just like another letter. And we can say that case doesn't matter: I don't care whether "dog" is capitalised or not. As a result — if I hit the right key there and run this; it takes a second to build — I can give it some text it's never seen before, with an umlaut it's never seen before, and maybe one with a capital G but a small D (I don't think there are any examples of that), and it can still make a reasonable prediction, because I've helped it to focus. So if you know that in all the photographs you're doing image processing on, it's the shape you care about and not the colour, then strip the colour out: help it not to have to learn that for itself, and it makes small amounts of data go further.

You also — and this is a data question — have to worry about artificial stupidity. It's very much about thinking about the right question and whether the data supports it. Here's my toy example. I've got some heights of people, and whether they're male or female: this 1.82 m person is male and this 1.60 m person is female. What I want to do is say: let's predict, if we've got somebody who's 1.6 metres tall, what gender are they? And it says male. Now, that seems like a slightly surprising result, because the only example in the data at 1.6 was labelled female, and it seems not to have learnt anything useful from that. Well, if we look at what it thinks about different heights — the probability that somebody is female, for different heights — you can see that while it does make a significant shift around the 1.70 m point, the probability only varies between about 0.4, so 40%, and about 8%: it never actually thinks they're female. The reason is that, if we go back to the data, the data was horribly unbalanced: I've got one, two, three, four examples of male and one of female. It doesn't know about the world, so as far as it's concerned the world is full of men, and of course it's going to predict male, because all the evidence says it's unlikely to be female. Now, this is very contextual again: if this was data collected in some kind of men's club, or the male changing rooms at the swimming pool, then maybe that's the correct assumption, and in fact we're getting the right prediction out. If it's supposed to represent the world as a whole, then we have to do something about it. There are all kinds of things in the details you can get involved in here; what I'm going to do is ask the same question, but give it some prior knowledge that says: the background, before you look at the data, is that men and women are equally distributed. Now that I've done that, it's able to adjust its probabilities and make a prediction that, given the data it's got and given that background knowledge, this person is most likely to be female — by about two-thirds.
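Minimal sketches of those two kinds of human help; the tiny datasets below are made-up stand-ins for the talk's examples, the preprocessing is an illustrative way of expressing "care about words, not accents or case", and the comments describe the expected rather than guaranteed behaviour.

    (* 1. Focusing the AI: strip out what we know doesn't matter before training *)
    focus[s_String] := ToLowerCase[RemoveDiacritics[s]];
    textData = {"my cat sleeps all day" -> "cat", "the dog chased the ball" -> "dog",
       "the cat ignored me" -> "cat", "a dog barked all night" -> "dog"};
    textClassifier = Classify[MapAt[focus, textData, {All, 1}]];
    textClassifier[focus["the Dög barked"]]          (* umlaut and capitals no longer trip it up *)

    (* 2. Artificial stupidity: unbalanced data, with and without a prior about the world *)
    heights = {1.82 -> "male", 1.75 -> "male", 1.60 -> "female", 1.78 -> "male", 1.85 -> "male"};
    unbalanced = Classify[heights];
    unbalanced[1.6, "Probabilities"]                 (* leans male: the data is four to one *)

    balanced = Classify[heights, ClassPriors -> <|"male" -> 0.5, "female" -> 0.5|>];
    balanced[1.6, "Probabilities"]                   (* with equal priors, the 1.6 m example counts *)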
So that's the role of the human: to think about where the data is going to mislead, what the questions mean, what the answers mean, and to worry about how much validation we need to make it all work.

Let me wrap up — I think I've got about a minute for questions after this — by saying that, in the end, our mission in AI is to automate that pipeline, but it sits in the context of automating all of computation, not just machine learning. I'd love to talk about that more: having a world of image processing and signal processing and graph theory is all very important for preparing the data for the AI. Our mission really is to empower you, whether as experts or as beginners, to be productive, and hopefully I've made that case for AI here. You can download the talk, or play around with Wolfram technology, at the links I've put up on screen. Thank you very much. Are there any questions? I have a minute forty, so maybe two questions, or one. No? Okay, well, thank you very much.