Okay, so this morning we're going to talk about neural networks. For those of you who don't already know how awesome neural networks are, I've prepared a few examples to show you. So this is SOMPY. Full disclosure, I'm a contributor to this project, though mostly I just updated it to Python 3, so I'm not a major contributor. A self-organizing map is pretty cool, and you can do a lot with it — here are some examples. Primarily it's used to take things from high-dimensional space down to low-dimensional space, usually something like seven dimensions down to two, so that we can visually inspect it. This is an example of training a self-organizing map on a couple of variables; we train it, then in this view we're looking at something like 20 different variables. Here's a toy example that just shows some clustering, and here's some more clustering — I believe this is k-means. You can see that there are clear clusters for the different points in the data, and they've been mapped down to this lower-dimensional space. It's a pretty cool example, reasonably accessible I think, and you can make really pretty maps with this thing, so I really like it — it's great for data exploration. If you're looking for the actual project, it's SOMPY on GitHub, under the sevamoo username. Here are some more examples, which are also really pretty; I like them quite a bit. You can also use it for things like path algorithms, which is kind of sweet. So neural nets can be used for lots of really cool things — this is just one example, let's go through some more.
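To make the "high-dimensional down to two dimensions" idea concrete, here's a toy self-organizing map written from scratch in NumPy. This is a sketch of the algorithm, not SOMPY's actual code or API, and the data, grid size, and decay schedule are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 3-D (think RGB colors).
data = rng.random((200, 3))

# A 10x10 grid of neurons, each holding a 3-D weight vector.
grid = 10
weights = rng.random((grid * grid, 3))
coords = np.array([(i, j) for i in range(grid) for j in range(grid)], float)

for step in range(2000):
    frac = step / 2000
    lr = 0.5 * (1 - frac)                     # learning rate decays...
    radius = max(grid / 2 * (1 - frac), 1.0)  # ...and so does the neighborhood

    x = data[rng.integers(len(data))]
    # Best-matching unit: the neuron whose weights are closest to x.
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
    # Pull the BMU and its grid neighbors toward x.
    grid_dist = ((coords - coords[bmu]) ** 2).sum(axis=1)
    influence = np.exp(-grid_dist / (2 * radius ** 2))
    weights += lr * influence[:, None] * (x - weights)

# Every 3-D point now lands on a 2-D grid coordinate.
mapped = coords[np.argmin(((weights[None] - data[:, None]) ** 2).sum(-1), axis=1)]
print(mapped.shape)  # (200, 2)
```

Each original point ends up with a 2-D grid position, which is exactly what makes the pretty cluster maps possible.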
I'm sure you're all familiar with the classic word2vec thing. For those who may not know, word2vec is actually a neural net — some people talked about it in their talks yesterday, but you may not have heard. Basically what you do is train a network, pass everything up through the prediction layers, do all the neural network pieces, and then you just rip off the last layer — and somehow this magically turns into a vectorization you can do really cool math with. For example, you can measure the semantic similarity between "woman" and "man"; the vectors carry a semantic sense. We also saw a talk yesterday about a tool someone made that took music and turned it into vectors, so you can actually do this on any semantic data: images, audio, text, video. I think video, or summaries of video, would be a really cool application, but you can also use it on really weird data, which could be very interesting. Another good example is facial recognition, which is sort of classical. OpenFace is the example I came to — it's a really good high-level interface for doing facial recognition, and I'm pretty sure it's a CMU project; at least they reference CMU a couple of times. Here's an example actually using the library. It's just their demo, but it's extremely intuitive, really well structured, and gets pretty good results: Steve Carell, Amy Adams, John Lennon. You can see the accuracies — oh, it thinks that one is Steve Carell — but it does pretty well on most of these people. I think it depends on the number of training examples, and you can see here how to run the example.
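The "really cool math" on word vectors is mostly dot products and cosine similarity. Here's the famous king − man + woman analogy on hand-picked toy vectors — real word2vec embeddings are learned and have hundreds of dimensions, and these two made-up dimensions (royalty, maleness) are purely illustrative:

```python
import numpy as np

# Hand-picked toy "embeddings" with two made-up dimensions (royalty,
# maleness). Real word2vec vectors are learned, not chosen by hand.
vecs = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([0.5,  0.2]),
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of the lengths.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The classic analogy: king - man + woman should land near queen.
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vecs[w], target))
print(best)  # queen
```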
So anyway — facial recognition. Neural nets are used for quite a few things; that's sort of the point. Okay. So does everyone feel sufficiently motivated, if you didn't already know why neural networks are awesome? Yeah or nay? I have more examples if people want. Okay. So what is a neural net? Loosely, it's just a set of techniques for doing mathematical modeling. There are some clear steps that happen every time you build a neural network, but beyond that you can pretty much swap out all of the algorithms for something else — even stochastic gradient descent, which is the centerpiece, and we'll go into it in detail if you don't know what that is already. [Pauses to tweet out a link to the slides; some back-and-forth about font size.] Alright, you're on your own now if you didn't get that. Why is there a picture of Trump? There's always a picture of Trump — he's the running joke of this conference, I'm sorry. Anyway: a mathematical description of things. I'll start over, assuming you all remember what the examples looked like. A neural network, loosely defined, is a set of techniques for creating mathematical models.
The reason I like to think of it this way is that it makes neural nets feel less like a shiny tool that's magical and different and special and fantastic, and more like yet another tool in our toolbox. For those of you who don't know, there's linear regression, there are support vector machines — there are N-many tools for doing regression or classification. The reason neural networks are the new hotness is a really innovative technique that lives within them called back propagation, which we'll see in a few minutes. Without back propagation, neural nets are pretty much the same thing as everything else, except they're chained models. But back propagation was this extremely well-received idea, and it's been super pervasive, especially once we got GPUs. So basically a neural net is a mathematical model — hopefully I don't have to motivate why mathematical models are cool, but they are. I like to think of neural nets as frameworks, like I said, because all the techniques are pretty much swappable. The way you work through a neural net is: you start with the initialization, which is how stochastic gradient descent starts; then you do the forward propagation; then you do the back propagation. That's pretty much it. Everything else can be swapped out — the specific type of gradient descent can be swapped out, you don't even have to use gradient descent necessarily, and even the way you structure the network can differ, which is why you have recurrent and convolutional neural nets. Anyway, enough of that. Let's talk about how to interpret a neural net, which I think is the real meat of this talk and what you should take away at the end.
I want every single person in this room to not only understand what a neural network is, but actually be able to explain it and then implement it on their own, and I think the best way to do that is to build the right intuition. Okay, so there's this idea of being linearly non-separable, and this is an example of a linearly non-separable set of data. If we try to classify it with a straight line, we can't — we can't use vertical lines, we can't use anything; no straight line will completely separate these two sets. However, we can use a series of nonlinear transformations so that a straight line does separate them, and through the composition of these nonlinear transformations — that's pretty much how a neural network gets better predictions and better results. We're doing all of these matrix operations to accomplish this, and we'll look at what matrix operations are in detail if you don't know already. This slide is a reference to a post explaining the topological background and giving further insight, but the big thing to take away is that you can do nonlinear transformations, and there's this topological view of neural nets that is really pervasive and intuitive and lets us draw straight lines through data. So what we're really doing is linear classification — nothing too crazy. Now that we've seen that, let's look at an example of how a neural network can be used. This dataset is clearly very nonlinear. I believe this is going to work — I'm not 100% sure, because it's stochastic in nature, so it's not always going to succeed. What I'm going to try to do right now is learn the orange and the blue. Okay, this is doing pretty good, this is doing pretty good.
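The "nonlinear transformation makes the data linearly separable" idea can be seen concretely in a few lines. This is a made-up example, not from the talk: two concentric rings can't be split by any straight line in the plane, but adding x² + y² as a third feature lifts them so that a flat (linear) cut separates them perfectly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes: an inner disk and an outer ring. No straight line in
# the (x, y) plane separates them.
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
r = np.concatenate([rng.uniform(0, 1, n // 2),   # class 0: inner disk
                    rng.uniform(2, 3, n // 2)])  # class 1: outer ring
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.r_[np.zeros(n // 2), np.ones(n // 2)]

# Nonlinear feature map: append x^2 + y^2 as a third coordinate.
phi = np.c_[X, (X ** 2).sum(axis=1)]

# In the lifted space the classes are split by the flat plane
# x^2 + y^2 = 1.5^2, so a *linear* rule classifies perfectly.
pred = (phi[:, 2] > 1.5 ** 2).astype(float)
print((pred == y).mean())  # 1.0
```

A neural network's layers do essentially this, except the network learns the transformation instead of us choosing it by hand.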
And done. With 118 iterations and the right tuning parameters, I was able to get a pretty good neural network that classifies this spiral dataset. This is super hard for something like linear regression to classify, but by using a two-layer neural net — seven neurons in one layer and three in the second — I was able to do it. You can actually see intuitively what's happening: these are all nonlinear transformations of the data, and what it ends up with is this wonderful picture that gets us what we want. So let's see if this model generalizes — let's try it on some other things. Okay, it does a pretty good job there, and there. I could have trained it to full accuracy, but you get the idea. Now let's think for a second, though: is this just going to magically work with any set of nonlinear transformations, or does it actually matter how we set it up and trained it? Let's mess with some of this stuff. Well — okay, so now it's basically not working, right? The point I'm trying to drive at is that neural networks are still in large part an art form. There are actually new techniques that came out this year for tuning these hyperparameters that live up here, and we'll walk through most of what they are. Basically: an activation function is the nonlinear transformation you apply to your data — that's what's happening in each of these layers. Regularization penalizes large weights, forcing them down so the model overfits less, and the regularization rate says how much regularizing we're going to do. The learning rate controls how big a step we take on each update.
And then there's the problem type — classification here; we could also have done regression, and then it would output a number. And there's a whole bunch of other stuff we can tune, but basically this is just a lot of knobs and levers. That's really what's happening when you train neural nets: you're pulling at these levers and somehow using this art to reach really interesting conclusions. Of course, the solution I presented above is not the only way to train a neural net and get results. This one also works — it's completely different, and yet, okay, it still reaches the same solution. Not as quickly, as you can see — it's trying super hard... come on, come on, yeah! And it gets there. So it's a completely different model, but it gets the same result, and figuring out how to actually get there can be a challenge. Okay, so neural nets are hard. I think it works better on this configuration — I don't remember, you'll just have to believe me — but it can be really rewarding and a lot of fun too. I'd highly recommend playing around with these things and being a scientist about it. Alright, here's the real conclusion: neural nets are fickle. Unlike simpler models like linear regression or decision trees, neural nets require a lot of care; you have to really try to get them to do what you want. And it turns out they're not that parsimonious. I didn't really understand how the heck it figured out that spiral — I saw the pictures and it was still hard — but I think by the end of this talk we should have some intuition about how that happened.
If you think about these nonlinear projections and transformations and then apply a linear regression layer at the end, you can actually get quite a bit of information out of this — at least something parsimonious — and that's a new area in the field, so hopefully you'll be able to drive at that by the end. There are some follow-up questions here, things you should experiment with. I figure this isn't too mean, because you're just tugging at wires in a browser, not writing any code, so I'm not forcing you to do anything cruel — but think about them if you have the time. Okay, let's make our own neural network; that's why you're all here. Now that I've given you some of the intuition, we're going to actually build the damn thing. The first thing you need to know is the derivative. How many people took calculus in this room? Pretty much everybody — okay, not everybody. So, a super quick introduction to calculus. Calculus is basically the ability to take the instantaneous rate of change of a function. Give me one second — does everyone remember the equation for a line? It's y = mx + b, and the m there, everyone knows, is the slope. A derivative is pretty much just the slope at a point — that's it, a way of calculating the slope at a point. Everyone cool? Yeah? Alright, great. There's this rule called the power rule, and you can use it to figure out an equation for the slope at each point of a curve, which is pretty neat. So if we look at f(x) = x to the second power, the derivative is 2x to the first.
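You can sanity-check the power rule numerically. This little helper — my own illustration, not from the slides — estimates the slope at a point by taking a tiny step on either side:

```python
# Numerically check the power rule: the derivative of x^2 is 2x.
def derivative(f, x, h=1e-6):
    # Central difference: slope of the secant through x - h and x + h.
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2
for x in [0.0, 1.0, 3.0]:
    print(x, round(derivative(f, x), 4))  # slope comes out roughly 2*x
```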
The general rule is that x to the n becomes n times x to the n minus 1. Okay — matrices. Since there are some people who didn't take calculus, I'll assume not all of you took linear algebra either. Linear algebra is super awesome. It's primarily devoted to vector spaces, the idea of matrices, matrix multiplication, and a lot of other fun stuff. It was probably my favorite class in undergrad, and subsequently why I studied algebra in grad school. The main aha moment is that you can take a system of equations like this and just write it as a matrix — this is the computer-science-y form of it, in LaTeX. Basically this entry is the 5 from here, this is the 6 from here, that's the 7 from there, and that's the 4 from there. Then you can do things like the dot product. Say these are 1-by-2 matrices — one row and two columns; you multiply across element by element and sum, and this generalizes to the n-dimensional case. There's another identity: the dot product of two vectors equals the length of a times the length of b times the cosine of the angle between them — |a||b|cos(θ) = a·b. So when you're doing neural networks there's this intuitive notion of figuring out where you are and then adjusting by some distance, and this is its geometric interpretation. You can think about matrix multiplication this way, as changing direction — doing these transformations, sort of like we did above. And that's where stochastic gradient descent is going to come in.
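Here's that identity checked numerically, using the 5, 6, 7, 4 entries from the slide as two 2-D vectors (the pairing into vectors is my arrangement, not necessarily the slide's):

```python
import numpy as np

a = np.array([5.0, 6.0])
b = np.array([7.0, 4.0])

# Algebraic dot product: multiply elementwise and sum.
algebraic = a @ b  # 5*7 + 6*4 = 59

# Geometric version: |a| |b| cos(theta), with theta measured directly
# as the difference of the two vectors' angles.
theta = np.arctan2(6, 5) - np.arctan2(4, 7)
geometric = np.linalg.norm(a) * np.linalg.norm(b) * np.cos(theta)

print(algebraic, round(geometric, 6))  # both come out to 59.0
```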
So we're going to look at that now: Newton's method. You could claim that about half the real power of neural networks came from Newton's method, because the gradient-descent style of update was first popularized by this method. The basic idea is pretty simple. You start with some function f and some guess x0, and you're trying to find the zeros of f — the places where f of x equals zero; such an x is called a zero, or root, of the function. You start off with any guess — in this case my guess was just a random number between zero and a hundred — and then you keep refining it: I take the current guess and subtract f at the guess divided by the derivative there. So I'm discounting by the rate of change, and that's basically how it works. And here's why it's stochastic in nature — this part is like gradient descent, and "gradient" is just the multivariable word for derivative — it's stochastic because I initialized with a random value. I'll still get to a zero, but not necessarily the same one each time. If we ran this code, we'd find there are two zeros for this function, at x = 0 and x = 1, and we'd get them stochastically by running this many times until we collect all the roots. Anyway, that's the gradient-descent idea. Hopefully the algorithm is pretty clear — maybe not why it works, but that it works and roughly how. Does this feel somewhat clear to people? Yay, nay? Okay, great. Moving onward and upward.
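A minimal sketch of that loop — the function here, with zeros at 0 and 1, is my stand-in, since the talk's actual f isn't shown in the transcript:

```python
import random

# f(x) = x^2 - x has zeros at x = 0 and x = 1.
f = lambda x: x ** 2 - x
df = lambda x: 2 * x - 1  # derivative of f

def newton(guess, steps=100):
    # Repeatedly discount the guess by f(x) over the rate of change f'(x).
    # (A start of exactly 0.5, where f'(x) = 0, would fail.)
    for _ in range(steps):
        guess = guess - f(guess) / df(guess)
    return guess

random.seed(42)
# Random starting guesses between 0 and 100 -- the "stochastic" part.
for _ in range(3):
    x0 = random.uniform(0, 100)
    print(round(newton(x0), 6))  # starts above 0.5 converge to the root at 1
```

Starting below 0.5 would instead converge to the root at 0, which is exactly the "same algorithm, different zero" behavior described above.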
Here's an explanation of pretty much everything I did there, and there's a graph we can look at if we're interested; I'm not going to go through it in detail. Alright, so we're ready now to talk about neural networks. We have all the background material, maybe not in depth, but at least to some degree. This is actually a neural network. Surprise — they're not that complex. This is a single-layer perceptron network, and it does pretty much everything you would need. I stole this straight from Andrew Trask — he's super awesome and made a really great blog post series, highly recommended, that takes you through neural networks from scratch. However, it doesn't generalize the code, so that's my contribution: I wrote a general implementation of a neural network based on his code that can be made arbitrarily deep. Okay, so let's get started. The first thing we do is start with two matrices: the input matrix and the output matrix. This is like the X values for our function, and this is like our output values. Then we create some synapses — these are our initial guesses, the stochastic part of our stochastic gradient descent. Then we use something called forward propagation, and basically what we're doing is applying this sigmoid function — a super weird-looking S-shaped curve, it's just this — and chaining these so-called layers together. Let me go back to my picture of a neural network so it's super clear exactly what I'm doing. Give me one second. Alright, so the layers are these boxes. These boxes are all layers — this is the input layer.
This is the first layer, this is the second layer, and this is the output layer. Let me scroll back down. Okay, so these synapses, these randomly initialized things, are actually all the lines in between the boxes. They're not that scary — they're just matrices of a certain size. If you don't know, matrix multiplication isn't commutative, so you have to pre-set the sizes of your matrices so they can interact with each intermediate layer. Your synapses are the lines between each transformation layer, and each pair has to match up. That's why I pre-set these to 3-by-4 and 4-by-1: the first has to be 3-by-4 to match the input, and the second is 4-by-1 because the thing it receives is a 4-by-4 matrix. That should be super clear — just matrix multiplication. Alright, so then this is back propagation; we're pretty much ready. Here what we're doing is looking at y, the target output, and subtracting l2 from it, then multiplying by the derivative of the sigmoid. Basically this is the error at the last layer: we're seeing how far our guess was off from the actual output. That's what l2 is — it's our guess. Remember, y is just a constant, the thing we defined above, what the output actually should be. So this difference is our error term, and multiplying the error term by the sigmoid derivative gives us the delta — the correction derived from the difference between l2 and y.
Then we take the error from the previous layer and dot it with synapse one — the thing that gets us back to here — and multiply that by the sigmoid derivative. Then we update our synapses accordingly, by multiplying each one's input layer, transposed, by the amount we were off. So this is how the propagation happens forward and backward: we're just doing matrix multiplication and taking a derivative. Where exactly the derivative happens may not be super clear yet — it'll be clearer in the general version — but at this point you should see the steps somewhat clearly: there's initialization, forward propagation, and then back propagation. That's it. That's all there is to a neural net. So if we generalize this code out, the initialization step becomes create_connections. Here I'm creating my neural network — I just think this function is clean — and basically it creates N-many layers, which is the real contribution I'm bringing (there'll be further things in a second). By creating an N-sized neural network, we can make our networks arbitrarily deep. This particular code doesn't let us make them arbitrarily wide — there's another iteration of the code that does — but it lets us set the number of hidden layers and make it arbitrarily deep. And depth turns out to be what matters: the number of nonlinear transformations you take — the number of hidden layers — is the thing that lets us build ever more complex models.
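Pulling those three steps together, the whole two-layer network just described fits in a short NumPy script. This is a sketch in the style of Andrew Trask's example, with the 3×4 and 4×1 synapse shapes from the slides; the specific data, seed, and iteration count are mine, not from the talk:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(out):
    # The sigmoid's derivative, written in terms of its own output.
    return out * (1 - out)

# Input: 4 examples, 3 features each; target: the first feature.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

# Initialization: random synapses -- the stochastic part.
rng = np.random.default_rng(1)
syn0 = 2 * rng.random((3, 4)) - 1  # 3x4: input -> hidden layer
syn1 = 2 * rng.random((4, 1)) - 1  # 4x1: hidden layer -> output

for _ in range(20000):
    # Forward propagation: matrix multiply, then the nonlinear transform.
    l1 = sigmoid(X @ syn0)   # 4x4
    l2 = sigmoid(l1 @ syn1)  # 4x1

    # Back propagation: error at the output, scaled by the derivative...
    l2_delta = (y - l2) * sigmoid_deriv(l2)
    # ...then pushed back through syn1 for the hidden layer's share.
    l1_delta = (l2_delta @ syn1.T) * sigmoid_deriv(l1)

    # Update each synapse by the transpose of the layer feeding it.
    syn1 += l1.T @ l2_delta
    syn0 += X.T @ l1_delta

print(np.round(l2.ravel(), 2))  # predictions close to [0, 0, 1, 1]
```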
For some reason, width doesn't really matter that much — at least, that's what the paper I read said; I didn't fully understand it, to be honest. Anyway, the next thing is forward propagation. First we set up the actual sigmoid function, with something that lets us compute its derivative. This is probably not the best way to do it — I should probably just have a nonlinear function paired with its derivative — but whatever. So forward propagation becomes this. Note we haven't done the derivative yet; we don't do that until back propagation, let's be clear on that. We look at all the layers and iterate through each synapse — just so we're clear, the synapses are the line things. Then we update our layers by doing a transformation, a nonlinear transformation specifically: we dot the previous layer with the current synapse, and that gets us to the next layer, which is why we append it to layers. So we're doing layer i-minus-1 dot synapse, which gets us to layer i, and then we apply a nonlinear transformation to that. And that's it — that's all forward propagation is, at least in this scheme. Alright, now we're ready for back propagation. Strap in; this is about to get a little painful. It turns out back propagation is a little ugly. This is not the cleanest implementation I've seen — I meant to copy down the cleanest one I'd found and forgot. Basically the idea is that we apply the derivative at the delta layer. Right now I've got all these errors, and the errors start off as the target output minus the last layer. Okay?
So basically I'm taking the output and subtracting the last layer, and then each error gets multiplied by the derivative transformation — this is where the derivative actually happens; it happens once per layer, transforming the layer at position i. The product of the errors with the layer under this nonlinear derivative transformation gives us a way of getting closer and closer, bounding toward a solution. Everything else around this is not fluff — it's all important — because you don't just need the errors: you need the deltas, and then you need to take a dot product of the delta with the synapse, which is what was happening up here. The end case gets a little ugly, and then you have to update all your synapses, which is this step here. I know back propagation is a little ugly — it's always a little ugly — but this is about 20 lines; hopefully you can parse it if you look hard enough. And remember, it's just this picture: if you can understand this, then the code should not be impossible to understand. So what do we get? This becomes this, and this is pretty okay — I think anyone can understand it. Even if everything in here didn't make all the sense in the world, hopefully the idea is relatively intuitive: take our neural network, which is our synapses; do forward propagation over them to get our layers; do back propagation on our synapses and layers to get our updated network and the current error; and then check what our error is at each step. We do have some time, so I'm going to go through some demos.
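The generalized version might look something like this — a sketch of the create-connections / forward / backward structure just described, not the exact code on the slides; the function names, data, and layer sizes here are my reconstruction:

```python
import numpy as np

def sigmoid(x, deriv=False):
    if deriv:  # here x is assumed to already be sigmoid output
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def create_connections(sizes, rng):
    # One synapse (weight matrix) between each pair of adjacent layers.
    return [2 * rng.random((a, b)) - 1 for a, b in zip(sizes, sizes[1:])]

def forward(X, synapses):
    # layer i = nonlinear transform of (layer i-1 dot synapse i-1)
    layers = [X]
    for syn in synapses:
        layers.append(sigmoid(layers[-1] @ syn))
    return layers

def backward(y, layers, synapses):
    # Error at the output, scaled by the derivative, then walked backward.
    deltas = [(y - layers[-1]) * sigmoid(layers[-1], deriv=True)]
    for i in reversed(range(1, len(synapses))):
        deltas.append((deltas[-1] @ synapses[i].T) * sigmoid(layers[i], deriv=True))
    deltas.reverse()
    # Update every synapse by the transpose of the layer feeding it.
    for i, delta in enumerate(deltas):
        synapses[i] += layers[i].T @ delta
    return np.abs(y - layers[-1]).mean()

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T  # XOR-like target: needs hidden layers

rng = np.random.default_rng(0)
synapses = create_connections([3, 4, 4, 1], rng)  # two hidden layers
for _ in range(20000):
    layers = forward(X, synapses)
    err = backward(y, layers, synapses)
print(round(err, 3))  # mean error shrinks as it trains
```

The sizes list is the knob: `[3, 4, 4, 1]` gives two hidden layers, and adding entries makes the network deeper without touching any other code.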
But if anyone has any questions at this point — if anything was super unclear about what I did specifically — ask now; otherwise I'll do the extended version of the talk, because I got through the material faster than I thought I would. Any questions about what I went over? Any confusion? Okay. So this is, I think, a great example of a minimal neural net. If you want to learn more, I would highly recommend reading Andrew Trask's blog — that's the first place I would go. Then I would check out my references (those should be on three separate lines, sorry). This one has the best theoretical explanation of neural nets I've ever seen — I actually literally just stole that earlier picture from it because I think it's the most intuitive one. This one is word2vec kind of stuff. And WildML is really good — he really likes Theano a lot; he does some NumPy things, but he really likes Theano. The other thing I was going to recommend: sadly, Coursera has taken down a lot of their material, but they had a class on neural networks with Geoffrey Hinton, who was at the University of Toronto — now he works for Google, because that's what happens to all the badass artificial-intelligence super-monster guys. He gives a class on neural nets and it's super, super good. But if none of this made sense — if you're still scratching your head thinking this is hard — I'd highly recommend Gilbert Strang's MIT OpenCourseWare course on linear algebra. It's the best thing ever.
Well, I attended my lectures — and I went to a pretty good school for math — but I watched these videos; this is how I learned linear algebra. I'd also highly recommend probabilistic graphical models; PGMs are super important. This slide is not quite the right example — just search for probabilistic graphical models. That's basically the computer-science view of neural networks: more or less probabilistic graphical models. Computer science sees neural nets as a generalization of PGMs, although I feel like they're really a confluence of a lot of sciences coming together — linear algebra, calculus, physics. I feel like a lot of this came out of CERN and really hard science, and it can't be pointed to by one specific origin — but that's a personal perspective from watching the field evolve. Anyway, cool. Any other questions before I go into more fun stuff? Yeah — oh, great, yay, that's super great. Okay, a ton. So, yesterday someone did a talk on — okay, first I want to posit one thing: if you're going to build your own neural nets, you should be okay with it being kind of hard to get real accuracy. But I'd highly recommend it. Someone did a TensorFlow tutorial yesterday where they basically did handwritten digit recognition. And the OpenFace project I mentioned at the beginning of the talk has a bunch of data; you can just take that and use it for whatever. Basically, neural nets are really good for supervised learning, so anything that's supervised is going to be super useful here. But also: Kaggle competitions.
There are many of these, and they all have data sets that are cleaned and ready to go. You can also use the scikit-learn data sets, and I think pandas has some data sets too. This is an era of big data, so there's plenty to play around with. The nice thing about Kaggle competitions is that they'll actually have the neural networks all set up, so you can see what someone else did and then ask: can I do better with my own implementation? That's kind of how I got into this a little bit, and then it just got a little out of hand; that's how I'm speaking to you all now. Totally agree, yeah. So, if you all want, I can take you through some advanced features of neural nets; we can just go through some of the more advanced stuff. This is going to be off the cuff, so I apologize if anything I say is crazy. But before I do that, I want to show you one example of a neural net in practice doing something absolutely ridiculous. This is a neural net that someone else wrote that plays Mario; I didn't write this one. Let's just hit play and see what happens. Oh, okay, this is me playing. Hold on, I'm going to cheat for just two seconds, and then we'll actually look at the example. Okay, here we go: this is his example. Maybe we have to go to Advanced. That's right, I took the wrong one; I want this guy. It's a little scary how well this thing works. Oh, "could not find main class." That's why: I doubled up on my Java. Yeah.
Okay, cool. So this agent is a little terrifying. You don't want to run into this guy if he's a soldier out on a battlefield, because he will straight up own you to death. This is just an example of something I really enjoy doing: playing with these agents and watching them be terrifying and straight up own the world. Okay, enough of that. So that's actually what I'm going to show you right now: it turns out you can use neural networks in combination with another technique called Q-learning, which I'm going to explain now, so hold on to your hats. Let me get to it; different slides, one second. These are from another talk I gave about neural nets. Okay, reinforcement learning. One of the things you can use with neural nets is something called reinforcement learning. So far we've looked at examples of supervised learning: you have a lot of training data, then a lot of test data, and you can verify your results. Right? Well, what happens if you can't do that last little step of the backpropagation, because you don't actually have anything to check error rates against? How do you know how to update your model? Enter Q-learning. It's not technically unsupervised; it's its own category, called reinforcement learning, because you don't train it the way you normally would. What happens is that an agent is acted upon, it learns from positive and negative reinforcement, and then it takes actions based on that. We're going to see an example of this now; that's what this algorithm I've written up does.
(Unsupervised learning, by contrast, is when you don't have any training data at all and you're just finding patterns; it's more exploratory data analysis.) Okay, so, reinforcement learning. Let's check this out: choose action. We're going to model this on the idea that you're in a game like Mario, and things will happen to Mario: he's going to get stepped on, people are going to kick him, his dreams are going to be dashed, he'll never grow up to be what he wants to be. Sorry, I was trying to make a joke; it's way too early for me. Those are negative reinforcements. Positive reinforcement would be getting all the mushrooms, getting all the coins, getting the princess, kicking Bowser's ass, all these important things. Anyway, you have this representation, which is just a dictionary, and each key is a tuple of the current state and an action. When you index the representation by a (state, action) pair, there may not be a value there yet, and dictionaries have this thing called get, so I wrote a method called evaluate that does a try/except; basically, it's a get on the dictionary. Then I remove any action that doesn't have a value already. If nothing has a value, meaning I get back an empty list (that list of values is the q in my code), then I choose a random action.
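As a sketch of the choose-action logic just described (a dictionary keyed by (state, action) tuples, a get-style evaluate, and a random fallback when nothing is known yet), here is a minimal version; the function and variable names are mine, not necessarily the talk's:

```python
import random

def evaluate(representation, state, action):
    """Safe lookup of a (state, action) pair: None if never seen.

    The talk wraps dictionary indexing in a try/except; dict.get
    does the same job in one call.
    """
    return representation.get((state, action))

def choose_action(representation, state, actions):
    """Return the known action with the highest value, else explore."""
    # Keep only (value, action) pairs we already have a value for.
    q = [(evaluate(representation, state, a), a)
         for a in actions
         if evaluate(representation, state, a) is not None]
    if not q:
        # Nothing learned for this state yet: pick a random action.
        return random.choice(actions)
    # Max-Q: take the action with the largest stored value.
    return max(q)[1]
```

A common extension is epsilon-greedy exploration: with some small probability, choose randomly even when values exist, so the agent keeps exploring.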
So, another version of this would also allow me to randomly choose an action sometimes anyway, with some low probability, but that's a more advanced thing we won't worry about right now. Anyway, in Q-learning, what you do is take the max Q. Assuming it isn't just a random choice, and we've actually learned some things, so we know which actions have been positive for us on average and which ones to stay away from, then we take the thing that is best for us. Pretty simple, right? Now, in the context of a neural network: remember where we did our error term? Oh, crap, I moved off those slides; hold on, give me one second. Okay, we're going to look at the small case because the small case is easier to understand. When we did that delta-2 thing at the end and updated based on some error, what we're going to do instead is update based on this choice of Q. Everyone with me? Yeah? Okay. We're really not substituting much; we're just swapping in this term. This is obviously a really naive way to do it, and there are more sophisticated things you can do, but this is the basic idea of Q-learning. And using this, we can think of a lot of ways to update our representation. So I did the dumbest thing possible here. Oh, no, sorry, I haven't explained that yet: that was choosing the action, but how do we actually update the representation? To update the representation, we can do something really simple, where we take the current value, some alpha, and some reward.
Alpha is usually a number greater than zero but less than one. So we got some reward; let's be skeptical that it's actually a good signal and only update our current value a little bit for that state-action pair. If the pair doesn't have a value yet, we just store the reward. That's the most naive thing you could do, a simple decay, but we can make this much more advanced. A more sophisticated method for updating would be to take the mean. Or, and this would be while actually playing a game (I haven't shown this part yet): we generate a board, choose an action, and update on that action. What I wanted to show you, though, is the median. We can also take the median of the values, and as you can see, update representation happens the same exact way. So there are lots of strategies for doing this: looking at the distribution of values, doing a cosine distance between things, figuring out a way to use the histogram of values to inform how you update your representation and how you choose which value to take. That's the most informed way to do it, and it's usually the best, but it's also the most time-consuming and the most costly, because you're storing way more data. And say you have to run this around a billion times, which is typical; you end up with something really expensive. So there's some intelligence in compressing this down to a simple average or a simple expectation. But if you have GPUs and a crazy big supercluster, you can make super badass things by taking the full distribution, updating your representation from it, and doing lots of really sophisticated histogram-type analysis. So that's that. Okay, now let's look at the implementation of this in a neural net.
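Here is a minimal sketch of the two update strategies described above: the naive alpha-weighted nudge and the store-everything median variant. The names and the exact blending formula are my reading of the description, not the talk's verbatim code:

```python
from statistics import median

def update_representation(representation, state, action, reward, alpha=0.1):
    """Naive update: move the stored value a fraction alpha toward the reward.

    alpha in (0, 1) encodes skepticism about any single reward. If the
    (state, action) pair has no value yet, just store the reward.
    """
    key = (state, action)
    if key in representation:
        representation[key] += alpha * (reward - representation[key])
    else:
        representation[key] = reward

def update_history(history, state, action, reward):
    """Costlier variant: keep every reward and summarize with the median.

    Storing the full distribution is more robust to outlier rewards (and
    enables histogram-style analysis) but uses far more memory over, say,
    a billion updates.
    """
    history.setdefault((state, action), []).append(reward)
    return median(history[(state, action)])
```

For reference, the textbook Q-learning update also folds in a discounted estimate of the best next-state value: Q(s, a) gets Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') minus Q(s, a)). The simple versions above drop that lookahead term.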
So, I'm going to get out of here. Oh, actually, before I do that: now that we've seen this, let's look at another example. This one actually uses Q-learning with a convolutional neural network. The convolutional neural network is learning from the pixels around Flappy Bird and keeping him safe, so it learns where the dangers are over time. Let me zoom out for a second: you can actually see the state it's in, the action and the reward, and then the max Q. This maps straight onto what we just covered. We'll look at the implementation in a second, but just enjoy how wonderful that is. He hasn't died yet; he just keeps on going, just chilling. There's a reason this little Flappy Bird guy does so well: he can only take one of two actions. He can either press a button to flap, or not press the button. Those are the only two actions. Now let's take another example, to prove to you that our evil machine overlords are not here yet. The singularity has not happened; it's all fine. Let's watch this thing play Tetris. I guarantee you could beat this engine without very much effort. As you can see, our neural network fails to even get through level one. It tries, it tries so hard, but it cannot succeed. A true sadness has befallen him. Alas, poor Yorick, I knew him well. Anyway, what we can see from this is that as the number of actions a neural network can take grows, specifically the number of actions a Q-learning algorithm can take, it's going to do a lot worse. So how do you deal with this? Well, I came up with what I believe is a novel way of dealing with it.
So, we're going to look at that right now, if I can remember where I put it. Give me two seconds. Decision trees... no, it's not in there, and I can't find it, so I'll just explain the idea; it's pretty simple. What you do is take a restricted subset of your possible set of actions. Say you had eight actions: you would restrict yourself to only two or three possible actions, and then learn on only those, assuming that every action pair still gets you from point A to point B. So you find a minimal set of action states that gets you from point A to point B, learn on only those, then loop through all the possible action-state subsets that allow you to do this, and you end up picking the one that does the best. There is an example of this somewhere; I'd have to dig through my code to find it. I'm sorry, I wasn't expecting to give this talk today, otherwise I would have had it available and ready to go. Anyway, so that's reinforcement learning.
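Since the original code couldn't be located during the talk, here is my reconstruction of the restricted-action-set idea as described: enumerate small action subsets, score each by training on it, and keep the best. The evaluate_subset parameter stands in for the expensive training-and-scoring loop and is entirely hypothetical:

```python
from itertools import combinations

def best_action_subset(actions, evaluate_subset, subset_size=2):
    """Search over small action subsets and return the best-scoring one.

    actions: the full action set (say, eight actions).
    evaluate_subset: caller-supplied scoring function, e.g. average
        reward after running Q-learning restricted to those actions.
    subset_size: how few actions to restrict learning to (two or three).
    """
    best_score, best_subset = float("-inf"), None
    for subset in combinations(actions, subset_size):
        score = evaluate_subset(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset
```

In practice you would also discard subsets that cannot reach the goal at all, per the assumption that a chosen subset must still get you from point A to point B.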
Okay, I don't think I actually have enough time to go through the convolutional neural net, because I have about eight minutes left and that's really not enough. I'll stop again for questions if anyone has any about reinforcement learning, since that's not what we were expecting to cover, and if there aren't any, then I'll go through dropout for about five minutes, because there's enough time for that. Any questions at this time? Where can you find me? Twitter? All my handles are literally just my name; I'm vain like that. Oh, 14, wow, okay. Some people have been saying either nice or mean things about me during this talk. This may be the end of my speaking career, but that's okay. That's how you find me on Twitter. Other questions? You can also email me; my email, shockingly, is just my name at Gmail. I got very lucky and got my name on pretty much every service I use. Other questions? Okay, dropout, and then I'm going to bounce. I think that's appropriate, actually. So let's do more neural nets; I think it's in neural networks. I named things well. Okay, let's do dropout with tunable backprop. This should all look familiar; we're basically doing the same thing. The reason it's a little different is that now we have the depth and the breadth of the hidden layers. This is that update I was talking about before. Then we have the backpropagation, and then this tuning thing. And now here's all the stuff: depth, breadth... where's my dropout? Okay, here we go. Oh, geez, this is a little more complex than I remembered, so I'm just going to explain it.
The high-level idea is that we take a random number from a binomial distribution and use it to decide whether or not to keep each node at that layer, keeping nodes with probability one minus the dropout rate. Okay, there's not enough time; come ask me afterward if you're interested in dropout. All right, I'm going to conclude now. I'm sorry things didn't go exactly to plan; I was expecting this to take a different amount of time for some reason. I think it's early and I was talking too fast. But thanks, everyone, for listening to me rant for the better part of an hour. Okay. Bye. Thank you.
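For anyone who wanted the dropout detail there wasn't time for, here is a minimal sketch of the binomial-mask idea described above. The inverted-dropout rescaling by the keep probability is my assumption, not necessarily what the talk's code did:

```python
import numpy as np

def dropout_mask(activations, drop_prob=0.5, rng=None):
    """Apply dropout to one layer's activations.

    Each node is kept with probability (1 - drop_prob), decided by a
    draw from a binomial distribution, and the survivors are rescaled
    so the expected activation is unchanged (inverted dropout).
    """
    rng = np.random.default_rng(rng)
    keep_prob = 1.0 - drop_prob
    mask = rng.binomial(1, keep_prob, size=activations.shape)
    return activations * mask / keep_prob
```

At test time no mask is applied; because of the rescaling during training, the network needs no further adjustment afterward.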