There. If you're standing in the back, there are plenty of seats, so please feel free to filter in. This afternoon I'm going to be talking about deep learning. If you already know about deep learning, you're going to be bored, so take the opportunity to ask the hard questions about the stuff that I didn't cover. If you don't know anything about it, this talk is for you. You can be forgiven if, when reading about it on the internet, you substitute "magic" for "deep learning" and it fits perfectly in all of the articles. It's hard to know what it can't do; we don't get to talk about that very much. So the goal of this talk is just to talk about it on a really simple, nuts-and-bolts level.

There we go. The summary, in case you want to take a nap: deep learning is not magic, but it's really good at finding patterns.

So if this is our brain, this is deep learning. An owl can fly. A fighter jet can fly. But there are a whole lot of things that an owl can do, and arguably it's a much more complex thing, although what the fighter jet does, it does really, really well. So deep learning is the fighter jet: highly specialized, highly engineered. Today we're going to talk about the basics, the Wright Brothers airplane. If you understand the principles it works on, then it's easy to branch out into the finer engineering details. There's a whole lot that goes on in a fighter jet that we're not going to talk about in detail, but this is nice; we can talk about this at a comfortable level.

This is a neuron. Like all neurons, it's got a big body in the middle, a long tail, and some arms that branch off. Here's an artist's conception of a neural network, a bunch of neurons. Again: big bodies, long tails, arms. This is an actual picture of neurons in some brain tissue. Here the bodies look like dots or globs, you can see long tails, some of which branch, and the arms are pretty much invisible. And again, a picture of some brain tissue. Here the neurons are small dots and you can barely see any of the tails at all. This is just to give you a sense of how tightly these things are packed together and how many of them there are: big numbers with lots of zeros. And the crazy part is that a lot of them are connected to many more of their neighbors.

This is one of our very first pictures of a neuron. Santiago Ramón y Cajal found a stain he could introduce into the cell that turned the whole thing dark. Under his 19th-century microscope he was able to see this and then draw it with pen and paper. This is old school. What you see here, though: bodies, long tails, lots of arms. We're going to turn it upside down, because that's how they're typically represented in neural networks.

These pieces actually have names. The bodies are called somas, the long tails are called axons, and the arms are called dendrites. We're going to draw a cartoon version of them; this is what they look like in PowerPoint. Now, the way that neurons work: the dendrites, you can think of them as feelers or whiskers, and they look for electrical activity. They pick it up and send it to the body. The soma takes this, adds it together, and accumulates it. Then, depending on how fast it's accumulating, it will activate the axon and send that signal along down the tail. The more dendrite activity there is, the more axonal activity there is. And if you get all of the dendrites really firing, then that axon is just as active as it can possibly be. In a very simplistic way, a neuron adds things up.
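If you want the "adds things up" idea in code, here is a minimal sketch. The function name and the numbers are mine, not the talk's; the point is just that the soma accumulates dendrite activity and the axon's output grows with the total:

```python
def soma_activity(dendrite_signals):
    """A neuron at its simplest: the soma accumulates whatever
    electrical activity the dendrites pick up, and the axon's
    activity grows with that total."""
    return sum(dendrite_signals)

# Quiet dendrites, quiet axon; all dendrites firing, axon as active as can be.
print(soma_activity([0.1, 0.0, 0.2]))  # 0.3 -> weak axonal activity
print(soma_activity([1.0, 1.0, 1.0]))  # 3.0 -> axon firing hard
```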
Now, a synapse is where the axon from one neuron touches the dendrite of another. That's an artist's conception of it. You can see in Ramón y Cajal's drawings these little nubs or buttons, they're actually called boutons, on the dendrites. These are sites where the axon of another neuron touches them, so you can imagine there's a little connection there. We'll represent that connection by a circle, and the diameter of that circle is the strength of the connection. Big circle, strong connection. A connection can be strong or weak or somewhere in between, and we can put a number on it between zero and one. So a medium connection we'll call a 0.6.

When the axon of the input neuron, the upstream neuron, is active, it activates the dendrite of the output neuron and passes that signal on with a modest strength. If that connection is strong, then it passes the signal on very strongly. If that connection is a one, then when the axon is active, the next dendrite is very active. Likewise, if that connection is very weak, say a 0.2, then when the axon is active, the dendrite of the output neuron is only weakly activated. No connection at all is a zero.

Now this starts to get interesting, because many different input neurons can connect to the dendrites of a single output neuron, and each connection has its own strength. We can redraw this by taking out all of the parallel dendrites and just drawing each axon and the single dendrite it connects to, with the connection strength represented like this, with the dots. We can substitute in numbers for those: the weights. We can also substitute in line thicknesses to show how strongly these things are connected, but most of the time neural networks are drawn like this. And this is what we have. We went from the super-complex slice of brain tissue, with many subtleties in its operation and interconnection, to a nice circle-and-stick diagram where each one of those sticks represents a weight.

In its current form, it can still do some pretty cool things. The input neurons can connect to many output neurons, so what you actually get is many inputs, many outputs, and the connection between each is distinct and has its own weight. This is good for making pretty pictures, but it's also great for representing combinations of things. Here's how that's done. Let's say you have five inputs labeled A, B, C, D, E. In this case, the output neuron has strong connections to A, C, and E, and very weak connections to B and D. That means when the A, C, and E input neurons are active all together, that really strongly activates the output neuron. B and D don't matter, because they only connect weakly. So a way to think about this output neuron is in terms of the inputs that strongly activate it; we call this the A, C, E neuron. And here we have an atomic example of what happens: this output neuron represents a combination of the input neurons. This is neural networks in a nutshell.

You can do this with any kind of input. Say you have a really low-tech four-pixel camera. Each of those four inputs is one of the pixels: upper left, lower left, upper right, or lower right. In this particular neural network, with strong connections to the upper-left and upper-right pixels, we have an output neuron that represents a bar in the top half of the image. So we can combine letters, and we can combine pixels to make small images. If you're doing text processing, the input neurons can represent individual words.
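As a quick sketch of that A, C, E neuron in code: the talk only says the connections are "strong" and "very weak," so the exact strengths below are assumed for illustration. The output neuron's activity is each input's activity scaled by its connection strength, added up:

```python
# Connection strengths between zero and one, one per input neuron.
# Strong connections to A, C, and E; very weak to B and D
# (exact values assumed, the talk doesn't give numbers).
weights = {"A": 0.9, "B": 0.1, "C": 0.9, "D": 0.1, "E": 0.9}

def output_activity(inputs, weights):
    """Each input's activity passes through its connection, scaled
    by the connection strength, and the output neuron adds it all up."""
    return sum(inputs[name] * weights[name] for name in weights)

# A, C, and E active together strongly activate the output neuron...
print(output_activity({"A": 1, "B": 0, "C": 1, "D": 0, "E": 1}, weights))  # 2.7
# ...while B and D barely matter, because they only connect weakly.
print(output_activity({"A": 0, "B": 1, "C": 0, "D": 1, "E": 0}, weights))  # 0.2
```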
So in this case, we're pulling words out of text. This output neuron is strongly connected to the input neurons for "eye" and "ball," so we can call it the eyeball neuron. Similarly, we can have a sunglasses neuron. And input neurons can connect to many outputs, so we can have an eyeglasses neuron just as easily.

Going a little deeper, here's a somewhat trivial example to show how these things work in practice. There's a guy at the shawarma place who makes shawarma like nobody else, so you want to make sure to go when he's working there. Taking a step back, we actually have some domain knowledge here: we know he's got two schedules. Working in the morning and off in the evening, or off in the morning and working in the evening. If we were to instrument this with sensors, we would have four inputs: working in the morning, off in the morning, working in the evening, off in the evening. And it might be useful to represent his working patterns in terms of a couple of output neurons that combine those. So this is the network we would expect to end up with. Working in the morning, off in the evening is one pattern; off in the morning, working in the evening is the other. You can see, based on their connection strengths, how they combine those inputs. Here are the weights associated with those.

Now the question is: how do we learn this? If we have to go in and fill it all in by hand, we haven't learned anything. It's just a fancier way of programming, and a lot of hard work, especially if you're dealing with many millions of input neurons. So we want to learn this automatically.

To start with, and this might be a little counterintuitive, we create our neural network. We have our input neurons; all we choose is the number of output neurons. In this case we'll choose two, because we happen to know we're learning two patterns. And then we randomly assign the weights, generating a random number for each of them. It's a neural network that's completely random: you roll the dice, you throw the sticks, and whatever falls out, that's what you start with.

Then we start to gather data. We go stand on the other side of the street, and we observe that the shawarma guy on this particular day worked in the morning and then went home; he did not work in the evening. That means the working-in-the-morning input is active; we'll say it's at a level of one. Off in the morning is at a level of zero, because we didn't observe it. Working in the evening is at zero. And off in the evening is at a one, because we observed that too.

The next step is, for each of the output neurons, we calculate the activity. An appropriately simple way to do this is to take the average of the weights on the active inputs. So here, this weight is 0.3 and this weight is 0.1, and the average of those is 0.2. The other inputs don't contribute anything, because they aren't active. Similarly, we can take the weights between the active inputs and the other output, 0.8 and 0.4; the average of those is 0.6. The output neuron on the right has the higher activity. That's the one we care about, and we ignore all the others. If there were a million others, we'd ignore them too and focus on this one for this time step.

The first thing we figure out is how wrong it is. If our neural network were perfect, that neuron would have an activity of one; it would be perfectly aligned with our inputs. But it only has an activity of 0.6, so the error is 0.4. The bigger that error is, the more we need to adjust our weights.
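In code, that forward pass and error calculation look something like this. The 0.3/0.1 and 0.8/0.4 weights are the ones from the walkthrough; the other starting weights are made-up placeholders, and the averaging rule is the talk's "appropriately simple" choice, not standard practice:

```python
# Day one's observation: worked in the morning, off in the evening.
inputs = {"work_am": 1, "off_am": 0, "work_pm": 0, "off_pm": 1}

# Randomly assigned starting weights for the two output neurons.
# The 0.3/0.1 and 0.8/0.4 values come from the talk's walkthrough;
# the rest are placeholders.
weights = {
    "left":  {"work_am": 0.3, "off_am": 0.6, "work_pm": 0.4, "off_pm": 0.1},
    "right": {"work_am": 0.8, "off_am": 0.2, "work_pm": 0.5, "off_pm": 0.4},
}

def activity(w, inputs):
    """The talk's simple rule: average the weights on the active
    inputs; inactive inputs contribute nothing."""
    active = [w[k] for k, v in inputs.items() if v == 1]
    return sum(active) / len(active)

print(activity(weights["left"], inputs))         # (0.3 + 0.1) / 2 = 0.2
print(activity(weights["right"], inputs))        # (0.8 + 0.4) / 2 = 0.6, the winner
print(1.0 - activity(weights["right"], inputs))  # error = 0.4
```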
When that error gets very small, it means the weights really represent what's going on, and we don't need to make any more changes. Now the trick here is gradient descent. If there is an element of magic in deep learning, it's gradient descent. What you do is go through and adjust each of these weights: adjust it a little bit up and a little bit down, and see which way decreases the error. The concept in gradient descent is that a weight is a quantity you can shift a little bit from side to side, and as you do, the error will change. You can think of it as a ball: if you shift it a little to the left, it has to climb the hill; if you shift it a little to the right, it falls down the hill. You choose the direction in which the error gets lower. You want to bring that error down as low as it can get, and you take small incremental steps to keep everything numerically stable.

So we go through and do this for all of the weights that attach input neurons to our output. We find that, yes, we want to increase this one. Because these inputs aren't active, we actually have a bias toward low weights, so it doesn't hurt to decrease them. So we go ahead and decrease that weight, decrease that weight, and increase that one. When we do that, sure enough, our new activity is 0.7, and our error went from 0.4 to 0.3. It's a little bit better at representing what we saw.

That was one data point. We go back and do the same thing the next day. It just so happens that this day he's off in the morning, working in the evening. We adjust the weights, and we do that day after day after day. Eventually the weights will stop changing, or slow down changing quite a bit. They'll get stable, and we get the system of weights we originally saw, because we had knowledge of the problem. So this is the Wright Brothers airplane version of how training by backpropagation using gradient descent works. Backpropagation is a very specific way to do this that is computationally cheap and slick and fast; you get your jet engine instead of the flapping of wings. But this is the underlying mechanism by which it works.

What we just looked at was a single layer: we have inputs, we have outputs, and every output is a combination of things on the previous layer. There's no reason we can't turn around and take that output layer and make it the inputs for the next layer, and do that again and again. If a network has more than three layers or so, we call it deep. Some have more than 12. In some recent research at Microsoft, there are deep neural networks with more than a thousand layers. There's no theoretical reason to limit the number of layers; it just depends on the specifics of your problem.

Now, what does deep get you? Why is deep special? Say your input neurons are letters of the alphabet. (This is a deep neural network with all of the connections omitted for clarity.) These are your inputs, and your first layer of outputs are combinations of those letters. Each level you go up, you get combinations of what happened the level before. So by the time you get to your second level of outputs, you're getting, perhaps, words in the English language, if that's what you're training on. The layer above that, you get combinations of words, short phrases, and so forth. And you can do this as deep as you like.
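Before moving on to what deep networks can do, here is a sketch in code of the whole learning loop just described: two output neurons, and each day the most active one gets its weights nudged up or down, keeping whichever direction lowers the error, with a slight downward bias on the weights of inactive inputs. The winner-take-all update and the low-weight bias follow the talk's walkthrough; the starting weights, step size, and loop structure are my own illustration:

```python
def activity(w, inputs):
    """Average of the weights on the active inputs (the talk's rule)."""
    active = [w[k] for k, v in inputs.items() if v == 1]
    return sum(active) / len(active)

def nudge(w, inputs, step=0.05):
    """Try shifting each weight a little up and a little down, and
    keep the direction that lowers the error (1.0 minus activity)."""
    for k in w:
        base = 1.0 - activity(w, inputs)
        for delta in (step, -step):
            trial = dict(w, **{k: min(1.0, max(0.0, w[k] + delta))})
            if 1.0 - activity(trial, inputs) < base:
                w[k] = trial[k]
                break
    # The talk notes a bias toward low weights on inactive inputs,
    # so nudge those down slightly as well.
    for k, v in inputs.items():
        if v == 0:
            w[k] = max(0.0, w[k] - step / 2)

# Two output neurons with random starting weights (values assumed).
outputs = [
    {"work_am": 0.3, "off_am": 0.6, "work_pm": 0.4, "off_pm": 0.1},
    {"work_am": 0.8, "off_am": 0.2, "work_pm": 0.5, "off_pm": 0.4},
]
days = [
    {"work_am": 1, "off_am": 0, "work_pm": 0, "off_pm": 1},  # schedule one
    {"work_am": 0, "off_am": 1, "work_pm": 1, "off_pm": 0},  # schedule two
]
# Day after day, only the most active output neuron gets adjusted.
for day in range(100):
    inputs = days[day % 2]
    winner = max(outputs, key=lambda w: activity(w, inputs))
    nudge(winner, inputs)
print(outputs)  # each output neuron settles onto one of the two schedules
```

Run this and the weights stabilize into the system we expected from our domain knowledge: one neuron for working in the morning and off in the evening, one for the reverse.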
So there's a variety of things you can learn with deep neural networks. A very popular one is images. If you take pixels as your inputs and, instead of looking at the shawarma guy's schedule, you use individual pictures as your training data set, what you start to learn after a while is little representations of short lines and dots and patches. These are the primitives of an image. If you train on images of faces, then your second-layer outputs start to look like eyes and noses and mouths and chins, and your third-layer outputs start to look clearly recognizable as faces. Similarly, if you train on automobiles, your second-layer outputs start to look like wheels and doors and windows, and your third-layer outputs look like automobiles. So that's pretty cool: we didn't have to go in and twiddle any of those weights. It just learned that from seeing a bunch of pictures. You can do it on color images too. Here are some of the output neurons of an eight-layer neural network. As you get deeper, you can see things that are clearly recognizable and quite complex: spiders and rocking chairs and sailing ships and teddy bears.

You can also plug in information about music artists. Here's some research where output neurons were learned based on information about artists, and then each artist's representation was plotted based on how similar they were. We see that Kelly Clarkson and Beyoncé are similar over here, which is also not too far from Taylor Swift and Avril Lavigne. Whereas if we go up here, we get Weezer, The Black Keys, Modest Mouse, and the Presidents of the United States of America, all in the same neighborhood. This is a network that didn't know anything, and still doesn't know anything, about music. But because of the data it gets on its input neurons, it's able to group these things appropriately. It finds patterns, and then finds the things that most closely fit those patterns.

It turns out you can take Atari 2600 games, feed the pixel representations in as input neurons, learn some fun features, and then pair that with something else called reinforcement learning, which learns appropriate actions. When you do this for a certain class of games, it can learn to play them far better than any human player. And it turns out you can take a robot and let it watch YouTube videos about how to cook. It uses a pair of deep neural networks, one to interpret the video and one to learn to understand its own movements, and then pairs those with some other execution software to cook based on the video representations it sees. So while it's not magic, it's pretty cool. You can do some stuff.

As you're going through the literature and reading popular articles about this, you can kind of play bingo. There are some buzzwords and popular algorithms; you can think of a lot of these as the model numbers for the various fighter jets that are out there. But when you see any of these terms, you can mentally substitute "deep learning," apply what you know about the Wright Brothers airplane, and most of it will still be accurate. The bottom line: it's good at learning patterns. It doesn't do everything, but it's pretty good at learning patterns.

If you're interested in these slides, please add me as a connection on LinkedIn. I'll be posting them soon, with links to both the video recording and the slides themselves.
And of course, if you have any questions, feel free to come on up. Thanks for your attention. So until they wave signs at me from the back, we have time for a couple... oh, we're waving signs at the back. So we are done. There is a Microsoft booth right outside the door; by the way, I work for Microsoft. I'll be there for the next hour if you would like to follow up or have any questions. Thank you.