First, thanks so much for coming to this talk. It really means a lot to me that you're willing to spend your very valuable time here, so thank you, and thank you to the conference organizers, to Cincinnati, to everyone who's speaking and attending and doing everything that makes this community fantastic. I'm super glad to see you all here. Yeah, give yourselves a round of applause. You guys are great. So this talk is called Domo Arigato, Mr. Roboto: Machine Learning with Ruby. I was at RubyKaigi a couple months ago, so I guess it's properly "doumo arigatou," but my Japanese is non-existent, so I will say "domo arigato" and just apologize for it. And this talk is for my younger brother, Josh, who passed away unexpectedly this summer. All right. Hi. Part zero. This is a computer talk, so you have to start with zero. I tend to speak very quickly, more so when I'm excited, and talking about Ruby and machine learning is super exciting. So I'm going to try to slow it down and go at a normal pace, but please do, if you hear me going off the rails a bit, wave or signal somehow to tell me to dial it back. Just make it big so I can see it from up here, because it is super bright up here, which is awesome, but I feel like I'm staring into the sun. Cool. And feel free to shout, too. That's also fun. I'm going to talk for about 35 minutes, and then we'll have some time at the end for questions. It's funny: I gave a talk last year at RubyConf on garbage collection, and I was all set, ready to go. The talk was all buttoned up; I'd finished my slides multiple days before, which never happens. And Matz came in and sat down front row center right before I started, and I blew through the entire talk in like 25 minutes, so I will try not to do that here. I've actually been practicing all my talks now imagining Matz is in the room, which I don't think he is, but if he does show up, I'll be prepared. My name's Eric.
I'm a software engineer slash manager at Hulu, which a friend of mine has cheerfully called "Netflix with adverts," which is not wrong. You can find me on GitHub, Twitter, et cetera, et cetera, in this weird human hash that I felt the need to make. I write a lot of Ruby and JavaScript for work, even a little bit of Go, which is nice. And my side projects tend to be Ruby or Clojure or Elixir. I'm actually also a newly minted Idris contributor, so if you haven't heard of Idris or you're wondering what it is, come find me after the show. It's a lot of fun. I've been writing Ruby for about five years-ish. And about a year ago, I think, I wrote this book called Ruby Wizardry, which teaches Ruby to eight- to twelve-year-olds. So if you're interested in that, also come see me. I'm happy to talk about it. I'm out of stickers, but we do have a 30% off promo code from the folks at No Starch, so thanks also to them. At any point this week, if you want to buy the book online, go to NoStarch.com and use that promo code, and you'll get 30% off. Cool. This is not a long talk, but I think we still benefit from having an overview of where we're going. So I'm going to talk a bit about machine learning generally, a bit about supervised learning in particular, and neural networks in particular, and even more particularly than that, machine learning with Ruby on the MNIST dataset, which we'll talk more about in a second. But first, machine learning. So show of hands: how many of you feel very comfortable with machine learning? Or could say, in like 15 seconds or a sentence, I could tell you what it is. OK, more hands. Great. What about supervised learning? Cool. What about neural networks? OK, cool. That was interesting: it was a weird difference. There were some people who were like, I don't know anything about machine learning, but neural networks, yes. Which I thought was pretty cool. Cool. All right.
So the good news is, if you didn't raise your hand, you will still be fine. This talk is introductory; you do not have to be a mathematician to do machine learning. It does help: if you happen to know high-school-level stuff like basic stats, or first-year calculus or linear algebra, it helps a lot with understanding the way machine learning algorithms work and how everything is put together. But it is not necessary to know those things to use the tools we're going to look at, and it's not necessary to understand the content of this talk. So that's good. This is what I think of when I think of machine learning and AI and robot stuff. This is actually a dumb little drawing I did for the Ruby book. But this is probably not what you think of when you think of machine learning; probably not robot pirates. So what is it? If I were going to pick one word to describe machine learning, to explain what it is, I would say that it's generalization. The idea here is you're going to get a program to assemble rules for dealing with data in the world, such that it no longer has to be explicitly programmed in order to make generalizations. And what do I mean by that? Well, one way to think of this is pattern recognition. Maybe we go outside and I say, OK, that's a car. And that's a car. That is not a car. That is a car. And we do this for a while. And then we go to another part of Cincinnati, or we go to Norway, or we go to the moon. And I say, OK, is that a car? Is that a car? And the idea is to see if you've generalized, right? You don't just have a list of things that are cars, where if it's not on the list, you don't know. You build a concept of a car. You have a sense of car-ness that you can look at the world with, and this car filter will tell you whether you think something is a car or not.
So we want the machine to tease out these underlying patterns in some dataset, and group or cluster them accordingly, or make a prediction about what they are. And this first thing I've talked about, this notion of making predictions based on labeled information, the idea of having a list of this is a car, this is a car, this is not a car: this is supervised learning. The idea is you can perform classification or regression based on some dataset, and you generalize from the data whose labels you know to data that you haven't seen before. In terms of classification and regression: classification is very much like that car example, right? Is this a car or not? Regression is, for example (I think the canonical one is housing data), maybe you have a plot that says, for some given feature like square footage or proximity to a good school, here's how housing prices change with that feature. And so you might have this kind of scatter plot and say, OK, I'm gonna put a new point down: how much should this house cost? And if you've ever done linear regressions or lines of best fit, where you have a cloud of points and a line or a curve that sort of fits the data, that's the idea here. You're sort of performing function approximation: you're finding a curve that explains the data. And I think that's another key attribute here: we're explaining data. We're finding patterns that human beings can't necessarily tease out on their own. So in order to do machine learning, we have to think in terms of features and labels. The MNIST dataset, which we'll talk more about, is a database of tens of thousands of handwritten numbers. And so the features here are actually raw pixels, and I'll have a visualization for that. But the idea is you can think of an image as like a vector, right? Like an array of intensities.
If it were purely black and white, you could almost imagine this like ASCII art, right? Where it's zero, zero, zero when it's blank, and then you have a one where there's a black pixel. And you can unwind this across the line breaks and have one big vector. So those are our features: a list of pixel intensities. And the labels that we have for these features are digits, the numbers zero through nine. So we need to first divide our data along those lines, identifying what the features are and what the labels are. And then we need to divide the data into a training set and a test set. The training set is the data the machine learning algorithm will operate on and learn from, and then there's a test set that the machine will look at and make predictions about. And in machine learning, it's generally considered cheating to feed the test data to your algorithm, because you're not necessarily proving that you can generalize if you're testing the machine on the exact instances it's already seen. But maybe you work for a machine learning shop, and one of your products is making recommendations or something like that, and your test data come in all the time. Maybe then it's reasonable to say, OK, last year's test data, let's train the machine on that, we'll see if that gets any better, we'll pick some new test data, and we'll keep going. So that's another thing I want to call out: this notion of memorization as opposed to generalization. And generalization is the goal. And like I said, these digits zero through nine are the labels, and this is what we're going to try to predict: given some image that represents a handwritten digit, is it a zero, a one, a two, et cetera? And this is sort of what I mean by a vector of intensities, right? So we might have a handwritten image.
This gets mapped to a vector where you have zeros for the absence of colored-in pixels, and you have some small real number, some non-zero value, for the pixels that are lit up. Cool. So we've talked a bit about machine learning, what that is, and in particular supervised learning, which is labeling unknown data based on data we've seen before. But what's a neural network? It's a machine learning tool modeled after the human brain, which is not super helpful. It kind of reminds me of that talk I think Aaron gave on virtual machines: they're machines that are virtual, right? Well, this is a network that is neural. So let's start with artificial neurons. Human biological brains are made up of biological neurons, and in much the same way, neural networks are composed of perceptrons, or artificial neuron units. And this is a very simple little artificial neuron called a perceptron. The idea here is you have this thing that's almost like a function, right? A biological neuron has a bunch of dendrites; synaptic messages come in through this little branchy thing. So on the left you have these inputs, and they go into the body of the neuron, and what comes out is transmitted along the axon: the output is that the neuron fires or it doesn't. And you can model this as a function, right? You can think of the dendrites as vectors of signals and weights: what is the signal, and how important is it, is a way of thinking about it. And the axon is the output, and most perceptrons will threshold. So you'll set some threshold saying, hey, if we're above this value, one, we fire; if we're below it, zero, we don't. There are models where you just pass along a real value as opposed to thresholding.
But in this particular example, you can see that the little step function symbol indicates that there's thresholding happening. During training, what happens is we initialize all these neurons to have little random weights, and then we start looking at the data, and as we train, we are told: you thought that this was the number four; it's actually the number seven. And so for each epoch of training, each iteration of training, we're going to use a feedforward neural network, meaning we feed all the data through; there are no cycles inside the network. And then we do something called backpropagation. The idea is you take a stab at the data, you see where you screwed up, and then you propagate that error signal back through the layers of the network, tinkering with weights as you go. Figuring out, OK, if I change this weight by this much, now I'm going to call that a four when it's a four; if I change this weight by this much, now it's going to be a three when I say it's a three. And we keep doing this over and over and over, either until we get tired of it, which happens sometimes, or until we hit some predetermined threshold where we say, all right, our error is low enough that this is fine. Maybe we don't perfectly categorize every single thing in our training data, but we get like 99% of it right, and that's good enough. Like I said, we initialize these weights to small random numbers for two reasons. One, if you don't know what the data are going to be, there's no sense biasing the network by picking values that might introduce some kind of error. But also, when you have very high weights, you tend to overfit, and we'll talk in a minute about overfitting, but the idea is you don't want to believe your data too much, if that makes sense. So we've talked a bit about perceptrons, and this is how you might organize perceptrons, or artificial neurons, into a neural network.
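To make the perceptron idea concrete, here's a minimal sketch in plain Ruby. This is illustrative only, not code from the talk; the class name, threshold, and learning rate are my assumptions:

```ruby
# A minimal perceptron: a weighted sum of inputs, then a step threshold.
class Perceptron
  attr_reader :weights

  def initialize(num_inputs, threshold: 0.0)
    # Initialize to small random weights, per the advice above
    @weights = Array.new(num_inputs) { (rand - 0.5) * 0.1 }
    @threshold = threshold
  end

  # Fire (1) if the weighted sum clears the threshold, otherwise 0
  def fire(inputs)
    sum = inputs.zip(@weights).sum { |x, w| x * w }
    sum > @threshold ? 1 : 0
  end

  # One step of the classic perceptron learning rule: nudge each
  # weight in the direction that reduces the error on this example
  def train(inputs, target, learning_rate = 0.1)
    error = target - fire(inputs)
    @weights = @weights.each_with_index.map do |w, i|
      w + learning_rate * error * inputs[i]
    end
  end
end

# Teach it OR, a linearly separable function one perceptron can learn
p = Perceptron.new(2)
examples = [[[0, 0], 0], [[0, 1], 1], [[1, 0], 1], [[1, 1], 1]]
100.times { examples.each { |x, y| p.train(x, y) } }
```

A single perceptron can only learn linearly separable functions (famously, it cannot learn XOR), which is exactly why we stack them into the multi-layer networks described next.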
Generally they look something like this, where you have an input layer that corresponds to the size or shape of the number of features that you care about. Then there's the hidden layer, which is a hyperparameter you can tune (that's a fun word that I like to say). The idea being, you can turn the knob and say, I'm gonna have 100 neurons in this hidden layer, 200, OK, 50, and just move it around and see how your neural network behaves on the data. One downside of neural networks is that they are notorious black boxes. You generally don't see the weights that the network assigns, and they wouldn't mean much to you if you did. So it's not really clear what's going on inside this little learning machine. But you can turn some knobs and pull some levers, and tuning the number of hidden nodes is one of them. And then finally you have an output layer, and the output layer's size, the number of neurons there, corresponds to the number of labels you expect to have. So if you have the digits zero through nine, you'd expect to have 10 output neurons, since there are 10 possible labels that might come out. There are other things that you can tune in neural networks. We're not gonna talk a whole bunch about them, but I'm happy to chat about it later. Things like the learning rate: how quickly the machine is actually learning. If you think of minimizing error as a surface, and you wanna get down all the way to the bottom, you can imagine almost having this kind of mesh, like those cartoons they show you when they try to show you what gravity looks like. You can imagine trying to get to the bottom of this bowl. If your learning rate is very large, if you take large strides toward that bottom, you could potentially kick around and bounce inside the bowl without actually getting to the bottom.
So lowering the learning rate to a smaller number means you train longer and it takes more time, but it does mean you're less likely to bounce around looking for that minimum error, that global minimum. And if you've heard of deep neural networks, those are neural networks with more than three layers, which is kind of lying to you: basically, instead of one hidden layer, you might have 10 or 50 or 100. And all these papers that you're seeing now from Google and folks like that who are working with deep neural networks are building very elaborate architectures, which I know a tiny bit about and am happy to chat about; if you have experience there, I'd like to talk to you after. Cool. So that covers the theory behind machine learning and supervised learning and neural networks, and now we can move on to looking at the data. We're going to look at the MNIST dataset. We're going to use a library called the RubyFann gem; FANN is F-A-N-N, the Fast Artificial Neural Network library, which is written in C, and the gem is a bunch of bindings to that library. And then we'll talk about developing an app that lets us actually take advantage of the network we've trained and say, OK, now that you've done all this training, is this a two? Is this a four? Is this a car? Or a five? And so on. So our data, as I said, are images of handwritten digits, and they've all been size-normalized and centered: they're all the same size, with the digit in the middle. So you're not going to have to worry about convolving or scrolling over the image to figure out what's there. There are deep neural network architectures that can, like a little magnifying glass, go over an image and find stuff anywhere, but this is much simpler. We center the image for the machine so it doesn't have to go hunting around for it. There are 60,000 training examples in the MNIST dataset, and 10,000 more for testing.
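Since the features are just that "vector of intensities" from earlier, loading one example boils down to flattening a grid of pixels into a single long array. Here's a toy sketch; a tiny grid stands in for a real 24x24 MNIST image, and this is not the talk's actual loader:

```ruby
# A grayscale image as rows of pixel intensities (0.0 = blank pixel).
# A real MNIST input here would be 24 rows of 24 values: 576 features.
image = [
  [0.0, 0.9, 0.8, 0.0],
  [0.0, 0.0, 0.7, 0.0],
  [0.0, 0.0, 0.6, 0.0]
]

# "Unwind across the line breaks" into one feature vector
features = image.flatten

# One-hot label for the digit 1: ten slots, one per possible label
label = Array.new(10, 0.0)
label[1] = 1.0
```

Each such feature vector, paired with its one-hot label, becomes one training (or test) example.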
So effectively we're going to train on tens of thousands of examples, run them a bunch, and then we'll have 10,000 tests: is this a one? Is this a two? Et cetera. And the dataset is open source; it's available online. If you search for MNIST data, you'll find it, and when I tweet out and share the slides, you can follow all the links there, too. And this is a sample of what these data look like. As you can see, there are some kind of loopy, sideways, weird-looking zeros. There are curly twos and square-bottomed twos and one that looks like a Z. This is just a sample of human writing, and some of these, I think, even some people might have trouble with. But the idea is the machine can tell: OK, it's a closed-top four, it's an open-top four, it's still a four. It's a nine that's curly, or it's a circle and a line, it's still a nine. So I went ahead and trained a neural network on the MNIST dataset, and I've done this a few times, and I wanted to see how well we could do. We'll take a look at the RubyFann output in a second, but the idea here is, for some set of parameters, like the number of epochs, I picked a thousand. It never needs a thousand, because it gets to the minimum error I set for it before then. But working over these data, what's the best that we can do? At some point it gets 99.99-some-odd percent of the data that it trains on correct, which is what we want. But then on the test data, it only gets about 93% correct: roughly 9,300 correct and about 670 incorrect, so about 93.28%. Which is good. It suggests to me that we're not overfitting too badly. And I mentioned overfitting, and I wanna talk a bit about that. The idea behind overfitting is that you're believing your data too much, right?
You're modeling not just the underlying features, the actual information that you want, but you're starting to model quirks or noise or random fluctuations. Things that you actually don't want are becoming things your model strongly believes you should care about. Noise can come from a lot of places: it can come from sensor fuzziness, it can come from humans mislabeling stuff, it can come from unmodeled factors that you're not thinking about that are influencing the data. But as you become very, very tightly fit, you'll start to see your generalization go down. And we can talk about it after, but there are these very interesting charts where you see the error decrease, decrease, decrease, and then you hit this inflection point and error goes through the roof on your test set, even as it continues to get better on your training set, because you're modeling every nook and cranny of your training data at the cost of not being able to generalize well. And in neural networks, overfitting occurs when you have very high weights, which is part of the reason why we initialize to small random weights. It also occurs if you have lots and lots and lots of hidden nodes. And so again, tuning that number-of-hidden-nodes parameter will tell you if you're overfitting or not. So I think this is pretty good; 93, I'm happy with that. And if you pull down the GitHub repo and play around with it, please do make PRs with different hyperparameter tunings and things like that if you can beat my high score of 93.28. Cool. So now that we've talked a bit about how neural networks work and how they can be applied to something like the MNIST dataset, we're gonna talk about how we built and trained this neural network for the MNIST data in Ruby, and then look at the app that I put together to allow us to test it.
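The train/test division discussed earlier can be sketched in a few lines of Ruby; the method name and the 80/20 ratio here are my own choices, not necessarily the talk's:

```ruby
# Shuffle the examples, then hold out a slice as the test set, so the
# network is never trained on the examples it will be graded on.
def train_test_split(examples, test_fraction: 0.2, seed: 42)
  shuffled = examples.shuffle(random: Random.new(seed))
  held_out = (examples.length * test_fraction).round
  [shuffled.drop(held_out), shuffled.take(held_out)] # [train, test]
end

train, test = train_test_split((1..100).to_a)
# train has 80 examples, test has 20, and they never overlap
```

MNIST makes this particular choice for you, since it ships pre-split into 60,000 training and 10,000 test examples.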
As is always the case, I think, with the internet, you start out to do something and then you find out that somebody else has already done it way better than you. So I'm hugely indebted to Jeff Busing (I hope I'm pronouncing his name right). When I was about two-thirds or three-quarters of the way through the code for this talk, I found his Ruby implementation for the MNIST dataset on GitHub, and I highly encourage you to check it out. It's very, very cool. We have very similar approaches, but I think his handles touch events and stuff like that, so when we get to the demo, mine unfortunately isn't gonna work on your phone, but if you search for Jeff Busing, you'll find his repo, and his does work on your phone. So I will also accept pull requests from people who set up touch events; that would be cool. Anyway, the front end. My major contribution here is disastrously over-engineering things. I decided to do the front end, which really probably only needs like 30 lines of JavaScript and jQuery, with React, which I think has been proven to be an infinite number of lines of JavaScript. But why not, right? Playing with toys is fun, and playing with tools is fun. And it's no secret that I don't love JavaScript, but I did find ES6 and Webpack and React to be very nice tools, so I think that things are getting better. If you don't remember anything else from this talk, keep that in mind: things are getting better. So this is just an example; I hope you can read it. It's the submit code from the React component. It just uses the fetch API to send the canvas data over to the Sinatra server, which does some processing and sends back some JSON saying, hey, here's what I think that number is. And if you haven't used React, like I said, I enjoy it. This is an overkill example, but the idea is you can build these nice, neat little UIs where you have an editable canvas, a prediction, and then a couple of buttons.
And we'll see that when we take a look at the UI in a little bit. So that's the front end. The back end is Sinatra and Ruby 2.2, soon to be 2.4 (it would have been 2.3, but I got lazy). And the idea here is to use the RubyFann gem to do the training for us, so you can see all the theory we've talked about put into practice. We have some training data that we pull out. We go ahead and create a new instance of the RubyFann artificial neural network. There are going to be 576 inputs, which corresponds to a 24-by-24-pixel image: that's one input for every single pixel in the image. I picked 300 hidden neurons for this; like I said, this is something you can tune to see if you get better or worse performance. And the number of outputs is 10, which, as we talked about, corresponds to the fact that there are 10 possible labels that can come up. And then we just go ahead and train on the data. The values here: train is our training data. 1000 is the maximum number of epochs, which means if you go through and you train and backpropagate, train and backpropagate, 1,000 times and you still haven't hit the error threshold, you're not gonna do any better and you can just stop. The 10 is just for the console output: it tells RubyFann, every 10 epochs, let me know how we're doing. How is the error, and are we still progressing toward that elusive global minimum? And the 0.01 is the desired error, which here is the mean squared error: if we get down to 0.01, we can stop. And mean squared error is, I feel, one of those names that plagues computer science and math: once you know what it means, the name makes perfect sense, but if you don't know what it means, you're lost. Mean squared error is just the average of the squares of everything we were off by. The average part makes sense: you wanna see what the average error is.
We square it, one, to make sure that we always have positive values, so you don't have positive error and negative error cancelling each other out. And we square it partly because it magnifies outliers, so we pay more attention to them, but also because it has some nice algebraic qualities that I do not understand. That's why we square things instead of taking the absolute value; I'm sure somebody on Stats Stack Exchange can tell you a lot more than I can. Cool. So the front end is React and ES6 and stuff like that; the back end is Ruby and Sinatra using the RubyFann gem. I feel like it's enough of me talking for now, and I'm gonna go ahead and do something dangerous, which is a demo. I'm not gonna do any live code. Yeah, exactly. I'm not gonna do anything super crazy, but we'll see how well this works. All right. There we go. Cool. OK. So I'm running this locally. I actually turned the wifi off and practiced this talk without it, because I knew that if I relied on the wifi, something terrible would happen. So as you can see, there's nothing here, and the prediction is two. No, I did it earlier and then cleared it. So let's go ahead and draw a seven, I think. There we go. That's pretty good. It's thinking... oh, it's a three. All right. That's great. There are some where it comes out wrong and I super don't understand why; like I said, these are black boxes. That's a three. That's great. This is a one. Sometimes it thinks ones are zeros for some reason. There we go. And let's do another one. Any recommendations from the audience? Smiley face? Let's do it. Let's see what a smiley face is. Let's see if I can draw a smiley face. There we go. Zero. Yeah, and it only does one digit at a time. So if you do something fancy like this... it's not a very good eight anyway. It's a five. So on average, one-and-eight is a five. Which actually is not so wrong if you were to add one and eight and then divide by two.
I don't think it's doing anything that fancy. And I'll do one more, like a loopy two; it's usually pretty good at loopy twos. Yeah, there we go. Yay. A four, a closed-up four? All right, let's see if I can do this. This also relies on my drawing ability, which is not great with a mouse. OK, four. Eight. Wow. Oh, a seven with a bar? There we go. I'm always surprised when it gets one. All right, I'll try like a European one. There we go. Ah. I'm pleasantly surprised by this. A rounded nine. I just started drawing an eight, so we'll try this. All right. A one with a line. Oh yeah. So let's do the rounded nine. That's not a very good nine; it's not gonna know what that is. Oh, wow. And let's do a one with a bar. Huh. Yeah, there you go. Zero with a slash, all right. Let's do it. Why not? Zero with a slash. Where's my mouse? There we go. Huh. Ah, I would have expected an eight or something. What about a square zero? Oh, square zero. But make it big, make it like... oh no, God, that's all right, I can fix it. Square zero. Neat. That is better than I would have done; if you had shown me this, I would not have known it was a zero. All right. Last chance, any more numbers? Oh, two circle eight. Two circle eight. This part is the reason this talk actually takes 35 minutes; I budgeted like 30 minutes just for playing around. And we had an ampersand, which... that's kind of like an ampersand, right? That's a cursive S. So let's see what a cursive S is. I'm gonna have to look at my keyboard. Oh yeah, you're right, this is an ampersand. Six. But you can see why it would think that; it's sort of six-shaped, almost. Still six. All right. Last one. Stick figure man. I don't know what "last one" means. All right, let's do stick figure man, then I'll be done. Eight. Hey, it kind of looks like an eight. OK, all right. Back to the show. Cool.
All right, so we've reached the point where we start summarizing. What did we learn about? We looked at machine learning and we saw that it was generally (pun intended) generalization. That supervised learning is effectively taking labeled data and figuring out unlabeled data from what we know about the labeled data. And neural networks are awesome; neural networks are super cool. They do have some pitfalls. Like I said, they're a bit of a black box. They can overfit, but if you tune them and play around with them enough, you can get really good results. And you can do all this with Ruby, which is awesome. I've done a fair amount of machine learning stuff the past few months in Python and Java and Clojure, and it's been super, super nice to be able to do it in Ruby. So I'm gonna tell you a scary story, and then I'm gonna finish with an inspirational one, I hope. The scary story is: I was giving a talk similar to this one at EuroClojure on a different dataset. I was building decision trees based on Los Angeles police data from the year 2015. And the idea was, if you know somebody's sex, race, and stop type, which is either pedestrian or vehicular, you can predict the incidence of what they call post-stop activity. The police literature doesn't have a lot on this publicly, but post-stop activity means an arrest or a search. And it turns out that with about 80% accuracy, if you know just somebody's sex and race from the LAPD data, you can determine whether or not there's post-stop activity, which is sort of horrifying on its own. But you'll see stuff in the news now where people say, oh, we can solve this problem with machine learning. We can figure out what this number is. We can figure out, is this a car? We can figure out who to arrest, because the machine will tell us. And the thing is, if you have biased data, or even racist data, you will end up with a racist machine.
And so we have to be extremely careful about what we believe about the data and what data we put in, because it's going to affect what comes out. I just caution you, as you're doing machine learning on large datasets like medical data or police data, things that are a little more emotionally salient and important, frankly, than classifying numbers: think carefully about that. Think about what it means for the machine to say, yeah, you should definitely arrest this person because of their color or because of their sex. We have to be careful. So that was the scary story; this is the inspirational one. This is the TL;DPA, the too-long-didn't-pay-attention version, so we're almost there; we're very close to the end. We can do machine learning with Ruby. But the tools in Ruby are not as good as the tools in Python and Java. Ruby has a phenomenal community, full of smart, motivated, really supportive people. And so if we want to do machine learning in Ruby, we have to build these tools. We have to contribute. We have to maintain. We have to break new ground, and we have to be willing to go out there and build the stuff that we want, to sort of be the change we wanna see in the code, to horribly steal from Gandhi, I think. But we can do it. This is something that we can do if we are diligent and if we really want to. Kenta Murata, I think, gave a really good talk at RubyKaigi about the state of SciRuby and the need for contributors and the need to grow. And whether or not we have bindings for things like scikit-learn, whether or not we have bindings for things like Weka or other Java tools like DL4J, whether we're doing stuff in JRuby or MRI or Rubinius, we need to build these tools if we wanna have them. So please do think about contributing to tools like RubyFann and to projects like SciRuby. Feel free to check out the public version of the Ruby MNIST app, which is ruby-mnist.herokuapp.com.
Like I said, it will not work on your phone, because I'm lazy, but pull requests are welcome. And that's just github.com, my name, Eric Weinstein, and ruby-mnist. And I am super confident that if we work together really hard on this, we can do great stuff like machine learning using a language that we all love. So that's all I got. Again, thanks so much for coming. I really appreciate you all coming here, and I think we have five to ten minutes for questions, if anybody has questions.

No, sorry. So the question is: are there well-maintained, recent RubyGems, things that we know are under active development, that we can rely on to do machine learning? At the moment, there's nothing I know of that's much better or more recent than something like ruby-fann. So if we do want tools like this, we've got to make them.

Sure, the question is: how do neural nets scale when you throw more resources at them? And do you mean in terms of just training faster, or? Either one. Sure. Right, does it get better if you have more neurons? So you will tend to overfit with more neurons per layer, and I think with more neurons generally. But there are some interesting architectures coming out from places like Google that are very deep neural networks with lots of neurons, and they have found ways to mitigate overfitting. One thing you can do is regularization: you can prevent your weights from getting too big. Because the problems with deep networks, there are two. One is that your weights can get very, very large, which contributes to overfitting. The other is that you can have what's called an exploding or vanishing gradient: the idea being that when you go through all these deep layers, the little error signal from the very end that's supposed to backpropagate and adjust your weights becomes extremely weak. There are tools for mitigating that, but large networks can be prone to those problems. So the answer is: not necessarily.
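As a concrete sketch of the regularization idea just mentioned, here's one gradient-descent step with an L2 penalty ("weight decay") in plain Ruby. The learning rate and decay values are arbitrary; this is only meant to show the mechanism that keeps weights from growing too large.

```ruby
# One gradient-descent step with L2 regularization ("weight decay"):
# each weight is shrunk toward zero in addition to following the error
# gradient, which is one way to keep weights from getting too big.
def sgd_step(weights, gradients, lr: 0.1, decay: 0.01)
  weights.zip(gradients).map do |w, g|
    w - lr * (g + decay * w) # decay * w is the gradient of the L2 penalty
  end
end

w  = [2.0, -3.0, 0.5]
g  = [0.0, 0.0, 0.0] # with a zero error gradient, decay alone shrinks weights
w2 = sgd_step(w, g)  # every weight moves slightly toward zero
```

Even with no error signal at all, every weight is pulled a little toward zero each step, which is exactly the "don't let the weights blow up" behavior described above.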
In terms of scaling, there's one thing I can think of, which would be using an ensemble machine learning method like boosting, say. The way boosting works is you have these little learning algorithms, and you can pick whatever you want, but they have to be what are called weak learners, which just means that, on average, if there are two possible labels, they're correct more than half the time. The idea behind boosting is that you train all these little weak learners on subsets of the data, and then they work together: they effectively vote. That's sort of an overgeneralization, but that's how it works. And so if you wanted, you could have a bunch of different machines running neural networks, training on subsets of the data in parallel, and then do something like boosting afterwards. Boosted neural networks do tend to work pretty well. They can take a long time, but like I said, if you're parallelizing the actual neural network training, you might see some savings there.

The issue with boosting: there are two ways it overfits that I know of. One is that if the underlying models overfit, the ensemble will tend to overfit, so you have to be careful about not making those neural networks too big or letting them have high weights. The other issue is in the presence of what's called pink noise, sort of uniform noise. If there's uniform noise in the data, these little weak learners, the way they work is they pay more attention to problems they've gotten wrong, and they'll work extremely hard to pick up right answers to things they've gotten wrong before. So if you have uniform noise in your data, the ensemble will spend an unbelievable amount of time trying to get all these little pieces of noise correct, and that will contribute to overfitting. So I guess the TL;DR there is that they don't necessarily scale well by adding more neurons, but I can see a way you could do it with more machines.
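To show the "weak learners that vote, reweighting what they got wrong" idea, here's a minimal AdaBoost-style sketch in plain Ruby. The weak learners are threshold stumps on a single number, and the six-point dataset is invented purely for illustration; real boosted ensembles would use stronger base models and real data.

```ruby
# Minimal AdaBoost-style sketch: weak learners are threshold "stumps";
# each round reweights the examples the current stump got wrong, and
# the final prediction is a weighted vote. Labels are +1 / -1.
Stump = Struct.new(:threshold, :sign) do
  def predict(x)
    x > threshold ? sign : -sign
  end
end

def best_stump(xs, ys, w)
  candidates = xs.flat_map { |t| [Stump.new(t, 1), Stump.new(t, -1)] }
  candidates.min_by do |s|
    xs.each_index.sum { |i| s.predict(xs[i]) == ys[i] ? 0.0 : w[i] }
  end
end

def adaboost(xs, ys, rounds: 5)
  w = Array.new(xs.size, 1.0 / xs.size) # start with uniform example weights
  ensemble = []
  rounds.times do
    stump = best_stump(xs, ys, w)
    err = xs.each_index.sum { |i| stump.predict(xs[i]) == ys[i] ? 0.0 : w[i] }
    err = [err, 1e-10].max                  # avoid log(0) on a perfect stump
    alpha = 0.5 * Math.log((1 - err) / err) # stump's voting weight
    ensemble << [alpha, stump]
    # Misclassified points get more weight in the next round:
    w = xs.each_index.map { |i| w[i] * Math.exp(-alpha * ys[i] * stump.predict(xs[i])) }
    total = w.sum
    w = w.map { |v| v / total }
  end
  ensemble
end

def vote(ensemble, x)
  ensemble.sum { |alpha, s| alpha * s.predict(x) } >= 0 ? 1 : -1
end

xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys) # the weighted stumps then vote via vote(model, x)
```

The reweighting line is also where the noise problem shows up: noisy points keep getting misclassified, so their weights keep growing, and the ensemble burns effort chasing them.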
Sure, that was a good question. Sure, so the question is: are there other publicly available datasets that I might look at besides the MNIST data? There are a bunch, and they are super interesting. A lot of cities now, like London, LA, and New York, have open data projects. So if you search LA Open Data, you'll find datasets including the LAPD open data. The University of California at Irvine has a very large machine learning repository; they have things like incidence of diabetes in certain populations, or incidence of heart disease in certain populations, or data about abalone. There are a lot of very interesting datasets that are medical, legal, things like that, and like I said, they're sort of more emotionally and culturally salient than just, what number is this? But yeah, there's a whole bunch, and I encourage you to seek those out. I think there actually might be a couple of GitHub repos that are just lists of cool datasets.

Sure, so the question is: if you can't find the data that you want, is there a way to generate it? The short answer is yes. The long answer is that it really depends on what the data are and what you need. I know that there are some initiatives now aimed at places that are historically sort of closed off, things that are very difficult to get, like medical data. So I've been doing some machine learning work on the heart disease dataset from UCI, and the issue is that there are just a lot of dimensions. The data have 13 or 14 attributes, everything from resting blood pressure to cholesterol, and only about 300 instances. And there's an idea in machine learning called the curse of dimensionality: the more features you have that you want to train on, the amount of data you need to effectively train goes up exponentially.
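To put a rough number on that "exponential" claim, here's a back-of-the-envelope sketch: if you discretize each feature into 10 bins, the number of cells in feature space you'd want data for is 10 raised to the number of dimensions. The 10-bin choice is arbitrary, just for illustration.

```ruby
# Crude illustration of the curse of dimensionality: with 10 bins per
# feature, the number of cells in feature space is 10**dimensions.
def cells(dimensions, bins: 10)
  bins**dimensions
end

(1..4).each { |d| puts "#{d} feature(s): #{cells(d)} cells" }
# With ~13 features (like the UCI heart disease data) and only ~300
# instances, almost all of the space is empty:
puts cells(13) # => 10000000000000 (10 trillion cells)
```

Three hundred examples scattered across ten trillion cells is why 13 or 14 attributes with 300 instances is a genuinely hard problem.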
And you can kind of think of it like having a point versus a line versus a square versus a cube: you get this huge increase in the amount of data you would need to sort of fill out those features. And so I know there are some initiatives to work with places like hospitals to anonymize and release large amounts of data, but right now, in terms of generating it yourself, unless you're going to spearhead your own study, I think that while it can be done, it's a time- and resource-intensive thing.

Sure, so the question is: do I know the history of why ruby-fann is a thing, as opposed to writing a binding for TensorFlow? I don't know. The last time I checked TensorFlow, people were working on Ruby bindings for it, but ruby-fann is a few years old; I think it's older than TensorFlow, or at least older than TensorFlow being very popular. So I think part of the reason is that that's what was there when people started doing this. The other thing is, I do think people tend to look at these problem domains and say, well, I really like Ruby, but there's all this tooling like Theano or TensorFlow or scikit-learn, and it's all Python. And a lot of people don't have a strong opinion between Ruby and Python, so they say, I'll do it in Python, it'll be fine. Same thing with Java: if people have DL4J or something like that, they view it as maybe a hassle or unnecessary to say, I'll do it with JRuby instead. So I think there's just inertia there, and I think we really need a concerted effort to either build bridges from Ruby to Python and Java, or to build new stuff in Ruby, if that's the language we want to use for this kind of work.

Sure, so the question is: how do we use machine learning at Hulu, or what are some common use cases for machine learning? So my team does not do machine learning directly.
Our machine learning team is in Beijing, but I know that a lot of our recommendation stuff is machine learning driven. So you can use it for things like recommendations, and image recognition is another common one. If you've ever seen any of these Google deep learning papers where there's a machine that can identify a bird in a picture with like 95% accuracy, that's an application of machine learning. Generally what they're using are what are called convolutional neural networks, or convnets. Basically, and I sort of alluded to this earlier, rather than downsampling and centering your image and looking at it as a whole, you have these little sets of filters that scroll over, that convolve over, the image. You can think of it almost like taking a little piece of paper with a shape cut out and sliding it over a newspaper or something like that: when those things line up, the signal is very high. You can imagine this wave function where, when things line up with that filter, you get a patch of brightness. And so you'll have a little filter that picks up diagonal lines, or a little filter that picks up horizontal lines or vertical lines. These get aggregated into what are called feature maps, and the feature maps are, okay, a stack of filter responses representing this image. Then there's something called downsampling, or max pooling, where you basically pick the brightest spots of the bright things, use those to represent the data, and kind of throw some information away to keep the problem computationally tractable. And you keep doing this over and over, and you can see very cool shapes emerge out of that. And then once you have them, you can train on those data.
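Those two operations, a filter convolving over an image and max pooling the result, can be sketched in plain Ruby on nested arrays. This is just the arithmetic, on a tiny 4x4 toy image with a hand-picked diagonal filter; real convnets learn their filters and run on optimized tensor libraries.

```ruby
# A tiny "valid" 2D convolution plus 2x2 max pooling on nested arrays.
def convolve(image, filter)
  fh, fw = filter.size, filter.first.size
  out_h = image.size - fh + 1
  out_w = image.first.size - fw + 1
  Array.new(out_h) do |y|
    Array.new(out_w) do |x|
      sum = 0
      fh.times { |i| fw.times { |j| sum += image[y + i][x + j] * filter[i][j] } }
      sum
    end
  end
end

def max_pool(fmap, size = 2)
  fmap.each_slice(size).map do |rows|
    rows.first.each_slice(size).each_with_index.map do |_, c|
      rows.flat_map { |r| r[c * size, size] }.max # brightest spot per window
    end
  end
end

# A diagonal-line filter "lights up" where the image has a diagonal:
image = [
  [1, 0, 0, 0],
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 0, 1]
]
diag   = [[1, 0], [0, 1]]
fmap   = convolve(image, diag) # 3x3 feature map, brightest along the diagonal
pooled = max_pool(fmap)        # keep only the strongest responses
```

The feature map is bright exactly where the cutout lines up with the image, and pooling keeps those bright patches while throwing the rest away, which is the downsampling step described above.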
So yeah: image recognition, recommendations, things like that are general applications. Cool. Well, like I said, thanks so much. I really appreciate it. And if you have any questions or want to come find me afterwards, please do.