and Social Vault, who is located here in Kansas City, although I live in Seattle and work remotely. Today we're going to talk about support vector machines. Originally the title was "Recommendation Engines," but then I realized I only had 30 minutes to talk, so I will only cover one thing. And after coming to Ruby Midwest, I figured I should change the title.

So, has anybody here heard of support vector machines? OK, very few. Of course. All right, who here does machine learning, statistics, or math with Ruby? And there's silence. All right, so there is an inherent problem out there. We spend all of this time collecting data and putting it into our data stores. We talk about databases, we talk about key-value stores, we talk about putting things away and then querying them. But really, at the end of the day, we're just storing information. Unfortunately, a lot of us don't think about the things we can use this data for. And honestly, what the hell do we do with all of this data anyway? People are always giving us information, whether it's a star rating, like five stars, or whether they like something or dislike it. And really, as programmers, it should be up to us to figure out what to do with that and to bring it back to the users, so that they have better recommendations of where to go, et cetera.

Now, as all of you guys are here... women too, a couple. Sorry, just making an observation. "Thanks for noticing." Yeah, well, glad you're here.

In Ruby, unfortunately, there are only a few statistics packages out there, and a couple of machine learning packages, but they're generally fairly old. Every time I look at a statistics package, it was written in like 2006 and doesn't compile on 1.9. Or I look at a support vector machine library, and its homepage goes to a blank page that doesn't exist anymore. It's unfortunate, because we're all intelligent people and we should be utilizing the data that we have. And nobody really does it, unfortunately.
Now, of course, you can use C extensions too, but I don't really like writing C extensions, because it can be kind of a pain in the ass. And as a lot of us do, we like to open up an IRB shell, mess around with data, see how it goes, and just play around with things. Writing C extensions for Ruby gems makes that somewhat more difficult, because you have a lot of overhead.

Now, I have a proposed solution for this. Does anybody know the difference between a supervised learning model and an unsupervised learning model? OK, very few people. All right. One thing we can do, since we have all of this data, is use something called a supervised learning model, which means that we are giving it labeled data to learn from. Compare that to an unsupervised model, which goes out and finds structure on its own. Most of the time, we don't really need to go out and query the web as it exists and find new things; we already have the data that we want to use.

Now, there are a lot of supervised learning models, one of them being regression. If anybody went through math or statistics in college: regression analysis, linear regression, logistic regression, those are all actually supervised learning models. But a lot of our data is really just about attributes. For instance, a sentence has swear words in it, or a book is in a genre. It really doesn't work that well to put that into regressions, because those are more for continuous, numerical data, not "is it a horror movie?" or "is it a thriller book?"

And like I said before, Ruby doesn't really have very many support vector machine libraries. There's a couple, but they're just not very well supported. So I'm proposing that you load up JRuby and use the Java packages that are out there. Why JRuby? I think, unfortunately, some of the people that I've talked to think JRuby is really tied to Java. I really don't like Java. Just getting that out there.
I don't like Java, but JRuby is wonderful, because you can take all of the Java packages that are out there, bring them in, and start messing with them in pure Ruby.

Enough of an introduction. In the next part of the talk, we're going to cover the mathematics, very, very high-level math, of support vector machines: how they work and how they're useful for everybody here. Then we're going to get into a little bit of implementation and look at some source code. I'm not going to do a live demo, because I know better than to tempt the demo gods, so I'm going to stay away from that.

With any supervised learning algorithm, there are really only three major steps: you take your data, you put it into the model, and then you can make predictions off of that data. Support vector machines are exactly the same. You organize your data, you map it to hyperplanes, which are just surfaces in n-dimensional space, and then you get back class predictions. Very simple. Totally simple, right? It gets a little bit hard with the math now, so if anybody has any clarifying questions, just shoot your hand up, and I can explain it a little better. I know it's a little bit outside the realm of most Ruby conferences, but stay with me.

So, as you can see, I'm going to use the laser, because this is awesome. If you see this guy, that is just a big splat of data. If you look at it without any of these colors on it, it just looks like a shotgun blast of data. Like, what does this mean? Well, all support vector machines are doing is mapping it to a higher dimension and then splitting the data in two, or three, or five, or what have you. That's what you see when it gets mapped to three dimensions. Does that make sense? You guys still with me? Somewhat? OK, yeah. You can kind of think of it as the United States: we have all of the borders, and we're just defining those little borders so that each state is its own individual piece.
But really, these hyperplanes are like little borders. So, "support vector": I'm sure nobody knows what that means. If you look at these guys, you have dotted lines, right? Those dotted lines are the support vectors. You like the map? Yeah, I'm sure you do. Support vector machines are very simple. What we're doing with these support vectors, the dotted lines, is trying to maximize the margin: we're trying to maximize the distance between the different clusters of data. Make sense? Yes?

"What's the intent behind that margin?" The intent behind maximizing that margin is that you get better classifications. You could put a hyperplane, or a line, anywhere, but then you might have points very close to it that get misclassified. So the idea is to spread the clusters away from each other as far as possible.

All right, hopefully you're all still with me. I'm getting to some code soon; I know this is probably more math than you guys have had in a while. Almost there, stick with me. So, mathematicians are never really satisfied with anything, obviously, and the world is not linear, so they came up with something called the kernel trick, which uses some crazy math to replace all of the linear pieces with an exponential function. Really, what this does is, instead of putting a straight line between things, it uses a curvy line. That's basically what it does.

All right, enough math. I know that probably nobody here really cares about the math; they care about the implementation. Before I go on, are there any clarifying questions, just to make sure? Yes? "To be honest, I didn't understand all of that. Can you sum up in a sentence what you wanted us to get from that?" Yes, of course. The main idea with support vector machines is to build... OK, let me think here about the best way of putting this.
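The "exponential function" behind the kernel trick is, in libSVM's common default, the radial basis function (RBF) kernel. Here is a minimal pure-Ruby sketch of what that kernel computes; the function name and sample vectors are invented for illustration:

```ruby
# RBF (Gaussian) kernel: k(x, y) = exp(-gamma * ||x - y||^2).
# Nearby points score close to 1, far-apart points fall toward 0,
# which is what lets the separating surface bend into a "curvy line".
def rbf_kernel(x, y, gamma)
  squared_distance = x.zip(y).sum { |xi, yi| (xi - yi)**2 }
  Math.exp(-gamma * squared_distance)
end

gamma = 1.0 / 2  # libSVM's suggested default: 1 / number_of_features
puts rbf_kernel([1.0, 0.0], [1.0, 0.0], gamma)  # identical points => 1.0
puts rbf_kernel([1.0, 0.0], [5.0, 4.0], gamma)  # distant points => near 0
```

Swapping this in for the plain dot product is all "replacing the linear pieces" means in practice.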
The best way to describe it: let me go back to this slide. OK, you see the red and the green, right? Now, you could put a line anywhere. It could be straight up and down, it could be whatever. But when you put a line like that, it maximizes the separation between those two clusters, so that you get better classifications. This way, you don't get something that's misclassified, basically. Does that make sense? OK.

"So are you trying to find the best line for the best margin, or are you trying to create a margin?" You don't create a margin; you're finding the best line. Really, think of it as a border. On the left, with the red, you have cluster A, and the green is cluster B. You're looking to divide those two the best way that you possibly can. Does that make sense?

"Are you just taking large amounts of noisy data, sorting it into piles, and the machine figures out what pile to put it in? So if you have a name like Chris, you don't know if it's male or female, and you want to find that line. So you look at their middle name and other characteristics, and you figure out: aha, that's this kind of Chris, or that's that kind of Chris." Yeah, that's actually a pretty good example. For instance, you could have full names, and you could look at first, middle, and last names, and map those to whether the person is male or female. What's really nice is that when you have that classification, it separates things based on, say, whether your middle name is, I don't know, David. "David, yeah, exactly." So Chris David Whatever would probably be male. Really, what it's doing is using the previous data to separate new examples into classes.

Another question? "Yeah, I was just wondering: since this splits the data along different dimensions, how do you choose which features to map to dimensions?"
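To make the "finding the best line" idea concrete, here is a toy pure-Ruby sketch of the one-dimensional hard-margin case, where the maximum-margin boundary is just the midpoint between the two closest opposing points. The function name and numbers are invented for illustration, and it assumes the clusters are separable, with cluster A entirely to the left of cluster B:

```ruby
# One-dimensional hard-margin separator: for separable 1-D data, the
# maximum-margin "hyperplane" is a single threshold halfway between the
# closest pair of opposing points. Those two closest points are the
# support vectors; every other point could move without changing the line.
def max_margin_threshold(class_a, class_b)
  edge_a = class_a.max              # rightmost point of the left cluster
  edge_b = class_b.min              # leftmost point of the right cluster
  threshold = (edge_a + edge_b) / 2.0
  margin    = (edge_b - edge_a) / 2.0
  [threshold, margin]
end

reds   = [1.0, 2.0, 3.0]            # cluster A
greens = [7.0, 8.0, 9.0]            # cluster B
threshold, margin = max_margin_threshold(reds, greens)
puts threshold   # 5.0: halfway between 3.0 and 7.0
puts margin      # 2.0: distance from the boundary to either support vector
```

In higher dimensions the same idea holds; the threshold just becomes a plane (or hyperplane), and finding it takes real optimization instead of a midpoint.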
"Is that something you have to do beforehand, or is it something you get help with?" Yeah, so I'll go over a little bit of this in the source code that I put together. But basically, it's up to you to figure out the features. Obviously, you don't want to take a sentence and map something like "and"; that's a stop word and nobody really cares. You would want to use "Ruby" as a word, or "damn," if you're mapping for sentiment.

Anybody else? OK, I'll repeat the question. He was asking if there is any library out there, if I'm understanding this right, that maps natural language to sentiment. Yes, there is. Somebody did this; I don't know why I can't remember the name, but I can post it later. There are a couple of them out there.

So, back to technology, since this is a technology conference and I've probably overwhelmed you with too much math: libSVM. There are a lot of packages out there that do support vector machines. I believe there's one in Weka, which is a data mining library; it's in pretty much any of the data mining libraries out there. But there's one in particular that just about everybody uses, which is libSVM. Of course, you can Google it and take a look. It's an academic-style project, so there's a little bit of that slant to it. It's pretty neat, and you can get up and running with it fairly quickly. The nice thing is that while it is written in C, they also release a Java port, so you can make a JAR file really easily, pull that JAR file into an IRB session, and start playing around with it, because that's really how anybody learns. There is a Ruby version; there are two, actually, and they both kind of suck. One of them is a SWIG wrapper, and the other one is plain Ruby whose homepage goes nowhere, so I would recommend staying away from those.
All right, let's look at some source code, because I'm sure you guys are tired of me talking about graphs and crazy stuff. Actually, no, sorry. Let me first get to a full example. In my abstract, I mentioned Netflix. Who here uses Netflix? All right, who used to use Netflix but doesn't anymore? All right. As everybody probably knows, when you sign up for Netflix, they have a bunch of movies that they want you to go through and rate, one to five stars each. So you go through and say, "I like these movies," and it says, "oh, well, obviously you might like these." So let's build a Netflix-style movie recommender. It's a fairly easy problem to do.

Basically... oh shit, what did I do? What did I do? All right, let me try that again, sorry. So basically, let's say that we have three movies: Ghost Fever, Attack of the Killer Tomatoes, and The Blob. They all have attributes associated with them; one of them is campy. I don't know if anybody's seen Ghost Fever, but it's pretty awful. It's a guy from Sanford and Son doing a horror movie. It's like the worst idea ever. Then there's Attack of the Killer Tomatoes and The Blob, with teens and whatever. It's really up to you guys to figure out which attributes are important, but in this model, I just figured campy, man-eating monster, the year, those kinds of things are important. Obviously you're going to want more data, but this is just an example.

Now, source code. Can you guys see that OK? Bigger? Good. Good, OK. TextMate may have screwed up for some reason and won't listen to my commands. So all I did here is throw the data into a multi-dimensional array along with the star rating. I said five stars for Ghost Fever, even though it's a really awful movie, and then there's just a hash of each attribute to its numerical value.
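A pure-Ruby sketch of the kind of data structure being described. The attribute names, years, and star ratings here are reconstructed from the talk, not copied from the speaker's actual source:

```ruby
# Each training example: [star_rating, {attribute => numeric_value}].
# libSVM has no booleans, so "has this attribute" is encoded as 1.
FEATURES = [:campy, :man_eating_monster, :teens, :year].freeze

movies = [
  [5, { campy: 1, man_eating_monster: 0, teens: 0, year: 1987 }],  # Ghost Fever
  [4, { campy: 1, man_eating_monster: 1, teens: 0, year: 1978 }],  # Attack of the Killer Tomatoes
  [2, { campy: 0, man_eating_monster: 1, teens: 1, year: 1988 }],  # The Blob
]

# Flatten each hash into a fixed-order numeric vector for the SVM,
# plus a parallel array of labels (the star ratings).
labels  = movies.map { |rating, _| rating }
vectors = movies.map { |_, attrs| FEATURES.map { |f| attrs.fetch(f, 0) } }

p labels          # [5, 4, 2]
p vectors.first   # [1, 0, 0, 1987]
```

The fixed feature ordering matters: each position in the vector is one of the dimensions the SVM separates along.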
libSVM doesn't support things like booleans, but a one works as one. And it's really fairly simple to throw this through the libSVM Java version. So I've got the libSVM JAR here, and I just load a new model. What we're doing here is making a new model; I'll step into that. Honestly, this stuff is all really just a wrapper. It's really easy to load up a Java class in JRuby and start playing with it. There are a couple of things that can be a little annoying. For instance, when you create an array in libSVM, you have to create it at a specific length; otherwise, it'll throw a NullPointerException, which is awful.

So all this does is set stuff up. Then there's gamma, which, if you remember from that little squiggly-line graph, is just a parameter; they recommend that you set it to one over the number of attributes you're looking at. Then you just load it up, come back, and create a new version of the SVM prediction. I'm wrapping it again so that I don't have to call the Java libSVM svm.svm_predict every time; I can just call predict.

Stepping back here: with this very, very small data set, we have something that actually works, which is pretty interesting. If you predict what the star rating for the year 1988 would be, it's two, because I said The Blob was a two. The year stuff on its own is really not that interesting. What gets more interesting is when you start adding things together. You don't actually have to have a fully populated matrix to get some interesting predictions. Man-eating monster at one and teens at one would actually return a two, because in this case it's more pessimistic. "On the next-to-last line..." Yeah, that would not work. Well, you don't like teens. Sorry, that is not on purpose, and I will commit the fix to GitHub. All right. So really, what we're doing here is we have all of these attributes that map to something.
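The wrapper flow being narrated might look roughly like this in JRuby. The class, field, and method names (`svm_problem`, `svm_parameter`, `svm_node`, `svm.svm_train`, `svm.svm_predict`) come from the real libSVM Java API, but the wrapper itself, the JRuby access syntax, and the sample numbers are my reconstruction rather than the speaker's actual code, and it assumes `libsvm.jar` is on the load path:

```ruby
# libSVM's suggested default for the RBF kernel: gamma = 1 / number_of_features.
def suggested_gamma(num_features)
  1.0 / num_features
end

# Tiny stand-in training set (hypothetical numbers, not the speaker's data);
# feature order is [campy, man_eating_monster, teens, year].
vectors = [[1, 0, 0, 1987], [1, 1, 0, 1978], [0, 1, 1, 1988]]
labels  = [5, 4, 2]

# The libSVM Java classes are only reachable from JRuby, so the sketch is guarded.
if RUBY_PLATFORM == 'java'
  require 'java'
  require 'libsvm.jar'                     # assumption: jar is on the load path
  node_class  = Java::Libsvm.svm_node      # libSVM's Java class names are lowercase
  param_class = Java::Libsvm.svm_parameter

  # Build one libSVM row: a fixed-length svm_node[]. libSVM throws a
  # NullPointerException if the array is not sized up front.
  to_nodes = lambda do |vector|
    nodes = node_class[vector.size].new
    vector.each_with_index do |value, i|
      node = node_class.new
      node.index = i + 1                   # libSVM feature indices start at 1
      node.value = value.to_f
      nodes[i] = node
    end
    nodes
  end

  problem   = Java::Libsvm.svm_problem.new
  problem.l = vectors.size
  problem.x = vectors.map { |v| to_nodes.call(v) }
  problem.y = labels.map(&:to_f).to_java(:double)

  param             = param_class.new
  param.svm_type    = param_class::C_SVC
  param.kernel_type = param_class::RBF
  param.gamma       = suggested_gamma(vectors.first.size)
  param.C           = 1.0
  param.cache_size  = 100.0
  param.eps         = 0.001

  model   = Java::Libsvm.svm.svm_train(problem, param)
  predict = ->(vector) { Java::Libsvm.svm.svm_predict(model, to_nodes.call(vector)) }
  puts predict.call([0, 1, 1, 1988])       # classify a new attribute vector
end
```

Wrapping `svm_predict` in a small lambda is exactly the convenience being described: you call `predict.call(...)` instead of spelling out the full Java path every time.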
And then, when you throw an example through the model, it returns back what it guesses to be correct. What can you use that for? A lot of things. You don't have to use it for Netflix. You can use it for spam filtering. You can use it for sentiment analysis. You can use it for swear-word detection, or not swear-word detection exactly, but how angry somebody is. Of course, you have to figure out how to map whether a sentence is angry or not, but it should be easy enough.

"How much data can it deal with, basically?" You can deal with a lot of data. Off the top of my head, you can go to at least hundreds of thousands of rows. It gets a little bit more difficult when you start adding hundreds of thousands of attributes to things, so you should keep that down a bit. You don't want to have an attribute for everything; with too many attributes, it gets a little bit picky.

Yes? "How does this differ from Bayesian classification?" Bayesian classification is really great too, and I actually recommend that everybody look at some of these other models. Bayesian classification tends to work a lot better with continuous data, as opposed to things like "it's a horror movie" or "it's a sci-fi book." With Bayesian classification, you usually have to do something more like counting how many occurrences of a word happen, to figure out whether something is spam and whether you should filter it or not. So it's a little bit more on the continuous axis, while this is a discrete thing.

"From my short experience with Bayesian, it seems you train it a number of times, and potentially even add random noise to it. Do you have to do that here, or is it a one-time train and ready to go?" It's a good idea to keep rebuilding your model as new data comes in, but generally you don't have to add a bunch of noise.
It's pretty good at figuring out classifications. It's a pretty robust algorithm, and a lot of people use it. I've never had to add random noise, and it works fairly well. Anybody else? Yes.

"When you're adding those things, like the monsters and the year, do those become your hyperplanes?" No. Each attribute is actually a new dimension. For instance, if you have man-eating monsters, teens, horror movie, sci-fi, whatever, then you have four different dimensions. So really, we're mapping this into really high dimensions, and usually there's only one hyperplane. Or, sorry, there's a hyperplane for each classification that you have; so with five stars, there would be five.

Sorry, one more thing, about comparing with the C version. I tested it with the Java command-line interface, and it was really slow. But then I tested what I've got on GitHub against the C interface, and it was actually fairly fast. The problem with the example they give you in libSVM is that there's a lot of overhead in starting and stopping the JVM every time. When you write it in this way, it's pretty fast. It's really fast, just as fast as the C stuff, all right?

So I'd like to recommend that you guys look at some of this other stuff. Somebody brought up Bayesian classification; there's also linear regression and logistic regression. Just because I talked about support vector machines doesn't mean that you shouldn't learn about these things. They're all good for their own reasons. All right, so what you guys should get out of this is that we have data, and whether or not you use support vector machines, you really should start thinking about whether you can utilize the data that you have and bring it back to the user in some sort of recommendation format.
For instance, Amazon does this with book recommendations, and after they did that, their revenues went up quite a bit, because when you're able to pair things together, your users will be much, much happier.

So, in conclusion, I have one request of you all. I gave a talk at Red Dirt RubyConf and I bombed it, and maybe I bombed this one too. I did not do so well, and I'm working on it, so please give me feedback. I have all of my Twitter information, email, GitHub, SpeakerRate, et cetera, up here. Any more questions? Oh, yeah?

"Just one more comment: for the array stuff, it's easier to create the array in Ruby-land than in Java. You call to_java and give it the symbol :int." Oh, that's really good to know, sweet. That's awesome, thanks. Can you repeat it? So, if I get this right, you create the array object in Ruby, then you call to_java... "Yeah, you just call to_java, give it an argument of :int, and that will return back a native Java array of that type." OK, so just to repeat that: you create a Ruby array, you call to_java on it, and you give it the argument :int for the type. That's really good to know, actually.

Anything else? All right, thank you. That's it, thank you.
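The audience tip might look like this in practice. `to_java` is real JRuby interop API, but since it only exists under JRuby, this sketch guards on the platform so it also runs elsewhere:

```ruby
# Build the array in plain Ruby first, as the audience member suggested.
ratings = [5, 4, 2]

if RUBY_PLATFORM == 'java'
  java_ratings = ratings.to_java(:int)  # a native Java int[] with the same contents
  converted_length = java_ratings.length
else
  converted_length = ratings.length     # MRI fallback: no Java interop available
end

puts converted_length  # 3 either way
```

This avoids constructing typed Java arrays by hand, which is exactly the fixed-length `svm_node[]` annoyance mentioned earlier.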