You can learn to drive a car by getting the keys and having someone show you how the steering wheel, the gas, and the brakes work. You try it out in a quiet parking lot, then ease yourself onto the open road, and before long you can get yourself to the grocery store and to a friend's house. You can get the basics done.

Now contrast that with becoming a race car driver. In that case you want to get as much performance as possible out of your car, and to do that you need to know how it works down to the nuts and bolts. That's the process of looking under the hood, of opening the black box. If you want to get the most out of your tools, how things work really matters.

To show you what I mean: for most of us beginning machine learning or software engineers, this is what a support vector machine looks like. Here is the black box: you import scikit-learn, cue up a simple example, run it, and you're done. Here are the keys, here's the steering wheel, go. But to get the most out of it, you have to go deeper. My goal in this talk is to walk you through the process of going deeper, and more importantly, to show how any of us can go deep on any subject we want to master.

As I tackled support vector machines, which were fairly new to me when I started this presentation, the first step I took was to read the scikit-learn documentation. This is what I came across. It's not particularly helpful to someone new to support vector machines. It's a nice, concise summary of the basic principles of how they work, but it doesn't explain them to someone who doesn't already know them well. One thing the scikit-learn documents did provide was a great diagram showing two different colors of points being separated by a line. That line looks kind of like a road, where the two lanes are as wide as they can possibly get before they start touching data points.
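That "here are the keys" version really is only a few lines. Here is a minimal sketch of what the black-box usage looks like; the built-in iris dataset and the default settings are my choices for illustration, not anything from the documentation page itself:

```python
# The "black box" version of a support vector machine: import, fit, score.
# A minimal sketch using a built-in toy dataset; every setting is a default.
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = svm.SVC()                    # defaults: RBF kernel, C=1.0
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on held-out data
```

A few lines of real work, and it usually scores well on a toy dataset, which is exactly why it's so easy to stop here and never look inside.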
This was helpful. My next step was to find a nice tutorial and read it. I found one that was highly recommended and started reading through it, and I saw definitions and tables and equations and graphs and diagrams and plots, and very soon got very overwhelmed. There were theorems, for God's sake. I felt strongly that it was probably important stuff, but it was not very accessible to someone new to the topic.

So I took a step back. I set that aside and went to YouTube, and I pulled up some of the most popular videos on support vector machines. In the course of doing this I saw multiple explanations of what they are, what they're used for, how to visualize them, and how the math underneath them works, and the principles finally started to click. They began becoming clear in my brain. Following this, I went and found some more blog posts, again from a variety of authors, each with their own way of visualizing how support vector machines work and explaining the principles behind them. Now the equations and the math and the definitions started to make a little sense, enough so that in the next step I started trying to explain it to myself. The way I do this is by drawing pictures, so I started illustrating some of these concepts in a way that would make sense to my brain.

At this stage, the ideas underneath support vector machines are crystals, nuggets; there's a scaffold, but it's not fleshed out. To flesh it out, I find it very helpful to choose a toy example. This is a bit of an art, and it's trial and error, but the goal is to find an example that's simple enough that you understand it completely and know exactly how it should work, yet just complicated enough to illustrate the principle. For this one I settled on fruit.

So imagine we have fruit. It can be either small or large, and either yellow or purple. Any small fruit is a plum: if it's yellow, it's not yet ripe, but if it's purple, it's ripe and good to eat. Any large fruit is a peach: if it's yellow, it's great to eat, but if it's purple, it's rotten and you don't want to eat it. So in this example we have a world of plums and peaches, with a size axis and a color axis, and you can see that the good things to eat sit at the upper left and the lower right.

Once you have an example like this, get it to where you can explain it to roughly a twelve-year-old, a sixth grader. That means you use words and ideas that are common: no jargon, or if you do use a jargon term, you explain it thoroughly. So here's my attempt to do that for support vector machines.

Imagine you have peaches, and they can be any color between yellow and purple, and you would like to figure out which ones are good to eat. In fact, you'd like to know, if you get a new peach, whether you should eat it based on its color. So you get a bunch of peaches, grab one, and try it. You get one that happens to be yellow; you taste it, and it's good, so you make a green circle and put it at the point that represents its color. You grab another one that's pretty purple; you taste it, and it's nasty, it's rotten, so you put a black X at the point that represents its color. You do this again for a few more peaches, some yellow ones, some purple ones, some in between, and before long you have a data set that looks like this. The green circles all show good peaches; the black X's show bad peaches.

Now that you have all this data, you would like to make a prediction based on a peach's color: do you expect it to be good to eat? Support vector machines allow you to do this. What they do, when you have two groups of data, is put what looks like a road in between them.
There's a dotted center line and two lanes, and the machine tries to make that road as wide as it can possibly get, until the outsides of those two lanes bump up against your two data sets. The center line is the divider between the two groups: anything to the left of it will be assumed good to eat, and anything to the right will be assumed bad to eat. The lanes on either side are called margins, for lack of a better term.

Now imagine, though, that this set of peaches is trickier. What if you have some that don't really follow the group? You get some yellow peaches that just don't taste right, or you get a purple peach that for some reason tastes amazing. With that data set, there's nowhere you can draw a line that separates the green circles from the black X's. What you can do is still create a dividing line with its margins, but any data point that lands on the wrong side of its margin gets a penalty based on how far over it is. You can then move the position of the dividing line and the width of its margins to take those penalties into account and still make the total as small as you possibly can. So you can still use support vector machines in cases where your data isn't completely separable. The fancy term for data you can split with a straight line is "linearly separable"; when you can't, the data is non-linearly separable, and this penalty trick means support vector machines can still handle it.

Now let's look at a different case. Instead of just peaches, we have peaches and plums. The good ones to eat are either yellow peaches or purple plums; a yellow plum isn't ripe, and a purple peach is rotten. We can do the same thing in this world: we try a bunch of fruit of different sizes and different colors. We find that the yellow peaches are delicious and the purple plums are delicious, but the yellow plums are terrible and the purple peaches are terrible. We end up with a data set that looks like this. The challenge now is that if we try to draw a line to separate these out, it's not just that a few data points are going to be a little bit off; there's a whole chunk of our data that we're missing. We're not capturing it well. This data is obviously not linearly separable.

To help us visualize this, I took it into Python and made a different visualization of it, but it's the same thing: green circles and black X's that we'd like to separate, and in this case we can't do it with a line. Support vector machines have an answer to this. Imagine that all of these data points are not on a flat plane but on a sheet of rubber, and you can pick that sheet of rubber up and stretch it and bend it and warp it however you want. You can probably visualize that if you bend it just right and slice it, you can separate the good-to-eat fruit from the bad-to-eat fruit. That is exactly how you do it: with a single straight slice, you can separate these things out nicely.

This trick of warping your sheet of rubber is called the kernel trick. The name refers to how it's calculated, but in practice all you need to know is that you can take the space your data lives in and twist and bend it however you want. You can hold down the middle and pull up all four edges, or pull up all four edges and the middle and leave a low ring around the center, or pull parts up and parts down to make it like an egg crate, so that you can capture really irregularly spaced data, and with a single slice you can separate it all out. The kernel trick is really powerful, and in fact there is no limit to how you can bend this space.

To illustrate another way it's powerful, let's consider a slightly different problem. Now we have fruit where we don't care about the size, but we have five different colors. Green peaches are unripe, not yet ripe. Yellow peaches are ripe, so they're good. Orange fruit is an unripe plum, purple is a ripe plum, and black fruit is rotten. So the good ones to eat are the yellow peaches and the purple plums; any other color is bad to eat. You can see that all of this data sits on one line, and there's no nice way to slice it to separate the green circles from the black X's.

But we can use the kernel trick: we can take that line and bend it however we want. One way to do that is to make a single bend in it, like a smiley face. And you can see that, great, we bent it, but those circles and X's are still not laid out so that a single cut can separate them. Now, the cool part about the support vector machine kernel trick is that you can come back and bend your space again in a different direction. This represents a two-dimensional kernel, and now it's not too hard to see that with the right slice you can come in with a plane and separate the green circles from the black X's. If you look carefully, you'll notice that this particular slice is not exactly the one you want; it kind of misses, but you can imagine where it would go. So here we did two different warpings: we took our line, bent it in one direction, and then bent it in another. Because of the math (this part is a little mind-blowing), you can actually take whatever space your data is in and bend it in an infinite number of directions, so that you can slice it and separate out your two groups of data. That is pretty powerful stuff.
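This bend-until-you-can-slice idea can be sketched in code. Below is a hypothetical version of the one-dimensional fruit problem; the numeric color positions and labels are made up for illustration. Good fruit sits in two separate bands along the color axis, so no single straight cut can split good from bad, but an RBF kernel, which can bend the space as many times as it needs, handles it:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D fruit data: position along a green-to-black color axis.
# Good fruit (label 1) sits in two separate bands (the yellow band and the
# purple band), so no single threshold on the line separates the classes.
color = np.array([0.05, 0.1, 0.25, 0.3, 0.5, 0.55, 0.75, 0.8, 0.9, 0.95]).reshape(-1, 1)
good  = np.array([0,    0,   1,    1,   0,   0,    1,    1,   0,   0])

straight = SVC(kernel="linear").fit(color, good)               # no bending allowed
bent     = SVC(kernel="rbf", gamma=50, C=10).fit(color, good)  # bend as needed

print("straight cut accuracy:", straight.score(color, good))
print("bent space accuracy:  ", bent.score(color, good))
```

The linear kernel can do no better than a single threshold on the color axis, while the RBF kernel effectively lifts both good bands up so that one slice catches them together.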
This is what support vector machines do: they find the best slice that separates two groups of data, and if your data is hard to separate, you can warp and twist your space until you find a way to separate it.

Okay, so that's the explanation. Now comes the most important part. In addition to understanding the strengths of a method, you have to understand its weaknesses if you're going to use it well and push it to its limits.

With support vector machines, one issue is data with lots of error. Notice that any time we find a slice between two groups of data, the location of that slice depends almost entirely on the very nearest data points. For the other points, it doesn't matter whether they're close to the margin or miles away; it's those nearest data points that determine exactly where the margin is going to be. And if each of those points has a lot of error associated with it, then that error gets a really loud vote, more so than most of your data. So that is one issue.

Another way it can break is if you choose the wrong kernel. Look back at the data set where we were bending our sheet of rubber: if we bent it the wrong way, we would not be able to separate out our data sets.
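To make the wrong-kernel failure concrete, here's a sketch using a made-up data set shaped like a ring of bad fruit around a cluster of good fruit, generated with scikit-learn's make_circles helper. A linear kernel can't bend the space at all, the default cubic polynomial kernel bends it the wrong way for this shape, and the RBF kernel bends it the right way (pulling up the middle of the sheet):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Made-up shape: a ring of one class surrounding a cluster of the other.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    scores[kernel] = clf.score(X, y)   # training accuracy, just to compare fits
    print(f"{kernel:>6}: {scores[kernel]:.2f}")
```

The linear kernel hovers near chance because no straight cut separates a ring from its center, while the RBF kernel separates it almost perfectly. Choosing between them is exactly the bend-it-the-right-way judgment described above.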
We had to bend it just the right way, and that's choosing the right kernel. The act of choosing the right kernel is an art. It's done by trial and error, and after a while, by experience.

Finally, large data sets can break support vector machines. Calculating the kernel, some kernels especially, can be very expensive and take a lot of computing power. If you're dealing not with hundreds of data points but with billions, the time to calculate those kernels is prohibitive. So with large data sets you have to stick with linearly separable problems, which means going back and hand-engineering features that help your data separate, so that you can split it with a single straight cut.

Each of these weaknesses requires a human in the loop to determine when it's a problem and to work around it. This is important to know. It means that support vector machines are powerful, but to get the most out of them, you need someone who has used them quite a bit and understands them well. It doesn't mean they can't be used, but it's important to keep in mind when you're deciding what method to use on your problem.

Now, taking a step back: that was a quick walk through support vector machines. Hopefully you understand them a little better than you did before, if you were new to them, and going into another project, you know what you need to do to make good use of them.

A comment on the process we went through together: there was no formal education, no coursework, no textbook, no professor, no permission granted, no special libraries, nothing purchased. This is all information that's out there. This is something you can do with any tool you want. You can open the box and see what's inside. You can lift the hood and see what the pieces are and how they work together. It's not an easy process, and sometimes it's quite painful, but it is something you have at your disposal, and in the case of my key tools,
I have found it well worth my while. So I encourage you: when you have something you think you might need to use heavily, take some time, open the box, and figure out how it works, so you go from a grocery-getter to a really high-performing race car. Thank you.