I'm very nervous, and when I am very nervous I talk very fast, so we're going to go really fast through this thing. I'm here to talk about locality sensitive hashing, a really fun algorithm I learned about while working at SoundCloud, where I serve some amazing teams of data scientists as their humble manager. Clear? Okay.

Before I talk about locality sensitive hashing I need to review a few concepts so that we're on the same page. To wit: we will define vectors; then we will talk about machine learning algorithms and how they produce vectors; after that we will show how, in the recommendation algorithms where I work, we use these vectors and compare them to each other to produce recommendations; we're going to talk about how slow that is; and then we're going to talk about how locality sensitive hashing helps make this faster.

Okay, vectors. Without going into the axioms of a vector space, a vector is a point in n-dimensional space relative to an origin. For example, here is a vector in one-dimensional space: it's a point on a line relative to this zero. Here is a vector in two-dimensional space: it's a point on a sheet of paper. Here's a vector in three-dimensional space, flexing my ability to do drawings in Google Slides: it is a point in a cube. And here is a vector in four-dimensional space, where the fourth dimension is time. If you want n dimensions, you can do this by just having n components that uniquely describe a point relative to an origin. Now you all know what a vector is, so we can talk about machine learning.

Many machine learning algorithms take sequences of things and produce vectors for those things. Here is the real live SoundCloud recommendation algorithm, taking sequences of user listening histories and producing a vector for each track. These algorithms do this in such a way that track vectors that are close to each other in this space happen to also be similar, and we use this to build recommendations. For example, here is a two-dimensional track vector space with some of my favorite albums. If we want to build recommendations for Joni Mitchell, we just compare her vector, by distance or by cosine angle, to all the other tracks in the catalog and pick the top five. If you then want to produce recommendations for Bruce Cockburn's self-titled album from 1970, you have to compare it to all the other tracks in the catalog and pick the five best ones, and this takes a really long time. It's known in computer science as the k-nearest-neighbors problem, because for each track you have to compare it to the rest of the catalog and produce the top five tracks, or as I like to call them, the k lit tracks. Thank you.

Now it turns out that doing all these comparisons is really expensive. If you have n tracks in your catalog, it takes on the order of n² steps, because you have to compare every track to every other track. And if your catalog of tracks is 130 million, like it is at SoundCloud, then this just takes too damn long. In code, the brute-force approach looks something like the sketch below.
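To make that concrete, here is a minimal sketch of the brute-force comparison, assuming the track vectors live in an (n, d) NumPy array and that we score similarity by cosine angle. The name `cosine_top_k` is illustrative, not SoundCloud's actual code.

```python
import numpy as np

def cosine_top_k(vectors: np.ndarray, query_index: int, k: int = 5) -> np.ndarray:
    """Return the indices of the k tracks most similar to one query track."""
    # Normalize rows so a dot product equals cosine similarity.
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = normed @ normed[query_index]   # cosine similarity to every track
    scores[query_index] = -np.inf           # don't recommend the track to itself
    return np.argsort(scores)[-k:][::-1]    # top k, best first

# One query costs n comparisons; doing it for all n tracks costs O(n^2).
```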
Okay, so thankfully some people came up with this really great algorithm called locality sensitive hashing that's going to solve all these problems for us. Quick break: let me recap what we've talked about so far. We talked about vectors as points in space relative to an origin. We then talked about machine learning algorithms that take sequences of things and produce vectors for those things. We then talked about how to use those vectors, which are close and therefore similar, to produce recommendations by comparing them to each other, and how this takes too damn long. Locality sensitive hashing is hopefully going to make this a lot easier for us.

Oh, and let me define what that is. Locality sensitive hashing is an algorithm that takes vectors as input and produces binary as output, and it does this in such a way that small changes in the input manifest themselves as small changes in the binary output; in this sense it preserves locality. I promise you'll have an intuition for this by the end of this talk.

Before I talk about the algorithm, let me give you an analogy. How many of you live in Queens? Yeah, okay. How many of you who live in Queens have asked a friend who lives in Brooklyn over for dinner, and that friend said Queens is too damn far? Okay, yeah. I stayed in Queens once; it was awesome. I'm from Berlin, by the way, and originally from Canada. Now, the reason your friend said it was so far away is that we draw these arbitrary boundaries everywhere. We draw a line, we call it a neighborhood, and anyone on the other side of that line is just too damn far. And this is at the heart of locality sensitive hashing: you draw arbitrary lines, and people on the other side are just too damn far.

So let's talk about this algorithm. On the left we have vectors, and on the right we have binary; I told you this algorithm takes vectors and produces binary, so that's what we're going to do. This vector space is two-dimensional: one axis is happiness, the other axis is how dead you are. The algorithm goes as follows. Pick a random plane that passes through the origin, then pick an orientation, and assign vectors on one side of the plane a one and vectors on the other side of the plane a zero. Don't stop: pick another random line that passes through the origin, pick an orientation, assign vectors on one side a one and vectors on the other side a zero. Don't stop: pick a random plane, pick an orientation, assign vectors on one side a one and vectors on the other side a zero. And don't stop: pick one more random plane, pick an orientation, assign vectors on one side a one and vectors on the other side a zero. As you can see, we now have vectors on the left-hand side and binary on the right-hand side. That is locality sensitive hashing. Here's roughly what that hashing step looks like in code.
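A rough sketch of random-hyperplane hashing, assuming the same kind of NumPy vectors as before. `hash_vectors` is an illustrative name, and real implementations differ in details.

```python
import numpy as np

def hash_vectors(vectors: np.ndarray, n_planes: int, seed: int = 0) -> np.ndarray:
    """Hash each row of `vectors` to an n_planes-bit code, one bit per random plane."""
    rng = np.random.default_rng(seed)
    # Each plane through the origin is described by its normal vector.
    planes = rng.standard_normal((n_planes, vectors.shape[1]))
    # A vector gets a 1 if it lies on the positive side of a plane, else a 0.
    return (vectors @ planes.T > 0).astype(np.uint8)

# Example: hash three 2-D "emoji" vectors with four random lines.
emoji = np.array([[1.0, 2.0], [1.1, 2.1], [-3.0, 0.5]])
print(hash_vectors(emoji, n_planes=4))   # three 4-bit rows, e.g. [1 0 1 1]
```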
Okay, so now I need to show you two things: one, that this algorithm preserves locality, and two, how it can help us find recommendations. Let's look at this binary representation very quickly. I'm going to make a claim: the probability that a randomly picked line separates two emoji is proportional to the angle between them. Think about this for a second. You have two vectors that are really close together, and we're picking random lines. What is the likelihood that your line separates them? It's pretty small. Now, if the vectors are far apart, the likelihood that a random line separates them is much larger. In this way the two are proportional.

Now I'm going to give you all a quiz, and I'm not going to ask you for the answer because we don't have enough time. Looking only at the bit strings, how many lines separated these two emoji? I'll give you a hint: the answer is one, and it is exactly the bits that differ between the two bit strings. So I now claim that if you count the number of bits that differ between two of these binary representations, this count is proportional to the angle between the emoji. And of course this is the case, because the only time a bit differs is when you picked a plane that separates the two emoji, and the likelihood of doing that is proportional to the angle between them. A more precise way of saying this is that the Hamming distance, which is the number of bits that differ, divided by the length of the bit string is proportional to the angle between the vectors. That's what we mean when we say this algorithm preserves locality: somehow the binary representation remembers which side of each line a vector fell on, and that is related to the angle. We can check this claim with a few lines of code.
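Here's a quick, hypothetical check of that claim, reusing the `hash_vectors` sketch from above; the dimensions and seeds are arbitrary.

```python
import numpy as np

# Two random 50-dimensional vectors and the true angle between them.
rng = np.random.default_rng(1)
a, b = rng.standard_normal((2, 50))
angle = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hash both with many random planes (hash_vectors from the sketch above).
codes = hash_vectors(np.stack([a, b]), n_planes=4096)
hamming_fraction = np.mean(codes[0] != codes[1])   # differing bits / total bits

# Each random plane separates the pair with probability angle / pi,
# so with enough planes these two numbers should roughly agree:
print(angle / np.pi, hamming_fraction)
```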
But I haven't told you how to use this to find k nearest neighbors, so let's do that right now. Take yourself back, in your mind, to the bucket analogy: a bucket is a region carved out by the lines. Suppose I want to order these emoji in such a way that emoji in the same bucket appear next to each other. For this bucket, all I need to do is take this representation and lexically sort it, which means sorting from left to right where the alphabet is just zero and one. I do need to make sure that my red and blue bits come first, and in this representation they are. I give it a sort, and emoji that are in the same bucket appear next to each other. We get some false positives, like the ghost, but we'll filter those out later. And we can keep doing this; we can do it for another bucket.

Here is the bucket spanned by the blue and green lines. Because we chose these lines randomly, we can permute the bits in any order, so we just permute them, lexically sort, and we see again that emoji in the same bucket end up next to each other. We can keep doing this, random permutation of bits followed by lexical sorting, to find candidates, and this becomes our locality sensitive hashing algorithm.

So let me summarize the algorithm in its entirety. You have a vector space full of things. You pick a random plane that cuts the space in half. You choose an orientation and assign vectors on one side of that plane a one and vectors on the other side a zero. You repeat this d times, and for each vector you now have a binary representation that is d bits long. You then take these representations, randomly permute the bits, and lexically sort them to find candidates for your recommendations, and finally you use cosine similarity to filter out the bad ones. There's a sketch of this candidate search at the end of the talk. That's it.

Why is this better? Well, first of all, I told you our original algorithm was n². But sorting is n log n, so if we permute the bits and lexically sort s times, our algorithm is just s · n log n, which is much faster than n² when you have 130 million tracks. Also, we can use the fact that the Hamming distance is proportional to the angle to reduce all angle calculations to plain bit math, which CPUs are very good at, and this is the only time in my career as a software engineer, which is very short, where a constant factor actually mattered in an algorithm.

Okay, that is the end of locality sensitive hashing: an algorithm that takes vectors as input and produces binary as output in such a way that it preserves locality, and it does this by splitting the space in half with planes and assigning each vector ones and zeros depending on which side of those planes it falls. You might ask: is there more? This algorithm only preserves the angle, and you might say, I want to preserve the lengths of these vectors too, because hey, maybe that's important. There is an algorithm that does that. You might say, hey, all my vectors have a rank associated with them, because we put PageRank in there somewhere; you can do that too, and it's called rank-based hashing. In fact, you might even say, hey, why did we pick these planes so randomly? That seems absurd; why didn't we look at the data? Well, you can do that, and it's called supervised hashing: you pick the best planes you want. And because it's 2017, somebody has figured out how to do deep hashing using deep learning. This is my favorite webpage on the internet, by the way, and there are many papers about deep hashing. I have no idea how that works, but that's it. That is locality sensitive hashing. May all your vectors be hashed.
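As promised, here is a sketch of the permute-and-sort candidate search, under the same assumptions as the earlier sketches: `codes` is the (n, d)-bit array produced by `hash_vectors`, and `lsh_candidates`, the window size, and the pair bookkeeping are illustrative choices, not a canonical implementation.

```python
import numpy as np

def lsh_candidates(codes: np.ndarray, n_permutations: int = 10,
                   window: int = 2, seed: int = 0) -> set[tuple[int, int]]:
    """Collect candidate neighbor pairs by repeated bit permutation + lexical sort."""
    rng = np.random.default_rng(seed)
    n, d = codes.shape
    pairs: set[tuple[int, int]] = set()
    for _ in range(n_permutations):
        perm = rng.permutation(d)                    # shuffle which bits come "first"
        # np.lexsort treats its last key as primary, so reverse the rows
        # to make the first permuted bit the primary sort key.
        order = np.lexsort(codes[:, perm].T[::-1])
        # Items that land near each other in sorted order share a bucket prefix.
        for i in range(n - 1):
            for j in range(i + 1, min(i + 1 + window, n)):
                a, b = order[i], order[j]
                pairs.add((min(a, b), max(a, b)))
    return pairs  # filter these candidates by true cosine similarity afterwards
```

With s permutations, the work is dominated by s sorts of n codes, i.e. O(s · n log n), which is where the speedup over the n² brute force comes from.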