Hello again, everyone. In our last video we implemented the beginning of a hash table data structure, and we ran into the problem of a hash collision, which is when two different values hash to the same index. Even when a collision occurs, we still want to put both values in the hash table and be able to search for them later. So how do we deal with that?

First, a formal definition: a hash collision occurs when different values hash to the same hash value. Say we apply the hash function to two integers, 5 and 25, and they hash to the same value (with the modulo method on a 20-slot table, both land in slot 5). That's a hash collision.

There is such a thing as a perfect hash function, one that maps each value from the great big, potentially infinite input space to a unique hash value. Many cryptographic hashing algorithms come close, in the sense that collisions are extremely hard to find. But when there are more possible inputs than hash values, a perfect hash function is impossible: some values must share a hash value. (Perfect hash functions can be built only for a fixed, known set of keys.) So perfect hashing is a concept we should be aware of, but in general, hash collisions are going to occur.

One way to reduce them is to use a better hash function. For example, the modulo hash function isn't so great; you get a lot of collisions with it. The mid-square method is quite a bit better in that respect, because of the mathematical properties of the way the numbers are distributed. So the first step in dealing with hash collisions is to avoid them in the first place, and you can do that in two ways: one, use a good hash function; two, make the table bigger, because the more slots there are to hash into, the less likely a collision becomes. So there are two aspects to this problem: avoiding collisions, and handling them when they do happen.
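To make the 5-and-25 example concrete, here is a minimal Python sketch of the two hash functions mentioned. The modulo method is as described above; the mid-square details (square the key, keep the middle two decimal digits, then reduce modulo the table size) are an assumed common variant, and both function names are hypothetical. A single pair of keys is not evidence that one method is better in general; it only shows a collision under one function and not the other.

```python
def modulo_hash(key: int, table_size: int) -> int:
    """Modulo (division) method: the slot is the remainder of key / table_size."""
    return key % table_size


def mid_square_hash(key: int, table_size: int, digits: int = 2) -> int:
    """Mid-square method, one assumed variant: square the key, keep the middle
    `digits` decimal digits of the square, then reduce that modulo table_size."""
    squared = str(key * key)
    if len(squared) <= digits:
        middle = squared  # square is too short to trim, so use all of it
    else:
        start = (len(squared) - digits) // 2
        middle = squared[start:start + digits]
    return int(middle) % table_size


TABLE_SIZE = 20  # the 20-slot table used throughout the lecture
print(modulo_hash(5, TABLE_SIZE), modulo_hash(25, TABLE_SIZE))          # 5 5: a collision
print(mid_square_hash(5, TABLE_SIZE), mid_square_hash(25, TABLE_SIZE))  # 5 2: 625 -> "62" -> 2
```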
Even when you get a collision, you still want to put both values in the hash table somewhere. There are two strategies for dealing with hash collisions. The first is called open addressing, which means finding the next empty slot using some search strategy. The second is called chaining: instead of a hash table where you put one value in each slot, each slot is actually a list of values. We'll take a look at both.

First, open addressing. Imagine you've got a nice, small hash table with only 20 slots and some values in it. Open addressing says: find the next empty slot. There are two strategies for doing that, and we'll look at an example of how each works.

The first is called linear probing. If two things hash to the same value, linear probing is just a fancy way of saying: go to the next slot and see if it's empty. If so, put the item there. If that next slot isn't empty, go to the next one, and the next one, and just keep going up the array until you find an empty slot. That's all linear probing is. The problem is that it's susceptible to what we call clustering. If a lot of values hash to the same slot (in this 20-slot table, for example, inserting 20, 40, 60 and 80, which all hash to 0), they all end up clustered together. Then, when you look for a value, you're basically searching a list, and that's not good.

A slightly better approach for open addressing is called quadratic probing: instead of just going to the next slot, we take bigger and bigger hops down the array until we find an open slot. Let's see what these look like in practice. I've got my hash table here, with 20 empty slots.
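A minimal sketch of linear probing, assuming the modulo hash on the 20-slot table and assuming that probing wraps around from the last slot back to slot 0 (the lecture only says to keep going up the array). `linear_probe_insert` is a hypothetical helper name. Inserting 20, 40, 60 and 80 reproduces the cluster described above.

```python
from typing import List, Optional

TABLE_SIZE = 20  # matches the 20-slot table in the lecture


def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key using the modulo hash plus linear probing; return the slot used."""
    size = len(table)
    index = key % size
    for _ in range(size):
        if table[index] is None:
            table[index] = key
            return index
        index = (index + 1) % size  # assumed: wrap around to the start of the array
    raise RuntimeError("hash table is full")


table: List[Optional[int]] = [None] * TABLE_SIZE
slots = [linear_probe_insert(table, k) for k in (20, 40, 60, 80)]
print(slots)  # [0, 1, 2, 3]: every key hashed to 0, so they pile up in one cluster
```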
All right, let's illustrate what linear probing looks like with some simple values. We'll use the modulo method as our hash function, because it's the simplest. If we hash 22, it goes to slot 2. Now let's hash the value 2. Using the modulo method, 2 also hashes to slot 2. Linear probing says: there's already something in slot 2, so just increment by 1 until you find an empty slot. Slot 2 is full, so go up to slot 3. Is slot 3 empty? Yes, it is, so that's where 2 goes. Not too sophisticated.

Next, let's insert 7, which the modulo hash function puts right in slot 7. Then a more interesting value, 55: 55 mod 20 is 15, so it goes in slot 15. We're starting to fill up the table a little.

Now let's hash something we know is going to collide: 27. 27 mod 20 is 7, so it wants to go into slot 7, but there's already something there. So where does it go? The next slot, slot 8. So 27 hashes to 7 but actually winds up in slot 8, just as 2 hashed to slot 2 but wound up in slot 3.
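The walkthrough above can be replayed with a small class. This is a sketch, not the course's code: `LinearProbingTable`, `insert`, and `find` are hypothetical names (the lecture's own get method isn't shown here), it assumes wrap-around at the end of the array, and it supports no deletions, which is what lets `find` stop at the first empty slot.

```python
from typing import List, Optional, Tuple


class LinearProbingTable:
    """Fixed-size open-addressing table using the modulo hash and linear probing."""

    def __init__(self, size: int = 20) -> None:
        self.slots: List[Optional[int]] = [None] * size

    def insert(self, key: int) -> int:
        """Place key in the first free slot at or after key % size; return that slot."""
        size = len(self.slots)
        index = key % size
        for _ in range(size):
            if self.slots[index] is None:
                self.slots[index] = key
                return index
            index = (index + 1) % size  # assumed wrap-around
        raise RuntimeError("hash table is full")

    def find(self, key: int) -> Tuple[Optional[int], int]:
        """Follow the same probe sequence as insert.

        Returns (slot, probes); slot is None if the key is absent. An empty slot
        ends the search, because insert would have used it (no deletions here).
        """
        size = len(self.slots)
        index = key % size
        for probes in range(1, size + 1):
            if self.slots[index] is None:
                return None, probes
            if self.slots[index] == key:
                return index, probes
            index = (index + 1) % size
        return None, size


table = LinearProbingTable()
print([table.insert(k) for k in (22, 2, 7, 55, 27)])  # [2, 3, 7, 15, 8]
print(table.find(27))  # (8, 2): hashes to 7, found on the second probe
```

Continuing with 82 and then 62 on this table puts them in slots 4 and 5, and finding 62 afterward takes four probes: the growing cluster discussed next.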
Let's do one more: 82. 82 divided by 20 is 4 with a remainder of 2, so 82 mod 20 is 2, and slot 2 is where it wants to go. Once again there's no room there, so we linearly probe up one slot. Is slot 3 open? No, afraid not, so we go up one more, to slot 4. Is slot 4 open? Yes, it is. So when we hash 82, it winds up in slot 4.

That's linear probing. The problem with it is that you tend to get groups, or clusters, of values, and clusters aren't good. Why? If you now try to insert, say, 62, where does it go? It starts at slot 2, then goes to 3, then 4, then 5. We're starting to walk down the array, and that's not good, because the benefit of a hash table is its O(1) search time. The same goes for lookups: if we're looking for the value 82 and call our get method, we come to slot 2 and it's not there. Maybe a collision occurred, so we look in the next slot. Is it in slot 3? No. Maybe it's in the next slot: is it in slot 4? There it is. So with linear probing, you start having to walk a list, and that's not so good. This is clustering.

A slightly better approach is called quadratic probing. It's the same premise, except we take bigger and bigger hops each time, to spread the data out through the hash table. Let's start over with an empty table and put some of the same values back in. Again using the modulo method, 22 resolves to slot 2 right off the bat. Quadratic probing tries slot hash + i² for i = 1, 2, 3, and so on (wrapping around past the end of the table if needed) until it finds an open slot, where hash is the key's original hash value. Now let's hash 2. Where is it going to go?
Initially it wants to go to slot 2 again, but slot 2 is occupied. So where does it go? It goes to hash + i². What's i? i is the number of times you've had to probe, so on the first try i is 1, and hash + 1² = 2 + 1 = 3. So we only jump to slot 3 to begin with. Not a big improvement yet: the two values are still side by side.

But now let's insert 82. Using our mod function, 82 initially wants to hash to slot 2. Slot 2 is full, because 22 is there, so we try the next spot with i = 1: 2 + 1² = 3. Is there something in slot 3? Yes, the value 2. So now i = 2, and we compute hash + i², where hash is the initial hash value, 2: that's 2 + 2² = 6, just as slot 3 was 2 + 1² = 3. So we jump all the way up to slot 6, and there's space there, so 82 goes in slot 6.

Let's do one more. Which value would go to slot 2? 42 would. When we hash 42, we get 2, which we know is occupied, so let's apply our quadratic probe. First iteration, i = 1: 2 + 1² = 3. Is slot 3 open? No, the value 2 lives there. Next iteration, i = 2: 2 + 2² = 6. Is slot 6 open? No, afraid not. We go again with i = 3: 3² is 9, and 9 + 2 is 11. Is slot 11 open?
Yes, it is, so that's where we put 42. That's the concept of quadratic probing, and as you can see, even with these few values it starts to spread things out and keeps values that hash to the same slot from clustering. That's a good thing.

So that's linear probing and quadratic probing. Back to the slides: there's one more strategy we'll talk about briefly. Linear probing and quadratic probing are both strategies known as open addressing: find the next open slot in the hash table using some algorithm. The other option is called chaining. Instead of holding just one value, every slot in the hash table holds a list, and whenever there's a collision, you just add the new value to the end of the list at that slot. This isn't a bad idea, and it's actually pretty simple from an implementation standpoint. The downside is that you can wind up wasting a lot of space: as you'll remember from our discussion of array lists, each array you allocate takes a chunk of memory, so if you have an array at every slot, you're potentially occupying a lot of memory. We also lose some of the power of searching, since finding a value means scanning the list at its slot. So we're going to stay away from chaining and stick to an open addressing scheme.

There's one more concept you need to be aware of, because it has a huge bearing on how well your hash table performs at search, and our goal is to keep everything O(1). It's called the load factor. The more items you have in your hash table, the more likely a collision is to occur.
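Returning to the quadratic probing walkthrough, here is a sketch that replays it (22, 2, 82, 42 on an empty 20-slot table). It assumes the modulo hash and wraps indices with `% size`; `quadratic_probe_insert` is a hypothetical name. Probe i checks slot (hash + i²) mod size, with i = 0 being the key's home slot.

```python
from typing import List, Optional


def quadratic_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key with the modulo hash plus quadratic probing; return the slot used.

    Probe i looks at (home + i*i) % size, where home = key % size and i = 0, 1, 2, ...
    """
    size = len(table)
    home = key % size
    for i in range(size):
        index = (home + i * i) % size  # wrap around past the end of the table
        if table[index] is None:
            table[index] = key
            return index
    # The probe sequence can revisit the same slots, so it may miss free ones.
    raise RuntimeError("no free slot found on the probe sequence")


table: List[Optional[int]] = [None] * 20
print([quadratic_probe_insert(table, k) for k in (22, 2, 82, 42)])  # [2, 3, 6, 11]
```

One caveat about this design: unlike linear probing, quadratic probing can fail to reach a free slot even when one exists, because i² mod 20 only takes six distinct values (0, 1, 4, 5, 9, 16). A commonly cited remedy is a prime table size with the table kept at least half empty, in which case a free slot is guaranteed to be found.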
The fewer open slots there are, the more likely you are to run into a slot that's already occupied. The ratio of the number of filled slots in the hash table to the total number of slots is what we call the load factor. For example, if this hash table has 20 slots and 16 items, the load factor, denoted by the Greek letter lambda (λ), is 16 / 20 = 0.8, so the table is 80 percent full. It's a good idea to keep the load factor below a certain threshold to avoid collisions.

How do you keep it below that threshold? When you insert into the hash table, check the load factor: has it gone above a threshold like 66% or 75%? If so, grow the hash table: allocate a bigger chunk of memory, put everything back into the hash table, and start fresh with everything spread out again. Reinserting means rehashing each value, because its slot depends on the table size. This is similar to how Python's list works: once its underlying array gets full, it finds a bigger chunk of memory and copies the elements over, so you can keep adding to the list. We'll do something similar with the hash table: once it gets too full, we'll allocate more room and reinsert everything.

So those are the critical hash table concepts you need to know: hashing, hash values, slots, collisions, how collisions can be resolved through open addressing (linear or quadratic probing) or chaining, and the load factor.
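A minimal sketch of the load-factor check and resize, under stated assumptions: linear probing for placement, a 0.75 threshold (one of the two values mentioned), doubling the slot count when the threshold is exceeded, and no duplicate-key handling. `ResizingHashTable` and its methods are hypothetical names, not the course's code.

```python
from typing import List, Optional


class ResizingHashTable:
    """Open-addressing table (linear probing) that grows when its load factor gets too high."""

    def __init__(self, size: int = 20, max_load: float = 0.75) -> None:
        self.slots: List[Optional[int]] = [None] * size
        self.count = 0            # number of filled slots
        self.max_load = max_load  # assumed threshold (< 1); the lecture mentions 66% or 75%

    def load_factor(self) -> float:
        """Lambda: filled slots divided by total slots."""
        return self.count / len(self.slots)

    def _place(self, key: int) -> None:
        """Linear-probe key into the current slot array.

        Assumes max_load < 1, so a free slot always exists when this is called.
        """
        size = len(self.slots)
        index = key % size
        while self.slots[index] is not None:
            index = (index + 1) % size
        self.slots[index] = key

    def insert(self, key: int) -> None:
        """Insert key (duplicates are not detected), then grow if lambda exceeds the threshold."""
        self._place(key)
        self.count += 1
        if self.load_factor() > self.max_load:
            self._resize(2 * len(self.slots))

    def _resize(self, new_size: int) -> None:
        """Allocate a bigger slot array and rehash every key, since slots depend on the size."""
        old_keys = [k for k in self.slots if k is not None]
        self.slots = [None] * new_size
        for key in old_keys:
            self._place(key)

    def contains(self, key: int) -> bool:
        """Follow key's linear probe sequence until the key or an empty slot turns up."""
        size = len(self.slots)
        index = key % size
        for _ in range(size):
            if self.slots[index] is None:
                return False
            if self.slots[index] == key:
                return True
            index = (index + 1) % size
        return False


table = ResizingHashTable(size=20, max_load=0.75)
for key in range(16):
    table.insert(key)
print(len(table.slots), table.load_factor())  # 40 0.4
```

The 16th insert pushes λ to 0.8, above the 0.75 threshold, so the table doubles to 40 slots and λ drops to 0.4. With a higher threshold such as 0.85, the same 16 items would leave a 20-slot table at λ = 0.8, as in the example above.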
To wrap up hash tables, next time we'll go back to the code, implement some collision resolution, and at least talk about a strategy for dynamically resizing the hash table in response to a load factor that gets too high.