Hi folks. Last video we talked about the concept of hashing. In this video we're going to take that concept and make it concrete by using it to create a data structure, and hopefully in doing that we'll get a better sense of what hashing actually does.

So just as a reminder, the main purpose and definition of hashing is this: you have a great big set of values, potentially an infinite set, and in computing those values are usually numbers, integers specifically. We create what's called a hash function that does the hashing and transforms this big set of values down into a much smaller space. Why are we talking about numbers in general? Because everything in a computer can be represented as a binary number. Whether that thing is a number as you know it, a string, a music file, or an instance of some arbitrary class, anything in a computer can be treated as a binary number. So when we talk about hashing, we're generally talking about taking an integer and transforming it down into a smaller set of integers.

So how do we do that? Well, we have to define what is called a hash function, and in this video we're going to see some examples of different hash functions. A hash function takes an input, which is the value of an object in Python, whether that object is an integer, a float, a string, or something else. It applies some mathematical operations, and the output is what we call the hash, or the hash value: an integer that is used to identify or represent that object. So we're transforming some arbitrary object down into an integer in a fixed space.

So what do we use this for? Well, we use it to create what is called a hash table. A hash table is a data structure, not necessarily a type or an abstract data type.
It is a data structure alongside the array-based list and the linked list, which are both data structures as well. How it's different, though, is that a hash table is a collection of items where the position of each item is determined as a function of the item itself rather than by the order in which it was inserted.

So that's a lot of words. Think of your lists, your array-based list and your linked list. You can append to those lists, you can insert, you can remove. Whichever operation you're doing, you're packing everybody in there, one next to the other. That's what makes it a list. A hash table is different. A hash table starts as a blank space, and you can think of it as an array of memory. There are a bunch of empty spaces in this array, and we call each of them a slot. A hash table can have empty slots. In fact, the hash table starts out empty, and then you start hashing the things you want to put in, and they go to different places in the table. Where they go is determined by the hash function.

When we implement this in Python, we are going to use an array-based list as the underlying storage. But we don't care about the list aspect, we care about the array aspect. What we want is a big empty chunk of memory that we can then put things in at arbitrary positions. The hash function determines which slot in that array we put the thing in. So with an array-based list, what we're really computing is the index where we're going to put the item.

Every data structure that we have looked at so far has been a linear data structure. We've only looked at two: the array-based list and the linked list. They are linear data structures because the items in them are adjacent, excepting some special cases. If you append things to a list, the items sit there one after another.
In a hash table that is not the case. There are going to be gaps in the data structure. But the hash table has the benefit of O(1) search time. That's because the hash function itself needs to be a constant-time operation. If we have a hash function that does simple math to compute the value, then that's constant time; computers are very good at math. Compare that to, say, the linked list or the array-based list. If you want to search for something in there, you've got to go through it and compare item by item. In a hash table you just compute the hash function, look in the slot it gives you, and that tells you whether the thing is there or not. We'll see an example of this, since it may be a little confusing just to hear it.

The trade-off, though, looking at the hash table on the screen, is that it's got wasted space. Arrays and linked lists tend to be packed in together, so they make more efficient use of their space. So we're sacrificing space that we may never use for the ability to search in O(1) time. And that's a trade-off we're usually willing to make.

So let's go through some examples of numeric hashing functions. We want to create a hash table, and the underlying support structure is going to be an array-based list, kind of like the one on the screen. When we want to put something in this list, we need to figure out where this bad boy is going to go. So we need to define a hash function that takes a value and converts it into one of these slots, slot 0 through 19. That hash value tells us where in the array-based list we want to store the thing.

We're going to look at three examples of hash functions very quickly. We'll look at modulo, which is the simplest, and which you have to do anyway. We'll look at a folding function, and we'll look at the mid-square method. To do that, let's go over to our worksheet. You all should have a worksheet that looks like this.
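Before turning to the worksheet, here is a minimal Python sketch of the picture so far: an empty table of 20 slots, a list-style search that compares items one by one, and a hash-table-style lookup that computes one slot and looks only there. The names `TABLE_SIZE`, `placeholder_hash`, `linear_search`, and `slot_lookup` are hypothetical, the stand-in hash function is an assumption made just so the sketch runs, and collisions are ignored entirely.

```python
TABLE_SIZE = 20  # assumed size: the table on screen has slots 0 through 19


def placeholder_hash(value):
    """Stand-in hash function (an assumption); any constant-time function would do."""
    return value % TABLE_SIZE


def linear_search(items, target):
    """List-style search: compare against the items one by one, O(n)."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return None


def slot_lookup(slots, target):
    """Hash-table-style search: compute one slot and look only there."""
    index = placeholder_hash(target)
    return index if slots[index] == target else None


# An empty table: every slot starts out as None.
slots = [None] * TABLE_SIZE
slots[placeholder_hash(1024)] = 1024  # one slot filled, gaps everywhere else

print(slot_lookup(slots, 1024))  # the slot index where 1024 was found
print(slot_lookup(slots, 354))   # None, because 354 was never stored
```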
You have a hashing handout, and we're going to start by looking at the modulo method. So what is the modulo method? It's very simple. Our goal, again, is to hash an arbitrary integer value into a fixed-size array. Say I want to insert the value 15,997 into my data structure. Where does that go in my hash table? Well, it can't go in slot 15,997 because I don't have that many slots. My hash table here only has 20 slots. So how do I figure out which of those 20 slots 15,997 goes in? I need to apply a hash function to get it down into that smaller space.

The simplest hash function is the modulo method. Modulo is the % operator in Python. All we do is take the integer value and mod it by the length of the array.

Let's see an example. I'm looking at x here, and I want to put the value 1024 into my data structure. Where does it go? Let me compute the hash function, which is up here: I take the input x and I mod it by the length. The length of this table is 20; there are 20 slots that I can put it in. So for 1024, all I'm going to do is say 1024 mod 20. You can plug this into a calculator, or you can write a little Python script to do it, but 1024 mod 20 is 4. So the value 1024 is going to be put into slot number 4. That's where it goes.

Now, what operations do I care about for my hash table? Right now I really only care about putting things in and getting things out. So if I put in 1024, my hash table is going to compute that it needs to go to slot 4. As for collisions, we haven't talked about those yet, so let's wait until we see an example.

Let's do another one: 354. Applying the mod operator, 354 mod 20 comes out to the value 14.
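The "little Python script" for the modulo method could be as short as the sketch below. The function name `modulo_hash` and the default `table_size` of 20 are assumptions for illustration, and the sketch assumes non-negative integer inputs.

```python
def modulo_hash(x, table_size=20):
    """Map a non-negative integer x to a slot index in 0..table_size-1."""
    return x % table_size


for value in (15997, 1024, 354):
    print(value, "-> slot", modulo_hash(value))
# 15997 -> slot 17
# 1024 -> slot 4
# 354 -> slot 14
```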
So how am I getting 14? What is the mod operator? You divide by the operand, here 20, and you take the remainder. 20 goes into 354 seventeen times, which gives you 340, and you have 14 left over. So 354 is going to go into slot 14.

All right, let's go to a different value now. Let's go over here to 1984, this guy on the right-hand side. What is 1984 mod 20? Well, 20 times 99 is 1980, so there are 4 left over, and 1984 mod 20 is 4. I forgot to write up here before that we put 354 in slot 14, and 1024 is already sitting in slot 4 from the computation over here. Well, now I've got a problem. 1984 also hashed into slot number 4. So were I to put 1984 in here, it would have to go in that same slot. This is what we call a hash collision. So yes, there was a collision here.

A hash collision is when two input values hash to the same hash value. In this case 1024 and 1984 both hashed to slot number 4, so they collided. Now let's go back to our table on the left-hand side. When we hashed 1024 there was no collision; the space was empty. When we hashed 354 there was also no collision; the space was empty. But when we got down here to 1984 and hashed it, yes, there was a collision, because 1024 was already taking up that slot. So that's what a hash collision is.

When we create our data structure in Python, we're going to have to deal with these collisions, because we can't have two things in the same slot. It's not going to work. We'll talk about how to deal with hash collisions next time. For now, what I would like you to do as we complete this worksheet, to get some practice, is just write down which slot each value would hash to and whether there was a hash collision, yes or no.

All right, so that is the modulo method. Let's look at the other methods just very briefly. Another one is the folding method. So what is the folding method? We divide x into equal-size pieces and then sum the individual pieces. So this is a very different transformation.
But it's pretty simple in concept. We're going to divide the input into pieces of size two, so two digits each, sum them together, and then mod the sum by the length of the list. An example is maybe the easiest way to describe this, so let's look at 1024. We split 1024 into pieces of size two: 10 and 24. I'm treating the number kind of like a string. Sum these up, 10 plus 24, which gives us 34. Then mod this by the length. The length is still 20; that's the number of slots I have in my hash table. So 34 mod 20 gives me slot number 14.

Compare this to the previous slot for this value. When we hashed 1024 with the modulo method, we got slot number 4. Now, using the folding method, we get a different hash value: it hashes to slot 14. The choice of hash function is going to be really important, because it's going to help us avoid collisions. So 1024 goes into slot 14, I'll write it up here at the top, and there's no collision at this point.

All right, let's move on to the next one. Let's do 354. Divide it into pieces of size two. So I've got a 35, and then there's only one digit left, the 4, so that's a piece on its own. 35 plus 4 is 39, and 39 mod 20 is 19. So 354, applying the folding method, hashes down to 19. In slot 19 I put 354, and there's still no collision.

I'm going to invite you to do the remainder of these. But it's still going to be possible to get a hash collision. Let's go over here to 14 on the side. Let's hash 14 the same way. Divide it into pieces of length two. Well, there are only two digits here, so I get a single piece, 14, and then I take that mod 20.
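The folding steps can be sketched in Python roughly as follows. The name `fold_hash` and its parameters are hypothetical. The sketch assumes a non-negative integer, splits its decimal digits from the left, and, as in the 354 example, lets a leftover digit form its own shorter piece.

```python
def fold_hash(x, table_size=20, piece_len=2):
    """Split x's digits into pieces, sum the pieces, then mod by the table size."""
    digits = str(x)  # treat the number like a string of digits
    pieces = [
        int(digits[i:i + piece_len])
        for i in range(0, len(digits), piece_len)
    ]
    return sum(pieces) % table_size


print(fold_hash(1024))  # 10 + 24 = 34, and 34 mod 20 = 14
print(fold_hash(354))   # 35 + 4 = 39, and 39 mod 20 = 19
```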
And, lo and behold, 14 mod 20 is 14, so I'm hashing to slot 14 again. Hmm. So 14 hashes to slot 14, but there's already something there. So what do we have? We have a hash collision. Yes, there is a collision again.

Collisions are not a good thing. We do not want them in hash tables. Ideally there would never be collisions, but that's not necessarily very practical, and it depends on a couple of things. It depends on the nature of your hash function, and it also depends on how big the table is. If you have a gigantic table, there's much less chance that you're going to get a hash collision. On the other hand, if you have a really silly hash function that doesn't distribute the hash values evenly, then you're going to get a lot of collisions.

So let's look at one final hash method, which is called the mid-square method. The way it works is you first square the input value, and then you extract the middle two digits. So square the number, treat it as a string, and take the middle two digits out of it. Squaring and then extracting just a portion of the result has this interesting property: it tends to distribute the hash values a little more evenly than, say, modulo by itself, or even the folding method.

So let's apply it and see how this one works. First, 1024 squared. To compute 1024 squared you need a calculator, but it is 1,048,576, which is roughly 10^6. That's the number of bytes in a megabyte, for example. The mid-square method says square the value, then extract some portion of the resulting digits. What I'm saying here is that for squares with an even number of digits, you take the middle two digits. Well, there are one, two, three, four, five, six, seven digits in this square. It is not even. It's odd.
So if it's odd, what are we going to do? We're going to take the middle digit and the one before it, the middle minus one. The middle digit of 1,048,576 is 8, and the middle minus one is 4. So we extract this 48 out of here, and then the final step is to mod it by the length. The length of this hash table is still 20, so we take 48 mod 20, which is 8. So 1024 is going to hash down to slot 8 with no collision.

Let's do it again. Let's do 585 this time. First step, take x squared: 585 squared is 342,225. Now, that has an even number of digits, so we extract the middle two digits, which are these two 2s here. We extract them and do 22 mod 20 to get our hash value. We always have to mod by the length of the hash table so that whatever hash value we get lands only in the available slots. So 22 mod 20 is 2, and it doesn't look like there's any collision so far. So 585 goes into slot 2.

Now, whether or not there's a collision depends on the order that you insert these things in. I did 1024 and 585. That's not to say that 5354 and 601,020 don't collide.

So this is the mid-square method. Now what I would like you to do is complete this worksheet. Once you've completed it, you have enough knowledge to do the homework assignment for this week on searching, binary search, and hashing. Next time, we will talk more about how to deal with collisions, and we will also get to work on actually implementing the hash table itself. I'll see you then, and take care.
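As a recap of the mid-square steps worked through above, a minimal Python sketch might look like this. The name `mid_square_hash` and the default `table_size` of 20 are assumptions for illustration. The sketch follows the convention used in this video: the middle two digits when the square has an even number of digits, and the middle digit plus the one before it when the count is odd.

```python
def mid_square_hash(x, table_size=20):
    """Square x, pull out the middle two digits, then mod by the table size."""
    digits = str(x * x)       # treat the square as a string of digits
    mid = len(digits) // 2    # index of the middle digit (upper middle if even)
    # The slice [mid - 1, mid] covers both rules: the two centre digits for an
    # even count, and "middle minus one" plus "middle" for an odd count.
    middle_two = digits[mid - 1:mid + 1]
    return int(middle_two) % table_size


print(mid_square_hash(1024))  # 1048576 -> "48" -> 48 mod 20 = 8
print(mid_square_hash(585))   # 342225  -> "22" -> 22 mod 20 = 2
```

One slice handles both the even and the odd case here, which keeps the sketch short; a version with an explicit if/else on the digit count would match the wording on the worksheet more literally.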