Hello, 231 students. I hope you're all doing well. In this video, I want to introduce you to the topic of hashing. Hashing has a lot to do with search. The last time we talked about something new in this class, we were talking about searching, and the purpose of searching is pretty simple. You are always searching a collection, a group of items, whether they're integers or strings or some custom object. Searching will tell you one of two things, sometimes both. The first is whether the item you're looking for is in the collection. In Python, typically we're searching a list, but there are other types of collections as well. The other thing that searching will tell you is where in that collection the item is. If you're dealing with a list, what is the index of the item in the list, if it's there?
So before, we were talking about how you search a list. And there are really only three ways to search a list. The first is the sequential search, or linear search, where you start at the beginning and go item by item, asking: is this it? Is this it? Is this it?
Then we talked about what we call a smart sequential search. What made it smart was that we first had to start with a list that was in order, say from smallest to largest, where the smallest is at index zero and the largest is at index n minus one. The smart part of sequential search was saying, okay, is this thing it? Is this thing it? No, no, no. But if we got to a point where the item we were looking for was less than the value we were currently looking at, we could stop. So if we're looking for 45, but we see 55 in our list, we know that 45 can't be in the list, because we would have found it by now. The smart part says: I see a value that's bigger and I didn't find it, so let me just stop searching.
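To make the early stop concrete, here is a minimal sketch of the smart sequential search in Python. The function name and the sample list are made up for illustration, and it assumes the list is sorted from smallest to largest.

```python
def smart_sequential_search(items, target):
    """Search a list sorted smallest-to-largest.

    Returns the index of target, or -1 if it is not in the list.
    """
    for index, value in enumerate(items):
        if value == target:
            return index
        if value > target:
            # Everything after this point is at least as large,
            # so the target cannot be further along. Stop early.
            return -1
    return -1


scores = [12, 30, 41, 55, 78]
print(smart_sequential_search(scores, 55))  # 3
print(smart_sequential_search(scores, 45))  # -1 (gives up when it sees 55)
```

In the worst case this still looks at every item, so it is still an O(n) search.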
But in order to do that, the list had to be sorted. And then the final one was binary search. Again, you can only do binary search on a list of items that is sorted, and you can only do it with items that have a natural ordering, meaning they can be sorted. You can sort numbers. You can sort strings. But how would you sort, say, students? You could probably come up with a scheme, but you have to have a way of comparing things. Binary search was fantastic because it was our first log n algorithm, meaning you could search a million things in at most about 20 comparisons, or a billion things in about 30, which is a pretty big improvement.
So when we left off talking about searching, our question was this: we can do O(n) search time with sequential search, and we can do O(log n) with binary search if the stuff is sorted. But can we do O(1) search? In other words, constant-time search, immediate lookup. The answer is yes. But we have to structure our data totally differently than we would in a list. We can't do this with lists. We have to think about it differently.
So why do we need a different data structure? Remember that lists are what we call linear data structures. In other words, the items are in there one after the other. Whether it's an array-based list like a Python list, or a custom linked list that you've created, the items are packed in there one after the other. That's what makes it linear. For a linear data structure, no, you can't do O(1) search time. You can get O(n) by iterating through it, or if the contents are sorted, you can divide and conquer and get O(log n) using binary search. So we want O(1) search, which would be awesome, right?
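As a reminder of what that divide and conquer looks like, here is a rough sketch of binary search that also counts how many times it halves the list. The `steps` counter is added only for illustration. Because each step cuts the remaining range in half, a sorted list of a million items needs at most 20 steps.

```python
def binary_search(items, target):
    """Binary search on a list sorted smallest-to-largest.

    Returns (index, steps). index is -1 if target is absent;
    steps counts how many middle items were examined.
    """
    low, high = 0, len(items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1, steps


numbers = list(range(1_000_000))
index, steps = binary_search(numbers, 999_999)
print(index, steps)  # steps will be no more than 20
```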
You know, if you've got an enormous data set or something you need to search a lot, having fast search is really important. We use search so much in our daily lives that it's gotta be fast. So we can achieve O(1) search time, but we gotta get rid of the idea of having data in a linear order where it's packed in front to back. Instead, before we go into our data to get something, we need to know exactly where the item would be if it's there.
So how can we do that? Instead of searching for the thing, I'm kind of cheating. I have a map. I know where the thing is going to be in memory. We're always talking about storing data in memory, and when you're talking about search, you're looking for where that thing is in memory. So what I need to get O(1) search is a map with a key that tells me: go here. If you're looking for the treasure, it's gonna be at memory address blah, blah, blah. And if you go to memory address blah and you don't find something, then it's not there. It would be there if it were there.
So what is the map? What's our key? This is what we would call the concept of hashing. In a nutshell, hashing is this: take the thing you wanna search for, whether it's an integer or a string, and manipulate it. We're gonna transform it in a very specific way, and that transformation will give us the key of where to find it.
The idea of hashing has been around since at least the 50s in computing. The name is like making a hash out of something when you're cooking a meal: you chop it up and rearrange it. You make a hash of it. So what is hashing specifically?
Technically, hashing is taking a huge set of values, potentially infinite, like all the numbers that there are, or all the strings that there are, or all the people that there are, and hashing it down to a smaller set of possible values. These values are gonna be in a fixed range. In other words, we're taking something that's off the map, and based on some attributes of that thing, we're gonna hide it somewhere in the map. But where we hide it is determined by the attributes of the thing.
So what determines where in the map, or where in the memory space, the thing we wanna hash goes? To figure this out, we define what is called a hash function. The hash function is denoted with a capital H, and the function takes a single parameter. It takes an arbitrary value for its input, arbitrary meaning any value. So if you're talking about numbers, any number you can think of: real, imaginary, negative, positive. It transforms that value into a hash. The hash is the output of the function. And while the input may come from an infinite source, say the numbers from negative infinity to positive infinity, the hash of those numbers will be in a fixed range, say zero to 100. How could we get all the numbers from negative infinity to positive infinity (integers, let me specify integers) into a fixed range of say zero to 100? We actually can do that.
But we're talking about data structures here, and what we wanna do is search. So we can create a new data structure, something totally different, kind of like your linked list was totally different from everything else you'd used. We can create a data structure that uses this hash function to basically search for the data. So here's a graphical representation of what hashing looks like.
You've got an enormous set of values, a value space, and you're applying a function to them to compress them down into a smaller set of possible hash values. I'll show you an example of a very simple hash function in a minute. The hash function is gonna take the value of some object, and in Python we could be talking about integers, strings, or custom classes. We're gonna take the value of that object and apply some mathematical operations. Typically that's what we're doing: math to create a hash, which is usually represented as a number, to identify the object.
Now, looking ahead to the next video, what we're gonna do in our data structure is take the thing we wanna store and then search for quickly later, and convert it to an integer. That integer is gonna become an index in a list. But what that integer value is depends on the object we wanna insert.
You've seen real examples of hash functions. Here's the Library of Congress call codes, an encoding scheme. For any book that will ever be written, the Library of Congress hash function can transform it down into a relatively small set of attributes: the classification, the cutter, and the date. If you think about all the information present in a book, what a book represents, this Library of Congress hash function can transform this whole concept of a book down into a much smaller, more manageable space. That's what hash functions do: take something and convert it into something else, usually a number.
Hashing is huge in computing. Why? For example, security. Whenever you have a password in a computer system, it is usually hashed into something else.
So if you have good computer programs, they're not storing your password. They're storing a hash of your password, produced by mathematical operations that take in your plain text password, whatever it is, say "I love kittens 28", and transform it into some big incomprehensible number that is impractical for a computer to reverse.
If you've gone and downloaded software off the internet, you've probably also seen hashes like this next to the thing that you're downloading. What these are meant for is this: you take the file you downloaded, say whatever this gzipped source tarball is, and put it into a program that hashes it and produces a number that looks like this. When you compare the hash of the one you downloaded to the one that the publisher posted, you'd better hope they're the same. Otherwise you know somebody has messed with it. So hashing is very important in computing.
Let's take a look at an example of a very simple hash function. I'm gonna draw now, and hopefully you'll be able to hear me okay. Remember, a hash function is gonna take a single parameter. We're gonna stick with integers for now because they're the simplest. Our goal is to transform our input, our x, from a potentially infinite range of values down to something manageable. So think for a minute. If I've got an infinite number of integers, and given any integer I wanna transform it down to a fixed range, say the range from zero to 10, what mathematical function could I apply to do that? The simplest one to think about is the remainder operation. So for the hash of x, how about the remainder function, the modulo function?
So for example, let's define this hash function as: take x and mod it by some value. What value do you wanna pick? Well, I'm gonna say I wanna take any number there is and compress it down into 20 possible values, zero to 19, so I'll mod by 20. This is a hash function. Very simple, not particularly clever, but it works. This is the modulo hash function.
So if I want the hash of say 15, what is 15 mod 20? Well, modulo is the remainder. 15 divided by 20 is zero with a remainder of 15. So here the hash function actually doesn't change the input. But let's take the value 21. What is 21 mod 20? 21 divided by 20 is one with a remainder of one. So I've transformed the 21 down to just a one. How about H of 32? Well, 32 mod 20 is 12. So I've taken a 32 and modded it down to 12.
Now, where did I get this 20 from? I just picked it. When we talk about what we're gonna do with our hash functions to create a data structure, our choice here will be important. But this is just to illustrate what a hash function is.
A peculiar thing can happen as well. What is gonna be the hash of 52? Well, 52 divided by 20 is two, which accounts for 40, with a remainder of 12. So I've got two inputs here, 32 and 52, that resolve to the same hash value. Our hash function has taken two different inputs and gotten the same output. This is what we would call a hash collision, and we will talk more about them next time. For this lesson, I just want you to understand in general: what is the concept of hashing? What is a hash function? And what is a hash value?
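If you want to try the modulo hash function yourself, here is a small Python sketch of it. The function name `h` and the `size` parameter are just names picked for this example; the 20 is the same arbitrary choice as in the drawing.

```python
def h(x, size=20):
    """Modulo hash: map any integer into the range 0 to size - 1."""
    return x % size


for value in (15, 21, 32, 52):
    print(value, "->", h(value))
# 15 -> 15
# 21 -> 1
# 32 -> 12
# 52 -> 12   (same hash as 32: a hash collision)
```

One Python detail worth knowing: `%` with a positive modulus never returns a negative result, so negative integers also land in the range 0 to 19. For example, `h(-7)` is 13.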
And the big thing to remember is that hashing is all about taking a great big space, a great big set of values, and mapping whatever values are in there down to a much smaller space, a space with a fixed number of locations. So what we will do when we go to create a data structure is have a data structure with a fixed size, say 20 or 100 or 1,000 or a million, something that's gonna fit into memory. And we're gonna create a hash function that takes the stuff we wanna put in and figures out, through math, where in this fixed set of slots the data should go. And because computers are so good at math, as we well know, computing some quick little mathematical function like this takes O(1) time. So as long as we've got a big enough space to put things in and we do a pretty good job of avoiding these hash collisions, we can leverage the concept of a hash function to achieve O(1) search time. We'll talk more about that in the next video.
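As a preview, here is a rough sketch of that idea in Python. It is not the data structure from the next video, just an illustration under some loud assumptions: the table size of 20, the names `table`, `insert`, and `contains` are all made up here, and the sketch ignores hash collisions entirely (a colliding insert simply overwrites whatever was in the slot).

```python
TABLE_SIZE = 20  # assumed size, picked only for illustration


def h(x):
    """Modulo hash: tells us which slot an integer belongs in."""
    return x % TABLE_SIZE


# A fixed-size list of slots; None means the slot is empty.
table = [None] * TABLE_SIZE


def insert(x):
    # No collision handling in this sketch: a later value with the
    # same hash would overwrite the earlier one.
    table[h(x)] = x


def contains(x):
    # One hash computation and one list index: no scanning needed.
    return table[h(x)] == x


for value in (15, 21, 32):
    insert(value)

print(contains(21))  # True
print(contains(41))  # False: slot 1 holds 21, not 41
print(contains(7))   # False: slot 7 is empty
```

Notice that `contains` never loops over the list. It computes where the item would be, looks there, and is done, which is where the O(1) search time comes from.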