 Okay, once again, I'm here with James, and we're just doing a little tech talk. And I'm going to let James do most of the talking, so go ahead, James. So, okay, Chris, I think we were talking about different data structures at one time. It was a really neat conversation. I thought we could talk about it a little bit for your channel. So I want to talk about just data structures in general, but maybe we'll talk about hash tables, actually. So let's try to talk about a situation, because it's easier to think about things in terms of an actual situation. Like, let's suppose you were writing a program that was going to keep track of maybe the number of times a particular IP address was fed through it, right? So you have a Python program or something, or maybe C program, and it has an input, and you can input IP addresses, and it would try to store them, and keep track of how many unique IP addresses, or the count of each unique IP addresses. So let's talk about how that might work, and we'll talk about how that might scale, and then we'll talk about hash tables. Do you want me to say how I would do that? Yeah, please, that'd be a great idea, actually. Okay, so I would basically just do a grep for that IP, and then type it into wc-l. And that would tell me how many times that line, that string has shown up in lines, as long as everything's on it. This is great, this is great. Okay, so you would do grep of, let's hit it one more time. I'd grep with the IP address, so let's say it was a text file, and had all these random IP addresses, and I wanted ip123, just to make sure. I would say cat that file into grep, and I would grep 123, and I'd pipe that into wc-l, and that would give me the line count. So grep would say only lines with 123 on it, and that would get the line count, and that would be the number, and I'd pipe that into wc-l. This is a great, this is the perfect example, because that's so horrible, and we can talk about why. This one of those things is probably good on small scale, but you're saying on a big scale, let's go. Because we're going to talk about it, we're going to talk about data structures, and why they're important. And you described how you do it in match, but this is very similar to how you might do it inside of a program. But let's just talk about what that might look like in C or Python. Instead of storing a file, let's just store it through a data structure like an array. Or a list. Or iterate through it. Or iterate through it. Even using your method, we can improve it a little bit already. So one thing you were saying is you would store each value every time it came in in the array. So if I send you 123 a hundred times, your list would be a hundred. So it might be better even just to, this is not still not a good way to do it, but it might be better to just store 1234 and then in another array say, oh that's my 100th time I've gotten that. So let's talk about how many IP addresses are possible. So we have 124, 255, 256, 256. Wow. I did this part out. 256 and you've got four of them. So it's 256 to the power of four. And you should put what that number is down below on your video. If I remember, it's a very big number. And so what happens when you have nearly that many IPs that have come in your program? How many checks are you going to have to make by the time you have a million IPs? And I sent you the millionth one a second time. How many checks do you have to make? A million, right? So in data instruction, O notation is what they call it in computer science. In O notation we say that scales linearly. It scales with N. As N gets bigger, like you said, for 10, for 20, for even 100, for maybe even 1,000 or maybe 100,000, that works really well. But how much, if it takes you a second to do 1,000 of them, or if it takes a second for you to do a million of them, it will take you 10 million. It should take 10 seconds. And that will increase linearly. So what's a different way to do this? And so there's this thing called a hash table. You're probably familiar with them. If you use dictionaries in Python, this is actually what a dictionary is. It's a hash table. It takes a key and a value. And so, for example... That's also similar to JSON. For a key and a value? Yeah. No, this is a good point. I mean, I've done it with dictionaries in Python, but I've only played with them a little bit. This is true. So JSON is just like an output string representation, and it could be of a hash table. Like if you said, I want to dump this hash table to a file in some way, a JSON string would probably be a very fair way to represent it. So JSON string is just arbitrarily a great way to store arbitrary data. But it's in a string not a hash table. It's a string representation of what could be a hash table, an object, an array, or any combination of those things. Obviously, my understanding I have no clue what a hash table is, that's why I'm drawing things out of here. So one example you do know, though, is you do know a dictionary. Dictionary is a hash table, right? Well, yeah, I know that it is. I don't actually understand it. Let's talk about what this will look like in C, though, because it'll be more detailed and my key addresses work great. So let's suppose we took those numbers, you know, 256, you know, 0 to 256, the four of them, and we computed it through 255. Yeah, yeah, okay, sorry. And what we did was we converted those four numbers into one number that represented that, right? That should be easy, because these are just ones and zeros. We've arbitrarily broken them up into four, but we could just make it one big number. Any combination of those four numbers right? This makes sense, okay? So we'll say that's the key, right? That's the unique key for that IP address, okay? So what we want to do is find a way that we can quickly figure out where on our array or our list it belongs. Or in your case, if it was a file, what line number would we expect to see it on? Okay, I think we actually let's use that file, because that's great. Let's pretend we use that file. But the file stores the unique ones, right? We won't store every time it comes in. Every time another one comes in, that's the same, we'll just increase the number next to it or something. Okay, so how do we know what line number to look at on? Well, so an easy way to do it, an arbitrary way to do it, is just say whatever that number is, okay, we'll say it's on that line number. So the only problem with that is it could be a really big number. And the IP example is a really simple example, it's going to be a very big number for a string. It's going to be a very big number. So what we would do is we would make an array, or let's suppose a line, or let's say a line example, we could cap it, we could say, well, we know we're never going to get more than a million IPs, even though we don't know that it goes from zero to a million, it's going to go way beyond that, but we know we'll never get more than a million unique ones. And if we do, we'll come up with something great that way. And what we'll do is we'll do modulus, if we were going to modulus before, and it's a little percent symbol every time you're using programming, right? It tells you the remainder. It tells you the remainder. So what we do is we have a list that goes up to a million and let's suppose we get the IP of a million and one, we run the modulus on it for the size of this file, which is a million lines, and it'll give us the number one. And that's where we store it in the first position. But wait, okay, the smart people watching the video are going to say, well, what do we do when we get one? Because then there's a problem. So what we actually do is when we get one, we look at the first position and we see, hey, something's already in the first position and it's not one. And we go to the next position and we put it there, okay, and there would be position two. And so then the next one might be two, and again we'd look at two, two million and one, right? That's a better one, the example. It's two million and one. We see the first position is already taken by one. The next position has already taken the first position was taken by a million and one. The next position was taken by one. We put it in the third position. And this way we can fill that up. Now, how many checks did I make to store the hardest one I've done so far? How many checks did I make? Three. Three. I've done three at this point, I guess, because I've only given you three. But you can see how you make a lot as time went on, and I'm getting very, very close to it very quickly. Sure, sure, yeah. Mine's gripping through the file for each. Right, right, right. On average, for a million spaces, on average you'd look at a half a million. For mine, we would say that mine scales constant, not linear. So it doesn't matter, the beautiful part of a trillion, mine scales very well. The only time mine doesn't scale well is if I didn't leave enough room, right? If I didn't put things in the right places, right? If I didn't leave enough space, these are called hash collisions. When two things occupy the same cell for the first try, and I have to look at the next one, these are called hash collisions. And if I don't manage that well, this is going to be really bad. It's going to start to become bad. Of course, that's worse, it's still only going to be as bad as yours. There's going to be a constant time. There's even another part of mine that I didn't even mention. If I need to figure out certain IPs, if I had a long list of IPs, I'd have to sort unique those and then iterate through them. Although I'm not just grep in file, I also have to sort the original list that I'm looking for where you're kind of making a database so you already know what you're looking for. Right, so in your example, and I want to make sure I don't take something you have to do every time you're trying to find something of interest. That might not be an ideal thing to do, it's sorting. Sorting doesn't necessarily scale well, either, and if you have to do it every time you have a question, it's not typically a good idea. The hash table is always going to find you a good point to go no matter how big that hash table gets. You're going to always be able to look it up. So when you want to get an account, you're just going to then, again, use that key especially with that account. Let's just talk real quick, because I know the file was a great example, but let's just talk about what that might look like. I can see we would allocate an array of certain size. And then we would modulus around that array. And that array would basically essentially have to store a key and a value. It probably needs some type or some strut that would store both the key and the value or the list of values. It might be the count, or it might be the time it came in, whatever we decided to do. So let me say this. I'm still a little lost. Okay, great. Let's end this video here. If you think in our next talk, maybe you can draw something out. Yeah, let's do a little trick on the board. Great. So thank you for watching. Please visit vilt.com. That's Chris Decay. There should be a link in the description. Getting a visual in the next video. Thanks again.