 What's going on everybody? Welcome back to another video. Today we will be heading back into SQL for our third portfolio project. I am extremely excited for this project in particular for a few reasons. One, we're getting back into SQL and I really like SQL. And two, we are finally focusing on data cleaning. And I have talked so much about why data cleaning is important and that you really need to learn how to clean data and that's a big part of what a data analyst does, but I haven't actually showed you how to do it yet. And so that is what this whole project is going to be. And then at the end, you'll get to add it to your portfolio. So it's really a win-win. Now before we start, I just want to say that I think is going to be a little bit more advanced than a very first video in SQL where we walk through data exploration. If you see something that you have never seen before, I will do my best to explain it while we're walking through it. But if you get confused or it seems a little complicated, please pause it, Google it, do a little bit of research and then come back. And I think that will be very helpful. With that being said, let's jump over my screen and we'll get started on the project. So we're going to start over here on GitHub and this is where I've actually put the dataset that we are going to be using. So I will put this link in the description. We're going to go right over here to the Nashville Housing data for data cleaning. All you have to do is click download and it's going to download it and you can open it up if you want to. We're not going to do anything to this data at all, but really quick, I'm just going to show you what it does look like. And we'll of course look at this in SQL in just a little bit. We have a unique ID, parcel ID. We have this address, a sales date, the price of the home. So this is housing data, if you didn't pick up on that already, who actually owns the home, the owner address, and then some information about land value, bedrooms, bathrooms, things like that. Again, not super important because we're going to be doing all of this in SQL. So let's actually get this data into SQL and we're going to import it the exact same way that we did in the very first video. So we're going to come right over here and then go all the way down to Microsoft SQL Server 2019, import and export. We'll click next. Our data source is like last time, a Microsoft Excel. And let's take a look and we'll take that first one. This is the most recent one I've downloaded, but I just wanted to make sure. So I downloaded it a few times. For the destination, we're going to click SQL Server Native Client 11.0. And this is my client or my server right here. And I'm going to go down here and I want to put it in this portfolio project. So just configure this to what your server is. Again, if you haven't done this before, you've never set up SQL Server or a server to go on SQL Server. I will leave a link hopefully right here, also in the description, like I did for the first project. So be sure to go through that video so that you know how to download this and have everything. We're going to copy the data. We're going to take sheet one. We could have renamed sheet one to something else, but we didn't. And then we're going to finish this and finish and it should run successfully hopefully. It's looking good. Perfect. So we have 56,477. So let's head over to SQL. All right, let's go to our database portfolio project. And here is our sheet one. Now I'm going to rename this. Let's rename it. Was it Nashville? Let's just do Nashville housing. That's what I'm going to rename it as at least. So when I post these queries to the GitHub and you see them, this is what they will be. So if you want to have them the exact same or be able to copy and paste them, you know, you should you should do that as well. So let's take a look really quick. It's like the top 1000. There's about 56,000 rows. There's a lot of data in here. And a lot of things. So I'm about to open up a save thing and walk through the exact things that we're going to be working on in just a little bit. But yeah, this is what the data looks like in here. There's lots of columns, lots of data. So really excited about this. Let me pull this open really fast. It's going to be this project walkthrough. Here are the things and I'm going to show you this really quickly. Here are the things that we're going to be walking through. So we're going to standardize the date format. We're going to populate the property address data. That's referring to this right here. If you notice there's the address and there's also the city that it's in. So we want to be able to separate that out. And that is actually right over here. We're going to be doing the same thing to the owner address except that has an address, a city and the state, which makes it a little bit more complicated. And so that one should be really cool to show you. Oops, I messed up. That's what this one is breaking out into individual columns. That's what we're going to do for that. This populating the property address, if you notice, and we'll go into this a little bit, there's actually some values in the property address that are blank. But I'm going to show you how you can actually populate that, which is just a cool trick that I've used a few times and it does work. I mean, I think you'll find that one interesting. In the sold as vacant field, we're going to be doing some case statements if then we're going to be removing duplicates and then deleting unused columns. So we have a lot to get through. This could be potentially the longest video, and I'm okay with that because I love SQL. Down here, and I will say that when I, in the very first video I said is going to be ETL video, and I fully intended on doing that, but I ran into not issues on my side, but issues in the fact that the vast majority of people who are going to be watching this are not going to be able to do what I did to configure my server. But I left it in here anyways. When I think ETL is an automated process in order to extract the data from somewhere, we're going to transform it and then put it somewhere. This was going to be the extraction method, and I was going to put it in a stored procedure so that you could run the store procedure or run the job, import the data. It was going to be really cool. But I know that if I was having trouble with it, me trying to explain it to you and you being able to figure it out on your side was going to be very tough. I left this anyways because I was able to get to work on my computer, but it is tough. It took a lot of research. I did this for a previous server like a year or two ago, and I remember it being crazy hard, but I was able to figure it out on my computer. If you want to try it out, try it out and look into the stuff. I'm going to leave this here. This is just for, if you want to try it, it's a little more advanced. You don't have to. Just import it. This will be a data cleaning project instead of an ETL project, but data cleaning is what 90% of it was going to be anyways. Anyways, let's go back up to the very top really quickly. I have a whole another laptop right here as I did in the first video. I didn't show it to you last time, but I have all of my queries written out over here. I'm going to try to do this as quickly as possible. We have a lot to get through. Now, before we start writing our queries, I am going to turn off my camera so I do not get in the way. All right, you should still be hearing my voice, but let's get started. Let's just start with select everything and we'll do from and it is portfolioproject.dvo.nashvillehousing. Let's just get this pulled up on screen. Awesome. This is exactly what we were looking at before. The very first thing that we're going to be looking at is this sale date. Now, I wrote standardized sale date, but I'm really just going to change the sale date. Let's copy this really quick and let's look at just sale date. It has this time on the end and it serves absolutely no purpose. It just annoys me. I want to take that off. Right now, it's a date time format, but we're going to convert and we're going to do date and we're going to take sale date, sale date and we're going to go like that. Let's run this really quick. This is what we want it to look like. All right, so let's say update and we have portfolio project specified up here so we can just say Nashville housing and we are going to set sale date equal to and we're just going to copy this. Now, I will say before we do this, I had some issues when I was initially doing it, whether or not it made the update. I'm not sure why or why not it was doing it. So yeah, it's not doing it right now. You try it out on yours. It may or may not be working. I'm not exactly sure why that is because I would say like 80% of the time it's doing it, 20% it's not. I don't know why. No logical explanation of that, but most of the time when I did it, they would then be the same column. Something we can do, I just thought of, we can do alter, alter, can't even say that word, alter table and we can say, I think it's new or it's add, add, give me one second. Yeah, so add and we'll just do sale date converted and let's make that a date format. Just like this and then we can say like this and set sale date converted. Let's try this and see what happens. So I'm going to add this column and then I'm going to update this and it says it's affected. Let's see what happened. So let's write sale date convert sale date converted. Let's see what happened. Let's see if it actually worked and it worked. Okay. So we now have a column and maybe at the end we'll remove that sale date column so that we just have that sale date converted, but we know what that is. You don't have to name it that. You can name it sale date two or something like that. Cool. Well, let's go down to the property address and let's get it just a really quick look at it. Let's copy this up here. I hate rewriting this stuff so I'm always copying and pasting, but we're going to be working with the property address. There we go. So let's take a look at this really quick. So let's look at, sorry, I was looking at my notes. We need to look at where the property address is null. So what you'll see really quick when we run this is that there are null values. Why there are no values? Yeah, I really don't know. I really am not sure, but let's look at everything where this is, where it's null. So we have this property address, we have a sale date, a price, legal reference. There's this parcel ID and there's this unique ID. So we have a lot of information and when you have something like this, something like an address, an address is, you know, the address isn't going to change. The address is the address. The owner, the owner's address might change, but the property itself, the address, the 99.9% of the time is not going to change. So you can say with almost certainty that, you know, this property address could be populated if we had a reference point to base that off of. So really quickly, let's look at just everything and let's look at, and we'll just order by, let's do property, not property address, let's do parcel ID, and let's take a look at this. So we have to do a little bit of some research on this, but I'm going to show you something really quick. Let's see if I can find an example in not too long. Okay, so here's an example. Here's the same ID. So 015, and that's the exact same address. And we'll find this a lot of times and I look through the data and it is pretty much accurate. When it does have it, it is the exact same address. So this parcel ID is going to be the same as the property address. So something that we can do is basically say if this parcel ID has an address and this parcel ID does not have an address, let's populate it with this address that's already populated because we know these are going to be the same. That is basically what we are about to do. And it's not super complicated, but let's get started writing it. Let's copy that down there. One thing we are going to have to do with this is do a self-join. So we have to join the table to itself to look at if this is equal to this, then this needs to be equal to this, that kind of thing. So real quick, let's just write that join part out and we'll go from there. I don't know why I sounded Canadian right there. We'll go from there. So we'll join on this and we'll say on a dot, wait, let's label them. I'm going to do this in a really lazy way. I'm just going to do A and B. A dot parcel ID is equal to B dot parcel ID. And let's see really quick. So we need to find a way to distinguish these. The sale date could be the same. One thing, this unique ID is unique. So we need these to be different. So let's use this and let's say and A dot unique ID is not equal to B dot unique ID. So all we have done here is we've joined the the same exact table to itself and we said where the parcel ID is the same, but it's not the same row, right? Because this is a unique ID, unique will never, or that means these will never repeat themselves. So we'll never get the same one. So if this is equal to this, but these are different, we want to then populate the other one. So let's do A dot parcel ID and we'll say A dot property address, B dot parcel ID, B dot property address. And let's take a look at this really quick. And let's do, let me see if this works, where A dot property address is null. And let's see, see what comes up here. Okay, so this is perfect. This is exactly what I wanted to see. So we have this parcel ID, we have this parcel ID and here is our address and it's blank in all 35 of these. So we have an address for all of these, but we're not populating it. So what we want to do is we want to say use this thing called is null. So is null is basically saying it's the first thing is what do we want to check to see if it's null. So we want to check A dot property address, this whole thing. Now if it is null, what do we want to populate? We want to put in there this B dot property address, because we want to take that property address and stick it in there. So let's run this really quick. So this row is what is eventually going to be stuck into this row. So this is perfect. It's literally saying when it's null, take this and put it there. And so that's what this part of it is doing. So let's go in here and write our update. So we want to update and let's take this whole thing from here up. And we'll this is the set. So we're going to set property. Okay, we need to specify. And just so you know, when you're doing joins in an update statement, you're not going to say Nashville housing. Okay, that's going to give you an error. You need to use it by its alias. So let's put a. So now we're going to say property address is going to be equal to and now we're just going to copy this is no and put it right here. And we only want to update. Let's see if it does take this. So I think this should be correct. Let's let's test it out really quick. And we're going to run this above query and see if it made that update. All right. So there you go. As you can see, there are now none that have no in there. Otherwise it'd be giving us an output right now. So that one is fixed. We can go back and check it if you want to please go back and double check that. But that is what we did. And it worked perfectly. So that's what that is no does it checks to see if this is no, if it is no, it can populate with a value. You can also do like a string and what we I mean, you can write, you know, no address. If you wanted to do something like that, we don't want to do that. We're going to keep it how it is. Let's keep moving on. We do not have unlimited time here. Trying to keep this I'm going to try to keep this on one under two hours. Stretching the rules because for my love of sequel, that is the only reason. And this I think is going to take a little longer. So let's take a look and let's copy this real quick to do and let's take a look at what are we doing the property address, the property address, and we can get rid of this as well. So if you notice, we have two things here. We have both the address, and then there's this comma after all of them, and there is the city. Now, you know, you don't know that or you maybe you haven't looked into this, but I have and there are no other commas anywhere, except for in between these things as a separator, as a delimiter. A delimiter is literally if you don't know it, if you never heard that term delimiter, a delimiter is something that separates different columns or different values. So for us, the delimiter is a comma. And for this first one, because we're going to be separating this one out, and then we're going to be doing the owner address. For this one, we're going to be using something called a substring. And we're also going to be using something called a character index or a char index. So let's start writing that out. And let's do select. And let's say substring. Now, the substring that we want to take, we of course want to be looking at oops, let me put this down here. So it helps us out a little bit. And I'll get like that. So substring. And of course, we're looking at property address. And we want to look at position one. So we're going to start at position one. Now this next part is something that you may have never seen before. And if that if you haven't, that's totally okay. We're going to be the character index is going to be searching for the is going to basically be searching for a specific value. Okay, that's all it's doing. And you can you can look into this a little bit more if you want. So it's going to be char index. That's how it's spelled. And then like an open parentheses. And we want to specify what we're looking for. So it can be anything you can even do, you know, if you wanted to things like Tom, or you can do value. Well, you do it like this, you can look for Tom, or if you're looking for a specific word like john, you can search that that's what this is for. But we're going to do a comma. Where are we looking? That's what this next one is. So we're looking in property address. And then we're going to close the parentheses. And we'd also close it again to complete off that substring. And we'll say as address. And let's just take a look really quick at this. So right now it's taking the it is basically going is looking at property address, it's going to the very first value you're starting at the first value. And then it's going until the comma. Now the unfortunate thing is, is we're actually getting this comma in this output. And we don't want that. You don't want a comma at the end of every address. We can change that. So we can say, because this is specifying a position, if we just look at this chart index, which we can do really quick, it is going to give us a number. It is saying at position 19, that is where the comma is, right? So it's not like it's taking, it's not a value or it's not a, it's not a string. It's a number. So we can say minus one. And if we do that, and now we run it. Now that comma is gone because we're looking back, we're going to the comma and then going back one from one behind the comma. So that's how you get rid of that comma right there. The next one's a little bit more tricky because we're not starting, well, it's not super tricky, but we're not starting at that first position anymore. So let's put a comma and we have our substring. Now where we want to start is at this, as, at where the comma is. So instead of position one, we want it to be where that character index. I don't want it to look like this this whole time. Is it like this? What am I doing? It doesn't matter. Let's just get rid of this and see if that fixes it. What am I doing here? Oh, it's just because this is wrong. And we'll just do comma, parentheses that might fix it. That doesn't matter. Okay. I'm wasting time. I'm going to keep going. We want to start in this, in this position. Okay. But we actually don't want to start at minus one. We need to start at plus one because we want to go to the actual comma itself. Then once we get to the comma, we want to add one. So we didn't, if we just left it the same, again, it would include the comma at the beginning. Then we need to specify where it needs to go to, where does it need to finish? Now every single thing is going to be different. Every single address has a different length, but we can use that to our advantage in this one. And we can literally say the length of property address. You guessed it right. And then we can close this off. Let's see if that works. Okay. What's messing up? So we have property substring, property address, comma, character index. And then we have specifying it in the comma. We have the property address plus one. Okay. We can't have that right there. I don't know why I had that thing. Finally figured it out at the end. So let's see what we're doing here. Let's see if it worked. It works perfect. And again, this was one that I'm guessing a lot of people haven't used before. So I was trying to explain it a little bit more than other ones. But if we take that out, if you take out that plus one, you're going to see the comma at the beginning right here. So that's what that is. So plus one, and that's what we're going to keep now. We can't separate two values from one column without creating two other columns. So just like we added this table up here, we're just going to, I mean, we're, I'm just going to copy this down here really quick. We're going to create two new columns and add that value in. So we're going to add that. We're going to call this, let's call it, because it's property address. Let's do property, property, split. And this is the address. And then we'll say this one, this next one is going to be property. And this is city, split city, city. And this isn't going to be a date, of course. This is going to be, let's do nvar char. And let's make it 255 just in case it's a large, just in case it is a large string, large text. So then we can say update that, update that. And now we need to insert what we did for it. So this first one is the address. So we're going to say that equals the address. And we're going to take this whole thing, this whole substring, oops, and copy that. And that's going to equal this. And then at the end, we'll, we'll look at it really quick. So first, let's add this table. I'm going to do this one at a time really quick so you can see it. So it adds the table. Now it adds the results. And again, adds the table of city and sets that city to that substring. And now let's take, let's take this and just do select everything from this. And you should see at the very end, because when you added it, it goes to the end, we should have two new values. And here we are. So property, split address and property, split city, it's much more usable than this. I mean, this would be a nightmare, not a nightmare, it'd just be annoying to use this column. I mean, now that it's separated on the address in the city, it's so much more usable of data. I really, really is. The next thing we're going to be looking at is this owner address. Now, it was hard enough or it was tough enough to do this. But I want to show you maybe even a simpler way to do it, even though this is more complicated. So let's go down here and let's get rid of this. So let's say, let's get this and let's just say property, oops, no, we're doing owner, owner address. Here we go. Let's just take a look at this. Let's see what we got. So again, we're using, or what we have in here is the address, the city and the state. So what we need to do is split all of those out. And again, I don't want to use substrings again. That was a pain. I want to use something a little different, something again that you may have never seen. It's called parse name. And parse name is super useful, especially for like the limited stuff, stuff that's delimited by a specific value. So let me just show you what it is. And then we'll go from there. So we can say parse name. And we're going to be doing this on the owner address. Okay. Let me see. Let me see. Yeah. I mean, it's because I don't have this, of course, I do that all the time. So annoying. So on the owner address, and then let's do one. And let's just see what happens. Nothing changed, of course, because parse name only is useful with periods or that's what it looks for. That's what parse name looks for. And these are commas. So something we can just do is we can replace those commas with a instead of a comma, we replace it with a period. So super easy. We're just going to do owner address comma. And we'll look for the comma in there. Then we need to specify what we need to change it to, we'll change to a period. And let's close that. And now let's run it. And it's taking Tennessee. So something odd about, at least to me, odd about parse name is that it kind of does things backwards than what you would expect it to do. Let's really quick. Let's add the other things. You'll get a kick out of this as much as I do. Here's one, two, three. Let's execute this. And it separates everything for us, but it's backwards. So it's one, two, three. You would imagine it'd be one, two, three, but no, it's one, two, three. So all we need to do is go three, two, one, and run this. And there we go. So now we have it broken out. This is now our address. This is our city. And this is our state. So super, what I would consider super easy, a lot easier than a substring, but I didn't want to show you the easy one first and then give you the hard one. So now we just need to add those columns. And then we need to add the values. So let's do this. Let's make some room. And I need to get rid of one of these. I think, ooh, did I do that right? What did I do? I have my altar table update, altar table update. What's this doing here? What is this? I don't even know what this is. Let's just go like that. So now we have three. Perfect. So from National Housing, we're going to say, we're going to say this is the owner. Oops, owner split address. Actually, let me just copy the owner make it easier. So we have owner split address, owner split city. And let's do owner split. And then state. Oops. And copy there, owner split city. There we go. Owner split address, owner split address. So I'm putting all the sets equal to what we're about to add to. So now this first one, this three is the address. We'll paste it there. This second one is the city. So we'll put that. Oh, I see what happened here. That's what happened. Gotta get rid of that. I set the owner split city equal to that middle one. And then of course, the third one is the state. So let's go do that. And that should be done. So let's do it two at a time. Oops, owner split address. What's wrong with that? Oh, I probably just got to run this first. Let's try that. Try to get good too quick. You can do this in much more efficient way. I'm just doing this for visual purposes. I would update all the tables first, or add all the columns first, I mean, and then do all the updating at the end. That's normally how I do it. But again, for visual purposes, this is what we're doing. So let's go get this. Actually, let's get this. Bring this down here. Don't keep this in in your final queries. It's a lot of extra selecting everything. You don't need to do that. So here we go. So owner split address, owner split city, owner split state. Again, so much more usable than when it's all in one column. I mean, it is 10, 100 times more useful data now. That one to me, that gets used a lot. Let's keep it going. I feel like we're making fantastic time. I don't even know. I'm not even keeping track of time. Time is not even relative anymore. It'll be three hours and I wouldn't care. Let's keep going. Let's take a look at this column right here, sold as vacant. Right now is no, but let's look at, let's do select distinct. Oh, gosh, I hate when I do this. I do this all the time. Am I the only one? I don't think I'm the only one. And we'll do, what is it? Sold as, okay. Sold as vacant. Let's do a distinct count on, or distinct on these. So right now we have yes, no, and why I'm guessing which is known yes and then no. So let's look at, well, just for, just because I'm curious, let's look at a count of, I don't want to do that. Let me just do sold as vacant. Let me do a count of this and group by sold as vacant. Okay. Let's run this and see what we get. Let me order by. Okay. Here we go. Now we're, now we're moving. That's not what I wanted at all. Order by two. Here's what I wanted. Okay. So at no, we have 51,000. Yes, 4,000, almost 5,000. No, and then just a few. So let's change them to yes and no, because these are obviously the vastly more populated ones. And we're just going to do this through a case statement. So we're going to say, oh yeah, let me get this ready before we start. Oh yeah, I'm ahead of the game now. Let's do select and we'll do sold as vacant. And then we'll start our case statement. Yeah, let's do right here. So we'll do case when sold as vacant is equal to yes. All we want to do is say then we want to make it no. Oh, we want to make a yes. What am I doing? Jeez, I'm losing it. When, and I'm just, oops, oops, oops, ignore that. Pretend that didn't happen. When sold as vacant is equal to n, then no. And then else, we want to say if it's already, if it's not one of those values, it means it's already a yes or no. So we're just going to say just keep it as sold as vacant. And then we'll end it. So let's take a look. Okay, so let's scroll through here and see if we get any that we can see. Oh, I just went by some, didn't I? Oh, I just went by some. I know I did. Let's see. Okay, here we go. So here's an n. It's now a no. So this sold as vacant is this column, the newly, the case statement right here is changing it. So the end is no. So this should work. And this will be a unique update statement. And I hope it works on like the first update statement that we did. That was a that was a travesty. Let's do update Nashville housing. And we'll say set. Sorry, I'm talking faster than I'm going set sold as vacant equal to and we can just literally put in this case statement. It's not pretty, but let's try it. Okay, now let's go look at this again and see if it made the update. There we go. The update statement worked. Oh, fantastic. It's a beautiful thing. Okay, great. I'm glad that one worked. I was worried for a second that my update had broken in in SQL Server. Now we're going to do something. These next two things is we're going to remove the duplicates and then we're going to get rid of unused columns. This removing duplicates, I got to be honest, I don't do it a ton in SQL, but I have done it especially for like queries. You know, when I'm looking at full tables, I will write some sort of temp table and like put the removed duplicates in there. I normally don't delete actual data. We are, we're going to do that, but it's not a standard practice to delete data that's in your database. So just for future purposes, don't blame me if you delete all the duplicates backcident in your table at work. So you can do this a few different ways, but the way I'm going to show you is we're going to write a CTE and we're going to do some Windows functions to find where there are duplicate values. Okay, so excuse me. So let's start writing out our CTE and you know, even we can write out the query first, then put it into a CTE. That might be a little bit better. So let's do select everything and oh my gosh, I was about to do it. Somebody's out there just like waiting for me to make that mistake again. So we want to partition our data. When you're doing removing duplicates, we're going to have duplicate rows and we need to be able to have a way to identify those rows, right? So you can use things like rank, order rank, row number. There are a few different options. We're going to be using row number. And you know, if you want to look into how rank and rank, like dense rank and all those ones work, please do that so you know why we're doing it. But we're using row number because it's the I think the simplest and it's going to do what we need exactly. So I'm going to get this over here, we'll say select everything because we're selecting everything, then we're going to add this row number on here. So row number and we're going to do these parentheses right here. I'm going to say over and an open parentheses. Now we need to write our partition because we're going to partition this data. So we're going to say partition by cool. Now really quickly, while we're here, we need to actually know what we're partitioning on that's helpful. So let me write this while we're writing it, we can see what we're doing. We need to partition it on things that should be unique to basically to each row. If and I guess for the sake of what we're doing, we're going to pretend this unique ID isn't here. Although, you know, you can say I'm cheating, it doesn't matter. But I'm going to say, you know, if things like the parcel ID are the same, if the sale date is the same, the property address is the same, the sales price is the same, this legal reference, which I'm guessing is some type of legal document saying it's like somebody's property. If all of those are the exact same, then to me, that is the same data. It's unusable. Just for example, I mean, this may, I don't, I mean, this data is just some random data set I found online. Right. So that's what we're going to be going with. That's what we're going to be running with and pretend that lie that I just told you is completely true. So what we want to partition by, let's start with the parcel. Can I, is this not right here? Why is it saying this? Why is it not giving me? Okay. It doesn't even matter. I'm just going to say parcel ID. We can say property. We'll do a property address. Stick with me. We're getting somewhere. We'll do sale price. What do we say? Sale date. I mean, there shouldn't be two of this. They didn't sell twice on the same day. Come on. And then legal reference. And, oh, I know why it's not working or my auto complete isn't working, which I love. It's because we're creating our own partition. So it's its own column, of course. I don't know why it's late. As you can see down here, it's 1115. It's getting late for me. But hey, this is an adrenaline rush for me. Now we need to order it. Now we want to order on something that should be not necessarily, I guess, unique. So we're going to order on this unique ID. We'll see if that actually does what we want it to do. Oops. What am I doing? Order by. Come on. And we'll say unique ID. Perfect. And we should be able to close that off. And we're going to call this Rownum. I mean, that's just makes sense. So now we have this. And let's run this really quick and see what happens. And maybe we should order this as well. But maybe we'll do that later. Yeah, let's order this on parcel ID. Order by parcel ID. Let's just see what happens. Because this I think that should be pretty accurate. But let's scroll down and see if we get any. This is all ones. Maybe this should be doing it on unique ID. I don't know. Let's see if we get any hits. Okay, there was a two in there. Let's let's look at this really quick because I want to see it. Maybe I did something wrong. I don't know. It is absolutely possible. So many play some Jeopardy music for me real quick. Yeah, I don't know. I don't know why it's okay. So let's see. Let's look at these two. And let's see if I did something wrong. Oops. Don't need to pull that up. I was doing some research when that convert by wasn't working. Okay, so this one and this one, it's giving different row numbers. So let's look at the actual data. Ignore the unique ID, but the data itself. So the the sale date is the same. The sale price is the same, the legal references the same. The owner is the same. This is the same. I mean, literally every single thing in here is the same. So this is a good example. So we're going to do in this query that we're about to write that that will be that second one will be deleted because we don't need it now. There's there's only one. So it looks like this is working as intended. I can also do let's do where row underscore num is greater than one. Let's see if that I don't think it will work actually. Yeah, that's because it is that is in a Windows function. Of course, we can't do that. What am I thinking? That's why we need to put it into a CTE. Of course, it all comes back. So let's call this all comes back to the CTE. These things are amazing. Let's call this row num num CTE and we'll say as and then open parentheses. And I don't think we can have an order by in here. Let's do it like this. And let's just do select everything from row number CTE. So again, if you haven't watched my like CTE video or you've never used a CTE before, this is now basically almost like a temp table. So we're going to be able to this query down here is querying off of this table that we quote unquote created. So it looks like it's working. So all we're going to do is select everything from that. And we want to say where row num because that's now a row is greater than one. And let's order that by I don't know property address. Let's see if that works. And let's see what happens. Okay. So all of these are duplicates. We have 104 of them, it looks like. So there's not many, but it there's twos and threes, no, no threes. So there's multiple of these rows or columns that basically duplicates and we want to delete them. So all we're going to say is we're going to select instead of saying select everything from row, we're just going to say delete. And yeah, I got to get rid of that order by that doesn't work. And let's do this. There's 104. Let's see if it worked. So now let's do let's go back and we'll say select everything. And let's see if there's any more duplicates in there. There are none. That is fantastic. I'm like biting my nails now to see if each one of these works. Because that first one didn't work. So yeah, so it worked. We got rid of the duplicates. That is fantastic. And now it's smooth sailing from here because we're just going to delete some unused columns that we don't care about. This doesn't happen often. This, I would say actually happens more in like views. When I'm creating views, I have a view and I'm like, Oh, I didn't mean to add that column. Let me just remove it because I don't need it. You don't do this to like the raw data that you import. Usually this is, I mean, again, best practices, please don't do this to your raw data that comes into your database. Talk to somebody before you do this. That's just my my legal advice for the day. I'm not legally bound or legally held responsible for any mistakes you make. So let's keep going. We're literally just going to delete some columns. It could be any columns that we want. But for example, we got half these property split address and owner split address in city and state and city. And these are perfect and much more useful than these owner address because this is really unusable to be honest. So we're going to delete those. And maybe we'll also get rid of like, I don't know, maybe the land that land use might be useful. This tax tax district, who cares about that? So it's going to be super easy. We're just going to write alter table alter table. Did I say that right? Geez. And I'm going to say alter this table. And we're going to drop a column. And you can do as many as we want. So we're going to say owner address. We're going to do tax district. And let's also do the property address. All right. And let's try this and we'll see if it works. I'm nervous. All right. So as you can see that the property address has gone, the owner address has gone, the tax, what was it? Tax district has gone. And now we are left with this. Now remember the whole point of everything we were doing was to clean up the data, right? We wanted to clean the data. And actually now that, well, now that we're here, we have this sale date as well. And we have the sale date converted over here. Let's get rid, I forgot, let's get rid of this. Oh, that was my dog Max, excuse him. Let's get rid of, oops, let's get rid of that sale price that, or the sale date that made me look like an idiot. This is sweet revenge sale date. Sweet, sweet revenge. All right. And it is gone. So it's as easy as that. Now remember, like I was saying before, the whole point of this project is to clean the data and make it more usable. And it may not have felt like that as we were going through, because I wasn't really looking at the cleaning data. We were cleaning it, but what was the purpose of it? I may not have highlighted that too much. All these other columns that we created are just, it's much more usable, much more friendly. This is standardized now. And we did that through quite a few various methods. So let's go back up to the top. We're going to recap what we did really quick. So using this convert, we tried to standardize the date format or change the date format, may or may not have worked for you, didn't work for me. We populated this property address, which we did that before we broke this out. Because if we reversed it, if we broke these addresses out into individual columns, and then we populated this thing, we would have, because then we went and deleted, we went and deleted this column. Oops, sorry. We went and deleted this property address. So we wouldn't have actually gotten any of that data. So there was a reason it was in that order. Don't mess that up. That's happened. So we broke it out. We did that to using substring char index, as well as parse name and replace. Then we went through and we changed yes to no or y and ns to yeses and nos using k statements. Then we removed duplicates using row number, a CTE, and Windows function of partition by. And at the end, we deleted a few useless columns that we no longer want to see because they are horrible and terrible and we don't want to see them anymore. That is the entire project. That was everything and you did it. I'm honestly super proud of you for sticking around this long. This was not necessarily an easy project. We used quite a few new things that I may have not talked about or showed you before. This to me is just the beginning. This is just a glimpse into all the things that you need to do. You need to look for in order to clean data. I really do think this is a good portfolio project because it will show that you understand and know how to clean the data. Although this is not an end to end project, that would take a long time and a lot more exploratory analysis looking into the data to figure out what we need to change. For all intents and purposes, this is a pretty good project for cleaning data. I hope that you learned something. I also hope that you worked on this hard. If you want to make any improvements, please do that. This is not perfect by any means. There are other things that you could change. I don't even know. I'm not even going to try to guess. You could do other things to this data though and create your own queries. Create your own data cleaning part of this. Do that. If you were able to get the ETL part of it done, do that. I think it would be really, really cool. Again, I was able to get it to work, but I don't think 90% of people out there would be able to get it to work. Every computer is different. Every server is configured differently. It would just be a huge pain. I decided to cut that out and I'm sorry, but hopefully this will suffice. With that being said, this is it. You made it all the way to the end. Again, I'm super proud. You guys are doing fantastic. You guys are the ones putting in the hard work to build the portfolio for your future job. It's not easy, but you're putting in the work, and so kudos to you. In our next video, we're going to be going into Python for the very first time. Really excited about that one because I think the only Python video that I have up right now is on one where I was scraping data from Twitter. This will be a nice change of pace or a little bit different content that I normally put out, and so I'm really excited about it, and I hope you are as well. With that being said, I am done with the video. I'm going to be stopping it soon. Thank you for joining me. If you like this video, be sure to subscribe. Be sure to like this video. Leave a comment below telling me how it changed your life, and I will see you in the next video. Goodbye.