 What is going on everybody? Welcome back to another video. Today we are starting our data analyst portfolio project series. Now before we jump into our first project I wanted to talk with you for just a second so that we're all on the same page. First thing is that there are gonna be four projects. The first one is going to be SQL and we're doing a lot of data exploration and we're setting up a lot of our data to visualize it in Tableau. Tableau is going to be our second project. In our third project again we're going back to SQL but we're going to doing a lot more of the ETL process so a lot more of the data cleaning. I did that one as the third project because I think it's going to be a little bit more advanced than this first project. I tried to make it as beginner-friendly as possible so even if you are a complete beginner as long as you've walked through the tutorials that I have made on my channel you should be pretty good. And then the fourth and the final project will be with Python. We'll be using a lot of pandas doing a little bit of data cleaning and then doing visualizations as well. As I said just a second ago I'm trying to make this as beginner-friendly as I possibly can. The whole point of this series is that if you are trying to apply for a data analyst job by the end of the series you should have an entire portfolio or at least a really good start at a portfolio to show a potential employer. I give you full permission to copy every script every query line for line if that is what you want to do and create your own portfolio. I am totally fine with that but I will encourage you and I'm sure I'll say this throughout the video I encourage you to try to think of your own queries try to think of your own insights and your own things that you can do to make this portfolio project unique. With that being said I'm super excited to get started on this with you guys so let's jump over to my screen and get started on our very first project. All right so now that we are on my screen we are going to get started on this project we're going to download the data set we are going to format it just a little bit in Excel and then we're going to get it into SQL where we will start querying it. I will say that I think this is going to be a very long video. I'm hoping to keep it under an hour and a half. I may separate this into two videos depending on how long it runs but you know I will do my best to keep it short but we have a lot to get through. I'm going to basically do no cuts that's my goal is to do no cuts in this because I want to walk you through each step of the process so that you understand everything that's going on and you don't get lost at some point but I think this is probably the best way to do it we'll see. The very first thing we're going to do is download our data set so you know as we are looking at this there's an option right here to download the data set I don't recommend that one you can it just won't give you all the information that I personally want which is to go back to like the very beginning. If you go down right here to the very first graph you can actually push this back and then download it and what this will do is it will go back to I think January 1st of 2020. So let's open this one up and when we get in here we're going to reformat it just a little bit it's nothing too complicated I hope I'm just going to double click here actually let me let me go up here and filter just in case you want to filter anything so what we have here is a ton of information on COPEN I mean just a ton and it goes back to early 2020 I believe it does go back to the first of 2020 so really quick a really brief introduction of what kind of data is in here we have total cases new cases total deaths new deaths we use those quite a bit in the the queries that are coming up if we go way over here we have total vaccinations people vaccinated and then over here a little bit farther we have a population that's the main stuff we're going to be working with today as you can see there's so many other things in here I mean you can use this if you want to go back and do more stuff on this I highly recommend it there's such you know there's such unique data in here about smokers and diabetes and like all this random stuff that I did not do a de-diving I mean I could I could spend you know a month just like looking at this data set and and getting really interesting stuff from it but I'm not going to do that I wanted to do this faster than two months to complete what we're going to do is we're going to go back over here and we're going to take this population and we're going to click on this a s and we're going to click control x and that's going to cut it we're going to go back to the very beginning and we're going to place it right here and we're going to right click and say insert cut cells now why are we doing this because I've already done this entire project and if you don't do this you're going to do a join with every single query you do which if you want to do that keep it there and then just you know change your query for that I did it like this because I wanted to show joins later on I wanted to keep it kind of simple at the beginning and then work my way to a little bit more advanced things which you will see it gets you know semi advanced but not too much I promise you stick with me let's go back over here we're going to go to actually double a and then we're going to click control shift right key that's going to select everything over here and we're going to literally delete it okay this is going to be our first table over here so everything you see over here is our first table um and we're going to save that so let's save as I'm just going to keep it in my downloads as and let's do covid deaths so that has our death information the next one is going to include our um vaccination information which is what we're going to join on and then um we're going to do that later so let's let's hit control z that's going to bring it back now let's select on z and go all the way to e and we're going to do the same thing we're going to delete this looks like there's no data but I promise there is later on the vaccinations um like total vaccinations if we go down um you can see that that starts on in February the end very end of February in 2021 that's because vaccinations are you know didn't come out until recently now let's save this file and we're going to save as instead of covid deaths we'll do covid vaccinations all right now let's save that so now we have our two excels that we want we need to get them into sequel we're going to go over to sequel and we're going to create a portfolio project database I've already done this all you have to do though is right click click new database type in portfolio project and then click okay and it will create your database for you um if you open up the tables it should be empty and that's where we're going to put these two excel files now I had a ton of trouble actually importing these excels um I mean I tried everything and I eventually just went down a rabbit hole of how to get these in I don't know if it's me or or what but I could not figure out how to do it if you go to portfolio project you hit tasks and you hit import data that may do it for you and it may work um it did not work for me uh it just it kept giving me errors so what I would recommend you do right off the bat just to make sure that we're doing the same thing and you can do it that way if you want I went over here to start again I'm on a windows and I went down to Microsoft sequel server 2019 and clicked import and export it looks the same but for whatever reason it it all the research I did it has to do with the 32 bit versus the 64 bit when you do it this way it goes to the 64 bit and it is able to import the data if you do it the other way it was doing it the 32 bit version and gives you an error I don't understand it don't ask me that's that's the reason that's I mean I went down a huge rabbit hole but this one works so let's go over here and this is going to be our data source where's the data coming from it's an excel file so let's do that let's browse and let's go over to my downloads I thought I saved in downloads maybe because it's an excel workbook what was I saving before oh that's a csv okay something important to note is we're doing an excel and not a csv you're going to get the same error I'm just doing it live and I'm making myself look stupid so we're going to save it but instead of a csv we're going to save it as an excel workbook let's save that now we have to go back to how it was right here the same way and we're going to file save as and let's do this is now COVID deaths and save it as a workbook now we have them now let's go back now we have our COVID deaths and our COVID vaccinations let's do our deaths first let me get back right here so it looks kind of more normal so we have our excel file we have our COVID deaths let's go next and now we have to say where we're going to place it where's our destination so we're going to click over here and go down to sql server native client 11.0 I want to say this is something that I messed up and it took me like 45 minutes to figure out it was the stupidest mistake it's going to auto populate a server name and I never checked to confirm that this was my server name and so I couldn't figure out why I wasn't able to insert this into my portfolio project database that's because mine is 01 I created two different servers intentionally and for whatever reason I forgot that and so all I had to do is add 01 over here so just make sure yours is is the same thing click portfolio project click next yes we're going to copy the data should auto populate if it doesn't if you like multiple you know always a check mark on the one that you think is the right one it should be the first one we'll click next we'll just click finish I'm sure it says run immediately we'll click finish and finish now while this is running there should be around 89,000 that's how it was like a week ago when I started it maybe a little more now because there's extra days um with that being said you know this is going to be a good size amount of data we're about to do a lot of different things we're going to start at the very basics of just like querying the table like super simple and then we're going to go into things like joins ctes temp tables creating views um the whole purpose of what we're about to do is not to it's not to keep it too simple um I want to showcase to a potential employer right that you can do more advanced things so I'm going to probably do I mean I'm looking at because I have already done this entire project individually I mean we've probably got like 15 to 20 queries here you don't have to do all of them um I'm going to walk through all of them and you can choose which ones you want but you don't have to do all of them it is quite a few so just know that so there's 85,000 right here that's fantastic it won't show up immediately you need to refresh it and there we go so that's our COVID vaccinations let's get rid of this so we just have COVID vaccinations um I thought that was our COVID deaths one but maybe I'm wrong but let's do the exact same thing down here and we will import say next we're gonna go down to excel and browse and now we want to do the COVID deaths apparently last time we did the vaccinations which um I actually actually you know what I bet what it did was it took yeah it took this right here as COVID vaccinations but that was the deaths one as it saved so uh forget that let's go right here let's do the COVID vaccinations it just has the same sheet name uh so sorry for the confusion destination is going to be the exact same place it's going to be sequel server native client let's add that 01 and let's click refresh portfolio project next next um like I said before if it does this just click the first one it's going to be COVID vaccinations it did that for the COVID deaths that's because I made the mistake earlier I hope you I hope when you're watching this you aren't super confused um the whole point make two tables or make two excels one should be COVID deaths one should be COVID vaccinations upload them and then rename them in a nutshell so we have the same amount let's refresh this this one is actually the COVID vaccinations this one is COVID deaths I'm telling you this stuff is it confuses me sometimes to be honest but we're going to query this really quick to make sure we are actually doing um what we're supposed to be doing so let's do select everything from um let's do portfolio project and you can do dot dbo or you can do dot dot I tend to just do that because it's easier um let's look at this one make sure it's the right table so we have total cases new cases perfect um and let's order on let's do three comma four just to make sure or order by of course um just to make sure that we have all everything that we're looking for so this looks right this looks like our excel let's copy this let's go down here we're going to do COVID vaccinations and let's run this one make sure the second one came in correctly as well so perfect so we have our two tables this is fantastic news um and now we can get going um we can keep this one I'm going to comment it out in case you know we want to come back to it um I'm going to really quick again right here I have another laptop I have already done this whole project so I'm just using it as a guideline to know kind of what I'm doing next so that I don't waste everyone's time um so really quickly let's just let's select the data that we are going to be using you don't have to use these comments I will say that I'm going to specify I'm going to say hey this comment is something I would keep in your portfolio project I'm going to add a bunch of extra stuff that is not needed just for your purpose but when you are creating your portfolio project you shouldn't be adding some of the things that I'm going to be commenting um on so we're going to do um or actually let's do really quick let's copy this so that it kind of knows what we're doing so let's select the location uh the date the total cases the new cases the total deaths and then population uh now where we're at I'm going to turn off my camera because it's going to get it's going to start getting in the way to be honest I don't want it to interfere with your ability to see what we're doing on screen so it's been great seeing you guys I'm going to turn this off and we will continue from here all right that should be turned off so let's keep running so this is what we're doing let's actually let's keep this going because I I don't like things not being organized um so we have our location oh no we want to do one two we want to do it based off the location and the date makes things everything easier I promise you so we're going to be the first one's obviously Afghanistan here's our date we have our total cases our new cases total deaths and population so really quick I'm just going to scroll down just a second um they started having you know the the total deaths it's um it started about a month after they got their first case it looks like so and then it just like ramps up a lot um and we're going to be diving into all these numbers what they mean how to do some really simple calculations on them um but really quickly we're just going to do again a super simple calculation um and one that we do multiple times for different things um so let's go right down here and let's say we're going to be looking at the total cases versus total deaths so how many cases are there in this country and then how many deaths do they have per um you know how many deaths they have for their entire cases so let's say they have a thousand people who who've been diagnosed they had 10 people who died what's the percentage of people who died who had um who who had it so uh let's go right down here and we're gonna I'm just going to copy this really quick it's just going to make our life easier I think you should do the same as well um so we have location date total cases um and we're going to get rid of our new cases because we don't need that one in this query right here in order to be in this population so let's work on our calculation really quick it should be super super easy let me make sure I'm still recording perfect oh man we're 25 almost 25 minutes in um or more because I have the intro so now we're going to do uh we want to know the percentage of people who are dying who actually get infected or or or who um report being infected so we're going to do um total underscore deaths we'll go right down here and we're going to divide that by the total cases total cases and if we do this really quick um what it's going to have and well let's go down to where there's actually numbers so we have 34 we have one um it's it's showing 0.029 percent if you ever try to get a percentage of something you have to multiply times a hundred um so let's do that really quick all we have to add is the what's that the asterisk sign um times 100 um and while we're here let's just add the um what's called alias let's do let's call this death percentage I don't know that that works for me and let's take a look at this it'll be a little bit more accurate accurate so when there were 34 there was one and that gives us a 2.94 percent death rate and we can go down even further um and this is still all afghanistan and let's go down to the very bottom let's go down to the very very bottom so as of as of today yesterday there were 59 745 total cases in afghanistan and there were 20 2625 deaths which is four percent so you have a four percent chance basically right now of dying I mean if you want to look at like that four percent chance of dying if you get it and you live in afghanistan let's I mean we you don't have to but really quick just to look at it further let's look at where the location um I think it's let's say like real quick because I'm not 100% if it states um it should I think it's united states but yeah so I mean I live in the united states if you don't you can look at your country but um you know this is like this is genuine real reported data so it's really interesting um right at the beginning I mean though I don't know if it was the way we were reporting or what but we had really high percentage rates as we go down we're looking at a five percent six percent I mean this was the peak of it this got really bad in the US um maybe I hope it gets better um but but how many are we at this is I'm going to go to the end of this year when you're sitting around two to three percent um yeah it goes down to under two percent so at the end of at the end of the year we were looking at over two million people that's two million no wait 20 million nine three six three wait wait wait 20 million people who have been infected um that's a lot that's a lot of 20 million people who have had it 35,000 or 352,000 deaths by the end of the year that's a lot um let's keep going um and at the very end we had over 32,346,971 that's a lot of people who've been infected um there's a lot of deaths 576,000 and I verified this number um I googled it google knows all I googled this number and it's pretty accurate um and it's really sad that's a lot of lot of lives um and that's one point seven eight percent so as of right now if you're to get it today a rough estimate is around one uh and three fours to two percent chance that you're that you could die from it um so really interesting numbers this is the kind of exploratory stuff that that you know we're gonna be doing we're gonna get a lot more advanced as we go on but this shows you know the likelihood um and we can I'm gonna write that shows the likely I hope I'm spelling is right I'm not spelling is right likely I hope that's right if it's not I apologize uh likelihood of dying if you contract uh covid in your country um again rough estimates but you know just glancing at the data that's kind of what we're looking at um now we're gonna look at and let's go down here let's look at looking at the total cases versus the population again we're gonna do a lot of this like percentage stuff um it it's pretty simple um that will only last for so long I promise you but it'll be really I'm gonna keep it on the states just because um I'm gonna be looking at that one the most because it obviously it's pretty relevant to me um so if you're in another country filter it by your country you'll be really interested in the stats I know I was really really really um shocked by a lot of the things that we're gonna find today so we're gonna keep the location we're going to we're going to keep the date keep the total cases um but let's change this to population and then instead of um the total cases being here we're gonna put the total cases there and then change this to population so what is this going to do for us this is gonna show us what percentage of population has gotten COVID so shows what percentage of population oops got COVID um some of these things again they're they're good to know um the one that I upload to github will have the notes that I recommend keeping again not everything in here is um not everything in here is what you know you need to have in there this is mostly just you know what I think you guys need to see while we're actually typing this out all right so let's take a look at this um actually I want to change this I want to put this right here just as easier for me visually just for because the total cases right here so our population in the US is around 331 million um so at the beginning when we had one case I mean that it's like nothing let's keep scrolling um and see where we get to 1% so 1% that's 3,311,312 people and that happened in what is that August August of last year so 1% of the population let's keep going all the way down again we're just kind of glancing at this we're about 10% again we're at that 32 million so 10% of the population has has gotten it gotten a test and it's been confirmed so really interesting um you know we'll come back to that when I'm sure in the future I you know we might make we might use this one as like a visualization again I'm only looking at the states or the United States right now but you know think about it in terms of how we're going to visualize this in the future because a lot of what we're doing we're going to visualize in the future in Tableau I have Tableau even open right here you can see I have a map this is just a soup I threw this together in like two seconds we have the we have the location and so you know this is like our future this is what you need to be envisioning when you're looking at this data so we have you know Afghanistan and let's just scroll through Belarus and Bolivia and Bulgaria and Cambodia all the every single country that that is reporting so we're just looking at the states but remember all of these are going to be used so just something to remember I want to know and I'm really curious as to what countries have the highest infection rates compared to the population so we're just looking at our population up here how are we going to do this we'll do actually let me say well let me write it out really quick so let's look looking at countries with highest infection rate compared to population so that's what this script is going to do or this query is going to do I'm going to copy this so we're going to keep the location we are not going to keep the date this is not going to be date specific there's just going to be overall and then we're going to look at the max of the total cases so we only want to look at the highest so when we're looking at the US we had 32 million we don't want to look at every single pop of the total cases we only look at the very highest one so we'll look at the max total cases and let's right here we'll just say give an alias at least something to recognize it so highest I guess we can say infection count so we'll say highest infection count that's the highest infection count per country so per location and then we want to also take because it's going to it's not going since we don't have max total cases here if we just kept total cases here it'll give us the same one that we're looking at in this above query what we need to do is we need to look at the max of this so we're going to look at max and just add a parenthesis there and we'll look at this isn't the death percentage anymore I forgot to change that in this last one this is what is this percent of population infected so let's change that for both of these because I don't want to get confused when you're looking at the column headers later so we'll look at the percent of population infected let's run this and see what we get list is not contained in either the aggregate oh I need to add a group by of course so let's add a group by and we need to group by both the population and the location so let's try that really quick let's see if this works awesome well we ordered on location and population but I really want to look at the highest so let's so let's just see really quick look at some of these numbers they're like one percent four percent ten percent okay so yeah what we want to do is order on this percent population infected so let's go ahead and do that and let's do that descending so the descending gets the highest number first my goodness 17 percent so what percentage of your population has gotten COVID it's been reported and and we can see that now so the very first one small population so it doesn't surprise me but if you look right down here so that's that 32 million that we were talking about that's that max of total cases which is the the highest number of our infection count so we have 33 so we're I mean we're we're right up there on the list let's look for other large countries I mean it's us you know there's Israel there's Belgium Portugal France so you know we're up almost to about 10 in a lot of these countries so some some of us including the United States we are we are in there as well some of us has have really high percentage rates we just did not keep it under control and you know a large amount of the population has gotten it that's what this one shows now let's look kind of at the sad side of things we were just looking at how many people were infected let's look at how many people actually died so let's do let's comment and we'll say this is gonna this is showing the countries with the let's do highest high am I spelling that right yeah highest death count per population um now how are we gonna do this let's copy this off the bats but I don't know if we're gonna do it the exact same way because we just need location and not much else honestly so let's get rid of all this stuff but we do need we're looking at the highest death count so like we did up here with the max total cases we're gonna do max and then we'll do total deaths I hope it's like this total deaths um and then we'll do as total total death count um and we'll order that by the total death count see I don't need this I think I yeah I need to group by because there's an aggregate function and let's try this really quick okay so if you're getting this there's a there's a simple slash confusing explanation to this total deaths right now let's go into our COVID deaths columns okay let's show the total deaths which is right here it's an nvarchar 255 it's an issue with the data type um or wait total deaths no no no total deaths right here it's an issue with the data type um it just has to do with how the data type is read when you use this aggregate function we need to convert it um or cast it which is what we're actually do we need to cast this as an integer so that's read as a numeric um why I cannot 100% give you a perfect explanation for it but this happens all the time you just need to look at the data and realize oh it's probably because this data type let's try something else um and then it'll work so let's cast this and we're in casting it I find it's just easier but just as int boom there you go so now we're taking this nvarchar 255 over here and then we are converting it to an integer now let's run this um and let's give her this just for visuals visual purposes now we are much more accurate but we have a slight issue or we're now seeing a slight issue with our data in our data in the location section we have a few ones that really shouldn't be there ones like world or africa um or south america these are grouping entire continents so let's go back up to our um let's go back up here and let's do actually let's pull it up really quick because this is just part of exploring the data and figuring it out so if we scroll down um we're gonna find we're gonna see one like right where is it right here this this location is all of asia whereas in other ones the continent is asia if I can pull one up real quick so like right here the continent is asia whereas before the location is asia but if you also notice um the continent is null here so what we need to do is say uh where continent is not null because when it is null that means that this location is actually an entire continent and we don't want that um that may be helpful for us um later on but it is not helpful now so now this right here will get rid of that um and just knowing that figuring that out now we can add that to every every script um and we can do you know you don't have to do this i'm just doing this for you know visual purposes i'm not gonna do that for everyone um so let's say where continent is not null and now let's look at this and now you can see that the united states is number one and so number one is not the best thing to be number one in but we have a death count of 576 000 and again i i googled this earlier these numbers are pretty accurate there some of them are like a day or two behind give me a second i'm gonna take a sip of water they're like a couple days behind um this number is actually higher um and as you know as we continue to have more people die unfortunately that number just continues to go up um so the data that you download may be a lot higher um as of right now we've been breaking everything out by location right really quickly let's just do this by something we kind of saw earlier um and i'm just gonna do this for breaking it up purposes what i'm gonna say i'm gonna do caps lock let's break things down by continent i spell continent continent jeez is that even i spell it let's keep going um but now we can do continent right here and we'll just copy and paste that let's get that back up here um and now we can see where continent is not null let's see if that makes that yeah okay so now it's breaking out by continents um with north america south america asia europe africa oceana is this perfect no no it's not perfect um north america looks like it's only including the numbers from the united states and not canada um so we have some small issues in here um but for the purposes of what we're trying to do which i don't think anyone's gonna gonna come in here in fact check us or check the data they may and then you're you know they might be screwed but for the purposes of hierarchy um and you know drill that drill down effect in tableau which is something we are going to do we want to start including this continent in our in our queries so that we can drill down um further into these things um we can also do where this way i'm gonna do where is null um actually let me see so before we were doing more continent is not null but let's do location i'm just i i'm doing this on the fly i didn't i haven't done this before i just kind of am um doing this um this actually is the correct numbers and i don't know why i didn't do this before when i was actually creating this project but now this is a wonderful beautiful thing i believe this is the correct numbers um i could verify but i don't want to do that live because i'm i'm like stupid but i think this is accurate um remember before we were looking at the location and the location um and it was actually the countries itself and then there were ones where we did where is not null to get rid of all the ones that were like world and all those other things well now i'm just filtering on those instead of deleting them before we were looking at everything but these now we're only looking at these and these numbers look a lot more accurate so with that being said um i'm gonna use this going forward in my strips i'm gonna kind of change things up to where from what i originally had um let me see though because if that is the case it may screw up our drill down effect which is highly unfortunate i make i honestly might just revert back to it for the pure fact that we want the visualizations to look correct um just know that this is the right way and if you want to go back and do that i highly encourage that i didn't figure that out they're my first time around um but i'm willing to admit when i'm wrong let me see what let me do a time check i run it like 15 minutes or so i think we're gonna we're just gonna keep going all the way through i i don't think we're gonna stop um i don't think we're gonna stop in this project so we want to do some of the above the above queries were kind of what we were going for nothing crazy difficult right nothing crazy hard um and now we want to we want to start breaking this out by um consonant as well i'm gonna go back and is this correct let me look no so is not no um so we want to start doing some of the above queries but adding that continent in there you can even go back and add that as well um if you want to that's totally fine i'm gonna do some more queries down here um or at least one one or two more and then we're gonna start getting i think into some a little bit more advanced things we're gonna start getting some temp tables uh stuff like that because we're going to eventually set these up in views so that we have these views to um use for tableau later and again it shows you not a creative view so that's important so we we've we've done this first one this next one is going to let me go down one more this is showing the continents with the highest death count so almost the exact same as we did before but now we're looking at the continents um we can even go up and look at uh just wait we literally just did that um so that's what this one is actually looking at my notes wrong idiot okay perfect um now you know we want to start looking at this from a viewpoint of i'm going to visualize this so how do we do that well we want to look at let's look at some global numbers um you can do as many of these as you want anything up here just add continent to it um anything where it's like group i just replace it with continent and you and you got it um so i don't want to go through and do every single one of those but that is kind of the gist of what you might want to do especially if you want that drill down effect and if you don't want that is um you know it's like clicking on north america and then when you bring up north america then it shows all the countries in north america so canada uh and the united states and so it's a drill down so you look like on africa and then there's all the african countries that's what drilling down does and that's what you can do when you have um those layers so you have the continent then you have the location um so you know i'm not going to we'll look at that when we actually get to tableau but i don't want to actually spend all the time writing that out um but what we now want to do is we want to calculate everything for the across the entire world so let's do this let's say um breaking let's do global let's just say global global numbers easier easier than nothing um all right uh let me really quick find the i think it's probably the first one the death percentage let me let me see if this is one that we want okay let me see all right so let's take this one i'm sorry that took me a while to find again i'm not cutting any of this stuff out you just got to stick with me you if you're sticking with me this long i know you care i know you're not you're not cutting away because i'm trying to figure things out on my side so um let me get rid of this so this is the exact same screw up well let's say where just so we can get the right numbers um so we are now going to look at the global numbers uh so we're not going to we're not going to include any location any continent or anything like that but we do want to make sure that we're only looking at all of the countries and we're not looking at the world numbers plus all the countries because then the numbers would get astronomical so instead of now now we can't do so let's try running this really quick so now we really can't do this um because now it's breaking everything out by um by you know that uh which is the dates it's breaking everything out by the date because these total cases the numbers are different right so really quick let's group by date and now let's see what it looks like uh it's going to give us an error obviously that's because we're looking at um that's because when we're looking at this we're looking at multiple things and we can't group by just the dates obviously if we wanted to group by something which we need to do we then need to start using aggregate functions on everything else so really quickly let's do some aggregate functions i'm looking at my notes for just a second to see what i did basically what we want to do and i think what'll make things easier is i mean i could try to do the sum of max total cases i don't think that's possible let me comment this out really quick yeah um it's because there's an aggregate function within an aggregate function and we can't really do that um if we go back to the data and you we kind of looked at this earlier there's one called new cases um let's use this because instead of doing max we can just sum it or do a sum on it and that's going to give us the sum of all the new cases which adds up to the total cases so if we do this let's see this will give us on each day the total across the world because we're not filtering by any continent or or we're filtering out um like the world and in the actual continents we're not filtering by location or continent or anything it's just by date so we're looking at the sum of the new cases so now let's do let's do the sum of new underscore deaths and we can run that one operating a data type and varchar is invalid for the sum operator so going back and this is something that I encountered a lot when I was doing this is these new cases is a float which is why it's working in the sum but the new deaths is an nvarchar so what we need to do again is cast that as an integer it's just the easiest thing to do um and now that one should work so um let's get rid of the well let's get rid of down to here so we're about to do another one and that's going to be our death percentage globally across um across the I guess the world so we need to do the sum of I think it's we need to do new deaths all right divided by the sum of new cases all right times 100 uh let's see what this takes us um okay of course we're getting the same thing let me um let me put this right here and see if this works um invalid oh that's because this that's because this was new cases the new deaths one is right here and let's run this and now we are looking good um and as you can see the death percentage is right here we have 91 um and let me give these I don't we can't let me go back real quick and just say as total cases as total deaths and let's run that again okay and so across the world these are our numbers so we have total cases on that very first day that cases were starting to be reported there were 98 total cases there was one total death that gives a death percentage of 1 across the country or across the world I'm going to as we scroll down it gets lower and lower and that's because we have a lot of people who have gotten infected or the total cases um and again that's per day right so if we remove this all together that date all together which we can do right now this will uh this will give us the total cases which is oh gosh let me read this through 150 million versus 3 million 180 206 so overall across the world we are looking at a um a death percentage of a little over 2 percent so interesting numbers you can keep both of those queries separate if you'd like um you know they might come in handy later but let's do this so we have um one second check on my notes again because I just want to make sure I'm not doing something stupid all right all right so again we have a whole another table that we haven't used yet it's this COVID vaccinations um and just to you know refresh your memory let's do um let's look at the table from portfolio project COVID vaccinations let's jog our memory on what we've got here so we have um we have these tests we have vaccinations over here which was what we're actually going to be using um excuse me uh that's what we are going to be using so let's join these two tables together uh and let's let's actually just do from actually let's just do this whole thing from let's do COVID deaths and here's how we're going to join it so we're going to say join and we're going to say oops wait that is wrong join and we're going to say on so what are we going to join them on um we're going to join them on two things we're going to join them on location because that's much more specific than the continent we're going to join them on location and we're going to join them on date let's call this one DEA let's call this one vaccination so a little alias for these is that we don't have to type out this entire table name each time so let's do DEA dot location is equal to VAC dot location and DEA dot and we'll say date is equal to VAC dot date and let's just see what we get really quick so we'll have all of these things and let's look at Granada 0717 and let's go all the way over here and it should have Granada 0717 so just making sure that they were joined correctly for this query what we're going to do is look at the total population and let's do that right here so looking at total population versus vaccination so how many what what is the total amount of people in the world that have been vaccinated that is that is what we're going to do in this query right here so let's do DEA dot continent location DEA dot date again these are going to be the same in either one but we have to specify let me just for example if we do population population oh actually that's a terrible example because population is only in one let me go back really quick let me say I only write date that's going to give me an error because there's date in both of them in fact we joined it on them so we know there's date in both of them so it's going to give us an error we just have to specify what table we want to pull it from so I'm going to do DEA and DEA dot population just to keep it consistent and now we're going to have the next one DEA dot I mean let's do new vaccinations and really quick let's just look at this and let me get my orders because I want it to be organized I I actually want let's do one two three I don't like it when it's not organized that bothers me so we're looking at oh no I also need to add or continent is not null there we go uh DEA perfect now let's run this this should look much better there we go all right we are in fact if we want to look at Afghanistan like we have normally been doing um in previous ones we do two slash three so there's our population here's our new vaccinations now let's see we're going to go back go down and let's see they have vaccinations starting on two eighteen um if we go even further down let's just go to who's this Canada oh yeah Canada will be a good one to look at these are doing vaccinations on right here so 1215 I mean they started very early and the numbers only increased and now they're you know doing this is per day right so this is 288 thousand in one day so that's you know really high numbers but this is the number of new vaccinations there is a column called total vaccinations in this table but we're going to do something pretty just to display again this whole portfolio project is to show potential employers that you know how to do certain things so I want to set up opportunities to do that we're not going to use the total vaccinations we're going to use this new vaccinations which is new vaccinations per day um so we want to we want to know or do kind of like a rolling count um out here so as this number let me go back to the beginning as this number increases 718 2300 4179 we want it to add up over here it's a pretty cool thing I mean you know it's once you see it you'll be like oh that's pretty easy but you know we're going to be using partitioned by we're going to be using um this is a windows function so it's really good to to showcase I think so we're going to do and let's do we need to do the sum because we're going to be adding these together so we need to do the sum of new vaccinations oops we need the sum of new vaccinations let's do over and we're going to say partition oh gosh partitioned by and we need to partition by the location first and foremost because we're breaking it up by if we do it by consonant the numbers are going to be completely off we need to do it by location location and and also partly the date but you'll see that in just a second but we need to partition it by breaking it up by um location and why is that because every time it gets to a new location we want the count to start over we we don't want this aggregate function to just keep running and running and running it'll ruin all of our numbers we only want the this partition it's a partition on the location so that it runs only through canada and then one gets to the next country it doesn't keep going um and if we only did that by the way let's look at what this looks like uh okay real quick we need to cast this um as an integer like we've been doing in the past you can also do um real quick I want to show you another one convert and I think it's comma integer or is it integer comma let me try integer comma I think it's that way actually and you can do it this way as well that is up to you um you know either one is totally fine if you want to use both that's even better because then it kind of shows you can do both but they basically do the exact same thing so let's go down and let's see what what's happening here so it goes down to albania and since we're partitioning on albania albania their total amount of vaccinations is 347 000 I know that going into it because it has it on every single stinking row but down here they started to add they started to add up right but we didn't do that we only partitioned on location so it added it did the sum of all the new vaccinations by that location so what we need to do is go over here and say order by and we need to order it by both the location oops da dot location and the date that is very important uh the date is what's going to separate it out um and you'll see in just a second what I mean so now let's run this and let's go back down to albania I think it was so here's albania let's go to our first one so here's what we have we have 60 and it gives us 60 then we add 78 so we add 60 plus 78 equals 138 then 78 plus 178 I'm sorry 60 plus 78 plus 42 equals 180 then 60 plus 178 plus 142 plus 241 so you get the point it adds up every single consecutive one and when there's nulls or there's zeros it's going to not add anything it's just going to keep it going and then you can see as it's it's a rolling count so we're going to name this let's do as um let's do as um rolling people vaccinated let's call that um think that's good now what we want to do is actually look at the total population versus the vaccinations um and really what we want to do is use this rolling people vaccinate want to use the max number because at the very bottom is our max number this is how many people in albania um we want to use that number and divide it by the population to know how many people in that country are vaccinated so what we want to do is we'll do this we'll do rolling people vaccinated divided by population times 100 and as you can see we're getting an error you can't use a column that you just created to then use the next one so what we need to do is we need to create either a cte or a temp table this is at this is the time of of the show of this tutorial whatever you want to call it where i'm going to give you some options you can do one you can do both you know there's no preference to me um but we're going to take this and we're going to at least for this first one we're going to use a ct so we're going to say excuse me we're going to say width and let's call it pop versus back i don't know population versus vaccination and then all we need to do is specify the um basically the columns that we're going to input um so let's put as and let's insert that down here because what we need to do is we want to say um we're going to do continent oh gosh i'm so bad at spelling continent location date population um and then we'll have this rolling people vaccinated that should be it um and let's see if there's maybe we just need to close this parentheses so this is our cte it should be working um actually that's not true i need an open parentheses here that's why it's giving me that error um let's see it's i'm still getting an error so let me see if i'm doing something wrong i have this in parentheses there and there i say width pop back there's continent location dates population uh i believe that is the issue so then we need we just need to add that last column new vaccinations um if the number of columns in the cte is different than the number of columns here it's going to give you an error so you got to make sure um and then let's just say for real for right now select everything from and we'll do and we can even say pop versus back it'll come up right away so really quickly let's run this and see what happens the order by clause can't be in there i knew that but whoops let's comment that out yeah let's get that all the way up here let's run this so now that query that we were looking at before is now in here but now we can actually use it to perform further calculations so we'll just do everything comma and then we'll do rolling people vaccinated divided by and that needs to be population times 100 i'm pretty sure this is incorrect give me a second um invalid object oh it's gonna have to run it with the cte my bad um so let's look at this percentage really quick um it's not wrong it's actually going to give us a rolling number and this may actually be what we want um so basically what it's doing is taking this column and doing it versus this column and so this number should only increase because as this number increases this number will increase because the population stays stagnant again i'm kind of looking at this as we go so right now 12 percent of the population in um albania is vaccinated so that you know that is that's all we know i don't think we need to go any further than that i think um if you want to you can look at the max one um but you'll have to get rid of date and just keep the location um population etc because the date is gonna throw everything off so if that's something you want to do absolutely do that um and you can use a temp table here we can look at how to do that really quickly i think um so that you guys know how to do that again i recommend throwing in one or two of these um like even up here here you can do different um different counts and then do one for each um so let's do temp table all right so it's going to be a lot of the same stuff we're going to keep this and this is going to be what we insert so let's say insert into and we need to write where we're inserting it into but let's say again i'm only doing this for it's going to be basically the same it's going to have the same effect but um with a temp table so we're going to do a temp table and let's look at um let's say let's call percent population vaccinated and we need to specify our columns so let's go down here excuse me let's go down here and let's do the basically the exact same thing so continent i think i spelled that right i almost did i got really confident we'll do we and and just so you know for these we have to specify the data type as well um because we're basically creating like a genuine table it's just a temporary one so let's do inverter char 255 we'll do um location we'll do the same thing and var char 255 we need to do date and we'll do that as date time we'll do population and we can do i mean there's lots of different ones we can do but we'll do numeric for this example there's new underscore vaccinations and let's do that one as numeric again you can use different things um and then we'll do rolling people vaccinated um this can be numeric as well and then we need to insert that into here okay so we're inserting the data and then down here we can actually select it and let's let's take this and do right here except we're going to be doing this by this right here um it hasn't been created yet but it will be created in just a second okay so you let me see it yeah so these were the rows that were affected um and we and then we got our actual output from this right here now let's say you wanted to change something in here you're like oh you know i don't want to do it wear this let me comment that out and then let me do this and um create that table again oh no we got an error oh how can we get around this very simple i've done this and i should do this in a different one you can do drop table if exists and then do this right here um and when we run this it should give us our output i highly recommend just adding this especially if you plan on making any alterations so that when you um run it multiple times you don't have to you know go and then delete the view or delete the temp table or drop temp table or you know it's just built in it's at the top it's easy to maintain and it looks good it's it's something that a lot of people do and so if you have that at the top of your query and somebody you know somebody who wants to hire you looks at this they make oh okay that makes sense i'm glad they included that they know what they're doing this guy's smart i should hire them um now what we're going to do is i feel like i've showed you as much as i can show you um with the limited data that we've looked at again i could have done this for six hours straight if i had used all the data at least i mean there's just so much data but let's create a view you know i'm only going to show you how to create one view but i want you to go back and create multiple views you know if this is one that you want to look at these global numbers um let's look at this one really quick if you want to look at this number right here toss it in a view i mean that one doesn't make sense to toss in a view but this one toss these numbers in a view um and we're we're going to um look at it in tableau later but for right now let's just create our view um so like let's just say creating view to store data for later visualizations all right so let's say create view um and i wanted i'm just going to keep the same thing um like that um and for views it's so easy i mean i'm literally just going to and i can even take um the order by i believe we'll see if i'm correct um actually let's get rid of both these things so this says create view percent percent populate oops percent population vaccinated um and let's see am i doing anything wrong here let me see the order by clause oh i was completely wrong i was wondering why i was getting that now let's try running it okay so we ran successfully let's look at our views it's not going to be in there let's refresh it hey look we got our very first view we can open that up like a table if we want to is it i mean it's gorgeous if you want to get rid of that select or sorry control shift r that's a refresh um and now it it basically recognizes it but let's go back here for a second um and you know we can now query off of that it's a view now so you know it's it's something that you can it's permanent you know you have to go in and actually delete it's not like a temp table this is now permanent and this could be something that we now use for a visualization later so do some of these look at some of the queries that we've looked at and create a few of these views um and we will use them later um normally in a normal setting uh if i was actually working i would put some of these in actual like i would call them like a work view or a work table or something set aside so that i can use them consistently um but i would also set them aside so that i could connect tableau to that view now we're going to be using something called Tableau public that'll be in the very next tutorial unfortunately um let me see if i can show you i can't show you tableau public does not connect to sql databases um and that's because it's free and i totally get it you have to pay for the upgraded version but i am not a a billionaire okay i cannot afford uh the real version of tableau i'm also not like a student or like something where i can get it cheap so i'm not paying for that so we're gonna use tableau public and and i recommend this anyways because anybody can access it it's it's free for anybody so we're going to be using tableau in the next one to actually visualize a lot of these things i want to get at least five visualizations we're going to create a dashboard it's going to be a beautiful beautiful thing all right so the very last thing that we are going to do is we are going to actually save this and then put it into github and i just want to show you how to do that that's where we're going to be storing our code at least for now um so let's go up here let's click file let's click save as um i've already have multiple versions of this let's just push v2 i'm going to save that so we have this save now i'm going to go over here i'm going to go to my github now if you don't have an account i highly recommend getting an account so you can start putting your portfolio projects in here of course we're not going to put our tableau one in here but our sequel ones and our python ones you can put in here again i'll talk a lot more about how we actually want to display this in github or other places but what we're going to do for this is we're going to create a new repository let's call this one word folio projects make it public we'll create the repository we'll do all that extra stuff later so what we now want to do is upload an existing file we'll click right there go to choose files and we'll click this latest one that we saved um we'll open it and we can always change the name of it later on and you can add notes if you'd like but we'll commit that change so we'll actually upload this uh this file um but let's look at it really quick and yeah i'm gonna go back and i'm gonna use the real one where has the formatting and and the notes that i have that i wanted to add in there but as you can see you know you can see all of the queries that we wrote and this is fantastic so if somebody comes in here you know we'll have more notes and kind of better comments on what they do um and what the takeaway is this from for a hiring manager to you know when they actually look at this so this is a really really good place to start again this may not be your optimal place to put this i'll give you a few different options in a later video about how we can actually potentially improve upon this i'm really looking forward to getting more portfolio projects done so we can actually start building a complete portfolio uh if you've stuck around all this way i just want to say congratulations i mean i know this was a long video i know that it took a long time but you stuck with me uh you put in the hard work and that is fantastic and i really hope that it pays off and i hope that this has been helpful thank you for watching we'll have a lot more videos in the future on these portfolio projects and i'm i'm just really really looking forward to doing them to be honest so thank you for sticking with me thank you for watching i really appreciated if you like this video be sure to like and subscribe below and i will see you in the next video