 What is going on everybody? Welcome back to another video. Today we are starting our data analyst portfolio project series. Now before we jump into our first project I wanted to talk with you for just a second so that we're all on the same page. First thing is that there are going to be four projects. The first one is going to be SQL and we're doing a lot of data exploration and we're setting up a lot of our data to visualize it in Tableau. Tableau is going to be our second project. In our third project again we're going back to SQL but we're going doing a lot more of the ETL process so a lot more of the data cleaning. I did that one as the third project because I think it's going to be a little bit more advanced than this first project. I tried to make it as beginner friendly as possible so even if you are a complete beginner as long as you've walked through the tutorials that I have made on my channel you should be pretty good. And then the fourth and the final project will be with Python. We'll be using a lot of pandas doing a little bit of data cleaning and then doing visualizations as well. As I said just a second ago I'm trying to make this as beginner friendly as I possibly can. The whole point of the series is that if you are trying to apply for a data analyst job by the end of the series you should have an entire portfolio or at least a really good start at a portfolio to show a potential employer. I give you full permission to copy every script every query line for line if that is what you want to do and create your own portfolio. I am totally fine with that but I will encourage you and I'm sure I'll say this throughout the video. I encourage you to try to think of your own queries try to think of your own insights and your own things that you can do to make this portfolio project unique. With that being said I'm super excited to get started on this with you guys so let's jump over to my screen and get started on our very first project. All right so now that we are on my screen we are going to get started on this project we're going to download the data set we are going to format it just a little bit in Excel and then we're going to get it into SQL where we will start querying it. I will say that I think this is going to be a very long video I'm hoping to keep it under an hour and a half. I may separate this into two videos depending on how long it runs but you know I will do my best to keep it short but we have a lot to get through. I'm going to basically do no cuts I'm that's my goal is to do no cuts in this because I want to walk you through each step of the process so that you understand everything that's going on and I you don't get lost at some point but I think this is probably the best way to do it we'll see the very first thing we're going to do is download our data set so you know as we are looking at this there's an option right here to download the data set I don't recommend that one you can it just won't give you all the information that I personally want which is to go back to like the very beginning if you go down right here to the very first graph you can actually push this back and then download it and what this will do is it will go back to I think January 1st of 2020 so let's open this one up and when we get in here we're going to reformat it just a little bit it's nothing too complicated I hope I'm just going to double click here actually let me let me go up here and filter just in case you want to filter anything so what we have here is a ton of information on COVID I mean just a ton and it goes back to early 2020 I believe it does go back to the first of 2020 so really quick a really brief introduction of what kind of data is in here we have total cases new cases total deaths new deaths we use those quite a bit in the the queries that are coming up if we go way over here we have total vaccinate vaccinations people vaccinated um and then over here a little bit farther we have a population that's the main stuff we're going to be working with today as you can see there's so many other things in here I mean you can use this if you want to go back and do more stuff on this I highly recommend it there's such you know there's so such unique data in here about smokers and diabetes and like all this random stuff that I did not do a de-diving I mean I could I could spend you know a month just like looking at this data set and and getting really interesting stuff from it um but I'm not going to do that I wanted to do this faster than two months to complete what we're going to do um is we're going to go back over here we're going to take this population and we're going to click on this as and we're going to click control x and that's going to cut it we're going to go back to the very beginning and we're going to place it right here and we're going to right click and say insert cut cells now why are we doing this because I've already done this entire project um and if you don't do this you're going to do a join with every single query you do which if you want to do that keep it there and then just you know change your query for for that I did it like this because I wanted to show joins later on I wanted to keep it kind of simple at the beginning um and then work my way to a little bit more advanced things which you will see it gets you know semi advanced but not too much I promise um you stick with me let's go back over here we're going to go to actually double a and then we're going to click control shift right key that's going to select everything over here and we're going to literally delete it okay this is going to be our first table over here so everything you see over here is our first table um and we're going to save that so let's save as I'm just going to keep it in my downloads as and let's do covid deaths so that has our death information the next one is going to include our um vaccination information which is what we're going to join on and then um we're going to do that later so let's let's hit control z that's going to bring it back now let's select on z and go all the way to e and we're going to do the same thing we're going to delete this looks like there's no data but I promise there is later on the vaccinations um like total vaccinations if we go down you can see that that starts on in February the end very end of February in 2021 that's because vaccinations are you know didn't come out until recently now let's save this file and we're going to save as instead of covid deaths we'll do covid vaccinations all right now let's save that so now we have our two excels that we want we need to get them into sequel we're going to go over to sequel and we're going to create a portfolio project database I've already done this all you have to do though is right click like a new database type in portfolio project and then click okay and it will create your database for you if you open up the tables it should be empty and that's where we're going to put these two excel files now I had a ton of trouble actually importing these excels I mean I tried everything and I eventually just went down a rabbit hole of how to get these in I don't know if it's me or or what but I could not figure out how to do it if you go to portfolio project you hit tasks and you hit import data that may do it for you and it may work um it did not work for me uh it just it kept giving me errors so what I would recommend you do right off the bat just to make sure that we're doing the same thing um and you can do it that way if you want I went over here to start again I'm on a windows and I went down to Microsoft sequel server 2019 and clicked import and export it looks the same but for whatever reason it it all the research I did it has to do with the 32 bit versus the 64 bit when you do it this way it goes to the 64 bit and it is able to import the data if you do it the other way it was doing it the 32 bit version and gives you an error I don't understand it don't ask me that's that's the reason that's I mean I went down a huge rabbit hole but this one works so let's go over here and this is going to be our data source where's the data coming from it's an excel file so let's do that let's browse and let's go over to my downloads I thought I saved in downloads maybe because it's an excel workbook what was I saving before oh that's a csv okay something important to note is we're doing an excel and not a csv you're gonna get the same error I'm just doing it live and I'm making myself look stupid so um we're gonna save it but instead of a csv we're gonna save it as an excel workbook let's save that um now we have to go back to how it was right here um the same way and we're gonna file save as and let's do this is now COVID deaths and save it as a workbook now we have them now let's go back um now we have our COVID deaths and our COVID vaccinations let's do our deaths first let me get back right here so it looks kind of more normal um so we have our excel file we have our COVID deaths let's go next and now we have to say where we're gonna place it where's our destination so we're gonna click over here and go down to sequel server native client 11.0 I want to say this is something that I messed up and it took me like 45 minutes to figure out it was the stupidest mistake um it's gonna auto populate a server name and I never checked to confirm that this was my server name and so I couldn't figure out why I wasn't able to insert this into my portfolio project uh database that's because mine is 01 I created two different servers um intentionally and for whatever reason I forgot that and so all I had to do is add 01 over here so just make sure yours is is the same thing click portfolio project click next yes we're going to copy the data should auto populate if it doesn't if you give you like multiple you know always uh check mark on the one that you think is the right one it should be the first one we'll click next we'll just click finish I'm sure it says run immediately we'll click finish and finish now while this is running um there should be around 89,000 that's how it was like a week ago when I started it maybe a little more now because there's extra days um with that being said you know this is going to be a good size amount of data we're about to do a lot of different things we're going to start at the very basics of just like querying the table like super simple um and then we're going to go into things like joins ctes temp tables creating views um the whole purpose of what we're about to do is not to it's not to keep it too simple um I want to showcase to a potential employer right that you can do more advanced things so I'm gonna probably do I mean I'm looking at because I have already done this entire project individually I mean we probably got like 15 to 20 queries here you don't have to do all of them um I'm going to walk through all of them and you can choose which ones you want but you don't have to do all of them it is quite a few so just know that so there's 85,000 right here that's fantastic it won't drop immediately you need to refresh it and there we go so that's our COVID vaccinations let's get rid of this so we just have COVID vaccinations um I thought that was our COVID deaths one but maybe I'm wrong um but let's do the exact same thing down here and we will import say next we're gonna go down to excel and browse and now we want to do the COVID deaths apparently last time we did the vaccinations which um I actually actually you know what I bet what it did was it took yeah it took this right here as COVID vaccinations but that was the deaths one as it saved so uh forget that let's go right here let's do the COVID vaccinations it just has the same sheet name so sorry for the confusion destination is going to be the exact same place it's going to be sequel server native client let's add that 01 and let's click refresh portfolio project next next um like I said before if it does this just click the first one it's going to be COVID vaccinations it did that for the COVID deaths that's because I made the mistake earlier I hope you I hope when you're watching this you aren't super confused um the whole point make two tables or make two excels and one should be COVID deaths one should be COVID vaccinations upload them and then rename them in a nutshell so we have the same amount let's refresh this this one is actually the COVID vaccinations this one is COVID deaths I'm telling you this stuff is it confuses me sometimes to be honest but we're going to query this really quick to make sure we are actually doing what we're supposed to be doing so let's do select everything from um let's do portfolio project and you can do dot dbo or you can do dot dot I tend to just do that because it's easier um let's look at this one make sure it's the right table so we have total cases new cases perfect um and let's order on let's do three comma four just to make sure we'll order by of course um just to make sure that we have all everything that we're looking for so this looks right this looks like our excel let's copy this let's go down here we're going to do COVID vaccinations and let's run this one make sure the second one came in correctly as well so perfect so we have our two tables this is fantastic news um and now we can get going um we can keep this one I'm going to comment it out in case you know we want to come back to it um I'm going to really quick again right here I have another laptop I have already done this whole project so I'm just using it as a guideline to know kind of what I'm doing next so that I don't waste everyone's time um so really quickly let's just let's select the data that we are going to be using you don't have to use these comments I will say that I'm going to specify I'm going to say hey this comment is something I would keep in your portfolio project I'm going to add a bunch of extra stuff that is not needed um just for your purpose but when you are creating your portfolio project you shouldn't be adding some of the things that I'm going to be commenting um on so we're going to do um or actually let's do really quick let's copy this so that it kind of knows what we're doing select the location uh the date the total cases the new cases the total deaths and then population uh now where we're at I'm going to turn off my camera because it's going to get it's going to start getting in the way to be honest I don't want it to interfere with your ability to see what we're doing on screen so it's been great seeing you guys I'm going to turn this off and we will continue from here all right that should be turned off so let's keep running so this is what we're doing let's actually let's keep this going because I I don't like things not being organized um so we have our location oh no we want to do one too we want to do it based off the location and the date makes things everything easier I promise you so we're going to be the first one's obviously Afghanistan here's our date we have our total cases our new cases total deaths and population so really quick I'm just going to scroll down just a second um they started having you know the the total deaths it's um it started about a month after they got their first case it looks like so and then it just like ramps up a lot um and we're going to be diving into all these numbers what they mean how to do some really simple calculations on them um but really quickly we're just going to do again a super simple calculation um and one that we do multiple times for different things um so let's go right down here and let's say we're going to be looking at the total cases versus total deaths so how many cases are there in this country and then how many deaths do they have per um you know how many deaths they have for their entire cases so let's say they have a thousand people who have who've been diagnosed they had 10 people who died what's the percentage of people who died who had um who had it so uh let's go right down here and we're gonna I'm just gonna copy this really quick it's just gonna make our life easier I think you should do the same as well um so we have location date total cases um and we're gonna get rid of our new cases because we don't need that one in this query right here uh in order to do this population so let's work on our calculation really quick it should be super super easy let me make sure I'm still recording perfect oh man we're 25 almost 25 minutes in um or more because I have the intro so now we're gonna do uh we want to know the percentage of people who are dying who actually get infected or or or who um report being infected so we're gonna do um total underscore deaths we'll go right down here and we're gonna divide that by the total cases total cases and if we do this really quick um what it's gonna have and well let's go down to where there's actually numbers so we have 34 we have one um it's it's showing 0.029 percent if you ever try to get a percentage of something you have to multiply times 100 um so let's do that really quick all you have to add is that what's that the asterisk sign um times 100 um and while we're here let's just add the um what's called alias let's do let's call this death percentage uh I don't know that that works for me and let's take a look at this it'll be a little bit more accurate accurate so when there were 34 there was one and that gives us a 2.94 percent death rate and we can go down even further um and this is still all afghanistan now let's go down to the very bottom let's go down to the very very bottom so as of as of today yesterday there were 59,745 total cases in afghanistan and there were 20 2,625 deaths which is 4 percent so you have a 4 percent chance basically right now of dying I mean if you want to look at like that 4 percent chance of dying if you get it and you live in afghanistan um let's I mean we you don't have to but really quick just to look at it further let's look at where the location um I think it's let's say like real quick because I'm not 100 percent if it's states um it should I think it's united states but yeah so I mean I live in the united states if you don't you can look at your country but um you know this is like this is genuine real reported data so it's really interesting um right at the beginning I mean though I don't know if it was the way we were reporting or what but we had really high percentage rates um as we go down we're looking at five percent six percent I mean this was the peak of it this got really bad in the U.S. um maybe I hope it gets better um how many are we at this is I'm going to go to the end of this year when you're sitting around two to three percent um yeah it goes down to under two percent so at the end of the year we were looking at over two million people that's two million no wait twenty million nine three six three wait 20 million people who have been infected that's a lot that's a lot of 20 million people who have had it 35,000 or 352,000 deaths by the end of the year that's a lot um let's keep going um and at the very end we had over 32,346,971 that's a lot of people who have been infected um there's a lot of deaths 576,000 and I verified this number um I googled it google knows all I googled this number and it's pretty accurate um and it's really sad that's a lot of lot of lives um and that's 1.78 percent so as of right now if you're to get it today a rough estimate is around one uh and three fours to two percent chance that you're that you could die from it um so really interesting numbers this is the kind of exploratory stuff that that you know we're going to be doing we're going to get a lot more advanced as we go on but this shows you know the likelihood um and we can I'm going to write that shows the likely I hope I'm spelling is right I'm not spelling is right likely I hope that's right if it's not I apologize uh likelihood of dying if you contract uh COVID in your country um again rough estimates but you know just glancing at the data that's kind of what we're looking at um now we're going to look at and let's go down here let's look at looking at the total cases versus the population again we're going to do a lot of this like percentage stuff um it it's pretty simple um that will only last for so long I promise you but it'll be really I'm going to keep it on the states just because I'm going to be looking at that one the most because it obviously it's pretty relevant to me um so if you're in another country filter it by your country you'll be really interested in the stats I know I was really really really um shocked by a lot of the things that we're going to find today so we're going to keep the location we're going to we're going to keep the date to keep the total cases um but let's change this to population and then instead of um the total cases being here we're going to put the total cases there and then change this to population so what is this going to do for us this is going to show us what percentage of population has gotten covid so shows what percentage of population oops got covid um some of these things again they're they're good to know um the one that I upload to github will have the notes that I recommend keeping um again not everything in here is um not everything in here is what you know you need to have in there this is mostly just you know what I think you guys need to see while we're actually taking this out all right so let's take a look at this um actually I want to change this I want to put this right here just as easier for me visually um just for because the total cases right here so our population in the us is around 331 million um so at the beginning when we had one case I mean that it's like nothing let's keep scrolling um and see where we get to one percent so one percent that's three million three hundred and eleven thousand three hundred and twelve people and that happened in what is that august august of last year so one percent of the population let's keep going all the way down again we're just kind of glancing at this we're about 10 percent again we're at that 32 million so 10 percent of the population has has gotten it gotten a test and it's been confirmed so really interesting um you know we'll come back to that one I'm sure in the future I you know we might make we might use this one as like um a visualization again I'm only looking at the states or the United States right now but you know think about it in terms of how we're going to visualize this in the future because a lot of what we're doing we're going to visualize in the future um in Tableau I have Tableau even open right here you can see I have a map um this is just a soup I threw this together in like two seconds um we have the uh we have the location and so you know this is like our future this is what you need to be envisioning when you're looking at this data so we have you know Afghanistan and let's just scroll through Belarus and Bolivia and Bulgaria and Cambodia all the every single country um that that is reporting so we're just looking at the states but remember all of these are going to be used so just something to remember um I want to know and I'm really curious as to what countries have the highest um infection rates compared to the population so we're just looking at our population um up here um how are we going to do this we'll do actually let me say well let me write it out really quick so let's look looking at countries with highest infection rate compared to population so that's what this script is going to do or this query is going to do I'm going to copy this um so we're going to keep the location we are not going to keep the date because this is not going to be date specific there's just going to be overall and then we're going to look at the max of the total cases so we only want to look at the highest so when we're looking at the US we had 32 million we don't want to look at every single pop um of the total cases we only look at the very highest one so we're going to look at the max total cases um and let's right here we'll just say give an alias at least something to recognize it so highest um I guess we can say infection count so we'll say highest infection count it's the highest infection count per country um so per location um and then we want to also take because it's going to it's not going since we don't have max total cases here if we just kept total cases here it'll give us the same one that we're looking at in this above query what we need to do is we need to look at the max of this um so we're going to look at max and just add a parenthesis there um and we'll look at this isn't the death percentage anymore I forgot to change that in this last one this is what is this percent of population infected so let's change that for both of these because I don't want to get confused when you're looking at the column headers later um so we'll look at the percent of population infected let's run this and see what we get uh list is not contained in either the aggregate oh I need to add a group by of course um so let's add a group by um and we need to group by both the population and the location so let's try that really quick let's see if this works awesome um well we ordered on location and population but I really want to look at the highest um so let's so let's just see really quick look at some of these numbers they're like 1 percent 4 percent um 10 percent okay so yeah yeah what we want to do is order on um this percent population infected so let's go ahead and do that uh and let's do that descending so the descending gets the highest number first um my goodness 17 percent so what percentage of your population has gotten COVID it's been reported and and and um we can see that now so the very first one small population so it doesn't surprise me but if you look right down here so that's that 32 million that we were talking about that's that max of total cases um which is the the highest number of our infection count so we have 33 so we're I mean we're we're right up there on the list let's look for other large countries I mean it's us you know there's Israel there's Belgium Portugal France so you know we're up almost to about 10 percent in a lot of these countries so some some of us including the United States we are we are in there as well some of us has have really high percentage rates we just did not keep it under control um and you know a large amount of the population has gotten it that's what this one shows um now let's look uh kind of at the sad side of things we were just looking at how many people were infected let's look at how many people actually died um so let's do let's comment and we'll say this is gonna this is showing the countries with the let's do highest high am I spelling that right yeah highest death count per population um now how are we gonna do this let's copy this off the bat but I don't know if we're gonna do it the exact same way because we just need location and not much else honestly so let's get rid of all this stuff but we do need we're looking at the highest death count so like we did up here with the max total cases we're gonna do max and then we'll do total deaths I hope it's like this total deaths um and then we'll do as total total death count um and we'll order that by the total death count see I don't need this I think I yeah I need to group by because there's an aggregate function and let's try this really quick okay so if you're getting this there's a there's a simple slash confusing explanation to this total deaths right now let's go into our covid deaths columns okay let's show the total deaths which is right here it's an n bar chart 255 it's an issue with the data type um or wait total deaths no no no total deaths right here it's an issue with the data type um it just has to do with how the data type is read when you use this aggregate function we need to convert it or cast it was what we're actually do we need to cast this as an integer so that's read as a numeric why I cannot 100% give you a perfect explanation for it but this happens all the time you just need to look at the data and realize oh it's probably because of this data type let's try something else and then it'll work so let's cast this and we're in casting it I find is just easier but just as int boom there you go so now we're taking this n bar chart 255 over here and then we are converting it to an integer now let's run this um and let's give her this just for visuals visual purposes now we are much more accurate but we have a slight issue or we're now seeing a slight issue with our data in our data in the location section we have a few ones that really shouldn't be there ones like world or africa um or south america these are grouping entire continents so let's go back up to our um let's go back up here and let's do actually let's pull it up really quick because this is just part of exploring the data and figuring it out so if we scroll down um we're gonna find we're gonna see one like right where is it right here this this location is all of asia whereas in other ones the continent is asia if i can pull one up real quick so like right here the continent is asia whereas before the location is asia but if you also notice um the continent is null here so what we need to do is say uh where continent is not null because when it is null that means that this location is actually an entire continent and we don't want that um that may be helpful for us um later on but it is not helpful now so now this right here will get rid of that um and just knowing that figuring that out now we can add that to every every script um and we can do you know you don't have to do this i'm just doing this for you know visual purposes i'm not gonna do that for everyone um so let's say where continent is not null and now let's look at this and now you can see that the united states is number one and so number one is not the best thing to be number one in but we have a death count of 576 thousand and again i i googled this earlier these numbers are pretty accurate there some of them are like a day or two behind give me a second i'ma take a sip of water they're like a couple days behind um this number is actually higher um and as you know as we continue to have more people die unfortunately that number just continues to go up um so the data that you download may be a lot higher um as of right now we've been breaking everything out by location right really quickly let's just do this by something we kind of saw earlier i mean i'm just gonna do this for breaking it up purposes but i'm gonna say i'm gonna do caps lock let's break things down by continent i spoke continent continent geez is that even i spell it let's keep going um but now we can do continent right here and we'll just copy and paste that let's get that back up here um and now we can see where continent is not null let's see if that makes that yeah okay so now it's breaking out by continents um with north america south america asia europe africa oceana is this perfect no no it's not perfect um north america looks like it's only including the numbers from the united states and not canada um so we have some small issues in here um but for the purposes of what we're trying to do which i don't think anyone's really gonna come in here in fact check us or check the data they may and then you're you know they might be screwed but for the purposes of hierarchy um and you know drill that drill down effect in tableau which is something we are going to do we want to start including this continent in our in our queries so that we can drill down um further into these things we can also do where just way i'm gonna do where is null um actually let me see so before we were doing more continent is not null but let's do location i'm just i i'm doing this on the fly i haven't done this before i just kind of am doing this um this actually is the correct numbers and i don't know why i didn't do this before when i was actually creating this project but now this is a wonderful beautiful thing i believe this is the correct numbers um i could verify but i don't want to do that live because i'm i look stupid but i think this is accurate um remember before we were looking at the location and the location um and it was actually the countries itself and then there were ones where we did where is not null to get rid of all the ones that were like world and all those other things well now i'm just filtering all in those instead of deleting them before we were looking at everything but these now we're only looking at these and these numbers look a lot more accurate so with that being said um i'm going to use this going forward in my script so i'm going to kind of change things up to where from what i originally had um let me see though because if that is the case it may screw up our drill down effect which is highly unfortunate i make i honestly might just revert back to it for the pure fact that we want the visualizations to look correct um just know that this is the right way and if you want to go back and do that i highly encourage that i didn't figure that out they're my first time around um but i'm willing to admit when i'm wrong let me see what let me do a time check i run it like 50 minutes or so i think we're gonna we're just gonna keep going all the way through i i don't think we're gonna stop um i don't think we're gonna stop in this project so we want to do some of the above the above queries were kind of what we were going for nothing crazy difficult right nothing crazy hard um and now we want to we want to start breaking this out by um continent as well i'm gonna go back and is this correct let me look no so is not no um so we want to start doing some of the above queries but adding that continent in there you can even go back and add that as well um if you want to that's totally fine i'm gonna do some more queries down here um or at least one one or two more and then we're gonna start getting i think into some a little bit more advanced things we're gonna start getting some temp tables uh stuff like that because we're going to eventually set these up in views so that we have these views to um use for tableau later and again it shows you know how to create a view so that's important so we we've we've done this first one this next one is going to go down one more this is showing the continents with the highest death count so almost the exact same as we did before but now we're looking at the continents we can even go up and look at uh just wait we literally just did that um so that's what this one is actually looking at my notes wrong idiot okay perfect um now you know we want to start looking at this from a viewpoint of i'm going to visualize this so how do we do that well we want to look at let's look at some global numbers um you can do as many of these as you want anything up here just add continent to it um anything where it's like group by just replace it with continent and you and you got it um so i don't want to go through and do every single one of those but that is kind of the gist of what you might want to do especially if you want that drill down effect and if you don't want that is um you know it's like clicking on north america and then when you bring up north america then it shows all the countries in north america so canada uh in the united states and so it's a drill down so you look on africa and then there's all the african countries that's what drilling down does and that's what you can do when you have um those layers so you have the continent then you have the location um so you know i'm not going to we'll look at that when we actually get to tableau but i don't want to actually spend all the time writing that out um but what we now want to do is we want to calculate everything for the across the entire world so let's do this let's say breaking let's do global let's just say global global numbers easier easier than nothing um um all right uh let me really quick find the i think it's probably the first one the death percentage let me see if this is one that we want okay let me see right so let's take this one i'm sorry that took me a while to find again i'm not cutting any of this stuff out you just got to stick with me you if you're sticking with me this long i know you care i know you're not you're not cutting away because i'm trying to figure things out on my side so um let me get rid of this so this is the exact same screw up let's say where just so we can get the right numbers um so we are now going to look at the global numbers uh so we're not going to we're not going to include any location any continent or anything like that but we do want to make sure that we're only looking at all of the countries and we're not looking at the world numbers plus all the countries because then the numbers would get astronomical so instead of now now we can't do so let's try running this really quick so now we really can't do this um because now it's breaking everything out by um by you know that uh which is the date it's breaking everything out by the date because um these total cases the numbers are different right so really quick let's group by date and now let's see what it looks like it's going to give us an error obviously that's because we're looking at um that's because when we're looking at this we're looking at multiple things and we can't group by just the dates obviously if we wanted to group by something which we need to do we then need to start using aggregate functions on everything else um so really quickly let's do some aggregate functions i'm looking at my notes for just a second to see what i did basically what we want to do and i think what'll make things easier is i mean i can try to do the sum of max total cases i don't think that's possible let me comment this out really quick yeah um it's because there's an aggregate function within an aggregate function and we can't really do that um if we go back to the data and you we kind of looked at this earlier there's one called new cases um let's use this because instead of doing max we can just sum it or do a sum on it and that's going to give us the sum of all the new cases which adds up to the total cases so if we do this let's see this will give us on each day the total across the world because we're not filtering by any continent or or we're filtering out um like the world and in the actual continents we're not filtering by location or continent or anything it's just by date so we're looking at the sum of the new cases so now let's do uh let's do the sum of new underscore deaths and we can run that one um operator data type and varchar is invalid for the sum operator so going back um and this is something that i encountered a lot when i was doing this is these new cases is a float which is why it's working in the sum but the new deaths is an n varchar so what we need to do again is cast that as an integer it's just the easiest thing to do um and now that one should work so um let's get rid of the well let's get rid of down to here so we're we're about to do another one and that's going to be our death percentage globally across um across the i guess the world so we need to do the sum of i think it's we need to do new deaths all right divided by the sum of new cases all right times 100 uh let's see what this takes us um okay of course we're getting the same thing let me um let me put this right here and see if this works um invalid data oh that's because this was new cases the new deaths one is right here and let's run this and now we are looking good um and as you can see the death percentage is right here you have 91 um and let me give these i don't we can't let me go back real quick and just say as total cases as total deaths and let's run that again okay and so across the world these are our numbers so we have total cases on that very first day that cases were starting to be reported um there were 98 total cases there was one total death that gives a death percentage of 1 percent across the country or across the world i'm gonna as we scroll down it gets lower and lower and that's because we have a lot of people who have gotten infected or the total cases um and again that's per day right so if we remove this all together that date all together which we can do right now this will uh this will give us the total cases which is oh gosh let me read this through 150 million versus 3 million 180 206 so overall across the world we are looking at a um a death percentage of a little over 2 percent so interesting numbers you can keep both of those queries separate if you'd like um you know they might come in handy later but let's do this so we have um even one second check on my notes again because i just want to make sure i'm not doing something stupid all right all right so again we have a whole another table that we haven't used yet it's this COVID vaccinations and just to you know refresh your memory let's do um let's look at the table from portfolio project COVID vaccinations let's jog our memory on what we got here so we have um we have these tests we have vaccinations over here which was what we're actually going to be using um excuse me uh that's what we are going to be using so let's join these two tables together uh and let's let's actually just do from actually let's just do this whole thing from let's do COVID deaths and here's how we're going to join it so we're going to say join and we're going to say oops wait that is wrong join and we're going to say on so what are we going to join them on um we're going to join them on two things we're going to join them on location because that's much more specific than the continent we're going to join them on location and we're going to join them on date let's call this one DEA let's call this one vaccination so a little alias for these is that we don't have to type out this entire table name each time so let's do DEA dot location is equal to VAC dot location and DEA dot and we'll say date is equal to VAC dot date and let's just see what we get really quick so we'll have all of these things and let's look at Granada 0717 and let's go all the way over here and it should have Granada 0717 so just making sure that they were joined correctly for this query what we're going to do is look at the total population and let's do that right here so looking at total population versus vaccination so how many what what is the total amount of people in the world that have been vaccinated that is that is what we're going to do in this query right here so let's do DEA dot continent location DEA dot date and again these are going to be the same in either one but we have to specify let me just for example if we do population population oh actually that's a terrible example because population is only in one let me go back really quick let me say I only write date that's going to give me an error because there's date in both of them in fact we joined it on them so we know there's a date in both of them so it's going to give us an error we just have to specify what table we want to pull it from so I'm going to do DEA and DEA dot population just to keep it consistent and now we're going to have the next one DEA dot I mean let's do new vaccinations and really quick let's just look at this and let me give my orders because I want it to be organized I actually let's do one two three I don't like it when it's not organized it bothers me so we're looking at oh no I also need to add or continent is not null there we go DEA perfect now let's run this this should look much better there we go all right we are in fact if we want to look at Afghanistan like we have normally been doing in previous ones we do two slash three so there's our population here's our new vaccinations now let's see we're gonna go back go down and let's see they have vaccinations starting on 218 if we go even further down let's just go to was this Canada oh yeah can it'll be a good one to look at these are doing vaccinations on right here so 1215 I mean they started very early and their numbers only increased and now they're you know doing this is per day right so this is 288 thousand in one day um so that's you know really high numbers but this is the number of new vaccinations um there is a column called total vaccinations in this table but we're going to do something pretty just to display again this whole portfolio project is to show potential employers that you know how to do certain things so I want to set up opportunities to do that we're not going to use the total vaccinations we're going to use this new vaccinations which is new vaccinations per day um so we want to we want to know or do kind of like a rolling count um out here so as this number let me go back to the beginning as this number increases 718 2300 4179 we want it to add up over here it's a pretty cool thing I mean you know it's once you see it you'll be like oh that's pretty easy but you know we're going to be using partitioned by we're going to be using um this is a windows function so it's really good to to showcase I think so we're going to do um and let's do we need to do the sum because we're going to be adding these together so we need to do the sum of new vaccinations oops you do the sum of new vaccinations let's do over I'm going to say partition oh gosh partitioned by and we need to partition by the location first and foremost because we're breaking it up by if we do it by continent the numbers are going to be completely off we need to do it by location location and and also partly the date but you'll see that in just a second but we need to partition it by breaking it up by location and why is that because every time it gets to a new location we want the count to start over we we don't want this aggregate function to just keep running and running and running it'll ruin all of our numbers we only want the this part it's a partition on the the location so that it runs only through canada and then one gets to the next country it doesn't keep going and if we only did that by the way let's look at what this looks like uh okay real quick I need to cast this as integer like we've been doing in the past you can also do um real quick I want to show you another one convert and I think it's comma integer or is it integer comma let me try integer comma I think it's that way actually and you can do it this way as well that is up to you you know either one is totally fine if you want to use both that's even better because then it kind of shows you can do both but they basically do the exact same thing so let's go down and let's see what what's happening here so it goes down to albania and since we're partitioning on albania albania their total amount of vaccinations is 347 000 I know that going into it because it has it on every single stink and row but down here they started to add they started to add up right but we didn't do that we only partitioned on location so it added it did the sum of all the new vaccinations by that location so what we need to do is go over here and say order by and we need to order it by both the location oops da dot location and the date that is very important uh the date is what's going to separate it out um and you'll see in just a second what I mean so now let's run this and let's go back down to albania I think it was so here's albania let's go to our first one so here's what we have we have 60 and it gives us 60 then we add 78 so we add 60 plus 78 equals 138 then 78 plus 178 I'm sorry 60 plus 78 plus 42 equals 180 then 60 plus 178 plus 142 plus 181 241 so you get the point it adds up every single consecutive one and when there's nulls or there's zeros it's going to not add anything it's just going to keep it going and then you can see as it's a rolling count so we're going to name this let's do as um let's do as um rolling people vaccinated let's call that um think that's good now what we want to do is actually look at the total population versus the vaccinations um and really what we want to do is use this rolling people vaccinate we want to use the max number because at the very bottom is our max number this is how many people in albania we want to use that number and then divide it by the population to know how many people in that country are vaccinated so what we want to do is we'll do this we'll do rolling people vaccinated divided by population times 100 and as you can see we're getting an error you can't use a column that you just created to then use in the next one so what we need to do is we need to create either a cte or a temp table this is at this is the time of of the show of this tutorial whatever you want to call it where i'm going to give you some options you can do one you can do both you know there's no preference to me but we're going to take this and we're going to at least for this first one we're going to use a cte so we're going to say excuse me we're going to say with and let's call it um pop versus back i don't know population versus vaccination and then all we need to do is specify the uh basically the columns that we're going to input so let's put as and let's insert that down here because what we need to do is we want to say we're going to do continent oh gosh i'm so bad at spelling continent location date population and then we'll have this rolling people vaccinated that should be it and let's see if there's maybe we just need to close this parentheses so this is our cte it should be working um actually that's not true i need an open parentheses here that's why it's giving me that error um let's see it's i'm still getting an error so let me see if i'm doing something wrong do i have this in parentheses there and there i say with continent location date population uh i believe that is the issue so then we need we just need to add that last column new vaccinations um if the number of columns in the cte is different than the number of columns here it's going to give you an error so you got to make sure um and then let's just say for real for right now select everything from and we'll do and we can even say pop versus back it'll come up right away so really quickly let's run this and see what happens uh the order by clause can't be in there i knew that but whoops well let's comment that out okay let's get that all the way up here let's run this so now that query that we were looking at before is now in here but now we can actually use it to perform further calculations um so we'll just do everything comma and then we'll do rolling people vaccinated uh divided by and that needs to be population times 100 i'm pretty sure this is incorrect give me a second invalid object oh it's gonna have to run it with the cte my bad um so let's look at this percentage really quick um it's not wrong it's actually going to give us a rolling number and this may actually be what we want um so basically what it's doing is taking this column and doing it versus this column and so this number should only increase because as this number increases this number will increase because the population stays stagnant again i'm kind of looking at this as we go so right now 12 of the population in albania is vaccinated so that you know that is that's all we know i don't think we need to go any further than that i think if you want to you can look at the max one um but you'll have to get rid of date and just keep the location um population etc because the date is going to throw everything off so if that's something you want to do absolutely do that um and you can use a temp table here we can look at how to do that really quickly i think um so that you guys know how to do that again i recommend throwing in one or two of these um like even up here you can do different different counts and then do one for each um so let's do temp table all right so it's going to be a lot of the same stuff we're going to keep this and this is going to be what we insert so let's say insert into and we need to write where we're inserting it into but let's say again i'm only doing this for it's going to be basically the same it's going to have the same effect but um with a temp table so we're going to do temp table and let's look at um let's say let's call percent population vaccinated and we need to specify our columns so let's go down here excuse me let's go down here and let's do the basically the exact same thing so continent i think i spelled that right i almost did i got really confident we'll do we and and just so you know for these we have to specify the data type as well um because we're basically creating like a genuine table it's just a temporary one so let's do and varchar 255 we'll do um location we'll do the same thing and varchar 255 we need to do date and we'll do that as date time we'll do population and we can do i mean there's lots of different ones we can do but we'll do numeric for this example there's new underscore vaccinations and let's do that one as numeric again you can use different things um and then we'll do rolling people vaccinated um this can be numeric as well um and then we need to insert that into here okay so we're inserting the data and then down here we can actually select it and let's let's take this and do right here except we're going to be doing this by this right here um it hasn't been created yet but it will be created in just a second okay so you let me see it yeah so these were the rows that were affected um and we and then we got our actual output from this right here now let's say you wanted to change something in here you're like oh you know i i don't want to do it wear this let me comment that out and then let me do this and um create that table again oh no we got an error oh how can we get around this very simple i've done this and i should do this in a different one you can do drop table if exists and then do this right here um and when we run this it should give us our output i highly recommend just adding this especially if you plan on making any alterations so that when you um run it multiple times you don't have to you know go and then delete the view or delete the temp table or drop temp table or you know it's just built in it's at the top it's easy to maintain and it looks good it's it's something that a lot of people do and so if you have that at the top of your query and somebody you know somebody who wants to hire you looks at this i think oh okay that makes sense i'm glad they included that they know what they're doing this guy's smart i should hire them um now what we're going to do is uh i feel like i've showed you as much as i can show you um with the limited data that we've looked at again i could have done this for six hours straight if i had used all the data at least i mean there's just so much data but let's create a view you know i'm only going to show you how to create one view but i want you to go back and create multiple views you know if this is one that you want to look at these global numbers um let's look at this one really quick if you want to look at this number right here toss it in a view i mean that one doesn't make sense to toss in a view but this one toss these numbers in a view um and we're we're going to look at it in tableau later but for right now let's just create our view um so like let's just say creating view to store data for later visualizations all right so let's say create view um and i wanted i'm just going to keep the same thing um like that um and for views it's so easy i mean i'm literally just going to and i can even take um the order by i believe we'll see if i'm correct um actually let's get rid of both of these things so this says create view percent percent populate oops percent population vaccinated um and let's see am i doing anything wrong here let me see the order by clause oh i was completely wrong i was wondering why i was getting that now let's try running it okay so it ran successfully um let's look at our views it's not going to be in there let's refresh it hey look we got our very first view we can open that up like a table if we want to um is it i mean it's gorgeous um if you want to get rid of that select or sorry control shift r that's a refresh um and now it it basically recognizes it well let's go back here for a second um and you know we can now query off of that it's a view now so you know it's it's something that you can it's permanent you know you have to go in and actually delete it's not like a temp table this is now permanent and this could be something that we now use for a visualization later so do some of these look at some of the queries that we've looked at and create a few of these views um and we will use them later um normally in a a normal setting uh if i was actually working i would put some of these in actual like i would call them like a work view or a work table or something set aside so that i can use them consistently um but i would also set them aside so that i could connect tableau to that view now we're going to be using something called tableau public that'll be in the very next tutorial unfortunately um let me see if i can show you i can't show you tableau public does not connect to sequel databases um and that's because it's free and i totally get it you have to pay for the upgraded version but i am not a a billionaire okay i cannot afford the real version of tableau i'm also not like a student or or like something where i can get it cheap so i'm not paying for that so we're going to use tableau public and and i recommend this anyways because anybody can access it it's it's free for anybody so we're going to be using tableau in the next one to actually visualize a lot of these things i want to get at least five visualizations we're going to create a dashboard it's going to be a beautiful beautiful thing all right so the very last thing that we are going to do is we are going to actually save this and then put it into github and i just want to show you how to do that that's where we're going to be storing our code at least for now um so let's go up here let's click file let's click save as um i've already have multiple versions of this let's just push v2 i'm going to save that so we have this save now i'm going to go over here i'm going to go to my github now if you don't have an account i highly recommend getting an account so you can start putting your portfolio projects in here of course we're not going to put our tableau one in here but our sequel ones and our python ones you can put in here again i'll talk a lot more about how we actually want to display this in github or other places but what we're going to do for this is we're going to create a new repository let's call this one work folio projects make it public we'll create the repository we'll do all that extra stuff later so what we now want to do is upload an existing file we'll click right there go to choose files and we'll click this latest one that we saved um we'll open it and we can always change the name of it later on and you can add notes if you'd like but we'll commit that change so we'll actually upload this this file but let's look at it really quick and i'm going to go back and i'm going to use the real one where has the formatting and and the notes that i have that i wanted to add in there but as you can see you know you can see all of the queries that we wrote and this is fantastic so if somebody comes in here you know we'll have more notes and kind of better comments on what they do um and what the takeaway is this from for a hiring manager to you know when they actually look at this so this is a really really good place to start again this may not be your optimal place to put this i'll give you a few different options in a later video about how we can actually potentially improve upon this i'm really looking forward to getting more portfolio projects done so we can actually start building a complete portfolio uh if you've stuck around all this way i just want to say congratulations i mean i know this was a long video i know that it took a long time but you stuck with me you put in the hard work and that is fantastic i really hope that it pays off and i hope that this has been helpful thank you for watching we'll have a lot more videos in the future on these portfolio projects and i'm i'm just really really looking forward to doing them to be honest so thank you for sticking with me thank you for watching i really appreciate it if you like this video be sure to like and subscribe below and i will see you in the next video