 What's going on everybody welcome back to another video. Today we are going to be solving hard SQL technical interview questions. These hard interview questions are something that you would get in kind of a medium or a senior level data analyst position. The easy and the medium are more towards the entry level or slash mid level somewhere in that range. The hard ones are not something that you're going to get in the kind of more entry level range. These are questions that you might see in kind of a more advanced SQL technical interview. I've been on the interview sign where I've interviewed for a ton of data analyst positions. I was also a hiring manager and then even before that I was on a hiring team where we conducted a ton of SQL technical interviews. And so the questions that we're going to look at today are very very similar to ones that I have seen in the real world or even given myself to interviewees. With that being said let's jump on my screen and take a look. All right so we're here on analystbuilder.com. We're going to go over here to the questions tab. We're going to filter to the free ones and then we'll go to the hard ones. Now there are a lot more hard ones here on analystbuilder but under the free tab we only have three. Looks like I didn't get this one right but we have that one today. Well so we'll see if I get this one right today. You can go try out these questions completely for free and we're going to be taking a look at temperature fluctuations and Kelly's third purchase. Now there's another one called cake versus pie which I think may be the hardest of these three. So you might want to go ahead and try to take that one and see if you can get it. Now this is Kelly's third purchase. We'll start with that one and then we'll do temperature fluctuations. We'll see how quickly I can do these two because these are hard but I think we can do it. So let's look at Kelly's third purchase. It says at Kelly's ice cream shop Kelly gives a 33% discount on each customer's third purchase. Write a query to select the third transaction for each customer that received that discount. Output the customer ID transaction ID amount and the amount after the discount as discounted amount. Order the output on customer ID in ascending order. Note transaction IDs occur sequentially. The lowest transaction ID is the earliest ID. Now that's really important we'll have to remember that. Now before we jump in anything let's um let's go look at the data but then let's start making some notes. So we have a customer ID. We have the transaction ID and the amount that they spent. Now this is the amount that eventually we'll need to use to calculate the end amount that they paid with the discount. So they get 33% off whatever this number is on their third purchase if that if you're tracking that. So what we need to do and let's start making some notes. One we're going to need to apply a discount. So that's going to be 33%. We have to identify though the person's third purchase. So when they come in three times on that third purchase they then get to get that discount. So how can we do that? Well I'm almost certain just looking at this data because we have 1001 for a customer ID, 1001, 1001. What we can do is we need to order this transaction ID and then give it some type of rank. Now because each transaction ID should be unique and we'll double check that but it should be unique we should just be able to use row number but we could also use rank or dense rank um but they should get the exact same output for each of them. It shouldn't matter. So I think using just row number um and then filtering when it equals three. So we're going to apply a row number based off the customer ID and the transaction ID and then for each customer we'll give it a row number and then when it's three which is the third transaction that's the one we give the discount to. In our output let's look at what our output is going to be. Our output is going to be let's take a look. Select the third transaction output, customer ID, transaction ID, amount. So all columns, all columns with and I'll just copy this discounted amount. So really everything with just that new column and then we need to order by the customer underscore ID. So we got a lot to do uh this this definitely doesn't look like of course is a difficult question it's a hard one but it doesn't look like a super straightforward one. So the first thing that we need to do is we have to identify this row number because we cannot apply the discount until we know which data to apply it to. The output in the order by will come at the very end. So let's look at this let's do um let's do a comma here we'll come down here let's do row number now this is a window function it's it you know if you haven't used these before you haven't taken like my full course and and you know work through these things row number uh is a window function that's going to apply to a window or kind of like a group by is is what I compare it to. When you group by all of those customer IDs with 1001 are going to be grouped into one row with uh a window function they aren't going to be grouped into one row they'll just be in a window where you'll see each row individually and you can apply something to each row instead of grouping it and aggregating the data. So it's really unique and really useful. So we're going to do row number and what we need to do this is over the transaction ID and we need to order by order by transaction ID and that's going to be ascending. So the earliest one it says lowest transaction ID is the earliest so we need to start with the earliest then go to the highest and pick the third one. So let's just run this and I need to do this over I said row number I didn't write that right at all so we'll do over and then we write it. So we're doing we're applying this row number the over is the keyword that we use to specify that this is what we are doing it on and so now we have this and we can't just do this because it's applying the row number appropriately based off of only the transaction IDs from from lowest to highest. Here's the thing though we have to do it per each customer so it's each customer's third so we have to use partition by before the order by. Now partition by is going to separate it out by the customer ID it's kind of that's kind of like the grouping part and so we'll use partition by customer ID and why did I copy that by customer ID and now let's run this and now when we come down it should say 1001 1001 and notice that we have this row number applying at the customer ID level and then when it gets to the last customer ID that goes to the next one it restarts so now this is that third person's transaction 1001 and then in 1002 this is the third transaction. Now here's the tricky part about trying to then use this row number is I cannot come down here and say where and let's label this we'll say as row underscore num let's run that uh whoops because I have this as blank let's comment that out real quick so I have this row num but I cannot say where row num is equal to three let's try it so it's going to say unknown column it doesn't understand that that's a column and you may be thinking well you know in aggregations with group by you can use the having statement well let's try the having and let's run this it says the window function is allowed only in the select list and order by clause we cannot use in the having so what we need to do is we need to actually make this as uh its own little output is what I'll say now we can do that in two different ways we can use a cte and we can use comment table expression to kind of store this data down here how it is and then we can query off it later or we can put it in a subquery or if you know we want to get really advanced and we're using um actual mysql database we could use something like a temporary table or a view or something we could do other things but for here let's wrap all of this in a um let me come right here let's wrap all this in a subquery and when you have uh a subquery in a from statement you have to label it so you have to give it a name so we're just going to call as row numbers and let's select everything and what this is doing is we're selecting everything from this data right down here this table that we've essentially created so what we're going to do is we're going to select everything now what we need to do is we need to say where row underscore num is equal to three and let's run this so now we have each person's row num three this is the third person's transaction now this is really good so what we need to do next is we need to then calculate this amount so it gets a 33 discount now so let's select the columns that we actually want in our output we need customer underscore id we need transaction underscore id we need the amount now we need to calculate the discounted amount remember we have to label this last one um we'll say as discounted amount so this next column is going to be this calculation now we have to give a 33 discount so we can't say amount times let me bring this down like this we cannot say amount times 0.33 let's run this and let me see I just spelled transaction id wrong transaction id that's it always gets me um this is actually a this is 33 of this number that's what this is now 33 of this number is not a 33 discount we're giving them a 67 discount what we want is 67 percent of the amount let's run this this right here is 33 off the total amount it's a discount so instead of paying 94 this person only had to pay 62.98 now the last thing we need to do the very last thing is order by customer id and it looks like it already is but i'm gonna do it anyways we'll do order by customer id ascending and let's run this this should be our final output and you know it took a little bit of work to get there we had to use this subquery but i'm pretty sure this is right let's go ahead and check this answer and there we go our solution is correct now remember there's other ways to write this there isn't just one way this is kind of the difficult part about hard interviews or like senior level data analyst interviews for for sql um technical interviews the difficult thing is there's not only one way to answer it and so it starts getting down to okay what's the best way to solve it walk through your thought process so everything that i just did where i walked through and i said okay i could use any of these but row number makes the most sense for this data understanding the difference between those and why i'm choosing one over the other um is really helpful for the interviewer to understand engage kind of your skill level that's why i recommend you write it out well but also talk about it the thought process is kind of the most important part now if you tried this question you could not get it or you couldn't solve it you can always get a hint you can take a look at the expected output or you can go up to the video explanation where i walk through this entire question or just go look at the solution and let's see if i wrote it the same way well i called it rn for row number but this is essentially the same although i wrote out the column names it's essentially the same but you could have done a cte uh with this as well but that is how you would solve this kelly's third purchase i will leave a link in the description if you want to try that one out now let's go up here let's go to temperature fluctuations so this question says write a query to find all dates with higher temperatures compared to the previous dates yesterday order dates in ascending order okay let's look at the data so we have our date over here and the temperature so it looks like for example this one this is the second of january this temperature was 70 the previous days was 65 so we want to identify this date and i think it's just the date to find all the dates with higher temperatures it looks like our output is just going to be the dates and i'll write that real quick output just date column now how are we going to do this how are we going to compare this uh initially there are two things that i think we could do one we could use a window function we could use a lag uh a lag function on this which would look at the previous rows data so if we ordered on the date which it already looks like it's ordered we can use the lag function to look at the previous value so here is 70 then we use the lag function that would pull over 65 over here that would perfectly find a way to do it the other way we could do it is we could do a self join so we could tie the table to itself but instead of doing it where the date is equal to the date we say the date minus one that's another way that we could solve this so you can go ahead and try whichever way you would like to i think i prefer the self join um i just think the last one we did a row number which is a window function so i don't want to do another window function right although you could solve this with a window function um i'm going to try a self join so let's pull this over um i think i'm going to do a self join but on the previous day where there's a one day difference so where the day is one day off is what i'll say um then we can use that to compare so use the temperatures to say where one is higher than the other and that should make more sense uh in just a second we start writing it out but now we also need to order by dates ascending that's it so we have our data down here and in order to do a self join we're just going to say um i'm going to say inner join but we can do if there's a different type of join you want to do you can also do that but i'm going to say on temperatures and we need to label these differently so we'll do t1 for temperatures one and then t2 so now it's like we have this table over here which is temperatures we have this table over here this temperatures and we're going to join them together well now what are we going to oh let's do t2 now what are we joining these together on um we're going to be joining this on the dates now there's a few different ways that we can write this but there is a function called date diff where we can take one date and compare it to a different date and make sure it's one day different so let's go ahead and take a look at that let's do um a join on and we'll do date diff and then we'll do t1 dot date and it's auto-populating it for us but date diff comma t2 dot and then we'll say date right there and it should be a one day difference let's try this let's run this and the reason why it's not pulling up is because we have all the same uh column names now when it has the same column name it's just showing up as just one one is overlaying the other so we're going to do is t1 dot date comma t1 dot temperature and let's get rid of that then we're going to label these other ones different so we'll do t2 dot date as date 2 and then we'll copy this and we'll do t2 dot temperature as temperature 2 now let's run this and let's see what happens so we have uh this date compared to the previous date this date compared to the previous date so three versus two and let's keep going four to three five to four six to five and so every single date has the previous date now what we can do is we can compare this temperature to this temperature now we can do that in a where statement or we could just do it in the join we can make it a conditional uh part of the condition in the join maybe I'll write out both but let's say and we'll do t1 dot uh t1 dot temperature is greater than t2 dot temperature so let's run this so now we have 70 compared to 65 and it looks like the other one wasn't as high so the third day is gone 58 compared to 55 90 compared to 58 82 compared to 70 88 compared to 82 so these are all of our dates right here in this column that this temperature was higher than the previous day's temperature and so what we should be able to do is just get rid of all of these columns and just take the date and let's run this and there we go and all we have to do now is order by and I think it's already correct but I'm just gonna I always like to write it out if it's asking us to do it take do this in ascending it's let's run this you know in ascending order I didn't write that oh yeah here I did I wrote it right here so now this looks correct to me let's go ahead and check our answer and there we go our solution is correct now again there's multiple different ways to solve this genuinely off the top of my head I could think of two probably another one another third option um just off the top of my head because I've been using my sequel for a while walking through that in your interview saying I think I could do it in this way or this way but here's why I'm choosing this way that really tells an interviewer this guy knows what he's talking about or girl this person knows what they're talking about they understand it they get it I can trust that this person will know how to do the work that we're going to give them if we hire them you want to give them a lot of confidence that's all I'm going to say now if you don't know how to do this or you've never done something like this before that's what this platform is for um so that when you get into those interviews or you know you want to learn and go take a course when you get into those interviews you can confidently say I know this skill uh and so if you had trouble with that one you can always go to hints you can always go to the expected output video solution solution um let's see how I solved it here I solved it the exact same but I called it I could have solved it a different way I think the lag function would have done just as well uh it may have been simpler actually so this is the one that I used but honestly the lag function in a window function may even be better so we solved this Kelly's third purchase we solved this temperature fluctuations I will leave links in the description for both of those go ahead and try those out yourself and again even back in the question section there's this cake versus pie uh which is probably the most difficult of the hard ones under the free tier now if you go back under the you know there's a where you can pay for a subscription under those there's like 20 hard questions and they're all very unique very different focusing on data cleaning window functions uh different types of joins and they're all really unique uh and fun to do but this cake versus pie one is really interesting I want um I want you guys to go try this one I'll leave this one in the description as well there's really an interesting difficult question so those were our two hard sequel interview questions uh they were pretty challenging you know window function and then self join two things that are a little bit more complex and you'll see an easy and medium questions in the next lesson we'll be solving a very hard question so if you have not check out analystbuilder.com it's one of the best platform for data analysts I created all the content on there all the courses all the questions and we have so much more coming to the platform if you like this video be sure to like and subscribe below and I'll see you in the next video