 Hello everyone and welcome to another episode of Code Emporium where we are going to talk about some sequel Now sequel is something you use in your day-to-day job as a data scientist and From my experience for the past two years as being a full-time data scientist I want to show you the kind of sequel that we actually use and how it's really different from all the Theoretical stuff that you would potentially learn in school, too A lot of the sequel that we use is definitely overlapping with a traditional education system But I just wanted to make sure that you all know exactly how we use sequel in the industry And so if you want to know that this is the video for you I wanted to make this video a little different to where we're gonna be looking at a data set And I'm gonna be giving you a few like actual questions that you would potentially be solving with sequel where you could just take a pen or paper and also like maybe open a notebook or something like that and Just type away the sequel answers to this I'll give you some like five seconds to do it where you can just pause the video and then after that We can look at the solutions together So that we cover all of our bases from like select statements to joins to where clause is to even more advanced concepts like windowing functions and then at the end of the video I'm gonna show you a couple of applications Mostly from like previous videos of like how all of these concepts were tying together to actually You know how we would actually use sequel in creating Let's say a machine learning data set or just in machine learning in general and with that let's get started Alright, so first right here. We have this huge monster of General syntax of a sequel statement that I wanted to type out I did this because a select statement is literally what you will be doing for 99% of your querying in sequel and And this entire syntax kind of has about everything that you'll even need From selects with a from and you'll have a couple of joins Where clause is dealing with aggregations with group by and having and then we have like an order by to order your data And then limit to get a number of specified rows So this is kind of the gist of all you need to know But we're gonna take a day to set and actually work with some problems because this is a little hard to look at so first of all, I have the CSV of some Indian takeout food and They have like information here that I'm gonna scroll down to where each row kind of looks like this So it's an order technically each row of corresponds to an order But it's an it's more like a product for every order So you'll see that one order will span multiple rows with you know How many of each item we're getting the price of the items and whatnot, right? And I am just doing some Pandas manipulation right here just to get that data out and up to us and also like creating this extra date field so that It's so that the data is in a particular format that just easy to work with now I'm kind of like modifying date for for you know tips here I'm just kind of modifying this date to look like You know the year followed by the month followed by the date because in sequel you can kind of treat the data as a string and When strings are ordered and if you sort the strings this will actually sort the strings by date. So 2017 04 to 3 will actually come before even 2019 04 to 1 even as a string and not just like in a typical time series date format Just a little tid been letting you know So this is kind of one of the tables that we have in another table that we have in our data set is like an item table and a price So this is definitely the price of an individual item whereas this price I actually do not know what it is and What we could do is kind of use sequel to figure it out So if this is the orders table and this is the products table Let's jump to our first question of can we verify if the orders dot price that is this one over here Represents the item price itself or if it represents the item times the quantity price So there are situations where this quantity is two and the price is given So does this price represent the price of this entire row like two quantities of plain rice for example If it was there or would it just be one no matter what this quantity value is How can we do that by joining this product orders table to this products table? So just take out a piece of paper or just open a top of your notebook and try to write a sequel query for this in the meantime Though enjoy my pointless banter. Have you liked this video? Please like Have you subscribed to my channel? Please subscribe. I Hope that was enough time for you to write down a sequel query and right now We shall go through some results so first of all I'm taking the orders table and doing a join on the products table by Taking that the products item field is equal to the order item field So this is a typical just a typical join condition right here and in that case I also use kind of this as function, which is basically used for aliasing and Product start price is now called product underscore price, which you see appended to this table here And so What we can see is that the orders are actually the same and I think that if I were to even Just refresh all of these columns and just do some sampling Let's try to do some sampling and see what happens here So you can see that the plane the plane bottom is we have four of them here Where the price in the orders table is 80 cents whereas the price here the products is also 80 cents Which means that the order dot price actually represents just the price of a single item That's good information to know when you're kind of digging around your data set because a lot of the time You know you don't have You just don't know there's no description of some of these fields So you might have to like figure it out yourself a little bit But it's good to know very simple query, right? It's still with us great Now let's go to select and where clauses which items cost more than $10 Now I'm gonna try to try to write a query for this I'll give you some time and in the end. Let me just actually give you the let me let me scroll up right here Yeah, so that you can see kind of the table that we're dealing with right now In the meantime, have you joined our discord server because it's poppin It's really cool, and I think it would be an even better place if you joined Yeah, all right, so I hope you all have written that query out and we are now going to check it out So we want distinct items from just the product from just the products table where We want to see if the price is greater than $10, which is probably the easiest way to do it You can also do it from the orders table too But like the products table is definitely more simplified and just easier to work with in this case and So I've just displayed like the top 10 products in terms of pricing and so that is lamb to kabiryani chicken to kabalti lamb to kabalti prawn bati paneer to kabalti chicken to kabiryani Persian lamb biryani prawn balti Persian chicken biryani lamb Persian and the list will go on All right. Now, you know how selects and wears work and also the joins Let's take it up a notch and deal with aggregations of data Where we're going to be doing like the sum of values or the count of values or the max or min of certain groups of values So let's try this question now. How many orders were placed daily in 2019? So in this case, I'm looking for more of a time series chart where every day I just want to aggregate and count the number of orders that occurred. So how do we do this? Don't worry. I'll give you some time. Are you ready? I hope you're ready Because right now Let's get into it So we have the select statement again and this time. Well, we have a date that's kind of given in Well, I've already kind of it was technically in a time stamp format with like hours minutes and seconds But I just wanted to convert it into a proper date format and since I'm just casting at his date here And then after that, we are going to count the total number of distinct order numbers that exist Right and we're doing this per group and the grouping here is every date So every date we are going to count the number of orders the distinct number of orders that occur Where of course the date is in 2019 only So because like we're dealing with these dates as strings now It actually kind of makes sense that we transformed it into this what year hyphen month hyphen day because even in a string format we could compare dates and That's how we would do aggregations here And if we were to plot this out in a chart over here, it would look something like this Looks really beautiful. Looks really neat and tidy And this is I guess kind of like how you would kind of structure your data even in a time series problem For example, if you were dealing with it foreshadowing of what's to come Now apart from this and aggregations, we also use something called case statements which are used for conditionals kind of like if else or if and then Situations so let's say right now I want to add another column where we are going to categorize data based on sales So if every date will compute the sales, right? And if the sale date has over 30 sales, then we'll say it's a high yield day Otherwise if it's between 10 30, it's a medium yield day and other than that if it's lower than 10 It's a low yield day. So can you add an extra column to reflect this categorization with sequel? We'll give you a minute Wow one minute passed so fast. That's so cool. Wow time just flies when you're having sequel fun So first of all, this is also introduces the concept of nested queries where I'm literally copying the query that we took before to get the daily order counts and I'm calling it T. This is essentially an alias in sequel and then from this table T We are going to get the date the total number of orders and then we're gonna also add our Categorization so when the number of orders is greater than 30, it's high if it's less than 10. It's low Otherwise, it's a medium yield day and when you execute this query you get this results where 22 is Definitely between 10 and 30. It's medium 9 less than 10 is low So it works out pretty well. I hope you're keeping up and I hope this is fun Now next is probably one of the most important concepts that is not taught in Traditional education, but we use all the time at in like actual work in the industry and it's common table expressions which is basically a better way to manage your queries and Essentially, like if we took this previous query right here It can be a little hard to see in some of these nested queries and in many situations you might have like nested queries within nested queries which become very difficult to manage and If we wanted to convert this into using common table expressions because it creates a logical component It's easier to understand. We can do that So I would just take this chunk of code over here and then we call it daily orders So it's essentially taking that same chunk of code and let's just call it daily order So it's a nice little logical partition there And then we can treat this as if we treated any other table where we take daily orders Which is over here and then we just write the query as we did before So in my mind, this is a lot easier to read than something like this Especially when the conditions get get a little more tricky as they would in in the future Right Okay, cool. Now, let's go on to an advanced concept, which is windowing functions So let's say that we have this particular question right now What are the top three most expensive orders? Every single day. So every day, I just want to get the top three orders in terms of pricing, right? How do we do that? A little hint here is that you can break the problem down into three steps Where first you would get like the total price of all orders every single day And then you would start to rank those prices and then you would only filter for the top three. So We're gonna walk through the step-by-step situation, but after you take some time to think about it All right, hope we're done here. So let's now talk about this problem The first of all is get the total price of all orders every day Which is kind of what we have done But in this case like for every single order, we are going in every single order on every single date We are going to determine the total price So we're using an aggregation right here and we're grouping by of course like the first two two columns here So what this will lead to is well, you have the date with an order number and the total price of all items in that order Now what you want to do is rank those orders from most expensive to least expensive And essentially you can do that with well The this is like the main thing that we're adding here. It's it's this is a window This is a window aggregation function where basically we can just start ordering We can order these rows or results just in any way we want. So in this case, I'm partitioning by date So for every single date, I am going to be creating an ordered list Where the total price or the height where the higher price is on top and lower one is on bottom And that's kind of exactly what we're doing here if we just look here So it's kind of sorted with the rankings in just that order and it restarts for the second date Right so cool We got a ranking going on now for every order every day and now we just need to filter for the top three And that's exactly what we do in this third chunk where we're just saying where ranking is less than or equal to three and And everything else kind of remains the same and so you'll get a list that kind of looks like this where Every single date will have three records and the record will be correspond to the most expensive food items on most expensive orders on that day the top three at least and So I hope this kind of helps you understand like intuitively now. We've we've builds like how Joins aggregations case statements and windowing functions can all come together to make reasonably complex queries with pretty cool decisions Now we're gonna jump into a couple of applications of like how this could be used in machine learning so Specifically like I have two previous videos One of them is on time series forecasting as a regression problem And then also fraud detection as a classification problem and we can look at like how sequel works with both of them So this time series forecasting is actually a part of a previous video That I kind of I made a video on so if you want to check that on you're interested in time series forecasting be sure to check it out in this case I I have like I created the same kind of time series data set because it's it is using the same data set So I created the same time series forecasting over here But I wanted to model this as a regression problem and this is the kind of rows This is kind of what I wanted here So I created two features which is like what is the total orders that happened the last week as well as the last month and Try to determine the total number of orders that will happen in the next week, right? And for that I created this data set So you have features you have labels and it's created in a data set called queries slash Query slash base dot sequel so we can actually go to that And when we do you'll see all of the concepts that we just talked about are kind of embedded in here So we have CTE's over here. We have a CT to get distinct orders then total number of orders every week We have that now this by the way is a strange all of this is kind of like a strange query You can say accent or dialect, which is known as sequelite There are many different variations of sequel that you could try You know programming the stuff in for example, like my sequel postgres Sequelite snowflake, there's a ton out there. Just choose one whatever you're comfortable with and just roll with that Right now here is to get like the first feature for seven days Recent orders in the last 30 days and then I'm getting total number of orders in the next coming week Right and I'm using a bunch of joins over here to Accommodate for all of them and just join them together and create the data set that I showed you before So essentially we're just using all of the same fundamental concepts that I just showed you right now in Order to you know use this for machine learning so very useful at work And it's actually not much that you would need to know Now the next one is fraud detection for classification problems. So let's just go through that real quick right now So in this case, let's go fighting fraud that I pine bee And in this case, I modeled fraud as a classification problem where We have essentially a label which would be like hey does this is this a fraudulent transaction Zero or one. It's a binary label And then we have a slew of like all these all these features that we're using and putting into our model, right? Now, how did we create this? Well, let's go to the query slash base dot sequel file As as we always do Let's go here And voila, we have a very similar format that we saw in the time series the time series query as well Where we first start with well just getting the basic stuff for the transaction that we need Then this is a CTE for just getting past orders This is just the same thing for like looking at the past how many fraudulent orders are made in the past by this particular user And then we do the same thing for merchants. I'm creating two CTEs for past merchant sales and past Merchant sales that happen to be fraudulent and then we just join all those CTEs together in this way, right? And essentially we're using the exact same concepts including here We're also using case ones, but I guess what's extra here is this cast and coalesce So casting is used to basically turn You know from one data type to another in this case I wanted to make sure that the location the category all of these fields are strings or variable character sequences and then coalesce It's a it's a fun a fun little function that says hey if this thing is null like number of past transactions null Then that means by default set it to true Right so overall my entire point of like even showing this right now was that we could use the fundamental SQL ideas that I just talked about in the beginning of this video for even more Complex, you know creating machine learning data sets for training altogether And that's pretty much all the sequel. You'll this is a good amount of the sequel that you'll really need Now I have gone into detail for both the time series and the fraud detection in separate videos That I encourage you to really check out if you want to know more about these queries I've explained it in detail Also, I'm planning to see if I could make a course that's more Application-oriented where we'll even do a deep dive in certain questions and try to tackle sequel from a very very Practical standpoint if you're interested in that do let me know in the comments down below and that's all I have for now So, please do join. Please do first like this video. Please subscribe if you like content like this We also have a discord server going on like I mentioned before so I would love for you to join them Let's have some happy discussions over there and that's all I have for you now So thank you all so much, and I will see you all in the next one Bye