 Hey, this is Andy. I botched the recording for the second lecture in person on campus last week. And so rather than try to me auto double the audio and recreate things, I'm actually going to show a the 2020 lecture that we recorded during the pandemic where I've shipped out all the student names and everything. So I apologize, but the 2020 lecture should cover most of the things that we discuss in this class here. So the original titles lecture on the schedule was advanced sequel. And every year sort of bothers me because I feel like I'm lying to you saying that it's advanced sequel. So I renamed it to intermediate sequel. So again, I'm assuming that that, you know, that everyone has seen some basic sequel in their life. Like the languages is what 40 something years old now. So I just want to spend the time today going on, going a little bit deeper into some more interesting things you can do with sequel. Advanced sequel would be things like lateral joins and other stuff like we'll get into that maybe later on, but not today. So wherever you were all left off last class and click, okay, where we love the last class, we were talking about the relational model and relational algebra. And we talked about how that the whole purpose of the relational model and what Tedcott proposed is that instead of us having to write low level, you know, like, you know, instructions to tell the database system how we wanted to execute our query, we instead provided a high level instruction and say, this is the answer that we want you to compute. And it was up to the database system to figure out how to, you know, what was the best way to actually execute that query to produce the result that the user wants. So again, the main takeaway, what we're sort of focusing on is that we're allowing the user to specify the answer that they want, but not necessarily how to compute it. And that sort of seems like an obvious thing now, or it's a different way to think about programming, but this frees up the database system to do whatever it wants to do in terms of executing your query, the best way that I think is possible. And this can go all the way down from the storage stuff, which we'll talk about next week, to how you actually want to execute the algorithms for the joins or whatever else you're doing. Like, all of that is abstracted away and left to the implementation, which is a very powerful concept. So how the database system is going to generate this query plan is a complex process called query optimization, which we will talk about a few weeks from now before the midterm. And this is like the black order database system, that's the hardest part. I say this every year, if you know how to do query optimization, we're putting query optimization here at Carnegie Mellon, that this is something you're interested in when you get involved in. Like, I can get you a job immediately because everyone has this problem, everyone is putting a database system has this problem, and they can't hire people fast enough for it. So the high end systems, so think of like Oracle and SQL Server and DB2, but even to like, you know, the sort of the open source ones like Postgres and MySQL, they're going to have sophisticated query optimizers that can do query rewriting and basically execute a search algorithm to try to figure out what's the optimal ordering for the operators in your query plan. Again, that's not our purpose today. I'll show an example of what Postgres can do, but this will come up later on when we talk about query optimization. But underneath the colors, think about this is what the system is actually doing. So the history of SQL goes back to the 1970s. So SQL was developed by IBM, these two guys Chamberlain and Boyce, because they were developing the query language used for this database system that IBM was building called System R. So again, the history of relational databases is in the 1970s, again, Ted Codd proposed the relational model, but he was a mathematician, but he wasn't actually going to write the code. So some IBM researchers in San Jose all got in a room, took Ted Codd's paper and actually try to build the first implementation of a relational database system. So System R was the first one that came out of this, out of this work. There was another IBM system that's less famous, that actually was in England, but I think actually might predate System R. And the thing about it has like this weird experimental band name, so nobody thinks it's a database system, it's called the Peter Lee relational vehicle, which again, doesn't sound like a database system, but that was another famous, or sort of not famous, but another relational database that IBM was building, that one was in England, that didn't go anywhere, System R is the more famous one. So the first language that these two guys wrote before they wrote SQL was this thing called Square, which is the specifying queries as relational expressions. And that language was super complex and difficult to write. So then the second one they wrote was SQL. Now, if you notice that back in the day, it was spelled out S-E-Q-U-E-L, because it was the structured English query language. IBM got sued by some other British company for trademark infringement because there was another language for some manufacturing company that called SQL, so they shortened it to be SQL. So, but this is why people like me or a lot of people still don't refer to it as SQL, we say the full word SQL because it comes from what its original name was. So, IBM put this out in the 1970s, the other sort of famous relational database system being built at the time was Ingress at Berkeley, they had their own query language called Quell, this was developed by Mike Stoenberger, the inventor of Postgres, and so again that's SQL to play on words because it's the SQL to Quell. So that's also where the name comes from. So, IBM did this in System R, they never commercialized System R, but then they had a bunch of offshoots of it, System 38, SQL DS and DB2. DB2 is the most famous one of all of this, I don't know whether these guys are still around, but DB2 is still alive and kicking it and making them a lot of money. And so, IBM was the juggernaut software company back in the day, so whatever IBM did, people just sort of adopted the same thing and assumed that would be the standard, certainly not that way now, but Oracle was in 1970s, they were copying everything that IBM was doing, like the founder of Oracle, Larry Ellison would literally call on the phone to IBM, the guys building System R and say like, hey, if you put a bad query in, what error code do you get back? And he would end up copying whatever they told him on the phone, right? Because, I mean, anyway, that's a side comment. But the point I was trying to make is like, the reason why Oracle is the juggernaut now and not ingress, even though they were sort of both at the same time, is that IBM put out SQL with DB2, and that's sort of what everyone adopted, and Oracle was at the right place at the right time supporting what IBM's language was. So then it became an anti-standard in 1986 and an international standard in 1987. And then again, that's when they shortened the name just to be structured query language. So even though SQL is from 1974, it's certainly not a dead language, right? There's updates for it all the time periodically. The latest version is SQL 2016. And you can see these are sort of the major versions in the last two decades, and you just see all these new features getting added to the language. Now the core structure of SQL is always the same, like the selects and the leads, insert and updates. But now there's a new functionality that you can use within those basic query constructs. And so the JSON stuff, for example, that sort of common now in no SQL systems, and that's a good example of where the caretakers of the SQL language are adopting their new technologies or new concepts or new concepts that help people program database applications and keeping SQL up to date with these things. So the minimum support you need or SQL syntax you need to say that you can publicly announce and say that you support SQL is defined in the SQL 92 standard. So again, this would be basic things like the selects, inserts, updates, deletes, to create table statements, the drop table statements, transactions, as if you're building a database system today and you say I support SQL, the minimum you need is this. And this link here will take you to the original spec of the SQL 92 spec, like the text file. Okay, so one thing to also point out too is although there is a standard, and as we'll see this today when we start doing demonstrations, the no one database system adheres perfectly to the standard. And this is partly because the consortium or the governing body for the SQL standard is like a bunch of people from various database companies. And so they show up representing their database company trying to get their new features in that their database system supports and try to force everyone to adopt the same thing. And so I would say in terms of who follows the standard of the best, I'd probably say Oracle is the best and then maybe actually Postgres after that. But like even though there's a standard not everyone's gonna have all these features and for the features that they do have, the syntax will be slightly different. All right, so SQL is a, the specification is a combination of these three things. So the data manipulation language, that's the commands you use to either retrieve data or modify data in the database. Then there's the DDL where you specify what the data actually looks like. And this goes beyond just tables, like you can specify indexes, you can specify namespaces, triggers, functions and a whole bunch of things by declaring the metadata for your database. And then there's the DCL, the data control language. This is a specification for like security ACLs and other things like that. Like, who can read what data? There's a bunch of other things in this as well that we'll sort of cover out throughout the semester. Like you define views. I would say that's part of the DDL, integrity and referential constraints and specifying what values, what tuples can take on in tables. And then transactions are gonna be a really important concept. That's gonna specify how we can do a bunch of operations atomically in our database. So another important thing to point out too as well is that SQL is gonna be based on bag algebra, not set algebra. So in the last class, when we talked about the relation algebra, that was entirely based on sets. So there's sort of three basic sort of data collections you can have, right? You can have a list, right? That's where you can have duplications and there's a defined order. Like I can append things at the end of the list. You can have sets where there is no order, but you cannot have and you can't have duplicates, right? If I insert one key, if I try to insert it again, I have to fail because the key already exists. Bags don't have orders like sets, but they can allow duplicates like lists. And so the reason why we're gonna do this in our database system is because it's gonna make a lot of query processing or sort of operators and algorithms you wanna execute. It's gonna make them way more efficient if we can allow for duplicates, right? And we'll see this when we start doing group buys and things like that. There's ways to enforce the removal of duplicates, but you're telling the data, actual work you wanna do. Like so by not having to worry about duplicates as you process queries, we, it'll make our life a lot easier. Again, this'll make more sense as we go forward. All right, so for today's agenda, we're gonna start off with aggregations of group buys. Then we're gonna do a quick crash course on string, date, and time operations. We'll look at alpha control, how can you specify how much data you wanna get out of a query and where to actually put it. And then we'll focus on at the end what you'll need for the first homework assignment doing nested queries, CTEs, and window functions. And again, please interrupt if you have a question and I'll clarify things, okay? So for today's, all of today's examples, for SQL, we're gonna use the following database that's modeling a really simple college, right? We have three tables, we have students, we have courses, and then students can be enrolled in courses. So there's a foreign key of reference from the student ID here and the course ID here. Again, this'll be what we use as our running example. So the first thing to discuss is aggregations. So the way you think about aggregation is it's like a function where you're gonna take a bag of tuples that you're in your query and then you're going to collapse them down into a single scalar value, right? So like, say a simple thing like account. I wanna count the number of tuples that aggregation would take a bag of tuples, count each one by one by one, and then produce a single scalar value that tells me the number of tuples that I have. So these are the basic five aggregations you get in the SQL 92 standard, average min, max, sum, count. You can do other things like standard deviation, geometric means, like you can go beyond these, but like the basic concept is the same and different database systems will have support for different types of aggregations. And some systems will even let you define a specialized aggregation function, but it'll work the same way. So let's say that in our sample database, we wanna count the number of students that have the at CS login, right? So all I need to do is my query, I have my from on the student table, my where clause has a like, which I'll explain what that is in a second. Just think of this, just matching like a wild card, like a regular expression. And then we have our aggregation function says count the number of login fields, right? So, right, this is pretty simple, but there's one thing to point out here is like inside my aggregation for a count, I'm asking to count the login field, but I don't actually need to do that, like it doesn't really do anything. So I'm not actually looking at the values of the login attribute. All I'm doing is just seeing does this exist for this tuple. So you can actually replace the contents of inside the count function with a star that though in SQL, that's a wild card that says can be all the attributes. So again, there's no semantic difference between this approach. You can even simplify this further, it just put one, right? So this is saying count the number of ones that appear for every single tuple that matches this. So it is just adding up all the ones. Again, these are all produced the same answer. In general, I say for this, the database system will be smart enough to recognize that like I don't need to materialize all the tuples if I have a star and it can just rewrite it as this. So you're passing less data from one operator to the next. You can do multiple aggregates in the same query. So here we're gonna do average GPA and so count the number of students that have the at CS login and we wanna compute their average GPA, right? And so you do this and we get an answer like this, right? You can also add a distinct clause for the count sum and average, right? And the idea here is you wanna count the number of distinct values for given attribute for all the tuples that you're looking at. So in this case here, I'm again getting all the students that have at CS and I'm counting the number of distinct login handles that they have, right? And it reduces it down to a single scalar like this. Now this example is kind of stupid because it's like you would hope that everyone has a unique log in. So you wouldn't necessarily need to have this. And so the database of them could be smarter to recognize that this thing has to be unique because I declared it in my schema and then not doing anything special for the distinct. So there's a small question here. What is the best style to put as count arc? I did not understand myself very much. I think he's asking this. Is the question, what is the best argument to put in here? Can the students read? Can the student unmute and tell you? Yeah. Yeah. Sorry, yes, that's it. That's the question. Yeah, like which, is there a preferred style in the industry? So the question, okay, good. Actually maybe there's two questions. First question is, is there a preferred style in industry? No. You're gonna find this in the real world that there's like, like there's like linters and formatters for like C++ and Java and any programming language you went out there. There are those for SQL, but like as far as, people don't really enforce that very often, right? I think the better question though is what is actually would be more efficient? And again, I think the database system will be smart enough to recognize that like all of these are semantically the same. So when it's computing the aggregation, the way you would compute aggregation is you would basically build a hash table internally and just update this count for everything that matches here. Or actually in this case, it's just a running counter. You don't need to get the hash table. So for this one, I don't think there would actually be any performance difference. We can try this afterwards, but I don't think that I don't think it'll matter. So there's one small question that what is the difference between like and equal to? All right, this will come to you later. So let's hold off on that for now. We'll talk about string matching later on. Equals is like something has to exactly match. Like is like a wild card, like the little percent sign is, again, this is like SQL syntax. The percent sign is like the dot star in regular expressions, it's the wild card. Again, we'll come to that in a few more slides. Okay, so let's say now we wanna do, we wanna get the average GPA of the students enrolled in each course and we wanna know what that course was, right? So if we try to run this query like this, where we just have now the course ID as the output list for our select statement, depending on what database system you use, this may or may not work. In some systems, you will get an error because you can't actually specify a column like this, right? That actually, this would work, but it's unclear what you actually will get because I'm getting the average GPA, but I have no way of like specifying what course is actually being taken here. So what'll show up here will be completely random and we can try this in a bunch of different systems if you want. So what we really wanna be doing is using group buys, where now what can happen is we specify at the end of our select statement, group buy on a course ID, and now we can specify the course ID up in our select statement. So what this will do is this will bucket up every single course by their course ID and then compute the average of the GPAs for everybody within that course, right? So you take all like this, you would take the course ID, you would bucket them up, and then for each of those, you would then compute the GPA from that, right? So again, this is allowing you to basically divide up the data based on one or more parameters. I have one here, I could have a common and separated list of a bunch of other columns, and then within that group, it then computes the aggregation for all the tuples that match that, right? And it goes like that. Okay, so the point I was trying to make before though is like, if you now have a group buy clause, but then you don't specify, then you have columns in here that are not specified in the group buy clause, like we're trying to get the student name here. In some systems, this will work, but you'll get a weird answer or you get something unexpected or it's random, right? In other systems, if they follow the SQL standard a bit better, they will say you can't do this. And so the only way you can actually run those queries is if you put the student name down here. Now, for this example, it's stupid. It's not a good, this is sort of a useless query because this is grouping by the course ID and getting the student name, right? But every student can only take a course once, at least in our database. So the average GPA of the, based on the course ID and the student will just be that single student's GPA. But the point I'm trying to make here is again, you can't have anything up here and let this also be included in the group buy down there. All right, so if you now wanna start filtering based on the output of an aggregation, you can't do that in the where clause. Let's say in this case here, I wanna do that aggregation by the course, the course ID and get the average student GPA within every single course, but I only wanna show the courses where the average GPA is greater than 3.9. So you would think in this case here, I could put the average GPA greater than 3.9, right? Because I'm aliasing the average GPA to be called at average underscore GPA. So I'm trying to reference this column, the column that's generated here inside my where clause. But you can't do that because again, just thinking about how the query is actually being executed, by the time you're processing the tuples one by one, you don't know what the average GPA is, because you haven't seen all the tuples. So there's no way to reference this thing up here, this thing, the thing that's computed up here down in here, because you haven't computed it yet. So the way to get around this is through the having clause where you can now, you can specify after an aggregation what the output should be. I don't know, I shouldn't actually show this. So this doesn't work in a bunch of database systems, but this doesn't even work, you can't even reference it. What you still actually need to do is actually re-compute the GPA down here. Now, I don't know why they don't let you do this because what's gonna happen is it's not actually going to compute this twice, even though you're specifying twice. The database system is gonna be smart to recognize, oh, well this average GPA and I'm computing here, well that's the same thing and then I'm computing up here, so I can just compute that once and then do my filtering there. Some database systems will let you do this, some systems won't. I think my SQL will let you do this, Postgres will not and SQL server will not and we can try this later, SQL might let you. But in the main takeaway here is that like the, this is a way to do filtering after you've done aggregations. You have to put it in the having clause, you can't put it in the where clause, right? And this is just showing what the output would be. Okay, so that's aggregations. Any questions about that before we proceed on to the string functions? So now we're gonna start getting into the part of SQL, as I said at the beginning where the dialects of these various database systems are going to differ and it's gonna be frustrating because you would think, okay, SQL's a standard, like I can take my SQL written from one database system, I should be able to plop it over and use it on another SQL database. It has to be no, and we'll see why this is gonna be hard to do. So the first thing I'm gonna deal with is string operations. So the SQL standard, the SQL 9.2 standard specifies that strings have to be case sensitive, and then we will reference them using single quotation marks. But as you can see here, everyone is gonna differ wildly. And as I said, Postgres and Oracle, actually DB2, they follow this pretty well. They follow the standard. My SQL and SQLite are variable. This one drives me the crazy the most, that fact that the strings are case and sensitive, right? So if I'm trying to take somebody's name, I can run the string function opera to make all the letters uppercase, and I have to do that in order to match anything else, right? Like if I wanna put anything else uppercase. In my SQL, you don't have to do that. You can have whatever case you want for the letters within the string, and it'll still match. I haven't checked to see whether that still happens in the new version. My SQL, well, we can try that right now if we want. The other thing too that also burns me is, I started, my first database ever used was my SQL 3 back in like the early 2000s. And it's one of those things I had like muscle memory. So when I write queries and I write strings, I always put double quotes because that's what we did in my SQL. And that always fails whenever I go switch to another database system. And it's like a bad habit of mine. All right, so this was the like operator that somebody was asking about. So like is how we're gonna use for basic string matching in SQL, right? So you can have a like operator and then what comes at like, so you take the column name, then you say like, and then the string that comes after that is the string pattern you're trying to match. And the two basic characters you can have to do matching would be the percent sign, which is one or more character or any, yeah, one or more character, including empty strings. And then the underscore is match exactly one character. So again, this is like, this sucks because like we're used to using the star in like, you know, the bash terminals or whatever you're using it to represent wild cards for whatever reason the percent sign is the wild card in SQL. Now there are regular expressions in SQL. That's one where I think that again, the SQL Spanner Center specifies what you can do, but the different database systems support different functions to call the regular expressions. And usually they're perl compatible, but I don't know about like the commercialized like Oracle or SQL server, right? And then there's also gonna be a bunch of string functions. I showed upper in the last slide. SQL standard is gonna have a bunch of things you can do as you would expect on strings, like trims, upper, lower, substrings. And you can use these on the output of, you know, you can use them any part of the where clause or sorry, any part of the SQL statement where you're referencing an attribute. So you need to put it in the select output or you can have it in the where clause, right? It can be on either side of a predicate. And again, and every single data system will support these basic functions. The thing though that you would think would be super easy to do and common across all the different database systems would be string concatenation. The problem is though, everyone does something different. So the standard specifies that you use double bars. This is what Postgres does. This is what Postgres does. Oracle does this. SQL server uses the plus operator and then my SQL doesn't use either of these. We'll try version eight just now but like they instead have a concat function. And so if you wanna concat a bunch of strings together you basically, you know, calling all these nested concats. But let's play around this real quickly and see what happens. All right, can you guys see my terminal? Yeah. Okay, awesome. But let me log in my other laptop because typing, I have a Windows surface but it sucks at typing. Okay, so we're gonna have, oh, there's nothing showing now. Can you see my screen? Yes, yes, yes. All right, you guys see that? No, no. Screen change because you already saw it, right? All right, so this is Post, all right, so this is Postgres, right? So if I wanna do like simple string concat, I have to do Andy and then like that and get that. So now let's try this in my SQL. So this is my SQL version eight and it gives me zero, right? That sucks. Maybe I try the plus. Nope, didn't like that. There's a bunch of warnings. It doesn't like me. Let me try the dots. Just probably maybe just fail. Nope, didn't like that. What you can do though is not have any operator and just have like spaces in between the strings and that works. That sucks. All right, let's try a SQL light. They support the pipes that work well. Let's do SQL server. Do not like the pipes. So we'll switch to dots. Didn't like that. Maybe there's a plus. There you go, they do the plus. Again, string concat nation should be super simple. Everyone does something different. Either they should all have the concat function though. That should be standardized. This is a small question, but I also did not fully understand what is the relationship between relational language star and SQL spec? Mathematical theory backing it. We said again, the question is what, sorry? What's the relationship between relational language star and SQL spec? Is mathematical theory backing it? Can the person unmute themselves or? Sorry, I typed it wrong. I meant what's the relationship between the SQL language spec and the relational algebra? I meant relational algebra instead of relational language. Is it the mathematical theory backing it? Like, why do we mention relational algebra if this is like bag? Okay, yeah. So the question, I think your question is, is last class we talked about relational algebra. Now we're talking SQL. What's the connection between the two? So the SQL is a way to specify a query that can then be executed using relational algebra. So probably the easiest way to show something like this would be, so we can look at the query plan, right? And you will see how like these basic operators that we talked about, Postgres is gonna be the best of this. So let's say that I have the student table and I join it with the enroll table. On, I think enroll with student.sid equals enrolled to sid, right? So that's our join. So you can use a command called explain in front of your query and that's just gonna tell you, like give me the query plan. So now this is actually the physical query plan of what the database is gonna execute. This is what the optimizer is gonna generate, right? And so it's basically a tree structure, but in the inside of the tree, we have a sequential scan and we don't have a filter in here, but like think of that as like the select operator for the, that we talked about in relational algebra. The difference though is the relational algebra is a logical definition of what the plan should do. This is a physical manifestation of it, meaning like the relational operative or select says, hey, read this table. It doesn't tell you though how to read it. So what the database system is doing is taking the SQL query, converts it into relational algebra, then converts it into, which is logical, and then converts that into a physical plan that can be used to execute that query and produce the output that's needed by the relational algebra. That make sense? Okay. Any other questions? Okay. All right, so let's look at this dumpster fire. Okay. Dates and times. This is where things are even worse. This is where things go, go awry. So again, the SQL standards specifies how to operate on dates and times, but the, it's the sort of unwritten implicit behavior of them sometimes that can vary. And then the syntax for all the operations you wanna do are gonna be wildly different as well. So let's do an example. Again, what we think would be a really simple query, we wanna count the number of days from now until, since today, since the beginning of the year. So again, we'll do this on all three different database systems. So the first thing we have to do is figure out how to get what is today's date. So there is a now function in the SQL standard and then that'll just, that runs, this is Postgres again, it tells you at the bottom what we're running. So this is Postgres and this will give me the timestamp of today, right? So I can run that same function in my SQL, I can get the same thing. I can run this in SQLite. But now it says there's no function. Okay, well, that's problematic. And then I can run this in SQL server. Doesn't have, doesn't have now. Okay, well, there's another way to get the current time. So there's another function we can use called current timestamp. Okay, a bit more verbose. And again, that'll do the same thing except Postgres doesn't have the function called current timestamp, but it has the keyword current timestamp. Oh, it's hot. Okay, so now let's go to my SQL. They have the function, they have the keyword, SQLite, no function, they have the keyword, SQL server, no function, they have the keyword. Okay, so right, so again, something really simple. What is today, what's the current time right now, that varies. All right, so now we wanna figure out a way to get the, again, we wanna figure out, we're trying to count the number of days since the beginning of year to today. So there is a date type, right? And we can do casting in SQL, right? So today is what, 09, 02, right? And this is just taking the string and now casting it into the date type. So now we, again, there's some basic functions we can do on this, like maybe we could extract the day from this timestamp, right? But that doesn't really help us because that's just telling us it's been two days since the beginning of the year. So the way to do some Postgres turns out is all you need to do is cast the string, for today and then beginning of the year and just subtract them. And that gives us 245, right? I haven't done the math, but I'm assuming that's correct because it's, you know, it's roughly about right. Okay, so let's try to do this in my SQL, but it gives us 801, what is that? So every year that I give this demo, I'm like, sometimes it's like 728 or some other number and I'm like, what the hell is this number? So many on YouTube, believe it or not, actually then told us what it is. The first digit, eight, is the current month, our current month subtracted by the month we're trying to take the default one. So September is nine, January is one. So it's nine minus one is eight. And then the 01 is again the current day of today's date minus the day we're trying to get the difference from. So the 02 minus 01 and it gives us one. Now, I think the type of this is an integer and not a string. So like this is bizarre, like this, if you would run this query, it'd be like, what the hell is this number? Like for every time I run it, I'm like, I don't know what this is. So the next thing we can try to do then is the solution I came up with was to convert the today's date into a UNIX timestamp, which is the number of days or sort of the number of seconds since January 1st, 1970, it's the UNIX epoch. So we convert today's date into the UNIX timestamp and then we subtract that number of seconds by the number of seconds from the beginning of the day, beginning of the year. And then we divide it by 60 seconds times, 60 minutes times 24 hours, and that gives us 245, right? So that's sort of bizarre, but that's one way to do this. But then I found out that my SQL has a date diff function where we just get cast as a date and then the date diff will then give you exactly that value. All right, so that's one way to do it in, two ways to do it in my SQL. So now let's try this in SQL light. So SQL light does not have date diff. And so the way I figured out to do it other than using UNIX timestamps was to convert the dates into the Julian calendar, which is the number of days since like 400 something BC, I think it's when Julius Caesar was born. And then that will give you roughly the right value, which we can then just cast as integer to get 245. All right, so now let's go, last one would be SQL server. So if we try to do what we did in Postgres at the beginning and just subtract two dates, that doesn't work because they don't have the date type. They have a date time type. So what we'll do is we'll call the convert function to convert the dates into the date time and then we'll subtract the two date times from each other. Right, that gives us September 3rd, 1900, which I don't know what that is. But then I found out they do have, they also like my SQL, they have a date diff function. So we can basically do the same thing we did in my SQL, right? Take today's date, subtract it by beginning, sorry, that's wrong, which that should be beginning of the year, right? So take today's date subtracted by beginning of the year and base it on days, but that gives us negative 245, right? Again, just going back to my SQL, right? I did today's date first, where are we? Today's date first subtracted by beginning of the year. I do that same thing in today's date, so in fact I'm gonna get a year in SQL server and I'm getting negative 245. So if I just now flip the order of the arguments, then I get positive 245. So again, you see how frustrating this is, it's like the simple operation, but you can do it in a bunch of different ways in all these different database systems, it sucks. All right, so any questions about this? Okay, so let's keep moving ahead. All right, the, let me turn up this over here too. The, all right, so the next thing we gotta worry about is output redirection. So again, in all the examples that I showed you, when I run a select statement, the output comes to my terminal. If you're writing an application, it'd be the same thing, like the output would go to the, whatever has the connection open that made the query request. But maybe there's maybe times where you don't want the output to go back to the terminal, you actually wanna store it back inside the database so that another query or something else could start using it. So this is what output redirection can do. So you basically take the output of a select statement and you're putting it somewhere else. Now, the table does not always have to be defined. Like you can, in this case here, you can call select out, sorry, you can call create table here and then have the select query be what generates or populates the table. And the database system will derive the schema from the, from what the, for the schema for the table based on what the query is outputting. Like it knows, oh, this course ID is an integer. So I'll define a column called CID that will be the integer type. In the SQL standard you call select into and you specify here where the table has to be. I think this one, I think again, this one also does not have to exist in the SQL standard. You can insert in tuples into another table. So this is like, this is basically what it should write or this is the insert clause. And instead of having a value list, you have the select statement, right? So although the syntax for this is gonna be the same across different database systems where they're gonna vary will be what they do when there's an error, right? So like there could be the case where the, I'm trying to insert into a table that has a primary key and therefore I can't have duplicate values. And now I do this like select query like this where I'm doing the insert based on the output of the select and I come across now a value that already exists and that violates the uniqueness constraint that I have on my table. Well, some systems will throw an error the first violation that they find and then roll back any changes that they've already made to the database. Like anything that got inserted before I hit that error, I undo all of them. So as if the query never executed, some systems will throw an error and stop the insert but they keep all the things that already got inserted. Other systems will just straight up ignore the error, give you a warning at the end that there was an error but keep everything else that had inserted. So again, the SQL standard doesn't really specify what you should do, or at least I haven't looked at this in particular, like everyone is doing something different because it's just based on how, the one person that implemented it this feature back in the day, this is just what they did. All right, the next thing we can also do is control the how much data we're gonna spit out from our select statements. Clicker's not working. Yeah, is this frozen? Oh, there we go, sorry. Right, so the order by clause is used to generate the sort order that we want for our select statement. Again, if we're based on bag algebra, the data can store the tuples in any way that it wants. It won't even store it, you're not even guaranteed that it's gonna store them in the way that you insert them. I will, we can see example of this later in the semester but just because I insert your tuple one first followed by tuple two, when I run my select statement, there's no guarantee I'm going to get back one followed by two. The database isn't as left to store it anyway that it wants, right? Bag algebra semantics says there is no defined order. So if we care about order, then we have to add our order by clause and then we can specify what column we wanna sort on and then whether or not we want it ascending or descending. Right, so this is a simple example here where I'm getting all the students in 15721. I specify I want them to be sorted on grade. And so in the SQL standard, if you don't specify what order you want, the default is ascending. Instead of providing also the column here, you can actually just give an offset of the output list here. Actually that should be a two, right? Because it's based on one. So this is saying that of my output list, I wanna order them on the first attribute or should be the second attribute here. And that's the same thing. So that's useful when you have things like like an aggregation or something that may not have a defined name or alias for the output field in the select applet. So you can just say I wanna sort it based on this offset. You can then combine this with multiple things. So you can do sorting with multiple columns and each one can be sorted in a different way. So this is sorting based on grade and descending order followed by student ID and ascending order. But it's important to notice here is that I can sort by grade even though my output list doesn't specify that I want grade to be included, right? So the data system will know that as I'm scanning the roll table, I have to maintain that the grade attribute and the tuples and my innovative result as I'm going along. But I'm gonna need to sort that by that value later on even though it gets projected out later. Again, the same thing I can just specify the, I can specify the number that I want here from the output list as well. All right, next thing is that there are limit clauses. The limit clause is the way to limit how much of the output you're gonna get from the query. The way to sort of like do early termination to say, I've gotten all the data that I need or don't do any further processing. So you can specify how many tuples you want to be produced as the output. You can also specify an offset of where the limit sort of calculation should start. All right, so this one here will get all tuples, do the scan and it proves 10 output tuples. This one says, skip the first 10 and then give me the remaining 20. So this is why they use and like any website where you see like, you know, it shows you 10 results and you say, give me the next 10. If they're using SQL, they're basically doing this with a limit with an offset, right? Just to be careful because depending on what the query is, you may actually have to look at all the tuples even though you specify a limit. So for example, if I wanna get the top 10 values, some attribute and sorted by that value, I gotta sort the entire table, then look at the top 10, right? There's no way you can shortcut that very easily. So just be mindful of like, even though you're saying I only want 20 tuples, you may have to look at all the tuples in order to compute the answer that you're looking for. This is just a duplicate, yeah, sorry. I'm not sure what happened there. All right, so any questions about output redirection, output control? Okay, all right, so now let's get into the good stuff, right? Let's talk about nested queries, window functions and CTEs. So nested queries are a way for us to embed one query or multiple queries inside of another query. The way you think about this is like, you know, we treat it like a function call, right? So say I'm doing this query here, I wanna get the names of the students that are enrolled in least one class. So in my, I have my outer query here that specifies what the output's gonna be, but inside my where clause now, I can embed another query. So the terminology we would use is that this is the outer query, that's the uppermost query that again produces the output that's sent back to the terminal or the client and then anything that's embedded inside of it, it will be the inner query. The, you can have any arbitrary number of nesting. So like this inner query can have another, it's own inner query, right? You can pretty much do this infinitely. Obviously you can't enter real system because you would overflow the stack, but like, and maybe some systems actually cut it off at some point where they say, I can't do any more nesting, but like again from a syntax standpoint, you can do this forever. So I would say that like, these are super useful and in the open source databases, especially my SQL in recent years, I've gotten much better at handling them. Oftentimes the, they will do the, sometimes you'll see some data systems, if you give it a really bad nesting query, they'll do the dumbest thing, which is execute the inner query for every single tuple of the outer query, right? So like for this example here, say I have a thousand students and there's a thousand records in the student table. So now in my where clause, I wanna see whether that student exists in this other table here. So the dumbest thing to do would be for every single student, go scan the entire enroll table and then check to see whether that current student is inside of that, right? So what query optimizers are going to try to do is they're gonna try to rewrite this as a join because that can be way more efficient than doing the sort of brute force search that I just said. All right, so it looks more complex examples. So say you wanna get all the students that are enrolled in 15, 445. So the way to think about how you wanna write this is you wanna start with the outer query and think about what is the output result that you wanna produce. And then you work your way inside and say, okay, what does my inner query need to do to figure out how to filter the tuples that I need? All right, so in this case here, just writing in English, here's our outer query. We know we want the name of the students and then we wanna be able to match them where for every student, if they're a student that's enrolled in 15, 445. So we know that there's gonna be an inner query here where we're just going, getting all the student IDs for the students enrolled in 15, 445. So the question is now, what do we put here as the output around this? Well, this is just again, that in calls we had from before. So this is again saying, find me a student ID that's within this set here. So the point I need to point out is like, we're referencing student ID twice because both the student ID, student table and the enrolled table have student ID, but this inner query is bound to this one and this SID is bound to the outer table here. All right, the other ways you can operate on nested queries that are producing multiple rows is you can use all, which says that all the rows have to satisfy your expression within the subquery. The any says that at least one row has to match in our subquery. In is basically equivalent to equals any, so these two guys you have to say, you specify greater than, less than, equals all or any. In is just a shortcut way to say equals any and then exist is just a way to say, as long as the inner query produces one output record, I don't care what it is, I don't care what's in it, as long as that returns at least one record, then my predicate will evaluate to true. All right, so going back here again, this is just a way to rewrite the in we just showed before, you can say where the student ID equals any student that matches in your old table. In this example here, the nested query is inside the where clause. The nested query can appear anywhere, right? You can actually have it in the select output. So this is running, this is producing the exact same result as this query, but it's just flipped things around. So now the outer query is the roll table, where I'm just gonna scan through, find all the students that are in course ID 15445. And then for every single one that isn't enrolled in this course, in my select output, I'll do a lookup into the student table and get their name, right? So again, semantically they're the same, the database system is probably, we'll be able to probably rewrite this one as a join, which is the most most efficient thing to do. This one here, probably we'll have to get rewritten as a, well, it probably runs at once in Postgres, materializes the result and then reuses that. But the dumbest thing we do would be again, for every single tuple that's matching out here, do the probe into the student table. All right, so let's look at more complex things. So now I'm gonna find the student record with the highest ID that is enrolled in at least one course. So you would think you wanna do something like this, right? I wanna do a join on the roll table and the student table and I wanna get the max student ID and just get their name, right? But as I said before, this won't work in most database systems because we can't have the student name reference here because it's gonna produce, it's not actually matching to the thing that we're computing aggregation on, right? Again, we can do a quick demo of this. All right, so let's try this in Postgres. So this was what, select, select max student ID, student name, from enrolled as E, student as S, where EID, was SID, S, S, right? This throws an error because again, we're specifying, we're trying to access the student name but we're doing an aggregation so it doesn't know how to bind it. Actually, one quick thing to share too as well, like this is like old school join syntax, like you have now the from, you have all your tables and then you specify the join predicate in the where clause. The preferred way of doing this now would be, you would write join like this and then you replace the where clause with on, right? So that'll work. Again, I'll try to use the correct syntax but if you've never seen it before, you don't have to specify join when you want to do a join. If you just have it in the where clause, the data system is smart enough to know that that's my join predicate. So let's try the same query now that was failing in Postgres in MySQL. So MySQL throws an error, which is good. It didn't always used to do this. So in, actually, probably two years ago, MySQL would let you do all sorts of things that violate the SQL standard and so they had this thing called SQL mode where you can specify what syntax is allowed in MySQL. So if you set it to traditional, which is prior to like a new version of MySQL 5.7, this is MySQL 8, right? With traditional mode, you can do all the quirky things that you're not supposed to be able to do. So this here, now we can actually run this query and we see we're getting the right student ID, but the name is Tupac. But if we go look in the student table, so student ID was 53688, 53688 is Justin Bieber, not Tupac. So this is a good example. Again, it doesn't know how to bind the student name to whatever the values have been using for the aggregation. So it's giving you garbage. So question? There's multiple students out there confused about all and they want to also ask about the difference between exists and any. All right, the questions are going back to those slides. It's about these guys here. So the question is what does exist do and what does all do? Yeah, and the difference between exists and any. Okay, so exists, all right, so let's do a quick example here. All right, let's do Postgres because Postgres is I think follows the standard better. So all of this is doing is saying that whatever predicate I have, as long as something in my interquery returns, as long as the interquery returns one tuple, that's all it needs, right? So let's say I do it from student, where cause, and then I can just have exists, select one, right? So you can have a query without a where close, right? I can do select one and all that does is return one, right? And you can treat the database like a calculator. I can do one plus one, right? So this query here is just saying that for my interquery it returns one tuple that has one attribute and the value is one. So long as this thing returns true, like the long as this, like long as one tuple gets returned, the exists will be satisfied, right? So if I think I do this, I guess no, the tuple too. Let's make a fake table, create table empty and it has an ID field, all right? Select star from empty, there's nothing in there. So now that I go back to this guy, select star from empty, now returns nothing because exists as something has to be in there but my little fake table that I made that doesn't have anything in it doesn't return anything on when I call select. So therefore, this now predicated values to false and no tuple satisfies, right? So that exists. So you think, okay, well, this is stupid. How do I actually use this? Well, you can use it for things like, because the interquery can reference the outer query. Or so yeah, the interquery can reference things that are in the outer query. So like the outer query here is select. So now in my enroll thing, I can say so as E where the student ID equals the outer table student ID. So what this is doing is this is returning, this says along with some tuple that satisfies this predicate and here now I'm matching the student ID from out here inside my interquery and that's returning some tuples and it comes back as true. All right, so now with all, it's the idea with all is that we're trying to, well, we have a predicate that says, the way that our interquery is gonna be based on a, if the interquery returns back a list set of tuples, then our predicate has to match for every single value that's in all those tuples. So let's see what I mean by this. So say we have now, say we have our student ID out here and we say that the student ID has to be greater than all tuples that are gonna be produced by the interquery here, right? So my interquery, I can put, again, anything I want, I can have it as return, no, 999999. So now this predicate's gonna say, for every single student, checks you with our student ID is greater than all values that are returned by this interquery here, which none should match, so you get nothing. So let's maybe use that empty table and make this a bit more clear. Insert into empty values, so we'll put 9999 and zero. So now my empty table, which is no longer empty, I have two tuples, 9999 and zero. So now if I go back to this guy here, select star from empty, this will not match anything because it's gonna look at every single tuple that this interquery generates and it's gonna evaluate that the student ID is greater than all of them, which it won't be, because one of them is 9999. If I now delete from empty, where ID equals 9999, now they all match because they're all greater than zero. So Andy, there is one question on chat, I did not fully understand it, like the student is asking whether the exist act as a Boolean in way that it returns a true or false? Yes, exist returns true or false, yes. Important thing to understand about this too is like this is taking a scalar of this tuple and it's gonna match it against multiple tuples from the interquery, but they actually have to return, they only return one value because that's the value you have to use to match against this thing. So like if I go back to my example here, 9999 returns, this interquery is returning one value, one tuple with one attribute. If I now make a return multiple things, this should throw an error, because the interquery here is returning too many columns so I can't compare against student ID. So this one, I think we covered this one. The main thing here is that you can have it in the where clause, you can have it in the select output, you can actually have it in the from clause. So let's now see a way to get the, the next query we're gonna run is you wanna get the student record that has the highest ID that is enrolled in least one course. So again, and we covered this, right? The way we actually wanna do this would be, and this is the outer query, this is what we wanna produce and we know that our where clause has to have the highest enrolled student ID. So we know that there's gonna be this max query on the roll table and that'll give us the, that'll give us the one tuple with the one value that's the max student ID. And this is the question is, how do we replace this match here? What? The in clause would do that for us, right? We can also rewrite this as now, we can use, instead of using the max aggregate, and we can, instead of doing an aggregation, we can do that as you just do an order by based on the student ID and the sending order, we'll limit it to one, and then that'll produce the same answer. In many cases, like for, depending on what's more efficient, the data we said could rewrite it, this max query to be this order by or could do it the other way around, again, it depends on the implementation. So another way to do this, now we have in our from clause, we're doing a join on the table, I'm sorry, on a sort of synthetic table, that's the output of that max query. That gets materialized into this, again, temporary table called max, max E. And then we can reference that in our join clause, just as we would, as if it was a real table. Like it's an ephemeral thing. The database system will generate this table, materialize it somewhere in memory, and then does the join and then blows it away when the query is done. So one more example, how do we find a course that has no stuples enrolled in it? So this case here, we use not exist, because we said we're basically saying long as no tuple matches, whatever in our inner query is, then that'll produce the answer that we want. So now we just have that select statement where we're doing essentially a sort of inner join in the right word. We're doing that sort of a join internally in the subquery, where we're referencing the course ID of the outer table or the outer query and the enrolled ID of the inner query. And then that's essentially doing a join to find the answer that we're looking for. And then 823 is a class that I taught when I started at Carnegie Mellon, but I haven't taught it since then. All right, so any questions? Questions about nested queries? Can we go back a few slides? There was a query that you said that didn't work. And I was, I'm so confused on why it didn't work. This one here, yeah. The reason why it doesn't work is because you're doing an aggregation. What does an aggregation do? It's taking multiple tuples and coalescing them into a single scalar value, right? But now I'm trying to reference the, for all those tuples that I collapsed together, I'm trying to reference some attribute of them. But like, which one, right? I had a thousand tuples to compute the max student ID, but now I'm referencing the student name, like which student? And so I showed the example in, with my SQL, if you put it in traditional mode, it'll let you do it, but the number, the name is actually giving you is wrong. SQLite will have the same issue as well. Wait, but by aggregation, where is the aggregation happening? What do you mean? Where is the aggregation happening on the from clause? No, the aggregation is happening on the output list. So you do this join, right? And just think, ignore the aggregation. You're doing this join, now you're gonna generate a list of all these tuples that match your join clause. Then now you then compute the max value of the aggregation. Oh, okay, I see now, I see now. Yeah, I see. Any other questions? Shit, it's already 4.30, all right. Let's keep going. All right, window functions, because I want these in CTEs because you have to need these for the homework. Okay, so window functions are gonna be like aggregations, but instead of collapsing the tuples into a single aggregation, you're gonna compute the aggregation on a sort of incremental fashion, and then you still produce the original tuple as the output. So this is useful when you're doing like time series analysis and you wanna compute things like a moving average, but as you're processing the tuples, like you're computing aggregation, you still wanna know what was the value of the tuple you were looking at at individual time text. So the way you would do this is that you specify that you have a function over some aggregation, and then the over clause specifies like what is the, how you actually wanna group together the tuples, right? So my battery's dead. Right, so I keep clicking the zoom thing and I lose focus, right? So this is the aggregation function and then this over clause is essentially gonna specify how we split up the data and we can sort it in doing more than just the group by we have before. So we can do all the aggregation functions like we have before the bin, max, sum, and count and average, but they now have special window functions that can do other things like now give us the position of what row we're looking at in our window and then we can choose also now the rank meaning the order position as we sort them. So in this query here, what are we gonna do? We're gonna go scan the enroll table and then for every single tuple, we're gonna compute a row number and we're gonna synthesize that as this row num attribute here. So it just looked like this, right? So as now I'm scanning the enroll table, I'm maintaining a calendar that's telling me, here's your row number, right? This is pretty simple. It works in Postgres, MySQL, newer versions of SQLite and in case it's SQL server, you can't have an empty over clause. You have to have an order by in there. So the over keyword is an important part because that's gonna specify how the query should group the tuples together when it's computing the window function, right? So save this query what we're doing now is that we wanna get the row number or compute the synthetic row number of the tuples as we group them by the course ID, right? So it would be like that. So for every single course, we have all the tuples out of the tuples that are enrolled in it and then there's a row number that's specifying here's their position within that group, right? It's essentially, it's like a group by but I'm still getting the tuple and I'm having my aggregation of being computed on the fly. We can also include an order by clause in the window to sort the elements of each group. And so the way anybody's like, you're doing the sorting and then you compute the aggregation. Again, it's another way to sort of split things up. So let's look at function like this or query like this. So you wanna find the student with the second highest grade for every single course. So this would be sort of tricky to do now in a, again, without a window function because you have no way to know what is the position of the tuple if you now start sorting them or grouping them based on their, and, you know, on a course ID. So in our, and this one here, we have an inner query. Let's focus on the red part here. The inner query is gonna do a rank function over the, over the, oh, sorry, where's it rank function over the enrolled table ordered by the grade in ascending order. And we're gonna split them by course so that now we end up with for every single course we will have the sorted list of students based on their, you know, based on their grades. And then they'll be assigned a rank telling their position in that sort order. And then we can then in the outer query, we're referencing the table that ranking that gets materialized by this inner query. And we can specify that we only want the tuple that where the rank equals two. And this is just looking like that. So let me show this really quickly. I think it'll make more sense when you see it actually running. So we'll use Postgres again to start with, right? So again, the rank over partition, rank over, compute the rank over the, for the only enrolled table and we're gonna split them by course ID and then we're gonna order the elements by their grade. So if I remove this, this outer query part, right, and just run the inner query, you can see what the output actually, it's gonna go on, right, there you go. So in this case here, again, now you see that for every single course, I'm sorting them by grade and then now they have a rank position within their, you know, within their group, right? Any questions about this or is this clear? So students are asking what is the difference between rank and row number? Row number is your position, in the output list, rank is your sort position, right? So I do this, if I remove the order by clause, one, two, one, two, let me try to give an example where it comes up different. One, two, three, four, five, and then, yeah, perfect. Okay, so in this example here, the rank is one, one, three, three, five because in their sort position, we were, since we were sorting by the course ID, these two tuples have the same value. So technically they should be in the same position, right? So their rank is one. Up here, when I was doing it by the row number, it's one, two, three, four, five, because it's just the output as they come out, right? So is this clear? Right, but now, if I go back here and I do a multiple one, so I do order by grade, I think I can then specify also, sorry, by grade as well, now the rank should be the same thing as the row number because the grade here is the tiebreaker. All right, any questions? Other questions about window functions? CTEs, CTEs are awesome. CTEs are way, they're gonna look like nested queries because you're taking the output of one query and then using it as the input for another query, but it's different because you're gonna declare it before you have declared the CTE first and then you can reference it down below. So you add this with clause here, then you specify the name of the CTE and then you can reference it as if it was a table down below. And again, think of this as like an ephemeral temporary table where it's only bound to this query that's running down here. When the query finishes, anything I materialize for the CTE gets blown away. So we'll see some examples of things you can do with CTEs that you cannot do with nested queries. All right, so the way it works is like, you sort of think if you declare the CTE as if it was like a table, you're gonna specify column names. You don't have to specify the type because since CTE was declared as, you can derive what the type is gonna be based on what the output of the interquery is gonna be. And then now down below, I can reference those columns and do whatever I want them. So this is now declaring a CTE called CTE name and only has two columns, one and two. And then here's my select query without a from clause that then produces a single tuple with the values one and two. Turns out though, I found that you can do this in Postgres and I don't recommend this as like, these column names don't actually need to be unique. My SQL throws an error, Postgres does not. So you can do this, right? You can specify that these column names are the same. But then down below, it recognizes that you're trying to use, it doesn't know how to bind the name of the column to whatever the output of the CTE is up here. So Postgres won't let you do this. You have to rewrite it as a star. And in that case, that's totally fine with Postgres and lets you do that. All right, so let's see that example we have before, but we're trying to find the highest, the student record with the highest ID that the role needs one class. Again, instead of having the nested query down below inside the from clause, I declare it as the CTE, where now I'm computing the max student ID from the enroll table. And then now in my bottom query, I just referenced this thing as if it was another table. All right, so that's not that exciting, right? Where things get awesome and what, and this what CTEs are all make SQL Turing complete is you can do recursion. So if you add the recursive keyword, this allows you to then invoke the CTE on itself. And then that keeps going until there's some recursive, till there's some halt condition and then it fires off the query down below. So what this is gonna do, this CTE is gonna recursively call itself to print out the numbers one to 10. Think of it just like as a for loop, we're just gonna keep outputting, take where the current counter is, add one to it and then recursively call the next counter and get the next value. So what this is doing this is it's going to select one and produces a single tuple with the output one. And then I'm doing a union where I'm gonna combine whatever the value, the output of this query with the output of this query, which has a recursive call now into itself to take whatever the counter is that's getting bound, adds one to it and then repeats the process over and over again, right? And then we have this where calls to specify when to terminate. Okay, so let's see some examples of this working. So we can get stuck in an infinite loop if we're not careful, right? So here's a really simple CTE, right? Here's the example I have before, I'm sorry, here's the example I have before, right? Select one, our CTE call CTE name has an output column one, column two and all does is produce the output directly in the query down below. So we can run that in Postgres, that's fine. We can run this in MySQL, that's fine. We can run this in SQLite, that's fine. We can run this in SQL server, that's fine. So that's good. So now let's see the example we had about the for loop, right? So again, I'm calling a recursive CTE where the first value that I'm generating is just the value one. You don't actually need the union all when you're rid of that. And then I'm gonna union that with the output of the counter that's being passed in plus one, but then I call the CTE again, right? And you get the output list, one, two, three, four, five, six, seven, eight, nine, 10, like that, right? Again, it's essentially like, we're treating this like a for loop. Let's try to run this in some other databases. My SQL that lets us do it, SQLite doesn't let us do it because it doesn't like recursive. And then SQL server doesn't like it either because it doesn't know how to deal with recursive. So, of course the CTEs are super powerful, but not everyone's gonna support them. Let's see now what happens if I don't have that where clause here where I'm saying stop when the counter is less than, or keep going along with the counter less than 10. So, what I'm gonna do in Postgres is I'm gonna set this timeout variable to be 10 seconds. So, this is a Postgres specific parameter that specifies how long query is allowed to run before I kill it because it ran too long. So now if I run this, if I wanna basically the same query where I'm gonna call over Chris of CTE and do that recursive call into itself, but I don't specify, oh, that one did it. It's the union all, that's why you need this. If I don't have, union all means it keeps going forever. Like if I don't have the where clause to say when I should stop, then it will run forever, but unless my timeout is set to 10 seconds, right? In this case here, Postgres says I timed out and I go ahead and kill it. All right, CTEs are often confusing for students. I'm sure there's questions, go for it, if any. Okay, cool, awesome. So, again, as I said in the beginning, SQL is not a bad language, right? It's, there's always new features being added to it. Different databases support different things and a lot of times in new releases, they will add new SQL features for you, which is super cool. When you use like an ORM, if you're using like Django, Ruby on Rails, Node.js where they sort of extract away the database system, there is a sort of mapping layer that knows how to take whatever the operation trying to do in like your Python or Ruby or JavaScript code and convert that to their SQL statements. And in those cases, they usually don't try to use the specialized features. It's only when you write SQL by hand, you end up falling into the trap of like using specific things that the data doesn't support. The, when for the homework, which is next, you're going to want to strive to write your SQL queries always in like try to compute the answer you want in a single SQL statement. And this is going to allow, because now the data system can see the complete view of what you're trying to calculate or compute and generate the potential of the most efficient plan. Like if you, like I said, you're using nested queries, you could break it up into sort of, you know, a bunch of selects that go put things in temp tables and then have the final query, but that's going to be way, it's going to be less efficient than if you had a single query that you just send to the database server and let it compute everything all at once. All right, so homework one is going out today. So this year, what we're giving you is a sample database generated from the music brains. It's basically an online encyclopedia of like a bunch of album information and artists. And so we're asking you to write queries in SQLite. This will allow everyone to write, you know, test things on SQLite locally. SQLite, if you have a Mac, we've already installed it with Linux, it's super easy for Windows. It's also super easy. Like SQLite should run anywhere. And so everyone should be able to do this locally. And then you would submit your queries to Gradescope and then we'll run the same thing on SQLite up there. And then it'll tell you whether output mesh or not. So you can submit as much as you want on Gradescope, but obviously don't use it for debugging, right? Cause it's useless cause it's, you know, you're better off trying to, you know, trying to figure things out as much as you can locally. And then the, it'll be due on Sunday, September 13th. All right, so any questions about the homework? Quickly. Okay. So next class, we'll do storage management. Again, now we're actually, this is pretty much the most we'll see of SQL this semester. Everything going forward now will be, how do we actually build the data system that can execute those SQL statements? Okay.