 Okay, while we wait for everyone to get in and settle down, let me ask you a few questions to get a feel for the background of people here. How many of you have used the sixth edition of my book or this particular schema in your courses? Several of you, okay. Since many others haven't, I think it's important I spend a little bit of time going over the schema because our exercises are going to be based on this schema. So you need to understand what is in here. So I'll spend some time and if you have any questions about the schema, please do ask. I'm not in a hurry to run through this because unless you know this lab exercises will also be more difficult. So let's start with the very top right part of the schema. This is a university schema. It's a very simplified model of what kind of academic information a university might need to maintain. And if you look at the top right, there is something called student. This is a schema diagram. This is not an ER diagram. We will be coming back to ER diagrams later. This is the actual relational schema with arrows being what? What are the arrows? Foreign key dependencies. And each box is a relation with the name of the relation at the top. And if you see some of the attributes there are underlined. The underlined attributes are the primary keys. So all that is fairly standard notation. So if you look at the top right, there are students. What are the attributes of student? I have an ID, a name, a department name. A department name here is the foreign key, which is referring to a department. In real life, probably you wouldn't use a name as a primary key and foreign key. You might have a department code, a department ID, but like I said, this is simplified. And then you have some field called total credits, which is supposed to reflect how many credits the student has cleared till now. This is an example of a derived field, basically because it is supposed to come from other information in here. In the databases that sample databases that we use, it doesn't actually match because these are parts of what would be a larger database. Okay, now if you come down to the bottom right, there are instructors. We could have used faculty. We could have used teachers or whatever. We chose a neutral term, instructor. And these instructors have more or less the same fields except instead of a credits, they have a salve. Now again, this is not realistic. You probably don't want to store salve as a field of instructor. Why? In real life, salve is a lot more complicated. If you are in an Indian education system under UGC or in the government, you have so many parts to your salve, you cannot represent it by a single number. But this is just a toy example. And then you have courses, so if you, okay, you have departments in between student and instructor there. You have departments. Again, this is grossly simplified. For our purposes, a department has a name, has a building, which are again not very realistic. Our computer science department here lives in three buildings as of now. It should reduce to two later on. But here we are assuming there's a single building and there's a budget. So now there's a department, which we already saw. And then there are a lot of things linking these things up. But before we get to the relationships, there are a couple more things which in the ER model would be entities. So we have, in any university, we have people, we have courses, we have departments, we have many more things. This is just a small part of what is there at a university. So courses in our model is again quite simplified. It has a course ID, title, which is the name of the course. The course usually belongs to a department. So it has a department name and it has a number of credits, how many hours a week or whatever else, how much effort is required for that course. That's a simplified view. In reality, courses change over time. What is CS 101 today, its title may change next year. So if you want to keep historical data, the schema has to be more complicated. This reflects the name of the course, the title of the course, as of now. So again, like I said, it's simplified. A real university schema would have to keep track of history. We'll come back to this when we talk of database design. So that's courses. Now if you look over here, there's something called a section. So what is a section? A course, database is a course. But it's offered every year or maybe twice a year or maybe in a single semester, we may have two different sections of the same course. The section captures all of these. It could be called a particular offering of a course, it's called a section. Now in this schema that we are using, a section has four fields which form its primary key. What are they? There's a course ID, it's a section of a particular course. There's a section ID, which uniquely identifies the section within a particular semester and year. So section ID does not have to be unique across time, but it should be unique in a particular semester. So section A, B, C in this year and this semester. Now again, there are many alternatives to this design. One of the alternatives we considered was to have a section, have a section ID which is the primary key. There's nothing else in there. And then the course ID, section ID semester year would be foreign keys referencing other tables. Again, there are design alternatives. We picked on this particular design for simplicity. So that's a section. It's of course associated with the course. So course ID here, not only is it a primary key attribute, but it's also foreign key referencing course. So that's a section. Now there is a relationship between section and student, which is takes, a student takes a course, but the student actually doesn't take a course. The student is at any a section of the course. So the relationship of a student to the course is indirect here. The student is taking a course. So how do we represent that relationship? There's a takes relation with an ID, this student, this ID of the student taking the course, and these four attributes are the primary key of the section relation. If you had chosen a single section ID to be unique across all sections, there would be just a single field here. And finally, there's a grade. Now a grade is given at the end of a course. So until then, what is the value for this field? Null, of course. So the null value tentatively, till the grade is known. It's unknown during the semester. Now moving on, we have a classroom, which says that this section runs in this building and in this room number. So we are in this particular lecture hall complex, room number one. And that itself is a foreign key into the classroom relation, which also has a capacity. If you want to schedule classes, you want to make sure that the room is big enough. This room is big enough for this audience, but the rooms which are in the CS building are not, we don't have any room big enough for all of you sitting here. And then we have a relationship teachers between section and instructor. So there's a corresponding relation, which says that this particular instructor, identified by ID, teaches this section, which is course ID, section ID. Those are the fields which identify which section. So these five fields together form the primary key of teachers, which allows this particular instructor to teach multiple sections of the same course in the same year semester. And here is one more relation, which is called time slot. Now different universities organize their lectures in different ways, but many have a notion of a slot. IIT Bombay has a notion of a slot. We say that slot one meets on this time on Mondays, Wednesdays, and Fridays, let's say. Slot two might meet at this other time on Tuesdays and Thursdays. So different courses have different slots. So a slot has an ID, time slot ID. It also has a day, a start time and an end time. But actually this is a bit weird. So if you look at the primary key of time slot, you'll see here it's time slot ID, day and start time. So what does this identify? It's identifying not only the ID one, but which particular class, that same time slot one meets on Mondays, Wednesdays and Fridays. Now why is start time part of the primary key? It allows what? I'm jumping from SQL into database design here, but that's okay. Yeah, on the same day, it might meet more than once. So you want to identify the start time also. But end time is not because with a particular start time, it cannot end at two different times. Now again, this could have been done differently. This particular design, we could have had a time slot relation where the primary key is time slot ID. And then we would have another relation which is when does this time slot meet? Again, we did this partly for simplicity, partly just to create a little bit of a complication to show certain things which would be a little harder to show otherwise. To show what aspects of, what are the impact of certain design decisions we did this. So you could split it into two tables, a time slot master and then a time slot detail which is when this particular time slot meets. We have two more relations here. There is a prerequisite. So the database scores in IIT Bombay has a prerequisite which is the data structures and algorithms score. Now in certain places where IIT Bombay did not bother to record the prerequisites very explicitly earlier on. The notion was there but it was not widely used. Partly because we said that students go through this sequence of courses. So they should have done the previous course before the next course. So when you have flexibility of when students can do what course, that you don't have to do data structures in semester three, databases in semester five, you could flip it around, well you can't flip it around. So that prerequisite records these dependencies. So it'll record that this particular course ID which is databases has a prerequisite which is data structures. So that's this relation. And finally one last relation here, advisor which says that this student has this instructor as their advisor. A student can have more than one advisor. So in general, but in this schema, we have restricted it to a single advisor. So that's a summary of this schema. Any question before we move on to example queries. So now chapter two which I will cover in more detail in the main course is relational algebra. I suspect most of you would already know about relational algebra. Anyone here who doesn't know about relational algebra, it's not familiar with it. Nobody admits to it at any rate. So just as a quick reminder of the notation, we have the select operation sigma which is not the same as the SQL keyword select which is actually the project operation. And this example shows some relation and the result of a select with A equal to B and D greater than five on R. So A equal to B are that tuple, that and that. So three of these rows satisfy equal to B. D greater than five is satisfied by these two but not that one, so that's the final result. So an SQL query uses the select operation internally. The syntax is different but many times queries just pick out certain rows from a table. Then we have the project operator which selects particular columns, not rows but columns. And this is called select in SQL. So this again you must be familiar with. We have a table ABC and this project operation, project AC on relation R, initially would give this. Now there are actually two versions of the project operator in relation algebra. In the standard relation algebra, every relation is a sector, it cannot have duplicates. So what you have to do is when you do this project, if you see here, there are no duplicates in this relation but when you do the projection on AC, alpha one appears twice, because of this row and this row and duplicate elimination is done on this to get only one copy of alpha one. This is the standard relation algebra project operation. But SQL doesn't work that way. For historical reason of efficiency in the past, less important in these days but there is a history to these things. SQL by default does not remove duplicates. Unless you say, what is the syntax for it? Select this thing. You can tell it to remove duplicates. If you don't, you get the duplicates. Oh, there are some pop quiz questions which I'm not using in this workshop but I may use some of these in the main workshop. So if you see these questions there, don't bother about them unless I explicitly ask. And then the third major operation in SQL is the Cartesian product. Again, you're familiar with this. So you have two relations, R and S and the Cartesian product matches every row in this to every row here. So alpha one is matched with all four rows. So you get these first four rows and then beta two is matched with all four rows. So you get the next four rows. If you see this side, CD, these are the same thing that's here. You're familiar with that. And in SQL, where does the Cartesian product come in? If you use the from clause, you say from RS, implicitly there is a Cartesian product. And of course, you know that practically speaking, you don't want to do the Cartesian product. It will blow up the size. Already with two small relations, you have a fairly large thing. Even if you have a thousand tuples in each, the output is a million. That's crazy. So you don't want to do that. You want to have a join with some conditions. We'll come to that in a moment. Then there's the usual union set difference, intersection, I'm gonna skip that. And let me come to the join. So there are several ways of doing joins. But the two primary ways are either using a condition which says which rows match which rows. Or one of the special ways of doing it is natural join. So this is again supported in SQL and in relational algebra. Both versions are supported. There's a quick reminder, forget the details, but we'll just do it through example. So we have this relation in this. What are the common attributes? B and D. You see here, A, B, C, D, B, D, E. B and D are the common attributes. So two, a tuple from here matches a tuple from here. If their B, D values match. And since they match, we are going to keep only one copy of B and D. So B and D come from both the tuples and they have to be the same value. So in the natural join, B occurs only once, B occurs only once. In SQL, if you do a Cartesian product, what happens? B will occur twice, D will occur twice. So how do you distinguish these two? You say R dot B, R dot D, S dot B, S dot D. So that's the SQL equivalent. So that was a very brief overview of the relation algebra. There are, this is a very basic relation algebra. But all of you know that SQL also has aggregate operations. And you would know that the aggregation operation cannot be expressed using these basic operations in general. There are some special cases where you can do it, but in general you cannot do it using this. So there is an extension to relation algebra, which includes what operator? There's a G, depending on which textbook you use. This slide, no, I don't have it here. So we have used telegraphic G notation or this other notation is actually becoming a little more standard these days, which is, what is that? The Greek letter, gamma, so group by. So we'll probably eventually switch over. More people are using that than the other one. Okay, so what does that operator do? It groups, stumbles, we'll see it a little bit later. You're familiar with that. Okay, so that's a brief overview of relational algebra.