 To start things off, we'll look at the relational model itself. What's called a relation is composed of both tuples and attributes. In common parlance, though, we call relations tables, and tables are made up of columns and rows. What the relational model calls attributes are the columns, and what it calls tuples are the rows. So, here is an example relation, an example table. It's a table with data about planets, so we'll call it our planets table. And it has four columns for attributes about planets. First, there's the name, there's the distance from the sun in miles of that planet, there's the length of a year in Earth days for that planet, and then there's the length of that planet's day, how long it takes to rotate expressed in terms of Earth days. Each tuple in this relation, each row in this table, represents one planet as represented by these four attributes, the name, the distance from the sun, the length of the year, and the length of a day. Be clear that here at the top of the table in dark blue, those are just the headers for the columns. That's not actually a row. Now, a key thing to understand about the relational model is that the order among the tuples, among the rows, is not a part of the data. As far as the relational model is concerned, the rows aren't in any order at all. They just happen here to be written in this particular order. Because of course, if you present a table, you have to show them in some order. But as far as the relational model is concerned, the fact that Venus here is listed first and Mars is listed last, that's just happenstance. If we mixed the rows up and put them in a different order, it would still be the same table with the same data. Likewise, the attributes of a table, the columns, those don't have an order either. So the fact that we have the name column first is just happenstance. We could put it second, we could put it third, we could put it fourth. It doesn't matter. We could present these columns in any order and it would still be the same table. Another way of thinking about this is that this is a planets table. So each tuple here, each row here, represents a single planet. And what makes up our definition of a planet is these four pieces of data, a name, a distance from the sun, the length of a year, and the length of a day. Our table though contains some number of planets and amongst those planets, there's no concept of order. Each row is just one more planet. And likewise, the attributes that make up each planet, well those don't go in any order either. These are just four independent facts about each planet. Now in the relational model, when we wish to retrieve data from our database, we want to retrieve some data from one of our tables. That's what we call a query. And when we query a table, we don't necessarily want the whole table. We maybe only want certain rows and certain columns. When it comes to picking columns, we just specify the ones we want by name. When it comes to rows however, we use a predicate, some test condition, to filter for the rows that we want. So when we query our planets table, we might filter for say, the rows in which the distance from the sun is greater than 100 million miles. So that leaves us with Uranus, Neptune, Jupiter, Saturn and Mars. And say that in our query, we don't want all the columns. We just want the length of the year and the length of the day. Well, we just specify those by name. And this is the end result. This is the result of our query, which notice looks very much like a table, because it is. It has columns, specifying attributes, and then it has some number of rows. So this is actually a very important concept. Any query always returns just one table. The question then is, well, what if I want a query that brings in data from multiple tables? And well, yes, that's actually possible. But doing so will involve what's called a join, where you take more than one table and combine them into one. And in your query, once you've joined tables together, you can then again select for columns and also filter for rows by a predicate. So first though, how do you actually join tables together? Say we have these two tables, a cat's table and a dog's table. The cat's table has three attributes, three columns, name, wives, and hair length. And you can see it has three rows. So it's effectively a column with three cats. And then the dog's row simply has two attributes, sex and name. And it has two rows, so effectively two dogs. Now, if we prefer what's called a cross-join of the table's cats and dogs, this is the table that results. This is what we do when we match up every row in the cat's table with every row in the dog's table. It's taking the columns of both rows and matching them in all combinations. That's why we end up with six rows, because the cat's table had three rows and the dog's table had two. Three times two is six. When you pair up everything in a set of x things with a set of y things, the number of pairs that result is x times y. So if, say, the cat's table had five rows and the dog's table had seven rows, then the result from the cross-join would have 35 rows, five times seven. Also note that this table has all of the attributes, all of the columns, of the original two tables. And also note that I've labeled each column with the table from which it originates, which is important here because we have both a cat's name column and a dog's name column, and those are two distinct columns. So just to state this all again, when you cross-join tables A and tables B, you're taking each row from table A and matching it up with every row from table B. So the number of rows you end up is the product of the number of rows in A, the number of rows in B, it's just the multiplication of the two, but the number of columns is simply the sum of the number of columns in table A and the number of columns in table B. So this is what's called the cross-join. There are also, though, what are called inner-joins and outer-joins. And inner-join, very simply, is the same deal. We perform a cross-join, but then we filter out rows by a predicate. So here we've performed the same cross-join from the same two original tables, but then we filtered to keep only the rows where the dog's sex column value is female. Now, at this point, you're probably a bit skeptical about the utility of joining tables together because in this example, we've taken really what are two totally unrelated tables, and the joining of two unrelated tables generally doesn't really produce useful results. If we go back, though, and tweak our example tables just a little bit, here changing a few of the names of the cats. And once again, we do the cross-join. Well, now with an inner-join, we can filter by a predicate that is possibly useful. Here we're filtering for all the rows in which the cat's name value equals the dog's name value. So we're ending up with a table that lists names which happen to be shared by both a cat and a dog. Still not the most compelling example in the world, but hopefully you can begin to see how joining tables together might be useful. Now, the last kind of join is what's called an outer-join in which we take the inner-join and then after we filtered out rows, then we add back in any row from one of the two original tables that is not present in this result, in the filtered result. And these rows we add back in, we match them up with null values. So here, for instance, is the same inner-join we just did but made into an outer-join. And whereas the inner-join left us with just two rows, this outer-join adds back in a third row with the cat-fluffy matched up against null values in the two dog's columns. Why does the outer-join do this? Well, recall in our original tables we had three cats, fluffy spot, and princess. In our dog's table we had two dogs, princess and spot. And both of those two dogs, princess and spot, were included in the result of the inner-join, so the outer-join doesn't need to add them back in. They're already represented in the table. The cat-fluffy, however, the inner-join was missing that, so the outer-join adds it back in. The inner-join left us without that row from one of the original tables, so we add it back in, but then we match it up against null values in the columns from the other table. In this case, the dog's table columns. Now, there are actually three different kinds of outer-joins. There's what's called a full outer-join, which ensures inclusion of all rows from both tables, both of the input tables. There's what's called a left outer-join, which ensures inclusion of all the rows from just the left table, which is to say if we write A outer-join B, the so-called left table is A, because it's the one written on the left. And conversely, a right outer-join ensures inclusion of all rows from just the right table. So if we A right outer-join B, that's ensuring inclusion of all rows in just table B, not A. In common parlance, we usually don't say left outer-join or right outer-join. We just say left-join and right-join because the outer part is implicit. There's no such thing as a left inner-join or a right inner-join or a left or right-join of any other kind. They're always outer-joins. For a full outer-join, you can't drop the full part. You have to always say full outer-join. When it comes to inner-joins, and I should say also cross-joins, when it comes to inner-joins and cross-joins, they are both always commutative and they are associative. And commutative recall refers to an operation where it doesn't matter the order of the operands. So if you write A inner-join B, it's the same as B inner-join A. Just as A plus B is the same thing as B plus A. An associative operation recall is one in which when you chain a succession of these operations together, it doesn't matter in which order you do them. So if you wish to inner-join together A, B, and C, or cross-join them, it doesn't matter if you join A and B first or B and C first. Or for that matter, we could join together A and C first and then join the result of that to B. Because inner-joins and cross-joins are associative, it doesn't matter. Outer-joins, though, are a different matter. While full outer-joins are commutative, left and right outer-joins are not. So A left outer-join B is not the same thing as doing B left outer-join A. And when it comes to stringing multiple joins together, outer-joins of any kind, full left or right, are not associative. So when you write A outer-join B and then outer-join the result of that to C, that is different than first joining B and C together and then joining the result of that to A. So in short, with outer-joins, the order of joins matters in a way that it doesn't with inner-joins. A final thing to note about joints here is that it's actually possible to join a table with itself. So here, for example, we take our 2 by 2 dogs table and join it with itself in a cross-join. And what we get back is a table with four columns and four rows, because 2 plus 2 gets us four columns and 2 times 2 gets us four rows. So that's all the gist of what you need to know about joining. There are a few more interesting things, though, we can do in our queries, such as, say, filtering four distinct rows. In other words, getting rid of duplicates. In this table, for example, we have two rows where the values in each column are all the same. It's two cats named princess with six lives and long hair. If our query filters four distinct rows, then these two rows consolidate into one. In other cases, it may be useful to consolidate columns that only match in certain columns that aren't total duplicates. So here, for example, if we group on the name column, we're consolidating duplicate names, leaving us with just a name column. Since we're grouping on name, it doesn't make sense to include the other columns because the values in those other columns might be different. The hair length, for example, we have three princesses that are getting consolidated into just one princess, but two rows with the name princess have long hair while the other one has short. So the question is, well, if we had a hair length column here, what would it be? Would it be long or short? And the answer is, well, neither would actually make sense. We have conflicting values. It is possible, though, to group on multiple columns. So if we group on both the name column and the hair length column, then we get back a table with the name and hair length columns in which the name and hair length together form a unique pair. So we have two rows with the name princess, but one has the hair length short and the other has hair length long. Likewise, if we were to group on lives and hair length, we'd end up with a table with the lives column and the hair length column with two rows with a lives value of six, but one with hair length long and the other with hair length short. When we group columns it's then possible to use what are called aggregate functions. An aggregate function produces an output value by taking in all the values to get grouped together. We can then produce these output values to produce totally new columns, columns which aren't in the original table. So here we have a table of car inventory specifying a car make, car color, and then the quantity of cars with that make and color. So for example the top row here says that we have 15 blue Toyotas. If we were to then take this table and group on the make column that leaves us with a table of three rows, Toyota, Ford, and Honda. What's new here is that we're using the aggregate function sum to produce a second column and the sum function is taking the value from the quantity column and what the sum function does as you might imagine is sum together all the values from that group. So the Toyota group here has the quantities 15 and 10. So you add those together and you get 25. The Ford group here has the values 3 and 20 which you add together and get 23. And then the Honda group here just has the one row so it's just the value 9 added to nothing. So we just get 9. So you can see how this is useful. Given our table of inventories of car make and color we've gotten a table that tells us how many cars we have in total per make. For another example we're taking the same inventory table but we're grouping on the color column and we're using the max function instead of sums. We're taking the max of the quantity and what the max function does is it takes all the values from the group and it returns the largest one. So in the blue group we have the quantity values 15 and 9. 15 is the larger so that's what gets returned. In the red group we have the quantity values 10 and 3. 10 is obviously the larger so that's what gets returned. And then in the green group there's just one value 20 so the max function just returns 20. I should state at this point if it hasn't been obvious but these two functions these two aggregate functions sum and max only work on numeric values. It wouldn't make any sense on say a string value like Toyota or Ford. Now last thing to say about aggregate functions is you actually can use them when you don't do any grouping it's just that it treats the whole table as a single group. So here if we query our inventory table and tack on a sum quantity column notice the value for that column in every row is 57 which is the sum of all of the quantities together. Now aside from that one exception the aggregate functions always work on groupings of rows. There are non aggregate functions however what we might just call value functions and also we can use a number of operators to produce new values. So here for example we have a table with widths and heights and in querying this table we're tacking on two columns. One which is the multiplication of width and height together and another column which uses the square root function to get the square root of the height. So in the bottom row here for example the width times height column has the value 16 because the width is 2 and the height is 8. Multiply this together you get 16 and the square root height column value is 2.828427 because that's the square root of 8. So again what's going on here is we're using operators and functions provided by the database to produce new values, to produce new columns using values from other columns in the same row. Now the question is why do I need my database to do this stuff for me? Can't I just get the data from the database and then do the operations myself in my own code? And the answer is yes you can. The issue though is whether or not it's done more efficiently for your purposes in the database or whether it's easier to do it in your own code which is more efficient. Depending upon exactly what you're doing that answer might change. Generally I would say it's better to let the database do as much work as possible because the database is generally optimized for doing these sorts of things on large sets of data. Does this always hold true in all cases though? Well it probably doesn't so you'll probably find there are cases where you're better off doing the work outside the database. Now I did stress that in the relational model the order of the rows of our tables is not a facet of the data. If you take any table and you mix up the order of the rows it's still the same table. Of course though when our applications query data from the database and we get it back is this result set a bunch of rows. Quite often we want those rows in some kind of order because say we're going to present them in a certain order like say alphabetical or something. So relational databases actually do allow us to request our queries to come back in some certain order. Even if the data itself as stored in the database has no order when we get it back when we query it then we want it in some order. We want it to be presented to us in some order. So for example here again we have our table of cats and if we want to query this table but get it back in some certain order we specify which column we want to sort on. In this case we specify the lives column and we specify whether we want this order to be ascending or descending. Ascending very confusingly and ask backwards means that the values get larger as we go down the table. So as we descend the table the values get larger. It makes no sense. It's totally backwards. But we say that the values are ascending in this order because from the perspective of the program making the query and getting back the results you know in our code we get back this list of rows. And in the list we want the first thing to be the smallest value and we want the values to ascend to get bigger as we read through the list. So that's why it's called the ascending sort because the values as we go downwards they get bigger and so as we go from left to right from first to last in our list they get bigger. That's why it's called ascending. A descending sort as you can imagine is the precise opposite. The values get smaller as we go down the rows as we go from the first row to the last. And the last thing to say about sorting is you don't have to sort on just numeric values. You can also sort on text values. And as far as text is concerned values lower in the alphabet, later in the alphabet are considered greater values. So if you do an ascending sort on a text column that's effectively sorting alphabetically. A descending sort would be reverse alphabetical.