 When we create a table, we have to specify the data types of each column, and in SQL there actually is specified by the standard, a certain set of data types that must be supported. Unfortunately, this is one of the areas where the databases, the very SQL databases are most divergent because generally there's a lot of overlap because obviously every database needs a type 4 text, a type 4 numbers, type 4 binary data and dates and so forth. But even within that small domain, there's a lot of variation at the very least even in just what to call the data types. So you'll have examples where these two databases have the same data type that just give them different names, which is quite annoying. But if forced to produce a list of data types available in most of these databases, this is what I'd give you. First off, of course, we need number types. So we'd have a type for integers here called integer, which is usually a fixed size integer, an integer of a certain number of bits, whether that's 32 bits or 64 bits, but depend on the different on which database we're talking about. And then you'll usually have some number of float data types here float and double precision. And actually these are the names prescribed by the SQL standard. And yes, when you write double precision, you have to write out double precision. You can't just write double like you would in C or Java. And those two, of course, are floating point types where float is 32 bits and double precision is 64 bits. And then numeric is the name usually given to the decimal type. And for this type, we actually have to specify in parentheses first a precision and then a scale precision, meaning the number of significant digits and scale effectively meaning how large the magnitude of the numbers can get. How many zeros can you put at the end? For text data, SQL databases usually have two types, one called char, the other called varchar. For both of these, we specify in parentheses a number n, which is the maximum number of characters. The difference being that in a char, each value always takes up that full n number of characters, whereas in a varchar, the database may actually truncate. By its own discretion, it may actually store the data with fewer than n number of characters if the data doesn't need the full n characters. So you can have a varchar and say like 4,000, but then you have just a short couple words in there for the value and it'll be stored not as a full 4,000 characters, but just as however many are needed. And then for dates and times, we have date, time and timestamp, date being just the day of the year, time being just the time within the day, and timestamp being both together. And finally we need some data type for storing just arbitrary binary data, and usually that data type is called blob as in binary large object. So if there's some binary data like say an executable file, which you want to store in a database, then that has to be stored in a column, which has been declared to have the data type blob. Again, what data types are available exactly and what they're called, that differs significantly from one database to the next. This link at the bottom here takes you to your Wikipedia page with a good chart showing all the data types available in all the popular database systems and what they're called. So now we can finally get into looking at some real SQL code. First off, we create tables with a statement beginning, create table, and then we specify the name we want for this table, and then in parentheses we list columns, the names of the columns and their data types. And at the end of the statement, as with all SQL statements, we should put a semicolon. So this statement here for example, we're creating a table called cats, we're giving it four columns, one an ID column of type integer of course, and then a name column with varchar of 100, that is its string values up to 100 characters long. Then there's a lives column of type integer, that is how many lives does this cat have left, and weight, the weight expressed as a float, so like 12.0 or something like that. You may notice that I'm writing all of the SQL reserved words, the special words in the SQL language, I'm writing them in all caps, which is a common practice, but actually not required because the reserved words are not case sensitive. We can write them in any case we want. As for the identifiers, the names that we ourselves are creating, cats, ID, name, lives, weight, those things. By default in most databases, they're not case sensitive either. Some databases though actually have a configurable option where you can turn on case sensitivity for table names and column names and so forth. Now in our example, we created a primary key column called ID in a type integer, as we generally should for basically all tables, but recall what we really want for our primary keys is for the database to auto-generate these integer numbers for us, so we don't have to remember which IDs we've used and which we haven't. And the way we do this is we declare this column, this integer column, as a primary key with the reserved words primary and key. And now when we add new rows to the cats table, we don't have to provide an ID, it gets auto-generated for us and the database will ensure that the ID generated is unique and not otherwise used already in that table. So that's how to create a table. To remove a table from the database, we use the drop statement, and you just write drop table and then the name of the table. So here drop table cats will remove the cats table from our database and be clear that this will destroy the data in our cats table, so any data we've inserted will get lost. If you wish to modify an existing table by giving it a new column, we can do so with the alter statement. We write alter table, name of the table, and then add and then the new column with both the name for the column and the data type afterwards. Likewise, we can remove existing columns by writing alter table, table name, drop, and then the name of the column, just the name, not the data type. And finally, we can actually change the data type of an existing column by writing alter table, table name, alter column, and then the name of the column followed by the new data type. And understand that in most databases when you modify a column like this, changes data type, you actually wipe all the existing data in that column. So it's effectively like you really just deleted the column and then added in the new column with the new data type. I believe in some databases though for some kinds of conversions, like say converting from an integer value to a floating point value, I believe in some cases that it may actually do a conversion if such a thing is possible, depending upon what you're converting to and from. But really if you find yourself really needing to modify an existing table while preserving the data already in a column you want to change, well then you just extract all the data from that column into your application. Have your application do the conversion that you want, assuming one actually makes sense and then stuff the data back in. So there's always ways around this of course. And changing your existing tables is not something you do on a regular basis. That's like a design change of your application. It's not something done in the normal course of business. To insert data into a database means to add a new row into a table. So the insert statement, we write insert into the name of the table and then the reserved word values followed by a pair of parentheses with a list of values separated by commas inside. So here we write insert into cats followed by the reserved word values and then in parentheses a string mittens, strings in SQL are always enclosed in single quotes and then the number 9 and 12.0 and you may be wondering well which columns do these values get assigned to and the answer is it's the same order as when we declared the table itself. So when we declared the table we had the four columns actually. First we had the primary key but that value is being auto generated by the database itself because it was declared as a primary key. And then after that is we had the name field, the name column, and then we had the lives column and finally the weight column which is a float value. So this relies upon us remembering the order in which we declared the columns. So of course remember in a table there's no concept really of order between the columns except for this purpose. There actually is an alternate syntax where you can specify the names of the columns so you don't have to remember what order you originally declared them in but we won't cover that. It's a trivial variation but I'll leave it to you to look it up. To remove rows from a table we use the delete statement and write delete from the name of the table, the reserved word where, and then the predicate, some boolean expression basically. So here for example we're deleting from the cats table every row in which the lives value is less than three. To modify data in existing rows we use the update statement and we write update the name of the table, reserved word set, and then one or more column values which come in the form of key value pairs effectively. The name of the column equals sign and then the value to sign to that column and we separate these all by commas if we have more than one. And in this form we're not specifying any predicate to select rows so this will actually update every single row in the table. So here when we update the cats table and we assign five to the lives column and 9.0 to the weight column those two values will become the lives and weight values in every single row of the table. If we wish to update only certain rows in the table we append a where clause which specifies a predicate. So here we have the same thing but with a where clause with the predicate name equals the string fluffy. And this is confusing because in the set clause the equal sign is used as an indicator of like assignment whereas in predicates the equal sign is used as the conditional operator. It's performing a test returning true or false to pin upon whether these two values are equal. This is the equivalent of what we would write in say Python or Java as two equal signs rather than just a single equal sign which in the vast majority of programming languages is used for assignment not for a quality test. But in any case in this update statement then we are updating the value of the lives and weight column but only in that row or rows where the name of the cat is equal to fluffy. To perform a query we use the select statement which in its simplest form is written select then one or more column names listed separated by commas the reserved word from and then the name of the table. So here are two examples first we're selecting the columns name and lives from the table cats so we get back a table as a result of our query which has two columns name and lives and has all the rows of the cats table but just with those two columns. In the second example here though we don't specify any column names we just use the special symbol asterisk which is used as just a shortcut for saying I want all of the columns from this table. So select asterisk from cats will return effectively the entire cats table with all of the columns. Now in the SQL statement here where it's calling for us to specify a table we can actually specify a table produced from a join because well a join does produce a table right so we should be able to query such tables. So here we're selecting asterisk all columns from the join of cats and dogs. So this first query will return the cross join of cats and dogs with all of the columns. In the second example it's the same deal except instead of a cross join we're doing a left outer join. Remember commonly we call left outer joins just left joins. We could actually write left outer join here and that would also be acceptable but SQL most commonly we just write left join. And then finally the last example again it's the same deal we're doing a left outer join between cats and dogs. Recall though that outer joins and inner joins unlike cross joins may have a predicate which is applied to filter the rows returned by the join. So here this is a left outer join with the predicate specified in the on clause of cats.lives equals 6. And cats.lives here is the notation we use to specify the lives column in the cats table. And this may be important because the cats table and the dogs table there are cases where those two tables we're joining together might have columns of the same name so we have to use the syntax of writing the table name dot and then the name of the column to specify precisely which table we're referring to. So the join that's performed here remember an outer join is first effectively an inner join so we're inner joining cats and dogs together and then filtering on the predicate filtering to keep only those rows where cats.lives equals 6 and then because this is an outer join a left outer join we are then going to add back in any rows from the cats table which by that filtering process are no longer present after the inner join and so the outer join adds those back in matching them up with null values in the dogs columns. I should note here if you're wondering the syntax of SQL is freeform that's why we have semicolons at the end of our statements so the fact that we're indenting the on clause and putting it on its own separate line that's just basically a matter of style. And to make what's going on in the syntax clear we can actually put parentheses around our joins so it's clear that it just produces one single table from which we are then selecting columns. So the parentheses in these three examples aren't really doing anything they're just making it more explicit what's really going on. So we're performing joins here and then using the product of those joins as the tables upon which we are performing select statements. Now in any select statement we can add an optional where clause which is another predicate used to filter the rows. So here we're taking our last example and adding on a where clause that's filtering for the rows in which cats.name equals dogs.name so those rows in which the cat and the dog have the same name and you should be absolutely clear about when this predicate is applied especially in relationship to the predicate of the join itself. The join with its predicate that is performed first and then we are selecting columns from that product and then we are applying the where predicate where then we're filtering the rows with this additional predicate. Now if you wish to group your query on one or more columns you add a group by cause which is simply written group by the name of the column on which you wish to group. Again here we're qualifying the name of the column with the name of the table just in case the dogs also happens to have a column with the same name. And if you then wish to filter the rows after the grouping you use the having clause and what's special about the having clause unlike the where clause is that in the having clause you can use aggregate functions which is not the case with the where clause in most SQL databases. So here we're grouping cats by age and then we're using the aggregate function min to get the lowest weight out of each age group of cats and we're filtering those rows in which the minimum weight for that age of cat is greater than 10. Now again you should be very clear about when these various predicate clauses, the on clause, the having clause and the where clause you need to be clear about when exactly they get applied when their filtering gets done. And the on clause is it's a predicate applied in a join so it's performed during the join you can say the having clause is then applied immediately after grouping assuming there is any grouping and then the where clause is the final filter on all the rows returned by the select so the where always comes last. Now in our various SQL statements in our update statements, our select statements, our delete statements and so forth there are cases where the syntax is calling for us to provide some table or some value or some set of values and well normally when we provide say a value we do so by just writing the value ourselves in our SQL code it's possible to use a subquery that is to use a select statement within another statement and then if that select statement say returns a table then we can use it in a place where a SQL statement is expecting a table or if a select statement returns a table with just a single column then we can use that select statement in place of where SQL expects us to provide a list of values, a set of values and finally if a select statement returns a table with a single row and a single column that's effectively a single value right so we can use such a select statement in place where SQL is expecting us to provide just a single value. Now to provide an example of a subquery I'll actually introduce first a new operator the in operator which takes a value and then after it a set usually expressed as a list of values in parentheses and the in operator is a conditional operator which evaluates to true if the value is found in the set and otherwise it returns false so our top example here imagine let's say this is a where predicate and it tests whether the value from the column cats.lives is found in the set of 3, 6 and 5 so if cat.lives is equal to 3 or equal to 6 or equal to 5 this will return true otherwise it will return false in our second example here we're expressing the value not as a column name but as just a fixed value the value 4 but we're expressing our set not as a fixed set of values which we write ourselves in the code but as a subquery it's a select statement which is returning a table with a single column the column age from the table dogs and so this in operation here will evaluate to true value 4 is found in the column age of the table dogs aside from in we have a number of other operators which deal with sets and these are actually combinations of the existing comparison operators the equality operator the less than operator greater than less than or equal to greater than equal to etc it's those operators but then followed with either the reserved word any or all that turns these comparison operators into comparison not between one value between a value and a set and so in our top example this is an equals any operation with the value from the column cat dot lives testing whether it is equal to any value in the set 3, 6 and 5 in truth and equals any operation it's really just the same as an in operation swapping out equals any for in here would do exactly the same thing in the second example we have a less than or equals all operation which is asking is this value less than or equal to all of the values in the set so is 5 less than or equal to all of the values in the set returned from the column age in the table dogs so assuming the ages in the dogs column are say 7, 8 and 7 this will return true because 5 is less than both 7 and 8 if the ages in the dogs column though includes any value which is less than 5 then this would return false now subqueries can be pretty tricky to understand and I know I'm going over them quickly I'm kind of glossing over them but I will make one last point which is very important and that is you need to understand when subqueries are really performed in this example here we have a select statement which is selecting all the columns from dogs but then we're filtering those rows with the predicate dogs dot name in and then a set provided by a subquery and this subquery you will notice is selecting the name column from the cats table but then it has its own predicate its own where clause which is filtering on a predicate using not just a column from cats but a column from dogs and this doesn't make any sense if you assume that the subquery inside the other query is performed first which is kind of sensical because you would think just like in sub expressions expressions contained within expressions that you would evaluate inside out but that's not always the case with subquaries what's actually going on here is that the subquery in the where clause is being evaluated multiple times in the containing select statement the outer select statement we are filtering each row of the dogs table and this subquery in the predicate is actually being re-evaluated for each row of the dogs table and that explains how we can use a column from the dogs table in the predicate of the subquery even though the subquery itself doesn't involve the dogs table at least not in the from clause so once you understand when the subquery is actually being performed then you can understand how it works the remaining question though is whether this is a good way of getting the results because performing the subquery for every row of the dogs table sounds really quite inefficient and the answer is that in this case you probably wouldn't do things this way it's much more obvious if you simply use the columns and cats tables together and then use an on predicate to filter for the same condition of cats.name equals dogs.name this gives us the same results though actually to make it exactly the same we'd have to specify the columns we want with a name because actually we're selecting for all the columns now in both the dogs and cats table joined together which is not what we originally wanted we just wanted originally the columns from the dogs tables but otherwise this gets us the same thing being more efficient than the other really depends upon the database because databases generally try and do very aggressive optimizations such that they might take your query and then produce a optimization plan a query plan that does something quite different to get the same result so it could be the case in certain databases that these two queries actually trigger really the same internal actions and therefore take the same amount of time to perform