 Let's continue our discussion about databases. We've already talked in class about what a database is and how that's different from a data structure. But today I'd like to focus more on why you would want to use a database and how you would use a database. So first, let's discuss the why. Of course, the biggest reason is that we have lots and lots and lots of data that we want to store and manage and we need an efficient way of doing that. Databases allow us to do that. It's also nice to be able to organize all this information and to give structure and that's what a schema is for. And finally, it's nice to be able to extract the information out of data that we want to make sense of. And so having the ability to query the data or ask questions about the data is a huge selling point for using a database system. I think one of the most important topics as you read through the chapter this week is check out page 408 where it talks about data independence. The ability to change or reorganize a database without having to go and change all the applications built on top of that is a huge selling point for using a database system over just regular files. And so we'll talk more about that in class on Wednesday, but these are sort of the main reasons as you're reading the book that you'll want to look out for. And as I discussed in class, there's a lot of different technologies for using a particular relational databases. So we've got the commercial ones like DB2, Oracle and SQL Server and also all the open source ones like MySQL, PostgreSQL and SQLite. We'll learn this week how to use SQLite in the lab, but there's a difference between the database management system, which is one of these software tools, and a database, which is the abstract concept of data. So let's talk a little bit about how data is organized in a relational model. Of course there's other ways to organize information like XML and JSON and other data models like the object-oriented one discussed in the book. But by and large the focus this week will be on the relational model. So here I have a table or a relation, more formally speaking, of a few movies. So we've got Gone with the Wind and What Year It Was Out, How Long It Is in Minutes and the Genre. And something to note is that the structure of relational data is relations, or in other words, tables. And the columns define the roles that all the individual pieces of data play. It also has a bunch of operations that we'll learn this week. There's an entire algebra for this, so operations like selecting, projecting and joining data. And we'll define those here in a minute. And also the relational model allows us to add constraints to a database, such as, okay, the genre of a film must be action, comedy, drama, and so forth. We can just give a set of terms that applies to that column. Or you can say things like, no two movies can have the same title in a year. That's called a key constraint in terms of database theory. And just being able to constrain the values of rows and columns within this table gives a lot of flexibility for having a consistent database. Before we jump into the actual operations, let me just say a note on terminology. A lot of terms and databases in other fields of computer science find their roots in mathematics. So in particular, when you study set theory in discrete math, like in CS227 at JMU, you'll learn terms like relation, attribute, tuple, and the layman terms for these words are like table, column, and row. So I might use those terms interchangeably and you should just sort of memorize, at least put in your mind, okay, the tables relation, the columns and attributes, the rows of tuple, and so forth. And just to make things even more exciting, the same thing applies to the operations we put on data. So mathematically speaking, you can project data, you can join data, you can select data. But then, in the design of the SQL query language, those terms were renamed to select from and where. So there's a little bit of confusion, especially right here. When you say select, do you mean a selection or projection? And I'll try to be consistent and where I say that, but you should note, especially as you're reading through the textbook, and it talks about selections. It's actually referring to the where clause of an SQL statement. So make sure you don't get confused by that unfortunate, you know, double naming of terminology in the field. Alright, so moving forward, let's talk about SQL. This is the structured query language. It's a different type of programming paradigm from the Python programs we've been writing so far this semester. The reason why is a declarative language. You basically say what you want. And the system will figure out how to get it. So for example, I might write a query like select title from movie where rank is less than or equal to 10. And then the database system will run that query, which will be a bunch of for loops and this statements internally and give you the results. And so you just have to know, okay, I have a bunch of attributes or columns that I want from one or more tables and what conditions applied to the results. Just a note on style, you'll notice an SQL queries that typically programmers will put each clause select from or where on a separate line. Although there's no requirement in the language to do so, it's just a lot easier to read. And the other convention that you'll see often is that the keywords in the language like select from and where which light up in blue and an editor will be in all capital letters. And everything else like the names of tables, the names of columns and so forth are typically all lowercase letters. That just makes it easier to read. You can tell the difference between a reserve word and a variable name that somebody made up. So I'd like to show you a little bit of an example with SQL hands on and I would encourage you to follow along in on your own computer and do these same exercises that I'm doing in the video today. You can click this link right here to download file example data. And I'll just go to my desktop here and extract that. So of course, if you're on a different computer, you'll just have to figure out how to unzip a file. Just push the extract here button. And inside this folder here I have a number of files, hero, movie and villain. These are called CSV files or comma separated values. It's basically just a common file format that any spreadsheet program or any database system should be able to import. If I just open one of these CSV files in a text editor, you'll see all it is is just a bunch of text. And let me just search for the comma here. Every field in this data is separated by a comma. That's what it means comma separated value. And so you can see this first row of text here is the schema or the column headers. This is a database or sorry, a data set of movies. And I guess they're the top 280 movies at one point in the internet movie database or imdb.com. So we've got the rank, the title of the movie, the distributor, what year it was released, the gross income at the time and also the adjusted gross income which is how much money would that be worth today. Of course Gone with the Wind that came out in 1939 grossed almost 200 million US dollars which is quite a bit of money for back then. That's why you have the adjusted column. You can compare movies across history that way. So anyway this is what the data looks like just in a flat CSV file. There's a couple other files in here like hero has I think the top 10 heroes and movies voted by different users on imdb and also we have or sorry the top 50 villains. So you can look at heroes and villains and movies. I'm going to show you how to do that in a database system using SQL. For the rest of the video I'd like to switch gears and run Firefox because Firefox has a really nice add-on available for doing SQLite in a graphical manner. So if you run Firefox on your own computer you can click on the settings menu and go down to add-ons. And if you don't already have it installed you can search for SQLite sqlite push enter. And the top link here should be the SQLite manager. Again this is a freely available graphical user interface for SQLite so I'll just click the install button. And in order to run it for the first time you have to restart Firefox so I'll click on that now. Now finally the easiest way to get SQLite up and running if you go back to the extensions page you can see how it's installed here. Let me minimize my video here so you can see what I'm doing. If you go to the settings menu and go to customize you'll you can see all these additional features like SQLite manager. I'm just going to go ahead and bring that into the page here so it's part of my settings menu now. So let me click exit customize and I'll go ahead and close this tab. So now I'm back to the start page. And now if you go to by the way this icon is called a hamburger icon because you've got you know a bread and meat and bread. Anyway it's a dumb geek joke here. But that's literally the technical term for the settings icon. So anyway let me go into the SQLite manager and that will open up a new window now. So let me go ahead and make this window the size of my screen here. And now you're able to browse all the SQLite databases that come with Firefox that will be part of the lab on Friday. But what I'd like to do today is show you how to create a new database. So there's a bunch of menus here that you can explore to see what all these things do. We're just going to make a new database and by default when you do that it asks you for a file to store the database in. Because this is just a demonstration I'm going to create a new in memory database and that way I can just skip the step of having to save the file on my disk. So I'll make a new in memory database and now I'm going to go to the import feature under the database menu and I'm going to import those files that I downloaded previously with the movies data. Okay and so we'll bring up the import wizard. I'll go ahead and select a file and they're right here under my recently used. So I'll go ahead and open up the movie file first and you'll notice here it automatically says okay this is a CSV file. The name of the file is or the name of the table will be movie. I'm going to check the box that says first row contains column names because that way it knows what all the names of the columns are. They're already in the file and then I'll go ahead and push okay. After pressing okay you get this little message here. Do you want to modify the table and that gives you an opportunity to take a look at the columns that it detected from the CSV file and specify the data type for each column. If you want to be able to write queries over the data type you have to add this structure, this schema. So I'm going to say rank is an integer, title is text, distributor or the movie company is text, the year is also an integer and the gross and adjusted gross income are numeric values. I'll just pick double because that's the default data type for doing a floating point number that isn't just an integer. So let me go ahead and just say okay on that and it says are you sure you want to do this? Import 280 records imported. While I'm here on the import screen let me go ahead and grab those other two databases that I mentioned earlier or rather the other two tables. So I'll go here to select file, we'll do hero first, push okay and do you want to modify? Yes, okay so the ID is an integer, hero is text, actor is text and film is also text. Alright and it says are you sure you want to do this? Yes, okay and let me go ahead and select villain. Again on all these make sure that the box here is checked first row contains column names. I'll push okay on that, yep let's take a look ID is an integer, villain is text, actor is text and film is text. And are you sure you want to do this? Okay so I guess it's pretty easy to import some CSV data. Now the interesting thing is we can start using this tool on it. So if I go here to the structure tab, all these tabs are how you're going to interact with the data. So the structure tab shows you not surprisingly the structure or in other words the schema of the data and as I click through the table you can see what all information it has internally stored about the schema. It also shows you the create table statement which is the SQL syntax for creating this type of table in another database system. Let me go here to browse and search. Now this tab allows you to just see the contents of those tables quickly and the different colors show you the different data types. So for example green is integer and cyan here is text and the darker green is the floating point number instead of an integer where it double as I specified it. And so that's just a quick way to see what all data is there. But the more interesting one is execute SQL. So here is where I can actually write the example code that's in the book against these data. So I might say select star for a movie for example. Click the run SQL button. There it is. And then I might add a where clause to this select star for movie where rank is less than 10. So let's look at say the top nine movies right. If I hit run SQL now I filtered it down like that and of course instead of this star here this asterisk I could say select rank title here. I can just give it a list of columns that I want to project out of that data set. So if I hit run SQL you can see the result now just has those three columns. So that's the basic idea is I can either just browse and search the data or I can run arbitrary SQL statements on it and I of course can import other CSV files into this tool as well. Let me show you an example of a join query where I'm actually looking at two different tables. So I might say something like select star from hero. Let me just now and often when I'm writing SQL statements I like to just write a little bit of code and run it to make sure it works and see where I'm at. That's a general principle actually in programming is never go write a whole bunch of code before you test it, compile it, see what's happening. It's easy to get lost and not know how to fix it at that point. Here I have the heroes table and of course it just has four columns. It shows me the hero name, the actor's name, like Harrison Ford was Han Solo for example. He was also Indiana Jones for example. And let me actually change this to villain now, B-I-L-A-I-N. And so you can see in the villains database I've got some, I'm not sure I know all of these, I haven't seen all these movies but you'll probably recognize, yeah I know the Wicked Witch of the West for example, that's a classic. Alright now look at the hero comma villain in the result. This is called a cross product. It's basically going to create, notice how there's 2, 2500 rows that were returned. It basically took every row in the first table and lined it up with every single row in the second table. So here I've got hero Atticus Finch and he's with every single villain now in the other table and you can scroll through the results and see that Cartesian product or that cross product. What you really want to do in order to get a join, now the joins that are discussed in the textbook always have these conditions right? So I might say something like hero.film is the same as the villain.film. I'm hoping this is going to work. So now you can see if I run that query it actually starts lining up data that's related to each other right? So I've got you know Jody Foster and Anthony Hopkins are both in the silence of the, well I can't read that lambs I should have known that right? Or you know in Batman we've got Batman is the hero and the Joker is the villain right? So often you know if you forget to put a condition on that from clause right? If you don't have something in your where clause that actually links the data up together it's going to assume there is no condition and all the data matches up with all the data. So we'll explore some more of that on Friday in the lab but I wanted to at least put this tool in your hand so you can see what it is and also try to make sense out of what you're reading in the textbook with relational operations. So with that we'll see you tomorrow in class and let me know if you have any questions.