 So, very first thing, what is a database? A database is basically just any kind of system that's designed for storing lots of data, and typically it's a system made available on the network which handles many incoming requests and those requests can overlap. Most often the database has been given a defined schema, basically an imposed structure on the shape of the data, and usually the database has some mechanisms that ensure the structure gets preserved that no one can insert, modify, or remove any data in a way that will violate the schema. To these ends, handling many overlapping requests and preserving a schema, preserving a data structure, most databases handle requests in what are called transactions, which is basically a defined independent unit of work that is enacted upon the data. So one client contacts a database and conducts one transaction while simultaneously perhaps another client is contacting the database and performing another transaction. And the idea is that the database is somehow making sure those two separate transactions don't interfere with each other. Ideally transactions conform to what's prescribed by the acronym ACID, which stands for Atomicity, Consistency, Isolation, and Durability. What atomicity means, what it means for a transaction to be atomic, is that the entire transaction is performed all or nothing. So either all the modifications made in a transaction are committed to the database, otherwise all the modifications are rejected, and if rejected the transaction, then all of its modifications, it's like they never happened. So in the course of a transaction, something might arise that will require the transaction to abort effectively, and so all the changes made up to that point in the transaction, they have to get rolled back, they have to be undone. And then of course, the client making this transaction should be notified that hey, we aborted your transaction, so try again later or something. Now you're probably wondering why would a transaction get aborted? Well, there are several reasons this might happen. One reason is because something is happening in another transaction which then interferes with your transaction and one of them has to be aborted. That's a possibility. It could because there's a power failure or some sort of crash or some kind of internal error in the database itself. The important thing in all these circumstances is that whatever work, whatever modifications are being performed under transaction, they should be all or nothing. So we either take it all or we drop it all. What consistency in asset means is that when a transaction completes, it should leave the database in a so-called consistent state, meaning a state consistent with all of the rules, all of the constraints imposed in the schema. So if the schema says that the data should conform to such and such structure and the values should conform to such and such rules, every transaction should leave the database conforming to those rules. Isolation in asset refers to the property that transactions should be totally independent of each other. They shouldn't interfere with each other in any way. Or actually to put it more accurately, overlapping transactions should not interfere with each other. Obviously, if one transaction updates the value of one piece of data, you want that new value to then be subsequently read by any subsequent transactions which read that same piece of data. The issue with concurrency is when you have multiple actors all trying to act on the same data at the same time, you have effectively, you have different hands all reaching into the same bag, not knowing what the other hands are doing. And without proper isolation of transactions, this could lead to situations where, say, you have two overlapping transactions and while one transaction is updating two pieces of data, the other transaction is trying to read those two pieces of data. But because of happenstance of timing, the transaction reading the data gets one of the values updated but not the other one. And depending what your application does, this could be very undesirable. You might have written your code such to assume that when you update multiple values that any one reading the database is going to see all of those updates as a whole rather than getting a mix of new updated values and old out-of-date values. So again, the ideal is complete isolation of transactions such that when a client connects the database and makes a transaction in the course of that transaction, it's as if that single client has the database all to itself and no one else is making modifications or reading data. Now the trade-off there is that if you have completely isolated transactions, the only way to achieve that really is to serialize all the transactions such that while one client is making a transaction, all other clients have to basically just wait. And so obviously from the perspective of the clients that are forced to wait, the performance of the database has degraded because they're sitting around and waiting for it. So in practice, databases typically offer different levels of isolation. It's a configurable option of how much isolation do you really need. Do you need total isolation or are you willing to relax the rules a bit to allow for more overlapping transactions to get on with their work? These different isolation levels are something we'll talk about much later. Finally, the durability of a transaction simply refers to the property whereby once a transaction completes or commits as we say, then all of the changes it's made to the database, they get preserved. The data should persist and it should persist even in the event of say power loss or crashes. Now of course systems do go down, but the point of durability is that in such eventuality, the database should be able to be restored to its last good state. So it's these four properties of transactions, animicity, consistency, isolation, and durability that should help convey why it is desirable to use databases, why rather than have your application store data in just some makeshift format which your application then writes to files, as soon as you recognize that your application makes use of a lot of data, it makes sense to then pass that job off to a database so that you don't have to in your own code provide for animicity, consistency, isolation, and durability because providing all those things is really damn tricky. The standard file system itself does not offer those features. In this unit we're going to specifically learn about relational databases. Relational here refers to what's called the relational model of data which was devised by a guy named Edgar F. Codd in about 1969. The relational model for structuring data is one of a few alternatives. There's the so-called hierarchical model, a network model, an object model. The relational model however is by far the most dominant. The large majority of databases and use out there are relational databases. And in fact the term rdbms stands for relational database management system. And this term rdbms refers to the software itself. And the reason it's called a management system rather than just a database system is because you can have multiple separate databases all under control of one program, of one database system. In common parlance though we just usually refer to the software not as an rdbms or a management system, we just say the database usually. Now SQL more commonly pronounced SQL and standing for structured query language that refers to a standard language used by most databases. It's the language which clients use when they talk to the database server. The term query refers to a request for data so when you send a request to get data back from the database that's a query. SQL however is not restricted to this query, it's also for requests to insert data or to modify data in the database. So calling SQL just a query language is a bit of a misnomer. Now SQL as I said is a standard it was first introduced in about 1974 and it's used by basically every relational database out there. However the issue is that not all databases use exactly the same SQL language they all have their own variations. So in practice the SQL code you write for one database generally has to be rewritten to work with another database. Now there are dozens of different relational database management systems out there these five here though I would say are the most popular. First Oracle Corporation is one of the big software companies around and it's it built itself virtually entirely on its database product. For many many years all through the nineties at least Oracle was regarded as basically the serious database of choice for the enterprise and Oracle charged for licenses of its database accordingly. Currently Oracle licenses go for something like tens of thousands of dollars per CPU license meaning if you have a database server with say eight CPUs you have to pay for eight separate licenses to run it on that server. So Oracle is a very very very expensive. On top of being by far the most expensive option is also by considerable margin the most complex. Microsoft's database which they call SQL server I would say is also one of the more complicated options and also one of the more expensive though I would say not nearly so much as Oracle. It is considerably less complex and the licensing cost is considerably less shocking. Now despite having SQL in the name of its product Microsoft's database is not that much more conformant to SQL than any other database out there. No what I say it's more divergent but Microsoft actually has codified its own extensions of the standard SQL language what it calls T-SQL. T isn't transact. The MySQL database is so-called because the original programmer named it after his own daughter named My. This programmer and his partner founded a company around this database product and they actually released MySQL under an open source license and MySQL today is still the most popular open source database. Now of course the company needed money so they sold an enterprise version of the same database with a few extra features. Back in 2008 this company was acquired by Sun Microsystems but Sun Microsystems went under a few years later and got acquired by Oracle so now Oracle actually owns the rights to MySQL. This has caused a lot of consternation in the open source community because Oracle is not exactly the most friendly company towards open source. Oracle of course can't outright revoke the existing open source code base so it remains to be seen whether Oracle successfully undermines MySQL as an open source product or not. And again despite having SQL in the name of the product itself that doesn't guarantee at all that MySQL conforms any better to the SQL standard than the other database and in fact MySQL is kind of one of the more divergent SQL databases out there. The happy news is that well if MySQL is on the wane as the open source database of choice well Postgres is picking up the slack. Postgres as it's usually called we also sometimes say Postgresql. Postgres sort of gets its name as a pun on the previous database which it was sort of based on and that database was called Ingres hence Postgres. Postgres is my database of choice and it's the one I would recommend starting out with. The only reservation I have of recommending it over MySQL is that Postgres is actually what's called an object relational database so it's not just a relational database it's a object relational it's mixing the object model of data and the relational model together or more accurate it's actually just taking the relational model and putting object features on top which fortunately means that you can actually work with Postgres and just ignore this whole object aspect to it you can treat it as just a plain relational database if you want but when you go look at the documentation for Postgres you should be aware that there's some stuff in there that is not strictly relational it's object relational and that explains why there's some added complexities there Oracle actually is also the other one here which is also an object relational database in fact it was the first major example of such a system but again I wouldn't fret about it you can ignore the whole object aspect of these systems and in fact that's what we will do we'll just treat them as plain relational databases finally SQLite is an interesting example it's also open source but what makes it unique is that whereas in all these other examples and in fact in most database systems they run as network programs they run as servers listening for requests from clients whether those clients are on the same machine or on some other machine so when you make use of these databases you write your own program which then contacts the database over the network SQLite however is a SQL database implemented as a library the version for Python for example is just a Python module which we import and then it has classes and those classes have methods and properties which we then use and this library just takes our data for us and reads and writes our data to our file and that files the actual storage of our database so with SQLite there's no servers there's no clients it's just your application program reading and writing to this file as a database we're just doing so with more or less the same semantics we use when we talk to a relational database a SQL database so to give you one example where SQLite is used the Firefox web browser actually uses SQLite to store all of your browser history and your bookmarks using SQLite for this purpose makes a lot of sense because you would want to use a full fledged SQL database that would require installing and having the user run a full fledged server on their system which is just way overkill and a huge configuration nightmare that most users are not going to manage so for that use case SQLite makes tons of sense be clear however that SQLite very consciously and very deliberately is not appropriate for the sort of applications usually used for full fledged SQL database like or col or SQL server or Postgres so for example the vast majority of web applications out there you have the web server and the web server stores and retrieves data from a database and in that case you want to have a proper full fledged database like Postgres or Microsoft SQL server unless your website gets very little traffic you wouldn't want your website to use SQLite because it just does not perform well for those kinds of scenarios