 So, this chapter 5 which I am not going to cover completely has several things. The first part is JDBC, this is the only part I am going to be covering. It also has other embedded SQL, it has ODBC, it has more on SQL data types and schemas, it has tough on functions and procedural constructs in SQL including the standard version of Oracle's PL SQL, it is called SQL stored procedure. Each database does its own way, I told you about assertion where the standards bodies jumped ahead, they were partly motivated by the procedural construction SQL where they were behind, each database is implemented its own. And then they defined a standard which nobody implements because database have already had their standard way, people have already been, I mean their own way, not standard, their own way. People have already been building applications using that, now nobody wants to give that up and go do something else. So, each database has its own way, so what we do here is we cover the features but the syntax depends totally on your database. We cover meaning in the book, I am not covering it in this set of lectures. There is also stuff on triggers on OLAP and last but one over there is advanced aggregation features. This is now increasingly supported by databases. Postgres supports a whole variety of new aggregation features. We looked at simple ones, group by aggregate functions. There are a bunch of new features which include what are called windowed aggregates. For example, I want the, I have data which is per day, fail. I want to group by month or I want to group by week. But I may also want to group by consecutive periods of a few days to smooth out the data, it is called windowing. So, sales may go up and down each day, but I want to see the trend across the month. So, I smooth it out by let us say a full day window and then I see is there a trend across the month. So, such aggregates are supported. Then there is support for ranking. So, supposing I have a list of marks obtained by students, researchers often need to get ranks first, give ranks to students. How do you do that? Conceptually very easy, you just sort, give first rank to the first mark or if there are multiple people with the same mark, all of them get first rank, then go to the next value, give the next rank, you can do that easily. Procedure release, very, very easy to do this. But can you do it in a pure declarative language like SQL? Then so it is not trivial. How do you define what is the rank of a person? I can easily say it imperative, I can easily say sort it, assign ranks thus, it is a very small loop which anybody can code. But can I declaratively say, specify in SQL what is the rank of a particular person, of a mark, I have given a set of marks. How do you define what is the rank, logically, not through a procedure, maximum value will not be the first rank, but out about second rank, third rank, what is the i-th rank or given a mark, what is its rank, how do you define it, in a subquery, what do you compute. Yeah, so the simplest way to think about it is arrange the mark, but you are not allowed to do that. SQL, you can have order by, but that is only for output, anyone else wants to, you have a ranking function, so that was added to SQL, it is now part of the standard. But if you did not have it, that is what you can still do it, it is a very, very inefficient way. If you think about it, the rank of a student is the number of student who have higher marks than this student plus one. If you take the student in the highest mark, nobody has a higher mark, so there is zero student with higher mark, so the rank is one. If two people get the first rank, what is the next rank? You could define it either two or three, let us keep the three one, with you have breaks in the ranking, that is often used. So then, the next mark will have two people with higher marks, so the rank is three. So I can write a subquery to compute this, to compute how many people have higher marks, but it turns out it is horribly inefficient. If you think about it, for each student I again go over the list of all marks, find how many got higher, count it and output it, very, very inefficient. You can easily write this query, but if you run it on a very large class, it is going to run pretty slowly. So the SQL standard was extended to provide a built-in way of generating ranks. So that is also part of the advanced aggregation features. So go read it up if you are interested. Postgres does support this, so you can try this out on Postgres later on. So now, a very brief overview on JDBC, so what is, well, ODBC is a father of JDBC in some sense. The goal of all of these systems is to provide an API, which programmers can use to connect with a database server, send SQL queries over, get results back. And finally one other thing, which is to get metadata, like what are the relations in this database, what are their attributes, what are the primary keys, what are the foreign keys, you can get such data about the schemas and so on, which is called metadata. So the JDBC API supports all of these, connecting, running queries, getting results back, getting metadata information. So let me do it by example. You will be writing such queries in the lab tomorrow, not today. Some of the syntax is a little weird, you may not fully understand what is going on, but I will try to explain it. So the first thing over here, it says, class.porname oracle JDBC driver oracle driver. Actually I should have updated it to Postgres SQL, your sample programs have the Postgres variant of this. So what this is doing is, this is dynamically loading a library to talk to Oracle, because the API is the same, whether you talk to Postgres or Oracle, the API is the same, but the actual code implementing it varies. In fact, the API even allows you to connect parallel to two databases, one Oracle and one Postgres, and issue things to either. So where does the request go, if I say connect to a database, which one does it go to? Well, when you open the database, you specify which one, and correspondingly it will go to that one. So you are allowed to actually load multiple libraries, which are all supporting JDBC for different databases. So the first one is loading the, dynamically loading the Oracle JDBC library, and then it says driver manager, get connection, JDBC, Oracle, SIN, this is a clue to the driver manager to say, use the Oracle implementation of JDBC library. If I say here, Postgres SQL, or what is the exact syntaxes on the Moodle site, then it knows it should use the Postgres SQL library, so the driver manager chooses the right one, and returns a connection. Then the connection itself is what you use to do the actual work. So how do you do actual work? First you get a statement from the connection, the statement is essentially a way of attaching a SQL query and executing. Now the actual work is in the next slide, but before that let me also show you how to wrap up after this. So when you are done, you have to close the statement, close the connection. And if there is an exception, well the syntax in Java is try catch, so here, anywhere here you may get an SQL exception, and it catches it and then prints the exception and says what's going on. In fact, to be safe, it should also close the statement and connection here. Why should you close the statement, in particular, why should you close the connection? You will not close the connection, then there will be lots of connection in the pool, and in that case, Saroj will throw too many connections. Correct. Exactly right. So if those of you who are familiar with Java are used to garbage collection in Java. If you use C++, you know when you dynamically allocate, you have to free. In Java, you are used to never freeing because the Java environment automatically collects unused memory. Unfortunately, it cannot collect unused connection because it's not part of the same framework. And if you don't close the connection, what happens is a lot of connections are open to the database. Now when you test your program by running it once or twice, you don't notice anything. This happened to us many times in early days, in early 2000s when we first switched from our old systems to JDBC, a web front end for student registration, we would test it out. Everything would seem to run fine. Then we say, okay, students, now go ahead and register. And in the first hour, a few hundred students would register. The first few would go through fine. After that, nobody can get in. What happened? A lot of open connections, the database ran out of the number of connections it supported. No new connections could be made to the database, so the following queries could not be executed at all. And we didn't have any clue what was happening. Remember, it was very early, around 2000. This happened one semester, we fixed it, the next semester it happened again because it turned out somebody wrote some new program, new features, and again they forgot to close connection. Eventually, the programmers realized that this was very important and did it right. But occasionally, in some new application, some programmer forgets to close connections and this problem recurred periodically, it's not ever gone away fully, unfortunately. Okay, so this is something to keep in mind. You should close connections when you are done, you should never leave it open. And in particular, if you get an exception, you might end up not closing it, so you have to close it here. Okay, so much for connection, what do you do with the connection? Here is an example, it's again in the tri-catch, you got the statement, so this will come here. So this part, do actual work, is where this stuff will be put in. So this is saying statement.executeUpdate into instructor value, so this is a SQL query directly. Okay, so if there is an exception, you say what happened. So that is for an update. This one is for fetching results. So here, statement.executeQuery, select department name, average, sorry, blah, blah. And the result of that is a result site. Why result site? Because there may be many rows in the result, so it should have a site. So result site is a standard JDBC type. And then on the result site, while result site.next, what do you do? System.out.println, result site.rsite.getStringDepartment, plus space plus getfloat2, which is this one. Why 2? Because this doesn't have a name. The column name for average is not there, we don't know what the column name is. We could have given a name here and then used it here or if we don't give a name here, we use it by position. Why getfloat? Because average is a floating point actually, when it's mapped to Java. And here is a string, I know the column name is department, so it's a department. I could have said getString1, that would also work. And there's a loop, how many times does this loop, as long as rsite.next succeeds, that is the first time it's called returns the first row, each time it's called returns the new row, eventually when there are no more rows, this fails and then the loop exists. So this is the basic way of interacting. So that's the very core thing in connecting to a database. Then there are other, these are some, this detail I already told you. And then there's some other way to deal with nulls, it turns out Java does not know anything about nulls. So how do you know if, I said getString, how do I know if the string was null, if the name was null? For strings it's not a problem, it can return a null value. So if I say getFloat, how do I know if that was a null value? A SQL database supports null, but JavaFloat does not have a value corresponding to null. So I can look at it afterwards, I can say int a equal to rs.getint a, and I can say if rs.wasNull, the previous operation if it was null, then this int a got some value, I'm going to ignore it. I don't know what the value is, it may be 0, it may be minus, max, and its location defined. So I should, instead of using it I can say got null value. Now the last thing I'm eating a few minutes into your break time, but let me wrap this part of JDBC with this, which is prepaid statement. What is the prepaid statement? For many times you need to take a parameter from the user, and I'll go over this later on when we talk about web interfaces, but the bottom line is I want to execute something, so I take an input from a user, a roll number or something from the user, and then execute a query using that roll number. So I have to pass user input to the query. Now the simplest way of doing it and a very, very insecure, dangerous way of doing it is to take the string which the user has given and use the standard Java string concatenation to create a query. You should never, ever, ever do this. Why? How many of you have heard of SQL injection, unfortunately very few of you? I want every one of you to be careful about this. This is something you should teach your students because this is a huge risk which many people are going out in the job market without knowing how dangerous this is. So what is the danger? I'll come to it later on when I bit later. The bottom line is if I get values from somewhere, so here it's like ID and so on, here insert into instructor values, ID, name, department name, balance, and I've got these from the user. So the first problem is just a syntax error, but that turns into a security hole eventually. So what if the name which was typed in was Disuza? An error will come. Why? Because I have put quotes, so Disuza would have, you know, it's hard to read from this, but there are quotes added around the string, single quotes added around the string. So I added single quotes, Disuza was a particular one here, so single quote D, single quote Suza, single quote, it's not matching. The first quote close the string, now what is Suza? It thinks it's part of the SQL query, not a string and tries to interpret it and gives a syntax error. This may sound innocable, say, okay, this is a bug and the right way to do it, deal with this bug is to use prepared statement. So in this case, what I'll do is use this thing here. Instead of the previous way which was execute query, connection.create statement, instead of this, I do connection.prepare statement. And I give a string without, you know, the values which I'm getting from the user are represented by question mark here. And then I provide the values through this. I can say, this is a variable, pstmt is the prepared statement, so I say set string 1 to something, set string 2 to something else and so forth. And then I can say execute update. What is this going to do? Fill in the values and execute it. Now what if I did a set string here, instead of peria, I said disuza with a single quote in it. The prepared statement will take care of that. It will take care, it finds that there is a single quote there and it escapes it. We put a backslash single quote or something like that, so that is handled properly. You don't have to worry about strings containing quotes. So this is the correct way of doing it. You should always use prepared statement if you are taking values from user. You should never directly do execute query. You should, any value which comes from a user should be represented by a question mark here and that value should never be concatenated, it should only be set using set string or set in whatever it is. So what is the security risk? The previous was just a syntax error. The security risk is supposing the user, so the query is select star from instructor where name equal to some, I am taking a name. So supposing the user instead of giving a name says x quote or quote y quote equal to quote y, no close quote there. That close quote would be added here. This one adds the close quote. What is the SQL which gets executed? Name equal to x or y equal to y, y equal to y is true. So it is going to print all names instead of one name. So what is the big deal? What the big deal is the user managed to get the system to execute a different query from what the programmer intended. The programmer intended this, the user got it to execute something else. You can leverage this. Now user could have put x quote semicolon update instructor set sal v equal to sal v plus 10000 semicolon minus minus is to comment out whatever else to comment out this quote would get commented out there. Now what has happened? JDBC, the string which has gone to the database is now two queries. The first one is select star from instructor where name is x. The second query updates instructor set sal v. Now shouldn't JDBC complain that you gave two SQL statements instead of one? Again for historical reasons people have been using JDBC with multiple statements not banned and guess what happens? Both get executed. So the programmer never intended to allow the user to update sal v. This is not proper. You should not be allowed to update your own sal v. But hey, you just did. So there is a huge security hole. Now we can assume that instructors are honest people. They are not going to go around doing such things. But there are many hackers in the world. And it's quite surprising but banks of all people who should be extremely security conscious many banks and credit card companies never realized that SQL injection is dangerous and many of their programs had SQL injection problem and hackers have actually exploited it to make a lot of money of these people. So it's a huge risk. Now what about other places? If banks have not been careful what about others? They haven't been careful. I can assure you even today IIT Bombay's own applications developed internally there are quite a few SQL injection bugs. Now why? I mean telling people for at least six to seven years now more actually. So never ever code like this. Turns out somebody or the other built some bad API long ago and in order to fix it they have to rewrite a lot of code. And there are always new feature demands that nobody ever gets around to fixing the security holes until one day somebody will hack in and cause havoc on our system and then they'll wake up and go fix it. That's how the world works. That's how Microsoft went around fixing a lot of bugs when it came to a point where people stopped buying Microsoft products because they were extremely vulnerable to viruses. Then they did a lot of work to clean up their act. It's a lot, lot better than it used to be but still there are problems. Now similarly with SQL injection there are a lot of problems. People are cleaning up their act but at least new students who we graduate should never ever make this mistake. So when you teach this please emphasize that students should never make this mistake. Very easy to do. There are some tools for recognizing and detecting this but it should just go into the head that they should never even write code like this in the first place. Okay I'll stop there.