 Let me spend a few minutes covering a little bit more on SQL today and I may cover a little bit more of this missing material later on. But before I start on that, today's lab consists of setting up PostgreSQL and doing a bunch of stuff on that. So after maybe less than half an hour of other stuff, I will get into the lab material itself. You have handouts, maybe some of you may have already read that and figured out what is happening. If not, we'll talk about it after covering up a bit on SQL. Shall we get started? Are you all ready? So this last chapter on SQL, it's called Advanced SQL. It has several topics. I'm not actually going to cover all of those, we don't have time for it. We won't cover it all here in this version. However, in December, when we have about twice as many lectures as we have in this one week course, I will be covering some of these topics in more detail. So at this point, I will just flash the topics by you so you can read it up on your own and get familiar with it if you're already not aware about it. So the first topic which we will be covering now is accessing SQL from a programming language, in particular using JDBC. The other stuff I will only give you an introduction. So how many of you have used JDBC specifically? Let's take a quick poll. We did this poll online, but let's redo it. How many of you have used either JDBC or ODBC or any other equivalent API to connect to a database from a programming language? How many of you have not used any API to connect to a database? A few of you. So then I will spend a little bit of motivation for these APIs before getting into details. Let me also ask one more question. How many of you have used Java, have done any programming in Java and flipping that question? How many of you have not done any Java programming? A few. This lab and tomorrow's lab are purely SQL based. The lab on day after tomorrow is using Java and using Eclipse. I don't know how much free time you have, but if you have access to resources on Java, be good if you, those of you who are not familiar with Java, please read up very basics of Java. It's not a big deal. If you know C, I assume everyone knows C. So Java is close enough with some minor changes. So I think you will manage fine. But if you have time and can browse some of the Java tutorials online, that would be good. Now let's get back to JDBC. The idea is we want to build an application. We cannot write it completely in SQL. It has to be written in some other language. And SQL is merely the language to access data. So the question is, how can you communicate with a database from a programming language? And for that, basically, you need to define an API. And the standard APIs which are used today, there are several standards you're paying on the language. For Java, it is JDBC. For C, C++, it's ODBC. And for the .NET languages, there are variants, ADO.NET, Microsoft line. Then for scripting languages like PHP and so on, there are again APIs of their own, which are basically a layer on top of ODBC. JDBC, ODBC are fairly similar in terms of the features. So we will see what are the features. So the main steps in an application talking to a database. The first is to connect to the database and identify who you are to the database. This application has the authorization to use features on this database. Now how do you do that? The standard way is to give a login password for the application to identify itself. This has some security loop holes because the login password is stored in the application code. Somebody gets access to that code, they can get access to the database. But although other approaches for authenticating an application to the database have been proposed, none of those are practical as of today. So this is basically what we need to use. You can do a little bit more with certificates and so on, but it basically comes to the same thing. The next step is to send SQL commands and get results back. And since results are usually a set of tuples, we need a way to iterate over the tuples in a result. Again, there are different architectural approaches used by different APIs. We will see what JDBC does, which is you can fetch tuples of the result one by one and process them one by one. So we have already seen what is the motivation. Now in addition to the basic things of sending a query and getting results back, JDBC also supports several other features. In particular, one of the nice things it supports is metadata retrieval. Again, this was already there in ODBC and JDBC came later. But this is a very powerful feature which lets you, when you run a query, the query is a string as far as your application is concerned. But you can submit a query and use the metadata features to talk to the database and find out how many columns does the query result have, how many columns does the stable in the database have and so on. So you can get metadata, type information about both database relations and query results, which lets you build generic interfaces where you type in any query, the system can run it and show you the result of that query in a nice, tabular fashion with the proper typing and so on. So that is another important feature of JDBC, which we will look at only briefly. We won't cover it in the lab, but you can look at this in more detail offline. So how do you communicate with the database specifically in JDBC? Open a connection. Then we have to create a statement object, which basically is like an encapsulation of an SQL query and then we execute state queries using the statement object, fetch results and then there is a whole exception mechanism. So here is how you set up the connection. This is a method somewhere in some class and we are not shown the enclosing class, but of course in Java all of this has to be part of a class. What this particular method does is it takes a database identifier, a user identifier and a password and then does a whole bunch of stuff in here. What is the first thing it does here? It says class.forName. This one is oracle specific. In the lab you will be using the post-chrisquiel version of this. So these things will change a little bit, but the first step it does is it is loading a driver, in this case for oracle. Now what is this loading of a driver? To understand this you have to understand how JDBC is architected. Now there are two possible approaches to connecting to a database. One is that the database says here is a network protocol. You send me these bytes, I will interpret it thus and I will give you back results. That's not how JDBC or ODBC work. How they work is that it's purely an API. Exactly goes across the network to the database and comes back is not part of the JDBC protocol specification. The protocol only specifies the API, which means you need some code library to run on the client machine, which actually communicates with a corresponding set of functions on the database server side. On both sides you have code and your program has to link to the client code, which will actually talk to the database and for each database you have different client code. So in this case we are getting the oracle driver. In your assignment you will use the post-chrisquiel drivers. Now each of these databases has its own set of client libraries. So one way is if you were using C for example in the Unix world, traditional world, you would link it statically. So for this application I am going to link to the oracle library, but this doesn't work that way. Here what is happening is we are doing dynamic linking. So the program is actually loading the library on the fly and in fact it's possible for a program to load multiple libraries, which can all exist in the same program. So this first step has basically just loaded the drivers. The next step is actually opening a connection. If you look here you have a URL, just like a http colon slash slash, we are saying JDBC colon oracle colon thin and so on. Now this is in, you know, the driver manager basically figures out from this that it has to use the oracle driver. If you use a different thing here, if you loaded two drivers, one for oracle, one for post-chrisquiel, at the same time you can do that. Then depending on this it will use the appropriate driver depending on what you specify here. So in this case it says at db.el.edu that is the machine's hostname colon 2000 is the port number, colon univ db is the database on that server. The server is running on that server, on that port number oracle is running and oracle supports multiple databases. So we are saying use the univ db and finally authenticate yourself using the user ID and password passed in here. So that opens a connection to the database, log in session. Then we create a statement. So on the connection we create a statement and then the actual work is on the subsequent slide, too big to fit in one slide. When you are done with the actual work you have to close the statement, close the connection. Note it is important to close both of these. Those familiar with Java would know about garbage collection in Java. You know you can create objects and then forget about them, they get garbage collected quietly. Unfortunately, connections do not get garbage collected quietly. What happens is they just hang around and when you are testing a small program it seems to work fine and then when lot of users use it the database runs out of connections. There are too many open connections. So don't forget to close the statement and the connection at the end of whatever you are doing. So that this whole thing is within a try catch and these things, the only exception that they can generate in this case is SQL exception. You can catch it in output, whatever. So what are the exceptions which could arise here? There are many possible exceptions. One is it couldn't connect to the database, the database is not running. Another we have not shown it here but the actual SQL statements can have an exception. I try to insert into a relation primary key violation SQL exception. So all of those exceptions are thrown in here and caught here and this just prints out the message so you know what happened. So what is the body of the code? This one is any of these could be in the body. So the first one is updating the database. So we have already opened the statement object which you saw here. STMT is connection dot create statement. So on that object we can say execute update and then give a query string and it's going to submit this SQL string to the database, run it there and again it can have an exception so in this case we catch the exception here and print it out right here saying we could not insert the couple. Or it could be a query in which case again we have not shown the tri-catch here. But even a query could run into trouble. What kind of trouble could a query run into? Why would a query not execute? There are many possible things. It could be a syntax error. It could be syntactically correct but you don't have authorization to that table. So there could be many reasons why this fails and they would all result in an exception. But assuming everything works fine, this query is executed in the database. And the result is passed back as a result set object. Now, what is this result set object? It's basically, it encapsulates the entire result. Conceptually, if it's a very large result, the driver could fetch it in pieces but practically I think most of the common drivers fetch the whole result up front. And it can get you into trouble if it's a very large result. But let's ignore that for the moment. So what you can do then is step through the tuples in the result. There could be many records in the result. So rset.next steps through tuples and it returns false when there are no more tuples. So here as long as it's true, what are we doing here? We are printing getStringDepartmentName, that's there. And rset.getFloat2, where did we do 2? Because this is the second attribute of the result. Now note that this doesn't have a name. AVG of Salvi has some internal name which is system dependent. We don't know what it is. We could have said as and given it a name and then fetch that name, use that name here, but you can use positional notation also. Not also that, whether to getString or getFloat is a decision we are making in the program here. It is possible that, I'll say getString on 2, what happens? As long as it can be typecast, so you can always convert a float to a string. So it will give you back that string if I say getString2. On the other hand, if I do getFloat of department name, what will happen? That department name is not a number, it will give you an exception at that point. Okay, so as long as these are type safe, we will step through all the rows of the result and print all of them. So that's all this programmed us. Any questions? So, these two are equivalent, getString department name and getString1 because this is the first attribute. How about null values? When I say getString, I'm getting a string back. If the initial input was null, I could get a null object perhaps. But what if I- Instead of that column name, can we use a positional notation 0 over 2 string? That's what we have done there. GetString1, that's the position. First, second, third. No, it starts from 1, not 0. Okay, so now one of the differences between the SQL type system and Java or any other programming language type system is that SQL supports nulls for every possible type. In a language like Java, if you have an object or pointer or reference to an object, the null value is a special value. But if you have an int, a primitive int, there is no way to represent a null value there. So, when you get data from SQL to Java, you have to do a bit of work. So, you could always say get int a, this is not for the previous query. So, something else, let's say it's an integer. If it was a null value, what is the result you get here? It's something system defined. We don't know what it is, maybe minus max int, maybe 0, I don't know. However, what I can do is, I can say rs.wasnull. So, the last action I did on that result set, whatever it was, get int, get string, whatever is the last thing I got. Wasnull tells me if that was null. So, whatever the value is saved here, if it, if this is true, I can ignore that and treat it as null, okay? So, all of this is to deal with the difference between the type system, the a, the null value, b, the fact that SQL deals with relations, whereas Java deals with individual records. Now, we saw a query here before I get to that. We saw query here, which was executed. You can always pass a query string. Now, if you have any input which you're getting from the user, you have to be very careful with this. How many of you have heard of SQL injection attacks? Few of you, most of you haven't. How many of you have not heard of SQL injection? Good, you'll be hearing about it in the next couple of slides. So, how many of you have used something like this, where you take a string and pass it to a database and execute it? Quite a few have done this. And if you got a value from the user, you probably just concatenated the string and then passed it into the database. This seems like a reasonable thing to do. What can go wrong? Well, I'll show you in a moment. So, first I'll show you the right way of doing it. The right way of doing it is whenever you get parameters from the user. In this case, actually, they're not from the user, they are fixed values. But this still illustrates the syntax. In this case, there is an insert statement, but it could be any statement. It could be a query. Instead of an execute query, what I'm going to do is connection.prepare statement. What this is going to do is going to take a string which has question marks for values which are going to be provided subsequently. And these could be values which are input from the user. So, whatever is a value which is input from the user, the only way to pass it to the database is by first taking the rest of the query with question marks for as placeholders for those values, preparing a statement. And then you can put those values in. In this case, the values are fixed, 88877, peri, finance and salary of 125,000. So, I'm setting all these values. On this, set in one to this, two is the second question mark, three is the third one, four is the fourth question mark. I have provided values for all four question marks. And finally, I can do statement.execute update, because this is an update. If it were a query, I would have done execute query similarly. So, what is the benefit of doing this? We are going to see this next. This is the way to do it. Why should you do it this way? We'll see in just a moment. But before that, note that what we have done here, we can actually reset one of the values. So, in this case, we have made the first question mark. We have changed it to 88878 instead of 77. And then we have done execute update. What does this do? It creates one more instructor with a new ID. But since we didn't set 234, it just takes the old value. It reuses those. So now, what is wrong with concatenating strings? The alternative would have been to do something like this, which I'm sure many people have done. Insert into instructor values, single quote. And the double quote closes the Java string. The single quote is for SQL plus ID, which we have read from somewhere, plus single quote, comma, single quote. That's plus the name, and so on. So we have constructed a string in SQL query by explicitly including single quotes over here. But it's very dangerous. What if somebody's name was Disuza with a apostrophe there? What's going to happen? This name here, b, that quote, will finish the name. So now the name is b. And what is after that? So you za, which is not a value. There's no comma even. That should have been a comma. So what will happen is you will get a syntax error in this case. And you should be very happy you get only a syntax error here, because things can be much, much worse. So in the early days, before we realized the need for preparatory plans, all of us have gone through this. We had programs written here, long back, which did this. And guess what? They failed on Disuza. This example is real. They failed on Father Agnell's college. Any of you are from Bombay? Anyway, in our MTech records, we keep track of where people did their BEB tech. And we found out after, I think, a year of this application running, that we had a lot of garbage there. Wherever Father Agnell's college was coming in, there was garbage. Nothing was, all the remaining fields were gone. So you shouldn't be constructing it like this. The right way to have done it would have been to use an escape character, so that the quote in Disuza is not interpreted as a string terminating quote in SQL. Backslash quote should have been used. Now if you knew this, and you were careful, you could have done it. But why take all the trouble when this guy will do all of that for you? So in the prepared statement, when I say set string, if this string happened to have a quote, set string will do the job for you. It will put in a backslash, so that the quote is not treated as a SQL special character. It's treated as part of the string. So that's one reason why you should use prepared statement. You should never concatenate strings like this. It's a terrible idea. Well, I said it gets worse. What if somebody clever knew that you had a bug like this, let's see what all they could do. So select star from instructor, where name equal to the name which was typed in. So the user could type x quote or quote y equal to single quote y. Note that the last quote is not typed in. That is provided here. So this query, which was supposed to take a name and print the information of that instructor, what is it done instead? It says the name equal to x or y equal to y, which is true. So it's going to print the name of all instructors. So the query was supposed to print just this one name. This clever user has modified the query and is running a different query from what the system implementer intended to run. Is that so bad? In this case, it's not done anything drastic, but it could get worse. The user could have done this. x quote semicolon update instructor said sal v equal to sal v plus 10,000 semicolon and then dash dash. Now, y is dash dash. That's an SQL comment. What it ensures is this quote will go into a comment. So now you have a valid SQL statement, actually two part statement, one of which is a select and the other is an update. Now if the database accepted this as is and executed it, many databases do. You can instead of giving a single statement, you can give a set of statements and it will execute all of them. So what is going to happen? It's going to run some query whose result is irrelevant and then give everybody a sal v increase. Of course, if we get a sal v increase, we would be happy. But equally well, this person could have said delete from instructor and deleted all of us. So basically, a hacker can do anything they want to the database through this loophole. And it's shockingly easy. They just have to connect to your application, type one quote, and then type whatever they want and this string goes and gets executed. It's like a big hole, you're just waiting to be kicked. And unfortunately, many, many applications out there have this bug, SQL injection bug. This bug is called the SQL injection bug because what your hacker is able to do is actually enter a SQL statement which the database is going to execute on their behalf. And they can do anything they want to the database. Any questions on this? So how many of you have ever written an application which has a SQL injection bug? No? I have. I'll admit to it. Anybody else? The rest of you have not written any application? That's the only way you have not written an application in this bug. Anybody here who has written an application but not ever made this mistake? No one. So I want to spread the awareness of this because this is a really dangerous thing. And unfortunately, exceedingly common. So you should make sure your students never make this mistake. So make sure you communicate this. So let's then wrap up with JDBC quickly. And I'm going to stop there for today and get back to the lab. As I said, there are metadata features. There are two kinds of metadata features. One is to get information about the result set for a query. And the other is to get database metadata, which is schema information, primary key, foreign key, all kinds of information on the database schema itself. So I'm going to skip the details of both of these, but go read up these things. And the final thing in JDBC is something which I mentioned before lunch, which is connection.setAutoCommit to false will turn off auto commit. So then all the statements which you submit subsequently on that connection will all be part of one transaction. They will not get committed automatically. And finally, you can say connection.commit or connection.rollBack, depending on what you want to do. Why would you do connection.rollBack? Because you did multiple updates. And you realize there is a problem. And then you cannot go forward. So then you roll back, which undoes all the updates. But that's going to be very rarely you need to roll back. The more important news for it is, what if there's a power failure in the middle? If you did two or three updates separately, it's possible that the first few updates got saved in the database, the remaining, before you even did it, power failed. So what has happened is the database is an inconsistent state. If you're familiar with transactions, you know this. That's the real problem, or the more common problem. And this will prevent that from happening, by treating all of those as a transaction. So I'll just stop the chapter pretty much here. There is a bit on ODBC, which is similar to JDBC. ADO.NET, which is also similar to ODBC. This syntax varies. And then the embedded SQL, which is, again, variant, which is more tightly integrated with the programming language. So there are a couple of things. There's a Java embedding of SQL called SQLJ. Again, I'm not getting into details. That is it for the communication from a programming language to a database. The chapter has a bit more of stuff on a couple of things. One is third procedures, then there's triggers. And I will cover this maybe tomorrow I'll do this. So any questions on what we covered today?