 So welcome back to session number two of the sixth day. We are lucky to have before us Professor Sudarshan who will be talking in the area of database security. I am sure most of you have studied this course databases, some of you may be teaching it and in that connection you might have had the opportunity to hear about the textbook on databases probably the most well-known book in the subject. Professor Sudarshan is a faculty member in this department, in fact he is the head of this department and has been one of the most dynamic heads. So with that brief introduction over to Professor Sudarshan. Thank you Professor Banner and good morning everyone. Today we will be talking about database security. Given that we have just about an hour and a half or even less to cover a pretty vast area I am going to just be scratching the surface. So I will start by motivating database security. So what is database security? There is data in a database and there are certain people who are authorized to view the data. There are certain people who are authorized to update certain parts of the data. And data or database security is basically protecting the data from malicious items to either steal or just view the data or to modify the data which can have other repercussions. For example, adding money to your bank balance without actually having deposited money. Now why is prediction of data getting, you know, it is not just important, it is critical to life today in some sense to modern day life and that is because there is just so many things which are online these days. If you have a bank account and every one of us does, all our information about the money we have in our bank is stored online. If you own shares, it is a DMAT form and stored online. If you use a credit card, all your expenditure is online. When you want to make a purchase with a credit card, obviously it is contacting an online database. Similarly, your salary, income tax, if you are a student, your admission information, your marks and grades, if you own a house, for example, the land records now are increasingly online. It is still mostly on paper but a lot of it is going online because of fraud on paper. Then there are licenses for various things, driving licenses, taxi licenses, auto rickshaw licenses and so on, which are also going online to a large extent because of fraud on paper. Then there are medical records, which as of now in India are not very much online but the trend is clear. Increasingly doctors are going to write your prescriptions on tablets, not the medical kind but the iPad or Android tablets. That is where your health history is going to reside online. The tablet is the interface. Pretty much any information about you which is critical to your life and well-being and your wealth is online. That is from your perspective as an individual. From an organizational perspective similarly, data is key. If you do not have and if you do not protect your data, you are in trouble. Now when there is such important data out there, obviously people are going to try to hack and get access to it. There are various ways in which this can be done. And this is just a very small sample of headlines, meaning the headlines are really the biggest data breaches that have occurred. The smaller ones never even make it into newspapers, let alone become headlines. So here is just a small sample. I have tried to pick some from across different years. Just a few days back there was a report about Chinese hackers breaking into the US federal employee database. It is not clear what information they got but there is a lot of potential for using that information to target individuals who have high security clearance. And if you have security attacks that target individuals, you can put more effort and break in and thereby get into high security government databases. A few months back there was a break into eBay account information. eBay is the mother of LX and other similar things. It is there in India but it is really big elsewhere. So eBay's account information including usernames and passwords were compromised. And this means that people can access eBay accounts, make purchases, spend money and so on. And even more insidious attack happened not so long ago in December where the target store is one of the biggest chains. It is like you heard of Walmart. Target is slightly more upscale version of Walmart. It is a little smaller overall but it is a huge chain. And hackers broke in and got access to credit and debit card information along with PIN numbers which is extremely dangerous. That particular one did not hack into the database. They actually hacked into the point of sale terminals but equivalently if they got into the database they could have got similar information. A little bit earlier in 2010 there was a big article which was called The Great Cyber Heist. Talking about how billions of dollars had been stolen from banks. And it turns out that the key technique used by those people who broke into the banks and credit card organization was something called SQL injection which you saw probably yesterday or day before and you must have had a lab which showed what you could do with SQL injection or SQL injection. I'll be using the word SQL and SQL interchangeably. People in the field use both of these. And you have seen how to do SQL injection and it turns out that the repercussions can be enormous. In this case billions of dollars were at stake. And the number of places that are vulnerable to SQL injection is unfortunately enormous. People have been very careless programmer. In the early days back in the early 2000s people didn't realize the risk and coded dangerous things without realizing it. Soon after that people understood the risk. This was taught in courses. So the current generation of programmers ought to know very well what are the risks of SQL injection. Unfortunately many people still don't and there are still many bugs out there which people are creating today not to mention a lot of legacy code which has these vulnerabilities. Coming closer home to India there have been many such examples. In 2009 there was a thing about medical records which was sent to India for digitization being put up for sale. What people would do with it and why they would pay money for it I'm not very sure but clearly there was a market for it. And this had a lot of repercussions because it gave India a bad name and would have diverted a lot of the BPO market away from India causing economic harm to India. In 2010 there was a fake registration certificate racketed the road transporting in Hyderabad. I believe that people would steal cars and then you need a RC book to sell the car and they had broken into the RTA and could generate those things and then they could sell the stolen cars. And a little bit earlier still in Delhi they had an estimate of they had issued about I think 1 lakh auto-rickshaw licenses and a sample showed that there were at least 2 lakh autos on the road. So half the autos on the road were illegal they were not paying taxes and the government decided to clamp down on it. So they said that this is happening because of paper records we will now have everything online and all licenses have to be renewed and everything is going to be done on the database online. Very nice idea. They went through with it and at the end of it they found that from 2 lakhs the number of autos had come down to 1.5 lakhs but there was still only 1 lakh licenses that means half of the licenses were fake even after putting it online. So then they investigated what had happened and it turned out that somebody had accessed the database and wiped out the records after a bunch of autos had been registered allowing fresh registrations against the same license number. So then 2 autos with the same license number could exist in the city without anybody knowing and they only paid tax once. So this kind of fraud is common and dangerous. So if you are a database administrator this is what the world looks like to you. Here is the data sitting in the middle and here are all these people trying to hack into the data and it's your job to protect the data and obviously you'd be extremely scared about what is going on. The amount of money at stake has gone from literally in dollar terms from hundreds of thousands to millions to now billions. It's an enormous increase in scale. So I'm not going to be able to cover everything in this vast space as I told you but I'm going to cover a few things and for each of these I'm going to give you just a little bit of overview of what is going on and then I will leave it to you to dig deeper into these areas if you are interested. So first of all we will cover the levels of data security and below are the actual levels so I'm going to go into each of these levels in more detail in the physical data security. I'm going to talk about authorization, application level security protection against certain other attacks such as insider attacks and then I'm going to talk about two topics which are more at the research level today one of which is security with outsourced databases and the other is privacy. I don't think you have had privacy as a topic here I have a list of topics down here so again privacy is a very big area I won't get into much detail but I'll give you a very quick issue of some of the issues quick overview of the issues. So this is what a modern information system looks like you have users who are connecting over the net now here it shows a very old drawing of somebody with a CRT monitor today of course it's not just desktop computers here but mobile devices are actually dominating but be that as it may they usually connect over the internet to an application program which is sitting somewhere and that application program is going to access data in a database the database system itself is running on top of an operating system such as Windows, Linux increasingly this database is not a centralized database but is actually something which is highly parallel and underlying all that is the actual disks and hardware on which this data is stored. Now coming to the different levels of data security against which any organization which has an information system has to be aware of all these levels and take precautions against breaches at any of these levels the highest level is humans now obviously there are some humans in the organization who have access to all the data if this human can be subverted then there is a problem now how do you protect against such things that's not an easy problem in the no single solution to it but anybody building a system has to figure out ways of it and we will talk a little bit about this now the next level the human is connecting to the application program over the network and there are a lot of opportunities for breaches over here I'm not going to talk about it because you have been covering that and will be covering that in a lot more detail in this course further down is the database application program and we are going to spend a fair amount of time on security at this level and the application program in turn connects to the database system so we will talk about security here now the database itself typically runs on top of an OS if somebody can subvert the OS they can get access to data again we are not going to get into that although that's an important problem and then finally there is the physical level now we are going to talk a little bit about protecting data at the physical level coming up so I'm going to actually cover these in reverse order I'm going to start with the bottom and then go upwards roughly speaking let's look at the physical level security now you must have heard about Snowden and how he copied data from the information systems and is now busily embarrassing the US many of us probably think it's a good thing because he revealed a lot of unsavory things that the US government had been doing but obviously the US government doesn't view it that way and they would want to prevent something like this happening again so in this case Snowden had access to unencrypted data and he copied it to a USB key we are not going to deal with how to prevent that kind of access but the question is what if somebody had data in a laptop and this is a common situation in a business management often needs low level data to do analysis if somebody steals the laptop then they have got access to the data if somebody manages to break into a hotel room copy out your disk and leave without you knowing it they have got access to your data so that's a serious risk so how can you protect against this kind of a risk similarly if you had a USB key with data which is lost and this happened again to the US apparently there was a lot of military data on USB keys in Afghanistan and USB keys are extremely easy to steal obviously so these things were being sold in the market in Pakistan containing critical military information so how do you protect data which is stored on these things and which has leaked out and the obvious way is to encrypt the database at the storage level so most commercial databases today support encryption of the entire database so if anybody gets a copy of the disk if they don't have the key then they cannot access the data so that is whole database encryption now whole database encryption is good but there are certain situations where that is either too high overhead or is not sufficient and the next level of security is column encryption so this is very commonly used for sensitive data so supposing you have an account with let's say Flipkart Flipkart has your credit card number and that is considered sensitive information now if they store it in plain text if somebody gets access to it there is a problem so instead what they do is the database itself may not be encrypted but the column containing the credit card numbers is stored in an encrypted fashion and if Flipkart's application program wants to access the data it has to be decrypted so as long as the database is kept physically separate from the application program containing the decryption password then if somebody steals the disk they are not going to get access to the sensitive information at least so this has lower overhead than full database encryption and is used quite a bit there are issues with this still for example key management if somebody manages to steal the key then they have access to your data but it's still a very important aspect of security and most commercial databases do support both of these levels today whole database and column they also do file system encryption most operating systems today support some form of encryption of a whole file system so you could do that too so now encryption of sensitive information such as credit card numbers is not just a good idea but it's actually the law in the US under the HIPAA law there are certain sensitive information which by law cannot be stored in plain text in the database so you have to encrypt it let's move up one notch and come to the database and application program so here security is not just about preventing access to data because you do need access to data here but it's about authentication and authorization so you need to authenticate users to know who the user is and then you have authorizations to specify to the database what a particular user is allowed to do so you are going to get into more detail here so there are many types of authorizations at the application program level there is usually a screen level authorization which controls which users to get to see which screen so a typical application program has a very large number of screens each of which allows certain things to be done so most application programs would have an authorization mechanism built in which says that this user or this category of users can get access to these screens in addition the screens themselves will take certain parameters directly from the user login so if a user is logged in they get the user ID so a screen which shows your salary for example would directly take from the login your employee ID and you will only get to see your own salary not anybody else's salary so the screen enforces that the salary of the currently logged in person is made available so that's the screen and application program level authorization now if you go down into the database the application program itself connects to a database and does various things to it and so there has been a lot of work on authorization in the database anybody who has taken a database course would have probably seen some of these things and here are some examples of the types of authorization these actually technically these are privileges you have a read privilege, insert, update, delete and a variety of other privileges so now this is encoded in the SQL language the entire authorization mechanism built into SQL and this is based on four different notions there's a notion of a privilege so I just told you what are the kinds of privileges read, update, insert, delete and so on and then there are objects on which these privileges can be granted so in a typical database the object could be a relation or it could be one or more columns of a relation so you can have access to column A but not to column B and then there are users and roles so now what is a role? it's a notion of the kind of work that a particular set of users might be doing so let's say a role is a manager so if you look at this example here it says grant select on employee to manager employee is a table and manager is a particular role so all people with the role manager will now get the ability to run a select query on employee's table and see all the roles in that table so the objects as I told you could be relations, columns but it could also be views, thought procedures and so on so there's a fairly extensive set of objects on which you can grant privileges and there's actually more kinds of privileges for example for thought procedure you may not be able to read the thought procedures definition but you may be allowed to execute the thought procedure you might be able to execute the thought procedure as yourself or you might be given the authorization to execute the thought procedure as the user who created the procedure in other words you may not directly have access to a particular table which that procedure uses but when the procedure is executed it runs as the user who created that procedure and can therefore access the table so there are a lot of such functionalities built into SQL in the current SQL standard however the authorization on objects is not at the level of roles you cannot say that a user can access this role of a relation but not that role and I'll come back to this limitation later there has been some work to remove this limitation then there's also a way to grant roles to users for example grant manager to Bernard because he's managing this course now I'm not going to get into more details of authorization in SQL it's fairly standard if you've done a database course you would know it if not see any database text now let's come back to the application level as I said the application connects to the database and executes queries and updates in there so how does the database know who's connecting to it it's a connection which is coming across the net the socket is open but how does the database know who is opening this and typically this is done by using a password a database password which is stored in the application program and so the database has no idea who are the users out here all it knows is that there is a single user under which the application program is accessing the database now the application program obviously is comprehensive it has to do various functionalities on the database therefore it has to have complete privileges on the database it should be able to update any data, read any data in the database so what does this mean? it means that if this password leaks first of all you are in trouble but even more important if somebody is able to fool the application program is able to subvert the security of the application program the database level security is useless it's completely useless because the database only sees one application program user and it has no idea that this particular update that came is not a legal update but rather is a result of a breach of security here the database cannot find out so that's a more serious problem we'll come back to this so it turns out that the biggest problem of security today by far is not inside the database but really at the application program so poor coding of applications in particular SQL injection which you saw yesterday or day before is one of the biggest problems here but there are many more for example application might easily forget to implement an authorization check on a particular screen so there's a famous case a few years back there's a site called applyyourself.com which is used to handle applications to various universities it's a site which many universities use so what happened is that in there students were supposed to see the results only after a certain day so universities get time till a particular date to update their admissions information after that date the results are made available to students now what happened is they use some interface for universities and others to see the data and that interface did not have the time limit and they had a similar interface for users which also did not implement the time check and this particular thing was not publicized to users but it was there on the system unfortunately and as luck would have it somebody chanced on this interface I don't know how they got information about it but what they did is they typed that extra suffix to the URL on the browser and lo and behold they could connect to that interface and see the results before the results were declared unfortunately this could only be done if you are already logged into the system well fortunately or unfortunately unfortunately for the students I would say because some students came to hear of this and they said hey I want to see my results early you know what's a big deal nobody is being hurt by this so they went ahead and used it to see the results but unfortunately for the students applyyourself.com was logging what was going on so it was able to trace all the students who had accessed data through this interface and it went ahead and turned over that information to the university and many universities actually cancelled the admission of students who did this so it hurt those students quite a lot the university said that hey these students are doing unethical things we don't want to admit them so that's a big impact on those students so it's very important to code the application to avoid these kinds of problems but of course this is easier said than done way back about 11 years ago we had a workshop on data based security here in IIT Bombay and there was a student who built an interface for registering for this workshop and there were some people in a company in Pune who were working on security one of them said hey let me try to see if this particular registration application is vulnerable and what happened is unfortunately it was he was able to see the passwords by a small trick which was well known and he could have accessed our database and done anything to it of course he didn't he told us about it but that's another example of application vulnerability so let's come back to SQL injection which you saw earlier I'm just going to remind you briefly about this before going into techniques for avoiding SQL injection attacks so here is a form which takes a username and a password and here is a badly written application which tries to authenticate the user as follows it has a query select user info from user where user ID equal to and note the construction here the query has a string plus user ID equals and then there is a username which is typed in here and then there is a and password equal to and a password which is typed in here so now this username was supposed to be this particular one I think is missing some quotes there should have been there's a typo on this slide there should have been an extra pair of quotes before and after username and similarly before and after password so that in the SQL query which you get the username and password are enclosed in quotes which is the SQL language requirement that you have quotes around this identifier so now a malicious user types in a username that looks like that the following XYZ quote and then dash dash now what is this doing as I said there should have been a quote here as I said there is a small mistake in this slide so now there have been an opening quote for username so it should have been something user ID equal to quote and then XYZ close quote that closes the string so in a SQL query what comes after the string well the remaining SQL constructs in this case dash dash now what is dash dash in SQL it's a comment so what is the effective query that is run select user info from users where user ID equal to XYZ and then the rest is commented out that's it, that's the query and this query is going to return a result now normally if the user had not hacked it the query would return a result only if the username password matched in the user's table so if the user gave a wrong password it wouldn't allow login but now the password check has been commented out so the user is in trivially by just adding a quote and a dash dash how much easier can it be to break into a system so this is the problem with SQL injection attacks they are extremely easy to execute so how do you write code to avoid this we will see that in a bit as I told you the great cyber highs where millions of dollars were compromised was initiated totally from SQL injection a little bit later Sony had online games for their gaming console, PlayStation and somebody hacked into it again using SQL injection and was able to get credit card information login, password, all kinds of information about the entire set of users so it hurt the company enormously so SQL injection is extremely dangerous so how do you protect against SQL injection now all languages which are used to connect to databases have a notion of prepared statements or there are a few other ways but prepared statements are by far the easiest way of protecting against SQL so I am going to spend a few minutes talking about the JDBC API specifically whatever I tell you here has is equivalent in other APIs so if you use PHP or Python or anything else you would use a corresponding method over there so all these APIs for accessing databases have the following model you open a connection to the database you create a statement object and then you execute queries using that statement object and then there is an exception mechanism to handle errors I am not going to cover that so that is the basic model so here is a sample piece of JDBC code which shows the overall thing and in the next slide are the details which we are going to focus on so this is kind of the wrapper for what we are doing so do not worry about the specific syntax here it is loading some oracle driver first and opening a connection to this URL db.el.edu for 2000 connecting to db with the user ID and password that were passed in here using a particular protocol, JDBC oracle thin so all that is not of concern to us for security the connection is open in this statement and now from that connection we create a statement and the statement is used to do the actual work which is in the next slide you close the statement, close the connection and wrap up so that is a typical model for accessing a database from JDBC now here is the actual code which uses that statement variable to execute whatever is required so if you want to update the database you could write a thing like this statement.executeUpdate and here is a SQL string insert into instructor values blah blah blah so new tuple is added to instructor and that is an update similarly if you want to execute a query you could say result set r set equal to statement.executeQuery and here you have a small SQL query being executed and then here is a loop which loops over the result set and prints out the results of that query let us not worry too much about the details of the syntax but let us focus on this particular thing here you have a statement.executeQuery and a string and the problem is if a string is created using input from the user then we have a problem so here is a string which is being typed in it says select star from instructor where id equal to single quote plus user id plus single quote so whatever user id is typed here is enclosed in single quotes now the double quotes are the java language string terminators and the single quotes are the string terminators within sequence so this is how this particular query is constructed so here the quotes have been done properly sorry for the mistake in the earlier slide so now if the user instead of typing a real valid user id type something like just a single quote directly semicolon delete from r this is the example of SQL injection you have seen then the query becomes select star from instructor where id equal to quote that quote came from here and then a quote which came from the use this thing what the user typed in semicolon delete from r semicolon and then there would be a quote after that which is added here which might be a syntax error you could put a dash dash here to remove that quote also so now this is going to run two SQL statements select star and then delete so that's a second kind of vulnerability so now how do you find out if a particular application is vulnerable it turns out to be very easy in fact there are automated scripts for doing this so you have tools which will type in strings which look like this quote star star star into a box, why star star star that's not a valid SQL statement it will cause a SQL error and the quote will terminate a string if you had a vulnerable application and then this will cause an exception and the exception will immediately cause the application to output a message in some cases the application actually outputs the entire error message and it turns out that the error message often has very valuable information to the hacker what kind of information it tells the user what was the query it tells the user what were the tables in the database it tells them what are the columns so using this the hacker can now get more information and probe deeper into the database starting with no knowledge about the database so that's another problem printing error messages which the database has thrown to users is actually a security whole, many applications do this even today so how do you prevent this kind of an attack the simplest technique is to use prepared statements so how does a prepared statement work well it looks like this this is the JDBC prepared statement you say con.prepared statement and you pass the SQL query but in here you do not pass any values which are taken from the user so instead you put question marks for each value which is going to come from the user so this is a fixed string select star from instructor where id equal to question mark is a fixed string it cannot be modified by user input however there is a question mark indicating that this is a point where some user input is going to be provided and the next line here pstatement.setString1 to use already one meaning the first question mark in this query string that is being said to user id and then you execute this prepared statement so what this particular thing does is the set string looks for quotes in the string which is being passed in and adds escape characters so in particular if the user typed in quote semicolon delete from our blah blah what is going to happen the set string is going to add a backslash before the quote with the user typed in so backslash quote is treated by sql as an actual quote in the midst of a string so now this becomes the entire string the string becomes single quote semicolon delete so id equal to single quote semicolon delete blah blah which of course is not going to give anything useful and so this protects the system from SQL injection so this is by far the simplest way of avoiding SQL injection use prepared statement if you read up instructions on how to avoid SQL injection there are other alternatives which are provided for example you could use stored procedures and then you say call with a thing with strings which are enforced as strings but that's a little clumsy because you don't want to create a stored procedure for every query another alternative which is widely used is to write your own function or use a library function that adds quotes as above before you form the query now this is again not recommended because instead of doing this yourself leave it to the database to do it prepared statement will take care of all this why do you worry about how exactly it's to be done so this form actually tells the system what you want rather than how to do it by escaping quotes so it's better to do it at a higher level so you should never ever take a user input and concatenate it into a string directly you should only pass it to a database by using a prepared statement and set string, set int or whatever depending on the type of the input which you got from the user if you do this you are pretty safe from SQL injection now there are other vulnerabilities here for example passwords in scripts for if I told you about the database workshop hack at IITB this was what was done actually in this case there's a php I think but the same thing happens with jsp so these are scripts which are stored in a directory which is accessible directly from the web the only thing is that when the web server finds a file with suffix dot jsp or dot php it reads it as a program and executes it if it finds a file with any other suffix it simply displays the file so now what happens is if you edit the file in the directory with say emax editor which probably few people use today but it was very popular it creates a file with the same name with a tilde attached to it other editors do something like have a file called dot swp so it's something dot jsp dot swp so what happens is the hacker sees a particular URL let's say file one dot jsp and then types in file one dot jsp tilde since it's jsp tilde the server simply displays the file and if by chance the database user id and password was actually stored directly in that file you have a problem that file has been shown to the user who now has the database user id and password and this is what actually happened in the back in 2003 so there are a few things which you obviously need to deal with to prevent this first of all you should never store scripts such as java jsp in an area accessible using the basic HTTP protocol and second you should never store passwords in scripts store them in config files which are in a separate area which even if there's any error will never be revealed through the web server the web server does not have access to that area with the password so that's that's a basic thing which anybody developing an application has to take care of and finally even if the user if the hacker gets the database id and password they still have to connect to the database and make modifications there so another basic technique which must be used is at the database you list the particular set of IPs from which you can connect and execute query so this IP will include the application server but not anything else so if a hacker tries to connect from some other IP they will not be able to execute any queries from the database so even though they have the user id and password they still not able to go and modify data unless they additionally break into the application server itself and then execute query from there so that's a few things which any application developer has to keep in mind ok I think this is a good point to take a 2 minute break and see if there are any questions