What I am going to do is cover a couple of slides and then have a little bit of interaction, to show how this interactivity works. So, let us start with the next couple of slides.

The last slide I covered was on PHP, to give you a glimpse of what PHP is about. As I mentioned, PHP is used in many things, including Moodle, which many of you are happy with. And when it came to the quiz a few days back, almost all of you were very unhappy with the same Moodle system because of performance issues. So, part of what we will do later in this course is understand what goes on behind the scenes and what performance problems this system ran into, some of which may be related to PHP and some of which may be related to databases.

So, server-side scripting was one part. The other part is client-side scripting. Those of you who were around in the early days of the web, which today might be called Web 1.0, would know that the basic interactivity was in filling out a form and clicking on submit; then the screen blanks out momentarily while a fresh page loads, and then you see the results. But anyone who has used the web in the last 10 years or so is used to a completely different user experience, where you click on a button, almost all of the page stays the same, and a small part changes. This is part of what is called Web 2.0, which has many parts. One part is that it involves a lot of client-side scripting, in particular in the JavaScript language, which was actually defined very early in the life of the web but took off only some years after it was defined. It helped to bring a complete change in how people interact with websites and made the interaction much smoother, and it is what you see today. There are other client-side technologies too, Macromedia Flash and Shockwave, which you are familiar with.
In fact, the AVU system which we are using is based on Shockwave, and the slides which you are seeing are actually Shockwave objects. Whenever I change a slide, a message is sent over to your side to just move to the next slide. This makes a big difference, because the whole content has been loaded earlier. When I click on slide change, just a small message goes out, but all the detailed content of the slide, the text, the images, whatever, appears instantly on your screen. So, there are many uses for client-side scripting.

What I will do now is take some questions. We can take questions from any centre at this point, so anyone is free to interact. The questions can be on what you just heard or on any other topic which we covered earlier in this course. So, let me open the forum for questions. We have Amal Jyoti College in Kerala.

How do we connect to multiple databases in a single Java program?

That is a good question. So, the question is, how do you connect to multiple databases at one time in a single Java program? The first question is, why would you want to do something like this? The answer is that many times you want to take data from one database and put it in another, to copy some parts of the data, for example. So, you might want to have multiple connections open at the same time.

Of course, we do this in JDBC, which we have studied, and you will be using JDBC in the lab today. If you look at the JDBC API, you will see that the first step is to load the driver, the code which lets you communicate with a particular database. It will take me some time to switch over to the previous set of slides, so in the interest of time, let me just remind you of the two or three steps. The first step was loading the driver, with Class.forName or something like that, and after that you get a connection through the DriverManager. The idea was that each database defines its own underlying implementation of the JDBC API, which lets you, meaning your JDBC code, talk to that database.
The other trick is that you can load the code for multiple databases at the same time. When you open a connection, you specify which database, and then all further requests on that connection use that particular database's code. So, you can actually load the API implementation for multiple databases at the same time, which is kind of counterintuitive. Most of the time when you load a dynamic library, you have one implementation. Here you actually have multiple implementations all coexisting, and the appropriate one is called depending on which database you are talking to at that point in time. When you connect to another database, its implementation is used, and so on.

Sir, XML was widely used as a method for data representation until a few years ago, but nowadays it is not so popular. Can you tell me a reason why this is so?

That is a good question. XML was a standard which was developed for data exchange, and it is a markup language from the same family as HTML. But HTML was focused on displaying things; it was a markup for human-readable text, whereas XML was meant to be a markup language where the data could be consumed by other programs. And XML is still widely used. It has not been abandoned by any means. All our current document formats, docx and all the OpenOffice formats, are based on XML. They are compressed, and they are called something else, ODT, docx or whatever, but internally they use XML. So, XML is very widely used.

But for the purpose of applications interacting with other applications, XML was supposed to be the standard, and it turned out that parsing XML, converting it to objects in whichever programming language you are using, and converting back to XML all had some overhead. So, it is very good as a standard, but for coding it takes more overhead.
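To make that overhead concrete, here is a minimal sketch in Python (chosen only for brevity; the lecture's own setting is JavaScript and Java). It round-trips the same record through JSON, a more direct text serialization of objects, and through XML. The record and its field names are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this example.
student = {"id": "12345", "name": "Asha", "credits": 12}

# JSON: one call in each direction, and the result is already a dictionary
# with the original types.
json_text = json.dumps(student)
from_json = json.loads(json_text)

# XML: the tree has to be built element by element...
root = ET.Element("student")
for key, value in student.items():
    ET.SubElement(root, key).text = str(value)
xml_text = ET.tostring(root, encoding="unicode")

# ...and walked again to get an object back. Every value comes back as text,
# so the integer field has to be converted by hand.
parsed = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in parsed}
from_xml["credits"] = int(from_xml["credits"])
```

Both routes recover the same record, but the XML route needs explicit code for building, walking and type conversion, which is the kind of overhead being described.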
So, take one of the intended purposes. In Web 2.0 in particular, you have data being exchanged between the scripts which are running in the browser at the front end and your application which is running at the back end. If the data they exchange is just tuples in the relational model, that is a little painful. You are programming in JavaScript, which has its own object model. And people in the JavaScript world came up with something called JSON, which is JavaScript Object Notation, which is simply a way of taking JavaScript objects and serializing them as text. It turned out to be very convenient to send these serialized JavaScript objects to the web server, and when the web server wants to send data back, it sends these JSON objects. So, now the overhead of converting from JavaScript objects to XML and back is all gone. It is a lot easier to program, there is a lot less chance for mistakes, and in fact it is a lot more efficient. So, JSON took over from XML for all these things, and today Web 2.0 is dominated by JSON.

Then people said, look, if you already have JSON between the front end and the application, maybe we can store JSON in the database also, instead of XML. So, there are many data storage systems which use JSON as the underlying representation. And the goal is similar. Why JSON, why not relational data? Well, there are some complex data types which are convenient to store as objects. XML was the initial representation; now it has moved to JSON.

Let us go back to our slide. We had client-side scripting, which I already talked about. I mentioned JavaScript already, and I said it is very widely used in Web 2.0. So, what are the kinds of things you can do with JavaScript? It is an interpreted language which runs in your browser. What do you use it for? Initially, it was used for simple things like user input validation.
So, if I expected a date or some such thing, I expected it in a certain format, and the script could check that. If I expected the user to enter a city, the number of cities is very large, so maybe I could take the first few characters of the city name and automatically show a list of candidate cities from which the user could select. This is ubiquitous now. In pretty much any app, if you are booking things on Cleartrip or MakeMyTrip or any of the other sites, you type MUM and it will give you various expansions, with Mumbai being one of them. What it has done is twofold. It saved you the trouble of typing all the letters, and it also means the site does not have to worry about what happens if you type MUMBAL or something, which is presumably not a valid place, after which it says, no, I do not understand, please correct it and tell me. All that is gone. You can now interact and just click on one of the options which it offers you. So, it is a very nice way of interacting, and that is also part of what JavaScript does.

Lastly, it can change what is displayed on the page. This is again something which is very widely used. In Gmail, Yahoo Mail, Hotmail, Rediffmail and so forth, if you click on something, the page changes, but not all of the page. Some part of the page changes, and the remaining parts of the page do not even flicker. They are rock steady. What happened is that your JavaScript program actually changed the HTML which is being displayed and then told the browser, please refresh this part of the display; you can leave the other parts alone.

So, how does the JavaScript program access the HTML, and how does it modify it? All of this is done by having an object model of HTML, a tree model, called the DOM, or document object model. So, this is a tree which you can access from your program.
If you are familiar with Java or C++ or any other such language, you know you can have nodes which form a tree. They have child nodes, parent nodes and so forth. Each HTML element becomes a node, and HTML elements can be nested inside each other, so you have children and parents. The DOM presents the HTML data as a tree, and the JavaScript interface to the DOM lets you traverse that tree and make changes to it. You can snip off a part of the tree and replace it with something else. You can insert something as a child of an existing node in the tree, and so forth. The result is that the HTML which was to be displayed on the page changes, and the browser simply recalculates what has to be shown and shows it.

There is also a technology called Ajax, which originally stood for Asynchronous JavaScript and XML. The XML part is now vanishing from Ajax because JSON has more or less taken over, but the acronym stays, and so does the idea of Ajax, which is really a way of doing things. It is not a specific language or anything. It is a way of making your web pages interactive by doing asynchronous submission.

So, what is asynchronous submission? You click on a button. The web page does not freeze. You can continue to interact with the web page. You can scroll down, you can do other things, you can continue entering text in boxes, whatever. Meanwhile, something is happening for the thing which you clicked on. What is happening? An HTTP request has been sent in the background to the web server, which takes that request, processes it and sends data back. Now, this is not necessarily HTML. It could be JSON data. It is not replacing the contents of the page directly. What is happening is that your JavaScript program internally sent a request, and it uses the response it got back, typically as JSON, to update your HTML page. So, this is the standard model of interaction. What is asynchronous here?
Asynchronous means that when you clicked on a button, the request got sent off, and you continued working with your page, moving around. The request is asynchronous; it does not block what you are doing. When the result comes back, the web browser gets the result. The browser has multiple threads, so there is a thread which realizes that a result has come, and it invokes the JavaScript which originally made that request and lets it continue and process the result which came back. Now, the page changes smoothly without any pause in what is going on. That is how you have interactivity in a nice seamless way, with no glitches, no pauses and so forth.

A few more examples of this interactivity: when you select a country name in a drop-down menu, the list of states or cities within that country changes automatically. When you go to a movie booking site and select your city, the list of theatres in the city automatically appears. All of this is part of Ajax, the framework.

I have a small sample of JavaScript code here. You are not required to write JavaScript as part of this course, but when I run this course for my students at IIT Bombay, I do include JavaScript programming as part of the course. There we have more time; here it is compressed. What I get them to do is write JavaScript code which makes their application interactive. Using plain servlets, your application is interactive, but not in the way that the current Web 2.0 applications are. You click on submit, your page blanks out, a new page comes. That is the mode of interaction. With Web 2.0 and JavaScript, it can be much more seamless.

So, I have a piece of code. I am not going to explain it in detail, but you can read it offline. It has a form with an action and an onsubmit attribute. This part here, onsubmit, says return validate(). What happens when you submit is that it calls a function written just above, inside a script element whose type is text/javascript. Within this element, you have JavaScript code.
There is a function here which is written in JavaScript. This function gets invoked, and the function can do various validations. Here, it is checking if the credits value is between 0 and 16. If it is outside this range, it will say, sorry, it is an error, please correct it. So, this is an example of validation using JavaScript. There are other things you can do with it, as I said.

I should add a word of caution, though. While there is a standard for the JavaScript language, the exact way in which it interacts with the browser, in particular the DOM and so forth, has some browser-specific dependencies. What this means is that if you write code which works on Firefox, it may not work in exactly the same way on IE, and something that works on IE may not work on Chrome, and so forth. Initially, this was a big problem. It meant that a website would only be accessible from IE. You would see websites which said, use only IE to access this, and if you were using Linux, you would curse that website and say, you idiots, I am using Linux, can't you see? I can't use IE.

Those days are mostly gone now, because there are a number of frameworks, one of which is called the Yahoo User Interface library, YUI, and another called jQuery, and there are others, one of which I think is partly from Google. These frameworks provide JavaScript libraries, and what these libraries do is insulate you from the DOM for the most part. You can write code using those libraries which is identical for every browser. You don't have to care about the browser, but when that code actually runs in the browser, of course, what it does has to vary depending on the browser. How does this magic happen? The magic is that the browser can be asked, please tell me what type of browser you are. Are you IE, are you Mozilla Firefox, are you Chrome, are you Safari? The browser replies saying what it is, along with its version number.
The library looks at the response and uses the appropriate JavaScript depending on that response. So, you don't have to worry about it. The library takes care of customizing things for your browser; you just write the code once, and it runs on all the browsers in a completely seamless way.

The flip side is that even today it is very easy to build websites which will only work on one browser. The reason you don't see this kind of problem much is that people have got used to using these frameworks, YUI, jQuery and others. If you ever use JavaScript, I would strongly urge you never to do raw JavaScript coding, and never to pull JavaScript code from random sites on the net. It's terrible; it doesn't work.

As an example, there's this clicker software. I tried that code from Chrome, and it didn't work. So I asked the person, what is going on? He said, yeah, I know, it works on Firefox; please use Firefox, don't use Chrome. For the clicker that is okay. I want to use the clicker, so I'm willing to take that hit. But if you're a website which wants to attract customers, you would drive away many people if you did such things. The solution is not to use raw JavaScript. The solution is to use one of these frameworks. That developer was not aware of this, and hopefully he will go back and change the code to use one of these frameworks, so that this kind of problem will not arise in the future.

I want to wrap up a few more things. First, a few new topics, starting with application architecture. We have been talking about the three-layer architecture, with applications, databases, the web server and so forth. But within the application itself, there are multiple layers. For example, what you present to the user is part of the presentation layer, and this is sometimes called the view. This is part of what is called the model-view-controller architecture. The view is how things are presented to you. The model is the business logic.
Your application program implements the business logic. When you submit a form, the application program decides what has to be done and does it. That is part of the model. The view is what is shown. The controller is the part of the system which receives events, invokes the model to carry out tasks and then returns a view to the user.

The idea is that your app does not have to be rewritten in order to allow access from a mobile phone with a small screen versus a browser on a computer with a large screen. The presentation changes. The model should not change. What you are doing is the same. You are accessing email, you are using Facebook; it does not matter what you are accessing. You should not have to rewrite the underlying implementation every time you come up with a new way of accessing it. So, this separation is actually very, very important in the real world. We do not illustrate it in this course because there is a lot of overhead in setting up these frameworks, but in the real world these are standard.

There is usually another layer below the model. The model, I said, is the business logic. The business logic itself is really separated into the part which is pure logic and a data access layer, which mediates between the business logic and the underlying database. The data access layer could use SQL directly, or it could use some other framework, which I will come to in a moment.

So, this slide shows what we just talked about: web browser, internet, controller. The request comes to the controller. The controller talks to the model, which accesses data, goes to the database, gets results back and returns them to the controller, which then sends them to the view to generate an appropriate display. That display is given back to the controller, which sends it back to the browser. This is a very common architecture.

Now, the data access layer. Data access could be done using raw SQL, or you can use various APIs which hide the SQL details.
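As a rough sketch of this layering (not the code of any particular framework), the following Python example keeps the SQL inside a data access class, the logic inside a model, and offers two interchangeable views. All class, function and table names are hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


class StudentDao:
    """Data access layer: the only part that knows SQL."""

    def __init__(self, conn):
        self.conn = conn

    def find_name(self, student_id):
        row = self.conn.execute(
            "SELECT name FROM student WHERE id = ?", (student_id,)
        ).fetchone()
        return row[0] if row else None


class StudentModel:
    """Model: business logic, independent of how results are displayed."""

    def __init__(self, dao):
        self.dao = dao

    def lookup(self, student_id):
        name = self.dao.find_name(student_id)
        return {"found": name is not None, "name": name}


def desktop_view(data):
    """View for a large screen."""
    if not data["found"]:
        return "<h1>No such student</h1>"
    return "<h1>Welcome, " + data["name"] + "</h1>"


def mobile_view(data):
    """View for a small screen: same data, shorter presentation."""
    return "Hi " + data["name"] if data["found"] else "Not found"


def controller(model, view, request):
    """Controller: receives the event, invokes the model, returns a view."""
    return view(model.lookup(request["id"]))


# Hypothetical data, just enough to exercise the layers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES ('S1', 'Asha')")
model = StudentModel(StudentDao(conn))

desktop_page = controller(model, desktop_view, {"id": "S1"})
mobile_page = controller(model, mobile_view, {"id": "S1"})
missing_page = controller(model, mobile_view, {"id": "S9"})
```

Supporting a new kind of device here means adding one more view function; the model and the data access class stay untouched.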
The basic idea which is popular today is to use what is called object-relational mapping. Your business logic is written in Java or some other language which has an object model, and you need a mapping from data in the database to these objects. How do you do that? The standard way in which this would be done is that you write JDBC code. Your JDBC code extracts individual attributes from the database tuple, then creates an object, fills it in and passes it to the upper layers of the application. This was fairly common. This was the usual way of doing business. Sometimes you can take the data and print it out directly, but sometimes you would fill it into an object and pass it back to an upper layer to use.

The idea with an object-relational mapping framework is that instead of writing code again and again to do this mapping from SQL tuples to objects, you define the mapping once. In fact, the mapping is defined using some XML notation. It is not even a programming language. It is a much simpler way of defining the mapping. For example, the Java class Student could be mapped to a relation student, with attributes of the class mapped to attributes of the relation, and you can have more complex mappings.

Now, the application opens a session, which connects to the database. Which database to connect to, and what the user ID and password are, is also part of this configuration. The mapping and the database details are all part of the configuration file. When you want to retrieve an object, you can say something like student.get and provide the primary key of the student. Every object has to have a primary key. So, you pass the primary key, the student ID or whatever else, and the API will actually run an SQL query, something like select * from student where ID equals the value which was provided, fill in the object attributes and then return the object. So, the programmer simply wrote something like student.get with an ID, and the result is an object of the class Student.
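The retrieval step can be sketched in a few lines of Python. This is a hand-rolled illustration of the idea, not Hibernate's API: the `Session` class, the `MAPPING` dictionary (standing in for the XML mapping file), the `Student` class and its columns are all assumptions made for this example, and SQLite stands in for the real database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Student:
    id: str
    name: str
    tot_cred: int


# The mapping is defined once, as data rather than as code:
# class -> table, primary key column, and attribute -> column.
MAPPING = {
    Student: {
        "table": "student",
        "key": "id",
        "columns": {"id": "id", "name": "name", "tot_cred": "tot_cred"},
    }
}


class Session:
    """Hypothetical session object that turns get() into a SELECT."""

    def __init__(self, conn):
        self.conn = conn

    def get(self, cls, key):
        m = MAPPING[cls]
        attrs = list(m["columns"])
        cols = ", ".join(m["columns"][a] for a in attrs)
        # Only names from the fixed mapping are formatted into the SQL text;
        # the key value itself is passed as a bound parameter.
        sql = f"SELECT {cols} FROM {m['table']} WHERE {m['key']} = ?"
        row = self.conn.execute(sql, (key,)).fetchone()
        return cls(**dict(zip(attrs, row))) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, tot_cred INTEGER)")
conn.execute("INSERT INTO student VALUES ('S1', 'Asha', 12)")

session = Session(conn)
found = session.get(Student, "S1")
missing = session.get(Student, "S9")
```

The caller never writes SQL: it asks for a `Student` by key and gets back an object, or `None` if no such row exists.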
Now, what if you want to update the data? There are ways to modify fields in the Student object. This is a Java object, and then you can say session.save with the object. What happens is that any updates you made to that object are copied back to the database. Actually, this call just marks the object for saving. When you commit, that's when the changes are saved back to the database.

All of this is part of object-relational mapping, and Hibernate is an object-relational mapping system which is very widely used these days. Again, we didn't have time in this course for labs on Hibernate, but it would be nice to ask students to do this, because many people are using it. Interested students can, maybe as part of a project, download Hibernate and build a small application using it.

There are other things which avoid SQL. The Entity Data Model is something developed by Microsoft which provides, not exactly the entity-relationship model, but something very close to it, directly to the application. The mapping to relations is done internally. The data can be stored in a regular relational database, but your program can work in terms of entities and relationships rather than SQL tuples. You can write queries directly on these.

A few more topics. Web services: you must have heard the term web service. So, what is a web service? It's basically a function which you can invoke over the web, passing some parameters. It executes something and returns some information, which could again be structured in various ways. Originally, web services were supposed to be used with XML in both directions. You send the request as XML, and the response which you get is also XML. But there was a lot of work involved in coding these web services. So, something else which became popular was what are called REST services, for representational state transfer. Here, your request looks just like a regular HTTP request.
So, you have an HTTP URL and you have parameters. It is not just like HTTP; it is HTTP. You pass your parameters as part of an HTTP request. Normally, the HTTP request would have given an HTML page back for display. The only difference is that here what is returned is not an HTML page, but XML or even JSON data. And this result is not supposed to be displayed directly to the user, but processed by your JavaScript or maybe even your Java code. You can invoke REST services from Java code, which can process the result as data and then do whatever it wants with it. So, this is very widely used. Just as a contrast, Big Web Services is the term used for the original ones, which use XML in both directions. They are also used, but the newer ones tend to be simpler.

I have some slides on rapid application development. It turns out that the web was two steps backwards in terms of how easy it is to build applications. This is kind of counterintuitive. If you go back 20 years, there were some very nice tools for building form-based applications. They were called rapid application development tools, or RAD tools, and they made it very easy to build client-server applications. For example, if you wanted to build a bank interface, you could use a RAD tool and, with very minimal programming, build an application with many forms, formatting and so forth. Strangely, once the web took over, all those tools bit the dust, because nobody was building client-server applications any more. And equally strangely, nobody developed really good rapid application development tools for the web for a long time. I do not know exactly why this happened, but it did happen.

Of course, people did not ignore it forever, and Microsoft was one of the early ones. Visual Studio provides a number of features which make it easy to build applications without a whole lot of coding.
You can drag and drop controls, which then get translated into regular code which actually builds the interfaces. So, Visual Studio was an early mover in this space, rapid application development for the web. Later on, I think NetBeans tried to do something like this. That project did not succeed very well and kind of died. But there are other frameworks which are not necessarily drag and drop, but which provide an infrastructure that makes it easier to build application interfaces. There is Ruby on Rails, JavaServer Faces and so forth. I will not get into the details.

I already mentioned Visual Studio. ASP.NET is the part of the technology which provides controls that are interpreted at the server and generate HTML code. So, this is a nice framework for developing these applications. Visual Studio did suffer in popularity because whatever it generated tended to work best on IE and not all that well on other platforms. I think they have cleaned up their act now, but the big problem is that you still have to pay the money and use Windows as the platform.

I think this is a good point to take some questions. Then I will get back to two major topics, application performance and security. I want to spend time on both of these topics, but before that I will be happy to take questions. There have been a number of questions on chat, so I will take up the chat questions, but first let me take a few live questions. We have PSG College, Coimbatore. Go ahead, please.

How can we perform relational database migration, that is, the translation of schema information from one database into another database? Is it possible to create an algorithm for relational database migration?

When you say database migration, you mentioned schema. There are different types of migration. One type of migration is where I migrate from MySQL to PostgreSQL. I keep the same schema, but I am changing the underlying database.
We have some experience with that. We migrated from MySQL to PostgreSQL for our academic data many years ago. At that time, MySQL's functionality was quite bad for joins and so forth. We suffered from it for a while and decided to migrate to PostgreSQL. I think MySQL is a lot better now, but PostgreSQL was better then, and I think for that application domain PostgreSQL is still much nicer, which is one of the reasons we are using PostgreSQL in this course.

The migration there involved many low-level details. For example, MySQL string comparison would always ignore case. In PostgreSQL, case sensitivity was guaranteed. That is actually how the SQL standard defines it: data is case sensitive. So, MySQL would even allow weird things. I would have a column called department code, which would be capital CS in the referring tuple, and in the department relation it might be small cs. It would actually allow those to match too. It did not even enforce foreign keys. That was another problem: there was a lot of data which violated constraints, and it had to be cleaned up.

So, there were many practical issues in migrating, including cleaning up the data, removing invalid entries in tables which did not satisfy foreign key constraints, and changing everything consistently; I think we changed it all to upper case. So, there were a lot of practical difficulties in migration which we had to deal with. Again, some queries which would run on MySQL would not even parse on PostgreSQL, so we had to rewrite the queries also. There is a good amount of work in migrating from one platform to another.

This is part of the reason that people like Hibernate: there is much less overhead in migration. Your Hibernate code can be retargeted to a new database, and the code will work.
Of course, you still have to move the data; that headache remains. And then you have to deal with inconsistencies like the ones I told you about. That headache does not go away. It is just that rewriting the queries is no longer an issue.

The other part is, if you change the schema, how do you migrate the data? This is very specific to the schemas which you use. What you have to do is decide the mapping from the old schema to the new schema. This is a standard problem which arises whenever you change the application. For example, in IIT today, we have all our financial data running on an application which was built by TCS. Earlier, we had similar systems built on FoxPro, which stored data as tables, although it did not support SQL, did not support foreign key constraints and so on. But they were tables. So, we had one schema in FoxPro and a different schema in the system which TCS built.

Mapping data from one schema to the other required a lot of work. We had to figure out the mapping. It was very application specific. I cannot give you a general-purpose algorithm for this. We had to understand what real-world objects were being modelled in FoxPro and what the equivalent real-world objects were in the TCS system, and then for each such object we had to write code to do the mapping. Take data from this table and put it in another table, do these transformations, take data in this table and break it up into these other tables. So, there was a lot which had to be done.

Some of this we could do in an easy way, by simply copying the FoxPro tables into Oracle and then writing Oracle queries which would read the FoxPro tables and put data into the tables that the TCS system used. But some of these transformations became extremely hairy. Coding them as SQL was possible, but extremely difficult.
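For the easy cases, the "copy the old tables in, then write queries into the new tables" approach might look like the sketch below. The tables, columns and rows are entirely hypothetical (they are not the FoxPro or TCS schemas), SQLite stands in for Oracle, and the example also folds in the upper-casing cleanup mentioned earlier for department codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the constraint

conn.executescript("""
    -- Old schema: one wide table, copied in as-is, with inconsistent case.
    CREATE TABLE old_student (roll TEXT, name TEXT, dept_name TEXT, dept_building TEXT);
    INSERT INTO old_student VALUES
        ('1', 'Asha',  'CS', 'Building A'),
        ('2', 'Ravi',  'cs', 'Building A'),
        ('3', 'Meena', 'EE', 'Building B');

    -- New schema: the department data is split out into its own table.
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT);
    CREATE TABLE student (
        roll TEXT PRIMARY KEY,
        name TEXT,
        dept_name TEXT REFERENCES department(dept_name)
    );
""")

# Break the old table up: departments first (deduplicated, case normalized),
# then students, so that every foreign key has something to refer to.
conn.execute(
    "INSERT INTO department "
    "SELECT DISTINCT UPPER(dept_name), dept_building FROM old_student"
)
conn.execute(
    "INSERT INTO student SELECT roll, name, UPPER(dept_name) FROM old_student"
)
conn.commit()

departments = conn.execute(
    "SELECT dept_name, building FROM department ORDER BY dept_name"
).fetchall()
students = conn.execute(
    "SELECT roll, name, dept_name FROM student ORDER BY roll"
).fetchall()
```

A mapping this simple fits comfortably in two INSERT ... SELECT statements; the difficulty described here starts when one old row has to be reshaped in ways that plain queries express poorly.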
That is when we realized some of the limitations of the SQL language. It is fantastic up to a point, but when things become complex, it is actually very hard to understand what is going on. So, at one point, for the most complex transformations, we actually gave up on SQL, or rather on declarative SQL, I should say. We wrote stored procedures which would do the transformation, and that worked quite nicely for some of those complex cases.

We have JNTU Hyderabad. JNTU, please go ahead.

My question is, you said you were developing an application in a three-layer architecture. I want to know the basic difference between the three-layer architecture and the N-layer architecture, and how exactly that enhances the performance of an application.

The question was about the three-layer versus the two-layer architecture, and how that enhances performance. So, let me bring the slide back. This is the three-layer architecture, where we had the web server and the application server separate. In the two-layer architecture, the web server and application server get merged.

So, why does this enhance performance? If you look at the three-layer architecture, there is a process which receives the request. It then sends the data over to another process, which does some work and sends the result back, and then the first process sends it back to the browser. So, what you have here is extra processes and inter-process communication. This introduces delays, because process switching takes some time. There are overheads to process switching, and the number of process switches you can do per second is also relatively limited.

None of this will matter for a web server with low load. It does not really matter which architecture you use. But if you are pushing the boundaries and want to get as good a performance as you can from the hardware which you have, the second architecture reduces the inter-process communication and context switches and improves performance tremendously.
As a related, though not exactly direct, example: long ago, which means four or five years ago, whenever GATE results were put up (many of you might have written GATE at that time), if you ever used the GATE online interface to check your score, you would have found that on the day the results were announced that interface was useless. You would go to it and you could not get anything out of it. The system would hang. You could not even connect to the server most of the time. What was going on? It turned out people had used a CGI program. There was a web server, and it would actually run the Unix grep program, which would search for your roll number in the file with the results. If there was a line matching it, it would output the line. If there was no such line, it would say it did not find a match. So every single request turned into a fresh process that was launched, and each process launch carried some overhead. With the hardware of that day, you could process maybe a few hundred requests in a second. Actually, the grep process itself was very slow, so it was even worse: you could process a few tens of requests a second. So in the first hour, which is when all the candidates wanted to find their results, how many could you process? If you could do ten per second, you could do 36,000 requests in one hour. That is not at all enough. In that era, GATE already had more than three lakh people writing it. Today it is more than a million. So only about one tenth of the requests could be processed in that time, maybe one fifth if you allow for the fact that not everybody used it immediately. So the performance was terrible. The system would basically just hang. The number of connections was exceeded, and all kinds of problems would happen.
Then we recoded that same application using Java servlets. What happened here? Each request was now simply a fresh thread. The doGet method was invoked with a new thread.
The thread creation time is really minuscule, and we did some benchmarking on what we could handle in that same hour. So, coming back: the issue was that the system using CGI was terribly slow. The new system could handle all the applicants coming within the first hour, and it could handle them easily using a very modest desktop machine. Our benchmarking showed that. Everything was loaded in memory, so it was really fast. It is another story that I think it did not quite get used. People found some other workaround which was kind of tolerable: they bought more expensive machines and paid a lot of money to do something which we could have done using a simple piece of code on a desktop machine. So the solution is not always to buy more hardware; it is to write your program in the most efficient way possible. It was not hard to do either. It was only a little bit of work.
I have already started talking about application performance, so this is a good lead into this slide. First of all, switching from CGI to Java servlets and loading data in memory was a huge performance win. But for a generic website there are a number of other techniques which help improve performance. One of these techniques is caching. If you have parts of a page which have to be shown to many users, you can cache them. That is one kind of caching. Another kind of caching turns out to be very important for applications which make heavy use of database connections. Whenever a request comes in, your typical application might open a JDBC connection to a database, process the request and close the connection. Now, I said creating a thread is very cheap. But opening a database connection is not cheap. There is a lot of overhead in communicating with the database. The database takes the login and password, processes them and tells you, okay, now send me your query. All of this takes time.
So the number of database connections which you can open and close within a second is rather limited. You really do not want to open a new connection for every request that comes in. Instead, what is done is called connection pooling. Your application server keeps a number of open connections to the database. When a request comes, instead of opening a fresh JDBC connection, it is given one of the existing JDBC connections. It sends its query on that connection, gets the query result and moves on. And when it is done, it does not close the connection. It returns the connection to the pool. This is a huge performance improvement over opening and closing connections.
Another technique which is used is caching of database query results. You can also cache entire fragments of a web page. What do I mean by caching fragments of a web page? You might have contacted the database, filled in something and created a part of the web page which has to be given to many users. Suppose the content does not change very often: instead of going back to the database every time and refilling it, you update it every five minutes. Take the weather, for example. You do not have to go and check the weather on every single request. So you cache that part of the web page which you have precomputed, further requests show that cached part for five minutes, and then you refresh it. That is caching of generated HTML.
Another kind of caching is query result caching. You executed a query on the database. You do not expect the result of the query to change, so you keep it locally and reuse it. This is a common thing. For example, I have a department code in the database, and I have a separate master relation which has the department name. I could join with the department master table, and that is often done, but sometimes I want to just look up the name in the application program and print it.
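The two ideas described above, handing out connections from a pool instead of opening one per request, and reusing a precomputed page fragment for a fixed time, can be sketched as follows. This is a minimal illustration, not how any particular application server implements it: ConnectionPool and TimedCache are hypothetical names, sqlite3 stands in for a JDBC data source, and the five-minute lifetime comes from the weather example.

```python
import queue
import sqlite3
import time


class ConnectionPool:
    """Keeps a fixed number of open connections and hands them out on request."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue()
        for _ in range(size):
            # The expensive open happens once, up front, not once per request.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self):
        # Blocks if every connection is currently in use.
        return self._idle.get()

    def release(self, conn):
        # The connection is returned to the pool, not closed.
        self._idle.put(conn)


class TimedCache:
    """Caches a computed value (for example an HTML fragment) for ttl_seconds."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # First use, or the cached copy is too old: recompute and restamp.
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value
```

A request handler would call acquire, run its query and call release in a finally block, so that a failed request still gives its connection back. A weather fragment would be wrapped as TimedCache(build_weather_html, 300), where build_weather_html is whatever hypothetical function contacts the database and builds the fragment.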
So if I cache the result of the query which finds the department code and name of all the departments, then my application program can simply look up the department name and print it. It is a cached query. Now, of course, the data may get updated. If it gets updated, what do you do? You could reload your application, and this is acceptable for things which change very, very rarely. If something is going to change more often, your caching system may provide a way to purge its cache and reload it. So there are ways to do that.
Oh yes, one last part of caching. All of this caching is at the web or application server. There is another kind of caching which is done in your browser itself. Your browser caches HTML fragments, and if you are using a web proxy to connect to the outside world, the web proxy also does caching. This is unrelated to the server-side caching.
Now let me come to the last major topic for this part. I had also hoped to do a bit on database storage, as opposed to application development. Let us see if we have time for that, but mostly we do not. So what I will do is start on storage only tomorrow, and today I will use whatever time is remaining to finish this up and then take questions.
So the first topic in security is SQL injection. I gave you a small example of this when we did JDBC, and I said that any user parameters which you take should never be concatenated into an SQL query string. Instead, you should use prepared statements and set the parameters. Let me refresh you on this, because it is a very important topic. If you did not understand it last time, I am covering it all over again. So here is an example of a query built by concatenation: "select * from instructor where name = '" + name + "'". The user was supposed to type a name, and the system was supposed to give information about that instructor. In the sample here, in place of a real name the user typed X' or 'Y' = 'Y, with no closing quote.
So when this is concatenated, the query you get is: select * from instructor where name = 'X' or 'Y' = 'Y'. That is, name equal to the string X, or the string Y equal to the string Y. Of course the string Y is equal to Y, so the condition is true, and the query would output information about all instructors. Now, what is wrong with this? There are many hacks which you can do with this. Instead of just showing the names of all instructors, which is relatively benign, the user could have typed X'; update instructor set salary = salary + 10000; -- and what happens then? The X followed by the quote closes the first SQL query, the semicolon is a separator, and the next part is another SQL query, which the JDBC system would allow. You can have multiple SQL queries in one string and it will execute all of them. Why is this not turned off? Because there are applications which use this feature. What else? Here you managed to update instructor salaries. You could have dropped tables. You can cause all kinds of havoc.
There are other subtle things you can do. There are many websites which, in order to authenticate a user, take the username and password and write a query such as the following. I do not have it on the slide, so I will use the whiteboard for this. A badly written piece of code to authenticate a user would look like this: "select * from users where username = '" + name + "' and password = '" + password + "'". This is a query created by concatenation. You start a single quote, then add the name, which was taken from a web form, let us say. Then, inside double quotes, there is the single quote to end that name, then "and password =" with a single quote to start the password string, then the password, and then a single quote to end that. If you want a semicolon, you can add that. This is an example of a very, very bad piece of code. Let me write that in big letters: BAD. Never write code like this.
So what is the problem here? The hacker would type a name which looks like, let us say, x' or 1 = 1; -- and this is what was typed into the name box. So this is the name and this is the password. It does not matter what you type into the password box. It is going to be ignored anyway, because everything after the two dashes is treated as a comment. So what happens now? The query which is actually executed is: select * from users where username = 'x' or 1 = 1. So it is actually going to return all users. Now, would that not confuse the application? The point is that the application might have embedded this query as part of an exists query. The idea was: if there exists a row in the users table which matches the username and the password, then you allow the person in. You say, okay, we have authenticated you, I know who you are. That was the exists query using this. But what this hacker has done is subvert the system into running a query which is effectively select * from users where true, and this result will never be empty. So this test will be passed immediately. By typing in a few characters, the hacker has completely bypassed the user ID and password check. In a few seconds the hacker could have logged into your system using anything. In this case he logged in as x. He could also have logged in as whoever he wanted, by replacing x with any string that he wanted. So the hacker has now become anybody and can do anything on the database. The hacker no longer even needs to do SQL injection. He is logged in as somebody and can do whatever that person could have done: transfer money, modify account balances if that person is a teller, whatever. It is a complete security nightmare. So first of all, prepared statements would have avoided this. Let us go back to the prepared statement. If used properly, it would have avoided it.
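The login bypass described above can be reproduced in a few lines. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 instead of JDBC, the users table and the account alice/secret are made up, and the semicolon is left out of the injected text because sqlite3 executes only one statement per call. The ? placeholder with separately passed values plays the role of a JDBC prepared statement.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'secret')")
    return conn


def login_vulnerable(conn, name, password):
    # BAD: user input is concatenated into the text of the SQL query.
    query = ("SELECT * FROM users WHERE username = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone() is not None


def login_safe(conn, name, password):
    # The query text is fixed; the values are passed separately as parameters.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None
```

With the name x' or 1=1 -- and any password at all, login_vulnerable reports a successful login, because the condition becomes always true and the password test is commented out. login_safe treats the same text as an ordinary, non-matching username.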
If you use it in a stupid way, you are going to be in trouble again. Look at this one: conn.prepareStatement, and here the string is again being created by concatenation. As soon as you create an SQL query where you take user input and concatenate it, you are vulnerable to SQL injection. You should never do this. Instead, what should you do? You would have something like prepareStatement where the query is select * from users where name = ?, which is just a fixed string, and then you would have stmt.setString(1, name). So what happens if the user types in a name with a quote? The JDBC API will insert an escape character before the quote. The user typed in a single quote in an attempt to close the SQL string, but the JDBC API adds a backslash before the quote, so it is treated by SQL as an actual character, not as a quote which terminates an SQL string. So if you use prepared statements properly, you can avoid SQL injection.
There is another security hole which is exploited quite a lot, and that is called cross-site scripting. The idea is as follows. There are many websites which display images from other websites. In fact, advertisements are typically displayed this way. When you visit a particular website, you see ads. Did the ads come from that website? No. There is actually a piece of code, which is maybe a frame or whatever, and in there it says load this frame from doubleclick.com or Google ads or Bing ads or any other site which sells ads. As another, even simpler example, images are usually embedded in a page by having an HTML tag that looks like this: img src = and then a URL. The double quotes are missing from this slide, so the URL should be enclosed in double quotes. Now, the normal use of this would have been to download an image from that site, but look at what we have put in here. Does this look like something which is really an image?
This thing says mybank.com/transfermoney?amount=1000&toaccount=14523. Now what on earth is this? This is a URL which presumably is used to actually do a money transfer. If I had logged into mybank.com and I wanted to transfer money, I would execute a request which looks something like this. I do not type this. I enter an account number and I say transfer money from my account, and this is the thing which gets executed. Now suppose I had not logged into the bank, and I visit a website which serves me an image tag which looks like this. What will happen? This URL will get fetched, meaning my browser will send the request to the bank, and the bank will say sorry, you are not logged in, and reject it. So no harm is done. But the catch is that if you had actually logged into the bank and then you went to another site which had a piece of code like this, you would actually end up transferring money to that account, assuming the URL is a GET URL and so forth. What about POST URLs? Well, you can write a JavaScript program which invokes URLs using the POST method as well, and do similar hacking. So with JavaScript you can do even worse things. So the risk is that whenever you go to one site, that site can make you perform an action on some other site. This is called cross-site scripting, or XSS, and it is also called cross-site request forgery, XSRF or CSRF. There are many terms used for this. So this is a small example of how you can hack one website so that when a user visits that website, a request gets executed on behalf of that user, by that user's browser, on some other website, and causes all kinds of havoc. So how do you prevent this? In fact, the most insidious form of this is comments. Many websites allow you to leave comments, and when a new user comes, the comments are shown to that user.
The problem is if somebody wrote a comment which included a piece of text like this, img src = blah blah blah. That was a comment which the user typed in, and what happened? The system saved the comment, and when a new user came, a perfectly honest system, say the website of some newspaper which allowed comments, showed this user comment, which included a cross-site scripting attack. So some hacker has used the website of a newspaper, let us say. The website itself was not compromised. It was an honest website, but a comment that the hacker put in caused me to get into trouble when I visited this perfectly honest newspaper website. That is extremely problematic. To prevent this, the standard thing which is done is that any user input taken by a website is first stripped of all tags like this. Anything with a less-than sign and so forth is stripped. So what you get is text which will display as is in HTML. It will not be treated as an HTML tag which gets interpreted and executed in some way. This removal of all HTML tags, or at least restricting the text to a very limited set of HTML tags, is called cleansing of user input. There are APIs to do this, and it is done on any user input. So that is what you do to protect your website: if you are that newspaper, you make sure that all these things are stripped from user comments, to prevent somebody else from being attacked through you.
What about Moodle? Moodle allows you to enter text which is then shown to somebody else. When does this happen? It happens often. If you have an assignment, Moodle has a feature which allows you to answer the assignment inline, and in a quiz which has short answer questions you can type text inline. So you, the student, took the test, and now the instructor is viewing your text.
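As a small illustration of making a comment display as text instead of being interpreted as HTML, the sketch below uses html.escape from Python's standard library. Note the swapped-in technique: this escapes the special characters rather than stripping tags, which is a different way of reaching the same goal described above. The function name cleanse and the sample comment, including the way the bank URL is written out, are assumptions made for the example.

```python
import html


def cleanse(user_text: str) -> str:
    """Return text that a browser displays literally instead of parsing as tags."""
    # html.escape replaces &, < and > (and quotes) with HTML character references.
    return html.escape(user_text)


# Hypothetical malicious comment in the style discussed above.
comment = ('Nice article! '
           '<img src="http://mybank.com/transfermoney?amount=1000&toaccount=14523">')
safe_comment = cleanse(comment)
```

Because the escaped text no longer contains a less-than sign, the browser shows the img tag as plain characters and never fetches the URL. Escaping also keeps text such as a < 5 readable, where stripping everything that looks like a tag can remove parts of it.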
Suppose Moodle allowed arbitrary HTML in there. The student hacker could put in HTML code like that. In this case it is actually within the same Moodle site, but the idea is that the instructor is logged in, this code gets executed as the instructor, and the student could change their own grades by using this kind of scripting attack. That is pretty insidious. To prevent this, in any of the online quizzes, in anything in Moodle which takes in user input and then displays it to somebody else, all HTML tags are removed. We found out a side effect of this the first time we used the Moodle quiz feature to take SQL queries. We said, let users type in SQL queries and we will evaluate them offline. And guess what: some of the queries involved conditions like r.a less than 5, and all the less-than signs got eaten up. So students submitted queries, the TAs tried to run them, and the queries would not run. They would get queries where a huge part was completely missing. The student says, but I typed it. The TA says, but there is nothing here. That is because the cleansing removed big parts of the query, between a less-than sign and a greater-than sign that appeared somewhere else in the query. So all kinds of havoc resulted, and we more or less stopped using it. There are ways around it. A knowledgeable user can type &lt; (ampersand, l, t, semicolon) instead of typing less than. So there are ways to deal with it, but it is something you should be aware of.
That is to prevent your site from being used by somebody to attack other sites. What can you do to protect your own website from cross-site scripting attacks launched from other sites? You are mybank.com. What can you do to prevent this? One thing you can use is this: any request that comes to you can have a referrer attribute. Not must have, but can have.
The browser normally sets this to the web page which had the link: when the user clicks a link on a web page, the referrer attribute is set to the URL of the web page from which the link was clicked. So it is a good idea to check the referrer page. If that referrer page is from mybank.com, that means the user logged into mybank.com and then clicked on the money transfer submit button. So that is something which came from the same site. But if the hacker tried to include it in a page from newspaper.com, the referrer field would be set to newspaper.com and mybank.com would reject the request. So that is pretty useful. Another trick is to check the IP address, because the referrer can be hacked. If somebody could get some part of the cookie or whatever out, it may be possible to launch certain attacks from another website. So it is good to check that the request came from the same IP address as the one the user logged in from. Even if the user's cookies are somehow stolen, an attack launched from another computer can be prevented.
So that is a quick summary of web security. There are obviously many, many more issues, but I want you to focus on two things: cross-site scripting and SQL injection, which have been the most dangerous and most common attacks in recent times.
I should just mention one other minor point about this referrer attribute. There was a time when people were using the referrer in one page of a website in order to check that the user had logged in and come from another page of the website. It turns out that this is a very, very bad idea, because you can hack this. In fact, some of the applications in IIT Bombay had used the referrer check to see if somebody had already logged in and was coming from a login page. That was a very bad idea. They did not check for a session in the sub pages. So there is the main login page, which would check the session.
And from there, if you click on a menu to go to subsidiary pages, the subsidiary page should have checked the session and should have checked who the user is before allowing further access. Unfortunately, the applications did not do any such checks in the subsidiary pages. All they did was check whether the referrer page was the initial login page, which had a menu, a series of links. This, unfortunately, can be forged very easily. Just a few weeks back, when we were doing the coordinators workshop, one of my TAs came up to me and said, hey, my friend tells me that this website inside IIT can be hacked, and my friend is able to see my grades, which he is not supposed to be able to see. Then I realized that this was something which we had known about long back, and I had told the programmers to go fix that code. And I realized that they probably did not do it. I had asked them, but they did not do it. So I just did a bit of hacking. Just using Chrome, I could first go to that website. Then, in that website, I could edit the HTML code and add a link which would let me access this student's grades using a URL with the roll number in it. And I saved that. You can actually edit the HTML which was downloaded from a website. It will not update the website, but the HTML is available locally. Now when I click on the link, the referrer is set automatically by the browser. And lo and behold, I could see any student's grades without even logging into the website. So this is a big vulnerability. Of course, this time I went back and said, hey, I told you long back, you did nothing, and now look, somebody has figured out how to hack it. I hope they have gone and fixed it. But the point I want to make is that there are lots of security holes. Just identifying a hole and knowing about it is not enough. People have to go and fix them, and people are very reluctant to do this until they have egg on their face. Till somebody actually hacks them and they are shown up, they postpone it.
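The difference between the two checks in this story can be sketched as follows. Everything here is hypothetical and only illustrates the idea: the URL, the session table and the roll numbers are made up, and a real application would use its framework's session mechanism rather than a dictionary.

```python
# Hypothetical URL of the menu page that is shown after login.
LOGIN_MENU_PAGE = "https://grades.example/menu"

# Hypothetical server-side session table: session id -> roll number of the logged-in user.
SESSIONS = {"s-123": "CS101"}


def allow_by_referer(referer):
    # BAD: the referrer is supplied by the client, so it can be forged.
    return referer == LOGIN_MENU_PAGE


def allow_by_session(session_id, requested_roll_number):
    # Every subsidiary page looks up the session and checks who the user is.
    user = SESSIONS.get(session_id)
    return user is not None and user == requested_roll_number
```

A request that carries the right referrer but no session passes the first check and fails the second. That corresponds to the locally edited page in the story: the browser sets the referrer correctly even though nobody has logged in.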
And this is exactly what happened. So this is a big issue. Here it was just somebody's grade, but there have been attacks which involved huge amounts of money. Big banks, credit card processing agencies, big companies which should have known better have had SQL injection vulnerabilities and cross-site scripting attacks. Both have been there in the websites of major companies that deal with money, of all things. And people have hacked them and stolen hundreds of millions of dollars using such methods. You do not hear about it that much, because those companies do not want you to hear about it. Of course, after any such attack they go and fix it. But it should not have happened in the first place. It happened because they were not aware of this. So I want you to make sure that every single student of yours is aware of these things. Today hackers know all about this. If you, an application developer, do not know about this, you are a sitting duck. Your app can and will be attacked. So every student should know about these hacks, these attacks, and has to make sure their code is not vulnerable. As a teacher, it is your duty to make sure your students know about this, because otherwise, when they go out into the real world, they are going to write bad code which is vulnerable and get somebody into trouble. And unfortunately I find that most students are not aware of it. When we interview people for positions here, I ask them about SQL injection, and most people do not know. I did a small sample when coordinators had come to IIT two weeks back, and I asked them, do you know about the SQL injection attack? Very few people knew about it. People had heard of prepared statements. They knew that they were supposed to use prepared statements, but they did not quite understand what they were for and why they mattered, and many people did not even know that much. So please emphasize this to your students. Okay, I will stop here.