 Let us move on to today's topics. So today we are covering three topics, or perhaps I should say as much as possible of these three topics, because I believe that we actually have a rather tight schedule for today. So the last chapter for today, which is indexing, I will probably not cover in full detail today. I may shift some of it to tomorrow. So today's agenda is to start off with how to build web based applications. Now application design is not really the same as database design, it is a different issue. However, what we realize is that most applications are database backed, and typically there are no other courses which cover application design in most universities. So we took it upon ourselves to cover application design along with database design, and that really helps our database projects too, because what use is a database project which does not have a front end? If it is just a bunch of relations and SQL queries, it is useless, nothing can be done with it. So any realistic project has to have a user interface and it has to have some notion of what it is implementing. So from the viewpoint of our course projects also, application development is very, very important. So in IIT Bombay, we tend to cover the details of this particular chapter on application design and development as part of the lab associated with the course rather than with the course itself. These are mostly very practical issues, there is no deep theory here, but there are important practical lessons which students have to learn. So depending on how you do it in your university, it could be part of the theory course or an associated lab component. Now this chapter in our book is actually fairly extensive. To continue, we are going to look at web application design, primarily because today pretty much all interfaces are web interfaces. This was obviously not true about 15 years back, because that is when the web made its appearance. 
The web was introduced around 1993, I think, and it soon became well known by around 1994-95. It started growing within the US and Europe, which was where it started of course, and it spread to India maybe just after that; around 95-96 people in India started becoming aware. So in that era, well into the 2000s, pretty much all database applications were not yet web based. They were standalone applications which people had to load on their PCs and then run, and this had a lot of problems. One of the problems was that whenever you changed the application you had to actually copy it onto thousands of PCs across the world. If you did not copy it, you would have one copy of that application using the old code and another copy using the new code, and they could clash and there could be lots of problems due to this. So having separate standalone applications which are distributed across a geographic region can be very troublesome. Even if it is not distributed there are problems. For example, most of the banks in India, before the advent of core banking, had their databases sitting in each branch. So each branch had a number of terminals which would access the local database directly and perform updates on it. So what was the problem with this? The problem was that every branch had to have a person who was in charge of keeping the database secure, who was in charge of backing up the database. If any problem happened, fire, flooding, the bank could not afford to lose data. So every day backups had to be done. If there was any hardware problem with the server, they had to deal with that. So there were a lot of issues, and then each branch would have its own copy of the code, and sometimes updates would happen on one branch and not in another. So the bank would be dealing inconsistently across branches. They would be doing different things in different branches. 
So this became a nightmare and for this reason, in addition of course to the convenience to customers that you can go to any branch anywhere and do your transactions, banks moved to core banking. And in fact if you have ever gone and looked at what people are doing behind the terminal in banks which have core banking, wherever I have looked, they are actually using a web browser. So what has happened is all our banking functionality now runs on the web, and this is something very core. Of course, they do not run on the public web. They have their own intranet. The public web is asking for trouble because people can easily hack in. But what has happened is that they have an application sitting in the central servers. There are probably multiple machines in there handling the requests, a single database, and the interface is completely web browser based. The same thing has repeated in many places. Wherever applications interact with end users, as opposed to internal customers or internal staff, it has to be web based because there is no question of your installing an application. Well, more or less; there are exceptions, but it is almost always web based. So web interfaces are core today. They are not universal. Even today I am sure that many of your colleges use applications which are not web based, and part of the reason is that it is actually easier to develop client based applications. The tools for those are even today a little easier to use than web based tools. It is changing, but it has not changed 100 percent. So many companies are selling college and university automation systems which are not web based. In my opinion that is a very backward step. It is asking for trouble because upgrading, maintaining, everything is harder. In fact, there is another well known issue, that of security. So NIC reported at one point that they had built an application for somebody. This was a while back, before they really went into web based applications. 
And the problem with client-server based applications is that code runs on the client's machine. And the server in these things is a dumb database; no program is running on the database. The programs are all running on the client's side. So the database simply offers maybe an SQL engine, that's it. So now the problem is that the code is running on the user's machine, and the user can modify that code. It's not very hard to do so. So somebody actually did a demo for them where in the space of five minutes they were able to get direct access to the back end database and modify it as they pleased. Even though the code was written carefully such that it did all kinds of validations and prevented updates, the problem was that the code had direct access to the database. It was sending SQL queries directly to the database through the connection. The password for that connection was on the client machine. So these hackers could easily get at that password, directly connect to the database and directly do the updates in there, defeating security entirely. So this is the other major Achilles heel of running code on the client machine. The client machine should just be a front end which offers screens. All logic, including authentication, authorization and whatnot, should never ever run on the client side. Now that doesn't mean no code should run on the client side. There is often a need for code which does presentation. It builds the graphical interface. It allows you to drag and drop and do fancy stuff. That can reside on the client. So that is a fundamental issue in application design. So these slides cover some of these issues. So first of all, as I said, an application has to be split into several parts. There is a front end, a middle layer and a back end. The back end is usually an SQL database. The front end is stuff that runs on your browser. And the middle layer is really the application program. The core of the logic is in here. 
So the front end has web-based interfaces. Initially the web-based interfaces were all static HTML. So you have a HTML page, you fill a form, submit, goes back to the server, does something, gives you a result. Now you go to another form, fill, submit. So every interaction went to the server and came back. Now some people realize that it would be better to catch certain errors at the beginning rather than send it to the server and then catch it. What if somebody entered a roll number wrong? Why even send it to the server? You shouldn't, can't you catch it locally? And thus arose JavaScript, scripting language, which was initially designed to do simple tasks at the front end. Over a period of time JavaScript was actually very nicely designed. It had a cool interface which allowed it to modify the webpage which was being displayed. And people made very creative use of it to build applications that had very rich functionality. For example, if you use Gmail or Yahoo Mail today, you can do all kinds of stuff which happen locally without actually going to the back end. And that's a lot faster for you because you don't have to wait for a round trip. Submit, goes somewhere, takes a while to come back over the network, and then your screen is refreshed. That is slow. So the front end over here, the graphical user interface has become quite rich over the years without doing any software installation. All of the software which is running is actually being shipped to your browser by the website on the fly. So that's a quick overview of what we want to do. Now here is a diagrammatic representation of the evolution of applications over three eras. The first era was before the PCs. In that era, the only computers, real computers, were big mainframe machines which sat in a data center and cost millions of dollars, crores of rupees. And the users, nobody had machines in their home. Computers were very expensive, even PCs. 
So the only things which people could afford to put at the clerical staff's desk were called terminals. They were called dumb terminals initially because all they could do was enter characters and display characters sent back from the computer. But soon people realized that dumb terminals put too much load on the mainframe. So very soon, IBM introduced what they called intelligent terminals. It turns out these intelligent terminals could accept a form with fields to be filled in. It allowed the teller or the clerical staff or whoever to fill in the fields in that form. And then there was actually a button called submit on the keyboard. It was not on the screen. Today you're used to clicking submit on the screen. In that era, there was a submit button on the keyboard. So they pressed the submit button. And only at that point did anything go back to the mainframe. And then the mainframe processed the request and sent something back, which again would be displayed on the screen. And guess what? The screen display could be formatted. You could have fields here and there which displayed information in a way which made it easy for users. There's a person called C Mohan who is at IBM. He's very well known for his recovery algorithms. We will see a bit of that when we cover recovery. So when the web came up, he was going around telling people, hey, the web browser is nothing new. IBM had it from the 1960s, because it already had a terminal with forms and a submit button, which is actually true. It seemed very odd at that time. But if you think about it, it was really the same idea, but with one big difference. The terminal could not connect to any old application it wanted across the world. It could only connect to that one application on that one mainframe. That is it, it couldn't talk to anybody else. The reason the web grew explosively was that you could talk to anybody in the world. So that was a huge difference. 
Okay, so that was the mainframe era, where you had a network which was typically running on top of phone lines, and terminals. If you bought an airline ticket at the Indian Airlines office, they had a terminal connected over the phone network to their central computer, wherever it was, Bombay or wherever it was. In the next era, PCs became very cheap. So everybody who was, let's say, in industry or in faculty and universities got their own PC on their desktop. So then a lot of people said, these mainframes are very, very expensive. Don't buy mainframes. Run a database server on a PC and then run your applications on PCs. And so you had desktop PCs, containing an application program, which connected over a local area network to a database running SQL. This was the personal computer era. This was roughly the early 80s through the early 90s, when most applications were built like this. And in fact people would go around saying, de-blue. What was this, de-blue? Well, IBM was known as Big Blue. Its trademark color was blue, so it was widely known as Big Blue. So what people went around saying is, de-blue yourself. Go away from blue and run things much, much cheaper by installing fairly cheap PCs in your office. Don't buy a mainframe anymore. And that actually convinced a lot of people; it worked, it reduced costs. But like I told you, there were a lot of security issues, which were okay as long as you had airline staff sitting in an airline office running that application. But there's no question of allowing anybody in the university or in the world to access that application. And when the web came up, it solved all the problems of the personal computer era. So as I said, web browsers are the de facto user interface. And they're very convenient because you don't need to do any installation. If tomorrow Gmail wants to add a feature, it's very, very easy for them. They don't have to tell everyone, okay, now you have to install the new version of Gmail. 
No, instead, they just update the application on the server. And when the user logs in next time, they will receive a slightly different JavaScript, which will make the screen look slightly different. But they don't have to install anything, it just works seamlessly. Okay, so now, here is a quick question to get people awake. Center coordinators, please make sure your receivers are connected and the software is running. We will activate the question in a bit, but let me read the question. The question is: webmail systems such as Gmail and Yahoo (1) use static HTML only, (2) run JavaScript on the browser, (3) run Flash on the browser, or (4) run C code on the browser. Those are your four options. Don't answer the question until we tell you to go ahead, okay? Time is up. So I had given the answer to this question right on this slide and the previous slide; the answer is they use JavaScript. In an earlier era, maybe 10 years ago, Yahoo Mail and I think even the early versions of Gmail used primarily static HTML. Even now, there are many mail applications which universities use. We use something called SquirrelMail, a web-based interface to our internal mail, which is basically static HTML. It's an old system. But with that, you cannot get rich functionality. You can't drag and drop, you can't have mail refreshing itself automatically without disturbing the screen, and a number of other functionalities which you have come to expect out of these systems. The only way to provide it is to run some scripts, to run a program actually on the browser. And that program is typically in the JavaScript language, which all browsers support. There is this system called Flash, which is used widely for videos. And in fact, A-VIEW, I believe, uses Flash. So what you're seeing today is actually running Flash on the browser or maybe as a standalone application. The thing with Flash is it's designed for video and audio. 
It's not that useful for basic GUI functionality, although it can be used for that. And the last option is C code on the browser. That would be a very, very bad idea, because when I say C code, I mean compiled C code. If you download machine code directly and run it, the problem is that it can do anything it wants to your machine. So if somebody emails you an .exe file on your Windows machine, and you say open it and run it, guess what it's most probably going to do? It's most probably a virus which has replicated itself, and it's going to take over your machine and there's no control. So this is a very, very dangerous thing to do. But then JavaScript is also a program. So why is it okay to run JavaScript programs but not to run a .exe? The difference is that the JavaScript language is controlled, and the program which you get can be verified. The interpreter will prevent it from doing any actions on the local files. It will not let the program arbitrarily save data into some local file, overwriting it and so on. So it is protected. The language interpreter will make sure that the program doesn't do any damage to your computer. So it is safe to run JavaScript. Actually, even then, running JavaScript from an arbitrary website which is designed to exploit errors in JavaScript interpreters can be dangerous. But any genuine website will not have such evil JavaScript programs. And so it is safe to run JavaScript. Now coming back to the answers, the answer is obviously JavaScript. Okay, so now I took a while to motivate the web. Maybe this is stuff you knew already. I'm sure you know about HTML and hyperlinks. I'm going to skip some of these slides in the interest of time. But let me introduce you to this terminology, which most of you probably know, but in case you don't. HTML is the formatting which you see if you look at the source of a web page. It's the hypertext markup language. HTTP on the other hand is the hypertext transfer protocol. 
And this protocol actually does more than simply send a HTML page back to you. This protocol allows a two-way communication between your browser and the application. The browser can send a request, the application can send HTML text back. But it can also tell the browser, here is a piece of text, save it. It's called a cookie. And it can also tell the browser, I think I sent you a cookie earlier. If you have it, please send that value back to me. And that value is used to identify who's the user, as we will see in a bit. So there are many more things which are part of the protocol. But it's a two-way communication. The browser can ask something. In response, the application can ask for something more. The browser sends it. The application asks for something else. The browser sends it. Then the application sends the final web page back. And after this, the browser can again ask the application for something else. So it can keep going on. HTML itself is a markup language, which as you know, provides formatting, bold, italics, tables. It allows forms with input features, which you're all familiar with. Now here is a small piece of HTML source code. Again, I'm going to skip it because we don't have too much time. If you're not familiar with it, go read it because it shows you how to format a table using raw HTML. So there are table with table row, some of which are headers. This ID name and department here form the header. And then there is data for that table. We have not shown all the data. But each row would be over here. And then there is a form in here. And the form has some text. It also has a select button, which we are going to see in the next page. I'll come back to it. So here is the table with the header ID name and department, three records. Then a form, which lets you select search for student or instructor. Then you can enter a name and then click submit. If you come back here, the student or instructor is here. There's a select box. 
The name is an input box of type text and 20 characters. And then the submit button is just a button, which submits the form. And where does the form go? The form goes to the web server from which this form was downloaded in the first place, but with this thing called person query added to the URL. The method is set to get, which means that the values for these inputs are actually going to go as part of the URL. If you have seen URLs with some prefix, then a question mark, then an attribute name and its value, then a separator (an ampersand), then another attribute name and its value, that is the get method. There's also a post method, which uses HTTP to transfer the values which the user has input. They do not become part of the URL itself. The post method is probably what you should use most of the time. The get method is used purely for read only stuff, where you are not doing any update. So you should avoid the get method, except to get the form initially. When you submit the form, it should go by the post method. So the example we gave here is not really recommended, except for forms which are read only. In this case, it's a read only form, which is going to display the students with the given name or instructors with the given name. For this, get is okay. If you are doing any updates, you should never use the get method. So the web server acts as a front end. It receives the request, then the request has to be processed. So how is it processed? The web server itself may have code for executing Java byte code. So that is one option, where the web server is also going to run the application program, which was written in Java. Another option is for the web server to invoke another program, and that program would be an interpreter. 
The interpreter would be for let's say Perl or Python or one of the other scripting languages or it could invoke a complete executable program, which then runs and there is a little bit more, because the program needs to interact with the web server to take care of the HTTP protocol. It may need to tell the server, get this cookie from the browser. So there is an API, which lets the application program and the web server talk to each other. And that interface is called the common gateway interface CGI. That was the initial, the first one, but we are not going to use it. We are going to use a situation where there is a application server, which can directly receive web requests and that application server in your labs tomorrow will be Tomcat. The Tomcat server can receive directly web requests and process them by executing Java code and then send the result back. So our web server is Tomcat tomorrow. So now here is a typical three layer web architecture. You have the web server, you have an application server, a database server and then data. As I said the web server could talk to the application server Tomcat and pass data on. But these days typically this overhead of having two separate processes is often avoided by having a single process, which acts as the web and application server. So Tomcat can do both these tasks. It can run the application program and it can take care of web requests and responses. So that is what we will be using. And Tomcat can connect to a database server. Now a lot of web applications have millions of users. So if you have a single machine down here, there is no way it can keep up with that kind of load. So what they do is instead of having a single web server here, they will have thousands of web and application servers. So once they have thousands of web and application servers, what about the database server? So in certain cases they manage by having a central database and then local copies of the data which are read only. 
So there is still a single central database server. So many applications run like that. But if you take a really large scale application like Gmail or Yahoo Mail, there is no way that a single database machine can handle the kind of load that these guys put on the database. The same goes for other really large scale services, for that matter Facebook; many of you would have used Facebook. There is a lot of stuff in Facebook which is actually stored in a database. So how do they handle it? What they do is actually have thousands of database servers also, or at least hundreds. So they have potentially thousands of application servers and hundreds to thousands of database servers. So each application server will talk to whichever database server it needs to talk to. And there are a variety of tricks here. Certain systems will provide a single interface. You just run SQL and the database system will figure out how to partition it. Certain others partition the users. So the application server has to figure out which of these hundreds of database servers has the records for this user, and it will go talk to that database server and get the required data. So there, the data is partitioned in a way that the application knows, and it has to deal with the partition. That's a lot cheaper because then you can buy off the shelf databases and use them. In fact, you don't even buy it; Facebook for example uses MySQL, the free database server. So it's probably one of the largest users of databases in the world, because it has such a huge number of hits and it has thousands of machines running MySQL. But all of this is transparent to you as the user. You would have no idea whether it is one machine or 10,000 machines. You have no clue. It doesn't matter to you. When your request goes, it will actually go to one machine and all your interaction is with that one machine. When I go, I will also be talking to one machine, but it may be a completely different machine. 
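To make the "partition the users" idea concrete, here is a minimal sketch in Java of how an application server might decide which database server holds a given user's records. This is only an illustration under stated assumptions: the class name UserShardRouter, the host names, and the choice of hashing the user id are all hypothetical, and nothing here describes how Facebook or any other real system actually does it.

```java
import java.util.List;

/** Hypothetical router: picks the database server that holds a given user's records. */
final class UserShardRouter {
    private final List<String> databaseHosts;

    UserShardRouter(List<String> databaseHosts) {
        if (databaseHosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one database host");
        }
        this.databaseHosts = List.copyOf(databaseHosts);
    }

    /**
     * Assumption: users are assigned to servers by hashing the user id.
     * The same user id always maps to the same host; floorMod keeps the
     * index non-negative even when hashCode() is negative.
     */
    String hostFor(String userId) {
        int index = Math.floorMod(userId.hashCode(), databaseHosts.size());
        return databaseHosts.get(index);
    }

    public static void main(String[] args) {
        // Host names are made up for the example.
        UserShardRouter router = new UserShardRouter(
                List.of("db0.example.internal", "db1.example.internal", "db2.example.internal"));
        for (String user : List.of("sudarshan", "alice", "bob")) {
            System.out.println(user + " -> " + router.hostFor(user));
        }
    }
}
```

The application server would then open its database connection to whatever host hostFor returns. One design caveat with plain modulo hashing is that adding or removing a server changes the mapping for most users, so a real deployment would likely need a lookup table or a more careful scheme.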
Now the HTTP protocol which I told you, which lets the browser and the server talk to each other. It's what is called a connectionless protocol. What does this mean? Well, actually there is a connection established, a socket is open, data is transferred but the key point is this connection can be closed immediately after a response is sent back. That is the socket in the network level is closed. This is actually fairly important for web servers. If a web server gets thousands of requests per second, which a single machine can handle and each of these opens a connection and then leaves the connection open for a long time. For hours, there is a problem. The connections are a finite resource on any operating system. Why is that? Well, it's a part of the design of the TCP protocol but there is a limit on how many connections a machine can have open at a time. As a result, the HTTP protocol is designed to let the connection be closed immediately after a response is sent. Now if the user asks a follow up query, that's a brand new connection. The problem is given that each connection is new, how does the application server or the web server know who it is talking to? So one moment I send a query. The next moment you send a query pretending to be me. How does it know whether I am still talking to it or now somebody else has jumped in and is talking to it, that's the question. And the solution is to use what is called a cookie. What is a cookie? It's just a small piece of text containing identifying information. Now what is this identifying information? Some usually randomly generated set of characters which the application or the web server will send to your browser after authenticating you. So typically it can be used for many reasons. So cookies are used for tracking individual users, for dealing with ads and so forth. But the use we are going to have for cookies is so that once you login, the server knows who you are. 
You have given a login name and password, so the server knows who you are. Now when you make subsequent requests, it's a brand new connection. At this point, how on earth does the web server know it is still you and not somebody else who is connecting to it? That is one of the uses for cookies. That's the use we are going to talk about just now. So the trick is that the moment the server gets your login and password, it knows who you are. It authenticates you after checking the password. And it is going to send back some piece of text to the browser and say, this is a cookie. Save this value under this name. And later on it will ask for the cookie of that name. Now your browser will not give away this cookie to somebody else. Say I was talking to one web server, and at the same time in another tab I talked to another web server. If that web server can ask for the cookie which this one set, there is a problem. It's going to reveal information to that fellow, who had no business knowing what I am talking about with this fellow. So this cookie which is set by this guy will only be given back on request to this guy. So why will the server ever ask for the cookie back? Well, the next time you go and open a connection with the server, the server will say, here's a new connection. Do you have a cookie of this name which I sent you earlier? If this is the first time the browser goes to that server, the browser will not have that cookie and it will say, sorry, I don't have a cookie. If it does have the cookie though, if this is a second interaction, it will send the cookie back. Now this is a randomly generated value. The web server can check, did I send this random value to somebody a short while back? Who did I send it to? I sent it to Sudarshan, who authenticated 10 minutes ago, and so I know that the person who is talking to me now is still Sudarshan. 
If somebody else wants to jump in and pretend to be me, they can't, because they don't have that cookie value. Now this is why, if another web server that I am talking to in another window asks for the cookie and the web browser gave that cookie back, then that guy can pretend to be me and cause damage. That's why the cookie will never be revealed to anybody except the server which set the cookie in the first place. So that's how cookies are used to establish a session. Now eventually, if you are inactive for a while, your session times out. What is happening? Nothing happens in the browser usually. The server is keeping track of the fact that it sent this large randomly generated string to the browser and that it was associated with Sudarshan. If I have been inactive for say 20 minutes or 2 hours or whatever the server decided, it will say, okay, from now on I am discarding information about this cookie string which I had sent earlier. So now if I make a new connection to the server, the server says give me the cookie value. I, meaning my browser, will send over the cookie value. The web server gets the cookie value and looks up in its table: who is the user associated with this cookie? And guess what? A few minutes ago it just threw it out because my session timed out. It threw away that information. Now it will no longer find it, and it will say, sorry, I don't find that string in my table, so I don't know who you are, so log in afresh. So that is exactly what happens when you go back to a site after a break and the site says, sorry, you have to authenticate yourself again. Moodle does this, mail systems do this, everybody does this. So cookies are the core infrastructure. Now a program which runs on the server needs to be able to deal with the HTTP protocol, including dealing with cookies and so on. 
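The server-side bookkeeping just described, a table from random cookie values to users that forgets entries after a period of inactivity, can be sketched roughly as follows in Java. This is a minimal illustration and not how any particular server implements sessions: the class SessionTable and its methods are hypothetical names, the 32-byte token length is an arbitrary choice, and the current time is passed in explicitly only to keep the example easy to follow and test.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical session table: random cookie value -> user, with an inactivity timeout. */
final class SessionTable {
    private static final class Session {
        final String user;
        final long lastSeenMillis;
        Session(String user, long lastSeenMillis) {
            this.user = user;
            this.lastSeenMillis = lastSeenMillis;
        }
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final long timeoutMillis;

    SessionTable(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Call only after the password check succeeded; returns the value to send as the cookie. */
    String login(String user, long nowMillis) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes); // large random value, infeasible to guess
        String cookieValue = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        sessions.put(cookieValue, new Session(user, nowMillis));
        return cookieValue;
    }

    /** Call on each later request with the cookie value the browser sent back. */
    Optional<String> userFor(String cookieValue, long nowMillis) {
        Session session = sessions.get(cookieValue);
        if (session == null) {
            return Optional.empty(); // never issued, or already discarded
        }
        if (nowMillis - session.lastSeenMillis > timeoutMillis) {
            sessions.remove(cookieValue); // inactive too long: forget the session
            return Optional.empty();
        }
        sessions.put(cookieValue, new Session(session.user, nowMillis)); // activity refreshes it
        return Optional.of(session.user);
    }

    public static void main(String[] args) {
        SessionTable table = new SessionTable(20 * 60 * 1000L); // 20 minutes, as in the example above
        long now = System.currentTimeMillis();
        String cookie = table.login("sudarshan", now);
        System.out.println(table.userFor(cookie, now + 60_000L));          // still logged in
        System.out.println(table.userFor("made-up-value", now + 60_000L)); // unknown cookie
        System.out.println(table.userFor(cookie, now + 3 * 3600 * 1000L)); // timed out
    }
}
```

An empty result from userFor corresponds to the "sorry, I don't know who you are, log in afresh" case. In the lab setting the application server keeps such a table for you, so this sketch is only meant to show what is going on underneath.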
So the application program has to talk to the user, deal with authentication, receive a request, decide how to process it, talk to a database and then send the result back. All of this is the job of the application. Now a lot of this is common to all applications. And so what has happened today is there is an API called the Servlet API, which lets the application server talk to the application code which a programmer has written. So there is a bunch of code on the application server which is common to all applications. This includes code which implements the HTTP protocol and so forth. The rest of the code is application specific. So what the programmer does is write this code in Java, and that is loaded onto the application server. The server should also be told which code handles which request. In our earlier form, the form was submitted to something called person query. So the server has to be told that whenever a request for person query comes, it should invoke this particular servlet. So that is told to the server, and when a request comes, the code for that particular servlet class, which has already been loaded, is executed. So to build different features into applications, you build different classes which are all subclasses of the servlet class. They inherit from the servlet class, and you upload all of these onto the application server, and depending on the request which comes, the appropriate class is executed. The code for that class is executed to deal with that particular request. So the request for person query which we saw will be one piece of code, but there will be hundreds of such forms, each of which may go to a different servlet potentially. So the way a servlet is done is it's loaded onto the server. Whenever a request comes, if the server has a single process, a single thread, it invokes the methods of this class. 
Actually what happens is that the server creates an object of this class, and the code is invoked on that object each time a request comes. If this code prevented the web server from processing any other request in parallel, there would be a problem. This code was written by application programmers. They may do something which takes a long time. It may take 10 minutes, and in that 10 minutes the web server will hang. No other request will get processed. That would be a very bad idea. So servlets don't work that way. Instead, each time a request comes, the server runs it in a separate thread. What is a thread? You can think of it almost like a separate process, although it's not. But many threads can run concurrently within the same server; a huge number of threads can run. So the fact that this particular thread, which is serving this particular request, is taking a long time does not affect the web server directly. Meanwhile, it can receive and process other requests. So all web servers are basically threaded. And the servlet API is a Java API. Now there are similar APIs in other languages which do similar things. We don't have time to discuss all of them. So we are only going to look at the servlet API. So let's look at a small piece of servlet code. Let's start from the beginning. There are three imports. What are the imports doing? They are the Java I/O library, java.io, the Java servlet library, javax.servlet, and then the servlet HTTP library, javax.servlet.http. The servlet interface was designed to work with other protocols also. But the HTTP protocol is really the one which is most widely used, so that's the only one we are going to look at. So now, for the person query, we have created a public class called PersonQueryServlet. We could have called it anything we want. But the key thing is it extends HttpServlet. So what does it mean to extend HttpServlet? It means all the functions which are there in HttpServlet are inherited by it.
Moreover, it has to override certain functions. And the primary functions which it has to override are three. There is a doGet, and correspondingly a doPost, and there are a few more which it can use. So let's focus right now on doGet. Why doGet? If you remember, the HTML code we saw earlier said that here is a form whose method is the get method. So there are three methods, get, post, and what is the third one? I forget the name. So the method has to be specified as one of these. And if the form specified the get method, then the server will invoke this particular method, doGet, on the servlet object. And it passes in two parameters. One is a request, whose type is HttpServletRequest. The other is a response, whose type is HttpServletResponse. And there are some exceptions. We will deal with those later. So what does the servlet have to do? It has to take input from the request. And it has to put output onto the response. So here is what it does. In this case, the response is an HTML page. Therefore, it says response.setContentType("text/html"). So that's HTML content. It could also be sending a binary image for a photo or various other things. And correspondingly, the content type will have to be set. Now since it's a text response, it does PrintWriter out = response.getWriter(). So this is an interface to write text to the response. And then on this object out, it says out.println. And this is HTML text, similar to what we saw earlier. So it prints the head, with the title query results inside it, then the end of the head, and then the opening body tag. And then the actual content, what it wants to display, will go in here, which will all be printed out to this out object. This out object is linked to the response. So whatever it is printing here will eventually be sent back to the browser. When is it sent back? It will be sent back when this doGet finishes. When it exits, all of this is sent back to the browser.
So when it is finished writing whatever it wants to write, it prints the closing body tag to indicate the body ends, and then it calls out.close() and returns. So that's basically what a servlet does to process a particular query. Now the body here depends on what it needs to do. So in our case, if you remember, that form took a name and it took a parameter, which is either student or instructor. And depending on whether it was student or instructor, it looked up the name in the student relation or the instructor relation, and it returned information about people with that specified name. So this is what we want to implement. How is it implemented? There are several steps here. The first step is to do request.getParameter("persontype"). This persontype is a parameter; that name, persontype, is there in the form. It's an input called persontype. So persontype will have a value; the form is going to set it to either student or instructor. So what the code does is, if persontype equals student, it does one thing; else, if it's instructor, it does something else. There is also a name. How does it get access to the name? Similarly, request.getParameter("name") will return the value of that name. So now it has these two variables, which hold the type and the name. On the slide the second variable is called number; this should have been name, not number. So it then goes in and does something. What does it do? It has to talk to the database to find out people with that name, students with that name. How does it talk to the database? Well, JDBC. Now we already discussed JDBC, and some of you have already done the JDBC assignment. So use JDBC, get the information, and then that information has to be output to the browser. How do you do that? Well, we have created a table, with out.println, with three columns, and then there's a header which says ID, name, department. And then for each result, we loop. That loop would typically be on the result set. We've got a result set back after running a query.
We loop on the result set, and for each row of the result set, we output this thing. What is the output? Well, first get the ID, name and department name from the result set into local variables, and then print the values of these local variables onto out. So this again is formatted in HTML. The tr tag starts a table row, and the closing tr tag ends the row. The td tag holds the data for a single cell of that table. So td, then the text, and then the closing td ends that table cell, and then another cell, and so on. So all of this is low-level HTML coding. As I said, if you code using these things, it takes a little bit more time to build a web application. That's one of the reasons that it's a little more time consuming than building a client-server application, for which there are some very nice tools that have been around for a long time. Such tools are being introduced now for web applications too. But they are not yet as standardized or as widely used. So you can get away from the details of all this. There are libraries and tools today which you can use. I'm not going to cover them because there are many such tools. There is no standard. If I tell you use this, and tomorrow it's no longer widely used, it's not supported, you're locked into that tool. So I'm not telling you which one to use, but there are many available. There's one which I can recommend to a large extent called YUI, Yahoo User Interface, which lets you do a lot of other cool stuff with JavaScript also to build a rich interface. So that I can recommend. So coming back, all of this is output, and the result is sent back to the user and displayed. So that's the basic way in which a request is handled. That was for a get. There's also a doPost if you use the post method, and the body of it is identical. There is no difference. Just that the method is called doGet or doPost. Otherwise, there is no difference at all. So far, so good. The next feature which servlets provide is the session feature.
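The table output described above for the person query can be sketched in plain Java, without the servlet or JDBC libraries, so that it runs with the JDK alone. The class name PersonTableHtml and its methods are invented for this illustration. Each row is assumed to be an array holding ID, name and department name, standing in for one row of the result set. The escape step is an addition that the lecture does not discuss: it keeps a name containing characters such as < from being read as HTML.

```java
import java.util.List;

/** Hypothetical helper that formats query results as an HTML table. */
class PersonTableHtml {
    /** Replaces characters that have a special meaning in HTML. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds the table; each row is assumed to hold ID, name, department name. */
    static String render(List<String[]> rows) {
        StringBuilder out = new StringBuilder();
        out.append("<table>\n");
        out.append("<tr><th>ID</th><th>Name</th><th>Department</th></tr>\n");
        for (String[] row : rows) {          // in a servlet: loop over the result set
            out.append("<tr>");
            for (String cell : row) {
                out.append("<td>").append(escape(cell)).append("</td>");
            }
            out.append("</tr>\n");
        }
        out.append("</table>\n");
        return out.toString();
    }
}
```

In a servlet, the string built here would be written to the out object obtained from the response, one println at a time or all at once.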
I told you that you can set a cookie and create a random value, send it to the browser, and so on. That's a lot of work to be coded each time. So the servlet API wraps all of this and gives you a much easier to use interface called HttpSession. So internally, it sets a cookie. In fact, there are other ways to implement it without cookies also. So it can actually use one of those ways. How it implements it doesn't really matter at a high level. But what it gives you is an abstraction of a session. So what you do is, when you get a request, the servlet code can check if a session is active already. If a session is not active, that means this is the first time a request has come. So how does it check if a session is active? It calls request.getSession(false) and checks whether that returns a session. If it does, that means there is an existing session, and we can directly process the request. If not, it returns null; there is no existing session, and we have to redirect to an authentication page. Now what is this parameter false here? The idea here is, if you set this value to true, then it will create a new session if there wasn't one in the first place. That is useful for certain situations. But typically, we won't do that. What we will do is, when we check here, we will set the parameter to false. When we want to create a session, we will set it to true, after authentication. So how does the authentication page work? It will ask you to enter a login and password. And when that is submitted, the servlet for authentication will check the login and password against the database relation which stores user names and passwords. Again, all of this code has to be written carefully, without ever concatenating user input into the SQL query. Remember the SQL injection problem. You should be careful to use prepared statements in the correct way to prevent SQL injection. And as long as you're careful, you can go fetch the data from the database to validate the user.
And then if the login and password match, you create a new session using request.getSession(true). That returns a session object. Now the session object can actually be used to do various things. So the first time, you authenticated the user. Next time the user comes in, the session is active. But how do you know who the user is? One way which a few students have used, which is wrong, is to use the cookie mechanism to directly set the user name as a cookie which is stored on the browser. That is very wrong, because people can hack the browser code and set whatever cookie they want. So user x, y, z can set a cookie which says the user ID is Sudarshan. And then when that request goes, that badly designed application will think that x, y, z is actually Sudarshan, because the cookie value says Sudarshan. That's a very bad idea. What you should instead do is say session.setAttribute. The attribute name is userid, and the value is the user ID of whoever was just authenticated. So if it has just authenticated me, it will do session.setAttribute("userid", "Sudarshan"). This information is not sent to the browser. It is stored in the web server. So when a new request comes from me, it will not contain the user ID. What it will contain is the cookie, which is a randomly generated string which the web server created and sent to my browser earlier. The session interface hides all of this from us. We don't have to worry about it. But internally what happens is this string is sent back. The server will check that string and say, ah, this string corresponds to a particular session which was created earlier. And so on the next request, if the session is active, I can say session.getAttribute("userid"). The saved user ID is kept completely in the web server; the user ID never went to the browser. So it is safe. And so session.getAttribute("userid") will give the user ID which we know was set earlier after authentication, so we can directly use it.
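The behaviour of the create flag and of session attributes can be imitated in a few lines of plain Java. This is only a toy model under stated assumptions, not the servlet library's implementation: the class ToySessions, its nested Session class and the way the ID is generated are all made up for this sketch, and the cookie value is passed in directly instead of being read from a request.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Toy model of server-side sessions; not the real HttpSession implementation. */
class ToySessions {
    /** Attributes live in server memory and are never sent to the browser. */
    static final class Session {
        final String id; // the random string that would travel in the cookie
        private final Map<String, Object> attributes = new HashMap<>();
        Session(String id) { this.id = id; }
        void setAttribute(String name, Object value) { attributes.put(name, value); }
        Object getAttribute(String name) { return attributes.get(name); }
    }

    private final Map<String, Session> sessions = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /**
     * Mirrors the meaning of the create flag: with false, an unknown or
     * missing cookie gives null; with true, a fresh session is created.
     */
    Session getSession(String cookieValue, boolean create) {
        Session s = (cookieValue == null) ? null : sessions.get(cookieValue);
        if (s == null && create) {
            String id = Long.toHexString(random.nextLong())
                      + Long.toHexString(random.nextLong());
            s = new Session(id);
            sessions.put(id, s);
        }
        return s;
    }
}
```

The point of the design is visible in the lookup: a browser that sends the text Sudarshan as its cookie gets null back, because the table is keyed by random IDs that only the server has generated, and the user ID itself is only ever read from the server-side attribute.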
We don't again have to ask the user for the login and password on each request. That obviously would be very painful, each request asking for login, password and so on. So that's how sessions work. Now in your lab exercises tomorrow, you will be creating servlet sessions and authenticating users and doing all the basic things using servlets. Some of you may already know all of this, but I suspect quite a few may not have done a lot of programming. So here is a very quick quiz question. It's not really a quiz. It's a survey question. "I" over here means you, not me. So option one is: used servlets already. Option two is: used equivalent features in another language, for example .NET, PHP, et cetera. Three: never built a web application, but know the concepts. And option four: all of this is new to me. So here is a quick survey. All center coordinators, please make sure things have been activated. Participants, please press ST and get your remotes ready. Time is up on that question. Obviously, there's no one correct answer. I'll continue while the results are brought up. And then we will discuss the results. So I mentioned this briefly, that servlet code actually runs inside of some application server. And the most commonly used one is Apache Tomcat, although there is another called GlassFish, which is widely used. And another called JBoss, which is even more widely used, because it supports a whole bunch of other features which are part of J2EE, including beans and containers and whatnot, which I'm not covering here. They are quite useful, but they are not essential to what we are doing. So those are the free ones, and they are very good actually. Tomcat in particular is very, very widely used. But there are also a whole bunch of commercial ones, which offer certain other features, very useful for certain applications. So those include WebLogic, WebSphere, Oracle Application Server, and, I forgot to mention, Microsoft's IIS, and a whole bunch of other servers.
And these servers support deployment and monitoring of servlets, so you can see what is going on. So in Tomcat, there's actually an administrator console, which lets you see all the servlets which are active on the system. And you can create multiple servlets, package them in a single thing called a web archive, or WAR, and upload that WAR as one application to the Tomcat server. Now, there are actually a number of steps here for deploying this thing onto the server. Luckily, when you use an environment such as Eclipse or NetBeans, all of these are done transparently for you. You don't have to deal with these nitty-gritty details. But if you're going to use this in a real application, you will have to figure out how to upload these WAR files, which are created, onto Tomcat, which is running not on your desktop but rather on the web server. That's the one extra step. It's actually very straightforward. But you will mostly be able to do all of this without ever seeing that step. So let's see. This time, most of the centers have responded. We have 190 responses. So we have improved. And I don't know if you can see the responses. They are interesting. So a little less than half the people have actually built an application using either servlets or anything equivalent. Servlets is actually less than a quarter. And others are slightly less than servlets. So for those who have actually built any web application, servlets has a slight lead. There are many other things. People use PHP very widely. There's something called Ruby on Rails, which is used widely. Anything which Microsoft builds, of course, is used very widely. So there are many alternatives. We stick to servlets because we can't cover 20 alternatives in a book. And servlets' support is free, unlike Microsoft's. Although, in terms of technical quality, many of these are just as good as servlets. There's nothing wrong with them.
Then the third option, never built an app, but know the concepts. Again, there are about one-fourth. And slightly more than one-fourth, say that all of this is new to me. So for those in this last little more than one-fourth, we are probably going very fast on this. And you will probably need to take a little extra help in the labs tomorrow from your center coordinators and from one of the other half who already knows some of this stuff. So do take help, but make sure you write these programs so that you understand how to build these. A feedback which I had got from certain people is that a lot of places around India, they all are doing fine on the theory part, because textbooks are available, slides are available. But the lab component is something which they don't know what to cover. I hope that this course, half of it being lab-based, will help all of you in, especially those in this last quarter, actually last half, who have not actually built anything, to get your hands dirty with building stuff. Because only when you do it, can you tell your students, yes, you can do it, and you should do it in this lab. And only when our students start building this as part of their courses. Now, I know that most students do end up building some of this as part of projects. But I also know that, unfortunately, the way projects run, I have seen this myself, is that they're groups. They cannot be individual, because there are too many students. And in any group with three or four people, it often is the case that one or two do the work, and the other two or three take a free ride and end up doing nothing. That is very bad. We are allowing people to graduate who have never even built any simple thing. So that's very wrong. And so do make sure that in your courses, a lab is enforced. I think pretty much all universities now have internal assessment, and the lab should be a major component of internal assessment. 
Again, the goal is to make sure people have done stuff and understood it. They can take help to understand. But at the end of the lab, they should have understood what all needs to be done clearly, and build something on their own. That should be our goal. And part of it is to be able to ask them questions at the end to see if they understood what they did. So I think that is it for servlets. And so that covers all the basics which you need for tomorrow's lab. The other half of this chapter, I'm going to cover fairly fast, because they're not critical for the lab. They have stuff which you should know, but you can read it offline. So one of the components is actually quite useful still, which is server-side scripting. Now I showed you how to build a servlet. It turns out writing a servlet is a lot of work, because you have to take all the HTML text which you want and stick it inside print statements in Java code. It gets very messy, and it's quite hard to understand what is going on. So very soon after servlets came out, people realized there were drawbacks. And there's a server-side scripting language called JSP, which actually combines plain HTML with Java in a very clever way. We're going to see an example of that. And that can be used. You can certainly use JSP even in your assignments tomorrow, probably it's not required. But if you do anything more on this, JSP is actually something very, very useful to reduce your programming effort. And what is it we will see? The basic idea of all these scripting languages is you write HTML code directly. And how do you write the HTML code? If you know HTML, you can write raw code. If you are not very familiar with HTML, there are editors which let you do HTML editing directly. In fact, what is nice is you can create really nice looking pages with these HTML editors, even though you don't really know much about HTML. So you can create nicer looking websites. So now you've created the HTML. The content has to be in there. 
This is the HTML which will be sent back to the user in response to a query. But the content is not just static HTML. It is stuff which you've got from the database. So now what you do is you edit this text and stick in small pieces of Java code, which do the actual work of talking to the database, getting the response, and then printing that. So what you need to print, which you get from the database, is the only thing which is inside of Java code. The rest of the static content of the page is all directly in HTML and can be generated with any HTML tool. That's the idea. So now what you have is HTML code with pieces of Java code, or PHP or any other language code, embedded inside of it. So it's kind of flipped. Earlier what we had was Java code with HTML in strings. Here it is flipped. You have HTML code with Java code inside special delimiters. And here's an example. So here is the static content of the page: the html tag, the head, the body, then the end of the body and the end of the html. All this is static. So this is created directly in HTML. Now the dynamic part here, this is actually a toy example. What it does is it checks if the parameter name has been set in the request. If so, it says hello and prints the name. If the name is not set, it just says hello world. It's a toy. But obviously you can do more interesting things here. For example, our person query servlet could be done here by taking the input from the request, processing it and printing the output. So inside here, the Java code looks identical to what we saw before. The variables request and response are predefined. So you can just say request.getParameter. The variable out is also predefined. So all that has been taken care of without you seeing it. So you can directly say out.println("Hello World") and so forth. Now what actually happens is this JSP code is compiled on the fly by the application server into Java. It's actually rewritten into Java servlet code, compiled and loaded on the fly.
JSP has a number of other features, including new tags and so on, which can be used, but are not essential. So all of this was Java based. Now Java is a strongly typed language, and it takes a lot of lines of code to do simple tasks. Now this notion of strong typing is very useful for building complex bodies of code, because it prevents you from making silly errors, which can be very, very difficult to detect in large bodies of code. But when you're writing small pieces of an application, all this extra typing machinery, which helps to protect you from yourself, may actually result in your doing a lot more work, while the benefit of it is limited. So there's a whole class of scripting languages, of which PHP is a very widely used example, which are very loosely typed, and which let you do a lot of stuff by writing a little bit of code. Of course, at the risk of programming errors being hard to detect. But if your whole program is just one page, and is not spanning thousands of lines of code, this may be acceptable. In fact, PHP is actually very widely used. It's a very nice language, actually. And all of you have used Moodle. Now you know that Moodle is a very, very rich application. You have seen only a few parts of it, but it has a huge amount of functionality. It's one of the biggest applications which are available in public, where you can actually download and see the contents. And it's very well written, actually. As a course project for students it's a bit ambitious, although some students here have modified Moodle as a course project. But as a BE project, taking Moodle and then modifying it to add new functionality would be a very nice project. They would learn a lot by reading existing Moodle code and then modifying it. And Moodle is written entirely in PHP. So PHP is a very powerful language. And one of the nice things about Moodle is it has a number of coding standards which are followed very strictly.
And you can see real life code which follows standards and which has been designed very nicely, so that you can do a lot with just a few lines of code, because libraries have been built appropriately. So I urge all of you to get students to explore the Moodle code and play around with it. So coming back, what is PHP? I'm not going to explain what PHP is in detail, but I'll just give a small snippet of PHP code. Again, PHP is typically embedded in HTML; that is its goal, typically. It doesn't have to be. PHP programs can exist outside of HTML also. But here's an example of embedded PHP. This first part is the same. Then you say <?php, which indicates the start of the PHP script. And then it says: if (!isset($_REQUEST['name'])), echo hello world, else echo hello followed by $_REQUEST['name']. So it looks syntactically a little different, but it's basically doing the same thing as the JSP code we saw before. Okay, so that was server-side scripting with JSP and PHP. Now your Eclipse and Tomcat both support JSP. So you're welcome to play around with JSP in addition to Java servlets. They also support PHP for that matter, but that's a new language, which you have to take some effort learning. Okay, now moving on to client-side scripting. What we saw was server-side scripting, that is, scripts which reside on the server, are executed on the server, and return HTML to the user. Now, in addition to HTML, you may return JavaScript, which is the most widely used client-side scripting language. Flash is also widely used, as we discussed. There are a few others. Applets, which are Java code run on the browser, were popular at one time, but they have fallen by the wayside. They are no longer used much. VRML is another one, which is used occasionally. You know what JavaScript does. I'm not going to go into too much detail, except to say the following. It's very widely used, as you know. One of the key things which JavaScript does is it can modify the contents of the page.
So what you get is, when you send a response to the browser, you're sending HTML code along with embedded JavaScript code. That JavaScript code can actually modify the HTML which is residing on the browser. How does it modify it? Well, the HTML code is actually parsed and turned into a tree structure. And that tree has a representation called the document object model, or DOM. The JavaScript code can directly access that tree and modify the tree. If you want to add a row to a table, for example, it can go down to the node of that tree which represents the table. That node has several children, one corresponding to each row. It can add a new child corresponding to a new row. So that's how you have certain applications where, if you click on a button which says add row, a new row pops up, and then you can fill in the row. It can do validation. So if something should be a number, when you click submit, it checks that it's a number, and if not it says, sorry, it's not a number, and prevents you from even going to the backend. So it can do certain validation at the front end. It can, in fact, do something more. Most of the current generation of web applications are based on what is called web 2.0. What is web 2.0? It's not a brand new technology, but it's HTML plus JavaScript used in ways in which it was not used earlier. And one of the key ways that web 2.0 differs from the earlier web applications is that in the earlier applications, only when you clicked on the submit button would interaction normally happen with the backend. There were a few exceptions. In the new generation of applications, which appeared in the last five, six years, the applications have JavaScript code which talks with the backend, and what it can do is, it can do this in the background. So your JavaScript code is running. You type in something. You don't see anything happening. In the background, that code is started off.
It talks to the application server, gets some information, and then displays it in the page. You didn't see the page refreshing. Nothing happened. Everything was quiet, and something just appeared magically in the page. Or you started typing in a query to Google, and Google magically completes the query for you. So what is happening is, each time you type a letter, once it has got three or four letters, the code sends something to Google in the background without interrupting your typing. It gets back some responses saying that these are the things you can fill in as completions for this query, and it displays those. So all that is happening in the background without your noticing it. And it's so fast that between two keystrokes, it has already gone back, got something and displayed it to you. So that's really nice. So all of that is done by what is called asynchronous execution in the background. And there's something called Ajax, which is basically a collection of tools which lets you do this asynchronous communication. It's not one language or one tool, but it's an approach to building applications which do all this stuff asynchronously behind the scenes. And these have enabled this new generation of web interfaces, which are much nicer for users than the previous generation. So we have some more examples of JavaScript. This one does validation of form input. If you entered the credits, it checks that credits is a number which is between zero and 16. In this case, greater than zero and less than 16. And this piece of code tells you how to invoke the JavaScript validation function as a part of the form: here the form says onsubmit="return validate()". So I'll skip the details, but you can get JavaScript code executed when submit is clicked.
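The validation example in the lecture is JavaScript that runs in the browser. To stay in the language of the servlet examples, here is the same rule written as a small Java method, as it might look if the check were repeated on the server. The class and method names are invented for this sketch, and the limits are the ones stated above: greater than zero and less than 16.

```java
/** Hypothetical Java version of the credits check described for the form. */
class CreditsCheck {
    /** True only if the text is a whole number strictly between 0 and 16. */
    static boolean validCredits(String text) {
        if (text == null) {
            return false; // field missing from the request
        }
        try {
            int credits = Integer.parseInt(text.trim());
            return credits > 0 && credits < 16;
        } catch (NumberFormatException e) {
            return false; // not a number at all
        }
    }
}
```

A servlet could call such a method on the value returned by request.getParameter before touching the database, and send back an error page when it returns false.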