So, today we are going to get into application design and development, with a focus on servlets. Last time we looked at how to connect to a database using JDBC. Today's lab is on building web applications using servlets, so let us cover the basics. Again, I will skip a lot of slides and focus on just the relevant material.

This is a typical architecture of web applications today, this or a small variant which is coming up on the next slide. You have a browser talking HTTP and connecting to a web server, which forwards the request to an application server, which in turn talks to a database server, which in turn talks to the underlying storage system. The underlying storage these days may be a local disk, or it may be a disk on a separate box, which is network attached storage or a storage area network. More frequently, the web server and the application server are combined into a single unit; the reason is that otherwise there is extra overhead of inter-process communication, which is usually unnecessary. This is what we will be using today: a servlet system, Tomcat, which can serve both HTML pages and servlets.

So, a quick brush-up on HTTP, which influences how web-based applications are built. The first thing to note is that HTTP is what is called connectionless. Actually, it makes a connection and then closes it; it does not keep connections open for long. There are several reasons for this. One is that operating systems usually put limits on how many network connections you can keep open; if you close a connection you make it available to others. Another is that if a server fails, your next request can go to a different server and you know nothing about it, so you do not want to keep connections open for a long time during which servers may change. However, what this means is that if you log in once and authenticate yourself, the connection is lost.
So, the next time you connect to the application server, it is like you are a new person; it is a new connection. How does the application server know who you are? How many of you are familiar with cookies? Most, but not all. How many of you are familiar with sessions, HTTP sessions? Again quite a few, but not all. So, let me spend a couple of minutes on these two topics.

What happens is your browser has opened a network connection and done some work. Maybe it logged you in: it took your username and password and you got logged in. But after that the connection is closed, and a few minutes later a connection is made again. So, the first question is, how does the application server know who you are? The trick is that it stores a bit of data on your browser and gets it back later. A cookie is a small piece of data which the application server can ask your browser to store, through HTTP. You connect to the application server, and in its response the server can ask your browser to store this cookie. When you connect again, the browser checks whether it has a cookie for that website, identified by the site you are connecting to plus the cookie's name and some extra identification information, and if it has one it gives it back to that server along with the request.

So, that cookie usually contains something. One option would be for the cookie to contain your username, so that the next time you connect your browser gives back that username. Is this secure? If the server stores your username in the cookie, what is the problem? A few people are shaking their heads saying it is not secure; the rest are just looking on. How many of you think it is not secure? Raise your hands. Seriously think, not just because I ask. And you are right, it is not secure, because the user can hack the browser.
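The weakness of a username cookie can be sketched in a few lines of Java. This is only an illustration under assumptions: the class NaiveCookieLogin, the cookie name "user", and the user names alice and bob are hypothetical and not from the lecture slides. The server-side code simply believes whatever name arrives in the Cookie header, and that header is entirely under the control of whoever runs the browser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a server that trusts a user name stored in a cookie. */
final class NaiveCookieLogin {

    /** Parses a Cookie request header such as "user=alice; theme=dark" into name/value pairs. */
    static Map<String, String> parseCookieHeader(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null || header.isEmpty()) {
            return cookies;
        }
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                cookies.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    /** INSECURE on purpose: returns whatever user name the browser chose to send. */
    static String currentUser(String cookieHeader) {
        return parseCookieHeader(cookieHeader).getOrDefault("user", "anonymous");
    }
}
```

Calling NaiveCookieLogin.currentUser("user=alice") and NaiveCookieLogin.currentUser("user=bob") gives two different identities, and the server has no way to tell which header was edited by hand.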
So, if I want to pretend to be Phatak, I can modify the cookie stored on my browser to replace Sudarshan by Phatak, and the next time the application server asks for the login, my browser will tell it Phatak, and it accepts that. I can be Phatak. Obviously not a good state of affairs. So, what can you store in the cookie then? You need to store something which cannot be guessed, so that if you try random values you will almost always fail. What is stored is some large string; the string is an identifier, and the application server remembers this identifier.

Now, if somebody could tap into the communication between me and the server, they might be able to pull off the identifier. If the connection is not encrypted, plain HTTP, somebody who is snooping on the wire can pull out the cookie which has been sent to me, and then they can connect to the application server using that cookie. So, that is a risk. To prevent that, most secure websites today use the HTTPS protocol, where everything which goes between the browser and the server is encrypted; nobody else can tap in and see what it is. How all this is done is a topic for a security course and we will not get into that, but I do want to say a little bit about application security, because we are building applications.

It is important that you use HTTPS if you are doing something important, unless you believe that the network is intrinsically secure. Within an organization, the network these days is usually relatively secure if you use a switched network. If somebody can hack into the network they can tap your packets, but otherwise a user who plugs in their laptop usually cannot see your traffic. Earlier it used to be different: networks used hubs, which meant that all the packets you sent would go to everybody on the same Ethernet, and anyone could snoop on the packets and easily see what you were doing. Today is a little
different: it is not so easy to snoop. But if you are going outside of your institute, across the public web, who knows what is going on out there; somebody may be snooping on the wire and you do not know. And if you go to China you do know somebody is snooping; at least that is what many people claim: that if you go to China and you really want to be secure, you should take a throwaway laptop and reformat the hard disk when you come back. Do not take your real laptop. I do not know if it is true, but that is the reputation they have earned for people hacking into things, and you know that they have hacked into our government computers and so on. So these things will happen, and if you are running something important you had better use the HTTPS protocol.

So, coming back: a cookie is a piece of text which is usually fairly large, so if I try generating random cookies, the chance that I hit a cookie which is currently used by some user is very, very small. It can happen; everything in life is to some extent chance. But the chance is made very small by creating fairly large random strings. Of course, consecutive cookies must not be related directly to each other. If the server starts giving out cookies like 1, 2, 3, and I get cookie 5, I can try 6 and it probably belongs to somebody. You cannot do that either. So a cookie has to be a fairly large random string.

So that is how the application server finds out who you are without asking you for the login and password each time. What happens is, the first time you connect, it will send you a cookie and it will verify your login and password. Then it remembers: this cookie was sent to this browser, and I have checked that that browser has given back that login and password, so I know that the user is authenticated. The application server keeps the association between the cookie and the user, and the next time it gets that cookie it knows this is that user. Now, this is the basic protocol, but what is the software which implements
this protocol? One end of that software is your browser. What about the other end; what software runs there? Here there are many, many alternatives. The first alternative is a simple web server, and again there are many web servers. Apache is widely used; in the Microsoft line, IIS is used. These are primarily web servers.

Now, how do web servers run applications? In the very initial days, when you sent a request to a web server, it could execute a program at that point; the program does something and gives a result back, which is then returned to you. This is a very inefficient way of doing it. How many of you have written GATE, tried to get your score on the day of announcement, and found that the server crashes? Any of you? How many of you know students who have done this? It is actually very embarrassing: for many years our GATE servers used to crash because they used exactly this architecture. They had a web server which would launch a new process for every single request, and to make things worse, this was written by people who are not database people, and that process would run a grep on a file, searching sequentially through the file, and show you the result if your roll number is there, or else say not found. It is a terrible way of doing things.

So you really do not want to launch a process per request. The way it is handled these days is that the web server or the application server, depending on what you are using, will launch a separate thread to service your request. If you are using Java servlets, for example, a thread is launched. The overhead of launching a thread is very small; the overhead of launching a process is very high. The overhead of launching a process in an operating system today is probably of the order of 0.1 to 1 millisecond, in that ballpark, so per second you cannot launch more than about, say, 10,000 processes, and then the system is doing nothing but launching processes at that rate,
whereas if you use threads, the overhead of starting a thread is very small. In fact there is usually a pool of threads, and in fact even of processes. What Apache does, for example, is it has separate processes, but it has a pool of processes, so it does not start a new process; it uses one of the existing processes to serve your request. That is how it gets parallelism. PostgreSQL is similar in that it does not use threads. There are other databases which use threads, but PostgreSQL uses processes: each connection is served by its own server process, and with a connection pool in front, each request which comes in is sent over an idle connection. If there is no idle one your request is queued up, and when a process becomes free it processes your request and sends you the result. So this is the common architecture.

Now what about the GATE application; why couldn't it have a process do it? Today you can do that if you use a different kind of interface. Many of you would know about PHP. How does PHP run? Again, if you launch a separate process per PHP request the overhead is going to be high, but there are PHP modules available for web servers, which means you can run it inside the web server. So that is the generic approach; let us focus on the servlet approach.

Servlet is an API, basically, in Java. Again, you could code your own from scratch, but the benefit of a standard API is that it does many things for you. For example, this business of creating cookies, sending them to the browser, and getting them back: all that is done by the API. You do not need to know how it is done; you can just use the API for everything.

So here is a piece of servlet code. As in the JDBC example before, this is just the scaffolding; the actual work is shown on the next slide. A servlet is defined by creating a class. Before that, you have to import a bunch of things: java.io.* for file I/O, javax.servlet.*, and javax.servlet.http.*, so that you can access various parts of the API in your Java program. Then you define a class; in this case we have called it PersonQueryServlet, which extends HttpServlet, that is, it is a subclass of the HttpServlet class,
which means there are certain methods of the HttpServlet class which it inherits. It can override those methods or leave some of them as the default. In this case the class is implementing the doGet method. What is the doGet method doing? It has two parameters. One is an HttpServletRequest; that parameter encapsulates everything to do with the request that has come in. The second parameter is the HttpServletResponse, which encapsulates everything that you send back to the browser. These are the two objects which you use for getting input and sending results back. And this method throws two possible exceptions, ServletException and IOException, so any method which calls this has to deal with them. In this case, who calls this? You are not going to write any code to call this. The application server which you are using, and you will be using Tomcat, is going to call this doGet method on a particular class. So you have to do a little bit more work to tell Tomcat: if you get a request with this particular URL, call this particular servlet class which I have defined here. That you can do manually, or if you use Eclipse it may be auto-generated for you. In the instructions which you have, that mapping is created behind the scenes for you, but you can also edit it and create your own mapping.

Now, what does it do? The first line there says response.setContentType("text/html"). Why is this important? When you make a request on the web you may get different kinds of data back, and the HTTP protocol has a way for the browser to find out what is coming back. By doing response.setContentType with text/html, which is a standard type, you are telling the system to tell the browser that the type is HTML, and it will process it as HTML. It could also be binary data, an octet stream, which you could use for various things. You could say JPEG, so that it is a picture which is interpreted as JPEG. So there are
many standard types. You can define the type here, and then the data which you put into the response should match that type, otherwise the browser will get into trouble. That is the first thing. Next you are saying PrintWriter out = response.getWriter(). This is an object of class PrintWriter, which you can use to print into the response. Now I am doing the following: out.println("<HEAD><TITLE>Query Result</TITLE></HEAD>"). What is this? This is standard HTML, which says this is the head of the page and the title is Query Result. Then it says out.println("<BODY>"), which says what follows is the body; then there is a bunch of code which is on the next slide; and then you say out.println("</BODY>"). In HTML a tag like this has to be ended by a corresponding slash tag of the same name, so tags have to be properly nested. If they are not properly nested, what can happen? Well, browsers do their best to manage if the HTML syntax is messed up, but how they manage may be browser dependent, so you may have code which works fine on browser A but dies on browser B because there was a syntax error in there. There are tools to detect these syntax errors, but you should be careful to match these tags. And then out.close() to say we are done. That is the body of a servlet.

Now the actual content inside it should do something useful. In our example, this particular piece of code is used to find people with a specified name. This is part of a larger program; we have that example in the book. What it does is this: you can specify that you want to search for students or instructors, and give a name. If you say student, the name is searched only in the student relation; if you say instructor, the name is searched only in the instructor relation. That is the logic that we want. In the browser, the user is selecting from a drop-down box, which is either student or instructor, and then
entering text for a name in another box, and then clicking on a submit button. So how is that accessed here? I am not going over the HTML form; that is something which you also need to deal with. You have to create HTML forms; there is a sample form, and you can modify that for simplicity if you are not familiar with forms. Now, in the servlet body I can access these parameters by name. If you see the first line there, it says String persontype = request.getParameter("persontype"). In the form I had said persontype is a drop-down, student or instructor; the value which you chose in the drop-down box is retrieved here. The next line is request.getParameter("name"); this is the name which you typed there. Now the code checks if persontype.equals("student"). In Java you cannot compare strings directly with ==; you have to say .equals. If it equals student, now I need to look up student. I have not shown the actual JDBC code here, but that has to be filled in.

Now, when you fill in the JDBC code, that code will take a parameter, which is the name. Do not forget: you should never concatenate parameters which you got over HTTP to create a query string. Never ever do that; it is a huge security hole. Use a prepared statement with the question mark placeholder, and then set that particular parameter to the name which you received. That is the correct way of doing it.

So now you have used JDBC and, let us say, you have got a result set back. Now you have to format the output. In this case there may be multiple students with the same name, so we are creating a table. Here, out.println with table border, because you want to show a border, and cols equal to 3, three columns. And what are we printing now? This line is just the header: tr is table row start, td is the data for one cell, and that data here is ID; the next cell is name, and the last cell is department. Each of these is closed at the appropriate point, so that
is the header of the table. Now we loop over the result set. We retrieve ID, name and department name into these variables, let us say, and then we do out.println; this is more or less the same as before, except that there ID was a literal within the string, whereas here I am printing <tr><td> and then plus ID, the variable, so I am printing the values which were retrieved, and so forth till the last one. And finally out.println("</table>"). So this is what is printed out.

Now there is a slight issue here. When I just print out text like this, suppose somebody's name contained <b> or something like that. What would be shown on the browser? The browser would treat that as an HTML command. The problem is that what is being sent out here is HTML, not plain text, and things which begin with an angle bracket are treated as commands, and then the name will get messed up. For example, if you print out an SQL program with a less-than, the less-than may be interpreted as the start of an HTML command. There are libraries which will escape these characters before printing them out, so you can use those to get it done right. We have not done it here. And, as I said, we finally print </table> to close the table. That is the part for student; there is another part, not shown, for the instructor. So that is the main logic inside; very simple. Any questions at this point about the protocol or the syntax? Okay. Then I mentioned this business of identifying users. Yes?

Student: Sir, I just want to know how we differentiate between a web server and an application server. What is the difference?

Instructor: The border is fuzzy. Something which is designed primarily to serve plain web pages would be a web server; something which was designed primarily to run programs, servlets or other such things, would be an application server. But the boundary is very fuzzy these days. Like I said, many things will mix the two functionalities. Apache has modules which can run many things in the web server itself, so now Apache
httpd, you could say, is an application server in that sense. Tomcat can serve HTML pages, so Tomcat is also a web server in this sense. The original design goal was pure web server versus pure application server, and it reflects in the set of features that they provide, but otherwise the boundary is not very sharp; it is fuzzy.

Student: Sir, if you develop the applications in .NET, is that more powerful than servlets in Java?

Instructor: What do you mean by powerful? You can do exactly the same things in both. The question is convenience: how many lines of code you have to write to get a particular job done. Servlets are actually a bit of a pain, because you have to write a lot of lines of code. If you go back to this example, you had to understand HTML and you had to output raw HTML with a lot of println calls; it is very clumsy looking code. There are other tools available, of course. In the servlet world itself there is something called JSP, which allows you to flip the thing around: you write HTML code (in fact you do not necessarily have to write it by hand; there are tools which help you create HTML using a GUI, so you do not need to know HTML, it can generate the HTML for you), and then you insert pieces of Java code at suitable points within that. That is very widely used. It started with Microsoft's Active Server Pages, ASP, and then the Java world created JSP, which is a clone of that. So that is what many people use rather than raw servlets. I have a slide on that later, and some people have already been using it. You can use it for the assignment if you wish, but my suggestion is, for the purpose of this assignment, stick to servlets, and then you can add JSP if you wish. For the purpose of the project you are most welcome to use JSP. We use JSP a lot; it is pretty convenient and a lot easier than servlets, but ultimately even JSP pages are compiled to servlets. Now, that is not the only thing; there are many
other tools. Even with JSP there is a lot of overhead to doing simple stuff, so there are many tools. In the .NET world, Visual Studio does offer fairly easy ways of creating not just the HTML but also the stuff that goes into it, for example tables and so on. You do not have to code that raw; you do not have to know HTML to create a table and put stuff in there. There are objects which you can use; you fill them in and it will create a table. In fact, not only does it create a table, it creates a table which you can interact with. How do you interact? You can click on a column and it will sort on that column. There are many more such features provided by specific tools which you can use. In the .NET world that comes packaged with .NET; in the Java world you can do some of these through JavaScript libraries. There are a lot of JavaScript libraries: YUI is one popular one, jQuery is another very popular one. YUI is from Yahoo; jQuery is an independent open-source library. These are very popular tools and you can use those. In the Web 2.0 world there is a lot of this kind of interaction; I will come back to JavaScript later on.

Student: Excuse me, sir. My question is, what is the comparison between CGI, that is Common Gateway Interface, and this?

Instructor: So, CGI: I did not mention the name, but when I told you the old GATE application would launch a process per request, that is using the CGI interface. That interface is very inefficient, and nobody uses it unless they do not know what they are doing. So that is not a good way of doing things.

Student: Sir, when we log in to Gmail-type sites, it asks whether we want to store the username and password automatically. Is that a kind of cookie? Is it stored at both locations, or only locally?

Instructor: That is a good question. Many systems offer to remember your name, or remember your login for some period. How do they do that?
They are using cookies. Basically, cookies are stored in the browser's private area of the file system. You can view them: if you go through your browser, there is a way to view all the cookies which are stored in the browser. So what the site can do is store a cookie which identifies your session on your side, and on their side they will have a corresponding persistent store for that cookie, so that even if you come back after one day, and even if they have rebooted their system in between, they will remember the cookie which they sent you. When you come back and your browser gives that cookie back, the site will look it up and say, ah, I know who this person is. That is how they remember you across sessions. You may close your browser, shut down your machine and come back up, but the cookie is stored in a file, so the browser fetches it from the file, and on that side, similarly, the cookie is stored in persistent storage, maybe a database or something.

Student: One important question. When I am working in my institution's environment, often when I visit some website it prompts me about some security issue, asking whether I accept the certificate or not. What is that, actually? I am not getting it.

Instructor: Okay, so this is to do with HTTPS, the secure HTTP protocol. Let me give you a high-level view of what is going on. There are several issues here in security. First of all, when I am connecting to a website and talking with it, how do I know that I am connecting to that website? I am connecting to Google; how do I know the other side is Google? Somebody who has hacked the network can say, if I get requests for Google.com, route them to some other machine, and that machine can accept my username and password. Now it has my username and password. Of course, it first has to convince me it is Google. How does it do that? It serves exactly the same
page. It goes to Google, gets a copy of their login page, and serves me that login page, and I do not even know I have gone to something else, because I typed www.google.com and at the network layer somebody hacked it and rerouted it. As far as I know I went to Google; I am clueless, but somebody else has captured my login and my password. So that is a serious problem. This was recognized very early, and the solution for it was a digital certificate. I will not get into the details of how it is done, but essentially think of it as somebody else signing a statement saying that this site is Google.com. Now, the hacker can copy the certificate, but there is a private key associated with it, which is stored at the Google servers, and the hacker cannot get hold of that. He can get into your network and reroute traffic, but he cannot get that key.

Now what the HTTPS protocol does is ask for the certificate from that site, and there is extra stuff going on. What it gets from the certificate is what is called the public key of the site. I do not know the public key of Google.com, and my browser does not know it, but the browser has pre-stored the public keys of some certifying authorities, like VeriSign. There are a number of certifying authorities, and the keys for those are stored in the browser; when you download the browser they are already there. Using those keys it can verify the signature on the certificate, and check that this certificate was issued to Google and that this is the public key of Google.
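The signature check just described can be sketched with the standard java.security classes. This is a toy model under stated assumptions, not how browsers or real X.509 certificates are implemented: the class ToyCertificate, its fields, and the site name www.example.com used below are all hypothetical. A "certificate" here is just a site name plus the site's public key, signed with the certifying authority's private key; the browser side verifies that signature with the authority's public key, which it is assumed to have pre-stored.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Toy certificate: (site name, site public key) signed by a certifying authority (CA). */
final class ToyCertificate {
    final String siteName;
    final byte[] sitePublicKey; // encoded public key of the site
    final byte[] caSignature;   // CA's signature over the name and key

    private ToyCertificate(String siteName, byte[] sitePublicKey, byte[] caSignature) {
        this.siteName = siteName;
        this.sitePublicKey = sitePublicKey;
        this.caSignature = caSignature;
    }

    /** Generates a fresh RSA key pair (used here for the CA, the site, and an attacker). */
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The bytes that get signed: the site name bound together with the site's key. */
    private static byte[] signedContent(String siteName, byte[] sitePublicKey) {
        String text = siteName + "|" + Base64.getEncoder().encodeToString(sitePublicKey);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** CA side: sign the statement "this public key belongs to siteName". */
    static ToyCertificate issue(String siteName, PublicKey siteKey, PrivateKey caKey) {
        try {
            byte[] encodedKey = siteKey.getEncoded();
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(caKey);
            signer.update(signedContent(siteName, encodedKey));
            return new ToyCertificate(siteName, encodedKey, signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Browser side: accept only if the name matches and the trusted CA's signature verifies. */
    boolean isValidFor(String expectedSite, PublicKey trustedCaKey) {
        if (!siteName.equals(expectedSite)) {
            return false;
        }
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(trustedCaKey);
            verifier.update(signedContent(siteName, sitePublicKey));
            return verifier.verify(caSignature);
        } catch (GeneralSecurityException e) {
            return false; // a malformed signature is treated as invalid
        }
    }
}
```

A certificate issued with the authority's private key passes isValidFor when checked against the authority's public key, while one that an attacker signs with a key of their own is rejected, because the browser only trusts the keys it already holds.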
Now, what do I do with the public key? There is an extra protocol using this public key to make sure that the server on the other side actually has the corresponding private key. These come in pairs, public key and private key. The private key is known only to Google; it will not reveal it to anybody else. The public key is made available to me, and I know that this is Google's public key. Otherwise the website which hacked in could give me a public key, say this is Google's public key, and fool me. But because I have got a certificate with a signature I trust, I know that this is Google's public key, and we proceed to establish a connection. All this is part of HTTPS, but the key point is that the site should give back a certificate which is signed by somebody that I trust.

Now, for the purpose of testing and so on, you can self-sign your own certificates. Many sites say we have done HTTPS, but they will give you a self-signed certificate, and then when you access the site your browser will say, I do not recognize the signature; do you want to go ahead, do you trust this site or not? Unfortunately even IIT does this in many cases. They have not bothered to get a properly signed certificate from a certifying authority. It is not hard to do, but people have been lazy, even in IIT Bombay, and then we have to say okay, accept the exception. That is probably what you saw. But then if somebody hacks in, I will not know: they can also give a self-signed certificate and fool me, and I have no idea. So that is the risk.

Now, this certificate mechanism gives a lot of protection, but it is not absolute. In fact people have hacked into certifying authorities and generated certificates for themselves. Somebody generated certificates for Google and proceeded to use them to lure people to some other site; they thought it was Google and revealed their passwords. So security is never 100%. There is a guardian, but if the guardian is hacked into, then you are in trouble.

Student: Sir, my question is regarding that extends, as well as
listen. There are two keywords in Java, isn't it, extends as well as listen. What is the difference between these two? On a previous slide we saw that the class extends something else.

Instructor: So first, extends. Extends is Java syntax for saying that this is a subclass of the other class. This class is a subclass of HttpServlet, therefore it inherits some methods, which it can override. Here doGet is implemented, and it actually overrides the version which is already implemented in the parent. There are also other methods, doPost and several others, in there. So that is extends: it is a subclass. Now what is the other one? Listener? That is not related to this; I think you are talking of the Oracle listener. In Java there is implements. Okay, I am not sure.

So, I already told you about sessions; I started on this. You do not have to worry about cookies. What you do in the servlet code is just the following: you say request.getSession(false). It tells you if a session has been set up already. Now, this session setup is not actually authenticating anybody; it is just setting up a session. All that it checks is, did this browser connect to me recently? The session has a timeout; after some time the session is forgotten. So what this is checking is, did the browser connect to me in the last few minutes, and that default timeout you can set. Strictly speaking the call returns the session object, or null if there is none, but I will loosely say the check is true or false. If it is true, then it is an existing session, and you can look up information which is stored about the session. If it is false, the session is a new one: somebody new has come, I have not seen this person before. And this argument false here, what is that saying? It is saying, if the session was not there, do not create a session. If I say true, then when there is no session it creates a new one automatically. That session is not identifying anyone. It is setting a cookie: it is generating a cookie and
sending it. I do not know who is behind this browser, but I recognize this particular browser. If somebody else comes, I know it is not this browser, but it is still anonymous. If I want to know who it is, I have to take a login and password and then do something more. So in this case, if the check is true, it is an existing session; otherwise redirect to an authentication page. What does the authentication page do? It accepts a login name and a password, it checks if the password matches that login, and if so it does request.getSession(true). What does that do? It creates a new session at that point. We came in here because there was not a session; now it creates a new session.

Once you have authenticated the user, you want to remember this user. In the insecure code which I told you about, you set a cookie with the username, but that cookie is sent to the browser, which is insecure. Here you do something slightly different: instead of doing it at the browser end, you do session.setAttribute with the string "userid" and the user ID which you got here, the login name, whatever that was. So this is set in the session. Now, this session information is never sent to the browser. It is stored locally at the application server, and it is identified ultimately by the session ID, and that session ID is the cookie which goes back and forth with the browser. You do not have to worry about that detail. So this session.setAttribute will happen here, after checking the login and password: first you say request.getSession(true) to create a session, and immediately after that you say session.setAttribute with "userid" and that login. Now, in the other branch, if request.getSession(false) found an existing session, you will say session.getAttribute("userid"). You should do that; that is the way you know who that session belongs to after logging in. Now, a common security mistake is that people create
new servlets and forget to check this. This check has to be done in every servlet that you use. If you forget to do it, somebody can access that servlet and get in without actually having logged in. So, that is a security hole; in fact, it's unfortunately an all-too-common hole, because programmers tend to be forgetful. So, what happens then? Most users will go to the main website, get the login page, type in login and password, and get in. A clever hacker who knows the URL of this thing will go to a page which is not checking the user ID, provide parameters of their choice, and then have them executed. Now, most of these things will at least have a session check, so there has to be a session. So, the trick is they first log in as themselves, and then they go to a page which has forgotten to check for the user ID, give some other user ID, and do what they want.
Now, this happened some years ago. There was a US website which was used for university applications. It's a very poorly designed website which unfortunately is still used by many universities. So, what that website did is they goofed up, and they made two mistakes. The first mistake was that they were supposed to declare results on a certain date, but they had another web page, which was not public, which was showing the results for testing purposes ahead of the declaration date. And at that URL (I don't know if they used servlets or whatever technology they used) they forgot to check for the user ID. So, what some clever students found is that they could log in and then go to that URL; somehow that leaked out. They could go to the URL and find out whether they had been accepted by the university or not. They couldn't actually do much more. They could only find out if they had been accepted; they couldn't even see what others had done. But they were not supposed to know until a particular date, maybe because universities reserve the right to change it till that date; they may mark it temporarily as accept. I don't know what the reason was.
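As an aside, going back to the point that the user ID check has to be repeated in every servlet: a minimal sketch of that logic is given here in plain Java. It does not use the real servlet API; the class SessionSketch and its methods existingSession, createSession, login, requireUser and showResult are hypothetical stand-ins, assumed only for illustration, for what request.getSession(false), request.getSession(true), session.setAttribute and session.getAttribute do. One difference worth knowing: in the actual servlet API, getSession(false) hands back null rather than false when there is no session, which is what existingSession imitates.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an application server's session table; not the real servlet API. */
class SessionSketch {
    // Kept only on the server: random session id -> attributes such as "userid".
    private final Map<String, Map<String, String>> sessions = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Like getSession(false): look up the cookie value, never create. Null if unknown. */
    Map<String, String> existingSession(String cookie) {
        return cookie == null ? null : sessions.get(cookie);
    }

    /** Like getSession(true) when no session exists: create one, return the new cookie value. */
    String createSession() {
        String id = Long.toHexString(random.nextLong()) + Long.toHexString(random.nextLong());
        sessions.put(id, new HashMap<>());
        return id;
    }

    /** Authentication page: the user id goes into the session only after the password check. */
    String login(String user, String password, Map<String, String> passwords) {
        if (password == null || !password.equals(passwords.get(user))) {
            return null; // authentication failed, no session is created
        }
        String cookie = createSession();
        sessions.get(cookie).put("userid", user);
        return cookie; // the large random string is the only thing the browser gets
    }

    /** The check that every handler has to repeat. */
    String requireUser(String cookie) {
        Map<String, String> session = existingSession(cookie);
        String user = (session == null) ? null : session.get("userid");
        if (user == null) {
            throw new SecurityException("not logged in: redirect to the authentication page");
        }
        return user;
    }

    /** Example handler: trusts the session's user id, ignores any user id sent as a parameter. */
    String showResult(String cookie, String userIdParameter) {
        String user = requireUser(cookie);
        return "result for " + user;
    }

    public static void main(String[] args) {
        SessionSketch server = new SessionSketch();
        Map<String, String> passwords = new HashMap<>();
        passwords.put("alice", "secret"); // hypothetical account
        String cookie = server.login("alice", "secret", passwords);
        // Asking for "bob" does not help: the handler uses the server-side user id.
        System.out.println(server.showResult(cookie, "bob")); // prints "result for alice"
    }
}
```

The design point of the sketch is that a handler which skips requireUser, or which believes userIdParameter instead of the session, is exactly the kind of hole being described.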
And then this was found out, and unfortunately for them there was a log of who had accessed that page, and they had logged in as themselves. So now the website, after screwing up in step one, very efficiently identified who all had done this, and very sadly for those people, the universities involved said, "You guys are hacking; we withdraw your admission." So, it was quite unfortunate for those users. You know, most of them probably knew they were doing something slightly shady, but none of them thought it was such a big deal, because they were just seeing the results a few days ahead of time. But anyway, they lost their admission. So, the point is that it's easy to forget it, and it can have real-world repercussions.
In a previous slide we were writing request.getSession(false); we are passing false to that method. Why? So, that is an indication to the method. If you say false, it will not create a session. If you say true, then if a session exists it returns it, and if it doesn't exist it creates a new session. So, what does it mean to create a session? It talks HTTP with your browser, creates a cookie, and sets the cookie, so that next time it can ask for it. So, what happened the first time, when you said request.getSession(false)? What happened is that at that point the application server code went and talked to the browser and said, "Give me a cookie with this name," and the browser said, "I don't have that cookie, sorry." And then this returned no session, which the check treats as false, and therefore you went to the else case. Now, when you do request.getSession(true) and the session didn't exist, it will again ask the browser, and the browser says, "No, I don't have the cookie," but at this point, because of this true here, the application server creates a new cookie and tells the browser, "Now save this cookie value." That's what happens.
So, we are doing session.setAttribute of the user ID. In that case, will this user ID be stored at the client side only, or at the server side as well? It's stored only at
the server. The user ID is not stored at the client. If you store it at the browser it can be hacked, so it is not stored at the browser; it's stored only at the app server. The app server, that is the web server, will store this session, yeah. The only thing that is stored at the browser is some other cookie, which is some large random string. Okay.
So now, how do you run your servlet code? There are many pieces of software which you can use. Tomcat was one of the early ones, and you will be using Tomcat, but there are many other things which also run servlets: GlassFish and JBoss are both widely used, and there are many others. Now, you are going to be using Eclipse; you have used Eclipse already yesterday. Eclipse does not have a server built in, so you have to interface Eclipse with Tomcat, so that when you click on something in Eclipse it actually starts running Tomcat and puts your servlet code in the Tomcat directory so that Tomcat can access it. All this is done transparently by Eclipse after you configure Eclipse to use a particular instance of Tomcat. There may be many Tomcats on your computer; you have to tell Eclipse which Tomcat to use. Now, the problem is that there may be a Tomcat which runs by default, and Eclipse will grab that Tomcat if there is such a Tomcat on your system. And the problem is that that Tomcat is running as a separate user; you don't have access to it, and therefore you need to create your own copy of Tomcat which you will run, and there are instructions for that. And the second thing is that Tomcat runs on a particular port; it listens on a particular port by default, and that port is 8080; that's where you send requests. Now, whether your system already has Tomcat running by default depends on how it was set up. Your desktop machines usually don't have Tomcat running by default, so you can use all the default Tomcat settings. But if you're using a machine which already has Tomcat running and you want to use Tomcat to test out your application, you have to edit the Tomcat configuration file to use a different port
number. Okay. I don't think our instructions include all that; we have assumed that Tomcat is not running, which is typically the case. Our desktops don't have Tomcat running; your laptops probably don't have Tomcat running. But in case it's already running, beware that you have to go edit the Tomcat files and change the port number. The second thing is that Eclipse and Tomcat can talk to each other, and when you tell Eclipse to use this Tomcat, Eclipse knows which port; but if you want to talk directly to Tomcat, you have to give the port number. Okay.
So now, let's quickly brush over a few more topics; I am not going to go into detail. The first is server-side scripting, and we already mentioned JSP, Java Server Pages. This flips the regular servlets. In regular servlets, Java code is the main thing, and then you have println for all the HTML code. Here, HTML code is the main thing, and then within <% and %> you embed Java code. And there is a translation system which runs on the fly, which takes this and creates a servlet out of it, and then compiles that servlet and runs it, all dynamically. You don't even know it's going on, but it's based on servlets; this is just a layer on top of servlets. So, it's easier to do things here. What is not shown here is how this JSP connects to the database. So, you need to include some other class which does the database connection and call it from here to get a database connection, and then you can have code here which talks to the database. Or you can just call some other external Java function from within that, and that function can do everything else you want for the database connection, and what it returns is what you use here. By the way, if you see out.println("Hello World"), this gets embedded inside this HTML. Okay? So, that's how the Java code here and the enclosing HTML interface with each other: you have to use out.println to generate HTML code, which is embedded inside this enclosing HTML. So, there are a lot of tags which you
can use with this.
PHP is another very, very widely used one; Moodle, for example, is built using PHP. It's actually very easy to write a lot of code in PHP. The overhead is less; it's like JSP. So, here you will note that this is all HTML, and then there is <?php and then ?>; that's a delimiter, and everything in between is a PHP script. And there is a bunch of predefined things in PHP. For example, $_REQUEST is the equivalent of the HTTP request in servlets, and it's treated as an array indexed by the parameter name; it returns the name which was set, and of course you have to see if it was set. So, what this is saying is: if not isset of $_REQUEST['name'], then just say echo "Hello World"; otherwise echo "Hello World" along with $_REQUEST['name']. And note that echo here is like out.println: the HTML page generated has all this plus whatever this has echoed. So, Apache includes a module to execute PHP in the process itself, so the PHP is executed and whatever is required is done there. So, Apache (not Tomcat, sorry; Apache httpd) is essentially an application server for PHP.
The next thing is client-side scripting. There is a long history of client-side scripting. JavaScript is pretty old. Initially it was used in very limited ways; today it's very widely used. All the Web 2.0 technology which all of us use extensively: when you go to Gmail, when you go to Facebook, when you go to LinkedIn, when you go to pretty much any website today, you are using Web 2.0 technology, which at its core requires JavaScript, so that the browser can do more than just send a request and get a result back, which was the original HTML. I'll skip the details here and just give a small example of JavaScript. How many of you have used JavaScript? Quite a few, but not all. For those who haven't: you should encourage your students to use JavaScript, because today pretty much all applications use JavaScript. But on the other hand, you have to be very careful with JavaScript. If you
use JavaScript raw to interface with HTML and the browser and so forth, what happens is that you can create web pages which only run on certain browsers. Okay? So, this unfortunately happens all too often; there are minor incompatibilities between JavaScript implementations in different browsers. So today, if I go to the computer science department web page, unfortunately one part of it was written using some JavaScript library somebody downloaded from the web, and if I view it from one of Firefox or Chrome, one of the two, there's a drop-down menu: the menu appears, but when I try to go to a menu item and click, it vanishes. It's like playing a game; it's taunting me: "Ha ha, try to click on me, I'll disappear before you can click." Very irritating. And I asked the student who is responsible for it to fix it, and he says, "Sorry, I don't know how to fix it." So, when I get time after this course, I will go and bash him up, but as of today the page doesn't work properly on all browsers. Very frustrating. So, this is an example of how not to use JavaScript. Do not download random JavaScript libraries which somebody has created and use them, because they probably didn't take care that they will work across all browsers. Even worse, don't write your own JavaScript raw (I'll tell you what I mean by that); then you can guarantee that your JavaScript will only work on the browser you tested it on; it will not work on anything else. The one which you download at least works on two or three browsers, okay, if not on all. So, what do you do? See, the core JavaScript language is standardized; what is slightly non-standard is some of the interfaces with the browser. So, the trick is to use standard JavaScript toolkits. I mentioned this before: YUI is one such thing, from Yahoo, and jQuery is another. These are very popular JavaScript libraries, and if you use these, you call functions that they provide, and that function will actually check which browser you are using and use the appropriate code for that browser. Now, supposing your
browser changes (browsers keep updating), these guys do make some effort. First of all, the initial code itself is pretty good; they have taken care that it will work across all browsers. And periodically they update it; in case new browsers become popular, they will make sure it is compatible with the new browser. You just get a new version of the library and download it, and your application should work fine. So, you don't have to worry about incompatibility. So, that's the only way to write JavaScript interfaces. The core programming language is standard, but interfacing with HTML and the browser: never do it raw, always use libraries. Okay.
So, let's look at what is going on here. This is JavaScript; this is also HTML. Inside the HTML it says <script type="text/javascript">, then function validate(), blah blah blah. So, all this is going to be executed by the browser. Now, this function validate is saying document.getElementById("credits").value, blah blah; there's a bunch of stuff. This code is actually accessing the HTML in the browser. So, this document.getElementById("credits") and so on refers to part of the HTML of that page; this function is accessing that HTML. And this is what I said you should try to avoid as far as possible. For the simplest things, okay, this is probably okay, but once you get to more complex stuff, the exact syntax varies by browser, so you should not do that directly. You can do it, but you're asking for trouble. And then it checks: if it is not a number, or if credits is less than 0 or greater than 16, it puts up an alert saying "Credits must be a number greater than 0 and less than 16" and returns false. What is that doing? When the function is called, it checks if you entered a reasonable credits value; if not, it pops up a box saying sorry, error, and then you have to click OK to make the box go away. And then, when is this function called? It's called here: form, blah blah blah, input type="submit" value="Submit"; that's the submit button in the form. It says action="createCourse",
so this is a servlet, maybe, called createCourse, and it says onsubmit="return validate()". So, what this is doing is it is calling this function, and if the function returns false, it never goes back to the application; right at the browser it short-circuits it and cancels your request. This is very useful; you find this in many, many places. Some validity checks are done before you go to the back end, which makes life easier for you. And if validate does not return false, which is the default (if it runs off the end here, the submission is not cancelled), then the submit goes ahead and submits it to the application. Is that clear? That's a quickie on JavaScript; I'll skip some details here.
Some buzzwords: object-relational mapping. How many of you have heard of this buzzword? Anyone? How many of you have heard of Hibernate? Well, there are many meanings for hibernate. I'm sure most of you have seen the Windows sleep, hibernate, shut down options; I don't mean that hibernate, and I don't mean the bears hibernating either. I mean the Hibernate object-relational system. How many of you are familiar with the Hibernate object-relational system? Anyone? One, two, whatever; very few, almost no one. So, what is object-relational mapping? When you build a Java application, you can write SQL to get data from the database and then process it. It's a lot of work to write SQL and get at the data. So, for a long time people have been attempting to provide a way where in Java you can write a function which says, you know, get student given a primary key, and you get a student object with student.name, student.address. You don't need to write SQL. Underneath, you have to have some way of telling the system that when somebody says get student with a primary key, it actually has to go to the database, get a student record using that value as the primary key, fetch it, and create a Java object out of it. What does the Java object contain? For every attribute of the student record, it will have a corresponding method like getName, getAddress, whatever, and maybe setName, setAddress, if you want to allow updates to go back also. So, what it's doing is it is mapping relational tuples to Java objects.
And there are two steps in using Hibernate. The first step is the mapping step; that's why this is called object-relational mapping. You first create the mapping: somebody has to tell the system, this is the table in SQL, this is the corresponding Java object, and here is how I map attributes in the relation to fields of the object. And this mapping can be reasonably complex; it may not be direct one-to-one. I can create an object student which has a method, courses, which will return the set of courses that the student took. So, this mapping is taking the student row and the takes relation and combining them to get a Java object. So, you can even specify such things, but we won't get into that. The bottom line is that you specify this mapping, and then the programmer simply says student.get with the roll number, and then print student.name, print address, whatever. It's a lot easier to write this than to write the SQL query, so it's become very popular. There are actually two reasons for its popularity. (a) It's easier to write. (b) How to generate that SQL: you have to write SQL for Oracle, and if you switch your database to Postgres, the syntax changes a little bit and you have to go rewrite it for Postgres. Hibernate and other object-relational mapping (ORM) systems will take care of these details for you. They will generate appropriate SQL for that database to fetch that particular student record. So, it's very easy. Now, if you build a system with Hibernate (Hibernate is open source, so it's free), you can retarget it at very short notice from Oracle to PostgreSQL. Why do people like this? Because it gives them bargaining power. So, take NSDL, the National Securities Depository, which does several things; it also manages a lot of your tax information. I'm sure many of you have used the tin.nsdl challan. How many of you
have used it? So, NSDL does a lot of these things. So, they were building a new system, and Professor Phatak was their consultant on what technologies to use. So, he told them, "You know, you probably will end up using Oracle as a back end, but don't tie yourself to Oracle; use Hibernate." So, they could build the whole system using Hibernate, and when they actually had to negotiate to get a database, they could play Oracle versus IBM versus PostgreSQL. Even if one of them didn't have some features they needed, they could still tell these vendors, "Look, our system is written in Hibernate; you give us a good price, otherwise we will go to this other guy." So, what have you achieved? You avoided vendor lock-in. If you have vendor lock-in, you pay whatever price the vendor says; you can do nothing. The vendor says 5 crores, you pay 5 crores or you're stuck. They weren't. So, the point of ORM is that they were no longer stuck; they could get a good price, and they were really happy with Professor Phatak for advising them to do this, because they saved a lot of money later on when they were negotiating. So, that's one of the non-technical reasons ORMs have become very popular: they avoid lock-in. Any questions on this?
There are drawbacks also of ORM. In particular, if you want to write complex queries, it's not easy. You know, it's great for fetching a record and updating a record, but it's not good for complex queries. They have defined their own query language and so on, but people prefer to code directly in standard SQL rather than use Hibernate Query Language, and of course once you do that, you are locked into whichever SQL you are using. But what is important is that the number of such complex queries is small. Usually an application has a lot of code which uses very simple SQL queries, and a little bit of code for reports which use complex queries. So, that part alone they would have to rewrite if they move from Oracle to DB2; the other parts are in Hibernate. It's a lot easier. Any questions? Somebody with the mic. Yeah. Is it that Hibernate
and EJB are the same technologies? Because in EJB also the same kind of things are happening. EJB, yeah. So, as I said, there have been many attempts in the past to do this mapping and hide the details. EJB was certainly one of the earlier attempts, but it didn't take off that much. Hibernate, on the other hand, has been very popular. So, why did Hibernate take off and EJB didn't? I'm not sure exactly; I think there are some technical differences which created problems for EJB, whereas Hibernate could do it. In fact, this has been a very long-standing thing; it dates back to the 1980s at least. Since then people have been trying to build these systems which provide an object view of data in a database. There have been many attempts, most of which failed. They were all from commercial companies, you know, so people don't use them, because it's a lock-in to that commercial company. Hibernate's selling point is that it's open source: you're not locked in, you don't pay us anything, and you avoid lock-in to a vendor. So, I think the economic case was very strong for Hibernate. EJB also had that, but I don't know why it didn't take off.
Sir? Yeah. Sir, stored procedures we can also use in database queries, and here the object session we are also using as, broadly, an object-relational model. Is there any kind of correlation, or are these completely different? So, it's different. Stored procedures are code which runs in the database, and this is stuff which is running in the application; the application is talking plain SQL to the database. Now, stored procedures actually are useful in one way, because you can do work in the database without going back and forth to the application. But from the viewpoint of lock-in they are very bad, because no two databases have the same stored procedure language. If you write it for Oracle, it will work only on Oracle; it will not run on PostgreSQL, and vice versa. So, you get a lock-in if you use that, so many people avoid it. It's very useful, but people avoid it simply
for economic reasons of avoiding lock-in. But if you're doing it with PostgreSQL, where you know cost is not the reason, you could use their stored procedure language. But again, if you want to migrate from PostgreSQL to some other database, for scalability or whatever other reasons, then you're locked in. Okay, so let's move on.
So, there are some slides on Hibernate; I'll not go into details. There's also an Entity Data Model which Microsoft has been pushing for some years; it has not been very successful, but it has similar goals. There are some slides on performance of web servers; again, I'll cover it in the main workshop, I won't do it now. And then there are some slides on security. SQL injection I already covered, but it's repeated here to make sure that it's driven home, because it's so important. But also here I want to mention one other kind of application-level security issue, called cross-site scripting. What happens in cross-site scripting? So, what is the model for your browser identifying who you are to a particular website? What happens is that a request is sent from your browser to the website and it does something. If I connect to my bank, I log in and do something. When I click on a submit button somewhere, there is an HTTP request being sent to the bank's website saying "do this action". Now, normally you think that this would only happen when I click on a submit button on the bank's web page; it happens with my knowledge, you think. Not true. Suppose the following happens. I go to some shady website, and that website has the following code: img src="http://mybank.com/transfermoney?amount=1000&toaccount=something". So, let's say that your bank's web page had a transfer money link, and when you filled in the amount and the account number and clicked submit, this is what it does, let's suppose. Now you go to this website. You never clicked on any such button, but you're logged into the bank; that's a prerequisite. You logged into the bank, you didn't
log out in between, and you went to another website. Now, that website had this piece of text in there: img src equal to this. Now, why img src? Because browsers today allow you to fetch images from elsewhere. So, when you go to a site foo.com, it can say, fetch this image from somewhere else; that's needed for many applications to run properly. So, the browser cannot say "I will only fetch images from your website". If I go to website X and the browser says "I will only allow images from site X", many applications will stop working. So, browsers don't enforce that; they allow images to be fetched from somewhere else. In this case, the browser has no clue that this is not an image; it doesn't know. So, it goes to that thing, it executes that request, and that request now goes ahead and transfers the money. So, what has happened? You visited some site; that site had a script (you can think of this as a script) which executed a request on some other website. That's why it's called cross-site. And your money was transferred; gone. If tomorrow you go to the bank and say "I never did that", the bank will say, "Here's a log; you were logged in, you did this." You didn't do it really, but it was done on your behalf. It's your fault; the bank will not accept responsibility. It's your headache, not theirs. Okay, so you understand what has happened.
So, how to prevent this? Unfortunately, it's a very hard problem. If you go to an untrusted website, it can do whatever it wants. So, lesson number one is: don't go to untrusted websites. Go only to websites which you know are reasonably well maintained, which won't do such things themselves, and which will also put in enough protections that it's hard for hackers to get in and do that. The latter part is very hard; many websites have been hacked. You go to a Government of India site which has been hacked, and this might be there. So, of course, the other way is, when you deal with your bank, to log out immediately, and banks tell you to do this. All those who have used online
banking systems know this: they will always say, do your transaction and log out immediately, don't hang around. Okay? They will also have automatic logout after a few minutes, but they will also tell you to log out immediately to avoid any such problem. So, you can do a few things to protect yourself.
And there are other hacks. So, many websites take comments and show the comments to other users. In that comment you could put in this code. The website was not a cheating website; it simply took comments from users and showed them to other users. If they allowed this comment to be put in, they have just enabled cross-site scripting. So, whenever a website takes user input and shows it to other users, it is very important that they do what is called sanitizing the input. What is sanitizing? Removing this kind of stuff. Okay, so this is the thing: prevent your website from being used to launch these attacks. So, disallow HTML tags in text input provided by users. There are functions which will detect and remove these tags; use them. If you take input from the user which you show to other users, do that. And there are a few more tricks here; I don't have time, so go read it up later. I just wanted to tell you that these issues are there so that you are aware of them. If you build applications today, you have to be aware of SQL injection and you have to be aware of cross-site scripting; these are two things which are widely used by attackers today.
Okay, so in the main workshop I will actually be spending more time on indexing and query processing, but for this workshop, because of condensation, I decided I will cover this in detail. I'm going to shrink indexing greatly; I'm sure many of you have already taught it and you know about it, so I'm just going to highlight a few points in there, and similarly in query processing. There's also a chapter on storage, including how data is stored in files, and about disk storage itself, RAID and so on, and buffers, but I'm going to skip it here; I will cover it in the main course.
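Going back to the point about sanitizing user input before showing it to other users: here is a minimal sketch in Java. Note that it escapes the HTML special characters instead of detecting and removing tags, which is a different (and commonly used) way of getting the same effect, namely that the browser shows the comment as text and does not act on it. The class HtmlSanitizerSketch and the method escapeHtml are hypothetical names assumed for illustration; in a real application you would more likely rely on a well-tested library function than on hand-written code like this.

```java
/** Minimal sketch of sanitizing user-supplied text before showing it to other users. */
class HtmlSanitizerSketch {

    /** Replaces the characters that would let the text be interpreted as HTML markup. */
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A comment carrying the kind of img tag described in the lecture.
        String comment = "<img src=\"http://mybank.com/transfermoney?amount=1000\">";
        // After escaping there is no '<' left, so the browser renders it as plain text.
        System.out.println(escapeHtml(comment));
    }
}
```

The escaping has to be applied at the point where the comment is written into the page shown to other users; storing the raw comment and forgetting to escape it in one page is the same kind of slip as forgetting the user ID check in one servlet.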