What I am going to do now is introduce you to a few topics in the next 5 minutes or so, without going into any detail whatsoever, and let you read about them offline. So do not try to read the slides; please read only the headings. I know time is limited, but I wanted to introduce you to these terms so that you will go back and read about them.
The first one is application architecture. There is something called the model-view-controller (MVC) architecture, which is a buzzword many people throw around, and this slide tells you what those things are. It is a way of building web applications which separates the code into layers. One layer, which has all the real logic, is often called the business logic layer. That layer does not worry at all about how forms appear and are displayed, how you collect input, or how you validate input; all that is irrelevant to the business logic layer. The business logic layer does simple things like register a student, or look up a student's registration and return the data; it does not format it. The presentation or user interface layer takes the data which it gets from the business logic layer, formats it and displays it, and vice versa. This helps build applications in a fairly modular fashion. Below the business logic layer is the data access layer, which actually talks to the database. Typically data access is done directly in SQL; however, increasingly these days there is something called object-relational mapping, which helps to hide the SQL from the upper-level program. I have a slide coming up on that, so I will talk more about it in a little bit.
So this is a pictorial depiction of the same architecture: model, view and controller. A request comes to something called a controller, which talks to the model, which is basically the business logic. When it gets a response back, it talks to the view part of the code, which formats it, and then that view is displayed to the user.
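The layering just described can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class names (RegistrationDao, RegistrationModel, RegistrationView, RegistrationController) are hypothetical, and an in-memory map stands in for the database so the sketch runs without any SQL.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Data access layer: the only layer that would talk to the database.
// Assumption: an in-memory map stands in for SQL access here.
final class RegistrationDao {
    private final Map<String, String> courseByStudent = new HashMap<>();

    void save(String studentId, String courseId) {
        courseByStudent.put(studentId, courseId);
    }

    Optional<String> findCourse(String studentId) {
        return Optional.ofNullable(courseByStudent.get(studentId));
    }
}

// Model (business logic): registers and looks up students.
// It returns raw data and knows nothing about formatting.
final class RegistrationModel {
    private final RegistrationDao dao = new RegistrationDao();

    void register(String studentId, String courseId) {
        dao.save(studentId, courseId);
    }

    Optional<String> lookup(String studentId) {
        return dao.findCourse(studentId);
    }
}

// View: turns the data returned by the model into something displayable.
final class RegistrationView {
    String render(String studentId, Optional<String> course) {
        return course
            .map(c -> "<p>Student " + studentId + " is registered for " + c + "</p>")
            .orElse("<p>No registration found for " + studentId + "</p>");
    }
}

// Controller: receives the request, calls the model, hands the result to the view.
final class RegistrationController {
    private final RegistrationModel model = new RegistrationModel();
    private final RegistrationView view = new RegistrationView();

    String handleRegister(String studentId, String courseId) {
        model.register(studentId, courseId);
        return handleLookup(studentId);
    }

    String handleLookup(String studentId) {
        return view.render(studentId, model.lookup(studentId));
    }
}
```

Calling handleRegister("S101", "CS-101") on a new controller returns the formatted page for that student. Because the HTML lives only in the view, the display can change without touching the model or the data access code.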
This is an abstract model, and the model itself talks to the data access layer, which talks to the database.
As I said, increasingly there is something called object-relational mapping, which is used in many applications. So what is object-relational mapping? Many of you might know about object-oriented databases and object-relational databases. They all set out to create a brand new language: object-oriented databases took C++ or Java and said, let us add features to the language which let us directly store objects in a database and fetch them without using SQL. As a result, the database cannot be an SQL database. These systems were researched and developed extensively in the late 80s. A number of companies brought out object-oriented databases, and many of the syllabi in India include object-oriented databases as an advanced topic. Some even have it as a basic topic. Unfortunately, all of these object-oriented databases are pretty much dead. One or two of those companies are barely alive, with a few employees and a small revenue, but they failed miserably.
What went wrong? One problem was that people actually do need a declarative language like SQL to do data analysis and so forth. It is not enough to have a C++ program directly access the database. Of course, the vendors realized that and created their own languages, but those languages did not catch on. The second, bigger problem was that the moment you directly modify an object from C++, there were a lot of errors, because a small pointer error somewhere messed up the database. There were horror stories of a small bug which would completely destroy a database. That is completely unacceptable for any real application. If one small mistake can ruin a database, nobody will touch it. That is probably the biggest reason those systems did not take off. So they have kind of bit the dust, and people said that was a bad idea.
Now let us add object-oriented features to relational databases. This led to the object-relational model, which in some sense both preceded and succeeded the object-oriented databases, and SQL added a lot of object-relational features around SQL:1999. This was after the object-oriented databases were dead. Unfortunately for all those who put a lot of money into object-relational features, they have met more or less the same fate as object-oriented databases.
There are several reasons for this. One of the reasons is that standardization never took place seriously for object-relational features. SQL had a standard, but nobody bothered about the standard; each database did its own thing. So the problem is, if I use object-relational features in my application and I am using Oracle, I am locked into Oracle, and Oracle knows that once they know I am using those features. And guess what? If I am a company and I use Oracle, I buy one license for my development. When I want to deploy, I need to buy more licenses, and they are going to charge an arm and a leg. They know what it will cost me if I do not deploy, and their quotation is going to be based on that. They can charge what they want; I am stuck. So getting locked into a vendor is something people are really scared about, because then they have to pay whatever price is asked.
You can get locked into PostgreSQL, for that matter. There price is not an issue; it is free. However, if you need a new feature which PostgreSQL does not offer and which becomes critical, like materialized views, then you are stuck. You cannot migrate to Oracle or SQL Server, which offer you materialized views. So people do not want to be locked in even to something free like PostgreSQL. Given the lack of standardization, these features did not fly.
So more recently people said, let us forget about all these guys; they are wasting their time. Let us build a layer which sits on top of a relational database.
It talks to the relational database in SQL, so you can talk to any database as long as you use SQL. But the interface it provides above is one of objects, very much like the old object-oriented databases. The interface is pretty similar, but database access is purely in SQL. What this means is that if you have a small pointer goof-up, it will cause the program which you are running to crash, but it will not cause any damage to the database, unlike the old object-oriented databases where you could mess up the database due to some pointer error. So these are fairly robust. There is an open-source one called Hibernate, which is very widely used these days. There are a few proprietary ones also.
Recently there was a project for NSDL, the National Securities Depository Limited, which is a company that does back-end work for about 80% of all stocks that are traded in India. The BSE and NSE stock exchanges do the actual trading. After the trading happens, there is something called settlement. NSDL has, I think, about 80% of the settlement, so every share which is bought or sold eventually has to go to NSDL for settlement. That is a very, very big operation; NSDL processes a huge amount of data every day. NSDL has an old legacy system and they wanted to build a new one, and they consulted Professor Phatak. One of the things Professor Phatak told them is: look, you are obviously a very high-value company. If you build an application which is locked into Oracle or DB2 or any other database, the vendors are going to quote an arm and a leg for their product. They know you are locked in, and they know it is very high value. Instead, use Hibernate.
So Hibernate was used not really for the object features, although that was a definite plus for the programmers. It was used because Hibernate can provide a single interface to the programmer where they have a notion of objects. Those objects are actually stored as tuples in a database.
If you retrieve an object, it is created dynamically by fetching the data from the database. If you update an object, the updates are translated back into SQL updates to the database. All that is done by Hibernate, and what is nice is that you can switch the back end. Hibernate will work equally well with Oracle, PostgreSQL, DB2, you name it. So without changing the application at all, you can change the database underneath. It is like take one out, slide another in, and it will work. For that reason they use Hibernate, and then they may use Oracle or DB2 or whatever, but they are not locked in. If one of the vendors quotes too high a price, they can say, fine, we will take the lowest bidder. The vendors know that, and their prices will be under control.
So Hibernate was recommended by Professor Phatak for this non-technical, financial reason, and that is one of the reasons it is taking off. But there is also a technical reason. Hibernate is Java based, so when you have business logic written in Java, the programmer no longer has to write SQL. They can say "fetch the student object with this ID" instead of writing the query select * from student where ID = <this ID>; there is a simpler API. Then they can look up the fields of the student just as if it were a Java object, and they can update a field, and that update will be translated into SQL. So it is actually easier to use for a variety of tasks than writing SQL directly. It is a big win.
There are limitations. Hibernate is not the right choice if you are running complex queries. It has a query language, but it is more work to learn it, and you may find it easier to write such queries in SQL. It is also not the right choice if you are doing a batch update. If you are inserting a million rows at a time, Hibernate will slow down the process.
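To make the idea of "objects on top, SQL underneath" concrete, here is a toy sketch in plain Java. It is deliberately not Hibernate and does not use Hibernate's API: ToyStudentMapper and its methods are hypothetical names, an in-memory map stands in for the student table, and the SQL text is only recorded in a log to show what a mapping layer would send to the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// The plain object the programmer works with; no SQL appears here.
final class Student {
    final String id;
    String name;
    int totalCredits;

    Student(String id, String name, int totalCredits) {
        this.id = id;
        this.name = name;
        this.totalCredits = totalCredits;
    }
}

// Toy mapping layer (hypothetical, NOT Hibernate's API).
// Each call records the SQL a real mapper would issue for it.
final class ToyStudentMapper {
    final List<String> sqlLog = new ArrayList<>();
    // Assumption: this map stands in for the student table (id -> row).
    private final Map<String, Object[]> table = new LinkedHashMap<>();

    // Storing an object becomes an insert of one tuple.
    void save(Student s) {
        sqlLog.add("insert into student(id, name, tot_cred) values (?, ?, ?)");
        table.put(s.id, new Object[] {s.name, s.totalCredits});
    }

    // Fetching by ID becomes a select; the object is built from the row.
    Student get(String id) {
        sqlLog.add("select id, name, tot_cred from student where id = ?");
        Object[] row = table.get(id);
        return row == null ? null : new Student(id, (String) row[0], (Integer) row[1]);
    }

    // Changes made to the object's fields are written back as an update.
    void update(Student s) {
        sqlLog.add("update student set name = ?, tot_cred = ? where id = ?");
        table.put(s.id, new Object[] {s.name, s.totalCredits});
    }
}
```

A caller saves a Student, fetches it by ID, changes totalCredits as an ordinary field, and calls update; the SQL stays inside the mapper. Swapping the database would mean changing only the mapper, which is the lock-in argument made above.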
There are some limitations, but it is a technology which is taking off, and it will be good if your students know about it. That is the reason I have spent nearly 10 minutes telling you more about Hibernate. I am not giving you actual Hibernate code; I am motivating Hibernate. So please go back and fool around with Hibernate. There is a little bit of description in the book, just enough to give you a flavor, but if you actually want to code in Hibernate you will have to read up more. This slide has a little bit about how you open a session, create an object, save the object, and run queries which fetch objects, and so forth. Okay, I am going to stop there with Hibernate.
There are other things like the entity data model; I am going to skip that. There are some more slides, and I am going to go really fast here, on web services. The only thing I want to mention is that this is again something which was introduced a long time back, 10 or 12 years ago. It did not take off that much, except in some limited, non-database domains where it is actually used a lot. But more recently, even in database-backed applications, web services have become increasingly important. There is something called REST, which is a way of accessing web servlets or any other web application from a program, as opposed to from a browser, and which is being used increasingly to build applications. So your advanced students may want to find out more about REST and about something called JavaScript Object Notation (JSON). If you know XML, this is a simplified XML-ish format which is very tightly integrated with JavaScript, and it is used extensively of late in building applications where data has to be transferred to the browser from the back end. It is also used to store data in a database and retrieve it to the application. So it is used at both levels.
There are a few slides on rapid application development. I am going to skip these; go read them offline.
There are also a few slides on web server performance: something about parallelism, which I already told you about, and some material on caching, which you can read offline. I am going to skip it.
There are a few slides on application security, and we have a slide on SQL injection. This is actually the same material which I covered in chapter 5, but I wanted to bring it up again because I want to remind you over and over that SQL injection is by far the biggest danger, and it is also unfortunately the most common loophole that programmers tend to leave. Even this semester in this course, after I had told my students many times, about one fourth of the student projects had SQL injection vulnerabilities. So it is something which has to be driven home. You have to be very careful about this. I have already told you about SQL injection yesterday, so I am not going to go over it again.
There is one other class of security vulnerability, called cross-site scripting, which can occasionally be an issue. It is not very common, but it can be a big danger in certain situations where you allow a GET method to update the database. I told you at the beginning of today's talk, if you remember, that the GET method should never ever be used for updates. It should only be used for lookups, and everything else should be done by the POST method. One of the reasons is that if you allow the GET method to do updates, that leads to a class of vulnerabilities called cross-site scripting. I do not have time to describe it in detail. There is material in the slides and in the book which tells you what it is and what you can do to safeguard yourself. Using the POST method is certainly one of the important parts of safeguarding yourself.
So let us wrap up the morning session with this last quiz question. Everyone please press the ST button and be ready.
So the question is: is the following prepared statement secure? connection.prepareStatement("select * from instructor where name = '" + name + "'"), using a prepared statement, where this name comes from some user input. The options are: yes, it is secure; no, it is not; yes, if the table is locked; and the last one is none of the above. So please get ready to answer it.
Time is up for that question. Yesterday I already had an almost equivalent question, and I told you that prepared statements are key to avoiding SQL injection, but they are not a magic mantra. Just saying "prepared statement" does not solve all the problems. What is important is that any user input has to be mapped to question mark values in the prepared statement, and then you have to use setString or setInt, those functions, to set the parameter values with the values provided by the user, and then you execute the query. So there is a sequence: prepare the statement, set the parameters, execute. This particular prepared statement goes wrong because it is concatenating the user input, which is the cause of all the problems. Once you have concatenated user input, it is too late; saying "prepared statement" makes no difference, and the SQL injection can still happen.
So let us see what people thought. The number of centres responding has improved significantly: 144 answers. And yes, this time the vast majority have got it right, but again a significant number have chosen one of the wrong options. So please revise these concepts and understand what is going on.
I can now take questions, so please indicate on A-VIEW if you have any questions. I am going to take questions on all the topics which we covered this morning. I see one question from NIT Warangal. NIT Warangal, you are on.
Okay. We generally see that web pages cache their contents, and the next time we open a page it opens up quickly. But my question is, can the web server be cached for performance reasons? If so, how?
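Returning briefly to the quiz on prepared statements: the prepare, set parameters, execute sequence can be sketched with the standard java.sql API as below. The class name InstructorLookup and the method names are illustrative assumptions; only the instructor table and its name column come from the quiz. The unsafe method just builds the query text so the effect of concatenation can be seen without a database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class InstructorLookup {
    // UNSAFE, as in the quiz: user input is concatenated into the SQL text,
    // so passing this string to prepareStatement gives no protection.
    static String unsafeQueryText(String name) {
        return "select * from instructor where name = '" + name + "'";
    }

    // Safe query text: the user input is represented only by a ? placeholder.
    static final String SAFE_QUERY = "select * from instructor where name = ?";

    // Prepare, set the parameter, execute. The caller supplies an open connection.
    static int countMatches(Connection conn, String name) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SAFE_QUERY)) {
            stmt.setString(1, name); // the value travels as data, never as SQL text
            try (ResultSet rs = stmt.executeQuery()) {
                int matches = 0;
                while (rs.next()) {
                    matches++;
                }
                return matches;
            }
        }
    }
}
```

With the input X' or 'Y' = 'Y, unsafeQueryText produces a query whose where clause is true for every row, which is exactly the injection. With countMatches the same input is simply compared against the name column as an odd-looking string.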
So please turn off your mic. The question is, can the web server itself be cached? Well, what is caching? Caching means creating a copy on the fly. I do not quite know what you mean by caching a web server, but let me interpret it as follows. The web server, as I told you, is often parallelized, meaning there are thousands of machines: google.com, gmail.com, yahoo.com, bing.com, rediff.com, and I am sure you know economictimes.com. Any website which is high volume today will have many, many servers running it, and even though you have the same URL, they do a trick to decide where your request goes. Depending on which IP address your request comes from, it is routed to one of these hundreds or thousands of servers, and it will be routed consistently to one server. If that server fails, it will be routed to another one. But the point is that the server will keep track of its interaction with you.
So first of all, it is highly parallel. The second thing is what happens if the demand increases. Take the Times of India: most of the requests will probably come in the morning, some through the day, some more in the evening, and there are going to be long periods in the night when the load is very low. Moreover, suppose some major event happens: suddenly the news sites have a huge increase in load, whereas some other sites may not have such an increase. So what many data centers today can do is bring on more servers on the fly. Suppose they find that for timesofindia.com they have allocated 10 servers and suddenly there is a spike in load. They can on the fly bring in 10 more application servers for the Times of India, and now new requests will be routed to one of these servers. As a result, an increasing load can be met by increasing the number of servers.
This is not the same as caching. These servers have to reside in the data center. They cannot reside on your desktop; that does not make any sense.
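The routing idea above (same client IP goes to the same server, failover when a server dies, more servers added under load) can be sketched as follows. This is a simplified illustration, not how any particular site does it: IpRouter and its methods are hypothetical names, and a plain hash modulo the server count is used, which reshuffles some clients whenever the server list changes. Real load balancers typically use sticky tables or consistent hashing to limit that reshuffling.

```java
import java.util.ArrayList;
import java.util.List;

final class IpRouter {
    private final List<String> servers = new ArrayList<>();

    IpRouter(List<String> initialServers) {
        servers.addAll(initialServers);
    }

    // The same client IP maps to the same server for as long as the
    // server list is unchanged, so that server can track the session.
    String route(String clientIp) {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no servers available");
        }
        int index = Math.floorMod(clientIp.hashCode(), servers.size());
        return servers.get(index);
    }

    // Bring a new application server online when load spikes.
    void addServer(String server) {
        servers.add(server);
    }

    // Take a failed server out; its clients are rerouted to the survivors.
    void removeServer(String server) {
        servers.remove(server);
    }
}
```

For example, a router created with three server names sends 10.0.0.7 to the same server on every call; after removeServer on that server, the same IP is routed to one of the remaining two.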
So caching only makes sense for data which you are seeing. It makes even more sense for a lot of JavaScript code. If you go to gmail.com, what you see is just a little bit of content on the screen, but to implement all these fancy user interfaces there is a lot of JavaScript code. If you had to load this every single time you went to gmail.com, it would be very slow. So what they do, what everybody does, is package this JavaScript code into a few library files.
Yahoo has a similar library called the Yahoo User Interface library, YUI, which is open source. It is a very nice resource for building rich web applications with JavaScript and Ajax support. If you ever code in JavaScript, you should use YUI. You should never ever code raw JavaScript, by the way, for the simple reason that JavaScript support differs by browser. IE deals with JavaScript in one way, and Firefox deals with it in a slightly different way. If you code it directly, your application may work on IE and not on Firefox, and vice versa. Then you have to sit and test it on each browser and make sure it works, and very often it does not. I have had situations where I went to passport.nic.in from Firefox, filled out a number of things, and then when I submitted the form it failed, because they had included JavaScript which they had only tested on IE. That was very unfortunate. I think they have fixed it now, but this was the situation for a very long time. If you use YUI, on the other hand, you do not code directly against the browser; certain things are hidden from you in the JavaScript libraries, and YUI will figure out which browser you are on and use the appropriate library function for that browser, so that it will work correctly.
So, coming back to caching: all of this JavaScript code which your browser is executing is downloaded as a file and cached in your browser.
That way, the next time you go to Gmail, Gmail does not have to ship all of this JavaScript code. It only ships you the relevant data, which is what mails you have in your inbox and a few other things like that, and the rest of it is cached. So caching is certainly a very, very important way of improving performance.
There is another way in which caching is used. I go to nytimes.com often, because they have very nice articles on science, technology and health. Now if I had to go to their web server in the US each time, there would be a long round-trip delay. So what they do is outsource to a company. They used to use Akamai; I do not know what they do today. Akamai or similar companies cache the data from the New York Times in a server which resides locally. They have servers in Bombay, and I am sure in most of the metropolises in India and across the world. So the content is cached, and when I go to the New York Times, the New York Times actually redirects me quietly to Akamai. I do not know about it, but Akamai is the one which is actually serving me the data, with the articles and the photos and so on. It comes very fast because it is cached locally.
So that is another kind of caching trick which is hidden from you but is used very widely, both to improve the response time and to deal with very high load. The New York Times server will get some initial request, but the requests for individual pages are handled by the thousands of servers which Akamai has. The New York Times does not have to see all the details; they will get summary information, but they do not have to send all those bytes over. So the load on their servers is decreased drastically by such caching. There is a whole lot of tricks which go on; I will not get into the details. Back to you if you have a follow-up question.
Good morning sir. Sir, I have a question. You spoke about the HTTP protocol.
We also have something called the HTTPS protocol, which stands for HTTP Secure. Can you state how the security considerations differ between the HTTP protocol and the HTTPS protocol? Thank you sir, over to you.
Thanks, that is a useful question. So the question is, what is the difference between the HTTP protocol and the HTTPS protocol? HTTPS is a protocol with security built in. Now what is the need for security? I send a request to sbi.co.in, and it should go to sbi.co.in, presumably; what is the security problem? The problem is that it is possible for hackers to hack into a router, and in fact such things have happened. They arrange that if a request comes for sbi.co.in, it is redirected to some machine which they have probably captured using a virus. That machine captures the login and password which you are sending to SBI and forwards them to SBI. So you think you are talking to SBI, but they are sitting in the middle, they have captured your login and password, and they can go back tomorrow and take all the money out of your account. That is a serious problem, and it arises because you cannot fully trust all the links between you and sbi.co.in; somewhere in between, somebody may hack in and cause trouble.
HTTPS is a protocol which deals with security end to end. It is based on a system of digital certificates. There is some discussion about digital certificates in the book, but you can learn a lot more about HTTPS from Wikipedia or other sources; we do not cover it in detail in our book. The idea, since you have asked the question, is that there is a public-private key encryption scheme which works through digital certificates. I do not have time to explain exactly how it works, although a little bit is there in the book.
So the key thing is that your browser can actually verify whether it is talking to sbi.co.in or to somebody else. That is part A. Part B is that it prevents somebody sitting in the middle from listening in on your conversation, because there is a part of the protocol which uses public keys to exchange a secret key which somebody snooping in between cannot see. Your browser and sbi.co.in agree on the key which they will use to talk to each other. The two of them use protocols which prevent a man in the middle from ever figuring out what that session encryption key is, and as a result, if somebody reroutes the traffic through their site, they cannot see what is being exchanged, because it is encrypted. Most importantly, if your login and password are sent over HTTPS, they cannot ever see them.
On the other hand, if you send your login and password over HTTP, they do not even have to impersonate anyone. They can just snoop on the wire: they can put a tap on the wire, see the packets going through, and in the packet, in plain text, is your login and password. They can just read it and save it. HTTPS prevents this. Now you will say it is not so easy to get into the network and tap in. Well, what if you are using Wi-Fi? If you do not secure your Wi-Fi, people can listen in and tap into it easily. So there is a lot of risk of tapping, especially as criminals get more sophisticated. It probably does not happen that much in India yet, but it will happen sooner or later.
So HTTPS provides very good security to prevent such things from happening, and one of the things you notice in your browser is that it shows you a symbol which says it has verified that you are talking to SBI, or to whichever HTTPS site you think you are talking to. That is part of the implementation of HTTPS in the browser. I hope that answered your question; back to you if you have a follow-up.
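The idea of using a public key to agree on a secret session key can be sketched with the standard Java cryptography classes. This is a simplified illustration of the principle, not the actual HTTPS/TLS handshake: it omits certificates entirely, and the class and method names (SessionKeySketch, sendSessionKey, receiveSessionKey) are hypothetical. Real deployments also commonly use a Diffie-Hellman style key agreement rather than encrypting the session key directly.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class SessionKeySketch {
    private static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES_GCM = "AES/GCM/NoPadding";

    // Server side: a long-term public/private key pair.
    static KeyPair newServerKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Browser side: a fresh random secret key for this session only.
    static SecretKey newSessionKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Browser side: encrypt the session key with the server's PUBLIC key.
    // Someone in the middle sees these bytes but cannot decrypt them.
    static byte[] sendSessionKey(SecretKey sessionKey, Key serverPublicKey) {
        return rsa(Cipher.ENCRYPT_MODE, serverPublicKey, sessionKey.getEncoded());
    }

    // Server side: recover the session key with the PRIVATE key.
    static SecretKey receiveSessionKey(byte[] wrapped, Key serverPrivateKey) {
        return new SecretKeySpec(rsa(Cipher.DECRYPT_MODE, serverPrivateKey, wrapped), "AES");
    }

    // Both sides: traffic is encrypted with the shared session key.
    // The 12-byte iv must be different for every message under the same key.
    static byte[] encrypt(SecretKey key, byte[] iv, String text) {
        return aes(Cipher.ENCRYPT_MODE, key, iv, text.getBytes(StandardCharsets.UTF_8));
    }

    static String decrypt(SecretKey key, byte[] iv, byte[] ciphertext) {
        return new String(aes(Cipher.DECRYPT_MODE, key, iv, ciphertext), StandardCharsets.UTF_8);
    }

    private static byte[] rsa(int mode, Key key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(RSA_OAEP);
            cipher.init(mode, key);
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] aes(int mode, SecretKey key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(AES_GCM);
            cipher.init(mode, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the browser encrypts its login and password with the session key, and only the holder of the server's private key can recover that session key and read them. Verifying that the public key really belongs to sbi.co.in is the job of the digital certificates mentioned above, which this sketch leaves out.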
Sir, I have one more question. In one of your slides you mentioned the application server and the web server separately, and in another part you combined the two. We also have something called a CGI server, so I wanted to know: can I call a CGI server a web server or an application server? Over to you sir.
Okay. CGI is the protocol which web servers use to execute application programs: run the program, get the result, and ship it back. This was the original interface. It is not that efficient, but it is fairly easy to use; you put programs into the CGI folder, including PHP and other programs. So in that case there is a web server, and then there is a separate process which runs the application. That is typically three-tier, meaning there is a web server, an application program or server, and the database. In contrast, when you use Tomcat there is no separate CGI; Tomcat can directly receive a web request and then execute your servlet. So when you use Tomcat, these have been folded into one. Now some sites do not directly use Tomcat. They use Apache as the web server and then tell Apache: when you get a request for this region of the website, forward it to Tomcat. In that case Tomcat is running as the application server, and there is a separate web server as well. So it depends on the configuration which is used.
Yes, good morning sir. My question is related to JSP. On one of the sites it is stated that JSP is compiled into Java servlets. So do you mean to say that a JSP page, when compiled, implicitly creates a corresponding servlet?
Yes, exactly, it creates a servlet. Servlets of course are in Java, so when a JSP page is compiled, which is done on the fly, the application server, Tomcat for example, will take your JSP page, create a Java servlet program out of it, compile it and then execute it.
How can we provide security in JSP?
Okay, security in JSP is no different from security in Java. I told you one aspect, which is how to prevent SQL injection when you talk to the database; that is one part of security. A second part of security is cross-site scripting and other such issues, which I did not talk about, but they are on the slides. A third and very important aspect, which people sometimes forget, is that every JSP page must first authenticate the user before doing anything. It should make sure the session is active and establish who the user is before proceeding.
Now what I have seen is that people will do this in 90 out of 100 JSP pages, but in 10 out of 100 they will forget this check. This is not detected during testing, because nobody is trying to break the system. Once it is live, somebody will eventually figure out that this particular JSP page can be executed without any login check, or that you can do something which violates the security policies of the site. So what is important is that every JSP page or servlet, regardless of how you code it, must check that the session exists, that the person has logged in, and that the person who is currently logged in is authorized to use this particular page. That is the minimum security that is required.
Now people forget to do this, so a policy which we use at IITB, for example, is that all of these checks, whether the session is active and whether the person is authorized to use this page, are put in one JSP file which is included in every other JSP file. In fact this file has some other material too, like header information for display and so on. So the moment you include this file, you can be sure that if the person is not logged in, that is, if the session is not active, the page will not get executed. Whether the person is authorized to view this page is also checked; we have authorization based on the menu system. If the person is not authorized, the request is rejected. Only then does the rest of the JSP code get executed.
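The "one shared check included in every page" policy can be sketched in plain Java as below. This is an illustration only and does not use the servlet or JSP API: UserSession, PageGuard, GradesPage and the page name grades.jsp are hypothetical, and the per-page set of allowed users is an assumed stand-in for whatever authorization data a real site keeps.

```java
import java.util.Map;
import java.util.Set;

final class AccessDeniedException extends RuntimeException {
    AccessDeniedException(String message) {
        super(message);
    }
}

// Minimal stand-in for a login session.
final class UserSession {
    final String userId;
    final boolean active;

    UserSession(String userId, boolean active) {
        this.userId = userId;
        this.active = active;
    }
}

// The single shared check that every page calls before doing anything else.
final class PageGuard {
    private final Map<String, Set<String>> allowedUsersByPage;

    PageGuard(Map<String, Set<String>> allowedUsersByPage) {
        this.allowedUsersByPage = allowedUsersByPage;
    }

    void check(UserSession session, String page) {
        // 1. Is there an active session, that is, has the person logged in?
        if (session == null || !session.active) {
            throw new AccessDeniedException("not logged in");
        }
        // 2. Is this user authorized for this page? Unknown pages are denied.
        Set<String> allowed = allowedUsersByPage.get(page);
        if (allowed == null || !allowed.contains(session.userId)) {
            throw new AccessDeniedException("not authorized for " + page);
        }
    }
}

// Every page starts with the same first line.
final class GradesPage {
    static String render(PageGuard guard, UserSession session) {
        guard.check(session, "grades.jsp"); // nothing below runs if this throws
        return "grades for " + session.userId;
    }
}
```

Putting the check in one place means a page cannot get the logic subtly wrong; the remaining risk is a page that forgets to call it at all. A request filter applied to every URL would remove that risk as well, but that goes beyond what is described above.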
So that is one way to secure a JSP application. I hope that answered your question; back to you if you have a follow-up.
One more, sir.
Yes.
Do you need to install a Java package to execute HTML that uses JavaScript?
Okay, the question is, if you are using Eclipse, or for that matter NetBeans, and you are creating HTML with JavaScript, do you need to install Java? That is a good question. The name JavaScript is very unfortunate in some sense, because JavaScript has absolutely nothing to do with Java. In fact, the person who created the JavaScript language called it something else. Then the marketing people said, hey, Java is hot. Java, by the way, was from Sun, and JavaScript was from Netscape. So the marketing people there said, Java is hot; if you call it XYZ people won't use it, but if you call it JavaScript then people will use it. That is how the name came about, but it has nothing to do with Java. You can generate JavaScript from any language you want, PHP, .NET, you name it, and as a result, if you are writing a PHP program using JavaScript, there is no need for Java in your IDE. Okay, I hope that answers your question; back to you if you have any other question.
Yes. In credit card applications, especially debit cards and Master and Visa cards, is the database locally distributed, or do they keep a copy of the database? If someone withdraws money at Chennai airport and flies to Delhi, in another hour they can withdraw money there as well. How are the databases integrated, and how often are they updated?
That's a good question. I could not quite hear the first few words, but I think I understood the question, so let me repeat what I understood. The databases used by credit card agencies are probably not a single central database; they are cached locally in order to get performance and handle the volumes. As a result, if you use the credit card in Madras and then go to Delhi in one hour and use it again there, how do they make sure things are consistent? It is a good question, which I will answer, although in this particular course we don't have time to get into this issue of distributed data in detail. There is a whole body of work on dealing with these kinds of issues.
In this case, what they would probably do is note that the kinds of transactions which run are relatively straightforward. You run a transaction which says somebody is trying to buy a product for 2,000 rupees, or somebody is trying to withdraw 500 rupees from the machine. In the first case, if your card is swiped, they trust the vendor who is doing it, and they give a response back saying okay, authorized. Now at this point maybe your local database is doing the authorization. You said you fly to Delhi, but maybe you made a copy of your card and gave it to your friend in Delhi, and your friend uses it immediately. In theory it would be possible for both of you to withdraw money. Let's say you are allowed to withdraw 10,000 rupees in total, and you withdraw 9,000 rupees and your friend at the same time withdraws 9,000 rupees. Because the checking is happening against a local database, it's possible that both of you succeed and you are
overdrawn. The bank will take a risk on this; it can happen occasionally. But it can also happen that you legally withdraw 9,000 rupees and don't pay it back. Then what? In fact, the dirty secret of credit card companies is that they hate customers who pay all their money on time. They love customers who delay payments, because then they can charge late fees and exorbitant interest rates. So they probably love a customer who overdraws, as long as they can trace that customer and extract the money from them. In fact there is a whole dirty system of goondas, so eventually a goonda will come knocking at your door if you do such things. So they are reasonably protected. It's a risk for them that occasionally somebody may not pay back, and they are willing to take that risk. So consistency in this sense is not guaranteed the moment you have copies of the data, but at a business level it may be acceptable.
On the other hand, if you absolutely need consistency, then you cannot keep local copies of the database; it has to be completely central. In fact, these days the need for local copies is diminishing. Connectivity on the Internet is very high bandwidth and very fast, so Visa doesn't need to maintain copies of the database. Each issuing agency will have its own database, and Visa will simply route the request. Somebody swipes a card here, and Visa will route it dynamically to your credit card issuer, who will process it and send the response, and it comes back. So in this case it is not distributed: each provider has its own copy, and it will be 100% consistent. That is what is actually happening these days. But many ATMs and so on are designed such that if the network fails they will still operate, and that mode of operating during a failure is where you can have an inconsistency. Hopefully it will be rare, so the chance of it is small. Okay, I hope that answered your question. If you have a follow-up, please go ahead.
Yes sir, it's true for credit cards, but for the international debit card
and so on, I heard that they use some kind of daily limit or banking risk amount, say 15,000 rupees or something. Is that true?
Yes. I don't know about international debit cards in general, but for your ATM card in general there is a limit, and the limit is there for a couple of reasons. The limit is not simply because of this distributed database issue. The limit is there because somebody may steal your card and spy on your password. There was a very recent scam where somebody installed a tiny camera on ATM machines which could see what you were typing in. There would be people hanging around, and after you swiped your bank debit card, or rather your ATM card, they would say there is some problem and make you swipe the card in their reader. There is a more sophisticated version of it, where they install their swipe reader on top of the bank's ATM, and when you put the card in, it is actually capturing your magnetic stripe information while the camera captures the keystrokes. Now they can withdraw money at will. The limit is there to protect you from such things. You don't want a situation like this to clear out your account totally, so by putting a limit they make sure that you will discover it soon enough and the amount of money that is lost will be bounded. So that's the idea; it's not a database consistency issue.
Okay, we have taken quite a few questions. At this point I'm going to stop.