So, the question we have is about how to access data from remote databases using different protocols. The question is how to access data from a remote database. We already covered JDBC; JDBC goes over a network, so it does not care where the database is. It could be on the same machine, it could be across the world, as long as you are connected. So, at the lowest level, JDBC or equivalent protocols such as ODBC and a few others can be used to access data remotely. But there are some things which JDBC does not handle directly; there are issues in atomically updating copies of data which are at remote places. The area of distributed databases deals with a lot of these issues. We do not have time in this course to go into distributed databases; there is a chapter on it in the book, and if you are interested you can read that up, but in this course we do not have time for that. Does that answer your question? See, there are protocols like SOAP which can be used to retrieve data from remote databases, if the other end gives us permission to retrieve. So, could you please explore more on that? Thanks for that question. I see what you are saying. Now, when we directly talk to a database, it is at a very low level, so I interpreted your question as talking to a database directly. However, very often you do not want to expose the low-level details of the database, including the schema; instead you want to provide some functionality which can be accessed remotely. The area of web services is based on this idea. If you have heard of remote procedure calls, it is basically the same thing: you invoke a procedure which runs on a machine somewhere else; it generates results and sends them back to you. Now, in order to execute, that remote procedure may fetch data from a database or write data to a database, but that is orthogonal.
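To make the location-transparency point concrete, here is a minimal JDBC sketch. The host names and the database name are made-up placeholders; the point is that only the connection URL changes between a local and a remote database, while the rest of the JDBC code stays identical.

```java
public class JdbcUrlDemo {
    // Build a JDBC URL for PostgreSQL; only the host part differs
    // between a local and a remote database.
    static String pgUrl(String host, int port, String db) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) {
        // Same machine versus across the network: the URL is the only change.
        String local  = pgUrl("localhost", 5432, "university");
        String remote = pgUrl("db.example.org", 5432, "university");
        System.out.println(local);
        System.out.println(remote);
        // A real program would then continue, exactly as for a local database:
        //   Connection conn = DriverManager.getConnection(remote, user, password);
        //   PreparedStatement ps = conn.prepareStatement("select * from student");
        //   ResultSet rs = ps.executeQuery();
    }
}
```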
So, the idea is that instead of directly providing access to data, you build a layer of software on top and let applications invoke this software through what look like procedure calls. This idea has come up in various incarnations. SOAP is one such protocol, part of what are called web services. Web services were defined by several standards where you use HTTP to execute a remote procedure call somewhere, and you use an XML representation to send data and get data back. That was the plan, and many services do it that way. These days, for certain services, the overhead of dealing with XML Schema and so on is a bit of a pain. So, many applications provide something which is sometimes called lightweight web services. What they do is use a representation called JavaScript Object Notation, or JSON, which is tightly integrated with the JavaScript language. So, they can make a call, sending data as JavaScript objects; the remote side executes, generates a result, and sends the result back, also as JavaScript objects. Some of these have been made very simple through what is called REST, or representational state transfer. In short, what it means is that instead of defining a complex API, I can define something like a URL and say: pass these parameters, and I will execute and return the results, typically in JSON. So, it makes it easier to write applications which talk to a remote side, get something executed, and get data back. If you think about it, a servlet is basically doing something like this, except that the assumption is that a human is entering data and clicking submit; the servlet executes and gives the result back to a human. Now, what we want is that the thing on this side may not be a human; it is an application program. In fact, it may be a program running on your own browser, or it may be a program running on this server which needs to get data from another server to complete its task.
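As a rough sketch of the REST style described above: the remote "API" is just a URL with parameters, and the reply comes back as JSON. The endpoint and the reply below are invented for illustration, and the hand-rolled field extraction only stands in for what a real JSON library would do.

```java
public class RestSketch {
    // A REST-style call is essentially a base URL plus query parameters.
    static String buildQuery(String base, String key, String value) {
        return base + "?" + key + "="
                + java.net.URLEncoder.encode(value, java.nio.charset.StandardCharsets.UTF_8);
    }

    // Tiny illustration of pulling one string field out of a flat JSON reply;
    // a real application would use a JSON library instead of this.
    static String extractField(String json, String field) {
        String marker = "\"" + field + "\":\"";
        int start = json.indexOf(marker) + marker.length();
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and reply, for illustration only.
        String url = buildQuery("http://example.org/api/student", "id", "CS 101");
        System.out.println(url);
        String reply = "{\"id\":\"12345\",\"name\":\"Shankar\"}";
        System.out.println(extractField(reply, "name"));
    }
}
```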
So, this is quite widely used these days. There is a whole area of how to architect applications using these services, or web services, as a building block; this is called service-oriented architecture, or SOA. There is a lot of activity on standards and architectures in this space, conferences and so on, and software engineering people are paying a lot of attention to these things these days. The other thing is that once you have the cloud, meaning you have computers sitting somewhere else which are doing your work, in order to do any work you have to communicate with them from your application. So, it is very important to have these services defined so that your application program can use them to get work done. This is being used increasingly these days. Does that answer your question? I think we will take a few questions from chat at this point. How is session information stored in Tomcat? So, how do you store session information? Tomcat is basically running as a process. It has some area of memory which is devoted to storing session information, so there is surely a hash table in there. The cookie would be the hash key. Given the cookie which the browser gives back, Tomcat looks up that table using the key, gets the session object, and then makes that session object available to the servlet. When you do session.getAttribute, it is actually fetching something which is stored inside the session object. So, that is straightforward. The next question is: can we run different databases on the same server? Can you run PostgreSQL and MySQL on the same server? Absolutely. The only thing to make sure is that they do not run on the same port. All of them listen on a particular port so that incoming requests will go to that port. Now, each database has a default listen port, and the defaults have been chosen so they do not conflict. Oracle's is, I think, 1521; PostgreSQL's is 5432; for MySQL, I do not remember the number.
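Coming back to the session question for a moment, the hash-table lookup described above can be modelled in a few lines. This is only a toy model of what the container does internally, not Tomcat's actual code, and the cookie value is made up.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionTableSketch {
    // Rough model of what the container keeps in memory:
    // session id (the cookie value) maps to a session object,
    // which is itself a map of attribute names to values.
    static Map<String, Map<String, Object>> sessions = new HashMap<>();

    static Map<String, Object> getSession(String cookie) {
        // Look up the session for this cookie, creating one on first use.
        return sessions.computeIfAbsent(cookie, k -> new HashMap<>());
    }

    public static void main(String[] args) {
        // First request: the container creates a session and sets the cookie.
        String cookie = "JSESSIONID-4711";            // made-up id for illustration
        getSession(cookie).put("userid", "shankar");  // like session.setAttribute(...)

        // Later request: the browser sends the cookie back; the container
        // looks up the table and hands the servlet the same session object.
        Object user = getSession(cookie).get("userid"); // like session.getAttribute(...)
        System.out.println(user);
    }
}
```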
So, if you just take the default configuration, they will run with no problem. In fact, you can run two PostgreSQL databases on the same machine. Very often during development of PostgreSQL internals we need to do that: we have a modified PostgreSQL and it runs on a different port, but that is not a problem. You can have multiple PostgreSQL instances on the same machine. Next question: what is the limit on data stored in a session? I do not think there is a predefined limit, but if you store too much stuff, your memory is going to get filled up on the machine. A related question from the same place: what is the session-handling capacity of Tomcat? Again, I do not know what the limit is, but I do not think there is any hard limit. You can store a lot of sessions in Tomcat. Next question: what is the difference between thin, thick and smart clients? A thin client is a browser. A thick client was an old thing, where you had to run an application program on the user's machine, and it talked to a database behind it. So, that is a thick client. Now, this is the original usage of these terms; nowadays they are not as well defined, for the simple reason that even Gmail is no longer a very thin client, because it is loading JavaScript and running it on the browser. I do not know what you refer to as a smart client. Perhaps a thick client is what you mean, but I am not sure; I have not seen the term myself. Next question: you said HTTP is connectionless, so how does request-response or client-server synchronization take place? I see that connectionless was a confusing term; it is actually a badly chosen name. Connectionless does not mean there is no connection. What it means is that the connections do not live for a long time. A connection is made for a request; the request has some handshake back and forth, and when the request has been satisfied, the connection can be closed. So, it is not really connectionless; it is, so to speak, short connections.
So, synchronization back and forth all happens on that connection, and when the particular request has been resolved, the connection is closed. In fact, these days most web servers will keep a connection around for a little while, so that if another request comes immediately, they will reuse the connection instead of opening a brand new one. So, for a few seconds at least, the connection will be kept alive. With that, I think I will stop; we are well beyond time. We will break here, and as I was saying earlier, I would welcome any questions and feedback. I see that a few people have raised their flag. I see Samrat Ashok, Vidisha. If you have any comments, please go ahead; you are on now. Sir, this program on database management systems is quite good and very beneficial for us. Actually, this is my personal view: I could not count DBMS as my favorite subject, but after attending these classes I have gained confidence in this subject, and it is very good, sir. Thank you very much, sir. Thank you for the feedback. Let's see if anybody else has any feedback. By the way, I welcome any suggestions on any changes that you want in this course, or on any topics you would like to see covered in the advanced topics coverage, which is on the last day of the course. So, let's see if anybody else has their hands up. I see Indore having their flag up. Indore, I am going to connect you through. Indore, please go ahead. Sir, can you explain the main difference between NP-hard and NP-complete problems? Okay. So, the question is: what is the difference between NP-hard and NP-complete problems? That is slightly outside the scope of a database course, and very much in the scope of an algorithms course. But since several people are curious, let me answer this question. First of all, how difficult is a particular problem? That is a basic question which algorithms people have been looking at for a long time.
And then in the 70s, a very interesting result came from Cook, who later won the Turing Award for this. The basic idea was this. People already knew that certain problems could be solved in quadratic time, and certain ones in n log n time. But there was a class of problems for which people could not find algorithms that took less than exponential time. For example, there is a classic problem known as the travelling salesman problem; to solve that problem on a graph with n vertices, any known algorithm took time at least 2 to the power n. Nobody was able to do it faster than that. So, the idea was: can we show that these problems are intrinsically hard? And interestingly, even today, nearly 40 years after the initial result of Cook, nobody has actually been able to show that these are intrinsically hard. It is a really fundamental problem which is still open today after all these years. However, what Cook showed was something very interesting. He showed there is a class of problems for which, if you can guess the answer, then you can verify it in polynomial time. Of course, that does not mean you can get the answer in polynomial time; you can verify the answer in polynomial time. That is the idea. And in order to guess, there are an exponential number of choices for these problems. So, if you had exponential time, you could try each of those choices, check each one, and in the end you would have the answer that you wanted. So, we know these problems can be solved in exponential time, and the verification takes only polynomial time, but whether they can actually be solved in less than exponential time is not clear. And what Cook showed is that there is a whole class of such problems which are all equivalent. There is another very well-known problem in this class called the 3-SAT problem, but let's not get into the details.
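The "guess and verify" idea can be made concrete with a small sketch: checking whether a guessed travelling salesman tour is valid and within a given cost bound takes only linear time, even though finding the best tour appears to need exponential time. The distance matrix below is made up for illustration.

```java
public class TourVerifier {
    // Verify a guessed tour: it must visit every city exactly once,
    // and its total cost must be within the bound. This takes O(n) time.
    static boolean verify(int[][] dist, int[] tour, int bound) {
        int n = dist.length;
        if (tour.length != n) return false;
        boolean[] seen = new boolean[n];
        for (int city : tour) {
            if (city < 0 || city >= n || seen[city]) return false; // not a valid tour
            seen[city] = true;
        }
        int cost = 0;
        for (int i = 0; i < n; i++)
            cost += dist[tour[i]][tour[(i + 1) % n]]; // edge back to the start included
        return cost <= bound;
    }

    public static void main(String[] args) {
        int[][] dist = { {0, 1, 4}, {1, 0, 2}, {4, 2, 0} };
        // Tour 0 -> 1 -> 2 -> 0 costs 1 + 2 + 4 = 7.
        System.out.println(verify(dist, new int[]{0, 1, 2}, 7));  // true
        System.out.println(verify(dist, new int[]{0, 1, 2}, 6));  // false
    }
}
```

Finding a tour within the bound, by contrast, seems to require trying an exponential number of candidate tours; that gap between verifying and solving is exactly the NP idea.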
The bottom line is that all these problems are said to be NP-complete because we know that we can verify the answer in polynomial time, and further, they are all equivalent: if you can solve one of them in polynomial time, you can solve the others in polynomial time. But nobody has figured out how to solve any of these problems in polynomial time. So, it is generally believed that the best you can do with these problems is exponential time, although nobody has actually proved it yet. So, NP-complete is a class of problems which have all been shown to be equivalent in this sense: we can solve them in exponential time for sure, and if any one of them can be solved faster, we can solve the others faster as well. NP-hard says that a problem is at least as hard to solve as one of these problems, but perhaps harder; there are problems which will most probably take even more time than NP-complete problems. So, NP-hard is usually used in the following sense. I have a problem. If I had a way to verify a solution in polynomial time, with an exponential number of candidate solutions to choose from, I would say it is NP-complete. If I do not have that, but I can say that this problem is at least as hard computationally, in terms of the amount of time it takes, as one of these NP-complete problems, then we say it is NP-hard. It is at least as hard as the NP-complete problems; it may be harder; we have not yet shown anything more about that particular problem. So, such problems are called NP-hard. Usually what happens is that initially you take a problem and show it is NP-hard; very often it is easy to show that. To show it is NP-complete, you still need to do a little more work. Sometimes we just say casually that something is NP-hard without bothering to check whether it is actually NP-complete or not. In the slides which I used, I may have mentioned NP-hard a few times; those problems are probably also NP-complete.
But the point of mentioning NP-hardness was to say that these problems do not have a cheap polynomial solution; we just left it at that, but most of them are in fact NP-complete. I hope that answers your question; back to you if you have any follow-up questions. Walchand Institute, over to you. From a C# .NET program, which adapter or which driver interface do you recommend? I am sorry, could you repeat that question? It got cut in the middle; can you repeat it from the beginning? Sir, yes sir. Connecting from a C# .NET program, which driver or interface do you recommend? A driver or interface to do what? Yes, yes. Accessing the data in Oracle. Oracle? Okay, so the question is: if you want to access the Oracle database, which driver should you be using? Oracle provides a driver called the JDBC thin driver; that is good enough for all our purposes. There is a corresponding class file which you can download from the Oracle site, and you have to load that into your Eclipse or NetBeans; that provides the JDBC thin driver and you can use it. Does that answer your question? From a C# program to Oracle. Okay, a C# program to Oracle. So, I am not sure what the drivers for C# are. I am sure Oracle provides one, but I have not used C# to connect to Oracle, so I do not know the answer directly. It should be easy enough to find the driver by doing a bit of web search. Okay, so one more question. Hello. Yeah. Sir, my question is: can you give us a brief idea about Hadoop technology? Okay, the question is: can you tell us a bit about Hadoop? Thanks for the suggestion. What I am going to do on the last day, when I cover advanced topics, is certainly cover a little bit about MapReduce, including Hadoop, and about BigTable and equivalent very large-scale distributed data storage systems, because both of these technologies are seeing increased use in building very large-scale web applications.
So, that is certainly something which many people would be interested in, and I will cover it on the last day. Any other questions or suggestions from your side? Okay, thank you, sir. Okay, but let me say just a few sentences about Hadoop, for those of you who are wondering what this thing called Hadoop is, since somebody asked that question. Basically, when you have highly parallel systems for processing very large volumes of data, traditionally parallel databases have been used. They have been around for a very long time: parallel databases were first built in the early 80s, both in academia and in industry, and they have been around all through the 90s, so at least 25 years now, and people have been using them for analyzing very large volumes of data. But there is another community which found that they needed to process data which was not actually in databases; it was in files, and they needed to do more complex processing of that data than the SQL language supports directly. So, what they did is build a parallel programming infrastructure which lets you first of all divide up the data files across machines, then run a job in parallel across all the machines, and then collect the results from the individual machines and aggregate them to get a single final answer. This paradigm was called map and reduce. Map means you break up the problem into pieces and process them in parallel; reduce means you collect the results from all the pieces and reduce the local results into one single final result. So, this is the map-reduce paradigm, and map and reduce were proposed probably 30 or 40 years back, when parallel processing first became possible and was researched extensively. However, it has become very widespread in recent years because the amount of data which people have to deal with has exploded, and people started writing parallel programs which ran fine when you had maybe 10 or 20 machines.
But when your scale is 1000 machines, serious problems start to arise. First of all, when you have 10 or 20 machines, most of the time all the machines will be up. If you have 50 machines, and many of you have labs in your colleges with 50 to 100 machines, I am sure, then you know that at least a few of the machines will be dead at any point of time. Of course, those are machines which students abuse, pulling plugs and doing various things; but even if you put all those 50 machines in a controlled room, in a rack and so forth, 50 machines will have failures every now and then. And if you scale to 1000, you are going to have a lot of failures. In the face of all these failures, running a computation becomes very difficult. What the map-reduce implementations available these days, such as Hadoop, do is make sure that even if a machine fails, some other machine will take over and finish the computation; and they offer a very nice infrastructure for parallelizing tasks easily. 1000-way, even 10,000-way parallelism is possible with this infrastructure. So, it is increasingly in use, and I will tell you a little bit about it on the last day. So, that is it for that question.
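To show the shape of the map and reduce steps described above, here is a toy word count in Java. This only mimics the paradigm on in-memory strings; a real Hadoop job would use its own Mapper and Reducer APIs over file blocks spread across machines, with the fault tolerance discussed above handled by the framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MiniMapReduce {
    // Toy word count in the map-reduce style: "map" turns each chunk of the
    // input into individual words; "reduce" merges the per-word counts.
    static Map<String, Integer> wordCount(List<String> chunks) {
        return chunks.parallelStream()                                  // chunks processed in parallel
                .flatMap(chunk -> Arrays.stream(chunk.split("\\s+")))   // map step: chunk -> words
                .collect(Collectors.toConcurrentMap(
                        w -> w,          // key: the word itself
                        w -> 1,          // each occurrence counts once
                        Integer::sum));  // reduce step: merge counts per word
    }

    public static void main(String[] args) {
        // In a real system these chunks would be file blocks on different machines.
        List<String> chunks = List.of("to be or", "not to be");
        System.out.println(wordCount(chunks)); // counts for: to, be, or, not
    }
}
```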