What is concurrency control? It is how the system controls access to resources, and there are two things which are typically done. First of all, there is usually a locking system which lets you prevent concurrent access. But on top of it, there is usually a protocol which is followed, and we will see these. So, there are many protocols. Some of them are based purely on locking. Some are based on other mechanisms, and the idea is that there are some underlying mechanisms which are used, and a protocol is a set of rules which you must follow in order to prevent problems. So, first of all, what is a lock? A lock is a mechanism to control concurrent access to a data item. So, a data item can be locked, and in the context of databases there are usually two modes, exclusive and shared mode. So, the idea is you have a lock manager, and the locking protocol should ask the lock manager. It can tell the lock manager, please give me a lock on a particular data item in exclusive mode, and the lock manager can say, you have the exclusive mode lock. Or it can say, give me a lock on this item in shared mode, and the lock manager may say, you have the lock in shared mode. The lock manager has to ensure that if one transaction has an exclusive lock, nobody else can concurrently have a shared lock or an exclusive lock for that matter. Exclusive lock means that if one transaction has an exclusive lock, no other transaction can have any lock on that data item. All others who have asked for locks can be asked to wait. They are told that maybe they will get the lock after some time; right now, they cannot get the lock. That is what the lock manager does. So, if somebody already has a lock in shared mode and a transaction asks for exclusive mode, the lock manager cannot grant it at this time. It will make it wait. And once the shared locks are released, then this transaction can get an exclusive lock. Just one transaction can get the exclusive lock; others have to wait. So, that is the notion of locking. So, here is a lock compatibility matrix. Compatible means that two transactions can have compatible locks on the same data item. So, S and S are compatible. So, two transactions can have shared locks on the same data item at the same time. All the rest are false, which means if one transaction has an X lock, no other transaction will get either an S lock or an X lock on the same data item. And by the way, the table is symmetric. What this means is, it does not matter which came first. If the S lock came first and the X lock came later, X has to wait. If the X lock came first and then an S lock request came, then the S lock request has to wait. So, that is the basic idea. So, now, the minimum requirement with locking is that before you read, you better get a lock, and before you write, you better get a lock. And the lock has to be of an appropriate type. So, before you read, you better get a shared lock, lock-S, on the data item. Now, here is a transaction which performs locking, but it can still get into trouble as we will see. So, you need something more than just plain locking. What it is doing is, before reading A it gets an S lock, it reads A, then it unlocks A. Then it proceeds to S-lock B, read it and unlock it. And then it displays the total A plus B. So, it is doing locking. It is doing something which is essential: when you read a data item, you better have an S lock on it. Turns out this is not enough. It is not enough to guarantee serializability. What can go wrong here? Supposing something happens in between this step and this step.
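To make those two steps concrete, here is the display transaction written out as a small sketch; lock_shared, unlock and read are hypothetical lock-manager and database calls used only for illustration, not a real API:

```python
# Sketch of the display transaction from the slide, with early unlocks.
# lock_shared / unlock / read are hypothetical calls, not a real API.
def display_total(lm, db):
    lm.lock_shared("A")
    a = db.read("A")
    lm.unlock("A")          # lock on A is released here
    # ... another transaction can run in this gap ...
    lm.lock_shared("B")
    b = db.read("B")
    lm.unlock("B")
    print(a + b)            # may show an inconsistent total
```

The dangerous gap is between the unlock of A and the lock of B.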
Now, note that there are concurrent things going on in the database. So, as soon as this unlock happens, maybe the CPU control gets transferred to another transaction. Let us see what that transaction does. Let us say that transaction transfers 50 dollars from A to B right here, in the gap between the unlock and the next lock. This gap can be a long period of time before the first transaction gets another chance. In this interval, another transaction comes in and transfers 50 dollars from A to B. If you look at the total displayed, it is not consistent. The total here, A plus B, is not consistent. Why? Because when you read A, the amount had not been transferred; when you read B, the amount had been transferred. So, the total here is 50 dollars more than it should be. You are seeing an inconsistent result. It is not serializable if you just do this. So, what you need is not just locking, but a protocol. That protocol is what is called two-phase locking. We will come back to starvation and deadlock; skip those slides for just a moment. So, look at the two-phase locking protocol. In the two-phase locking protocol, there is a growing phase during which a transaction can obtain locks but cannot release any locks. After that, the transaction can continue doing some work for a while and then it enters the shrinking phase, meaning, in this phase, it may release locks, but it may not obtain any new locks. So, this is the protocol. First of all, you have to get shared locks before you read and exclusive locks before you write. That is required. In addition, what this is saying is, in the first phase, you can keep obtaining locks, but the moment you release even one lock, you cannot obtain any more locks. After this, you can continue running and releasing locks till the end, but you cannot acquire a single new lock. The point is that once you have released the lock on an item, you cannot read or write that item. If there is a new item which you want to read or write, you cannot, because you cannot acquire any new locks. So, the basic idea is very simple. In the initial phase, you can ask for locks. Later on, you can release them, but you cannot ask for any new locks. This particular protocol can be shown to ensure serializability. I will not try to prove it for lack of time, but it is actually pretty easy to show. It is very easy to show that transactions can be serialized in the order of their lock points. What is the lock point? It is the point where the transaction acquired its final lock. In fact, any point between where it acquired its final lock and where it released its first lock is fine; any point in that interval can be called the lock point, and you can show that transactions can be serialized in the order of their lock points. Now, that is very good, but it is not enough. So, I will come back to deadlocks in a moment, but two-phase locking does not ensure freedom from deadlock. That is okay; we can live with that. But there is something worse. The first is that cascading rollback is possible. Why can that happen? In fact, something worse can even happen. If you go back here, the transaction might release an exclusive lock somewhere here, and then another transaction may get a lock on that data item and read it. What has just happened? You released an exclusive lock before you committed, and another transaction can come and acquire that lock and read.
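As a minimal sketch of how the two-phase rule could be enforced on the transaction side (hypothetical names; a real lock manager does this bookkeeping internally):

```python
class TwoPhaseTransaction:
    """Sketch: acquire locks only while growing; after the first
    release, refuse any further lock requests (hypothetical API)."""
    def __init__(self, lock_manager):
        self.lm = lock_manager
        self.shrinking = False
        self.held = set()

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire after first release")
        self.lm.acquire(item, mode)   # may block until compatible
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True         # first release: the growing phase is over
        self.lm.release(item)
        self.held.discard(item)
```

Notice that nothing in this basic form stops a transaction from releasing an exclusive lock before it commits, which is exactly the situation just described: another transaction can then read the uncommitted value.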
So, you may even land up with a non-recoverable schedule. So, strict two-phase locking does the following: it holds all exclusive locks till the transaction commits or aborts. Until then, it cannot release any exclusive locks. It can release shared locks, but not exclusive locks. So, that is something which you have to keep in mind. Now, there is another variant of this, where you not only hold exclusive locks till the end, but you hold all locks till the end, and this is called rigorous two-phase locking. This has the extra property that the order in which transactions commit will be the order of their serialization. Without this extra requirement, that is, if you release read locks early, you are still serializable, but the serialization order may not match the commit order. In other words, a transaction which committed earlier might come later in the serial order. So, if you do not want that to happen, and users may not be happy about it, then you should hold all locks till the end. In fact, if you look at actual implementations, they tend to hold all locks till the end because they usually do not give you a convenient way to release locks in between. So, practically, most lock-based implementations actually implement rigorous two-phase locking. Again, there is some terminology difference between different papers and books. In the industry, people say strict two-phase locking, and when they say that, more often than not they mean what a textbook calls rigorous two-phase locking. This terminology was taken from some original research papers which defined these terms. They were supposed to be the definitive papers, but in industry people went ahead and started using the term strict two-phase locking to mean what we call rigorous two-phase locking, that is, all locks are held till the end. So, if you see somewhere that a database follows strict two-phase locking, most probably it holds all locks till the end. So, that is the two-phase locking protocol. It guarantees serializability. Now, before we proceed further, let me point out two issues which any locking protocol has to worry about. The first is deadlock. So, here is an example where transaction T3 read B and wrote B. At this point, it has an exclusive lock on B. Meanwhile, T4 came along and read A. It has a read lock on A. Then it asks for a lock on B. By the way, this lock-S means it is asking for a lock; that is what this means. It does not mean it has been granted the lock. It has requested a lock on B. Can this lock on B be granted? No, because T3 has already got an exclusive lock on B. So, this cannot be granted. T4 is waiting at this point. T3 finished writing B and went ahead, and now it wants to exclusively lock A. So, it requests lock-X on A. Can this lock be granted? No, because T4 has already got an S lock on A. Now, what has happened? T3 is waiting because it needs a lock which T4 holds. So, it is waiting for T4 to complete and release the lock. T4, on the other hand, is waiting for a lock that T3 holds. So, it is waiting for T3 to complete and release the locks. Now, clearly neither of them can make progress. So, we have a deadlock. So, what do we do? First of all, any lock manager should have built-in functionality to detect such deadlocks. How do you detect it? The lock manager can keep track of who is waiting for whom. At this point, since the lock-S cannot be granted, the lock manager realizes that T4 is waiting for T3, because T3 has that lock in exclusive mode.
At this point, the lock manager also realizes that T3 is waiting for the lock on A, which means it is waiting for T4. So, now there is a cycle: T3 is waiting for T4, T4 is waiting for T3, and this cycle is causing a deadlock. So, the lock manager should detect such cycles which have resulted in a deadlock. Now, what does it do when it detects such an issue? Luckily, databases already support the ability to roll back transactions. So, what the lock manager should do is tell one of T3 or T4 to roll back. What does rollback do? It undoes whatever the transaction did and releases its locks. Say T4 is rolled back. T4 has not done any update, so there is really nothing for it to undo. It can release the lock on A, and now T3 can go ahead and complete. On the other hand, if T3 is rolled back, what does it have to do? It has already written B. So, it has to undo the write of B, and then it can release the lock on B, after which T4 can continue and complete. So, one of the transactions in a cycle must be rolled back and its locks released. So, that is one issue. The second issue is a less problematic one, but it can happen if you design your system badly, and that is starvation. What is starvation? Suppose a transaction is waiting for an X lock on a data item while somebody else has an S lock. So, right now the transaction has to wait. Now, suppose a new transaction comes by which wants an S lock. If you look at the compatibility, the new transaction wants an S lock and whoever currently has the lock on the item has an S lock. It is compatible. So, the new transaction gets an S lock. Fine, no problem. Now, a third transaction comes in and also wants an S lock. It gets the S lock. At this point, maybe the first transaction with the S lock has done its work. It commits and goes away. Maybe the second one which got the S lock also commits and goes away, but the one which wants the X lock is still waiting. Now, a fourth transaction comes which also wants an S lock, and it is granted the S lock. And this can keep going on and on. Each transaction which wanted the S lock might complete and go away. So, each one of them is finishing; none of them is causing an endless wait by itself. But if you look at the poor transaction which wanted the X lock, it is starving, because one after another the read transactions are going ahead and preventing it from making any progress. So, that is one kind of starvation. There is another kind of starvation, where due to deadlocks the same transaction is rolled back again and again. Both of these have to be avoided, and the concurrency control manager, or rather the lock manager, handles that part; the concurrency control manager is the bigger system, and the lock manager is a subsystem of it. So, the lock manager can ensure the first problem does not occur as follows. Suppose a transaction wanted an X lock and cannot get it because somebody has an S lock. It is waiting. Most lock managers would do the following: if a new transaction comes in and wants an S lock, they will not allow it to proceed. Yes, the lock is compatible with the existing S lock, but if they allow it, the transaction wanting the X lock may starve. Therefore, they will make it wait. In other words, no jumping the queue. If somebody ahead of you in the queue is waiting, you have to stay behind in the queue. Even if you are compatible with the current lock on the data item, you have to wait for your turn. So, if you prevent queue jumping, starvation by this means is prevented.
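Here is a minimal sketch of that "no queue jumping" rule for a single data item; the structures and names are hypothetical, and a production lock manager keeps much richer state:

```python
from collections import deque

class LockEntry:
    """Sketch of a per-item lock queue: a request is granted only if it is
    compatible with the current holders AND nothing ahead of it is waiting."""
    def __init__(self):
        self.granted = []          # (txn, mode) pairs currently holding the lock
        self.waiting = deque()     # FIFO queue of (txn, mode) requests

    def compatible(self, mode):
        # only S with S is compatible; anything involving X conflicts
        return all(m == "S" and mode == "S" for _, m in self.granted)

    def request(self, txn, mode):
        if not self.waiting and self.compatible(mode):
            self.granted.append((txn, mode))
            return True            # granted immediately
        self.waiting.append((txn, mode))
        return False               # must wait, even if the mode is compatible

    def release(self, txn):
        self.granted = [(t, m) for t, m in self.granted if t != txn]
        # grant from the front of the queue, in order, while compatible
        while self.waiting and self.compatible(self.waiting[0][1]):
            self.granted.append(self.waiting.popleft())
```

Because a compatible S request still queues behind a waiting X request, readers cannot keep jumping ahead, and the writer eventually gets its turn.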
The other kind, repeated rollback, also has to be taken care of during deadlock detection and rollback; I will not get into the details. So far, what we have seen is the basics of locking, deadlocks, starvation, and the two-phase locking protocol, which is very widely used, in particular the rigorous form where all locks are held till the end. So, this is a special case. You can get a lock whenever you ask for it, as long as there is no conflict on that particular data item. Once you get it, you keep the lock until you finish the transaction. At that point, you can release the locks. This is the most common mechanism, which many databases support. SQL Server supports it. IBM DB2 supports it. MySQL supports it. But there are other protocols, which we are going to see briefly, such as snapshot isolation, which Oracle and PostgreSQL support. Let us wrap up with a few more things about locking. The first thing is, if you wrote an SQL query, you are not explicitly saying get locks. So, how does the system decide what locks to get? If you have a transaction with a number of SQL queries, the first one may read a data item and the next one may update the same data item. When you run the first query, the database system sees that you are reading the item, so it is going to get an S lock. When you run the next query, which is part of the same transaction (we turned off autocommit, so the next query is part of the same transaction), it is updating the same data item. So, you also need lock conversion. In other words, in the first phase, you can acquire an X lock on a data item which you have not locked, but you can also have a situation where you already have an S lock and now you want an X lock. This is called upgrading of the lock, or lock conversion from S to X, and it is acceptable in phase one. Conversely, in phase two, you can release an S lock, you can release an X lock, but you can also do the following: you can downgrade an X lock to an S lock. Now, again, most implementations do not support explicit downgrading, but upgrading happens automatically. Whenever you have an SQL query which first reads and then another SQL query which updates the same item, or even the same SQL query first reads the data item and then updates it, this can happen; it is not uncommon in fact. Then an upgrade will happen. So, this slide talks about automatic acquisition of locks. Whenever somebody needs to read, you get a read lock. Whenever you need to write, you get a write lock. The slide shows some other details, which I am going to skip; they are not particularly important. So, there is a lock manager subsystem which does all the handling of locks. The simplest way to think about it is to think of the lock manager as a separate process to which you can send lock and unlock requests. This is a perfectly valid implementation. But it turns out that you can get much higher performance by not having a separate lock manager process, but instead having a lock table, which is a data structure maintained in shared memory. All the processes in the database system can access this shared data structure through a mutex, and a library function implements the lock manager functionality. I will not get into details, but that is how it is practically implemented. You can, however, think of it as a separate process to which you send messages. A couple of slides on deadlock handling. First of all, once you have locking, deadlocks can occur. There are a few tricks to avoid deadlock, to prevent deadlock.
One way is to require each transaction to predeclare all the data items which it needs to lock and which locks it needs, if that is possible. The problem with this is that when you write an arbitrary transaction, it is hard to predict everything it will access. It is Java code and the database does not know what it is. So, this is not usually possible; in some cases it is. Then there are some other protocols which are based on ordering of data items, and a variant of this is actually very useful. So, we saw a situation where the following happened. There were two data items, just two data items, and two transactions, both of which just update these data items. So, somebody wrote a transaction T1 like this: write A, then write B. There are also reads, which I will omit; do not worry about where, just assume the reads happen along with the writes. So, read plus write A, then read plus write B. Now, T2 does write B, then write A. That is what T2 does. Now, if the transactions ran one after the other, there would be no problem. But the fact is that both these transactions get executed repeatedly. So, sooner or later, you get this particular ordering: this happens first, this happens second, this happens third, and then this happens fourth. This can happen. Now, what happens? T1 gets an exclusive lock on A. T2's write happens second, so it gets an exclusive lock on B. Now, T1 wants an exclusive lock on B and is not able to proceed. T2 is still active; it now wants an exclusive lock on A. It cannot proceed because T1 has a lock. T1 and T2 are waiting for each other. We have a classic deadlock. It turned out that all that needed to be done was for T2 to be rewritten to also access the items in the same order. So, T2 was rewritten very easily: first access A and then B, and that is all that was required. The moment you do this, deadlocks can never happen. Why? Suppose T1 wrote A and T2 comes along and wants to write A. It is blocked because it cannot get the lock. Now, T1 can proceed, get the lock on B, and complete. T2 cannot interfere. Conversely, if T2 first got the lock on A and T1 wants an exclusive lock also on A, it cannot proceed. T2 will complete before T1 can proceed. So, there will be no deadlocks here. So, ordering is a very useful tool to prevent deadlocks, and where possible you should use it. But it is not always possible, because transactions can be very complex; this is a very simple pair of transactions. I already mentioned that you have to detect deadlocks, which are basically cyclic waits. So, here is a set of transactions, and each edge denotes who is waiting for whom. This edge means T18 has a lock and T17 wants a lock in a mode which is not compatible with T18's. Therefore, it has to wait for T18, and so forth. Here, there is no problem: somebody is waiting for T20, and T20 is not waiting for anybody. It will eventually finish, unless it creates a deadlock later. If it does not create a deadlock, it will finish and go away. Now, T18 is not waiting for anybody; it will finish, and so forth. But here we have a cycle: T18 waiting for T20, T20 waiting for T19, T19 waiting for T18. So, this is a typical deadlock. So, you have these graphs, called wait-for graphs, and you look for cycles in the graph. It is pretty efficient to look for cycles; there is a simple DFS algorithm, and lock managers typically implement this. So, once a deadlock happens, they will ask one of the transactions to roll back.
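Since the wait-for graph came up, here is a small sketch of the cycle check a lock manager might run; representing the graph as a dictionary of edges is just an assumption for illustration:

```python
def find_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {txn: set of txns it waits for}.
    Returns True if some set of transactions is deadlocked."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in wait_for}

    def dfs(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:       # back edge: cycle found
                return True
            if colour.get(u, WHITE) == WHITE and dfs(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and dfs(t) for t in list(wait_for))

# The example from the lecture: T17 waits for T18, and T18 -> T20 -> T19 -> T18 is a cycle.
print(find_cycle({"T17": {"T18"}, "T18": {"T20"}, "T20": {"T19"}, "T19": {"T18"}}))  # True
```

If a cycle is found, one transaction on the cycle is chosen as the victim and rolled back.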
There are again some issues on how far to roll back. It turns out that you can do partial rollbacks; many systems support that. Instead of completely aborting the transaction, you roll it back partially, enough to release some locks, and then allow the other transaction to get the lock and proceed. Then this transaction, which has been partially rolled back, can go forward. This cannot always be done; total rollback is the default. So, let me stop here. Here is a quiz which you can read, and I will tell you the answer in a few minutes. Meanwhile, I will take questions. Sorry, there is a glitch in the slide. It says lock A of B; this is a mistake. This should have been lock X of B. Just pretend that this is lock X and work it out accordingly. Meanwhile, we can take questions. We have COEP Pune, please go ahead. Sir, actually my question was regarding, can you throw some light on Mahout? In the morning we discussed Hadoop, but then there is one term called Mahout. Yes. So, Mahout is a data mining package which is built on top of Hadoop MapReduce; it is Apache Mahout. A number of data mining algorithms have been implemented using the MapReduce infrastructure, Hadoop. So, you can actually run these on very large amounts of data. That is the idea: you want data mining algorithms which are scalable to very large data. Now, if you go back to standard data mining, it has been around for a while. There are many algorithms which are not very efficient. They will work beautifully on small data, but if you try to run them on really large amounts of data, they will die, partly because the complexity is very high. But even those which do not have a high inherent complexity would be very, very slow if you have to run them serially. So, the idea of Mahout is to take those algorithms whose complexity is reasonable and provide parallel implementations of those algorithms which can be run on very large amounts of data. That is what the Mahout project is about. You can look it up on the net to find out more. Sir, my question is on the transaction concept. Just now you talked about conflict serializability. Once we have ensured that the schedule is conflict serializable, do we need to check for view serializability also? The question is, if we have a schedule which is conflict serializable, do we need to check separately for view serializability? The answer is that any conflict serializable schedule is also automatically view serializable. View serializability is another notion of equivalence of schedules which I do not have time to get into. That notion is weaker than conflict serializability. In other words, there are schedules which are view serializable but not conflict serializable, but every conflict serializable schedule is definitely view serializable. Now, if you do not know what view serializability is, I do not have time to get into that here, but there are details on the book slides and in the book itself. You can go read it up. We will take some more live questions, but before that, let me take a good question from chat. In an ATM machine, when we withdraw some amount, the ATM machine will do the transaction and you get a confirmation. But in some situations, if the ATM machine does not have enough cash, it rolls back the transaction it committed earlier. So, why are they allowing a transaction to commit at the database before checking the amount available in the ATM? This is an example of a badly designed transaction. This should not happen.
Somebody has goofed up in the design. They should have first looked at the amount you asked for and checked whether the ATM machine has that amount, because access to the ATM machine, luckily, is serial: no two people can access the ATM machine at the same time. So, you have the lock on the ATM machine while you are in front of it. They should have first checked if there is enough cash in the machine, then gone to the database and checked if you have enough balance, and then either let the transaction proceed or told you, no, you do not have enough balance. One of the two things should have happened: either okay or not okay. But I think what this person is complaining about is that some bank has goofed up, and there are situations where, after the amount is deducted at the bank, the ATM realizes it does not have enough money, and then the transaction is rolled back at the bank. In fact, rollback in this case might mean there is a fresh transaction which adds the amount back. So, this is a bad design, but sometimes other problems can happen. The ATM had enough money, it was all ready, but when it tried to dispense the amount, maybe it jammed. There is a mechanical problem and it is not able to dispense the amount. If it detects this, it will run a transaction to credit the amount back to your bank balance. Sometimes it may not detect it and it may think you actually got the money. Then it becomes your headache; it is no longer the bank's headache. You did not get the money, but you go back and check your passbook and you see that they have deducted the money. Then what to do? You go complain to the bank saying, hey, you are cheating, you did not give me the money. Then the bank may go back to the ATM and check the balance there. If there is extra money in the ATM, that is proof that it did not give you the money. In addition, most ATMs also have a camera to record what is happening, so in case there is fraud, they have a picture of who was there. Now, that picture can show whether the money actually came out of the machine or not. Of course, if you say, hey, I asked for 15,000 and the ATM machine gave me 14,500, that is harder to prove using the video, but maybe even in that case they can check the cash balance in the ATM machine, and in some cases they may decide you are not lying and give you the money. There is a bigger issue here: so far, we have looked at transactions inside the database, but the ATM machine is a classic example of transactions that happen outside the database. And here, I have already given you an intuition of how this is handled. A transaction happens in the database. It is committed. Your balance is deducted. Then the rest of the transaction happens outside the database. If everything goes well, that is fine. If something goes wrong outside the database, some corrective measure has to be taken to conceptually roll back the transaction in the database. But it is too late; the transaction is committed in the database. So, the trick is to have a compensating transaction, in this case a transaction which credits the money back into the account. If you see your passbook, you will see two entries: a debit followed by a credit. So, it is not transparent to you. It is clear that something happened which was rolled back. It is not atomic in that sense. But you are happy that your money is back, and you are willing to overlook the fact that there was a debit and a credit.
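To make the compensating-transaction idea concrete, here is a rough sketch of the control flow; the db and atm objects, the table and the column names are all hypothetical, and this is not how any particular bank actually implements it:

```python
def withdraw_at_atm(db, account, amount, atm):
    # Transaction 1: debit the account inside the database and commit.
    db.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
               (amount, account))
    db.commit()

    # The physical dispensing happens outside the database.
    if not atm.dispense(amount):
        # Compensating transaction: a fresh transaction that credits the
        # amount back. The original debit is already committed, so it is
        # not undone; the passbook shows a debit followed by a credit.
        db.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                   (amount, account))
        db.commit()
```

The net effect is correct, even though the intermediate debit was visible for a while.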
So, that is acceptable in the banking scenario. We have Vivekananda, Tiruchengode, Tamil Nadu. Please go ahead, Vivekananda. How does HDFS get good throughput in Hadoop? How does HDFS give good throughput in Hadoop? So, first of all, Hadoop is not necessarily a super efficient implementation. It is very widely used because it is open source; it is free. But you are paying a price in that the work is done in Java, and there are some overheads for doing it in Java. HDFS is also written in Java, which means HDFS also has overheads due to Java. So, there is some price for all of this. If you compare with what Google does, their GFS and their MapReduce are all written in C or C++ or some other dialect of C. So, their implementation is likely to be far more efficient. But Yahoo and Apache went with Hadoop because there are two issues here. One is the efficiency of execution. The other is problems due to programmer error. If you are running a C program and it has an error, first of all, it is harder to debug C programs, and they generally have a lot more errors. So, there are going to be a lot more runs where people run something and get a wrong output, or the program crashes, and so forth. Those problems are less in Java. So, productivity goes up, but it is certainly at some cost in terms of the throughput that you can get. So, yes, there is a price you pay. Now people are providing, for example, Hive, which is SQL based, but for the underlying implementation they are moving to non-Java platforms, which might be faster. That, in turn, runs on top of HBase, which itself is, I think, Java based. So, Java has some overheads, but it is a trade-off. You could get some improvement by using C, but Java is not so bad. It is not horribly inefficient; it is a little less efficient, but not horribly so. So, it is considered good enough for most applications. Royal College, Kerala, please go ahead. Sir, how do you use CloudSim for doing simulations in the Cloud? Can I relate this Hadoop with CloudSim? I am not familiar with CloudSim, but I believe it is something which lets you simulate what would happen if you were running in a Cloud. Is that what it is? I have not used it, so I am not sure. Without knowing exactly what CloudSim is, I cannot relate it to Hadoop. But from the name, it sounds like a simulator which can simulate long-distance delays and so forth, which would reflect what you would see if you were running something in the Cloud. If you are running in the Cloud, the problem is that when you are communicating with a service which is running in the Cloud, it is going to take a long time for your packet to get from here to there. For example, if your service is running in the US, you are looking at several hundred milliseconds of delay for packets to get there, partly simply because of speed-of-light delays and partly because there are many routers along the way, each of which may introduce some delay. So, I think CloudSim, if I am not mistaken, helps you see what will happen if you do something in such a setting. And it has no relation to Hadoop, but I may be wrong, because I do not know exactly what CloudSim is. So, if you have a follow-up question, please ask; otherwise let us leave it at that. We have Sasuri College, Tamil Nadu, please go ahead. My question is, can you give us a simple explanation of the difference between DBMS, RDBMS, ORDBMS and OODBMS? The difference between relational, object-oriented and object-relational.
Now, this sounds like exactly the kind of question you should not be asked in an exam, but I see that unfortunately many exams ask people to define concepts like this. So, I will refuse to answer such a question, and I will strongly urge you not to ask such questions anywhere. I know you are asking the question because such questions come in exams, but we need to reform our exam system. I am going completely off topic, but you have triggered something which I have been meaning to say during this workshop and did not get the chance. The point is that we want to ask questions which test some deeper understanding. Now, how does it matter how an ORDBMS is defined, or how an OODBMS is defined? Can you define Hinduism? Can you define Islam? Can you define a car? Does it matter? Do we care about it? There is a war going on between cars versus quadricycles versus auto rickshaws, if you are familiar with that debate, about defining these things. It may matter to some people, but for you, you get in, sit, and it takes you somewhere. That is what matters, and the same kind of attitude is what we should have towards databases: focus on things which really matter, and sometimes on things which are intellectually stimulating and lead to some very interesting ideas. So, maybe normalization is somewhat less important today because ER modeling takes care of most of the issues, but it is still a very nice topic intellectually, and it is good to expose students to some such deeper mathematical topics in any area. So, you want to look at either mathematical elegance or practical importance, and then ask questions related to that to test the understanding of the student in that area. So, if you ask a student to write an SQL query, or to find a bug in an SQL query, or to fill in the blanks to complete an SQL query, these are all things which require understanding. Defining an ORDBMS or an OODBMS, you can look up the definition on the web if you care, and it is described in the textbook. So, go look it up, but please do not ask such questions in exams. Like I said, for the same reason I will not answer it; I will say go look it up on the web or in textbooks. If you have any other question, please go ahead. Sir, a small doubt about storage: how do you implement the hash table, the hash mapping of records to storage? How to implement a hash table? So, there is plenty of code available in textbooks on how to implement a hash table; any algorithms textbook will talk about it. Now, the difference in a database is that if you are building a persistent hash table in a database, the contents have to reside on disk and should survive system shutdown and restart. That is the main difference from an in-memory hash table. Now, if you look at practical implementations, yes, Oracle did implement hash-organized tables with a hash file organization, and essentially they created a file with a number of blocks. A hash function would identify a block in the file, and the record would be stored there. And what if the block is full? That block would have a pointer to an overflow block, and you would go there. So, this kind of thing was implemented. There are some issues with this. What if the table grows? You need a hash function which hashes to more blocks. So, again, people came up with other implementations of these.
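As a rough sketch of what such a hash file organization looks like (an in-memory stand-in for disk blocks; the block capacity and hash function here are arbitrary choices for illustration):

```python
BLOCK_CAPACITY = 4   # records per block; a real system would use disk pages

class HashFile:
    """Sketch of a hash file organization: a fixed number of primary
    buckets, each a chain of blocks that grows with overflow blocks."""
    def __init__(self, num_buckets):
        self.buckets = [[[]] for _ in range(num_buckets)]   # one primary block per bucket

    def _bucket(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, record):
        chain = self.buckets[self._bucket(key)]
        if len(chain[-1]) >= BLOCK_CAPACITY:
            chain.append([])                 # primary block full: add an overflow block
        chain[-1].append((key, record))

    def lookup(self, key):
        for block in self.buckets[self._bucket(key)]:   # follow the overflow chain
            for k, rec in block:
                if k == key:
                    return rec
        return None
```

Growing the number of buckets as the table grows is where the more sophisticated schemes differ.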
But in the end, it turned out that they did not offer any major advantages practically over B+-trees. So, today, practically speaking, Oracle, I do not even know if they still support it; they essentially say do not use hash file organization. PostgreSQL also did more or less the same thing. So, all of them have just standardized on B+-trees, because there was no major benefit to hash table organizations for storing relations. But in-memory hashing is used widely in all the query evaluation systems; they all have it, because in-memory hashing is much faster than an in-memory search tree. So, that is used. Did that answer your question, or was it something else? Back to you. Thank you, sir. Who is this? Raisoni Institute, Pune. So, my question is, what is optimistic concurrency control? Let me answer that. The question was, what is optimistic concurrency control? I have a few slides on that which I had planned to finish today, but because yesterday's topic rolled over to today, a bit of today's has rolled over into tomorrow. So, I will cover optimistic concurrency control very briefly tomorrow. However, even tomorrow there are other topics, so I would not be able to get into it in detail. If you are really interested in this, it is available in the book. But to summarize it, since you asked the question: optimistic concurrency control essentially lets a transaction proceed without locking, but keeps track of what it read and what it wrote. Based on these read and write sets, when the transaction is ready to commit, it makes a check to see whether any concurrent transaction has a conflict with this transaction, based on the read and write sets. If there is no conflict with any other concurrent transaction which committed earlier, this transaction is allowed to commit. That, in summary, is optimistic concurrency control. It is called optimistic because you allow the transaction to run assuming there will be no problem, until it is ready to commit. At that point, you compare its read and write sets with the read and write sets of all transactions which were concurrent, meaning they were running while this one was running but committed before this one. That is the intuition. The details we will look at very briefly tomorrow, but if you are interested, you can go ahead and read it up; it is there in the book chapter. Any other questions? Thank you, sir. Yes sir, we have Thanthai Periyar, Vellore, please go ahead. What happens if we give the date function a month value of more than 12? Is it valid, or is there a constraint? Yes. So, a month of more than 12 would clearly be invalid. The way you do it is you say date and then a string, so it should give a syntax error; if it does not, that is a bug in the implementation. There is another issue of seconds beyond the normally allowed range of 0 to 59. Occasionally, you have a leap second, so a second with a value of 60 might be required, and that may be allowed, but a month beyond 12 should not be allowed. Any other question from you? Sir, what software tools are available to maintain RAID? So, the question is, how do you set up RAID with software? Now, Linux comes with RAID tools built in. So, when you install Linux, you have to do a little bit of work to tell it that this set of disks should be kept in a RAID configuration. So, if you are familiar with Linux installation, there are the initial setup steps where you do hard disk partitioning and so on.
At that point, a modern Linux installer will let you define a RAID right there, and the software is built into Linux; you do not have to install any new software. For Windows, I suspect the same thing holds, but I have not set up RAID with Windows. This is software RAID. Again, software RAID has some drawbacks. You should not be using it for bank accounts and such like. It is perfectly fine for your desktop computer, maybe even for your department computer, but it is not good enough for a bank, where losing the last transaction means a lot. For you, if you lost the last file which you saved, big deal. But for a bank, if you lost the last transaction, then there is a problem. Back to you if you have any other question. Sir, what is the command to know what servers or applications are running in Ubuntu? You want to know what all servers and applications are running in Ubuntu. So, in Linux, not just Ubuntu, any version of Linux, there is a ps command. The ps command normally tells you what processes you are running. So, if you say ps space minus ax, it will tell you all the processes running in the system, whether they are services or user processes or whatever. But let us stick to database questions here. Sir, we will note down the command. Yes, it is not a database question and I really do not want to spend too much time, but this particular one might be useful to you. So, it is ps space minus ax; ps shows what processes are running. Similarly, ls will show you what files are in a directory. If you want to know their sizes and so on, you can say ls minus al. So, these two might be useful for you in your Hadoop assignment, because the way we have asked you to run Hadoop requires a little bit of command line; you have to open a shell and run a script. So, I think it is worth showing these two commands to you. Hello, what is the abbreviation of PNUTS? You mean the expansion of PNUTS. That is a good question. In fact, PNUTS was built by my PhD advisor, Raghu Ramakrishnan. And when I visited him, I asked him, how did you come up with this term, PNUTS? His claim is that there was a bottle of peanuts on his table and he had to come up with a name, so he said PNUTS. I do not think he was serious. But there is no official expansion for PNUTS; let us assume it is just peanuts. Any other interesting questions? Sir, is there any automatic software available for doing scheduling of transactions? So, you want software for scheduling transactions. First of all, once you submit your query to the database, the database is handling all the scheduling decisions. You cannot control it externally; it is all internal to the database. Now, if you want to control when the different transactions submit things to the database, that kind of scheduling I am not familiar with. I know there are software packages available for deciding on scheduling for other kinds of things: if you have construction or some other such activity, you create steps and dependencies between steps, and then you can get charts for scheduling them. Normally, in databases, this kind of issue does not come up, so I do not know of specific scheduling software. But for specific applications, like real-time databases, people have shown that certain ways of doing scheduling might be better than certain other ways, so people have built schedulers for real-time systems.
But again, you need to get into the database to build it into the database; you cannot do it externally. If you have access to PostgreSQL, you might be able to hack the PostgreSQL code to build it, and maybe people have done it, but I am not familiar with what tools are actually available. There is a lot of research on this; practical tools, I am not sure. Okay, last question. Sir, can we insert tuples with spatial information into a relation? Insert a tuple with spatial information into a relation: I think I mentioned this earlier. There are types for spatial data which are part of PostgreSQL extensions, and Oracle has them too. You can certainly insert tuples; there is also syntax for this. You can go read it up: Oracle spatial extensions, or PostGIS, which is Postgres with some extra data types for spatial data. I think we should perhaps stop here.