This is the 38th lecture in the course Design and Engineering of Computer Systems. In this lecture we are going to study more about the topic of atomicity. So what is atomicity? We have discussed this a couple of times before in the course. Atomicity is a property where, if there is a sequence of operations, sometimes in computer systems you require that all of these operations do not appear as separate steps but appear as one atomic unit; only then will the system work correctly. That is, you either complete all of these operations or you do not do any operation at all. You cannot execute half of this atomic unit and leave the other half; no partial execution should happen. Such a set of operations is called a transaction, and either all the operations in a transaction must execute or none of them should execute. If you execute a transaction partially, then you will have all sorts of errors and bugs in your system; your system will not be correct. So that is atomicity: the ability to execute all the operations within a transaction, or, if there is a failure, to not execute anything at all. That is called all-or-nothing atomicity. So what are some examples? We have seen this before when we were studying file systems: if you are implementing a file system, a system call may make changes to many disk blocks, and all of these should happen together; you cannot do half of them and leave the other half if some failure happens in the middle. For example, suppose you are writing some data to a file. There is a new block with the file data; there is your inode, in which you are storing a pointer to this new block (you are storing the block number in your inode); and there is a bitmap where you are marking this block as occupied. All of these changes should happen together.
If you do some and not the others, you will run into issues. For example, if you add this block number to the inode but do not fill in the data in the block, then what happens? When you read this file, you will read this block and find garbage in it. If you allocate this block to this file but do not mark the bitmap, then the bitmap will think the block is free, it will be allocated to some other file, and your data will get overwritten; all sorts of bad things can happen. So if a failure occurs, skip all these changes, do not do any of them at all; but if you do them, do all of them together. That is atomicity. There are other examples also that we have seen. If you are doing a checkout from an e-commerce website, the billing and shipping should happen atomically; you cannot say, I have charged your credit card but there was a failure and I did not ship the product to you. That is not acceptable. Similarly, a common example is transferring money from one account to another: you deduct the money from one account and add it to the other; you cannot deduct the money here and forget to add it there. That is not acceptable.
So why would atomicity be violated? Of course, people are not out there to violate atomicity on purpose. It happens due to bugs or, most commonly, due to failures: you are in the middle of a sequence of steps and a failure happens, so some steps were executed and some could not be, and maybe some of the changes you made were lost due to memory contents being flushed out, things like that. Failures are when atomicity is usually violated in computer systems. We cannot avoid failures, but when a failure or fault happens, what we want is that either the operation completes or nothing is done at all, and that is also acceptable: you can say a failure happened, this account transfer did not take place, this checkout did not take place, and that is okay. But you cannot say a failure happened, I did half of it and not the other half. So in this lecture we will see the techniques to ensure atomicity of transactions; this is also a part of fault tolerance. Now, we have studied the principle of modularity: applications are built as modular components, and with modularity, atomicity is very important. If you do not have atomicity, then it is very hard to build modular applications. Why is that? Suppose you have a module A that is sending a request to module B. Module B has some API, a set of requests that it exposes; you can ask B to do various things, and A is making a request to B asking it to perform a particular operation.
Now, suppose B has to do multiple steps to handle this request. If B completes all of these steps, then B sends back a positive reply to A saying your operation is done. If B could not complete any of the steps, then B sends a failure, or no response is returned to A; that is also okay, because A will retry the entire request. So if B does all the steps, we are good; if B does none of the steps, we are good. But what if B performs only some of the operations: it did these two steps but not the other two? In such cases, what response should B return to A? It cannot return a positive response, it cannot return a negative response, and how will A retry? There is no API request that says "do half of the previous request"; that does not exist. Therefore, if you want correct, clean, modular applications, you need atomicity. If a component exposes an API and says I will do this job for you, then it had better do the whole job, or not do it at all, in which case the caller will retry. If you do nothing, I will retry; if you do it all, well and good; but if you do half of it, then the other component is left in the lurch and does not know how to retry. So when you design APIs of modules, always keep this in mind: define your APIs in such a way that you can ensure atomicity. Do not have an API where you are doing 20 different things, so that sometimes half of it gets done and half does not, and your return values are all messed up. When you design APIs, make each service fine-grained enough that you can execute it atomically.
There is also another definition of atomicity, which we used previously in the course, called before-or-after atomicity. That is, if you have a sequence of operations, these operations should appear like one unit: they should occur either completely before or completely after other operations, with no overlap between them. So sometimes you want transactions to be fully before or fully after other transactions, and you do not want any overlapping scenario. We have seen an example of this when threads access shared data. For example, if they access a shared counter, there are three steps: you load the counter into a register, you increment the register value, and you store it back into memory. We have seen that if these sequences of operations overlap in a certain weird way, you will have race conditions and your counter is not updated correctly; we saw all of this when we were discussing threads. In such cases, what we said was that we need atomicity: either one thread fully finishes and then the other thread starts, or the other way around, but you cannot have this overlap. That is also considered atomicity, but to differentiate between the two types: the first is all-or-nothing atomicity, and this is before-or-after atomicity, where a transaction appears like one unit that lies either fully before another transaction or fully after it. And we have studied before how you do this: if you remember, you use locks. Each thread acquires a lock, does its set of operations, and releases the lock; this kind of overlap cannot happen, because there is a lock statement here and an unlock statement there, so another thread cannot start executing in the middle. So locking, and of course deadlocks, and how you take care when you have multiple locks: all of this we have studied before.
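The shared-counter example above can be sketched in Python. This is just an illustration of the idea (the names `counter`, `increment`, and the thread counts are my own choices, not from the lecture): holding the lock makes each load-increment-store sequence occur fully before or fully after every other thread's sequence.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    """Increment the shared counter n times, one atomic unit at a time."""
    global counter
    for _ in range(n):
        with lock:        # acquire: no other thread can interleave here
            counter += 1  # load, add 1, store: now behaves as one unit

# Four threads, each incrementing 100000 times.
threads = [threading.Thread(target=increment, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 on every run, thanks to the lock
```

Without the `with lock:` line, the three-step update sequences of different threads can overlap, and the final count can come out lower than 400000, exactly the race condition described above.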
So that is the other definition of atomicity, called before-or-after atomicity, but in this lecture we will mostly focus on all-or-nothing atomicity. Now, how do you achieve all-or-nothing atomicity? The most common idea in computer systems is what is called write-ahead logging, or simply logging: all the steps that you want to perform in your transaction, you first write them, you log them, onto some persistent storage, and only then do you execute them. This guarantees atomicity. How is that possible? Let us see in more detail. Whenever you are doing write-ahead logging, you have a transaction: a series of actions that you want to perform atomically. You do not just directly start executing these operations one by one; then you would not have atomicity, because what if you do half of it and then crash? Instead, you execute them in the following manner. You first start a new transaction and assign it a unique transaction ID, and then you identify some log. A log is nothing but some storage space in a persistent place, like a disk or a database, where it stays safe even if failures occur, such as power failures. For example, you cannot store your log in main memory, because that can get wiped out after a failure; so you store it in persistent storage. This write-ahead logging algorithm has multiple parts. The first part is what is called the pre-commit phase. In the pre-commit phase, you record all the operations that have to be performed as part of this transaction. You do not perform them yet; you do not touch the original copy of the data; you only note down: I will do this, I will do this, I will do this.
Then, after logging all of these operations, you commit: you write an entry into the log saying I am done, this is my complete list, this is the beginning of my set of operations and this is the end. That entry is the commit point. After this, you start installing the transaction: replaying these operations, executing them one by one using your log. Now you start modifying the original application data, and once all of this is done, your transaction is completed; all the steps have been performed, so you clean up your state and clear the log entry. That is the end of your transaction. So how does this help us recover from failures? Suppose you have a failure before the commit point, in the pre-commit phase. How will your log look? There is a begin entry and a bunch of operations, but that is it; there is no end, no commit, nothing. Say a power failure happened and you restart; if you look at your log on disk, you will find something like this. Then you know that the set of steps is not complete, so you do nothing for this transaction; you abort it. And since the original data has not been touched in the pre-commit phase, it is as if nothing happened; no change has been made, and you are good. What happens if there is a failure after the commit point? You have still not completed all of the operations, you have only done some of them, but the good thing now is that you have the full list of operations in your log. When you restart, you see this log entry, and you know that the transaction did not finish, because if it had finished you would have cleared the log entry. Therefore you know it failed somewhere in between.
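The pre-commit, commit, and install phases described above can be sketched as follows. This is a minimal toy illustration, not any particular system's implementation: the "persistent" log is simulated by an in-memory list, and names like `run_transaction`, `install`, and `recover` are hypothetical.

```python
data = {"A": 100, "B": 50}   # the original application data
log = []                     # stands in for a log on persistent storage

def run_transaction(txid, ops):
    # Pre-commit phase: note down every operation; original data untouched.
    log.append(("BEGIN", txid))
    for key, new_value in ops:
        log.append(("SET", txid, key, new_value))
    log.append(("COMMIT", txid))   # the commit point
    install(txid)                  # post-commit: now modify the real data

def install(txid):
    # Replay the logged operations, then clear the transaction's entries.
    for entry in list(log):
        if entry[0] == "SET" and entry[1] == txid:
            data[entry[2]] = entry[3]   # overwriting a value: safe to replay
    log[:] = [e for e in log if e[1] != txid]

def recover():
    # On restart: roll forward committed transactions, abort the rest.
    committed = {e[1] for e in log if e[0] == "COMMIT"}
    for txid in committed:
        install(txid)
    log.clear()   # uncommitted entries never touched the data: drop them

run_transaction("t1", [("A", 70), ("B", 80)])   # move 30 from A to B atomically
print(data)   # {'A': 70, 'B': 80}
```

A crash before the `COMMIT` entry leaves only `BEGIN` and `SET` records, which `recover` discards without touching `data`; a crash after it leaves the full list, which `recover` replays to the end: all or nothing either way.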
So what do you do? You start replaying the log, executing all of these operations one by one, until you get to the end of the list, and then you clear the log entry. Therefore, if a failure happens after the commit point, all the operations in your transaction get completed; if it happens before the commit point, nothing is done. Either way, no matter where the failure occurs, you have all-or-nothing semantics in your application. This is the key idea of write-ahead logging. Now, there are a couple of ways in which you can do this write-ahead logging. What we have described so far is what is called redo logging: you do not touch the original data initially; you only log all the changes, then you write the commit entry, and then you proceed to install the changes. The recovery here is called roll-forward recovery: if a failure happens in the middle of a transaction, then during recovery you install the transaction's changes into the original data. Before the commit point, no changes are made to the original data, and recovery is done by redoing the operations. An alternate way is what is called undo logging: in the pre-commit phase, you log the old value of the data and directly make changes to the original data. So the original copy is edited, but only after the old value has been preserved. Now you have all the old values stored in your log, and then you write a commit entry. Why is this done? Because if you have made only some of the changes and crashed in between, you have not finished your transaction; when you restart, you can see that some changes were made but not all, and there is no commit point, so the transaction is not complete.
Using these old values, you can undo the changes during recovery; that is called roll-back recovery. If your transaction has 5 steps and you only did 3 of them, you roll back: you undo those first 3 steps as well, so that you have your all-or-nothing semantics. And how are you able to undo? Because you have the old values stored in your log; you are not just directly editing, you are first saving the old value and then editing the original data. So there is a slight difference in semantics between redo logging and undo logging, but both achieve the same purpose in the end, and you can choose the technique depending on your application. For example, undoing operations may not be easy in some applications: you cannot add money into a bank account and later say, oh sorry, I am undoing that; that does not look good. So depending on your application, you can do either redo-based or undo-based write-ahead logging. Now let us see some examples. First, the crash-consistent file system, which we saw before when we studied file systems. If you are using a file system in a situation where you need highly consistent data on disk, then you have to use some kind of logging, so that even if crashes happen in the middle of a system call, your file system remains in a consistent state. It will never be the case that you have added a pointer to a data block in an inode but did not write data into the data block, leaving garbage in your file; such scenarios will never occur if your file system is crash consistent.
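Undo logging can be sketched in the same toy style. Again, the names (`begin`, `write`, `commit`, `recover`) and the in-memory "persistent" log are my own illustrative stand-ins: the old value is saved to the log before the original data is edited in place, and recovery rolls back anything that never committed.

```python
data = {"balance": 100}   # the original application data
undo_log = []             # stands in for a log on persistent storage

def begin(txid):
    undo_log.append(("BEGIN", txid))

def write(txid, key, new_value):
    # Save the old value first, THEN edit the original data in place.
    undo_log.append(("OLD", txid, key, data[key]))
    data[key] = new_value

def commit(txid):
    undo_log.append(("COMMIT", txid))
    # Once committed, the old values are no longer needed.
    undo_log[:] = [e for e in undo_log if e[1] != txid]

def recover():
    # Roll back: undo log entries in reverse order restore the old values
    # of every transaction that has no COMMIT record.
    for entry in reversed(undo_log):
        if entry[0] == "OLD":
            data[entry[2]] = entry[3]
    undo_log.clear()

begin("t2")
write("t2", "balance", 40)   # original data changed, old value preserved
# ... crash here, before commit("t2") ...
recover()
print(data)   # {'balance': 100}: the partial change was rolled back
```

Note the contrast with redo logging: here the data is modified before the commit point, so recovery must undo, whereas with redo logging the data is untouched until after commit, so recovery must redo.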
So how do you do that? Whenever your file system updates multiple disk blocks, in the pre-commit phase you take all the disk blocks that have to be updated as part of the system call, the inode, the data block, the bitmap, make the changes on separate copies, and write all of these changed disk blocks to your log. You do not change the original disk blocks; you do not change the original inode. You make a copy of the inode and edit it in the log, edit the data block there, edit the bitmap there; all of these changes are made in the log. Then you write a commit entry, and now you start installing: you copy each changed disk block onto its original location, merging the copy with the original, and so on. If a failure happens in the pre-commit phase, you have not touched the original values, so you are good; if a failure happens in the post-commit phase, you already have all the changes ready with you, and you simply replay them from the log. In this way your system call gets completed. So write-ahead logging is used to ensure that your file system is consistent in spite of a crash. Now let us look at a more complicated example. Going back to our standard example of an e-commerce system, suppose you have a server that is managing all the checkout requests: a user has added some items to a shopping cart, and when the user says checkout, you have to bill the user's credit card for all of these purchases and then ship the order. So consider such a checkout server, with an API where you give it a shopping cart and it checks out that order: it does the billing and the shipping for you.
So you have your front end or client, whoever is making a request to this checkout server; the checkout server does the billing, maybe by contacting the bank or some other server, then contacts somebody else to do the shipping, finishes all of these operations, and sends a confirmation back to the user. That is your checkout server. Now this checkout server has to do two things atomically: the billing and the shipping. We have seen that it cannot be the case that it billed your credit card but then a failure happened and it forgot to do the shipping; that must not happen. So how do you do this checkout atomically? Again, you can use the same idea of write-ahead logging. Instead of directly executing the operations, the billing and the shipping, the checkout server first creates an order record where it notes down all the details of the order: these are the items, this is the billing information, this is the shipping information. All of these order details are created and stored in some database, in some persistent storage; it is all logged. At this point the transaction is committed, and a confirmation can be sent to the user. Now, with all the information there in the log, you start executing the operations one by one: billing, shipping, everything.
So even if you fail somewhere in the middle of these steps, when you restart you can resume, because all the information is there with you. If the server fails before the commit point, the order is not even recorded in the system, no confirmation is sent to the user, everything is lost, and the user will retry; that is okay, the user will retry with a new order, and the second time hopefully it will succeed. And if the server fails after the commit point, say after the billing is done, then all the information is there at the server. Whether a new passive replica became active or the same server restarted, whoever is continuing this operation has all the information in the log, can resume execution, and the user will not notice a failure. In this way you can use write-ahead logging: first note down everything you have to do, and then do it, so that if you stop somewhere in between, you still have a record and can finish properly. But there is a small complication here; if you think through this carefully, you will notice it.
Consider this scenario. The server recorded all the details of the order and started executing it: it has a sequence of steps, it first has to do the billing, so it contacted the bank and charged the credit card. At this point it was just about to do the shipping, but it had not done the shipping yet, and the server crashed. Then a new server replica came up, or maybe the old server restarted, and it looked at its log: oh, this order is in my log. It starts replaying the order again, it once again says billing, and it once again bills your credit card; it talks to the bank and charges you again. Is this acceptable? Of course not. But you might be wondering what went wrong: we did the same thing with the file system and it worked, so why is it failing now? Here is the complication. There is a property of an operation being idempotent. What is an idempotent operation? One where, even if you repeat the operation multiple times, no harm is done: whether you execute the operation once or 10 times, the effect is the same. For example, writing the value 1 into a variable a is an idempotent operation: if I do it once, a has the value 1; if I do it 10 times, a still has the value 1. On the other hand, incrementing a is not idempotent: if I do it once, the value goes up by 1; if I do it 10 times, the value goes up by 10. Replacing an old disk block with the new modified disk block, as we saw in the file system, where you log all the changes and install them one by one into the original disk blocks, is an idempotent operation, because you are taking the entire new disk block and overwriting the old one. But charging a user's credit card is not idempotent: charge 10 times, and 10 times the amount is charged. So some APIs in your system are not naturally idempotent, and for such APIs, this mechanism of replaying the operations after a restart, this redo logging, will not work very well. So how do you fix this problem? The best way is for your module's implementation to somehow ensure that any API it exposes is idempotent. For example, if your banking server exposes an API to charge a credit card, then the banking server should ideally implement this API in an idempotent manner. What it can do is maintain a database of all the transaction IDs that have been billed recently. You created an order, you told the banking server, hey, bill this order, charge this credit card, and now after 10 seconds you come back and say bill the same order. If the banking server keeps a list of all the transactions, it will realize: this particular order ID of this particular user has been billed just now, therefore I will not repeat it; and it will simply send you a confirmation saying done. In this way the banking server's implementation can make the credit card payment an idempotent operation. In that case, coming back to our scenario: the server did the billing, crashed in the middle, and when a new replica comes up it once again requests billing, but the bank does not charge twice, because the bank's API is idempotent. If the previous billing did not happen, the bank charges; if the previous billing happened, it simply sends back a confirmation saying your order has been billed, without charging again. So if you are a module and you implement all your requests in an idempotent manner, then it is very easy to implement operations atomically, very easy to implement atomicity of transactions. Especially in a multi-tier application, where you have many different modules, a client or front end talking to a checkout server, which talks to a banking server for payment and a shipping module for shipping, a complicated multi-tier system, at every level you should ensure idempotent operations. How do you do this? For example, your client has made a request to check out a shopping cart. The checkout server finished all the operations and sent a confirmation back, but somehow this confirmation was lost. Now the client comes back again and says, check out this order. This checkout API should be idempotent: the server should remember that this particular shopping cart has just been checked out, so it will not execute the order again; it will simply send back a confirmation saying done. But of course, if the previous request failed for whatever reason, and that is why the user is retrying, then the checkout server will execute the request correctly. Similarly at the bank: if the checkout server asks the bank to charge a credit card and then makes the same request again, then if the previous request succeeded, the bank simply sends a response back without re-executing it, and if the previous request failed, the bank executes it. So every API in the system should be idempotent; that makes it easy to design your system. And how do you do this? One way is to give each request a unique identifier: when the user checks out, that order, that transaction, gets a unique identifier, and everywhere in the system you track the request using this identifier. When a request comes in, the checkout server looks: oh, this identifier, I have already processed it, I will not redo it; or, I have not processed it, I will execute it. This concept is similar to TCP sequence numbers: in TCP, if there is a packet loss and the same packet is retransmitted, and somehow the receiver gets two copies of the data, it is not as if some line in your text file or email gets repeated twice. That never happens, because the receiver uses the sequence numbers: this is sequence number 500, another packet also starts at sequence number 500, so it filters out the duplicates and uses only one copy in the final data it gives to the application. This concept of always filtering out duplicates on retries, of idempotent APIs, is very important if you are building a large system, and it really simplifies the design and implementation of your system. This idea is used in many places. For example, some RPC frameworks guarantee exactly-once RPC execution: no matter what failures happen, the RPC is executed exactly once. How is this guaranteed? Again, by tracking the RPC request: if due to some network failure the request is repeated, the server remembers, this RPC request came just now, I have already done it; so it does not redo it, it simply sends a confirmation back saying, I have already done it, here is the final answer. In this way, in several places in a complex system, idempotent APIs are very useful, so that you can do write-ahead logging and replay your log but still get correct results in the end.
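The idempotency idea, deduplicating requests by a unique transaction ID, can be sketched like this. The function `charge_card` and the in-memory dictionary standing in for the bank's database are hypothetical; a real banking server would persist this record durably.

```python
charged = {}   # stands in for the bank's database of billed transaction IDs

def charge_card(txid, amount):
    """Idempotent charge: a retried request confirms but does not re-bill."""
    if txid in charged:
        return "already charged"   # duplicate or retry: just confirm
    charged[txid] = amount         # the actual billing would happen here
    return "charged"

print(charge_card("order-42", 500))   # charged
print(charge_card("order-42", 500))   # already charged (a retry after a crash)
```

With this in place, replaying a write-ahead log after a restart is safe: re-sending the billing request for an order that was already billed has no additional effect, which is exactly what the crashed-checkout-server scenario above requires.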
So that is all we have for this lecture. We discussed what atomicity is, both all-or-nothing atomicity and before-or-after atomicity; then we saw how to achieve all-or-nothing atomicity using write-ahead logging; and we went into a little more detail and saw that in real-life, large multi-tier systems, if you are doing write-ahead logging and replaying your operations, it is very important that all your APIs are idempotent as far as possible, so that even when you replay a request, it is executed exactly once within the module. I would like you to think a little more about these concepts of atomicity and idempotent operations: in real life and in other systems, which operations are idempotent, how are the idempotent semantics guaranteed, and what mechanisms are used to make things atomic and idempotent? Think about these things not just in computer systems but in other real-life interactions as well. That is all I have for this lecture. Thank you very much, and let us continue our discussion in the next lecture. Thanks.