Thank you, Anjala. Hi everyone, and welcome to my presentation, which I have titled "Lossless Upgrade of BOSH Deployments". I also call it stateful upgrade of BOSH deployments.

Here is the problem we are going to talk about today. I work in the backing services team, so we deal with different data services. These services are, by nature, expected to be stateful, and they are wrapped up in BOSH deployments. A few examples from our product offering are PostgreSQL, Redis, MongoDB, and RabbitMQ.

Apart from the data we offer to customers, the other state of the deployment is also important. When a BOSH deployment gets updated or upgraded, that state needs to be preserved. By state I do not mean the customer's data; I mean the transient state: memory pages and open files, where open files include ordinary pipes, Unix domain sockets, TCP connections, and so on.

We categorize upgrade or migration into two types. In your production system you may want to upgrade the stemcell of your deployment, or you may want to move your deployment from one BOSH environment to another. In both cases a new VM is created. And if you are upgrading the service itself, say PostgreSQL 9.4 to 10, your process gets restarted during the upgrade, and that is exactly where the problem starts.

What does the app, the consumer of the data service, expect? It expects high availability. That can be taken care of by two constantly running processes, which is fine. But when the second process comes up, or when failover happens, how do we take care of the state that was already there, already connected with the app? What about the TCP connections? What about the memory state? Our problem is exactly there.
Let us take one real-life example. Say you are running a ticket-booking service, and users book tickets from an online app. Behind the scenes, the app consumes a backend database service; let us take PostgreSQL as the example. When you query constantly, after a certain point the data is served from the cache rather than from disk, in order to get better performance. But if the process restarts after an upgrade, all of that cache is lost. So how do we save this state, and how do we give it back when the service comes back, so that the app does not face any performance degradation afterwards? From the app's perspective, everything should run smoothly while you are upgrading the system in the backend.

As I was saying, some backing services are primarily memory-intensive. Take RabbitMQ, which holds transient messages and queues in memory. Or PostgreSQL: you might think it is not very memory-intensive, but behind the scenes it depends heavily on main memory through its shared buffer cache; most frequent queries are served from the shared buffers. The same applies to Redis and other stateful services.

So how do we provide a solution? We need a stateful migration of the BOSH deployment, that is, of the backing services. The tool we are using for this prototype is CRIU. CRIU is responsible for checkpointing and restoring the process state, the memory state, of the service, so that the app experiences a seamless migration across the downtime or maintenance activity.

This slide is, I think, not new to you; it is very popular. It shows the life cycle of a Cloud Foundry service instance. When a managed service instance offers a service, the life cycle comprises four major parts.
Provision, bind, unbind, and deprovision. What is the most important part of it? After bind, the app starts consuming the service; that is the whole point of offering a service. So again, the point is that the app should experience a seamless migration when you are updating your deployment.

Now I will talk about CRIU. CRIU stands for Checkpoint/Restore In Userspace. It runs in user space; no kernel module is involved. It depends heavily on the ptrace system call. It is a Linux-based tool, and ptrace is a system call which lets you hook into another process's context, its address space.

At a high level, it works like this. A process is running; CRIU dumps all the state of that process and stores it in a binary image format. When the process comes back after the upgrade, CRIU restores the state from those images.

In more detail: in our namespace there is a process tree, a root process and all its descendants. CRIU recursively takes the state of all the kernel-level objects, namely sockets (which may be Unix domain sockets or TCP connections), files, and pipes, and stores it as image files in binary format.

Let us see in detail how checkpoint and restore are done by CRIU. At checkpoint time, it first collects the process tree, the root process and all child processes recursively, and freezes it. Then it collects the task resources associated with those processes. The task resources comprise the file descriptors, where it depends on the Linux /proc interface (the fd and fdinfo entries), the memory maps, and the CPU registers. Here I should say a bit more about ptrace, which I mentioned earlier. ptrace is a system call.
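As a quick aside, the /proc entries I just mentioned can be inspected by hand on any Linux box. A small illustration using the current shell's own pid:

```shell
# The same per-process state CRIU collects at checkpoint is visible under /proc.
ls /proc/$$/fd            # open file descriptors: files, pipes, sockets
cat /proc/$$/fdinfo/1     # per-fd state (offset, flags) that goes into the dump
head -n 5 /proc/$$/maps   # memory mappings whose pages CRIU writes out
```

CRIU reads exactly these files, plus the registers and memory it collects via ptrace, when it builds the image files.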
ptrace allows CRIU to inject a small piece of parasite code into the context of another process, so that this code can collect all the process state and memory from the traced (tracee) process. CRIU injects the parasite code, which makes a Unix domain socket connection back to CRIU; the code collects all the memory state from the tracee and sends it back to CRIU, and CRIU saves it in the image files. As the last step, it also cleans up: using the same ptrace mechanism, after collecting all the state, it removes the parasite code that was taking the backup.

What happens at restore? CRIU comes up as a process, then forks the root process and all subsequent child processes, the entire process tree, and attaches it to itself; CRIU is now the parent of the whole process tree. It then restores all the task resources associated with the process tree: all the memory pages, the TCP connections, everything. Finally, CRIU detaches the root process, the entire process tree, from itself and re-parents it to the init process, so that it can run normally.

To summarize how CRIU works under the hood: as I mentioned, CRIU uses the ptrace system call to inject the parasite code into the tracee process, takes out all the memory state, and saves it in image files. It depends on the /proc interface to account for all the file descriptors and the inodes associated with the specific files. It also relies on the kcmp system call, which is used to work out the shared state between processes doing inter-process communication, basically pipes.

Up to here, all the memory state is preserved. Now let us talk a bit about TCP connections. CRIU takes care of TCP connections as well. How does it do that? It takes all the open TCP connections which are in the established state.
It puts them into TCP repair mode. Then it captures all the data from the socket buffers, the tx and rx queues. It also records the packet sequence numbers pertaining to each TCP connection, and it saves some TCP handler data as well. At restore time, it restores the state from the image files: the socket buffers for the TCP connections, and at the end all the packet sequence numbers. That is how the restore works.

Now let us see how stateful migration happens in a BOSH deployment. I will show, for RabbitMQ, how the state migrates smoothly from a source VM to a destination VM. On the source VM, a RabbitMQ process is running, CRIU is running, and the migration tooling is running, and we take the help of BOSH errands. This is a prototype: the scripts that the BOSH errands run here could also be injected into the drain script, so that on `bosh stop` or any controlled update, before the process is killed, you capture all the state. We simulate exactly that behavior using BOSH errands. An errand is nothing but running a script, and that script can be injected into the drain script so that it collects all the state, and when the process comes back after starting, you can give the state back.

So the app is connected to the VM and the service is up and running. When the controlled update happens, when the backup happens, we dump all the process state, as I mentioned earlier, into distributed storage; it can be a persistent disk or block storage, any distributed persistent storage. As part of this we run a BOSH errand, say criu-dump. First, it stops the app's communication to the service, so that the app cannot write any more data from that point on.
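A heavily simplified sketch of what such a criu-dump script could look like, whether run as a BOSH errand or from the drain script. Everything here is illustrative, not the prototype's actual script: the port, image directory, and pid are assumptions, and by default the script only prints the privileged commands, since iptables and criu need root on a real VM.

```shell
#!/bin/bash
# Illustrative criu-dump errand/drain sketch (assumed names and paths).
# DRY_RUN=1 (the default) only prints the privileged commands.
set -euo pipefail

APP_PORT=5672                  # port the app connects on (assumed: AMQP)
IMG_DIR=/var/vcap/store/criu   # persistent store for the image files (assumed)
PID="${1:-<service-pid>}"      # root pid of the service's process tree

run() { echo "+ $*"; [ "${DRY_RUN:-1}" = 1 ] || "$@"; }

# 1. Stop app communication so no new writes arrive mid-checkpoint.
run iptables -I INPUT -p tcp --dport "$APP_PORT" -j DROP

# 2. Checkpoint the whole process tree, including established TCP connections.
run mkdir -p "$IMG_DIR"
run criu dump -t "$PID" --images-dir "$IMG_DIR" --tcp-established --shell-job

# On the new VM, the criu-restore errand is the mirror image:
run criu restore --images-dir "$IMG_DIR" --tcp-established --shell-job \
    --restore-detached
run iptables -D INPUT -p tcp --dport "$APP_PORT" -j DROP
```

The `--tcp-established` flag is what triggers the TCP repair handling described above, and `--restore-detached` lets the restored tree keep running after criu exits.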
After blocking the traffic, the errand does the checkpointing into the image files; I covered the details earlier, this is a slightly trimmed-down version. Then, on this source VM, say you are upgrading your BOSH deployment with a stemcell update: a new VM will of course be created, because the stemcell version is new. Or you are migrating the VM in production, maybe from one hypervisor to another. Either way, in a controlled manner, you will be creating a new VM. On the new VM, a whole fresh set of processes comes up: RabbitMQ, CRIU, and the BOSH errand scripts. But what about the old state the app is expecting? We restore the process state and all the TCP connections from the distributed storage, and we give the communication back. For that we run another BOSH errand, say criu-restore. It restores all the dumped state and also puts back the IP rules, so that the app gets connectivity to the new VM, to the new service.

Now it is demo time. What will I show? This demo only takes care of the memory state, not the TCP connections; that can be easily integrated, but as part of this demo we will see only the memory state being backed up and then restored. I am not sure whether it is visible from the back; maybe you can come to the front.

We have already created one RabbitMQ service instance. It comprises five VMs: two HAProxy and three RabbitMQ VMs. Inside each VM the RabbitMQ process is running, and CRIU is running as well. Here I am showing the RabbitMQ queues. An app is constantly pushing messages, but using one API call we will push 10,000 messages into the queue; after that we will stop the process, and after stopping the process those roughly 10,000 messages should come back.
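The demo steps that follow can be summarized with the BOSH v2 CLI. The deployment name `rabbitmq` and the errand names `criu-dump` and `criu-restore` are this prototype's own and should be read as assumptions; running them needs a live BOSH director:

```shell
bosh -d rabbitmq run-errand criu-dump     # checkpoint the state, cut app traffic
bosh -d rabbitmq stop --hard rabbitmq     # delete the service VMs (simulates a stemcell update)
bosh -d rabbitmq start rabbitmq           # recreate the VMs with fresh processes
bosh -d rabbitmq run-errand criu-restore  # restore the dumped state, reopen traffic
```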
Here it shows how many messages are queued, how many are ready to consume, and so on. Around 9,700 out of 10,000 are already ready to consume, and the consumer is gradually receiving them. Now we run the BOSH errand criu-dump to capture the state of all the messages; as part of that script we also freeze the IP rules, so that the app loses connectivity to the service. Meanwhile it is still consuming, one by one.

Now we do a hard stop of the VMs, so that they get recreated; basically we are simulating the stemcell-update behavior, where the VM is recreated. So, a hard stop of RabbitMQ. All three RabbitMQ VMs are deleted; only the two HAProxy VMs remain, where earlier you saw five VMs. Then we start the RabbitMQ VMs again, which recreates all the VMs and the processes inside them, and you can see there is no message in the queues.

Now we run the BOSH errand command for criu-restore, and all the messages will come back; here we also reopen the TCP connections from the app so that it regains connectivity. The messages are showing up now; you can see the total number of received messages increasing one by one, and that should go up to around 10,000, while at the same time the app keeps pushing more messages. Now 8,000; the queue keeps draining on the receiving end: 9,600, 9,800, and finally a little over 10,000, since the app is also constantly pushing some messages. So the 10,000 messages are back. We restarted the process, but all the state is back.

This was the RabbitMQ example, but it applies similarly to any memory-intensive service. For that matter I would call PostgreSQL a memory-intensive service too, because it serves queries, and frequent queries are cached and served from that cache.
Those cached pages are part of the transient state, not the persistent state, and it is very important for the app that this migration activity is seamless. Do you have any questions? Yes, sure.

On whether the image files should be considered plain text: yes, treat them as plain data; CRIU saves them as raw binary images without encrypting them.

On downtime: for the maintenance window the app will experience a little bit of downtime, which is also true for any upgrade; but the state comes back.

On how this works in the real world: say there are two or three nodes, and traffic is going to the master node. When the master stops, you move the traffic over; we capture the state of the stopping node, and immediately after capturing it we restore that state onto the new VM. We can also transfer the state to an already-running process. So in the backend it does not lose all those messages, if you take the example of the RabbitMQ process. And yes, exactly, it also depends on how you are handling high availability, how you are managing the failover.

On how it is triggered: this is a BOSH errand and, as I said, the same script can be injected into the drain script of that particular process.

On code upgrades: exactly as I was saying, it also depends on the system files. You are right that the system file or directory structure of, say, PostgreSQL 9.4 might not hold true for another version; but from the data perspective, if you take the example of the PostgreSQL shared buffers, that should be consistent across versions, and that is the state we are talking about preserving.

On OS dependence: that is purely OS-related; here we are taking the state of a particular process, specifically Linux processes in memory. Yes.
Exactly. This backup-and-restore and your failover process need to work together. It depends on how you are handling the failover: say your failover takes 10 or 15 seconds of downtime, then this state should be back immediately after those 15 seconds. The two need to work together, exactly.

Yes, exactly. CRIU is heavily used for container live migration, and a container spins up in a few seconds. But with BOSH, to achieve high availability you need at least two VMs, sometimes three, depending on your availability zones. So, as I said, you collect the state from one VM, and when you move the traffic to the other existing VMs, you should also transfer the state there. Any other questions? Yes, sure; what is your question exactly?

On incremental transfer: yes, that would be great as a further level of optimization. If you take the example of database services, all the data is asynchronously backed up or moved to the secondary VM anyway, and this process state could also be moved incrementally. But I would prefer to move it at the end, when you have all the state, because backing up at least the transient state does not take much time. Moving it incrementally would be challenging; I am not sure, because I do not know what the state will be after two seconds, so continuous snapshotting of the state I am not sure about. But at the end you can definitely do it: when you are moving the traffic to the other VM, you can definitely move the state over from the previous master, the previously serving process, as well. Sure.
Yes, exactly; that matters for how you deal with app connectivity, meaning the binding parameters. Say you are behind a virtual IP: you have to move the virtual IP over to the new VM. Or if it is a URL, you have to move the URL, so that the app gets its connectivity back. Sure.

On fine-tuning: can you elaborate? OK, so yes: I think it is independent of any particular service; it deals only with the Linux process. You can apply it to any process, any app, for that matter; it does not depend on the service.

On alternatives: yes, this is a prototype, and there are a few other options. CRIU is not the only tool; OpenVZ and BLCR are tools as well, and there are a few others you can check out. CRIU has many advantages over the other tools, but if we want to move this into production, then we really need to think about whether CRIU is the right option or whether we should use some other tool, from a fine-tuning perspective.

On kernel differences: if you move to a different kernel, then, as I showed with the /proc interface, those paths should work the same way: /proc, then the PID, then fd, and so on. That file structure should also be the same on the new system, otherwise it does not work. Yes, exactly.

Any other questions? OK, in that case, thanks everybody, thanks for your time. If you want to contact me, this is my email address. I am also part of Service Fabrik; you can check it out, it is a Cloud Foundry incubation project. You can try it out locally and contribute, create issues; we will be happy to help you. We also have a Slack channel for Service Fabrik. Thank you everybody, thanks for your time. Thanks.