Hello, everyone. Thanks for joining the session. Today my colleague Ketki and I, Shashank, will present how to deploy a stateful broker that can run across multiple availability zones, has near-zero downtime maintenance (NZDM) qualities, and works on multiple clouds.

So I made this statement: stateless design is the crystal meth of brokers. Why is that? We already live in a world where everybody talks about stateless design. But do you really think everything can be stateless? I don't think so. In the words of the famous distributed-computing guru Jonas Bonér, the creator of Akka: you can never remove state, you can only shift it from one component to another. You are just making it somebody else's problem. So the question here is: can a Cloud Foundry broker, or any OSB-API-compliant broker, be made totally stateless, or are there specific requirements where you need to maintain state? This talk centers on how to write a stateful broker, the challenges you face in providing NZDM qualities, how to deploy it across multiple availability zones, and how to get good fault tolerance out of it.

As I said, there will always be requirements where a request spans nodes, multiple requests need to be correlated, or some state has to be maintained — you can never claim to be totally stateless in every system design you do. For example, you may maintain a cache on one node: what happens when the next request goes to the other node? There is no cache there; you have to rebuild it. Or you may need to coordinate between two nodes: if you are deploying something concurrently, it should not happen that both nodes pick up the same ID — you want to coordinate on it, or take a lock on it. So the problem boils down to coordination between distributed nodes. And then, how do you maintain state to serve the polling API that Cloud Foundry provides, so that any node can answer the polling calls?

If you have a stateless broker, there are very well-defined principles: put a load balancer in front, increase the number of nodes, and scale out horizontally. That is easy to do. But it's not true when state is maintained within the broker. As I said, you have to coordinate between nodes. Take a classic example of what we do with Service Fabrik today (I'll come to what Service Fabrik is): doing a BOSH deployment, where certain state must be maintained, because we don't want two nodes to take the same deployment ID and try to deploy the same thing twice. So there are problems of lock contention, of how locks are maintained and coordinated, and of how nodes talk to each other in a way that is seamless and also scalable. So far we have only said that we solve HA and, let's say, have a coordination mechanism.
But what happens from the consumer's perspective? You register the broker with the Cloud Foundry Cloud Controller, but you only provide a single IP to it. Now say you want to serve requests from a different node because the first node has gone down — maybe its whole availability zone has gone down. How do you switch the IP? You cannot keep going back to the Cloud Controller and re-registering with a different IP. So you want a stable endpoint exposed from your broker to the Cloud Controller. Then you need support for zero-downtime maintenance: with BOSH, recreating a virtual machine can take a few minutes, and that is not a downtime people will accept. So you want to build a system that gives you NZDM qualities, provides a stable IP, and is fault-tolerant across availability zones.

Now let's talk about an example of such a stateful broker. I hand over to Ketki to present Service Fabrik in the context of a stateful broker and how we have mitigated or addressed these concerns.

Thanks, Shashank, for setting up the context on what a stateful broker is and what its challenges are. Our example of a stateful broker — no surprises there — is the Service Fabrik we have been talking about. So what is the Service Fabrik architecture? This is a diagram of it. A user of Service Fabrik services can come via various platforms: it can be Cloud Foundry, it can be K8s. Using the CF CLI plugin, a UI, or any other CLI, the user asks the platform to create a service instance, which is provisioned by the Service Fabrik broker. If you notice, there is a "++" next to the broker, and there is a reason for that. The Service Fabrik broker is an OSB-API-compliant service broker, but Service Fabrik also maintains and operates thousands of instances in SAP production, so the broker needs some extra capabilities — backup and restore, scheduled updates, scheduled backups. These are extension APIs that the Service Fabrik broker supports, hence the "++". You can think of these extension APIs eventually being fulfilled by, say, the actions concept coming up in the OSB API spec.

When the Service Fabrik broker gets a request, it uses the Kubernetes API server as a resource manager, backed by etcd. The service-instance resource is created in the API server, and various operators listen to these resources. Take the example of the BOSH operator: whenever a service-instance creation request comes in for a BOSH-based plan, the operator is notified of the newly created resource, recognizes that it is the operator supposed to serve this request, takes a lock on the resource, and then processes it with the help of BOSH, creating a BOSH deployment. You can also see a K8s operator here, drawn with a dotted line. With this kind of event-driven architecture built on the API server and operators, we can bring in any kind of operator. Be it a K8s operator, or say you want to provision Alibaba Cloud's native ApsaraDB RDS: you can easily add another box here, an ApsaraDB operator, and it can serve that provisioning purpose.
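To make the operator side concrete, here is a minimal sketch in Go of how such an operator could watch for new service-instance resources in the API server, using client-go's dynamic informers. The group/version/resource names and the kubeconfig path are assumptions for illustration, not necessarily Service Fabrik's actual CRDs or operator code.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig; an operator deployed next to the API server could use
	// in-cluster config instead. The path is a placeholder.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Hypothetical custom resource representing a service instance.
	gvr := schema.GroupVersionResource{
		Group:    "deployment.example.org",
		Version:  "v1alpha1",
		Resource: "serviceinstances",
	}

	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 30*time.Second)
	informer := factory.ForResource(gvr).Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			// The broker created a new resource; a BOSH operator would now
			// try to take the lock and start the deployment.
			fmt.Println("new provision request observed:", obj)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	<-stop // run until killed
}
```

Note that every scaled-out operator instance runs this same watch and receives the same events, which is exactly why the locking discussed later is needed.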
I hope the overall architecture of Service Fabrik is clear; now, where does the challenge come in? Service Fabrik stores its state in the API server, and currently everything runs on a single VM. If we add one more VM, how does the consuming platform talk to the Service Fabrik broker? How do these nodes coordinate with each other? Then there are multiple operators: if we run multiple instances of those operators, how do they coordinate, and how do they manage the locking of resources? Various challenges come in, as you can make out from this slide.

So let's see how we solved this for Service Fabrik, so that the setup is multi-AZ, highly available, and supports near-zero downtime maintenance. I told you there is currently one VM in our setup — say in availability zone Z1, with some IP. Now we want to make it HA, so we add one more VM in a separate availability zone, Z2. There are two VMs now, fine, but how do they coordinate with each other? We have a keepalived process running on each of these VMs. keepalived is a mechanism that decides who is the master and who is the backup node via the VRRP protocol: the nodes send each other heartbeats, and if one node goes down, the other keepalived notices the missing heartbeat and knows it is now supposed to promote itself to master, which it does.

So what are we solving with this setup, just by adding one more VM and keepalived? We are making Service Fabrik multi-AZ and more fault-tolerant, and we are making the broker highly available. But there is one problem: if a consuming platform wants to talk to the Service Fabrik broker, which IP does it use? That is why we need a virtual IP here — in this example, 10.11.250.210. (I'm explaining the generic concept for now; we'll go into the IaaS-specific details later.) The virtual IP is a stable IP that always points to the master, so whenever a request comes to the virtual IP, it is redirected to and served by the master. Hence, whenever Service Fabrik registers with the Cloud Controller or any other platform, it registers this virtual IP, because it is stable, and the consuming platform doesn't need to worry about what is going on in the background — how a failover happens, who is the master, who is the backup, which IP to point to. It just connects to this IP, and the rest is taken care of by the setup itself.

This is the ideal situation we want to be in: the master is always alive and all requests are served peacefully. But we live in the real world, so it doesn't always happen — hence all of this setup. Let's see what can go wrong. Say the Service Fabrik broker process crashes on the master VM. keepalived notices that the process is not running, so it says: okay, I can no longer be master, I will become backup. When it turns to backup, the keepalived on the other node notices and promotes itself to master, and the virtual IP is shifted to this newly promoted master. The virtual IP now points to the newly elected master, and as you can see, the consumer doesn't have to bother that there was a failover at all.
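Everything described above — heartbeats, master election, demotion when the broker process dies, and the virtual IP following the master — is expressed in keepalived configuration. Here is a minimal sketch for one of the VMs, assuming an eth0 interface and a process-based health script; the peer VM would use state BACKUP and a lower priority, and only the virtual IP is taken from the slide:

```
# /etc/keepalived/keepalived.conf (sketch; values are illustrative)
vrrp_script check_broker {
    script "/usr/bin/pgrep -f service-fabrik-broker"  # healthy while the broker process runs
    interval 2
    fall 2
}

vrrp_instance broker_vip {
    state MASTER              # the peer VM starts as BACKUP
    interface eth0
    virtual_router_id 51
    priority 101              # peer uses a lower priority, e.g. 100
    advert_int 1              # heartbeat (VRRP advertisement) interval in seconds
    virtual_ipaddress {
        10.11.250.210         # the stable virtual IP registered with the Cloud Controller
    }
    track_script {
        check_broker          # demote to backup when the broker process crashes
    }
}
```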
So this is the typical failure situation that can happen in production or any environment, and with this kind of setup we achieve near-zero downtime maintenance and a stable IP. Why near-zero downtime? Because the process going down can be perfectly legitimate — say the VM is being updated, or the broker process itself is being updated. With only one VM, that would have meant a total downtime for the Service Fabrik broker; with this HA, multi-AZ setup, requests are served by the new master while the first node is updated, and later the roles can swap so the other node gets updated too. That is why the NZDM quality is also achieved here.

So far we have seen the generic concept. Service Fabrik runs on various IaaSes — it is currently running thousands of deployments on AWS, Azure, GCP, and OpenStack, and Alibaba Cloud is also on the cards. So how do we actually implement this architecture on the various IaaSes? There can be multiple approaches. Let's take GCP and Azure first. The diagram is similar to the one I just showed you: two VMs sitting in different AZs, keepalived on both nodes, one master and one backup. The box that has changed on this slide is the virtual IP: on GCP and Azure we use an internal load balancer to achieve the stable IP. The internal load balancer is attached to multiple VMs — in this particular architecture, broker VM 1 and VM 2 — and any request that comes into the load balancer is forwarded to a healthy node via TCP port forwarding.

Now, what counts as a healthy node? The internal load balancer has an HTTP health check: it sends an HTTP GET request to a health-check process running on each VM, and whoever answers HTTP 200 OK is a healthy node from the load balancer's point of view; requests are then only forwarded to healthy nodes. Here is how we utilize that for our HA concept. When the load balancer's health check probes the master node, it gets HTTP 200 OK; when it probes the backup node, it gets HTTP 500. So to the load balancer, the master node is always the healthy node and the backup node is always the unhealthy one. The backup is not actually in an unhealthy state, but since we want to route all the traffic to the master, we make the backup look unhealthy, so all the traffic coming from a platform gets routed to the master. In the failure case, VM 2 is promoted to master and VM 1 becomes the backup: when the health check asks the old master, it now answers HTTP 500 because it is a backup node, and the newly promoted master starts answering HTTP 200 OK. So the internal load balancer always sends requests to the current master, because it is the one posing as healthy, while the backup sits there as backup. This is the approach we have taken on GCP and Azure.
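Such a health-check process can be very small. Here is a minimal sketch in Go, assuming keepalived publishes its current state (MASTER/BACKUP/FAULT) to a file via notify scripts — the demo later inspects the node state from a file in the same way; the file path and port are assumptions:

```go
package main

import (
	"net/http"
	"os"
	"strings"
)

// Hypothetical file kept up to date by keepalived notify scripts.
const stateFile = "/var/run/keepalived.state"

func health(w http.ResponseWriter, r *http.Request) {
	data, err := os.ReadFile(stateFile)
	// Only the current VRRP master answers 200 OK, so the internal load
	// balancer routes all broker traffic to it; the backup deliberately
	// reports 500 even though the VM itself is fine.
	if err == nil && strings.TrimSpace(string(data)) == "MASTER" {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusInternalServerError)
}

func main() {
	http.HandleFunc("/health", health)
	// Port the load balancer's HTTP health check is configured to probe.
	panic(http.ListenAndServe(":8080", nil))
}
```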
Now I'll hand it over to Shashank to explain this on OpenStack. As Ketki said, different IaaSes have different capabilities and different ways of providing a stable IP — on Azure, for example, we cannot have one IP that works across availability zones. For the case of OpenStack, we use a mechanism called allowed address pairs. I don't know how much of that concept you are aware of, but the basic principle is that on a Neutron port you are able to attach one more IP — the virtual IP. (If I said "floating IP" in OpenStack, that would have a different connotation, because floating IPs are public; the allowed-address-pair mechanism keeps this IP private.) That means in case of a failover, we can just update the Neutron port and switch the allowed-address-pair IP to a different port. So the same scenario Ketki described for Azure and GCP applies here via VRRP: the nodes determine who is the master, and once the election happens, we do a port update to switch the IP to the master node. All traffic over the virtual IP is then redirected to the actual master.

Now, if you recall the architecture of Service Fabrik — a broker, an API server, and operators — you might be wondering that we only talked about making the broker part HA. What about the operators running behind the scenes, which pick up the job and do the actual deployment? We have taken care of that too, but with a different mechanism, because the operators don't need a stable IP: they are not front-end components interfacing with the Cloud Controller. As an example, you see three operators here, but just take the case of multiple BOSH operators running in a scaled-out mode. When a request comes to the API server, both nodes are watching for the resource change: what has changed, is there a new deployment request, is there a demand made on the system? Both nodes get the event, and both would try to do the deployment for you. But before doing the deployment, they acquire a lock — we already have the API server, with etcd behind the scenes — so they take a lock via the API server, and only one node picks up the deployment and executes it. What we are doing is avoiding a race condition: exactly one node performs the deployment, and the system is always in a consistent state. With this flow you can scale out multiple BOSH operators and still do deployments safely.
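As a rough illustration of that locking step, here is a minimal sketch using etcd's concurrency primitives directly. In Service Fabrik the lock is actually taken via API server resources (with etcd behind it), so the endpoint, key naming, and deployment ID below are assumptions:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// The session holds a lease; if this operator crashes mid-deployment,
	// the lease expires and another operator can take over the lock.
	session, err := concurrency.NewSession(cli)
	if err != nil {
		panic(err)
	}
	defer session.Close()

	deploymentID := "example-deployment" // hypothetical deployment/instance ID
	mutex := concurrency.NewMutex(session, "/locks/deployments/"+deploymentID)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Both operators race here, but only one wins; the loser just skips
	// this deployment instead of deploying it a second time.
	if err := mutex.Lock(ctx); err != nil {
		fmt.Println("lock not acquired, skipping:", err)
		return
	}
	defer mutex.Unlock(context.Background())

	fmt.Println("lock acquired, triggering BOSH deployment for", deploymentID)
}
```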
Maybe we should now go over the demo quickly and showcase this. What we'll show runs on GCP, where we have a load balancer attached to both of the VMs. Let me quickly explain what these panes are. On the top left we have a CF pane, where I'll do some CF operations. Then we have a BOSH pane, where I'll do some BOSH operations. And these are the two broker VMs that I'm logged into — we have a two-node setup — and I'll explain in a moment what each node stands for.

First, let's see what services are in the marketplace. We have this blueprint service, which is a demo service for us, with various Docker- and BOSH-based plans, and currently there are no service instances created. Now let's see how our Service Fabrik broker deployment looks. I'm not sure it's really visible, but there are two broker nodes here, in Z1 and Z2 — two different availability zones — and they have different private IPs, ending in 250.211 and 253.11. The two bottom panes represent these two nodes: 9737 is the bottom-left pane and CC8E is the bottom-right pane.

Now, what is the state of these nodes? As I told you, keepalived maintains this state — one node is master, one node is backup — and it is written to a file on the filesystem. You can see that the bottom-left node is the master, and similarly the other node should be, and indeed is, the backup.

Now what we will do is try to induce a failover and see whether requests get served via the other node. We just saw that the first node is the master, so we log into it and look at what processes are running on the VM. You can see the Service Fabrik broker process running along with some other processes — this is the process that handles all the lifecycle operations of any instance. We'll also tail the logs on both VMs so that we know which request goes to which VM; the master node should always get all the requests, and that will be evident from the logs we see here. Similarly, we tail the logs on the other node. You can already see some API endpoints being hit on the master node.

Now let's create a service instance. Service instance creation in Service Fabrik is an asynchronous call, so the instance will be created and the operation will show as in progress. Currently the first node is the master, and you can see the requests coming to it — the last-operation calls from CF are being served by the master node. Let's check that the service instance ID matches: the instance ID starts with 154f4, and here in the logs the ID also starts with 154, so this is the instance the requests are coming in for.

Now we'll stop the Service Fabrik broker process. We are currently on the master node; we stop the process and see whether the failover happens. We check that the process is stopped — you can see it is no longer monitored. This was the master VM, so what is its state now? With keepalived we can see the state of the VM, and it has now gone to FAULT. So the failover has happened: the master has stepped down to backup, and you can already see requests starting to come to the other node, which was the backup node and has now been promoted to master. All the requests are coming to the former backup node now.
In a moment we can see that "create in progress" turns to "create succeeded". With the previous single-node setup, the creation would not have succeeded, because the node was down; but now there is another node that can very much take care of the in-progress creation. And you can already see that the creation has succeeded. So with this mechanism we have reduced a downtime of minutes to a few seconds. Okay, that's it from our side. If you have any questions for us, feel free to ask. Thank you.

Q: Great talk. I have a question about state. The state is basically in etcd, behind the API server, right?

A: Yes.

Q: And you use BOSH to manage and scale etcd, is that correct?

A: The VMs for the Service Fabrik broker are BOSH-deployed, and etcd runs as a BOSH job — that is one of the flavors, yes. etcd is what we use as the coordination mechanism: only one node at a time should hold a lock on the system, so we use etcd to persist the locks.

Q: Okay, thanks.

Well, thank you, Ketki and Shashank. You can also reach out to us at our email addresses. Service Fabrik is already an incubator project, so you can also join us on Slack and reach out to us at any time. Thank you very much. Thank you.