Welcome to this new Kalix webinar. My name is Hanat Kavokant, I'm a principal engineer at Lightbend, and today I will talk about how to build orchestration sagas with Kalix workflows. There are two types of sagas: choreographed sagas and orchestration sagas. In a previous webinar I covered choreographed sagas, so today I will talk about orchestration sagas.
In both cases, a saga is about implementing a long-running process in a distributed system. We are talking about a process that spans more than one transaction. As you know, a Kalix entity lives in its own transaction boundary. When you want to mutate two entities, you send a command to one entity and mutate it, and that's one transaction; then you send another command to a second entity, which mutates that entity, and that's yet another transaction. You cannot modify two entities in the same transaction. But for some use cases, I do need to modify more than one entity with one incoming request from the outside.
For choreography, we can use subscriptions: we listen to events and state changes from the entities and then propagate the change. The other option is to use Kalix workflows, and with them we can implement an orchestration saga. It's about building this process around a central point, a coordinator, like the conductor of an orchestra, that is responsible for managing and driving the whole flow. You start it, it sends a command to an entity, and when that entity finishes processing the command, the workflow can move to the next step and send another command to another entity, and so on. What's important to know is that a Kalix workflow is guaranteed to run until it finishes, and it saves its progress, so it knows where it stopped.
So if you are in the middle of a workflow and for some reason you shut down your service and bring it back, or some error happens, the workflow knows where it was and it will restart from where it stopped.
At a very high level, I have some diagrams to explain the main features of Kalix workflows. Let's imagine that I have an e-commerce application and something triggers the checkout of a shopping cart. The input here is the request to check out, or a notification that the shopping cart has been checked out. I built a workflow, and this information gets into my workflow and initiates it. The first thing the workflow does when it's initiated is its first transition: it indicates what the first step in that workflow is. The first transition here is to acquire payment. So Kalix will move to the acquire payment step, which you define yourself, and then a call will be made to some payment service to execute it.
Now, if that call fails, it's not a problem; a workflow is designed to just retry it. How many times it retries is configurable. If you say nothing, it will retry forever, with some interval between attempts so as not to overload your systems, but it will keep trying. You can also put a limit on that and say, okay, I will retry at most three times. But let's say that here the call fails the first time, the workflow retries, and then it passes. Then I transition to initiate shipment; this step also executes a call, this time to the shipment service, to initiate the shipment. When this call returns successfully, I transition to the end of my workflow.
You may have an arbitrary number of steps and combinations. You can move to one step and then go back to your first step; you decide how you want to compose them. As I was saying, by default a workflow, once started, needs to run until the end. But you can also define a timeout and say, okay, if this workflow is not completed after one hour, I need to tear it down. Or you can say that for every step that fails in my workflow, I want to retry that step at most three times, and if it keeps failing, I will abort my workflow. You can define both timeouts and retry strategies at the workflow level and at the step level.
Another important feature, and I find it quite interesting, is that you can execute a step, it succeeds, but instead of transitioning to the next step, you decide to pause your workflow. When do you want to do that? Usually when you have done whatever you can with the information you have, and now you need to pause because you are waiting for some input from the outside. That input can come from your system, for example a subscription that triggers a call back to your workflow and puts it back into movement, or from a user who connects to your user interface, fills in some form, and submits it to continue the workflow. So you pause, some request comes in, you move to another step, and then you transition to the end.
As I was saying, today I will use the same use case that I implemented for choreography sagas.
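To make the retry idea concrete, here is a minimal sketch in plain Java of a step that is retried a limited number of times before giving up. It is not the Kalix API: `RetryingStep`, `Outcome`, and `run` are hypothetical names chosen for illustration, and the pause between attempts that a real runtime would apply is left out.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only; this is not the Kalix API.
final class RetryingStep {

    // What happened to the step: its value, or a signal to fail over.
    record Outcome<T>(T value, boolean failedOver, int attempts) {}

    // Runs the call once, then retries up to maxRetries times on failure.
    static <T> Outcome<T> run(Supplier<T> call, int maxRetries) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                return new Outcome<>(call.get(), false, attempts);
            } catch (RuntimeException e) {
                // One first attempt plus maxRetries retries, then give up.
                if (attempts > maxRetries) {
                    return new Outcome<>(null, true, attempts);
                }
                // A real runtime would wait here before the next attempt.
            }
        }
    }
}
```

With `maxRetries` set to three, a call that never succeeds is attempted four times in total (one first attempt plus three retries) before the outcome reports that the caller should fail over.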
But let's recap: in case you haven't seen that other webinar, I will explain the use case here anyway. In an event-sourced system, there is one major challenge: how can I have fields in my model that are unique across all the entities of that model? This problem is called set-based consistency validation.
Let's imagine that the user here is an event-sourced entity, and it has a unique ID that I can always use to refer to a specific instance. Inside I have the name and an email. I don't want the email to be reused; I want each user to have a unique email address. But I also want to be able to change that email address eventually, so I cannot use the email address as the unique identifier.
With a traditional SQL relational database, I would have a table with an email column and I would say, OK, this column needs to be unique. But in an event-sourced system, we don't have such a table. What we have is a journal with the sequence of events that I use to reconstruct my model. I cannot just go to this journal and scan the payloads of all the events to find out if someone is already using this email address. And as I was saying, a Kalix entity lives in its own transaction. So I cannot send a command to an entity and say, OK, create this entity, but before you create it, make sure that no other entity is using this email address. I don't have the possibility to scan my whole system, so I need other means to do that.
One of the possible solutions is to create a barrier. Here we have this user, which will be implemented as an event-sourced entity, and we have another entity that I will call the unique email entity. The ID of this entity will be the email address.
And because it's a value entity, a key-value entity in Kalix, it's guaranteed that there is only one instance in the system for a given email address. So it works as a kind of barrier for me.
As I did in the choreography saga webinar, when I need to create a user, I will first reserve the email address. If it's already in use, then I cannot create that user; I have to change the content of my request. But if it's not in use, I will create this unique email record for the first time and reserve it for the user that I want to create. So I create this key-value record, the unique email address, and then I move forward and create the user entity.
But as we said, those entities live in their own transactions. It can happen that I create the unique email address but then fail to create the user. When that happens, I need a compensating action to free the email address, because it is now reserved for a user that I was not able to create. So if I fail to create the user, I need to go back and unreserve this email address. And if I do create the user, I can go back and say, okay, it's created, so the address is confirmed and in use by this other entity.
In the previous implementation, I used choreography: I was listening to events and state changes and coordinating the change with subscriptions. Here we'll instead create a workflow that will be our central point for coordinating the whole process. Visually it will be something like this. We have the workflow here. I will not have an application controller anymore; I will just hit the workflow directly from the outside, and then I have my two entities. When I initiate the workflow, its first transition will be to move to the reserve email step.
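The reserve, create, then confirm-or-unreserve sequence described above can be sketched in plain Java with in-memory maps standing in for the two entities. This is only an illustration of the barrier and the compensating action, not the demo's code: `CreateUserSagaSketch` and its members are hypothetical names, there is no persistence or retry, and it assumes each user ID is submitted only once.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-ins for the two entities; all names here are hypothetical.
final class CreateUserSagaSketch {
    enum Outcome { CREATED, EMAIL_TAKEN, USER_CREATION_FAILED }

    private final Map<String, String> reservedFor = new ConcurrentHashMap<>(); // email -> user ID
    private final Set<String> confirmed = ConcurrentHashMap.newKeySet();
    private final Map<String, String> users = new ConcurrentHashMap<>();       // user ID -> name

    Outcome createUser(String userId, String name, String email) {
        // Step 1: reserve. putIfAbsent is atomic, so one address has one owner.
        String owner = reservedFor.putIfAbsent(email, userId);
        if (owner != null && !owner.equals(userId)) {
            return Outcome.EMAIL_TAKEN;
        }
        // Step 2: create the user, as a separate operation that may fail.
        try {
            saveUser(userId, name);
        } catch (IllegalArgumentException e) {
            // Compensating action: free the address reserved in step 1.
            reservedFor.remove(email, userId);
            return Outcome.USER_CREATION_FAILED;
        }
        // Step 3: confirm that the address is in use by the new user.
        confirmed.add(email);
        return Outcome.CREATED;
    }

    private void saveUser(String userId, String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("Name is empty");
        }
        users.put(userId, name);
    }

    boolean isInUse(String email) { return reservedFor.containsKey(email); }

    boolean isConfirmed(String email) { return confirmed.contains(email); }
}
```

The map keyed by email address plays the role of the barrier: whoever inserts the key first owns the address, and a failed user creation removes the key again so that someone else can claim it.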
So Kalix will move the workflow to that first step, which will call reserve email on the unique email entity, creating the entity if it doesn't exist yet and reserving the email, or failing if it is already in use. Let's say this is the sunny-day scenario. I reserve the email, it returns, and then the workflow says, okay, now that we've reserved the email address, let's transition to the next step, which is create user. Now we make a call to the user entity, we create it, and it returns. Good, I did both calls, and now I can confirm the email: I go back to the unique email entity and confirm that this email address is effectively in use by the user that I just created. When this call returns, I can finalize my workflow. So I transition to the first step, it completes; I transition to the second step, it completes; I transition to my third step, and when the third one finishes, I just transition to the end of my workflow.
Of course that's the sunny-day scenario, but errors happen. It can be that I have a validation error, or the user entity service is down for some reason. So I reserve the email address, and when I'm about to create the user, it fails. In the demo that we're going to see today, I will put a retry strategy on that step, with max retries set to three. I will simulate some errors: the first attempt to create the user will fail, and then it will try again, and again, with some interval between the retries, and then it will retry for the third time. So we're going to see four tries in the logs, the first attempt and then the three retries, and then it will finally fail, and the workflow will move to another step, which is a failover step. After three retries, we'll say, okay, that's enough.
I don't want to try again because there are probably major issues here. So we fail over to yet another step, where we unreserve the email, and then we can finish the workflow. The difference here is that I have this central place: everything goes back to the workflow, and the workflow can then make a decision. The interactions between the parts are coordinated by the workflow, and as the name says, it's an orchestration saga. The workflow is acting like the conductor of an orchestra; it's the one managing and driving the whole process.
Let's go to the code. The model is exactly the same as in the choreography saga webinar. I made one small change on the user entity, which I will show right now: I added a random failure method that I call when I create the user, because I want to simulate failures and I want you to see the workflow recovering from them. The condition is this: if the user ID that I'm choosing is a negative number, I will randomly make it fail. If the method returns true, I fail with this message. In Kalix, the unique ID of an entity is always a string. If it's some text, if it contains letters, it just succeeds. If it's a positive integer, it succeeds. But if it's a negative integer, I will eventually let it fail.
So let's see the workflow. The first thing that I want to show is that the workflow has a state. The user creation workflow has an ID that is exactly the same as the user ID, just for simplicity. It has a state, defined here. And the first time that you start it, you send a POST request to this method with the command to create the user.
Then, if the workflow was never created, that is, if the state is null, or if it's in the paused state (we're going to see that in more detail later), I will create and save the state and say, okay, I'm in this phase of my process: reserving an email. Let me show what the state is. The state of the workflow is the user ID, the initial creation command, and a status that can be reserving email, creating user, confirming email, finished, paused, or failed. It may also have an error message, so that I can inspect the state and see what is going on.
So when I start it, I save the state of my workflow, and then I tell the Kalix engine that it's time to transition to the reserve email step, and this is the input for that first step: the command on the unique email entity to reserve the address. Then I reply to the caller with the state of my workflow. If the workflow is already created and it's not paused, I don't want to interfere; I will just return the state as it is. I will not force any transition. The workflow may be finished, or it may be running. If I send this request twice, I will just send back the current state of my workflow.
Now, the workflow definition is a method that you have to implement. Here is where you define all the steps of your workflow. I will go to the end of this method, to the part I want to show you first. Here you need to return the workflow definition, and you build it by calling this API and adding the steps. Those are the steps of our workflow. You can add them in any order you want; they don't need to be executed in that order. The goal here is only that you declare the steps.
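As a rough picture of the workflow state and the start logic just described, here is a sketch in plain Java. The type and field names (`UserCreationSketch`, `State`, `CreateUser`, `Status`) are assumptions made for illustration; the talk does not show the demo's actual class definitions, and the real start method also returns a workflow effect rather than just a state.

```java
// Hypothetical shapes for illustration; the demo's real classes are not shown in the talk.
final class UserCreationSketch {
    enum Status { RESERVING_EMAIL, CREATING_USER, CONFIRMING_EMAIL, FINISHED, PAUSED, FAILED }

    record CreateUser(String name, String email) {}

    record State(String userId, CreateUser command, Status status, String errorMessage) {}

    // Decides what the state should be when a start request arrives.
    static State start(State current, String userId, CreateUser command) {
        boolean neverStarted = current == null;
        boolean paused = current != null && current.status() == Status.PAUSED;
        if (neverStarted || paused) {
            // (Re)start: keep the command and enter the first phase.
            return new State(userId, command, Status.RESERVING_EMAIL, null);
        }
        // Running or finished: do not interfere, just report the state.
        return current;
    }
}
```

Sending the same start request twice therefore leaves a running or finished workflow untouched, while a paused one accepts the new command and goes back to reserving the email.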
And each step may transition to any other step. You can go back and forth between two steps a few times and then decide to finish your workflow. So the order is not important, but it is important that all the steps are declared in your workflow definition.
The first step that we will execute is reserve email. First comes the barrier: I want to know if the email is available. If it succeeds, I will transition to the create user step. This step is a special one, because here I'm saying that if it fails, I want to retry it at most three times, and if it still fails after three retries, I want to unreserve the email. That's the failover transition: I go to this other step, which reverts the reserve email step. It basically unreserves the email. If create user succeeds, then, as we'll see in a while, the transition from create user is to the confirmation of the email.
So let's first see reserve email, which is the first step. It's worth pointing out that whenever you declare a transition, you have two possibilities. You can transition to a step with no input, when there is no new data that you need to pass to that step, or you can pass a command as the input for the step. (I messed up something here, let me revert.) Here, in the start method, I'm transitioning to reserve email, and the input is this command, of this type.
So let's go to the reserve email step. I have the name defined in a variable because I want to use it in the logs as well, and this call will receive the reserve email class as input. When you say transition, Kalix takes the input that you passed in the transition, as we saw at the beginning of this file, and executes the step for you.
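The point that steps are declared in any order and are connected only by their transitions can be illustrated with a tiny step registry in plain Java. This is a sketch, not the Kalix workflow definition API: `StepGraphSketch`, `addStep`, `run`, and the step names are all hypothetical, and each step simply returns the name of the next step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical step registry; each step returns the name of the next step.
final class StepGraphSketch {
    static final String END = "end";

    private final Map<String, Supplier<String>> steps = new HashMap<>();

    // Declaration order is irrelevant: steps are looked up by name.
    StepGraphSketch addStep(String name, Supplier<String> body) {
        steps.put(name, body);
        return this;
    }

    // Follows transitions from the first step until a step returns END.
    List<String> run(String firstStep) {
        List<String> executed = new ArrayList<>();
        String current = firstStep;
        while (!END.equals(current)) {
            Supplier<String> body = steps.get(current);
            if (body == null) {
                throw new IllegalStateException("Undeclared step: " + current);
            }
            executed.add(current);
            current = body.get();
        }
        return executed;
    }
}
```

Declaring confirm-email first and reserve-email last still executes reserve-email, create-user, confirm-email in that order, because only the transitions decide what runs next. Transitioning to a step that was never declared is an error, which is why every step has to be part of the definition.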
And this is an async call. The lambda that you pass here receives the command, and we can see that the command is of the reserve email type. Kalix will execute this lambda in the background, passing the payload that you defined, and here we use the component client to hit the unique email value entity and reserve this email address. Because it's an async call, Kalix expects a completion stage. If it succeeds, we map it to a result success; if it fails, we map it to a result failure. We'll see in a moment what we do with those two, but let's first recap what happens in the reserve method.
I have this value entity that works as a barrier to protect the email address. The email state has an address; a status that can be not used, reserved, or confirmed; and it may have an owner. The initial state of this value entity is just the address: it's not in use and doesn't have an owner yet. When you reserve it with this command, it may be that the address is already in use, already reserved for another owner. If that's the case, I emit this error: email is already reserved. So game over here. If you hit this method twice to reserve for the same owner, then it's already reserved for this owner, so I have nothing to do; I just say, okay, it's good, it's reserved for you. And if it's not yet reserved, I update the state and say, okay, this email address is now reserved for this owner.
Back to the workflow. We do the call; if it's successful, we map to the success result, otherwise we map to a failure. I do that because I want to capture the failure message and put it in the state of my workflow. So Kalix does this call for you in the background, and when it gets the result, which will be either result success or result failure, it calls the andThen method.
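Mapping an asynchronous call to either a success or a failure value, so that the error message can be kept, can be sketched with the standard `CompletionStage` API. The `Result`, `Success`, and `Failure` types below are hypothetical stand-ins for the result types mentioned in the talk, and a plain `CompletionStage<String>` stands in for the call made through the component client.

```java
import java.util.concurrent.CompletionStage;

// Hypothetical result types; the demo's own Result classes are not shown in the talk.
final class ReserveResults {
    interface Result {}
    record Success() implements Result {}
    record Failure(String message) implements Result {}

    // Turns a call that may fail into a stage that always completes with a Result.
    static CompletionStage<Result> toResult(CompletionStage<String> call) {
        return call
            .<Result>thenApply(reply -> new Success())
            .exceptionally(error -> new Failure(rootMessage(error)));
    }

    private static String rootMessage(Throwable error) {
        // Failures from an earlier stage arrive wrapped in a CompletionException.
        Throwable cause = error.getCause() != null ? error.getCause() : error;
        return cause.getMessage();
    }
}
```

Because the failure is turned into an ordinary value, the code that handles the result can store the message in the workflow state instead of having the step fail and be retried.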
That's how it works for each step. You define a call that needs to be executed by the Kalix runtime. Then the result comes back to your step and Kalix asks, okay, we got the result, what should I do next? In this andThen method, you define the next thing to do.
So the result comes back from that call and is passed to you here, and now you can check it. What I'm doing here is: if it's a failure, I do some logging and I return a workflow effect in which I update my state, saying that it's paused, with the error message. During the demo, I will explain why it's paused here. There was an error: the email may already be reserved, and that's why it failed. So I just pause my workflow. I will not let it go to completion; I put it in a paused state. It won't try to reserve again. It will just stop and wait for extra input. So instead of transitioning to something else, I tell Kalix that I want it to pause. Don't mix these two up. This paused is just the status inside my state, because I want to be able to see it, while this pause is us telling Kalix that it's time to pause this workflow, because I will eventually bring more data.
Now, if you successfully reserve the email address, we move forward and say, okay, now I move to the creating user phase, and you return the effect: first update the state, then transition to the create user step.
Let's see create user. When I transition to create user, I tell Kalix what the input of this next step is. And the input here is something that I stored in my state: the initial creation command. I put it in the state because I want to use it later, and this is the moment when I want to get that original creation command back.
And now I move to the create user step, and that's my input. Let's have a look at the implementation of the create user step. I make a call, but this time it's not an async call. It's a call, which in Kalix terms means a deferred call. I use the client here and I return a deferred call that Kalix will run in the background. What it does is talk to the event-sourced entity identified by this ID and call the create user method on it. And here we come to the piece of code that I showed already, where we have this random failure: depending on the condition, we force a failure. If there is no name, it will also fail. Otherwise I create the user. That's exactly the same as what I did in the choreography one.
Back here. This call returns Done. The output of the call is the input of the andThen. Once the call returns, Kalix comes back to you and says, OK, I executed your step, now what do you want me to do? And you tell Kalix the type to expect. I don't know if you know this trick: I don't need to use this Done value in my lambda, so I name the parameter with two underscores. It's valid Java code, and it's just a way to say that I'm not using this value.
Then, update state. I created the user, so I update the state. What's the next thing that I do? I will confirm the email address, so I transition to confirm email. The difference here is that in the transition I'm not passing an input, because the confirm email step does not have one. So I can just say, now transition to confirm email.
Let's see what happens here. Confirm email is yet another call, but this time I don't have an input type. It's just a Java supplier.
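The two-underscores trick mentioned above is ordinary Java and can be shown in isolation. In this small sketch, `IgnoredParameter` and `NEXT_STEP` are hypothetical names; the lambda receives a value it never reads and returns the name of the next step.

```java
import java.util.function.Function;

// Hypothetical example of naming an unused lambda parameter with two underscores.
final class IgnoredParameter {
    // A single "_" has been a reserved keyword since Java 9,
    // but "__" is an ordinary identifier, so it can mark a value as unused.
    static final Function<Object, String> NEXT_STEP = __ -> "confirm-email";
}
```

Calling `IgnoredParameter.NEXT_STEP.apply(someValue)` returns the same step name whatever the argument is, which is the situation in a step whose call only returns an acknowledgement.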
And inside it, I call the value entity for the email that we reserved, and I confirm it. Let's have a look at the confirm method that will be called in this step. If the email is reserved, I confirm it, and I'm done. If the status of my email address is something else, I don't care; I just don't want this method to fail because of that, so I just return Done. Here, with asConfirmed, I update the state of my value entity: I take the current state and change its status to confirmed.
OK, I confirmed the email. Then Kalix comes back here. Again, confirm returns Done. That's the output of my call and, therefore, the input of the andThen method. Once again I'm not using it, so I can ignore it with the two-underscores trick. What I want to do here is update my state and say, I'm finished: I reserved the email, I created the user, and I confirmed that this email address is in use by that user. And then I can end my workflow.
So whenever I return an effect, I have these options. I can transition to a step without an input. I can transition to a step with some input. We have seen both. I can pause, in which case the workflow comes to a halt and waits for someone to provide more input. Or I can end the workflow, which is the case here. When I transition to end, it means that this workflow will not be executed anymore. We reached the end of the workflow.
One more part. We saw reserve email, we saw create user, and we saw the email confirmation. We haven't yet seen unreserve email. If create user fails, it will retry three times, and if it keeps failing, it will fail over to this other step. Let's see what it is. We got into this situation: we have an email that was reserved in the system, and we failed to create the user.
Now I want to free that email address so that it can eventually be used by someone else. Here again, the step defines a call, and the call talks again to the unique email value entity and calls the unreserve method. Let's have a look at what it does. If the unique email entity is in the reserved state and I call this method, it goes back to the empty state; it just resets its state. Otherwise, it ignores the call: I'm not reserved, so you cannot unreserve me. The empty state is back to not used, without an owner. In that state, I can reserve this email address for someone else. That's the goal here.
Back to the workflow. The call unreserves the email address, and when it returns, again it returns Done. Again, I'm not using this value, but I do update my state and say, okay, I failed to create the user. So the status of this workflow is failed. Game over; it will not try again, because I end the workflow.
So this is everything that I need to implement this use case: one single class with the steps and the coordination between them. The workflow is managing and driving the whole process.
Now, of course, we want to see it running. Okay, I hope it's there. First, let me check, like I did before, that this email address is not in use. I will create user 001; that will also be the unique ID for this workflow. Here's the payload, and it will reserve this email address. Let's see. Okay, that's the status of my workflow. The user ID is this one. That's my initial command. It started in reserving email, the first step. There is no error message. But in the meantime, this workflow has already completed. Let's see what is happening in the logs. I hit the workflow: start workflow, reserve email. And then it hits the unique email entity, reserving the email address, and goes back to the workflow step.
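The three operations on the unique email entity that were walked through (reserve, confirm, unreserve) can be summarized as pure state transitions. This is a plain-Java sketch under assumed names (`UniqueEmailSketch`, `State`, `Status`), not the demo's value entity: it returns new states and throws an exception where the real entity would reply with an error.

```java
// Hypothetical model of the unique email state; not the demo's value entity class.
final class UniqueEmailSketch {
    enum Status { NOT_USED, RESERVED, CONFIRMED }

    record State(String address, Status status, String owner) {

        // Initial state: the address is known but nobody owns it yet.
        static State empty(String address) {
            return new State(address, Status.NOT_USED, null);
        }

        State reserve(String newOwner) {
            if (status == Status.NOT_USED) {
                return new State(address, Status.RESERVED, newOwner);
            }
            if (newOwner.equals(owner)) {
                return this; // already held by this owner: nothing to do
            }
            throw new IllegalStateException("Email is already reserved");
        }

        // Only a reserved address changes; any other status is left as it is.
        State confirm() {
            return status == Status.RESERVED ? new State(address, Status.CONFIRMED, owner) : this;
        }

        // Only a reserved address is reset; a confirmed one stays confirmed.
        State unreserve() {
            return status == Status.RESERVED ? empty(address) : this;
        }
    }
}
```

Making confirm and unreserve no-ops outside the reserved status keeps them safe to call more than once, which matters when a step can be retried.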
Okay, reserve email: it's reserved. The workflow moves to the create user step. It hits the entity, creating the user, and goes back to the workflow. Confirm the email address: back to the unique email entity. Email is reserved. If I check the status of my workflow now, it is finished, my email address is reserved for this user, and the user is also created.
I have another request here that will force a failure: a user without a name. Let's see, the email address is not in use. I send the request. The email is now reserved. But if I check the workflow, it's creating the user, and I can see that Kalix is emitting some errors, because it failed to create the user. The name is not filled in, so we know that it will fail. Here it fails again, and again: third retry. And now the workflow falls back to the failover step, which is unreserve email. If I go back and check the status of this invalid user's acme.com email address, it's not in use anymore.
I have one more here, just for fun. The one that we just saw failed because the full name for that user was empty. Now we're going to see the random failure. As I was explaining, if the entity ID is a negative integer, I randomly make it fail. So here I generate a request with a random ID, a random number from minus 100 to minus 1. That's the name. Then the email: because I will call this a few times, I want to reserve a different email each time, so I'm using a random UUID in the email name. Afterwards we can check the workflow state.
Let's see the first one. Okay, this one passed. It's 68. It's finished. Let's try one more. Okay, this one is failing. It failed once. Twice. Three times. Let's see if it fails once more.
Okay, it passed, because it's random. It failed three times and then passed; had it failed a fourth time, it would have given up. Which number was that? 18. We can check: it's finished. Let's see one more. Failing. It already failed three times. Let's see if it fails once more. Oh, it didn't. I want to see one that unreserves the email. Let's see this one. Oh, this one passed. Once more, let's see if we get lucky. Ah, it passed. One more. Failed, failed, third time. Oh, it passed. Anyway, if it fails once more after the third retry, it gives up, as we saw with the user without a name; that one keeps failing because it doesn't have the name. I was hoping to have the random one fail completely at least once, but it didn't work out. Anyway, that's just for fun.
So, back to the slides. What are the main characteristics of a workflow, or orchestration saga? You can build it completely based on the workflow. You don't need subscriptions. You don't need to wait for events to be propagated to your subscriptions, as in a choreography. You have one single place where you can better control the flow of the calls that you want to execute. Once triggered, like a choreography, it moves like a wave until completion, if not paused.
I almost forgot to show the paused case. Let me go back here. I reserved this email address for John. Now I will create another user and try to use the same email address. Let's see what happens. It says that it's reserving the email address, but we know that this email address is already taken. If I check here, it says that it failed to reserve the email address because it's already taken, and the workflow is paused. Why did I put it in pause? Because if I complete this workflow, I cannot restart it again with the same user ID.
To restart it, I would need a workflow ID that is different from the user ID. But for my demo I decided to use the same ID to keep it simple, and that lets me show the pause. So this workflow is not finished, and it's not trying to reserve this email address anymore. It's just paused. Something went wrong and it cannot continue. Now I can come here and change the email address and send a request to the same workflow. As you see, when I hit this method it has another payload: I'm modifying the input in the command. Let's see. Now it's reserving with the new input, and if I check here, it's finished. That's one of the cases where you can use a pause: you decide that something went wrong, you pause, and you wait for someone to come in and change some data, to provide more input, so your workflow can move on.
Just to show what happened when we hit the start endpoint: if I don't have a state for this workflow, it means that it's really starting for the first time. But in this last case it was in the paused state, so I accepted the request anyway, updated the state, and transitioned back to reserve email. So I kind of restarted my workflow. That's one possible use of pause in workflows.
So, as I was saying: once triggered, it moves like a wave until completion, if not paused. You can pause it and wait for more input, either from a subscription, from some other system, or from a user. It's also possible to inspect your workflow, because it has state, and that allows you to track where you are. That's the biggest difference compared to choreography, where you don't have a clear way to track where your saga is.
In an orchestration saga, because you have the central point, the conductor of your orchestra, in our case the Kalix workflow, you have a place where you can check the state and see how the process is moving, whether it's stuck because of some error, or whether you need to provide some input to unblock it. You can analyze the flow.
Assuming that you have watched the other webinar about choreography, we can now compare the two. So what are the differences? In choreography we depend on events, state changes, and subscriptions. We have event-sourced entities emitting events, or value entities propagating their state changes, and we have subscriptions to react to them. On the other side, we have a fixed place where we can coordinate everything: the conductor of our orchestra, which is the workflow. A choreography moves like a wave until completion. I can say the same for workflows, but those can also pause. In choreography, it's important to watch out for head-of-line errors. If you didn't watch my previous webinar, I recommend that you do, because head-of-line errors are an important thing to learn about when you're dealing with subscriptions. And as I was saying in the other webinar, a choreography is hard to debug because you don't have a place to track the progress.
Now, back to workflows. We can track the progress. We can rely on events and subscriptions if we want: we can pause the workflow and have an action subscribed to some events that calls back into the workflow and puts it back into movement. That's completely possible. As I was saying, you can pause and wait for extra input. You can also cancel your workflow while it's running: at any point you can hit it through some endpoint and say, wherever you are, from now on you move to this other state. And you can recover in case of failures: you can say, okay, I will retry this step three times, and otherwise I will fail over to this other step, where I have the chance to do some cleanup.
The source code for this workflow demo is also on GitHub. You can scan this code to download it and play around with it. You can also join our Slack channel if you have questions and want to reach out. This other QR code brings you to the Get Started with Kalix page on our website. That's all for this webinar. I hope you enjoyed it. Thank you very much.