Let's get started. First of all, my name is Renat Akhmerov. I work for Nokia as a software engineer. Today we're going to talk a little bit about the Mistral workflow service and how to orchestrate tasks with Mistral.

This is the agenda for today's presentation. We're going to talk a little bit about workflows in general and why they are important, and about the Mistral project as well. Toward the end, I'll briefly talk about the current status of the project and the plans for the Newton cycle, and we'll have a Q&A session. If you have questions, please hold them until the end of the presentation; we'll have four or five minutes for questions.

Some general information about the project itself. It started in November 2013, so it's been about two and a half years since we started working on it. It's now part of the big tent of OpenStack projects. It has a pretty stable community, and we've been observing significant growth of interest in the project in the OpenStack community.

Before we get started with Mistral itself, let's talk about workflows, some workflow concepts, and why we think they are needed in cloud and distributed environments. First of all, what is a workflow? What does this mean? Wikipedia has a bunch of definitions related to workflows, and I like this one most: a workflow management system is a software system for the setup, performance and monitoring of a defined sequence of tasks arranged as a workflow.

If it's still a little bit abstract and unclear, let's look at a particular example. As an example, I want to get to the OpenStack Summit. What would that look like? First of all, we need to contribute to OpenStack. I think it's actually pretty important, at least I think so.
And the next step is to become an active technical contributor, and then we need to buy the summit ticket, the summit pass, right? After that, we need to arrange the trip, book our flight tickets, our hotel reservation and stuff like that, fly to the summit eventually, register at the desk, get a badge and all the typical materials that we get at OpenStack summits, and we're done. Or we can skip the first two steps and do something like this: if you have a good relationship with your boss, you can ignore the first two steps and you'll be fine.

Let's notice that some of the steps in this workflow may be associated with some data. For example, when you become a technical contributor, you get a discount code that you can use later on when you're buying your summit pass, right? So one step has produced some data which is consumed by another step downstream in the workflow. When we buy a summit pass, the data associated with this step is the pass number itself, and when we arrange the trip, the data may be our airline ticket numbers, hotel reservation number and stuff like that. And of course, having a good relationship with your boss creates some extra budget which can be used at some point as well.

So let's try to give a descriptive definition of what a workflow is. If we look at this diagram, we will see that it consists of a number of tasks and connections between them. Together they form a graph. What's also really important is that the workflow has some state. At any point in time, we should be able to say that some of my workflow tasks have completed successfully, some of them have not even run, and some of them have finished with a failure, for example. So state is really important, and it's also about the result. So every task has a result.
It may have a result, actually, not necessarily, but the entire workflow may optionally have some result as well, right? And data. It's kind of the same as result and state, probably. It's similar, but I put the emphasis on it just to make sure we keep it in mind, because it's important. This is an important concept that's reflected in a certain mechanism in the workflow engine. We also need to keep in mind that all of those tasks may take long to complete. This adds an additional requirement on how we should be building a workflow system, because everything should be pretty much asynchronous and we shouldn't be actively waiting for the completion of any task.

Okay, so how is this all related to computer systems, to our world, to our lives as developers, cloud operators, and people like that? The answer is pretty simple. Pretty often we model our processes, especially long-running processes, as workflows. That's the answer.

Let's look at this example, which I conventionally called parallel computing orchestration. This is probably something that not many of you have to deal with, but I think it demonstrates the concept pretty well. For example, I'm representing a group of scientists, and I have a bunch of data that I need to process, and I understand that one computer is not enough. It's a typical problem where we can apply technologies like Hadoop and some kind of HPC frameworks. What I would do here is I would just use a private or public cloud.
I would allocate a number of virtual machines, and for each one of them I should be able to configure it, install some software, do some clustering, whatever is needed. Eventually every virtual machine would do some real job, some computation, and afterwards I should be able to build a report or something like that upon completion of all the tasks in this workflow. And eventually I should be able to notify a human, whether it's a cloud operator or a scientist or whoever, that the workflow is completed.

By the way, some of those steps may be done with Heat, which is also something that we need to keep in mind. But what I'm trying to say here is that even if you use something like Heat or similar technologies, you still need some kind of higher-level orchestration over the whole process.

The questions that we can ask here are, for example: how do we manage parallel execution? This is the first question. And how do we track the state of each one of the tasks here? Because obviously we can have many, many tasks, thousands probably, and if something goes wrong, we need to understand where exactly the problem is. And eventually, how do we synchronize all the computational branches so that we can build a report or do data aggregation at the end of each line here, right? All of those capabilities are something that we have to deal with if we decide to implement our own solution. And by the way, even if we use something like Hadoop, some MapReduce framework here, we still need to be able to orchestrate everything: we need to be able to install it properly, do some cleanup actions, and lots of other things.

So the obvious answer to this is to use a workflow technology, and Mistral is exactly that kind of technology. It's an open source service that manages workflows. So now let's take a look at what the requirements are for such a workflow system.
First of all, we need a language to be able to describe those workflows somehow, right? Another thing is that we should be able to run our tasks in a distributed environment, so that if we manage a larger-scale infrastructure, for example, we are able to run at scale. We definitely need to support parallel execution of our tasks. We need something like synchronization so that we can reconcile all the results of the computational branches. We need to be able to transfer data from one task to another, like we saw in my first example with getting to the OpenStack Summit, because some of the tasks may produce something which is consumed later on by other tasks, right? We need some APIs to create, update, and delete workflows, to run them, and to do some monitoring. And we should have a persistent state of running workflows and tasks.

So why do we need a persistent state? I think persistent state is the key here. If we look at this example, I have an arbitrary workflow, and my task one has completed successfully, task three is OK, task four, task five, and something happened with my task six. Imagine if we didn't have any persistent state, if we were just implementing a script doing all the computation. It would be really difficult to understand what was going on and where the exact problem is.

So state allows us to see the progress. If we have a persistent state, we can attach to that state by using the command line interface, the UI, or programmatically using the service API. It allows us to see exactly what's going on with my workflow right now. It also allows us to see what was going on in the past. The example is: I'm a cloud operator, I'm going on vacation for a couple of weeks, and I know that I started a workflow that is going to take a week.
So two weeks later, I should be able to go back and see what happened a week ago, to see all the results of my tasks and my entire workflow.

Previously, one of the requirements said that we shouldn't be actively waiting for any task completion. The whole system should be asynchronous, so we should account for asynchrony as well. And persistent state makes it possible to design for asynchronous processing.

What's also important, in my opinion, is the ability to recover from errors. In this example, say tasks one, three, four, and five took several days to complete. In many cases, we should be able to just go and fix the problem where it happened, in our particular case in task six, recover from that error, and continue the workflow, because it doesn't make sense to start over. It's too expensive; it took five or six days. A persistent state of our process is the key to making that possible as well. And after all, it's all critically important when we have to manage large-scale infrastructure, because otherwise it's really, really tough.

So what if we just remember our lovely scripts? By using the word scripts, I don't mean exactly shell scripts or something like that. I'm mostly talking about programs in general-purpose languages, maybe some specific scripting languages, and so on. If we don't want to use any frameworks like a workflow system or anything else, what would the problems be? First of all, the lack of decent error handling. All is fine while the workflow is running, right? But if something goes wrong, we're left with a message, and it's really difficult to understand what happened and how to recover from that error because there isn't any persistent state. And if we wanted to add a persistent state, we would basically end up inventing a technology similar to a workflow system to support that.
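The point about persistent state and recovery can be illustrated with a minimal sketch. This is plain Python, not Mistral code or the Mistral API: the function `run_workflow`, the JSON state file, and the task names are all assumptions made up for illustration. It shows how persisting each task's state lets a second run skip the tasks that already succeeded and continue from the one that failed.

```python
import json
import os
import tempfile


def run_workflow(tasks, state_path):
    """Run (name, action) pairs in order, persisting state after each task.

    Hypothetical helper for illustration only; not part of Mistral.
    """
    # Load previously persisted state so completed tasks are not repeated.
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    for name, action in tasks:
        if state.get(name) == "SUCCESS":
            continue  # finished in an earlier run
        try:
            action()
            state[name] = "SUCCESS"
        except Exception as exc:
            state[name] = "ERROR: %s" % exc
        # Write the state after every task so progress survives a crash.
        with open(state_path, "w") as f:
            json.dump(state, f)
        if state[name] != "SUCCESS":
            break  # stop here; a later run resumes from this task
    return state


if __name__ == "__main__":
    broken = {"task6"}  # simulate the failing task from the talk

    def make(name):
        def action():
            if name in broken:
                raise RuntimeError("simulated failure")
        return action

    names = ("task1", "task3", "task4", "task5", "task6")
    tasks = [(n, make(n)) for n in names]
    path = os.path.join(tempfile.mkdtemp(), "state.json")
    print(run_workflow(tasks, path))  # task6 is recorded as an error
    broken.discard("task6")           # "fix the problem"
    print(run_workflow(tasks, path))  # only task6 is executed this time
```

The state file here plays the role that a database plays in a real workflow service: anyone can inspect it while the workflow runs or long after it has finished.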
Scripts are not scalable because everything, my program stack, is in memory, in the heap, so if the process just dies, we lose everything. It's hard to implement HA, and it's hard to scale. We also don't have any monitoring capabilities; we have to implement them ourselves. And of course it's pretty difficult to do something in parallel and reconcile different computational branches. In other words: the right tool for the right task.

That said, I don't mean that scripts are enemies of workflow systems. Workflow systems and scripts can be used together, but the workflow provides you with the ability to create reusable patterns for your processes, where you can incorporate different things like scripts written in different languages, Python, Java, whatever; it can be anything. The pattern of the process remains the same, and in this case Mistral or a similar workflow system works sort of like a glue, tying everything together and providing the mechanism for reliable data transfer between computational steps. It allows you to do things in parallel, synchronization, that kind of stuff.

So now I want to take a look at specific Mistral capabilities. I'm not going to talk much about the APIs and what the Mistral topology looks like, but just briefly: it has a number of components. The most important one is the Mistral engine. You can scale it linearly depending on your load and your needs. And what's most important, in my opinion, is the Mistral language capabilities. This is something that we are pretty proud of, and I think we found a good balance between complexity and power. In the industry there is a bunch of different workflow patterns, and there is a very good website called workflowpatterns.com where you can go and find all kinds of workflow patterns, and there are probably hundreds of them.
There are some workflow technologies that try to implement all of those workflow patterns. We didn't go down that path, because we realized from our experience and from the many years of experience of our partners that the number of workflow patterns typically used in practice is about 5% of all the workflow patterns that exist in the industry. So we tried to make the language simple enough to learn really quickly and at the same time able to do all the most important stuff.

The Mistral workflow language is a YAML-based language, and it relies on the YAQL expression language, which is a very powerful expression language in our opinion, and it's getting more popular in the OpenStack community right now. Here is the list of main features. I'm not going to read them right now; I'm going to talk a little bit about them in a second.

This is an example of a Mistral workflow. This is what it looks like. By the way, this is a real example. You can write this example, feed it to Mistral, and it will work. The workflow is called tenant cleanup, and the idea is pretty simple. We have two tasks. The first one gets VMs out of our tenant based on some criteria. That long line is a YAQL expression which basically does filtering: if our VM has the word Mistral in its name, we're going to do something with it. So we build a collection of VMs that match our criteria, and the second task, delete VMs, just iterates over this collection and deletes all the VMs. You don't need to understand everything in this example, so let's move forward.

Pluggable task actions. As you may have noticed on the previous slide, every task is associated with an action. The analogy here is pretty simple: an action is something like a function, so it has a signature with parameters, a name, and a return value.
So the task is a call of that function. A task is something that calls an action and does a little bit more on top of that. Out of the box, Mistral comes with around, I think, 1,000 actions right now. There is a set of simple actions, something like making HTTP requests, doing SSH, sending emails over SMTP, and running JavaScript, which is a very powerful thing if you need to do some data transformation right in the workflow body; it's pretty heavily used by some of our users. And we have a set of actions for calling OpenStack services. Right now I think it includes almost all OpenStack core services like Nova, Neutron, Heat, Cinder, and so on. And if you see that you don't have some actions that you need, you can easily write some simple Python code. The boilerplate code is really small. You can implement your own actions and plug them in if you need to.

Conditional transitions. This is a simple example of how we can make our transitions between tasks conditional. In this picture, we have three tasks: GetVMs, SendReport, and SendError. Basically, what the workflow does is report all the VMs in my tenant. This is just a sketch of the workflow, actually. The first task gets all the VMs, and if the number of VMs in the tenant is greater than zero, then I'm going to send the report. If that's not true, I'm not going to do anything. If I was unable to get the VMs out of my tenant, something went wrong, and I'm going to send the error. So this is how we can do it. Conditional transitions use the YAQL expression language again.

Publishing and persistent variables. Again, getting back to our first example, here is the GetVMs task. Basically, what we do here is build a collection of VMs that we want to do something with. And here's the example of how we can publish something into what we call the workflow context.
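As a rough illustration of the two ideas above, conditional transitions and publishing into a workflow context, here is a small plain-Python sketch. It is not Mistral's workflow language and not its API; the function `run` and its three callables are hypothetical stand-ins for the tasks on the slide, and the `context` dictionary only mimics the idea of a workflow context.

```python
def run(get_vms, send_report, send_error):
    """Choose the next task from the outcome of the first one.

    Returns the name of the task that ran second (or None) and the context.
    Illustrative only; a real engine evaluates conditions as expressions.
    """
    context = {}  # stands in for the workflow context
    try:
        # "Publish": store the task result in the context under a name
        # so that downstream tasks can read it.
        context["vms"] = get_vms()
    except Exception as exc:
        # Transition taken when the first task fails.
        send_error(str(exc))
        return "send_error", context

    # Transition taken on success, guarded by a condition over the context.
    if len(context["vms"]) > 0:
        send_report(context["vms"])
        return "send_report", context

    # Condition is false: nothing else runs.
    return None, context


if __name__ == "__main__":
    next_task, ctx = run(lambda: ["vm-a", "vm-b"], print, print)
    print(next_task, ctx)
```

The design point is that the decision is made from data published by an earlier task, not from anything hard-coded in the later tasks.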
In our particular case, the variable is called vms, and it contains a collection of our virtual machines. The subsequent tasks can do something with that collection: they can process it, iterate over its elements, and do something with them. We need to keep in mind that those two tasks may be running on different hosts, which means there should be a mechanism for transferring that data between tasks. What's also important is that we may have different branches in our workflow, and every branch has its own version of the workflow context, so that we can do some processing independently and then reconcile, merging those results afterwards.

Fork and join. This is an example of how we can do something in parallel. For example, we want to install an application, and we have a workflow for that. We have a task called install app. But before installing our application, we have to complete two other tasks: we need to install a database and we need to install a web server, and we have a task for each of those too. Those two tasks will run in parallel just because Mistral is smart enough to understand that there are no inbound connections for these two tasks. The third one has to wait for the completion of the first two before it can proceed. So if we need to do synchronization, we can mark a task with join, and it will wait for the completion of all inbound tasks.

Looping. Like we saw in one of our previous examples, we can iterate over a collection of elements and do some processing. We have a keyword called with-items specifically for that. In this particular example, the action associated with this task runs for each one of the elements in our collection.
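The fork and join behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions, not how the Mistral engine is implemented: `run_graph` is a hypothetical helper, it uses threads from the standard library, and it waits for each batch of ready tasks as a whole, which is simpler than a real engine that reacts to individual task completions.

```python
from concurrent.futures import ThreadPoolExecutor


def run_graph(actions, requires):
    """Run each task once all the tasks it requires have finished.

    actions:  {task_name: callable taking no arguments}
    requires: {task_name: [names of inbound tasks]}
    Hypothetical helper for illustration; not part of Mistral.
    """
    done = {}                # task name -> result
    pending = dict(actions)  # tasks that have not been started yet
    with ThreadPoolExecutor() as pool:
        while pending:
            # A task is ready when every inbound task has completed.
            ready = [n for n in pending
                     if all(r in done for r in requires.get(n, []))]
            if not ready:
                raise ValueError("unsatisfiable dependencies")
            # Fork: submit all ready tasks so they can run in parallel.
            futures = {n: pool.submit(pending.pop(n)) for n in ready}
            # Join: wait for the batch before starting dependent tasks.
            for name, future in futures.items():
                done[name] = future.result()
    return done


if __name__ == "__main__":
    results = run_graph(
        {
            "install_db": lambda: "db ready",
            "install_web_server": lambda: "web server ready",
            "install_app": lambda: "app installed",
        },
        {"install_app": ["install_db", "install_web_server"]},
    )
    print(results)
```

The two install tasks have no inbound connections, so they land in the first batch together; `install_app` only becomes ready once both results are recorded.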
It's also important that even though it looks like sequential processing, Mistral actually processes all those elements in parallel, which is kind of interesting because it lets you take advantage of parallelism, so you get your result much faster. That said, we have a specific property which can limit the concurrency level, so we can make it sequential if we need to, but by default it's all done in parallel.

We also have in Mistral what we call engine commands. In this case I can explicitly fail the whole workflow if some condition is true, or similarly I can succeed the whole workflow, for example if I already got the data that I expected out of my calculation. I can also put my workflow on pause if I need to, and then, by the way, I can manually resume my workflow.

Task policies are a very powerful concept, especially when we have to deal with large-scale automation where we may face different kinds of outages, networking problems for example. One of the interesting task policies that we can apply to each task is retry. I can configure my task to run up to three times, for example, with a certain delay between attempts, until it either succeeds or exceeds the number of attempts allowed. Mistral also has a timeout policy, so that I can wait, say, five minutes for my task to complete; if it doesn't happen, the task just fails automatically.
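The retry policy described above boils down to a small loop. The following is a minimal plain-Python sketch of the idea, not Mistral's implementation: `run_with_retry` and its parameter names are assumptions chosen for illustration, and the timeout policy is not modeled here.

```python
import time


def run_with_retry(action, attempts=3, delay=1.0, sleep=time.sleep):
    """Call action up to `attempts` times, pausing `delay` seconds between tries.

    Returns the first successful result, or re-raises the last error once
    the attempts are used up. `sleep` is injectable so the pause can be
    replaced in tests. Hypothetical helper, not a Mistral API.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    raise last_error


if __name__ == "__main__":
    counter = {"calls": 0}

    def flaky_network_call():
        # Fails twice, then succeeds, like a short network outage.
        counter["calls"] += 1
        if counter["calls"] < 3:
            raise IOError("temporary network problem")
        return "done"

    print(run_with_retry(flaky_network_call, attempts=3, delay=0.1))
```

In a workflow service the same behavior is declared on the task as configuration rather than written as a loop, which is what makes it reusable across tasks.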
Alternatively, every task can be associated not with an action but with what we call a nested workflow, so instead of calling an action we can call a different workflow. It's pretty much a polymorphic mechanism: from a task perspective it doesn't really matter if it's associated with an action or a workflow; it's still perceived like a function. In this case, by the way, the nested workflow will be running in its own stack, with its own data and its own context, so it's completely isolated from the parent workflow.

Now, going forward, it's worth mentioning a mechanism called the cron trigger. It's not related to the workflow language anymore. A cron trigger is an alternative to crontab, but it differs from crontab in that it's highly available. For example, if we have multiple instances of the workflow engine running and one of them dies, the cron trigger will still work. An example of using a cron trigger: we can create a cron trigger, configure it to run a certain workflow with certain input, and configure a cron pattern for how it should run, the schedule. For example, I can do some administrative work once a week at 2 am every Sunday, and I want to do some scaling up on all of the VMs in the tenant I have control over. This is a pretty important mechanism, and a lot of people are using it pretty actively.

Now, getting back to my example with parallel computing orchestration, just a couple of words about it again. We now basically have answers to all of the questions that I asked when I started talking about this example. We can just build a workflow for that. We know how to fork, how to do something in parallel, and how to reconcile the results. The Mistral API also lets us see exactly what's going on with my workflow: what is running, and what has finished successfully or maybe with a failure. And we can recover from errors, doing everything
we need. This slide contains some more use cases where a workflow technology like Mistral can be used. For example, at Nokia we have different kinds of healing workflows, auto scaling, and lots of other things. So if you find something familiar here, the right answer may be to use the Mistral workflow service.

Okay, a couple of words about the status. The workflow language for Mistral is pretty much complete, as we think, and it's been proven in practice, because we are aware of a number of production installations of Mistral, and it's working pretty well. Mistral is able to work in a highly available mode. It has seamless integration with OpenStack APIs, a simple command line interface, and a UI.

As far as the plans go... oh sorry, it says Mitaka; this is actually for Newton. It's still kind of hard to switch to the new cycle. Some of the things that we need to approach in the Newton cycle: multi-region support; we have a number of ideas on usability improvements; and we need to be working on performance tuning. One of the important things that I've been wanting to implement is, of course, workflow visualization, because ideally we need to be able to represent the whole graph both statically and as it's executing, so that we have visibility into what's going on. Otherwise it's sometimes a little bit challenging to see exactly what's going on. Thank you, and if you have some questions, go ahead.

Hi, my name is Jiun from HDP. I have two questions. The first question is about join operations. We have to wait for some of the previous workflows. If there is a problem in one of the workflows, is there any timeout on the join operation, or is there any mechanism for running backwards?

So you need to wait, so one of your workflows should wait for another workflow. One way to do this is to organize them in a parent-child relationship, so that you call one workflow from another. Alternatively, you can have a task associated with what we call an asynchronous action. An asynchronous action in
Mistral is something that we don't wait for: we just fire and forget, and then it's the responsibility of the third party to deliver the result of that action so that the workflow will continue. So the workflow that is your dependency may be able to put the result back to the parent workflow so that it continues.

Can we describe the dependencies in the language?

Sorry?

Can we just specify those dependencies in the language? The second question: are you assuming there is only one workflow running? If we have multiple workflows, is there any conflict between them? For example, one workflow may delete a virtual machine and another may create a virtual machine. In conflict cases, how can you handle that conflict within the workflow?

Right now it's mostly up to the user, honestly, but we have a number of blueprints to address that problem. One of the blueprints is exactly about limiting the number of workflows of a certain type that can run in parallel, so that you don't fall into situations like that.

Thank you.

A special case of this is a singleton workflow, which may have only one instance.

We know that with Heat templates we can integrate Puppet scripts, that is, invoke Puppet scripts from Heat templates. Can we do a similar thing with Mistral scripts also?

I'm not sure I understand your question. You mean calling Puppet from Mistral? Right now we don't have any actions for Puppet, but it's a straightforward task to do, actually. You can contribute.

You mentioned long-running processes. Could you comment on deadlock control, by any chance?

Can you elaborate a little bit on that?

When you are sitting there waiting for a long, long time for a process to finish that may not be making any progress. Do you have a remediation strategy there?

Well, one thing that you could do is apply a timeout policy, so you can define that my task X should be running no longer than this
amount of time, and it fails automatically after that time. It will just go to a certain state so that we can see the timeout.

This is Ganesh from AT&T. I have a question. We use Heat heavily. In some cases, we spin up certain VMs, and there is a dependency: another bunch of VMs needs to wait until the first ones are complete. We use the wait handle or the wait condition handle for that, but I see that with Mistral you have these conditions and workflows. How could we do this, moving away from Heat and going to Mistral? What benefits would we get, and how do we do it?

Well, it's all a question of common sense. I don't have the exact context of your problem, but in every particular case you should definitely be looking at what is most suitable in your situation. I guess if Heat has more interesting, more powerful functionality...

Can you enable the mic? Do you have plans to support Mistral resources under Heat?

Yeah, Heat actually has resources for Mistral. You can run workflows automatically by creating Heat resources.

Thank you.

There is also a resource for the cron trigger.

A quick question. It looks a lot like Ansible to me; I don't know if it came from those roots or not. But can you use Mistral without OpenStack?

Yes, the answer is yes. The only integration point is Keystone, and it's optional, so if you don't need to authenticate with Keystone, you can use Mistral without OpenStack at all. So in that case you'll be able to use all those built-in actions for accessing OpenStack services.

Thanks.

On one of your slides you mentioned tasks 1, 2, 3, up to 6, and then said task 6 fails. I know that one is long-running, but say I do want to go back to the starting state. Does Mistral support that, or at least remember the tasks in history? Can you go back?

What is your particular question?

Can you recover back to a particular state?

Yes.

With what kind of mechanism?

Well, it's called the rerun mechanism, so that we can rerun a
certain task if we want to.

Let's say you already finished tasks 1, 2, 3. How do you go back to the state from before task number 1?

Okay, so this is something that I need to look at, honestly, so I can double check that. But as far as I remember, you can just clear the state of the tasks that you need to and start from any point of the workflow. So you should basically do some analysis, understand the exact point from which you can redo the workflow, and you can do it manually. I'm not sure if that answers your question.

I know a lot of workflow systems support the transaction concept. I think you call it recovery; people here are talking about rollback.

Oh, I see what you're saying. Currently there is no rollback mechanism for workflows. To be honest, there is a conversation about that which has been going on for a long time in the community, and we figured that it's not that straightforward, even conceptually, because in many cases rollback may not be the mirrored process. You cannot just run every task in the exact reverse order, so the rollback process may be significantly different. From that perspective, you can just design a workflow that will be a rollback for your main workflow. Does that make sense?

So your point is I need to design a different workflow to roll back?

I'm not saying it's the right thing to do in all cases, but right now this is what's available. We definitely need to look at how to implement that in general, but like I said, it's kind of a controversial topic, and we should understand exactly how it's going to work: whether we need to provide some sort of rollback action for every task or whether we should be doing something else. It's pretty hard to define the exact meaning of rollback for a workflow; that's what I'm saying. Thank you.