So welcome, everyone. I'm Anrash Kohli from Nokia, and I would like your attention. Please open the URL that is visible on the screen at the moment. OK, you don't want to play along yet. Through this URL you can give me real-time feedback, and we are going to review it at the end of this presentation, so it would be really good if we could work together. The important thing to note is that whenever the small penguin appears in the corner, or anywhere on the screen, you should refresh the page.

All right, let's begin. I'm going to talk about the performance of Mistral. As you can see, the penguin is already on the screen, so you should go and refresh the web page, and if you do so, please answer the question that is there.

Mistral is a workflow engine. A workflow, according to Wikipedia, is an orchestrated and repeatable pattern of business activity. That is a little abstract. It is usually visualized as circles and the lines between them; I usually call it potatoes and roots. At Nokia we are developing CBAM, Nokia's VNFM, and we use Mistral to run lots of business activities: any kind of long-running action, from initiating a deployment to running other lifecycle actions, whatever is needed to run an application.

You came to this presentation because you are interested in numbers. Numbers, in the performance sense, come from measurements, and you want to use them for predictions. You want to measure memory, CPU, and IO performance, and you may define any kind of KPIs you wish. You then use the numbers you get from the measurements to assess the performance, reliability, and scalability of the product you are testing.

In the case of Mistral, we came up with a set of dimensions you can use to design tests. For example, you can vary the number of tasks, the types of tasks, the data flow between the tasks, and whether the workflow is flat or hierarchical. These dimensions let you define a very targeted test set.

Mistral consists of quite a few services. The first one, where your request arrives, is the API. The API uses an N-active redundancy model: you can start any number of instances, and they will just work with the underlying services. The next level is the engine. The engine takes care of orchestrating the actions and actually computing the workflow graph. It has an N-active redundancy model as well, though with some little quirks that we are working on fixing in the Mistral community. The last level is the executors, which are very scalable; you can start any number of those as well, and they run your actions. All services use RabbitMQ, or any AMQP-compatible messaging system, for transport, and MySQL or PostgreSQL as the database back end.

So let's now go into detail about these services. As you can see, the penguin is there again, so you should pick up your phone and open the web page right now. The API is used for making REST queries: you can create, read, update, and delete whatever model entities are present in Mistral, and it is used for initiating actions towards the engine. Which statistics would you be interested in? Let's start with memory per instance. In our measurements, we found that each instance initially takes up about 70 megabytes of memory.
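If you want to reproduce that kind of measurement, here is a minimal sketch that sums the resident set size of the API processes. It assumes the workers show up with "mistral" and "api" in their command line; adjust the pattern to whatever your deployment uses.

    # Sum the RSS (reported in KB) of all mistral-api worker processes.
    # The grep pattern is an assumption about how the service is launched.
    ps -eo rss,args | grep '[m]istral.*api' \
        | awk '{ sum += $1 } END { printf "%.1f MB\n", sum / 1024 }'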
That footprint can increase if you put the given instance under heavy load. The second question is how load distribution works, and Mistral uses a quite interesting setup here. You do not have to start separate processes yourself: the parent process forks itself when it starts, and the children all accept connections on the same socket. This way the forked child processes serve incoming requests in parallel, which amounts to OS-level load balancing.

You can measure all this with JMeter, an old and very robust testing framework. The tests you can think of are, for example, creating different types of executions, listing entities with various limits, and traversing the API. What you are measuring is the responsiveness and the load distribution, which is quite neat, actually.

To get back to numbers, I made some small measurements here, a create, read, update, delete cycle for workflows; well, not update and delete, actually, just create and list. The results show that creation throughput does not depend very much on how many entities are already in your system (this of course depends on your database), but listing is affected very much. With 100 entities in the database you can get up to 30 requests per second sequentially; with 1,000 entities it drops to about three requests per second; and at 7,000 workflows the returned document is around 1 megabyte. You can optimize this, of course: use the limit parameter and never issue list requests without a limit, because otherwise the response size is unpredictable.

The next service is the engine. It processes the workflow and manages the data flow between the tasks that constitute it, doing the input and output calculation. For this it uses YAQL and Jinja, and it will be an interesting point later which of these technologies you should use, and where. What may influence performance is the size of the data you transfer between tasks, whether you keep the results of a given task, and of course the number of tasks in the workflow. Lately there have been changes aimed at moving some action execution back into the engine, because in some cases the overhead of transferring data to the executors is too high and causes an undesirable slowdown.

OK, the next service, and here is the penguin again, is the executor. It runs the actions, operating in an RPC request mode: you send a request, one of the executors picks it up, starts the execution, and then sends back the response.

To make it more interesting, we ran a quite long test; it took about one week. I wanted to know how performant the cron triggers are, so I created 50 cron triggers, each executing once every minute. That works out to about 3,000 runs per hour, 72,000 runs per day, and about half a million runs in total over the whole week. To prevent the database from drowning, I set an execution expiration policy that cleans up the database whenever it reaches given limits: if the number of executions goes above 700, the excess is deleted, and any execution older than one day is deleted as well.
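For reference, here is roughly what that test setup looks like in practice. This is a sketch: the trigger and workflow names are made up, and the configuration values simply mirror the limits described above (the options live in the [execution_expiration_policy] group of mistral.conf).

    # One of the 50 cron triggers, firing every minute.
    mistral cron-trigger-create --pattern '* * * * *' perf_trigger_01 perf_workflow '{}'

    # mistral.conf
    [execution_expiration_policy]
    # How often the policy is evaluated, in minutes.
    evaluation_interval = 60
    # Delete finished executions older than one day (value in minutes).
    older_than = 1440
    # Keep at most this many finished executions.
    max_finished_executions = 700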
The test itself was very simple. When the trigger fired, it started an execution that created a Docker container, touched a few files, cleaned the files up, and then stopped the container. The point is that we did not create and then delete the container every time, because that would have overloaded the system; we created a container first and then reused it.

And this is the result. Don't be deceived, it's not your eyes; there is a little timing issue in the visualization. The upper diagram shows the system load, and the bottom diagram shows the disk IO. As you can see, in the initiation phase the disk IO climbs steeply while the system load stays quite low. After a while I realized that I had started the test on too small an instance, so in this gap I simply resized it to a bigger flavor and started the test again. For a while everything worked well, and then some chaotic behavior arrived: the disk IO collapsed, executions didn't start in time, containers didn't stop in time. It was thrashing quite badly. When I stopped the test, things gradually got better. The point is that at the end both the system load and the disk IO went back down, which means Mistral didn't actually lose anything. It was stable throughout the test, and we didn't hit any real issues.

Now for a few more war stories. Last summer we realized that with big workflows there was a huge performance bottleneck in the Mistral engine. It has since been fixed by the PTL, who worked on it very hard, and by the Pike release we already had a Rally job that checks that those conditions have not regressed. So at the moment I can say we have some assurance that the performance of Mistral will not change much from commit to commit.

Another interesting finding: we were using YAQL expressions to transfer data between tasks, with quite big payloads, like 50 or 100 megabytes of data, and it was really, really slow. Then we changed to Jinja, which proved to be a good choice, performing about five times better. Even better was JavaScript, which worked almost instantly. So the lesson learned: if you are dealing with big data, use Jinja and stay away from YAQL.

Connected to the big-data topic, we found a field in Mistral's configuration, the execution field size limit, which controls how much data Mistral is willing to send to the database. We raised it to 100 megabytes and waited for the results, which were very bad. The reason was that we were using MySQL, and MySQL by default does not allow packets as big as we were aiming for, so we had to raise its packet size limit as well. After we raised it, everything worked really well.
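Two of those findings are easy to show concretely. First, the YAQL-versus-Jinja choice is made per expression, right in the workflow definition; the two publish expressions below are meant to be equivalent (task and variable names are illustrative):

    tasks:
      process:
        action: std.noop
        publish:
          # YAQL style: noticeably slower on multi-megabyte payloads in our tests
          from_yaql: <% task(generate).result %>
          # Jinja style: about five times faster for us on the same data
          from_jinja: "{{ task('generate').result }}"

Second, the field size limit and the matching MySQL setting. A sketch, assuming the option in the [engine] group of mistral.conf (specified in kilobytes) and the standard max_allowed_packet variable of the MySQL server; the values here just mirror the story above:

    # mistral.conf
    [engine]
    # Allow execution fields of up to roughly 100 MB.
    execution_field_size_limit_kb = 102400

    # my.cnf on the MySQL server; this must be raised too,
    # otherwise large inserts are rejected.
    [mysqld]
    max_allowed_packet = 128M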
The third war story was database thrashing, and it connects to something I forgot to mention on the previous slide: the execution expiration policy. The execution expiration policy is a really neat thing that helps you keep your database tidy by cleaning up executions as time goes by. The initial implementation, however, was a little buggy: it tried to delete everything in a single transaction. You can imagine that when you have been inserting 50-megabyte rows into a MySQL database, cleaning them up takes a little more time. We started a test that ran for a while, and about one hour in, everything started to fail, and we didn't know why. The reason, as we figured out in the end, was that this single transaction was too long and simply timed out. What happens when a transaction times out? It rolls back. So we were back in the same place, still with way too many executions, and we kept thrashing the database. So we implemented a new option for this policy, the maximum number of finished executions, and also fixed the issue of trying to delete everything in one batch.

All right. I prepared a little demo for this session, and this is its setup. The code for the demo is only present on my machine at the moment, but it is going to be contributed back to Mistral very soon; you will find it in the Mistral tools/docker folder. There will be a build.sh that builds the Mistral container image, and a start script that you can use to bring the whole setup up. It runs on RabbitMQ and MySQL, and it configures everything so that you can start using it immediately.

OK. Oh, am I excited? Of course I am. I hope you have been following along. So let's see what Mistral looks like. Well, this is not what Mistral looks like: I created this web frontend just for my own purposes, and it is not going to be contributed anywhere, because next week there will be a huge update to the Mistral GUI, and we should be really happy about that. But for today, I'm going to use this one. What I do is create a new workflow, like so. It has two tasks: generate some input, then do some processing. This is a repeating workflow, and the depth is the number of times the tasks are repeated, so it's going to be two. I start it; it's running; then it's done. I can look at the tasks, and then I can delete the whole thing.
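For the record, a Mistral workflow of that shape could look roughly like the sketch below. This is a hypothetical reconstruction, not the actual demo code: the names are made up, std.echo stands in for the real actions, and with-items is just one way to express the repetition controlled by the depth parameter.

    ---
    version: '2.0'

    demo_workflow:
      description: Two-task repeating workflow, similar to the demo.
      input:
        - depth: 2    # number of repetitions, defaulting to the demo's value
      tasks:
        generate_input:
          action: std.echo output="some generated input"
          publish:
            data: <% task(generate_input).result %>
          on-success:
            - do_processing
        do_processing:
          # Repeat the processing step "depth" times.
          with-items: i in <% range(0, $.depth) %>
          action: std.echo output=<% $.data %>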
Of course, clicking through a single workflow is not the best way to test Mistral, and for that we are going to use JMeter. What I have set up here is a benchmark that first deletes all workflows and all executions, which I am going to do right now. It then creates 50 workflows, plus a base workflow that is used by the next thread group, which creates 100 executions. Just to show you how this whole thing works, I will start the test. It's running; it's visible here. If I go to the aggregate report, I can see that the 50 workflows have already been created, at about 8.9 requests per second, and at the moment it should be running the executions. Yes. These are started in parallel, asynchronously, so some may still be running. I can also visualize the results; this is the response time for the create requests. And if we go back to my fancy GUI, we can visualize all the workflows, executions, tasks, and actions. Then, just so as not to leave the door open, I am going to clean up everything. In a few seconds it will be done. OK, it's done. Fabulous. And that concludes my demo.

Before we finish, I would like to share a few quotes I heard during the summit which are very promising for Mistral. The first one was: "Can I use Mistral to implement my VM expiration policy?" Absolutely you can, was the answer. Then: "Cron jobs are one of the killer features we still do not use in TripleO." Of course, you can go ahead and use them. And just today, in the OpenStack high-availability discussion, I heard that Masakari, the VM high-availability service, could call Mistral workflows. I think that is a very promising message for us.

Thank you. Do you have questions? Oh, I had a few questions of my own, and I hope you responded to them. "Are you excited?" Of course you are. "Do you know Mistral?" Nobody answered. "I like numbers." "A performance test is a DoS attack." Not necessarily, unless you believe in fairy tales. And as for what the executor should be, that is your decision; I cannot say anything to that. But if you have more questions, you can ask them right now. OK, no questions. Thank you very much for your attention.