This is a project I've been working on with some other folks in the OpenStack community for about the last five months. A little about myself: I've been at Yahoo for about six years now, based in Sunnyvale, California, in the Bay Area — though some of our Yahoo folks are in New York, so we're spread across the US. I've been involved with OpenStack for about the last two and a half years, did some other stuff internally before that, and I've been pretty active on the discussion list, so you probably know my name from there, for various good or bad reasons. Thanks for all coming. I'm going to give a little overview of what TaskFlow is, how it connects into OpenStack — or at least part of the vision I have for it in OpenStack — how other people can help with it, what kinds of problems it currently solves, and all those kinds of good things.

The reason I built TaskFlow, after doing a lot of work with various projects in OpenStack, is that there's this constant issue around reimplementation of a problem I call "state-management-itis" — which is not a real word; it's one I made up. It's about how the workflows in the various components are all ad hoc, hard to alter, and hard to understand in a general manner. They're very distributed.
They're coded as needed, and certain problems happen when you go down that path — some projects have seen that more than others. When you have the speed of just creating workflows ad hoc, you lose out on the advanced operations that are necessary for a system like OpenStack and the subsystems that compose it, like Nova, Cinder, and Glance. Some of the obvious stuff: just making these kinds of distributed systems work in a reliable manner is not an easy thing. Yahoo has a lot of experience doing it, and it's not an easy problem in general at all — there are reasons why ZooKeeper and a lot of other processes and programs exist to help with this. Hadoop was one thing we built that also has similar issues. So there are certain things I've tried to take from those various projects to help out with this TaskFlow project.

One example, from the Cinder project that I helped out with during Havana, is this RPC boundary that you see in a lot of the projects. It's a scalability boundary: you can scale out horizontally by having more RPC receivers. But it also creates problems with state consistency, because you have a boundary you have to jump across via an RPC message that goes into a queue. You have to figure out what it means when that message takes a long time to arrive, or what resources are in different states of existence at that point. So this is one of the issues we've been working through with the Cinder folks — how to do a state machine — which we can talk about a little later, too. And if you look at the code, just by analyzing it you find race conditions, which may just be symptoms of other problems, but they exist. So
there are certain improvements we can work toward with a library like TaskFlow — which I'll go into a little more — ways we can improve the processes that exist so they run in a more reliable manner.

Some of the other issues: I mentioned the manager–driver API boundary. This is a bigger problem for projects like Nova and Cinder, which actually have drivers — volume drivers or virtualization drivers — with an API between the manager and the driver, but no well-defined state machine for what happens inside the driver layer. There have been bugs — I'm not sure how many — about drivers inside Cinder doing things that the manager layer does not want them to do, like manipulating database records they shouldn't have access to, or going into undefined states that make it hard for the manager layer to recover. Something like that is a good example of that boundary.

State recovery, of course, is an interesting question when you need to do upgrades of the various pieces of software. Without that kind of recovery built in, you have a problem when you need to terminate the process to do an upgrade. Say you have a Cinder volume process and you want to stop it to do a software upgrade. When you're stopping it, you can't really predict what it's doing at that moment, so without something like TaskFlow it's hard to do a clean stop on that process without having to do some manual cleanup later. At Yahoo we've seen this with Nova Compute: if you stop it during an upgrade, there are periodic tasks that run later, when you start it back up, to try to recover from where it left off. That's sort of delayed recovery.
That's not always the best way to go about it either, so I'll propose what we've done in TaskFlow to help increase that reliability.

An obvious question: why does this matter? There are various reasons. The upgrade path is one of them for me personally, because every six months there's a new release you have to move to, or you get stuck in hard upgrade cycles that take a lot of human resources just to manage and coordinate. Various companies are going through that every six months; we are as well. You also want API reliability — API uptime. If you have to shut down your processes, and then do manual work to recover them when you upgrade, some part of your system is going to be down during that time. So minimizing the manual work needed to upgrade and to recover from the upgrade is a big part of keeping your system reliable. Of course, any state corruption that happens will cost people money, and the manual processes to recover from it also cost money, people, and time. Those are all things that, if you design with a little foresight, you can mostly avoid. There are always situations where you have to get people involved — and that's why we're here.

Another part of the problem I've been trying to help with is making sure you can understand the workflows and the state transitions that happen inside the various projects. That includes things like: how does Nova boot a VM, and how can you alter that boot to do different things? Say you want to do something special in your cloud —
call out to another database, or call out to another web service. Without a well-defined workflow, it's hard to know exactly what to do, when to do it, and what the side effects will be. Things like that have been necessary at Yahoo for doing various legacy integrations, so we've had to have the human-resource knowledge to go into the code and understand it at a level deep enough to make those alterations. That's not something all the different companies that use OpenStack can afford — that kind of internal knowledge, requiring people to go deep into the code to figure out how to add different things on.

As we discussed, just doing upgrades has been problematic for Yahoo. I know certain other companies have different strategies around upgrades — some of them actually involve rebuilding a new cloud. We've been trying not to do that at Yahoo, because it seems like a waste of resources, and it is; but the alternative is riskier. So it would be nice to have this kind of state consistency, and a helper library to make the upgrade process more streamlined and require less manual work. If we can do that, in my opinion we can move to live upgrades along a much easier path. It will eventually become a lot easier, and I think it will become routine for people to do upgrades in a live manner, following a standard set of guidelines. These are all important things that we have to get around to in OpenStack.

The other reason, for me, is that we can build a system that does all this. I believe we can. I know it's not necessarily easy in open source to get agreement on all these various things, but I think we're at a time in OpenStack where the foundation is stable
enough, and we all understand where it's at, that we can start to make these kinds of alterations to make the system more reliable. Now that we know what it means to boot a VM and how to use the cloud in a dynamic way, we can start focusing on the problem areas with reliability that we want to address: how do we avoid corruption of states that we can't recover from? At Yahoo — from our operational standpoint, and I'm coming from the developer side — we want as many nines as we can get. If you ask yourself how many nines OpenStack has when it can't do a live upgrade, the number of nines gets pretty small when you have to take it down — unless you use special upgrade strategies, like bringing up new clouds and migrating over. So we want to make this pretty simple, and we want to increase the reliability of the whole system in general. It's a win for everybody in the end.

Then there's the question of how we get there, and also what kinds of things a library like this can bring along if we open those doors. When you have a thing like TaskFlow, which I'll get into soon, I think we can enable things that just haven't been thought about before, and bring unique features to the various OpenStack projects that are being talked about in the design sessions right now. If you were active yesterday, there was some Nova discussion about tasks — how to view the various tasks, how to do recovery from them. So these concepts are popping up, and they're good, and the new doors we can open for different projects will be pretty powerful, I think, in the next year, and very useful for different operational capabilities.

At this point, I guess
I'll go into what TaskFlow is, what its goals have been for the last five months, who's been working on it, where it's actually being used, and how it operates.

A little bit about it: we've just recently released a PyPI version of the library. It's been developed by Yahoo, Grid Dynamics, Rackspace, and various other companies. There's an IRC channel that I started and have been active in, and there's a pretty detailed level of documentation that I've been trying to keep up, to make sure the foundation of this library is very well thought out. So there's lots of good stuff I've been trying to follow with respect to practices and involvement of the community in general.

There are certain things, though, that it's not — and here's a summary of those. Currently, it's not trying to be a web service with an API in front of it. There are some sessions — in the unconference, perhaps, I'm not a hundred percent sure — about Mistral, a project that was just announced two or three weeks ago, which is trying to provide a more general approach around workflow-as-a-service. There are sessions on that, and there's a link off the main OpenStack page about it. TaskFlow is also not going to solve everything: there are still programmers involved, there still has to be some coding, so you won't get your rainbow pony out of it automatically. You still have to do careful coding, but hopefully it helps make that carefulness a little easier, and helps you understand some of the basic principles around reliability and recoverability.
So that's what it's not, and there's some good stuff that will happen, I think, in Mistral and a bunch of other projects around these concepts.

Now for the foundational concepts, which in my opinion are the basic frame of TaskFlow. If you look at most code, it has some kind of structure, right? Nova Compute booting has a structure that spans across three different components: the API, which does some validation and creates a database record; then the scheduler, which makes a placement decision; and then the compute node, where the work splits into setting up the image, setting up the volume, and so on. It goes through this structure — you can think of that as your house's frame, or your application's frame, in a way.

Where TaskFlow starts to differ is that the execution of that workflow, or structure, is controlled by a concept in TaskFlow called an engine. There's a gain you get when you control the execution: you can resume that execution later — say, after a kill -9 of the process that was running the workflow, you can pick it back up and continue working on it. That eventually gets you to an upgrade strategy that doesn't involve manual labor to figure out what happened and what went wrong.

There's also the concept of persistence. This connects to the last area: to be able to resume something, you need enough persistence to know where you were and what was executed, so you can pick it back up. In certain cases, with a simple persistence layer, you can persist, say, a volume ID, and then look up what the last thing you were doing with that volume ID was.
Certain things you can't persist, though — if you think of the programming world, think of an argument to a function that's a live resource. So we've been working through some best practices, which I'll get to later, about certain patterns that I think will help you organize your code in a way that works in this manner.

Connected to that is the concept of work recovery. Something that's done differently in various OpenStack projects is the question of how do I shut off a process in a way I can recover from, and how do I handle failures from those processes? Think about a Cinder volume creation. It goes through similar steps as a Nova instance creation, in that it eventually gets down to a stage where it calls into the driver layer, and it could potentially fail at that stage. So what do you do about the failure case in that scenario? In Cinder they do a similar thing as Nova, where they reschedule — but if you don't spend a lot of time looking at the code, you won't understand that that workflow is even occurring.

The TaskFlow approach would be: you form a bunch of tasks, as I can show in this diagram. You form this upper structure — which I would call the frame of your workflow — and you break it into small pieces. You can think of this one as being, maybe, "format your volume," and this one as "place your data on the volume," and then you provide various inputs and outputs to drive the controlled execution of the workflow. This is the top-level, high-level interface
that the task API provides. There's the code for this — this is where that's taken from — and as a library user you just construct this thing that will be executed. It's not actually executed at the point you define it; it's executed later, via this concept down here called a flow. It was going to be called a "workflow" — we had naming issues at the beginning of TaskFlow, where we wanted to make sure we didn't overlap with too many words already in use in various projects, so we went with "flow." You can think of a flow as just a bunch of tasks that have some connection between them. Call it a workflow if you want — but you didn't hear that from me.

So you provide that kind of information to this concept of an engine. The engine is the controlled execution layer. It has a compilation mechanism, which is pretty basic: at least in some of the engine types, it translates the flow into something simpler, but it's up to the engine how to do that. Then it has this concept of running. Running is the important one: it actually executes all the things you've set up for it to execute, in whatever order you've defined. That then goes down to this stage down here, where there's a set of state transitions the engine goes through. Some of them are listed here, taken out of the source code — there are actually more — but you can see there's PENDING to RUNNING, RUNNING to SUCCESS, and the failure states. These allow you to hook into that mechanism: you can actually watch what's happening to the workflow as it's going on.
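The engine idea described above can be sketched in plain Python. This is a minimal illustration of the concept only, not the real TaskFlow API (all class and task names here are made up): tasks declare what they require and provide, the engine picks a runnable order from those declarations, and listeners observe the PENDING → RUNNING → SUCCESS/FAILURE transitions.

```python
# Toy "engine": controlled execution of declared tasks, with state
# transition notifications. Illustrative only -- not TaskFlow's API.
PENDING, RUNNING, SUCCESS, FAILURE = "PENDING", "RUNNING", "SUCCESS", "FAILURE"

class Task:
    def __init__(self, name, requires=(), provides=None, fn=None):
        self.name = name
        self.requires = tuple(requires)   # declared inputs
        self.provides = provides          # declared output
        self.fn = fn

class Engine:
    def __init__(self, tasks, listeners=()):
        self.tasks = list(tasks)
        self.listeners = list(listeners)
        self.storage = {}                 # results produced so far

    def _notify(self, task, old, new):
        for cb in self.listeners:
            cb(task.name, old, new)

    def run(self):
        remaining = list(self.tasks)
        while remaining:
            # pick any task whose declared inputs are all available
            ready = next(t for t in remaining
                         if all(r in self.storage for r in t.requires))
            remaining.remove(ready)
            self._notify(ready, PENDING, RUNNING)
            try:
                result = ready.fn(*(self.storage[r] for r in ready.requires))
            except Exception:
                self._notify(ready, RUNNING, FAILURE)
                raise
            if ready.provides:
                self.storage[ready.provides] = result
            self._notify(ready, RUNNING, SUCCESS)

transitions = []
engine = Engine(
    [Task("create_db_record", provides="volume_id", fn=lambda: 42),
     Task("format_volume", requires=("volume_id",), provides="device",
          fn=lambda vid: "/dev/fake%d" % vid)],
    listeners=[lambda name, old, new: transitions.append((name, new))],
)
engine.run()
print(transitions)
# [('create_db_record', 'RUNNING'), ('create_db_record', 'SUCCESS'),
#  ('format_volume', 'RUNNING'), ('format_volume', 'SUCCESS')]
```

The listener hook is the piece that later lets you ship every transition to a database and report progress through an API, as described next.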
That allows a pretty detailed level of analysis of what's going on in your workflow, which is useful in OpenStack in general — say, for a Nova create, having a way to monitor what the workflow is doing. This is where, if you think about the tasks API — if you've been in those Nova sessions — they want a way to actually see what a workflow is doing. With these state-transition notifications, you can write the notifications out to some other database and then report them back through an API. This is part of the reason TaskFlow doesn't have an API of its own: it exposes the underlying capabilities to get that same information.

On the back end, it also supports a persistence layer, which is internal to TaskFlow but abstracted through an interface, so it can be replaced with other implementations. There's a backend that does the persistence of the various task inputs and outputs, and the state transitions of the whole workflow.
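The persistence-and-resume idea can be sketched like this. It's a toy stand-in for TaskFlow's persistence backends (the real ones live behind an abstract interface and can be database-backed; every name below is invented for the example): each transition and result is written to a store, and a later run skips tasks the store already marks as finished.

```python
# Toy persistence/resume sketch -- illustrative only, not TaskFlow's API.
import json
import os
import tempfile

class Store:
    """Tiny 'logbook': task name -> [state, result], persisted as JSON."""
    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def save(self, name, state, result=None):
        self.data[name] = [state, result]
        with open(self.path, "w") as f:      # write-through on every transition
            json.dump(self.data, f)

def run(tasks, store):
    ran = []
    for name, fn in tasks:
        if store.data.get(name, [None])[0] == "SUCCESS":
            continue                          # already done in a previous run
        store.save(name, "RUNNING")
        result = fn()
        store.save(name, "SUCCESS", result)
        ran.append(name)
    return ran

path = os.path.join(tempfile.mkdtemp(), "logbook.json")
tasks = [("create_record", lambda: "vol-1"), ("format_volume", lambda: "ext4")]

first = run(tasks, Store(path))    # pretend the process dies right after this
second = run(tasks, Store(path))   # "resumed" run reloads the store
print(first, second)               # ['create_record', 'format_volume'] []
```

The second run finds both tasks recorded as SUCCESS and does nothing — which is exactly the behavior you want after a kill -9 mid-upgrade.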
That's useful when, say, the process running this dies: you can tell it, "I want to resume this engine and this flow," provide the information from whichever one of these backends, and it will restart itself, going through the various state transitions to accomplish the work. That's the high-level idea of what the whole structure does.

Since I've been going over these already, we can maybe skip a little of this. But there's the concept, in TaskFlow, of the smallest unit that's possible to execute. You can think of it as almost mapping to a function in Python, but it's this concept of a task object: the smallest thing that does some meaningful unit of work. As you can see here, it differs a little in that it's also expected to be able to revert that same work. In the function-oriented or object-oriented world you have try/except blocks, right? The try block would be the execute part, and the except block would be the reverting block, in a way. That's how it maps onto the try/except concept — except that with a plain try/except you can't control the execution, which is one of the issues with it. A task also receives inputs and has outputs: the inputs so you can take function arguments, and the outputs so you can return some useful result that other tasks can depend on. The declared outputs are sort of interesting for the Python people, because most programming languages have no way to declare what the outputs of a function are in a meaningful way — especially Python, where a return value can be anything. The flow composition then organizes all these tasks into some kind of structure that uses these inputs and outputs to do meaningful work. So, say you do a volume creation.
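A minimal sketch of that try/except analogy, using a made-up volume-creation task (illustrative only — `CreateVolume`, `fake_backend`, and the driver function are invented for the example, and this is not the actual TaskFlow task class):

```python
# A "task" pairs the work (execute) with its undo (revert), like a
# try/except whose execution an engine can control. Illustrative sketch.
fake_backend = {}                      # stands in for a storage backend

class CreateVolume:
    provides = "volume_id"             # declared output other tasks can use

    def execute(self, size_gb):        # declared input, like a function arg
        fake_backend["vol-1"] = size_gb
        return "vol-1"

    def revert(self, size_gb):         # undo the work if a later step fails
        fake_backend.pop("vol-1", None)

def run_with_rollback(task, later_step, **kwargs):
    """Run the task; if a later step fails, revert it (the engine's job)."""
    result = task.execute(**kwargs)
    try:
        later_step(result)
    except Exception:
        task.revert(**kwargs)
        raise
    return result

run_with_rollback(CreateVolume(), lambda vid: None, size_gb=10)
print(fake_backend)                    # {'vol-1': 10} -- creation kept

try:
    run_with_rollback(CreateVolume(), lambda vid: 1 / 0, size_gb=10)
except ZeroDivisionError:
    pass
print(fake_backend)                    # {} -- creation rolled back
```

The point of separating `revert` out, rather than burying cleanup in except blocks, is that the engine — not the task author — decides when and in what order reversion runs.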
You'd want a task that outputs, say, the database record resulting from that creation, and then a further task after that which goes and processes the volume — maybe makes a device for it. So you can organize your workflow in a way that's similar to how you'd organize it in a functional manner.

You can also do different things like dependency ordering, which is a little different — you don't see it in function-oriented programming too much — where you do topological ordering of your tasks. Say you have dependencies among your tasks: A depends on B, and B depends on C. The flow can order these topologically, and the engine running it can automatically infer how to run your workflow in a way that satisfies all the tasks' inputs and outputs. I'm still trying to see exactly where that's useful in OpenStack at the moment — most of the stuff in OpenStack is a linear kind of pattern, where you have, say, Cinder doing a certain set of steps — so we're still working out where it can be useful. I think it's useful if you look at how Heat does its stuff: they do more advanced topological ordering than most projects in OpenStack currently do. We're trying to create a small set of patterns that will be useful for different projects; that's one of them, probably more useful for Heat than others, if they wanted to use it.

So, the engine use case. What engines do, as I described a little before, is go through these different state transitions — there's a link on that, which I can share later, that describes exactly what the transitions are, so they're pretty well defined. This is one of the things I want to make
sure happens in TaskFlow: instead of having to go to the code to figure out what's going on, there's pretty good, up-to-date documentation on what the state transitions are. So there's a well-defined set of state transitions — some of the basics you saw before were PENDING to RUNNING, RUNNING to RESUMING — and once you define those, and you have an engine layer that knows how to use them, you can bring along these resumable behaviors that are pretty useful for projects to recover from upgrades and from the various failures you used to have to handle manually.

[Audience question: on failure, does it revert the completed tasks?] Yes, it will revert — it handles that automatically. We're also working on having that be a defined strategy, because sometimes you don't want to revert through all of the tasks; you want to go back only to a certain point, to what we've been calling a checkpoint: go back to the last checkpoint, and maybe collapse the different tasks into one reversion task instead of running many. Right now we've focused on the simplest strategy, which is just to go backwards through all of them.

[Audience question.] That's also something we've been thinking about — it's in the same strategy area we're working on. Yes, we've been working on that a little bit.

[Audience question: do tasks have to be idempotent?] I think they can be, but they don't have to be — I don't think we're forcing it, because I don't think you can force it. In something like OpenStack it's hard to pull off idempotency in a pure manner at this stage. If it had been forced from stage zero, maybe you'd have better chances — but then I probably wouldn't have had to develop this library. So no, we don't force it.
It's hard to force it when there are so many diverse workflows already in place. Question in the back?

I'll repeat the question. The question was: will Mistral be built around this, while providing the REST API on top, and how much of it exists right now? That's the general gist. Mistral is still in the design phase, so it doesn't actually exist yet. I think they're trying to make sure the use cases for it are really well understood and really well documented, but I've been interacting with them around using TaskFlow as the underlying layer. We're going to see how that goes. To me, it's more important that they define the limited set of what they want to accomplish, because you can think of this as just a general programming paradigm and say, "we're going to do anything with it." So to me it's really important that they have been, and are, working on a very well-defined set of use cases, and trying not to overlap with too many different projects. Eventually, I think, they will provide that API around workflows; what exactly it will be is still to be decided.

[Audience question.] Let me see if I understand: is it targeted at any use case, or a limited set? As an OpenStack project, it's more targeted toward the use cases in the various OpenStack projects, because that's where I'm trying to aim first and foremost.
In general, though, I've tried to keep it disconnected in a way that you can actually use it outside of OpenStack, so I think it's applicable to both. I know there are certain companies using it for internal projects — AT&T, for instance; I've talked with some folks from there, and they're using it for something internal, though I don't know exactly what. So it's not strongly connected — it doesn't require that Nova be there. It's a library; it's a paradigm, in a way, so it doesn't force anything.

Question? So the question was: is TaskFlow similar to the Amazon Simple Workflow Service? I would say no. Mistral, maybe — I don't know, that one's TBD. The Amazon workflow service is basically a REST API that you provide a workflow definition to, and it activates that for you. TaskFlow isn't designed to be at that level. It could be something that's used by Mistral, or something else, to accomplish that — but it's not targeted to be that. If people use it for that, then sure.

[Audience:] I see that you're providing for parallelism here, which is a really good thing for scale, but I'm wondering if you include things such as exclusion or synchronization, which become essential when you're doing distributed tasks. — There's a plan to have some kind of synchronization mechanism, although that's a heated debate right now in OpenStack. I'm letting it settle down a little in various other places first, and once they work it out, I'm just going to piggyback on their stuff and see how that goes.
[Audience:] It might be more useful for you to be a little more aggressive in informing them. — I'm pretty sure they're well aware. I think it's a known problem, a known thing being worked on in OpenStack — that kind of distributed system and how to do synchronization around it. It's not just Heat — I'm not aiming at Heat or anything; they're just the first ones trying to figure out how to fix this in a way that works. [Audience:] Sitting in the Neutron sessions, they keep mentioning their races. — Yes, it's not just Neutron; I think a lot of places are starting to realize the issue, and Heat is the first one, in my opinion, to actually start attacking it. So, good work, Heat people, for being out in front.

Any more questions? [Audience:] At what points do you write to the database for persistence? — Currently, at every state transition. The tasks are backed by the database and have an associated state, so at every state transition of a task that the engine moves through, it writes to the database. Yes, it's going to be a little database-heavy there. And when a task returns some information — say it returns a string — that gets written back to the database too, so it can be used later. Just keep in mind this is probably not the best way, and that's why we want to bring in this concept of checkpointing, where we can collapse some of the information being stored. [Audience:] So does each task have to be idempotent, because you'll restart that task later, right?
— No, it's going to skip over a task that's already run. [Audience:] But if you're halfway through a task and it left behind some state, and then you rerun it, it'll have to know that. — Yes: if a task fails halfway through, the question is how to restart that one. So the tasks have to be idempotent, but the whole workflow doesn't — yes, that's a good characterization. In general, I think you have to prepare for that in code no matter what, right? [Audience:] Well, in your volume example: I called the driver and then it failed, and the storage might have made the volume, or it might not have. So the driver now has to know that the volume might already exist, and that that's not necessarily an error. — Sure. So if a task is idempotent, that definitely helps in resuming; the whole structure isn't necessarily idempotent in OpenStack, and this will help out in that area. [Audience:] Does the rollback save you there? — I think it does save you; it's just that a simple rollback strategy may not be suitable for everyone. But this question around the drivers is a good one, and it's something ongoing in Cinder: the driver layer, especially once things go into it, is almost a black box. It's hard to pull off almost any kind of recovery at the driver layer without having a pretty well-defined state machine for what's supposed to go on inside the driver.
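For illustration, a well-defined driver-layer state machine can be as simple as an explicit transition table that rejects undefined moves. The states below are invented for this sketch — they are not Cinder's actual volume states:

```python
# Explicit state machine sketch: undefined transitions fail loudly instead
# of silently putting the driver into a state the manager can't recover from.
ALLOWED = {
    "creating":  {"available", "error"},
    "available": {"deleting", "attaching"},
    "attaching": {"in-use", "error"},
    "error":     {"deleting"},
}

class VolumeStateMachine:
    def __init__(self, state="creating"):
        self.state = state
        self.history = [state]

    def move_to(self, new_state):
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError("invalid transition: %s -> %s"
                             % (self.state, new_state))
        self.state = new_state
        self.history.append(new_state)

sm = VolumeStateMachine()
sm.move_to("available")
sm.move_to("attaching")
try:
    sm.move_to("creating")        # undefined: drivers may not go backwards
except ValueError as e:
    print(e)                      # invalid transition: attaching -> creating
print(sm.history)                 # ['creating', 'available', 'attaching']
```

With a table like this, the manager layer can also answer the earlier audience question cheaply: if the recorded state already implies the volume exists, "already exists" stops being an error and becomes a resumable condition.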
So there's ongoing work, at least in Cinder, from what I've been discussing with Duncan and various other people on the Cinder team, about how to formalize that state machine so you can actually do that kind of recovery. There are tasks being worked on for Icehouse to make that happen. The driver layer is an interesting one; it's a problem in Cinder and, I think, in a lot of projects, that you sort of don't know what happened after you make the function call. It just happened, and you hope that whatever it did worked.

The next question was: in that particular respect, has there been any talk about having the driver simply contribute subsections of the workflow, so the whole thing is wholly contained inside one workflow? There has, but there's resistance, in that it changes a lot of the driver code; it sort of flips it on its head. So I think there's potential there, maybe in the future, but it changes so much of the current code, and that's the hard part. The people I've talked to agree we could do it if the driver just returned what it's going to do, so you could control that; but the way the current drivers work, they're more of a black box. So it completely changes things, but in the end I think it will probably end up like that. Time will tell. Cool, thanks.

So, as we were talking about the engine that actually runs these workflows, let me see where I left off. There are currently a couple of limitations of these engines. If you think of the concept of controlled execution: once you control the execution, you can actually run that workflow in various manners. Oh wait, one second, I have a question.
All right. You mentioned upgrades as an important use case here. If you are using TaskFlow and you want to change the workflow in your version 2.0, what happens during an upgrade? So, it's a good question. We're putting in some hook-in points, and it goes into almost the same area as database migrations; not the full set of that, but a smaller set. If you want to do an upgrade of a version of a workflow, you can hook in, analyze what was executed, interrogate it, and say: I don't want to execute that same thing again, or I want to add new tasks onto that workflow. It's an area I've written up; there's some documentation I can point you to at the end about our current thinking on how to do that, how to change a workflow if it's being upgraded or modified during a software upgrade (that is, after you've done the software upgrade).
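The upgrade idea described here, interrogating the record of what the old workflow already executed and only running what the new version adds, could be sketched like this. The function and task names are invented for illustration; this is not TaskFlow's actual upgrade hook:

```python
# Hedged sketch of the migration-style upgrade idea: given a record of
# what version 1 of a workflow already completed, compute the work that
# version 2 still needs to run. All names here are illustrative.

def remaining_work(v2_task_names, executed):
    """Return the v2 tasks that still need to run.

    v2_task_names: ordered task names in the upgraded workflow.
    executed: set of task names the old workflow already completed.
    """
    return [name for name in v2_task_names if name not in executed]


# Version 1 ran these before the software upgrade...
executed_v1 = {'allocate_ip', 'create_volume'}
# ...and version 2 of the workflow adds new tasks.
v2 = ['allocate_ip', 'create_volume', 'tag_volume', 'boot_instance']

todo = remaining_work(v2, executed_v1)
# todo == ['tag_volume', 'boot_instance']
```

Renamed or reordered tasks need more care than this (the analogy to database migrations breaks down there), which is part of why it's described above as not simple stuff.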
So there has been some thinking about how to do it, and how to do it in a way that doesn't actually fail. It's not simple stuff, but there has been some thought about it. It's almost similar to database migrations in a certain way, though more complicated than that; that's the gist of it, and we can maybe talk after if you want. Cool.

So, back to the concept of this execution layer. You can define your workflow, and that's separate from how you're executing the workflow. One of the things we worked on in the Havana release was these basic ways of executing things that matched how they can be used in various OpenStack projects. To start off, you may not want a complicated way of executing things, so you may want to use the first two listed types here. At a more complex level, you want a way to actually distribute that workflow across various machines. There was some work done by some Rackspace folks and others, as well as me, to start working on that and continue evolving it. It's a much more complicated piece, because you have to worry about RPC timeouts and about how the coordination between all the different tasks that are running works. So that's a work in progress, but I think it's the most powerful one, and it will help a lot of the different OpenStack projects distribute work in a way where they don't have to repeat doing that in every single project. Glance has this glance-worker concept that they want, and I think Heat has a heat-engine concept as well.
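The separation between defining a workflow and choosing how it executes can be illustrated with a small sketch: the same independent tasks handed to a serial runner or a thread-pool runner. The runner functions are illustrative, not TaskFlow's engine classes:

```python
# Sketch of "define the workflow once, pick how it runs": the same
# independent tasks can be executed serially or on a thread pool.
# Function names here are invented for illustration.

from concurrent.futures import ThreadPoolExecutor

def run_serial(tasks):
    """Execute callables one after another, in order."""
    return [t() for t in tasks]

def run_parallel(tasks, workers=4):
    """Execute callables on a thread pool; map preserves result order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: t(), tasks))


tasks = [lambda: 1, lambda: 2, lambda: 3]
serial_results = run_serial(tasks)
parallel_results = run_parallel(tasks)
# For independent tasks, both strategies produce the same results.
```

A distributed engine is the same idea taken one step further, with workers on other machines instead of threads, which is where the RPC timeout and coordination problems mentioned above come in.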
They want to distribute work as well. So there's this repetition of work, and I think if it's not repeated, and there's an organization around it, you can use a standard mechanism and not have to worry about it. I doubt that Glance really wants to maintain a worker concept and have a deployment strategy that accommodates it. Having this distributed engine as a central library that tries to provide a decent-enough-for-everybody mechanism, if that's possible, will I think be pretty useful for the various projects. That one is of course still a work in progress, but we're hoping to get something working for some basic scenarios in Icehouse, get some more feedback there, and see how that goes. The first two engines exist right now, and we're working on improving them and getting them integrated into various projects.

So, back to the persistence layer a little bit. This is an interesting one; it describes a bit of what we're persisting and why it's so useful in general. As described here, it saves the task state, the progress, and the results of the tasks to some persistence layer. It doesn't have to be a database; the ones we implemented for Havana are a database, a filesystem, and local in-memory for testing and those kinds of use cases. Once you have that persistence layer, you can actually reconstruct the workflow and start resuming it, and of course that brings us to this question of how you upgrade a workflow now that you've paused it and maybe shut off the software. What do you do when it starts back up?
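The pluggable persistence idea, where the engine only talks to a small save/load interface so that memory, file, and database backends are interchangeable, might be sketched as follows. The interface is invented for illustration, not TaskFlow's real backend classes:

```python
# Sketch of pluggable persistence backends: the engine only needs
# save() and load(), so in-memory (for testing) and on-disk backends
# are swappable. Class names here are illustrative only.

import json
import os
import tempfile

class MemoryBackend:
    def __init__(self):
        self._data = {}
    def save(self, key, value):
        self._data[key] = value
    def load(self, key):
        return self._data.get(key)

class FileBackend:
    def __init__(self, path):
        self.path = path
    def save(self, key, value):
        data = self._read()
        data[key] = value
        with open(self.path, 'w') as f:
            json.dump(data, f)
    def load(self, key):
        return self._read().get(key)
    def _read(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


mem = MemoryBackend()
mem.save('create_volume', 'SUCCESS')

path = os.path.join(tempfile.mkdtemp(), 'flow.json')
FileBackend(path).save('create_volume', 'SUCCESS')

# A "restarted" process reconstructs state from the durable backend.
restored = FileBackend(path).load('create_volume')
```

The in-memory backend obviously cannot survive a restart, which is why it's only useful for tests; the durable backends are what make resumption possible.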
So all those kinds of questions come into play once you have that capability, and I think it's good to have those questions in general, because otherwise this hasn't really been worked out. The other interesting part, which is being asked about a lot in the various projects and even at this design summit, is: I want to see a task API, I want to see the history of what was running and what happened, a play-by-play of what's going on. If you try a nova boot right now, you'll see that it goes through a limited set of states, and it's not easy to figure out what's going to happen, what may happen, or what the failure mode was. When you have this kind of persistence, you can expose it via an API to say what your tasks are doing, or what failed and what the failure was. That's useful for internal usage, in that we can undo the chain of actions that occurred, but it's also useful for users to see what happened, the progress that was made, and the whole workflow cycle. Those kinds of things are very useful, in my opinion, for internal and external usage by the various users of TaskFlow.

Another concept, which I developed in the early phases but which isn't all there in the current release, is this concept of a higher-level job that gets executed. If you think of a nova-compute run-instance call as being a top-level entity, where you basically want to create a VM, that goal is itself accomplished by a derivative set of tasks or workflows. Right now, if you look at the different mechanisms by which that's executed in the projects, you'll see that it spans about three different components.
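Once transitions are persisted, the play-by-play task API described here mostly amounts to reading the record back out. A rough sketch, with an invented record format:

```python
# Sketch of the "play by play" task API idea: with every state
# transition persisted, history and failure queries are just reads.
# The record format and function names here are illustrative only.

transitions = [
    ('allocate_network', 'RUNNING'),
    ('allocate_network', 'SUCCESS'),
    ('boot_instance', 'RUNNING'),
    ('boot_instance', 'FAILURE'),
]

def history(transitions, task_name):
    """Return the ordered states a single task went through."""
    return [state for name, state in transitions if name == task_name]

def failures(transitions):
    """Return the tasks whose last recorded state was FAILURE."""
    last = {}
    for name, state in transitions:
        last[name] = state
    return [name for name, state in last.items() if state == 'FAILURE']


boot_history = history(transitions, 'boot_instance')
failed = failures(transitions)
# boot_history == ['RUNNING', 'FAILURE'], failed == ['boot_instance']
```

The same record that answers "what failed?" for a user is what the engine walks backwards to undo the chain of actions internally.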
It's sort of hard to follow, but those are all things this is going to help with. In an ideal world, if you know that set of workflows and tasks and you have that top-level connector, the run-instance request, you can actually transfer that top-level connector immediately if it fails. This is where an interesting concept comes in. Say nova-compute, or something else that's relatively easy to start off with, fails on a worker: the worker gets disconnected from the network, say, or the power cycles. With this mechanism called a job, and a job board (which is similar to the little picture down here), you can repost that job back to the job board (you can think of it as a queue, in a way) and allow another entity to start resuming that workflow. When you have the workflow concept, you can just restart it and hope that it works on the next worker, or you can undo it on another worker. So it helps in the scenario where a nova-compute or some other entity fails and you want to continue working on some job in a way that's highly available. The whole details around this are still being worked out, and I'm still prototyping it a little bit, but I think the concept is pretty useful: it lets you perform tasks, which may or may not be idempotent, in a way where if they failed on one worker, you can continue working on them on another worker, or undo them, whatever is appropriate. There's a ZooKeeper kind of way to do this, and various other backing implementations. If you look at the mailing list recently, there's some ZooKeeper stuff, so I'm letting that filter out a little bit before that code goes into TaskFlow.
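The job and job board idea, where a job abandoned by a dead worker becomes claimable again by another worker, can be sketched like this. A real implementation would sit on ZooKeeper or a similar coordination service; everything here is an illustrative stand-in:

```python
# Sketch of the job/jobboard concept (illustrative names only; a real
# version would use ZooKeeper or similar for coordination): a failed
# worker's job is reposted so another worker can claim and resume it.

class JobBoard:
    def __init__(self):
        self._unclaimed = []
        self._claimed = {}   # job -> worker name

    def post(self, job):
        self._unclaimed.append(job)

    def claim(self, worker):
        """Hand the oldest unclaimed job to a worker, or None."""
        if not self._unclaimed:
            return None
        job = self._unclaimed.pop(0)
        self._claimed[job] = worker
        return job

    def abandon(self, job):
        # Worker died or lost its connection: make the job claimable
        # again so another worker can resume (or undo) the workflow.
        self._claimed.pop(job, None)
        self._unclaimed.insert(0, job)


board = JobBoard()
board.post('boot-instance-42')

job = board.claim('worker-a')     # worker-a picks it up...
board.abandon(job)                # ...then dies mid-flight
job2 = board.claim('worker-b')    # worker-b resumes the same job
```

Combined with the persisted workflow state, the second worker can resume from where the first one stopped rather than starting over, which is what makes the whole thing highly available.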
So that's another one; I'm thankful for these folks in the front pushing the boundaries a little bit. We'll see how that works out and which backing implementation it finally becomes, but the concept, I think, is still useful.

So, what exists? This whole TaskFlow library started about five months ago, or around then. What we've got so far, in this release (what I call the 0.1 release, which was about a week and a half ago), contains the abstractions around the tasks and the workflows, the way to connect those together, the way to resume them, the basic persistence layer, and what I call the local engines, which are the non-distributed ones that were listed on that table. I'm pretty proud that we've tried to make it really well documented: how to use it, some examples, the state transitions. These are all pretty basic concepts, and if they're not well understood, it just makes the library really hard to use, so I've tried to make sure these are all in the wiki; wiki.openstack.org/TaskFlow is one way to get there. Once we release these slides, you can all click on those links, too.

So, what's missing, the stuff that's being worked on or under construction? As was mentioned, the distributed engine: we want to get something going there that will satisfy a basic set of needs, something simple. I hope to start, or continue, the work that was done by others, so various projects can use it and don't have to rebuild the same mechanism.
That will help out with Glance, I think, once it's there, and we'll need to see who else needs it or desires it; I think various projects want it. It's going to be an interesting one to work on, to see how we can make that possible. There's this locking service, which I'm still unsure is needed or not, but at the lowest level of TaskFlow it may be needed; that one's a work in progress as well, and I'm letting the Heat discussion filter out a little bit before I do more on that. There's the ZooKeeper storage layer as a backend to the persistence layer; we have pluggable backends for where the state is persisted, and there's going to be a little bit of work, I think, in having a ZooKeeper backend there as well. And there's the job and job board concept, which I've been working on this week and last week, working through it and seeing how it's going to turn out. Those, to me, are not key primary things that were needed in Havana, or even Icehouse; these are more additional things that I'm hoping will happen for the next release, 0.2, or whichever version we officially name it.

So, yeah, maybe I can click on some of these; let's see if this will work, and I'll show you some little examples. There are a bunch of examples up there that I put up. Oh, time's up? Well, that was fun; good work, guys. Okay, let me just say one last thing. Okay, yeah, here's just