Hello friends, good morning, and welcome. My name is Zane Bitter; I'm one of the core developers of Heat. My co-conspirators this morning are Renat Akhmerov from Nokia, the PTL of Mistral, and Feilong Wang from Catalyst, the PTL of Zaqar. We're here to talk to you today about how to build a self-healing application with Aodh, Zaqar and Mistral. There will be a live demo at the end of the session, so if anyone came with hopes of seeing the live demo crash and burn, you'll have to wait for the end of the session, I'm sorry. But stay tuned, because I think we've got some good stuff for you in the meantime. Also, if anyone in the audience has a dead chicken, could you please come see us up front? So these guys are going to talk to you a bit about what Aodh, Zaqar and Mistral are, but the one-word answer is that they're the alarming, messaging and workflow services for OpenStack. I'll let them fill you in with more details. Before then, though, I wanted to talk to you all about the big picture here, and why. What we've done in the Newton cycle is build some integration between Zaqar and Mistral. Credit for that goes to Feilong and Renat, along with colleagues from my team at Red Hat, who implemented it. We think this is a big deal, and we want to tell you why. To do that I'm going to start at the beginning, by answering the big questions: why are we here? And I'll limit myself to why we're here at the OpenStack Summit. What are clouds?
I'm sure all of you working on OpenStack have had the experience of trying to explain what OpenStack is to a layperson. One time I had to try to explain it to an economist, which got me thinking about how we could explain OpenStack and cloud in economic terms. The answer I came up with is that it's the culmination, so far, of a trend toward reducing the transaction costs that prevent us from utilizing our resources efficiently. Cast your mind back to the bad old days, when you had to buy physical servers to run your application on. If you wanted to deploy a new application, the first thing you did was call up your server vendor and say, hey, I need some servers. And he'd say, sure, we can build those for you; it'll take a couple of weeks and then we'll put them on a truck and ship them out to you. Or really it was purchase orders: sure, I'll fax that right over to you. Primitive stuff. Once the servers finally arrived on a truck, your IT department would have to rack them, hook them up to the network and so on, and finally you could deploy your application. This whole process could take a month if you were lucky. The result was that you bought a lot more servers than you needed, and you bought them a lot earlier than you needed them, because otherwise, if you had a load spike or something, you couldn't handle it; your whole feedback loop here is a month long. The thing that really changed this in the industry was virtualization. With virtualization, your developer can go into your ticket tracking system (I was going to put an open source project logo here, but let's face it, it's probably Jira) and open a ticket with your IT department saying, hey, I need a server. Your IT department could go into oVirt, because open source.
Yes: provision your server, give it back to the developer. Success, right? We've reduced the feedback loop. We've taken the really slow things like trucks out of the loop, and we've reduced it to where this could easily happen in one day. So where before it took a month and you might go through that process once every six months, here it takes a day and you might go through it every few weeks. That's great. But we still have a problem: we still have a person in the loop, and that's slowing things down more than it needs to. That's where OpenStack comes into the picture. One day I'll figure out which direction is which on this clicker. So those tickets become self-service, right? Your developer can go to OpenStack, say get me a VM, and OpenStack returns it straight away. There's no waiting, and you can easily do this 20 times a day without a second thought. So that's great. Success, right? We took the person out of the loop. Or did we? Because I still see one there: there is a big loop, and there is a developer sitting in it. You may have noticed that I did not put a title on this slide. I think a lot of people would say this is cloud: we've got OpenStack, we're running a cloud, right? To me that's not really the case. If you put this up as a service on the internet and said, yeah, I'm a cloud, people would kind of laugh at you a little bit. This is not comparable to AWS or Azure or Google Compute Engine; it's more comparable to a VPS hosting service like DigitalOcean or something like that. It's great.
It's better than the non-self-service version, but it's not really a cloud, to my way of thinking. Everyone's entitled to their own definition of cloud, but the working hypothesis I'm sharing with you today is this: cloud is when the application itself is managing its infrastructure. The developer might initially deploy the application, but once it's deployed, the application itself decides when it needs more servers, it decides when it needs fewer servers, it decides when a server has died and needs to be replaced. And all of this can happen autonomously, not only during the eight hours of the day when you have a developer sitting in front of Horizon clicking on things. So that's my working definition of a cloud. The key thing about this is that you really, really, really want it to be open source, because your application is now relying on the cloud APIs; they're part of the platform against which your application is running. We saw in the 90s what happens when you write your application against proprietary APIs. You don't want to go back to that. I think OpenStack is the only project that can be this thing right now, so I think it's really critical that we support this in OpenStack, and that it gives people the option to write autonomous applications against open-source clouds. By the way, if you're a higher achiever, you can replace the developer in this picture with a continuous delivery workflow; that's the Jenkins logo there. You can have Jenkins deploying to OpenStack. There are a couple of other options too: Zuul, which the OpenStack Infra team has written, is also a good way to do continuous delivery, and the Solum project in OpenStack is aiming at the same kind of thing.
So that's what we're trying to achieve here. There are two challenges with having your application talk to the cloud. And by application here I mean not necessarily just code running on a server or in a container or whatever; the cloud services, like Aodh and Zaqar and what have you, become part of the application too. Your application could basically be defined by, say, a Heat template, rather than thought of as a software package. The two big challenges are these. First, when the cloud is talking to your application, we need that to be asynchronous, because the cloud can't wait for your application. Since we made it self-service, the cloud has to be multi-tenant, so it can't block on any one tenant, and that communication has to be reliable, because you don't ever want to lose a message. Zaqar is the answer to this, because it is an asynchronous queue with reliable delivery for OpenStack. So that's that problem solved. The other half of the problem is that you want all your calls authenticated, of course, and you want to lock down the authorization to only the things you actually need to do. The first part of this is when the OpenStack services are talking to each other on your behalf, and we're using Keystone trusts for that. The other part is when you have code actually running on a server or in a container or whatever, and that code has to call the OpenStack APIs. That has historically been a bit of a problem, because we don't really have a good way of creating credentials for a server and locking them down to only the APIs we want it to access. That work is ongoing, and we're hoping to push it in. When I made the slides I was planning to tell you that this was still a work in progress and we'd have to wait for it, but in the process of preparing the talk I realized that we've actually solved this problem as well, with the Zaqar-Mistral integration. So I'll talk
about that in a moment. So here's what we're trying to build, or in fact what we have built. Zaqar is in the center there, and it's the hub: we've got messages going through Zaqar, and there are various sources of messages and various sinks where the messages can go. For example, Heat is able to send Zaqar notifications on certain events, like when you hit resource hooks: you can set a hook in your Heat template saying break before this resource is created or updated, and Heat can send a Zaqar message at that point. Ceilometer and Aodh also send notifications. You can trigger alarms off of notifications (they're called event alarms in Aodh), and there are of course other types of alarms in Aodh, based on whatever data Ceilometer has collected. All of these can target Zaqar as a message queue and pump messages into it. The application can also send messages to Zaqar, and in fact, as I was mentioning before about authentication, there are pre-signed URLs in Zaqar: you can give your application code running on a server a pre-signed Zaqar URL, and the only thing it can do with that is pump messages into that one Zaqar queue. Then we have the message sinks. The main one we want to talk about today is Mistral, and that's what we've implemented in Newton: the Mistral message sink. Basically, messages coming out of a Zaqar queue can trigger a workflow in Mistral to start. And that's how it also solves our authorization problem: you can only pump messages into one Zaqar queue if you've got the pre-signed URL for that queue, and the only thing that can happen as a result is running that one workflow. So it actually gives you an extremely fine-grained way of locking down the permissions to running that exact workflow. That's basically the system.
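To make the event-alarm idea concrete, here is a rough sketch of what the body of an Aodh event alarm could look like, rendered as YAML for readability. The queue name and instance ID are hypothetical, and the exact field names and the alarm-action URL scheme should be checked against the Aodh API reference before use:

```yaml
# Sketch of an Aodh event alarm: fire when a particular instance
# changes state, and deliver the alarm to a Zaqar queue.
name: instance-down-alarm
type: event
event_rule:
  event_type: compute.instance.update      # Nova state-change notification
  query:
    - field: traits.instance_id
      op: eq
      value: 11111111-2222-3333-4444-555555555555   # hypothetical instance UUID
alarm_actions:
  # The trust+zaqar scheme lets Aodh post to the queue on the tenant's
  # behalf using a Keystone trust.
  - trust+zaqar://?queue_name=self-healing-queue
```

The important part is the pairing: the event_rule filters the notification stream, and the alarm action points at the Zaqar queue that feeds the rest of the pipeline.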
I'll let these guys give you some more details, but basically Mistral can call any API in OpenStack, so this whole loop gives you an extremely flexible way of taking actions based on events in the cloud. We're demonstrating self-healing today. You can totally get self-healing if your application is running on Kubernetes; that's fine, and if that fits your use case, you should probably go buy OpenShift or whatever. But it's not very flexible. A platform as a service gives you "you do it our way and you like it". Infrastructure as a service is extremely flexible; you can do just about anything here. Lastly, before I hand over to these guys, I just want to say that we're turning it over to you all. It's really up to you: we will get the OpenStack that we deserve. It's an open source project, so please go out there, try this out, try the demo out, figure out what your own application needs, start putting together your own workflows and your own messaging setups. Try it, use it, and be vocal in the community. Say: this is what we want OpenStack to be; we want it to be a real cloud, not just a VPS hosting service. If you buy from a distributor, call them up and say, please put more resources into this. Especially if you buy from Red Hat: call them up and say, I saw Zane's talk, why aren't you giving him more time to work on this? If you don't buy from a distributor, one will call you. There are people in the community who think this is a distraction. I don't think it is, and it's really up to the people developing applications on OpenStack, the users of OpenStack, to say: hey, this is something we need, we want to see more of this.
So thank you very much. I'm going to turn it over to Feilong, who's going to tell you a bit about Aodh and Zaqar.

Okay. Next I'm going to introduce a little bit about Aodh. Somebody called it "old", or spelled it out, A-O-D-H, whatever; I wouldn't be surprised if there's yet another pronunciation of it. It is the OpenStack alarming service, and it was split out from Ceilometer. I'll list some Aodh features here, though not all of them, just the ones related to this topic. It supports event alarms and threshold alarms, and for our case we only care about the event alarm: when Ceilometer collects an event from the other services and forwards it to Aodh, Zaqar can then get that event. About one or two cycles ago I was asked to review a patch in Aodh that adds a Zaqar driver. It looked very interesting, because with it you can take the event notifications collected by Ceilometer and forward them to Zaqar. I say I was excited because, in the old way, Ceilometer could collect all the notifications, but those notifications couldn't be consumed by the end user. And when I say end user, I mean the tenant user. I have heard of other services, for example Searchlight, doing a similar thing: collecting the notifications from the RabbitMQ message bus and then forwarding them on. So this is a really good feature, and it's one of the main parts of this demo. Then, what is Zaqar? I talked with Zane when we were trying to propose this idea, because I didn't want to propose yet another Zaqar introduction talk at the OpenStack Summit; we have talked about that kind of topic a lot.
So we would like to do some real things, some interesting things. Anyway, I'd like to give a very short introduction to Zaqar. Zaqar is a multi-tenant cloud messaging service for web and mobile development. You can think of it as the equivalent of AWS's SQS and SNS, the queue service and the notification service. As for the features: currently Zaqar supports a messaging service, which you can call the queue service, and a notification service. The messaging service covers pub-sub and producer-consumer, the traditional patterns, and with the notification service you can create subscriptions on a queue, so that when a message is posted to the queue, the subscribers are notified automatically. For storage we have two layers of database: a management database and a message database. For the message database we support MongoDB and Redis; for the management database we support MongoDB and SQLAlchemy. We also have a transport layer: we support WSGI, the traditional HTTP way, and websocket. For notification drivers, we currently support email. So, for example, when you post a message to a queue, if you have created a subscription on that queue based on an email address, then the message will be sent to that address. We also support webhooks, for sure. And then there's the trust driver. That's my favorite driver: it uses Keystone trusts, and it's the key part of this demo. So now I will pass it over to Renat. Okay. Yep.
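To illustrate the subscription mechanism just described, here is roughly what the body of a Zaqar subscription request looks like, posted to the v2 subscriptions endpoint of a queue. The addresses are hypothetical placeholders; consult the Zaqar API reference for the exact fields:

```yaml
# An email subscription: every message posted to the queue is mailed out.
# A webhook subscriber would instead look like: http://example.com/hook
subscriber: mailto:ops@example.com
ttl: 3600        # subscription lifetime in seconds
options: {}      # driver-specific settings, e.g. email subject
```

Creating one subscription per consumer is what lets a single alarm message fan out to both a human (email) and an automated handler (webhook or Mistral) at once.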
Thanks. So, just to provide a little more detail on the big picture they described, I'm going to talk a little bit about Mistral. Essentially, what is Mistral? Mistral is a workflow service, a workflow engine, or in other words a process automation and state management tool. I'll say a couple of words later about why I chose the phrase "state management tool". Mistral is also a language for designing distributed processes. We call them workflows; "workflow" is a fairly general term, but in our case workflows are essentially distributed processes that involve a lot of things in a distributed environment, a lot of calls. It's also an OpenStack service, of course, with a REST API, and it's scalable; we can scale it almost linearly. What's important in the context of this demo is that Mistral helps wire OpenStack services together, so it works sort of like glue. This is something I'll say a little more about later. So what does Mistral do, essentially? It allows you to describe a workflow template, a workflow file, and you can upload this workflow template into Mistral, into the service. Then you can start this workflow manually, or, in our case, it starts based on some event happening somewhere in the cloud. When it's running, it allows you to track the progress of the process, so that you can see what's already completed, what's not, and what is still running.
That's why I chose the name "state management tool": state is a really important concept here. Because it's persistent, it allows you to do all kinds of monitoring and manual intervention, human intervention. If something failed, you can actually go fix the problem and restart from the same point. That's why I think state is so important here. Eventually, when the workflow completes, you can extract the workflow result through the API, and you can also navigate through the execution history, so that you know exactly what happened before. Just a couple of words about the Mistral language. It's actually pretty simple, a couple of hours to learn, I believe. At least, I'm a little biased, but I really believe that. And it's YAML-based, because we all love YAML in OpenStack. What it allows you to do is design workflows, distributed processes, consisting of tasks. A task is the basic element you build your workflows out of, and you can define transitions between tasks. Essentially, you represent your distributed process, your distributed scenario, as a graph, and when it's running, the workflow engine makes decisions about what should run next, and that shapes the path through the graph as the workflow proceeds. Tasks are associated with actions.
Mistral comes with a lot of actions. As Zane mentioned, Mistral can call any OpenStack service, and this capability is provided in the form of actions. When it comes to wiring multiple OpenStack services together, that's essentially the interesting point: we can flexibly define what should be done on a certain event that just happened in the cloud. Basically, you can build a scenario for how to heal your application, how to auto-scale it, or do something else, and it's extremely flexible; you can have a set of workflows for any kind of situation. When you design such a workflow, at certain steps Mistral can call OpenStack; that's how it helps wire things together. More specifically, Mistral provides you with all the actions, so you can build this kind of graph describing the process, and it combines all these service calls into one distributed process. What's important here is that it helps with passing data between the services. For example, you call one service and you need the output of that service to be the input of the next one; to do that reliably, Mistral is a really helpful tool. It also helps with passing security contexts. As Zane said, that's a work in progress in many ways, but it helps anyway. Basically, it defines the context of this whole scenario, this whole distributed process involving multiple OpenStack calls. And, as I said before, it's also important that Mistral makes this scenario stateful and suitable for monitoring, so you can see the persistent state by connecting to it from any tool, the CLI or something else. And that's basically it. Can we go to the demo?
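As a sketch of what such a workflow might look like for the self-healing scenario in this talk, here is a minimal Mistral v2 workflow. It assumes Mistral's auto-generated heat.* actions are available with these names, and the workflow and input names are hypothetical; treat it as an illustration of the DSL rather than the exact demo code:

```yaml
version: '2.0'

heal_instance:
  type: direct
  input:
    - stack_id
    - resource_name
  tasks:
    # Step 1: tell Heat that the failed server resource is unhealthy.
    mark_unhealthy:
      action: heat.resources_mark_unhealthy
      input:
        stack_id: <% $.stack_id %>
        resource_name: <% $.resource_name %>
        mark_unhealthy: true
      on-success:
        - replace_instance

    # Step 2: a stack update against the existing template makes Heat
    # replace the unhealthy resource.
    replace_instance:
      action: heat.stacks_update
      input:
        stack_id: <% $.stack_id %>
        existing: true
```

The `<% ... %>` expressions show the data-passing point made above: the inputs delivered by the triggering message flow through each task's action call.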
Thanks. Okay, so before we show the crazy demo, let me give a little introduction to how it works. Zane gave a very good diagram of how it works, but basically these are the pre-steps for setting up the application. First, we create a stack in Heat; the stack is your application, for example an auto-scaling group of instances. Then we create a workflow in Mistral based on the stack information, for example the instance ID and the stack ID. Then we create a pre-signed queue in Zaqar, so that we get the pre-signed queue URL, and we create a subscription in Zaqar based on the workflow we just created, so that when there is a notification, Ceilometer will notify Aodh, Aodh forwards a message to Zaqar, and Zaqar notifies Mistral to execute the workflow. The last step is to create an alarm in Aodh based on the pre-signed queue information, so you don't have to worry about the authentication issue. That's precisely how it works. Okay, so first I will initialize the environment, just cleaning up some old resources.
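The stack created in the first step can be as simple as an auto-scaling group of Nova servers. A minimal sketch of such a Heat template might look like the following; the image, flavor, and network names are hypothetical and would need to match your cloud:

```yaml
heat_template_version: 2016-04-08

resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 4
      desired_capacity: 2
      resource:
        # Each scaling unit is a single Nova server.
        type: OS::Nova::Server
        properties:
          image: cirros          # hypothetical image name
          flavor: m1.tiny        # hypothetical flavor name
          networks:
            - network: private   # hypothetical network name
```

The stack ID and the IDs of the group's member instances are then what the Mistral workflow and the Aodh alarm are parameterized with in the later steps.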
I'm using Zaqar here to listen for the Heat stack status-change notifications, instead of polling Heat again and again to check whether the stack has been created. Okay, the first step is to create the stack in Heat. So now the stack has been created successfully, and we go to the next step: creating the workflow in Mistral, based on the information we just got from the stack. We need the instance ID and the stack ID so that later we can mark the instance as unhealthy and call Heat stack update to update the whole stack. Then we create a pre-signed queue in Zaqar; you can see the pre-signed queue information. Next we create the subscriptions. You can see the first subscription is based on the workflow ID in Mistral, so that when there is a notification, an alarm, from Aodh to Zaqar, Zaqar will create the execution in Mistral; and we also create a subscription based on an email address, so that when there is an alarm you will be notified by email. Okay, so we've created all the stuff we need; for the last step we just create an alarm in Aodh. Now let's take a look at the instances. We just created two instances in an auto-scaling group, so there are two instances in that group, and we can try to ping them; it works. So now I'm going to stop one instance. I know the Nova stopped status is not a perfect status to trust for doing the auto-healing, but I think it's good enough to show the idea. Okay, so we can take a look at the Nova status. What we are expecting is that a new instance should be created, and the old one will be left in that stopped status. Come on.
Well, the demo gods strike again. Actually, that's one of the issues I saw when preparing this whole demo: the path from Ceilometer to Aodh is not really stable, I would say. Let me try shutting down the other one. Okay, it's not going easily; let me try again. As I tested this before the session, I'll try the other one to see if it works. Okay, now I'm going to restart the Ceilometer notification agent to see if the notification gets forwarded.

We're almost out of time. Do you want to play the video? Oh yeah, maybe in the meantime we could take some questions; you might have questions, and I'm happy to answer.

Excellent question. The question was: can we use a Heat stack to create all the stuff that we see in the script here? The short answer is no; we're missing two resource types in Heat. One is that we can create Aodh alarms in Heat, but not the Aodh event alarms. I've raised a bug for that, and we'll get it implemented in Ocata. The other is that there's no way to create a pre-signed Zaqar queue in Heat at the moment; again, I've raised a bug and we'll fix that in Ocata. So I would expect that in the next couple of months we'll be able to put this whole thing in a single Heat template: you define your whole application in one Heat template and say, bam, there we go, a self-healing application in one template.

Yes. Yes, it can. Actually, the only integration point with OpenStack is the OpenStack actions available in the workflow language, and they're available only if you authenticate against Keystone. If you disable that, it's fine to use Mistral anywhere else. In fact, most of our real big use cases right now have almost nothing to do with OpenStack.
Yeah, so here we're detecting that a VM stops or fails or whatever, and what you detect you can change, depending on what events you set up alarms on. Yes, so if you want your application to be the one detecting faults, instead of Ceilometer and Aodh, you can post messages to the Zaqar queue directly from your application, or your application can post statistics into Ceilometer and you can have alarms fire based on those. As I said before, it's very, very flexible. If you've got, say, Nagios monitoring your thing, you can have that post messages to Zaqar; pretty much anything you can think of, you can set it up. Okay, I have put the script I'm using on GitHub, so you can grab the link and take a look. But yeah, as Zane mentioned, once we've created those missing resource types in Heat, we can basically create all of these things in one Heat template. So I guess we've run out of time. I don't believe there's anything special you need to enable; I know there are a couple of bugs we fixed that may not be in the stable branch yet. The only thing you may need to enable is the event alarm support in Ceilometer's configuration file. Okay, yeah. Thank you very much. Thanks, everyone.