All right, it's 11:15, should we go ahead and start? Welcome to the session on using OpenStack to clean up after itself. I'm Nina Garadia, an architect with IBM Cloud. My colleague Todd Johnson unfortunately had a personal conflict, so he's not here today.

What I thought I'd start off with is why we did this: what were the use cases that led to this piece of work, and some of our thinking behind the approach we took. That will lead us right into Mistral, because we decided to use Mistral as our workflow engine. It won't be a deep dive on Mistral by any stretch of the imagination; it's really more about touching on the features that led us to choose it. Then we'll dive into the use cases themselves. I thought I'd start with project cleanup, since that actually turned out to be the simpler of the two. We'll look at that, go through a quick demo, and then build on it when we look at instance expiration. Then we'll look at the next steps and open up for questions.

One of the things we found as we started deploying private clouds is that once you provide ease of use to your consumers, it leads to a really good problem: we had a lot of people using our clouds, both our customers' clouds and our internal developer clouds in IBM. But it also led to what we call VM sprawl. We ended up with lots of VMs, not all of them being used all the time. We could have done something like Janitor Monkey, which goes out periodically, looks for unused resources, and cleans them up. But what we found is that a policy-based cleanup process fit better. If you were using your VMs for development and you had two-week sprints, it made sense to just go and automatically clean up the VMs every two weeks, right?
But obviously the length of time you wanted to keep those VMs would vary based on your project. From a development point of view, two weeks made sense, or whatever your sprint cycle was. But if you were doing stress testing or a longevity test run, you obviously wanted to leave the VMs up a lot longer. And if you had production workloads, you didn't need any kind of expiration. So what we found is we really wanted project-based policies to determine when to initiate cleanup.

We were throwing that idea around. We already had some solutions in place within IBM, which varied from scripting to actual product-based solutions, so we were looking at providing that feature in OpenStack because it was of value to us. Then we were pointed to a use case that Shamir from the Product Working Group had created out in the community, where he had a requirement for a very similar feature. I put the extract there: to address VM sprawl, he would really like to have some concept of a lease of life for a VM. So that was really the background to what initiated this.

If we take a step back, I already talked about needing expiration policies on a per-project basis, but it wasn't just the policy. What we also found is that the action you would take to, quote-unquote, expire your VM varied. In specific projects you might want to just stop the server. Or did you want to delete the server? Did you want to take a snapshot and then delete it? So the actions could also vary based on your project and on the policies in your company. But at its core, expiration was really a set of policies and a set of actions that you wanted to take. And if you dug even deeper and asked what you meant by a set of policies, it was really a set of metadata. How long could the VM run?
Did you want to send warning emails? When did you want to start sending them? How often did you want to send them? All of this is a set of metadata that you could apply to an instance, and the metadata would vary based on your project. Within that metadata, you might want to give the user the option of overriding your defaults. For example, in IBM, we had a two-week default in our development projects, but sometimes I would spin up a VM just because I had to give my end-of-sprint demo. I didn't need the VM for two weeks; I might say I just want it for one week. So you might want to allow the user to override your defaults, but only within a set of constraints. That is really what we meant by policy.

Then actions. If you wanted to take a snapshot and then delete the instance, that's a set of tasks you need to run. So what you're really talking about is a workflow. So then we said, we need a place where we can define metadata, and we need a workflow engine. Of course, being developers, our first thought was, oh, do we need to write a new service? But when you dug around, OpenStack already had these features. Glance has support for metadata definitions, which is really very convenient. You can define sets of metadata for Nova, for Cinder. You can make them project-based or public. The definitions are scoped by service and by project. So it had all the features we needed. And Mistral has a really good workflow engine. So we said the building blocks are already there in the community; let's go ahead and use them.

That led us into Mistral. To be honest with you, Mistral wasn't a project that my team had a lot of experience with. So we dug into it. It is the workflow engine for OpenStack, and it works great. The nice thing is you can define your workflows in YAML, right?
And it has support for YAQL expressions for when you want conditional execution of workflows, et cetera. So you can end up building really powerful sets of workflows with the features it has. The YAML DSL is well documented. In fact, I have to say the documentation for Mistral overall is pretty good when you're starting off.

Just to dive a little deeper: obviously it's a workflow engine, so it allows you to execute a sequence of tasks. Each task executes an action, and Mistral comes with a set of pre-canned actions that are available to you. There are some convenient ones: sending mail, posting an HTTP request, SSH, executing JavaScript. The other thing it has, which was really of value to us, is that you can execute actions against other OpenStack projects. It has the OpenStack action pack, with support for Keystone, Nova, Glance, Cinder, Heat, and so on. That is very convenient, because now from within your workflow you can execute OpenStack commands. Under the covers, it calls the Python clients for the various projects. The third option is that it allows you to define your own custom action pack. When we started off and were playing around, we defined our own custom actions, and then over time we worked with the Mistral community to get them back into mainstream Mistral.

The other convenient thing with Mistral is that you can call not only other actions but also other workflows.
You can have a workflow call another workflow, which is really convenient because you can design your workflows as building blocks and build larger workflows out of them, which we did. And obviously, if you have an action calling another action or a workflow calling another workflow, you have to be able to pass data back and forth. So it allows you to publish data, both between actions and between workflows. Like any good workflow engine, you can run things sequentially, or you can define your dependencies, and it will build the dependency graph and execute according to that. And in the context of OpenStack, when you create your workflows, you can define them to be private or public, which is also very convenient.

The other interesting thing is that you can run a workflow immediately at the point you create it, but Mistral also has support for cron triggers. So you can schedule your workflows to run much like a cron job, except that it's running your workflow. That's something we found very interesting.

What it didn't have, surprisingly, given that it supports various OpenStack projects, was support for Mistral itself. From within a Mistral workflow, you couldn't call out to Mistral commands. That's something which did get upstreamed in Mitaka. And while it had support for Glance, it didn't have support for Glance metadata definitions. That's something we should be upstreaming pretty soon. The third thing it didn't have was the ability to trigger a workflow in response to an event. We thought that would be extremely useful, because if you could respond to a Nova event, you could go out and do things with it.
And as we were discussing this notification trigger, we realized that if we could trigger in response to a Keystone project-deleted event, we could address the old problem of what to do with all our orphaned resources. In OpenStack today, if you go into Keystone and delete a project, it gets deleted in Keystone, but Nova doesn't know about it. So you could have all these instances that were created under that project which just hang around in Nova. Or you could have images in Glance, network definitions in Neutron, and so on, because these are all different projects. This had been discussed in the community some time back, and in response to that discussion, Keystone started posting an event when a project is deleted. The problem is no one's responding to that event.

Now, if we had this notification trigger in Mistral, we could respond to that deleted event and trigger a workflow. And the nice thing is that in that workflow, based on all the projects you have deployed in your installation, you could initiate the cleanup for each of them. What we get from the event is a project ID. Unfortunately, Keystone today doesn't send the project name, so we have asked for that. It's just a usability issue: when you send out emails saying, hey, this is what was done in response to this project being deleted, it'd be nice to have the project name. Right now our emails only have the project ID, but that's a nit.

So what we've done is register a workflow to run when that specific event is received. In that workflow, we tried to make things a little dynamic, because you end up with one child workflow per OpenStack project. So we've got one workflow for Nova, which finds all the instances.
We have one workflow for Glance, one for Neutron, and one for Cinder volumes. But this can grow over time. So what the parent workflow does is find all these child workflows, that is, all the workflows that have been created with a specific naming pattern, and execute them. Once it has executed them, it goes back, gets all the results, assembles an email, and sends it out. And since this is in response to an event, it is really transparent to everyone. It's business as usual as far as the cloud administrator is concerned. We haven't changed anything in the process.

I'm just going to quickly show you how this works. It's a quick demo, and I have to say I'm running it from my laptop, so it might not be the fastest. I have a project called Test Delete. Typically the admin would just come in and delete this project: you've been told the project's no longer needed, so you want to delete it. But before I delete it, I want to show you something. If I go into Test Delete, you'll find I have instances. I have some Cinder volumes. I have an image. I also have some public images, which we do not want to clean up: even though they're visible to the Test Delete project, they are public images. And since I had some VMs running, I also have some networks defined. You can see two private networks.

So I'm going to delete the project. Horizon doesn't like it if I delete the project that I'm in, so I'm just switching. So, Test Delete. Business as usual: you just come in and delete the project. You could do this from the CLI. I've gone ahead and deleted it. Under the covers, you'll find that the workflow has kicked off, and it's going to go ahead and start deleting the instances, et cetera. You can see it's running.
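
As a rough illustration of the parent workflow described here, the sketch below uses plain Python rather than Mistral YAML. It assumes a hypothetical registry that maps workflow names to callables and a hypothetical `cleanup_` naming prefix; neither comes from the talk, which only says the child workflows share a naming pattern.

```python
def run_cleanup(project_id, workflows, prefix="cleanup_"):
    """Run every child workflow whose name starts with the prefix.

    `workflows` is a hypothetical stand-in for the workflow registry:
    a dict mapping workflow names to callables that take a project ID
    and return the list of resource IDs they deleted.
    """
    results = {}
    for name in sorted(workflows):
        if name.startswith(prefix):
            service = name[len(prefix):]
            results[service] = workflows[name](project_id)
    return results


def build_email(project_id, results):
    """Assemble a plain-text summary of what each child workflow removed."""
    lines = [f"Project {project_id} was deleted. Resources cleaned up:"]
    for service, deleted in results.items():
        lines.append(f"  {service}: {len(deleted)} deleted")
    return "\n".join(lines)


# Hypothetical child workflows for illustration only.
registry = {
    "cleanup_nova": lambda pid: ["vm-1", "vm-2", "vm-3"],
    "cleanup_cinder": lambda pid: ["vol-1", "vol-2"],
    "send_report": lambda pid: ["not a cleanup workflow"],
}

results = run_cleanup("abc123", registry)
print(build_email("abc123", results))
```

Because the parent only matches on the name, adding cleanup for another service would just mean registering one more workflow that follows the same pattern.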
So all this is happening in the background. And pretty soon... oops, ignore that. The network's flaky. It'll come back. This is what happens with a live demo, right? Okay, it's almost done. What it's doing is going through the workflows. As I said, we had a parent workflow, and it's executing all the various workflows under it. Once it's done, I'll get an email. It's really as simple as that. So this is very convenient, because we've just automated the whole project cleanup process. And if you had Heat resources, you'd just have to provide a workflow to delete them, and it would automatically get triggered as part of this parent workflow. So again: very convenient, transparent to your users, and you don't have any orphaned resources.

The email should pop up pretty soon. Okay, I don't know why I didn't get the pop-up. You can see it says the project has been deleted, and here are all the resources that have been cleaned up. We had three instances, two volumes, and two networks. Obviously, if you delete the network configuration, you first have to delete the ports, so it deleted the ports for you. In Glance, we deleted the one image; it didn't delete the public images. So that was convenient and very useful, and it's something we're building on. We'd like to add workflows for the other projects. Since this was a proof of concept, we just focused on the initial core projects we needed.

The next thing we looked at was expirations, which I touched on in my lead-up. You have to expire instances after a given amount of time. It's absolutely important to send warning emails. For the prototype, we decided that expire just meant you stop the instance; we're not doing any of the snapshotting and deleting right now. It was absolutely essential that we made this configurable at the project level, right?
One size does not fit all. The other thing that was really important to us is that, although we're starting off with instances, you really need to be able to expand this concept to any kind of resource. You might also want an expiration policy on volumes in your installation, for example. So architecturally we've come up with a solution for instances, but it's something you want to be able to extend to whatever resources you want to manage.

In the previous case, we were responding to the project-deleted event. In this case, what we ended up doing was reacting to an instance-created event. When an instance is created, we look at what project the instance is in and whether there is an expiration policy associated with that project. We also see if there's any metadata on the instance that overrides the default policy for that project. Based on those two, we figure out the expiration date, when we start sending emails, and how often we have to send them.

Once we have that information, we do a couple of things. One is we update the metadata on the Nova instance, so that if the user comes into Horizon they'll actually see when it's going to expire. The other is we create cron triggers, because this is activity that happens later: we're going to expire the instance sometime in the future. So from within our workflow, we've calculated when the instance has to expire, and we set up a cron trigger, which Mistral supports, for that date. We associate that cron trigger with a workflow that has the expiration steps, which in our case is going to be stopping the server. So that's the first cron trigger. The second cron trigger sends out the warning emails, again depending on the policy: when does the first email get triggered, and how often do you want to send it?
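
The date arithmetic behind those two triggers can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and parameters are hypothetical, and the talk does not show how the real workflow maps these values onto Mistral cron triggers. The sample values mirror the shortened demo settings (expire after five minutes, first warning at three minutes), with a one-minute warning interval assumed.

```python
from datetime import datetime, timedelta


def build_schedule(created_at, lifetime, first_warning_after, warning_interval):
    """Return (expires_at, warning_times) for one instance.

    Warnings start `first_warning_after` the creation time and repeat
    every `warning_interval` until the expiration time is reached.
    """
    if warning_interval <= timedelta(0):
        raise ValueError("warning_interval must be positive")
    expires_at = created_at + lifetime
    warning_times = []
    moment = created_at + first_warning_after
    while moment < expires_at:
        warning_times.append(moment)
        moment += warning_interval
    return expires_at, warning_times


def to_cron_pattern(moment):
    """Five-field cron pattern (minute hour day month weekday) for one moment."""
    return f"{moment.minute} {moment.hour} {moment.day} {moment.month} *"


created = datetime(2016, 4, 26, 12, 0)
expires_at, warnings = build_schedule(
    created,
    lifetime=timedelta(minutes=5),
    first_warning_after=timedelta(minutes=3),
    warning_interval=timedelta(minutes=1),  # assumed interval
)
print("expire at:", to_cron_pattern(expires_at))
print("warnings:", [to_cron_pattern(w) for w in warnings])
```

With these inputs the schedule contains two warning times followed by a single expiration time, which matches the shape of the two triggers described above: one that fires a limited number of times for the emails and one that fires once to stop the server.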
Okay, and then once we've set this up, say your policy is that the VM's going to expire in two weeks. So you set up your cron trigger to fire in two weeks, at which point it will go and stop the server. What happens if the user comes in before that and deletes the VM? We have to take care of that use case as well. So we've registered another trigger, this one for the Nova instance-deleted event. In response to the deleted event, we check to see if there were any cron triggers for that instance, and if there were, we just clean them up. So we handle that use case as well.

Now I'm going to demo this, but I have to say I've taken some shortcuts for the demo: I'm going to have the VM expire in five minutes and start sending warning emails in three minutes. Remember I said we respond to an instance-created event, so first I'm just going to go ahead and boot an instance. You can see Nova's gone ahead and created it. So let's, sorry, let's go back. I did this under the demo two project because, as I said, we have policies. We should see the instance here. Again, from the end user's point of view, it's business as usual. You can come in through Horizon or through the CLI; nothing's changed.

Remember what I said: you can see that under the covers the Mistral workflow was triggered. It's calculated what the expiration is. It tells you when the first email is going to come, and it tells you when the instance is going to expire. The other thing I said is that under the covers it creates your cron triggers. As an admin, you can come in and look at your cron triggers, and there are two. This is the email one: for the demo two project we send out two emails, as you can see. And obviously the expiration one would only run once, right?
So it just has a remaining-executions count of one. Now, while we wait for this, because it's going to take a few minutes, I want to show you the metadata definition. I'm not using the defaults for demo two, for obvious reasons, and I'll take you through a demo with the real defaults later, but I just want to show you how these definitions are set up in Glance. We've defined something called expiration policies. This is where the defaults live. In this case I've made it public, but you could have one per project. The metadata definitions are all defined in JSON. We have different sections, different objects, in there. This one is just informational, and obviously the nice thing is you can define your own sets of policies; this is just what we've used for the proof of concept.

The first section is really about your emails. When do you send your first email? What is the default value? What's the maximum value? If you define your bounds, the minimum and maximum, we don't allow the user to override the value beyond that range. And we always have a default because, as with what I did from the command line, I didn't define any metadata. Nothing came from the user, so it just uses the default from the policy. So that was the first section.

The second section is a work in progress, so it's not being demoed because it's not fully implemented. We would really want to have the concept of a grace period. You shut down the server and then give the users some time; they can come and do whatever they want, take a snapshot or whatever, and then you delete it. So you could have a grace period defined, but we haven't finished implementing it, so I'm going to skip that section. And then the last section was the actual expiration period, right?
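
The default-plus-bounds behaviour described for these sections can be sketched as follows. The dictionary is a simplified stand-in and not the actual Glance metadata definition used in the demo; the property names and numbers are hypothetical. Rejecting an out-of-range override outright is also an assumption, since the talk only says that values outside the range are not allowed.

```python
# Hypothetical policy: each property has a default and allowed bounds.
EXPIRATION_POLICY = {
    "first_warning_days": {"default": 11, "minimum": 1, "maximum": 13},
    "lifetime_days": {"default": 14, "minimum": 1, "maximum": 14},
}


def resolve_policy(policy, overrides):
    """Merge user-supplied instance metadata with the project policy.

    A property missing from `overrides` takes the policy default.
    A supplied value outside [minimum, maximum] raises ValueError.
    """
    resolved = {}
    for key, bounds in policy.items():
        value = overrides.get(key, bounds["default"])
        if not bounds["minimum"] <= value <= bounds["maximum"]:
            raise ValueError(
                f"{key}={value} is outside "
                f"[{bounds['minimum']}, {bounds['maximum']}]"
            )
        resolved[key] = value
    return resolved


# No metadata on the instance: every value comes from the defaults.
print(resolve_policy(EXPIRATION_POLICY, {}))

# The user shortens the lifetime to one week, which is within bounds.
print(resolve_policy(EXPIRATION_POLICY, {"lifetime_days": 7}))
```

Booting an instance with no metadata corresponds to the first call; the one-week demo VM mentioned earlier in the talk corresponds to the second.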
The first section we talked about was the email notification; the last section is the policy defining when the instance actually expires. The nice thing is that we've created this metadata definition and associated it with Nova, but you can create another policy and, if you want, associate it with Cinder. And what was a pleasant surprise for us is that Horizon has really good integration with Glance metadata definitions, and I'll quickly show you that. If you come in through Horizon and boot an instance, if you go and deploy an image, it automatically integrates with Glance metadata definitions and pulls in the metadata for you, which is really neat. There were some changes we had to make, but all of those landed in Mitaka.

So I'm just going to quickly show you that; I'm not going to actually deploy it. I'm going to change to demo, because I want to use the default policy. Let's go ahead and launch. I'm just going to use tiny. I think I've filled in all the key things. Let's go into metadata. If you notice, this is something that Horizon already had. If I want to override any of the defaults, I can just add that. And Glance has built this panel based on the JSON, so it won't allow me to enter numbers that are outside the range. That's really nice, because I could now go ahead and launch this and still get the expiration support. If I didn't do this, I'd get the default policy, which is also very convenient. Okay, I'm not going to launch this.

Let's get back to demo two. Go to the workflows. And you will see... oh, I did delete the earlier ones. You're going to see a lot of them, so let's just look at expirations. You can see it has executed all these workflows. We can look at the tasks. Actually, before I do that, let me just go back to the executions.
The nice thing is, if you look at send expiration email, you can click on it and it'll show you all the steps. What is the input? What is the output? It's really good when you're debugging and something goes wrong. Very, very convenient. It also shows you the tasks corresponding to this execution. The execution is really an instance of your workflow. Your workflow is a template, your YAML file. At the point that you execute your workflow, you get an execution record, and that has the different actions, the different tasks. So you can see what the corresponding tasks are, and you can see their state. It's very, very convenient.

Now, I don't know why it's not popping up, but you can see I've gotten the warning emails. The first one came, which said your instance is about to expire. I was busy talking and didn't get a pop-up, so I didn't see it. I got the two warning emails. And here you can see that the instance has actually expired; I got the final one. And if you go back, you can see the instance has been stopped.

So again, very convenient. What we liked about this is that you can define your policies using Glance and your actions using Mistral. And the combination of the event trigger and the cron trigger makes it extremely powerful, because you can react to specific events, and you can either react right away or plan things out for the future if you want to do additional cleanup. So we put those two together and came up with this expiration workflow. And I really do stress this: we're demoing instances, but you can extend this to other resources in OpenStack.

All right. So having done that, what are our next steps? This started off as a proof of concept, right?
Both of these were features we felt the need for very strongly. To be honest with you, for project cleanup in our clouds we have scripts floating around, and I think most administrators end up doing that, because you can't afford to have orphaned resources. What we really liked about doing it this way is that it's integrated into OpenStack. You're running OpenStack commands, and it flows into whatever logging, monitoring, and tracking you're doing.

One thing I didn't cover is credentials. The project cleanup workflow you want to run as an admin, because you have to go and clean up resources across various projects. But for expiration workflows, you have the option of running under the user's own credentials. When you have these triggers, you can say you want to run under the credentials of the person who submitted the workflow, or under the credentials you get from the user context of the event. So you have that choice. Right now, when I ran the demo, I ran both under admin. The only reason is that we found that when you retrieve the user's email address from Keystone, which you need in order to send the warning emails, you need admin authority. So we have to figure out whether it's just a policy thing or whether we need to talk to someone from Keystone. Right now we're running it under admin, but that's really because of Keystone.

So we need to continue validating and expanding it, making sure we cover all the major projects, especially for project cleanup. We welcome additional input and, obviously, additional contributions, right?
That way we make sure we come up with a robust set of workflows. The notification trigger support is not in Mistral yet. There's a blueprint and a spec out there. As the spec goes through the process, we'll upload whatever code we're using for this as a work in progress. I talked about the Mistral action pack for Glance metadata definitions. That's pretty much done; we just have to clean it up and start the upstream process for it.

The other interesting thing is we talked with the app catalog team, because you have Glance images, et cetera, published in the app catalog. We think these are a really useful set of workflows, especially the project cleanup ones, and it would be good to publish them and make them available to others who are interested. The app catalog team does have a blueprint to expand the catalog to support additional kinds of templates: Heat templates, Mistral templates, and so on. So hopefully we can work on that as well.

The other thing we hope to close on at this summit, during the design sessions, is where the workflows get developed. We would publish them in the app catalog once we're closer to having it all working. But while we're developing these workflows, it would be nice to do that in the community as well, and not just off on our own. I think there's a space where operators publish scripts that are useful to each other; maybe we could look at using that, or at Mistral itself, or, worst case, put them in a public GitHub repository under IBM where everyone can get to them. I do think even the workflow development should be a shared thing, and not just us coming back and saying, here's what we think it is. This is a POC, so we started off with it on our own, but that's another thing we want to move back into the public space.
All right, and I think I'm actually good on time for questions. There's a mic there, because this is being recorded. Yeah, go ahead.

For the notification support, is that going to be limited to just OpenStack notifications, or can you implement it so that you could connect up an external message bus and have external notifications trigger workflows?

That's a great question. To be honest with you, from a POC point of view, we've only looked at OpenStack. But I do know the Mistral team has some questions in that space, because Mistral can function independently of OpenStack. So from the overall Mistral point of view, I'm sure they would be very careful that we don't force them into having to have OpenStack. I think we can go one of two ways: you could make this an optional feature, just as you have the action pack, or, if they wanted to, you could look at making it more generic.

Yeah, I would think the implementation could be generic enough that it would plug right into OpenStack, but then you could plug in any other bus you have.

Right, right.

Thank you.

Sure.

When you create a new project or a new VM instance, is it mandatory to have the expiration date?

I'm sorry, could you repeat the question?

Is it mandatory to have the expiration date when you have a new VM instance?

No, it's just that we wanted that. You could set up your workflow so that if your project doesn't have an expiration policy in the Glance metadata service, it just assumes the project isn't participating in expirations and does nothing; make it a no-op. Then you wouldn't be forcing expirations. If you had a production project, you wouldn't want to be forced to set an expiration date way out in the future. You just don't want to do expirations then. So absolutely, you could write the workflow so that you're not forced into that, and we wouldn't expect you to want to be forced into it. Yeah.

Can Mistral workflows be used in VMware Integrated OpenStack?

I am not a Mistral expert. I do know Mistral can be used outside of OpenStack. Has it been used with VMware? I don't know. The Mistral PTL is here in the audience, so maybe he could answer that.

Hey, first of all, it was a great presentation. Thank you very much.

Thank you.

The answer is yes. Mistral can be used independently of OpenStack, and all of the production installations we're aware of right now use exactly this scheme, without OpenStack. The only integration point is basically Keystone authentication. If you don't need it, you're fine to use Mistral without OpenStack.

So do you know of anyone who's used it with VMware?

Not that I'm aware of.

Okay, okay. Thank you.

Two questions. The first one: how easy would it be to integrate Python scripts or something like that inside the workflows?

So how easy would it be to integrate scripts?

Like some scripts that we might have for our infrastructure, to integrate into the workflows.

Right, so Mistral does have support for that. As I said, it has these pre-canned actions that you can use. I'd be very surprised if you couldn't run additional scripts, because you should be able to run JavaScript and other scripts through those.

And the other question was, can users define their own workflows too?

Pardon?

Can users, like the tenants, define their own tenant-specific workflows? Or is it more of an admin thing?

Yes, a user can submit a workflow within that tenant, and it would run under the credentials of that user. So they would be restricted to whatever actions OpenStack allows them to run with those credentials.

Okay, cool, thanks.

Yes, you can.

Hi, this is Winston. I wanted to answer the previous question, whether there are actual customers using VMware with Mistral. Yes, we do.
Not directly, but you can write custom actions in Mistral to interface with VMware. What my team does is use something called StackStorm, which already has a list of predefined actions, and we do have customers that are using Mistral to interface with VMware, and not just that, but also with OpenStack, AWS, and other providers.

Okay, thank you.

Can we integrate ospurge with Mistral?

We're going to ask Renat again for that; it's a specific Mistral question.

Can you repeat the question? I'm not sure I got it.

ospurge is a project you can use to delete resources when you delete a project. It's a client-side application. You give it a project ID, and it goes and checks all the resources the project has and purges everything. Can we integrate that with Mistral and put it in a workflow?

Yeah.

Any other questions? Okay, thank you, everyone.