Hello, everybody. I know lunch is waiting, so we'll try to keep it crisp. Thanks for attending the talk. Today we are going to talk about predictive analytics and visualization, and we actually have a series of three talks at this summit along related themes.

So who are we? We all work for Lou Tucker. I have a team of awesome, super smart folks, and we build cool stuff at the intersection of cloud and big data.

The context of the talk is this: if you look at OpenStack installations today, and at how operators typically monitor them and figure out what's going on, they use a bunch of open source tools. For example, they could be using the Horizon dashboard, and you know what that looks like. Or they could be using open source tools like Logstash. Logstash will just give you a bunch of repeated logs that you can search, and things like that. It gives you some insight, but it doesn't give you a whole lot. And if you are using OpenStack along with VMware, VMware has Log Insight. It was demoed, so you know what it does. But we believe that these tools are just the beginning of how one needs to operate and visualize a data center running OpenStack.

So what's the real problem? The problem is that today, tools aside, if something goes wrong in your OpenStack cloud, there is no way to know exactly what's going on. For example, if you are running a very heavy job, you would not know whether the network died, or the disk died, or the server died, or your application died.

So where would we want to go tomorrow? We believe that in the ideal world, we should be able to just talk to a system and figure out what's going on with our OpenStack cloud. But that's really futuristic. If you move back a little bit, you would like to use all sorts of gesture UIs and all that to figure it out, because the amount of information that you get out of an OpenStack cloud is just tremendous.
And we'll talk about it. It's really hard to have an intuitive way of visualizing what's going on. But we are not there yet, either. So what we want to do is give you a simple tool initially, as a first step. We are going to take baby steps in this journey. The simple tool is the one that's in the upper left-hand corner, and we'll talk more about it.

So our goal is that we will first focus on visibility into what's happening in your OpenStack cloud, because there are a lot of issues in operations, and visibility is the simplest and the first thing that one needs to address. So we have three visibility talks. In this particular talk, we are going to talk about real-time predictive analytics for OpenStack. We have another one late this afternoon on network visibility for efficient operations, where we'll show you use cases and how we use analytics to do this. And then tomorrow, do drop by for the storage visibility talk.

So what's our approach? There are many ways to get visibility and insights out of a data center. We believe that we want real-time and predictive. The reason why we want real-time is that OpenStack has a lot of moving parts, and a lot of things can fail. Ops people, ops experts, want to know exactly what's going on right now, not look through the rear-view mirror and say, oh, something went wrong a minute ago. So you would like real-time responses, too.

And essentially, if you look at the way things are operated in the data center today, it's all rule- and policy-based. We believe that because of the number of moving parts, policy-based systems actually do not scale that well. And it's expensive, because you need to figure out exactly how to encode these policies and rules, and you have to invest a significant amount of resources in doing that.
And the most amazing thing is, if you look at any data center or OpenStack cloud, you'll see the amount of digital exhaust that is generated. If you can harness that, it's going to give you a lot more insight than maybe even what a human operator can provide. The reason is that while the human ops expert has domain expertise, he or she may not have visibility into all the events that are happening, because there could be hundreds of thousands of events going on in a large data center.

Of course, digital exhaust in this talk means logs and metrics. You could even use thermal imaging and all sorts of physical sensors to figure out what's going on, if there's a fire or if a server is running extremely hot. But we'll only talk about logs and metrics in this talk. And as we just said, we want to first talk about visualization, because we feel that it's the first thing one needs in order to figure things out. Human beings are amazing at looking at visual cues and figuring out what's happening. Hence this particular talk. So I'll hand it over.

Obviously, as a cloud admin, you want to have as much information about your cloud as possible, including the physical servers and the components of OpenStack, the tenant project data, the virtual machines, and application-layer information, including what the application does and how well it's doing it. In our project, we've tried to make this data transparent and as easy to consume as possible. We want to best serve the user and save operational costs in the most efficient way possible.

I don't know how many of you were at the Portland Summit last year, but a few of our colleagues demonstrated an application called Curvature. This is a drag-and-drop tool for OpenStack deployment where you can create, delete, and control VMs at the click of a button. That's now been open sourced and is available on GitHub. The link will be available later in the show.
It's being used by multiple companies at the moment, including one right here at the summit, which you might have seen walking around the booths. We've used Curvature as our inspiration, and we've decided to take it in a new direction using analytics and data.

So what is AVOS? AVOS stands for Analytics and Visualization on OpenStack. It's a stateless application with very easy deployment. We've made it as plug-and-play as possible: all you need to do is hook up the OpenStack endpoints and plug in Keystone credentials, and you're good to go. We've designed it to be as client-side as possible, so there is no effect on the compute nodes and very little effect on the OpenStack cluster. We pull the information from the OpenStack APIs where necessary, and where that's not necessary, we listen to the message bus, for example for when a VM is created or deleted. This creates minimal overhead on the OpenStack cluster and provides very useful information. We aim to show the data as meaningfully as possible, so that it's very easy to digest and so that the administrator can get all the information instantly from viewing one pane. We'll now show you a demo of the...

Cool, yeah, so I could go through and list all of the features and all it does, but that's boring, so let's go into a live demo. Do you want to bring up the video, Matt? The second?

Okay, so here we have AVOS running on a data center that we have back in San Jose. Of course, this is a recorded simulation. You can see we've got a network with various Ubuntu VMs there. This is a slightly outdated version, so the heat map takes a bit of time to update, but we're plotting a heat map here of CPU util.
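To make the CPU heat map idea concrete, here is a minimal sketch of how a utilization sample could be turned into a heat-map color. It is an illustration only, not AVOS code: the function names, the blue-to-red scale, and the sample data are all assumptions.

```python
def cpu_util_to_color(util_percent: float) -> str:
    """Map a CPU utilization percentage to a hex color.

    Hypothetical scale (an assumption, not taken from the talk):
    0% is pure blue (idle), 100% is pure red (busy).
    """
    # Clamp out-of-range samples so bad readings cannot break the scale.
    fraction = max(0.0, min(100.0, util_percent)) / 100.0
    red = round(255 * fraction)
    blue = round(255 * (1.0 - fraction))
    return f"#{red:02x}00{blue:02x}"


def heat_map_colors(samples: dict[str, float]) -> dict[str, str]:
    """Return a color per instance name for a batch of CPU samples."""
    return {name: cpu_util_to_color(util) for name, util in samples.items()}


# Made-up example data: instance name -> CPU utilization in percent.
colors = heat_map_colors({"hadoop-1": 100.0, "idle-vm": 0.0})
```

A linear two-color scale keeps the sketch short; a real dashboard would more likely use a perceptual color map from its charting library.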
So if I double-click on an instance, we can see a very brief summary of the properties of that instance. We've got the name and its current state, when it was created, all the usual stuff that you can get from Horizon, as well as what networks it's on, and those are links that you can click through to see the information on those networks as well. It's just trying to make one layer where you can visualize everything within your cloud.

Again, here you can see, I don't know if you saw that, I've also got some Windows VMs and some Linux VMs. Based on the image, you can instantly see what VM you're looking at, so you can hone in on your particular application, particularly when you have a larger cloud. We've also got a search feature, so if you know an instance by name or by any of its properties, you can filter on that and find the particular part of your cluster you're looking for. Then, essentially, purple is routers, green is public networks, red is private networks, and orange is volumes. It's very easy to quickly identify what's going on.

So here now we have a cluster that's actually doing something. You can see the heat map is going crazy. Here we're running a Hadoop job, so we've got quite an intensive job that's running on, I think, 10 machines; some of those are other stuff in the cluster. You can easily get a rough idea of the performance and the activity in your cluster, which is something you can't really do anywhere else in OpenStack at the moment.

Here we have another plot. This is another thing we've been doing, which is network flows. Currently, in OpenStack, you can obtain traffic in and out of a VM using Ceilometer, but what you can't do is know where that traffic is going. So we've been able to track down detailed instance-to-instance flows.
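As a rough illustration of what instance-to-instance flow data allows, the sketch below sums traffic per source and destination pair and picks out the busiest pairs. The record format, the function names, and the sample values are hypothetical; the talk does not describe how the custom flow meters expose their data.

```python
from collections import defaultdict


def aggregate_flows(records):
    """Sum bytes for each (source instance, destination instance) pair.

    `records` is an iterable of (src, dst, num_bytes) tuples. This
    format is an assumption made for the example.
    """
    totals = defaultdict(int)
    for src, dst, num_bytes in records:
        totals[(src, dst)] += num_bytes
    return dict(totals)


def hottest_flows(totals, top_n=3):
    """Return the top_n flows by byte count, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Made-up sample records for three VMs.
sample = [
    ("vm-a", "vm-b", 100),
    ("vm-a", "vm-b", 50),
    ("vm-b", "vm-c", 500),
]
totals = aggregate_flows(sample)
top = hottest_flows(totals, top_n=1)
```

The pair totals are what a flow heat map would color; ranking them is one simple way to surface hotspots without reading the plot.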
So again, during a Hadoop job, you might be able to find what's going on and where the activity hotspots are. Here you can see a much more complex example of that in a real cluster where there's lots going on, and you can see the hotspots. It's a heat map, so they're red. It couldn't be easier to identify what's going on. Also, the outer ring is the physical hosts. Those are the physical servers that the VMs are scheduled on. So you can instantly see, maybe, that one physical host has far too much network traffic in and out, and potentially use that to optimize your cloud in the future. There are various other things that we'd hope to do with both of these visualization methods. But that's done, so let's jump back into the slides.

Cool, so it's all very well and good. It's pretty, it looks cool. You can show it to users, and they'll think, wow, OpenStack is cool, this is really pretty and shiny. But at the end of the day, unless it has actual uses in the real world, system admins aren't going to adopt it. It's going to fall flat. So I want to go over a couple of very brief use cases, things we think we might be able to do with this information already.

The first thing is bottlenecks. Say you have a distributed application running across 10 different servers, and it's not performing as it should. You can't figure out why, and you're tearing your hair out. The current way, you'd probably have to SSH into each VM and try to figure out what's going on. And you just want to go home, right? So that's one of the first use cases. Instantly we can see here, okay, this is our application. Most of it is running fine. There are two particular hotspots of blazing red activity. So within two seconds, we've identified the most probable causes of our problem, and we can SSH into those VMs specifically and try to figure out what's going on. And then another use case we've done.
If we're pulling all these metrics, we can apply mathematical formulas to them, and we've noticed distinct differences between different types of traffic. For example, on this plot, the green is a regular Hadoop job. It's combining all of the metrics into some kind of analysis of our Hadoop job. What we're doing isn't particularly important, though you're welcome to read that formula. But then you can see the red line, which is where we simulated a DDoS attack. Instantly, it couldn't be easier to spot that something has gone wrong. So we'd hope to use this data to be able to notify the user when things go wrong. We also have the blue, which is probably slightly more difficult to see, but you can actually see it up there, which is a network going down. Again, a physical network went down, and as the MapReduce jobs realigned and tried to work out how to redistribute the load, we found that there was a noticeable, distinct pattern in that as well. Next slide.

So we think that's pretty cool. I don't know what you guys think. Here are a couple of things that we'd hope to do in the future. The search is good, but we want to make it better. If you have a data center that has maybe a hundred physical servers and thousands, maybe tens of thousands, of VMs, and if OpenStack is going to work at scale, you need a better way to find problems in that large environment and hone in on them. So we want searches that can, for example, find instances that have shut off, without any particular hard work, or find instances that are working above a certain CPU threshold, so you can instantly know. Or maybe even below a threshold, so that if you have two VMs that aren't doing much, maybe you can combine the workloads and save costs. You're provisioning fewer VMs. Then we want to reduce OpenStack logs and combine them into more relevant, useful information.
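The kind of search described above could be sketched roughly as follows. This is an assumption-laden illustration, not the planned AVOS feature: the `Instance` class and both helper functions are invented for the example, and only the status strings "ACTIVE" and "SHUTOFF" come from Nova's server status values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    """Hypothetical, simplified view of a VM for search purposes."""
    name: str
    status: str      # Nova server status, e.g. "ACTIVE" or "SHUTOFF"
    cpu_util: float  # most recent CPU utilization, in percent


def find_shutoff(instances):
    """Return the instances whose status is SHUTOFF."""
    return [vm for vm in instances if vm.status == "SHUTOFF"]


def find_by_cpu(instances, above: Optional[float] = None,
                below: Optional[float] = None):
    """Return ACTIVE instances above and/or below a CPU threshold."""
    matches = []
    for vm in instances:
        if vm.status != "ACTIVE":
            continue  # a stopped VM has no meaningful CPU reading
        if above is not None and vm.cpu_util <= above:
            continue
        if below is not None and vm.cpu_util >= below:
            continue
        matches.append(vm)
    return matches


# Made-up inventory for the example.
fleet = [
    Instance("hadoop-1", "ACTIVE", 97.0),
    Instance("web-1", "ACTIVE", 3.5),
    Instance("web-2", "ACTIVE", 2.0),
    Instance("old-db", "SHUTOFF", 0.0),
]
busy = [vm.name for vm in find_by_cpu(fleet, above=90.0)]
idle = [vm.name for vm in find_by_cpu(fleet, below=5.0)]
stopped = [vm.name for vm in find_shutoff(fleet)]
```

In this made-up fleet, the two nearly idle web VMs are the sort of candidates the speaker suggests consolidating to save costs.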
And then we want some kind of real-time error analytics. And then we'd hope to implement the features of Curvature, and again, there's a screenshot of Curvature there. It's also very pretty. That would give us the ability to deploy. Curvature allows you to just drag and drop components onto the graph and click deploy. It sets it up on OpenStack for you. You don't even have to think about it. And as we said before, it's now open source. Please look it up on GitHub and try it out. We think it's pretty cool.

In order to do this, we've had to make a few changes to different components in OpenStack. The first thing is that most of this data, as you can imagine, is pulled from Ceilometer. So we've slightly changed the structure of Ceilometer for better API queries, so that the return structure is optimized, et cetera. And then we've had to add some custom meters to get these flows between VMs.

And then there are a few frustrations. Anyone who's ever committed to OpenStack has probably found things that they don't think are perfect. So a few ideas we've had are standardization of the Python clients, where there are a lot of inconsistencies, and also potentially integrating this into a Horizon panel so you can get the best of both worlds. And yeah, the end goal is to make it easier to build on top of OpenStack, so you can see what's going on in your cloud and manage it better. So that's about it.

To summarize, if you have predictive analytics and visibility, you can reduce ops pains. You can improve performance. But in order to do this... Excuse me. Sorry. Too much talking. In order to do this, we've got to gather the data, then extract insights from it, and then present it to the users with a kick-ass user experience so that everyone's happy. Cool. That's it. Who knows what will happen to OpenStack in the future? Thank you.

We're open to questions, if anyone has anything in particular. Yeah? No, I think you should maybe... Sorry, yeah. We can't hear you.
It's okay. So you showed examples with a few nodes. Have you tried this with hundreds or thousands? And if you do have a lot, for one, does it work visually? And for two, does it grind to a halt given how much it's trying to process, or can it scale with that?

So actually, that's a great question. We've done these experiments on a 10 to 20 node cluster. We've not seen any slowdown, but we know that as we scale to hundreds, or say a 1,000 node deployment, we will have data processing challenges, and we are in the process of solving some of those problems.

Just one thing. I worked at Industrial Light & Magic for a while, and Hollywood will love this version of OpenStack.

Sure. Absolutely. I get that.

Can you talk a little bit about availability for your partners to test it? What's your vision for the project?

So I think this was more about showing an early version of what we would like to build and where we would like to go, but we should talk offline. Of course, everything is on the table. We might even open source it. We are still trying to see. We are first trying to figure out the level of interest in the community to decide where we would like to go with this.

Okay. I think, you know, there's definitely probably interest. I know we have interest.

Sure. Cool. Thanks. Nice seeing you guys again.

Yeah. We also saw application performance over there. So what are the plans in that direction?

So actually, right now what we showed you was infrastructure performance only, but we believe that some of the things that we are doing can be extended all the way to the app stack, just not in this demo.

Okay. Thanks.

But in order to drive the infrastructure performance visualization, we used a very heavy, very intensive app. It was a Hadoop job. That's about it.

In terms of the applications running on OpenStack?

Yeah. So we should talk offline, but we believe we can use these visualization methods in the app domain also.
But for this talk, we haven't.

Great presentation. Thanks.

Thanks.

I can see this being really useful, not just for operators, but for end users.

Absolutely. Yeah.

So have you considered some sort of role-based access that would allow scoping of the visualization to just a specific project, or set of projects, for those users?

You can, yeah. Obviously, we're listing the physical hosts that the VM is on, and you won't be able to see that information from a project view. So the view will be different. But yes, it already supports that.

Any other questions? I think everybody's hungry.

Yeah. I'm hungry.

Yeah, we have an awesome team. People who can code, do math, do visualization, do parallel computing. Cool. Thanks, guys.