We start? Yeah, good to go. OK, hi, everybody. And thank you for staying so late to hear our presentation. My name is Iris Finkelstein-Saghi. I'm with CloudBand Nokia. One of us is missing, but I have two of my colleagues here today with me, Danny Ofic and Alexei Weil. They'll be coming up to speak after me. And what we're going to do today is describe some of the advanced use cases that we've been working on in the Vitrage project. So for those of you who haven't been around and haven't heard what Vitrage is all about, we'd like to give a short overview of what Vitrage is, how it came about, and what its benefits and value are to the OpenStack community. And then we're going to deep dive into three advanced use cases with demonstrations and explanations. Anybody who's interested in more information after that, please be sure to stop by booth A10 in the marketplace. We're giving more in-depth, live demos of Vitrage there.

So what exactly is Vitrage? About two years ago, we recognized there was a gap in service providers' ability to understand their systems and the root cause of problems and faults, in other words a fault management gap within OpenStack. And that is how we started out with Vitrage, which is an official OpenStack project. It became official about a year ago, as a project for analyzing OpenStack alarms and events. It enables service providers, and I don't know if there are any out here today, who have huge networks and huge systems comprised of many, many elements, to understand exactly what is happening in their system and how faults propagate through the different elements. And we're going to talk about that in a second. Vitrage gives us three specific benefits that are especially useful for service providers. First of all, it provides a holistic view of the entire system, and we'll see exactly how with different visualization techniques in a second. The second one is fault propagation throughout the entire system, from the core to the actual virtual machine to the application and so on. And the third one is root cause analysis: understanding and troubleshooting faults, where they came from, and how they could impact additional elements in the system.

So a second ago I talked about the different visualization techniques that Vitrage enables. These are just three of the types of dashboards or visualizations that you can get with Vitrage. The first one, on the left, is the entity graph. The entity graph is like an aerial map of all the elements in the system. It lets you see both the physical and the virtual elements of your network on the same map, and the connections between them. The second one, in the middle, is what we call the topology graph. The topology graph is a slice of the entity graph that lets you focus on a specific fault in the system, and it shows the different layers, going from the cluster to the zone to the host and down to the actual virtual machine. And the third visualization is root cause analysis, which represents the root cause of a specific problem and the different elements it affects throughout the network. So these are the different visualizations.
And Danny and Alexei are going to show you specifically how these come about and how they are shown within the Vitrage platform. And now I'm going to hand over to Danny, who's going to dive into the first use case.

Thank you, Iris. Yes. So hi, guys. Again, I'm Danny. I'm going to talk about the Vitrage architecture. So let's start with its basic components. We have the Vitrage data sources, which are the information reflected in the entity graph. These are OpenStack services, such as Nova, Cinder, and Neutron, and external projects such as Nagios and Zabbix that monitor, for example, physical entities. We have notifications, such as SNMP traps: we can notify other orchestrators, for example, with SNMP traps, and we can also mark a host as down in Nova. And we have the Vitrage templates, which are the policy rules. So when you get Vitrage, you get a system that can analyze alarms and events against your infrastructure, but you need to tell it what to do when it gets an alarm on one item and how that affects other items. And this is the policy. It's a simple YAML format, and it also supports complex conditions, such as and, or, and, of course, not.

So let's look at the Vitrage architecture. We talked about the data sources, so here they are: Nova and Cinder for the OpenStack services, and external data sources such as Zabbix. We take those data sources, we get notifications from, for example, Nova, we transform the data from Nova, and then we store it in the entity graph. The entity graph then notifies the evaluator. The evaluator then uses the Vitrage templates to raise alarms, to trigger actions, and to change certain entities' states, to put them in an error state, for example. It updates the graph and notifies other systems, as we said before, via SNMP traps or, for example, Nova. And finally, we also have the API, which we reach via the UI or via CLI commands. We have, for example, topology show, to show our whole data structure and the relations between the entities, and we have alarm list and RCA, to show the alarms in the system and the root cause analysis.

So what are we going to see today? We're going to see three use cases. The first one is going to be predictive deduced alarms. We already have deduced alarms: for example, if we have a host and the host is down, of course all the instances on it are down, and if an interface connected to the host is not up, the VMs have no interface, so from our point of view the VMs are down. But here we want to talk about predictive deduced alarms. Then we're going to talk about supporting high availability scenarios, which are achieved using the new 'not' operator; for example, if we have two interfaces and one is down, the system will still be fine. And then we'll talk about adding finer details to the entity graph.

So let's talk about the first scenario, predictive deduced alarms. Today, deduced alarms propagate faults through the system: for example, if we have an interface down that is connected to a host, we'll get an error on the virtual machine, because its interface is down. That gives us a clear representation of the current state of the rest of the system. In this first use case we're going to see that Vitrage can propagate a future state as well. Zabbix will predict a problem with the CPU, and we'll see how it does that. There's going to be a CPU stress test: we're going to stress the CPU, and Zabbix will predict problems on the virtual machines.
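As a rough illustration of the policy rules Danny just described, here is a minimal sketch of what a deduced-alarm template for this predictive scenario might look like in Vitrage's YAML template format. The alarm names, trigger text, severities, and states below are illustrative assumptions rather than the exact template used in the demo, and the exact schema can vary between Vitrage versions:

```yaml
metadata:
  name: predicted_cpu_overload_propagation     # illustrative name
  description: propagate a predictive Zabbix CPU alarm from a host to its instances
definitions:
  entities:
    - entity:
        category: ALARM
        type: zabbix
        rawtext: Predicted high CPU load        # assumed Zabbix trigger text
        template_id: predicted_cpu_alarm
    - entity:
        category: RESOURCE
        type: nova.host
        template_id: host
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance
  relationships:
    - relationship:
        source: predicted_cpu_alarm
        target: host
        relationship_type: on
        template_id: alarm_on_host
    - relationship:
        source: host
        target: instance
        relationship_type: contains
        template_id: host_contains_instance
scenarios:
  - scenario:
      # a predictive CPU alarm on a host affects every instance running on it
      condition: alarm_on_host and host_contains_instance
      actions:
        - action:
            action_type: raise_alarm            # deduced alarm on the instance
            action_target:
              target: instance
            properties:
              alarm_name: predicted_cpu_overload_on_instance
              severity: warning
        - action:
            action_type: set_state              # aggregate the instance state
            action_target:
              target: instance
            properties:
              state: SUBOPTIMAL
```

The condition is written over the template_ids of the entities and relationships, and the actions raise a deduced alarm on the instance and change its state, which is the kind of propagation the demo shows next.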
Vitrage will then propagate this prediction to the virtual machines and notify users of the problem, so users will be able to address the problem more effectively. So let's start with the demo.

Let's start by looking at the entity graph. We can see the entity graph here. In the left part of the graph, you can see the virtual entities, for example the virtual network, the virtual instances, and the virtual server. In the right part of the graph, you can see the real infrastructure, for example the real interfaces, bridges, and hosts. We are currently looking at compute-0, which we are going to stress. This is our Zabbix external monitoring system. Some of the triggers here we added to Zabbix ourselves, and we're going to look at one of those triggers, which is a predictive trigger. The predictive trigger takes the three latest CPU measurements and compares the change from the first to the second measurement with the change from the second to the third. If the load is rising very fast between samples, for example by about 110%, it predicts that there is going to be serious CPU stress on the host and, through it, on all the virtual machines.

So let's see the current CPU state. We can see that everything is fine. Now let's start stressing the CPU. We run the Linux stress command on compute-0, which you saw before. You can see that for now the system is not stressed, the CPU is not stressed, but that is going to change. As you can see, it's starting to get a high CPU load, and we can see the predictive alarm: here in Zabbix, at the bottom, we have a predictive CPU load alarm. The predictive CPU load alarm is on the host, not on the VMs; we are actually monitoring the host here, the compute.

Let's go back to Vitrage. We can see here in Vitrage that we have an alarm on the compute. It's going to be a little bit hard to see here: we have an alarm on the right side, on the compute, and it's a predictive alarm coming from Zabbix. Vitrage then propagates this alarm, and we see that we have the same alarm on the instances. One of the instances belongs to a Heat stack, so we have the same alarm on the stack. We also change the state of these instances to warning; we aggregate the state. We can also go to the alarms and click the RCA button, and we can see how the system displays the root cause analysis: we had a predictive CPU load from Zabbix on the compute, then a predictive CPU performance alarm on the instance itself, where this data was propagated, and then we can see that the stack is suboptimal due to the original predicted CPU load on the compute.

So now that we've seen it, let's take off the stress and stop the stress process. In a few moments we're going to see the CPU without a lot of load, and everything is going to be fine. Zabbix is going to inform Vitrage that the CPU status is OK, and Vitrage is going to propagate this data to the instances and to the stack. As you can see, everything is fine now. So thank you. And I'm going to call up Alexei here; he's going to show you the next two scenarios.

Thank you. OK, so I'm going to show you the two other use cases that we want to show in this session. The second use case that I'm going to talk about is the high availability scenario. Recently, we have added the 'not' term to the Vitrage templates.
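As a rough preview of what the 'not' term enables, here is a minimal sketch of the kind of template scenarios behind this high availability use case. The alarm names, entity types, and thresholds are illustrative assumptions, not the exact templates from the demo:

```yaml
metadata:
  name: ha_stack_health                          # illustrative name
  description: raise warning or critical alarms on a Heat stack based on its instances
definitions:
  entities:
    - entity:
        category: ALARM
        type: aodh
        name: cpu_usage_high                     # assumed Aodh alarm name
        template_id: cpu_alarm_1
    - entity:
        category: ALARM
        type: aodh
        name: cpu_usage_high
        template_id: cpu_alarm_2
    - entity:
        category: RESOURCE
        type: heat.stack
        template_id: stack
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance_1
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance_2
  relationships:
    - relationship:
        source: stack
        target: instance_1
        relationship_type: contains
        template_id: stack_contains_1
    - relationship:
        source: stack
        target: instance_2
        relationship_type: contains
        template_id: stack_contains_2
    - relationship:
        source: cpu_alarm_1
        target: instance_1
        relationship_type: on
        template_id: alarm_on_instance_1
    - relationship:
        source: cpu_alarm_2
        target: instance_2
        relationship_type: on
        template_id: alarm_on_instance_2
scenarios:
  # one instance is alarmed and the other is still healthy: the stack is degraded
  - scenario:
      condition: stack_contains_1 and stack_contains_2 and alarm_on_instance_1 and not alarm_on_instance_2
      actions:
        - action:
            action_type: raise_alarm
            action_target:
              target: stack
            properties:
              alarm_name: ha_stack_degraded
              severity: warning
  # both instances are alarmed: the stack has lost its redundancy
  - scenario:
      condition: stack_contains_1 and stack_contains_2 and alarm_on_instance_1 and alarm_on_instance_2
      actions:
        - action:
            action_type: raise_alarm
            action_target:
              target: stack
            properties:
              alarm_name: ha_stack_degraded
              severity: critical
```

The first scenario uses 'not' to express that the second instance is still healthy, which is exactly the high availability condition Alexei explains next.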
The 'not' term allows us to use new conditions, such as: if I have a host that contains an instance and the host has no CPU alarm on it, then do something. So this 'not' term extends our template language and makes it much more powerful, as we will see in this use case. What I'm going to show you now is that if we have a Heat stack with two instances that work in high availability mode between them, and I raise an alarm on one of the instances, then Vitrage will propagate a warning alarm to the Heat stack. And then I will show that if I raise an alarm on the other instance as well, it will propagate a critical alarm to the stack, because both of the instances have CPU problems. So let's start the demo.

We are going to the entity graph. OK, so I'm not sure if I stopped it. No, no, wait a minute. Sorry about that. Again, sorry about the interruption. OK, so here we can see that we have our Heat stack, which is connected to both of our instances. We can see that the state of the Heat stack is OK, it's active at the moment. Now we'll go to the console, and we will raise an Aodh alarm on the instance, which is defined so that if the CPU usage on the instance goes above 70%, the alarm is raised. At the moment, we can see that we have no alarms. Let's see what instances we have in our OpenStack; we can see both of the servers here. I will raise an alarm. Now we can see that it was raised, and we can see here that the alarm is on. Now we'll go back to the entity graph, and we can see here that we have an alarm. Sorry, I know it's not working. OK, the stick is here. Sorry. OK, so we can see now that we have an alarm on one of the instances, and we have an alarm on the second instance, and due to that, we have an alarm with a critical error state on the Heat stack.

Now we'll go to the RCA view, and we will see what really happened here. We can see our alarm list. We'll open the RCA view, and we can see here that both of the alarms on the instances caused the critical alarm on the stack. We'll go back to the entity graph, then back to the console, and we'll remove both of the alarms and see them clear in the entity graph. We are removing the first alarm. It is removed, and we can see here that only one alarm is connected to one of the instances, and the Heat stack state is back to suboptimal. We'll now remove the second alarm. OK, and we can see now that both of the alarms are gone, and the Heat stack state is back to OK, active.

So what have we seen here? We had two scenarios in our templates. One scenario says: if we have a Heat stack that is connected to two instances, and one of the instances has an alarm while the second instance has no alarm, using the 'not' term we talked about, then create a warning alarm on the stack. The second scenario says: if I have a Heat stack connected to two instances, and each one of the instances has a CPU usage alarm, then create a critical alarm on the Heat stack. That is the high availability scenario we wanted to show.

OK, so now we'll move on to the third use case. What is the third use case? The third use case is about network troubleshooting, specifically OVS NIC troubleshooting. We have added to our system a script that discovers the OVS topology and integrates it into the entity graph in Vitrage.
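To make the propagation in this third use case concrete, here is a minimal sketch of the kind of template that could raise a network alarm on the instances running on a host whose physical interface is down, assuming the OVS and interface entities are already in the entity graph. The types, names, and relationships below are illustrative assumptions, not the exact template or entity types produced by the discovery script:

```yaml
metadata:
  name: nic_down_propagation                     # illustrative name
  description: propagate a physical interface alarm to the instances on the same host
definitions:
  entities:
    - entity:
        category: ALARM
        type: zabbix
        rawtext: Physical interface is down      # assumed Zabbix trigger text
        template_id: nic_alarm
    - entity:
        category: RESOURCE
        type: interface                          # hypothetical type created by the OVS discovery script
        template_id: nic
    - entity:
        category: RESOURCE
        type: nova.host
        template_id: host
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance
  relationships:
    - relationship:
        source: nic_alarm
        target: nic
        relationship_type: on
        template_id: alarm_on_nic
    - relationship:
        source: host
        target: nic
        relationship_type: contains              # assumed edge added by the discovery script
        template_id: host_contains_nic
    - relationship:
        source: host
        target: instance
        relationship_type: contains
        template_id: host_contains_instance
scenarios:
  - scenario:
      # an alarmed interface on a host affects every instance on that host
      condition: alarm_on_nic and host_contains_nic and host_contains_instance
      actions:
        - action:
            action_type: raise_alarm
            action_target:
              target: instance
            properties:
              alarm_name: instance_network_error
              severity: critical
        - action:
            action_type: set_state
            action_target:
              target: instance
            properties:
              state: ERROR
```

Together with the high availability template from the previous use case, alarms on both instances then roll up to an alarm on the stack, which is what the demo shows.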
With the OVS topology integrated this way, we have a correlation between the physical layer and the virtual layer of the network, and we'll show how it extends the power of Vitrage. What we're going to see here: I'm going to bring down the physical interfaces on the compute, and we'll see what happens in Vitrage. OK, so let's start the demo.

So we have our regular graph. Here we can see, I'll try to show it here, maybe, if it works. OK, so we can see here that we have our computes; those are both of our computes. We have the OVS bridge, which is the physical bridge, which is connected to both of the ports of the VMs that are on this compute. We can see that the OVS bridge is connected to the OVS port, and that it is connected to both of the interfaces. It is the same on the other compute: we have the OVS bridge, the OVS port, and the OVS interfaces. Now I'm going to bring down the interfaces. Let's see what interfaces we have here. We have a couple of interfaces, and I will bring them down using ifconfig down. OK, now, what is going on here? We have created a trigger in Zabbix that monitors the physical components on our computes, and we can see that the trigger fired on the interface, as we can see here. And now we'll see it in the Vitrage graph. Here in the Vitrage graph, we can see that this alarm was raised on the interface, and it caused two alarms on both of the instances that are connected to the host on which the real interface is. So we can see both of the alarms here.

Now I will bring down the second interface. And we can see that another alarm was added: another trigger has fired and was added in Zabbix. Now let's see the entity graph. We can see that another alarm was raised on the second interface. Due to that, we have those alarms on the instances, and because we had the template from the previous use case, an alarm was raised on the stack as well. Now we'll see the RCA of what happened here, the root cause analysis. I think the battery is low on the stick. Sorry, please continue. So now we'll see the RCA. We can see in the RCA that both of the alarms on both of the interfaces caused the network error alarms on the instances, and that caused the suboptimal alarm on the stack. So now we will bring up both of the interfaces, and we will see that the alarms disappear. We can see that the alarm has disappeared from the stack, and the alarms on the instances are now yellow, which means suboptimal instead of critical. And now we'll see that all of the alarms are disappearing. So what we have seen here is that the richer our entity graph is, the more deeply Vitrage can show what is going on in the system.

So let's summarize all that we have seen here. We have seen three use cases in Vitrage. In the first use case, we saw that Vitrage can show not only the current state of the system, but also predict what will or might happen in the future and help the user with that. In the second use case, we saw that the new 'not' term in the Vitrage templates extends our language and helps us with new kinds of conditions that we can add and use in Vitrage. And in the third use case, we saw that the richer our entity graph is, the more powerful Vitrage can be, and that we can add data sources to Vitrage very easily and thus enrich its graph. So my last point is: come and contribute. We have plenty of things to do in Vitrage. Many, many interesting things.
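As a final illustration of how easily the entity graph can be enriched, Vitrage also has a static datasource, where physical entities and their relationships are declared in a plain YAML file that Vitrage loads into the graph. The sketch below only conveys the idea; the entity types and field names are assumptions and may differ between Vitrage versions, so check the static datasource documentation for your release:

```yaml
metadata:
  name: ovs_topology_example                     # illustrative file
  description: statically declared physical entities for the entity graph
definitions:
  entities:
    - static_id: nic_1                           # hypothetical physical NIC on compute-0
      type: interface
      id: compute-0-eth1
      name: eth1
      state: available
      relationships:
        - target: host_1
          relationship_type: attached
    - static_id: host_1                          # existing Nova host the NIC is attached to
      type: nova.host
      id: compute-0
      name: compute-0
```

Once entities like these are in the graph, templates like the ones sketched earlier can correlate alarms on them with the virtual resources they affect.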
Questions?

Hello. On the slide where you demonstrated the use of the 'not', so that you could express the high availability: is it possible to do it so that instead of having two servers that are highly available, let's say you've got 15 or so, you can say if this one's down and five others are down, or half of the others are down? Is it possible to do that as well? Yes. OK, thank you.

Anyone else? Don't be afraid. Can it be plugged into any OpenStack, or only a specific version? Again? I mean, let's say OpenStack seven or eight: can you plug it into any existing deployment that you're already running, or does it have to come with a specific version? Vitrage is official from Newton, I think, but we also had Liberty and Mitaka versions. So I guess from about Mitaka you can have Vitrage. And again, it's an official OpenStack project, we work by the OpenStack rules, and it is available on GitHub.

So how does the deployment model look? I mean, is it a controller-side service, or do you also need some kind of agent, or are you good enough with Ceilometer or something that is already there? How does your topology look when Vitrage is deployed? With Ceilometer, you mean? No, I'm just asking: suppose I have an OpenStack and I want to deploy Vitrage. What goes, let's say, on the controller, or as a separate node? Where do the Vitrage services run, and how do they communicate with the agents or the monitoring data collection bits? So the Vitrage graph and the Vitrage API run on the controller side, as you said. We have the same thing in our system in Nokia. It monitors, of course, the computes, for example, with the instances on them and everything. Deploying it is very easy; it is the same as with other projects. You have services on the controllers, and you will have your Vitrage dashboard in Horizon. I think it's quite simple and intuitive, like other projects.

And currently, do you have packages, RPM or Debian, for it? Or how do I do it, from GitHub? From GitHub? For Vitrage, if I have to deploy it, are there binaries in RPM or Debian, or do I just pull it from GitHub? We have RPMs. OK. Yeah.

Another question? OK, thank you.