 Okay, good afternoon. We'd like to start. So my name is Iris Finkelstein. I'm with Nokia CloudBand, and I'm here today with my colleague Ohad Shamir, also from Nokia CloudBand, and with Gerald Kunzmann from DOCOMO, who's also a participating member of OPNFV. And what we'd like to talk to you about today is our project Vitrage and how we're collaborating with the OPNFV Doctor project. So just so you know how this is going to be set up, we're going to start out talking a little bit about the CloudBand NFV portfolio, just to give you a little bit of overview and background on what CloudBand is doing and how CloudBand has actually been transforming itself over the last couple of years to do more open source and be a major contributor towards these efforts. Going on to Gerald's part, we'll be talking about OPNFV Doctor, giving an overview and a little bit about the NFV use cases, and then we'll have a deep dive into the Vitrage project itself by Ohad, going on into how Vitrage is actually implemented and answers the Doctor project requirements, and finishing with the roadmap for Vitrage going forward, where we see it going. So talking a little bit about CloudBand, for those of you who don't know, CloudBand started about five years ago. Basically, we're an NFV management and orchestration portfolio of products. As I mentioned, we started five years ago, and that means that, at least in the NFV industry, we have a long history. We're very knowledgeable about this industry, with a lot of connections with customers and with partners and a lot of work with the standardization organizations. If we look a little bit into the Nokia cloud portfolio, then CloudBand is very well situated, with products covering everything from OSS management, element and network management, and virtual network functions both from Nokia and from third parties. 
So you can see that we have Nokia elements and we have third-party elements, and this is a very open platform, from the hardware all the way up to the service management OSS/BSS. In terms of the CloudBand portfolio of products specifically, we're talking about the NFVO, the NFV orchestrator, which is our CloudBand Network Director product; our VNFM, the VNF manager, which is our CloudBand Application Manager; and then the CloudBand infrastructure software, which is based on OpenStack and built in collaboration, of course, with Nuage Networks. So going forward with CloudBand, when we talk about how we're going to achieve the next-generation NFV goals, we're talking about VNF certification. This is a really important project going forward in terms of standardizing VNFs and making them cloud-ready and virtualization-ready. Of course, open architecture, and you can see the beginnings of that here; we're expanding this open architecture and making sure that it really fits the industry requirements. Open source, we're going to be talking a lot about open source in a second, so I won't go into that too much. Hardware acceleration on the infrastructure, obviously. And automatic orchestration. We're doing a lot of work on automation, and especially on automating our orchestration, so that when we talk to service providers, they have the best tools that they can get in order to be able to work and operate their networks in the most efficient way possible. One of the things that we've announced recently is the shared data layer, where we abstract information from our VNFs and take it to a higher level, again to increase automation and make getting the information easier for service providers. And Nokia virtual network functions are primarily built for NFV. 
So this is work that we've been doing together with other units within Nokia to ensure that everything is really focused on NFV and on bringing service providers the best operational experience. One of the things that we've been focusing on a lot over the last couple of years is building the ecosystem. And CloudBand has, if I'm not mistaken, the first ecosystem in the NFV industry; it counts over 60 companies and partners today, working together very widely, building use cases, and testing, validating, and certifying VNFs. And what we call lean ops, and I'll be showing you a demonstration of what exactly we mean by lean ops in a bit. So I want to take you back just a little bit, when we talk about CloudBand and Nokia, to how we're transforming ourselves to be very much within this world of open source. So if we go back a couple of years, we were all about IP focus, so very proprietary development. But this has changed over the years, and we see that we've begun using open source more, contributing to open source and going upstream. And this is really what we're here today to talk about. So just to give you a little bit of information about how we've been seeing things for the last couple of years. So back in the day, our brand differentiation was actually created through proprietary code. We didn't know anything else. We didn't realize that this could be changed, that this did not really matter, especially for the NFV industry. And of course our patents equaled huge value, and for us intellectual property was everything. And that's very corporate thinking, of course. And if we look at Bell Labs, Bell Labs is a part of Nokia today. So Bell Labs today has over 15,000 patent applications, eight Nobel Prizes, and more than 30,000 active patents. So that's very IP-focused, and that's still continuing today, together with our work on open source. 
So over the years we've made a strategic shift from a very IP-focused type of development into realizing, especially I think since CloudBand came along, that NFV has the potential to use open source to the greater advantage both of ourselves as a vendor and, of course, of the industry and the service providers. So we started using open source, and CloudBand has actually been using open source since the very beginning. And when you start using open source in a company such as ours, you have to overcome all kinds of hurdles: legal hurdles, various types of organizational change as you move forward and integrate this work on open source into your company. So organizational mindset and so on, you get the picture. And as I said, since day one CloudBand has actually been using open source. You can see a variety of projects that we've been working with and integrating into our code. And if we're talking current status, then we've been contributing heavily to open source over the last, I would say, year and a half. And I know for some of you this may not seem like big numbers, but for us specifically, if we look at Mitaka, and I think these numbers are a little bit outdated, then 60-something plus lines of code is a very big deal for us. So we've been contributing lines of code, contributing blueprints, and really working very heavily on this. So one of the things we realized after we started contributing code is that, again, that's not enough. We keep moving forward with this industry, and we're hearing from our customers that they're demanding open source. They want us to use this. They understand that this is in their best interest to achieve an open, innovation-forward kind of development. So at this point in time we decided that we need to go upstream, of course, and contribute at the very highest levels. And one of the things that we're doing especially in this respect is our project Vitrage, and we're going to go into that in a couple of minutes, actually. 
So we tried to think about what are the things that we've learned about working with open source over the last couple of years. I think these are the main points, and we're really trying to be very proactive about this: contributing, initiating projects, and really being a part of the community. And I think one of the main reasons that we're here today is really about integrating ourselves into the OpenStack community and becoming a part of it, and helping others become part of our projects as well. I've been getting a lot of questions over the last couple of days, when we're talking about Vitrage at our booth: what is the meaning of the word Vitrage, actually? And I understand that I have to explain it. So a vitrage is actually a stained glass window, and when you think about a stained glass window, you think about a lot of pieces of colored glass. When you look at them separately, they don't mean anything, but when you put them all together, you get this beautiful picture. And that's really the essence of Vitrage: gathering a lot of bits of information from lots of data sources, putting them together, and giving you a window, an actual vision and insight into your system. Okay, so I want to really quickly show you, if you remember, a couple of minutes ago I talked about what we call lean ops. So this is actually a visualization of a simulation that we created at the Cloud Innovation Center within Nokia. And we call it lean ops because its purpose is really to provide service providers with an understanding of how they can bring lean operations to their NFV systems. And what we're looking at here is actually a visualization of a cloud environment, a very sophisticated environment where we have both the virtual infrastructure and the physical infrastructure in the same place. 
And we can see all of this together, and that's really important, and you'll see later on for Vitrage especially why it's so important. So as I said, we have the physical infrastructure at the bottom; there we can see the servers and how they're connected to virtual environments. And if we go up, we can see the virtual infrastructure, and that's actually represented by city blocks, which represent VNFs or applications, and by buildings, which represent virtual machines or instances. Each one of these buildings has a different height, as you can see, and that actually represents the memory and CPU resources that are used by the virtual machine. So I'm showing this to you just to be able to give you a visualization of what we're talking about when we say Vitrage for root cause analysis. So if you think about the physical infrastructure that we see at the bottom, just imagine for a second that we have a failure in one of the switches in the physical infrastructure. So we have a connectivity problem in the physical switch, and that in turn leads to a connectivity problem in the physical host, and that in turn leads to the virtual machine being disconnected from the network, and that leads to the application or the VNF becoming disconnected from the virtual machine, and so on and so on. Now in this visualization you can just imagine that you have one switch that fails in one location, but what happens if you have hundreds of switches that are failing all over the place, and each of those in turn causes this chain reaction to the host, to the VM, to the VNF, and so on and so on? And network operators today don't have this information. They may have an alert on their physical switch, but they don't know what happens afterwards. They don't know what are the elements, or what are the entities, in their system that are affected by this problem in the physical switch. And if this happens multiple times, then this grows into a big, big, big problem. 
So that really is what Vitrage is all about. It's here to solve this problem, and one of the things that is really important to remember here is that with this visualization, and with Vitrage, we can see both the physical and the virtual infrastructure and connect them together, so we know which server is actually connected to which virtual machine, and that allows us to gain much more insight than we could otherwise. And with that, I think I'm going to hand it over to Gerald. Okay, so thank you. So I want to complete this picture a little bit, and I want to introduce the OPNFV Doctor project. I'm working for NTT DOCOMO, a telco operator, and you will also see why DOCOMO is interested in such a solution. Okay, so maybe you have seen this press release. NTT DOCOMO was the first operator worldwide that deployed a virtual EPC in a commercial environment, in March this year. So this is very new information. And this is actually powered by OpenStack, and this is also why we come here to this conference. Looking a bit more into detail at the requirements that a telco has, specific telco requirements: you can imagine that we need extremely high service availability. So in our virtual EPC we have the different entities, and each of these nodes hosts a few thousand subscriber sessions, and if one of these hosts or nodes goes down, that would mean all the users that are connected to this node will be disconnected from the network. Of course, that would be a negative impact on our customers that we want to avoid. But then there is also what comes after that: all these devices try to reconnect to the network, and they will all do so at the same time, because basically they all have the same timeouts internally. And this then will consequently result in an attach storm. So all the devices try to attach to the network basically at the same time. 
And this again, and we have experienced this also, is leading to further congestion in the network, to further failures in the network. So then the service is down for an even longer time. So most important for us is that if there is a failure, failure recovery must be as fast, as quick, as possible. And here we are talking about the sub-second order. We will talk a little bit more about how the architecture looks and how the failure notification comes into play here. So basically, in order to have really good service availability in a telco environment, usually you have this active-standby configuration, hot standby. So you always have the virtual network function active, which you can see here in the dark green, and the same machine in hot standby, the lighter green VNF running there, so that it can take over in the failure case. So imagine there is a failure in hardware. We need to find an efficient way, a fast way, to detect the hardware failure in the OpenStack environment, for example, and send out this notification. Then OpenStack will need to find out, okay, there is a failure in the hardware, but who is actually running on top of that hardware, which virtual network functions are running on this specific hardware? So whom should I inform about this failure? And then this information is reported to the VNF manager, who can do the switchover. So it will do some network configuration and activate the standby instance, so that this becomes active and the service is running again. So before the Doctor project was initiated, we did some initial testing, and the whole process, from detection, finding out the appropriate user, and then sending the notification up there, took on the order of minutes. That was not acceptable for us. And this is why we, among other companies, initiated the OPNFV Doctor project, and we tried to find a solution to overcome this problem. And here you can see the high-level architecture for NFV, a little bit different figure. 
So here on the left-hand side you can see the virtualized infrastructure, with the hardware resources at the bottom, the virtualization layer, the virtual compute, storage, and network entities, and the applications, the VNFs, that are running on top of it. You have the virtual infrastructure manager, the VIM, which is OpenStack, and you have the user of the VIM and the VIM administrator at the top right side. So basically there are two ways you can notify about the failure. The virtual entities, or the application, can also detect failures and report this to the administrator, and then you could do some reaction at the application level. There is a second way, because the virtualization layer is kind of hiding all the failures in the hardware from the application, and we also want to be informed about failures in the hardware very quickly. So the initial focus of the Doctor project is the other way: we go from the hardware resources through the VIM up to the user and administrator to do the reaction. In the Doctor project we identified that we need four different building blocks to solve this problem. We have a monitor entity, which is basically doing the failure detection. Then we have the inspector entity, which is doing failure aggregation, root cause analysis, and so on. We have a controller entity, and then finally we have the notifier, which is then informing the user side. As part of the Doctor project we have already made several upstream contributions to Nova, Neutron, and Cinder. For example, one of them is the state correction. Here you can see a screenshot of that blueprint we made. For the notifier we were extending AODH with an event alarm. And currently our focus is a bit more on the inspector side. Here Vitrage and Congress are two very good candidates for the inspector. Vitrage is here, and Ohad will explain this in a bit more detail soon. 
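The chain of the four Doctor building blocks just described (monitor detects, inspector maps hardware to affected VNFs, controller corrects state, notifier informs the user side) can be sketched roughly like this. All class names, method names, and the inventory mapping are illustrative assumptions for this talk, not the actual OPNFV Doctor or OpenStack code:

```python
class Monitor:
    """Detects a raw failure, e.g. a host going down, and reports it."""
    def __init__(self, inspector):
        self.inspector = inspector

    def detect(self, host):
        self.inspector.on_failure(host)


class Inspector:
    """Aggregates failures and maps failed hardware to affected VNFs."""
    def __init__(self, controller, host_to_vms):
        self.controller = controller
        self.host_to_vms = host_to_vms  # hypothetical inventory mapping

    def on_failure(self, host):
        affected = self.host_to_vms.get(host, [])
        self.controller.mark_down(host, affected)


class Controller:
    """Corrects resource state, then triggers the notification."""
    def __init__(self, notifier):
        self.notifier = notifier
        self.states = {}

    def mark_down(self, host, vms):
        self.states[host] = "down"
        for vm in vms:
            self.states[vm] = "error"
        self.notifier.alarm(host, vms)


class Notifier:
    """Pushes the alarm toward the VNF manager so it can fail over."""
    def __init__(self):
        self.sent = []

    def alarm(self, host, vms):
        self.sent.append((host, tuple(vms)))


# Wire the chain together and simulate a hardware failure on one host.
notifier = Notifier()
inspector = Inspector(Controller(notifier), {"host-1": ["vm-a", "vm-b"]})
Monitor(inspector).detect("host-1")
```

The point of the structure is the one Gerald makes: detection, correlation, state correction, and notification are separate concerns, so each can be optimized for speed independently.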
It's about the root cause analysis and the failure correlation. So I was showing here two blueprints that we submitted upstream, but there are more. We have a whole list of blueprints already accepted and completed for Nova and for Ceilometer. We're also working, as I said, on Nova, Neutron, and Cinder. If you're interested, just go to the wiki page of OPNFV. There you can find the OpenStack community page, where you can find all the blueprints that OPNFV contributed upstream. And that was my last slide. I'll hand over to Ohad to show a bit more detail. Thank you. So my name is Ohad. I'm a product manager in CloudBand, Nokia. I will go deep dive into Vitrage. Before that, I will keep talking a bit about the motivation, which I think we showed a bit in Iris's part. So systems today are getting more and more complex. There is a barrier between the physical and the virtual layers, and it's hard to know what the relationships between those layers are. There are many, many monitoring gaps. There is no one monitoring tool, either in OpenStack or an external monitoring tool, that can give you the whole picture. You need to get the information from several monitoring tools gathered together in order to build a complete view of your system. So it's really difficult to understand what is going on in your system. And of course, as Gerald said, we have a lot of telco requirements: sub-second alerting and action to recovery. And this all together comes to what Vitrage is trying to solve and to provide to the community. So what is Vitrage? The main three functions of Vitrage are: first, deduced alarms and states. Deduced alarms are alarms that are not directly observed, meaning raising alarms and modifying states based on system insight, and I will show an example in a minute. The second function that Vitrage provides is root cause analysis, to understand what was the root cause of a specific failure. 
So Vitrage can provide this insight, so you will know that alarm A causes alarm B causes alarm C. And the last main function of Vitrage is a holistic and complete view. So we gather the information from all the data sources. We're building all the entities in the cloud, across all the layers, from the physical to the virtual to the application layer. So we know all the relationships between the entities, and the relationships between those alarms, in order to give the user or the customer a complete and holistic view of the system. Vitrage is based on a resource topology engine that reflects all those relationships between entities and alarms. We support multiple data sources, and I will touch on this in a minute. We have configurable business logic, because different customers, different users, have different systems, and each system needs to be configured differently. And we have clear visualization of all the Vitrage insights. So let's see the Vitrage architecture and put it all together. On the left side, we can see the data sources. Currently, in Mitaka, we support the Nova, Nagios, static configuration file, AODH, Cinder, and Neutron data sources, but it's actually very easy to add more. It takes about two or three weeks for one developer to add a new data source. It's quite easy. We are planning to add more data sources in the future, like Zabbix and Monasca. The information from the data sources is injected and reflected in the Vitrage entity graph. So we take the information from the data sources and we represent it as a graph. In the Vitrage graph, the entities are the resources and the alarms; each entity is represented by a vertex, and the relationships between the entities are the edges. 
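This entity-graph modeling can be sketched with NetworkX, which (as mentioned later in the talk) is the in-memory graph library Vitrage uses today. The attribute names and values here are illustrative only, not Vitrage's actual schema:

```python
import networkx as nx

# Resources and alarms are vertices; relationships are edges.
g = nx.DiGraph()
g.add_node("switch-1", category="RESOURCE", type="switch", state="OK")
g.add_node("host-1", category="RESOURCE", type="nova.host", state="OK")
g.add_node("vm-1", category="RESOURCE", type="nova.instance", state="OK")
g.add_edge("switch-1", "host-1", label="attached")
g.add_edge("host-1", "vm-1", label="contains")

# An alarm coming from a data source (e.g. Nagios) becomes another
# vertex, connected to the resource it was raised on.
g.add_node("alarm-1", category="ALARM", type="nagios", name="link down")
g.add_edge("alarm-1", "switch-1", label="on")

# With this modeling, "which instances sit behind this switch?" is a
# plain graph traversal over the descendants of the switch vertex.
affected_vms = [n for n in nx.descendants(g, "switch-1")
                if g.nodes[n].get("type") == "nova.instance"]
```

This is why the graph representation pays off: both the deduced-alarm and root-cause questions reduce to traversals over one structure.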
So it's a very intuitive modeling that can bring the whole picture of your cloud, and it's very easy to use it to perform the Vitrage actions, like raising the deduced alarms and finding the root cause, because we have this modeling of the cloud. We have the evaluator and the templates. The evaluator is actually the logic of Vitrage. The Vitrage evaluator is listening to the changes in the Vitrage graph. So every time there is a new entity, a new vertex in the graph, and it could be a new alarm, it could be a new instance, then an event is raised. The evaluator listens to that event, and upon each event it retrieves the relevant scenario, evaluates it, and executes the necessary actions. The templates are very human-readable YAML files, and the templates are actually the scenarios. Each scenario has a condition and actions (there is also a definitions section, which I'm not getting into). In this example, the condition is an alarm on the host, where the host contains an instance, and the action is to set the state of the instance to suboptimal. So this is an example of a template, and Vitrage has out-of-the-box templates for all the common use cases, and it also provides the ability to edit those templates or to add more templates. So it's really configurable. Vitrage also has a notifier component to notify other projects of Vitrage insights. For example, if we know that there is a failure that affects a host or affects an instance, and we raise deduced states to change the state of that affected instance, we may want other projects to be able to take this information and act on it. For example, we have a notifier for Nova, a notifier for Cinder, a notifier for AODH. And last, we have the UI and API, and I will quickly show you what we have in the UI. In our UI, we actually have three main screens: one for the topology, the second one for the alarm list, and the third one for the entity graph. 
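The scenario just described (an alarm on a host that contains an instance, triggering a set-state action) might look roughly like this as a template. The field names follow the Mitaka-era Vitrage template format as I understand it, so treat this as an approximation rather than a verbatim Vitrage template:

```yaml
metadata:
  name: host_alarm_affects_instances
definitions:
  entities:
    - entity:
        category: ALARM
        type: nagios
        template_id: host_alarm
    - entity:
        category: RESOURCE
        type: nova.host
        template_id: host
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance
  relationships:
    - relationship:
        source: host_alarm
        target: host
        relationship_type: on
        template_id: alarm_on_host
    - relationship:
        source: host
        target: instance
        relationship_type: contains
        template_id: host_contains_instance
scenarios:
  - scenario:
      condition: alarm_on_host and host_contains_instance
      actions:
        - action:
            action_type: set_state
            properties:
              state: SUBOPTIMAL
            action_target:
              target: instance
```

The definitions section names the entities and relationships once, and the scenario's condition and actions then refer to them by `template_id`, which is what keeps the files human-readable.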
The first page is the topology representation. In this example, it's the topology of the compute resources, and it may be a bit hard to understand, but the inner ring is the Nova zone, the middle ring is the hosts (currently it's a DevStack, so we have just one host, but imagine that this ring is divided into a few segments, one segment per host), and the outer ring is the VMs that belong to that host. So we have the status of each resource, and we have the relationships. You can drill down, zoom in, and zoom out on this sunburst visualization, and on the left side you can see the information on the selected entity. So you can see the ID, the name, the state, and also all the alarms related to the selected entity. The second screen is the alarm list; in Vitrage we have alarms coming from all the different data sources, and you have all the information about each alarm. You can see in the right column the type of the alarm: it can come from Nagios, it can come from Vitrage (for example, the deduced alarms come from Vitrage), it could be from AODH, etc. And there is a link to the root cause analysis. This table can be filtered, etc. And this is the window of the root cause, and I will go step by step through a use case in a minute. But this is how the root cause looks: we have the uptime error on a switch causing the host connectivity alarm, which then causes the failure on each one of the instances belonging to that host. And the last screen is the entity graph. It's for seeing the graph itself: all the vertices, all the edges, all the relationships in the graph. So this is the Vitrage high-level architecture, and now I want to jump into a quick NFV use case: a switch failure. We don't have time to do a live demo, but I invite you to come to our booth; there is a quite similar demo there. So let's say we have a storage switch failure. We don't have redundancy; this is the only switch connected to the host. So first, we monitor the switch with Nagios. 
An alarm is raised when it fails, and Vitrage receives the alarm from Nagios and adds a vertex to the graph. We add a vertex for the alarm, and this vertex is connected to the switch entity. The Vitrage evaluator goes and finds the matching scenario for this failure, and it finds the template that says that if you have a failure on a storage switch that is connected to a host, then you have to perform several actions. The first action is to raise a deduced alarm on the host and add it, of course, to the graph; then to change the host state in Vitrage, so the host state would now be changed to error; and then to add a causal link between those alarms. So alarm number one, the alarm on the switch, causes alarm number two, the alarm on the host. Once the deduced alarm on the host is added, we do a similar process for the instances. Now we raise alarm number three on the instances, because we found a match with another scenario, another condition, saying that if you have a failure on the host, you want to raise alarms on the affected instances related to that host. So we raise alarm number three on instance number one, we modify the state of this instance, and we add a link between alarm number two and alarm number three. And finally, we do the same for the VNF. In this example, we modify the status of the VNF to suboptimal. I'm not sure that it will be completely in error; it depends on the configuration, it depends on the VNF. So in this example, the VNF will be suboptimal. So this is one use case that demonstrates what you can get from Vitrage: all the deduced alarms, all the connections between those alarms, and the correct status of all the instances. So this was a very short introduction to Vitrage and what we have today in Mitaka, and I want to spend another two minutes on the roadmap, what we are planning for the Newton release. So we plan, as I said, to add more data sources. 
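The causal links built up in this use case (switch alarm causes host alarm causes instance alarm) are what make the root cause query cheap afterwards. A minimal sketch of that idea, with made-up alarm names:

```python
# Each deduced alarm records the alarm that caused it, mirroring the
# causal links added step by step in the use case above.
causes = {
    "host-connectivity-alarm": "switch-failure-alarm",  # alarm 2 <- alarm 1
    "instance-io-alarm": "host-connectivity-alarm",     # alarm 3 <- alarm 2
}

def root_cause(alarm: str) -> str:
    """Walk the causal links upward until an alarm with no recorded cause."""
    while alarm in causes:
        alarm = causes[alarm]
    return alarm
```

Asking for the root cause of the instance alarm then walks two links back and answers with the switch failure, which is exactly the chain shown in the root cause window of the UI.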
We plan to do the integration with other OpenStack services, for example AODH, for example Nova, so that they will get the inputs from Vitrage and actually modify the status of their entities. We want to do alarm aggregation. In many cases, as you could see in the previous example, one failure causes a lot of alarms, and you may want to aggregate them into one group, so we want to add this alarm aggregation, filtered by several categories. We want to add advanced templates and use cases. Vitrage comes with out-of-the-box templates for the common use cases, and the user can add more templates, but we want to enrich those out-of-the-box use cases in the next release. And last, currently Vitrage is using an in-memory graph database; we are using NetworkX. It's great for DevStack, but going to NFV deployments you also need persistent graph database support, and we want to add it in the next release. If I'm talking about the connection to the Doctor requirements, you can see that there is a very good alignment between Vitrage and the Doctor requirements, especially the inspector requirements. Vitrage supports both pull and push notifications. That's crucial for the Doctor requirement of sub-second response, so we support it. Although I must say that there are some issues with the monitoring tools themselves, there could be delays in those tools as well, and we have to solve that, but Vitrage itself supports pull and push notifications. And we saw that the Vitrage entity graph and the templates support the mapping between the physical errors and the virtual layer. For each failure, we know what the affected resources will be, how to notify about them, and how to raise the alarms related to those entities. And last, we have the configurable business logic in Vitrage, so it's really configurable. So if I want to sum it up, Vitrage brings three main functions. One is the holistic view. 
The second one is enriching the alarms and the states of your cloud, of your system, and the last one is the root cause of each failure. Thank you very much. If there are questions, we can take a few. In Vitrage, you have the topology of the cloud; how do you create this topology information, and how do you maintain it to stay aligned with the OpenStack API calls? When you migrate a VM, can your topology information be updated? Yes, so every action should be represented in the graph. If you add an instance, it immediately creates an additional vertex in the graph; we add this resource to the graph. If you remove or migrate the resource, we change it accordingly in the graph, so we keep the graph updated all the time. We are listening to the Oslo bus messages, and also to notifications from other sources, so we keep Vitrage updated all the time. More questions? I have a question for the Doctor project. Can I assume that the Doctor project is going to use the VNFM to trigger the failover? Right, right. Did you ever consider other OSS tools, or just the VIM interface? For us it's very important that the VIM doesn't do any action on its own, because we want to keep the control on the user side. We do the handover, we switch to the standby instance, but then of course it will trigger also other recovery mechanisms: you want to repair that failure, maybe you want to create a new instance, a new standby. So it will also trigger other reactions that you will then request back to the VIM. Okay, thank you very much. You can get the Vitrage shirts outside. Thank you.