Okay, thank you for coming down to this room. We are going to talk about fault management with OpenStack Congress and Vitrage, based on the OPNFV Doctor framework. Let's go to the next slide. First, I'll introduce our work on fault inspection in the OPNFV Doctor project. Then I'll hand over to Ohad to provide more information on how to configure Vitrage, and how you can set it up so that failure events are passed to the OpenStack components or to your upper-layer application managers. After that, we also present a similar solution with Congress. So let's start.

This is the basic idea of a virtualized platform. We have physical machines and hardware switches, and on top of that OpenStack creates a virtualized environment: VMs (actually called instances or servers), virtual networks, virtual ports on the VMs, and volumes, and those are all connected. The operator or service provider can then offer services to their users. The difficulty in this situation is that once we have a failure, for instance a NIC failure, we have to identify which virtualized resources are affected by that fault in the infrastructure layer. That is very hard to investigate, and it happens often in cloud services, so most public cloud providers have their own tools or have built their own solutions for it. A further difficulty is how to define a fault or failure: it depends on the architecture, the application, and the back-end technologies used in the infrastructure. Sometimes there are redundant NICs or redundant hardware, so if one of the redundant NICs has a problem, the service stays alive because the back end is still active. So it is very difficult to define what counts as a failure. And sometimes the government has regulations, or the operator has its own policies, so the definition of failure has to be configurable. This is one requirement that we identified.

In the Doctor project, we defined a fault management architecture, which I already explained in the keynote. As I said there, Nova and Cinder already have the necessary APIs, and the Neutron work is still ongoing. These are APIs to correct the status of virtualized resources: for instance, a VM can be reset to error or active by an external admin tool or by the admin, and Cinder also has an API to correct the status of fault-affected volumes. Once someone, or some tool, sets that status, a notification is sent to Ceilometer and Aodh, which can then send an alarm to the upper layer. But to do that, we still need someone to put the failure or fault information into those controllers, and Congress and Vitrage can do that.

Here is more detail about the APIs. Nova has reset server state, with this POST URL, and it also has the force-down API for the nova-compute services. You might expect it to mark a forced failure; it is actually a bit different, but similar. In Neutron, one of my colleagues is proposing a new API to present the availability of a port to the user. And Cinder already has its reset API to set the status of volumes; a quick sketch of how these correction APIs are called follows below. So let's look at the two inspector modules, Vitrage and Congress. They have different characteristics, and both are worth knowing.
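As a concrete illustration of the correction APIs just mentioned, here is a minimal Python sketch of how an inspector might call them over REST. The token handling and endpoint URLs are simplified placeholders; in a real deployment you would authenticate through Keystone and use the service catalog, or use python-novaclient and python-cinderclient.

```python
import requests

# Placeholder values: in a real deployment these come from Keystone and the
# service catalog (or from python-novaclient / python-cinderclient).
TOKEN = "<keystone-token>"
NOVA = "http://controller:8774/v2.1"
CINDER = "http://controller:8776/v3/<project_id>"
HEADERS = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}


def reset_server_state(server_id, state="error"):
    """Nova: reset the state of a fault-affected VM (os-resetState action)."""
    url = f"{NOVA}/servers/{server_id}/action"
    return requests.post(url, json={"os-resetState": {"state": state}},
                         headers=HEADERS)


def force_down_compute(host, down=True):
    """Nova: mark the nova-compute service on a failed host as forced down.

    The forced_down flag needs a compute API microversion of at least 2.11.
    """
    url = f"{NOVA}/os-services/force-down"
    headers = dict(HEADERS, **{"X-OpenStack-Nova-API-Version": "2.11"})
    body = {"host": host, "binary": "nova-compute", "forced_down": down}
    return requests.put(url, json=body, headers=headers)


def reset_volume_status(volume_id, status="error"):
    """Cinder: reset the status of a fault-affected volume (os-reset_status)."""
    url = f"{CINDER}/volumes/{volume_id}/action"
    return requests.post(url, json={"os-reset_status": {"status": status}},
                         headers=HEADERS)
```

In the Doctor flow, an inspector such as Congress or Vitrage issues calls like these, and the resulting state changes are then notified through Ceilometer and Aodh up to the application manager.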
Ohad, please.

Thank you, Ryota. My name is Ohad. I'm a product manager in CloudBand, Nokia, leading Vitrage from the product side. In the next 15 minutes I will talk about Vitrage, the OpenStack project, and then about how we are integrated into the Doctor OPNFV fault management project.

A lot of people come to me and ask what the word "vitrage" means. A vitrage is a stained-glass window: each colored piece of the window means nothing by itself, but when you combine everything together, you get the whole picture. So what is Vitrage in OpenStack? Vitrage is an official OpenStack project for root cause analysis. Vitrage has three main functions. The first is to provide root cause analysis, to understand why faults occurred and what the reasoning behind failures is. The second is to raise deduced alarms and states. What is a deduced alarm? A deduced alarm is an alarm that is not directly observed but is deduced from insight into the system, and the same goes for a deduced state. And last, because of the way we store the information from the multiple data sources we have (I will show it in a minute), we provide a holistic and complete view, aggregating the relationships across the three different layers, from the hardware to the virtual to the application layer.

The architecture highlights behind Vitrage are: first, we have multiple data sources. We support a lot of data sources, and it is easy to extend and add more. We have an entity topology graph that reflects the relationships between the different entities in the different layers, and we have configurable business logic. Different systems and different customers have different needs, so we want the system to be configurable to fit each deployment.

So let's dive a bit into the high-level architecture of Vitrage, starting with the data sources. Here on the left, you can see the data sources supported in the Newton release. We support several OpenStack projects, the big core ones like Nova, Cinder, Neutron, and Aodh for telemetry, but we also support external tools like Zabbix and Nagios, and as I said, it is easy to extend. The next main component is the template. The template holds the business logic of Vitrage, and I will elaborate on it in a minute. We have the Vitrage dashboard, a Horizon plugin UI, to present all the insight coming from Vitrage, and I will present a short demo so you will see the UI screens. And last, Vitrage also has notifiers to notify other projects, which could be internal OpenStack projects like Aodh, or external systems, about the deduced alarms, the RCA, et cetera.

Now I want to get into the Vitrage business logic, the Vitrage template. A template contains three sections. The first section is the metadata: the name and the description of the template. The second section is the definitions: which entities are part of this template and what the relationships between those entities are. And maybe the main part of the template is the scenarios. A scenario defines the conditions and the actions that we want to take. It is written in YAML format, very human-readable, very easy to edit, and very easy to add more templates. Let's take an example of a template: host high CPU load. We have three scenarios in this template. The first scenario is to raise an alarm: when we have high CPU load on a host and the host contains instances, we want to take two actions. The first is to raise a deduced alarm on the instance, and the second is to set the state of that instance to suboptimal or error, depending on your configuration and your business logic. A rough sketch of what such a template looks like follows below.
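To give a feel for the structure just described, here is an illustrative sketch of such a template, written as a Python dictionary that mirrors the YAML layout. The section and field names below only approximate the real Vitrage template schema; treat this as a sketch of the metadata / definitions / scenarios split, not as the official format.

```python
# Illustrative Python mirror of a Vitrage-style template for the
# "host high CPU load" example. In the real system this lives in a YAML
# file; the key names here are approximations, not the official schema.
host_high_cpu_template = {
    "metadata": {
        "name": "host_high_cpu_load",
        "description": "actions to take when a host reports high CPU load",
    },
    "definitions": {
        # the entities that take part in the scenarios ...
        "entities": [
            {"template_id": "host_alarm", "category": "ALARM",
             "name": "host.cpu.high_load"},
            {"template_id": "host", "category": "RESOURCE",
             "type": "nova.host"},
            {"template_id": "instance", "category": "RESOURCE",
             "type": "nova.instance"},
        ],
        # ... and the relationships between them
        "relationships": [
            {"source": "host_alarm", "target": "host",
             "relationship_type": "on", "template_id": "alarm_on_host"},
            {"source": "host", "target": "instance",
             "relationship_type": "contains",
             "template_id": "host_contains_vm"},
        ],
    },
    "scenarios": [
        # Scenario 1: a high CPU alarm on a host that contains instances
        # raises a deduced alarm on each instance and degrades its state.
        {
            "condition": "alarm_on_host and host_contains_vm",
            "actions": [
                {"action_type": "raise_alarm", "target": "instance",
                 "alarm_name": "instance.cpu.performance_degraded"},
                {"action_type": "set_state", "target": "instance",
                 "state": "SUBOPTIMAL"},
            ],
        },
        # Scenarios 2 and 3 (the causal link and the host state) follow
        # the same pattern and are described next.
    ],
}
```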
Back to the example. The second scenario is to add the root cause analysis link, to link the hardware failure, the high CPU load on the host, with the instance error. And the last one is to set the host state. In this case it could be set to suboptimal; in other cases it could be that we call Nova to mark the host down, et cetera. This is how it looks in Vitrage templates: you can see these three scenarios written in YAML. If you can read it, it is very easy to understand, very easy to edit, and very easy to add more templates like this.

How does it work? We have another component in Vitrage called the Vitrage evaluator. The evaluator listens to changes in the entity graph, and upon an event it retrieves the relevant templates. We can have a lot of templates, so the evaluator has to find the relevant templates for the event, then evaluate the conditions in those templates and execute the actions. We use subgraph matching to do that: you can think of every scenario in the template as a small graph, and we have to find where this subgraph appears in the big entity graph. A toy illustration of that idea follows below.
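Here is a small standalone Python sketch of the subgraph-matching idea, using networkx. It is only a toy model of the concept: Vitrage's real evaluator works on its own graph driver and scenario representation, and the entity names and attributes below are made up for illustration.

```python
import networkx as nx
from networkx.algorithms import isomorphism

# A tiny slice of an entity graph: one host with a CPU alarm and two VMs.
entity_graph = nx.Graph()
entity_graph.add_node("host-1", category="RESOURCE", type="nova.host")
entity_graph.add_node("vm-1", category="RESOURCE", type="nova.instance")
entity_graph.add_node("vm-2", category="RESOURCE", type="nova.instance")
entity_graph.add_node("alarm-1", category="ALARM", name="host.cpu.high_load")
entity_graph.add_edge("alarm-1", "host-1")
entity_graph.add_edge("host-1", "vm-1")
entity_graph.add_edge("host-1", "vm-2")

# The scenario condition as a small pattern graph:
# an alarm on a host that contains an instance.
pattern = nx.Graph()
pattern.add_node("host_alarm", category="ALARM")
pattern.add_node("host", category="RESOURCE", type="nova.host")
pattern.add_node("instance", category="RESOURCE", type="nova.instance")
pattern.add_edge("host_alarm", "host")
pattern.add_edge("host", "instance")


def node_match(entity_attrs, pattern_attrs):
    # a pattern node matches an entity node if all pattern attributes agree
    return all(entity_attrs.get(k) == v for k, v in pattern_attrs.items())


matcher = isomorphism.GraphMatcher(entity_graph, pattern,
                                   node_match=node_match)
for mapping in matcher.subgraph_isomorphisms_iter():
    # mapping: entity-graph node -> pattern node; each match is one place
    # where the scenario condition holds and its actions should run
    print(mapping)
```

Each mapping found this way corresponds to one place where the scenario's condition holds, and therefore one set of actions (raise a deduced alarm, set a state, and so on) for the evaluator to execute.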
So now let's present a short demo of Vitrage. These are the Vitrage UI screens. You can see on the left that it is Horizon based, and we have four screens: the topology screen, the alarm screen, the entity graph, and the templates. Let's start with the first one, the topology screen. It is a hierarchical view of the computes. It is hard to see, but the inner ring is the OpenStack cluster, then the Nova availability zones, then the computes, the nova-computes, compute 01, and then we have two VMs on that compute. On the left side you can see all the information about the selected entity. It is very easy to see, when you have a failure, that there is something which is not green, so you can spot it.

Now let's move to the entity graph. The entity graph is the topology of the entire system, with all the entities from all the data sources. You can see at the top the OpenStack cluster, the Nova zones, the computes. At the bottom you can see the application, the Heat stack. The Heat stack has two instances, the application servers, connected to Cinder volumes, and again you have all the information for the selected entity on the left side.

Moving to the template view, here is the template list. We currently have five templates loaded in the system. In this case I selected the host NIC failure scenario, a different scenario from my first example. You can see the three sections: the entities, the relationships, and then the scenarios. The first scenario is that if you have a host NIC failure and the host contains instances, we want to raise an alarm on the instance and then set the state of the instance; then we add the root cause link, and last we set the state of the host. So it is easy to see what each template contains and what actions will be executed once we find a match for this template.

Now what we are going to do is simulate a host NIC failure. We have Zabbix installed, so we simulated a host NIC failure in Zabbix. This is the Zabbix screen, and once we got the failure, you can see we have a public interface down. If we come back to the Vitrage screen and go to the alarm list, we can see four alarms. But why did I get four alarms? I raised only one alarm in Zabbix. You can see that I have one alarm from Zabbix, the last one, but I have three additional alarms raised by Vitrage. These are the deduced alarms: two alarms on the VMs hosted on that host, and the third alarm is on the application. How can we understand the relationships between those alarms? I can go to the root cause analysis diagram for this specific alarm, and then I can see very clearly that the application error is caused by the VM network problem, which is caused by the host NIC failure. And last, I will go back to the topology view. In the hierarchical view, which we call the sunburst, it is very easy to see that we have a section that is not green; in this example it is yellow. Then you can drill down, figure out what happened, and go and troubleshoot your failure. And if I go back to the entity graph, I can see the relevant instances that are now in yellow, those four alarms in red, and all the relationships between them. So this was a short demo of Vitrage.

Now, after this very brief overview of what Vitrage does, I want to talk about Vitrage and Doctor OPNFV. As Ryota explained, Vitrage, together with Congress, is one of the reference implementations for the Doctor inspector. Vitrage uses push and pull interfaces for the various monitoring tools, and because of those mechanisms we have very fast failure notification. The requirement is 500 milliseconds to do the switchover, and Vitrage can support such fast failure notification. We have the mapping between the different layers: the hardware layer, the virtual layer, and the Heat stack layer. Vitrage exposes more faults: we enrich the faults and the statuses of the system. For example, in the demo I just showed, if you go to Nova you will see that all the VMs and all the computes are up and running, although there is a NIC failure, so Vitrage adds additional information and additional alarms for the user. We provide the root cause analysis indication to the application manager, because you can get the same failure but want to take different actions depending on the root cause. And last, as I presented, Vitrage is very configurable: you can adjust the templates to your own system and your own configuration. Thank you. I will hand over to Masahito to present Congress.

Thanks, Ohad. I'm Masahito from NTT, and I'm also working on the Congress project, which is one of the Big Tent projects in OpenStack. Before starting my part, I want to do a quick survey about the Congress project. Please raise your hand if you know the name of the Congress project. Oh, Congress is more famous than it was at the Austin Summit. And next, if you have used Congress before, or you are using Congress now, please raise your hand. Oh, only one person is using Congress now. So it looks like half of us know what Congress is, so I should explain what Congress is first.

Congress is governance as a service. Congress enables cloud administrators to define and enforce their policies for cloud services. Now I imagine you have a question: what is the meaning of "policy" that Congress can manage? The meaning of policy varies based on your background: laws and regulations, business rules, security requirements, application requirements, and so on. The goal of the Congress project is managing any policy for any service. This is a quick overview of the Congress project. Congress is roughly divided into three parts. The first part is the API, which is shown at the top of the boxes.
The cloud administrator can define their policy via the API, and services outside of Congress can push their information to Congress. The second part of Congress is the data source drivers. A data source driver is in charge of collecting data from cloud services. In this slide, Nova and Neutron are in the cloud, and the Nova data source driver and the Neutron data source driver pull the data from Nova and Neutron. In addition, the monitor process in Doctor pushes its data via the API to the Doctor data source driver, and the data source driver keeps it. Finally, the third part of Congress is the policy engine. The policy engine calculates policy violations based on the policy defined by the cloud administrator and the data collected by the data source drivers. In this case, the policy engine detects some error in Nova, so the policy engine says: hey, Nova data source driver, to fix the problem in Nova you should call this API on Nova.

This is the data flow of Congress in the Doctor project. First, the monitor process outside of Congress notifies the hardware failure event to Congress. Then the Doctor data source driver receives that event and inserts it into the event list of the Doctor data. Then the policy engine receives the failure event and evaluates the policy registered by the administrator for state correction. The state correction is shown as number four: the policy engine instructs the Nova driver to perform the host service force-down API and to reset the state of the VMs via the Nova API.

This is the detail of the schema in the Doctor driver and an example of the event that the monitor process notifies. The upper box shows the schema of the event notified by the monitor process, and the lower box shows an example. ID is an ID assigned by the monitor process. Time is when the event happened. Type is what kind of event happened. Hostname is where the event happened. Status is the status of the event. Monitor is the name of the monitor process which reported the event. And monitor event ID is an ID used inside the monitor process.

So this is a quick overview of Congress and how we use the Congress project in the Doctor framework. Now, how do we use this architecture and data? I will explain next the policy used in the Doctor project, because the policy is one of the most important things for Congress. If we state the policy used in the Doctor project in natural language, we can say: a hypervisor, and the instances on that hypervisor, must be in down status or error status if some error is reported by a monitor. Very simple, I think. If Congress could handle and understand natural language, it would be that simple, but Congress cannot understand natural language, so we have to translate that policy into a Congress-style policy.

So, from now on, I will explain the Congress-style policy. First, we have to list the hypervisors that violate the policy; violating here means still in the "up" state even though some error has been reported on that hypervisor. The upper box has the list of rules which we have to define in Congress. Second, we have to define the instances that violate the policy; in that case, violating means still in active status even though some error has been reported on the hypervisor that the virtual machine runs on. Counting them, 1, 2, 3, 4, 5, 6, 7, 8: with these eight lines of policy, we can list the hypervisors and instances that violate the policy. A sketch of the event format and of rules of this general shape follows below.
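To make the event format and the "list the violating resources" idea more concrete, here is a hedged Python sketch. The payload fields follow the schema described above (ID, time, type, hostname, status, monitor, monitor event ID), but the Congress endpoint, datasource ID, table names, and the Datalog rule text are illustrative approximations, not the literal Doctor driver definitions.

```python
import json
import requests

# Example event a monitor might push to the Congress doctor datasource.
# Field names follow the schema described in the talk.
event = {
    "id": "0123-4567-89ab",            # ID assigned by the monitor
    "time": "2016-10-27T09:18:09Z",    # when the event happened
    "type": "compute.host.down",       # what kind of event
    "hostname": "compute-1",           # where the event happened
    "status": "down",                  # status carried by the event
    "monitor": "zabbix-sample",        # which monitor reported it
    "monitor_event_id": "111",         # ID used inside that monitor
}

# The URL, datasource ID, and token below are placeholders; the exact push
# endpoint is defined by the Congress API, so check its documentation.
CONGRESS = "http://controller:1789/v1"
DOCTOR_DS = "<doctor-datasource-id>"
requests.put(f"{CONGRESS}/data-sources/{DOCTOR_DS}/tables/events/rows",
             headers={"X-Auth-Token": "<keystone-token>"},
             data=json.dumps([event]))

# Datalog rules of roughly this shape (illustrative, not verbatim; the
# table and column names are approximate) list the hypervisors still
# reported "up" and the instances still ACTIVE even though the monitor
# reported an error on their host:
POLICY_SKETCH = """
host_down(host) :-
    doctor:events(hostname=host, type="compute.host.down", status="down")
violating_hypervisor(host) :-
    host_down(host), nova:hosts(host_name=host, state="up")
violating_instance(vm) :-
    host_down(host), nova:servers(id=vm, host_name=host, status="ACTIVE")
"""
```

The real rule set on the slide accomplishes this in eight Datalog lines; the sketch above only shows their general shape.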
Next, Congress in the Doctor project is supposed to call the Nova API to change the status of the hypervisor and the virtual machines, so we have to define in Congress how to fix the violation. The upper box shows the rule for how we call the mark-host-down API for the hypervisor, and the lower box shows how we call the reset-state API for the instances which violate our rule.

I think you may have this question: how do we define whether the event I mentioned before is a failure or not? Next, I will show you two scenarios for how we define which events count as a failure. The first scenario is for sensitive applications or sensitive operators. Such an operator defines both events that do not break the system and events that do break it as failures. In this case, the mark-down events are listed: the operator thinks host NIC 1 down, host NIC 2 down, and host CPU high load are all failures, because these kinds of events could cause a failure of the system, so all three events are listed as failures. The second scenario is for insensitive applications or insensitive operators. They do not think CPU high load causes any failure for the system or the applications. In that case, the only difference between scenario 1 and scenario 2 is whether that event is defined or not, so we define only host NIC 1 down and host NIC 2 down as the events that count as failures. Like that, it is very simple to change our policy in the Doctor framework. That is a feature of Congress. That's all.

We have five minutes for questions if anyone has one. Please go to the mic on the side and use the microphone. It's a bit far, sorry, but can you go to the mic because of the recording? Thank you.

Hello. How can I rapidly deploy it or use it?

You're asking how to deploy Congress and Vitrage? OK. So Vitrage is an OpenStack project. We have a Puppet installation, we have a DevStack installation, and I think we have all the documentation in OpenStack.

For the Congress project it's the same, because DevStack is supported in Congress, and I think the deployment projects also support Congress. And also, we have the OPNFV releases. We have just released the third release, the Colorado release, and Congress is in it; next, we might have Vitrage as well. So it's very easy to deploy with it. It's not a single step, but it's much easier than setting it up and integrating it by yourself. More questions?

I just forgot to mention that we have a full demo of Vitrage in the Nokia booth, so if someone wants to go into the details, please come to the Nokia booth.

And there is the OPNFV booth, where you can get more information about the Doctor project, so please come to our booth and feel free to reach out to us. Doctor is defining the framework, but it still relies on those upstream projects, like OpenStack Congress and OpenStack Vitrage. So please reach out to us; we can answer your questions. Thank you. Thank you very much. Thank you.