Okay, thank you very much. Today we are going to talk about the lifecycle operations of virtual network functions in a multi-region, multi-cloud environment. I'm Noa, one of the R&D directors of CloudBand, a business unit of Alcatel-Lucent, and with me are Limor and Michal, senior architects on the team.

So, CloudBand is a business unit within Alcatel-Lucent. Alcatel-Lucent provides solutions for NFV: we provide NFV infrastructure, NFV orchestration, and virtual network functions that can run on top of any cloud. CloudBand has been around for about four years and focuses on the CloudBand platform, which addresses NFV needs. We have a strategic partnership with infrastructure vendors like HP, Intel and more. We have an ecosystem with more than 50 vendors that onboard virtual network functions, test them, and make sure they are working properly. We are active players in the open source area: we contribute to OpenStack, to OASIS and to other areas, which keeps us as open as we can be and up to date with the latest features. We have around 150 engineers in our business unit, and we are located in Israel. So that's it with the marketing stuff, to give you context. Now let's dive into the challenges and the solutions that we provide.

Okay, so every one of us relies on telco applications to stay constantly reachable and accessible. We've grown used to the fact that we can call whenever we want, whomever we want, wherever we may be. But think about what this entails. Think about the order of a million human work hours that were dedicated to making these applications as reliable and dependable as they are today. All of this work, though, was dedicated to making these applications in a physical, boxed world. As we can see here, every telco application in its physical form was built as a special box which contained both the software and the hardware. The software was written for, and ran very closely integrated with, the dedicated and expensive hardware. This fact forced the service providers to think hard when, for example, they had to scale an application: they had to anticipate ahead of time how many of these boxes they would need to make the scale cost-effective, since every one of these boxes had a financial cost for the hardware itself, and on top of that you need a technician who specializes specifically in this box to come install it and later maintain it through the application's lifetime. This is what's driving the NFV transition we're seeing now. All the service providers are trying to move forward and take the network functions they run and make them virtual, VNFs, all while keeping them at the quality they have today. So they're trying to redeploy all of these services on a cloud architecture which will let them run on a common IaaS layer with shared compute, network and storage resources. On top of that, we're also going to have a management layer so that all of these VNFs can have their lifecycle steps run automatically and generically.
So when we're talking about application lifecycle management, there are a few steps that we're all familiar with, of course. The first step is onboarding the application and testing it. After that, we go to deploying the application, which on a cloud platform should be as automatic as possible. Later on, once the application is running, we need to scale it, maintain it, and heal it all the time. This is done while we constantly monitor the application in all the layers we can, so that we always know its real state. And of course, later on, we terminate it. Today we're going to focus on four main steps: deploy, scale, monitor, and heal. And I'm going to start with the deploy phase.

So when we look at deploying a VNF, the main problem we have, the main requirement we have, is distribution. For VNFs to keep their service level agreements, they have to be very massively distributed. For comparison, if you look at the Amazon cloud, they have nine data centers all over the world which host all of their applications. But the telco applications in their physical form, some of them had to be deployed with an instance in every city they operated in. To get this in OpenStack, we have three options we can look at. We have the single-region option, which is a local data center, so not a lot of distribution there. We have the multi-region option, which has multiple regions all managed by the same Keystone for authentication and sharing the same Horizon. This brings us a lot closer to the deployment requirements we want, but there are still some limitations here. The first one is that Keystone, when we want to run it in HA on different nodes, might become a problem. The second one is that if we want to actually have a very massively distributed application, we usually want a policy placement engine that will tell us how to deploy it optimally, on which node we should put each resource, rather than deciding it on our own. So what we're doing in CloudBand is a little different. We have a separate instance of OpenStack in every data center, and we've developed a top-level management layer which looks at everything from the top and sees all of the nodes and all of the resources; once a person wants to deploy the application, he can tell it how he wants to divide it, and it will tell him which data centers are best for deploying the resources. We have the management layer in the other options as well, but this is where it becomes really important.
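As a rough sketch of the multi-region idea described above (an editorial illustration, not CloudBand's actual placement engine), the openstacksdk can open one connection per region behind a shared Keystone and split each tier's servers across the chosen segments by percentage. The cloud name, region names, IDs and the naive rounding logic below are all assumptions for illustration.

```python
import openstack

# Hypothetical segments (regions behind one Keystone) and their shares.
SEGMENTS = {"Vancouver": 0.5, "TelAviv": 0.5}

def region_connections(cloud_name="cloudband-demo"):
    """One openstacksdk connection per region, all authenticated by the same Keystone."""
    return {region: openstack.connect(cloud=cloud_name, region_name=region)
            for region in SEGMENTS}

def place(tiers, segments=SEGMENTS):
    """Naively split each tier's server count across segments by percentage."""
    plan = []
    for tier, count in tiers.items():
        remaining = count
        regions = list(segments.items())
        for i, (region, share) in enumerate(regions):
            # The last region absorbs any rounding leftovers.
            n = remaining if i == len(regions) - 1 else min(remaining, round(count * share))
            plan.extend((tier, region) for _ in range(n))
            remaining -= n
    return plan

def deploy(tiers, image_id, flavor_id, network_id):
    """Create each planned server in its target region (IDs assumed valid per region)."""
    conns = region_connections()
    for idx, (tier, region) in enumerate(place(tiers)):
        conns[region].compute.create_server(
            name=f"{tier}-{idx}",
            image_id=image_id,
            flavor_id=flavor_id,
            networks=[{"uuid": network_id}],
        )

# Example: three tiers with one server each, as in the demo.
# deploy({"tier1": 1, "tier2": 1, "tier3": 1}, IMAGE_ID, FLAVOR_ID, NETWORK_ID)
```

In the real product, image and network IDs differ per region and the placement decision comes from the policy engine, not from a fixed percentage table.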
So I'm going to show a small demo here of deploying an application. I'm just going to log in to the screen here. This is the CloudBand GUI. This application was already onboarded, so I'm going into the catalog here and we're going to see it in a second. So this is an application called VNF demo; that's what I call it. It has three servers and each server has two storage volumes; we're going to see all of this soon. I'm choosing here the networks and the images. You can see here that they're all distributed across all the data centers, and I'm going down to the segments section, where I'm going to have it distributed. So I'm going to add a segment here. The first segment is going to be on CloudBand in Vancouver, right here, and I'm going to add another one in Tel Aviv, and we're going to divide the tiers of the application between the segments. We're going to choose 100% because we have one server per tier here, so it doesn't really matter, but you can do it in whatever percentage you want, and this enables you to run an application in HA on different nodes. So it's useful. So I'm going to deploy it now. It's going to deploy; we're creating the stack in OpenStack while we speak, and we're going to see it later when it finishes. I just want to show you here the topology of the application. I'm going to expand it. You see here that we have three tiers, each of them has one server and two volumes attached to it, and you can see down below on the left side that we have two of them in Tel Aviv and one of them in Vancouver; that's the distribution view. So let's go forward. We'll come back to that later.

Sorry. Let's talk about scaling. As I already mentioned, virtual network functions are in a transition phase between the physical world and the virtual one. As a result of this phase, can you hear me? As a result of this phase, we still have some legacy code that was running on dedicated hardware and is now imported to the cloud. This legacy code relies on a specific physical configuration, and as a result both scale-in and scale-out rely on restricted rules. For example, the next VM in a scaling group can be different from the previous one: it can require a different image, it can require a different configuration. Furthermore, another critical requirement is controlling the scaling order, or sequence. Achieving those capabilities in OpenStack today is very challenging, because it does not support distinguishing between different resources in a scaling group or controlling the scaling sequence. To solve this problem in CloudBand, we are today using stack update and resource groups. This lets us control the parameters of each instance in a scaling group, but it also means we are not using scaling groups and not enjoying everything they give. We cannot enjoy, for example, auto-scaling on a Ceilometer alarm, and sometimes a stack update can take a long time, which affects the duration of a simple scaling event. Moving forward, we are now presenting a set of blueprints to give these capabilities to Heat. These blueprints include features like adding an index and per-instance parameters to the scaling group.
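A minimal sketch, assuming python-heatclient and a Heat template kept as a plain dict, of the update-stack workaround just mentioned (resource names, IDs and the session setup are hypothetical, and the real CloudBand flow is more involved): each scaled instance is an explicit OS::Nova::Server resource with its own image and configuration, and scaling out means adding one more resource and pushing a stack update.

```python
from heatclient.client import Client as HeatClient

def scale_out(heat, stack_id, template, existing_names, image_id, flavor, network):
    """Add one explicitly named server to the template and push a stack update.

    `template` is the current stack template as a dict; `existing_names` lists
    the server resources already in the scaling set, so the next index is known.
    """
    index = len(existing_names)
    name = f"scaled_server_{index}"
    # Unlike a member of a generic scaling group, this resource can carry its
    # own image, flavor and metadata, which legacy VNF code often requires.
    template["resources"][name] = {
        "type": "OS::Nova::Server",
        "properties": {
            "image": image_id,      # may differ from the previous instance
            "flavor": flavor,
            "networks": [{"network": network}],
            "metadata": {"scaling_index": index},
        },
    }
    heat.stacks.update(stack_id, template=template)
    return name

# heat = HeatClient('1', session=keystone_session)   # session setup omitted
# scale_out(heat, STACK_ID, current_template, ["scaled_server_0"],
#           IMAGE_ID, "m1.small", "private-net")
```

The downside is exactly the one described above: a full stack update can be slow, and you lose the built-in, Ceilometer-driven auto-scaling of a real scaling group.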
Who wants to talk about monitoring? Yay, monitoring! We are now going to talk about monitoring, especially fault monitoring. When talking about monitoring, there are two fundamental issues. One, time is money: I want to find and fix the problem as fast as I can. The second one: at the end of the day, every cloud application is sitting on a bare metal node. That means we actually have three layers of monitoring: the physical layer, the virtual layer and the application layer. Each layer can generate tons of alerts, and one system event can generate a lot of events across the different monitoring layers. I know that knowledge is power, and sometimes I need every bit of information to solve the problem, but too much data can create chaos and hide the real problem. For example, if I have a bunch of alerts about VMs being down, I can miss the one alert about the host being down, and not understanding the relationship between the host and the VMs will affect how I find the problem. For monitoring the different layers we have different tools: we have Nagios and Ganglia for the physical layer, we have Ceilometer for the virtual one, and we have Monasca and various built-in tools for the application. CloudBand collects all the alerts, analyzes them, and visualizes the relationships between the different layers and the correlations between the alerts. You can see over there, at the top, the view of the relationships between the different layers: every data center is actually a slice, and on top of that you have the hosts, availability zones and the VMs. Here you can see the connection between the different alerts. We can see that it starts from here: you have memory load that leads to a host falling down, which creates a lot of VM alerts, which will affect my tier and eventually cause a failure in my application.

So, continuing from the root cause analysis in monitoring that Limor just discussed: once a problem has been detected, it has to be healed by the management layer, in our case CloudBand. It doesn't matter whether the problem has already affected the applicative layer or will affect it soon. Let's take one example of a problem that is about to affect the applicative layer but hasn't yet. Imagine a fan is malfunctioning on a physical host. What is likely to happen is that the host will overheat and shut down, killing all the VMs hosted on that host. A favorable healing solution for such a problem is migrating all the VMs to a different host. The placement of such a host cannot be chosen arbitrarily; it has to comply with certain SLAs, service level agreements, agreed upon between the telco vendor and his customer, the service provider. Let me show you: such an SLA can include, as specified here, zero downtime or data backups, to make the service as smooth and continuous as possible for the customer. For example, let's take a service where this is very important: imagine you are making an emergency 911 call and your call doesn't get through because the service is undergoing migration. Clearly such behavior is unacceptable for this kind of service.

Healing challenges: today OpenStack lacks sufficient support for automatic healing. One of the reasons for this is that healing flows are sometimes very complex. If we take the migration example, we can make it even more complicated by adding a demand that, as much as possible, all applicative processes on the hosted VMs be gracefully shut down. Another requirement can involve reservation of IPs or reattachment of block storage. In a distributed environment there are even more challenges: the distance between the original placement of the host and the target location can be very big, and that can be problematic if you want the healing process to happen fast enough. So the way you can help the process be fast is to do some preparation steps in advance. For example, as you might have noticed from what Noa said, in CloudBand we distribute all images in advance, and we make it very easy for the user: he can deploy the image to any OpenStack node and we distribute it from there all over the system, so it doesn't have to go through our management layer. Another example is the creation of virtual networks. A deeper understanding of what the problem is will always give you a more optimal solution. Clearly, migration isn't always a solution, and another favorable approach is redeploying the resources somewhere else.
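To make the fan-failure example concrete, here is a hedged sketch (not CloudBand's actual healing logic) of the migration step such a management layer might trigger with python-novaclient: list the VMs on the suspect host and ask Nova to live-migrate each one away before the host dies. The session object and host name are assumptions; a real SLA-aware flow would also pick the target host according to the placement policy rather than leaving it to the scheduler.

```python
from novaclient import client as nova_client

def drain_host(session, failing_host):
    """Live-migrate every VM off a host that is expected to fail (e.g. a fan fault)."""
    nova = nova_client.Client('2', session=session)
    servers = nova.servers.list(search_opts={'host': failing_host,
                                             'all_tenants': 1})
    for server in servers:
        # host=None lets the Nova scheduler choose a target; pass an explicit
        # host here if the placement policy has already chosen one.
        nova.servers.live_migrate(server, host=None,
                                  block_migration=True,
                                  disk_over_commit=False)
    return [s.id for s in servers]
```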
To make it more optimal, you can use tools such as the root cause analysis Limor described, because when you know what the problem is, you're more likely to find a better, quicker, more optimal solution, adapted to fix your problem. If right now you lack the option to find the root cause of your problem, there is another approach, based on a gradual treatment process that tries different healing solutions in escalating severity. Here is an example: once a problem has been detected, you go to the lightest solution possible, the one that is quickest. So this is an example of such a flow for an application. If you find a problem in your applicative layer, or somewhere you don't know, you can start with the applicative layer and simply restart the process. That will solve your problem if it was a software problem, like a deadlock that was pushed in by some careless developer. If that worked, wonderful; if it didn't, you can go on to the next solution at a higher severity: you can try to reboot the VM. If that worked, excellent; if it didn't, you can try redeploying the VM somewhere else, and if all else fails you can go with the old-world approach and just call a human.

So what we are going to show in our demo very soon is a solution based on two OpenStack tools: Convergence and Mistral. Convergence is a set of blueprints in the Heat project aimed at fixing a stack whenever it differs from its template for some reason, usually a failure. For example, if a server is in error and it should be up according to its template (because obviously you don't want to deploy a server in error), Heat should fix it in some way and get it back up. In Kilo the first phase was already pushed, and today it doesn't include any fixing capabilities; it only has the option to do an API call of check stack, and if the stack is different from its template, change the stack state. Mistral is the workflow engine of OpenStack; it allows you to write complex healing flows that fit your application as well as possible. So let's see the Mistral workflow. This is our Mistral workflow, written specifically for this demo. What it does is simply check every minute whether the stack is okay, and if it's not okay, we update the stack with update stack, which is an API in Heat. We check whether it's okay with the convergence tool that was just described.
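The workflow in the demo is written in Mistral's YAML DSL and triggered by a cron trigger; as a rough Python approximation of the same check-then-update logic (assumed session setup and polling interval, using python-heatclient), it runs the stack-check action every minute and, if the check marks the stack as failed, re-applies the existing template so Heat recreates the broken resources:

```python
import time
from heatclient.client import Client as HeatClient

def watch_and_heal(session, stack_id, interval=60):
    """Poll a stack's health and heal it by re-applying its own template."""
    heat = HeatClient('1', session=session)
    while True:
        heat.actions.check(stack_id)    # asynchronous stack-check API call
        time.sleep(10)                  # crude wait for the check to finish
        stack = heat.stacks.get(stack_id)
        if stack.stack_status == 'CHECK_FAILED':
            # Re-apply the existing template; during the update Heat recreates
            # failed resources (e.g. the destroyed volume in the demo).
            heat.stacks.update(stack_id, existing=True)
        time.sleep(interval)
```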
So the demo will start shortly; I will explain the rules, and you can scan around. What we are going to do now is demonstrate how we automatically heal the application, so I need the audience's assistance here. I need everyone to take out their cell phones, open the browser and go to this address. You will be asked to enter your name and company name, and then you will get the game view, where you will have to hit those small Linux penguins. Every time a blue one comes out, you need to hit him on the head. Every penguin represents a VM or a volume in our system; you will see it happening live. We have a scoreboard, and the winner will get a North Face fleece with the CloudBand logo on it. So I will give you 20 minutes to go in (20 seconds, sorry), and I am switching to the scoreboard view with the runtime view. I will show you the address again. You can start playing, by the way; I think you already can. Okay, now we can see the players; I will maximize the screen. Okay, so you can see David from CloudBand on the scoreboard. We are not giving prizes to CloudBand, so make sure you are not winning. We also have a second place. The minute one of the VMs or the volumes gets 500 hits, sorry, 200 hits in total, it will go down, and then we will see the Mistral workflow automatically heal the VM or the volume. So I think it will take about another half a minute; one of the volumes already has 150 hits. Okay, game over. So we have a winner, Dawn from Sandvine. Who is it? Come to the stage, we will give you the fleece. You can see in the meanwhile that volume number 3 over here was hit; you can see it turned yellow. Now the workflow engine runs in the background. Now it became red: it detached and destroyed the volume, and in a minute the workflow will heal it, create a new volume and attach it to the VM. It takes between one minute and one and a half minutes. You can come to the stage, I will give you the fleece in the meanwhile. So, as we said earlier, there is a cron job in Mistral, and what it actually does is check the stack in the background; if it recognizes a failure, it runs an update stack. The update stack actually fixes the problem in the topology and creates a new volume. Sometimes it takes more than a minute; in the meantime, try to refresh, maybe. Okay, and the new volume became live: you can see a new volume was added to the system and connected to the server, and the system can continue to run. So thank you very much.

Just to conclude: we walked through the lifecycle operations of a virtual network function, and we talked about the requirements and the challenges that exist today in OpenStack, and how we address them, around deployment, scaling, monitoring and healing. We have a follow-up session 10 minutes from now about virtual EPC, the challenges around there and how we handle root cause analysis and monitoring, so you are welcome to stay. And for questions, if anyone has a question, feel free to ask.

Do you want to answer that? The problem wasn't the different components; the problem was that we need different parameters or configuration in each one. It can be, like, two VMs that have the same image or something, but they need different configuration. So for a virtual network function, today's scaling group doesn't really represent the VNF's scaling group, because the fifth VM is not necessarily similar to the fourth VM; it might include a different image, a different configuration. That's why today's scaling group does not address this need. So, as Limor mentioned, we solve that by working with a resource group, or even with workarounds like update stack, preparing the scaling options in advance and doing update stacks, and we
have two blueprints that are now in progress for adding those capabilities to the scaling group. Okay, thank you very much.