Okay, so I'm Alex Glickson and I'm here with Ezra Silvera. We are from IBM Research in Haifa. We are excited to be here at the summit, and we would like to thank everyone who is attending this session. We're going to talk about management of heterogeneous cloud infrastructure, and more specifically about managing workloads that comprise virtualized and bare metal resources. The presentation has two parts: I'll present the first one and Ezra will present the second. In the first part, I'll give a brief overview of heterogeneous clouds: why we need them, a bit of motivation, opportunities, and challenges. Then we will go into more technical detail on managing workloads with bare metal and virtualized resources in OpenStack. And at the end, we will be glad to answer your questions. So let's start with some background and motivation. First, a few words about the evolution of the usage patterns of infrastructure-as-a-service clouds. The value proposition of the cloud, as all of you probably know, is about agility, efficiency, elasticity and so on. Those reasons have been driving cloud adoption since the early days and until now. In terms of the main usage scenarios, it started with relatively simple dev-and-test scenarios and what can be called cloud-born web applications. The requirements that those applications had were also relatively basic. If we look, for example, at the Amazon EC2 service in 2006, it had just one standard instance type, which then started to grow, to five in 2008 and so on, but it started as a relatively basic, standardized infrastructure offering. Looking at the current situation, the value proposition remains the same, but we can see that more and more applications and organizations want to leverage the benefits, to get the same agility, efficiency, etc.,
for a much broader set of applications. This includes legacy applications, business-critical applications, things like high-performance computing, analytics and so on. Also, in terms of the organizations deploying those solutions, we see more and more enterprises, organizations whose portfolios are based not just on cloud-born apps but on a much larger variety of applications overall. This trend is also reflected in the infrastructure offerings that exist in the market today. Continuing the Amazon example, they now offer, depending on how exactly you count, more than 60 instance configurations. And there is also a trend of bare metal cloud providers, which offer bare metal servers with a relatively broad spectrum of configurations that can be used for lots of different purposes. So, a few words regarding heterogeneity: what do we mean by heterogeneity? A few examples include different kinds of CPU, memory and disk, different models of each, and different ratios between them. There is heterogeneity in the compute space: different CPUs, different accelerators, GPUs, etc., that different applications might want to leverage. There are different storage configurations, including SSDs, SAN, NAS, etc.; networking; virtualization; and also bare metal, which is an interesting case that we are going to focus on a bit more in the second part of the presentation. To illustrate how the different configurations can be used by an application, you can see here an example of an application that has four tiers, each optimized to run on a certain virtual or physical hardware configuration: a web tier, a 3D rendering tier, a Hadoop cluster, and a database tier.
So without going into the details of what the application is doing, here is an example of how the different tiers can be configured and deployed in the cloud. It could be KVM-based virtual machines, it could be bare metal machines with GPUs; for Hadoop we might want to use SSDs and InfiniBand, and we might want SAN-based storage for the databases, and so on. The idea is that we would like to have all those options together with the user experience that a cloud user is used to. There are two things that we see driving the heterogeneity of the infrastructure. One is the requirements of the applications: different applications might run better on different kinds of hardware. The other driver is the evolution of the hardware itself over time. In terms of applications, there are many kinds of application requirements now coming to the cloud, from applications that used to be deployed in traditional data centers and that there is now a desire to migrate to the cloud. Roughly, if we focus on the new requirements, they can be categorized as special resource requirements, especially for resource-intensive applications, and aspects related to isolation, which could be performance isolation or security isolation. There are of course many different applications, and each of them would require different things to migrate to the cloud, especially in the enterprise world, but those are the categories that we've identified as the most critical. From the second perspective, hardware evolution, what we are seeing is that over time new models of hardware become available and old kinds of hardware become obsolete. Taking a snapshot from Amazon once again, you can see here a table of instance types that are currently available. Those in light green were introduced just this April, and those in pink or dark red have essentially been deprecated.
So you can see that over time there are many configurations leveraging new hardware that become available, while others are phasing out. And in any given production or large environment, you're likely to find different parts of your infrastructure with different hardware acquired at different time periods. This is another inherent reason for having a heterogeneous environment in the cloud, and based on those two trends we think that support for this heterogeneity is essential for any cloud solution. Okay, so let's talk a bit about opportunities. There are two perspectives here. One is the application provider perspective: the availability of a much broader set of configurations enables additional workloads to be migrated to the cloud, including those from traditional market segments like manufacturing, finance, scientific applications and so on, which can now leverage the agility and efficiency of the cloud for their businesses. There is a huge opportunity there. Just looking at the high-performance computing market, or in the broad sense including technical computing and big data, this market is currently larger than the infrastructure-as-a-service market as a whole. So there is huge potential in offering those workloads in the cloud model. And from the cloud provider perspective, there is an opportunity to build different customized solutions that fit different application classes. This is especially important given the current trend of commoditization of the infrastructure and the recent price wars that you are probably aware of. This gives providers the means to deliver differentiated solutions leveraging heterogeneous infrastructure and resources underneath.
So in order to enable all that, we need a cloud infrastructure that is natively designed for underlying heterogeneity, with the ability to seamlessly host applications with special requirements while still preserving the benefits of the cloud model. A few examples of the requirements that this model imposes. To summarize, we have access to a wider range of hardware and software configurations, and the user expects at least the same level of user experience for managing flavors, images, networking, and higher-level services. We also think that with heterogeneity there is an opportunity to move to a higher level of application abstraction. Instead of specifying the individual resources explicitly, there is a need to go to a higher level of abstraction, based on Heat templates for example, to specify the actual requirements of the application or some performance goals, with a translation layer that maps them to specific resources underneath, based on the current condition of the infrastructure and on the other workloads running at the same time. From the provider perspective, the basic requirements are, of course, unified management and unified APIs for the entire lifecycle of the infrastructure, secure multi-tenancy across the different resource types, and also elasticity and efficiency, which become much more challenging when we have greater fragmentation of resources and a larger number of different kinds of resources. For example, there might be a need to dynamically repurpose servers into different configurations based on the demand for certain configurations. Ezra will elaborate on this a bit more, but we think that such capabilities are critical to make those offerings not only suitable for the workloads, but also competitive and efficient.
So how is all of that related to OpenStack, and what can we see happening in OpenStack in this respect? It seems that OpenStack is following a relatively similar pattern, starting with a simplified approach to managing flavors, scheduling, performance and so on, and evolving to a much broader set of services that address different kinds of applications, including Hadoop, databases, bare metal workloads, much better scheduling to accommodate application requirements, and so on. We also see growing interest from the community in those topics. I've listed here a few sessions that are happening this week. There was one this morning on bare metal multi-tenancy; there are a few additional sessions around bare metal; there are sessions around hybrid environments in terms of hypervisors; and there is a session about a related solution that IBM is offering. So we see a lot of dynamics that will eventually take OpenStack to a position where it will be a good basis, where it will be sufficient, to host those additional applications and workloads. But of course there are some technical gaps that still need to be addressed, and Ezra will elaborate on some of those gaps and also on some of the specifics of hosting hybrid workloads using bare metal and virtualized resources. Okay, I hope we'll manage the microphone switch. So Alex was talking about the general notion of heterogeneous cloud; I will concentrate on the specific area of bare metal and virtualized environments and applications. First of all, a couple of definitions that I will use throughout the slides, because there is some overloading of terms. A hybrid environment in this scope means we have both physical and virtual hosts, and the user can deploy bare metal instances and virtual instances. We also define pools as resource groups of the same type, so we will have a bare metal pool and a virtualized pool.
We define a hybrid application as a composite application, for example a multi-tier application, that spans both virtual and physical servers. Alex gave the example earlier of 3D rendering, where we have one tier on bare metal and another tier in a virtual environment, and so on. When we explored this area, we set ourselves two main goals. One, from the administrator perspective, is to minimize the management complexity and to get unified management across both bare metal deployments and virtual deployments. And from the user perspective, we want simplified day-to-day management and also enhancements to the user experience. We also set ourselves a design goal, because there are many ways to do this: to use, if possible, a native OpenStack solution, meaning that we will not need any higher-level orchestrator or any other models above OpenStack. This is a quick illustration of what we mean by a hybrid application and a hybrid environment. We have two tenants, each of them with both virtual machines and physical machines; the applications are mapped to both virtual and physical machines, and, as usual in the virtual world, the virtual machines of the two tenants can be co-located on the same host. You can imagine that this raises serious problems of security, isolation, and so on. I want to go through a couple of scenarios that, first, demonstrate the strengths of this hybrid notion, and second, that we will touch on later in the presentation. The first is node repurposing. Here what we try to get is a native policy that dynamically balances between the pools of physical and virtual servers. For example, if you want to repurpose a virtual node, where you have a hypervisor, to take the place of a physical node where you can deploy bare metal applications, there are a few steps you usually go through. First, you need to detect resource congestion in your physical pool.
We can then identify a candidate host in the virtual environment that we can repurpose. We evacuate all the VMs from that host and assign it to the bare metal pool, changing all the necessary network and storage configuration. From there, the moment there is a request for a bare metal deployment, we can use that server. The other scenario is what we call a runtime decision on the deployment target. I think it was also mentioned this morning in another presentation. Here, we don't want to specify the target for the deployment; we just want to say "I want to deploy a database" or "MySQL" or whatever, using generic terms, and the system will automatically choose the server type according to some performance criteria. For example, it will leverage information from Ceilometer, the scheduler, Heat and so on in order to come to the right decision. For that, in terms of image management, we also need to either construct the image at runtime, meaning we take a base image and adapt it at runtime, or maintain multiple versions of the image but present the user with a single virtual image that represents them. Then we can deploy them using the regular provisioning mechanisms, through Ironic for bare metal or otherwise. When we look at this whole area, there are several approaches to the management, and we place them on a range from shared-nothing to shared-everything. In shared-nothing, each resource pool is managed by its own OpenStack instance; there is no sharing between the OpenStack services whatsoever. In shared-everything, we have a single instance of OpenStack managing both the virtual and physical environments simultaneously, and all services are shared, so we have the same Neutron managing both the virtual and physical environments at the same time. Of course, there are some intermediate solutions.
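The node-repurposing steps described above (detect congestion, pick a candidate, evacuate, reassign) can be sketched as a small simulation. This is not real OpenStack code: the pool structures, the congestion signal, and the least-loaded heuristics are all assumptions; a real implementation would drive the Nova and Ironic APIs instead.

```python
def repurpose_if_congested(virtual_pool, bare_metal_pool, free_bm_hosts, vms_by_host):
    """Simulated repurposing policy, following the steps from the talk:
    1. detect resource congestion in the bare metal pool,
    2. identify a candidate host in the virtual pool,
    3. evacuate its VMs to the remaining hypervisors,
    4. reassign the host to the bare metal pool.
    Returns the repurposed host name, or None if nothing was done.
    """
    if free_bm_hosts:                      # 1. free capacity left: no congestion
        return None
    if len(virtual_pool) < 2:              # need somewhere to evacuate VMs to
        return None
    # 2. pick the hypervisor with the fewest VMs (cheapest to evacuate)
    candidate = min(virtual_pool, key=lambda h: len(vms_by_host[h]))
    # 3. evacuate: migrate each VM to the least-loaded remaining virtual host
    targets = [h for h in virtual_pool if h != candidate]
    for vm in vms_by_host.pop(candidate):
        dest = min(targets, key=lambda h: len(vms_by_host[h]))
        vms_by_host[dest].append(vm)
    # 4. here the real system would rewire network/storage configuration,
    #    then move the host between the pools
    virtual_pool.remove(candidate)
    bare_metal_pool.append(candidate)
    return candidate
```

The heuristic of evacuating the least-loaded hypervisor is just one plausible policy; in the talk's architecture, Heat alone drives this flow across both pools.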
We can consider regions and cells as intermediate solutions, where we have a shared Keystone and separate other services. We all know that OpenStack can natively support virtual hosts, of course, and we can also see, especially lately, that OpenStack can handle bare metal support very nicely. So why not both of them together, simultaneously? Indeed, this is what we were focusing on: trying to come up with a single management, we call it integrated management, where we have a single instance of OpenStack. In general, the basic architecture is that we use the special resource types we defined before, the pools, which are mapped to resource pools. For scheduling, we use multiple host aggregates, specifically a bare metal aggregate and host aggregate filters. And in the network, in addition to the regular administration network, we also use a separate management network dedicated to the bare metal machines that are exposed to the user. There are two different kinds of bare metal machines here: if we deploy a compute node on a bare metal machine, that's an administrator role as we see it; but we also want to allow a regular user to deploy to bare metal machines, and we don't want that user to touch the whole management system, so we had to separate the two. And we have the data network, where there are challenges as well, because we want that network to span both virtual machines and physical machines. The advantages of this approach are very clear, and they are aligned with the requirements Alex mentioned. We get a native and simple OpenStack solution; we do not require any external mechanism. For example, the repurposing scenario that I presented before is driven and managed by Heat alone, managing both pools simultaneously. We don't need to coordinate between the configurations of two services.
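The aggregate-based scheduling mentioned above works, roughly, the way Nova's AggregateInstanceExtraSpecsFilter does: a flavor carries extra specs, and a host passes only if the metadata of its aggregate matches them. Here is a minimal sketch of that matching logic; the host names and metadata keys (`pool`, `hypervisor`) are illustrative assumptions, not Nova's actual keys.

```python
def host_passes(host_aggregate_metadata, flavor_extra_specs):
    """Accept a host only if every extra spec on the flavor matches the
    metadata of the aggregate the host belongs to (simplified version of
    what an aggregate-based scheduler filter checks)."""
    return all(host_aggregate_metadata.get(k) == v
               for k, v in flavor_extra_specs.items())

# The two pools expressed as aggregates with distinguishing metadata.
aggregates = {
    "bm-host-1": {"hypervisor": "ironic", "pool": "baremetal"},
    "kvm-host-1": {"hypervisor": "kvm", "pool": "virtual"},
}

# A bare metal flavor requests the bare metal pool via its extra specs,
# so the scheduler only considers hosts in the bare metal aggregate.
bm_flavor_specs = {"pool": "baremetal"}

candidates = [host for host, metadata in aggregates.items()
              if host_passes(metadata, bm_flavor_specs)]
```

With this setup, a user picking a "bare metal" flavor is transparently steered to the bare metal pool, and a regular flavor to the virtual pool, all within one scheduler.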
So for example, if we want a complex networking configuration, we don't need two different Neutron services trying to interact with each other in order to manage it. We get simplified administration, because we can now control everything from a single point. We get a unified view of the system. This reduces the complexity a lot, because we reduce the number of services, and it helps with diagnostics and root cause analysis, because we don't need to correlate many log files coming from all over the place; everything is in the same place. From the user perspective, it is also evidently better, because the user gets to see a unified topology of the whole system and doesn't need to go to different places. So what are the technical challenges here? First of all, OpenStack seems to have some gaps when it comes to managing both bare metal and virtual machines together. To do it separately, we have TripleO, the undercloud and overcloud and all of that, but to do it together there are some gaps. First, in networking: we already saw that we require Neutron to manage both the physical and virtual networks simultaneously. We also want to take advantage of the unique abilities of each area, so we may want to use OpenFlow for the physical side and OVS for the virtual side and so on, and we want everything to be connected. I want to emphasize that I will go through several examples here, but they are dedicated specifically to the hybrid case. There are some general issues when you go to bare metal; there are still gaps in the bare metal area. We heard about multi-tenancy in bare metal and so on, but those are part of the general bare metal work, and I'm quite sure everything will work eventually. On the compute side, we need to enhance the scheduler to better support heterogeneity. This means that we want to allow different policies for different pools.
In a physical pool, we may want to look at GPU consumption rather than CPU consumption, so we may need to introduce a different model here. We want to support instance management for bare metal. Today a bare metal instance is managed the same way as regular instances, which is not reasonable, because there are many operations, live migration for example, that are not applicable to bare metal instances. Another example is the hierarchical view. We believe you need to add a hierarchical relation between a bare metal instance and a compute node running on top of it. You can deploy a compute node on bare metal and deploy VMs on top of that. If you manage it with two different OpenStacks, this seems okay, but if you manage it with a single instance of OpenStack, several issues can arise. For example, you can go ahead and shut down or delete the bare metal instance; everything will shut down and appear to be okay, however the virtual machines might stay in some active state, or even in a zombie state in the database, because there is no propagation between the two layers. Lastly, I want to touch on two other examples. The first is in the image management area. Once again, once you try to build hybrid applications, you immediately see things that can be improved, for example the access control in Glance. For bare metal, we are using a special ramdisk during the PXE boot. Now, you don't want to expose those images to a regular user. The user wants to deploy an application on a bare metal server; he doesn't want to see such images. However, in Glance today, if he can use those images he can also see them. So we need to add a different model, in which he can use images but not see them, and so on. And I mentioned before the issue of supporting runtime adaptation or runtime selection of images. On the UI experience, I will show you a couple of examples.
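The hierarchy problem described above (deleting a bare metal instance and leaving its nested VMs as zombies in the database) suggests a parent-child relation in the instance data model, with state changes cascading down. A minimal sketch of the idea follows; the class and field names are assumptions for illustration, not Nova's actual data model.

```python
class Instance:
    """Toy instance record with an optional parent, where the parent is
    the bare metal instance hosting this VM's hypervisor."""

    def __init__(self, name, parent=None):
        self.name = name
        self.state = "ACTIVE"
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def delete(self):
        # Propagate the deletion down the hierarchy first, so nested VMs
        # don't linger in an active or zombie state after their host dies.
        for child in self.children:
            child.delete()
        self.state = "DELETED"

# A bare metal instance running a nova-compute node, with two VMs on it.
bm = Instance("bm-node-1")
vm1 = Instance("vm-1", parent=bm)
vm2 = Instance("vm-2", parent=bm)
bm.delete()   # cascades to vm1 and vm2
```

The point is only the propagation: without the parent-child link, deleting `bm-node-1` would leave `vm-1` and `vm-2` marked active in the database.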
There are things I would call glitches in Horizon, but that's probably because it was not designed, or needs to be extended, to support managing both virtual and physical environments simultaneously. In addition, some of those issues come from inherent issues in the underlying data model and so on, so once we fix those, they will be fixed on top as well. The last thing I want to go through is a hybrid application example. We are in the middle of exploring this area and are trying several applications, trying to see how to configure the network, how to set everything up. What we tried to do is deploy a hybrid WordPress: we want Apache and WordPress on a virtual machine, and we want MySQL on a physical node. We actually did it, and we set up everything with a single Heat template that deployed everything simultaneously to the two nodes. We used host aggregates for the scheduling, so we had to define all of those, and we used the image builder from TripleO to build the bare metal images, into which we had to inject a specific cloud-init element in order for that to work. I must admit that we had to do some manual configuration of the network; it didn't work automatically, we had to change some of it. You can see here screenshots of the Heat stack resources and topology, and I can show you what it looks like in Horizon. You can see what I meant about trying to enhance it a little bit, because, and some people may not agree with this approach, but if you do think that we can manage everything with a single OpenStack and you want to do that in a native way, then it might not be the best choice to, for example, present the bare metal machines as hypervisors.
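The single Heat template mentioned above essentially declares two servers whose flavors steer each tier to its pool via the host-aggregate filters. Since a HOT template is just structured data, here is its shape sketched as a Python dict; the resource names, image names, and flavor names are assumptions about our setup, and the real artifact is of course YAML, not Python.

```python
# Sketch of the hybrid WordPress stack from the talk: the web tier lands
# on a KVM host, the database tier on a bare metal node.  The flavors are
# assumed to carry the extra specs that the aggregate filters match on.
hybrid_wordpress_stack = {
    "heat_template_version": "2013-05-23",
    "resources": {
        "web_server": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": "wordpress-apache",   # regular virtual image
                "flavor": "m1.virtual",        # matches the virtual aggregate
            },
        },
        "db_server": {
            "type": "OS::Nova::Server",
            "properties": {
                "image": "mysql-baremetal",    # image built with the TripleO image builder
                "flavor": "bm.small",          # matches the bare metal aggregate
            },
        },
    },
}

# Both tiers are plain OS::Nova::Server resources; only the flavor
# (and hence the aggregate) differs, which is the whole point of the
# integrated-management approach.
flavors = {r["properties"]["flavor"]
           for r in hybrid_wordpress_stack["resources"].values()}
```

Note that from Heat's point of view the bare metal tier is just another server resource; the scheduling metadata does all the pool selection.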
We may consider a different view or something like that, and you can see, for example, that a regular user can see those two deploy images, which he doesn't even know the meaning of. So, in summary: heterogeneous cloud environments are gaining momentum, and we strongly believe that it is critical to support them in order to host a broad spectrum of applications; we already see that happening. We believe that OpenStack is a promising solution for managing hybrid clouds, otherwise we wouldn't be here, and that by using integrated management we can get a simple, native, out-of-the-box OpenStack solution, with simplified administration and an enhanced user experience. We need, however, a very careful design in order to maintain all the regular requirements of security and isolation when we span both bare metal and virtual environments, and there are still some gaps that need to be addressed. We are just now starting to explore this area. We have already done some work, and we encourage everyone who shares the same vision to come and talk to us, me and Alex; we will be happy to collaborate. And be sure to stop by the IBM booth to see some interesting demos. Thank you all, and we now have time for questions. You can use the microphone if you want. Hello, my name is Roman. You said that you want to be able to run the same applications both against virtual machines and against hardware machines. How do you decide the moment when you need to migrate from one type of server to another? And can I ask a second question, too, because it's related: how do you resolve dependencies on specific features of hardware machines, like GPUs or something like that? So first of all, on migration: we are not migrating, it's not P2V. Maybe I was misunderstood; we are not migrating the application from physical to virtual. We are repurposing the server; we are changing the role of the server.
So it was a KVM host in the virtual environment, and now suddenly the Hadoop cluster needs another node, so I want to move it over there. I'm not doing P2V. And the other question, I didn't get it. Yeah, I think I did. So we do see a need to make the resource abstraction richer, so that we can express the different capabilities offered by different machines and do the matching in an intelligent manner. So for GPUs and other features, we need a way in the data model to express those capabilities and be able to match them to the requirements of the applications, through Heat for example, or something like that. There are some gaps in that area as well. Another question? You mentioned a challenge, or a risk perhaps, when repurposing a physical machine that maybe has a compute hypervisor on it. You could use Ironic to repurpose it and install Hadoop, but then what happens to the VMs? And you did not really mention whether you're using TripleO to install the Nova hypervisor on that machine in the first place. So what are you doing there? No, no, that's a good question. Maybe I didn't mention it, but the assumption is that everything started from there: everything was installed using that OpenStack instance. The compute nodes that you are talking about were deployed using that system, so I have control over them. I'm not discovering an existing system and trying to manage it. Okay? Okay, thank you very much. Thank you.