Okay. So hello, everyone. I'm Barry McClearn, and this is Matt Rooney, and we're from SAP UK. We work in the cloud architecture and engineering team. What we want to talk about today is how we have integrated additional OpenStack services into our bare metal as a service solution. SAP has been aiming to consolidate a number of internal cloud platforms around a shared OpenStack layer, for the purpose of making the best use of our resources and infrastructure. So we want to talk about how we have leveraged OpenStack and re-architected our previous solution to better make use of it.

I'm going to start with a quick overview of what we do in the cloud architecture and engineering team, and what we've done since the last summit in Austin. Then I'll talk about how we've re-architected our solution and how the new design has benefited us, cover some of the issues we faced when we attempted the re-architecture, and finish with what we intend to do from here.

The infrastructure we work with is what we call the infrastructure pod. For us, that just means a standardized set of infrastructure: a number of bare metal nodes from a single vendor, spread across a number of racks connected by top-of-rack switches, which are in turn connected over a high-bandwidth interconnect to another rack holding a mass shared-storage solution. On top of this, we deliver HANA landscapes to our customers. The benefit of this infrastructure is that it's optimized for the performance, scalability, reliability, and security needs of our customers; when we talk about our nodes, we're talking about multiple terabytes of memory to support our in-memory HANA database applications.

Within our team, we work on a solution called Cloud Frame Manager. Cloud Frame Manager is a bare metal as a service solution that we use to manage the lifecycle of the resources in the infrastructure pod and deliver HANA landscapes to our customers. Cloud Frame Manager, or CFM as I'll call it, acts as a control plane for the bare metal infrastructure: it handles server provisioning on the bare metal nodes, sets up all of the network interconnect, and handles storage management. There's a diagram there giving a quick overview of Cloud Frame Manager.

As I mentioned before, SAP has a number of these cloud platforms, and there is an internal push to consolidate them so that we can make the best use of our internal resources. The plan is to build all of our platforms on top of a shared OpenStack layer, so that they can all use the same infrastructure.

We actually presented some of this work at the summit in Austin in April. That was basically our first step towards building OpenStack into our existing solution, and it involved integrating with standalone Ironic to provide bare metal provisioning with local boot, which is not something we were doing previously. Previously, all of our nodes booted with an NFS-root setup on a shared storage device, but we had certain customers that required local boot. We have that solution running live in our data centers today, with live HANA instances for our customers. But based on this experience, we took a look at our solution and saw ways that we could improve it.

So at the start of the year, we began to look into how we could re-architect our solution to be more flexible. CFM was designed with a specific use case in mind: a specific stack with a HANA instance on top of possibly multiple nodes, which we internally call a frame, and which also includes the network setup and the storage. Because our solution was tuned to this single use case, it was less flexible. And since we had already done some work to integrate Ironic into our solution, we saw the potential to move even further towards OpenStack with a much more flexible design.

With this architecture, we moved to something more componentized, built on top of individual OpenStack components. We also modeled our own API on the OpenStack API, to make it as flexible as possible and to improve interoperability with our other cloud platforms. We've now integrated additional components such as Keystone, Nova, Neutron, and Glance, and we continue to use Ironic, but no longer in a standalone manner.

You can see a diagram of our re-architected solution. We still have a number of ways of interfacing with our API. When a call comes into our API, it now goes through an authentication back-end, which is actually Keystone. The request then moves through a number of plugins. One of the plugins contacts the OpenStack APIs directly; we also have other plugins that perform operations outside OpenStack, such as deploying our HANA instances. With this combined solution, we are able to control our entire infrastructure and set it up for our customers.
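To give a feel for what that integration looks like, here is a minimal sketch of how a plugin of this kind might authenticate against Keystone and then talk to an OpenStack service through the same session. This is purely illustrative rather than our production code; the endpoint, credentials, and project names are placeholders.

```python
# Illustrative sketch: a plugin authenticates once against Keystone,
# then reuses the session for every OpenStack service it talks to.
from keystoneauth1 import session
from keystoneauth1.identity import v3
from novaclient import client as nova_client

# Placeholder credentials and endpoint -- not real deployment values.
auth = v3.Password(
    auth_url='https://keystone.example.com:5000/v3',
    username='cfm-service',
    password='secret',
    project_name='cfm',
    user_domain_id='default',
    project_domain_id='default',
)
sess = session.Session(auth=auth)

# The same session object is handed to each service client, so every
# plugin call is authenticated through the Keystone back-end.
nova = nova_client.Client('2', session=sess)
for flavor in nova.flavors.list():
    print(flavor.name, flavor.get_keys())  # a flavor's extra specs
```

The point of the design is that every call, whether it ends up in an OpenStack service or in one of the non-OpenStack plugins, goes through the same Keystone authentication back-end.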
So I just want to move on to a couple of the problems that we faced and how we worked our way around them. Some involved upstreaming our code, and some involved being a bit more creative with workarounds, but I'll go through each one.

The first problem is that we need to do RAID setup as we're provisioning. Ironic does provide RAID configuration automatically, but it's done during the cleaning phase. For us, this wasn't a workable solution, because we need to be able to provision a node for any customer, and different customers will have different RAID configurations in mind. Our solution was to store the required RAID configuration in the flavor, with specific flavors for the specific stacks of different customers, and to pass this information to Ironic at the point of provisioning.

The first step is to store the RAID configuration in the properties of the flavor. The properties are one of the blocks of metadata where you can store arbitrary data and extract it later. The problem we found is that there is a 255-character limit on each property field, so our workaround was a non-standard schema. Ironic has a JSON schema for defining the target RAID configuration that gets sent to Ironic Python Agent; to fit within the size limit, we instead used a positionally significant array of fields to store the same data. You can see an example there: it defines the name of the RAID configuration; the RAID level, which in the first entries is just RAID 1; then the number of drives to include in the RAID; the type of drives; the size; and a flag that says whether or not it is the root volume. We have one root volume and one non-root volume, and for the non-root volume we also have to set a mount point, which our own solution uses to determine where that additional volume should be mounted in the file system.
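To illustrate the workaround, here is a rough sketch of how such a positional encoding can be packed into and unpacked from a single property string. The field order and property keys here are assumptions for illustration, not the exact schema we shipped.

```python
# Hypothetical positional RAID encoding that fits inside a 255-character
# flavor property. Assumed field order (illustrative only):
# name, raid_level, drive_count, drive_type, size_gb, is_root_volume, mount_point
RAID_FIELDS = ('name', 'raid_level', 'drive_count',
               'drive_type', 'size_gb', 'is_root_volume', 'mount_point')

def encode_volume(name, raid_level, drive_count, drive_type,
                  size_gb, is_root_volume, mount_point=''):
    """Pack one logical volume into a compact comma-separated string."""
    return ','.join(str(f) for f in (name, raid_level, drive_count,
                                     drive_type, size_gb,
                                     int(is_root_volume), mount_point))

def decode_volume(encoded):
    """Unpack the positional string back into a dict."""
    return dict(zip(RAID_FIELDS, encoded.split(',')))

# Stored as flavor properties, one volume per key, e.g.:
#   raid_vol_0 = "os,1,2,ssd,100,1,"         <- root volume, RAID 1
#   raid_vol_1 = "data,1,2,ssd,500,0,/data"  <- non-root, mounted at /data
print(decode_volume('data,1,2,ssd,500,0,/data'))
```

Because the meaning of each field comes from its position rather than from a JSON key, the whole volume definition stays well under the 255-character property limit.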
Then, once we have selected our flavor and are deploying an instance, we pass this RAID information through to the Ironic node at the point it acquires the instance UUID of the server being provisioned on it. We hit one more issue there: while the node is in the deploying state it is locked, so you can't modify it. What we had to do is wait until the node reaches the wait call-back state before writing the configuration into its node info, from which it is parsed out when Ironic Python Agent starts.

In addition to that, we wrote our own custom hardware manager for Ironic Python Agent. It picks up the RAID configuration that's passed down and hooks into the get_os_install_device workflow: before the OS install device is chosen, we perform the RAID setup, so that the RAID-configured drive can be returned as the OS install device before the disk image is written.
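As a rough illustration, here is a skeleton of a custom hardware manager along these lines. The helpers for fetching the RAID spec and creating the RAID volume are placeholders, since the real logic depends on the agent plumbing and the RAID controller in use.

```python
# Hypothetical skeleton of a custom Ironic Python Agent hardware manager
# that builds the RAID volume before the OS install device is chosen.
from ironic_python_agent import hardware

def _raid_spec_from_node_info():
    """Placeholder: return the decoded RAID spec that was written into
    the node's info during wait call-back (plumbing not shown)."""
    return {'name': 'os', 'raid_level': '1', 'is_root_volume': True}

def _build_raid(spec):
    """Placeholder for the vendor-specific RAID creation, returning the
    block device path of the newly created RAID volume."""
    return '/dev/md0'

class RaidAwareHardwareManager(hardware.GenericHardwareManager):
    """Outranks the generic manager so IPA dispatches to us first."""

    def evaluate_hardware_support(self):
        return hardware.HardwareSupport.SERVICE_PROVIDER

    def get_os_install_device(self):
        spec = _raid_spec_from_node_info()
        if spec:
            # Create the RAID volume now, so the disk image is written
            # onto the freshly RAID-configured drive.
            return _build_raid(spec)
        return super(RaidAwareHardwareManager, self).get_os_install_device()
```

A manager like this gets packaged into the agent ramdisk and registered through the ironic_python_agent.hardware_managers entry point; IPA then dispatches get_os_install_device to the manager reporting the highest level of hardware support.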
Okay, so the next issue we had was Neutron VLAN isolation. This is the multi-tenant networking in Ironic that was mentioned earlier. In older releases prior to Mitaka, Ironic only supported flat networks: Neutron does support VLAN segmentation, but Ironic wasn't integrating with Neutron to provision servers on tenant networks. The root of the issue was that we weren't storing the information about which switch port each of a node's Ironic ports was connected to, so we had no way to isolate the traffic onto specific VLAN segments.

We worked on this with some of our partners and upstreamed the work into Neutron and Ironic, adding segmented network support to Ironic's Neutron integration. A number of changes had to be made. We changed the discovery process in Ironic Inspector to pick up which switch port each network interface is connected to, which obviously also required database changes to store this new information alongside the port information of each node. We then made a change that allows a single port to be passed to Neutron rather than all ports at the same time. And finally, we worked on an ML2 driver implementation to apply the VLAN setting to the interface. The good news is that this is now live: it's been available since Mitaka, with the corresponding changes in Neutron. We're also currently working on adding port group support, because we do a lot of port bonding in Cloud Frame Manager, so that will be beneficial to us as well; that work is still in progress.

Okay, so the last issue I want to talk about is node locality in Nova. As I mentioned before, in the infrastructure pod we have a number of bare metal nodes in a single compute cluster, and then multiple compute clusters share a storage node on another rack. The problem is that the nodes of a HANA instance need to share the same storage node in order to get the best performance, so we need to control which set of nodes we are provisioning on. Nova does provide affinity and anti-affinity, but only for VMs, so that doesn't give us a way to ensure the selected nodes all come from the same infrastructure pod.

Our solution was to implement a pod-aware scheduler filter in Nova. All of our Ironic nodes are now tagged with pod-specific information, including which shared storage device that pod uses. When we create a frame, we create all of the servers at the same time, and all of the server instances are scheduled onto the same pod by taking the pod information out of the Ironic node info. A rough sketch of such a filter follows.
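This is a minimal sketch of what a pod-aware filter can look like. We assume, for illustration, that the pod tag has been surfaced into the host state's stats (the plumbing from the Ironic node properties is deployment-specific) and that the request carries a scheduler hint naming the target pod; the 'cfm_pod' hint and tag names are hypothetical.

```python
# Hypothetical sketch of a pod-aware Nova scheduler filter.
from nova.scheduler import filters

class PodFilter(filters.BaseHostFilter):
    """Only pass hosts whose pod tag matches the requested pod."""

    # A node's pod tag does not change within a single request.
    run_filter_once_per_request = True

    def host_passes(self, host_state, spec_obj):
        requested_pod = spec_obj.get_scheduler_hint('cfm_pod')
        if not requested_pod:
            # No pod constraint on this request: accept any host.
            return True
        # Assumed: the Ironic node's pod tag was surfaced into stats.
        return host_state.stats.get('cfm_pod') == requested_pod
```

The filter would then be enabled in the scheduler's filter list in nova.conf alongside the defaults, so that every server of a frame is constrained to hosts carrying the same pod tag.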
Okay, just to wrap up: we now have a new solution built on a more component-based design. It gives us a lot more functionality and a lot more flexibility, and if we want to include additional OpenStack components in the future, it will make that much easier for us. By basing our API on the OpenStack API, we make that easier still, and it also allows for interoperability between our solution and the other cloud platforms SAP provides to our customers. And because we could draw on the expertise we had already built up with OpenStack, and on our contacts within the OpenStack community, we were able to overcome a number of issues and contribute code upstream that improves other people's OpenStack deployments as well.

Looking a bit further into the future, we are going to deploy this re-architected solution into a production environment towards the end of the year, and we're hoping that goes well. We are also testing and integrating a number of upcoming Ironic features. And finally, we're looking into leveraging Manila to better manage our shared storage devices: currently we use an in-house driver for our NetApp systems, and we would rather be using a Manila driver for our NetApp storage.

Okay, that's everything, thank you. A lot of this work is based on a white paper that we have, and we have a couple of copies of it here if anybody wants to take one. Apart from that, if you have any other questions, feel free to ask.

Question: what type of top-of-rack switches do you have, and which ML2 driver from Neutron do you use, the one that supports Ironic's multi-tenancy feature?

Yes, so a lot of that work was done with Arista. We worked very closely with Arista on the Ironic-Neutron integration and also on the Arista ML2 driver.

Question: I have a question related to scaling. What kind of solution do you use to scale your bare metal server provisioning?

We actually use our own in-house solution. At the point of creation we build what we call a frame: a set of nodes connected to a set of networks and switches, each with some form of isolation to ensure that customer data stays private. We can then scale out, up, and down with our own solution, essentially expanding that notion of a frame by adding additional nodes into it. So that's all handled by our own in-house tooling.

Question: how do you manage upgrades, patching, et cetera?

That's actually something we're working on at the minute. Again, we have our own in-house solutions for that. Because all of our compute infrastructure racks generally use a single vendor, we take the vendor tools and call into them automatically; the current solution we're using for a lot of that work is Ansible.

Question: and the lifecycle management? Can you upgrade or replace the servers once they are old? How do you address the lifecycle management of the compute servers?

Well, yes. Again, because we have these racks, we generally treat them as a unit. If we were changing specific nodes, it would usually be because of a fault or something like that, and then we can use our own solution internally to migrate everything across.

Okay, thank you. If anybody has any additional questions, feel free to come up afterwards. Thank you.