So, hello everybody. Sorry for the delay, we had a slight mishap with our presentation. Thank you for joining us today. We would like to show you how we at SAP are using OpenStack at home, in our own shop, to serve our customers. My name is Martin; I'm here with Andrew and Michael. We are all engineers working on our OpenStack cloud.

To give you a little background: SAP is a software company, so like all other software companies we are currently in transition. We are moving from shipping our product on a CD to the customer, to a deployment model where we run the application and the customer is just a user of our applications in our cloud. And this is where our part, the Converged Cloud, comes into play. We are what we would call an internal cloud: you can't buy infrastructure services from SAP, but our payload is not development-only or in-house applications only; it is a massively external-facing business for SAP. So for us, uptime is a very important factor.

We are currently planning to be in 15 regions worldwide; nine of those are already up and running, and we are shifting gears around the world very quickly to manage our data center build-outs. Performance is very important for us, so one of the key factors is that, for performance reasons, we want as many OpenStack services as possible in one network, so that we don't have to go over multiple hops and lose bandwidth in between. We also want to give our internal customers, our business units, a large set of services they can use. So we are not limiting ourselves to just the classical compute and storage; we also want to offer advanced network services like load balancing.
We want to offer a secret store and DNS as well, and we also have a lot of services on top that we built ourselves, like our own dashboard and our billing services built on our HANA database. We are also currently working on Kubernetes as a service, so that we can give containers to our customers as well.

A little bit about our scale, in absolute numbers: we are at roughly under 100,000 cores, around 1.1 petabytes of memory, and around 40,000 running instances, with a relatively high turnaround. We are counting around 4,000 instance operations a day, meaning: create an instance, do something on it, delete it, do it again. We have a lot of customers running automated tests, so our turnaround in general is very high; as you can see from the numbers, we expect basically a full cycle of all running instances every 10 days. We also have a relatively big net growth, so a couple hundred instances a day remain as long-running after all those operations.

We have two main things which might differentiate us a little from what everybody else is doing with OpenStack, and I will briefly try to explain what they are and why we are doing them. First of all, the 10,000-foot view: we have a control infrastructure underneath where we run CoreOS and Kubernetes, which is fairly common nowadays; we have been doing it for quite a while now. On top of that we run our control plane infrastructure: OpenStack itself, our analytics services, our monitoring, and everything else. But our customer payload we see as a completely different entity.
So we don't mix our control infrastructure with the infrastructure that customer payload actually runs on; we are investing a lot of work into splitting those up properly.

The one thing we do need is scale, and we had bad experiences trying to run our network with overlays end to end, from device to device. So we decided to approach it differently, because for performance reasons we need to integrate a lot of services into those networks. We use something called hierarchical port binding in Neutron. Those are the particulars, but the high-level view is: we use the overlay only within the core of the network, and as soon as a packet leaves a switch, there is no more trace of an overlay network. This is functionality Neutron gives us, where we can run different protocols in different parts of our fabric. So VMs only see VLANs, storage boxes only see VLANs, every network device only sees a VLAN; only the core fabric sees VXLAN packets, which get encapsulated and decapsulated at the network edge. This gives us the opportunity, with very little overhead on the attached devices, to achieve a relatively large scale. This is one part where we look a little different, and also one part which took us a little time to get implemented, because it's not the most well-publicized feature.

The second thing, as I already said, is that we have a strict split between the control plane, where OpenStack actually runs, and the data plane. We concentrate our efforts mostly on maintaining a lot of low-level drivers, where we can use OpenStack's orchestration to remote-control equipment that is already installed in a data center, or equipment which we have been comfortable running at scale for a long time already.
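The hierarchical port-binding setup described here can be pictured with a small sketch: one logical Neutron network is realized as different segment types at different levels of the fabric. This is an illustrative model only, not real Neutron API output; the segment IDs and field names are made up.

```python
# Conceptual model of hierarchical port binding: the same logical
# network uses VXLAN in the core fabric but plain VLANs at the edge,
# so attached devices never see the overlay. Values are illustrative.

network = {
    "id": "net-1",
    "segments": [
        {"level": 0, "type": "vxlan", "id": 7001},  # core fabric
        {"level": 1, "type": "vlan", "id": 301},    # network edge / top-of-rack
    ],
}

def segment_seen_by(network, level):
    """What encapsulation a device bound at the given level sees."""
    return next(s for s in network["segments"] if s["level"] == level)

print(segment_seen_by(network, 1)["type"])  # the VM/host side only sees a VLAN
print(segment_seen_by(network, 0)["type"])  # only the core sees VXLAN
```

Encapsulation and decapsulation happen where the two levels meet, at the network edge, which is why the attached devices carry almost no overhead.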
So imagine a large enterprise which has been using vCenters for virtualization for ten years. It's a huge benefit if you don't introduce a second hypervisor, another operating system it runs on, and a new orchestration layer all at the same time, but can instead leverage your existing operations infrastructure and orchestrate it in a more seamless way for the user. So as you can see, we run our L3 through hardware routers, we run our L2 through a Cisco ACI fabric, we use established hardware load balancers, we use vCenters and ESX servers as our virtualization platform, and we use Cinder and Manila to drive our various storage backends.

Basically, this allows us to treat the SLAs and SLOs for the two parts of our cloud completely differently. We can decide that we don't need to care as much about the Kubernetes-run OpenStack side, as long as we can ensure that customer traffic survives an outage in any of the OpenStack controls. If the dashboard goes down for 10 minutes, it's less of an issue than if an external-facing application suffers a 10-minute outage. To give you a little more insight into how the left-hand box works, I'm handing over to Andrew.
Hi. So I'm going to focus on the top right-hand corner: how we actually deploy OpenStack and our own components on Kubernetes, and how we get it running. If anyone's been to any presentation on OpenStack, this is probably already well known, but we're using Helm to deploy. Helm is effectively a package manager for Kubernetes; the package in this case is called a chart, and it provides tooling for install, upgrade, and delete. It allows us to compose an OpenStack deployment based on a series of dependencies. A release in Helm is a versioned deployment of a particular chart, and in our case that corresponds to an OpenStack region. With Kubernetes and this control/data plane split, we can go from an empty Kubernetes infrastructure to a running productive environment in about five minutes, and we can do pretty much zero-downtime, sub-minute, possibly sub-30-second updates.

What we've created is a set of charts that allow us to deliver a group of regions which are pretty much identical; that's the top right-hand corner again here.
All we really want to differ between those regions are the things that make each region unique: things like endpoints, the definition of the hardware, passwords, and so on. The functionality is identical across regions. So what we do is create a region chart which just has a dependency on a meta chart, our openstack-helm, and the region values that we push into Helm, which generate the configuration at this level, are just passwords, endpoints, domain names, and hardware configuration. We rely on the next level down, the openstack-helm chart, to deliver a predefined, constant-function OpenStack deployment. Below that we have the individual components, so Nova, Neutron, Cinder, and whatever else we're putting in there, and each one of those has a set of default values that tell it how to behave. So really, the behavior is fixed through the chart structure. Below that we have infrastructure components like Postgres, RabbitMQ, and so on.

A typical chart looks like this; this is Nova. We have Deployments in Kubernetes for all of the processes we need to run, and an Ingress and Service for everything we need to expose, either internally within the cluster or externally as an API. The things that are slightly different from some of the other OpenStack deployments we've seen are the way we handle the DB migration, and the way we use operators in Kubernetes to scale our hardware.
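The layering of thin region values over fixed chart defaults works roughly like Helm's recursive value merging. Here is a minimal Python sketch of that idea; the keys and values are hypothetical examples, not SAP's actual configuration.

```python
# Illustrative sketch of how layered Helm values resolve: region-specific
# overrides are merged over the defaults baked into a component chart.
# All keys and values here are invented for the example.

def deep_merge(defaults, overrides):
    """Recursively merge overrides into defaults, Helm-style."""
    result = dict(defaults)
    for key, value in overrides.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

# Defaults in the component chart: fixed behaviour, identical everywhere.
nova_defaults = {
    "replicas": {"api": 2, "conductor": 2},
    "db": {"name": "nova"},
}

# Region values: only the things that make this region unique.
region_values = {
    "db": {"password": "s3cret"},
    "endpoints": {"public": "nova.region-a.example.com"},
}

resolved = deep_merge(nova_defaults, region_values)
print(resolved["replicas"]["api"])       # behaviour comes from the chart
print(resolved["endpoints"]["public"])   # uniqueness comes from the region
```

This is why the region chart can stay tiny: it only carries the delta, while the meta chart pins everything else.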
In terms of the DB migration, we have a job that runs the migration, and in Kubernetes it will stay there until we delete it. If we need to upgrade the database, we delete the job, run a deploy again, and it will migrate the DB. For the compute nodes and any hardware scaling, we have an operator which is basically responsible for listening for external events and creating the necessary Kubernetes constructs that allow us to provide hypervisors. All of the configuration and all of the values we showed previously are mapped in as ConfigMaps, mounted into the containers as volumes, and then transformed into OpenStack configuration.

What we do is we have this big chart that we just throw out to Kubernetes, and it deploys everything. What we found initially was that this led to timing problems, with things like databases or RabbitMQ not being available yet. Luckily, we were quite early to the game, and I think we were joined by Stackanetes, who solved this problem with the kubernetes-entrypoint binary, which provides very basic dependency management. So we throw everything out and define, within the pod specs, the dependencies needed to ensure that the system is eventually running. We start off with the databases, the Ingresses, and the Services running, then we wait for the migration job to start, which obviously has a dependency on the database. Once that's completed, the API starts up; it's dependent on the migration. And once that's available, the headless services come up: the conductors, schedulers, and so on.
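The dependency gating that kubernetes-entrypoint provides can be sketched as a simple poll loop: each container blocks until its declared dependencies are ready, then starts its real process. This is a conceptual model, not the real binary (which is a Go program); the dependency names are invented.

```python
# Minimal sketch of kubernetes-entrypoint-style dependency gating:
# block until every declared dependency (service, job, ...) is ready,
# or give up after a timeout. The readiness check is pluggable.

import time

def wait_for_dependencies(dependencies, is_ready, poll_interval=1.0, timeout=60.0):
    """Block until every dependency reports ready, or raise on timeout."""
    deadline = time.monotonic() + timeout
    pending = set(dependencies)
    while pending:
        pending = {dep for dep in pending if not is_ready(dep)}
        if pending and time.monotonic() > deadline:
            raise TimeoutError(f"still waiting for: {sorted(pending)}")
        if pending:
            time.sleep(poll_interval)

# Example: a nova-api container waits for its database and the
# migration job before starting. Names here are hypothetical.
ready_now = {"postgres", "rabbitmq", "nova-db-migration"}
wait_for_dependencies(
    ["postgres", "nova-db-migration"],
    is_ready=lambda dep: dep in ready_now,
)
print("dependencies met, starting nova-api")
```

The point is that everything can be thrown at the cluster at once, and the ordering sorts itself out as dependencies come up.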
The compute nodes can start to Can start to go in there and it's the and then the operator takes care of scaling up the hypervisors once we've once we've done that and how that That works we have a thing we calling a v-center operator It doesn't follow the the core OS pattern exactly in terms of the the way it's Sort of defined to work in the in the documentation, but it basically provides us with operator type Behaviour so we we deploy that into the into the cluster and it's listening or polling a DNS DNS service and as our infrastructure guys build out a new v-center The last thing they would do would be to create the DNS entry for that for the API on end point Once that once that happens The operator takes over and finds that it's got something new to configure and it takes some pre-configured Configuration for things like the username and master passwords that it can generate the necessary configuration for the compute node And then it will create from a template the necessary compute and compute pods and also the config map to to get that working so what we can do is we can say Create a v-center put it into the environment and it's immediately recognized and configured and up and running And we're using that also for seeding keystone. So things like service users passwords Endpoints, etc. They're all they're all been managed and populated via a An operator and we see this we're not doing it at the moment But we see this as other ways as a way to scale the other components in the in the data plane So new load balancers new routers, etc. 
Those would be self-discovered by an operator and configured, or the necessary agents would be configured, to bring them online.

This is a very small but black screen which shows the result of the helm install. At the end of those five minutes, this is what we're left with: basically all of the components we've got running in the system. Hopefully there's a demo at the end which will put this in a bit more light.

So, what have we learned during our two years? I think the first thing is that, generally, OpenStack works pretty well on Kubernetes; that seems to have been well recognized over the last two summits. The main problems we've seen: headless services are very difficult to health-check. For things like conductors and schedulers, it's very difficult to tell whether they're actually doing what they're supposed to do. We've also had problems with signal handling and orchestration. The orchestration we've solved with kubernetes-entrypoint, and we're using dumb-init as a basic process manager to handle signal mapping; there are a lot of cases where OpenStack components don't react to the signals that Kubernetes uses to manage the pod lifecycle. The other problem we've had is that some of the components don't behave very well if you just throw them in, when the environment is not what they're expecting. For example, if a conductor process tries to start and the database isn't there, it just waits. It doesn't retry.
It looks like it's running, but it's not, and there's no way for us to find out whether it's really working or not. So there are a few gotchas in the individual components that you have to work around, and luckily the orchestration we're using does that.

The other thing we found is that the big monolithic Helm chart we've created doesn't really scale very well: not in terms of technical scale, but in terms of the way people interact with it. We started with one or two people building it, and we've now got a team of maybe ten people working on the chart, and the question "is it safe to deploy?" is asked probably daily. So what we'd recommend, and what we're going to try to do, is break it out into individual components. The reason we didn't do that in the first place was that when we started, Helm didn't really help us to do it, but there's been a lot of development and a lot of new features in the last few months that will hopefully solve that problem for us. So we're looking to split that up.

The other thing is monitoring. In our previous life, where we built our own platform developed in parallel, one of the things we didn't do right at the beginning was integrate monitoring. We put it in at the end, and it was not great. So the thing we're accepting now is that things are going to break. There are a lot of moving parts, it's very complicated to understand what's going on, and with the nature of Kubernetes, those processes are going to stop.
They're going to move around; they're going to start up in ways you didn't expect. So we're trying to use a lot of the core capabilities of the Kubernetes ecosystem, particularly Prometheus, to scrape out a lot of metrics. We've integrated some middleware into OpenStack that allows us to export, via statsd, as many metrics as we can find within the OpenStack components, and we've got similar middleware doing exception reporting for us. So we're not log-parsing; we're actually capturing the exceptions as they occur, which I think has helped us significantly in terms of problem-solving. We're also using a lot of canary tests, so that when we make a deployment into a region, we've got both data plane and control plane checks which very quickly alert us in our Slack channels if something has gone wrong. So yes, we're very keen on monitoring as much as we can.

In terms of OpenStack: when Martin said we're an internal cloud, we're really trying to be an internal public cloud. We're trying to make it as easy as possible for our consumers to consume the cloud. We give them a self-service portal, a little like any public cloud dashboard, and we control things through quotas and access control; we set up administrative domains in Keystone, and we then restrict what people can do through policies. Now, all of those, at least from our perspective, are very niche features in OpenStack, because every time we try to change them it takes a lot of effort to make them work. And those things are really, really important for us: being able to restrict or allow what people do, and to control how much they can use.

The other thing, related to that, is that we believe OpenStack needs a proper UI for end users. Horizon we couldn't really give out to our end users.
They wouldn't use it, and it doesn't support most of the features we need to control it. Things like quotas, policies, and domain support are all very sketchy within Horizon, and there's no self-service for getting people on board, for letting people submit access requests, or for letting administrators process those requests. So we've built a dashboard, which we call Elektra, which provides pretty much all the features Horizon has, plus all of the things our users need to use the system.

What we'll do now, hopefully, if we've got time, and I think we have, is a very short demo which shows the effect of the control plane/data plane split. So hopefully this will work... I'm not sure how I make it play. It demonstrates a very simple scenario of taking down some elements of our control plane. This is our dashboard here, and we've got a couple of VMs running a very simple web application, which we'll show shortly. We've then got a load balancer which is balancing between those two servers. This is our website, which is very close to Michael's heart and mine, and as you can see it's working. Everything's running, and if we look at our dashboard here, everything's green and rosy.

What I'll do now is set up a ping to one of the VMs and to the load balancer endpoint. The ping sequences have been shortened.
So you might notice some skips in the sequence numbers. This is our control plane that we showed earlier, and what I'm going to do now is delete Neutron, or rather, delete all the Neutron components from our control plane. Because of the control plane split, everything is still running through the hardware. You can see the monitors are starting to go red, and if we look at the control plane now, we'll see that hopefully all traces of Neutron have been expunged. If we go back to the monitors, I think the API is going to go down shortly. It's unfortunately showing the one thing we're still keeping in our control plane, which is the metadata service; we have some plans there, but we haven't implemented them yet. The Neutron API and the LBaaS API have now gone down, but our valuable mustache website is still, hopefully, working, and as you can see, the ping lived throughout the downtime.

What I'll do now is just run a helm upgrade, which will make everything whole again: that predefined release is now going to be applied, and hopefully Neutron will come back up. So that's the Helm deployment finished; it gives a nice report of what happened, and if we now look back at the pods, we'll see that Neutron is hopefully back up and running. We're back in a healthy state, and if we go back to the alerting screen, we should hopefully see those alerts recover. That's it. So that's how we're managing a relatively complex OpenStack deployment across multiple regions, and how Helm is helping us do that with Kubernetes. Now I'll hand over to Michael; he'll tell us what's happening next.

All right, so I want to talk about what's next for us with Converged Cloud. Andrew talked mostly about how we are leveraging Kubernetes to help us operate our OpenStack, and now we have OpenStack running.
We're actually thinking about how we could use OpenStack to make our Kubernetes better. But first I want to take a small step back and tell you how we're actually installing those Kubernetes control planes. When we start out, we get completely bare data centers with nothing in them. We don't have any infrastructure, nothing. So we built our own little machinery to install those clusters on bare-metal machines. I went to the talk yesterday about the Kubernetes club sandwich, and the speaker was asking: why would you want to install Kubernetes on top of OpenStack? If you could just as well put it on bare metal, you should be in happy land, right? But my answer to that question is: it's not really a happy land. Kubernetes on bare metal is just as hard, if not harder.

For starters, we had to build that infrastructure. It might be easy if you install Kubernetes on your Raspberry Pi and just type in the commands, but if you're looking at building up 15 regions, and each region consists of 10 or 11 hosts, you're in a bit of trouble. So we started to create that infrastructure, and you get to a point where you're thinking: am I building Ironic? Or should I maybe spend my money on some product and buy this thing? So it's a bit sketchy there. We also figured out that we're really missing the undercloud in Kubernetes. There is no load balancing that is native to bare metal, and we had to figure out a workaround to actually expose our services. There is something, but it's a bit of a stepchild.
They are using these external IPs. We don't have any native volume management either; we had to go back to our vendors and ask them for some kind of intermediate component that brings volume management into Kubernetes, or we have to do it manually, as we do at the moment. Also, the whole networking part of Kubernetes on bare metal is an interesting puzzle which is left for the interested admin to figure out, or you have to go down to the marketplace and talk to vendors about what they have to sell you. We actually implemented something we call kube-parrot: it's a BGP speaker which is Kubernetes-aware, so it talks to Kubernetes and to the underlying infrastructure, and it tells the whole system, the networking components, what the Kubernetes networking is supposed to look like. But you have to do all of that on your own.

Finally, Kubernetes is, how do I say this, not really supporting the bare-metal use case that well. Google themselves run on GCE, and that's the main use case they are driving, and we often run into edge cases where we are thinking: how is this even working? Is anyone else using this? That's just the reality we are seeing with this whole bare-metal thing.

So how do we get out of that trouble? The first step we are taking is to start managing our Kubernetes clusters with Kubernetes. This picture immediately raises the question: why were you not doing that in the first place?
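As an aside on the kube-parrot idea mentioned above: conceptually, a Kubernetes-aware BGP speaker maps Services that expose external IPs to route announcements toward the fabric. Here is a toy sketch of that mapping, with the actual BGP session stubbed out and the data shapes heavily simplified (these are not kube-parrot's real structures).

```python
# Sketch of what a Kubernetes-aware BGP speaker does conceptually:
# watch Services with external IPs and announce a host route for each
# one via the local node, so the underlay learns where traffic goes.

def routes_to_announce(services, node_ip):
    """For each service exposing an external IP, build a /32 announcement."""
    announcements = []
    for svc in services:
        for ip in svc.get("externalIPs", []):
            announcements.append({"prefix": f"{ip}/32", "next_hop": node_ip})
    return announcements

services = [
    {"name": "dashboard", "externalIPs": ["10.44.0.10"]},
    {"name": "internal-only"},  # no external IP: nothing to announce
]
print(routes_to_announce(services, node_ip="10.1.2.3"))
```

A real speaker would keep a live BGP session with the top-of-rack switches and withdraw routes when services or nodes disappear; the sketch only shows the translation step.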
When we started, we experimented with self-aware Kubernetes clusters which could manage themselves, but all of that got quite complicated. So we have now finally come to the conclusion to just manage Kubernetes with Kubernetes, instead of building out our bare-metal iPXE/Ironic clone any further. That also allows us to use unified tooling: we're going to use Helm to install the Kubernetes clusters themselves, and all the auxiliary systems we have around them, like the Prometheus servers which monitor this whole stack, and some extra components that help with operating the OpenStack cluster. But ultimately, we're still missing the undercloud.

The interesting thing, though, is that now that we have that whole machinery, and we actually have a running OpenStack, we came to the idea: what if we take all of this and stick it on top of OpenStack? Then we end up with something like this: instead of bare metal, we have the same cluster on OpenStack, and we try to reuse as much as we can from our bare-metal infrastructure. We can remove all the hacks and the bare-metal edge cases that don't really work, we can recycle our own procedures, and we can actually give this thing to our customers. So we are thinking about, or actually already implementing, Kubernetes as a service based on this concept, reusing the same principles, mechanisms, tooling, and software we use for building up our own Kubernetes clusters in the first place.

What's really cool about it is that the control plane split Andrew showed you now also applies to our customers. If they set up a load balancer in Kubernetes, it actually ends up configuring an F5 load balancer through LBaaS, which gives them the same control plane split we are leveraging for OpenStack itself. We also get Cinder volumes: there is native support in the OpenStack cloud provider for Kubernetes.
We get auto-provisioned volumes right from Cinder, and we have native Neutron networking in the cloud provider as well. What we intend to use is a flat network, where Kubernetes just talks to Neutron directly and sets up static routes for the pods.

From here, we then thought: what can we provide to our customers to really supercharge those clusters? Now that we have OpenStack, maybe we can do some cool features that no one else can. I'm going to show you two examples that we are implementing at the moment. One of them is an OpenStack-native ingress controller. To give you a bit of context, since the word ingress has come up a few times already: it's an L7 reverse proxy in Kubernetes, and it allows users to easily define an ingress point for their services. If you look at the bottom left, that's an Ingress spec, and as you can see, there's a hostname in there and a service name. What happens in the background is that Kubernetes picks up whatever ingress controller you have deployed and makes that spec a reality. The most common ingress implementation is based on NGINX: when you deploy it, you get a bunch of NGINX servers which listen to the Kubernetes API and reconfigure themselves depending on what users put in. For GCE there's a native implementation, where the ingress controller talks to the undercloud and sets up the load balancers, and since a week or two ago there's also an ingress implementation for AWS, which our colleagues from CoreOS provided. Our intention is to implement this ingress controller for OpenStack, so that we can all use it. What would ultimately happen is that the spec on the left gets translated into configuration for a native hardware F5 load balancer, which gives you hardware TLS termination and enterprise-grade load balancing. We're going to connect it to Designate, so all the DNS names are automatically set up as well.
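The translation such a controller would perform, from an Ingress rule to load balancer configuration, might be sketched like this. The field names mimic Kubernetes and LBaaS concepts but are deliberately simplified; this is not a real API on either side.

```python
# Sketch of the core job of an ingress controller: map an Ingress-like
# rule (host + backend service) onto load balancer configuration
# (a listener plus a pool of backend members). Shapes are simplified.

def ingress_to_lb_config(ingress, endpoints):
    """Map the first ingress rule to a listener and a backend pool."""
    rule = ingress["rules"][0]
    service = rule["backend"]["service"]
    return {
        "listener": {"protocol": "HTTPS", "host": rule["host"]},
        "pool": {
            "members": endpoints[service],  # pod addresses backing the service
        },
    }

ingress = {"rules": [{"host": "app.example.com",
                      "backend": {"service": "web"}}]}
endpoints = {"web": ["10.2.0.4:8080", "10.2.0.5:8080"]}

config = ingress_to_lb_config(ingress, endpoints)
print(config["listener"]["host"])
print(len(config["pool"]["members"]))
```

An NGINX-based controller emits proxy configuration from the same input; an OpenStack-native one would instead drive the LBaaS API, which is what puts the data path onto the hardware F5.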
So all the dns names are already automatically being set up So you don't even have to worry about how the keystone cloud sapc name gets into the floating ip of that load balancer And one big problem that we're seeing is the certificate management for all these applications For our open stack, we actually have 30 certificates per regions And give that times 15. That's a lot of certificates to manage So you need to have some kind of automated process to renew those tls certificates That's something we have already and we intend to just stick it into this controller as well So that our users get this whole nice workflow right out of the box and don't have to worry about it at all anymore So that's one example the second example how we can supercharge those clusters Is about the gpu demand that we are seeing Just like everyone else. I suppose we have this machine learning hype in sap And our business units are starting to request gpu resources from our cloud The use case we're seeing most commonly is a tensor flow on kubernetes So for us, that means we have the problem that we need to mix and match VMs and gpu resources for our customers And also here opens that can help us to actually build this up quickly We're just going to use nova and ironic And for our orchestration layer on top, it doesn't really make a difference whether it's a VM or whether it's a bare metal gpu box So with those two examples, I'd like to conclude the presentation And you can find all this magic source actually actually on github Most of it is open source And you can see find it in the organization sapcc for converge cloud With that, thank you for your attention. And if there's any questions, we're happy to help you out Or is there something option can do better to accommodate your use cases or scenarios? Yeah, a lot of it isn't actually Open stack code. 
I think there are a lot of little bug fixes which we keep trying to get around to putting upstream, but it's a very hard, long process. A lot of the stuff is just completely standalone, or it's things like the Helm charts. There is an openstack-helm in OpenStack now, but there wasn't when we started, and ours is very similar but not quite the same. The actual changes that we've made, or need to make, to OpenStack are largely in vendor code, and we're working with the vendors in our data plane to upstream some of the requirements and the edge cases we're finding. But there isn't actually that much that we're changing, or need to change, within OpenStack, apart from things like Horizon, where we just found it was going to be too hard to change, so we wrote our own because it was going to be quicker. So it's not really things we need to change upstream. There are a few things we'd probably like to discuss with people in the longer term, but it's not something we need to do at the moment. I don't think it would add any value to upstream our UI, for example; it's there, and people can use it, and if people want to collaborate with us, they can, once we get some traction.

[Audience] Were there any particular challenges with NFS as a service in your environment?

Not really. I mean, to be honest...
It's not one of our heavily used services. We're seeing that as we scale, there are issues coming with some of the edge cases we're finding, but I'm not aware of anything in particular. It's basically plain Manila with a NetApp backend, and so far it's working pretty well, I think.

[Audience] Is there anything we should be looking out for? Just wondering, because that was not necessarily mentioned for Manila.

Yeah, no. There's a whole load of things we would have liked to have gone through, but unfortunately we probably don't have time to go through everything. We haven't really had any major concerns from an OpenStack perspective, apart from the things I mentioned, and a lot of that, I think, is because of the way we're pushing it into the more traditional hardware space. We're really just remote-controlling the hardware, and there are issues, and there are things we're working on with some of the vendors to solve, but I don't think most of them are unsolvable problems.

Two more questions, right?

[Audience] Did you settle on Ironic for bare-metal provisioning? I didn't get your conclusion on that part. Also, with recent OpenStack releases there are a lot of deltas around OVN, and hardware acceleration for those things. If you had to redo this, would you have reconsidered any aspect of it, especially the networking part?
Yeah, I mean, there are always things you would change, but fundamentally, no. It actually still stems from our last iteration, where we were in a situation where we had our own platform doing stuff like this in completely our own way, and we were modeling upstream APIs from Amazon to give to our customers. There we already did a lot of orchestration of the backend, and we decided against some overarching SDN scheme. We tried to make it simple, because in the networking part, when you make it complicated, it doesn't scale anymore and it's not operable anymore. So we tried to make the orchestration, let's say, more complex, but keep the actual on-device configuration as simple as humanly possible, to keep it operable. Otherwise, if you need to trace a packet through five different software switches and try to find where the thing gets lost, you're doomed to failure, or at least you're going to have a very hard time operating that in the long term. So the general concept is still something we don't regret yet, I think.

I think the other thing is that we're a very small team: even with our extended team, maybe less than 50 people. Leveraging our existing support for the hardware-based stuff is really helping, as it's a problem we can sort of not worry about, if that makes sense; we have colleagues in other areas of the organization who can handle the support without necessarily needing to know the details of OpenStack or any specific implementation.

All right, thank you folks. Thanks!