Hello everyone. My name is Sebastian Scheele, I'm one of the founders of Loodse, and with me I have Simon Pearce from SysEleven in Berlin.

At SysEleven, we're a managed hosting company from Berlin. We've been offering our customers managed hosting for the last 10 years, and up to now it's been a fairly traditional platform. We've been offering virtual Linux containers, about 4,000 of them, the majority managed with Puppet as configuration management and orchestrated via our own tooling over various hardware nodes distributed over a few data centers. We also have our own OpenStack cloud, which we run for infrastructure as a service, and we basically wanted to extend that portfolio with Kubernetes. So we were looking at what would be the best way for a medium-sized German hosting provider to run a managed Kubernetes offering. Of course, we're not Google, we're not AWS, so we needed to find something else, and our partner Loodse here helped us integrate their installer into our network.

Yeah, and I'm Sebastian, from Loodse. We are a startup from Hamburg, and we build a platform with which you can run and manage multiple Kubernetes clusters, like a Google Container Engine, in your own environment: on OpenStack, on bare metal, or on different cloud providers. Together with SysEleven we built up this platform for them.

So we would like to take some time today to talk with you about that, show you how we did it, and give you a short live demo so you can actually see how the installer runs and how you can set up a Kubernetes cluster with a few mouse clicks. Should we get started? Yes? Okay.

So, what was the challenge for us, and also for Loodse? Basically, getting a management interface up and running which would allow us to maintain and view multiple clusters. We didn't want to have to log on to each individual cluster to check the update status of the different customer clusters. We wanted a unified web interface a system engineer can look at to see the state of all the clusters, start an installation for a customer who doesn't want to do it himself, see which version of Kubernetes is installed, maybe even add or remove a node.

Another thing that was very important to us was multi-tenancy, which is something most of the offerings we've seen up to today seem to lack. They all seem to focus on single projects or single companies. Our idea was a multi-tenancy approach where you can have more than one customer running. We also wanted the master components as a completely managed service, so we take care of the API server, the kube-controller-manager, the scheduler and also etcd.
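To make that architecture concrete, here is a rough sketch in Go of what "master components in a customer namespace" could look like. This is illustrative only, not Loodse's actual operator code; the names, image and flags are assumptions.

```go
// Illustrative sketch only: create a namespace per customer cluster and
// run its API server as an ordinary Deployment on the management
// cluster. Names, image and flags are assumptions, not the real setup.
package controlplane

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func createControlPlane(ctx context.Context, c kubernetes.Interface, cluster string) error {
	ns := "cluster-" + cluster
	_, err := c.CoreV1().Namespaces().Create(ctx, &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{Name: ns},
	}, metav1.CreateOptions{})
	if err != nil {
		return err
	}
	labels := map[string]string{"app": "apiserver", "cluster": cluster}
	one := int32(1)
	_, err = c.AppsV1().Deployments(ns).Create(ctx, &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "apiserver"},
		Spec: appsv1.DeploymentSpec{
			Replicas: &one,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{Containers: []corev1.Container{{
					Name:  "apiserver",
					Image: "gcr.io/google-containers/kube-apiserver:v1.8.4", // assumption
					Args:  []string{"--etcd-servers=http://etcd:2379"},      // assumption
				}}},
			},
		},
	}, metav1.CreateOptions{})
	// kube-controller-manager, scheduler and an etcd cluster would be
	// created the same way; a Service then exposes the API endpoint.
	return err
}
```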
We want to take all that hassle away from the customers so they don't have to deal with those sorts of things, give them the ability to update very quickly if they want to, plus an administration interface where you can maybe at a later date change certificates, add users and do various things with the cluster. We also offer a choice of add-ons. One to be named here would be the CNI plugin: you can choose between, for instance, Calico, Canal and Flannel, so one customer can have things like network policies if he needs them, while other people might be more interested in BGP networking. Give the customer the choice, and let him pick during installation between the different network plugins that exist.

The K8s master runs as containers in an individual namespace for each customer, which I will show you in detail later on. We also have a single service endpoint which we maintain, which has also been a challenge: we have one IP address for the API service, and that endpoint is for all of the customers, so we have multiple clusters running behind one service endpoint. That allows us to do a lot of nice things, for instance upgrading multiple clusters: we're able to view all of the clusters, check the upgrade status and maybe upgrade them all at once. Otherwise you'd be sitting there with some form of Terraform or maybe Ansible playbooks, and you'd have to upgrade every individual cluster, which would take forever, to be honest, and would also bring lots of errors with it. We tried that in the past and it didn't work very well. It scales up to maybe 10 or 20 clusters, but after that you need so many members of staff to manage all the clusters that it just doesn't work out.

Another thing is user and role management, which is also very important, so you can have different roles with RBAC: people that are allowed to deploy to certain namespaces, for example. We also want to provide the customers with a unified way to install Helm charts and similar things. The idea is, later on, to have a service catalog where you'll be able to use Monocular to install various Helm apps and other things from the dashboard. Most of the existing tools focus on a single cluster, but not really on multiple clusters; as I said earlier, most of the existing solutions don't really seem to have that as a model.

Access to the K8s master is also slightly different, because behind that one IP address we don't just have one K8s master running, there are multiple masters. So we needed to find a way to get each API call to the right master, and I'll show you later how we did that; there are various ways it can be done.

Within our installer we require a minimum of three VMs, because we're running this on top of OpenStack. You can choose the flavor that you want, similar to what you would do when you spin up a traditional VM with a cloud provider, or you can mix different flavors. But we require a minimum of three VMs so that you can distribute your pods between them. Of course, you can add more at any time: we can add Prometheus, you can check the utilization of your cluster, and you can add new nodes whenever you like, which can also be automated. To accomplish all this, additional tooling is required; one piece, for instance, is a specific proxy to get into the correct cluster.
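A minimal sketch of how such a proxy could work, assuming routing by TLS server name (SNI), which is the first approach described again in the lessons-learned section of this talk. This is not SysEleven's actual implementation, and the hostnames and backend addresses are made up:

```go
// Sketch: terminate TLS on one shared endpoint, read the SNI server
// name from the handshake, and proxy the connection to the matching
// customer master. Not production code.
package main

import (
	"crypto/tls"
	"io"
	"log"
)

// Hypothetical mapping from SNI name to the in-cluster service of that
// customer's API server.
var masters = map[string]string{
	"customer-a.k8s.example.com": "apiserver.cluster-customer-a.svc:443",
	"customer-b.k8s.example.com": "apiserver.cluster-customer-b.svc:443",
}

func main() {
	cert, err := tls.LoadX509KeyPair("tls.crt", "tls.key")
	if err != nil {
		log.Fatal(err)
	}
	ln, err := tls.Listen("tcp", ":443", &tls.Config{Certificates: []tls.Certificate{cert}})
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go route(conn.(*tls.Conn))
	}
}

func route(client *tls.Conn) {
	defer client.Close()
	if err := client.Handshake(); err != nil {
		return
	}
	backendAddr, ok := masters[client.ConnectionState().ServerName]
	if !ok {
		return // unknown cluster name
	}
	// Re-encrypt towards the customer's API server.
	backend, err := tls.Dial("tcp", backendAddr, &tls.Config{InsecureSkipVerify: true}) // sketch only
	if err != nil {
		return
	}
	defer backend.Close()
	go io.Copy(backend, client)
	io.Copy(client, backend)
}
```

As the speakers explain later, routing on the hostname alone broke tools that dial the IP address directly, which is why they eventually moved to a unique port per customer on the same IP.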
Yeah, so we started two and a half years ago with customers working on Kubernetes. We are also a Google partner, and on the Google cloud it was every time quite easy: press a button and you have a cluster. For other customers it was every time a challenge. We built it with their tools, and then it started every time the same way: first we build up an initial cluster, then they ask, hey, can you help us with updates? Then they come and say, we also want some more clusters, can you help us build up more clusters and manage them? And, yeah, we want to have something like self service. Mostly it ended up with: we want something like a Google Container Engine, but on our platform. We did this a few times for customers and asked ourselves: can't we do this better? Can we build something which feels and works like a Google Container Engine, but which the customer can run and manage in their own environment, on their own platform like OpenStack or bare metal, or even at their own cloud provider?

What we really want to achieve is providing self service for the developers. Operations can concentrate on operating the infrastructure, but the developers can create and manage clusters, can decide on their own when it's time to upgrade a cluster and how to size it. Different teams can have different clusters instead of putting everything into one big cluster; everyone can decide whether they want one bigger cluster or several smaller ones. You can even build up clusters in your CI pipeline (we have a job running for this): when you want to test something, quickly spin up a cluster, do your jobs, throw it away again.

Another challenge was, of course, updates of a cluster. We really want the developers to be able to focus on updating their own cluster, while updating many clusters should not be a big deal for operations. As Simon said, if you have two or five clusters it works okay with existing tools, but if you go to 10, 50, 100 or even more clusters it gets quite difficult to update, and especially to offer your developers different versions of Kubernetes and still manage and run all of them.

Of course, we also want to install add-ons: CNI add-ons, Helm charts and the dashboard, so that the customers or the developers can start immediately and don't have to think about what they must install on top of the plain setup.

And of course we want flexibility in our clusters, so we want to add and remove worker nodes, and the developer should decide when it's time to add worker nodes. We're currently also working on cluster autoscaling. In the future we don't want to rely on the cloud-provider-specific autoscaling; we want an integration of the cluster autoscaler so that we can do autoscaling on any platform, without having to look into how Google, AWS or OpenStack does it; we can then do it anywhere. As I said, autoscaling of worker nodes is something we're currently working on.

Of course, there is the question: how do I work with external load balancers?
So we want to integrate with the existing networking and tooling. Our focus is really on spinning up all these setups; let the customer choose what the best networking option is, what load balancers they have in their own cloud, and how those can work closely together with Kubernetes.

One important thing is also automatic backup and recovery of the Kubernetes master, so that we ensure your etcd is healthy at all times, and in the case that something happens, one of the nodes crashes or the complete quorum is lost, we can recover your etcd without you having to interact with it manually. It's completely automated, the system runs out of the box and you don't have much to do.

What we mainly did was come up with an idea for the best way to run a lot of Kubernetes clusters, and we asked ourselves: can't we run Kubernetes on Kubernetes? So what we spin up is one management Kubernetes cluster, and on this cluster we spin up, for each customer cluster, all the components Kubernetes needs, inside a namespace. Then we can connect the worker nodes from the outside world, for instance from OpenStack, to that specific cluster. We only need one dedicated IP address for all of these clusters, and we have an SSH tunnel so that the API server can talk to the nodes and do its work there. That's our main setup.

What we technically built is a Kubernetes operator which knows how to deploy and upgrade Kubernetes clusters, but in such a way that we only need this operator for the startup and upgrade phases; afterwards you could completely remove our operator and the clusters keep working. And the good thing is, because everything runs on Kubernetes, when something fails, like the API server crashing, it's automatically restarted. Updates of the master control plane are quite easy for us: we do a rolling update with Kubernetes and we are done.

This we can also easily move to different cloud providers. Integrating a new cloud provider now takes us between two and ten days, because we only have to work out how to connect the worker nodes. On the worker nodes we only need a container runtime and the kubelet; we configure the kubelet with a token and the URL of the API server, and then we are done. All the rest we roll out with Kubernetes itself, so the networking is rolled out with DaemonSets, and everything else as well.

We saw in the talk before the discussion about the machine API and the cluster API. We came up with a similar concept; we call it a node set. We are currently in discussion with those guys about how we can combine both concepts, because from the concept side they are quite close together. On the technical implementation side there are some differences, but we want to have an alignment and, in the best case, get it as close to Kubernetes as possible. The idea, really, when we started, was: why can I not manage nodes with Kubernetes? So we came up with the idea: we need something like a node resource, and something like a node set, like a ReplicaSet. Then we want a node set controller which continuously checks how many nodes it should start and creates node resources, and a node controller which talks to the specific cloud provider API, spins up the nodes and configures them, so that we have complete flexibility.
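A conceptual sketch of that node-set idea, with hypothetical types rather than the real kube-node API: a controller reconciles the desired replica count against the machines that exist, and the provider-specific part shrinks to a small interface, which matches the two-to-ten-days integration claim above.

```go
// Conceptual sketch of the "node set" idea (hypothetical types): a
// controller compares the desired replica count of a NodeSet with the
// nodes that exist, and asks a cloud provider to add or delete
// machines, much like a ReplicaSet does for pods.
package nodeset

import (
	"context"
	"time"
)

// NodeSet is the desired state: "I want N worker nodes of this flavor".
type NodeSet struct {
	Name     string
	Replicas int
	Flavor   string // e.g. an OpenStack flavor; assumption
}

// CloudProvider abstracts the provider-specific part; integrating a new
// provider only means implementing this small interface.
type CloudProvider interface {
	ListNodes(ctx context.Context, set string) ([]string, error)
	CreateNode(ctx context.Context, set, flavor string) error
	DeleteNode(ctx context.Context, name string) error
}

// Reconcile drives actual state toward desired state, one step at a time.
func Reconcile(ctx context.Context, p CloudProvider, set NodeSet) error {
	nodes, err := p.ListNodes(ctx, set.Name)
	if err != nil {
		return err
	}
	switch {
	case len(nodes) < set.Replicas:
		return p.CreateNode(ctx, set.Name, set.Flavor)
	case len(nodes) > set.Replicas:
		return p.DeleteNode(ctx, nodes[len(nodes)-1])
	}
	return nil // converged
}

// Run re-reconciles periodically, like a controller's sync loop.
func Run(ctx context.Context, p CloudProvider, set NodeSet) {
	t := time.NewTicker(30 * time.Second)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			_ = Reconcile(ctx, p, set) // errors retried on the next tick
		}
	}
}
```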
Of course, we are dealing on the one side with quite a lot of cloud providers, but on the other side also with enterprise customers who want to run hybrid setups, and their authorization and access management is every time a challenge. So we support different identity providers. What we mainly use is OAuth or LDAP; you can use Google as a login option, you can use GitHub, but you can even provide your own identity provider or your Active Directory and we can integrate it, and you can use those users for the management. What we want to have is seamless management and cluster login, really a single sign-on for the user, so that you don't need additional users on our side. We already support RBAC and network policies inside our clusters, but as a next step we want to push the identities from the outside provider into the cluster; that's something we're currently working on.

And what we want to have is support for multiple providers. What we have now is the same setup of the Kubernetes cluster master on every provider, because we always run on Kubernetes and can do this more or less anywhere. All we need is to deploy a VM, put Docker or another container runtime and the kubelet on it, configure the kubelet, and then we are done. The complex part, running and maintaining the cluster, we can take as-is and move to a new cloud provider or a different platform; as long as we have a Kubernetes running, it's fast and easy to set this up.

The good thing with this is also that when you have different setups, in this case OpenStack and bare metal, the same team can operate all of them, because the master control plane works the same every time. You don't have the challenge of, oh, on this platform I have completely different tooling or a different deployment method than on another platform. The setup is more or less the same, like 99% the same. There are of course flavors which differ, like load balancers, storage, and how we deploy the nodes, especially on bare metal, but all the communication and everything else is done the same way, and it's easy for the team to operate and manage all these clusters.
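To illustrate how small that per-provider worker bootstrap is, here is a sketch of generating the kind of minimal kubeconfig a new worker's kubelet needs: just the API server URL and a token. The exact format Loodse uses isn't shown in the talk; this follows the standard kubeconfig token layout.

```go
// Sketch: render the minimal kubeconfig a new worker's kubelet needs to
// join its cluster -- just the API server URL and a bootstrap token.
package join

import (
	"io"
	"text/template"
)

const kubeconfigTmpl = `apiVersion: v1
kind: Config
clusters:
- name: default
  cluster:
    server: {{ .APIServerURL }}
    insecure-skip-tls-verify: true # sketch only; use the cluster CA instead
users:
- name: kubelet-bootstrap
  user:
    token: {{ .Token }}
contexts:
- name: default
  context: {cluster: default, user: kubelet-bootstrap}
current-context: default
`

type Params struct {
	APIServerURL string // e.g. https://203.0.113.10:30100 (assumption)
	Token        string // per-cluster bootstrap token
}

// WriteKubeconfig renders the kubeconfig for one worker node.
func WriteKubeconfig(w io.Writer, p Params) error {
	return template.Must(template.New("kubeconfig").Parse(kubeconfigTmpl)).Execute(w, p)
}
```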
Okay, so now we would like to talk a bit about the hybrid setup and the way we set all this up and got it running. Some of you might think: oh, this is kind of unusual, these guys are taking bare metal servers, installing hypervisors and OpenStack on them, and on top of that they're putting Kubernetes, and on top of that they're putting Kubernetes inside Kubernetes. How does this work, and is this a good idea? Yes, it is. It's a good idea because we can use existing APIs which we already have in OpenStack. It allows us to leverage things like storage via the Cinder API, use images, and get the scheduling underneath, which we need, from OpenStack.

One thing we're working on at the moment is our second cloud region, which is going to go online, I think, in January or February next year. That will then allow us to integrate two cloud regions within the installer, and the customers will be able to use something similar to federation; I suppose you'll be able to distribute your worker nodes between the two regions.

We also have certain customers with very specific demands. They don't want to run on a shared platform; they say OpenStack with hypervisors shared by different customers is not an option for them. So we can also integrate bare metal servers that we have in our data center. A customer may rent three or four bare metal servers, or even a whole rack of servers, for his own personal use, with his own switching and routing equipment, to make sure it's completely isolated from any other customers.

We also run two storage zones. We work with a storage provider from Berlin called Quobyte, and they offer us a storage which we utilize via the Cinder API, which allows us to have two different storage zones in both data centers. All of the nodes have SSDs for the storage, so it's very fast as well.

Integration of additional data centers or cloud providers is possible. Loodse have done a lot of work on that for other customers; they've already integrated DigitalOcean, AWS and Google, and it would be easy to integrate any further partners that we may have, or even maybe an on-premises solution. If the customer has some form of API which we can authenticate against and some form of provisioning service for VMs, it can be used. So it's fairly easy to extend this setup, which is great for us because it offers a lot of flexibility.

What I'd like to show you next is the live demonstration. So, fingers crossed this is going to work out: we're going to start up a live demonstration on our servers in Berlin, so let's see how the long distance works out. This is the interface in its current state. It's a very simple interface; we did this deliberately, so there are not many menu options here. What you start off with is uploading your SSH key, which I've already done here, to be distributed onto all the worker nodes. Then we've basically got the managed clusters and the create cluster option, and we're going to try and create a cluster here.

Then we continue to the providers. You can see here we have our OpenStack and our bare metal service. We're not going to try and provision a bare metal server here, because it would probably take too long, but what we're going to try and do is start up a cluster on OpenStack. So we select OpenStack as the provider and continue to select the data center. At the moment, as I said, we have one region that we're running all this on; from January onwards you'll be able to choose the region and run different workloads in different regions. Here is an option to add my tenant, which I'm going to do now; I'm just going to quickly add my data. At the moment we're running everything on Ubuntu. Let me just grab my password and copy it in. That's basically all we need to do. We'll also add my SSH key; sorry, pull it down here, add my SSH key, and that's basically all we do. So what we've got here: we're using all the default settings, basically.
I've got my tenant already added, and I'm going to spin up three machines with Ubuntu 16.04, then hopefully install the kubelet on them and connect them to the master. We can review the settings we've created here, and after that we create the cluster, and if everything works out we should see some master components coming up. As you can see now on the dashboard, we've got these twirling symbols at the top showing us the health of our master components. We're running Kubernetes 1.8.4 here. We can also decide which versions we want to provide for our customers: we check the versions first and then we're able to activate them here, or we could allow specific customers to see only specific versions.

We can already see we've got the first green light here; that should be our etcd. So the etcd operator is deployed, and we have a three-node etcd cluster with a persistent volume underneath. The etcd cluster takes snapshots onto that persistent volume, which also allows it to be recovered, as Sebastian mentioned earlier. That is of course a great thing to have; I'm not sure if any of you have tried to recover an etcd from a broken state, but it can be a pain. We should also see an API server coming up at the moment. Yes, we've got it, plus a kube-controller-manager and scheduler, so we've nearly got all of the master components up. Once they are up and running we should see the creation of our worker nodes, to actually distribute our workload on. So we'll just wait a moment for this to happen, and then I can quickly show you in the OpenStack dashboard the VMs that are created.

Let me quickly ask a question while we're running our demonstration: how many of you are actually running Kubernetes in a production environment at the moment? Oh, quite a few people. And how many of you have actually got a working CI/CD pipeline up and running? Also quite a few people. That's quite interesting, because you see lots of talks about CI/CD pipelines at the moment; it seems to be a hot topic in the community.

Okay, so we see our nodes are being created. Let's see if we can find them here in our dashboard. Hopefully, if everything works out... it's logged me out, I'll just quickly log back in again. Yeah, so we have the three VMs that we created, zero minutes old; they're just coming up and waiting to be provisioned. The next thing that happens after they're provisioned is that they get their kubelet, the kubelet starts, and then, if everything works out, they should join the master. Basically, all we need to do after that is use the button here which allows us to download a pre-configured kubeconfig for that customer, and then you can start running. So, as you've seen, from zero to Kubernetes in about five minutes, which of course is very simple. Every single developer can set up his own cluster if he needs to, or he can share a cluster with other members of staff.

So we should see that coming up shortly. And what you also see at the top: currently we run the latest version, so the upgrade button is greyed out. When there's a new version available, the developer can decide, okay, press the upgrade button, and then first the master is upgraded and later the nodes. And of course the developer can add or remove nodes depending on their requirements, so they are completely flexible: they can define what workloads they run, when to scale up and when to scale down. They then have a plain vanilla Kubernetes with the dashboard and all the usual things running, can use storage classes out of the box from OpenStack, and for the developer it really feels like a Google Container Engine. They can easily move into this setup and deploy their applications there.
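Once that kubeconfig is downloaded, the cluster is usable with completely standard tooling; for example, a customer could list the freshly joined nodes with client-go (or simply `kubectl get nodes`). An illustrative snippet:

```go
// Sketch: verify a freshly created cluster with the downloaded
// kubeconfig by listing its worker nodes via client-go.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "kubeconfig") // the downloaded file
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	nodes, err := client.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes.Items {
		fmt.Println(n.Name) // e.g. the three freshly joined workers
	}
}
```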
So we see our first worker node has just come up: we've got our green indicator here, and the others should come up within a few minutes. It normally takes roughly five minutes for everything to come up. We can then download the kubeconfig here if we want to. Let's see; I would have thought they wouldn't be ready yet, but they actually are. Here we can see two of the machines already ready, and as you can also see on the side, they've already registered with the master, and the others are just about to be registered. So we should see them all coming up now. The last two aren't quite ready, but that will happen in a few seconds. Yeah, so that's basically it from the demonstration side. Should we go back to the presentation? Yeah.

One more thing I talked about already: our kube-node project is open source, if you're interested in it. We are in discussion with the machine API folks about where we want to integrate; that's what we're currently working on.

Because what we see from the customers is: on the one side you want to get clusters up, but on the other side you must get workload onto them. We were looking around, also had a discussion with Simon about the best way to do this, and played around with Jenkins and other tools. We always want to run natively on Kubernetes, so we were thinking: why is there no native tool for Kubernetes to run CI/CD workloads? And we came up with the idea of building Kube CI, which is an extension to Drone CI that makes it possible to run Drone CI on Kubernetes. Instead of connecting to the Docker socket, it will in the future spin up pods, and we have also already developed some plugins for kubectl and for Helm. So we want to make a package, so that Simon can go to his customers and say: here's an easy pipeline, put this in your Git repository, and then we get it onto our platform and everything works natively, or we can easily manage it.
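The pod-per-step idea can be illustrated with a short sketch. This is not the actual Kube CI code, just the shape of replacing a Docker-socket call with a pod creation via client-go:

```go
// Sketch: run one CI pipeline step as a Kubernetes pod instead of a
// container started over the Docker socket.
package ci

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// RunStep creates a pod for a single pipeline step in a "ci" namespace
// (namespace and field values are assumptions).
func RunStep(ctx context.Context, c kubernetes.Interface, name, image string, cmd []string) error {
	_, err := c.CoreV1().Pods("ci").Create(ctx, &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: name + "-"},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever, // a step runs exactly once
			Containers: []corev1.Container{{
				Name:    "step",
				Image:   image, // e.g. a kubectl or helm plugin image
				Command: cmd,
			}},
		},
	}, metav1.CreateOptions{})
	// A real runner would watch the pod until completion and stream logs.
	return err
}
```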
Yeah, okay, so let's come back to our last few slides. Thank you. I want to quickly wrap this up and talk to you about some of the lessons we learned, and about the future roadmap, where we're actually going and some of the targets we have for 2018.

As Sebastian said, we need to wrap this thing up with some tooling. The K8s service alone is not enough: people need workshops, people need the correct tooling to enable them to deploy their applications. Most developers don't really have a great interest in Kubernetes itself; they have a great interest in their apps, which is good. They want to be able to deploy them in a standardized way, and they don't want to have to search through the internet for ages until they find out how to do that. So we want to enable them to do that in an easy way.

What we did first is use the same port for every single API request: we were using SNI, relying on the hostname sent in the TLS handshake to distinguish the different clusters. That worked, and it worked well with standard tooling, but we had quite a few problems with some tools. For instance Prometheus: it does service discovery within the cluster and doesn't use the hostname, it sends requests directly to the IP address, so it wasn't reaching the correct cluster, and a few other tools had problems with that too. So we decided to change this and give every customer a unique port for his API server, but always on the same IP address, so that we keep the unique service endpoint I mentioned at the beginning of the talk.

One of the things we definitely ran into when we had our first 15-20 clusters running was a severe etcd problem. We had decided that it would be a good idea, which it wasn't, to limit the resources etcd can use. It crashed on us, because running Kubernetes inside these namespaces created a lot more etcd data than we'd had previously; it basically crashed the database and we needed to recover it, which took me quite a while. So: take regular etcd snapshots onto persistent volumes, and always make sure that you have a decent backup of all the etcd data. It is definitely a lifesaver if anything goes wrong. Do that for every individual cluster, but also for the management cluster itself. That was definitely a very important lesson for us.
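That snapshot advice in code form: a minimal sketch using the etcd v3 client's maintenance API to stream a snapshot to a file, e.g. on a persistent volume. The endpoint and path are assumptions.

```go
// Sketch: take a regular etcd snapshot and write it to a file, e.g. on
// a persistent volume, so a broken cluster can be restored from it.
package backup

import (
	"context"
	"io"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func Snapshot(ctx context.Context, endpoint, path string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint}, // e.g. the cluster's etcd service
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	rc, err := cli.Snapshot(ctx) // streaming snapshot from etcd
	if err != nil {
		return err
	}
	defer rc.Close()

	f, err := os.Create(path) // e.g. /backup/etcd-<timestamp>.db on a PV
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, rc)
	return err
}
```

Run on a schedule, one snapshot per customer cluster plus one for the management cluster, this covers exactly the failure the speakers describe.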
Also, restrict access to the master components. It may sound weird, but you see this from AWS too, and Google also won't let you anywhere near the etcd. It's normally a good idea to keep your customers away from these sorts of things. Give them a button where they can update, but don't allow them to update to just any version, because it would break. We had customers who decided to try an older version of Kubernetes, which of course didn't work correctly on our platform. So that's why we make sure that we are the ones who hand out the versions that can be installed.

Last but not least, our roadmap. One of the things we're working on at the moment is improving the cluster authentication. We're working on a role-based access control system similar to the IAM integration AWS was talking about in the opening keynote, which sounded interesting. We're going to need that later on to make sure we can restrict developers to certain namespaces and certain tasks within the cluster, and maybe give people read-only access and other things. Quite a few of our customers need this.

Worker node autoscaling is also a thing. In Kubernetes, as most of you know, you have the horizontal pod autoscaler, which makes it easy to scale the pods based on CPU load or maybe on HTTP requests; but what do you actually do when you've reached the capacity of your three worker nodes? We need to know this, and we need to be able to scale the worker nodes up to a limit, so the customer could maybe say: okay, I want to scale between three and ten worker nodes. That's also something we're working on at the moment, which I think we'll be able to release early next year.

Support for different Linux distributions is also a topic. At the moment our cloud is running on Ubuntu and we are running Ubuntu for the worker nodes too, but we had quite a few people say: why don't you support CentOS? Why don't you support CoreOS? What about all these auto-updating features of CoreOS? We want this; we don't want to use Ubuntu. So that's also something we're looking into at the moment.

Another thing we're doing at the moment is configuring an external load balancer. Google is in the happy position of having a load balancer which is distributed and announced at 150 different points of presence. We're not; we're a medium-sized service provider with three or four data centers we can run on, and we would like to use an external load balancer for this. One of the things that can be used is Cloudflare. We modified ExternalDNS and put in a pull request for it; it's in the Kubernetes contrib repositories. That allows us to monitor Ingress resources and automatically update Cloudflare with the IP addresses of new worker nodes and new Ingress resources. Cloudflare then actively health-checks the worker nodes, and we can add new ones and get them into the load balancing pool automatically, without touching Cloudflare manually; we just use the Cloudflare API for that.

Also Drone, which Sebastian just talked about: we would like a standard way for CI/CD, something a new customer who's just starting with us can use when he's wondering, what can I do, how can I deploy? There are so many different deployment tools around, and we think Drone could be a good fit. I've tried deploying things with Jenkins; it works, but it's complicated, and we'd like a unified way, so that the time to market for the customers is very short and it's easy to transform a setup running on conventional VMs into a containerized setup.

The last bullet point is to automate the upgrade process. We've already automated the master upgrade process, but afterwards you need to add further worker nodes and then drain the old worker nodes to get to the new version. We would like to completely automate this and take that pain away from the customers too. It only takes about 30 minutes to do, but it's still 30 minutes of your time.

So, does anybody have any questions they would like to ask? We have a few minutes of time. Yeah, sorry. If you have questions, we are still here, and otherwise we also have a booth, the Loodse booth; come by and we can discuss.