Hello, everybody. This is the session for SIG Cloud Provider, the cloud provider being the part of Kubernetes that is an abstraction layer so that Kubernetes can be truly cloud native, hosting things that are portable regardless of what Kubernetes is actually running on. We've got three presenters today. We'll start in the order on the left, so Chris.

Hi, I'm Chris Hoge. I'm a strategic program manager for the OpenStack Foundation, one of the co-leads of SIG Cloud Provider, and also a co-lead on SIG OpenStack.

I'm Steve Wong. I'm not in an official leadership position of SIG Cloud Provider, but I do work in SIG VMware, which does the cloud provider for running Kubernetes on top of VMware infrastructure. I work for VMware. I've been active in Kubernetes dating back to 2015, but back in that era I was mostly active in storage, and I'm also active in the IoT and Edge working group.

Hi, I'm Walter Fender. I work for Google. I'm one of the TLs on SIG Cloud Provider, and I've been leading the effort to extract the in-tree cloud providers from Kubernetes core so that all of the cloud providers can work on an equal footing.

Walter's being modest: he also has a lot to do with API Machinery, which is the Kubernetes API, and with the controller that runs the cloud provider, and maybe a few other things.

So, the agenda. Today we're going to get into, as background, the historical context of how the cloud provider started. It turns out the cloud provider started one way but is moving in a slightly different direction, and we're going to tell you why it's changing and how it's changing. Finally, we're going to close with a description of how SIG Cloud Provider actually works, in case you want to get involved with the activity, or, even if you're not actively involved, so you can find the historical record of the agendas — that is, how decisions got made — and maybe look at some YouTube videos, because sometimes that kind of context helps you understand what's going on.

The mission of SIG Cloud Provider is to allow Kubernetes to run in a way that's neutral to all public and private cloud providers. You know what public cloud providers are; a private cloud provider means running on premises on various forms of infrastructure, whether that's hypervisors, things like OpenStack, and potentially even bare metal. We're responsible for establishing the standards and requirements that have to be met by all of these providers to deliver an optimal user experience.

The background for the things this group owns is in the KEPs listed here. I'm not going to summarize them because they're really detailed, but we'll have a link to this deck at the end, and I think the sched site already has a PDF of this deck, so you can find those links and look at them later.

The origin story for cloud providers is — and it's still the case as of 1.14 — that these abstraction layers for eight very popular platforms to host Kubernetes on are integrated into the very fabric of Kubernetes itself. That seemed like a good idea when Kubernetes was getting started, to make it convenient so you could just deploy it and it just works. But just as they say about Kubernetes being a host for microservices, monolithic isn't always good, and bundling these things in by default had some issues. What were the issues?
Well, Kubernetes should really be an orchestration layer managing only the standardized parts, not the thorny aspects that are unique to any particular deployment platform. People also came to the conclusion that inclusion of these things could constitute an endorsement of a particular platform, and the Kubernetes project doesn't want to be a kingmaker, coming up with a specific list of what's authorized and what's not.

The other issue is that by bundling in this code, everything became very tightly coupled. So take an instance where there's a bug that affects one particular cloud provider — in the interest of fairness I'll hypothetically suggest it's the company I work for. There's a bug in the VMware implementation, a security issue, and we need to get it patched right away. Well, if that code is in tree, we have to cut a full Kubernetes release and pass all the tests to get it out there, and that slows things down. You really don't want that to happen unless there's a really good reason to tightly couple it, like there is now. There's a lot of value in not having coupled release cycles. It also allows independent releases: if one particular cloud provider came out with some fancy feature because they changed something in a particular public cloud, it gives them the option to ship that without waiting for a Kubernetes release cycle. So there are both problems and opportunities that can be addressed by taking cloud providers out of tree. I'll turn it over at this point to Chris.

Okay, so with the stage set, we want to understand how we can fix the cloud providers for everybody — and this includes both the providers who are in tree and the providers who are out of tree. There are two major things we have to do. The first is to make a way for all of the cloud providers to be out of tree, and once you have accomplished that and set that stage, then you can start moving the cloud providers out of tree. Then everybody has their own provider, you have experts maintaining that code, and you've achieved that level playing field.

So we started with the supported in-tree providers: the provider code has been officially deprecated, and its dependencies are being moved to staging. Walter has been one of the key people working on this effort and really driving it forward, because he has such strong expertise in the Kubernetes code base and in how to disentangle all of those tightly coupled resources into what we call the Kubernetes staging subdirectory, where we can say that these are APIs that external providers depend upon and they won't change. With the in-tree providers moved to staging after the upcoming release, they will have a path to being removed in the following release.
So our goal is that by the end of 2019 there will be no in-tree provider code; all of that will be gone. If you are running a cloud provider and you have code that's in tree — I think OpenStack is a good example of this, because you're more likely to be running your own OpenStack — you should be thinking about migrating from what ships with Kubernetes to what ships from the individual provider. And finally, any unmaintained provider code is going to be removed entirely. Kubernetes just had its release last week, I guess, and we should be removing that dead code during this release cycle.

Okay, so for the out-of-tree providers we introduce the cloud controller manager. The cloud controller manager takes over the cloud-dependent controller loops from the kube-controller-manager and is a daemon that embeds the following cloud-specific controllers: the node controller, the route controller, and the service controller. Now, if you're familiar with the Kubernetes code base, you might be thinking, where does my volume controller go? To help explain that, I'll take over here, because I've been active in SIG Storage.

For exactly the same reasons it made sense to get the cloud provider abstraction layer — down to the actual compute infrastructure — out of tree, the same logic applied to storage. There were thirty-plus storage plugins in Kubernetes, even more than that if you counted the Flex drivers, and they were tightly coupled and on the same release cycle as Kubernetes. More and more storage solutions were trying to jump in the pool, and it became probably even more unmanageable than it was with cloud providers once it got up to thirty-some, any one of which could have bugs holding up a release. Because of this, Kubernetes moved to an architecture called the Container Storage Interface. Unlike the cloud provider, the Container Storage Interface is cross-container-orchestrator: it's an industry standard that supports not just Kubernetes but Apache Mesos, Cloud Foundry, Docker, and potentially other orchestrators. It starts with a vendor doing an implementation of a CSI plugin, and then Kubernetes has a new interface to use that external storage plugin.

Now, it turns out this is coupled to the external cloud provider, because I believe it's fair to say that everyone maintaining a cloud provider decided that, when we transition the cloud provider out of tree, we don't want to engage in an ugly test matrix of testing the external cloud provider with internal storage and again with external storage, ending up with a grid of things you have to support. So a decision got made by everybody — I don't know if there was ever an official statement, but practically speaking — that if you go to an external cloud provider, you're going to have to go to an external, CSI-based storage plugin as well.

For more on that, we're not going to cover it here, because SIG Storage owns it, and they've had sessions recorded at past KubeCons, as recently as a month ago, so go look at the videos and slide decks for those to learn more. But it is worth mentioning that if you are looking for a CSI driver — and you will — there's a good chance you will find it alongside the cloud provider code for a particular vendor, so keep in mind that the storage drivers often wind up in the same place.
Yeah, especially if the CSI driver is specific to the cloud provider. CSI gets interesting because there are two flavors: there are going to be things like Amazon's EBS CSI driver, and then something like a plain local file system driver, which is also CSI, and those will be found in different places. In an on-prem scenario, typically if you're running on a hypervisor it will have a bundled CSI driver, but you're really free to use a bunch of others that have nothing to do with that hypervisor. It is entirely possible to use many CSI plugins at the same time, and there are use cases where that's a smart thing to do. It all speaks to the reason why CSI wasn't included directly within the cloud provider.

I'll put in one more tidbit on migration, when you go from in tree to out of tree. Right now, on the storage side, the migration story isn't necessarily fully baked. The long-term plan is to replace the in-tree storage plugins, which are all still in there, with stub code that calls out to the CSI implementation, but that hasn't been done yet. So be advised that this is a consideration if you're engaged in a migration: if you've already got previously created, working persistent volumes, it may or may not be straightforward to migrate them from in tree to out of tree. I think it's fair to say it's potentially plugin-dependent, and you should look to the authors of that plugin for advice on how to handle it. Back to you, Chris.

Okay, so we're going to spend a bit more of this talk discussing the nuts and bolts of how you build your own cloud provider, because this is important if you're providing your own cloud service or you want to be involved in the development of someone else's cloud provider. The first step is that you need to write a package that satisfies the cloud provider interface. The next step is to create a copy of the cloud controller manager's main.go and import your package: you're taking the package that implements the interface and wrapping it inside an executable that is common amongst all the cloud providers, and it has an init block that loads your provider code. And then finally you go through the work of publishing, which means building it, testing that your code works through both integration and functional testing, packaging it inside a container so that you can launch it as an application, and maintaining it and supporting the customers who will be consuming this cloud provider.

At a high level, to build the out-of-tree provider you need to implement the cloud provider interface. There are several interfaces underneath it, and you may not actually implement all of them: if your cloud doesn't provide one of these, each one has a way to say that you don't support it, and the cloud controller manager will ignore it. This includes the LoadBalancer interface, so cloud-specific service load balancing is covered by this; Instances, which is cloud-specific information about nodes within your cluster; Zones, which is cloud-specific information about host availability zones, so if you have a cloud that spans multiple zones this gives you a way to handle clusters that are spread across those zones; Clusters, which is cloud-specific information about the running clusters, if you have multiple clusters running; and finally Routes, cloud-specific information about networking. This is the actual provider code you have to implement: there's an initializer, and then for each of these interfaces a method that returns whether it is supported, along with an object that is an implementation of that particular interface.
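To make that concrete, here is a minimal sketch of what such a provider package can look like, assuming a hypothetical provider called "mycloud". The interface and the RegisterCloudProvider factory hook come from the k8s.io/cloud-provider library; exact signatures shift a little between Kubernetes releases, so treat this as illustrative rather than copy-paste ready.

```go
// Package mycloud is a hypothetical out-of-tree cloud provider used purely
// for illustration; replace the names with your own.
package mycloud

import (
	"io"

	cloudprovider "k8s.io/cloud-provider"
)

const providerName = "mycloud"

// cloud would hold whatever clients and configuration your cloud's APIs need.
type cloud struct{}

func init() {
	// Register a factory so a cloud-controller-manager binary that imports
	// this package can construct the provider by name via --cloud-provider.
	cloudprovider.RegisterCloudProvider(providerName, func(config io.Reader) (cloudprovider.Interface, error) {
		// config is the file passed via --cloud-config; parse it here.
		return &cloud{}, nil
	})
}

// Initialize is called once the controller manager has a client builder.
func (c *cloud) Initialize(clientBuilder cloudprovider.ControllerClientBuilder, stop <-chan struct{}) {}

func (c *cloud) ProviderName() string { return providerName }
func (c *cloud) HasClusterID() bool   { return true }

// Each getter returns an implementation plus a bool saying whether the cloud
// supports that interface; returning false makes the CCM skip that controller.
func (c *cloud) LoadBalancer() (cloudprovider.LoadBalancer, bool) { return nil, false }
func (c *cloud) Instances() (cloudprovider.Instances, bool)       { return nil, false }
func (c *cloud) Zones() (cloudprovider.Zones, bool)               { return nil, false }
func (c *cloud) Clusters() (cloudprovider.Clusters, bool)         { return nil, false }
func (c *cloud) Routes() (cloudprovider.Routes, bool)             { return nil, false }
```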
Okay, so once you've implemented that provider, you need to build your cloud controller manager binary. The external cloud controller manager runs as a separate binary that interacts with the Kubernetes API server. It's configured with a standard set of options, and these can be extended to match the requirements of your cloud. A starting template is provided in the Kubernetes cloud-controller-manager directory; we have a link to that right there. On the next slide we also have — I've cut a little bit of the code out because there are a lot of comments — basically the gist of everything you need to create your binary (a trimmed-down sketch of it appears after this section). You bring in a number of imports, including your cloud provider, as well as some optional ones: the Prometheus client-go plugins and the imports for version metric registration. Then you basically have an initializer: you set up your cloud controller manager command, initialize the logs, and execute the command, and all of your command-line options are pulled in and set up beforehand in the cloud controller manager app package. So it's a great way to get started. You will likely want your own configuration options in there, and you might want to pull some of the existing configurations out to simplify things for your users.

Once it's built, you want to have two types of tests at minimum. The first is unit tests, with all the appropriate mocks, to guarantee your implementation behaves as expected functionally. But also, importantly, when you load the cloud controller manager, Kubernetes' behavior changes to use all the implementations you have provided, so you want to make sure Kubernetes is behaving exactly the way it's expected to behave. You accomplish this by running the standard end-to-end tests, with the provider enabled, on an instance of your cloud, for full integration and conformance testing.

Then, once you've done this, you want to enable release gating against your cloud. Both kinds of tests have to be implemented, you need to run the end-to-end tests, and the results have to be reported to Testgrid. If you've done this, and you can demonstrate that your particular cloud has implemented these tests and also has a critical mass of users, then these tests can become gating for the Kubernetes release, and that gives you Kubernetes certified status. And the key is that even though the providers are moved out of tree, so they're not part of Kubernetes, it's possible that somebody made a change to Kubernetes itself that broke it when running on your cloud, even if your cloud provider is fully compliant and passes all your own tests — and we want to know about that.
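Coming back to the binary for a second, the template in the Kubernetes cmd/cloud-controller-manager directory boils down to roughly the following. This is a hedged sketch: the provider import path is a made-up placeholder, and the exact upstream import paths vary between releases, so check the template that matches the Kubernetes version you vendor.

```go
package main

import (
	"math/rand"
	"os"
	"time"

	"k8s.io/component-base/logs"
	"k8s.io/kubernetes/cmd/cloud-controller-manager/app"

	// Blank-import your provider package for its side effects: its init()
	// registers the provider with k8s.io/cloud-provider under its name.
	// (The upstream template also blank-imports the Prometheus client-go
	// plugins and the version metric registration at this point.)
	_ "example.com/mycloud" // hypothetical module path for the package above
)

func main() {
	rand.Seed(int64(time.Now().Nanosecond()))

	// The app package wires up all of the standard command-line options,
	// so this scaffolding is shared by every out-of-tree provider.
	command := app.NewCloudControllerManagerCommand()

	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}
```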
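On the unit-test point, even something as small as the following — written against the hypothetical mycloud package sketched earlier — catches wiring mistakes before you move on to the real end-to-end and conformance suites:

```go
package mycloud

import "testing"

// With nothing wired up yet, every optional interface should report
// "not supported" so the cloud-controller-manager skips that controller.
func TestUnimplementedInterfacesAreDisabled(t *testing.T) {
	c := &cloud{}
	if _, supported := c.LoadBalancer(); supported {
		t.Fatal("LoadBalancer should be reported as unsupported until implemented")
	}
	if _, supported := c.Routes(); supported {
		t.Fatal("Routes should be reported as unsupported until implemented")
	}
	if c.ProviderName() != "mycloud" {
		t.Fatalf("unexpected provider name %q", c.ProviderName())
	}
}
```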
Okay, so finally, once you've built your cloud provider, you want to run it, and to accomplish this there are a few things you have to do. When you start your Kubernetes API server, where previously you would set --cloud-provider to the name of your cloud provider, you instead specify --cloud-provider=external. Then, when you start your CCM binary, you set its --cloud-provider flag to the name of your cloud provider and the --cloud-config flag to the path of your cloud configuration. The similarity in options comes from the scaffolding code. The cost of this is that there are a lot of unneeded options that get pulled over into your cloud provider; there's actually an issue about that which is pretty old by now, but it's still something I think SIG Cloud Provider should be looking at addressing longer term. And one quick note: --cloud-provider=external is a flag that not only the kube-apiserver but also the KCM (the kube-controller-manager) and the kubelet need to have set. They all need to be aware that the cloud integrations are being handled by an external provider in an external binary.

When you're running in production, use a DaemonSet — this is what they were made for — and your cluster's behavior will change in a few ways. Notably, specifying --cloud-provider=external adds the taint node.cloudprovider.kubernetes.io/uninitialized to nodes, with a NoSchedule effect during initialization. The assumption is that you actually can't do work on the cloud until your cloud provider is up, so the taint is set to give the cloud provider time to initialize. Cloud information about the nodes in the cluster will no longer be retrieved using local metadata; instead, the API calls to retrieve node information go through the cloud controller manager. This means that for larger clusters you may want to consider whether your cloud controller manager will hit rate limits, because there's going to be a lot of communication between the Kubernetes API and your cloud API, and the CCM is going to be responsible for almost all of the calls made to your cloud from within the cluster. This is something you'll want to think about, because there have been reports of setups where Kubernetes hits the rate limiting on the cloud, and you'll want to adjust those limits appropriately if you know you're going to be running Kubernetes inside your cloud with that integration in place.

It's also important to set a toleration so that the CCM can bootstrap itself: if you're running it as a DaemonSet, you want it to ignore the taint so that it can be scheduled, brought up, and that process can be started. This is another important aspect of running it as a DaemonSet: it's how you make it so that the cloud controller manager can be scheduled and started, and the entire Kubernetes cluster can then initialize.

Okay, finally, there are some implementation details you want to consider. What library are you going to use to interact with your cloud? Typically you're going to have some sort of library, probably in Go, that interacts with your cloud. And how are you going to handle authentication and authorization?
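As a rough illustration of those two points — the flags handed to the CCM container and the toleration for the uninitialized taint — here is what the relevant slice of a DaemonSet pod spec might look like, expressed with the Go types from k8s.io/api. The image, binary name, and provider name are placeholders, and a real manifest needs a service account, RBAC, and host-networking decisions on top of this.

```go
package deploy

import corev1 "k8s.io/api/core/v1"

// ccmPodSpec sketches the parts of a cloud-controller-manager DaemonSet pod
// spec discussed above; everything named "mycloud" is hypothetical.
func ccmPodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		// Tolerate the taint that --cloud-provider=external puts on new
		// nodes; otherwise the CCM itself could never be scheduled.
		Tolerations: []corev1.Toleration{{
			Key:    "node.cloudprovider.kubernetes.io/uninitialized",
			Value:  "true",
			Effect: corev1.TaintEffectNoSchedule,
		}},
		Containers: []corev1.Container{{
			Name:  "cloud-controller-manager",
			Image: "example.com/mycloud-cloud-controller-manager:v0.1.0",
			Command: []string{
				"/mycloud-cloud-controller-manager",
				"--cloud-provider=mycloud",
				"--cloud-config=/etc/cloud/cloud.conf",
			},
		}},
	}
}
```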
This is actually a problem that goes much further, because you may have a number of different integrations with your cloud, particularly with the new Cluster API, and you want to make sure you abstract how you handle authentication and authorization, which means you'll want to build an external library to handle that. These are the things to think about when you're building your provider: which pieces could possibly be used by other components and will be shared.

Okay, go ahead, one more slide. We say thank you here, because this is the end of the organized presentation, but there's a link for Q&A, how to get involved, and how to find this deck. We'll do Q&A, but I'll leave this slide up because it has the link to the deck.

So, if you want to join SIG Cloud Provider, we have a Slack channel, #sig-cloud-provider, and all of the cloud providers themselves also have their own Slack channels so you can communicate with them. The Google group is kubernetes-sig-cloud-provider; it's a fairly low-volume mailing list, but important communication does happen on it, and you should join the group because a lot of the documents have access gated to members only. There's no particular qualification other than providing an email, but once you join the group it will end those situations where you see a doc link, try to open it, and get permission denied. The other thing is, if you have problems with Slack or the Google group, you can always open an issue on the sig-cloud-provider GitHub repository and start a conversation that way.

I'm one of the chairs of the SIG, along with Jago Macleod and Andrew Sy Kim. But really, one of the great things about SIG Cloud Provider is that there's a tremendous amount of work going on, and even though there are the SIG chairs, when you show up and do work within the SIG, you are seen as a leader within the SIG. There's a tremendous amount of work that can be done, particularly in finalizing the extraction process and then coming up with the commonalities we're going to be sharing across the providers. So if this is work that you're interested in, please show up in the Slack channel, show up on GitHub, talk about the issues you want to work on, and we are more than happy to guide you toward becoming a positive contributor to the SIG. Okay, with that, does anyone have any questions? Sure — can you let them borrow your mic? Of course, we've got another one coming.

[Audience question:] We built our data center on oVirt, the virtualization platform, and I know there is a cloud provider for it. If I want to build a Kubernetes cluster on top of an oVirt cluster, can we use the existing plugins to help us do that? I'd like to know the details of how you schedule the containers onto the oVirt cluster.

I'm not familiar with oVirt. I do represent a hypervisor-based cloud provider, and I think Chris does as well. Is oVirt one of the cloud providers that we haven't found developers for?
Yeah, so oVirt is one of the in-tree providers that is going to be removed. So if you are interested in continuing oVirt development, you'll want to start using some of the things we talked about here to build an external provider for it.

But essentially — I think, if I understand your question, it's how do you schedule the nodes on the cluster — SIG Cloud Provider doesn't worry about how a node is created. The assumption is that there is some mechanism that has created the nodes inside of your cloud, and that you've started the processes inside those nodes that are going to be part of the cluster: you've either designated a node as a control node or as a worker node, and then the cloud provider becomes aware of those and communicates with them.

If I can add to that, the first thing I'd mention is that I would go talk to SIG Cluster Lifecycle, because how nodes come into existence is usually dealt with by Cluster Lifecycle. Once the node exists and has been created in Kubernetes, how you provision that node according to the cloud provider is exactly what SIG Cloud Provider is concerned about, but for getting the node into existence you need to talk to Cluster Lifecycle. Beyond that, if you want to influence the placement of workloads during scheduling, then the zones come into play, if you choose to use them.

Right, well, zones are a little bit higher level. There's actually the node interface, which is able to tell what sort of node you have, what its memory and CPU capacity are, and what sort of workload is on it right now, and that will read data from the cloud API to help determine how workloads get scheduled onto the different nodes.

Right, and for future releases of Kubernetes we're going to need someone — and maybe that would be you — to write an external cloud provider for oVirt. And to be more specific, if you want to not go through too much pain, you may want to think about helping us actually move oVirt into staging, rather than having us just delete it, if it's something you care about. Because once it's deleted, you're going to have to go back through the history to find an older version of it and probably copy that; if you can just keep it going, you may be better off. Right, and moving it into staging will extend the life cycle of that internal provider for at least a little while before you then have to find a home for it. And definitely come to SIG Cloud Provider, because finding homes for things like that is one of the things we do in our meetings.

We should probably have our meeting times up here — they're Thursdays, at — well, you can only tell people one time zone, and I usually say UTC. Anyway, if you join the group you'll see the calendar and learn when the meetings are. They might not be at times friendly for China, but if there are enough people who are interested in a different time that is more accommodating toward the Asia-Pacific time zones, as a community we are always open to the idea.
If there is demand for people to be able to attend the meetings at a time that's good for them, we'd accommodate that, either by changing the meeting time or by adding a tick-tock meeting, so that one time is US and Europe friendly and the alternating time is Asia-Pacific friendly.

I think we have time for at most one more question, because there are people stacking up to use the room for the next session. Anybody got a last one? Otherwise we can hang out in the hall.

Hi, I'm from Alibaba Cloud, and I want to know the future roadmap for the CCM.

For the CCM, or for cloud provider extraction, or for cloud provider in general? I think the big thing we're working on right now is trying to get everyone doing a standardized out-of-tree build with Testgrid integration, and moving Kubernetes toward a kernel model. That particular schedule is currently taking us through the end of 2020. If you look up Andrew Sy Kim and you look for the roadmap, he actually has a very nice breakdown, quarter by quarter, of what each of our goals is for cloud provider extraction and for cloud provider through the end of 2020.

Once we have everyone out, I think there's going to be a lot of looking around and trying to determine what's next. I imagine a lot of that is how we make a better interface for the various things that are undoubtedly going to come up. It may be things like how to support specialized custom hardware, or whether we want to move to more of the CSI-style model for some of the various pieces. We also have this never-ending list of little tweaky things, like how nodes get named and how we can make that consistent across cloud providers, for transitions, for hybrid clouds, that sort of thing; and also things like how you determine a node is dead and what calls need to be made. Nodes have health checks that they make against the kube-apiserver, and when a certain number have been missed, we have to determine: is this node just slightly unhealthy, or was it removed by the underlying cloud? Some of those lifecycle models need to be looked at fairly closely.

Okay, thanks, and I think we're going to have to cut off questions, but I for one am willing to hang out in the hall if anybody got cut off.