All right, I think we can start now. Welcome everyone, and thank you for joining us today. This is a CNCF webinar: What's New in Kubernetes 1.17. My name is David McKay, and I was the lead of the communications team for this release. I'd like to welcome our enhancements lead, Bob Killen, and our release lead, Guinevere Saenger. Thank you both very much for joining me today.

Before we get started, we have a few housekeeping items. First, there is no speaking during this webinar, so please use the Q&A box at the bottom of your screen. We will try to get to as many of those questions as we can throughout the webinar, and cover anything left over at the end, so ask them as we go and we'll do our best to get them answered. This is an official webinar of the CNCF, so we are subject to the CNCF code of conduct. Please do not add anything to the Q&A or the chat box that would be in violation of that code of conduct, and please be respectful of each other and of the panelists. With that, I'll hand over to Guin to kick this off.

Hi everyone. Good morning, good day, good afternoon, wherever you are. May I introduce you to Kubernetes 1.17, with the subtitle of "the chillest release," because we only have one holiday-season release a year and Q4 is it. To go with that, we have our fantastic capybara mascot, one of the chillest animals in existence. I got the fantastic Alison Dowdney and her partner Tyler to do a really cool logo for us. She is extremely talented, and I'm so excited she made this artwork for us. Can we do the next slide? All right, David, did you want to go over the agenda?

Of course. We've each selected one feature that we're really happy with in the 1.17 release. We're going to talk about the stability changes that have happened in this release, moving on to snapshot and restore volume support, and ending with topology-aware routing. After that, we will travel through all the other SIG updates in the 1.17 release, followed by the Q&A at the end.

Okay, this is where I'll kick in and take over for a little bit. For this release, we had 22 total enhancements. That's much less than we've had in previous releases, but we see this cyclical nature where, towards the end of the year, fewer and fewer enhancements make it in, especially with the holidays and the shorter cycle. They make up for it at the beginning of the year, and we'll probably have significantly more for the 1.18 release. Of those, we had 14 that were going to stable, or GA, and most of those are smaller features that help the project as a whole rather than huge headline features. Four were graduating to beta, and four were new as alpha.

Yeah, so I think I'm going to address one of the biggest themes of this release: a stability release. Really? Yes, really. In fact, this is something that SIG Release has been tossing around for quite a while: given that a fair number of folks are concerned with the overall stability of the project and the quickness with which it is moving, it might actually make sense to have one release a year dedicated to just making sure that the project is stable and that all features are moving forward, becoming more usable and more stable.
But it never really quite became reality, and that has sort of been true for this release as well; it just shook out naturally that, as Bob already mentioned, most of our features are graduating either from alpha to beta or from beta to stable. We are hoping that this will turn into a real, formal stability release going forward, and that's generally the direction we're hoping this will take.

So of the 22, 14 went to stable. We'll dive right in and cover two of them here in a sec, but for a quick rundown, the ones that were promoted were: taint nodes by condition; configurable pod process namespace sharing; schedule DaemonSet pods by the kube-scheduler; dynamic maximum volume count; CSI topology support; environment variable expansion in subPath mounts; defaulting of custom resources (if you were here for the 1.16 webinar, that might look familiar: it's the second half of one that went to stable in the last release); moving the frequent Kubelet heartbeats to the new Lease API; breaking apart the release test tarball; watch bookmark support; behavior-driven conformance testing; finalizer protection for service load balancers; and one important little thing, avoiding serializing the same object independently for every watcher.

With that, I will dive into our first big featured enhancement: snapshot and restore volume support. There's a report that my audio is breaking up; can you hear me okay? Okay, cool. This is a beta feature. It was actually introduced as alpha in Kubernetes 1.12, and during alpha it got rewritten from the ground up twice. It's now moving to beta, so many more people will get a chance to actually use it.

Kubernetes itself has proven to be a great abstraction for describing workloads programmatically. However, things have been more difficult with the array of options for managing stateful data. Many of the storage drivers out there provide some level of snapshot and/or backup mechanism; when you're working in AWS or Google or something like that, you can take a snapshot of a volume and possibly restore it later. But there hasn't been any integration with Kubernetes itself, so you had to manage that sort of thing out of band. What this enhancement does is bring those primitives into Kubernetes, so you no longer have to go to another console or script that backup and restore externally.

Now, just to make something clear: you can't use this directly out of the box with a 1.17 cluster. It requires a bit more plumbing and the installation of the external snapshot controller. This is similar to how you would install any CSI-based storage driver, and you can probably expect a lot of the cloud providers to just do this automatically for you. The external snapshot controller adds a few new CRDs, but the two big ones are VolumeSnapshotClass and VolumeSnapshot. The VolumeSnapshotClass is similar to a StorageClass: it defines which CSI driver is used, how the snapshots are made, and their retention policy. The VolumeSnapshot itself is then an instance of the VolumeSnapshotClass for a provided persistent volume. This might be a lot to take in, but if this is something that interests you, I highly encourage you to check out the blog post linked at the bottom. It came out on December 9th, I think, and it does a really good job of summarizing all the functionality and how to use it yourself.
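To make that a bit more concrete, here is a minimal sketch of what those objects look like under the beta API (snapshot.storage.k8s.io/v1beta1), along with a restore. The driver, class, and claim names are hypothetical placeholders; check your CSI driver's documentation for the real values.

```yaml
# Sketch only: names and the CSI driver below are hypothetical.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: example-snapclass
driver: csi.example.com        # hypothetical CSI driver
deletionPolicy: Delete         # retention policy for the underlying snapshot
---
# A snapshot taken from an existing claim.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: example-snapshot
spec:
  volumeSnapshotClassName: example-snapclass
  source:
    persistentVolumeClaimName: example-pvc   # hypothetical existing PVC
---
# Restore: a new claim that uses the snapshot as its data source.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc-restored
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: example-sc               # hypothetical storage class
  resources:
    requests:
      storage: 10Gi
  dataSource:
    name: example-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```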
Next is topology-aware routing of services, and this is an alpha feature, which means it must be manually enabled if you want to use it. Currently, when you use a Kubernetes service, it will map to random pods within the cluster, and this isn't ideal for a number of reasons. Let's say you have a pod serving that service right next to another pod that's requesting it: why go out to another host when you don't necessarily need to? What topology-aware routing of services does is allow you to specify a list of preferences via a new field, topologyKeys. So you can say: if there's an endpoint on the same host, go to that one; if not, prefer one in the same zone; if not in the same zone, prefer one in the same region; and if none of those match, go to any of them. That should minimize the latency and the randomness you can otherwise see when traffic hits a service.

If you don't mind me interjecting there, Bob, we do have a question on the routing. Mark is asking whether the traffic goes through another layer before it hits the service. As far as I know, it just changes the iptables rules and the order in which they're hit. I don't know if that directly answers your question. We can follow up after in the chat, that's fine, thank you.

With that, I will kick over to the SIG updates, and we will start with API Machinery. Touching on what was said before, this enhancement was really two separate ones, pruning and defaulting. Pruning was promoted to GA, and defaulting was promoted to beta, in the 1.16 release. This wraps it up by bringing defaulting to stable in the 1.17 release. CRDs, or custom resource definitions, are user-created extensions of Kubernetes; you end up registering them as their own API objects, and they have their own API version and kind. Before defaulting was possible, users had to provide a value for every field in the custom resource they were creating, or the developer had to bake in extra logic by means of a mutating admission webhook to provide defaults. Managing that became quite a bit more complicated once you start thinking about handling the various upgrade and downgrade conditions where you might add or remove a field. Now, with defaulting baked in, this issue is mostly gone, and you can easily provide defaults in your own CRD spec. The big thing is that to do that, you must use the new API version, apiextensions.k8s.io/v1. You can follow the link once this presentation is published to read a bit more about it. Because defaulting is such a big thing, I honestly expect that 1.17 will become a minimum requirement for a lot of CRDs and applications going forward.
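As a quick illustration, here is a minimal sketch of a CRD using the new apiextensions.k8s.io/v1 API with a defaulted field; the group, kind, and replicas field are all hypothetical.

```yaml
# Sketch only: the group, names, and schema are hypothetical.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                  default: 1    # applied automatically when the field is omitted
```

With that in place, a Widget created without spec.replicas is persisted with replicas set to 1, no admission webhook required.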
This one and the next one sort of work together, and they aren't really user-facing: watch bookmarks, which improve the performance of the kube-apiserver. When a client initiates a watch (and by client I mean a programmatic client, something like client-go or one of the Python libraries), it's watching a set of objects to get notified when something changes. It gets a list with a resource version number, and that maps to a set of changes, or diffs, from the previous revision. If the client happens to disconnect and tries to re-establish that watch, the kube-apiserver and the client would have to play back all the resource versions to get back to where it was. This caused unnecessary load on the server, especially since you might have a whole slew of things watching those objects; it can cascade into a bit of a nightmare. Now the server can send a bookmark marking where the client is, and when the client re-establishes the connection, it doesn't have to go back and replay all those changes; it can just pick up where it left off.

Next, less object serialization. If you have multiple watchers watching the same set of things, previously the kube-apiserver would have to serialize the object separately for each client. Now there's a bit of caching in play: if multiple watchers are watching the same thing and would be notified of the same updates, the serialized object is cached. We saw significant problems in the scalability tests, where we'd go up to 5,000 nodes and this would start to cause performance issues. After these changes, there's a general 5% improvement in CPU usage and 15% less memory used in the scenario where multiple watchers are watching the same thing.

We'll kick off to Architecture now, with behavior-driven conformance testing. This is a non-traditional enhancement; it's part of a larger plan where we've agreed to tackle better conformance testing project-wide. Essentially, this defines how our conformance tests should be built and documented. Right now there isn't a single, explicit list of behaviors or source of truth out there: these are scattered amongst design docs, enhancement proposals, user docs, a subset of the e2e tests, and the code itself. That makes it near impossible to identify whether the conformance suite provides a meaningful test of a cluster's operation. This KEP went right to stable because it was merely deciding the plan for how this is going to be tackled. So if conformance is something that interests you, I would highly recommend following the links and reaching out to the teams working on this under SIG Architecture; any help is greatly appreciated.

This last one is the removal of the project-wide usage of the node role labels, and this might impact people. This was sort of an accident over time: the node role labels (anything under the node-role.kubernetes.io namespace) were not intended for widespread use by the project itself, but several things started referencing them. They were introduced by kubeadm to help it manage the provisioning and lifecycle of kubeadm-provisioned nodes, and they were not intended for use beyond that. However, there are many tools and provisioners that are not kubeadm-based, which means nodes can be missing those labels, and certain things will have issues if they try to reference them. And certain parts of the project itself were referencing them; if I recall correctly, it was the service load balancer logic and a few e2e tests.
But because they were referencing those labels, they wouldn't function in a conforming cluster that wasn't provisioned by kubeadm. So it's been decided that these labels will stop being used project-wide. They may continue to be used by kubeadm, but they may not be used for any conformance-related activities. This KEP simply outlines the plan to start deprecating them and removing them from the other places in the project.

We kick over to Cloud Provider with another fun label thing. For quite some time now, we've had the beta labels used by the cloud providers to signify what instance type a node is and what zone or region it's in, and it's time to bring those to GA. As you can see in the list here, failure-domain.beta.kubernetes.io/zone will become topology.kubernetes.io/zone, and so on for the other ones. The general deprecation plan is that both labels will be applied to nodes through the 1.20 release. In 1.21, the old labels will stop being applied or referenced, but they will not be removed from objects that already have them. If you are relying on these labels, I would encourage you to check out this KEP and think about updating anything you have that might reference them. That was the only one for Cloud Provider this time.

We'll go right to Cluster Lifecycle. This is a really short one, honestly: structured output from kubeadm. As kubeadm is becoming the underlying substrate, the underlying tool, for many other things, it'd be really nice to have machine-consumable logs and output that can be bubbled up or parsed easily. So this adds the ability for kubeadm to emit output in JSON instead of just unbuffered text.

Now it's time for some of the fun stuff in Network. We're back to topology-aware routing of services. I covered this in a bit more detail earlier, but essentially you can set a predefined list of preferences via the topologyKeys field in the service definition, and that becomes the preferred order in which traffic to the service is routed to endpoints (there's a quick sketch of this after the dual-stack update below).

Next is an interesting one. IPv4/IPv6 dual-stack support graduated to alpha in the 1.16 release; however, since there has been a significant amount of effort across many parts of the project, we are continuing to track it in this release. Any time there are very large changes like this, where tests and other things that impact other parts of the project might be updated, we opt to track it. It just means it's not graduating, but we track it as an enhancement. Some of the major changes that happened with the IPv4/IPv6 dual-stack enhancement: dual-stack support was added to kube-proxy in iptables mode, as well as support for dual stack in the downward API. So now, if you do a reference with the downward API, you'll get both the IPv4 and IPv6 addresses, separated by a comma. There were also some changes to the kube-controller-manager: before, when you specified the node CIDR mask size configuration parameter, there was only one option. Now you can specify it for both IPv4 and IPv6, via --node-cidr-mask-size-ipv4 and --node-cidr-mask-size-ipv6, and those can only be used in dual-stack mode. With all this effort going on, the plan right now is to push dual stack to beta in 1.18, which means clusters will fully support dual-stack mode sort of out of the gate.
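Going back to topology-aware routing for a moment, here is the minimal sketch mentioned above of what that preference list looks like on a service. It assumes the alpha ServiceTopology feature gate is enabled, and the service name and selector are hypothetical.

```yaml
# Sketch only: assumes the alpha ServiceTopology feature gate is on.
apiVersion: v1
kind: Service
metadata:
  name: example-svc
spec:
  selector:
    app: example            # hypothetical pod label
  ports:
    - port: 80
  topologyKeys:
    - "kubernetes.io/hostname"          # prefer an endpoint on the same node
    - "topology.kubernetes.io/zone"     # then one in the same zone
    - "topology.kubernetes.io/region"   # then one in the same region
    - "*"                               # otherwise, any endpoint at all
```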
Next is the new endpoint API, EndpointSlice. You may have heard of this here and there; it's on the schedule all over the place. This is the long-term replacement for the current core v1 Endpoints API. The current API has a lot of performance and scalability problems that impact multiple components of the control plane. The gist of it is: instead of recomputing the entire list of endpoints and notifying all the watchers whenever a single endpoint is updated, endpoints are now broken down into groups, I believe of 100, and only the group containing the updated endpoint gets recomputed and sent out. Small clusters didn't have too many problems with the old behavior, but once you get to very, very large clusters with potentially hundreds of thousands of pods, it becomes a significant performance problem. And you can imagine that, with the potential for pods to soon have two addresses, IPv4 and IPv6, that would be magnified; EndpointSlice has actually been a requirement before IPv4/IPv6 dual-stack mode could be supported.

Another fun one: finalizer protection for service load balancers. A service of type LoadBalancer requires Kubernetes and an external entity, usually a cloud provider, to work together to ensure the proper management of its lifecycle. In the past, there have been a couple of conditions where the Kubernetes service could be deleted before the actual external load balancer was deleted. You can imagine: you deleted the service, but you still have an AWS load balancer provisioned; that's not so great. So now the deletion of the service will block until the external load balancer is removed. Functionally, this enhancement hasn't changed too much since alpha, outside of being flipped on by default; it's just been closely watched to make sure it can soak, and we haven't seen any problems with it.

With that, we'll move on to Node: configurable pod process namespace sharing. The default container behavior (and this is the same as when you're working with Docker locally) is that each container exists and runs in its own process ID namespace, with the entrypoint process serving as PID 1. So when you're in a pod with multiple containers, they can't see each other's processes. There have been certain ways around that in the past, but they've been a little hacky. Now, with shareProcessNamespace enabled, you can remove that boundary and let them share one PID namespace. This might seem like it removes some of the inherent security and isolation mechanisms we see with containers, but it also opens the door to more complex workflows and enables things like attaching a debug container to another container that might contain only a single Go binary. I also know several groups are looking into this as a means for better CI runners: you could have a sidecar container function as the init process and manage the job execution in another container that has the various build dependencies. It also means the sidecar could watch for the main container's process terminating and perform some cleanup actions before fully terminating the pod.
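For a sense of what that looks like, here is a minimal sketch of a pod with a shared PID namespace; the images and commands are hypothetical placeholders.

```yaml
# Sketch only: images and commands are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-shared-pid
spec:
  shareProcessNamespace: true   # all containers share one PID namespace
  containers:
    - name: app
      image: nginx
    - name: sidecar
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      # this container can now see (and signal) the app container's processes
```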
Next: moving the frequent Kubelet heartbeats to the Lease API. This is another backend thing that end users generally won't interact with much, but it serves as a performance boost and removes some of the scalability limits encountered with clusters of more than 2,000 nodes. On an interval, the Kubelet checks in and updates its record with a bunch of information about the current state of the node. This includes things like what images it has, what volumes are mounted, and a slew of other things, and these individual updates can be upwards of 15 kilobytes in size for a single update. When you think about it, that's a lot for a simple node status. Now multiply that by a few thousand nodes checking in every 40 seconds, and you can start to get an idea of the load that puts on both the API server and the backing etcd database. So now, instead of the full update every time, there's a much smaller update, a sort of ready signal, in the form of the node's Lease object, and a full update is sent on a much longer interval, or whenever there's a recognized meaningful change. The biggest thing most end users will actually see is that, if you use kubectl and look at the namespaces, you'll see a new system namespace called kube-node-lease; this is where all those Lease objects live.

So hey, Bob, before we jump into scheduling, we had a question related to the pod process namespace sharing. Sure. Is there any security configuration to enable or disable the sharing of the process ID namespace, or the process table? Give me one sec, my slides are frozen; this was happening earlier. You have to explicitly enable this on a pod to use it. Come on... this is going really slowly, sorry about that. That's all right; I think you've answered that question. Do you think you'd be able to get back to the current slide, or is it properly frozen? I'm trying. Let us know if you need one of us to share and we can advance through the slides for you. Okay, I might need to at least restart the screen share; give me one sec. Should I sing a song? Maybe some intermission music. There we go. Okay.

Now we'll dive into Scheduling. The first one is taint nodes by condition. The Kubernetes scheduler itself doesn't actually check node conditions; these are things like (I forget the exact wording) memory pressure, disk pressure, or out of disk, so they don't directly impact scheduling. Instead, the node lifecycle controller automatically sets NoSchedule or NoExecute taints on the node based on those conditions. This has been in place since 1.12, but now it's graduating to stable, and some other cleanup has been done along the way. The main reason for doing this via taints and tolerations is that a cluster admin can still force a pod to run on a node even under one of those conditions.
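As a rough sketch, a pod like this hypothetical monitoring agent could tolerate the disk-pressure condition taint and still land on an affected node.

```yaml
# Sketch only: the pod name, image, and command are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-node-agent
spec:
  containers:
    - name: agent
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
  tolerations:
    - key: "node.kubernetes.io/disk-pressure"
      operator: "Exists"
      effect: "NoSchedule"   # schedulable even while the node reports disk pressure
```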
Next one: schedule DaemonSet pods by the kube-scheduler. This one isn't a huge change for a lot of users, as it's already in effect, but the history behind it is that before this graduated to beta in 1.12, DaemonSet pods were not scheduled by the Kubernetes scheduler; the DaemonSet controller handled that itself. This caused some weirdness with the scheduler, since spec.nodeName was already present when the DaemonSet pod was created, and being scheduled separately by this other process caused problems where certain things weren't being respected, most notably when nodes are flagged as unschedulable. Now that DaemonSet pods are managed by the kube-scheduler, all those restrictions are handled properly.

Storage is where I think we have four or five. We'll quickly revisit the big one, snapshot and restore volume support: essentially, you'll now be able to manage snapshots and restores from within the cluster. It was covered at the beginning, so I'm not going to spend too much time on it; it's here mostly for completeness' sake, and there are some useful links at the bottom that you can use to go see the docs.

We do have a question, if you're wanting to take that now. Sure. Phil asks: does the snapshot controller support scheduled snapshots? I don't believe so, but you could basically set that up with a CronJob. Perfect.

Next, dynamic maximum volume count. With the various in-tree volume providers, so things like the GCE disks, AWS disks, and a few others, there were some hard-coded scheduling predicates for the maximum number of volumes that could be attached to a node, and these mostly mapped to whatever the limits were for that cloud provider. For example, in AWS you are limited to about 40 EBS volumes per instance, and you wouldn't want Kubernetes to schedule something onto a node that already has 40 disks attached and cause a problem. By moving to a dynamic design, these cluster-wide, hard-coded settings have been deprecated, and the limits are now tied specifically to the CSI driver or volume type associated with that node. So now you can mix and match if you happen to have nodes running different storage drivers. This probably won't matter too much with the cloud providers, but if you're on bare metal and you have, say, Ceph alongside some other drivers, that sort of thing is pretty important.

CSI topology support: similar to the preferences we saw with service topology, you can now have preferences for the placement of storage. You can define labels and their acceptable values for use in volume placement. This is very useful if you want to make sure that your pod is consuming storage within a specific region or zone, rather than an area where you don't have presence.

Next is environment variable expansion in subPath mounts. This was introduced almost a year ago in the 1.14 release and was started to help some legacy workloads where you may need to write to a specific place on the host. The intention is for it to work in combination with the downward API, so that you can use, for example, the pod name within a mount. This came about specifically because, if two or more pods running on the same host happen to reference the same host location and start writing to the same file name, they're going to clash, and you're not going to have a good time. You can work around this by using the new subPathExpr directive, which lets you enforce uniqueness; with the original hard-coded subPath directive, bad things were going to happen.
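Here is a minimal sketch of that pattern, using the downward API to expand the pod name into the mount path; the hostPath location and image are hypothetical.

```yaml
# Sketch only: the hostPath directory and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-logger
spec:
  containers:
    - name: logger
      image: busybox
      command: ["sh", "-c", "echo hello > /logs/out.txt && sleep 3600"]
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # downward API
      volumeMounts:
        - name: shared-logs
          mountPath: /logs
          subPathExpr: $(POD_NAME)       # expands per pod, so pods never clash
  volumes:
    - name: shared-logs
      hostPath:
        path: /var/log/app-logs          # hypothetical shared host directory
```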
This next one has been a very large effort for the project itself: migrating the in-tree storage plugins to CSI. Many of the in-tree volume types were built before CSI, the Container Storage Interface, was implemented. There's been a long-term effort to get those out of tree so that they can be managed separately, but right now we're in a middle ground: they have to be supported both in-tree, as the non-CSI version, and out of tree, as the CSI version, and as you can imagine, there's been a large duplication of effort. This enhancement aims to move the remaining internal logic out of tree while maintaining the original API compatibility, to minimize other code changes. Right now the plan is to make some minimal changes in 1.18 and then push to graduate to stable in 1.19.

The only one from Testing is breaking apart the test tarball. This is another internal-facing KEP. Previously we had one large, monolithic tarball with the testing suites for every platform baked into it; whether you were on AMD64, ARM, or PPC, it didn't matter, they were all in there. Now each is its own separate artifact, and it's improved testing times quite a bit.

Moving on to Windows: RunAsUserName for Windows. In the 1.16 release, we saw two very big Windows improvements: this one, RunAsUserName, and a similar enhancement for gMSA, or Group Managed Service Account, support. Oh, I think it might be freaking out again. The work on gMSA was allowed to soak in the 1.17 release and is expected to pick up again in 1.18. For the most part, RunAsUserName just adds functionality similar to the runAsUser directive we see in Linux containers, only for Windows ones. And as it's very tightly scoped, with minimal code or API changes from when it was built in alpha, it was moved to beta and turned on by default for Windows containers in the 1.17 release.

That's all the enhancements we had for the 1.17 release. I guess with that, we can go over to some of the questions. All right, thank you very much for that, both of you; that was awesome. I think what I'll do is tackle some of the questions from the end that relate to CSI, since we've just covered that. The first question we have is: the CSI allowedTopologies parameter, can it be configured as a default? I believe that it is configured on... let me jump back. Actually, jumping back's a bad idea. Yeah, I think it just broke it again. Yep. So I believe that is set up on the StorageClass, though I'm not 100% certain on that. So anything being provisioned from that StorageClass should end up in the right zone, or at least the preference should be for the right zone or region.
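For reference, here is a rough sketch of how topology restrictions can look on a StorageClass. The provisioner and zone values are hypothetical, and whether a given CSI driver honors the generic zone key (some use driver-specific topology keys) is something to verify against that driver's documentation.

```yaml
# Sketch only: the provisioner and zone values are hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-sc
provisioner: csi.example.com            # hypothetical CSI driver
volumeBindingMode: WaitForFirstConsumer # delay binding until a pod is scheduled
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - us-east-1a                  # hypothetical zones
          - us-east-1b
```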
All right, we have one more with regards to CSI. Is there an end-of-life date for the in-tree storage drivers, and what's going to happen if an upstream doesn't participate in the migration? As it stands, all of them are participating in it, and it shouldn't impact end users at all: essentially, there's a shim being built so that any requests internally will just use CSI behind the scenes. The big thing is that you might need to spin up the CSI driver for your specific cloud provider in addition to what you would normally spin up in a new Kubernetes cluster.

Okay, now we're going all the way back to SIG Network. That's quite a hard question. What I might suggest is that you ask it in the SIG Network channel on Slack, but I will throw it to Bob in case you feel brave: can you elaborate on how the IP rules are changed with regards to the topology-aware routing? When you create a service, you basically have iptables rules on the host that point to the current endpoints for a particular service behind the scenes. What this does is just reorder those endpoint IPs according to the topology preferences. And this is, again, if memory serves me correctly.

All right, next question. As the topology keys seem to be a good default (hostname, then region), can they be enabled by default on each service? You can't do that today. You'd have to use something like a mutating admission webhook, so that any time a service is created, it automatically adds those preferences for you. All right, perfect.

We just have a couple more, and then we'll have to wrap up. Is there any reason behind the deprecation of the node role labels, i.e. node-role.kubernetes.io? I can take that if you want. Yeah, so basically, as Bob mentioned earlier, there isn't really an official definition of "control plane," or of what makes a control plane node, and every cloud provider sort of has their own take on it. So, in order to avoid having internal dependencies on these specific labels, it's basically just a cleanup thing, and to encourage people to create their own labels and their own behaviors.

Okay, perfect, thank you. And we have another question with regards to the cloud provider labels: when will the node name not be dependent on the cloud provider and move to just being labels, unifying the hostname override option of the Kubelet? So, I'm rereading the question... I'm kind of unsure on that one. Yeah, again, I think you can use whatever labels you like as of right now. So again, I think this is mostly so that we don't encourage internal dependencies on something that is really cloud-provider specific.

Okay, thank you for that. And the final question is about ingress. With many projects bringing in new primitives for ingress, such as IngressRoute, is there going to be any unification of these within the next few versions? I don't know, but there is a KEP for that, and they're discussing it. I can see about digging up the link and dropping it in the chat... I feel like I need to do the meme picture. All right, so that can be followed up on later. That's great.

All right, thank you, Bob. Thank you, Guin. That was great. Thank you everyone for joining us today. This webinar was recorded and will be available online with the slides later today, and we look forward to seeing you at a future CNCF webinar. So have a great day. Thank you once again to Bob and Guin for joining us, and I'll see you all later. Thank you, David. Thanks for hosting. Thanks everyone.