Well, hello everybody, and welcome again to another OpenShift Commons Briefing. This time we're kicking off 2018 with a talk on Kubernetes 1.9: the features, functions, and futures. Derek Carr is our guest speaker today, and we have a lot of content, so I'm not going to talk very long. I will mention that we are hosting another OpenShift Commons Gathering in London on January 31st, so go to commons.openshift.org if you want information on that. With no further ado, I'm going to let Derek take it over and drive through all of the goodness of Kubernetes 1.9. So thanks, Derek.

All right, well, thank you, Diane. Hopefully everyone had a refreshing holiday break. What I want to go through today is all the great work that was done just in time for Christmas for Kubernetes 1.9. I'm going to try to give a summary across the entire ecosystem of what got accomplished. For folks who aren't aware, my background on the project is long and storied, but these days I focus a lot on particular areas around the node and resource management, so I'll do my best to answer follow-on questions in areas that might be outside my particular domain, and feel free to ask anything you want further verification on afterwards.

So with that in mind, what's new this time around in Kubernetes 1.9? Let's pull together some of the stats, and I was a bit amazed here, to be honest. This was a shorter release; the fourth-quarter release of Kubernetes is always abridged due to the Thanksgiving and Christmas holidays. What impressed me, looking at the stats across all of the pull requests merged across the entire Kubernetes organization, is that 6,000-plus pull requests is still a lot of pull requests, and in that same period there were approximately 75,000 comments across pull requests and issues. Just as a measurement of community vitality and health, a lot of work is going in over a very short period of time, and a lot of discussion about future work is happening.

The focus of the 1.9 release was largely on stability and targeted graduation of particular features, and I'll give more detail on that as we go through. For folks who may not be aware, the Kubernetes project is subdivided into a set of special interest groups, which we call SIGs, along with some working groups that span SIGs, and most of the discussion and development activity for the overall project comes out of those. By my count, there were about 18 top-level features tracked in the release, produced across all of those SIGs and working groups, and what I'm going to try to do today is give a little color and commentary about particular ones. Generally speaking, everyone across the Kubernetes community recognizes that Kubernetes is becoming a central part of folks' IT operations, and as a result there's a strong need to make sure that stability moving forward is paramount. So there was a lot of focus in the 1.9 release on continuing to fix bugs and ensuring a stable platform moving forward. In addition, there was a bit of a slowdown on the community adding new alpha-level features in favor of graduating things that had been alpha or beta to a stable status, which I think is a sign of overall maturity happening within the project.
And in addition, and this is important to me particularly at Red Hat, where we run very large clusters on our customers' behalf, a lot of the experience we get across the community about running clusters in production is informing what we choose to fix and how we fix it, so that overall we can continue to refine, polish, scale, and improve the supportability of the platform. To structure today's discussion, I'll give a summary overview of what a subset of the 29 SIGs produced in the 1.9 release, with a little detail on why some of that work is valuable.

First, I'm going to kick off with SIG Apps, which graduated all of the workloads API to v1 status. For folks who may not be aware, new API resource types in Kubernetes go through alpha, beta, and then ultimately a v1 promotion state. Typically things start in alpha while they're being iterated and learned upon, move to beta when we think they're starting to mature and we want users to actually take advantage of them, and when they get to v1 we really feel they've reached a rock-solid state: they will not change in a backwards-incompatible fashion moving forward. The key APIs that moved forward were DaemonSet, Deployment, ReplicaSet, and StatefulSet. It took over a year of effort to reach this state, and a lot of work was done to ensure that lessons learned from one workload controller were carried over to every other one, and that there was consistency across those controllers in how they manage resources. The batch workload APIs, for users who might be using Jobs and CronJobs, will have a separate path to v1, but I think it was a big win for the community generally to see the four major workload types for stateless and stateful workloads all graduate to v1.

Something to remember if you're looking to migrate your existing content to the new resource types: there were some changes made to the workload API types as part of their graduation to v1. In particular, for any of the workload controllers, there was behavior in the past where the selector used to target the pods they manage supported defaulting from your pod template. That was removed based on a lot of lessons learned, so in general you now have to give an explicit selector on your workload API type (you can see this in the sketch after this paragraph). In addition, selectors are no longer mutable; the community feels there are better patterns for handling things like canaries, so when you have a workload type defined, whether it's a StatefulSet or a Deployment, the set of pods it manages based on its selector cannot change anymore. And all of the workload API types now support a common upgrade strategy, with rolling update by default. For folks who have a large investment in the platform, backwards compatibility is obviously important; this was roughly a 16-month effort. If you are just starting to take advantage of these workload APIs in the Kubernetes 1.9-plus time frame, we heavily encourage you to use the apps/v1 resource types, but if you have existing resources that you've authored or are managing, the platform will continue to support bidirectional auto-conversion with the older versions for an extended period of time.
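To make those migration notes concrete, here is a minimal sketch of a Deployment written against apps/v1; the name, labels, and image are hypothetical. The thing to notice is the explicit spec.selector, which is now required, must match the pod template labels, and is immutable after creation:

```yaml
apiVersion: apps/v1            # the newly graduated workloads API group
kind: Deployment
metadata:
  name: hello-app              # hypothetical name
spec:
  replicas: 3
  selector:                    # explicit selector is required in apps/v1,
    matchLabels:               # and it cannot be changed after creation
      app: hello-app
  strategy:
    type: RollingUpdate        # the common rolling-update default
  template:
    metadata:
      labels:
        app: hello-app         # must match the selector above
    spec:
      containers:
      - name: hello
        image: example/hello:1.0   # hypothetical image
```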
So that was the major highlight out of SIG Apps. Next, I want to do a deeper dive on some of the great work that came out of SIG API Machinery. If you think about what SIG API Machinery produces, they do a lot of, I don't want to say plumbing, but a lot of the foundational work that supports extensibility of the platform, and one of the key areas that really evolved this release was admission control. For folks who may not be aware: historically, since Kubernetes 1.0, there was the ability to do static admission control, meaning you could write little code snippets that intercepted requests to the API server, and these were used for defaulting and for resource constraints such as quota. In the time between Kube 1.0 and Kube 1.9, the number of admission controllers in the project blossomed, and a lot of common patterns were identified across the ecosystem for the types of things admission controllers do. In Kube 1.9 there was great work to clean up and improve that extensibility story so that, first, users who want to intercept requests to perform some custom action no longer need to get their code merged into the core Kubernetes repository (they can manage it externally), and second, the admission chain flow was cleaned up to address recurring cases where mutating and validating admission control handlers conflicted.

So the good news I want to talk about: mutating and validating webhooks have graduated to beta. Essentially, this means that if you are interested in extending the platform, for example to intercept when a namespace is created so you can do something, or to validate that names conform to particular naming conventions, or to intercept when a pod is created and inject a common sidecar container, these things that used to be really hard (because you had to get code into the core) are now possible using what's called a mutating webhook. You run a small server, and there are examples published by the community on how to do this, that can run as a pod on the cluster; any time a matching request comes into the API server, you get a call-out at the appropriate point in the admission chain, with the opportunity to have your external code mutate the incoming object prior to it being persisted, as well as validate that object to enforce any constraints you need. Once the API server code path starts calling out to external servers prior to persistence, it's important that those call-outs be low-latency and performant, so to support the community's monitoring needs, Prometheus metrics are now collected on the latency of calling out to particular webhooks. And as I said, these webhooks can be hosted outside of the cluster as well as on the cluster, via a pod referred to by a service.

So why is this important? Folks may have heard of projects like Istio, and last I recall, Istio likes to inject a common sidecar container into every pod spec in order for the platform to work. Similarly, the service catalog historically wanted to dynamically inject particular environment variables into pods on creation; there was work in the community around things like pod presets; and naturally, OpenShift is interested in intercepting what happens in Kubernetes to perform its own constraint and validation needs. So the mutating and validating webhooks are really a big ecosystem enabler, because they allow people to intercept creation requests and do something in response without having to touch the core platform.

On the right-hand side here you see a sample webhook configuration (a rough sketch also follows this paragraph). These are stored in the API server like any other resource type. Basically, you set up a client configuration that says, "this is how you contact my external webhook," and then you set up rules that say, "I want to intercept these particular operations on these particular resources." If those rules match an incoming request, the server will call out to your webhook, and you have the opportunity to take some action and say yea or nay on whether the request should proceed. There is a failure policy that lets you control what happens if your webhook admission server is unavailable, so you can fail open or fail closed. Obviously, failing open means that if the API server can't reach your external webhook it just ignores it, which could give somewhat non-deterministic behavior, but generally you have the flexibility to decide what should happen when things can't be reached. I gave a link to a sample admission webhook server that we at OpenShift have worked on, which lets you control reservation of namespace names, and I encourage folks who are interested to do their own exploration after the call. This is also particularly powerful if you're using custom resources: a lot of people use custom resources to drive operator patterns, and many folks wanted to intercept creation of custom resources to perform an action. This now essentially completes that vision, and we look forward to getting a lot of feedback from the community about it.
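For reference, here is a minimal sketch of what such a registration might look like against the beta API; the name, namespace, service, and path are hypothetical, and the CA bundle is elided:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: namespace-policy.example.com     # hypothetical, must be fully qualified
webhooks:
- name: namespace-policy.example.com
  clientConfig:
    service:                   # the webhook server runs as a pod behind this service
      namespace: webhook-system
      name: namespace-policy
      path: /validate
    # caBundle: <base64 CA certificate used to verify the webhook's serving cert>
  rules:                       # intercept namespace creation requests
  - operations: ["CREATE"]
    apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["namespaces"]
  failurePolicy: Fail          # fail closed if the webhook is unreachable
```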
Another great thing that came out of API Machinery, which I think we touched on in our last community call around the 1.8 release, is something called chunking, and it has now graduated to beta. This is of particular importance to me as an operator of very large clusters in our online environments. Previously, many of our controllers and clients, whether you're doing migrations or not, would commonly need to list all the resources of some type. To give an example from some of our online deployments, which commonly have 10,000 namespaces where each namespace has nine secrets: it turns out listing 90,000 secrets is a really painful, slow operation. One of the great things now possible is that when you do a kubectl get of a resource, it uses a standard chunk size by default, fetching the resources in groups of 500, so end users see immediate responses and get a perceived latency improvement, and the server is much more efficient at returning all of those resources in a timely fashion without hitting a timeout. This is one of those internal density and scale improvements that might not get a lot of attention in blog posts but is really critical to running reliable, dense clusters or doing things like migrations, so it's something I definitely want to highlight, and I think it's a big win for the community around reliability.

The last thing I want to talk about from API Machinery is some of the work done around custom resources. As folks are aware, you can register your own resource types that you want to manage in Kubernetes, so if you have your own third-party operator resource type that you want to be able to do CRUD on, you can obviously do that now. What you were not able to do previously was validate your resources prior to persistence, so a lot of people had to do client-side validation, which has its own pros and cons. New in Kubernetes 1.9, and on by default, is the ability, when you declare your custom resource type, to give an optional OpenAPI v3 schema; when your custom resources are then created by end users, they get validated against that schema on create and update calls. As a quick example, on the right-hand side you can see a custom resource definition that has in its spec a new validation clause saying that the spec.version property on this resource must be one of two values, and that spec.replicas must fall within a value range. On the left is an example instance of that custom resource definition, in this case a kind called App, that declares a version field and a replicas field that don't validate. You get a really nice user experience now in Kube 1.9: if a user posts what you see on the left, it gets validated on the API server according to those validation rules, in this case it fails, and you get rather nice validation error messages in response that let users know why something was or was not valid. A rough reconstruction of such a definition follows this paragraph. I think this gives operators who want to extend the platform and enable their own Kube-style controllers a level of power they didn't have previously, and it brings custom resources one step closer to the same experience you see with the native out-of-the-box resources in Kubernetes, so I expect a lot of great work will come out of this as a result.
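Here is a minimal sketch of what a CustomResourceDefinition with a validation schema might look like in 1.9; the group, kind, and the particular allowed values are hypothetical stand-ins for the slide's example:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: apps.example.com          # must be <plural>.<group>
spec:
  group: example.com              # hypothetical group
  version: v1
  scope: Namespaced
  names:
    plural: apps
    kind: App
  validation:
    openAPIV3Schema:              # enforced on create and update calls
      properties:
        spec:
          properties:
            version:
              type: string
              enum: ["v1.0", "v2.0"]   # hypothetical allowed values
            replicas:
              type: integer
              minimum: 1               # hypothetical allowed range
              maximum: 10
```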
Moving on, I want to talk a little bit about what came from SIG Auth, highlighting two particular items. First, there was work around auditing to provide better clarity on timestamps when logging audit events. There was some confusion in the past: when an audit event was recorded, you didn't actually have two timestamps to tell when the request was received versus where it was in the auditing stage of processing that request. Now there are distinct timestamps that let you track those two things clearly, which gives better granularity in your audit logs. The other item I wanted to call out, since it's useful for things like custom resources, is a new feature in RBAC that lets you define cluster roles that union together the rules of other cluster roles. For example, you might create a cluster role called "monitoring" whose aggregation rule says: match any cluster role carrying an aggregate-to-monitoring label set to true. There's a controller in Kubernetes that then finds all cluster roles matching that label and dynamically populates the rules for the monitoring role based on them (a sketch of this follows). This is nice if you want to integrate the out-of-the-box role types, like the default edit and view roles, with any custom resources you create, so I thought that was useful to call out.
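As a minimal sketch of that aggregation pattern (the role names and label key are hypothetical, though they follow the upstream convention):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring
aggregationRule:                 # union in any ClusterRole carrying this label
  clusterRoleSelectors:
  - matchLabels:
      rbac.example.com/aggregate-to-monitoring: "true"
rules: []                        # the controller fills this in dynamically
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-endpoints     # hypothetical contributing role
  labels:
    rbac.example.com/aggregate-to-monitoring: "true"
rules:
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
```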
Next up, out of SIG CLI, is one of those features that I really like. I have a long history with the project, and I believe I originally tried to get this feature in back in 2015; it's taken some time, but I was really happy to see it land: you can now use field selectors in kubectl. Folks may not have been aware, but in the underlying API you could pass field selectors to restrict the set of resources that came back, and this was really common when implementing certain controllers, or even in the kubelet, where you want to find all pods bound to your node. You can now basically reproduce everything you could do in the API via kubectl, using the field selector syntax. For example, if you've ever struggled to find all pods scheduled to a particular node in a one-line command, that's now possible: you just filter pods on spec.nodeName. You can find all pods that were running, or all pods that were not running, or filter events based on their source. Basically, you have a lot more flexibility now in kubectl to select things based on actual field values rather than just labels. Inherently, with field selectors you have to know which field selection clauses are available for a given resource type, but generally speaking this is a really useful win, and it will save people from writing a lot of the jq-style filtering we had seen in the past; a few examples follow this paragraph.
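A few illustrative one-liners; the node name is hypothetical, and which fields are selectable varies by resource type:

```sh
# All pods scheduled to a particular node, across namespaces
kubectl get pods --all-namespaces --field-selector spec.nodeName=node-1.example.com

# All pods that are not currently in the Running phase
kubectl get pods --field-selector status.phase!=Running

# Events filtered by their source component
kubectl get events --field-selector source=kubelet
```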
I want to very quickly go over what was happening across the SIGs that deal with our cloud providers. In the AWS SIG, work was done to support C5 instance types that use NVMe device volumes. In addition, nodes that report EBS volumes stuck in the attaching state are now automatically tainted, and the expectation is that operators will monitor for that taint and remediate as they see appropriate; if you see a node has been tainted because it has volumes stuck attaching, you might, for example, choose to restart that node. On Azure, there was work to improve the load balancer implementation and general stability, and on OpenStack, a number of iterations were done to improve how it integrates with block storage and the load balancer.

On the networking side, a couple of items. Alpha support was added for IPv6. In addition, I believe on the last update call, for Kube 1.8, we talked about alpha support for kube-proxy using IPVS instead of just iptables; that support has now graduated to beta in the Kube 1.9 release, and we're excited to see the outcomes as people start to evaluate it. There were a lot of reported potential benefits that we need to measure in our own dense clusters to see the pros and cons of the change, but generally speaking, IPVS has a lot of potential long-term benefits for improving performance on dense clusters with a large number of services, where writing iptables rules was very slow and even evaluating those chains was slow. It also potentially offers more load-balancing algorithms for how we choose to route, and some improvements around health checks and connection retries.

Moving on to SIG Node, there were, generally speaking, a lot of performance and reliability improvements in Kube 1.9 to make sure the kubelet is more stable at running your workload across the container runtime ecosystem. I think we're starting to see all the work that was done around the Container Runtime Interface in SIG Node come to fruition, so I wanted to highlight the great work done by Red Hat, Intel, and others on CRI-O moving to stable: it passes all of the e2e tests for Kube 1.9, it has integration with Minikube, and we encourage everyone to try it out. The other runtimes in the ecosystem have evolved as well; containerd moved to beta, along with the others listed here. Generally speaking, this is really important to me because this is probably the first release where the idea of being able to plug and play particular container runtimes has truly come to fruition, and now you get to evaluate the runtime you want to run based on performance, stability, and those types of things. In particular, here at Red Hat we will be looking to deploy CRI-O out to our OpenShift Online clusters very shortly. On the debugging side, to make it easier to debug environments when you're using a variety of container runtime choices, there is a new effort called cri-tools, which has improved to let you introspect what's happening on a machine independent of the container runtime.

In the resource management space, a lot of work was done to continue to iterate on and prepare to graduate features we've been working on for a while. For device plugins, work was done to improve the reliability of how the kubelet interacts with them; at this point we still have only a limited set of plugins available in the community, largely focused on the GPU accelerator use case, but if folks are interested in integrating other plugin types, we'd love a contribution. In addition, for workloads with CPU latency-sensitive needs, we continue to iterate on what's called the CPU manager, or pinning policy: the static CPU pinning policy says that if you request a full core, you get an exclusive core, and you'll have that core for the life of your pod. Stability work was done to ensure that works across kubelet restarts, and at this point I think we're in a good state to graduate that work in a future release. And for huge pages, for folks managing large databases or caches, based on experience with what was done previously we eliminated the requirement tying it to the QoS model, and that's another thing we're preparing to graduate to beta in the future. A rough sketch of what a pod using these features might look like follows this paragraph.
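This is a minimal sketch rather than a definitive recipe: the pod name and image are hypothetical, huge pages were still feature-gated at alpha in 1.9, and exclusive cores also require the kubelet to run with the static CPU manager policy:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-cache            # hypothetical name
spec:
  containers:
  - name: cache
    image: example/cache:1.0    # hypothetical image
    resources:
      requests:
        cpu: "2"                # integral CPUs with requests == limits
        memory: 2Gi             # (Guaranteed QoS) get exclusive cores
        hugepages-2Mi: 512Mi    # pre-allocated 2Mi huge pages (feature-gated)
      limits:
        cpu: "2"
        memory: 2Gi
        hugepages-2Mi: 512Mi
```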
On the node side, it's important that you know how your workloads are running, so there were numerous metrics improvements. As folks may be aware, the kubelet embeds a component called cAdvisor, and cAdvisor was extended to add support for accelerator stats: the node can now report the make and model of your GPU, how much memory it has, how much of that memory is in use, and how utilized the GPU is. This is an example of something we see as an important prerequisite to graduating things like device plugins. In addition, ephemeral pod storage is an activity that's been going on for a while in the community; it lets you control how much local disk pods can consume, and we now have monitoring and metrics data reported to say how much local storage is being used. And for folks who integrate with the kubelet summary API for metrics collection: in the past we reported stats that were container-only, but now we also report pod-level usage stats, which let you know very easily, if a pod has multiple containers, how much the pod is consuming in aggregate.

All right, so moving on: there's a resource management working group, which is an effort across SIG Scheduling, SIG Node, and a variety of others, and a particular item I want to highlight out of it is enhancements to quota. This was another one of those things that came out of the observability of running really large clusters, and the unique challenges you run into when you want to preserve the amount of etcd space that's used. The major improvement that came to quota in this release is that you can now do object-count quota on all standard namespaced resource types: there's a syntax for this where you just say "count", the resource name, and the group it's in. In addition, you can now also quota huge pages, which was another preparatory work item to support graduating that feature to beta in a future release. As a quick example, if you want to control the number of pods a user can consume, and in addition the number of jobs they can spawn, the sketch after this paragraph shows the new syntax that lets you quota essentially any standard resource type in Kubernetes. In the future we also expect to be able to quota custom resource types, though it did not get to that point yet.
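A minimal sketch of the new count/ syntax; the namespace and the particular limits are hypothetical:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
  namespace: dev-team               # hypothetical namespace
spec:
  hard:
    count/pods: "50"                # core-group resources: count/<resource>
    count/jobs.batch: "10"          # grouped resources: count/<resource>.<group>
    count/deployments.apps: "20"
    requests.hugepages-2Mi: 1Gi     # huge pages are now quotable as well
```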
Okay, SIG Scheduling. There were more iterative improvements to pod priority and preemption. New in Kube 1.9, the pod priority feature, which is still alpha, now respects pod disruption budgets, and it integrates properly with the kubelet's eviction logic. For folks who may not have been aware, pod priority is a mechanism that lets you associate a priority integer with each pod; pods with higher priority are given better guarantees toward scheduling, and if a higher-priority pod can't be scheduled, the scheduler will preempt pods with lower priority to ensure there's a fit. This introduced some unique challenges in integrating with how the kubelet itself chooses to evict pods, which in the past was always triggered when pods were using more than they asked for and resources were scarce. The new logic is basically this: you continue to be in danger if you use more than you requested, but assuming there are no pods using more than requested, the kubelet will break ties with priority and then work against whoever is the largest consumer of the resource relative to its request. In addition, some interesting work that team members here at Red Hat have been doing: there's a new scheduler priority function, I believe at alpha, for the case where a pod has both a CPU request and a CPU limit. Until now, the scheduler only satisfied resource requests and didn't really care what your limit was, and we had gotten a lot of feedback that users wanted to prefer nodes where the pod's limit could also be satisfied, so the pod could reach its maximum burst. This new priority function is a useful tiebreaker: the scheduler will prefer to land your pod on a node that can satisfy both your request and your maximum burst limit. A sketch of the priority API follows this paragraph.
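A minimal sketch of the alpha priority API; the class name, value, and image are hypothetical, and in 1.9 this sat behind the alpha PodPriority feature gate:

```yaml
apiVersion: scheduling.k8s.io/v1alpha1
kind: PriorityClass
metadata:
  name: high-priority          # hypothetical class name
value: 1000000                 # higher values schedule (and preempt) first
globalDefault: false
description: "For latency-sensitive services"
---
apiVersion: v1
kind: Pod
metadata:
  name: important-app
spec:
  priorityClassName: high-priority   # resolved to the integer value above
  containers:
  - name: app
    image: example/app:1.0           # hypothetical image
```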
One other thing I wanted to call out from SIG Scheduling is some work going on around various incubator projects. One of those is the descheduler, which basically looks at the existing set of pods that have been scheduled across your cluster and performs, for better or worse, a defrag: it sees if there's a better home for a pod now and, if so, looks to move it. That's incubator work that's continuing to evolve in the community.

In SIG Storage, there are a few items I'll call out, most of them alpha. First, for folks who are aware, when Kubernetes wanted to support new volume plugins, you always had to get code into core Kubernetes. It's a similar problem to the admission control one I talked about previously, and it's a hindrance to broadening the ecosystem for a few reasons: one, it forces your integration to be open source, which some folks had trouble with, and two, it's just hard to get your code into Kube sometimes. So there was a great effort around something called the Container Storage Interface, which defined a common API pattern across multiple container orchestrators; it was an effort across the Kubernetes community, the Mesos community, Cloud Foundry, and Docker Swarm. A new volume plugin was written for Kubernetes core, currently in alpha, that knows how to interface against the Container Storage Interface definition, and in the long term this will allow volume plugins to be deployed containerized on the cluster without needing to be in core Kubernetes itself. In addition, alpha support for raw block devices was added; there's one implementation in the community today, and I expect that to grow in the future. And finally, I think we talked in Kube 1.8 about initial support for resizing your provisioned volumes; that resize support got extended to additional volume types, so new in Kube 1.9 you can resize your GCE persistent disks, your Ceph disks, your AWS EBS volumes, and your Cinder-backed persistent volume claims. Based on the experience of that growing across multiple storage volume types, I expect it will be set up to go to beta in a future release.

The last item I want to highlight is some work that came out of SIG Windows. New in Kube 1.9, the kubelet and kube-proxy can run on a Windows Server 2016-plus release, and what this lets you do is have Windows-based nodes in your cluster; your control plane components still run on Linux. Work was done in SIG Windows to further improve the support for running pods on Windows nodes, and a number of those improvements are listed here, but at this point SIG Windows and the broader community want everyone to evaluate its usage and provide feedback on how to iterate further. It's a heartening sign to see the set of workload types supported on Kubernetes continue to grow, not just on Linux itself but across the broader operating system ecosystem.

So that's Kube 1.9 in a nutshell. Let's look forward a little bit to Kube 1.10, and then we can take Q&A. Kube 1.10 is very early: after Kube 1.9 went out the door, as you can imagine, everyone took a very well-deserved long vacation, and Kube 1.10 planning is just starting across a variety of our SIGs. What I wanted to highlight here are a couple of items that I personally hope will continue to get attention, though this is early days and obviously subject to change. I expect that stability and bug fixes will continue to be a recurring theme. We talked about everything becoming extensible across the Kubernetes platform, whether that's container runtimes, API machinery extension hooks, custom resources, storage volume types, or device plugins, and along that theme of not needing to get something into core Kube to take advantage of it, I expect all of those extensibility vectors to continue to evolve. As more and more clusters run at greater and greater densities, we should continue to see scaling improvements. Of the items listed here that are of interest to me: as I noted earlier, the descheduler is an incubator project being worked on in Kubernetes today that will probably be looking for more real-world feedback post-1.10, and I'd love to see the priority and preemption features graduate to beta, along with a number of the other things discussed previously. And for folks on the call, if there are particular features you would love to see Kubernetes start to explore, the community is very open to input; I would encourage you to attend the appropriate SIG and let its leadership know what you have in mind, because that ultimately makes the project better for everyone. So with that in mind, I will open the floor for any questions.

All right, folks. Thanks, Derek. That was, in a nutshell, a lot of work that went on to get Kubernetes 1.9 out the door, and yes, everyone deserved that wonderful vacation break; hopefully you're all back and reinvigorated for 1.10. If you have questions, ask them in the chat. I'm not seeing any questions, which probably means you've stunned everyone into silence with all of those features, but you can reach us on the Slack channel or in the Kubernetes community channels and get answers there. As Derek pointed out, he's derekwaynecarr on Twitter, and you probably know him by that handle on GitHub as well, so please feel free to reach out to him, or ask questions on the OpenShift Commons mailing lists. If you're not on that list yet, send an email to me or tweet at the @openshiftcommons Twitter handle and I will get you set up. It's going to be an interesting year, 2018, with lots of good stuff coming down the pike for Kubernetes and all of the ancillary upstream projects related to it, so if there are topics you're interested in hearing about, whether Kubernetes-related, upstream, or other workload stuff, please let me know and I'll be happy to organize, recruit speakers, and get folks the information they need as quickly as possible. Again, I'm not seeing any questions, Derek, which means you've probably done an awesome job here, or stunned everybody, and I really thank you for taking the time today. It's a rather large audience, so we're pleased with that. Thank you all. The slides and the video should be up by Monday morning on blog.openshift.com, and we'll look forward to hearing more from everybody. So thanks again, Derek. Thank you, Diane.