Okay, let's get started. Welcome to the session. I hope everyone has enjoyed the day at KubeCon. I'm Yuan Cheng from the Apple Cloud Services team. I've been working on Apple's Kubernetes infrastructure, and I've been an active contributor to the Kubernetes project with a focus on scheduling. Security is something new to me, so I'm still learning.

And yeah, I'm also at Apple, over in the field engineering team, helping work with various customers across the company to make sure they're successful on our Kubernetes platforms, and working with Yuan on this migration.

Okay, here is the agenda of our talk. I'm going to give an introduction and overview of the different service account tokens in Kubernetes, and the parameters and feature gates that can be used to manage and configure them. Then, for all the different kinds of tokens, I'll cover their implications and potential impact on different use cases. I will also talk about how to track and monitor the different service account tokens in Kubernetes. Then James will deep dive into how we can seamlessly upgrade, or transition, from the traditional legacy tokens to the more dynamic, time-bound, and more secure tokens.

So, if we look at the service account token, the API token in Kubernetes: a token is a piece of information that authenticates an application container or pod to the API server. It's very important to secure and manage tokens properly, because tokens are used to grant access to cluster resources to different applications, and any compromise can have significant security implications. Traditionally, as you may know, the legacy token is secret-based. When you create a service account, Kubernetes automatically generates a Secret with the token. This token never expires, so it's insecure. Also, a service account can be shared by multiple different applications, which also makes it insecure.
So it's definitely not recommended, but unfortunately, in a lot of clusters, legacy applications and workloads may still use the old, secret-based, long-lived tokens.

Now, of course, we are moving to the more dynamic, time-limited tokens acquired from the TokenRequest API, which the application or Kubernetes will refresh and reload automatically and periodically. One example is the bound service account token projected into a pod. Finally, long-lived tokens are still supported: if you really need one, you can manually create a Secret and associate it with a service account. But unless it's necessary, long-lived tokens are still not recommended.

So really, the goal of what we've been working on and want to share and discuss with you is: how can we migrate, upgrade, move from the legacy, never-expiring tokens to the more secure, dynamic, time-bound tokens? The key challenge, the important thing, is to make sure we're not going to break current usage or disrupt current workloads.

Now I'm going to dive a little deeper into the different tokens. This is the legacy one: the automatically generated, secret-based token. If you're familiar with the old versions: when you create a service account, Kubernetes automatically generates a Secret with the token. It's not bound to any pod, it never expires, and you can definitely share these tokens across different pods if they use the same service account. As you can see here at the top, the service account has a `secrets` field with the name of, a reference to, the Secret. And it's a bi-directional reference: inside the Secret, there's an annotation that references the service account back.
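To make that bi-directional reference concrete, the pair of objects on an older cluster looked roughly like this (the names are illustrative and the data values are placeholders):

```yaml
# Service account whose `secrets` list points at its auto-generated token Secret
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: default
secrets:
- name: my-app-token-abc12
---
# The auto-generated Secret; its annotation points back at the service account
apiVersion: v1
kind: Secret
metadata:
  name: my-app-token-abc12
  namespace: default
  annotations:
    kubernetes.io/service-account.name: my-app
type: kubernetes.io/service-account-token
data:
  token: <base64-encoded JWT with no expiry>
  ca.crt: <base64-encoded cluster CA bundle>
  namespace: ZGVmYXVsdA==   # "default"
```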
So that's the old way: the never-expiring, secret-based tokens. Of course, now we are moving to time-bound tokens using the TokenRequest API: you request a time-limited token from the TokenRequest API. In this example you can see you get the token, its service account, an expiration time, the issued-at timestamp, and that kind of information. The important thing is that it's also audience-bound, and it can be bound to the particular pod the token is associated with, and it definitely has an expiration time.

One example: since Kubernetes 1.21, the bound service account token changed the behavior. We used to automatically create a secret-based, long-lived service account token, and that Secret was mounted as a volume into the pod. But now, Kubernetes obtains a token from the TokenRequest API and mounts it via a projected volume, with an expiration time that by default is one hour. The kubelet will then automatically refresh and reload it for the pod running with that service account.

But in order to facilitate the transition and upgrade, there is a flag called `--service-account-extend-token-expiration`. By default it's true, and this means that even when the token expires, its validity is automatically extended to one year. Of course, after a certain version in the future, you should switch it to false, so you just have the short, time-limited tokens. So, as you can see here, those are the projected tokens.

Finally, as I mentioned earlier, you can still create long-lived tokens backed by Secrets, but they are no longer created automatically when you create a service account.
Creating a service account won't automatically generate any token or Secret for you; you have to manually create a Secret yourself, adding a special annotation that references the service account. The control plane will then generate a long-lived token into that Secret. This way, if you delete the service account, the Secret and its token are deleted as well. So this is different from the automatic ones, with their bi-directional references, which could be shared across multiple applications.

A very important thing here, and it can be a little bit confusing and sometimes hard to learn, is that there are quite a bunch of feature flags and parameters that control token usage and configuration. Here is a list of the feature gates and configuration flags that can be used to control these tokens. For example, the first one basically... Is something wrong? Oh yeah, the power, okay, I didn't notice that. Thanks to the audience. Okay, thank you.

So, BoundServiceAccountTokenVolume enables or disables the bound service account token volume. `--service-account-extend-token-expiration` set to true means that expired bound tokens will automatically be extended, in case your workload doesn't refresh them. And there's the max token expiration, of course: when you request a new time-bound token from the TokenRequest API, you can specify an expiration time, but the administrator can also set a limit, a maximum expiration time. Also, as I mentioned, even though we are now migrating to and recommending time-bound tokens, if you set the LegacyServiceAccountTokenNoAutoGeneration feature gate to false, Secrets with long-lived service account tokens will still be automatically created. And the last two are about tracking and cleaning up these legacy service account tokens.
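The manually created long-lived token Secret described a moment ago looks roughly like this (names are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-long-lived-token
  namespace: default
  annotations:
    kubernetes.io/service-account.name: my-app   # must name an existing service account
type: kubernetes.io/service-account-token
# No data is supplied; the control plane populates `token`, `ca.crt`, and `namespace`.
```

After applying it, the token can be read back with something like `kubectl get secret my-app-long-lived-token -o jsonpath='{.data.token}'`.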
So the tracking means: how do you track legacy token use? And cleanup, which is still a work in progress, basically means: if a legacy token has not been used for a while, should we clean it up and delete it?

So now let's talk about the impact of all these changes on the different use cases; there's a lot of work going on here. If you look at it, we still have a lot of old systems and legacy applications using the auto-generated, secret-based, long-lived tokens. The good news is that so far, up to Kubernetes 1.30, existing ones still work, and you can still manually create long-lived tokens, even though it's highly not recommended unless it's necessary. But starting around Kubernetes 1.29, support for these is being phased out, so teams should definitely consider migrating now, and plan how to migrate. And after Kubernetes 1.30, there's work in progress, which we'll talk about, on cleaning up the existing legacy tokens. So plan your transition and upgrade accordingly.

The second case: since Kubernetes 1.21 as beta, and 1.22 as GA, we use the projected volume with the bound service account token. Kubernetes refreshes and gets new tokens for the pod automatically from the TokenRequest API. The good news is that most applications should still work; in particular, applications using the standard client libraries will automatically reload the token from disk when anything changes. But of course, if an application for whatever reason uses a very old, outdated library, it may not reload the new tokens. So one thing, as I mentioned earlier, is `--service-account-extend-token-expiration`: we can still use it, set to true.
Now, the default is true, which can extend the token expiration to one year. After 1.26, most likely you want to make sure that pods with the projected volume and bound tokens use a short expiration time, like the default of one hour.

Finally, and James will deep dive into this later: if you have some external system that uses tokens from Secrets, that's going to be challenging. How can you upgrade it? And once you use the dynamic ones, how can you make sure the external system can use these newly generated tokens to access, to talk to, interact with your Kubernetes clusters? Here is a summary of all these different feature gates and how they may apply to different cases. Hopefully you can use this as a reference to see which one is applicable to which case; we find it useful.

Another interesting point: we've talked about a lot of the features, the different cases, and how to upgrade. I think the most important thing is observability: you need to understand the current usage of your tokens. The good news is that, fortunately, upstream, with some PRs we contributed recently to Kubernetes, we can now track the different token usages. First, the API server audit log now records quite a few different events with annotations. For example, even though we extend the token expiration, tokens that are already past their original expiry, the stale tokens, can be found in the audit log. The second type is the legacy tokens automatically generated by the control plane; you can also identify those. The third is the manually created Secrets with long-lived tokens. You can find all of these by looking into the API server audit log; here are some examples of the different annotations in audit logs.
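As a sketch of how you might consume those, here's a small script that scans an audit log (one JSON event per line) for the legacy-token annotations. The annotation key names below are the upstream ones as I understand them; treat them as assumptions and verify against your Kubernetes version:

```python
import json
from collections import Counter

# Audit annotation keys the API server adds for stale/legacy service account
# tokens (assumed key names; check your cluster's version and audit output).
LEGACY_TOKEN_ANNOTATIONS = [
    "authentication.k8s.io/stale-token",
    "authentication.k8s.io/legacy-token-autogenerated-secret",
    "authentication.k8s.io/legacy-token-manual-secret",
]

def count_legacy_token_events(audit_lines):
    """Count audit events per legacy-token annotation and collect the users seen."""
    counts = Counter()
    users = {key: set() for key in LEGACY_TOKEN_ANNOTATIONS}
    for line in audit_lines:
        event = json.loads(line)
        annotations = event.get("annotations", {})
        user = event.get("user", {}).get("username", "<unknown>")
        for key in LEGACY_TOKEN_ANNOTATIONS:
            if key in annotations:
                counts[key] += 1
                users[key].add(user)
    return counts, users

# Two fabricated example events, just to show the shape of the data:
sample = [
    json.dumps({
        "user": {"username": "system:serviceaccount:default:my-app"},
        "annotations": {
            "authentication.k8s.io/stale-token":
                "subject: system:serviceaccount:default:my-app ...",
        },
    }),
    json.dumps({
        "user": {"username": "system:serviceaccount:ci:builder"},
        "annotations": {
            "authentication.k8s.io/legacy-token-autogenerated-secret":
                "builder-token-x8z2k",
        },
    }),
]
counts, users = count_legacy_token_events(sample)
```

Feeding it your real audit log gives you a quick per-user inventory of who is still presenting stale or legacy tokens.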
And second, there are now also a bunch of metrics you can use to see how many stale tokens, legacy tokens, or other types of tokens are still being used in your system. All this information, we believe, will be very helpful and useful for administrators to understand the current status of your cluster and how the different tokens are being used. Then, based on that data, you can make a decision about when your customers should migrate, and which feature gates you should enable or disable, so that on one hand you make the tokens more secure and properly managed, and on the other hand you don't disrupt current usage. That's very critical: you don't want a customer paging you saying, my application can't access the API server anymore, it's not working. Okay, next James will show some examples of how you could upgrade or migrate your tokens, and also how to integrate external systems with Kubernetes.

Yeah, thanks very much. So, as you can probably tell from all of this, there are a lot of options; there are a lot of knobs to turn and a lot of things to tune. So this is an example of how you could go through this, ways that you can gradually migrate users across, rather than an all-in-one, everything-breaks rollout where we suddenly surprise users and say: your tokens are going to expire in a month's time, you'd better roll out some changes to your software to rotate your tokens. So the first thing that we'd recommend doing, well, that we'd say you could do, maybe (I don't want to recommend that you make your clusters less secure, but we need to get there somehow), is actually extending that duration that we issue things for: allowing up to a year, which I think is the maximum supported in Kubernetes, for your tokens to expire.
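Concretely, a caller can ask for a longer-lived token through the TokenRequest API; a sketch (the service account name and audience here are illustrative):

```yaml
# POSTed to /api/v1/namespaces/default/serviceaccounts/my-app/token,
# e.g. roughly what `kubectl create token my-app --duration=8760h` sends.
apiVersion: authentication.k8s.io/v1
kind: TokenRequest
spec:
  audiences:
  - https://kubernetes.default.svc
  expirationSeconds: 31536000  # one year; capped by --service-account-max-token-expiration
```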
So that means anyone submitting a request through the TokenRequest API can ask for up to a year. We then also enable automatically making pods' tokens valid for one year. So that helps to prevent your currently running workloads from suddenly getting disrupted. One year is chosen here as a number because it's, you know, quite a long time, and it gives you enough time to try to get through the remaining migration steps and reach out. I think one of the most essential things here, though, is tracking, which is why there are so many of those mechanisms you talked about, because you're going to need to work with these teams, work with these individuals, to actually make sure that they are updating: send warnings, and potentially even generate, you know, emails or whatever other notifications, to actually speak with them. Yeah.

So the other thing as well is that you probably have some people who are still using long-lived tokens; that, you know, touches on the external systems we'll get to in a bit, which have often historically gone and created a service account, borrowed the token, and dumped it into some other secret store or a CI pipeline or something else. That's probably going to have to continue for now. So yeah, the polarity of these flags can be a bit confusing, but basically setting the LegacyServiceAccountTokenNoAutoGeneration feature gate to false allows you to continue generating those Secrets by default, so that those who expect a Secret to exist find that, well, it continues to exist.
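Put together, this first phase of the plan might look something like the following on the kube-apiserver command line. This is a sketch only; flag defaults and feature-gate availability vary by version, so check yours:

```shell
# Keep legacy behavior alive while you measure usage:
kube-apiserver \
  --service-account-extend-token-expiration=true \
  --service-account-max-token-expiration=8760h \
  --feature-gates=LegacyServiceAccountTokenNoAutoGeneration=false,LegacyServiceAccountTokenTracking=true
```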
So yeah, as you go further along, and again, time is passing here; throughout this whole thing you're looking at these metrics, you're assessing what the impact of these things is, and going from there. I will say this is an example of a plan: if no one's listening, you might need to go back, revisit some of this plan, and adjust accordingly, or get them to listen. But yeah, the next thing that we can start looking at doing is disabling that automatic generation of the Secrets for each service account. This will then require that people create a Secret object themselves and annotate it. Again, there's no magic to this; it's going to require communication, people actually working and making changes to their processes, and that's why we focus on why we're doing this: to actually build a more secure posture to begin with.

Then, gradually, and here we've got one month, I think, you know, we need to consider what we choose, gradually reducing that maximum token expiration. Again, this is going to have to be based on your progress with actually getting people to update, and that's why having these tracking things in the audit log, having things like the metrics we talked about, is essential. And yeah, you're probably going to need at least someone to sit there and really drive this effort, to make sure it works, or make sure it's on track.

So yeah, as you talked about: disabling, sorry, the polarity on this one always confuses me, because we've got a 'no' and a true and a false, but then disabling that automatic generation as we go further along. And now that we've started to set a bit of a path here, we continue to actually reduce that maximum expiration time; we disable automatically extending tokens on pods, so people can then go and customize things themselves if they need to; just gradually reducing this down over time, down to, yeah, the actual default of one hour. So it sounds very simple when you put it up on a screen
like this, because I think we're skipping out the bit where we go around with our users and continuously try to get them to do it, and then they don't listen, and you get them to do it again. And then, well, in 1.28 we actually have the ability to enable the LegacyServiceAccountTokenTracking feature gate. That will actually automatically annotate our service account objects, or Secret objects (remember... Secret, Secret, thank you). Yeah, this one enables you, allows you, to track the last use, the timestamp of the last use, of each legacy token, so you can basically use it to clean up the unused legacy tokens.

Yeah, so that's really useful for actually understanding, because obviously we're going to have potentially thousands, or however many, service account tokens as Secrets in your cluster, and we don't actually know if they're used. They might not be being used at all, especially if they're for workloads running in pods, which are now using projected tokens, so the Secrets aren't actually being used anymore. So by having this feature enabled we can see that, and we can start building dashboards or whatever else. And as you'll see in a minute, when it comes to 1.30, we can actually start to automatically delete and clean up these unused Secrets. That overall gets us slightly closer to a more secure place, because we don't leave a load of unexpired tokens around; these are still genuine tokens, associated with a service account that's got permissions to do things, so these are security risks already. So by actually deleting them, well, we have a controller that automatically cleans them up and prevents them from being used again, because they don't exist anymore.

So yeah, that touches on a lot of the workloads in Kubernetes, like your pod objects and so on; well, yeah, your pods, where you get your projected tokens. External systems can be a little bit trickier, because
ultimately we are making a change here. Previously, unbeknownst to you, perhaps not intentionally, you've allowed people to go and create a credential that never expires and can be used forever; really convenient to drop into your CI pipeline or something like that. And your users are probably going to be a bit unhappy that they can't do that anymore, and it is going to require a different approach. So there are kind of two ways to do it, and I think some of you may have seen, in the last year or so, you can get things like federated OIDC. This is, again, stepping outside of service account tokens; these aren't service account tokens anymore, this is technically a different method that happens to use JWTs. But yeah, something like that, in fact I'll go to the next slide: yeah, this is a very effective method, I think, depending on the capabilities of the external system you're working with. If you have some form of identity document in your external system, you can utilize that to actually pivot into a token that can be used to authenticate; I mean, depending on your identity document, you might just be able to do it straight away. It's kind of a cheating one, because it relies on your external system running in an environment that has an identity that you can then use to pivot into something for Kubernetes.

The alternative that we can do, which, I mean, still relies on service account tokens, but is certainly more portable and easier, is probably something closer to how the kubelet sort of works today. The kubelet does have a long-lived credential, which it uses to go and actually interact with the API server on behalf of your pods to get those tokens. And that is: to actually have some form of long-lived credential which only has permission to go and request a short-lived token on its behalf. So again you have something long-lived, but in this instance you've got a little bit more control, or at least
tracking over how these things are used. That leads nicely on to the final piece I want to mention, which is a recently merged feature around credential identifiers. Because we now have the idea of going from a long-lived to a short-lived token, those short-lived tokens will have some kind of identifier that is unique to that specific token. Each time we do that, we actually have an identifier for that issuance, and by tracking these things back we can see when they're first issued and when they're first used; so if there is some kind of an issue, we can actually trace this stuff back. This is an alpha feature in Kubernetes 1.29, which I don't think is out yet. No. But yeah, that's another feature that we've got that kind of helps us to actually build our posture here in the meantime, whilst we actually develop further capabilities in our external systems too.

So yeah, to kind of summarize here: newer Kubernetes versions are using time-bound tokens via the TokenRequest API; the kubelet is doing this for you by default, and the API server is making sure your pods get these credentials if they need them. We're gradually moving away from long-lived, auto-generated tokens, and this is going to be a long migration, I think, for most of us; it very much depends on the type of workloads you run and how you run them. Tracking is key: tracking, monitoring, all these things, and communication, with feedback loops on how effective your communication is. It's almost, you know, like running a big SEO campaign internally for yourselves, to actually make sure that these things are effective, and that in itself is a whole other can of worms that we're not even touching on in this talk. There are a lot of feature gates, a lot of options, a lot of parameters, and it's important to actually get familiar with them to really understand how they work. And yeah, I think we've just got a few acknowledgements
because along the way we've had quite a lot of people helping on this. Aside from that, thank you very much for attending the talk, and if you've got any more questions, we've got some time to answer them. Yeah, please provide your feedback, and, oh yeah, appreciate it, there's a big QR code up here to scan. Thanks for staying so late for our session. Okay, if you have a question, please use the microphone over there, and if you don't, then we'll be around here anyway. Yeah... no questions? That's good... cool. Oh, a good one.

How close do I need to get... can you hear me? Yeah? Great. So I guess, you know, as we're moving towards these time-based expiring tokens, and you've got this automatic process to rotate credentials even during the span of the lifetime of a, you know, container: how are you approaching what needs to take place in the application itself, to discover that that's changed and that it needs to, you know, absorb or understand that that change is required, because it's now like a config change in the life of the, you know, container?

Yeah, so, like we mentioned, the tracking of the different cases is important. The idea, the plan we are proposing here, is that you shouldn't leave a legacy token still running and working; but at the same time you need to collect the data, like the annotations in the API server audit log, to identify what's still in use, and take some proactive actions. Then you can identify the applications that could stop working after we stop supporting the legacy tokens. That's why we think these annotations and events are very important: they help you identify the potential workloads and customers that are still using old legacy tokens, so you can work with them and come up with a plan for how to migrate. That's the approach and vision. I think the other side, in terms of what we recommend to those
users when we reach out to them: if you're using, like, an up-to-date version of client-go, for example, it will automatically be able to see that a token has changed and refresh it. It kind of comes down to client support, and I think that forms part of your communication strategy: making sure that you understand, you know, what they're doing and how they're doing it. That's the big fuzzy hole in this whole thing, which is very difficult for us to really prescribe, because it's very organization specific. It's reaching out to people; it could be, you know, looking into the audit log, looking at user agents, to try to understand what kind of client they're running. And it very much depends on what kind of platform you're running: if it's just you running your cluster, it's probably a little bit easier, you understand your own workloads and applications; if you're running a larger platform where you don't necessarily control these things, that's why you need to extend it to a year, or, yeah, I don't want to say more, because you're not allowed to do more, but you've got to come up with something. Thanks.

All right, speaking of more: what is the max duration you could set for this time-bound token?

So, the maximum you can extend it to, from my understanding (I've got the two people who did it here in front of me), I think it's one year, hence one year. Beyond that, like, we're gradually reducing the default, so I think one hour is eventually where we're trying to get to, so we have, like, constantly rotating things going on.

Got it. And does it always need some sort of developer... so, I work for defense and I have to field this, and we have air gaps; once it's underway and fielded, I can't have access to it as a developer for X amount of years. So with this auto-generation of tokens, or going to time-bound tokens, do I still have the ability to rotate it or refresh it, or do I need to send some sort of support to constantly update
this? So, if this is, you mean, like, a token mounted into a pod or something? Yeah. Then the kubelet will automatically do that for you: some time before it expires, it's going to actually go and fetch a new one using the TokenRequest API again, talk to the API server, fetch a new token, and make it available. It is up to your application or your workload to actually notice that, so using up-to-date client libraries and whatever else is, you know, essential. Thank you.

So, in one of your slides you mentioned something about these stale tokens. Does this TokenRequest API delete the stale tokens as well, with version 1.30 plus?

Yeah, there are some metrics about the stale tokens. Stale tokens means: now we've migrated to the time-bound tokens, but with some feature flag, feature gate, we are still supporting them; the token is automatically extended, but we are warning you that some of your tokens have already passed their original expiry. So it's still working, but with expired tokens. This also gives you an idea, like the previous question, of how you can identify the potential workloads or customers that could be broken in the future if we disable this auto-generation or auto-extension. Yeah, because a stale token implies that the client hasn't actually reloaded it from disk or whatever else; so, you know, they're kind of like warning indicators: reach out to those people, well, sometime in the next year at least, but pretty promptly.

And you mentioned something about the tracking of, what you call, the long-lived tokens. How do you use that tracking to, like, delete them automatically, with version 1.30 plus?

Yeah, for that one, I believe it's not GA yet. Okay, so that feature, basically, like I said, the tracking means we'll track, and record, the timestamp of the last time that token was used.
Then, when it's enabled, you can see, for each token, the last-used timestamp. And in the future, in version 1.30, I think the plan is that, based on that, you can say: if a token has not been used for longer than one year, or a certain period of time, the token will be deleted. The tracking feature basically just records the timestamp of when the token was used. I see, yeah, okay, thank you. Thank you. It's a new feature and still a work in progress.

Okay, if there are no more questions, thanks again for coming to our session, and enjoy the rest of your day. Thank you very much.