Okay, I think we'll go ahead and get started. Looks like we've got a good group here so far. Thanks for joining us today, everyone. We'll be talking about Keystone tokens and some lessons we've learned over time running different token types in our production environment. I'm Brad Pokorny, a principal software engineer with Symantec. I've been working on OpenStack since about 2013, and I currently work primarily on Horizon. And this is my colleague Preeti Desai.

Thanks, Brad. Hi everyone, welcome, and thank you for joining us. My name is Preeti Desai. I'm an OpenStack developer at IBM. I work on various OpenStack projects, and one of them is Keystone. So we are all here to find out: what is the best token format we can configure in our OpenStack deployments? We have four token formats so far, starting with the oldest, UUID.

Let's take a look at what UUID is. UUID is the simplest token format we have, and it's based on version 4 UUIDs. It's very simple to configure: just specify the provider name, and that's it, we have UUID tokens configured. Now, what happens if I want to get a UUID token for myself?
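For reference, the provider configuration just mentioned is a single setting in keystone.conf; this is a sketch, with option names as they existed around the Kilo/Liberty releases:

```ini
# keystone.conf: select the UUID token provider
[token]
provider = uuid
```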
I initiate a request with my credentials and the target project. What happens next is Keystone validates my identity, checking whether my user account is valid. Next it looks up the project ID, checking whether the target project exists. Then it gets the list of roles I have on the target project, and the list of services along with the endpoints associated with those services. It bundles all this information into a token payload, in JSON format, and creates a UUID to serve as the token ID. All of this information is stored together in the token backend, and we have our token. So that's how Keystone creates a UUID token.

Now, if I take this token to any OpenStack service to ask for resources, the OpenStack service sends a request back to Keystone to validate the token. Let's look at the example here. The key points I'd like you to note are the ID and also the key in the database; this is just for simplicity.
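The issuance flow just described can be modeled roughly like this. This is an illustrative sketch, not Keystone's actual code: the backend is a plain dict standing in for SQL or memcache, and all the field names are assumptions.

```python
import datetime
import json
import uuid

# In-memory stand-in for Keystone's token backend; real deployments use
# SQL or memcache, and the schema here is simplified for illustration.
token_backend = {}

def issue_uuid_token(user_id, project_id, roles, catalog):
    """Bundle identity data into a JSON payload keyed by a v4 UUID."""
    payload = {
        "user_id": user_id,
        "project_id": project_id,
        "roles": roles,
        "catalog": catalog,
        "expires_at": (datetime.datetime.utcnow()
                       + datetime.timedelta(hours=1)).isoformat(),
    }
    token_id = uuid.uuid4().hex          # the opaque token handed to the user
    token_backend[token_id] = {"valid": True, "payload": json.dumps(payload)}
    return token_id

token = issue_uuid_token("alice", "demo", ["member"], [{"nova": "http://..."}])
record = token_backend[token]            # validation must hit this backend
```

The key consequence is visible here: the token itself carries no information, so every validation has to look it up in the backend.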
I got this sample from the SQL backend; note the valid field in the database. Now, when it comes to validation: the OpenStack service sends a request to validate the token, and Keystone retrieves the token payload from the token backend. It checks whether the valid field is set to true. If it is, Keystone parses the token payload, retrieves the metadata, and checks whether the token is expired. Next it checks whether the token has been revoked. That's the high-level workflow of how a UUID token is validated by the Keystone service. If all the checks succeed, it returns the token payload with a success response, so the OpenStack service knows this person is authorized and can perform the request.

Now, say I have a UUID token and for some reason I'd like to revoke it. There are many different scenarios where a token can be revoked, but let's say I find a suspicious person has my token and I want to revoke it. What happens in revocation is this: when Keystone receives a delete request on the token, it first validates the token using the workflow from the previous slide. If the token is valid, it retrieves the audit ID from the token, and then creates a revocation event based on that audit ID. In this new revocation event it records the time since when the token is invalid. This information is bundled up as a new revocation event in the SQL database. After creating the event, Keystone prunes the table, deleting whatever revocation events have expired. Then it goes back to the token backend and sets the valid field to false. So the token still exists in the database.
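The revocation steps (create an event from the audit ID, prune expired events, flip the valid flag) can be sketched similarly. Again, this is an illustrative model with made-up field names, not Keystone's actual schema:

```python
import time

# Simplified stand-ins for the token backend and revocation-event table.
token_backend = {
    "abc123": {"valid": True, "audit_id": "aud-1",
               "expires_at": time.time() + 3600},
}
revocation_events = [
    # An old event whose token has long since expired; pruning removes it.
    {"audit_id": "aud-old", "revoked_at": 0, "expires_at": time.time() - 10},
]

def revoke_token(token_id):
    record = token_backend[token_id]
    if not record["valid"]:
        return
    # 1. Create a revocation event keyed by the token's audit ID.
    revocation_events.append({
        "audit_id": record["audit_id"],
        "revoked_at": time.time(),
        "expires_at": record["expires_at"],
    })
    # 2. Prune events whose tokens have already expired anyway.
    now = time.time()
    revocation_events[:] = [e for e in revocation_events
                            if e["expires_at"] > now]
    # 3. Flip the valid flag; the row itself stays in the backend.
    record["valid"] = False

revoke_token("abc123")
```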
It's just that the valid field is now set to false. So now, let's look at this token format across multiple data centers. I initiate a token request in data center one, get back my UUID token, and ask for a new virtual machine. Nova, configured with keystone middleware, sends a request to Keystone for validation. Keystone checks the token backend to see whether the token is valid and returns the response to Nova, and Nova goes and creates a new VM for me. Now, if I take the same token to a different data center, what happens? With the same token I say: give me a new virtual machine in this other data center. Nova, configured with keystone middleware, sends the request to Keystone, and that Keystone cannot find the token, because in production the token backends are generally not replicated; it's not feasible to replicate the entire token backend. So it does not find the token and returns a failure: you are not authorized to perform this operation. From this workflow it's clear that even though UUID is the simplest, most lightweight format, it does not support authentication and authorization across multiple data centers.

To solve this kind of issue there are two other token formats, PKI and PKIZ. Let's look at both of them. They're pretty similar to each other, so we can cover them in the same run. A PKI token is cryptographically signed using the X.509 standard. It is not encrypted. The common notion is that it's encrypted and therefore secure; it's not, it is just signed. It's in CMS format, converted into a custom URL-safe format. PKIZ is the compressed form of PKI, prefixed with "PKIZ_" to identify it as a PKIZ token. Now, to configure this kind of token:
We need three pieces of certificate material. First, the signing key: you generate a private key in PEM format, and that becomes the signing key. Next, the signing certificate: using your private key you create a CSR, submit the CSR to a certificate authority, and receive the certificate back from the CA. Then you also need the certificate authority's own certificate. These three have to be configured in the keystone.conf file, and then you set the provider to either PKI or PKIZ.

So if I request a token in PKI or PKIZ format, how does Keystone behave behind the scenes? Similar to UUID, after validating the identity, the resources, and everything else, it creates the JSON payload. This JSON payload is then signed using the signing key and the signing certificate. In the case of PKI, it's converted to UTF-8 and then to the custom URL-safe format. In the case of PKIZ, it's compressed using zlib and converted to the base64 URL-safe format (not the custom one this time), and the prefix "PKIZ_" is prepended to the token. Both of these token formats are stored in the backend.

So now I have this token, and I go back to the Keystone service for validation. Let's look at the example here. This is a sample, again from the SQL backend. Note the longer string; it's actually truncated here, because PKI tokens are huge. That's the key, and then there's the ID field. The ID is generated as a hash of the PKI token.
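The three pieces of certificate material go into keystone.conf along with the provider choice. A sketch, with example paths and option names roughly as they existed in the Kilo/Liberty era:

```ini
# keystone.conf: PKI/PKIZ signing material (paths are examples)
[signing]
certfile = /etc/keystone/ssl/certs/signing_cert.pem
keyfile = /etc/keystone/ssl/private/signing_key.pem
ca_certs = /etc/keystone/ssl/certs/ca.pem

[token]
# use "pkiz" instead for compressed tokens
provider = pki
```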
By default the hash is MD5; it's configurable, so you can set it to another algorithm. PKIZ looks something like this: we have the PKIZ token, and the ID is again the hash of the token. So now, if I take this token back to the Keystone service for validation, what happens? It generates the hash from the token, and the entire workflow is pretty much the same as UUID: it gets back the token payload, checks whether it's valid, retrieves the metadata, and checks expiry and revocation. Note that this is the Keystone service validating the token, not the keystone middleware. The keystone middleware, as we all know, can validate a PKI token without going back to Keystone: it decodes the token itself and checks for expiry and revocation.

Now, let's say I again find some suspicious guy with my token. How do I revoke it? Upon getting the delete request, the workflow is the same as UUID: Keystone validates and then revokes the token.

Now, what about this token format in multiple data centers? I go to data center one, retrieve a PKI token, and go to Nova: give me a new virtual machine. Nova, configured with keystone middleware, validates the token and returns me the virtual machine. Using the same token, I go to a different data center and ask for a virtual machine again. Nova, configured with keystone middleware, can validate that token, and it returns the virtual machine. So this pretty much looks like it works across multiple data centers. But note that this only holds for services configured with keystone middleware. For services like Horizon that integrate directly with Keystone, remember that for Keystone itself to validate PKI tokens, the token backend has to be replicated.
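The compression, encoding, and hashing steps just described can be sketched like this. Illustrative only: the CMS blob here is a placeholder, and Keystone's real implementation differs in detail.

```python
import base64
import hashlib
import zlib

# Placeholder for the CMS-signed JSON payload; real PKI tokens are far larger.
signed_cms = b"-----BEGIN CMS-----\n...signed JSON payload...\n-----END CMS-----"

def to_pkiz(cms_blob):
    """Compress the signed payload and mark it with the PKIZ_ prefix."""
    compressed = zlib.compress(cms_blob)          # shrink the large CMS body
    encoded = base64.urlsafe_b64encode(compressed).decode("ascii")
    return "PKIZ_" + encoded                      # prefix identifies the format

def token_hash(token, algorithm="md5"):
    # The backend keys the row by a hash of the full token; md5 by default,
    # configurable in keystone.conf.
    return getattr(hashlib, algorithm)(token.encode("ascii")).hexdigest()

pkiz_token = to_pkiz(signed_cms)
token_id = token_hash(pkiz_token)
```

Note that anyone holding the token can reverse the encoding and compression and read the payload, which is exactly the point made above: signed, not encrypted.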
It's a persistent token, so it's not truly supportable in a multi-data-center environment, even though the token validation requests are not sent back to the Keystone service. And the biggest issue is the size. It's huge, even larger than the standard HTTP header size, and even with compression you only gain maybe 10%; it's still large. Again, the common notion is that PKI is encrypted and secure. It's not. The entire token payload is in the token itself, so anybody who has the token can decode it and look at all the information.

So now we've looked at UUID, PKI, and PKIZ, and there's a new token format that solves these kinds of issues. Let's look at the latest token format, Fernet. Fernet is a cryptographic authentication method based on symmetric key encryption. For symmetric key encryption, the keys are stored in a key repository, and there are multiple keys: the primary key is used to encrypt tokens, and all the rest of the keys are used to decrypt Fernet tokens. This is how you configure Fernet: you specify where the key repository is and the maximum number of keys allowed in your repository. With Fernet, the keys are crucial. A Fernet key is a combination of a signing key and an encryption key, 256 bits in total: 128 bits for signing and the other 128 bits for encryption.

Now, there are different types of Fernet keys, and they all look pretty much the same.
The contents look the same, but the file names are integers starting from zero. If you list a key repository, as in this sample, you'll see files named 0, 1, 2, 3, and so on. The first type, the primary key, is used both to encrypt and to decrypt tokens, and its file is named with the highest index. A secondary key is only used to decrypt tokens, and it's named with an index larger than the lowest but smaller than the highest. The staged key is also only used for decrypting, and it's the next key in line to become the primary key; the staged key is always named 0.

Now, with symmetric key encryption it's crucial that we rotate these keys rather than sticking with the same ones forever, so let's look at how key rotation works. If you set up the Fernet keys using keystone-manage fernet_setup, it creates two keys: one staged key and one primary key. At this point there are no secondary keys. Now we decide it's time to rotate. What happens? You identify the next largest index, which in this case is 2. Our staged key becomes the primary key, and its file name is now 2. The old primary key becomes a secondary key, and a new staged key is introduced, named 0. So now we have three keys in our key repository. Say it's time to rotate again. Again, find the next largest index, in this case 3. Our staged key becomes the primary key, named 3; our old primary key becomes a secondary key; the existing secondary key remains a secondary key; and a new staged key is introduced. Now, suppose we set the maximum number of active keys in keystone.conf to three. In that case, the secondary key with the lowest index gets deleted.
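The rotation and naming scheme just walked through can be modeled in a few lines. In real deployments you run `keystone-manage fernet_rotate`; this sketch only mimics the renames and deletions described above, under the assumption that key files are 32 random bytes.

```python
import os
import tempfile

def rotate(repo, max_active_keys=3):
    """Model one fernet rotation: promote staged key, stage a new one, prune."""
    names = sorted(int(n) for n in os.listdir(repo))
    new_primary = max(names) + 1
    # Staged key (always "0") is promoted to primary (highest index).
    os.rename(os.path.join(repo, "0"), os.path.join(repo, str(new_primary)))
    # A fresh staged key takes its place.
    with open(os.path.join(repo, "0"), "wb") as f:
        f.write(os.urandom(32))
    # Enforce max_active_keys by deleting the lowest-indexed secondary keys.
    names = sorted(int(n) for n in os.listdir(repo))
    for n in names[1:]:                  # never delete the staged key "0"
        if len(os.listdir(repo)) <= max_active_keys:
            break
        os.remove(os.path.join(repo, str(n)))

repo = tempfile.mkdtemp()
for name in ("0", "1"):                  # state after fernet_setup: staged + primary
    with open(os.path.join(repo, name), "wb") as f:
        f.write(os.urandom(32))
rotate(repo)                              # keys are now 0 (staged), 1, 2 (primary)
rotate(repo, max_active_keys=3)           # key 1 is pruned; 0, 2, 3 remain
```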
So in our case, key 1 got deleted. That's the entire Fernet key rotation workflow. It's a little overwhelming, but I tried my best to explain it with the animation, and there are a lot of blogs out there explaining this; several Keystone team members have done a good job covering this kind of configuration.

So how is a Fernet token generated, and what does it look like? It consists of these fields: the Fernet version, the current timestamp, an initialization vector, the ciphertext, and an HMAC. The Fernet version is the token format version; there's just one version available at the moment. The ciphertext comes from the token payload: depending on what kind of scoping you need, whether you're getting a project-scoped token or a domain-scoped token, Keystone creates a token payload. Then, since Fernet is based on a block cipher, it adds extra padding if the payload doesn't fill the standard block size, and encrypts it using the encryption key. The HMAC is computed over all of these fields and signed with the signing key. So Fernet is a little complicated, but it's very secure, and we'll look at how it benefits us.

So where is the token in our SQL backend? It isn't there. It's not a persistent token; it's not stored anywhere. And this is what it looks like. It's very small compared to PKI and PKIZ, comparable to UUID, and yet it's not a persistent token format.

Now, for Fernet, the validation request has to go back to the Keystone service. How does that work? When Keystone receives a validation request, it re-adds the base64 padding that was stripped when the token was created to make it URL-safe, decrypts the token, and determines the version of the token payload.
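That padding restoration is a small but easy-to-miss detail; a minimal sketch, using arbitrary bytes in place of a real encrypted payload:

```python
import base64
import os

def restore_padding(token):
    # base64 length must be a multiple of 4; re-append the stripped '='.
    return token + "=" * (-len(token) % 4)

raw = os.urandom(50)                                   # stand-in token bytes
issued = base64.urlsafe_b64encode(raw).decode().rstrip("=")  # padding stripped
decoded = base64.urlsafe_b64decode(restore_padding(issued))  # round-trips cleanly
```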
Like I said, the payload version depends on whether you're generating a project-scoped token or a domain-scoped token. Keystone uses static version numbers; for example, for a project-scoped token the payload version is set to 2. It disassembles the payload based on its version, retrieves the appropriate fields from it, checks whether the token is expired, then whether the token is revoked, and then returns the token data to the user or the service.

For the revocation workflow, the Keystone team has done a pretty good job: it's consistent with all of the other formats. But remember, this works with revocation events, not the revocation list. Otherwise it's pretty much the same.

Now, let's say I have a multi-data-center environment with Fernet tokens. I initiate a request, get the token back, and say: give me a new virtual machine. Nova, configured with keystone middleware, has to go back to Keystone now; Keystone validates the Fernet token and returns success. I have my VM and I'm happy. Now I take the same token to a different data center. Nova, configured with keystone middleware, sends the validation request back to Keystone, and that Keystone can validate the token, since it's not persistent and Keystone can reconstruct the token data from the token itself. It returns me a new virtual machine. Even services like Horizon, which integrate directly with Keystone, can validate this kind of token. So it truly supports a multi-data-center environment, and it is reasonable in size.
It doesn't need any persistence. The only issue: there are blogs out there, Dolph has done a pretty good job benchmarking all of the Keystone token formats, and Matt Fischer has written a blog post benchmarking Fernet token validation. The biggest issue is token validation time. As the number of token revocation events grows, token validation time goes up, and the number of requests Keystone can serve goes down.

So, to answer our question: which token format is viable for a multi-data-center deployment? It depends. Well, not entirely. Fernet is capable of supporting a multi-data-center deployment, but you have to watch out for the revocation events; if at any point you have thousands of revocation events, you have to be careful with your deployment. So I would conclude that Fernet is capable of multi-data-center deployment. And with that, I'll hand over to Brad for Horizon's usage of tokens.

Thanks, Preeti. So next I'll take us through some of the things we've seen with Horizon when working with these different token formats. This will tie together some of the things Preeti has been talking about, in a real-world use case: users using Horizon and dealing with multiple data centers and things like that. At Symantec we've used Horizon for most of the time we've been using OpenStack, and we've found some interesting behaviors in the way Horizon manages tokens. When logging in, Horizon obtains a token using the user's credentials, and this is an important aspect of security in Horizon: you don't have a service credential for Horizon.
So if Horizon is compromised, or a token is compromised, an attacker would only have the credentials of that user, as opposed to a higher-privilege service token. Horizon gets an unscoped token for the user, then based on that gets a project-scoped token, and will get different project-scoped tokens as the user changes projects. It reuses tokens as much as possible, which is done to reduce the transaction load on Keystone, so you don't have too many token-creation requests. These tokens are stored in the session for each logged-in user. The session storage is configurable: it can be the local memory cache; the cookie backend, where the token is stored in a cookie on the browser side; the memcached backend; a database; or memcached plus a database (the cached database backend).

I'll talk some more about the cookie backend and the memcached backend. The cookie backend has some very strong advantages, and it's currently the DevStack default. In this case the token is stored in the browser cookie, so it's very important to configure SSL for the Horizon connection, since tokens will be going over the wire and an attacker could otherwise see and use them. Also secure the tokens on the server side with the configuration values shown here; there are links to the developer docs and the security guide for setting that up. So the cookie backend is highly scalable: you're storing the tokens only on the client side, in the browser.
That reduces the impact of storage needs on the server side. But the main limitation we've run into is the dreaded "booted back to login" issue, which you might have seen if you use Horizon at all. You come to Horizon and log in as usual, with your name and password. For most browsers the cookie size is limited to four kilobytes, so if you have a more complex Keystone deployment with a lot of endpoints in the catalog, you sign in, and what you see is this: you're back at the login screen. Very confusing for your users. What's happened in this case is that the cookie has overflowed, normally due to a lot of Keystone endpoints in the catalog making the token too large for the cookie.

To deal with that issue, one of the things we've done is switch to the memcached backend. This allows larger tokens to be stored, since tokens are kept on the server side, so you don't have these issues of overflowing browser cookies. It requires memcached to be running, and Horizon configured to look for sessions in memcached. It can also be used with a backing database, so that you have the cache with the database behind it. This is what we currently use at Symantec, and it has resolved some of these issues for us.

Another thing that's been done in the past to reduce the impact of large token sizes is token hashing. This is done in the django_openstack_auth project.
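The memcached-plus-database session setup described a moment ago might look roughly like this in Horizon's local_settings.py. A sketch only: the hostnames and the database engine are examples, and backend module paths are as of the Django versions Horizon used at the time.

```python
# Horizon local_settings.py sketch: sessions in memcached, backed by a DB
# (Django's "cached_db" session engine). Hostnames below are examples.
SESSION_ENGINE = 'django.contrib.sessions.backends.cached_db'

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',   # memcached must be running here
    }
}

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',   # session table lives here
        'NAME': 'horizon',
        'HOST': 'db.example.com',               # example hostname
    }
}
```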
django_openstack_auth is a dependency of Horizon, and token hashing keeps the stored token data small, since you're storing just the hash of the token. We've had some issues in the past with this functionality getting broken, and it's currently not working on the master branch for PKI tokens, so there's a new config value in Liberty to disable it, shown here. But you need to be careful doing this in a production environment: if you disable hashing, you're storing the whole PKI token on the server side. It works fine with Kilo Horizon, and that's what we currently use; just keep in mind that on the master branch you may have some issues there.

Next I wanted to talk about multi-region support in Horizon. Horizon has a concept of service regions versus authentication regions. The service regions are what you've probably seen in the top center of the Horizon interface; these are different regions that are part of the same Keystone catalog. You might even have separate Keystone endpoints for different regions in that same catalog, but for that to work properly, those different Keystone endpoints have to be able to validate each other's tokens, or you'll have issues. If you have Keystone endpoints that can't validate each other's tokens, those need to be specified as different authentication regions, in the available regions setting. For UUID, PKI, and PKIZ, without token replication across your backends, which is generally infeasible in production, you won't be able to validate those tokens across regions, so generally you would need to set those up as different authentication regions. But as we've discussed on other slides, Fernet tokens do work between different Keystone endpoints, so this should help us out in Horizon, possibly allowing different service regions with different Keystone instances.
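Separate authentication regions are configured in Horizon's local_settings.py roughly like this; the endpoint URLs and labels here are made up for illustration:

```python
# Horizon local_settings.py sketch: two keystone endpoints that cannot
# validate each other's tokens, listed as separate authentication regions.
AVAILABLE_REGIONS = [
    ('https://keystone-east.example.com:5000/v3', 'East'),   # example URL
    ('https://keystone-west.example.com:5000/v3', 'West'),   # example URL
]
```

Each entry is an (endpoint, display name) pair; the user picks one at login and authenticates against that Keystone.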
So, do Fernet tokens work with Horizon? Yes, they work right out of the box in Liberty and beyond, no patches necessary. If you're using Kilo, you do need a patch for django_openstack_auth, and the patch is linked here.

This next topic is a little off-topic from the different Keystone token types, but I wanted to mention it as some work going on in Horizon right now in the community: support for v3 Keystone domains via Horizon. If you're not using domains, you're fine at this point; you use project-scoped tokens for basically everything you do. If you're using v3 and you have domains, though, you need to manage those domains with domain-scoped tokens, and Horizon doesn't currently support this. It requires changes to the django_openstack_auth project and also to Horizon. This is currently planned for the Mitaka release, and there's some info here on usage. We currently use it at Symantec in production. The documentation here is a bit out of date, but just see me if you're interested in doing this. As far as getting this into the community, these are the current domains patches that are out for review; we're still working on those.

So, to come back to Fernet tokens: will they solve all our problems? We've seen that the smaller token size is very useful, and there's no persistence for tokens. That lack of persistence gives us seamless authentication across regions, so it should be much easier to take a token from one Keystone and validate it with another Keystone. But we have seen those performance issues: if you have a lot of token revocation events in your environment, you may see performance problems, so consider that as you take a look at possibly moving to Fernet at this point.

And with that, thanks for all your time. If you have questions, we can take those now. We have a microphone in back, is that right?
Q: I have a question about Fernet tokens. If the tokens are not persisted in a backend, how is the association between the token and the user metadata kept?

A: The entire payload is encrypted and signed, and it's part of the token itself. If the token gets to anybody, an attacker, they cannot decrypt the token and get at that information, because you need the keys to decrypt it. But Keystone can reconstruct what the token payload is from the token itself; the Keystone service can do that. That's why no persistence is needed in the case of Fernet.

Q: Not so much a question as a remark, without taking away anything from the benefits of using Fernet tokens. Your suggestion is that you cannot do multi-region with UUID tokens. But if you use federation, you can: you just put a SAML IdP in front of it, and then you have a valid SAML assertion that works across the regions, and then you use a new Keystone token, but that's all invisible to the user. So it's not impossible to use UUID tokens.

A: Yeah, with federation it is possible. But the way I'd shown it, where you have LDAP which is replicated, then it's not. But yes, you're right.

Any more questions? Is it on?

Q: Awesome. Dolph Mathews, retired Keystone PTL. I had my hands in designing, maintaining, and supporting everything in this presentation, and I just wanted to say thank you, awesome job. Everything was totally accurate. Plus two.

A: Thank you. Any more questions? Feel free to...

Q: (off-microphone, about distributing and synchronizing Fernet keys across data centers)

A: That has to be done using a deployment tool. And when you synchronize the repository, it doesn't have to be synchronized at exactly the same time. Tokens are encrypted with the primary key, but decryption happens with all of the keys in the repository.
So it doesn't have to happen at the same time. Say you update the repository in one data center, and then after a few minutes you update another; it's fine to have that time lag.

Q: (partially off-microphone) ...Kubernetes using Keystone as its authentication mechanism. Do you know, are any of you working on that, or is that someone else?

A: No, maybe somebody from the Keystone team is.

Q: Sorry, I didn't have the microphone. So, just a question: I know they're integrating Kubernetes, Google's Kubernetes, with Keystone authentication. I think it's fairly new, but I know there's some work going on there. Just wondering. But anyway.

Q: Can I use a Fernet token with the current trust mechanism?

A: Sorry, I didn't get that. Could you hold the microphone a little closer?

Q: If you want to delegate using a trust, with the current trust mechanism, is it possible to do that with a Fernet token or not?

A: I think it is, yes, it is possible. You can generate a trust-scoped token with Fernet.

Q: And can I list... suppose you have two regions, like in your picture, and the user tries to create a new virtual machine in the second region. Suppose I have the same project in both regions, but the user created the token in the first region, and the user does not yet exist in the Keystone of the second region. When I ask for the list of users...

A: Right. So if you don't have the users replicated, then it wouldn't work. For that, yes, you need all the users to be replicated, and all the other resources as well: projects, domains, roles, assignments, all of that information replicated across the regions.

Q: In the second region there is no persistence in the database, right? Okay.
Q: You have two sites, two sites with two Keystones, and there are no correlations, no data replication between them?

A: No replication of the token backend, no.

Q: Okay. One user creates a token with the first Keystone, and then tries to use the same token against the second Keystone, but in the second Keystone the user doesn't exist.

A: Right, then the mechanism doesn't work, because the user doesn't exist there.

A: There's also another concern there. For the tokens to work across Keystones, you would need to have the Keystone projects duplicated across, replicating back and forth, because if you have a token scoped to a project and you go and use it against a different project, it won't work either.

I think there's one more question.

Q: You kept saying that performance went down if you had more revocation records. Is there a mechanism to clean up the records, like deleting old records after a period of time?

A: Basically, when a new revocation event is created, all the expired events get deleted, and that's when validation gets impacted. You can set up a separate flush job to remove all the expired events, but I think Matt did some research, and while I don't have the numbers, I believe that didn't help either. In the case of UUID, you flush the token backend and you pretty much get that performance back, but with revocation events that was not the case.

I'm getting the cut sign, so I think we'll have to stop here. Come let us know if you have questions. And thank you, everyone. Thank you for joining us.