Hello, everybody. Welcome to the next session, which is about the new storage for Keycloak. It will be presented by Michal Hajas, a core developer leading the data grid integration in the new Keycloak storage, and Hynek Mlnařík, a Keycloak maintainer leading the work on the Keycloak storage refactoring. One more thing: if you would like to ask the presenters questions, please use the Q&A section here in the platform. Thank you, and the stage is yours.

Thank you, Lubomir. So welcome to this presentation. It is going to be about storage for Keycloak, and let's start right away with a brief introduction of what Keycloak is and what it uses for storing its data. Keycloak is an identity and access management solution, meaning it provides a secure way of authenticating and managing users for applications. For this it uses the well-known protocols OpenID Connect and SAML 2.0, and it also provides single sign-on functionality. What does that mean? It means that if you have multiple applications that are all connected to Keycloak, it is enough to log in once and you are automatically logged in to all of them.

Here is a screenshot of our admin console. The admin console is used for managing all the settings for Keycloak, so it manages data about Keycloak settings, application settings, access control such as roles and groups, and users.

On this slide I would like to show you a quite common use case of how Keycloak is used. It is important to note that this is a simplified diagram; usually the flow consists of a few more steps, depending on the protocol that is used, either SAML or OpenID Connect. It works something like this. First the user accesses the application using his or her browser. The application doesn't know the user, so it redirects the user to Keycloak. Keycloak now needs to check whether the application is known and whether Keycloak manages it. If yes, it displays a login form where the user enters credentials, and then Keycloak needs to check whether the user is known, so we contact the database again and ask whether this user exists. If some external source is configured, Keycloak may even ask LDAP, or some other external source, whether the user is known. If everything checks out, the user is redirected back to the application with a successful login, and the application shows the protected page in the browser. As I said, this is a simplistic diagram showing how Keycloak uses a database or data store; it can be much more complicated, and how Keycloak should approach all of this is itself stored somewhere in the store.

So what are we using to store all this data? Currently we have a relational database that stores most of the data. Some of the data is also stored in an embedded Infinispan, mostly for clustering purposes, and this embedded Infinispan is used for caching as well. For certain cases it is also possible to configure a connection to an external source, for example LDAP or Kerberos. All of this needs to be stored in a quite complicated data model, which looks like this. If you are familiar with Keycloak, you know that almost everything is associated with a realm; that is why the realm is here in the middle, with so many associations coming out of it.
A data model like this has a disadvantage: if we want to update it, it usually requires downtime, because we need to stop the old Keycloak, migrate the database to the newer schema, and only then start the new version. With this approach we started hitting some walls, so we started thinking about a new store, and we formulated the following points as our motivation.

We want to allow users to leverage any storage technology they want for storing or caching data. For this we also want to simplify the data model: we would like to split the model into logical areas and allow users to use a different storage technology for each area if they want; if they prefer to store everything in one storage, that is also fine. If we want to split these areas across multiple storages, we need to allow loose relationships between them, because if one entity is stored in one place and a related one somewhere else, we may end up with broken referential integrity. We also want to allow users to, for example, store users in a relational database, because users are queried very often and there may be a lot of them, while if there is only a fixed number of applications, it is probably fine to keep that fixed configuration in some static storage, say files or a Git repository. Last but not least, we want to support no-downtime upgrades, because if you have an infrastructure that relies on Keycloak for authentication, it usually means that when Keycloak, the single sign-on solution, is down, the whole infrastructure is down: no user can authenticate and no access control can work.

With this motivation in mind, we revisited our current storage and came up with the following ideas for making it work the way we want. We split the model into ten independent areas and ended up with a data model like this. Each rectangle on this slide is a separate area, and you can see that there are far fewer tables and fewer connections between them. This is because we denormalized the schema and allowed some data to be stored directly in the parent. It is important to note that if someone still wants to preserve referential integrity and all the relationships between the data, that can be done; but we also allow loosening these relationships, and if we give up all the cross-area relationships, it becomes possible to use a different storage technology for each area. Since there are fewer tables, we also minimize the need for downtime, because updating some data no longer necessarily means that the schema changes.

Then we were thinking about how to provide this no-downtime upgrade support, and we came up with the following minimum requirement: it must be possible to run two adjacent Keycloak versions in the same cluster. What does that mean? It means that if we have a cluster of nodes running, for example, Keycloak version 4, it is possible to spawn a new Keycloak version 5 node next to the previous ones, and they should work and cooperate together, using the store in the state it was in before the update.
This means that Keycloak version 5 needs to be able to read all the objects that are already stored in the database. It may even happen that some objects were created not by Keycloak version 4 but by, say, Keycloak version 1, so this needs to work as well. It also needs to work the other way around: Keycloak version 4 needs to be able to read all the data created by Keycloak version 5. Thanks to the first requirement, we can be sure that only two adjacent Keycloak versions are ever running, so this requirement holds only for two adjacent versions; Keycloak version 4 does not need to be able to read objects created by Keycloak version 6, for example. As you can see, at the time the Keycloak version 5 node is spawned, the storage is still the same: objects are not updated eagerly, they are updated only when they are written to. For example, if a user changes his or her username and the request ends up on a Keycloak version 5 node, the old object is read, its data is migrated, the username is changed, and the object in the database is updated to the newer, version 5 representation. Running the cluster this way, with many objects still in an older version, can have some performance impact, so we also want to allow the administrator to schedule a task for a specific time at which the objects are read from the store and updated one by one. And that's it for this part.

The important thing to note on this slide is that the storage version does not necessarily equal the Keycloak version. To explain this I used Keycloak version 4 and Keycloak version 5, but these were really storage versions, not Keycloak versions: Keycloak version 6, or 12, can still store objects with storage version 1 if there was no change between those releases. Moreover, individual areas can have different versions: if, for example, the users schema was updated, its version is bumped, but if the realms schema stays the same, the realm version can stay unchanged.
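As a side note, here is a minimal sketch of what such lazy, on-write migration could look like. All the names below are hypothetical (Keycloak's actual mechanism differs in detail); the sketch only illustrates the rule that an old object is migrated when it is read on a newer node and persisted in the new representation only when it is written.

```java
// Hypothetical sketch of lazy migration between storage versions.
// None of these types are real Keycloak classes; they only illustrate the idea.
public final class LazyUserMigration {

    /** Raw object as it sits in the store, tagged with its storage version. */
    record StoredUser(int storageVersion, String username, String email) { }

    static final int LATEST_VERSION = 5;

    /** Bring an object read from the store up to the latest storage version. */
    static StoredUser migrateToLatest(StoredUser stored) {
        StoredUser current = stored;
        // One migration step per version; older objects (version 1, 2, ...)
        // are migrated step by step when they are read on a newer node.
        while (current.storageVersion() < LATEST_VERSION) {
            current = migrateOneStep(current);
        }
        return current;
    }

    private static StoredUser migrateOneStep(StoredUser old) {
        // e.g. a hypothetical version bump that normalizes usernames
        return new StoredUser(old.storageVersion() + 1,
                old.username().toLowerCase(), old.email());
    }
}
```

On a pure read the old representation can still be served as-is; only when the object is actually written, for example the username change mentioned above, does the store receive the migrated, latest-version representation.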
The next requirement was to allow users to use whatever storage they want for their data, and for this we wanted to simplify the addition of new storages. If you know Keycloak, you know that it had quite a lot of methods to implement if you wanted to add, for example, a new user federation: you needed to implement methods like getUserById, getUserByUsername, or getUserByEmail. We wanted to get rid of this, so we introduced a few interfaces, and these interfaces contain just simple transactional operations plus a few bulk operations.

To see how we were able to turn that large set of methods into just a few operations on an interface, we need to look at our previous implementation. We had the following layers. The first layer is services; this services layer is triggered by REST calls, so it is essentially JAX-RS endpoints. The services layer communicated directly with the Keycloak storage layer, and this storage layer combined, in our opinion, too many things. One of them is the session: the session holds all the information about the current Keycloak instance, and it provides the ability to communicate with other Keycloak components. On the same layer there is also the entity, and the entity represents all the knowledge about the underlying storage: it needs to know how the data is stored, how to communicate with the external system, and how to build queries for that system, for example in JPA. And then there are the models, which are basically a combination of session and entity: a model object provides logical operations, and it can make logical decisions based on the data loaded from the storage and from the session. This can be considered a bit of a spaghetti, and we would like to replace it with, let's say, a lasagne.

So we had a look at the methods we had, and here is an example of one of the methods from the JPA realm provider. As you can see, this method can be split into two parts. One part is the physical part: it knows everything about the underlying storage, it knows the data structures, it knows how the data is queried using this JPQL query, and in this case it returns just strings, the IDs of roles. Then there is the logical part, which makes logical decisions based on the results from the database: in this case, if no role is found it returns null, and if a role is found it uses the session to communicate with another Keycloak component, in this case roles, calling the method getRoleById, which turns the storage data (here, an ID) into a role model, that combination of storage data and session.

So we were thinking: if we can split each method we have into two parts, one of them the logical one, which knows about the session and produces models (an example being MapUserProvider), and the other one only the physical layer, which knows all the details about how the data is stored but cannot make any logical decisions based on the request and just returns raw data, then the implementation of a new storage becomes easier, because the implementer does not need to implement the logical layer; it is enough to implement the physical layer, and the logical layer stays the same. It is also important that the logical layer cannot communicate directly with the storage, because it does not know anything about its structure. So, yes, we changed this spaghetti into the lasagne.

So how does the physical layer look? It has just a few interfaces, as I said before. One of them is MapStorage, and MapStorage has only one method, createTransaction. The transaction then contains all the methods I mentioned before: some single-object operations and then a few other operations. You probably noticed that an update method is missing from these operations. That is omitted on purpose, because Keycloak is built in a way that relies on all objects returned from the storage being, in some sense, live objects: any change to an entity returned from the storage should be automatically propagated to the storage on commit. Then there are the bulk operations; these take some criteria, and the criteria represent a logical request, such as "give me the user with this username" or "give me the users whose email starts with M". And that is basically all we have done.
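To give a feel for the shape of this physical layer, here is a simplified sketch. The real Keycloak interfaces differ in details (generics, pagination, the exact criteria API), so treat the names and signatures below as an approximation of what was just described, not the actual API.

```java
import java.util.stream.Stream;

// Approximate shape of the physical layer described above; not the real API.
interface MapStorage<E> {
    // The only method: all reads and writes happen inside a transaction.
    MapKeycloakTransaction<E> createTransaction();
}

interface MapKeycloakTransaction<E> {
    // Single-object operations. There is intentionally no update():
    // entities returned by read() are "live", and any change made to them
    // is propagated to the store automatically when the transaction commits.
    E create(E entity);
    E read(String id);
    boolean delete(String id);

    // Bulk operations take declarative criteria describing a logical
    // request, e.g. "username equals X" or "email starts with M".
    Stream<E> read(QueryCriteria criteria);
    long getCount(QueryCriteria criteria);
    long delete(QueryCriteria criteria);
}

// Placeholder for the criteria type; the real one is a composable
// description of field comparisons rather than an empty marker.
interface QueryCriteria { }
```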
Here is an example of how this all works together. There is a browser making a request, let's say "give me a user by ID from realm A". Such a request lands in the users resource, and the users resource then calls the logical layer; here MapUserProvider represents the logical layer, and there is a method getUserById. The responsibility of this logical layer is to translate the logical request into one of the methods from the previous slide, and in this case it results in a read by ID. Now, the MapStorage is an implementation for some specific storage, so the read can be translated to, say, a SQL SELECT, or an LDAP query, or, if the data is stored in memory in a Java map object, a simple get call. An entity is returned; this is the pure data as stored in the database, and it is returned to the logical layer. The logical layer creates the user model, which is a combination of session and data, and this layer also makes logical decisions based on the data; it can, for example, check whether the user really belongs to the realm. The model is then returned back to the users resource, and the users resource just returns some JSON data back to the browser.
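A rough sketch of the logical layer in this walkthrough might look as follows, building on the transaction interface sketched earlier. The placeholder types stand in for the real Keycloak model classes, and the realm check is the kind of logical decision mentioned above.

```java
// Placeholder model types standing in for the real Keycloak interfaces.
interface RealmModel { String getId(); }
interface UserModel { String getUsername(); }
interface KeycloakSession { }
interface MapUserEntity { String getRealmId(); String getUsername(); }

// Logical layer: knows the session and produces models, but delegates all
// storage details to the physical layer (MapKeycloakTransaction).
final class MapUserProviderSketch {
    private final KeycloakSession session;
    private final MapKeycloakTransaction<MapUserEntity> tx;

    MapUserProviderSketch(KeycloakSession session,
                          MapKeycloakTransaction<MapUserEntity> tx) {
        this.session = session;
        this.tx = tx;
    }

    UserModel getUserById(RealmModel realm, String id) {
        // Translate the logical request into a physical read; depending on
        // the MapStorage implementation this becomes a SQL SELECT, an LDAP
        // query, or a plain map.get() for the in-memory storage.
        MapUserEntity entity = tx.read(id);
        if (entity == null || !realm.getId().equals(entity.getRealmId())) {
            return null; // logical decision: unknown user, or wrong realm
        }
        // Model = raw entity + session; a real adapter exposes all fields.
        return entity::getUsername;
    }
}
```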
To summarize what we have done: we simplified the data model, split it into logical areas, and allowed loose coupling between those areas. We separated the Keycloak storage layer into two parts, a logical layer and a physical layer, and we simplified the implementation of custom storages: it is enough to implement only the physical layer, and there are fewer methods to implement, just a few operations. And we prescribed the rules that implementers should follow to achieve the no-downtime upgrade functionality. Now, maybe you are wondering about composability. At the beginning I said that Keycloak needs to compose data from multiple sources, for example from a database and LDAP. That was possible previously, but now that we have split the Keycloak storage layer into two parts, the old way no longer works, so we found another way, and that will be part of the next presentation.

So, do you have any questions?

Can you hear me? Yes? Sorry, I had a connection issue. I see there are several questions in the Q&A section, so I would start with this one: will there be a default implementation which is based on the same logic as today, with a relational database in the storage layer? Yes, there will be. We are currently working on basically two implementations, one is JPA and the second one is Hot Rod, and each can be used separately on its own. So it will be possible to use just the relational database, preserving all the referential integrity as before, but with a different schema.

There was one question, I think regarding the previous slides: is it still possible to use existing user storage provider implementations with future Keycloak versions, or is MapUserProvider just a different way to do it? I can take this one. The user storage provider is based on implementing the operations on the logical layer; that means you have to deal with the session, make sure that all the data is transferred properly into the models, and you work with the models in that provider. We wanted to remove this requirement; I will speak about that later on in the tree storage part. Basically, to allow extensibility not only for users but for any other area as well, we will rework this extensibility. I'm not sure whether there will be a direct replacement for the user storage provider; there will probably be a need to refactor the user storage provider into a map storage. In the future we would like to get rid of this very specific user storage provider, because we don't have any group storage provider, we don't have any role storage provider; the client storage provider is kind of hacked in there, but the other areas cannot be extended at all. We want to make this consistent across all the areas, and thus re-implement the user storage provider.

The next question: is it possible to build implementations today? Are there example implementations we can look at? Yeah, I can take this one. It is possible to build an implementation today; as I said, we already have JPA and Hot Rod, and they are already in main. They don't cover all the entities yet, but we are working on this now and it should be finished, let's say, soon. Let me add one more thing on top of that: there is a full in-memory implementation based on concurrent hash maps which can serve as a sample. We are not there yet with the documentation, but you can have a look; you can try it with the WildFly distribution, and there is a script in the examples which allows you to set Keycloak up using the concurrent hash map storage. We are also working towards the documentation so that we can open this up to the community as well and get better feedback.

Thank you for the answers. There is another question: if you break up the storage into different implementations, how will the transaction semantics be? Let me take this one; this one is actually very tricky, and thanks a lot for the question. Ideally, if all the storages had XA transactions, we would be well off, but we cannot do that even with the current storage: if you go to LDAP, for example, LDAP has no transaction semantics at all, so we are doing our best effort there. It is very similar to what we are doing with Infinispan: we are still trying to do our best without necessarily using XA transactions, because those are performance-inefficient. I will touch on that slightly in the next presentation as well. However, if you keep the transaction only on the level of a single database, it would still be a standard database transaction: if you keep your data sources in that one database, you can still rely on the old transaction semantics, because the transaction, if crafted properly, would be shared across all the areas; there would be a single Keycloak transaction referring to the same database transaction from the different areas. So basically the transaction behavior will still be there.

Thank you. There is one more question: how are breaking changes of the storage physical layer handled to provide no downtime, assuming that this translates to a storage breaking change too? I'm actually not sure I follow completely, so let me answer it slightly more broadly. The implementation of a storage physical layer has to handle no-downtime upgradability by itself: each of the storage implementations has to respect the rules that Michal stated, and then no-downtime upgradability is possible. However, these rules are not the only option; if you come up in your implementation with some other rules that guarantee no-downtime upgradability, you are definitely free to go with those.
The no-downtime upgradability is really hidden in the details of the implementation of the map storage, so from the Keycloak perspective, from the perspective of the services, this is completely hidden; it is a layer below what the services can see. I hope that answers the question; if it's not clear, don't hesitate to post another question, or we can also discuss it after the session.

Okay, I don't see any more questions right now, so please go on; you can continue if there are no more questions.

Apologies, that was the wrong screen; I will try to share the right one. This one? Yeah, that should be it. So welcome, everyone who has joined since Michal's talk. I would like to speak about tree storage, which is basically storage composition, and I would like to warn you in advance that this talk will be a deep dive. One more thing: tree storage, contrary to map storage, is only partially in the codebase; it is still under construction, so what I'm presenting here is the current state of affairs, and I will be glad if you provide me with any kind of feedback, for example on GitHub discussions, that would be perfect, or obviously after the talk in the area that we have for this.

So, tree storage is about composition. Michal already showed an example, and I will go through that example to answer the question of whether composition is really needed. This is the motivation slide that you have already seen, slightly amended. The setup is the same: there is a browser, it wants to get some application page that is protected, and it reaches out to Keycloak. Keycloak checks whether this application is known, so it actually goes to the database; that's one access. Once it learns of the client, it asks for the username and possibly password, and then, after getting that information, it checks whether the user is known, and it actually checks two sources, the database and LDAP. Note that these two sources mean that one and the same object is being composed from two sources. It gets even slightly trickier if you go to two-factor authentication: the two-factor authentication parameters are usually not stored in LDAP (well, they may be, but usually you would get them from the database), and you would still be speaking about one and the same object, one and the same user. Only after you get this data and see the full contents of the user are you able to determine which parameters you should be using for that particular user, and eventually you get to the protected page, which is served by the application, or not, if you don't pass the two-factor step properly.

Now the current architecture. I'm slightly repeating what has already been said, but there may be some of you who weren't here for the previous talk. The current architecture leverages embedded Infinispan and a relational database, and it is important to realize that Infinispan and the relational database each play two different roles. Infinispan works as a cache for certain areas, for example to cache some user data, but it also works as a persistence layer, for example to store the sessions and share that state across the cluster. The relational database is kind of similar: mostly it is used for persistence, but if you have some user federation, for example LDAP or your own custom user federation, it may be set up so that it caches some of the data; it keeps a synchronized state of that particular user federation in the database, and in that way it serves as a form of cache.
In the current architecture there is the user storage provider that was mentioned in the previous question, so there is support for extending user storages; but, as I said in that answer, the implementation sits on the logical layer and has to deal with a lot of, let's say, boilerplate code that turns what is stored in the data store into what is expected by the Keycloak services, that is, the model. There is some limited ability to extend clients, and there is no ability to extend the other areas, that is, roles, groups, events, whatever; no support at all. We wanted to change that with the map storage, but as we have seen, composition is necessary, so we need something more: we want to leverage any map storage, be it your implementation or a Keycloak implementation, and somehow stack them on top of each other, for any custom storage in any area. That means you would be able to have your own custom implementation of, say, a group storage or a role storage, you would be able to leverage what is offered by Keycloak, and you would implement it only on the physical layer, so your implementation stays simple.

So here we come to tree storage. Tree storage is a tool for organizing map storages into a tree. Imagine an admin who would like to have this layout of storages: there would be a cache, realized by an embedded Infinispan or a Hot Rod connection, which would get its source data from two data sources; ultimately an LDAP which, because it only contains a part of the object, has to be supplemented by, for example, a JPA storage that contains only the attribute values, and a REST storage. That maps to a tree; if you check the colors, they should match, and you can see how the tree is organized in the picture. Of course you can have another layout: if, for example, those REST calls also don't return everything, you would want the attributes supplemented by the JPA storage for the REST storage as well. Importantly, we have the map providers we require, we are able to plug into any map storage, and, importantly, the tree storage is itself a map storage: it is an implementation of the physical layer, and that simplifies plugging it into Keycloak itself, because the plumbing for plugging in a map storage is already there.

Now, when I mention that there is a user, there is ultimately some store that decides whether the user exists at all; I'm not speaking now about the properties, the email, the name, or whatever, but about whether the user itself exists, and in the tree storage that is one of the leaf nodes. If there is a need to decide whether an object exists, the decision ultimately goes to one of the leaf nodes; which one is something to be discussed later on. The inner nodes, like this one, can cache some of the fields, or they can supplement the object retrieved from the storage below with additional fields, as we have seen in the previous example with the attributes. We will also need the concept of an authoritative node. The authoritative node is tied to search criteria: it is a node that may contain the object in its store. Each of the nodes is tightly coupled with a single storage, so if it can contain an object matching the criteria in that store, then it is authoritative. Let me give an example: if we search by username, then in this case we would search the LDAP and the SCIM storage directly; and if we search by some attribute, assuming here that SCIM contains the full description including the attributes, then it would actually be these two nodes that are authoritative.
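Since tree storage is still under construction, the following is only a conceptual sketch of how nodes, child storages, and authoritativeness could fit together; all of the names here are invented for illustration and are not real Keycloak API.

```java
import java.util.List;
import java.util.Set;

// Conceptual sketch only: tree storage is under construction.
final class TreeStorageNode<E> {
    final MapStorage<E> storage;             // each node wraps exactly one map storage
    final List<TreeStorageNode<E>> children; // e.g. a cache node over LDAP + JPA
    final Set<String> authoritativeFor;      // criteria this node can answer on its own

    TreeStorageNode(MapStorage<E> storage,
                    List<TreeStorageNode<E>> children,
                    Set<String> authoritativeFor) {
        this.storage = storage;
        this.children = children;
        this.authoritativeFor = authoritativeFor;
    }

    // A node is authoritative for a criterion if an object matching it could
    // live in this node's own store, e.g. the LDAP node for "username".
    boolean isAuthoritativeFor(String criterion) {
        return authoritativeFor.contains(criterion);
    }
}
```

In the layout described above, the cache node would sit at the root, with the LDAP node (supplemented by JPA) and the REST node as subtrees below it.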
This authoritative node is a concept to keep in mind. One more thing related to the tree storage: there are properties applicable to a storage which, in the current view, are all stored in the same place; for example connection properties, alongside properties for, say, synchronization, which are generic properties that relate to any storage. With the tree storage there is now a distinction between properties that are specific to a particular storage and properties that are specific to the structure. Do I want, for example, to validate each object that I read from some cache? Then that is not a property of a particular storage, it is not something that relates to the connection itself; it is something that relates to how this tree is actually processed, so it is a property of a node or an edge; I'm not going into details now. Similarly, for a read-only storage you would use the same implementation as for a read-write storage and just mark in the node that this storage is read-only.

Now, to understand a little bit more about the anatomy and how these compositions really work, we need to get into the entities, and this is going to be interesting. In the physical map storage implementation, the entity is the general representation of the raw object; by raw I mean that no models are referenced. For example, in the user entity you wouldn't find "give me a role", but you would find "give me a role ID". If we replace this Java code, which you may be familiar with, with a picture, which will be used for the rest of the presentation, the MapUserEntity is basically what is marked here with the dashed line.

Now, two kinds of storages. The first one is simply a native storage, which contains everything in that particular entity type; for a user, in JPA you would get every single property stored in some column, and what type of column it is is irrelevant for this part. The other type of storage is the partial storage, and a perfect example is exactly the LDAP we have seen earlier: LDAP contains just a few properties. How do we somehow make a MapUserEntity out of these properties? And now it is becoming very interesting. Let me say it slightly differently: we would turn the LDAP jpegPhoto and street properties into two attributes, street and photo, and we know that this LDAP doesn't contain anything regarding the required actions. So what we need is to store, in the node, the status of the entity fields: we know that the LDAP node needs to produce objects of this type, MapUserEntity, and we know that only a few of the fields are really filled, so we have to keep track of those. That is exactly what we are doing here: the node keeps track of the status of the fields. There is a primary source, which may also be partial, that is, for the attributes street and photo and nothing else; and there is also the information about what is not handled by that storage at all. So we have this entity composition, and we have a native map storage that is able to supplement the missing attributes, the missing fields, by keeping track of the identifier of the lower object in, let's say, the LDAP and in the JPA, and by being able to say exactly which fields are stored in JPA and which fields are stored in LDAP, and then acting accordingly. I will leave the details for you to check in the presentation attached to this talk.
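The field bookkeeping just described could be pictured roughly like this; again, everything here is hypothetical, sketching how a node might record which MapUserEntity fields its storage is the primary source for and which must be supplemented from below.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of per-node field status tracking; not real API.
enum UserField { USERNAME, EMAIL, ATTRIBUTES, REQUIRED_ACTIONS }
enum FieldStatus { PRIMARY_SOURCE, PARTIAL_PRIMARY_SOURCE, NOT_HANDLED }

final class PartialStorageBookkeeping {
    private final Map<UserField, FieldStatus> status = new EnumMap<>(UserField.class);
    private String lowerObjectId; // id of the matching object in the storage below

    /** Status map for the LDAP example above. */
    static PartialStorageBookkeeping forLdapExample() {
        PartialStorageBookkeeping b = new PartialStorageBookkeeping();
        b.status.put(UserField.USERNAME, FieldStatus.PRIMARY_SOURCE);
        // street and jpegPhoto map to two attributes; all other attributes
        // must come from the supplementing (native) JPA storage.
        b.status.put(UserField.ATTRIBUTES, FieldStatus.PARTIAL_PRIMARY_SOURCE);
        b.status.put(UserField.REQUIRED_ACTIONS, FieldStatus.NOT_HANDLED);
        return b;
    }

    /** True if a supplementing storage has to provide this field. */
    boolean needsSupplementing(UserField field) {
        return status.getOrDefault(field, FieldStatus.NOT_HANDLED)
                != FieldStatus.PRIMARY_SOURCE;
    }

    String lowerObjectId() {
        return lowerObjectId;
    }
}
```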
Now let me get straight to the read operation. If we have a read operation that reads from the very top element, it retrieves exactly this object. Now, if we want to get a username from this object, we know that the username has been cached from LDAP, so it can be returned immediately from this object. If a setter is called, then we know that, while the username is cached here, it ultimately needs to be changed in LDAP, so we lazily load the LDAP object and change the username there. We don't change it in the LDAP server itself right away; that is related to the transaction properties you asked about earlier; but upon commit it is written through into LDAP. I will probably skip this next slide as well, because it is just a generalized example, and let me quickly get to the summary. In the map storage we need a few more operations that keep the linking between the individual original and supplementing objects, or caching objects, depending on the use, plus some validation and invalidation. And that is basically all that was possible to fit into this presentation. So, Lubomir, if you want to take over, feel free to, and thanks a lot for your attention.

Thank you very much, guys, for your presentations, it was really great. This is the end of this session. We don't have any time left for questions and answers, but if you want to discuss anything, feel free to go to Discord, or you can move on to WorkAdventure, the great virtual platform where you can interact with each other. So feel free to go there and discuss anything related to these presentations. Thank you again very much.