Hey, good afternoon everybody. Glad you all made it on the last day of the summit, in the very last slot; pretty impressive. We did a talk in Vancouver in pretty much the same situation, so we're the curtain call act: after this, everyone goes home. My name is Andy McCrae and this is my colleague Jesse Pretorius. We both work on the development and engineering team at Rackspace, in the private cloud department. We're both based in London, and in case you're picking up an accent, we're both originally from South Africa. When Jesse starts to talk, that's going to become a lot more apparent.

We started around six months ago looking at federation and a global-cluster, multi-region cloud solution that we wanted to implement in the OpenStack-Ansible project, which we both work on. So we started looking at the options for how you solve the problems that come up when you start talking about multi-region clouds. Just to be clear, we started all of this for Kilo. There have been a whole bunch of improvements that Jesse will mention in terms of changes in Liberty, but when we started doing this it was entirely Kilo. I personally worked on the Swift global clustering side and Jesse did a lot of the work on the auth federation and identity side, so hopefully we've got all the bases covered. We'll try to let you know what we learned and which parts are challenging when you're putting these solutions together.

So let's start at the beginning: why would I even need a multi-region cloud? The most obvious reason is that I have two physical data centers, or other locations, and I want a cloud in each that I can access. That's partly a performance issue and potentially even a legal one; Germany is a really good example of a place where certain data has to be stored inside the country for legal reasons around banking and various other things. But I might want to manage my two clouds from one location, which isn't necessarily in that same place. And performance-wise, if I'm in a certain country I probably don't want to be accessing a cloud that's kilometers and kilometers away, because it's just not practical and you're going to get performance issues. So that's the first, most obvious reason.

Second, let's say you have a really large cloud and it's all in the same data center. What if the cloud grows to the point where it's no longer manageable? No one has really documented how far you can get with one cloud before things start falling over, but thinking about the database and the various other resources that are finite, how do I scale that out? Regions can be a pretty good way of doing it, because I effectively have two clouds within the same data center or area that are managed more or less separately, and then I put an overlay on top to manage both of them.

And lastly, you might want to segment concerns: maybe I have quite a small cloud, but my company has two separate departments that can't use each other's resources.
I can use regions to separate those out, and that's a pretty decent use case. I think it's actually probably one of the more common ones, because there aren't that many clouds that scale to the size where you would need multiple regions, and there aren't that many people doing multi-data-center regions across the globe.

So what were we actually looking for? The main thing was that we wanted compute resources in two separate regions, potentially in different data centers, accessed in the same way. We also wanted Swift: we wanted a situation where I could put objects into object storage in one data center and then retrieve them from the other data center, and vice versa. It's a pretty simple use case, and we found it was useful to start very small, look at what's there, and build up from that.

Federation itself raises an interesting social issue, which is that there are a lot of buzzwords around it. There are a lot of words with really strange definitions where it isn't clear what they actually mean, and in some cases a word means more than one thing depending on what you're talking about. That makes it really difficult when you're talking to your management chain, or to people who aren't involved in the implementation; it becomes confusing, and it's hard to say "we're trying to achieve these things and this is how we're going to do it" when there's confusion around the words themselves.

Even just starting with the word "federation": people have been talking about federation in OpenStack for the last couple of years and it's a bit of a buzzword, so when I looked up what federation actually means I was a little surprised that there was no mention of magical cloud things, unicorns, ice cream, rainbows, a cure for cancer or world peace. Everybody has apparently been doing federation, yet there's not much documentation on what they've been doing and no one seems able to show that they've done it, so I'm a little skeptical. It's been a topic for two years and it apparently solves all your cloud problems. That's not really true: it's good for a couple of use cases, it does have a place, and it does address some key points, but the idea that federation is a catch-all that will solve every problem you have with your cloud is kind of silly.

One of the things I mentioned is confusion between words that have two separate meanings. I worked on the Swift side, and that's actually a perfect example.
The terms Swift uses for regions and zones are mirrored in Keystone and the other OpenStack services, which also have regions and zones, but they don't mean the same thing. When we talk about regions and zones in Swift, what we're really talking about is a degree of "least alikeness". Swift replicates objects, accounts and containers to the least-alike available node, and the ordering goes from regions, which are the least alike, down to zones, then servers within a zone, then drives within a server. So you can imagine a region as an area within a data center or even a different data center, a zone as a rack within a data center, a host as just a server, and then a drive within that host. Swift will try to replicate objects to the least-alike places, so if I have two regions and three replicas, there will be at least one replica in each region.

In Nova, in compute generally, and in identity, when you talk about a region you're really talking about different endpoints. I access an endpoint like region one or region two, and I'm actually just hitting a separate Nova API endpoint that controls a different set of compute hosts; those hosts don't care much about the compute hosts in the other region. In Swift, on the other hand, you're really talking about one massive cluster: the nodes all replicate to each other and all have to know about each other, even though they're potentially in separate places. Zones are the same story: availability zones in Nova are groupings of compute hosts within a region.

I've had the conversation about the difference between Swift and compute terminology a lot, so it is a real issue, and some decisions were actually made on the basis that these two things are the same. I had to go in and say: these are not the same things. In Swift it's still one cluster and the nodes need to know about each other; you can't just have two separate Swift regions floating around that don't care about each other, which you can do with Nova, where the servers in one zone never need to know about the servers in another zone. So it's obvious why this can be confusing, and it becomes especially difficult when the people making decisions on products aren't involved in the implementation side of it.

There are a couple of other important terms, mostly around authentication and identity. Things like projects, or tenants, are a little bit confusing because the name changed and then kind of changed back again, but essentially a project is a grouping of resources within the identity service. Then you have domains, which are the owners of projects, and then users and groups, which I think are reasonably self-explanatory. And it's all changing again, as always; there's some confusion around the Keystone terms because they keep changing. From v2 to v3 the terms are different, and back before Keystone Light they were different again.

When we started looking at this, it became obvious really early on that the main issue is identity: how can I authenticate against both clouds? If you imagine you had two regions, or more, in separate data centers across the globe and you had no authentication at all, just open to everyone, there's actually no issue aside from needing a way to specify where I'm accessing the resources from. Because there's no auth, it doesn't care: the users are all effectively the same and it will just work.
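That "where am I accessing it from" part is really just picking a region-scoped endpoint out of the Keystone service catalog. As a rough illustration of the compute side of the terminology, where a region is nothing more than a separate endpoint, here is a minimal keystoneauth1 sketch; the URL, credentials and region names are made up for the example:

```python
from keystoneauth1 import session
from keystoneauth1.identity import v3

# Hypothetical credentials and auth URL, for illustration only.
auth = v3.Password(auth_url='https://keystone.example.com/v3',
                   username='demo', password='secret',
                   project_name='demo',
                   user_domain_name='Default',
                   project_domain_name='Default')
sess = session.Session(auth=auth)

# One set of credentials, two "regions": each region is simply a different
# compute endpoint in the catalog, backed by its own set of compute hosts.
for region in ('RegionOne', 'RegionTwo'):
    url = sess.get_endpoint(service_type='compute',
                            interface='public',
                            region_name=region)
    print(region, url)
```

Swift, by contrast, stays one cluster no matter how many regions it spans, which is exactly the distinction being drawn above.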
So now that we know we need to authenticate, how do we do that? The problem is that the way Swift works, compared to Nova and most of the other OpenStack services, is quite different. It goes back to the idea that a global cluster in Swift is really just one entity that happens to be very large, whereas in Nova it's two separate regions: two separate API endpoints and two separate sets of compute hosts that don't care about each other.

That creates issues, because in Swift the account you access your objects under is based on the project or tenant ID. If each region keeps its own identity database, the project or tenant IDs will be different in each region, and the problem with that is that although the data underneath is being replicated and is technically present in the other region, it isn't accessible by any user there, because the user that created it belongs to region A. So even if I had the exact same user, authenticating in the same way and able to use Nova resources in the same way, if they came from two clouds where the tenant and project IDs weren't in sync they wouldn't be able to access the same objects. In fact you'd get two separate sets of objects for the exact same user, depending on which region you accessed from. That means you're quite limited in how you can do the authentication, and that's one of the key issues we found really early on.

The thing is, Nova doesn't care, and there are other solutions. There is a solution for Swift called container sync, which is a bit older than global clustering: it's essentially two separate Swift installs that then just sync containers. So that is a possibility. You add it to the Swift pipeline, give it a list of the other clusters you want to sync to, and it will sync certain containers. But it comes with a different overhead; the management overhead is actually a bit higher, because you have to specify, when you create containers, that they should be synced. It doesn't just happen automatically the way it does with regions.

So hopefully I've framed the issue a bit, and you have an idea of what we're trying to solve for and some of the problems we ran into.
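To make the container sync option above a little more concrete, this is roughly what it looks like from a client's point of view. It's a minimal sketch using python-swiftclient; the auth URL, credentials, container, sync key and the myrealm/region-b realm and cluster names are all invented, and that realm/cluster pair would have to be defined in container-sync-realms.conf on both clusters:

```python
from swiftclient import client as swift_client

# Hypothetical Keystone v3 auth against the "region A" cluster.
conn = swift_client.Connection(
    authurl='https://keystone.region-a.example.com/v3',
    user='demo', key='secret',
    os_options={'project_name': 'demo',
                'user_domain_name': 'Default',
                'project_domain_name': 'Default'},
    auth_version='3')

conn.put_container('photos')

# Mark this specific container for syncing to its counterpart in the other
# cluster. Nothing is synced unless these headers are set, unlike a global
# cluster where replication just happens.
conn.post_container('photos', headers={
    'X-Container-Sync-To': '//myrealm/region-b/AUTH_<project-id>/photos',
    'X-Container-Sync-Key': 'a-shared-secret',
})
```

The destination container needs the same sync key set on it, which is exactly the per-container management overhead described above.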
Jesse is now going to run through some of the ways we approached the problem and some of the things we found while implementing.

So, there are a number of ways we can approach the problem of dealing with identity. Ideally we want a common source, or at the very least a common source for the project IDs, because for the Swift global cluster use case, where you want to access the same objects from both sides, the project IDs at the very least need to be the same. There are two stable solutions to the problem. The first is that you replicate your identity database; there may even be people in the room who have tried this. It's a bit expensive running a Galera cluster across multiple data centers, potentially across multiple countries, because it's a per-transaction replication process. You might instead do something that's more of a bulk, differential sync, which is periodic, but then the object I just put in region A isn't yet visible from region B. It'll only be there in five minutes, ten minutes, an hour, whatever the interval is. That's not exactly ideal, but it is one way you can do it.

The other issue with that approach is that you're essentially running this global cluster with only one internal data source to look at for identity, and you're also carrying the identity information around with it, which has other implications from an audit, security and management-overhead standpoint. And depending on how you run your Keystone and which token method you use, it can get very expensive because of token revocation, and because of making sure that the same token can be used across both regions. So ultimately database replication is practical, it works, we know it works, it's a fairly well-known solution, but it's expensive in many different ways.

Then we could use an LDAP backend. LDAP and directory services in general are well known, they're routinely replicated across WANs and wide areas, and they're familiar to infrastructure engineers, so that again is a well-understood solution to this problem and it can solve it. But the Keystone LDAP integration backend is not great; anybody who has worked with it will probably vouch for that. The code is a bit of a mess, the Keystone team know about it and are trying to improve it, but they'd rather do something that we also think is a better solution, and that is to make identity and the authentication process somebody else's problem. The advantage of doing that is that it doesn't have to be just one other entity's problem, it can be many entities' problem, and then your only challenge is setting up a trust. It is quite a different workflow, though.

Federation also means many things to many people, as Andy has already said. It's new technology, it's definitely very fresh in OpenStack, and that was most definitely a challenge we hit: not a lot of documentation, not a lot of experience in the community using it. There's a bit of tribal knowledge but not a lot of explicit knowledge. But if you can outsource the identity management portion of things, then that focal point moves out of Keystone. Keystone no longer has to be the endpoint for authentication and doesn't have to manage it, so the overhead is a lot lower.

Another problem: if you have one region, one cloud, one installation with a common Keystone database, it only scales so far. So maybe you build another one, and you don't care so much, because the objects are replicating and you've got a nice little DR scenario: your objects are over there, and maybe you're prepared to restore your Keystone database over there so you can access them, and that's part of your DR process. That's great. But what happens when you hit that scale issue again? Then you build another one, and now you've got three to manage, and it just grows and grows. Ultimately that doesn't feel like a good solution; there must be something better. So if you're centralizing the authentication, we've discussed the Galera bits and we've discussed the LDAP bits; if you're using federation then you've got to deal with the new tech, which is a different cost, but it can be managed in a different way, and federation is becoming more of a common use case.
Actually, one of the nice things about working with OpenStack-Ansible is that you can use it as a reference point for how to configure this stuff: our playbooks and templates have become, in effect, a document of one way to do it, so please feel free to use them.

Right, so one of the great things about working with federation is that the identity issue sits outside of Keystone, and the endpoint for authentication, and potentially authorization, is also outside of Keystone. That takes away the identity issue, but it doesn't take away the project ID issue. Even if you've taken identity and authentication out of Keystone's domain, you still have a problem when you're working with something like Swift, because you still have a project ID that's used in the storage account name your containers live under. So you're still in a situation where the auth might be fine across both regions, but you now have an access problem. And the reality is that today, as of stable Kilo, that's not something federation can solve; as of stable Liberty we haven't verified it, but it doesn't look like it can solve it either. What that actually means is that if you want to do a global cluster, you're kind of stuck with using something else as well. Not good news, right?

So the reality is that we have to use the best of both worlds, for now at least. The Keystone project database does need to be synced. The nice thing is that you can sync your Keystone database while the identity bits are outsourced through federation, and if you also add in another feature that arrived in Kilo, Fernet tokens instead of UUID tokens, then tokens are no longer being persisted in the database at all, so the replication is not nearly as expensive, and that's cool. The solution is still a bit complex, but it's not so bad. Another option, of course, is that you just have one identity cluster somewhere: one common access point held in one place, or perhaps across two regions that are close enough together that the expense isn't so bad, at least from a data-transaction point of view.

Right, so a few lessons learned. One lesson was that product teams don't understand the terminology in the same way that we do, so that's a little bit of fun. Another thing we learned is that federated authentication for Keystone is not in the clients: once you have your token you can do a lot of stuff, but getting the token is not yet baked in. Work is ongoing in the Keystone team, and they're doing a lot of it. I haven't checked the status in the latest Liberty bits, but last I saw there were still issues trying to use the OpenStack client to actually authenticate via SAML. I think the RDO side of things might be a little bit better, and I think Red Hat has done quite a bit of work to get Kerberos working.

Another lesson learned is that when you're using federation you suddenly don't have a user, unless you choose to attach an external user to an internal one. It becomes a little weird when, as an administrator looking at your cloud, you try to map something that was created back to a user account: there is no user account, and you can't see one. If you look in your logs you'll see there is an identifier, but it's whatever the identity provider chose to share as an identifier, so control over what the ID for the user is sits outside of your domain. That takes a bit of getting used to, and from an administration and control standpoint it means you actually have to change the way you work a little and figure that out. It's not as simple as you may be used to.
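That identifier chosen by the identity provider is consumed by Keystone's mapping engine, which comes up again in the Q&A at the end. As a rough illustration of its shape, here is what a mapping that turns a federated user into an ephemeral user plus membership of a local group can look like; the remote attribute and the group ID are placeholders for whatever your IdP asserts and whatever group you created:

```python
# Keystone federation mapping rules, expressed as the Python/JSON structure
# the mapping API accepts.
mapping_rules = [
    {
        "remote": [
            # The identifier the identity provider chooses to share, e.g. the
            # SAML attribute your service provider exposes as REMOTE_USER.
            {"type": "REMOTE_USER"},
        ],
        "local": [
            # "{0}" is substituted with the first remote match above; the user
            # is ephemeral, so there is no row for it in the Keystone database.
            {"user": {"name": "{0}"}},
            # Membership of this group is what actually grants access, because
            # the group holds the role assignments on projects and domains.
            {"group": {"id": "<federated-group-id>"}},
        ],
    },
]
```

Rules like these get loaded through Keystone's federation mapping API and associated with an identity provider and protocol.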
Yeah, and I think from the Swift side some of the issues we had were around implementation as well. Again, because the concept is one whole cluster being deployed, which is different from how you'd think of federated regions as separate entities, it means that when I go to deploy a cloud with Swift in it, I've really got two clouds with Swift in them that need to know about all the other Swift nodes in order to replicate. So do I just deploy one large Swift cloud and then maybe change the proxy servers per region? Or do I deploy Swift in two regions and then have to sync them up separately, by adding the rings and the SSH authorized keys and so on to each of the Swift hosts in both regions? There are a couple of tricky things you run into when it comes to implementing Swift itself.

And again, one of the things we considered quite early on is that perhaps it's an application issue: you could quite easily solve this by pushing the authentication problem up to your application. It's not the best solution in terms of management, but it is a solution that could work for you. If you code into your application that you're calling two separate clouds with two separate sets of auth details, it will still work. So there are a couple of other things you could do to push this away from being an OpenStack issue, but as Jesse said, the overhead you start to take on is just too much to make it useful, unless you only have two regions or a very small use case.

Another thing: for the federation we decided to use SAML as the protocol, and there are two ways to implement Keystone as a federated service provider. By the way, what Keystone basically does is outsource the authentication portion to a plugin, so you can write your own plugin if you want, and that might make things simpler; but for SAML you have two options, mod_auth_mellon or the Shibboleth module. We picked Shibboleth because it was the best documented at the time. It took a while to make it go, including having to read source code, C source code, which wasn't fun, because Shibboleth is largely used in the education community; the documentation is sparse, and what is there is often out of date, so trying to figure out how to make it work was a lot of fun. We've done that for you, though, to keep it simple. That said, we haven't done a perfect implementation either, because there are just a lot of options, and we will try over time to make it a bit simpler and use a bit more of the functionality that's there, but it's a bit scary running through that stuff. Anything else?

So we just wanted to quickly talk about the OpenStack-Ansible project, which is where the work we've done on the global clustering for Swift and the federation work for Keystone lives. We've got the link up there for the GitHub repo; it's in the OpenStack namespace now. One of the other things is that because federation is a reasonably complex thing, some of the options and how you actually set it up are a lot more difficult than we'd like them to be.
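Coming back to the Swift deployment question for a moment: whichever way you deploy it, both regions end up sharing one set of rings, and the region and zone hierarchy described earlier is expressed directly in how those rings are built. Here is a minimal sketch; the device addresses and weights are invented, and most deployments drive this through the swift-ring-builder CLI rather than the Python API:

```python
from swift.common.ring import RingBuilder

# part_power=10 gives 1024 partitions, with 3 replicas, and devices locked
# for at least 1 hour between rebalances.
builder = RingBuilder(10, 3, 1)

dev_id = 0
for region in (1, 2):          # two data centers: the "least alike" tier
    for zone in (1, 2):        # two racks (zones) per data center
        builder.add_dev({'id': dev_id, 'region': region, 'zone': zone,
                         'ip': '10.%d.%d.10' % (region, zone), 'port': 6000,
                         'device': 'sdb', 'weight': 100})
        dev_id += 1

builder.rebalance()
# With three replicas and two regions, every partition ends up with at least
# one replica in each region, which is why every node in both regions has to
# carry the same rings and be able to reach the others to replicate.
```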
As Jesse mentioned, the federation setup is not a perfect implementation and we'd love it to be simplified. Another caveat is that no one has actually used it for production yet; we got it in there more to show that it works, that this is a way you could do it, and that it's a good starting point. Consider it experimental. We have some documentation up as well; the docs are referenced from that URL. We covered Keystone-to-Keystone federation, so that's Keystone as an IdP, and if you don't know this already, Keystone is not a very good IdP. That's somewhat intentional, because there are other, more mature IdPs out there. But what we've done will work with a standard Shibboleth IdP and will work with Microsoft ADFS 3.0; both of those support SAML 2 as well.

Another thing on the Ansible side, really quickly: even if you don't want to use the OpenStack-Ansible project for this, or you're using some other deployment tool, you can have a look at the roles and tasks. They're pretty clear-cut Ansible, quite easy to read, and you can see how we're doing it and then take that and repeat it, or give it a go and see if it looks right to you. As far as I recall, the Puppet community is already building things that reference what we did.

I think at this point we've got just over ten minutes left, so if there are any questions... we thought we'd have a picture of a federation gone bad. Don't trust that guy. If you do want to ask questions, there's a mic at the front, if you could use that.

Is the federation done on a per-domain basis, or is it everything or nothing? And also, how do you go about doing role assignments? With LDAP you can map to a group, but if you can't see the groups, how do you assign roles?

That's part of the admin overhead with federation, or any of the stuff that's outsourced from an authentication point of view for Keystone. Keystone has a mapping engine, which is kind of like its very own scripting engine, and it's a little bit of fun to figure out. During the process of figuring it out we interacted with the Keystone guys to make it better, so you're welcome; the documentation is better now. But essentially you can map an external user based on whatever the IdP is sharing, and on whatever you've chosen as the user's associated ID from what the IdP shares, and this is all chosen on a per-IdP basis. You can map that to an existing user if you want, so if you already have a user in your database you can connect those two dots, and if you do that, then that user is already a member of projects, domains and so on, and it will inherit those. So that's an easy way, but it's obviously not the only way. The way we chose to implement our base example use case is that if a user comes in as a federated user, it gets membership of a group, and that group has access to projects and domains. As far as I know, that's the only way to do that mapping at the moment: I don't think you can map users directly to domains, you have to map them to a group, and the group has membership of the domain. As far as I know that's the only way right now. So it's map a user to a user, or map a user to a group. Another caveat is that Keystone v3 is the only way this works; you cannot do it with Keystone v2.
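For the group-based approach just described, where a federated user lands in a group that carries the actual role assignments, the admin-side setup is ordinary Keystone v3 plumbing. A minimal sketch with python-keystoneclient follows; the names, credentials and the role name are placeholders for whatever your deployment uses:

```python
from keystoneauth1 import session
from keystoneauth1.identity import v3
from keystoneclient.v3 import client

# Hypothetical admin credentials.
auth = v3.Password(auth_url='https://keystone.example.com/v3',
                   username='admin', password='secret',
                   project_name='admin',
                   user_domain_name='Default',
                   project_domain_name='Default')
keystone = client.Client(session=session.Session(auth=auth))

# The group that the federation mapping drops users into.
group = keystone.groups.create(name='federated-users', domain='default',
                               description='Users asserted by the SAML IdP')

# Grant the group a role on a project; federated users then get access through
# group membership rather than through a local user record. The role name
# varies by deployment (often '_member_' or 'member').
project = keystone.projects.create(name='demo-federated', domain='default')
member_role = keystone.roles.find(name='_member_')
keystone.roles.grant(member_role, group=group, project=project)
```

Granting the same group a role on a domain instead of a project gives the "group has membership of the domain" arrangement mentioned above.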
Move along. Yeah, we actually didn't mention that: there were some issues with getting some of the services to work happily with Keystone v3 as of Kilo. To be honest, I think most of them have been resolved now that Keystone v3 is more mature, but there were some issues around that as well. So Keystone as a v3-only endpoint does not work in Kilo, but it does work in Liberty. They may backport the patches; it was largely the projects themselves that hadn't quite fully implemented using Keystone v3. Sorry? Yeah. So, well, that's the way we've implemented it, partially because it's a known way of doing things, and it's simple. We haven't gone to the extent of testing whether the service accounts could live in other domains. We may embark on that adventure, but I think there are different problems that are more interesting to solve at this point. Any other questions? How are we on time, are we good? Yeah, we are; there's a couple of minutes left. Any questions? Thank you very much.