Hello everybody, good morning. All right, so we're going to kick off the first talk today in the Mesos internals track. Vinod and I will be track leads for today. As the first talk we have Jay Guo from IBM and Ben Mahler from Mesosphere. They're going to be talking about hierarchical roles and multi-role frameworks in Apache Mesos, going into the details of the multi-tenancy features that were recently added, as well as covering some of the future features that are coming up.

All right, can you guys hear me? How about now? Okay, there you go. All right, so we're going to talk about multi-tenancy.

So what's the first thing we need to talk about? What is multi-tenancy? Pretty straightforward: it's this idea of having multiple tenants share a single software system. An example of this is something like AWS or Google Cloud, where you have two different companies, Coke and Pepsi, sharing that same platform. If you're a single company, you might have a private cloud where, let's say, the engineering department and the finance department share the same platform, and maybe they don't want to see each other and things like that.

I think the more interesting question here is the definition of a tenant. Sometimes when multi-tenancy discussions happen, people are referring to one specific type of tenant, like a company. But how we want to think about a tenant is that it could be any of these things: it could be a company, it could be different business units within a company, it could be teams, or it could be individual employees. And so this notion of tenancy is really hierarchical, right?
You have employees; they work within teams, which sit within departments, within organizations, within companies, and we want to think of all of these things as being tenants.

Right now in Mesos, what we target with multi-tenancy is primarily using a single Mesos cluster within a single company. That means you're sharing a Mesos cluster across teams in the company, or across employees, and so on. For different companies, at the current time we would recommend using different Mesos clusters. That's because that use case requires extremely strong security and potentially stronger isolation if you're not okay with container-level isolation. You need really good prevention of denial-of-service attacks, because you don't want one company being able to affect service for another company, and there are compliance things to deal with as well. So it's a really hard problem and we don't recommend it right now, but we will make progress towards sharing across companies in the future.

Okay, so what are some needs for multi-tenancy? Of course we need security isolation, not just when it comes to making sure that different tenants don't land on the same host, or are using containers or VMs, but also in terms of the API. You might want engineering to not be able to see sales, depending on what you're doing, and so you need good authentication and authorization to do that. When you're isolating resources on the host, not only do you need security isolation to make sure that the file system, the network, and storage are isolated, but you need performance isolation as well: you want to make sure that one tenant can't negatively affect the performance of another tenant, and that goes for all the resources that exist on the host. Another thing you need is this notion of guarantees about how many resources you can get in a cluster. We do that with things like quota to give you a guaranteed amount, and we have fair sharing when you
want to go over your quota, and we might introduce things like priorities and so on in the future. And then you also need really good fault isolation: if one tenant is inducing a failure in some way, that failure shouldn't cascade into other tenants.

There are also some end-to-end platform needs for multi-tenancy. If you're running a scheduler on Mesos, either that scheduler itself needs to be multi-tenant, or you need some software that helps tenants get a per-tenant scheduler. You also need software to manage the tenants that exist in your platform, and companies have existing LDAP setups that you might need to integrate with, so there's software needed for that. For compliance you might need things like audit logging, and if you're building a platform you might want to charge those tenants for how much they're using. And there might be things you need that Mesos might not tackle. For example, network isolation is often done with software-defined networking, and that's usually provided by a vendor or something like that. So a lot of the things you need for multi-tenancy in a platform are out of scope for Mesos. Mesos is one piece of this, but you need an end-to-end platform solution, either something you build in-house or an existing one like DC/OS.

Okay, so we're going to talk in a little more detail about what exists for multi-tenancy in Mesos today. Just as a reminder, the vision for Mesos is that it's this kernel of a data center operating system, and so one of its core responsibilities is resource management. Resource management means a few things: sharing resources amongst tenants, giving guarantees to those tenants about how many resources they can have, providing isolation of those resources, and providing accounting so that you can tell on a per-tenant basis how much people are using. As a terminology thing, in Mesos we don't have the word tenant; we don't
use that word. What exists in Mesos is this concept of a role, which represents a consumer of resources. So that's the mapping when you think about multi-tenancy in Mesos: we use these roles to capture whether it's a business unit, a team, and so on.

Okay, so what are some features that exist today? I talked about roles, and with roles you can get resource guarantees. There are two things in place today: one is fair sharing with DRF, and the other is quota. Quota gives you an absolute guarantee, and fair sharing is relative. We have resource accounting so that you can track how much each role is using in the system. And then we have a lot of security isolation and performance isolation. I won't talk about them here, but if you want to learn more you can attend a containerization-related talk. We also have authentication and built-in authorization. This stuff is built in, but it's also customizable, so if you want to, say, integrate with a Kerberos system that you have, you could build an integration for that.

Okay, so Jay is going to highlight two features for multi-tenancy that we recently worked on, so I'll pass it to him.

All right, I'm going to introduce these newly added features, namely multi-role framework support and hierarchical roles, in more detail. As Ben just introduced, we will normally have, for example, multiple teams sharing one Marathon instance. That Marathon subscribes with Mesos, and these teams launch all their different applications and services using that Marathon. But from Mesos's point of view, Mesos doesn't know about these teams, because Marathon subscribes with Mesos with only one role, which is what was supported until now; in this case the role is marathon. So Mesos only knows that there are jobs running under the role marathon, while actually these teams are front-end, back-end, and reporting, and they are all launching different workloads or applications. So the problem with
that is you cannot enforce quota or reservations for these different teams, like front-end and back-end, because Mesos simply doesn't know about them. One solution would be to launch one Marathon instance per team, so that they subscribe to Mesos using different roles. But a tenant could be a department or an individual employee, and obviously you don't want to do that for every single tenant in your organization. So what we want is for you to be able to subscribe a framework to Mesos with multiple roles, so that Mesos is aware of them and can allocate resources accordingly. Different teams using a single instance of Marathon can then be allocated different resources, and you can leverage the quota and reservations associated with the roles to make better resource allocation decisions. And of course we don't want to implement that in the scheduler, which would mean implementing all these features in every single scheduler, not only Marathon. We want a unified implementation in Mesos to make such decisions. So that's the basic idea behind multi-role framework support.

Hierarchical role support is quite straightforward. An organization is often hierarchical, so you don't really have front-end, back-end, reports, and leads at the same level. Instead you will have, for example, front-end and back-end under the engineering department, and reports and leads under the sales department. Furthermore, you will probably have front-end running different workloads, like Rails and UI, and API and login in back-end. This kind of hierarchical structure should be captured by the roles as well, meaning that we want schedulers to be able to subscribe with Mesos using a path-like syntax, so engineering slash front-end slash
UI will depict the UI workload under this tree structure. When we come to hierarchical roles, the DRF algorithm, reservations, and quota may be slightly different than the flat ones. For example, we want to enable dedication, meaning that if you set a quota for the team engineering front-end, that team is able to further subdivide the quota among its different workloads, Rails and UI: I want to split that quota and set quota for my sub-teams. We also want authorization and isolation; for example, I don't want front-end to see the allocation of its sibling, back-end in this case. And to combine this with multi-role support, we want a single Marathon instance to subscribe to different nodes in the tree, all these different paths, so that the whole tree can still use a single instance of Marathon to manage all the resources allocated to the whole tree.

DRF is fairly straightforward. Previously the roles were flat, so you just computed the fair share of every node, sorted them, and allocated the new piece of resource to the node with the lowest share. In the hierarchical structure you simply calculate the fair share recursively at each level. So given the resource allocation in this case, the result will be like this: the eng department takes 20% of the resources, compared to the sales department. Then you recurse further down to eng front-end and eng back-end, which take one-fifth and four-fifths of the eng department's resources. At each level you pick the node with the lowest fair share, so in this case we go to eng first and then eng front-end, and we allocate the new piece of resource to that node. So DRF is quite straightforward in the hierarchical manner.

Hierarchical reservation works like this: given such a tree, we reserve resources to eng, say 100 resources in this case, and
that reservation is actually shared in the tree, so it's shared among eng, eng front-end, and eng back-end. Then we introduce the semantics of refinement: when eng front-end receives resources from the reservation of eng, it can further refine it. It basically calls the reserve API to reserve that piece of resource to itself, so that it cannot be used by back-end, its sibling. And when it's done with it, it can just call the unreserve API to return that resource to eng, which is the parent role.

I want to emphasize that when you reserve a resource to eng, it's shared among the subtree. What does that mean? Say you have eng and you reserve 100 resources to it, and you have a framework X subscribing to it. Framework X will get all 100 resources, of course. But then you decide to divide your eng team further into eng front-end and eng back-end, and you subscribe frameworks Y and Z to these two newly added teams. Then it's not guaranteed that framework X gets all 100 resources anymore, because the 100 is shared among the subtree, so X, Y, and Z are sharing the reserved resources. Normally, though, as we said before, you will have one single Marathon instance subscribing to all these different roles, getting all 100, and scheduling jobs based on their roles. So this is hierarchical reservation.

Hierarchical quota is similar to reservation. You can set a quota on eng, and again that's shared among the subtree. And I want to again emphasize that you can dedicate quota. It's like refinement, but you actually set quota on a sub-role of eng so that the guarantee is dedicated to front-end: back-end won't get front-end's share, and front-end is guaranteed to get 40 out of the 100 of quota in the subtree. But this introduces the semantics that you
should always set the quota on the parent role first, because that's shared among the whole subtree, and then further set quota on the sub-roles. If the quota for the subtree is 100, the sum of the sub-roles' quotas, like 40 plus 70, cannot exceed the quota of the parent role. Therefore you should always set the quota for the parent role and then set quota for the child role. You cannot do it the other way around: quota by default is zero, so it's not allowed to set quota on eng front-end before setting quota on eng.

Similar to reservation, if you have framework X subscribing to eng first, it's guaranteed to get the 100 quota resources. Then you decide to add more sub-teams to eng, and at that point it's no longer guaranteed that X will get all 100 resources, because they're shared among the subtree; X, Y, and Z together will get the 100 resources of quota. So to summarize: if you still want the old behavior, where you guarantee that some application gets all the reserved or quota resources, you should always subscribe that framework to a leaf role and set the quota or reservation on that leaf role. That gets you the same behavior as before, guaranteeing a certain amount of resources to that application. And I'm going to hand back to Ben to talk about future work and the roadmap.

All right, so where is this stuff right now? Multi-role framework support shipped in Mesos 1.3. Hierarchical roles are still a work in progress; we're planning to finish them in 1.5. In 1.4 some of the work is done already, but quota will not work as we showed just now, so it's not recommended to use it there. After we finish this, we need to integrate it into multi-tenant frameworks like Marathon or Aurora and so on, and any end-to-end solutions will need to integrate this stuff as well.

I also wanted to talk about some upcoming
work that is being worked on already. The first is revocation and the second is priority tiers.

Originally, when we introduced resource allocation to Mesos, what we did was perform weighted DRF to fairly share the resources amongst all the tenants in the cluster, and that's the only thing we did. We did this in a non-revocable way, which means we can't take those resources back once they're allocated. Later we introduced this notion of quota, and that made our single-phase allocation become a two-phase allocation. The first phase is: let's try to satisfy everyone's quota guarantees. Then, with what's left, let's fairly share that as we did before.

One of the problems with this is that in order to guarantee everyone gets their quota, we have to leave enough room, if they're not using their quota right now, to satisfy it at a later point in time if they do want it. So this headroom is essentially unutilized, and there's no way for us to give it out without breaking the quota guarantees that we made. This hurts utilization. What we want to do to solve that is introduce a third phase, revocable fair sharing, where we allocate all unallocated resources as revocable. That lets us take those resources back if we need to give them to someone for quota. Ideally, in retrospect, what we would have done originally is introduce just quota and revocable fair sharing, but this middle layer will serve as a backwards-compatibility layer for people who expect the old behavior, and we would allow operators to confine how much of this can happen or turn it off completely. So that's revocable resources at a very high level.

Another thing that's been discussed is this notion of priority tiers. What this means is introducing, essentially, priorities in these particular
allocation tiers. For revocable fair sharing, priorities mean that a higher-priority tenant can revoke resources from a lower-priority tenant whenever they like. For quota, since it's not revocable, the only thing that priority gives you is first rights on the resources. So if there's an outage and you need to get your core services up, those higher-priority services can get the resources first and stay running, and the other, lower-priority quota tenants will not get their resources. So that's priority tiers at a very high level.

Just a reminder: multi-tenancy is really a very multifaceted problem. You need an end-to-end platform that's going to give you all the pieces of this puzzle, and Mesos is just one core piece of that puzzle that provides some multi-tenancy primitives. That's all we have. We just wanted to put this up here to thank the contributors that have worked on this stuff so far. It looks like we have quite a bit of time, so if you have questions we can take those. I guess I'll run this microphone around, and I'll repeat the question so it's on the stream.

Okay, the question was: do I know which versions of Marathon or DC/OS support multi-role? None of them do right now. It's being worked on at the current time, so it's possible the next release will have it. I'm not sure, but it's still being worked on.

The question was how to transition to using hierarchical roles. As part of the multi-role work we also allowed frameworks to update their set of roles at any point in time. Currently you have to resubscribe to do that, but we'll also potentially add a call so you can do this without resubscribing to Mesos.

Yeah, so the question is about this unutilized headroom that I mentioned. I'll go back to it. Oh, you can't see my mouse, of course. This dotted line at the top of the headroom is the sum total of all the quota guarantees that we have in
the cluster, and this middle line here is how much of their quota people are actually using right now. So if I asked for 10, there would be 10 total here, and if I only used five, there'd be an extra five left unused, because we need to make sure that if I later want that extra five that I got a guarantee for, I can get it. Does that make sense? Does that answer your question? Okay, we can chat after if you have more questions.

The question was: do I have any examples of quota and priorities? Definitely not for priorities. There's no design at the current time; there have only been discussions. There's a design being worked on for revocation as well. And for quota, I think Michael is actually working on that design, so I don't know if we have examples right now, but there's going to be a design published within the next week or so for the quota part.

Yeah, right, okay, we did that. So the question was: can we update the UI to show resource usage by role? As part of the multi-role work we did update it. I don't know if you're running the latest version or not, but yeah, the plan is to display this stuff. There's a new roles tab in the UI where you can see a breakdown per role of how much is allocated, what their quota is, and so on. That is probably in 1.4, which is not yet released, but it's almost released.

The question is: what's the timeline for revocation and priority tiers? There's no timeline for priority tiers. For revocation, the design is being worked on right now. 1.5 will be in two months, is that correct? Yeah, so that will be a little tight. I think it might be 1.6 or something, if I had to guess. Hierarchical quota I think will be 1.5. It should be, yeah.

Sure. Do you want to do it, or do you want me to? So the question was essentially to clarify a little bit how hierarchical reservations work. I can just reiterate what Jay said and make sure
that you understand. Previously I wouldn't have these children here, right? I would just have eng, and when I made a reservation to eng, of course only the eng role got to use it. Now when you make a reservation to eng, you're saying that the entire eng tree is reserved 100 resources. Mesos will make sure that those 100 resources go to that tree, but they are going to be shared amongst the different roles in that tree. Now I could refine that reservation: I could be eng front-end, get some of it, and further refine it to be only for eng front-end. That's this notion of reservation refinement. I got some of it, and now I'm binding it to me so that it can't be shared with the rest of the eng tree anymore. And if I unbind it, it goes back to eng. I see you nodding, so I guess it's making sense so far. What else to keep in mind? Is that right?

Yeah, so in this case, if you still had something that you wanted to run at eng, we would recommend that you run it at eng slash default, or eng slash whatever, a specific role, to make sure that you can bind it to that thing and it can't go to others. You don't have to reserve to the parent in this picture; you could reserve directly to your leaf role. But let's say an operator wants to make sure that engineering gets this whole machine by reserving the resources. Or let's say you have a public machine that's exposed to a public network, and you want to reserve it for only the public network things. You could do that, and it's then shared amongst all the public network things, and they can reserve portions of it. That's one use case that you can imagine.

Yeah, I think to add an additional point to what Jay said: if you
want to use reserved resources and you are expecting those to go back to you, you should refine them to your leaf role. So in this case, if I'm framework Y and I get some of these engineering resources, but I know that I'm running an engineering front-end thing, I shouldn't assume that those engineering resources are always going to be reallocated to my engineering front-end thing. I should refine that reservation and make sure that it's guaranteed to come back to my leaf role.

Yeah, I mean, we definitely looked at both ways of doing it. The disadvantage of the other way, where it's bound only to eng here, is that you need two kinds of reservations: one flavor which is a tree-like reservation, and another flavor which is bound to that internal node in the tree. When we were designing it we wanted to simplify that and say that all reservations are bound to the entire subtree that they are made on. Of course, what that means is we have to tell people to use a leaf role if they want to make sure that it's guaranteed to that particular tenant.

Yeah, so the question was: is a reservation tied to only a single resource, like CPU, or can you make a reservation on multiple resources? A reservation is on a bag of resources; it can be a collection of any particular resources together. Yes, this is 100 units of some resource. Of course, in reality this is going to be a multi-dimensional reservation. It's going to say something like one CPU, two gigs of memory, 10 gigs of disk, and maybe these ports. That would be a single reservation, which is on a particular agent, and there could be many of these across the agents. So keep in mind that a reservation is bound to one agent: it identifies exactly these resources on this agent, whereas quota is global: I need this amount. That's the distinction between a reservation and quota.

Yeah, so the reality of this picture today is that if you're a tenant
in phase one, you're not going to get any resources in phase two, and vice versa. That was because the intention was to move towards this model where, to burst above quota, you have to use revocable resources. What we might want to allow in this picture is exactly what you said, where a tenant can burst over their quota using non-revocable resources, but the current implementation doesn't allow that. So that might be something we allow in the future when we do this work.

The question was: do we have to do that after we do revocable by default? I don't think so, no. It's just what the current implementation is. I think we could do that today. The difficulty is that today, if I get non-revocable resources, I don't know if they come from my quota or from the fair sharing that's happening, and we want to improve that as well. We want to be able to tell someone: here's the quota for your tenants in your scheduler, so with that knowledge you know what's going on. And here's maybe how much they're allowed to burst over their quota, or what their fair share is for revocable resources. So we want to give all that information to the scheduler as part of this work. Yeah, I mean, today quota is not very usable, and that's what we want to improve.

Any last questions? Going once? No? Okay, thank you.
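
For readers following along without the slides, the hierarchical DRF walk Jay describes (compute the fair share at each level, then descend into the child with the lowest share) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Mesos allocator: `RoleNode` and `pick_role` are hypothetical names, shares are supplied directly rather than computed from dominant resource usage, only leaf roles are chosen, and the shares under sales are made up because the talk does not give them. The eng numbers (20%, one-fifth, four-fifths) are the ones quoted in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleNode:
    """Hypothetical role-tree node; `share` is relative to its siblings."""
    name: str
    share: float = 0.0
    children: List["RoleNode"] = field(default_factory=list)


def pick_role(node: RoleNode, path: str = "") -> str:
    """Descend from `node`, taking the lowest-share child at every level."""
    if not node.children:
        return path
    lowest = min(node.children, key=lambda child: child.share)
    child_path = f"{path}/{lowest.name}" if path else lowest.name
    return pick_role(lowest, child_path)


# Shares for eng, eng/frontend and eng/backend follow the talk;
# the split under sales is an assumption for illustration only.
tree = RoleNode("", children=[
    RoleNode("eng", 0.2, [RoleNode("frontend", 0.2), RoleNode("backend", 0.8)]),
    RoleNode("sales", 0.8, [RoleNode("reports", 0.5), RoleNode("leads", 0.5)]),
])

print(pick_role(tree))  # eng/frontend
```

With these inputs the walk goes to eng first and then to eng/frontend, matching the order described in the talk. A real allocator would also have to recompute shares after each allocation and handle frameworks subscribed to internal nodes, which this sketch leaves out.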