All right, let's get started. My name is Vinod Kone, and I'll be your presenter for the next couple of hours, it looks like; I have two talks back to back. The first talk is about fault domains in Mesos. It's a new feature coming up in Mesos, so I wanted you all to know about it. A little bit about me: I've been a long-time Apache Mesos committer and PMC member, close to four years now, I think. I'm currently an engineering manager and tech lead for the Mesos team at Mesosphere, and before that I was doing much the same thing at Twitter, when Mesos was getting its wings in production. Even earlier than that, I was doing a PhD in computer science on a completely different topic, nothing related to what I'm working on today. So let's start with what we mean by a fault domain, or what people typically mean when they say they want fault tolerance and want to know about fault domains. A fault domain, you can think of as a group of nodes that share similar fault characteristics, which basically means that if a fault happens, all the nodes in that domain are typically going to be affected. If you're running your cluster in an on-prem data center, this fault domain is typically your rack, because a rack is typically serviced by one top-of-rack switch and may be connected to one PDU. So if the switch fails or the PDU fails, everything in the rack gets affected: the machines get disconnected from the cluster, and they might get powered down. In on-prem DCs, racks are your first level of fault domain beyond nodes. And if you're using public cloud providers, you have the concept of zones there, which you can think of as the first level of fault domain you need to consider when placing your apps. So what are some of the use cases for fault domains? The first one, obviously, is the ability to do fault-tolerant scheduling. This is really important if you want to launch highly available applications.
Most applications built as microservices today run a lot of instances of each service for load balancing, and you want to make sure they're spread across a wide fault domain so they can tolerate, say, an AWS zone going down or your rack going down. This is important for both stateless and stateful applications. As you can imagine, for stateful applications it's even more important, because a lot of them have the concept of replication. For the application to actually work correctly, you want your stateful applications sharded nicely across fault domains, so that the application doesn't do bad things because of your bad scheduling decisions. So let's look at an example of what bad scheduling could look like for a stateful application. Say you have a stateful application with nine instances, nine shards, and you want to put them on three racks. One way you could place them is one shard on rack one, a bunch of them on rack two, and a couple of them on rack three. This will let you tolerate rack failures, fine. But when you enable replication in your Cassandra or HDFS deployed this way, it's probably going to melt your rack, because there's going to be a lot of cross-rack traffic. Once you tell Cassandra to do rack-aware replication, it's going to pick nodes in different racks to place its replicas, which means a lot of traffic across racks, which is not what you want. The better scheduling is to spread the shards uniformly across racks, so that your replication traffic is distributed more uniformly and you don't melt your racks or switches. So that's one use case. The second use case is, of course, hybrid cloud, and how we enable that in a more first-class way in the Mesos ecosystem.
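Before that, just to make the nine-shards-on-three-racks example concrete: the uniform placement is simply a round-robin assignment. Here's a minimal sketch (the shard and rack names are hypothetical):

```python
def spread_shards(shards, racks):
    # Assign shards to racks round-robin so each rack holds an
    # equal share and replication traffic stays uniform.
    placement = {rack: [] for rack in racks}
    for i, shard in enumerate(shards):
        placement[racks[i % len(racks)]].append(shard)
    return placement

placement = spread_shards([f"shard-{i}" for i in range(9)],
                          ["rack-1", "rack-2", "rack-3"])
```

With nine shards on three racks, each rack ends up holding three shards, instead of the lopsided split described above.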
As you probably heard in Ben's keynote this morning, there are a lot of use cases for hybrid cloud in the Mesos ecosystem. I think his slide showed that about 25 or 26 percent of people actually run in hybrid cloud settings. By hybrid cloud, what we mainly mean is the ability to extend your primary cluster with some secondary infrastructure. For most organizations, this looks like an on-prem DC connected to a cloud provider's infrastructure, like AWS, usually through some kind of direct connect between your on-prem and the cloud, so you can add extra capacity on demand. The reason a lot of organizations like to do this is that many of them have seasonal traffic, or they have to provision for peak load, or for unexpected bursts. It's really hard to provision machines on-prem; it has to go through a lot of process. It's a lot easier, even if more expensive, to bring capacity up in a public cloud: you just pay with a credit card, you get your resources, and you add them. So it's much more flexible than going through the IT department and finance to get machines set up in your organization. We want to make this use case easily achievable in a Mesos cluster. A couple of things you really need to think about when supporting these hybrid cloud scenarios: first, the latencies are different depending on where your agents are running.
If your agents are in your on-prem DC, right next to your masters in the same DC, the latency is very low, probably sub-millisecond. But if you have agents in a remote region like a public cloud, the latency between them and the masters is going to be higher, which means the apps running on them will see higher latencies to any resources they might need in your on-prem DC, like DNS or a load balancer or what have you. If you're using a database on-prem and you want your cloud apps to use it, you have to be aware that the latency is not going to be low. The fault characteristics are also going to be a little different, because you don't control the infrastructure. For example, if you're using a public cloud, when maintenance happens, or when something goes wrong in AWS or GCP or Azure, it's not completely under your control. So the faults in your on-prem and in your cloud are a little different, and you need to take that fact into account. And of course, we also want users to have a lot more control over where their apps land. We don't want an operator to attach a bunch of cloud provider nodes to an on-prem cluster and suddenly have all the apps start spanning different regions and fault domains, because people are going to be surprised, especially if that's associated with cost. If a lot of your apps suddenly go to AWS and you see a big bill at the end of the month, you'd probably be pissed. So we want this to be very explicit: when users want their app to go to a cloud instance, they have to say so explicitly, and it probably has to be behind some kind of access control so that people make the decision consciously. We don't want it to be a homogeneous cluster; it has to be special in some way. So what are some of the existing solutions, how do people achieve some of this today?
Most people in the Mesos ecosystem do this via user-defined attributes. If you're not aware of attributes in Mesos, they're essentially a set of free-form labels that you can attach to agents when you bring them up. Most of the frameworks in the Mesos ecosystem, like Marathon or Aurora, have a concept of placement constraints that can run on top of these attributes: you could say unique per host, or target agents with a particular rack attribute, or something like that. That's nice, and it works most of the time; people have been using attributes for this so far. But it's not really a first-class way to do something as important as fault tolerance, mainly because frameworks and apps that depend on attributes are not portable. One organization, in one cluster, might call the fault domain "rack". Someone else might call it a hall, or a cage, or a switch, or a DC. So if you write a framework that understands racks and you try to deploy it in a different cluster whose fault domain is called DC, that's really hard to deal with. Same thing with apps: if you have an app definition and you want to move it across different clusters that all speak different attributes, there's no consistent way; it's not a great experience. The other important bit is that attributes are completely Mesos-agnostic, in the sense that Mesos doesn't look into them at all. If an agent is configured with a set of attributes, the Mesos master knows about them and just forwards them to frameworks in its offers. It doesn't do anything with them, it doesn't control them; it does some format checking, but that's it. What values should be there, what keys should be there, it's all free-form. We don't enforce anything.
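To make the attribute approach concrete: an operator might start an agent with a free-form label like `--attributes=rack:r1`, and a Marathon app could then constrain placement on that label. A sketch of such an app definition (the `rack` key here is whatever that operator happened to choose, which is exactly the portability problem):

```json
{
  "id": "/my-service",
  "instances": 3,
  "constraints": [["rack", "UNIQUE"]]
}
```

This asks Marathon to place each of the three instances on an agent with a distinct `rack` attribute value, but it only works on clusters whose operators chose that exact attribute name.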
So we can't do anything smart. For example, if you want to do something like access control for users, we can't do that easily, because we cannot enforce which attributes people use. So those are some of the limitations. Given that, what were our main goals when we set out to build this feature? The first, as I said, was that we wanted fault domains to be a first-class primitive in Mesos, because it's a really important thing for a lot of people running large-scale distributed systems and microservices. And we wanted a common terminology that all frameworks and apps can depend on, not just free-form attributes: very common, well-known primitives that they can all rely on. Of course, we want to support both on-prem and on-cloud deployments. A lot of Mesos users run it just in their DCs, and a lot of them run it just in AWS or GCP, so we want to support those, but we also want to support hybrid combinations of them. And the last important thing is that we want sensible default behaviors when you configure such a cluster, especially in hybrid settings. If you set up a Mesos cluster that spans on-prem and cloud, we want really sensible, non-surprising behavior. So what did we come up with? At a high level, the solution looks like this. We introduced a new primitive called fault domain, and it's wrapped in an outer message called DomainInfo so that we can add more kinds of domains in the future, which we'll get to. We added two levels of hierarchy to our fault domains, which we call regions and zones. And we also added a new framework capability called REGION_AWARE that frameworks use to opt into this behavior. I'm going to talk about these in detail next. So let's look at the fault domain protobuf itself.
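The message, roughly as it appears in mesos.proto, looks like this:

```protobuf
message DomainInfo {
  message FaultDomain {
    message RegionInfo {
      required string name = 1;
    }
    message ZoneInfo {
      required string name = 1;
    }
    required RegionInfo region = 1;
    required ZoneInfo zone = 2;
  }
  optional FaultDomain fault_domain = 1;
}
```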
As I said, there's a top-level protobuf, DomainInfo. The reason we put the fault domain under that, instead of making it top-level itself, is that we thought we might add more sorts of domains in the future. Some domains you could think of are cost domains, if you want to schedule based on cost, or latency domains. Right now we're sort of mixing the fault domain and the latency domain into this one concept, because it's easy to understand and most people don't think of them as different, but you could imagine a fault domain hierarchy being different from a latency domain or a cost domain. For now, for the MVP, we just did fault domains. And as I said, it has only two levels: regions and zones. Those are also protobuf messages, to help us extend them in the future, but right now they only contain names. A region you can think of as the highest level of isolation available, with the caveat that the latency between regions can be high, say 50 to 200 milliseconds or so. If you want really good isolation, regions shouldn't share anything: they're probably connected over the internet, and they have their own PDUs, their own facilities, their own network providers, and so on. That's the kind of fault domain a region should map to. Typically, if you're running an on-prem cluster, a region maps to your DC; if you're using a cloud provider, it maps directly to their regions. The next, lower level of fault domain is the zone. A region contains one or more zones. Zones are typically what you think about when you're deploying a stateful application and you want high availability. You don't want its instances to cross regions, that's crazy; you want them to cross zones within a region.
That's because the latency between zones is very, very low, so most latency-sensitive apps can tolerate cross-zone communication. A zone gives you a moderate degree of expected fault isolation. On-prem, a zone is like a rack; on cloud providers, it's an actual availability zone, as AWS calls it. I think Azure also started doing zones now; they were only doing regions until a few months ago. And I'm pretty sure Google also does regions and zones. So that's how they map. One important question I wanted to address: there was a lot of discussion around why we picked just two levels, instead of arbitrary or more flexible levels. The reasoning was that we thought this captures the 90 percent use case of what people are interested in. They're mostly interested in running latency-sensitive apps in a highly available manner, which is what zones are for. And some people want to launch things in different regions because they want to be close to their users; Netflix, for example, might have a lot of traffic near certain regions, so they just want instances of their services in different regions. We thought this captures the 90 percent use case. If we did something very generic, like level 0, level 1, level 2, up to level n, writing portable apps would again be really hard, and for frameworks that want to expose this to their users, it's not clear how they would do so in a way that stays portable. So for portability, simplicity, and the 90 percent use case, this is where we landed. But the protobuf itself doesn't stop us from adding more levels inside.
For the fault domain, if we really think there's another level that 90 percent of people care about, we might add it. But for now, we're pretty confident that most people can work with this. We talked to a lot of our community members and customers about the two levels, regions and zones. They always start off with "oh, my data center is configured a little differently", but when we talk it through, they come to see that this is actually a lot simpler, easier to program against, and easier to explain to users in terms of what the semantics mean. So that's why we ended up with just region and zone, two levels, for now. If you're latency-sensitive, use zones; if you're not, use regions. That's pretty easy to talk about. Before going into more detail, let's cover some terminology so we're all on the same page. Since we always want Mesos to enable new features in a backwards-compatible manner, we can't add a field and make it required: we want upgrades to always happen without losing any of your tasks. Which basically means we allow agents or masters to not have fault domains; that's totally OK, and it's the backwards-compatible way to do an upgrade. If a node is not configured with a fault domain, we say it's in the default domain. We also have the concepts of local region and remote region. One of the sensible behaviors we wanted to enforce was to make sure Mesos masters don't span regions, because there can be high network latency between regions. If you run your masters across regions, you'd have to run your ZooKeepers across regions too, and since masters have to do quorum writes, cross-region ZooKeeper is not going to be good. So we actually enforce in Mesos that you can't configure masters with different regions.
So, local region is the region that contains all the masters, and all the agents in that region are considered local agents. All the other regions, the ones that don't contain masters, are called remote regions, and any agents running in them are called remote agents. So we have a local region containing all the masters and some agents, and remote regions containing just agents. How do you enable this in a Mesos cluster today? There's a new command-line flag called --domain. It takes a JSON object that's a direct translation of the protobuf: you set the fault domain's region and zone, and you can do it for both your masters and your agents. So it's a top-level primitive, not an attribute anymore. On the master side, the master's domain info is stored in the MasterInfo protobuf. If you're aware of the internal details of Mesos, when your framework registers with Mesos, it's given the MasterInfo as part of the registration process. This is really important, because by putting the domain info in MasterInfo, frameworks get to know the region of the master: during registration they get the MasterInfo, and they know which local region they should be aware of. As I said, masters are not allowed to span regions; that's one of the sensible behaviors we wanted to enforce. But they can, of course, span multiple zones within the local region, which we actually highly recommend. If you're launching masters in the local region, please spread them across different zones, so that if a rack or a zone fails, you don't lose all your masters at the same time.
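Concretely, the value of that --domain flag is a JSON rendering of DomainInfo, passed in the same shape to both the master and the agent. Something like this (the region and zone names here are hypothetical):

```json
{
  "fault_domain": {
    "region": {"name": "us-east-1"},
    "zone": {"name": "us-east-1a"}
  }
}
```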
On the agent side, the agent's domain info is stored in AgentInfo, which is checkpointed to local disk. And the master includes that domain info in the offers it sends to frameworks, so every offer from now on includes the domain of the agent. The framework can look at an offer and say: OK, this agent belongs to this domain, that one belongs to that domain, and this is how I want to spread my apps. Now, one of the limitations as of Mesos 1.4, where this was released, is that, just like attribute changes, domain changes require a drain of the agent, which is not great. We're going to fix that in Mesos 1.5, where configuring an agent with a domain will not require draining all the tasks off the agent; you'll be able to keep the tasks around and still configure the fault domain. That's not there yet. So I wouldn't recommend enabling this on your existing clusters if you want to upgrade. If you're bringing up brand-new clusters, just enable it, that's fine. But if you have existing clusters and you want to keep your tasks, and you don't want them rescheduled, wait for 1.5. That's really important. On the framework side, the changes are expected to be really simple. Frameworks are supposed to register with the new REGION_AWARE capability. This is, again, one of the sensible default behaviors: if a framework doesn't register with this capability, we do not send it offers from remote agents. This is for backwards compatibility. We don't want frameworks or users to be surprised when someone adds hybrid nodes to the cluster; we don't want apps to automatically span regions when the framework isn't looking at the domain information. So any legacy framework that's currently running does not get offers from remote regions. They have to explicitly opt in.
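With the v1 scheduler HTTP API, opting in is just a matter of advertising the capability at registration time. A sketch of what a SUBSCRIBE body might look like (the framework name and user here are hypothetical):

```python
import json

# Sketch of a v1 scheduler API SUBSCRIBE body that opts in to
# receiving offers from remote regions via the REGION_AWARE capability.
subscribe = {
    "type": "SUBSCRIBE",
    "subscribe": {
        "framework_info": {
            "user": "nobody",                      # hypothetical
            "name": "my-region-aware-framework",   # hypothetical
            "capabilities": [{"type": "REGION_AWARE"}],
        }
    },
}
body = json.dumps(subscribe)
```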
And we hope that when they explicitly opt in, they're thinking: OK, now I'm going to get offers from remote regions, so I need to be really careful about which apps I place where. Once a framework registers with the REGION_AWARE capability, we hope, or rather recommend, that it exposes this to its users as well. Because once it registers with that capability, it's going to get offers from everywhere, remote regions and local regions, and its users should explicitly ask to be deployed in a remote region; otherwise, the framework should not deploy them there. That's our recommendation. We can't really enforce it; we can just guide frameworks toward the right strategy. Next, I wanted to walk through some examples of how Marathon could use this fault domain concept for the use cases we talked about. This is just a hypothetical example; I'm not sure yet whether Marathon will actually go with this kind of constraint syntax. There are some new operators here, like IS, which doesn't exist in Marathon today. But this is a hypothetical example of how Marathon could do it. Say you want to enable a use case like: schedule my app in a remote region, not the local region. You could set a placement constraint with a new key called @region. When you say @region, it means: don't look at a user-defined attribute, look at this first-class property of the agent called region. So if I want the app to land in, say, an AWS region, I put an @region IS constraint on it, and it goes out to the cloud. Awesome. And if you want something like highly available app placement, say you want your app spread evenly across zones, you could do @zone GROUP_BY 3, for example.
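A hypothetical Marathon app using both of these might look like the following (again, Marathon has not committed to this exact syntax, and the region name is made up):

```json
{
  "id": "/my-ha-app",
  "instances": 9,
  "constraints": [
    ["@region", "IS", "aws-us-east-1"],
    ["@zone", "GROUP_BY", "3"]
  ]
}
```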
If you want to spread across three zones, with a replication factor of three, you can do something as simple as that. This is hopefully coming in the next Marathon release; I think the Marathon folks are still figuring out the best API for supporting regions and zones, but this is likely how they'll expose it. And we hope more and more frameworks start using regions and zones and expose them in their APIs as well. The last thing I want to talk about is upgrades. We take upgrades really seriously in Mesos: we always want a path for a Mesos cluster to be upgraded without losing tasks. That's been one of our core tenets for a long time. So what does that mean for this particular feature? We do allow masters to be in a mixed mode, by which I mean some masters configured with a fault domain and some not. This is required because you're going to upgrade your masters one at a time; we don't expect all masters to suddenly come up with fault domains at the same time. You upgrade one, go to the next one, upgrade that one, and so on. So during the upgrade there will always be a state where some masters have fault domains and some don't, and we allow that. The one restriction we have, if you're going to enable fault domains on agents, is that you have to do masters first and then agents. We mostly don't require an upgrade order in Mesos; we usually say masters first or agents first, it doesn't matter. But in this particular case, if you want to enable this feature, you really want to do masters first and agents next. This is, again, for one of the sensible defaults we wanted to enforce: if someone adds a remote agent from a cloud provider to your cluster, you don't want it to automatically show up to your frameworks as a local agent. That would be really bad.
For the master to be able to know that, it has to have a region of its own, so it can compare with the agent's region and decide whether the agent is remote or local. That's why masters have to be configured with a fault domain before any agent is. If the masters are not configured yet and an agent tries to join the cluster with a fault domain, we don't allow it; the registration attempt is rejected. This is, again, a sensible default we can enforce because we have first-class primitives now. Here's a table you can reference when doing upgrades, to think about what works in the different combinations. When both the master and agent domains are set, it's pretty obvious: if the regions don't match, the agent's offers are only sent to region-aware frameworks, because it's a remote agent. If the agent domain is not set but the master domain is set, that's allowed, because that's the upgrade path: you do masters first, and while you're upgrading agents, some agents won't have a domain set yet. Those agents' offers are sent to all frameworks as normal. But if the master domain is not set and the agent domain is set, as I said, that's not allowed; we don't let such agents register. If both are unset, everything behaves as it does today: we treat everything as local. We're also going to add a new flag in 1.5, which I didn't put up here, that disallows agents from registering with the master if they don't have a domain configured at all. Once your upgrade is done, you don't want new agents accidentally joining the cluster without a fault domain. So we're going to add a flag that says: don't allow unconfigured agents. That will keep operators from shooting themselves in the foot by adding agents in a new remote region without configuring them.
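The rules in that table boil down to a small decision function. This is just a sketch of the behavior described above, not Mesos source code; `None` stands for an unconfigured (default) domain:

```python
def offer_policy(master_region, agent_region):
    # Master unset but agent set: the master can't tell local from
    # remote, so the agent's registration attempt is rejected.
    if master_region is None and agent_region is not None:
        return "registration disallowed"
    # Regions differ: the agent is remote, so its offers go only to
    # frameworks that registered with the REGION_AWARE capability.
    if agent_region is not None and master_region != agent_region:
        return "offers to REGION_AWARE frameworks only"
    # Same region, or agent still in the default domain: local agent.
    return "offers to all frameworks"
```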
So we're going to add a flag that makes it even less likely for an operator to make a mistake, a fat-finger. To wrap up, what's the state of this feature? Fault domains were released in the latest release of Mesos, 1.4. They're still experimental. Of course, we want features to graduate to stable once enough people use them in production and we're confident, so we always start new features as experimental, and same with this one. As I said, agent domain reconfiguration without a drain is not yet possible, but our goal is to make it possible in 1.5. As a bonus, I think we're also going to enable the ability to change agent attributes without draining. That's been a long-requested feature in Mesos, and I know it's caused a lot of heartburn for a lot of people. We're going to roll the solution out in phases, but all the phases are going to help not just domains but attributes as well, and hopefully resources too in the future, so you can change some things without requiring a drain. I'm really looking forward to it; that's pretty exciting. There's a ticket that was filed three or four years ago, I think, asking to be able to change attributes in Mesos, and we're finally going to do it. That's awesome. Just to wrap up: this work was a joint effort by a lot of people. Special thanks to Neil Conway, who did most of the work implementing the fault domains feature in Mesos, with a bunch of discussions and reviews from a lot of people, including Ben and Anand, and Joris, who did some of the earlier work on fault domains. That's pretty much it. I've linked the design doc here for people who want a deeper look at the design and implementation details. The slides should already be uploaded to the conference schedule site, so take a look there and click the link to see the details. All right, that's pretty much it. Happy to take questions now.
Yeah, so the question was: if you allow the agent's domain to be changed, especially the region, how is that allowed? I think what we're going to do, at least in phase one, is have a flag on the agent that explicitly says: allow agent reconfiguration. That basically means we're giving operators the power to say: I know that in my organization people change domains all the time, and most of our frameworks don't care, so I'm OK changing this agent and keeping its tasks running. Maybe the tasks don't depend on the region property at all, and I know that about my organization, so I'm willing to make that change. That's phase one: a very simple flag for operators that says, when set, we allow these things to be changed and we keep the tasks running. If tasks end up violating their placement because of how the fault domain was changed, that's the risk the operator took by setting this flag. In the future, and this is the more interesting part, when someone tries to change the fault domain or an attribute, we're going to send a signal to the framework, like an inverse offer, saying: hey, this is about to change, and you have stuff running there, tasks or volumes or reservations; if this violates any of your constraints, please move them somewhere else. That's the future we want to get to, where frameworks are involved when you make these changes, because it's kind of bad to change things underneath them while they have tasks running. So phase two is: we still allow operators to change attributes or domains, but we ask frameworks to move their stuff elsewhere if the change violates their constraints. That gives us a lot more flexibility. So that's currently the plan. Yes.
So the question was: what happens if you upgrade your masters, and then upgrade your agents but do not set fault domains on them? They would be considered local agents. You could upgrade your masters to 1.4 and set fault domains on them, then upgrade your agents without setting fault domains, because you don't want to lose your tasks. Agent domains won't be set, master domains will be set; that's still allowed, and those agents are just considered local agents. All your tasks will keep running, new tasks can still launch, and frameworks just don't get to see the domains of those agents, because they're not configured. Everything's fine. If, after upgrading to 1.4, you then want to configure a fault domain on an agent that's running tasks, you're supposed to drain it and then enable the fault domain as if it were a new agent. That's the workflow we're going to recommend for 1.4. Or wait until 1.5, and you don't have this problem. Yes. So the question was: do I see a future where everything becomes a domain and attributes go away? I don't think attributes are going to go away, because people use them for lots of things that have nothing to do with fault tolerance. For example, people use them to tag the operating system or kernel version running on a node, or the licenses it has, or whatnot. There are lots of things unrelated to fault domains that people use attributes for; it's a completely free-form way of labeling nodes, so I don't think it will ever go away. What I think will go away is the fault domain information that people currently encode in attributes, like racks and DCs and so on. I hope those move from attributes to fault domains. That's my expectation of what will happen in the future.
Any more questions? Yeah. So the question was: how do stateful applications like Kafka learn about the fault domains themselves? At our company, where we build a lot of stateful frameworks using the DC/OS SDK, those frameworks are going to use fault domains. Because the fault-domain information is injected into the offer, when a framework launches a Kafka or Cassandra node it knows what zone that node will run in, and it injects that zone value into the application's configuration. Kafka usually takes a rack option, and so does Cassandra. So when the framework lands the app, it takes the zone value it got in the offer and passes it as a command-line argument to the Kafka replica. That way all the Kafka replicas know what zone they're in, so when data gets placed in Kafka, it knows how to replicate it across racks, for example. A lot of our stateful frameworks are going to utilize this pretty soon. It's a pretty big feature for our SDK frameworks: Cassandra, Kafka, HDFS. All of them will be injected with the zone information once they get it in the offer. Yes, if you use those SDK frameworks, you get that for free. If you're writing your own framework, again, you're going to get the zone information in the offer, so you can take that and put it in your replica's configuration, or whatever your task is. You can always pass it when you launch, because before you launch, you know what the zone is. So if you're writing your own framework, you should be able to do that pretty easily as well. Cool. OK. So the question was: what's different between fault domains and attributes in terms of scheduling? Does it affect scheduling?
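The zone-injection step described above can be sketched in a few lines. This is an illustration, not the SDK's actual code: the offer is modeled as a plain dict mirroring the fault-domain structure from the talk, and `rack_args_from_offer` is a hypothetical helper.

```python
def rack_args_from_offer(offer):
    """Pull the zone name out of an offer's fault-domain info and turn it
    into the rack option a Kafka/Cassandra replica would be launched with."""
    zone = offer["domain"]["fault_domain"]["zone"]["name"]
    return ["--rack", zone]

# A toy offer carrying fault-domain info (placeholder names).
offer = {"domain": {"fault_domain": {"region": {"name": "us-east-1"},
                                     "zone": {"name": "us-east-1a"}}}}

# Appended to the replica's launch command so it knows its own zone.
args = rack_args_from_offer(offer)
```

The same pattern works for any custom framework: read the zone before launch, then pass it through whatever configuration mechanism the task understands.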
As I said, the biggest difference is that Mesos actually does something intelligent based on the domain information, which it never could with attributes. For example, it won't send remote-region offers to frameworks that don't have the region-awareness capability, and it won't admit agents without a fault domain configured if that flag is set. All of these are scheduling decisions that Mesos takes itself, and they're things it will probably never do based on attributes, because attributes aren't something Mesos controls; there's no concept of first-class attributes. You can think of domains, like the hostname, as first-class attributes in some sense: Mesos can actually look into them and make decisions, scheduling-related or otherwise. Plain attributes are just free-form; we pass them through, completely opaque to Mesos. They're a contract between the framework and the operator, and Mesos doesn't come into play. With first-class primitives like fault domains, Mesos does come into play: the allocation algorithm is actually affected by the domain values and their combinations. Yes. So the question was: is this a step toward federated Mesos masters? Yes. We haven't yet nailed down exactly how these domains come into play in a federated world, but this is one of the things we thought was a good primitive to build toward federation, especially if you want to allow masters in different regions to join a federation and expose resources. They should probably have this concept of regions so that the federation control plane can look at it and make some smart decisions. So we hope it's a step toward it, but it's still very nascent, I would say, in terms of what the control plane would look like in a federated world. This is just a very small first step toward it. That's my take. Cool.
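The offer-filtering rule described here, that remote-region agents are only offered to frameworks advertising region awareness, can be sketched as follows. This is a simplified model, not the allocator's real code; the `REGION_AWARE` capability name and the dict-based agent records are assumptions for illustration.

```python
def eligible_agents(framework_capabilities, agents, master_region):
    """Return the agents whose resources may be offered to a framework.

    Sketch of the scheduling rule from the talk: agents in a region other
    than the master's are filtered out unless the framework has declared
    the region-awareness capability.
    """
    region_aware = "REGION_AWARE" in framework_capabilities
    return [a for a in agents
            if region_aware or a["region"] == master_region]

agents = [{"id": "a1", "region": "us-east-1"},
          {"id": "a2", "region": "eu-west-1"}]

local_only = eligible_agents([], agents, "us-east-1")
all_agents = eligible_agents(["REGION_AWARE"], agents, "us-east-1")
```

A legacy framework (`local_only`) is only ever shown the local agent, while a region-aware one (`all_agents`) sees both regions and must place tasks accordingly.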
Any more questions? Cool. All right. Thanks, guys.