Okay. Excellent. All right. Good afternoon, everybody. I'm Ramki Krishnan from Dell, and here we have Tim Hinrichs from VMware. There are other folks we'd like to thank; they're not here, but hopefully attending remotely. As we know, the green data center is everybody's favorite topic, and we see a bigger opportunity with NFV. That's what this talk is focused on: how policy can be used in an NFV environment to drive energy efficiency. A quick run through the agenda: motivation, then we'll go through an architecture and an actual implementation in OpenStack Congress, and also a quick demo. To give you the business side of NFV, the big problems: first is capex savings. How do you save money through NFV? Next, can you offer value-added services? And next is operational efficiency and savings, right? The biggest challenges when it comes to NFV are that it's a new paradigm of disaggregated hardware and software, and distributed data centers. It's not the conventional cloud data center, the big ones where, for example, you open up in Oregon, where energy is cheap, and you run massive workloads. Here, they're actually in the network. For example, there is a central office in San Francisco which is being virtualized, and those are the types of challenges we face: really distributed, in the network. That means we have various constraints in terms of capacity, energy, and many others. And here is where we see the big opportunity, specifically among the three problems we're talking about: operational savings through policy-based, analytics-driven resource optimization, not static anymore but dynamic. And the other aspect not to forget is the regulatory aspect, which is especially important for energy efficiency.
That said, I want to give a quick overview of the NFV infrastructure policies and resource optimizations that are important to consider. Essentially, what we see is that today the policies in the orchestration layer are static, mostly placement policies: how do you place a virtual network function? But the bigger opportunity is around reactive enforcement policies. For example, today, when you talk about workload consolidation, it's more static and time-based. At this time of the day, when things are quiet, I just turn off a few servers to optimize energy. But what we're looking at is much more dynamic, during different times of the day. That means it has to be analytics-driven. An example would be: perform workload consolidation periodically if average overall utilization falls below x percent. That means it has to be driven by monitoring and analytics, not just a static time-based policy. And while performing such policies, it's important to keep in mind all your HA considerations, as expressed by the other policy: not more than one VNF of the same HA group can be on the same physical server. Now the interesting problem is to translate these policies into an optimization problem, given all the constraints around compute, storage, network, and energy efficiency, and solve it. What we see is that although linear optimization techniques can typically be used, nonlinear optimization techniques are also needed for certain cases. I also want to point out that all this work is one of the key work items being pursued in the IRTF NFV research group. There is a talk on it; please don't miss it. And I'd like to hand it over to Tim for a deeper dive into the OpenStack Congress architecture and implementation. Great, thanks, Ramki. Okay, I work on Congress, and as you may or may not know, Congress is all about policy.
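The two example policies mentioned here (consolidate when average utilization drops below a threshold, and never co-locate members of the same HA group) can be sketched in a few lines. This is a minimal Python illustration and not code from the talk; the function names, the data shapes, and the idea of representing utilization as a fraction between 0 and 1 are all assumptions made for the example.

```python
from statistics import mean


def should_consolidate(utilizations, threshold):
    """Reactive trigger: True when average utilization is below the threshold.

    `utilizations` is a hypothetical list of per-server utilization
    fractions (0.0 to 1.0) gathered by some monitoring service.
    """
    return bool(utilizations) and mean(utilizations) < threshold


def violates_ha(placement, ha_group):
    """HA check: True if two members of one HA group share a host.

    `placement` maps a VM/VNF name to a host name; `ha_group` maps a
    VM/VNF name to its HA group (members without a group are ignored).
    """
    seen = set()
    for vm, host in placement.items():
        group = ha_group.get(vm)
        if group is None:
            continue
        if (group, host) in seen:
            return True
        seen.add((group, host))
    return False
```

A consolidation step would call `should_consolidate` periodically and reject any candidate placement for which `violates_ha` returns True.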
And so when I was chatting with Ramki about this a while back, there were a few things about what he's gone through that really caught my attention about this particular domain of policies. The first is that these policies are naturally expressed as both statements that have to be true, that you cannot ever violate, which are hard constraints, and statements that you just want to avoid violating if you can. You can violate them, but try not to. We call these soft constraints. So that was the first thing: the policies that people write have this combination of hard and soft constraints. The second interesting thing was that the way you enforce these kinds of policies is often simply by migrating VMs around. That's sort of baked into a lot of these kinds of policies. And the third important thing about this kind of policy is that it's the kind of thing you have to continually enforce. Even if you've deployed the VMs properly and policy is satisfied, the workloads are going to change, and eventually you're going to have to re-enforce policy. You're going to have to re-migrate VMs. So these are the three most important things about this domain that I found interesting, and we're going to explain how we've tried to address these kinds of policies within Congress. But in order to do that, I'm going to give you a quick overview of Congress for just a couple of slides. The important thing about Congress is that it's a service just like any other service: Nova, Neutron, or Cinder. Once you've installed it in your data center, you really give it two inputs. The first input is up here already. It shows Congress connecting to all the other services in the data center that you want to write policy about. Here we've connected up to Neutron, Nova, Cinder, and Swift. That list is pretty much random; it just coincides with what images I had sitting around.
Okay, but you can connect to any service that you like. So that's the first input: the services that you want to write policy about. The second input that you give to Congress is a policy. The policy describes how those services ought to behave. So the policy says what should happen, the desired state, and the services allow Congress to understand what actually is happening. And now Congress's job is to make what is happening coincide with what is supposed to happen. This is sort of a standard spiel about policy systems: you tell us what is supposed to happen, and we figure out how to make that happen. There are a couple of design goals for Congress that make it different from many other policy systems. The first is that Congress should be able to accept any service as part of its input. Remember, I showed you Nova, Neutron, Cinder, Swift. That's a great list. We also want all the other OpenStack services to be able to hook up to Congress. We even want non-OpenStack services to be able to talk to Congress. Why? Because every time we go and talk to a user about the kinds of policies they actually want to write, they typically have some proprietary service that they've built, that they'll never tell anyone outside their company about, and they want to use the information contained within that service to write policy. So if we have to support proprietary services, then pretty much we've got to support any service. And that goes hand in hand with (go back one) the second design goal, which is that Congress ought to allow you to express any policy. Now, obviously we want to be able to write policy about compute and about networking and about storage, but we also ought to be able to write policy about how compute and networking and storage all interrelate. We also ought to be able to write policy about those proprietary services.
We also ought to be able to write policy not just about the infrastructure, but about applications. We also ought to be able to write policy about security and cost and business. So when I say any policy, what I really mean is you ought to be able to write a policy about any domain. Those are design goals, keep in mind. I don't think I've ever found anybody who has said those goals are not the right ones. Everybody likes the goals. They always want to find out how. Well, I don't have time to go into the how. If you want to find out more about the how, we've got another intro to Congress tomorrow. But what I will do is give you an example of the kind of policy that we can write in Congress, and then how we would express that in Congress. These are examples drawn from this NFV energy consumption domain. The first policy statement is talking about server utilization. It says that we want the memory utilization of every server to be at least 75 percent of its capacity. So that's a lower bound on the memory utilization. And this is going to be a soft constraint: something that we'd like to be true, but we're not always going to force it to be true. The second statement is that a physical server's memory usage must not exceed its capacity. All right? And this is going to be a hard constraint. We never want to violate this. So this gives you the flavor of the kind of policies that we'll see in this energy consumption use case. And just looking at those English policies, if you think about it for a minute: what kinds of services might you want to consult if you were going to check, just as a person, whether or not this policy is being satisfied?
Well, the services in OpenStack that you'd consult: you need to talk to Nova, because Nova tells you what all the servers are and what their memory capacities are. And you need to talk to something like Ceilometer, because it tells you what the actual memory utilization is at any point in time. So intuitively you need those two services. All right, so now I'm going to show you how you write those policies in Datalog. Now, I always hesitate to do this, because Datalog is what I like to call the assembly language of Congress, okay? But I'll walk through it slowly, more slowly than that. I'll walk through it slowly and just explain it as we go. I think it will be very clear, but keep in mind there's a UI for inputting this, so you don't have to actually write it. All right, so the first thing, remember, was a soft constraint. It said you have to use at least 75% of your memory capacity. What we're going to do is use one of two keywords, warning in this case. There's warning and there's error. Warning means soft constraint, error means hard constraint. So a particular ID, let's think of this as a UUID, is a warning if the following holds. If we go and ask Nova, Nova tells us that this ID corresponds to a host, so this is a physical machine. And we also find out the zone that that machine is located in and the memory capacity for this machine. In this particular case, we only want the policy to apply to particular zones, so there's some other condition that we put in there that says whether or not this zone is the right one. Then we go and ask Ceilometer: well, tell me the memory consumption. Here I'm hiding some details; if you know Ceilometer, it's not quite as simple. Give me the memory utilization for this particular ID, and we'll call that AVG. And then the test here is that the memory utilization, AVG, is less than 75% of the capacity.
All right, so this is pretty much a straightforward translation of the English into some machine-understandable language, right? So that's the soft constraint, and now just reveal the rest of it. The second constraint, remember, in English, was that the memory utilization can't be more than the capacity. So here we're going to use the keyword error. This means it's a hard constraint. Go ahead. The first three of these are exactly the same as above, which I've already talked through. The only difference is that the last statement checks that the average is greater than the memory capacity. Right, because we're writing what the conditions are for warnings and errors here; we're not saying what must actually happen. All right, so now imagine you're in a world in which somebody's giving you this kind of policy. Maybe some of the details are different, right? You don't really care what the bounds are. There may be some structural differences, but you're given this kind of policy, and so the question is how do you want to enforce it? If it were me, what I would want to do is build some special-purpose code that takes this kind of policy as input and says, well, I know that if I migrate VMs around, then I can actually enforce this policy. And so in Congress, part of the reason that we did this work was to understand how we could fit such a special-purpose, domain-specific policy engine into the Congress framework, so that we could have Congress as it is today interacting with and utilizing all the strengths of this domain-specific, special-purpose piece of code.
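The warning and error conditions just described can be mirrored in ordinary code to make their logic concrete. The following is a rough Python sketch, not the Datalog shown on the slides; the `Host` record, the `avg_memory` mapping standing in for Ceilometer data, and the `zones` filter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical stand-in for the host data that Nova reports."""
    id: str
    zone: str
    memory_capacity: float


def warnings(hosts, avg_memory, zones, lower_bound=0.75):
    """Soft constraint: host IDs whose average memory use is below
    `lower_bound` times capacity, restricted to the selected zones."""
    return [
        h.id for h in hosts
        if h.zone in zones and avg_memory[h.id] < lower_bound * h.memory_capacity
    ]


def errors(hosts, avg_memory, zones):
    """Hard constraint: host IDs whose average memory use exceeds capacity."""
    return [
        h.id for h in hosts
        if h.zone in zones and avg_memory[h.id] > h.memory_capacity
    ]
```

As in the talk, both functions state the conditions under which a violation exists; neither says what should be done about it.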
Okay, and so the way that you would set this particular problem up is that you would connect Ceilometer and Nova to Congress, because those are the services you need to give you the information about the state of the data center, for Congress to understand whether or not policy is being obeyed. And then you'll also hook up this special service that, for the sake of this talk, we went ahead and built, and we're calling it a VM migration service. The workflow that we want to see is that the administrator there at the top is going to write this policy that I showed you a moment ago and hand it off to Congress. That's step one. Then Congress is going to look at this policy and say, well, you know what, maybe the admin has given me a whole bunch of policy, remember, about networking and storage, as well as these energy consumption policies. It's going to look at this giant policy that's been given over time, and it's going to pluck out the portion that's relevant, that can be enforced by this domain-specific VM migration policy engine. That's step two: Congress is going to delegate that off to the domain-specific policy engine and say, you go off and enforce it, do the best that you can, I trust you. That's what delegation is, after all. Then that VM migration service is going to be given that Datalog policy, and it's also going to get the data from Nova and from Ceilometer about what the current state of the data center is. Then it's going to figure out: okay, I know all my VMs are here, and if I move them over this way, then in fact I would satisfy this policy. And then it's going to go ahead and migrate those.
So step three is the VM migration engine goes ahead and migrates the VMs to satisfy policy. But of course, remember, what I said early on was that this is a policy that needs to be constantly maintained and enforced. So if, after that VM migration service migrates VMs around to satisfy policy, the workload characteristics change, then that VM migration service needs to re-migrate those VMs. Perhaps it's Ceilometer, maybe it's even Nova, maybe there's a new server or a new guest that's shown up: Ceilometer and Nova give that new information about the state of the data center over to this VM migration service, and then it figures out, okay, now I need to move VMs this way. That's step four, and that's the whole workflow. It turns out that Congress already implements steps one and four, so those are baked in. The interesting thing about what we had to do to make this happen was that we had to implement steps two and three, and I'll just give you a brief overview of what we did there. It's a pretty new bit of work, really just demo-caliber code, so there are some details that we still have to work out, but I'll give you a quick overview.
All right, so here's the first step. Congress needs to somehow know, given this mess of policy, which portion of that policy is relevant to the VM migration service, and then needs to hand it off. So how does Congress know which portion of an arbitrary policy is relevant to a specific VM migration service? The way that we do this right now is that any time you snap in a new policy engine (and by the way, we expect to be able to snap in policy engines just the way we add services into Congress; we ought to be able to hook up any policy engine just like we can hook up any service), that policy engine has to tell us, well, here's the kind of policy that I can handle. For right now there are just a couple of things that we have on there. That policy engine tells us what tables, or what information, is required to be in the policy, and what information or tables are prohibited from being in the policy. And then you can also describe something about the structure of the policy; in this case, as Ramki mentioned earlier, there are some linearity constraints that we're going to put on it. But at the end of the day, that new policy engine that we want to hook up is going to advertise what kind of policy language it can handle. Then Congress, every time the admin changes the policy or inserts a new policy, is going to redo this analysis where it does some matching. It says, well, the VM migration service advertised this, and I can find that match in the policy I've been given, and now I'm going to hand that policy off to the VM migration service.
Now, once that VM migration service has actually been given the policy, the question is what does it do with it? Remember, the inputs are the warning and the error statements, along with the data that tells Congress what's actually happening in the data center. What we ended up doing was we said, well, if we could translate that Datalog policy, along with the data, into a linear program, then it turns out that a linear program can be solved in such a way that it will simply pop out and tell us what the right assignment of virtual machines to hosts actually is. We can think of this as similar to any other domain-specific system, in that the right-hand side of that slide is encapsulating what that domain-specific system is good at. It's built in some specific language that makes it easy to express certain kinds of policies, and in this case we're just assuming that that language looks like a linear program. Once we translate it for a linear solver, then that solver can simply tell us what the right assignment of VMs to servers actually is. And once the VM migration engine has computed the desired assignment of virtual machines to servers, it actually has to do the migration. This is an interesting problem. We didn't spend much time on it, because we know other people who have spent lots and lots of time on it. I was talking to Serba earlier today, and this seems to be their bread and butter. So anybody who wants to work on this, or work on hooking Serba up, that would be great. Lots of interesting issues there.
All right, and so remember, what I said was that from my point of view, from Congress's point of view, one of the interesting reasons to tackle this problem was that we wanted to experiment and understand how to hook up one of these domain-specific policy engines, like this VM migration service, to Congress so that we could do delegation, so that Congress could take a piece of a policy and hand it off to a special-purpose engine. So the question that I wanted to answer was: in some sense the VM migration service and standard Congress look very similar, because they both take what as input? They both take Datalog as input. So what's really different about this VM migration service? The answer is, well, fundamentally it's built to enforce policy simply by migrating VMs. It knows the answer. That's all it will ever be able to do to enforce. And so that makes it domain-specific. What else makes it domain-specific? The fact that it explicitly computes this assignment of VMs to hosts. And it also focuses on linear policy. And the other point is that, using Ceilometer, it's looking at analytics-driven monitoring, not just a static time-based policy. That's another interesting aspect. Uh-huh. Okay. This is what you discussed. Okay. Great.
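To make the "policy plus data in, VM-to-host assignment out" step concrete, here is a small Python sketch. It is not the linear-program translation described in the talk: as a stand-in, it enumerates every possible assignment by brute force, which is only practical for a handful of VMs. The function name, the inputs, and the cost function (count of hosts below the 75% lower bound) are assumptions made for illustration.

```python
from itertools import product


def best_assignment(vm_memory, host_capacity, lower_bound=0.75):
    """Search all VM-to-host assignments (brute force, not an LP solver).

    Hard constraint: memory used on a host never exceeds its capacity.
    Soft constraint: minimize the number of hosts whose used memory is
    below `lower_bound` times capacity (one warning per such host).
    Returns (assignment, warning_count), or (None, None) if no
    assignment satisfies the hard constraint.
    """
    vms = sorted(vm_memory)
    hosts = sorted(host_capacity)
    best, best_cost = None, None
    for choice in product(hosts, repeat=len(vms)):
        used = {h: 0.0 for h in hosts}
        for vm, host in zip(vms, choice):
            used[host] += vm_memory[vm]
        if any(used[h] > host_capacity[h] for h in hosts):
            continue  # violates the hard constraint, discard
        cost = sum(1 for h in hosts if used[h] < lower_bound * host_capacity[h])
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(vms, choice)), cost
    return best, best_cost
```

With three equally sized VMs that together fit on one host, the search packs them onto a single host, leaving one warning for the empty host instead of two for two half-empty ones. A real engine would hand the same constraints to a linear solver rather than enumerate.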
So now I wanted to show that this is all real, and we in fact have an interesting demo to show. So, a quick walk through the demo setup. We've got a question. Microphone. Okay. Having Congress as the policy repository or the policy engine, there is a possibility that some policies could collide. Are you working on that, to avoid collisions between the policies? Okay. Well, I'll answer this one question and then we'll go to the demo, just so we don't run out of time. So it's possible that you could write policies that conflict with one another, so that you couldn't actually satisfy them all, right? It's always possible. But it actually turns out that in Congress it's very difficult to do that today. It's very difficult to write conflicting policies today, because all you get to do is identify the conditions under which there's a violation. There's no way to say this is not a violation. And so you can't really conflict, because if one statement says there's a violation and another one says there's a violation, it's just a violation in both cases. So there's not a clear way of defining conflicts, at least today. But let's follow up after we do the demo. Yeah. Excellent. There are many issues that are going to appear over time. For instance, we need to be aware of the power supply of the chassis of the computers. We need to be aware of it because we don't want to deploy the same VM, or the same number of VMs, on the same power supply. So today OpenStack or Congress is not aware of those things. That's exactly right. I mean, basically, I think there are so many considerations; not just that, even network bandwidth could be a consideration. All of that is part of the research work we're doing in the IRTF NFV research group, which is actively progressing.
And what we're hoping is to actually demonstrate specific aspects through real-world implementations, not just, you know, basically write stuff. So if you really don't mind, we'd like to go through the demo; hold on to your question, please. Thank you. So where are we? Basically, we have two servers, host 100 and host 101, with three VMs. The first one has VM 201 and the other one has VM 200 and 202, connected to an L2 switch. This is the current state, shown with Nova commands. Then this is showing it on the other server. So basically, for each VM, it's showing which server it belongs to. And now, when you apply policy... So essentially we have it really running in a system, and we wanted to show it there. Hopefully. Yeah, it's not showing. Interesting. So yeah, it's a real demo we wanted to show, but I think somehow the VNC session is not showing up. So we saw the initial state. Once you apply the policy, and this is the policy which Tim described in detail before, right? The memory utilization. And then at the end of it, this describes the Ceilometer memory consumption and everything, what the current state is. And then you run the policy, which is what we really wanted to show. So essentially these two VMs migrate to the other server: 200 and 202 migrate to host 100. Right. So both servers are underutilized in the initial state. Correct. The system figures out, well, if I move all the VMs to one of the servers, then that server is no longer underutilized. So there are fewer violations of policy. And potentially the next step could be to even power off the other server. That's a possibility. That's the key idea here. And this is showing the final state, right? VMs 200, 201, and 202 are all on the same server. Right. And that's where we are.
So, having finished the demo, what we see is that, especially for NFV, energy consumption policies are about optimizing the soft constraints we talked about. There are many soft constraints, which span multiple subsystems: compute, storage, network, energy, you name it; even security could be in there. But also, like Tim pointed out, there are certain hard constraints which are built in, and you cannot violate them. And Congress is an effective tool for delegating such policies. We showed the VM migration policy engine as a specific example, but it's open to other possibilities. The end result is reduced energy consumption, reducing opex and also addressing the regulatory requirements of operators. That's the business angle to it. And we have related talks. Do you want to go over the Congress ones, Tim? Oh sure, yeah. So at 4:30 tonight we have a hands-on lab, if you want to actually type some commands, copy and paste commands, use the GUI. We also have, tomorrow at roughly three o'clock, an introduction to Congress. Remember I mentioned this earlier; I didn't go into many details, so if you want to find out more about how we achieve any service and any policy, come to that. And then after that we've got a working session in the evening; you're welcome to come to that as well. And we have another talk on the NFV research group, which I mentioned, tomorrow. This is about how it ties open source, open standards, and research all together. Please join us tomorrow.
Okay, now we have plenty of time. Yeah, 10 minutes. Yeah, the floor is open; if you don't mind, use the mic. Okay, I'll repeat your question. Well, I guess what I would say is the question we're trying to answer is: how would you hook up something like DRS to Congress, so that in Congress you can write any policy that you want, and yet we can utilize all the power of enforcement that DRS provides? And so we sort of mocked in a version of DRS just to make it... And I'd also like to highlight another point, static versus dynamic. We're focused more on the dynamic policies, analytics-driven, monitoring-driven, not just the static ones which exist today.
The question was (I forgot to repeat your first one, by the way): we've given an example where we're doing vertical scaling, and could we connect Heat to do some sort of scale-out, for horizontal scaling? So yeah, there's nothing stopping you. What I said earlier was that we ought to be able to hook up any service to Congress, so you can imagine hooking up Heat to Congress as well. The thing I always wonder about with scaling, though, is that people typically build these scaling features into something like Heat. Does Heat have scaling today? Yeah? Okay. So what I'd like to do is be able to delegate to Heat and say, you take care of scaling. So yeah, we could imagine doing that, sure. That's exactly what we're talking about. One of the benefits that Congress brings to any service like a Heat autoscaler, or DRS, or this thing that we built, is that typically when you build one of those domain-specific engines, you don't have access to all the data in the data center. So it's very difficult to write the policy that you'd like, because you don't have access to that data. You can't say "if there's a Nova server", because you're not actually talking to Nova, or to get that data you have to set up these calls directly into Nova. So Congress sort of brings this namespace. It allows you to write policy about users and groups and applications, and yet at the same time use the current functionality of Heat or DRS to actually do the enforcement. We can hide the fact that the policy was originally expressed in terms of users and applications, and simply tell Heat, well, here's the information that you can accept and need in order to do that autoscaling. And our goal is also to bring in the monitoring aspect as a service, basically, since workloads change over time at a VM level or at a network function level. That's our goal too, besides the policy. Yeah, it works nicely in close cooperation with Heat. That's exactly what our value is.
Okay, another question? Right. Okay, so for those on the videotape, I'll boil the question down. The question that I took away from that was: policies can become complex, and what we really want is a bunch of nice tools to help users understand those policies, where they're dead, where they're alive, and so on. Right? Yeah. Okay, so the nice thing, the reason that you use a policy language as opposed to a programming language, is that you can build such tools. The fundamental difference between a programming language and a policy language is that a policy language is less expressive than a programming language. And the reason you do that is so that you can build more powerful tools to do deep analysis. I guess I need to know more about exactly what you mean by dead, but at the end of the day, yes, we can build tools, and if anyone wants to work on tools, I would love to help.
My question relates to the gentleman's comment about conflicts. When you expressed the policy, it described either a warning or an error, but it didn't say what you should do to remediate that, right? In your example, the migration service migrated the VM to correct the warning without violating the errors. If there was another, let's say, DRS service listening and seeing the same policy, what if it starts taking an action that does conflict? The conflict happens in the action, essentially. The migration service and another service will see a policy expressed, and they both decide they want to do something about it. Since it's not expressed what the action should be, how do you resolve that kind of problem?
Yeah, so this is a good question. This is a deep question. Just imagine Congress eventually: you give Congress this policy, and it knows what it's supposed to make happen. Maybe it decides to move VMs around, and then you've got some other service sitting in the data center, and it decides to move them back. So you get this ping-pong effect going on. If your question is how do we stop that, I think in general the answer is, well, you can't, because Congress doesn't control everything. What we're trying to understand here is how Congress can help coordinate and collaborate with these other engines. And what you bring up is a good point, which is: maybe you've actually delegated some policy to one engine and a different policy to another engine, and what they're trying to do to enforce those policies is actually fighting with each other. To give you an architectural view of this: essentially you have two apps which can generate conflicting policies, but you don't know whether the conflicts are really there. That means those apps have to be communicating their policies and making sure things are in a consistent state overall, as part of the entire stacked architecture at different levels. The conflicts could be at a compute subsystem, but you have a global policy which it is consulting. That means if these are all operating independently, without communicating, you're in serious trouble. That is one of the things we're also looking at as part of the research topic in energy.
I guess another way to ask the question more directly is: do you think that the action for remediation should be expressed somehow in the policy, or on the side of the policy? Maybe. Yeah, so I think this is an interesting question, and I think there are going to be cases where we want to allow people to say, here's how you do the remediation; I have a strong opinion about how this remediation should be done; here is the action to take. At the same time, in the context of delegation, it's not clear that that would solve the problem. So initially, in the context of delegation, which is what we're talking about here, I would say: don't hook up two services that are going to fight with each other, right? In the short term, that's got to be the right solution. In the long term, what you can imagine is a world in which that advertisement, the one the VM migration service publishes to Congress to say here's the kind of policy I can accept, would also include enough information about the kinds of actions it's going to take to remediate policy. Then at least the system could say, well, look, this one says it's going to try to migrate VMs in this crazy way, and this one says it's going to migrate them in this other crazy way, so maybe it's clearly the case that we shouldn't be hooking these two services up together, or at least we shouldn't be delegating to them at the same time. We ought to choose one or the other. So that would be my long-term answer. Okay, thank you.
So that was a cool question. Are we out of time? Are we still good? One question. The hook was removed. So that was a cool question. One possibility is that, since the Congress language is an extension of Datalog, which is a subset of first-order logic, if you gather up all the advertisements from all the different domain-specific services, use an ontology, and do some first-order logic analysis, you can probably find the services that are going to conflict with each other. That might not solve your problem, but it at least gives you a jump forward. Yeah, exactly. I think the policy engines, as you said, have to be communicating. That's the real way of solving it. If they're not communicating, then you are in trouble. Yeah, because you don't want this to degenerate into a set of imperative "if this, then this" rules, right? That defeats the whole purpose of this thing. So that's one way to stay declarative but still try to work within the bounds of the systems, maybe. All right, great. Yeah, and that's great. And part of the reason to choose Datalog is that there are tools, solvers, a bunch of research and development that's been done to help us do that kind of analysis. So we don't have to reinvent that from scratch; we import it, and we just have to use it correctly. Exactly, exactly. Well, great. Thanks, Saul. Thank you.