Alright, the clock just rolled over to 14:00, so I guess everyone who's still at lunch will have to suffer the consequences. I'm Philip Tricca. I work for Citrix, and I'm talking today about the title up here, using Xen mechanisms to strengthen guest separation.

I've only been with Citrix for about two years now. I came over to work on a product that ended up being called XenClient XT, a variant of XenClient that has some interesting security properties. A lot of those properties come from Xen itself; they don't necessarily come from anything that's specific to XT. XT takes full advantage of the things Xen does well by default. The strong isolation properties of the hypervisor are very nice, its ability to use the hardware mechanisms that are available to provide those isolation guarantees is great, and it doesn't try to do too much. It is very thin where it can be: it doesn't put a whole lot of stuff into the hypervisor, and it offloads that kind of complexity into the virtual machines that support it.

These are things I'm sure you've heard a lot about over the course of this conference, and even over at LinuxCon. George gave a great talk about protecting Xen systems, so he's covered a lot of this: drivers, and our ability to keep them out of the hypervisor and either put them in dom0 or in further segregated virtual machines outside of dom0; QEMU, which is of particular importance to us and relevant to this talk; and also inter-VM communication mechanisms, because obviously the value added is when things actually talk to each other. Anyone who's done security work, specifically in the government space, will know that requirement number one is to make sure that data never moves between these things, and requirement number two is to make sure that they can communicate and do something useful. Those two are obviously at odds, as these things always are in security. Xen has very interesting properties that allow us to do this in a way where we have a policy that actually defines how these things can communicate. Having a formally stated security policy is very important to XenClient XT, and I think to Xen and secure systems in general, so I'll make a particularly strong point throughout this talk about the concept of policy semantics, and about preserving the semantics of the policy across changes to the system.

Any time we move things out of the hypervisor, or keep them out by design, they generally end up in user space, either in a Linux system or, if you're on the more cutting edge, in one of these functional-language systems we're seeing that remove the operating system entirely. We haven't gotten that far, believe it or not; that's a little more researchy than what we're doing right now. The user space we're dealing with is the standard Linux stuff. So any time we have resources that reside in dom0 and act on behalf of a guest, we're interested in separating them and protecting them from each other as much as possible. Specifically, the work we've done is relevant to the architecture known as sVirt, to its implementation as it lives in XenClient XT, and to the way it can be used in other Xen-based systems. This isn't anything specific to XT.
This is something we're trying to get used more widely in Xen, and it's specific to the QEMU instances; that's the work we've done. Now, this architecture is actually pretty interesting in that it can be reused to solve other kinds of problems, and so ongoing work we have right now applies a similar architecture at a different policy layer, in the realm of multi-tenant orchestration. There we have mutually distrusting guest VMs living on the same Xen platform, which is nothing novel; it happens in the cloud today. But we're interested in providing them better security guarantees and better separation guarantees. So, roughly, this would be an sVirt-like approach to multi-tenant management.

The last part of the talk will be somewhat speculative, and will hopefully encourage the community to take a specific path with regard to inter-VM communication mechanisms. In XT we had to solve this problem a long time ago, since we'd disaggregated the system, and the implementation predates me. But V4V, which was the name of the implementation of our IVC mechanism, had some very nice policy semantics that I was a big fan of, and there was a recommendation over the summer for an IVC mechanism that uses the front/back driver model to achieve the same goals without introducing new mechanisms into the hypervisor itself, using the existing stuff that's there. I have some thoughts on that, and I'll pontificate a bit towards the end.

A word of caution: I'm going to bounce around a lot between these kind of high-level architectures and then drill down pretty quickly into somewhat esoteric discussions of policy semantics and security. So if you have any questions, feel free to interrupt; honestly, taking them inline is a lot easier, I think, than taking them at the end, and context-driven questions are fine by me. If you throw a hand up in the middle of the talk, I'll stop and entertain whatever you'd like to ask. References are provided inline and at the end as best I could manage; there are tons of links and tons of information out there.

I think this is particularly relevant in that we're starting to see, as Lars was saying before he jumped out to go run the BoF, Xen getting used in some pretty interesting things, from automotive systems to all these embedded devices, all the way up to the cloud. Security, and separation specifically, ends up being very relevant in these systems, especially for safety-critical stuff like automobiles, and for devices that end up representing the multiple personalities we carry around for work and for personal life. And of course the multi-tenant orchestration stuff is relevant to the cloud. So as much as I possibly can, I'm going to rain on the cloud parade a bit with some security esotericism, and everyone should feel free to roll their eyes as they wish.

Just to get some conventions out of the way: I have animated slides throughout this. The hardware is obviously at the bottom of the stack, and everyone seems to represent these things in a similar way, so it's probably quite familiar. But for people in the future watching this on the interwebs: Xen and dom0 are largely part of the platform, and on these diagrams I'm adopting a green color for that.
There'll be other boxes that pop up outside of dom0; those are other virtual machines. Generally I'll pick one color for VMs that represent the interests of the same organization, and different colors for ones that represent the interests of different organizations, with superscripts and subscripts accordingly. Boxes inside another VM are used for processes that reside in that VM, and I'll use similar colors to tie them to the virtual machines they're acting for. Arrows, of course, are communication channels; that's pretty straightforward.

This is probably the most important convention to pick up: the white boxes with dashed borders, and the labels ending in _t, denote a type. That's SELinux-speak for a specific security container, so the boxes drawn around processes are meant to denote the SELinux domains that confine what those processes can and can't do within dom0. Not everything's drawn to scale, so sometimes you'll see dom0 drawn very small, and other platform VMs drawn at a similar scale. This here is an NDVM, a network driver VM: a virtual machine that we've separated the network drivers out into so that they're no longer in dom0. You'll see collections of VMs pop up above it, with arrows for communication channels, so an arrow down to the network driver VM is a network path. Pretty straightforward. And when you see these same conventions with boxes drawn around virtual machines: the FLASK language used to describe SELinux policies is also used to describe XSM policies, and at that scope the boxes drawn around virtual machines represent the containers that confine the virtual machines themselves.

As for sVirt itself, it's an architecture that came about quite some time ago, and it came out of the observation that the things the Xen community has done very well to keep the hypervisor thin still need to be done somewhere else. QEMU is used to provide the device model and a whole bunch of emulated devices, which are very large and complicated pieces of code, and QEMU does that exceedingly well. The continual problem, though, is that lots of code means the potential for more bugs, and anyone who was at Jason's talk this morning saw that he went out of his way to find valid references for the speculation about the possibility of bugs; they exist. One thing we're particularly concerned about is that, as we have different virtual machines on the system representing the interests of different organizations, we want to make sure there is no way for one to interfere with the other, or that the possibility is as low as possible, and QEMU is probably the most significant attack surface on the system.

You can use SELinux policy to wrap QEMU, and it's actually quite easy; there's a readily available QEMU policy. As the toolstack forks off QEMU instances, they land in the standard SELinux type, and they have disks associated with them. The emulated disk path is generally only used at boot, and you move over to PV drivers afterwards, but the disk still exists in dom0, the QEMU instance still exists in dom0, and we similarly draw a boundary around the blktap devices.
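To make that concrete, here is a minimal sketch of the coarse arrangement just described, using sVirt-style type names (svirt_t for the QEMU processes, svirt_image_t for their disk images). The names and the single allow rule are illustrative rather than copied from any particular shipping policy:

    # Sketch: one type for every QEMU process, one type for every guest disk.
    type svirt_t;        # domain type that confines QEMU in dom0
    type svirt_image_t;  # type carried by guest disk images / blktap devices

    # A single type rule authorizes any QEMU instance to touch any image.
    allow svirt_t svirt_image_t:file { read write };

    # Resulting labels (illustrative):
    #   QEMU for VM A : system_u:system_r:svirt_t:s0
    #   QEMU for VM B : system_u:system_r:svirt_t:s0
    #   disk of VM A  : system_u:object_r:svirt_image_t:s0
    #   disk of VM B  : system_u:object_r:svirt_image_t:s0
    # Nothing in the type system distinguishes A's QEMU from B's disk.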
Now, this makes perfect sense: we have QEMU able to talk to its disk. But the policy granularity is still wrong, in that if something like an exploit gets from one VM into its QEMU, that QEMU can still read the disk of the other virtual machine, which is considerably problematic. The way the type rules work, this is perfectly allowed: one QEMU instance can talk to its own disk, and without additional precautions it's still able to read the other disk. That was a particular problem for us. What we want is a one-to-one mapping from each QEMU instance to the disk that belongs to its virtual machine, limiting access to the appropriate resources.

So we use a component of the SELinux policy called MCS, which stands for multi-category security. And we don't actually augment the toolstack; we provide an inline mechanism for breaking the QEMU instances off into separate domains such that they have random MCS categories assigned to them. The randomness isn't really important, and I don't want that to confuse the issue; what matters is that the categories are different, and that each QEMU gets its own unique category. It's perfectly valid to start from zero and count up, so long as you can guarantee the categories don't overlap. We went the random route just because I was feeling crafty that day and wanted to read something from /dev/random.

The next two steps are pretty obvious: we want the tap disks, the device interfaces for those disks, to reside in domains that are similarly separate from each other, and as long as you assign the appropriate category to each, the multi-category component of the system keeps them from interacting with a disk that does not belong to them. That's simply a property of multi-category security, by definition. MCS is an artifact of policy that comes from traditional military systems. I'm sure a lot of people have heard of multi-level security; MCS is a way, internal to one of those hierarchical classification levels, to provide divisions between individual containers in a non-hierarchical way. It's much more generic and much more reusable than something like multi-level security, which is a hierarchical policy that doesn't make much sense outside of very specific uses. And in XT we've made a specific point of not implementing things like MLS that are specific to one group of people; we want to make things as general and reusable as possible.

As for the details, the SELinux community actually worked this out quite some time ago. The spec was written by James Morris and a whole bunch of folks from the kernel and SELinux communities back in 2008, and the requirements are extremely well documented up on the SELinux project wiki, so if anyone feels like taking a look at this stuff, it's a good read. The initial implementation was outside of libvirt, but it was ported into libvirt and became the driver behind their security driver interface, so there's an SELinux driver for libvirt that provides this type of thing. For XT itself, we take a pretty embedded approach to building these systems: our dom0 is built from OpenEmbedded and it's pretty small, and libvirt was a little bit big for our purposes; it pulls in a lot of stuff we don't particularly need.
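Before getting into how we wired this into XT specifically, here is a sketch of what the MCS arrangement from a moment ago looks like. The category numbers are the kind of thing our interposed wrapper, or libvirt's sVirt driver, would pick; they are illustrative, as is the simplified constraint, since the real MCS constraints in the reference policy cover more classes and permissions than shown here:

    # Each QEMU instance gets a category pair that is unique on the host,
    # and its guest's disk image is labeled to match (the values are
    # arbitrary; uniqueness is what matters):
    #   QEMU for VM A : system_u:system_r:svirt_t:s0:c42,c127
    #   disk of VM A  : system_u:object_r:svirt_image_t:s0:c42,c127
    #   QEMU for VM B : system_u:system_r:svirt_t:s0:c8,c309
    #   disk of VM B  : system_u:object_r:svirt_image_t:s0:c8,c309

    # Enforcement comes from MCS constraints of roughly this shape in the
    # base policy: the subject's category set must dominate the object's.
    mlsconstrain file { read write } ( h1 dom h2 );

    # The type rule from the previous sketch is unchanged, so A's QEMU still
    # passes the type check against B's image, but it fails the category
    # dominance check, and the access is denied.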
We also have a toolstack that we've been working on for quite some time, and it's written in Haskell, which I don't know, so I took the somewhat hacky approach of having a standalone binary that we interposed between the toolstack and QEMU. It actually worked well enough that we never changed it. The nice thing there is that it's not an implementation locked up in our toolstack: you can pick it up and throw it on a Xen system you build from source, and you just have to do a little tweaking to put the interposed binary in the right place, and of course have SELinux working, which is kind of the easy part, right? It ends up being pretty minimally invasive. That said, using the libvirt implementation would be interesting, and there has been some talk about using libvirt on more mainstream Xen systems than XT, so the ways this can be reused are something we're really interested in.

It also addresses a problem similar to one I'm sure folks in the audience are already thinking about: stub domains, and how these two mechanisms relate to each other. Stubdoms are just another mechanism for confining QEMU instances, but they're generally more heavyweight; they require a totally separate virtual machine on Xen dedicated specifically to that. That can end up being a non-starter for smaller systems: on a laptop you may have a limited amount of RAM, so you may not be able to spin up a stubdom, and on an embedded system, or even in the mobile context, it may be a non-starter as well. So having QEMU still kicking around in dom0 is still a reality, and we think this is a very relevant implementation.

The code's actually up on GitHub. I've written a pretty thorough analysis of the way the MCS architecture works using a logical framework developed by Dr. Susan Older and Shiu-Kai Chin at Syracuse University, and that logic is kind of cool, so it's an interesting thing to take a look at if that's your thing. Like I said, the code's on GitHub, there's another written analysis specifically of sVirt, and you can get it working on open-source Xen. So it may be an interesting exercise to look at what it would take to put this into the upstream toolstacks; unfortunately, there's a whole bunch of them competing for attention, so choosing the right one is at your discretion.

Having used this architecture in XT for QEMU, it's pretty interesting to think about what it might be used for in other contexts. We have a goal in XT to have virtual machines that can be managed by organizations that are potentially mutually distrusting, so you could have a device with management interfaces doing the bidding of separate backends. This is particularly relevant in the cloud context, where we all use the same interface to these publicly available clouds, and the toolstack does whatever registered users ask of it. That is particularly problematic for the more security-conscious folks: considerations like who owns the device, and whose interests the device actually represents at any one point in time, are particularly relevant. So, back to my diagrams here. This represents a management action in this multi-tenant orchestration, a kind of bubble coming in over the wire telling dom0 to do something.
This is usually an action that happens when a user hits the interface and asks the toolstack to spin up some VMs; maybe it spins up three of them here, and I've broken them apart so you can see them nice and clearly. If this is a good cloud, we'll say, it puts them into a security context. For a lot of organizations that wouldn't be a problem: if you told an organization that all of their virtual machines will reside in the same container, that's probably not a big deal for them, since they all come from the same source. It does leave open the possibility of one virtual machine affecting another, since they're in the same context. You can take a look at the existing upstream FLASK policy for the HVM domain type and look at the self rules; that will tell you everything a VM of that type can do to itself, or to any other VM running in the same context.

But now another bubble comes in over the wire and tells our toolstack to do something; maybe it's a different organization, a different person, and they spin up another collection of virtual machines. As it stands now, these would just be standard HVM types in the XSM policy, which largely means they're in the same security context. That has specific implications if the organizations either don't trust each other explicitly or don't know about each other, and you'd assume that if they don't know each other, they don't trust each other, right? Now, if you told that same organization that things operating in the same security context have the ability to affect each other, it's not so bad if it's my own VMs, but what if it's the other organization's VMs? Or even if they're just individuals. I'm sure most people here have spun up virtual machines in a cloud somewhere; no one ever tells you that the possibility for interference from other virtual machines exists, but it's a very real thing. Our goal is to at least be able to make coherent statements about what the actual security context and the possible interactions are.

Our approach to this problem is actually very similar to what James was talking about during Jason's talk, about the potential for spinning up nested scenarios. Since nesting still isn't really something that's used at large scale, our approach is to use basically sVirt, but instead of in the SELinux context, we're doing it in the XSM context. We have management backends that sit on the end device, and they effectively represent the interests of these different organizations. Bootstrapping this is non-trivial; generally it means you'd have to have a neutral third party involved, like a management engine whose only job is to spin up the management engines for the respective backends. But by assigning these things different categories, the same MCS categories you would use in SELinux, you can keep the management interfaces separate. So one organization spinning up virtual machines through its management interface can spin up the same VMs, but due to the nature of the MCS context, those actually end up being separate from the virtual machines spun up by another organization. Similarly, internally they'd still be able to talk to each other, but you can't have that communication across the groups of virtual machines. That's a very similar end state to what was achieved using sVirt for QEMU, which is something we're particularly fond of.
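Here is a rough sketch of what that might look like expressed in FLASK policy at the XSM layer, assuming MCS-style categories are available there, which is part of what this ongoing work implies. The type names are invented for illustration and this is not the upstream policy; the point is just the shape of the separation:

    # One management type and one guest type, kept coarse as today
    # (names illustrative), with a per-tenant category doing the separation.
    type mgmt_t;      # management backends acting for a tenant
    type domHVM_t;    # HVM guests

    # The type rules stay coarse, like today (permission set simplified):
    allow mgmt_t domHVM_t:domain { create getdomaininfo shutdown };

    # The categories carry the tenant boundary:
    #   management backend, tenant A : system_u:system_r:mgmt_t:s0:c11
    #   guests of tenant A           : system_u:system_r:domHVM_t:s0:c11
    #   management backend, tenant B : system_u:system_r:mgmt_t:s0:c58
    #   guests of tenant B           : system_u:system_r:domHVM_t:s0:c58
    # Guests that share a category can still interact with each other (the
    # self rules on the HVM type still apply within a tenant), but the
    # category check blocks anything crossing from tenant A's group to
    # tenant B's.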
Now, this is obviously not a hard guarantee, but it's probably the strongest guarantee we're able to give with the existing mechanisms. And again, this work is ongoing; we don't have it finalized and shipping in a product, because it's pretty complicated, but this is our end goal.

Similarly to the previous discussion, the inter-VM communication mechanism is something that's near and dear to our hearts, because all of these things have to talk to each other in some way, shape, or form. There was a short discussion back in June, after the attempts to upstream V4V had gone a bit sideways, in which an alternative mechanism was proposed. It's something that seems imminent, and it's probably the right way to do it as far as the community is concerned; that's what the mailing list is all about, and we're perfectly happy with that outcome. The proposed approach, for context, is to use the same front/back driver model that the disk and network drivers use today, but for inter-VM communication, using something like the vsock support that's in the newer Linux kernels. The interesting part is the way these channels get negotiated. It was originally proposed to do this through a rendezvous service, which would probably be a third-party daemon living in a virtual machine separate from dom0, and there was also the possibility of the negotiation being done through XenStore. Using existing mechanisms is a good thing; no doubt about that.

Our particular concern with the IVC model is what the policy looks like in the end state. We were particularly fond of the V4V stuff, and my particular interest was the fact that we got some really nice policy semantics out of it. Since it all lived in the hypervisor, we actually created a new first-class object in Xen with very clear access-control semantics: as a V4V channel was created, it was labeled according to its creator, and any reading and writing to and from it had send/receive semantics, like you would expect from a message-passing interface. That was very clear, and it was very easy to look at the actual policy and say, oh, these two virtual machines can communicate with each other over V4V. That was really nice. My concern here is that the policy for the new IVC mechanism should have similar properties: not necessarily exactly the same, but properties such that you can look at the policy and see the communication.

The comms channel that ends up being set up using the front/back model has actually been a bit of a problem for us in the past, when writing policy to confine or define interactions between frontend and backend device interfaces for disaggregated things like the network driver VM. You can see the creation of things like shared memory and event channels, but the problem is knowing what's actually being done over them. It's pretty easy to infer that there are communications between the netfront and the netback if the netback is in a driver domain, because you can see the communication between the two virtual machines, and generally it's the only communication over a grant and the event channels. It ends up being a bit more difficult to separate things when you have dom0 providing both the net and the block backends. But it actually gives us the opportunity to consider that problem in a larger context when we look at extending XSM for the vsock and IVC mechanisms. Higher-level semantics, sitting just above the grants and the event channels, would be nice.
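To make that contrast in semantics concrete, here is a sketch in FLASK terms. The v4v class and its permission are illustrative of the semantics our V4V labeling gave us, not an upstream class; grant and event are real XSM object classes, but the rules here are simplified:

    # V4V-style: the channel is a first-class object, and the rule reads
    # like the communication it authorizes (class and permission names
    # illustrative).
    allow corp_vm_t personal_vm_t:v4v send;

    # Front/back-style: what the policy actually shows is shared memory and
    # event channels, and the reader has to infer what flows over them.
    allow ndvm_t  guest_t:grant { map_read map_write };  # backend maps the guest's pages
    allow guest_t ndvm_t:event  { create send };         # guest signals the backend
    allow ndvm_t  guest_t:event { create send };         # and vice versa

    # If the same backend domain serves both network and block devices, the
    # same rules cover both, and the policy no longer says which kind of
    # traffic these grants and event channels actually carry.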
Now, there was an interesting discussion about where the negotiation should be done for the channel in the IVC implementation. One issue with the idea of having a third-party daemon outside of dom0 doing the negotiation was that a policy mechanism was discussed, but it wasn't discussed in the context of XSM. Adding new policies is particularly difficult when you want to make assertions about large-scale systems and you want to be able to do it with just one policy representation; having multiple policies is a particularly bad thing, because you have to deconflict them whenever you try to do an analysis. But it seems like the connection management is actually going to land in XenStore anyway, and I think this quote can be attributed to Tim Deegan: the third-party daemon would really just end up being XenStore in a funny hat. That's really good stuff; first off it's funny, and it also ends up being quite true.

Now, this obviously brings the XenStore permissions model into the discussion. That in itself is a kind of hard-coded policy, and it's somewhat discretionary, which ends up being a problem for a mandatory access control policy; a lot of information is lost between the two policies if that's how it ends up. So there's the possibility of using something that's called a user-space object manager. A lot of the work that was done in the run-up to 4.3 was the XSM op hypercall and all the good stuff you can do to talk to the security server that lives in the hypervisor. SELinux has had this for a long time, and it's now available through XSM: a user-space process can make a hypercall and ask the policy server that lives in the hypervisor to make a security decision for it, and it will tell you what the policy says you can do. The user-space component is still responsible for actually enforcing the decision, but that's fine, because it keeps the hypervisor simple and keeps these things very purpose-built.

Apparently, though, this isn't a new idea. I kind of showed up here thinking I'd discovered a really interesting new problem, then had lunch with Jason on Monday, and it turns out this is very well documented on the Xen wiki. It turns out there have actually been mechanisms put into XSM planning ahead for this eventuality, but nothing was ever done in xenstored to make it a reality. Where I'm from, if you discover an old problem, you're kind of obligated to fix it. So if anyone out there knows something about XenStore and wants to teach me a lesson about it: this seems like an interesting problem and something that really should be solved. I've heard a lot of people discuss the XenStore permissions model with a bit of irony in their tone, we'll say, and I know next to nothing about it; I know XenStore in that I've used the commands to read data out of it and put data into it. But it would be very interesting, I think, to map what SELinux has done for the file system onto XenStore, using that kind of hierarchical tree to confine accesses. That's actually a pretty cool thing, if you ask me.
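Purely as speculation about what that could look like, here is a sketch of a hypothetical xenstore object class and rules over labeled paths. Nothing like this exists in xenstored today; the class, permissions, and type names are all made up to show the shape of the idea:

    # Hypothetical object class for nodes in the XenStore tree.
    class xenstore { read write create delete watch chmod }

    # Label the subtree a driver domain exports, and let only its frontends
    # read and watch it; xenstored, acting as a user-space object manager,
    # would ask the security server in the hypervisor (via the XSM op
    # hypercall) whether each access is allowed, then enforce the answer.
    allow guest_t ndvm_backend_xs_t:xenstore { read watch };
    allow ndvm_t  ndvm_backend_xs_t:xenstore { read write create delete chmod };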
So the end goal there would be to have policy semantics that can show when the IVC channel is negotiated through XenStore: you can see someone reading data out of XenStore that was written there by the thing hosting the backend, see them read out the information about how to create the channel, and then see them create the channel using grants. That would be infinitely clearer than simply having the possibility of shared memory between these different domains. And again, it makes the dependencies pretty obvious.

This larger problem of having flexible and loosely coupled systems isn't something that prevents us from having good security, or from making assertions about what can and can't happen in the system. If I've learned anything from working on XenClient, it's that good engineering practices really do enable good security in these larger systems. Things that are disaggregated and have well-defined interfaces are things you can actually write a mandatory access control policy to govern in a meaningful way. There are obviously hazards with regard to separation when you don't do everything in Xen; Xen isn't the final arbiter in every possible scenario. And that's perfectly okay, as long as you're willing to do the work and are conscious of the implications of things that are happening in other VMs. There are plenty of policy and enforcement mechanisms that can give you these guarantees, and the way XSM has been architected (Daniel De Graaf has done some really great work in that area) means you can use the same policy that lives in Xen to make these types of guarantees available further up the stack. It doesn't have to happen in Xen only. So we're particularly interested in what this policy mechanism, or what any policy mechanism, looks like after the addition of new bits to the system. Staying on the mailing list, reading these proposals, and suggesting this type of approach is really the first step in solving the actual, real problem.

So that's the work we've been doing over the past two years, I suppose, as long as I've been with Citrix. If anyone's interested in similar types of problems in a larger context, it doesn't have to be just client stuff; it can be cloud or Xen in general, and I'm perfectly happy to talk. Lars was saying that there's still a BoF slot open, so if anyone wants to chat about it, I'll be around. And I think I've gone over into the question time, but since no one's dragging me off the stage, does anyone have any questions? Great. I'm happy to repeat it. There we go. Thanks, Stefano.

Back to QEMU, and using security policies to separate the QEMU instances: what do you do about the hypercalls that QEMU makes? How do you ensure that the hypercalls it makes can't interfere? Have you restricted what set of hypercalls it can do?

Right. So in this context, with QEMU running in dom0, this is separate from sVirt; this isn't SELinux getting involved anymore, since SELinux deals with Linux kernel objects, not things like hypercalls. The hypercalls would be something that Xen gets involved in, and Xen would know where those hypercalls came from: the label of the domain that made those calls, which would be dom0. And the answer to that question lies in the XSM policy for what the dom0 virtual machine can do, which is a lot.
So owning a QEMU instance and making hypercalls is another particularly interesting problem. For stub domains you can actually rein that in; this wouldn't be something that addresses that problem directly.

So at the moment, are you actually using what XenServer has done, the de-privileged QEMU and the patch to the privcmd driver, to restrict hypercalls?

No, we have not made those changes. I was talking to James about that the other day, actually, and that's a very interesting possibility. Okay. Thanks.

You were talking about wanting higher-level semantics for your security policy about what happens in the connection manager, or XenStore, so that you can see that a channel is being set up. Something I would just want to be careful of is remembering that the mechanism by which domains actually talk to each other exists regardless of this connection manager. Completely. And so you need to express the low-level security policy about what grant tables and event channels can be set up, as well as having some control of the higher-level semantics.

Right. So the XSM policy is a static thing; it really just tells you what a virtual machine, a domain, is allowed to do, so you're not going to see these things at runtime. In XenStore, if you were to implement this kind of user-space mechanism, it would be possible to interpose on it, so those decisions are made at runtime. But like you said, the policy is static, and these things will exist in the policy regardless of whether or not we want to let them happen on the fly, so there's no revocation. And there still isn't a way to link something like a read from XenStore, to get the information needed for setting up a grant or an event channel, to the actual setting up of that channel; those things are still very distinct.

Yeah. So I think the point I'm worried about is that having a policy that stops you using XenStore, or whatever other broker, doesn't stop you setting up a channel. Of course. So this is, again, like I said, an area where I don't know a whole lot about how these things are actually constructed, so it's a very valid concern. If you can just guess the data needed to create the channel, that's still a possibility.

All right. It sounds to me like we should take this to the BoF; I've already run five minutes over into whoever's next and their talk. So I appreciate the discussion, guys, and let's do this offline, I suppose. Thanks.