Okay, so the next speaker is Richard Guy Briggs, who's been involved in Linux security for, what, 20 years? Almost. I first met Richard back in the late 90s. He was working on the FreeS/WAN project, and I had started doing some development on it. That was before we could have IPsec in the kernel, because of the first round of the crypto wars, and that project is actually how I got started in Linux kernel development. So Richard is here to talk about audit, namespaces and containers.
Well, this is encouraging. I figured that the way to clear the room was to announce a talk about audit. All right, so I've been... whoops, don't want that. Excuse me. I guess I'll have to talk for less than five minutes on each slide.
So, a bit of history. I started hacking on computers back in the late 70s, and there's been a steady progression ever since; that's most of the history there. Exposure to a PDP-11/23 got me doing some real programming. Then I got some education, well, some schooling, shall I say; the education came later. Then, as James mentioned, I worked on FreeS/WAN, and after that on some imager drivers for security cameras. I've been working with Red Hat for three and a half years now. Elsewhere I'm known as sunracer, from my solar car racing, as RGB on IRC, and more recently as a diplomatic dependent. That will explain why I might be wandering during the talk: I'm still somewhere over the Atlantic in terms of jet lag.
So what is audit? It was introduced in 2004 by Rik Faith, who was a Red Hatter. It came in just about the time the kernel started using git and keeping those logs. What is it?
It's more or less syslog on steroids. Syslog has a lot of functionality that's used for monitoring what's going on, and a lot of it is used for debugging. Audit's point is more to securely document what's going on, so that if something bad happens you can go back and potentially use these logs in a court of law to say: this happened at this point, and this particular person did that. It works well with SELinux and with the other security modules in the kernel, and the point was to be able to report what's going on, including actions taken by some of the other security tools.
There's a user space daemon, which logs to disk or over the network. Some events are generated in the kernel itself, but others are generated by various user space tools, which contact the kernel and queue messages. There are configurable kernel filters, so you can select what you actually want to see, get more detail on something, or ignore other things. Audit only reports behavior; it doesn't actively interfere with the running system, with one exception: if it's unable to record or report what's going on, you can set things up so that it panics the kernel and stops it.
So what are namespaces? Namespaces are kernel-enforced user space views. You can set things up so that, from a given set of namespaces, user space has a limited view of what's actually going on on the system. Various processes can be compartmentalized that way, and they can't see beyond their own scope. At the moment there are seven different namespaces.
We've got peer namespaces, which include the mount namespace. It was the first one introduced, in 2001, and judging from the name of its flag, CLONE_NEWNS, people at the time seemed to think it would be the only namespace ever introduced. It has expanded since then. The UTS namespace is the one that basically says which host you're running on, and it can also provide domain information. The IPC namespace: as I've seen a few other people suggest, nobody really knows what this does. It was introduced at a similar time, and I'm not sure whether anybody actually uses it now.
Network namespaces are a little easier to understand. Again, they're peer namespaces. The system comes up in the initial network namespace, and all of the physical devices appear there. If you want a new network namespace, you unshare into it, and if you want to use one of the physical devices in that namespace, you assign it from the initial namespace into the new one. Each network namespace has its own completely independent stack, with its own firewalling and devices, and they're completely isolated. If you want them to talk to each other, you can set up a virtual ethernet (veth) pair between two namespaces, treat it as another device, and set up your rules and firewall stacks for the processes there.
In terms of hierarchies, there are three namespaces that are set up as a hierarchy.
So the permissions are inherited from one to another. The PID namespace was the first of those to be introduced. In the initial PID namespace you start at process 1, and in a child namespace the first process becomes process 1 of the new PID namespace. It has PID 1 in the new namespace, but it could have a PID of three thousand or so in the initial namespace. So it has a representation in all of the parent namespaces and can be monitored as such. Peer PID namespaces have no view into each other: if two child PID namespaces are spawned from the same PID namespace, neither can see into its peer's space.
User namespaces are, I guess, the most contentious one so far. There are a lot of security traps waiting, and not just waiting; some of them have already exposed themselves. It's the most contentious in terms of how we do security with user namespaces, and as a result a number of distributions have not enabled user namespaces by default yet, because there's still some work to be done to iron out where they're going. With user namespaces, an unprivileged user can spawn a user namespace and, within it, map its users back to an existing user in the parent namespace. So in that user namespace you can have a root user with UID 0 that maps back to an unprivileged user in the parent namespace. That raises the question of how much permission you give that root within the user namespace.
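The UID mapping described above is configured through `/proc/<pid>/uid_map`, where each line holds three numbers: the first ID inside the namespace, the corresponding first ID outside it (typically in the parent namespace), and the length of the range. An unprivileged user with UID 1000 can typically write just `0 1000 1`, so root inside the namespace is UID 1000 outside. The sketch below models only that translation; `parse_uid_map()` and `to_outside_uid()` are hypothetical helper names, not kernel or library functions.

```python
from typing import List, Optional, Tuple

# One extent of a uid_map: (first ID inside, first ID outside, length).
Extent = Tuple[int, int, int]


def parse_uid_map(text: str) -> List[Extent]:
    """Parse uid_map-style text: one 'inside outside length' triple per line."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        inside, outside, length = (int(f) for f in fields)
        extents.append((inside, outside, length))
    return extents


def to_outside_uid(uid: int, extents: List[Extent]) -> Optional[int]:
    """Translate a UID inside the namespace to the outer one; None if unmapped."""
    for inside, outside, length in extents:
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    return None


if __name__ == "__main__":
    # Unprivileged user 1000 maps only itself: "root" inside is really UID 1000.
    single = parse_uid_map("0 1000 1\n")
    print(to_outside_uid(0, single), to_outside_uid(1, single))  # prints: 1000 None
```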
Yeah, so it can be very powerful, but it can also be a trap. I'll get to how audit relates to all this in a moment.
Cgroup namespaces are the newest; they were just introduced this spring. The point of that one is to hide memory limits, sorry, not just memory limits but the system limits of various cgroups. The problem before was that if you were using cgroups within a set of namespaces, you could get an idea of what the system limits were, and the whole point is to hide them. I haven't been following the details, but I know enough to say that it's here, it's been accepted upstream, and things are being ironed out.
I meant to say at the beginning: if you've got questions, by all means raise your hand and I'll try to address them inline. I may defer the answer to a later slide, or to discussion afterwards if it's getting too involved.
So what are containers? Well, we don't really have a hard definition; there seem to be almost as many definitions as there are users. The general consensus is that a container is a combination of kernel namespaces, secure computing (seccomp) and cgroups. The kernel doesn't have any concept of a container, so at the moment it's up to some user space management tool to say "this is a container, and I'm managing it." Okay, cool, so what about the kernel? It doesn't really know about these things. It's a concept.
That's completely in user space, at the moment. From the kernel's perspective, there's some interest in getting a better sense of what a container is for auditing purposes, so that when an event happens you have a better idea of where it happened and what the context was. There are a couple of potential directions: some kind of container ID that the kernel knows about, or tracking all of the namespaces involved in a particular event, so that we can trace back through other logs and say, okay, this set of namespaces was responsible for this action.
So what is the problem? To quote Highlander: there can only be one. At this point there can be only one audit daemon, and it has to live in the initial user and PID namespaces. Kernel rules lock that down: the kernel detects which namespaces you're in and simply gives you an error if you try to run it. So far the mount, UTS and IPC namespaces have caused no issues; they're just not a problem. Things were wide open until 3.7-rc1. There was a limitation with network namespaces, because the kernel simply wasn't listening in them and couldn't respond. Once user namespaces came along, things had to start being locked down because of the obvious security implications of allowing this. The network namespace issue was fixed in 3.14. There was an interest in being able to... let me back up a second.
Network namespaces have been used by many network appliances that run thousands of different network namespaces. An example is a switch where it was more convenient to manage what was going on by compartmentalizing it into a number of namespaces, each with its own firewall stack, IPsec stack and so on. So there wasn't really any good reason to restrict which network namespace audit needed to listen in. The application for that was vsftpd, but I'm getting ahead of myself.
User namespaces, as I've alluded to, raise some security concerns, and there's ongoing work to address them. I've had a number of conversations with Eric Biederman, who is certainly familiar with the issues, as are some of the other security folks. User namespaces have exposed some issues that had been in the mount namespace since the very beginning but weren't really abusable until we started adding other namespaces and other ways of using them. For audit, it seems to make the most sense to tie the audit daemon to the user namespace, and I'll get to that a bit more in a moment.
As for network namespaces: originally the initial network namespace was the only one listening, and there were a number of proposals on how to deal with that. In the end the least complex solution won out for the short term. There were discussions of multiple queues, and of ways to determine whether a socket in one network namespace is equivalent to one in another, but for the short term we just put in a simple patch that allows any network namespace to talk to audit. There were obviously other restrictions for processes in other user or PID namespaces.
It wouldn't work for those anyway, but at least the networking part worked. And it broke existing containers, because the original assumption was that the system would return ECONNREFUSED when audit wasn't available, since the protocol simply wasn't there. Yeah? Yes, to clarify: this wasn't about configuration; these were clients that were simply recording security information.
All right. One of the mechanisms audit uses to decide whether you're allowed to run the daemon is capabilities. Most people have assumed that root is all-powerful on Unix systems, but capabilities can restrict that so root no longer has all the power. There are three audit capabilities right now. CAP_AUDIT_CONTROL is what gives an audit daemon permission to run in that space. CAP_AUDIT_WRITE is the one you're referring to; it allows user space daemons to write a message to the audit log. There's a third one, CAP_AUDIT_READ, but that's not germane to what we're talking about here.
So, in this particular case, in containers: when somebody tried to log in to the container, PAM would try to write an audit user message saying "hi, so-and-so logged in." When it got back ECONNREFUSED, it would say "oh, audit's not configured, I'll just ignore it" and carry on. But once we changed things so that the audit daemon was reachable from any network namespace, the kernel started returning EPERM rather than ECONNREFUSED, and because of that subtle difference PAM got upset and refused the login altogether. So we had to juggle things a bit, and now we actually lie and return ECONNREFUSED in that situation. We're going to have to go back and fix that at a later date.
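The interplay between the kernel's answer and PAM's reaction can be modelled in a few lines. This is a simplified sketch that loosely follows the checks the kernel's `audit_netlink_ok()` made around this time, after the fixes described in the talk: senders in other user namespaces get ECONNREFUSED, control messages need the initial PID namespace plus CAP_AUDIT_CONTROL, and user messages need CAP_AUDIT_WRITE. The `Sender` type and `client_log_login()` are hypothetical stand-ins, not the kernel's or PAM's actual code.

```python
import errno
from dataclasses import dataclass
from typing import Callable, FrozenSet

CONTROL = "control"  # e.g. loading rules or registering as the audit daemon
USER_MSG = "user"    # e.g. a login record sent by PAM


@dataclass(frozen=True)
class Sender:
    """Hypothetical summary of the process sending an audit netlink message."""
    in_init_user_ns: bool
    in_init_pid_ns: bool
    caps: FrozenSet[str]


def audit_netlink_check(sender: Sender, msg_kind: str) -> int:
    """Return 0 if allowed, else a negative errno (simplified model)."""
    if not sender.in_init_user_ns:
        # The deliberate "lie": pretend audit is not configured at all.
        return -errno.ECONNREFUSED
    if msg_kind == CONTROL:
        if not sender.in_init_pid_ns:
            return -errno.EPERM
        return 0 if "CAP_AUDIT_CONTROL" in sender.caps else -errno.EPERM
    if msg_kind == USER_MSG:
        return 0 if "CAP_AUDIT_WRITE" in sender.caps else -errno.EPERM
    return -errno.EINVAL


def client_log_login(send: Callable[[], int]) -> bool:
    """PAM-like client: return True if the login may proceed."""
    rc = send()
    if rc == 0:
        return True
    if rc == -errno.ECONNREFUSED:
        return True  # "audit's not configured, just ignore it"
    return False     # any other error (e.g. EPERM): refuse the login
```

The model makes the compatibility problem visible: any change that turns the first branch's ECONNREFUSED into EPERM flips `client_log_login()` from allowing the login to refusing it.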
There was a bit of a conflict there, because the change technically broke user space. User space was arguably broken to start with, but the behavior changed, and many of us have a pretty good idea of how Linus feels about breaking user space.
PID namespaces: this was fixed too. The use case here was vsftpd. Again, like PAM or logind, once a security event happens it wants to send that information to audit and say "hi, this event happened, you probably want to log it." vsftpd broke in some distributions, so the question was how to solve the problem. It was running in a PID namespace, which had been completely restricted: because of the hierarchical nature of PID and user namespaces, we had basically said that isn't allowed. But in this particular case it seemed to make sense to allow it, and we didn't see any danger in allowing CAP_AUDIT_WRITE alone. We had to do some cleanup in the code to assure ourselves that reports coming from non-initial PID namespaces would go through correctly, and it was necessary to translate the PID from the process's own PID namespace into a namespace the kernel understood well.
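That translation can be illustrated with a toy model of hierarchical PID namespaces: a task gets one PID in its own namespace and one in every ancestor, it is invisible from peer namespaces, and audit records it using the PIDs seen from the initial namespace, where the audit daemon lives. In the kernel, this is the difference between helpers such as `task_tgid_nr()` (initial namespace) and `task_tgid_vnr()` (the caller's namespace). The classes and the `pid=`/`ppid=` formatting below are only an illustration.

```python
from itertools import count
from typing import Dict, Iterator, Optional


class PidNamespace:
    """Toy hierarchical PID namespace; each level numbers its own tasks."""

    def __init__(self, parent: Optional["PidNamespace"] = None):
        self.parent = parent
        self._next_pid = count(1)  # the first task created here gets PID 1

    def alloc_pid(self) -> int:
        return next(self._next_pid)

    def ancestors(self) -> Iterator["PidNamespace"]:
        """Yield this namespace, then its parent, up to the initial one."""
        ns: Optional[PidNamespace] = self
        while ns is not None:
            yield ns
            ns = ns.parent


class Task:
    def __init__(self, ns: PidNamespace, parent: Optional["Task"] = None):
        self.parent = parent
        # One PID in the task's own namespace and in every ancestor.
        self.pids: Dict[PidNamespace, int] = {
            level: level.alloc_pid() for level in ns.ancestors()
        }

    def pid_in(self, viewer: PidNamespace) -> Optional[int]:
        """PID as seen from `viewer`; None if invisible (e.g. a peer namespace)."""
        return self.pids.get(viewer)


def audit_pid_fields(task: Task, init_ns: PidNamespace) -> str:
    """Report pid and ppid as the initial PID namespace sees them."""
    ppid = task.parent.pid_in(init_ns) if task.parent is not None else 0
    return f"pid={task.pid_in(init_ns)} ppid={ppid}"
```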
Because the audit daemon lives in the initial PID namespace, we simply translate to the initial PID namespace and store it that way. A number of other reports were using the wrong kernel functions or macros, and those were cleaned up and fixed, including the parent PID. Some interfaces also need to report to user space, and we want to make sure that when the time comes, and there are user space tools that can use CAP_AUDIT_CONTROL, they get the correct information rather than something based in a different namespace. So, looking ahead, once we allow CAP_AUDIT_CONTROL in PID namespaces we'll have to go back and fix a few things, but it shouldn't be too difficult. We have a pretty good idea of what's involved; we understand the problem now.
User namespaces: Gao Feng submitted about four patches in 2013 to try to address this. He was also responsible for one of the network namespace patches, which was a bit more complex than necessary. His patches contain some ideas that will be helpful in the future, so we'll be able to borrow some of them and deal with the network namespace when that time comes. With his user namespace patch set, quite a number of questions came up that weren't sufficiently answered. As was alluded to in a previous talk, the community applies good rigor: why are you doing this? What's your use case? Have you thought about the different ways this could be abused?
There was some good discussion back and forth, and it looked like things just weren't quite ready for it yet, notwithstanding the security issues in the user namespace itself.
There was also the idea of an audit namespace. Ideas were thrown around, though no code was ever posted. But in the discussion that ensued, it became more and more clear that we didn't want to muck around with yet another namespace, and that the most logical place to anchor audit was the user namespace. It still requires CAP_AUDIT_CONTROL, and it's not completely clear whether we need CAP_AUDIT_CONTROL in the parent namespace that spawns the user namespace, or whether it should somehow end up in the new user namespace itself. So the "there can only be one" rule isn't entirely broken, because that rule would now apply per user namespace. Well, not now, but in the plans we're making and the direction we're going.
Another really important part of this is that an audit daemon in a user namespace must not be able to influence the audit daemon running in the initial namespace, which is the master one for the machine. The best example I can come up with is the ability to panic the kernel: if a user namespace is configured to panic the kernel, it will obviously take down the entire machine, which is simply unacceptable. It's been pointed out that people who really care about security aren't going to be interested in running audit in namespaces. Perhaps. But other use cases have come up, and some people may be interested in some of audit's services while being quite willing to run containers. A potential use case here is that, rather than panicking the kernel, an auditd running in a user namespace that detects some condition it's unhappy with can simply
kill off that user namespace and all its children. There could be some interesting ways to use this. Each user namespace's audit would have its own rule space. In a hierarchical namespace such as the PID, user or cgroup namespace, if a rule applies in a particular namespace and also exists in the parent namespace, it could trigger a similar message in both namespaces at the same time. But you can set up the rules in the parent namespace so that what goes on inside the user namespace doesn't matter, and so avoid being DoSed. Then, within your own namespace, you can run your audit daemon, keep all the logs you want, monitor what's going on, and be happy with that information.
There's also the question of the queue. There was some discussion about where the queue should live and whether it should be shared. The obvious concern is that overflowing the queue in a child namespace must not overflow the queue for the entire machine; that would be an unacceptable amount of influence for audit in a container. So it looks pretty clear that we'll need a separate queue in each namespace.
So, namespace IDs. Within the kernel there's an interest in tracking what's going on in various containers, but at this point the kernel has no idea what a container is.
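The planned per-user-namespace arrangement (each namespace with its own rules and its own bounded queue, events also checked against the ancestors' rules, and a child's failure action confined to that child) can be sketched as a toy model. It only illustrates the design under discussion and assumes simple predicate rules and a list-based queue; none of these names correspond to kernel code, and the real design was still open at the time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Rule = Callable[[Dict[str, str]], bool]


@dataclass
class AuditNamespace:
    """Toy per-user-namespace audit state: own rules, own bounded queue."""
    name: str
    parent: Optional["AuditNamespace"] = None
    rules: List[Rule] = field(default_factory=list)
    backlog_limit: int = 4
    queue: List[Dict[str, str]] = field(default_factory=list)
    lost: int = 0
    killed: bool = False

    def enqueue(self, record: Dict[str, str]) -> None:
        if len(self.queue) < self.backlog_limit:
            self.queue.append(record)
            return
        self.lost += 1
        if self.parent is not None:
            # A child may never panic the machine; in this sketch its failure
            # action just marks this namespace for teardown.
            self.killed = True
        # The initial namespace would apply its own configured failure mode.


def emit(origin: AuditNamespace, event: Dict[str, str]) -> None:
    """Offer an event to its own namespace and to every ancestor.

    A namespace records the event only if one of *its own* rules matches,
    so rules are never inherited in either direction.
    """
    ns: Optional[AuditNamespace] = origin
    while ns is not None:
        if any(rule(event) for rule in ns.rules):
            ns.enqueue({**event, "origin": origin.name})
        ns = ns.parent
```

In this model, a container that floods its own queue only loses its own records and gets marked for teardown, while the host still sees any event that trips one of the host's rules.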
So there are two potential ways to approach this. The first was proposed about three years ago by Aristeu Rozanski of Red Hat, who suggested using the proc inode numbers to track namespaces. There was some objection in the initial discussion: people reserved the right to change the meaning of that number, and it was felt that the information was incomplete without the device ID. So I came along afterwards and proposed a serial number: a monotonically incremented serial number per kernel, simply incremented as namespaces were created. The intent was to create something that couldn't be argued to interfere with the use, or a potential change of use, of the proc inodes. The proc part was eventually discarded because of the CRIU folks, the... what do you call it, backup and restore for containers? Yes, exactly, thank you, I couldn't remember the name. So it was removed from proc but still made available inside the kernel for audit logging. The CRIU folks didn't like it because if they checkpoint a container, move it to another host and restore it there, that number can't be restored on the new host. It would show up as something different, so a process in the container would be able to detect that it had been moved to a different host. So that number wasn't to be exposed. The same problem still exists with the proc inode, but that's their problem, and I'm not going to compound it. The patch was also reworked for the namespace filesystem, nsfs, which came about a year and a half ago.
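The two identification schemes just described differ in what makes them unique. A per-boot serial number is unique only within one running kernel, while an nsfs-style inode number is only meaningful together with a device ID. The toy model below uses hypothetical names and made-up starting values to show that two hosts hand out identical identifiers, so a namespace restored on another host cannot keep its old number. That is the CRIU objection.

```python
from itertools import count
from typing import NamedTuple, Tuple


class NsKey(NamedTuple):
    """nsfs-style identity: an inode number qualified by its device ID."""
    dev: int
    ino: int


class ToyKernel:
    """Hypothetical kernel handing out both kinds of namespace identifier."""

    def __init__(self, nsfs_dev: int, first_ino: int):
        self._serial = count(1)      # monotonically increasing, per boot
        self._ino = count(first_ino)
        self.nsfs_dev = nsfs_dev

    def create_namespace(self) -> Tuple[int, NsKey]:
        """Return (serial number, (dev, inode)) for a newly created namespace."""
        return next(self._serial), NsKey(self.nsfs_dev, next(self._ino))
```

Because the same numbers recur on every host (and after every reboot), a cross-host orchestrator has to qualify them with the host, which is part of the higher-level tooling discussed in this talk.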
Al Viro pulled the namespace code out into its own filesystem, and the device ID was added to qualify the numbers presented. The idea here is that every event carries the set of namespace IDs involved, and that's basically it. Some user space container orchestration tool would track those IDs from the audit log messages and reassemble the picture afterwards: this thing was created here with this set of namespace IDs, then it was moved over there and had that set of namespace IDs. At a higher layer, the tool would map and track all of this to keep track of what was going on.
The other potential way to go is the idea of a container ID, which would have to be added to the task struct itself. Once a task has a container ID, all of its children inherit it, unless something says a child is now part of a different container. You could then use the container ID in various events, and the user space orchestration tool could simply track that container ID throughout the container's full life. I was reflecting on this just within the last week, and it might actually make sense to have a capability for it, something like CAP_CONTAINER_SET. A container orchestrator would hold that capability to set the container ID on a particular process, and from there all of its children would automatically inherit it. Yeah, when it gets migrated... actually, when it gets migrated you could carry over the same container ID, except that there could be another container on the other host that's
already got it. Punt: that's a user space problem. Okay. Anyway, again, that's a question of how the user space orchestration tool manages that resource; the kernel is simply a helper that lets user space figure this out.
I should have started with the first point on my slide here: there's already a precedent for this, which is the session ID. The security folks requested it to be able to track each login session. Somebody logs into a machine, switches to a different user with su or sudo or something like that, switches around and does various things, and it all traces back to one initial login. The session ID is set when that initial login happens, so all of the activity can be tracked back to the original session ID. So there's already a precedent for this kind of mechanism, and again, the kernel itself doesn't inherently know anything about the session ID. It's a user space concept, and the kernel simply helps user space manage that information. Does that make sense?
There was another question. Yeah, in both cases the audit logs would be aggregated by an orchestrator that works across hosts, because you could have more than one host running containers, you can migrate containers between them, and you want to track audit activity across all of them. That's a matter of developing more tools at a higher level to deal with this information securely.
So, in conclusion: we're okay for a number of the namespaces, and fortunately didn't have to do anything to manage them. The network namespace is okay for now. We anticipate needing some changes once there are user namespaces with multiple audit daemons, in particular if a user namespace can host more than one network namespace. Then it's a matter of getting all of
those network namespaces within the scope of that one user namespace able to talk to its audit daemon. The PID namespace, again, is okay for now. We can receive audit user messages from any PID namespace as long as the process is in the initial user namespace. We'll need translation per PID namespace, but that's reasonably well understood.
The user... sorry, the audit daemon: after some discussion and exploration, we're expecting to anchor it in the user namespace, and we think we understand reasonably well why that needs to happen, although there are still security concerns about user namespaces. The new namespace doesn't inherit the audit daemon? No, no, it doesn't. It depends on the rules in place in the parent namespace. The parent namespace would have its own rule set and its own queue, and if a child user namespace doesn't choose to run an audit daemon, there isn't any reason to care about the information coming from that child namespace. The only reason we'd care about it in the parent is if it violates, triggers or trips a rule that's in that parent namespace. Right, so if it's running in its own mount namespace and does something bad to a file there, and we don't care about that mount namespace, who cares?
Well, the only reason we would care is a violation involving some resource the parent namespace cares about, like a file. But if it's a different mount namespace and that filesystem isn't shared with the parent, who cares? Right, so if there's a rule in the parent that says "I care about this file" and something happens to the file, then we log it. The idea there is that we want more information about the process: we care about this file, and it affected this particular filesystem in the parent namespace, but the activity actually happened in this container over here, and here's the set of container IDs, so we know it happened in that thing over there. So we would still want to log the spawning of new containers, but not necessarily all of the activity that happens inside them. Yeah, I thought I saw another hand waving. No? All right.
Where was I on my slide? Right: namespace IDs versus container IDs. We still have that decision ahead of us. I'm favoring namespace IDs, but that ends up adding eight IDs to each event record. Yeah, potentially. The thing is that, at the moment, we could simplify the IDs down to a couple of digits each, whereas a hash could be significantly longer, so I'm not sure we'd really win anything there, and it's harder to read. With namespace IDs, we're looking at seven namespaces plus the device ID, which could be relatively short integers. Right now the offset for the nsfs inodes is something like, I don't know, I think it's half of max int, so it's not particularly pretty. If we go with container IDs, the orchestration management tool could set them arbitrarily, or they could be assigned incrementally, serially from boot, in which case they would be relatively small. But I guess they could be large if
you've got huge numbers of containers constantly spawning and being killed off. And again, there's the need to develop higher-level orchestration tools to map and track all of the activity that's going on.
So, any questions? Everybody happy? Yeah: you wouldn't inherit rules from the parent namespace. Once you start up a new audit daemon, you would put in place the rules you're interested in or care about. If host B somehow has a different policy in the namespace that spawns the new namespace, it's up to the orchestration tool to manage that. I don't know why the orchestrator would have a different set of rules, but it could. All right, any other questions? Great.
So there we go. I'm rgb@redhat.com. The linux-audit mailing list is the place to discuss this stuff. Once we start getting a bit more serious about it, there may be some spillover into the containers mailing list, and some of it will certainly show up on LKML. There's the URL to subscribe to the mailing list. Any useful comments or patches will be accepted, not rejected, if they look reasonable to the list maintainer. Upstream audit has recently migrated to GitHub, so all of our audit work is there. There are subprojects for the kernel and for user space; user space is still managed in Steve Grubb's SVN for now, but it's migrating to GitHub soon. We also have a documentation repository and a test suite repository. So there it is.