Actually, all right, so I'm showing that it's 11:30. I think we're waiting for a microphone to come in, so be patient; we may start a minute or two late. Okay, we're gonna get started. Our next talk is an introduction to container security. Our speaker is Thomas Cameron. He's a Red Hat solutions architect. So without further ado, I give you Thomas.

Thank you. Thank you very kindly. My name is Thomas Cameron and I am the global solutions architect leader at Red Hat, so I'm responsible for all of our field technical sales staff. All the presales engineers and presales consultants around the world are my internal customers; I try to keep them up to speed on technology and methodologies for helping customers, and things like that. As we discussed before we got started, there's kind of an alphabet soup after my name. I am a technical resource inside of Red Hat. I've been with the company for about ten years, and I have been in the IT industry since 1993, so I've been doing this for a while. But I have certainly learned that the longer I work at Red Hat, the less I feel like I know, because I work with some incredibly smart people. Every time I start to get a little big for my britches, I just go and visit engineering in Westford and come back like, yeah, definitely a Linux admin.

So let's talk about what we're gonna talk about today. Today is going to be a repeat, actually, of a presentation that I did at All Things Open back in October. I found that when I was talking to folks about containers, there were invariably questions that came up about container security, and the stock response is, well, containers can be secure because they take advantage of a lot of functionality in the Linux kernel and some of the features of Linux: kernel namespaces and Security-Enhanced Linux and things like that. And people are like, yeah, that's cool, but what does a kernel namespace do?
I was like: it's a namespace... that the kernel... spaces names. And I realized I sort of understood, at a fairly high level, what kernel namespacing was, but when I started really digging into it, the more people I talked to, the more people I found who were like, yeah, I kind of get it, but I don't really get it. So I decided to come up with a fairly simple overview. We're not gonna dive super deep; this is just the fundamentals of container security.

So I'll talk a little bit about who I am, about what Red Hat has been doing around containerization, and we'll talk about what containers are, how they work, what they aren't, and some of the myths that I have heard from folks who are exploring adopting container technologies. Then we'll talk about the nuts and bolts of container security: all the components that make it up, including kernel namespaces, control groups, a little bit about the Docker daemon and how it helps to keep you secure, and then Linux kernel capabilities and Security-Enhanced Linux. And then I'll go over some tips and tricks, what to do and what not to do, and talk about our conclusions.

So by way of introduction, as I said, I've been with Red Hat for a long time, and I've been in IT since 1993. I actually changed careers in 1993; I was a police officer for several years. I have a pretty strong academic background in security and law enforcement, and I have professional experience in law enforcement. I realized that I couldn't afford to pay my bills as a police officer and went off and became a computer geek. So now I'm a computer geek. I've been with Red Hat since 2005 and got a whole slew of professional certifications. My first job when I changed careers was in a Novell shop, so yeah, I'm dating myself right there, but I was a CNE back in the day, back when that was the thing, right? And then I went to work for Microsoft for a while.
I got better. So I was an MCSE and an MCT, a Microsoft Certified Trainer. So I've been doing this for a long time. I do spend a lot of time focusing on security in organizations like banks, manufacturing companies, e-commerce companies, and things like that. I've helped Red Hat customers for over a decade now working on these security issues. And as I mentioned earlier, man, I totally recognize I don't know everything. I know a lot, I've been exposed to a lot, but again, it's hard to know everything. So, kind of in a nutshell, I'm just a big old nerd.

So let's talk about where Red Hat fits into the container ecosystem. We have actually been working on containerization since before 2010. We made the Makara acquisition: we bought a company that was doing platform as a service back in 2010, and they were using Linux control groups and Security-Enhanced Linux for process isolation. Honestly, this was before containers as we think of them today were really a big thing, but we've been doing this for a while. We're fairly familiar with it; I would say we're expert, because I talk to those guys sometimes and again walk away feeling like an idiot.

So we bought Makara and rebranded it as OpenShift, and so now you may have seen that Red Hat has a platform-as-a-service offering called OpenShift. There are three versions. There's OpenShift Origin, which is the upstream open source version; anyone can download it, compile it, and play with it.
Do whatever; it's kind of the Fedora-land version of OpenShift. Then there is OpenShift Online, which Red Hat has been offering as a service for several years now. You can go to openshift.redhat.com, go through a really simple web UI, say I need this much capacity, I want this application framework and maybe this pre-canned application, and you can have your application up just like that. And then we have OpenShift Enterprise, which is for deployment on a customer premise, behind the firewall.

Now, originally OpenShift used what we called cartridges, and they were units of compute that were basically segregated by Security-Enhanced Linux, by Linux control groups, and by kernel namespaces. In about 2013 Docker really started to hit its stride, started to do really well, and we realized it really made sense for us to change from what we had been doing to what Docker was doing. So we adopted Docker as the technology in our PaaS offering back in 2013. We are a top contributor to Docker; last time I checked, which was back in October, we were the number two contributor to upstream Docker behind only Docker itself. So we've got a fair amount of experience there. And as you know, if you've been to any cloud track at the conference at SCALE, the adoption of Docker is incredible. They have done a phenomenal job.
It's good technology and a good community. The company itself is doing real well; they've been through multiple successful VC rounds, and the laundry list of people who have thrown in behind Docker is amazing. Red Hat is grateful to be on that list as well. And even Microsoft: I was in the Microsoft Azure session yesterday and they're like, yeah, we're gonna be doing Docker containerization on Windows Server. First they ignore you, then they laugh at you, then they fight you, then you win. I'm sorry, I'd like to see them try. The comment was: extend it and make it proprietary.

So let's talk a little bit about what containers are. Now, I'll be talking specifically about Docker, but these concepts should be fairly universal no matter what you're using, whether it's LXC on Ubuntu or whatever. Basically it's a technology that allows for applications, whatever those apps are, whether it's web or database or app server or whatever, to be abstracted from, and in some cases isolated from, the underlying operating system. In the case of Docker, the service can launch containers regardless of the underlying Linux distro, and containers can absolutely enable some amazing application density. Since you don't have the overhead of virtualization, where you've got the full operating system with the same set of libraries getting loaded for every VM, it's really, really lightweight. And when you factor in the architecture, that it's just the bits that you need for the application, and you also take into account the capabilities of Linux control groups, where you have very fine-grained control over how many resources each application or each container is going to get, you can get some phenomenal utilization. And you know, I joke: the same container can run on different versions of Linux. Ubuntu can run on Fedora, and CentOS can run on... well, dogs and cats, human sacrifice, mass hysteria. But it's really
cool. I'm a Red Hat guy; I've been with Red Hat for a long time, and it's just what I'm most comfortable with. I can't tell you how many times I've spun up a container and it's like, oh yeah, the underlying technology on this one was obviously built on Ubuntu. That's cool; I run it on my Fedora box or on my RHEL box and it's no big deal. So it's really, really an awesome technology for application developers. Containers make it easy for app developers to build and deploy apps. It's not really the mass hysteria that we were talking about earlier.

So the question then becomes, well, what are containers not? Because one of the things that does concern me a little bit, and frankly frustrates me a little bit, is this notion of: we're gonna containerize everything, and everything is gonna be awesome, it's gonna be wonderful, and it's gonna solve all the problems in IT. Containers are not a panacea. They are not the cure to all that ails you, and containers are not fit for every application. Not yet; maybe not ever, I don't know. I never say never, except when I say I never say never, because you never know, right? I mean, somebody's gonna come up with something, some huge, massive, awesome container. What we really hope for, when we're thinking about what would be the best case, is for the big software vendors, the big enterprise software vendors, third-party shops like SAP and Oracle and fill in the blank. Wouldn't it be awesome if, instead of getting a DVD ISO image that's got a freakin' install.sh that does all kinds of crazy, non-packaged, weird stuff all over your filesystem that you don't know what it is, they just went: here's a container, go? That would make life so much easier. So, will we ever get there? I don't know.
I hope so, I hope so. And most importantly, containers are not virtualization. I think that, in a very, very simplified way, they are the next logical step in the whole physical-to-virtual, then high-density move, but they're not virtualization. You can run containers on bare metal, on an OS directly on bare metal, or you can run them in VMs. So virtualization is most often a component of an environment that has containers, but the two are actually separate.

So let's talk about container security. There are multiple layers involved in securing your container environment. It's not a simple environment when you think about all the different things that you want to do to harden your system: the system that's running the containers, the containers themselves, the networking stack in front of the containers, how we do network address translation from real public IPs to the internal private network the containers use. There are a whole lot of moving parts there, and there's a lot of opportunity for friction between those moving parts. So let's talk about what all of those little components are and maybe get a good handle on how we can reduce that friction.

So containers use several mechanisms for security: Linux kernel namespaces, which is a really fascinating set of technologies, and I'll show you some examples of that in a little while; Linux control groups for fine-grained resource control, how much memory, how much CPU, how much I/O, things like that; the Docker daemon itself, when you're using Docker as your containerization platform; and a fairly infrequently discussed feature of container security, Linux capabilities, or libcap.
I'll talk a little bit about libcap. It's a pretty cool way of saying: I'm going to limit the capabilities that a process has, even if it's running in the context of root. I'll show you some examples of that. And then there are Linux security mechanisms like SELinux or AppArmor. Now, I'm an SELinux guy, so I will talk about SELinux, but any sort of pluggable security model, like AppArmor or grsecurity or anything like that, could be used.

Now, I'm gonna break out of this for just a second. How many folks disable SELinux in their environments? Bad users! Bad, bad. Go to YouTube and look up "SELinux for Mere Mortals." It's a presentation that I did at Red Hat Summit; it's 45 or 50 minutes long. If you watch that presentation and still shut off SELinux, I'm gonna have to smack you. I'm sorry, I love you; I do it out of love. But go watch "SELinux for Mere Mortals"; it's a one-hour overview of SELinux. Do you have a comment? Yeah, I've actually done it for three years; it's been the top-rated session at Summit for the last three years. So, at the risk of thumping on my chest too terribly much, it's a pretty good presentation. SELinux is, to be clear, an integral and very important part of container security. When you start doing massive scale, when you start doing platform as a service where you're running this stuff across really large environments, don't turn off SELinux. It's really important.

So, like I said, I was having conversations with folks about container security, and it was always just like, oh yeah, well, we do kernel namespacing and that helps with security, and then the conversation would go on. And then one day somebody was asking really specific questions, and I was like: how do I describe what kernel namespaces do?
And so I came up with a couple of ideas. So let's talk about what they are. First, namespaces are just a way to make a global resource appear to be unique and isolated. The namespaces that the Linux kernel can manage include mount namespaces, process ID (PID) namespaces, UNIX Time-Sharing system (UTS) namespaces, interprocess communication (IPC) namespaces, network namespaces, and user namespaces. I've seen several folks taking pictures of slides; you are absolutely welcome to do that, but these slides are posted on the SCALE website, or you can go to my people page, people.redhat.com/tcameron, and download them from there. So those are the different namespaces which can be managed by the kernel; let's talk about what that means.

The first one I want to talk about is mount namespaces. Mount namespaces really just allow a container to think that a directory which is actually mounted from the host OS, an existing directory on the host OS, is the exclusive domain of what's going on in the container. The container is not aware that it's actually a filesystem from outside of that container space, and you could potentially mount it within the container read-write, with root privileges, and so on. You start a container with -v, then the host path (the actual directory), and then where you want it mounted inside the container, so -v /var/www/html:/var/www/html, for instance. You can spin the container up so that it thinks, oh yeah, that's mine, no one else has it, and I'm not even aware of anything else, and then you can flag it as read-write or read-only. And so then it sees that directory in its own namespace, and the cool thing about it is that there's no exclusivity: you can do a mount namespace of one physical directory across multiple containers, and that works really, really well. So, as an example of that. Can y'all see that okay?
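Before the demos, it is worth knowing that all of those namespace memberships are visible from userspace on any modern Linux system, no Docker required. A minimal check (the bracketed inode numbers identify which namespace a process belongs to; two processes in the same namespace show the same inode):

```shell
# Every process has one entry per namespace type under /proc/<pid>/ns.
# Two processes in the same namespace show the same inode number here;
# a containerized process shows different inodes for most entries.
for ns in mnt pid uts ipc net user; do
    printf '%s -> %s\n' "$ns" "$(readlink /proc/self/ns/$ns)"
done
```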
Okay, so as an example of that, I'm logged on to my laptop, my T540, and I cat /var/www/html/index.html, and I see a silly web page, right? That's the actual file that exists on my filesystem. So I do docker run -v /var/www/html:/var/www/html to mount that within the container, and I fire up a Fedora container and execute the command bash. Then you notice that my prompt changes from being root on my T540 to being root within my container, and if I cat that same file, I see that it's there. In the context of this container, the container is not aware that that's actually not a native filesystem. I ran that container, in fact, from within root's home directory; it's not necessarily a good idea to run stuff as root, but I did it just for the sake of the example. My point is, it abstracts and isolates what's really there and tells the container: no, it's all yours, don't worry about it, don't worry about anyone else on the system.

So I have a pretty good understanding of this, but what would be a security advantage of doing that? This is the audience participation point: why would that be a good thing from a security standpoint? "It can access your file system." Let's be real clear: if I had mounted that RW, it can access your filesystem. I'm sorry, the comment was, well, it can't access your filesystem, and actually it can, but you have control. You can say: nope, read-only. You're not gonna be able to modify it.
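That read-only option is just a suffix on the -v argument. A hedged sketch (the paths and the fedora image are illustrative, and the command is only printed here rather than executed, since actually running it requires a Docker daemon):

```shell
# :ro makes the bind mount read-only inside the container; :rw (the
# default) would let the container modify the host directory.
# Printed, not executed, so this runs without a Docker daemon present.
cmd='docker run --rm -it -v /var/www/html:/var/www/html:ro fedora bash'
echo "$cmd"
```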
Yes, sir? Exactly: you have jailed that container to where you're gonna give it access to one specific directory. It can't go up the filesystem to the root of the filesystem. So from a security standpoint, it's a heck of a lot better doing it this way than trying to do all kinds of crazy bind mounts or weird stuff like that, or chroots. Don't get me wrong, I like chrooting, but this is really strictly locked down. Okay.

All right, so process ID namespaces. A PID namespace really just lets the container think that it is a completely new instance of the operating system. When you start a container on a host, it's gonna get a new process ID on that host, but PID namespaces mean that the container thinks it's in its own process tree, and whatever the command is that you started that thing up with is gonna be process ID 1. Which is a little bit weird, because we're all used to process ID 1 being, you know, init or something like that. But in this case, I'm gonna launch a Fedora container running bash, and I'm gonna run the ps ax command. So I do docker run -it fedora and execute the command bash, and when I do a ps ax inside of the container, it thinks: I am alone, I am my own operating system, I know nothing about anything else. But my process ID 1 is bash, which, as we all know, you couldn't actually boot a system that way. Again, though, from a security standpoint, how awesome is this? That container doesn't see squat on the rest of the system. It's like: I'm all by myself, and I'm safe and secure. And now, over on the host, when I run a ps ax (I actually did ps axf), you can see that my docker command is actually process ID 18557. That's an example of that namespace.
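You can see that illusion from inside any environment without Docker; just ask what process 1 is in the current PID namespace. A minimal sketch:

```shell
# /proc/1/comm holds the command name of PID 1 in the current PID
# namespace. On a host this is typically "systemd" or "init"; inside a
# container started with `docker run -it fedora bash`, it reads "bash".
cat /proc/1/comm

# Count how many processes are visible in this PID namespace. In a fresh
# container this is tiny; on a host it is typically in the hundreds.
ls -d /proc/[0-9]* | wc -l
```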
That's an example of the Linux kernel going: no, no, really, that's process ID 1. You're all by yourself, you cute little thing. You're all by yourself and you're secure, where on the system you may have dozens or hundreds or even thousands of Docker instances. So does everyone understand why that's a really good thing from a security standpoint? The container is not even aware that there are other processes running on the system. That's a beautiful thing from a security standpoint; it's totally isolated. You can't (easily, anyway) do something like a buffer overflow of another process, because you don't even know that the process is there.

User namespaces. When you start a container, assuming you've added your user to the docker group, you start it as your user account. Now, I did an example earlier as root; don't do that, do it as your user. I blew my machine away and put a fresh install on to do the demos, and was stupid and forgot to put a user account on it. But you start Docker as your user account, and in the following example I'm going to start the container as tcameron, because I realized halfway through building my slide deck, oh, I shouldn't be doing this as root, and I added the account. But once a container is started, the user inside the container is root. So in this case, I'm sitting there logged into my console. My ID is tcameron; I'm user ID 1000, a regular user, no root privileges or anything like that. I run docker run -it fedora with a command of bash. I'm still on the same machine, still on the same console, but when I run id inside of the container, I'm omnipotent. I am root.

Now, the cool thing about this is that means that within your container, you can do whatever you need to do, right?
But that's only within your container. If you try to go, oh, I got root on the system, and try to do something else, you're not even aware, inside of the container, that there are other things going on outside of your container. So again, from a security standpoint, there's a bright line between what you can do, even with root privileges, inside of your container: you can't affect the underlying host. Yes, sir?

What happens if you configure... what? You shouldn't be able to. I'm not gonna say you can't, because someone out here will go, "well, actually." But remember that even though you have root privileges... oh, I'm sorry, repeat the question, thank you. The question was: what happens if you trigger a kernel crash within the container? And the reality is, because we're using namespaces, because we've isolated the container, this root account doesn't actually have root privileges on the underlying host. You're not gonna have, for instance, access to the sysrq-trigger proc filesystem entry to trigger a crash. So at most you would be able to crash the application within the container, which is kind of a "restart the container" situation. Does that make sense?
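The sysrq point is easy to check safely from anywhere: reading the sysrq bitmask is harmless, and only writing to the trigger file, which requires CAP_SYS_ADMIN against the host (and a writable mount of it, which Docker withholds by default), would actually crash anything. A minimal sketch:

```shell
# The sysrq bitmask; reading it is harmless. 0 means sysrq is disabled.
cat /proc/sys/kernel/sysrq

# Writing "c" to /proc/sysrq-trigger would panic the kernel, but inside
# a container that write is denied, so we only confirm the file exists.
ls -l /proc/sysrq-trigger
```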
Yes, sir? Yes, so the question is: if I'm root inside of the container, what would happen if I ran a kernel exploit that I found? Well, here's the beautiful thing: you're not really root. You're only root in the context of the container. So in many cases (not all, but many) a lot of the things you can get, like "if you have local access you can crash the kernel," don't necessarily work from user space; in other words, a non-privileged user wouldn't be able to do it. Now, if you did find an exploit that works from user space, a user privilege that can crash the kernel, it will probably take it down if you can execute it, at which point you've got bigger problems.

So the question is: will it take down the container process, or will it affect the host? It depends on the exploit. If it's actually truly a kernel exploit that's going to cause a panic or something like that, you're going to take the host down. Here's the thing: if they have access to your system, whether it's physical access or shell access, it's going to greatly increase your vulnerability surface area, because then they can do fork bombs and all kinds of stupid stuff. The best answer there, unfortunately, is just: keep your systems up to date. Really pay attention to the security mailing lists and keep your systems up to date. I know that's not a perfect answer, but there was another question over here. I'll talk about that in a little while. The question was: can a container be constrained so that one bad actor can't overwhelm the system? And the answer is yes, absolutely, and I'll talk about that in a minute.
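For that constraint question, the mechanism is control groups, and any process can check its own cgroup membership. The docker run flags in the comment below are a hedged sketch (both flags existed in Docker of this era, but the values are made up for illustration):

```shell
# Every process's cgroup membership is recorded here. A containerized
# process shows the container's scope or slice instead of a user session.
cat /proc/self/cgroup

# Hedged sketch of per-container limits via docker run (illustrative
# values; requires a Docker daemon to actually run):
#   docker run --memory 512m --cpu-shares 256 -it fedora bash
```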
All right. So again, the security implications of user namespacing: the main thing is just isolation. Even though you give somebody sort of a virtual root environment, where they have control in their environment, in their container, it isolates those escalated privileges from system-wide access.

All right, network namespaces. Similar concept: network namespaces allow a container to have its own IP address, independent of the host's, and these addresses are not available from outside of the host. So in this case it's private networking, very similar to what you get with libvirt or other virtualization, but in this case the Docker service is smart enough to set up an iptables masquerading rule so the container can get to the rest of the internet. So in this following example, I'm going to spin up a container and run ifconfig, or in this case ip a show. And I did this a little bit backwards: if I use docker inspect and I query for the network settings IP address, I get that it's a 172.17.0.x address. But my laptop wasn't even hooked up to Ethernet; when I do ip address show on my Ethernet interface, there's no IP address at all. So this is a totally separate networking namespace, isolated from the external network. In this case I don't even have an external network, but that's just a virtualized private network that's behind what will eventually be an iptables masquerade rule, so that it can get out to the network. Yes, sir?

The question is: can you do 802.1Q trunking within the network namespace? I have read that there is work being done on it; I don't know if it's complete. But what would the benefit of that be? Why would you... what are you looking to accomplish?
Yeah, I mean, that's a fair point. If you do have that, then you can directly address your containers, and then you don't need Docker to do a lot of stuff. But my response to that would be: Docker has got an entire large community around developing that networking code and the IP masquerading code, and they're actually pretty good at what they do. I think I'm pretty good at what I do, but I guarantee you I'm not nearly as good as all of that community. Yes, sir?

Okay, so the comment is: in a very large-scale environment where you're using, you said, VPN concentrators in containers, OpenVPN endpoints, you started running into limitations with iptables. Okay, that's a fair point. Yeah, Docker may not be the best solution for that. And yeah, the comment is you don't have to use network namespacing; you can actually directly address the Docker containers. That's fair. Was there another question? I don't know. Send me an email at thomas@redhat.com. Yeah, I've been there that long. Send me an email at thomas@redhat.com and I'll find out; I just haven't heard. The next big thing, yeah: the question was, is any work being done to allow Docker to use nftables instead of iptables? And unfortunately I just haven't heard one way or the other. By the way, if you have questions, I'm thomas@redhat.com; I'm really easy to reach.

All right, so the security implications of network namespacing should be relatively obvious: we can segregate those containers, we can keep them off of the network, and we control ingress and egress rules through iptables, so that it's isolated and we know for sure what's going to get in and what's going to get out.

So, IPC namespaces. IPC namespaces are really the same thing, but with interprocess communications. My container doesn't have any IPC mappings, because, well, I just spun it up, right?
If I spin up my container and I do ipcs, there's nothing going on. There are no interprocess communications, because all this is running is bash; it doesn't have any applications that are talking to each other. But on my host, when I run ipcs, you can see I've got zillions of processes that are communicating with each other. Again, that container (I go back a page) thinks it's all by its lonesome, running on its own bare metal. It doesn't have any knowledge of all of those IPCs that are running on the host. So the security implications there: you don't even have interprocess communications exposed to that container. If a bad guy does get control of that container, they don't have the ability to go and start trying to do attacks on those or anything like that. So again, isolation and segregation is not a bad thing.

UTS namespaces, or UNIX Time-Sharing system namespaces, again let the container think it's its own separate OS, with its own host name and its own domain name. On my host, on my laptop, when I run the hostname command, I get my fully qualified domain name: t540p.tc.redhat.com. But then when I fire up my container and I run hostname on that same machine, it's got a randomly generated string for a host name. So again, the container thinks that it's its own machine; it is not aware that it's actually a container running on a host. So again, from an isolation standpoint, if somebody does break into it, they don't get any identifying information about what the underlying host is. You don't want to say, yeah, well, your host name is hostxyz.mycorp.com, because then they're like: now I know where to go attack, right? So, isolation and security through keeping it segregated from the host operating system.

So that is namespacing. Those are the capabilities that the kernel has to segregate out and to isolate what's going on inside of a container process from the
underlying operating system. So let's talk about a different capability, which is control groups. Control groups really just provide a mechanism for aggregating or partitioning sets of tasks, and all of their children, into hierarchical groups with specialized behavior. Really, this allows for various system resources to be bundled into a group, and I can apply limits to that group. So disk I/O, CPU usage, memory usage, network use, all that kind of stuff is contained into one process space. And the cool thing about using control groups: everybody thinks in terms of, oh, well, if I use Linux control groups, I can make sure that the one badly behaved neighbor doesn't take over the whole system. That's absolutely true, and that's certainly a very common use for control groups: to make sure that you don't get some bad person doing something silly, like setting up fork bombs inside of your container and taking the whole machine down. But the flip side is also true. We use control groups for containerization, we also use control groups for virtualization, all kinds of stuff. What it also allows you to do is use your hardware to the absolute maximum. So if I know for sure that I've got a fixed amount of memory, let's say 16 gigs or 32 gigs, whatever, then I know that when I spin up VMs or containers or whatever, and I know what I'm going to be allocating to the control groups around those, I know the specific number of containers I can run on my machine. So instead of this kind of "we're going to throw some more containers on and watch it and see if the machine is okay," I can divide up my system into exactly the number of containers that I know it will support. It allows me to do forecasting, it allows me to do scheduling, it allows me to know when I'm going to run out
of capacity, when I need to buy new servers, and things like that. So control groups are pretty awesome. And again, they ensure that if a container is compromised, or just has poorly written code, like somebody does something that gets into a race condition or something like that, there are limits in place which minimize the risk that that misbehaving container is going to hurt the rest of the host.

Notice that when I run the command systemctl status docker.service, I get the control group and slice information. So you can see that when I've got a container running on my machine, it's actually got its own scope under the docker service in the system slice. Now, by default, when you spin this stuff up, we do put it in its own control group, but we don't put any limits on it. Actually manipulating control groups, I could spend like eight hours on that; it's a fairly complex topic, and I'm not going to get into all of the nuts and bolts and the nitty-gritty of doing it. But be aware that, at least on Red Hat based systems, we automatically assign each instance to its own control group, and if you do start needing to get really restrictive about how they access the system, you can go in and put limits in place. You can look in the /sys/fs/cgroup pseudo-directory to see what the resources are that are allocated to your containers. Now, there are like 8,500 entries in that directory, so again, unfortunately, it's not practical for me to try to go into each of them, but you can get information about memory, CPU, block I/O, network I/O and so on in there. So if I go and look in /sys/fs/cgroup and I do a find | wc -l, there are like 8,500, almost 8,600 entries.

So the next component in the container strategy that I want to talk about is the Docker daemon itself. The Docker daemon is really responsible for managing control groups, orchestrating those namespaces, and all of those other things that I've talked about, so that the Docker
images can be run and secured. Because of the need to manage kernel functions, Docker itself, and by that I mean the Docker daemon, runs with root privileges. Be aware of that. But it's a pretty secure environment. No code is perfect, and there will be an exploit at some point; that's just the nature of the war between bad guys and good guys. But it's pretty safe. So there are a couple of considerations when you're running Docker. Obviously you don't want to allow someone access to your system that you don't trust. You want to make sure that you've got some sort of way to vet the folks who are going to be spinning up containers in your environment. The documentation recommends that you add users to the docker group so that they can run the docker commands, but with that flexibility does come some risk. Make sure that you only delegate this ability to trusted users, and remember that they can mount host file systems in their container with potentially root privileges. Also, everything that I've talked about, and everything that Docker does, can also be done via a REST API. So I really recommend that you keep updated versions of your Docker environment, your management environment, the Docker daemon and so on. Make sure you keep that up to date. I can't stress this enough: almost every single exploit that you see that makes the news, where everyone's like, oh my gosh, that was bad, comes from two sources. Someone being stupid and plugging in a USB drive at Target that they found in the parking lot, or, more often, somebody taking advantage of outdated and insecure code that's network facing. The one about the guy picking the USB drive up and plugging it into the computer, we can't help that. I swear, who's the comedian that says you can't fix stupid? I kind of think that's just going to be with us forever. But if you're running an environment, especially an environment that is internet facing, you have got, you have got to
keep up with your security updates. That's just admin 101. If you are going to expose the REST API over HTTP, please use SSL. Don't expose it except to secure networks or VPNs, unless you really, really, really know what you're doing and you're really staying on top of security. Now, I get that in some places, if you're doing like a public service offering where people can spin up containers and you want to give them API access, you might need to expose it to a public facing network. Make sure you do it with SSL. Make sure you have authentication mechanisms in place, et cetera, et cetera. Don't just leave API gateways open.

So, Linux kernel capabilities are really pretty cool. And even I, and I've been working at Red Hat for over a decade, even I, when I started digging into this, was like, oh wow, I didn't realize that this was a thing; didn't realize this is how this worked. So historically, if I have root access to a machine, if I am logged in as root, I am omnipotent. I can do anything that I want to. And what are we doing within containers? What kind of privileges are we granting folks within their container? Root privileges, right?
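You can see for yourself how this plays out on any Linux box. This is a sketch rather than the talk's own demo; the busybox image name is just an example, and the docker lines are left as comments because they need a running daemon:

```shell
# Every Linux process reports its capability sets in /proc; CapEff is the
# effective set, printed as a hex bitmask (an all-ones mask means full root).
grep CapEff /proc/self/status

# With a Docker daemon available, you could compare host vs. container
# (illustrative; the image name is just an example):
#   docker run --rm busybox grep CapEff /proc/self/status
# and grant back only what a service actually needs, e.g. binding port 80:
#   docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE my-web-image
```

On systems with the libcap tools installed, `capsh --decode=<mask>` will translate that bitmask into capability names.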
So you feel a little uncomfortable about that. Like, oh, but hey, it's in the context of their container; they're not going to do anything stupid. But Linux capabilities are a set of very fine-grained controls that allow services or users with root equivalence to be limited in their scope. They also allow non-root users to be granted extra privileges. So you can do things like take a regular user and grant them the NET_BIND_SERVICE capability, and they'd be able to bind a service in their container, for instance, to a privileged port. So Linux capabilities are pretty cool. They basically allow you to cheat, to grant more or fewer privileges than you would normally be able to grant. Now, in containers, many of the capabilities to manage network and other services are not really needed. SSH services, cron services, file system mounts and unmounts: not really needed, because those get handled by the Docker service. Network management is not needed, and so on. By default, Docker disallows a lot of those root privileges, which is a good thing, including the ability to modify logs, change networking, modify kernel memory, and the catch-all capability of system administration. And if you go and read up on the documentation of Linux capabilities, and I'm sorry, this is an eye chart, I'm sure you can't really see it, but I went and looked under the Docker GitHub page, under docker, daemon, execdriver, native, and looked at the default template for Linux capabilities. You can see that really only a very small subset of the capabilities that root normally has are passed through to those Linux containers. The net result of that is, even though we're granting users root privileges inside of their Docker container, and we can absolutely go, here's the gun, there's your foot, knock yourself out, they're probably only going to shoot themselves in the foot. Because we are limiting the capabilities that root has inside of
the container, that pseudo-root user is not as likely to be able to damage the rest of the system.

And then, in addition, one of my favorite topics: like I said, I present on SELinux every year at Red Hat Summit, and really at any conference that'll have me. I will evangelize SELinux. I want to talk about what SELinux does. So Security-Enhanced Linux is an example of a mandatory access control system. There are other ones out there; it just so happens this is the one with which I'm most familiar. I work at Red Hat and that's what we've adopted. But basically, processes, files, memory, network interfaces, ports on the network and so on are labeled by the kernel, and there is a policy which is administratively set and fixed. If you look under the /etc/selinux directory, there's all kinds of cool information in there about what the policy is composed of, any changes to the policy, and so on. But basically that policy determines how processes can interact with files, with other processes, with network ports, the kernel, and so on. So essentially what happens is a policy is built, and we have a default policy on Red Hat Enterprise Linux, and Fedora, and CentOS, that is kind of our best guess as to what makes the most sense in the most common use environments. But basically, the way that SELinux works, the things that we care about from an average user perspective, is that SELinux works with labeling and type enforcement. And let me explain how that works. If I've got this mythical service, the foo service, and I look at the executable file on disk using ls -Z, capital Z, which shows me SELinux context, I might see that it's labeled on the file system with the foo exec type, foo_exec_t. That says to SELinux, this is an executable in the domain of foo, and it's a type. So the label is foo_exec_t. The startup scripts might have the label foo_config_t.
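The mythical foo labels map directly onto what you can inspect on any SELinux-enabled system; sshd here is just a stand-in example service, and the type is the third field of the colon-separated context:

```shell
# On an SELinux host (Fedora/RHEL/CentOS) you would look at real labels with:
#   ls -Z /usr/sbin/sshd     # file label, type field sshd_exec_t
#   ps -eZ | grep sshd       # process label, type field sshd_t
# The context format is user:role:type:level; extracting the type field:
ctx="system_u:object_r:sshd_exec_t:s0"
echo "$ctx" | cut -d: -f3   # → sshd_exec_t
```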
The log files might be foo_log_t; the data may have foo_data_t. And then when you fire up the process, we're no longer looking at the file system, we're actually looking at processes in memory. I do ps -Z, capital Z; that's a common argument for SELinux. So running in memory, it might have the label foo_t, or foo type. So those are labels. Those labels are typically defined either in the default policy, or an application developer, if they understand SELinux, and I hope that they do, will do all of the labeling. So labeling is just how we identify processes and files: in memory they are managed by the kernel, and on the file system they are stored as extended attributes. So that's labeling. Type enforcement is, now that we know all these labels, these types, the rule that says that a process running in the foo_t context can access a file on the file system with the foo_config_t or foo_data_t label. Well, that makes sense: you want the foo type process to be able to access its configs and its data. When the process with label foo_t tries to access foo_log_t, that works as well. But any other access, unless it's explicitly granted by that policy stored under /etc/selinux, is going to be denied. So think about it. If I've got my foo process running with the foo_t label, and that process tries to access, let's say, a file under /etc that's got the label shadow_t: raise your hand if you think that's a good idea to grant that access. We will all laugh at you. Right? It makes sense. It's actually fairly straightforward. SELinux is not nearly as complicated as people think it is. It's all about what process is running, in what context, with what labels, and what they have access to. So if the foo process tries to access, for instance, the directory /home/tcameron with the label user_home_dir_t, even if the permissions are wide open, the policy will stop that access. Even if I chmod 777 /home/tcameron, that foo process, running with the foo_t label, will not be able to access that home directory. SELinux labels, like I said, are stored as extended attributes on the file system, or in memory. The SELinux labels are stored in the format of the SELinux user, the SELinux role, and then the SELinux type, and then multi-level and multi-category security. So for the mythical foo service, the full syntax for the label would be user_u, object_r, foo_t, and then s0 and c0. And I'll show you where this comes in in just a second. The default policy for SELinux is the targeted policy. In this policy we don't use the SELinux user or role; those are for multi-level security, like in government organizations, so we'll ignore them. We really only care about the type and the MCS label. Think of the MCS label as extra identifiers. It's kind of like port numbers: we know that the address for a host is never going to change, but the port number for incoming connections may change, like 80 for web, or 443 for secure web, and 25 for mail, and so on. MCS is just extra identifiers for SELinux. So in SELinux for containers, we can be very granular about which process can access which other process or which part of the file system. So to be real clear, these are two totally separate labels, even though they're both user_u, object_r, foo_t. This one has s0,c0 and this one has s0,c1. As far as SELinux is concerned, they are completely different. So type enforcement says that a process with the first label is different from a process with the second label, and policy would prevent them from interacting. Also, there's no policy allowing a process running with those labels to access, for instance, a file system object unless it's labeled foo_config_t or foo_content_t or another defined label. Neither of those processes, for instance, would be able to access /etc/
shadow, or anything like that.

Now, on a standalone system running Docker, all of the containers run in the same context by default. But, for instance, in our PaaS offering, OpenShift, we actually make each instance run with its own SELinux labels. So even if somebody were able to gain access to the process running a Docker container, SELinux would still prevent them from attacking another container on the machine. So it works really well.

So let me show you an example. I'm going to emulate an exploit where someone takes over a container. I'm going to use runcon, which says run in the context of, to change my context to that of an OpenShift container, and then I'm going to try to access /etc/shadow, try to write to the file system, and so on. So what happens is, I do an id and I'm root. Okay. When I do id -Z, you can see that I'm running unconfined, in a specific context. Then I'm going to take on the SELinux context of a Docker container. So I runcon, and I change the type, and I also change my MLS and MCS labels. And what's funny is, as soon as I run the runcon command and change SELinux context, even before I get a full shell back, I can't access .bashrc. Even though I am root, and I've still got the root prompt. When I do cat /etc/shadow: nope. When I try to touch a file on the file system: not allowed. I try to just do a listing, and I can't even do that. I'm totally blocked off from doing any of that. And then I think to myself, I'm going to be really smart and turn off SELinux. So I do setenforce 0: not allowed. So even though I just changed my SELinux context, I didn't log out, I didn't change user IDs, I just took on a different SELinux context, it blocked me, and I couldn't do anything, even though I was still logged in with root privileges. So SELinux is incredibly powerful. I'm not going to say it's trivial to learn; you've got to put a little bit of brain sweat into it. But seriously, go watch the SELinux for Mere Mortals video. It's one hour. I think I
encapsulate a lot of the basics of SELinux in it, and talk you through how to get it set up and how to fix and change things with SELinux. But it is incredibly powerful to see somebody who is actively logged in to the console as root and not able to even change a file on the file system. That's pretty impressive.

So let's talk about a couple of tips and tricks. Containers, at the end of the day, are really just a process running on the host. That means that we, as system administrators and systems engineers, have to use that exceedingly rare thing known as common sense. If you're running something on your host, just because it's containerized doesn't mean you're safe. Remember, we don't deal in snake oil here; it's not a cure-all. You still have to use common sense. Do have a process in place to update your containers, and follow it. I cannot tell you how many times I've had conversations with folks who are like, yeah, our developer created this really cool PHP container and threw it over to us and we put it out there in production. I'm like, really? When's the last time you upgraded it? Huh. You can't fire and forget. Do run the services in the containers with the lowest privilege possible. Drop root privileges as soon as you can. Don't allow root access if you can avoid it. Mount the file systems from the host read-only wherever possible. Sometimes that's not possible; you want to be able to write log files and things like that, and I totally get that. Make sure you're smart about where you grant access to write those log files. Where possible, mount read-only. Treat root inside the container just like you would treat root on the host. Even though I've talked through how we isolate that root account from the host OS, I'm a belt and suspenders kind of guy when it comes to security. I want to have as many barriers as possible to somebody doing something bad to me and making me stand in front of my boss when something's gone wrong. Just call me a wimp; I don't want to do it. And seriously, use some sort of log monitoring capability. I don't
care if that means you read the daily emails from the root cron job, or if you have something really sophisticated in place that's going to do data mining. Watch your logs. Don't just download Bill and Ted's Excellent Container that you found on the internet from some site in Romania. Yes, I've seen that happen. Don't run SSH inside of your container. I have seen people do this: well, I'm going to go ahead and build this web service, and I'm going to go ahead and put sshd in there as well, so I don't have to mess with the admin side. Don't do that, because that's one more increase in the surface area for attack. It's one more opportunity for you to forget to update. Don't do it. Don't run with root privileges. Don't disable SELinux. Don't roll your own containers once and then never maintain them. It's easy to do, guys; I know as well as you do, right? Sometimes you've got a big fire going, and you're doing your best and you're busting your butt, and you put it together and you throw it out there. And it was an emergency, so you went through an exception and you didn't go through the normal dev and QA and prod type of promotion. And the fire is now out, and now you have your day job, which has taken up all your time, and you think, I'm going to get to that, I'm going to get to it; there was something I was supposed to do, but now I've got my job to do. Don't let that happen to you, because we wind up having these containers out there that have vulnerabilities. And don't run production containers on unsupported platforms. If you run your business on something, my opinion is you should have that big red oh-no button. So make sure that you're doing it in a supportable and supported configuration.

So really, in conclusion, I hope that you will go forth and contain stuff. Containerization is awesome technology. I am really excited; it's been an incredibly fast moving and, in a very positive way, productive technology for IT. Just like you, I've had to bend my brain around some concepts that I wasn't familiar
with in the past. You heard me; I came from a Novell environment. That tells you what my background is. I'm a dinosaur up here, and I'm having to learn new stuff. But it's awesome stuff and it's exciting stuff. Containers make application deployment super easy. They leverage some incredible capabilities within the Linux kernel, and by design they are relatively secure. Obviously, just like any other technology out there, there are some gotchas. As with every other piece of software out there, Docker requires some feeding and maintenance. Well maintained containers, well maintained, I should say, can make your business more agile, less complex, and, if you do it right, safe. Any questions? Yes, ma'am.

Basically the question was, if we're doing our own Docker registry, are there any security concerns there? Have a good CI/CD environment in place. Have notifications for when the upstream projects that you're building the containers out of rev. Like, if there's a security update for whatever the application or framework that you're using is, pay attention to that. Preferably, when upstream revs, have some process in place that's going to pull in that source and start your CI/CD environment, and the container gets upgraded, and you push the upgrade out. If you've done it correctly, and you're doing the mounts so that the actual data that you're using, or web service content or whatever, is separate, in theory you should be able to kick a new container out with practically zero interruption in application service.

Yes, sir. So, for the Docker uninitiated: as I understand it, because it's a layered file system, let's say I download a base Red Hat image, and then I build another container off of that, right? And Red Hat revs that image for whatever reason. So merely downloading that new image is insufficient. Fair point. What is the next step from that? At that point, well, there are a number of things that you need to do. You really need to, again, you need to pay attention to
and preferably set up some sort of notification, or, even more preferably, an automated process that's going to go get feeds. There are a number of ways to do it, whether it's RSS or whatever, to get feeds of, oh hey, the underlying Red Hat or Ubuntu or whatever container has been updated. To be real honest with you, you've just asked the $64,000 question. I mean, that is something that everyone out there is struggling with. There are a number of ways to address it; there are a lot of startups that are doing it. I can't talk about it yet, but look for some announcements coming soon from Red Hat. So there are folks who are working on it. But yes, you are right, it is layered. You don't necessarily even often have just a container; there's going to be stuff that it's dependent upon. But that is a complex question, and I don't have a comprehensive answer, especially not one that I can give in the next three minutes.

Yes, sir. Docker, sorry. I often find myself using cap-add for Docker, and unfortunately I often end up just telling it to run with full admin privileges, because the particular cap-add permissions I need are not addressable. Is somebody working on that? I'm sure there is; I have not heard anything about that. So is what you're saying that when we do the capabilities filtering, it's too restrictive? Well, I'd say, if I actually need to run something in the container on a VPN. In what, I'm sorry? On a VPN. Okay. And that requires access to not just the network but my routing tables. I can do cap-add NET_ADMIN, but I can't do a cap-add for just routing, so I end up having to do cap-add SYS_ADMIN and give it full admin permissions. And I was hoping that somebody was working on adding additional, more discriminating permissions to cap-add, because I haven't seen a lot of activity lately. Is someone? I'm almost positive there is; as to who that is, I haven't heard off the top of my head. But I will say, and I'll say this to you and I'll say this to anyone else, because that's valid. That's totally
valid. I get why you're saying that. Open a bug, either with us, if you're using ours, or with upstream Docker, or whoever's version you're using. Open a Bugzilla report or a trouble ticket, so that we know to do it.

So, was this helpful? Yes? Okay, good, because this is kind of a new presentation for me, so I want to make sure that it was, and that I'm not the only one who is like, I'm not sure what all this stuff does. So that is fantastic. Guys, I've got to clear the room. I think we're out of time. Yes, we are indeed out of time. Thank you very much for coming, and for giving me the opportunity to talk to you.

Testing. Here we go. We're going to get started. This session is brought to you by Q. Q specializes in networking and building relationships within Southern California to connect top talent. Q is also sponsored by XLA and new UASC in the U.S. This talk is DevOpsing the Operating System. Our speaker is John Willis. He is the director of ecosystem development for Docker. We've been discussing containers and virtualization, so without further ado, I give you John.

Great, thank you. So, just to get a little perspective: how many people in the room have actually worked with Chef or Puppet? Okay. How many people have actually worked with Docker? Okay, there we go. All right, good. The idea of this session was, so, I spent a lot of time in the last year. I always think I have four jobs. The first is my family, which is a great job. The second job is my day job, right, which is, you know, today I'll tell you, I work at Docker. Right. So, and in fact, good time to tell you who I am: John Willis, director of ecosystem development, which means that I'm in BD. I try to help partners integrate with Docker, in the ecosystem. So anybody who's trying to figure out how to use Docker, take advantage of Docker, that kind of stuff, I try to help them. I've got a team of people, three guys, and we help them integrate. So that's kind of my second job. My third job is, really, I've been an evangelist for many years. Some of you
know me. And I like to pick themes for a certain period, and the theme this year has been immutable infrastructure, and I'm going to talk a little about my ideas, my thoughts about that, from near the end of last year. And, well, my fourth job is this thing called DevOps, which is, I spend a lot of time thinking about lean, thinking about the meta layer of why DevOps is awesome, not the tools. And, all things being equal, if I could do that full time, that's what I would do. I would just join a think tank and really think about how to create organizational capital. So with all that, I started thinking about what kind of presentation I would do at SCaLE that could be valuable, covering, except for my family, actually, my family too, my son spoke yesterday, so actually all four things: how could I give a presentation that might take all that into account? And kind of the driver was the unikernel thing. So, we acquired a unikernel company a while back, and it's been killing me not to say anything about it. I have a friend I'm going to mention later, Gareth Rushgrove. I was in London with him and the guys from Cambridge, and I couldn't say anything to them. It was so hard. So I thought, okay, how do I put this all in perspective? How do I take immutable infrastructure, how do I take my experiences of what I've done with Chef and Puppet, and put that in a presentation called DevOpsing the Operating System, which is really about how we dealt with this thing within at least the last 10 years, maybe 15 years, from 2000 on.

So, for those who don't know who I am: I pretty much live through Twitter in terms of my communication. I try to use that as my primary vehicle for communication. If you want to get hold of me afterwards, if you want to yell at me, tell me I'm an idiot, whatever, if you want more feedback on
some of the things I mentioned, that's really always the best place to get me. That is Dr. Deming; that's not me. I do a podcast with one of my best friends, Damon Edwards. I've been doing this a long time: roughly 35 years in IT operations. I was early in at Canonical, when they were just first starting their private cloud, like 2009. And then I was the ninth person at Opscode, the Chef company, and helped kind of define the customer facing business there. The startup gods: that is to say, I spent 30 years with failed startups, but the last three to four years, I'll say four years now, have been very good to me, because I've had two exits. I had one with Enstratius, sold to Dell, and, basically next month it'll be a year, I sold a company called SocketPlane to Docker. We did SDN stuff. I'm a core organizer of DevOpsDays. So again, for the people that don't know me, I'm not bragging; I want you to get a perspective of why maybe this presentation will make sense. I was at the first, original DevOps Days in Ghent. I was the only American there. I've been part of that whole system, helped create the first DevOps Days in the U.S. And then I also work a lot with Gene Kim. How many people have heard of Gene Kim? Right. If you haven't, and I don't make any money off this, well, actually I will indirectly make money, and you'll see why in the next slide: he wrote a book called The Phoenix Project. And if you're interested in kind of the meta layer of DevOps, you should read that book. It's a novel. You'll read it, and, in fact, remember when I tell you this, because when you say it, you'll say, hey, that dopey guy said I would say this: you'll say, how did Gene get into my company? How did you sneak in and figure out what was going on in my company?
And to that: we just finished a book. This is kind of my last shameless plug. We just finished the book called The DevOps Handbook: Jez Humble, Gene, Patrick, and I. We've been working on it for four years; we've rewritten it about three times. It's finally done. I mean, it is like done, done, done this week. And it's not just because I've written it; I mean, if you've read anything by Jez Humble or Gene, it will be the most comprehensive book on DevOps, and I've read pretty much every one. And again, I'm not just saying that because it's partially mine.

The last thing I'll say, because I'm going to try to get into the rest of the presentation, which is going to be kind of a more tool-focused discussion, but like I said, my kind of fourth job is that meta layer of DevOps. And there have been a lot of definitions of DevOps over the years, and Adam Jacob, the founder of Opscode, for years to me held the prominent definition, which was: DevOps is a professional and cultural movement. Full stop. We're done. Nothing more to say. As I've been working really hard on the last three months of cleaning up this book, going through it chapter by chapter and rewriting stuff, I realized that I kind of have my own definition now, and I think it's the one I like best, which is: DevOps is a movement motivated to turn human capital into high-performance organizational capital. There's a lot there, and in the things that we're doing with Gene, with the DevOps survey, we've statistically categorized high performing organizations and how they train people. To me, it's a beautiful sentence, because it explains that it's not just humans and it's not just tools; tools are the glue between organizational capital and human capital.

How many people have ever seen this screen? Yeah. So let's go back, with the Wayback Machine, to the late 90s, maybe early 2000s; as I get older, my memory gets a little hazy. But most enterprises, and I did a lot of Tivoli consulting, actually, before I got into cloud and open source
and Chef and Puppet and things like that, I did a bunch of work with large enterprises doing Tivoli work. And most large enterprises, most banks and insurance companies, were predominantly Windows based for most of their core service applications. There were a few exceptions, like in design and things like that; sure, you had Sun and other types, but a lot of the business, the business of America, was basically Windows, and very little Linux. In fact, I remember it was about 2002 when a guy at Bank of America told me flat out that Linux will never be in production at Bank of America. So anyway. So the idea was, for the people who haven't used this: basically it was called Ghost, and you basically took pictures of an operating system, you saved them, and people would do their rollouts of desktops and servers on this model of ghost imaging. And having these images, a lot of people did some cadence, maybe quarterly builds, and then, early on, they would just script the deltas. So you'd have a quarterly Ghost build image, and then before the next quarter, anything that got added or had to get slammed in between, hey, we've got a new version, it has to go in, or this new tool has to be in there, you'd script the deltas, and then you'd start over. And the problem with this was, for anybody who lived it: we had no management systems for cataloging images. We didn't even know we had to do that. And the chaos that happened: wrong images got used, like, where's the right image? You could have a whole weekend rollout, plus the technology to build the systems sometimes took a whole weekend, like if you had 3,000 servers. In fact, at some banks, they almost called it the painting-the-bridge syndrome. You literally rolled out, and then, like when you've finished painting the bridge, you actually start over to go back again, right?
And what happened then — somewhere along the way, give or take, we got this kind of what I call first-generation configuration management. There was Tivoli, there was BladeLogic, there was Opsware, and some others — those were what used to be called the big three. And if you look at, say, Tivoli installing an AIX package — for those of you who know Chef or Puppet — it kind of looks the same, a little bit, right? In fact, as you go back and look at something like the Tivoli configurator — I refreshed my memory and went back to the manual — it was kind of interesting how many of the primitives were almost the same, right?

But here's the key thing: this was not convergent. It did not adhere to the desired-state concepts that we use with what I'll talk about in a second, second-generation configuration management. Everything was individually defined. Now, some people were smart enough to say, I'm going to do this in this order, but you never thought of a system as a desired state — this is the web server, this is the proxy server. You basically had this blob and you built it up piece by piece, so it was never a holistic view. And again, it didn't have convergence; it was very transactional. You did things like install, install-commit, remove, remove-kit; you had pre-install and post-install scripts. There were a lot of cool things — I mean, I made a lot of money; I got my first boat from Tivoli, right?
Like — so. But it wasn't right; it was a mess. I mean, literally, there were companies doing the painting-the-bridge installation of infrastructure. And then, I think it was 2007, I go to OSCON. I had been doing a bunch of Tivoli work; I sold my business to my business partner, and I figured, all right, I'll just do open source stuff, because I don't have a non-compete on that. And I show up at OSCON really interested in Nagios versus Tivoli monitoring — that's what I really wanted to see. And I go to this session called Puppet, because somebody told me it was a monitoring tool. So I sit there in the back row, and I see this young kid — I'm going to exaggerate here, he had pimples, he probably didn't — Luke Kanies, who founded Puppet Labs. And he's talking about configuration management, and I'm in the back row thinking, what is this young kid going to tell me? I just left Bank of America. Ten minutes in, I'm in the front row, and I realize my life is changing, because what he's talking about is so much better than everything I'd done for the last ten years. So I get to meet him about a month later in Nashville, and I do an interview with him, and he tells me about this system these guys in Seattle built — and this is early, this is 2007, right?
So there was one of the early applications on Facebook, called iLike. They said, hey, let's do this thing on Facebook, let's create an application — and it went from half a million users to six million users in one week. It turns out there were some consultants in Seattle who basically enabled them to do that. There was no cloud; they did it with bare metal. They went from half a million customers to six million customers in one week, on bare metal. So I wrote this blog article: why do people spend three million dollars on Tivoli when they can hire a couple of consultants and use things like Cobbler and Kickstart and Puppet? And the consultant turned out to be Adam Jacob, who went on to found Chef. So he pings me, we become friends, and long story short, I wind up going to work for him.

But one more thing before that. This is the guy who came in as CEO of Chef around the same time — Jesse Robbins. He wrote this really cool piece, it's still out there, called "Operations is a Competitive Advantage" — he actually subtitled it "Secret Sauce for Startups." In a traditional operations model, most people would say: OK, I'm going to be a startup, I'm probably going to have maybe twenty servers in the first eight weeks — or by twelve weeks I'll have twenty servers — so how am I going to build them? The traditional way would be to take on the technical debt: just go ahead and build everything by hand, and then learn as we grow and keep building. And what he tried to show was that if you really paid down the technical debt, particularly in the building of the operations — and I'm using that loosely, right?
I mean building what it takes to build infrastructure. What he was saying is that if you spent a lot of time in those first couple of weeks building out the infrastructure in a certain repeatable way — and he was at Amazon at the time, and they used a modified version of CFEngine, so I've been told — then, as he showed graphically, it changes your ongoing costs. In fact, early on, when we started talking about these types of models, we would ask: what do you think your server-to-sysadmin ratio is? And you could go into the Tivoli shops — which I did — the BMC and BladeLogic shops, and they would tell you one to 30, one to 50. And in the early days of companies using what I call second-generation configuration management — really starting with Luke; CFEngine was 20 years old last year, and I'll talk about it in a second, but Luke is really the one who put this infrastructure-as-code, desired-state configuration management on the map, and I'll explain that in a minute — we saw those ratios start going up to like one to 500, one to 700, in some of the web-scale companies. Today it's actually not uncommon to be one to 30,000 — and that's pre-container, pre-unikernel, right?

But what happened here, just for those who don't know the history: 21 years ago, Mark Burgess, a professor in Oslo, inherits the data center to manage — because who else should manage it but a guy who teaches computer science, of course. And he realizes — I'll try to only curse once in this presentation, and I'm lying, I'll probably do it two or three times — he realizes that all the current tools are shite, having actually used them. So he creates a student project to figure out what's possible, and they start working out things like desired-state configuration management and convergence: this idea that you actually treat the machine as a state. This is the web server, and I want it to be maintained like a web server, so I have an agent that constantly watches it.

And what's interesting is that Luke Kanies was actually kind of a power consultant for CFEngine. And he got frustrated, because Mark — Mark is a good friend of mine now — Mark was in academia, and he didn't give a rip about the interface. It was horrible. Any time a user would say, can you make this interface easier, he'd say: not my problem, I teach computer science. And the big guys, like Amazon and Facebook, didn't care, because they put human capital, if you will, into the problem to make it better for themselves. If you went to a large company like Facebook and asked, do you run CFEngine, they'd say: we've put so many hours of our own development into CFEngine that it isn't really CFEngine anymore. So Luke got frustrated, and he wrote the whole thing over again as Puppet — in Ruby, with a language he provided as a DSL for Puppet.

And then the story continues, where Adam Jacob — like you saw, he was doing iLike, large-scale infrastructure with Puppet — felt that the cloud was just coming along, and Puppet was nested very deep in the enterprise, Red Hat Enterprise Linux and all that. And all of a sudden cloud was upon us. And there were little things — like Chef, for those of you who know Chef: knife was a great example. Knife was almost a year ahead of Puppet in terms of being able to just type in a command and seamlessly provision an Amazon instance, right? It would basically catch the exit, come back, get the information that the instance had started, get the instance ID, turn around, and start provisioning configuration. So you have this kind of — what I call second-generation configuration management, and that's just a Chef recipe there. And I thought I'd put this in as I transition, because again, I'm trying to walk through: what have we done over the last 15 years? What have we done right? What have we done wrong?
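A second-generation "recipe," of the kind on that slide, looks roughly like this — a hedged sketch in Chef's Ruby DSL (the package, service, and template names are illustrative, not from the talk):

```ruby
# Declarative and convergent: describe the state you want, and the chef-client
# converges the node toward it on every run, doing nothing when it already matches.
package 'httpd' do
  action :install
end

service 'httpd' do
  action [:enable, :start]
end

template '/etc/httpd/conf/httpd.conf' do
  source 'httpd.conf.erb'              # rendered from the cookbook
  notifies :restart, 'service[httpd]'  # restart only if the rendered file changes
end
```

Contrast this with the Tivoli-era install/commit/remove transactions: nothing here says how to reach the state, only what the state should be.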
So I tried to go back — I've seen lots of people do the history of virtualization, and lots of people do the history of cloud, and one of the things we'll talk about a little with Docker is its kind of image, copy-on-write model for imaging — and I realized nobody had really put one timeline of all of them together. So — as accurate as Wikipedia is, and as accurate as my 56-year-old memory is — this is the history of pretty much all the things we've seen. It's kind of interesting: whenever I go back and look at this, I think of Solaris Zones — I always think Zones were much earlier, and then I look at it and see it was actually as late as 2004. But the story really changed in 2006, right? Amazon, cloud instances, all that stuff. And again, Opscode was able to take advantage of the ability to do this — what Amazon did seamlessly, with things like Kickstart and then kicking off Puppet, you were able to basically drive yourself. The good stuff's coming, guys, I promise.

And then, it's funny — we said with the Ghost stuff we had image sprawl, right? Then everything kind of calmed down, and then we got into cloud, and it happened all over again. There's a message here: we make the same mistakes. Randy Bias wrote about this, and he's absolutely right: we had the same image-sprawl problem all over again that we'd had with Ghost images. In fact one of the greatest stories was PBS. There's a tool — it's still used quite prominently — called RightScale, a tool that basically abstracts and lets you manage multiple clouds, primarily Amazon images. And the default visibility was public. Unfortunately PBS actually made a private image public, with all their keys in it, and it took them three days to clean up that mess, right? And again, if you read the article, there were just images everywhere. I mean, even today, really, it's still not really easy to
find the right image for the right job, and stuff like that. But the point is, it's still a problem. And what really drove Chef and Puppet at that point was: yeah, that is a problem, and that's why you want infrastructure as code, right? With infrastructure as code, you get your base case — just enough operating system — and then you build everything through a library, a DSL-defined infrastructure. We have cookbooks for everything — I'm more familiar with Chef terminology; I haven't worked with Puppet as much, fair enough. We create a run list, and we catalog that as "the web server," and then it runs these cookbooks, which run these recipes, from the kickoff of however you're instantiating — say you're doing cloud with Amazon — and you start firing off these cookbooks and recipes, building infrastructure incrementally, and life's great. You hit a button, and it should all come out looking the same.

Adam wrote a chapter about this — if you haven't read Web Operations, it's an excellent book, John Allspaw's. Adam wrote the chapter where he jokingly talks about watching a movie when a tornado hits your data center: you put the movie on pause — I mean, it's a little flippant — you reload your infrastructure as code into your cloud service, and you're back. That's how we've lived for quite a while now.

Except, somewhere along the way — and I remember when this happened — Netflix wrote this article called "Building with Legos," and I'm like, oh no, here we go again; will we humans ever learn, right? But what's interesting is, it was a head fake. Yes, they were building images, like our two early examples, Ghost and cloud AMIs. But what they did, cleverly, is build them the way they were building WAR files. The whole concept of JARs and WARs — they just took that to the whole infrastructure. So this whole discussion came up, called bake vs.
fry: do you bake an image, or do you fry an image? If you fry it — building it up with infrastructure as code — you've got to work your way up from scratch; it could be 8, 10, 15 minutes, and I've had people tell me 30 minutes. Netflix said: we bake them. We bake them just like we bake application code: we put it in Nexus or an artifact repository or wherever, and when you go to provision, you pull the latest release. And then, of course, there were some services they built to provide the scaffolding for convergence between multiple systems. In the end it was like, oh wow, this is nice. This was really the birth — even though they didn't call it that at the time — of the concept of immutable infrastructure.

So you had this concept of immutability — and let me say immutability as a metaphor, not literally; no computer system is immutable, things change — where you take a system, throw it at some provisioner, and be assured that, for the most part, the bits are the same for the important parts. And if you need to change it, you just re-provision, or you roll back to the previous release. This has worked for Netflix, reasonably successfully — although, like everybody else who does this really well, they've had to put a lot of human capital into making it work. Just look at the open source projects and the code they've written, even on Amazon — where the whole point was that you shouldn't need a whole group to build all this infrastructure.

And so — I'm sure everybody in this room knows this, because I'm going to transition now into immutability and containers, and of course I'm going to talk about Docker, but just to make sure we're all on the same page: there are really three types of virtualization we deal with day to day. We've got type 1 — VMware ESX, the hypervisors, Xen —
indirectly, that's Amazon and Rackspace — although I know some of those guys have converted to KVM at some point, but in general, most of what we've seen over the last ten years has been Xen-based. Then type 2, which is our KVM, VirtualBox, VMware Workstation — usually Vagrant on VirtualBox. And then, with Docker, we saw the rise of what's called OS-level virtualization — hypervisor-less virtualization, where the compute is really just Linux processes. OpenVZ was a good example, and LXC; Docker was really an abstraction on top of LXC, with some other cool stuff.

This is a really good slide from an IBM paper depicting the difference between type 1, type 2, and Linux containers. If you ask what's the big deal about a container — well, it may or may not be a big deal, but: with type 1, you have the hardware, you have the hypervisor, and then you have the VMs, and the VMs contain everything — the operating system, a full stack. These are big, fat images. Same with type 2. With OS-level virtualization — the predominant implementation being Linux containers — you have the hardware and the operating system, you share the operating system kernel, and the images are reasonably small compared to big VMDKs or large AMIs.

So what you get with OS-level virtualization is speed: you can provision a container in 400, 500 milliseconds, depending on how much is in it — some as low as 150, 200 milliseconds; 240 milliseconds is about the smallest I've seen. And it's bare-metal performance: you don't have the hypervisor getting in the way, you don't have the queuing, all the things that come with a hypervisor, for most of the workloads you see people running today. And most applications — name a few: MySQL, Nginx — we can spin them up; they all run in a container pretty well. Every once in a while
there are some idiosyncrasies where we have to do certain things a little differently, but for the most part it satisfies most of our requirements for what we call virtualization. It's lightweight. And of course there's a rising tide — a lot of people have been quoting the viral growth numbers on Docker. I think in October we hit our billionth download on Docker Hub. That's in two and a half years, right? This isn't an apples-to-apples comparison — I can't remember how long it took Facebook to get to a billion users, and it isn't apples to apples — but I'm just saying: a billion downloads. And I remember — it was actually April — our CEO did a blog post bragging about our growth, and it was basically 500 million. So in our first two years we went from zero to 500 million, and then in the next six months we went to a billion. That's a hockey stick.

So, if you're not familiar with containers: it's called a container; you share the kernel and you get isolation of the process. We use namespaces, so you get namespace isolation. Actually, 1.9.1 — and now 1.10, which is at release candidate 1 right now — added user namespaces, so now you can actually be root in the namespace but not on the host, and that solves a lot. The 1.10 release is a really significant security release — really significant.

Since most people here know Docker, I won't spend too much time on it. What I want to transition into — the thing I think has become really interesting — is immutable infrastructure, and how some people are starting to deliver infrastructure like the Netflix building-with-Legos concept, but even crazier, if you will. And it's because of this isolation. And one of the things that happens with containers — whatever your preferred container is; Docker of course, there are others — pick
your favorite. But here's the thing: developers love containers — and I'll just say developers love Docker — for a lot of reasons. One is: imagine you're a developer, you write your last line of code, something's wrong, you're testing, and you've got to rebuild your infrastructure — and your app is part of five or six different services. You probably can't run all of that on your laptop — maybe you can — but even if you could, the spin-up of those services on a good day could be three, five minutes; could be twelve minutes. And I remember — I told you I did the DevOps Cafe podcast — we interviewed a lot of people, and what I heard over and over from development managers was that once they'd transitioned development to Docker, their developers get mad if their infrastructure isn't converging in like two seconds. Talk about spoiled, right? It used to be five, eight minutes. So there's this idea that you can spin everything up really quickly to build your environment.

Then there's the fact that you can actually run your whole environment on your laptop — those four or five other services, owned by four or five other teams — and test yours against it. And here's the part that gets really exciting: everything there is basically immutable. That means when I'm done with my service, I've pulled the latest release of everything else — think about the Netflix model, where everything's loaded into an artifact repository, where it is immutable — so everybody else is at least at this release, immutable. I test my service, I ship it off — and we'll talk about integration in a second, because that gets even more exciting — but for the most part, as a developer, I'm pretty safe in knowing that as it goes through the pipeline — integration, CD, and into production — the bits I tested here are the same bits I tested there. And you
know, we say in DevOps that developers should wear pagers, right? Well, imagine you wear a pager, and you can pretty much know really quickly whether it's your code or your service — because with this immutability, more than likely it's going to be an infrastructure problem or an application problem, and it's going to be easy to figure out which, right? So this immutability thing has really taken off. It's not for everybody, and it's not for all applications — don't go home and tell your manager we need to throw out all this legacy we've had for the last 40 years and turn it into immutable infrastructure. But greenfield is seeing a lot of success.

We talked about the lightweight part. And again, Docker just made it simple. Containers have been around forever; Docker just made them really brain-dead simple. I had my own experience three, three and a half years ago: a good friend of mine wrote a book called Test-Driven Infrastructure with Chef, and he talked about using containers in CI — how to do scaled-out CI
testing with containers. Imagine you create a Postgres table and need to get back to that same table state a thousand times in integration tests, without having to fire up a thousand VMs. Testing with a VM is two, three, four minutes per VM, right? It would take days. Imagine you could do each one in like four hundred milliseconds — an hour. So you could do this at scale. The thing is, he never really explained how to do it, and I tried to figure it out and gave up. Then the first time I got my hands on an early copy of Docker — when the company was still dotCloud — I read the README, and literally I had a container up in like seven minutes. In fact, on my first run I thought it didn't work: I did a docker run — docker run ubuntu, one of the base images — and it just ran and did nothing. I'm like, ah, it didn't work. It did work; it just didn't print anything.

And then we talked about community and all that. And then — earlier, Microsoft did some benchmarking, and it's in the same paper that had that diagram I showed earlier. What they did was a serial spin-up of 150 Apache servers — just simple web servers, very simple, nothing to it — and they brought them up serially: 150 in 36 seconds, and the teardown time was nine seconds, right? So we're talking about the speed, the immutability — and think about the kinds of workloads where the compute is going to have a shorter TTL, a shorter time to live.

Anyway, this is just a little more of the Docker architecture, taken from the training, in case you haven't seen it. Basically the Docker engine runs as a daemon that interfaces with these processes encapsulated by namespaces and all that, and lets them interact with the physical layer. And then I figured I'd throw in the client/server architecture. Does everybody understand how the Docker host and
Docker containers work, in general? Yeah.

Another important point of this model: you have images, and then you have containers, right? And one of the clever things the Docker people did — it's only been a Docker year for me, so I take no credit for any of this — goes back to something I put in that timeline earlier: Btrfs — I don't have AUFS on there — the idea of these copy-on-write file systems. What was cool is that Docker used these copy-on-write file systems — what they call union file systems — out of the box, and that opened up a lot. The idea of a union file system is that everything's layered: you're always instantiating a layer on top, you inherit the previous layers, and the top layer is the read-write layer. So for those of you who work with Docker: you run a container against an image, and that image is kind of a binary of everything that's been stored — immutable. If you go in and start installing packages or making directories, that all goes in the copy-on-write layer, and if you don't commit that container, you lose all of it, right? We'll talk about Dockerfiles in a minute, and an alternative way to handle that. But the bottom line is: it contains everything — the application, the middleware, all the things you need are encapsulated.

By the way, I mentioned this earlier, but in case you didn't know: when you instantiate Docker, it can run on bare metal, it can run on private virtualization like KVM, it can run on Amazon, it can run on GCE — it can basically run anywhere, right? And it stays immutable, because it doesn't care. The only thing it really cares about is the shared kernel, so you might want to be consistent there. But in general, if it's running Ubuntu 14.04 on bare metal and it's running Ubuntu
14.04 on GCE or Amazon, for the most part Docker doesn't give a rip, right? But here's the thing: you have this image model with copy-on-write, so you've got the immutability, which is awesome. But think about the other killer reason people like Docker: the gains in your CI loop, in testing. And one of the reasons you get those is that copy-on-write layer. There's a great article about this — imagine you're testing the state of a table. You've got a Postgres table, you want to test from that state, then crush it and start back at that state again. The image is already layered; it's already sitting on whatever your infrastructure is — probably a laptop, or maybe some Jenkins slave running Docker inside a VM — and all you have to do is set the image back to the original layer. So now you're talking probably less than a hundred milliseconds. We're not even loading up the whole thing; we're just instantiating back from one layer down and starting over. So you get into some really, really cool scenarios, where you see people telling stories of doing this crazy web-scale horizontal testing in CI that really couldn't happen before.

So, to dive back into the operating system discussion: how many people have built images from a Dockerfile? A little fewer, but that's fine. So this is Ubuntu — this is actually the Dockerfile for Ubuntu. One of the things we have is this thing called scratch. Scratch is basically an image, defined, that is really nothing — it's a kind of base plate for layering. You can't do a pull on scratch, and you can't do a docker run scratch, but it's where you start; it sets up a foundation for building a container from nothing. And in this example we actually go get Ubuntu — the same image you would use to
create an AMI. Building on scratch: for Trusty, 14.04, you would pull the cloud image. And if you didn't know, when you do an ADD against a tar.gz, it just untars it, and that becomes the root on top of scratch. So you lay down, on top of nothing, the instantiation of a cloud image. They just customize init and things like that, because containers don't really have init, inherently. So in a sense, all your Ubuntu containers — when you talk about Ubuntu in Docker, you're really talking about some modified version of Ubuntu based on the cloud image. If you go to Docker Hub you can look at any of them — CentOS, Fedora, all of that; it's just a tar file for CentOS that gets pushed over and built. And we have these things called webhooks, and implementations so you can put a file in your GitHub repository that says: every time we change this, or do a commit, go rebuild, and point it at Docker Hub to create your image. These actually have to be what they call official images.

Then — how many people have heard of BusyBox? The numbers keep going down, so, yeah. BusyBox was developed early on as this kind of — I don't think the phrase "just enough container" existed, but like "just enough operating system" — it really was a very minimalistic operating system. In the early days of Docker it was used heavily for training, and for verification. In fact, I think the Docker getting-started has you run a hello-world, and it basically runs BusyBox, right?

But here's the cool thing — and this starts getting into where we are and where we're going, unikernels: a lot of people take BusyBox as a base to start thinking about very small, app-centric container images. There have been some really good ones with Java. In other words —
notice I'm not loading the cloud image from Ubuntu here, right? You don't see anything like that in this Dockerfile. With this model, I take a base image where they've put just the basics in — it's really an operating-system-less operating system — and then I start adding only the things I need. And I'll show you an example. I started with SocketPlane: what we built was an SDN for Docker, and it required you to run Open vSwitch. So we wanted to ship, with our install, a container that had Open vSwitch in it. We kind of made our own little image — we had a couple of tweaks in there — but in general we added just the things that were required. And I didn't do this — Dave Tucker did: he reverse-engineered Open vSwitch and found only the things you actually needed for Open vSwitch. We weren't doing it for — well, we'll talk about unikernels in a minute — we did it because we knew it was something somebody had to add as part of the installation of our product. It had to be there, and we wanted it as tight as possible, the image as small as possible. So that's what we did.

And again, between 2012 and 2015 you saw a lot of people do this type of thing — all right, let's just roll up our sleeves: Java, instrumentation, these kinds of libraries — and when it's done, it has, in a sense, a lot of the properties of a unikernel. It isn't a unikernel, right — because it still runs as OS-level virtualization on a full host, and I'll talk about unikernels in a minute — but it is very special-purpose. So there was this kind of special-purpose trend.

And I just wanted to mention, for those who don't know: the top part — Azure — has been around for a while, but with Server 2016, Microsoft has
actually written their own container implementation that is 100% Docker-API compatible. So basically, out of the box, you'll have docker run, docker ps. They're not going to have as many images at first, and there's still some work to be done on how to run Dockerfiles and image builds given the licensing constraints, and other things too. But in general, all the primitives of the Docker command that you know will basically be supported, and you'll be able to create images. In fact, Microsoft has even built this into the Windows operating system itself, right?

And then — I talked about immutable infrastructure, I talked about the Legos; Chad Fowler wrote about immutable infrastructure. Let's see, where am I — I'm at 45 minutes, so I'll probably go a little fast. I'm going to skip this; I have links to all my immutable-infrastructure discussion. This is an important paper, and it makes the case, from a real computer scientist, for why you want to do immutable infrastructure. I've done many presentations on it — you read it and you get a sense of the difference between a convergent and a congruent system; congruent is basically what immutable infrastructure gives you. I've written some articles on it too. But I want to get into unikernels, because we've only got a few minutes left. Right — so that's what all the pauses were about. I've got to stop doing that now; and now I've got to explain this slide to my boss, yeah.

So, unikernels are specialized. We saw this happening with containers — this special-purpose thing was building up, right? And I tell people — not even talking about unikernels — if you want to do this right, really go ahead and own your infrastructure. Do your own vulnerability scanning — we're doing some stuff with Nautilus; you can follow that project for vulnerability scanning — but own your infrastructure. So, like,
if you're actually going to do this at scale, start having people think about it. Don't let people pull images from the wild. Good-hygiene shops don't let people pull random images. I see some smirking faces, but I'll say it anyway: with containers, roll up your sleeves, figure out what the base images are for your organization, and build those. Make those the standard: "this is the base Java image you get for containers," and if you need something more, put in a request and we'll create it.

Unikernels are like taking that on steroids, where you basically use a language toolchain to take your application and all the OS parts it needs, and build an image that really is the kernel. There's no concept of user space versus kernel space; it's just all kernel space. Again, I'm not an expert here, but if you think "oh my god, that's the worst thing that could ever happen," think of Erlang: it has that kind of model, where it's designed for failure. Mileage definitely varies with the type of unikernel, but the idea is that you take the concept I was describing and now you actually develop images that strip out everything else. I have some slides on the benefits here, and I'm sure everybody knows this by now: we acquired Unikernel Systems, the folks who do MirageOS. They were doing some of the most significant work with unikernels, out of Cambridge. They were here all week; they're the ones who were teaching most of the classes on unikernels.

Compare the stacks. With a type-1 hypervisor, we run a VM with the full OS stack on top of it. If we're running containers, we're running a host, which could be bare metal, with the container engine and an application in the container image. Or, what most people do today, unless you're at really serious scale and you've invested the human capital: people run containers inside virtualization, for the manageability plus the isolation. And so you pretty
much run a VM per group of containers, or in some cases, like the Gilt story, they actually run an Amazon instance per container: they get a t1.small or something, I forget exactly, a real small instance, and they're microservices-based. Now what we're going to see is this weird thing where everything's gone: you just have the hypervisor and this unikernel app that has been compiled. And I'm getting some weird faces, probably from people who have never heard of unikernels, but we'll talk about why you might do that in a couple of minutes.

There's always upside and downside; any time new technology comes out it sounds awesome, but it's not binary, right? There's risk and reward, and there's plenty of downside I'm not going to blow smoke about. But some of the upside is this: whether we were using big VMDKs, or even Docker with a host image, we thought we were doing just-in-time delivery, because we could take what's in Git, run the build, bang, and do a thousand deploys a day. Amazon, to blow your mind, at one point did a thousand deploys in one hour, right? We talk about people developing infrastructure really fast; imagine all that underbelly stuff shrinking away too. It gets pretty crazy. And there's a good paper here: any blog article you start reading about unikernels points back to it, "you probably should read this paper before you read my blog." It goes into the history, more than you want to know. I'm not going to go through it in gory detail, but you're basically going through this compile process and then creating this image.
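The compile-an-image workflow above looks roughly like this with MirageOS. This is a sketch of the general shape only; the exact subcommands and target flags vary between MirageOS releases, so treat the flags as assumptions:

```shell
# MirageOS-style unikernel build sketch (flags differ by version).
# The app plus the library OS are linked into ONE bootable image;
# there is no separate user space inside the result.
opam install mirage        # the build front-end
mirage configure -t xen    # pick a target: a Xen guest image
                           # (-t unix builds a plain binary for dev/testing)
make depend                # fetch only the libraries the config names
make                       # compile and link app + library OS into one image
```

The output is the "VM" itself: the hypervisor boots it directly, which is why there's no distro, no shell, and no package manager in the artifact.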
There are other unikernels out there besides MirageOS, and they all have some form of language toolchain they use to build these things. In this scenario, everything is compiled: you're not pulling somebody else's base image and only building on top of it, you're building the whole thing. What comes out is a VM now, but not a VM in the traditional sense of a VMware or Xen guest with a full OS. It is a VM in the sense that the hypervisor is going to launch this thing, and there is no operating system underneath it. In theory, anyway; of course most of the development is going to happen inside a VM right now, but in theory you have a bare-metal machine, you have a type-1 hypervisor, and you run unikernel VMs. And the Rump kernel project is another one that should be looked at; very interesting stuff being done there. So.

And so, why unikernels? Why is everybody going nuts over unikernels right now? Clearly the performance is interesting: we go from boot times of 400 or 500 milliseconds down to under 100 milliseconds. Okay, will that be significant? Maybe. The place where you might want to think about the significance of that is if anybody's done anything with Amazon Lambda: in certain workflows, certain patterns of work, this will make a difference. The ability to instantiate something that is so ephemeral that the cost of starting and stopping it is almost equal to the runtime of the application itself; the instantiation times of MirageOS make that plausible, and a lot of people are finding that really interesting.

Now, unikernels are like anything else. We're humans: we bought pet rocks, and everybody's going to be running around for the next six months screaming "oh my god, stop everything, unikernels." I'm not one of those guys. I love new technology, but I love it for the right reasons. So there's some truth and some non-truth in the hype. The truth is that we are fighting a horrible battle with people who figure out how to get into our systems, attack us, and understand our patterns. When your system is completely compiled, there's no known footprint of a directory structure, and there are no utilities sitting there, so as a defender you've basically decreased the attack surface dramatically. An attacker can't count on anything. Like, I can go into any
variant of a Debian system and figure things out, and I'm not an attacker; I wouldn't know the first thing about what attackers actually do, other than that they screw me up sometimes. But the Verizon numbers always blow me away. Josh Corman told me this: the 2015 Verizon breach report basically found that 97% of compromises were due to 10 known CVEs. If you don't know what a CVE is, it's an entry in the national vulnerability database, and they tell you, "here are the ones that really kill you." And it's not just that 97% comes down to 10 CVEs: 8 of the 10 were over ten years old. This is why people are going to get really excited, and why your boss is probably going to come back from some Gartner seminar and scream at you about why we aren't doing this.

For balance, James Turnbull is amazing. He was at Puppet, and he's been involved with Docker very early on, so he's seen both sides, and he gives you the good and the bad. I love his quote on why people are going crazy over this: part of it is a numbers game. To run a reasonable system you might need 50 different services and 200 packages, and the attacker only has to compromise one of them, right? But he does the counterbalance too: he's very critical about the security claims, and again, it's not binary, it's not black and white. But here are some examples, again on speed and size: a DNS server that's 446 KB, a web server, Open vSwitch, things that start up in less than 100 milliseconds at that size. Think about what that means for really good implementations of autoscaling infrastructure and true orchestration modeling; we're working on that stuff, and none of today's approaches are even close to that level. And even worse, think about debugging across a distributed world. It was hard enough with Docker, and at least with Docker we had tooling; it's going to be
two, three, five times as hard with unikernels to debug. You know, the next slide will make you laugh. Anybody who can guess what the next slide is, I'll send you something really cool. Nothing? It's on the tip of your tongue, right?

But on the upside, I said opportunities: opportunities are things that we can make better, even though they're bad now. And of course you get immutability. If you like immutable infrastructure, this is your baby: you're not changing anything; you can't, there are no tools, we didn't put them in there, there are no utilities. If you like immutability, you win.

And this is the slide I was hoping somebody would guess: Bryan Cantrill. I love Bryan Cantrill. I think he's the funniest guy on the planet, I think he's one of the smartest guys I've ever met, and he's actually a very nice guy in person. He does rant a little bit. Sometime this week he went off the charts about unikernels. And here's the thing: I read it, and he's right. But there's a whole bunch of things in there that he's going to have to take back a year from now, when Joyent goes and implements this. Hi, Bryan, if you're watching. Because he did the same thing with Docker: when Docker first came out, he ranted about how horrible it was and how it was going to destroy things. He was never against containers, but he thought Docker was not ready for production, and in a sense he was right. Then a year later, Joyent implemented it in their infrastructure. A year from now, I guarantee you, I guarantee you, Joyent will have a form of unikernel support. He's a brilliant man, so read the article, because I'm not going to stand up here and blow smoke and tell you unikernels are the best thing ever. Bryan is ten times smarter than I'll ever be, and his one statement that rings absolutely true, something we're all going to have to deal with if we want to use unikernels, is: "unikernels are entirely undebuggable." There's
a whole list of things you're not going to get. So, thank you very much; I think we're probably at the end. Write that down: it's a gist file, and it has all my presentations, videos, everything I've done, if you're interested in the meta stuff. This afternoon I'll update it, I always update it, so these slides will definitely go in there, and when the video comes up I'll put a link to it there too. If you're interested in any of the other stuff we didn't have time to talk about, it's all there. So thank you, everybody, if any of it made sense. Also, before you leave, if you have something to contribute, some interesting things about this, please send it to me on Twitter, because the unikernels stuff is all new to me too.

These few things are kind of a broader, stacked-up plane of virtualization that you may not have thought about, and that people don't necessarily relate correctly. So, really quickly, we're going to cover a few really basic facts just to make sure we're all on the same page. This talk is about QEMU, and manipulating it through libvirt. QEMU started as a pure user-space emulator, and later worked in partnership with kernel acceleration, KVM. Look at the tooling: basically, what libvirt does is launch QEMU processes for you. Oftentimes that's a huge help, because of the way it figures things out for you, and what makes it really interesting is that it creates a consistent configuration. You could hand-craft every QEMU invocation, but with libvirt you'll at least be consistent, and that matters when people are trying to do their stuff correctly. What do I mean by that? In a VM, every single device is a component you can define, and you need to know what you've defined; this isn't even a fancy setup, and it's already quite a few options, so the XML carries quite a few elements. In a VM definition there are two elements in particular, two things that matter for upgradability, that I really want to talk about: the machine type and QEMU's CPU model. The machine type defines, basically, the entire hardware platform. We have some
number of chipsets and devices that go into the platform, and that's what the machine type defines: all the chipsets that the guest will see supported. It's a way of creating a syntactic name for a hardware definition. The link there is to source code, because there isn't really documentation for a machine type; like a lot of QEMU, if you actually want to know what it is, you've got to go read the source. And by the way, QEMU can emulate a ton of different hardware architectures; if you want to see the list, it's in QEMU itself.

So the machine type is defined in source code. In this example here, which is taken from Red Hat, they have machine types named after the distro: RHEL has machine types called rhel-something. Hopefully that's triggering some alarm bells in people's heads: Ubuntu knows nothing about the rhel machine types, because the machine types are carried as patches. Do you start to see the problem? Upstream QEMU has a set of basic defined machine types, things like pc-i440fx-2.x, and then most distros go ahead and patch that file to aggressively create their own profiles. What this means is that, oftentimes, different distros, even though they might be running the same version of QEMU, are not compatible with each other, because on RHEL they're going to call something rhel7.1.0 that upstream calls something else. This is a case where you kind of wish everyone would just accept what the upstream community has done instead of every distro rolling its own. Why does Red Hat have different machine types from upstream? Because if they want to backport a fix, and this is a great point I was going to make in the live-migration section, a backported fix that changes guest-visible behavior effectively changes the machine. And since the machine type defines your hardware profile, you can't just go changing what a machine type means without ever renaming it.
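The machine-type pinning being described ends up as one attribute in the libvirt domain XML. A sketch, with example names; what your build actually provides is listed by `qemu-system-x86_64 -machine help`:

```xml
<!-- Sketch: pinning a guest to an explicit, versioned machine type.
     'pc-i440fx-2.4' is an example upstream-style name; a RHEL build
     would instead offer names like 'pc-i440fx-rhel7.2.0'. -->
<os>
  <type arch='x86_64' machine='pc-i440fx-2.4'>hvm</type>
</os>
```

Because the name is the contract, a guest pinned to a distro-specific machine type simply cannot be started (or migrated to) a host whose QEMU doesn't define that name.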
So you can create an incompatibility by keeping the machine type named foo while totally changing what foo is. The approach with Red Hat in this case is to say: if we backport something guest-visible, we bump the machine type, so we're not creating an inaccurate definition; we increment it by one. And a piece of feedback from the audience, which I completely agree with: that's also why the old machine types have to stay around. That's a very good point. Once the VM is running, the machine type is what the operating system inside that guest is now expecting, just like for a physical server or desktop computer: you can't really just swap one motherboard out for another underneath a running OS.

Okay, CPU models. This is actually really similar to machine types. If machine types made sense to you, great job: CPU models are the same thing, but for CPU architectures and flags. There are a lot of different CPU architectures out there, and different CPUs within the same architecture and family support different flags. So CPU models are a way of saying: do you want me to tell the guest I'm a Haswell processor, and with which flags? This is a little challenging, again: you can ask QEMU to list the models it knows, and libvirt also maintains a partial list of its own CPU models, which in some cases uses the same names but redefines them, so that can be a little challenging to understand. Then there's modeling off the host CPU. If I'm going to launch a VM and say the VM is going to have a CPU modeled on the host, what I'm telling libvirt is that I want it modeled on what's visible on this host. libvirt has a mode called host-model, and another that sounds similar, host-passthrough, with a slight difference: host-passthrough will literally read what is on the host and present exactly that, passing the host CPU straight through.
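The three ways of picking a guest CPU just described map to three shapes of the `<cpu>` element in the domain XML. A sketch; the model name is an example (your host's list comes from `virsh cpu-models x86_64`):

```xml
<!-- 1. A named model: stable and migratable, but may hide newer flags. -->
<cpu mode='custom' match='exact'>
  <model>Haswell</model>
</cpu>

<!-- 2. host-model: libvirt reads the host's capabilities, picks the
     closest named model, and lists the extra flags individually. -->
<cpu mode='host-model'/>

<!-- 3. host-passthrough: present the host CPU as-is. Fastest, but
     migration then requires effectively identical hosts. -->
<cpu mode='host-passthrough'/>
```

A real domain uses exactly one of these; the trade-off is performance on this host versus freedom to move the guest later.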
With host-model, libvirt will look at the flags the host CPU is exposing and list each one of them in the guest definition. When you start out, that is functionally the same thing as passthrough. But over time, if you start migrating, live-migrating, doing suspends and resumes of your VMs, those two can diverge: if a flag in the guest definition doesn't exist on the host it lands on, QEMU has to emulate the functionality behind it. It's not passing that capability through from the CPU anymore, it's emulating that capability, and as a result it's a lot slower than just asking the CPU directly. So those two elements, machine type and CPU model, are basically how you define the guest hardware, and the reason I said they're important is that a lot of it plays out over the life of the VM: not just when you're starting the VM, but when you start migrating these VMs around. Sorry, yes, a question; the comment from the audience is about nested virtualization. The short answer is that it comes back to the CPU model: this notion of being able to enable nested virtualization depends on whether you expose the virtualization flag to the guest, and whether you turn that on feeds into everything we just covered.

Okay, storage. A little bit of a step back here: I'm talking less about QEMU specifics and more about storage in general, and this really applies to virtualization, clouds, and honestly any application; you've seen it an awful lot. If you're going to head down the cloud path, the storage mechanism bounds what the user is actually capable of. So the very next point is: don't forget about storage. We always talk about how much space we need, and honestly, in 2016, space is pretty easy to come by; there's a terrific number of big drives out there. More often than not, the problem is that people don't understand how many IOPS they need. So consider this as you're trying to design a virtualization infrastructure: think about how many VMs are going to be on a host, and what's going to be serving those VMs.
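The IOPS-first sizing argument can be made concrete with some back-of-the-envelope arithmetic. All the numbers below are illustrative assumptions (roughly what a 7200 rpm drive and a modest VM workload look like), not measurements:

```python
# Back-of-the-envelope: how many VMs can a storage pool serve, IOPS-wise?
# Assumed numbers: ~80 random IOPS per 7200 rpm spindle, ~40 IOPS per VM.

def max_vms(spindles, iops_per_spindle, iops_per_vm):
    """Total pool IOPS divided by one VM's steady-state demand."""
    return (spindles * iops_per_spindle) // iops_per_vm

# One big 6 TB drive: plenty of capacity, almost no concurrency.
print(max_vms(spindles=1, iops_per_spindle=80, iops_per_vm=40))   # 2

# The same capacity from eight smaller drives: eight times the spindles.
print(max_vms(spindles=8, iops_per_spindle=80, iops_per_vm=40))   # 16
```

Same terabytes either way; the spindle count, not the capacity, is what sets the VM count.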
If you're using a local drive, say, dig a little deeper into how many IOPS it's capable of. In some cases you might discover that that's what's going to limit the number of VMs. So consider things like adding more spindles: whatever type of storage you're buying, more spindles let you spread the workload across them. Don't buy the 6-terabyte drive just to get storage capacity; yes, smaller drives cost you a little more per terabyte, but you also get a lot more out of them. And start using SSDs; especially now, there are a lot of good options for using SSDs as actual storage or layered on top of things, in slightly different tiers. You also have things like SAN storage employing RAID cards, some of which support battery-backed write caching; on the really high-end arrays, when you write into the cache on the storage array, that's effectively the write committed, and that's the key to those arrays being fast. But every single one of those high-end arrays costs a lot, and it's not for everyone; I'm not looking down on it, but consider tiering instead: some spinning drives for long-term bulk storage, and then a couple of SSDs for the hot data.

So you've got the guest running, and the underlying host, and then there's the error policy. The error policy is basically: for reads or writes, what happens if the I/O fails? What does this mean? Well, imagine the backing store goes away; say it's network storage and the network is gone. My I/O operation is going to encounter an error. How do you want libvirt and QEMU to handle that, and how is that error going to come back to you? There are a couple of different options. There's "report," which basically means the error isn't swallowed: the guest sees it, the guest's block layer handles the failure, and the guest OS does exactly what it would do on real hardware. And then there's "stop," which basically tells libvirt and QEMU: well,
what you're going to do is, in essence, pause the VM. The idea is that your operators then come see what's going on, fix whatever the underlying situation is, and presumably resume it. And there's "ignore," which is exactly what it sounds like: encounter an error, ignore it. I'm actually not entirely sure what the use case for ignore is, because I'm not entirely sure how you could safely ignore an error, especially a write error. But there is one more that's basically a useful variant of report: "enospace." Instead of reporting out-of-space like any other I/O error, it recognizes running out of space specifically; regardless of why the write actually failed, knowing "there was no more space" can be useful, because some setups regularly hit out-of-space on thin-provisioned storage. Depending on your type of workload, you may want to play around with the different options. If you've got a thousand web servers and all they're really doing is slowly writing a few log files, that's one thing; but if it's your transactional data, the system your business lives on, think hard about how you want to handle failure.

Next, how do you want to handle caching of the I/O? There are several different cache values, and I will not attempt to explain them all in detail, primarily because there isn't necessarily one right answer. For most people the default works most like you'd think, and you can set it in the XML, right on the disk driver definition.

Then there's over-subscription, thin provisioning, if your guest supports it. Things like file-backed and base-image-backed storage support being sparse. But if you create a guest on sparse storage, there's a catch: when the guest frees data, the backing file doesn't shrink on its own; your guest's view and the host image are no longer consistent. So you choose a trim, or discard, setup so that the guest's TRIM requests propagate down to the storage backend. So in this case, what we're doing is declaring a disk of type file, and we're setting a discard policy of "unmap" on that disk.
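The error-policy and discard settings just described live on the same `<driver>` line of the disk definition. A sketch, with an example path and format:

```xml
<!-- Sketch: file-backed disk with an explicit error policy and discard.
     error_policy='enospace' pauses the guest only on out-of-space and
     reports other I/O errors; discard='unmap' lets the guest's
     TRIM/fstrim punch holes back into the sparse backing file. -->
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' error_policy='enospace' discard='unmap'/>
  <source file='/var/lib/libvirt/images/guest.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

With this in place, running `fstrim` inside the guest actually returns space to the host instead of leaving the image permanently grown.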
With that in place, the guest can trim and the space can move back to the backend. I'd still encourage everyone to use discard whenever their stack can; it works especially well when guests aren't snowflakes. It's not that snowflakes aren't a good fit for virtualization, it's just that they take more care.

So, migrations. In libvirt we talk about "doing a migration," but there are really a bunch of nuanced, different types of migration. At the most basic level, you have what I call a cold migration: just shut the VM off, copy its disk somewhere else, transfer the XML over there, load the XML, and start the VM. You can absolutely migrate a VM that way; the cost is that you shut the VM down and wait for the copy. Then there's live migration, where your VM stays running while you move it. Typically in a live migration you copy just CPU and RAM state, so that implies you have some type of shared storage, so that the disk devices will be visible on both sides. There's also a variant of live migration which says: hey, also copy the disk data as well. That actually works pretty well right now.

So what have we learned about migrations? Well, quite a bit; we've spent a lot of time on it, so I'm going to talk through the parts most of you are likely interested in. `virsh migrate` is actually a wrapper, if you will, over several different API calls in the libvirt API. There are several calls in libvirt you can make: migrate, migrateToURI, migrateToURI2, migrateToURI3, at least four now, something like that. So there's more than one; in the API, there's more than one. And beyond that, the different flags you can pass control what happens. If you just run a plain `virsh migrate` and don't provide any of the other, more interesting arguments, what libvirt is actually going to do is pause the VM and basically create an empty container on your destination.
Then it copies all the CPU state and all the RAM data over there, and when that's all done, shuts it down on the old host. So is that live? If you just run the command, the VM starts in the running state and ends in the running state, but it's not live from the application's perspective: it will pause the VM for the whole copy. It doesn't reboot, but it does stop.

You can pass a --live flag to virsh migrate. Even then, it's not entirely live. What it does is say: okay, I'm going to iteratively sync all this memory over, and then at the very end it will briefly pause the VM and do a final sync of the CPU state and whatever RAM is still dirty. This is important to know, because if you have an application that is churning memory, a JVM that is doing back-to-back garbage-collection cycles, you will actually discover that with the default settings you cannot finish migrating that VM. The reason is that the memory state changes so rapidly that when it goes to pause the VM, the size of the remaining dirty set takes longer to sync than it's configured to allow the VM to be paused, so it resumes the VM, lets it run, pauses, tries the copy again, over and over, and your live migration will just sit there forever trying. So there are a couple of flags you can use to control that. Before running a migration, there's `virsh migrate-setmaxdowntime`. What that does is instruct libvirt: if I perform a live migration on this VM, here is the number of milliseconds I will allow the VM to be paused in that final sync.
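The non-convergence failure mode described above is easy to see with a toy model. This is a sketch with hypothetical numbers, not QEMU's actual algorithm: each pass re-copies the memory dirtied during the previous pass, and migration only completes once the leftover dirty set fits inside the allowed downtime:

```python
# Toy model of iterative pre-copy live migration.
# All rates/sizes are illustrative assumptions (MB, MB/s, seconds).

def converges(ram_mb, dirty_mb_per_s, bandwidth_mb_per_s,
              max_downtime_s, max_passes=30):
    remaining = ram_mb                      # first pass copies everything
    for _ in range(max_passes):
        copy_time = remaining / bandwidth_mb_per_s
        if copy_time <= max_downtime_s:
            return True                     # final pause is short enough
        # while we copied, the guest dirtied more memory to resend
        remaining = min(ram_mb, dirty_mb_per_s * copy_time)
        if dirty_mb_per_s >= bandwidth_mb_per_s:
            return False                    # dirtying faster than sending
    return False

# Mostly idle guest, 8 GB RAM, 1 GB/s link, 30 ms allowed downtime:
print(converges(8192, 10, 1000, 0.03))      # True
# JVM in back-to-back GC, dirtying 1.5 GB/s over the same link:
print(converges(8192, 1500, 1000, 0.03))    # False
```

Raising the allowed downtime (migrate-setmaxdowntime) enlarges the dirty set you're willing to flush in the final pause, which is exactly why it's the lever for these workloads.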
So if you happen to have VMs that you want to migrate that have a really high dirty rate or something like that, and I've seen a lot of Java-based applications be exactly this case, you're going to want to set that higher. The other thing you can use to control this is the --timeout argument that you pass along with --live, and what that basically says is: look, if the whole thing is going to take longer than this, suspend the guest so the migration can actually finish, rather than iterating forever. It's a pragmatic thing: don't leave it trying, force it to complete.

Next, migration and storage. In the XML, you define a path for each disk, and the destination has to have that same path; it can't not exist. What's happening is that when you tell libvirt to migrate from A to B, it takes the XML on A and essentially hands that same XML to B, saying, "okay, here's what you want to run." Well, that XML contains a path in there, so B needs to have the same path. If you need to change the path, change the backing infrastructure, mount things the same way, or opt to use the copy-storage variants to move the data. The interesting thing to note here, and this one bites people: libvirt is not smart about this. Or it is smart, but not about this target check. If you tell it to block-migrate from A to B, copying storage to the destination, and you've got the same shared path mounted on both, it's going to copy that data on top of itself. That's not really what you want to have happen. So that's an important thing to note: there's no sanity check. That's why things like oVirt and OpenStack wrap a lot of extra checks around those policies, to make sure that if you say block-migrate, the source and destination really are different storage. And that's why it's really important, if you want to do live migrations, to make sure that your source and destination are compatible. Now, they don't have to be identical; they have to agree on the definitions.
When I say machine type foo, it has to mean the same thing on both sides. That's where, if you're trying to migrate between a distro build and an upstream build, you might start seeing incompatibilities: this side has this machine type, that side doesn't. For this check, QEMU itself does the comparison, so nobody should be surprised: it will figure it out and then say no, I can't do the migration.

And then a really quick observation about pinning machine types versus not. Most of the time you can just leave it out and not worry; if you see problems, pin it. There are times when a newer machine type genuinely can't land on an older destination; one of the most recent cases was exactly that, where you just can't get there. So there are times when you'll run into issues. The general rule: always migrate from the older version toward the newer version. You can also weigh the trade-off of not pinning one in your XML at all, which in most cases gives you the latest version. The upside is that you get the latest machine type and whatever improvements come with it; there are real benefits to that. The downside is, if you have an operating system or an application that is highly sensitive to the hardware appearing to change underneath it, think about whether you can actually pin it. And the general operational rule we use is: always test migration. It's not just a develop-and-test concern; assume you're operating: you're doing migrations, you're doing upgrades. Always treat the source as the old side.

libvirt does do some checking around source and destination, and there's at least one distro, of course there is, that ships with a config problem here. What it means, to libvirt, is this: when you migrate, you tell two libvirts to talk to each other.
So you want to make sure that your libvirtd config either specifies a host UUID, or, ideally, that the BIOS provides a UUID in the SMBIOS section, because if both hosts present the same UUID, the two libvirts can't tell each other apart. Related to this, there's a section in the XML, one of a couple of different sections, where you can pass SMBIOS information through to the guest: manufacturer, product, serial, and so on; this is the information the stack passes through. I mention this because there are applications out there that try to fingerprint the hardware for licensing and asset management. Some of those applications will either say, "hey, you have no serial number, I refuse to issue a license," or all of a sudden say, "your serial number and IDs are duplicated, so no, you can't have more than one license." So we pass all of that through and let you set it per guest.

[Audience question about whether migration updates the network side automatically.] That would just be a hook on the remote side to update whatever service-discovery infrastructure you have, so that your network knows where the VM now lives. No, libvirt doesn't do that itself; I don't think it knows. It didn't always work, I know that; we had to handle the renaming ourselves for that very reason. Okay. [Question about storage backends.] With regard to using copy-on-write-style backing under VMs: we were pointing my setup at a backend that was based on ZFS, and no problems there. So at least one shop out there has experience doing it, and it seems to have been a fine ride; there are probably several other drivers people keep using, and again, libvirt is abstracting a lot of that. [Hallway conversation between sessions.] Yeah, so we have right now something like ten different tiers of leadership, and each one of them holds a different part of this massive space; my team is just hiring for some of the more senior positions right now.
But again, I'm not going to buttonhole somebody when I'm just doing the mic introduction. Happy to help anyway. Nice job, though; you kept things moving along. Not that I understood a hell of a lot of it, but that's just me. Since I was in this virtualization track, I figured I kind of had to tie it to things I've seen done with virtualization; that's actually where we discovered the issue. How are you? Good. You're at Amazon? I'm here because they're paying for my time to be here, so here's a card; my company's entirely behind this one. Oh, good. How do you like it? I like it there. Good, that's the best. Seriously, you think I'd love it when people hate their jobs? It's better for me if you think your job sucks, but I actually really like it when people love what they're doing, I do. You know, I check our blog on Monday and find out what new things are available to me this week, right? And the crazy thing is how fast it changes; trust me, since I've been there, I've seen like six, seven major releases of this.

So for example, we have vhost, where the pieces servicing the guest move out of QEMU: it will create a thread on the host that services the guest's I/O requests, whereas the alternative is that functionality running as part of the QEMU process, and if there is an exploit, and there have been several through emulated devices, the guest can reach that process. And for migration, after the first pass you only have to transfer the pages to the destination that have changed.
You keep iterating like that until the remaining set of dirty pages is small enough that the host can pause the guest and copy the rest across. seccomp is one of a couple of security-related features here. With seccomp, after QEMU starts, it can give up its rights to, let's say, open a file. This is useful because if there is a guest exploit and the guest escapes into QEMU, the process cannot open files and cannot read the contents of another machine. So seccomp helps guard against such a thing; it's a very important option. Kernel same-page merging (KSM) helps in saving memory: if you run guests of the same type, identical pages — the guest BIOS, common binaries, and so on — get merged and shared until one of the guests writes to them. We also have virtio-rng: if you have a real random number generator on the host, the guest can get entropy from it. [A few sentences here are inaudible.] PCI Express support: we only had support for plain PCI devices to be exposed to the guest, but now we can expose PCI Express devices to the guest. It has feature parity with PCI, migration works, so we can declare it completely supported. This is actually a requirement for the IOMMU work: if you want to pass through an IOMMU, or emulate an IOMMU and expose it to the guest for nested virtualization, this is required — as well as for user-space drivers, which we discussed earlier, and for AER, advanced error reporting. If a device has been exposed to the guest and that device encounters an error, with plain PCI we had no idea on the host that the device was in an error state and had to be reset. That is possible with the AER capability in PCI Express devices, and we now have access to that functionality. Video inside the guest: there are three different ways of doing this.
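For example, QEMU's seccomp sandbox can be turned on at launch. This is a sketch — exact options vary by QEMU version, and `guest.img` is a placeholder disk image:

```shell
# -sandbox on installs a seccomp filter after QEMU initializes, so system
# calls the device-emulation code never legitimately needs are denied.
# A guest-triggered exploit inside QEMU then has far less host API to abuse.
qemu-system-x86_64 \
    -machine pc,accel=kvm \
    -m 2048 \
    -drive file=guest.img,format=qcow2 \
    -sandbox on
```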
One, which was recently merged, is virtio-gpu, which can do 2D and 3D. virtio-gpu is a paravirtualized video driver which uses OpenGL on the host for all the GPU rendering. vGPU is something that's in progress; Intel is working on this as KVMGT. It basically means the host GPU has functionality by which it can be shared between multiple guests, so you don't need a one-to-one mapping between a host GPU and a guest GPU. This is still in progress, but it's one of the ways we'll do video inside guests. The last one, of course, is device assignment, where you just assign a device from the host into the guest: the host gives up all control of the device and the guest exclusively owns it. Now, this depends on what kind of hardware it is. If the hardware has support for multiple functions, you can assign one function to each guest, so multiple guests can use the same device. This is mostly used for compute and not for video, because these are very heavy-duty GPUs and really expensive hardware, but there have been cases where it has been used for video as well, as we'll see. Device assignment has seen several improvements too. I won't go through them all, but one interesting thing is IRQ bypass support, which means the device can directly inject an interrupt into the guest; the host need not be involved, so it's just that much faster. And there was one interesting video posted, called "Seven Gamers, One CPU", where they built a machine with seven actual physical GPUs and used KVM to assign each GPU to a different guest. They ran seven games simultaneously — with a mouse, keyboard, video, and hard disk for each virtual machine — at bare-metal speeds, getting very good FPS results on all of them.
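Device assignment like this is done through VFIO on Linux. A rough sketch of handing a host PCI device to a guest follows — the PCI address `0000:01:00.0` and the vendor:device ID `10de 13c2` are made-up examples, and the exact steps vary by distribution:

```shell
# Detach the device from its current host driver (host gives up control)
echo 0000:01:00.0 > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
# Let the vfio-pci driver claim devices with this vendor:device ID
echo 10de 13c2 > /sys/bus/pci/drivers/vfio-pci/new_id
# Start a guest that exclusively owns the device
qemu-system-x86_64 -machine q35,accel=kvm -m 4096 \
    -device vfio-pci,host=01:00.0
```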
So the choice of video card mattered, because if there was an error they would have to reboot the system, and so on. NVIDIA kind of gets this right with some of their cards; AMD not yet — there are some quirks that need to be added. But what's interesting is that this setup is possible. They had hundreds of gigabytes of RAM and terabytes of SSD storage for the guests, with seven guests doing extremely compute-intensive work and KVM handling it all really well. So this was really interesting that someone did very recently; it just shows the capability of KVM and QEMU here. The block layer in QEMU gained blockdev-backup: you can back up running guests, taking a point-in-time snapshot of a disk, and this snapshot can be taken over the network. Another feature is I/O throttling groups, where all the disks used by a guest can be made part of a group and quota restrictions can be applied to the entire group; earlier, quota restrictions applied only to each individual disk. So it's just something for infrastructure vendors. Extended I/O stats help with understanding guest behavior and tuning guests. Some libvirt specifics: most of the changes above involve libvirt too, but some very libvirt-specific things include a new virt-admin API, which can tune libvirtd itself to make things faster, or gather resource usage and produce stats for it. For example, if there are thread pools — say, a pool of I/O threads — and they're utilized to their maximum, you can add more I/O threads to the pool; you can get such stats and make decisions based on them. IOThread pinning was added: like vCPU pinning, which we saw in the real-time case, where you pin vCPUs to physical CPUs, IOThread pinning can also be done to make I/O faster. This can be done per block device. And ppc64, the architecture, became a first-class citizen.
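IOThread pinning as described lives in the libvirt domain XML. A sketch, where the CPU numbers and disk path are example values:

```xml
<domain type='kvm'>
  <!-- Create two QEMU I/O threads for this guest -->
  <iothreads>2</iothreads>
  <cputune>
    <!-- Pin each IOThread to a dedicated physical CPU, like vCPU pinning -->
    <iothreadpin iothread='1' cpuset='2'/>
    <iothreadpin iothread='2' cpuset='3'/>
  </cputune>
  <devices>
    <disk type='file' device='disk'>
      <!-- Route this disk's I/O through IOThread 1 -->
      <driver name='qemu' type='qcow2' iothread='1'/>
      <source file='/var/lib/libvirt/images/guest.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
```

The same pinning can also be changed at runtime with `virsh iothreadpin <domain> <iothread-id> <cpuset>`.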
This resulted in a lot of refactoring of the code. Earlier the code was very x86-centric, and now libvirt can handle multiple architectures. With the addition of PPC you can have big-endian guests on little-endian hosts — a little-endian ppc64le host running a big-endian PowerPC guest — and not everything was ready for such a scenario. virtio, for example, has to deal with paravirtualized I/O between the host and the guest; virtio 1.0 addresses this, and libvirt needs to deal with it as well. Some of the other things: virtio-input is something I had mentioned. We now have paravirtualized keyboard, mouse, and tablet devices, which basically gets rid of the USB dependency — you no longer need a USB keyboard, for example. USB generates a lot of interrupts, so it's better to have virtio. virtio-balloon is the balloon device used for overcommit. A balloon device can allocate RAM in the guest and give it back to the host; the guest can't use that RAM anymore, and the host starts using it, maybe giving it to other guests. But one of the side effects of this was that if the guest entered an out-of-memory (OOM) condition, the guest would just blow up through no fault of its own, because it had given RAM to the host as a courtesy. Now there is support for deflating the balloon on an OOM condition, so the guest can ask for its RAM back from the host and continue operating. Memory hotplug and unplug support: you can unplug memory from guests or plug in new memory. A new security feature is inserting guard pages after the guest RAM, so if someone inside the guest tries a buffer-overflow exploit, we will be guarded against it — similar to adding canary values for stack-overflow protection. There are also some architecture-specific improvements. s390 got PCI bus support — who would have expected that?
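The deflate-on-OOM behaviour can be illustrated with a toy model. The class and method names here are invented for the example; in the real implementation this is negotiated as a virtio-balloon feature:

```python
class BalloonedGuest:
    """Toy model of virtio-balloon with deflate-on-OOM (hypothetical API)."""

    def __init__(self, ram_mb):
        self.ram_mb = ram_mb      # RAM currently usable by the guest
        self.balloon_mb = 0       # RAM handed back to the host via the balloon

    def inflate(self, mb):
        """Host asks for memory: the guest allocates it into the balloon."""
        mb = min(mb, self.ram_mb)
        self.ram_mb -= mb
        self.balloon_mb += mb
        return mb                 # how much the balloon actually grew

    def allocate(self, mb):
        """Guest workload needs memory; deflate the balloon instead of OOMing.

        Returns True if the request can be satisfied after deflation.
        """
        if mb > self.ram_mb and self.balloon_mb > 0:
            needed = min(mb - self.ram_mb, self.balloon_mb)
            self.balloon_mb -= needed      # deflate-on-OOM: take RAM back
            self.ram_mb += needed
        return mb <= self.ram_mb
```

Without the deflate step in `allocate`, a guest that had "lent" RAM to the host would hit its OOM killer even though the memory is nominally its own.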
But yes, s390 can do PCI now. For ARM, hosts and guests can use more than 8 CPUs, there is virtual interrupt controller support, which makes servicing interrupts faster, and dirty page tracking, which is useful for live migration — all of this was added for ARM. For x86, VT-d emulation, that is IOMMU emulation, is in progress. This is used for nested virtualization: a guest that acts as a hypervisor itself can emulate an IOMMU so that it can pass through devices of its own to a second-level guest. Some more nested-virt improvements: the split IRQ chip. This is a security feature we saw. The APIC was emulated by the KVM kernel module, and since that's part of the host kernel, if the guest can exploit something there it gets access to the host kernel. So to reduce the attack surface, a lot of the functionality that is not performance-critical has been moved into QEMU, and only a small part now remains in the host kernel. PPC got CPU and memory hotplug, and also support for the H_RANDOM hypercall, which is similar to the RNG device. The Linux kernel on PPC always had the guest side of H_RANDOM, because a previous hypervisor that PPC ran under had support for it; QEMU gained support for this hypercall as well, and it can pass host entropy into the guest using it. Some of the features in progress: virtio-gpu 3D SPICE integration. SPICE is the remoting protocol, and 3D only works with the GTK back end right now, so integrating it with SPICE is something that's coming up. Native Hyper-V paravirtualization: Hyper-V exposes a lot of paravirtualized devices, and KVM can now — as in, very soon — start exposing those devices as well, so guests that are tuned to run under the Hyper-V hypervisor can actually run under KVM and get the same performance benefits.
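The split IRQ chip mentioned above is selected per machine on the QEMU command line. A sketch — exact option spelling can vary with QEMU version, and `guest.img` is a placeholder:

```shell
# kernel_irqchip=split keeps only the performance-critical local APIC in
# the KVM kernel module and moves the IOAPIC/PIC emulation into userspace
# QEMU, shrinking the host-kernel attack surface reachable from the guest.
qemu-system-x86_64 \
    -machine q35,accel=kvm,kernel_irqchip=split \
    -m 2048 \
    -drive file=guest.img,format=qcow2
```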
blockdev-backup, which we saw earlier in the block layer, will gain incremental backup functionality, and it will preserve state across restarts and live migration, along with several other features. There's a lot more. So, I've reached the end of my slides. I have my email address up here and the address of my blog, where I will put up the slides in a short while, and if there are any questions I'll take them now.

Q: You talked about benchmarking on the slides — do you know how KVM compares to the Xen hypervisor?

A: Yeah. SPECvirt is a vendor-neutral, industry-standard benchmark for measuring virtualization hypervisor performance, and companies can run SPECvirt and choose to publish or not publish the results. KVM is the hypervisor whose results consistently get published; Xen's are not published at all, so draw your own conclusions. One of the things SPECvirt really measures is scalability: how many guests can be run at the same time, and how much work each guest gets done in that amount of time. Xen doesn't scale really well, and I don't even think Xen supports as many vCPUs as we do, or the amount of RAM we can give to a guest. It's just not close — not anywhere close.

Q: Can you discuss some of the differences between type 1 and type 2 hypervisors, and the use cases where Xen would still be used versus where KVM would be superior?

A: I think the type 1 versus type 2 debate is purely academic. I mean, what does Xen do? It has to do its own scheduling, its own power management, its own memory management. We use Linux for that, so just call Linux the hypervisor, and both are at the same parity. I don't think it matters much, and frankly, in the number of features, the kind of security we provide, the scalability, performance, et cetera, we beat Xen in every possible way. Any more questions?
Yeah, there's one there.

Q: [A question about snapshots with assigned devices.]

A: Snapshots? Are you talking about live migration? No? Okay. I guess it's mainly because you have to get the hardware back into whatever state it was in, and you can't, because you need to reset the hardware — which was the point I mentioned: this hardware cannot handle those resets. So it's hardware-dependent, and I don't know how useful snapshots would be with assigned devices, because you need to get the device into whatever state it was in, and that's out of your control. Okay — thanks for staying for the last session of the last day, and thanks for bearing with my voice; it's not too great. Thanks.