Hey, good afternoon — or good morning; it's just barely afternoon. Thank you guys for joining us today. This is Ken Moore. He is a senior TrueOS developer and he's going to be talking about SysADM and how it can help you administer a FreeBSD-based system. Thank you for joining us again. Thank you, Matthew. Alright, can you all hear me in the back just fine over there? Make sure the sound's up and all that. Okay.

So just to reiterate what Matthew said, my name's Ken Moore. I'm one of the TrueOS developers. TrueOS — we changed our name from PCBSD about six months ago, middle of last year or so. So it's a new name, but if you're familiar with PCBSD, don't be afraid. It's a lot of the same stuff, but we're taking things in a different direction. Today I'd like to talk to you about SysADM. This is one of the tools that we started writing at the tail end of the PCBSD days, and it has continued to be ramped up, continued to be developed, and has really become the primary method of interacting with FreeBSD systems on TrueOS. But I want to talk to you about it today because you can run it on pure FreeBSD systems too, and I want to tell you some of the pros and cons about it and what it is.

Alright, so let's start off here. What is SysADM? Why the crazy name? We created it for TrueOS specifically as an aid to help system administrators — people who are administering lots of FreeBSD systems, or even something as small as just your local system. It's just an aid for that; that's the whole point of it. One of the things we decided to do with SysADM was to break it up and take a multi-tier approach. By that I mean there's a server component. That's the piece that's FreeBSD-specific. You install that server on FreeBSD — you can install the package once it's available in the FreeBSD package repository you're looking at — and once it's there, you just turn on the service. I'll get into the modes a bit later, but basically you turn on the service, and with that one thing you now have access to an easy API for administering your FreeBSD systems. Along with that we wanted to couple it with a client side as well, so we wrote a cross-platform graphical client to interact with SysADM services from various systems. It's multi-system aware: you can manage multiple SysADM services, or FreeBSD boxes, with one client. And then the last component, which is still in development and I'll go into a little bit later, is what we're calling the SysADM bridge. It's basically a small announcement server that you can throw into a VM somewhere up in the cloud, on Amazon AWS or something like that, where it's publicly available, and then your servers and your clients can both talk to it to find each other. I'll talk a little more about that later, but that's the general framework, the general design philosophy, for how we decided to approach this issue of administering FreeBSD boxes in a static way.

Alright, let's talk about the server a little bit. And again, if you have any questions about the server, by all means get my attention, catch my eye, raise your hand, shout out — I'd love to answer questions even in the middle of my talk. So let's go through the basics of the server. This is what we often call the brains of the system. How many of you are familiar with middleware?
Or have you heard the term middleware before? Okay, a number of you. For those of you that aren't familiar with it, middleware is basically a go-between. It provides a static interface, a static communication language, that you write your applications to talk to, and then it does all the complicated bits of interfacing with whatever system is underneath the hood. In this case, SysADM's server is the middleware and FreeBSD is what's underneath it. That's the complicated system: SysADM links against the libraries, links against utilities and things like that, so that the end user doesn't have to worry about what it's doing in the background — what on FreeBSD is getting used, or how it was compiled, and so on. They just know that they have this API, it is static, it is stable, and they can just talk to that. If you build your application against that API, it will continue to work for FreeBSD 11, FreeBSD 12, FreeBSD 13 — it works for multiple versions, because the API is designed to be static. SysADM itself is what manages and tracks the versions of FreeBSD to keep you up to date so that everything works. That's the idea of the middleware.

So it's a single binary, and it can be run in two different modes. This was really important to us when we started developing it: we wanted to have a REST server, because everybody who's writing APIs knows how to send a simple REST request and get a simple REST response. We didn't want to break people's workflow by only doing WebSockets, for instance, but we also wanted to make sure to do WebSockets, because WebSockets open up a whole other realm of things you can do with server-client communication, and I'll go into the differences in a minute. So we wrote one binary which you can run in both modes, as a REST server or as a WebSocket server, and you can start that binary as two different processes in both modes at the same time, so the REST server listens on one socket and the WebSocket version listens on a different socket. You can have one or both of them running, depending on your particular needs.

The other major component of the SysADM server — and this is what distinguishes it a little more from a lot of the other middlewares or system deployment APIs out there — how many of you use things like Ansible or Puppet, tools where you write something once and then deploy it and use it for multiple systems? SysADM is kind of like that, except it doesn't have a database of its own. You don't save anything to SysADM that you would then have to port from SysADM here to SysADM there. You can boot up your FreeBSD server without SysADM and all the settings that you set with SysADM will still be there, because it doesn't keep its own settings. It's just setting things on the box, exactly as if you were a standard system administrator logged in via SSH manually changing config files. That's exactly what SysADM does: it manages all those exact same config files and puts stuff in there exactly as if you were doing it by hand. That means you can actually turn off the SysADM server once you get the system running, and you can bring the system up and down and it will continue functioning exactly the way you set it up. There's no hard dependency on SysADM running at all times.
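To make that statelessness concrete: administering the box through SysADM amounts to touching the same files and rc settings you would edit by hand. Here is a rough Python sketch of the idea using FreeBSD's real sysrc(8) and service(8) utilities; the wrapper itself is only an illustration, since SysADM's actual implementation is C++ and goes through its own classes.

```python
import subprocess

# Illustration only: enabling and starting a service the same way a human
# admin would, by editing rc.conf via sysrc(8) and using service(8).
# Because the change lands in rc.conf itself, it survives reboots and does
# not depend on any management daemon staying up.
def enable_and_start(service_name: str) -> None:
    subprocess.run(["sysrc", f"{service_name}_enable=YES"], check=True)
    subprocess.run(["service", service_name, "start"], check=True)

enable_and_start("sshd")
```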
So that's, I think, one of the main differences between SysADM and most of these other middlewares and deployment systems. This next slide is just a quick flowchart of what happens inside the SysADM server. At the very top left you can see there's an arrow coming in — that's where some kind of client initiates a connection to the system. That could be either a remote connection or just a connection from the local system to itself, a client running locally talking to the local daemon. So the connection comes in, and the first thing it hits is blacklisting. You'll notice that we have blacklisting built into the server by default. You do not have to run extra services to secure your SysADM web service; we have a lot of security built into it out of the box, because we know anything that's designed for remote access — where you open your firewall for that port so things can talk to it — is a crack in your armor, so to speak. So you need to make sure that crack is secure and that whatever goes through that port really is authorized all the way. We built a number of security protocols directly into the SysADM server, and I'll go into more detail about them later, but here's the general idea: we have a blacklist right away. If something is denied access — I believe it's two times by default, maybe three, and you can change that as one of the settings for the server — it automatically goes on the blacklist for a particular amount of time, which again can be set in your config file. I think the default is an hour or two, just to make sure that nobody's locked out permanently in case that was you mistyping your password over and over again.

After it passes the authorization protocols and the blacklist, it goes down to a few internal subsystems of the SysADM server itself. These are very small, and they're mostly related to the authentication systems — for instance, the list of SSL keys associated with user accounts, so that somebody can log in with a key instead of a username and password and have that automatically associated with their user account on that system. Similarly, there's a generic process dispatcher: if you need to run some script that you write and deploy on various FreeBSD systems, SysADM will help you with that. You can log into SysADM and dispatch a job saying, hey, go run that script I put over there, run it and let me know the output and the stats about it, and basically have it manage that process for you. Those are just a couple of the systems built into the SysADM server itself.

But where the rubber meets the road — where the real guts of the server are — is in the external systems. These are basically the static API calls that let you interact with other things on a FreeBSD system. I've listed a few of them here. For instance, user management: on FreeBSD that's typically the pw utility, and that's for creating users and managing your users and groups. So say you want to take user Mom and pull her out of the operator group, because she really shouldn't be mucking around with things on your system — she's just a regular user, so don't give her access to changing Wi-Fi settings or getting into root folders and things. Same with the wheel group, stuff like that. The service manager is just a front end for the service command on FreeBSD.
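Going back to that blacklisting step for a moment, here is a minimal Python sketch of the failure-count-plus-expiry idea described above. The thresholds are hypothetical placeholders; the real server reads its limits from its own config file.

```python
import time

# Minimal sketch of a failure-count blacklist with an expiry window.
# The numbers here are illustrative, not SysADM's actual defaults.
MAX_FAILURES = 2          # failed attempts allowed before blacklisting
BLACKLIST_SECONDS = 3600  # how long an address stays blacklisted

failures = {}   # address -> consecutive failed attempts
blacklist = {}  # address -> time the blacklist entry expires

def is_blocked(addr: str) -> bool:
    """Return True if the address is currently blacklisted."""
    expires = blacklist.get(addr)
    if expires is None:
        return False
    if time.time() >= expires:
        del blacklist[addr]        # entry expired; allow the address again
        failures.pop(addr, None)
        return False
    return True

def record_failure(addr: str) -> None:
    """Count a failed or timed-out authentication attempt."""
    failures[addr] = failures.get(addr, 0) + 1
    if failures[addr] >= MAX_FAILURES:
        blacklist[addr] = time.time() + BLACKLIST_SECONDS

def record_success(addr: str) -> None:
    """A successful login clears the failure counter."""
    failures.pop(addr, None)
```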
The service manager works both with OpenRC on TrueOS and with the traditional rc.d system on FreeBSD 11 itself, because the service command works for both of them — it doesn't matter which. The firewall manager is specifically a manager for the ipfw firewall, not PF. The reason I say that is that ipfw, for those of you that don't know, is the FreeBSD firewall; PF is the firewall from OpenBSD. But the version of PF which is on FreeBSD is not the current version from OpenBSD — it's actually very, very old. The reason for that is that OpenBSD changed the syntax for writing PF firewall rules quite a while back, and once they made that change, it never got brought back over to FreeBSD. So FreeBSD has been maintaining an old version of the PF firewall. So we said, hey, why bother with that? Let's at least write it for the FreeBSD firewall itself, and maybe later on we'll write an API for PF as well, especially if they bring over the newer PF from OpenBSD.

What if you want to run jails or VMs? iocage and iohyve are really good tools for that. There are a few other ones — ezjail is another one for running jails. We do not have an API for those yet, but you know what? The beauty of this system is that this whole class system I'm describing is modular. We can easily add a class, add a few API calls, without touching the heart of SysADM itself. We can write new classes and new API calls in a day or less. It takes relatively little time to add new functionality into SysADM, because SysADM isn't doing the work itself — it basically just knows, hey, there's this utility over here, this is how you run it, this is how you configure it. And that's really all it is, so it becomes very fast and very easy to write new APIs and new classes. You can see there are a few other classes on the slide, but once the request that came in is done — once it's been parsed out to whatever external class it was accessing, saying, hey, I need to run this in iocage — it gets back the reply from the external class and sends out the response in either REST format or as JSON through the WebSocket. So whatever type of connection you have, it responds in kind with the type of request you sent in.

So I mentioned security a little bit; let's actually talk a little more about security. One thing: it requires SSL-secured pipes. We use HTTPS for the REST server, or WSS, the secure WebSocket protocol, for the WebSocket version. It requires TLS; it will not fall back to the old SSL protocol versions. And it starts with the newest version of TLS that the SysADM server can use and keeps going backwards until it finds a match with the client that's connecting, so it always tries to use the strictest TLS setting possible. On TrueOS that actually means it's using LibreSSL — it's not using OpenSSL at all — so we highly recommend that for those of you that are interested. For the REST server we have authentication via username and password: just as if you were SSHing into a box, you can log in with your username and password, and that is a local user and local password on that box. SysADM doesn't change usernames and passwords; it's the same thing as what's defined on the box itself. Or, one of the things we added: an SSL public/private key pair.
So what you can do is associate a particular SSL public key with your account. The server gets the public key; the client keeps its private key. When the client signs in, it says, hey, I want to sign in with an SSL key. The server responds: oh, you do? I'll generate some big long random string, send it back to you, and say, here you go, please encrypt this. The client takes that, encrypts it with its private key, and sends it back. Then the server says, okay, now I've got this encrypted blob; I'll go through all the public keys I have, find the one that matches and can decrypt the thing you sent in, and that associates the connection with the user account. The reason you do it that way is, one, you're never disclosing your private key — you're never sending details about your private key to the server. The server only has public keys.

Yes — you do not have to do that by hand. There are actually API calls for it; I mentioned there was an SSL key subsystem built into the server itself. Actually, the way that works with our client — I'll mention this a little later, but I don't mind talking about it now — is that it logs in with username and password first. The very first time, when you're setting up a connection, it uses username and password. Once that has authenticated and passed, it sends the API call saying, okay, now I would like to register this certificate, this public cert, with the account I just logged in with. And from then on, whenever the client connects, it always uses the SSL mode and never reuses that username and password. In fact, it doesn't save it, it doesn't use it anywhere — there's no memory of it after that. That's just the way our client does it. But as a system administrator — I have this at the bottom here — you can actually disable username and password authentication and require SSL key authentication. What that means is that if you want to allow somebody access to that box, they cannot log in, even if they have a user account on that box, unless they have an SSL key. In that case, you as the user would have to send your public SSL key to the administrator, and he would log in and add it onto the system for you. That can be turned off and on; it's one of the options for the server itself.

In addition to that, we have strict timeouts and blacklisting. Obviously, if you try to authenticate but never complete the authentication protocol, we will time that out and count it as a failure, and if that's two failures or more, you get blacklisted. We try to prevent a lot of possible brute-force methods and repeated denial-of-service attempts, so we're constantly blacklisting, and it's all done automatically for you. You don't have to go in and manage blacklists all the time; all you have to do is set up, okay, I want my timeout to be this, I want to automatically blacklist after this many failures, and so on, and the server manages it for you. Similarly, when you do authenticate with the system, the SysADM server has a concept of different users — it actually has an understanding of users and groups. In particular, the two distinctions SysADM uses are that you have to be in the operator or wheel group just to be able to authenticate.
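Here is a rough Python sketch of that challenge/response idea, using a signature over a random challenge with the cryptography library. This is an illustration of the concept only, not SysADM's actual wire protocol or code.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.exceptions import InvalidSignature

# Client side: generate a key pair once; only the public half is registered
# with the server ahead of time.
client_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
registered_public_keys = [client_key.public_key()]  # held by the server

# Server side: issue a random challenge for this login attempt.
challenge = os.urandom(32)

# Client side: prove possession of the private key by signing the challenge.
signature = client_key.sign(
    challenge,
    padding.PKCS1v15(),
    hashes.SHA256(),
)

# Server side: try each registered public key; the one that verifies tells
# the server which user account this connection belongs to.
for pub in registered_public_keys:
    try:
        pub.verify(signature, challenge, padding.PKCS1v15(), hashes.SHA256())
        print("challenge verified, key matched to a user account")
        break
    except InvalidSignature:
        continue
else:
    print("no registered key matched, reject the login")
```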
If your user is not in either of those groups — operator or wheel — it says, I'm sorry, you really don't have permission to run the stuff that we provide. So if you're an unprivileged user, you will not be able to use this. If you're in the operator group, you get access to SysADM, but it's limited access. For instance, in the SSL key management API that I mentioned, if you go in there as an operator-group user, you can manage your own keys, but you cannot manage other people's keys. Only somebody in the wheel group has full access and can manage all the keys — and all the users — on the system. It distinguishes between those because they are the traditional groups on FreeBSD for granting permissions: wheel lets you su to root, and operator gives you elevated permissions for things like mounting devices or installing a package — things that normal users need, but without full control over the system. So SysADM understands that hierarchy.

Alright, so I mentioned the two servers. What's the difference between WebSocket and REST, and why did we do both? Most of you are familiar with REST: the single-request connection. You send a request, you wait, it replies, and that's it — the connection closes. A one-off type of connection. WebSockets are designed for a different methodology: you log in, authenticate, get your token, and stay open the entire time. That allows for all sorts of new, powerful functionality, which is why the client requires the WebSocket mode — I'll talk about that in a bit. For example, it allows spontaneous events. We actually have the SysADM server run what's called a health check of your system every 15 minutes. It just runs a few basic checks: hey, is your disk space running out? Is your CPU maxed out all the time? Do you have updates waiting to be applied, if you're on TrueOS and have the update manager there? Has there been any recent ZFS snapshot activity — have your automated jobs basically been working? And it sends out notices about that. So we have a whole system for spontaneous events that only works through WebSocket connections, because it's not a reply to anything in particular; it's something you subscribe to once you've authenticated. You can say, hey, I want to receive those types of events — I want the health checks, or I want events about this one dispatcher process I kicked off. There's a whole list of event types that you, as a client, can sign up for. So it allows for all these spontaneous system updates and health checks, which you really can't get from REST very easily — not without constantly opening and closing connections and bogging down the system.

The other main difference I want to point out is that the WebSocket mode uses pure JSON input and output. There's no mixture of REST headers or authentication headers or things like that. I actually have an example on the next slide — yes, here it is. For the WebSocket server, it's pure JSON in and out, with the namespace, name, and ID fields.
The ID field is very, very important, because the requests you send in might not come back in the same order — everything is done asynchronously with WebSockets. So it's important to have that ID field so you can match things up: here's the reply to the request that had this ID. With a REST call, you have the standard REST headers up at the top, but you'll notice the namespace and name correspond to the first line there: rpc/query. You add them together with that slash, and that's how you address the API call. As far as the arguments go, though, both of them use JSON, and that's where the uniformity is. So if you look at our API documentation and all the examples we have, most of the time they just show you the arguments, because those are the same and unified no matter which type of connection you're using.

Now, before I move on to the client, do you have any questions about the server? Okay, the question was what we like about working with FreeBSD under the hood. One thing I can tell you right away — I haven't run a ton of Linux systems, but — is that we don't have all the churn on FreeBSD; things aren't constantly changing. We can write API calls for FreeBSD and they'll be valid from release to release to release. The actual implementation of the utilities for managing FreeBSD might differ between releases, but the interfaces and the ways of using those utilities are almost always the same. FreeBSD is very stable and reliable in that respect. One other question — I'll answer that one in just a minute, because it deals with the client. You mind waiting? Okay.

All right, let's talk about the client a little bit, and then I'll answer questions about the client too. As I mentioned earlier, the SysADM client is fully cross-platform. We do this by writing it in pure Qt5: there are no FreeBSD dependencies, no dependencies on operating system libraries. It is purely the Qt5 library and C++, and that is it. This allows us to have automated builds of the SysADM client for FreeBSD systems, Windows, and Mac OS X — or macOS, as I think they're calling it now. We automate the builds, and you can download images and run the SysADM client right now by going to the website and just downloading the files; those are automatically generated whenever we push a commit to the client repository. Yes, sir — you're telling me the SysADM website currently says the site is under construction. I will let the guy know about that. But yes, we do have those builds, and I can probably point you to the FTP server itself instead of going through the main website afterwards, if you'd like.

One of the other things about the client we designed — and again, this is the client that we wrote; you're not forced to use it. By having a static API on the server, you can write your own client, your own collection of scripts, your own front end for managing SysADM systems. That's why we provide the stable API. Our client is just one way we created to demonstrate all the functionality that's there. Basically, whenever we add a class to the SysADM backend, we typically add a corresponding new page to the SysADM client front end as well, so you have instant access.
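To make that request format concrete, here is a rough Python sketch of the two styles. The host, port, arguments, and the skipped authentication step are placeholders and assumptions for illustration; the exact calls and login flow are in the SysADM API reference.

```python
import json
import uuid
import requests  # REST example; pip install requests

# Placeholder connection details -- adjust to your own server. Authentication
# is omitted here; see the SysADM API reference for the real login flow.
HOST = "127.0.0.1"
REST_PORT = 12151  # assumption; use whatever socket your REST server listens on

request_id = str(uuid.uuid4())  # lets the client match replies to requests

# REST style: namespace and name become the URL path ("rpc/query"),
# while the arguments stay as plain JSON in the body.
reply = requests.post(
    f"https://{HOST}:{REST_PORT}/rpc/query",
    json={"id": request_id, "args": ""},
    verify=False,  # demo only: skip certificate validation
)
print(reply.status_code, reply.text)

# WebSocket style: the same call is pure JSON in and out, with the
# namespace and name carried as fields instead of in the URL.
ws_message = json.dumps({
    "namespace": "rpc",
    "name": "query",
    "id": request_id,
    "args": "",
})
# A WebSocket client (for example the websocket-client package) would send
# ws_message over wss:// and then wait for a reply whose "id" matches
# request_id, since replies can come back out of order.
```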
Writing the client alongside the backend also gives us a degree of testing and flexibility: while we're writing the backend, we're actually exercising it and making sure it all works before we push it out to people. One of the things we wanted to build into the SysADM client right off the bat, though, was multi-system management. What I mean by this is you can register connections to multiple servers. For instance, I work out of the Tennessee office of iXsystems. We have a server room with a whole bunch of servers and racks in there, and we have TrueOS on, I think, almost all of them — there might be a Mac box in there so we can do the Mac builds. We have SysADM installed on all those servers, and the guy that helps run and administrate all the servers has the SysADM client on his system, with preset connections to all of them. So he can always keep up: oh, that server went down, I can do the one thing I need to get it back up and running again really quickly. He gets the health checks for every single one of those servers all the time: oh, executor is starting to run out of disk space, time to go clean that up or put some more disks in there. He can keep track of things like that.

Yes — I'm not as familiar with Nagios, so I can't answer that. There are a lot of systems specifically designed for that multi-system methodology, and that's one of the reasons we wanted to put it into SysADM too, because it really opens up a whole new vista of things you can do by treating clusters of systems rather than just one individual system at a time. I mentioned the regular health checks already. We also let you define your own logical groups. You can set up as many connections as you want. You could have them all in one big lump — you've got 150 systems and the pain and suffering of sorting through them every time you're looking for the one system that's having issues — or you can define your own groups and say, okay, these are the systems which are always unstable, I'm going to throw those in an "unstable" group, just to make it easier to find them. You can define groups, subgroups, subgroups of subgroups — go crazy. It's just another way for you as a system administrator to make it easier to find the system you need to access.

And then this is kind of a no-brainer, but local connections don't require internet access. This is actually what we do on TrueOS. By default on TrueOS the firewall is really locked down: you do not have remote access to the SysADM server on a TrueOS box unless the user has opened up that firewall port to let remote systems connect. It means, however, that we can use the SysADM client as the control panel for TrueOS systems. You're just interfacing with the local system itself, so even if it's locked down completely, even if you don't have Wi-Fi, even if you don't have a network at all, you can still administer your box.

All right, so let's talk a little bit about security for the client itself, because as I mentioned, it's fully cross-platform — there's nothing specific to the operating system. So how do we do security? Well, one, we mentioned SSL certificates.
So what the client will do is, the first time you start it up, before you start setting up connections, it says: hey, I do not have a certificate on this system yet, please create one. It walks you through the two or three questions for creating a certificate and generates it for you. It actually generates two pairs, and I'll go into the second pair in a minute, but one of them is for talking to SysADM servers; the second one is for talking to a bridge. The next thing it does is take those keys you just created, plus the settings file that goes with the client, and put them into a secure, password-protected SSL bundle — I think it's PKCS#12, something like that, but it's a standard SSL-style locked bundle of files. It dumps those into there, and you set your own password for that file on the client system. We don't trust Windows, we don't trust Mac, we don't trust that somebody else isn't going to be able to break into that system. Even if they see that file, they can't find the addresses of all your servers, they can't find all your keys, because it's inside an encrypted bundle that you have to unlock with your password — and that's a different password from your user account on that system. It's the password just for the client.

After that, it uses the WebSocket protocol to talk to the SysADM servers. I mentioned this before: it uses the username and password just for the initial connection, the first time you're setting it up. You log in with username and password to that box, that gets authenticated just fine, and then it automatically sends the registration for the SSL certificate you just created on the client and registers that public key with the server you just connected to, so that from then on it always uses the SSL keys for login.

The last thing I want to mention is import/export. What if you set up 150 connections in your one client and then realize, oh crap, I have to go use a desktop over in the office — I don't want to have to recreate all of those. Well, by having the one encrypted bundle, you can export those settings directly from OS to OS. It's OS-independent: it doesn't matter if you're taking it from a Windows box and dropping it on a Mac, or moving to a TrueOS or FreeBSD system. Just export the settings bundle from the system where you set it up and import it on the other system. You've got all your connections, all the same keys, all the nicknames for all your systems — everything is there. What I highly recommend, if you're doing something like that, is to export your settings bundle and put it in SVN, or a private repo, or somewhere you won't lose it, because it will become really important. It's a nice way to safeguard all of your settings.

So let's go through a few examples — I mentioned the graphical client before I did that. Yes, of course — there's nothing like that right now where you can say, here's all of this, shared in one piece. I was going to say, there is a distinction between the SSL keys and the settings, which are the IP addresses and the connection details associated with them.
I might be able to add some of that functionality in there, so that you can share your settings with somebody else and they can use their own keys and their own encryption, but that would take a little bit of work to set up — otherwise you're sending out a hundred initial requests. We'll see; I'll look into that, but not at the moment. Okay, other questions? Yes — oh, repeat the question, yes, sorry. He was asking about sharing from user to user, basically giving it to a coworker — not keys, sharing your settings with a coworker.

Okay, so let's do some of the examples now. Since it's graphical, here's an example of what the graphical connection manager does. You can see the buttons on the right: you can add a group, remove a group, add a connection. It's all drag-and-drop capable, so once you get a connection in there you can literally just drag it to different groups or drag it around however you like — that's how you associate it with the various groups. There's remove connection, and reset your settings, so if you want to completely blow it away you can wipe everything and start fresh. You can rename a system without redoing all your authentication — that's basically just rewriting the nickname you set for the connection. So if you need to, it's like, oh no, this one's always crashy, I'm going to call it Crashy McCrasherson, and you put that in there. That's just an example; it's a nice small window.

The main way it runs, though, is as a system tray application. That's really the guts of it. You'll see down here a brief snapshot I took from my TrueOS system — pardon the system monitor in the background. When it's a yellow icon like that, it will flash yellow from time to time if you have an important message that you have not seen yet. For instance, a health check came in for me on this system which said, hey, you have updates available. That was the first time that message had come through, so it flashes and says, hey, you've got a new message. There is actually criticality, or priority, built into the messaging system. If there's something like, oh no, one of your disks is starting to fail, your zpool is degraded and having issues, that will flash a nice bright red for you. At the lowest level you get no flashing, no notification — it's just a standard message, nothing important there. Yellow means, okay, this is a somewhat higher priority message, you might want to take a look at it. And then there's also orange and red for the more elevated levels of priority, depending on how important that type of message really is. So we give you that type of interface, and it shows all the messages there, and there's an option to hide them all as well — if you want to say, okay, I've seen all those, I don't care, just hide them, you can do that too.

The other thing I want to point out, from the tray menu on the bottom right: you'll notice "local system" at the top. If you're on a TrueOS or FreeBSD system, that will always be there, because you can connect to your local system. If you're running on Windows or Mac OS X, by definition there is no local system to connect to, because this is only for managing FreeBSD systems, so it just hides that and you only get the groups that you set up underneath. In this case, there's a test group, and a sample system is underneath that group.
And if you change around the groups and systems, it automatically regenerates that menu to put them in the same order you set up in the connection manager, so you can define exactly where you want things to be. Now, what happens if you actually click on local system, or on one of the other systems? It brings up a page like this. This is what we refer to as the control panel in TrueOS. You can see up in the window title it shows you the IP address — this one's my local system, 127.0.0.1 — and it basically shows you a front end to all the available subsystems that SysADM has access to. For instance, if you don't have iocage installed on the system, it will not show up as an available subsystem for SysADM. You don't have to recompile the binary for every single application; it's very dynamic. It actually looks at the FreeBSD system and says, hey, that one's available, yes, you can use it; oh, no, that one's not available, you can't use it, I just won't even tell you about it. That rpc/query API call I showed you earlier is actually how you ask, hey, what subsystems are available on this system, what classes can I use? And this page is basically just a graphical representation of that. So there's the boot environment manager, the firewall manager, the service manager — all the things I was talking about earlier as external classes. And up there, under the SysADM server settings category, is where you'll find the way to manage your keys and some of the stuff built into the SysADM server itself. But this is very dynamic, and as you install things, things will change on here, particularly under the utilities category. That's where things like Life Preserver or iocage or iohyve will start to show up.

Alright, let's show an example of one of the classes: PKG. I wrote this one, so I like to demo it. PCBSD has historically had a graphical front end to the package system on FreeBSD called the AppCafe. Now we have SysADM providing the API for interfacing with pkg itself, and in the client we call that page the AppCafe, just for continuity and historical reasons coming from the PCBSD project. That's your graphical front end for the package manager. Here you can see there are a few tabs at the top: one for browsing applications and packages you might want to install, and one for viewing your installed packages, which is what I'm showing here. One of the things I really like here — and this is why I put it into the graphical interface — is that it gives you a top-level view of the packages you deliberately installed. By default it shows the ones that were explicitly installed by somebody, not just dependencies of dependencies that bloat the list. If you want to see all of them, there's a little button that says options, with a checkbox saying show me all of them, and then you can see the 2,000 packages or whatever you have installed on the system.

Yes — with regard to security vulnerabilities and such, we show the information that pkg gives us from its database. pkg actually keeps two databases on the system: a local database, and a remote database, which is basically an index of what's available on the remote repository.
So the local database does have a lot of extra information, such as when each package was last installed, and that's reflected in the AppCafe as well. If you double-click on one of these installed packages, it takes you to the full page where you can see all the information about that package. So yes, the AppCafe does give you information about everything related to the packages, such as when it was installed and all that extra detail. Does the AppCafe show it for each of the remote repositories, or just for the server that SysADM is running on? Actually, when you go to the browse tab, one of the first things at the top is which repository you want to browse. pkg is actually very smart about being able to register multiple repositories, so you can literally pick a remote repository and say, I want to install this package from that repository, and it will work just fine. Normally — yeah, exactly — there are tons of packages. You can add new repos, remove repos, do whatever you want with repos on the FreeBSD box, and the AppCafe will still give you access to all of that.

Yes — thank you. Sorry, I'm not good at repeating the questions, apparently; it's been an issue. Are you able to use the package manager on the server side to mirror the packages so you have them all locally, or do the client machines have to go out and fetch everything themselves? The clients do nothing. The clients are literally just an interface to the server. You can install SysADM on a FreeBSD server, just like we do with the servers in our server room, and we can manage all the packages for those servers graphically, because the client's running on my TrueOS system — it's not running on the server in the shed. The client doesn't actually save anything from the server: it doesn't save data, it doesn't have a cache, it doesn't have anything. It generates all of this based on the information the server sends it at any point in time, so the server is always the source of truth; we're only seeing what the server is sending right now. Yes, in this case it's my TrueOS box, so it's got a lot of desktop stuff installed too, but you could do this just for servers — that's not a problem whatsoever. Other questions about PKG? Yes, sir. Can you do a mass install across all the servers — you need a package on all of them? We do not have a way to do a bulk install yet, where you say, I like this server, I like the packages installed on it, I want to take that list of packages and apply it to all these other servers. That's something we're still playing with as part of the bulk and cluster management aspects of the SysADM client. It's not quite there yet.

All right, let me move on a little bit. Here's another example: the Task Manager. How many of you use top? I assume all of you — it's kind of standard — or htop or something like that. The Task Manager is basically a graphical representation of top, but it's faster. It doesn't use top; it uses the FreeBSD libraries directly, exactly the same way top would on FreeBSD. So you're getting the information a lot faster, and you're not burning CPU cycles going through all those iterations of top just to present a little bit of information about the system. It's actually quite a bit faster than running top.
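As a rough illustration of pulling that kind of information programmatically rather than scraping top's output, here is a small Python sketch using the psutil package. This is just to show the idea; the actual Task Manager is C++/Qt and talks to the SysADM API, which in turn uses the FreeBSD libraries.

```python
import psutil  # pip install psutil

# Snapshot the same kinds of numbers a task manager would show, straight
# from the system rather than by parsing top's screen output.
print("per-CPU usage %:", psutil.cpu_percent(interval=1, percpu=True))

mem = psutil.virtual_memory()
print(f"memory: {mem.percent}% used of {mem.total // (1024 ** 2)} MiB")

# Top five processes by memory, e.g. to spot the one that's running away.
procs = sorted(
    psutil.process_iter(attrs=["pid", "name", "memory_percent"]),
    key=lambda p: p.info["memory_percent"] or 0.0,
    reverse=True,
)
for proc in procs[:5]:
    print(proc.info["pid"], proc.info["name"],
          round(proc.info["memory_percent"] or 0.0, 1))
```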
And the Task Manager is graphical, so you can see everything. You'll notice the memory usage up there, with the multiple colors: if you hover your mouse over it, a tooltip pops up telling you the details of all the different types of memory, because on FreeBSD there are five different types — there's free, there's wired, there's used, and I can't remember what the other two are, but there are five of them, each with a distinct color, and it tells you the exact breakdown. Similarly, it shows you one of those little boxes for every single CPU on your system, so when you point this at one of our build servers with 38 CPUs in it, it gets really interesting — with the temperatures too. So that's the temperature and the utilization of each CPU at that point in time. When you open this in the client it has a default refresh rate built in, so it re-requests that from the server every second or two, just like top giving you an update every second. But we also added the task list below, because normally if you're doing something like this you need to find that one process which is hogging all your CPU or all your memory and running away — you need to select it and kill it really quickly to get your server back up and running. It gives you that functionality right in here as well.

The Update Manager: this is a front end for PC-UpdateManager, which is a tool we created for TrueOS. I'm not sure if it's been ported to FreeBSD or not yet, because it relies on some technology which was only just made available in the FreeBSD 11.0 release, so I think it might be getting backported, or we might be pushing it into FreeBSD itself soon. But it's basically the way we do updates on TrueOS, using pkg for everything at the moment. So you can check and say, hey, do you have package updates available — yes or no. It doesn't show in this screenshot, but there's also a way of viewing logs of past updates built into this. You can also change things in the settings tab: for instance, TrueOS has two major package repositories, stable and unstable. By default you're on the stable repository, but you can go into the settings and say, no, I want to be a little daring, I want to track the unstable repository, and you can do that really quickly, update your system, and you're on the new repo. So this gives you the front end and the ability to manage your system updates. If you want to talk about how it actually does updates and how we use ZFS boot environments, by all means talk to me afterwards, but that's beyond the scope of the SysADM talk itself.

Alright, any last questions about the client? We've got about 15 minutes left. No? Okay, one more over there — I'll repeat this one. The question was, does it manage the ports tree? The short answer is not yet. The ports tree on FreeBSD, for those of you that aren't aware, is just the instructions for building packages. The APIs we have already written are only for pkg; they're not for ports itself, because currently on FreeBSD there are so many different ways you can handle ports that we haven't quite picked one, or decided what to standardize on for TrueOS for building ports yet. We might eventually write an API for Poudriere, the bulk port builder, or something like that.
That would be a very easy task if somebody from the community actually wanted to write up that API and send it in — it would be pretty easy to do, we just haven't done it yet. So it's on the list; a lot of people have been asking for it recently, in fact.

Alright, I'm going to go on and mention the SysADM bridge. It is highly experimental — note the big red letters. We do not recommend that you actually try using it yet, but we're still working on it, still tinkering with it, and trying to get it working before we tell people to start using it. The whole premise behind the bridge is that we're trying to solve two fundamental issues, and they're the questions we have here. First: what about servers with dynamic addressing? What if you have your son's system over in the back room, but it's using DHCP and gets a different address every time he turns it on? It's kind of hard to find it on the network then, unless you go through some of the automated network discovery protocols — use mDNS, something like that. But that only really works on a local network. Then what about the next question: what if it's a server behind a corporate firewall, and you get a call that a server went down, but you're at home? You would have to VPN into the corporate network to reach it. Or what if it's a really secure network and they do not allow VPNs — how would you administer that? The bridge addresses that issue. As long as you have some kind of publicly reachable server — something you can run in the cloud, on Amazon AWS, a DigitalOcean droplet, take your pick — you just put that up somewhere that's always public, and then both your clients and your servers, when they start up, connect to the bridge instead of trying to connect to each other directly. They say, okay, here I am, is anybody looking for me? So the bridge is designed to be an announcement server, a way for servers and clients to find each other.

But the bridge is completely untrusted. Let me see, do I have that on the next slide? Yes, okay, let's go to the next slide. The bridge is completely untrusted. We never want to use our real SSL certificates to connect to the bridge. Because it's in such a public place, out in cloud infrastructure, we don't know if somebody got in there and hacked it, so we're not going to send any information through the bridge, or to the bridge, that might compromise the server itself or the client itself. The most information the bridge ever gets is the MD5 sum of the public key that the server and the client will use to talk to each other — and that is it. Not the actual public key. They have a different set of certs to actually authenticate with the bridge itself. That MD5 sum is what the bridge uses to match up servers and clients. When a client connects, it says, okay, I have this MD5 sum and I'm looking for servers that will accept me. The servers, when they connect, say, okay, I have this list of MD5 sums and I will take anybody that matches one of them. And the bridge says, oh, you connected, let me check — okay, you match up with this one — and it tells that server about this client.
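Here is a tiny Python sketch of that fingerprint-based matching idea. The PEM strings are placeholders, and this is only an illustration of the pairing concept, not the bridge's actual code.

```python
import hashlib

# The only value the bridge ever sees: an MD5 fingerprint of a public key,
# never the key itself and never the certs used for real authentication.
def fingerprint(public_key_pem: bytes) -> str:
    return hashlib.md5(public_key_pem).hexdigest()

# Server side: the set of fingerprints it is willing to accept.
accepted = {fingerprint(b"-----BEGIN PUBLIC KEY----- ...client A...")}

# Client side: announces its own fingerprint to the bridge.
client_fp = fingerprint(b"-----BEGIN PUBLIC KEY----- ...client A...")

# Bridge side: pair clients with servers purely on fingerprints, then tell
# only the matching pair about each other.
if client_fp in accepted:
    print("bridge pairs this client with this server")
else:
    print("no match; neither side learns the other exists")
```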
But if you have hundreds of other servers connected to the bridge, none of them are aware that this client ever connected, and none of the other clients are aware that a new server connected, or about all those other servers. It tries to be completely untrusted and only matches you up with the systems you're actually supposed to reach. So that's the general idea for the bridge. Again, it's experimental; we're still tinkering with it. There are still a few fundamental issues with how it does the data transport, because instead of having all the multi-threading stuff, it's now trying to compress everything and push it through a single pipe, and that's causing some issues. So we're playing with alternate methods of doing the announcements and things like that, but that's the general idea.

Let me answer a question over here first. The question was: do you use SSH tunneling, or can you? We could, but one of the things that relies on is having an SSH server — an sshd daemon — running on the server side too. SysADM gives you access without the SSH daemon, and sshd is probably one of the most attacked and most brute-forced ports in the world. So SysADM is a completely separate protocol that runs standalone. We could piggyback off of sshd at some point, but we don't at this time. Another question over here, sir: about the dynamic addressing for that server on DHCP — could you use DynDNS? Would that also work? An external DNS service that knows about that machine, where the machine tells it, here's my new IP address, whenever its address changes. Exactly — that's the same kind of idea as what we're addressing with the bridge. This is a very common problem; those two questions are extremely prevalent, especially in commercial environments, in big business or secure environments. There are a lot of people trying to tackle those two problems and come up with solutions. The bridge is just our way of trying to tackle those same issues with a solution that works for us. There might be other ways of doing it — we don't force you to use our method; you can use something else.

Alright, let me see what's next. I'm a few minutes early, but I'll get to my conclusion slide and then we'll open the general floor for questions. Okay, I don't think I need to repeat too much of this. It gives you a framework for administering FreeBSD systems. The second point is probably my favorite, and it's one of the things I feel strongly about: I'm a FreeBSD administrator, I like SSH, I like being able to log into my system from the terminal and change config files as need be. This will not replace that. I do not have to relearn how I administer my FreeBSD boxes. It just comes alongside as an alternate way, especially for people who might be less technical than myself — somebody like my parents, who just need to turn on the Wi-Fi or something like that. They can do that easily, graphically, without needing a guru, if you will, to come SSH into their box and change config files around. That, I think, is the thing that's really special about SysADM compared to a lot of other middlewares: it is just an interface to FreeBSD itself. It's not trying to change FreeBSD and turn it into some other Frankenstein.
It keeps FreeBSD FreeBSD, so you don't need to change your muscle memory for how you interact with it. It's just an assistance tool that others can use, and that you can use if you need it. And it's already in production. This isn't just a thought experiment, something we cooked up and said, yeah, you should try this sometime, we might try it sometime too. No — we're actually using it all the time. We've been using it for over a year now, on the older PCBSD systems, and ever since we changed the name to TrueOS it's been on and active on every single TrueOS system. This is how you administer your TrueOS system, this is how you do updates, this is how you do everything — everything is routed through SysADM. It is tried, it is tested, it is proven. You don't have to worry about it being some experimental hack — except for the bridge, of course; that's a whole other thing.

With that, I think I've come to the end of my talk. If you have any questions — I do have a couple of links here on the slide. There's the SysADM API reference, because it is a static API and we have very clear documentation: exact API calls, examples of not just the requests but also the responses, so that you can write your own clients and parsers. You can say, okay, I sent this and it sent back something like this, and write a parser for that API call without actually having to have a system there to run tests against. It makes things very easy. I actually use those docs myself when I'm writing the client side, when I'm writing new pages: I'll write the stuff for the backend, send the commit messages, the doc guy will add that to the API documentation — and if he hasn't, I'll harass him until he does — and then I'll write the graphical page.

So now I'll open it up for questions. In the back there — okay, the question was, he bought a TrueNAS: is this built in already? The answer is no. This is for TrueOS, not for TrueNAS. TrueNAS is the commercial version of FreeNAS, and while I am hoping that eventually we will get this put into FreeNAS, and then eventually into TrueNAS afterwards, FreeNAS still does its own management and configuration, so it does things completely separately. That's a completely different case. You said the bridge was designed to be untrusted — any possibility that in the future you'd make it a discovery point, where servers can register and then clients can find out about what's in their domain? We might be able to do something like that. The bridge is still an open topic, so if you have ideas of things you would like to do with the bridge, that's great. The whole point is that we're trying to keep it small; we're trying to let you run it on a droplet where you're only paying pennies a month to keep it up and running, just so you have a consistent point of reference where everything can find each other. That's the main purpose of it. Other things could be added, but they're auxiliary. Other questions? Back there, sir. How big does this scale? How many servers can you manage with a client — is it in the dozens, hundreds, thousands? There's no limit built into it itself. In fact, every single connection is its own entity and gets its own thread as well, so this should scale fairly well, although we have not done bulk testing to see how many we can actually do. We've easily run it with 10 to 20 systems with no problem; you could probably scale it up to 100 or more.
Still no problem, and that's from simple laptops like the one I've got here. Again, we haven't done official testing, but I don't see any inherent problems with scaling, and there are no built-in limits out of the box. Do you have hardware requirements for installing the server, and what does it take on the client? All right, hardware requirements: the only hardware requirement we have is a FreeBSD system. Really, that's it. SysADM itself, the server component, only uses, I want to say, a few megabytes of memory. It's very lightweight and it doesn't really use much, because the way it works is that it normally just sits idle, but it spins up a thread every time there's a new connection, and every request that connection sends in is handled in another new thread, which is destroyed once the request is finished. So it should scale very well and only use resources when you're actually being hammered with requests and replies at that moment, and it frees them up as soon as it's done. We tried to make sure it's very lightweight, because we know people are going to be running this on powerful servers that are designed to do something else. All right, I think I've got time for probably one more question — going once. All right, well, thank you everyone for joining us today. There's a one-hour lunch break; the next speaking session starts at 1:30. And thank you, Ken, for a great presentation. Thank you very much.

Good afternoon, everybody. Thank you for joining us today. Today we have Nick Shadrin, who is with Nginx, based in San Francisco. It's his second time speaking at SCALE, and he will be talking to us today about Nginx and tuning it for high performance. Let's give a warm welcome to Nick. All right, thank you, Matthew. So, first of all, a couple of housekeeping slides for this talk. First, I put all the links that will be present on the slides on one page, so you don't have to take your camera out or try typing everything that's going to be on the slides — there is a nice landing page for everything, and that slide will be up again at the end of the presentation during the questions, so you can get there later on. If you wish to tweet, please do so with the Twitter handles of @nginx and @nginxorg. And during the questions, I was able to save a couple of T-shirts from our booth, so the two best questions will get T-shirts. About myself: my name is Nick Shadrin, I'm based in San Francisco, and right now I'm a product manager for Nginx, doing product design work for the different projects we have in the company. I've been doing performance optimization in different areas of web technology for the last 16 years or so. I'm available through email, or just tweet me at Shadrin. In this talk we'll first cover the basics of the Nginx configuration — the objects and their hierarchy, and how to put together a minimal configuration.
And then we'll go into the performance optimization of that configuration. We'll start from some operating system level optimizations, some sysctl parameters, and also the general architecture of your application and how to scale it properly. And then we will dive deeper into the Nginx configuration file, into the specifics of the Nginx config. Also, we will show the Nginx architecture in a little bit more detail and talk through the reasons why that software is so fast and powerful in the different circumstances of parallel and high performance load. Who here is using Nginx? All right, nice. So we can basically skip most of the introduction about the company and the product that we are doing. However, I want to touch on just a couple of points. We recently grew in size quite significantly, and it's right now more than 120 employees in four different offices across the US and the globe. And right now the company is hiring in almost every department, including software engineers in high level languages like Python and JavaScript for some of our newer projects. So not only C but a bunch of other things. Talk to me or go to nginx.com. All right, so what is the web scale that we are talking about? For many smaller projects, when you try to look at how the project is performing, you just take the standard set of tools like Apache Benchmark and try to gauge the latency of one or several simultaneous requests. But in the current world there are several different reasons why this kind of performance measurement is not going to show you the real picture. First of all, when you are measuring a low number of parallel connections, that's not exactly how the users will access the web site. If you have a popular web site or if your application is going under a high load, you are probably looking at the users in parallel as hundreds or thousands of requests going into your server at the same time. And when those requests are landing on your server at the same time, there is a completely different pattern of load that's getting into your web server and into your infrastructure. That's one thing. Second, if you are measuring your performance from the same network without any latency, you're not showing the real world and not using the real world data of people with the Wi-Fi connections of a conference, or a poor 3G connection somewhere in a car from a mobile phone. Even worse when it comes to some IoT devices, where the stack of the IoT device is not using the kind of network stack that allows it to work really fast, and your client devices are really slow. So all of those reasons actually change the performance landscape, and the reasons and the ways how you actually architect your applications and how you will be scaling them at the end. As far as the design for scaling, when you are just starting out with developing your application you are probably thinking about just having it running in one node, in one machine, maybe even one, let's say, Node.js thread or something like that. But when you are just starting to code your application, if you know that it can be something more powerful that will require more scale after that, you need to think about how it's going to be working when you have more than one of those processes and more than one of those servers. Designing for that within the application, segmenting some things out that can be taken out of the application or segmented into a different service, that can actually help you a lot in the scalability of the whole thing.
Let me give you an example here. If you are bundling things like authentication deep within the code of your application, it might be harder to take it out in order to either change it or scale it separately from your application as a whole. Scaling the whole large application would require you to have a copy of your code, a whole copy of your application servers, distributed across your network. Making sure that your application is able to be separated into different parts gives you an ability to scale each one of those parts as needed further along the road. Maybe even years from now, that can be more easily scaled than the whole application in itself. Some people call that a microservices approach, some people call that distributed applications. Whichever way you want to call it, you need to think about the scalability of the different parts of it. Another thing that I want to put at the very beginning of this presentation is the use of caching and the use of micro caching. We often talk to a lot of people who are saying that their application is not cacheable. Sometimes when we try to dig a little bit deeper into the application, into the different parts of it, we actually find that there are many places and many specific URLs or API endpoints or even other smaller parts of the application that can be cached, not necessarily for days or weeks but maybe for just a couple of seconds. In many types of application that kind of micro caching, the use of very small cache lifetimes, helps you achieve a greater level of performance for specific services or just for some smaller API endpoints; there is a rough sketch of what that looks like after this section. So Nginx is sitting in the middle of the network. It is accepting your client connections and then forwarding those requests to the backend servers using HTTP or application server protocols. So what makes this Nginx process so fast and scalable? Let's zoom in on that picture and take a look at that. If you stopped by our booth today or yesterday, you probably saw this whole picture right there. And right now let's take some time to discuss the working parts of this process. So what we have here is there is a master process at the very middle top of the screen. And this master process is basically managing the reading of the configuration files, it starts listening on the sockets, and it spins up the worker processes, and the workers accept the connections and process the requests. As far as the use of external objects like files, log files, temporary files, static files, that's done by the worker processes. And the same is true for the outgoing connections as well. So all of that information flow is not going through the master into the workers; it's the worker processes that are handling that in the system. Next, the worker processes are doing that using the epoll or kqueue event loops, and what they are doing is they are serving several different connections, several different requests at the same time. You don't have to spin up as many workers as you would need for old school application servers with blocking I/O. So here you would need significantly fewer workers. Another thing that we have is the thread pool; right there at the bottom of the worker processes you see that workers can also spin out specific threads. And those threads are used to offload the blocking I/O operations, like reading from disks and writing to disks. That is very useful for the load patterns when you have the network connections going together with slow file system requests.
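As promised a moment ago, here is a rough, hypothetical sketch of what micro caching can look like in an nginx config. The zone name, paths, and backend address are made up, and this fragment belongs inside the http block:

    proxy_cache_path /var/cache/nginx/micro keys_zone=micro:10m;

    server {
        listen 80;
        location /api/ {
            proxy_cache       micro;
            proxy_cache_valid 200 1s;        # even a one-second lifetime absorbs bursts of identical requests
            proxy_cache_use_stale updating;  # keep serving the cached copy while one request refreshes it
            proxy_pass http://127.0.0.1:8080;
        }
    }

The point is not the exact numbers; it is that a cache lifetime of a second or two is often acceptable even for "uncacheable" endpoints, and under parallel load it collapses many identical requests into one trip to the backend.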
Along with the worker processes in this architecture, there are a couple of processes that can be created if you have the cache enabled, which are the cache loader and cache manager processes. All of those processes you can easily see in the system using just your ps command with a bunch of parameters. Alright, let's go into the operating system tuning. Most of that is related to the sysctl parameters that you can change for your networking operations. The first one is net.core.somaxconn, and this is the maximum number of connections that can be queued for accepting by the NGINX processes. The default of that somaxconn parameter is usually quite low. However, the network is usually pretty fast and you rarely see that parameter go overboard. If you do, the messages about that will appear not in the NGINX logs but rather in the kernel logs. So you will see messages from the kernel about that. Tuning and increasing that number can help with that particular problem. So that's queuing for acceptance by NGINX. Next, the second one is the backlog, net.core.netdev_max_backlog, which is the buffering of the network card. That parameter can help in case you have very high bandwidth; if you have very high bandwidth, then the buffering on the network card can also be increased. Next is the local port range, net.ipv4.ip_local_port_range. That parameter is important if you have a very high number of outgoing connections from the NGINX system to your back end servers. The default local port range for Linux is something in the range of 32k ports, but you can easily increase that to almost 64k. Basically you define the low port and the high port of the outgoing connections. That is also related to the problem of ephemeral port exhaustion that we will talk about in a bit more detail. The fourth parameter is related to the file system: the maximum number of file descriptors (fs.file-max). That goes together with the ulimit settings for specific users. So there is a system level limit, and then there is a per user limit on open files. Here is the thing about open files: your connections to the back end are also file descriptors. So the number of open files relates to that number as well. For example, if you want to have a nice experiment on running out of file descriptors, you can proxy NGINX into itself, so it will open as many file descriptors as it can in an infinite loop trying to proxy into itself, so it will run out of them eventually and you will see a nice message in the logs about not being able to open a file: too many open files. Although it seems like you are not opening any files, you are still opening the network file descriptors. So increasing that number can help on most of the systems when you are running out of open files. And in some of the operating systems the default limit for the user is something as low as 1,000 open files, which is really low. It gets you to a very low number of connections. You can safely increase that to a much higher number. The problem of ephemeral port exhaustion is when you have too many connections going to the backend servers and you don't have enough local ports to continue opening those connections to the backend. You might have more than one; you can have many different backend servers behind the NGINX box, and the first thing that you would have to do is increase the local port range as we just discussed. The local port range will get you up to almost 64k open ports, and that is plenty for many but not for everybody.
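Before moving on to the other ways of dealing with port exhaustion, here is a rough sketch of the sysctl and ulimit knobs just mentioned. The values are only illustrative, not recommendations:

    sysctl -w net.core.somaxconn=4096                     # queue of connections waiting to be accepted
    sysctl -w net.core.netdev_max_backlog=65535           # packet backlog buffered off the network card
    sysctl -w fs.file-max=500000                          # system-wide limit on open file descriptors
    sysctl -w net.ipv4.ip_local_port_range="1024 65000"   # widen the ephemeral port range for outgoing connections
    ulimit -n 65536                                       # per-process open file limit (see also /etc/security/limits.conf)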
Another way to deal with that is to split traffic across multiple IP addresses. You can add several different IP addresses to the NGINX box and then use the split_clients directive together with proxy_bind, and that is discussed in full in the blog post at that link, which is also available on the page that was shown in the beginning of the presentation. Using split_clients together with those IP addresses also gives you an option to split traffic unevenly between those addresses, in case you want to give preference to some of your IP addresses over others. Since version 1.11.2, NGINX also uses the socket option IP_BIND_ADDRESS_NO_PORT, which is a really interesting one. When you are opening a connection you have a source IP, source port, destination IP and destination port as the uniqueness parameters for every one of those connections. Those four values need to be different for each and every connection. Let's say you are connecting from the NGINX system to just one backend, which means your destination port and destination IP are always the same. That gives you up to 64k source ports, which means you can open up to 64k connections per IP address. Now let's imagine we are connecting to not one backend but load balancing across four backends. In that case our backend IP addresses are not unique to one; there are four of them, which means with the IP_BIND_ADDRESS_NO_PORT option and one local IP address you can open connections from the same source port to multiple ports and IP addresses on the backend while keeping the uniqueness, basically multiplying your uniqueness parameters by four and having not 64k per IP address but four times that. Basically you get the local port range per IP address multiplied by however many backends you have. We can talk more about applying that in the real world later on if you want more clarification on that. Let's take a look at the minimal NGINX configuration right now. This configuration file is showing you the full nginx.conf with the minimal events configuration and a minimally working HTTP section that proxies to a couple of servers; a rough sketch of it follows below. Basically you are looking at a load balancer in this case, a simple configuration listening on plain port 80. For a configuration like that we often receive questions like: okay, we created a config file, it's working fine for a simple single site, what should we do when we have more than that? Should we separate it, should we create different include files, should we try to include all the servers or all the logical items in the same server block? The ideas for that configuration are actually very simple. Don't be afraid to copy the same directives over, and don't be afraid to copy the server blocks into different configuration files, meaning if you are used to the Apache configuration where virtual hosts are defined in different config files outside of the general config file, you can use the same approach here and have one server block per config file and then have them in the same directory, like sites-available or conf.d inside the /etc/nginx directory. Doing that in the same file is also acceptable; if you like grepping that same file and making changes to many directives at the same time, that is also just fine.
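For reference, a minimal load-balancing nginx.conf along the lines of what that slide describes might look roughly like this; the backend addresses are placeholders:

    events {
        worker_connections 1024;
    }

    http {
        upstream backend {
            server 192.168.0.11:8080;
            server 192.168.0.12:8080;
        }

        server {
            listen 80;
            location / {
                proxy_pass http://backend;
            }
        }
    }

The split_clients and proxy_bind directives mentioned above would also live inside this http block; the linked blog post walks through that combination in detail.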
Basically, instead of creating a very tough regular expression trying to describe all of your URLs in one location block, which can be significantly more complicated, you can create a dozen of those location blocks separately and copy the same content across all of them. In the long run, when you or your successors look at that configuration file, if it's simple and readable it's easier, even if it's a longer config. So don't be afraid of a configuration file of a larger size. Alright, let's talk now about some NGINX performance features that are very relevant for a high performance environment. worker_processes is the directive in the NGINX configuration which is defined at the very top level, and depending on the NGINX distribution that you took, that directive can have different parameters. For the packages that we compile and distribute for the most popular Linux systems, that parameter is set to auto, and it means that NGINX will take the number of available CPU cores and spin up as many worker processes as there are available cores. So here are two things about that. It's great for most applications, but sometimes the number of worker processes doesn't have to be equal to the number of available CPU cores. An example would be some of the containerized deployments, where a container can see the CPU cores of the host system; however, it might not know about being limited to a single or to a lower number of CPU cores. In that case, if you run that NGINX system in a container and define the number of cores lower than the number of available cores, you need to also adjust that number in the NGINX configuration file for optimal performance. Another example can be when you are running not just the NGINX system but also a number of other different software packages, application servers or web servers or other software on that system. Limiting NGINX to a specific number of CPU cores leaves your other cores available in full for any other processes. So if you have that compromise of having different software on the same server, reducing that number, maybe even down to one CPU core in some cases, might make sense. Another thing that we use a lot: when we are benchmarking NGINX across different multi-CPU systems, we often make that change to see how it's working across different CPUs, just limiting the number of cores that it can use. In addition to that, worker processes can be set with affinity to a specific set of CPU cores. Let's say you have just four cores and you need to take core number zero and core number two for NGINX; you can set worker_cpu_affinity with a binary mask for every process and have it set to those CPU cores only. Since about July we also have worker_cpu_affinity auto, where NGINX binds to the CPU cores automatically and doesn't change the CPU cores on the fly. Next is worker_connections. The worker_connections directive is per worker, so let's say you have that set to 1000 worker connections and you have 10 workers; it means you have a capability of 10,000 simultaneous connections on the front end of the NGINX system. Increasing worker_connections beyond its default of 512 or 1024 is usually safe for most current servers and operating systems, so you can look into increasing that number in case you are running out of the available worker connections. And the next directive is worker_rlimit_nofile, which is the limit on the number of open files per worker process.
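A quick sketch of the top-level worker directives covered here, with illustrative numbers only:

    worker_processes     auto;    # or a fixed number if the container or VM sees more cores than it may use
    worker_cpu_affinity  auto;    # pin workers to cores; an explicit binary mask per worker also works
    worker_rlimit_nofile 65535;   # open-file limit per worker process

    events {
        worker_connections 4096;  # per worker, so total front-end capacity is workers times connections
    }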
That directive should go in conjunction with the sysctl and ulimit system level settings and restrictions that you set on your server. And also, if you are running NGINX together with other software, don't forget to allocate some resources to that software as well. If you are running out of either worker connections or open files, you can look at the error logs of NGINX; that information, in case of any errors, will be seen in the error log. The next part is really interesting. There is a directive which actually had a lot of controversy on the NGINX mailing list a few years ago, accept_mutex, and accept_mutex is basically a directive which lets you set whether NGINX workers will receive the new connections in turn or all of them will be notified about those connections. If you turn accept_mutex off, this is generally safe for most of the systems; however, in a low traffic scenario, when you don't have a lot of stuff going on through your server, having accept_mutex off will waste a little bit of system resources, which on current generation systems is usually not a problem. So we made a decision and changed accept_mutex to off in the default configuration starting from version 1.11.3, which was done just in the middle of last year. If you are using systems on the 1.10 branch or the 1.9 branch or something even older, you might want to switch accept_mutex off manually or upgrade to the latest packages in the 1.11 branch. Next is sendfile. So how does sendfile help you here? Who knows about the sendfile operation here? That's great. Sendfile allows you to basically put the data from the file descriptor into the network file descriptor directly, bypassing a bunch of processing in user space and in the worker processes. So sendfile can be easily enabled for static file operation, and it also helps for caching as well. Sendfile doesn't really help when you have any content modification in your configuration. If you are modifying the content with sub filters, or if you are zipping or unzipping the content on the fly, that directive is not really helping. But if you are delivering cat pictures directly from disk, it gives you a boost in performance. Another directive that is described here is the asynchronous I/O, which is aio threads. It defines the thread pools which are used to offload the disk operations into those thread pools, outside of the normal worker process. So here is the thing: an I/O operation can block the process, and the process on a Linux system will be waiting for the disk to come back with data for it. Offloading that call into a separate thread allows us to not block on that operation and continue working, serving other requests and other connections at the same time, while a thread is waiting rather than the normal process waiting. That also saves you from a workaround that some people were using before, running more worker processes than the number of CPU cores, so that some of the workers could wait on disk while other workers were available to work with the existing requests. With the thread pools that workaround is not required. You can read through that thread pool blog post at that link, and actually, I talked with our marketing people, and that blog post is one of the most popular blog posts on our website. So definitely a nice, very technical read. Let's take a look at more features related to HTTP operations.
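Before that, a short sketch of the directives from this part of the talk; note that aio threads needs an nginx build with thread pool support (--with-threads):

    events {
        accept_mutex off;   # the default since 1.11.3; all workers are notified of new connections
    }

    http {
        sendfile on;        # hand file data straight to the socket, bypassing user-space copies
        aio      threads;   # offload blocking disk reads/writes to a thread pool
        # ...rest of the http configuration...
    }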
We talked quite a bit about the system level and file system level configuration, and now let's dive a little bit deeper into the protocol. So there is something called HTTP keepalive, which is a very useful and often used functionality across basically all of the websites on the Internet. Make sure you didn't accidentally disable it anywhere, because HTTP keepalive, the way it works, basically allows you to put many HTTP requests and HTTP responses into the same connection. It's okay sometimes not to have it for HTTP when it's unencrypted. However, when HTTP becomes encrypted and you start negotiating HTTPS across every one of your requests, not having HTTP keepalive on the client side really hurts performance. The same is true if your backends behind Nginx are also working with HTTPS. Let's say your backends are on HTTPS; there is a bit more configuration that you need to do in order to have keepalive set for those backends. What we did here, we made an interesting benchmark which I briefly showed last year during the HTTP/2 conversation here at SCALE. That benchmark shows you page load time: the page load time at different latencies with HTTP keepalive enabled or disabled for HTTPS traffic. What you can see there is that if you have something like 50 objects on a page, your load time for HTTPS increases several times compared to the load with HTTP keepalive enabled. And another thing that is shown here is HTTP/2 compared to HTTP/1; that's at the bottom of the page. For configuration of HTTP keepalive on the front end you need to set keepalive_requests and keepalive_timeout to the appropriate values here. The defaults are around the numbers that are shown on the page. It depends on the distribution and who compiled NGINX with which parameters for your operating system. However, you might want to tune it to a different number. The keepalive timeout not only helps you with having more requests going into the same connection across different user clicks on the website, it has a downside: there is a connection open to your server for that number of seconds. If you are cautious, or you cannot keep too many connections open for any reason, you might need to or might want to reduce the keepalive timeout to a more reasonable value. It is up to you how to play with that directive and see the results in performance and in server resource utilization. It is a little bit different when you are configuring HTTP keepalive on the back end side, behind NGINX. When you are doing that, you need to set the HTTP version for the back end communication to 1.1, and that needs to be explicitly set there. Another thing is to make sure that your Connection header is not saying close, otherwise the servers will not negotiate keepalive. Basically we are clearing out the Connection header here. In the upstream block we are also defining the number of keepalive connections, the maximum number of idle keepalive connections that can be kept open to your back end servers, and that is the keepalive directive in the upstream block. Basically, when you are looking through the documentation, make sure that you are looking at the correct side of things, front end keepalive versus back end keepalive. That can be a little bit confusing, because directives like keepalive_requests and keepalive_timeout refer to a completely different side of things compared to the keepalive directive within the upstream block. So make sure you've got the context of your directive right there. Next, HTTP caching.
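First, a quick sketch of the keepalive configuration on both sides, front end and back end; the values and backend addresses are placeholders:

    http {
        keepalive_timeout  65s;    # client side: how long an idle connection stays open
        keepalive_requests 1000;   # client side: how many requests one connection may carry

        upstream backend {
            server 10.0.0.11:8080;
            server 10.0.0.12:8080;
            keepalive 32;          # pool of idle keepalive connections kept open to the backends
        }

        server {
            listen 80;
            location / {
                proxy_http_version 1.1;          # backend keepalive requires HTTP/1.1
                proxy_set_header Connection "";  # clear the Connection header so it does not say close
                proxy_pass http://backend;
            }
        }
    }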
We will not talk too much about HTTP caching strategies and guidelines. That is basically a completely separate talk that takes about an hour, and the next instance of the HTTP caching presentation is going to be an OSCON presentation by Kevin Jones, one of our solutions architects, which is specifically tailored to HTTP caching and caching with NGINX. There are a couple of things about HTTP caching that I can mention here, which are a couple of blog posts. One is about micro caching strategies and the other one is about placing your cache in different locations, on different disks, and how to split your cache properly. Once again, all of those links are available on that page that will be shown at the end of this talk. Last year, about a year ago, we talked about HTTP/2 here at SCALE. Who here went to the HTTP/2 conversation last year? Alright, a few people. I will just walk you through a couple of things about this protocol. HTTP/2 went live and was created as a standard in 2015, and it's basically based on Google's SPDY protocol. The idea for HTTP/2 is to improve performance for real world websites and real world web requests. The headers are compressed with HPACK. In some requests, headers take more traffic than the actual data in the request. Compressing them correctly allows us to save bandwidth for some load patterns with a very low amount of data per request. Then HTTP/2 allows you to create multiple streams of data within the same connection. Normally a browser on HTTP/1 opens several connections at the same time to the web server and tries to download several resources at the same time. It's about five or six connections on average, depending on the browser implementation. With HTTP/2, all that data can flow within one connection. What's good about that is multiple streams can have different priorities, and the browser figures out what it needs more: does it need a background picture more, or does it need the actual HTML page or a JavaScript more? Normally the prioritization works with the browser defining the priority for each resource, not relying on the operating system to prioritize different connections at a level where it doesn't understand anything about the actual necessity of a resource. The browsers are supposed to be a little bit smarter than the operating system in that regard. Another thing that was introduced with HTTP/2 is server push. That's a method for the server to send information to the clients without the clients actually asking for it. It will be sending request and response headers telling the client what it should have asked for. We made a few interesting benchmarks with HTTP/2, and we ran that on Ubuntu 16 with OpenSSL 1.0.2 under Chrome, and the results proved to be quite interesting. What we measured for HTTP/2 was a page with a set of resources, and on one axis, the vertical axis, we varied the number of resources on the page. It went from 0 resources up to 140 resources, and on the horizontal axis we changed the network latency. Here's the thing about HTTP/2 that you will find interesting: if your web page or your HTTP resource has just one resource per connection, or very few resources per connection, you're not benefiting much from HTTP/2. And if your network latency is really low, just the same, HTTP/2 is not giving you much benefit.
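Enabling HTTP/2 on the front end, by the way, is a one-line change to the listen directive once TLS is set up; the certificate paths here are placeholders:

    server {
        listen 443 ssl http2;
        ssl_certificate     /etc/nginx/ssl/example.crt;
        ssl_certificate_key /etc/nginx/ssl/example.key;
        # ...the rest of the server block is unchanged...
    }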
But if either of those numbers goes up, so either you have many objects on the page or your network latency is more real world, like 100 or a couple of hundred milliseconds, in that case HTTP/2 actually makes quite a bit of sense. Here's what we have as a couple of reference points from those graphs. With 40 milliseconds and 50 objects on a page, HTTP/2 is only about two times faster than HTTP/1, and once you go up to 200 milliseconds and 100 objects, which is basically loading a page over a non-perfect 3G connection on a phone, that's when HTTP/2 helps you quite a bit more. People often ask if it's time to use it. That slide is from last year, showing that about 76% of the real browsers are able to understand HTTP/2, with a couple of those browsers, like Opera Mini or UC Browser, not benefiting from it at all. That is because they have their own proxy infrastructure and they don't care about the high performance web, as they are modifying the pages themselves before delivering them to the mobile phones. And the current slide shows that this number is growing even more, with almost 80% of the browsers usable for HTTP/2. And if we take Opera Mini and UC Browser, which is mostly popular in China, out of that picture, we get to almost 100% of the browsers capable of it. Last year in January, we looked at the statistics for that and we found that there were different spikes in usage of HTTP/2. We took that data from the W3Techs website as the publicly available statistics for that protocol, and it was about 6% usage. Nowadays we see about 12% of the websites switched to that protocol, and it's interesting to see that the dynamic now is more stable, with more popularity of this protocol among the high performance websites. Basically, if you have a small project you might not really care that much about the milliseconds of performance. However, many of the big guys have already switched to HTTP/2 across their whole web infrastructure. In the next section here, let's talk a bit about measuring the results. Alright, so let's say you made changes to the nginx configuration. You made changes to your infrastructure and you need to figure out what those changes did and how they actually affected your environment. There are multiple ways to measure that, and many of them come from the log_format directive and the variables that you put in the logs of the nginx system. The most interesting variables for measuring your website latency are the upstream response time, the request time, and also the cache status. For the upstream response time you will see that data in the logs, and in some cases it will show you multiple values. That is possible when nginx is using different sub requests and forwarding traffic across multiple backend points, so the response time can have not one number but many. The upstream cache status is extremely important when you are using any cache directives and any cache environments, and for those cache environments you need to figure out your hit or miss percentages. Using the upstream cache status in the logs gives you one way of figuring that out. Next, you can get many of the metrics that are relevant to the overall operation of the whole nginx system with the stub_status module. Most distributions of nginx are compiled with stub_status enabled. If you are compiling it from source, make sure you are configuring it with stub_status. If you are lucky enough to use NGINX Plus, you can get even more statistics with the JSON output of the status module.
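A small sketch of wiring those variables into the access log, plus exposing stub_status on a local port; names and paths are illustrative:

    log_format timed '$remote_addr "$request" $status '
                     'rt=$request_time urt=$upstream_response_time cache=$upstream_cache_status';
    access_log /var/log/nginx/access.log timed;

    server {
        listen 127.0.0.1:8080;
        location /nginx_status {
            stub_status;    # older versions use "stub_status on;"
        }
    }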
The NGINX Plus status module gives you information about every upstream and every server block in it. There is also another tool that we recently introduced and made publicly available for everybody, which is called NGINX Amplify. Amplify is a SaaS based monitoring system for nginx that shows a number of different metrics. Many of those metrics come from the stub_status module and from the system level, and what's most interesting is that the agent that you install together with nginx on your server will scrape the logs, understand all your log_format directives, and you will be able to create customized graphs and filters within the Amplify interface in order to create and monitor the graphs that are relevant for your specific deployment. For example, a good example of a filter is the number of your 500 errors versus the number of your 502 errors versus the number of your 400 errors. Seeing all that information, and especially seeing when that usual number changes to something else, allows you to react more quickly to any kind of problem with the nginx environment. In addition to that, Amplify gives you a way to monitor your configuration files, and it gives you automated advice if configuration files are not configured according to the best practices. It even gives you the directives to add to your config files, and the locations in the config files where something is not configured as well as it could be. So that is a very interesting system. Installation is like 1, 2, 3, a very simple couple of steps that you need to go through, and it is currently in a publicly available beta. You don't have to have a commercial version of nginx to have Amplify supported. You can compile your own and still use that system for monitoring. We've almost used the 50 minutes for this talk, so a couple of things here. We went through the scalability of your deployment and how to plan the deployment. We went through a number of features in the system level configuration, and went through the nginx configuration functionality from the system level into HTTP keepalive, HTTP/2 configuration and a number of other things. We went through the measurement of your results and through the monitoring features. A couple of things on how to contribute to the nginx project. You can clone the Mercurial repository from hg.nginx.org or use the GitHub link there. And you can sign up for the nginx.org mailing list on the mailman page. The mailing list for nginx is very active. People are actively exchanging configuration snippets, asking questions and exchanging ideas there. If you are more into the development of the project, there is also the nginx-devel mailing list. That list is more for the C programmers who want to contribute more to the project. Alright, so at this point let's take a few questions, and if you want to go through the pages and links, that page is going to be available on the screen. Yes please. We make a lot of dashboards that combine business metrics as well as system metrics and web server metrics. A lot of the monitoring vendors have great features and APIs where you can pull things out of them. So I was curious, does Amplify have an API that you can pull data out of? Inside the Amplify system, the way it's built underneath the hood, it is using a lot of different APIs to pull data and push data around.
However, as of today those APIs are not available to the public, but we have received this request before and the Amplify team is working on their roadmap, so the details on API access will be available over the course of the year. Thank you. Next please. On the slide about the back end configuration, after your keepalive part you had HTTP 1.1; is there a reason that Nginx doesn't do HTTP/2 there as of right now? As of right now Nginx doesn't do HTTP/2 to the back ends, and that's, once again, a question that's very relevant to our roadmap. There are considerations of having that in the roadmap of Nginx; however, it is a very different project compared to having HTTP/2 on the front end. Connectivity to the back end with that is significantly more complicated for the architecture of Nginx. You can join the development mailing list, and if you want to help with that or discuss it in more detail, you can do that there. So you mentioned, on tuning the worker processes with regards to CPU affinity, have you noticed an improvement from tuning that in virtual environments like AWS or OpenStack? Tuning CPU affinity in AWS depends on what kind of machine parameters you have there. If this is some kind of simple shared VM, CPU affinity to a virtual processor is not extremely helpful. Maybe it could even hurt, but we didn't do a lot of tests in an environment like that for that part of the functionality, because of the internals of the environment being so unstable in that regard. So, to your question, exactly on the point of CPU affinity, I would say not so much use for that kind of environment. So you mentioned at the beginning that in a real-life network you have all sorts of different clients that may have different connection properties. Some of them may be on networks that are suboptimal. I was just wondering what sort of discipline do you use in order to simulate that kind of distribution of different client behavior. Like, you know, load generation or anything like that. You can run the load generation on a different machine and use the Linux network emulators to increase the latency. That's one way of doing that. And another way that we use for some of the emulation is to use very strict rate limits on the NGINX systems in between, just to have the content delivered very slowly to the front end. When it's delivered very slowly, your number of connections goes up and you will be able to see how it's operating. So basically the Linux emulators and rate limits. Any other question? I was just wondering if there's any tuning advice you would have for people that are using a server-side language through FastCGI? There is nothing specific about FastCGI tuning. If you are running the FastCGI server on the same machine, you can connect to that FastCGI server using a unix socket rather than a TCP connection. That makes it a little bit faster. Other than that, connectivity to FastCGI is sometimes not as complicated as HTTP, so there is not too much there. Thank you. You had a slide up with the worker processes, and you mentioned something about, if it was running in a Docker container, that the number of CPU cores may not be relevant. Can you explain that? I kind of missed that part. From within a container, in some container environments, the list of CPUs is showing the host CPUs, and when you're launching the container you can limit which CPU core it's using, but it still doesn't change the visibility of the system from within the container. Is it better to leave it on auto? Well, there are two ways of doing that.
If you are limiting the CPUs in Docker or in your container system, you have to limit it to the same static number in the config file. If you are leaving it to automatic on your container system, you can leave it automatic in nginx. So, I have a couple of t-shirts left and I will have to decide which questions were really fun, and I wish I had more. I will give one to the HTTP/2-to-the-upstreams question, because many people do have that same question, so you get the choice between medium and large. And another question was about CPU affinity in virtual environments, which is a really interesting question, and that t-shirt goes to you, and it's a medium size. Thank you everybody. It was a great pleasure. Good afternoon everybody. Today Jonah Horowitz is going to be talking about how configuration management is an anti-pattern. He recently came back from a trip to the Philippines, which sounded really gorgeous, and he's joining Stripe at the end of this month. So let's give a warm welcome to Jonah. Good afternoon. When I got my first job at a tech startup, this was literally our release procedure. I would check out the latest version of code from our CVS repository. I'd copy it to the first machine. I'd move the old website directory out of the way, untar the new website directory, then I'd restart the server. Once I'd finished all 15 machines, the website would be running the latest version of the code. After a few months there we hired a real sysadmin, and he knew bash. So once he taught me a few things, I wrote a little script like this, which I'm sorry for the size, but you can see that it's basically the same code, at least if you're in the front row, but in a bash script. So now with my bash script I could re-implement Ansible. I mean, write SSH in a for loop, and push out all the code to all the servers pretty quickly. This is how I managed releases for the world's second largest e-commerce site. This script eventually became a thousand lines of Perl code with a front end web interface, and from what I hear it's actually still how they're doing releases. So back in 2001, how did we get a server up and running? How many people here have used a server install process similar to this one? Yeah, like most of you? Okay, it's funny, because before we had good online update mechanisms for our servers, you actually had a consistent build across all your systems. But only because you never updated anything, right? Once you started running updates as part of your install, chaos ensued. Now you might think we've gotten away from this particular anti-pattern, but there's a cloudy version of it too. And if you do this, well, every server in production will be running a slightly different version of your operating system. Let's face it, we all know a few startups that are basically running like this. So, a brief bit about me. I started as a hacker running a DOS-based dial-up BBS back in the early 90s. I was lucky to be able to intern with Sun, and from there I went to go work at the help desk at my university, where I got a taste of DEC VMS. Anybody run DEC VMS anywhere? Like one guy, two people? Cool. Also, I gave my first presentation ever one year ago on this very stage at SCALE, so if you're nervous about presenting or giving talks, SCALE is a great place to give your first talk. I'm a site reliability engineer these days. I was just at Netflix for a couple years.
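For flavor, the kind of "SSH in a for loop" release script described a moment ago might have looked something like this; the host names, paths, and service name are made up for illustration:

    #!/bin/bash
    set -e
    release=$(date +%Y%m%d%H%M)
    for host in web01 web02 web03; do
        scp -r ./website "$host:/var/www/website.$release"
        ssh "$host" "mv /var/www/website /var/www/website.old.$release && \
                     mv /var/www/website.$release /var/www/website && \
                     sudo service httpd restart"
    done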
I worked at a half dozen startups around the bay for the past 17 years, and I'm joining, like you said, I'm joining Stripe in about two weeks. Here's my email address if you want to ask me any questions after the talk. Also, I'll be tweeting a link to the slides. So, a few startups after that first e-commerce site, I started working at a tech company with this guy, Nate Campy, and he wrote this book. This was the first time I'd ever encountered a real configuration management system. It was a revelation. It was super cool. Nate had implemented about half the infrastructure with CFEngine, and over the course of the next couple years we got 100% coverage across the entire infrastructure. We could reboot and reimage any server in production in a couple hours. Except, caveat, the database servers of course. And I'll talk about that more in a minute. So before we were running CFEngine, we were sort of in this state. After we were running CFEngine, it was like that. I think anyone who's used configuration management and done it well has seen a pretty similar outcome. So eventually I left that startup and I went to another one. And when I was interviewing, I always asked: what kind of configuration management are you guys using at your startup? Because I was not going back to another startup that didn't use configuration management. I was all in on DevOps, all in on automation. At the company I eventually joined, one of the people interviewing me said, we're running Chef. And then a little bit later I was getting interviewed by another guy on the interview panel and he's like, oh, we're running Puppet. And you might think that was probably a red flag. But I joined them anyway. It turns out they were actually running both. And we spent some time there getting everything moved over to Puppet. Now you might ask, why did you choose Puppet and not Chef if you were running both? And I'd love to give you some great complicated story, but really it was that way more Puppet code had been written than Chef code, so it was easier to refactor the Chef into Puppet than the other way around. So back to the point of my talk. If configuration management was so revolutionary, why shouldn't you use it? From what I've seen, there are basically two methods of running configuration management in production. In the first mode, you've got an operations team that owns everything to do with the configuration management system. They bottleneck all changes in production. Everything has to go through them. They're the only ones with commit access to the repository. And this is like the antithesis of DevOps. It sucks in every way possible. Nobody likes it. Okay. So then you've got the other option. You expect developers to run configuration management on the clusters that they're responsible for. This is the more DevOps way of doing things. It totally makes sense. But it has a completely different problem. Now you have to teach developers the DSL of whatever configuration management tool you're using. And it's not that they don't want to learn this. It's just that they don't generally touch it very often, and it's yet another thing they have to think about, and they're not going to do it enough to really get good at it. Depending on how you deploy your code to production, every developer now has the ability to take down your entire system with a poorly written configuration change. I once had a developer kill off everything owned by root in production.
With the exception of init, on a 4000 node cluster, including sshd, syslogd, and critically, crond, which meant that our configuration management tool stopped running on all of the servers. Now the cluster was supporting a distributed file system, and the idea of just, hey, let's just reboot all the nodes in the cluster would have meant a significant amount of data loss, or at least the potential for significant data loss. So we had to log into every machine over console with a Diceware-generated password on a crappy serial-over-SSH interface, and it took like seven people a couple of days to get the cluster back into a completely running state. So it's possible to fix that problem. Maybe with some code reviews, maybe you have a separate configuration management branch for every cluster in production so that at least developers can only shoot their own team in the foot, but then you've got this problem of how you manage common infrastructure code. Maybe you use some Git submodules, and you've got a mess, right? And that doesn't even address this issue, which is out-of-sync configuration, right? Anyone who has run configuration management at scale has run into this issue. At any given time some percentage of your fleet is not up to date. That's for many reasons: either a broken configuration management server, or they don't run configuration management at the same time, like I said, we were cron triggered and the crons were spread out so we didn't stampede our configuration management server, or maybe you've got a broken network for a few minutes, you've got some buggy code, you've got a bad configuration push or whatever; you've always got some part of your infrastructure out of sync. To solve this problem you write a bunch of error catching and correcting code to handle the ways that your configuration management tool breaks, and then you write a monitoring alert that says if any node in my cluster is out of date by longer than a certain amount of time, trigger an alert, so you're responding to machines that are out of date, and no matter how hard you try you're always going to have some unpredictable part of your infrastructure. So configuration management promises that you're always going to know what's going on in your cluster, or at least that it's going to be in a known state, but it actually doesn't deliver on that promise. And then this isn't exactly unique to configuration management environments, but it's enabled by it, right? Everyone knows that one server that's super important but nobody's gotten around to automating the build of it yet. That one server that's a single point of failure. Maybe it's your configuration management server. That one server that Bob, who now works at Hooli, set up and nobody knows how to rebuild. Yeah, like that one server. So none of this addresses the problem from the beginning of my talk, which is that release engineering still sucks, because configuration management doesn't really do release engineering. It can be tortured into the service of such, but you normally end up running another tool on top of it, and so your configuration management doesn't even cover your entire system. That doesn't mean I haven't tried, right?
You can update the release version in your Puppet config and then gate that deployment to a subset of your servers, and then ungate that and change the deployment system, and then you try to automate all of that so that you don't forget what you're supposed to do first and second when you're trying to do an emergency bug fix at 3 o'clock in the morning. So if it's so bad, what's the alternative to configuration management? What's the next thing? And here, surprise: the alternative is immutable infrastructure. If you're running in the cloud, this means baking your AMI with your application code already on it. Imagine a world where you finish writing your application, you push your code into git, it gets built into an RPM or a Debian package, and that package is installed on an AMI. That AMI is built from a base AMI that's in a well-known state. And then your AMI with your software installed is pushed to all of your Amazon regions, and you use that as your unit of deployment. I tell you all this because I've seen it. This is exactly how Netflix runs its infrastructure: Gradle and Nebula to build the application packages, the bakery and Aminator process to build the AMIs, Spinnaker to run the deployment system. But you know some people are going to say, but we're not running in Amazon, we have our own bare metal infrastructure, we're running somewhere else. You can do basically the same thing with Docker. In fact Docker makes it even easier, because Docker images are basically immutable. As a side note, if you're running configuration management inside your Docker images, you're doing it wrong, like, just stop. So let me walk through that a little bit more slowly. The base or foundation AMI: this is an AMI that's either your handcrafted, optimized image built by your performance team with some input from your security engineering team, or it's just the upstream base image with the security updates applied. You might want to install a few other things: infrastructure packages, monitoring, logging, service discovery, that sort of thing. You could use a configuration management tool to build your base image if you wanted to. But you could also use properly configured operating system packages with correct dependencies, maybe a little Python to finish off the install. Once you have that base image you probably want to canary it, so you have your nightly base image or your weekly base image and then your stable base image, with some sort of security release update cycle that you can push faster. At Netflix we had this problem where, if there's a security update, we can't release the nightly and then wait and then release to production, because we might have internet-facing services that need an immediate patch. So there was a way to just arbitrarily kick this off at any time and only include the latest security updates, or that specific security update, not the full update that you might otherwise get. Once you have that base image, you install your application and its dependencies using your standard package manager. Apt-get, yum, whatever. Basically, if you have your application with its dependencies properly specced in the RPM or Debian package, you should actually just be able to do this in one step. You compile that into a new application-specific image and push that out to all your regions, and voila, immutable infrastructure. So, some tools that you need to make this work: you need a way to build your application into a package, as opposed to just a tar ball.
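The next part goes through the specific tools, but as a very rough, hypothetical sketch, the flow just described chains together something like this; the task and file names are made up:

    git push origin master          # code lands in git and kicks off CI
    ./gradlew buildDeb              # build the application into a .deb (for example with the Nebula ospackage plugin)
    packer build bake-myapp.json    # bake an AMI from the base image with that package installed
    # the resulting AMI id is handed to the deployment system (Spinnaker, CloudFormation, etc.)
    # and rolled out region by region as the unit of deployment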
Netflix uses Gradle for building those application packages. It's super easy, I recommend it, and it has the ability to build both Debian and RPM packages. You need an image build system. Netflix uses Aminator and it's open source; you can get it on Netflix's github.io, but you can also use Packer, which a lot of people are using these days. Next, a deployment system. Spinnaker is open source, and if there's one piece of Netflix open source tooling that you go and check out over the next year, Spinnaker should be the one thing. It is an awesome piece of software that can deploy on pretty much every cloud provider and Kubernetes. It's easy to use. It's easy for your developers to configure. It's incredibly powerful. You can also use Terraform or CloudFormation. You need something for service discovery. If you're going to run the same thing in dev, test and prod, and you're going to use that same AMI each time, then you need some way for that AMI to discover the other services that are in that environment. Eureka, ZooKeeper, internal ELBs; you could probably use DNS for it if you wanted to. Finally, and this is totally optional, but you might want to have a way to do dynamic configuration of your system. Fast properties, feature flags, that sort of thing. What are some of the advantages of doing it this way? It totally simplifies your operations. You no longer have to know the current state of a server when you do a new release, because you're going to get rid of that server as soon as the new release is finished. You no longer have to think about how to get from one state to the other state. If your servers are broken, you don't have to fix them or log into them one at a time to restart crond. I don't know if anybody here has used the Puppet Augeas library, which is for editing files in place, but I've spent hours of my life on that tool, and immutable means you never have to use Augeas again, which is to say it's worth it for its own sake. This enables continuous deployments of code, because new code goes through your testing and release pipeline, and you don't have old versions of libraries or dependencies or configuration lying around on the servers that might be incorrect. You get faster startup times. I've seen configuration managed environments where it takes four hours to launch an instance and get it into the state where it's ready to serve traffic. That's probably a pathological case, but it can easily be an hour. It's hard to use reactive auto scaling and auto scaling groups if it takes an hour for a new instance to come up. By the time your new instance comes up, all of your old instances have died because they didn't have enough capacity to handle the incoming traffic. It's also hard to recover from failure. If one of the machines in your cluster dies, or is killed off by Chaos Monkey, or if you're using AWS as your Chaos Monkey, you need to be able to start up a new one quickly. If you think back to CS1, we used to talk about things like taking maybe a longer compile time and doing all your optimizations at compile time so that you get a faster run time. It's a similar thing: we're going to build all the time into the release process so that we get faster startup time for each instance. Something that was mentioned at the Rust talk earlier today, too. In addition, your configuration is always in sync across all of your nodes. Since they're all launched at the same time from the same image, no more worrying about that one node where Chef crashed halfway through its run.
You also don't have to worry about cruft building up in your systems. If one of your nodes is acting weird, you just kill it and let it be replaced. You get to deploy the exact same version of your code, the exact same full image of your code, to dev, to test, to prod. You can trust your systems to behave the same, or at least as much the same as possible, in all of your environments. It makes it a lot easier to respond to security threats. If you're used to rebooting one node at a time as you update your kernel or you update glibc, you don't have to do that anymore. You deploy a new batch of servers with the new AMI and the new security update, and then you kill off all your old nodes, and you don't have to worry about which nodes didn't get that security update. Also, and this is maybe controversial, some people will argue with me on this, but if one of your nodes is compromised and you're replacing your nodes relatively frequently, then your attacker has a shorter time window in which they can exploit your system. It makes multi-region operations easier, so you can run the same image in all of your AWS regions or all of your Google Cloud regions. And back to that one server. It's going to stick out like a sore thumb, because it's the one that has an uptime of more than three weeks. In fact, if you're running Chaos Monkey, you're probably going to lose that one server sooner or later anyway. So once you move to immutable, you're able to take advantage of a couple of really cool release strategies. One of them is a rolling release, where nodes are replaced one instance at a time through your entire cluster. This can be handy if you have some sort of state on the node that you want to preserve, or the cluster as a whole maintains some state; something like a Kafka cluster is a good example for rolling releases. On the other hand, with a blue-green or red-black push, let's say you have a hundred nodes in your existing cluster, and cloud providers make this super easy, right? You have a hundred nodes in your cluster, you start up a hundred new nodes, you either slowly shift traffic or just cut traffic over to the new nodes, and you keep the old nodes running so that in the event of a failure or a problem or an issue, a bug that was unforeseen in the new nodes, you just flip back to the old nodes and you're right back to where you were. Then after a reasonable window of maybe one to three hours, you shut down the old nodes, so you have an overlap of time where both sets are running. Of course there are some caveats. If you're running your own bare metal infrastructure, you're still going to need something to manage that base operating system layer that you're running, hopefully, Docker, Kubernetes, Docker Swarm, something, on. But this is now just your base infrastructure. So you don't need all your developer teams to learn your DSL, to do anything with your configuration management, on that base layer. You can just have your infrastructure team manage that one layer. They're a small group of people, and you keep that infrastructure layer as small as possible, both to reduce the attack surface as much as possible and to reduce the risk of a bad change taking down all of your systems. Yeah, so I mentioned earlier that CFEngine couldn't reimage our database servers. Now, we were running Oracle on Sun hardware, and it took like a month to get a new database server up and running, but I still hear this story from people running Mongo and Cassandra.
Sure, we can run immutable, but not on our database nodes. I think you're not trying hard enough. You can actually put your database storage on an EBS volume or a Docker mount point, and as long as the on-disk format is kept the same between database versions, you can detach the EBS volume, launch a new instance, and reattach the EBS volume. This is back to that rolling release thing I was talking about earlier. And it's possible; I've actually seen it done. So with that, I've got plenty of time left for questions. I've also got these cool circle-slash stickers that you can put over the Chef or Puppet logo on your laptop; you're welcome to come up here, I have a bunch of them, so you can come see me later. Any questions?

I basically have so many questions that I don't know where to start, so I'll just pick a random one. You mentioned you can move your application through dev, test, and prod. What about parameterizing the configurations between environments? What kind of challenges do you have with individuality between servers? Each environment has some different things about it. We're kind of talking about an old golden-image-based approach, almost. It seems like there would be so many other challenges. Are you just pushing the same problems somewhere different?

So the service discovery that I was talking about is a super critical part of this whole thing, because when an instance comes up in your dev cluster it needs to be able to find whatever its dependencies are. If you're running a service-oriented or microservice-oriented architecture, it's got to talk to all the other things. It can say "I need to talk to the foo server"; it registers with service discovery and says, "hey, I'm the bar server, tell me where the foo server is," and service discovery tells it. As long as your service discovery is running in each environment separately, then when an instance comes up there it finds the right services, and you can also keep some configuration, some environment variables, in that same structure, so that when it comes up in that environment it gets the right settings. Does that make sense? It does; that's kind of the only thing I could think of as well. So it's kind of a trade-off, because you're dependent on your service discovery. Your service discovery is going to become a critical service in your infrastructure, and it's going to need to be fast and reliable and resilient. Awesome, thank you.

So you mentioned that configuration management is apparently hard for full stack developers. How do you weigh that? I don't want my full stack developers to learn a new DSL such as YAML; however, I am going to tell them that they now need to learn how to write a Dockerfile. How do you balance that? Yeah, that's an interesting one. I was having this debate over beers last night. I don't think Dockerfiles are necessarily the way to go for building Docker images. If you are already building your application into an RPM or a Debian package, then your Dockerfile can basically be the same across all your images, because it's just: take your base image, install your application package, which pulls in your dependencies, and you're done. So now you have a Dockerfile that's a template. You do still have to teach your developers how to write something like a Gradle file, because they're going to have to write the file that does that.
But the Gradle file is super simple, and it's something that normally gets set up at the launch of a new application and not touched very much after that. In fact, if you get really clever about it, you can have a button on your wiki page or in your Git repository that generates a templated Gradle file with the directory paths you've standardized in your organization, so the developers don't even have to touch it in most cases. They might have to change the name and email address in the maintainer section at the very top, and that's it, which they should hopefully be able to do.

So my question is, what do you use to populate service discovery information or environment variables? I think maybe you just answered that: is it the Gradle file, for endpoints, for databases? So the way Eureka works, which is Netflix's open source service discovery tool, is that every service, when it launches, registers with service discovery. So there isn't really much of a problem with the greenfield case where you have no instances, because your first instance comes up and registers with service discovery, then your next one can find the first one, and so on. Hopefully you just reach a stable state and then you're only upgrading one set of instances at a time. Does that answer your question?

Not related to that, but essentially, if you have different environments and you're bootstrapping, what you can do is use data that's set on the box as it's created, and the other thing you can do is give them all different DNS servers. I mean, those are I guess the major things, but in terms of bootstrapping in different environments and doing that discovery you were talking about, registering with a central service, they would all see the same service in this case and then find out what they are. No, I was saying that you have a discovery service running in dev that all of the dev servers talk to. And how do you find that discovery service might be a question, and the answer is there's a known DNS entry, so yes, you run separate DNS. I mean, you could do DNS, or you could pass the arguments in when you bring up the AMI; you could put it in user data. You can also imagine you have discovery.test.samplecompany.com and discovery.prod.samplecompany.com, and when you come up in test you have something like an environment variable pre-populated, because it's the name of your AWS account, or however you name your AWS environment. You have to know what you are, and then pull on the piece of yarn after that.

Hey, I'm a database administrator, so I have a couple of reasons why this might not work for relational database environments, at least as stated. One of them is that disk formats do change. For instance, I'm a Postgres DBA, and every year or two there's a new major release that changes the on-disk format. The other is, say you're patching something like glibc or some security setting or just some configuration setting. If you rely on starting a new AMI in order to do that, then every time you make that change the database server becomes unavailable for a little while as you switch over to the new AMI, meaning the whole application has to take downtime. Even though the other things, the web servers, can kind of roll over, the database is that single point of failure. So do you have ideas about addressing that?
Not as many, because I'm not a DBA, but obviously caveats apply; this won't work everywhere. I think that if you have a master-slave environment where you can promote a slave to master, you're going to need more work to make it work right; it's not going to be an easy flip-a-switch thing. In a distributed environment like Cassandra or Mongo it's a little easier to cycle one node at a time in a large distributed cluster of nodes. Obviously with Postgres it's a little harder for me to answer that question. It's more of a... what? Yeah, more of a pet. I didn't get around to putting the pets versus cattle versus chickens argument into my slides.

Yes, so this is obviously kind of the future, or one possible future, of infrastructure, and in this future, what does regression testing of your changes to the infrastructure look like? How do you automate that to make sure that the new AMI you're rolling out doesn't have some error, some mistake that you made? There are a whole bunch of layers to the answer to that question. Obviously you're going to run standard unit testing during development, but when you're running integration testing, hopefully you have a staging environment where you can run that, and then another part of that is being able to run a canary cluster. So, say you have 100 nodes running in your production cluster: you can spin up two or three nodes in your canary cluster, direct two or three percent of the traffic to them, run that for an hour, compare the metrics over the time the canaries were active between the performance of the canary cluster and the performance of your main cluster, and you can actually, depending on how sophisticated your deployment tool is, gate your release based on the scoring that comes out of your canary. That's sort of the best way I've seen it done, but there are probably other ways you could do it. You also have that... I know, or at least I've heard, that Amazon.com has a thing where they watch a very, very precise graph of the number of dollars spent on Amazon.com per minute or per second. It's a relatively smooth line, and so when they do a release of a new build, if that line changes by more than five percent over the next 20 seconds or so, it's automatically rolled back. I guess if it goes up maybe they don't roll back, but if it goes down they definitely roll back.

Just to address the database guy's point earlier, I think he makes a good point, but I think that when the ops team isn't so totally obsessed with putting out fires, they'll probably have a little more time to help out in his case as well. But my actual question is about the immutable infrastructure case, as a security guy. Everyone talks about pets versus cattle, and operators don't care about cattle, so to speak, but you don't pop a whole application, you pop one server. So I'm a security guy in this immutable infrastructure: how can I keep an eye on things and still get the forensics out of it, and things of that nature, when we're in something like this? So you're still going to want to have all your infrastructure tools, all your IDS, all your logging, all your output. We, or Netflix, when an instance shuts down after it's been running, copies 100% of its logs off to S3 as part of its shutdown script before finally shutting down, so we're still keeping a good portion of the state of that server preserved somewhere in case we need to look at it.
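As a rough sketch of that kind of shutdown hook, with a placeholder bucket name and under the assumption that the AWS CLI is baked into the image (this is an illustration, not Netflix's actual script):

    #!/bin/sh
    # Run during shutdown: preserve this instance's logs before the server disappears.
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws s3 sync /var/log "s3://example-log-archive/${INSTANCE_ID}/" --only-show-errors

Wire something like that into the instance's shutdown sequence and the logs outlive the server.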
In fact, because it's in S3 we can run Hive queries against it; we can run large map-reduce jobs against the log files that are generated, both application logs and system logs. It's true, you don't have the whole system laying around. You do have the ability, at least with AWS, if you have an instance that's part of a large auto scaling group, to pull that instance out of the auto scaling group and keep it around for as long as you want to keep looking at it. I can't say that you can go look at a server from three months ago that's no longer there because you're doing some analysis; that server's probably gone. So you definitely have to come up with some ways to work around that.

Good talk, thank you. One of the things I'm worried about on the way to immutable infrastructure is that we're going to lose something we got with configuration management, which is a place to meet in terms of improving the quality of our infrastructure as code, but also sharing best practices: how to install and secure a database server or an Apache web server. One of the things I'm worried about is that we're going to be left on a bunch of different islands with crappy Dockerfiles, which are a shittier version of Bash, versus writing stuff in something much nicer like Python or Ruby, etc. Do you see, in this future, a way for us to share code and best practices so that we can stand on the shoulders of giants rather than on their toes? I don't have a great answer for that. I will say this: maybe if we're not spending so much time messing around with configuration management code, we can actually push best practices upstream into the packaging of these applications, so that they install with good defaults and you don't have to have Chef configure your best practices later. Beyond that, I'm not quite sure.

I'm over here. It seems like, in your mind, this is about security patching, and it sounds like in your model security patching has equal weight no matter how big or small the patch is. In the kernel example, a patch is equally painful in either case, because you'd have to patch and reboot no matter what; in your case you would rebuild and relaunch anyway, that's cool. The middle case would be something like OpenSSL, where you have to patch and restart. You'd better be; you don't just patch, you have to patch and restart. Then the minor case would be something like Shellshock, where you need to patch Bash, but it sounds like in your case you would patch and then rebuild and redeploy across your entire fleet just to patch Bash? Or, if not, do you just sort of wait for the next cycle? I was wondering about strategies for small patches, or are there no small patches and you treat everything like a big patch? I think the thing that's great about this is that because you're releasing every time you push code to your application, you're releasing a new copy of your whole operating system and everything, and you get so used to doing that. Your application teams are so used to rolling out new AMIs or new Docker images that, yes, it's the same amount of work to roll out a kernel patch and a change to Bash, but that amount of work has become routine, trivial. I was looking at some stats at Netflix; we're pushing 4,000 times a week. When you think about something like a security update where you've got a kernel change, you might grab the teams that have servers that talk to the internet directly, push that code out, and force them to rebuild and relaunch immediately.
The same thing goes for your SSL vulnerability. For something like a Bash vulnerability, you've got a situation where there's some risk, but you can put it into your standard update process and it'll roll out over the course of a week. You can make your own decision as to whether or not it's critical enough to patch that way or to force a rebuild across your whole infrastructure. Yeah, so the question, just to make sure the recording gets it, was: how do you make sure that your high-priority security patch makes it to all of your infrastructure? The answer is that we run an auditing script that looks at all of the infrastructure and will alert and notify the owners of applications if the base AMI they're running on is out of date by whatever the defined spec is. Normally that's one quarter: you shouldn't be more than a quarter out of date. But if there's a security update and we need to change that window to a week, then we can notify everyone who hasn't pushed in the past week. That way you can maintain that urgency on updates across all of your infrastructure.

Hi, just a quick comment, because I am a database geek. I would say that you can totally do databases as immutable infrastructure; you just have to be not afraid to fail over. You can do single-master databases, you just need to be not afraid to fail over. And I'd argue the benefits of doing it with database systems are, if anything, greater than the benefits for application code. Cool, well, you two get together afterwards and hash it out over beer.

I might be misreading you, but I feel like you're sort of equating AMIs and Docker images somewhat. I am, yes. And I feel like they're kind of different paradigms, and I mean, I have nothing against Dockerfiles, I think Docker is the greatest thing since sliced bread, but they are kind of different paradigms, different ways of looking at systems, right? Containers versus systems. I mean, there was a whole talk on containers versus VMs yesterday, I think. There's definitely a difference between containers and VMs. The argument I was making, and I'll let you finish your question, is that in the world of immutable infrastructure you can think of them very similarly, in that you manage a collection of VMs or you manage a collection of running Docker containers. I've also seen some people, who struggle with how Amazon runs things, run one Docker image per VM, so it works that way too. No, but I guess, let's say you were starting fresh with a project and you're thinking, well, should I Dockerize this or should I AMI this? What would be the considerations for that initial decision about how to create this immutable infrastructure? I think today, if you're doing a greenfield deployment, I would start with Docker, but that's just sort of an instinctual thing; I don't think I have a sound rational argument for why I would do that. I think there's a good argument to be made for either one, and you have to think about your own use case. Let's put that outside the topic of this talk.

So I had one more question about that. I want to ask how immutable a given system would be. Say I spawn a virtual machine and it runs my thing. Is the user read-only? Is it actually read-only? How far down that rabbit hole would you recommend going?
So Mark Burgess from CFEngine has this argument that nothing's really immutable, because as soon as you turn it on and bits are moving, it's been mutated. But I think, right, you're still going to have log files, you're still going to have parts of your system that change as part of operation. The idea behind immutable is more this golden image idea: you have a very, very well-known starting place for every instance that launches, you don't ssh into that machine and change stuff on it while it's running, and you don't run configuration management on it. Sure, your application is probably going to write some log files and maybe use some files to keep state and things like that, but fundamentally you're not, as a user, changing the system. When you're ready to upgrade the application, you destroy the system and start over again, as opposed to upgrading an existing system in place. So we can argue about the definition of immutable.

Have you ever seen this pattern, or suggestions for doing this, with things like developer workstations, to keep developers on the same versions of things? Obviously you're not rebuilding Macs every night or anything, and they have their own cruft and tools, and usually that's the configuration management thing, or you have some other way to package it. I know Ticketmaster just gave a talk saying they're packaging their tools in Docker containers and shipping those out, but that may or may not be the right solution either. So, I really like the idea that developers do a lot of their development on VMs that are in the cloud, VMs you destroy and create as you need them, rather than doing that on your local workstation, just because then you're starting from the same clean slate every time. Obviously that requires your IDE to be capable of working like that and requires developers to adopt a few different patterns, but it also means your developers don't need crazy-ass high-powered workstations; they can essentially just use a terminal. So I certainly like that methodology, and it's how I do a lot of my work, but I could see the arguments for pretty much anything.

Okay, I just wanted to be clear about one or two things. It kind of relates to the definition of immutable, so it's probably because I'm an idiot, but it might help others. It doesn't mean, if you have an application that needs to dynamically create iptables rules, that things can't be changed, right? It's not that things on the instance never change. But the idea is that you're not upgrading the instance. And then the second thing I want to walk out of here knowing your opinion on: say I have ntp.conf or whatever, and I want to change one thing in it, like iburst or whatever. The philosophy here is to make a new image and reprovision everything, right? Yeah. Now, hopefully NTP isn't something that's an urgent change, right? You build that into your base AMI and it's going to roll out over time. Most of your applications will pick it up within a week, because most developers push more than once a week, but everybody will get it eventually. Yeah, and I hate to split hairs about this part, but you have to think about the promise of configuration management, right? Things being out of sync. Yeah. It's not that this addresses that or makes it go away, right? It's just that, well, we already have that problem, so it doesn't matter if we still have it, right? The out of sync.
There are two ways of thinking about that. Yes, the way you said it, but also, most frequently my issue is not so much that my application tier and my web tier are out of sync with respect to their ntp.conf file; I'm more worried that all of my web servers are the same and all of my application servers are the same, and this gives you that promise. Got you.

So, you know, we were talking about rebuilding an AMI every single time. Do you offload that to a team that installs the new packages every time? So you have an automated process that builds your base? You're talking about the base AMI? Right, yeah. So the base AMI is kicked off by an automated process that runs on a weekly basis in general, and it can also be kicked off by a forced update process some other way. At Netflix, the performance engineering team is responsible for that image. I could see an infrastructure team being responsible for that image; you have someone who's responsible for building that image. And however you want to build it, you can either start from yesterday's image and just run apt-get update on it, or you can start from scratch: download the vendor's base image, run updates, and then install your infrastructure packages. Like I said in my talk, that could be done with a configuration management tool, but again, we're down to maybe five people in your company at most ever needing to deal with that tool, and it only runs in the sandbox that is your build pipeline for your base AMI. So it still makes sense to have a reproducible way to build that image, whether it's an AMI or a VM image or whatever, right? Yeah, you definitely want to have a process that builds your base image that's completely repeatable.

What I wanted to mention is that I agree immutable infrastructures have a lot of advantages; at the same time, coming back to configuration management, one of the things we are losing with immutable infrastructure is that, if you look at configuration management, there's a lot of know-how in the modules: for example Chef recipes, Puppet manifests, etc. To give you an example I've been using for many years, the nginx and Apache cookbooks from Chef: there are recipes that in the end produce a configuration file, which you'd have in Docker, in the Docker image, but around it you also have a lot of resources which help you. It's not only the configuration file; you also configure, for example, your modules, whatever, and that's something you lose in a Dockerfile. Because in a Dockerfile, at the end, you just have a Dockerfile which hopefully has some sane values in it, and if not, you have to go back and rebuild it by yourself. With a configuration management tool, you have all of this kind of intelligence in those configuration management scripts, and that's something we're losing. So what I'm kind of proposing is that that's something we should add for Docker as well.
Yeah, I think, I mean, that's back to the question of whether we're losing a bunch of knowledge that we had in configuration management as we move to a new paradigm, and I think there's an argument to be made for publishing. Jessie Frazelle has this whole repo of great Dockerfiles for building desktop applications in Docker. You're going to want to publish your Dockerfiles so other people can learn from how you do your Docker. Just like you can publish your Chef recipes, you can publish your Dockerfiles, and other people can look at them and learn from them. I think that's a perfectly reasonable thing. Yeah, but I mean, the Dockerfile only gives you the binary artifact out of it, and then you have some possibilities to pass parameters in, while with configuration management, if you pass in a certain set of parameters, you're already... I think the Docker build file... The Docker build file is something you could publish. Now, I was saying earlier I don't think that Docker build file should have much in it, but if you want to share some of that knowledge, maybe that's a way to do it. I don't have an answer to that particular problem, though. So that's kind of a gap which for me is missing, and which could technically be fixed; there would be nothing stopping us from having a DSL that runs as a kind of pre-processor to produce it. I think we're almost out of time, so let's take this whole conversation to the bar, and let's have maybe one more question.

How do you recommend handling cases where the application needs secrets that you don't want to bundle into the image, like passwords or certificates? Yeah, don't bundle secrets into your application. You should have some way of doing secret discovery and distribution in your infrastructure. That's a whole separate talk that I could go and do, but I know that HashiCorp has an application for doing that, and Netflix has some open source stuff coming out soon that will help with it. There's a huge world of... yeah, that's a whole other talk, but yes, don't bundle your secrets into your Dockerfiles or your application. Great, thank you, John. It was a great talk.

Hey, hey everybody, good afternoon. Last session of the day. I want to introduce Ben Kero today. He's going to be talking about the dark arts of SSH. He came here from Portland. Give him a warm welcome. Thanks, Matthew. Okay, the mic's working, good. Yeah, this is the dark arts of SSH, just to make sure you're in the right place. Thanks for sticking around for the last session. So, where are we? This is SSH. OpenSSH actually has a logo: it's the blowfish, and here it is. Not many people get to see this thing, but some artist spent a lot of time on it and I think it's kind of cool. So we're going to talk about what SSH is, how SSH is used, some of the more advanced topics, and some things from the long tail of applications and libraries that use it. Yeah, things we're going to talk about: libraries, X forwarding, some really cool things to do that are a little better than X forwarding. But first and foremost, when you learn about SSH, the first thing you learn is that you can use it to connect to a remote machine, get a command line there, and type commands. The second thing you learn is these things: keys.
When you log in, you get kind of sick of typing passwords, so you want to be able to log in without typing a password if you're doing it a lot, like a lot of sysadmins are. You can think of each key as a unique identity. There are two parts: there's the key, like you hold here or can see here, and there are the keyholes, which would be the servers you could log into. So there's a public part and a private part, and a key can have a passphrase or no passphrase. And if you're thinking, okay, if I'm sick of typing my password, why would it help to give my key a passphrase? Aren't I going to have to type that as well? Well, yes, except if you use this thing called an agent, which I'm going to talk about in a little bit. But first, I'm going to talk about creating some keys. This thing right here is an old-style key maker; these things are really cool. So that's what we're going to do now.

And I apologize, there are some code blocks in this talk. I can try to make these bigger for folks in the back. Let's see, can we see this? What happens if I increase the size? Is it going to throw my presentation all the way off? Yeah, that looks okay; let's see if it screws up the slides later. What we can see here is we run this command called ssh-keygen, and it asks us a couple of questions, like what file name we want to give the key we create, and whether we want it to have a passphrase. Then you press enter again, because most people don't put a passphrase on it, and it says: hooray, you've created a key, I'm going to put it at these file names. Then there's one last piece that it prints out, and this is a relatively new feature of SSH, and by relatively new I mean in the last five years or so: art. It's a beautiful piece of ASCII art that's unique to this key, and if you don't know the command to find it, it's kind of tricky to get it to print again. So when you generate the key it shows you this little piece of ASCII art, and if you want to see it again there's a command down at the bottom: ssh-keygen, and you pass it that flag and the file name and it shows you the art again. This can be useful because no one really wants to read the fingerprint printed up top there, and the OpenSSH authors knew this, so they wanted to create a representation that might be a little more useful and easier for humans to compare.

So let's go over the two keys. We have our private key, which pretty much looks like this. If you just look in the file, there's a header at the beginning that says BEGIN, then all the key material, and then a footer that says END. If you've ever used GPG, you've probably seen a lot of files that look exactly like this. And if you have a key with a passphrase it'll look a little bit different: you'll have these two extra lines that I've highlighted up there that give a little more information about it. And that's about all there is to a private key. Let's look at a public key now. This looks a little bit different. For OpenSSH there are actually two styles of public keys: there's the regular PEM style, and there's the OpenSSH style, which is what we're covering here. If you honestly care, there's an RFC that defines this whole thing. The format is: key type, space, the key material, space, and then a comment. Usually the comment is something like username@host, or something about where you generated the key. And ssh-rsa is the most common key type.
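For reference, a short sketch of those two commands; the key type, size, and comment are just illustrative choices, and the file name is the default:

    ssh-keygen -t rsa -b 4096 -C "you@example.com"   # prompts for a file name and an optional passphrase
    ssh-keygen -lv -f ~/.ssh/id_rsa.pub              # -l prints the fingerprint, -v adds the randomart image

The second command is the one that gets the ASCII art back after the fact.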
ssh-rsa means RSA-style keys; you can have DSA-style keys as well. One interesting tidbit is that the middle section, the key material, is in a special encoding called base64, so it's a lot easier to transport over the network. And if you send it to the base64 command with the -d flag to decode it, the key type is actually in there already, so the first field is kind of redundant.

So how do we use these things? We place them in a file inside the .ssh directory called authorized_keys. That's the manual way to do it. SSH comes with a command called ssh-copy-id; you pass it -i, then the key name and where you want to send it, and it basically does what you see in the first block of code up there. One of the cool things you can do: there's a service called termbin.com, which is an open source service, and you can echo things to it from the command line and it acts as a pastebin; it pastes it for you. So in this case we're echoing our SSH public key and telling termbin.com: please post this for me. It hands you back a URL that you can load up in your browser if you want to send your public key to somebody. This is one of the easiest ways to send public keys if, for example, you need to do it over Twitter or something else that limits your length.

So let's take a look at an authorized_keys file real quick. This is the anatomy of an authorized_keys file. You can see in the first block there are two keys: there's my personal key, Ponderosa, and there's my work key for Mozilla. Those are very simple; it's just one key per line. Optionally, there's a bunch of options you can associate with a key, and I'm going to go into those a little bit on the next slide, but we can see here that if we want to restrict someone to only playing NetHack, we can do that very easily with authorized_keys. That means when they log in, they go straight into a NetHack game. No shell. These are the options you can use: there's a command option; there's an environment option, if you want to set your shell prompt to a table flip; you can use the from option to allow only certain source addresses in; and you can restrict other things with this as well. Interestingly, there's actually an rc file that gets run whenever you log in. I'd never heard of this until about a week ago. It's .ssh/rc, and that file just gets executed whenever you log in, so if you want to do some kind of setup, you can do it from there.

Next section. Can anyone tell me who this is? Any Twin Peaks fans in the room? It's Special Agent Dale Cooper from the TV show Twin Peaks. I wanted to use this as an example to talk about agents. Agents are processes that run on your machine and handle the registration of SSH keys for you. Examples of this are the ssh-agent that comes with OpenSSH; OS X has one built in called Keychain; and GNOME has GNOME Keyring, which does the same thing. The OS X and GNOME ones actually have pretty GUIs, and you can manage passwords in there, and certificates, and all sorts of wonderful things. But for the purposes of this presentation I'm just going to show you the hacky little agent usage that I use, and that a lot of other people use if they don't want one of the more robust solutions. With this, you just run eval and then ssh-agent, and it starts the agent in the background and prints out a couple of environment variables for you. You just make sure these are set whenever you want to use them, and then it's running on your behalf.
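A quick sketch of that hacky usage, using the plain OpenSSH agent rather than anything platform-specific:

    eval "$(ssh-agent -s)"   # starts the agent and exports SSH_AUTH_SOCK and SSH_AGENT_PID
    env | grep '^SSH_'       # the environment variables that later ssh invocations look for

Those two variables are the whole trick: any ssh you start from that shell can find the agent.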
Whenever you try to start a new SSH connection, it knows to look for these environment variables and uses them to talk to the agent before it asks you for a password. By default, when you start the agent there are no keys in it, so you have to add one, and you do that by typing ssh-add and handing it a key. If the key has a passphrase, you type the passphrase, and then it stores the unlocked key in memory for the whole life of the agent. You can set timeouts on this, and it's a best practice to do so.

Let's move on to a little more configuration. You can read about all of this in the man pages for ssh and ssh_config. There's a file in .ssh called config, and it has a couple of kinds of blocks in it. The two kinds are Host and Match. In a Host block you give it a nickname, so mine is irc, and this is the server I run IRC from, and you can give it a couple of options, like what host name I want to connect to and what port. You can set up port forwards and other things in here, and you can also chain these together. With Match, you can glob host names together, so I know that whenever I have to log into a Mozilla box my username is always going to be bkero, so instead of having to type that out all the time I can just stick it in a Match block here.

So let's say we've connected to my host and now I'm having some really bad problems with a shaky network connection. There are a couple of things you can do to try to keep your session alive and make your host more aware of when your network is being shaky; maybe it should retry a couple of times, maybe you want it to fail right away. The first of those options is TCPKeepAlive, and this is basically your host sending little network pings to the remote host to make sure the connection is alive. If it doesn't get a response, and there's a timeout there of maybe 60 seconds, it'll just hang there, and after 60 seconds go by it kills the connection. This has a problem: because it's done at the TCP level, an attacker could spoof it and keep your connection alive for longer, and that might be something you run into. So SSH has a built-in alternative called ServerAliveInterval, which does the same thing, but because it's not done with TCP and goes through the encrypted channel, an attacker can't just send keep-alives for you. Additionally, there's another little tool called autossh, and it's a wonderful thing in that it takes the exact same arguments as ssh, but when the connection dies for some reason it just starts it right back up again. The best way to use it is to type out the ssh command you want, and then, to make it a little more robust with restart behavior, to make it enterprise software, you just replace ssh with autossh and it does what you'd expect.

But let's say your network connection is even more unstable. We have another thing called Mosh. This used to be a relatively new project, but now it's much more commonly used by developers and by people who have to SSH in from coffee shops, which might have a connection that you're sharing with other people and that just doesn't have much bandwidth. You can use it like SSH, except it has a couple of limitations, and there are a couple of kind of scary things about it.
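Here is a hedged sketch of what such a config file can look like; the host name, port, and user are placeholders standing in for the ones on the slide:

    # ~/.ssh/config
    Host irc
        HostName irc.example.com
        Port 2222
        ServerAliveInterval 60    # keep-alives sent through the encrypted channel, not raw TCP
        ServerAliveCountMax 3     # give up after three missed replies

    Match host *.mozilla.com
        User bkero

With that in place, "ssh irc" picks up the full host name, port, and keep-alive settings automatically.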
So in the first code block on this slide, instead of ssh we're just typing mosh, and we're going to host.example.com, and then you log in. Let's say you close your laptop and get on an airplane, and you log in from another coffee shop afterwards; when you open your laptop again you'll still be connected to the host. After it does the initial handshake it just uses UDP, so if your source address changes it can roll with that and resume your connection to the other side, which was a really cool feature when it came out, and I'm happy it caught on and is maturing. It does have some scary parts. If you look at this bottom code block here, we're finding the Mosh process and then looking at its environment variables, and if you look at the bottom highlighted line, that is key material, private key material, that you can just pull out of an environment variable, which is kind of scary, but as long as your user never gets compromised it won't be an issue. Mosh has a couple of restrictions that I mentioned earlier, and one of them is that SSH can have arbitrary data channels, which is what lets you do port forwarding, X forwarding, and cool things like that; Mosh is just a single data channel, so it doesn't support any of that. But if you don't need that, Mosh can give you some cool features like local echoing, so if you have a momentary lapse in your connection you can still see what you're typing, which is really useful for things like IRC on a shaky connection.

Another cool thing you might not have seen: SSH has a little command line built into it. So if you type tilde dot... no, no, tilde dot will kill your connection, don't type that one unless you want to. Tilde question mark is this whole thing out here, and it allows you to do things like background SSH, make a new tunnel, redraw the screen, print the help message, and generally administer your connection through this little interface. When I first saw this I thought it was really cool, because it's like I found a secret command line I never knew existed. Tilde dot is the most useful thing in here: if you have a hung SSH connection, you don't have to close the window, you can just type tilde dot and it dumps you back at your local command line.

Speaking of the arbitrary data channels, though, I want to talk about tunneling for a little bit. What do I have up here first? Yes, forward, that's forward port forwarding. Forward port forwarding is when you're SSHing to a remote host and there's something near the remote host, or on the remote host, that you want to get access to. In this example we have an internal wiki at Mozilla, and I want to be able to get access to it without dealing with VPN software or anything like that. What I can do is say ssh -L, and the localhost part is implied there, but you can replace it with something else if you want, and it says: I want my local port 8000 connected to intranet.mozilla.org, the wiki; I want my port 8000 to be port 443 there; and I want to get there by going through the host ssh.mozilla.com. You hit enter, and as long as that SSH session is alive, you can open your browser, go to localhost port 8000, and you'll be presented with the thing on the other side. Reverse forwards are the exact opposite of that. Reverse forwards I'm going to use in a little trick right after this.
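Before the reverse-forward trick, here's that local forward written out as a command, using the same host names as in the example:

    ssh -L 8000:intranet.mozilla.org:443 ssh.mozilla.com
    # while that session is open, https://localhost:8000 reaches the internal wiki

The implied localhost bind means only your own machine can use the forwarded port unless you spell out a different bind address.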
Reverse forwards are really cool: if I have a port on my laptop and I want to expose it to the internet, I obviously don't have a public IP address here, but I have one on my colocated server, so why don't I just SSH in and use a port there to give everybody access to something I want to share? You use almost the same syntax for it: that's the outside port, and 8000 would be the port on my laptop. Dynamic forwards are a SOCKS proxy, which is a way of transparently tunneling traffic. Instead of having an HTTP proxy that's specific to HTTP or something like that, a SOCKS proxy can deal with all of the TCP traffic on your host. You start it by passing -D and then a port number; it's usually 1080 if you find the tutorials on the internet. That will just sit there and open up port 1080 on your laptop, and any traffic you tell to go through there will come out the other side of the SSH connection.

One last thing I wanted to talk about here is sshuttle, which unfortunately got cut off, but this is a really cool project by Avery. Is there a scroll bar? I would love to scroll down. Oh, that's unfair. Tell you what, I'm just going to go to the GitHub page. There we go, cool. It's a proxy, but it's done at the iptables layer, your system firewall layer. How you use it is... oh god, I can't see that, I'm going to go over here for a second. Yeah, wow, that says absolutely nothing, okay. You git clone it, it has a binary in there called sshuttle, and you tell it what host to connect to and whether you want DNS to be forwarded or not, and then it sets up the iptables rules and transparently tunnels all of the system's traffic through it. This is a really cool option if you don't want to end up mucking with SOCKS proxies, especially if you're turning them on or off a lot. It works on OS X, it works on Linux, and when you're done with it you just hit Ctrl-C to kill it and the firewall rules go away and you're back to your regular system. It's a bit more of a transparent kind of proxying.

I wrote a little code block here for it. A couple of things here: I already told you about ssh -D and then the port. There's a program called tsocks, which is a little shim library on your system, and it has a config file at /etc/tsocks.conf; you just tell it where your SOCKS server is, and then it intercepts all of the network system calls and runs them through the SOCKS proxy first. There's this website, icanhazip.com, and if you go there it just shows your IP address; it's very useful for testing or scripts or something else. So in this case we're doing that first, and we get our local address; then we do ssh -D, and then if we do it again at the very bottom, instead of getting your local address, you're going to get the address of your SOCKS proxy. Shout out to Rackspace for hosting this little service; I'm glad it's up there for all my scripts, and I'm going to be really screwed if they take it down.

One more example with the tunneling thing, a sneaky tunneling trick. It turns out that Python has this feature where if you pass -m and a module name, it will run the code associated with that module, and in Python there is a really cool module called SimpleHTTPServer, and it does exactly what you expect. So, oh god, alright, I'm going to go into my code directory and I'm going to run python -m SimpleHTTPServer, and give it a second here... there we go, it's listening on port 8000. So if I just go to localhost on port 8000, it gives me my directory listing, and this is really useful if you want to share local files with someone over the network. But it's even more useful if you combine it with that SSH trick I showed you earlier. So if I do ssh -R, with a bind address that says listen on all interfaces for this traffic, then port 5000, then localhost, because I'm running it on my laptop, and port 8000, and then tell it which host to publish it through, then as soon as I log in it starts sharing. So let me go back here, and I'll just go to bke.ro, which is my domain, on port 5000, and suddenly I've just shared the code directory on my laptop on the public internet. This is really useful if you have some files that you need to give a vendor but they're behind a firewall and you totally can't give them access; you can do something like this to give them a little bit of access and make it easy for them to go see it.

Whoa, there we go. Another cool trick is X forwarding. This is kind of a classic X trick, a classic SSH trick, that they added support for a long time ago. You just pass the -X option and it will start an X session and forward all of the windows to you. Because X was written a long time ago, it makes a lot of synchronous calls, so it's kind of slow, and it has this huge security model, and if you don't want to deal with that for performance reasons you can pass ssh -XY instead. I have a server called s, and I'm just going to run bash on it. So now if I just type xterm, and we wait a really long time for this internet connection, I should get an xterm window to pop up. The -Y option is the insecure mode; there we go, it does fewer round trips. Now I have a little xterm from my remote server. It's fine to do this for xterm, but you can also do something like Firefox, or a music player, or something else. Firefox makes a huge number of calls, so it's really slow to start up; we're definitely not going to wait for that. There is one more tool, let me see, one more tool called Xpra. Firefox, thanks buddy. Wow, so going to cnn.com or something is probably going to take a minute or more. Here's something that's a little more useful than this, because when I Ctrl-C out of this it's just going to kill Firefox, but what if we wanted it to stick around for a while? There's another utility out there called Xpra, which is exactly what it sounds like: you SSH somewhere and you start an X session, and then you can attach and detach from it just like a console screen session, and you do that with the syntax here. If you're interested in that tool you can go to xpra.org; it's available for Windows, Mac, and Linux, and for Linux they actually have repositories for all the distributions too.

Let's move on to multiplexing connections. One of the cool features SSH has is that, say I have a big host that I SSH to all the time: instead of maintaining a hundred TCP connections to it, I can maintain one connection, because SSH supports multiple data channels over a single connection, so why don't we just use those? The way you use this is a couple of options that you set in the .ssh/config file. One is called ControlMaster, and that just turns the feature on or off, and the other is ControlPath, which is the file system path to the socket file it uses to communicate. The way you turn this on is you add the two lines, ControlMaster yes and ControlPath, and point it somewhere in your home directory, and then no matter how many times you SSH to that host, they'll all just use the same connection. Yes? Okay, is that a hard limit, or... yeah. It's really cool, but it has kind of a bug in it.
Sometimes when you exit, the connections will just hang there, and if you Ctrl-C the master it kills all of your other connections. I turned this on once as a performance optimization for some server scripts and we ran into a bunch of hung connections, and it turned out to be that. Yeah, let me see... yeah, so I put a little timing thing under there so you can see that when I SSH the first time it takes a couple of seconds to spin up, but the second time is almost instant.

Alright, moving on to some more things. There are some libraries that are really useful for this. Paramiko is one of them; it's a library for embedding SSH in your applications. This is really useful if you're making a web server or something like that and you want it to be able to speak SSH. So if you're making, say, a GitHub clone or something like that, and you have your users upload an SSH key and you want to be able to use that somehow, you can use Paramiko to do that from your application. Twisted is kind of the same way, but that's more for writing custom servers. So if you want to write a custom server that has some weird behavior, like forcing everyone to play NetHack, or not asking for a username at all, or parsing the username into something: back where I worked we had a switch server, a console server, so you would SSH into it with your username plus a port number and it would figure out which computer to connect you to. This is a piece of hardware that sits in a server cabinet and has a bunch of monitor and keyboard and mouse cables going into each server, so I'd SSH in there with the username bkero+8 and it would connect me to server 8. Regular SSH can't do this, so if you need some behavior like that, Twisted Conch specifically makes it fun and easy to do yourself. Unfortunately I don't have my demo working right now, so I apologize. You probably can do that... do they support SSL-like handshakes? They might; to do that you can use the openssl command line and s_server, which gives you kind of a basic input/output that you'd have to write another program to drive. Yeah, so it's an interesting question; I don't know the answer, I'm sorry.

So, let me see, some best practices for the server. PermitRootLogin no: by default on some distributions you can still log in as the root user over SSH. The general recommendation is to create a non-root user, log in as that, and then use su or sudo, so set that up. GatewayPorts: this stops users from binding ports onto non-loopback interfaces, so the trick that I did with my file sharing, where I exposed it to the internet, I wouldn't be able to do; I could only expose it to localhost, so other people on that computer could look at it, but not the internet in general. This is useful to leave disabled unless you actually have a legitimate reason to turn it on. Disabling passwords entirely: most of the SSH servers that you see exposed to the internet are going to do something like this, simply because passwords are a huge target. If you look at the failed logins for SSH servers, it's mostly just spam; people are brute forcing these servers trying to find a password with a common username, and sometimes they succeed, but if you turn off password authentication then they can't even start. Another feature that got added a little while ago is called AuthorizedKeysCommand. Instead of having an authorized_keys file, the server can specify a script to run, and the script will return all of the public keys that are appropriate for that user.
So if we have this enabled on the server and I try to SSH in, the server will run whatever command I specify here and pass it my username, and the script is expected to go look in LDAP, or curl something, or look in SQLite or something like that, and return a list of public keys that are acceptable for me.

Yeah, so in review: SSH is really powerful, it pays to secure it, and it's one of our best and most mature tools for doing remote administration. Using SOCKS proxies, sshuttle, and tsocks gives us a lot of network agility; it lets us forget where we are, like if I'm hidden behind NAT, and it lets us maintain a public presence on the internet. And lastly, little tricks like python -m SimpleHTTPServer are a great way to provide access to vendors or anybody else we might need to share something with. Alright, that's all, thank you. We have some time for questions, if there are any questions.

You noted using SSH with AWS and all these utilities; is there one place to learn more about them? As I'm sure you've probably heard a lot before, the Arch wiki is one such place; the Arch Linux wiki, even if you're not running Arch Linux, has a lot of user tricks and things like that in it. You talked a little bit about ssh-agent, but you didn't talk about the tricks for allowing it to follow you over multiple hops. Oh yeah, that is an excellent point, I apologize; I think that was in the original material but it didn't end up in here. If you pass ssh the -A flag, your agent will follow you into all of your subsequent hosts, which can be a great thing, but it can also be scary, because anybody else who has compromised one of those hosts can then make authentication requests on your behalf. I know that the security ops people at a company I used to work for would warn people that if they're going to SSH into a compromised host, they should be sure to disable agent forwarding before they get there. Yeah, thank you for that. Are your slides available online somewhere? Yes, you should go to bke.ro; everything is on there, that's my blog, a WordPress instance thing, and right after this talk I'm going to post these. Any more questions? Alright, let's thank the speaker for a great talk.