Thank you all for coming. I know it's the last session of the afternoon, and usually by the last session of the afternoon, I want to be doing something other than sitting in a session. So I'm really grateful that you've all come. This session is Containerizing Rails. My name is Daniel Azuma, and yeah, we'll get started. Basically, during this hour we're going to be talking about container best practices. We'll talk about how to wrap your Rails application in Docker containers, how to optimize your containers for size and for performance, and what should and should not be in your containers. Basically, we'll talk about how to make your Rails application work well in a container-based production system. We're going to start off with a little bit of background on Docker. I'm not going to make assumptions about how much experience you've had with Docker or Kubernetes or related technologies, so you should be able to follow this even if you don't have a lot of experience. That said, this is not meant to be a first-time beginner tutorial. I have to apologize for this: I know that the program says there's going to be a walkthrough for deploying an application to Kubernetes, and I ended up cutting the walkthrough for time, and I didn't get a chance to update the program. There are good tutorials on Kubernetes that you can find online. What we're going to focus on here today is best practices: tips for making better containers. So that's pretty much what we'll do. Let's start off with just a little bit about me, your speaker. My name is Daniel Azuma. I've been developing professionally for about 20 years or so, and about 12 of those years have been spent with Ruby and Ruby on Rails. I remember going to my first RailsConf in 2007. Was anyone here at RailsConf 2007 or before? A few. OK, that's actually really good.
I was hoping to see that most of you are newer than that, because that really means a community that continues to grow and continues to evolve. And so I'm glad to see that. So welcome. I've been doing a variety of things in Ruby over the past 12 years. Did some geospatial work. Did a bunch of startups. Currently, I'm part of a little boutique West Coast startup. At Google, I'm part of a team that is focused on programming language infrastructure. We call it Languages, Optimizations, and Libraries, or LOL for short, as you could probably imagine. It's a team that, again, focuses on making language infrastructure better, both internally at Google as well as externally for customers, for you folks, working on Google Cloud. I'm part of the team that handles Ruby and Elixir, so I'm just really excited to work on stuff that helps you, people who are using languages that I love. So, glad to be here. This is a sponsored talk, and so I do want to thank Google Cloud Platform for enabling me to be here and to share some of our ideas on containers and what makes them tick. So let's get started, and we'll get started with some basic concepts, again, just to make sure we're all on the same page. First, containers. What is a container? I think if you talk about containers, there's one word that should come to mind when containers are mentioned, and that is isolation. Containers are about isolation. In particular, take Docker, for example. A Docker container is basically an isolated file system, slice of the CPU, slice of the memory, network stack, user space, and process space. And again, all of this is isolated: it's set up so that the way the inside of the container interacts with the outside is very controlled. Now, at first glance, this might look kind of like a virtual machine, kind of like a VM. But it's not. This is not a VM. There's no hypervisor involved here. Containers are actually a feature of Linux.
So containers live inside a Linux machine. They share the Linux kernel and the CPU architecture with anything else that's running in Linux, and that includes other processes in Linux, and it includes other containers. So you can run multiple containers on a Linux machine, and again, they are isolated from each other. So, really basic, that's containers. The next concept that we need to cover is images. What is an image? An image is basically a template for creating a container. Most of that consists of a file system snapshot. So imagine the files and the directories that you need to start your container: the files that comprise your application, the files that comprise your operating system, any dependencies of your application. All of that lives in the image. In Docker, you create an image using a recipe called a Dockerfile. This is basically what a simple Dockerfile might look like for a Rails app. It's somewhat simplified, so there aren't a lot of best practices here, but it's here so you can see what the different parts are. The first line here is what's known as the base image. This is the starting point for a Dockerfile. It's another Docker image, so again, it has files, and it has all the information you would need to start a container. There are different kinds of base images. This one in particular refers to the official Ruby image, which includes an operating system and an installation of Ruby. From that base image, there are commands that you run in a Dockerfile. Some of those commands do things like copy files into the image. So you might copy your Rails application, for example, into the image. You can also run shell commands to do things in the Dockerfile. For example, you can run bundle install to install your gem dependencies, to install your bundle in the image. You can also set different properties of the image.
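Since the slide itself isn't reproduced in this transcript, here is a minimal sketch of the kind of simplified Dockerfile being described. The Ruby version tag and the paths are illustrative assumptions, not the speaker's exact slide, and as noted above this deliberately omits the best practices covered later:

```dockerfile
# Base image: the official Ruby image (an OS plus a Ruby installation)
FROM ruby:2.5

# Copy the Rails application into the image
COPY . /app
WORKDIR /app

# Run a shell command to install the gem dependencies
RUN bundle install

# A property of the image: the default command used to start a container
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]
```

Each of these four parts (base image, file copy, shell command, image property) corresponds to a part of the Dockerfile walkthrough above.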
And this particular property is the command that is run by default to start a container based on this image. So those are the different parts of a Dockerfile. Again, these are basically commands that are run in order by Docker when you build an image from this file. Now we're going to start by looking at that first line, the base image. When you go through a getting-started tutorial for Docker, you'll often be directed to use some specific base image, and there are a variety of those base images. One example is, as we saw, the official Ruby image. There are different variants of that image using different operating systems, different versions of Ruby. There are lower-level images that just install an operating system for you, so you might have an Alpine Linux image or a Debian or an Ubuntu image. There are also a variety of third-party images, some good ones from Phusion, for example. I'm not going to advocate for a specific base image here. Each one has its own goals, its own philosophy behind its design. Rather, what I'm going to do is talk about the things that go into an image, what makes an image effective. And so these are tips that you can use if you're creating a base image yourself, or if you're just creating an application image based on someone else's image. But that does bring us to our first pro tip, which is that it's important to understand what's going on in any base image that you use. Don't treat a base image like a black box. It's very tempting to do that, but in order to make effective use of a base image, it's actually very important to understand what it's doing and why. So read your base image's Dockerfile. These Dockerfiles generally are not that long, they take probably just a few minutes to read, and most of them are available on GitHub.
So take a look at your base image's Dockerfile, get to know what operating system it installs, how it sets up the environment, and whether that really matches how your application wants to be set up. You can also learn a lot of good Docker practices by reading other people's Dockerfiles. So, a good practice: read base images. And as you get familiar with Dockerfiles, you'll notice that there are some properties that a lot of good Dockerfiles have, and one of the important ones is size. Size matters a lot with images. It matters at runtime, in terms of how many resources your image, your application, uses. But it also matters at build time and at deployment time, because if your image is large, you have to upload and download it to and from your CI system and into your production system. Maybe you're building things locally. However you do it, some of these images can be fairly large, and it's good to try to minimize that size. And there are actually a lot of techniques that you'll see out there for optimizing the size of your image. So let's take a look at a few of those, because they're really interesting. One of the most common things that you'll do in a Dockerfile is install stuff. You're running Rails, and it'll probably use certain libraries. Maybe you'll use libyaml. You might use libxml. Heaven forbid, you might use ImageMagick. There are various things that you'll end up using. If you're on Debian or Ubuntu, the tool that you use to install them is apt-get. One thing to know about apt-get is that it's not really designed for Dockerfiles; I mean, it's been around for a lot longer than that. And it actually leaves a bunch of files in your file system. It downloads package lists, package archives, certain manifest files. These files are not necessary for your application at runtime, so it's a good idea to get rid of them after you're done installing.
So oftentimes when you look at a Dockerfile, you'll see something that looks like this: a line of the Dockerfile saying, OK, let's go ahead and delete all those temporary files, all those cached files that apt-get uses. And this is good, this is important to do, but it's also important to do it correctly. For example, don't do it like this. Don't update, install, and then clean up in a series of separate RUN commands. Instead, do it like this: combine those steps in a single RUN command. That's very important, and the reason is because of how Docker images are represented. A Docker image is represented as a series of layers. So imagine you have a base image that you've used, and then in your Dockerfile you have that series of commands. Each major command that you run adds a diff on top of those layers. So as you run this series of Docker commands, you get a series of diffs, and the image is that entire set of layers. So, for example, if you run apt-get update in one command in your Dockerfile, it will download a bunch of package lists from the Debian repositories. Now those are in your image, in that diff, in that layer. If you continue to run additional commands, they'll add additional layers. Later, if you run apt-get clean, that will remove those files in its own layer. But those files are already part of your image; they were added in an earlier layer. So you actually haven't gained anything here. The image comprises the entire set of diffs from every command that you run in the Dockerfile. So it's important, again, to do it like this: apt-get update, install, and clean, all in the same RUN command. What that does is make sure that those temporary files that apt-get update installs get removed before that layer gets finalized, so they never appear in a layer. If you go out and look for Dockerfile best practices, this is one of the key ones that you'll see a lot.
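As a sketch of the combined form being described (the package names here are just examples, not anything the speaker prescribed):

```dockerfile
# Update, install, and clean up in a single RUN command, so the
# package lists and caches never get committed into any layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libyaml-dev libxml2-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
```

The key point is that all three steps share one RUN, so the files that apt-get update downloads are gone before Docker finalizes the layer.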
A lot of people will talk about minimizing the number of layers as well, and that's also important. But I think it's more critically important to understand what's being done in each layer. If you install something in a layer, it's there. It's part of your image at that point, and it has to be downloaded whenever that image is pulled. So, very important. So, next pro tip: again, combine your installations with cleanup. That's how it looks with apt-get. If you're building from source, installing something from source, for example: download the source, configure and make install, and then delete your source directory, all in the same RUN command, so that the source files, which you don't need at runtime, don't end up in a layer. Similarly, Alpine Linux. This is a great distribution for Docker images, because it's tiny and has a lot of really useful features. One of those is its virtual package feature. It's kind of like virtual environments in Python. Basically, you can install stuff using apk temporarily: install things, use them, and then remove that entire environment. But again, it's important to do all of that within the same RUN command, so that those temporary packages never show up in a layer. So again, very important: combine installations with cleanup. Here's another optimization technique. Some gems come with C extensions. If you're running Rails, one of the gems that will probably be part of your bundle is Nokogiri, which has C extensions as part of it. So in order to install that bundle, you need a C compiler. You need a bunch of things, in fact. You need make, you need libraries, you need a whole set of build tools to install it. Now, you need these build tools to install your bundle, but you probably don't need them at runtime. And those build tools are actually pretty large. I tried installing the build-essential package on top of Ubuntu last night just to see how big it is.
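A sketch of the Alpine virtual-package pattern just described might look like this (the package names and the `.build-deps` label are illustrative choices, not from the slide):

```dockerfile
# Install build tools into a named virtual package, use them to
# build the gems, then remove the whole group in the same RUN
# command so the tools never appear in a committed layer.
RUN apk add --no-cache --virtual .build-deps build-base ruby-dev \
    && bundle install \
    && apk del .build-deps
```

Because `apk del .build-deps` runs in the same RUN command, the compilers exist only while the layer is being built.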
The Ubuntu base image by itself is about 100 megabytes. With build-essential, it triples in size. So this is not small. It would be nice to be able to install your bundle but not have these build tools in your final image. Is there a way to do that? Yes, there is. There's a powerful technique that not a lot of people are using yet, but it's very useful for this and for a whole class of similar problems, and that is multi-stage Dockerfiles. This is a feature that's been around for about a year in Docker. It seems like not a lot of people are using it yet, but you should. It's really, really useful. The basic idea is that, like a multi-stage rocket, you have an initial stage that does the heavy lifting for you in the build process, and then when you're done with it, you just discard it. Only your final stage, which is much, much smaller, is used at runtime. So this is how it might look. This is, again, an illustrative Dockerfile; there are some commands that you would normally find that are missing here. But the idea is that you have multiple images in a Dockerfile, multiple stages, and it's only that last image that is finally used at runtime. The earlier images are discarded. So this is how it works. We start with a base image. I call this my-base because I'm not sure this base image actually exists as a public image. So imagine a base image that contains Ruby, but no build tools. Normally, I think the official Ruby images do have all the build tools, because they expect you're going to install gems. But that image is less useful in production, because you don't need those build tools at runtime. So imagine you have a base image with Ruby and no build tools. You start there, and you copy your application into the image. Now you need to install your bundle, so let's install those build tools. As before: apt-get update, install, and clean, right? Now we bundle install.
So now we've got this image which has Ruby, has your operating system, has your application, has all of your gem dependencies, including those built C extensions, and has all the C compilers and build tools: that 200 megabytes of stuff that you need at build time but don't need at runtime. So here's our first stage. Then we start over. We start over with a new base image, and then what we can do is copy the application directory from that first stage into the second stage. Notice what we did here with the bundle install: we passed --deployment. That, among other things, vendors your gems. It installs those gems in your application directory, and that includes all of those C libraries that got built. So when you copy that application directory, it includes your application and all of those installed gems. So now we've got a new image, and that's what we want at runtime. We have a base image with Ruby. We have our application. We have all the gems, with all the C extensions, all built and ready. And we have no build tools. And again, when you build the Dockerfile, that first stage just goes away. So, basically, tip number three: use a separate build stage like this to create smaller runtime images. A really, really useful technique. OK, so we've talked a lot about the size of your image. Let's dig into the contents of your image. What should go into a Docker image? What should be present in your containers? I have a few tips based on my experience with Docker images and Rails apps. I'm going to cover some of the things that I think aren't covered enough: things that are often overlooked, but I think are still very important. First, encoding. I remember back in 2007, when I first started working with Rails, encoding was a big problem. UTF-8 wasn't as widely used as it is now, and we ran into encoding issues all the time.
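Putting the two stages together, a hedged sketch of the multi-stage pattern described above might look like this. As the speaker says, `my-base` is a placeholder for a Ruby-without-build-tools base image that may not exist publicly, and some commands you'd normally include are omitted:

```dockerfile
# --- Stage 1: the build stage, where the heavy lifting happens ---
FROM my-base AS build
COPY . /app
WORKDIR /app
# As before: update, install build tools, and clean in one RUN
RUN apt-get update \
    && apt-get install -y build-essential \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# --deployment vendors the gems (built C extensions included)
# into the application directory itself
RUN bundle install --deployment

# --- Stage 2: the runtime stage, with no build tools ---
FROM my-base
# Copy the app, including the vendored, already-built gems,
# from the first stage; the first stage is then discarded
COPY --from=build /app /app
WORKDIR /app
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]
```

Only the second `FROM` onward becomes the final image; everything installed in the build stage, roughly that 200 megabytes of tooling, never ships to production.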
We had a very specific checklist that we went through when we deployed things to make sure everything was set up properly. Nowadays, Ruby strings have very good encoding support, but the operating system locale still has some odd effects on the way Ruby handles those encodings. The rules are a little bit subtle. But in general, if you don't set the locale in your operating system, you don't set the encoding, and sometimes you might get Ruby strings that are US-ASCII rather than what you probably want, which is UTF-8. So it's very important in your Dockerfile to set the operating system locale if that's not already done in the base image. This is what that might look like, for example, on Debian. So, again, next tip: make sure that locale information is set. Often overlooked, but still very, very important. There's another thing that's often overlooked. It seems obvious, maybe, when we first look at it, but it's something that we don't often think to do when we're using Docker. And that is: in production, don't run as root. I hope we're not running Rails as root. There's no reason to run Rails as root, and there are, of course, a number of security issues that could result. But when you're running containers, remember that containers isolate your user space. The default user in a container is a superuser in that container. So unless you explicitly set the user, set an unprivileged user, in your container, you are running Rails as root in that container. So it's good practice to create an unprivileged user in your container and use that when you're running Rails. So, again, next tip: create an unprivileged user. Now, you might say, OK, is this really necessary? Containers are supposed to be isolated, right? Containers are isolated, users are isolated. Does it matter if I'm running as a superuser in a container? The answer is generally, actually, yes, it still matters.
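A sketch of what these two pieces might look like in a Debian-based Dockerfile. The specific locale value and the user name are illustrative choices, not the speaker's slide:

```dockerfile
# Set the OS locale so Ruby defaults to UTF-8 strings
# rather than US-ASCII.
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8

# Create an unprivileged user and switch to it, so Rails
# does not run as the container's superuser.
RUN groupadd -r railsapp && useradd -r -g railsapp railsapp
USER railsapp
```

Everything after the `USER` instruction, including the container's default command, then runs as that unprivileged user.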
And that's because of security: the best way to secure your systems is really defense in depth. If you don't need to run as a superuser, then don't. Set up the unprivileged user. Just suppose, for example, your Rails application gets hacked. Now the intruder might have superuser in that container. What could that intruder do? Could they install something nasty in your container and cause your container to do unpleasant things? Worse, how confident are you that Docker itself will never have a security flaw that could allow a superuser in the container to get out of that container and get access to the rest of your system? That would be kind of catastrophic. So just use an unprivileged user. It's best to put as many layers of security in front of your application as possible. So, again, often overlooked, but very important. Next, let's move on to entry points and talk about something else that's often overlooked. If you've used Docker before, you've probably seen these two different forms of a command or an entry point. There's exec form, which is basically an array of words, the form of a POSIX command. Then there's shell form, which is a single string that gets passed to a shell. You might have heard that it's generally recommended to use exec form, and yes, there are various reasons why this is true. One of the less commonly cited but very important reasons has to do with signals. Say you need to stop a running Docker container. For example, you call docker stop, or you're running Kubernetes and Kubernetes needs to upgrade your app or needs to scale some things, so it needs to stop and start containers. What it's going to do is send a signal to the first process, the main process, process ID 1, in your container: a SIGTERM or a SIGKILL, whatever that signal is.
If you use shell form to start your container, that first process is the shell. That is not your Rails app; it's the shell. And a fun fact about shells: by default, they don't propagate signals into things that they start. So what's going to happen here is that the shell is going to receive the SIGTERM, but your Rails application is not. It will continue on its merry way, not knowing what's happening, and it's not going to get a chance to clean up. Your container is not going to exit, and so eventually Docker, or Kubernetes, is going to have to go in and force-kill your container. You don't want that. You want nice, clean shutdown for your processes. So, a very common, often overlooked problem with shell form. So, again, our next tip: prefer exec form. If possible, use exec form. Now, I know there will be cases where you'll find shell form to be really useful. Maybe you need to do shell substitutions in your command, or something like that. If that's the case, there is a workaround: insert exec in front of your process. exec is a shell built-in. It basically tells the shell: this process is the main thing that's going on, so replace yourself with it and let it receive the signals. So if you do need to use shell form, another pro tip: prefix your command with exec. All right, let's see. I think we have time for one more tip. Docker includes a feature called ONBUILD. ONBUILD lets you define commands that run when a base image gets used in a downstream application image. So, for example, you might write a base image that looks like this, and then when an application image builds from this image, these two ONBUILD commands get run implicitly, immediately, kind of at the beginning. Seems convenient. Seems kind of like a good idea at the time. It turns out that in practice, it's usually not worth it.
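To make the exec-form and shell-form distinction concrete, here is a sketch of the three CMD variants discussed above. The rails command and the `BIND_ADDR` variable are hypothetical examples, not from the slides:

```dockerfile
# Exec form: PID 1 is the Rails process itself, so it receives
# SIGTERM directly and can shut down cleanly.
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]

# Shell form: PID 1 is a shell, which by default does NOT
# forward SIGTERM to the Rails process it started. Avoid this.
# CMD bundle exec rails server -b 0.0.0.0

# If you need shell features such as variable substitution,
# prefix the command with exec so the shell replaces itself
# with Rails, which then receives signals directly:
# CMD exec bundle exec rails server -b $BIND_ADDR
```

The third variant gets you shell substitutions while still leaving your application as process ID 1.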
So, tip number eight: avoid ONBUILD. There are several reasons for this. First, ONBUILD makes assumptions. It basically represents a base image making assumptions about what's being done downstream: about the application structure, about what it needs to run. For example, if it's copying the application directory, where is that directory? Where is that application? There are assumptions being made here. So it turns out that ONBUILD really isn't as useful as it might first seem. Another thing: generally, for a build process, it really is best to be very explicit, very transparent about what's going on when you build your application. ONBUILD basically removes that. It runs things implicitly in your Docker build, things that are defined by the base image, which are not part of your application itself, not present in your source code. So, generally, I just recommend forgetting that ONBUILD exists at all. You don't really need it. So far we've covered some tips regarding optimization of size, and tips regarding what should be in your image. Let's take a step back and take a broader view of your application running in production and how those containers should look. Now, your real-world application is probably more complex than just a single Rails server. You might have background processes, Sidekiq workers running. You might have other services. You might have memcached running. You might have multiple replicas of your application running for scaling. So, do you run all of that in a single container? Do you split it out into multiple containers? If you do, how do you do that? There are a lot of interesting architectural questions around this. I don't have time to go through all of them, but I will touch on a few basics here. So, first, remember: Dockerfiles are, sorry, containers are about isolation. Containers are about isolation.
This is important for running in production because it enables predictability. It lets you remove unknowns so you can understand the behavior of your application, and predictability is the key to a stable production system. So, here's an example. Again, containers: isolation. Containers isolate resources like CPU and memory. You can tell Docker, when it runs a container, to give it just this many cores and this much memory to run with. And that will prevent that container from spinning out of control and taking down the rest of your system. It's very important to do this in production: always specify those resource constraints. So take advantage of this feature. It's really crucial to maintaining a stable, predictable production system. Similarly, if you use Kubernetes, set those resource constraints in your manifests. This is also very important for Kubernetes, because it allows Kubernetes to do some interesting things like bin packing. If it knows the size of your containers, it knows how many of them will fit on your system, and it's able to place them in a smart way. So it enables one of the really powerful features of your orchestration system. Now, some of you might say, well, that's great in theory, but in practice, occasionally we have containers where it's difficult to come up with a static, fixed size for that container, static, fixed resource constraints. If that's the case, what I would say is: if you're having trouble coming up with static, fixed resource constraints, that's actually a container design smell. That's a sign that maybe your container might be doing too much, and maybe it would be useful to think about how you can break up that container. So it's kind of a useful tool for deciding what should be in your containers and how you should structure them. Here's an example of that. Some app servers, like Unicorn, might let you pre-fork workers, right?
And some of them will even do things like scale that worker count up or down based on traffic. Now, again, opinions will differ on things like this, but in my experience, doing this in a container is generally not a great idea, again because it makes the resource usage in that container less predictable. Even if you fix the number of workers and you have copy-on-write memory, copy-on-write memory can be tricky to predict the behavior of, especially with a language like Ruby, where there's so much dynamic stuff going on. So, in general, I recommend not pre-forking in a container. Don't try to scale up inside a container. Just run one worker in a container. Configuring it to be multi-threaded, I've generally found to be OK, but not forking multiple workers tends to make those resource constraints a lot more reliable. So then, how do you scale? You've turned off pre-forking. You scale by adding more containers. So, again, containers are best used as the unit of scale. Each container should have static, predictable resource constraints, and if you need more resources, just add more containers. It's quite simple. One more: logging. It's one of the basic elements of monitoring your application. By default, Rails logs to a file in the application directory. Don't do this in a container. It makes it a difficult task to access your logs, because the container is isolated; you'd have to log into the container to get access to them. And you want those; those are your logs. Additionally, Docker's file system, again, is designed for layering. It's not designed for high-throughput data, so if you have big logs, you might run into performance problems. So instead, direct your logs outside the container. There are various ways to do this. The easiest is just to write to standard out. That will let Docker organize and handle your logs for you, and give you access to them through its generic APIs.
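For reference, the piece of Rails configuration behind the environment variable mentioned next looks roughly like this. This is a sketch of the default that recent Rails versions (5 and later) generate in the production environment file; check your own generated config rather than treating this as canonical:

```ruby
# config/environments/production.rb
# When RAILS_LOG_TO_STDOUT is set, send logs to standard out
# so Docker can collect and manage them.
if ENV["RAILS_LOG_TO_STDOUT"].present?
  logger           = ActiveSupport::Logger.new(STDOUT)
  logger.formatter = config.log_formatter
  config.logger    = ActiveSupport::TaggedLogging.new(logger)
end
```

With that in place, setting the environment variable in your Dockerfile or deployment manifest is all it takes to redirect the logs.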
One of the easiest ways to do that is just to set the RAILS_LOG_TO_STDOUT environment variable. That tells Rails, in production, to write logs to standard out by default. Of course, if you prefer a more sophisticated solution, if you have a logging agent like Fluentd, go ahead and use that. Again, it's probably a good idea to run that in another container. However you do it, make your logs useful by directing them outside your application container. OK, so we've covered a bunch of tips, and I hope you've learned a few things. If you didn't catch all of them, I saw some of you trying to snapshot a lot of these, I'll post all these tips, along with the slides and some links and examples, at this URL. It's not up yet, but it will be by the end of the day. So this is the slide to snapshot. OK, so thank you all for coming. This has been great. Again, I'm part of Google Cloud Platform, and we have a booth down in the hall. If you're interested in talking about containers, or Docker, or about some of the fun things we're doing in Google Cloud, machine learning and various hosting options, I'll be there for most of tomorrow, and we have a whole team of my colleagues who are there to answer questions. So come down and hang out. Thank you very much.