I'm going to talk about using containers for building and testing, covering Docker, Kubernetes, and Mesos. That's my Twitter handle; if you want to tweet about it, only good things please, and anything else you can tweet to that guy over there. A little bit about me: I work at CloudBees, on what is called the Private SaaS Edition team, where we basically run Jenkins at scale with everything in Docker containers. I contribute to the Jenkins Mesos plugin, and I'm the author of the Kubernetes plugin, so I'll try to tell you when I'm biased; that's where my experience comes from. I'm also one of the maintainers of the official Docker images for Jenkins and for Maven, a long-time Maven contributor, a member of the Apache Software Foundation, and I help with other open-source software that I use. And I'm a Google Cloud Platform expert, which comes from the Kubernetes side of things, whatever that title means. Okay, so who is using Docker? Raise your hands one way or another. Okay. Who is using Docker in production? Okay, more than usual; the pace of Docker adoption is going through the roof. So I love this tweet: "The solution: Docker. The problem: you tell me." And that's what a lot of people are doing with Docker, basically using it for anything. But for building, testing, and deployment it's actually a pretty good solution, and it helps a lot in dealing with multiple architectures, operating systems, package versions, tool versions, and combinations of all of those. It's not trivial, though. This photo is how containers used to be shipped from boats onto the harbor somewhere in the Caribbean: using containers is not trivial.
There was a post recently, I think by Martin Fowler, saying "you must be this tall to use microservices": all the things you need to have in place before you can really use them. It's not just "let's switch completely to microservices and Docker containers and everything will be fine." One of the things you're going to need when you use Docker containers at any decent scale, pretty soon after you do the hello world, is a cluster scheduling system: something that builds a cluster out of hosts running Docker, or maybe other container runtimes in the future. In our case at CloudBees, we needed it to run in public cloud, private cloud, or bare metal, whatever our clients prefer, with HA and fault tolerance, of course, and with Docker support. Once you decide to go with Docker on more than one host, there are three alternatives: Apache Mesos, Docker Swarm, and Kubernetes. These are the three big cluster schedulers that exist. So what is Mesos? Mesos is what they call a distributed systems kernel, which is a way of saying you can run a lot of things on top of Mesos. Mesos abstracts the operating system and provides primitives to deal with multiple hosts from the application layer. You can run Hadoop, Spark, Kafka, and other big frameworks; there's a lot of big data work going on on Mesos. It started before 2011, so it's the first of the three, and it can run any sort of task: not only Docker containers but also plain binaries, and now rkt containers and appc images, the App Container image format. So Mesos basically abstracts all this infrastructure for you, and then you run frameworks on top of Mesos.
The frameworks are the ones that actually do something. Some of the projects I mentioned, like Hadoop, have their own frameworks. Then you have Marathon, Mesosphere's framework for long-running tasks and services: if you want a service that is always running and it dies for whatever reason, Marathon will restart it for you, and if a host dies, Marathon will notice and run it on another host. Apache Aurora does something similar. Both of them are in use, and Mesos is used by Twitter, Airbnb, eBay, Apple, you name it; there are a lot of big companies behind it, and it has gained a lot of traction over the years. There's another framework, Chronos, which is like a distributed cron, and I'll talk later about the Jenkins framework that runs on Mesos. Docker Swarm is built by Docker Inc., the company behind Docker. The first version of Docker Swarm used the same Docker API, so it would let you point your Docker client at a Swarm endpoint, and that Swarm endpoint would run whatever you asked across the cluster. You wouldn't need to modify your existing tooling: same command line, same options, same Docker client. But I guess they realized that had some limitations, so in Docker 1.12 they came up with a new Docker Swarm, swarm mode, which is included by default in the Docker daemon. You don't need to install anything else, and I guess they counted on shipping these features in the daemon so everybody would get them for free; it gives them a foot in the door for you to use it. With this new swarm mode they created a new, better API, where you have a new object called the service.
This service object defines how a Docker container runs across multiple hosts. Same reasoning as in Mesos: if the container or a host dies, the cluster will notice and restart it, on another host if necessary, if you configure it to. The big difference from the previous Swarm is that existing tooling needs to change, because this is a new API, a new model for dealing with containers in the cluster. The last of the three is Kubernetes. It came from Google, based on what they run on their internal systems, and it can run on your local machine, on VMs, or in the cloud. Of course, Google makes it so the best place to run it is Google Cloud: they offer a service called Google Container Engine, GKE, where you basically say "I want a new Kubernetes cluster" and it creates it for you. But you can install it anywhere. There's a nice page, StackPoint, where you can create clusters on different cloud providers; there's commercial software on top of it too, like CoreOS Tectonic; you can run it on Azure; and you can run it on your local machine. For the local machine there's Minikube, a VM that has Kubernetes installed with just one node, which is great for testing and playing with the APIs. So, for this scaling-Jenkins goal we have: who here is using Jenkins? Okay, I should ask who is not using Jenkins. Who is using Mesos? One, two people. Docker Swarm? Two more. Kubernetes? Four or five? All right. If you're using Jenkins, you know how it works, and there are two options to scale it: more build agents (slaves) per master, or more masters. If you go with more build agents, there are plenty of plugins you can use to create new agents.
There are the old ones like Amazon EC2 to create virtual machines, or Azure machines, or any cloud provider, and agents get created dynamically when you have a lot of jobs queued. I'll talk about the ones that work with Docker containers. The problem is that the master is still a single point of failure: if your master dies, you have a problem. Nowadays you have resumable pipelines; if you were in this room earlier, there were talks about how pipelines can reconnect to a master after the master gets restarted, so if you restart the master your jobs keep running and don't get killed. But you still have the problem that changing configurations or plugin versions requires a restart of the master, which basically means downtime. And there's a limit, although it can be pretty high, on how many build agents you can attach to one Jenkins master. The other option is having more masters, with the benefit that multiple organizations or departments can each have their own master. It's basically more like a federation, a sharding of your builds: multiple masters, each with its agents, for different organizations. The problems are single sign-on, how you log into all of them in the same way, and how you configure all of them from a centralized place. At CloudBees we have Jenkins Operations Center and the Private SaaS Edition, where I work now, which basically gives you the best of both worlds: it allows you to have multiple masters running in Docker containers, all configured from a single place with Operations Center. And because all these masters are created as Docker containers, you can spin up new masters whenever you want.
And all these masters get configured to use the same cloud; we're using Mesos right now, so all the masters share the same cluster of Docker hosts. Then another great quote: "To make error is human. To propagate error to all servers in an automatic way is DevOps." When you automate a lot of things, there's a chance that what you automate is going to break, and I have a different version that conveys the same message: if you haven't automatically destroyed something by mistake, you're not automating enough. This has happened to me several times, at least a couple, nothing as bad as what happened to those guys this week. But if you're not breaking something, you're not trying hard enough, right? So I always try to automate things; sometimes you screw up, but as long as it's not too bad, it's okay. So how can you run Jenkins in Docker? We have several Docker images available. There's the official image, built by Docker themselves, although we provide the Dockerfile and all the new releases. It has the LTS versions, so if you just do "docker pull jenkins" or "docker run jenkins", the latest LTS is what you get. The Jenkins community also has the jenkinsci organization on Docker Hub: jenkinsci/jenkins has the weekly builds, published continuously by an automated build for every new weekly release, and we'll possibly publish more than the weekly builds soon. It's the same thing; this one has the weekly bits, the other one the LTS. And if you're going to run slaves in Docker, the one you need to be aware of is jenkinsci/jnlp-slave, an image that contains just the Jenkins remoting bits.
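As a quick aside before the agent image: running a master from the official image is a one-liner. The ports here are Jenkins' defaults (8080 for the web UI, 50000 for JNLP agents), and the named volume is just a sensible choice so data survives the container:

```shell
# Latest LTS Jenkins master from the official image: map the web UI and the
# JNLP agent port, and keep JENKINS_HOME on a named volume.
docker run -p 8080:8080 -p 50000:50000 \
  -v jenkins_home:/var/jenkins_home \
  jenkins
```

The jnlp-slave agent image described next is a separate, much smaller image.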
It's based on the Java Docker image and contains the Jenkins slave jar. When you start it, basically "docker run jenkinsci/jnlp-slave" passing the master URL, the secret, and the slave name, it connects to the master and that's it: you have a new slave running in Jenkins. You probably won't need to do this by hand, because there are plugins that do it for you, as I'll show later. The other interesting thing about this image is that there are two versions: one based on the official OpenJDK image, which is Debian-based, and an Alpine image that is really small, something like 40 or 50 megabytes, a lot smaller than the Debian-based one. So if you wanted to manually create 100 slaves running in Docker, you could just run "docker run" that many times, pointing at your Jenkins master, and there you'd have them. So, cluster scheduling and Jenkins: what do we want, and when do we want it? You want isolated build agents and jobs: one job must not mess with the workspace of another job, and jobs sharing a build agent must not conflict with each other. We wanted to use Docker so agents start in seconds. We also want to be able to drop capabilities, as you can in the container world: not run as root, run as a different user, maybe have no access to the network, and so on. I'm going to go through the different features of the cluster orchestrators and tell you which of them has what. Feature number one: container groups. In the Jenkins example, imagine you can have a Jenkins agent container, a Maven container, and a Firefox, Chrome, or Safari container.
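With the Kubernetes plugin, which comes up later in the talk, such a group can be sketched roughly like this. This is only a sketch: the image tags and label are made up, and the plugin's DSL has changed across versions:

```groovy
// One Jenkins agent pod with a Maven container and a Selenium/Firefox
// container; the default JNLP agent container is added automatically.
podTemplate(label: 'maven-firefox', containers: [
    containerTemplate(name: 'maven', image: 'maven:3.3.9-jdk-8',
                      ttyEnabled: true, command: 'cat'),
    containerTemplate(name: 'selenium', image: 'selenium/standalone-firefox',
                      ttyEnabled: true)
]) {
    node('maven-firefox') {
        container('maven') {
            // Tests can reach the Selenium server on localhost, because all
            // containers in a pod share the same network namespace.
            sh 'mvn -B verify'
        }
    }
}
```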
So you will have what is typically called a pod of containers: you can have five containers running for one job, if your cluster scheduler supports grouping containers. Otherwise you have to build one container image that has all the tools you need. This is experimental in Mesos 1.1, so you need a pretty recent version. Docker Swarm supports grouping through Docker Compose, and you can also force the execution of all the containers in the group on the same host. Kubernetes supports the concept of pods natively: it guarantees that they all run on the same host, and they can all refer to the other containers using localhost. The idea comes mainly from Kubernetes, which was the first to implement it, and that's the power of being able to use multiple containers for just one job. Imagine you want to do a Maven build plus something else, say a Selenium test: if you have to create your own image, you have extra work to do to get all those tools in. This way you just reuse the images available on Docker Hub; you don't have to write any new Docker image at all. Memory limits. The scheduler needs to provide a way to limit how much memory jobs can use, and to prevent containers from going over their limits. Imagine you have all these resources in the cluster and different jobs competing to grab them; maybe you have a build that has gone wrong and is eating memory or CPU, and you don't want that to happen. All three support memory limits: in Mesos they are actually required, in Swarm they are optional, and in Kubernetes they are optional with some defaults. And in Kubernetes you can even use namespaces, so you can isolate containers into namespaces and set group limits at the namespace level.
So you could say: not just per container, but however many containers you run, make sure together they don't go over this limit. This memory constraint translates to the "docker run --memory" parameter. Now I have some questions for you. I'm sorry, I know it's late and you're all tired, but I'm going to make you work a little. What do you think happens when a container goes over its memory quota? Say you have a build that runs a JVM, as in my example, and you set a memory limit on the container. What would happen? Any takers? A segfault, okay. Any other options? An out-of-memory exception in Java, okay. Anybody else? The container gets killed. Okay, let me show you. This is just a Maven application, a Maven build, and in the tests I'm just allocating memory; the normal Java thing happens, garbage collection kicks in, and since the container has no limits, this would keep using memory and run forever, so I'm going to kill it. In this one I'm going to run it with "-m 220m", basically limiting how much memory the container has to 220 megabytes. That's a fairly arbitrary number, and it also depends on where you run this, but what you see, let me put it here at the top, is that it does the same thing until it reaches a point where something happens, and you get nothing. It just stops running. So what happened? The only way to know is by inspecting the container. When you do a "docker inspect" there's an interesting line that will catch your attention if you know where to look; otherwise you have a long JSON document to read. It basically says "OOMKilled": true.
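Rather than scanning the whole JSON, you can ask docker inspect for just that field. The container and image names here are hypothetical stand-ins for the demo build:

```shell
# Reproduce the demo: run the build with a 220 MB memory limit...
docker run --name oom-demo -m 220m my-maven-build   # hypothetical image

# ...then query only the OOM flag instead of reading the full inspect output.
docker inspect -f '{{.State.OOMKilled}}' oom-demo
```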
This is telling you the kernel killed your container because it went over the memory limit set for it. So whoever said that last one wins. People, especially people coming from the Java world, would expect an out-of-memory exception or something like that. Now, the problem with Java is that when you run it in a container environment, the JVM is not aware of the container's limits. There's some patch that was merged last week, supposedly for Java 9, that makes it cgroups-aware, but until you can properly use Java 9, which is months or years away, this is what's going to happen. You're running Java in a container, and Java sees the host memory. Because I'm running Docker on my machine, the host memory it sees is the two gigabytes of the virtual machine where Docker runs, and typically, in ninety-something percent of cases, depending on certain rules, the JVM takes one quarter of the total host memory as its maximum heap size. That's what you see here: the max memory printed is 444 megabytes, the same number as at the beginning when I wasn't setting any limits, so the JVM is not aware of the limit. That's what happened. So how can we fix this? Especially because you're probably running this in a cluster, with multiple hosts, with Jenkins jobs in containers, and they just disappear, they get killed, and you don't know what happened. So there's something we can do, and it's very specific to whatever you are running.
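You can see the default the JVM picks inside a limited container without running a whole build. This sketch assumes the stock OpenJDK 8 image; the exact output format varies by JVM version:

```shell
# The container is limited to 220 MB, but the JVM sizes its default max heap
# from the memory the kernel reports for the whole machine (typically about
# one quarter of it), ignoring the cgroup limit entirely.
docker run -m 220m openjdk:8-jdk \
  java -XX:+PrintFlagsFinal -version | grep -i maxheapsize
```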
For Maven you can pass JVM options through MAVEN_OPTS, and you have to know what you're doing: you pass the parameter to the JVM and say -Xmx is 210 megabytes, because I know I'm giving the container 220 megabytes in total, so let's make sure Java is aware of how much memory is actually available. What happens now is a little different: the max memory Java sees is 187 megabytes, so it keeps itself under the limit. This is going to do more garbage collection, but it's never going to run out of memory, I mean, it's never going to get the container killed by the kernel. Now, I was cheating a bit here, because by default, when you run tests with Maven, Maven forks a new JVM to run them, and I had told it not to fork; that's an option in the POM file, in the Surefire plugin exactly, where you can say whether to fork or not. So I told Maven not to fork, and all of this was running in one JVM. Now, if I run it in the default mode, even with the same parameters, a 220-megabyte memory limit and -Xmx 210, guess what happens. This calls Maven, Maven creates a new JVM for Surefire, and that JVM runs the tests. It can take a little longer, but the new JVM sees 444 megabytes: it is not aware of the -Xmx I passed to Maven, and what I get is "Failed to execute goal... The forked VM terminated without properly saying goodbye. VM crash or System.exit called?". The new JVM doesn't see the -Xmx memory limit because I set it in an environment variable that is specific to Maven. So how can we fix this? Well, you have an option, in the POM file again: you can configure the Surefire plugin to pass JVM arguments or environment variables to the forked VM, so you could go in there and set -Xmx to whatever. But you could keep doing this
over and over and over again. There's a slightly better option: a somewhat obscure environment variable called _JAVA_OPTIONS, which works in OpenJDK and at least some other JVMs. What it means is that any new JVM that gets started will use these parameters: when I start Maven, it's going to use -Xmx 210, and when Maven starts Surefire, that JVM is going to use -Xmx 210 too. This is going to solve a lot of problems for you; I'm going to just kill it, but this would keep working. The caveat is that if you're running several JVMs and each of them uses the full -Xmx, you have to tune how much you give to each of them, but if you're running one or two, the setting will be honored by all of them. And you've got to be aware of what happens when you run out of memory. Okay, oops, what did I do... I didn't know that key combination. All right, I talked about that. Then there are CPU limits, which are something like the memory limits: you can say how many CPUs in Mesos, Swarm, and Kubernetes, and this translates to Docker CPU shares. And what do you think happens when a container goes over the CPU limit you set? Well, nothing, really. What the CPU limit in Mesos means is Docker's CPU shares, which makes it a little clearer: it's how much of a CPU you can get. It's basically a weight, and how much CPU a container gets depends on how many containers you are running. If CPU shares is one and you run one container, it gets 100% of the CPU; if you run two containers and both have one share, each gets 50%; if you run ten, each only gets 10%. It's just a weight across all the containers you run, so it's all relative. The other important thing to handle in a cluster is storage, and how you can distribute it. Mesos has Docker volume support since 1.0, and Swarm also has the Docker volume plugins, so you can use
whatever volume plugins you use with plain Docker. Kubernetes had the concept of persistent volumes from the very beginning. All of them do pretty much the typical things: EBS volumes on AWS, NFS, and GlusterFS, which I think is supported in all of them; it's just a matter of how you use them. Another consideration these schedulers allow is running as a different user, not just root, but you have to be aware that the user ID inside the container is not the host user ID. We get a lot of questions about this on the Jenkins Docker image, because the Jenkins master runs as the jenkins user, which is always UID 1000 inside the container, and if you run it on an Ubuntu host, UID 1000 is the ubuntu user. So if you're mounting host volumes into the container, which is typically a bad idea, because you have to deal with all these things and it doesn't schedule well across a cluster, or if you're using NFS, you have to be aware that the UIDs of the users, not the names, have to match how the container tries to access the data and what the permissions on the data itself are. For networking, in the Jenkins case you need to open the HTTP port and the JNLP port for connecting agents, and Jenkins also has a sort of built-in SSH server that you could open if you wanted to. I'm not going to go into details, but there's support that allows you to get one IP per container in these clusters: in Mesos it's more recent and you can run it with Calico or Weave; same thing in Kubernetes and Swarm, and Swarm by default uses the Docker overlay network. In Kubernetes this is pretty straightforward if you run on Google Container Engine; if you run Mesos or Swarm, you may have to do a lot more setup and configuration to make virtual networking work. And lastly, I'm going to talk about the plugins that are available to take advantage of running in containers. So there are
several Docker plugins. There are at least two for dynamic agents running on Docker: whenever you have a job, they spin up a new Docker container and run the job inside it. There's no support yet for Docker Swarm mode, because it uses a new API. The agent image needs to include Java, and it downloads the slave jar from the master, so it needs a connection to the master. Then you have multiple plugins for different tasks; as it stands today, there's the Docker Build and Publish plugin to build Docker images, the Docker Hub notification plugin to trigger jobs when an upstream image is updated, and things like that. And there's great pipeline support. I'm not going to go through the configuration, but I'll show you a Docker pipeline: you can use docker.withRegistry if you want to use your private Docker registry, docker.image with the name of an image to use it, and .pull to download it from Docker Hub; you can build Docker images with docker.build; and the interesting bit is probably image.inside, where whatever shell commands you put in are run inside the Docker container itself. There's also a fairly recent plugin called the Docker Slaves plugin (there are a lot of confusingly similar names here) that allows you to use any Docker image for containers without needing Java inside, so it's a lot easier to reuse images; it also allows you to define the slave in the pipeline, and you can have side containers. This is the Jenkins Docker Slaves plugin, not to be confused with any of the other ten Docker plugins out there. With it you can do something in Maven with dockerNode, the name of the Maven image, and then sh and whatever you want to run inside the Docker image. The Mesos plugin also allows you to have dynamic agents, both Docker and isolated
processes, so any random program you want to run on Mesos. The image has to have Java, because that's how it runs the slave jar to connect to the Jenkins master. And you could run Docker commands, but that's basically outside of Mesos. I don't think I explained it here, so: you can use Docker pipelines with some tricks. You need the Docker client installed inside the Docker image, and you share the Docker socket, the typical way to run Docker side by side with Docker: a Docker container talking to your host Docker daemon. Plus you need to mount the job workspace on the host at the same directory inside the container. With this, here's an example: in a node running on Mesos, I can run a golang image and do a go build with no problems, reusing that golang image. The only caveat is that this runs outside of Mesos, just against the host Docker daemon, so Mesos doesn't know anything about it: not how much memory it's using, not what ports it's using, nothing. You're basically running outside the scheduler. Then there's the Jenkins Kubernetes plugin. Same thing, you can have dynamic Jenkins agents, and they run as pods, so a group of containers: you can have multiple containers, and just one of them has to be the JNLP one, the one that runs the Jenkins slave to connect to the master; if you don't define it, it gets created by default. It has pipeline support both for defining what the pod's images are and for executing things inside those pods. And the next version, which I hope to release soon, will also have persistent workspaces, so all your agents can mount the workspace from NFS or EBS or whatever, using just what Kubernetes provides. One of the typical problems when you run things on Docker is that you don't have the previous builds, I mean, you start from zero every time you do a build
But with this you could have a volume with your workspace on NFS or a shared mount or anything that is supported in Kubernetes, and then you wouldn't need to start from scratch every time. This is what the pipeline looks like: I define a pod template with a Maven container and a Go container, and in my pod what I'm saying is check out some Git code, run a Maven build inside the Maven container, and run a go build inside the Go container. So I can reuse the images from Docker Hub, I don't have to create a custom image with both Maven and Go, and I can run both things in these two containers with just one agent. Just to recap: these plugins give you dynamic agent creation, and they all use JNLP as the protocol to connect to the master; in some environments you can use stunnel to connect to the master, depending on how you run this, but I guess we don't have time to go into more detail. They use the cloud API, which is not ideal for container workloads right now, because it was designed in Jenkins for Amazon images and instances and things like that, so it may take a little longer to start the containers. But there's the Jenkins One-Shot Executor plugin, which we hope to include in at least the Kubernetes plugin, and possibly in the Docker plugin too. It's optimized for containers: the cloud API assumes that starting an instance takes a long time, so it keeps instances around and doesn't start a lot of them at the same time, because they have a cost associated; the one-shot executor just creates the container, runs your thing, and kills the container at the end. So, that's me. Any questions? Yes. First, you showed the Jenkins slave in an image that we can just pull and have a slave, and I'd like to ask how you synchronize dependencies across the containers. Then you went on and said that in fact in the slave you can run another
container, like in Inception, a container in a container. So in that container, how would you then synchronize dependencies? Let's say I need Gradle, a certain version of Gradle. So how do we manage versions and run containers inside a container? Okay, maybe I didn't explain it well: we are not running a container inside a container. What a pod is, is that you can start multiple containers, but they are side by side, not one inside another. No, no, I mean when Jenkins is doing the job of building my image. Right now there is no good way to run Docker inside Docker, so the recommendation is always to run Docker side by side. What you do is you have a container running your slave or whatever; this container has to have the Docker client installed, and you mount the Docker socket inside the container, so this container can run Docker commands against the Docker daemon on the host. When this container tells the Docker daemon "docker run something", it's the host that runs it, so you're basically talking to the host, and the host creates another container, and they are all side by side. Then how do I keep track of all the dependencies? Let's say I deploy five slaves and I have loads of builds in a pipeline; how do I make sure that all of my slaves have the right versions? Okay, the way it's done in all these plugins is that the slaves are short-lived: ideally a slave runs just one job and dies. With the cloud API it may not be exactly like that all the time, they may stick around for a bit depending on some parameters you can adjust, but basically you're saying "I want this job to run on Maven 3.3.9", and whenever that job runs it will
download the Maven 3.3.9 image, run your job, and die. If you have another job that says "I need to run this on Maven 3.1", it will download the Maven 3.1 image, run that, and die. This is the beauty of it: you can have all sorts of combinations using all the available images. The Maven image, for example, has versions for Java 7, 8, and 9, three different images, so you could run some builds on Java 7, some builds on Java 8, even the same build on 7 in parallel with 8, and they are all in different containers. Yes? If you had to choose only one between Mesos, Kubernetes, and Swarm, what would you choose and why? Okay, if I had to choose one I would choose Kubernetes, but just because I'm biased, as I said before. It's going to depend: if your company, if your operations people, already have something running, it's more likely you're going to choose that one. Mesos has the advantage of being able to run any process, so it's interesting for more high-performance things; there's a lot of scientific computing running on Mesos for that reason, because you can run things on bare metal. Docker Swarm has the advantage that it comes by default with Docker, but it's new and doesn't have the tooling support yet. And Kubernetes has a big open-source community behind it, multiple companies: Google, Red Hat, CoreOS, all these people building on top of Kubernetes. And if you are running on Google Cloud, they already give you that for free. Okay, I'm getting booted off. Thank you.