Welcome. Thank you. Hey, thanks, everyone. Can you all hear me? Thanks for having me at EuroPython. My last big Python event was PyCon JP in Japan last year. I didn't get to speak, but it was really fun, although most of the talks were in Japanese, and while my Japanese is getting better, it's not so great. My Spanish is really, really bad. They're very similar, so maybe I should learn both together. No, really, seriously, they are. I know it doesn't seem to make sense. We're going to talk about containers, and having lots of containers, because ultimately everything is going to be containerized and we're going to have more containers than we know what to do with. I'll ask you some questions later to see how far along you are with moving towards containerization. Basically, when we have lots of containers, what do we do then? This is a problem we face at Google. This is a data center — a Google data center in Iowa in the U.S., in a place called Council Bluffs, and it's one of our bigger data centers. If I leave it up for long enough, you'll probably be able to count all the machines, but this is a cluster. Clusters are one of the constructs we have internally, and these clusters are broken into cells. Cells are smaller; we have many cells per cluster. The cell we're going to look at today has about 10,000 machines in it. They're quite large, and this is a huge amount of compute power. We need to make this available to our engineers, our software engineers, our developers. How do we go about making this compute power available to our own developers? This is what a developer does. First, for context: given what you see there, we don't want the engineers to have to select a rack, select a machine and say, hey, I'm going to run it on that machine.
I'm going to SFTP a binary over to the machine, SSH into the machine, stand up my process, my server or whatever, maybe log into many machines and do that multiple times. That's not going to be possible: huge numbers of machines, huge numbers of engineers, huge numbers of jobs to run. So how does it happen? Basically, we have a configuration file. In this case, it's called a Borg configuration file. I was in India recently and nobody there had heard of Borg. How many of you are familiar with the Borg in Star Trek? We never used to be able to talk about Borg because Paramount Pictures own the name. It was one of our worst-kept secrets that we had this thing called Borg running internally. Now we talk about it all the time, because it's fun and it's really good to show it in the context of what we're going to talk about later, which is Kubernetes. This is a Borg configuration file, and what the developer does is create a job in a JSON-like file, call the job hello_world, and say which cell to run it in. Going back to what we said earlier, a cell is a few thousand machines. In this case, it's called 'ic', or some random cell name we chose. They specify what binary to use — in this case, a hello world web server. This is going to be a fat binary, statically linked with all of its dependencies, so effectively we can run it pretty much anywhere without having to worry about the underlying operating system. That includes the web server as well. This thing is quite big, probably about 50 megabytes. They specify the path to the binary. Unfortunately, we have too many male software engineers and not enough female software engineers; let's encourage women to be software engineers. And arguments: we have to specify some arguments to our binary, passed in via the environment. In this case, we want to specify what port to run on. This is parameterized. Then we have some requirements in terms of resources.
This is important; we'll circle back to it in a minute. We can specify how much RAM, how much disk, how much CPU. And ultimately, we can say how many we want to run. In this case, we want to run five replicas of this job — five tasks, effectively. And why five? Why not do it at Google scale? 10,000 makes more sense, right? We have all those machines — we saw how many machines we have — so let's run 10,000 copies of this. In this case, our software engineer types a command on the command line, passes in the config file, and that gets pushed out to the Borg scheduler. What happens then is this: over a period of time, in this case about two minutes forty seconds, 10,000 tasks start — 10,000 instances of that job. We phase the rollout of all of these tasks to make sure we don't do them all at once. One of the key factors here is the size of the binary: 50 megabytes times 10,000 works out to roughly 20 gigabits per second of I/O over the rollout. We'll be caching that binary quite a lot, but we had to move it around between 10,000 machines, so there's a huge amount of I/O going on. But eventually we get to a point where we have 10,000 running, or nearly 10,000. And Borg looks like this. This is what Borg is to Google. It's not going to assimilate you, but I think we came up with the name because it's probably going to assimilate everybody eventually. So this is Borg, and Borg runs within a cell. Each cell has its own Borg master and its own Borg configuration. In this case, we have a Borg master which is highly replicated — we have five copies of it for resilience — and we have lots of other things. The machines we saw in the racks are all running a thing called a Borglet. We have a scheduler, and we have the configuration files and the binary. So what happens is the developer, the engineer, creates his or her binary.
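The configuration walked through a moment ago looks roughly like this — paraphrased from the example in the published Borg paper, not the exact internal syntax (the elided binary path is as the paper gives it):

```
job hello_world = {
  runtime = { cell = 'ic' }             // which cell (a few thousand machines)
  binary = '.../hello_world_webserver'  // fat, statically linked binary (~50 MB)
  args = { port = '%port%' }            // port is parameterized, filled in by Borg
  requirements = {                      // per-task resource requirements
    ram = 100M
    disk = 100M
    cpu = 0.1
  }
  replicas = 10000                      // number of tasks to run
}
```

Bumping `replicas` from 5 to 10000 is the only change needed to go from the small example to the Google-scale one.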
And they use a massively distributed parallel build system — well, I won't say what the internal one is called, but it's externally available now as Bazel. We made it open source, so our own build system is now available externally. It's called Bazel, B-A-Z-E-L — or 'bay-zel' if you're American, or Canadian. It gets very confusing. Believe me, if you go to Canada, it's so confusing — like 'route' and 'route'. So basically, he or she creates the binary, pushes it out, and it gets stored in storage for the cell. Then they push their configuration file, which gets copied to the Borg master. The Borg master has a consensus-based backing store. What happens then is that the scheduler comes along, looks around and says: what is the desired state? We should have this running — do we have it running? It sees 10,000 new tasks and says, hey, they're not running. We should have 10,000 of those. Let's make sure that's happening; let's fix that. So it goes about planning, the Borg master makes decisions, and it tells the Borglets on those machines to run this particular task. The task will ultimately run inside a thin container wrapper — it's not just running the binary, it is containerized, in a very lightweight shim container. It's not Docker, and it's not standards-based. The Borglet will pull the binary over and start running it, and we'll see this all over our data center: we're now running multiple copies of that task. And so that's what we had — 10,000. But if we look a little closer, we find there are 9,993 running — not quite the 10,000 we expected. But this is a highly available service; we expect some lessening of the number of tasks running at any time due to the way we operate. And that's interesting, so let's look at it in a little more detail. So, failures. Things fail.
But failure is a fairly generic term here; there are many reasons for failures, and one of the main ones, particularly for low-priority jobs, is preemption. If we look at the top bar, which is our production jobs, we have very few failures, and most of them are down to a machine shutdown — where we've actually scheduled maintenance on a machine and taken it down. Any task running on that machine is then rescheduled elsewhere in the cluster. We have a very small number of preemptions. Down here are non-production jobs, things like MapReduces and batch jobs. They get preempted all the time; they're happy to be preempted. In fact, the calculation generally says that out of 10,000 tasks, about 7 or 8 will not be running at any given time because of preemption — about to be scheduled somewhere else, but not running at that particular moment. And we see other things here. We see again the — I can't see my pointer — the blue bar, which is machine shutdown, pretty much the same as production. And we have some other causes as well: out of resources, and a very small number of machine failures. In fact, when you have as many clusters and as many machines as we have, machine failures are a given; that's the normal running of our business. Another interesting thing is how we try to make efficient use of our resources. We have CPU, memory, disk I/O and network I/O, and it's quite possible for one task to be using lots of memory but very little CPU, or vice versa — lots of CPU and very little memory. If you put only one of those on a machine, you may be wasting one of those resources. This chart is actually our virtual machines — our virtual machines are actually containers, believe it or not; this is Google Compute Engine. So these individual bars are all virtual machines.
And what we can see here is that some of these machines have available capacity — available RAM, available CPU. And if we look over here, we see a different situation, where some have available CPU and others have no available RAM, and vice versa. This is called resource stranding. It means we're not actually making use of a resource: we have spare memory capacity or spare CPU capacity that's effectively being wasted. So one of our challenges is like a Tetris puzzle: to stack these things in a way that gets the best possible utilization out of our clusters. We mix and match them, making sure low-CPU, high-memory jobs run alongside high-CPU, low-memory jobs. And of course we run multiple tasks per machine — that's extremely important. We're going to come back to all this with Kubernetes shortly. Another interesting thing is this, which is going to be a huge challenge in the future when it comes to Kubernetes, but it's really important to all of us. We saw earlier that our developer specifies what resources she or he wants to use: 100 megabytes of RAM, 100 megabytes of disk, 0.1 CPU. That would be this blue line up here — everything running will fit under this blue line. These are the resources that were requested by these jobs. In reality, though, usage looks like this, and so we have all of this wasted space, which can't be used because it's been allocated to those running jobs. But we can reclaim it. What we do is estimate, based on the run patterns of the current jobs, how much they're actually going to use — and that's this lower line here. This is our reservation: how much we reserve specifically for those jobs. And what we can then do is reuse all that space.
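The reclamation idea just described can be sketched in a few lines of Python. This is only an illustration of the principle — reserve roughly what a task has recently used, plus a safety margin, and lend the rest out — not Borg's actual algorithm; the percentile and margin here are invented numbers.

```python
# Sketch: estimate a reservation from observed usage, then compute how
# much of the original request could be lent to best-effort batch jobs.

def reservation(usage_samples, safety_margin=1.2):
    """A high percentile of recent usage, scaled by a safety margin."""
    ordered = sorted(usage_samples)
    # 95th percentile via the simple nearest-rank method
    idx = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return ordered[idx] * safety_margin

def reclaimable(requested, usage_samples):
    """Capacity above the reservation that low-priority work could reuse."""
    return max(0.0, requested - reservation(usage_samples))

# A task that requested 1.0 CPU but mostly uses around 0.2:
samples = [0.18, 0.22, 0.20, 0.25, 0.19, 0.21, 0.23, 0.20]
print(round(reclaimable(1.0, samples), 2))  # → 0.7
```

So roughly 70% of that task's requested CPU would be available to preemptible batch work, which is exactly the space the MapReduces and monthly reports run in.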
Now, we can reuse that space for very low-priority jobs — again, those batch jobs, those MapReduces. Things that we want to run and want to finish eventually, but where we don't really care when that happens. It could be running some kind of monthly report that nobody ever looks at, or running a MapReduce across a huge amount of data that may be important at some point or just needs to be done — but we don't really care when it gets done. So we can reuse all of that space and run jobs within it. That's really important; it's how we get maximum utilization out of all the machines in that data center. And so, moving gradually on to Kubernetes. One of the observations is that if you have your developers spending time thinking about machines, or thinking in terms of machines, you're probably doing it wrong, because it's too low a level of abstraction. Maybe it's fine today, but in the future this is not going to be the case. We need people to be thinking in terms of applications, not having to worry about the infrastructure on which they run. Anybody who uses a platform as a service knows how important that is. You don't care about the infrastructure; you want to write your code, build a binary, write a configuration file and just say: run this for me. I don't care where you run it or how you do it — just run it and make sure it stays running. We get efficiency by sharing our resources and reclaiming unused allocations, and containers — the fact that we containerize everything — allow us to make our users much more productive. Everything we run runs in a container: two billion containers a week, we estimate. We never really thought that was very important until Docker came along and containers became the next big thing, right? LXC, then Docker, and Docker became huge. And so now one of the things we talk about all the time is that we run containers all the time.
And we are pretty focused on running containers, which is why we created Kubernetes. If you're interested in more details of what I've just talked about, there's a paper at goo.gl/1C4NU0 — the Borg white paper. It has all of the details and graphics you just saw, and it goes into much, much more detail, of course. So, let's look at a simple application and how we can do this externally with containers and with Kubernetes. This is a very simple pattern: generally when we give this talk it's PHP in the middle, with MySQL, Memcache, and a client. We have many of these Python instances running — this could be many instances of Flask, or some kind of event-driven system — but we have the ability to serve many, many concurrent requests, and we're probably going to want to scale this thing on demand. We may not want to scale MySQL that much, until we get to the point where we have to do replicas and sharding. Memcache we'd probably want to scale as well, but we're going to keep it simple for now: one MySQL, one Memcache and a few of these Python instances at the front end. So, let's talk about containers. How many of you are familiar with containers? How many of you have actually spun up a Docker container? Hey, lots of you — it's almost the same number, right? Last year, when we asked how many of you have heard of containers, lots of hands; how many of you have spun up a Docker container, far fewer. Things have changed now. Docker is the future. We now have this thing called the Open Container Project. Docker have kindly made what they have into a spec, we're going to get behind it, and we have this common specification from which we can build containers — things like CoreOS with its rkt container — a common format for containers. Which is going to be great. But just for those of you who are not familiar with containers, a few slides — very few slides — on containers, just to give you some of the concepts.
This is how we used to do things in the old days. We'd have a machine, maybe next to our desk, in our bedroom, in a colo or in the server room. The machine would run our operating system, which would have all of the packages installed and provide the libraries. How many of you have had a situation where you're running one application and all of the other applications on the machine fail because that one application went mad — used all of the CPU, used all of the RAM, crashed the machine and took all of the other applications down? It may have been a very low-priority app, one you didn't really care about, taking down some really important ones. Running multiple applications on one machine like this is never a good idea: whatever affects one application will probably affect all of the others. There is no namespacing. They all have one view of the machine on which they're running — one view of the CPU, one view of the memory, one view of the file system, one view of the network. They share libraries, so you get a situation where maybe one day you update a version of a package, it updates a library, and one of your applications says: hey, I'm not going to run anymore; that library is not compatible with me. Or probably even worse. Applications are highly coupled to the operating system, and this is a problem. And so we created virtual machines. What we did, basically, is stick a layer on top of the hardware called a hypervisor. We now had an idealized piece of hardware on which we could run multiple operating systems — a thin layer that looks like a piece of hardware, on which we can run virtual machines. And that gives us some isolation. Now we can run applications in their own virtual machines, so each application is isolated: if one application crashes, it doesn't affect the others. But it's extremely inefficient, because we have this red bit at the bottom here — the operating system, the kernel.
And you know that when you install a virtual machine, you pretty much have to install the entire Debian stack, or the entire CentOS stack, or the entire Windows stack. So it's not very efficient at all. There's still the same tight coupling between the operating system and the application, and as anybody who's tried to manage lots and lots of virtual machines to provide isolation knows, it's hard. The new way is containers. In this case, we move up a layer: we move above the operating system and provide an idealized operating system — no longer idealized hardware, but an idealized operating system — on which we can run the dependent libraries. The libraries here are part of the container. The container has an application, all of its dependencies, its entire environment. So we can move this container around anywhere we want: from one machine to another, from one runtime to another, from a laptop to a virtual machine, from a cloud to a bare metal server, to a set-top box — ultimately maybe even to a phone, when we have Docker on it. Let's look at an example. So we have our application, PHP and Apache. It should be Python and Apache, sorry. See? I do apologize. Wherever you see PHP in this deck, read Python; we'll change it before I share the slides. I tried to think of what could offend a Python audience most, and it's probably talking about PHP, right? OK. So we have containers, and we want to run these components of our application: Python and Apache, Memcache, MySQL. Not Apache, obviously — Python and Flask and Bottle and all of the other things we could potentially use. MySQL has its own libraries; it doesn't have any libraries in common with the others, so we're going to pack those libraries into the container in which MySQL runs. And Memcache, and Python and Apache — Python and whatever, Gunicorn, anything — they have their own dependencies.
But they also have, when we install them, some shared dependencies as well — some commonalities. So when we actually create the images, we can share some layers between them. But that's not shared at run time: when we create the container, each has its own dependencies packaged together inside it. And underneath that, we have a server — and again, this could be a virtual machine, a laptop, a bare metal server, pretty much anything. Underneath that, we have the actual hardware. All of this is maintained by a Docker engine; Docker is the thing that runs it. When we talk about containers it's mostly synonymous with Docker nowadays, but again, there are other container formats, and hopefully they will all comply with a standard — that's the nirvana we're all heading towards. So Docker effectively controls the creation and the management of these containers. At the end of it, we'll have Python, Flask, Angular, Memcache and MySQL all running in containers. So why containers? There are many important reasons, but you can see just by looking at what we do that it's the only way we can do it at our scale; we can't do it any other way. It's the perfect solution for the kind of scale that we want, but it's also perfect for smaller scale as well. Why? Because it's much more performant — in the sense that we don't have to do all of that installation work. Containers run pretty much as if they're on bare metal, so the performance is pretty much the same as a virtual machine, but they're much quicker to get up and running. Which means you can swap them out quicker, do upgrades quicker, do pretty much everything quicker. Repeatability.
So the whole problem where we have development, QA, build, test, production, and we want repeatable environments — the situation where we test something in QA and it works, then run it in prod and it fails. How many people have had that situation? You have your head in your hands remembering those days, right? What containers give us is a consistent environment, because the environment is exactly the same: when we run it in QA and when we run it in prod, it's exactly the same environment. That's one of the great use cases of containers today. But even bigger is the portability, which we'll talk about in a second. Quality of service: we can now do resource isolation as well. Using things like cgroups and namespaces in Linux, we can actually isolate resources. We can say we only want this container to have 100 megabytes of RAM, 100 megabytes of disk, 0.1 of a CPU. And ultimately accounting: these things are easier to manage, easier to trace, easier to audit. They're small, composable units that can be tracked very easily. And ultimately, portability. You can move these things around from one cloud provider to another — images, specifically. You can't just pick up a running container and move it, but you can easily run the same container in a different cloud provider, on a bare metal machine, on a laptop. You can move them from one machine to another as the shape of your cluster changes, to be more efficient. So we can go back to what we had before with the efficient allocation of resources — we can do that if we have containers. Ultimately, this is a fundamentally different way of building and managing applications. So, demo. I'm not going to do this demo — I left that slide in by mistake. This would have been a containers and Docker demo.
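In place of the demo, here's a minimal sketch of what containerizing one of those Python front ends might look like. The file names, base image tag and port are assumptions, not anything from the talk:

```dockerfile
# Package a small Python web app with all of its dependencies.
FROM python:3-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt   # e.g. Flask or Bottle
COPY app.py .
EXPOSE 8080
CMD ["python", "app.py"]
```

You might then build and run it with something like `docker build -t hello-web .` followed by `docker run -m 100m --cpus 0.1 -p 8080:8080 hello-web` — the `-m` and `--cpus` flags are the cgroup-backed resource limits mentioned above (`--cpus` needs a reasonably recent Docker).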
It's very easy to find a tutorial and get up and running with it, so let's not talk about that — let's talk about Kubernetes instead. How many of you have heard of Kubernetes? How many of you can say 'Kubernetes'? It's a hard word to get your tongue around — probably easier if you're Greek, because it's a Greek word. If you want help pronouncing it, I'll be at the Google booth after this talk, so I can definitely provide assistance on that. Maybe I'm saying it wrong. Maybe I've been saying it wrong all this time. I'm happy to be corrected. So, Kubernetes — let's talk about that. I've given you an introduction to what we do at Google, so that should provide the context on why Kubernetes is necessary. Something we often miss in a lot of talks is that we don't really provide that kind of context, so I'm hoping the introduction to Borg has provided that for you. Kubernetes is a Greek word meaning helmsman, and it's the root of the word 'governor', for some reason — Arnold Schwarzenegger's title of governor comes from the same root as Kubernetes. It's effectively an orchestrator, a scheduler, for Docker containers — and ultimately for other forms of containers; I think CoreOS is already using it to schedule and orchestrate rkt containers. It supports multiple cloud environments — I always forget the list; VMware, even Microsoft are involved. You can run Kubernetes on Amazon; you can run it pretty much anywhere. You can run it on your laptop with Vagrant: just 'vagrant up' a four-machine cluster of virtual machines and you'll have a Kubernetes cluster. And eventually we may have a situation where we can run Kubernetes across multiple cloud providers. It might be difficult, but it might be possible — one day your fleet of machines could be running in Google, in Amazon, and in Microsoft Azure as well. It's possible; I'm not sure what's going to happen. So this is inspired and informed by everything we saw previously.
Everything with Borg, and based on our experiences. It's open source, written in Go — like many good programs nowadays — but with complete respect for Python. I love Go; I love Python. I used to be a Java developer: I spent 15 years — no, 11 years — developing in Java. Then I moved to Google and I haven't written a line of Java code since. It's like Java Programmers Anonymous, right? It's been four years since I wrote my last line of Java code. Now I write in Python, and in Go, and in Angular and JavaScript, and all of those more interesting, useful languages. Java is getting better — Java 8 is a big step forward. Ultimately, we want to be able to talk about managing applications, not machines, which is what we talked about earlier. And some very quick concepts — I'm not going to introduce them in depth, but I want to show you the icons so that when you see them, you'll know what they mean: container, pod, service, volume, label, replication controller and node are the key concepts. How many of you are familiar with SaltStack? How many of you like the terminology in SaltStack, like grains and such things? I think it's really hard to get your head around. One of the dangers of an abstraction is that you get too far away from terms that are familiar to people. Most of these are familiar: a service, the idea of a replication controller, a node, a label, a container. The pod is probably the most difficult one to get your head around. Let's talk about pods — no, let's talk about nodes and clusters first. We have a cluster, which maps back to what we talked about earlier with Borg: we have a master, and the master has a scheduler and an API server that can be used to talk to nodes. The nodes are all running a thing called a Kubelet, and they have these things called pods running containers. We'll talk about pods shortly.
They also have a proxy by which we can expose our running containers to the outside world, and we have many nodes. A cluster is an abstraction, so it could look different depending on which cloud provider you're using. Ultimately, what you want is a fabric of machines that looks like one flat space in which we can run containers. You don't care about the details — you just care that they're all joined together into one big flat space in which we can run stuff, and we'll let the scheduler take care of running things for us. The options for clusters are: a laptop, multi-node clusters, hosted or self-managed, on-prem or cloud-based, using virtual machines or bare metal. Many, many options. There's a matrix at a short link down here — hopefully we can share these slides afterwards — which shows how you can run Kubernetes on whatever you want to run it on, say CoreOS on Amazon. We have different ways of doing the networking, and networking is quite tough. Google Compute Engine makes it easy because of its IP addressing, but often we have to add another layer, called Flannel, to provide the ability to give an IP address from a group of subnets to a running machine or running pod. So let's talk about pods. How many of you are familiar with the concept of pods? Okay, not so many of you. In the diagram here, we have a pod. It has a container — in this case a web server — and it has a volume. Docker containers can have volumes too; these are a little bit different, but very similar. We want to run this web server, and the construct we use within Kubernetes is to create a single pod. It's like a logical host: if you wanted to run Apache and something else alongside it, you would run them on the same host machine — that's what a pod is. This is the atomic unit of scheduling for Kubernetes. This is what Kubernetes schedules.
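As a concrete sketch, the smallest thing you hand to Kubernetes is a pod manifest wrapping a single container. The pod name, image and port here are made up for illustration; the resource limits echo the numbers used earlier in the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-web
spec:
  containers:
  - name: web
    image: example/hello-python:1.0   # hypothetical image
    ports:
    - containerPort: 8080
    resources:
      limits:
        cpu: "0.1"        # 0.1 of a CPU
        memory: 100Mi     # 100 megabytes of RAM
```

The scheduler decides which node's Kubelet runs this pod; you never name a machine.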
We talked about jobs earlier when we looked at Borg; Kubernetes schedules pods. Your containers run inside the pod — a thin wrapper around them. These are ephemeral. I've got this analogy: everybody uses the pets-versus-cattle analogy, and I don't really like it because I'm a vegetarian, so I use crops versus flowers. Pods are like crops: you don't care about them individually. You have a wheat field; you don't care about the individual plants that are growing. When you have flowers, you probably give them names, and you water them and talk to them as well — you care about them. You don't care about your crops, though. So pods are like crops: they can come and go, they can be replaced. And the thing is simple now: if you want to run a single container, you don't have to worry about creating a pod. You just say, run this container for me, and it will create a pod for you. You still have to think in terms of pods when you're doing monitoring, but you don't have to create one yourself. So pods are an abstraction — a little difficult to get your head around. A little more information about them: imagine a scenario where you want something that synchronizes with GitHub, maybe a push-to-deploy scenario where whenever your developers merge into GitHub, you want those changes immediately pushed out to production, or maybe to your staging servers. So you have a thing called a Git synchronizer, and it's talking to Git, monitoring your project. It pulls down any changes and writes them to somewhere on disk, and your web server can then serve that latest content. Those things are tied together: they work together and it makes sense for them to run side by side. When one goes away, the other goes away. So we can run them both in the same pod. Now we're saying: on this logical host, this pod thing, let's run two containers.
In this case, a Git synchronizer and a Node.js app — or a Python app. And we have a shared volume, a concept we'll talk about shortly. These are tightly coupled together, so when the pod dies, they die together. It might make sense to run them separately in the way you architect things, but it doesn't have to. They share the network space and port space — they have the same concept of localhost — and they are completely ephemeral. Think in terms of things you would run together on a single machine. So, a volume — what's a volume? I don't normally talk about volumes, but they are very important, so not talking about them seems a bit stupid, really. A volume is basically bound to the pod that encloses it, and it's where we can write data to or read data from. We have many options when it comes to volumes. Docker already has volumes; these are slightly different, but very similar. To a container running in the pod, the volume looks like a directory. What they are, what they're backed by and where they're mounted is determined by the volume type. The first type we have is an empty directory: whenever we create the pod, it creates this space somewhere on the local disk, and the containers can share that volume between them — but it lives and dies with the pod. It only exists while the pod is there. So your Git synchronizer could be writing stuff to this volume, to be read by the Apache server or whatever server, and you don't care that the space goes away when the pod goes away. It's just scratch data, temporary data; there's nothing stored there that's important to you. It can even be backed by memory — a tmpfs file system — which is great: really efficient, much faster as well. So that's what an empty directory is. I don't know if it's actually the default — it's a type you specify. So an empty directory is one of the options.
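A hedged sketch of the git-sync pod just described — two containers in one pod sharing an emptyDir volume. The image names and mount paths are assumptions, not anything from the slides:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-git-sync
spec:
  volumes:
  - name: content
    emptyDir: {}              # scratch space; lives and dies with the pod
  containers:
  - name: git-sync            # pulls the latest commit into the shared volume
    image: example/git-sync:latest
    volumeMounts:
    - name: content
      mountPath: /data
  - name: web                 # serves whatever the synchronizer wrote
    image: example/python-web:latest
    volumeMounts:
    - name: content
      mountPath: /srv/www
      readOnly: true
```

Because both containers are in one pod, they share localhost and are scheduled, started and killed as a unit.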
The next one is hostPath, where we can map part of the file system of the node on which the pod is running into the pod. This volume is effectively a link into the file system of the actual machine, not a snapshot. That's useful for reading configuration data and the like. But it's also somewhat dangerous, because the state on one node may differ from another, so whenever the scheduler runs the pod on a different machine, it may see a different view of the world. The pod is no longer completely isolated. It's a slightly dangerous thing to do, but it might work for you. Then there's NFS, and similar services like GlusterFS, which I can never pronounce; anything with a G on it, I can't say for some reason. So, again, we can mount NFS paths into our pod and expose them to our containers as directories. Or we can use a cloud provider's persistent block storage. We call them persistent disks at Google; Amazon calls it Elastic Block Store, that kind of thing. So containers can read and write data on the disk, and it will still be there whether the pod goes away or not. What we're likely to do in this case is create a disk in the cloud provider which stores data, and mount it onto the pod. Whenever that pod goes away, the data is still there, and another pod can come along and mount it as well. And with Google Cloud Platform, you can even mount it read-only on multiple pods at once. So, some patterns for pods. The first is the sidecar pattern, named for a motorcycle and sidecar. In this case the Node.js application, or the Python app, you don't get offended when I say Node.js, is the motorcycle, and the Git synchronizer is the sidecar.
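The volume types just mentioned look roughly like this in a pod spec. The server names, paths, and disk names are made up; the field names follow the Kubernetes v1 volume types.

```yaml
# Illustrative volume definitions for the types discussed above.
volumes:
- name: node-config
  hostPath:
    path: /etc/app-config          # maps a directory from the node itself into the pod
- name: shared-data
  nfs:
    server: nfs.example.internal   # hypothetical NFS server
    path: /exports/data
- name: durable-data
  gcePersistentDisk:
    pdName: my-data-disk           # disk created beforehand in the cloud provider
    fsType: ext4
    readOnly: true                 # read-only disks can be mounted by multiple pods
```

Each entry is then mounted into a container via a matching `volumeMounts` entry, just as with emptyDir.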
That makes a lot of sense, right? Next is the ambassador pattern: a secondary container that acts on behalf of the main running container. In this case it's a Redis proxy that lets the PHP application make calls to one place and have the proxy route them to the shards. So the PHP application calls a single service for reads and writes, and the Redis proxy does all the hard work of deciding whether to read from a master, read from a slave, or write to a master. And the final one is the adapter pattern. Here we have Redis running and we want to monitor it. We want to monitor all of our pods, but we need a common format for monitoring, so we adapt the output of Redis's monitoring using an adapter container, which plugs into the monitoring system. It adapts what's happening inside the container to what the monitoring system expects. These are examples of where it makes sense to have a pod. I'm hoping it does make sense, and I'll be interested to hear from you afterwards about whether pods make sense to you. So, labels. Labels are basically the single grouping mechanism within Kubernetes. They allow us to group things so that we can build applications like a dashboard. We have a running pod and we give it a label. Labels are key-value pairs, in this case type equals fe, completely arbitrary metadata. Some labels are meaningful to Kubernetes, but mostly they can be anything that's meaningful to you. So we put labels on pods, and I can build a dashboard application that uses the API to say, give me the pods with this label, and show you the status of all of them. And we can have different labels on different pods. In this case we have a version 2 pod, and a different dashboard application selecting on that. Pods can have many labels. And I surprise myself with my slides sometimes.
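Labels on a pod are just arbitrary metadata in the manifest; a minimal sketch, with illustrative names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-1
  labels:
    type: fe          # arbitrary key-value pairs, meaningful to you
    version: v1       # a pod can carry many labels at once
spec:
  containers:
  - name: app
    image: example/frontend   # hypothetical image
```

A dashboard, or anyone using the API, can then select on those labels, for example `kubectl get pods -l type=fe`.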
Labels make more sense with replication controllers, because replication controllers are the things that actually manage the running of pods. Remember I said before that we created 10,000 tasks, pushed them out to persistent storage, and the Borg master's scheduler came along and said: these should be running but they're not, I'll fix that. This is the same thing. The replication controller is responsible for maintaining your desired state. You say: this is the way I want it to be. I want X number of these pods based on this container template, and I want you to maintain that state for me. That is the job of the replication controller. They work on a constituency defined by a label selector. In this case, version equals v1 is what this one selects on: it's responsible for all pods with that label, and we tell it we want two of those. Its job is to make sure there are always two running. We also have another replication controller for v2 of our pod, version equals v2, and we only want one of those, so it makes sure there's always one running. The way it works is as a control loop. The replication controller is one big control loop, simple as that. It says: look at the desired state; how many have we got running? We should have four running. We've got three running; that's not good, let's start one. We've got five running; that's not good, let's take one away. It continuously monitors the state to keep the number of running pods at the desired count. It also works with a template: we provide a pod template, which contains the container image definition, say how many we want to run, and pass that into the replication controller.
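Putting those pieces together, a replication controller manifest carries the desired count, the label selector, and the pod template. A sketch in the Kubernetes v1 shape, with hypothetical names:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: frontend-v1
spec:
  replicas: 2               # desired state: keep two of these pods running
  selector:
    version: v1             # constituency: all pods carrying this label
  template:                 # pod template used to start replacements
    metadata:
      labels:
        version: v1         # template labels must match the selector
    spec:
      containers:
      - name: app
        image: example/frontend:v1   # hypothetical image
```

The control loop compares the number of pods matching `version: v1` against `replicas: 2` and starts or stops pods until they agree.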
It doesn't create the pods directly, but when we create the replication controller and say we want two of these pods, it sees there aren't two running and starts them. That's how it works. We can also plug in a replication controller after we've created the pods and say: you're managing the pods with this label. And finally we get to services. Services are how we actually expose our running stuff. A service creates a virtual IP address with a constituency of pods chosen by a label selector. Again, we'll have labels on here; we'll show it on the next slide. Basically, the pods with a certain label are the constituency of this service, and when requests come in from clients, the service load-balances them across the running pods, regardless of which node they're on. There could be 10,000 nodes, and 10,000 pods running on different nodes, and it would load-balance across all of them. At the moment it only does round-robin, but eventually it will have much more intelligent support for load balancing. This is used for exposing internal services within Kubernetes, and also for exposing running services to external clients, which we'll see shortly. It provides not only a virtual IP address but also a DNS name, so we can do service discovery. And I want to move on. So this is a canary example. Who understands the concept of a canary? Okay, a few of you. Basically, when you have a running application and you want to try out a new version of it, you run one or two instances of the new version alongside the old. Some of your traffic is pushed to the new version, some goes to the old. You can then do A/B testing against them to make sure the new version works. If it doesn't, you can roll it back; if it does, you can push out the change everywhere.
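A service manifest in the same style, again a sketch with hypothetical names and ports, following the Kubernetes v1 API:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend        # also becomes the DNS name used for service discovery
spec:
  selector:
    type: fe            # constituency: requests are load-balanced across pods with this label
  ports:
  - protocol: TCP
    port: 80            # the stable virtual IP listens here
    targetPort: 8080    # the port the pods' containers actually listen on
```

Clients only ever talk to the service's virtual IP or DNS name; which pod, on which node, answers a given request is the service's business.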
This is a similar situation: we have version equals v1 and version equals v2 replication controllers and pods, but all the service cares about is the label type equals fe, so its constituency is all three of these pods, even though they're managed by different replication controllers. That's how it works: a virtual IP address exposes them to a client. And if we map out the whole Kubernetes picture, it looks like this. We have pods; remember all the symbols? This is why they're important. That's a pod and a volume and a service, a memcache pod with a service, and a replication controller with a service. How does that look to a developer? Remember how it looks to a developer at Google? This is how it looks here. They specify a name, and they specify the image. This is a Docker image now; it could be a different image format in the future, for a different type of container. Yes, PHP; I left it in deliberately just to upset you. You can specify resources: 128 mebibytes of memory. And you can specify how much CPU. Kubernetes unfortunately has its own idea of slicing up a CPU, which I'm not going to get into, but this is 500 millicores of a CPU in Kubernetes terms, so you have to read the manual for that, otherwise it won't make any sense. It's like a percentage, but a percentage doesn't really work, because you can't have a percentage of a core when you don't know how powerful the core is. So that's how we specify CPU. Then the ports, the protocol, TCP, and the replicas: one, or maybe 10,000; again, we cover that case as well. So that's how it works within a replication controller, and there are other configuration files for services. As for scheduling: we saw the complexity of scheduling at Google, and it's a bit simpler in Kubernetes currently. It's based on pod selection, so we want the pod running based on the selectors, and it's based on node capacity: how much capacity does that node have, and is it capable of running my pod?
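The developer-facing config just described, with the 128 mebibytes of memory and 500 millicores of CPU, would look roughly like this. The names and image are illustrative; the resource notation is the standard Kubernetes one:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: hello-frontend
spec:
  replicas: 1                      # one replica, or maybe 10,000
  selector:
    name: hello-frontend
  template:
    metadata:
      labels:
        name: hello-frontend
    spec:
      containers:
      - name: hello
        image: example/hello-php   # hypothetical Docker image
        resources:
          limits:
            memory: 128Mi          # 128 mebibytes
            cpu: 500m              # 500 millicores, i.e. half a core
        ports:
        - containerPort: 80
          protocol: TCP
```

The `m` suffix is the millicore notation mentioned above: `1000m` is one core, so `500m` is half a core regardless of how fast that core happens to be.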
If multiple nodes can run my pod, it runs on the one with the least resources consumed by running pods. That's the priority. In the future we'll have resource-aware scheduling, so we can do the kind of thing we do back at Google, where we try to get maximum utilization out of our CPU and memory. Kubernetes is 1.0 as of this week: it was announced on the 21st of July at OSCON in Portland, Oregon. It's been open source for over a year now. And we have a product called Google Container Engine which I'm going to talk about shortly; not so much, but it is a good way of running Kubernetes, and this is not a product pitch. The roadmap for Kubernetes is kind of sparse at the moment, because we've just gone through 1.0, so they're now deciding on the roadmap for the next release, v1.1. The one thing on the roadmap currently is autoscaling: the ability to autoscale your nodes dynamically based on the amount of work you have. Container Engine is a managed version of Kubernetes, and it manages the cluster for you. You don't have to worry about the master in this case; it takes care of the master for you. You can't even see the master; you can't connect to it. One of the problems we have at the moment with Kubernetes is high availability: we don't have the replicated-master scenario we saw with Borg, so the only way to get high availability is to run multiple clusters. But if we look after your master for you and make sure it's running, then you don't have to worry about it. We make sure your cluster is highly available by making sure your master is always running. We can resize clusters using things called managed instance groups, which we'll look at in a minute. And there's centralized logging.
We can pull all of our logs into one place in the Google Developers Console, and it also supports VPN, so you can have your pods inside your own private network. So, a demo, very quickly. We had to change the setup earlier to make all this work. This is a cluster: kubectl get nodes shows we have two nodes running. These are machines in our cluster, and I can look at them here. This is the Google Developers Console, and I can probably make that a bit smaller. If I go into VM instances here, I can see my running machines. I have a couple of other machines as well, but these two in the middle are the nodes of our cluster. I have this thing called an instance group, which has two instances; this is the thing that manages the size of our cluster. And below here we have container clusters: we can see we have one cluster. If we go to, wait a second, I've got very little screen real estate, so I can't see everything that's going on. We can see a representation of what's running currently. These are pods. This is a pod, this is a service, and this is another service. MySQL is not running, which is a real pain; I'll have to run it. I don't know how that happened. So we have a frontend service, we have a memcache service, and we have MySQL, but no pod running, so I'm going to start the pod very quickly. That's why it's not running: we've just gone to 1.0, so all my demos break. I had it all running, but we had to reboot my machine because we were having problems with the display. Now we have a pod. Have you ever spun up MySQL so quickly? I bet you haven't. The next thing we want to do is run some frontends. Unfortunately, they're PHP currently; I'm trying to find time to update them completely, but I had some problems with Flask and Angular. Anybody else had problems with Flask and Angular? No? I should talk to you. Basically, my badge says my Python skills are rated as three stars, so I probably need to talk to you guys about fixing that.
Okay. kubectl create -f, and we're going to create a controller; we have a file already created, and we create it. Now we have pods and a replication controller. The next thing we want to do is look at the running application, though my windows are all screwed up and I'm struggling. We have it running. This is the IP address of the service, as we can see here, and this is the application running. It says Devoxx. Anybody who's been to Devoxx? You don't want to go to Devoxx. I told them to fix this beforehand, so we have an update, and we can roll it out easily. Let's do that: let's roll out an update to our cluster. I'll close that down so we can see the visualization, and I'll go back through my history for this. I'm going to update to v2 of our frontend controller. What happens now is it creates a new controller, and then it changes those pods one by one to roll out the new version. It creates a 2.0 pod and gets rid of one of the 1.0 pods, so then we have a 1.0 and a 2.0; then it creates another 2.0 pod and gets rid of the other 1.0 pod, until eventually we only have 2.0 pods. Then it gets rid of the 1.0 controller; we don't need that anymore. And we go back to our app. Nothing's working. We should get... I'm hoping it works; I'm hoping MySQL is running properly. OK, that works, brilliant. The other thing I can do, I should mention, and I'm probably getting close on time: what is the command? Is it rc or not? I always forget the scale command. I'm going to take v2 and scale it to six replicas. No, five replicas. And then we go back to our visualization. So now we want to add replicas, and we can do that by scaling like that, and now we have five running pods. It's as simple as that. We can also do that within the Developers Console. And just to wrap up, a quick word about the last bits and pieces.
That's how we visualized it: using the API and kubectl proxy, pointed at some JSON. The JavaScript is all jsPlumb, if you want to know what we used. In terms of container cluster scaling, we have this thing called a managed instance group, and our nodes run within the managed instance group. There's an instance group manager that creates the nodes and is responsible for making sure they're running, so it's actually monitoring the cluster of nodes, and we have a template from which it can create new nodes on demand. That means we can resize the managed instance group very easily. And I think that's about it for cluster scaling. We can also create clusters using tools such as the Google Developers Console, Google Deployment Manager, and Terraform. I was going to give an example, but it's very basic. Terraform will create a cluster for you, but it won't let you resize it; if you want to resize it, you have to replace it completely, which isn't really what you want to do. So you can create clusters in various different ways. And, oh, that's the visualization. Some frequently asked questions are answered in the documentation. I could spend entire hours on each of these subjects, so if you have questions, I'll be at the Google booth all day; come and see me. Kubernetes is open source, so we want your help making it even better. Please contribute to Kubernetes. If you have questions, go to IRC; it's a very popular place. There's also Twitter, and kubernetes.io. You can tweet questions to me, or you can find me at the booth. And that's it. We have time for one or two questions. At the beginning, you were talking about Borg and the five masters that you ran, and about how you build these things. Are those figures based on the data center? Based on the cell. We break it up into cells, and each cell has its own Borg master.
That's about the limit of my knowledge of the complexity of how Borg works; I'm not a SWE on it, sorry. But yeah, that's how it works; that's why we have Borg. Thank you for the talk, very interesting. When you compare VMs and containers: even if the user of a VM has root access, it's very difficult to escape from the hypervisor. How do you see security in the current container implementations? It's a work in progress. This is about security with containers, and I'm not really going to comment too much on it. Initially, we had problems at the kernel level, with syscalls and such being made back into the operating system, but it's getting better, so Docker and the like are becoming more secure all the time. Ultimately, multi-tenancy, with multiple customers' applications running side by side, may currently not be the best idea, but we have to tackle that. I don't think we're quite there yet, but we're working on it; that's one challenge we need to crack. Can I answer one more question? Is that it? That's enough. Come and find me outside; we can talk about PHP. Python, sorry. From the organization, we want to thank Mandy and give her a present. Thank you. Oh, no. That's wonderful. Ah, fantastic. Exactly what I need. Thank you very much. Thanks for having me.