Okay, I guess that's my cue to get started a little bit early. How's everybody doing? Good? Are my slides up yet? So, I'm going to be talking about how, in a world of ephemeral containers, we keep track of things: essentially, how do I keep track of state in a world where containers can just disappear?

My name is Ian Lewis. I'm a developer advocate at Google, on the Google Cloud Platform team, and I'm based here in Tokyo, so if you ever come back, hit me up. I'm on Twitter; my ID is IanMLewis. It's really original.

Today I want to talk about a few things. First I'll give some context about clusters and ephemeral containers. Then I'll talk about the problem of keeping track of things, why it's a problem, and several different patterns and ideas for solving it, with some concrete examples. Then I'll wrap up at the end.

Just to set expectations: I'm not going to give you a silver bullet, something you can take home and say, "Okay, if I do this, my state will be okay and everything will be all right." There's no silver bullet. There are a number of different patterns and trade-offs you can use to deal with state, and you actually use different patterns for the different types of state you have in a container or a cluster of containers.

So, ephemeral containers. Just to get an idea of the room: how many people actually know what containers are? Okay, pretty much everybody. How many people have actually used Docker, played with it, run the docker command? Okay. How many people are using it in production? Okay, a few. So how far down this rabbit hole should we go?
Are you using it with a full CI, click-to-deploy kind of setup? Who has that? Anybody? Two people, awesome. These guys are earning their salaries at their companies.

So what are ephemeral containers, or what are containers in general? Containers are a technology for encapsulating things and making them easier to deploy. But what does the ephemeral part mean? I'm going to hand-wave over the part about convincing you to use containers; I'll just assume we all want to use them and we're all on the same page about using containers versus not using them.

Containers give us a number of things. They're a great way of providing isolation between jobs so they don't interfere with each other. This is resource isolation, and isolation in the sense that jobs can't see each other; it isn't security isolation. Containers also make other things, like life cycle management, service discovery, setting up services, monitoring, and health checking, a lot easier.

But what are ephemeral containers? They're essentially disposable: things you can throw away. That's all ephemeral means: temporary. So if you store state on something temporary, your state is also temporary. Say that state is your customers' credit card details, and those containers just go away.
That's important information, and your customers wouldn't be very happy about losing it.

I'm going to be talking mostly in the context of Kubernetes and Google Container Engine, because that's what I know best; working at Google makes you a bit biased in your technology choices. I'm going to talk about storing state in the context of running a bunch of containers in a cluster, that is, running containers on a bunch of different machines, not just on your local laptop.

Kubernetes is basically a container orchestrator, a cluster orchestrator. It runs on a bunch of different machines and creates a cluster for you that you can run containers in, and it schedules the containers for you: if you say "run this container," it decides which physical machine, or which VM depending on the environment, to run it on. It's based on Google's experience running containers internally; we run pretty much all of our services in containers. And Kubernetes is open source for real: it's on GitHub, and everybody can use it.
That's not true of some other projects, which shall remain nameless, that come out every once in a while. It's really cool to go to the Kubernetes GitHub page and look at the issues and the discussion going back and forth between the developers, because it gives you real insight into how people at Google think, and how people at other companies who are going all in on containers think.

Kubernetes introduces a few new terms. The first is the pod. A pod is a set of containers that are scheduled and run together on a single host; it's the atomic unit of scheduling within the cluster. So you don't actually say "run this container," you say "run this pod," and the pod has a number of containers defined within it.

We also have the replication controller, which is something that babysits your pods. You basically say, "I want this many pods," and you give it a template, a kind of cookie cutter that lets it create the right number of pods.

Kubernetes also provides something called a service. A service creates a single endpoint, basically a virtual IP and a port that you can connect to, and connections to it are load balanced across all the pods that actually implement the service.

And then there are labels. You can attach labels to any resource, but mostly you add them to pods, and then you use them as part of a selector when you create a service. The service says something like: pods labeled app=nginx, or whatever, are what's in my
service. It uses the labels to select all the pods it should route traffic to.

So Kubernetes is basically a way of managing a cluster of machines, and it's very declarative: you say "here's the desired state of the cluster that I want," and it figures out how to make it so.

Let me give you a little demo of that. This is going to be pretty small on here. So here's Container Engine. This is the easy button. You can run Kubernetes on bare metal, on AWS, or on whatever you like, but the easiest way is to do it on Cloud Platform, so if you want to try it out, it's much easier to do it here. You come into the console and create a cluster. I've done the cooking-show thing and created one already, but to show you how it works: you give it a name, the zone where you want to put it, the machine type, and the number of nodes you want to run. It creates the nodes, installs Kubernetes on them, and has the cluster ready for you to access.

To access the cluster you use what's called the kubectl command, and you can say things like kubectl get pods. kubectl is basically just an API client: Kubernetes has an API that you can hit to get all the information you need about which pods, services, and so on are running. Right now I don't have any pods running.

So I'm going to start up my service, which is a guestbook application. First I create a MySQL pod and service, and a memcache pod and service. Can I maximize this? Yes. Let me make it a little bigger. Then I create the replication controller for my front end.
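The two ideas just described, a service picking its pods by label and a replication controller driving the cluster toward a declared replica count, can be modeled in a few lines. This is a hypothetical toy sketch, not the Kubernetes API or its controller code: Pod, matches, select, and reconcile are made-up names, selectors are simple equality matches, and the label app=frontend is an assumption.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical toy model of pods, label selectors, and a replication
# controller; this is NOT the Kubernetes API.
_pod_ids = count(1)


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)


def matches(selector: dict, labels: dict) -> bool:
    """Equality-based selector: every key=value in the selector must be on the pod."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods: list, selector: dict) -> list:
    """The pods a service with this selector would load balance across."""
    return [pod for pod in pods if matches(selector, pod.labels)]


def reconcile(pods: list, selector: dict, template_labels: dict, desired: int) -> list:
    """One pass of a replication-controller-style loop.

    Pods matching the selector are 'owned'. If there are too few, new pods are
    stamped out from the template labels; if there are too many, the extras
    are dropped. Existing pods that are still wanted are kept as they are.
    """
    owned = select(pods, selector)
    others = [pod for pod in pods if not matches(selector, pod.labels)]
    if len(owned) < desired:
        owned += [Pod(f"pod-{next(_pod_ids)}", dict(template_labels))
                  for _ in range(desired - len(owned))]
    return others + owned[:desired]


# Usage: a bare MySQL pod plus a front end scaled from 3 to 5 replicas.
frontend = {"app": "frontend"}
cluster = [Pod("mysql", {"app": "mysql"})]
cluster = reconcile(cluster, frontend, frontend, desired=3)
first_three = [pod.name for pod in select(cluster, frontend)]
cluster = reconcile(cluster, frontend, frontend, desired=5)
print([pod.name for pod in select(cluster, frontend)])
# ['pod-1', 'pod-2', 'pod-3', 'pod-4', 'pod-5']
```

Scaling from three to five keeps the three existing pods and only adds two, which is the declarative behavior described above: you state the count, and the loop works out the difference. The MySQL pod is ignored by the front-end selector because its labels don't match.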
The front end is a PHP application, and it gets a service too. Each of these is a different application: one is MySQL, one is memcache, and one is the PHP front end, and each has a service associated with it so that they can talk to each other.

So what does this look like in the cluster? Here's a visualizer that shows the state of the cluster. These are all the pods and replication controllers I just created; this front-end part is a service, and this over here is the replication controller, my little babysitter for my pods. I don't have a replication controller for these other ones. You can create bare pods, though that's not particularly advisable.

You can also scale the replication controller to a different number of pods. It actually happened too fast to see, but I just scaled it to five pods: it started up two new pods and kept the old ones around. That's what I mean by declarative state. You tell it how many pods you want, and it figures out how to get from where it is to where it needs to be. That's a pretty simple demo; I'll probably come back to it a little later.

So where's the state? Actually, let me go back and show what the application looks like. I created the front end with a public IP, a public load balancer, and here's the actual guestbook. If I blow this up and start typing some messages, I can save them in the database, then reload and see them. So where's the state of this application? What types of state do we have?
I think there are several pieces of state. Does anybody have any idea what types of state there would be? One, obviously, is the database: MySQL stores the actual messages and that sort of thing. We also saw that we have memcache running, and that's a type of state too. It's a cache that holds cached data, and it's a bit more temporary: it can go away and the application will still run, but there could be performance implications.

So there are those two types of state. There's also configuration for my application: how does my PHP application know where the database is, what the name of the database is, or what the name of the memcache service is? And there's one more type of state that's really interesting: the state of the cluster itself, meaning how many pods are running, what services there are, and so on. That's also a piece of state that lives somewhere. I'll talk a little bit about each of those. So, where is my state?
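The difference between the database and the cache can be made concrete with a cache-aside read path, one common way applications use memcache (the talk doesn't say the guestbook is written exactly this way). In this hypothetical sketch, plain dicts stand in for MySQL and memcache so it runs anywhere; the class name and keys are made up.

```python
from typing import Optional


class GuestbookStore:
    """Cache-aside reads over a durable store (hypothetical sketch).

    db stands in for MySQL (state that must survive); cache stands in for
    memcache (state the app can afford to lose). Both are plain dicts here.
    """

    def __init__(self, db: dict, cache: Optional[dict]):
        self.db = db
        self.cache = cache
        self.db_reads = 0  # counts trips to the slow, durable store

    def get(self, key):
        if self.cache is not None and key in self.cache:
            return self.cache[key]          # cache hit: no database read
        self.db_reads += 1
        value = self.db.get(key)
        if self.cache is not None and value is not None:
            self.cache[key] = value         # warm the cache for next time
        return value

    def put(self, key, value):
        self.db[key] = value                # the database is the source of truth
        if self.cache is not None:
            self.cache.pop(key, None)       # invalidate any stale cached copy


store = GuestbookStore(db={}, cache={})
store.put("msg:1", "hello")
store.get("msg:1")
store.get("msg:1")
print(store.db_reads)  # 1: the second read was served from the cache

store.cache = None     # the memcache container went away
print(store.get("msg:1"), store.db_reads)  # hello 2: still correct, just slower
```

Losing the cache only costs extra database reads, while losing the dict that plays the database would lose the messages, which is why the two kinds of state deserve different storage decisions.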
Oh, yeah, I've basically given that away already. I'll get into the state a little later, but first I want to talk about the difference between containers and images. I think a lot of you are pretty familiar with how Docker works; many of you have probably run it on your machines and tried all the different commands. One of the things Docker does is let you create images: it packages things up in a really nice format that you can put on Docker Hub or wherever. An image is a read-only set of layers of files that get built up. Usually you build the image, push it to a repository, pull it to the host that actually executes it, and then start a container from the image. But when you're working locally, you can start a container, make some changes to its disk, and then run docker commit, which saves that state into the image itself. That's something we want to get away from when we start talking about clusters of machines.

In contrast to images, what are containers? Containers are an amalgamation of a bunch of different features in the Linux kernel. One is cgroups, which restrict the resources a process or set of processes can consume: CPU, memory, disk I/O, and so on. There's also a set of things called namespaces. You can create a namespace for a process or set of processes, which gives it a walled garden of network interfaces, PIDs, users, mount points, and so on. That walls your application off into
a box of its own. Then there's a set of features called capabilities, which limit what a user can do, such as whether it can mount file systems, kill processes, or change file owners. If you use these all together, you can do a lot of really cool things, which is basically what containers do: run processes that have resource isolation between them and can't see each other unless you want them to. Each one thinks it's running on its own little box, a box within a box, so it's kind of like Inception. But they're essentially just processes running directly on the host.

So when we use Docker and start up a bunch of containers from the same image, each container gets its own read-write layer, its own local file system that it can read and write, but each one is essentially a copy of the original image. So you can't expect each one to write data to its file system, save it, and then be restarted somewhere else: each one has different state, and when you restart them, the containers all start from the same image.
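The copy-on-write behavior described above can be modeled with Python's collections.ChainMap, where each container is an empty writable dict layered over the same read-only image. This is only an analogy for how container file systems behave, not how Docker's storage drivers are implemented, and the file paths are made up.

```python
from collections import ChainMap
from types import MappingProxyType


def start_container(image: dict) -> ChainMap:
    """New container = fresh, empty writable layer over the shared read-only image.

    ChainMap writes always go to its first mapping, so the image underneath is
    never modified; it is also wrapped in a read-only proxy to make that explicit.
    """
    return ChainMap({}, MappingProxyType(image))


image = {"/etc/app.conf": "mode=prod"}  # hypothetical image contents: path -> data

a = start_container(image)
b = start_container(image)
a["/var/data/messages"] = "hello"       # lands only in a's writable layer

print(a["/etc/app.conf"], b["/etc/app.conf"])  # mode=prod mode=prod
print("/var/data/messages" in b)               # False: b has its own layer
restarted = start_container(image)             # "restart" = new container from the image
print("/var/data/messages" in restarted)       # False: a's writes are gone
```

Every container sees the image's files, but anything written at run time lives only in that one container's layer and disappears when a new container is started from the same image.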
And so they all start with exactly the same state. You need a system where each container can start up and figure out what it needs to do to get started and do its work.

There are a number of ways to create state that outlives the container. One is to mount a host directory into the container: on the machine where the container is running, you mount a directory from the host into the container and write to it, and when that container is shut down or restarted, it can remount the same host directory and continue. The downside is that the data is only available on that particular host. You also need a separate directory on the host for each container, because each container has its own state to store, and you need to be able to manage all of those directories, which can get pretty unwieldy.

So rather than relying on host directories, what you really need to do is get the state you want to save outside the container completely: store it somewhere outside the container, and even outside the container cluster, so that no matter where a container starts up again, it can access the data or state it needs. I'll talk about a couple of patterns for doing that in a more correct way.

One is to use network storage that's available to your cluster. The canonical example is running Kubernetes on GCE or AWS, where you can mount a network drive; in Compute Engine that's a persistent disk. You can mount the persistent disk into a particular pod, and that pod can keep running no matter where it's restarted or moved. These disks are mutable, so you can read and write them, they outlive the container, and, very importantly, they're accessible from wherever the container happens to be running. Kubernetes has a number of volume plugins for mounting disks like this: besides GCE persistent disks and AWS Elastic Block Store volumes, there are NFS, iSCSI, and GlusterFS, all of which give you essentially the same thing.

Next I want to talk about a few more patterns for keeping track of things. The first is storing state outside the cluster.
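A GCE persistent disk mounted into a pod is one concrete case of keeping the data itself outside the container. Here's a minimal sketch that builds a Kubernetes v1 Pod manifest as JSON (kubectl accepts JSON as well as YAML) mounting a persistent disk at MySQL's data directory. It assumes the disk already exists in Compute Engine; the disk name "mysql-disk" and the secret "mysql-pass" holding the root password are hypothetical.

```python
import json


def mysql_pod_with_persistent_disk(pd_name: str) -> dict:
    """Build a v1 Pod manifest that keeps MySQL's data directory on a GCE persistent disk.

    pd_name must name an existing Compute Engine disk. The secret "mysql-pass"
    is a hypothetical secret assumed to hold the root password.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "mysql", "labels": {"app": "mysql"}},
        "spec": {
            "containers": [{
                "name": "mysql",
                "image": "mysql",
                "ports": [{"containerPort": 3306}],
                "env": [{
                    "name": "MYSQL_ROOT_PASSWORD",
                    "valueFrom": {"secretKeyRef": {"name": "mysql-pass", "key": "password"}},
                }],
                # MySQL writes its data files here, so this is the path that must outlive the container.
                "volumeMounts": [{"name": "mysql-data", "mountPath": "/var/lib/mysql"}],
            }],
            "volumes": [{
                "name": "mysql-data",
                # Network block storage: it can be attached to whichever node
                # (in the disk's zone) the pod is scheduled onto.
                "gcePersistentDisk": {"pdName": pd_name, "fsType": "ext4"},
            }],
        },
    }


if __name__ == "__main__":
    # Save the output as mysql-pod.json and create it with: kubectl create -f mysql-pod.json
    print(json.dumps(mysql_pod_with_persistent_disk("mysql-disk"), indent=2))
```

The volume name ties the mount to the disk, so if the pod is rescheduled onto another node, the same disk is attached there and MySQL finds its data again. Newer Kubernetes versions usually reference storage through PersistentVolumeClaims rather than naming the disk directly in the pod, but the idea is the same.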
I talked about this a little already: you run the software that stores the state completely outside the cluster. Block stores are one example, where the persistent volumes are stored and managed outside the cluster, however you like, and connected over the network. You can also have MySQL or another database that's managed outside your cluster. Say your company has a bunch of DBAs who are really good at managing databases; it's perfectly viable to just use that. Or you could use a managed service from a cloud provider, something like Cloud SQL or Cloud Bigtable.

The second pattern is to adapt your database to run in the cluster.
This is more along the lines of mounting persistent volumes. Most software expects to be able to access a local file system, so you give it a file system whose state is actually stored elsewhere. In practice this means running MySQL or another database on the cluster itself.

The third pattern is a cluster-native application: something that was originally designed to run in clusters and replicates the data enough times that it doesn't really matter if containers get moved around a lot, because the data is available no matter what. This is pretty challenging, but it's definitely a pattern you can use.

So the first pattern, running outside the cluster, looks something like this: your containers run inside the cluster and talk to a service that's outside the cluster and completely managed outside it. Adapting the database to run in the cluster means running it in a container inside the cluster, reaching it through a service for service discovery, and having it store its on-disk data on a volume or something analogous. Lastly, the cluster-native approach looks something like this: you have a bunch of replicas of your database running, and each piece of data is replicated a certain number of times, which gives you a more highly available database. There are a number of drawbacks, though. In practical terms you still need volumes attached to these, but theoretically you could have something that runs without volumes.
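In the cluster-native pattern, replication only protects you if the replicas are spread across physical hosts: with a quorum of three, no two replicas should share a host. This hypothetical sketch (the function names are illustrative, and in practice you would have the scheduler do the placement rather than hard-code it) spreads replicas one per node and checks whether a majority survives a single host failure.

```python
# Hypothetical helpers illustrating replica spreading; not a Kubernetes feature.

def spread_replicas(replicas: list, nodes: list) -> dict:
    """Place each replica on a different node (a simple form of anti-affinity)."""
    if len(nodes) < len(replicas):
        raise ValueError("need at least as many nodes as replicas to keep them apart")
    return dict(zip(replicas, nodes))


def quorum_survives(placement: dict, failed_node: str) -> bool:
    """True if a strict majority of replicas is still running after one node dies."""
    alive = sum(1 for node in placement.values() if node != failed_node)
    return alive > len(placement) // 2


nodes = ["node-a", "node-b", "node-c"]
spread = spread_replicas(["db-0", "db-1", "db-2"], nodes)
print(all(quorum_survives(spread, n) for n in nodes))  # True

# Two replicas sharing a host: losing node-a takes out the quorum.
packed = {"db-0": "node-a", "db-1": "node-a", "db-2": "node-b"}
print(quorum_survives(packed, "node-a"))               # False
```

With one replica per host, any single host failure leaves two of three replicas, which is still a majority; packing two replicas onto one host means that host's failure leaves only one.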
A database that ran without volumes would be fully ephemeral, but then you need more guarantees. Say you have a database that replicates data and uses a quorum of three nodes: you have to make sure that no two of those three pods are running on the same physical host, so that if a host goes down, your data isn't lost. I'll get to a concrete example, but basically what you need to do is determine your app's data needs for each kind of state you need to store. For a cache, you might not care as much as you do for your actual database, whether that's MySQL, Cassandra, or whatever you happen to be running.

The merit of the outside-the-cluster pattern is that it's pretty easy: you don't have to deal with the nuances of running in the cluster, and you may already have something like it. You might already have DBAs at your company who manage MySQL machines, do a good job of it, and already have the infrastructure in place, so you can just use that. That's definitely a viable option. Or, as I mentioned, you could use a cloud service. Cloud services are really easy to use because they're completely managed.
You don't have to manage the physical machines, and they provide a lot of useful features like failover and automated backups.

Adapting the database to run in the cluster is another option, and it's really nice because when you run databases alongside other applications, like web servers that mostly use CPU, you can utilize the cluster more efficiently than you otherwise could. You can also restart the process on any node. If you had MySQL running outside your cluster on one machine and that machine exploded, you'd be in trouble; if you've adapted it to run in the cluster, it can be restarted on another machine with a short downtime, and you'd basically be okay. Obviously, with something like this you'll want to test your workload and make sure you get the performance you want from running inside the cluster. Networking, and storing data on a network volume, will give you different performance characteristics than running on bare metal outside the cluster.

For the third, cluster-native approach, here's a shout-out to a project that was started at YouTube called Vitess. Vitess is essentially a sharded MySQL server: it manages a bunch of MySQL shards and lets you interact with them as if you were talking to a single MySQL server. The project provides a number of setup scripts and configuration for running Vitess on Kubernetes, in a container cluster, so it actually works in a
containerized world. It's a really good example of how you'd actually get something like this running. It has a number of internal services that manage the shards and so on, and it's really interesting. The Kubernetes configuration is included in the examples directory of the GitHub repository.

So, to repeat: figure out your app's data needs, and then consider the patterns I've talked about here before implementing them in your application. One thing I talked about earlier was the service abstraction, and you can definitely use it when running things natively in a cluster. You'll probably end up with multiple data services: one for your key-value store, one for MySQL, one for your cache, and so on. You may even have different ones for user data versus other application data or logs. This kind of service-oriented architecture is really the way you want to go.

The last type of state I want to talk about is configuration and secrets. Configuration is a kind of state that usually doesn't change.
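Configuration and secrets reach a container as read-only inputs: in Kubernetes, typically environment variables set in the pod spec for ordinary settings, and files in a mounted secret volume for credentials. This hypothetical sketch shows the container side of both; the variable names APP_ENV, LOG_LEVEL, and DB_SERVICE and their defaults are assumptions, as is wherever you choose to mount the secret. It also shows that the base64 values in a secret manifest are trivially reversible.

```python
import base64
import os
from pathlib import Path


def load_config(environ=os.environ) -> dict:
    """Non-secret settings from environment variables set in the pod spec.

    The variable names and defaults here are hypothetical.
    """
    return {
        "env": environ.get("APP_ENV", "development"),
        "log_level": environ.get("LOG_LEVEL", "info"),
        "db_service": environ.get("DB_SERVICE", "mysql"),
    }


def read_secret_volume(mount_dir: str) -> dict:
    """A mounted secret volume is a directory with one file per secret key."""
    return {
        p.name: p.read_text()
        for p in Path(mount_dir).iterdir()
        if p.is_file() and not p.name.startswith(".")  # skip hidden entries
    }


def decode_secret_data(data: dict) -> dict:
    """Values in a secret manifest's data field are base64: encoded, not encrypted."""
    return {key: base64.b64decode(value) for key, value in data.items()}


print(load_config({"APP_ENV": "production"}))
# {'env': 'production', 'log_level': 'info', 'db_service': 'mysql'}
print(decode_secret_data({"password": "cGFzc3dvcmQ="}))
# {'password': b'password'}
```

Anyone who can read the secret manifest can run the same one-line decode, which is why that file shouldn't be committed to a repository. Reading secrets from files in the mounted directory keeps the application from having to call the API for them.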
Here it's actually immutable state. In Kubernetes you can configure containers with environment variables, passing them directly to the containers when they start. That lets you change behavior based on context: whether you're running in development or production, on Google Cloud Platform versus bare metal or an OpenStack cluster, and other things like log levels or the name of the service for your back-end database.

One thing you don't want to store in your Kubernetes config files or in environment variables is passwords. Passwords, SSH keys, encryption keys, and so on are not things you want to keep in your repository or in a plain text file. For these, Kubernetes has what's called a secret. You do create a file for a secret, but you probably don't want to commit it to your repo, because the data for each individual secret is just base64 encoded, not encrypted. You create the secret in Kubernetes from that file, and then your containers don't need to fetch it through the API: you mount the secret into your container as what's called a secret volume. That's just a directory containing a text file for each key in the secret. It's a good way to let your application read secrets and passwords without having to interact with the API and send passwords over the wire.

So to wrap up, here are a few other resources that you can take a
look at to learn more about containers, state, and how cluster management works. The first is a really good talk by John Wilkes, and the second is the paper we at Google released about Borg, our internal container management system, which Kubernetes was based on. The others cover Kubernetes itself and several patterns in Kubernetes for creating and saving state, including examples of how to run MySQL or Cassandra and how to use secrets.

Sorry about that. Is everybody done taking photos? You're still taking photos? It's okay: if you didn't get a photo, I'll send all of this out so you can get it later.

That's all I had for today. Thanks a lot for coming; I hope you got something out of it. I think I have a few minutes, so I'll take a few questions if anybody has them.

Yeah, so for something like iSCSI, I'm not really familiar with iSCSI itself, especially.
But yeah, moving it to another physical host would have to be managed for you if you were running in a physical, bare-metal environment.

Right, in that case I don't think Kubernetes currently has a way of setting affinity to a particular host, so that would probably be a feature coming in the future. Basically, if you store stuff on the host, you won't have any guarantee that your container will be started on the same host again. It does tend to restart things on the same host, because you might be using host directories or iSCSI volumes tied to a single host, but obviously if the host goes down, it has to move the container somewhere else, and then you're kind of out of luck. So I don't really have a good option for you right now.

Right, there are a couple of things you can do to make processes shut down more gracefully, but obviously if hosts explode, you don't have those guarantees.

Any other questions? Gosh, you guys are really easy. You shouldn't let me off like that; somebody's got to have a hard question. That was actually a pretty hard question. Okay. All right, if you have any questions or think of something later, I'll be around for a little while, so just come grab me and ask away. Thanks a lot.