Hey everybody, my phone says it's two, my computer just said it was two, and the clock in the hall says it's two, so we've achieved distributed consensus, and I think it's time to start this. My name's Bo Ingram, and I'm here today to give a talk entitled Planes, Raft, and Pods. It's going to be a tour of distributed systems within Kubernetes. So a little bit about me: like any millennial, I've got to list my Twitter handle first. I am Biluptuous on Twitter; Bo Ingram was taken, so I had to get fancy. That's my dog Ernie on the right. He is universally recognized as, quote, "the man," and I considered taking out all my slides and just showing pictures of him, but I figured you all would leave and they'd throw me out, and it probably wouldn't be good. So I am a platform engineer at Craftsy. We're based in Denver, Colorado, and we're an online education and e-commerce company for makers and crafters. We sell classes on topics like quilting, sewing, cooking, and photography, and we've got supplies, patterns, and kits to help you make projects, and all sorts of fun stuff. I spend most of my time working on back-end features there, but I also do some of the operations-y tasks sometimes, as well as some of your traditional site-reliability-ish tasks. I'm probably somebody's definition of DevOps, not mine, but somebody's. So I'm big on definitions, so let's start off with: what's a distributed system? In Distributed Systems for Fun and Profit, Mikito Takada says distributed programming is the art of solving the same problem that you can solve on a single computer using multiple computers. Now, that's a little bit of a sarcastic, tongue-in-cheek definition, but a distributed systems problem is a problem that you have on one computer that needs to go to more computers, usually because it's gotten too big. We can say that a distributed system is several nodes, components, pieces, whatever, working together to accomplish the same goal.
So it's KubeCon, but what's Kubernetes? We heard a great talk this morning about all the things Kubernetes can be. Well, for me, I like to go look at the documentation. The docs say that it's an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts, providing container-centric infrastructure. What? I read that four times, and every time I practiced this talk, I had to run through that slide twice just to get the cadence of it right. It's a mouthful, but I like to think back to what the number one definition was this morning, which is that Kubernetes is a flexible platform for running containerized apps. Not word-for-word what it was, but close enough. Kubernetes provides capabilities greater than "hi, I'm a container, please run me." It's got this really awesome feature set: it can handle rolling deploys, do load balancing, you can add health checking, all sorts of stuff. And they're great, and I think a lot of talks cover how Kubernetes does these things, but I want to do something a little different today. I want to peek behind the curtain. I want to see how the components of Kubernetes work together to achieve these things. My question for us today is: how does Kubernetes leverage distributed systems? So to start off with, Kubernetes runs containerized apps. Well, what's a container? A container could be defined as the usage of Linux cgroups and namespaces to provide process isolation with an abstracted file system. Cgroups specify limits on how much memory and CPU a process can use, and a namespace stops a process from interfering with other processes. But like all definitions, somebody has undoubtedly said it better. Amy Chen yesterday said a container is just a baby computer inside another computer. Way simpler than anything I ever said.
A containerized app, though, something we might run in Kubernetes, could be something like an Nginx server or a little microservice, or, if you're feeling stateful, perhaps a Redis instance. So when you come to Kubernetes for the first time, you hear containers, containers, containers: I want to join the container team. But in Kubernetes, the base unit of scheduling, and the way Kubernetes thinks about things, is the pod. A pod is one or more containers that share a unique IP address in the cluster. You're guaranteed to have all of a pod's containers running on the same host. A pod can share disk volumes. This is pretty helpful in many situations. For example, if you've got an app that logs to a file, you could have a sidecar container sitting alongside it in the same pod that handles forwarding your logs off to a log aggregation service, like Logstash or Splunk. The way we typically manage these pods is through a deployment. A deployment is a declarative way of managing pods. We can specify what containers we want to run and how many instances of those pods we want to have in our cluster. Under the hood, deployments use a thing called replica sets to ensure that the requested number of pods is equal to the number of pods running in the cluster for that deployment. So when we create a deployment, which contains the container images we want to run for our pods and how many pods we want to have, Kubernetes is going to handle creating a replica set. This replica set will ensure we have the desired number of pods. The replica set will create our pods, and then eventually our created pods will be scheduled by another component and then run by yet another component. I'm going through this a little fast right now; I'm starting at the top level, and we're going to dive deeper and deeper. So, as we said: Kubernetes is made of distributed components.
When I first got into Kubernetes, what I imagined is that I'd go to my computer and be like, okay, yum install kubernetes. There'd be a binary. I'd try to run it, and it wouldn't be on my path, so I'd put it on my path. I would point it at other nodes, sit back, and relax. My monitor would glow, angels would sing, and I'd bask in the glow of magical container orchestration. Except that's not the way it works. Kubernetes uses distributed systems to provide the aforementioned magical container orchestration. It's made of several components, and each of them is worth its own deep dive. Each of them does its own pretty neat stuff, and each plays a crucial role in running and operating applications. So let's take a look at some of them now. We'll talk in more detail about each of them in a moment, but I want to introduce them to you before we get too far into things. The four components on the slide here, etcd, the API server, the controllers, and the scheduler, make up the control plane. These run on the master nodes of the cluster. So, starting off: etcd. Etcd is a distributed key-value store that serves as the cluster's database. Now, having access to etcd, read and write access, is equivalent to having root access in your Kubernetes cluster. So to control this, we stick a component called the API server in front of it. Other components in Kubernetes talk to etcd through this API server via good old REST calls. Next up, the controllers. The controllers handle routine cluster tasks. They operate on what is known as a reconciliation loop pattern. The controllers will check your desired state, check your current state, diff the two, and then modify your current state so that it matches the desired state. There's a variety of different controllers, and we're going to explore a couple later on. Last up is the scheduler. As you might surmise for something that has things scheduled on it, there's a scheduler.
The scheduler is in charge of scheduling unscheduled pods. On the nodes, kubelet is the star of the show. Kubelet watches for pods that have been assigned to its node, and then runs them. Kubelet is constantly checking the API server, and by extension etcd, as well as its local config, for pods to run. It's constantly going: hey, API server, you got any pods for me to run? Any pods? Any pods? Great. Any pods? Kube-proxy on the nodes handles any networking things needed. It's in charge of maintaining port forwarding and managing networking rules. And since we're running containers, we need some sort of container runtime, whether it's Docker, rkt, CRI-O, what have you. So these are the components of Kubernetes. Etcd is the data store. The API server provides access to it. The controllers handle running routine cluster tasks, and the scheduler schedules. Meanwhile, kubelet is listening for pods to run on each node, kube-proxy is maintaining network rules, and there's a container runtime running containers. So whenever you're talking about the distributed components of Kubernetes, you've got to start with etcd. Etcd explains so much about Kubernetes behavior. As I mentioned earlier, etcd is a distributed key-value store that serves as the database for our cluster. If you're familiar with Consul or ZooKeeper, it's pretty similar to those as well. Distributed implies that there are multiple nodes running, all working together to store data. In order to operate, a majority of the nodes in our etcd cluster need to be up, running, and healthy. We refer to this minimum number of healthy nodes as a quorum. So what happens if there isn't a quorum? What does etcd do? To look at that, we're going to look at everyone's favorite distributed systems concept, the CAP theorem. Yay, CAP theorem. All right, the CAP theorem: it's impossible for a web service to provide all three of consistency, availability, and partition tolerance.
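A quick aside on that quorum idea: the arithmetic is simple enough to sketch in a few lines. The helper names here are mine, not etcd's.

```go
package main

import "fmt"

// quorum returns the minimum number of healthy nodes an
// n-node cluster needs in order to keep operating: a majority.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance returns how many nodes can fail before the
// cluster loses quorum and stops accepting writes.
func faultTolerance(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("cluster=%d quorum=%d tolerates=%d failures\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

Note that a four-node cluster tolerates only one failure, the same as a three-node cluster, which is why odd cluster sizes are the usual recommendation.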
We're going to define consistency as always reading the most recently written value for a given key. We'll define availability as: every request to a non-failing node receives a response. Partition tolerance will be defined as: our system can handle the dropping of messages between two nodes. In the schema provided by the CAP theorem, etcd is a consistent, partition-tolerant system. So imagine a five-node cluster. If three of our nodes take a dive, the other two are going to stop responding to requests until quorum has been restored and we again have at least the minimum number of healthy nodes. So why does Kubernetes use etcd? Well, if your goal is to have a distributed system for running containerized applications, you need something that plays nice with clustered systems. Etcd's docs say, once again, big quote: etcd is designed for large-scale distributed systems that can never tolerate split-brain behavior and are willing to sacrifice availability to achieve it. Split-brain behavior is when you have multiple masters. Each one thinks it's in charge, and each is setting its own state, so you don't have distributed consensus and you disagree on what the state is. Not great for Kubernetes. You can see the CAP theorem referenced in this quote, though: etcd is going to sacrifice availability, not consistency. Now, I think there's an important clarification to make here. Say etcd is running as the Kubernetes data store, drops below the minimum number of healthy nodes, and is forced to sacrifice availability. That doesn't mean that all the applications running on your cluster, your microservices, your Nginx servers, your Redis instances, are suddenly unavailable. What it means is that Kubernetes, the platform, is unavailable. Your ship is dead in the water, and it's not a great state to be in, but your ship's not sinking.
You're still taking traffic, but you can't schedule new pods, add new deployments, or do anything that's going to require consensus in etcd. Your existing applications will still be running and taking traffic. So etcd does the typical CRUD operations, and a couple of other things on top of that, via REST calls or a command-line tool. But the simple interface hides the answers to some complex problems. How do the nodes agree on what the value is? We've got three nodes out there. I alluded to distributed consensus earlier, saying all the clocks said it was two o'clock. But how does that happen? If not done smartly, things can go awry pretty quickly. So how does etcd agree on the value for a given key while still upholding its consistency guarantee? It's the Raft algorithm. That's a really great logo, by the way; I like that. The Raft algorithm is a method of achieving distributed consensus: having multiple distributed servers agree on what the value is for a given key in a fault-tolerant manner. Now, there's a really trivial way to solve this. You can just say, you know what? The answer is always three. Everyone agrees on it. It doesn't matter the time, doesn't matter the date, doesn't matter where you are. Most formal papers on distributed systems say you can't do that, though. I really think that requirement started with some exasperated person somewhere arguing with somebody. But let's take a look at a not-Raft system to illustrate the difficulties in distributed consensus. This system, it's not etcd, and it's been specifically designed to show how distributed consensus can be difficult. Don't go away, implement this, and run it in production. Bad idea. So in our example system, we're going to have three nodes: A, B, and C. Each one holds a single value. Our nodes will receive requests from a client, write to disk, and then broadcast the new value out to all the other nodes. So let's examine some not-great scenarios.
What happens if there are multiple updates to the value at the same time? So here we go: we've got our three nodes, A, B, and C. Everyone agrees the value is X. Everyone's in agreement, things are happy, things are great. And then, oh no! Two clients come in. Client one tells node A: hey, the new value is Y. Client two tells node B: hey, the new value is going to be Z. So A and B each write the new value to disk. A writes Y to disk; B writes Z to disk. Meanwhile, C doesn't know anything yet. C is in for trouble. So A broadcasts the new value Y, and B broadcasts the new value Z. We're going to assume that A's messages arrive before B's, just via some arbitrary ordering. You can see here we've got problems. A and C think the value is Z, whereas B thinks the value is Y. We have not achieved distributed consensus. Not great. So let's take a look at another scenario. In this new scenario, same baseline: A, B, and C are all in agreement that the value is X. Things are going great. Then, oh no! The cluster undergoes a network partition. C is unable to talk to A or B, and A and B can't talk to C. All the messages to and from C are being dropped. Meanwhile, a client comes along, not aware that the cluster maybe isn't in great shape, and says: hey, guess what? The new value is Y. So A broadcasts this out and says: hey everybody, the new value is Y. B gets it successfully and is like, okay, got you. Meanwhile, the messages to C are being dropped, so C doesn't find out. Time goes by, and C recovers. But C never got the messages, because the messages were lost forever. And so A and B think the value is Y, whereas C thinks the value is X. Sad. So seemingly simple systems can fail in all sorts of fun ways when exposed to concurrent operations. In order for consensus to be achieved, we're going to require greater coordination between our nodes. We can't just fire and forget like our terrible example system. This is where Raft comes into play.
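The first failure mode above can be replayed in a few lines. This is a toy reconstruction of the talk's deliberately bad system (no leader, last write wins), not anything resembling etcd:

```go
package main

import "fmt"

// runScenario replays the broken system: three nodes start at X,
// two clients write concurrently, and each node blindly applies
// whatever broadcast reaches it, last write wins.
func runScenario() (string, string, string) {
	type node struct{ value string }
	a, b, c := &node{"X"}, &node{"X"}, &node{"X"}

	// Client 1 writes Y to A; client 2 writes Z to B.
	a.value, b.value = "Y", "Z"

	// A's broadcast of Y is delivered first, then B's broadcast of Z.
	for _, n := range []*node{b, c} { // A tells everyone else: Y
		n.value = "Y"
	}
	for _, n := range []*node{a, c} { // B tells everyone else: Z
		n.value = "Z"
	}
	return a.value, b.value, c.value
}

func main() {
	a, b, c := runScenario()
	fmt.Println(a, b, c) // Z Y Z: three nodes, two opinions, no consensus
}
```

B ends up on Y because A's broadcast arrived after B's own write, while A and C end up on Z: exactly the split described above.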
So Raft is a consensus algorithm for managing a replicated log. A replicated log is a series of commands executed in order by a state machine. We want each log to have the same commands, so that at the end of the day every log has the same commands in the same order and each machine has the same state. To do this, Raft elects a leader. The leader is put in charge of managing the log: it accepts entries from clients, replicates them to the other nodes, and then tells the nodes when it's safe to actually commit them. If a leader fails, a new one is elected in its place. So my mom always said to me: Bo, you can be a leader or you can be a follower. Generally, when I was in trouble, that often included the statement: if all your friends jumped off a bridge, would you as well? But in Raft, you have other options. You don't have to be just a leader or a follower. You can be a candidate as well. So nodes can be in three states. Followers just hang out and respond to requests from leaders. The leader's like, hey, replicate this, and the followers are like, all right, gotcha. Leaders handle all the writes and all the fun coordination between the followers. Our third state, candidate, is the state used to elect a new leader. When a leader goes away, or disappears, or whatever, a node will declare itself as a candidate and say: hey, vote for me. So we talked about leaders. Part of keeping an ordered log is dealing with time. Raft uses an incrementing integer timestamp called a term, which is tied to an election. Terms last until there's a new leader, and there's one leader per term. Terms serve as a logical clock and aid in detecting obsolete info, so that if a leader finds out there's another leader out there with a higher term, the leader with the lower term will stand down and be like, okay, not in charge anymore.
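The three states and the stand-down rule can be sketched like this; the type and method names are mine, and this is only the term-as-logical-clock piece, not a full Raft node:

```go
package main

import "fmt"

// The three node states from the talk.
type state int

const (
	follower state = iota
	candidate
	leader
)

type raftNode struct {
	state state
	term  int
}

// observeTerm sketches the logical-clock rule: on seeing a higher
// term than its own, a node adopts that term and steps down to
// follower, even if it thought it was the leader. Lower or equal
// terms are obsolete info and change nothing here.
func (n *raftNode) observeTerm(t int) {
	if t > n.term {
		n.term = t
		n.state = follower
	}
}

func main() {
	n := &raftNode{state: leader, term: 3}
	n.observeTerm(5) // word arrives of a leader elected in term 5
	fmt.Println(n.state == follower, n.term) // true 5
}
```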
So the leader sends heartbeat messages out to the follower nodes saying: hey, still alive. This happens fairly frequently. Just: hey, still alive. Hey, still alive. So what happens if a follower doesn't get a heartbeat? Well, it immediately assumes the leader's dead. That's a dark but effective assumption. Our follower will increment its term and declare itself a candidate. That means it's election time. Our new candidate is going to send an RPC request for votes out to all the other servers. We heard yesterday morning about how HBO uses Kubernetes with Game of Thrones. Well, the saying in Game of Thrones was: you win or you die. In the game of Raft leadership elections, the stakes are lower. You win or you lose. Servers vote for the first candidate that requests their vote, assuming the candidate's log is at least as up to date as the voter's. This means if the voter has entries that the candidate doesn't, the voter isn't going to vote for that candidate. Now, if the candidate gets a majority of the votes, it wins. If no one gets a majority, there's another round of elections, and Raft uses randomized timeouts to make sure that eventually somebody will win, assuming your cluster is in an okay state. After the leader gets a majority of the votes, it sends out a new set of messages to notify the others of its victory, and life in the cluster goes on. So in Raft, how do writes work? Well, writes all go to the leader. The leader gets every write. The leader appends the command to its log locally, then tells the other servers via RPC to append it to their logs as well. Now, if a follower is behind, it's going to say: nope, I'm behind, can't handle this right now. And the leader's like, okay. If a majority of the cluster is behind, spoiler alert, nothing's going to happen. Once a majority have appended the entry to their logs locally, so the leader has it appended and enough followers have it appended, the leader commits it to its own log locally.
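That write path, commit only once a majority of the cluster holds the entry, is easy to sketch. This is my simplification (a single entry, acks modeled as booleans), not Raft's actual RPC machinery:

```go
package main

import "fmt"

// replicate sketches the Raft write path: the leader appends the
// entry locally, asks each follower to append it, and the entry
// commits only once a majority of the whole cluster (leader
// included) has appended it.
func replicate(clusterSize int, followerAcks []bool) (committed bool) {
	appended := 1 // the leader itself has the entry
	for _, ack := range followerAcks {
		if ack {
			appended++
		}
	}
	return appended > clusterSize/2
}

func main() {
	// Five-node cluster: leader plus two acking followers is three
	// of five, a majority, so the write commits despite two laggards.
	fmt.Println(replicate(5, []bool{true, true, false, false}))
	// Leader plus one ack is only two of five: not committed.
	fmt.Println(replicate(5, []bool{true, false, false, false}))
}
```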
Now, in subsequent messages to the nodes, the leader is going to inform the followers of the index of the last committed entry. The nodes are going to notice this and think: okay, you've sent me everything up to 12, you say 12 is committed, and I haven't committed 9, 10, 11, and 12? Well, I'm going to commit 9, 10, 11, and 12. Each node commits everything up to the last committed index. Now, this solves the problems in our bad system from before. Our simultaneous-write scenario is solved by the requirement that all writes go to the leader. The leader is going to receive each write, and each write is a new entry with a new index. So if two clients manage to somehow update a value simultaneously, each write gets its own index; one will occur before the other. That way we don't end up with two nodes thinking it's one value and one node thinking it's the other. In our network partition scenario, followers will reject requests if they're behind and will eventually be caught up. Also, requests go to the leader, and our dead node can't become leader because it's behind. The effect is minimized, and if we really care and don't ever want a stale value, there's the consensus read. It's slower, but it makes sure the value is confirmed by a majority of nodes before it's returned. So we talked about the CAP theorem a little while ago and how Raft, and thus etcd, is a consistent, partition-tolerant system. Well, this is achieved through requiring a majority of the nodes to act. Elections require a majority of the nodes to agree in order to elect a leader. All writes require a majority of the nodes to replicate the transaction in order for the transaction to be successful. If we lose more than half the nodes, we won't be able to do anything. So that was a brief overview of Raft. I dove semi-deeply into it, but I didn't cover all the safety guarantees and some of the more formal things.
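The follower-side rule from a moment ago, commit everything up to the leader's reported index but no further than your own log, can be sketched as a single function (names are mine):

```go
package main

import "fmt"

// advanceCommit returns a follower's new commit index after the
// leader reports leaderCommit: move forward to leaderCommit, but
// never past the last entry the follower actually holds, and
// never backward.
func advanceCommit(commitIndex, leaderCommit, lastLogIndex int) int {
	if leaderCommit > commitIndex {
		if leaderCommit < lastLogIndex {
			return leaderCommit
		}
		return lastLogIndex
	}
	return commitIndex
}

func main() {
	// We've committed through 8, hold entries through 12, and the
	// leader says 12 is committed: commit 9 through 12.
	fmt.Println(advanceCommit(8, 12, 12)) // 12
	// If we only hold entries through 10, we can't commit past 10.
	fmt.Println(advanceCommit(8, 12, 10)) // 10
}
```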
I think the Raft paper, the formal paper, is actually a really great read. I'd highly recommend checking it out. Also, for the visual-minded among you, and really for everybody because this is a really great visualization, there's a visualization out there called The Secret Lives of Data, which shows the way Raft clusters interact in various scenarios. It's definitely worth checking out. So, moving on: controllers. Controllers are a loop that watches cluster state and makes changes to ensure that we keep the desired state. We're going to take a look at two controllers right now: the replica set controller and the deployment controller. The replica set controller is in charge of making sure that, for any given pod spec, there's a given number of pods running at any time. The docs actually suggest you don't interact with replica sets directly, so we'll instead use the deployment and the deployment controller to manage them. Now, the replica set controller knows which pods to manage through labels. We'll define labels on the pods we create and tell the replica set which labels it should look for, so that it knows which pods to manage. Well, what's a label? Labels are just key-value pairs, and the replica set has a selector, another sort of key-value pair, to match them against. Now, the deployment controller. The deployment controller manages the whole deployment process of your app. It provides a declarative way of managing pods and then uses that to roll out your desired changes in a friendly way. Deployments will handle things like rolling deploys, and you can roll back your application if a deploy isn't going well. So we talked about the reconciliation loop a little while ago. It's how the deployment process is managed. The reconciliation loop is going to check and then modify the current state so that it matches the desired state.
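The reconciliation loop pattern is the heart of every controller, and it fits in a few lines. This is a sketch where "state" is just a replica count, as in the replica set controller; the function name is mine:

```go
package main

import "fmt"

// reconcile diffs desired state against current state and returns
// the action that moves current toward desired. Controllers run
// this check in a loop, forever.
func reconcile(desired, current int) (action string, delta int) {
	switch {
	case current < desired:
		return "scale up", desired - current
	case current > desired:
		return "scale down", current - desired
	default:
		return "do nothing", 0
	}
}

func main() {
	// We asked for 3 replicas; only 1 is running.
	action, delta := reconcile(3, 1)
	fmt.Println(action, delta) // scale up 2
}
```

The real controllers never terminate: even when the answer is "do nothing," the loop keeps watching, which is what lets the cluster heal when a node disappears out from under it.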
So if we're rolling out, say, a new version of a deployment, the deployment controller's reconciliation loop will change the desired replica counts of the old and new replica sets. The old ones will go down; the new ones will go up. This will get picked up, and then the replica set controller is going to handle tearing down old versions and spinning up new ones. That was a semi-brief overview of deployment controllers, but we're going to see them in action in a little bit. So, the scheduler. The scheduler watches for unscheduled pods and assigns them to a given node. What it's looking for, specifically, are pods without a node name. It's constantly querying the API server, and by extension etcd, looking for these unscheduled pods. One thing to note is that the scheduler isn't actually in charge of running pods. It's only in charge of assigning pods to a given node. It does this via a scheduling algorithm. First, it filters out nodes that aren't desired or just aren't a great fit. It then ranks the remaining nodes and picks the top-ranked node after ranking is complete. So, step one, we're filtering against predicates: filtering out nodes that aren't desired or aren't a great fit. I think the simplest scheduling strategy is to try to assign a pod directly to a node. You can do this via a node name specified in the pod spec. The HostName predicate compares each node's name to the node name specified in the pod spec, if any, so it's going to exclude every node that doesn't match. That lets us schedule directly to a node, assuming it passes the other predicates; there are some resource-aware predicates later that could still prevent it from scheduling. MatchNodeSelector is kind of a variant of HostName. Kubernetes lets you provide label selectors to associate resources using labels. This predicate checks whether the pod's node selector matches the node.
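The filter-then-rank shape of the scheduler can be sketched end to end. Everything here is simplified and the names, labels, and scoring are mine: the filter stands in for predicates like MatchNodeSelector and PodFitsResources, and the ranking just prefers the node with the most CPU left:

```go
package main

import "fmt"

type nodeInfo struct {
	name    string
	labels  map[string]string
	freeCPU int // free millicores; hypothetical numbers
}

type pod struct {
	nodeSelector map[string]string
	cpuRequest   int
}

// feasible plays the predicate role: the node must carry every
// label the pod selects on, and must have room for the request.
func feasible(n nodeInfo, p pod) bool {
	for k, v := range p.nodeSelector {
		if n.labels[k] != v {
			return false
		}
	}
	return n.freeCPU >= p.cpuRequest
}

// schedule plays the priority role: among feasible nodes, pick
// the one that would have the most CPU left over.
func schedule(nodes []nodeInfo, p pod) (string, bool) {
	best, bestScore := "", -1
	for _, n := range nodes {
		if !feasible(n, p) {
			continue
		}
		if score := n.freeCPU - p.cpuRequest; score > bestScore {
			best, bestScore = n.name, score
		}
	}
	return best, best != ""
}

func main() {
	nodes := []nodeInfo{
		{"node-a", map[string]string{"disk": "ssd"}, 500},
		{"node-b", map[string]string{"disk": "ssd"}, 2000},
		{"node-c", map[string]string{}, 4000}, // filtered: missing label
	}
	p := pod{map[string]string{"disk": "ssd"}, 250}
	name, ok := schedule(nodes, p)
	fmt.Println(name, ok) // node-b true
}
```

Note that node-c has the most free CPU but never makes it to ranking, because filtering happens first.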
For example, let's imagine that you only want to schedule on high-bandwidth nodes. You want the network power. You could specify a high-bandwidth label, put it on your node, and then add a selector to your pod spec to only schedule on nodes with this label. There's also the concept of affinity, which is a more expressive and flexible version of the node selector concept. You're not just limited to an exact match; you can do set matching and all sorts of fun things. Also, you can match not only on nodes, but on the pods already running on those nodes. Accordingly, there's a predicate that checks whether or not the pod plays nice with the affinity settings of the pods on a node: the MatchInterPodAffinity predicate. Taints are kind of the inverse of affinity. We can specify a taint that tells pods to avoid certain nodes. Pods won't schedule to nodes with taints unless the pod explicitly tolerates them. So if you didn't want pods to schedule to a node, for example, you could add a taint to it, and they won't go there unless they specifically add a toleration. The PodToleratesNodeTaints predicate checks whether or not the pod has opted in to tolerating a given node's taints. The scheduler, like I mentioned, also checks the resources of each node. The PodFitsHostPorts predicate checks whether or not the host port, a hard-coded port specified in the pod spec, is available on a node. If it's not available, we're going to filter that host out from being scheduled to for this given pod. PodFitsResources is another resource-aware check. Pods can request a given amount of CPU and memory, and this predicate checks whether the node is capable of satisfying that request. CheckNodeMemoryPressure and CheckNodeDiskPressure are some other predicates; they won't let pods schedule onto nodes whose memory usage or disk usage is too high. CheckNodeCondition is a more extreme version of this.
CheckNodeCondition prevents pods from being scheduled to nodes that are unavailable or network-unavailable, that aren't ready according to Kubernetes, or that are just plain out of disk space. Those are conditions that would make a node suboptimal to schedule to. I didn't list them here, but there are also some volume checks. If you're on a cloud provider, there's often a limit to how many volumes you can have attached, so we make sure we're not using more than the allowed number of volumes, and that there are no conflicts on volume claims. So after we've filtered our nodes, we have a set of nodes that are valid for scheduling. Some are probably better than others. Now, you could also be left with no nodes, in which case your pod is unschedulable. In that scenario, you're going to need to go back and modify the pod spec, or maybe wait for some other nodes to come up, or take some sort of action in order to make your pod schedulable again. But if you've got a set of nodes you can possibly schedule to, it's time to rank. Ranking applies a series of priority functions that return a score. A higher score is more desirable. The functions are run against each node, the scores are all added up, and the node with the highest score is the winner. Ties are broken by randomly picking a winner. So here are some of the ranking functions. When you think of scheduling, you probably want to put your pod on the least-used node. LeastRequestedPriority helps out with this. It calculates how much CPU and memory, equally weighted and added together, would be left after scheduling our pod to that node. This kind of helps with balancing resource usage across the cluster. Now, BalancedResourceAllocation is a more focused version of this. It's attempting to prevent nodes from being heavily weighted towards CPU usage or memory usage. It's specifically trying to avoid nodes with, like, 95% CPU but only 5% memory.
So it checks to see how the pod would affect the balance of resource usage on that given node. This function is going to favor nodes that would have CPU utilization closer to memory utilization after scheduling. Next up is SelectorSpreadPriority. We lose a great chunk of the benefits of having multiple copies of an application if they're all sitting out there on the same node: that node goes down, and you lose every copy of your app that's running on it. There's a metaphor about eggs and baskets that I think is really helpful here. This function minimizes the number of pods from the same service or replica set on the same node, causing our currently-being-scheduled pod to favor nodes without its managed siblings. Then the ImageLocalityPriority function. Well, containers have images, and oftentimes we have to download them. This function prioritizes nodes that have already downloaded more of the pod's images. If you've got them all, you're going to get the highest score; if you've got none of them, you're going to get the lowest score. And then NodeAffinityPriority and TaintTolerationPriority. NodeAffinityPriority favors nodes that match the pod's node affinity preferences. TaintTolerationPriority prioritizes nodes whose taints are specifically tolerated by the pod. If you've gone out of your way to request some affinity or explicitly tolerate something, Kubernetes is going to notice that and say: you know what, you're going to get a little boost to schedule onto this node. So what happens when we submit a deployment to Kubernetes? We've talked about all these components, but let's see how they work together. I've got this deployment here. I hacked together a really quick hello world Go app the other day. It's going to be listening on port 8080. I've labeled it with the tag app: hello-kubecon. And I say: I want three of you. I've also specified a service. A service will select pods and provide access to them.
If you see the spec selector app field, I've given it the tag app: hello-kubecon, and so the service is going to select the pods with that tag. We've also said: hey, I want you to forward port 80 over to port 8080, and I want you to create a load balancer for me. So how are we going to actually submit this? I'm on team "kube cuddle," by the way, for pronunciation. There are differing opinions; I believe "kube cuddle" is the best one. So we use kubectl to get our app running in Kubernetes. It's how we interact with the cluster. We can view resources as well as modify them. It gives us a great view of what's going on. We can check the status of our pods and our deployments, but, really importantly for this section, we can use it to construct a timeline of events. So what do we expect to happen here? Well, we're going to create a deployment: kubectl create -f deployment.yaml. Our deployment is going to create a replica set. The replica set will create three pods. Our scheduler will schedule those three pods. And then kubelet, which is looking for pods to run on each node, is going to run the scheduled pods. So what actually happens? Before we go on, I just want to warn you: these are real events. They have been formatted to fit my screen, and viewer discretion is advised. Starting off, we can see here the involved object kind. Our deployment is the involved object, the one that I just created, right after I ran kubectl create -f. The message is that we've scaled up the replica set. Okay, well, I guess that means we created a replica set as well. So we created a replica set for our deployment and set it at three. You can also see the source of it: the deployment controller has done this. It's noticed, via the API server and etcd, that a new deployment has been created, and decided it should check that out and act accordingly. The reconciliation loop here: what's the current state? What's the desired state? The current state is there's not a replica set for this.
The desired state is that there should be a replica set for this. Let's create it and then scale it up. All right, now we can see the source component, the replica set controller, has kicked in. It's created a pod. We can see the involved object at the top, as well as the replica set. Our replica set has created a pod for us; we've got two more to create. Before we do that, the scheduler has kicked in. We can see the source component is the default scheduler, and the involved object is a pod: the default scheduler is doing something to the pod. It has successfully assigned this pod, WC7K, looking at the end of its name, to a given node in GKE. That's pretty great. Then, hey, look, the replica set controller is back. It's created a pod, 03HFH. You can see the involved object is the replica set. It's created a pod for us. Is it going to be scheduled next? It is! Yay! So, the scheduler has scheduled our second pod and assigned it to node 33JG. What was the last one assigned to? 33JG. So that's interesting. We talked about the various functions for ranking nodes. One of them was selector spread priority, which is going to favor spreading things out across nodes. Well, because all the scores are added together and weighted, selector spread priority actually lost out here. I had some other things running in my cluster, and Kubernetes said: even though we'd prefer to have things in multiple places, this node is such a great fit that we want everything to go here. Luckily for us, this is a very, very, very tiny web app; it's basically a hello world. So let's come back and create our third pod, 05KV9. And it's successfully been assigned by the scheduler to our node, the same node, 33JG. Now this is something different: kubelet has kicked in. Kubelet has been like, okay, I've got a pod I've got to deal with. Well, first, I'm going to pull my image. So it's pulling my hello-http image, latest tag, from Docker Hub, and it's downloading that. Oh, and the second pod is having its image downloaded. 
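That "everything landed on the same node" outcome falls out of how the scores combine. Here's a toy sketch of the idea in Python — not the real scheduler code; the functions, weights, and node data are all made up — showing how a weighted sum can let one strong signal outvote selector spread:

```python
# Toy model of scheduler scoring: each priority function scores a node
# from 0 to 10, the scores are multiplied by weights and summed, and the
# node with the highest total wins. All numbers here are invented.

def selector_spread(node):
    # Favor nodes with fewer sibling pods from the same replica set.
    return 10 - 5 * node["sibling_pods"]

def image_locality(node):
    # Favor nodes that have already downloaded the pod's image.
    return 10 if node["has_image"] else 0

def balanced_resources(node):
    # Favor nodes where CPU and memory utilization stay close together.
    return 10 - abs(node["cpu_util"] - node["mem_util"]) // 10

PRIORITIES = [(selector_spread, 1), (image_locality, 1), (balanced_resources, 1)]

def score(node):
    return sum(weight * fn(node) for fn, weight in PRIORITIES)

nodes = {
    "33jg": {"sibling_pods": 1, "has_image": True,  "cpu_util": 50, "mem_util": 50},
    "wxyz": {"sibling_pods": 0, "has_image": False, "cpu_util": 90, "mem_util": 20},
}

best = max(nodes, key=lambda name: score(nodes[name]))
# 33jg wins even though it already hosts a sibling pod: image locality and
# balanced resource usage outweigh the spread penalty in the sum.
```

The point is just that no single priority function gets a veto; spreading is a preference, not a rule.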
It's downloading that same image for the second pod, and for the third pod too. Why is it downloading the same image three times at once? That's a great question. I probably shouldn't freestyle on this, but I believe it's because it hasn't completed the download for any of them yet. Or it could be the image pull settings on my deployment. So, we successfully pulled the image. Yay, it's downloaded. We can see that we downloaded the image for the pod: the involved object is the pod, and kubelet is the source component that's handling this. We can also see there's a third field on the involved object that we haven't seen before: field path. It's saying, okay, I'm trying to find the containers in this hello-kubecon deployment, and that's how it got the name of the image that it's downloading. So after it successfully pulled the image, it creates a container, and then it starts the container. So kubelet has pulled an image, successfully downloaded it, created a container, and then started the container. I expect to see this cycle two more times, because we have three pods we need to start up, each with a single container. And sure enough: we successfully pulled the image, we've created the container for our next pod, and we've started it as well. Third pod? Yep. We successfully pulled the image, we've created the container, and we've started the container. So we did it. Yay. If you now navigated to the app in your browser, you would see hello, KubeCon. Kelsey did almost the exact same thing yesterday morning. I just want to say I had my GitHub repo, or my Docker Hub container, out there three days ago. But either way, great minds think alike; I am honored to have had the same thought. So what have we done today? We've looked at the various components that make up Kubernetes. We've shown how Kubernetes handles distributed state. We dove into how it reconciles state and schedules pods. And then finally, we traced a deployment through the system. 
So we're out of time, but I learned a ton putting this talk together. And I hope you all learned something today as well. Thanks.