Hi, I'm Rodrigo. I'm a software engineer at Microsoft. Hi, I'm Marga. I'm a director of engineering at Isovalent. A quick show of hands from the people in the audience: for how many of you is this your first KubeCon? Wow. All right. Welcome, everybody, to KubeCon. And welcome also to those of you who are veterans. We are really glad that you are here.

This talk is intended for people who are still managing VMs rather than containers, or who have perhaps started thinking about managing containers but need a little nudge to make the shift to cloud native. In fact, I'd like to introduce you to our friend Taylor, a system administrator who is currently managing VMs. They use infrastructure as code with Terraform and Ansible, and it works. They have an application with a front end, a back end, some load balancing, some firewalls. And as I said, it mostly works. But as the application becomes more complex, it's getting harder for them to keep up. So they have been doing some research into containers and Kubernetes, and they have decided to migrate, but they need a little help. So we are here to help them make that transition.

So let's start at the beginning. Rodrigo, what makes containers so special? Containers are lightweight and very easy to migrate from one node to another, because they contain all the necessary code to run our applications and they are not tied to the underlying host. You've probably heard this before, but we'll go over it one more time quickly: containers help us treat our applications as cattle rather than pets. When we treat our applications as pets, we usually create a VM image for each application with a very specific version of Linux, with, I don't know, Ruby 3.1 and no other version, and all its dependencies. And we treat each VM with a lot of care. We spend a lot of time upgrading them very carefully. If something breaks, we SSH into them to fix it. When we treat our infrastructure as cattle, nodes can just come and go, and if a node fails, we simply move the applications to another node.

Right. And this works really well when our application is just one container that we can move to the next machine. But as our applications become more complex, we can have a front-end service, a back-end service, some database, some blob storage. We add load balancing. We want high availability. It all starts to become quite complex, quite fast. Right. That's why we want to automate more tasks. We want it to be roughly the same effort for us to keep one replica of our application as to keep n replicas of our application. And this is where Kubernetes can really help us. If a node goes down, Kubernetes can just migrate the applications to another node, and we might not even receive a page. Or if a whole data center goes down and we have nodes in another data center, Kubernetes can just migrate the applications to that other data center, and it just works.

Yeah, that sounds really great. So how do we make that happen? To make that happen, we really want automatic health checking. If something is unhealthy, Kubernetes can take an action to restore it, like restarting our apps. And Kubernetes can take this action on our applications, but also on nodes: if a node is unhealthy, it just migrates the applications running there to another node. And we want to integrate this health checking with the load balancing. We want load balancing at every layer, integrated with the health checking, so we only route traffic to healthy replicas.
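As an illustration, this health checking is usually expressed as liveness and readiness probes on the containers themselves. A minimal sketch of what that could look like for Taylor's backend; the image name, paths, and port are made up for the example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend
  labels:
    app: backend
spec:
  containers:
  - name: backend
    image: registry.example.com/backend:1.0   # illustrative image name
    ports:
    - containerPort: 8080
    # Restart the container if it stops answering on /healthz.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    # Only route traffic to this replica once /ready reports success.
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```

A failing readiness probe takes the replica out of the service's endpoints, which is exactly the "only route traffic to healthy replicas" behaviour described above.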
And we remove unhealthy replicas from the rotation as soon as possible. This reduces the burden we carry significantly and makes our applications more resilient; the paradigm shift here is just to let Kubernetes take care of this for us.

So, Kubernetes is an additional abstraction layer that helps us decouple our applications from the infrastructure they're running on. Containers were the first step towards this, but Kubernetes improves further upon it. For example, Kubernetes can help us with scheduling, attach volumes to the nodes where the pods that need those volumes are scheduled, or help us with the networking so our apps can communicate with each other. Or we can even just say "create me a load balancer", and it doesn't matter if we're running in Azure or GCP: if we're in Azure, it will create an Azure load balancer; if we're in Google Cloud, it will create a Google Cloud load balancer.

All right, so this sounds great, but it's actually starting to sound a little bit too good to be true. So, where's the catch? The catch is the complexity. Kubernetes is a non-trivial abstraction layer. It makes heavy use of Linux features like cgroups, namespaces, and OverlayFS, and we're adding all of that in our hot path. So we really need to learn how to use Kubernetes, keep it up to date, and, more importantly, debug it when it fails.

Right, all right. So we know there's no free meal, but we are here to help our friend Taylor make the migration. They've already started by packaging their application into containers. There's a frontend and a backend container that represent what they want to run. So the first step on their journey to cloud native is to deploy this application as a Kubernetes Deployment. Creating a deployment is basically writing a bunch of YAML that tells Kubernetes what we want to run. In this case, the example for the backend deployment tells Kubernetes that we want to run five replicas of the backend image.

Right, and the key concept underlying most Kubernetes controllers is that in the YAML we specify what we want, like five replicas, and we let Kubernetes make sure that it's always true. And this is not a one-time operation. If a pod crashes or a node goes away, Kubernetes will ensure, however it decides this needs to happen, that we always have five replicas. So if one crashes, it will start a new one. If a node goes away, it will move whatever pods were running on that node to another node, and it just maintains that state. And when we release a new version of our application, like version 2.0, we also specify our desired state in the YAML. We say we want to run this, and we let Kubernetes do the rollout for us until all replicas are up to date running version 2.0.

So now we have a deployment with our backend pods and our frontend pods, but as we know, pod IPs change frequently. So how do we connect the frontend pods to the backend pods if their IPs keep changing? Right, for that we use the concept of a Service. A service gives us a stable IP that the different parts of our application can use to connect to. Our friend Taylor here will need to define a frontend and a backend service, which specify which pods are part of each service. And notice that the YAML files for the services don't include how to run the application; that's what we specified in the deployment YAML.
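For reference, the backend deployment described above might look roughly like this; the labels and image name are placeholders for whatever Taylor actually uses:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 5                    # desired state: always five backend pods
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: registry.example.com/backend:1.0   # bump to :2.0 to trigger a rollout
        ports:
        - containerPort: 8080
```

Changing the image tag and re-applying the file is all it takes for Kubernetes to roll the replicas over to the new version.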
In the service YAML, what we include is which pods are part of the service, based on their labels, using the selector field. So this is not how we run the application, but how we distribute the load to the replicas that are running the application.

Okay, so is this service YAML enough to expose our applications to the internet? Not yet. To decide whether an application is exposed to the internet or just internal to the cluster, we have to specify the type field in the service. The default type is called ClusterIP, and it has this name because the IP assigned to the service is internal to the cluster; it's not visible to the outside world. There's a component in Kubernetes called kube-proxy, which is the one that does the load balancing. So when a request comes in to the cluster IP assigned to our service, kube-proxy distributes the traffic across all healthy replicas of the service. This is the type that Taylor would set on the backend service, because it's an internal service that is only visible inside the cluster.

Okay, but how do the frontend pods know which IP is assigned to this service so they can connect to the backend pods? Right, for that we use DNS. When a service comes up, a DNS entry is created mapping the name of the service to the cluster IP. So internally, any application that wants to connect can use that service name, and it will get resolved to the cluster IP.

Okay, so this is for internal services. Again, how do we expose it to the internet? To expose it to the internet, we need to choose a different type. For example, we can choose the NodePort type. It has this name because the service is made available through a port that is exposed on all of the nodes of our cluster. And again, the magic is done by the Kubernetes component called kube-proxy, which, when a request comes in on the specified port, distributes the traffic to the service that is mapped to that port. Notice that for readability in this diagram we are not crossing boundaries between nodes, but in reality kube-proxy distributes the traffic to any node in the cluster. It doesn't matter which node the request comes in through: if it comes in through node one, but kube-proxy decides that it should actually be served by a pod on node two, it will redirect it to the pod on node two. This is the type of service that Taylor could set for their frontend pods, and it could go, for example, to port 30500. We actually don't need to specify that port, Kubernetes will set it for us; we're just putting it there for clarity.

Okay. And how do we make traffic reach the nodes on this very specific port that is not the HTTP port? Right. To do that, we could use, for example, a legacy load balancer: a load balancer that is not part of the Kubernetes infrastructure and is not managed by Kubernetes. This can be the load balancer we were already using before, and instead of pointing it at VMs, we point it at the nodes in the Kubernetes cluster on the specified node port, and everything else stays the same. Another option is to directly set the type to LoadBalancer. This is similar to the NodePort service type, but it will create and manage the load balancer in the cloud infrastructure we are running on. And once we create it, we can query the IP that this managed load balancer got, for example by checking the status field.

Okay. So we have the service type LoadBalancer, but that is not the only type of service that balances the load across pods, right? Exactly.
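As a sketch, the two services Taylor would define might look something like this; the names, labels, and ports are illustrative:

```yaml
# Internal service: only reachable inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP
  selector:
    app: backend          # routes to pods carrying this label
  ports:
  - port: 8080
    targetPort: 8080
---
# Frontend service exposed on a port of every node.
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: NodePort          # or LoadBalancer on a cloud provider
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30500       # optional; Kubernetes picks one if omitted
```

Inside the cluster, the frontend pods can simply connect to backend:8080 thanks to the DNS entry that is created for the backend service.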
The LoadBalancer name can be a little bit confusing at the beginning, but it's called that because it creates an external load balancer that is managed as part of Kubernetes. All of the service types we saw, ClusterIP, NodePort, and LoadBalancer, balance the load across all healthy replicas of our service.

All right. So Taylor's application actually got a little bit more complex. On top of their existing frontend and backend, they also have a blog where they publish what's going on with their system. And this blog is running as a separate application, a separate container. So what can they do? Right. Up to now we've been dealing with services that operate at layer 4, so we can basically specify things at the TCP or UDP level. But if we want to inspect the HTTP layer, like whether the request path is /blog, we need to do this routing at layer 7. In Kubernetes, we can use an Ingress resource for this. Here is an example for Taylor: it is again a bunch of YAML where we specify that if the request goes to /blog, it is served by the blog service, and otherwise it is served by the frontend pods. The Ingress controller can be a component that runs inside our cluster, like in this diagram; it can be, for example, nginx. Or it can be a cloud resource, a layer 7 load balancer that runs in the cloud provider. In the second case, there is also a lightweight Ingress controller that just keeps the cloud load balancer in sync with the pod IPs and so on. All right, you mentioned nginx. nginx is how we would do this in the VM world, right? Right, so we probably already have an nginx configuration doing this routing, and we just need to map that to the Kubernetes Ingress resource.

Okay, so we talked about services and we talked about Ingress. Now our friend Taylor is getting a little bit excited about autoscaling. They are already familiar with autoscaling from the VM world: they have their VMs set up so that when CPU usage goes up, they get more VMs, and when it goes down, they scale back down. In the container world, it's a similar concept, but we have more dimensions. We can scale the number of pods, the size of the pods, or the number of nodes in the cluster. Right, so what we're used to doing in the VM world is looking at the CPU usage of the nodes to scale. But that doesn't really work with containers. In this example, the CPU usage of the whole node is around 20%, but the backend pods are completely overloaded. So what can we do instead? We should look at the CPU usage of the pods, not the node, to scale. That's how we do horizontal scaling in Kubernetes.

Okay, and how do we do it? Again, with a bunch of YAML. In this example, we put a hard limit of a minimum of 3 replicas and a maximum of 10 replicas of our pods, and we target an average CPU usage across our pods of around 70%. If the CPU usage goes above this, it will create more pods, and if it goes below, it will remove pods so we don't waste resources. Is CPU the only metric we can use? No, memory is also a built-in metric, but we can also use any custom metric we want, for example one we scrape from Prometheus. And what can happen if we scale too much? Right, if we have a lot of demand, we'll start creating a lot of pods, and eventually we might run out of capacity on our nodes to schedule those pods. So this is where the cluster autoscaler kicks in.
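Going back to the Ingress described above, a sketch of what it might look like; the service names and ports are made up for the example:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  rules:
  - http:
      paths:
      - path: /blog          # requests to /blog go to the blog service
        pathType: Prefix
        backend:
          service:
            name: blog
            port:
              number: 8080
      - path: /              # everything else goes to the frontend
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
```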
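And the horizontal pod autoscaler from the example (minimum 3 replicas, maximum 10, 70% average CPU target) could be sketched like this, assuming it scales the backend deployment shown earlier:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods above this, remove pods below it
```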
Coming back to the cluster autoscaler: it will look for pending pods and create new nodes in our cluster to schedule them, so we can satisfy the demand we have.

Okay, so we've already covered deploying our app, exposing it with services, and now scaling it. Something that is bothering our friend Taylor here is firewalls. The thing is, in the on-prem world we're used to writing iptables rules using the VMs' IPs to limit connectivity, and we do something very similar with security groups in cloud providers. With containers, though, we don't know the container's IP, and we don't even know which node a container may run on. It may be running on one node today and another node tomorrow, so it's very hard to write the rules the way we're used to in this new world. So what can we do? Instead, we write network policies. In our network policies, we don't select containers by IP, we select them by labels. So we can say that pods with certain labels should only receive traffic on a given port, or should only receive traffic that comes from pods with certain other labels. Right, so again we do it with a bunch of YAML. This is an example where the backend pods will only accept traffic from pods that are labeled with app: frontend, and only on this very specific port.

Right. So how do we find out which pods should communicate with which other pods? That's tricky, right? Yeah, if we don't know it beforehand, it can get very tricky. We can use some tools: Inspector Gadget, for example, is a tool that can monitor the traffic across our cluster and generate network policies for us. Indeed. And this is the default Kubernetes network policy, which allows us to filter by labels, as I mentioned. But if we need more advanced filtering, for example layer 7 filtering, or if we want to block access to the metadata IP that our cloud provider exposes, we will need more advanced network policies, like the ones provided by networking layers such as Calico or Cilium.

All right. So now that we've applied network policies, what do you think? Is our cluster fully secure? No, not really. We need to apply several layers of security, so we can reduce the impact of any successful attack and also reduce the attack surface. This is not a security talk; we will cover a few security strategies, but please keep in mind that there's a lot more to cover, so we invite you to do some research on your own to make sure you have all the security concepts covered. We will divide our security recommendations into three layers: minimizing the privileges given to our pods, minimizing the vulnerabilities in our container images, and minimizing the access that we give to users and service accounts.

Right. So something we probably didn't do in our VM world is run our applications as root. But when we migrate to containers, for example, Taylor here created a Dockerfile based on the Docker documentation. It seems very simple and straight to the point, but what is hidden underneath is that Java here runs as root. That sounds really bad. It is. All right, so what can we do? In Kubernetes, we can use the runAsUser and runAsGroup directives to force the container to run as that user and group. Yeah, and that works if our application can run as an unprivileged user, but unfortunately many containerized applications kind of expect to be running as root. So what can we do in that case? That can get tricky. There are several options. One is to use a feature called user namespaces.
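A sketch of the network policy just described, assuming the pods carry app: backend and app: frontend labels and the backend listens on port 8080:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend          # the policy applies to the backend pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend     # only frontend pods may connect
    ports:
    - protocol: TCP
      port: 8080            # and only on this port
```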
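And the runAsUser and runAsGroup directives mentioned above go into the pod's securityContext; the UID and GID here are arbitrary example values:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  securityContext:
    runAsUser: 1000          # run as an unprivileged user...
    runAsGroup: 3000         # ...and group instead of root
    runAsNonRoot: true       # refuse to start if the image insists on root
  containers:
  - name: backend
    image: registry.example.com/backend:1.0   # illustrative image name
```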
Coming back to user namespaces: I'm actually working on adding that to Kubernetes; it's currently an alpha feature. It basically tricks the application into thinking it's running as root, but this is only root inside the container. From the host's point of view, it's an unprivileged user, so if it escapes or anything, it doesn't have any privileges. We gave a talk about user namespaces at the last KubeCon, so if you want to learn more about that, we invite you to check it out.

Okay. So the next step is to minimize the vulnerabilities in our container images. Taylor, coming from the VM world, may fall into the trap of thinking that the repositories on the internet, like Docker Hub or Quay, are as trustworthy as the repositories from the Linux distributions. Unfortunately, that's not the case. Anybody can upload containers there, so we can end up with containers that have vulnerabilities, that are not up to date, or that even include malware, like crypto-mining software or software that will exfiltrate our credentials. So what should Taylor do instead? Taylor should probably provide some golden images to use as a base for all in-house applications. To build these golden images, we can build them completely from scratch using tarballs, or we can use something we can trust; for example, there are some images on Docker Hub that carry the official Docker image tag. The important thing is to use something we can trust and that will release a security update if a security issue is found. Also, if we are providing golden images, we can take the chance to, for example, configure locales correctly, so extended characters are shown correctly in logs, or add some debug tools that we'll need later.

We also want to minimize the attack surface by adding a seccomp profile. The simplest way to do this is to use the RuntimeDefault profile that ships with the container runtime. This basically allows all the syscalls that we really need and just denies some dangerous syscalls. And starting with Kubernetes 1.27, which was released a few days ago, we can apply this cluster-wide and not have to modify every deployment YAML, as we're doing here. Right, and if we need something more specific, we can also create our own custom seccomp profile. This can help if we want to really, really limit the set of syscalls that can be executed by our pods, and we can use the tool that Rodrigo mentioned earlier, Inspector Gadget, to figure out which syscalls our pods actually make and then only allow those.

All right, and the last layer of defense that I mentioned earlier is minimizing access. It can be tempting when you're starting with Kubernetes to just do everything as cluster admin, using the cluster-admin role, which has access to the entire cluster: you can create new pods in any namespace, you can add or delete things, no limits. But using the cluster-admin role is equivalent to being root on a VM. So if you wouldn't give root to your developers, you shouldn't give cluster admin to your developers either. You should instead use RBAC, Role-Based Access Control, where you limit what people can do depending on their roles. For example, in Taylor's case, they could give the developers access to the dev namespace, but no access, or very limited access, to the prod namespace.
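For example, opting a single pod into the RuntimeDefault seccomp profile (the per-workload approach used before the 1.27 cluster-wide option) looks roughly like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault   # use the container runtime's default syscall filter
  containers:
  - name: backend
    image: registry.example.com/backend:1.0   # illustrative image name
```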
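And a minimal sketch of the RBAC setup for Taylor's developers, limited to the dev namespace; the group name, resources, and verbs are placeholders to adapt:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: dev             # permissions only apply inside this namespace
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments", "services"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-dev
  namespace: dev
subjects:
- kind: Group
  name: dev-team             # placeholder group from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
```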
We saw a bunch of different YAML files: deployments, services, ingresses, horizontal pod autoscalers, network policies. It might be a little bit confusing, but the key takeaway... Yeah, the key takeaway is probably that Kubernetes is an abstraction layer. We can use it to automate a lot of tasks. We feed it YAML in which we specify how we want things to look, and it doesn't really matter if we're running in Azure, in Google Cloud, in Amsterdam, or in Berlin: Kubernetes will just take care of making what we desire a reality.

All right, so that was... yeah, the tip of the iceberg of all things cloud native, but we hope it was good for you as a very quick introduction. We've included a bunch of references in the slides that we uploaded, so if you want to follow some links, they are in the Sched link. And we would love to hear your questions now, if you have any. Or not. No questions. Thank you very much. Thank you.