Good afternoon. I'm Floris. I've been a Python developer for quite a while now. Sorry, can everyone hear me properly? Should I stand closer? Is that better? Okay. So, yeah, I've been involved in Python for quite a while. I work on open source as well, usually around pytest, et cetera. I have very recently changed jobs and I'm now a site reliability engineer at Google. While Kubernetes originally came out of Google, this is not actually a Google project or a Google talk. This is largely based on my experience from my previous job, where I was working with microservices in Python that we ran on Kubernetes. A little note about the title: Kubernetes is part of the Cloud Native Computing Foundation, the CNCF, and that's where the title came from, Cloud Native Python. So, this is the contents we'll be covering. I'll start with a really, really brief introduction to Kubernetes — probably the shortest introduction you can ever have, but if you're not familiar with it yet, that should hopefully be enough to follow the rest of the slides. Then I'll introduce a little example. It's just a traditional echo server, so a network server: you send a message, you get the same message back. And then we'll take that example through the rest of the talk and start modifying it a little bit. So, an introduction to Kubernetes. This is Kubernetes in one slide, which is a tricky thing to do, I guess. The idea of Kubernetes is that it's a cluster orchestrator, really. You give it a bunch of machines, it creates a cluster out of them, and when you want to run your application, you just say: run my application somewhere in this cluster. Kubernetes itself will decide where the right place in that cluster is for your application to run.
And the reason we want to build a system like this — especially with multiple microservices and multiple instances of every service — is that it makes for a really resilient application. If one of the machines in the cluster is unhealthy or something, you can just take it down, fix it, replace it, and your application keeps running. If something crashes, requests will just be routed somewhere else, and you get a resilient, always-up kind of application. That's the end goal of what we're trying to do by running in Kubernetes. So, the core concept of Kubernetes: you want to run your application, which in our case is just going to be a part of an application, and Kubernetes runs containers, essentially. So you need to containerize your application. Right now that basically means Docker; in the future hopefully that will also include rkt and the like. So, you create a container out of your application — I'll skip over that part today — and Kubernetes runs this inside a pod. A pod is the smallest unit that Kubernetes will run for you. Essentially, it's just a wrapper for your container. You don't really have to worry about why they decided to create pods; the idea is that you can potentially put multiple containers inside one pod and treat them as one unit. Why you'd want to do that doesn't really matter for today. So you tell Kubernetes, please run this pod for me, which is your application, and Kubernetes will find a server in its cluster and start running it for you. But the problem with pods is that they're ephemeral. If the pod gets killed for some reason — an administrator goes rogue, or a machine gets taken down, or something like that — then your pod is gone. So, that's a far cry from our resilient application.
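In manifest form, a pod is just a small YAML object. A minimal sketch — the name, labels, image, and port here are all made up for illustration — looks like this:

```yaml
# Minimal Pod manifest (names, image, and port are made up).
apiVersion: v1
kind: Pod
metadata:
  name: echo
  labels:
    app: echo
spec:
  containers:
  - name: echo
    image: example.com/echo:1.0
    ports:
    - containerPort: 5555
```

You would rarely create a bare pod like this by hand; the replica set and deployment objects discussed next wrap this same template.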
So, the next concept that Kubernetes introduces is the replica set. A replica set will essentially continuously look at what's running inside of Kubernetes. The idea is that in your replica set you say: I want to run this many instances of my application. Whenever the replica set sees that your application isn't running that many times, it will try to make that happen. If there aren't enough instances of your application, it will create more; if there are too many, it will kill a few. That means that if a machine gets taken down, the replica set will notice this and create a new pod instead. So this starts to make your application always be there, especially if you request multiple instances. The problem then is, of course, that these pods just run on a random machine somewhere in your cluster. You don't know anything about them; you don't know how to contact them. So that's where the concept of the service comes in. A service is essentially a fixed IP address inside your cluster — that's the easiest way to think of it. If you need to contact your application, you can contact it via the service, and the service will essentially load balance between all the instances of your pod: if you send traffic to the service, it will go to one of the instances of your application that's running somewhere in the cluster. And these are really the basics of Kubernetes — enough to follow along. Everything else, I'm going to say, is bells and whistles. There are lots of other layers — they don't actually even recommend you use replica sets directly at the moment — so there's obviously lots more going on. But that's the core concept, and it's the core concept that allows you to create resilient applications. So, that should be enough for today, hopefully. So, this is the little example application that I'll be working with.
It's basically an echo server on the network. I've implemented it using 0MQ, because any time you think of creating a TCP socket to send and receive data, you should really use 0MQ instead: it takes care of a lot of the nitty-gritty networking details, whole messages get delivered to your application automatically, and you don't have to worry about much else, really. Other than that, this is fairly standard. I should point out that all the code I show is slideware — I use globals, I don't show all the imports — so you shouldn't write applications like this, obviously. So, yeah, the main loop of this application is essentially: I create the socket that I want to listen on; I then use this poller thing, which is basically select, if you've done normal TCP programming. Essentially, it's asking the operating system: can I sleep until the next message is available? When the next message is available, that comes in as an event, and then I pass the echo server socket to the event handler for it. The event handler then receives the message and sends it back. So that's all this is: an infinite loop receiving and sending messages. Here are a few little helper functions to make it fit on the slide. Create-and-bind is really nothing, just plumbing: create a socket, bind it so that we can actually receive connections, and register it in the poller — again using a global, so it's not nice code. And the handler is where the actual work happens: we receive the message — there's a little bit of 0MQ bookkeeping to split off the peer address the way 0MQ passes it to you — then log the message and send it back to whoever sent it to us. So that's the very simple application. And the first thing to notice is that this is actually sufficient.
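The slide code itself isn't reproduced in this transcript. As a rough reconstruction of the shape of that loop — using the stdlib selectors and socket modules in place of 0MQ so the sketch is dependency-free, with a made-up port and function names — it looks something like this:

```python
import selectors
import socket

# Rough reconstruction of the slide's echo loop. The talk uses 0MQ; plain
# TCP plus the stdlib selectors module stands in here, so "a message" is
# just whatever bytes arrive. Port and names are made up. Global poller,
# as on the slide -- not nice code, but it fits the shape of the talk.
sel = selectors.DefaultSelector()

def create_and_bind(port):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", port))
    server.listen()
    # Register in the (global) poller; accept() handles its events.
    sel.register(server, selectors.EVENT_READ, accept)
    return server

def accept(server):
    conn, peer = server.accept()
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)
    if not data:                  # peer went away
        sel.unregister(conn)
        conn.close()
        return
    print("echo:", data)          # "log the message"
    conn.sendall(data)            # ...and send it straight back

def main(port=5555):
    create_and_bind(port)
    while True:                   # sleep until the poller reports events,
        for key, _ in sel.select():
            key.data(key.fileobj)  # dispatch to the registered handler
```

Calling main() runs the loop forever; the callback registered with each socket is stored in key.data and dispatched per event.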
So, we can just take that application, put the if __name__ == "__main__" thing in there, et cetera, containerize it, and run it in our Kubernetes cluster, and that will just work. So the first thing when you're in Kubernetes is to just rely on the fact that you are running in that environment, so you don't have to put complexity in your application. It allows you to have a very simple logical flow, which is what we just saw: a super simple, straightforward internal architecture. And that's true for larger applications as well; you can skip a lot of the boilerplate, et cetera. The first part of that is that I didn't write any exception handlers in that code. I did that so it fits on a slide, but it's actually generally quite true: you don't really have to worry about exceptions, because your application runs in multiple instances. And if you do get something unexpected — I don't know, maybe the bind of the socket could fail, whatever goes wrong on the machine — I don't really mind. My process will die, and Kubernetes will just make sure that a new one gets created somewhere else instead, and it will probably work. So, yeah, you don't have to worry about that especially. You can even go as far as doing that when you're receiving a request from another service. If this other service is completely internal — you, or your team, or whoever, is the author of the other service — then you can essentially treat that failed request, sorry, an invalid schema in the request, as a bug, and you can basically crash again. If you're doing this for external applications, so if you're actually receiving user requests, then that is probably a bit too brittle. In that case you probably do want to catch the exception from request validation.
Because, depending on the application — in this case not very much — there is some overhead in starting your application up again. The other thing that happens when you just crash, and that you should take into account, is that if you have network connections — you're receiving messages from clients — those network connections will have buffers in them. So basically there will be requests queued up already in your local process. 0MQ makes that really obvious, because there is explicitly a queue with your socket, but even if you use raw TCP, there will always be internal kernel buffers. The kernel may even have accepted new requests already and just queued them up, even though your application has no idea yet. There might be data still on the wire, just coming over. And if you just crash, then you lose that data. So those requests, and whoever created those requests, will have to wait and time out and retry. That's not very good, so you want to take that into account. And that brings you to questions like: how do I organize my messaging? If you really don't want to suffer from that, you can start playing with message brokers like RabbitMQ, or different systems like Kafka, et cetera. But these all come with trade-offs and expenses. The only thing I would say here is: be aware that when you crash, you may lose requests, and make sure that's okay in the system you're designing. So, this slide shows how you would actually create your pods — this is the recommended way that Kubernetes wants you to create them. A deployment is essentially just a wrapper around this replica set thing; the reason they have it is that it makes updating your application slightly easier. But from our point of view, the two important things here are this line that says replicas: 3
That means we are requesting three replicas, so Kubernetes will always ensure that there are three of us running. The other really important thing is the very last line, which says restartPolicy: Always. That tells Kubernetes that if we crash, just start us again, please. So that's the first step. The second thing is that the script had no concurrency. And while it was a very simple example, this is generally true: you can keep your internal code really simple, because the idea is to scale via the process model — if you've ever heard of the twelve-factor app methodology, that's the idea. You scale horizontally by creating more instances of your application. As we just saw, that's what we already did: we just create more replicas and they will handle the traffic. And that means that internally we can have really easy debugging, et cetera, because our control flow is really simple and we don't worry about any of the other stuff. Yeah, and basically your service is your load balancer in this case. So this is the service definition that we would have. One thing to look out for with the service you create, which load-balances your traffic between your pods, is whether the protocol you're using has long-standing connections. Again, 0MQ in our example points this out very well, because 0MQ creates long-standing connections: you connect, and then it tries to reuse that connection for lots of requests and responses. So our echo service will accept lots of echo requests from the client on one connection. This means that, essentially, we don't get the load balancing: one client will be permanently connected to one of the pods — one of our application instances, essentially. So that's not very load-balanced.
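The deployment and service manifests shown on the slides aren't reproduced in this transcript. Based on the two settings called out — replicas: 3 and restartPolicy: Always — they would look roughly like this; the names, image, and ports are made up:

```yaml
# Hedged reconstruction of the slides' manifests; names/image/ports made up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  replicas: 3                  # always keep three instances running
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      restartPolicy: Always    # if we crash, just start us again
      containers:
      - name: echo
        image: example.com/echo:1.0
        ports:
        - containerPort: 5555
---
apiVersion: v1
kind: Service
metadata:
  name: echo
spec:
  selector:
    app: echo                  # load-balance across the pods above
  ports:
  - port: 5555
    targetPort: 5555
```

At the time of the talk the deployment API group was still in beta; apps/v1 is the current stable form.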
On the other hand, if you're using HTTP, then a new TCP connection gets created for each request, and that distributes automatically. The trick you can use there is that Kubernetes actually lets you see the layer below services: it has objects it calls Endpoints, and those let you see which endpoints — basically IP addresses — are part of your service. Using that information, we could update our application to query the Kubernetes API and ask: which endpoints do I actually want to connect to? Then you can tell 0MQ to connect to all those endpoints. The downside is that you have to work with the Kubernetes API, and you need to be constantly aware that this can change: when an endpoint disappears, you need to tell 0MQ to disconnect from it, et cetera. But that's just generally the sort of thing you need to be aware of when you have protocols that use long-standing connections, basically. Next up, talking a little bit about logging. You may have seen the print statement in the very first example and cringed. By default, Docker takes the standard output of your container as log data, and Kubernetes will in turn take that log data from the Docker containers and make it available to you. And generally, the idea is that at operation time you hook that up to log aggregation of some sort — something like Elasticsearch or Fluentd. But using simple print statements is not very nice. In general, you want to be able to control log levels on the command line — that's a twelve-factor app kind of thing — so you really want to use logging libraries. Logging libraries are common, there are quite a few variations, and they all try to wrap a horrible amount of global state into a nice API. Global state is always quite horrible, so they're all ugly in one way or another. I quite like Logbook.
It's not that the way it tries to handle the global state is better — there's nothing inherently better about Logbook than standard library logging, if you like standard library logging. One nice thing about Logbook is that you can use the normal curly braces of new-style formatting instead of the percent formatting of standard library logging. But the main thing to notice is that once you start using a logging library, instead of printing to standard out — which is probably the first thing you do, when you don't have all the infrastructure yet — you can hook it up to send the log records directly to the aggregator. This allows you, if you have tracebacks or something, to send them as a single big block. So the next thing that you really want to do, as soon as you start using logging libraries, is wrap your main application in this kind of exception logging. All logging libraries will support some variation of this. The idea is that by doing this, I make sure that any unhandled exception — which I was advertising earlier — will be captured by the logging library and sent as one single log record to your log aggregator. So next up is the concept that Kubernetes calls health endpoints. The central idea here is that when your application starts up, Kubernetes has said, start this container, and as soon as that container is running from a process point of view, it's considered available, so the service object that you created for your application will start sending traffic to you. But there is a finite amount of time during which your application is running but you haven't opened your socket yet and you're not listening for connections yet.
In this case, obviously, it will be very small, but if you have a lot of setup to do before you start accepting connections from clients, that delay might be bigger. And the problem is that if Kubernetes is sending traffic to your application at that point, then clients basically get connection refused, and they're all like, well, the service is down. You don't want that. What Kubernetes does is introduce these readiness probes. The idea is basically that after starting your application, Kubernetes will wait until the probe succeeds, and only once the probe succeeds will it start sending traffic to your application. This is how you configure that in the Kubernetes pod configuration. There are several different probes you can use. In our case, it's just a simple echo server, so all we really care about from the probe's point of view is that the socket is open and listening. This TCP socket probe essentially says: as soon as this socket accepts connections, send me traffic. Kubernetes will literally just try to connect to the socket, and as soon as the connection succeeds, it will say, yeah, okay, I'll send the traffic. That means there is still a tiny delay in which we might actually have bound our socket but not yet be in the main loop processing messages. But because we know the buffering and the queuing in our sockets takes care of this, that's basically perfectly fine. It will be a short amount of time; some requests may get queued up, but we'll start serving them very shortly. If your service is actually using HTTP as transport, Kubernetes has built-in support for that, and there is the convention of using this /healthz route.
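The probe configuration being described sits on the container entry in the pod spec. For the TCP case it is only a few lines — the port and timings here are made up:

```yaml
# Readiness probe for the echo server's container spec (port/timings
# made up): Kubernetes connects to the TCP port and withholds traffic
# from this pod until the connection succeeds.
readinessProbe:
  tcpSocket:
    port: 5555
  initialDelaySeconds: 1
  periodSeconds: 5
```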
And that /healthz convention is really nice, because the readiness probe can be completely in line with your normal request processing, if you set it up that way. Kubernetes only looks for the 200 OK there, so as soon as that route returns 200 OK, it will start sending you traffic. And because you can do this completely in line, you actually have quite high assurance that, yes, my application is fully running and can start serving traffic. The same concept of readiness also applies during the pod's lifetime. When your application is running, especially when you're running at a large scale and it handles lots of requests, sometimes things will just go wrong and one of the many instances will start misbehaving. This might be because of external things: there's another container on that box, and even though things should be isolated, they're not always as isolated as you would imagine, and things go terribly wrong; or someone decided to run a backup on that machine and things go really slow; all sorts of things happen, really. The idea of liveness probes is that you don't want to be sending requests to an application that's slow to respond, or just completely stuck, or something like that. It works in essentially a very similar way: there's this livenessProbe thing, and this shows the third type of probe that you can use. We've already seen tcpSocket and httpGet, and here I decided to use the exec one. Exec is the most work, but it's also the most flexible. The problem is that the TCP probe we used for readiness — and it was very suitable there — is not very useful as a liveness probe. What we really want is the same thing you get with httpGet: we want to know, in line, whether things are working correctly.
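The exec probe stanza being described — sketched here with a made-up script path and timings — sits alongside the readiness probe in the container spec:

```yaml
# Liveness probe using exec (script path and timings are made up):
# Kubernetes runs the command inside the container; a non-zero exit
# status marks the pod unhealthy.
livenessProbe:
  exec:
    command: ["python", "/app/healthcheck.py"]
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
```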
The trick you use there is this exec command: it lets you give the command line that you want to execute inside your container. So you have to provide an extra binary inside your container, which will tell you how healthy or unhealthy that pod is, and ideally you want to aim for this in-line checking. That's exactly what I've done here: I've added an extra script to my application, and I just invoke it with Python. This is how simple the script can be. Again, obviously I have a very simple application, but it literally just creates a socket connected to the public endpoint — in this case, because I'm running in the same container, it's actually localhost, but we still make the TCP connection, so it still gives us a very good idea of what the public behaviour of our application is. We just send a message and wait for 500 milliseconds, and if we haven't got a message back by then, we fail, and then Kubernetes will take our pod out: basically, stop sending traffic to our pod and send it to the healthy ones instead. You may notice I'm not actually even receiving my response; I just check, hey, there is a response. In this case I think that is sufficient.
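The check script itself isn't reproduced in the transcript. A stdlib reconstruction of its shape — plain TCP standing in for 0MQ, with host, port, and the function name made up — might look like this:

```python
import select
import socket
import sys

def healthy(host="127.0.0.1", port=5555, timeout=0.5):
    """Send one echo request and wait up to `timeout` for any reply.

    Sketch of the slide's check: plain TCP stands in for 0MQ, and
    host/port are made up. We only check that *a* reply arrived, without
    reading it, mirroring the talk.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ping")
            # select() tells us whether any response data is ready within
            # the 500 ms window; we never actually read it.
            readable, _, _ = select.select([conn], [], [], timeout)
            return bool(readable)
    except OSError:
        return False

def main():
    # The exec probe only looks at the exit status: 0 healthy, 1 not.
    sys.exit(0 if healthy() else 1)
```

Invoking main() from the probe's command line turns the boolean into the exit status Kubernetes inspects.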
It depends on the protocol you have, or what sort of messaging you use; sometimes you might want to introduce slightly more, and often it is nice to build in something like that /healthz kind of route from the HTTP convention — just something that tells you it is working, yes, it is working and everything is fine. So that's the idea of liveness. Next up is termination, and again, this is very similar: if at start-up time we didn't want to refuse any connections, at termination time we don't want to drop any connections that exist, or any requests that are currently queued up in our process. If we don't do anything, our application will just receive the termination signal and die, and all the requests that are queued up will basically be lost, and all the clients will be left waiting again. That's not very good. So instead we should handle the termination signal, which is done in Kubernetes just like it's always been done in Unix systems: you get SIGTERM handed to you, basically, via Docker. The signal handler that we create here just uses a very simple internal socket — essentially, I think of it as a pipe — and all we want to do is send a signal, a single byte, to my main loop. Then my main loop knows it has to shut down, and it can shut down while still trying to handle the connections. I'm using the same signal handler here for both SIGTERM and SIGINT. SIGINT is the signal you get when you press Ctrl-C when you're running on a terminal; in Python that normally gets automatically translated to a KeyboardInterrupt exception. Because you want the application to behave the same way when you're running it in a terminal to try it out, it's usually best practice to just bind both signal handlers. The modifications for my main loop are maybe a little bit messier now,
but essentially it's not that much. Instead of creating just one socket in my main loop, I create two: I add this termination socket and add it to the poller. The important thing here is that when I receive this single byte — I don't even care what that byte is, by the way — I just know I received a message from my termination socket, so I need to shut down. Here, I first unbind, which means I will stop receiving new connections, and then I basically keep processing while there are still events in the queues. I've also set a timeout, actually, to five seconds or something, which may be fairly high, but the idea is that some requests might actually already be on the wire, and I want to give those a chance to be processed as well. Once there are no longer any messages in my queue, the while loop will finish and I return. So, the last thing that I would like to add is monitoring. Prometheus is another Cloud Native Computing Foundation project, actually, which is why I mention it, and the idea is that, to start with, you always want to know what's going on. Prometheus offers you the option of doing white-box monitoring, in a way: you can add counters and you can add metrics inside your application, and you can start enriching your metrics collection. Prometheus works in a pull-based fashion: you have the central infrastructure, and it goes around to all your services, does an HTTP request, and gets back your metrics. It's a little bit like SNMP over HTTP, if you remember the SNMP pull model, but at least we've learned that monitoring data is probably even more important than production traffic, so we actually use a reliable TCP transport instead of something like UDP. Anyway, Prometheus is obviously a very big project, but the idea here is really just to show how easy it is to
get started with it; you can add very little things. There was a talk last year from Hynek, I think, at EuroPython, which goes into a lot more detail about Prometheus. So this is the recap. Basically, none of this is actually strictly required to be able to run on Kubernetes; you can add it gradually as you need it. And, yeah: keep your architecture simple, think about when you lose requests, and don't go blind — you always want to have some instrumentation and monitoring. Thank you very much. I think I'm about out of time, so unfortunately there probably won't be any questions, but you can find me outside if you like.
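Returning to the termination handling described earlier: the signal-handler trick — an internal socket used as a pipe to wake the main loop — can be sketched with the stdlib alone. Here socket.socketpair stands in for the 0MQ internal socket, and the names are made up:

```python
import signal
import socket

# Sketch of the "internal socket as a pipe" trick from the talk, with the
# stdlib instead of 0MQ. The signal handler writes to one end; the main
# loop polls the other end alongside its server socket and, on a byte,
# unbinds and drains the remaining queued requests before returning.
wakeup_w, wakeup_r = socket.socketpair()

def on_signal(signum, frame):
    # Send a single byte; the main loop only cares that *something*
    # arrived on the termination socket, not what it is.
    wakeup_w.send(b"\0")

# Bind both SIGTERM (what Kubernetes/Docker sends) and SIGINT (Ctrl-C),
# so interactive runs shut down the same way as runs in the cluster.
signal.signal(signal.SIGTERM, on_signal)
signal.signal(signal.SIGINT, on_signal)
```

The main loop then registers wakeup_r with its poller next to the server socket, exactly as the talk describes adding the termination socket.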
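And on the monitoring side: in practice you would reach for the prometheus_client library, where a counter plus a metrics endpoint is only a few lines. As a dependency-free illustration of the pull model — one counter exposed over HTTP in the Prometheus text format, with a made-up metric name and port — a sketch might be:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# One process-wide counter; a real app would use prometheus_client's
# Counter instead. Lock because requests arrive on server threads.
_lock = threading.Lock()
_requests_total = 0

def count_request():
    global _requests_total
    with _lock:
        _requests_total += 1

def render_metrics() -> bytes:
    # Prometheus text exposition format: HELP/TYPE lines plus a sample.
    return (
        "# HELP echo_requests_total Echo requests served.\n"
        "# TYPE echo_requests_total counter\n"
        "echo_requests_total {}\n".format(_requests_total)
    ).encode()

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def start_metrics_server(port=8000):
    # The central Prometheus infrastructure scrapes this endpoint; serve
    # it from a daemon thread so it doesn't block the main loop.
    server = HTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The application calls count_request() wherever it serves a message, and Prometheus pulls the current totals over plain, reliable HTTP.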