Okay, so we're going to talk about service discovery today, and we'll focus on the client side: how do you use service discovery in Python? I won't argue about whether you should use service discovery or not in this talk, and I won't explain how to install the three technologies that I will cover here; I will just focus on their usage. And if we have time, which I hope we will, I'm crazy enough to have prepared a live demo, so we'll try it. It's an opinionated talk, okay? So that's my point of view here.

First, an introduction about me. You can find me under the nick ultrabug. I'm a Gentoo Linux developer, where I work mostly on cluster stuff and Python stuff, and I maintain packages related to NoSQL, key value stores and message queuing. I'm also CTO at Numberly. We are a programmatic and data-driven marketing and advertising company. We have a booth over there with a quiz where you can win some crazy stuff, so just come around and we can have a talk.

Okay, so what is service discovery? To make it short, you can compare it to what DNS is for your browser, but in a dynamic way. When you connect to a website, your browser first has to find out the IP address of the host serving the website you want to reach, and to do so, it does a DNS query. Beforehand, when you own the website, you had to configure the DNS and register the IP address of your server in it. Service discovery is about the same thing: it's about registering and querying, but for services. That's the basics of it.

Let's see a bit more about it. So we have a catalog, provided by the service discovery technology, and then you have your servers. Each of them provides a service, and some of them provide the same service. They will register themselves into the catalog, so you get a list of "service X is running at host:port", multiple times if the service is running on multiple servers. Then you have clients. The clients will be looking for a service, usually by its name; they will query the catalog for the given service, and they will be handed a list of available hosts providing said service. This is service discovery.

Now let's take a quick tour of the three technologies I will cover here. The first one is the oldest one. It's named ZooKeeper and it's from the Apache Foundation. ZooKeeper was first designed as a reliable cluster coordination service, and it's used mainly in Hadoop. It has some pretty interesting features, and it's mature, since it's the oldest of the three technologies we'll cover here. When I list as a negative point that it doesn't provide service discovery per se, that's true, but we'll get back later to how we can still use ZooKeeper to achieve service discovery; what I mean by this is that it's not a built-in feature of ZooKeeper. The main design of ZooKeeper, and it's the same for etcd, which we'll see just after this, is that you can compare them to a distributed hierarchical file system, which is also comparable to a key value store. You'll see about it. It's written in Java, and it uses its own consensus algorithm implementation. The consensus is about making sure all the nodes of the ZooKeeper cluster agree on something. The Python bindings shipped with the sources, built on the C bindings, are not really usable, and even less so for service discovery. It's not a data center aware technology; it just knows about its own cluster.

Now you have etcd. etcd is from the CoreOS guys. It's a pretty recent project, and it's written in Go.
It uses the Raft consensus algorithm, which is pretty robust. It has good adoption; it's used by many bigger projects, like Kubernetes, and it provides an HTTP API to do all the querying and registration stuff. It's really simple to implement and configure. Just like ZooKeeper, it doesn't provide a service discovery mechanism per se, but we'll use the file system hierarchy to achieve this. It's not data center aware either, and it doesn't provide any kind of health checking of your services once you register them. We'll see about that later as well.

The third one is Consul. It's from HashiCorp, and it's the newest of the three. It's also written in Go, and it's also using the Raft consensus algorithm. And yeah, I told you it's an opinionated talk, so I didn't find any bad things to say about it, because it has a built-in service discovery feature. It's data center aware, so you can have multiple Consul clusters, one in each data center, and they can talk to each other. It also provides a DNS API, so you can also look up services using DNS, which is kind of a good feature.

The note I wanted to stress about ZooKeeper and etcd is that we will achieve service discovery by abusing the key value store. You can see the key value store as a sort of file system where you can store data. Registering is about creating a node, or a folder or a file if you want to relate it to your local file system, and making it meaningful. In this kind of example, at the root of the hierarchy, I will say: okay, the first level will be my service name, apache; then on the second level I will create a folder which will represent all the servers providing this service, so I call this folder providers. And then inside, I will create nodes, or you can relate them to files, which are named host:port. So discovering providers for the apache service is just like listing the content of the /apache/providers directory, fine? We can do the same with memcache and stuff like that. That's how you can abuse key value store based technologies such as ZooKeeper and etcd to achieve service discovery, okay?

Now, let's see the Python client libraries to talk to each of those technologies. The first ones, for ZooKeeper, are kazoo and zc.zk, and yeah, I know, I'm sorry about this, we can be a very creative community. We'll use the underlined one, zc.zk, which uses kazoo underneath, so you can see zc.zk as a service-discovery-oriented wrapper around kazoo. It's pretty handy. Then for etcd, we have the standard python-etcd library, which is pretty good, and if you use asyncio, there is another one for asyncio stuff as well. And for Consul, you have consulate and python-consul. We'll use python-consul, which is now better documented and more active than consulate. Last year it was the contrary, but this year python-consul is very, very nicely implemented. So good job, guys, thank you.

Okay. When you choose a technology, you have to rely on it, even more so when it will be the core of your whole topology. You have to make sure that you can rely on the Python clients, because they have a direct impact on your application. So let's see about the zc.zk client library, which uses kazoo. When you want to connect to a ZooKeeper cluster, you can specify multiple hosts, which is pretty cool. It has an auto-reconnect feature. You can query the connection state; you will get connected or disconnected and stuff like that, so you can have your code handle this gracefully.
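Here is a minimal sketch of what that connection handling could look like, using kazoo directly (which zc.zk wraps underneath); the hostnames and the listener body are just illustrative assumptions:

```python
from kazoo.client import KazooClient, KazooState

# Connect to several ZooKeeper servers at once; kazoo picks one
# and transparently reconnects to another if it goes away.
zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181', timeout=5.0)

def state_listener(state):
    # Called whenever the connection state changes.
    if state == KazooState.LOST:
        print('session lost, our ephemeral nodes are gone')
    elif state == KazooState.SUSPENDED:
        print('disconnected, trying to reconnect')
    else:
        print('connected')

zk.add_listener(state_listener)
zk.start(timeout=15)  # blocks up to 15 seconds, raises if no server answered
```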
The library also has rich exceptions if something wrong happens. So I'm providing a quick example here. The "don't fail on connect" question means: if no server is available when I do the first line and try to connect to my ZooKeeper cluster, will it block, or will it raise an exception? In this example it's blocking, and you can change this with the wait parameter, in which case it will raise an exception instead. So it's not like the Python memcached library some of you are used to; you have to know about this and handle it, because this can block your whole application if no ZooKeeper server is up.

On the python-etcd side, you don't have the possibility to connect to multiple hosts, but you have graceful auto-reconnection, so it's pretty good. You can't really query the connection state. The exceptions are pretty rich, so you can see what's happening pretty easily and catch the right exceptions for the different kinds of errors you can run into. And it does fail on connect.

The python-consul one is, well, not as good as the zc.zk one, because it doesn't support multiple hosts either. It also has an auto-reconnect feature. The exceptions are so-so; I'm providing an example here, a ConnectionError is sometimes not very meaningful. But it doesn't fail on connect, meaning it's non-blocking: you just create your Consul client and then continue on, nothing happens when you do that. Which can be a good feature.

Okay. Now, about service registration. There are three things you have to consider here, three states of a service life cycle. It's starting up, and it needs to register into the catalog. Then it's running, and you have to make sure it's still running, because if it's not running, if it crashed or the server providing said service became unavailable, you don't want to keep answering clients about it, okay? So you have to remove it from the catalog in some way when it's down. That's the dynamic part. And then when it stops gracefully or crashes, it has to be deregistered from the catalog. So the health checking will also do the deregistration for you in case of failure. You'll see how it's done in every Python implementation.

For ZooKeeper, it's pretty straightforward. The main thing to understand is that the first line over here and the first try/except will just create the file system hierarchy I talked to you about. So we just make sure that we have /ep2016/providers, and we do a makepath that will create the whole path, like mkdir -p; and if the node already exists, it's okay, we can continue. Then zc.zk provides a cool method, register, where you say: okay, on this node, on the providers node, I will register a machine named yaz running on port 5000. And it will create the file-like node yaz:5000 for you, okay? That's all we have to do.

Now about health checking. The health checking in ZooKeeper is implicit, because ZooKeeper has this cool feature named ephemeral nodes. Ephemeral nodes are files or nodes in the file system hierarchy that are present only as long as the session of the client that created them is alive. So whenever the client dies or closes its session, ZooKeeper will know about it and will remove the given nodes automatically. It's a good way of doing health checking, because if your application crashes, or when you just want to deregister, you just have to exit gracefully and close your session.
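As a rough sketch of that registration, here is what it could look like with kazoo directly instead of zc.zk's register helper; the /ep2016/providers path, the yaz hostname and port 5000 come from the example above, the rest is an assumption:

```python
from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError

zk = KazooClient(hosts='127.0.0.1:2181', timeout=5.0)
zk.start()

# Make sure the hierarchy exists, like mkdir -p; fine if it already does.
zk.ensure_path('/ep2016/providers')

# Register this provider as an ephemeral node: ZooKeeper deletes it
# automatically when our session dies or is closed.
try:
    zk.create('/ep2016/providers/yaz:5000', b'', ephemeral=True)
except NodeExistsError:
    pass  # already registered

# ... serve requests ...
zk.stop()  # graceful exit: closes the session, the node disappears
```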
By closing the session to ZooKeeper, ZooKeeper will remove all the nodes you created with this application. So the register method does that: it creates an ephemeral node. That's implicit in the zc.zk Python client. What about the failure detection latency? If my program gets kill -9'ed or crashes badly and didn't have time to deregister gracefully, how long will it take for ZooKeeper to remove the node from the hierarchy? In other words, how long will it take before clients stop being handed this host and port? It will take the session timeout: here, when I created my client session, I set it to five, so it will take up to five seconds in this case. So for five seconds maximum, I could be serving a wrong host and port to my clients from the catalog. That's something you have to consider as well in such topologies.

On etcd, it's basically the same principle. We try to read the providers directory; if we can't find it, we create it as a directory. Then we just have to write: there is no register wrapper or anything like that, so we just write the given node I talked to you about, and we can set data in it, so we also put the same thing in the value, since it's not a directory. And it has a time to live, a TTL, which brings me to the health checking, actually. You can see that it's getting more difficult here. Why? Because etcd doesn't have the concept of ephemeral nodes like ZooKeeper has. That means you have to implement health checking yourself, or use a third party library or program to do it for you, but you have to do it. So in this example, I'm doing it myself. The trick I'm using is that when my application starts, I create a health checking thread which will constantly, in an infinite loop, re-register my service. That acts as a sort of heartbeat, combined with the TTL, the time to live of the node I'm creating: the node will be removed from the hierarchy after TTL seconds. So my failure detection latency is the TTL, but I have to have a thread constantly making sure that my node is present, and so that my service and server stay in the catalog.

If you use Consul, everything is integrated and built in. You can see in the code that it's pretty straightforward: I just have to register my service into a Consul agent, and it's very self-explanatory as well, the name of the service, the address of the host providing it, and the port it's running on. It's integrated, nothing more to add. The health checking is interesting in Consul, because you have a way to make sure that the Consul servers will run some health checks of your service by themselves. So you just have to create, like in my example where it's an HTTP service, a health check object which is of kind HTTP, and provide the URL that the Consul server should call every two seconds. When I register, I pass the extra check argument, and I say to Consul: okay, check this URL every two seconds, and if it fails, remove me from the catalog, or to be very correct, mark me as failing, all right?

How do you discover all of this? It's pretty straightforward as well, so I will just show you the querying part. For ZooKeeper, you can get the addresses by listing the children of the given node. So I'm listing the children of the providers folder in /ep2016, and those will be my nodes. I just have to loop over them, split on the colon, and I get the host and port of every server providing my service, okay?
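As a small sketch of that querying with kazoo, under the same assumptions as the registration example:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

providers = []
# Each child node is named host:port by convention.
for child in zk.get_children('/ep2016/providers'):
    host, port = child.split(':')
    providers.append((host, int(port)))

print(providers)  # e.g. [('yaz', 5000), ...]
```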
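Stepping back to registration for a moment: the etcd write-with-TTL plus heartbeat thread described above could look roughly like this with python-etcd; the endpoint port, TTL value, and refresh interval are assumptions:

```python
import threading
import time

import etcd

SERVICE_DIR = '/ep2016/providers'
NODE = 'yaz:5000'
TTL = 10  # seconds before the key expires if we stop refreshing it

client = etcd.Client(host='127.0.0.1', port=2379)

# Create the providers directory if it doesn't exist yet.
try:
    client.read(SERVICE_DIR)
except etcd.EtcdKeyNotFound:
    client.write(SERVICE_DIR, None, dir=True)

def heartbeat():
    # Re-write the key forever; if this process dies, the key simply
    # expires after TTL seconds and the provider drops out of the catalog.
    while True:
        client.write('%s/%s' % (SERVICE_DIR, NODE), NODE, ttl=TTL)
        time.sleep(TTL / 2)

threading.Thread(target=heartbeat, daemon=True).start()
```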
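And the Consul registration with its built-in HTTP check, as a minimal sketch with python-consul; the service address, port, and health URL are illustrative assumptions:

```python
import consul

c = consul.Consul()  # talks to the local Consul agent on 127.0.0.1:8500

# Ask Consul to poll our /health URL every 2 seconds.
check = consul.Check.http('http://10.0.0.5:5000/health', interval='2s')

c.agent.service.register(
    'ep2016',                # service name that clients will look up
    service_id='ep2016-yaz',
    address='10.0.0.5',
    port=5000,
    check=check,
)
```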
On the etcd side, discovery is basically the same as with ZooKeeper: you make a recursive read query, you get the children, you split, and you get your host and port. On Consul, it's also very easy. You query the health service, because you only want to get the healthy servers providing your service. That's the passing=True here: I just want it to return the servers and ports for which the health check is passing, okay? And then I get a lot of information back. It's a dictionary-style thing, and inside there is the host, the port, and other interesting stuff. Sounds good?

Okay, now let's play. So I have handed out three Raspberry Pis, and my machine here is running a ZooKeeper, etcd, and Consul agent. The idea I had is to showcase a service discovery page like this, where we will be looking for the hosts providing the ep2016 service. I also wanted to demonstrate the key value storage, because all those technologies are also used for configuration access: you can store your configuration in these key value stores, so your applications can get it from there too. So the color here, I don't know if it's readable because of the resolution, yeah? Every time I reload this, I change the color configured for my web service in ZooKeeper, in etcd, and in Consul.

So Dirk, can you start running your Raspberry Pi? Raspberry Pi 4 is the one that I plugged in a few seconds ago. I can just go to it like this, and you can see that every time I change the color in the key value store, it gets picked up by the application, in this case from ZooKeeper. And Dirk just plugged in Raspberry Pi number one, which appeared and got discovered here on every platform. So can you also plug in yours, and you too? We'll see the results coming. And what's interesting about this is also this, okay. It's going to get hot now; I think my Raspberry Pi 4 is getting a bit overloaded here, it's the Wi-Fi, but it's okay. So each time I reload, okay, Dirk's Raspberry Pi is running, pretty awesome. You can see that my Raspberry Pi 4 is not responding here: the health check failed on every platform, and it has been removed from ZooKeeper, etcd, and Consul. So it's a good thing, okay, it's working.

Right, okay. So now we can see Raspberry Pi 2, okay, Raspberry Pi 4 is getting back somehow, yeah, it's getting back, okay. Raspberry Pi 3 on Consul, yeah, it's working as well. Okay, you can see the color now, all right. So now we have the four Raspberry Pis up and running, and they seem to be pretty stable on the health checks, yeah. I will disconnect Raspberry Pi 4. Now let's see the time it takes: it depends on the technology, because they each have a different kind of TTL, ephemeral node session timeout, or health check interval, okay. Yeah, some of them are overloaded.

Do you have any questions?

So the question is, is there any kind of load balancing? No, the client decides. When your client queries the catalog, it gets a list of all the available nodes for the given service. That's all. Then it's up to you to decide which one you want to connect to.

Yeah, I have a question about redundancy. If you have an application that depends on the service discovery catalog for the different services it exposes, and the catalog for some reason crashes, how will you recover from that situation?
Will you have, like, service discovery of the service discovery, or how would you do that? Yeah. No, you don't do service discovery of the service discovery. The advised minimum number of servers is three, so you should have at least three ZooKeeper or Consul servers running, okay? If you want more resiliency, make it five or seven, but always an odd number, okay? If something very bad happens and you don't have service discovery anymore, I guess you have to handle it on your application side. You can handle it with caching, for example. It's not very easy, and it really depends on the type of application you're running. But the best course is to make sure your service discovery cluster has enough nodes to sustain this kind of problem.

Well, it depends on the technology, actually. As you saw, with ZooKeeper you can connect to multiple hosts, so you don't need a load balancer, you can just list every one of the nodes. On the other hand, with Consul and etcd, you have to specify one of the nodes. So maybe you can implement something on your application side to handle this, like having a deque or something like that in Python, and if it raises an exception, you try to connect to the other hosts, et cetera, et cetera.

Sorry, can you repeat that for the recording? A question about the registration procedure: why don't you want to use an external tool to do this? It can be implemented in configuration management, Chef, Puppet, Salt, and in this case you would have the possibility to register third party services like MongoDB and so on automatically. So that's the question: why not do this as an external service for your application? Well, I think, to me, Chef and tools like that are good for provisioning, or really for applying configuration to servers. I don't see service discovery like this. I relate to your point with MongoDB and daemons like this: you have external programs that do it for you. I'm not sure that, for instance, Chef and so on can keep health checks running, so if you do it with etcd, it may become difficult. There are a lot of third party libraries doing it for etcd, for example, because it has a wide audience. And for container stuff like this, they use specific third party tools, but not provisioning tools.

We are using a registrator container that automatically registers any container that is running on a Docker host. And it's a very good idea, I think it works well. And if something happens to a container, it will be deregistered automatically. Yeah, but it has to be registered somewhere anyway. So we are running a local agent on each host machine, a local Consul agent, and each service knows that it can reach the agent on localhost. Only the agent knows where the Consul cluster is located, and it can be implemented in a very easy way. Yeah, but then you don't have a central configuration place, or do you do that also in... We are using Salt to install everything, but the service discovery itself is implemented using those special containers.

No other questions? Well, thank you. Have a nice day.