Hello, I'm Petr Stehlik, I work at Kiwi.com, on the finance side of things, and we use task queues very heavily there. Today I want to explain what a task queue is, tell you a story about how we got one badly wrong, and then go through the lessons learned; there will be some time for Q&A at the end.

First, the basics. A task queue is about parallelising discrete tasks; ours are written in Python. The processing of tasks is offloaded from the request itself, for example heavy CPU work or slow calls that you don't want to do inline. The important pieces are the producer and the worker: the producer puts tasks on a broker, and the worker, the consumer, takes tasks off the broker and processes them, so neither side has to wait for the other. At Kiwi.com we are also breaking applications down into microservices, and microservices together with asynchronous task processing is what keeps the latency down and the throughput up. That's what it is good for.

Now the story. We had this small, let's call it microservice, for handling accounting data, mainly invoices, and it's called Fantozzi. If you have watched older Italian movies, you will know that Fantozzi is a series of films about an unfortunate accountant. So that's where the name came from. The plan was to put a REST API in front of it. The original design was stitched together from a webhook library, JSON web tokens and all these other technologies that happened to be around. So we sat down and went through three options of how to build Fantozzi 2.0, and which framework to build it on. And here is where the usual developer traps come in, because you just know better. So you can imagine yourself like the super duper programmer that knows everything and will do everything better. I know everything I need. Like, why would I care about reading the documentation, or some best practices or whatever, or just how to set up the application itself? And then I said: I can do it better. That's one of the most dangerous ones, because once you believe it, you stop questioning your own decisions.
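A minimal sketch of that producer / broker / worker split, using only the standard library so the roles are easy to see; in a real deployment the broker would be Redis or RabbitMQ and the worker a separate process, and the names here are just for illustration:

import queue
import threading

broker = queue.Queue()          # the "broker": holds tasks until a worker picks them up

def producer():
    # The producer only enqueues work; it never waits for the result.
    for invoice_id in range(5):
        broker.put(invoice_id)
    broker.put(None)            # sentinel: no more work

def worker():
    # The worker (consumer) pulls tasks off the broker and processes them.
    while True:
        invoice_id = broker.get()
        if invoice_id is None:
            break
        print(f"processing invoice {invoice_id}")

threading.Thread(target=producer).start()
worker()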
And that kind of overconfidence is exactly what happened to us. There was this small three-week window of two developers developing, and then, at the end of the three weeks, suddenly realizing it doesn't work. It's like, it's a really bad application. Like, it won't scale, it won't be maintainable, and actually the setup would be harder than with the usual one we have at Kiwi.com. So we basically lost three weeks of development time, because of what we had decided: we used Redis Queue, or simply the RQ framework, to implement it, and then we changed it to Celery. Changing to Celery took us around 16 hours, compared to three weeks of development time. So we effectively wasted six weeks of person-time, and that's why I'm actually here, to tell you why it all happened and what the best practices would be for you.

So the first reason why it happened was examples versus reality, because in both RQ and Celery you have these beautiful examples of a simple app, just how to scaffold the app, you know, like five lines, and that's it, right? Like, yeah, that's easy, let's do it, you know, because Redis, or RQ, is lightweight, so let's use RQ instead of the giant Celery that handles everything for you, you know. But in reality, we suddenly needed a repeater for the task. In RQ: not included, so you had to write it yourself. And then this kind of ugly mess was created to actually do a repetition without much configuration in it. Yeah? Don't try to read it, it's just to scare you off, you know. But surprise, surprise, in Celery it's included, so you just need to put some things in the decorator, five lines, and you're done, you don't have to write 50. And you have it all parameterized, it's all explained, it's all documented, and you're sure it will work. But also be careful, because when we were implementing Celery, and we saw the five-line example of how easy it is to integrate, we ended up with over 250 changed lines in the whole repo, which was at that time around a thousand lines. So almost a quarter of the project was changed because we implemented Celery. So be mindful about this as well. And suddenly we have a working application that's maintainable, it's running on Celery, which we are using throughout the whole of Kiwi.com, so we can get help anytime, anywhere, from any of our colleagues; some are more experienced in some areas, some are less experienced, so we can brainstorm together.

And with this we came, in our final stride, to the final setup of how we actually do it, how we scaffold our applications, how we develop them. So first we are using, of course, Python and PostgreSQL. On top of that we have Flask, or currently aiohttp. Together with that we have Connexion, which takes care of the REST API, and of course we have Celery. As a broker we are using Redis on AWS, so it's managed. We are using multiple deploy targets in our continuous integration pipeline, and we are using Logz.io and Datadog for monitoring, and we are slowly shifting everything to Datadog. And when something goes bad, and really bad, we are using Sentry and PagerDuty for notifying us. So that's how we do it, and that's how the Fantozzi application was developed as well. I will break down all the points here, so you can get to know it a bit better. With Python, we are always trying to shift to Python 3.6, and when we are starting a new project, we always do it in 3.6 or newer, usually 3.7 now.
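A rough, illustrative version of such a "five-line example", not the exact one from the docs or the slides; it assumes a local Redis broker and made-up module and task names:

# tasks.py -- roughly the minimal Celery app from the tutorials
from celery import Celery

app = Celery("fantozzi", broker="redis://localhost:6379/0")

@app.task
def process_invoice(invoice_id):
    # In a real service this would do the actual accounting work.
    print(f"processing invoice {invoice_id}")

# Producer side: enqueue without waiting -> process_invoice.delay(42)
# Worker side: run `celery -A tasks worker` in a separate process.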
We are also, as I mentioned in the beginning, trying to break everything down from a monolithic architecture to a microservice architecture, using task queues and asynchronous processing. Flask and aiohttp are the go-to frameworks for us, because we have boilerplates for them and we can scaffold them quite quickly thanks to cookiecutter templates. On the right you can see the example in Flask, and how we basically instantiate the whole Fantozzi application with all the monitoring, all the Sentry exception catching, and everything. Just a quick question: who knows what OpenAPI 3 is? OK, not many, so I will explain a bit. Connexion is like an extension, or actually a framework, for Flask and aiohttp and a couple of others, and it implements the OpenAPI 3 specification. So basically you specify a YAML schema of your API, and it generates documentation and validation for your API. So you have a beautiful Swagger UI, useful for other developers, and you can actually test it there, you have examples, and it's generally useful. So take a note: Connexion, or OpenAPI 3, is the way to go. And just a side note, OpenAPI 3 is the successor of the Swagger specification; it was just renamed, some law thingies. With Connexion we're also using token-based authentication, and, where it's needed, authorization. So we don't do, you know, JSON web tokens, because they are too complicated; you just need a secret, as a bearer token, and you're ready. We follow the best practices, which I will present shortly.

With Redis on AWS, we are using it because it's managed, it's reliable, and it's easy to deploy, so we don't lose any tasks when something happens, when, for example, something goes wrong, really wrong. Multiple deploy targets: we are usually deploying the HTTP API, the REST API itself, and together with that we are also deploying workers, and so on, and so on. And the beautiful guys from the platform team created a really useful thing for us. It's called Crane, it's available on GitHub on our kiwicom account, and it will help you to easily deploy to Rancher via GitLab CI, and it can help you with messaging channels or relevant people when you are releasing. With Logz.io and Datadog, we are using them to log everything extensively; like, when it isn't logged, it didn't happen. And with Datadog and their newest developments, we are slowly moving all the logs there, because we can join the tracing and the logs together, so we can stitch everything with the APM they provide. So that's a thing to consider as well. Sentry is for when something goes wrong: an exception happens, it's logged, the stack trace is logged, and we can reproduce the problem itself. And when something really goes wrong, we are using PagerDuty to wake our developers at 3 AM for nothing, basically. But hey, you get money for it, you know, because you're on call, right?

So, lessons learned, why we are all here, mainly you. The first one: use a Redis or AMQP broker, never a database, for Celery. You may ask why, because you already have a database in your system, so why not use it? Well, it's very simple. Yeah, but let me just wait for the camera, yeah? So, never a database, because imagine that you have like 20, 50 workers in your setup, and each of the workers needs to ask the database: hey, are there any new tasks that I can take? And the database usually replies no, because you have 50 workers, and then sometimes it replies yes.
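To make the Connexion part from a moment ago a bit more concrete, here is a minimal sketch of how such an app could be wired up; the file and function names are assumptions for illustration, not the actual Fantozzi code:

# app.py -- Connexion wraps Flask and connects the OpenAPI 3 spec to handlers
import connexion

def get_invoice(invoice_id):
    # Referenced from openapi.yaml via operationId; Connexion validates
    # requests and responses against the spec for us.
    return {"id": invoice_id, "status": "booked"}

app = connexion.FlaskApp(__name__, specification_dir=".")
app.add_api("openapi.yaml")   # the YAML schema of the API; also serves the Swagger UI

if __name__ == "__main__":
    app.run(port=8080)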
Back to the broker question: so imagine that you have like 50 queries to the database a second, just from the workers. And suddenly your product goes viral, you go to production, and it's used by millions of people. And suddenly the database starts failing, it starts to time out, it starts to underperform. Why? Because it's serving as the broker, and it's overwhelmed by the workers always asking for new stuff. So you have a lot of sessions open, and generally you're going to crash. Redis or an AMQP broker are designed for this, and they are independent systems, so if they crash, it happens, but you definitely have backups, or replicas, on Redis. Here is a small example of how to set up brokers for AMQP and for Redis: for Redis you need to install an extension for Celery, and then you can easily use it, just install Redis and you're good to go. That's easy.

Second thing to learn: pass simple objects to the task. When you have, for example, an ORM database model populated with data, you update it and commit it, and then you pass it to the task so you can work on it again without doing the query, and then you commit it again in the task. You can see where this might go, because the object is quite complicated, and it can go stale quite quickly. So when you put the object into asynchronous processing, it can go stale without you knowing it, and then you will create a conflict in the database, and so on. It's much better to pass just the primary key of the object, query it again, and have fresh new data that you can rely on. With that, you will avoid these kinds of problems, which are really hard to debug, because it's basically a race condition.

Third thing: do not wait for tasks inside tasks. I will talk a bit more about this and explain it later on, but when you're waiting for tasks inside tasks, you are creating an endless loop if you have repetitions and you don't have a retry limit and everything, so you will end up with a stuck task that is endlessly trying to do something and is blocking everything, basically. So you can end up with quite a haywire in your system. This comes together with: set a retry limit. It basically tells Celery how many times it can retry the task before it just gives up, raises the exception, marks the task as failed, and lets you handle it yourself. It's really easy, it's in the decorator itself, just max_retries and you're good to go. Then autoretry_for: this is a really handy feature, because you can specify an exception on which the task will be retried. But again, don't forget max_retries, otherwise you can end up with an endless loop of a single task which is occupying one of your workers. So you just define the exception on which you want it to be repeated, and again you're good to go. We are slowly building up the decorator, you see, so it's now multi-line.

Use retry_backoff and retry_jitter. With backoff, you are specifying that the wait time between the retries will grow exponentially. There's a beautiful formula for that on Wikipedia, but don't bother, it will just prolong the periods of time. For example, when you have an API that you rely on and it returns a 500 error, you can wait one second first and then retry it. Again, it's down, it's still down. Then you wait four seconds. It's still down, never mind. Five seconds of downtime, still fine. But then you wait for another 15 seconds or so, and suddenly the server is up, your task is done, and you're happy to go again.
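Putting the retry options just mentioned together, a hedged sketch of what such a decorator can look like; it assumes the Celery app object is called `app`, and the task name and endpoint URL are made up:

import requests
from requests.exceptions import RequestException

@app.task(
    autoretry_for=(RequestException,),  # retry only on network-ish failures
    max_retries=5,                      # never retry forever
    retry_backoff=True,                 # exponentially growing wait between retries
    retry_backoff_max=600,              # cap the wait at ten minutes
)
def fetch_exchange_rates(currency):
    # A 5xx from the partner API raises RequestException via raise_for_status(),
    # and Celery schedules the retry with backoff for us.
    response = requests.get(f"https://api.example.com/rates/{currency}")
    response.raise_for_status()
    return response.json()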
With retry jitter, this is for the same task happening at the same time: when the retries are happening, the jitter will add a small amount of time to, or subtract a little time from, the backoff, so the repetitions of the task don't happen at exactly the same time, and you don't basically DDoS the other service, for example. And again, with the retry kwargs, always set the limit.

Set hard and soft time limits. The soft time limit basically tells the task: you know, you should end, so end gracefully. The hard time limit will kill it without mercy, and then again exception and error handling will happen. Use bind for a bit of extra oomph in your tasks, basically meaning that you will get a reference to the task itself, so you can log more and you can retry with contextual info. So you can, for example, decide that on a network error you will try five more times, but on an integrity error or anything like that you just give up, or you just log it and give up, because it's the fault of the data, not the API itself. So you can use, for example, logging, as you can see here: we log to standard out, and we use it to actually get the stats for the task, whether it was successful or not, in quite an easy manner.

Separate queues for demanding tasks. Imagine that you have a task that communicates with a very, very slow API; it takes like 10 seconds to actually get a response. And next to it you have tasks using a super fast API that takes milliseconds. And you have a single queue for all of that. You can imagine that the long-running tasks will eventually starve, because the shorter tasks come much more often and will always be preferred, and eventually the long-running tasks will go stale. So it's always better to separate these kinds of tasks into their own queues. Like, for example, here you have a fast and a slow queue; this is a generic example, really, it's always better to name them a bit more precisely. And then with apply_async you just specify the queue and you're good to go. It will help you tremendously, and of course, when you have multiple queues, don't forget to deploy multiple workers which handle only that specific queue.

Prefer idempotency and atomicity, and because I'm a lazy developer and didn't remember the full definition of idempotency and atomicity, I asked good aunt Wikipedia to help me here. Idempotency basically means that when you call the task multiple times, it will always produce the same result, and atomicity means that when you call the task, it appears to the system as atomic, meaning it happens as a single unit, without partial side effects. To sum it up: use Redis or AMQP, pass simple objects to the task, don't wait for tasks inside tasks, set a retry limit, use autoretry, use backoffs, use jitter, use bind, use separate queues, and always prefer idempotency and atomicity.
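To tie the last few points together, a sketch, again with invented names, of a bound task with soft and hard time limits, jittered retries and its own slow queue; `app` is the Celery app object and `do_slow_partner_sync` is a hypothetical long-running call:

import logging
from celery.exceptions import SoftTimeLimitExceeded

logger = logging.getLogger(__name__)

@app.task(bind=True,
          autoretry_for=(ConnectionError,),   # transient network problems get retried
          max_retries=3, retry_backoff=True, retry_jitter=True,
          soft_time_limit=30, time_limit=60)
def sync_invoice(self, invoice_id):
    # bind=True gives us `self`, so we can log contextual info such as the retry count.
    logger.info("sync_invoice %s, attempt %s", invoice_id, self.request.retries)
    try:
        do_slow_partner_sync(invoice_id)      # hypothetical 10-second partner call
    except SoftTimeLimitExceeded:
        logger.warning("invoice %s hit the soft time limit, cleaning up", invoice_id)
    except ValueError:
        # Bad input data is our fault, not the API's: log it and give up, no retry.
        logger.error("bad data for invoice %s, giving up", invoice_id)

# Route it to its own queue, served by dedicated workers:
#   sync_invoice.apply_async(args=[invoice_id], queue="slow")
#   celery -A tasks worker -Q slow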
Those are the lessons learned, and there are also things to consider with Celery, because it's a really powerful framework, so you should always take into consideration what can go wrong there as well. With Celery you are sharing the code base between the producer and the consumer, so you need to watch out for circular imports, for the way the imports work, and for what will load when the worker is starting versus what will load when the server, the producer, is starting. To use it to its full potential, read the Celery docs. They are huge, but it's a nice evening read, like when you have nothing to do: come on, let's read about workers today. You don't have to read it carefully and remember everything, like every param that is there; just remember that something like this exists and that you can use it, because eventually it might come in handy. So be mindful, read the docs. And also always bear in mind that you are using third-party APIs, and they don't have to scale like your application does, so be mindful, because the developers of that third-party API might not be happy when you shoot them down.

That's most of my talk done, thanks for listening. I would like to invite you to our party today after EuroPython, where you can win flight vouchers. It's an invitation party, so visit our booth, it's somewhere there on the left, I was trying to pinpoint the location. You can definitely talk to us, I will be there after the lunch, so we can talk together and find out more about the party, and there is also more about the party at themeetqe.com. One small thing there, there's a small error: I am still a Python engineer, I am not an engineering manager, that's still waiting for me. Thank you.

We have about three more minutes for questions, so if there are any questions... Yes, of course, Flower is allowed, yes, definitely, it's a nice thing to have, but you need to know how to actually use it. So yes, for monitoring, for more granular monitoring, I definitely recommend it, but honestly, personally, I prefer my own monitoring, where I can get alerts and everything for exactly what I need. If you design it well, you don't need it, but we can talk about it later. There was one more. Thanks for the talk. What is your experience when you are using aiohttp, and have you investigated, I think, an asyncio task system like arq or some other things with aiohttp? We are still at quite an early stage, so we don't have long-term experience with it, but basically, if you understand the async paradigm, it's kind of okay, let's say. We didn't have big problems with it yet, so no lessons learned yet, no expensive things to learn. Thanks. I would like to know how you do your health checks on the Celery workers, on the machines. You mean health checks?
We have quite a few, and we recently put them up, but the health checks are taking a lot of processing, so we were wondering if we are doing it right; I would like to know how you do it. We don't do very good health checks; we do logging, and through the logs we can see what is happening. And besides that, we are deploying quite often, so if you are, for example, asking about memory consumption and memory leaks, we don't care much about them, but rather about restarting the workers regularly, and Rancher itself, or any container management, can be set up to restart them regularly to return to a healthy state. So there are health checks, but mostly for the whole API, or for the whole application, to see whether the database is stable, or the connection to the database is stable, and whether everything is communicating properly. And we can talk about it later on. Thank you.