Hello, good morning everyone, welcome. My name is Krzysztof Opasiak. Some of you may know me from a couple of USB-related talks, so I may disappoint you: today I'm not going to talk about USB at all, so if you came here to listen about USB, you've still got time to leave. If not, let's go to today's topic.

During the last year I took part in a project whose main goal was to keep Tizen, as an embedded distribution, up and running even when we accept some foreign code. Foreign code means that a developer is able to add their own RPMs to the image, and we would like to keep the user experience good anyway. It means that we don't want crashes, and it means that we don't want resource leaks. Our main target is long-running devices, which means IoT. Long-running devices are in a way similar to servers: both run for a pretty long time. So to solve our problem, to keep our platform up and running, we tried to learn from the previous experience of server administrators. Then we came up with our own solution, because it turned out that the solutions known from the server world are too heavy for our small embedded devices. The whole Tizen image for IoT is about 40 megabytes, so we would like to keep our solution small. The plan for today is: the problem statement, the solutions known from the server world, our solution, a short summary, and questions.

So what's the problem? To help you understand it, I would like to talk about California. Who has ever been to the USA? Raise your hand. Did you enjoy your visit? I was in California a couple of years ago. It's a really nice place — sunny, beautiful. When I went there, I saw a lot of houses like this one. The house is big, but what caught my attention is this beautiful green lawn. I thought: California seems to be a rather dry place, and to have green grass you need a lot of water. So I expected lawns rather like this one.
So I thought: to keep this lawn green, they have to water it a lot, so it is costly. If you are doing this manually, it requires a lot of time; if you automate it, it may be beneficial, but you have to make some initial investment. So I talked with some guy from California. I asked him: how are you doing this? How are you keeping your lawn so green? He answered me: we are just faking them. They are just faking their lawns, because they know that user experience is important. Their houses are like showcases. When you come to someone's house and you see a beautiful green lawn, you know that this house is well kept. When you see a lawn like the one from the previous picture, really dry, you say: oh my gosh, they could do better.

And the same goes for software. Our products are judged by the eyes, mostly. So the most important thing is user experience, the UI — but not only, because performance and reliability also influence how your product is perceived. Users don't want your device to hang; they want it to run smoothly, and they want your product to be ready to use every time. So how do you do this? To ensure that your product runs smoothly and is well regarded by users, you have to put some effort into development.

In our case we are not selling a product; our product is free. Our product is a Linux distribution: the Tizen operating system. As I said, a Tizen image is customizable. It means that you may log into the web configurator and choose which features you would like to include in your image, and then you may also add some custom RPMs. We don't know what you're adding to your image. We don't know what your use case is. We don't know if your code is of good quality. So, all in all, you end up with a platform where some pieces of code come from us — and we may say that they are pretty good — or they come from open source projects, and we can say that they are really nice.
And then there is the code that you are uploading, some foreign code, and we have no idea about the quality of this code. But in the end, you may always place a sticker on your product and say: hey, I'm running Tizen. It doesn't matter that on top of Tizen's good services you added some crap — you may always say: I'm running Tizen.

So, for the first kind of code: to have a good piece of code, you need a well-defined software development process, like in open source. You've got code review, you've got tests. But there is a question: how long are you testing your products? How long are you testing your software? Are you testing your software running for a year? Not really. Usually you just run some unit tests and say: okay, it's working, so it should also keep working for a year. That's not always true. You need continuous integration, and you need static analyzers to improve your code quality. But even if you put all this stuff in place, your code is not going to be perfect. Take a look at the Linux kernel: there is a lot of effort, a lot of reviewers, but there are still bugs. That doesn't mean you can throw all this stuff in the trash bin and stop using it — no, no, no, that's not what I'm trying to tell you. Please keep it in place. But remember that even if your process is really well defined and you put in a lot of effort, your software may be imperfect. This means it requires monitoring — and this is what server guys know really well.

The second kind of code is the foreign code, the code you upload to the platform. We have no idea if it has been reviewed at all, whether it has been tested or not; you are just uploading it to the image. So it requires monitoring during runtime even more. So what kinds of problems may we encounter? There are memory leaks, file descriptor leaks, bugs — and by bugs I mean everything which may make your service behave unexpectedly or make your service crash. There are boot loops.
If you reboot your platform you may enter a boot loop — it's common, especially during updates — and many other problems. So our solution has to be extensible, to allow you to customize your monitoring infrastructure.

So how do we fix those problems at runtime? We may try restarting the service from time to time. It's a common practice from the server world: my web server seems to run fine for about a day, but then it starts using all the RAM. How do I fix this? Well, you should go to the code and fix it. But then you open the code base and say: oh my gosh, a million lines of code, I'm not going to find the bug here. Okay, so let's just restart it every day. Then you have fix scripts, because services may also corrupt their data. So you have some simple fixer — not a change inside the service code itself, but some script that's going to repair the database or whatever. You may have recovery code: a separate initramfs you can boot into, where, in a well-prepared, well-defined environment, you try to fix your problem. In the end — or maybe first of all — you should report the bug to the developer. It's important not only to restart the service but also to let the developer, the one who wrote this piece of code, know: hey, you've got a bug in your service, you should try to fix it, so a better version can be shipped in the next release. And of course any other method you can think of.

So, as I told you, keeping a platform up and running is a well-known issue in the server world. Those guys have really hard contracts for the availability of their services, like 99-point-some-number-of-nines percent. It's really hard to keep your service up and running for such a long period of time. So how are they doing it? The answer is monitoring and service restarting. The first tool which helps keep a service up and running is systemd. Who is using systemd? Yeah, that's what I thought.
It's common and it's almost free to use. systemd provides two nice options to fix your service, or at least to restart it. The first one is Restart=. It means that systemd is going to automatically restart your service based on the conditions you choose. The most common variant, for a long-running service, is to restart the service every time it enters the failed state. Failed state means that it has been killed by a signal, it has exited with an unclean exit code, or something like that. The other method of fixing the service is the OnFailure= option. OnFailure= names a unit which will be automatically started by systemd when a service fails. So instead of simply restarting the service, we are running some kind of fixing script, or some kind of developer-notification mechanism, something like that.

But it's not enough for the server guys. That's why there are more extensive monitoring tools, like Nagios — I think it's one of the most popular solutions. The core parts of this software are the scheduler and the web interface. Why do we need a scheduler? Well, the very basic principle of Nagios is to run some script every 10 minutes, every hour, or every minute, to check the status of a service. We just check the status and collect the result in our database. If the service has failed — or, for example, some server is not responding to a ping — we change its state. There are five different states, and on such a change we generate an event. This event may be handled by event handlers. Usually an event handler is a simple shell script that is going to do the job. So, for example, if you run a ping test against some server, you may handle the fact that it went from the OK state to soft critical — meaning the server is down — by writing a shell script that power-cycles the server, to reboot it and try to keep it up and running, and that obviously notifies the administrator that something is wrong with that machine.
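The two systemd options just described can be combined in a single unit file. Here is a minimal sketch — the service and unit names are made up for this example, and `RuntimeMaxSec=` (available in newer systemd versions) is one way to implement the "just restart it every day" workaround mentioned earlier:

```ini
# my-sensor.service -- hypothetical long-running service
[Unit]
Description=Hypothetical sensor daemon
# Start this unit whenever my-sensor.service enters the failed state,
# e.g. a fixing script or a developer-notification mechanism.
OnFailure=my-sensor-fix.service

[Service]
ExecStart=/usr/bin/my-sensor
# Restart automatically after an unclean exit code, a fatal signal,
# or a timeout.
Restart=on-failure
# Optional: terminate the service after one day of runtime; together
# with Restart=on-failure this restarts it daily.
RuntimeMaxSec=1d
```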
The other way of checking status is passive checks. Classic plugins — classic scripts which check the state — are run by Nagios itself, say every 10 minutes. Passive checks are different: an independently running script or service submits the status whenever it wants to, or whenever the status changes. So, for example, if you would like to trace the amount of RAM used, but only with 100-megabyte resolution, you may report the status to Nagios only when crossing a 100-megabyte boundary of RAM usage, not on every 1-megabyte change and not every 10 minutes. The disadvantage of active checking is that you have no idea what happened between the checks, so you've got a 10-minute gap in your measurements.

Icinga is a kind of fork of Nagios, and it has been created to be more distributed: there are multi-node setups, separate databases, et cetera. People say they are both equivalent, but I think this one is much better prepared for being distributed. It's still very heavy, though. It provides quite similar functionality, plus additional mobile clients — which doesn't matter for us, because we are doing embedded stuff.

The next one is Zabbix. What's different from our perspective is that it has many more event types and a rule engine. In Nagios and Icinga you just say: if the state of this check changes from OK to soft critical, or to soft warning, then execute this script. Here you may write a rule that combines the status of multiple services, of multiple checks, and executes the script, for example, only if five or ten services are down, not just one. So in Nagios you look at a single piece of state, and here you may consider the state of the whole system, of all the checks. But it's still really heavy.

So, in general, server monitoring tools are really good. In addition, they are web scale, like MongoDB. So it's really good software, but not tailored to embedded needs.
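As a concrete illustration of passive checks, here is a minimal Python sketch of submitting a result through Nagios's external command file. The pipe path, host name, and service name are assumptions made up for this example (the path varies per installation), and the 100 MB threshold matches the RAM example above:

```python
import time

# Nagios external command pipe; the actual path depends on the install
# (assumption for this sketch).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

def format_passive_result(host, service, return_code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line.

    return_code follows plugin conventions:
    0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
    """
    ts = int(now if now is not None else time.time())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        ts, host, service, return_code, output)

def submit(line, command_file=COMMAND_FILE):
    # The command file is a named pipe read by the Nagios daemon.
    with open(command_file, "w") as f:
        f.write(line)

# Example: report crossing the 100 MB RAM threshold only when it happens,
# instead of being polled every 10 minutes.
line = format_passive_result("iot-gw", "ram_usage", 1,
                             "WARNING - RSS crossed 100 MB")
```

The monitored service (or a tiny watcher next to it) calls `submit(line)` only on a state change, which is exactly the event-driven behaviour discussed above.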
That's why we just couldn't take one of them and fit it in our pocket; we needed to develop something small. First of all, all those tools have a central decision-making point. They may be distributed among multiple nodes, but all the devices, all the checks, report to some cloud, to some kind of central server. We don't want to make decisions about your devices, we don't want to collect your data — we would like your product, your service, to be independent. That's why we don't want to keep the state on some remote machine, but on your product itself. So it's single-machine monitoring, no web interface, the delays should be low, and we should not set any polling interval.

So we focused on passive checks — for example, a service failing, or a change in the number of file descriptors — instead of executing a check every 10 minutes. Why? Because it's more power-efficient to do the check only when the status actually changes. We made some measurements, and it turned out that most of our IoT devices are usually idle. It means that they are in a deep sleep state, and there is no point in waking them up every 10 minutes, or every 5 minutes, or every hour, just to check a status that didn't change.

So what have we developed? We created faultd. From our perspective it's a generic event-processing framework. It consists of a single daemon which has, as usual, inputs and outputs. The input is a change of state in the system — for example, systemd may notify us that some service failed. This input is captured by a listener and reported as an event to the core. Every event that goes through the core is saved into the database for further analysis. Then we pass it to a decision maker. What is a decision maker? A decision maker is the business logic that decides what should be done in this case. It may contact the database and check the history; it may contact some external server.
It may show the user a pop-up, or send a push notification to some mobile phone — it doesn't matter. faultd has a plugin architecture, so you may write a decision maker of your own. When the decision is made — so now we know what we should do — there are implementations of actions. Currently we've got only four actions: restart the service, recover the service, reboot, and reboot to recovery. Because the implementation of an action may vary between different platforms, between different devices, the actions are kind of abstract: the business logic only decides what kind of action should be taken, and this part decides which implementation should be used.

So what kinds of listeners do we have now? First of all, there is the systemd listener. It subscribes on systemd's private D-Bus socket. Why private? Because the D-Bus daemon is a service — to be more precise, a critical service for our platform, because most of our services use D-Bus for communication. So we don't want to get notifications from systemd via the D-Bus daemon; we want to get them directly. That's why we use systemd's private D-Bus socket, which doesn't involve any other daemon in the communication. systemd sends us a signal whenever some service fails. When we get this signal, we report an event to the core and then run our business logic to decide what to do with that service.

Another type of listener is the audit listener. Why do we use it? Because we would like to catch resource leaks: file descriptor leaks, memory leaks. For every service on our platform we set limits — the memory limit, the file descriptor limit, all the standard rlimits. Then we use the audit syscall infrastructure to catch the moment when a service gets an error from a suitable syscall, report it to the core, and decide what to do with that service. What's important here is that this solution is not perfect, because the service still gets the error.
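The flow described above — listeners reporting events to a core that persists them and consults decision-maker plugins before invoking an abstract action — can be sketched roughly as follows. All the names here are hypothetical illustrations for this talk, not faultd's actual API:

```python
# Minimal sketch of an event-processing core in the spirit of faultd.
# Listener -> core -> store -> decision makers -> abstract action.

class Core:
    def __init__(self, store):
        self.store = store            # every event is persisted for analysis
        self.decision_makers = []     # plugins containing the business logic
        self.actions = {}             # abstract action name -> implementation

    def register(self, dm):
        self.decision_makers.append(dm)

    def report(self, event):
        """Called by listeners (systemd listener, audit listener, ...)."""
        self.store.append(("event", event))
        for dm in self.decision_makers:
            action = dm(event, self.store)      # decide what to do
            if action is not None:
                self.store.append(("decision", action))
                self.actions[action](event)     # platform-specific impl
                break

# A decision maker: reboot if a "VIP" process (e.g. the D-Bus daemon) dies.
VIP = {"dbus-daemon"}
def vip_process_handler(event, store):
    if event["type"] == "service-failed" and event["name"] in VIP:
        return "reboot"
    return None

core = Core(store=[])
core.actions["reboot"] = lambda ev: print("rebooting:", ev["name"], "died")
core.register(vip_process_handler)
core.report({"type": "service-failed", "name": "dbus-daemon"})
```

The separation matters: the decision maker only names the abstract action ("reboot"), while the `actions` table maps it to whatever implementation the platform provides.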
As you know, handling errors in a service, especially errors which are uncommon, like EMFILE — who has ever gotten an EMFILE error in their service? Please raise your hand. Not so many people; you see, it's not common. That's why error-handling paths are not well tested and usually contain bugs. The perfect solution would be to get a notification that a service has hit some resource-usage level without returning an actual error to the service itself. Apart from this, it turned out that for our platform the audit syscall infrastructure is not free. It's rather expensive: if you use it, it slows down your open() syscall by around 33% for a cold file and up to 45% for a hot file, a hot file meaning one that is already in the kernel caches. That's why we developed error limit events, a lightweight mechanism for notifying user space when a service reaches a given amount of resource usage. We measured it, and the overhead is 5.6% for a hot file and 1.6% for a cold one. Unfortunately, after I developed this and posted an RFC, I was moved to another project — so if there is someone who would like to pick up those patches, feel free to do so and try to mainline them.

The next part of faultd is the decision makers. Currently we've got three of them, just as a showcase. The first one is the VIP process handler. A VIP process is, for example, the D-Bus daemon: when the D-Bus daemon dies, there is nothing left to do on the system, so it's time to reboot. The standard recovery decision maker means that if some service fails 10 times, it gets recovered. Recovery means that if the service defines a recovery script, we run it; if not, we simply restart the service. After recovering the service 10 times, we do a reboot. When we detect a reboot loop, we enter recovery mode — and maybe send a push message to your phone: something is wrong with your IoT device, try to check it. The last one is the decision maker for resource violations.
A resource violation means a message from audit or from the error limit events. If we detect that some service has a resource violation, we may generate a report for the developer — hey, your service is using much more resources than you declared, you should take a look at this — and then we restart the service.

As for actions, like I told you: we've got recover the service, which means run the recovery unit and then restart the service; simple restart; and reboot — and we've got three different types of reboot. The first one is forced: we just use the syscall. The second one is a reboot via systemd — once again, we use the private bus to talk to it. And the last one is Tizen-specific, because on Tizen the platform is rebooted using deviced. Because that last one may be unreliable, we try it first, and if it fails, we fall back to the previous options. The last action is reboot to recovery. It's pretty much the same as reboot, but with a special parameter which makes Tizen reboot into recovery mode.

The last part is the database. Like I told you, every event is stored in the database. Why? Because faultd is a daemon that runs as root; faultd is a daemon that reboots your platform. So, as a user and as a developer, I would really want to know who is rebooting my platform, and why. That's why we decided that every event and every action should be traceable from beginning to end. Every event reported by a listener is stored in the database as a trigger; when a decision is made, it is also stored in the database, and so is the outcome of executing every action. So, for example, "we failed to contact deviced and we are using systemd to reboot", or something like that, is also stored in the database. This also makes it possible to implement a decision maker which checks not only the current state of the system but also the previous actions — so we can, for example, detect those reboot loops.
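A minimal sketch of such a traceable event/decision/action trail, using SQLite with a made-up schema (not the project's real one), including the kind of helper a decision maker could use to detect a reboot loop from the recorded history:

```python
import sqlite3
import time

# Hypothetical trail: one row per event, decision, or executed action.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trail (
    ts      REAL NOT NULL,   -- when it happened (epoch seconds)
    kind    TEXT NOT NULL,   -- 'event', 'decision' or 'action'
    detail  TEXT NOT NULL)""")

def record(kind, detail, ts=None):
    """Persist one step so every action is traceable from beginning to end."""
    conn.execute("INSERT INTO trail VALUES (?, ?, ?)",
                 (ts if ts is not None else time.time(), kind, detail))

def reboot_loop(window_s=600, limit=3, now=None):
    """Have we already rebooted `limit` times within the last window?
    A decision maker can use this to escalate to recovery mode
    instead of rebooting yet again."""
    now = now if now is not None else time.time()
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM trail WHERE kind='action' "
        "AND detail='reboot' AND ts > ?", (now - window_s,)).fetchone()
    return n >= limit
```

Because decisions consult the same trail they write to, the "check the previous actions, not just the current state" behaviour described above falls out naturally.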
Initially, for our database backend, we chose EJDB. It's a kind of object database without an external daemon. It's really fast, but it turned out not to be efficient in terms of storage — not so much the storage for the database itself, because the database may be trimmed to one megabyte, but the problem is that the EJDB binary, together with the Tokyo Cabinet engine it uses, takes around 700 kilobytes, and that turned out to be too big for our platform. That's why we are now switching to SQLite. The default schema we used at first was around 10 times slower, but after tweaking the database schema we got down to only about 2 times slower.

Okay, so, to sum up: how does faultd itself look? There is the systemd listener, with systemd notifications as its input, and the audit listener, fed from the Linux kernel. What's important here is that before faultd we didn't use audit on our platform, and we don't run the auditd daemon. Why? Because it was slow, because it took some space, and because no one ever used it — so there was no reason to keep it, and we listen to the kernel's audit events directly. Then we've got the core part, which simply routes the events, plus the database, the actions, and the decision makers.

So, to summarize, what did we learn during this year? First of all, server monitoring tools are really useful. If you've got your own server, try to install them and play with them; there are a lot of benefits, and you may see what your server is really doing. It's important not only in terms of fault tolerance, of keeping your grass green — it's important in terms of security. Because, as my friend told me, on the Internet there are no machines that do nothing: all machines connected to the Internet may be subject to attacks and used, for example, to mine bitcoin.
So if you are using IoT devices, remember to monitor them as well — you know the Mirai botnet, you've all heard about it, so you know what those IoT devices can be used for. Unfortunately, those tools are too big for small embedded platforms; that's why we developed faultd. We tried to keep it generic, so if it fits your use case, great — try to use it. It's free and open source, available on git.tizen.org. If it doesn't fit your use case, try to develop some plugins; it should be easy, and maybe it would be better than writing everything from scratch. The next lesson is that the audit syscall infrastructure is not free. This is kind of surprising, because a lot of servers use audit to trace activity, but it turned out that on ARM platforms the audit overhead is much higher than on x86: on x86 it's around 5 to 10%, and on ARM it's at least 3 times that. EJDB is pretty fast, so if you need a fast object database for your platform, try to use it — unfortunately, you have to have some spare storage for both the library and the database. And that's all. Questions?

[Audience question.] Well, faultd is not using any distributed system. The goal is that faultd should operate independently. It means that we don't want to have some central decision point, and we don't want some central cloud to collect the information. We want only reports about resource violations or about some attacks — but it's up to the manufacturer, because we are releasing just the image. [Question about target devices.] Probably some kind of IoT gateways. Obviously it depends how you configure your image — I mentioned that there is a web configurator, and you can build any of these. Some more? Yeah. [Question about cgroups.] We didn't need to handle cgroups, because cgroups v1 already provided that kind of functionality, at least for the memory cgroup. We didn't develop such a plug-in.
We didn't develop such a listener, but we were testing this for cgroups, and like I told you, for the memory cgroup there was functionality similar to error limit events: you could subscribe and get a notification when a cgroup was using some specific amount of memory. It was really cool, but it has been dropped in cgroups v2. Some more questions? Yeah. Am I familiar with Chaos Monkey? No, I'm not familiar with it. So, just to repeat: Chaos Monkey seems to be a system that randomly shuts down some servers just to check if they are able to recover. I'm not aware of this solution and we didn't try to use it. Thank you. The question is whether we evaluated Monit. We tried to evaluate Monit, but it also didn't fit. I didn't mention it because it's quite similar to the other tools — all the server tools are alike; they have some different features, but the overall idea is quite common. We don't have a scheduler here, we don't have active checks; we use just passive checks, just signals from the system. Any other questions? The question is whether there is a kind of rule engine inside. Well, yes, because we've got the database, and every decision maker can contact the database and check the current state of the system — and not only the current state, but also the history of the system, so we can check, for example, for how long a service has been failing. Any other questions? If there are no questions, then thank you for your attention.