Okay, welcome everybody. Thanks for joining this session so late in the afternoon. I'm an absolute greenhorn here: it's my first time at this Linux conference and my first time as a speaker. I'm going to talk about automating and managing an IoT fleet using Git. First, a few words about me. My name is Matthias Lüscher, and already as a child I preferred to automate boring stuff, like a ball track. It's utterly boring to operate such a ball track by hand, so I instead created an elevator that plays with it for me. The same holds true for my professional life: I don't like to do the same job over and over again, so when dealing with IoT devices I try to automate as much as possible. Another thing: instead of attending a lot of courses and earning some training certificates, I decided to start my own open source project. It's called EDI, and you will learn a bit more about it during this presentation. EDI is my private project, and it's open source. At Schindler, we manufacture elevators and escalators. Unfortunately, the elevators and escalators in this building are from KONE, what a pity. Schindler is the big playground for EDI. I live in Switzerland and work as a principal engineer at Schindler. During my spare time, I enjoy being out in nature with my family; my two boys like to go biking, hiking, skiing, whatnot. And if you'd like to contact me, there is my email address. Okay, so what's my mission? I would like to automate as much as possible in an IoT environment, including OS image builds, testing, configuration management, and fleet management. We will learn more about this during the presentation. We start with continuous integration. The goal is to build an operating system image for an IoT device, dispatch it to a selected device, and test it, all in an automatic manner.
Once we have this operating system image we can put it on a device, but the device is still dumb, maybe a headless device, and we want to turn it into something that fulfills the use case we need. That's device management: taking care of an individual device. And finally, I prepared a small sample for you. It's not the big Schindler playground; it's a sample fleet where we do continuous delivery, keeping the entire fleet up to date based on the description we have in Git. So let's start with continuous integration. Again, we would like to build an operating system image for an IoT device that gets automatically dispatched and tested. How does this look? Down here we have a private network, and the devices are on that private network. Up here we have the internet, and here we have a self-hosted GitHub runner. How does it start on GitHub? We enter some values: which device we would like to test on, what kind of configuration we would like to test, and of course the ID of that device. Then the job goes to our self-hosted runner. The self-hosted runner fetches some repositories, namely the configuration of our operating system image. Then the build of the operating system image starts on this Raspberry Pi. We fetch, of course, a lot of Debian packages and put them onto the self-hosted runner. Finally, we have an operating system artifact; in my case, it's a Mender artifact. We upload it to hosted Mender, and then we tell Mender: hey, please dispatch this artifact to this test device. Or better said, we let the device know that there is a new artifact; the device fetches it, does a proper A/B update, and the new operating system is running here. So no flashing of SD cards and so on. And as the second-to-last step, we would like to test.
We run a test suite here, because we would like to test this device. We are on the same network, so we can easily go into the device and run some tests; in my case, I'm using Testinfra. And finally, when everything is done, the result goes back to GitHub, and we see: oh, we have a new operating system image, and it passes all the tests. So it's extremely convenient, no manual steps involved. Let's dig a bit into the details. If you can't read the slides from the back, they're available online, so you can follow along on your mobile phone or laptop. Again, what did we enter here? We entered the repository that contains the description of the operating system image of that device; we take the master branch of that repository. We say: hey, we would like to have this configuration. And finally, this is the identity of the device: a UUID that points to a certain device registered on Mender. And of course, we tick the checkbox to test the whole thing. More often than not, I have seen that these Debian folks have this golden image approach: they put something on the device, then they tweak it a bit with some bash scripts and maybe some handmade commands, and that's the golden image. I must admit that's not such a clever approach. Therefore, we start from scratch, using the standard debootstrap tool to get a minimal root file system. Of course, if you have an x86 CPU, you cannot run the target's ARM binaries directly on your CPU, so you have to use QEMU for emulation. Then we do some additional steps, and here is a difference between the EDI tool and other tools: I'm not using a chroot. Instead, I'm using an LXD container; I put the whole root file system into an LXD container. And I don't start systemd; that's also an important thing I don't do. I use dumb-init instead, so basically nothing gets started inside the container. And now we need to tweak it a bit.
We need to install additional packages, maybe change some configurations, and so on. For this, I use Ansible. Once Ansible is done, we stop the container and export it. There are some additional tweaks so that we get a Mender artifact or an image that you can flash to the device. Finally, we have a complete operating system image, maybe even some documentation of the image as a PDF document. These are the artifacts we end up with. Now we have this image on the Raspberry Pi, and of course we want to bring it to Mender, dispatch it to the device, and finally test it. And what about testing? We're dealing with IoT devices, so let's treat them like servers. There are good tools like Testinfra. You can easily run the pytest/Testinfra suite here, and the checks get executed against the device. I have several examples: for instance, I test whether the root device got properly resized, I check whether the setup is complete, and I check whether the mount points are correct. Another test that I have, not depicted here, simply checks: did systemd start properly, did all services start properly? You can add as many tests as you want, so it's a really convenient setup for testing. As you can see, there is almost no overhead in writing tests, although you do this remote testing from here to there using SSH. Really worthwhile to look at. One important thing is security. Of course, I don't want to store passwords and so on in the Git repository; that would be bad practice. On GitHub Actions, you can store them as action secrets. And then, whatever step you do, you have to think about how someone could potentially attack you with a pull request and whatnot. I guess GitHub does this pretty properly on their side, but you also have to make sure you read GitHub's instructions, especially if you have a self-hosted runner.
If you have a self-hosted runner, you have to make sure that you don't attach it to a public repository; otherwise this setup would be insecure. So if you take a look at the EDI repositories, there is an edi-ci and an edi-ci-public repository, and that's just because we have the self-hosted runner down there: the self-hosted runner connects to the edi-ci repository and not to the edi-ci-public repository. Okay, now we have this operating system image and we can dispatch it to a device, but we still have a dumb, headless device. Let's now put a certain use case onto the device. According to the eat-your-own-dog-food strategy, I would like to turn the Raspberry Pi into a GitHub Action runner, the same kind of self-hosted runner we have been using before, and I would like to do that fully automatically. So again, we have an individual device that we would like to manage. As a first step, I enter some values into the mender-configure application: the playbook I would like to run, which version (in my case the main branch), which repository and which GitHub account the runner should connect to, and an access token. Then the configuration artifact makes its way down to the device. The image that I have put on the device is what I call GitOps-enabled: it knows how to deal with Git, and it knows how to deal with Ansible. As a first step, a script on the device fetches the playbook and analyzes it: oh, we have to fetch some roles. Then Ansible starts to execute the whole playbook. It fetches the runner binary, which is a .NET binary; I don't care about it that much. We fetch it, put it onto the device, and also install some Debian packages on top. Finally, the runner registers itself with GitHub. So we don't just automate the image builds, we also automate the infrastructure we need.
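The device-side steps just described (clone the playbook, resolve its roles, apply it locally) can be sketched roughly as follows. This is not the actual EDI script; the repository layout (`requirements.yml`, `playbook.yml`) and the function names are assumptions for illustration.

```python
import subprocess
from pathlib import Path

def apply_playbook(repo_url: str, version: str, workdir: Path,
                   extra_vars: dict) -> list[list[str]]:
    """Build the command sequence a GitOps-enabled device could run:
    clone the configuration playbook, install its role dependencies,
    then apply the playbook to the device itself."""
    checkout = workdir / "playbook"
    commands = [
        # Shallow clone of the requested playbook version.
        ["git", "clone", "--branch", version, "--depth", "1",
         repo_url, str(checkout)],
        # Role dependencies are assumed to be listed in requirements.yml.
        ["ansible-galaxy", "install", "-r", str(checkout / "requirements.yml")],
        # Apply locally: the device configures itself, nothing pushes to it.
        ["ansible-playbook",
         "--connection", "local",
         "--inventory", "localhost,",
         *[x for k, v in extra_vars.items()
           for x in ("--extra-vars", f"{k}={v}")],
         str(checkout / "playbook.yml")],
    ]
    return commands

def run(commands: list[list[str]]) -> None:
    # Execute the sequence, aborting on the first failure.
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Returning the command list before executing it keeps the sketch testable; on a real device `run(...)` would be called right after the configuration artifact is unpacked.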
Okay, how does the playbook look, for instance? This is the whole playbook that gets applied and turns my nice Raspberry Pi 4 into a GitHub Action runner that knows how to deal with EDI. I'm applying two roles. The first is a GitHub Action runner role; I'm a lazy guy, so I just took this role from the internet. That's the beauty of Ansible: there are a lot of things you can already pull in. This installs the GitHub Action runner. On top of it, I want to enable this device to run EDI commands in order to build operating system images, so I created my own EDI installer role, pulled it in, and applied it as well. In the end we have this runner; that's the final result. The runner registers itself on GitHub, and now we are ready to use it. Let's give a second example. Let's assume we are a company that builds kiosk terminals, and we would like to turn a headless device into a kiosk terminal. It's the same procedure, just rotated by 90 degrees. We give the playbook that we would like to apply, and which version; here it's main, but if you would like to have it fully reproducible, you would probably pin a specific version. And here is the URL I would like to display on that Raspberry Pi. Again, the config artifact makes its way to the Raspberry Pi, the playbook gets cloned, some roles get downloaded to the Raspberry Pi using Git, and then the playbook runs. It installs maybe Firefox and LightDM and whatnot, and as a result we have this kiosk terminal. So now we have been able to build an operating system image in a fully automatic manner, and we have seen how we can tweak individual devices. Now we would like to apply this to an entire fleet. This is the demo fleet I have. As said, it's the small fleet at home; the big playground is the Schindler playground. We start from here: that's a Raspberry Pi 2, maybe a legacy device; it plays the role of a legacy device. Then here is a CompuLab IoT gateway.
It could maybe be a great Wi-Fi 6 hotspot. Then we have two Raspberry Pis that are serving as kiosk terminals. This guy here is a Variscite board; it plays the role of a development device in an early development phase. Device number six is the GitHub Action runner, and finally we have another kiosk terminal. So we have different use cases and different devices; that's the typical environment you have in a company. Okay, how can we now tackle this fleet? A very short introduction, one slide, about GitOps. What is GitOps? From my point of view, it's a relatively new buzzword in the IT industry. The goal is to automate as many IT operations as possible. The automation shall be based on a fully declared and versioned target state. Git is usually the tool of choice to store the target state, and a bunch of tools are responsible for applying the target state to the infrastructure. And last but not least, GitOps is not only applicable within the IT industry; it also fits perfectly into our world, the embedded world, the IoT world. We just have to do some things a bit differently. For this, I created a Git repository. Going back to the fleet, we attach this fleet to the Git repository, and this is the way we do it. Here we have the develop branch, and we may have some feature branches; for the sake of simplicity, they are not depicted here. This is where we develop the stuff and make a lot of changes. These changes might go directly to the development device I have shown before. If it breaks, the developer will probably take care of it, but most of the time nobody will notice. If we believe that something is okay, we can merge it to the main branch. As soon as it is merged to the main branch, it gets dispatched to the kiosk terminal that is maybe close to the developer office. So the developers will still see if it fails, but not many more people.
And if it still doesn't fail there, then let's merge it to the canary branch; these are the canaries like the ones they used in coal mines. That's maybe the terminal at the management building of your company, and if it fails there, you get a bit more feedback. But if it still doesn't fail, let's merge it to the production branch, and it will automatically be applied to all the rest of the fleet. Again, we only make changes down here on develop; above that there are no changes anymore, just merges. Looks nice, but how is it executed behind the scenes? We start here with GitHub. We have a change on a branch: a commit on develop, or just a merge on main and above. GitHub notices: okay, change on the develop branch, and dispatches a job. And then on GitHub, I'm again a lazy guy: I use Ansible and execute an Ansible playbook. The Ansible playbook checks: ah, these are the facts about my fleet. It does not ask the devices directly; they might be turned off or whatnot. So we get the facts about the devices from Mender. We check whether an OS update is required or not, and then we dispatch it or not. And maybe we have to update the configuration. An important difference compared to the typical server world is that we have a proxy in between, between the management and the fleet; we don't touch the devices directly. How does the job look? It's a GitHub Actions job. It can either run on a push, or I can trigger it manually. It runs on a runner hosted by GitHub. As a first step, we check out the fleet management playbook. Then there is a small missing piece for Ansible that I install; it doesn't matter much. And then we run the playbook; there is already a GitHub Action that knows how to deal with an Ansible playbook. Now, how does the playbook look? First of all, there is some checking whether we have the right Ansible version. The fun part happens further down: the first role gathers the fleet facts.
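The branch model can be made concrete with a small sketch: each device is subscribed to exactly one stage branch, and a push or merge on a branch is rolled out only to the devices subscribed to it. The inventory format and the UUIDs here are invented for illustration.

```python
# Hypothetical inventory: device UUID -> subscribed branch.
# By default devices would follow "production"; a few are pinned
# to earlier stages so they receive changes first.
FLEET = {
    "5f9a-dev-board": "develop",       # development device
    "c2e1-kiosk-office": "main",       # kiosk near the developer office
    "a774-kiosk-mgmt": "canary",       # kiosk at the management building
    "0b3d-gateway": "production",
    "91fe-kiosk-3": "production",
}

def devices_for_push(branch: str, fleet: dict[str, str]) -> list[str]:
    """A change on `branch` is dispatched only to the devices
    subscribed to exactly that branch."""
    return [uuid for uuid, subscribed in fleet.items() if subscribed == branch]
```

A merge from canary to production would then reach `0b3d-gateway` and `91fe-kiosk-3`, while the earlier stages keep running what they already validated.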
Usually you would go directly to the device to gather the facts. We don't do this, as the devices might be offline; we gather the facts from Mender, in our case. The second step installs an operating system, and I only do it if the device is subscribed to the branch that is being applied. Last but not least, we apply a configuration; that's the configuration where we turned a device into a kiosk terminal or into a GitHub Action runner. And how does the inventory of the fleet look? Basically, it's just the UUIDs of the devices. That's the full inventory of the fleet: in my configuration I have seven UUIDs, and that's it. Now let's look at an individual device configuration. By default, devices are subscribed to the production branch; in this case, the device is subscribed to the main branch. I would like to turn it into a kiosk device, and here's the URL I want to display. That's already everything needed to say: hey, I would like to have a kiosk device. If we look into the details of whether an operating system update is needed or not, we have a table like this. We have different device types: Raspberry Pi 2, Pi 3, Pi 4, the Variscite device, and the CompuLab device. Here, for instance, we see it's a Raspberry Pi 3, and here I can see the corresponding image. We compare the state we know from Mender with the desired state, and if needed, we apply the update. Two remarks. You have maybe noticed that this is kind of a fire-and-forget approach: we tell Mender, hey, install this operating system and update this configuration, but the devices might be offline, and the actual update happens a lot later. So it is very important to monitor such a fleet, but that is completely out of scope for this presentation.
One other thing: let's say you have 200,000, 300,000, 400,000 devices. You would probably not write the inventory into the Git repository, because that would just generate a tremendous amount of commits. The inventory would probably go into a separate database, and the individual device configurations would probably also go into a separate database. But for the sake of simplicity, I've put everything together in one Git repository here. So what are the conclusions? One very important thing is that with this approach, everybody is working on the same stuff. It's not like the developers did something and then wrote a Jira ticket for the ops team, "hey, please upgrade this", with the product owner attaching an Excel sheet with some versions inside, and the ops team maybe, or maybe not, looking at it and dispatching it. Everybody is working on the same repository, talking the same language. Then we have full traceability: we know exactly what made it to the main branch, what made it to the canary branch, and what made it to the production branch, and you can also easily diff. Beyond the develop branch, we don't introduce any changes anymore. There is, as you can see, a very high level of automation; almost everything is automated. Another important aspect are the staged rollouts: first the stuff goes to the canary branch, and only later does it make it to the production branch, so you don't break the whole system in one go. Like this, there is almost no room for human error. And on GitHub or Bitbucket or whatnot, you can configure who is actually allowed to merge and who needs to approve the pull request; maybe it's not only humans that need to approve, but also automatic tests and feedback you get from the fleet. So the quality should get a lot better. Then we have a powerful toolbox, suitable for a huge fleet; it scales, all of the components scale. The components are proven in use.
It's fun to work with. And last but not least, you can exchange almost every component. If you don't like Debian, take Yocto. If you don't like Ansible, take SaltStack. If you don't like GitHub, take GitLab. If you don't like Mender, take SWUpdate. If you don't like Python, use Go. Every component is exchangeable. Here is the stuff behind the scenes. The title was "automating and managing an IoT fleet using Git", so here are the Git repositories that were used behind the scenes. First, we have the orchestration repository for the CI pipeline. Then we have three different repositories for the operating system images: for the Raspberry Pi, for the Variscite board, and for the CompuLab device. That is the continuous integration part. For the device management, we have playbooks and roles: the kiosk playbook that uses a kiosk role, and the GitHub runner playbook that fetches one role from somewhere on GitHub and one that I've written myself. And last but not least, if you want to manage the entire fleet, there is the edi-cd repository that does the continuous delivery. Maybe a side note: all of the stuff I'm presenting here is open source. You can take a look at it and dig into the details; that's not a problem. Here are a few links. I've written several blog posts about this stuff that go a bit more into the details. And finally, I hope the presentation was unclear enough that there are still a few questions; we should have some time left for them. Yes, please? Yes, that's a very good question. Let me move back to the slide here, and I'm supposed to repeat the question for the remote audience. I think this could be a good slide. The question is whether this device down here operates in push or pull mode. Maybe I have to switch a few more slides back. So this was the action runner. Are you more interested in the setup of the action runner?
Yeah, so this would be the setup of the action runner. Here I'm reusing the mender-configure mechanism. Down here we have a Mender client running, and this Mender client polls Mender. The configuration artifact is pulled down, like with Kubernetes: it is pulled down here, the artifact gets unpacked, and a very small shell script gets executed. The shell script then does a git clone down here, also a pull, and pulls some roles. That's how it is executed. Yes, yes. Okay, that's a good question. The question is: let's say you do this merge here; how does that propagate to the fleet? And now I go here. We have the merge here, and as a trigger this starts a job on a GitHub Action runner, a hosted one. The GitHub Action runner compares the fleet facts taken from Mender with the desired state and sees: oh, we have to upgrade the operating system. Then it schedules the update; that's what I meant by fire and forget. It schedules an update here on Mender, and then we again have the Mender client that is polling for updates. Most of the time we have a barrier here, either because of the wired network setup or because of the mobile network operator, so we cannot push anything; the IoT devices always work in pull mode. That's implemented here by means of Mender, on top of Mender. It has a nice side effect: because I do the configuration artifacts and the OS updates both over Mender, they are nicely synchronized. It cannot happen that an OS artifact somehow messes up a configuration update. Okay, are there other questions? Yes? So I must say this is a bit ahead of what we do at Schindler; this setup here is seven devices. Thank you. At Schindler, we have a really huge fleet, 100,000s of devices. That's also why I mentioned that such things would need to be offloaded into a separate database, but this is perfectly possible with Ansible.
And you can also group by country, by device type, by whatever you want. Further questions? Eric? How do you bootstrap the very first device, the initial seed, with Mender on top? Ah, so let's go here. The question is how we get such a device alive down here for the first time. If it's a Raspberry Pi, I need to flash a complete image to the SD card. It's a reduced image that gets expanded during the first boot; it then registers with Mender, and we get a new UUID. We have to enter this UUID into the fleet management repository; that's how it happens here. On this device, I don't even recall exactly how to do it; I have not done it for a long time. On other devices we have at Schindler, we use UUU, the NXP tool. Then there are flashers for Rockchip, so it depends. But we always make sure that we flash a complete image, including the bootloader, to the device. So the output of the chain I've shown is not only an artifact that can be updated over the air, not only the Mender artifact, but alongside it also the documentation and this big image that initially makes it onto the device during factory setup. Does this answer your question? Thank you. Are there other questions? If not, we can finish on time. Almost on time, yeah. And if you would like to see a live demo of how it really works, you can of course contact me; I'm still here till Friday lunchtime. Thank you very much.