Hello, good morning. I'm Irene Diaz, a software engineer at Red Hat, and this is my colleague, Sarita. Hello everyone, good morning, and thank you for getting up early to come to our presentation. Today we are going to talk about two onboarding options for IoT devices, one being FDO and the other being Ignition. This is today's agenda: we are going to discuss what onboarding is in broad terms, present the options that Fedora IoT has for onboarding, and cover future steps and what is coming up for Fedora. So what is device onboarding, and why do we care? Device onboarding happens after we make an image and provision it: we have to give the devices all the configurations and secrets that they need in order to perform their final purpose. During this onboarding process we also want to establish a connection from the device to the final IoT platform, which is what the device needs in order to get updates, additional software, and so on. The whole process needs to be secure at each step, and there shouldn't be a need for human intervention, so it is completely hands-off. Imagine we are trying to onboard a thousand devices in the middle of a mountain: an engineer cannot be expected to travel to the onboarding site. We simply need someone to ship the devices there and power them on, and everything else should be automatic, secure, and free of human intervention. Fedora has two different options to do this, and Sarita is going to give an overview of them. Yes, so let's have a brief introduction to these two options. What is FDO? It's the FIDO Device Onboarding protocol, an open industry-standard protocol developed by the FIDO Alliance. The FIDO Alliance is formed by a number of leading tech companies, such as Intel and Google, to name a couple. A little bit of history of FDO.
FDO was initially known as SDO, Secure Device Onboarding, developed by Intel, and it was later contributed to the FIDO Alliance to become FDO. Ignition, on the other hand, is part of the CoreOS ecosystem; it was developed in-house at Red Hat and is maintained by the CoreOS working group. The basic difference between the two options is in their infrastructure. Because of the distributed nature of FDO, it requires a fairly complex infrastructure, with server-side and client-side implementations, which we will see in detail later. Comparatively, Ignition has a simpler infrastructure, which basically revolves around one configuration file. That configuration file can link to other configuration files, which will be merged at runtime. So that's the basic difference, but we'll see more details as we go ahead. So how do FDO and Ignition fit with Fedora IoT? To make a Fedora IoT image, you would use Image Builder. There have been previous talks in this conference about what osbuild-composer and Image Builder are, so if you haven't seen them, I recommend you check those talks out. You would use a blueprint to define a Fedora IoT Simplified Installer or a raw image, then use Image Builder to produce that image. Once the image has been provisioned, you would use either FDO or Ignition, or possibly both at the same time, and at first boot the devices would be onboarded. Sarita is going to explain a little bit more about FDO. So let's see some of FDO's key features. The whole idea of FDO revolves around providing maximum security during the end-to-end onboarding process. The first thing, of course, is no manual intervention: zero-touch onboarding removes the manual steps where security threats are most likely to appear.
FDO works on the principle of establishing a chain of trust. To explain this a little more, I would like to go to the next slide, which shows the FDO protocol and will tell us in a bit about the flow and how it works. But first, a few concepts to explain the chain of trust. FDO can be divided into two phases: one happens at the device manufacturing site, and the other happens at the onboarding site. These two phases can happen on different timelines, so there has to be security in terms of ownership, which is provided at each step; the ownership of the device can change a number of times along the way. Apart from this, one of the most talked-about features of FDO is late binding, which solves a lot of onboarding problems in the industry. What is late binding? When the device is manufactured, it does not know which platform it is going to be onboarded to, or which configuration it is going to be onboarded with. This can be decided at later stages, when the device actually arrives on site, and that's how this feature allows flexibility in device management, which is very useful. Apart from this, FDO offers reliability: with any number of devices, FDO works the same way for all of them every time. And even though the protocol is pretty complex, for the end user it is very simple: even a low-skilled worker can power on the device at the onboarding site without needing to know what happens inside. It is platform-independent and, I would say, cost-saving. The infrastructure is expensive, since it's complex and has hardware and software requirements, but it's a one-time job, and it saves a lot of manual processes and avoids the errors that can happen with traditional manual onboarding. So now, FDO in a little more detail.
As Sarita said, there are two places where the whole onboarding process happens. On the left side of this slide you see the device initialization part, which happens at the manufacturing site, and then we have the on-site onboarding part. During device initialization, two entities are important: the device and the manufacturing server. The device contacts the manufacturing server in order to get a GUID, and two different documents are generated. One is the device credentials, which will be stored on the device; those device credentials tell the device where it will find the rendezvous server later in the process, on the onboarding side. The manufacturing server gets an ownership voucher. This ownership voucher is a cryptographic document, and using that document, which is an X.509 certificate, the FDO protocol allows you to verify who the owner of the device is at the current point in time. At this point, the owner of the device is the manufacturing server. The onboarding is still ongoing, so the device is shipped to the onboarding site, and meanwhile the manufacturing server transfers the ownership voucher to the owner onboarding server. Now, at this point in time, the owner of the device is the owner onboarding server. As you can see in these slides, the communication starts with the device contacting the manufacturing server; that is what the arrows mean, showing which entity initiates each connection. The device reaches its final destination, the owner onboarding server registers the ownership voucher with the rendezvous server, and the rendezvous server ends up with a mapping from device GUID to an IP; that IP is what the device will later use to contact its owner. So the device finally reaches its final step: it uses the device credentials to find where its rendezvous server is, and it contacts the rendezvous server.
The rendezvous server checks whether that GUID is known to it, and if so, it tells the device where its owner is. Then the device contacts the owner, and they establish a chain of trust based on the keys that the device has in its device credentials and the keys that the owner has in the ownership voucher. The owner transfers some keys to the ServiceInfo API server, and the device then contacts the ServiceInfo API server to perform the final onboarding steps, which is where the device gets all the configurations and keys that it needs in order to work. So how do we actually do this? As we said before, in order to build a Fedora IoT image you need a blueprint; in order to use FDO, you need a Simplified Installer blueprint, and the only thing you need to add to it, apart from any other configuration you might need, is a customizations.fdo section where you place the URL of your manufacturing server. As we said before, the only thing the device needs is the address of the manufacturing server, and everything else is automatic from that point on. Then we use Image Builder. If you're using the command line, you run composer-cli compose start-ostree, then the name of your blueprint, which is a TOML file, then the type of image that you need and any further parameters. Since Fedora IoT is an OSTree-based operating system, which means it has an immutable file system, you need a commit from which to fetch the file system, so you pass that, and then Image Builder emits a Fedora IoT Simplified Installer image, which is an ISO. You provision the device, it boots, the whole process that we saw in the previous slide happens, and the device knows what to do. But wait: we are onboarding the device, so how does it get its configuration?
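As a rough illustration of the blueprint section described above, here is a minimal sketch. The blueprint name and server URL are placeholders, and the exact key names should be verified against the osbuild-composer blueprint documentation:

```toml
name = "fdo-device"
description = "Fedora IoT simplified installer with FDO onboarding"
version = "0.0.1"

# Point the installer at the manufacturing server; everything else
# in the FDO flow is automatic from this URL onward.
[customizations.fdo]
manufacturing_server_url = "http://manufacturing.example.com:8080"
diun_pub_key_insecure = "true"  # for testing only; use a key hash in production
```

You would then build with something like `composer-cli compose start-ostree --ref "fedora/stable/x86_64/iot" --url http://localhost:8080/repo/ fdo-device iot-simplified-installer` (flags approximated from the composer-cli usage described in the talk).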
The FDO standard has a normative part, the ServiceInfo modules (FSIMs). Each ServiceInfo module is a configuration module where you can put some atomic instructions, and it works with a key-value protocol. In this slide you can see a couple of configuration examples. This is a YAML file, so we can, for instance, establish an initial user: as you see there, I'm creating a user and giving it a password and some SSH keys. We can also copy files from the ServiceInfo API server to the final device, run some commands, set up disk encryption on a given disk, and specify whether the device should be rebooted after the onboarding process has finished. There is a URL over there where you can see which FSIMs the FIDO Alliance is working on. Red Hat has a couple which haven't been published there yet, but we are working towards making them part of the standard. And if you are interested, you can go to the fedora-iot/fido-device-onboard-rs repository, because our implementation is in Rust, to contribute or find more information. And now, over to Ignition. Let's see what Ignition is, what it can do, and how it works. Ignition can be described as a first-boot configuration tool, although it can also be used for provisioning purposes. It runs in the initramfs at first boot. The things you can do with Ignition include configuring users, disk manipulation (you can create partitions), and enabling systemd services. All of these features are supported by CoreOS, but Fedora IoT only supports a subset of Ignition, not all the features, and we will see in detail what Fedora IoT can do. Ignition's configuration file is a JSON document, which can be embedded inside the ISO or kept at a remote location from which it is fetched during the onboarding process.
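The ServiceInfo configuration described above looks roughly like the following sketch. The field names approximate the fido-device-onboard-rs ServiceInfo API server configuration, and the paths, key, and service name are placeholders; check the project's docs for the exact schema:

```yaml
service_info:
  initial_user:
    username: admin
    sshkeys:
      - "ssh-ed25519 AAAAC3Nza... user@example"   # placeholder public key
  files:
    # Copy a file from the ServiceInfo API server onto the device
    - path: /etc/myapp/config.toml
      source_path: ./device-files/config.toml
  commands:
    # Run an atomic command on the device during onboarding
    - command: systemctl
      args: ["enable", "--now", "myapp.service"]
  after_onboarding_reboot: true   # reboot once onboarding completes
```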
Ignition follows a declarative pattern: it describes what state the system should end up in, rather than how to get there. For example, if some disk manipulation is needed, Ignition knows that a disk has to be partitioned, but the "how" is delegated to the underlying OS tools and utilities, so Ignition doesn't have to worry about that part. The nature of Ignition promotes immutable infrastructure, meaning that the configuration, once provisioned at first boot, cannot be changed. If the user needs to change the configuration, the device has to be reprovisioned. This makes configuration updates a bit more expensive than with FDO, where new configurations can be applied at runtime with no need to reprovision the device. Also, on platforms where the configuration comes from VM metadata, it is deleted once provisioning succeeds, which is useful because sensitive information is no longer accessible once the device is provisioned. Ignition also supports HTTPS and config verification, which is a useful feature too. So how would you configure Ignition in order for a device to be onboarded? As Sarita said, you simply need a JSON configuration file; the current spec version is 3.4.0. As we said before, in Fedora IoT we don't support all the features that Ignition has. In our case, we enable the creation of files, links, and directories; we can also create systemd units, users, and groups, as well as the other Ignition metadata configurations, such as retrieving a remote config. As you see, it is a JSON file, and this can be a problem because, as you know, JSON files are not human-friendly. To avoid writing JSON by hand, you have Butane. Butane is a transpiler, a source-to-source compiler: it takes a Butane configuration file written in YAML, which is more human-friendly, and produces the JSON Ignition configuration file.
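As an illustrative sketch of such a Butane file covering the features mentioned above (users, SSH keys, systemd units): the user name, key, and unit are placeholders, and the variant/version pair should match the Butane release you actually use:

```yaml
variant: r4e          # RHEL for Edge / Fedora IoT style variant (assumed here)
version: 1.0.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "ssh-ed25519 AAAAC3Nza... user@example"   # placeholder public key
systemd:
  units:
    - name: hello.service
      enabled: true
      contents: |
        [Unit]
        Description=Log a greeting on first boot
        [Service]
        Type=oneshot
        ExecStart=/usr/bin/echo hello from ignition
        [Install]
        WantedBy=multi-user.target
```

Transpiling would then be something like `butane --pretty --strict device.bu -o device.ign`.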
And why would we do that? Because while transpiling one configuration file into the other, Butane reports warnings and errors, and it also checks, based on the variant that we define there (I don't know if you can see it, but we are using the r4e variant), whether the options given in the Butane configuration file are supported by that variant. Currently, the r4e variant is at version 1.1.0 and we are targeting Ignition 3.3.0. So, some examples. If everything goes well, you run Butane as shown in the first example and it emits a JSON Ignition configuration file. If something goes wrong — for instance, in the second example I forgot to add an Install section to the unit — Butane tells me, hey, there's an error there. And if I'm using something that is not supported at all, as in the bottom example, Butane simply gives an error saying that file system configurations are not enabled for the variant that you are using. So how would you use Butane to onboard a device? You have two different options. As I said before, you can copy the whole JSON file into a section of the Image Builder blueprint, but instead of using raw JSON you have to encode it as base64, and if you do that, your only option is to produce a simplified installer: you call composer-cli, choose the image type you want — the IoT simplified installer in this case — and you get the image. The other option is to specify, as I said before, a remote URL from which the Ignition configuration file is fetched at first boot. If you choose that, you can produce either a simplified installer, which is an ISO, or a raw image, which is a compressed raw image. In the case of the raw image, you run the command shown there and you get the image.
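The two blueprint options described above might look like the following sketch. The key names follow the osbuild-composer Ignition customizations as I understand them, and the URL and base64 payload are placeholders, so verify them against the Image Builder documentation:

```toml
# Option 1: embed the Ignition config in the image (simplified installer only).
# The value is the base64 encoding of the JSON produced by Butane.
[customizations.ignition.embedded]
config = "<base64-encoded Ignition JSON>"

# Option 2 (mutually exclusive with option 1): fetch the Ignition config from
# a remote URL at first boot; works for both the simplified installer and the
# raw image.
# [customizations.ignition.firstboot]
# url = "https://config.example.com/device.ign"
```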
In both cases, you provision the device, and at first boot Ignition runs and the device gets onboarded. So after seeing FDO and Ignition, you must be wondering which one to go for. To be honest, it's not a one-to-one comparison; rather, they are two options you choose between based on the project requirements: which security features the project needs, and of course the infrastructure, given the differences between the two options. So it's up to the requirements of the project, and yes, a combination of both can also be used. For example, some of the things that you can do with both FDO and Ignition, such as creating users or disk manipulation, can be left to Ignition, while the rest of the protocol stays with FDO. Now, some of the future enhancements coming to FDO, one of them being the per-device ServiceInfo feature. What's this? Irene earlier explained the ServiceInfo modules that we have for FDO. Currently, all the devices get onboarded with the same configuration file, which we call the base or default configuration. With this feature, it will be possible for the user to onboard different devices with different settings if they want device-specific configuration. We have started with the SSH key, and if the user doesn't want a device-specific one, they still have the option to onboard with the defaults. Another thing is that we are trying to move to database storage. Currently, all the files — the ownership vouchers and the other configuration files — are stored on the file system, which is really not feasible when you consider a large number of devices, so a database will be a good option; we are working on that. Another thing we are working on is moving from warp to axum.
warp is the web framework that we use in FDO, but it does not comply with FIPS. FIPS is a cryptographic standard that we have to follow, and warp internally uses rustls, which uses ring; ring does not comply with FIPS, so we are moving to axum. We are missing a bit on the left side of the slides, but forgive us. This is related to the future of Ignition. Ignition is developed by the CoreOS team. As you saw, we don't support everything that Ignition does, so we are working towards enabling file system customizations during the onboarding process, because right now you would do that in the image-building process, and we have static partition tables, so you cannot change them unless you change the code in osbuild-composer. Everything you have been seeing will hopefully be coming in Fedora 38, if we manage it as soon as we get back from this conference — that is the soonest — but it will definitely be in Fedora 39. And that's all; we'll take your questions now if you want. Thank you, everyone, for listening. No questions? Thank you. Oh, yes. So, we've been asked where you can get the documentation and information. But I think he's asking where you would configure this, am I right? Okay, we do have our documentation at the link that I showed you on the slides. There is a file called HOWTO with examples showing how to configure this, and if you are interested in how the protocol works for our specific FDO FSIMs, because some of them are not part of the standard, under the same repo there is a docs folder where you have the FSIM docs and the specification. If you are looking for the FSIMs that are going to be part of the standard — the ones about file copying, commands, and some others that I'm forgetting — those are in the FIDO Alliance FSIM repo on GitHub. Does that answer? Yes, yes.
Each FSIM has a set of things that you need to configure; creating a user is not the same as creating a file. For users, you have: can I set a password? Can I set an SSH key? Whereas with files, you have options to set permissions, and so on. That is the difference: each FSIM has its own configuration parameters. Thank you. Okay, another one. So we've been asked whether we... for Ignition. Yeah, so, let me check if I understood: you are asking whether we have thought about signing the Ignition configuration file so that we know the device gets exactly the configuration file that it needs. We haven't thought about it, to the best of my knowledge, so we'll take that up with the team and with CoreOS, and we'll tell you when we have an answer. Are we good? Thank you. Thank you. So, hello, everyone. Welcome to the Automotive panel discussion. My name is Martin Perina, and I'm working as a manager in the Automotive part of our organization. Today I would like to introduce the colleagues I'm working with. Here we have Pierre-Yves Chibon, principal software engineer and product owner for the Containers on Wheels team; Rachel Sibley, senior principal QE engineer, working as the QE Automotive lead for the Automotive effort; and Daniel Walsh, senior distinguished engineer and lead architect for the container runtime platforms at Red Hat. Before we get to your questions, of course, we will give a short talk. But to begin, let me pass the microphone so my colleagues can tell you what they are currently working on. Good morning, everyone. I'm Pierre-Yves Chibon, also known as pingou — better known as pingou, normally. As Martin said, I'm the product owner for the Containers on Wheels team, which also goes by the "cow" team, which gives us the opportunity for a bunch of bad puns.
The team was created about a year and a half ago, and we still keep the containers moo-ving every week, so yeah, we're pretty good with bad puns. The responsibility of the team is to look at everything that has to do with running containers in cars. What are the challenges? How do we deal with them? How do we manage the containers? It's a pretty interesting area, and I have a talk about it at two, I believe, this afternoon in this room, about Hirte, where we'll go into more detail. Hi, good morning. I'm Rachel Sibley, the QE technical lead for the in-vehicle operating system in Automotive. I'm also a product owner, and my goal is to migrate existing RHEL tests that relate to the safety scope and run them in an automotive environment. I have a talk later on if you want to learn more about that and about functional safety and what it means; it's at 4:15 today in this room. Yeah, by the way, this microphone does nothing except feed the recording, so we're going to try to talk loudly. My job: nine months ago I moved into the RHIVOS team. My main job is to get containers running everywhere inside Red Hat; that's my focus. When it comes to automotive, I'm looking at how we can use containers in the car, and how we can use container technology to satisfy a lot of the requirements for getting approved as an operating system that can run inside a moving vehicle. Okay, so Dan, I will ask you a question, because we have published several blogs on the Red Hat blog, and one of the most, let's say, controversial or most visible claims was that we don't believe Kubernetes is a good fit to run in your car. So Dan, could you please explain? I'll start out by saying I love Kubernetes; I want OpenShift to be successful. But Kubernetes has some key issues when we talk about putting it in a car.
The reason we bring up Kubernetes in a car is that for most of the automakers, the first thing out of their mouths is that they want to get into this cloud-native world, with this whole buy-in to Kubernetes, and there are going to be multiple computers in a car, so why not just use Kubernetes to move different containerized workloads around the vehicle? Well, we're going to be talking about that. By the way, I didn't mention I have a container BoF, I think at 2 o'clock this afternoon, at the same time as his talk. We're competing, so if I get more attendees than him... you know what to think. Mine is just going to be general questions about containers, similar to this, and I'll potentially show some cool stuff. And then tomorrow at 2:30 I'm doing a talk on containers on wheels, or containers in cars; I'll show a lot of the technologies and we'll talk about them, and Hirte is one of the key ones. Anyway, going back to Kubernetes in a car: when you're driving a car, one of the key things in any type of moving vehicle is what they call functional safety. Functional safety basically means we want to make sure that you don't do anything to injure someone with a moving vehicle, so it's somewhat similar to security in that you want to make sure the system works correctly. In functional safety we want to make sure, for instance, that if you have an app that is, let's say, applying the brakes, and another app that is running Netflix, the Netflix app is not dominating the system in such a way that the brake app can't function and fire off. So, to be functionally safe, when you want an app to execute, that app has to execute. Kubernetes, on the other hand, has the concept of eventual consistency: you want your app to eventually come up and the environment to eventually be up and running.
Obviously, eventual consistency is not the same thing as "the app has to be up and running", so we can't use Kubernetes as the orchestrator. The other thing is that to get to functional safety, you have to really look into the code, really examine it, and explain how the code always behaves on time; when you get into multi-threaded applications written entirely in something like Golang, that becomes a lot more difficult. Lastly, Kubernetes imposes a heavy workload: in a Kubernetes environment there's always a lot going on — the kubelet and CRI-O are always doing stuff, always using CPU, always performing work. So there are a lot of reasons not to use it, and our low-overhead orchestrator is Hirte, which is what Pierre-Yves will be talking about this afternoon, and I'll talk about a little in my talk tomorrow. Thanks, Dan. So as you mentioned, functional safety is one of the biggest challenges, the biggest problem we are trying to solve within the automotive effort. So Rachel, could you please talk about how we handle and test functional safety? Easy.
So, for testing the in-vehicle operating system: functional safety, or FuSa, requires 100% requirements test coverage, and that can be a bit challenging, because of the way we inherently do it with RHEL QE — we were not designing tests to functionally test 100% of an API, for example; we take a lot of what's upstream and rerun it. So part of this is running the tests against our requirements, which are the APIs; identifying where our gaps are, using code coverage analysis to assist us with that; and developing new tests to ensure that we have that 100% coverage. A lot of what we're doing with testing is leveraging existing tests from RHEL: we don't want to fork their tests or duplicate them, so we're rerunning all of the existing RHEL tests and then adding new tests for any gaps we find. A lot of these, of course, are tailored around the safety scope, and we're adapting them to an automotive environment, because the tests weren't designed to run in an OSTree environment — they were designed to run against a traditional RPM compose — so there are some tweaks or changes to get them to run on OSTree, and then we migrate them to an updated test framework depending on where we're running them. Testing is also quite challenging because of the traceability we need to establish: requirements down to the test cases, the executed runs, the logs linked to an existing issue — all of this needs to be traceable. We have to have the evidence, and we have to have retention policies in place, so we have a lot of work to do. I just want to point out there's a key word she mentioned multiple times there: evidence. To me, functional safety means that eventually, if a vehicle or a machine causes an accident and you go in front of a court of law, you have to have evidence that you did everything to make the system as functionally safe as possible. That's really what we're trying to do. This will be the first
time a Linux operating system has ever achieved functional safety certification. Describing Linux as being as safe as possible doesn't mean accidents are not going to happen, but eventually we might have to prove in a court of law that we, or our partners, did everything possible to prevent an accident. To follow up on that a little: one example I like to take is that you certify an API, and that API can be something as simple as opening a file and writing content to it. The idea of functional safety is to guarantee that if you use that function, what it will do is open a file. If there are very specific use cases, very specific conditions — say, I give it a 42k buffer frame and I'm pointing to a very specific place in the file system — under which the function does not behave as it should, that's a problem: that function is no longer functionally safe. So it is a lot of going through the code, examining it, and ensuring that it behaves the way it should. That doesn't mean you can't use the function to actually do something bad. The example I use is: you have a gun, you shoot yourself in the foot. The gun has worked the way it's supposed to: you pull the trigger, it fires the bullet; the fact that you're aiming it at your foot is your responsibility. The gun did what it does. So the function to open a file and write content to it does what it does; if you use that function to mess with the CAN bus parameters, and that ends with a crash of the compute unit and triggers an accident, the function did what it was supposed to do, and the fact that you used it wrong is your responsibility. A lot of FuSa has to do with this: where does the responsibility lie, and can we track that responsibility? So it's a lot of going through all the code and all the test cases, and that's why the test cases and their traceability are very important: they
are the evidence that the function does what it's supposed to do. Some things are very easy — open a file, write to a file — but some functions, sorting functions for example, become a lot more challenging: how do we handle exceptions, how do we handle the edge cases? And all of this comes back to what Rachel was saying: testing is important there. Thank you. As you all said, functional safety is one of the core problems we are trying to solve, and that is also where Linux can move next, right? It's very well established on computers and servers, and it's probably also very well established on edge devices; now we are trying to get Linux into cars, which is a completely new level with respect to functional safety. So Pierre, do you think there are other categories, other areas, that this work could get us into? This work will not be specific only to cars; we can expand from it. From discussions we've had recently, there is definitely an interest in what we are doing: the entire automotive industry is curious about what we can do and what we can offer. The idea of being able to update systems, of having a lifecycle — a single software stack that is applicable and maintainable across multiple generations of cars — is something that is very appealing, in the same way that it's nice to be able to run RHEL on different generations of servers and not have a specific version of the operating system for a specific version of a server. So the automotive industry is very much interested in what we are doing, but they are not the only ones looking at us. We have recently had discussions with industries that are less regulated and less critical than automotive, but that actually rely on software and software stacks like the ones used in the automotive industry, just without necessarily the functional safety certification. One example we've met recently is a mining company that operates
heavy machinery, and they are not as critical from a functional safety perspective: if the engine breaks in the mine, it's probably going to be less dangerous than the autopilot on the highway. But it is still sufficiently important for them that they are actually looking at these kinds of stacks. Down the line there are also other areas that are heavily regulated and that will have certifications similar to the one that we are working towards with automotive. You can think of autonomous trains, which have a different set of requirements, but there will be a lot of overlap with what we are looking at in the automotive industry. Medical devices are another area with very strong, very complex certification you can achieve, but there is also a lot of overlap with what we do in automotive. So if Linux manages to enter the functional safety areas of automotive, it is the foot in the door to a number of other highly regulated areas that may look at what's being done in automotive, that may look at the way the standard is evolving, because there is also work being done at Red Hat to make the standard evolve. The standard was written 23 years ago, something like that, and hasn't really evolved since, and the IT industry has changed tremendously since then. The core fundamentals of how the standard was written are still applicable today, but not the way we do it, and especially not the way we do it in open source software, where, of course, every single open source project was started by a highly descriptive, written requirements document that is fully implemented and covered by 100% test coverage. I'm sure every open source project in the room is fulfilling all of these requirements, no? Really? You do?
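The "certified open-a-file-and-write" idea from earlier can be sketched in a few lines: a function with an explicitly documented contract, plus tests that serve as the evidence that it behaves as specified. This is purely illustrative (the function name and contract are made up for this sketch, not a real certified API):

```python
import os
import tempfile

def write_file(path: str, data: bytes) -> int:
    """Write data to path, returning the number of bytes written.

    Documented contract (illustrative only):
    - path must be a non-empty string; data must be bytes.
    - On success, the file at path contains exactly data.
    - A violated precondition raises, instead of misbehaving silently.
    """
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("data must be bytes")
    with open(path, "wb") as f:
        written = f.write(data)
    return written

def test_write_file():
    # Test cases are the evidence: each traces back to a line of the contract.
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "out.bin")
        assert write_file(p, b"hello") == 5
        with open(p, "rb") as f:
            assert f.read() == b"hello"
    # Misusing the function is the caller's responsibility; the function
    # detects the misuse rather than doing something unspecified.
    try:
        write_file("", b"x")
        assert False, "expected ValueError"
    except ValueError:
        pass
```

The point is not the code itself but the traceability: every documented behavior has a test, and anything outside the documented conditions is explicitly the caller's problem, which mirrors the gun analogy above.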
You have highly described requirements and 100% test coverage? No. But that doesn't mean your project is not actually following the spirit of some of these documents, even if you don't realize it today. So one of the things that we are trying to influence is the ISO community, the standards, to make them amenable to other approaches to software development that rely on crowd-sourced reviews, highly available test devices, the test suites. The Linux kernel is probably one of the most scrutinized pieces of software in the industry, given the number of people looking into it; it's a very complex one, but it's also one of the most reviewed pieces of software, and yet it does not today satisfy the ISO standards. But that doesn't mean that what the ISO standard wants to do in spirit is incompatible with the way the kernel is being developed. So there is work being done there, and it's probably going to open other areas in the future to have these discussions again. Does that answer your question? So, any questions from the audience? Apart from functional safety, which we just spoke about, are there any other areas that need to be taken care of, for example when a container is responsible for safety, and how do you take care of those other problems?
Okay, so I will try to repeat: the core question was whether there are any other areas, other than functional safety, which we need to take care of. My kid used to play a video game called Need for Speed. So in a vehicle there are requirements like: you turn the car on, and within two seconds, and these are legal requirements, within two seconds a bing has to happen, a noise has to be made that tells you to put your seatbelt on. If you put the car into reverse, the reverse camera has to come up within two seconds. So there are lots and lots of requirements. I think there was a talk yesterday on turning on the backup camera as fast as possible in the Linux kernel. So imagine going from a cold start, the operating system not even booted, to making a sound within two seconds. So there's lots of stuff going on. In the Podman team we spent probably about 8 or 10 engineers trying to optimize Podman for starting a container, and when we started out we were at about two seconds on a really low-end Raspberry Pi, because we wanted the machine to be as low-end as possible, and we were able to get it down to about 0.3 seconds, so six times faster starting. Now, for a human being, if I type podman, hit return, and the container starts within one second, you don't even notice it. But when you're worried about starting up, well, our partner in this is General Motors, and they want us to go negative time; they want us to create a time machine. Other things: the user experience of the central system in a car is a lot different from what we're used to in a data center. Dan mentioned the boot time requirements: you start your car, you don't want to have to wait multiple seconds for the system to be ready; you want to get in your car, start the engine, and drive. Same thing for the applications: when you click the button, you don't want to wait for Podman to start. If you take a data center server, that will take minutes to start; if Podman takes a few seconds to start the application after that, you don't
really care, you've been waiting five minutes for the server to reboot anyway, so a couple of seconds is not hard. The entire user expectations are different. There are challenges: we're speaking about an edge device that we need to protect physically, because the user has physical access to the device, so we need to protect it from the user tampering with it. But we also need to protect the software, once it's running, from, how to say, a non-human-triggered random event that will corrupt the software. Basically, going back to functional safety: you're driving along and all of a sudden you get a corruption on a disk. The system has to realize that, so it can't cause an application to misbehave. So if a file or an object on the file system gets corrupted, we have to know. So we're looking at things like dm-verity and related functions. A big effort has been this thing called ComposeFS, which is a composable file system where we can actually apply some of the rules of dm-verity and fs-verity onto the file system, so that while you're reading a file, the kernel will know if the file is corrupted, and then we can realize it and basically put the car into safety mode: pull the car over to the side of the road, into the breakdown lane, and, a US thing, call AAA and get a tow truck out there to tow the car. There's also the concept of freedom from interference. We need to ensure that if there's a cascading failure happening on the quality management side of things, which is the non-safety-aspect code, and that's cascading into, or affecting, or preventing a safety function from delivering its capability, that would be bad. So from a testing perspective we have freedom-from-interference (FFI) test cases to ensure isolation between the ASIL B code, the safety code, and the non-safety code, to ensure there's no interference there. So the question is what happens if we get a kernel panic or a seg fault. That is one of the core elements of the functional
safety aspect: we're not supposed to be able to get a seg fault. That's the idea of functional safety: you ensure that the function behaves the way it's supposed to, and the people using the functions also do functional safety on their applications. So functional safety is basically a layered approach, where we build the operating system on top of functional-safety-certified firmware, and then functional-safety-certified applications will be built on top of a functional-safety-certified operating system. Which means you can't have a functional-safety-certified application that does not run on a functional-safety-certified operating system; every layer in the stack needs to be certified for the top-level application to be certified. So one of the things is that you have to document all of this as part of the functional safety work. Like, OK, the question was what happens if there is a file system permission error when you try to write a file: the functional safety work, all the work that Rachel is doing and documenting in tests, comes with basically a pretty big user manual, and we basically get to tell the OEMs RTFM and make sure that your code is compliant with the way we describe how it works. So not only is the documentation going to tell you how to use the code, but also what the different exceptions are that will be raised, and what will be triggered, so that the code can handle such an exception. So if something goes wrong in the car, the goal is for the car to know that something went wrong. Again, if you run out of disk space on the car, or a segmentation fault actually happens, then we have to relay that information up to the monitoring program that's running, and the monitoring program will take action. A classic example would be: you're in self-driving mode on the highway, and it'll notify the driver of the car to take over, the human being has to take over, I'm dropping out of
self-driving mode because something went wrong. So from an operating system point of view, our job is not only to describe: if you write code like this, you have a chance of running out of disk space; that's us describing what functional safety is. If an app on that node blows up, systemd is going to realize it; systemd knows that the service went down; systemd will then tell Hirte, the tool that we have, our Kubernetes-like lightweight orchestrator; Hirte will then send a message to the main node that's monitoring the system; the Hirte on that system will then relay it to the monitoring app from the vendor; and the car company's software will take action, like notifying the human being that you have to take over driving, or telling the self-driving car to pull over into the breakdown lane. But that all has to be realized; it's not just that something happened and the car keeps going. So before we get to that question, one of the things I wanted to make precise: there are basically four different levels of functional safety certification, called ASIL, and it goes from A to D, where D is the highest level of certification and A is the lowest. We are aiming for ASIL B, and it's important to know this, because as much as we like the brake example, it's actually a bad one, because the brakes are not going to be running in an ASIL B environment; they are going to be in an ASIL D environment, because these are critical systems, and for these certifications it's no longer an operating system, it's a microcontroller, and those microcontrollers are ASIL D and very embedded. So even if the node in which we run were to blow up, the car would remain in a state where it can be driven by someone; all the ASIL D functionally certified systems will keep on running, so you'll still be able to steer the car, you'll be able to drive the car and keep the safety and integrity of the passengers intact. We like the brake system example because it's something we can easily
understand, but it's actually not the best example; it just makes it easy to grasp some of the challenges there. And I'm sorry to go back to your question: does that answer it? OK. The entire operating system is going to be based on top of RHEL; we're not building a brand new operating system here. The goal is to take RHEL and prove it's functionally safe. We are going to be modifying the Linux kernel slightly, we're running the real-time kernel, and we're modifying the initrd for quick boot-up, things like that, but we fall back on Red Hat's 30 years of experience in building an operating system, building out RHEL and all the software. We're not building our own glibc, we're not building anything special; for the most part this is RHEL, so it's Red Hat Enterprise Linux for a car, is what I would say. Are we looking at other standards for certification? We're only looking at ISO 26262. We're also considering ASPICE, which has a lot of overlap with ISO 26262 but looks at different aspects, as far as our quality management system goes, but so far just ISO 26262. Maybe I will add to this that we have a working group that is trying to improve the current version of the standard to be more aligned with the modern approaches we are trying to take. So it will not be like in the past, when, if you wanted some kind of functionally safe operating system, you would need to print out a bunch of books and get the required stamps; it will be much more based on continuous certification, to be able to align with the latest changes. And the other part is that we have people within the ISO standards community who are working on improving the ISO standards to be more aligned with the best practices as we have them now. So the question is: if you're targeting ASIL B, what kind of functional safety software are we speaking about? So, infotainment: the answer about infotainment is that infotainment is not functional safety certified, it's called QM; the infotainment, the GPS to some
extent. The stack that we are mostly looking into is the ADAS stack, the advanced driver assistance systems, basically autopilot. No, Netflix is another example that we like to use because it's easy to grasp, but Netflix is not going to be running on top of Red Hat; it's going to run on Android. So there will be a separation: ASIL B, and we actually support ASIL A, which I'm not even sure what that is, but ASIL A and ASIL B applications, and that could be your entire self-driving capabilities in the vehicle. As a matter of fact, there are very few parts of the system that are usually certified ASIL B, just because it's so expensive. So it's a large framework of QM, which basically means it doesn't have to live up to the standards of functional safety. Your Netflix app, what I'm really talking about here is, we'll be running Android inside of a VM; if that VM fails, then we just put it aside. Now what we have to guarantee is that the entire QM system will not interfere with the self-driving car, so the self-driving car gets all the priority. No, no, they want to combine basically three or four computers total in the car and then have sensors talking to all the rest of the systems. The firmware in my car gets updated once or twice a year, when there's a serious problem; it's a new car; once there's a serious problem the cars traditionally get recalled to the vendor. So what's the future of security here, when cars are notoriously badly secured? Is it going to be like your cell phone, is it going to get updates? There are lots of questions there; we're working back and forth with the vendors on that idea. They want to have over-the-air updates, so the operating system itself will be updated maybe twice a year; the base operating system will be updated using over-the-air updates. So imagine, my thought would be, you park the car in the garage and it hooks onto the wifi and downloads. But the automakers want to be able to update the
applications as you're driving around, now, not replacing the running applications. So imagine you're doing a podman pull as you're driving along the highway, but not doing a podman restart, to make it as simple as possible. They want to have the same experience you have on a cell phone, where they can update. But they also want us to, we just had something come up this week where they talked about how they can protect a USB stick: they want to update the entire operating system, so you take it to the dealership, and again, I don't know if this is US-centric or not, but you take the vehicle into the dealership, it reflashes the operating system from a USB stick, and they want to make sure the USB stick is certified somehow. So there are things like that we have to deal with as well. You want to talk about that? Let me know if I got the question correctly: are there any other features that we are foreseeing in the future, of what we run on top of, is that correct? So, are there features that are driven by the automotive use case that will be used outside of the automotive use case? I think the ComposeFS work from Alex is definitely going to be interesting for the entire edge ecosystem, because the entire edge ecosystem is going to have the same, well, it doesn't have the safety element, but it still has the security element of the same requirements as the cars: it's an edge device that is somewhere far away from a data center, and we need to protect the system from the user or its environment. So there is definitely work on ComposeFS that is interesting. Quadlet is something that landed in Podman 4 and that was directly driven by the automotive use case. Quadlet is an easy way to start containers from systemd services. To properly start a container using systemd, you have to enter in the ExecStart line a command that is about three or four lines long, depending on the width of your terminal. Quadlet makes
these two lines in a text file, and it automatically generates the correct way of invoking podman run in your ExecStart on systemd, from a template, and it makes it a lot easier to run containers from systemd. One of the other advantages of using it is that if in the future we optimize Podman to run in the systemd use case, if we do other optimizations for running containers from systemd, you will automatically get these optimizations in Quadlet; you don't have to remember to go and edit all your service files or remove the old ones and so on. You keep that in Quadlet and you automatically benefit from it. I'm going to give it to her, because there's a lot of functional safety work that's funneling back into the RHEL tests. For functional safety, all of the tests that we're taking are derived from the requirements, and we have to take those and rerun them in the vehicle OS. There's a lot of rigor, depending on how complex the API is, but essentially the APIs are based on man page requirements, and then QE has to go and decompose a man page into low-level requirements, taking something that's ambiguous and breaking it down into functional pieces. So there's a big, what's that?
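To make the Quadlet idea concrete, here is a minimal sketch of what such a unit file looks like. The file name, image, and port are made up for illustration; only the general shape (a `.container` file under the systemd container units directory, with a `[Container]` section that Quadlet translates into a full `podman run` invocation) reflects how Quadlet works:

```ini
# /etc/containers/systemd/example.container  (name and image are hypothetical)
[Unit]
Description=Example containerized service

[Container]
Image=quay.io/example/app:latest
PublishPort=8080:8080

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
```

At boot, Quadlet generates a regular systemd service from this file, so `systemctl start example` runs the container with a correctly assembled ExecStart line that you never have to write or maintain by hand.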
Part of that is reviewing the man pages and ensuring that the man pages match what the implementation is supposed to be doing; there's a lot of rigor going on in the testing aspect with verifying these man pages. Also, there's a lot of effort going on right now to get Android to work well inside of, well, eventually you'll see it'll be easier to run Android inside of Fedora, for instance, as a VM. Basically, RHIVOS is not a huge organization, but there are about a hundred people on it at this point, and everything we're doing is feeding back into other parts of the vehicle, I mean, the operating system. OK, unfortunately we are out of time, so thanks a lot for coming here, thanks a lot for the great questions, and if you have any additional questions, feel free to come to us and ask, and we will try to answer you. Thanks a lot. So, hello and welcome to my talk about stream processing at the edge with Apache Kafka. My name is Jakub Scholz and I work at Red Hat as an engineer, and I'm also a maintainer of the project called Strimzi, which is a Cloud Native Computing Foundation project about running Apache Kafka on Kubernetes, and obviously that will be one of the things which I will use later in the demo. How many of you know what stream processing is?
Right, that's quite a lot of hands. For those who don't know, you can try to imagine it like this: the world around us is full of different streams of events and streams of data. For example, in a room like this, when someone opens the door, it's basically an event: someone opened the door. When someone closes it, when someone enters the room or leaves the room, that can be another event. You can take these events and process them as a stream. So for example, if you take the events that someone entered the room, like now, or left the room, and you process the whole stream, you can basically count: one person entered, a second person entered, a third person entered, then one person left, then three more people entered, so we have in total five people in the room. That's information you can build from the stream of events, and then you can use this information for more things. I'm quite sure there is some fire code here which says there is some maximum number of people in this room, so you can for example use this information for alerting, which would say: OK, if there are more than 50 people in this room, let's send an alert message to some fire marshal, and he will come and say, you two, you are over the limit, go out. Then of course you can also do things like install some sensors which would, for example, monitor the temperature, air quality, or noise level, and then you can combine this information: how does the noise level depend on the number of people in the room, how does the air quality depend on the number of people in the room? And you can do things such as improve the air conditioning or the audio system depending on the number of people, to, for example, better conserve energy, and so on. So that's how you can imagine stream processing. Why does it make sense to do it at the edge? There are a lot of different industries where this pattern fits in. It's connected to how the world is changing, how we are digitalizing everything,
and how, for example, we are changing the ways we produce things such as energy. So you can imagine that in agriculture you can, for example, use some sensors to monitor the fields: whether they have enough water or enough sun, or animals eating the crops, and so on. In the pharmaceutical or chemical industry you can use it to monitor processes, or to monitor the environment, because, for example, a lot of pharmaceuticals, or, I don't know, the covid vaccines, have very strict limits on the temperature at which they have to be stored, or require that there be no sun where they are stored. So all these things can again be monitored with stream processing as well, and when you breach some of the rules, it can automatically mark some batch somewhere as damaged, for example. Or you can use it in sports to monitor performance; in manufacturing to monitor, for example, some robots on a car manufacturing line, and when you see that something's going wrong with some robot, you can proactively fix it, because stopping the whole line when it breaks would be too expensive. And likewise in transportation, waste management, logistics, telecommunications, energy, retail, and so on. So it fits into all these different areas and all these different industries. And it's good to process the data at the source, because that's where they are produced, and it gives you several advantages. One of them is that you get better latency and speed. With latency there's one obvious part: if you don't have to send the data, for example, into some cloud and then get them back when they are processed, you save some network latency. But it's also that if you do it somewhere centrally in the cloud, you might have many locations sending the data, and you might be queued for processing behind someone else, whereas if you do it directly at the source, you are not queued behind anyone else, you can better control how much processing power you have, and you can make sure that
they are processed really quickly and in time. Another advantage is that you are more resilient against things such as outages. In some of these areas you might be doing things in the middle of nowhere, where there might be bad, unreliable connectivity. But even if you have some retail space, some shop for example, or a supermarket, where there is in general good connectivity, it might happen that the connectivity goes down because someone did something wrong, and in that case, if you are doing things locally, at the source, at the edge, you might not need to, for example, close the shop because the connectivity is not working; you can continue to operate as normal. And in some environments it's not just a question of some problem with the connectivity; they are disconnected for a long time by design. If it's a cargo ship somewhere in the middle of the ocean, it might not have the best connectivity possible. Similarly, if it's a plane somewhere over the North Pole, I think there's not good satellite connectivity either. So there are situations where we are by design not connected, and where this fits quite well. And last but not least, it obviously also impacts the cost, because if you've ever used some public cloud like AWS, you know that quite often the data transfers are the biggest chunk of your invoice; it's not the VMs or the storage, it's just sending the data here and there. And it's a bit similar here as well: if you don't send the data, if you process them locally, you can save some costs. So I also wanted to touch on what I mean by the edge here, what kind of environment I'm talking about, because there are many different use cases and they don't always mean the same thing. So for example, in some situations people would be talking about the edge as some IoT devices, but as I'm talking about stream processing with Apache Kafka, these would not really have the power to do these things, right? So you might use them
in the environment, but that's not where we will be running the stream processing. Similarly, in some cases smartphones or tablets might be the edge, and in some use cases that's completely valid and true, and quite often these actually have enough power to do these things these days, but they're not really designed for something like that, so I'm not really talking about those either. But when it comes to something such as an onboard computer, such as a Raspberry Pi, that already is something where it can work. Now, a Raspberry Pi is not really a professional, industry-grade device, but you can use it, you can easily buy it, and you can do stream processing on it quite easily, and actually I will use those in the demo at the end. Then in many cases you might use some consumer PCs, some kind of old PC or some powerful PC lying under the table in some office, running some coffee shop or some supermarket or some fast food chain, and so on, and again, that's completely fine, that's completely valid; in some cases it's really used like that, and yeah, it works. And then of course in many cases it will be just a regular data center with racks and real servers; it just maybe won't have hundreds or thousands of racks, it will maybe have just a few racks, but even that's something that can be called edge, and obviously you can use it in this case as well. So why should you use Apache Kafka for these things? It's something I will be focusing on a lot, so why should you even care? Kind of the most lame argument is that it's the leading event streaming platform, right? It's been used for these things for years, it's well proven, it's used by many organizations, and so on, so yeah, why not use it, right? But there are some more practical arguments, and one of them is that it has all the tools you need to do this. You have the Kafka brokers, which always sit in the middle of the architecture, and they are the thing which receives the messages from
some producers and then passes the messages to some consumers, and then you can process them and do whatever you want with them. Then you have the Streams API, which is a stream processing engine. The nice thing about it is that it's basically just a library which you can include in your own application; it's not some huge framework with worker nodes and schedulers which is heavy and complicated to run. It's really just something that you take as a library, include in your application, and use its features, and this leanness fits quite well with the edge. But despite that, it actually has quite a lot of features: you can do all the simple stateless things, such as transformations, filtering, and enrichment, but you can also do the more sophisticated and heavy things: you can do time windowing, you can do stateful processing with aggregations and storing the data, and all of that can be done while still being scalable, so you can scale it and run it in multiple instances to get better performance. Then the next part is Kafka Connect. I hope that everyone after this talk will be convinced to use Kafka at the edge, but it would be quite rare that you would use only Kafka there; you would probably have other things as well, so the integration part is important, because you want to be able to take the data, send them somewhere else, and integrate with other systems, and that's what the Kafka Connect API, as a Kafka component, does. Then another part is something called MirrorMaker, which can be used to mirror data between different clusters, and that's useful because you can then basically copy these data between the different environments and share them that way as well. And last but not least, Kafka also has quite a lot of integrations with all kinds of different tools which are not part of the Kafka project itself. So for example, if you want to do something with AI or machine learning, there are a lot of tools which work with Kafka as
well, and you can, for example, pre-train your models somewhere else and then just use them at the edge and plug them into Kafka to read the events and process them. So hopefully that gave you some kind of introduction to the different parts, and now let's have a look at how you can do this, as a kind of pattern which you can use. Typically you might have some input data: you might have some IoT sensors which are feeding you environment data; you might have some applications running on smartphones or on some point-of-sale terminals which can again feed you different data; you can have different beacons that monitor how people move through some space; you can have cameras and all other monitoring tools, and all of that feeds into it. And then you will have the Kafka brokers, which, as I said, sit in the middle of everything, and they store the data and distribute them. But quite often it happens that things such as IoT devices will not talk directly with Kafka in the Kafka protocol, because that's a bit too heavy and too complicated for them, so quite often you will have some kind of bridge sitting in between, which can be using HTTP; it gets the data in this little bit simpler and easier protocol from the IoT devices, and then it basically formats that as a Kafka message and passes it to the Kafka brokers. And then to the Kafka brokers you can connect the different stream processing applications and do the actual stream processing, where you read the data sent by the devices and do whatever you need to process them. And of course you can also connect other clients and other applications, depending on what exactly you are doing. And that's basically our edge environment. But what's important is that the edge environment doesn't live on its own, right? It wouldn't be edge if it weren't on the edge of something. So there's typically something that I will call
here HQ, as headquarters, but you can call it a central cluster or something like that. It's again quite a typical part of this, where you have something central running, for example in some cloud, or really in some headquarters, in some on-prem data center, and you want to link these two parts. That's where the MirrorMaker tool fits in, because it can basically mirror the data between the two parts. Maybe you do the processing, aggregate something, and then want to push the data into the central information system of the company or of the organization. But it can also work the other way around: quite often you, for example, have some reference data or some master data which you are managing in the central cloud, in the central environment, through some other tools, and which you want at the edge. So that's what you can do with MirrorMaker. And then what's also important is that usually you don't have just a single edge and the central cloud; you usually have many of these edge locations, so you need to be able to manage them and monitor them in some efficient way, because you don't really want to SSH into all of these and install something, or upgrade something, or check if it's still running; you want it to give you as little work as possible. And that's where things such as Kubernetes, or the Kafka operators such as Strimzi, which will help you run all these components, come in, because they will do a lot of these things for you and make it much easier for you to manage this whole architecture.
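The room-occupancy example from earlier is exactly the kind of stateful processing a Kafka Streams application would do at the edge. Here is a tiny plain-Python sketch of that logic, standing in for a real Streams topology (the event names and threshold are made up for illustration):

```python
def process(events, alert_threshold=50):
    """Fold a stream of 'enter'/'leave' events into a running occupancy count,
    emitting an alert whenever occupancy exceeds the threshold."""
    occupancy = 0
    alerts = []
    for event in events:
        if event == "enter":
            occupancy += 1
        elif event == "leave":
            occupancy = max(0, occupancy - 1)  # never go negative on bad data
        if occupancy > alert_threshold:
            alerts.append(occupancy)  # in Kafka this would go to an alert topic
    return occupancy, alerts

# Three people entered, one left, three more entered -> five in the room.
stream = ["enter", "enter", "enter", "leave", "enter", "enter", "enter"]
print(process(stream))  # -> (5, [])
```

In a real deployment the same fold would run continuously over a Kafka topic of door events, with the state held in a Streams state store instead of a local variable.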
One of the advantages is that it allows you to run the same software everywhere: you can use Kubernetes and you can use Strimzi or the other operators (if you would use a database, for example), you can use them in the central cloud but you can use them in the edge locations as well, and even at the edge locations on different hardware platforms. You may have different situations everywhere, but you can use the same tools, you don't need to learn new things every single time, and you can use them exactly in the same way, so you know that it will be exactly the same environment as the others. But thanks to Kubernetes and the operators you should not actually need this that much, because they kind of help you with making the environment resilient and self-sustaining, right? So Kubernetes will take care of restarting the processes, the operators will basically make sure that the applications are running, that they are upgraded, that they are maintained, that they are monitored, that their certificates are renewed, and all these things, and they do all this job for you and you don't need to take care of it that much. And then finally there's a lot of tooling around automation which is available for Kubernetes, such as the different GitOps tools, Argo, Flux and so on, and you can use those to manage these configurations in a manner where you have some YAMLs, you can store them in your Git environment, and then you can use these tools to roll it out to all the different locations in various patterns, so that makes your life much easier as well. 
So let's have a look at a demo, and at the end I have a link which gives you the GitHub repo with all the sources so you can look into that in a bit more detail, but basically what I have is running at home in my home lab, and there's some ESP32 based sensor board with some sensors for temperature and humidity and air pressure and so on, and that's then connected over Wi-Fi to the Kubernetes cluster which is running there, and it's actually using an HTTP bridge to communicate with the Kafka cluster, and then it does the stream processing using a Kafka Streams API application which, just to demonstrate something simple, does one-minute windows, and over the windows it calculates the average values and then sends them into another Kafka topic, and these are then mirrored by the MirrorMaker into the central cloud, where there is a simple application which basically takes these data and visualizes them centrally. So it's not really industry-grade equipment, but it's really running on real hardware, it's not just some software emulated thing: it's using this ESP32 board with the sensors, and it's using MicroPython, and then I actually have a Raspberry Pi cluster, which has four nodes, though only three when I did the demo, and that's running the K3S Kubernetes distribution across it, and it's using the Strimzi operator to run the Apache Kafka components for us. So I hope you will forgive me for using a recording, but it's actually running at my home behind the NAT, so it's not that easy to do this live, but let's first have a look at how the IoT board looks, so as I said it's using MicroPython, that's kind of a cut-down version of Python. 
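To give a feel for what such a MicroPython sensor loop looks like, here is a hedged plain-Python sketch of the read-format-send cycle; the sensor values are faked and `send` stands in for the HTTP POST to the bridge (on the real board you would use the board's sensor driver and an HTTP client module instead):

```python
import json
import random
import time

def read_sensor():
    """Stand-in for the real sensor read on the ESP32 (fake values)."""
    return {
        "temperature": round(random.uniform(18.0, 24.0), 1),
        "humidity": round(random.uniform(30.0, 60.0), 1),
        "pressure": round(random.uniform(990.0, 1030.0), 1),
    }

def format_message(location, reading):
    """Format the JSON message the board posts to the HTTP bridge."""
    msg = {"location": location, "timestamp": time.time()}
    msg.update(reading)
    return json.dumps(msg)

def sensor_loop(send, location="office", iterations=3, interval=0.0):
    """Read, format, send, once per interval; on failure, stop.

    `send` stands in for the HTTP POST; the real firmware switches
    its error LED on at the point where we break out of the loop.
    """
    for _ in range(iterations):
        try:
            send(format_message(location, read_sensor()))
        except OSError:
            break  # real firmware would light the LED here and give up
        time.sleep(interval)

sent = []
sensor_loop(sent.append)
```

The `location`, key names and value ranges are illustrative, not taken from the demo's actual firmware.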
If you don't like Python don't worry, I'm not the biggest fan either, but it's actually quite easy to use, and even with a little knowledge you can easily write some things, and what you see here are the basic files for it: the boot.py script, that's what's called when the board starts, and it's just some initialization work like connecting to the Wi-Fi so it can send the data and things like that, and then the second file, the main.py file, that's what's actually running there all the time and doing the actual work, and it basically first loads the library for working with the sensor, then it initializes some LED lights to use to signal some problems, so one will be switched on if something stops working, and then it configures the different data like the location of the sensor and the URL of the HTTP bridge where it writes the data, and then it's basically running in a while loop every second, and it reads the data from the sensors, it formats a JSON message from it, which it then sends using an HTTP POST to the Kafka broker as a message, and when that succeeds then great, let's do another loop, and when it fails then it switches the LED on. So that's basically how the IoT board looks, and now let's have a look at the edge location: as I said it's a Kubernetes cluster, it actually has 3 nodes when I recorded the demo because I had one of them switched off, but it's running the K3S distribution, and when I check the pods which are running there you can see the operator which is running all the Kafka components, and then you can see the Kafka cluster with the Kafka pod and the ZooKeeper pod, and you can see also the entity operator which is used to manage users for security and topics where data are sent, and then we have the MirrorMaker which is doing the mirroring, and then finally we have the bridge which is doing the bridging between the HTTP world and the Kafka broker, and now what we can 
do, because it's running, is start the Kafka consumer and connect it to the topic in the Kafka broker where the data are being sent from the device. It's actually using this Wi-Fi AP pod which is actually running the mobile Wi-Fi setup; I find that quite neat, but I don't take any credit for it, I think I copied it from some colleagues who maybe copied it somewhere else. But now let's run the consumer, and that should show the raw data as they are being sent by the sensor, so we should see roughly every second a message in the JSON format with the readings from the sensor. It takes a few seconds to start the consumer pod, and now you can see the messages as they are coming in through the bridge from the IoT device, and once we have them in the Kafka broker we can move on to the stream processing. So there's one other pod running in this namespace, and that's the aggregator, which is running the Kafka Streams API application, which is actually running inside the Quarkus framework or toolkit, which makes it easier to just run it, and it does some special tricks at build time to optimize it. But what it's basically doing is reading the data from the Kafka broker every second, and it's creating one-minute windows, window after window, over them, and then calculating the average for the given minute, and then when the minute window ends it will basically take the average value and send it to another topic in the Kafka cluster. Now obviously this is more for demonstration purposes, it's not the most sophisticated processing possible, but it should give you an idea about how this can work, and now when I open a consumer against this topic with the aggregated data we should see basically the same message which we sent, just every minute, because it's basically the result of the aggregation, and that's the average value which kind of smooths out the numbers we are getting from the sensor. So that's kind of the edge location, and then 
the last part is the central location where we get the data from the edge location at home; right now only one, but in reality you would have many of them feeding the data there. And what we can see here is again the familiar thing, the Strimzi operator on the Kubernetes cluster running the whole Kafka cluster; this time it's a bit bigger so it can handle more data and more things, and it has also some more components, but then there is this frontend application which basically reads the mirrored data from the edge, and it shows them on a map, and it also keeps them for the history and shows them in Prometheus, so that you can look at how the values were evolving. And what we can check also is how the data are being mirrored from the edge location, so I will again run the consumer, the same as the last time, but this time it connects to the central cluster and not to the Kafka cluster at the edge, and we should again see basically the same messages, but this time these are the mirrored messages, mirrored using the MirrorMaker, and yes, here is one, and if we wait another minute we will get the next one and so on. But let's switch to the browser, and just ignore this "for development purposes" watermark, that's what you get when you use Google Maps without entering a production API key, but what you can see is it just shows the map and it shows the pin with the location of the sensor, and then it gives you the details, and when you click on it you can get to the Prometheus chart which gives you the history and shows you how it was evolving. So that's kind of the demo part; it doesn't have too many things which are not focused on the pattern, but it should demonstrate well how Kafka helps in this area, hopefully. And that's it for the talk; this is the URL where you can find the demo, it will redirect you to the GitHub repository, and thanks for listening, and if you have any questions we should have some time for that. So it's quite complicated, because you would need to create different 
listeners. Just to repeat the question, the question was: if someone runs Kafka in a containerized Kubernetes environment, how do you get access to it from the outside? And it's not easy because of how Kafka has its own discovery mechanism, which makes this quite complicated, but basically you would need to configure the listeners in a way that you have, for example, multiple listeners: one of them will be for the internal applications running inside the Kubernetes cluster, and the other one will be for the applications running outside, and in the other one you have to use, for example, the address of some node port or some load balancer address in there to advertise it to the clients, and then use this listener for it. So if you use Strimzi running on Kubernetes, it has four different types of external listeners, using node port, using load balancer, using ingress, using OpenShift route, and you can just configure them, and it does all these things in the background for you and then just gives you the address to configure the client with. But yeah, you would basically need to play with the advertised host configuration in the Kafka config if you would do it yourself. Any other questions? 
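As a sketch of what that listener setup looks like with Strimzi, here is an abbreviated `Kafka` custom resource with one internal and one external listener; the names, ports and the node-port choice are illustrative, and the rest of the spec (replicas, storage and so on) is omitted:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    listeners:
      # for clients running inside the Kubernetes cluster
      - name: internal
        port: 9092
        type: internal
        tls: false
      # advertised with an externally reachable address;
      # type could also be loadbalancer, ingress or route
      - name: external
        port: 9094
        type: nodeport
        tls: true
    # replicas, storage, config and the rest of the spec omitted
```

The operator then advertises the node-port (or load-balancer) address to clients for you, which is what you would otherwise do by hand with the advertised-listeners settings in the Kafka config.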
so the question is, if you would have multiple edge locations with basically multiple MirrorMakers synchronizing their own data, would it be synchronized into the same topic or into a different topic for each location, just to repeat the question. And I think it depends a bit: with a project like this where we are just sending some sensor data, the same topic makes most sense, because then it's much easier to consume it, but there definitely might be some cases where different topics might make more sense, where for example you work for some organization with some other kind of divisions and so on. So the next question is about the HTTP bridge, whether it's a standard component or whether there are different projects and so on. What I'm using here is directly from the Strimzi project, which has its own HTTP bridge, and you can basically deploy it with the operator just by specifying a custom resource, so that's the easiest, that's what I use there. There are some other bridge components out there as well: there's the Confluent REST Proxy, which does HTTP proxying as well, and Confluent has an MQTT proxy as well, which you can use if you want MQTT for example, and there are others as well, so you can use what you want; what I used, as a Strimzi maintainer, was obviously the Strimzi bridge, but if you want you can use others. Yeah... sorry, what? 
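Coming back to the stream processing from the demo: the actual application uses the Kafka Streams DSL in Java, but the one-minute tumbling-window averaging it performs can be sketched in a few lines of plain Python to show the idea:

```python
from collections import defaultdict

def minute_window(ts):
    """Map an epoch-seconds timestamp to the start of its one-minute window."""
    return int(ts) - int(ts) % 60

def windowed_averages(records):
    """Group (timestamp, value) records into one-minute tumbling windows.

    Returns {window_start: average}, the same shape of result the demo's
    Kafka Streams application emits to the aggregated topic.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in records:
        acc = sums[minute_window(ts)]
        acc[0] += value  # running sum for the window
        acc[1] += 1      # count of readings in the window
    return {w: total / count for w, (total, count) in sums.items()}

# Five readings spanning two one-minute windows:
readings = [(0, 20.0), (30, 22.0), (59, 21.0), (60, 25.0), (90, 27.0)]
print(windowed_averages(readings))  # window 0 -> 21.0, window 60 -> 26.0
```

The Streams version does the same grouping continuously and per key, with the custom aggregator carrying the (sum, count) pair; this sketch just shows the windowing arithmetic.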
so the question was how do I configure the aggregation part. It's using the Kafka Streams API; I should have prepared it so I could have shown the source code, but the source code for it is actually in the GitHub repository as well. Basically, in the Streams API you just say start from this topic and specify the serdes for decoding into Java basically, and then you just get the messages as objects and you can work with them in the Streams API DSL, so you just start with this "from", and that in this case gives you the data, and then you just use the DSL to say okay, windowing with one-minute windows, and then the most complicated part of it is actually writing the custom aggregation to do the averages, but otherwise it's quite simple; it's like three Java files all together or something like that. I don't think it would be that easy to use it to distribute commands to specific edge locations, right, because you would for example need to have different topics for each location to be able to say okay, location with this ID, do this, or you would need to mirror the commands to every edge location and then let the edge location for example filter out whether it's for it. But it's super useful if you have some reference data: if you, I don't know, use it in some supermarket to update all these digital price tags, then if someone in the central location changes the prices you can roll out the update to all the edge locations this way, for example, or other kinds of reference data where basically the same data can be sent everywhere and applied to all locations; I think that's where it fits a bit better than some kind of command pattern. Anything else? In that case, thanks for joining me and thanks for your time. Good afternoon everyone; after lunch this is going to 
be an interesting session, but at least we're not talking about security. So my name is Faz, I'm one of the specialist solution architects at Red Hat, and I focus on Kubernetes cluster management and automation. I'm Luca Ferrari, I'm Italian, and I'm an edge computing specialist, so you might have heard Red Hat recently is investing in edge quite a bit, and I have a background in integration and API management. So a few years ago I had a call from my father's neighbours; they called to tell me that they heard my dad's little dog barking in the garden, and when they went to find out what the problem was, they found my father lying down on the patio, unable to get up; he couldn't manage to get himself back up. Fortunately that day they called the ambulance, and by the time I made it there my dad got the medical attention he needed, but this wasn't the only time this happened. My dad suffered from Parkinson's and Parkinson's related dementia, so if he didn't have someone with him all the time to remind him of his limitations, he would set out on walks, and sadly sometimes, on a cold day, the floor would be slippery, he would have a fall and he couldn't get himself back up. Sadly, as probably a lot of you are aware, his story is not unique; in fact, according to the National Safety Council in the US, a big chunk of the preventable injuries, and sadly injury-related deaths, that happen at home are related to falls, and these falls are very common amongst the elderly, and you know we are facing an aging population, so this problem is only going to get worse and worse. So if the elderly don't get the care they need at home, they are very likely to have a fall, they might neglect themselves, and they end up in A&E, in the hospital, and that means pressure on health and social care. We are looking to see how we can use technology to help with this problem, and the reason we are looking into this assistive care today is that the hospitals are already overwhelmed with a lot of day-to-
day problems: they have a lot of legacy systems and devices that they need to integrate and maintain, they produce a large amount of unstructured data from the medical equipment that needs to be maintained, and they need to be compliant with industry standards; patients are expecting to be able to look at their data, and on top of this the hospitals don't have a big IT budget or IT teams to be able to resolve this problem. And that's why we are trying to see how we can use open source to relieve this burden, at least with the elderly and assisted living, from the hospitals. Given the challenges we have seen, there are several advantages of using an edge computing approach when it comes to assistive care, and using open source at all layers. So first of all, in terms of Red Hat projects and products, you can see that we started adopting at the edge the same platform that we adopted at the core, which is basically Kubernetes, so that brings manageability advantages, so the team doesn't have to learn a new tool basically, or a new technology. Then there is security by default at all levels, so through tools like ACS, Advanced Cluster Security, you can implement policies that automatically secure new clusters, or in the case of assistive care, secure new homes. Then there is a whole partner ecosystem; I don't have to explain it to you, but the power of community here is pretty strong, so we can for example connect to legacy protocols through libraries developed by the community, and we have a whole set of partner products that can be deployed at the edge or at the core in different ways. And then eventually manageability and scale: when you think about an edge architecture, what comes to mind is actually day two, or thinking about deploying the second, third, or eventually the hundredth cluster after the first one. Deploying the first one and managing the first one is quite easy, but what about managing at scale? So then you have tools like Advanced Cluster Management that help you with that scenario. So we 
came up with a reference architecture in this case, with the specific use case of assistive care, and we looked into the literature on what are the types of sensors and scenarios that can help detect a possible fall at home, I should say. So there is a full mix of technology, as you can see here, quite a recipe; I'm pretty sure you're familiar, if you've been working a little bit with home automation, with Raspberry Pi and Arduino, so I will not explain a lot about those. We use two different messaging technologies: you might have heard of AMQ Broker, Artemis as a project, so that's a store-and-forward broker; we used it for MQTT messages, but it's a multi-protocol broker. Then we use AMQ Streams, that's actually a Red Hat packaging of Kafka, so you might know the project as Strimzi, and we use it for event streaming scenarios, when you want to process data at the core. I don't think I need to explain anything about OpenShift, and we will be able to answer any question on ACM after the session. And we use Camel with Quarkus; there are also quite a number of talks today about Quarkus, and there are now extensions to actually run Camel integrations on top of the Quarkus runtime, to make integration even lighter; this is for all the scenarios where you want to integrate legacy or industrial protocols, for example at the edge. Then we use, for the monitoring and presentation part, Grafana and Timescale; Timescale is actually a time-series database that is included as part of the stack that you see here, and Grafana allows you to build custom dashboards, so in case the hospital wants to have current-state monitoring of the patient's situation, they can do that. Then we use Ansible for all the event-driven automation cases that Faz is going to explain in a bit. Let me deep dive a little bit more into Edge Impulse and Drogue. Edge Impulse is what I would just call a studio, an online studio for MLOps; if you're familiar with OpenShift Data Science, it's not that dissimilar; the difference is 
that it's highly focused on machine learning at the edge, on very low resource devices; that's especially important because a lot of these sensors and platforms are not really single node OpenShift, are not really beefy, so if you want to run some machine learning model on something like a Raspberry Pi, that's a really good tool. The other interesting element that you can see here is that you can start building your model directly on your smartphone and then deploy it as a test case on either Android or Apple. This is Drogue instead; this is a Red Hat project, you might want to take a look at it if you're interested in processing data coming from an IoT environment, and there is a whole set of layers. In this case the diagram was more focused on an automotive use case, so you can see the little car there, but basically there are two main concepts: the devices and the applications. The devices are on the left-hand side, they are basically all your sensors and actuators, and then you have the applications, so the applications are what the end user might use or develop, and typically several devices are associated to one or more applications. So you have an ingestion point and the endpoints on the left; Drogue supports HTTP, CoAP and MQTT. Then eventually there is a data streaming component through AMQ Streams, so you see there Kafka; there is a whole set of authentication, both related to devices and applications, with Keycloak, and you also have a set of device management and device registry functionalities. Eventually the data, which can be processed, filtered, and there is even a basic rule engine, is exposed through integrations on the right-hand side, where you can see WebSockets, MQTT, Kafka and even serverless events. So this is the architecture we came up with: as I was explaining before, we used AMQ Broker to ingest the events coming through MQTT communication from the sensors; then, let me just maybe switch to the actual data flow, so then the messages are eventually stored in 
the event streaming part of Drogue. Drogue will apply some filtering, for example it will not actually activate any alert in specific scenarios; it will also store, for historic purposes, all the events in Timescale, maybe for later exploration, and then Ansible will actually get triggered on specific situations, on a specific topic on AMQ Streams; eventually Ansible will trigger a call, or in our case a message to Telegram. So the idea here is that given specific scenarios, which we will explain soon, a nurse will be alerted that there is a patient to be visited, and she will travel to the assisted house. Right, so the use cases we have tried to implement using the technologies just shown are: one of them is the fall detection alarm as a wearable, and the other one is a fridge usage monitor, so essentially just to monitor the state of whether the fridge door is open or closed, and then decide later on what we need to do; and also the scenario of combining these two through sensor fusion. Who knows what the concept of sensor fusion is? Sensor fusion is just the process of combining the data that we get from different sensors or different disparate sources of information in order to get a better understanding or a clearer picture of the whole situation; on a high level that's the explanation, so it's just combining the data that we get from our sensors. So we divided the use cases into two parts: I worked on the fridge usage monitor and Luca worked on the fall detection alarm. For the fridge usage monitor, what we used was an Arduino Uno WiFi and a reed switch connected to it; the reed switch is just an electromagnetic switch, so based on the magnets being close to each other we get a high or low voltage, and then we can detect whether the door is open or closed. So I started implementing the code on the Arduino WiFi, which is this little board here, which worked 
really well to begin with and gave me a false sense of security, but as I went along and I was trying to implement SSL, it took me a long time, and then I realised the libraries that I was using on this Arduino WiFi do not support SSL properly, so then I switched over to this ESP board, and I managed to get SSL working on that, which was a good success. And just to show you a little bit of the code snippet from the Arduino: as you can see, we're just reading the data that comes through the serial port, and we're just sending the information through using the MQTT client that's available in the Arduino libraries; we're sending the information to our MQTT broker. In case anyone hasn't come across MQTT, it's just a lightweight protocol for pub/sub. And the other use case was based on an interesting scenario and device: the device you see on the left-hand side is this small thing, and interestingly enough this can run a very basic TensorFlow Lite model, and you can deploy it using Edge Impulse, as I was saying before, and then I basically aggregate all the measurements through Bluetooth on a Raspberry Pi, and this then communicates back to the AMQ broker and so on. So I'm not going to show you any C++ code, because I'm not really proficient in C++, but basically it's what Edge Impulse generates; I'm going to explain it afterwards, but the way it works, you generate and train the model, and then you can export it in several ways; one of them is just an agnostic model, which is just C++, meaning you can deploy it almost anywhere, and as you can see there's the structure, the model parameters, the SDK which is the runtime, and then the actual TensorFlow Lite model. So the way it works, as I was saying before, if you've been experimenting with OpenShift Data Science or Open Data Hub, I think that's the upstream, it's just a standard MLOps platform, so you get to design the model using several libraries, you get to train it and collect new data sets, you get to test the accuracy of the model 
and eventually deploy it on several edge targets. So this is an example of the interface: as you can see here, in our case we were interested in measuring the acceleration on the X, Y, Z axes, so that's one thing the Nano can do; there are other sensor packages on the Nano, and based on this variation the model basically identifies whether this is somebody falling or not. Eventually you get to be able to deploy to several tiny platforms, so as you can see there are several Arduino platforms supported, but there are other certified platforms for Edge Impulse. So there is an alternative to this tool which is not online, it's completely on-premise, which is TinyML, so if you want to explore that instead of using the cloud approach, it's also worth it. So this was basically to recap a little bit: we were up to the sensor layer, and eventually we decided to introduce AMQ Broker to actually store the messages for persistence. Somebody might ask why the sensors are not directly communicating with the cloud, with the OpenShift platform: it's for persistence and resiliency. So we configured AMQ Broker so that it exposed, among the several protocols it supports, MQTT, and as you can see we created a queue and an address related to this type of traffic, and you can see an example of messages, so you can browse the queue and see the content of the messages. And then I also configured Drogue, and I created an application corresponding to the end user application, so in this case it was just a notifier application, and this notifier application had several sensors associated; as you can see, I can also create an IoT gateway in Drogue, and you can create a hierarchy between the actual IoT gateway and the other sensors. So far we've seen how we just get the data from our reed switch to Drogue, and as part of Drogue, as you can see, there's an AMQ Streams section that comes with Drogue, so now the notification part involves the data from AMQ Streams, or Kafka, and Ansible and EDA, or Event Driven 
Automation. So what you see here is an example of the Event Driven Automation using Ansible; this is the main component, called the rulebook. We've got two sections: we've got the sources section, for which we have different source plugins, in this case we're using Kafka, and then we've got the rules section, where based on a condition an action is triggered, and as you can see it's quite simple, the usual Ansible style. So we're just saying that if you're getting the condition that the door closing is detected from Kafka, trigger a playbook, and that goes to the Automation Controller; as you can see, that playbook is just running in there, so we just talk to the Automation Controller to launch that playbook, and the playbook is pretty simple: it is using the Telegram collection to send the notification, and eventually, when that is done, the carer or the family member or the nurse will get the notification on the phone, whether a fall is detected or what's happening with the fridge door and so forth. So this is what we've done so far with these two use cases; it's still work in progress, we're still working on improving them, and we're hoping to start introducing more use cases to this POC. So you may have noticed we haven't talked about the Camel Quarkus part of the architecture, because we haven't implemented it yet, but we will; we're looking into implementing some sort of local, level-1 alerting using Camel. And we haven't talked about ACM with this POC and what you can actually do with it: ACM takes care of the infrastructure side, we have used ACM in order to provision a single node OpenShift cluster and also to implement the configuration and applications on it; most of you are familiar with this if you've worked with ACM, and that's just the cluster that you can see there. So to conclude with some lessons learned: you might have already been fighting with certificates, so TLS certificates and DNS are usually the culprits of 
issues when it comes to development and integration. Also, the test case I was showing with the Nano is all through a USB cable; even if you have a really long one, you might actually think about having batteries so that somebody can wear it on the wrist. Also, it was quite hard personally to develop this joint project since we're not in the same country, and we are also not really developers, so we had a tough time using remote calls for programming, let's say. Also, we are showing single node OpenShift on AWS rather than bringing an industrial-grade server to the stage, because it's really hard to actually connect both power and network in general; we also were trying to have Faz connecting to my homelab for some tests, but VPN access through the internet is not as easy as it looks. And eventually, I don't know if you have been experimenting with pod security with the recent version of OpenShift, but it can be quite tough if you try to run something. So just to remind you what you can explore in terms of initiatives inside Red Hat: there's a healthcare validated pattern, I don't know if you've heard about validated patterns; this one is dedicated to medical images and providing better diagnosis to doctors, and this is all done through OpenShift Data Science that I mentioned before. Basically a validated pattern is really just POC code that you can re-execute; it's stored in GitHub so you can contribute, and it's designed by the industry basically. So yeah, just as a recap of the whole thing, we're putting this effort into using open source for assisted care, so hopefully when it gets to be our time to retire, we will have a happy and healthy retirement. Questions? Yeah, sure. So the question is if we run this, a single node OpenShift, at the patient's home: yes, the idea is to run it there. As you probably heard, there were a couple of sessions; you can even run MicroShift if you have limited hardware resources, and given the workload we are working with, it will 
run on MicroShift as well. So, if I understood correctly, the question is where do you get the data for the model, to be precise the fall detection model, and how do you measure if it works correctly. So actually there is quite a data set available, I didn't have to train it myself, but with Edge Impulse you can even train it yourself and add your own data to the model quite easily. Yeah, we just wanted to incorporate as many Red Hat products as we could; given the title of the talk, it's just using Red Hat technologies, so that was one of the main reasons behind using EDA. Yeah, I guess also the advantage is that you can automate other stuff, so you can leverage it for future use cases as well. Yeah, actually the output is CloudEvents, which is a standard, so you can trigger events and stuff on AWS, I guess, or other platforms. Cool, thank you, thank you. So, good afternoon, my name is Jozef Mlích, I work in Brno and I live in Kuřim, and I'm one of the two and a half developers of Nemo Mobile. This talk's topic is a question, and the question is whether Nemo Mobile is suitable for automotive. So, first of all, are there some automotive professionals here? Okay, so you will see why it's not really for professional use, probably, or maybe; we will see. Let's try to answer the question. To answer that question we have to look at what is actually in Nemo Mobile: I will go through the current status of the project, I will talk a little bit about its architecture, I have some automotive demo also, and yeah, I have a few details about the project. You can hear Mobile in the name of the project, and it can tell you that we are focusing on mobile phones, so we would like to have a mobile phone with a full Linux distribution, to make phone calls etc. Currently we are developing that on the PinePhone; I have one device here, so try it, play with it, there are no 
sensitive data, so you can do whatever you want — if it crashes, then it crashes; if you don't return it to me, you have to develop it instead of me, so be careful. You can also try to run it on the PinePhone Pro. I have a proof of concept on the CutiePi tablet — I have an image and it was running, but right now I don't have the device. I'm using it in my KVM virtual machine. You can find in the arm-profiles repository a list of devices which are supported, so you can try another device. Some people were trying to run this distribution on Android devices, using the Halium layer, which allows running the distribution on an Android device. So, it was in the picture — you can see the Volla Phone, the codename is yggdrasil; I have seen some Xiaomi Redmi devices and others. If you are interested, you may try your own device, but it's more work to do. So, the operating system looks like this. It's basically a sort of shell: we have three screens, one with notifications, one with the task list, and one is the app launcher. So we have some UI; we are using the Maliit keyboard to be able to write something. It's great that you can connect to Wi-Fi and debug it not only with a serial cable but through Wi-Fi; USB networking also works quite well. We can control the phone using buttons, so you can turn it off and change the volume. Of course, being geeks, we need a terminal, and we have a terminal. You can install applications using pacman — you can see pacman, which means we are using a distribution based on Arch Linux, namely Manjaro. Manjaro provides us with many packages, such as the kernel, autotools, cmake, gcc and others. With Nemo Mobile we are extending that with roughly 150 packages; roughly 70% of them are taken from the Mer project. If you don't know the Mer project, it's MeeGo resurrected; it is part of the stack which is used in Sailfish OS, and Jolla is maintaining a big part of those libraries, which is great. You can see some examples of those libraries — we have, for example, ofono. As you can see, we are not using NetworkManager and ModemManager: we are using ConnMan instead of NetworkManager, and ofono instead of ModemManager. Again, some examples of applications: this is the settings application, and the browser. If you know Sailfish OS, then you know that Sailfish OS is using Qt 5.6, which is, I would say, ancient; one of the reasons to move to Manjaro was to use an up-to-date Qt, so currently we are using Qt 5.15. We have our own Qt Quick Controls; those are based on Qt Quick Controls 1, which are incompatible with Qt 6. We are working on a port to Qt 6 right now — it is a lot of work and we are moving slowly, because it's like a weekend project, it's not our day job, so it's really slow. We are pushing patches back to Sailfish OS — to Jolla — and they are managing them, mostly. We are using some middleware, like wrappers around other libraries, named nemo-qml-plugins, which are basically interfaces into QML to use other things — like the mce daemon, which can provide wakeups and so on. We have our own UI application — we call it Glacier — and those are examples of the user experience; you can see maybe 80% of the screens on the device. That was Nemo itself, and now let's try to answer the question: is it ready for automotive? No — that's the short answer. At FOSDEM this year there was a stand with an automotive demo, and they told me: if you expect that you will get that device and put it into a car, you shouldn't do that, you shouldn't expect that. It's a platform, it's 80% ready. So I told myself: I'm thinking about the percentage of match with some expectations, so let's try to do that with the demo. I think we have at least a few percent of match with automotive expectations. I found a paper — from Walmart Anders and Wikingson — with an architecture, and if you look carefully (I know the letters are small) you can see a home screen. We have a home screen, so one point for us. Bluetooth, Wi-Fi: we can connect to Bluetooth, we can connect to Wi-Fi — wow, so much. AM/FM tuner — we don't have that. Actually, behind the whole stack in the demo, for telephony it's necessary to work with audio
profiles: when you want to make a phone call on a mobile phone, you need to switch the profile; when you are receiving a phone call, you have to switch the profile; after the phone call is over, you need to switch back; you need to support Bluetooth headsets, and so on. We have quite a lot of packages which manage those audio profiles, so I expect we can be close to that requirement. We have some media browser, telephony, contacts, navigation — sort of. I don't know why social media and media apps are mentioned twice — maybe more social media is better — but we don't have them, and the email application doesn't work quite well right now, so I would say we don't have it so far. We can go deeper in the stack: we have some native HMI, we have some other things like location services, we have some kernel, for sure. So we have many components in Nemo, and it is hard to say if it's like a 50% match or how much — it's up to you, actually; you should give the answer to me, if you think it fits. For sure, Nemo Mobile needs a lot of love — there are a lot of bugs — but it can also be a platform for automotive. And now the question is for you: what are your expectations? Do you want fast boot — the device boots up in 2 seconds or something? No one? Okay, let's move forward. I made a demonstration application with an OBD2 Bluetooth thing, for 10 dollars or something, and I was trying to make it work — and I actually failed, and I don't know why, so maybe some expert can help me. I connected to the device, paired it, made it trusted, connected to it using rfcomm — which created a device node — and sent the ATZ command using minicom, and nothing happened. When I tried it on another device, like on x86, it was working, but not on the phone, not on ARM, and I don't know why; so maybe someone can help me — I was not successful that way. For the demonstration I was also trying to make GPS work, and that is also not so easy; there are many combinations of how you can connect things together. Modern distributions are using the Geoclue interface in version 2, and there is a provider for it in Qt — maybe, most probably, also in GTK. In Qt it's Qt Location: if you call startUpdates() on a positioning source, you can get data. For some reason we realized that Geoclue 2 doesn't work for us as we needed — we had some issues, probably with the Mozilla Location Service Geoclue provider — and therefore we switched to Geoclue 1, which is not a good thing; I think we need to change it back to Geoclue 2, but that needs some time and some love. As a consequence, since we are taking Qt from Manjaro, it has the shared objects for all providers — Geoclue 1 and Geoclue 2 — but it prefers Geoclue 2 for some reason. So, if you want to do it on the device right now, you need to remove the Geoclue 2 .so file, then restart the device — or probably just the application which is using the provider — and then you get to Geoclue 1. From Geoclue 1 you need to get to gpsd, or to the Mozilla Location Service (the MLSDB provider inside), and then you will have the position. To go from the other side — because the provider then goes to gpsd — you can start gpsd and debug it using cgps, and you will get data. As far as I know, Geoclue 2 accesses the GPS directly, which is a little bit easier, and the Mozilla Location Service should work somehow, but I haven't gone into it deeply so far. Okay, that was about my demo. If you want to start developing Nemo Mobile, we have a pretty nice website which contains a description of how you can install it on the PinePhone — basically, you take the SD card, you put the image on the SD card, then you put it in the PinePhone, so it's pretty easy. We have a description of how to install the SDK — well, it means that you install those packages into a Manjaro virtual machine, and then you can compile everything. We have some description of our Qt Quick Controls, because they have a specific API, and most important is that you need to know the names of the components to be able to choose the right one. Please try to contribute with code — that is, like, the best option for me. I was writing blog posts about the progress of the distribution, so every month one
blog post with news about the project. I was always adding a section on where to start, but there are so many bugs that we need to solve them first — so you may look into those blog posts and you will find what is not working as we need. We are using Manjaro, so it uses PKGBUILD scripts to build packages, instead of spec files, but it is very easy to understand if you can work with RPM: a PKGBUILD is simply a script which contains functions, and each function corresponds to a section in an RPM spec file. It has some limitations, but it is much simpler than RPM, I would say. If you want to start, you will probably need to go to the Nemo packaging repository; it contains the PKGBUILDs for all our 150 packages, so you can start there. If you want to debug the calendar, then you will start with the Glacier calendar repository: you will see how it is built and what dependencies it has, and you can track an issue down. Then, when you have a package, you can build it locally using makepkg, or buildarmpkg for cross-compilation, and then you will probably build some image: buildarmimg will take the repository you provide and produce an image you can install on your device. There is a good repository called arm-profiles, which contains the list of devices supported by Manjaro. You can also contribute with something other than code: you can translate, you can write some documentation. We are looking for user interface drafts to implement — so if you are a graphical designer, you can draw something for us and we will be happy to try to implement it. And if you spread the information, if you find some other developer who can help with the project, or if you write a review of the operating system — again, we will be happy that people will know about the project. Money for living is also important, but it's like a weekend project, so it's not so important; it's for those strange devices — we are buying them with that money, so more money means more devices. Okay, demo time. It boots — I'm cheating
a little bit — it's a virtual machine. So, Nemo Mobile looks like this: you can see the task list; I go to the second screen with a gesture, the third screen, and back again to the first one. We have some nice applications, like the browser, which is basically — oh, that's broken. When you want to close an application, you go over the edge — the same visual as in Sailfish — and pull the application away. Then we have some settings app, and we will switch the screen somehow correctly — yeah. I have an automotive demo; nothing works — strange. If you try the camera on the PinePhone, it will not work. Since this is a virtual machine, I can redirect USB and start the camera on the device — and it works with the integrated camera of the laptop. It's a problem of drivers: on the PinePhone we have a dumb camera, which needs a lot of things to be computed — like white balance — and those things need to be computed in software. You need to control the switching of resolution, because when you stream video you need a lower resolution to be able to store it compressed, while in the case of a photo you need a big resolution, like 10 megapixels or something. So it's a difficult task, and this is done somehow with the camera, but it's not so easy to connect it to a Qt application — QtMultimedia doesn't work with it so well — so the applications talk to the camera directly. Actually, in Sailfish OS they have a camera application which works with the camera, and we are trying to have working applications for phone calls and those things, which opens the door to automotive usage. We have a lot of stack which tries to be prepared for those advanced uses; we also have a media player and so on, and we have that for Qt 5.15 so far, with Qt 6.2 and so on coming. So, I have a few questions for you, and we have 10 minutes for discussion: what do you expect, what do you want? [Question] Yes, that's a good question: why am I trying to use it in automotive? Well, I know that Red Hat started recently with automotive, and I
wanted my talk to be accepted at DevConf — so that's the answer. [Question] The question is why Sailfish OS is stuck with the old Qt version. I think the answer is licensing, but it is not a question for me, it is a question for Jolla, so I don't know the exact reasons. I think the license of Qt 5.6 was LGPL or something, and in newer Qt it changed to GPL or commercial — so the expensive one; that could be the answer, but I am not sure. Maybe they also don't want to break their app ecosystem: they have a big community of people, and if they migrate to a new Qt, then everyone needs to update their application, which can be trouble. I am not sure if anyone can hear me — some other questions? If not... go ahead. [Question] The question is how we handle security in Nemo Mobile — whether we use SELinux — and I didn't catch all the technologies you mentioned: separation of processes, etc. That's a good question. One of the differences of Nemo Mobile right now from Sailfish OS is that we removed some security mechanisms they have — they have Sailjail, if I remember correctly, and a few others — and we are not using them. We expect to implement SELinux, to be closer to desktop usage, like having SELinux-based security. Speaking of process separation, I don't know. One of the things Sailfish does — well, it's a nasty trick to reach performance — is launching an application at device startup which loads all the Qt Quick components, and then using shared memory to access those components, which is pretty fast but a little bit insecure. We moved away from that approach, also because Manjaro has, I think, the -fPIE flag — the address randomization flag in gcc — on by default, so this approach is not possible with that flag. So the answer is: we are trying to move closer to traditional desktop distributions, we are considering those options, but I think we are really not security experts, so it's hard to say if we are doing it correctly, and it would be great if someone who is a security expert could tell us "this should be done differently". Thank you again for the question. Go ahead. [Question] The question is whether Sailfish OS has full-disk encryption — I don't know if it is there, but we don't have it; we don't have it in our system so far, but we plan it, it's one of the planned features. I'm not sure how we want to do the decryption — how do we enter the PIN code; as far as I remember, postmarketOS does that with some tiny LVGL application which is in the initramfs. I don't know how Sergei plans to do that. Okay, so if you don't have any other questions: I will be here at DevConf also tomorrow, and also today, so feel free to grab me. I will try to do a mobile stand again tomorrow, so I will try to bring my device, the PinePhone, with other operating systems; like I said before, I hope Pavel will lend me the Librem again, to see other devices. And yeah, feel free to grab me, ask me, I will try to answer you — and please contribute to Nemo Mobile, fix my bugs, I need that to work. So, thank you.

We're going to get started — so thank you everyone for coming. We're going to talk a little bit about a project we've been working on at Red Hat, as part of the automotive program, and that has to do with managing services across different nodes. A few things to know: you can see the logo on the top right — because we have a logo, we have stickers, so if you have questions at the end we'll hand out stickers, and if you stick around we may give you stickers if you want them. So, without further ado, what are we going to be talking about here? I'm going to be giving you a little bit of an introduction, setting up the context of the challenges of the automotive industry and what led us here. The challenges of the automotive industry are multi-faceted. One of the facets that has to do with this is that it's a very, very competitive landscape: there are many automotive manufacturers around the world, and some of them are very old. If you take Peugeot — Peugeot was founded in 1810, before the first car was even
created. Peugeot was originally a kitchen appliance company, building kitchen tools — which is also why you can still find, in some places, a Peugeot-branded salt and pepper grinder — but at the end of the century they started working on their first car, in 1889, and Peugeot is worth about 50 billion dollars today. If you look at General Motors, which is the second car manufacturer worldwide, it was founded in 1908 and it's worth 47 billion. If you look at numbers two, three and four from the bottom here, we have Ford — also a very famous and pretty old automotive company that we all know about from our history books and economy classes, about the way Ford industrialized the production of automobiles. They are all about the same worth when we look at market capitalization. Then we have Toyota: Toyota is today the first car manufacturer in the world by yearly production — Toyota leads — and it was created in 1937; Toyota is worth about four times as much as the previous three. But then we have the young kid on the block, and that's Tesla. Tesla is barely 20 years old — it's not even legal for it to drink in the US yet — and yet it is worth nearly 12 times as much as Ford, founded 100 years before. So Tesla is doing something in the automotive industry that is changing the landscape: it's building something that the market believes in, and that is a challenge for the old automotive companies, because they suddenly realize that there is a kid on the block that just appeared and is already worth 10 times as much as "I do, and I've been there for a long time, I'm an old-timer". That's a challenge, that's something they need to address: what does Tesla do that we are not doing, and that puts them at this place? That's something they need to look into. Something else that has changed over the last few years was Covid-19, and as much as we'd like it to be over, the ripples of that pandemic are still around; one of the side effects was the chip shortage, and we're still recovering from that. I don't have a source for this, but at some point during the pandemic car manufacturers would have been able to sell twice as many cars as they did — except they couldn't produce them. They had the customers, they had the salesmen, they had the sales; they just couldn't produce the product, simply because of the chip shortage. So that leads to some decisions that need to be made. Something else that has changed in the industry is user expectations. We no longer have the same relationship with our IT systems that we used to, and one of the reasons is simply our smartphones. You update your smartphone: Apple is well known for being able to update the operating system over the lifetime of the hardware, and Samsung announced a few years ago that they now support their hardware for five years. There is an expectation of updatability; the cycle for our hardware has changed: we want updates, we want features, we expect that. And when you get into a brand new car, you may realize it has information in it that is older than in a brand new car that was bought before it — simply because the program for car number two was started before the program for car number one. A friend of mine bought a fairly decent car, brand new, and the GPS data was older than in the car I had bought new a few years before — just because the model year of the car that he bought was started earlier than the model year of the car that I bought, even though it left the factory later. So our user expectations have changed. Something else the automotive industry is looking at is diversification of revenue: when you sell a car, you have income from it one time — but how can we make that income persist over time, how can I make more money from a single car? And there are a
few ways the industry is looking into all of these problems. One of them is: okay, we have a chip shortage, so we need to revise how we are building our cars, we need to revise what the onboard computing systems look like. This is a slide from NXP, which I found in one of their publications, showing what we currently have today. It's called the domain vehicle architecture, where you have a lot of small, distinct compute units across the car — a modern car can have nearly 100 different computing units. When you have a shortage of chips, you can understand that building 100 distinct compute units into a car is a problem. So what they're looking into is what is called the zonal architecture, and the idea behind that is that you have fewer distinct ECUs that are really dedicated to something specific, and instead bigger ECUs that are capable of handling multiple of those discrete ECUs' tasks. So you have less hardware, but bigger, more powerful hardware — and also hardware that can potentially evolve over time: hardware that is no longer built to do exactly one task, but is produced with a design such that it may be doing something else in the future. So the architecture of the vehicles is being worked on, but the hardware is only a part of the story; the software becomes the other part. There is this concept called the software-defined vehicle, and the idea is that by changing the software in the car, we are able to make the car evolve during its lifetime, but we're also able to customize it to the user's desires and wishes and needs. So, software-defined vehicle: revising the architecture and the hardware is part one; revising the software — how we approach software in the car — is part two. And so, what's the vision? The vision ends up being something very, very similar to what we have on our smartphones. We want to be able to do software updates, and we want to be able to do that over the air, just like you update your Android phone or your iOS phone by plugging it into the Wi-Fi. You want to be able to update the hardware in the car — that's still going to be "take the car to the dealer" and potentially get a new compute unit that is more powerful, just like you go to your phone store and change your phone. You want applications to be able to update the car automatically; applications may give you new capabilities, new features. If you look at Tesla: a few years ago there was an update to Teslas that actually increased the horsepower of the engine of a car you had already bought, simply because they were able to optimize the way the software was working with the engine. Just by changing the software they changed the physical capabilities of the car — suddenly the car goes faster, the car is more powerful. New features and new capabilities can also be something less fun — you know, like: you've missed paying for your car, don't pay this month and the car is going to drive itself back. And customization: building an experience that's more interesting. When you're able to customize the experience based on the driver's preferences, you can build habits, and when the time comes to change your car, you're going to try to find those habits you've built in your car again — and therefore you build loyalty to a certain brand. So, how do we get there? We also want to standardize, and that's where the Red Hat In-Vehicle OS becomes interesting: Red Hat has always been very strong about standards, and open standards in particular, so having an operating system that relies on these standards will actually help developing on top of it. And then we spoke about customization and applications — so we spoke about containers. Basically, we want containers for
process isolation — that's been covered a little bit this morning — being able to ensure that containers don't interact with each other when they shouldn't, so they don't impact one another. Containers also mean a specific lifecycle management which we're used to: we can install, we can update, we can remove containers. And let's be honest, containers are the de facto standard now in our industry, which means talent acquisition for car manufacturers becomes easier: people don't have to learn the specificities of an automotive product, they don't have to learn the entire ecosystem around it. When we speak about containers — with an S — we want to speak about orchestration as well, and today when you speak about container orchestration, you're practically speaking about Kubernetes. So I'm going to address the elephant in the room: do we want Kubernetes in a car? And I already see someone saying "don't spoil it, give the answer to some other people" — okay, I'm going to spoil it: we don't want Kubernetes in a car, and there are a few reasons for that. One of them is that Kubernetes is built around the concept of eventual consistency, which means there is no guarantee of when a change will be applied, and there is no guarantee of the order in which changes are going to be applied. The extreme example that we always take — it's technically not a good example, but it always helps understanding — is that you don't want to be driving a car where you press the brake pedal and then, at some point, your car will brake. That is not an experience you want. It's a bad example because we're not actually going to be involved in the brake systems, but it gives the idea: you don't want a car that is "working towards a state" — you want to be either in state A or in state B, not somewhere along the journey, you don't know exactly where. Something else is that Kubernetes is fairly heavyweight. Kubernetes and its derivatives have been built around container runtimes that are not able to push a status to signify when something has changed, so Kubernetes has been built around the idea of "I can't get the information given to me, so I'm going to go and get it". It's always asking: are you done yet? Are you done yet? And at some point that takes resources. There was a consortium that did some investigation on that: on a Raspberry Pi — granted, it's a fairly low-end device — 15 to 20% of the system resources were used by Kubernetes just doing nothing, just Kubernetes being there and poking the container runtime, asking "are you done yet?". Kubernetes is meant for distributed systems: it's great for cloud environments, it's great for worldwide distributed systems, but that also makes it a very complex system, and a lot of that complexity is not needed in a car. Things like on-demand scale-out — I'm suddenly running Black Friday in the US, my store has a sudden influx of traffic for the sales, and being able to scale out to a public cloud so I can get more resources to accommodate that influx of data makes sense. But in a car you're not going to have that sudden influx of data: the amount of data you get from the rear-view camera is always going to be the same, so that scale-out simply is not needed in a car. A lot of the complexity that makes Kubernetes great in a number of environments, especially the cloud, just isn't applicable in a car. Another example is failures. Kubernetes is designed around the idea that if a pod fails, it's going to do its best to keep things working. When something fails in a car, you don't want to keep working — you want to know about it, and you want to tell the driver to put the car on the side of the road. You don't want "I'll keep working as best as I can, I can't detect pedestrians anymore, but that'll be alright" — that's not acceptable in a car. So what do we need? We need something that's deterministic: we need to
know what runs, where it runs, when it runs. We need something that is lightweight — we have a resource-constrained environment. We need something that is fast — user expectations again: you don't want to wait for things to start, you don't want to wait for things to be available. And — it's a small bullet point here, but it's actually one of the core elements of what we are looking into — functional safety, FuSa. The functional safety certification is basically certifying that your code is doing what you claim it is doing: that if I ask it to write certain content to a file, it's going to write that content to that file. And the more complex the system is, the harder the functional safety certification process is going to be, because you're going to have to go through every function used in your code base and ensure that it does exactly what you say it does. So, to answer all of this, we've worked on something called Hirte, and I'll let Michael introduce it to you. Thanks, Pierre. As you already mentioned, Hirte is our answer to this, and the basic idea behind it is to use systemd to control local services on one machine, but, by adding a thin layer on top of it, be able to manage those units remotely. It's important to note here that we don't want to manage any state or so — we are just the facilitator of this management. Our approach is to build a system that consists of components: the controlling component, which we call Hirte, runs on the main machine and controls all the connected agents; the Hirte agent then runs on each managed node, gets the commands from Hirte and forwards them to systemd. So we are able to remotely start, stop and control units on remote machines. We decided to implement this in C, considering those constraints, to be as fast and lightweight as possible and, hopefully in the future, to be FuSa certified. As the IPC mechanism we chose D-Bus — well, since it was already used in systemd. And if you are wondering how exactly this is set up in Hirte, I am going to show you. Hirte runs on the main node, which you see here on the left. On startup it reads a configuration file where we can specify all kinds of settings, for example the port it listens on for new connections. It then connects itself to the local system bus and provides a public API on it, so that other external applications — for example a state manager — could use this API to control the whole system. We already implemented something like hirtectl, which is similar to systemctl from systemd, but for the multi-node use case. On the other side we have a managed node where a Hirte agent is running, again reading some configuration, with settings like the IP address of the main node. The agent connects itself to the local systemd D-Bus via a Unix domain socket, and with this we are already able, in the agent, to control services on the managed node. But the agent then goes ahead and connects itself to Hirte, based on the settings that we specified: it issues a connection request over TCP/IP, and Hirte responds by creating a peer-to-peer D-Bus connection, used exclusively between Hirte and the respective agent. In addition, when the agent registers, Hirte does a lookup and checks if it can find the node name: if it cannot find the node name, it rejects the connection request; if it can find it, it accepts the connection. With that, we are already able to control units on remote nodes, and of course we can scale up from one to N nodes, which we all specify upfront in this configuration file. And as you can see here on the left side, on the main node we can of course run the Hirte agent alongside Hirte itself — basically the same mechanism.
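As a rough sketch, the setup just described might look like this on disk. Note that the file paths and key names here are illustrative assumptions based on the talk, not taken from the actual Hirte documentation:

```ini
# Controller configuration on the main node
# (file path such as /etc/hirte/hirte.conf is an assumption)
[hirte]
# Port Hirte listens on for incoming agent connections
ManagerPort=842
# Nodes are specified upfront; unknown node names are rejected on registration
AllowedNodeNames=laptop,pi

# Agent configuration on each managed node
# (file path such as /etc/hirte/agent.conf is an assumption)
[hirte-agent]
# Must match an entry in AllowedNodeNames on the controller
NodeName=pi
# Address of the main node; the agent connects out over TCP/IP
ManagerHost=192.168.1.10
ManagerPort=842
```

With both pieces in place, the agent would connect over TCP/IP to the controller, which then switches to the dedicated peer-to-peer D-Bus connection for that node, as described in the talk.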
One question that might arise is how we deal with cross-node dependencies, like in this example here. Consider we have the setup on the left side: we have a node foo, on the right side a node bar, and both are connected to Hirte. Now we want to start the cow service on node foo. What we don't know yet is that the cow service requires the sheep service to run on node bar. How could we resolve that kind of dependency? For one, like I said, we could use an external state manager that basically uses the Hirte API but already knows that the cow service needs the sheep service — so it would first start the sheep service, wait for it to run, and then start the cow service. What we added, however, was a so-called proxy feature, to push this dependency resolution down to systemd, so that a developer can define at development time: "I need the sheep service to run on node bar for the cow service". It works roughly like this. You see here on the lower left side that the cow service requires a so-called template unit. This is a unit file that Hirte provides to the developer, where you pass in a tuple — for example node_unit — giving the name of the unit that you require and the node that you expect this unit to run on. In our case — and that's actually a mistake on the slide, it should be the sheep service, so it should be hirte-proxy@bar_sheep.service; please substitute that — this template unit takes the tuple and passes it to a small binary, the Hirte proxy, which in turn separates those two input parameters and makes an API call to the Hirte agent. It's important to note that this API call is blocking: the proxy waits for the whole flow, for either a successful or a failed setup, and this can therefore be reflected in the cow service. The agent forwards that request to Hirte; Hirte now knows on which node to run which unit, so it creates a start request on node bar, in our example, and there it wants to start another template unit — we don't want to start the requested unit directly.
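To make the proxy flow concrete, the units involved might be sketched roughly like this. The unit names follow the talk; the exact directives and the proxy binary's path are assumptions, not the verified Hirte syntax:

```ini
# On node foo: cow.service declares its cross-node dependency
# by requiring the proxy template with the "node_unit" tuple.
[Unit]
Description=Cow service
Requires=hirte-proxy@bar_sheep.service
After=hirte-proxy@bar_sheep.service

[Service]
ExecStart=/usr/bin/cow

# hirte-proxy@.service: template unit provided by Hirte.
# %i carries the tuple ("bar_sheep.service"); the proxy binary
# splits it into node and unit, calls the local agent, and blocks
# until the remote unit is reported up (or failed).
[Unit]
Description=Hirte proxy for %i

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/hirte-proxy start %i
```

Because the proxy call blocks until the remote unit is up or has failed, ordinary systemd Requires/After ordering is enough to express the cross-node dependency, which is exactly the point of the feature.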
because then we can not because then we kind of limit the ability of the developer by using a template in between the developer has all the freedom to specify and define the sheep service however he wants it to run this template unit again then has a very weak dependency on the requested unit meaning if it's not already running we will start it but if it's already started then we don't care nothing happens basically important to note however is that Herter will keep track of all the references on the node unit tuple so we know basically when how many references there are and with this setup we can already resolve this dependency just by using already existing system D features and of course we can do so on development time so to say after this and now I have a few examples that I can show you which I pre-recorded first of all for these examples I used this setup I started a Raspberry Pi and had an Heter agent running on it and I also run a Heter agent on my laptop connecting both to Heter which was also running on my laptop I was interacting with the system by using Heter built here to CTL and well the first thing that comes to mind is listing all the units that are on those nodes so in this example I varied for all nodes the units that are running or not running and I filtered them based on their name in this specific case I wanted to have all units with the name that contained a bus in it so we see for example the bus service and bus socket on the laptop which are in an active state and are running but we also see for example some devices that are plugged we can also start and stop system the units in this case I first filtered of course for the specific cow service and we see that is currently in an inactive state and that then I started the cow service with Heter CTL start on the pie cow.service and often querying it again we see it's active and running again stopping same procedures always stopping it here to CTL stop pie stop pie cow service listing again and 
it's inactive in that again when starting this cow service I created beforehand a monitor so you can imagine those start and stop operations always involve some state changes internally on system D site so what if we wanted to get notified on certain changes with for example here the CTL monitor and I wanted to get all units on the pie and this cut example because otherwise it got too large just chose the state changes between the states of the cow service so you see in the second one you see that currently had an inactive state then it changed to activating state and of course I could now probably do some different operations if I need to and the last example is kind of similar which is here to CTL monitor node minus connection so as the name basically suggests we can also monitor the state of the nodes in this example you see that the laptop and the pie were online then I went ahead on the agent on the Raspberry Pi stopped the agent and this immediately got reflected on this monitor so the pie was stated as offline which is especially useful if I want to health monitor my system and do some operations or have some fallbacks and that's already it some questions so systemD already has the mechanism to do remote connection but systemD already has a built-in mechanism to do remote connection but that remote connection relies on SSH SSH is not something we want to be running in the car because SSH cannot be hardly be limited by the tool shell section which means if someone plays around in the car and gets a shell access to one of the computing it it's not actually something we want and by using the ATH and the control we are actually able to expose in one port only the management to control services you won't be able to install a rootkit by this because you can only control systemD services the other problems the other question is doing this in systemD itself is something that we've been thinking about we would love to be able to have it in systemD proper but there 
was a timing perspective being able to work with the systemD community to polish it and make it up to the systemD standards would probably have taken more time than we are available to get this project in a state where it's usable so I would still love to see that figuring out is this what we wanted we started with a concept we talked with the car manufacturer that we are involved with does that satisfy your solution do you see things that are missing this presentation is still something about us trying to figure out are there missing missing are we missing something in the picture here did we forget something there are always more brains in two heads than in one so there is always a capability that we have seen something in systemD then we would not have the flexibility about potentially changing our approach if we needed to so that would be my question we needed to validate our approach we could not use SSH and doing it in systemD proper would be ideal but we would have done it ok but we are now there are many things that you would want to check for the moment but check if it's running or not and you could have it through the systemD again the fourth leg the systemD dependency check for version 5 sorry the question is about leveraging systemD for dependency resolution and it's exactly what we want to do that's why we implemented the process service because we don't want to have to deal with figuring out the order what can I start in parallel what needs to be sequential what needs to be built for that so we actually want to leverage as much as systemD as we can and we try to complement it rather than implement it yes the Q&A I'm trying to see if understood if understood all the questions so basically your question is if you already have a project where we applied here to or so did we start by considering communities before we implemented it yes we did but we built the analysis that I presented earlier we did the analysis that I presented earlier and figured that 
despite a lot of people asking us we want to run communities in the cars we conclude that this is a great tool but not to run in the specific environment that the car is because it's complex because it's heavy weight because the complexity is going to make the functional safety analysis practically impossible because it has the eventual consistency that this is built around so all of this may be like it's not suitable for the in-car communities so maybe it's very likely that should be run communities one of the things that and I think it's going to have to be the last question because we're out of time one of the things that communities does it's very hard to impose communities run certain payloads on certain systems and in the automotive world you need to be able to test the entirety of your system and then you need to start in the car so that you cover all of your bases when it comes with I have a critical system that I need to ensure that this critical system always have enough resources and then the question of the dynamism of I want this container to run wherever there are resources is actually something that is being considered but we don't have a proper answer for it yet because we still need to ensure that adding an extra container on a specific system is not going to interfere with critical systems that are already running in there so there is the environment, the in-car environment can be a lot more static than what communities is used to deal with and that's they are basically built really for different worlds and it's going to be hard to make communities work for front-end TVs but I think we're going to stop in here at the time we can finish outside thank you thank you a bit difficult to pronounce I've been a software engineer for six plus years but I've recently transitioned into the cloud native space like sometime this year I'm a software engineer at SpectroCloud and I also contribute to the Kairos team so today we're going to be talking about Meet 
Kairos. For those who have already seen us at the booth: meet Kairos again, the meta-distribution for the edge. What do we mean by meta-distribution? We understand that different teams use different Linux operating systems — for example Alpine, openSUSE, Fedora — and we do not want to reinvent the wheel. All of these operating systems already do a great job, providing great package managers and everything else they do, so we don't want to ask you to change your operating system. We also understand that companies have invested many years into learning how to use these operating systems, so Kairos is not asking you to change any of that. Besides this, different organizations have licenses they rely on, and you don't want to do away with those; and even if you don't have licenses, you might have golden images — the templates and standardization that your team makes use of — and you don't want to do away with those either. That is not what Kairos is asking you to do: Kairos builds on top of what you already have.

A lot of our computing is moving to the edge, meaning we want to move away from centralized data centers and bring our workloads closer to our customers, for real-time processing and all the other advantages that the edge provides. Because our devices and workloads are moving closer to our customers, a lot of challenges come with this. One of them is security: when your computing is closer to the customer, there is increased vulnerability — it can be tampered with more easily. There is also stability: if a user makes an upgrade or an update, you want them to still be able to use that edge device without any issues. And then there's deployment: everybody wants plug and play; nobody wants to go through all the deployment hassles, and companies want very easy deployment experiences. These are some of the issues that come with having your devices and your computing closer to the customer, at the edge.

So, Kairos to the rescue. For security, one of the ways Kairos solves the problem is immutability. This means that all the OS components, including the kernel and initrd, are immutable: you can only read, you can't write. However, Kairos also provides a persistent partition where you can keep your data, and that part you can always update. The first thing that comes to mind is: doesn't that mean this other part can be tampered with? That is not the case: Kairos secures it by encrypting it, with the key bound to the TPM chip. This means that if, for some reason, your hard disk is stolen, decryption cannot happen except on the same computer on which the data was encrypted — an attacker actually needs your physical device to decrypt it. So Kairos provides security both at the OS level and for your persistent volumes. For stability, Kairos provides A/B upgrades. What this means is that your running image is the active one, and the image it replaces is kept as the passive one. Kairos downloads the newer image you want to update to, and there is a transition phase: after the download, Kairos reboots your device and loads the new image, and if there is any issue, it switches back to the previous image. This is how it provides stability. I am going to let Mauro continue from here.
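The A/B fallback behavior described above can be sketched as a tiny state machine. This is purely an illustration of the idea, not Kairos code; the names `active`, `passive`, and the `boot_ok` health check are my own inventions:

```python
# Toy sketch of an A/B upgrade flow with automatic fallback.
# Illustration only -- not actual Kairos code.

class ABSystem:
    def __init__(self):
        self.active = "image-v1"   # currently running image
        self.passive = None        # previous known-good image

    def upgrade(self, new_image, boot_ok):
        """Install new_image; fall back to the previous image if boot fails."""
        self.passive = self.active          # keep the running image as fallback
        self.active = new_image
        if not boot_ok(new_image):          # health check after the reboot
            # Boot failed: swap back to the known-good image.
            self.active, self.passive = self.passive, self.active
        return self.active

node = ABSystem()
assert node.upgrade("image-v2", lambda img: True) == "image-v2"   # good upgrade
assert node.upgrade("image-v3", lambda img: False) == "image-v2"  # rolled back
```

The key design point mirrored here is that the previous image is never overwritten during an upgrade, so a bad image can always be backed out of.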
Yeah, and of course, even if we try to make everything as simple and resilient as possible, we know as engineers that bad stuff can happen, right? For those scenarios where, for some reason, neither the active nor the passive image is working properly, we also offer a recovery partition, in which you get a shell and can try to either fix one of the two broken images or at least get hold of your data and save it. Another interesting feature that comes with Kairos is what we call peer-to-peer networking with self-coordinated cluster bootstrap. What does this mean? There is a component that we introduced called EdgeVPN — basically a mix of a VPN and a distributed ledger. What it does is that every node running this component keeps a record of every other node that was provided with the same network key. These nodes run on top of the libp2p library, which means that if for some reason there is limited connectivity and some protocols are not working, they can make use of any other transport that libp2p offers. This can be useful because some edge locations might be restricted in the kind of connectivity available there: if for some reason TCP is not working, maybe WebSockets are available, and the nodes can continue speaking to each other that way.
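The shared-key membership idea can be sketched in a few lines. This is a toy model of the concept only — the real component (EdgeVPN on libp2p) does encrypted transport, discovery, and much more; the class and method names here are invented for illustration:

```python
# Toy membership ledger: nodes presenting the same network key discover
# each other; anything else is rejected. Concept sketch, not Kairos code.

class Ledger:
    def __init__(self, network_key):
        self.network_key = network_key
        self.members = set()

    def join(self, node, key):
        if key != self.network_key:
            return False            # wrong key: not part of this mesh
        self.members.add(node)      # every member learns about this node
        return True

mesh = Ledger(network_key="s3cret-token")
assert mesh.join("node-a", "s3cret-token")
assert mesh.join("node-b", "s3cret-token")
assert not mesh.join("intruder", "wrong-token")
assert mesh.members == {"node-a", "node-b"}
```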
Another way in which this is useful: let's say I want to install a cluster and decide to have, for example, two master nodes, with the rest as workers — and then one of the master nodes stops working for some reason. Because every node has the ability to know about the others, they will start to communicate among each other and decide by consensus: "we lost one of the master nodes, so now I'm going to become a master node", and then update the ledger that each node keeps, so that at the end of the day you retain the stability — in this case, for example, the high availability of that particular cluster. Of course, if at some point the lost master node comes back online, it will talk to all of them and again decide whether it stays a master node or downgrades to a worker.

What else? Another thing we want, as we mentioned, is to make Kairos easy to deploy, and part of that is that Kairos is easy to configure and maintain. What we use right now is cloud-init for the configuration of the nodes, so all your configuration can be done via YAML files. That means you can keep it in your GitHub repositories and track the changes that configuration goes through over time. As you can see here, this is of course a very, very simple configuration, but it gives you an idea of how it works: in this case I declare the users I want installed — the user "kairos" will have the password "kairos" — and on top of that I say that this user gets the GitHub SSH key of the user called mudler, plus an SSH key that I provide inline. Finally, in this case I am enabling K3s, which means this node will have the full API capabilities of a Kubernetes cluster.

Another way we try to make deployment easy is a web interface. The only thing you have to do, if you know the IP of one of the nodes, is go to that IP — in this example on port 8080 — and you will see a website where you can paste the configuration you want installed on that node, run it, and by the time it's done your system is ready. At the edge you also have some particular scenarios: maybe you don't want the person in physical possession of the node to have the full configuration of the node itself. For those cases, as soon as the machine is booted it shows a QR code. The person on site just needs to take a picture and send it to your headquarters, or however you organize this, and the person at headquarters, with a simple CLI tool — in this case kairosctl — registers that node, providing, as you can see here, the configuration file to use to connect to it.

Another interesting component, also useful when you are doing deployments, is something we call AuroraBoot. Let's say you want to send a couple of clusters to different places across the country — to build clusters in different parking lots, grocery stores, or businesses. But you don't want to send a person to do the full configuration on site and come back; that could be very expensive, or it could take too long. In that scenario, you do some network configuration, spin up AuroraBoot, and tell it which image you want to install and which configuration you want to have. Every machine that you then boot on the same network will be configured with that specific setup. Then you just unplug all those devices, put them in boxes, and send them to the locations they need to be.
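The cloud-init style configuration described above might look roughly like this. I am reconstructing it from the talk, so treat the exact keys as an approximation of the Kairos schema rather than a verbatim example (the `github:` key shorthand and the `k3s.enabled` flag are the features the speakers describe):

```yaml
#cloud-config
users:
  - name: kairos
    passwd: kairos
    ssh_authorized_keys:
      - github:mudler           # pull the public key from a GitHub account
      - ssh-ed25519 AAAA...     # or provide a public key inline
k3s:
  enabled: true                 # node exposes the full Kubernetes API via K3s
```

Because it is plain YAML, this file can live in a Git repository and be diffed and reviewed like any other configuration, which is exactly the workflow described in the talk.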
The next time they boot, it will be with the proper configuration you were expecting. We think Kairos is a great solution for edge computing. To summarize the things we just mentioned: it is immutable, which reduces the attack surface available to an attacker; it is distribution agnostic, so you can keep using your existing knowledge of whatever Linux distribution you are using; we do A/B upgrades, so you avoid ending up with a system that does not work after an upgrade; we have TPM-backed encryption to protect the user data you keep there; and we have peer-to-peer networking, so that communication between the nodes is as easy, simple, and resilient as possible, thanks to the libp2p library.

And last but not least, let us do a quick demo. I would have loved to do a live demo, but you know the risk of that not going well, right? So instead I went for a video, so that you can keep listening to me. What we are going to do here is create five nodes, and these five nodes together will become a Kubernetes cluster. I will be skipping ahead, because you don't need to wait for everything that is happening. Once I create those machines in my data center, I spin up AuroraBoot — here you can see that AuroraBoot can be run from a container or installed on your machine — and I pass this YAML, the configuration YAML I was describing before: it has the K3s configuration, and in this particular case it also says that I want three master nodes out of the five machines. That's it; we let it run for a bit. What AuroraBoot does is pull an image from the cloud — this can be one of the images we provide, or one that you built with Kairos — and copy the root filesystem into your local system or into the container, depending on which of the two versions you used. The final result is an image that is then delivered over the network.

Now we start spinning up the VMs. You can see we start them in parallel; they are all on the same network, so they are all going to receive the same configuration. I launch a remote console so we can see what is happening on a node: on the left you see AuroraBoot already receiving all these connections from the different VMs, and the smaller window is a VM that has spun up and is already receiving AuroraBoot's instructions on how to configure and install this node. Let's make it go a bit faster, so there is more room for questions — you can see that everything happens in parallel; you don't need to wait serially for each node to be configured. Let's go faster; this part is a bit slow. You can see that one of the nodes is already booting — I think I went a bit too far — yes, here you can see GRUB, and although it's a bit small, you can see we have the option to just boot Kairos, to go into the fallback — that is the A/B upgrade behavior we were describing, so if there is an issue with the current image you can fall back to the previous one that was running — and the recovery option, in case both A and B are broken, so that you can try to fix things. The rest of the nodes will of course also start booting at some point, and on this screen in the back we can see that Kairos has already booted on that particular machine.

So now we SSH into one of the machines, and using the Kairos agent inside the machine we fetch the information needed to connect to the cluster. For that we use a command called get-kubeconfig, which gives me all the connection information. All of these images come with k9s as well, so I could do it from the node, but just to show you that I can also do it from my machine: all that needs to be done is copy that configuration, save it locally, and point our KUBECONFIG at it. Then we start k9s, it communicates with the cluster we just spun up, and now we can see it there. Next, we can see how the nodes registered in the cluster: three of them — the first two and the last one — are masters, as we defined, and two of them are workers. At this point we can do anything that can be done with a Kubernetes cluster, like listing the pods and watching traffic there, or any other service. As you can see, it is relatively simple to set up a complete Kubernetes cluster on your edge devices. Now I would like to give you time for questions — any questions out there?

Sorry — you are asking if the images are immutable? That's correct. The root partition, and all your configuration — everything in the /etc directory and so on — cannot be changed, because it is mounted read-only. On top of that: what some distributions do is create the initramfs at installation time, and the problem with that is that during that step you could be at risk. What we do instead is create the initramfs at image-build time, so at that point you are still within your own network — your safe environment, to a certain degree — and once the system is installed, even the initramfs is read-only, and the same goes for the boot partition. Anyone else?

I'll repeat the question: which flavors or distributions are supported by Kairos? At the moment we have, as you were mentioning, Ubuntu, Alpine Linux, Fedora, and Rocky Linux. Yesterday someone coming to the booth asked me if we could make it work with AlmaLinux, which is very similar to the Red Hat family; it turned out that simply changing two or three files made it work, which I thought was pretty cool. Red Hat itself, yes — but with Red Hat you are probably using licensing, so there is an additional step you have to do; it is basically supported, since we have Rocky Linux and Fedora there. And we are happy to try to introduce more distributions if people are interested in them. The basic requirements: right now our two supported init systems are systemd and OpenRC, and ideally glibc — though with Alpine, for example, we have musl — so it really depends on what your distribution offers for us to do the transition. Anyone else? If not, then thank you very much for your time. If you still have questions or want to try Kairos out, we have our website with all the documentation and the matrix of images we support; we also have Matrix and Slack channels where you can ask questions; and finally we have office hours — we meet at 5:30 every Wednesday, European time. If you just want to pop in, say hi, or tell us what is missing in Kairos, please do, and we will be happy to try to make it better. Thank you.

Thank you for joining us again at DevConf. We will now continue with another session, led by Pavel Píša, who comes from the Czech Technical University in Prague. Enjoy, and welcome.

Thank you for the introduction, and good afternoon to the ladies and gentlemen here and on the stream. My presentation covers our long journey through work on CAN, real time, and so on, so it is quite dense; if you have more questions and want to know more about a specific project, please take my contact details and reach out later. All the hardware and software we have built in this area is open source, or you can find documentation for it, so you can reuse the code. The content of today's presentation: first a short introduction to CAN and how its arbitration works — I have kept it very short, because I expect many people here already know a bit about CAN, but I will still introduce the technology. Then I will speak about our open work in this area, then about our interesting open-source project, a CAN FD controller that can be implemented in an FPGA or even in hardware — there has already been an attempt to push it through hardware synthesis — and finally about real-time and latency testing.

For the introduction: as you probably know, CAN was introduced in 1983 by the Bosch company. The main goal of this communication was to simplify the wiring inside cars: today you have so many signals that you would otherwise have thousands of wires running from one position to another just to switch something on and off, so they decided to multiplex them on a shared wire through packet communication. The initial CAN 1.0 design allows 2048 identifiers, and each message can hold 8 bytes of actual information — in fact 64 single bits usable as signals — so, for example, into one message you could pack all the light switching of a standard car of that day. Today, when you have dimming, back channels for measuring the voltage on the lights, diagnostics, and so on, you find that even those 2048 messages are not so much, so the identifier was extended to 29 bits in 1991. What is interesting about this communication is that it has deterministic media arbitration based on ID priority: the lower the message ID, the higher its priority. I have skipped the details in today's presentation, but it works in such a way that on each transmitted bit of the identifier a wired-AND function is effectively evaluated: the station that sends a dominant zero bit wins, and the other stations stop sending their frames. This way it is ensured that only one station succeeds in sending its frame on the bus. But this solution
of bus-level collision avoidance and resolution has its price: the signal has to propagate through the whole wiring, forth and back, for the collision to be detected deterministically. The disadvantage is that we cannot speed the communication up above a certain limit: for example, if we decide that a typical single-bus network in a car does not exceed 30 meters, then you can go up to a communication speed of one megabit per second, but no faster, because beyond that you can no longer resolve the collision — there is a finite time during which the signal propagates through the wires. That is the price, and it means the throughput is not so great.

So a new, extended version was created, called CAN FD — flexible data rate CAN — which uses the slower communication with deterministic collision resolution only for sending the ID of the message; the actual data are then sent at a higher bit rate. In today's cars a CAN FD frame can carry up to 64 bytes — not 64 bits, as in the previous version — and it is usually used with 500 kilobits per second for the arbitration phase, while during the data-transfer phase the speed is increased to 2 megabits per second; in theory it can even be 8 megabits per second, but in automotive 2 megabits per second is the usual choice. The arbitration runs at the low speed, then the bit rate is switched in the middle of the control field, where the information about the switch is carried, and the actual data run at the high speed. There is a quite precise cyclic redundancy check field, which ensures that corrupted messages are spotted and that there is only a very low probability of a corrupted message being accepted as correct. Then there is an interesting field, run again at the initial speed, which is used to confirm correct reception of the frame: if any unit finds a problem with the frame, it signals non-acknowledgement, and this ensures that no unit considers the message correctly transferred. So in theory all units stay in sync, with a global view of the messages — that is the advantage — and if some unit misbehaves, it automatically counts its errors, determines that it is necessary to disconnect, and falls into a passive mode.

As CAN FD extended the data field to 64 bytes, today there is additionally CAN XL, which allows sending up to 2048 bytes in a message and is used to encapsulate even Ethernet traffic onto the CAN network in the vehicle. It is a question how much it will be used, but it allows switching to an even faster rate — up to 10 megabits per second — and it limits the part sent at the low arbitration bit rate to the bare minimum needed to resolve the priority; the rest is sent at the high bit rate. Even the IDs are no longer sent during arbitration: they are moved to the fast field, and the arbitration carries only an 11-bit priority now.

Now, support in Linux systems — because Linux is used today for industrial applications, for automotive, and so on. The first implementations were drivers provided by different vendors, usually integrated as standard character devices. In 2003 we started, at our university, to build the LinCAN driver, which was vendor neutral and was part of a bigger project of open-source components; it was used by some companies until at least 2018 — in 2018 I still received feedback and support requests. It was character-device based; but because CAN is in fact a networking standard, Volkswagen, together with the Pengutronix company, started the development of SocketCAN, a socket-based interface to the CAN hardware. So you can use the typical select, bind, and so on on a CAN network, even though it is a little special — the concept of addresses differs from standard TCP/IP communication. This solution was accepted into the Linux mainline in 2008. We helped in the design with the computation of bit rates, and other people converted parts of our LinCAN support to SocketCAN as well.

If I go back to my first experiments with CAN: in 1997 we built an ISA-bus card for RS-485 communication with instruments, and we added a block there to allow communicating over CAN as well. Then, when we built our motion-control system, we developed our own controller, which included a CAN controller, and we started to play with CAN. At the university we had more PC/104 systems for the OCERA project, and we developed the LinCAN driver based on that; we ported drivers from ESD, and LinCAN accumulated support for many different pieces of hardware and many contributors. We even implemented drivers for the Unicontrols company, which uses VME-based systems, first based on OS-9; when OS-9 support was terminated, they switched to Linux. They had once looked at Linux as a childish toy and laughed at us when we spoke about CAN and Linux, but when they could not use OS-9 anymore, they were very happy to have that support, and they have used our LinCAN in railway vehicles and other applications for many years. Then we built a converter between USB, serial, and CAN communication, and for this we built LinCAN support and later even SocketCAN drivers.
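As an aside, the SocketCAN interface mentioned above exchanges classic CAN frames with user space in a fixed binary layout (the kernel's `struct can_frame`: a 32-bit ID, a length byte, padding, and 8 data bytes). Packing one by hand shows that layout without needing CAN hardware; this is a sketch for illustration — on a real system you would send the bytes through an `AF_CAN`/`SOCK_RAW` socket bound to an interface such as `can0`:

```python
import struct

# struct can_frame layout: u32 can_id, u8 len, 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"

def pack_frame(can_id, data):
    """Pack a classic CAN frame into the SocketCAN binary layout."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))

def unpack_frame(raw):
    """Recover (can_id, data) from a packed SocketCAN frame."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, raw)
    return can_id, data[:length]

frame = pack_frame(0x123, b"\x11\x22")
assert len(frame) == 16                       # sizeof(struct can_frame)
assert unpack_frame(frame) == (0x123, b"\x11\x22")
```

On Linux such a frame can be tested end to end against a virtual interface (`vcan0`) without any physical bus, which is one reason SocketCAN became the standard kernel API.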
Then we worked on higher levels of CAN communication, for example the CANopen standard, which defines how to map parameters and service objects to numbers and indexes and allows maintaining industrial devices — motor drives or IO extenders and so on. We built our system in quite a generic way: for each CANopen device you should get an electronic data sheet (EDS) describing the protocol objects above the basic layer, and we decided to build a general purpose device and library which can read this data sheet and mimic the actual hardware. So for example we have some Wago IO terminals speaking CANopen; we take the data sheet and run our virtual device — though not so virtual, as we will see later — with a state machine and an actual interface to the drivers. Our driver solution was done in such a way that it can connect to LinCAN or SocketCAN drivers. This is how it looks inside the CANopen device: basically there is an object dictionary described by the EDS — for example where to read how many IO ports are available on a given device, the ability to access the actual IO ports — and then it is possible to map those objects to the process data messages, which is very interesting, because on CANopen the actual process variables can be transferred without any overhead: they are basically sent directly in raw CAN frames, because it is possible to allocate an ID for them, and through this mapping there is a generic mechanism for mapping different bits from different variables to the actual location in the process image. Our library allows interfacing with different CAN driver providers, standard on Linux or even on small microcontroller based devices, and because we want to connect it even to the hardware, to mimic the whole device completely, we allowed dynamically loading and describing the connection between those objects which are used for the communication and the actual real hardware
and we have built an IO device this way, and we have built software which allows monitoring such a network, communicating and setting different parameters, and which in fact makes a gateway between TCP/IP communication and CANopen communication; we provided a Java based monitor and a Qt based monitor for this. We have there a demonstration, for example, of an IO module where we used the Linux Comedi drivers, which allow connecting to IO devices — so basically our virtual device is a question of how the software can operate exactly the same way as a standard PLC controller providing this functionality of a CANopen extender or IO device. We have even written Comedi drivers for some data acquisition cards released by one Czech company, and those were accepted into the mainline. I will skip this exact example because it is quite complex — we can even emulate the whole system inside QEMU — this is the way we can access different variables in this system from the monitor; we even have an example of running software which controls brushless DC or permanent magnet motors and so on, and all this can be done with our libraries and connected to the hardware. Then started our attempt to provide CAN support for RTEMS, the real-time executive for multiprocessor systems, which was designed initially for the US Army and is now used by the European Space Agency and NASA. But when we started work on CAN in this area, there was a problem that there was no common hardware, so for this reason we started a Google Summer of Code project with one of the students and implemented, in the end, QEMU emulation of CAN hardware. Today you can run a standard QEMU, because it got into the QEMU mainline in 2018: you can specify when you start QEMU that you want a PCI based CAN card on an ARM, MIPS or x86 based system; the drivers automatically load and allow interconnecting a CAN bus inside that QEMU — so you basically run there the software of some engine control unit or something like that and connect it, through the drivers and real hardware, to the external world on a standard bus. Because at our university, or faculty, there was a strong group which supported CAN for Unicontrols, Škoda Auto and Volkswagen and built test equipment for this area — which needs not only access to the network but even the ability to alter the actual frames, inject errors and so on, so you cannot use third party fixed silicon for this — there started a project by Ondrej Ille in 2015 to upgrade these designs for CAN FD, the extended version. At that time I was doing a consultation at my company for one Volkswagen subsidiary which had FPGA based hardware; they wanted to connect it to multiple CAN buses and found that it is a big problem to get M_CAN licenses from Bosch — M_CAN being the standard CAN FD core provided by Bosch. So at that time I started the project and negotiated that this subsidiary of Volkswagen bought the design, which had started as that experimental tester design at the university, made it open source, and paid for adapting it to their specific needs. This is how the CTU CAN FD core slowly developed into a full featured CAN FD controller which is fully described — you can find the sources — and it allows working with FIFO input, connects to SocketCAN, and has between 2 and 8 TX buffers to send frames to the network. It is implemented in such a way that it can be connected to the AXI bus on some Xilinx Zynq FPGAs, or it can be connected via APB; there was even an attempt by some people who contributed a connection to LiteX, which is now used in hobby projects with FPGAs. This design includes even some advanced features for testing, for example bus monitoring mode and restricted operation mode
and so on, so it is interesting even for such applications. When we finished the project for Volkswagen, Ondrej Ille, who is the main author and was at that time a graduate student at our university, continued the project in his own time and built a complex infrastructure to allow full testing, getting very near to full compliance — in fact we think we are compliant, but we do not have enough money at this moment to ask for the real certification, so it is waiting for some group of people or companies who have interest and resources to join in; but we have our open tests, which are mostly passed by the design. And because we wanted to have an emulator as well, I ran a bachelor thesis with one of my students, who implemented emulation of our open CTU CAN FD controller, and it is now part of the QEMU mainline — so you can connect even hardware through CAN FD and run, in fact, a whole engine control unit in a virtual environment. We have support for Zynq hardware, education kits, and PCI Express FPGAs from Altera; there was a third party attempt to connect it to LiteX; it has been ported even to some space-grade FPGAs and some diagnostic tools. One of our students used our design as a basis for his testing of an open-source flow for transferring a design to silicon; unfortunately we do not have the money to get the silicon yet, but we have even received some offers — there are some companies which have done this exercise of getting it into a state where it can be run on an ASIC — and it can be run in the virtual environment, which is used in the space domain already. And when you run some communication, or you care about a motion control application or something like that, then you need to ensure that the latencies in your system do not damage the stability and safety of your control application. There is the OSADL quality assurance farm for real-time, a very good service which provides precise testing of fully preemptive kernels and standard kernels. For a standard Linux kernel you cannot expect anything — there can be a 10 millisecond latency if you plug in a USB or HDMI cable and so on — but with a fully preemptive kernel we have been able to run some of our control applications up to 30 kHz with no detected misses of the sampling. You can see — this is some slow ARM thing, in fact; I took from their farm the latency plots for the hardware which I have here, and they do the testing for a long, long time. Their long term testing through many years shows what a well maintained fully preemptive kernel can achieve — okay, you cannot do a mathematical proof that there will never be a longer latency, but you can see on this logarithmic scale that if there were a single overshoot of the latency or jitter bigger than something like 100 or 120 microseconds on this slow hardware, you would spot it. We decided that it is necessary to provide some such guarantee even for CAN communication, and because our system allows measuring the arrival times of frames down to 10 nanoseconds, we decided to switch our older attempt — when we had done some assessment of gateways, driver infrastructure and system calls for Volkswagen — to our new hardware. It was the task of one of my students, Matej Vasilevski, who finished his thesis last year and added the timestamping into our driver, followed by the bachelor thesis of Pavel, who finished the latency testing web system. Basically, we send a CAN message on one bus from one system and receive it on a second CAN FD channel which precisely measures the time; then there is the device under test, which receives the CAN frame, and its only functionality is to pass it through the kernel, or even user space, and send it on the other bus; then we measure the time again, and this way we can exactly measure the duration caused by the latency inside the drivers and the Linux kernel and so on. This is the test system which runs a daily test of the mainline Linux kernel and of the real-time patched version of the kernel, and this is the overview: we can do latency inspection of individual runs and see latency plots, we have there a heat map, or we can see again the long term plot showing how many messages exceed a given latency. You can see that there are still some outlier problems; it runs each day, and you can find materials on our work in this area. I work even on some projects for the European Space Agency for real-time control of different devices — at this moment, on the roof of ESA in the Netherlands, our motion control is running to track satellites and so on. So that is all from my side. If you have interest, take the leaflets, ask — I can even demonstrate something for you here; we have included in our design even a logic analyzer, so we can see the signals on the real bus and on the virtual buses down to 10 nanosecond resolution and so on.
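The QEMU CAN emulation described in the talk can be wired up from the command line roughly like this; device and option names are as found in current QEMU documentation, while the guest image name and the host interface `can0` are placeholders for your own setup.

```shell
# Create a CAN bus inside the guest, attach an emulated Kvaser PCI CAN
# card to it, and bridge that bus to the host's real (or virtual) can0
# via SocketCAN.  In newer QEMU the CTU CAN FD device can be attached
# to the same bus with -device ctucan_pci instead.
qemu-system-x86_64 -m 1G -drive file=guest.img,format=raw \
    -object can-bus,id=canbus0 \
    -device kvaser_pci,canbus=canbus0 \
    -object can-host-socketcan,id=canhost0,if=can0,canbus=canbus0
```

Inside the guest, the standard kernel driver binds to the emulated card and the bus appears as an ordinary SocketCAN interface, so the same userspace tools work against virtual and real hardware alike.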
So, okay, thank you — and if you have some questions? Yes — okay, the question is whether we have worked on the security issues with CAN. We cooperate with Oliver Hartkopp, if you know him, from Volkswagen — he is behind the start of SocketCAN — and he designed a message-authenticated CAN variant; we did the second implementation. So they did their own, and the university did a second one to prove that interoperability is possible. But it was in the classic CAN days, so it was based on 8-byte messages, which makes the thing easy — for short messages it was possible to provide the cryptography directly with the signal in the 8 bytes. For longer messages it works in such a way that the message is sent, and if the receiving unit reacts to the message but later learns, through another message, that it was not correct, then it basically needs to abandon the functionality or signal that there is a problem; but usually, if you have a latency of something like 10 milliseconds, reacting to an incorrect message for such a short time is usually not a problem. Today, on CAN FD, it could be better, but at this moment I have not worked in this area. Oliver Hartkopp is now part of the security team of Volkswagen, so we can ask him directly — maybe he is watching now, maybe we can have an answer in the chat. Okay — it is propaganda that everything is steering towards Ethernet, but you know, today many new processor units have up to three CAN channels on one device. Basically, yes, I am not sure if that attempt to send TCP/IP frames through CAN is a good idea or not — maybe not — but for sure, for those small networks around the processor, where you need many communication channels which are deterministic, real-time and so on, CAN may be the area of application. Today, for example, I know that some companies are developing directly CAN FD Light based chips — something like I2C was; but I2C is not a good idea for automotive because it does not protect the messages — it is good for connecting chips on a single board but not for anything more. Today there are attempts to use CAN FD Light for this area, for example chips which have to be cheap, with a finite state machine directly inside the chip, which can be connected to standard CAN FD controllers. And you have chips — Texas Instruments now has devices with 14 CAN FD interfaces — so they probably do not think they will go unused, if they put so many interfaces in the hardware. At this moment we only do monitoring, to see what happens; we have already caught one error on the fully preemptive kernel, which was resolved in one week by the people from Linutronix — a fast reaction, but it was a hard fault, basically. Okay, this is available online, so you can test it. At this moment at the university we can run maybe 10 kernel tests per day or something like that, and we have a limited amount of hardware, but we are in contact with OSADL, which has that quality assurance farm for real-time, and they plan to test our solution on their farm. So basically, a company which means to do the business seriously can ask OSADL to run their hardware there and provide them feedback on their development branches. Our solution is fully open; it can be run even on different hardware than this one — basically you need two reasonable CAN FD controllers with precise timestamping and you can run our system in-house. Okay, so thank you.

Hi everyone, my name is Rachel Sibley, I'm the automotive QE technical lead for the automotive program at Red Hat, and here with me is Priyanka Verma, who also works with me on the initiative. We're going to talk to you today about how to qualify a safe Linux distribution in cars. So a little bit about the agenda: I'll talk about what the Red Hat in-vehicle operating system and safe Linux are, our overall approach to safe Linux, and how we're working towards compliance with ISO 26262. We'll go into the test strategy — freedom from interference, failure mode and effects analysis — the process aspect of it, and how we're managing the requirements based on man pages, plus the test assets and traceability and all of the items that go into that. So what is the in-vehicle operating system? It's a smaller footprint of RHEL; we are inheriting everything from RHEL except the kernel — the kernel is the only package we're rebuilding, for hardware enablement, and we're using the RT kernel. It's based on OSTree, which is very common; we're running OSTree. We're working to achieve functional safety certification to conform to the ISO standard. The standard wasn't really made for pre-existing complex software, but there is an initiative to adapt ISO 26262 through an ISO PAS to complement that better. We're working with a partner, exida, a consulting body, and they're helping us achieve continuous certification for FuSa. So what is FuSa, or functional safety? It's the absence of unreasonable risk that could lead to harm or injury or even death later on, on the road, in the vehicle. We want to ensure that we did everything in our power to avoid potential harm to the user. Part of that is compliance with this international standard: we have safety goals that are derived from potential hazards, and we provide technical solutions to be able to react to these faults. A big part of it is that it's very process oriented — a very rigorous, detailed flow to work towards functional safety certification. A big aspect is providing the evidence: if we get pulled into court later on, or there's an audit, we have the evidence to back up and say, yes, we did what we claimed we were doing and what we set out to do. So you might hear the term
ASIL a lot throughout the talk. ASIL stands for automotive safety integrity level, A being the lowest level up through D, which is the highest level of hazard. ASIL A would be something like your rear light — very unlikely to cause somebody to be in an accident and become harmed in any way — whereas something like ASIL D is the airbags failing to deploy. This is a high level view of the different capabilities of the car and how they relate to the ASIL levels; for us, we're certifying against ASIL B. ASILs are determined by three factors: probability of exposure, controllability by the driver, and the severity of the failure. So this slide has a lot going on, but I'll try to summarize it. The ISO specification mentions the V-model quite often — this is the verification and validation process, derived from the waterfall model. It isn't something we typically follow within RHEL QE, so we take what we're already doing well and complement it with the various aspects of the V-model. The safety analysis is really what identifies how complex and how much rigor we put into this, and these are the FMEA, the failure modes and effects analysis, which Priyanka will get into a little bit later. On the left side are the verification activities, where we do our system design and requirements analysis; on the right side we're doing our iterative testing at the unit level, integration level, system level and so forth. For the in-vehicle OS we're leveraging all of the same tests from RHEL — they're already doing a really good job there, and we don't want to duplicate the effort — and they in turn take a lot of tests from their own upstreams. So we take all of those tests and re-run them in our environment, which is an OSTree environment. We're also working on adapting the tests to a new test framework — the tests weren't really designed to work against anything but a traditional RPM compose, so there's a little bit of work to make them run on an OSTree system, because we're not using DNF, we're using rpm-ostree and so forth — and then there's additional work to adapt them to the test framework to get them running in our CI systems. The requirements are derived from the ASIL B APIs. For each of these APIs we needed to derive test cases — targeted tests that 100% cover the requirements, which I'll talk about in a little bit — but we want to reuse the technologies and tooling within RHEL QE; we don't want to fork their tests, and we don't want to duplicate what they're already doing well. We have two main pipelines where we run our tests. One is the auto toolchain CI pipeline — specific to automotive, building custom images and running the tests against them in the pipeline — where we're using Testing Farm and TMT; there were other talks about that earlier you might have attended, but TMT stands for Test Management Tool and is highly used within RHEL QE. Then there's CKI, the kernel CI effort going on at Red Hat; they have high engagement with the upstream kernel community, testing upstream kernel trees very early in the development cycle, at the patch level, and we want to reuse that as well. The other effort is providing the traceability in Polarion, which complies with the ISO standard. So for requirements testing we have package level verification, and this has to happen per API. There's a very detailed workflow that goes into this, starting with code reviews and static code analysis, structural coverage — the code coverage analysis — and the requirements level verification, which is derived from the man page of the API: we take an API, we take its man page, and we break it down into low level requirements. We break down something that may be ambiguous into testable parts, to ensure we have traceability for all the behaviors specified in the man page; and if we find any discrepancies with the man page, it's also a chance for us to file a merge request with changes, to make sure the specification — e.g. the man page — actually matches what the implementation is designed to do. Then the integration level testing: this comes out of the analysis I mentioned earlier, the failure modes and effects analysis. There are specific failure modes identified within an API and its external syscalls and dependencies, and we need to ensure that our existing package level testing also covers the integration level — and then again more detail goes into the requirements, linked to test cases, and test cases to test runs and so forth; Priyanka will talk about that a little bit. So, I talked about code coverage analysis earlier: the ISO recommends that the software is verified using code coverage analysis, and the target is 100% compliance for the recommended quality metrics defined in the specification. We don't have to do this for every API, only for the complex APIs; we also use it to assist with the simple APIs, to ensure we have 100% requirements coverage. Within the specification you'll see "recommended" and "highly recommended" for all of the techniques within the V-model; for structural coverage it recommends both statement and branch coverage. For the code coverage workflows, we're trying to do more granular reports, specific to the API level rather than running against the entire package source, so we can do detailed analysis and drill down into a specific API to see where our gaps are. The low level requirements also have code coverage analysis showing the traceability of the tests to the requirements and the test coverage there, so that helps us understand where our gaps with code
coverage and requirements are — and therefore requirements coverage — to know: well, I need to go develop new tests and push them upstream, to be able to pull them back down to RHEL. Another major aspect of functional safety for road vehicles is freedom from interference. Freedom from interference is about, as we can see in the image, the cascading failure — the failure that cascades from one element to another. For example, an event causes a fault in element A, which makes element A fail, and the failure of element A causes a fault in element B, which fails element B: this is a cascading failure. Freedom from interference is mostly about avoiding cascading failures, and it's also linked to the ASIL — there should be freedom from interference between lower and higher ASILs, or from a quality-managed level to any ASIL A, B, C or D. So how do we ensure freedom from interference? First, we analyze. There is DFA, dependent failure analysis: this covers the cascading failures, which concern freedom from interference, and also the common cause failures, which are related to independence. A common cause failure, as the name suggests, is due to a common cause that fails element A and element B at the same time. These failure analyses help us figure out the failure modes that could lead to the failures, and we try to mitigate them. These dependent failure analyses are of two categories: one is deductive analysis and the other is inductive analysis. As Rachel mentioned, we are targeting ASIL B, and according to the functional safety standard for road vehicles, for ASIL B it's highly recommended to do inductive analysis, which is FMEA. Under this exercise, we brainstorm and list the failure modes that could be applicable to a particular component, and then further analysis happens where we calculate the risk priority number. The risk priority number is nothing but the multiplication of severity, occurrence and detectability; once we have that number and the data for the failure mode, we have a chance of mitigating it. We apply technical solutions to mitigate the failures we have listed in the FMEA; after putting in that mitigation, the risk priority number is calculated again and checked against the acceptable range. And from the mitigations, finally, one more way of deriving the failure modes comes into the picture. Now comes the process aspect. The process aspect is mostly about the evidence and the traceability. Evidence means we have to prove that we did what we were supposed to do according to the functional safety standard: that we tested a requirement, that we made sure a failure mode was covered with mitigations, that we followed all the timelines, the fault tolerance interval, and a lot more. For testing, for QE, this evidence could be our test assets: our test plan, our test specification, and our test reports, which tell us when we tested the requirement, how we tested it, what techniques and methods were used, what the results were, and which tools we used. So we have all the data, and if we want to retrieve it, this data needs to be traceable: we should be able to make out which test is for which requirement, which result is for which requirement, whether a failure mode has the right requirement associated with it, and whether it was tested well — 100% tested or partially tested. That kind of data should be traceable. Traceability is one of the things that adds value to the evidence; evidence which is not traceable is not valuable. How do we achieve it? We are doing it using Polarion, in different ways, as you can see: for our test case management, for writing our technical safety concept, for managing our requirements, and
something that's in progress is the configuration management: tracking our change requests, whether they relate to the tools or the process; and lastly we are using it for the metric reports, like the traceability matrix and the pass/fail metrics per requirement. Now we come to how we derive the low level requirements and the associated conditions of use. As Rachel said, we take as reference the man pages of the APIs that are in the safety scope; we analyze the man page and derive from it low level requirements that are testable. Then there are other things that come into the picture: the assumptions, or conditions, of use. These come into play when the fulfillment of a low level safety requirement depends on the context. For example, I define a requirement for an API, but it holds true only if the system is 64-bit — so that is going to be my assumption of use: that such-and-such requirement can be verified and passes only if you use the right environment. Later, the verification happens, and that's how we move forward. The next thing, when we talk about man pages for deriving requirements, is that man pages are subject to change — they're upstream, right? So how do we detect and handle those changes? We have an application that runs and gives us the diff if there is a change; a CI is then automatically triggered and an MR is raised, and an analysis tells us whether the change is related to an API in our safety scope. If it is not in the safety scope, it's automatically merged; if it is, the MR is held pending review. What happens when there is an MR pending review due to a change? Then our change management workflow comes into the picture: a change request is opened, and the impact analysis takes place. In that impact analysis we take into account the technical impact, the schedule impact, and which work products will be impacted by this change — whether it affects only documentation, or testing, or testing, development and documentation. Based on that impact analysis, a change control board approves or rejects the change. Once the board approves the change of the man page, the related requirement gets the text change, hence the implementation changes, followed by the V&V activities — validation and verification. Now, the traceability aspect: how does it fit into the complete scenario? We have lower level safety requirements, we have failure modes, we have man page based requirements, and for the traceability needs of the functional safety standard we need bi-directional traceability between the different hierarchical levels of requirements. For example, from a lower level requirement I should be able to trace back to the top level or parent requirement, and from there come back to the related or derived lower level requirement. The requirements should also be traceable to the test specifications that verify them, the test results, and hence the test plans and test reports — basically all of our test assets. The same goes for the failure modes: they should be traceable to the test plan they were planned in, the test specifications, reports and more. This is an example traceability report, how it finally looks. At the top level is the man page based requirement, which is further associated with its lower level requirements; the failure modes are the blue icons there, and the exclamation mark is for the assumptions or conditions of use. Below that is the low level requirement associated with the test case, so we can see which test case verifies which requirement and how the requirements are bi-directionally traceable to the failure modes. So that's an example. Any questions? Yes, please. Yes, actually the standard covers all three levels: the software level, the hardware level, and the system level, which is software plus hardware. Currently we are concentrating on the software part, and maybe in the future, if the lifecycle permits, the hardware-software interface, HSI. Gcov — gcovr as well; the question was which tool we're using for code coverage analysis: gcov and gcovr. A lot of the work we're doing for the safety scope — a big part of it affects the glibc package, for example — involves a very large upstream test suite that we're re-running for RHIVOS, and we're trying to break it apart at the unit level to show the traceability to the specific failure mode or the specific package level test against the API. New tests are being developed and staged, and we're coordinating with the SMEs about how to upstream those tests to the glibc project, because then RHEL QE can take advantage of them, pull them back down and re-run them for RHEL as well. The question was how we create this requirement hierarchy, with the high level requirements from the man page and then the low level ones. There are actually even more levels beyond that: there are technical software safety requirements defined from our safety goals, and they trickle down to our man page requirements, which go to a concrete API; for that API we look at the man page, at the behaviors within it, and break them down into functional parts — because not every part of a man page is something you would test, for example. Those become low level requirements, and those get fed into code coverage
analysis to show that we have 100% code coverage against those low level requirements but now we might not always get 100% code coverage so as long as we can provide justification as to why that code path wasn't reached for example but otherwise for the complex APIs we have to have 100% code coverage. Where do we get our requirements from the APIs which are our ASLB APIs those come from the OEM so they provide the list of APIs within the safety scope of course we work with them to influence them and help to guide them about which ones should be in the safety scope but those specifically come from our customer which is GM who we're working with right now so the question was some of the man pages are very minimum and then they point to other documentation was that the question if it's we will have to handle that yes well there's this call wrappers in that sort of thing so yeah that's something that we need to handle and yeah so part of it will be redirected back to the actual API that's wrapping around it so there's different use cases based on if it's like a syscall wrapper or the API or and then there's different categories of very complex complex high, medium, low and there's a different rigor that's taken upon the complexity of the API as well but eventually those are going to be broken down into two aspects simple and complex once we get to a point where we can do that classification well thank you very much it's come in front especially if you will have questions because last time yesterday I couldn't hear anything from the audience and in the last three rows nobody could hear anything as well because of the air conditioning I mean feel free to stay there and do something other work but if you want to listen to me it's going to be easier from the front still have one minute so hello everyone obviously my name is not Barbara Binova I'm a last minute jump in because there was an accident, there was an argument between Barbara and the staircase here at the 
conference, and she couldn't make it today. My name is David Halas, and I will be talking about trust management in digital ecosystems. Please try to sit in the first rows, because in the back you cannot hear anything. Ciao, stay in the front so you can hear, unlike yesterday.

Okay, so I'm going to stay seated because, first of all, I'm tired, and second, I have some notes from my supervisor, who actually prepared this talk; I'm just the last-minute replacement. At the beginning I would like to talk about our university, which is not this one but the other one. It was established in 1919, it's the second largest in the Czech Republic after Charles University in Prague, and it has over 30,000 students. Our youngest faculty is the Faculty of Informatics, which has over 2,000 students. Barbara is the vice dean for industrial partnerships, so I guess that's the reason she had these slides.

As digitalization advances and becomes more and more normal in our lives, there are new challenges that we have to tackle. There is also the dual-use dilemma: technology can be used for good or for bad, and if we want to embrace the good, we actually have to use it. Banning it is a bad idea; it's the same as with fire, which you can use for good and also for bad. And if we talk about digitalization, there are challenges regarding hyperconnection: everything will be a device, we'll be talking to every device, and humans will have more interactions with machines. Which brings us to the topic of dynamic, autonomous cyber-physical ecosystems, where uncertainty and unpredictable situations can happen on a larger scale than before, and obviously we have to secure and future-proof these technologies so we can catch issues that we don't even know about right now.

Our research specifically is around critical infrastructure, which is critical because human lives are usually at stake; our way of life is at stake. So we talk about smart grids, autonomous vehicles, smart cities; it's basically everything around us that can harm us if it's used in a bad way. That's our view; there are different definitions of what critical infrastructure is. We believe that trust might be the way out of this: the way systems trust or don't trust each other, or even humans, systems trusting humans and humans trusting systems.

Our first view on this is trust management in an internet of vehicles, a network of vehicles, or vehicular area networks. We can talk about collision avoidance: if two vehicles meet, maybe vehicle A can trust vehicle B if they behave so as to avoid colliding with each other. Then there is vehicle platooning, as you see in the picture: we can move the vehicles into one lane, and the aerodynamic resistance gets lower, except for the first vehicle, so they can save fuel, for example. Can the randomly joining vehicle trust the other members of the platoon, or can the platoon trust the vehicle that's trying to join the network? Then there is my research, which is about runtime updates: running smart agents in the vehicle and trying to assess some kind of trust in the software that's being run. And of course, as I said, there is human-to-autonomous-vehicle trust. Say there is a human driving a vehicle that slowly starts becoming autonomous: the human starts out driving the car, but the car is gradually taking over as the human allows it. So we can have situations like this where we still have traffic lights, which wouldn't be needed in a fully autonomous system except for pedestrians.

So how were trust and trustworthiness handled by the industry before? Increasing the security, reliability, or availability of these services and technologies doesn't really improve the trust in the system itself. It's not a conventional problem in computer science, because trust as we interpret it is a human social concept: a belief that the other system will not do
harm to you when you expose your vulnerabilities to it. And even though a system might say, hey, I'm trustworthy, and it's certified and everything was fine by some checks in the past, it can still behave badly in the future. We can even intentionally design devices that have the sole purpose of doing harm and damage at a certain point in an ecosystem. Let's imagine a vehicle that behaves well for 14 days, and on the 15th day it just does something that collapses the whole city and kills a lot of people. That's what I was talking about: agents with malicious intentions. Banning this, which sometimes comes up in legislation, isn't really a solution, because somebody else will implement it anyway; that's one of the reasons we need to be proactive and come up with technologies that will solve these problems.

So, understanding trust: we started doing a survey in fields of science other than computer science about what trust is, and we found these nice definitions. Basically, trust is a relationship between a trustor and a trustee. If you want to read these, I can give you some time, but what we settled on is the trust-in-automation definition, which suits our needs best: trust is a belief, a relationship between the trustor and the trustee in a context of uncertainty and vulnerability, where the trustor exposes some kind of vulnerability that the trustee could exploit, but in a safe environment that shouldn't happen. This kind of trust is subjective. It is also asymmetric: if A trusts B, it's not necessarily true that B will trust A. And it is only sometimes transitive: if agent A trusts B and B trusts C, then A might trust C as well, but not always. We can also go into reputation and how social structures work; let's say humans gossip about each other or hold reputations of each other. If one person is very famous and people talk about him or her and you hear good things about that person, then you might trust that person implicitly.

We can evaluate trust at various scopes. There is the local scope, in a situation: two vehicles meet and can assess trust locally. Then, based on that relationship, over time they can build up some kind of reputation that can be consumed by other consumers of this trust framework. And of course it can be context specific: you can trust a vehicle in a collision-avoidance scenario, but you cannot trust it in a vehicle-platooning scenario, for example, because of a faulty component or a wrong implementation. So we are looking into some kind of dynamic evaluation where we deal with these things.

So we came up with this nice picture of how trust works. Guys, if you go to the back, you will not hear much. There is direct and indirect trust: direct is when you interact with some other entity directly, and indirect is when you hear some kind of gossip, if we talk about humans. Then there is context information about the specific situation, and based on that we can aggregate these results and make a decision about trust, which in our case, as I stated yesterday if you were at my talk, is probably non-binary, some kind of complex structure. We don't really know yet what exactly; so far we work with percentages in some proof-of-concept case studies, but it will probably be a vector of different percentages. If we trust directly, we don't talk about reputation but just trust alone; then there is a quality of credibility or accuracy; and there are the social metrics, where we talk about the human aspects of friendship, honesty, benevolence, and altruism or unselfishness, and we need some ways to measure these.

Two metrics we already know how to measure. One is openness, or transparency, and our view on this uses digital twins. Is anybody familiar with digital twins here, besides the ones who were at my talk yesterday? A digital twin is a model of a cyber-physical system that exists only in the cyber world, so you can use it to simulate what the actual system would do
in the future if you put it into a certain context. And if the cyber-physical system shares this model with you, that signals some kind of transparency: hey, I'm willing to give you my model, I'm open about it, you can use it to determine my future moves. So this is our way of defining openness in certain cyber-physical situations. The other metric is honesty: how good this model is, how honest it is. Basically, if you run the model and it behaves in a certain way, but the vehicle doesn't behave in the same way, then we can assume that the vehicle wasn't really honest about its digital twin.

Again, we have these challenges that we are trying to solve. There is thing-to-thing update trust: the vehicle is trying to download some kind of upgrade, a black box. Can we trust it, can we run it, can we expose it to critical functions of the vehicle, or will it kill us? Then there is thing-to-thing trust, when you have two entities trying to assess each other's trustworthiness: how much can they interact, and how much of their safety features can they turn off to be more efficient? Usually, safety features mean limiting vehicles, let's say slowing them down. If you trust another vehicle, you will probably pass each other at a higher speed, because you trust it and you don't feel that vulnerable; but if the trust level is lower, then you probably do it more slowly, so you have more time for sudden reactions if something bad happens. Then there is my research on adaptive safety: how strongly should we impose these safety features based directly on the level of trust, the trust score? And what happens if I make a false-positive or a false-negative assumption? One of the partial solutions is, as I said, the non-binary percentage, where you have a much lower chance of false positives and false negatives because you don't just have the two extremes; but we will probably take it further to some kind of vector, which is still in progress. And the last one is governance: who should be responsible for such an ecosystem, who should manage the trust in a centralized way, who should be responsible for certain things. And of course we also look into the ethical side.

Regarding thing-to-thing update trust, one of the scenarios is what I was talking about: a digital twin being run, let's say a software module that enforces speed limits or certain traffic rules in a smart city. How do we ensure that this third-party software will not break our vehicle and will also do its purpose, limiting you in school zones, let's say? We treat it as a black box, we simulate its digital twin and do predictive simulations, and based on those predictive simulations we can do live compliance checking, assess how much we can trust it, and based on that expose it to certain features or not; or, if it's really bad, we can trigger some kind of safety system that disables it and doesn't allow the agent to run in the vehicle at all. This is the architecture I was talking about yesterday, so I'm not going to go into the details of it, but it is supposed to somehow enforce that a software module runs in a secure way on autonomous vehicles.

The next one is collision avoidance. We have some experiments with drones that assess the trustworthiness of other drones based on previous behavior, also pulling reputation into the picture, and try to avoid collisions in mid-air. Then there is adaptive safety, my research, where we are trying to adapt safety features based on the level of trust, safeguarding how vehicles will behave in an uncertain and unpredictable world; not just vehicles, but we are trying to demo it on autonomous vehicles. Most of this again uses the architecture that I'm not going to go into details on. We also designed a kind of adaptive safety framework where you have a model that calculates your trust score or value, we propagate it to the outside world, and there is a safety module that, based on a decision tree, turns
on some kind of safety features, or exposes some features, based on how the trust is changing over time. And there is governance at the end, where we are looking into how to actually calculate the trust, how to represent it, how to punish or reward systems that are decreasing or increasing their trust, and what kinds of attacks there are. We are also looking into evidence collection, some kind of forensic readiness, so that we can give evidence to legal institutions later; we are looking into misbehavior, because that might happen; and there is also the pre- and post-incident side.

Then there are problems that we haven't solved yet, like what trust score we should assign to newly joining vehicles or entities in our system. We will probably also need some kind of erosion or inflation of the trust score; it's not necessarily true that because you are trusted today, you will be trusted tomorrow. And there are attacks, like black swan blindness and other sorts, that we are looking into as well.

So the challenges: the scope is situational, so, as I said earlier, in one scenario you can trust a vehicle for, let's say, lane keeping, but in another one, for traffic control, you cannot. Trust is subjective, again; there is the trust score, as I talked about, and erosion. We also need to figure out how to detect malicious intentions that are hidden: let's say for 14 days a system is doing well and increases its trust score, and then on the 15th day it just goes on a rampage and starts killing people. And how do we ensure safety against untrusted agents? With a cyber-physical system you can limit the cyber part; vehicles, for example, you can disconnect from the network, but they will still be in your smart city and can still do damage, so we will probably need some means to enforce our rules; the police will not be jobless at that point. And there is still a high degree of dynamism and uncertainty: we will never know how a network of, let's say, autonomous vehicles or any kind of smart systems will converge in the future, because they will change over time and you will have millions of unpredictable combinations.

Regarding attacks, one of my colleagues is looking into what kinds of attacks can happen. The important part is that there are individual and collusion attacks, and they usually end up in one of two things: unreliable decision making, where we have a wrong trust evaluation and, based on that, do something we shouldn't do; or false trust recommendation, where we have a too-high or too-low rating and recommend it to another system, which can then do bad things. Some of these attacks are self-promoting, where the system tries to say good things about itself. There is whitewashing; you've probably done it in your childhood: you created an email address, did something on the internet, got banned, and then created another email address and tried again. There is the discriminatory attack, where a system attacks other actors in the system and tries to pull down their trust scores. And there are other things like on-off attacks, where it's sometimes good and sometimes bad; we need a system that can handle these kinds of situations. Then there is another thing I can bring up from human social situations in primary school, when five kids start to pick on one kid and say bad things about that kid; this can happen with autonomous vehicles too. Or the other thing, the ballot-stuffing attack: like five kids agreeing that one of them will have a high score, five vehicles will agree that one has a higher trust score by claiming it's doing good, and good, and good, and then at some point, on the 15th day, it starts going on a rampage.

So these are the things we're working on, the different aspects of this; mine is more the adaptive safety and the architecture around software modules. So that's it, I can take questions. Thank you. Any questions? Yes. I cannot hear you, please come closer or shout. Do you need a one-word answer? Yeah, so I have to repeat the question.
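The on-off and ballot-stuffing attacks described above are often countered by updating trust asymmetrically, so that a long streak of good behavior raises the score only slowly while a single bad action drops it sharply. A minimal sketch in Python; the gain and penalty values are illustrative assumptions, not taken from this project:

```python
# Hedged sketch of an asymmetric trust update, a common defense against
# on-off attacks in reputation systems. Gain/penalty values are illustrative.

def update_trust(score, behaved_well, gain=0.05, penalty=0.4):
    """Reward good behavior slowly, punish bad behavior sharply; clamp to [0, 1]."""
    score = score + gain if behaved_well else score - penalty
    return min(1.0, max(0.0, score))

score = 0.5
for _day in range(14):                 # 14 days of good behavior
    score = update_trust(score, True)
score = update_trust(score, False)     # day 15: the rampage
print(round(score, 2))                 # 0.6 -- most accumulated trust is gone
```

With these numbers, fourteen days of good behavior saturate the score at 1.0, but the single bad action on day 15 immediately erases most of the accumulated credit, so the attacker cannot cheaply rebuild trust between misbehaviors.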
So the first question was whether this decision tree is static or dynamic, and the second one was how it is designed. It will probably be dynamic; I cannot tell you right now how dynamic, but in uncertain situations you cannot have a static one. It might happen that vehicles will exchange their decision trees at some point; we are also looking into that, but we are not really deep into it yet. And how it will be designed, that's also an open question. Anybody else, please? Okay, do we have any questions on Matrix? Do we have the session chair? Okay, if we have no session chair, then I guess this was all. Thank you everybody for coming, and see you at the party.