So, hi everybody, my name is Dirk Eibach, I'm a member of Carl Cloos Schweißtechnik, which I will refer to as Cloos from now on. A few words about me: I was born in the 70s and always had a kind of fascination for home computers and electronics. So I studied that and finished as a German Diplom engineer in 1997, and my first job was in industrial automation. Later on I did embedded Linux systems for about 17 years at a company called Guntermann & Drunck in Germany; you may never have heard of them. They are in the KVM business, and not what you think: it's about keyboard, video, mouse, extending, matrix switching, that stuff. If you came in by plane, all the flight control centers like Eurocontrol and DFS use their equipment, and you were probably guided by one of my kernel drivers. In 2019 I joined Cloos, the robot controller team, well knowing that they are doing Windows and proprietary real-time operating systems, so I thought I would leave the Linux world for the foreseeable future. But things might have changed a little. I'm married, and I have three kids.

Okay, so what is this talk about? Cloos, as you may or may not have heard, is a manufacturer of industrial robots; one of them is pictured here. In 2021 we decided to do a study on whether we could change the situation in our robot controller and port it from the proprietary INtime real-time operating system to Linux with the PREEMPT_RT patch. This talk is about our experience, what we learned on the way, and all the lessons, and I want to share that with you.

A little bit of history. At Cloos, we have been building robots since 1981; here are some nostalgia pictures of that. And we have been doing in-house designed robot controller hardware and software since 1986. At first we were using the PL/M programming language. Before starting at Cloos, I had never heard of that language; maybe you have, I don't know.
And since 1995 we have been using standard PC hardware, industrially hardened, but basically standard PC hardware, in our robot controllers, which was a first in the robot controller industry, at least as far as I know.

Some words on the TenAsys INtime real-time operating system. It was originally started by Intel under the name iRMX in the 70s. At that time, if you wanted to sell a processor, you needed some kind of software to support it, and that is where it came from. Later on this real-time operating system was acquired by RadiSys, and they founded a separate company to manage it, and that's TenAsys.

So a few words about the status quo, how welding and robotics are done at Cloos at the moment. When I started at Cloos, I really had no clue about welding and welding processes. Today I still don't have that much of a clue, but at least I know all the stuff I don't know. The picture here is a typical gas-shielded arc welding process. We have a welding torch, we have wire, we have a drop of molten metal in the air going into a weld pool. This is what you classically think of when you think about arc welding, but there are all kinds of other processes. Here we have a tungsten (TIG) welding process, where you have a tungsten needle that is not consumed in the process, so you have to add wire externally. We also support laser welding processes: the laser itself can do the welding, or the laser heats the workpiece so that the arc welding itself works better. We also support cutting processes and grinding processes.

And I have brought a video, maybe that's interesting for you, I hope so. It was taken with a high-speed camera. I really had no idea with what precision these welding sources work: you can see we have a pulsed arc, and the wire itself is moved in the process to create a short circuit and later pulled back again. This is a really, really precise process.
It's a high-speed camera, as I said, so this is not real time. We can basically control every drop of molten metal that goes into the weld pool, so it's really amazing technology, and I always enjoy looking over the shoulders of the welding source developers. It's very, very interesting. Okay, let's stop the video.

So what is inside one of our robot controllers? Here we have a picture of the cabinet, and inside there is not that much; it's quite simple, basically. We have our own in-house built industrial PC with an in-house developed PCI card that has some 24-volt digital I/Os, some analog I/Os and some CAN buses, and the communication with the servo drives is done using the EtherCAT real-time bus. Maybe some of you are familiar with that. All the servo drives are EtherCAT slaves connected to that bus, and in such a cabinet we can control up to 32 axes.

A few words on our real-time requirements, what we are talking about here. The axes are running in a mode called cyclic synchronous position. What does that mean? It means we have a bus clock, in our case something like one millisecond or four milliseconds, and each bus cycle we have to provide a new target position for every servo axis. If that fails, we are in trouble: the axis starts stuttering and everything goes down the drain, basically. That really must not happen. Our motion planner cannot produce positions for all the axes in one to four milliseconds; we need a little more time. So the motion planner runs at a lower clock, typically eight milliseconds, or with the full 32 axes it might even be 16 milliseconds, and in between these clock cycles we interpolate the position, so we have a new position for each axis every bus cycle.

There is some more stuff going on in real time to help the axis controller. Using the dynamic model of the robot, we calculate the torque for each axis that is to be expected in the next bus cycle.
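To make the interpolation step concrete, here is a minimal sketch, assuming plain linear interpolation between two motion-planner setpoints; the names (`Setpoint`, `interpolate`), the axis count, and the linear scheme are my assumptions for illustration, not the actual Cloos implementation:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: interpolate axis target positions linearly between
// two consecutive motion-planner setpoints, so the bus gets a fresh target
// every bus cycle even though the planner runs at a slower clock.
constexpr std::size_t kAxes = 6;
using Setpoint = std::array<double, kAxes>;

// step runs from 0 to steps_per_cycle, e.g. 8 ms planner / 1 ms bus -> 8 steps.
Setpoint interpolate(const Setpoint& prev, const Setpoint& next,
                     int step, int steps_per_cycle) {
    Setpoint out{};
    const double t = static_cast<double>(step) / steps_per_cycle;  // 0..1
    for (std::size_t i = 0; i < kAxes; ++i)
        out[i] = prev[i] + t * (next[i] - prev[i]);
    return out;
}
```

With a 1 ms bus clock and an 8 ms planner cycle, `steps_per_cycle` would be 8, and every bus cycle delivers one intermediate target per axis.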
This is also done in the motion planner. We also adapt the speed controller depending on whether a light or heavy tool is installed on the robot, so that the axis servo controller can do its work properly. And we have some sensors that can be used in the process to compensate for variations and tolerances in the workpiece; that also has to be handled in real time.

So, our codebase. It's about three million lines of source code, so there is not only motion planning, there is also know-how about the welding sources; all kinds of stuff is included. It was originally written in PL/M, as I told you; some of it was converted to C, and some of that code is still present in our codebase today. The really ugly part is that we still have to be compatible with ANSI C. I don't know if everybody is familiar with what that means; it basically means this part of C is not fun. The application itself is split into different processes that communicate with each other using shared memory, semaphores, mailboxes, all kinds of stuff.

The graphical user interface was originally done with Microsoft Foundation Classes, because the INtime real-time operating system is tightly coupled with Windows for graphical user interfaces. Later on we designed a new graphical user interface based on Qt, which should help with porting to Linux. It's still not feature complete, so for some things we still have to switch to the old graphical user interface, but I think we are close to 100%.

The codebase also has some technical debt, as you may imagine. It started in the 80s, things have added up, there were some management decisions that put features over maintenance, and all together we have collected some technical debt. There is also knowledge in this codebase from developers who have since left the company.
And we also have to keep supporting the legacy system, as I told you, so the only way to handle this is to carefully refactor the code, and that can only be done little by little. Starting from zero is not an option for us.

Some words about the TenAsys INtime RTOS. It runs as a separate kernel on the industrial PC, confined to one dedicated core of the processor, while all the other cores can run the graphical user interface. The graphical user interface can communicate with the real-time processes using an API called NTX; I'm not sure what that abbreviation means. Here is a little diagram. When a Windows process wants to communicate with a real-time process, it makes a call into the NTX library. That goes to a special Windows kernel driver that communicates with the INtime real-time kernel, which in turn talks to the real-time process. How exactly that kind of magic works, I'm not sure; it's not open source, so I cannot tell, but it works.

Then, in 2021, we got a tempting offer. We talked to KEBA, an automation supplier from Austria. They have a product called the KEBA D3, an automation controller running Linux: internally a 32-bit Debian, sadly, but with a PREEMPT_RT kernel. And the very important part is that it has an integrated safety solution. Safety is a critical component for us, because we have industrial robots, and if you want to teach such a robot, you have to stand next to it, and that really must not go wrong. To make sure it doesn't go wrong, you need some kind of safety controller that is, for example, checking that the robot does not exceed a speed of 250 millimeters per second, so that you are safe to stand next to it and work with it. At the moment we have an in-house designed solution for that, but the standards are changing, the requirements are getting harder, and we cannot meet them with our solution anymore.
And the KEBA solution is suitable for robotics and for our requirements, so this is really important for us. The FlexCore label means you can run your own Linux real-time application beside the KEBA applications on that controller, without many limitations. So you can do a lot of dangerous stuff, but you can also do a lot of good stuff; you just have to know what you're doing.

So what are the benefits for us? The safety problem is solved; that's an important part. We can use off-the-shelf hardware, so we don't have to build it ourselves anymore. The alternative to the FlexCore approach would have been to keep our Windows PC with INtime and talk to the KEBA D3 controller via EtherCAT; we would have to act as an EtherCAT slave, and a lot of communication would be required. None of that is needed if we run on the system itself. We also get rid of some license fees: we don't need a Windows license anymore, we don't need the INtime license, we don't need the EtherCAT master license. So it's quite attractive for us.

But how could we find out whether that would actually work for us, whether it would be possible at all to port our software to Linux? Maybe that can be evaluated theoretically; I'm not sure how to do that, so I came up with another approach. I decided it would be a good idea to do a study. I suggested it in a meeting: what do you think, could we do a study where, for six weeks, all the real-time and graphical user interface folks see how far we can get in porting our system to Linux? I was really amazed that everybody agreed, that was really nice, and so we decided to start.

One more thing: how do you do that with a team of Windows developers? First of all, it's important to see who these people are. These are people with decades of experience in the robot and welding industry, and they know their Windows stuff.
They know their real-time operating system stuff, and you really have to honor that. Simply saying, okay, we are doing Linux now and everybody should join in, would probably not work; it would probably break the motivation of the whole team. So we decided that everybody would be involved in the decision-making process, and we really wanted to make sure that the outcome of this process was open: the decision on how to proceed had not been made when we started.

One thing I thought about was how to make people comfortable with this. You have to consider that for all these developers, using Linux is a whole new world. How could they make themselves familiar with it? First of all, you obviously need some time to look into all of this: what does it mean, how does it work? And it's important to give support. I have some experience there, and I really looked into the problems that popped up. It's also important not to make fun of anyone; there are no stupid questions, it's all new for them. And it's important to join in the fun when some progress happens and something works. It was also important to create a comfortable environment. This doesn't mean free drinks and couches or something like that; it means keeping Visual Studio, a development environment the team is comfortable with. You can actually do Linux development using Visual Studio, so we also had a look at what can be done there.

Some words on methodology: where could we get help in the process? First of all, there's an excellent book. If you do system-level development, it is highly recommended; probably you all have it on your bookshelf. There's also really good support on the TenAsys website, where the whole API is documented. And there's a very knowledgeable colleague, Axel Scholz, who was involved in the process and has accumulated knowledge since the 80s.
He was part of the INtime development of our system, and all the stuff that isn't on the TenAsys website was in his head. So we did the API implementation together, some kind of pair programming, and it worked really well and was quite some fun for us.

So what was the concept? Just a moment, I need some water. How did we approach it? We decided to implement the INtime API as a user-space library, so no kernel coding required. We will have to keep building our INtime application in the future, since we still must add features there, so we wanted to be able to build the application from the same codebase for both INtime and Linux. We therefore decided to emulate the INtime API, and at least for the study we decided to implement only the parts of the INtime API that we actually use. Even in this talk I can only cover a very small subset of what we did; there are really a lot of API functions that we implemented.

To get started, we threw our source code at GCC and first had to adapt some things in the INtime header files, because some of the types that Visual Studio understands are not understood by GCC. That was a little work. After that the application compiled, and the linker started spitting out all the unresolved dependencies. Those were exactly the API functions we had to implement, and we started doing that, using a 32-bit virtual machine to be as close as possible to the FlexCore environment. At the same time, the graphical user interface people started porting their project files to CMake, which is required for Visual Studio's Linux support, and they also started removing most of the Windows dependencies that are still present in our libraries and have to go.

I think I skipped a bit: the library is called Grintime, the generic re-implementation of the INtime API. And this is our real-time application.
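To give a flavor of the header adaptation just mentioned, a shim along these lines can map Windows-style integer types, which GCC does not know, onto fixed-width standard types. The concrete list of typedefs our headers needed is an assumption here:

```cpp
// Hypothetical compatibility shim for building Windows/INtime-flavored
// headers with GCC: map Microsoft-style integer types onto <cstdint>
// fixed-width types. Which typedefs are actually required is an assumption.
#include <cassert>
#include <cstdint>

typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint32_t DWORD;
typedef int32_t  LONG;
typedef uint16_t RTHANDLE;   // object IDs in INtime are 16-bit (see below)
```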
As you see, it is now linked against Grintime, and we also need some runtime support, which is the Grintime daemon.

One important thing to get right are objects and object directories. What are objects in INtime? Every resource, for example a process, a thread, a semaphore or a mailbox, whatever you can think of, is an INtime object, and these objects get 16-bit unique IDs. Yes, really 16-bit, so we can only have 64k objects in INtime, and then we are done. That's interesting, but that's the way it is. The object directory is a per-process key-value store where you can give names to objects: you store the name as the key and the handle as the value. You can use that for inter-process communication, because any process can look into the object directory of any other process, and so you can obtain the objects of another process. This is how it works in INtime.

So how do we do this in Linux? This is the API for it. We have a catalog function that takes the process as a parameter, the handle and the name; that is how you write into the object directory. Looking up is similar, and you see the last parameter is in milliseconds: a timeout. Looking up things in the object directory cannot be done in real time in INtime, so we don't have to do it in real time in Linux either.

The array of objects, 64k objects if you remember, is stored in shared memory, and every Grintime application first has to map this memory to access objects. Every object that is created or deleted is registered in this array, and each process has to set this up properly. Looking up things in this array can be done in real time: if you have the 16-bit object ID, you can get the object in real time, no problem.

And this is how we implemented the objects. It's a structure, and it has a flag.
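The constant-time lookup in the shared 64k array could be sketched like this; the `Object` layout below is my guess at the kind of structure being described, not the actual Grintime definition:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the 64k object table: one slot per 16-bit object ID.
// Field names and payload types are assumptions.
enum class ObjType : uint16_t { None, Process, Thread, Semaphore, Mailbox, Memory };

struct SemaphoreData { int count; };
struct MailboxData  { uint16_t depth; };

struct Object {
    bool used;            // is this slot enabled?
    ObjType type;         // which kind of object lives here
    union {               // type-specific payload
        SemaphoreData sem;
        MailboxData mbox;
    } u;
};

// In Grintime this array lives in shared memory mapped by every process;
// a plain file-scope array stands in for it in this sketch.
Object g_objects[65536];

// O(1), real-time-safe lookup by 16-bit object ID.
Object* lookup(uint16_t id) {
    return g_objects[id].used ? &g_objects[id] : nullptr;
}
```

Named lookup through the object directory, by contrast, is allowed to block, which is why it carries a timeout parameter.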
The flag says whether the entry in the 64k array is enabled or not; then we have a union with data specific to each of the object types we support, and an entry saying what kind of object is stored there. The object directory itself is implemented as a C++ STL map: we use the process ID and the name string as the key, and the value is the object ID. To get started quickly, we decided to do this with gRPC, and the service runs in the Grintime daemon that I showed previously. There are no real-time requirements here, so that is all right, and you can also use the standard gRPC tools to write your own applications, for example for dumping these object directories for debugging purposes.

The process API is also just a few functions. Something that is really important to understand is the main function. We implemented a standard main function in libgrintime; everything you need is done there. We set up scheduling, priorities, signal handlers, the memory allocator, all kinds of stuff in this main function, and then you link libgrintime to your application. But the question is what happens to your original main function that is still there. So we did a trick: we used the -D compiler flag to rename this function to main_grintime, and that is called later from Grintime's main function. This is some kind of a hack, I know, but I really have no idea how to do this better, how to hook into the startup of a process. Maybe you have some ideas; you're very welcome to talk to me about that later.

That's the thread API; also easily implemented, just a lot of work. More interesting is this part: suspending and resuming individual threads. This is something I did not find in the Linux API. I'm not sure if there is something; talk to me later, please. So we had to think about how to implement that, and we implemented it with per-thread signal handlers.
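A minimal sketch of such signal-based per-thread suspend/resume, using the SIGUSR1/SIGUSR2 pair and the tgkill syscall that the talk describes (helper names and handler details are my assumptions):

```cpp
#include <cassert>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sketch: SIGUSR1 parks the receiving thread inside its handler,
// SIGUSR2 releases it again.
static void suspend_handler(int) {
    sigset_t mask;
    sigfillset(&mask);
    sigdelset(&mask, SIGUSR2);   // sleep here until SIGUSR2 arrives
    sigsuspend(&mask);
}

static void resume_handler(int) {}  // exists only to interrupt sigsuspend()

void install_handlers() {
    struct sigaction sa{};
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = suspend_handler;
    sigaction(SIGUSR1, &sa, nullptr);
    sa.sa_handler = resume_handler;
    sigaction(SIGUSR2, &sa, nullptr);
}

// tgkill targets one thread in one thread group; it has no glibc wrapper,
// so it is invoked through syscall(2).
int thread_kill(pid_t tgid, pid_t tid, int sig) {
    return static_cast<int>(syscall(SYS_tgkill, tgid, tid, sig));
}
```

Suspending a thread is then `thread_kill(getpid(), tid, SIGUSR1)`, resuming it `thread_kill(getpid(), tid, SIGUSR2)`.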
The signal handler basically goes into a pause when the thread receives SIGUSR1, and a SIGUSR2 being sent resumes it. We use the tgkill syscall to deliver these signals to individual threads. I'm a little scared, because tgkill is not in glibc; maybe there's a reason for that, and if you know it, tell me. Also a trivial problem: INtime thread priorities are quite different from Linux real-time scheduler priorities. We solved that with a lookup table, which has to be configured for every application.

Now a more interesting topic: memory. If you look at this API, and especially at these functions, you may realize that this is where it gets to be more fun. But another point first: in INtime, every piece of memory that you allocate using malloc can also be used as shared memory. This cannot easily be done with POSIX shared memory, or at least I don't know how. Also, we must be able to deliver object IDs for all allocated memory, so we must keep track of every memory allocation that we do. We decided we needed a more powerful allocator for that, and writing a custom allocator was not really an option, at least not in six weeks. Then I remembered that malloc has some nice features: in fact, you can configure malloc to use mmap for all allocations, and you can also supply a custom mmap function, so you can easily keep track. This is part of our custom mmap function: we have a common shared memory file, and we get a size request that we have to page-align so that it works with mmap properly. And that's it: we keep track of all allocations that are done and can use that later to share memory or to get an object ID for a piece of memory.

But there is another point in the API, as I mentioned earlier. There are functions to map physical memory, which is not that easy in Linux user space, and I was really horrified when this function popped up in the unresolved dependencies from the linker.
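The page-align-and-track step of such a custom mapping function might look roughly like this; the file descriptor handling and the registry layout are assumptions, not the actual Grintime code:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sys/mman.h>
#include <unistd.h>

// Sketch: back an allocation with a shared file so it can double as shared
// memory, rounding the requested size up to a page boundary and recording
// the mapping so an object ID can later be handed out for it.
std::map<void*, size_t> g_mappings;   // address -> mapped length

size_t page_align(size_t n) {
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    return (n + page - 1) & ~(page - 1);
}

void* shared_alloc(int fd, off_t offset, size_t n) {
    const size_t len = page_align(n);
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED) return nullptr;
    g_mappings[p] = len;              // track for later object-ID lookup
    return p;
}
```

Because the mapping is `MAP_SHARED` and file-backed, another process can map the same region of the same file to see the data.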
And what was done in the application: someone doing IPC had a structure, took the physical address of that structure, communicated it to another process and mapped the physical address of the structure there. So yes, an interesting decision. At first we thought, okay, we must support this, and we implemented the physical memory mapping function using the /proc pagemap file and /dev/mem. It kind of worked, but it's not fun. Later we decided that this had been a bad design decision in the first place, and this was one spot where we really fixed the application and changed it.

So what have we achieved in six weeks? First, the good stuff: the TenAsys INtime hello-world example works properly, that's nice. We also got our main real-time applications starting up and doing inter-process communication with each other, and all the API functions we need are implemented. So there is some substance already. On the graphical user interface side we didn't get quite that far: we ported all the projects to CMake and removed a lot of dependencies, but we didn't manage to get the graphical user interface running. So not a complete success, but that was really not expected within six weeks. We really got a feeling for the system, and we think we can achieve the rest, too.

So what have we learned? First, I personally think doing a study was the right approach. We learned a lot, we achieved a lot, and we are now quite sure we can do it. Another comment from me: porting an RTOS API can get very addictive. It's fun; each function you implement that works is a success, and I have really spent a lot of time there. But be warned, you easily get sucked into this. Another thing: if you pay respect to the people on your team, you can achieve amazing things.
And kudos to Microsoft, who have really done an amazing job making Visual Studio suitable for Linux development. That's really nice. It still has some rough edges, but it works really well: you connect it via SSH to your virtual machine, the compiler runs on the virtual machine, and it's all transparent to the developer. You get your build messages there, and you can also run GDB transparently from the Visual Studio debugger. It's all perfect for our people who are using Visual Studio, and they can really work with it.

So what are the consequences? After the study we held a vote. I abstained, and all the Windows developers voted, and in the end everybody voted for continuing with the Linux/FlexCore approach. So something must have gone right. But after our study the component crisis hit, and that was bad for us: the next year we had to do hardware support for replacement components and such, so we could not continue with the porting work. That's a little sad, so it still has to wait; I hope we will continue this year. We also decided that Grintime will be open-sourced. I don't know if anybody here is involved in the INtime business, but if so, please come and talk to me. If you're interested in the code, you can follow github.com/grintime. It's not there yet, because I have to remove some things from our Git repository that are TenAsys INtime header files; we obviously cannot distribute those, but after cleaning that up I will publish it there and you can have a look.

So, questions?

Very nice story; it reminded me a lot of what we've been through and what we worked on in the past. Did you happen to have a look, regarding the implementation, at how the RTOS emulation layer of Xenomai was solving some of these tasks for other APIs? We don't have an INtime API there, but we have one for VxWorks and other stuff.
So a lot of things sound very familiar to me. Maybe that could have been a source of inspiration, or maybe even source code, for emulating this on Linux?

I had a look at Xenomai, and I used it a few years ago in a different project. Yes, it's another approach. But as we are using this KEBA controller, we don't have much room for doing other stuff, so we wanted to stay as close to standard Linux as possible.

But you know that there's also a version of it which runs on standard Linux? Yes, yes, I know of that, but we decided to only use the Linux programming interface.

A very short remark about the shared memory: if anybody is struggling with this, Boost shared memory is a good way to go. Boost shared memory, okay, I didn't have a look at that yet. Thanks.

And the other thing you might want to look at is constructors, the constructor function attribute. That is basically a trick with which you can avoid renaming the main function. Constructors, okay. Yes, I think I remember I did some experiments with that, but maybe I should have continued with those. Okay, okay, next question.
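For reference, the constructor-function trick suggested in that last remark looks like this in GCC and Clang: a function marked `__attribute__((constructor))` runs before `main()` is entered, so a runtime library could do its setup there instead of hijacking `main()` via `-D`. This is a sketch of the mechanism, not how Grintime actually works; the names are mine:

```cpp
#include <cassert>

// Flag standing in for "the runtime library has initialized itself".
bool g_runtime_ready = false;

// Runs before main() via the ELF init-array mechanism. Real runtime setup
// (scheduling, priorities, signal handlers, shared memory) would go here;
// this sketch only sets a flag.
__attribute__((constructor))
static void grintime_init() {
    g_runtime_ready = true;
}
```

An application linking such a library keeps its own `main()` untouched; by the time it runs, the initializer has already executed.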